[jira] [Created] (LUCENE-5793) Add equals/hashCode to FieldType
Shay Banon created LUCENE-5793: -- Summary: Add equals/hashCode to FieldType Key: LUCENE-5793 URL: https://issues.apache.org/jira/browse/LUCENE-5793 Project: Lucene - Core Issue Type: Improvement Reporter: Shay Banon It would be nice to have equals and hashCode on FieldType, so one can easily check whether two instances are the same and, for example, reuse existing default implementations. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
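A minimal sketch of what such an equals/hashCode pair could look like, using a simplified stand-in class (the real FieldType carries many more options; the class and field names here are illustrative, not Lucene's):

```java
import java.util.Objects;

// Simplified stand-in for Lucene's FieldType; field names are illustrative.
public class SimpleFieldType {
    private final boolean stored;
    private final boolean tokenized;
    private final boolean indexed;

    public SimpleFieldType(boolean stored, boolean tokenized, boolean indexed) {
        this.stored = stored;
        this.tokenized = tokenized;
        this.indexed = indexed;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleFieldType)) return false;
        SimpleFieldType other = (SimpleFieldType) o;
        // two field types are "the same" iff every option matches
        return stored == other.stored
            && tokenized == other.tokenized
            && indexed == other.indexed;
    }

    @Override
    public int hashCode() {
        // must be consistent with equals: equal instances share a hash
        return Objects.hash(stored, tokenized, indexed);
    }
}
```

With this in place, checking whether a field reuses one of the shared default types becomes a plain `equals` call.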
[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field
[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986177#comment-13986177 ] Shay Banon commented on LUCENE-5634: This optimization has proven to help a lot in the context of ES, where we can use a static thread local since we are fully in control of the threading model. With Lucene itself, which can be used in many different environments, this can cause some unexpected behavior. For example, it might cause Tomcat to warn about leaked resources when unloading a war. > Reuse TokenStream instances in Field > > > Key: LUCENE-5634 > URL: https://issues.apache.org/jira/browse/LUCENE-5634 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 4.9, 5.0 > > Attachments: LUCENE-5634.patch > > > If you don't reuse your Doc/Field instances (which is very expert: I > suspect few apps do) then there's a lot of garbage created to index each > StringField because we make a new StringTokenStream or > NumericTokenStream (and their Attributes). > We should be able to re-use these instances via a static > ThreadLocal...
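As a rough illustration of the reuse pattern (and the container-leak concern) discussed above, a toy sketch with stand-in names, not Lucene's actual implementation:

```java
// Toy illustration of the reuse pattern under discussion: a static
// ThreadLocal caching one per-thread instance. Class names are stand-ins.
public class ReusePattern {
    static final class TokenStreamLike {
        int resets; // counts reuse, for demonstration only
        void reset(String value) { resets++; }
    }

    // Per-thread cached instance: avoids allocating a new stream (and its
    // attributes) for every field. Because the cached value outlives the
    // caller's classloader, containers such as Tomcat may warn about leaked
    // resources when a webapp that touched this class is unloaded.
    private static final ThreadLocal<TokenStreamLike> CACHE =
        ThreadLocal.withInitial(TokenStreamLike::new);

    static TokenStreamLike get(String value) {
        TokenStreamLike ts = CACHE.get(); // same instance on each call per thread
        ts.reset(value);
        return ts;
    }
}
```

The trade-off is exactly the one in the comment: the pattern is safe when one party controls the threading model, and risky in a library loaded into arbitrary environments.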
[jira] [Commented] (LUCENE-5516) Forward information that trigger a merge to MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930457#comment-13930457 ] Shay Banon commented on LUCENE-5516: +1, this looks great! Exactly the info we would love to have to better control merges. > Forward information that trigger a merge to MergeScheduler > -- > > Key: LUCENE-5516 > URL: https://issues.apache.org/jira/browse/LUCENE-5516 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 4.7 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5516.patch, LUCENE-5516.patch > > > Today we pass information about the merge trigger to the merge policy. Yet, > no matter if the MP finds a merge or not, we call the MergeScheduler, which runs > & blocks even if we didn't find a merge. In some cases we don't even want > this to happen, but inside the MergeScheduler we have no way to opt out > since we don't know what triggered the merge. We should forward the info we > have to the MergeScheduler as well.
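The idea of forwarding the trigger so a scheduler can opt out could look roughly like this; everything below (enum values, interface, method names) is an illustrative sketch, not Lucene's actual API:

```java
// Hedged sketch: pass the reason a merge was requested down to the
// scheduler so it can decide not to run. Names are illustrative.
public class MergeTriggerSketch {
    enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, MERGE_FINISHED }

    interface MergeScheduler {
        /** Returns true if the scheduler chose to run merges for this trigger. */
        boolean maybeMerge(MergeTrigger trigger, boolean foundMerges);
    }

    // Example scheduler: only runs when the merge policy actually found
    // merges, and ignores flush-triggered calls entirely. Without the
    // trigger argument, this opt-out would be impossible.
    static final MergeScheduler FLUSH_IGNORING = (trigger, foundMerges) ->
        foundMerges && trigger != MergeTrigger.SEGMENT_FLUSH;
}
```

This is the flexibility the comment is asking for: throttling decisions can be made per trigger instead of unconditionally blocking.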
[jira] [Commented] (LUCENE-5373) Lucene42DocValuesProducer.ramBytesUsed is over-estimated
[ https://issues.apache.org/jira/browse/LUCENE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853098#comment-13853098 ] Shay Banon commented on LUCENE-5373: As the one who found this issue: on top of the wrong computation, it's also very expensive. This call should be lightweight and ideally not use sizeOf at all. At the very least, if possible, its result should be cached; maybe even introduce size caching at a higher level (in the calling code) if possible. > Lucene42DocValuesProducer.ramBytesUsed is over-estimated > > > Key: LUCENE-5373 > URL: https://issues.apache.org/jira/browse/LUCENE-5373 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Minor > > Lucene42DocValuesProducer.ramBytesUsed uses > {{RamUsageEstimator.sizeOf(this)}} to return an estimation of the memory > usage. One of the issues (there might be other ones) is that this class has a > reference to an IndexInput that might link to other data structures that we > wouldn't want to take into account. For example, index inputs of a > {{RAMDirectory}} all point to the directory itself, so > {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of memory > used by the whole directory.
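The caching suggested in the comment could be sketched as follows; `CachedRamBytesUsed` is a hypothetical name, and the supplier stands in for the expensive `RamUsageEstimator.sizeOf` call:

```java
import java.util.function.LongSupplier;

// Sketch of memoizing an expensive RAM estimate: compute it once,
// return the cached value afterwards. Names are illustrative.
public class CachedRamBytesUsed {
    private final LongSupplier estimator; // stand-in for RamUsageEstimator.sizeOf
    private long cached = -1;             // -1 = not yet computed

    public CachedRamBytesUsed(LongSupplier estimator) {
        this.estimator = estimator;
    }

    public synchronized long ramBytesUsed() {
        if (cached < 0) {
            cached = estimator.getAsLong(); // pay the traversal cost only once
        }
        return cached;
    }
}
```

Caching is safe here only if the estimate does not need to track live changes; for an immutable, already-loaded producer that assumption holds.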
[jira] [Commented] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699892#comment-13699892 ] Shay Banon commented on LUCENE-5086: The Java version on the Mac is the latest one: java version "1.6.0_51" Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509) Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode) Regarding the catch, I think Throwable is the right exception to catch here. Catch all: you don't want a bug in the JVM that throws an unexpected runtime exception to cause Lucene to break the app completely because it's in a static block, and I have been there a few times. But if you feel differently, go ahead and change it to explicitly catch what's needed. > RamUsageEstimator causes AWT classes to be loaded by calling > ManagementFactory#getPlatformMBeanServer > - > > Key: LUCENE-5086 > URL: https://issues.apache.org/jira/browse/LUCENE-5086 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Shay Banon >Assignee: Dawid Weiss > > Yea, that type of day and that type of title :). > Since the last update of Java 6 on OS X, I started to see an annoying icon > pop up in the dock whenever running elasticsearch. By default, all of our > scripts add the headless AWT flag so people will probably not encounter it, > but it was strange that I saw it when before I didn't. > I started to dig around, and saw that when RamUsageEstimator was being > loaded, it was causing AWT classes to be loaded. Further investigation showed > that, for some reason, calling > ManagementFactory#getPlatformMBeanServer with the new Java version causes > AWT classes to be loaded (at least on the Mac, haven't tested on other > platforms yet). 
> There are several ways to try and solve it, for example, by identifying the
> bug in the JVM itself, but I think that there should be a fix for it in
> Lucene itself, specifically since there is no need to call
> #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy
> call...).
> Here is a simple call that will allow to get the hotspot mxbean without using
> the #getPlatformMBeanServer method, and not causing it to be loaded and
> loading all those nasty AWT classes:
> {code}
> Object getHotSpotMXBean() {
>   try {
>     // Java 6
>     Class sunMF = Class.forName("sun.management.ManagementFactory");
>     return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
>   } catch (Throwable t) {
>     // ignore
>   }
>   // potentially Java 7
>   try {
>     return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
>         .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
>   } catch (Throwable t) {
>     // ignore
>   }
>   return null;
> }
> {code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
  Object hotSpotBean = null;
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
{code}
was: (the same description, with the code block wrapped in [code] ... [/code] tags instead of {code})
> RamUsageEstimator causes AWT classes to be loaded by calling
> ManagementFactory#getPlatformMBeanServer
> -
>
> Key: LUCENE-5086
> URL: https://issues.apache.org/jira/browse/LUCENE-5086
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Shay Banon
>
> Yea, that type of day and that type of title :).
> Since the last update of Java 6 on OS X, I started to see an annoying icon
> pop up in the dock whenever running elasticsearch. By default, all of our
> scripts add the headless AWT flag so people will probably not encounter it,
> but it was strange that I saw it when before I didn't.
> I started to dig around, and saw that when RamUsageEstimator was being
> loaded, it was causing AWT classes to be loaded. Further investigation showed
> that, for some reason, calling ManagementFactory#getPlatformMBeanServer with
> the new Java version causes AWT classes to be loaded (at least on the Mac,
> haven't tested on other platforms yet).
> There are several ways to try and solv
[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
[ https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-5086: --- Description: Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
{code}
Object getHotSpotMXBean() {
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
{code}
was: (the same description, with the previous version of the code block that still declared an unused Object hotSpotBean = null;)
> RamUsageEstimator causes AWT classes to be loaded by calling
> ManagementFactory#getPlatformMBeanServer
> -
>
> Key: LUCENE-5086
> URL: https://issues.apache.org/jira/browse/LUCENE-5086
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Shay Banon
>
> Yea, that type of day and that type of title :).
> Since the last update of Java 6 on OS X, I started to see an annoying icon
> pop up in the dock whenever running elasticsearch. By default, all of our
> scripts add the headless AWT flag so people will probably not encounter it,
> but it was strange that I saw it when before I didn't.
> I started to dig around, and saw that when RamUsageEstimator was being
> loaded, it was causing AWT classes to be loaded. Further investigation showed
> that, for some reason, calling ManagementFactory#getPlatformMBeanServer with
> the new Java version causes AWT classes to be loaded (at least on the Mac,
> haven't tested on other platforms yet).
> There are several ways to try and solve it, for example, by identifying th
[jira] [Created] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer
Shay Banon created LUCENE-5086: -- Summary: RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer Key: LUCENE-5086 URL: https://issues.apache.org/jira/browse/LUCENE-5086 Project: Lucene - Core Issue Type: Bug Reporter: Shay Banon Yea, that type of day and that type of title :). Since the last update of Java 6 on OS X, I started to see an annoying icon pop up in the dock whenever running elasticsearch. By default, all of our scripts add the headless AWT flag so people will probably not encounter it, but it was strange that I saw it when before I didn't. I started to dig around, and saw that when RamUsageEstimator was being loaded, it was causing AWT classes to be loaded. Further investigation showed that, for some reason, calling ManagementFactory#getPlatformMBeanServer with the new Java version causes AWT classes to be loaded (at least on the Mac, haven't tested on other platforms yet). There are several ways to try and solve it, for example, by identifying the bug in the JVM itself, but I think that there should be a fix for it in Lucene itself, specifically since there is no need to call #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy call...). Here is a simple call that will allow to get the hotspot mxbean without using the #getPlatformMBeanServer method, and not causing it to be loaded and loading all those nasty AWT classes:
[code]
Object getHotSpotMXBean() {
  Object hotSpotBean = null;
  try {
    // Java 6
    Class sunMF = Class.forName("sun.management.ManagementFactory");
    return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
  } catch (Throwable t) {
    // ignore
  }
  // potentially Java 7
  try {
    return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
        .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
  } catch (Throwable t) {
    // ignore
  }
  return null;
}
[/code]
[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument
[ https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473617#comment-13473617 ] Shay Banon commented on LUCENE-4472: Agree with Robert on the additional context flag; that would make things most flexible. A flag on IW makes things simpler from the user perspective, though, because then there is no need to customize the built-in merge policies. > Add setting that prevents merging on updateDocument > --- > > Key: LUCENE-4472 > URL: https://issues.apache.org/jira/browse/LUCENE-4472 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Simon Willnauer > Fix For: 4.1, 5.0 > > Attachments: LUCENE-4472.patch > > > Currently we always call maybeMerge if a segment was flushed after > updateDocument. Some apps, and in particular ElasticSearch, use some hacky > workarounds to disable that, e.g. for merge throttling. It should be easier to > enable this kind of behavior.
[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and "global" cross indices control
NRT Caching Dir to allow for exact memory usage, better buffer allocation and "global" cross indices control Key: LUCENE-3425 URL: https://issues.apache.org/jira/browse/LUCENE-3425 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Shay Banon A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.
A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[], and allows returning unused byte[] to it. It will have a cap on the size of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is "flushed" from the cache to the main directory, it will return all its currently allocated byte[] to the BufferAllocator to be reused by other "files".
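A possible shape for the proposed BufferAllocator, at sketch level only (all names and the API are illustrative assumptions, not Lucene's implementation):

```java
import java.util.ArrayDeque;

// Hedged sketch of the proposed BufferAllocator: hands out fixed-size
// byte[] blocks up to a byte budget and recycles returned blocks, so
// buffers are reused instead of constantly reallocated.
public class BufferAllocator {
    private final int blockSize;
    private final long maxBytes;          // the hard cap on allocated memory
    private long allocatedBytes;          // bytes handed out so far (incl. recycled)
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    public BufferAllocator(int blockSize, long maxBytes) {
        this.blockSize = blockSize;
        this.maxBytes = maxBytes;
    }

    /** Returns a block, reusing a recycled one if possible, or null when the
     *  budget is exhausted (the caching dir would then flush to the main dir). */
    public synchronized byte[] allocate() {
        byte[] recycled = free.pollFirst();
        if (recycled != null) return recycled;
        if (allocatedBytes + blockSize > maxBytes) return null;
        allocatedBytes += blockSize;
        return new byte[blockSize];
    }

    /** Returns a block to the pool so other "files" can reuse it. */
    public synchronized void release(byte[] block) {
        free.addFirst(block);
    }
}
```

Sharing one such allocator across several directories gives the "global" cross-indices control the issue asks for, since the cap applies to the pool, not to any single index.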
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch A new patch removing the synchronization. It also adds another field to RateLimiter to record the original mbPerSec value set, so we can easily get it back. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch, LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
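A hedged sketch of a rate limiter that records its configured rate so it can be read back, and that could be shared across several directories; `SimpleRateLimiter` and its methods are stand-ins, not Lucene's RateLimiter API:

```java
// Toy rate limiter shared across directories: all writers consult the same
// instance, so their combined merge writes respect a single MB/s cap.
public class SharedRateLimiterSketch {
    static class SimpleRateLimiter {
        // the configured cap, kept (as in the patch) so it can be read back
        private final double mbPerSec;

        SimpleRateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }

        double getMbPerSec() { return mbPerSec; }

        /** How many nanoseconds the caller should pause after writing `bytes`
         *  to stay under the configured rate. */
        synchronized long pauseNanos(long bytes) {
            double seconds = bytes / (mbPerSec * 1024 * 1024);
            return (long) (seconds * 1_000_000_000L);
        }
    }
}
```

Passing one instance to every FSDirectory in the JVM is what gives the cross-directory limit the issue describes; each directory alone would otherwise enforce the cap independently.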
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099287#comment-13099287 ] Shay Banon commented on LUCENE-3416: I must say that I am at a loss trying to understand why we need this "optimization", but it does not really matter to me as long as the ability to set the rate limiter instance gets in. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099233#comment-13099233 ] Shay Banon commented on LUCENE-3416: I agree with Mike; I think it should remain synchronized. It does safeguard concurrent calls to setMaxMergeWriteMBPerSec from falling over each other (who "wins" the call is not really relevant). Since that's synchronized, the method I added should be as well. Personally, I really don't think there is a need to make it thread safe without "blocking", since calling the "setters" is not something people do frequently at all, so the optimization is moot, and it would complicate the code. As for making mergeWriteRateLimiter volatile, it can be done, though in practice there really is no need (there is a memory barrier when reading it before). But I think that should go in a different issue, just to keep changes clean and isolated. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
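The synchronization trade-off debated in this thread, as a toy sketch (all names are illustrative, not the actual patch):

```java
// Toy illustration: the setter is synchronized because it performs a
// multi-step update that must not interleave with a concurrent setter
// call; the getters are synchronized to pair with it. Without that (or a
// volatile field) another thread might briefly observe a stale value,
// which the thread argues is acceptable for a rarely-called setter.
public class MergeWriteSettings {
    private double maxMergeWriteMBPerSec = Double.POSITIVE_INFINITY;
    private double originalMBPerSec = Double.POSITIVE_INFINITY;

    public synchronized void setMaxMergeWriteMBPerSec(double mbPerSec) {
        // two related writes that must stay consistent with each other
        this.originalMBPerSec = mbPerSec;
        this.maxMergeWriteMBPerSec = mbPerSec;
    }

    public synchronized double getMaxMergeWriteMBPerSec() {
        return maxMergeWriteMBPerSec;
    }

    public synchronized double getOriginalMBPerSec() {
        return originalMBPerSec;
    }
}
```

The alternative discussed (a volatile field with unsynchronized reads) would trade immediate visibility guarantees for cheaper reads; for a setting changed rarely, either choice is defensible.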
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099160#comment-13099160 ] Shay Banon commented on LUCENE-3416: > this makes no sense to me. If you don't want to set this concurrently how does > a lock protect you from this? I mean if you have two threads accessing > this you have either A B or B A, but this would happen without a lock too. If > you want the changes to take effect immediately you need to either > lock on each read of this var or make it volatile, which is almost equivalent > (a mem barrier). No, that's not correct. setMaxMergeWriteMBPerSec (not the method I added, the other one) is a complex method, and I think Mike wanted to protect against two threads setting the value concurrently. As for reading the value, I think Mike's logic was that it's not important enough to have "immediate" visibility of the change to require a volatile field (which is understandable). So, since setMaxMergeWriteMBPerSec is synchronized, the method added in this patch has to be as well. > My concern here was related to making this var volatile, which would be a > cache line invalidation each time you read the var. I think we should get rid > of the synchronized. Reading a volatile var on x86 is not a cache invalidation, though it does come with a cost. It's not relevant here based on what I explained before (and second-guessing Mike :) ) > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them. 
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099146#comment-13099146 ] Shay Banon commented on LUCENE-3416: The only reason it's synchronized is that the setMaxMergeWriteMBPerSec method is synchronized (I guess to protect against setting the rate limit concurrently). In practice, I don't see users changing it that often, so concerns about cache lines are not really relevant. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon >Assignee: Simon Willnauer > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098018#comment-13098018 ] Shay Banon commented on LUCENE-3416: It is possible, but requires more work, and depends on overriding the createOutput method (as well as all the other methods in Directory). If rate limiting makes sense to expose as a "feature" at the directory level, I think this small change allows greater control over it. > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3416: --- Attachment: LUCENE-3416.patch > Allow to pass an instance of RateLimiter to FSDirectory allowing to rate > limit merge IO across several directories / instances > -- > > Key: LUCENE-3416 > URL: https://issues.apache.org/jira/browse/LUCENE-3416 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Shay Banon > Attachments: LUCENE-3416.patch > > > This can come in handy when running several Lucene indices in the same VM, > and wishing to rate limit merge across all of them.
[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merge across all of them.
[jira] [Commented] (LUCENE-3335) jrebug causes porter stemmer to sigsegv
[ https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076053#comment-13076053 ] Shay Banon commented on LUCENE-3335: @Uwe I actually forgot about this, and did not think it was because of the porter stemmer at the time, especially since I did try to reproduce it and never managed to (I thought it was a coincidence that it crashed there). From my experience, you get very little help from sun/oracle when using unorthodox flags like aggressive opts without a proper reproduction. Well, you get very little help there even when you do provide a reproduction... (see this issue that I opened for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129). I am the reason behind the Lucene 1.9.1 release with the major buffering bug introduced in 1.9 way back in the day; do you really think I would not reach out if I thought there really was a problem associated with Lucene? > jrebug causes porter stemmer to sigsegv > --- > > Key: LUCENE-3335 > URL: https://issues.apache.org/jira/browse/LUCENE-3335 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, > 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, > 3.3, 3.4, 4.0 > Environment: - JDK 7 Preview Release, GA (may also affect update _1, > targeted fix is JDK 1.7.0_2) > - JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts >Reporter: Robert Muir >Assignee: Robert Muir > Labels: Java7 > Attachments: LUCENE-3335.patch, LUCENE-3335_slow.patch, > patch-0uwe.patch > > > happens easily on java7: ant test -Dtestcase=TestPorterStemFilter > -Dtests.iter=100 > might happen on 1.6.0_u26 too, a user reported something that looks like the > same bug already: > http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072979#comment-13072979 ] Shay Banon commented on LUCENE-3282: Hi, sorry for the late response, I missed the comment. Yea, I agree that there will be false positives, but that's the idea of it (sometimes you want to run facets, for example, on "sub queries"). Btw, I got your point on advance; do you think that, if a collector exists, advance should be implemented by iterating over all docs up to the provided doc? Regarding the wrapper, interesting! I need to have a look at how to generalize it, but it should be simple, I think; I'll try and work on it. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
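As a rough illustration of the feature being proposed (a callback invoked for every matching child doc), here is a standalone sketch. ChildCollector and CountingChildCollector are hypothetical names, not the actual BlockJoinQuery API; the real patch would hook the callback into the block-join scorer, which is omitted here:

```java
// Hypothetical callback interface for child hits; in the real patch the
// block-join scorer would invoke this for every matching child doc.
interface ChildCollector {
    void collectChild(int childDoc);
}

// Example use: count matching children, e.g. as raw input to facet counts.
class CountingChildCollector implements ChildCollector {
    int count;

    @Override
    public void collectChild(int childDoc) {
        count++;
    }
}
```

The point of the design is that the caller owns the collector, so counts, facets, or any other per-child aggregation can be layered on without changing the query itself.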
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066536#comment-13066536 ] Shay Banon commented on LUCENE-3282: The idea of this is to collect matching child docs regardless of what matches parent-wise, and yea, we might miss some depending on the type of query that is actually "wrapping" it, but I think it's still useful. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch New version, with CollectorProvider. > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch, LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063619#comment-13063619 ] Shay Banon commented on LUCENE-3282: Heya, In my app, I have a wrapper around OBS that has a common interface allowing bits to be accessed by index (similar to Bits in trunk), so I need to extract the OBS from it. Regarding the Collector, I will work on a CollectorProvider interface. I liked the NoOpCollector option since then you don't have to check for nulls each time... > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-3282: --- Attachment: LUCENE-3282.patch > BlockJoinQuery: Allow to add a custom child collector, and customize the > parent bitset extraction > - > > Key: LUCENE-3282 > URL: https://issues.apache.org/jira/browse/LUCENE-3282 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 3.4, 4.0 >Reporter: Shay Banon > Attachments: LUCENE-3282.patch > > > It would be nice to allow to add a custom child collector to the > BlockJoinQuery to be called on every matching doc (so we can do things with > it, like counts and such). Also, allow to extend BlockJoinQuery to have a > custom code that converts the filter bitset to an OpenBitSet.
[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction - Key: LUCENE-3282 URL: https://issues.apache.org/jira/browse/LUCENE-3282 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 3.4, 4.0 Reporter: Shay Banon It would be nice to allow to add a custom child collector to the BlockJoinQuery to be called on every matching doc (so we can do things with it, like counts and such). Also, allow to extend BlockJoinQuery to have a custom code that converts the filter bitset to an OpenBitSet.
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006214#comment-13006214 ] Shay Banon commented on LUCENE-2960: Just a note regarding the IWC and being able to consult it for live changes: it feels strange to me that setting something on the config would affect the IW in real time. Maybe it's just me, but it feels nicer to have the "live" setters on IW compared to IWC. I also like the ability to decouple construction-time configuration (through IWC) from live settings (through setters on IW). It is then very clear what can be set at construction time and what can be set on a live IW. It also allows a compile-time / static check of what can be changed at which lifecycle phase. > Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter > -- > > Key: LUCENE-2960 > URL: https://issues.apache.org/jira/browse/LUCENE-2960 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shay Banon >Priority: Blocker > Fix For: 3.1, 4.0 > > > In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. > It would be great to be able to control that on a live IndexWriter. Other > possible two methods that would be great to bring back are > setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other > setters can actually be set on the MergePolicy itself, so no need for setters > for those (I think).
[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006049#comment-13006049 ] Shay Banon commented on LUCENE-2960: Heya, If I had to choose between being able to change things in real time and better concurrency thanks to immutability, I would definitely go with better concurrency. I have no problem with closing the writers and reopening them, though, as Mike said, this can come with a big cost. The funny thing is that a lot of the setters that were already there on the IndexWriter are still exposed, basically, through settings on the relevant MergePolicy, so I don't think we are talking about that many setters to begin with (I don't think we should bring those back to the IndexWriter). I think that the notion of IWC is a good one, and should remain, but only to provide construction-time parameters to IW. It should not be consulted once the construction phase of IW is done. If explicit real-time parameters are to be set, then IW should expose them as setters. Now, the question is which setters, if any, should be exposed. Going through the list of current setters on IW, my vote is for setRAMBufferSizeMB. I am not sure that it's that obscure a use case. I believe Solr, for example, has a notion of cores (or something like that), so it can also be adaptive in terms of indexing buffer size depending on the number of cores running in the VM. Also, one can easily run a system that does bulk indexing and then lowers the indexing buffer size for more "streamline" work. It's just a shame to close the writer for that (and having to pause all indexing work while this happens). The term interval and divisor, I agree, are so obscure (funnily, I use the divisor quite a lot) that closing the writer and opening it again makes sense.
> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter > -- > > Key: LUCENE-2960 > URL: https://issues.apache.org/jira/browse/LUCENE-2960 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shay Banon >Priority: Blocker > Fix For: 3.1, 4.0 > > > In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. > It would be great to be able to control that on a live IndexWriter. Other > possible two methods that would be great to bring back are > setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other > setters can actually be set on the MergePolicy itself, so no need for setters > for those (I think).
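The split argued for in this thread, construction-time config on an immutable object versus live settings as explicit setters, can be sketched like this. The class and field names are made up for illustration; they are not the actual IndexWriter/IndexWriterConfig API:

```java
// Construction-time settings live on an immutable config object...
final class WriterConfig {
    final int termIndexInterval;

    WriterConfig(int termIndexInterval) {
        this.termIndexInterval = termIndexInterval;
    }
}

// ...while live-tunable settings get explicit setters on the writer itself,
// so the lifecycle of each setting is checkable at compile time.
class Writer {
    private final WriterConfig config;        // fixed once the writer is open
    private volatile double ramBufferSizeMB;  // safe to change on a live writer

    Writer(WriterConfig config, double ramBufferSizeMB) {
        this.config = config;
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    void setRAMBufferSizeMB(double mb) {
        this.ramBufferSizeMB = mb;
    }

    double getRAMBufferSizeMB() {
        return ramBufferSizeMB;
    }
}
```

With this shape, trying to mutate `termIndexInterval` on an open writer simply does not compile, which is the "static check" benefit mentioned above.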
[jira] Created: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter -- Key: LUCENE-2960 URL: https://issues.apache.org/jira/browse/LUCENE-2960 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shay Banon Fix For: 3.2, 4.0 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. It would be great to be able to control that on a live IndexWriter. Other possible two methods that would be great to bring back are setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other setters can actually be set on the MergePolicy itself, so no need for setters for those (I think).
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2474: --- Attachment: MapBackedSet.java A MapBackedSet implementation that can wrap a CHM to get a concurrent set implementation. We can consider using that instead of a synchronized set with copy-on-read when notifying listeners. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon >Assignee: Michael McCandless > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2474.patch, > LUCENE-2574.patch, MapBackedSet.java > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
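A minimal version of such a wrapper might look like the following. This is a sketch only, not the attached MapBackedSet.java; note that since Java 6 the JDK ships the same idea as Collections.newSetFromMap:

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;

// A Set view over any Map: wrap a ConcurrentHashMap to get a concurrent set.
class MapBackedSet<E> extends AbstractSet<E> {
    private final Map<E, Boolean> map;

    MapBackedSet(Map<E, Boolean> map) {
        this.map = map;
    }

    @Override public int size() { return map.size(); }
    @Override public Iterator<E> iterator() { return map.keySet().iterator(); }
    @Override public boolean contains(Object o) { return map.containsKey(o); }
    @Override public boolean add(E e) { return map.put(e, Boolean.TRUE) == null; }
    @Override public boolean remove(Object o) { return map.remove(o) != null; }
}
```

The set inherits the concurrency properties of whatever Map it wraps, so backing it with a ConcurrentHashMap gives lock-free reads during listener notification.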
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984263#action_12984263 ] Shay Banon commented on LUCENE-2871: Agreed Earwin, let's first see if it makes sense; this is just an experiment and might not make sense for single-threaded writes. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984206#action_12984206 ] Shay Banon commented on LUCENE-2871: bq. Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF. It's a good question; I'm not sure what to do with it. Here is the problem: the channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it might actually make sense even with SimpleFS (i.e. using non-NIO reads but file channel writes). In order to support all of them, currently the simplest way is to put it in the base class so the code is shared. On IRC there was a discussion about externalizing the outputs and inputs so one can more easily pick and choose, but I think that belongs in a different patch. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2871: --- Attachment: LUCENE-2871.patch Fixed the buffer overflow exception (I hope; I can't really reproduce it, Mike can...). Also, per the IRC discussion, made SimpleFSDirectory default to not using the file channel output, while NIO and MMap default to using it. One can still control whether it is used via the setter method. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch, LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984134#action_12984134 ] Shay Banon commented on LUCENE-2871: Strange, I did not get it when running the tests; I will try to find out why it can happen. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982695#action_12982695 ] Shay Banon commented on LUCENE-2474: Yea, I got the reasoning for a Set; we can use that, a CHM with PRESENT. If you want, I can attach a simple MapBackedSet that turns any Map into a Set. Still, I think that using CopyOnWriteArrayList is best here. I don't think adding and removing listeners is something that is done often in an app (but I might be mistaken), and in that case traversal over the listeners is much better with CopyOnWriteArrayList compared to a CHM. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch, LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
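To make the trade-off concrete, a listener registry along these lines (ListenerRegistry is an illustrative name, not the patch's API) iterates over a lock-free snapshot while addIfAbsent preserves set semantics:

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Sketch of a listener registry: registration is rare, notification is frequent,
// so CopyOnWriteArrayList's cheap snapshot iteration fits well.
class ListenerRegistry<T> {
    private final CopyOnWriteArrayList<T> listeners = new CopyOnWriteArrayList<>();

    boolean add(T listener) {
        return listeners.addIfAbsent(listener); // keeps set-like behavior
    }

    boolean remove(T listener) {
        return listeners.remove(listener);
    }

    void notifyListeners(Consumer<T> action) {
        // iterates a snapshot: no global lock is held while listeners run
        for (T listener : listeners) {
            action.accept(listener);
        }
    }
}
```

Registration pays the copy cost, but notification (the hot path when readers close) never blocks or holds a lock, which is the argument being made in the comment.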
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982509#action_12982509 ] Shay Banon commented on LUCENE-2474: bq. OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader. I think we should use a CopyOnWriteArrayList so calling the listeners does not happen under a global synchronized block. If maintaining set behavior is required, I can patch in a ConcurrentHashSet implementation, or we can simply replace it with a CHM with PRESENT, or any other solution that does not require calling the listeners under a global sync block. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch, LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2871: --- Attachment: LUCENE-2871.patch Patch supporting writes through a file channel. FSDirectory still retains the ability to use RAF for writes. FSDirectory#setUseChannelOutput: allows reverting to RAF when set to false. FSDirectory#setCacheChannelBuffers: allows controlling whether buffers are cached when using the file channel. > Use FileChannel in FSDirectory > -- > > Key: LUCENE-2871 > URL: https://issues.apache.org/jira/browse/LUCENE-2871 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2871.patch > > > Explore using FileChannel in FSDirectory to see if it improves write > operations performance
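The general shape of a file-channel-based output with a reusable buffer might look like this. It is a standalone sketch, not the attached patch's actual code; ChannelOutput and its methods are made-up names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Buffered writes through a FileChannel; the direct ByteBuffer is reused
// across writes, which is the "cached buffers" idea in the patch description.
class ChannelOutput implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(8192);

    ChannelOutput(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    void writeBytes(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int chunk = Math.min(len, buffer.remaining());
            buffer.put(b, off, chunk);
            off += chunk;
            len -= chunk;
            if (!buffer.hasRemaining()) {
                flush();  // buffer full: drain it to the channel
            }
        }
    }

    void flush() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);  // write() may not drain the buffer in one call
        }
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        flush();
        channel.close();
    }
}
```

Whether this beats a plain RandomAccessFile for Lucene's mostly single-threaded, sequential segment writes is exactly the open question of the issue.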
[jira] Created: (LUCENE-2871) Use FileChannel in FSDirectory
Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Explore using FileChannel in FSDirectory to see if it improves write operations performance
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978975#action_12978975 ] Shay Banon commented on LUCENE-2474: bq. But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen). I am not sure that you would want to do that. Any properly written caching layer or external component would work on the low-level segment readers; it would not even compile against compound readers. This helps direct people to write proper code that deals only with segment readers. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978719#action_12978719 ] Shay Banon commented on LUCENE-2474: > It would be a cache of anything... one element of that cache would be the > FieldCache, there could be one for filters, or one entry per-filter. > edit: Maybe a better way to think about it is like a ServletContext or > something - it's just a way to attach anything arbitrary to a reader. Got you. My personal taste is to try and keep those readers as lightweight as possible, and have the proper constructs in place to allow external components to use them for caching, without having the readers manage it as well. > Not with this current patch, as there is no mechanism to get a callback when > you do care about deletes. If I want to cache something that depends on > deletions, I want to purge that cache when the actual reader is closed (as > opposed to the reader's core cache key that is shared amongst all readers > that just have different deletions). So if we go a "close event" route, we > really want two different events... one for the close of a reader (i.e. > deleted matter), and one for the close of the segment (deletes don't matter). I think that a cache that is affected by deletes is a problematic cache to begin with, so I was thinking that maybe it should be discouraged by not allowing for it. Especially with NRT. My idea was to simply expand the purge capability that the FC gets for free to other external custom components. Also, if we did have a type-safe separation between segment readers and compound readers, I would not have added the ability to register a listener on the compound readers, just the segment readers, as this will encourage people to write caches that only work on segment readers (since the registration for the "purge event" will happen within the cache, and it should work only with segment readers). 
That was why my patch does not take compound readers into account. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
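The design argued for in this thread — caches keyed on per-segment cache keys, with the purge registration happening inside the cache itself — can be sketched roughly as follows. This is a hypothetical stand-in for illustration, not Lucene's actual API: the SegmentReader interface, getCacheKey, and addPurgeListener names are invented here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a cache keyed on a per-segment cache key that
// registers a purge callback the first time it sees a segment, so eviction
// is eager rather than left to GC of a WeakHashMap entry.
public class SegmentCache<V> {

    // Minimal stand-in for a Lucene segment reader; only what the sketch needs.
    public interface SegmentReader {
        Object getCacheKey();                     // cf. IndexReader#getFieldCacheKey
        void addPurgeListener(Runnable onClose);  // hypothetical registration hook
    }

    private final Map<Object, V> cache = new ConcurrentHashMap<>();

    public V get(SegmentReader reader, Function<SegmentReader, V> loader) {
        Object key = reader.getCacheKey();
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(reader);
            if (cache.putIfAbsent(key, value) == null) {
                // Registration happens inside the cache, so it naturally
                // works only with segment readers, as argued above.
                reader.addPurgeListener(() -> cache.remove(key));
            }
        }
        return value;
    }

    public int size() {
        return cache.size();
    }
}
```

Because the cache only ever sees segment readers, a compound reader never needs to forward registrations; reopening an NRT reader reuses unchanged segments and therefore reuses their cache entries.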
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978480#action_12978480 ] Shay Banon commented on LUCENE-2474: Right, I was thinking that it's a low-level API that you can just add to the low-level readers, but I agree, it will be nicer to have it at the high level as well. Regarding the close method name, I guess we can name it similarly to the FieldCache one, maybe purge? > We've talked before about putting caches directly on the readers - that still > seems like the most straightforward approach? Not sure I understand that. Do you mean getting FieldCache into the readers? And then what about cached filters? And other custom caching constructs that rely on the same mechanism as the CachingWrapperFilter? I think that if one implements such caching, it's an advanced enough feature that you should know how to handle deletes and other tidbits (if you need to). > We really need one cache that doesn't care about deletions, and one cache > that does. Isn't that up to the cache to decide? That cache can be anything (internally implemented in Lucene or external) that follows the mechanism of caching based on (segment) readers. As long as there are constructs to get the deleted docs to handle deletes (for example), then the implementation can use them. > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. 
Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap
[ https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2292: --- Attachment: LUCENE-2292.patch A fixed patch that now passes all tests using the byte buffer directory. It also includes refactoring into a different package (store.bytebuffer) and a custom ByteBufferAllocator interface that controls how buffers are allocated, with plain and caching implementations. > ByteBuffer Directory - allowing to store the index outside the heap > --- > > Key: LUCENE-2292 > URL: https://issues.apache.org/jira/browse/LUCENE-2292 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Shay Banon > Attachments: LUCENE-2292.patch, LUCENE-2292.patch, LUCENE-2292.patch > > > A byte buffer based directory with the benefit of being able to create direct > byte buffer thus storing the index outside the JVM heap.
[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966376#action_12966376 ] Shay Banon commented on LUCENE-2779: If the assumption still stands that an IndexInput will not be opened on a "writing" / unclosed IndexOutput, then RAMFile can also be improved when it comes to concurrency. The RAMOutputStream can maintain its own list of buffers (a simple array list, no need to sync), and only when it gets closed initialize the respective RAMFile with the list. This means most of the synchronized aspects of RAMFile can be removed. Also, on RAMFile, lastModified can be made volatile, removing the sync on its accessor methods. > Use ConcurrentHashMap in RAMDirectory > - > > Key: LUCENE-2779 > URL: https://issues.apache.org/jira/browse/LUCENE-2779 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Minor > Fix For: 3.1, 4.0 > > Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, > LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java > > > RAMDirectory synchronizes on its instance in many places to protect access to > map of RAMFiles, in addition to updating the sizeInBytes member. In many > places the sync is done for 'read' purposes, while only in few places we need > 'write' access. This looks like a perfect use case for ConcurrentHashMap > Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong > ... > I'll post a patch shortly.
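The restructuring proposed in the comment above — the output stream buffering privately and publishing to the file in one step on close() — can be sketched like this. These are simplified stand-ins, not Lucene's actual RAMFile/RAMOutputStream, and they assume (as the comment does) that no input is ever opened on an unclosed output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the writer keeps a private, unsynchronized buffer list and hands
// it to the file in a single step on close(), so readers never observe a
// half-written state and RAMFile needs no synchronized blocks.
class RAMFile {
    volatile List<byte[]> buffers;   // published once, by close()
    volatile long length;
    volatile long lastModified;      // volatile instead of synchronized accessors
}

class RAMOutputStream {
    private final RAMFile file;
    private final List<byte[]> pending = new ArrayList<>(); // private: no sync needed
    private long length;

    RAMOutputStream(RAMFile file) {
        this.file = file;
    }

    void writeBytes(byte[] b, int len) {
        byte[] copy = new byte[len];
        System.arraycopy(b, 0, copy, 0, len);
        pending.add(copy);
        length += len;
    }

    void close() {
        // Single publication point: initialize the RAMFile with the list.
        file.length = length;
        file.lastModified = System.currentTimeMillis();
        file.buffers = pending; // the volatile write publishes everything above it
    }
}
```

The volatile write to buffers is the linchpin: under the Java memory model it publishes the buffer contents and length written before it, which is why the per-method synchronization becomes unnecessary.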
[jira] Commented: (LUCENE-2773) Don't create compound file for large segments by default
[ https://issues.apache.org/jira/browse/LUCENE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935299#action_12935299 ] Shay Banon commented on LUCENE-2773: Mike, are you sure regarding the default maxMergeMB set to 2 GB? This is a big change in default behavior. For systems that do updates (deletes) we are covered, because deletes are taken (partially) into account when computing the segment size. But let's say you have a 100 GB index: you will end up with 50 segments, no? > Don't create compound file for large segments by default > > > Key: LUCENE-2773 > URL: https://issues.apache.org/jira/browse/LUCENE-2773 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 2.9.4, 3.0.3, 3.1, 4.0 > > Attachments: LUCENE-2773.patch > > > Spinoff from LUCENE-2762. > CFS is useful for keeping the open file count down. But, it costs > some added time during indexing to build, and also ties up temporary > disk space, causing eg a large spike on the final merge of an > optimize. > Since MergePolicy dictates which segments should be CFS, we can > change it to only build CFS for "smallish" merges. > I think we should also set a maxMergeMB by default so that very large > merges aren't done.
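The arithmetic behind the 50-segment concern above is simple: a merge-size cap puts a floor on the number of segments a large index can shrink to. A back-of-the-envelope sketch (the 2 GB cap and 100 GB size are the figures from the comment):

```java
// If merges are capped at maxMergeMB ~ 2 GB, an index can never merge
// below roughly totalSize / cap segments, regardless of merge factor.
public class SegmentCountEstimate {
    static long minSegments(double indexSizeGb, double maxMergedSegmentGb) {
        return (long) Math.ceil(indexSizeGb / maxMergedSegmentGb);
    }

    public static void main(String[] args) {
        // 100 GB index with a 2 GB cap: at least 50 max-sized segments remain.
        System.out.println(minSegments(100.0, 2.0));
    }
}
```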
[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928859#action_12928859 ] Shay Banon commented on LUCENE-1536: Hi Mike, Wondering what your thoughts on fixing filters correctly are? I think that the initial idea of getting filters all the way down to the postings enumeration, if they support random access, is a great one. A random-access doc id set can be added (an interface), and if a filter returns it (which can be checked using instanceof), then that doc set can be passed all the way to the enumeration (and intersected per doc with the deleted docs). I think that any type of solution should support this great feature of Lucene queries: for example, FilteredQuery should use it, allowing complex query expressions to be built without the mentioned optimization being applied only at the top-level search. As most filter results do support random access, either because they use OpenBitSet or because they are built on top of FieldCache functionality, I think this feature will give great speed improvements to query execution time. > if a filter can support random access API, we should use it > --- > > Key: LUCENE-1536 > URL: https://issues.apache.org/jira/browse/LUCENE-1536 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 4.0 > > Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, > LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch > > > I ran some performance tests, comparing applying a filter via > random-access API instead of current trunk's iterator API. > This was inspired by LUCENE-1476, where we realized deletions should > really be implemented just like a filter, but then in testing found > that switching deletions to iterator was a very sizable performance > hit. 
> Some notes on the test: > * Index is first 2M docs of Wikipedia. Test machine is Mac OS X > 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. > * I test across multiple queries. 1-X means an OR query, eg 1-4 > means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 > AND 3 AND 4. "u s" means "united states" (phrase search). > * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, > 95, 98, 99, 99.9 (filter is non-null but all bits are set), > 100 (filter=null, control)). > * Method high means I use random-access filter API in > IndexSearcher's main loop. Method low means I use random-access > filter API down in SegmentTermDocs (just like deleted docs > today). > * Baseline (QPS) is current trunk, where filter is applied as iterator up > "high" (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
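The approach described in the comment above — detect a random-access doc id set via instanceof and test it per document down in the postings enumeration, next to the deleted-docs check — can be sketched as follows. RandomAccessDocIdSet and the loop are hypothetical stand-ins for illustration, not Lucene's actual classes.

```java
import java.util.BitSet;

// Hypothetical sketch of the random-access filter path: instead of
// leapfrogging a filter iterator against the query up high, test the
// filter's bits per document where deleted docs are already checked.
public class RandomAccessFilterSketch {

    // Interface a filter's doc id set could implement when its bits support
    // O(1) lookup (e.g. backed by OpenBitSet or FieldCache-derived data).
    public interface RandomAccessDocIdSet {
        boolean get(int docId);
    }

    // Stand-in for the low-level postings loop: count docs that survive
    // both the deleted-docs check and the per-doc filter test.
    public static int countHits(int[] postings, BitSet deletedDocs, Object filter) {
        RandomAccessDocIdSet bits =
            (filter instanceof RandomAccessDocIdSet) ? (RandomAccessDocIdSet) filter : null;
        int hits = 0;
        for (int doc : postings) {
            if (deletedDocs != null && deletedDocs.get(doc)) continue; // skip deletes
            if (bits != null && !bits.get(doc)) continue;              // random-access filter
            hits++;
        }
        return hits;
    }
}
```

A filter whose doc id set does not implement the marker interface would fall back to the existing iterator path (elided here); the instanceof check is what lets both coexist.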
[jira] Created: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
ArrayIndexOutOfBoundsException when iterating over TermDocs --- Key: LUCENE-2666 URL: https://issues.apache.org/jira/browse/LUCENE-2666 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2 Reporter: Shay Banon A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470) at TestMe.main(TestMe.java:56) It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del And as you can see, it smells like a corner case (it fails for document number 912, the AIOOB happens from the deleted docs). The code to recreate it is simple: FSDirectory dir = FSDirectory.open(new File("index")); IndexReader reader = IndexReader.open(dir, true); IndexReader[] subReaders = reader.getSequentialSubReaders(); for (IndexReader subReader : subReaders) { Field field = subReader.getClass().getSuperclass().getDeclaredField("si"); field.setAccessible(true); SegmentInfo si = (SegmentInfo) field.get(subReader); System.out.println("--> " + si); if (si.getDocStoreSegment().contains("_26t")) { // this is the problematic one... 
System.out.println("problematic one..."); FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER); } } Here is the result of a check index on that segment: 8 of 10: name=_26t docCount=914 compound=true hasProx=true numFiles=2 size (MB)=1.641 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_26t_1.del] test: open reader.OK [1 deleted docs] test: fields..OK [32 fields] test: field norms.OK [32 fields] test: terms, freq, prox...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: stored fields...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: term vectorsERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at 
org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) The creation of the index does not do anything fancy (all defaults), though there is usage of the near-real-time aspect (IndexWriter#getReader), which does complicate deleted docs handling. Seems like the deleted docs got written without matching the number of docs? Sadly, I don't have something that recreates it from scratch, but I do have the index if someone wants to have a look at it (mail me directly and I will provide a download link). I wil
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874454#action_12874454 ] Shay Banon commented on LUCENE-2161: Thanks! > Some concurrency improvements for NRT > - > > Key: LUCENE-2161 > URL: https://issues.apache.org/jira/browse/LUCENE-2161 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9.3, 3.0.2, 3.1, 4.0 > > Attachments: LUCENE-2161.patch > > > Some concurrency improvements for NRT > I found & fixed some silly thread bottlenecks that affect NRT: > * Multi/DirectoryReader.numDocs is synchronized, I think so only 1 > thread computes numDocs if it's -1. I removed this sync, and made > numDocs volatile, instead. Yes, multiple threads may compute the > numDocs for the first time, but I think that's harmless? > * Fixed BitVector's ctor to set count to 0 on creating a new BV, and > clone to copy the count over; this saves CPU computing the count > unecessarily. > * Also strengthened assertions done in SR, testing the delete docs > count. > I also found an annoying thread bottleneck that happens, due to CMS. > Whenever CMS hits the max running merges (default changed from 3 to 1 > recently), and the merge policy now wants to launch another merge, it > forces the incoming thread to wait until one of the BG threads > finishes. > This is a basic crude throttling mechanism -- you force the mutators > (whoever is causing new segments to appear) to stop, so that merging > can catch up. > Unfortunately, when stressing NRT, that thread is the one that's > opening a new NRT reader. > So, the first serious problem happens when you call .reopen() on your > NRT reader -- this call simply forwards to IW.getReader if the reader > was an NRT reader. But, because DirectoryReader.doReopen is > synchronized, this had the horrible effect of holding the monitor lock > on your main IR. 
In my test, this blocked all searches (since each > search uses incRef/decRef, still sync'd until LUCENE-2156, at least). > I fixed this by making doReopen only sync'd on this if it's not simply > forwarding to getWriter. So that's a good step forward. > This prevents searches from being blocked while trying to reopen to a > new NRT. > However... it doesn't fix the problem that when an immense merge is > off and running, opening an NRT reader could hit a tremendous delay > because CMS blocks it. The BalancedSegmentMergePolicy should help > here... by avoiding such immense merges. > But, I think we should also pursue an improvement to CMS. EG, if it > has 2 merges running, where one is huge and one is tiny, it ought to > increase thread priority of the tiny one. I think with such a change > we could increase the max thread count again, to prevent this > starvation. I'll open a separate issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT
[ https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873475#action_12873475 ] Shay Banon commented on LUCENE-2161: Mike, is there a reason why this is not backported to 3.0.2? > Some concurrency improvements for NRT > - > > Key: LUCENE-2161 > URL: https://issues.apache.org/jira/browse/LUCENE-2161 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.9.3, 4.0 > > Attachments: LUCENE-2161.patch > > > Some concurrency improvements for NRT > I found & fixed some silly thread bottlenecks that affect NRT: > * Multi/DirectoryReader.numDocs is synchronized, I think so only 1 > thread computes numDocs if it's -1. I removed this sync, and made > numDocs volatile, instead. Yes, multiple threads may compute the > numDocs for the first time, but I think that's harmless? > * Fixed BitVector's ctor to set count to 0 on creating a new BV, and > clone to copy the count over; this saves CPU computing the count > unecessarily. > * Also strengthened assertions done in SR, testing the delete docs > count. > I also found an annoying thread bottleneck that happens, due to CMS. > Whenever CMS hits the max running merges (default changed from 3 to 1 > recently), and the merge policy now wants to launch another merge, it > forces the incoming thread to wait until one of the BG threads > finishes. > This is a basic crude throttling mechanism -- you force the mutators > (whoever is causing new segments to appear) to stop, so that merging > can catch up. > Unfortunately, when stressing NRT, that thread is the one that's > opening a new NRT reader. > So, the first serious problem happens when you call .reopen() on your > NRT reader -- this call simply forwards to IW.getReader if the reader > was an NRT reader. 
But, because DirectoryReader.doReopen is > synchronized, this had the horrible effect of holding the monitor lock > on your main IR. In my test, this blocked all searches (since each > search uses incRef/decRef, still sync'd until LUCENE-2156, at least). > I fixed this by making doReopen only sync'd on this if it's not simply > forwarding to getWriter. So that's a good step forward. > This prevents searches from being blocked while trying to reopen to a > new NRT. > However... it doesn't fix the problem that when an immense merge is > off and running, opening an NRT reader could hit a tremendous delay > because CMS blocks it. The BalancedSegmentMergePolicy should help > here... by avoiding such immense merges. > But, I think we should also pursue an improvement to CMS. EG, if it > has 2 merges running, where one is huge and one is tiny, it ought to > increase thread priority of the tiny one. I think with such a change > we could increase the max thread count again, to prevent this > starvation. I'll open a separate issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869779#action_12869779 ] Shay Banon commented on LUCENE-2468: Hi Mike, First, I opened and attached a patch regarding the cache eviction listeners for IndexReader: https://issues.apache.org/jira/browse/LUCENE-2474, tell me what you think. Regarding your last comment, I agree. Though, trying to streamline its usage, in terms of having all built-in components and possible extensions work well with it, makes sense. That's what you suggest with the filtered doc set, which is cool. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Banon updated LUCENE-2474: --- Attachment: LUCENE-2474.patch First revision of the patch, tell me what you think... . > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey) > > > Key: LUCENE-2474 > URL: https://issues.apache.org/jira/browse/LUCENE-2474 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Shay Banon > Attachments: LUCENE-2474.patch > > > Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean > custom caches that use the IndexReader (getFieldCacheKey). > A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its > make a lot of sense to cache things based on IndexReader#getFieldCacheKey, > even Lucene itself uses it, for example, with the CachingWrapperFilter. > FieldCache enjoys being called explicitly to purge its cache when possible > (which is tricky to know from the "outside", especially when using NRT - > reader attack of the clones). > The provided patch allows to plug a CacheEvictionListener which will be > called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey, even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the "outside", especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869369#action_12869369 ] Shay Banon commented on LUCENE-2468: bq. So... why not do this in CachingWrapper/SpanFilter, but, instead of discarding the cache entry when deletions must be enforced, we dynamically apply the deletions? (I think we could use FilteredDocIdSet). Yea, that would work well. You will need to somehow still know when to enable or disable this based on the filter you use (it should basically only be enabled for filters that are passed to a constant score query...). bq. Really... we need a more generic solution here (but, it's a much bigger change), where somehow in creating the scorer per-segment we dynamically determine who/where the deletions are enforced. A Filter need not care about deletions if it's AND'd w/ a query that already enforces the deletions. Agreed. As I see it, caching based on IndexReader is key in Lucene, and with NRT it should feel the same way as it does without it. NRT should not change the way you build your system. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
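The FilteredDocIdSet idea discussed above — keep the cached filter entry and dynamically AND in the current reader's deletions instead of discarding it — might look roughly like this. This is a simplified stand-in with hypothetical types, playing the role Lucene's FilteredDocIdSet would, with isDeleted supplied by the (possibly reopened) reader.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Sketch: wrap a cached, deletion-unaware filter result and apply the
// current reader's deletions per document, so the cache entry survives
// reopens that only change deletes.
public class DeletionAwareDocIdSet {
    private final BitSet cachedBits;       // cached filter result (deletion-unaware)
    private final IntPredicate isDeleted;  // current reader's deleted docs

    public DeletionAwareDocIdSet(BitSet cachedBits, IntPredicate isDeleted) {
        this.cachedBits = cachedBits;
        this.isDeleted = isDeleted;
    }

    // cf. FilteredDocIdSet#match: accept only live docs the cache accepts.
    public boolean match(int docId) {
        return cachedBits.get(docId) && !isDeleted.test(docId);
    }
}
```

The per-doc deletion test costs a little at search time, but the expensive part (computing the filter bits) stays cached under the segment's core key across NRT reopens.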
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869253#action_12869253 ] Shay Banon commented on LUCENE-2468: bq. With the perf fix we are doing here, the problem (not correctly "seeing" deletes on a reopened reader) is isolated to CachingWrapperFilter/CachingSpanFilter, right? Yes, but this means that ConstantScoreQuery should basically not be cached when using NRT (even when using IndexReader as the key...), because of the excessive readers created. With the one that is deletion aware, you can cache it based on the cache key. bq. I think this would be a good change - it would make eviction immediate instead of just when GC gets around to pruning the WeakHashMap. Can you open a separate issue and maybe work out a patch? Sure, I will do it. > reopen on NRT reader should share readers w/ unchanged segments > --- > > Key: LUCENE-2468 > URL: https://issues.apache.org/jira/browse/LUCENE-2468 > Project: Lucene - Java > Issue Type: Bug >Reporter: Yonik Seeley >Assignee: Michael McCandless > Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, > LUCENE-2468.patch, LUCENE-2468.patch > > > A repoen on an NRT reader doesn't seem to share readers for those segments > that are unchanged. > http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868959#action_12868959 ]

Shay Banon commented on LUCENE-2468:

Another quick question, Mike: what do you think about the ability to know when a "cache key" is actually closed, so it can be removed from a cache? Similar in concept to the eviction done from the field cache in trunk by readers, but open, so that other Reader#cacheKey based caches (which are the simplest way to do caching in Lucene) can use it.
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shay Banon updated LUCENE-2468:
---
Attachment: DeletionAwareConstantScoreQuery.java

Here is a go at making ConstantScoreQuery deletion aware. I named it differently, but it can replace ConstantScoreQuery with a flag making it deletion aware. What do you think?
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868923#action_12868923 ]

Shay Banon commented on LUCENE-2468:

Ahh, now I see that, sorry I missed it. But basically, enforcing deletions means that we are back to the original problem... I think it would be quite confusing for users, to be honest. Of the filters, the problematic ones are those that can be converted to queries. From what I can see, FilteredQuery is ok, so maybe ConstantScoreQuery can be changed (if possible) to do the same...
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868869#action_12868869 ]

Shay Banon commented on LUCENE-2468:

Check two comments above :), we discussed it. Basically, it does not work with your change when a cached filter is used.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868816#action_12868816 ]

Shay Banon commented on LUCENE-2468:

Thanks for the work, Michael! Is this issue going to include the ConstantScoreQuery change, or should I open a different issue for it?
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868680#action_12868680 ]

Shay Banon commented on LUCENE-2468:

Agreed, seems like ConstantScoreQuery is the only problematic one...
[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shay Banon updated LUCENE-2468:
---
Attachment: CacheTest.java
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868659#action_12868659 ]

Shay Banon commented on LUCENE-2468:

I think that the suggested solution, to use the FieldCacheKey, is sadly not good enough. I am attaching a simple test showing that it does not work for cases where a query is passed to a searcher without a filter, but the query itself is, for example, a ConstantScoreQuery. I have simply taken the CachingWrapperFilter and changed it to use getFieldCacheKey instead of the IndexReader. This is problematic, since a filter can be used somewhere in the query tree and wrapped for caching. I am running against 3.0.1.
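The getFieldCacheKey approach being tested here — keying the cache on a per-segment core object that survives reopens, instead of on the IndexReader instance itself — can be illustrated with a generic sketch. The class and names are hypothetical stand-ins, not Lucene's API:

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: cache filter results per segment "core key" so that a reopened
// reader sharing the same underlying segment reuses the cached entry,
// while a brand-new segment gets a cache miss.
public class CoreKeyedCache<V> {

    // WeakHashMap: an entry disappears once its core key is GC'd,
    // i.e. once no reader references that segment anymore.
    private final Map<Object, V> cache = new WeakHashMap<>();

    public synchronized V get(Object coreKey) {
        return cache.get(coreKey);
    }

    public synchronized void put(Object coreKey, V value) {
        cache.put(coreKey, value);
    }
}
```

As the comment notes, the shortcoming is that the cached bits are shared across reopens even when new deletions have arrived, which is exactly what the attached test demonstrates.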
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868647#action_12868647 ]

Shay Banon commented on LUCENE-2468:

bq. Shay, as far as CachingWrapperFilter and CacheEvictionListener, it seems more powerful to just let apps create a new query type themselves? That's the nice part of lucene's openness to user query types - start with the code for CachingWrapperFilter and hook up your own caching logic.

Yea, but it would be great to know when an IndexReader has actually been closed, so caches can be eagerly cleaned. Even if one writes a custom implementation, it would benefit from it.
[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments
[ https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868617#action_12868617 ]

Shay Banon commented on LUCENE-2468:

Sounds like a good solution to me. I just noticed that in trunk there is also an explicit purge from the FieldCache when possible. I think it would be great to enable this for other caches that are keyed the same way (like CachingWrapperFilter, but externally written ones as well). I was thinking of an expert API allowing one to add a "CacheEvictionListener" or something similar, which would be called when this happens. What do you think?
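The proposed expert API could look roughly like this sketch. The CacheEvictionListener interface and the notifier class are hypothetical, reflecting the proposal in the comment rather than any existing Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed expert API: external caches register a listener
// and are notified eagerly when a reader's cache key is closed, instead
// of waiting for GC to prune a WeakHashMap.
interface CacheEvictionListener {
    void onEviction(Object cacheKey);
}

public class EvictionNotifier {

    private final List<CacheEvictionListener> listeners = new ArrayList<>();

    public void addListener(CacheEvictionListener listener) {
        listeners.add(listener);
    }

    // Would be invoked by the reader when it actually closes.
    public void notifyClosed(Object cacheKey) {
        for (CacheEvictionListener listener : listeners) {
            listener.onEviction(cacheKey);
        }
    }
}
```

An external cache could then register a listener that simply removes the entry for the closed cache key, making eviction immediate instead of GC-dependent.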
[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864157#action_12864157 ]

Shay Banon commented on LUCENE-2387:

Thanks!

> IndexWriter retains references to Readers used in Fields (memory leak)
> -----------------------------------------------------------------------
>
> Key: LUCENE-2387
> URL: https://issues.apache.org/jira/browse/LUCENE-2387
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 3.0.1
> Reporter: Ruben Laguna
> Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2387-29x.patch, LUCENE-2387.patch
>
> As described in [1], IndexWriter retains references to Readers used in Fields, and that can lead to big memory leaks when using tika's ParsingReaders (as those can take 1MB per ParsingReader).
> [2] shows a screenshot of the reference chain to the Reader from the IndexWriter, taken with Eclipse MAT (Memory Analysis Tool). The chain is the following:
> IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)
>
> [1] http://markmail.org/thread/ndmcgffg2mnwjo47
> [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
[ https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864044#action_12864044 ]

Shay Banon commented on LUCENE-2387:

Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this into the next Lucene version as soon as possible.
[jira] Commented: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter
[ https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864042#action_12864042 ]

Shay Banon commented on LUCENE-2283:

Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be really helpful to get this into the next Lucene version as soon as possible.

> Possible Memory Leak in StoredFieldsWriter
> ------------------------------------------
>
> Key: LUCENE-2283
> URL: https://issues.apache.org/jira/browse/LUCENE-2283
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: Tim Smith
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2283.patch, LUCENE-2283.patch, LUCENE-2283.patch
>
> StoredFieldsWriter creates a pool of PerDoc instances. This pool will grow but never be reclaimed by any mechanism. Furthermore, each PerDoc instance contains a RAMFile. This RAMFile will also never be truncated (and will only ever grow), as far as I can tell.
> When feeding documents with a large number of stored fields (or one large dominating stored field), this can result in memory being consumed in the RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very large, even if large documents are rare.
> Seems like there should be some attempt to reclaim memory from the PerDoc[] instance pool (or otherwise limit the size of cached RAMFiles), etc.