[jira] [Created] (LUCENE-5793) Add equals/hashCode to FieldType

2014-06-30 Thread Shay Banon (JIRA)
Shay Banon created LUCENE-5793:
--

 Summary: Add equals/hashCode to FieldType
 Key: LUCENE-5793
 URL: https://issues.apache.org/jira/browse/LUCENE-5793
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Shay Banon


It would be nice to add equals and hashCode to FieldType, so one can easily 
check whether two instances are the same and, for example, reuse existing 
default implementations.
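A hedged sketch of what the requested methods could look like, using a hypothetical simplified stand-in for FieldType (the real class has many more options, all of which would need to participate):

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for Lucene's FieldType, illustrating
// the requested equals/hashCode contract: two instances configured the same
// way compare equal and hash alike, so defaults can be compared and reused.
class SimpleFieldType {
    private final boolean stored;
    private final boolean tokenized;

    SimpleFieldType(boolean stored, boolean tokenized) {
        this.stored = stored;
        this.tokenized = tokenized;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleFieldType)) return false;
        SimpleFieldType other = (SimpleFieldType) o;
        return stored == other.stored && tokenized == other.tokenized;
    }

    @Override
    public int hashCode() {
        return Objects.hash(stored, tokenized);
    }
}
```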



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5634) Reuse TokenStream instances in Field

2014-04-30 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986177#comment-13986177
 ] 

Shay Banon commented on LUCENE-5634:


This optimization has proven to help a lot in the context of ES, where we can 
use a static thread local since we are fully in control of the threading model. 
With Lucene itself, which can be used in many different environments, this can 
cause some unexpected behavior. For example, it might cause Tomcat to warn 
about leaked resources when unloading a war.
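The pattern under discussion, and why containers complain about it, can be sketched generically; the names below are illustrative, not Lucene's actual internals. Because the per-thread entry lives on a pooled thread, a container such as Tomcat flags it as a leak when a webapp that loaded the class is undeployed:

```java
// Illustrative sketch of per-thread instance reuse via a static ThreadLocal.
// The reused object stays attached to the (pooled) thread across requests,
// which is exactly what servlet containers warn about on webapp unload.
final class ReusableBuffers {
    private static final ThreadLocal<StringBuilder> BUFFER =
        ThreadLocal.withInitial(StringBuilder::new);

    static StringBuilder acquire() {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // reset state before handing it out again
        return sb;
    }
}
```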

 Reuse TokenStream instances in Field
 

 Key: LUCENE-5634
 URL: https://issues.apache.org/jira/browse/LUCENE-5634
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5634.patch


 If you don't reuse your Doc/Field instances (which is very expert: I
 suspect few apps do) then there's a lot of garbage created to index each
 StringField because we make a new StringTokenStream or
 NumericTokenStream (and their Attributes).
 We should be able to re-use these instances via a static
 ThreadLocal...






[jira] [Commented] (LUCENE-5516) Forward information that trigger a merge to MergeScheduler

2014-03-11 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930457#comment-13930457
 ] 

Shay Banon commented on LUCENE-5516:


+1, this looks great! Exactly the info we would love to have to better control 
merges.

 Forward information that trigger a merge to MergeScheduler
 --

 Key: LUCENE-5516
 URL: https://issues.apache.org/jira/browse/LUCENE-5516
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.7
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5516.patch, LUCENE-5516.patch


 Today we pass information about the merge trigger to the merge policy. Yet, 
 no matter whether the MP finds a merge or not, we call the MergeScheduler, 
 which runs and blocks even if we didn't find a merge. In some cases we don't 
 even want this to happen, but inside the MergeScheduler we have no way to opt 
 out since we don't know what triggered the merge. We should forward the info 
 we have to the MergeScheduler as well.
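A sketch of the shape this could take; the enum values and method signature here are assumptions for illustration, not the actual Lucene API:

```java
// Hypothetical sketch: forward the reason a merge was triggered to the
// scheduler, so it can opt out cheaply when no merge was actually found.
enum MergeTrigger { SEGMENT_FLUSH, FULL_FLUSH, EXPLICIT, CLOSING }

interface TriggerAwareMergeScheduler {
    void merge(MergeTrigger trigger, boolean newMergesFound);
}

// Example scheduler that uses the extra information to return early
// instead of running and blocking with nothing to do.
class CountingScheduler implements TriggerAwareMergeScheduler {
    int merges = 0;

    public void merge(MergeTrigger trigger, boolean newMergesFound) {
        if (!newMergesFound) {
            return; // opt out: we now know there is nothing to merge
        }
        merges++;
    }
}
```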






[jira] [Commented] (LUCENE-5373) Lucene42DocValuesProducer.ramBytesUsed is over-estimated

2013-12-19 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853098#comment-13853098
 ] 

Shay Banon commented on LUCENE-5373:


As someone who found this issue: on top of the wrong computation, it's also very 
expensive. This call should be lightweight and hopefully not use sizeOf at 
all. At the very least, if possible, the result of it should be cached. 
Maybe even introduce size caching at a higher level (calling code) if possible.
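One way to address both complaints (cost and repetition) is to compute the estimate once and cache it; a minimal sketch, not the actual Lucene fix, with the expensive reflective walk replaced by a stand-in method:

```java
// Sketch: compute an expensive memory estimate lazily, once, and serve
// the cached value afterwards. computeExpensiveEstimate() stands in for
// something like RamUsageEstimator.sizeOf(...); names are illustrative.
class CachedRamEstimate {
    private volatile long cachedBytes = -1; // -1 means "not computed yet"
    int computations = 0;                   // exposed for the example only

    long ramBytesUsed() {
        long v = cachedBytes;
        if (v < 0) {
            v = computeExpensiveEstimate();
            cachedBytes = v; // benign race: recomputing yields the same value
        }
        return v;
    }

    private long computeExpensiveEstimate() {
        computations++;
        return 1024; // stand-in for a costly object-graph traversal
    }
}
```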

 Lucene42DocValuesProducer.ramBytesUsed is over-estimated
 

 Key: LUCENE-5373
 URL: https://issues.apache.org/jira/browse/LUCENE-5373
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Priority: Minor

 Lucene42DocValuesProducer.ramBytesUsed uses 
 {{RamUsageEstimator.sizeOf(this)}} to return an estimation of the memory 
 usage. One of the issues (there might be other ones) is that this class has a 
 reference to an IndexInput that might link to other data-structures that we 
 wouldn't want to take into account. For example, index inputs of a 
 {{RAMDirectory}} all point to the directory itself, so 
 {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of memory 
 used by the whole directory.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer

2013-07-04 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699892#comment-13699892
 ] 

Shay Banon commented on LUCENE-5086:


The Java version on the Mac is the latest one:

java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)

Regarding the catch, I think Throwable is the right exception to catch here. 
Catch all, who cares: you don't want a bug in the JVM that throws an unexpected 
runtime exception to cause Lucene to break the app completely because it's in a 
static block, and I have been right there a few times. But if you feel 
differently, go ahead and change it to explicitly catch what's needed.

 RamUsageEstimator causes AWT classes to be loaded by calling 
 ManagementFactory#getPlatformMBeanServer
 -

 Key: LUCENE-5086
 URL: https://issues.apache.org/jira/browse/LUCENE-5086
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Shay Banon
Assignee: Dawid Weiss

 Yea, that type of day and that type of title :).
 Since the last update of Java 6 on OS X, I started to see an annoying icon 
 pop up in the dock whenever running elasticsearch. By default, all of our 
 scripts add the headless AWT flag so people will probably not encounter it, 
 but it was strange that I saw it now when before I didn't.
 I started to dig around, and saw that when RamUsageEstimator was being 
 loaded, it was causing AWT classes to be loaded. Further investigation showed 
 that, for some reason, calling 
 ManagementFactory#getPlatformMBeanServer with the new Java version now causes 
 AWT classes to be loaded (at least on the Mac; I haven't tested other 
 platforms yet). 
 There are several ways to try to solve it, for example by identifying the 
 bug in the JVM itself, but I think there should be a fix for it in 
 Lucene itself, specifically since there is no need to call 
 #getPlatformMBeanServer to get the hotspot diagnostics one (it's a heavy 
 call...).
 Here is a simple method that returns the hotspot MXBean without using 
 #getPlatformMBeanServer, avoiding loading 
 all those nasty AWT classes:
 {code}
 Object getHotSpotMXBean() {
     try {
         // Java 6
         Class sunMF = Class.forName("sun.management.ManagementFactory");
         return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
     } catch (Throwable t) {
         // ignore
     }
     // potentially Java 7
     try {
         return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
             .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
     } catch (Throwable t) {
         // ignore
     }
     return null;
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer

2013-07-01 Thread Shay Banon (JIRA)
Shay Banon created LUCENE-5086:
--

 Summary: RamUsageEstimator causes AWT classes to be loaded by 
calling ManagementFactory#getPlatformMBeanServer
 Key: LUCENE-5086
 URL: https://issues.apache.org/jira/browse/LUCENE-5086
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Shay Banon


Yea, that type of day and that type of title :).

Since the last update of Java 6 on OS X, I started to see an annoying icon pop 
up in the dock whenever running elasticsearch. By default, all of our scripts 
add the headless AWT flag so people will probably not encounter it, but it was 
strange that I saw it now when before I didn't.

I started to dig around, and saw that when RamUsageEstimator was being loaded, 
it was causing AWT classes to be loaded. Further investigation showed that, 
for some reason, calling ManagementFactory#getPlatformMBeanServer with the 
new Java version now causes AWT classes to be loaded (at least on the Mac; I 
haven't tested other platforms yet). 

There are several ways to try to solve it, for example by identifying the bug 
in the JVM itself, but I think there should be a fix for it in Lucene itself, 
specifically since there is no need to call #getPlatformMBeanServer to get the 
hotspot diagnostics one (it's a heavy call...).

Here is a simple method that returns the hotspot MXBean without using 
#getPlatformMBeanServer, avoiding loading all those nasty AWT classes:

[code]
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
            .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
[/code]




[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer

2013-07-01 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-5086:
---

Description: 
Yea, that type of day and that type of title :).

Since the last update of Java 6 on OS X, I started to see an annoying icon pop 
up in the dock whenever running elasticsearch. By default, all of our scripts 
add the headless AWT flag so people will probably not encounter it, but it was 
strange that I saw it now when before I didn't.

I started to dig around, and saw that when RamUsageEstimator was being loaded, 
it was causing AWT classes to be loaded. Further investigation showed that, 
for some reason, calling ManagementFactory#getPlatformMBeanServer with the 
new Java version now causes AWT classes to be loaded (at least on the Mac; I 
haven't tested other platforms yet). 

There are several ways to try to solve it, for example by identifying the bug 
in the JVM itself, but I think there should be a fix for it in Lucene itself, 
specifically since there is no need to call #getPlatformMBeanServer to get the 
hotspot diagnostics one (it's a heavy call...).

Here is a simple method that returns the hotspot MXBean without using 
#getPlatformMBeanServer, avoiding loading all those nasty AWT classes:

{code}
Object getHotSpotMXBean() {
    try {
        // Java 6
        Class sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
            .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}

  was:
Yea, that type of day and that type of title :).

Since the last update of Java 6 on OS X, I started to see an annoying icon pop 
up in the dock whenever running elasticsearch. By default, all of our scripts 
add the headless AWT flag so people will probably not encounter it, but it was 
strange that I saw it now when before I didn't.

I started to dig around, and saw that when RamUsageEstimator was being loaded, 
it was causing AWT classes to be loaded. Further investigation showed that, 
for some reason, calling ManagementFactory#getPlatformMBeanServer with the 
new Java version now causes AWT classes to be loaded (at least on the Mac; I 
haven't tested other platforms yet). 

There are several ways to try to solve it, for example by identifying the bug 
in the JVM itself, but I think there should be a fix for it in Lucene itself, 
specifically since there is no need to call #getPlatformMBeanServer to get the 
hotspot diagnostics one (it's a heavy call...).

Here is a simple method that returns the hotspot MXBean without using 
#getPlatformMBeanServer, avoiding loading all those nasty AWT classes:

{code}
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
            .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}


 RamUsageEstimator causes AWT classes to be loaded by calling 
 ManagementFactory#getPlatformMBeanServer
 -

 Key: LUCENE-5086
 URL: https://issues.apache.org/jira/browse/LUCENE-5086
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Shay Banon

 Yea, that type of day and that type of title :).
 Since the last update of Java 6 on OS X, I started to see an annoying icon 
 pop up in the dock whenever running elasticsearch. By default, all of our 
 scripts add the headless AWT flag so people will probably not encounter it, 
 but it was strange that I saw it now when before I didn't.
 I started to dig around, and saw that when RamUsageEstimator was being 
 loaded, it was causing AWT classes to be loaded. Further investigation showed 
 that, for some reason, calling 
 ManagementFactory#getPlatformMBeanServer with the new Java version now causes 
 AWT classes to be loaded (at least on the Mac; I haven't tested other 
 platforms yet). 
 There are several ways to try to solve it, for example by identifying the 
 bug in the JVM itself, but I 

[jira] [Updated] (LUCENE-5086) RamUsageEstimator causes AWT classes to be loaded by calling ManagementFactory#getPlatformMBeanServer

2013-07-01 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-5086:
---

Description: 
Yea, that type of day and that type of title :).

Since the last update of Java 6 on OS X, I started to see an annoying icon pop 
up in the dock whenever running elasticsearch. By default, all of our scripts 
add the headless AWT flag so people will probably not encounter it, but it was 
strange that I saw it now when before I didn't.

I started to dig around, and saw that when RamUsageEstimator was being loaded, 
it was causing AWT classes to be loaded. Further investigation showed that, 
for some reason, calling ManagementFactory#getPlatformMBeanServer with the 
new Java version now causes AWT classes to be loaded (at least on the Mac; I 
haven't tested other platforms yet). 

There are several ways to try to solve it, for example by identifying the bug 
in the JVM itself, but I think there should be a fix for it in Lucene itself, 
specifically since there is no need to call #getPlatformMBeanServer to get the 
hotspot diagnostics one (it's a heavy call...).

Here is a simple method that returns the hotspot MXBean without using 
#getPlatformMBeanServer, avoiding loading all those nasty AWT classes:

{code}
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
            .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
{code}

  was:
Yea, that type of day and that type of title :).

Since the last update of Java 6 on OS X, I started to see an annoying icon pop 
up in the dock whenever running elasticsearch. By default, all of our scripts 
add the headless AWT flag so people will probably not encounter it, but it was 
strange that I saw it now when before I didn't.

I started to dig around, and saw that when RamUsageEstimator was being loaded, 
it was causing AWT classes to be loaded. Further investigation showed that, 
for some reason, calling ManagementFactory#getPlatformMBeanServer with the 
new Java version now causes AWT classes to be loaded (at least on the Mac; I 
haven't tested other platforms yet). 

There are several ways to try to solve it, for example by identifying the bug 
in the JVM itself, but I think there should be a fix for it in Lucene itself, 
specifically since there is no need to call #getPlatformMBeanServer to get the 
hotspot diagnostics one (it's a heavy call...).

Here is a simple method that returns the hotspot MXBean without using 
#getPlatformMBeanServer, avoiding loading all those nasty AWT classes:

[code]
Object getHotSpotMXBean() {
    Object hotSpotBean = null;
    try {
        // Java 6
        Class sunMF = Class.forName("sun.management.ManagementFactory");
        return sunMF.getMethod("getDiagnosticMXBean").invoke(null);
    } catch (Throwable t) {
        // ignore
    }
    // potentially Java 7
    try {
        return ManagementFactory.class.getMethod("getPlatformMXBean", Class.class)
            .invoke(null, Class.forName("com.sun.management.HotSpotDiagnosticMXBean"));
    } catch (Throwable t) {
        // ignore
    }
    return null;
}
[/code]


 RamUsageEstimator causes AWT classes to be loaded by calling 
 ManagementFactory#getPlatformMBeanServer
 -

 Key: LUCENE-5086
 URL: https://issues.apache.org/jira/browse/LUCENE-5086
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Shay Banon

 Yea, that type of day and that type of title :).
 Since the last update of Java 6 on OS X, I started to see an annoying icon 
 pop up in the dock whenever running elasticsearch. By default, all of our 
 scripts add the headless AWT flag so people will probably not encounter it, 
 but it was strange that I saw it now when before I didn't.
 I started to dig around, and saw that when RamUsageEstimator was being 
 loaded, it was causing AWT classes to be loaded. Further investigation showed 
 that, for some reason, calling 
 ManagementFactory#getPlatformMBeanServer with the new Java version now causes 
 AWT classes to be loaded (at least on the Mac; I haven't tested other 
 platforms yet). 
 There are several ways to try to solve it, for example by identifying the 

[jira] [Commented] (LUCENE-4472) Add setting that prevents merging on updateDocument

2012-10-10 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473617#comment-13473617
 ] 

Shay Banon commented on LUCENE-4472:


Agree with Robert on the additional context flag; that would make things most 
flexible. A flag on IW makes things simpler from the user's perspective, 
though, because then there is no need to customize the built-in merge policies.

 Add setting that prevents merging on updateDocument
 ---

 Key: LUCENE-4472
 URL: https://issues.apache.org/jira/browse/LUCENE-4472
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4472.patch


 Currently we always call maybeMerge if a segment was flushed after 
 updateDocument. Some apps, and in particular ElasticSearch, use some hacky 
 workarounds to disable that, e.g. for merge throttling. It should be easier 
 to enable this kind of behavior. 




[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control

2011-09-09 Thread Shay Banon (JIRA)
NRT Caching Dir to allow for exact memory usage, better buffer allocation and 
global cross indices control


 Key: LUCENE-3425
 URL: https://issues.apache.org/jira/browse/LUCENE-3425
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Shay Banon


A discussion on IRC raised several improvements that can be made to the NRT 
caching dir. Some of the problems it currently has are:

1. It does not explicitly control memory usage, which can result in overusing 
memory (for example, large new segments being committed because refreshing is 
too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to 
old gen) byte buffers.
3. Not being able to control memory usage across indices for multi-index 
usage within a single JVM.

A suggested solution (which still needs to be ironed out) is to have a 
BufferAllocator that controls allocation of byte[] and allows returning unused 
byte[] to it. It will have a cap on the amount of memory it allows to be 
allocated.

The NRT caching dir will use the allocator, which can either be provided (for 
use across several indices) or created internally. The caching dir will also 
create a wrapped IndexOutput that will flush to the main dir if the allocator 
can no longer provide byte[] (exhausted).

When a file is flushed from the cache to the main directory, it will return 
all its currently allocated byte[] to the BufferAllocator to be reused by 
other files.
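The suggested scheme could be sketched as follows; BufferAllocator and its method names are hypothetical, taken from the description above rather than any committed API:

```java
import java.util.ArrayDeque;

// Hypothetical BufferAllocator as described above: hands out fixed-size
// byte[] blocks up to a global cap, and takes unused blocks back for reuse,
// so memory use is bounded and buffer churn (heap fragmentation) is reduced.
class BufferAllocator {
    private final int blockSize;
    private final int maxBlocks;
    private int liveBlocks = 0;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    BufferAllocator(int blockSize, int maxBlocks) {
        this.blockSize = blockSize;
        this.maxBlocks = maxBlocks;
    }

    // Returns a buffer, or null when the cap is exhausted, signalling the
    // caller (e.g. a caching directory) to flush to the main directory.
    synchronized byte[] allocate() {
        byte[] b = free.poll();
        if (b != null) return b;
        if (liveBlocks >= maxBlocks) return null;
        liveBlocks++;
        return new byte[blockSize];
    }

    // Return a no-longer-needed buffer so other files can reuse it.
    synchronized void release(byte[] buffer) {
        free.push(buffer);
    }
}
```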


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099146#comment-13099146
 ] 

Shay Banon commented on LUCENE-3416:


The only reason it's synchronized is that the setMaxMergeWriteMBPerSec method 
is synchronized (I guess to protect against setting the rate limit 
concurrently). In practice, I don't see users changing it that often, so 
concerns about cache lines are not really relevant.

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099160#comment-13099160
 ] 

Shay Banon commented on LUCENE-3416:


 this makes no sense to me. If you don't want to set this concurrently, how 
 does a lock protect you from this? I mean, if you have two threads accessing 
 this you have either A B or B A, but this would happen without a lock too. If 
 you want the changes to take effect immediately you need to either 
 lock on each read of this var or make it volatile, which is almost equivalent 
 (a mem barrier).

No, that's not correct. setMaxMergeWriteMBPerSec (not the method I added, the 
other one) is a complex method, and I think Mike wanted to protect against two 
threads setting the value concurrently. As for reading the value, I think 
Mike's logic was that it's not important enough to have immediate visibility of 
the change to require a volatile field (which is understandable). So, since 
setMaxMergeWriteMBPerSec is synchronized, the method added in this patch has to 
be as well.

 My concern here was related to making this var volatile, which would be a 
 cacheline invalidation each time you read the var. I think we should get rid 
 of the synchronized.

Reading a volatile var on x86 is not a cache invalidation, though it does come 
with a cost. It's not relevant here based on what I explained before (and 
second-guessing Mike :) ).

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099233#comment-13099233
 ] 

Shay Banon commented on LUCENE-3416:


I agree with Mike, I think it should remain synchronized; it does safeguard 
concurrent calls to setMaxMergeWriteMBPerSec from falling over each other 
(which caller wins is not really relevant). Since that's synchronized, the 
method I added should be as well. Personally, I really don't think there is a 
need to make it thread safe without blocking, since calling the setters is not 
something people do frequently at all, so the optimization is moot, and it 
would complicate the code.

As for making mergeWriteRateLimiter volatile, it can be done. Though, in 
practice, there really is no need to do it (there is a memory barrier when 
reading it before). But I think that should go in a different issue, just to 
keep changes clean and isolated.
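The trade-off being debated can be sketched generically: a synchronized setter that serializes writers, paired with a plain (non-volatile) read whose visibility is deliberately best-effort. The class and field names here are illustrative, not Lucene's actual code:

```java
// Generic sketch of the pattern discussed above: writes are serialized via
// synchronized, while reads are plain (no volatile), so a changed rate may
// become visible to readers only eventually. That is acceptable when the
// setter is called rarely, as argued in the comments.
class RateHolder {
    private double mbPerSec; // intentionally not volatile: cheap reads

    synchronized void setMaxWriteMBPerSec(double v) {
        // a real implementation might also rebuild a rate limiter here,
        // which is why serializing concurrent setters matters
        this.mbPerSec = v;
    }

    double maxWriteMBPerSec() {
        return mbPerSec; // best-effort visibility
    }
}
```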


 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099287#comment-13099287
 ] 

Shay Banon commented on LUCENE-3416:


I must say that I am at a loss trying to understand why we need this 
optimization, but it does not really matter to me as long as the ability to 
set the rate limiter instance gets in.

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Created] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit 
merge IO across several directories / instances
--

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon


This can come in handy when running several Lucene indices in the same VM, and 
wishing to rate limit merge across all of them.




[jira] [Updated] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-3416:
---

Attachment: LUCENE-3416.patch

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-06 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098018#comment-13098018
 ] 

Shay Banon commented on LUCENE-3416:


It is possible, but it requires more work and depends on overriding the createOutput 
method (as well as all the other methods in Directory). If rate limiting makes sense 
as a feature exposed at the directory level, I think this small change allows 
greater control over it.
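The alternative being dismissed, a delegating wrapper that overrides createOutput, could look roughly like this. `Directory` and `IndexOutput` here are tiny single-method stand-ins for the Lucene types, just to show the shape of the approach:

```java
// Minimal stand-ins for the Lucene types (assumption: real ones have many more methods,
// which is exactly the extra work the comment refers to).
interface IndexOutput { void writeBytes(byte[] b, int len); }
interface Directory { IndexOutput createOutput(String name); }

public class RateLimitedDirectory implements Directory {
    private final Directory delegate;
    private final java.util.function.LongConsumer pause; // e.g. a shared rate limiter

    RateLimitedDirectory(Directory delegate, java.util.function.LongConsumer pause) {
        this.delegate = delegate;
        this.pause = pause;
    }

    @Override
    public IndexOutput createOutput(String name) {
        IndexOutput out = delegate.createOutput(name);
        // Throttle before forwarding each write to the wrapped output.
        return (b, len) -> { pause.accept(len); out.writeBytes(b, len); };
    }

    public static void main(String[] args) {
        long[] throttled = {0};
        Directory ram = name -> (b, len) -> { };  // no-op backing directory
        Directory limited = new RateLimitedDirectory(ram, n -> throttled[0] += n);
        limited.createOutput("seg.tmp").writeBytes(new byte[8], 8);
        System.out.println("bytes seen by limiter: " + throttled[0]); // 8
    }
}
```

The sketch shows why the wrapper route is more work: every Directory method, not just createOutput, would need delegation, whereas the patch puts the limiter inside FSDirectory itself.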

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
 Attachments: LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.




[jira] [Commented] (LUCENE-3335) jrebug causes porter stemmer to sigsegv

2011-08-01 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076053#comment-13076053
 ] 

Shay Banon commented on LUCENE-3335:


@Uwe I actually forgot about this, and did not think it was because of the 
porter stemmer at the time, especially since I did try to reproduce it and 
never managed to (I thought it was coincidence it crashed there). From my 
experience, you get very little help from Sun/Oracle when using unorthodox 
flags like aggressive opts without a proper reproduction. Well, you get very little 
help there even when you do provide a reproduction... (see this issue that I opened 
for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129) . I am 
the reason behind the Lucene 1.9.1 release with the major buffering bug 
introduced in 1.9 way back in the day; do you really think I would not have made 
contact if I thought there really was a problem associated with Lucene?

 jrebug causes porter stemmer to sigsegv
 ---

 Key: LUCENE-3335
 URL: https://issues.apache.org/jira/browse/LUCENE-3335
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 
 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, 
 3.3, 3.4, 4.0
 Environment: - JDK 7 Preview Release, GA (may also affect update _1, 
 targeted fix is JDK 1.7.0_2)
 - JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts
Reporter: Robert Muir
Assignee: Robert Muir
  Labels: Java7
 Attachments: LUCENE-3335.patch, LUCENE-3335_slow.patch, 
 patch-0uwe.patch


 happens easily on java7: ant test -Dtestcase=TestPorterStemFilter 
 -Dtests.iter=100
 might happen on 1.6.0_u26 too, a user reported something that looks like the 
 same bug already:
 http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm




[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-29 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072979#comment-13072979
 ] 

Shay Banon commented on LUCENE-3282:


Hi, sorry for the late response, I missed the comment.

Yea, I agree that there will be false positives, but that's the idea of it 
(sometimes you want to run facets, for example, on sub queries). Btw, I got 
your point on advance; do you think that, if a collector exists, advance should 
be implemented by iterating over all docs up to the provided target doc?

Regarding the wrapper, interesting! I need to have a look at how to generalize 
it, but it should be simple, I think. I'll try and work on it.
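The advance-by-iteration idea discussed above can be sketched with a plain int iterator standing in for Lucene's DocIdSetIterator (the names and exact collect-on-skip semantics here are assumptions for illustration):

```java
import java.util.*;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class CollectingAdvance {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** advance() that walks doc-by-doc so the collector still observes every skipped doc. */
    static int advance(PrimitiveIterator.OfInt docs, int target, IntConsumer collector) {
        while (docs.hasNext()) {
            int doc = docs.nextInt();
            collector.accept(doc);          // the child collector sees docs we pass over
            if (doc >= target) return doc;  // stop at the first doc >= target
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        PrimitiveIterator.OfInt docs = IntStream.of(2, 5, 9, 14).iterator();
        int doc = advance(docs, 9, seen::add);
        System.out.println(doc + " " + seen); // 9 [2, 5, 9]
    }
}
```

The trade-off is visible in the loop: advancing loses its skip-ahead efficiency, but the custom child collector never misses a matching doc.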

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch, LUCENE-3282.patch


 It would be nice to allow to add a custom child collector to the 
 BlockJoinQuery to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow to extend BlockJoinQuery to have a 
 custom code that converts the filter bitset to an OpenBitSet.




[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-16 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13066536#comment-13066536
 ] 

Shay Banon commented on LUCENE-3282:


The idea of this is to collect matching child docs regardless of what matches 
parent-wise, and yes, we might miss some depending on the type of query that 
actually wraps it, but I think it's still useful.

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch, LUCENE-3282.patch


 It would be nice to allow to add a custom child collector to the 
 BlockJoinQuery to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow to extend BlockJoinQuery to have a 
 custom code that converts the filter bitset to an OpenBitSet.




[jira] [Commented] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-11 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063619#comment-13063619
 ] 

Shay Banon commented on LUCENE-3282:


Heya,

   In my app, I have a wrapper around OBS with a common interface that 
allows accessing bits by index (similar to Bits in trunk), so I need to extract 
the OBS from it.

   Regarding the Collector, I will work on a CollectorProvider interface. I liked 
the NoOpCollector option since then you don't have to check for nulls each 
time...

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch


 It would be nice to allow to add a custom child collector to the 
 BlockJoinQuery to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow to extend BlockJoinQuery to have a 
 custom code that converts the filter bitset to an OpenBitSet.




[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-11 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-3282:
---

Attachment: LUCENE-3282.patch

New version, with CollectorProvider.

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch, LUCENE-3282.patch


 It would be nice to allow to add a custom child collector to the 
 BlockJoinQuery to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow to extend BlockJoinQuery to have a 
 custom code that converts the filter bitset to an OpenBitSet.




[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-06 Thread Shay Banon (JIRA)
BlockJoinQuery: Allow to add a custom child collector, and customize the parent 
bitset extraction
-

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon


It would be nice to allow to add a custom child collector to the BlockJoinQuery 
to be called on every matching doc (so we can do things with it, like counts 
and such). Also, allow to extend BlockJoinQuery to have a custom code that 
converts the filter bitset to an OpenBitSet.




[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction

2011-07-06 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-3282:
---

Attachment: LUCENE-3282.patch

 BlockJoinQuery: Allow to add a custom child collector, and customize the 
 parent bitset extraction
 -

 Key: LUCENE-3282
 URL: https://issues.apache.org/jira/browse/LUCENE-3282
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 3.4, 4.0
Reporter: Shay Banon
 Attachments: LUCENE-3282.patch


 It would be nice to allow to add a custom child collector to the 
 BlockJoinQuery to be called on every matching doc (so we can do things with 
 it, like counts and such). Also, allow to extend BlockJoinQuery to have a 
 custom code that converts the filter bitset to an OpenBitSet.




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-13 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006214#comment-13006214
 ] 

Shay Banon commented on LUCENE-2960:


Just a note regarding the IWC and being able to consult it for live changes: it 
feels strange to me that setting something on the config will affect the IW in 
real time. Maybe it's just me, but it feels nicer to have the live setters on 
IW compared to IWC.

I also like the ability to decouple construction-time configuration through 
IWC from live settings through setters on IW. It is then very clear what can be 
set at construction time and what can be set on a live IW. It also allows a 
compile-time / static check of what can be changed at which lifecycle phase.
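The separation being argued for can be sketched in a few lines. The types below are illustrative stand-ins, not Lucene's actual IndexWriter/IndexWriterConfig: immutable fields model construction-time config, and a setter on the writer models a live setting the compiler can vouch for.

```java
public class LiveSettingsSketch {
    /** Fixed at construction; no setters, so it cannot imply live changes. */
    static final class WriterConfig {
        final int termIndexInterval;
        WriterConfig(int termIndexInterval) { this.termIndexInterval = termIndexInterval; }
    }

    /** Live-tunable knobs live on the writer itself, visible at compile time. */
    static final class Writer {
        final WriterConfig config;
        private volatile double ramBufferSizeMB = 16;
        Writer(WriterConfig config) { this.config = config; }
        void setRAMBufferSizeMB(double mb) { ramBufferSizeMB = mb; }  // legal while open
        double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new WriterConfig(128));
        w.setRAMBufferSizeMB(64);              // compiles: a live setting
        // w.config.termIndexInterval = 64;    // would not compile: fixed at construction
        System.out.println(w.getRAMBufferSizeMB()); // 64.0
    }
}
```

With this shape, "what can change on a live writer" is exactly "what has a setter on Writer", which is the static check the comment asks for.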

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Other 
 possible two methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).




[jira] Created: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-10 Thread Shay Banon (JIRA)
Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
--

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
 Fix For: 3.2, 4.0


In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
It would be great to be able to control that on a live IndexWriter. Other 
possible two methods that would be great to bring back are setTermIndexInterval 
and setReaderTermsIndexDivisor. Most of the other setters can actually be set 
on the MergePolicy itself, so no need for setters for those (I think).




[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-26 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2474:
---

Attachment: MapBackedSet.java

A MapBackedSet implementation that can wrap a CHM to provide a concurrent set. 
We can consider using it instead of a synchronized set with copy-on-read when 
notifying listeners.
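A minimal version of the map-backed-set idea, using only the JDK (this is a sketch of the pattern, not the attached MapBackedSet.java; note the JDK's own `Collections.newSetFromMap` provides the same trick):

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class MapBackedSetSketch<E> extends AbstractSet<E> {
    private static final Boolean PRESENT = Boolean.TRUE;
    private final Map<E, Boolean> map;

    public MapBackedSetSketch(Map<E, Boolean> map) { this.map = map; }

    @Override public boolean add(E e)           { return map.put(e, PRESENT) == null; }
    @Override public boolean remove(Object o)   { return map.remove(o) != null; }
    @Override public boolean contains(Object o) { return map.containsKey(o); }
    @Override public int size()                 { return map.size(); }
    @Override public Iterator<E> iterator()     { return map.keySet().iterator(); }

    public static void main(String[] args) {
        // Wrapping a ConcurrentHashMap yields a thread-safe Set with no global lock.
        Set<String> listeners = new MapBackedSetSketch<>(new ConcurrentHashMap<>());
        listeners.add("a");
        listeners.add("a");   // duplicate ignored: set semantics
        listeners.add("b");
        System.out.println(listeners.size()); // 2
    }
}
```

The concurrency properties come entirely from the wrapped map, so any concurrent Map implementation gives a concurrent Set for free.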


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2474.patch, 
 LUCENE-2574.patch, MapBackedSet.java


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its 
 make a lot of sense to cache things based on IndexReader#getFieldCacheKey, 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-20 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984134#action_12984134
 ] 

Shay Banon commented on LUCENE-2871:


Strange, I did not get it when running the tests; I will try to find out why it 
can happen.

 Use FileChannel in FSDirectory
 --

 Key: LUCENE-2871
 URL: https://issues.apache.org/jira/browse/LUCENE-2871
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2871.patch


 Explore using FileChannel in FSDirectory to see if it improves write 
 operations performance




[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-20 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984206#action_12984206
 ] 

Shay Banon commented on LUCENE-2871:


bq. Looking at the current patch, the class seems wrong. In my opinion, this 
should be only in NIOFSDirectory. SimpleFSDir should only use RAF.

It's a good question; I'm not sure what to do with it. Here is the problem: the 
channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it 
might actually make sense to use it even with SimpleFS (i.e. using non-NIO reads 
but file channel writes). In order to support all of them, currently the 
simplest way is to put it in the base class so the code is shared. On IRC, there 
was a discussion about externalizing the outputs and inputs, so one could more 
easily pick and choose, but I think that belongs in a different patch.
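The write path being discussed, writing through a FileChannel rather than RandomAccessFile's write methods, looks like this in plain JDK NIO (no Lucene types; the buffering and force policy here are illustrative assumptions):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ChannelOutputSketch {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("channel-out", ".bin");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap("segment bytes".getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);        // a single write() may consume fewer bytes than remaining
            }
            ch.force(false);          // flush data to disk; false = don't force metadata
        }
        System.out.println(Files.size(path)); // 13
        Files.delete(path);
    }
}
```

Since FileChannel, RandomAccessFile, and memory-mapped reads can all coexist on the same file, a shared channel-based output in the base class works with all three FSDirectory flavors, which matches the argument above.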

 Use FileChannel in FSDirectory
 --

 Key: LUCENE-2871
 URL: https://issues.apache.org/jira/browse/LUCENE-2871
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2871.patch, LUCENE-2871.patch


 Explore using FileChannel in FSDirectory to see if it improves write 
 operations performance




[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-20 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984263#action_12984263
 ] 

Shay Banon commented on LUCENE-2871:


Agreed Earwin, let's first see if it makes sense; this is just an experiment and 
might not pay off for single-threaded writes.

 Use FileChannel in FSDirectory
 --

 Key: LUCENE-2871
 URL: https://issues.apache.org/jira/browse/LUCENE-2871
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2871.patch, LUCENE-2871.patch


 Explore using FileChannel in FSDirectory to see if it improves write 
 operations performance




[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-17 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982695#action_12982695
 ] 

Shay Banon commented on LUCENE-2474:


Yea, I got the reasoning for a Set; we can use that, a CHM with PRESENT. If you 
want, I can attach a simple MapBackedSet that turns any Map into a Set.

Still, I think that using CopyOnWriteArrayList is best here. I don't think 
adding and removing listeners is something that is done often in an app 
(though I might be mistaken), and in that case traversal over listeners is much 
cheaper on a CopyOnWriteArrayList than on a CHM.
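The trade-off can be shown with a small JDK-only sketch: CopyOnWriteArrayList makes add/remove expensive (each copies the backing array) but iteration is a cheap, lock-free snapshot, a good fit when listeners are registered rarely and notified often. The listener bus below is an illustrative assumption, not code from the patch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ListenerListSketch {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Consumer<String> l) { listeners.add(l); }  // copies the array (rare)

    void fire(String event) {
        // Iterates over a snapshot: no lock is held during notification, and a
        // concurrent add/remove cannot throw ConcurrentModificationException.
        for (Consumer<String> l : listeners) l.accept(event);
    }

    public static void main(String[] args) {
        ListenerListSketch bus = new ListenerListSketch();
        StringBuilder log = new StringBuilder();
        bus.addListener(e -> log.append("A:").append(e).append(' '));
        bus.addListener(e -> log.append("B:").append(e).append(' '));
        bus.fire("closed");
        System.out.println(log.toString().trim()); // A:closed B:closed
    }
}
```

This is exactly the property argued for in the thread: the hot path (notifying listeners on reader close) never runs under a global sync block.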


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch, LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its 
 make a lot of sense to cache things based on IndexReader#getFieldCacheKey, 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.




[jira] Created: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-16 Thread Shay Banon (JIRA)
Use FileChannel in FSDirectory
--

 Key: LUCENE-2871
 URL: https://issues.apache.org/jira/browse/LUCENE-2871
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon


Explore using FileChannel in FSDirectory to see if it improves write operations 
performance




[jira] Updated: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-16 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2871:
---

Attachment: LUCENE-2871.patch

Patch supporting writes through a file channel. FSDirectory still retains the 
ability to use RAF for writes.

FSDirectory#setUseChannelOutput: allows reverting to RAF by setting it to false.
FSDirectory#setCacheChannelBuffers: allows controlling whether buffers should be 
cached when using the file channel.

 Use FileChannel in FSDirectory
 --

 Key: LUCENE-2871
 URL: https://issues.apache.org/jira/browse/LUCENE-2871
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2871.patch


 Explore using FileChannel in FSDirectory to see if it improves write 
 operations performance




[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-16 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982509#action_12982509
 ] 

Shay Banon commented on LUCENE-2474:


bq. OK, here's a patch exposing the readerFinishedListeners as static methods 
on IndexReader.

I think we should use a CopyOnWriteArrayList so that calling the listeners does 
not happen under a global synchronized block. If maintaining set behavior is 
required, I can provide a patch with a ConcurrentHashSet implementation, or we 
can simply replace it with a CHM with PRESENT, or any other solution that does 
not require calling the listeners under a global sync block.

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch, LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its 
 make a lot of sense to cache things based on IndexReader#getFieldCacheKey, 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.




[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978719#action_12978719
 ] 

Shay Banon commented on LUCENE-2474:


 It would be a cache of anything... one element of that cache would be the 
 FieldCache, there could be one for filters, or one entry per-filter.
 edit: Maybe a better way to think about it is like a ServletContext or 
 something - it's just a way to attach anything arbitrary to a reader.

Got you. My personal taste is to try to keep those readers as lightweight as 
possible, and to have the proper constructs in place to allow using them 
externally for caching, without having them manage it as well.

 Not with this current patch, as there is no mechanism to get a callback when 
 you do care about deletes. If I want to cache something that depends on 
 deletions, I want to purge that cache when the actual reader is closed (as 
 opposed to the reader's core cache key that is shared amongst all readers 
 that just have different deletions). So if we go a close event route, we 
 really want two different events... one for the close of a reader (i.e. 
 deleted matter), and one for the close of the segment (deletes don't matter).

I think that a cache affected by deletes is a problematic cache to begin with, 
so I was thinking that maybe it should be discouraged by not allowing for it, 
especially with NRT. My idea was to simply expand the purge capability that the 
FC gets for free to other external custom components.

Also, if we had a type-safe separation between segment readers and compound 
readers, I would not have added the ability to register a listener on the 
compound readers, just on the segment readers, as this encourages people to 
write caches that only work on segment readers (since the registration for the 
purge event happens within the cache, and it should work only with segment 
readers). That is why my patch does not take compound readers into account.




 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its 
 make a lot of sense to cache things based on IndexReader#getFieldCacheKey, 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.




[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-07 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978975#action_12978975
 ] 

Shay Banon commented on LUCENE-2474:


bq. But: I think we'd want to have composite reader just forward the 
registration down to the atomic readers? (And, forward on reopen).

I am not sure you would want to do that. Any caching layer or external component 
that is properly written would work on the low-level segment readers; it would 
not even compile against compound readers. This helps direct people to write 
proper code, dealing only with segment readers.

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows plugging in a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.




[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-06 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978480#action_12978480
 ] 

Shay Banon commented on LUCENE-2474:


Right, I was thinking that since it's a low-level API you could just add it to the 
low-level readers, but I agree, it would be nicer to have it at the high level 
as well. Regarding the close method name, I guess we can name it similarly to the 
FieldCache one, maybe purge?

 We've talked before about putting caches directly on the readers - that still 
 seems like the most straightforward approach?

Not sure I understand that. Do you mean getting FieldCache into the readers? 
And then what about cached filters? And other custom caching constructs that 
rely on the same mechanism as the CachingWrapperFilter?

I think that if one implements such caching, it's an advanced enough feature 
that you should know how to handle deletes and other tidbits (if you need to).

 We really need one cache that doesn't care about deletions, and one cache 
 that does.

Isn't that up to the cache to decide? The cache can be anything (internally 
implemented in Lucene or external) that follows the mechanism of caching 
based on (segment) readers. As long as there are constructs to get the deleted 
docs (to handle deletes, for example), the implementation can use them.



 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache enjoys being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows plugging in a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.




[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-12-23 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2292:
---

Attachment: LUCENE-2292.patch

A fixed patch that now passes all tests using the byte buffer directory.

It also includes refactoring into a different package (store.bytebuffer) and a 
custom ByteBufferAllocator interface that controls how buffers are allocated, 
with plain and caching implementations.

 ByteBuffer Directory - allowing to store the index outside the heap
 ---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2292.patch, LUCENE-2292.patch, LUCENE-2292.patch


 A byte buffer based directory with the benefit of being able to create direct 
 byte buffer thus storing the index outside the JVM heap.




[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

2010-12-02 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966376#action_12966376
 ] 

Shay Banon commented on LUCENE-2779:


If the assumption still stands that an IndexInput will not be opened on a 
still-being-written / unclosed IndexOutput, then RAMFile can also be improved 
when it comes to concurrency. The RAMOutputStream can maintain its own list of 
buffers (a simple array list, no need to sync), and only when it gets closed, 
initialize the respective RAMFile with the list. This means most of the 
synchronized aspects of RAMFile can be removed. Also, on RAMFile, lastModified 
can be made volatile, removing the sync on its respective methods.
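
A rough sketch of that idea (hypothetical stand-in classes, not the real RAMDirectory code): the output stream accumulates buffers privately, and the shared file only sees them via a single volatile publish on close:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Hypothetical stand-in for RAMFile: state is published once, on close.
    static class SketchFile {
        volatile List<byte[]> buffers;   // visible to readers only after publish
        volatile long lastModified;      // volatile instead of synchronized getters

        void publish(List<byte[]> written) {
            buffers = written;           // single volatile write
            lastModified = System.currentTimeMillis();
        }
    }

    // Hypothetical stand-in for RAMOutputStream: writer-private buffer list,
    // so no synchronization is needed while writing.
    static class SketchOutputStream {
        private final List<byte[]> pending = new ArrayList<>();
        private final SketchFile file;

        SketchOutputStream(SketchFile file) { this.file = file; }

        void writeBuffer(byte[] buf) { pending.add(buf); }  // no sync

        void close() { file.publish(pending); }             // hand off once
    }

    public static void main(String[] args) {
        SketchFile file = new SketchFile();
        SketchOutputStream out = new SketchOutputStream(file);
        out.writeBuffer(new byte[1024]);
        out.writeBuffer(new byte[1024]);
        out.close();
        System.out.println(file.buffers.size());  // prints 2
    }
}
```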

 Use ConcurrentHashMap in RAMDirectory
 -

 Key: LUCENE-2779
 URL: https://issues.apache.org/jira/browse/LUCENE-2779
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, 
 LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java


 RAMDirectory synchronizes on its instance in many places to protect access to 
 map of RAMFiles, in addition to updating the sizeInBytes member. In many 
 places the sync is done for 'read' purposes, while only in few places we need 
 'write' access. This looks like a perfect use case for ConcurrentHashMap
 Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong 
 ...
 I'll post a patch shortly.
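
The refactoring described in the issue can be sketched like this (a simplified stand-in, not the actual RAMDirectory code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    // ConcurrentHashMap replaces synchronized(this) around a plain map
    // of file name -> contents, for both readers and writers.
    static final ConcurrentHashMap<String, byte[]> files = new ConcurrentHashMap<>();

    // AtomicLong replaces syncing around the sizeInBytes counter.
    static final AtomicLong sizeInBytes = new AtomicLong();

    static void putFile(String name, byte[] content) {
        byte[] old = files.put(name, content);               // lock-free update
        long delta = content.length - (old == null ? 0 : old.length);
        sizeInBytes.addAndGet(delta);                        // no sync needed
    }

    public static void main(String[] args) {
        putFile("_0.cfs", new byte[100]);
        putFile("_1.cfs", new byte[50]);
        putFile("_0.cfs", new byte[80]);        // overwrite adjusts the size
        System.out.println(sizeInBytes.get());  // prints 130
    }
}
```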




[jira] Commented: (LUCENE-2773) Don't create compound file for large segments by default

2010-11-24 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935299#action_12935299
 ] 

Shay Banon commented on LUCENE-2773:


Mike, are you sure about the default maxMergeMB being set to 2GB? This is a big 
change in default behavior. For systems that do updates (deletes) we are 
covered, because deletes are (partially) taken into account when computing the 
segment size. But let's say you have a 100GB index: you will end up with 
50 segments, no?

 Don't create compound file for large segments by default
 

 Key: LUCENE-2773
 URL: https://issues.apache.org/jira/browse/LUCENE-2773
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2773.patch


 Spinoff from LUCENE-2762.
 CFS is useful for keeping the open file count down.  But, it costs
 some added time during indexing to build, and also ties up temporary
 disk space, causing eg a large spike on the final merge of an
 optimize.
 Since MergePolicy dictates which segments should be CFS, we can
 change it to only build CFS for smallish merges.
 I think we should also set a maxMergeMB by default so that very large
 merges aren't done.




[jira] Created: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2010-09-24 Thread Shay Banon (JIRA)
ArrayIndexOutOfBoundsException when iterating over TermDocs
---

 Key: LUCENE-2666
 URL: https://issues.apache.org/jira/browse/LUCENE-2666
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.2
Reporter: Shay Banon


A user got this very strange exception, and I managed to get the index it 
happens on. Basically, iterating over the TermDocs causes an AIOOBE. I easily 
reproduced it using the FieldCache, which does exactly that (the field in 
question is indexed as numeric). Here is the exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
at 
org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
at TestMe.main(TestMe.java:56)

It happens on the following segment: _26t docCount: 914 delCount: 1 
delFileName: _26t_1.del

And as you can see, it smells like a corner case (it fails for document number 
912; the AIOOBE happens on the deleted docs). The code to recreate it is 
simple:

FSDirectory dir = FSDirectory.open(new File("index"));
IndexReader reader = IndexReader.open(dir, true);

IndexReader[] subReaders = reader.getSequentialSubReaders();
for (IndexReader subReader : subReaders) {
    Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
    field.setAccessible(true);
    SegmentInfo si = (SegmentInfo) field.get(subReader);
    System.out.println("-- " + si);
    if (si.getDocStoreSegment().contains("_26t")) {
        // this is the problematic one...
        System.out.println("problematic one...");
        FieldCache.DEFAULT.getLongs(subReader, "__documentdate",
                FieldCache.NUMERIC_UTILS_LONG_PARSER);
    }
}

Here is the result of a check index on that segment:

  8 of 10: name=_26t docCount=914
compound=true
hasProx=true
numFiles=2
size (MB)=1.641
diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, 
java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_26t_1.del]
test: open reader.OK [1 deleted docs]
test: fields..OK [32 fields]
test: field norms.OK [32 fields]
test: terms, freq, prox...ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
at 
org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)
test: stored fields...ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at 
org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
at 
org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)
test: term vectorsERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
at org.apache.lucene.util.BitVector.get(BitVector.java:104)
at 
org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
at 
org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
at TestMe.main(TestMe.java:47)



The creation of the index does not do anything fancy (all defaults), though 
there is usage of the near-real-time aspect (IndexWriter#getReader), which does 
complicate deleted-docs handling. Seems like the deleted docs got written 
without matching the number of docs? Sadly, I don't have something that 
recreates it from scratch, but I do have the index if someone wants to have a 
look at it (mail me directly and I will provide a download link).

I will continue to 

[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT

2010-06-02 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874454#action_12874454
 ] 

Shay Banon commented on LUCENE-2161:


Thanks!

 Some concurrency improvements for NRT
 -

 Key: LUCENE-2161
 URL: https://issues.apache.org/jira/browse/LUCENE-2161
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.3, 3.0.2, 3.1, 4.0

 Attachments: LUCENE-2161.patch


 Some concurrency improvements for NRT
 I found & fixed some silly thread bottlenecks that affect NRT:
   * Multi/DirectoryReader.numDocs is synchronized, I think so only 1
 thread computes numDocs if it's -1.  I removed this sync, and made
 numDocs volatile, instead.  Yes, multiple threads may compute the
 numDocs for the first time, but I think that's harmless?
   * Fixed BitVector's ctor to set count to 0 on creating a new BV, and
 clone to copy the count over; this saves CPU computing the count
 unnecessarily.
   * Also strengthened assertions done in SR, testing the delete docs
 count.
 I also found an annoying thread bottleneck that happens, due to CMS.
 Whenever CMS hits the max running merges (default changed from 3 to 1
 recently), and the merge policy now wants to launch another merge, it
 forces the incoming thread to wait until one of the BG threads
 finishes.
 This is a basic crude throttling mechanism -- you force the mutators
 (whoever is causing new segments to appear) to stop, so that merging
 can catch up.
 Unfortunately, when stressing NRT, that thread is the one that's
 opening a new NRT reader.
 So, the first serious problem happens when you call .reopen() on your
 NRT reader -- this call simply forwards to IW.getReader if the reader
 was an NRT reader.  But, because DirectoryReader.doReopen is
 synchronized, this had the horrible effect of holding the monitor lock
 on your main IR.  In my test, this blocked all searches (since each
 search uses incRef/decRef, still sync'd until LUCENE-2156, at least).
 I fixed this by making doReopen only sync'd on this if it's not simply
 forwarding to getWriter.  So that's a good step forward.
 This prevents searches from being blocked while trying to reopen to a
 new NRT.
 However... it doesn't fix the problem that when an immense merge is
 off and running, opening an NRT reader could hit a tremendous delay
 because CMS blocks it.  The BalancedSegmentMergePolicy should help
 here... by avoiding such immense merges.
 But, I think we should also pursue an improvement to CMS.  EG, if it
 has 2 merges running, where one is huge and one is tiny, it ought to
 increase thread priority of the tiny one.  I think with such a change
 we could increase the max thread count again, to prevent this
 starvation.  I'll open a separate issue
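
The first bullet above (replacing a synchronized numDocs with a lazily computed volatile field) can be sketched like this; the race is benign because every thread computes the same value (hypothetical stand-in class, not the actual reader code):

```java
public class Main {
    static class SketchReader {
        private volatile int numDocs = -1;  // -1 means "not yet computed"

        int numDocs() {
            int n = numDocs;                // one volatile read
            if (n == -1) {
                n = computeNumDocs();       // may run in several threads at once
                numDocs = n;                // harmless: all writers store the same value
            }
            return n;
        }

        private int computeNumDocs() {
            return 1000 - 86;               // stand-in for summing sub-reader counts
        }
    }

    public static void main(String[] args) {
        SketchReader r = new SketchReader();
        System.out.println(r.numDocs());    // prints 914
    }
}
```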




[jira] Commented: (LUCENE-2161) Some concurrency improvements for NRT

2010-05-30 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873475#action_12873475
 ] 

Shay Banon commented on LUCENE-2161:


Mike, is there a reason why this is not backported to 3.0.2?

 Some concurrency improvements for NRT
 -

 Key: LUCENE-2161
 URL: https://issues.apache.org/jira/browse/LUCENE-2161
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.3, 4.0

 Attachments: LUCENE-2161.patch


 Some concurrency improvements for NRT
 I found & fixed some silly thread bottlenecks that affect NRT:
   * Multi/DirectoryReader.numDocs is synchronized, I think so only 1
 thread computes numDocs if it's -1.  I removed this sync, and made
 numDocs volatile, instead.  Yes, multiple threads may compute the
 numDocs for the first time, but I think that's harmless?
   * Fixed BitVector's ctor to set count to 0 on creating a new BV, and
 clone to copy the count over; this saves CPU computing the count
 unnecessarily.
   * Also strengthened assertions done in SR, testing the delete docs
 count.
 I also found an annoying thread bottleneck that happens, due to CMS.
 Whenever CMS hits the max running merges (default changed from 3 to 1
 recently), and the merge policy now wants to launch another merge, it
 forces the incoming thread to wait until one of the BG threads
 finishes.
 This is a basic crude throttling mechanism -- you force the mutators
 (whoever is causing new segments to appear) to stop, so that merging
 can catch up.
 Unfortunately, when stressing NRT, that thread is the one that's
 opening a new NRT reader.
 So, the first serious problem happens when you call .reopen() on your
 NRT reader -- this call simply forwards to IW.getReader if the reader
 was an NRT reader.  But, because DirectoryReader.doReopen is
 synchronized, this had the horrible effect of holding the monitor lock
 on your main IR.  In my test, this blocked all searches (since each
 search uses incRef/decRef, still sync'd until LUCENE-2156, at least).
 I fixed this by making doReopen only sync'd on this if it's not simply
 forwarding to getWriter.  So that's a good step forward.
 This prevents searches from being blocked while trying to reopen to a
 new NRT.
 However... it doesn't fix the problem that when an immense merge is
 off and running, opening an NRT reader could hit a tremendous delay
 because CMS blocks it.  The BalancedSegmentMergePolicy should help
 here... by avoiding such immense merges.
 But, I think we should also pursue an improvement to CMS.  EG, if it
 has 2 merges running, where one is huge and one is tiny, it ought to
 increase thread priority of the tiny one.  I think with such a change
 we could increase the max thread count again, to prevent this
 starvation.  I'll open a separate issue




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-20 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869779#action_12869779
 ] 

Shay Banon commented on LUCENE-2468:


Hi Mike, 

First, I opened and attached a patch regarding the Cache eviction listeners to 
IndexReader: https://issues.apache.org/jira/browse/LUCENE-2474, tell me what 
you think.

Regarding your last comment, I agree. Though trying to streamline its usage, in 
terms of having all built-in components and possible extensions work well with 
it, makes sense. That's what you suggest with the filtered doc set, which is 
cool.

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
 LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-19 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869369#action_12869369
 ] 

Shay Banon commented on LUCENE-2468:


bq. So... why not do this in CachingWrapper/SpanFilter, but, instead of 
discarding the cache entry when deletions must be enforced, we dynamically 
apply the deletions? (I think we could use FilteredDocIdSet).

Yea, that would work well. You would still need to somehow know when to enable 
or disable this based on the filter you use (it should basically only be 
enabled for filters that are passed to a constant-score query...).
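
A minimal sketch of the FilteredDocIdSet idea (simplified with BitSets; not the actual patch): the cached set stays in the cache, and deletions are applied on top of it at match time:

```java
import java.util.BitSet;

public class Main {
    public static void main(String[] args) {
        BitSet cachedFilter = new BitSet();  // stand-in for a cached DocIdSet
        cachedFilter.set(3);
        cachedFilter.set(7);

        BitSet deleted = new BitSet();       // stand-in for the segment's deleted docs
        deleted.set(7);                      // doc 7 was deleted after the cache was built

        // The FilteredDocIdSet idea: keep the cache entry, but AND out
        // deletions dynamically instead of discarding the entry.
        BitSet live = (BitSet) cachedFilter.clone();
        live.andNot(deleted);

        System.out.println(live.get(3));     // prints true
        System.out.println(live.get(7));     // prints false
    }
}
```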

bq. Really... we need a more generic solution here (but, it's a much bigger 
change), where somehow in creating the scorer per-segment we dynamically 
determine who/where the deletions are enforced. A Filter need not care about 
deletions if it's AND'd w/ a query that already enforces the deletions.

Agreed. As I see it, caching based on IndexReader is key in Lucene, and with 
NRT, it should feel the same way as it is without it. NRT should not change the 
way you build your system.

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
 LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868617#action_12868617
 ] 

Shay Banon commented on LUCENE-2468:


Sounds like a good solution to me. I just noticed that in trunk there is also an 
explicit purge from FieldCache when possible. I think it would be great to 
enable this for other caches that are based on the same mechanism (like 
CachingWrapperFilter, but externally written ones as well).

I was thinking of an expert API that allows adding a CacheEvictionListener or 
something similar, which will be called when this happens. What do you think?

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868647#action_12868647
 ] 

Shay Banon commented on LUCENE-2468:


bq. Shay, as far as CachingWrapperFilter and CacheEvictionListener, it seems 
more powerful to just let apps create a new query type themselves? That's the 
nice part of lucene's openness to user query types - start with the code for 
CachingWrapperFilter and hook up your own caching logic.

Yea, but it would be great to know when an IndexReader has actually decided to 
close, so caches can be eagerly cleaned. Even if one writes a custom 
implementation, it would benefit from it.

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868659#action_12868659
 ] 

Shay Banon commented on LUCENE-2468:


I think that the suggested solution, to use the field cache key, is sadly not 
good enough. I am attaching a simple test showing that it does not work when a 
query is passed to a searcher without a filter, but that query is, for example, 
a ConstantScoreQuery. I have simply taken CachingWrapperFilter and changed it 
to use getFieldCacheKey instead of the IndexReader.

This is problematic, since a filter can be used somewhere in the query tree 
and wrapped for caching. I am running against 3.0.1.

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2468:
---

Attachment: CacheTest.java

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868816#action_12868816
 ] 

Shay Banon commented on LUCENE-2468:


Thanks for the work, Michael! Is this issue going to include 
ConstantScoreQuery, or should I open a different issue for it?

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868869#action_12868869
 ] 

Shay Banon commented on LUCENE-2468:


Check two comments above :), we discussed it. Basically, your change does not 
work when a cached filter is used.

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868923#action_12868923
 ] 

Shay Banon commented on LUCENE-2468:


Ahh, now I see it, sorry I missed that. But basically, enforcing deletions 
means that we are back to the original problem... I think it would be quite 
confusing for users, to be honest. Of the filters, the problematic ones are 
those that can be converted to queries. From what I can see, FilteredQuery is 
OK, so maybe ConstantScoreQuery can be changed (if possible) to do the 
same...

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Updated: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2468:
---

Attachment: DeletionAwareConstantScoreQuery.java

Here is a go at making ConstantScoreQuery deletion-aware. I named it 
differently, but it can replace ConstantScoreQuery with a flag making it 
deletion-aware. What do you think?

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
 LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-05-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868959#action_12868959
 ] 

Shay Banon commented on LUCENE-2468:


Another quick question, Mike: what do you think about the ability to know when 
a cache key is actually closed, so it can be removed from a cache? Similar in 
concept to the eviction done from the field cache in trunk by readers, but 
exposed so that other Reader#cacheKey-based caches (which are the simplest way 
to do caching in Lucene) can use it.
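
A minimal sketch of what such a Reader#cacheKey-based cache with an explicit 
close hook could look like (the class and method names here are illustrative, 
not a Lucene API of that era):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-reader cache keyed on the reader's cache key. The onReaderClosed hook
// is the kind of eviction callback the comment asks for: without it, entries
// for closed segment readers would leak.
public class ReaderKeyedCache<V> {
    private final Map<Object, V> cache = new ConcurrentHashMap<Object, V>();

    public V get(Object readerCacheKey) {
        return cache.get(readerCacheKey);
    }

    public void put(Object readerCacheKey, V value) {
        cache.put(readerCacheKey, value);
    }

    // Would be invoked when the segment reader behind this key is closed.
    public void onReaderClosed(Object readerCacheKey) {
        cache.remove(readerCacheKey);
    }
}
```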

 reopen on NRT reader should share readers w/ unchanged segments
 ---

 Key: LUCENE-2468
 URL: https://issues.apache.org/jira/browse/LUCENE-2468
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
 LUCENE-2468.patch, LUCENE-2468.patch


 A reopen on an NRT reader doesn't seem to share readers for those segments 
 that are unchanged.
 http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Commented: (LUCENE-2283) Possible Memory Leak in StoredFieldsWriter

2010-05-04 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864042#action_12864042
 ] 

Shay Banon commented on LUCENE-2283:


Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be 
really helpful to get this as soon as possible in the next Lucene version.



 Possible Memory Leak in StoredFieldsWriter
 --

 Key: LUCENE-2283
 URL: https://issues.apache.org/jira/browse/LUCENE-2283
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Tim Smith
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2283.patch, LUCENE-2283.patch, LUCENE-2283.patch


 StoredFieldsWriter creates a pool of PerDoc instances.
 This pool will grow but never be reclaimed by any mechanism.
 Furthermore, each PerDoc instance contains a RAMFile;
 this RAMFile will also never be truncated (and will only ever grow), as far 
 as I can tell.
 When feeding documents with a large number of stored fields (or one large 
 dominating stored field), this can result in memory being consumed in the 
 RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very 
 large, even if large documents are rare.
 Seems like there should be some attempt to reclaim memory from the PerDoc[] 
 instance pool (or to otherwise limit the size of the RAMFiles that are cached).




[jira] Commented: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-05-04 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864044#action_12864044
 ] 

Shay Banon commented on LUCENE-2387:


Is there a chance that this can also be applied to 3.0.2 / 3.1? It would be 
really helpful to get this as soon as possible in the next Lucene version.


 IndexWriter retains references to Readers used in Fields (memory leak)
 --

 Key: LUCENE-2387
 URL: https://issues.apache.org/jira/browse/LUCENE-2387
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.1
Reporter: Ruben Laguna
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2387-29x.patch, LUCENE-2387.patch


 As described in [1], IndexWriter retains references to Readers used in Fields, 
 which can lead to big memory leaks when using Tika's ParsingReaders (as 
 those can take 1MB per ParsingReader). 
 [2] shows a screenshot of the reference chain to the Reader from the 
 IndexWriter taken with Eclipse MAT (Memory Analyzer Tool). The chain is the 
 following:
 IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> 
 DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> 
 Field (fieldsData) 
 -
 [1] http://markmail.org/thread/ndmcgffg2mnwjo47
 [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer




[jira] Created: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-03-02 Thread Shay Banon (JIRA)
ByteBuffer Directory - allowing to store the index outside the heap
---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon


A byte buffer based directory with the benefit of being able to create direct 
byte buffers, thus storing the index outside the JVM heap.
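
For illustration, the core idea is just that a direct ByteBuffer is allocated 
outside the JVM heap, so index bytes written into it add no GC pressure; a 
minimal sketch (not the patch's code):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    // allocateDirect reserves memory outside the JVM heap; isDirect()
    // reports whether the buffer is off-heap.
    public static ByteBuffer newOffHeapBuffer(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    // Absolute put/get address the buffer by position, without moving
    // the buffer's own position/limit markers.
    public static byte roundTrip(ByteBuffer buf, int pos, byte value) {
        buf.put(pos, value);
        return buf.get(pos);
    }
}
```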

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-03-02 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2292:
---

Attachment: LUCENE-2292.patch

 ByteBuffer Directory - allowing to store the index outside the heap
 ---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2292.patch


 A byte buffer based directory with the benefit of being able to create direct 
 byte buffers, thus storing the index outside the JVM heap.




[jira] Commented: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-03-02 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840379#action_12840379
 ] 

Shay Banon commented on LUCENE-2292:


Hi,

 looks interesting as a replacement for RAMDirectory.

This class uses ByteBuffer, which has some overhead compared to a simple 
byte[], though the same logic (if you verify it) could be used to improve the 
concurrency in RAMDirectory (just with byte[] instead).

 Your patch uses a sun.* internal package. If you want to do something 
 similar to MMapDirectory to release the buffer without waiting for GC, do it 
 in the same way using reflection like in MMapDirectory.

From what I know, it was there in all the JDKs I have worked with (it's like 
sun.misc.Unsafe). Have you seen otherwise? If so, it's a simple change (though 
I am not sure about the access control check in MMapDirectory; it's a 
performance killer, and caching the Method(s) makes sense).

 ByteBuffer Directory - allowing to store the index outside the heap
 ---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2292.patch


 A byte buffer based directory with the benefit of being able to create direct 
 byte buffers, thus storing the index outside the JVM heap.




[jira] Updated: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-03-02 Thread Shay Banon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shay Banon updated LUCENE-2292:
---

Attachment: LUCENE-2292.patch

Attached a new patch that does not use the sun.* package. I still cache the 
Method, since cleaning a buffer is not done only on close of the directory.

 ByteBuffer Directory - allowing to store the index outside the heap
 ---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2292.patch, LUCENE-2292.patch


 A byte buffer based directory with the benefit of being able to create direct 
 byte buffers, thus storing the index outside the JVM heap.




[jira] Commented: (LUCENE-2292) ByteBuffer Directory - allowing to store the index outside the heap

2010-03-02 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840394#action_12840394
 ] 

Shay Banon commented on LUCENE-2292:


By the way, an implementation note: I thought about preallocating a large 
direct buffer and then splicing it into chunks, but currently I think the 
complexity (and the overhead of maintaining splice locations) is not really 
needed, and the current caching should do the trick (with the ability to 
control both the buffer size and the cache size).
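
For reference, the splicing described above would amount to offset arithmetic 
along these lines (a purely illustrative sketch of the rejected approach, not 
the patch's code):

```java
import java.nio.ByteBuffer;

// One large logical address space backed by fixed-size direct chunks.
// A logical offset maps to (offset / chunkSize) for the chunk index and
// (offset % chunkSize) for the position inside that chunk.
public class ChunkedBuffer {
    private final int chunkSize;
    private final ByteBuffer[] chunks;

    public ChunkedBuffer(long totalSize, int chunkSize) {
        this.chunkSize = chunkSize;
        int n = (int) ((totalSize + chunkSize - 1) / chunkSize); // round up
        chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            chunks[i] = ByteBuffer.allocateDirect(chunkSize); // off-heap
        }
    }

    public void put(long offset, byte b) {
        chunks[(int) (offset / chunkSize)].put((int) (offset % chunkSize), b);
    }

    public byte get(long offset) {
        return chunks[(int) (offset / chunkSize)].get((int) (offset % chunkSize));
    }
}
```

This is the bookkeeping the comment argues is not worth carrying when a simple 
per-buffer cache already works.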

 ByteBuffer Directory - allowing to store the index outside the heap
 ---

 Key: LUCENE-2292
 URL: https://issues.apache.org/jira/browse/LUCENE-2292
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Shay Banon
 Attachments: LUCENE-2292.patch, LUCENE-2292.patch


 A byte buffer based directory with the benefit of being able to create direct 
 byte buffers, thus storing the index outside the JVM heap.




[jira] Created: (LUCENE-1637) Getting an IndexReader from a committed IndexWriter

2009-05-14 Thread Shay Banon (JIRA)
Getting an IndexReader from a committed IndexWriter
---

 Key: LUCENE-1637
 URL: https://issues.apache.org/jira/browse/LUCENE-1637
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.9
Reporter: Shay Banon


I just had a look at the work done in IndexWriter in order to get an IndexReader 
with all the current ongoing changes made through the IndexWriter. This feature 
is very useful, and I was wondering if another feature, which (I think) is 
simpler to implement than that one, might make sense. 

   Many times, an application opens an IndexWriter, makes whatever changes it 
makes, and then commits the changes. It would be nice to get an IndexReader 
(a read-only one is fine) that corresponds to the committed (or even closed) 
IndexWriter. This would allow a cache of an already-used IndexReader to be 
updated with a fresh IndexReader, without the need to reopen one (which 
should be slower than opening one based on the IndexWriter's information). The 
main difference is that the mentioned IndexReader could still be 
reopened without throwing an AlreadyClosedException. 

   More information can be found here: 
http://www.nabble.com/Getting-an-IndexReader-from-a-committed-IndexWriter-td23551978.html




[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler

2008-03-18 Thread Shay Banon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579874#action_12579874
 ] 

Shay Banon commented on LUCENE-1239:


Yeah, it looks like it is my bad; great catch! While trying to create a better 
scheduler (at least in terms of reusing threads instead of creating them), I 
wondered whether the current scheduler could be enhanced to support an 
extension point for that. I can give such a refactoring a go if you 
think it makes sense.

 IndexWriter deadlock when using ConcurrentMergeScheduler
 

 Key: LUCENE-1239
 URL: https://issues.apache.org/jira/browse/LUCENE-1239
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1
 Environment: Compass 2.0.0M3 (nightly build #57), Lucene 2.3.1, 
 Spring Framework 2.0.7.0
Reporter: Michael Lossos
Assignee: Michael McCandless

 I'm trying to update our application from Compass 2.0.0M1 with Lucene 2.2 to 
 Compass 2.0.0M3 (latest build) with Lucene 2.3.1. I'm holding all other 
 things constant and only changing the Compass and Lucene jars. I'm recreating 
 the search index for our data and seeing deadlock in Lucene's IndexWriter. It 
 appears to be waiting on a signal from the merge thread. I've tried creating 
 a simple reproduction case for this but to no avail.
 Doing the exact same steps with Compass 2.0.0M1 and Lucene 2.2 has no 
 problems and recreates our search index. That is to say, it's not our code.
 In particular, the main thread performing the commit (Lucene document save) 
 from Compass is calling Lucene's IndexWriter.optimize(). We're using 
 Compass's ExecutorMergeScheduler to handle the merging, and it is calling 
 IndexWriter.merge(). The main thread in IndexWriter.optimize() enters the 
 wait() at the bottom of that method and is never notified. I can't tell if 
 this is because optimizeMergesPending() is returning true incorrectly, or if 
 IndexWriter.merge()'s notifyAll() is being called prematurely. Looking at the 
 code, it doesn't seem possible for IndexWriter.optimize() to be waiting and 
 miss a notifyAll(), and Lucene's IndexWriter.merge() was recently fixed to 
 always call notifyAll() even on exceptions -- that is all the relevant 
 IndexWriter code looks properly synchronized. Nevertheless, I'm seeing the 
 deadlock behavior described, and it's reproducible using our app and our test 
 data set.
 Could someone familiar with IndexWriter's synchronization code take another 
 look at it? I'm sorry that I can't give you a simple reproduction test case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-511) New BufferedIndexOutput optimization fails to update bufferStart

2006-03-02 Thread Shay Banon (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-511?page=all ]

Shay Banon updated LUCENE-511:
--

Attachment: BufferedIndexOutput.java

 New BufferedIndexOutput optimization fails to update bufferStart
 

  Key: LUCENE-511
  URL: http://issues.apache.org/jira/browse/LUCENE-511
  Project: Lucene - Java
 Type: Bug
   Components: Store
 Versions: 1.9
 Reporter: Shay Banon
 Priority: Critical
  Attachments: BufferedIndexOutput.java, RAMOutputTest.java

 New BufferedIndexOutput optimization of writeBytes fails to update bufferStart 
 under some conditions. Test case and fix attached.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

