[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724815#action_12724815 ] Michael McCandless commented on LUCENE-1709:

Actually I see decent gains from concurrency: when I run tests with 6 threads my tests run a little over 3X faster (12:59 with 1 thread and 4:15 with 6 threads).

I'm using a Python script that launches the threads, each specifying -Dtestpackage to run a certain subset of Lucene's tests.

This is on an OpenSolaris (2009.06) machine, with a Core i7 920 CPU (= 8 cores presented to the OS) and an Intel X25M SSD, 12 GB RAM. The hardware has quite a bit of concurrency.

> Parallelize Tests
> -----------------
>
>                 Key: LUCENE-1709
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>             Fix For: 3.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The Lucene tests can be parallelized to make for a faster testing system.
> This task from ANT can be used:
> http://ant.apache.org/manual/CoreTasks/parallel.html
> Previous discussion:
> http://www.gossamer-threads.com/lists/lucene/java-dev/69669
> Notes from Mike M.:
> {quote}
> I'd love to see a clean solution here (the tests are embarrassingly
> parallelizable, and we all have machines with good concurrency these
> days)... I have a rather hacked up solution now, that uses
> "-Dtestpackage=XXX" to split the tests up.
> Ideally I would be able to say "use N threads" and it'd do the right
> thing... like the -j flag to make.
> {quote}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
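A minimal sketch of the "-j N"-style splitter described above, translated to Java (the package names are illustrative, and the ant command line is only assembled, not executed, so this is a shape of the idea rather than a working runner):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical N-thread test splitter: each worker would invoke
// "ant test -Dtestpackage=<pkg>" for one subset of the test packages.
class ParallelTestRunner {
    static List<String> runAll(List<String> packages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<Future<String>>();
        for (final String pkg : packages) {
            futures.add(pool.submit(new Callable<String>() {
                public String call() {
                    // Placeholder: a real runner would Runtime.exec this
                    // command and collect the build output/exit code.
                    return "ant test -Dtestpackage=" + pkg;
                }
            }));
        }
        List<String> commands = new ArrayList<String>();
        for (Future<String> f : futures) {
            commands.add(f.get()); // also propagates any worker failure
        }
        pool.shutdown();
        return commands;
    }
}
```

Since the test packages are independent, the pool gives roughly make's -j behavior: with 6 threads and 8 cores the 3X speedup reported above is plausible, bounded by the slowest single package.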
[jira] Updated: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1630:
---
    Attachment: LUCENE-1630.patch

Added testcase to TestBooleanQuery

> Mating Collector and Scorer on doc Id orderness
> -----------------------------------------------
>
>                 Key: LUCENE-1630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1630
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1630-2.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch
>
> This is a spin off of LUCENE-1593. This issue proposes to expose appropriate API on Scorer and Collector such that one can create an optimized Collector based on a given Scorer's doc-id orderness and vice versa. Copied from LUCENE-1593, here is the list of changes:
> # Deprecate Weight and create QueryWeight (abstract class) with a new scorer(reader, scoreDocsInOrder), replacing the current scorer(reader) method. QueryWeight implements Weight, while score(reader) calls score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder) is defined abstract.
> #* Also add QueryWeightWrapper to wrap a given Weight implementation. This one will also be deprecated, as well as package-private.
> #* Add to Query variants of createWeight and weight which return QueryWeight. For now, I prefer to add a default impl which wraps the Weight variant instead of overriding in all Query extensions, and in 3.0 when we remove the Weight variants - override in all extending classes.
> # Add to Scorer isOutOfOrder with a default to false, and override in BS to true.
> # Modify BooleanWeight to extend QueryWeight and implement the new scorer method to return BS2 or BS based on the number of required scorers and setAllowOutOfOrder.
> # Add to Collector an abstract _acceptsDocsOutOfOrder_ which returns true/false.
> #* Use it in IndexSearcher.search methods, that accept a Collector, in order to create the appropriate Scorer, using the new QueryWeight.
> #* Provide a static create method to TFC and TSDC which accept this as an argument and creates the proper instance.
> #* Wherever we create a Collector (TSDC or TFC), always ask for out-of-order Scorer and check on the resulting Scorer isOutOfOrder(), so that we can create the optimized Collector instance.
> # Modify IndexSearcher to use all of the above logic.
> The only class I'm worried about, and would like to verify with you, is Searchable. If we want to deprecate all the search methods on IndexSearcher, Searcher and Searchable which accept Weight and add new ones which accept QueryWeight, we must do the following:
> * Deprecate Searchable in favor of Searcher.
> * Add to Searcher the new QueryWeight variants. Here we have two choices: (1) break back-compat and add them as abstract (like we've done with the new Collector method) or (2) add them with a default impl to call the Weight versions, documenting these will become abstract in 3.0.
> * Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend Searcher. That's the part I'm a little bit worried about - Searchable implements java.rmi.Remote, which means there could be an implementation out there which implements Searchable and extends something different than UnicastRemoteObject, like Activeable. I think there is very small chance this has actually happened, but would like to confirm with you guys first.
> * Add a deprecated, package-private, SearchableWrapper which extends Searcher and delegates all calls to the Searchable member.
> * Deprecate all uses of Searchable and add Searcher instead, defaulting the old ones to use SearchableWrapper.
> * Make all the necessary changes to IndexSearcher, MultiSearcher etc. regarding overriding these new methods.
> One other optimization that was discussed in LUCENE-1593 is to expose a topScorer() API (on Weight) which returns a Scorer that its score(Collector) will be called, and additionally add a start() method to DISI. That will allow Scorers to initialize either on start() or score(Collector). This was proposed mainly because of BS and BS2 which check if they are initialized in every call to next(), skipTo() and score(). Personally I prefer to see that in a separate issue, following that one (as it might add methods to QueryWeight).
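The scorer/collector matching described in the list above can be sketched in a few lines. This is only an illustration of the proposed contract, with the real signatures heavily simplified (the actual scorer() method also takes an IndexReader, and the class names follow the proposal, not released Lucene API):

```java
// Simplified sketch of the proposed doc-id-orderness handshake.
abstract class SketchCollector {
    // Proposed new abstract method: may this collector receive docs out of order?
    public abstract boolean acceptsDocsOutOfOrder();
}

abstract class SketchQueryWeight {
    // Proposed replacement for scorer(reader): the caller states whether
    // in-order scoring is required, so BooleanWeight can pick BS vs BS2.
    public abstract SketchScorer scorer(boolean scoreDocsInOrder);
}

abstract class SketchScorer {
    // Proposed: defaults to false; BooleanScorer (BS) would override to true.
    public boolean isOutOfOrder() { return false; }
}

class SearcherSketch {
    // The searcher asks for an out-of-order scorer exactly when the
    // collector tolerates out-of-order delivery.
    SketchScorer scorerFor(SketchQueryWeight weight, SketchCollector collector) {
        return weight.scorer(!collector.acceptsDocsOutOfOrder());
    }
}
```

The inverse direction mentioned in the issue (TFC/TSDC create methods) is the same handshake run the other way: ask for an out-of-order scorer first, then build the collector variant matching the scorer's isOutOfOrder() answer.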
Re: Improving TimeLimitedCollector
I like the overall approach. However it's very local to an IndexReader. I.e., if someone wanted to limit other operations (say indexing), or does not use an IndexReader (for a Scorer impl maybe), one cannot reuse it.

What if we factor out the timeout logic to a Timeout class (I think it can be a static class, with the way you implemented it) and use it in TimeLimitingIndexReader? That class can offer a method check() which will do the internal logic (the 'if' check and throw exception). It will be similar to the current ensureOpen() followed by an operation.

It might be considered more expensive since it won't check a boolean, but instead call a check() method, but it will be more reusable. Also, ensureOpen today is also a method call, so I don't think Timeout.check() is that bad. We can even later create a TimeLimitingIndexWriter and document the Timeout class for other usage by external code.

Aside, how about using a PQ for the threads' times, or a TreeMap? That will save looping over the collection to find the next candidate. Just an implementation detail though.

Shai

On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood wrote:
> Going back to my post re TimeLimitedIndexReaders - here's an incomplete but functional prototype:
> http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
> http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java
>
> The principle is that all reader accesses check a volatile variable indicating something may have timed out (no need to check thread locals etc.). If and only if a time out has been noted, threadlocals are checked to see which thread should throw a timeout exception.
>
> All time-limited use of a reader must be wrapped in try...finally calls to indicate the start and stop of a timed set of activities. A background thread maintains the next anticipated timeout deadline and simply waits until this is reached or the list of planned activities changes with new deadlines.
>
> Performance seems reasonable on my Wikipedia index:
>
> //some tests for heavy use of termenum/term docs
> Read term docs for 20 terms in 4755 ms using no timeout limit (warm up)
> Read term docs for 20 terms in 4320 ms using no timeout limit (warm up)
> Read term docs for 20 terms in 4320 ms using no timeout limit
> Read term docs for 20 terms in 4388 ms using reader with time-limited access
>
> //Example query with heavy use of termEnum/termDocs
> +text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
> +text:f* +text:a* +text:b* time limited collector matched 1090041 docs in 1963 ms
> +text:f* +text:a* +text:b* time limited reader matched 1090041 docs in 2121 ms
>
> //Example fuzzy match burning CPU reading TermEnum
> text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
> text:accomodation~0.5 time limited collector matched 192084 docs in 5923 ms
> text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms
>
> The reader approach to limiting time is slower but has these advantages:
>
> 1) Multiple reader activities can be time-limited rather than just single searches
> 2) No code changes required to scorers/queries/filters etc.
> 3) Tasks that spend plenty of time burning CPU before collection happens can be killed earlier
>
> I'm sure there are some thread-safety issues to work through in my code and not all reader classes are wrapped (e.g. TermPositions), but the basics are there and seem to be functioning.
>
> Thoughts?
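A minimal sketch of the factored-out Timeout idea being discussed (the class and method names follow the proposal in this thread but are otherwise hypothetical, and the flag handling is deliberately simplistic): a volatile flag keeps the common path to a single read, and per-thread deadlines are consulted only once a watchdog has raised the flag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Timeout utility in the spirit of the proposal: wrap any
// timed work in start()/stop(), and have shared resources call check()
// at their access points, much like ensureOpen().
class Timeout {
    // Flipped by a watchdog when some deadline may have passed, so the
    // usual cost of check() is one volatile read, not a map lookup.
    private static volatile boolean maybeTimedOut = false;
    private static final Map<Thread, Long> deadlines =
            new ConcurrentHashMap<Thread, Long>();

    static void start(long maxTimeMillis) {
        deadlines.put(Thread.currentThread(),
                Long.valueOf(System.currentTimeMillis() + maxTimeMillis));
    }

    static void stop() {
        deadlines.remove(Thread.currentThread());
    }

    // Called by a background watchdog thread (not shown) that sleeps
    // until the nearest deadline; a PQ/TreeMap would find it cheaply.
    static void signalPossibleTimeout() {
        maybeTimedOut = true;
    }

    // The ensureOpen() analogue: cheap test first, full check only if flagged.
    static void check() {
        if (!maybeTimedOut) {
            return;
        }
        Long deadline = deadlines.get(Thread.currentThread());
        if (deadline != null && System.currentTimeMillis() > deadline.longValue()) {
            throw new RuntimeException("time-limited activity exceeded its deadline");
        }
    }
}
```

Because nothing here is reader-specific, the same class could back a TimeLimitingIndexWriter or any custom code, which is the reusability argument made above.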
[jira] Resolved: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1630.
---
    Resolution: Fixed

Super -- I just committed this; thanks Shai.

> Mating Collector and Scorer on doc Id orderness
> -----------------------------------------------
>
>                 Key: LUCENE-1630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1630
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 2.9
Re: Improving TimeLimitedCollector
Thanks for the feedback, Shai.

So I guess you're suggesting breaking this out into a general utility class e.g. something like:

class TimeLimitedThreadActivity
{
    //called by client
    public static void startTimeLimitedActivity(long maxTimePermitted);
    public static void endTimeLimitedActivity();

    //called by resources (readers/writers) that need to be shared fairly by threads
    public static void checkActivityNotElapsed(); //throws some form of runtime exception
}

A downside of breaking it out into static methods like this is that a thread cannot run >1 time-limited activity simultaneously, but I guess that might be a reasonable restriction.

>> Aside, how about using a PQ for the threads' times, or a TreeMap? That will save looping over the collection to find the next candidate. Just an implementation detail though.

Yep, that was one of the rough edges - I just wanted to get raw timings first for all the "is timed out?" checks we're injecting into reader calls.

Cheers
Mark

On 27 Jun 2009, at 11:37, Shai Erera wrote:
> I like the overall approach. However it's very local to an IndexReader. [...]
Re: Improving TimeLimitedCollector
> A downside of breaking it out into static methods like this is that a thread cannot run >1 time-limited activity simultaneously but I guess that might be a reasonable restriction.

I'm not sure I understand that - how can a thread run >1 activity simultaneously anyway, and how does your impl in TimeLimitingIndexReader allow it to do so? You use the thread as a key to the map. Am I missing something?

Anyway, I think we can let go of the static methods and make them instance methods. I think .. if I want to use time-limited activities, I should create a TimeLimitedThreadActivity instance and pass it around, to TimeLimitingIndexReader (and maybe in the future to a similar **IndexWriter) and any other custom code I have which I want to put a time limit on.

A static class has the advantage of not needing to pass it around everywhere, and is accessible from everywhere, so that if we discover that limiting on IndexReader is not enough, and we want some of the scorers to check more frequently if they should stop, we won't need to pass that instance all the way down to them.

I don't mind keeping it static, but I also don't mind if it will be an instance passed around, since currently it's only passed to IndexReader.

Are you going to open an issue for that? Seems like a nice addition to me. Do you think it should belong in core or contrib? If 'core' then, if that's possible, I'd like to see all timeout classes under one package, including TimeLimitingCollector (which until 2.9 we can safely move around as we want). I don't mind working on that w/ you, if you want.

Shai

On Sat, Jun 27, 2009 at 2:24 PM, Mark Harwood wrote:
> Thanks for the feedback, Shai.
> So I guess you're suggesting breaking this out into a general utility class [...]
Re: Improving TimeLimitedCollector
Why don't you use Thread.interrupt(), .isInterrupted()?

On Sat, Jun 27, 2009 at 16:16, Shai Erera wrote:
>> A downside of breaking it out into static methods like this is that a thread cannot run >1 time-limited activity simultaneously but I guess that might be a reasonable restriction.
>
> I'm not sure I understand that - how can a thread run >1 activity simultaneously anyway, and how does your impl in TimeLimitingIndexReader allow it to do so? You use the thread as a key to the map. Am I missing something? [...]
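The interrupt-based alternative floated here might look something like the following sketch (names are hypothetical; a watchdog Timer interrupts the worker when its budget elapses, and the worker polls the interrupt status at its check points instead of consulting a deadline map):

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical Thread.interrupt()-based timeout: the interrupt flag
// replaces the volatile-flag-plus-threadlocals scheme.
class InterruptBasedTimeout {
    // Arms a daemon timer that interrupts 'worker' after maxTimeMillis.
    static Timer arm(final Thread worker, long maxTimeMillis) {
        Timer watchdog = new Timer(true);
        watchdog.schedule(new TimerTask() {
            public void run() {
                worker.interrupt();
            }
        }, maxTimeMillis);
        return watchdog;
    }

    // What check() inside reader calls could look like under this scheme.
    static void check() {
        if (Thread.interrupted()) { // note: also clears the flag
            throw new RuntimeException("activity timed out");
        }
    }
}
```

One caveat with this approach: interrupt() also wakes wait()/sleep() and causes channels implementing InterruptibleChannel to close themselves, so it reaches further than a dedicated timeout flag does; that may matter for code doing NIO-based index access.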
Re: Improving TimeLimitedCollector
Odd. I see you're responding to a message from Shai I didn't get. Some mail being dropped somewhere along the line..

> Why don't you use Thread.interrupt(), .isInterrupted()?

Not sure where exactly you mean for that?

> I'm not sure I understand that - how can a thread run >1 activity simultaneously anyway, and how does your impl in TimeLimitingIndexReader allow it to do so?

I can have a single thread using multiple readers/writers, and each reader/writer has different timed activities e.g.

reader1.startActivity(6)
try
    do reader 1 stuff
    reader2.startActivity(1);
    try
        do reader 2 stuff.
    finally
        reader2.endActivity();
finally
    reader1.endActivity();

Each reader has its own list of threads and their anticipated timeouts in relation to that particular resource. Bit of an edge case perhaps, but anything based on statics precludes this type of code. An instance of a timeout object would give this flexibility but, as you say, you would need to pass a timeoutable-activity context object around, which is a bit more hassle. Flexibility vs simplicity decision to be made here.

> I don't mind working on that w/ you, if you want.

Appreciated. I'll move the timeout management logic out of reader into another class and open a Jira issue so we can move the discussion/development there.

Cheers
Mark
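The instance-based variant being weighed here can be sketched briefly (class name hypothetical): each resource owns its own timeout context, so the nested reader1/reader2 pattern above works with independent deadlines, at the cost of passing the context around.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-instance timeout context: one per time-limited
// resource (reader, writer, ...), so a single thread can nest
// activities against different resources.
class ActivityTimeout {
    private final Map<Thread, Long> deadlines =
            new ConcurrentHashMap<Thread, Long>();

    void startActivity(long maxTimeMillis) {
        deadlines.put(Thread.currentThread(),
                Long.valueOf(System.currentTimeMillis() + maxTimeMillis));
    }

    void endActivity() {
        deadlines.remove(Thread.currentThread());
    }

    // Resources call this at their access points (cf. ensureOpen()).
    void check() {
        Long deadline = deadlines.get(Thread.currentThread());
        if (deadline != null && System.currentTimeMillis() > deadline.longValue()) {
            throw new RuntimeException("activity timed out on this resource");
        }
    }
}
```

With statics, the nested example above would need one shared deadline per thread; with instances, reader2's 1-second budget can expire without touching reader1's 6-second one.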
[jira] Created: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
---------------------------------------------------------------------------------------

                 Key: LUCENE-1719
                 URL: https://issues.apache.org/jira/browse/LUCENE-1719
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 2.4.1
            Reporter: Steven Rowe
            Priority: Trivial
             Fix For: 2.9

contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is faster than CollationKeyFilter, the JVM-provided java.text.Collator implementation in the same package. The javadocs of these classes should be modified to add a note to this effect.

My curiosity was piqued by [Robert Muir's comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x faster than CollationKeyFilter.

I timed the operation of these two classes, with Sun JVM versions 1.4.2/32-bit, 1.5.0/32- and 64-bit, and 1.6.0/64-bit, using 90k word lists of 4 languages (taken from the corresponding Debian wordlist packages and truncated to the first 90k words after a fixed random shuffling), using Collators at the default strength, on a Windows Vista 64-bit machine. I used an analysis pipeline consisting of WhitespaceTokenizer chained to the collation key filter, so to isolate the time taken by the collation key filters, I also timed WhitespaceTokenizer operating alone for each combination, and then subtracted it from both of the collation key analysis chains' times. The rightmost column represents the performance advantage of the ICU4J implementation (ICU) over the java.text.Collator implementation (JVM), after discounting the WhitespaceTokenizer time (WST): (JVM-WST) / (ICU-WST).
The best times out of 5 runs for each combination, in milliseconds, are as follows:

||Sun JVM||Language||java.text||ICU4J||WhitespaceTokenizer||ICU4J Improvement||
|1.4.2_17 (32 bit)|English|522|212|13|2.6x|
|1.4.2_17 (32 bit)|French|716|243|14|3.1x|
|1.4.2_17 (32 bit)|German|669|264|16|2.6x|
|1.4.2_17 (32 bit)|Ukrainian|931|474|25|2.0x|
|1.5.0_15 (32 bit)|English|604|176|16|3.7x|
|1.5.0_15 (32 bit)|French|817|209|17|4.2x|
|1.5.0_15 (32 bit)|German|799|225|20|3.8x|
|1.5.0_15 (32 bit)|Ukrainian|1029|436|26|2.4x|
|1.5.0_15 (64 bit)|English|431|89|10|5.3x|
|1.5.0_15 (64 bit)|French|562|112|11|5.5x|
|1.5.0_15 (64 bit)|German|567|116|13|5.4x|
|1.5.0_15 (64 bit)|Ukrainian|734|281|21|2.7x|
|1.6.0_13 (64 bit)|English|162|81|9|2.1x|
|1.6.0_13 (64 bit)|French|192|92|10|2.2x|
|1.6.0_13 (64 bit)|German|204|99|14|2.2x|
|1.6.0_13 (64 bit)|Ukrainian|273|202|21|1.4x|

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
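As background on what both filters actually index, here is a minimal, stdlib-only sketch of java.text collation keys (the JVM-provided path benchmarked above; the ICU4J path uses com.ibm.icu.text.Collator with the same shape). This is illustrative only, not the benchmark code:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {
    public static void main(String[] args) {
        // Collator at default strength, as in the timings above.
        Collator collator = Collator.getInstance(Locale.FRENCH);

        // Plain UTF-16 string order puts accented letters after 'z'...
        assert "élève".compareTo("zèbre") > 0;
        // ...while the collator applies linguistic order: e sorts before z.
        assert collator.compare("élève", "zèbre") < 0;

        // The filters index the CollationKey's byte form; byte-wise order
        // of the keys matches the collator's comparison order.
        CollationKey k1 = collator.getCollationKey("élève");
        CollationKey k2 = collator.getCollationKey("zèbre");
        assert k1.compareTo(k2) < 0;
    }
}
```

(Run with -ea to enable the assertions.) Generating the key once at index time is what makes collated sorting cheap at search time, for either implementation.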
[jira] Updated: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
[ https://issues.apache.org/jira/browse/LUCENE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-1719: Attachment: LUCENE-1719.patch Patch containing notes to add to collation key filter/analyzer classes' javadocs. > Add javadoc notes about ICUCollationKeyFilter's speed advantage over > CollationKeyFilter > --- > > Key: LUCENE-1719 > URL: https://issues.apache.org/jira/browse/LUCENE-1719 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/* >Affects Versions: 2.4.1 >Reporter: Steven Rowe >Priority: Trivial > Fix For: 2.9 > > Attachments: LUCENE-1719.patch > > > contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is > faster than CollationKeyFilter, the JVM-provided java.text.Collator > implementation in the same package. The javadocs of these classes should be > modified to add a note to this effect. > My curiosity was piqued by [Robert Muir's > comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] > on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x > faster than CollationKeyFilter. > I timed the operation of these two classes, with Sun JVM versions > 1.4.2/32-bit, 1.5.0/32- and 64-bit, and 1.6.0/64-bit, using 90k word lists of > 4 languages (taken from the corresponding Debian wordlist packages and > truncated to the first 90k words after a fixed random shuffling), using > Collators at the default strength, on a Windows Vista 64-bit machine. I used > an analysis pipeline consisting of WhitespaceTokenizer chained to the > collation key filter, so to isolate the time taken by the collation key > filters, I also timed WhitespaceTokenizer operating alone for each > combination, and then subtracted it from both of the collation key analysis > chains' times. 
[jira] Updated: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
[ https://issues.apache.org/jira/browse/LUCENE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-1719: Description: contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is faster than CollationKeyFilter, the JVM-provided java.text.Collator implementation in the same package. The javadocs of these classes should be modified to add a note to this effect. My curiosity was piqued by [Robert Muir's comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x faster than CollationKeyFilter. I timed the operation of these two classes, with Sun JVM versions 1.4.2/32-bit, 1.5.0/32- and 64-bit, and 1.6.0/64-bit, using 90k word lists of 4 languages (taken from the corresponding Debian wordlist packages and truncated to the first 90k words after a fixed random shuffling), using Collators at the default strength, on a Windows Vista 64-bit machine. I used an analysis pipeline consisting of WhitespaceTokenizer chained to the collation key filter, so to isolate the time taken by the collation key filters, I also timed WhitespaceTokenizer operating alone for each combination. The rightmost column represents the performance advantage of the ICU4J implementation (ICU) over the java.text.Collator implementation (JVM), after discounting the WhitespaceTokenizer time (WST): (ICU-WST) / (JVM-WST). 
The best times out of 5 runs for each combination, in milliseconds, are as follows: ||Sun JVM||Language||java.text||ICU4J||WhitespaceTokenizer||ICU4J Improvement|| |1.4.2_17 (32 bit)|English|522|212|13|2.6x| |1.4.2_17 (32 bit)|French|716|243|14|3.1x| |1.4.2_17 (32 bit)|German|669|264|16|2.6x| |1.4.2_17 (32 bit)|Ukranian|931|474|25|2.0x| |1.5.0_15 (32 bit)|English|604|176|16|3.7x| |1.5.0_15 (32 bit)|French|817|209|17|4.2x| |1.5.0_15 (32 bit)|German|799|225|20|3.8x| |1.5.0_15 (32 bit)|Ukranian|1029|436|26|2.4x| |1.5.0_15 (64 bit)|English|431|89|10|5.3x| |1.5.0_15 (64 bit)|French|562|112|11|5.5x| |1.5.0_15 (64 bit)|German|567|116|13|5.4x| |1.5.0_15 (64 bit)|Ukranian|734|281|21|2.7x| |1.6.0_13 (64 bit)|English|162|81|9|2.1x| |1.6.0_13 (64 bit)|French|192|92|10|2.2x| |1.6.0_13 (64 bit)|German|204|99|14|2.2x| |1.6.0_13 (64 bit)|Ukranian|273|202|21|1.4x| was: contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is faster than CollationKeyFilter, the JVM-provided java.text.Collator implementation in the same package. The javadocs of these classes should be modified to add a note to this effect. My curiosity was piqued by [Robert Muir's comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x faster than CollationKeyFilter. I timed the operation of these two classes, with Sun JVM versions 1.4.2/32-bit, 1.5.0/32- and 64-bit, and 1.6.0/64-bit, using 90k word lists of 4 languages (taken from the corresponding Debian wordlist packages and truncated to the first 90k words after a fixed random shuffling), using Collators at the default strength, on a Windows Vista 64-bit machine. 
I used an analysis pipeline consisting of WhitespaceTokenizer chained to the collation key filter, so to isolate the time taken by the collation key filters, I also timed WhitespaceTokenizer operating alone for each combination, and then subtracted it from both of the collation key analysis chains' times. The rightmost column represents the performance advantage of the ICU4J implemtation (ICU) over the java.text.Collator implementation (JVM), after discounting the WhitespaceTokenizer time (WST): (ICU-WST) / (JVM-WST). The best times out of 5 runs for each combination, in milliseconds, are as follows: ||Sun JVM||Language||java.text||ICU4J||WhitespaceTokenizer||ICU4J Improvement|| |1.4.2_17 (32 bit)|English|522|212|13|2.6x| |1.4.2_17 (32 bit)|French|716|243|14|3.1x| |1.4.2_17 (32 bit)|German|669|264|16|2.6x| |1.4.2_17 (32 bit)|Ukranian|931|474|25|2.0x| |1.5.0_15 (32 bit)|English|604|176|16|3.7x| |1.5.0_15 (32 bit)|French|817|209|17|4.2x| |1.5.0_15 (32 bit)|German|799|225|20|3.8x| |1.5.0_15 (32 bit)|Ukranian|1029|436|26|2.4x| |1.5.0_15 (64 bit)|English|431|89|10|5.3x| |1.5.0_15 (64 bit)|French|562|112|11|5.5x| |1.5.0_15 (64 bit)|German|567|116|13|5.4x| |1.5.0_15 (64 bit)|Ukranian|734|281|21|2.7x| |1.6.0_13 (64 bit)|English|162|81|9|2.1x| |1.6.0_13 (64 bit)|French|192|92|10|2.2x| |1.6.0_13 (64 bit)|German|204|99|14|2.2x| |1.6.0_13 (64 bit)|Ukranian|273|202|21|1.4x| Lucene Fields: [New, Patch Available] (was: [New]) > Add javadoc notes about ICUCollationKeyFilter's speed advantage over > CollationKeyFilter > --- > > Key: LUCENE-1719 > URL: https:/
[jira] Commented: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's speed advantage over CollationKeyFilter
[ https://issues.apache.org/jira/browse/LUCENE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724923#action_12724923 ] Steven Rowe commented on LUCENE-1719: - I also tested ICU4J version 4.2 (released 6 weeks ago), and the timings were nearly identical to those from ICU4J version 4.0 (the one that's in contrib/collation/lib/). The timings given in the table above were not produced with the "-server" option to the JVM. I separately tested all combinations using the "-server" option: it made no difference for the 32-bit JVMs, while the 64-bit JVMs were roughly 3-4% faster. I got the impression (I didn't actually calculate it) that although the best times of 5 runs were better for the 64-bit JVMs when using the "-server" option, the average times seemed to be slightly worse. In any case, the performance improvement of the ICU4J implementation over the java.text.Collator implementation was basically unaffected by the use of the "-server" JVM option. > Add javadoc notes about ICUCollationKeyFilter's speed advantage over > CollationKeyFilter > --- > > Key: LUCENE-1719 > URL: https://issues.apache.org/jira/browse/LUCENE-1719 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/* >Affects Versions: 2.4.1 >Reporter: Steven Rowe >Priority: Trivial > Fix For: 2.9 > > Attachments: LUCENE-1719.patch > > > contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is > faster than CollationKeyFilter, the JVM-provided java.text.Collator > implementation in the same package. The javadocs of these classes should be > modified to add a note to this effect. > My curiosity was piqued by [Robert Muir's > comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300] > on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x > faster than CollationKeyFilter. 
[jira] Commented: (LUCENE-1581) LowerCaseFilter should be able to be configured to use a specific locale.
[ https://issues.apache.org/jira/browse/LUCENE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724926#action_12724926 ] Steven Rowe commented on LUCENE-1581: - {quote} you could add the JDK collation key filter to core if you wanted a core fix. but the icu one is up to something like 30x faster than the jdk, so why bother :) {quote} LUCENE-1719 contains some timings I made of the relative speeds of these two implementations. In short, for the platform/language/collator/JVM version combinations I tested, the ICU4J implementation's speed advantage ranges from 1.4x to 5.5x.
> LowerCaseFilter should be able to be configured to use a specific locale.
> -
>
> Key: LUCENE-1581
> URL: https://issues.apache.org/jira/browse/LUCENE-1581
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Digy
> Attachments: TestTurkishCollation.java
>
>
> // Since I am a .Net programmer, sample codes will be in C#, but I don't think it would be a problem to understand them.
> //
> Assume an input text like "İ" and an analyzer like below:
> {code}
> public class SomeAnalyzer : Analyzer
> {
>     public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
>     {
>         TokenStream t = new SomeTokenizer(reader);
>         t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
>         t = new LowerCaseFilter(t);
>         return t;
>     }
> }
> {code}
>
> ASCIIFoldingFilter will return "I", and afterwards LowerCaseFilter will return "i" (if the locale is "en-US") or "ı" (if the locale is "tr-TR"; that means this token should be input to another instance of ASCIIFoldingFilter).
> So, calling LowerCaseFilter before ASCIIFoldingFilter would be a solution, but a better approach would be adding a new constructor to LowerCaseFilter and forcing it to use a specific locale. 
> {code}
> // Note: the parameter is renamed to "input" because "in" is a reserved
> // keyword in C# and would not compile as an identifier.
> public sealed class LowerCaseFilter : TokenFilter
> {
>     /* +++ */ System.Globalization.CultureInfo CultureInfo = System.Globalization.CultureInfo.CurrentCulture;
>
>     public LowerCaseFilter(TokenStream input) : base(input)
>     {
>     }
>
>     /* +++ */ public LowerCaseFilter(TokenStream input, System.Globalization.CultureInfo CultureInfo) : base(input)
>     /* +++ */ {
>     /* +++ */     this.CultureInfo = CultureInfo;
>     /* +++ */ }
>
>     public override Token Next(Token result)
>     {
>         result = Input.Next(result);
>         if (result != null)
>         {
>             char[] buffer = result.TermBuffer();
>             int length = result.termLength;
>             for (int i = 0; i < length; i++)
>                 /* +++ */ buffer[i] = System.Char.ToLower(buffer[i], CultureInfo);
>             return result;
>         }
>         else
>             return null;
>     }
> }
> {code}
> DIGY
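DIGY's example is C# (Lucene.Net), but the underlying locale issue is identical in Java's String.toLowerCase(Locale), which is what a locale-aware LowerCaseFilter for Lucene-Java would need to use. A minimal stdlib illustration of the Turkish-i behavior motivating this request:

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {
    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");

        // In Turkish, dotted and dotless i are distinct letters:
        // 'I' lowercases to dotless 'ı' (U+0131), not 'i'.
        System.out.println("I".toLowerCase(turkish));        // prints "ı"
        System.out.println("I".toLowerCase(Locale.ENGLISH)); // prints "i"

        // Dotted capital 'İ' (U+0130) lowercases to plain 'i' in Turkish.
        System.out.println("\u0130".toLowerCase(turkish));   // prints "i"
    }
}
```

This is why lowercasing with the default locale mangles Turkish text, and why passing an explicit locale (CultureInfo in .NET, Locale in Java) into the filter matters.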