[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725164#action_12725164
 ] 

Mark Harwood commented on LUCENE-1720:
--

Currently the class hinges on a fast-fail mechanism whereby all the many 
calls checking for a timeout very quickly test a single volatile 
boolean, anActivityHasTimedOut.
99.99% of calls are expected to fail this test (nothing has timed out) and fail 
quickly - I was reluctant to add any HashSet lookup etc. in there that would be 
needed to determine failure.

With that as a guiding principle, maybe the solution is to change

volatile boolean anActivityHasTimedOut

into

volatile int numberOfTimedOutThreads;

which would cater for >1 error condition at once. The fast-fail check then 
becomes:

{code}
if (numberOfTimedOutThreads > 0) {
    if (timedOutThreads.contains(Thread.currentThread())) {
        timedOutThreads.remove(Thread.currentThread());
        numberOfTimedOutThreads = timedOutThreads.size();
        throw new RuntimeException("activity timed out");
    }
}
{code}
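
(For illustration, a fuller hedged sketch of how those pieces might fit 
together; the class and method names here are assumptions for the sketch, not 
necessarily what the attached ActivityTimeMonitor.java does:)

{code}
import java.util.HashSet;
import java.util.Set;

public class TimeoutSketch {
    // Fast-fail flag: the common path reads only this volatile int.
    private static volatile int numberOfTimedOutThreads = 0;
    // Touched only on the rare slow path, under the class monitor.
    private static final Set<Thread> timedOutThreads = new HashSet<Thread>();

    // Called by a background timer when a thread exceeds its allotted time.
    static synchronized void markTimedOut(Thread t) {
        timedOutThreads.add(t);
        numberOfTimedOutThreads = timedOutThreads.size();
    }

    // Called on every monitored IndexReader operation; cheap unless a timeout occurred.
    static void checkForTimeout() {
        if (numberOfTimedOutThreads > 0) {
            synchronized (TimeoutSketch.class) {
                if (timedOutThreads.remove(Thread.currentThread())) {
                    numberOfTimedOutThreads = timedOutThreads.size();
                    throw new RuntimeException("activity timed out");
                }
            }
        }
    }
}
{code}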




 TimeLimitedIndexReader and associated utility class
 ---

 Key: LUCENE-1720
 URL: https://issues.apache.org/jira/browse/LUCENE-1720
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: ActivityTimedOutException.java, 
 ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, 
 TimeLimitedIndexReader.java


 An alternative to TimeLimitedCollector that has the following advantages:
 1) Any reader activity can be time-limited rather than just single searches, 
 e.g. the document retrieve phase.
 2) Times out faster (i.e. runaway queries such as fuzzies are detected quickly, 
 before the last collect stage of query processing).
 Uses a new utility timeout class that is independent of IndexReader.
 The initial contribution includes a performance test class, but I have not had 
 time as yet to work up a formal JUnit test.
 TimeLimitedIndexReader is coded as JDK 1.5 but can easily be undone.




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725168#action_12725168
 ] 

Eks Dev commented on LUCENE-1720:
-

It's a bit late for this issue, but maybe worth thinking about. We could change 
the semantics of this problem completely. Imo, the problem can be reformulated as 
"Provide the possibility to cancel running queries on a best-effort basis, with 
or without providing the results collected so far."

That would leave timer management to the end users and keep this issue focused 
on one Lucene core concern ... Timeout management can then be provided as an 
example somewhere: "How to implement timeout management using ..."











[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725172#action_12725172
 ] 

Shai Erera commented on LUCENE-1720:


bq. ... quickly testing a single volatile boolean, anActivityHasTimedOut.

Oh, I did not mean to skip this check. After anActivityHasTimedOut is true, 
instead of comparing Thread.currentThread() to firstAnticipatedThreadToFail, 
check if Thread.currentThread() is in the failed HashSet of threads, or 
something like that.

I totally agree this should be kept and used that way, and it's probably better 
than numberOfTimedOutThreads, since we don't need to inc/dec the latter on every 
failure, just set a boolean flag and test it.

bq. Imo, the problem can be reformulated as "Provide the possibility to cancel 
running queries on a best-effort basis, with or without providing the results 
collected so far."

That's where we started from, but Mark here wanted to provide a much more 
generalized way of stopping any other activity, not just search. With this 
utility class, someone can implement a TimeLimitedIndexWriter which times out 
indexing, merging etc. Search is just one operation which will be covered as 
well.

I also think that TimeLimitingCollector already provides the possibility to 
cancel running queries on a best-effort basis, and therefore someone who is 
interested in just that doesn't need to use TimeLimitedIndexReader. However, 
this approach seems much simpler if you want to ensure queries are stopped 
ASAP, w/o passing a Timeout object around or anything. This approach also 
guarantees (I think) that any custom Scorer which does a lot of work, but uses 
IndexReader for that, will be stopped, even if the Scorer's developer did not 
implement a Timeout mechanism. Right?




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725176#action_12725176
 ] 

Mark Harwood commented on LUCENE-1720:
--

bq. Oh, I did not mean to skip this check.

But the check is on a variable with a yes/no state. We need to cater for >1 
simultaneous timeout error condition in play. With only a boolean it could be 
hard to know precisely when to clear it, no?

bq. Mark here wanted to provide a much more generalized way of stopping any 
other activity, not just search

To be fair, I think the use case for IndexWriter is weaker. In a reader you have 
multiple users all expressing different queries and you want them all to share 
nicely with each other. In index writing it's typically a batch system indexing 
docs, and there's no fairness to mediate? Breaking it out into a utility class 
seems like a good idea anyway.




[jira] Created: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement

2009-06-29 Thread Robert Muir (JIRA)
SmartChineseAnalyzer javadoc improvement


 Key: LUCENE-1722
 URL: https://issues.apache.org/jira/browse/LUCENE-1722
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor


Chinese -> English, and corrections to match reality (removes several javadoc 
warnings)




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725182#action_12725182
 ] 

Eks Dev commented on LUCENE-1720:
-

Sure, I just wanted to sharpen the definition of what is a Lucene core issue and 
what we can leave to end users. It is not only about time; rather, it is about 
canceling search requests (or, even better, general activities). 




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725183#action_12725183
 ] 

Shai Erera commented on LUCENE-1720:


bq. With only a boolean it could be hard to know precisely when to clear it, no?

We can clear it when the timed-out threads' Set's size() is 0?

I agree that this issue is mostly about IndexReader (and hence the name), and 
that the scenario of IndexWriter is weaker. But a utility class together w/ the 
TimeLimitedIndexReader example can help someone write a TimeLimitedIndexWriter 
very easily, and/or reuse this utility elsewhere.
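
A minimal hedged sketch of that clearing rule, assuming the flag and set names 
from the earlier comments (the exception type is illustrative):

{code}
import java.util.HashSet;
import java.util.Set;

class TimeoutFlagSketch {
    static volatile boolean anActivityHasTimedOut = false;
    static final Set<Thread> timedOutThreads = new HashSet<Thread>();

    static void failIfTimedOut() {
        if (anActivityHasTimedOut) {                      // fast-fail check
            synchronized (timedOutThreads) {
                if (timedOutThreads.remove(Thread.currentThread())) {
                    if (timedOutThreads.isEmpty()) {
                        anActivityHasTimedOut = false;    // clear when size() is 0
                    }
                    throw new RuntimeException("activity timed out");
                }
            }
        }
    }
}
{code}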




[jira] Updated: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement

2009-06-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1722:


Attachment: LUCENE-1722.txt

patch file




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725197#action_12725197
 ] 

Mark Harwood commented on LUCENE-1720:
--

bq. any custom Scorer which does a lot of work, but uses IndexReader for that, 
will be stopped, even if the Scorer's developer did not implement a Timeout 
mechanism. Right?

Correct. I'm not familiar with the proposal to pass around a Timeout object but 
I get the idea and the code here would certainly avoid that overhead.

bq. We can clear it when the timed-out threads' Set's size() is 0?

Yes, that would work.





[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725200#action_12725200
 ] 

Shai Erera commented on LUCENE-1720:


bq. I'm not familiar with the proposal to pass around a Timeout object

On the email thread I offered to create on QueryWeight a scorer(IndexSearcher, 
boolean, boolean, Timeout) in order to pass a Timeout object to Scorer, and 
also create a TimeLimitedQuery. But it's no longer needed.




[jira] Reopened: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith reopened LUCENE-1705:
---


Looks like I found an issue with this.

The deleteAll() method isn't resetting the nextDocID on the DocumentsWriter (or 
some similar behaviour).

So the following sequence will result in an error:
* deleteAll()
* updateDocument(5, doc)
* commit()

This results in a delete for doc 5 getting buffered, but with a very high 
maxDocId. At the same time, the doc is added; however, the following will then 
occur on commit:
* flush segments to disk
* doc 5 is now in a segment on disk
* run deletes
* doc 5 is now blacklisted from the segment

Will work on fixing this and post a new patch (along with an updated test case)

(was worried I was missing an edge case)
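
For reference, a hedged sketch of the failing sequence as a standalone snippet 
(deleteAll() is the method added by the attached patch; the field name and 
directory setup are illustrative):

{code}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DeleteAllRepro {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("id", "5", Field.Store.YES, Field.Index.NOT_ANALYZED));

        writer.deleteAll();                               // from the attached patch
        writer.updateDocument(new Term("id", "5"), doc);  // buffers a delete with a stale max doc id
        writer.commit();                                  // doc 5 is flushed, then wrongly deleted
        writer.close();
    }
}
{code}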

 Add deleteAllDocuments() method to IndexWriter
 --

 Key: LUCENE-1705
 URL: https://issues.apache.org/jira/browse/LUCENE-1705
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Affects Versions: 2.4
Reporter: Tim Smith
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch


 Ideally, there would be a deleteAllDocuments() or clear() method on the 
 IndexWriter.
 This method should have the same performance and characteristics as:
 * currentWriter.close()
 * currentWriter = new IndexWriter(..., create=true,...)
 This would greatly optimize a delete-all-documents case. Using 
 deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large 
 existing index.
 IndexWriter.deleteAllDocuments() should have the same semantics as a 
 commit(), as far as index visibility goes (a new IndexReader opening would get 
 the empty index).
 I see this was previously asked for in LUCENE-932; however, it would be nice 
 to finally see this added, such that the IndexWriter would not need to be 
 closed to perform the clear, as this seems to be the general recommendation 
 for working with an IndexWriter now.
 The deleteAllDocuments() method should:
 * abort any background merges (they are pointless once a deleteAll has been 
 received)
 * write a new segments file referencing no segments
 This method would remove one of the final reasons I would ever need to close 
 an IndexWriter and reopen a new one.




[jira] Closed: (LUCENE-1706) Site search powered by Lucene/Solr

2009-06-29 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll closed LUCENE-1706.
---

   Resolution: Fixed
Lucene Fields:   (was: [New])

 Site search powered by Lucene/Solr
 --

 Key: LUCENE-1706
 URL: https://issues.apache.org/jira/browse/LUCENE-1706
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1706.patch, LUCENE-1706.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search. My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene, and we'd like to donate its use to the Lucene 
 community. Additionally, it allows one to search all of the Lucene content 
 from a single place, including web, wiki, JIRA and mail archives. See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 You can see it live on Mahout, Tika and Solr
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place. We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Lucene 
 site to search Lucene only content using Lucene/Solr. When a search is 
 submitted, it automatically selects the Lucene facet such that only Lucene 
 content is searched. From there, users can then narrow/broaden their search 
 criteria.
 I plan on committing in a 3 or 4 days.




[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: TestIndexWriterDelete.patch

Here's a patch to TestIndexWriterDelete that shows the problem.

After the deleteAll(), a document is added and a document is updated. The added 
document gets indexed; the updated document does not.




[jira] Updated: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-06-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1566:


Attachment: LUCENE-1566.patch

I was able to reproduce the bug on my machine using several JVMs. The attached 
patch is what I have ready by now - I thought I'd get it out there as soon as 
possible for discussion.
Tests pass on my side!

 Large Lucene index can hit false OOM due to Sun JRE issue
 -

 Key: LUCENE-1566
 URL: https://issues.apache.org/jira/browse/LUCENE-1566
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
 Attachments: LUCENE-1566.patch


 This is not a Lucene issue, but I want to open this so future google
 diggers can more easily find it.
 There's this nasty bug in Sun's JRE:
   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
 The gist seems to be, if you try to read a large (eg 200 MB) number of
 bytes during a single RandomAccessFile.read call, you can incorrectly
 hit OOM.  Lucene does this, with norms, since we read in one byte per
 doc per field with norms, as a contiguous array of length maxDoc().
 The workaround was a custom patch to do large file reads as several
 smaller reads.
 Background here:
   http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
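
A hedged sketch of the chunked-read workaround described above (the helper name 
and the chunk size are assumptions, not what the actual patch uses):

{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkedReadSketch {
    // Hypothetical cap per read call; the real patch's chunk size may differ.
    private static final int CHUNK_SIZE = 100 * 1024 * 1024;

    // Fill b by issuing several smaller reads instead of one huge one.
    static void readFully(RandomAccessFile file, byte[] b) throws IOException {
        int offset = 0;
        while (offset < b.length) {
            int len = Math.min(CHUNK_SIZE, b.length - offset);
            int read = file.read(b, offset, len);
            if (read == -1) {
                throw new EOFException("read past EOF");
            }
            offset += read;
        }
    }
}
{code}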




[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: DeleteAllFlushDocCountFix.patch

Here's a patch that fixes the deleteAll() + updateDocument() issue.

Just needed to set the FlushDocCount to 0 after aborting the outstanding 
documents.




[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter

2009-06-29 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1705:
--

Attachment: (was: TestIndexWriterDelete.patch)




[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725386#action_12725386
 ] 

Jason Rutherglen commented on LUCENE-1720:
--

Maybe we can benchmark this approach to see if it slows down
queries due to the Thread.currentThread() and hash lookup? As
this would go into 3.0 (?), maybe we can look at how to change
the Lucene API such that we pass in an argument to the
IndexReader methods where the timeout may be checked?
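
A rough micro-benchmark sketch of the two checks being compared (illustrative 
only; JIT effects make such loops unreliable, and all names here are 
assumptions):

{code}
import java.util.HashSet;
import java.util.Set;

public class FastFailBench {
    static volatile int numberOfTimedOutThreads = 0;

    public static void main(String[] args) {
        final int iters = 100000000;
        final Set<Thread> timedOutThreads = new HashSet<Thread>();
        long sink = 0;

        // Path 1: the volatile fast-fail check proposed above.
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            if (numberOfTimedOutThreads > 0) {
                sink++;
            }
        }
        long volatileNanos = System.nanoTime() - start;

        // Path 2: Thread.currentThread() plus a hash lookup on every call.
        start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            if (timedOutThreads.contains(Thread.currentThread())) {
                sink++;
            }
        }
        long lookupNanos = System.nanoTime() - start;

        System.out.println("volatile check: " + volatileNanos + " ns");
        System.out.println("hash lookup:    " + lookupNanos + " ns (sink=" + sink + ")");
    }
}
{code}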




[jira] Created: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)
KeywordTokenizer does not properly set the end offset
-

 Key: LUCENE-1723
 URL: https://issues.apache.org/jira/browse/LUCENE-1723
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
 Attachments: AnalyzerBug.java

KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating the 
bug. 

Below is a JUnit test that exercises various analyzers via a Highlighter 
instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
with the highlight tags, such as <b>thetext</b>. When using KeywordAnalyzer 
the tags appear before the text, for example: <b></b>thetext. 

Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
using NewKeywordAnalyzer the tags are properly placed around the text. 
NewKeywordTokenizer overrides the next method of KeywordTokenizer, setting 
the end offset for the returned Token. NewKeywordAnalyzer utilizes 
NewKeywordTokenizer to produce a proper token.

-
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

    @Test
    public void testWithHighlighting() throws IOException {
        String text = "thetext";
        WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };

        Highlighter highlighter = new Highlighter(
                new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(terms));

        Analyzer[] analazers = { new StandardAnalyzer(), new SimpleAnalyzer(),
                new StopAnalyzer(), new WhitespaceAnalyzer(),
                new NewKeywordAnalyzer(), new KeywordAnalyzer() };

        // Analyzers pass except KeywordAnalyzer
        for (Analyzer analazer : analazers) {
            String highighted = highlighter.getBestFragment(analazer,
                    "CONTENT", text);
            assertEquals("Failed for " + analazer.getClass().getName(),
                    "<b>" + text + "</b>", highighted);
            System.out.println(analazer.getClass().getName()
                    + " passed, value highlighted: " + highighted);
        }
    }
}

class NewKeywordAnalyzer extends KeywordAnalyzer {

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new NewKeywordTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else
            tokenizer.reset(reader);
        return tokenizer;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new NewKeywordTokenizer(reader);
    }
}

class NewKeywordTokenizer extends KeywordTokenizer {
    public NewKeywordTokenizer(Reader input) {
        super(input);
    }

    @Override
    public Token next(Token t) throws IOException {
        Token result = super.next(t);
        if (result != null) {
            // Fix: set the end offset that KeywordTokenizer leaves unset
            result.setEndOffset(result.termLength());
        }
        return result;
    }
}





[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Attachment: AnalyzerBug.java


[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Description: 
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating the 
bug. 

The attached JUnit test source exercises various analyzers via a Highlighter 
instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
with the highlight tags, such as <b>thetext</b>. When using KeywordAnalyzer 
the tags appear before the text, for example: <b></b>thetext. 

Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes in the 
attached test. When using NewKeywordAnalyzer the tags are properly placed 
around the text. NewKeywordTokenizer overrides the next method of 
KeywordTokenizer, setting the end offset for the returned Token. 
NewKeywordAnalyzer utilizes NewKeywordTokenizer to produce a proper token.


[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May updated LUCENE-1723:
-

Description: 
KeywordTokenizer sets the Token's term length attribute but appears to omit the 
end offset. The issue was discovered while using a highlighter with the 
KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating the 
bug. 

The attached JUnit test source exercises various analyzers via a Highlighter 
instance. Every analyzer but the KeywordAnalyzer successfully wraps the text 
with the highlight tags, such as <b>thetext</b>. When using KeywordAnalyzer 
the tags appear before the text, for example: <b></b>thetext. 

Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes in the 
attached test. When using NewKeywordAnalyzer the tags are properly placed 
around the text. NewKeywordTokenizer overrides the next method of 
KeywordTokenizer, setting the end offset for the returned Token. 
NewKeywordAnalyzer utilizes NewKeywordTokenizer to produce a proper token.

Unless there is an objection, I will gladly post a patch in the very near 
future.


[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString

2009-06-29 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725447#action_12725447
 ] 

David Smiley commented on LUCENE-1653:
--

I'm looking through DateTools now and can't help but want to clean it up some.  
One thing I see that is odd is the use of a Calendar in 
timeToString(long, resolution).  The first two lines look like this right now:
{code}
calInstance.setTimeInMillis(round(time, resolution));
Date date = calInstance.getTime();
{code}

Instead, it can simply be:
{code}
Date date = new Date(round(time, resolution));
{code}

Secondly... I think a good deal of logic can be cleaned up in the other methods, 
instead of the bunch of if-else statements that is a bad code smell.  Most of 
the logic of three of those methods could be put into Resolution and made 
tighter.

 Change DateTools to not create a Calendar in every call to dateToString or 
 timeToString
 ---

 Key: LUCENE-1653
 URL: https://issues.apache.org/jira/browse/LUCENE-1653
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Shai Erera
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1653.patch, LUCENE-1653.patch


 DateTools creates a Calendar instance on every call to dateToString and 
 timeToString. Specifically:
 # timeToString calls Calendar.getInstance on every call.
 # dateToString calls timeToString(date.getTime()), which then instantiates a 
 new Date(). I think we should change the order of the calls, or not have each 
 call the other.
 # round(), which is called from timeToString (after creating a Calendar 
 instance) creates another (!) Calendar instance ...
 Seems that if we synchronize the methods and create the Calendar instance 
 once (static), it should solve it.




[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725448#action_12725448
 ] 

Robert Muir commented on LUCENE-1723:
-

Dima, have you tried your test against the latest lucene trunk?

I got these results:
{noformat}
org.apache.lucene.analysis.standard.StandardAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.SimpleAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.StopAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.WhitespaceAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.NewKeywordAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.KeywordAnalyzer passed, value highlighted: <b>thetext</b>
{noformat}

maybe you can verify the same?


[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString

2009-06-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725456#action_12725456
 ] 

Shai Erera commented on LUCENE-1653:


In 3.0, when we move to Java 5, we can make Resolution an enum and then use a 
switch statement on the passed-in Resolution. But performance-wise I don't think 
it would make such a big difference, as we're already comparing instances, which 
should be relatively fast.

How will moving the logic of timeToString, stringToDate and round to Resolution 
make the code tighter? Resolution would still need to check its instance type 
in order to execute the right code. Unless we subclass Resolution internally 
and have each subclass implement just the code section of these 3 that it 
needs?
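
For illustration, a minimal sketch of that per-constant idea as a Java 5 enum; 
the constants and rounding rules below are hypothetical, not DateTools' actual 
code:

{noformat}
import java.util.Calendar;

// Sketch only: each enum constant carries its own rounding body, so a caller
// can invoke resolution.round(cal) with no instanceof checks or switches.
enum Resolution {
    SECOND {
        void round(Calendar cal) {
            cal.set(Calendar.MILLISECOND, 0);
        }
    },
    MINUTE {
        void round(Calendar cal) {
            cal.set(Calendar.SECOND, 0);
            cal.set(Calendar.MILLISECOND, 0);
        }
    },
    DAY {
        void round(Calendar cal) {
            cal.set(Calendar.HOUR_OF_DAY, 0);
            cal.set(Calendar.MINUTE, 0);
            cal.set(Calendar.SECOND, 0);
            cal.set(Calendar.MILLISECOND, 0);
        }
    };

    // Each constant implements just the section of rounding code it needs.
    abstract void round(Calendar cal);
}
{noformat}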

 Change DateTools to not create a Calendar in every call to dateToString or 
 timeToString
 ---

 Key: LUCENE-1653
 URL: https://issues.apache.org/jira/browse/LUCENE-1653
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Shai Erera
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1653.patch, LUCENE-1653.patch


 DateTools creates a Calendar instance on every call to dateToString and 
 timeToString. Specifically:
 # timeToString calls Calendar.getInstance on every call.
 # dateToString calls timeToString(date.getTime()), which then instantiates a 
 new Date(). I think we should change the order of the calls, or not have each 
 call the other.
 # round(), which is called from timeToString (after creating a Calendar 
 instance) creates another (!) Calendar instance ...
 Seems that if we synchronize the methods and create the Calendar instance 
 once (static), it should solve it.
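 A minimal sketch of that shared-instance idea (the class and method names are 
 hypothetical and the GMT zone is illustrative; this is not the actual DateTools 
 code):
 {noformat}
 import java.util.Calendar;
 import java.util.Locale;
 import java.util.TimeZone;

 // Sketch only: one static Calendar guarded by synchronization, instead of
 // Calendar.getInstance() on every dateToString/timeToString call.
 final class SharedCalendar {
     private static final Calendar CAL =
         Calendar.getInstance(TimeZone.getTimeZone("GMT"), Locale.US);

     static synchronized long roundToSecond(long time) {
         CAL.setTimeInMillis(time);
         CAL.set(Calendar.MILLISECOND, 0);
         return CAL.getTimeInMillis();
     }
 }
 {noformat}
 The trade-off is lock contention under concurrent use; a ThreadLocal<Calendar> 
 would avoid the synchronization at the cost of one instance per thread.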

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725460#action_12725460
 ] 

Dima May commented on LUCENE-1723:
--

Verified! You are absolutely correct, the bug has been fixed on the latest 
trunk. The next method in the KeywordTokenizer now sets the start and end 
offsets:

   reusableToken.setStartOffset(input.correctOffset(0));
   reusableToken.setEndOffset(input.correctOffset(upto));

I will resolve and close the ticket. Sorry for the trouble and thank you for 
the prompt attention. 
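
For anyone verifying locally, a quick sketch against the 2.4/2.9-era next(Token) 
API (the OffsetCheck class and the expected values are mine, for illustration):

{noformat}
import java.io.StringReader;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.Token;

// Sketch: with the trunk fix, the keyword token's offsets span the input.
public class OffsetCheck {
    public static void main(String[] args) throws Exception {
        Token token = new KeywordTokenizer(new StringReader("thetext"))
                .next(new Token());
        // Expect 0..7, i.e. start 0 and end "thetext".length(), after the fix.
        System.out.println(token.startOffset() + ".." + token.endOffset());
    }
}
{noformat}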


 KeywordTokenizer does not properly set the end offset
 -

 Key: LUCENE-1723
 URL: https://issues.apache.org/jira/browse/LUCENE-1723
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
 Attachments: AnalyzerBug.java


 KeywordTokenizer sets the Token's term length attribute but appears to omit 
 the end offset. The issue was discovered while using a highlighter with the 
 KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating 
 the bug. 
 Below is a JUnit test (source is also attached) that exercises various 
 analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
 successfully wraps the text with the highlight tags, such as 
 <b>thetext</b>. When using KeywordAnalyzer the tags appear before the text, 
 for example: <b></b>thetext. 
 Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
 using NewKeywordAnalyzer the tags are properly placed around the text. 
 NewKeywordTokenizer overrides the next method of KeywordTokenizer, setting 
 the end offset for the returned Token. NewKeywordAnalyzer utilizes 
 NewKeywordTokenizer to produce a proper token. 
 Unless there is an objection I will gladly post a patch in the very near 
 future. 
 -
 package lucene;
 import java.io.IOException;
 import java.io.Reader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.KeywordAnalyzer;
 import org.apache.lucene.analysis.KeywordTokenizer;
 import org.apache.lucene.analysis.SimpleAnalyzer;
 import org.apache.lucene.analysis.StopAnalyzer;
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.Tokenizer;
 import org.apache.lucene.analysis.WhitespaceAnalyzer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.search.highlight.Highlighter;
 import org.apache.lucene.search.highlight.QueryScorer;
 import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
 import org.apache.lucene.search.highlight.WeightedTerm;
 import org.junit.Test;
 import static org.junit.Assert.*;
 public class AnalyzerBug {
   @Test
   public void testWithHighlighting() throws IOException {
     String text = "thetext";
     WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
     Highlighter highlighter = new Highlighter(
         new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(terms));
     Analyzer[] analazers = { new StandardAnalyzer(), new SimpleAnalyzer(),
         new StopAnalyzer(), new WhitespaceAnalyzer(),
         new NewKeywordAnalyzer(), new KeywordAnalyzer() };
     // Analyzers pass except KeywordAnalyzer
     for (Analyzer analazer : analazers) {
       String highighted = highlighter.getBestFragment(analazer, "CONTENT", text);
       assertEquals("Failed for " + analazer.getClass().getName(),
           "<b>" + text + "</b>", highighted);
       System.out.println(analazer.getClass().getName()
           + " passed, value highlighted: " + highighted);
     }
   }
 }
 class NewKeywordAnalyzer extends KeywordAnalyzer {
   @Override
   public TokenStream reusableTokenStream(String fieldName, Reader reader)
       throws IOException {
     Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
     if (tokenizer == null) {
       tokenizer = new NewKeywordTokenizer(reader);
       setPreviousTokenStream(tokenizer);
     } else {
       tokenizer.reset(reader);
     }
     return tokenizer;
   }
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader) {
     return new NewKeywordTokenizer(reader);
   }
 }
 class NewKeywordTokenizer extends KeywordTokenizer {
   public NewKeywordTokenizer(Reader input) {
     super(input);
   }
   

[jira] Resolved: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May resolved LUCENE-1723.
--

   Resolution: Fixed
Fix Version/s: 2.9

 KeywordTokenizer does not properly set the end offset
 -

 Key: LUCENE-1723
 URL: https://issues.apache.org/jira/browse/LUCENE-1723
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
 Fix For: 2.9

 Attachments: AnalyzerBug.java


 KeywordTokenizer sets the Token's term length attribute but appears to omit 
 the end offset. The issue was discovered while using a highlighter with the 
 KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating 
 the bug. 
 Below is a JUnit test (source is also attached) that exercises various 
 analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
 successfully wraps the text with the highlight tags, such as 
 <b>thetext</b>. When using KeywordAnalyzer the tags appear before the text, 
 for example: <b></b>thetext. 
 Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
 using NewKeywordAnalyzer the tags are properly placed around the text. 
 NewKeywordTokenizer overrides the next method of KeywordTokenizer, setting 
 the end offset for the returned Token. NewKeywordAnalyzer utilizes 
 NewKeywordTokenizer to produce a proper token. 
 Unless there is an objection I will gladly post a patch in the very near 
 future. 
 -
 package lucene;
 import java.io.IOException;
 import java.io.Reader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.KeywordAnalyzer;
 import org.apache.lucene.analysis.KeywordTokenizer;
 import org.apache.lucene.analysis.SimpleAnalyzer;
 import org.apache.lucene.analysis.StopAnalyzer;
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.Tokenizer;
 import org.apache.lucene.analysis.WhitespaceAnalyzer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.search.highlight.Highlighter;
 import org.apache.lucene.search.highlight.QueryScorer;
 import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
 import org.apache.lucene.search.highlight.WeightedTerm;
 import org.junit.Test;
 import static org.junit.Assert.*;
 public class AnalyzerBug {
   @Test
   public void testWithHighlighting() throws IOException {
     String text = "thetext";
     WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
     Highlighter highlighter = new Highlighter(
         new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(terms));
     Analyzer[] analazers = { new StandardAnalyzer(), new SimpleAnalyzer(),
         new StopAnalyzer(), new WhitespaceAnalyzer(),
         new NewKeywordAnalyzer(), new KeywordAnalyzer() };
     // Analyzers pass except KeywordAnalyzer
     for (Analyzer analazer : analazers) {
       String highighted = highlighter.getBestFragment(analazer, "CONTENT", text);
       assertEquals("Failed for " + analazer.getClass().getName(),
           "<b>" + text + "</b>", highighted);
       System.out.println(analazer.getClass().getName()
           + " passed, value highlighted: " + highighted);
     }
   }
 }
 class NewKeywordAnalyzer extends KeywordAnalyzer {
   @Override
   public TokenStream reusableTokenStream(String fieldName, Reader reader)
       throws IOException {
     Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
     if (tokenizer == null) {
       tokenizer = new NewKeywordTokenizer(reader);
       setPreviousTokenStream(tokenizer);
     } else {
       tokenizer.reset(reader);
     }
     return tokenizer;
   }
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader) {
     return new NewKeywordTokenizer(reader);
   }
 }
 class NewKeywordTokenizer extends KeywordTokenizer {
   public NewKeywordTokenizer(Reader input) {
     super(input);
   }
   @Override
   public Token next(Token t) throws IOException {
     Token result = super.next(t);
     if (result != null) {
       result.setEndOffset(result.termLength());
     }
     return result;
   }
 }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email 

[jira] Closed: (LUCENE-1723) KeywordTokenizer does not properly set the end offset

2009-06-29 Thread Dima May (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima May closed LUCENE-1723.



 KeywordTokenizer does not properly set the end offset
 -

 Key: LUCENE-1723
 URL: https://issues.apache.org/jira/browse/LUCENE-1723
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
 Fix For: 2.9

 Attachments: AnalyzerBug.java


 KeywordTokenizer sets the Token's term length attribute but appears to omit 
 the end offset. The issue was discovered while using a highlighter with the 
 KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating 
 the bug. 
 Below is a JUnit test (source is also attached) that exercises various 
 analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer 
 successfully wraps the text with the highlight tags, such as 
 <b>thetext</b>. When using KeywordAnalyzer the tags appear before the text, 
 for example: <b></b>thetext. 
 Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes below. When 
 using NewKeywordAnalyzer the tags are properly placed around the text. 
 NewKeywordTokenizer overrides the next method of KeywordTokenizer, setting 
 the end offset for the returned Token. NewKeywordAnalyzer utilizes 
 NewKeywordTokenizer to produce a proper token. 
 Unless there is an objection I will gladly post a patch in the very near 
 future. 
 -
 package lucene;
 import java.io.IOException;
 import java.io.Reader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.KeywordAnalyzer;
 import org.apache.lucene.analysis.KeywordTokenizer;
 import org.apache.lucene.analysis.SimpleAnalyzer;
 import org.apache.lucene.analysis.StopAnalyzer;
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.Tokenizer;
 import org.apache.lucene.analysis.WhitespaceAnalyzer;
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.search.highlight.Highlighter;
 import org.apache.lucene.search.highlight.QueryScorer;
 import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
 import org.apache.lucene.search.highlight.WeightedTerm;
 import org.junit.Test;
 import static org.junit.Assert.*;
 public class AnalyzerBug {
   @Test
   public void testWithHighlighting() throws IOException {
     String text = "thetext";
     WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
     Highlighter highlighter = new Highlighter(
         new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(terms));
     Analyzer[] analazers = { new StandardAnalyzer(), new SimpleAnalyzer(),
         new StopAnalyzer(), new WhitespaceAnalyzer(),
         new NewKeywordAnalyzer(), new KeywordAnalyzer() };
     // Analyzers pass except KeywordAnalyzer
     for (Analyzer analazer : analazers) {
       String highighted = highlighter.getBestFragment(analazer, "CONTENT", text);
       assertEquals("Failed for " + analazer.getClass().getName(),
           "<b>" + text + "</b>", highighted);
       System.out.println(analazer.getClass().getName()
           + " passed, value highlighted: " + highighted);
     }
   }
 }
 class NewKeywordAnalyzer extends KeywordAnalyzer {
   @Override
   public TokenStream reusableTokenStream(String fieldName, Reader reader)
       throws IOException {
     Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
     if (tokenizer == null) {
       tokenizer = new NewKeywordTokenizer(reader);
       setPreviousTokenStream(tokenizer);
     } else {
       tokenizer.reset(reader);
     }
     return tokenizer;
   }
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader) {
     return new NewKeywordTokenizer(reader);
   }
 }
 class NewKeywordTokenizer extends KeywordTokenizer {
   public NewKeywordTokenizer(Reader input) {
     super(input);
   }
   @Override
   public Token next(Token t) throws IOException {
     Token result = super.next(t);
     if (result != null) {
       result.setEndOffset(result.termLength());
     }
     return result;
   }
 }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.