[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725164#action_12725164 ] Mark Harwood commented on LUCENE-1720: --

Currently the class hinges on a fast-fail mechanism whereby all the many calls checking for a timeout very quickly test a single volatile boolean, anActivityHasTimedOut. 99.99% of calls are expected to fail this test (nothing has timed out) and fail quickly - I was reluctant to add any HashSet lookup etc. in there needed to determine failure. With that as a guiding principle, maybe the solution is to change "volatile boolean anActivityHasTimedOut" into "volatile int numberOfTimedOutThreads", which would cater for more than one error condition at once. The fast-fail check then becomes:

    if (numberOfTimedOutThreads > 0) {
        if (timedOutThreads.contains(Thread.currentThread())) {
            timedOutThreads.remove(Thread.currentThread());
            numberOfTimedOutThreads = timedOutThreads.size();
            throw new RuntimeException(...);
        }
    }

TimeLimitedIndexReader and associated utility class
---

Key: LUCENE-1720
URL: https://issues.apache.org/jira/browse/LUCENE-1720
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
Attachments: ActivityTimedOutException.java, ActivityTimeMonitor.java, TestTimeLimitedIndexReader.java, TimeLimitedIndexReader.java

An alternative to TimeLimitedCollector that has the following advantages:
1) Any reader activity can be time-limited rather than just single searches, e.g. the document retrieve phase.
2) Times out faster (i.e. runaway queries such as fuzzies are detected quickly, before the last collect stage of query processing).

Uses a new utility timeout class that is independent of IndexReader. The initial contribution includes a performance test class, but I have not had time as yet to work up a formal JUnit test. TimeLimitedIndexReader is coded as JDK 1.5 but can easily be undone.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
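The fast-fail scheme in the comment above can be sketched as self-contained Java. This is a hypothetical TimeoutMonitor, not the attached ActivityTimeMonitor; the synchronized-set choice and the plain RuntimeException are assumptions for illustration:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed fast-fail timeout check: the common
// "nothing has timed out" path costs one volatile int read, and the
// Set is consulted only after a timeout has actually been flagged.
class TimeoutMonitor {
    // Volatile so checking threads see timer-thread updates without locking.
    private volatile int numberOfTimedOutThreads = 0;
    private final Set<Thread> timedOutThreads =
            Collections.synchronizedSet(new HashSet<Thread>());

    // Called by a background timer thread when an activity overruns.
    void markTimedOut(Thread t) {
        timedOutThreads.add(t);
        numberOfTimedOutThreads = timedOutThreads.size();
    }

    // Called on every reader operation; almost always falls straight through.
    void checkTimeout() {
        if (numberOfTimedOutThreads > 0) { // fast path: single volatile read
            if (timedOutThreads.remove(Thread.currentThread())) {
                numberOfTimedOutThreads = timedOutThreads.size();
                throw new RuntimeException("Activity timed out");
            }
        }
    }
}
```

Replacing the boolean with the set's size also answers the "when to clear it" question later in the thread: the flag clears itself once every timed-out thread has been reported, since the counter is recomputed from the set after each removal.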
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725168#action_12725168 ] Eks Dev commented on LUCENE-1720: -

It's a bit late for this issue, but maybe worth thinking about: we could change the semantics of this problem completely. Imo, the problem can be reformulated as "Provide the possibility to cancel running queries on a best-effort basis, with or without providing the results collected so far". That would leave timer management to the end users and keep the issue focused on the Lucene core ... Timeout management could then be provided as an example somewhere: "How to implement timeout management using ...".
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725172#action_12725172 ] Shai Erera commented on LUCENE-1720:

bq. ... quickly testing a single volatile boolean, anActivityHasTimedOut.

Oh, I did not mean to skip this check. After anActivityHasTimedOut is true, instead of comparing Thread.currentThread() to firstAnticipatedThreadToFail, check if Thread.currentThread() is in the failed HashSet of threads, or something like that. I totally agree this check should be kept and used that way, and it's probably better than numberOfTimedOutThreads since we don't need to inc/dec the latter on every failure - just set a boolean flag and test it.

bq. Imo, the problem can be reformulated as "Provide possibility to cancel running queries on best effort basis, with or without providing so far collected results".

That's where we started from, but Mark here wanted to provide a much more generalized way of stopping any other activity, not just search. With this utility class, someone can implement a TimeLimitedIndexWriter which times out indexing, merging etc. Search is just one operation which will be covered as well. I also think that TimeLimitingCollector already provides a possibility to cancel running queries on a best-effort basis, and therefore if someone is interested in just that, he doesn't need to use TimeLimitedIndexReader. However, this approach seems much simpler if you want to ensure queries are stopped ASAP, w/o passing a Timeout object around or anything. This approach also guarantees (I think) that any custom Scorer which does a lot of work, but uses IndexReader for that, will be stopped, even if the Scorer's developer did not implement a Timeout mechanism. Right?
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725176#action_12725176 ] Mark Harwood commented on LUCENE-1720: --

bq. Oh, I did not mean to skip this check.

But the check is on a variable with a yes/no state. We need to cater for > 1 simultaneous timeout error condition in play. With only a boolean it could be hard to know precisely when to clear it, no?

bq. Mark here wanted to provide a much more generalized way of stopping any other activity, not just search

To be fair, I think the use case for IndexWriter is weaker. In a reader you have multiple users all expressing different queries, and you want them all to share nicely with each other. In index writing it's typically a batch system indexing docs, and there's no fairness to mediate? Breaking it out into a utility class seems like a good idea anyway.
[jira] Created: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement
SmartChineseAnalyzer javadoc improvement
-

Key: LUCENE-1722
URL: https://issues.apache.org/jira/browse/LUCENE-1722
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor

Chinese -> English, and corrections to match reality (removes several javadoc warnings)
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725182#action_12725182 ] Eks Dev commented on LUCENE-1720: -

Sure. I just wanted to sharpen the definition of what is a Lucene core issue and what we can leave to end users. It is not only about time - rather, it is about canceling search requests (or even better, general activities).
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725183#action_12725183 ] Shai Erera commented on LUCENE-1720:

bq. With only a boolean it could be hard to know precisely when to clear it, no?

We can clear it when the timed-out threads Set's size() is 0? I agree that this issue is mostly about IndexReader (hence the name), and that the IndexWriter scenario is weaker. But a utility class, together w/ the TimeLimitedIndexReader example, can help someone write a TimeLimitedIndexWriter very easily, and/or reuse this utility elsewhere.
[jira] Updated: (LUCENE-1722) SmartChineseAnalyzer javadoc improvement
[ https://issues.apache.org/jira/browse/LUCENE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1722:

Attachment: LUCENE-1722.txt

patch file
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725197#action_12725197 ] Mark Harwood commented on LUCENE-1720: --

bq. any custom Scorer which does a lot of work, but uses IndexReader for that, will be stopped, even if the Scorer's developer did not implement a Timeout mechanism. Right?

Correct. I'm not familiar with the proposal to pass around a Timeout object, but I get the idea, and the code here would certainly avoid that overhead.

bq. We can clear it when the timed-out threads Set's size() is 0?

Yes, that would work.
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725200#action_12725200 ] Shai Erera commented on LUCENE-1720:

bq. I'm not familiar with the proposal to pass around a Timeout object

On the email thread I offered to add a scorer(IndexSearcher, boolean, boolean, Timeout) method on QueryWeight in order to pass a Timeout object to the Scorer, and also to create a TimeLimitedQuery. But it's no longer needed.
[jira] Reopened: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Smith reopened LUCENE-1705: ---

Looks like I found an issue with this. The deleteAll() method isn't resetting the nextDocID on the DocumentsWriter (or some similar behaviour), so the following sequence will result in an error:
* deleteAll()
* updateDocument(5, doc)
* commit()

This results in a delete for doc 5 getting buffered, but with a very high maxDocId. At the same time, the doc is added; however, the following will then occur on commit:
* flush segments to disk
* doc 5 is now in a segment on disk
* run deletes
* doc 5 is now blacklisted from the segment

Will work on fixing this and post a new patch (along with an updated test case). (Was worried I was missing an edge case.)

Add deleteAllDocuments() method to IndexWriter
--

Key: LUCENE-1705
URL: https://issues.apache.org/jira/browse/LUCENE-1705
Project: Lucene - Java
Issue Type: Wish
Components: Index
Affects Versions: 2.4
Reporter: Tim Smith
Assignee: Michael McCandless
Fix For: 2.9
Attachments: IndexWriterDeleteAll.patch, LUCENE-1705.patch

Ideally, there would be a deleteAllDocuments() or clear() method on the IndexWriter. This method should have the same performance and characteristics as:
* currentWriter.close()
* currentWriter = new IndexWriter(..., create=true, ...)

This would greatly optimize a "delete all documents" case. Using deleteDocuments(new MatchAllDocsQuery()) could be expensive given a large existing index.

IndexWriter.deleteAllDocuments() should have the same semantics as a commit(), as far as index visibility goes (a new IndexReader opening would get the empty index).

I see this was previously asked for in LUCENE-932; however, it would be nice to finally see this added such that the IndexWriter would not need to be closed to perform the clear, as this seems to be the general recommendation for working with an IndexWriter now.

The deleteAllDocuments() method should:
* abort any background merges (they are pointless once a deleteAll has been received)
* write a new segments file referencing no segments

This method would remove one of the final reasons I would ever need to close an IndexWriter and reopen a new one.
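The failure mode above can be illustrated with a toy model in plain Java. This is not IndexWriter's real internals - the class, the doc-counter name, and the delete-ceiling bookkeeping are all simplifications - but it shows why a delete buffered by updateDocument() after deleteAll() can wrongly cover the freshly added document when the counter is not reset:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene's DocumentsWriter): a buffered delete records a
// doc-id ceiling, and on commit it removes matching docs whose id is
// strictly below that ceiling. The ceiling comes from a running counter;
// if deleteAll() forgets to reset it, a later updateDocument() buffers a
// delete with a stale high ceiling that blacklists the new document.
class ToyWriter {
    private final Map<String, Integer> bufferedDeletes = new HashMap<>();
    private final List<String[]> docs = new ArrayList<>(); // {term, value}; doc id = index
    private int docCounter = 0;                             // feeds delete ceilings
    private final boolean resetCounterOnDeleteAll;

    ToyWriter(boolean resetCounterOnDeleteAll) {
        this.resetCounterOnDeleteAll = resetCounterOnDeleteAll;
    }

    void add(String term, String value) {
        docs.add(new String[] { term, value });
        docCounter++;
    }

    void deleteAll() {
        docs.clear();
        bufferedDeletes.clear();
        if (resetCounterOnDeleteAll) {
            docCounter = 0; // the step the original patch was missing
        }
    }

    void updateDocument(String term, String value) {
        bufferedDeletes.put(term, docCounter); // delete older docs with this term
        add(term, value);
    }

    List<String> commit() {
        // Apply buffered deletes: drop docs below the recorded ceiling.
        List<String> visible = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            String[] d = docs.get(id);
            Integer ceiling = bufferedDeletes.get(d[0]);
            if (ceiling == null || id >= ceiling) {
                visible.add(d[1]);
            }
        }
        bufferedDeletes.clear();
        return visible;
    }
}
```

With the reset disabled, the deleteAll/updateDocument/commit sequence leaves the index empty (the new doc is blacklisted by its own buffered delete); with the reset in place the updated document survives the commit.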
[jira] Closed: (LUCENE-1706) Site search powered by Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll closed LUCENE-1706. ---

Resolution: Fixed
Lucene Fields: (was: [New])

Site search powered by Lucene/Solr
--

Key: LUCENE-1706
URL: https://issues.apache.org/jira/browse/LUCENE-1706
Project: Lucene - Java
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1706.patch, LUCENE-1706.patch

For a number of years now, the Lucene community has been criticized for not eating our own dog food when it comes to search. My company has built and hosts a site search (http://www.lucidimagination.com/search) that is powered by Apache Solr and Lucene, and we'd like to donate its use to the Lucene community. Additionally, it allows one to search all of the Lucene content from a single place, including web, wiki, JIRA and mail archives. See also http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org

You can see it live on Mahout, Tika and Solr. Lucid has a fault-tolerant setup with replication and failover, as well as monitoring services in place. We are committed to maintaining and expanding the search capabilities on the site.

The following patch adds a skin to the Forrest site that enables the Lucene site to search Lucene-only content using Lucene/Solr. When a search is submitted, it automatically selects the Lucene facet such that only Lucene content is searched. From there, users can then narrow/broaden their search criteria. I plan on committing in 3 or 4 days.
[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Smith updated LUCENE-1705: --

Attachment: TestIndexWriterDelete.patch

Here's a patch to TestIndexWriterDelete that shows the problem: after the deleteAll(), a document is added and a document is updated; the added document gets indexed, but the updated document does not.
[jira] Updated: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1566:

Attachment: LUCENE-1566.patch

I was able to reproduce the bug on my machine using several JVMs. The attached patch is what I have ready by now - I thought I'd get it out there as soon as possible for discussion. Tests pass on my side!

Large Lucene index can hit false OOM due to Sun JRE issue
-

Key: LUCENE-1566
URL: https://issues.apache.org/jira/browse/LUCENE-1566
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
Attachments: LUCENE-1566.patch

This is not a Lucene issue, but I want to open this so future Google diggers can more easily find it. There's this nasty bug in Sun's JRE: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546

The gist seems to be: if you try to read a large (e.g. 200 MB) number of bytes during a single RandomAccessFile.read call, you can incorrectly hit OOM. Lucene does this with norms, since we read in one byte per doc per field with norms, as a contiguous array of length maxDoc(). The workaround was a custom patch to do large file reads as several smaller reads. Background here: http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
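The "several smaller reads" workaround described above might look roughly like this. The helper class and the 1 MB chunk size are assumptions for illustration, not Lucene's actual patch:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: split one potentially huge RandomAccessFile.read
// call into bounded chunks, so the JRE never has to service a single
// multi-hundred-megabyte read (the trigger for Sun bug 6478546).
final class ChunkedReader {
    private static final int CHUNK_SIZE = 1 << 20; // 1 MB per underlying read

    static void readFully(RandomAccessFile file, byte[] dest) throws IOException {
        int offset = 0;
        while (offset < dest.length) {
            int toRead = Math.min(CHUNK_SIZE, dest.length - offset);
            int read = file.read(dest, offset, toRead);
            if (read < 0) {
                throw new IOException("Unexpected EOF at offset " + offset);
            }
            offset += read; // read() may return fewer bytes than requested
        }
    }
}
```

Note the loop also handles short reads, since RandomAccessFile.read(byte[], int, int) is allowed to return fewer bytes than requested.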
[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Smith updated LUCENE-1705: --

Attachment: DeleteAllFlushDocCountFix.patch

Here's a patch that fixes the deleteAll() + updateDocument() issue - just needed to set the FlushDocCount to 0 after aborting the outstanding documents.
[jira] Updated: (LUCENE-1705) Add deleteAllDocuments() method to IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Smith updated LUCENE-1705: --

Attachment: (was: TestIndexWriterDelete.patch)
[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725386#action_12725386 ] Jason Rutherglen commented on LUCENE-1720: --

Maybe we can benchmark this approach to see if it slows down queries due to the Thread.currentThread() and hash lookup? As this would go into 3.0 (?), maybe we can look at how to change the Lucene API such that we pass in an argument to the IndexReader methods where the timeout may be checked for?
[jira] Created: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
KeywordTokenizer does not properly set the end offset
-
Key: LUCENE-1723
URL: https://issues.apache.org/jira/browse/LUCENE-1723
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.4.1
Reporter: Dima May
Priority: Minor
Attachments: AnalyzerBug.java

KeywordTokenizer sets the Token's term length attribute but appears to omit the end offset. The issue was discovered while using a highlighter with the KeywordAnalyzer. KeywordAnalyzer delegates to KeywordTokenizer, propagating the bug. Below is a JUnit test that exercises various analyzers via a Highlighter instance. Every analyzer but the KeywordAnalyzer successfully wraps the text with the highlight tags, such as <b>thetext</b>. When using KeywordAnalyzer the tags appear before the text, for example: <b></b>thetext. Please note the NewKeywordAnalyzer and NewKeywordTokenizer classes below. When using NewKeywordAnalyzer the tags are properly placed around the text. NewKeywordTokenizer overrides the next method of KeywordTokenizer, setting the end offset for the returned Token. NewKeywordAnalyzer utilizes NewKeywordTokenizer to produce a proper token.
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

    @Test
    public void testWithHighlighting() throws IOException {
        String text = "thetext";
        WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(
                "<b>", "</b>"), new QueryScorer(terms));
        Analyzer[] analazers = { new StandardAnalyzer(), new SimpleAnalyzer(),
                new StopAnalyzer(), new WhitespaceAnalyzer(),
                new NewKeywordAnalyzer(), new KeywordAnalyzer() };
        // Analyzers pass except KeywordAnalyzer
        for (Analyzer analazer : analazers) {
            String highighted = highlighter.getBestFragment(analazer,
                    "CONTENT", text);
            assertEquals("Failed for " + analazer.getClass().getName(),
                    "<b>" + text + "</b>", highighted);
            System.out.println(analazer.getClass().getName()
                    + " passed, value highlighted: " + highighted);
        }
    }
}

class NewKeywordAnalyzer extends KeywordAnalyzer {
    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new NewKeywordTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else
            tokenizer.reset(reader);
        return tokenizer;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new NewKeywordTokenizer(reader);
    }
}

class NewKeywordTokenizer extends KeywordTokenizer {
    public NewKeywordTokenizer(Reader input) {
        super(input);
    }

    @Override
    public Token next(Token t) throws IOException {
        Token result = super.next(t);
        if (result != null) {
            result.setEndOffset(result.termLength());
        }
        return result;
    }
}
[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima May updated LUCENE-1723:
--
Attachment: AnalyzerBug.java
[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima May updated LUCENE-1723:
--
Description: updated (notes that the JUnit test source is also attached)
[jira] Updated: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima May updated LUCENE-1723:
--
Description: updated (adds: "Unless there is an objection I will gladly post a patch in the very near future.")
[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString
[ https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725447#action_12725447 ] David Smiley commented on LUCENE-1653:
--
I'm looking through DateTools now and can't help but want to clean it up some. One thing I see that is odd is the use of a Calendar in timeToString(long, resolution). The first two lines look like this right now:
{code}
calInstance.setTimeInMillis(round(time, resolution));
Date date = calInstance.getTime();
{code}
Instead, it can simply be:
{code}
Date date = new Date(round(time, resolution));
{code}
Secondly... I think a good deal of logic in the other methods can be cleaned up, replacing a bunch of if-else statements, which are a bad code smell. Most of the logic of three of those methods could be put into Resolution and be made tighter.

Change DateTools to not create a Calendar in every call to dateToString or timeToString
---
Key: LUCENE-1653
URL: https://issues.apache.org/jira/browse/LUCENE-1653
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Reporter: Shai Erera
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1653.patch, LUCENE-1653.patch

DateTools creates a Calendar instance on every call to dateToString and timeToString. Specifically:
# timeToString calls Calendar.getInstance on every call.
# dateToString calls timeToString(date.getTime()), which then instantiates a new Date(). I think we should change the order of the calls, or not have each call the other.
# round(), which is called from timeToString (after creating a Calendar instance), creates another (!) Calendar instance ...

Seems that if we synchronize the methods and create the Calendar instance once (static), it should solve it.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
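The equivalence behind David's simplification -- that setTimeInMillis followed by getTime is just a roundabout new Date -- can be checked directly. This is a standalone sketch, not DateTools code; the epoch value is arbitrary:

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Standalone check that Calendar.setTimeInMillis(t) + getTime() yields the
// same Date as new Date(t), i.e. the Calendar round-trip buys nothing here.
public class DateRoundTripCheck {
    public static void main(String[] args) {
        long time = 1246406400000L; // arbitrary epoch millis
        Calendar calInstance = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        calInstance.setTimeInMillis(time);
        Date viaCalendar = calInstance.getTime(); // the current two-line path
        Date direct = new Date(time);             // the proposed one-liner
        System.out.println(viaCalendar.equals(direct)); // prints "true"
    }
}
```

This also avoids Calendar's mutable shared state, which matters for the synchronization concerns raised in the issue description.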
[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725448#action_12725448 ] Robert Muir commented on LUCENE-1723:
--
Dima, have you tried your test against the latest lucene trunk? I got these results:
{noformat}
org.apache.lucene.analysis.standard.StandardAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.SimpleAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.StopAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.WhitespaceAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.NewKeywordAnalyzer passed, value highlighted: <b>thetext</b>
org.apache.lucene.analysis.KeywordAnalyzer passed, value highlighted: <b>thetext</b>
{noformat}
maybe you can verify the same?
[jira] Commented: (LUCENE-1653) Change DateTools to not create a Calendar in every call to dateToString or timeToString
[ https://issues.apache.org/jira/browse/LUCENE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725456#action_12725456 ] Shai Erera commented on LUCENE-1653:
--
In 3.0, when we move to Java 5, we can make Resolution an enum and then use a switch statement on the passed-in Resolution. But performance-wise I don't think it would make such a big difference, as we're already comparing instances, which should be relatively fast. How will moving the logic of timeToString, stringToDate and round to Resolution make the code tighter? Resolution would still need to check its instance type in order to execute the right code. Unless we subclass Resolution internally and have each subclass implement just the code section of these 3 that it needs?
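The last option Shai describes -- each subclass implementing just the code it needs -- is what Java 5 enums provide directly. A hypothetical sketch (Lucene's actual Resolution class and its Calendar-based rounding differ; plain UTC modulo arithmetic is used here only for the fixed-width resolutions):

```java
// Hypothetical sketch: each Resolution constant carries its own rounding
// width, so round() needs no if-else chain or switch. Variable-width
// resolutions (MONTH, YEAR) would instead override round() per constant
// with Calendar-based logic.
enum ResolutionSketch {
    SECOND(1000L),
    MINUTE(60L * 1000),
    HOUR(60L * 60 * 1000),
    DAY(24L * 60 * 60 * 1000);

    private final long millis;

    ResolutionSketch(long millis) {
        this.millis = millis;
    }

    // Round epoch millis down to this resolution (UTC).
    long round(long time) {
        return time - (time % millis);
    }
}
```

With this shape, timeToString and friends would dispatch through the Resolution instance itself rather than testing its type, which is the "tighter" structure being debated.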
[jira] Commented: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725460#action_12725460 ] Dima May commented on LUCENE-1723:
--
Verified! You are absolutely correct, the bug has been fixed on the latest trunk. The next method in KeywordTokenizer now sets the start and end offsets:
reusableToken.setStartOffset(input.correctOffset(0));
reusableToken.setEndOffset(input.correctOffset(upto));
I will resolve and close the ticket. Sorry for the trouble and thank you for the prompt attention.
[jira] Resolved: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima May resolved LUCENE-1723.
--
Resolution: Fixed
Fix Version/s: 2.9
package lucene;

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.WeightedTerm;
import org.junit.Test;
import static org.junit.Assert.*;

public class AnalyzerBug {

    @Test
    public void testWithHighlighting() throws IOException {
        String text = "thetext";
        WeightedTerm[] terms = { new WeightedTerm(1.0f, text) };
        Highlighter highlighter = new Highlighter(
                new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(terms));
        Analyzer[] analyzers = { new StandardAnalyzer(), new SimpleAnalyzer(),
                new StopAnalyzer(), new WhitespaceAnalyzer(),
                new NewKeywordAnalyzer(), new KeywordAnalyzer() };
        // All analyzers pass except KeywordAnalyzer
        for (Analyzer analyzer : analyzers) {
            String highlighted = highlighter.getBestFragment(analyzer, "CONTENT", text);
            assertEquals("Failed for " + analyzer.getClass().getName(),
                    "<b>" + text + "</b>", highlighted);
            System.out.println(analyzer.getClass().getName()
                    + " passed, value highlighted: " + highlighted);
        }
    }
}

class NewKeywordAnalyzer extends KeywordAnalyzer {
    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new NewKeywordTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else {
            tokenizer.reset(reader);
        }
        return tokenizer;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new NewKeywordTokenizer(reader);
    }
}

class NewKeywordTokenizer extends KeywordTokenizer {
    public NewKeywordTokenizer(Reader input) {
        super(input);
    }

    @Override
    public Token next(Token t) throws IOException {
        Token result = super.next(t);
        if (result != null) {
            // KeywordTokenizer leaves the end offset unset; use the term length
            result.setEndOffset(result.termLength());
        }
        return result;
    }
}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
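The misplaced tags follow directly from the offsets: a highlighter splices its tags into the original text around the token's character range from start offset to end offset, so a token whose end offset was never set (and thus stays 0) yields an empty highlighted span in front of the text. Below is a minimal, Lucene-free sketch of that splicing; `HighlightOffsetDemo` and `highlight()` are hypothetical names for illustration, not Lucene API.

```java
// Illustrative sketch of how a highlighter places tags using token offsets.
// Hypothetical class/method names; not Lucene code.
public class HighlightOffsetDemo {

    // Wrap the [start, end) slice of text in <b>...</b>, the way a
    // formatter rebuilds a fragment from a token's character offsets.
    static String highlight(String text, int start, int end) {
        return text.substring(0, start)
                + "<b>" + text.substring(start, end) + "</b>"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String text = "thetext";
        // Correct offsets: end offset equals the term length (7)
        System.out.println(highlight(text, 0, text.length())); // <b>thetext</b>
        // The KeywordTokenizer bug: end offset left at 0, empty span up front
        System.out.println(highlight(text, 0, 0));             // <b></b>thetext
    }
}
```

With the fix, NewKeywordTokenizer sets the end offset to termLength(), which corresponds to the first, correct case above.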
[jira] Closed: (LUCENE-1723) KeywordTokenizer does not properly set the end offset
[ https://issues.apache.org/jira/browse/LUCENE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dima May closed LUCENE-1723.
----------------------------

KeywordTokenizer does not properly set the end offset
-----------------------------------------------------

                 Key: LUCENE-1723
                 URL: https://issues.apache.org/jira/browse/LUCENE-1723
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.4.1
            Reporter: Dima May
            Priority: Minor
             Fix For: 2.9
         Attachments: AnalyzerBug.java

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.