[jira] Assigned: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-1213: --- Assignee: Doron Cohen MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Trejkaz Assignee: Doron Cohen Attachments: multifield-fix.patch MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
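The delegation direction is the crux of the bug report above. A minimal sketch with hypothetical stand-in classes (these are not the real Lucene parser classes) shows why the three-argument overload must call the three-argument super method so the slop survives, and why calling the two-argument overload instead either drops the slop or recurses:

```java
// Simplified stand-ins for QueryParser/MultiFieldQueryParser -- hypothetical,
// not the actual Lucene classes -- illustrating the delegation fix.
class BaseParser {
    String getFieldQuery(String field, String text) {
        return getFieldQuery(field, text, 0); // no explicit slop
    }

    String getFieldQuery(String field, String text, int slop) {
        return field + ":\"" + text + "\"~" + slop;
    }
}

class MultiParser extends BaseParser {
    @Override
    String getFieldQuery(String field, String text, int slop) {
        // Calling super.getFieldQuery(field, text) here would discard slop;
        // calling this.getFieldQuery(field, text) would recurse forever once
        // the two-argument method is defined in terms of this overload.
        return super.getFieldQuery(field, text, slop);
    }
}
```

With this direction of delegation, subclasses can still override the two-argument form without breaking the slop-aware path.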
[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1213: Attachment: multifield-fix.patch Trejkaz, thanks for the patch. Attached a slightly compacted fix (refactoring slop-applying to a separate method). Also added a test that fails without this fix. All tests pass; if there are no comments I will commit this in a day or two. MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Trejkaz Assignee: Doron Cohen Attachments: multifield-fix.patch, multifield-fix.patch MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
[ https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576927#action_12576927 ] Michael McCandless commented on LUCENE-1210: Yes, I agree. At some point soon we should do a 2.3.2 point release, and I'll port this issue back for that. IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception -- Key: LUCENE-1210 URL: https://issues.apache.org/jira/browse/LUCENE-1210 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3, 2.3.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 If you're using CMS (the default) and mergeInit hits an exception (eg OOME), we are not properly clearing IndexWriter's internal tracking of running merges. This causes IW.close() to hang while it incorrectly waits for these non-started merges to finish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
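The fix pattern for this class of bug is to unregister the merge if its initialization throws, so close() no longer waits on a merge that never started. A sketch with hypothetical names (MergeTracker, runningMerges, initMerge are stand-ins, not the actual IndexWriter internals):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: if initializing a merge fails (e.g. OutOfMemoryError),
// remove it from the set of running merges so close() does not hang waiting
// for it to finish. Not the actual Lucene code.
class MergeTracker {
    final Set<String> runningMerges = new HashSet<>();

    void startMerge(String merge, Runnable initMerge) {
        runningMerges.add(merge);
        boolean success = false;
        try {
            initMerge.run();   // may throw
            success = true;
        } finally {
            if (!success) {
                runningMerges.remove(merge); // the missing cleanup described above
            }
        }
    }
}
```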
[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
[ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576941#action_12576941 ] Michael McCandless commented on LUCENE-1208: Agreed. I'm thinking these issues should be ported to 2.3.2: LUCENE-1191 LUCENE-1197 LUCENE-1198 LUCENE-1199 LUCENE-1200 LUCENE-1208 (this issue) LUCENE-1210 Deadlock case in IndexWriter on exception just before flush --- Key: LUCENE-1208 URL: https://issues.apache.org/jira/browse/LUCENE-1208 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3, 2.3.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 Attachments: LUCENE-1208.patch If a document hits a non-aborting exception, eg something goes wrong in tokenStream.next(), and, that document had triggered a flush (due to RAM or doc count) then DocumentsWriter will deadlock because that thread marks the flush as pending but fails to clear it on exception. I have a simple test case showing this, and a fix fixing it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
[ https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576924#action_12576924 ] Michele Bini commented on LUCENE-1210: -- Uhm, shouldn't the patch be committed in the 2.3 branch, too, as it affects 2.3.1? IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception -- Key: LUCENE-1210 URL: https://issues.apache.org/jira/browse/LUCENE-1210 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3, 2.3.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 If you're using CMS (the default) and mergeInit hits an exception (eg OOME), we are not properly clearing IndexWriter's internal tracking of running merges. This causes IW.close() to hang while it incorrectly waits for these non-started merges to finish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
[ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576933#action_12576933 ] Michele Bini commented on LUCENE-1208: -- As with LUCENE-1210, shouldn't the patch be committed in the 2.3 branch, too, as it affects 2.3.1? Other issues, such as the speedups in LUCENE-1211, although useful, can be left out as they are not bugs. But fix for deadlocks seem worthwhile for 2.3.x, too. Deadlock case in IndexWriter on exception just before flush --- Key: LUCENE-1208 URL: https://issues.apache.org/jira/browse/LUCENE-1208 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3, 2.3.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 Attachments: LUCENE-1208.patch If a document hits a non-aborting exception, eg something goes wrong in tokenStream.next(), and, that document had triggered a flush (due to RAM or doc count) then DocumentsWriter will deadlock because that thread marks the flush as pending but fails to clear it on exception. I have a simple test case showing this, and a fix fixing it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
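The cleanup the report describes can be sketched as follows; FlushState and its members are hypothetical stand-ins for the DocumentsWriter internals, not the actual patched code. The key point is that the thread that marked the flush pending must clear that mark if the document fails:

```java
// Hypothetical sketch of the deadlock fix: a non-aborting exception while
// processing a flush-triggering document must clear the pending-flush flag,
// otherwise other threads wait forever for a flush that will never happen.
class FlushState {
    private boolean flushPending = false;

    synchronized void addDocument(Runnable invertDocument, boolean triggersFlush) {
        if (triggersFlush) {
            flushPending = true;
        }
        try {
            invertDocument.run(); // e.g. tokenStream.next() goes wrong in here
        } catch (RuntimeException e) {
            if (triggersFlush) {
                flushPending = false; // the missing cleanup that caused the hang
            }
            throw e;
        }
    }

    synchronized boolean isFlushPending() {
        return flushPending;
    }
}
```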
Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads
You make a good point. I think I will prob make this change. Asgeir Frimannsson (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576858#action_12576858 ] Asgeir Frimannsson commented on LUCENE-1026: Is there any specific reason why this indexaccessor is limited to FSDirectory based indexes? I see FSDirectory.getFile() is used as a unique key in the list of IndexAccessors in the factory. However, it seems more natural to use dir.getLockID() for this purpose. Then it would be possible to use a generic Directory rather than the file-system specific FSDirectory. Provide a simple way to concurrently access a Lucene index from multiple threads Key: LUCENE-1026 URL: https://issues.apache.org/jira/browse/LUCENE-1026 Project: Lucene - Java Issue Type: New Feature Components: Index, Search Reporter: Mark Miller Priority: Minor Attachments: DefaultIndexAccessor.java, DefaultMultiIndexAccessor.java, IndexAccessor-02.04.2008.zip, IndexAccessor-02.07.2008.zip, IndexAccessor-02.28.2008.zip, IndexAccessor-1.26.2008.zip, IndexAccessor-2.15.2008.zip, IndexAccessor.java, IndexAccessor.zip, IndexAccessorFactory.java, MultiIndexAccessor.java, shai-IndexAccessor-2.zip, shai-IndexAccessor.zip, shai-IndexAccessor3.zip, SimpleSearchServer.java, StopWatch.java, TestIndexAccessor.java For building interactive indexes accessed through a network/internet (multiple threads). This builds upon the LuceneIndexAccessor patch. That patch was not very newbie friendly and did not properly handle MultiSearchers (or at the least made it easy to get into trouble). This patch simplifies things and provides out of the box support for sharing the IndexAccessors across threads. There is also a simple test class and example SearchServer to get you started. Future revisions will be zipped. Works pretty solid as is, but could use the ability to warm new Searchers. 
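Asgeir's suggestion -- keying cached accessors by the directory's lock ID rather than by FSDirectory.getFile() -- can be sketched with stand-in types (Directory and IndexAccessor here are hypothetical simplifications, not the patch's actual classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keying the accessor cache by a lock id lets any
// Directory implementation participate, not just filesystem-backed ones.
interface Directory {
    String getLockID();
}

class IndexAccessor {
    final Directory dir;
    IndexAccessor(Directory dir) { this.dir = dir; }
}

class IndexAccessorFactory {
    private final Map<String, IndexAccessor> accessors = new ConcurrentHashMap<>();

    IndexAccessor getAccessor(Directory dir) {
        // One shared accessor per underlying index, whatever the Directory type.
        return accessors.computeIfAbsent(dir.getLockID(), id -> new IndexAccessor(dir));
    }
}
```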
[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-794: --- Attachment: SpanHighlighter-02-10-2008.patch Another attempt at putting this to bed. Added the MultiPhraseQuery support patch above - thanks! Updated some code to stop using deprecated methods. Made highlighting ConstantScoreRangeQuerys optional, defaulting to false. - Mark Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery --- Key: LUCENE-794 URL: https://issues.apache.org/jira/browse/LUCENE-794 Project: Lucene - Java Issue Type: Improvement Components: Other Reporter: Mark Miller Priority: Minor Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans. See http://issues.apache.org/jira/browse/LUCENE-403 for some background. There is a dependency on MemoryIndex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1214) Possible hidden exception on SegmentInfos commit
Possible hidden exception on SegmentInfos commit Key: LUCENE-1214 URL: https://issues.apache.org/jira/browse/LUCENE-1214 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Reporter: Mark Miller Priority: Trivial I am not sure if this is that big of a deal, but I just ran into it and thought I might mention it. SegmentInfos.commit removes the segments file if it hits an exception. If it cannot remove the segments file (because it's not there, or on Windows something has a hold of it), another exception is thrown about not being able to delete the segments file. Because of this, you lose the first exception, which might have useful info, including why the segments file might not be there to delete. - Mark -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Assigned: (LUCENE-1214) Possible hidden exception on SegmentInfos commit
[ https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1214: -- Assignee: Michael McCandless Possible hidden exception on SegmentInfos commit Key: LUCENE-1214 URL: https://issues.apache.org/jira/browse/LUCENE-1214 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Reporter: Mark Miller Assignee: Michael McCandless Priority: Trivial I am not sure if this is that big of a deal, but I just ran into it and thought I might mention it. SegmentInfos.commit removes the Segments File if it hits an exception. If it cannot remove the Segments file (because its not there or on Windows something has a hold of it), another Exception is thrown about not being able to delete the Segments file. Because of this, you lose the first exception, which might have useful info, including why the segments file might not be there to delete. - Mark -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1214) Possible hidden exception on SegmentInfos commit
[ https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576970#action_12576970 ] Michael McCandless commented on LUCENE-1214: Good catch Mark. It seems like we should ignore any exception while trying to delete the partially written segments_N file, and throw the original exception. I'll do that. How did you hit these two exceptions? Possible hidden exception on SegmentInfos commit Key: LUCENE-1214 URL: https://issues.apache.org/jira/browse/LUCENE-1214 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Reporter: Mark Miller Priority: Trivial I am not sure if this is that big of a deal, but I just ran into it and thought I might mention it. SegmentInfos.commit removes the Segments File if it hits an exception. If it cannot remove the Segments file (because its not there or on Windows something has a hold of it), another Exception is thrown about not being able to delete the Segments file. Because of this, you lose the first exception, which might have useful info, including why the segments file might not be there to delete. - Mark -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
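The proposed fix -- swallow the secondary delete failure and rethrow the original exception -- can be written as a self-contained sketch; CommitHelper and the Runnable parameters are hypothetical stand-ins, not the actual SegmentInfos code:

```java
// Hypothetical sketch of exception-preserving commit cleanup: if deleting the
// partially written segments_N file also fails, ignore that failure and
// rethrow the original exception, which carries the useful root cause.
class CommitHelper {
    static void commit(Runnable writeSegmentsFile, Runnable deleteSegmentsFile) {
        try {
            writeSegmentsFile.run();
        } catch (RuntimeException original) {
            try {
                deleteSegmentsFile.run();
            } catch (RuntimeException ignored) {
                // Swallowed: reporting this would mask the original failure.
            }
            throw original; // the first exception is the one worth seeing
        }
    }
}
```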
Re: Going to Java 5. Was: Re: A bit of planning
On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED] wrote: On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote: : I'd like to recommend that 3.0 contain the new Java 5 API changes and what it : replaces be marked deprecated. 3.0 would also remove what was deprecated in : 2.9. Then in 3.1 we remove the deprecations. FWIW: This would violate the compatibility requirements, since code that compiles against 3.0 (with deprecation warnings) wouldn't compile against 3.1 -- but then again: there has been some mention of revisiting the entire back compatibility commitments of Lucene, and now certainly seems like the time to discuss that before too much work is done in any particular direction in an attempt to head towards 2.9/3.0. Any way that it goes, my point is that it needs to be a two-step process. The additional step needs to address the language differences. Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5 APIs, with appropriate deprecations. 2.9.5 would require Java 1.5. Since going to Java 5 is a major change, I think it is not too wild to go from 3.0 straight to 4.0..? Main (and perhaps only) change would be moving to Java 5. This way we don't break any back.comp requirements.
[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
[ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577073#action_12577073 ] Michael Busch commented on LUCENE-1208: --- We had seen this deadlock problem in our tests. I reran all tests with Lucene 2.3.1 + LUCENE-1208 and didn't see the problem again so far! Deadlock case in IndexWriter on exception just before flush --- Key: LUCENE-1208 URL: https://issues.apache.org/jira/browse/LUCENE-1208 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3, 2.3.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 Attachments: LUCENE-1208.patch If a document hits a non-aborting exception, eg something goes wrong in tokenStream.next(), and, that document had triggered a flush (due to RAM or doc count) then DocumentsWriter will deadlock because that thread marks the flush as pending but fails to clear it on exception. I have a simple test case showing this, and a fix fixing it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576941#action_12576941 ] Michael McCandless commented on LUCENE-1208: Agreed. I'm thinking these issues should be ported to 2.3.2: LUCENE-1191 LUCENE-1197 LUCENE-1198 LUCENE-1199 LUCENE-1200 LUCENE-1208 (this issue) LUCENE-1210 +1 -Michael - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
OK I'll backport. Mike Michael Busch wrote: Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1208? page=com.atlassian.jira.plugin.system.issuetabpanels:comment- tabpanelfocusedCommentId=12576941#action_12576941 ] Michael McCandless commented on LUCENE-1208: Agreed. I'm thinking these issues should be ported to 2.3.2: LUCENE-1191 LUCENE-1197 LUCENE-1198 LUCENE-1199 LUCENE-1200 LUCENE-1208 (this issue) LUCENE-1210 +1 -Michael - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Going to Java 5. Was: Re: A bit of planning
We voted to make 3.0 Java 1.5, full well knowing that it will break the back compat. requirements. I don't see the point of postponing it or dragging it out. On Mar 10, 2008, at 12:02 PM, Doron Cohen wrote: On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED] wrote: On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote: : I'd like to recommend that 3.0 contain the new Java 5 API changes and what it : replaces be marked deprecated. 3.0 would also remove what was deprecated in : 2.9. Then in 3.1 we remove the deprecations. FWIW: This would violate the compatibility requirements, since code that compiles against 3.0 (with deprecation warnings) wouldn't compile against 3.1 -- but then again: there has been some mention of revisting the entire back compatibility commitments of Lucene, and now certainly seems like the time to discuss that before too much work is done in any particular direction in an attempt to head towards 2.9/3.0. Any way that it goes, my point is that it needs to be a two step process. The additional step needs to address the language differences. Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5 APIs, with appropriate deprecations. 2.9.5 would require Java 1.5. Since going to Java 5 is a major change, I think it is not too wild to go from 3.0 straight to 4.0..? Main (and perhaps only) change would be moving to Java 5. This way we don't break any back.comp requirements. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Going to Java 5. Was: Re: A bit of planning
Grant Ingersoll wrote: We voted to make 3.0 Java 1.5, full well knowing that it will break the back compat. requirements. I don't see the point of postponing it or dragging it out. I thought his suggestion was to skip 3.0 as a designator and instead use 4.0. If so, the schedule would not change. On Mar 10, 2008, at 12:02 PM, Doron Cohen wrote: On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED] wrote: On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote: : I'd like to recommend that 3.0 contain the new Java 5 API changes and what it : replaces be marked deprecated. 3.0 would also remove what was deprecated in : 2.9. Then in 3.1 we remove the deprecations. FWIW: This would violate the compatibility requirements, since code that compiles against 3.0 (with deprecation warnings) wouldn't compile against 3.1 -- but then again: there has been some mention of revisting the entire back compatibility commitments of Lucene, and now certainly seems like the time to discuss that before too much work is done in any particular direction in an attempt to head towards 2.9/3.0. Any way that it goes, my point is that it needs to be a two step process. The additional step needs to address the language differences. Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5 APIs, with appropriate deprecations. 2.9.5 would require Java 1.5. Since going to Java 5 is a major change, I think it is not too wild to go from 3.0 straight to 4.0..? Main (and perhaps only) change would be moving to Java 5. This way we don't break any back.comp requirements. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
How to add a jar to a contrib build.xml
Hi all, perhaps this is a simple question, but I don't know how to do it. I'm developing on a new contrib subfolder. My development needs to use classes in another contrib subfolder. How do I add the corresponding JAR to the build.xml file? thanks in advance. -- Felipe - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Going to Java 5. Was: Re: A bit of planning
On Mon, Mar 10, 2008 at 9:21 PM, DM Smith [EMAIL PROTECTED] wrote: Grant Ingersoll wrote: We voted to make 3.0 Java 1.5, full well knowing that it will break the back compat. requirements. I don't see the point of postponing it or dragging it out. I thought his suggestion was to skip 3.0 as a designator and instead use 4.0. If so, the schedule would not change. Right, that's what I meant: * 2.9 with deprecations, * 3.0 removing deprecated stuff but still Java 1.4, * 4.0 first Java 5 version But I am catching up now a looong list of discussions and missed this vote, so I am ok with taking this back and proceed as voted. - Doron
Re: How to add a jar to a contrib build.xml
Here is how the span highlighter I have been working on uses the Memory contrib (I think I copied this from another contrib that has a dependency):

<?xml version="1.0"?>
<project name="highlighter" default="buildHighlighter">
  <description>Hits highlighter</description>
  <import file="../contrib-build.xml"/>
  <property name="memory.jar" location="../../build/contrib/memory/lucene-memory-${version}.jar"/>
  <path id="classpath">
    <pathelement path="${lucene.jar}"/>
    <pathelement path="${memory.jar}"/>
    <pathelement path="${project.classpath}"/>
  </path>
  <target name="buildHighlighter" depends="buildMemory,default"/>
  <target name="buildMemory">
    <echo>Highlighter building dependency ${memory.jar}</echo>
    <ant antfile="../memory/build.xml" target="default" inheritall="false"/>
  </target>
</project>

[EMAIL PROTECTED] wrote: Hi all, perhaps this is a simple question, but I don't know how to do it. I'm developing on a new contrib subfolder. My development needs to use classes in another contrib subfolder. How do I add the corresponding JAR to the build.xml file? thanks in advance. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Going to Java 5. Was: Re: A bit of planning
All it takes is one line in the announcement saying Version 3.0 uses Java 1.5 I don't think the significance will be lost on anyone. Everyone knows what Java 1.5 is. I'm -1 on calling it 4.0. People will then ask where is 3.0. I am +1 for sticking w/ the plan we voted for as described on http://wiki.apache.org/lucene-java/Java_1%2e5_Migration (last edited 10/1/2007) It's not like we are springing this on anyone. In fact, I'd be more than happy to announce it on the user list to let people know ahead of time. On Mar 10, 2008, at 3:52 PM, Doron Cohen wrote: On Mon, Mar 10, 2008 at 9:21 PM, DM Smith [EMAIL PROTECTED] wrote: Grant Ingersoll wrote: We voted to make 3.0 Java 1.5, full well knowing that it will break the back compat. requirements. I don't see the point of postponing it or dragging it out. I thought his suggestion was to skip 3.0 as a designator and instead use 4.0. If so, the schedule would not change. Right, that's what I meant: * 2.9 with deprecations, * 3.0 removing deprecated stuff but still Java 1.4, * 4.0 first Java 5 version But I am catching up now a looong list of discussions and missed this vote, so I am ok with taking this back and proceed as voted. - Doron - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Going to Java 5. Was: Re: A bit of planning
Grant Ingersoll wrote: All it takes is one line in the announcement saying Version 3.0 uses Java 1.5 I don't think the significance will be lost on anyone. Everyone knows what Java 1.5 is. I'm -1 on calling it 4.0. People will then ask where is 3.0. I am +1 for sticking w/ the plan we voted for as described on http://wiki.apache.org/lucene-java/Java_1%2e5_Migration (last edited 10/1/2007) It's not like we are springing this on anyone. In fact, I'd be more than happy to announce it on the user list to let people know ahead of time. I'm fine with the plan as far as I understand it, but can you clarify something for me? While 3.0 won't be backward compatible in that it requires Java 5.0, will it be otherwise backward compatible? That is, if I compile with 2.9, eliminate all deprecations and use Java 5, can I drop 3.0 in and expect it to work without any further changes? I think that is what I am reading wrt the plan. DM - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577207#action_12577207 ] Mark Miller commented on LUCENE-584: I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast to java.util.BitSet
	at org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
	at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
public void testChainedCachedQueryFilter() throws IOException, ParseException {
  String path = "c:/TestIndex";
  Analyzer analyzer = new WhitespaceAnalyzer();
  IndexWriter writer = new IndexWriter(path, analyzer, true);
  Document doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the careful bad fox", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  writer.addDocument(doc);
  Searcher searcher = null;
  searcher = new IndexSearcher(path);
  QueryParser qp = new QueryParser("field", new KeywordAnalyzer());
  Query query = qp.parse("content:fox");
  QueryWrapperFilter queryFilter = new QueryWrapperFilter(query);
  CachingWrapperFilter cwf = new CachingWrapperFilter(queryFilter);
  TopDocs hits = searcher.search(query, cwf, 1);
  System.out.println("hits: " + hits.totalHits);
  queryFilter = new QueryWrapperFilter(qp.parse("category:red"));
  CachingWrapperFilter fcwf = new CachingWrapperFilter(queryFilter);
  Filter[] chain = new Filter[2];
  chain[0] = cwf;
  chain[1] = fcwf;
  ChainedFilter cf = new ChainedFilter(chain, ChainedFilter.AND);
  hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
  System.out.println("red: " + hits.totalHits);
  queryFilter = new QueryWrapperFilter(qp.parse("category:blue"));
  CachingWrapperFilter fbcwf = new CachingWrapperFilter(queryFilter);
  chain = new Filter[2];
  chain[0] = cwf;
  chain[1] = fbcwf;
  cf = new ChainedFilter(chain, ChainedFilter.AND);
  hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
  System.out.println("blue: " + hits.totalHits);
}
{code}

Decouple Filter from BitSet --- Key: LUCENE-584 URL: https://issues.apache.org/jira/browse/LUCENE-584 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.1 Reporter: Peter Schäfer Assignee: Michael Busch Priority: Minor Fix For: 2.4 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, ContribQueries20080111.patch, lucene-584-take2.patch, lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch

{code}
package org.apache.lucene.search;

public abstract class Filter implements java.io.Serializable {
  public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
}

public interface AbstractBitSet {
  public boolean get(int index);
}
{code}

It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=. Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible. Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with a smaller memory footprint. Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose. That's why I propose to use an interface instead.
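Peter's sparse-filter motivation can be sketched with plain JDK types. The class names below are illustrative only and do not come from any of the attached patches; the point is that a read-only interface lets a sparse result use memory proportional to the number of set bits rather than to the index size.

```java
import java.util.Arrays;
import java.util.BitSet;

// Abstract read-only bit-set interface, per the proposal above.
interface AbstractBitSet {
    boolean get(int index);
}

// Dense adapter: wraps java.util.BitSet (the status quo).
class DenseBitSet implements AbstractBitSet {
    private final BitSet bits;
    DenseBitSet(BitSet bits) { this.bits = bits; }
    public boolean get(int index) { return bits.get(index); }
}

// Sparse implementation: a sorted array holding only the doc ids that are set.
class SparseBitSet implements AbstractBitSet {
    private final int[] sortedDocIds;
    SparseBitSet(int[] sortedDocIds) { this.sortedDocIds = sortedDocIds; }
    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocIds, index) >= 0;
    }
}

public class AbstractBitSetDemo {
    public static void main(String[] args) {
        // 3 visible docs out of a 10-million-doc index: the sparse set stores
        // 3 ints instead of a bit array spanning all 10 million doc ids.
        AbstractBitSet sparse = new SparseBitSet(new int[]{5, 9000000, 9999999});
        System.out.println(sparse.get(5));       // true
        System.out.println(sparse.get(6));       // false
        System.out.println(sparse.get(9999999)); // true
    }
}
```

Callers that only ever test membership need nothing beyond get(int), which is what makes the interface swap possible.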
[jira] Issue Comment Edited: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577207#action_12577207 ] [EMAIL PROTECTED] edited comment on LUCENE-584 at 3/10/08 2:48 PM: - I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast to java.util.BitSet
	at org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
	at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
public void testChainedCachedQueryFilter() throws IOException, ParseException {
  String path = "c:/TestIndex";
  Analyzer analyzer = new WhitespaceAnalyzer();
  IndexWriter writer = new IndexWriter(path, analyzer, true);
  Document doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the careful bad fox", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  writer.close();
  Searcher searcher = null;
  searcher = new IndexSearcher(path);
  QueryParser qp = new QueryParser("field", new KeywordAnalyzer());
  Query query = qp.parse("content:fox");
  QueryWrapperFilter queryFilter = new QueryWrapperFilter(query);
  CachingWrapperFilter cwf = new CachingWrapperFilter(queryFilter);
  TopDocs hits = searcher.search(query, cwf, 1);
  System.out.println("hits: " + hits.totalHits);
  queryFilter = new QueryWrapperFilter(qp.parse("category:red"));
  CachingWrapperFilter fcwf = new CachingWrapperFilter(queryFilter);
  Filter[] chain = new Filter[2];
  chain[0] = cwf;
  chain[1] = fcwf;
  ChainedFilter cf = new ChainedFilter(chain, ChainedFilter.AND);
  hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
  System.out.println("red: " + hits.totalHits);
  queryFilter = new QueryWrapperFilter(qp.parse("category:blue"));
  CachingWrapperFilter fbcwf = new CachingWrapperFilter(queryFilter);
  chain = new Filter[2];
  chain[0] = cwf;
  chain[1] = fbcwf;
  cf = new ChainedFilter(chain, ChainedFilter.AND);
  hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
  System.out.println("blue: " + hits.totalHits);
}
{code}

was (Author: [EMAIL PROTECTED]): I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast to java.util.BitSet
	at org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
	at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
	at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
public void testChainedCachedQueryFilter() throws IOException, ParseException {
  String path = "c:/TestIndex";
  Analyzer analyzer = new WhitespaceAnalyzer();
  IndexWriter writer = new IndexWriter(path, analyzer, true);
  Document doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
  doc.add(new Field("content", "the careful bad fox",
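For reference, what ChainedFilter.AND computes over the two cached filters in the test above is just a set intersection of the per-filter doc id sets. A stdlib-only illustration (not Lucene code) using the same five documents:

```java
import java.util.BitSet;

public class ChainedAndDemo {
    public static void main(String[] args) {
        // "content:fox" matches docs 0 and 4; "category:red" matches docs 0, 1, 2.
        BitSet fox = new BitSet();
        fox.set(0);
        fox.set(4);
        BitSet red = new BitSet();
        red.set(0);
        red.set(1);
        red.set(2);

        // ChainedFilter.AND intersects the sets; only doc 0 survives, matching
        // the single hit the test's "red:" search would report.
        BitSet and = (BitSet) fox.clone();
        and.and(red);
        System.out.println(and.cardinality()); // 1
        System.out.println(and.get(0));        // true
        System.out.println(and.get(4));        // false
    }
}
```

The ClassCastException arises because this intersection logic still assumed java.util.BitSet after CachingWrapperFilter had moved to OpenBitSet.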
[jira] Commented: (LUCENE-1214) Possible hidden exception on SegmentInfos commit
[ https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577220#action_12577220 ] Mark Miller commented on LUCENE-1214: - I am still trying to work that out... some craziness that started after I updated Lucene to trunk, but I also made other fundamental changes, and Windows Vista may be haunting me too... The gist of it is that Lucene is failing when it tries to create an index file (it creates the directory fine). I don't think it's Lucene related at the moment, but I haven't gotten to the bottom of it either. Oddly, if I stop using the NoLockFactory (I manually manage a single Writer), things work... still digging though. Possible hidden exception on SegmentInfos commit Key: LUCENE-1214 URL: https://issues.apache.org/jira/browse/LUCENE-1214 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Reporter: Mark Miller Assignee: Michael McCandless Priority: Trivial Attachments: LUCENE-1214.patch I am not sure if this is that big of a deal, but I just ran into it and thought I might mention it. SegmentInfos.commit removes the segments file if it hits an exception. If it cannot remove the segments file (because it's not there, or on Windows something has a hold of it), another exception is thrown about not being able to delete the segments file. Because of this, you lose the first exception, which might have useful info, including why the segments file might not be there to delete. - Mark -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
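The masking pattern Mark describes is easy to reproduce in isolation. This is a minimal sketch of the general problem and one way to preserve the root cause, not the actual SegmentInfos code:

```java
// Sketch: a failing cleanup step replaces the original exception entirely,
// vs. swallowing the cleanup failure so the root cause survives.
public class ExceptionMaskingDemo {

    // Broken: the exception thrown during cleanup hides the primary one.
    static void commitMasking() throws Exception {
        try {
            throw new Exception("original failure while writing segments");
        } catch (Exception primary) {
            // Cleanup (e.g. deleting the segments file) fails too, and its
            // exception is the only one the caller ever sees:
            throw new Exception("could not delete segments file");
        }
    }

    // Fixed: swallow (or just log) the secondary cleanup failure,
    // then rethrow the primary exception.
    static void commitPreserving() throws Exception {
        try {
            throw new Exception("original failure while writing segments");
        } catch (Exception primary) {
            try {
                throw new Exception("could not delete segments file");
            } catch (Exception secondary) {
                // Deliberately ignored so 'primary' is not lost.
            }
            throw primary;
        }
    }

    public static void main(String[] args) {
        try { commitMasking(); } catch (Exception e) {
            System.out.println(e.getMessage()); // could not delete segments file
        }
        try { commitPreserving(); } catch (Exception e) {
            System.out.println(e.getMessage()); // original failure while writing segments
        }
    }
}
```

(On later Java versions Throwable.addSuppressed would let you keep both exceptions; in the Java 1.4/5 code base of the time, swallowing or logging the secondary one is the available option.)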
Re: Going to Java 5. Was: Re: A bit of planning
: I'm fine with the plan as far as I understand it, but can you clarify : something for me? : : While 3.0 won't be backward compatible in that it requires Java 5.0, will it : be otherwise backward compatible? That is, if I compile with 2.9, eliminate : all deprecations and use Java 5, can I drop 3.0 in and expect it to work : without any further changes? I think that point is still up in the air, and will depend largely on what type of APIs start shaping up for 3.0. I suspect that when the time comes, 2.9 may contain deprecations that refer forward to APIs that will be available in 3.0, but won't exist in 2.9 ... so true drop-in compatibility may not be possible. Then again: the main reason i suspect that is that i'm anticipating APIs that use generics. i know that some weird things happen with generics and bytecode, so it may actually be possible to introduce non-generic (non-typesafe) versions of those APIs in 2.9 that people can compile against that will be bytecode compatible with 3.0 -- i'm not sure. (similar questions may come up with enums and other misc language features however) -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
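The generics/bytecode point above can be illustrated in miniature: because type parameters are erased at compile time, a raw (pre-generics) call site keeps working against a generified method. The method below is hypothetical, purely to show erasure, not an actual Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// A "3.0-style" generic method and a "2.9-style" raw call site compile to
// compatible bytecode, because the type parameter does not exist at runtime.
public class ErasureDemo {

    // Hypothetical generified API.
    static <T> List<T> singleton(T value) {
        List<T> list = new ArrayList<T>();
        list.add(value);
        return list;
    }

    public static void main(String[] args) {
        // Raw, non-typesafe use still runs (with an unchecked warning at
        // compile time), which is the bytecode-compatibility Hoss mentions.
        List raw = singleton("hit");
        System.out.println(raw.get(0)); // hit
        System.out.println(raw.size()); // 1
    }
}
```

Whether this trick would actually allow a drop-in 2.9 -> 3.0 upgrade depends on the final API shapes, as the message says.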
[jira] Created: (LUCENE-1215) Support of Unicode Collation
Support of Unicode Collation Key: LUCENE-1215 URL: https://issues.apache.org/jira/browse/LUCENE-1215 Project: Lucene - Java Issue Type: New Feature Components: Analysis Reporter: Hiroaki Kawai Attachments: NormalizerTokenFilter.java New in Java 6, we have java.text.Normalizer, which supports Unicode Standard Annex #15 normalization. http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html http://www.unicode.org/unicode/reports/tr15/ The normalization defined there has four variants: C, D, KC, and KD. Canonical Decomposition or Compatibility Decomposition normalizes the representation of a String, which improves search results. I'd like to submit a TokenFilter supporting this feature! :-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1215) Support of Unicode Collation
[ https://issues.apache.org/jira/browse/LUCENE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1215: -- Attachment: NormalizerTokenFilter.java Support of Unicode Collation Key: LUCENE-1215 URL: https://issues.apache.org/jira/browse/LUCENE-1215 Project: Lucene - Java Issue Type: New Feature Components: Analysis Reporter: Hiroaki Kawai Attachments: NormalizerTokenFilter.java New in Java 6, we have java.text.Normalizer, which supports Unicode Standard Annex #15 normalization. http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html http://www.unicode.org/unicode/reports/tr15/ The normalization defined there has four variants: C, D, KC, and KD. Canonical Decomposition or Compatibility Decomposition normalizes the representation of a String, which improves search results. I'd like to submit a TokenFilter supporting this feature! :-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
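For the curious, the core java.text.Normalizer call a TokenFilter like the attached one would build on (Java 6+) looks like this; NFKC applies compatibility decomposition followed by canonical composition, which among other things folds half-width katakana into full-width, while NFC leaves compatibility characters alone:

```java
import java.text.Normalizer;

// Minimal demonstration of the four normalization forms' practical difference
// for search: NFKC/NFKD fold compatibility variants, NFC/NFD do not.
public class NormalizerDemo {
    public static void main(String[] args) {
        String halfWidthKa = "\uFF76"; // HALFWIDTH KATAKANA LETTER KA
        String fullWidthKa = "\u30AB"; // KATAKANA LETTER KA

        String nfkc = Normalizer.normalize(halfWidthKa, Normalizer.Form.NFKC);
        System.out.println(nfkc.equals(fullWidthKa)); // true

        // NFC performs only canonical composition, so the half-width
        // compatibility character survives unchanged:
        String nfc = Normalizer.normalize(halfWidthKa, Normalizer.Form.NFC);
        System.out.println(nfc.equals(halfWidthKa)); // true
    }
}
```

A token filter would simply apply Normalizer.normalize to each token's text before passing it on.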
[jira] Commented: (LUCENE-1032) CJKAnalyzer should convert half width katakana to full width katakana
[ https://issues.apache.org/jira/browse/LUCENE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577309#action_12577309 ] Hiroaki Kawai commented on LUCENE-1032: --- I think this feature should be merged into https://issues.apache.org/jira/browse/LUCENE-1215 Unicode compatibility decomposition will fix this issue. :-) CJKAnalyzer should convert half width katakana to full width katakana - Key: LUCENE-1032 URL: https://issues.apache.org/jira/browse/LUCENE-1032 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.0.0 Reporter: Andrew Lynch Some of our Japanese customers are reporting errors when performing searches using half-width characters. The desired behavior is that a document containing half-width characters should be returned when performing a search using full-width equivalents or when searching by the half-width character itself. Currently, a search will not return any matches for half-width characters. Here is a test case outlining the desired behavior (this may require a new Analyzer).
{code}
public class TestJapaneseEncodings extends TestCase {
  byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
  byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};

  public void testAnalyzerWithHalfWidth() throws IOException {
    Reader r1 = new StringReader(makeHalfWidthKa());
    TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
    assertNotNull(stream);
    Token token = stream.next();
    assertNotNull(token);
    assertEquals(makeFullWidthKa(), token.termText());
  }

  public void testAnalyzerWithFullWidth() throws IOException {
    Reader r1 = new StringReader(makeFullWidthKa());
    TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
    assertEquals(makeFullWidthKa(), stream.next().termText());
  }

  private String makeFullWidthKa() throws UnsupportedEncodingException {
    return new String(fullWidthKa, "UTF-8");
  }

  private String makeHalfWidthKa() throws UnsupportedEncodingException {
    return new String(halfWidthKa, "UTF-8");
  }
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
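The two byte arrays in the test above are just the UTF-8 encodings of half-width and full-width katakana "ka" (U+FF76 and U+30AB), so NFKC normalization (Java 6+) maps one onto the other; a Normalizer-based filter, as suggested in the comment, would therefore satisfy this test:

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Decode the exact byte arrays from the test case and show that NFKC
// folds the half-width form onto the full-width one.
public class KatakanaWidthDemo {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        String fullWidthKa = new String(new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB}, utf8); // U+30AB
        String halfWidthKa = new String(new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6}, utf8); // U+FF76

        System.out.println(Normalizer.normalize(halfWidthKa, Normalizer.Form.NFKC)
                .equals(fullWidthKa)); // true
        // The full-width form is already in NFKC, so it is unchanged:
        System.out.println(Normalizer.normalize(fullWidthKa, Normalizer.Form.NFKC)
                .equals(fullWidthKa)); // true
    }
}
```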
[jira] Commented: (LUCENE-1215) Support of Unicode Collation
[ https://issues.apache.org/jira/browse/LUCENE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577312#action_12577312 ] Andrew Lynch commented on LUCENE-1215: -- This will be quite useful. I used the Normalizer to implement my own custom analyzer for https://issues.apache.org/jira/browse/LUCENE-1032. There is actually a Normalizer equivalent in older versions of the Sun JDK, sun.text.Normalizer, but that obviously wouldn't be portable across VMs. I ended up using reflection to determine the presence of java.text.Normalizer, falling back to sun.text.Normalizer if it didn't exist, and finally performing no normalization if neither could be found, to preserve compatibility with non-Java-6 / non-Sun JDKs. Support of Unicode Collation Key: LUCENE-1215 URL: https://issues.apache.org/jira/browse/LUCENE-1215 Project: Lucene - Java Issue Type: New Feature Components: Analysis Reporter: Hiroaki Kawai Attachments: NormalizerTokenFilter.java New in Java 6, we have java.text.Normalizer, which supports Unicode Standard Annex #15 normalization. http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html http://www.unicode.org/unicode/reports/tr15/ The normalization defined there has four variants: C, D, KC, and KD. Canonical Decomposition or Compatibility Decomposition normalizes the representation of a String, which improves search results. I'd like to submit a TokenFilter supporting this feature! :-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
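A rough sketch of the reflection-based detection Andrew describes; the method structure and the no-op fallback here are assumptions for illustration, not his actual code (the sun.text.Normalizer branch is omitted since its signature varies across JDK releases):

```java
import java.lang.reflect.Method;

// Look up java.text.Normalizer reflectively so the class loads on
// pre-Java-6 VMs, and fall back to returning the input untouched.
public class NormalizerLookup {

    public static String normalizeIfPossible(String input) {
        try {
            Class<?> normalizer = Class.forName("java.text.Normalizer");
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            Method normalize = normalizer.getMethod("normalize",
                    CharSequence.class, form);
            // Normalizer.Form is an enum; obtain NFKC via its valueOf.
            Method valueOf = form.getMethod("valueOf", String.class);
            Object nfkc = valueOf.invoke(null, "NFKC");
            return (String) normalize.invoke(null, input, nfkc);
        } catch (Exception e) {
            // Java 5 or earlier: a sun.text.Normalizer attempt would go here;
            // otherwise do no normalization at all.
            return input;
        }
    }

    public static void main(String[] args) {
        // On a Java 6+ VM the reflective path runs and folds half-width katakana.
        System.out.println(normalizeIfPossible("\uFF76").equals("\u30AB"));
    }
}
```

Caching the resolved Method in a static field would avoid repeating the lookup per token, which matters inside an analyzer's hot path.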