[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768118#action_12768118 ]

Uwe Schindler commented on LUCENE-1998:
---
I tested it here, and we have no backwards-compatibility problem (at least with normal usage). When old Java 1.4 code runs against the new enum classes, Java's dynamic linker has no problem with the replaced superclass: old code that uses Field.Store.XXX and was compiled against lucene-core-2.9.jar (with superclass Parameter) works perfectly with the new lucene-core-3.0.jar. This works because we only used the Parameter class as a type-safe enumeration and did not call any of its methods (except maybe toString()), so the linker has no problem.

I would simply apply this patch to trunk. I would also remove the Parameter class completely, as that breaks no code (unless somebody has used the class for their own enums). Maybe we should deprecate Parameter in 2.9.1 and say that it will be removed in 3.0, as that version uses Java 5's enum. But it also does not hurt if we keep it and mark it deprecated, as in the patch.

To your patch: I only added the license header back in the Version class. It must be there.

Use Java 5 enums
Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch
Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
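The migration discussed in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: FieldOptions, OldParameter and OldStore are hypothetical stand-ins, not Lucene's actual Parameter or Field.Store sources. It shows a Java 1.4 style type-safe enumeration next to the Java 5 enum that would replace it, and a caller that only compares constants, which is the "normal usage" the comment refers to.

```java
// Hypothetical stand-in classes; names are illustrative and not taken from Lucene.
final class FieldOptions {
    private FieldOptions() {}

    // Before (Java 1.4 style): a type-safe enumeration built on a shared base class.
    static class OldParameter {
        private final String name;
        protected OldParameter(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class OldStore extends OldParameter {
        private OldStore(String name) { super(name); }
        static final OldStore YES = new OldStore("YES");
        static final OldStore NO = new OldStore("NO");
    }

    // After (Java 5): the same constants expressed as a real enum.
    enum Store { YES, NO }

    // Typical caller code only reads the constants and compares them by identity.
    static String describe(Store store) {
        return store == Store.YES ? "stored" : "not stored";
    }
}
```

A caller that only reads a constant such as Store.YES and passes it around never names the superclass, which is consistent with the observation that replacing the superclass did not break already compiled code in the tested cases.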
[jira] Updated: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1998: -- Attachment: LUCENE-1998_enum.patch Patch with license header restored.
[jira] Assigned: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-1998: - Assignee: Uwe Schindler
[jira] Updated: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1998: -- Attachment: LUCENE-1998_enum.patch Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser. I think this is ready to commit.
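The visibility point in the update above concerns enums whose constants each implement an abstract method. A minimal sketch, assuming a hypothetical Operator enum (the names are illustrative and not taken from the patch), with the abstract method and every constant body declared public so that they agree:

```java
final class ConstantSpecificMethodDemo {
    private ConstantSpecificMethodDemo() {}

    // Hypothetical enum; the constants and method are assumptions for illustration.
    enum Operator {
        AND {
            @Override public boolean apply(boolean a, boolean b) { return a && b; }
        },
        OR {
            @Override public boolean apply(boolean a, boolean b) { return a || b; }
        };

        // Declared public, matching the visibility used in each constant body.
        public abstract boolean apply(boolean a, boolean b);
    }
}
```

Java allows an override to widen visibility, so a package-private abstract declaration with public constant bodies also compiles; keeping both public simply makes the method callable from outside the package through the enum type.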
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768132#action_12768132 ]

Uwe Schindler commented on LUCENE-1998:
---
A small problem that may appear in the future: we renamed some enum constants in 2.9 (TOKENIZED -> ANALYZED). There is no problem now, because the deprecated constants have been removed. If we want to do the same in the future, we can do it the same way, but we need a hack (because it is not officially supported by Java 5): [http://forums.sun.com/thread.jspa?threadID=5137742] So it works, but not with switch statements. Just as a comment. But in my opinion, renaming enum constants is a bad thing...
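One way such a rename could be handled is sketched below. This is an assumption for illustration and not necessarily the hack described in the linked forum thread: the old name is kept as a deprecated static field that aliases the new constant. The Index enum here is a hypothetical stand-in, not Lucene's Field.Index.

```java
final class RenamedConstantDemo {
    private RenamedConstantDemo() {}

    // Hypothetical enum standing in for one whose constant was renamed.
    enum Index {
        ANALYZED, NOT_ANALYZED;

        /** Old name kept as an alias. It is a plain static field, not an enum constant. */
        @Deprecated
        public static final Index TOKENIZED = ANALYZED;
    }

    static String label(Index index) {
        switch (index) {
            case ANALYZED:
                return "analyzed";
            // "case TOKENIZED:" would not compile here: a case label in an enum
            // switch must be the name of a real enum constant, not an alias field.
            default:
                return "not analyzed";
        }
    }
}
```

With this layout, comparisons such as Index.TOKENIZED == Index.ANALYZED hold and values() still lists only the real constants, but the alias cannot be used as a case label, which matches the limitation with switch statements mentioned in the comment.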
[jira] Updated: (LUCENE-2000) Use covariant clone() return types
[ https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2000:
--
Attachment: LUCENE-2000-clone_covariance.patch

Use covariant clone() return types
--
Key: LUCENE-2000 URL: https://issues.apache.org/jira/browse/LUCENE-2000 Project: Lucene - Java Issue Type: Task Affects Versions: 3.0 Reporter: Uwe Schindler Attachments: LUCENE-2000-clone_covariance.patch

*Paul Cown wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core.

i.e. this:

    public Object clone() {

changes to:

    public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

    if (clone == null) clone = (SpanNotQuery) this.clone();

becomes

    if (clone == null) clone = this.clone();

Almost everything has been done and all downcasts removed, in core, with the exception of

- Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone()
- Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit.

Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.
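The change described in the issue text above can be sketched in a self-contained form. SimpleQuery is a hypothetical class used only for illustration; it is not Lucene's SpanNotQuery, and the boost field is an assumption.

```java
final class CovariantCloneDemo {
    private CovariantCloneDemo() {}

    // Hypothetical query class, loosely modelled on the SpanNotQuery example.
    static class SimpleQuery implements Cloneable {
        float boost = 1.0f;

        @Override
        public SimpleQuery clone() {           // covariant: returns SimpleQuery, not Object
            try {
                // Object.clone() is declared to return Object, so this cast remains.
                return (SimpleQuery) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);   // not expected: the class is Cloneable
            }
        }
    }

    static SimpleQuery copyWithBoost(SimpleQuery original, float boost) {
        SimpleQuery copy = original.clone();   // no cast needed at the call site
        copy.boost = boost;
        return copy;
    }
}
```

The cast disappears at call sites such as copyWithBoost, while the cast on super.clone() inside the method stays, because Object.clone() itself still returns Object.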
[jira] Created: (LUCENE-2000) Use covariant clone() return types
Use covariant clone() return types -- Key: LUCENE-2000 URL: https://issues.apache.org/jira/browse/LUCENE-2000 Project: Lucene - Java Issue Type: Task Affects Versions: 3.0 Reporter: Uwe Schindler Attachments: LUCENE-2000-clone_covariance.patch *Paul Cown wrote in LUCENE-1257:* OK, thought I'd jump in and help out here with one of my Java 5 favourites. Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions.
[jira] Updated: (LUCENE-2000) Use covariant clone() return types
[ https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2000: -- Description: *Paul Cown wrote in LUCENE-1257:* OK, thought I'd jump in and help out here with one of my Java 5 favourites. Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions. was: *Paul Cown wrote in LUCENE-1257:* OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions.
[jira] Updated: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1257: -- Attachment: (was: LUCENE-1257-clone_covariance.patch) Port to Java5 - Key: LUCENE-1257 URL: https://issues.apache.org/jira/browse/LUCENE-1257 Project: Lucene - Java Issue Type: Improvement Components: Analysis, Examples, Index, Other, Query/Scoring, QueryParser, Search, Store, Term Vectors Affects Versions: 3.0 Reporter: Cédric Champeau Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: instantiated_fieldable.patch, LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, LUCENE-1257-CompoundFileReaderWriter.patch, LUCENE-1257-ConcurrentMergeScheduler.patch, LUCENE-1257-DirectoryReader.patch, LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, LUCENE-1257-IndexDeleter.patch, LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, 
LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, LUCENE-1257_unnecessary_casts.patch, lucene1257surround1.patch, lucene1257surround1.patch, shinglematrixfilter_generified.patch

For my needs I've updated Lucene so that it uses Java 5 constructs. I know Java 5 migration had been planned for 2.1 at some point in the past, but I don't know when it is planned now. This patch against the trunk includes:

- most obvious generics usage (there are tons of usages of sets, ... Those which are commonly used have been generified)
- PriorityQueue generification
- replacement of indexed for loops with for-each constructs
- removal of unnecessary unboxing

The code is, in my opinion, much more readable with those features (you actually *know* what is stored in collections when reading the code, without needing to look up field definitions every time) and it simplifies many algorithms.

Note that this patch also includes an interface for the Query class. This has been done for my company's needs for building custom Query classes which add some behaviour to the base Lucene queries. It prevents multiple unnecessary casts. I know this introduction is not wanted by the team, but it really makes our developments easier to maintain. If you don't want to use this, replace all /Queriable/ calls with standard /Query/.
[jira] Updated: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1257: -- Comment: was deleted (was: OK, thought I'd jump in and help out here with one of my Java 5 favourites. Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of * Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery -- this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() * Some IndexReaders, e.g. DirectoryReader -- we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions. 
)
[jira] Commented: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768136#action_12768136 ] Uwe Schindler commented on LUCENE-1257: --- Created a new issue out of the clone covariance patch: LUCENE-2000
[jira] Updated: (LUCENE-2000) Use covariant clone() return types
[ https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2000: -- Description: *Paul Cowan wrote in LUCENE-1257:* OK, thought I'd jump in and help out here with one of my Java 5 favourites. Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions. was: *Paul Cown wrote in LUCENE-1257:* OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address this, so thought I'd throw a patch out there (against SVN HEAD @ revision 827821) which uses Java 5 covariant return types for (almost) all of the Object#clone() implementations in core. i.e. this: public Object clone() { changes to: public SpanNotQuery clone() { which lets us get rid of a whole bunch of now-unnecessary casts, so e.g. if (clone == null) clone = (SpanNotQuery) this.clone(); becomes if (clone == null) clone = this.clone(); Almost everything has been done and all downcasts removed, in core, with the exception of Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of a SpanQuery to a SpanQuery - this can't be made covariant without declaring abstract SpanQuery clone() in SpanQuery itself, which breaks those SpanQuerys that don't declare their own clone() Some IndexReaders, e.g. DirectoryReader - we can't be more specific than changing .clone() to return IndexReader, because it returns the result of IndexReader.clone(boolean). We could use covariant types for THAT, which would work fine, but that didn't follow the pattern of the others so that could be a later commit. Two changes were also made in contrib/, where not making the changes would have broken code by trying to widen IndexInput#clone() back out to returning Object, which is not permitted. contrib/ was otherwise left untouched. Let me know what you think, or if you have any other questions. Priority: Minor (was: Major)
[jira] Issue Comment Edited: (LUCENE-2000) Use covariant clone() return types
[ https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768141#action_12768141 ]

Uwe Schindler edited comment on LUCENE-2000 at 10/21/09 9:14 AM:
-

I moved this to an extra issue, because there is some discussion needed. I am strongly against this for various reasons:

- Java 5 itself does not override clone() with a covariant return type (nowhere!). So e.g. String.clone() always returns jl.Object.
- This is because of backwards problems (which are not easy to explain) -- it has something to do with what happens if a subclass compiled against the Java 1.4 version of Lucene overrides clone and calls super.clone(). Because of this, the JDK does not provide String.clone() returning String. javac does its best to prevent problems here, but for APIs that need to be backwards compatible, it should return Object as always.
- Covariant clone return types require that *all* subclasses of a class that originally implemented a covariant clone() also override it covariantly, to be consistent. And because of this you have consistency problems (see your IndexReader problem). This is not possible for backwards compatibility. Because of this, covariant clone should only be done for internal classes (package-private, private) or final classes.

Another example of this problem is AttributeImpl, which defines a clone() method. Subclasses would need to override this covariant clone() method. Custom Attributes compiled against Lucene 2.9 would fail to do this - MethodNotFoundException (I tried it out, it breaks)

Because of all these problems, I prefer to always cast the return value of clone(). This is not unsafe (and because of this you get no unchecked warning), because you always know how to cast the clone result. By the way: you still have to always cast the super.clone() call, so you do not get any pros of using covariant return types.

I do not want to start a flame war here, but we should not do this.

was (Author: thetaphi): the same comment, except that AttributeImpl was said to define "an abstract clone method".
Use covariant clone() return types
--
Key: LUCENE-2000
URL: https://issues.apache.org/jira/browse/LUCENE-2000
Project: Lucene - Java
Issue Type: Task
Affects Versions: 3.0
Reporter: Uwe Schindler
Priority: Minor
Attachments: LUCENE-2000-clone_covariance.patch
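For readers following the covariance argument, here is a minimal sketch of the pattern under discussion. The classes Shape and Circle are made up for illustration and are not Lucene classes. It shows a covariant clone() override, the cast of super.clone() that is still required inside it, and the synthetic bridge method javac emits so that callers compiled against "Object clone()" still link. It does not reproduce the binary-compatibility failure described in the comment.

```java
import java.lang.reflect.Method;

// Hypothetical base class with the pre-Java-5 style signature.
class Shape implements Cloneable {
    int id;

    Shape(int id) { this.id = id; }

    @Override
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Shape is Cloneable
        }
    }
}

// Hypothetical subclass that narrows the return type (covariant override).
class Circle extends Shape {
    double radius;

    Circle(int id, double radius) { super(id); this.radius = radius; }

    @Override
    public Circle clone() {
        // The cast is still needed here because Shape.clone() returns Object.
        return (Circle) super.clone();
    }
}

class CovariantCloneDemo {
    // Counts compiler-generated bridge methods named clone() declared on c.
    static int countBridgeCloneMethods(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals("clone") && m.isBridge()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Circle a = new Circle(1, 2.5);
        Circle b = a.clone(); // no cast at the call site
        System.out.println(b.id + " " + b.radius);
        System.out.println(countBridgeCloneMethods(Circle.class));
    }
}
```

Callers of Circle.clone() lose the cast, but every subclass of Circle would have to narrow the return type again to stay consistent, which is the consistency concern raised above.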
Re: lucene 2.9 sorting algorithm
OK, thanks. I can help out if you've got questions on the python code... it's rather straightforward: it just iterates over each set of params to test, writes an alg file, runs it, opens the resulting output, parses it for the best run, confirms both single and multi PQ gave precisely the same doc IDs, and prints the results.

It's remotely possible the difference in the results is a bug/overhead in contrib/benchmark itself, which'd be good to get to the bottom of anyway.

Mike

On Tue, Oct 20, 2009 at 9:17 PM, John Wang john.w...@gmail.com wrote:
Hi Mike: That's weird. Let me take a look at the patch. Need to brush up on python though :) Thanks -John

On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless luc...@mikemccandless.com wrote:
OK I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple python wrapper to run old/new tests across different queries, sort, topN, etc. But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's different between our tests. Mike

On Mon, Oct 19, 2009 at 9:28 PM, John Wang john.w...@gmail.com wrote:
Hi Michael: Was wondering if you got a chance to take a look at this. Since deprecated APIs are being removed in 3.0, I was wondering if/when we would decide on keeping the ScoreDocComparator API, so that it would be kept for Lucene 3.0. Thanks -John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless luc...@mikemccandless.com wrote:
Oh, no problem... Mike

On Fri, Oct 16, 2009 at 12:33 PM, John Wang john.w...@gmail.com wrote:
Mike, just a clarification on my first perf report email. In the first section, numHits is incorrectly labeled; it should be 20 instead of 50. Sorry about the possible confusion. Thanks -John

On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless luc...@mikemccandless.com wrote:
Thanks John; I'll have a look.
Mike

On Fri, Oct 16, 2009 at 12:57 AM, John Wang john.w...@gmail.com wrote:
Hi Michael: I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old api for ScoreDocComparator and SortComparatorSource would work. Please take a look. Thanks -John

On Thu, Oct 15, 2009 at 6:52 PM, John Wang john.w...@gmail.com wrote:
Hi Michael: It is open, http://code.google.com/p/lucene-book/source/checkout I think I sent the https url instead, sorry. The multi PQ sorting is fairly self-contained. I have 2 versions, 1 for string and 1 for int; each is a Collector impl. I shouldn't say the Multi PQ is faster on int sort; it is within the error boundary. The diff is very very small, I would say they are more or less equal. If you think it is a good thing to go this way (if not for the perf, just for the simpler api), I'd be happy to work on a patch. Thanks -John

On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless luc...@mikemccandless.com wrote:
John, looks like this requires login -- any plans to open that up, or, post the code on an issue? How self-contained is your Multi PQ sorting? EG is it a standalone Collector impl that I can test? Mike

On Thu, Oct 15, 2009 at 6:33 PM, John Wang john.w...@gmail.com wrote:
BTW, we have a little sandbox for these experiments, and all my test code is there. It is not very polished. https://lucene-book.googlecode.com/svn/trunk -John

On Thu, Oct 15, 2009 at 3:29 PM, John Wang john.w...@gmail.com wrote:
Numbers Mike requested for Int types: only the time/cputime are posted, others are all the same since the algorithm is the same.
Lucene 2.9:

    numHits   time       cpu
    10        14619495   146126
    20        14550568   163242
    100       16467647   178379

my test:

    numHits   time       cpu
    10        14101094   144715
    20        14804821   151305
    100       15372157   158842

Conclusions: They are very similar; the differences are all within error bounds, especially with lower PQ sizes, where the second sort alg is again slightly faster. Hope this helps. -John

On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless luc...@mikemccandless.com wrote:
Though it'd be odd if the switch to searching by segment really was most of the gains here. I had assumed that much of the improvement was due to ditching
Re: lucene 2.9 sorting algorithm
On Tue, Oct 20, 2009 at 11:55 AM, John Wang john.w...@gmail.com wrote:
the simpler api places less restriction on the type of custom sorting that can be done.

Just to verify: this is not a back-compat break, right? Because, in 2.4, such an interesting custom sort must've been operating at the top-level index reader level, which is easy to carry over to 2.9 (you just rebase the docIDs).

But, of course in moving to 2.9, you would like to also switch your custom sort to be per-segment (for faster reopen/near real-time perf), but the new sort API makes this more difficult because it requires that you are able to compare hits across different segments during the search, not just at the end.

But then I don't understand the difficulty of doing that: if we had a Collector with the MultiPQ approach, at the end during merge, you'd also have to compare results across segments, ie, upgrade your ords to their real values. The MultiPQ approach does this by calling sortValue (returns Comparable) in the end.

Putting performance aside for now... when comparing bottom, you don't actually have to truly invert Comparable -> ord on segment transition. You could, instead, get the Comparable for each and compare, but then note the smallest ord for the current segment that has failed to compete, and short-circuit the compareBottom test by checking against that ord. That should enable carrying over the custom sort to the single PQ API without needing to invert ord -> value. We'd obviously have to test performance...

Or, we could commit the MultiPQ approach as another sorting collector? I know it's not great having two wildly different sort APIs, but both APIs seem to have their strengths in different cases.

Mike
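To make the MultiPQ idea in this thread concrete, here is a rough sketch: one priority queue per segment ordered by cheap per-segment ords, with ords upgraded to real values only at the final merge. Segment and Hit are hypothetical stand-ins invented for this example; this is not Lucene's Collector or FieldComparator API, and it is not the code from either patch mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class MultiPQSketch {
    /** Hypothetical segment: each doc holds an ord into this segment's sorted term table. */
    static final class Segment {
        final int docBase;
        final String[] sortedTerms;
        final int[] ordOfDoc;

        Segment(int docBase, String[] sortedTerms, int[] ordOfDoc) {
            this.docBase = docBase;
            this.sortedTerms = sortedTerms;
            this.ordOfDoc = ordOfDoc;
        }
    }

    /** A hit with a top-level doc id and its real (comparable) sort value. */
    static final class Hit {
        final int doc;
        final String value;

        Hit(int doc, String value) { this.doc = doc; this.value = value; }
    }

    /** Returns the n smallest hits across all segments; n must be at least 1. */
    static List<Hit> topN(List<Segment> segments, int n) {
        List<Hit> candidates = new ArrayList<>();
        for (Segment seg : segments) {
            // Max-heap on ord: the head is the weakest entry of this segment's top n.
            PriorityQueue<Integer> pq = new PriorityQueue<>(
                n, (a, b) -> Integer.compare(seg.ordOfDoc[b], seg.ordOfDoc[a]));
            for (int doc = 0; doc < seg.ordOfDoc.length; doc++) {
                if (pq.size() < n) {
                    pq.add(doc);
                } else if (seg.ordOfDoc[doc] < seg.ordOfDoc[pq.peek()]) {
                    pq.poll();
                    pq.add(doc);
                }
            }
            // Only now are ords turned into real values (the "sortValue" step).
            for (int doc : pq) {
                candidates.add(new Hit(seg.docBase + doc, seg.sortedTerms[seg.ordOfDoc[doc]]));
            }
        }
        // Final merge compares real values across segments, ties broken by doc id.
        candidates.sort(Comparator.comparing((Hit h) -> h.value).thenComparingInt(h -> h.doc));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

During collection only int ords are compared, so no cross-segment comparison is needed until the end. The cost is that up to n entries are kept per segment rather than n overall.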
[jira] Commented: (LUCENE-1999) Match spotter for all query types
[ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768163#action_12768163 ] Michael McCandless commented on LUCENE-1999: Very clever! Since you are wrapping arbitrary query objs, couldn't the wrapper make a separate data structure for tracking which clause matched (instead of encoding it into the score)? Also: doesn't highlighter run, separately, on each doc? And so it's OK if the scores are affected? Ie, I would run my main search with a normal query, get the 10 results for the current page, then step through each of those 10 doc IDs make a single-doc-IndexSearcher, and run this wrapper? {quote} Avoiding these precision issues would require a change to Lucene core to record docId, score AND a matchFlag byte in ScoreDoc objects and collector APIs. This may be something we should consider. {quote} +1 I would love to see the Scorer API extended to optionally provide details on matches. Not just which clause matched which docs/fields, but the positions within the field where the match occurred. I think we could do this by absorbing *SpanQuery into their normal Query counterparts, making the getSpans API [somehow] optional so that if you didn't invoke it you don't pay a performance price. Match spotter for all query types - Key: LUCENE-1999 URL: https://issues.apache.org/jira/browse/LUCENE-1999 Project: Lucene - Java Issue Type: New Feature Affects Versions: 2.9 Reporter: Mark Harwood Attachments: matchflagger.patch Related to LUCENE-1929 and the current inability to highlight NumericRangeQuery, spatial, cached term filters and other exotica. This patch provides the ability to wrap *any* Query objects and record match info as flags encoded in the overall document score. Using this approach it would be possible to understand (and therefore highlight) which fields matched clauses in a query. 
The match encoding approach loses some precision in scores as noted here: http://tinyurl.com/ykt8nx7 Avoiding these precision issues would require a change to Lucene core to record docId, score AND a matchFlag byte in ScoreDoc objects and collector APIs. This may be something we should consider. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
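The precision loss mentioned in the description can be illustrated with a small sketch. It is not taken from matchflagger.patch; it assumes, purely for illustration, that match flags overwrite the low 8 mantissa bits of a positive, finite float score. The names ScoreFlagSketch, encode and decodeFlags are invented here.

```java
class ScoreFlagSketch {
    static final int FLAG_BITS = 8;                 // assumed flag width
    static final int MASK = (1 << FLAG_BITS) - 1;   // 0xFF

    /** Overwrites the low mantissa bits of score with flags (positive finite scores only). */
    static float encode(float score, int flags) {
        int bits = Float.floatToIntBits(score);
        return Float.intBitsToFloat((bits & ~MASK) | (flags & MASK));
    }

    /** Reads the flags back out of an encoded score. */
    static int decodeFlags(float encoded) {
        return Float.floatToIntBits(encoded) & MASK;
    }

    public static void main(String[] args) {
        float encoded = encode(1.5f, 0b101);
        System.out.println(encoded + " flags=" + decodeFlags(encoded));
    }
}
```

The flags survive a round trip through the float, but the score no longer equals the original value, which is why a separate matchFlag field in ScoreDoc, as proposed above, would avoid the issue.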
[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1987:
--
Attachment: LUCENE-1987-StopFilter.patch

A new patch which resolves the Benchmark problem by adding a static method in NewAnalyzerTask that loads an analyzer by class name:

{code}
public static final Analyzer createAnalyzer(String className) throws Exception {
  final Class<? extends Analyzer> clazz = Class.forName(className).asSubclass(Analyzer.class);
  try {
    // first try to use a ctor with version parameter
    // (needed for many new Analyzers that have no default one anymore)
    Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class);
    return cnstr.newInstance(Version.LUCENE_CURRENT);
  } catch (NoSuchMethodException nsme) {
    // otherwise use default ctor
    return clazz.newInstance();
  }
}
{code}

This method is reused at other places where an Analyzer is created by a config property.

This patch now passes all tests. There are still the problems with Analyzer and QueryParser having wrong default properties, but I would like to commit this first and then solve those problems, also in 2.9.1. Mike, are you OK with that?
Remove rest of analysis deprecations (Token, CharacterCache)
Key: LUCENE-1987
URL: https://issues.apache.org/jira/browse/LUCENE-1987
Project: Lucene - Java
Issue Type: Task
Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 2.9.1, 3.0
Attachments: LUCENE-1987-StopFilter-backport29.patch, LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch

This removes the rest of the deprecations in the analysis package:
- -Token's termText field- (DONE)
- -eventually un-deprecate ctors of Token taking Strings (they are still useful) - if yes remove deprec in 2.9.1- (DONE)
- -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
- Stopwords lists
- Remove the backwards settings from analyzers (acronym, posIncr,...). They are deprecated, but we still have the VERSION constants. Do not know how to proceed. Keep the settings alive for index compatibility? Or remove them together with the version constants (which were undeprecated).
[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768167#action_12768167 ]

Michael McCandless commented on LUCENE-1987:

bq. Mike, are you OK with that?

Looks great! Not only am I OK with it, it's exactly what I proposed (above -- https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449). Maybe you missed my response there? (I also suggested adding Version to QP ctor).
[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768178#action_12768178 ]

Uwe Schindler commented on LUCENE-1987:
---
I saw your comment yesterday and implemented the benchmark thing that way. The QP ctor with Version param also looks good, but we have to add this to 2.9, too, to be able to remove the no-arg ctor. My patch still has a failed test in the ant task (missing no-arg ctor); I will look into it, but the fix is the same as for benchmark.
[jira] Created: (LUCENE-2001) wordnet parsing bug
wordnet parsing bug
---
Key: LUCENE-2001
URL: https://issues.apache.org/jira/browse/LUCENE-2001
Project: Lucene - Java
Issue Type: Bug
Components: contrib/*
Affects Versions: 2.9
Reporter: Robert Muir
Priority: Minor

A user reported that wordnet parses the prolog file incorrectly. Also need to check the wordnet parser in the memory contrib for this problem. If this is a false alarm, I'm not worried, because the test will be the first unit test the wordnet package has ever had.

{noformat}
For example, looking up the synsets for the word "king", we get:

java SynLookup wnindex king
baron magnate mogul power queen rex scrofula struma tycoon

Here, scrofula and struma are extraneous. This happens because the line parser code in Syns2Index.java interprets the two consecutive single quotes in the entry

s(114144247,3,'king''s evil',n,1,1)

in the wn_s.pl file as the termination of the string, and separates it into "king". This entry concerns the synset of the words scrofula and struma, and thus they get inserted in the synset of king. *There are 1382 such entries in wn_s.pl*, and more in other WordNet Prolog data-base files, where such use of two consecutive single quotes appears.

We have resolved this by adding a statement in the line parsing portion of Syns2Index.java, as follows:

// parse line
line = line.substring(2);
line = line.replaceAll("\'\'", "`"); // added statement
int comma = line.indexOf(',');
String num = line.substring(0, comma);
... ... etc.

In short we replace '' by ` (a back-quote). Then on recreating the index, we get:

java SynLookup zwnindex king
baron magnate mogul power queen rex tycoon
{noformat}
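As an alternative to substituting a back-quote, the line could be split with a small quote-aware scanner. In Prolog syntax, two consecutive single quotes inside a quoted atom stand for one literal quote. The sketch below is a standalone illustration, not the code in Syns2Index.java; the class and method names are invented, and it assumes each line has the form functor(arg,...,arg). with balanced parentheses.

```java
import java.util.ArrayList;
import java.util.List;

class PrologFieldSplitter {
    /** Splits the arguments of a fact such as s(114144247,3,'king''s evil',n,1,1). */
    static List<String> splitArgs(String line) {
        // Assumes the line contains '(' and a later ')'.
        String args = line.substring(line.indexOf('(') + 1, line.lastIndexOf(')'));
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (inQuote) {
                if (c != '\'') {
                    cur.append(c);
                } else if (i + 1 < args.length() && args.charAt(i + 1) == '\'') {
                    cur.append('\''); // doubled quote inside an atom is a literal quote
                    i++;
                } else {
                    inQuote = false;  // a lone quote closes the atom
                }
            } else if (c == '\'') {
                inQuote = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitArgs("s(114144247,3,'king''s evil',n,1,1)."));
    }
}
```

With this approach the third field comes out as the phrase "king's evil" with its apostrophe intact, and commas inside quoted atoms are not treated as separators.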
[jira] Commented: (LUCENE-1999) Match spotter for all query types
[ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768191#action_12768191 ]

Michael McCandless commented on LUCENE-1999:

I see, it sounds like your use case is different from the typical highlighting use case in that 1) you don't need the positions of the matches (just whether a given clause matched the doc or not), and 2) you need it for every single doc visited by the query, not just for the handful of docs that are being presented to the user on the current page.

bq. This would suggest that you might need 2 query expressions - one for execution and one for adding highlighter instrumentation.

I'm thinking it's the same query, but we fix the Scorer API for all queries (= big change!!) to be able to produce match details on demand, where those match details look something like what getSpans now returns. But for the normal case (only highlighting the docs being shown on current page), we'd only get the match details for that small set of docs. Then we ideally would not need a separate mirrored set of span queries. Ie, SpanTermQuery would be absorbed into TermQuery, etc.

But I could easily be being too naive here :) Maybe there is some serious performance cost to even adding the optional API in.
[jira] Resolved: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-1987.
---
Resolution: Fixed

Committed in 2.9, 3.0, backwards branch. For the QueryParser problems and other additions of version constants I will open another issue.

Remove rest of analysis deprecations (Token, CharacterCache)
Key: LUCENE-1987
URL: https://issues.apache.org/jira/browse/LUCENE-1987
Project: Lucene - Java
Issue Type: Task
Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 2.9.1, 3.0
Attachments: LUCENE-1987-StopFilter-backport29.patch, LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch
[jira] Updated: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1998:
--
Attachment: LUCENE-1998_enum.patch

Updated patch (merged with StandardAnalyzer version constants). Also added Lucene version 3.0 for completeness, so that users can build apps without needing to use the CURRENT constant.

Use Java 5 enums
Key: LUCENE-1998
URL: https://issues.apache.org/jira/browse/LUCENE-1998
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 3.0
Reporter: DM Smith
Assignee: Uwe Schindler
Priority: Minor
Fix For: 3.0
Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch

Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums.
[jira] Created: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
Add oal.util.Version ctor to QueryParser
Key: LUCENE-2002
URL: https://issues.apache.org/jira/browse/LUCENE-2002
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 2.9, 3.0
Reporter: Uwe Schindler
Fix For: 3.0, 2.9

This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position increments for stop words, but QueryParser ignores them by default. The user has to explicitly enable them.

This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually StopAnalyzer should also add this version ctor.

To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1.
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768215#action_12768215 ]

DM Smith commented on LUCENE-1998:
--
bq. I only added the license header back in the Version class. It must be there.

Sorry about whacking the license on Version. It must have been an accident. I know it needs to be there.

bq. Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser.

Thanks. Inadvertently, I was following the pattern for an Interface, where scoping does not matter.

bq. So it works, but not with switch statements.

IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design :) By adding methods to each enum that return literals, we can eliminate this and at the same time improve performance.

There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It would clearly document what controls the flag. The only complication would be the set/get for some of the flags (e.g. AbstractField.setOmitNorms). What's with that? Are the enum values merely a hint??? Does it make sense to allow omitNorms to be changed after an AbstractField is being used?
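The point about replacing switch statements with per-constant methods can be sketched as follows. IndexMode and its methods are hypothetical names chosen for this example; they are not Lucene's actual Field.Index enum or the code in the attached patch.

```java
class EnumMethodsSketch {
    /** Hypothetical indexing options; each constant answers for itself. */
    enum IndexMode {
        NO {
            @Override public boolean isIndexed() { return false; }
            @Override public boolean isAnalyzed() { return false; }
        },
        ANALYZED {
            @Override public boolean isIndexed() { return true; }
            @Override public boolean isAnalyzed() { return true; }
        },
        NOT_ANALYZED {
            @Override public boolean isIndexed() { return true; }
            @Override public boolean isAnalyzed() { return false; }
        };

        // Declared public so the constant bodies do not widen visibility.
        public abstract boolean isIndexed();
        public abstract boolean isAnalyzed();
    }

    /** Caller code needs no switch or if/else chain over the constants. */
    static String describe(IndexMode mode) {
        return mode + ": indexed=" + mode.isIndexed() + ", analyzed=" + mode.isAnalyzed();
    }

    public static void main(String[] args) {
        for (IndexMode m : IndexMode.values()) {
            System.out.println(describe(m));
        }
    }
}
```

Adding a new constant then forces its author to implement every abstract method, whereas a switch in caller code would silently fall through to a default branch.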
[jira] Commented: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768234#action_12768234 ] Uwe Schindler commented on LUCENE-1257: --- Committed: - LUCENE-1257_more_unnecessary_casts.patch - Remove the rest of unchecked warnings. I added a TODO, where I do not understand the code and not for sure know, whats inside the collections. This could be fixed some time later. But the core code now compiles without any unchecked warning. Revision: 828011 Port to Java5 - Key: LUCENE-1257 URL: https://issues.apache.org/jira/browse/LUCENE-1257 Project: Lucene - Java Issue Type: Improvement Components: Analysis, Examples, Index, Other, Query/Scoring, QueryParser, Search, Store, Term Vectors Affects Versions: 3.0 Reporter: Cédric Champeau Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: instantiated_fieldable.patch, LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, LUCENE-1257-CompoundFileReaderWriter.patch, LUCENE-1257-ConcurrentMergeScheduler.patch, LUCENE-1257-DirectoryReader.patch, LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, LUCENE-1257-IndexDeleter.patch, LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, LUCENE-1257_unnecessary_casts.patch, lucene1257surround1.patch, lucene1257surround1.patch, shinglematrixfilter_generified.patch For my needs I've updated Lucene so that it uses Java 5 constructs. I know Java 5 migration had been planned for 2.1 someday in the past, but don't know when it is planned now. This patch against the trunk includes : - most obvious generics usage (there are tons of usages of sets, ... Those which are commonly used have been generified) - PriorityQueue generification - replacement of indexed for loops with for each constructs - removal of unnececessary unboxing The code is to my opinion much more readable with those features (you actually *know* what is stored in collections reading the code, without the need to lookup for field definitions everytime) and it simplifies many algorithms. Note that this patch also includes an interface for the Query class. This has been done for my company's needs for building custom Query classes which add some behaviour to the base Lucene queries. It prevents multiple unnnecessary casts. I know this introduction is not wanted by the team, but it really makes our developments easier to maintain. If you don't want to use this, replace all /Queriable/ calls with standard /Query/. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768242#action_12768242 ] Uwe Schindler commented on LUCENE-1998: --- (it's bq. not .bq :-) ) {quote} bq. So it works, but not with switch statements. IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design :) By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance. {quote} You are right, my problem was more for client code of Lucene that may for example have a switch statement on Field.Index (e.g. Solr) to control some further indexing steps. If we rename a constant, already compiled code would keep working, but the switch statement would break as soon as the code is recompiled against the modified version. That was my problem. In 3.0 this will not happen as there are no deprecated enum constants, but maybe later. In this case, a CHANGES.txt entry should be added. bq. There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It should clearly document what controls the flag. This is currently not possible because of backwards compatibility: the fields are protected and not deprecated in 2.9. I think with your change we are fine. bq. The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint? Does it make sense to allow omitNorms to be changed after an AbstractField is being used? It is perfectly legal to change these settings after creating the field, so the setters must be there. 
[jira] Issue Comment Edited: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768215#action_12768215 ] DM Smith edited comment on LUCENE-1998 at 10/21/09 2:22 PM: bq. I only added the license header back in the Version class. It must be there. Sorry about whacking the license on Version. It must have been an accident. I know it needs to be there. bq. Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser. Thanks. Inadvertently, I was following the pattern for an interface, where scoping does not matter. bq. So it works, but not with switch statements. IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design :) By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance. There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It should clearly document what controls the flag. The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint? Does it make sense to allow omitNorms to be changed after an AbstractField is being used? Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1999) Match spotter for all query types
[ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768257#action_12768257 ] Mark Harwood commented on LUCENE-1999: -- bq. and 2) you need it for every single doc visited by the query Actually I don't need it for every doc, only the top ones - it just happens to be so cheap to produce that I can afford to run this in-line with the query. (I haven't actually benchmarked it at scale but my gut feel is it would be fast.) I was thinking that this might be orthogonal to the existing free-text based highlighter. The logic for this being roughly that
1) Highlighting of free-text fields is reasonably well-catered for with summarisation etc.
2) The remaining problem areas for highlighting (NumericRangeQuery, Spatial, Cached term filters on enums e.g. gender:male/female) are all likely to be non-free-text fields which don't require summarisation and only contain a single value.
I may be wrong in these assumptions about the existing state of play (any thoughts, Mark M?) but it might be useful to think of attacking the problem with these 2 different requirements in mind. Regardless of type (e.g. int, long etc.) I tend to think of fields as falling into these broad usage categories:
a) Identifiers (e.g. primary keys)
b) Quantifiers (e.g. numerics, dates, spatial)
c) Free-text
d) Controlled vocabularies (e.g. enums such as gender:m/f)
Type a) is catered for with a straight TermQuery and therefore can be handled with the existing highlighter.
Type b) needs special indexes/queries (spatial/trie) and isn't catered for by the existing term/span-based Highlighter.
Type c) is catered for with the existing highlighter and its summarising features.
Type d) involves many TermDoc.next reads so is usefully cached as filters and therefore not catered for by the existing Highlighter.
So this patch helps cater for types b) and d) where simply knowing the field matched is all that is required to highlight. 
Match spotter for all query types - Key: LUCENE-1999 URL: https://issues.apache.org/jira/browse/LUCENE-1999 Project: Lucene - Java Issue Type: New Feature Affects Versions: 2.9 Reporter: Mark Harwood Attachments: matchflagger.patch Related to LUCENE-1929 and the current inability to highlight NumericRangeQuery, spatial, cached term filters and other exotica. This patch provides the ability to wrap *any* Query objects and record match info as flags encoded in the overall document score. Using this approach it would be possible to understand (and therefore highlight) which fields matched clauses in a query. The match encoding approach loses some precision in scores as noted here: http://tinyurl.com/ykt8nx7 Avoiding these precision issues would require a change to Lucene core to record docId, score AND a matchFlag byte in ScoreDoc objects and collector APIs. This may be something we should consider. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
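The idea of carrying match flags inside the score can be pictured with the sketch below. It is only an assumption about how such an encoding could look (flags stored in the lowest mantissa bits of a finite float score); the class and method names are hypothetical, and the attached matchflagger.patch may well do it differently.

```java
public class MatchFlagSketch {

    // Assumption for this sketch: one byte of flags, one bit per wrapped clause.
    private static final int FLAG_BITS = 8;
    private static final int FLAG_MASK = (1 << FLAG_BITS) - 1;

    /** Overwrites the lowest mantissa bits of a finite score with the flag bits. */
    public static float encode(float score, int flags) {
        int bits = Float.floatToIntBits(score);
        return Float.intBitsToFloat((bits & ~FLAG_MASK) | (flags & FLAG_MASK));
    }

    /** Reads the flag bits back out of an encoded score. */
    public static int decode(float encodedScore) {
        return Float.floatToIntBits(encodedScore) & FLAG_MASK;
    }

    public static void main(String[] args) {
        int matchedClauses = 0b101;          // e.g. clauses 0 and 2 matched
        float encoded = encode(1.5f, matchedClauses);
        System.out.println("flags = " + decode(encoded));
        System.out.println("score drift = " + Math.abs(encoded - 1.5f));
    }
}
```

The overwritten mantissa bits are exactly the precision loss mentioned in the issue description; a separate matchFlag field in ScoreDoc would avoid it.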
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768270#action_12768270 ] DM Smith commented on LUCENE-1998: -- I just noticed that enums are comparable. For the enum Version, we could take advantage of this and not store a number for each value. It would be important to maintain the order of versions in the file from earliest to latest. Should we do this? Then the current patch's (comments removed for clarity):
public enum Version {
  LUCENE_CURRENT (0), LUCENE_20 (2000), LUCENE_21 (2100), LUCENE_22 (2200), LUCENE_23 (2300), LUCENE_24 (2400), LUCENE_29 (2900), LUCENE_30 (3000);
  private Version(int v) { this.v = v; }
  public boolean onOrAfter(Version other) { return v == 0 || v >= other.v; }
  private final int v;
}
Would become (the comment on strict ordering is necessary):
public enum Version {
  // These have to be ordered from the oldest to the newest version
  LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30,
  // This needs to be last
  LUCENE_CURRENT;
  /** A convenience method merely calling this.compareTo(other) >= 0 */
  public boolean onOrAfter(Version other) { return compareTo(other) >= 0; }
}
Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768273#action_12768273 ] Uwe Schindler commented on LUCENE-1998: --- I thought about that, too: I would not do this. Especially because I want to have the 0-version (current) as first element for serialization purposes (changing the order of enum constants is bad, you should always add them at the end). Eventually we want to make the accessor to the integer v somehow public (for more specific comparisons and so on). Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2001) wordnet parsing bug
[ https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2001: Attachment: LUCENE-2001.patch Fix and tests for the bug. This only affects the wordnet contrib; the bug does not exist in the wordnet synonymfilter from the memory package, but the patch adds a test there too. wordnet parsing bug --- Key: LUCENE-2001 URL: https://issues.apache.org/jira/browse/LUCENE-2001 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 2.9 Reporter: Robert Muir Priority: Minor Attachments: LUCENE-2001.patch A user reported that wordnet parses the prolog file incorrectly. Also need to check the wordnet parser in the memory contrib for this problem. If this is a false alarm, I'm not worried, because the test will be the first unit test the wordnet package ever had.
{noformat}
For example, looking up the synsets for the word "king", we get:
java SynLookup wnindex king
baron magnate mogul power queen rex scrofula struma tycoon
Here, scrofula and struma are extraneous. This happens because the line parser code in Syns2Index.java interprets the two consecutive single quotes in entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as termination of the string and separates it into "king". This entry concerns the synset of the words scrofula and struma, and thus they get inserted in the synset of king. *There are 1382 such entries in wn_s.pl* and more in other WordNet Prolog data-base files, where such use of two consecutive single quotes appears. We have resolved this by adding a statement in the line parsing portion of Syns2Index.java, as follows:
// parse line
line = line.substring(2);
* line = line.replaceAll("\'\'", "`"); // added statement*
int comma = line.indexOf(',');
String num = line.substring(0, comma);
... ... etc.
In short we replace '' by ` (a back-quote).
Then on recreating the index, we get:
java SynLookup zwnindex king
baron magnate mogul power queen rex tycoon
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
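The reported failure and the proposed workaround can be reproduced with a small stand-alone sketch. PrologLineSketch and its two methods are hypothetical and are not the Syns2Index.java code; the sketch also uses String.replace(CharSequence, CharSequence) (Java 5+) where the report uses replaceAll, which for this literal pattern should behave the same.

```java
public class PrologLineSketch {

    /** Naive parsing: takes the text between the first two single quotes. */
    public static String parseWordNaive(String line) {
        line = line.substring(2);                 // drop the leading "s("
        int open = line.indexOf('\'');
        int close = line.indexOf('\'', open + 1); // stops at the doubled quote
        return line.substring(open + 1, close);
    }

    /** Same parsing, but doubled quotes are replaced by a back-quote first. */
    public static String parseWord(String line) {
        line = line.substring(2);                 // drop the leading "s("
        line = line.replace("''", "`");           // the added statement
        int open = line.indexOf('\'');
        int close = line.indexOf('\'', open + 1);
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String entry = "s(114144247,3,'king''s evil',n,1,1).";
        System.out.println("naive: " + parseWordNaive(entry));
        System.out.println("fixed: " + parseWord(entry));
    }
}
```

With the naive version the entry is filed under "king", which is how scrofula and struma end up in the wrong synset.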
[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2002: --- Fix Version/s: (was: 3.0) (was: 2.9) 2.9.1 Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Fix For: 2.9.1 This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position increments for stop words, but QueryParser ignores them by default. The user has to explicitly enable them. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually StopAnalyzer should also get this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added in 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768294#action_12768294 ] Michael McCandless commented on LUCENE-2002: Uwe I can take this if you want? Have you started? Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Fix For: 2.9.1 This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position Increments for stop words, but QueryParser ignores them per default. The user has to explicitely enable them. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually also StopAnalyzer should add this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768301#action_12768301 ] Uwe Schindler commented on LUCENE-2002: --- During 1987, I also found a bug in Highlighter, which is also not able to handle the posIncr of stopwords correctly. Add another issue? Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Fix For: 2.9.1 This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position Increments for stop words, but QueryParser ignores them per default. The user has to explicitely enable them. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually also StopAnalyzer should add this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768304#action_12768304 ] DM Smith commented on LUCENE-1998: -- bq. changing the order of enum constants is bad, you should always add them at the end Is this true? I did not know how Java serializes enums so I went looking: See: http://java.sun.com/j2se/1.5.0/docs/guide/serialization/relnotes15.html Turns out it serializes the text representation of the enum constant and class info. This is just like the Parameter class. If I understand it correctly, with this, an enum is resilient to changes in order. New constants can go in any place (for example, we can later add LUCENE_291 before LUCENE_30) and not break serialization compatibility. This is especially good for the future as it allows a path for deprecations. (E.g. deprecation of o.a.l.d.Field.Index.COMPRESS) So having LUCENE_CURRENT at the end is fine. If we wanted it first (or anywhere else) we could have onOrAfter be:
public boolean onOrAfter(Version other) { return this == LUCENE_CURRENT || compareTo(other) >= 0; }
If we wanted to expose version numbering info in the future, I'd suggest the following pattern (names are unimportant):
LUCENE_29 {
  public int getMajor() { return 2; }
  public int getMinor() { return 9; }
  public int getFix() { return 0; }
}
because it does not require storage and, unlike 2900, does not rely on positional notation (PIC code), e.g.
public int getMajor() { return 2900 / 1000; }
Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. 
Replace other custom enum patterns with Java 5 enums. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
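The claim that enum serialization goes by constant name, and so does not depend on declaration order, can be checked with a small experiment. This is only a sketch: the Version enum below is a local copy of the ordered variant proposed earlier in this thread, not the committed o.a.l.util.Version, and EnumSerializationSketch and roundTrip are names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

public class EnumSerializationSketch {

    // Local copy of the ordered variant discussed in this thread (oldest first).
    public enum Version {
        LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30,
        LUCENE_CURRENT; // kept last so that it compares as the newest

        public boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    /** Serializes a value with standard Java serialization. */
    public static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(value);
            out.close();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes and deserializes a value, returning the copy that was read back. */
    public static Object roundTrip(Object value) {
        try {
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(serialize(value)));
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(Version.LUCENE_29);
        // The stream carries the constant's name as text, not its ordinal.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println("name in stream: " + raw.contains("LUCENE_29"));
        System.out.println("same instance:  " + (roundTrip(Version.LUCENE_29) == Version.LUCENE_29));
    }
}
```

Because the name is what gets written, inserting a constant such as LUCENE_291 before LUCENE_30 later should not affect previously serialized values, only code that relies on ordinal().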
[jira] Assigned: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2002: -- Assignee: Michael McCandless Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position Increments for stop words, but QueryParser ignores them per default. The user has to explicitely enable them. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually also StopAnalyzer should add this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768308#action_12768308 ] Michael McCandless commented on LUCENE-2002: bq. Add another issue? +1! Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 This is a followup of LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position Increments for stop words, but QueryParser ignores them per default. The user has to explicitely enable them. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually also StopAnalyzer should add this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1998: -- Attachment: LUCENE-1998_enum_BW.patch LUCENE-1998_enum.patch I changed the Version enum. All tests still pass. I also added a test for the backwards branch that checks that the transition from Parameter -> enum is binary compatible and supported by Java's linker. I will commit soon. Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum_BW.patch Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter. Replace other custom enum patterns with Java 5 enums. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2001) wordnet parsing bug
[ https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2001: Lucene Fields: [New, Patch Available] (was: [New]) Fix Version/s: 3.0 2.9.1 Committed revision 828091 to trunk. I set fix for 2.9.1 here, in case someone has some free time to commit the patch. Thanks Parag! wordnet parsing bug --- Key: LUCENE-2001 URL: https://issues.apache.org/jira/browse/LUCENE-2001 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 2.9 Reporter: Robert Muir Priority: Minor Fix For: 2.9.1, 3.0 Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch, LUCENE-2001_branch.patch A user reported that wordnet parses the prolog file incorrectly. Also need to check the wordnet parser in the memory contrib for this problem. If this is a false alarm, I'm not worried, because the test will be the first unit test the wordnet package has ever had.
{noformat}
For example, looking up the synsets for the word king, we get:

java SynLookup wnindex king
baron magnate mogul power queen rex scrofula struma tycoon

Here, scrofula and struma are extraneous. This happens because the line parser code in Syns2Index.java interprets the two consecutive single quotes in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the termination of the string and separates out king. This entry concerns the synset of the words scrofula and struma, and thus they get inserted in the synset of king. *There are 1382 such entries in wn_s.pl*, and more in other WordNet Prolog database files, where two consecutive single quotes appear.

We have resolved this by adding a statement in the line parsing portion of Syns2Index.java, as follows:

// parse line
line = line.substring(2);
line = line.replaceAll("\'\'", "`"); // added statement
int comma = line.indexOf(',');
String num = line.substring(0, comma);
... ... etc.

In short, we replace '' by ` (a back-quote). Then on recreating the index, we get:

java SynLookup zwnindex king
baron magnate mogul power queen rex tycoon
{noformat}
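The workaround in the report can be tried in isolation with a small sketch. This is not the code from Syns2Index.java or from the attached patches: PrologLineSketch and parse are hypothetical names, the parsing is reduced to the synset number and the word, and restoring the apostrophe at the end is an extra step added for this example only. It also uses String.replace(CharSequence, CharSequence), which needs Java 5.

```java
// Hypothetical helper for illustration; not the Syns2Index implementation.
final class PrologLineSketch {

    /**
     * Parses a line such as s(114144247,3,'king''s evil',n,1,1).
     * and returns { synset number, word }.
     */
    static String[] parse(String line) {
        String body = line.substring(2);      // drop the leading "s("
        // Prolog escapes a single quote inside a quoted atom by doubling it.
        // Swap the pair for a back-quote so the atom has exactly two quotes left.
        body = body.replace("''", "`");
        int comma = body.indexOf(',');
        String num = body.substring(0, comma);
        int open = body.indexOf('\'');
        int close = body.indexOf('\'', open + 1);
        // Assumption of this sketch: turn the placeholder back into an apostrophe.
        String word = body.substring(open + 1, close).replace('`', '\'');
        return new String[] { num, word };
    }
}
```

With the replacement in place the quoted atom is read as a single word, king's evil, instead of being cut off after king. A back-quote already present in the data would be converted as well, so a real fix would need to check that the placeholder cannot occur in the input.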
[jira] Updated: (LUCENE-2001) wordnet parsing bug
[ https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2001: Attachment: LUCENE-2001_branch.patch Updated patch for the branch; I forgot about String.replace(String,String) being Java 5 only... sorry guys. wordnet parsing bug --- Key: LUCENE-2001 URL: https://issues.apache.org/jira/browse/LUCENE-2001 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 2.9 Reporter: Robert Muir Priority: Minor Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch, LUCENE-2001_branch.patch
[jira] Updated: (LUCENE-2001) wordnet parsing bug
[ https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2001: Attachment: LUCENE-2001_branch.patch Patch for the 2.9 branch (the same, just without Java 5 constructs). I will commit the trunk one shortly; can someone help with this one, if we think it should be fixed in 2.9.1 also? wordnet parsing bug --- Key: LUCENE-2001 URL: https://issues.apache.org/jira/browse/LUCENE-2001 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 2.9 Reporter: Robert Muir Priority: Minor Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch
[jira] Updated: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1998: -- Attachment: LUCENE-1998_enum_BW.patch Better BW test Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum_BW.patch, LUCENE-1998_enum_BW.patch
[jira] Resolved: (LUCENE-1998) Use Java 5 enums
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1998. --- Resolution: Fixed Committed revision: 828156 Thanks DM Smith! Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Assignee: Uwe Schindler Priority: Minor Fix For: 3.0 Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum_BW.patch, LUCENE-1998_enum_BW.patch
[jira] Created: (LUCENE-2003) Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on
Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on --- Key: LUCENE-2003 URL: https://issues.apache.org/jira/browse/LUCENE-2003 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Fix For: 2.9.1, 3.0 This is a follow-up to LUCENE-1987: If you change the constant static final Version TEST_VERSION = Version.LUCENE_24 in HighlighterTest to LUCENE_29 or LUCENE_CURRENT, the test testSimpleQueryScorerPhraseHighlighting fails. Please note that currently (before LUCENE-2002 is fixed) you must also set the QueryParser to respect posIncr.
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768393#action_12768393 ] Uwe Schindler commented on LUCENE-2002: --- Issue created! Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 This is a follow-up to LUCENE-1987: If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position increments for stop words, but QueryParser ignores them by default. The user has to enable them explicitly. This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually StopAnalyzer should also add this version ctor. To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1.
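To see why this mismatch breaks phrase queries, here is a minimal self-contained sketch. It is a toy model, not Lucene code: PosIncrSketch, positions and phraseMatches are hypothetical names, the stop word list is made up, and each token is assumed to occur only once in the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model for illustration only; none of these names exist in Lucene.
final class PosIncrSketch {
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the", "of", "a"));

    /**
     * Maps each non-stop token to its position. With keepGaps, a removed
     * stop word still advances the position counter, leaving a hole.
     */
    static Map<String, Integer> positions(String text, boolean keepGaps) {
        Map<String, Integer> out = new LinkedHashMap<String, Integer>();
        int pos = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(token)) {
                if (keepGaps) {
                    pos++;
                }
                continue;
            }
            out.put(token, pos++);
        }
        return out;
    }

    /** A phrase matches when all terms exist and share one position offset. */
    static boolean phraseMatches(Map<String, Integer> doc, Map<String, Integer> query) {
        Integer offset = null;
        for (Map.Entry<String, Integer> term : query.entrySet()) {
            Integer docPos = doc.get(term.getKey());
            if (docPos == null) {
                return false;
            }
            int delta = docPos.intValue() - term.getValue().intValue();
            if (offset == null) {
                offset = delta;
            } else if (offset.intValue() != delta) {
                return false;
            }
        }
        return true;
    }
}
```

If the document "king of the hill" is indexed with gaps (king at 0, hill at 3) but the query is analysed without them (king at 0, hill at 1), the offsets disagree and the phrase does not match. Analysing both sides the same way makes it match again, which is what a Version-aware constructor would arrange automatically.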
[jira] Updated: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on
[ https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2003: -- Summary: Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on (was: Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on --- Key: LUCENE-2003 URL: https://issues.apache.org/jira/browse/LUCENE-2003 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Fix For: 2.9.1, 3.0
[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2002: --- Attachment: LUCENE-2002-29.patch Attached patch, for 2.9.x. I added a required Version param to QueryParser, MultiFieldQueryParser and ComplexPhraseQueryParser (contrib), which enables position increments when matchVersion >= LUCENE_29. For the deprecated ctors it defaults to Version.LUCENE_24 for back compat. Unfortunately, JavaCC generates two public ctors for QueryParser (one taking CharStream, another taking QueryParserTokenManager) that I don't know how to override to take a Version param. Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 Attachments: LUCENE-2002-29.patch
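The constructor pattern described in this comment can be sketched as follows. This is an illustration under assumptions, not the attached patch: SketchVersion and SketchParser are hypothetical stand-ins for the Version constant and the parser classes, and the rule that the 2.9 behaviour onward turns position increments on is taken from the issue description.

```java
// Hypothetical stand-ins; not org.apache.lucene.util.Version or QueryParser.
enum SketchVersion {
    V_24, V_29, CURRENT;

    /** Enum constants compare by declaration order, oldest first. */
    boolean onOrAfter(SketchVersion other) {
        return compareTo(other) >= 0;
    }
}

final class SketchParser {
    private final boolean enablePositionIncrements;

    /** New-style ctor: behaviour follows the version the caller asks to match. */
    SketchParser(SketchVersion matchVersion) {
        this.enablePositionIncrements = matchVersion.onOrAfter(SketchVersion.V_29);
    }

    /** Old-style ctor kept for back compat; it pins the pre-2.9 behaviour. */
    @Deprecated
    SketchParser() {
        this(SketchVersion.V_24);
    }

    boolean positionIncrementsEnabled() {
        return enablePositionIncrements;
    }
}
```

Routing the deprecated constructor through the versioned one keeps the default in a single place, so existing callers see no change while new callers opt in by naming a version.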
[jira] Assigned: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on
[ https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2003: -- Assignee: Michael McCandless Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on --- Key: LUCENE-2003 URL: https://issues.apache.org/jira/browse/LUCENE-2003 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1, 3.0
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768459#action_12768459 ] Robert Muir commented on LUCENE-2002: - Mike, saw a couple of these and laughed a little :) @param matchVersion Lucene version to *patch*; this is passed through to QueryParser. Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 Attachments: LUCENE-2002-29.patch
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768467#action_12768467 ] Michael McCandless commented on LUCENE-2002: Eek! My fingers are doing the thinking, apparently :) Been typing that word a bit too much!! I'll fix. Thanks. Add oal.util.Version ctor to QueryParser Key: LUCENE-2002 URL: https://issues.apache.org/jira/browse/LUCENE-2002 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.9, 3.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 2.9.1 Attachments: LUCENE-2002-29.patch
Build failed in Hudson: Lucene-trunk #986
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/986/changes

Changes:
[uschindler] remove unneeded import
[uschindler] LUCENE-1998: Parameter - Java 5 enum transition
[rmuir] LUCENE-2001: Fix parsing bug in wordnet contrib
[uschindler] Add varargs to MultiSearcher
[uschindler] Fix test failure because of wrong cast. Hard stuff :( Could be implemented better, the hq is used for 2 different types
[uschindler] LUCENE-1257: Remove the rest of unchecked warnings and some unneeded casts. I added a TODO, where I do not understand the code and not for sure know, whats inside the collections. This could be fixed some time later. But the core code now compiles without any unchecked warning.
[uschindler] LUCENE-1987: Remove rest of analysis deprecations (StandardAnalyzer, StopAnalyzer)
--
[...truncated 16035 lines...]
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 2.643 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 22, Failures: 0, Errors: 0, Time elapsed: 9.88 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.66 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestStressSort
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.973 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 6.51 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.968 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.761 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 3.004 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.951 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 8.635 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.565 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.637 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.751 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 146.423 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.309 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 3.117 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.529 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestValueSource
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.542 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.331 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.347 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestBasics
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 38.507 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 7.041 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 1.397 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 4.653 sec
[junit]
[junit] - Standard Output ---
[junit]
[junit] Spans Dump --
[junit] payloads for span:2
[junit] doc:0 s:3 e:6 three:Noise:5
[junit] doc:0 s:3 e:6 one:Entity:3
[junit]
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit]
[jira] Updated: (LUCENE-1359) FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1359: Lucene Fields: [New, Patch Available] (was: [New]) Fix Version/s: 3.0 Assignee: Robert Muir FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer Key: LUCENE-1359 URL: https://issues.apache.org/jira/browse/LUCENE-1359 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.2 Reporter: Andrew Lynch Assignee: Robert Muir Priority: Minor Fix For: 3.0 Attachments: LUCENE-1359.patch
In {{Analyzer}} :
{code}
/** Creates a TokenStream which tokenizes all the text in the provided
    Reader. Default implementation forwards to tokenStream(Reader) for
    compatibility with older version. Override to allow Analyzer to choose
    strategy based on document and/or field. Must be able to handle null
    field name for backward compatibility. */
public abstract TokenStream tokenStream(String fieldName, Reader reader);
{code}
and in {{FrenchAnalyzer}}
{code}
public final TokenStream tokenStream(String fieldName, Reader reader) {
  if (fieldName == null) throw new IllegalArgumentException("fieldName must not be null");
  if (reader == null) throw new IllegalArgumentException("reader must not be null");
{code}
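The contract violation reported here can be reproduced with a small stand-alone sketch. The classes below are hypothetical and return a String instead of a TokenStream so that the example runs without Lucene; they are not the Analyzer or FrenchAnalyzer sources, and the lenient variant is only one possible way to honour the documented contract, not necessarily what the committed patch does.

```java
import java.io.Reader;

// Hypothetical types for illustration; not Lucene's Analyzer or FrenchAnalyzer.
abstract class SketchAnalyzer {
    /** Contract: implementations must be able to handle a null fieldName. */
    abstract String tokenStream(String fieldName, Reader reader);
}

/** Breaks the contract by rejecting a null field name, as in the report. */
final class StrictSketchAnalyzer extends SketchAnalyzer {
    @Override
    String tokenStream(String fieldName, Reader reader) {
        if (fieldName == null) {
            throw new IllegalArgumentException("fieldName must not be null");
        }
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        return "stream for " + fieldName;
    }
}

/** Honours the contract: only the reader is mandatory. */
final class LenientSketchAnalyzer extends SketchAnalyzer {
    @Override
    String tokenStream(String fieldName, Reader reader) {
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        return "stream for " + (fieldName == null ? "<no field>" : fieldName);
    }
}
```

Code written against the base type may legitimately pass null for the field name, so the strict subclass fails for callers that follow the documented contract, while the lenient one does not.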
[jira] Resolved: (LUCENE-1359) FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-1359. - Resolution: Fixed Committed revision 828298. this inconsistency annoyed me too. thanks Andrew! FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer Key: LUCENE-1359 URL: https://issues.apache.org/jira/browse/LUCENE-1359 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 2.2 Reporter: Andrew Lynch Assignee: Robert Muir Priority: Minor Fix For: 3.0 Attachments: LUCENE-1359.patch
[jira] Updated: (LUCENE-1904) move wordnet based synonym code out of contrib/memory and into contrib/wordnet (or somewhere else)
[ https://issues.apache.org/jira/browse/LUCENE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1904: Fix Version/s: 3.0 Assignee: Robert Muir Will bring this patch up to speed. It's silly for this code to be in the memory contrib instead of wordnet, where it belongs. move wordnet based synonym code out of contrib/memory and into contrib/wordnet (or somewhere else) -- Key: LUCENE-1904 URL: https://issues.apache.org/jira/browse/LUCENE-1904 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Reporter: Hoss Man Assignee: Robert Muir Priority: Minor Fix For: 3.0 Attachments: LUCENE-1904.patch, LUCENE-1904.patch see LUCENE-387 ... some synonym related code has been living in contrib/memory for a very long time ... it should be refactored out.