[jira] [Updated] (LUCENE-3392) Combining analyzers output
[ https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3392:
Attachment: ComboAnalyzer-lucene3x.patch

Patch for lucene-3x. Tested with Sun's Java 1.6.0_26-b03.
Uses a special factory for cloning Readers; some implementations use reflection to gain access to private fields, in order to reduce the need to read and copy a Reader's content.

Combining analyzers output
Key: LUCENE-3392
URL: https://issues.apache.org/jira/browse/LUCENE-3392
Project: Lucene - Java
Issue Type: New Feature
Reporter: Olivier Favre
Priority: Minor
Labels: analysis
Fix For: 3.4
Attachments: ComboAnalyzer-lucene3x.patch
Original Estimate: 48h
Remaining Estimate: 48h

It should be easy to combine the output of multiple Analyzers, or TokenStreams. A ComboAnalyzer and a ComboTokenStream class would take multiple instances and multiplex their output, keeping a rough order of tokens: increasing position, then increasing start offset, then increasing end offset.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
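The token ordering described in the issue (increasing position, then start offset, then end offset) could be sketched as a k-way merge. This is only an illustration, not the code from the attached patch: `Tok` and `ComboMergeSketch` are hypothetical names, tokens are plain value objects rather than Lucene attributes, and each input list is assumed to be already sorted in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a token; not a Lucene class. */
final class Tok {
    final String term;
    final int position, start, end;

    Tok(String term, int position, int start, int end) {
        this.term = term;
        this.position = position;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return term + "@" + position + "[" + start + "," + end + ")";
    }
}

final class ComboMergeSketch {
    /** Position first, then start offset, then end offset. */
    static final Comparator<Tok> ORDER = Comparator
            .comparingInt((Tok t) -> t.position)
            .thenComparingInt(t -> t.start)
            .thenComparingInt(t -> t.end);

    /** Merges streams that are each already sorted by ORDER. */
    static List<Tok> merge(List<List<Tok>> streams) {
        // Each queue entry is {streamIndex, indexWithinStream}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> ORDER.compare(streams.get(a[0]).get(a[1]),
                                        streams.get(b[0]).get(b[1])));
        for (int s = 0; s < streams.size(); s++) {
            if (!streams.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Tok> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] h = heads.poll();
            List<Tok> stream = streams.get(h[0]);
            out.add(stream.get(h[1]));
            if (h[1] + 1 < stream.size()) {
                heads.add(new int[] {h[0], h[1] + 1}); // advance that stream's head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> words = Arrays.asList(new Tok("quick", 1, 0, 5), new Tok("fox", 2, 6, 9));
        List<Tok> grams = Arrays.asList(new Tok("qui", 1, 0, 3), new Tok("quickfox", 2, 0, 9));
        System.out.println(merge(Arrays.asList(words, grams)));
    }
}
```

Tokens that compare equal across streams come out in an unspecified relative order, which seems compatible with the "rough order" the issue asks for.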
[jira] [Commented] (LUCENE-3392) Combining analyzers output
[ https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089415#comment-13089415 ]

Olivier Favre commented on LUCENE-3392:

The proposed implementation may have a tight bond with the JVM implementation of some classes (StringReader, BufferedReader and FilterReader), as it relies on a named private field in each (respectively str, in and in).
This can be avoided, but any Reader should then be fully read and stored as a String or a char[], which can have a huge overhead.

Considering each clone would get read at roughly the same speed (well, only for word-delimiting analysis, not for a KeywordAnalyzer), an implementation could retain in memory only the portion read by at least one cloned reader but not by all clones, in order to implement a multi-read-head reader.

Another implementation would be to change the API to give a CloneableReader interface with a giveAClone() function instead of a Reader for the tokenStream and reusableTokenStream functions. But this involves massive refactoring (13,000 lines) and introduces an important API break.

The proposed implementation is the best solution I found.
Any suggestions are welcome!

Combining analyzers output
Key: LUCENE-3392
URL: https://issues.apache.org/jira/browse/LUCENE-3392
Project: Lucene - Java
Issue Type: New Feature
Reporter: Olivier Favre
Priority: Minor
Labels: analysis
Fix For: 3.4
Attachments: ComboAnalyzer-lucene3x.patch
Original Estimate: 48h
Remaining Estimate: 48h

It should be easy to combine the output of multiple Analyzers, or TokenStreams. A ComboAnalyzer and a ComboTokenStream class would take multiple instances and multiplex their output, keeping a rough order of tokens: increasing position, then increasing start offset, then increasing end offset.
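The multi-read-head idea from the comment above (keep only the portion read by at least one clone but not by all) might look roughly like the following. This is a sketch under assumptions, not the patch's code: `MultiHeadSource`, `read(int head)` and `bufferedChars()` are made-up names, heads are addressed by index instead of being real Reader subclasses, and no attempt is made at efficiency or thread safety.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: several read heads sharing one underlying Reader. */
final class MultiHeadSource {
    private final Reader source;
    // Characters already read by the fastest head but not yet by the slowest one.
    private final StringBuilder window = new StringBuilder();
    private long windowStart = 0;   // absolute offset of window.charAt(0)
    private final long[] heads;     // absolute offset of each head
    private boolean exhausted = false;

    MultiHeadSource(Reader source, int headCount) {
        this.source = source;
        this.heads = new long[headCount];
    }

    /** Returns the next char for the given head, or -1 at end of input. */
    int read(int head) {
        long pos = heads[head];
        if (pos == windowStart + window.length()) {
            // This head is the fastest one: pull a new char from the source.
            if (exhausted) {
                return -1;
            }
            int c;
            try {
                c = source.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (c < 0) {
                exhausted = true;
                return -1;
            }
            window.append((char) c);
        }
        char result = window.charAt((int) (pos - windowStart));
        heads[head] = pos + 1;
        trim();
        return result;
    }

    /** Drops the prefix that every head has already consumed. */
    private void trim() {
        long slowest = Long.MAX_VALUE;
        for (long h : heads) {
            slowest = Math.min(slowest, h);
        }
        int drop = (int) (slowest - windowStart);
        if (drop > 0) {
            window.delete(0, drop);
            windowStart = slowest;
        }
    }

    /** Number of characters currently held in memory. */
    int bufferedChars() {
        return window.length();
    }

    public static void main(String[] args) {
        MultiHeadSource s = new MultiHeadSource(new StringReader("hello"), 2);
        s.read(0);
        s.read(0);
        s.read(1);
        System.out.println("buffered: " + s.bufferedChars());
    }
}
```

As the comment notes, the memory saving only holds when the heads advance at similar speeds; a head that reads everything up front (the KeywordAnalyzer case) forces the whole input into the window.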
[jira] [Updated] (LUCENE-3392) Combining analyzers output
[ https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3392:
Component/s: modules/analysis

Combining analyzers output
Key: LUCENE-3392
URL: https://issues.apache.org/jira/browse/LUCENE-3392
Project: Lucene - Java
Issue Type: New Feature
Components: modules/analysis
Reporter: Olivier Favre
Priority: Minor
Labels: analysis
Fix For: 3.4, 4.0
Attachments: ComboAnalyzer-lucene-trunk.patch, ComboAnalyzer-lucene3x.patch, ComboAnalyzer-lucene3x.patch
Original Estimate: 48h
Remaining Estimate: 48h

It should be easy to combine the output of multiple Analyzers, or TokenStreams. A ComboAnalyzer and a ComboTokenStream class would take multiple instances and multiplex their output, keeping a rough order of tokens: increasing position, then increasing start offset, then increasing end offset.
[jira] [Commented] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076091#comment-13076091 ]

Olivier Favre commented on LUCENE-3343:

Great, thanks!
No blockers for 3x?

Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Key: LUCENE-3343
URL: https://issues.apache.org/jira/browse/LUCENE-3343
Project: Lucene - Java
Issue Type: New Feature
Components: modules/queryparser
Reporter: Olivier Favre
Assignee: Adriano Crestani
Priority: Minor
Labels: parser, query
Fix For: 3.4, 4.0
Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch
Original Estimate: 96h
Remaining Estimate: 96h

To offer better interoperability with other search engines and to provide an easier and more straightforward syntax, the operators >, >=, <, <= and = should be available to express an open range query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.
[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3343:
Fix Version/s: 3.4

Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Key: LUCENE-3343
URL: https://issues.apache.org/jira/browse/LUCENE-3343
Project: Lucene - Java
Issue Type: New Feature
Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
Labels: parser, query
Fix For: 3.4, 4.0
Attachments: NumCompQueryParser.patch
Original Estimate: 96h
Remaining Estimate: 96h

To offer better interoperability with other search engines and to provide an easier and more straightforward syntax, the operators >, >=, <, <= and = should be available to express an open range query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.
[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3343:
Attachment: NumCompQueryParser-3x.patch

The patch for the lucene-3x branch, including files regenerated by javacc.
Tests ran successfully for queryparser (core and contrib).
Tests added for the new feature.
Both QueryParser (core) and StandardQueryParser (contrib) have been modified.

Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Key: LUCENE-3343
URL: https://issues.apache.org/jira/browse/LUCENE-3343
Project: Lucene - Java
Issue Type: New Feature
Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
Labels: parser, query
Fix For: 3.4, 4.0
Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch
Original Estimate: 96h
Remaining Estimate: 96h

To offer better interoperability with other search engines and to provide an easier and more straightforward syntax, the operators >, >=, <, <= and = should be available to express an open range query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.
[jira] [Created] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Key: LUCENE-3343
URL: https://issues.apache.org/jira/browse/LUCENE-3343
Project: Lucene - Java
Issue Type: New Feature
Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
Fix For: 4.0

To offer better interoperability with other search engines and to provide an easier and more straightforward syntax, the operators >, >=, <, <= and = should be available to express an open range query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.
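To make the request concrete, here is one way the comparison operators could map onto the existing bracket range notation. This is a string-level sketch only, not how the attached patches work (they modify the javacc grammar): `ComparisonRewriteSketch` and `rewrite` are hypothetical names, and the use of `*` as an open bound is an assumption that may not hold for every Lucene version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rewrites "field OP value" into bracket range notation. */
final class ComparisonRewriteSketch {
    // Groups: field, operator (two-character operators are tried first), value.
    private static final Pattern CLAUSE =
            Pattern.compile("(\\w+)\\s*(>=|<=|>|<|=)\\s*(\\S+)");

    static String rewrite(String clause) {
        Matcher m = CLAUSE.matcher(clause.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a comparison: " + clause);
        }
        String field = m.group(1);
        String op = m.group(2);
        String value = m.group(3);
        switch (op) {
            case ">":  return field + ":{" + value + " TO *}";   // exclusive lower bound
            case ">=": return field + ":[" + value + " TO *]";   // inclusive lower bound
            case "<":  return field + ":{* TO " + value + "}";   // exclusive upper bound
            case "<=": return field + ":[* TO " + value + "]";   // inclusive upper bound
            default:   return field + ":" + value;               // '=' as a synonym for ':'
        }
    }

    public static void main(String[] args) {
        for (String q : new String[] {"price>=10", "age < 5", "year=2011"}) {
            System.out.println(q + "  ->  " + rewrite(q));
        }
    }
}
```

Square brackets denote inclusive bounds and curly braces exclusive ones, so the strict and non-strict operators differ only in the bracket type.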
[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3343:
Attachment: NumCompQueryParser.patch

The patch, including files regenerated by javacc.
Tests ran successfully for queryparser.
New tests created for the new feature.

Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser
Key: LUCENE-3343
URL: https://issues.apache.org/jira/browse/LUCENE-3343
Project: Lucene - Java
Issue Type: New Feature
Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
Labels: parser, query
Fix For: 4.0
Attachments: NumCompQueryParser.patch
Original Estimate: 96h
Remaining Estimate: 96h

To offer better interoperability with other search engines and to provide an easier and more straightforward syntax, the operators >, >=, <, <= and = should be available to express an open range query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.
[jira] [Issue Comment Edited] (LUCENE-1823) QueryParser with new features for Lucene 3
[ https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071091#comment-13071091 ]

Olivier Favre edited comment on LUCENE-1823 at 7/26/11 1:28 PM:

Relates to LUCENE-3343: Open range comparison operators >, >=, <, <= and =.

was (Author: ofavre):
Open range comparison operators >, >=, <, <= and =.

QueryParser with new features for Lucene 3
Key: LUCENE-1823
URL: https://issues.apache.org/jira/browse/LUCENE-1823
Project: Lucene - Java
Issue Type: New Feature
Components: core/queryparser
Reporter: Michael Busch
Assignee: Luis Alves
Priority: Minor
Fix For: 4.0
Attachments: lucene_1823_any_opaque_precedence_fuzzybug_v2.patch, lucene_1823_foo_bug_08_26_2009.patch

I'd like to have a new QueryParser implementation in Lucene 3.1, ideally based on the new QP framework in contrib. It should share as much code as possible with the current StandardQueryParser implementation for easy maintainability.

Wish list (feel free to extend):
1. *Operator precedence*: Support operator precedence for boolean operators
2. *Opaque terms*: Ability to plug in an external parser for certain syntax extensions, e.g. XML query terms
3. *Improved RangeQuery syntax*: Use more intuitive <=, =, >= instead of [] and {}
4. *Support for trierange queries*: See LUCENE-1768
5. *Complex phrases*: See LUCENE-1486
6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms occur in the same document
7. *New syntax for Span queries*: I think the surround parser supports this?
8. *Escaped wildcards*: See LUCENE-588
[jira] [Created] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)
FastVectorHighlighter ignores MultiPhraseQuery (and more)
Key: LUCENE-3332
URL: https://issues.apache.org/jira/browse/LUCENE-3332
Project: Lucene - Java
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 3.3, 4.0
Environment: Tested against Lucene trunk revision 1149488 (4.0), but seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is clear: this version, and surely prior ones (3.1-3.2), are impacted)
Reporter: Olivier Favre
Priority: Minor

Similar to LUCENE-495.
Create a MultiPhraseQuery, use FastVectorHighlighter, and you'll get no fragments.
Using PhraseQuery and/or Highlighter works, but not the previous combination.
[jira] [Updated] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)
[ https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3332:
Attachment: UsingFallback.patch

Implements a simple fallback that creates a TermQuery for each Term obtained via Query.extractTerms().
This gives basic highlighting for any type of query, though some may throw an UnsupportedOperationException.

FastVectorHighlighter ignores MultiPhraseQuery (and more)
Key: LUCENE-3332
URL: https://issues.apache.org/jira/browse/LUCENE-3332
Project: Lucene - Java
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 3.3, 4.0
Environment: Tested against Lucene trunk revision 1149488 (4.0), but seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is clear: this version, and surely prior ones (3.1-3.2), are impacted)
Reporter: Olivier Favre
Priority: Minor
Labels: highlighting
Attachments: TestMultiPhraseQueryHighlighting.java, UsingFallback.patch
Original Estimate: 3h
Remaining Estimate: 3h

Similar to LUCENE-495.
Create a MultiPhraseQuery, use FastVectorHighlighter, and you'll get no fragments.
Using PhraseQuery and/or Highlighter works, but not the previous combination.
[jira] [Updated] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)
[ https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3332:
Attachment: UsingCombinationAndFallback.patch

Flattens a MultiPhraseQuery into multiple PhraseQuerys by generating all the term combinations (can be costly).
Also implements a simple fallback that creates a TermQuery for each Term obtained via Query.extractTerms().
This gives basic highlighting for any type of query, though some may throw an UnsupportedOperationException.

FastVectorHighlighter ignores MultiPhraseQuery (and more)
Key: LUCENE-3332
URL: https://issues.apache.org/jira/browse/LUCENE-3332
Project: Lucene - Java
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 3.3, 4.0
Environment: Tested against Lucene trunk revision 1149488 (4.0), but seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is clear: this version, and surely prior ones (3.1-3.2), are impacted)
Reporter: Olivier Favre
Priority: Minor
Labels: highlighting
Attachments: TestMultiPhraseQueryHighlighting.java, UsingCombinationAndFallback.patch, UsingFallback.patch
Original Estimate: 3h
Remaining Estimate: 3h

Similar to LUCENE-495.
Create a MultiPhraseQuery, use FastVectorHighlighter, and you'll get no fragments.
Using PhraseQuery and/or Highlighter works, but not the previous combination.
[jira] [Commented] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)
[ https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069565#comment-13069565 ]

Olivier Favre commented on LUCENE-3332:

The advantage of generating multiple PhraseQuerys is that highlighting outputs a single highlighted fragment, instead of multiple highlighted words, one next to the other.
I think looking at WeightedSpanTermExtractor's handling of MultiPhraseQuery may help, but considering the job of FieldQuery.expand(), it may need deep modifications.

The downside of generating the combinations is that it produces many more PhraseQuerys (their count is the product of the number of alternatives at each position), and that they will all be processed by expand(), which is itself O(n^2).
If each term is doubled in the source MultiPhraseQuery (say once stemmed, once intact), a query of 5 words ends up with n = 2^5 = 32 generated PhraseQuerys, thus making 32^2 = 1024 comparisons inside expand().

Fortunately, this process needs to be done only once per query; there is no need to do it all over again for each field or for each doc.

FastVectorHighlighter ignores MultiPhraseQuery (and more)
Key: LUCENE-3332
URL: https://issues.apache.org/jira/browse/LUCENE-3332
Project: Lucene - Java
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 3.3, 4.0
Environment: Tested against Lucene trunk revision 1149488 (4.0), but seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is clear: this version, and surely prior ones (3.1-3.2), are impacted)
Reporter: Olivier Favre
Priority: Minor
Labels: highlighting
Attachments: TestMultiPhraseQueryHighlighting.java, UsingCombinationAndFallback.patch, UsingFallback.patch
Original Estimate: 3h
Remaining Estimate: 3h

Similar to LUCENE-495.
Create a MultiPhraseQuery, use FastVectorHighlighter, and you'll get no fragments.
Using PhraseQuery and/or Highlighter works, but not the previous combination.
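The combination count discussed in the comment above can be checked with a small Cartesian-product sketch. It works on plain strings rather than Lucene Term or PhraseQuery objects, and `PhraseExpansionSketch` and `expand` are hypothetical names, not the ones used in UsingCombinationAndFallback.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: expands per-position alternatives into every concrete phrase. */
final class PhraseExpansionSketch {
    /** Each inner list holds the alternative terms allowed at one phrase position. */
    static List<List<String>> expand(List<List<String>> alternatives) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<String>()); // start from a single empty phrase
        for (List<String> slot : alternatives) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : slot) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term); // extend each existing prefix with each alternative
                    next.add(phrase);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            slots.add(Arrays.asList("word" + i, "stem" + i)); // two alternatives per position
        }
        int n = expand(slots).size();
        System.out.println(n + " phrases, " + (n * n) + " pairwise comparisons");
    }
}
```

With two alternatives at each of five positions the product is 2 x 2 x 2 x 2 x 2 = 32 phrases, and a quadratic pass over them performs 32 x 32 = 1024 comparisons, matching the figures in the comment.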
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029944#comment-13029944 ]

Olivier Favre commented on LUCENE-3071:

It seems you forgot to commit lucene/contrib/CHANGES.txt to describe the new feature.
Regards

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3071:
Attachment: LUCENE-3071.patch

Proposed patch attached.
Working against Lucene 3.1 (remove the last parameter, {{path.length()}}, from the assert call).
But I am having difficulties making the tests work against trunk ({{ant}} and {{ant test}} fail, at global scope).

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3071:
Attachment: ant.log.tar.bz2

I'm using Ubuntu 10.04.2 LTS.

ant -version
Apache Ant version 1.7.1 compiled on September 8 2010

I followed the wiki: http://wiki.apache.org/lucene-java/HowToContribute
I used svn checkout http://svn.eu.apache.org/repos/asf/lucene/dev/trunk/ lucene-trunk.
I'm working under revision 1099843 (yours).
See ant log attached.

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029413#comment-13029413 ]

Olivier Favre commented on LUCENE-3071:

{{ant clean test}} did it for me, thanks!

As for the failing tests, it is because of the {{finalOffset}} that I set to {{path.length()}}.
I'm not sure whether I should use {{path.length()}}, as my tokens don't go up to there when using the reverse mode.
When I take a look at the end() function, I think that I should set it to the end of the string. But I can't see it in the javadoc.

If the purpose of the {{finalOffset}} parameter in {{assertTokenStreamContents()}} is to make sure of the {{endOffset}} of the last term, then I should not use {{path.length()}} blindly when using reverse and skip.

Can you help me with the purpose of {{finalOffset}}?
Or can I simply skip it in my tests (they are working if I skip it)?

Thanks

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Favre updated LUCENE-3071:
Attachment: LUCENE-3071.patch

I fixed my code accordingly.
Tests run fine now.
Ready to ship?

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Issue Comment Edited] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
[ https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029444#comment-13029444 ]

Olivier Favre edited comment on LUCENE-3071 at 5/5/11 5:19 PM:

Fixed patch attached.
Tests run fine now.
Ready to ship?

was (Author: ofavre):
I fixed my code accordingly.
Tests run fine now.
Ready to ship?

PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2
Original Estimate: 2h
Remaining Estimate: 2h

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
[jira] [Created] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
PathHierarchyTokenizer adaptation for urls: splits reversed
Key: LUCENE-3071
URL: https://issues.apache.org/jira/browse/LUCENE-3071
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor

{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}
Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
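The two behaviours requested in the issue (reversed splitting and skipping) can be illustrated on plain strings. This sketch is not the tokenizer from the patch: `PathSplitSketch` and `tokens` are hypothetical names, offsets are ignored, and dropping empty parts (such as the one before a leading '/') is an assumption chosen so that the output matches the examples in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of forward/reversed hierarchical splitting with skip. */
final class PathSplitSketch {
    static List<String> tokens(String path, char delimiter, boolean reverse, int skip) {
        if (skip < 0) {
            throw new IllegalArgumentException("skip must be >= 0");
        }
        String sep = String.valueOf(delimiter);
        List<String> parts = new ArrayList<>();
        for (String p : path.split(Pattern.quote(sep))) {
            if (!p.isEmpty()) {
                parts.add(p); // assumption: empty parts are ignored
            }
        }
        // Forward mode skips leading parts; reverse mode skips trailing parts.
        int from = reverse ? 0 : Math.min(skip, parts.size());
        int to = reverse ? Math.max(parts.size() - skip, 0) : parts.size();
        List<String> kept = parts.subList(from, to);

        List<String> out = new ArrayList<>();
        if (reverse) {
            // Emit ever shorter suffixes: www.site.com, site.com, com
            for (int i = 0; i < kept.size(); i++) {
                out.add(String.join(sep, kept.subList(i, kept.size())));
            }
        } else {
            // Emit ever longer prefixes: INTERESTING, INTERESTING/PART
            for (int i = 1; i <= kept.size(); i++) {
                out.add(String.join(sep, kept.subList(0, i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("www.site.com", '.', true, 0));
        System.out.println(tokens("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4));
    }
}
```

How a real tokenizer should report offsets for the skipped portion is exactly the {{finalOffset}} question raised in the comments on this issue, so the sketch deliberately leaves it out.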