[jira] [Updated] (LUCENE-3392) Combining analyzers output

2011-08-23 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3392:
--

Attachment: ComboAnalyzer-lucene3x.patch

Patch for lucene-3x.
Tested with Sun's Java 1.6.0_26-b03.
Uses a special factory for cloning Readers; some implementations use reflection 
to gain access to private fields, reducing the need to read and copy a 
Reader's content.

 Combining analyzers output
 --

 Key: LUCENE-3392
 URL: https://issues.apache.org/jira/browse/LUCENE-3392
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Olivier Favre
Priority: Minor
  Labels: analysis
 Fix For: 3.4

 Attachments: ComboAnalyzer-lucene3x.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 It should be easy to combine the output of multiple Analyzers, or 
 TokenStreams.
 A ComboAnalyzer and a ComboTokenStream class would take multiple instances, 
 and multiplex their output, keeping a rough order of tokens like increasing 
 position then increasing start offset then increasing end offset.
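
A minimal sketch of the ordering described above, not the attached patch. SimpleToken and ComboMergeSketch are hypothetical names, tokens are assumed to carry an absolute position (Lucene itself exposes position increments), and the whole output of each analyzer is assumed to be available up front rather than consumed lazily.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an analyzed token; not a Lucene class. */
final class SimpleToken {
    final String term;
    final int position;    // absolute token position (assumption)
    final int startOffset; // character offsets into the analyzed text
    final int endOffset;

    SimpleToken(String term, int position, int startOffset, int endOffset) {
        this.term = term;
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class ComboMergeSketch {
    /** Increasing position, then start offset, then end offset. */
    static final Comparator<SimpleToken> ORDER =
            Comparator.comparingInt((SimpleToken t) -> t.position)
                    .thenComparingInt(t -> t.startOffset)
                    .thenComparingInt(t -> t.endOffset);

    /** Combines the complete outputs of several analyzers into one ordered list. */
    static List<SimpleToken> merge(List<List<SimpleToken>> outputs) {
        List<SimpleToken> merged = new ArrayList<>();
        for (List<SimpleToken> output : outputs) {
            merged.addAll(output);
        }
        merged.sort(ORDER); // List.sort is stable, so ties keep analyzer order
        return merged;
    }

    /** Convenience view of the merged terms. */
    static List<String> terms(List<SimpleToken> tokens) {
        List<String> out = new ArrayList<>();
        for (SimpleToken t : tokens) {
            out.add(t.term);
        }
        return out;
    }
}
```

A streaming implementation would more likely pull one token at a time from each source and pick the smallest, which is why the order can only be "rough" when a source is not itself sorted.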

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3392) Combining analyzers output

2011-08-23 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089415#comment-13089415
 ] 

Olivier Favre commented on LUCENE-3392:
---

The proposed implementation may be tightly coupled to the JVM implementation 
of some classes (StringReader, BufferedReader and FilterReader), as it relies 
on named private fields (respectively str, in and in).
This can be avoided, but any Reader would then have to be fully read and stored 
as a String or a char[], which can have a huge overhead.
Considering that each clone would be read at roughly the same speed (well, only 
for word-delimiting analysis, not for a KeywordAnalyzer), an implementation 
could retain in memory only the portion read by at least one cloned reader but 
not all clones, in order to implement a multi read head reader.
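
A rough sketch of such a multi read head reader, under stated assumptions: MultiHeadReader and its methods are hypothetical names (nothing here comes from the patch), the number of heads is fixed up front, and the class is not thread-safe. Only the characters that some head has yet to read are kept in the shared window.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: several independent read heads over one Reader. */
final class MultiHeadReader {
    private final Reader source;
    private final StringBuilder window = new StringBuilder(); // not yet read by all heads
    private long windowStart = 0;      // absolute index of window.charAt(0)
    private boolean exhausted = false; // true once the source reported end of stream
    private final List<Head> heads = new ArrayList<>();

    MultiHeadReader(Reader source, int headCount) {
        this.source = source;
        for (int i = 0; i < headCount; i++) {
            heads.add(new Head());
        }
    }

    Reader head(int i) {
        return heads.get(i);
    }

    /** Number of characters currently held in memory. */
    int retainedChars() {
        return window.length();
    }

    private void trim() {
        long min = Long.MAX_VALUE;
        for (Head h : heads) {
            min = Math.min(min, h.pos);
        }
        int drop = (int) (min - windowStart);
        if (drop > 0) { // every head is past these characters
            window.delete(0, drop);
            windowStart = min;
        }
    }

    private final class Head extends Reader {
        private long pos = 0; // absolute position of this head

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (pos == windowStart + window.length() && !exhausted) {
                // This head is the furthest ahead: pull more from the source.
                char[] chunk = new char[len];
                int n = source.read(chunk, 0, len);
                if (n < 0) {
                    exhausted = true;
                } else {
                    window.append(chunk, 0, n);
                }
            }
            int available = (int) (windowStart + window.length() - pos);
            if (available == 0) {
                return exhausted ? -1 : 0;
            }
            int n = Math.min(len, available);
            int from = (int) (pos - windowStart);
            window.getChars(from, from + n, cbuf, off);
            pos += n;
            trim();
            return n;
        }

        @Override
        public void close() {
            // Closing the shared source is left to the owner in this sketch.
        }
    }
}
```

With a KeywordAnalyzer among the consumers, one head would race to the end while the others lag, so the window would grow to the full content, which is the caveat noted above.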

Another approach would be to change the API so that the tokenStream and 
reusableTokenStream functions take a CloneableReader interface with a 
giveAClone() function instead of a Reader.
But this involves massive refactoring (13,000 lines) and introduces an 
important API break.
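
For illustration only, the alternative API could look roughly like the sketch below. CloneableReader and giveAClone() are the names used above; StringCloneableReader and CloneableReaderDemo are hypothetical helpers added for the example and are not part of Lucene or of the patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical interface named after the proposal; not part of Lucene. */
interface CloneableReader {
    /** Returns a fresh, independent Reader positioned at the start of the content. */
    Reader giveAClone();
}

/** Simplest case: the content is already available as a String. */
final class StringCloneableReader implements CloneableReader {
    private final String content;

    StringCloneableReader(String content) {
        this.content = content;
    }

    @Override
    public Reader giveAClone() {
        return new StringReader(content);
    }
}

final class CloneableReaderDemo {
    /** Drains a Reader into a String (helper for the example). */
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Each sub-analyzer would then call giveAClone() itself, so no reflection on private fields is needed; the cost is the API break mentioned above.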

The proposed implementation is the best solution I found.
Any suggestions are welcome!

 Combining analyzers output
 --

 Key: LUCENE-3392
 URL: https://issues.apache.org/jira/browse/LUCENE-3392
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Olivier Favre
Priority: Minor
  Labels: analysis
 Fix For: 3.4

 Attachments: ComboAnalyzer-lucene3x.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 It should be easy to combine the output of multiple Analyzers, or 
 TokenStreams.
 A ComboAnalyzer and a ComboTokenStream class would take multiple instances, 
 and multiplex their output, keeping a rough order of tokens like increasing 
 position then increasing start offset then increasing end offset.




[jira] [Updated] (LUCENE-3392) Combining analyzers output

2011-08-23 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3392:
--

Component/s: modules/analysis

 Combining analyzers output
 --

 Key: LUCENE-3392
 URL: https://issues.apache.org/jira/browse/LUCENE-3392
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Olivier Favre
Priority: Minor
  Labels: analysis
 Fix For: 3.4, 4.0

 Attachments: ComboAnalyzer-lucene-trunk.patch, 
 ComboAnalyzer-lucene3x.patch, ComboAnalyzer-lucene3x.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 It should be easy to combine the output of multiple Analyzers, or 
 TokenStreams.
 A ComboAnalyzer and a ComboTokenStream class would take multiple instances, 
 and multiplex their output, keeping a rough order of tokens like increasing 
 position then increasing start offset then increasing end offset.




[jira] [Commented] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-08-02 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076091#comment-13076091
 ] 

Olivier Favre commented on LUCENE-3343:
---

Great, thanks!
No blockers for 3x?

 Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
 QueryParser
 

 Key: LUCENE-3343
 URL: https://issues.apache.org/jira/browse/LUCENE-3343
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/queryparser
Reporter: Olivier Favre
Assignee: Adriano Crestani
Priority: Minor
  Labels: parser, query
 Fix For: 3.4, 4.0

 Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 To offer better interoperability with other search engines and to provide an 
 easier and more straightforward syntax,
 the operators >, >=, <, <= and = should be available to express an open range 
 query.
 They should at least work for numeric queries.
 '=' can be made a synonym for ':'.
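
To make the intended mapping concrete, here is a small sketch that rewrites a single comparison clause into bracket range notation. It is an illustration only: the patch changes the javacc grammar rather than rewriting strings, ComparisonToRange is a hypothetical name, and the use of `*` for an open bound is an assumption about the target parser version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ComparisonToRange {
    // field, operator, value; ">=" and "<=" come first so they win over ">" and "<"
    private static final Pattern CLAUSE =
            Pattern.compile("(\\w+)(>=|<=|>|<|=)(\\S+)");

    /** Rewrites e.g. "price>=10" into range notation (open bound written as *). */
    static String rewrite(String clause) {
        Matcher m = CLAUSE.matcher(clause);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a comparison clause: " + clause);
        }
        String field = m.group(1);
        String op = m.group(2);
        String value = m.group(3);
        switch (op) {
            case ">":
                return field + ":{" + value + " TO *}"; // exclusive lower bound
            case ">=":
                return field + ":[" + value + " TO *]"; // inclusive lower bound
            case "<":
                return field + ":{* TO " + value + "}"; // exclusive upper bound
            case "<=":
                return field + ":[* TO " + value + "]"; // inclusive upper bound
            default:
                return field + ":" + value;             // '=' as a synonym for ':'
        }
    }
}
```

For example, `ComparisonToRange.rewrite("price>=10")` yields `price:[10 TO *]` and `rewrite("year=2011")` yields `year:2011`.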




[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-07-27 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3343:
--

Fix Version/s: 3.4

 Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
 QueryParser
 

 Key: LUCENE-3343
 URL: https://issues.apache.org/jira/browse/LUCENE-3343
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
  Labels: parser, query
 Fix For: 3.4, 4.0

 Attachments: NumCompQueryParser.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 To offer better interoperability with other search engines and to provide an 
 easier and more straightforward syntax,
 the operators >, >=, <, <= and = should be available to express an open range 
 query.
 They should at least work for numeric queries.
 '=' can be made a synonym for ':'.




[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-07-27 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3343:
--

Attachment: NumCompQueryParser-3x.patch

The patch for the lucene-3x branch, including the files regenerated by javacc.
Tests ran successfully for queryparser (core and contrib).
Tests were added for the new feature.

Both QueryParser (core) and StandardQueryParser (contrib) have been modified.

 Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
 QueryParser
 

 Key: LUCENE-3343
 URL: https://issues.apache.org/jira/browse/LUCENE-3343
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
  Labels: parser, query
 Fix For: 3.4, 4.0

 Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 To offer better interoperability with other search engines and to provide an 
 easier and more straightforward syntax,
 the operators >, >=, <, <= and = should be available to express an open range 
 query.
 They should at least work for numeric queries.
 '=' can be made a synonym for ':'.




[jira] [Created] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-07-26 Thread Olivier Favre (JIRA)
Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser


 Key: LUCENE-3343
 URL: https://issues.apache.org/jira/browse/LUCENE-3343
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
 Fix For: 4.0


To offer better interoperability with other search engines and to provide an 
easier and more straightforward syntax,
the operators >, >=, <, <= and = should be available to express an open range 
query.
They should at least work for numeric queries.
'=' can be made a synonym for ':'.




[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-07-26 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3343:
--

Attachment: NumCompQueryParser.patch

The patch, including the files regenerated by javacc.
Tests ran successfully for queryparser.
New tests were created for the new feature.

 Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
 QueryParser
 

 Key: LUCENE-3343
 URL: https://issues.apache.org/jira/browse/LUCENE-3343
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/queryparser
Reporter: Olivier Favre
Priority: Minor
  Labels: parser, query
 Fix For: 4.0

 Attachments: NumCompQueryParser.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 To offer better interoperability with other search engines and to provide an 
 easier and more straightforward syntax,
 the operators >, >=, <, <= and = should be available to express an open range 
 query.
 They should at least work for numeric queries.
 '=' can be made a synonym for ':'.




[jira] [Issue Comment Edited] (LUCENE-1823) QueryParser with new features for Lucene 3

2011-07-26 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071091#comment-13071091
 ] 

Olivier Favre edited comment on LUCENE-1823 at 7/26/11 1:28 PM:


Relates to LUCENE-3343: Open range comparison operator >,>=,<,<= and =.

  was (Author: ofavre):
Open range comparison operator >,>=,<,<= and =.
  
 QueryParser with new features for Lucene 3
 --

 Key: LUCENE-1823
 URL: https://issues.apache.org/jira/browse/LUCENE-1823
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/queryparser
Reporter: Michael Busch
Assignee: Luis Alves
Priority: Minor
 Fix For: 4.0

 Attachments: lucene_1823_any_opaque_precedence_fuzzybug_v2.patch, 
 lucene_1823_foo_bug_08_26_2009.patch


 I'd like to have a new QueryParser implementation in Lucene 3.1, ideally 
 based on the new QP framework in contrib. It should share as much code as 
 possible with the current StandardQueryParser implementation for easy 
 maintainability.
 Wish list (feel free to extend):
 1. *Operator precedence*: Support operator precedence for boolean operators
 2. *Opaque terms*: Ability to plugin an external parser for certain syntax 
 extensions, e.g. XML query terms
 3. *Improved RangeQuery syntax*: Use more intuitive <=, =, >= instead of [] 
 and {}
 4. *Support for trierange queries*: See LUCENE-1768
 5. *Complex phrases*: See LUCENE-1486
 6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms 
 occur in the same document
 7. *New syntax for Span queries*: I think the surround parser supports this?
 8. *Escaped wildcards*: See LUCENE-588




[jira] [Created] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)

2011-07-22 Thread Olivier Favre (JIRA)
FastVectorHighlighter ignores MultiPhraseQuery (and more)
-

 Key: LUCENE-3332
 URL: https://issues.apache.org/jira/browse/LUCENE-3332
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.3, 4.0
 Environment: Tested against Lucene trunk revision 1149488 (4.0), but 
seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is 
clear: this version, and surely earlier ones (3.1-3.2), are affected)
Reporter: Olivier Favre
Priority: Minor


Similar to LUCENE-495.

Create a MultiPhraseQuery and use FastVectorHighlighter: you'll get no fragment.
Using PhraseQuery and/or Highlighter works, but not the previous combination.




[jira] [Updated] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)

2011-07-22 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3332:
--

Attachment: UsingFallback.patch

Implements a simple fallback that creates a TermQuery for each Term obtained 
via Query.extractTerms().
This gives basic highlighting for any type of query, though extractTerms() may 
throw an UnsupportedOperationException for some of them.

 FastVectorHighlighter ignores MultiPhraseQuery (and more)
 -

 Key: LUCENE-3332
 URL: https://issues.apache.org/jira/browse/LUCENE-3332
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.3, 4.0
 Environment: Tested against Lucene trunk revision 1149488 (4.0), but 
 seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is 
 clear: this version, and surely earlier ones (3.1-3.2), are affected)
Reporter: Olivier Favre
Priority: Minor
  Labels: highlighting
 Attachments: TestMultiPhraseQueryHighlighting.java, 
 UsingFallback.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Similar to LUCENE-495.
 Create a MultiPhraseQuery and use FastVectorHighlighter: you'll get no fragment.
 Using PhraseQuery and/or Highlighter works, but not the previous combination.




[jira] [Updated] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)

2011-07-22 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3332:
--

Attachment: UsingCombinationAndFallback.patch

Flattens a MultiPhraseQuery into multiple PhraseQuerys by generating all the 
term combinations (can be costly).

In addition, implements a simple fallback that creates a TermQuery for each 
Term obtained via Query.extractTerms().
This gives basic highlighting for any type of query, though extractTerms() may 
throw an UnsupportedOperationException for some of them.

 FastVectorHighlighter ignores MultiPhraseQuery (and more)
 -

 Key: LUCENE-3332
 URL: https://issues.apache.org/jira/browse/LUCENE-3332
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.3, 4.0
 Environment: Tested against Lucene trunk revision 1149488 (4.0), but 
 seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is 
 clear: this version, and surely earlier ones (3.1-3.2), are affected)
Reporter: Olivier Favre
Priority: Minor
  Labels: highlighting
 Attachments: TestMultiPhraseQueryHighlighting.java, 
 UsingCombinationAndFallback.patch, UsingFallback.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Similar to LUCENE-495.
 Create a MultiPhraseQuery and use FastVectorHighlighter: you'll get no fragment.
 Using PhraseQuery and/or Highlighter works, but not the previous combination.




[jira] [Commented] (LUCENE-3332) FastVectorHighlighter ignores MultiPhraseQuery (and more)

2011-07-22 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069565#comment-13069565
 ] 

Olivier Favre commented on LUCENE-3332:
---

The advantage of generating multiple PhraseQuerys is that highlighting 
outputs a single highlighted fragment, instead of multiple highlighted words, 
one next to the other.
I think looking at WeightedSpanTermExtractor's handling of MultiPhraseQuery may 
help, but considering the job of FieldQuery.expand(), it may need deep 
modifications.

The drawback of generating the combinations is that it produces many 
PhraseQuerys (bad complexity class: their number is the product of the number 
of alternatives at each position), and that they will all be processed by 
expand(), which itself is O(n^2).
If each term is doubled in the source MultiPhraseQuery (say once stemmed, once 
intact), a query of 5 words ends up with n=32 generated PhraseQuerys, thus 
making 1024 comparisons inside expand().
Fortunately, this process needs to be done only once per query; there is no need to 
do it all over again for each field or for each doc.
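
The count above can be checked with a small sketch of the combination step. PhraseCombinations is a hypothetical name and plain strings stand in for Lucene Terms; this is not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseCombinations {
    /** Cartesian product: every way of choosing one term per phrase position. */
    static List<List<String>> expand(List<List<String>> alternativesPerPosition) {
        List<List<String>> phrases = new ArrayList<List<String>>();
        phrases.add(new ArrayList<String>()); // start from the empty phrase
        for (List<String> alternatives : alternativesPerPosition) {
            List<List<String>> next = new ArrayList<List<String>>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> phrase = new ArrayList<String>(prefix);
                    phrase.add(term); // extend each prefix by each alternative
                    next.add(phrase);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

With 5 positions and 2 alternatives each, expand() returns 2^5 = 32 phrases, and comparing every pair of them costs 32 * 32 = 1024 comparisons, matching the figures given above.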

 FastVectorHighlighter ignores MultiPhraseQuery (and more)
 -

 Key: LUCENE-3332
 URL: https://issues.apache.org/jira/browse/LUCENE-3332
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.3, 4.0
 Environment: Tested against Lucene trunk revision 1149488 (4.0), but 
 seen under Lucene 3.3 (tested through ElasticSearch 0.17.1, but the code is 
 clear: this version, and surely earlier ones (3.1-3.2), are affected)
Reporter: Olivier Favre
Priority: Minor
  Labels: highlighting
 Attachments: TestMultiPhraseQueryHighlighting.java, 
 UsingCombinationAndFallback.patch, UsingFallback.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Similar to LUCENE-495.
 Create a MultiPhraseQuery and use FastVectorHighlighter: you'll get no fragment.
 Using PhraseQuery and/or Highlighter works, but not the previous combination.




[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-06 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029944#comment-13029944
 ] 

Olivier Favre commented on LUCENE-3071:
---

It seems you forgot to commit lucene/contrib/CHANGES.txt to describe the new 
feature.

Regards

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, 
 LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}
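
The two requested behaviours can be sketched on plain strings as follows. PathHierarchySketch and its tokens() method are hypothetical and only mirror the examples above; the real tokenizer works on a character stream, sets offsets, and may treat leading delimiters differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PathHierarchySketch {
    /**
     * Forward mode emits ever-longer prefixes after skipping the first `skip` parts;
     * reverse mode emits ever-shorter suffixes after skipping the last `skip` parts.
     */
    static List<String> tokens(String path, char delimiter, boolean reverse, int skip) {
        String d = String.valueOf(delimiter);
        List<String> parts = new ArrayList<>();
        for (String p : path.split(Pattern.quote(d))) {
            if (!p.isEmpty()) { // ignore the empty part before a leading delimiter
                parts.add(p);
            }
        }
        List<String> out = new ArrayList<>();
        if (reverse) {
            List<String> kept = parts.subList(0, Math.max(0, parts.size() - skip));
            for (int i = 0; i < kept.size(); i++) {
                out.add(String.join(d, kept.subList(i, kept.size())));
            }
        } else {
            List<String> kept = parts.subList(Math.min(skip, parts.size()), parts.size());
            for (int i = 1; i <= kept.size(); i++) {
                out.add(String.join(d, kept.subList(0, i)));
            }
        }
        return out;
    }
}
```

Here `tokens("www.site.com", '.', true, 0)` gives `[www.site.com, site.com, com]`, and `tokens("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4)` gives `[INTERESTING, INTERESTING/PART]`.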




[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: LUCENE-3071.patch

Proposed patch attached.

Works against Lucene 3.1 (remove the last parameter, {{path.length()}}, from 
the assert calls).
But I am having difficulty making the tests work against trunk ({{ant}} and 
{{ant test}} fail at global scope).

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}




[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: ant.log.tar.bz2

I'm using Ubuntu 10.04.2 LTS.
{{ant -version}} reports: Apache Ant version 1.7.1 compiled on September 8 2010
I followed the wiki: http://wiki.apache.org/lucene-java/HowToContribute
I used {{svn checkout http://svn.eu.apache.org/repos/asf/lucene/dev/trunk/ lucene-trunk}}.
I'm working under revision 1099843 (yours).
See ant log attached.

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}




[jira] [Commented] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029413#comment-13029413
 ] 

Olivier Favre commented on LUCENE-3071:
---

{{ant clean test}} did it for me, thanks!

As for the failing tests, it is because of the {{finalOffset}} that I set to 
{{path.length()}}.
I'm not sure whether I should use {{path.length()}}, as my tokens don't go up 
to there when using the reverse mode.
When I take a look at the end() function, I think that I should set it to 
the end of the string. But I can't see it in the javadoc.
If the purpose of the {{finalOffset}} parameter in 
{{assertTokenStreamContents()}} is to make sure of the {{endOffset}} of the 
last term, then I should not use {{path.length()}} blindly when using reverse 
and skip.

Can you help me with the purpose of {{finalOffset}}? Or can I simply skip it in 
my tests (they are working if I skip it)?

Thanks

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}




[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Olivier Favre (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Favre updated LUCENE-3071:
--

Attachment: LUCENE-3071.patch

I fixed my code accordingly.
Tests run fine now.

Ready to ship?

 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}




[jira] [Issue Comment Edited] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-05 Thread Olivier Favre (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029444#comment-13029444
 ] 

Olivier Favre edited comment on LUCENE-3071 at 5/5/11 5:19 PM:
---

Fixed patch attached.

Tests run fine now.

Ready to ship?

  was (Author: ofavre):
I fixed my code accordingly.
Tests run fine now.

Ready to ship?
  
 PathHierarchyTokenizer adaptation for urls: splits reversed
 ---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor
 Attachments: LUCENE-3071.patch, LUCENE-3071.patch, ant.log.tar.bz2

   Original Estimate: 2h
  Remaining Estimate: 2h

 {{PathHierarchyTokenizer}} should be usable to split urls in a reversed 
 way (useful for faceted search against urls):
 {{www.site.com}} -> {{www.site.com, site.com, com}}
 Moreover, it should be able to skip a given number of first (or last, if 
 reversed) tokens:
 {{/usr/share/doc/somesoftware/INTERESTING/PART}}
 Should give with 4 tokens skipped:
 {{INTERESTING}}
 {{INTERESTING/PART}}




[jira] [Created] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-04 Thread Olivier Favre (JIRA)
PathHierarchyTokenizer adaptation for urls: splits reversed
---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor


{{PathHierarchyTokenizer}} should be usable to split urls in a reversed way 
(useful for faceted search against urls):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if 
reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
Should give with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
