[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721326#action_12721326
 ] 

Michael McCandless commented on LUCENE-1673:


Latest patch looks good Uwe!  We can separately tweak the javadocs...

 Move TrieRange to core
 --

 Key: LUCENE-1673
 URL: https://issues.apache.org/jira/browse/LUCENE-1673
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9

 Attachments: LUCENE-1673.patch, LUCENE-1673.patch, LUCENE-1673.patch, 
 LUCENE-1673.patch, LUCENE-1673.patch


 TrieRange was iterated many times and seems stable now (LUCENE-1470, 
 LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to 
 its default FieldTypes (SOLR-940) and if possible I want to move it to core 
 before release of 2.9.
 Before this can be done, there are some things to think about:
 # There are now classes called LongTrieRangeQuery and IntTrieRangeQuery; how 
 should they be called in core? I would suggest leaving them as they are. On the 
 other hand, if this remains our only numeric query implementation, we could 
 call them LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there 
 are problems with this). Same for the TokenStreams and Filters.
 # Maybe the pairs of classes for indexing and searching should be merged into 
 one class each: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The 
 problem here: ctors must be able to accept int, long, double, float as range 
 parameters. For the end user, mixing these 4 types in one class is hard to 
 handle. If somebody forgets to add an L to a long, it suddenly instantiates an 
 int version of the range query, hitting no results and so on. Same with other 
 types. Maybe accept java.lang.Number as parameter (because it is nullable for 
 half-open bounds) and one enum for the type.
 # TrieUtils move into o.a.l.util? or document or?
 # Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
 o.a.l.analysis.tokenattributes? Somewhere else?
 # If we rename the classes, should Solr stay with Trie (because there are 
 different impls)?
 # Maybe add a subclass of AbstractField, that automatically creates these 
 TokenStreams and omits norms/tf per default for easier addition to Document 
 instances?
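
The overload pitfall from the second point in the list can be shown in a few
lines of plain Java. This is only a sketch: RangeOverloadSketch, NumericType
and the newRange methods are hypothetical names made up for illustration and
are not part of the patch. The first two factories show how a forgotten L
silently selects the int variant; the third shows the suggested alternative of
nullable java.lang.Number bounds plus an explicit type enum.

```java
// Hypothetical sketch, not Lucene code: all names here are invented.
class RangeOverloadSketch {

    /** Explicit numeric type, as suggested in the issue description. */
    enum NumericType { INT, LONG }

    // Overloaded factories: the compiler picks one from the argument types.
    static String newRange(int min, int max) {
        return "int";
    }

    static String newRange(long min, long max) {
        return "long";
    }

    // Alternative: the type is stated explicitly; a null bound means "open".
    static String newRange(NumericType type, Number min, Number max) {
        String lo = (min == null) ? "*" : describe(type, min);
        String hi = (max == null) ? "*" : describe(type, max);
        return type + "[" + lo + " TO " + hi + "]";
    }

    private static String describe(NumericType type, Number n) {
        return type == NumericType.INT
            ? Integer.toString(n.intValue())
            : Long.toString(n.longValue());
    }

    public static void main(String[] args) {
        // Both literals are ints, so the int overload is chosen.
        System.out.println(newRange(1, 100));
        // A single L is enough to switch to the long overload.
        System.out.println(newRange(1L, 100));
        // The enum removes the dependence on literal suffixes.
        System.out.println(newRange(NumericType.LONG, null, 100));
    }
}
```

With the enum variant the field type no longer depends on how the caller wrote
the literals, at the cost of boxing the bounds.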

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720574#action_12720574
 ] 

Michael McCandless commented on LUCENE-1673:


Note that LUCENE-1505 is open for cutting over contrib/spatial to 
NumericUtils.




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720593#action_12720593
 ] 

Michael McCandless commented on LUCENE-1673:


bq. Want a convenience method for the user? TrieUtils.createDocumentField(...), 
same as the sortField currently works.

I don't think this is convenient enough.

bq.  If you'd like to have end-to-end experience for numeric fields, build 
something schema-like and put it in contribs

+1

Long (medium?) term I'd love to get to this point; I think it'd make
Lucene quite a bit more consumable.  But we shouldn't sacrifice
consumability today on the hope for that future nirvana.

You already have a nice starting point here... is that something you
could donate?

{quote}
bq. I do agree that retrieving a doc is already buggy, in that various things 
are lost from your index time doc (a well known issue at this point!)

How on earth is it buggy?  You're working with an inverted index, you aren't 
supposed to get original document from it in the first place. It's like saying 
a hash function is buggy because it is not reversible.
{quote}

I completely agree: you're not supposed to get the original doc back.
And the fact that Lucene's API now pretends you do, is wrong.  We all
agree to that, and that we need to fix Lucene.

But, as things now stand, it's not yet fixed, so until it's fixed, I
don't like intentionally making it worse.

It'd be great to simply stop returning Document from IndexReader.
Wanna make a patch?  I don't think the new sheriff'd hold 2.9 for this
though ;)

{quote}
bq. hey how come I didn't get a NumericField back on my doc?

Perhaps a good reason to not add a NumericField.
{quote}

I think NumericField (when building your doc) is still valuable, even
if we can't return NumericField when you retrieve the doc.

OK... since adding the bit to the stored fields is controversial, I
think for 2.9, we should only add NumericField at indexing (document
creation) time.  So, we don't store a new bit in stored fields file
and the index format is unchanged.





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719940#action_12719940
 ] 

Uwe Schindler commented on LUCENE-1673:
---

bq. re: NumericField - it wouldn't have back-compat issues, so it could be 
added any time - no need to link it to this issue or to rush it. 

I think the same, I should first resolve this and open some more issues :-)




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720054#action_12720054
 ] 

Michael McCandless commented on LUCENE-1673:


Patch looks good Uwe!  The only thing I think is missing is a single
javadoc that shows the full usage of Numeric*, with code fragments.
But, I think that should wait until we resolve the followon issues,
here.

bq. I think the same, I should first resolve this and open some more issues

Agreed, though I think some of these (NumericField, NumericSortField)
are important to do for 2.9.  Maybe others (adding support for the
missing numeric types (byte and short)) can wait.

Let's wrap this one up and move onto the next ones ;)





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720056#action_12720056
 ] 

Uwe Schindler commented on LUCENE-1673:
---

What do you think about deprecating DateTools? I am not really sure. Maybe we 
should leave it (in contrast to NumberTools), but add a notice there that it 
may be better to use NumericRangeQuery with the Unix timestamp.

The Javadocs are mostly centralized; the entry point (linked from everywhere) 
is NumericRangeQuery. I only wanted to add a short note to package.html in 
analysis and search.




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720082#action_12720082
 ] 

Michael McCandless commented on LUCENE-1673:


I think deprecating DateTools makes sense, though, we should add simple code 
fragments showing the migration.

NumericRangeQuery's javadocs are great, but I'd like to crispen them up by 
decoupling how you use it from how it's implemented.  E.g. lead right off 
with a "for the impatient, this is how it's used" section, and then a separate 
section detailing how it works, what precisionStep means (and the tradeoffs of 
high/low values for it), the reference to the full paper, etc.

But we can iterate on the javadocs in the separate issue, too.
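
As a rough illustration of what precisionStep controls, the sketch below lists
the prefixes a 64-bit value would be indexed under when the lowest bits are
dropped precisionStep bits at a time. It is a simplification under stated
assumptions: TriePrefixSketch and prefixes are hypothetical names, and the real
term encoding is not reproduced here, only the shift arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shows the shift arithmetic only, not the actual
// term encoding used by the trie classes.
class TriePrefixSketch {

    /** Returns one {shift, prefix} pair per indexed precision of the value. */
    static List<long[]> prefixes(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<long[]> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            // Dropping the lowest `shift` bits gives a coarser bucket.
            out.add(new long[] { shift, value >>> shift });
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] p : prefixes(0x1234L, 8)) {
            System.out.println("shift=" + p[0] + " prefix=0x" + Long.toHexString(p[1]));
        }
    }
}
```

In this sketch a step of 8 yields 8 entries per long and a step of 4 yields 16,
which is the tradeoff in miniature: a lower step means more terms in the index
per value, in exchange for coarser buckets being available to cover a range
with fewer terms.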




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720084#action_12720084
 ] 

Michael McCandless commented on LUCENE-1673:


bq. NumericField would only work for indexing, but when retrieving from index 
(stored fields), it would change to Field.

Actually, this need not be a limitation; FieldsWriter already writes bits 
recording details for each stored field (binary, tokenized, compressed 
(deprecated)).  We could easily add numeric; then FieldsReader would return a 
NumericField.
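
A per-field flag byte of the kind described above could look roughly like the
following. This is a sketch only: the class name, the constant names and their
bit values are assumptions chosen for illustration, and the numeric bit in
particular is hypothetical and does not exist in the file format.

```java
// Hypothetical sketch of a per-stored-field flag byte. The constant names
// and bit values are illustrative assumptions, not the real file format.
class StoredFieldFlagsSketch {
    static final int FIELD_IS_TOKENIZED  = 0x1;
    static final int FIELD_IS_BINARY     = 0x2;
    static final int FIELD_IS_COMPRESSED = 0x4;
    static final int FIELD_IS_NUMERIC    = 0x8; // the proposed extra bit

    /** Packs the per-field details into a single byte. */
    static byte encode(boolean tokenized, boolean binary,
                       boolean compressed, boolean numeric) {
        int flags = 0;
        if (tokenized)  flags |= FIELD_IS_TOKENIZED;
        if (binary)     flags |= FIELD_IS_BINARY;
        if (compressed) flags |= FIELD_IS_COMPRESSED;
        if (numeric)    flags |= FIELD_IS_NUMERIC;
        return (byte) flags;
    }

    /** A reader would test this bit to decide which field class to return. */
    static boolean isNumeric(byte flags) {
        return (flags & FIELD_IS_NUMERIC) != 0;
    }

    public static void main(String[] args) {
        byte flags = encode(true, false, false, true);
        System.out.println("flags=" + flags + " numeric=" + isNumeric(flags));
    }
}
```

Readers that do not know the new bit would ignore it, but writing it still
changes what is stored, which is why it counts as an index format change.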




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720089#action_12720089
 ] 

Uwe Schindler commented on LUCENE-1673:
---

bq. Actually, this need not be a limitation; FieldsWriter already writes bits 
recording details for each stored field (binary, tokenized, compressed 
(deprecated)). We could easily add numeric; then FieldsReader would return a 
NumericField

With all the problems in FieldsReader, like the need to have a 
LazyNumericField and so on... :( I came across this two months ago when fixing 
the omitTf things there... But it may be an idea.

I will make some changes to the current patch, fix the javadocs and add these 
package.html parts. The SortField and FieldCache parts will be done directly 
after this issue.

I only wanted to hear one more voice about DateTools, because for index size 
and so on, it may still be good to only index dates in date-granularity. With 
this, you can use a simple TermQuery to retrieve all docs for that day; with 
NumericRangeQuery you must create a NumericRangeQuery.newLongRange() on the 
Unix timestamp from 0:00 on the day to 0:00 on the following day, exclusive.
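
The bounds for such a one-day range can be computed as below. This is a
minimal sketch assuming days are taken in UTC and using java.time from a
current JDK; DayRangeSketch and dayBoundsUtc are hypothetical names. The two
values would then be passed as the inclusive lower and exclusive upper bound
of the long range query.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: computes [start, end) of a UTC day in epoch millis.
class DayRangeSketch {

    /** Returns {startInclusive, endExclusive} for the given day in UTC. */
    static long[] dayBoundsUtc(LocalDate day) {
        long start = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long[] bounds = dayBoundsUtc(LocalDate.of(2009, 6, 16));
        // Lower bound inclusive, upper bound exclusive.
        System.out.println("[" + bounds[0] + ", " + bounds[1] + ")");
    }
}
```

Compared with a single TermQuery on a day-granularity term, this needs two
bounds and an agreed time zone, which is the inconvenience described above.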




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720143#action_12720143
 ] 

Michael McCandless commented on LUCENE-1673:


bq. I only wanted to hear one more voice about DateTools, because for index 
size and so on, it may still be good to only index dates in date-granularity. 
With this, you can use a simple TermQuery to retrieve all docs for that day, 
with NumericRangeQuery you must create a NumericRangeQuery.newLongRange() on 
the unix ts from 0:00 on the day to 0:00 on the following day exclusive.

Couldn't we have a NumericTermQuery for such cases?  You have the full 
precision term in the index...




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720152#action_12720152
 ] 

Uwe Schindler commented on LUCENE-1673:
---

With a NumericTermQuery you would only hit the document exactly on the same 
millisecond.

The good thing about DateTools is that you can index the date value as a term 
with some fixed precision, such as months. Because of this, you can find a 
specific month using a single TermQuery. For ranges, NumericRangeQuery is in 
most cases better (though not at month resolution).

With NumericRangeQuery it is hard to hit exactly one month using only one term, 
because the month boundaries in epoch milliseconds do not fall exactly on 2^n 
values. In my opinion:

- DateTools is good for indexing very coarse dates (months, years) out of a 
java.util.Date/Calendar, e.g. the days on which a room (document) is free in a 
hotel. Users can then use term queries to ask whether there is any free room on 
a specific date; for a date range, a conventional RangeQuery is not bad (only 
few terms affected).
- Use NumericRangeQuery if you want to query any date range (even down to the 
millisecond). The important thing is: the lower-precision terms are not at 
common date boundaries.

Because of these different use cases, in my opinion, DateTools has its uses.
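
The point about month boundaries can be checked with a little arithmetic. This is a minimal sketch, assuming UTC month starts and a simplified view in which one lower-precision term covers an aligned block of 2^n consecutive millisecond values; the class and method names are hypothetical and not part of Lucene.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of Lucene.
final class MonthBoundarySketch {
    // Epoch milliseconds of the first instant of the given month, in UTC.
    static long monthStartMillis(int year, int month) {
        return LocalDate.of(year, month, 1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // True if [start, endExclusive) is exactly one aligned block of 2^n
    // values, i.e. could be matched by a single lower-precision term.
    static boolean isSingleBlock(long start, long endExclusive) {
        long length = endExclusive - start;
        return length > 0
                && Long.bitCount(length) == 1     // length is a power of two
                && (start & (length - 1)) == 0;   // start is aligned to it
    }

    public static void main(String[] args) {
        long june = monthStartMillis(2009, 6);
        long july = monthStartMillis(2009, 7);
        System.out.println(isSingleBlock(june, july));    // false
        System.out.println(isSingleBlock(0L, 1L << 32));  // true
    }
}
```

June 2009 spans 30 * 86400000 ms, which is not a power of two, so under this simplified model no single lower-precision term can cover exactly that month.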




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720163#action_12720163
 ] 

Yonik Seeley commented on LUCENE-1673:
--

bq.  We could easily add numeric; then FieldsReader would return a 
NumericField.

This is that baking in a specific implementation into the index format that I 
don't like.
There will be changes to Trie*, there will be other implementations of numerics 
by both us and other users.  We don't need to strongly couple core indexing and 
the types of fields... they aren't coupled now except when the generic format 
of the index changes (like omitNorms, omitTf, indexed, etc).




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720191#action_12720191
 ] 

Michael McCandless commented on LUCENE-1673:


{quote}
bq. We could easily add numeric; then FieldsReader would return a 
NumericField.

This is that baking in a specific implementation into the index format that I 
don't like.
{quote}

But we are already baking in the trie indexing format?  That's what
moving trie to core implies.  Lucene can now index numbers, well,
and has committed to a certain approach (trie).

The term dict of a numeric field is trie encoded, each doc field is
indexed under a series of trie encoded tokens (w/ different
precisions), etc.
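
As a rough illustration of "a series of tokens with different precisions", the following sketch lists one term per precision level for a long value. It is a deliberate simplification and an assumption on my side: the real trie encoding also handles sign ordering and packs shift and value into a compact term, which is omitted here. TrieTermsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of multi-precision terms; not the real encoding.
final class TrieTermsSketch {
    // One "shift:hexValue" term per precision level. Each level drops
    // precisionStep more low-order bits than the previous one.
    static List<String> terms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(shift + ":" + Long.toHexString(value >>> shift));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms(0x12345L, 16)); // [0:12345, 16:1, 32:0, 48:0]
        System.out.println(terms(0x1ffffL, 16)); // [0:1ffff, 16:1, 32:0, 48:0]
    }
}
```

Both example values share the term at shift 16, which is why a range query can match many values with one lower-precision term.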

Sure, in the future we may find improvements to how Lucene indexes
numbers, but why choose to be buggy today (hey how come I didn't get a
NumericField back on my doc?) for this possible future that may or
may not come?  If/when that future arrives, we can improve the index
format at that point rather than intentionally create buggy code
today?

I do agree that retrieving a doc is already buggy, in that various
things are lost from your index time doc (a well known issue at this
point!), but I don't think we should intentionally make that behavior
even more buggy, if we can help it...





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720223#action_12720223
 ] 

Yonik Seeley commented on LUCENE-1673:
--

bq. But we are already baking in the trie indexing format? That's what
moving trie to core implies. 

Nah - no more than the porter stemmer or any other type of analysis is baked 
in.
I thought move meant rename (package and class name), upgrading its 
stability and how core it is.

bq. hey how come I didn't get a NumericField back on my doc?

Perhaps a good reason to not add a NumericField.  It doesn't currently exist 
and is not necessary for Trie.
Want a convenience method for the user?  TrieUtils.createDocumentField(...), 
the same way the SortField one currently works.

The current Trie behavior works the same way everything else does in Lucene... 
changing that and encoding types into the index deserves its own issue and 
discussion (and something big like that doesn't seem to belong in 2.9 which is 
winding down).





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-16 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720231#action_12720231
 ] 

Earwin Burrfoot commented on LUCENE-1673:
-

bq. This is that baking in a specific implementation into the index format that 
I don't like.
+many

bq. I do agree that retrieving a doc is already buggy, in that various things 
are lost from your index time doc (a well known issue at this point!)
How on earth is it buggy? You're working with an inverted index, you aren't 
supposed to get original document from it in the first place. It's like saying 
a hash function is buggy because it is not reversible.

The less coupling the various Lucene components have to each other, the better. 
If you'd like an end-to-end experience for numeric fields, build something 
schema-like and put it in contrib. If it's hard to build, Lucene core is to 
blame: it's not extensible enough. From my experience, for that purpose it's 
okay as it is.




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719689#action_12719689
 ] 

Michael McCandless commented on LUCENE-1673:


bq. So one using new code must always specify the parser when using 
SortField.INT (SortField.AUTO is already deprecated so no problem). 

This will apply to int/long/float/double as well right?  How would you
do this (require a parser for only numeric sorts) back-compatibly?  EG,
the others (String, DOC, etc.) don't require a parser.

We could alternatively make NumericSortField (subclassing SortField),
that just uses the right parser?

Did you think about / decide against making a NumericField (that'd set
the right tokenStream itself)?

Other questions/comments:

  * Could we change ShiftAttribute -> NumericShiftAttribute?

  * How about oal.util.NumericUtils instead of TrieUtils?

  * Can we rename RangeQuery -> TextRangeQuery (TermRangeQuery), to
make it clear that its range checking is by Term sort order?

  * Should we support byte/short for trie indexed fields as well?
(Since SortField, FieldCache support these numeric types too...).





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719692#action_12719692
 ] 

Michael McCandless commented on LUCENE-1673:


bq. The only open point is the name of TrieUtils, any idea for package and/or 
name?

I think NumericUtils?  (I'd like the naming to be consistent w/
NumericRangeQuery, NumericTokenStream, since it's very much a public
API, ie users must interact directly with it to get their SortField
(maybe) and FieldCache parser).

Leaving it in util seems OK, since it's used by analysis & searching.





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719699#action_12719699
 ] 

Yonik Seeley commented on LUCENE-1673:
--

bq. This will apply to int/long/float/double as well right? How would you do 
this (require a parser for only numeric sorts) back-compatibly? EG, the others 
(String, DOC, etc.) don't require a parser.

Allow passing parser==null to get the default?

bq. We could alternatively make NumericSortField (subclassing SortField), that 
just uses the right parser?

A factory method TrieUtils.getSortField() could also return the right SortField.
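
The two suggestions above (a null parser selecting the default, and a factory method that hands out a ready-made sort specification) can be sketched side by side. Everything here is a hypothetical stand-in and not Lucene's SortField or FieldCache API; in particular, the hex parser is only a placeholder for "a parser that understands an encoded term format".

```java
// Hypothetical stand-ins for illustration; none of these are Lucene classes.
final class SortSpecSketch {
    interface IntParser {
        int parseInt(String term);
    }

    // Default: terms are plain decimal strings.
    static final IntParser DEFAULT_PARSER = Integer::parseInt;

    // Placeholder for a parser of specially encoded terms (hex is used
    // here only to make the difference observable; it is NOT trie encoding).
    static final IntParser ENCODED_PARSER = term -> Integer.parseInt(term, 16);

    final String field;
    final IntParser parser;

    // Option 1: callers may pass null and get the default parser.
    SortSpecSketch(String field, IntParser parser) {
        this.field = field;
        this.parser = (parser == null) ? DEFAULT_PARSER : parser;
    }

    // Option 2: a factory that picks the matching parser for the caller.
    static SortSpecSketch forEncodedField(String field) {
        return new SortSpecSketch(field, ENCODED_PARSER);
    }

    public static void main(String[] args) {
        System.out.println(new SortSpecSketch("price", null).parser.parseInt("42")); // 42
        System.out.println(forEncodedField("price").parser.parseInt("2a"));          // 42
    }
}
```

Either way, existing callers that never pass a parser keep working, which is the back-compatibility concern raised in the quoted question.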






[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719726#action_12719726
 ] 

Uwe Schindler commented on LUCENE-1673:
---

{quote}
Mike: This will apply to int/long/float/double as well right? How would you
do this (require a parser for only numeric sorts) back-compatibly? EG,
the others (String, DOC, etc.) don't require a parser.

Yonik: Allow passing parser==null to get the default?

bq. We could alternatively make NumericSortField (subclassing SortField), that 
just uses the right parser?

A factory method TrieUtils.getSortField() could also return the right SortField.
{quote}

I want to move this into a new issue afterwards; I will open one.

Nevertheless, I would like to remove emphasis from NumericUtils (which is in 
reality a helper class). So I want to make the current human-readable numeric 
parsers public and also add the trie parsers to FieldCache.

The SortField factory is then the only part really needed in NumericUtils, and 
even that is not strictly required. The parser is a singleton, works for all 
trie fields and could also live somewhere else or nowhere at all, if the 
parsers all stay in FieldCache.

bq. Should we support byte/short for trie indexed fields as well? (Since 
SortField, FieldCache support these numeric types too...). 

For bytes, TrieRange is not very interesting; for shorts, maybe, but I would 
subsume them during indexing as simple integers. You could not speed up 
searching, only limit index size a little bit.

bq. Could we change ShiftAttribute -> NumericShiftAttribute?

No problem, I will do this. The link from the TokenStream javadocs to this 
attribute is also missing; see also my reply on java-dev to Grant's mail.

bq. Can we rename RangeQuery -> TextRangeQuery (TermRangeQuery), to make it 
clear that its range checking is by Term sort order.

We can do this and deprecate the old one, but I added a note to the Javadocs 
(see patch). I would do this outside of this issue.

bq. How about oal.util.NumericUtils instead of TrieUtils?

That was my first idea, too. What should we do with o.a.l.doc.NumberTools 
(deprecate?). We should also update contrib/spatial to use NumericUtils instead 
of the copied and not really good NumberUtils from Solr (Yonik said it was 
written at a very early stage, and is not efficient with UTF-8 encoding and the 
TermEnum positioning with the term prefixes). It would be an index-format 
change for spatial, but as the code was not yet released (in Lucene), the 
Lucene version should not use NumberUtils at all.


[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719729#action_12719729
 ] 

Uwe Schindler commented on LUCENE-1673:
---

bq. Did you think about / decide against making a NumericField (that'd set the 
right tokenStream itself)?

The problem currently is:
- Field is final, so I must extend AbstractField. But some methods of 
Document return Field and not AbstractField.
- NumericField would only work for indexing; when retrieved from the index 
(stored fields), it would change to Field.

Maybe we should postpone this until after the index-specific schemas and so on, 
or document that it can only be used for indexing.

By the way: How do you like the factories in NumericRangeQuery and the setValue 
methods, working like StringBuffer.append() in NumericTokenStream? This makes 
it really easy to index.

The only good thing of NumericField would be the possibility to automatically 
disable TF and Norms per default when indexing.
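
The chaining style mentioned above (setters that behave like 
StringBuffer.append()) can be sketched in plain Java. This is only an 
illustration of the pattern, not the code from the patch: the class name 
NumericValueHolder and its methods are hypothetical.

```java
// Hypothetical sketch of the chaining pattern; this is NOT NumericTokenStream.
final class NumericValueHolder {
    private long value;
    private String type = "none";

    // Each setter returns this, like StringBuffer.append(), so the freshly
    // configured object can be passed on in the same expression.
    NumericValueHolder setIntValue(int v) {
        this.value = v;
        this.type = "int";
        return this;
    }

    NumericValueHolder setLongValue(long v) {
        this.value = v;
        this.type = "long";
        return this;
    }

    String describe() {
        return type + ":" + value;
    }
}
```

With this shape, a caller can create and fill the object in one expression, 
for example new NumericValueHolder().setLongValue(42L), and hand the result 
straight to whatever consumes it.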



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719738#action_12719738
 ] 

Uwe Schindler commented on LUCENE-1673:
---

I think I will remove the ShiftAttribute completely; it's really useless. Maybe 
I will add a getShift() method to NumericUtils that returns the shift value of 
a Token/String. See the java-dev mailing list thread with Yonik.
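
To make the getShift() idea concrete, here is a minimal sketch in plain Java. 
It assumes a term format in which the first character stores the shift as an 
offset from a marker character; that format, the SHIFT_START constant and the 
class name ShiftSketch are assumptions made for illustration and are not taken 
from the patch.

```java
// Hypothetical sketch: reads the shift back from an encoded term.
final class ShiftSketch {
    // Assumed marker: the first char of a term is SHIFT_START + shift.
    static final char SHIFT_START = 0x20;

    // Encodes value >>> shift, prefixed with a char that records the shift.
    static String encode(long value, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        return (char) (SHIFT_START + shift) + Long.toHexString(value >>> shift);
    }

    // Returns the shift stored in the first char, without decoding the value.
    static int getShift(String term) {
        int shift = term.charAt(0) - SHIFT_START;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("not an encoded term");
        }
        return shift;
    }
}
```

If the shift can be read from the term text like this, a separate attribute 
carrying the same number adds no information.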



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719761#action_12719761
 ] 

Michael McCandless commented on LUCENE-1673:


OK let's open a new issue for how to best integrate/default SortField
and FieldCache.

bq. Nevertheless, I would like to remove emphasis from NumericUtils (which is 
in reality a helper class).

+1

bq. For bytes, TrieRange is not very interesting, for shorts, maybe, but I 
would subsume them during indexing as simple integers. You could not speedup 
searching, but limit index size a little bit.

Well, a RangeQuery on a plain text byte or short field requires
sneakiness (knowing that you must zero-pad; keeping
document.NumberUtils around); I think it's best if NumericXXX in
Lucene handles all of java's native numeric types.  And you want a
byte[] or short[] out of FieldCache (to not waste RAM having to
upgrade to an int[]).

We can do this under the (a?) new issue too...
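
The zero-padding "sneakiness" mentioned above can be shown with a small plain 
Java sketch. It only illustrates why term-order range checks on unpadded 
numbers go wrong; the class ZeroPadSketch and its methods are hypothetical and 
handle non-negative values only.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: term (string) order versus numeric order.
final class ZeroPadSketch {
    // Pads a non-negative short to 5 digits so string order equals numeric order.
    static String pad(short value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%05d", value);
    }

    // Sorts the terms the way a term dictionary would: by string comparison.
    static String[] sortedAsText(String... terms) {
        String[] copy = terms.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

Sorting "9", "10", "100" as text puts "9" last, so a term range from "10" to 
"100" would miss it or include wrong values; after padding, the text order 
matches the numeric order.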

bq. The SortField factory is then the only parts really needed in NumericUtils, 
but not really. The parser is a singleton, works for all trie fields and could 
also live somewhere else or nowhere at all, if the Parsers all stay in 
FieldCache.

(Under a new issue, but...) I'm not really a fan of leaving the parser
in FieldCache and expecting a user to know to create the SortField
with that parser.  NumericSortField would make it much more consumable
to direct Lucene users.

{quote}
bq. Can we rename RangeQuery -> TextRangeQuery (TermRangeQuery), to make it 
clear that its range checking is by Term sort order.

We can do this and deprecate the old one, but I added a note to Javadocs (see 
patch). I would do this outside of this issue.
{quote}

OK.

One benefit of a rename is it's a reminder to users on upgrading to
consider whether they should in fact switch to NumericRangeQuery.

{quote}
bq. How about oal.util.NumericUtils instead of TrieUtils?

That was my first idea, too. What should we do with o.a.l.doc.NumberTools 
(deprecate?). We should also update contrib/spatial to use NumericUtils instead 
of the copied and not really good NumberUtils from Solr (Yonik said it was 
written at a very early stage and is not effective with UTF-8 encoding and the 
TermEnum positioning with the term prefixes). It would be an index-format 
change for spatial, but as the code has not yet been released (in Lucene), the 
Lucene version should not use NumberUtils at all.
{quote}

+1 on both (if we can add byte/short to trie*); we should do this
before 2.9 since we can still change locallucene's format.  Maybe open
a new issue for that, too?  We're forking off new 2.9 issues left and
right here!!

bq. I think I will remove the ShiftAttribute completely; it's really useless. 
Maybe I will add a getShift() method to NumericUtils that returns the shift 
value of a Token/String. See the java-dev mailing list thread with Yonik.

OK

{quote}
bq. Did you think about / decide against making a NumericField (that'd set the 
right tokenStream itself)?

Field is final and so I must extend AbstractField. But some methods of Document 
return Field and not AbstractField.
{quote}

Can we just un-final Field?

{quote}
NumericField would only work for indexing, but when retrieving from index 
(stored fields), it would change to Field.

Maybe we should move this after the index-specific schemas and so on. Or 
document, that it can be only used for indexing.
{quote}

True, but we already have such challenges between index vs search
time Document; documenting it seems fine.

bq. By the way: How do you like the factories in NumericRangeQuery and the 
setValue methods, working like StringBuffer.append() in NumericTokenStream? 
This makes it really easy to index.

I think this is great!  I like that you return NumericTokenStream :)

bq. The only good thing of NumericField would be the possibility to 
automatically disable TF and Norms per default when indexing.

Consumability (good defaults)!  (And also not having to know that you
must go and get a tokenStream from NumericUtils).



Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Mark Miller

Michael McCandless (JIRA) wrote:

 We're forking off new 2.9 issues left and
right here!!
  

Evil :) You guys are like a small team working against me.

We still have 29 or so issues to wrap up though, so probably plenty of time.

I hope we can set a rough target date soon though - it really feels like 
we could drag on for quite a bit longer if we wanted to.

Remember the last time we started to push for 2.9 in Dec/Jan :)

--
- Mark

http://www.lucidimagination.com







Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Michael McCandless
On Mon, Jun 15, 2009 at 4:42 PM, Mark Miller <markrmil...@gmail.com> wrote:

 Remember the last time we started to push for 2.9 in Dec/Jan :)

Yes this is very much on my mind too!!

So maybe, it's a race between the trie* group of issues, and the other 28 ;)

Mike




RE: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Uwe Schindler
Sorry,

I think some of these new issues (not all) could also go into 3.1, but I want 
to have this trie stuff with a clean API before 2.9 and not deprecate parts of 
it again in 3.1, shortly after release :-(

These issues are not hard changes; it's just a little bit of API cleanup you 
can do in your free time :-] -- I know I am a little bit late, but I am working
hard on this :)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719888#action_12719888
 ] 

Yonik Seeley commented on LUCENE-1673:
--

re: NumericField - it wouldn't have back-compat issues, so it could be added 
any time - no need to link it to this issue or to rush it.




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719242#action_12719242
 ] 

Uwe Schindler commented on LUCENE-1673:
---

I am currently preparing a first patch for NumericRangeQuery-to-core.

Should the class NumericUtils (formerly TrieUtils) be in o.a.l.util or 
o.a.l.document? At the moment, the public part of this class is only 
interesting for retrieving Parsers or SortField instances. But the latter can 
be refactored into SortField.TRIE_XXX (not a good name, as TRIE is no longer 
used) and the parser instances can be added to FieldCache.

End users do not need it for indexing or querying; NumericTokenStream and 
NumericRangeQuery cover all their needs.

So NumericUtils is more internal than before.

Any thoughts?



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719246#action_12719246
 ] 

Uwe Schindler commented on LUCENE-1673:
---

Here are my own thoughts:

bq. But the latter can be refactored, to SortField.TRIE_XXX (not good name, as 
TRIE no longer used) and the parser instances can be added to FieldCache.

- deprecate SortField.INT and so on, and use SortField.PLAIN_TEXT_INT instead
- use SortField.PREFIX_ENCODED_INT for the trie ones (a better name; this is 
the internal encoding name from TrieUtils)
- rename the default parsers (currently private) in FieldCache to PlainText* 
as well, and make them accessible
- add TrieUtils.XxxParser to FieldCache (also accessible)



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719266#action_12719266
 ] 

Yonik Seeley commented on LUCENE-1673:
--

bq.  use SortField.PREFIX_ENCODED_INT for the trie ones

This needlessly couples the Trie stuff strongly to the SortField stuff.  
Something along the lines of the current TrieUtils.getIntSortField(fname, 
reverse) seems preferable.

bq. add TrieUtils.XxxParser to FieldCache (but accessible)

The Trie parsers belong in the Trie class.

bq. re-use INT (and so on) in Sort and cache code, where the data type is meant 
(we already have this in lots of code around), but where the encoding is meant 
use PLAIN_TEXT_* vs. PREFIX_ENCODED_*.

I didn't understand that sentence.

As far as what package it makes sense to go in... what about an 
analysis.numeric package?

As a general comment, moving TrieRange to core should be moving it to the core 
and perhaps renaming the classes if we can think of a better name.  Some of the 
other stuff belongs in a different issue.




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719270#action_12719270
 ] 

Uwe Schindler commented on LUCENE-1673:
---

{quote}
bq. use SortField.PREFIX_ENCODED_INT for the trie ones

This needlessly couples the Trie stuff strongly to the SortField stuff. 
Something along the lines of the current TrieUtils.getIntSortField(fname, 
reverse) seems preferable.

bq. add TrieUtils.XxxParser to FieldCache (but accessible)

The Trie parsers belong in the Trie class.

bq. re-use INT (and so on) in Sort and cache code, where the data type is meant 
(we already have this in lots of code around), but where the encoding is meant 
use PLAIN_TEXT_* vs. PREFIX_ENCODED_*.

I didn't understand that sentence.
{quote}

But on the other hand, SortField.INT is also strongly linked to the plain text 
encoding of these tokens. My proposal was to unlink the index encoding of 
numeric data types from the sorting/field cache code and its constants. So it 
should not make a difference whether you encoded the value using 
Integer.toString() or TrieUtils; in both cases the sorting code is identical, 
only the parser is different.

Because of this, if we stay with SortField.INT and so on, I would tend to make 
the corresponding Parser/FieldCache a required arg of SortField, defaulting to 
the current parsers only in the deprecated, backwards-compatible variants.

So anyone using the new code must always specify the parser when using 
SortField.INT (SortField.AUTO is already deprecated, so no problem there). The 
same goes for FieldCache: always specify the parser when getting an instance. 
For that, the current default parsers should be made publicly accessible.
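
The point that only the parser differs while the sorting code stays the same 
can be sketched in plain Java. Everything here is hypothetical: IntParserSketch 
stands in for a FieldCache-style parser, and the hex encoding is an invented 
stand-in for the prefix encoding, chosen only because it is easy to read.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for a parser that turns an indexed term into an int.
interface IntParserSketch {
    int parseInt(String term);
}

final class SortSketch {
    // Parser for terms written with Integer.toString().
    static final IntParserSketch PLAIN_TEXT = Integer::parseInt;

    // Parser for an invented encoding: 8 hex digits of the sign-flipped value.
    static final IntParserSketch HEX_ENCODED =
        term -> (int) Long.parseLong(term, 16) ^ 0x80000000;

    static String encodeHex(int value) {
        return String.format(Locale.ROOT, "%08x", value ^ 0x80000000);
    }

    // The sorting code never looks at the encoding; it only sees parsed ints.
    static int[] sortedValues(String[] terms, IntParserSketch parser) {
        int[] values = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            values[i] = parser.parseInt(terms[i]);
        }
        Arrays.sort(values);
        return values;
    }
}
```

Both encodings of 7, -3 and 12 produce the same sorted int array; the caller 
just has to pass the parser that matches how the field was indexed.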

bq. As far as what package it makes sense to go in... what about an 
analysis.numeric package

TrieUtils is used in analysis and searching, which is why I tend towards util. 
The NumericTokenStream is in analysis (in my not-yet-released patch), 
ShiftAttribute in analysis.tokenattributes and TrieRangeQuery/Filter in search.

bq. As a general comment, moving TrieRange to core should be moving it to the 
core and perhaps renaming the classes if we can think of a better name. Some of 
the other stuff belongs in a different issue.

I think this is correct. I will post a patch soon, that leaves TrieUtils alive.



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719273#action_12719273
 ] 

Yonik Seeley commented on LUCENE-1673:
--

bq. But on the other hand SortField.INT is also strongly linked to the plain 
text encoding of these tokens. 

Right - I agree that's not good, and SortField.INT can be misleading.

bq. Because of this, if we stay with SortField.INT and so on, I would tend to 
make the according Parser/FieldCache a required arg of SortField, defaulting to 
the current parsers for the deprecated backwards-compatibility.

That makes sense.  I think it also makes sense (in addition) to keep the 
factory-like method like TrieUtils.getSortField() that instantiates the right 
SortField for the user based on the trie params given (like precisionStep and 
friends).



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718209#action_12718209
 ] 

Michael McCandless commented on LUCENE-1673:


bq. NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on.

Could we also do this for a term range?  Then, we could have a single 
RangeQuery that rewrites to the right impl based on what kind of range you are 
doing?

(And in fact it could fold in FieldCacheRangeFilter too).
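
A rough sketch of that idea, with everything in it being an assumption: the 
class, the factory names and the fieldCached hint are hypothetical and only model 
the dispatch, they are not existing Lucene API. One entry point records what 
kind of range was requested, and a rewrite-like step picks the implementation.

```java
// Hypothetical single entry point; none of these names are Lucene API.
final class RangeQuerySketch {
    enum Impl { TERM_RANGE, NUMERIC_RANGE, FIELD_CACHE_RANGE }

    final String field;
    final Comparable<?> lower; // null means an open lower bound
    final Comparable<?> upper; // null means an open upper bound
    final boolean numeric;

    private RangeQuerySketch(String field, Comparable<?> lower, Comparable<?> upper, boolean numeric) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
        this.numeric = numeric;
    }

    static RangeQuerySketch newTermRange(String field, String lower, String upper) {
        return new RangeQuerySketch(field, lower, upper, false);
    }

    static RangeQuerySketch newLongRange(String field, Long lower, Long upper) {
        return new RangeQuerySketch(field, lower, upper, true);
    }

    static RangeQuerySketch newFloatRange(String field, Float lower, Float upper) {
        return new RangeQuerySketch(field, lower, upper, true);
    }

    // Stand-in for a rewrite step: choose the implementation from the kind of range.
    // fieldCached is an assumed hint that the field's values are already cached.
    Impl rewrite(boolean fieldCached) {
        if (fieldCached) {
            return Impl.FIELD_CACHE_RANGE;
        }
        return numeric ? Impl.NUMERIC_RANGE : Impl.TERM_RANGE;
    }
}
```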




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717754#action_12717754
 ] 

Michael McCandless commented on LUCENE-1673:


{quote}
In Solr there are three different impls:

Trie (of course)
Text-only numbers (do not work with range queries, but can be used for sorting 
etc.)
A binary encoding (also used by LocalLucene at the moment), that is sortable. 
This can be used for RangeQueries, but sorting is slow (because they have no 
parser, and at the time it was implemented, SortField had no parser support)
{quote}

Ahh OK, this is just Solr's pre-existing numeric field support.  (I
had thought you meant Solr had a different impl for Trie).

bq. The problem, because of backwards compatibility they need to be preserved 
(possibility to read old indexes).

This is indeed quite a challenge.  Actually, is there anything in Trie
that encodes which version of the format is indexed in a given
segment?  (So that if we do ever change the indexed format, we can
bump a version somewhere to keep back compat).

bq. Maybe we use static factories instead of overloaded ctors. This way the 
names differ, but each one just creates the correct instance of the same class: 
NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on. 
Same for the TokenStreams (and the Field?)

That sounds like a good approach?

{quote}
 When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
 to your SortField

Or add new SortField types.

The problem with all this: For old indexes, we need some backwards 
compatibility. Ideally we would just create numeric fields in the new way and 
reuse e.g. SortField.INT for this. But this cannot be done. Or even, replace 
the FieldCache parsers by the trie ones. But this cannot be done at the moment.
{quote}

I wonder if we could handle this by adding a setting in FieldInfo?
I.e., to record that this numeric field was indexed as a trie.  Then,
when we need to get the parser for SortField.INT, we'd check the
FieldInfo to see which parser to use.  This could also handle
back-compat, i.e. if we change the trie format being written we'd change
the setting and segment merging would gradually upgrade previously
indexed fields.
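
To make the idea concrete, here is a small sketch. It is purely illustrative: 
FieldInfoSketch, the Encoding values and the IntParser names are all 
hypothetical, and nothing like this setting exists in FieldInfo today. A field 
with no recorded setting (an old index) falls back to the legacy text parser, 
and a versioned trie setting selects the parser for that format version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field metadata; not the real FieldInfo class.
final class FieldInfoSketch {
    // How a numeric field's terms were written; trie values carry a format version.
    enum Encoding { PLAIN_TEXT, TRIE_V1, TRIE_V2 }

    // Which parser a sort on SortField.INT would end up using.
    enum IntParser { LEGACY_TEXT, TRIE_V1, TRIE_V2 }

    private final Map<String, Encoding> encodings = new HashMap<>();

    // Called at indexing time when a numeric field is added.
    void recordEncoding(String field, Encoding encoding) {
        encodings.put(field, encoding);
    }

    // Fields written before the setting existed have no entry: treat them as plain text.
    IntParser parserForSortInt(String field) {
        Encoding encoding = encodings.getOrDefault(field, Encoding.PLAIN_TEXT);
        switch (encoding) {
            case TRIE_V1:
                return IntParser.TRIE_V1;
            case TRIE_V2:
                return IntParser.TRIE_V2;
            default:
                return IntParser.LEGACY_TEXT;
        }
    }
}
```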

{quote}
 I'd also like to rename RangeQuery to something else, with this
 change. EG TermRangeQuery... to emphasize that you use it for
 non-numbers. The javadocs of TermRangeQuery should point to
 Int/LongRangeQuery as strongly preferred for numeric ranges.

Cool. For the others, too (FieldCacheRangeQuery).
{quote}

Yes.



Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-09 Thread Jason Rutherglen
 I wonder if we could handle this by adding a setting in FieldInfo?

Do we have an issue open that allows any metadata on a per field basis?
This seems like something flexible indexing will require?


RE: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-09 Thread Uwe Schindler
No, we do not have such an issue, as far as I know. Storing some
version/field type info would be great. In this case we could maybe extend
TrieRange in the future to use a different encoding or e.g. CSF for the highest
precision (as Michael Busch suggested in Amsterdam).

Because TrieRange has been in contrib until now, I did not want to
modify the index internals and file formats for a contrib extension. But if
it moves to core, I could create a subclass of AbstractField for numeric
values. The type would be stored in FieldInfos, so it would be possible to
autodetect the SortField/FieldCache type and to recreate the AbstractField
subtype for stored fields (we may even encode the stored field contents using
the prefix encoding, which is good for floats/doubles because the
human-readable transformation from/to string may lose information).
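
To illustrate the last point, a small self-contained sketch. The class and 
method names are chosen for this example only, and this shows just the 
"sortable bits" step (a well-known bit trick), not the full prefix encoding. 
The binary form round-trips every double exactly and keeps numeric order, 
while a formatted string such as "%.6f" can drop information.

```java
import java.util.Locale;

final class SortableDoubleSketch {
    // Map a double to a long whose signed order matches the numeric order.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles: flip the 63 magnitude bits so larger magnitudes sort lower.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Exact inverse of doubleToSortableLong.
    static double sortableLongToDouble(long sortable) {
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    // A lossy human-readable round trip, for comparison only.
    static double viaFormattedString(double value) {
        return Double.parseDouble(String.format(Locale.ROOT, "%.6f", value));
    }
}
```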

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

  _  

From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Tuesday, June 09, 2009 8:48 PM
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core

 

 I wonder if we could handle this by adding a setting in FieldInfo?

Do we have an issue open that allows any metadata on a per field basis?
This seems like something flexible indexing will require?


[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715512#action_12715512
 ] 

Michael McCandless commented on LUCENE-1673:


bq. I want to move it to core before release of 2.9

+1!

bq. There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how 
should they be called in core?

I prefer to not use trie in the names (package and classes)... that
term very much describes what's under-the-hood in these classes (how
they are implemented), whereas I think [generally] names should
describe how the class is intended to be used.  So I prefer
Long[Numeric]RangeQuery over LongTrieRangeQuery.

I'd also like to rename RangeQuery to something else, with this
change.  EG TermRangeQuery... to emphasize that you use it for
non-numbers.  The javadocs of TermRangeQuery should point to
Int/LongRangeQuery as strongly preferred for numeric ranges.

bq. Maybe the pairs of classes for indexing and searching should be moved into 
one class

I think separate classes for int, long, float, double is better.

bq. TrieUtils move into o.a.l.util? or document or?

Maybe document?

bq. Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
o.a.l.analysis.tokenattributes?

That sounds good?

bq. If we rename the classes, should Solr stay with Trie (because there are 
different impls)?

Well, Solr should decide ;)

But: why are there different impls for Solr?

bq. Maybe add a subclass of AbstractField, that automatically creates these 
TokenStreams and omits norms/tf per default for easier addition to Document 
instances?

+1

For a numeric field where one will sort or do range filtering, Trie*
ought to be the default.  But, unfortunately, the steps needed to make
use of Trie* are numerous:

  * Add your field to your doc with the LongTrieTokenStream

  * When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
to your SortField

  * When you want to filter by range, instantiate
LongTrieRangeFilter.  You'll have to subclass QueryParser to do
this for the right fields.

  * When you want to display values, you must also pass the trie parser
 when populating the FieldCache
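
For readers who have not looked inside the token stream used in the first 
step, here is a simplified sketch of what indexing one value at several 
precisions means. It is an illustration under assumptions, not the actual 
LongTrieTokenStream code: the real encoding also makes the value sortable and 
packs the shift into each term, and the names below are invented for this 
example.

```java
import java.util.ArrayList;
import java.util.List;

final class TriePrecisionSketch {
    // One entry per indexed precision: {shift, value with its low 'shift' bits dropped}.
    static List<long[]> precisionLevels(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        List<long[]> levels = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            levels.add(new long[] { shift, value >>> shift });
        }
        return levels;
    }
}
```

A smaller precisionStep gives more terms per value (16 levels for a step of 4, 
8 levels for a step of 8), which is the trade-off the trie params control.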

Ideally, one would simply use, say, LongNumericField (subclass of
AbstractField) at indexing time, Lucene would remember this
in the index (this is obviously missing today), and then when you
sort, retrieve value, and create queries from QueryParser, all these
places would know that this is a trie field and simply do the right
thing, by default.

(Aside: I just noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available for TokenStream; the stream should be defined as
LongTrieTokenStream, I think?; same with IntTrieTokenStream)



[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715529#action_12715529
 ] 

Uwe Schindler commented on LUCENE-1673:
---

{quote}
(Aside: I just noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available for TokenStream; the stream should be defined as
LongTrieTokenStream, I think?; same with IntTrieTokenStream)
{quote}

I fixed this :-) Thanks!

{quote}
bq. If we rename the classes, should Solr stay with Trie (because there are 
different impls)?

Well, Solr should decide 

But: why are there different impls for Solr?
{quote}

I only mentioned this here to point out that Solr has already started to 
implement this. In Solr there are three different impls:
- Trie (of course)
- Text-only numbers (do not work with range queries, but can be used for 
sorting etc.)
- A binary encoding (also used by LocalLucene at the moment), that is sortable. 
This can be used for RangeQueries, but sorting is slow (because they have no 
parser, and at the time it was implemented, SortField had no parser support)

The problem: because of backwards compatibility, they all need to be preserved 
(so that old indexes can still be read).

bq. I think separate classes for int, long, float, double is better.

Two more... The problem: all these classes have exactly the same impl 
internally, which is code duplication and hard to maintain. Maybe we use static 
factories instead of overloaded ctors. This way the names differ, but each one 
just creates the correct instance of the same class: 
NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on. 
Same for the TokenStreams (and the Field?)
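
A sketch of how that could look, with the caveat that the class name, the 
parameter order and the matches() helper are assumptions made up for this 
example and not a committed API. There is one implementation class, one named 
factory per type, and boxed bounds so that null can express a half-open range.

```java
// Hypothetical single-class design with one named factory per numeric type.
final class NumericRangeSketch {
    enum Type { INT, LONG, FLOAT, DOUBLE }

    final Type type;
    final String field;
    final int precisionStep;
    final Number min; // null means no lower bound
    final Number max; // null means no upper bound

    private NumericRangeSketch(Type type, String field, int precisionStep, Number min, Number max) {
        this.type = type;
        this.field = field;
        this.precisionStep = precisionStep;
        this.min = min;
        this.max = max;
    }

    static NumericRangeSketch newIntRange(String field, int precisionStep, Integer min, Integer max) {
        return new NumericRangeSketch(Type.INT, field, precisionStep, min, max);
    }

    static NumericRangeSketch newLongRange(String field, int precisionStep, Long min, Long max) {
        return new NumericRangeSketch(Type.LONG, field, precisionStep, min, max);
    }

    static NumericRangeSketch newFloatRange(String field, int precisionStep, Float min, Float max) {
        return new NumericRangeSketch(Type.FLOAT, field, precisionStep, min, max);
    }

    static NumericRangeSketch newDoubleRange(String field, int precisionStep, Double min, Double max) {
        return new NumericRangeSketch(Type.DOUBLE, field, precisionStep, min, max);
    }

    // Inclusive range check, only here to show that the shared logic lives in one place.
    boolean matches(Number value) {
        if (type == Type.INT || type == Type.LONG) {
            long v = value.longValue();
            return (min == null || v >= min.longValue()) && (max == null || v <= max.longValue());
        }
        double v = value.doubleValue();
        return (min == null || v >= min.doubleValue()) && (max == null || v <= max.doubleValue());
    }
}
```

A side effect of the boxed parameters: a call like newLongRange("size", 8, 5, 
null) does not compile, because an int literal is not boxed to Long, so a 
forgotten L becomes a compile error instead of silently selecting an int range.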

{quote}
Ideally, one would simply use, say, LongNumericField (subclass of
AbstractField) at indexing time, Lucene would remember this
in the index (this is obviously missing today), and then when you
sort, retrieve value, and create queries from QueryParser, all these
places would know that this is a trie field and simply do the right
thing, by default.
{quote}

For that we need the type information in the index, and for that the new 
Field/Document classes. Hopefully Michael will get this working soon.

{quote}
When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
to your SortField 
{quote}

Or add new SortField types.

The problem with all this: for old indexes, we need some backwards 
compatibility. Ideally we would just create numeric fields in the new way and 
reuse e.g. SortField.INT for this, or even replace the FieldCache parsers with 
the trie ones. But neither can be done at the moment.

{quote}
I'd also like to rename RangeQuery to something else, with this
change. EG TermRangeQuery... to emphasize that you use it for
non-numbers. The javadocs of TermRangeQuery should point to
Int/LongRangeQuery as strongly preferred for numeric ranges.
{quote}

Cool. For the others, too (FieldCacheRangeQuery).

There is a lot more to decide; I will keep this issue open a little while to 
collect ideas before starting work!


[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-01 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715144#action_12715144
 ] 

Earwin Burrfoot commented on LUCENE-1673:
-

Sudden thought: leave it in contrib, and you won't be bound by any 
back-compat policy other than common sense. :)




[jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-01 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715219#action_12715219
 ] 

Paul Elschot commented on LUCENE-1673:
--

From a bit of a distance:

You could consider putting everything in o.a.l.trie.

I'd prefer to have explicit class names containing Long, Int etc, and also 
containing Trie.

I don't know the details of the tokenizing, but AbstractTrieField sounds just 
right.


