[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-08-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Attachment: SOLR-11023.patch

Final patch, including some precommit, test, and doc fixes.

100 beasting iterations of EnumFieldTest all passed, as did precommit and all 
Solr tests.

Committing shortly.

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Steve Rowe
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Fix For: 7.0
>
> Attachments: SOLR-11023.patch, SOLR-11023.patch, SOLR-11023.patch, 
> SOLR-11023.patch, SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-08-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Attachment: SOLR-11023.patch

Hopefully final patch, including a CHANGES entry and ref guide updates.

I'll commit once precommit, a 100-iteration beasting of EnumFieldTest, and all 
Solr tests pass.

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Steve Rowe
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Fix For: 7.0
>
> Attachments: SOLR-11023.patch, SOLR-11023.patch, SOLR-11023.patch, 
> SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-08-03 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Attachment: SOLR-11023.patch

More complete WIP patch, almost done.

The tests are mostly fleshed out; there is one remaining nocommit, to add a 
test of set queries over.

Also, the docs need fixing up to mention the new type and the fact that the old 
type is deprecated.

I created SOLR-11193 for problems with multi-valued EnumField; I don't plan on 
blocking 7.0 for fixes there.  I've put assumes in EnumFieldTest to avoid 
testing the problematic functionality with EnumField.

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Steve Rowe
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Fix For: 7.0
>
> Attachments: SOLR-11023.patch, SOLR-11023.patch, SOLR-11023.patch, 
> SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-08-03 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Attachment: SOLR-11023.patch

WIP patch, hopefully code complete, more testing needed (described by Hoss'ss 
nocommits in EnumFieldTest).

Changes from Hoss'ss patch: Most nocommits are resolved.  I renamed 
HuperDuperEnumField to EnumFieldType. I decided not to support uninversion in 
EnumFieldType (I couldn't figure out how to make it work), and also to require 
docValues, since I think it would be a trap to allow a situation where sorting 
isn't supported over a field that claims to support arbitrary sorting over the 
enum values.  I've added a test to throw an exception when docvalues aren't 
enabled for an EnumFieldType field.

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Steve Rowe
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Fix For: 7.0
>
> Attachments: SOLR-11023.patch, SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-07-25 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Fix Version/s: 7.0

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Fix For: 7.0
>
> Attachments: SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-07-25 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-11023:
--
Priority: Blocker  (was: Major)

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Blocker
>  Labels: numeric-tries-to-points
> Attachments: SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-07-24 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-11023:

Attachment: SOLR-11023.patch


bq. I think points are not a good fit in that case, they will use more disk and 
be slower at exact queries, ...

Right, that's kind of what i was suspecting based on what i've read about 
points in the past -- really low cardinality like this didn't seem appropriate.

bq. I'd really store it like a string field ...

The hitch there is that, historically, when using "solr.EnumField" it's been 
possible to change the (String) "values" in the enumConfig.xml w/o breaking 
anything because only the (int) ordinals for those values were ever indexed (or 
put in docValues) ... and ideally we still want to do that.

So I think we really still want to use ints for the store/index/docvalues and 
deal with all the String/label conversion only in response writing and query 
parsing.   Obviously we'll still want to use NumericUtils.intToSortableBytes 
for the indexed bytes so that our range queries are still "numeric" and not 
"alphabetical"



I'm about to go off-line for 2 weeks, so here's a patch w/ my current work in 
progress...

* SOLR-5927 related test config beef up
* beef up existing test code a bit so that our enum has more then 10 values (to 
sanity check against potential "1 < 11 < 2" range query mistakes)
** add several nocommit comments about test coverage that seems to be missing
* refactors the EnumConfig parsing up into a new AbstractEnumField super class 
of EnumField
** clones EnumField into a new "HuperDuperEnumField" sibling class
** modify HuperDuperEnumField to use SortedNumericDV (and equiv query) instead 
of SortedSetDV
* update test configs and SolrTestCaseJ4 to randomize HuperDuperEnumField when 
randomizing in "point" fields (i guess really it's just "not trie fields" at 
this point)

obviously the big item still TODO is removing the existing LegacyIntField code 
from HuperDuperEnumField and picking a decent name.

If anyone wants to pick up & continue this patch while i'm gone, by all means 
please do.



> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Hoss Man
>  Labels: numeric-tries-to-points
> Attachments: SOLR-11023.patch, SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11023) Need SortedNumerics/Points version of EnumField

2017-07-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-11023:

Attachment: SOLR-11023.patch

step#0: refactor the init logic and xml parsing into an EnumMapping class.

I was initially thinking we could follow in the steps of CurrencyField and 
inject a new superclass above EnumField (which would default to points) and the 
(legacy) EnumField could subclass it and override as needed to use 
LegacyNumeric support where needed ... but the more i look at it the more of a 
pain in the ass that appears to be.

the problem being I suspect we'll really want the "new" field type to extend 
PointField (or maybe even IntPointField?) but we definitely don't want the 
legacy EnumField to transitively extend PointField.

Alternative that's occuring to me now: make the new field type work a *lot* 
like CurrencyField, where it wraps a concrete (int) fieldtype and delegates to 
for create/sort/etc -- only handling the int/string converstion before/after.  
then the legacy type just overrides what fieldtype that hidden inner field uses?

frankly... EnumField has enough weird little quirks about it, that since we 
_know_ we want to deprecate it and remove it in 8, I'm tempted to leave it 
alone as much as possible and take the "i feel dirty" hit of cut/pasting all 
the bits we do really need to keep into a new class that has nothing in common 
with it other then that they are both FieldTypes.

> Need SortedNumerics/Points version of EnumField
> ---
>
> Key: SOLR-11023
> URL: https://issues.apache.org/jira/browse/SOLR-11023
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Hoss Man
>  Labels: numeric-tries-to-points
> Attachments: SOLR-11023.patch
>
>
> although it's not a subclass of TrieField, EnumField does use 
> "LegacyIntField" to index the int value associated with each of the enum 
> values, in addition to using SortedSetDocValuesField when {{docValues="true" 
> multivalued="true"}}.
> I have no idea if Points would be better/worse then Terms for low cardinality 
> usecases like EnumField, but either way we should think about a new variant 
> of EnumField that doesn't depend on 
> LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses 
> SortedNumericDocValues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org