[ 
https://issues.apache.org/jira/browse/LUCENE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219098#comment-15219098
 ] 

Robert Muir commented on LUCENE-7096:
-------------------------------------

Ishan, honestly I don't think it really impacts Solr. The thing about points is 
that they can do a lot more than the old numerics. Not only do you expand to 128 
bits per value, but you can also have up to 8 dimensions.

So really you have to decide what makes sense based on what you are trying to 
do. If you just want to cover single-dimensional primitive types, SortedNumeric 
is an obvious choice (it does not "lose" frequency within a doc, and points do 
not "lose" it either). If it's a float or double, as opposed to an int or a 
long, you may want to handle it a little differently, e.g. use 
NumericUtils.sortableDoubleBits so that "sortedness" within a document has true 
meaning. This can make sort comparators based on min/max/median work in 
constant time. But if you go that way, I think e.g. the faceting/grouping/etc. 
code in Solr would need to be modified to support that.
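
A minimal standalone sketch of that idea, with no Lucene dependency: the 
sortableDoubleBits method below re-implements the same bit trick as the Lucene 
method of that name, while encodeDoc, min, max and median are hypothetical 
helpers that only stand in for how a per-document sorted value list could be 
read. Once each document's values are stored in sortable form and in sorted 
order, min, max and median are plain index lookups.

```java
import java.util.Arrays;

// Standalone illustration; class and helper names are assumptions, not Lucene API.
class SortableDoubleDemo {

    // Flip the lower 63 bits of negative values so that signed long order
    // matches numeric double order. Applying it twice restores the input.
    static long sortableDoubleBits(long bits) {
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    static long encode(double value) {
        return sortableDoubleBits(Double.doubleToLongBits(value));
    }

    static double decode(long encoded) {
        return Double.longBitsToDouble(sortableDoubleBits(encoded));
    }

    // Hypothetical stand-in for one document's values in a sorted numeric field:
    // encode every value, then sort the encoded longs.
    static long[] encodeDoc(double... values) {
        long[] encoded = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        Arrays.sort(encoded);
        return encoded;
    }

    // With sorted values, each selector is a single array access.
    static double min(long[] sorted) {
        return decode(sorted[0]);
    }

    static double max(long[] sorted) {
        return decode(sorted[sorted.length - 1]);
    }

    // Picks the upper middle element when the count is even.
    static double median(long[] sorted) {
        return decode(sorted[sorted.length >>> 1]);
    }

    public static void main(String[] args) {
        long[] doc = encodeDoc(3.5, -2.0, 0.25, -10.0, 7.0);
        System.out.println("min=" + min(doc) + " median=" + median(doc) + " max=" + max(doc));
    }
}
```

Sorting the raw doubleToLongBits values instead would place negative values in 
reverse order, which is why the encoding step matters here.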

On the other hand, for InetAddressPoint (128-bit IPv6), SortedSet is a much 
better choice. Prefix compression basically maps to "compress by network" and 
is not just important for IPv6, but also important for mapped IPv4 data or 
mixed IPv4/IPv6 data (Points/BKD tree has this compression too). Otherwise it's 
really using 128 bits of storage per value. Sure, you pay a cost for ordinals, 
but ordinals are only 32-bit and will speed up both sorting and faceting (to 
me: a variant of range faceting like "facet-by-network" would be the obvious 
use case there, so ordinals work for that too).
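
To make the ordinal argument concrete, here is a rough standalone sketch that 
does not use Lucene at all. It assumes IPv4 addresses are stored as 16-byte 
IPv4-mapped IPv6 values, builds a sorted dictionary of unique terms (the array 
index plays the role of the ordinal), and shows that a network maps to a 
contiguous ordinal range. All class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.TreeSet;

// Standalone illustration; not the Lucene implementation.
class IpOrdinalDemo {

    // Encode an IP literal as 16 bytes, mapping IPv4 into ::ffff:0:0/96.
    static byte[] encode(String literal) {
        byte[] raw;
        try {
            raw = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
        if (raw.length == 16) {
            return raw;
        }
        byte[] mapped = new byte[16];
        mapped[10] = (byte) 0xff;
        mapped[11] = (byte) 0xff;
        System.arraycopy(raw, 0, mapped, 12, 4);
        return mapped;
    }

    // Unsigned byte-wise comparison of two 16-byte terms.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < 16; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Sorted, de-duplicated terms; the array index acts as the ordinal.
    static byte[][] buildDictionary(String... ips) {
        TreeSet<byte[]> terms = new TreeSet<>(IpOrdinalDemo::compareUnsigned);
        for (String ip : ips) {
            terms.add(encode(ip));
        }
        return terms.toArray(new byte[0][]);
    }

    // First ordinal whose term is >= key (binary search).
    static int lowerBound(byte[][] dict, byte[] key) {
        int lo = 0;
        int hi = dict.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(dict[mid], key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Number of leading bytes two terms share, i.e. what prefix compression can elide.
    static int commonPrefix(byte[] a, byte[] b) {
        int i = 0;
        while (i < 16 && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[][] dict = buildDictionary(
            "10.0.0.7", "192.168.1.20", "10.0.0.1", "192.168.1.3", "2001:db8::1", "10.0.0.7");
        // Ordinal range [from, to) covering the network 192.168.1.0/24.
        int from = lowerBound(dict, encode("192.168.1.0"));
        int to = lowerBound(dict, encode("192.168.2.0"));
        System.out.println("unique terms: " + dict.length);
        System.out.println("192.168.1.0/24 ordinals: [" + from + ", " + to + ")");
        System.out.println("shared prefix bytes: " + commonPrefix(dict[from], dict[from + 1]));
    }
}
```

Counting the documents whose ordinals fall in that range is then a comparison 
of 32-bit ints rather than of 16-byte values, which is the kind of saving the 
paragraph above has in mind for "facet-by-network".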

With multidimensional data there is no clear answer. Currently LatLonPoint uses 
2 dimensions of 32 bits each for searching, and shoves them into a single 
64-bit SortedNumeric for sorting and two-phase iteration support. This works 
well because e.g. the typical hotspot in its sort comparator only works on the 
integer value most of the time anyway, and two-phase support is only needed for 
edge cases.
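
As a rough illustration of that packing (a standalone sketch, not the 
LatLonPoint code itself): two 32-bit quantized coordinates can share one 64-bit 
long, with latitude in the high half and longitude in the low half. The 
quantize and dequantize helpers are simplified assumptions and may differ from 
what Lucene actually does at the edges of the range.

```java
// Standalone illustration; names and quantization details are assumptions.
class LatLonPackDemo {

    private static final double TWO_POW_32 = 4294967296.0;

    // Map a value in [-range/2, range/2) onto the signed 32-bit integer space.
    static int quantize(double value, double range) {
        return (int) Math.floor(value / (range / TWO_POW_32));
    }

    // Inverse of quantize, up to one quantization step of error.
    static double dequantize(int bits, double range) {
        return bits * (range / TWO_POW_32);
    }

    // Latitude in the upper 32 bits, longitude in the lower 32 bits.
    static long pack(int latBits, int lonBits) {
        return (((long) latBits) << 32) | (lonBits & 0xFFFFFFFFL);
    }

    static int unpackLat(long packed) {
        return (int) (packed >> 32);
    }

    static int unpackLon(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        int latBits = quantize(48.8566, 180.0);
        int lonBits = quantize(-2.3522, 360.0);
        long packed = pack(latBits, lonBits);
        // A comparator can work on the unpacked ints and only convert to
        // doubles when it really needs the decoded coordinates.
        System.out.println("lat=" + dequantize(unpackLat(packed), 180.0)
            + " lon=" + dequantize(unpackLon(packed), 360.0));
    }
}
```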

For Geo3DPoint, who knows? I don't yet have a good understanding of how 
expensive its single-doc verification methods are (I think distance is cheaper 
than 2D, but polygon? dunno), how rare they are, or what would be the best way 
to represent them. Maybe it's still better to store it in 2D (SortedNumeric) 
and reuse the same sort comparator, if the distance metrics are compatible :) 
If two-phase support is not needed this may work. If it's only needed in very 
rare cases we could even convert 2D->3D on the fly, or optimize it so that 
conversion is very rare. But maybe this is too complex and a binary encoding 
would be better.

So I'm hesitant to add new types to UninvertingReader for this reason, 
especially when values are larger and so on. If you really think it's the right 
way to go anyway, feel free to pick up my branch 
(https://github.com/rmuir/lucene-solr/tree/fc2), but it only contains API 
changes, no actual uninverting. I'm not really against it for primitive 1D 
numeric types, I'd just rather work on other things, and I feel like it's not 
the best direction.

> UninvertingReader needs multi-valued points support
> ---------------------------------------------------
>
>                 Key: LUCENE-7096
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7096
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7096.patch
>
>
> It now supports the single-valued case (deprecating the legacy encoding), but 
> the multi-valued stuff does not yet have a replacement.
> Ideally we add a FC.getSortedNumeric(Parser..) that works from points. Unlike 
> postings, points never lose frequency within a field, so it's the best fit. 
> When getDocCount() == size(), the field is single-valued, so this should call 
> getNumeric and box that in SortedNumeric, similar to the String case.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
