[ https://issues.apache.org/jira/browse/LUCENE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219098#comment-15219098 ]
Robert Muir commented on LUCENE-7096:
-------------------------------------
Ishan, honestly I don't think it really impacts Solr. The thing about points is
that they can do a lot more than the old numerics: not only do you expand to up
to 128 bits per value, but you can have up to 8 dimensions.
So really you have to decide what makes sense based on what you are trying to
do. If you just want to cover single-dimensional primitive types, SortedNumeric
is an obvious choice: it does not "lose" frequency within a document, and
neither do points. If it's a float or double rather than an int or a long, you
may want to handle it a little differently, e.g. use
NumericUtils.sortableDoubleBits so that "sortedness" within a document has true
meaning. This can make sort comparators based on min/max/median work in
constant time. But if you go that way, I think faceting/grouping/etc. code in
Solr would need to be modified to support it.
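To illustrate the sortableDoubleBits point: the snippet below re-implements, in plain Java so it runs without Lucene on the classpath, the bit trick we believe NumericUtils.sortableDoubleBits applies (flip every non-sign bit of a negative double so that signed long order matches double order). Once a document's values are stored sorted in that encoding, min, max and median are plain array lookups. The class and method names are hypothetical; this is a sketch, not Lucene's code.

```java
import java.util.Arrays;

public class SortableDoubles {
  // Flip every bit except the sign bit when the value is negative, so that
  // comparing the resulting longs as signed integers matches double ordering.
  // Assumption: this mirrors NumericUtils.sortableDoubleBits.
  static long sortableDoubleBits(long bits) {
    return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
  }

  static long encode(double value) {
    return sortableDoubleBits(Double.doubleToLongBits(value));
  }

  // The sign bit is never flipped, so the transform is its own inverse.
  static double decode(long sortable) {
    return Double.longBitsToDouble(sortableDoubleBits(sortable));
  }

  // Hypothetical per-document value list, encoded and sorted at index time.
  static long[] encodeDoc(double... values) {
    long[] encoded = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      encoded[i] = encode(values[i]);
    }
    Arrays.sort(encoded); // signed long order == double order
    return encoded;
  }

  // With sorted values these are O(1) lookups instead of a scan per document.
  static double min(long[] doc) { return decode(doc[0]); }
  static double max(long[] doc) { return decode(doc[doc.length - 1]); }
  static double median(long[] doc) { return decode(doc[doc.length / 2]); } // upper median for even counts

  public static void main(String[] args) {
    long[] doc = encodeDoc(3.5, -2.0, 10.25, -7.75, 0.0);
    System.out.println(min(doc) + " " + median(doc) + " " + max(doc));
  }
}
```

Running main prints `-7.75 0.0 10.25`. The cost of this design is the one mentioned above: any consumer reading the raw longs (faceting, grouping) has to know about the encoding.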
On the other hand, for InetAddressPoint (128-bit IPv6), SortedSet is a much
better choice. Prefix compression basically maps to "compress by network", and
that matters not just for IPv6 but also for mapped IPv4 data or mixed IPv4/IPv6
data (the points/BKD tree has this compression too). Otherwise it's really
using 128 bits of storage per value. Sure, you pay a cost for ordinals, but
ordinals are only 32 bits and will speed up both sorting and faceting (to me, a
variant of range faceting like "facet-by-network" would be the obvious use case
there, and ordinals work for that too).
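A small sketch of why ordinals and prefix compression suit this data, in plain Java with no Lucene dependency. The encode step turns IPv4 into an IPv4-mapped IPv6 address (::ffff:a.b.c.d), which we assume matches how InetAddressPoint treats IPv4; the class and method names are hypothetical. Terms are sorted in unsigned byte order, duplicates collapse, and a term's ordinal is its position in that order; sharedPrefix counts the leading bytes consecutive terms have in common, i.e. what a prefix-compressed dictionary would not store again.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class NetworkOrdinals {
  // Encode an IP literal as 16 bytes; IPv4 becomes IPv4-mapped IPv6 (::ffff:a.b.c.d).
  static byte[] encode(String ipLiteral) {
    byte[] raw;
    try {
      raw = InetAddress.getByName(ipLiteral).getAddress(); // a literal needs no DNS lookup
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException(ipLiteral, e);
    }
    if (raw.length == 16) {
      return raw;
    }
    byte[] mapped = new byte[16];
    mapped[10] = (byte) 0xff;
    mapped[11] = (byte) 0xff;
    System.arraycopy(raw, 0, mapped, 12, 4);
    return mapped;
  }

  // Number of leading bytes two terms share.
  static int sharedPrefix(byte[] a, byte[] b) {
    int i = 0;
    while (i < a.length && i < b.length && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  // Unique terms in unsigned byte order; a term's ordinal is its index in this list.
  static List<byte[]> sortedTerms(String... ips) {
    TreeSet<byte[]> set = new TreeSet<byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    for (String ip : ips) {
      set.add(encode(ip));
    }
    return new ArrayList<>(set);
  }

  public static void main(String[] args) {
    List<byte[]> terms =
        sortedTerms("10.0.0.7", "10.0.0.3", "10.0.1.9", "2001:db8::1", "10.0.0.3");
    for (int ord = 0; ord < terms.size(); ord++) {
      int shared = ord == 0 ? 0 : sharedPrefix(terms.get(ord - 1), terms.get(ord));
      System.out.println("ord=" + ord + " sharedPrefixBytes=" + shared);
    }
  }
}
```

With the sample input in main, the mapped IPv4 addresses share 15 or 14 leading bytes with their neighbour, while the IPv6 address shares none. Because ordinals follow byte order, all addresses in one network fall in a contiguous ordinal range, which is what a "facet-by-network" variant of range faceting could exploit.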
With multidimensional data there is no clear answer. Currently LatLonPoint uses
2 dimensions of 32 bits each for searching, and shoves them into a single
64-bit SortedNumeric value for sorting and two-phase iteration support. This
works well because, e.g., the typical hotspot in its sort comparator only works
on the integer value most of the time anyway, and two-phase support is only
needed for edge cases.
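The packing of two 32-bit dimensions into one 64-bit SortedNumeric value can be sketched as below. The quantization in encodeLat/encodeLon is a hypothetical stand-in (2^32 steps over each coordinate's range, clamped at the top edge) and may not match LatLonPoint's exact encoding; pack and unpack are plain bit operations.

```java
public class LatLonPacking {
  // Hypothetical quantization to 32-bit ints; Lucene's actual encoding may
  // differ in its exact scaling and edge handling.
  static int encodeLat(double lat) {
    return quantize(lat, 180.0);
  }

  static int encodeLon(double lon) {
    return quantize(lon, 360.0);
  }

  private static int quantize(double degrees, double range) {
    long q = (long) Math.floor(degrees / range * 4294967296.0); // 2^32 steps over the range
    // Clamp so that exactly +90 / +180 still fits in an int.
    return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, q));
  }

  // Latitude goes in the high 32 bits, longitude in the low 32 bits.
  static long pack(int lat, int lon) {
    return ((long) lat << 32) | (lon & 0xFFFFFFFFL);
  }

  static int unpackLat(long packed) {
    return (int) (packed >>> 32);
  }

  static int unpackLon(long packed) {
    return (int) packed; // keeps only the low 32 bits
  }

  public static void main(String[] args) {
    long packed = pack(encodeLat(40.7), encodeLon(-74.0));
    System.out.println(unpackLat(packed) + " " + unpackLon(packed));
  }
}
```

Decoding is one shift and two casts, so a comparator that needs the integer coordinates can recover them from the single long without any extra storage.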
For Geo3DPoint, who knows? I don't yet have a good understanding of how
expensive its single-doc verification methods are (I think distance is cheaper
than 2D, but polygon? dunno), how rare they are, or what would be the best way
to represent them yet. Maybe it's still better to store it in 2D
(SortedNumeric) and reuse that field's sort comparator, if the distance metrics
are compatible :) If two-phase support is not needed, this may work. If it's
only needed in very rare cases, we could even convert 2D->3D on the fly, or
optimize it so that conversion is very rare. But maybe this is too complex and
a binary encoding would be better.
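If Geo3D verification ever ran off a 2D SortedNumeric value, the on-the-fly 2D->3D step would roughly be a latitude/longitude to (x, y, z) conversion. The sketch below assumes a perfect unit sphere, which is a simplification: Geo3D has its own planet model, so this only illustrates the kind of per-value cost involved (a few trig calls), not Geo3D's actual math. The class name is hypothetical.

```java
public class LatLonToUnitSphere {
  // Assumes a perfect unit sphere; Geo3D's planet model is not reproduced here.
  static double[] toXYZ(double latDegrees, double lonDegrees) {
    double lat = Math.toRadians(latDegrees);
    double lon = Math.toRadians(lonDegrees);
    double cosLat = Math.cos(lat);
    return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
  }

  public static void main(String[] args) {
    double[] p = toXYZ(40.7, -74.0);
    System.out.printf("x=%.4f y=%.4f z=%.4f%n", p[0], p[1], p[2]);
  }
}
```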
So I'm hesitant to add new types to UninvertingReader for this reason,
especially when values are larger and so on. If you really think it's the right
way to go anyway, feel free to pick up my branch
(https://github.com/rmuir/lucene-solr/tree/fc2), but it only contains API
changes, no actual uninverting. I'm not really against it for primitive 1D
numeric types; I'd just rather work on other things, and I feel like it's not
the best direction.
> UninvertingReader needs multi-valued points support
> ---------------------------------------------------
>
> Key: LUCENE-7096
> URL: https://issues.apache.org/jira/browse/LUCENE-7096
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-7096.patch
>
>
> It now supports the single-valued case (deprecating the legacy encoding), but
> the multi-valued stuff does not yet have a replacement.
> Ideally we add a FC.getSortedNumeric(Parser..) that works from points. Unlike
> postings, points never lose frequency within a field, so it's the best fit.
> When getDocCount() == size(), the field is single-valued, so this should call
> getNumeric and box that in SortedNumeric, similar to the String case.
>
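The single-valued shortcut described in the issue can be sketched in plain Java. SortedNumericView, build and isSingleValued are hypothetical stand-ins, not Lucene's FieldCache or PointValues API. The idea is only that, because points keep per-document frequency, docCount == size means every document with a value has exactly one, so a flat per-document array can be boxed into the multi-valued view instead of building the general structure.

```java
import java.util.Arrays;

public class SingleValuedCheck {
  // Hypothetical stand-in for a multi-valued (SortedNumeric-like) view of one field.
  interface SortedNumericView {
    long[] values(int doc); // sorted values for the document, empty if none
  }

  // docCount: documents with at least one point; size: total points in the field.
  // Points keep duplicates, so a doc indexing the same value twice adds 2 to size;
  // size == docCount therefore means every document with a value has exactly one.
  static boolean isSingleValued(int docCount, long size) {
    return docCount == size;
  }

  static SortedNumericView build(long[][] perDoc) {
    int docCount = 0;
    long size = 0;
    for (long[] vals : perDoc) {
      if (vals.length > 0) {
        docCount++;
      }
      size += vals.length;
    }
    if (isSingleValued(docCount, size)) {
      // Single-valued path: a flat array plus a presence mask, boxed as multi-valued.
      long[] single = new long[perDoc.length];
      boolean[] has = new boolean[perDoc.length];
      for (int d = 0; d < perDoc.length; d++) {
        if (perDoc[d].length == 1) {
          single[d] = perDoc[d][0];
          has[d] = true;
        }
      }
      return doc -> has[doc] ? new long[] {single[doc]} : new long[0];
    }
    // Multi-valued path: keep every value (duplicates included), sorted per document.
    long[][] sorted = new long[perDoc.length][];
    for (int d = 0; d < perDoc.length; d++) {
      sorted[d] = perDoc[d].clone();
      Arrays.sort(sorted[d]);
    }
    return doc -> sorted[doc].clone();
  }

  public static void main(String[] args) {
    SortedNumericView v = build(new long[][] {{7}, {}, {3, 3, 1}});
    System.out.println(Arrays.toString(v.values(2)));
  }
}
```

Running main prints `[1, 3, 3]`: the duplicate 3 survives, which is exactly the frequency information that postings-based uninverting would lose.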
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)