[jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache

Robert Muir (JIRA) Tue, 12 Feb 2013 04:49:16 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576593#comment-13576593 ]


Robert Muir commented on LUCENE-4769:
-------------------------------------

{quote}
Until then, we have no other choice but to piggy-back on DV API, and that means 
extending DVFormat.
{quote}

Well mainly I'm trying to make sure we only have the minimum DocValues types 
and APIs we actually need. Additional types are very costly to us.

I'm still unsure myself that lucene should have a byte[] docvalues type that is 
unsorted: I don't see any real use cases for it directly.

But for someone who wants to encode their own data structures, having a 
per-document byte[] type where your codec can see all the values is pretty 
powerful. So if having this "catch-all" type prevents additional types from 
being added to lucene, then maybe it's worth it.
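
To illustrate what "encode their own data structures" into a per-document byte[] could look like, here is a minimal sketch that packs one document's sorted facet ordinals into a byte[] using delta plus variable-length encoding. The class and method names (OrdinalCodec, encode, decode) are hypothetical and the encoding is an assumption chosen for illustration; this is not the facet module's or the DocValues API's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only: not part of Lucene.
public class OrdinalCodec {

    // Packs ascending ordinals as deltas, 7 bits per byte, high bit = "more bytes follow".
    public static byte[] encode(int[] sortedOrdinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrdinals) {
            int delta = ord - prev; // small when ordinals are close together
            prev = ord;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    // Reverses encode(): reads each variable-length delta and re-accumulates the ordinals.
    public static int[] decode(byte[] bytes) {
        int[] buf = new int[bytes.length]; // each ordinal takes at least one byte
        int count = 0, prev = 0, pos = 0;
        while (pos < bytes.length) {
            int delta = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            buf[count++] = prev;
        }
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 300, 70000};
        byte[] packed = encode(ords);
        System.out.println(packed.length + " bytes -> " + Arrays.toString(decode(packed)));
    }
}
```

The resulting byte[] is opaque to the index: only the code that wrote it knows how to read it back, which is the sense in which an unsorted binary type acts as a "catch-all".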

{quote}
Perhaps separately we can think about an IndexReader impl for facets, which 
will open the road to many different optimizations, e.g. maintaining a 
per-segment taxonomy and top-level reader global-ordinal map (all in-memory), 
encoding facet ordinals in their own structure (and not DV) and maybe even 
managing the global taxonomy as part of the search index (through sidecar files 
or something), w/o the sidecar index, which I think today is a barrier for apps 
as well as integrating that into Solr or ES. But that should be done separately 
as it's a major refactoring to how facets work.
{quote}

I think a custom IndexReader impl would prevent barriers for integration with 
those systems too, just in a different way. Personally I think the current 
design (sidecar) is the most performant. But we should consider adding other 
possibilities to lucene that make different tradeoffs, e.g. work without it. 

{quote}
First, I'm not hell-bent on anything (don't even know what that means). Second, 
facets are now a *lucene* module, and not private to me. From my perspective, 
lucene doesn't need to have anything for me, but lucene should have the best 
facets module. So far I've been busy refactoring facets so they work faster and 
have cleaner API ... not to me, to lucene users. I'm sure things can be 
simplified even further and improved even more. I think about it constantly. If 
you have a better idea of how facets should work (while maintaining current 
capabilities, as much as possible), I'm all open to suggestions, really.
{quote}

I know, you are doing a great job. I'm just explaining my opinion on this 
situation: having facets "build on top of" BinaryDocValues doesn't hurt it in 
the slightest. Sometimes I wonder if you are having this argument with me to 
avoid a single type cast in the facets codebase or for some other cosmetic 
reason :)

                
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4769
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch, LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a 
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a 
> FacetsAggregator. I think we should offer users the means to use such a 
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed 
> 2x more RAM than if the DocValues were loaded into memory in their raw form. 
> Also, a PackedInts version of such a cache took almost the same amount of RAM 
> as a straight int[], but the gains were minor.
> I will post the patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

