Thank you so much, Mike. I learned a lot about doc values faceting, and all my doubts are cleared up. Thanks!
*Another use case:* After getting the matching documents for a given query, is there any way to calculate the min and max values of a NumericDocValuesField (say, a date field)? I would like to use them for numeric range faceting, by splitting the numeric values found in the matching documents into ranges.

Chitra

On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Doc values fields are never loaded into memory; at most some small
> index structures are.
>
> When you use those fields, the bytes (for just the one doc values
> field you are using) are pulled from disk, and the OS will cache them
> in memory if available.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <chithu.r...@gmail.com> wrote:
> > Hi,
> > When opening SortedSetDocValuesReaderState at search time, whether
> > the whole doc value files (.dvd & .dvm) information are loaded in memory or
> > specified field information(say $facets field) alone load in memory?
> >
> > Any help is much appreciated.
> >
> > Regards,
> > Chitra
> >
> > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r...@gmail.com> wrote:
> >>
> >> Kindly post your suggestions.
> >>
> >> Regards,
> >> Chitra
> >>
> >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r...@gmail.com> wrote:
> >>>
> >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> >>> implement it in our use case?
> >>>
> >>> In our case, we are having dynamic index and it is variable depth too. So
> >>> flat facet is enough. No need of hierarchical facets.
> >>>
> >>> What I think is,
> >>>
> >>> Index my facet field as normal doc value field, so that no special
> >>> operation (like taxonomy and sorted set doc values facet field) will be done
> >>> at index time and only doc value field stores its ordinals in their
> >>> respective field.
> >>> At search time, I will pass query (user search query) , filter (path
> >>> traversed list) and collect the matching documents in Facetscollector.
> >>> To compute facet count for the specific field, I will gather those
> >>> resulted docs, then move through each segment for collecting the matching
> >>> ordinals using AtomicReader.
> >>>
> >>> And know when I use this means, can't calculate facet count for more than
> >>> one field(facet) in a search.
> >>>
> >>> Instead of loading all the dimensions in DocValuesReaderState (will take
> >>> more time and memory) at search time, loading specific fields will take less
> >>> time and memory, hope so. Kindly help to solve.
> >>>
> >>> It will do it in a minimal index and search cost, I think. And hope this
> >>> won't put overload at index time, also at search time this will be better.
> >>>
> >>> Kindly post your suggestions.
> >>>
> >>> Regards,
> >>> Chitra
> >>>
> >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> >>> <luc...@mikemccandless.com> wrote:
> >>>>
> >>>> I think you've summed up exactly the differences!
> >>>>
> >>>> And, yes, it would be possible to emulate hierarchical facets on top
> >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> >>>>
> >>>> But if it's variable depth, it's trickier (but I think still
> >>>> possible). See e.g.
> >>>> the Committed Paths drill-down on the left, on
> >>>> our dog-food server
> >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com
> >>>>
> >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r...@gmail.com> wrote:
> >>>> > case 1:
> >>>> > In taxonomy, for each indexed document, examines facet label ,
> >>>> > computes their ordinals and mappings, and which will be stored in sidecar
> >>>> > index at index time.
> >>>> >
> >>>> > case 2:
> >>>> > In doc values, these(ordinals) are computed at search time, so there
> >>>> > will be a time and memory trade-off between both cases, hope so.
> >>>> >
> >>>> > In taxonomy, building hierarchical facets at index time makes faceting cost
> >>>> > minimal at search time than flat facets in doc values.
> >>>> >
> >>>> > Except (memory,time and NRT latency) , Is any another contrast between
> >>>> > hierarchical and flat facets at search time?
> >>>> >
> >>>> > Kindly post your suggestions...
> >>>> >
> >>>> > Regards,
> >>>> > Chitra
> >>>> >
> >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r...@gmail.com> wrote:
> >>>> >>
> >>>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
> >>>> >> facets during indexing. Hope hierarchical in the sense, we might index the
> >>>> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
> >>>> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are maintained
> >>>> >> in sidecar index and it is mapped to the main index.
> >>>> >>
> >>>> >> For example:
> >>>> >>
> >>>> >> In search-lucene.com , I enter a term (say facet), top
> >>>> >> documents and their categories are displayed after performing the
> >>>> >> search.
> >>>> >> Say I drill down through Publish date/2010 to collect its child counts and
> >>>> >> after I will pass through publishdate/2010/10 to collect their child counts.
> >>>> >> And for each drill down, each search will be performed to collect its top
> >>>> >> docs and categories.
> >>>> >>
> >>>> >> Even I can achieve this in flat facets by changing the
> >>>> >> drill down query.
> >>>> >>
> >>>> >> Am I right or missed anything? yet I don't know if I missed anything...
> >>>> >>
> >>>> >> So What is the need of hierarchical facets? Could you please explain
> >>>> >> it(hierarchical facets) in the real-world use case?
> >>>> >>
> >>>> >> Regards,
> >>>> >> Chitra
> >>>> >>
> >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> >>>> >> <luc...@mikemccandless.com> wrote:
> >>>> >>>
> >>>> >>> You store dimension + string (a single value path, since it's not
> >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
> >>>> >>> ordinary drill down counts or the drill sideways counts.
> >>>> >>>
> >>>> >>> You can see examples of drill sideways at
> >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
> >>>> >>> fields on the left and you don't lose the previous facet counts for
> >>>> >>> that field.
> >>>> >>>
> >>>> >>> Mike McCandless
> >>>> >>>
> >>>> >>> http://blog.mikemccandless.com
> >>>> >>>
> >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r...@gmail.com> wrote:
> >>>> >>> > Hi,
> >>>> >>> >
> >>>> >>> > Lucene-Drill sideways
> >>>> >>> >
> >>>> >>> > jira_issue:LUCENE-4748
> >>>> >>> >
> >>>> >>> > Is this the reason( ie Drill sideways makes
> >>>> >>> > a very nice faceted search UI because we
> >>>> >>> > don't "lose" the facet counts after drilling in) behind storing path and
> >>>> >>> > dimension for the given SSDVF field? Else anything?
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> > Hey, thank you so much for the fast response, I agree NRT refresh is
> >>>> >>> > somewhat costly operations and this is the major pitfall, suppose we use doc
> >>>> >>> > value faceting.
> >>>> >>> >
> >>>> >>> > While indexing SortedSetDocValuesFacetField , it stores
> >>>> >>> > path and dimension of the given field internally. So Can we achieve
> >>>> >>> > hierarchical facets using DrillDownQuery? Hope, purpose of storing path and
> >>>> >>> > dimension is to achieve hierarchical facets. If yes (ie we can achieve
> >>>> >>> > hierarchy in SSDVFF) , so what is the need to move over taxonomy?
> >>>> >>> > Else I missed anything?
> >>>> >>> >
> >>>> >>> > What is the real purpose to store path and dimension in
> >>>> >>> > SSDVF field?
> >>>> >>> >
> >>>> >>> > Kindly post your suggestions.
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> >>>> >>> > <luc...@mikemccandless.com> wrote:
> >>>> >>> >>
> >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r...@gmail.com> wrote:
> >>>> >>> >>
> >>>> >>> >> > i)Hope, when opening SortedSetDocValuesReaderState , we are
> >>>> >>> >> > calculating ordinals( this will be used to calculate facet count ) for doc
> >>>> >>> >> > values field and this only made the state instance somewhat costly.
> >>>> >>> >> > Am I right or any other reason behind that?
> >>>> >>> >>
> >>>> >>> >> That's correct. It adds some latency to an NRT refresh, and some heap
> >>>> >>> >> used to hold the ordinal mappings.
> >>>> >>> >>
> >>>> >>> >> > ii) During indexing, we are providing facet ordinals in each doc
> >>>> >>> >> > and I think it will be useful in search side, to calculate facet counts
> >>>> >>> >> > only for matching docs. otherwise, it carries any other benefits?
> >>>> >>> >>
> >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
> >>>> >>> >> separate index.
> >>>> >>> >>
> >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> >>>> >>> >>
> >>>> >>> >> > iii) Is SortedSetDocValuesReaderState thread-safe (ie) multiple
> >>>> >>> >> > threads can call this method concurrently?
> >>>> >>> >>
> >>>> >>> >> Yes.
> >>>> >>> >>
> >>>> >>> >> Mike McCandless
> >>>> >>> >>
> >>>> >>> >> http://blog.mikemccandless.com
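
P.S. To make the min/max use case at the top of this message more concrete, here is a rough sketch in plain Java (no Lucene calls). It assumes the numeric values of the matching documents have already been read from the doc values field into a long[]. The class and method names (NumericRangeBuckets, minMax, countPerRange) are made up for illustration, and equal-width ranges are only one possible way to split the values.

```java
import java.util.Arrays;

// Illustrative only: hypothetical helper, not part of Lucene.
final class NumericRangeBuckets {

    /** Returns {min, max} of the given values; the array must be non-empty. */
    static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new long[] {min, max};
    }

    /**
     * Splits [min, max] into numRanges equal-width ranges and counts the
     * values falling into each one. Assumes max - min does not overflow a long.
     */
    static int[] countPerRange(long[] values, int numRanges) {
        if (numRanges <= 0) {
            throw new IllegalArgumentException("numRanges must be positive");
        }
        long[] mm = minMax(values);
        long min = mm[0];
        long max = mm[1];
        // floor(span / numRanges) + 1 keeps the max value inside the last range.
        long width = (max - min) / numRanges + 1;
        int[] counts = new int[numRanges];
        for (long v : values) {
            int bucket = (int) ((v - min) / width);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for values collected from the matching documents.
        long[] values = {30, 10, 100, 20, 40};
        System.out.println(Arrays.toString(minMax(values)));
        System.out.println(Arrays.toString(countPerRange(values, 3)));
    }
}
```

Once the range boundaries are known, the counting step could perhaps be handed to the range faceting classes in Lucene's facet module (for example LongRangeFacetCounts) instead of the loop above; I have not tried that against our index.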