Hi,

When SortedSetDocValuesReaderState is opened at search time, is the information from the whole doc values files (.dvd and .dvm) loaded into memory, or only the information for the specified field (say, the $facets field)?
Any help is much appreciated.

Regards,
Chitra

On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r...@gmail.com> wrote:

> Kindly post your suggestions.
>
> Regards,
> Chitra
>
> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r...@gmail.com> wrote:
>
>> Hey, I got it clearly. Thank you so much. Could you please help us to
>> implement it in our use case?
>>
>> In our case, we are having dynamic index and it is variable depth too. So
>> flat facet is enough. No need of hierarchical facets.
>>
>> What I think is,
>>
>> 1. Index my facet field as normal doc value field, so that no special
>>    operation (like taxonomy and sorted set doc values facet field) will be
>>    done at index time and only doc value field stores its ordinals in their
>>    respective field.
>> 2. At search time, I will pass query (user search query), filter
>>    (path traversed list) and collect the matching documents in
>>    Facetscollector.
>> 3. To compute facet count for the specific field, I will gather those
>>    resulted docs, then move through each segment for collecting the matching
>>    ordinals using AtomicReader.
>>
>> And know when I use this means, can't calculate facet count for more than
>> one field(facet) in a search.
>>
>> Instead of loading all the dimensions in DocValuesReaderState (will take
>> more time and memory) at search time, loading specific fields will take
>> less time and memory, hope so. Kindly help to solve.
>>
>> It will do it in a minimal index and search cost, I think. And hope this
>> won't put overload at index time, also at search time this will be better.
>>
>> Kindly post your suggestions.
>>
>> Regards,
>> Chitra
>>
>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> I think you've summed up exactly the differences!
>>>
>>> And, yes, it would be possible to emulate hierarchical facets on top
>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>>
>>> But if it's variable depth, it's trickier (but I think still
>>> possible). See e.g. the Committed Paths drill-down on the left, on
>>> our dog-food server
>>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r...@gmail.com> wrote:
>>> > case 1:
>>> >     In taxonomy, for each indexed document, examines facet label,
>>> >     computes their ordinals and mappings, and which will be stored in
>>> >     sidecar index at index time.
>>> >
>>> > case 2:
>>> >     In doc values, these(ordinals) are computed at search time, so there
>>> >     will be a time and memory trade-off between both cases, hope so.
>>> >
>>> > In taxonomy, building hierarchical facets at index time makes faceting cost
>>> > minimal at search time than flat facets in doc values.
>>> >
>>> > Except (memory,time and NRT latency) , Is any another contrast between
>>> > hierarchical and flat facets at search time?
>>> >
>>> > Kindly post your suggestions...
>>> >
>>> > Regards,
>>> > Chitra
>>> >
>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r...@gmail.com> wrote:
>>> >>
>>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>>> >> facets during indexing. Hope hierarchical in the sense, we might index the
>>> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
>>> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are maintained
>>> >> in sidecar index and it is mapped to the main index.
>>> >>
>>> >> For example:
>>> >>
>>> >> In search-lucene.com , I enter a term (say facet), top
>>> >> documents and their categories are displayed after performing the search.
>>> >> Say I drill down through Publish date/2010 to collect its child counts and
>>> >> after I will pass through publishdate/2010/10 to collect their child counts.
>>> >> And for each drill down, each search will be performed to collect its top
>>> >> docs and categories.
>>> >>
>>> >> Even I can achieve this in flat facets by changing the
>>> >> drill down query.
>>> >>
>>> >> Am I right or missed anything? yet I don't know if I missed anything...
>>> >>
>>> >> So What is the need of hierarchical facets? Could you please explain
>>> >> it(hierarchical facets) in the real-world use case?
>>> >>
>>> >> Regards,
>>> >> Chitra
>>> >>
>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>>> >> <luc...@mikemccandless.com> wrote:
>>> >>>
>>> >>> You store dimension + string (a single value path, since it's not
>>> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
>>> >>> ordinary drill down counts or the drill sideways counts.
>>> >>>
>>> >>> You can see examples of drill sideways at
>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>>> >>> fields on the left and you don't lose the previous facet counts for
>>> >>> that field.
>>> >>>
>>> >>> Mike McCandless
>>> >>>
>>> >>> http://blog.mikemccandless.com
>>> >>>
>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r...@gmail.com> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > Lucene-Drill sideways
>>> >>> >
>>> >>> > jira_issue:LUCENE-4748
>>> >>> >
>>> >>> > Is this the reason( ie Drill sideways makes
>>> >>> > a very nice faceted search UI because we
>>> >>> > don't "lose" the facet counts after drilling in) behind storing path and
>>> >>> > dimension for the given SSDVF field? Else anything?
>>> >>> >
>>> >>> > Regards,
>>> >>> > Chitra
>>> >>> >
>>> >>> >
>>> >>> > Hey, thank you so much for the fast response, I agree NRT refresh is
>>> >>> > somewhat costly operations and this is the major pitfall, suppose we use doc
>>> >>> > value faceting.
>>> >>> >
>>> >>> > While indexing SortedSetDocValuesFacetField , it stores
>>> >>> > path and dimension of the given field internally. So Can we achieve
>>> >>> > hierarchical facets using DrillDownQuery? Hope, purpose of storing path and
>>> >>> > dimension is to achieve hierarchical facets. If yes (ie we can achieve
>>> >>> > hierarchy in SSDVFF) , so what is the need to move over taxonomy?
>>> >>> > Else I missed anything?
>>> >>> >
>>> >>> > What is the real purpose to store path and dimension in
>>> >>> > SSDVF field?
>>> >>> >
>>> >>> > Kindly post your suggestions.
>>> >>> >
>>> >>> > Regards,
>>> >>> > Chitra
>>> >>> >
>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>>> >>> > <luc...@mikemccandless.com> wrote:
>>> >>> >>
>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r...@gmail.com>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >> > i)Hope, when opening SortedSetDocValuesReaderState , we are
>>> >>> >> > calculating ordinals( this will be used to calculate facet count ) for doc
>>> >>> >> > values field and this only made the state instance somewhat costly.
>>> >>> >> > Am I right or any other reason behind that?
>>> >>> >>
>>> >>> >> That's correct. It adds some latency to an NRT refresh, and some heap
>>> >>> >> used to hold the ordinal mappings.
>>> >>> >>
>>> >>> >> > ii) During indexing, we are providing facet ordinals in each doc
>>> >>> >> > and I think it will be useful in search side, to calculate facet counts
>>> >>> >> > only for matching docs.
>>> >>> >> > otherwise, it carries any other benefits?
>>> >>> >>
>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>>> >>> >> separate index.
>>> >>> >>
>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
>>> >>> >> facets yet (though this could be fixed if someone just built it).
>>> >>> >>
>>> >>> >> > iii) Is SortedSetDocValuesReaderState thread-safe (ie) multiple
>>> >>> >> > threads can call this method concurrently?
>>> >>> >>
>>> >>> >> Yes.
>>> >>> >>
>>> >>> >> Mike McCandless
>>> >>> >>
>>> >>> >> http://blog.mikemccandless.com
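
For anyone reading this thread later: the idea discussed above, emulating a fixed-depth hierarchy (year/month/day) on top of flat facet labels, can be sketched without any Lucene classes. This is only an illustration under stated assumptions, not how Lucene's taxonomy or SSDV facets are implemented. The class name FlatHierarchyFacets and the methods expand and childCounts are hypothetical, each document is assumed to carry a single "/"-separated path, and plain in-memory lists stand in for the index and the matching docs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: flat labels that emulate a hierarchy.
final class FlatHierarchyFacets {

    // Expand one path into every prefix, so "2010/10/15" becomes
    // "2010", "2010/10" and "2010/10/15". Each prefix would be indexed
    // as its own flat facet label.
    static List<String> expand(String path) {
        List<String> labels = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : path.split("/")) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            labels.add(prefix.toString());
        }
        return labels;
    }

    // Count the labels exactly one level below `parent` across the paths of
    // the matching documents. An empty parent means the top level.
    static Map<String, Integer> childCounts(List<String> matchingDocPaths, String parent) {
        int childDepth = parent.isEmpty() ? 1 : parent.split("/").length + 1;
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : matchingDocPaths) {
            for (String label : expand(path)) {
                boolean underParent = parent.isEmpty() || label.startsWith(parent + "/");
                if (underParent && label.split("/").length == childDepth) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Calling childCounts with an empty parent gives the per-year counts, and calling it again with the chosen year as parent gives the per-month counts below it, which mirrors running a new search for each drill-down step as described in the thread. A variable-depth hierarchy would need more care, as noted above.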