Re: Numeric Ranges Faceting

2017-02-20 Thread Chitra R
Hey, I got it. Thank you so much.

On Sat, Feb 18, 2017 at 5:33 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You'll need to make your own buildFacetsResult method in your
> DrillSideways subclass, and inside there you compute the facet counts
> for each dim using the implementation that dim used (taxonomy, sorted
> set, or range).  TestRangeFacetCounts shows another example of
> this.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
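
Editor's sketch: the advice above (compute each dimension's counts with the implementation that dimension uses) can be modelled without Lucene. The code below is plain Java, not Lucene API; `DrillSidewaysModel`, `Doc` and the method names are hypothetical and exist only for illustration. It assumes one string dimension ("module_id") and one numeric dimension ("price"), as in this thread. The sideways counts for a dimension are computed over the documents that match every other drill-down constraint, so the two dimensions need different counting code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of drill-sideways counting; none of these names are Lucene API.
class DrillSidewaysModel {

    // Hypothetical document with one string dimension and one numeric dimension.
    static final class Doc {
        final String moduleId;
        final double price;

        Doc(String moduleId, double price) {
            this.moduleId = moduleId;
            this.price = price;
        }
    }

    // Sideways counts for "module_id": apply only the price constraint,
    // then count each module_id value (the job of a string facet counter).
    static Map<String, Integer> moduleIdCounts(List<Doc> docs, double minPrice, double maxPrice) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Doc d : docs) {
            if (d.price >= minPrice && d.price <= maxPrice) {
                counts.merge(d.moduleId, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sideways counts for "price": apply only the module_id constraint,
    // then count documents per range (the job of a range facet counter).
    // Each range is {min, max}, both inclusive.
    static int[] priceRangeCounts(List<Doc> docs, String moduleId, double[][] ranges) {
        int[] counts = new int[ranges.length];
        for (Doc d : docs) {
            if (!d.moduleId.equals(moduleId)) {
                continue;
            }
            for (int i = 0; i < ranges.length; i++) {
                if (d.price >= ranges[i][0] && d.price <= ranges[i][1]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Small fixed data set for trying the two methods.
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("1", 150.0));
        docs.add(new Doc("1", 700.0));
        docs.add(new Doc("2", 120.0));
        docs.add(new Doc("2", 450.0));
        docs.add(new Doc("3", 90.0));
        return docs;
    }
}
```

In a real DrillSideways subclass the same split would presumably happen inside buildFacetsResult, using the drill-sideways collector of each dimension instead of the filtered lists used here.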

Re: Numeric Ranges Faceting

2017-02-18 Thread Chitra R
Hi,

Range faceting works fine when I add numeric ranges to the
DrillDownQuery, so that is not my issue.

My question is:

I need to compute string facets (via SortedSetDocValuesFacetCounts) using a
drill sideways search while numeric ranges are added to the DrillDownQuery.
Only this case throws an exception. Is it possible?


I know that a drill sideways search retains the previous level's facets and
their counts, and that it does so by taking the dimensions from the
DrillDownQuery. In my case, "price" is a dimension which was indexed as a
NumericDocValuesField and added to the DrillDownQuery. I think that is why
an exception is thrown when I search through DrillSideways. Am I right, or
have I missed anything?


Kindly help me to solve my issue.

Regards,
Chitra

On Sat, Feb 18, 2017 at 4:29 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi,
>
> I think you are close!  All you need to do is make a subclass of
> DrillSideways and override the buildFacetsResult method to do the
> range faceting on your numeric dims.
>
> I just pushed an improvement to RangeFacetsExample.java showing how to
> do this: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/1e4463e3
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Searcher Performance

2017-02-17 Thread Chitra R
Thanks a lot, Adrien.

On Fri, Feb 17, 2017 at 10:07 PM, Adrien Grand  wrote:

> Some minimal information about the fields is loaded into memory when you
> open the index reader. Things like the list of fields and how they are
> indexed.
>
> However the vast majority of the data is read from disk lazily, we do not
> warm the filesystem cache or anything like that by default. We do not use
> direct I/O either. So say you run a term query, only pages that contain
> information about these particular field and value will be loaded into the
> cache.
>
> In case you want to warm the filesystem cache explicitly, which could be a
> good idea if you have plenty of filesystem cache for your index (ie. the
> unused memory of the system is larger than the index), you can look into
> using MMapDirectory.setPreload.
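
Editor's sketch: the difference between lazy loading and explicit warming can be seen with the JDK alone. The code below does not use Lucene. It assumes that preloading a memory-mapped file comes down to the JDK call `MappedByteBuffer.load()`, which is a best-effort request to the operating system. `MmapWarmup` and its methods are hypothetical names, and the temporary file stands in for an index file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// JDK-only sketch of lazy versus preloaded memory mapping; names are illustrative.
class MmapWarmup {

    // Maps the whole file read-only. With preload=false, pages are faulted in
    // lazily on first access. With preload=true, load() asks the OS to bring
    // the mapped pages into memory up front (best effort, no guarantee).
    static MappedByteBuffer map(Path file, boolean preload) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                buf.load();
            }
            return buf; // the mapping stays valid after the channel is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small temporary file so the example needs no existing index.
    static Path writeSample(int size) {
        try {
            Path file = Files.createTempFile("mmap-warmup", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) (i % 127);
            }
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both mappings return the same bytes; the only difference is when the disk reads happen. Warming only pays off when the index fits in the free memory of the machine, as noted above.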


Re: Numeric Ranges Faceting

2017-02-17 Thread Chitra R
Hey,
I have indexed "author","module_id" fields as
SortedSetDocValuesFacetField and "time", "price","salary" fields as
NumericDocValuesField.

My Category looks like:

*module_id
 -> author
*price

module_id and price are parent categories. After selecting any one of the
facets from module_id, sub-category ie "author" field will be shown.

*Use-case:*

1. I have received path values from user as "module_id:1" and "price:100 TO
500" and also need to perform drillsideways search.

*initializing drilldown query*

DrillDownQuery drillDownQuery = new DrillDownQuery(facetsConfig, userGivenSearchQuery);
drillDownQuery.add("module_id", "1");
drillDownQuery.add("price", NumericRangeQuery.newDoubleRange("price",
    100.0, 200.0, range.minInclusive, range.maxInclusive));

*hits and facets computation*

DrillSideways sideways = new DrillSideways(searcher, facetsConfig, docValuesReaderState);
DrillSideways.DrillSidewaysResult drillResult =
    sideways.search(drillDownQuery, booleanFilter, null, 10, sort, doDocScore, doMaxScore);
// shows the accurate total number of hits
int totalHits = drillResult.hits.totalHits();
// this line throws an exception
List<FacetResult> facetResult = drillResult.facets.getAllDims(10);

*Exception*

java.lang.IllegalArgumentException: dimension "price" was not indexed
    at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts.getTopChildren(SortedSetDocValuesFacetCounts.java:91)
    at org.apache.lucene.facet.MultiFacets.getAllDims(MultiFacets.java:74)

Did I do anything wrong?


Kindly post your suggestions.

Thanks,
Chitra



On Fri, Feb 17, 2017 at 9:11 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi, how are you instantiating your MultiFacets?  You should be passing
> e.g. a LongRangeFacetCounts instance for your "time" dimension, which
> should prevent that exception.
>
> For DrillSideways, I think you must subclass, and then override
> buildFacetsResult to compute your range facets, because that class
> assumes it's either indexed facets or sorted set doc values facets.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
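
Editor's sketch: the exception in this thread follows from how lookups are routed per dimension. Lucene's MultiFacets is constructed from a map of dimension to Facets plus a default Facets. The plain-Java model below mimics only that routing; `FacetDispatchModel`, `Counter` and `indexedOnly` are hypothetical names, not Lucene classes. A dimension without a dedicated counter falls through to the default, which only knows dimensions that were indexed as facets.

```java
import java.util.Map;

// Plain-Java model of per-dimension dispatch; these types are not Lucene classes.
class FacetDispatchModel {

    // Hypothetical stand-in for a facet-counting implementation.
    interface Counter {
        int topCount(String dim);
    }

    // Stand-in for a sorted-set counter: it only knows dimensions indexed as facets
    // and rejects everything else, like the exception quoted in this thread.
    static Counter indexedOnly(Map<String, Integer> indexed) {
        return dim -> {
            Integer c = indexed.get(dim);
            if (c == null) {
                throw new IllegalArgumentException("dimension \"" + dim + "\" was not indexed");
            }
            return c;
        };
    }

    // Uses the dedicated counter for the dimension if one is registered,
    // otherwise falls back to the default counter.
    static int count(String dim, Map<String, Counter> perDim, Counter fallback) {
        Counter c = perDim.get(dim);
        return (c != null ? c : fallback).topCount(dim);
    }
}
```

Registering a range counter for "time" (or "price") in the per-dimension map is what keeps the lookup away from the sorted-set default.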


Re: Numeric Ranges Faceting

2017-02-17 Thread Chitra R
Any suggestions? Kindly help me to move forward.

Regards,
Chitra



Re: Searcher Performance

2017-02-17 Thread Chitra R
Hey, thank you so much. I got it.

I have

   - 10 lakh docs, 30 fields in my index
   - opening new searcher at initial search and
   - there will be no filesystem cache for my current index

At initial search, I search across only one field out of 30 fields in my
index.

My question is,

*At the initial search, will only the required pages (OS pages of the Lucene
index files) for that single field be loaded into the filesystem cache, or
will the info for all fields be loaded from disk into the filesystem cache?*


Regards,
Chitra

On Fri, Feb 17, 2017 at 7:05 PM, Adrien Grand  wrote:

> Regarding whether the filesystem cache helps, you could look at whether
> there is some disk activity while your queries are running.
>
> When everything is in the filesystem cache, the latency of search requests
> for simple queries (term queries and combinations through boolean queries)
> usually mostly depends on the total number of matches since Lucene needs to
> call the collector on every match.
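
Editor's sketch: the point that latency tracks the number of matches can be shown with a toy model. The code below is plain Java, not Lucene API; `CollectorCostModel`, `CountingCollector` and `postings` are hypothetical names. It assumes a term query is a walk over the term's postings list with one collector call per matching document.

```java
// Plain-Java model of "the collector is called once per match"; not Lucene API.
class CollectorCostModel {

    // Hypothetical collector that only counts how often it is invoked.
    static final class CountingCollector {
        long calls;

        void collect(int docId) {
            calls++;
        }
    }

    // Walks a postings list (the doc ids containing the term) and hands
    // every match to the collector. Returns the number of collector calls.
    static long search(int[] postings, CountingCollector collector) {
        for (int docId : postings) {
            collector.collect(docId);
        }
        return collector.calls;
    }

    // Builds a postings list with the requested number of matching doc ids,
    // spaced `stride` apart.
    static int[] postings(int matches, int stride) {
        int[] ids = new int[matches];
        for (int i = 0; i < matches; i++) {
            ids[i] = i * stride;
        }
        return ids;
    }
}
```

Two different terms that each match 100,000 of 1,000,000 documents cause the same number of collector calls, which fits the similar timings reported for the three queries once the data is in the filesystem cache.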


Searcher Performance

2017-02-17 Thread Chitra R
Hi,
 While working with Searcher.Search, I have noticed a difference in
their performance. I have 10 lakh documents and 30 fields in my index. I
have performed three searches using different queries in a sequential
manner. At search time, I used MMapDirectory and index is opened.

*case1: *

   - During the first search, I ran the Query Say (new TermQuery(new
   Term("name","Chitra"))) and which yields 1 lakh documents as result. Time
   taken for first search = 50 - 60 ms nearly.
   - And for the second search, I ran the Query Say (new TermQuery(new
   Term("animal","lion"))) which also yields 1 lakh documents as result.  Time
   taken for Second search = 50 - 60 ms nearly.
   - And for the third search,  I ran the Query Say (new TermQuery(new
   Term("bird","peacock"))) which also yields 1 lakh documents as result.
   Time taken for Third search = 50 - 60 ms nearly.

In this case, why does searcher.search take the same search time for
different queries?

*case2:*

Suppose I run the same query twice: the second search takes less time than
the first one because of the OS cache.

*Based on the above observations,*

During the initial search, only the required portion of the index files is
loaded into the OS cache. For the next search, if the required portion is
not present in the OS cache,

will it take time to read those files from disk? If so, is this the reason
why searcher.search takes nearly the same time for different
queries?


Regards,
Chitra


Re: Numeric Ranges Faceting

2017-02-15 Thread Chitra R
Hi,
  Thanks for the suggestion. But in the case of drill sideways
search, retrieving all dimensions (using Facets.getAllDims()) threw the
exception shown below.

1. While opening the DocValuesReaderState, global ordinals and the ordinal
range map are computed for the '$facets' field only.
2. NumericDocValuesField is never indexed under '$facets', so the ordinal
range map will be null for the numeric field, i.e. 'time'.

java.lang.IllegalArgumentException: dimension "time" was not indexed
    at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts.getTopChildren(SortedSetDocValuesFacetCounts.java:91)
    at org.apache.lucene.facet.MultiFacets.getAllDims(MultiFacets.java:74)

In my use case,

   - Both string path-traversed values and numeric path-traversed ranges will occur.
   - Both faceted search and drill sideways search will be used.

So how can I add path-traversed numeric ranges?

Did I miss anything?


Kindly post your suggestions.


Regards,
Chitra

On Wed, Feb 15, 2017 at 3:28 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi, have a look at the RangeFacetsExample.java under the lucene/demo
> module... it shows how to do this.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
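
Editor's sketch: range faceting over a numeric field, and the way a previously selected range is applied on the next search, can be modelled in plain Java. This is not the code of RangeFacetsExample.java and not Lucene API; `TimeRangeModel` and `Range` are hypothetical names. It assumes inclusive bounds and overlapping ranges such as "past hour" and "past day". In Lucene the selected range would be added to the DrillDownQuery as a range query for that dimension, as done elsewhere in this thread.

```java
import java.util.Arrays;

// Plain-Java model of range faceting over a numeric field; not Lucene API.
class TimeRangeModel {

    // Hypothetical labelled range with inclusive bounds.
    static final class Range {
        final String label;
        final long min;
        final long max;

        Range(String label, long min, long max) {
            this.label = label;
            this.min = min;
            this.max = max;
        }

        boolean accept(long value) {
            return value >= min && value <= max;
        }
    }

    // Counts, for each range, how many of the matching documents' values fall
    // inside it. Ranges may overlap, so one value can be counted several times.
    static int[] counts(long[] values, Range[] ranges) {
        int[] counts = new int[ranges.length];
        for (long v : values) {
            for (int i = 0; i < ranges.length; i++) {
                if (ranges[i].accept(v)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Applies a previously selected ("path-traversed") range as a filter and
    // returns the values of the documents that survive it.
    static long[] drillDown(long[] values, Range selected) {
        return Arrays.stream(values).filter(selected::accept).toArray();
    }
}
```

Counting again on the output of `drillDown` gives the next-level counts after the user has picked a range.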


Numeric Ranges Faceting

2017-02-14 Thread Chitra R
Hi, 
   We have planned to implement both string and numeric faceting using
doc values fields.

For string faceting, we add the path-traversed dimensions to the DrillDownQuery.
But for numeric faceting, how and where can we add the path-traversed ranges
during the next-level faceted search?
Which is the better way to add path-traversed ranges
(i.e. adding them in a NumericRangeQuery or
adding them in a filter)? Or is there any other solution?

Thanks & Regards,
Chitra


Sent from my iPhone
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



DocValues statistics helper

2017-02-09 Thread Chitra R
Hi,
  Great to see that DocValuesStatsCollector is available in Lucene 6.4. It
helps to compute statistics on a doc values field. I hope it has been
implemented using GlobalOrdinalsCollector.

My doubt is:

For statistics, are global ordinals computed only for the query-matching
ordinals of a specific field, which map per-segment ordinals to/from the
global ordinal space?

Thanks,
Chitra


Re: Maintain SortedSetDocValuesReaderState in Cache

2017-02-07 Thread Chitra R
Hi,
  How can I improve the performance of SortedSetDocValuesReaderState
which is responsible for computing '$facets' global ordinals at search time?

Is there any better way to warm up this state instead of putting it in
cache?

Any help is much appreciated.

Thanks & Regards,
Chitra




Maintain SortedSetDocValuesReaderState in Cache

2017-02-02 Thread Chitra R
Hi,

  We are going to implement faceted search (for both string & numeric)
based on DocValuesField. For better performance, we decided to keep the
SortedSetDocValuesReaderState in a cache, i.e.

* During the first faceted search, a new DocValuesReaderState is opened, the
"$facets" values are loaded from the index, and we put that state in a
cache backed by a ConcurrentHashMap.

* For the next faceted search, instead of opening a new DocValuesReaderState,
we serve the DocValuesReaderState from the ConcurrentHashMap.

We keep the DocValuesReaderState in the ConcurrentHashMap for 5
minutes. After 5 minutes, the old DocValuesReaderState is removed from
the ConcurrentHashMap.

For every search call, the DocValuesReaderState is taken from the cache,
which is populated like this:

SortedSetDocValuesReaderState oldReaderState = new
DefaultSortedSetDocValuesReaderState(oldIndexSearcher.getIndexReader());

readerStateCacheMap.put(key, oldReaderState);

The DocValuesReaderState is kept in readerStateCacheMap for 5 minutes.

TopDocs are computed using the new IndexSearcher:

FacetsCollector.search(newIndexSearcher, drillDownQuery, boolFilter,
noofdocs, facetsCollector);

After computing TopDocs, facets are computed using the old
SortedSetDocValuesReaderState:

SortedSetDocValuesFacetCounts facetCount = new
SortedSetDocValuesFacetCounts(oldReaderState, facetsCollector);


1. Is there any chance of conflicts during the topDocs search or the
facet computation (because the DocValuesReaderState comes from the old
IndexSearcher while the topDocs are computed with the new IndexSearcher)?
2. How can we maintain the SortedSetDocValuesReaderState in a cache?


Kindly post your suggestions.

Thanks & Regards,
Chitra
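
Editor's sketch: a time-limited cache like the one described above can be built on ConcurrentHashMap alone. The code below is generic plain Java, not Lucene API; `ExpiringStateCache` and its methods are hypothetical names. It assumes the key is the reader the state was built from, so a state is only ever paired with hits from the same reader. As far as I know, the sorted-set facet counting expects the state and the searched reader to match, so mixing an old state with a new searcher looks risky. The injectable clock is only there to make expiry easy to test.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Generic time-limited cache. K would be the reader a state was built from,
// V the (expensive) state itself. Not Lucene API.
class ExpiringStateCache<K, V> {

    // Cached value plus the time it was created.
    private static final class Entry<V> {
        final V value;
        final long createdMillis;

        Entry(V value, long createdMillis) {
            this.value = value;
            this.createdMillis = createdMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    ExpiringStateCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value for the key, rebuilding it with the loader
    // when it is absent or at least ttlMillis old.
    V get(K key, Function<K, V> loader) {
        final long now = clock.getAsLong();
        Entry<V> e = map.compute(key, (k, old) -> {
            if (old == null || now - old.createdMillis >= ttlMillis) {
                return new Entry<V>(loader.apply(k), now);
            }
            return old;
        });
        return e.value;
    }

    // Drops the entry for a key that is no longer in use, e.g. a closed reader.
    void evict(K key) {
        map.remove(key);
    }

    int size() {
        return map.size();
    }
}
```

With `System::currentTimeMillis` as the clock and a TTL of 300000 ms this gives the five-minute behaviour described above. Expired entries are only replaced on access, so entries for readers that are never queried again need an explicit `evict`.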


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-30 Thread Chitra R
Thank you so much, Shai...

Chitra

On Wed, Nov 30, 2016 at 2:17 PM, Shai Erera  wrote:

> This feature is not available in Lucene currently, but it shouldn't be hard
> to add it. See Mike's comment here:
> http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144
>
> One more tricky (yet nicer) feature would be to have it all in one go, i.e.
> you'd say something like "facet on field price" and you'd get "interesting"
> buckets, per the variance in the results.
>
> But before that, we could have a StatsFacets in Lucene which provide some
> statistics about a numeric field (min/max/avg etc.).
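
Editor's sketch: the min/max idea discussed in this thread (find the extremes over the matching documents, then split them into ranges) can be written in plain Java. This is not a Lucene feature or API; `NumericStatsModel` and its methods are hypothetical names. It assumes the values of the matching documents are already available as a long array, that ranges are inclusive, and that the values are small enough for `max - min + 1` not to overflow.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of min/max statistics and equal-width bucketing; not Lucene API.
class NumericStatsModel {

    // Returns {min, max} of the values; the array must not be empty.
    static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            if (v < min) {
                min = v;
            }
            if (v > max) {
                max = v;
            }
        }
        return new long[] {min, max};
    }

    // Splits [min, max] into at most `buckets` inclusive ranges of equal width.
    // Fewer ranges are returned when the span is smaller than the bucket count.
    static List<long[]> equalWidthRanges(long min, long max, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        long span = max - min + 1;
        long width = (span + buckets - 1) / buckets; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long lo = min; lo <= max; lo += width) {
            ranges.add(new long[] {lo, Math.min(max, lo + width - 1)});
        }
        return ranges;
    }
}
```

The resulting ranges could then be fed to whatever range-counting code is in use. Equal-width buckets are the simplest choice; buckets that follow the distribution of the values would need more statistics than min and max.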
>
> On Wed, Nov 30, 2016 at 7:50 AM Chitra R  wrote:
>
> > Thank you so much, Mike. I have learned a lot about doc values
> > faceting, and all my doubts are cleared. Thanks!
> >
> >
> > *Another use case:*
> >
> > After getting the matching documents for the given query, is there any way to
> > calculate min and max values on a NumericDocValuesField (say a date field)?
> >
> >
> > I would like to use them in numeric range faceting by splitting the
> > numeric values (taken from the resulting documents) into ranges.
> >
> >
> > Chitra
> >
> >
> > On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> > > Doc values fields are never loaded into memory; at most some small
> > > index structures are.
> > >
> > > When you use those fields, the bytes (for just the one doc values
> > > field you are using) are pulled from disk, and the OS will cache them
> > > in memory if available.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > >
> > > On Mon, Nov 28, 2016 at 6:01 AM, Chitra R wrote:
> > > > Hi,
> > > >  When opening SortedSetDocValuesReaderState at search time, are
> > > > the whole doc values files (.dvd & .dvm) loaded into memory, or is
> > > > only the specified field's information (say the $facets field) loaded?
> > > >
> > > >
> > > >
> > > >
> > > > Any help is much appreciated.
> > > >
> > > >
> > > > Regards,
> > > > Chitra
> > > >
> > > > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R 
> > wrote:
> > > >>
> > > >>
> > > >> Kindly post your suggestions.
> > > >>
> > > >> Regards,
> > > >> Chitra
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R 
> > > wrote:
> > > >>>
> > > >>> Hey, I got it clearly. Thank you so much. Could you please help us
> to
> > > >>> implement it in our use case?
> > > >>>
> > > >>>
> > > >>> In our case, we are having dynamic index and it is variable depth
> > too.
> > > So
> > > >>> flat facet is enough.No need of hierarchical facets.
> > > >>>
> > > >>> What I think is,
> > > >>>
> > > >>> Index my facet field as normal doc value field, so that no special
> > > >>> operation (like taxonomy and sorted set doc values facet field)
> will
> > > be done
> > > >>> at index time and only doc value field stores its ordinals in their
> > > >>> respective field.
> > > >>> At search time, I will pass query (user search query) , filter
> (path
> > > >>> traversed list)  and collect the matching documents in
> > Facetscollector.
> > > >>> To compute facet count for the specific field, I will gather those
> > > >>>

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-29 Thread Chitra R
Thank you so much, Mike. I have learned a lot about doc values faceting,
and all my doubts are cleared up. Thanks!


*Another use case:*

After getting the matching documents for a given query, is there any way to
calculate the min and max values of a NumericDocValuesField (say, a date field)?


I would like to use them in numeric range faceting, by splitting the
numeric values (taken from the result documents) into ranges.


Chitra


On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Doc values fields are never loaded into memory; at most some small
> index structures are.
>
> When you use those fields, the bytes (for just the one doc values
> field you are using) are pulled from disk, and the OS will cache them
> in memory if available.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 28, 2016 at 6:01 AM, Chitra R  wrote:
> > Hi,
> >  When opening SortedSetDocValuesReaderState at search time,
> whether
> > the whole doc value files (.dvd & .dvm) information are loaded in memory
> or
> > specified field information(say $facets field) alone load in memory?
> >
> >
> >
> >
> > Any help is much appreciated.
> >
> >
> > Regards,
> > Chitra
> >
> > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R  wrote:
> >>
> >>
> >> Kindly post your suggestions.
> >>
> >> Regards,
> >> Chitra
> >>
> >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R 
> wrote:
> >>>
> >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> >>> implement it in our use case?
> >>>
> >>>
> >>> In our case, we are having dynamic index and it is variable depth too.
> So
> >>> flat facet is enough.No need of hierarchical facets.
> >>>
> >>> What I think is,
> >>>
> >>> Index my facet field as normal doc value field, so that no special
> >>> operation (like taxonomy and sorted set doc values facet field) will
> be done
> >>> at index time and only doc value field stores its ordinals in their
> >>> respective field.
> >>> At search time, I will pass query (user search query) , filter (path
> >>> traversed list)  and collect the matching documents in Facetscollector.
> >>> To compute facet count for the specific field, I will gather those
> >>> resulted docs, then move through each segment for collecting the
> matching
> >>> ordinals using AtomicReader.
> >>>
> >>>
> >>> And know when I use this means, can't calculate facet count for more
> than
> >>> one field(facet) in a search.
> >>>
> >>> Instead of loading all the dimensions in DocValuesReaderState (will
> take
> >>> more time and memory) at search time, loading specific fields will
> take less
> >>> time and memory, hope so. Kindly help to solve.
> >>>
> >>>
> >>> It will do it in a minimal index and search cost, I think. And hope
> this
> >>> won't put overload at index time, also at search time this will be
> better.
> >>>
> >>>
> >>> Kindly post your suggestions.
> >>>
> >>>
> >>> Regards,
> >>> Chitra
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> >>>  wrote:
> >>>>
> >>>> I think you've summed up exactly the differences!
> >>>>
> >>>> And, yes, it would be possible to emulate hierarchical facets on top
> >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> >>>>
> >>>> But if it's variable depth, it's trickier (but I think still
> >>>> possible).  See e.g. the Committed Paths drill-down on the left, on
> >>>> our dog-food server
> >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com

DocValues Field - Memory consumption

2016-11-28 Thread Chitra R
Hi,

 I would like to enable doc values on all the fields that I need to
sort or aggregate on. At search time I sort on a single field. In that
case, is the content of the whole doc values files (.dvd and .dvm) loaded
into memory, or only the data for the particular field used for sorting?



Any help is much appreciated.


Regards,
Chitra


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-28 Thread Chitra R
Hi,
 When opening a SortedSetDocValuesReaderState at search time, is the
content of the whole doc values files (.dvd and .dvm) loaded into memory, or
only the data for the specified field (say, the $facets field)?




Any help is much appreciated.


Regards,
Chitra

On Tue, Nov 22, 2016 at 5:47 PM, Chitra R  wrote:

>
> Kindly post your suggestions.
>
> Regards,
> Chitra
>
> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R  wrote:
>
>> Hey, I got it clearly. Thank you so much. Could you please help us to
>> implement it in our use case?
>>
>>
>> In our case, we are having dynamic index and it is variable depth too. So
>> flat facet is enough.No need of hierarchical facets.
>>
>> What I think is,
>>
>>
>>    1. Index my facet field as normal doc value field, so that no special
>>    operation (like taxonomy and sorted set doc values facet field) will be
>>    done at index time and only doc value field stores its ordinals in their
>>    respective field.
>>    2. At search time, I will pass query (user search query) , filter
>>    (path traversed list)  and collect the matching documents in
>>    Facetscollector.
>>
>>    3. To compute facet count for the specific field, I will gather those
>>    resulted docs, then move through each segment for collecting the matching
>>    ordinals using AtomicReader.
>>
>>
>> And know when I use this means, can't calculate facet count for more than
>> one field(facet) in a search.
>>
>> Instead of loading all the dimensions in DocValuesReaderState (will take
>> more time and memory) at search time, loading specific fields will take
>> less time and memory, hope so. Kindly help to solve.
>>
>>
>> It will do it in a minimal index and search cost, I think. And hope this
>> won't put overload at index time, also at search time this will be better.
>>
>>
>> Kindly post your suggestions.
>>
>>
>> Regards,
>> Chitra
>>
>>
>>
>>
>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> I think you've summed up exactly the differences!
>>>
>>> And, yes, it would be possible to emulate hierarchical facets on top
>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>>
>>> But if it's variable depth, it's trickier (but I think still
>>> possible).  See e.g. the Committed Paths drill-down on the left, on
>>> our dog-food server
>>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R  wrote:
>>> > case 1:
>>> > In taxonomy, for each indexed document, examines facet label ,
>>> > computes their ordinals and mappings, and which will be stored in
>>> sidecar
>>> > index at index time.
>>> >
>>> > case 2:
>>> > In doc values, these(ordinals) are computed at search time, so
>>> there
>>> > will be a time and memory trade-off between both cases, hope so.
>>> >
>>> >
>>> > In taxonomy, building hierarchical facets at index time makes faceting
>>> cost
>>> > minimal at search time than flat facets in doc values.
>>> >
>>> > Except (memory,time and NRT latency) , Is any another contrast between
>>> > hierarchical and flat facets at search time?
>>> >
>>> >
>>> > Kindly post your suggestions...
>>> >
>>> >
>>> > Regards,
>>> > Chitra
>>> >
>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R 
>>> wrote:
>>> >>
>>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>>> >> facets during indexing. Hope hierarchical in the sense, we might
>>> index the
>>> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
>>> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are
>>> maintained
>>> >> in sidecar index and it is mapped to the main index.
>>> >>
>>> >> For example:
>>> >>
>>> >> In search-lucene.com , I enter a term (say facet),

Min/Max values on DocValuesField

2016-11-25 Thread Chitra R
Hi,
For the matching documents, is there any way to calculate the min and max
values of a NumericDocValuesField (say, a date field)?

*Case:*

I would like to use them in numeric range faceting, by splitting the
numeric values (from the result documents) into ranges.
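
A minimal sketch of the bucketing step described above, in plain Java with no Lucene calls. It assumes the numeric values of the matching documents have already been read into a `long[]` (how they are read from the doc values field is left out); the class name `MinMaxRanges`, the `Bucket` type and the equal-width policy are all illustrative choices, not anything Lucene provides. It also assumes `max - min` does not overflow a `long`, which holds for values such as timestamps.

```java
import java.util.ArrayList;
import java.util.List;

final class MinMaxRanges {

    /** One bucket covering the half-open interval [from, to). */
    static final class Bucket {
        final long from;  // inclusive lower bound
        final long to;    // exclusive upper bound
        int count;        // number of values that fell into this bucket

        Bucket(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + "): " + count;
        }
    }

    /** Single pass over the values of the matching documents; returns {min, max}. */
    static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no matching documents");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new long[] {min, max};
    }

    /** Splits [min, max] into numBuckets equal-width ranges and counts the values in each. */
    static List<Bucket> split(long[] values, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        long[] mm = minMax(values);
        long min = mm[0];
        long span = mm[1] - min;
        // Rounded up so that numBuckets * width > span, i.e. max lands in the last bucket.
        long width = span / numBuckets + 1;
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new Bucket(min + i * width, min + (i + 1) * width));
        }
        for (long v : values) {
            buckets.get((int) ((v - min) / width)).count++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] values = {10, 20, 35, 50, 99, 100};
        for (Bucket b : split(values, 3)) {
            System.out.println(b);
        }
    }
}
```

The resulting bounds could then be handed to whatever range faceting is in use; equal-width buckets are only one possible policy and may be a poor fit for skewed data.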



Kindly help me to move forward.


Regards,
Chitra


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-22 Thread Chitra R
Kindly post your suggestions.

Regards,
Chitra

On Sat, Nov 19, 2016 at 1:38 PM, Chitra R  wrote:

> Hey, I got it clearly. Thank you so much. Could you please help us to
> implement it in our use case?
>
>
> In our case, we are having dynamic index and it is variable depth too. So
> flat facet is enough.No need of hierarchical facets.
>
> What I think is,
>
>
>    1. Index my facet field as normal doc value field, so that no special
>    operation (like taxonomy and sorted set doc values facet field) will be
>    done at index time and only doc value field stores its ordinals in their
>    respective field.
>    2. At search time, I will pass query (user search query) , filter
>    (path traversed list)  and collect the matching documents in
>    Facetscollector.
>
>    3. To compute facet count for the specific field, I will gather those
>    resulted docs, then move through each segment for collecting the matching
>    ordinals using AtomicReader.
>
>
> And know when I use this means, can't calculate facet count for more than
> one field(facet) in a search.
>
> Instead of loading all the dimensions in DocValuesReaderState (will take
> more time and memory) at search time, loading specific fields will take
> less time and memory, hope so. Kindly help to solve.
>
>
> It will do it in a minimal index and search cost, I think. And hope this
> won't put overload at index time, also at search time this will be better.
>
>
> Kindly post your suggestions.
>
>
> Regards,
> Chitra
>
>
>
>
> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I think you've summed up exactly the differences!
>>
>> And, yes, it would be possible to emulate hierarchical facets on top
>> of flat facets, if the hierarchy is fixed depth like year/month/day.
>>
>> But if it's variable depth, it's trickier (but I think still
>> possible).  See e.g. the Committed Paths drill-down on the left, on
>> our dog-food server
>> http://jirasearch.mikemccandless.com/search.py?index=jira
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R  wrote:
>> > case 1:
>> > In taxonomy, for each indexed document, examines facet label ,
>> > computes their ordinals and mappings, and which will be stored in
>> sidecar
>> > index at index time.
>> >
>> > case 2:
>> > In doc values, these(ordinals) are computed at search time, so
>> there
>> > will be a time and memory trade-off between both cases, hope so.
>> >
>> >
>> > In taxonomy, building hierarchical facets at index time makes faceting
>> cost
>> > minimal at search time than flat facets in doc values.
>> >
>> > Except (memory,time and NRT latency) , Is any another contrast between
>> > hierarchical and flat facets at search time?
>> >
>> >
>> > Kindly post your suggestions...
>> >
>> >
>> > Regards,
>> > Chitra
>> >
>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R 
>> wrote:
>> >>
>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>> >> facets during indexing. Hope hierarchical in the sense, we might index
>> the
>> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
>> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are
>> maintained
>> >> in sidecar index and it is mapped to the main index.
>> >>
>> >> For example:
>> >>
>> >> In search-lucene.com , I enter a term (say facet), top
>> >> documents and their categories are displayed after performing the
>> search.
>> >> Say I drill down through Publish date/2010 to collect its child counts
>> and
>> >> after I will pass through publishdate/2010/10 to collect their child
>> counts.
>> >> And for each drill down, each search will be performed to collect its
>> top
>> >> docs and categories.
>> >>
>> >>
>> >>Even I can achieve this in flat facets by changing the
>> >> drill down query.
>> >>
>> >> Am I right or missed anything? yet I don't know if I missed anything...
>> >>
>> >> So What is the need of hierarchical facets? Could you please explain
>> >> it(hierarchical facets) 

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-19 Thread Chitra R
Hey, I understand it clearly now. Thank you so much. Could you please help
us implement it for our use case?


In our case the index is dynamic and the hierarchy is of variable depth too,
so flat facets are enough. We have no need for hierarchical facets.

What I am thinking is:


   1. Index my facet field as a normal doc values field, so that no special
   operation (like the taxonomy or SortedSetDocValuesFacetField) is done at
   index time, and the doc values field alone stores its ordinals.
   2. At search time, pass the query (the user's search query) and the filter
   (the list of paths traversed so far), and collect the matching documents
   in a FacetsCollector.
   3. To compute the facet counts for a specific field, take those matching
   docs and walk through each segment, collecting the matching ordinals
   using AtomicReader.
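
The three steps above can be sketched in plain Java as follows. This is only an in-memory illustration under stated assumptions: the `Segment` class is a hypothetical stand-in for one index segment (its label table and per-document ordinals), and the matching doc ids per segment are assumed to come from the collector. None of these names are Lucene API. Because ordinals are only meaningful within one segment, the sketch counts by ordinal inside each segment and then merges the counts by label.

```java
import java.util.Map;
import java.util.TreeMap;

final class PerSegmentFacetCounts {

    /** Hypothetical stand-in for one segment of the index. */
    static final class Segment {
        final String[] labels;  // labels[ord] = facet label; ordinals are local to this segment
        final int[] docOrds;    // docOrds[doc] = ordinal of the doc's value, or -1 if it has none

        Segment(String[] labels, int[] docOrds) {
            this.labels = labels;
            this.docOrds = docOrds;
        }
    }

    /**
     * Counts facet labels over the matching documents.
     * matching[i] holds the matching doc ids of segments[i].
     */
    static Map<String, Integer> count(Segment[] segments, int[][] matching) {
        Map<String, Integer> total = new TreeMap<>();
        for (int s = 0; s < segments.length; s++) {
            Segment seg = segments[s];
            // Step 1: count by segment-local ordinal, which is a plain array increment.
            int[] ordCounts = new int[seg.labels.length];
            for (int doc : matching[s]) {
                int ord = seg.docOrds[doc];
                if (ord >= 0) {
                    ordCounts[ord]++;
                }
            }
            // Step 2: the same label may have a different ordinal in another segment,
            // so the per-segment counts are merged by label, not by ordinal.
            for (int ord = 0; ord < ordCounts.length; ord++) {
                if (ordCounts[ord] > 0) {
                    total.merge(seg.labels[ord], ordCounts[ord], Integer::sum);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(new String[] {"blue", "red"}, new int[] {0, 1, 1, -1}),
            new Segment(new String[] {"green", "red"}, new int[] {1, 0, 1})
        };
        int[][] matching = {{0, 1, 3}, {0, 2}};
        System.out.println(count(segments, matching));
    }
}
```

Running the same loop once per facet field would give counts for several fields in one search, at the cost of one label merge per field.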


I know that with this approach I can't calculate facet counts for more than
one field (facet) in a search.

Instead of loading all the dimensions in a DocValuesReaderState at search
time (which takes more time and memory), loading only the specific fields
should take less time and memory, I hope. Kindly help me solve this.


I think this keeps both index and search cost minimal. I hope it won't add
overhead at index time, and that it will be better at search time too.


Kindly post your suggestions.


Regards,
Chitra




On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think you've summed up exactly the differences!
>
> And, yes, it would be possible to emulate hierarchical facets on top
> of flat facets, if the hierarchy is fixed depth like year/month/day.
>
> But if it's variable depth, it's trickier (but I think still
> possible).  See e.g. the Committed Paths drill-down on the left, on
> our dog-food server
> http://jirasearch.mikemccandless.com/search.py?index=jira
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R  wrote:
> > case 1:
> > In taxonomy, for each indexed document, examines facet label ,
> > computes their ordinals and mappings, and which will be stored in sidecar
> > index at index time.
> >
> > case 2:
> > In doc values, these(ordinals) are computed at search time, so
> there
> > will be a time and memory trade-off between both cases, hope so.
> >
> >
> > In taxonomy, building hierarchical facets at index time makes faceting
> cost
> > minimal at search time than flat facets in doc values.
> >
> > Except (memory,time and NRT latency) , Is any another contrast between
> > hierarchical and flat facets at search time?
> >
> >
> > Kindly post your suggestions...
> >
> >
> > Regards,
> > Chitra
> >
> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R  wrote:
> >>
> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
> >> facets during indexing. Hope hierarchical in the sense, we might index
> the
> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are
> maintained
> >> in sidecar index and it is mapped to the main index.
> >>
> >> For example:
> >>
> >> In search-lucene.com , I enter a term (say facet), top
> >> documents and their categories are displayed after performing the
> search.
> >> Say I drill down through Publish date/2010 to collect its child counts
> and
> >> after I will pass through publishdate/2010/10 to collect their child
> counts.
> >> And for each drill down, each search will be performed to collect its
> top
> >> docs and categories.
> >>
> >>
> >>Even I can achieve this in flat facets by changing the
> >> drill down query.
> >>
> >> Am I right or missed anything? yet I don't know if I missed anything...
> >>
> >> So What is the need of hierarchical facets? Could you please explain
> >> it(hierarchical facets) in the real-world use case?
> >>
> >>
> >> Regards,
> >> Chitra
> >>
> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> >>  wrote:
> >>>
> >>> You store dimension + string (a single value path, since it's not
> >>> hierarchical) into SSDVFF so that you can compute facet counts, either
> >>> ordinary drill down counts or the drill sideways counts.
> >>>
> >>> You can see examples of drill sideways at
> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
> >>> fields o

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-17 Thread Chitra R
Case 1:
With the taxonomy, each indexed document's facet labels are examined,
and their ordinals and mappings are computed and stored in the sidecar
index at index time.

Case 2:
With doc values, these ordinals are computed at search time, so I
assume there is a time and memory trade-off between the two cases.


With the taxonomy, building the hierarchical facets at index time makes
faceting cheaper at search time than flat facets over doc values.

Apart from memory, time and NRT latency, is there any other difference
between hierarchical and flat facets at search time?


Kindly post your suggestions...


Regards,
Chitra

On Thu, Nov 17, 2016 at 6:40 PM, Chitra R  wrote:

> Okay. I agree with you, Taxonomy maintains and supports hierarchical
> facets during indexing. Hope hierarchical in the sense, we might index the 
> field
> Publish date : 2010/10/15 as Publish date: 2010 , Publish date: 2010/10
> and Publish date: 2010/10/15 , their facet ordinals are maintained in
> sidecar index and it is mapped to the main index.
>
> For example:
>
> In search-lucene.com , I enter a term (say facet), top
> documents and their categories are displayed after performing the search.
> Say I drill down through Publish date/2010 to collect its child counts and
> after I will pass through publishdate/2010/10 to collect their child
> counts. And for each drill down, each search will be performed to collect
> its top docs and categories.
>
>
>*Even I can achieve this in flat facets by changing the
> drill down query. *
>
> Am I right or missed anything? yet I don't know if I missed anything...
>
> So What is the need of hierarchical facets? Could you please explain
> it(hierarchical facets) in the real-world use case?
>
>
> Regards,
> Chitra
>
> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> You store dimension + string (a single value path, since it's not
>> hierarchical) into SSDVFF so that you can compute facet counts, either
>> ordinary drill down counts or the drill sideways counts.
>>
>> You can see examples of drill sideways at
>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>> fields on the left and you don't lose the previous facet counts for
>> that field.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R  wrote:
>> > Hi,
>> >
>> > Lucene-Drill sideways
>> >
>> > jira_issue:LUCENE-4748
>> >
>> >  Is this the reason( ie Drill sideways
>> makes
>> > a very nice faceted search UI because we
>> > don't "lose" the facet counts after drilling in) behind storing path and
>> > dimension for the given SSDVF field? Else anything?
>> >
>> > Regards,
>> > Chitra
>> >
>> >
>> >  Hey, thank you so much for the fast response, I agree NRT refresh
>> is
>> > somewhat costly operations and this is the major pitfall, suppose we
>> use doc
>> > value faceting.
>> >
>> >
>> >  While indexing SortedSetDocValuesFacetField , it stores
>> > path and dimension of the given field internally. So Can we achieve
>> > hierarchical facets using DrillDownQuery? Hope, purpose of storing path
>> and
>> > dimension is to achieve hierarchical facets. If yes (ie we can achieve
>> > hierarchy in SSDVFF) , so what is the need to move over taxonomy?
>> >  Else I missed anything?
>> >
>> >
>> >  What is the real purpose to store path and dimension in
>> > SSDVF field?
>> >
>> >
>> > Kindly post your suggestions.
>> >
>> > Regards,
>> > Chitra
>> >
>> >
>> >
>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>> >  wrote:
>> >>
>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R 
>> wrote:
>> >>
>> >> > i)Hope, when opening SortedSetDocValuesReaderState , we are
>> >> > calculating ordinals( this will be used to calculate facet count )
>> for
>> >> > doc
>> >> > values field and this only made the state instance somewhat costly.
>> >> >   Am I right or any other reason behind that?
>> >>
>> >> That's correct.  It adds some latency to an NRT refresh, and some heap
>> >> used to hold the ordinal mappings.
>> >>
>> >> >  ii) During indexing, we are providing facet ordinals in each
>> >> > doc
>> >> > and I think it will be useful in search side, to calculate facet
>> counts
>> >> > only for matching docs.  otherwise, it carries any other benefits?
>> >>
>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
>> >> separate index.
>> >>
>> >> But they add latency/heap usage, and they cannot do hierarchical
>> >> facets yet (though this could be fixed if someone just built it).
>> >>
>> >> >  iii) Is SortedSetDocValuesReaderState thread-safe (ie)
>> multiple
>> >> > threads can call this method concurrently?
>> >>
>> >> Yes.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >
>> >
>>
>
>


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-17 Thread Chitra R
Okay. I agree with you: the taxonomy maintains and supports hierarchical
facets during indexing. By hierarchical, I understand that we might index the
field Publish date: 2010/10/15 as Publish date: 2010, Publish date: 2010/10
and Publish date: 2010/10/15; their facet ordinals are maintained in the
sidecar index and mapped to the main index.
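
A minimal sketch of that expansion in plain Java, assuming `/` is the path separator as in the example. The class and method names are made up for illustration and are not part of Lucene; the sketch only shows how one hierarchical value turns into one label per level.

```java
import java.util.ArrayList;
import java.util.List;

final class PathPrefixes {

    /** Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"]. */
    static List<String> expand(String path) {
        List<String> prefixes = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;  // ignore empty components such as a leading "/"
            }
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            prefixes.add(prefix.toString());
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(expand("2010/10/15"));
    }
}
```

With flat facets, each of these prefixes would have to be indexed as its own label, for example one flat dimension per depth, which is straightforward when the depth is fixed (year/month/day) and more awkward when it varies.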

For example:

On search-lucene.com, I enter a term (say, facet); the top documents and
their categories are displayed after the search is performed. Say I drill
down through Publish date/2010 to collect its child counts, and then through
Publish date/2010/10 to collect its child counts. For each drill-down, a
separate search is performed to collect its top docs and categories.


   *I can achieve even this with flat facets by changing the
drill down query.*

Am I right, or have I missed anything?

So what is the need for hierarchical facets? Could you please explain
them (hierarchical facets) with a real-world use case?


Regards,
Chitra

On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You store dimension + string (a single value path, since it's not
> hierarchical) into SSDVFF so that you can compute facet counts, either
> ordinary drill down counts or the drill sideways counts.
>
> You can see examples of drill sideways at
> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
> fields on the left and you don't lose the previous facet counts for
> that field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R  wrote:
> > Hi,
> >
> > Lucene-Drill sideways
> >
> > jira_issue:LUCENE-4748
> >
> >  Is this the reason( ie Drill sideways
> makes
> > a very nice faceted search UI because we
> > don't "lose" the facet counts after drilling in) behind storing path and
> > dimension for the given SSDVF field? Else anything?
> >
> > Regards,
> > Chitra
> >
> >
> >  Hey, thank you so much for the fast response, I agree NRT refresh is
> > somewhat costly operations and this is the major pitfall, suppose we use
> doc
> > value faceting.
> >
> >
> >  While indexing SortedSetDocValuesFacetField , it stores
> > path and dimension of the given field internally. So Can we achieve
> > hierarchical facets using DrillDownQuery? Hope, purpose of storing path
> and
> > dimension is to achieve hierarchical facets. If yes (ie we can achieve
> > hierarchy in SSDVFF) , so what is the need to move over taxonomy?
> >  Else I missed anything?
> >
> >
> >  What is the real purpose to store path and dimension in
> > SSDVF field?
> >
> >
> > Kindly post your suggestions.
> >
> > Regards,
> > Chitra
> >
> >
> >
> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> >  wrote:
> >>
> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R 
> wrote:
> >>
> >> > i)Hope, when opening SortedSetDocValuesReaderState , we are
> >> > calculating ordinals( this will be used to calculate facet count ) for
> >> > doc
> >> > values field and this only made the state instance somewhat costly.
> >> >   Am I right or any other reason behind that?
> >>
> >> That's correct.  It adds some latency to an NRT refresh, and some heap
> >> used to hold the ordinal mappings.
> >>
> >> >  ii) During indexing, we are providing facet ordinals in each
> >> > doc
> >> > and I think it will be useful in search side, to calculate facet
> counts
> >> > only for matching docs.  otherwise, it carries any other benefits?
> >>
> >> Well, compared to the taxonomy facets, SSDV facets don't require a
> >> separate index.
> >>
> >> But they add latency/heap usage, and they cannot do hierarchical
> >> facets yet (though this could be fixed if someone just built it).
> >>
> >> >  iii) Is SortedSetDocValuesReaderState thread-safe (ie)
> multiple
> >> > threads can call this method concurrently?
> >>
> >> Yes.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >
> >
>


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-16 Thread Chitra R
Hi,

Lucene-Drill sideways
<http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html>

jira_issue:LUCENE-4748 <https://issues.apache.org/jira/browse/LUCENE-4748>

 Is this the reason (i.e. drill sideways makes a very nice faceted
search UI because we don't "lose" the facet counts after drilling in) for
storing the path and dimension for a given SSDVF field? Or is there
something else?
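
To make the "don't lose the facet counts" behaviour concrete, here is a naive in-memory sketch in plain Java. It assumes each document is a map from dimension to a single value and that all documents already match the base query; the class and method names are invented for illustration and this is not how Lucene's DrillSideways is implemented. For each dimension, documents are counted under every drill-down selection except the one on that dimension itself, so the sibling values of a selected dimension keep their counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

final class DrillSidewaysSketch {

    /**
     * docs:     documents matching the base query, each as dimension -> value.
     * selected: dimension -> value the user has drilled down on.
     * dims:     dimensions to count.
     * Returns dimension -> (value -> count).
     */
    static Map<String, Map<String, Integer>> counts(List<Map<String, String>> docs,
                                                    Map<String, String> selected,
                                                    List<String> dims) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (String dim : dims) {
            Map<String, Integer> perValue = new TreeMap<>();
            for (Map<String, String> doc : docs) {
                // Apply every selection except the one on the dimension being counted.
                if (!matchesAllExcept(doc, selected, dim)) {
                    continue;
                }
                String value = doc.get(dim);
                if (value != null) {
                    perValue.merge(value, 1, Integer::sum);
                }
            }
            result.put(dim, perValue);
        }
        return result;
    }

    static boolean matchesAllExcept(Map<String, String> doc,
                                    Map<String, String> selected,
                                    String skipDim) {
        for (Map.Entry<String, String> e : selected.entrySet()) {
            if (e.getKey().equals(skipDim)) {
                continue;
            }
            if (!e.getValue().equals(doc.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    static Map<String, String> doc(String color, String size) {
        Map<String, String> d = new HashMap<>();
        d.put("color", color);
        d.put("size", size);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("red", "S"));
        docs.add(doc("red", "M"));
        docs.add(doc("blue", "S"));
        Map<String, String> selected = new HashMap<>();
        selected.put("color", "red");
        List<String> dims = new ArrayList<>();
        dims.add("color");
        dims.add("size");
        System.out.println(counts(docs, selected, dims));
    }
}
```

After drilling down on color=red, the color dimension still reports a count for blue, while the size dimension only counts the red documents. The sketch rescans the documents once per dimension, which a real engine would avoid.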

Regards,
Chitra

 Hey, thank you so much for the fast response. I agree that the NRT
refresh is a somewhat costly operation, and that this is the major pitfall
if we use doc values faceting.


 While indexing a SortedSetDocValuesFacetField, the path and dimension
of the given field are stored internally. So can we achieve hierarchical
facets using DrillDownQuery? I assumed the purpose of storing the path and
dimension is to achieve hierarchical facets. If so (i.e. we can achieve a
hierarchy with SSDVFF), what is the need to move to the taxonomy?
 Or have I missed something?


 What is the real purpose of storing the path and dimension in an
SSDVF field?


Kindly post your suggestions.

Regards,
Chitra



On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R  wrote:
>
> > i)Hope, when opening SortedSetDocValuesReaderState , we are
> > calculating ordinals( this will be used to calculate facet count ) for
> doc
> > values field and this only made the state instance somewhat costly.
> >   Am I right or any other reason behind that?
>
> That's correct.  It adds some latency to an NRT refresh, and some heap
> used to hold the ordinal mappings.
>
> >  ii) During indexing, we are providing facet ordinals in each doc
> > and I think it will be useful in search side, to calculate facet counts
> > only for matching docs.  otherwise, it carries any other benefits?
>
> Well, compared to the taxonomy facets, SSDV facets don't require a
> separate index.
>
> But they add latency/heap usage, and they cannot do hierarchical
> facets yet (though this could be fixed if someone just built it).
>
> >  iii) Is SortedSetDocValuesReaderState thread-safe (ie) multiple
> > threads can call this method concurrently?
>
> Yes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-13 Thread Chitra R
 Hey, thank you so much for the fast response. I agree that the NRT
refresh is a somewhat costly operation, and that this is the major pitfall
if we use doc values faceting.


 While indexing a SortedSetDocValuesFacetField, the path and dimension
of the given field are stored internally. So can we achieve hierarchical
facets using DrillDownQuery? I assumed the purpose of storing the path and
dimension is to achieve hierarchical facets. If so (i.e. we can achieve a
hierarchy with SSDVFF), what is the need to move to the taxonomy?
 Or have I missed something?


 What is the real purpose of storing the path and dimension in an
SSDVF field?


Kindly post your suggestions.

Regards,
Chitra



On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R  wrote:
>
> > i)Hope, when opening SortedSetDocValuesReaderState , we are
> > calculating ordinals( this will be used to calculate facet count ) for
> doc
> > values field and this only made the state instance somewhat costly.
> >   Am I right or any other reason behind that?
>
> That's correct.  It adds some latency to an NRT refresh, and some heap
> used to hold the ordinal mappings.
>
> >  ii) During indexing, we are providing facet ordinals in each doc
> > and I think it will be useful in search side, to calculate facet counts
> > only for matching docs.  otherwise, it carries any other benefits?
>
> Well, compared to the taxonomy facets, SSDV facets don't require a
> separate index.
>
> But they add latency/heap usage, and they cannot do hierarchical
> facets yet (though this could be fixed if someone just built it).
>
> >  iii) Is SortedSetDocValuesReaderState thread-safe (ie) multiple
> > threads can call this method concurrently?
>
> Yes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
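To make the "ordinal mappings" point above concrete, here is a minimal plain-Java sketch. It is a toy model and not Lucene code: the class and method names are made up for illustration. It assumes (as I understand the SSDV facet implementation) that each segment keeps its own sorted list of "dim/label" strings with segment-local ordinals, and that the reader state has to map those onto one global ordinal space each time a new reader is opened. That per-reader work and the arrays it leaves on the heap are the cost being discussed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model of mapping per-segment ordinals to global ordinals. Not Lucene code. */
final class GlobalOrdinalSketch {

    /** Merges the sorted per-segment label lists into one sorted, de-duplicated list. */
    static List<String> globalLabels(List<List<String>> segments) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        return new ArrayList<>(merged);
    }

    /** For each segment, builds an array that maps segment ordinal to global ordinal. */
    static int[][] ordinalMaps(List<List<String>> segments, List<String> global) {
        int[][] maps = new int[segments.size()][];
        for (int s = 0; s < segments.size(); s++) {
            List<String> segment = segments.get(s);
            maps[s] = new int[segment.size()];
            for (int ord = 0; ord < segment.size(); ord++) {
                maps[s][ord] = global.indexOf(segment.get(ord));
            }
        }
        return maps;
    }

    public static void main(String[] args) {
        // Hypothetical data: two segments, each with its own sorted label dictionary.
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("Author/Bob", "Author/Lisa"),
            Arrays.asList("Author/Frank", "Author/Lisa", "Author/Susan"));
        List<String> global = globalLabels(segments);
        int[][] maps = ordinalMaps(segments, global);
        System.out.println(global);
        System.out.println(Arrays.deepToString(maps));
    }
}
```

The mapping has to be rebuilt whenever the set of segments changes, which is why the cost shows up on every NRT refresh rather than once at startup.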


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-11 Thread Chitra R
Hi Shai,

i) I assume that when opening a SortedSetDocValuesReaderState, we
calculate the ordinals (which are used to compute facet counts) for the doc
values field, and that this alone is what makes the state instance somewhat
costly. Am I right, or is there another reason behind that?



ii) During indexing, we provide facet ordinals in each doc, and I think
this is useful on the search side to calculate facet counts only for
matching docs. Does it carry any other benefits?


iii) Is SortedSetDocValuesReaderState thread-safe, i.e. can multiple
threads use it concurrently?


Kindly post your suggestions.


Thanks,

Chitra


On Thu, Nov 10, 2016 at 4:34 PM, Shai Erera  wrote:

> Hi
>
> The reason IMO is historic - ES and Solr had faceting solutions before
> Lucene had it. There were discussions in the past about using the Lucene
> faceting module in Solr (can't tell for ES) but, sadly, I can't say I see
> it happening at this point.
>
> Regarding your other question, IMO the Lucene faceting engine, in terms of
> performance and customizability, is on par with Solr/ES. However, it lacks
> distributed faceting support and aggregations. Since many people use
> Solr/ES and not Lucene directly, the Solr/ES faceting module continues to
> advance separately from the Lucene one.
>
> Enhancing Lucene facets with aggregations and even distributed faceting
> capabilities is mostly a matter of time and priorities. If you're
> interested in it, I'd be willing to collaborate with you on that as much as
> I can!
>
> And I'd still hope that this work finds its way into Solr/ES, as I think
> it's silly to have that many number of faceting implementations, where they
> all rely on the same low-level data structure - Lucene!
>
> Shai
>
>
> On Thu, Nov 10, 2016 at 12:32 PM Kumaran Ramasubramanian <
> kums@gmail.com>
> wrote:
>
> > Hi All,
> > We all know that Lucene supports faceting by providing
> > Taxonomy(Separate index and hierarchical facets) and
> > SortedSetDocValuesFacetField ( flat facets and no sidecar index).
> >
> >   Then why did solr and elastic search go for its own implementation
> ?
> >  ( that is, solr uses block join & elasticsearch uses aggregations ) Is
> > there any limitations in lucene's implementation ?
> >
> >
> > --
> > Kumaran R
> >
>


Re: Faceting: Taxonomy index Vs SortedSetDocValues

2016-09-26 Thread Chitra R
Hi,

Kindly post your suggestions.

Chitra

On Mon, Sep 26, 2016 at 3:48 PM, Chitra R  wrote:

> Hi,
>
> Issues(LUCENE-4795):  Add FacetsCollector based on SortedSetDocValues
>
> https://issues.apache.org/jira/browse/LUCENE-4795
>
>
> i) In the above discussion, mentioned that there is no need to maintain
> sidecar index to collect facets & its count (FacetsCollector) and even we
> can achieve it in flat index using SortedSetDocValues...  Then what is the
> main benefits of using Sidecar or Taxonomy index?
>
> ii)And i tried to achieve multilevel (hierarchical) categorization using
> SortedSetDocValues and got it simply by changing the query  and opening the
> IndexReader for each level of query using SortedSetDocValuesReaderState..
>
>
> And i know, in SortedSetDocValuesFacetField
>>
>>
>>- Faceting is a bit slower  (~25%), and there is added cost on
>>every IndexReader open to create a new SortedSetDocValuesReaderState.
>>
>>
>>- does not support hierarchical facets
>>
>>
>
>
> 1. what are the functionalities we will be missing when we use
> SortedSetDocValues for faceting? what is hierarchical facets? what can we
> achieve using hierarchical facets?
>
>
> 2. Except faster faceting and supporting hierarchical facets, is there any
> benefit of using taxonomy index over docvalues field for faceting?
>
>
> Any ideas/help/recommendations greatly appreciated..
>
>
> Regards,
> Chitra
>
>
>
>
>
>
>


Faceting: Taxonomy index Vs SortedSetDocValues

2016-09-26 Thread Chitra R
Hi,

Issue (LUCENE-4795): Add FacetsCollector based on SortedSetDocValues

https://issues.apache.org/jira/browse/LUCENE-4795


i) The discussion above mentions that there is no need to maintain a
sidecar index to collect facets and their counts (FacetsCollector), and that
we can achieve the same in a flat index using SortedSetDocValues. What, then,
are the main benefits of using a sidecar (taxonomy) index?

ii) I tried to achieve multilevel (hierarchical) categorization using
SortedSetDocValues, and got it working simply by changing the query and
opening the IndexReader for each level of the query using
SortedSetDocValuesReaderState.


I also know that, with SortedSetDocValuesFacetField:
>
>
>- Faceting is a bit slower (~25%), and there is added cost on
>every IndexReader open to create a new SortedSetDocValuesReaderState.
>
>
>- It does not support hierarchical facets.
>
>


1. What functionality will we be missing if we use SortedSetDocValues
for faceting? What are hierarchical facets, and what can we achieve
with them?


2. Apart from faster faceting and support for hierarchical facets, is there
any benefit to using the taxonomy index over a doc values field for faceting?
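
On question 1, a small sketch may help show what "hierarchical" means here. This is a plain-Java toy model, not the Lucene facet API, and the class name and sample data are hypothetical. It assumes each document carries exactly one category path (for example year/month/day) and counts the document once under every prefix of that path, so a UI can show counts per year, then drill down to months within a year, then to days. A flat facet field only gives you counts for one level of such a path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of hierarchical facet counting. Not the Lucene facet API. */
final class HierarchicalFacetSketch {

    /** Counts each document once under every prefix of its category path. */
    static Map<String, Integer> countByPrefix(List<String[]> docPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] path : docPaths) {
            StringBuilder prefix = new StringBuilder();
            for (String component : path) {
                if (prefix.length() > 0) {
                    prefix.append('/');
                }
                prefix.append(component);
                counts.merge(prefix.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical publish dates, one path per document: year / month / day.
        List<String[]> docs = Arrays.<String[]>asList(
            new String[] {"2010", "10", "15"},
            new String[] {"2010", "10", "20"},
            new String[] {"2012", "1", "1"});
        for (Map.Entry<String, Integer> e : countByPrefix(docs).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```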


Any ideas, help, or recommendations are greatly appreciated.


Regards,
Chitra


Delete a specific term or facet mapping from lucene main index which refers to categories in taxonomy

2016-08-01 Thread Chitra R
Hi,
I am using Lucene 4.10.4. Is there any way to delete a term or facet
field mapping from the main Lucene index that refers to categories in the
taxonomy? I know that once a category is added to the taxonomy, it can
never be deleted, so I am trying to remove that facet mapping from the
main index instead.

Thanks in advance.

Regards,
Chitra


Re: Drawbacks of using Docvalues Fields

2016-06-03 Thread Chitra R
Thank you so much for the information Erick.




On Wed, Jun 1, 2016 at 9:47 PM, Erick Erickson 
wrote:

> You can tell very little about performance with 50 documents,
> so I wouldn't trust these results at all. In particular I'm pretty
> sure that your search speed will suffer _greatly_ as you get
> more and more documents in your corpus if you use only
> DocValues but don't have indexed="true" set.
>
>
> Best,
> Erick
>
> On Tue, May 31, 2016 at 10:51 PM, Chitra R  wrote:
> > Hi,
> >   I have one doubt. Actually I have indexed 50 documents for both
> > SortedNumericDocValuesField and IntField separately . Each document
> > consists of 49 fields. After that, i performed indexing & searching (
> with
> > & without sorting) .
> >
> >  It seems DocValuesField indexes and searches the documents faster
> than
> > normal FieldType (eg: IntField) and also it retrieves the values more
> > efficent than normal field.  Is their any drawbacks to
> > use SortedNumericDocValuesField over IntField?
> >
> > Also It seems like we can use Docvalues Field(String or Numeric)
> > efficiently for any functionality(Searching and sorting). So generally,
> do
> > we have any drawbacks of using DocValues Field(String and Numeric)?
> >
> >
> > Known Drawbacks(on StringFields):
> > -->docvalues don't work on analyzed fields
> > -->search_analyzer cannot be specified on non-analyzed fields.
> >
> > is there any specific drawback on numeric docvalue fields? is range query
> > not possible?? or anyother?
> >
> >
> > Thanks,
> > Chitra
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
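
One way to picture why searching on doc values alone is expected to slow down as the corpus grows: doc values are laid out per document, so finding all documents with a given value means visiting every document, whereas an indexed field keeps a value-to-documents structure that can be looked up directly. The plain-Java sketch below is only a toy model of that difference under those assumptions. The names are hypothetical, and Lucene's real structures for numeric fields are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy contrast between a per-document column and a value-to-docs index. Not Lucene code. */
final class ColumnVersusPostingsSketch {

    /** Doc-values style: one value per doc id, so matching means checking every doc. */
    static List<Integer> matchByScanning(long[] priceByDoc, long wanted) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < priceByDoc.length; doc++) {
            if (priceByDoc[doc] == wanted) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Indexed style: built once at index time, maps each value to its doc ids. */
    static Map<Long, List<Integer>> buildPostings(long[] priceByDoc) {
        Map<Long, List<Integer>> postings = new HashMap<>();
        for (int doc = 0; doc < priceByDoc.length; doc++) {
            postings.computeIfAbsent(priceByDoc[doc], k -> new ArrayList<>()).add(doc);
        }
        return postings;
    }

    /** Lookup cost depends on the number of hits, not on the number of docs. */
    static List<Integer> matchByPostings(Map<Long, List<Integer>> postings, long wanted) {
        return postings.getOrDefault(wanted, Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        long[] priceByDoc = {10L, 25L, 10L, 40L}; // hypothetical prices, indexed by doc id
        System.out.println(matchByScanning(priceByDoc, 10L));
        System.out.println(matchByPostings(buildPostings(priceByDoc), 10L));
    }
}
```

With 50 documents the two approaches are indistinguishable, which fits Erick's point that a test of that size says very little.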


Drawbacks of using Docvalues Fields

2016-05-31 Thread Chitra R
Hi,
I have a question. I indexed 50 documents twice, once using
SortedNumericDocValuesField and once using IntField. Each document
consists of 49 fields. I then measured indexing and searching (with
and without sorting).

It seems that the doc values field indexes and searches the documents faster
than a normal field type (e.g. IntField), and that it also retrieves the
values more efficiently than a normal field. Are there any drawbacks to
using SortedNumericDocValuesField over IntField?

It also seems that we can use doc values fields (string or numeric)
efficiently for any functionality (searching and sorting). So, in general,
are there any drawbacks to using doc values fields (string and numeric)?


Known drawbacks (on StringFields):
--> doc values don't work on analyzed fields
--> search_analyzer cannot be specified on non-analyzed fields.

Is there any specific drawback to numeric doc values fields? Are range
queries not possible, or is there some other limitation?


Thanks,
Chitra


Re: SortedNumericDocValuesField

2016-05-31 Thread Chitra R
Hi,
Could you please explain, with example code, how to store a
SortedNumericDocValuesField?


Thanks,
Chitra

On Tue, May 31, 2016 at 3:02 PM, Chitra R  wrote:

> Thanks.
>
> On Fri, May 27, 2016 at 6:02 PM, Adrien Grand  wrote:
>
>> DocValues.getSortedNumeric is indeed not really useful for high-level
>> code,
>> such as you example which is getting the sort values for each top hit.
>> However, if you wanted to implement lower level functionality like
>> building
>> a histogram of the prices of all matching documents, you would need to
>> build a custom collector and use these sorted numeric doc values to get
>> the
>> prices.
>>
>> Le ven. 27 mai 2016 à 14:10, Chitra R  a écrit :
>>
>> > Hi,
>> >
>> > i have achieved the same sorting using
>> >
>> > Sort sort = new Sort(new
>> SortedNumericSortField("Numericdoc_price",
>> > > SortField.Type.LONG,true));
>> > > int maxDoc = searcher.getIndexReader().maxDoc();
>> > > TopFieldDocs topdocs =searcher.search(query, 10,sort);
>> > >
>> > > for (ScoreDoc scoreDoc : topdocs.scoreDocs) {
>> > > doc = reader.document(scoreDoc.doc);
>> > > System.out.println(scoreDoc);
>> > > }
>> > >
>> >
>> >
>> > So what is the purpose of sortedDocValues obtained
>> > from DocValues.getSortedNumeric(atomicReader, "Numericdoc_price");
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Chitra
>> >
>> >
>> > On Fri, May 27, 2016 at 3:18 PM, Chitra R 
>> wrote:
>> >
>> > > Hi,
>> > >  Actually I like to print the sorted numeric list from the
>> > > sortedDocValues and I dont know which api have to use . Could you
>> please
>> > > help me to achieve this?
>> > >
>> > >
>> > > Thanks,
>> > > Chitra
>> > >
>> > > On Thu, May 26, 2016 at 8:17 PM, Michael McCandless <
>> > > luc...@mikemccandless.com> wrote:
>> > >
>> > >> This looks about right ... did something go wrong?
>> > >>
>> > >> Mike McCandless
>> > >>
>> > >> http://blog.mikemccandless.com
>> > >>
>> > >> On Thu, May 26, 2016 at 9:29 AM, Chitra R 
>> > wrote:
>> > >>
>> > >> > Hi,
>> > >> >   I am new to lucene. Anyone please explain how to sort the
>> > numeric
>> > >> > values by SortedNumericDocValuesField?
>> > >> >
>> > >> > I tried like this using Lucene 4.10.4:
>> > >> >
>> > >> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 1L));
>> > >> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 15L));
>> > >> > ..
>> > >> >  for (AtomicReaderContext context : indexReader.leaves()) {
>> > >> > AtomicReader atomicReader = context.reader();
>> > >> >  SortedNumericDocValues
>> > >> >  sortedDocValues=DocValues.getSortedNumeric(atomicReader,
>> > >> > "Numericdoc_price");
>> > >> > }
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > Thanks,
>> > >> > Chitra
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>


Re: SortedNumericDocValuesField

2016-05-31 Thread Chitra R
Thanks.

On Fri, May 27, 2016 at 6:02 PM, Adrien Grand  wrote:

> DocValues.getSortedNumeric is indeed not really useful for high-level code,
> such as you example which is getting the sort values for each top hit.
> However, if you wanted to implement lower level functionality like building
> a histogram of the prices of all matching documents, you would need to
> build a custom collector and use these sorted numeric doc values to get the
> prices.
>
> Le ven. 27 mai 2016 à 14:10, Chitra R  a écrit :
>
> > Hi,
> >
> > i have achieved the same sorting using
> >
> > Sort sort = new Sort(new
> SortedNumericSortField("Numericdoc_price",
> > > SortField.Type.LONG,true));
> > > int maxDoc = searcher.getIndexReader().maxDoc();
> > > TopFieldDocs topdocs =searcher.search(query, 10,sort);
> > >
> > > for (ScoreDoc scoreDoc : topdocs.scoreDocs) {
> > > doc = reader.document(scoreDoc.doc);
> > > System.out.println(scoreDoc);
> > > }
> > >
> >
> >
> > So what is the purpose of sortedDocValues obtained
> > from DocValues.getSortedNumeric(atomicReader, "Numericdoc_price");
> >
> >
> >
> >
> >
> > Thanks,
> > Chitra
> >
> >
> > On Fri, May 27, 2016 at 3:18 PM, Chitra R  wrote:
> >
> > > Hi,
> > >  Actually I like to print the sorted numeric list from the
> > > sortedDocValues and I dont know which api have to use . Could you
> please
> > > help me to achieve this?
> > >
> > >
> > > Thanks,
> > > Chitra
> > >
> > > On Thu, May 26, 2016 at 8:17 PM, Michael McCandless <
> > > luc...@mikemccandless.com> wrote:
> > >
> > >> This looks about right ... did something go wrong?
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >> On Thu, May 26, 2016 at 9:29 AM, Chitra R 
> > wrote:
> > >>
> > >> > Hi,
> > >> >   I am new to lucene. Anyone please explain how to sort the
> > numeric
> > >> > values by SortedNumericDocValuesField?
> > >> >
> > >> > I tried like this using Lucene 4.10.4:
> > >> >
> > >> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 1L));
> > >> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 15L));
> > >> > ..
> > >> >  for (AtomicReaderContext context : indexReader.leaves()) {
> > >> > AtomicReader atomicReader = context.reader();
> > >> >  SortedNumericDocValues
> > >> >  sortedDocValues=DocValues.getSortedNumeric(atomicReader,
> > >> > "Numericdoc_price");
> > >> > }
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Thanks,
> > >> > Chitra
> > >> >
> > >>
> > >
> > >
> >
>
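
For the histogram case Adrien describes in the quoted reply above, the core loop can be sketched in plain Java as below. This is a toy model, not a Lucene collector: the class name, the array standing in for the doc values, and the bucket width are all assumptions for illustration. In Lucene 4.10.x, as far as I know, a custom collector would obtain the same per-document values by calling setDocument(doc) on the SortedNumericDocValues and then reading count() and valueAt(i) inside collect().

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of bucketing the prices of all matching documents. Not a Lucene collector. */
final class PriceHistogramSketch {

    /**
     * Counts every value of every matching doc in fixed-width buckets.
     * Keys are bucket start values; sortedValuesByDoc[doc] holds that doc's values in sorted order.
     */
    static Map<Long, Integer> histogram(long[][] sortedValuesByDoc, int[] matchingDocs,
                                        long bucketWidth) {
        Map<Long, Integer> buckets = new TreeMap<>();
        for (int doc : matchingDocs) {
            for (long price : sortedValuesByDoc[doc]) {
                long bucketStart = Math.floorDiv(price, bucketWidth) * bucketWidth;
                buckets.merge(bucketStart, 1, Integer::sum);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical data: doc 0 has prices 1 and 15 (as in the thread), doc 3 has none.
        long[][] sortedValuesByDoc = {{1L, 15L}, {7L}, {12L, 30L}, {}};
        int[] matchingDocs = {0, 2, 3}; // doc ids that matched the query
        System.out.println(histogram(sortedValuesByDoc, matchingDocs, 10L));
    }
}
```

Sorting the top hits, by contrast, needs none of this: the SortedNumericSortField approach shown earlier in the thread is enough.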


Re: SortedNumericDocValuesField

2016-05-27 Thread Chitra R
Hi,

I have achieved the same sorting using:

    Sort sort = new Sort(new SortedNumericSortField("Numericdoc_price",
            SortField.Type.LONG, true));
    int maxDoc = searcher.getIndexReader().maxDoc();
    TopFieldDocs topdocs = searcher.search(query, 10, sort);

    for (ScoreDoc scoreDoc : topdocs.scoreDocs) {
        doc = reader.document(scoreDoc.doc);
        System.out.println(scoreDoc);
    }


So what is the purpose of the sortedDocValues obtained
from DocValues.getSortedNumeric(atomicReader, "Numericdoc_price")?





Thanks,
Chitra


On Fri, May 27, 2016 at 3:18 PM, Chitra R  wrote:

> Hi,
>  Actually I like to print the sorted numeric list from the
> sortedDocValues and I dont know which api have to use . Could you please
> help me to achieve this?
>
>
> Thanks,
> Chitra
>
> On Thu, May 26, 2016 at 8:17 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> This looks about right ... did something go wrong?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, May 26, 2016 at 9:29 AM, Chitra R  wrote:
>>
>> > Hi,
>> >   I am new to lucene. Anyone please explain how to sort the numeric
>> > values by SortedNumericDocValuesField?
>> >
>> > I tried like this using Lucene 4.10.4:
>> >
>> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 1L));
>> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 15L));
>> > ..
>> >  for (AtomicReaderContext context : indexReader.leaves()) {
>> > AtomicReader atomicReader = context.reader();
>> >  SortedNumericDocValues
>> >  sortedDocValues=DocValues.getSortedNumeric(atomicReader,
>> > "Numericdoc_price");
>> > }
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Chitra
>> >
>>
>
>


Re: SortedNumericDocValuesField

2016-05-27 Thread Chitra R
Hi,
Actually, I would like to print the sorted numeric values from the
sortedDocValues, but I don't know which API to use. Could you please
help me achieve this?


Thanks,
Chitra

On Thu, May 26, 2016 at 8:17 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> This looks about right ... did something go wrong?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, May 26, 2016 at 9:29 AM, Chitra R  wrote:
>
> > Hi,
> >   I am new to lucene. Anyone please explain how to sort the numeric
> > values by SortedNumericDocValuesField?
> >
> > I tried like this using Lucene 4.10.4:
> >
> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 1L));
> > doc.add(new SortedNumericDocValuesField("Numericdoc_price", 15L));
> > ..
> >  for (AtomicReaderContext context : indexReader.leaves()) {
> > AtomicReader atomicReader = context.reader();
> >  SortedNumericDocValues
> >  sortedDocValues=DocValues.getSortedNumeric(atomicReader,
> > "Numericdoc_price");
> > }
> >
> >
> >
> >
> >
> > Thanks,
> > Chitra
> >
>


SortedNumericDocValuesField

2016-05-26 Thread Chitra R
Hi,
I am new to Lucene. Could anyone please explain how to sort numeric
values using SortedNumericDocValuesField?

I tried the following with Lucene 4.10.4:

doc.add(new SortedNumericDocValuesField("Numericdoc_price", 1L));
doc.add(new SortedNumericDocValuesField("Numericdoc_price", 15L));
..
for (AtomicReaderContext context : indexReader.leaves()) {
    AtomicReader atomicReader = context.reader();
    SortedNumericDocValues sortedDocValues =
        DocValues.getSortedNumeric(atomicReader, "Numericdoc_price");
}





Thanks,
Chitra