Hi Shai,
 
Thanks so much for the clear explanation.

I agree on the first question. Taxonomy Writer with a separate index would 
probably be my approach too.

For the second question:
I am a little new to the Facets API so I will try to figure out the approach 
that you outlined below.

However, the scenario is this: assume a document corpus that is indexed. For a
user query, a document is returned and selected by the user for editing as part
of some use case/workflow. The user then marks that document as historically
interesting or not, financially relevant, specific to the media or
entertainment domain, etc. So, essentially, the user is flagging the document
with certain markers.
Another set of users may want to query on these markers. So, let's say a second
user comes along and wants to see the top documents belonging to one category,
say, agriculture or farming. Since these markers are assigned at runtime, how
can I use facets on them? I was envisioning the various markers as facets. But
if I constantly re-index or update the documents whenever a marker changes, I
believe it would not be very efficient.

Is there anything, facets or otherwise, in Lucene that can help me solve this 
use case? 

Please let me know. And, thanks!

-----------------------
Thanks n Regards,
Sandeep Ramesh Khanzode


On Friday, June 13, 2014 9:51 PM, Shai Erera <ser...@gmail.com> wrote:
 


Hi

You can check the demo code here:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/.
This code is updated with each release, so you always get working code
examples, even when the API changes.

If you don't mind managing the sidecar index, which I agree isn't such a
big deal, then yes - the taxonomy index currently performs the fastest. I
plan to explore porting the taxonomy-based approach from BinaryDocValues to
the new SortedNumericDocValues (coming out in 4.9) since it might perform
even faster.

I didn't quite get the marker/flag facet. Can you give an example? For
instance, if you can model that as a NumericDocValuesField added to
documents (w/ the different markers/flags translated to numbers), then you
can use Lucene's updatable numeric DocValues and write a custom Facets
implementation to aggregate on that NumericDocValues field.
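
To make the "markers translated to numbers" part concrete, here is a minimal
sketch in plain Java, with no Lucene calls. It assumes a small, fixed set of
markers (the Marker enum and the MarkerFlags class are hypothetical names, with
values borrowed from the examples in this thread) and packs them into one long
per document, one bit per marker. That long is the kind of value you could
store in a NumericDocValuesField and later change through the updatable
numeric DocValues support, and the count method mimics what a custom Facets
implementation would do over the values of the matching documents.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class MarkerFlags {
    // Hypothetical markers; a single long can hold at most 64 of them.
    enum Marker { HISTORICAL, FINANCIAL, MEDIA, AGRICULTURE }

    // Pack a set of markers into one long: bit i is set when the
    // marker with ordinal i is present.
    static long encode(EnumSet<Marker> markers) {
        long bits = 0L;
        for (Marker m : markers) {
            bits |= 1L << m.ordinal();
        }
        return bits;
    }

    // Test whether a packed value carries the given marker.
    static boolean has(long bits, Marker m) {
        return (bits & (1L << m.ordinal())) != 0;
    }

    // Count, per marker, how many of the given per-document values carry it.
    // In a real index these values would come from the doc values of the
    // documents that matched the query.
    static Map<Marker, Integer> count(long[] perDocValues) {
        Map<Marker, Integer> counts = new EnumMap<Marker, Integer>(Marker.class);
        for (Marker m : Marker.values()) {
            counts.put(m, 0);
        }
        for (long bits : perDocValues) {
            for (Marker m : Marker.values()) {
                if (has(bits, m)) {
                    counts.put(m, counts.get(m) + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] docs = {
            encode(EnumSet.of(Marker.HISTORICAL, Marker.FINANCIAL)),
            encode(EnumSet.of(Marker.AGRICULTURE)),
            encode(EnumSet.of(Marker.FINANCIAL, Marker.AGRICULTURE)),
            encode(EnumSet.noneOf(Marker.class))
        };
        // prints {HISTORICAL=1, FINANCIAL=2, MEDIA=0, AGRICULTURE=2}
        System.out.println(count(docs));
    }
}
```

With this encoding, changing a marker means writing one new long for the
document rather than re-indexing it. If the set of markers is open-ended or
larger than 64, this bitmask layout no longer fits and a different encoding
would be needed.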

Shai



On Fri, Jun 13, 2014 at 11:48 AM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Hi,
>
> I am evaluating Lucene Facets for a project. Since there is a lot of
> change in 4.7.2 for Facets, I am relying on UTs for reference. Please let
> me know if there are other sources of information.
>
> I have a couple of questions:
>
> 1.] All categories in my application are flat, not hierarchical. But, it
> seems from a few sources, that even that notwithstanding, you would want to
> use a Taxonomy based index for performance reasons. It is faster but uses
> more RAM. Or is the deterrent to using it the fact that it is a separate
> data structure? If one could do with the life-cycle management of the extra
> index, should we go ahead with the taxonomy index for better performance
> across tens of millions of documents?
>
> Another note to add is that I do not see a scenario wherein I would want
> to re-index my collection over and over again or, in other words, the
> changes would be spread over time.
>
> 2.] I need a type of dynamic facet that allows me to add a flag or marker
> to the document at runtime since it will change/update every time a user
> modifies or adds to the list of markers. Is this possible to do with the
> current implementation? I ask because I believe that currently all faceting
> is done at indexing time.
>
>
> -----------------------
> Thanks n Regards,
> Sandeep Ramesh Khanzode
