Re: Getting multi-values to use in filter?

Shai Erera Sun, 27 Apr 2014 12:28:27 -0700

Hi Rob,

Your question got me interested, so I wrote a quick prototype of what I
think solves your problem (and if not, I hope it solves someone else's!
:)). The idea is to write a special ValueSource, e.g. MaxValueSource, which
reads a BinaryDocValues field, decodes the values and returns the maximum one.
It can then be embedded in an expression quite easily.
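A minimal sketch of the idea, written in plain Java rather than against Lucene's ValueSource/FunctionValues classes, so none of the names below are Lucene API. Each document's values are assumed to be stored as one byte[] with 8 bytes per long, and `doubleVal(doc)` decodes them and returns the largest. The `byte[][]` indexed by docID is a hypothetical stand-in for a BinaryDocValues field.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical stand-in for a "MaxValueSource": each document's values live in
 * one byte[] (8 big-endian bytes per long), like a BinaryDocValues entry, and
 * doubleVal(doc) decodes them and returns the maximum.
 */
class MaxOfEncodedValues {
    private final byte[][] perDocBytes; // index = docID, value = encoded longs

    MaxOfEncodedValues(byte[][] perDocBytes) {
        this.perDocBytes = perDocBytes;
    }

    /** Encodes a document's values in the layout doubleVal() expects. */
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Returns the document's largest value, or 0.0 if it has none (arbitrary choice). */
    double doubleVal(int doc) {
        byte[] bytes = perDocBytes[doc];
        if (bytes == null || bytes.length == 0) {
            return 0.0;
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long max = Long.MIN_VALUE;
        while (buf.remaining() >= Long.BYTES) {
            max = Math.max(max, buf.getLong());
        }
        return max;
    }
}
```

Returning 0.0 for a document without values is an arbitrary choice in this sketch. A real implementation would need to decide how missing values should behave inside an expression.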


I published a post on Lucene expressions and included some prototype code
which demonstrates how to do it. Hope it's still helpful to you:
http://shaierera.blogspot.com/2014/04/expressions-with-lucene.html.

Shai


On Thu, Apr 24, 2014 at 1:20 PM, Shai Erera <ser...@gmail.com> wrote:

> I don't think that you should use the facet module. If all you want is to
> encode a bunch of numbers under a 'foo' field, you can encode them into a
> byte[] and index them as a BDV. Then at search time you get the BDV and
> decode the numbers back. The facet module adds complexity here: yes, you
> get the encoding/decoding for free, but at the cost of adding mock
> categories to the taxonomy, or using associations, for no good reason IMO.
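Here is a rough sketch of the encode/decode step. It assumes fixed 8-byte big-endian longs. The class name `LongsCodec` is hypothetical, and a variable-length encoding would be more compact.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: packs a document's numeric values into one byte[] and back. */
class LongsCodec {
    /** Writes each value as 8 big-endian bytes; the array length implies the count. */
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Reverses encode(): reads bytes.length / 8 values back. */
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] values = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }
}
```

At index time, the encoded array could be wrapped in a `BytesRef` and added as a `BinaryDocValuesField`. At search time, the bytes read back for a document would go through `decode`.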
>
> Once you do that, you need to figure out how to extend the expressions
> module to support a function like maxValues(fieldName) (cannot use 'max'
> since it's reserved). I have read up on it a bit, but still haven't figured
> out exactly how to do it. The JavascriptCompiler can take custom functions
> to compile expressions, but those methods must take only double arguments.
> So I think it should be some sort of binding, but I'm not sure yet how to do
> it. Perhaps it should be a variable name like max_fieldName, to which you
> bind a custom Expression ... I will try to look into it later.
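To make the binding idea concrete, here is a sketch of how a variable name could be resolved to a per-document value. It assumes a hypothetical `<function>_<field>` naming convention (e.g. `max_foo`). It does not use the expressions module's actual Bindings API. `AggregateBindings`, `MultiValues` and the supported function names are all assumptions for illustration.

```java
import java.util.function.IntToDoubleFunction;
import java.util.stream.LongStream;

/** Hypothetical resolver for variables named like "max_foo" or "sum_foo". */
class AggregateBindings {
    /** Returns all values of a field in a document; a stand-in for decoding a BDV. */
    interface MultiValues {
        long[] values(int doc, String field);
    }

    static IntToDoubleFunction resolve(String variable, MultiValues source) {
        int sep = variable.indexOf('_');
        if (sep <= 0 || sep == variable.length() - 1) {
            throw new IllegalArgumentException("expected <function>_<field>, got " + variable);
        }
        String function = variable.substring(0, sep);
        String field = variable.substring(sep + 1); // may itself contain '_'
        switch (function) {
            case "max":
                // Documents without values yield 0 (an arbitrary choice for this sketch).
                return doc -> LongStream.of(source.values(doc, field)).max().orElse(0L);
            case "min":
                return doc -> LongStream.of(source.values(doc, field)).min().orElse(0L);
            case "sum":
                return doc -> LongStream.of(source.values(doc, field)).sum();
            default:
                throw new IllegalArgumentException("unknown function: " + function);
        }
    }
}
```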
>
> Shai
>
>
> On Wed, Apr 23, 2014 at 6:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>
>> Thanks for all the questions, gives me an opportunity to clarify it :)
>>
>> I want the user to be able to give a (simple) formula (so I don't know it
>> in advance) and use that formula in the search. The JavaScript
>> expressions are really powerful in this use case, but have the
>> single-value limitation. Ideally, I would like to make it really flexible,
>> for example by allowing (in-document aggregating) expressions like
>> max(fieldA) - fieldB > fieldC.
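For illustration, this is what the example expression means for a single document, written out by hand in plain Java. This is an assumption about the intended semantics, not something the expressions module offers out of the box. `fieldA` may have several values and is reduced with max, while `fieldB` and `fieldC` are expected to hold exactly one value each. The `Map<String, long[]>` document representation is hypothetical.

```java
import java.util.Map;
import java.util.stream.LongStream;

/** Evaluates max(fieldA) - fieldB > fieldC for one document. */
class InDocumentAggregateFilter {
    /** Documents missing a field (or with several fieldB/fieldC values) never match (assumption). */
    static boolean matches(Map<String, long[]> doc) {
        long[] a = doc.get("fieldA");
        long[] b = doc.get("fieldB");
        long[] c = doc.get("fieldC");
        if (a == null || a.length == 0 || b == null || b.length != 1 || c == null || c.length != 1) {
            return false;
        }
        long maxA = LongStream.of(a).max().getAsLong(); // in-document aggregation
        return maxA - b[0] > c[0];
    }
}
```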
>>
>> Currently, using single values, I can handle expressions in the form of
>> "fieldA - fieldB - fieldC > 0" and evaluate the long value that I receive
>> from the FunctionValues and the ValueSource. I also optimize the query by
>> ensuring the field exists and has a value, etc., so the search is still
>> fast enough. This works well, but only for single values.
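Here is a simplified plain-Java sketch of the single-valued case described above. It stores one value per document per field in column arrays, roughly the NumericDocValues layout. Hypothetical `has*` flags stand in for the "field exists and has a value" check, which runs before any arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

/** Evaluates fieldA - fieldB - fieldC > 0 over single-valued columns. */
class SingleValuedExpressionFilter {
    static List<Integer> matchingDocs(long[] a, long[] b, long[] c,
                                      boolean[] hasA, boolean[] hasB, boolean[] hasC) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < a.length; doc++) {
            // Cheap existence check first, so a missing value is never read as 0.
            if (!(hasA[doc] && hasB[doc] && hasC[doc])) {
                continue;
            }
            if (a[doc] - b[doc] - c[doc] > 0) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```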
>>
>> I also looked into the facets Association Fields, as they look somewhat
>> like what I want. However, in the faceting module all ordinals and values
>> are stored in one field, so there is no easy way to extract the fields
>> that are used in the expression.
>>
>> I like the solution you suggested: add all the numeric fields as an
>> encoded byte[] like the facets do, but on a per-field basis, so that
>> each numeric field has a BDV field that contains all the values of that
>> field for that document.
>>
>> Now that I am typing this, I think there is another way. I could use the
>> faceting module and add a different facet field ($facetFIELDA,
>> $facetFIELDB) in the FacetsConfig for each field. That way it would be
>> relatively straightforward to get all the values for a field, as they are
>> exactly the values in the BDV of that document's facet field. The downside
>> is that aggregating all facets becomes harder, as the
>> TaxonomyFacetSum*Associations would need to do this for every field that I
>> need facet counts/sums for.
>>
>> What do you think?
>>
>> -Rob
>>
>>
>> On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera <ser...@gmail.com> wrote:
>>
>> > A NumericDocValues field can only hold one value. Have you thought about
>> > encoding the values in a BinaryDocValues field? Or are you talking about
>> > multiple fields (different names), each with its own single value, and at
>> > search time you sum the values from a different set of fields?
>> >
>> > If it's one field, multiple values, then why do you need to separate the
>> > values? Is it because you sometimes sum and sometimes e.g. avg? Do you
>> > always include all values of a document in the formula, but the formula
>> > changes between searches, or do you sometimes use only a subset of the
>> > values?
>> >
>> > If you always use all values, but change the formula between queries, then
>> > perhaps you can just encode the pre-computed value under different NDV
>> > fields? If you only use a handful of functions (and they are known in
>> > advance), it may not be too heavy on the index, and will definitely
>> > perform better during search.
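Here is a sketch of the pre-computation idea. It assumes the set of functions is known up front. Each document's aggregates are computed once at index time, and each result becomes its own single-valued field. The `<field>_sum` / `_min` / `_max` naming is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.LongStream;

/** Computes per-document aggregates once, at index time. */
class PrecomputedAggregates {
    static Map<String, Long> compute(String field, long[] values) {
        Map<String, Long> out = new LinkedHashMap<>();
        if (values.length == 0) {
            return out; // nothing to precompute for this document
        }
        out.put(field + "_sum", LongStream.of(values).sum());
        out.put(field + "_min", LongStream.of(values).min().getAsLong());
        out.put(field + "_max", LongStream.of(values).max().getAsLong());
        return out;
    }
}
```

Each entry could then be indexed as a `NumericDocValuesField(name, value)`. A query would then read one precomputed long per document instead of decoding all values.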
>> >
>> > Otherwise, I believe I'd consider indexing them as a BDV field. For facets,
>> > we basically need the same multi-valued numeric field, and given that NDV
>> > is single-valued, we went w/ BDV.
>> >
>> > If I misunderstood the scenario, I'd appreciate if you clarify it :)
>> >
>> > Shai
>> >
>> >
>> > On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>> >
>> > > Hi Shai, all,
>> > >
>> > > I am trying to write that Filter :). But I'm a bit at a loss as to how
>> > > to efficiently grab the multi-values. I can access
>> > > context.reader().document(), which reads the stored fields, but that
>> > > seems slow.
>> > >
>> > > For single-value fields I use a compiled JavaScript Expression with
>> > > SimpleBindings as the ValueSource, which seems to work quite well. The
>> > > downside is that I cannot find a way to support multi-values with that
>> > > solution.
>> > >
>> > > These create, for example, a LongFieldSource, which uses the
>> > > FieldCache.LongParser. These parsers only seem to parse one field.
>> > >
>> > > Is there an efficient way to get -all- of the (numeric) values for a
>> > > field in a document?
>> > >
>> > >
>> > > On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser...@gmail.com> wrote:
>> > >
>> > > > You can do that by writing a Filter which returns matching documents
>> > > > based on a sum of the field's values. However, I suspect that is going
>> > > > to be slow, unless you know that you will need several such filters and
>> > > > can cache them.
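Here is a plain-Java sketch of the filter approach, using the sum(field)=100 example from the original question. Every document's values are summed, and matching docIDs are recorded in a `BitSet`. The `long[][]` input is a hypothetical stand-in for values decoded per document. Because every document is visited, the cost grows with index size, which is why caching the resulting bit set matters.

```java
import java.util.BitSet;
import java.util.stream.LongStream;

/** Marks every document whose values sum to the target. */
class SumEqualsFilter {
    /** perDocValues[doc] holds all values of the field in that document. */
    static BitSet matching(long[][] perDocValues, long target) {
        BitSet bits = new BitSet(perDocValues.length);
        for (int doc = 0; doc < perDocValues.length; doc++) {
            // Visits every document, regardless of any query.
            if (LongStream.of(perDocValues[doc]).sum() == target) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```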
>> > > >
>> > > > Another approach would be to write a Collector which serves as a
>> > > > Filter, but computes the sum only for documents that match the query.
>> > > > Hopefully that would mean you compute the sum for fewer documents than
>> > > > you would w/ the Filter approach.
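The collector variant can be sketched the same way. `SumCheckingCollector` is hypothetical and only mimics the shape of a collector's `collect(int doc)` callback. It is called only for documents the query already matched, so the sum is computed just for those.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

/** Checks the sum only for documents handed to collect(), i.e. query hits. */
class SumCheckingCollector {
    private final long[][] perDocValues; // stand-in for decoded per-document values
    private final long target;
    private final List<Integer> accepted = new ArrayList<>();
    private int sumsComputed = 0;

    SumCheckingCollector(long[][] perDocValues, long target) {
        this.perDocValues = perDocValues;
        this.target = target;
    }

    void collect(int doc) {
        sumsComputed++; // counts how much work was actually done
        if (LongStream.of(perDocValues[doc]).sum() == target) {
            accepted.add(doc);
        }
    }

    List<Integer> accepted() {
        return accepted;
    }

    int sumsComputed() {
        return sumsComputed;
    }
}
```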
>> > > >
>> > > > Shai
>> > > >
>> > > >
>> > > > On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
>> > > >
>> > > > > This isn't really a good use case for an index like Lucene. The most
>> > > > > essential property of an index is that it lets you look up documents
>> > > > > very quickly based on *precomputed* values.
>> > > > >
>> > > > > -Mike
>> > > > >
>> > > > >
>> > > > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> I'm looking for a way to use multi-values in a filter.
>> > > > >>
>> > > > >> I want to be able to search on sum(field)=100, where field has these
>> > > > >> values in one document:
>> > > > >>
>> > > > >> field=60
>> > > > >> field=40
>> > > > >>
>> > > > >> In this case 'field' is a LongField. I examined the code in the
>> > > > >> FieldCache, but that seems to focus on single-valued fields only, or
>> > > > >>
>> > > > >>
>> > > > >> Is this something that can be done in Lucene? And what would be a
>> > > > >> good approach?
>> > > > >>
>> > > > >> Thanks in advance,
>> > > > >>
>> > > > >> -Rob
>> > > > >>
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
