I don't think that you should use the facet module. If all you want is to
encode a bunch of numbers under a 'foo' field, you can encode them into a
byte[] and index them as a BDV. Then at search time you get the BDV and
decode the numbers back. The facet module adds complexity here: yes, you
get the encoding/decoding for free, but at the cost of adding mock
categories to the taxonomy, or using associations, for no good reason IMO.
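
To make the encode/decode step concrete, here is a minimal sketch in plain
Java. The class name MultiValueCodec and the fixed-width 8-byte layout are
assumptions made for illustration; the facet module itself uses a more
compact encoding. Wiring the byte[] into the index (for example by wrapping
it in a BytesRef and adding a BinaryDocValuesField) is left out so the
sketch runs without Lucene on the classpath.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: packs all values of one field
// for one document into a single byte[] and unpacks them again.
final class MultiValueCodec {
    private MultiValueCodec() {}

    /** Encodes the values as consecutive big-endian 8-byte longs. */
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Decodes a byte[] produced by encode() back into the original values. */
    static long[] decode(byte[] bytes) {
        if (bytes.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] values = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    /** Example of an in-document aggregate computed at search time. */
    static long sum(byte[] bytes) {
        long total = 0;
        for (long v : decode(bytes)) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // The sum(field)=100 case from the original question: field=60, field=40.
        byte[] payload = encode(60L, 40L);
        System.out.println(sum(payload) == 100L); // prints true
    }
}
```

At search time the same decode() would be applied to the bytes read back
from the BDV for each candidate document.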

Once you do that, you need to figure out how to extend the expressions
module to support a function like maxValues(fieldName) (cannot use 'max'
since it's reserved). I have read up on it a bit, and still haven't figured
out exactly how to do it. The JavascriptCompiler can take custom functions
to compile expressions, but those methods may take only double values. So I
think it should be some sort of binding, but I'm not sure yet how to do it.
Perhaps it should be a variable named something like max_fieldName, which
you bind to a custom Expression ... I will try to look into it later.
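
Just to illustrate the naming idea, here is a rough sketch in plain Java of
resolving a variable such as max_fieldA to an aggregate over a document's
decoded values. It does not use the expressions module at all: the class
AggregateBindings, its methods, and the <aggregate>_<field> convention are
all hypothetical, and how this would plug into a real binding is exactly
the part I have not figured out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Lucene expressions API: maps a variable name
// like "max_fieldA" to an aggregate over the field's values for one document.
final class AggregateBindings {
    private final Map<String, long[]> docValues = new HashMap<>();

    /** Registers the (already decoded) values of a field for the current document. */
    void put(String field, long... values) {
        docValues.put(field, values);
    }

    /** Resolves "<aggregate>_<field>", splitting at the first underscore. */
    double resolve(String variable) {
        int sep = variable.indexOf('_');
        if (sep <= 0) {
            throw new IllegalArgumentException("expected <aggregate>_<field>: " + variable);
        }
        String aggregate = variable.substring(0, sep);
        String field = variable.substring(sep + 1);
        long[] values = docValues.get(field);
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("no values for field: " + field);
        }
        switch (aggregate) {
            case "max":
                return max(values);
            case "sum":
                return sum(values);
            default:
                throw new IllegalArgumentException("unknown aggregate: " + aggregate);
        }
    }

    private static double max(long[] values) {
        long result = values[0];
        for (long v : values) {
            result = Math.max(result, v);
        }
        return result;
    }

    private static double sum(long[] values) {
        long result = 0;
        for (long v : values) {
            result += v;
        }
        return result;
    }

    public static void main(String[] args) {
        AggregateBindings bindings = new AggregateBindings();
        bindings.put("fieldA", 60L, 40L);
        double fieldB = 10, fieldC = 20;
        // Rob's example "max(fieldA) - fieldB > fieldC", with max_fieldA as a variable.
        System.out.println(bindings.resolve("max_fieldA") - fieldB > fieldC); // prints true
    }
}
```

The point is only that the aggregate collapses to a single double per
document, which is what the compiled expression can work with.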

Shai


On Wed, Apr 23, 2014 at 6:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> Thanks for all the questions, gives me an opportunity to clarify it :)
>
> I want the user to be able to give a (simple) formula (so I don't know it
> beforehand) and use that formula in the search. The Javascript
> expressions are really powerful in this use case, but have the single-value
> limitation. Ideally, I would like to make it really flexible by for example
> allowing (in-document aggregating) expressions like: max(fieldA) - fieldB >
> fieldC.
>
> Currently, using single values, I can handle expressions in the form of
> "fieldA - fieldB - fieldC > 0" and evaluate the long-value that I receive
> from the FunctionValues and the ValueSource. I also optimize the query by
> ensuring the field exists and has a value, etc., to keep the search fast
> enough. This works well, but for single values only.
>
> I also looked into the facets Association Fields, as they somewhat look
> like the thing that I want. However, in the faceting module all ordinals and
> values are stored in one field, so there is no easy way to extract the fields
> that are used in the expression.
>
> I like the solution you suggested: encode all values of a numeric field as
> a byte[], like the facets do, but on a per-field basis, so that
> each numeric field has a BDV field that contains all the values of
> that field for that document.
>
> Now that I am typing this, I think there is another way. I could use the
> faceting module and add a different facet field ($facetFIELDA,
> $facetFIELDB) in the FacetsConfig for each field. That way it would be
> relatively straightforward to get all the values for a field, as they are
> exactly the values in the BDV for that document's facet field. However,
> aggregating all facets will be harder, as the TaxonomyFacetSum*Associations
> would need to do this for all fields that I need facet counts/sums for.
>
> What do you think?
>
> -Rob
>
>
> On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera <ser...@gmail.com> wrote:
>
> > A NumericDocValues field can only hold one value. Have you thought about
> > encoding the values in a BinaryDocValues field? Or are you talking about
> > multiple fields (different names), each has its own single value, and at
> > search time you sum the values from a different set of fields?
> >
> > If it's one field, multiple values, then why do you need to separate the
> > values? Is it because you sometimes sum and sometimes e.g. avg? Do you
> > always include all values of a document in the formula, but the formula
> > changes between searches, or do you sometimes use only a subset of the
> > values?
> >
> > If you always use all values, but change the formula between queries, then
> > perhaps you can just encode the pre-computed value under different NDV
> > fields? If you only use a handful of functions (and they are known in
> > advance), it may not be too heavy on the index, and will definitely perform
> > better during search.
> >
> > Otherwise, I believe I'd consider indexing them as a BDV field. For facets,
> > we basically need the same multi-valued numeric field, and given that NDV
> > is single valued, we went w/ BDV.
> >
> > If I misunderstood the scenario, I'd appreciate it if you could clarify it :)
> >
> > Shai
> >
> >
> > On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> >
> > > Hi Shai, all,
> > >
> > > I am trying to write that Filter :). But I'm a bit at a loss as to how to
> > > efficiently grab the multi-values. I can access the
> > > context.reader().document() that accesses the stored fields, but that
> > > seems slow.
> > >
> > > For single-value fields I use a compiled JavaScript Expression with
> > > SimpleBindings as ValueSource, which seems to work quite well. The downside
> > > is that I cannot find a way to implement multi-value through that solution.
> > >
> > > The bindings create, for example, a LongFieldSource, which uses the
> > > FieldCache.LongParser. These parsers only seem to parse one field.
> > >
> > > Is there an efficient way to get -all- of the (numeric) values for a field
> > > in a document?
> > >
> > >
> > > On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser...@gmail.com> wrote:
> > >
> > > > You can do that by writing a Filter which returns matching documents based
> > > > on a sum of the field's values. However, I suspect that is going to be slow,
> > > > unless you know that you will need several such filters and can cache them.
> > > >
> > > > Another approach would be to write a Collector which serves as a Filter,
> > > > but computes the sum only for documents that match the query. Hopefully
> > > > that would mean you compute the sum for fewer documents than you would have
> > > > w/ the Filter approach.
> > > >
> > > > Shai
> > > >
> > > >
> > > > On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> > > >
> > > > > This isn't really a good use case for an index like Lucene.  The most
> > > > > essential property of an index is that it lets you look up documents very
> > > > > quickly based on *precomputed* values.
> > > > >
> > > > > -Mike
> > > > >
> > > > >
> > > > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> I'm looking for a way to use multi-values in a filter.
> > > > >>
> > > > >> I want to be able to search on sum(field)=100, where field has values in
> > > > >> one document:
> > > > >>
> > > > >> field=60
> > > > >> field=40
> > > > >>
> > > > >> In this case 'field' is a LongField. I examined the code in the
> > > > >> FieldCache, but that seems to focus on single-valued fields only, or
> > > > >>
> > > > >>
> > > > >> Is this something that can be done in Lucene? And what would be a good
> > > > >> approach?
> > > > >>
> > > > >> Thanks in advance,
> > > > >>
> > > > >> -Rob
> > > > >>
> > > > >>
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>
