bq. Expanding a query over numerous fields grows combinatorically in the number of fields (if I want my query to match when all terms match in *some* field), doesn't it?
I don't think it does? It grows linearly with the number of fields?
In my experience the number of fields searchable "by default" is typically limited - it's not *all* fields - it's just a subset that constitutes the "text body" of a document. Of course everyone's experience will vary depending on the application.

> Re: query parsing; wasn't there at one time an interval query parser? It had
> operators like w() and n() IIRC

I've tried that but it's really unusable unless the queries are automated - the syntax is difficult to use; mistakes cause cryptic parse errors and are hard to recover from.

Dawid

On Thu, Sep 10, 2020 at 10:40 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > A slightly different but related topic is how to manage lots of fields
>
> I agree that sub-fields are a pain and that mashing everything
> together in an all-field is a mess, but for best performance with a
> large number of fields/sub-fields, it is the only workable option I
> can see? Expanding a query over numerous fields grows combinatorically
> in the number of fields (if I want my query to match when all terms
> match in *some* field), doesn't it?
>
> I would like to see a mechanism for defining sub-fields using
> positions. Together with an absolute positional query this would
> enable both match-any-field as well as field-specific matching with
> each token indexed only once (multi-values are possible within this
> with boundary tokens or big enough position ranges, as Alan
> suggested). It does mean that the sub-field boundaries have to be
> managed somehow. Without index support, you can set an arbitrary large
> size for your sub-field and insert position gaps at the boundaries,
> but maybe we could detect the largest sub-field at flush time and
> write that metadata somewhere in the index to enable smaller gaps?
>
> Another issue is differing analysis for the sub-fields, and properly
> updating the positions during analysis: at the boundaries (you don't
> want to insert a gap, rather advance to a fixed position), and you have
> to index sub-fields in order. Maybe we could make it less horrible by
> adding better support for it.
>
> Re: query parsing; wasn't there at one time an interval query parser?
> It had operators like w() and n() IIRC
>
> On Thu, Sep 10, 2020 at 4:20 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >
> > > Ok so the more general question is whether we need an interval query
> > > parser
> >
> > Oh, to this I'd say: yes, yes, yes.
> >
> > I didn't have much prior experience writing frontend apps on top of
> > Solr/Lucene but once I did have
> > to go that route it quickly turned out that several things that are
> > readily available from code-level
> > are so darn difficult to achieve and integrate from the outside.
> > Specifically:
> >
> > - Field expansion in query parsers is a must (so that unqualified
> > terms are expanded over multiple fields).
> > Any query parser that doesn't support this is in my opinion of zero
> > use. The "default" copy-to sink field known
> > from Solr brings more problems than it solves.
> >
> > - Exact match-region hit highlighting is a strong expectation. I
> > solved this with matches API (see LUCENE-9461)
> > and flexible query parser's multifield expansion. Works like a charm.
> >
> > - Multivalued fields are common and sub-document handling is a pain.
> > The problem I raised here is a result of
> > direct user feedback. In real life multivalued fields are omnipresent
> > and searches over those fields can be complex.
> > Users see hits that just should not be there and are confused.
> >
> > - People do use complex queries. Maybe not all people but there are
> > people out there who do...
> > Just recently I extended
> > flexible query parser with a handcrafted min-should-match operator
> > because it is otherwise not accessible in any Lucene
> > query parser (!). I can make this code available (it's not terribly
> > complex), although, since you asked, I think a query parser that
> > exposes all sorts of "higher level" functionality of intervals would
> > be very, very useful.
> >
> > It may end up that I'll have to write something for intervals anyway
> > so we can work on this together if you like.
> > Especially the syntax is an open question - should it be
> > operator-based (like the current boost or fuzzy operators) or
> > meta-function-based (so that pseudo-functions would be available). Or
> > maybe a mix of both? I don't know, really. :)
> >
> > Dawid
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
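
To make the linear-vs-combinatorial point at the top of this message concrete, here is a rough sketch in plain Python (not Lucene code). It assumes the expansion is done per term, i.e. each unqualified term becomes a disjunction over the searchable fields and the per-term groups are all required. The function names, field names and terms are made up for illustration, and the rendered string only imitates classic query syntax. The combinatorial number appears only if one enumerates every term-to-field assignment as a separate conjunction.

```python
from itertools import product


def expand_over_fields(terms, fields):
    """One required group per term; each group is a disjunction over fields."""
    return [[f"{field}:{term}" for field in fields] for term in terms]


def render(groups):
    """Render in a classic-syntax-like form: '+' marks a required group."""
    return " ".join("+(" + " ".join(group) + ")" for group in groups)


def leaf_clauses(groups):
    """Total number of term clauses in the per-term expansion."""
    return sum(len(group) for group in groups)


def flattened_conjunctions(terms, fields):
    """Conjunctions needed if every term-to-field assignment were enumerated."""
    return sum(1 for _ in product(fields, repeat=len(terms)))


# Hypothetical terms and "text body" fields, for illustration only.
terms = ["interval", "query", "parser"]
fields = ["title", "body", "tags", "author"]

groups = expand_over_fields(terms, fields)
print(render(groups))
print(leaf_clauses(groups))                   # 3 terms * 4 fields = 12
print(flattened_conjunctions(terms, fields))  # 4 ** 3 = 64
```

With the per-term form, doubling the number of fields doubles the clause count (12 becomes 24 here), while the enumerated form goes from 64 to 512. Whether a given parser produces the first or the second shape is an assumption to check against the parser actually in use.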