Re: When is too many fields in qf is too many?
Steven, what does being your hero entail, besides a salute? :-)

Approach 1: Tinker with the your-app / Solr relationship. Approach 2: Gauge what's really used and limit the customization. Approach 3: Offer what's wanted (which might be different from what you're trying to achieve).

From your write-up I'm unsure exactly what is being demanded of Solr, but I assume you're after searchability no matter the view / ... / field combination. Every single field can be searchable, and I assume you're looking for a way to provide just that - search on every field must be possible if the user wants it.

Ad 3: with customize-everything apps, folks usually settle quickly into patterns where they have what they want, and they don't use other patterns unless they really need to. Offer relevant search for their favourite patterns and much weaker search for the rest. Your current approach may be over-engineering: you may be answering a problem your product folks posed you while the real need lies elsewhere. So I'm asking you to question the problem, to make sure your work won't be in vain, or not as good for end users as it could be because they actually have a different problem (like: which field is which in my so-and-so customized view nr 57, which changed again this month).

Ad 2: find out which fields are most important for searches and offer only these to Solr. Real usage is usually much less than the capability offered - so if you have a view with 200 fields, I doubt folks even want to query all 200; perhaps only 5 matter. Find a way to know those 5 (via user prefs per view, perhaps, or a default view config) and search only them. This ties nicely with #3, as folks most likely don't even WANT to query all fields. Humans like limits: we don't want too many elements on screen, we like simple UIs, we don't want to type overly long search queries, and we often don't want too many choices.

Ad 1: EXPERIMENT.
First, create a way to manage your configs automatically and keep them in version control; you'll need fast generation and even faster revert / regeneration when something is NOT OK. Set up more than one way to achieve your search-them-all-as-the-user-pleases approach, then test and compare them. Your case is quite unique (3.5k qf fields, anyone? collections changing monthly?) and I don't think you will get good results without experimenting; you need to compare a number of options.

Can you offer Solr servers per user group? Do you have similarities in views across user groups, even informal ones? Say 30% of your user base uses only 20% of all the views you have - then it makes sense to have a dedicated Solr for those 20% of views. You'll need routing here, and rules per user group in your app. How many customizations do you have, and how can you use that? Are there any patterns in customizing views that you can predict / observe / use? This is kind of a synthesis of all the approaches, but at your customization level I don't think one Solr for all cases will be of any use, even if you do manage to get it working by some tinkering with settings.

As I looked at the problem not in terms of Solr settings, this is somewhat off-topic, so if you wish to ask something it might be better off the group, unless others want the thread to continue here out of curiosity how it ends.

pozdrawiam,
LAFK

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Thursday, May 28, 2015 5:59 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in qf is too many?
RE: When is too many fields in qf is too many?
Before giving up, I might try a copyField per field group and see how that works. Won't that get you down to 10-20 fields per query, and be stable with respect to view changes? But Solr is column oriented, in that the core query logic is a scatter/gather over the qf list. Perhaps there is a reason qf does not support wildcards. Not sure, but it seems likely. That said, having thousands of columns is not weird at all in some applications. You might be better served by a product oriented to that type of usage. Maybe HBase?

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Thursday, May 28, 2015 5:59 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in qf is too many?

Hi Folks,

First, thanks for taking the time to read and reply on this subject; it is much appreciated. I have yet to come up with a final solution that optimizes Solr. To give you more context, let me give you the big picture of how the application and the database are structured, for which I'm trying to enable Solr search.

Application: Has the concept of views. A view contains one or more object types. An object type may exist in any view. An object type has one or more field groups. A field group has a set of fields. A field group can be used with any object type of any view. Notice how field groups are free standing, so that they can be linked to an object type of any view? Here is a diagram of the above:

FieldGroup-#1 == Field-1, Field-2, Field-5, etc.
FieldGroup-#2 == Field-1, Field-5, Field-6, Field-7, Field-8, etc.
FieldGroup-#3 == Field-2, Field-5, Field-8, etc.

View-#1 == ObjType-#2 (using FieldGroup-#1, #3) + ObjType-#4 (using FieldGroup-#1) + ObjType-#5 (using FieldGroup-#1, #2, #3, etc.)
View-#2 == ObjType-#1 (using FieldGroup-#3, #15, #16, #19, etc.) + ObjType-#4 (using FieldGroup-#1, #4, #19, etc.) + etc.
View-#3 == ObjType-#1 (using FieldGroup-#1, #8) + etc.

Do you see where this is heading?
To make it even a bit more interesting: ObjType-#4 (which is in View-#1 and #2 per the above) uses FieldGroup-#1 in both views, but in one view it can be configured to have its own fields off FieldGroup-#1. With the above setting, a user is assigned a view and can be moved around views, but cannot be in multiple views at the same time. Based on which view that user is in, that user will see different fields of ObjType-#1 (the example I gave for FieldGroup-#1), or may not even see an object type that he was able to see in another view. If I have not lost you with the above, you can see that per view there can be many fields.

To make it yet more interesting, a field in FieldGroup-#1 may have the exact same name as a field in another FieldGroup, and the two could be of different types (one is a date, the other a string). Thus when I build my Solr doc object (and create the list of Solr fields), those fields must be prefixed with the FieldGroup name, otherwise I could end up overwriting the type of another field. Are you still with me? :-)

Now you see how a view can end up with many fields (over 3500 in my case), but a doc I post to Solr for indexing will have on average 50 fields, worst case maybe 200 fields. This is fine and is not my issue, but I want to call it out to get it out of our way.

Another thing I need to mention, in case it is not clear from the above: users create and edit records in the DB as instances of ObjType-#N. Those object type instances do NOT belong to a view; in fact they do NOT have any view concept in them. They simply have the concept of which fields the user can see / edit based on which view that user is in. In effect, in the DB, we have instances of object type data.

One last thing I should point out is that views and field groups are dynamic. This month, View-#3 may have ObjType-#1, but next month it may not, or a new object type may be added to it. Still with me? If so, you are my hero!!
:-) So, I set up my Solr schema.xml to include all fields off each field group that exists in the database, like so:

<field name="FieldGroup-1.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-1. ... ... ... ... />
<field name="FieldGroup-2.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2.Date" type="text" multiValued="true" indexed="true" stored="false" required="false"/>
<field name="FieldGroup-2. ... ... ... ... />
<field name="FieldGroup-3. ... ... ... ... />
<field name="FieldGroup-4. ... ... ... ... />

You get the idea. Each record of an object type I index contains ALL the fields of that object type, REGARDLESS of which view that object type is set
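As a minimal sketch of the copyField-per-field-group idea suggested in the reply above: each group's member fields could feed one aggregate field, so a view's qf lists one field per group instead of every member field. The "all_FieldGroup-N" names are hypothetical, not from Steven's schema:

```xml
<!-- Hypothetical aggregate field per field group; names are illustrative. -->
<field name="all_FieldGroup-1" type="text" multiValued="true" indexed="true" stored="false"/>
<field name="all_FieldGroup-3" type="text" multiValued="true" indexed="true" stored="false"/>

<!-- copyField accepts a trailing-asterisk glob in source, so each group's
     member fields feed its aggregate without listing them one by one. -->
<copyField source="FieldGroup-1.*" dest="all_FieldGroup-1"/>
<copyField source="FieldGroup-3.*" dest="all_FieldGroup-3"/>
```

A view's request handler could then set qf to just the aggregates of the field groups that view uses (e.g. qf=all_FieldGroup-1 all_FieldGroup-3), at the cost of extra index size and the reindex-on-view-change concern raised elsewhere in the thread.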
Re: When is too many fields in qf is too many?
I would reconsider the strategy of mashing so many different record types into one Solr collection. Sure, you get some advantage from denormalizing data, but if the downside cost gets too high, it may not make so much sense. I'd consider a collection per record type, or at least group similar record types, and then query as many collections - in parallel - as needed for a given user. That should also ensure that a query for a given record type is much faster as well. Surely you should be able to examine the query in the app and determine which record types it might apply to. When in doubt, make your schema as clean and simple as possible. Simplicity over complexity.

-- Jack Krupansky

On Thu, May 28, 2015 at 12:06 PM, Erick Erickson erickerick...@gmail.com wrote:

Gotta agree with Jack here. This is an insane number of fields; query performance on any significant corpus will be fraught, etc. The very first thing I'd look at is having that many fields. You have 3,500 different fields! Whatever the motivation for having that many fields is the place I'd start.

Best,
Erick

On Thu, May 28, 2015 at 5:50 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

This does not even pass a basic smell test for reasonability of matching the capabilities of Solr with the needs of your application. I'd like to hear from others, but I personally would be -1 on this approach to misusing qf. I'd simply say that you need to go back to the drawing board, and that your primary focus should be on working with your application product manager to revise your application requirements to more closely match the capabilities of Solr. To put it simply, if you have more than a dozen fields in qf, you're probably doing something wrong. In this case, horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me.
-- Jack Krupansky
Re: When is too many fields in qf is too many?
to reindex my entire database to reflect a view change, even when the actual data has not changed. 2) My Solr index size will now be larger, as I have to create a pseudo Solr field per view in my database and copyField into it. I have also considered creating multiple cores, one per view, but that still doesn't solve the above two issues of requiring a reindex and increasing the index size.

Now that you see what my backend application is like, let me know if you have any ideas on how you would solve this puzzle. And if you have read this all the way to the end, I salute you!!

Steve
RE: When is too many fields in qf is too many?
Still, it seems like the right direction. Does it smell OK to have a few hundred request handlers? Again, my logic is that if any given view requires no more than 50 fields, one request handler per view would work. This is different from a request handler per user category (which requires access to any number of views and, thus, many more fields). This does require a design change for Steven's application ...

Steven, do you have tables of the many-to-many relationships between fields and views, and between users and views? If so, you should be able to programmatically generate the request handlers. If these relationships change frequently, then some custom plugin will be required to access these tables at query time. See what I mean?

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, May 28, 2015 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in qf is too many?
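Charles's suggestion of generating one request handler per view from the field/view tables might look like the sketch below. The handler name and qf field list are hypothetical placeholders that a generator would fill in from those tables; the edismax/tie settings follow the ones discussed elsewhere in this thread:

```xml
<!-- Hypothetical generated handler for a single view; the name and the qf
     list would be emitted per view from the view/field-group tables. -->
<requestHandler name="/select-view1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">FieldGroup-1.Headline FieldGroup-1.Summary FieldGroup-3.Date</str>
    <float name="tie">1.0</float>
    <str name="fl">_UNIQUE_FIELD_,score</str>
  </lst>
</requestHandler>
```

Since views change monthly, regenerating these handlers and reloading the core would need to be part of the automated, version-controlled config workflow LAFK describes at the top of the thread.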
Re: When is too many fields in qf is too many?
Hi Charles,

That is what I have done. At the moment, I have 22 request handlers; some have 3490 field items in qf (that's the most, and that qf line spans over 95,000 characters in the solrconfig.xml file) and the smallest has 1341 fields. I'm working on seeing if I can use copyField to copy the data of each view's fields into a single pseudo-view-field and use that pseudo field in the qf of that view's request handler. The issue I still have outstanding with using copyField this way is that it could lead to a complete reindexing of all the data in a view when a field is added / removed from that view.

Thanks

Steve

On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ...

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Tuesday, May 26, 2015 4:48 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in qf is too many?

Thanks Doug. I might have to take you up on the hangout offer. Let me refine the requirement further, and if I still see the need, I will let you know.

Steve

On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW anything you put in the URL can also be put into a request handler. If you ever just want to have a 15-minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together.
-Doug

On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote:

Hi Doug,

I'm back to this topic. Unfortunately, due to my DB structure and business need, I will not be able to search against a single field (i.e., using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0; will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F1 F2 F3 F4 ... ... ...</str>
    <float name="tie">1.0</float>
    <str name="fl">_UNIQUE_FIELD_,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

Or must tie be passed as part of the URL?

Thanks

Steve

On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

Yeah, a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominate the summation. But it's something easy to try if you want to keep playing with dismax.

-Doug

On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote:

Hi Doug,

Your blog write-up on relevancy is very interesting; I didn't know this. Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based fields' data into a single field using copyField.

Thanks

Steve

On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner-takes-all point of view to search.
Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here:
http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/

I'm about to win the blasphemer merit badge, but ad-hoc all-field-like searching over many fields is actually a good use case for Elasticsearch's cross-field queries:
https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
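For reference, the way the tie parameter discussed in this exchange combines per-field scores can be written out - this is the standard disjunction-max formula, not anything specific to Steven's setup:

```latex
% DisMax score of a query term q over the qf fields F:
% the best-matching field wins, the other matching fields
% contribute scaled by tie.
\mathrm{score}(q) \;=\; \max_{f \in F} s_f(q) \;+\; \mathrm{tie} \cdot \sum_{\substack{f \in F \\ f \neq f_{\max}}} s_f(q)
```

With tie=0 only the best field counts (pure winner-takes-all); with tie=1.0 the field scores are effectively summed, which is what Doug describes above.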
Re: When is too many fields in qf is too many?
Gotta agree with Jack here. This is an insane number of fields, query performance on any significant corpus will be fraught etc. The very first thing I'd look at is having that many fields. You have 3,500 different fields! Whatever the motivation for having that many fields is the place I'd start. Best, Erick
RE: When is too many fields in qf is too many?
One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ...
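Charles's per-view handler idea could look roughly like this in solrconfig.xml. This is only a sketch: the handler name and field names below are made up for illustration, not taken from Steven's schema.

```xml
<!-- solrconfig.xml sketch: one handler per view, with that view's qf baked in.
     Handler name and field names are hypothetical. -->
<requestHandler name="/select-viewA" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title_t body_t author_s</str>
    <float name="tie">1.0</float>
  </lst>
</requestHandler>
```

The application would then route each user's query to the handler matching their current view, instead of sending thousands of field names on every request.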
Re: When is too many fields in qf is too many?
How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in the URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug
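Doug's tie advice comes down to how Lucene's DisjunctionMaxQuery combines per-field scores: the best field's score plus tie times the rest. A minimal Python sketch of that arithmetic (the function name is mine, not a Solr or Lucene API):

```python
def dismax_score(field_scores, tie):
    """Combine per-field scores the way dismax does: the best field's
    score, plus `tie` times the sum of the remaining field scores."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# tie=0.0 is pure winner-takes-all; tie=1.0 sums across all fields.
print(dismax_score([3.0, 1.0, 0.5], 0.0))  # 3.0
print(dismax_score([3.0, 1.0, 0.5], 1.0))  # 4.5
```

With tie=1.0 a document matching weakly in many of the thousands of qf fields can still outscore one matching strongly in a single field, which is why Doug suggests trying it rather than promising it fixes the scoring issue.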
Re: When is too many fields in qf is too many?
Thanks Doug. I might have to take you up on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve
Re: When is too many fields in qf is too many?
Hi Doug, I'm back to this topic. Unfortunately, due to my DB structure and business needs, I will not be able to search against a single field (i.e., using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0. Will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F1 F2 F3 F4 ... ... ...</str>
    <float name="tie">1.0</float>
    <str name="fl">_UNIQUE_FIELD_,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

Or must tie be passed as part of the URL? Thanks Steve
Re: When is too many fields in qf is too many?
Also, is this 1500 fields that are always populated, or are there really a larger number of different record types, each with a relatively small number of fields populated in a particular document? Answer: This is a large number of different record types, each with a relatively small number of fields in a particular document. Some documents will have 5 fields, others may have 50 (that's the average). Could you try to point to a real-world example of where your use case might apply, so we can relate to it? I'm indexing data off a DB, and all the fields of each record are indexed. The application is complex such that it has views and users belong to 1 or more views. Users can move between views and views can change over time. A user in view-A can see certain fields, while a user in view-B can see some other fields. So, when a user issues a search, I have to limit which fields that search is executed against. And like I said, because users can move between views, and views can change over time, the list of fields isn't static. This is why I have to pass the list of fields for each search based on the user's current view. I hope this gives context to the problem I'm trying to solve and describes why I'm using qf and why the list of fields may be long, because there is a case in which a user may belong to N - 1 views. Steve On Wed, May 20, 2015 at 11:14 AM, Jack Krupansky jack.krupan...@gmail.com wrote: The uf parameter is used to specify which fields a user may query against - the qf parameter specifies the set of fields that an unfielded query term must be queried against. The user is free to specify fielded query terms, like field1:term1 OR field2:term2. So, which use case are you really talking about? Could you try to point to a real-world example of where your use case might apply, so we can relate to it? Generally, I would say that a Solr document/collection should have no more than low hundreds of fields. It's not that you absolutely can't have more or absolutely can't have 5,000 or more, but simply that you will be asking for trouble, for example, with the cost of comprehending, maintaining, and communicating your solution with others, including this mailing list for support. What specifically pushed you to have documents with 1500 fields? Also, is this 1500 fields that are always populated, or are there really a larger number of different record types, each with a relatively small number of fields populated in a particular document? -- Jack Krupansky On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters. Given the above, besides the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc. If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and in that handler list the fields under qf. I have considered creating pseudo-fields for each group and then using copyField into that group. During search, I then can use qf against that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group dynamically change (at least once a month) and when they do change, I have to re-index everything (this I have to avoid) to sync that group-field. I'm using qf with edismax and my Solr version is 5.1. Thanks Steve
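The pseudo-field-per-group idea Steve describes would look something like this in schema.xml. All names here are illustrative only, not taken from his actual schema.

```xml
<!-- schema.xml sketch: one catch-all field per group; names are hypothetical. -->
<field name="group_a_all" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="F1" dest="group_a_all"/>
<copyField source="F2" dest="group_a_all"/>
<!-- Group-A searches then use qf=group_a_all instead of a long field list. -->
```

The drawback he notes still applies: copyField runs at index time, so changing which source fields feed a group requires re-indexing the affected documents.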
Re: When is too many fields in qf is too many?
Thanks Shawn. I have already switched to using POST because I need to send a long list of data in qf. My question isn't about POST / GET; it's about Solr and Lucene having to deal with such a long list of fields. Here is the text of my question reposted: Given the above, besides the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc. Steve On Wed, May 20, 2015 at 10:52 AM, Shawn Heisey apa...@elyograg.org wrote: You have two choices when queries become that large. One is to increase the max HTTP header size in the servlet container. In most containers, webservers, and proxy servers, this defaults to 8192 bytes. This is an approach that works very well, but will not scale to extremely large sizes. I have done this on my indexes, because I regularly have queries in the 20K range, but I do not expect them to get very much larger than this. The other option is to switch to sending a POST instead of a GET. The default max POST size that Solr sets is 2MB, which is plenty for just about any query, and can be increased easily to much larger sizes. If you are using SolrJ, switching to POST is very easy ... you'd need to research to figure out how if you're using another framework. Thanks, Shawn
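For reference, the 2MB POST limit Shawn mentions is set in solrconfig.xml. A sketch for Solr of this era; the raised value is illustrative, and the exact attributes should be double-checked against your version's reference guide:

```xml
<!-- solrconfig.xml sketch: raise the form-data POST limit (values in KB).
     2048 KB is the 2MB default; 10240 here is purely illustrative. -->
<requestDispatcher>
  <requestParsers formdataUploadLimitInKB="10240"
                  multipartUploadLimitInKB="2048000"/>
</requestDispatcher>
```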
Re: When is too many fields in qf is too many?
You may need to increase maxBooleanClauses beyond the default of 1024. There will be a message in the log if that is required. Note that such an increase must happen on EVERY config you have, or one of them may set it back to the 1024 default -- it's a global JVM-wide config. Large complex queries are usually slow, requiring more memory and CPU than simple queries, but if you have the resources, Solr will handle it just fine. Thanks, Shawn
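The maxBooleanClauses setting Shawn refers to lives in the <query> section of each core's solrconfig.xml; a sketch, with an illustrative value:

```xml
<query>
  <!-- Raise the Lucene BooleanQuery clause limit above the 1024 default.
       Because it is effectively JVM-wide in this Solr version, set it in
       EVERY core's config, as Shawn warns. 8192 is illustrative. -->
  <maxBooleanClauses>8192</maxBooleanClauses>
</query>
```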
Re: When is too many fields in qf is too many?
Yeah a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominating the summation. But something easy to try if you want to keep playing with dismax. -Doug On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, Your blog write up on relevancy is very interesting, I didn't know this. Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based-fields data into a single field using copyField. Thanks Steve On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner takes all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ I'm about to win the blashphemer merit badge, but ad-hoc all-field like searching over many fields is actually a good use case for Elasticsearch's cross field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/ It wouldn't be hard (and actually a great feature for the project) to get the Lucene query associated with cross field search into Solr. 
You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java Hope that helps -Doug

-- Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com | Author: Relevant Search (http://manning.com/turnbull) from Manning Publications. This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters. Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.

If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and in that handler list the fields under qf. I have considered creating pseudo-fields for each group and then using copyField into that group. During search, I then can qf against that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group dynamically change (at least once a month) and when they do change, I have to re-index everything (this I have to avoid) to sync that group-field.
I'm using qf with edismax and my Solr version is 5.1. Thanks, Steve
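[Editor's note] For reference, the copyField approach discussed above might look like this in the schema, with tie set as a query default on the dismax side. All field, type, and group names here are made up for illustration:

```xml
<!-- schema.xml: funnel one group's fields into a single catch-all
     field. "all_group_a" and the source names are hypothetical. -->
<field name="all_group_a" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="title" dest="all_group_a"/>
<copyField source="body"  dest="all_group_a"/>

<!-- solrconfig.xml (inside a requestHandler): alternatively keep qf
     and soften dismax's winner-takes-all scoring; tie=1.0 sums the
     per-field match scores instead of taking only the best one. -->
<lst name="defaults">
  <str name="defType">edismax</str>
  <str name="tie">1.0</str>
</lst>
```

Note the stored="false": a catch-all field used only for matching does not need to be stored, which limits the space cost of the duplication.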
Re: When is too many fields in qf is too many?
Hi Doug, Your blog write-up on relevancy is very interesting; I didn't know this. Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based fields' data into a single field using copyField. Thanks, Steve

On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner-takes-all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here: http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ I'm about to win the blasphemer merit badge, but ad-hoc all-field-like searching over many fields is actually a good use case for Elasticsearch's cross-field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/ It wouldn't be hard (and actually a great feature for the project) to get the Lucene query associated with cross-field search into Solr. You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java Hope that helps -Doug
On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters.

Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.

If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and in that handler list the fields under qf. I have considered creating pseudo-fields for each group and then using copyField into that group. During search, I then can qf against that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group dynamically change (at least once a month) and when they do change, I have to re-index everything (this I have to avoid) to sync that group-field.

I'm using qf with edismax and my Solr version is 5.1. Thanks, Steve
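[Editor's note] Steve's handler-per-group alternative would look roughly like this in solrconfig.xml. The handler and field names are hypothetical:

```xml
<!-- solrconfig.xml: one handler per group with its qf list baked in,
     so clients send only a short request. Names are examples. -->
<requestHandler name="/select-groupA" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">fieldA1 fieldA2 fieldA3</str>
  </lst>
</requestHandler>
```

One property worth noting: when a group's field list changes monthly, only this config needs regenerating and the core reloading; unlike the copyField variant, changing qf does not require a re-index.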
Re: When is too many fields in qf is too many?
Thanks for calling out maxBooleanClauses. The current default of 1024 has not caused me any issues (so far) in my testing. However, you probably saw Doug Turnbull's reply; it looks like my relevance will suffer. Steve

On Wed, May 20, 2015 at 11:42 AM, Shawn Heisey apa...@elyograg.org wrote: On 5/20/2015 9:24 AM, Steven White wrote: I have already switched to using POST because I need to send a long list of data in qf. My question isn't about POST / GET, it's about Solr and Lucene having to deal with such a long list of fields. Here is the text of my question reposted: Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.

You may need to increase maxBooleanClauses beyond the default of 1024. There will be a message in the log if that is required. Note that such an increase must happen on EVERY config you have, or one of them may set it back to the 1024 default -- it's a global JVM-wide config. Large complex queries are usually slow, requiring more memory and CPU than simple queries, but if you have the resources, Solr will handle it just fine. Thanks, Shawn
Re: When is too many fields in qf is too many?
The uf parameter is used to specify which fields a user may query against; the qf parameter specifies the set of fields that an unfielded query term must be queried against. The user is free to specify fielded query terms, like field1:term1 OR field2:term2. So, which use case are you really talking about? Could you try to point to a real-world example of where your use case might apply, so we can relate to it?

Generally, I would say that a Solr document/collection should have no more than low hundreds of fields. It's not that you absolutely can't have more or absolutely can't have 5,000 or more, but simply that you will be asking for trouble, for example, with the cost of comprehending, maintaining, and communicating your solution with others, including this mailing list for support. What specifically pushed you to have documents with 1500 fields? Also, is this 1500 fields that are always populated, or are there really a larger number of different record types, each with a relatively small number of fields populated in a particular document? -- Jack Krupansky

On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters. Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.
If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and in that handler list the fields under qf. I have considered creating pseudo-fields for each group and then using copyField into that group. During search, I then can qf against that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group dynamically change (at least once a month) and when they do change, I have to re-index everything (this I have to avoid) to sync that group-field.

I'm using qf with edismax and my Solr version is 5.1. Thanks, Steve
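[Editor's note] Jack's uf/qf distinction can be enforced server-side in the same per-group-handler style Steven describes. A sketch with hypothetical handler and field names:

```xml
<!-- solrconfig.xml: pin uf as an invariant so fielded query terms
     such as field1:term1 cannot reach fields outside the group's
     allowed set. Handler and field names are examples. -->
<requestHandler name="/select-groupB" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="uf">fieldB1 fieldB2</str>
  </lst>
</requestHandler>
```

Because invariants cannot be overridden by request parameters, this keeps the per-group field restriction out of the client's hands entirely.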
Re: When is too many fields in qf is too many?
Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner-takes-all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here: http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/

I'm about to win the blasphemer merit badge, but ad-hoc all-field-like searching over many fields is actually a good use case for Elasticsearch's cross-field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/ It wouldn't be hard (and actually a great feature for the project) to get the Lucene query associated with cross-field search into Solr. You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java Hope that helps -Doug

-- Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com | Author: Relevant Search (http://manning.com/turnbull) from Manning Publications

On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf.
What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters. Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.

If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and in that handler list the fields under qf. I have considered creating pseudo-fields for each group and then using copyField into that group. During search, I then can qf against that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group dynamically change (at least once a month) and when they do change, I have to re-index everything (this I have to avoid) to sync that group-field.

I'm using qf with edismax and my Solr version is 5.1. Thanks, Steve
Re: When is too many fields in qf is too many?
On 5/20/2015 6:27 AM, Steven White wrote: My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters long; in effect the data passed via qf will be over 20K characters. Given the above, beside the fact that a search for apple translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if any? Will I hit some kind of a limit? Will each search now require more CPU cycles? Memory? Etc.

You have two choices when queries become that large. One is to increase the max HTTP header size in the servlet container. In most containers, webservers, and proxy servers, this defaults to 8192 bytes. This is an approach that works very well, but will not scale to extremely large sizes. I have done this on my indexes, because I regularly have queries in the 20K range, but I do not expect them to get very much larger than this.

The other option is to switch to sending a POST instead of a GET. The default max POST size that Solr sets is 2MB, which is plenty for just about any query, and can be increased easily to much larger sizes. If you are using SolrJ, switching to POST is very easy ... you'd need to research to figure out how if you're using another framework.

Thanks, Shawn
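[Editor's note] Shawn's SolrJ suggestion amounts to one extra argument at query time. A sketch, assuming SolrJ 5.x on the classpath; the core URL and field names are illustrative, and this has not been run against a live server:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL; adjust to your deployment.
        HttpSolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/mycore");
        SolrQuery q = new SolrQuery("apple");
        q.set("defType", "edismax");
        q.set("qf", "field1 field2 field3"); // imagine 1500 field names here
        // METHOD.POST puts the parameters in the request body, so a 20K
        // qf list never hits the HTTP header-size limit.
        QueryResponse rsp = client.query(q, SolrRequest.METHOD.POST);
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}
```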