Re: Using payloads and user provided data in score

2015-07-23 Thread Jamie Johnson
Well you've at least confirmed what I was thinking :).

I am using payloads now for this and I think I have something very basic
working.  Results don't get dropped when their score is 0, so I also had
to write a custom collector that drops docs with a 0 score and plug it in
through the AnalyticsQuery API (maybe there is somewhere better).

On a side note, it would be really nice to be able to plug in a custom
collector somewhere; I couldn't find anywhere to do that without using the
AnalyticsQuery API.  I had hoped to use the PositiveScoresOnlyCollector so
that I wouldn't have to write anything myself, but I didn't see where I
could plug it in.
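
A minimal sketch of the "drop docs with a 0 score" idea, written in plain Java so it runs without Solr or Lucene on the classpath. `HitSink`, `PositiveOnly` and `keepPositive` are hypothetical names invented for illustration; they only mirror the wrapping pattern and are not the Lucene or AnalyticsQuery API.

```java
import java.util.ArrayList;
import java.util.List;

final class ZeroScoreFilter {
    /** Hypothetical stand-in for a hit collector; this is not Lucene's Collector API. */
    interface HitSink {
        void collect(int docId, float score);
    }

    /** Forwards a hit to the wrapped sink only when its score is strictly positive. */
    static final class PositiveOnly implements HitSink {
        private final HitSink delegate;

        PositiveOnly(HitSink delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int docId, float score) {
            if (score > 0f) {
                delegate.collect(docId, score);
            }
        }
    }

    /** Demo helper: runs parallel arrays of ids and scores through the filter. */
    static List<Integer> keepPositive(int[] docIds, float[] scores) {
        List<Integer> kept = new ArrayList<>();
        HitSink filter = new PositiveOnly((docId, score) -> kept.add(docId));
        for (int i = 0; i < docIds.length; i++) {
            filter.collect(docIds[i], scores[i]);
        }
        return kept;
    }
}
```

The filter sits in front of whatever collects the final results, so documents whose only matching terms were zeroed out never reach the result list.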

Again I really appreciate all of the feedback on this!

On Thu, Jul 23, 2015 at 12:30 PM, Erick Erickson erickerick...@gmail.com
wrote:

 bq: Your ugly problem is my situation I think ;)

 No, your problem is much worse ;(

 The _contents_ of fields are restricted, which is
 horrible.

 OK, here's another idea out of waaay left field: Payloads.

 It hinges on there being an OK number of possible combinations
 which seems to be the case here. OK here means < 1B say. It
 also hinges on being able to pre-calculate the access rights for
 each term as you index it.

 Then you attach a payload to each term which is, in effect, the
 authorization token for that term that expresses your possibilities,
 A, B, AB, A|B, whatever. Payloads are simply a float that
 gets carried along with the term and is accessible at scoring
 time.

 Now at scoring time, you drop out any terms that have bad
 auth tokens. WARNING: this is totally off the top of my head,
 so I'm sure there are gotchas in here. Like does returning 0
 from the scoring negate the search.

 No clue whether this can work for you, but here's some sample
 code that could give you an idea of how it all works:
 https://lucidworks.com/blog/end-to-end-payload-example-in-solr/

 Good Luck. You're going places Solr wasn't designed to deal
 with so whatever you do will be exciting. And you're right,
 creating huge clauses will be a performance issue, the payloads
 thing may help you tame that.

 Best,
 Erick


Re: Using payloads and user provided data in score

2015-07-23 Thread Jamie Johnson
Sorry for being vague, I'll try to explain more.  In my use case a
particular field does not have a security control, it's the data in the
field.  So for instance if I had a schema with a field called name, there
could be data that should be secured at A, B, AB, A|B, etc within that
field.  So again it's not the field that has this control it's the data in
the field.  My thought based on your suggestion was to dynamically generate
the fields based on the authorizations, this way the user would only see
name, but it would get translated to the fields in the index that they can
see.  So at index time if a field was added to the solr document that said
name:foo with authorizations AB I would need to translate that to
name_AB_txt:foo.  Then subsequently on search I would check what fields in
the index the user should be able to see and rewrite queries that said
name:foo to name_AB_txt:foo (assuming the user can see AB).
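
The translation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the class and method names are hypothetical, markings are passed in as plain strings such as "A" or "AB", and the value is assumed to be already escaped for the query parser.

```java
import java.util.ArrayList;
import java.util.List;

final class SecuredFieldRewriter {
    /** Index-time name: field "name" with marking "AB" becomes "name_AB_txt". */
    static String indexedField(String field, String marking) {
        return field + "_" + marking + "_txt";
    }

    /**
     * Expands field:value into an OR over the markings the user may see.
     * Returns an empty string when no marking is visible; the caller decides
     * what to do in that case.
     */
    static String rewrite(String field, String value, List<String> visibleMarkings) {
        List<String> clauses = new ArrayList<>();
        for (String marking : visibleMarkings) {
            clauses.add(indexedField(field, marking) + ":" + value);
        }
        if (clauses.isEmpty()) {
            return "";
        }
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }
}
```

Every visible marking adds one clause per fielded term, which is where the query expansion comes from when a user holds many authorizations.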

We do not explicitly control the fields the user or calling application has
access to, because I don't want to expose the name_AB_txt fields to calling
applications.  They only know that a field called name exists, so I need to
translate a name:foo query into the appropriately controlled version.  Does
that make sense?

My biggest concern with this (beyond the query rewrite) is how it will
impact scoring (especially in the case information is available with
multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a
value of foo and the user has authorizations A and B) and possibly bumping
up against the maximum clause limit as we expand the query.

These are the reasons I thought it best to use payloads: terms carrying
authorizations a user can't see would not impact the score, and I would then
resolve the actual object the user can see using a store that already
supports this type of access pattern (specifically Accumulo in this case).

Your ugly problem is my situation I think ;)

On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson erickerick...@gmail.com
wrote:

 I'm not quite getting it here. I'm guessing that you do not
 allow fielded queries or you strictly control the fields a user
 sees to pick from. Otherwise your security stuff goes out the
 window, say you have a drop-down list of fields to choose from
 or something.

 Assuming you do NOT have such a thing, the user is just typing
 words in a box, then you have to figure out, once at the
 app layer, what fields they have access to and just append a
 qf=field_secure1,field_secure2.
 parameter to the query.

 That's it. You do not have to rewrite the user query at all, the q
 parameter is just passed through as is.

 bq:  I guess in a search component I could look up all of the fields
 that are in the index and only run queries against fields they should be
 able to see once I know what is in the index (this is what you're
 suggesting right?).

 Kind of, except not in a search component. You have to have modeled
 the access rights somewhere, so I'm not getting why you can't just use
 that model to generate the list of restricted fields the user has access
 to. You haven't explained that model other than to say it's complex. So I
 have no clue whether you're talking about not _knowing_ what fields are
 in the docs in the first place (quite possible with dynamic fields) or
 whether you do know the complete field list but calculating the user's
 access rights to which fields is complex.

 But I should emphasize again that my assumption is that once calculated,
 this list is invariant so it does not need to be done for every request.
 Indeed, what I'm envisioning is not writing any Solr code at all, all done
 in the app layer.

 As far as extra work, there isn't any as far as Solr is concerned.
 It's exactly as though you were specifying this in, say, the request
 handler. So I don't get your concern about lots and lots of fields.
 Now, I'm assuming a simple document model with some number
 of fields. The access rights to which of those fields a user can
 see may be a complex calculation, but again you only need to do it
 once. For that matter, you could pre-calculate that set of fields
 or otherwise cache it.

 Now, this breaks down if the document model isn't that simple,
 say the same field in doc1 can be seen by userX, but userX
 can't see the _same_ field in doc2. That's an ugly problem...

 And let's further say there are a number of fields that _everyone_
 can see. They can be placed in an appends section of the request
 handler so you don't have to specify them for each request.

 Best,
 Erick


Re: Using payloads and user provided data in score

2015-07-23 Thread Erick Erickson
bq: Your ugly problem is my situation I think ;)

No, your problem is much worse ;(

The _contents_ of fields are restricted, which is
horrible.

OK, here's another idea out of waaay left field: Payloads.

It hinges on there being an OK number of possible combinations
which seems to be the case here. OK here means < 1B say. It
also hinges on being able to pre-calculate the access rights for
each term as you index it.

Then you attach a payload to each term which is, in effect, the
authorization token for that term that expresses your possibilities,
A, B, AB, A|B, whatever. Payloads are simply a float that
gets carried along with the term and is accessible at scoring
time.

Now at scoring time, you drop out any terms that have bad
auth tokens. WARNING: this is totally off the top of my head,
so I'm sure there are gotchas in here. For example, does returning 0
from the scoring negate the search?
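
To make the idea concrete, here is a rough sketch of the check that would run at scoring time, in plain Java with no Lucene dependency. Everything in it is an assumption made for illustration: the bit assignments for A and B, the token table, and the method names are hypothetical, and the payload is taken to be a small integer token id stored as a float.

```java
final class AuthPayload {
    // Hypothetical bit assignments for the two authorizations in the thread.
    static final int A = 1;
    static final int B = 1 << 1;

    /**
     * Pre-calculated token table. Each token lists alternative bit masks;
     * the token is satisfied when ANY mask is fully contained in the user's bits.
     */
    private static final int[][] TOKENS = {
        {A},      // token 0: "A"
        {B},      // token 1: "B"
        {A | B},  // token 2: "AB", both authorizations required
        {A, B},   // token 3: "A|B", either authorization is enough
    };

    static boolean allowed(int token, int userBits) {
        for (int mask : TOKENS[token]) {
            if ((userBits & mask) == mask) {
                return true;
            }
        }
        return false;
    }

    /** Score multiplier for a term: 1 keeps its contribution, 0 removes it. */
    static float scoreFactor(float payload, int userBits) {
        int token = (int) payload; // small ints survive the float round trip exactly
        return allowed(token, userBits) ? 1f : 0f;
    }
}
```

With this shape, the per-term work is a table lookup and a few bit tests, and the number of distinct tokens is the "number of possible combinations" the approach hinges on.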

No clue whether this can work for you, but here's some sample
code that could give you an idea of how it all works:
https://lucidworks.com/blog/end-to-end-payload-example-in-solr/

Good Luck. You're going places Solr wasn't designed to deal
with so whatever you do will be exciting. And you're right,
creating huge clauses will be a performance issue, the payloads
thing may help you tame that.

Best,
Erick

On Thu, Jul 23, 2015 at 7:30 AM, Jamie Johnson jej2...@gmail.com wrote:
 Sorry for being vague, I'll try to explain more.  In my use case a
 particular field does not have a security control, it's the data in the
 field.  So for instance if I had a schema with a field called name, there
 could be data that should be secured at A, B, AB, A|B, etc within that
 field.  So again it's not the field that has this control it's the data in
 the field.  My thought based on your suggestion was to dynamically generate
 the fields based on the authorizations, this way the user would only see
 name, but it would get translated to the fields in the index that they can
 see.  So at index time if a field was added to the solr document that said
 name:foo with authorizations AB I would need to translate that to
 name_AB_txt:foo.  Then subsequently on search I would check what fields in
 the index the user should be able to see and rewrite queries that said
 name:foo to name_AB_txt:foo (assuming the user can see AB).

 We do not explicitly control the fields the user or calling application has
 access to because I don't want to expose the name_AB_txt:foo fields to
 calling applications, they know that a field name exists, based on that I
 need to translate a name:foo query into the appropriately controlled
 version.  Does that make sense?

 My biggest concern with this (beyond the query rewrite) is how it will
 impact scoring (especially in the case information is available with
 multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a
 value of foo and the user has authorizations A and B) and possibly bumping
 up against the maximum clause limit as we expand the query.

 These reasons were why I thought it best to use payloads to make terms with
 authorizations a user can't see not impact the score and then resolve the
 actual object the user can see using a store that already supports this
 type of access pattern (specifically Accumulo in this case).

 Your ugly problem is my situation I think ;)


Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
I am looking for a way to prevent fields that users shouldn't be able to
know exist from contributing to the score.  The goal is to provide a way to
essentially hide certain fields from requests based on an access level
provided on the query.  I have managed to make terms that users shouldn't
be able to see not impact the score by implementing a custom Similarity
class that looks at the terms' payloads and returns 0 for the score if they
shouldn't know the field exists.  The issue, however, is that I don't have
access to the request at this point, so getting the user's access level is
proving problematic.  Is there a way to get the current request that is
being processed via some thread local variable or something similar that
Solr maintains?  If not is there another approach that I could be using to
access information from the request within my Similarity implementation?
Any thoughts on this would be greatly appreciated.

-Jamie


Re: Using payloads and user provided data in score

2015-07-22 Thread Erick Erickson
Why don't you handle it all at the app level? Here's what I mean:

I'm assuming that you're using edismax here, but the same principle
applies if not.

Your handler (say the /select handler) has a qf parameter which defines
the fields that are searched over in the absence of a field qualifier, e.g.
q=whatever&qf=title,description

causes the search term to be looked for in the two fields title and
description. You can also set up the qf fields in the /select handler as
one of the items in the defaults section.

But the qf param in the defaults section is just that... a default, so
individual queries can override it. What I have in mind is that you'd look
up the user's field-access list, append that list as necessary to the
query, and just pass it on through.
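
A small sketch of what "append the list and pass it on through" could look like at the app layer. The class and method names are hypothetical. One assumption to check against your Solr version: the Solr reference guide describes qf as a whitespace-separated list, so this sketch joins the fields with a space rather than the comma written in the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

final class QfBuilder {
    /** Builds the q and qf parameters; the user's query text is passed through untouched. */
    static String queryString(String userQuery, List<String> allowedFields) {
        String qf = String.join(" ", allowedFields);
        return "q=" + encode(userQuery) + "&qf=" + encode(qf);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The allowed-field list is the only per-user input; nothing in the q parameter is parsed or rewritten here.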

Things to watch out for:
1> If the user specifies a field, you'll have to strip that off if they
don't have rights, i.e. q=field1:whatever whenever
ignores the qf parameter for whatever but does respect the qf param for
whenever.
2> If you have some kind of date field, say, that you want to facet over,
you'd have to control that.
3> If you have a bag of words where you use copyField to add a bunch of
fields' data to an uber-field, then the user can infer some things from
that info, so you probably want to be careful about what copyFields you
use.
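
For the first item in the list above, a deliberately naive sketch of stripping field qualifiers the user has no rights to. It is an illustration only: the names are hypothetical, it splits on whitespace, and it does not understand phrases, parentheses, ranges or escaping, so a real implementation would need a proper query parser. It removes just the "field:" prefix so the term falls back to the qf fields; dropping the whole clause instead is an equally plausible reading.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldQualifierStripper {
    // Optional +/- operator, a field name, a colon, then the rest of the token.
    private static final Pattern FIELDED =
        Pattern.compile("^([+-]?)([A-Za-z_][A-Za-z0-9_.]*):(.+)$");

    /** Removes the "field:" prefix from tokens whose field is not in the allowed set. */
    static String strip(String query, Set<String> allowedFields) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            Matcher m = FIELDED.matcher(token);
            String kept = token;
            if (m.matches() && !allowedFields.contains(m.group(2))) {
                kept = m.group(1) + m.group(3); // keep the operator and the term
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(kept);
        }
        return out.toString();
    }
}
```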

Best,
Erick

On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson jej2...@gmail.com wrote:
 I am looking for a way to prevent fields that users shouldn't be able to
 know exist from contributing to the score.  The goal is to provide a way to
 essentially hide certain fields from requests based on an access level
 provided on the query.  I have managed to make terms that users shouldn't
 be able to see not impact the score by implementing a custom Similarity
 class that looks at the terms payloads and returns 0 for the score if they
 shouldn't know the field exists.  The issue however is that I don't have
 access to the request at this point so getting the users access level is
 proving problematic.  Is there a way to get the current request that is
 being processed via some thread local variable or something similar that
 Solr maintains?  If not is there another approach that I could be using to
 access information from the request within my Similarity implementation?
 Any thoughts on this would be greatly appreciated.

 -Jamie


Re: Using payloads and user provided data in score

2015-07-22 Thread Erick Erickson
I'm not quite getting it here. I'm guessing that you do not
allow fielded queries or you strictly control the fields a user
sees to pick from. Otherwise your security stuff goes out the
window, say you have a drop-down list of fields to choose from
or something.

Assuming you do NOT have such a thing, the user is just typing
words in a box, then you have to figure out, once at the
app layer, what fields they have access to and just append a
qf=field_secure1,field_secure2.
parameter to the query.

That's it. You do not have to rewrite the user query at all, the q
parameter is just passed through as is.

bq:  I guess in a search component I could look up all of the fields
that are in the index and only run queries against fields they should be
able to see once I know what is in the index (this is what you're
suggesting right?).

Kind of, except not in a search component. You have to have modeled
the access rights somewhere, so I'm not getting why you can't just use
that model to generate the list of restricted fields the user has access to.
You haven't explained that model other than to say it's complex. So I
have no clue whether you're talking about not _knowing_ what fields are
in the docs in the first place (quite possible with dynamic fields) or
whether you do know the complete field list but calculating the user's access
rights to which fields is complex.

But I should emphasize again that my assumption is that once calculated,
this list is invariant so it does not need to be done for every request. Indeed,
what I'm envisioning is not writing any Solr code at all, all done in
the app layer.

As far as extra work, there isn't any as far as Solr is concerned.
It's exactly as though you were specifying this in, say, the request
handler. So I don't get your concern about lots and lots of fields.
Now, I'm assuming a simple document model with some number
of fields. The access rights to which of those fields a user can
see may be a complex calculation, but again you only need to do it
once. For that matter, you could pre-calculate that set of fields
or otherwise cache it.
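
A sketch of the "calculate once, then cache" point, assuming the per-user field list really is invariant as stated above. `FieldAccessCache` and the resolver function are hypothetical; the resolver stands in for whatever access-rights model the application already has.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class FieldAccessCache {
    private final ConcurrentMap<String, List<String>> byUser = new ConcurrentHashMap<>();
    private final Function<String, List<String>> resolver;

    /** resolver performs the (possibly expensive) access-rights calculation for one user. */
    FieldAccessCache(Function<String, List<String>> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached field list, computing it on the first request for that user. */
    List<String> fieldsFor(String userId) {
        return byUser.computeIfAbsent(userId, resolver);
    }
}
```

If access rights can change, entries would need an expiry or an explicit invalidation hook, which this sketch leaves out.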

Now, this breaks down if the document model isn't that simple,
say the same field in doc1 can be seen by userX, but userX
can't see the _same_ field in doc2. That's an ugly problem...

And let's further say there are a number of fields that _everyone_
can see. They can be placed in an appends section of the request
handler so you don't have to specify them for each request.

Best,
Erick

On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson jej2...@gmail.com wrote:
 Looks like this may be what I'm looking for

 *SolrRequestInfo*

 I have not tried this yet but looks promising.

 Assuming this works, thinking about your suggestion I would need to rewrite
 the users query with the appropriate fields, are there any utilities for
 doing this?  I'd be looking to rewrite a fielded query like +field:value
 possibly to something like +(field.secure:value field.secure2:value)

 Again thanks for suggestions

Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
I answered my own question, looks like the field infos are always read
within the IndexSearcher so that cost is already being paid.

I would potentially have to duplicate information in multiple fields if it
was present at multiple authorization levels, is there a limit to the
number of fields within a document?  I'm also concerned this might skew my
search results as terms that had more authorizations would appear in more
fields and would result in more matches on query.  I'll play with this a
little but I am still wondering about my original question.

On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson jej2...@gmail.com wrote:

 I had thought about this in the past, but thought it might be too
 expensive.  I guess in a search component I could look up all of the fields
 that are in the index and only run queries against fields they should be
 able to see once I know what is in the index (this is what you're
 suggesting right?).

 My concern would be that the number of fields per document would grow too
 large to support this.  Our controls aren't simple like user or admin they
 are complex combinations of authorizations so I would think there might be
 a large number of fields that are generated using this approach.  Would
 retrieving all field infos from Solr be expensive on each request to see
 what they should be able to query?






Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
I had thought about this in the past, but thought it might be too
expensive.  I guess in a search component I could look up all of the fields
that are in the index and only run queries against fields they should be
able to see once I know what is in the index (this is what you're
suggesting right?).

My concern would be that the number of fields per document would grow too
large to support this.  Our controls aren't simple like user or admin they
are complex combinations of authorizations so I would think there might be
a large number of fields that are generated using this approach.  Would
retrieving all field infos from Solr be expensive on each request to see
what they should be able to query?

On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Why don't you handle it all at the app level? Here's what I mean:

 I'm assuming that you're using edismax here, but the same principle
 applies if not.

 Your handler (say the /select handler) has a qf parameter which defines
 the fields that are searched over in the absence of a field qualifier, e.g.
 q=whatever&qf=title,description

 causes the search term to be looked for in the two fields title and
 description. You can also set up the qf fields in the /select handler as
 one of the items in the defaults section.

 But the qf param in the defaults section is just that... a default, so
 individual queries can override it. What I have in mind is that you'd look
 up the user's field-access list, append that list as necessary to the
 query, and just pass it on through.

 Things to watch out for:
 1> If the user specifies a field, you'll have to strip that off if they
 don't have rights, i.e. q=field1:whatever whenever
 ignores the qf parameter for whatever but does respect the qf param for
 whenever.
 2> If you have some kind of date field, say, that you want to facet over,
 you'd have to control that.
 3> If you have a bag of words where you use copyField to add a bunch of
 fields' data to an uber-field, then the user can infer some things from
 that info, so you probably want to be careful about what copyFields you
 use.

 Best,
 Erick




Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
Looks like this may be what I'm looking for

*SolrRequestInfo*

I have not tried this yet but looks promising.
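
For readers unfamiliar with the pattern being asked about, here is a plain-Java sketch of a thread-local request context. It is not Solr code: `RequestContext`, `accessLevel` and `scoreFor` are hypothetical names, and the assumption that scoring runs on the same thread that handled the request would need to be verified for SolrRequestInfo in your setup.

```java
final class RequestContext {
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    private final String accessLevel;

    RequestContext(String accessLevel) {
        this.accessLevel = accessLevel;
    }

    String accessLevel() {
        return accessLevel;
    }

    /** Called when request handling starts on this thread. */
    static void set(RequestContext ctx) {
        CURRENT.set(ctx);
    }

    /** Returns the context bound to the calling thread, or null if none is bound. */
    static RequestContext get() {
        return CURRENT.get();
    }

    /** Called when request handling ends, so pooled threads don't leak state. */
    static void clear() {
        CURRENT.remove();
    }

    /** What a scoring hook could do: zero the score unless the levels match. */
    static float scoreFor(String requiredLevel, float rawScore) {
        RequestContext ctx = get();
        boolean visible = ctx != null && ctx.accessLevel().equals(requiredLevel);
        return visible ? rawScore : 0f;
    }
}
```

Code deep in the scoring path can then read the access level without having the request passed to it, at the cost of depending on which thread it runs on.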

Assuming this works, and thinking about your suggestion, I would need to
rewrite the user's query with the appropriate fields. Are there any
utilities for doing this?  I'd be looking to rewrite a fielded query like
+field:value possibly to something like +(field.secure:value field.secure2:value)

Again thanks for suggestions
On Jul 22, 2015 5:20 PM, Jamie Johnson jej2...@gmail.com wrote:

 I answered my own question, looks like the field infos are always read
 within the IndexSearcher so that cost is already being paid.

 I would potentially have to duplicate information in multiple fields if it
 was present at multiple authorization levels, is there a limit to the
 number of fields within a document?  I'm also concerned this might skew my
 search results as terms that had more authorizations would appear in more
 fields and would result in more matches on query.  I'll play with this a
 little but I am still wondering about my original question.
