Re: Using payloads and user provided data in score

2015-07-23 Thread Jamie Johnson
Well you've at least confirmed what I was thinking :).

I am using payloads for this now and I think I have something very basic
working.  Results don't get dropped when their score is 0, so I also had
to write a custom collector that drops docs with a 0 score and plug it
into the AnalyticsQueryAPI (maybe there is somewhere better).

On a side note, it would be really nice to be able to plug in a custom
collector somewhere; I couldn't find anywhere to do that without using the
AnalyticsQueryAPI.  I had hoped to use the PositiveScoresOnlyCollector so
that I wouldn't have to write anything, but I didn't see where I could do that.
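
For anyone following along, the idea of a collector that drops zero-score
docs can be sketched in plain Java.  This is only an illustration of the
wrapping pattern: HitSink, PositiveScoreSink and PositiveScoreDemo are
made-up names, not Lucene's Collector API or the AnalyticsQueryAPI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a hit collector; NOT Lucene's Collector API. */
interface HitSink {
    void collect(int docId, float score);
}

/** Forwards only hits with a strictly positive score to the wrapped sink. */
final class PositiveScoreSink implements HitSink {
    private final HitSink delegate;

    PositiveScoreSink(HitSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(int docId, float score) {
        if (score > 0.0f) {
            delegate.collect(docId, score);
        }
    }
}

final class PositiveScoreDemo {
    /** Runs the given hits through the filter and returns the surviving doc ids. */
    static List<Integer> survivingDocs(int[] docIds, float[] scores) {
        List<Integer> kept = new ArrayList<>();
        HitSink filter = new PositiveScoreSink((doc, score) -> kept.add(doc));
        for (int i = 0; i < docIds.length; i++) {
            filter.collect(docIds[i], scores[i]);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Doc 2 has a score of 0 and is dropped.
        System.out.println(survivingDocs(new int[] {1, 2, 3}, new float[] {0.7f, 0.0f, 1.2f}));
    }
}
```

The filter wraps whatever would normally receive the hits, so nothing
downstream needs to know that docs were dropped.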

Again I really appreciate all of the feedback on this!

On Thu, Jul 23, 2015 at 12:30 PM, Erick Erickson 
wrote:

> bq: Your "ugly problem" is my situation I think ;)
>
> No, your problem is much worse ;(
>
> The _contents_ of fields are restricted, which is
> horrible.
>
> OK, here's another idea out of waaay left field: Payloads.
>
> It hinges on there being an OK number of possible combinations
> which seems to be the case here. "OK" here means < 1B say. It
> also hinges on being able to pre-calculate the access rights for
> each term as you index it.
>
> Then you attach a payload to each term which is, in effect, the
> authorization token for that term that expresses your possibilities,
> A, B, A&B, A|B, whatever. Payloads are simply a float that
> gets carried along with the term and is accessible at scoring
> time.
>
> Now at scoring time, you "drop out" any terms that have "bad"
> auth tokens. WARNING: this is totally off the top of my head,
> so I'm sure there are gotchas in here. Like does returning 0
> from the scoring negate the search.
>
> No clue whether this can work for you, but here's some sample
> code that could give you an idea of how it all works:
> https://lucidworks.com/blog/end-to-end-payload-example-in-solr/
>
> Good Luck. You're going places Solr wasn't designed to deal
> with so whatever you do will be "exciting". And you're right,
> creating huge clauses will be a performance issue, the payloads
> thing may help you tame that.
>
> Best,
> Erick
>
> On Thu, Jul 23, 2015 at 7:30 AM, Jamie Johnson  wrote:
> > Sorry for being vague, I'll try to explain more.  In my use case a
> > particular field does not have a security control, it's the data in the
> > field.  So for instance if I had a schema with a field called name, there
> > could be data that should be secured at A, B, A&B, A|B, etc within that
> > field.  So again it's not the field that has this control it's the data in
> > the field.  My thought based on your suggestion was to dynamically generate
> > the fields based on the authorizations, this way the user would only see
> > name, but it would get translated to the fields in the index that they can
> > see.  So at index time if a field was added to the solr document that said
> > name:foo with authorizations A&B I would need to translate that to
> > name_A&B_txt:foo.  Then subsequently on search I would check what fields in
> > the index the user should be able to see and rewrite queries that said
> > name:foo to name_A&B_txt:foo (assuming the user can see A&B).
> >
> > We do not explicitly control the fields the user or calling application has
> > access to because I don't want to expose the name_A&B_txt:foo fields to
> > calling applications, they know that a field "name" exists, based on that I
> > need to translate a name:foo query into the appropriately controlled
> > version.  Does that make sense?
> >
> > My biggest concern with this (beyond the query rewrite) is how it will
> > impact scoring (especially in the case information is available with
> > multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a
> > value of foo and the user has authorizations A and B) and possibly bumping
> > up against the maximum clause limit as we expand the query.
> >
> > These reasons were why I thought it best to use payloads to make terms with
> > authorizations a user can't see not impact the score and then resolve the
> > actual object the user can see using a store that already supports this
> > type of access pattern (specifically Accumulo in this case).
> >
> > Your "ugly problem" is my situation I think ;)
> >
> > On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> I'm not quite getting it here. I'm guessing that you do not
> >> allow fielded queries or you strictly control the fields a user
> >> sees to pick from. Otherwise your security stuff goes out the
> >> window, say you have a drop-down list of fields to choose from
> >> or something.
> >>
> >> Assuming you do NOT have such a thing, the user is just typing
> >> words in a box, then you have to figure out, once at the
> >> app layer, what fields they have access to and just append a
> >> qf=field_secure1,field_secure2.
> >> parameter to the query.
> >>
> >> That's it. You do not have to rewrite the us

Re: Using payloads and user provided data in score

2015-07-23 Thread Erick Erickson
bq: Your "ugly problem" is my situation I think ;)

No, your problem is much worse ;(

The _contents_ of fields are restricted, which is
horrible.

OK, here's another idea out of waaay left field: Payloads.

It hinges on there being an OK number of possible combinations
which seems to be the case here. "OK" here means < 1B say. It
also hinges on being able to pre-calculate the access rights for
each term as you index it.

Then you attach a payload to each term which is, in effect, the
authorization token for that term that expresses your possibilities,
A, B, A&B, A|B, whatever. Payloads are simply a float that
gets carried along with the term and is accessible at scoring
time.

Now at scoring time, you "drop out" any terms that have "bad"
auth tokens. WARNING: this is totally off the top of my head,
so I'm sure there are gotchas in here. Like does returning 0
from the scoring negate the search.
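
A rough sketch of that idea in plain Java, to make it concrete.  The
assumptions are loud ones: each distinct access rule gets a small integer
id that is stored as the term's payload, and AuthTokens, register and
factor are made-up names rather than anything in Lucene or Solr.  How the
id is read back from the payload at scoring time is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical registry mapping a payload id to the access rule it stands for. */
final class AuthTokens {
    private final Map<Integer, Predicate<Set<String>>> rules = new HashMap<>();

    /** Registers a rule (evaluated against the user's authorizations) under a payload id. */
    void register(int payloadId, Predicate<Set<String>> rule) {
        rules.put(payloadId, rule);
    }

    /** Score multiplier for one term: 1 if the user satisfies the term's rule, else 0. */
    float factor(int payloadId, Set<String> userAuths) {
        Predicate<Set<String>> rule = rules.get(payloadId);
        // Unknown payload ids are treated as "not visible" so that mistakes fail closed.
        return (rule != null && rule.test(userAuths)) ? 1.0f : 0.0f;
    }

    /** Builds the four example rules from this thread: A, B, A&B, A|B. */
    static AuthTokens example() {
        AuthTokens t = new AuthTokens();
        t.register(1, auths -> auths.contains("A"));
        t.register(2, auths -> auths.contains("B"));
        t.register(3, auths -> auths.contains("A") && auths.contains("B"));
        t.register(4, auths -> auths.contains("A") || auths.contains("B"));
        return t;
    }

    public static void main(String[] args) {
        AuthTokens t = example();
        Set<String> user = Set.of("A");
        for (int id = 1; id <= 4; id++) {
            System.out.println("payload " + id + " -> factor " + t.factor(id, user));
        }
    }
}
```

Multiplying a term's score by the factor makes "bad" terms contribute
nothing.  Whether a doc whose every term scores 0 still comes back as a hit
is exactly the gotcha mentioned above, and this sketch does not settle it.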

No clue whether this can work for you, but here's some sample
code that could give you an idea of how it all works:
https://lucidworks.com/blog/end-to-end-payload-example-in-solr/

Good Luck. You're going places Solr wasn't designed to deal
with so whatever you do will be "exciting". And you're right,
creating huge clauses will be a performance issue, the payloads
thing may help you tame that.

Best,
Erick

On Thu, Jul 23, 2015 at 7:30 AM, Jamie Johnson  wrote:
> Sorry for being vague, I'll try to explain more.  In my use case a
> particular field does not have a security control, it's the data in the
> field.  So for instance if I had a schema with a field called name, there
> could be data that should be secured at A, B, A&B, A|B, etc within that
> field.  So again it's not the field that has this control it's the data in
> the field.  My thought based on your suggestion was to dynamically generate
> the fields based on the authorizations, this way the user would only see
> name, but it would get translated to the fields in the index that they can
> see.  So at index time if a field was added to the solr document that said
> name:foo with authorizations A&B I would need to translate that to
> name_A&B_txt:foo.  Then subsequently on search I would check what fields in
> the index the user should be able to see and rewrite queries that said
> name:foo to name_A&B_txt:foo (assuming the user can see A&B).
>
> We do not explicitly control the fields the user or calling application has
> access to because I don't want to expose the name_A&B_txt:foo fields to
> calling applications, they know that a field "name" exists, based on that I
> need to translate a name:foo query into the appropriately controlled
> version.  Does that make sense?
>
> My biggest concern with this (beyond the query rewrite) is how it will
> impact scoring (especially in the case information is available with
> multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a
> value of foo and the user has authorizations A and B) and possibly bumping
> up against the maximum clause limit as we expand the query.
>
> These reasons were why I thought it best to use payloads to make terms with
> authorizations a user can't see not impact the score and then resolve the
> actual object the user can see using a store that already supports this
> type of access pattern (specifically Accumulo in this case).
>
> Your "ugly problem" is my situation I think ;)
>
> On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson 
> wrote:
>
>> I'm not quite getting it here. I'm guessing that you do not
>> allow fielded queries or you strictly control the fields a user
>> sees to pick from. Otherwise your security stuff goes out the
>> window, say you have a drop-down list of fields to choose from
>> or something.
>>
>> Assuming you do NOT have such a thing, the user is just typing
>> words in a box, then you have to figure out, once at the
>> app layer, what fields they have access to and just append a
>> qf=field_secure1,field_secure2.
>> parameter to the query.
>>
>> That's it. You do not have to rewrite the user query at all, the q
>> parameter is just passed through as is.
>>
>> bq:  I guess in a search component I could look up all of the fields
>> that are in the index and only run queries against fields they should be
>> able to see once I know what is in the index (this is what you're
>> suggesting right?).
>>
>> Kind of, except not in a search component. You have to have modeled
>> the access rights somewhere, so I'm not getting why you can't just use
>> that model to generate the list of restricted fields the user has access to.
>> You haven't explained that model other than to say it's "complex". So I
>> have no clue whether you're talking about not _knowing_ what fields are
>> in the docs in the first place (quite possible with dynamic fields) or
>> whether you do know the complete field list but calculating the user's access
>> rights to which fields is complex.
>>
>> But I should emphasize again that my assumption is that once calcula

Re: Using payloads and user provided data in score

2015-07-23 Thread Jamie Johnson
Sorry for being vague, I'll try to explain more.  In my use case a
particular field does not have a security control, it's the data in the
field.  So for instance if I had a schema with a field called name, there
could be data that should be secured at A, B, A&B, A|B, etc within that
field.  So again it's not the field that has this control it's the data in
the field.  My thought based on your suggestion was to dynamically generate
the fields based on the authorizations, this way the user would only see
name, but it would get translated to the fields in the index that they can
see.  So at index time if a field was added to the solr document that said
name:foo with authorizations A&B I would need to translate that to
name_A&B_txt:foo.  Then subsequently on search I would check what fields in
the index the user should be able to see and rewrite queries that said
name:foo to name_A&B_txt:foo (assuming the user can see A&B).
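
To make the translation concrete, here is a small plain-Java sketch.  It
assumes markings are simple expressions where & binds tighter than |, and
that the list of markings present in the index for a field is already
known; MarkedFields and its methods are made-up names, and no query-syntax
escaping is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class MarkedFields {
    /** Index-time name for a logical field and marking, e.g. name + "A&B" -> name_A&B_txt. */
    static String indexedName(String field, String marking) {
        return field + "_" + marking + "_txt";
    }

    /** True if the user's authorizations satisfy a marking such as "A", "A&B" or "A|B". */
    static boolean canSee(String marking, Set<String> userAuths) {
        for (String alternative : marking.split("\\|")) {
            // Every authorization joined by & must be held for this alternative to pass.
            if (userAuths.containsAll(Arrays.asList(alternative.split("&")))) {
                return true;
            }
        }
        return false;
    }

    /** Rewrites field:value into an OR over the marked fields this user may see. */
    static String rewrite(String field, String value, List<String> markingsInIndex,
                          Set<String> userAuths) {
        List<String> clauses = new ArrayList<>();
        for (String marking : markingsInIndex) {
            if (canSee(marking, userAuths)) {
                clauses.add(indexedName(field, marking) + ":" + value);
            }
        }
        // An empty result means the user can see none of the fields; the caller
        // has to treat that as "matches nothing".
        return clauses.isEmpty() ? "" : "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("name", "foo", List.of("A", "B", "A&B", "C"), Set.of("A", "B")));
    }
}
```

The output also shows the concern below: one clause per visible marking, so
the query grows with the number of markings a user can see.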

We do not explicitly control the fields the user or calling application has
access to because I don't want to expose the name_A&B_txt:foo fields to
calling applications, they know that a field "name" exists, based on that I
need to translate a name:foo query into the appropriately controlled
version.  Does that make sense?

My biggest concern with this (beyond the query rewrite) is how it will
impact scoring (especially in the case information is available with
multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a
value of foo and the user has authorizations A and B) and possibly bumping
up against the maximum clause limit as we expand the query.

These reasons were why I thought it best to use payloads to make terms with
authorizations a user can't see not impact the score and then resolve the
actual object the user can see using a store that already supports this
type of access pattern (specifically Accumulo in this case).

Your "ugly problem" is my situation I think ;)

On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson 
wrote:

> I'm not quite getting it here. I'm guessing that you do not
> allow fielded queries or you strictly control the fields a user
> sees to pick from. Otherwise your security stuff goes out the
> window, say you have a drop-down list of fields to choose from
> or something.
>
> Assuming you do NOT have such a thing, the user is just typing
> words in a box, then you have to figure out, once at the
> app layer, what fields they have access to and just append a
> qf=field_secure1,field_secure2.
> parameter to the query.
>
> That's it. You do not have to rewrite the user query at all, the q
> parameter is just passed through as is.
>
> bq:  I guess in a search component I could look up all of the fields
> that are in the index and only run queries against fields they should be
> able to see once I know what is in the index (this is what you're
> suggesting right?).
>
> Kind of, except not in a search component. You have to have modeled
> the access rights somewhere, so I'm not getting why you can't just use
> that model to generate the list of restricted fields the user has access to.
> You haven't explained that model other than to say it's "complex". So I
> have no clue whether you're talking about not _knowing_ what fields are
> in the docs in the first place (quite possible with dynamic fields) or
> whether you do know the complete field list but calculating the user's access
> rights to which fields is complex.
>
> But I should emphasize again that my assumption is that once calculated,
> this list is invariant so it does not need to be done for every request. Indeed,
> what I'm envisioning is not writing any Solr code at all, all done in
> the app layer.
>
> As far as extra work, there isn't any as far as Solr is concerned.
> It's exactly as though you were specifying this in, say, the request
> handler. So I don't get your concern about lots and lots of fields.
> Now, I'm assuming a simple document model with some number
> of fields. The access rights to which of those fields a user can
> see may be a complex calculation, but again you only need to do it
> once. For that matter, you could pre-calculate that set of fields
> or otherwise cache it.
>
> Now, this breaks down if the document model isn't that simple,
> say the same field in doc1 can be seen by userX, but userX
> can't see the _same_ field in doc2. That's an ugly problem...
>
> And let's further say there are a number of fields that _everyone_
> can see. They can be placed in an  section of the request
> handler so you don't have to specify them for each request.
>
> Best,
> Erick
>
> On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson  wrote:
> > Looks like this may be what I'm looking for
> >
> > *SolrRequestInfo*
> >
> > I have not tried this yet but looks promising.
> >
> > Assuming this works, thinking about your suggestion I would need to rewrite
> > the users query with the appropriate fields, are there any utilities for
> > doing this?  I'd be looking to rewrite a f

Re: Using payloads and user provided data in score

2015-07-22 Thread Erick Erickson
I'm not quite getting it here. I'm guessing that you do not
allow fielded queries or you strictly control the fields a user
sees to pick from. Otherwise your security stuff goes out the
window, say you have a drop-down list of fields to choose from
or something.

Assuming you do NOT have such a thing, the user is just typing
words in a box, then you have to figure out, once at the
app layer, what fields they have access to and just append a
qf=field_secure1,field_secure2.
parameter to the query.

That's it. You do not have to rewrite the user query at all, the q
parameter is just passed through as is.
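
Roughly what that looks like in the app layer, as a plain-Java sketch.
SecureQueryParams is a made-up name, defType=edismax is an assumption, and
the fields are joined with spaces because, as far as I know, that is how
the reference guide describes the qf list.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SecureQueryParams {
    /** Builds the request parameters: q is passed through untouched, qf is added. */
    static String build(String userQuery, List<String> allowedFields) {
        // The allowed-field list comes from the app's own access model and can be cached.
        String qf = String.join(" ", allowedFields);
        return "defType=edismax"
                + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(qf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("solr payloads", List.of("field_secure1", "field_secure2")));
    }
}
```

The only per-user work is looking up the field list; the user's text is
never parsed or rewritten.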

bq:  I guess in a search component I could look up all of the fields
that are in the index and only run queries against fields they should be
able to see once I know what is in the index (this is what you're
suggesting right?).

Kind of, except not in a search component. You have to have modeled
the access rights somewhere, so I'm not getting why you can't just use
that model to generate the list of restricted fields the user has access to.
You haven't explained that model other than to say it's "complex". So I
have no clue whether you're talking about not _knowing_ what fields are
in the docs in the first place (quite possible with dynamic fields) or
whether you do know the complete field list but calculating the user's access
rights to which fields is complex.

But I should emphasize again that my assumption is that once calculated,
this list is invariant so it does not need to be done for every request. Indeed,
what I'm envisioning is not writing any Solr code at all, all done in
the app layer.

As far as extra work, there isn't any as far as Solr is concerned.
It's exactly as though you were specifying this in, say, the request
handler. So I don't get your concern about lots and lots of fields.
Now, I'm assuming a simple document model with some number
of fields. The access rights to which of those fields a user can
see may be a complex calculation, but again you only need to do it
once. For that matter, you could pre-calculate that set of fields
or otherwise cache it.

Now, this breaks down if the document model isn't that simple,
say the same field in doc1 can be seen by userX, but userX
can't see the _same_ field in doc2. That's an ugly problem...

And let's further say there are a number of fields that _everyone_
can see. They can be placed in an  section of the request
handler so you don't have to specify them for each request.

Best,
Erick

On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson  wrote:
> Looks like this may be what I'm looking for
>
> *SolrRequestInfo*
>
> I have not tried this yet but looks promising.
>
> Assuming this works, thinking about your suggestion I would need to rewrite
> the users query with the appropriate fields, are there any utilities for
> doing this?  I'd be looking to rewrite a fielded query like +field:value
> possibly to something like +(field.secure:value field.secure2:value)
>
> Again thanks for suggestions
> On Jul 22, 2015 5:20 PM, "Jamie Johnson"  wrote:
>
>> I answered my own question, looks like the field infos are always read
>> within the IndexSearcher so that cost is already being paid.
>>
>> I would potentially have to duplicate information in multiple fields if it
>> was present at multiple authorization levels, is there a limit to the
>> number of fields within a document?  I'm also concerned this might skew my
>> search results as terms that had more authorizations would appear in more
>> fields and would result in more matches on query.  I'll play with this a
>> little but I am still wondering about my original question.
>>
>> On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson  wrote:
>>
>>> I had thought about this in the past, but thought it might be too
>>> expensive.  I guess in a search component I could look up all of the fields
>>> that are in the index and only run queries against fields they should be
>>> able to see once I know what is in the index (this is what you're
>>> suggesting right?).
>>>
>>> My concern would be that the number of fields per document would grow too
>>> large to support this.  Our controls aren't simple like user or admin they
>>> are complex combinations of authorizations so I would think there might be
>>> a large number of fields that are generated using this approach.  Would
>>> retrieving all field infos from Solr be expensive on each request to see
>>> what they should be able to query?
>>>
>>> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson 
>>> wrote:
>>>
>>>> Why don't you handle it all at the app level? Here's what I mean:
>>>>
>>>> I'm assuming that you're using edismax here, but the same principle
>>>> applies if not.
>>>>
>>>> Your handler (say the "/select" handler) has a "qf" parameter which defines
>>>> the fields that are searched over in the absence of a field qualifier, e.g.
>>>> q=whatever&qf=title,description
>>>>
>>>> causes the search term to be looked for in the two fields "title" and
>>

Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
Looks like this may be what I'm looking for

*SolrRequestInfo*

I have not tried this yet but looks promising.
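
For context, the pattern I'm after is an ordinary thread-local holder.  As
far as I know SolrRequestInfo exposes one through a static
getRequestInfo() call; the sketch below is plain Java with made-up names
(RequestContext, ScoringCode) just to show the shape, not Solr's class.

```java
final class RequestContext {
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    private final String accessLevel;

    RequestContext(String accessLevel) {
        this.accessLevel = accessLevel;
    }

    String accessLevel() {
        return accessLevel;
    }

    /** Called by the request-handling code before the search runs on this thread. */
    static void set(RequestContext ctx) {
        CURRENT.set(ctx);
    }

    /** Called from deep inside scoring code; returns null when no request is bound. */
    static RequestContext get() {
        return CURRENT.get();
    }

    /** Should run in a finally block so pooled threads do not keep the previous user's context. */
    static void clear() {
        CURRENT.remove();
    }
}

final class ScoringCode {
    /** Stand-in for a Similarity: zeroes the score unless the bound request has the needed level. */
    static float score(float rawScore, String requiredLevel) {
        RequestContext ctx = RequestContext.get();
        boolean allowed = ctx != null && ctx.accessLevel().equals(requiredLevel);
        return allowed ? rawScore : 0.0f;
    }

    public static void main(String[] args) {
        RequestContext.set(new RequestContext("A"));
        try {
            System.out.println(score(2.5f, "A"));
            System.out.println(score(2.5f, "B"));
        } finally {
            RequestContext.clear();
        }
    }
}
```

This only works if scoring runs on the same thread that handled the
request, which is something I still need to confirm.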

Assuming this works, and thinking about your suggestion, I would need to
rewrite the user's query with the appropriate fields. Are there any utilities
for doing this?  I'd be looking to rewrite a fielded query like +field:value
possibly to something like +(field.secure:value field.secure2:value)

Again thanks for suggestions
On Jul 22, 2015 5:20 PM, "Jamie Johnson"  wrote:

> I answered my own question, looks like the field infos are always read
> within the IndexSearcher so that cost is already being paid.
>
> I would potentially have to duplicate information in multiple fields if it
> was present at multiple authorization levels, is there a limit to the
> number of fields within a document?  I'm also concerned this might skew my
> search results as terms that had more authorizations would appear in more
> fields and would result in more matches on query.  I'll play with this a
> little but I am still wondering about my original question.
>
> On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson  wrote:
>
>> I had thought about this in the past, but thought it might be too
>> expensive.  I guess in a search component I could look up all of the fields
>> that are in the index and only run queries against fields they should be
>> able to see once I know what is in the index (this is what you're
>> suggesting right?).
>>
>> My concern would be that the number of fields per document would grow too
>> large to support this.  Our controls aren't simple like user or admin they
>> are complex combinations of authorizations so I would think there might be
>> a large number of fields that are generated using this approach.  Would
>> retrieving all field infos from Solr be expensive on each request to see
>> what they should be able to query?
>>
>> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson 
>> wrote:
>>
>>> Why don't you handle it all at the app level? Here's what I mean:
>>>
>>> I'm assuming that you're using edismax here, but the same principle
>>> applies if not.
>>>
>>> Your handler (say the "/select" handler) has a "qf" parameter which defines
>>> the fields that are searched over in the absence of a field qualifier, e.g.
>>> q=whatever&qf=title,description
>>>
>>> causes the search term to be looked for in the two fields "title" and
>>> "description".
>>> You can also set up the qf fields in the "/select" handler as one of
>>> the items in the "defaults" section.
>>>
>>> But the qf param in the "defaults" section is just that... a default,
>>> so individual queries can override it. What I have in mind is that you'd
>>> look up the user's field-access list, append that list as necessary to
>>> the query, and just pass it on through.
>>>
>>> Things to watch out for:
>>> 1> If the user specifies a field, you'll have to strip that off if
>>> they don't have rights, e.g. q=field1:whatever whenever
>>> ignores the qf parameter for "whatever" but does respect the qf param
>>> for "whenever".
>>> 2> If you have some kind of date field, say, that you want to facet
>>> over, you'd have to control that.
>>> 3> If you have a "bag of words" where you use copyField to add a bunch
>>> of fields' data to an uber-field, then the user can infer some things
>>> from that info, so you probably do want to be careful about which
>>> copyFields you use.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson 
>>> wrote:
>>> > I am looking for a way to prevent fields that users shouldn't be able to
>>> > know exist from contributing to the score.  The goal is to provide a way to
>>> > essentially hide certain fields from requests based on an access level
>>> > provided on the query.  I have managed to make terms that users shouldn't
>>> > be able to see not impact the score by implementing a custom Similarity
>>> > class that looks at the terms payloads and returns 0 for the score if they
>>> > shouldn't know the field exists.  The issue however is that I don't have
>>> > access to the request at this point so getting the users access level is
>>> > proving problematic.  Is there a way to get the current request that is
>>> > being processed via some thread local variable or something similar that
>>> > Solr maintains?  If not is there another approach that I could be using to
>>> > access information from the request within my Similarity implementation?
>>> > Any thoughts on this would be greatly appreciated.
>>> >
>>> > -Jamie
>>>
>>
>>
>


Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
I answered my own question, looks like the field infos are always read
within the IndexSearcher so that cost is already being paid.

I would potentially have to duplicate information in multiple fields if it
was present at multiple authorization levels, is there a limit to the
number of fields within a document?  I'm also concerned this might skew my
search results as terms that had more authorizations would appear in more
fields and would result in more matches on query.  I'll play with this a
little but I am still wondering about my original question.

On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson  wrote:

> I had thought about this in the past, but thought it might be too
> expensive.  I guess in a search component I could look up all of the fields
> that are in the index and only run queries against fields they should be
> able to see once I know what is in the index (this is what you're
> suggesting right?).
>
> My concern would be that the number of fields per document would grow too
> large to support this.  Our controls aren't simple like user or admin they
> are complex combinations of authorizations so I would think there might be
> a large number of fields that are generated using this approach.  Would
> retrieving all field infos from Solr be expensive on each request to see
> what they should be able to query?
>
> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson 
> wrote:
>
>> Why don't you handle it all at the app level? Here's what I mean:
>>
>> I'm assuming that you're using edismax here, but the same principle
>> applies if not.
>>
>> Your handler (say the "/select" handler) has a "qf" parameter which defines
>> the fields that are searched over in the absence of a field qualifier, e.g.
>> q=whatever&qf=title,description
>>
>> causes the search term to be looked for in the two fields "title" and
>> "description".
>> You can also set up the qf fields in the "/select" handler as one of
>> the items in the "defaults" section.
>>
>> But the qf param in the "defaults" section is just that... a default,
>> so individual queries can override it. What I have in mind is that you'd
>> look up the user's field-access list, append that list as necessary to
>> the query, and just pass it on through.
>>
>> Things to watch out for:
>> 1> If the user specifies a field, you'll have to strip that off if
>> they don't have rights, e.g. q=field1:whatever whenever
>> ignores the qf parameter for "whatever" but does respect the qf param
>> for "whenever".
>> 2> If you have some kind of date field, say, that you want to facet
>> over, you'd have to control that.
>> 3> If you have a "bag of words" where you use copyField to add a bunch
>> of fields' data to an uber-field, then the user can infer some things
>> from that info, so you probably do want to be careful about which
>> copyFields you use.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson 
>> wrote:
>> > I am looking for a way to prevent fields that users shouldn't be able to
>> > know exist from contributing to the score.  The goal is to provide a way to
>> > essentially hide certain fields from requests based on an access level
>> > provided on the query.  I have managed to make terms that users shouldn't
>> > be able to see not impact the score by implementing a custom Similarity
>> > class that looks at the terms payloads and returns 0 for the score if they
>> > shouldn't know the field exists.  The issue however is that I don't have
>> > access to the request at this point so getting the users access level is
>> > proving problematic.  Is there a way to get the current request that is
>> > being processed via some thread local variable or something similar that
>> > Solr maintains?  If not is there another approach that I could be using to
>> > access information from the request within my Similarity implementation?
>> > Any thoughts on this would be greatly appreciated.
>> >
>> > -Jamie
>>
>
>


Re: Using payloads and user provided data in score

2015-07-22 Thread Jamie Johnson
I had thought about this in the past, but thought it might be too
expensive.  I guess in a search component I could look up all of the fields
that are in the index and only run queries against fields they should be
able to see once I know what is in the index (this is what you're
suggesting right?).

My concern would be that the number of fields per document would grow too
large to support this.  Our controls aren't simple like user or admin they
are complex combinations of authorizations so I would think there might be
a large number of fields that are generated using this approach.  Would
retrieving all field infos from Solr be expensive on each request to see
what they should be able to query?

On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson 
wrote:

> Why don't you handle it all at the app level? Here's what I mean:
>
> I'm assuming that you're using edismax here, but the same principle
> applies if not.
>
> Your handler (say the "/select" handler) has a "qf" parameter which defines
> the fields that are searched over in the absence of a field qualifier, e.g.
> q=whatever&qf=title,description
>
> causes the search term to be looked for in the two fields "title" and
> "description".
> You can also set up the qf fields in the "/select" handler as one of
> the items in the "defaults" section.
>
> But the qf param in the "defaults" section is just that... a default,
> so individual queries can override it. What I have in mind is that you'd
> look up the user's field-access list, append that list as necessary to
> the query, and just pass it on through.
>
> Things to watch out for:
> 1> If the user specifies a field, you'll have to strip that off if
> they don't have rights, e.g. q=field1:whatever whenever
> ignores the qf parameter for "whatever" but does respect the qf param
> for "whenever".
> 2> If you have some kind of date field, say, that you want to facet
> over, you'd have to control that.
> 3> If you have a "bag of words" where you use copyField to add a bunch
> of fields' data to an uber-field, then the user can infer some things
> from that info, so you probably do want to be careful about which
> copyFields you use.
>
> Best,
> Erick
>
> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson  wrote:
> > I am looking for a way to prevent fields that users shouldn't be able to
> > know exist from contributing to the score.  The goal is to provide a way to
> > essentially hide certain fields from requests based on an access level
> > provided on the query.  I have managed to make terms that users shouldn't
> > be able to see not impact the score by implementing a custom Similarity
> > class that looks at the terms payloads and returns 0 for the score if they
> > shouldn't know the field exists.  The issue however is that I don't have
> > access to the request at this point so getting the users access level is
> > proving problematic.  Is there a way to get the current request that is
> > being processed via some thread local variable or something similar that
> > Solr maintains?  If not is there another approach that I could be using to
> > access information from the request within my Similarity implementation?
> > Any thoughts on this would be greatly appreciated.
> >
> > -Jamie
>


Re: Using payloads and user provided data in score

2015-07-22 Thread Erick Erickson
Why don't you handle it all at the app level? Here's what I mean:

I'm assuming that you're using edismax here, but the same principle
applies if not.

Your handler (say the "/select" handler) has a "qf" parameter which defines
the fields that are searched over in the absence of a field qualifier, e.g.
q=whatever&qf=title,description

causes the search term to be looked for in the two fields "title" and
"description".
You can also set up the qf fields in the "/select" handler as one of
the items in the "defaults" section.

But the qf param in the "defaults" section is just that... a default,
so individual queries can override it. What I have in mind is that you'd
look up the user's field-access list, append that list as necessary to
the query, and just pass it on through.

Things to watch out for:
1> If the user specifies a field, you'll have to strip that off if
they don't have rights, e.g. q=field1:whatever whenever
ignores the qf parameter for "whatever" but does respect the qf param
for "whenever".
2> If you have some kind of date field, say, that you want to facet
over, you'd have to control that.
3> If you have a "bag of words" where you use copyField to add a bunch
of fields' data to an uber-field, then the user can infer some things
from that info, so you probably do want to be careful about which
copyFields you use.
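
For point 1, a very rough plain-Java sketch of stripping field qualifiers
the user has no rights to.  It takes one reading of "strip that off": the
"field:" prefix is removed so the term falls back to the qf fields.
FieldQualifierFilter is a made-up name, and the regex ignores quoted
phrases, escapes and local params, so treat it as an illustration only.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldQualifierFilter {
    // Group 1: start of input or one of whitespace ( + -   Group 2: the field name before ':'.
    private static final Pattern QUALIFIER =
            Pattern.compile("(^|[\\s(+\\-])([A-Za-z_][\\w.]*):");

    /** Removes "field:" prefixes for fields that are not in the allowed set. */
    static String strip(String query, Set<String> allowedFields) {
        Matcher m = QUALIFIER.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String kept = allowedFields.contains(m.group(2)) ? m.group(2) + ":" : "";
            // Keep whatever preceded the field name (space, '+', ...) in both cases.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + kept));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("field1:whatever whenever", Set.of("title")));
        System.out.println(strip("title:solr +secret:x", Set.of("title")));
    }
}
```

A real implementation would more likely work on the parsed query than on
the raw string.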

Best,
Erick

On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson  wrote:
> I am looking for a way to prevent fields that users shouldn't be able to
> know exist from contributing to the score.  The goal is to provide a way to
> essentially hide certain fields from requests based on an access level
> provided on the query.  I have managed to make terms that users shouldn't
> be able to see not impact the score by implementing a custom Similarity
> class that looks at the terms' payloads and returns 0 for the score if they
> shouldn't know the field exists.  The issue however is that I don't have
> access to the request at this point so getting the user's access level is
> proving problematic.  Is there a way to get the current request that is
> being processed via some thread local variable or something similar that
> Solr maintains?  If not is there another approach that I could be using to
> access information from the request within my Similarity implementation?
> Any thoughts on this would be greatly appreciated.
>
> -Jamie