Re: Using payloads and user provided data in score
Well, you've at least confirmed what I was thinking :). I am using payloads for this now and I think I have something very basic working. Results don't get dropped when their scores are 0, so I also had to write a custom collector that drops docs with a 0 score and plug it in through the AnalyticsQuery API (maybe there is somewhere better).

On a side note, it would be really nice to be able to plug in a custom collector somewhere; I couldn't find anywhere to do that without using the AnalyticsQuery API. I had hoped to use the PositiveScoresOnlyCollector so I wouldn't have to write anything, but I didn't see where I could do that.

Again, I really appreciate all of the feedback on this!

On Thu, Jul 23, 2015 at 12:30 PM, Erick Erickson wrote:
> [...]
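The zero-score filtering described in this message can be sketched in a few lines. This is a plain-Python model of the idea (a wrapper that forwards only hits with a positive score to whatever collector sits behind it), not the Lucene or Solr API; the class and function names are made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple


class PositiveScoreFilter:
    """Hypothetical wrapper: forwards a hit to the delegate only if score > 0."""

    def __init__(self, delegate: Callable[[int, float], None]) -> None:
        self.delegate = delegate

    def collect(self, doc_id: int, score: float) -> None:
        # A hit whose every matching term was zeroed out still "matches",
        # so it has to be dropped explicitly here.
        if score > 0.0:
            self.delegate(doc_id, score)


def collect_positive(hits: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Run (doc_id, score) pairs through the filter and return what survives."""
    kept: List[Tuple[int, float]] = []
    collector = PositiveScoreFilter(lambda doc_id, score: kept.append((doc_id, score)))
    for doc_id, score in hits:
        collector.collect(doc_id, score)
    return kept
```

Wrapping rather than subclassing keeps the filter independent of how the surviving hits are ranked or paged afterwards.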
Re: Using payloads and user provided data in score
bq: Your "ugly problem" is my situation I think ;)

No, your problem is much worse ;( The _contents_ of fields are restricted, which is horrible.

OK, here's another idea out of waaay left field: Payloads.

It hinges on there being an OK number of possible combinations, which seems to be the case here. "OK" here means < 1B, say. It also hinges on being able to pre-calculate the access rights for each term as you index it.

Then you attach a payload to each term which is, in effect, the authorization token for that term that expresses your possibilities: A, B, A&B, A|B, whatever. A payload is simply a small piece of data (a float, in the usual examples) that gets carried along with the term and is accessible at scoring time.

Now at scoring time, you "drop out" any terms that have "bad" auth tokens. WARNING: this is totally off the top of my head, so I'm sure there are gotchas in here. Like, does returning 0 from the scoring negate the search?

No clue whether this can work for you, but here's some sample code that could give you an idea of how it all works:
https://lucidworks.com/blog/end-to-end-payload-example-in-solr/

Good luck. You're going places Solr wasn't designed to deal with, so whatever you do will be "exciting". And you're right, creating huge clauses will be a performance issue; the payloads thing may help you tame that.

Best,
Erick

On Thu, Jul 23, 2015 at 7:30 AM, Jamie Johnson wrote:
> [...]
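To make the payload idea concrete, here is a rough sketch in plain Python, not Lucene code. It assumes each distinct marking gets a small integer token that would be stored as the term's payload at index time, and that the user's authorizations are turned into a set of acceptable tokens once per request. The token table, the function names and the flat `&`/`|` marking syntax are all assumptions made for illustration; mixed expressions such as `A&B|C` are not handled.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical token table: one id per pre-calculated marking.
TOKEN_IDS: Dict[str, int] = {"A": 1, "B": 2, "A&B": 3, "A|B": 4}


def satisfies(marking: str, auths: Set[str]) -> bool:
    """True if the user's authorizations satisfy a marking like 'A&B' or 'A|B'."""
    if "&" in marking:
        return all(part in auths for part in marking.split("&"))
    if "|" in marking:
        return any(part in auths for part in marking.split("|"))
    return marking in auths


def visible_tokens(auths: Set[str]) -> FrozenSet[int]:
    """Tokens the user may see; computed once per request, not per term."""
    return frozenset(tid for marking, tid in TOKEN_IDS.items() if satisfies(marking, auths))


def score(term_hits: List[Tuple[float, int]], auths: Set[str]) -> float:
    """Sum raw term scores, multiplying by 0 where the payload token is not allowed."""
    allowed = visible_tokens(auths)
    return sum(raw * (1.0 if tid in allowed else 0.0) for raw, tid in term_hits)
```

Note that a document whose only matching term is hidden still comes back, just with a score of 0, which is the gotcha raised in the warning above.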
Re: Using payloads and user provided data in score
Sorry for being vague, I'll try to explain more. In my use case the security control is not on a particular field; it is on the data in the field. So, for instance, if I had a schema with a field called name, there could be data within that field that should be secured at A, B, A&B, A|B, etc. So again, it's not the field that has this control, it's the data in the field.

My thought, based on your suggestion, was to dynamically generate the fields based on the authorizations. This way the user would only see name, but it would get translated to the fields in the index that they can see. So at index time, if a field was added to the Solr document that said name:foo with authorizations A&B, I would need to translate that to name_A&B_txt:foo. Then on search I would check which fields in the index the user should be able to see and rewrite queries that said name:foo to name_A&B_txt:foo (assuming the user can see A&B).

We do not explicitly control the fields the user or calling application has access to, because I don't want to expose the name_A&B_txt fields to calling applications. They know that a field "name" exists, and based on that I need to translate a name:foo query into the appropriately controlled version. Does that make sense?

My biggest concern with this (beyond the query rewrite) is how it will impact scoring, especially when information is available with multiple markings (e.g. name_A_txt has a value of foo, name_B_txt has a value of foo, and the user has authorizations A and B), and possibly bumping up against the maximum clause limit as we expand the query.

These reasons were why I thought it best to use payloads, so that terms with authorizations a user can't see do not impact the score, and then resolve the actual object the user can see using a store that already supports this type of access pattern (specifically Accumulo in this case).

Your "ugly problem" is my situation I think ;)

On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson wrote:
> [...]
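The field translation described in this message could look roughly like the following. It is only a sketch: the `_txt` suffix and the `name_A&B_txt` pattern come from the message, while the function names and the choice of `OR` are assumptions. Characters such as `&` and `|` inside field names may need escaping or a safer encoding in real queries, which this sketch does not attempt.

```python
from typing import Iterable, List, Optional


def index_field_name(field: str, marking: str) -> str:
    """Index-time field name for a public field secured at one marking."""
    return f"{field}_{marking}_txt"


def rewrite_clause(field: str, value: str, visible_markings: Iterable[str]) -> Optional[str]:
    """Expand field:value into one clause per marking the user is allowed to see.

    Returns None when the user can see no marking at all, so the caller can
    decide how to make the clause match nothing.
    """
    clauses: List[str] = [
        f"{index_field_name(field, marking)}:{value}" for marking in visible_markings
    ]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"
```

Every visible marking adds one clause per query term, which is where the concern about the maximum clause limit comes from.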
Re: Using payloads and user provided data in score
I'm not quite getting it here. I'm guessing that you do not allow fielded queries, or that you strictly control the fields a user sees to pick from, say with a drop-down list of fields to choose from or something. Otherwise your security stuff goes out the window.

Assuming you do NOT have such a thing and the user is just typing words in a box, then you have to figure out, once, at the app layer, what fields they have access to and just append a qf=field_secure1,field_secure2 parameter to the query.

That's it. You do not have to rewrite the user query at all; the q parameter is just passed through as is.

bq: I guess in a search component I could look up all of the fields that are in the index and only run queries against fields they should be able to see once I know what is in the index (this is what you're suggesting right?).

Kind of, except not in a search component. You have to have modeled the access rights somewhere, so I'm not getting why you can't just use that model to generate the list of restricted fields the user has access to. You haven't explained that model other than to say it's "complex". So I have no clue whether you're talking about not _knowing_ what fields are in the docs in the first place (quite possible with dynamic fields) or whether you do know the complete field list but calculating the user's access rights to those fields is complex.

But I should emphasize again that my assumption is that, once calculated, this list is invariant, so it does not need to be recomputed for every request. Indeed, what I'm envisioning is not writing any Solr code at all; it is all done in the app layer.

As far as extra work goes, there isn't any as far as Solr is concerned. It's exactly as though you were specifying this in, say, the request handler. So I don't get your concern about lots and lots of fields.

Now, I'm assuming a simple document model with some number of fields. Working out which of those fields a user can see may be a complex calculation, but again you only need to do it once. For that matter, you could pre-calculate that set of fields or otherwise cache it.

Now, this breaks down if the document model isn't that simple, say the same field in doc1 can be seen by userX, but userX can't see the _same_ field in doc2. That's an ugly problem...

And let's further say there are a number of fields that _everyone_ can see. They can be placed in a section of the request handler configuration so you don't have to specify them for each request.

Best,
Erick

On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson wrote:
> [...]
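A minimal sketch of the app-layer approach, assuming a simple model in which each searchable field requires exactly one authorization. The access table, the function names and the use of `defType=edismax` are illustrative assumptions. The field list is written space-separated here, the form the Solr reference guide uses for qf, whereas the message writes it with a comma. The per-user field list is cached because it is assumed not to change between requests.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Tuple
from urllib.parse import urlencode

# Hypothetical access model: field name -> authorization needed to search it.
FIELD_ACCESS: Dict[str, str] = {
    "field_secure1": "A",
    "field_secure2": "B",
    "field_secure3": "C",
}


@lru_cache(maxsize=1024)
def allowed_fields(auths: FrozenSet[str]) -> Tuple[str, ...]:
    """Fields this set of authorizations may search; computed once and cached."""
    return tuple(sorted(f for f, needed in FIELD_ACCESS.items() if needed in auths))


def build_params(user_query: str, auths: FrozenSet[str]) -> str:
    """Pass q through untouched and append the user's qf list."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(allowed_fields(auths)),
    }
    return urlencode(params)
```

For a user holding A and B, `build_params("hello world", frozenset({"A", "B"}))` yields a query string whose qf lists only field_secure1 and field_secure2.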
Re: Using payloads and user provided data in score
Looks like this may be what I'm looking for: *SolrRequestInfo*

I have not tried this yet, but it looks promising.

Assuming this works, and thinking about your suggestion, I would need to rewrite the user's query with the appropriate fields. Are there any utilities for doing this? I'd be looking to rewrite a fielded query like +field:value to something like +(field.secure:value field.secure2:value).

Again, thanks for the suggestions.

On Jul 22, 2015 5:20 PM, "Jamie Johnson" wrote:
> [...]
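SolrRequestInfo gives code running on the request thread a way to reach the current request. The pattern itself can be sketched without Solr: stash the caller's authorizations in a thread-local at the start of the request, read them from the scoring code, and clear them at the end. Everything below is a plain-Python stand-in with hypothetical names, not Solr's API.

```python
import threading
from typing import FrozenSet, Iterable

# Per-thread storage: each request thread sees only its own values.
_request_ctx = threading.local()


def current_auths() -> FrozenSet[str]:
    """Authorizations of the request on this thread, or empty if none is set."""
    return getattr(_request_ctx, "auths", frozenset())


def payload_score(raw_score: float, required_auth: str) -> float:
    """Stand-in for the scoring hook: zero out terms the user may not see."""
    return raw_score if required_auth in current_auths() else 0.0


def handle_request(auths: Iterable[str], raw_score: float, required_auth: str) -> float:
    """Set the context, score, and always clear the context afterwards."""
    _request_ctx.auths = frozenset(auths)
    try:
        return payload_score(raw_score, required_auth)
    finally:
        del _request_ctx.auths
```

Clearing in `finally` matters when threads are pooled; otherwise a later request on the same thread could inherit the previous user's authorizations.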
Re: Using payloads and user provided data in score
I answered my own question: it looks like the field infos are always read within the IndexSearcher, so that cost is already being paid.

I would potentially have to duplicate information in multiple fields if it was present at multiple authorization levels. Is there a limit to the number of fields within a document? I'm also concerned this might skew my search results, as terms that had more authorizations would appear in more fields and would result in more matches on a query. I'll play with this a little, but I am still wondering about my original question.

On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson wrote:
> [...]
Re: Using payloads and user provided data in score
I had thought about this in the past, but thought it might be too expensive. I guess in a search component I could look up all of the fields that are in the index and only run queries against fields they should be able to see once I know what is in the index (this is what you're suggesting, right?).

My concern would be that the number of fields per document would grow too large to support this. Our controls aren't simple like user or admin; they are complex combinations of authorizations, so I would think a large number of fields might be generated using this approach. Would retrieving all field infos from Solr on each request, to see what the user should be able to query, be expensive?

On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson wrote:
> [...]
Re: Using payloads and user provided data in score
Why don't you handle it all at the app level? Here's what I mean:

I'm assuming that you're using edismax here, but the same principle applies if not.

Your handler (say the "/select" handler) has a "qf" parameter which defines the fields that are searched over in the absence of a field qualifier, e.g.
q=whatever&qf=title,description
causes the search term to be looked for in the two fields "title" and "description". You can also set up the qf fields in the "/select" handler as one of the items in the defaults section.

But the qf param in the defaults section is just that... a default. So individual queries can override it. What I have in mind is that you'd look up the user's field-access list, append that list as necessary to the query, and just pass it on through.

Things to watch out for:
1> If the user specifies a field, you'll have to strip that off if they don't have rights, i.e. q=field1:whatever whenever ignores the qf parameter for "whatever" but does respect the qf param for "whenever".
2> If you have some kind of date field, say, that you want to facet over, you'd have to control that.
3> If you have a "bag of words" where you use copyField to add a bunch of fields' data to an uber-field, then the user can infer some things from that info, so you probably want to be careful about what copyFields you use.

Best,
Erick

On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson wrote:
> [...]
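The first item in the "things to watch out for" list can be illustrated with a deliberately naive sketch. It strips the field qualifier from any `field:value` token whose field the user may not search, so the bare term falls back to the qf fields. Dropping only the qualifier (rather than the whole clause) is one possible reading of "strip that off", and the regular expression ignores parentheses, ranges and escaping; a real implementation would need an actual query parser. The names are hypothetical.

```python
import re
from typing import Set

# Matches an optional +/- prefix, a field name, a colon and a value,
# but only at the start of a whitespace-delimited token.
FIELDED = re.compile(r"(?<!\S)([+-]?)(\w+):(\S+)")


def strip_disallowed(q: str, allowed: Set[str]) -> str:
    """Remove field qualifiers the user has no rights to; keep the rest as is."""

    def repl(match: "re.Match[str]") -> str:
        prefix, field, value = match.groups()
        if field in allowed:
            return match.group(0)
        # Keep the +/- operator and the term, drop the forbidden field name.
        return prefix + value

    return FIELDED.sub(repl, q)
```

With `allowed = {"title"}`, the query `field1:whatever whenever` becomes `whatever whenever`, so both terms are then governed by qf.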
Using payloads and user provided data in score
I am looking for a way to prevent fields that users shouldn't be able to know exist from contributing to the score. The goal is to provide a way to essentially hide certain fields from requests based on an access level provided on the query.

I have managed to make terms that users shouldn't be able to see not impact the score by implementing a custom Similarity class that looks at the terms' payloads and returns 0 for the score if they shouldn't know the field exists. The issue, however, is that I don't have access to the request at this point, so getting the user's access level is proving problematic.

Is there a way to get the current request that is being processed via some thread-local variable or something similar that Solr maintains? If not, is there another approach that I could be using to access information from the request within my Similarity implementation? Any thoughts on this would be greatly appreciated.

-Jamie