Hi Karl,

Apologies for the delayed reply. I've been away on business, and in the
middle of a product release, so it's been a busy time...

In response to your earlier questions:

The 'AND/OR' filter queries will ultimately map down to Lucene Boolean clauses,
although the point at which they are applied is slightly different.

I think I am correct in my understanding that with filter queries, the
results are filtered 'post-Lucene', but are separately (Solr) cached, so you
get a hit on the first search, but then benefit from cached hits on
subsequent searches. The lower-level 'MUST NOT/SHOULD' etc. clauses are
applied at the Lucene query directly, so don't have separate Solr caching.
I've not benchmarked the two, so one or other might be slower/faster for
various search scenarios.
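
To make the two request shapes concrete, here is a minimal sketch in Python (chosen only for brevity). It is not how either SOLR-1834 or SOLR-1872 is implemented: the helper names are hypothetical, the field name is borrowed from this thread, and the token value in the usage note is a placeholder. Only `q` and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed field name, taken from the discussion in this thread.
ALLOW_FIELD = "__ALLOW_TOKEN__document"


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def allow_clause(tokens):
    """OR together one field:"token" term per access token."""
    if not tokens:
        # Refuse to build a clause that would silently restrict nothing.
        raise ValueError("no access tokens supplied")
    return " OR ".join(f"{ALLOW_FIELD}:{quote_term(t)}" for t in tokens)


def as_filter_query(user_query, tokens):
    """Keep the restriction in a separate fq parameter."""
    return urlencode([("q", user_query), ("fq", allow_clause(tokens))])


def as_boolean_clause(user_query, tokens):
    """Fold the restriction into the main query as a required (+) clause."""
    return urlencode([("q", f"+({user_query}) +({allow_clause(tokens)})")])
```

Calling `as_filter_query("title:report", ["myAD:S-1-1-0"])` leaves the user's query untouched and carries the restriction in `fq`, whereas `as_boolean_clause` produces a single combined `q`. Which of the two performs better is exactly the part I have not benchmarked.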

In any case, I believe either technique can be employed in either 1834 or
1872.


With regard to schema extension, I believe we need to be very careful here, as
requiring index-time storage of access control data will pose a problem for
any use cases where the access control needs to change (maybe often, maybe
only occasionally). I'm trying to think of a use case where this wouldn't at
least potentially be the case, and I can't think of one, but perhaps I'm not
truly understanding what exactly is stored in the __ALLOW_TOKEN__ and
__DENY_TOKEN__ fields, and how/where subsequent acl changes would fit in
(e.g. let's say someone has left my organization, do I have to update
documents to remove his/her access?).

Also, would such indexed tokens be entirely 'document-context-free'? I.e.
would the same type/format of tokens be used for data from different sources
(e.g. NTFS files, network streams, NFS, web pages, etc.)? Will the tokens be
compatible with multiple and/or changing authorities (e.g. AD, Documentum,
LDAP, custom, etc.)?

I like the idea of an LCF plugin to hold the acl data. I admit, I've not had
enough time to look into how this might look at the moment, but it sounds
like it could be a good way to hold generic (authority-agnostic) acl data,
and [hopefully] not have to tie it to document data at index-time.

I hope this makes sense, but if I've misunderstood the proposed mechanism,
please correct me. Would the __ALLOW_TOKEN__ et al fields store, for
example, SID information?


Thanks,
Peter



On Tue, Apr 27, 2010 at 10:21 PM, <karl.wri...@nokia.com> wrote:

> Ok, not hearing back from Peter, I've done some Solr research and written
> some code that might work.  The approach I've taken is most similar to SOLR
> 1834, other than the LCF-centric logic.  Hopefully there will be a chance to
> try this out in a full end-to-end way  on the weekend, after which I will
> submit it to the Solr team (where I think it most naturally would be built
> and delivered).
>
> What it's going to need is either a static or dynamic schema addition to
> define __ALLOW_TOKEN__document, __DENY_TOKEN__document,
> __ALLOW_TOKEN__share, and __DENY_TOKEN__share fields.  These should be
> string, multivalued fields (I think).  It would be great if these could be
> made a default part of Solr; similarly, it would be good if the new search
> component was predelivered with Solr and mentioned (even if commented out)
> in the example solrconfig.xml file.  The only other thing that needs to be
> done to hook up the search component is to include a configuration parameter
> describing the base URL of the LCF authority service.  Plus, as I said
> earlier, we still don't have a canned solution for authentication yet -
> although I feel that will be straightforward.
>
> Comments welcome...
> Karl
>
>
> ________________________________________
> From: Wright Karl (Nokia-S/Cambridge)
> Sent: Tuesday, April 27, 2010 8:20 AM
> To: connectors-dev@incubator.apache.org; d...@lucene.apache.org
> Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org
> Subject: RE: FW: Solr and LCF security at query time
>
> Hi Peter,
>
> I finally had a moment to review the SOLR 1872 and SOLR 1834 contributions
> in detail, and have a couple of SOLR-related questions.
>
> Both contributions rely on a SearchComponent to work their magic.  However,
> it also appears that each modifies the user query in a different way.  1834
> uses MUST, MUST_NOT, and SHOULD filter items, while 1872 uses standard AND
> and OR filterquery clauses.  Both of them are constructed using Solr
> FilterQuery objects.  Here are my questions:
>
> (1) I am not conversant enough with Solr yet to know the difference between
> the different kinds of clause structure.  Do you know if there is a
> difference?  For example, is there any possibility that AND/OR clauses can
> permit documents to be seen that should not be seen?  (MUST and MUST_NOT
> sound a lot more definite...)
>
> (2) Are Solr FilterQuery objects applied to constructing the query that
> will be sent to Lucene?  Or are they applied by Solr after-the-fact to the
> resultset?  Or, is it a combination of the two, depending on the details of
> your actual filter clause?
>
> I also haven't heard much from you in the last week or so - have you
> thought further about what you intend to do, and can you let me know whether
> you are still interested in developing an LCF plugin for Solr?
>
> Thanks,
> Karl
>
> -----Original Message-----
> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> Sent: Thursday, April 22, 2010 12:23 PM
> To: d...@lucene.apache.org
> Cc: connectors-dev@incubator.apache.org;
> connectors-u...@incubator.apache.org; lucene-...@apache.org
> Subject: Re: FW: Solr and LCF security at query time
>
> Hi Karl,
>
> See inline...
>
> On Thu, Apr 22, 2010 at 4:57 PM, <karl.wri...@nokia.com> wrote:
>
> > Hi Peter,
> >
> > The authority connectors don't perform authentication at this time.
> > In fact, LCF has nothing to do with authentication at all - just
> > authorization.
> >  The reason for this is that it is almost never the case that
> > somebody wants to provide multiple credentials in order to be able to see
> > their results.
> >  Most enterprises who have multiple repositories authenticate against
> > AD and then map AD user names to repository user names in order to
> > access those repositories.  If you noted my earlier posts from this
> > morning, you may have noted that I'm looking at recommending JAAS plus
> > Sun's kerb5 login module for handling the "authenticate against AD"
> > case, which would cover some 95%+ of the real world authentication needed
> > out there.
> >
> >
> I did read your earlier post regarding this, and I totally agree with you -
> this is best handled 'upstream'. In fact, I use a JAAS plugin in other
> places in the product (not Solr) for authentication.
>
>
> >
> > Yes, the idea is to store SIDs in solr at index time.  I don't know
> > enough about solr to know what kinds of issues this might entail, but
> > Lucene certainly has a model of metadata that's pretty flexible, so I
> > don't think this would be difficult at all.  Eric Hatcher also seemed
> > to confirm my suspicions that this would not be a problem.
> >
>
> It's certainly not a problem to store this data in Solr. The problem is
> more that you don't really *want* to store this data at index time.
> There are lots of reasons for not wanting to 'hard-code' SID data with
> documents in the index. Here's just a few:
>  * What happens if/when you want to add explicit user access to some [group
> of] documents? (i.e. not via a group)
>  * What happens if you need to revoke or change a user's or group's access?
>  * It's difficult to move/replicate the index to another domain
>  * For AD, SIDs are generally not meant to be stored long term outside of
> AD, as they can be changed (this doesn't happen often, but it can happen
> after an AD rebuild, domain type upgrade, data recovery etc.)
>
> These and other scenarios mean re-indexing the stored data. When the index
> is huge, this is non-trivial (time-wise). There are not uncommon scenarios
> where user/group access control can change multiple times in one day.
>
> There might be a way of storing acl data in a payload or similar, but I'm
> not sure how that would work across millions of [arbitrarily grouped]
> documents (I'm not familiar enough with payloads to know if this would be a
> good or bad idea).
>
>
> >
> > This is exactly why I think that we need to do the authentication
> > upstream of the authority world.
> >
> >
> Agreed.
>
>
>
> >
> > If Solr handles arbitrary document metadata, then I think we could
> > just use that feature.  But you know more about it than me, at this
> > point.  It would be great to get an overview of potential ways of doing
> > this.
> >
> >
> Payloads, maybe?
>
>
> >
> > For your particular task, it sounds like you are trying to read from
> > NTFS and apply security after-the-fact with some acl specification
> > file.  In that case, I'd write a repository connector that was based
> > on the file system connector (already part of the stable of connectors
> > for LCF) which reads ACL information from your acl.xml file.  Or, if
> > you prefer a UI for specifying ACL information, you could extend the
> > connector so that security is configured in the UI without having an
> > external acl.xml file at all - which would be a nice addition to the
> > existing file system connector.  (Repository connections and jobs are
> > configured internally in LCF by XML documents stored in the database,
> > so they can be arbitrarily structured.  I'm happy to help you figure
> > out how to do this if this is what you decide to do.)
> >
> For my particular requirements, there are no files - the data is generated
> from the network and stored. After the fact, there is no persistent
> location of this data other than in Solr.
>
> Storing the acl info using the connector sounds very interesting. Could be
> worth looking at in more detail. Thanks!
>
>
>
> > I think we still need to add in the authentication piece to make this
> > all work for you, so perhaps you can describe how you expect a user to
> > interact with your system, so I can understand your design issues.
> >
> > Thanks,
> > Karl
> >
> >
>
>
>
>
>
>
>
>
> > -----Original Message-----
> > From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > Sent: Thursday, April 22, 2010 11:32 AM
> > To: d...@lucene.apache.org
> > Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org;
> > connectors-dev@incubator.apache.org
> > Subject: Re: FW: Solr and LCF security at query time
> >
> > Hi Karl,
> >
> > Thanks very much for your detailed explanation - really good!
> >
> > As I've thought through some of the implications, I've added comments
> > below, so I hope they don't seem too jumbled...
> >
> > I suppose on the 'authority' side, it works kind of as I envisioned it
> > would.
> >
> > For general Solr access control, there are two layers of security that
> > need to be addressed:
> >  1. Authentication - make sure the incoming query is from a valid
> > user, and the passed-in credentials (hash, certificate etc.) are
> > correct
> >  2. Query filtering - potentially reduce the number/type of
> > returned results based on the allow/deny metadata for the
> > authenticated user
> >
> > I can see how the LCF auth connector works for 2., but can it do 1. as
> > well?
> > It would be good if this could somehow be integrated into any
> > container (Tomcat/Jetty et al) authentication that might be configured
> > (probably related to your previous post). In many ways, it could/should
> > be that the Authority (AD) part of the connector should only be
> > concerned with 1. and not 2. (see below).
> >
> > So, on the repository side, there is also an LCF connector that
> > 'closes the loop' to provide the 'what is it I'm trying to control' side
> > of things.
> > I understand that LCF doesn't do the mapping - it delegates this task
> > to the caller, but provides both sides of the equation (authority &
> > repository).
> >
> > >>>>>
> > - Each file in DirectoryA will have the following
> > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890",
> > and "myAD:S-23-64-12345".
> > - Each file in DirectoryB will have the following
> > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890"
> > <<<<<
> > I think this is the bit that is worrying me - is this storing the SIDs
> > into Solr at document index time? This would be a problem for a whole
> > load of reasons, but maybe I'm missing something here? (see below for
> > a possible alternative)
> >
> > Basically, what I'm getting at here is that the allow/deny values need
> > to be stored in one of three places:
> >  1. In the authority (e.g. inside AD)
> >  2. In the document metadata (index-time)
> >  3. In external storage (e.g. acl.xml, NTFS etc.)
> >
> > 1. Extending AD is pretty much out, as this causes too many interop
> > problems
> > 2. 'Hard-coding' acl information in the index makes it
> > non-portable, resistant to changes, etc.
> > 3. acl.xml is coupled with a Solr instance, but is easily
> > ported/replicated.
> > Storing/retrieving acl information from the source (e.g. NTFS) is
> > problematic, as the source may not be accessible (it may not even exist).
> >
> > I believe 3. or a variant is the way to go on the repo side, which
> > means the LCF Authority connector is mainly for Authentication (see
> > above), which is what you want from AD et al integration.
> > The problem that arises from 'pluggable' authentication is that, if
> > you're not using a certificate, you have to start with a password, but
> > the connector only has access to the password hash (unless the pwd is
> > sent in the query url). I don't know of a way to confirm identities in
> > AD using only the username and hash (AD does the hash compare). I
> > believe this is where container-based integration will likely work
> > better.
> >
> > So that I can confirm my understanding...a scenario might be like this:
> >
> > We have an AD connector that fetches the SIDs and we can read them etc.
> > For my environment, where there are no 'files' (there's only a
> > transient network stream), we have an LCF 'Solr Field Filter Query'
> > connector that decides which Filter Queries to apply (allow and deny)
> > for the passed in SID(s).
> >
> > For another environment, let's say, NTFS, there might be an 'NTFS'
> > connector that would provide some kind of mapping of files/folders to
> > SID(s). Since Solr wouldn't intrinsically know about this, the acl
> > information would need to be stored somewhere in the index. This would
> > mean extending the Solr schema and storing metadata at index time.
> > The alternative is to re-use the 'Solr Field Filter Query' connector
> > for this as well (and any other document types that might be read in).
> > This keeps the index 'clean' of acl-specific metadata, and allows for
> > in-place changes and easy cross-document/index/instance access control.
> >
> >
> > If the above interpretation is [roughly] correct (please let me know
> > if I've got this wrong!), this would reduce down to having:
> >   1. One or more LCF Authority connectors (e.g. AD, Documentum, etc.)
> > (possibly/partly at the container level)
> >   2. At least an LCF Repository connector for 'acl.xml'
> >   3. Optional other LCF Repository connectors
> >
> > It sounds like you've now finished the first half of 1. by adding the
> > ability to get the required auth data from a Solr api call. The other
> > half of 1. will be implementing the LCF interface in the
> > SolrACLSecurity class, to effectively replace the 'user', 'group' and
> > 'password' bits of acl.xml.
> >
> > Does the above sound like an accurate interpretation? Just trying to
> > get a good picture of what work needs doing, where it goes, etc.
> >
> > Many thanks!
> > Peter
> >
> >
> >
> >
> > On Thu, Apr 22, 2010 at 2:52 PM, <karl.wri...@nokia.com> wrote:
> >
> > >  >>>>>>
> > > What is the relationship between stored data (documents) and
> > > authorities' access/deny attributes? (do you have any examples of
> > > what an access_token value might contain?)
> > > <<<<<<
> > >
> > > Documents have access/deny attributes; authorities simply provide
> > > the list of tokens that belong to an authenticated user.  Thus,
> > > there's no access/deny for an authority; that's attached to the
> > > document (as it is in real-world repositories).
> > >
> > > Let's run a quick example, using Active Directory and a Windows file
> > > system.  Suppose that you have a directory with documents in it,
> > > call it DirectoryA, and the directory allows read access to the
> > > following SIDs:
> > >
> > > S-123-456-76890
> > > S-23-64-12345
> > >
> > > These SIDs correspond to active directory groups, let's call them
> > > Group1 and Group2, respectively.
> > >
> > > DirectoryB also has documents in it, and those documents have just
> > > the SID S-123-456-76890 attached, because only Group1 can read its
> > > contents.
> > >
> > > Now, pretend that someone has created an LCF Active Directory
> > > authority connection (in the LCF UI), which is called "myAD", and
> > > this connection is set up to talk to the governing AD domain
> > > controller for this Windows file system.  We now know enough to
> > > describe the document indexing process:
> > >
> > > - Each file in DirectoryA will have the following
> > > __ALLOW_TOKEN__document attributes inside Solr:
> > > "myAD:S-123-456-76890", and "myAD:S-23-64-12345".
> > > - Each file in DirectoryB will have the following
> > > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890"
> > >
> > > Now, suppose that a user (let's call him "Peter") is authenticated
> > > with the AD domain controller.  Peter belongs to Group2, so his SIDs
> > > are (say):
> > >
> > > S-1-1-0 (the 'everyone' SID)
> > > S-323-999-12345 (his own personal user SID)
> > > S-23-64-12345 (the SID he gets because he belongs to group 2)
> > >
> > > We want to look up the documents in the search index that he can see.
> > > So, we ask the LCF authority service what his tokens are, and we get
> > > back:
> > >
> > > "myAD:S-1-1-0", "myAD:S-323-999-12345", and "myAD:S-23-64-12345"
> > >
> > > The documents we should return in his search are the ones matching
> > > his search criteria, PLUS the intersection of his tokens with the
> > > document ALLOW tokens, MINUS the intersection of his tokens with the
> > > document DENY tokens (there aren't any involved in this example).
> > > So only files that have one of his three tokens as an ALLOW
> > > attribute would be returned.
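
The rule described above can be sketched as plain set logic. This is a minimal illustration in Python using the SIDs from the example, not LCF or Solr code; the function name is hypothetical, and treating any matching DENY token as overriding an ALLOW match is an assumed reading of the "MINUS" step.

```python
def can_see(user_tokens, allow_tokens, deny_tokens=()):
    """Visible if the user holds at least one ALLOW token and no DENY token."""
    user = set(user_tokens)
    has_allow = bool(user & set(allow_tokens))
    has_deny = bool(user & set(deny_tokens))
    return has_allow and not has_deny


# Tokens returned by the authority service for Peter in the example.
peter = {"myAD:S-1-1-0", "myAD:S-323-999-12345", "myAD:S-23-64-12345"}

# ALLOW tokens attached to the documents at index time.
directory_a = {"myAD:S-123-456-76890", "myAD:S-23-64-12345"}
directory_b = {"myAD:S-123-456-76890"}

print(can_see(peter, directory_a))  # True: shares the Group2 token
print(can_see(peter, directory_b))  # False: no token in common
```

In practice the same test would be expressed as query or filter clauses against the index rather than evaluated per document in application code.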
> > >
> > > Note that what we are attempting to do is enforce AD's security with
> > > the search results we present.  There is no need to define a whole
> > > new security mechanism, because AD already has one that people use.
> > >
> > > >>>>>>
> > > One of the key requirements I've worked to adhere to in SOLR-1872 is
> > > to ensure there are no security or other dependencies of indexed
> > > data with any external repository - most notably the file system.
> > > There are many reasons for wanting this, but one of the main ones is
> > > that Solr-stored data is not always based on file data (or
> > > accessible file data).
> > > In fact, in my particular case, almost none of the indexed data
> > > comes from files.
> > > <<<<<<
> > >
> > > LCF is all about abstracting from repositories.  It's not
> > > specifically about a file system, although that is a convenient
> > > example.  If you are building your own kind of repository with your
> > > own security setup, that's fine - but in the LCF world you'd need to
> > > create an authority connector for your repository (which maybe reads
> > > your acl.xml file), as well as a repository connector (which hands
> > > documents to LCF and provides it with the access tokens that make
> > > security work).  Of course, you can do something much lighter that
> > > doesn't include LCF at all if you are just integrating a custom
> > > repository of your own, but it sounded like you were interested in the
> > > broader problem here.
> > >
> > > So, LCF doesn't do "acl mapping" at all.  It relies on its various
> > > connectors to work cooperatively to define access tokens in a way
> > > that is consistent from authority connector to repository connector
> > > for a given repository kind.  Anybody can write a connector, so the
> > > beauty of all this is that you can build a system where data from
> > > many disparate sources is indexed, and security for each is
> > > simultaneously enforced.
> > >
> > > Karl
> > >
> > >
> > >  ------------------------------
> > > *From:* ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > > *Sent:* Thursday, April 22, 2010 9:24 AM
> > >
> > > *To:* d...@lucene.apache.org
> > > *Cc:* connectors-u...@incubator.apache.org; lucene-...@apache.org;
> > > connectors-dev@incubator.apache.org
> > > *Subject:* Re: FW: Solr and LCF security at query time
> > >
> > > Hi Karl,
> > >
> > > Thanks very much for the diagram -
> > > Sorry about all the questions, but this raises a few new ones...
> > >
> > > What is the relationship between stored data (documents) and
> > > authorities' access/deny attributes? (do you have any examples of
> > > what an access_token value might contain?)
> > >
> > > One of the key requirements I've worked to adhere to in SOLR-1872 is
> > > to ensure there are no security or other dependencies of indexed
> > > data with any external repository - most notably the file system.
> > > There are many reasons for wanting this, but one of the main ones is
> > > that Solr-stored data is not always based on file data (or
> > > accessible file data).
> > > In fact, in my particular case, almost none of the indexed data
> > > comes from files.
> > >
> > > This is one reason why SOLR-1872 uses filter queries for its
> > > access/deny tokens - so that all the required information for access
> > > control completely resides within the Solr index itself.
> > > Is the LCF architecture acl 'mapping' between Solr fields (queries)
> > > and users, some external 'repository' (files) and users, or
> > > arbitrary data (e.g. either of these)?
> > >
> > > I hope that makes sense...
> > >
> > > Thanks!
> > > Peter
> > >
> > >
> > >
> > >
> > > On Thu, Apr 22, 2010 at 10:25 AM, <karl.wri...@nokia.com> wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> I've attached a diagram that is not in the wiki as of yet, and I'll
> > >> try to answer your questions.
> > >>
> > >> >>>>>>
> > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been
> > >> stored for a particular user in the underlying acl store (e.g.
> > >> Active Directory)?
> > >> How does AD and/or LCF handle storing such data in its schema?
> > >> (does AD need its schema extended?) Presumably, any such AD fields
> > >> would need to be queried for effective rights in order to cater for
> > >> group membership allows and denies.
> > >> <<<<<<
> > >>
> > >> The ACCESS_TOKEN and DENY_TOKEN values are, in one sense, arbitrary
> > >> strings that represent a contract between an LCF authority
> > >> connection and the LCF repository connection that picks up the
> > >> documents (from wherever).
> > >>  These tokens thus have no real meaning outside of LCF.  You must
> > >> regard them as opaque.
> > >>
> > >> The contract, however, states that if you use the LCF authority
> > >> service to obtain tokens for an authenticated user, you will get
> > >> back a set that is CONSISTENT with the tokens that were attached to
> > >> the documents LCF sent to Solr for indexing in the first place.
> > >> So, you don't have to worry about it, and that's kind of the idea.
> > >> So you can imagine the following flow:
> > >>
> > >> (1) Use LCF to fetch documents and send them to Solr
> > >> (2) When searching, use the LCF authority service to get the
> > >> desired user's access tokens
> > >> (3) Either filter the results, or modify the query, to be sure the
> > >> access tokens all match up properly
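
Step (3), in its "modify the query" form, might look like the following minimal Python sketch. The function name and default field names are assumptions for illustration (the thread uses both ACCESS and ALLOW spellings for the field), and the token values are treated as opaque strings. Whether a purely negative filter behaves as intended should be checked against the Solr version in use.

```python
def security_filters(tokens,
                     allow_field="__ALLOW_TOKEN__document",
                     deny_field="__DENY_TOKEN__document"):
    """Build two filter strings: match some ALLOW token, match no DENY token."""
    if not tokens:
        # Without tokens there is nothing the user may see; do not emit
        # filters that could be mistaken for "no restriction".
        raise ValueError("user has no access tokens")
    quoted = [
        '"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for t in tokens
    ]
    joined = " OR ".join(quoted)
    return [f"{allow_field}:({joined})", f"-{deny_field}:({joined})"]
```

Each returned string would be sent as its own filter alongside the user's unmodified search, so the tokens themselves never need to be interpreted by the caller.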
> > >>
> > >> For the AD authority, the LCF access tokens consist, in part, of
> > >> the user's SIDs.  For other authorities, the access tokens are
> > >> wildly different.
> > >>  You really don't want to know what's in them, since that's the job
> > >> of the LCF authority to determine. ;-)
> > >>
> > >> LCF is not, by the way, joined at the hip with AD.  However, in
> > >> practice, most enterprises in the world use some form of AD single
> > >> signon for their web applications, and even if they're using some
> > >> repository with its own idea of security, there's a mapping between
> > >> the AD users and the repository's users.  Doing that mapping is
> > >> also the job of the LCF authority for that repository.
> > >>
> > >> Hope this helps.  Also, I'm not expecting time miracles here, so
> > >> don't sweat the schedule.
> > >>
> > >>
> > >> Karl
> > >>
> > >>
> > >> ________________________________________
> > >> From: ext Peter Sturge [peter.stu...@googlemail.com]
> > >> Sent: Thursday, April 22, 2010 4:27 AM
> > >> To: d...@lucene.apache.org
> > >> Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org;
> > >> connectors-dev@incubator.apache.org
> > >> Subject: Re: FW: Solr and LCF security at query time
> > >>
> > >> Hi Karl,
> > >>
> > >> Thanks for the quick turnaround.
> > >> I'm in the middle of a product release for us, so I fear I won't be
> > >> as quick as you... :-)
> > >>
> > >> I couldn't find a simple flow diagram or similar for LCF with
> > >> regards security (probably looking in the wrong place).
> > >> Perhaps you could help on these questions...?
> > >>
> > >> In SOLR-1872, the allows and denies are stored (in acl.xml) as
> > >> sub-queries, which are then used as filter queries in a user's search.
> > >>
> > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been
> > >> stored for a particular user in the underlying acl store (e.g.
> > >> Active Directory)?
> > >> How does AD and/or LCF handle storing such data in its schema?
> > >> (does AD need its schema extended?) Presumably, any such AD fields
> > >> would need to be queried for effective rights in order to cater for
> > >> group membership allows and denies.
> > >>
> > >> I guess I'm just trying to understand the architectural
> > >> flow/storage/retrieval of data in the various parts of the system,
> > >> but I admit, I need to do more research on this.
> > >> After our product release, when I get a few more spare cycles, I
> > >> can look at it in more detail.
> > >>
> > >> Many thanks!
> > >> Peter
> > >>
> > >>
> > >>
> > >> On Thu, Apr 22, 2010 at 1:02 AM, <karl.wri...@nokia.com> wrote:
> > >> Hi Peter,
> > >>
> > >> I just committed the promised changes to the LCF Solr output
> > >> connector.
> > >>
> > >> ACL metadata will now be posted to the Solr Http interface along
> > >> with the document as the two following fields:
> > >>
> > >> __ACCESS_TOKEN__document
> > >> __DENY_TOKEN__document
> > >>
> > >> There will, of course, potentially be multiple values for each of
> > >> these two fields.
> > >>
> > >> Hope this helps,
> > >> Karl
> > >>
> > >> ________________________________
> > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > >> Sent: Tuesday, April 20, 2010 6:51 PM
> > >>
> > >> To: connectors-u...@incubator.apache.org
> > >> Subject: Re: FW: Solr and LCF security at query time
> > >>
> > >> Hi Karl,
> > >>
> > >> Thanks for the info. I'll have a look at the link and try to take
> > >> in as much sugar as my insulin levels will handle...
> > >> It sounds like the necessary interface(s) are already in LCF - just
> > >> a matter of implementing them in the Solr 1872 plugin.
> > >> I'll need to digest the LCF stuff to get to grips with it..please
> > >> bear with me while I do that...
> > >>
> > >> When you say:
> > >>   The LCF solr output connection doesn't yet do this, but it is
> > >> trivial for me to make that happen.
> > >> Do you mean a mechanism by which solr.war can get url et al info
> > >> from its parent container (Tomcat, Jetty etc.), or have I
> > >> misinterpreted this?
> > >>
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Apr 20, 2010 at 11:05 PM, <karl.wri...@nokia.com> wrote:
> > >> Hi Peter,
> > >>
> > >> I'm the principal committer for LCF, but I don't know as much about
> > >> Solr as I ought to, so it sounds like a potentially productive
> > >> collaboration.
> > >>
> > >> LCF does exactly what you are looking for - the only issue at all
> > >> is that you need to fetch a URL from a webapp to get what you are
> > >> looking for.  The "plugs" are all inside LCF for different kinds of
> > >> repositories.  Here's a link that might help with drinking the LCF
> > >> "koolaid", as it were:
> > >> https://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts
> > >>
> > >> The url would be something like this (on a locally installed
> > >> tomcat-based LCF instance):
> > >>
> > >>
> > >> http://localhost:8080/lcf-authority-service/UserACLs?username=someusern...@somedomain.com
> > >>
> > >> ... and this fetch returns something like:
> > >>
> > >> TOKEN:xxxxxxx
> > >> TOKEN:yyyyyyy
> > >> TOKEN:zzzzzzz
> > >> ....
> > >>
> > >> ... which represent the amalgamated tokens for all of the defined
> > >> authorities, and by some strange coincidence ( ;-) ) are compatible
> > >> with certain pieces of metadata that have been passed into Solr
> > >> with each document - one set of Allow tokens, and a second set of
> > >> Deny tokens.  The LCF solr output connection doesn't yet do this,
> > >> but it is trivial for me to make that happen.
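
A caller could read such a response with something like the minimal Python sketch below. It assumes only what is shown above, namely one `TOKEN:` value per line; the function name is hypothetical, anything else in the real response format is not covered, and the HTTP fetch itself is left out.

```python
def parse_authority_response(body):
    """Collect the values of all 'TOKEN:' lines, ignoring anything else."""
    prefix = "TOKEN:"
    tokens = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            tokens.append(line[len(prefix):])
    return tokens


sample = "TOKEN:xxxxxxx\nTOKEN:yyyyyyy\nTOKEN:zzzzzzz\n"
print(parse_authority_response(sample))  # ['xxxxxxx', 'yyyyyyy', 'zzzzzzz']
```

The resulting list is what would then be compared against the Allow and Deny metadata stored with each document.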
> > >>
> > >> Does this sound plausible to you?
> > >>
> > >> Karl
> > >>
> > >>
> > >> ________________________________
> > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > >> Sent: Tuesday, April 20, 2010 5:41 PM
> > >> To: connectors-u...@incubator.apache.org; d...@lucene.apache.org
> > >>
> > >> Subject: Re: FW: Solr and LCF security at query time
> > >>
> > >> Hi Karl,
> > >>
> > >> Integrating LCF to get external token support for SOLR-1872 sounds
> > >> very interesting indeed. I don't know anything about LCF, but one
> > >> of the things I was planning for SOLR-1872 is to make acl.xml (or
> > >> rather its behaviour) 'pluggable' - i.e. it would just be one of a
> > >> series of plugins that could be used for obtaining back-end
> > >> authentication information.
> > >>
> > >> If you're good with LCF, perhaps we could work together to build
> > >> this in.
> > >> One of the first things would be defining an interface that would
> > >> be as easy as possible to plug LCF into. Have you any
> > >> suggestions/insight on this front?
> > >>
> > >> Many thanks,
> > >> Peter
> > >>
> > >>
> > >>
> > >> On Tue, Apr 20, 2010 at 4:08 PM, <karl.wri...@nokia.com> wrote:
> > >> SOLR-1872 looks exactly like what I was envisioning, from the
> > >> search query perspective, although instead of the acl xml file you
> > >> specify, LCF stipulates that you would dynamically query the
> > >> lcf-authority-service servlet for the access tokens themselves.
> > >> That would get you support for AD, Documentum, LiveLink, Meridio,
> > >> and Memex for free. It seems likely that this component could be
> > >> modified to work with LCF with minor effort.
> > >>
> > >> The missing component still seems to be AD authentication, which
> > >> needs a solution.
> > >>
> > >> Karl
> > >>
> > >> ________________________________
> > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > >> Sent: Tuesday, April 20, 2010 10:44 AM
> > >> To: d...@lucene.apache.org<mailto:d...@lucene.apache.org>
> > >> Subject: Re: FW: Solr and LCF security at query time
> > >>
> > >> If you want to do this completely within Solr, have a look at:
> > >> SOLR-1834 and SOLR-1872. These use a SearchComponent plugin for Solr.
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>
> > >>
> > >> On Tue, Apr 20, 2010 at 1:25 PM, <karl.wri...@nokia.com<mailto:
> > >> karl.wri...@nokia.com>> wrote:
> > >> FYI
> > >>
> > >> ________________________________
> > >> From: Wright Karl (Nokia-S/Cambridge)
> > >> Sent: Tuesday, April 20, 2010 8:16 AM
> > >> To: 'dominique.bej...@eolya.fr<mailto:dominique.bej...@eolya.fr>'
> > >> Cc: 'solr-...@apache.org<mailto:solr-...@apache.org>'; '
> > >> connectors-dev@incubator.apache.org<mailto:
> > >> connectors-dev@incubator.apache.org>'; '
> > >> connectors-u...@incubator.apache.org<mailto:
> > >> connectors-u...@incubator.apache.org>'
> > >> Subject: RE: Solr and LCF security at query time
> > >>
> > >> Dominique,
> > >>
> > >> Yes, I am aware of this ticket and contribution.  Luckily LCF
> > >> establishes a powerful multi-repository security model, even though
> > >> it doesn't yet do the final step of enforcing that model at the
> > >> search end.  LCF allows you to define multiple authorities to
> > >> operate against disparate repositories, and use the appropriate
> > >> authority to secure any given document.  The Solr people are aware
> > >> of this design, which addresses the issues raised by SOLR-1834 very
> > >> nicely.  However, as I said before, time is a problem, and the work
> > >> still needs to be done.
> > >>
> > >> I suggest you read up on the actual security model of LCF, and
> > >> perhaps experiment with that and the SOLR-1834 contribution, to see
> > >> if there is common ground.  One thing we've learned at MetaCarta is
> > >> that post-filtering for security purposes is expensive, and it is
> > >> better to modify the queries themselves to restrict the results, if
> > >> possible.  I'm not sure which approach SOLR-1834 takes, although it
> > >> sounds like it might be the filtering approach.  Still, it would be
> > >> better than nothing.
> > >>
> > >> Please let me know what you find out.
> > >>
> > >> Thanks,
> > >> Karl
> > >>
> > >> ________________________________
> > >> From: ext Dominique Bejean [mailto:dominique.bej...@eolya.fr<mailto:
> > >> dominique.bej...@eolya.fr>]
> > >> Sent: Tuesday, April 20, 2010 8:03 AM
> > >> To: Wright Karl (Nokia-S/Cambridge)
> > >> Cc: connectors-u...@incubator.apache.org<mailto:
> > >> connectors-u...@incubator.apache.org>;
> > >> connectors-dev@incubator.apache.org<mailto:
> > >> connectors-dev@incubator.apache.org>
> > >> Subject: Re: Solr and LCF security at query time
> > >>
> > >> Karl,
> > >>
> > >> Thank you for your reply.
> > >>
> > >> I did some research today and found these:
> > >> http://freesurf001.appspot.com/issues.apache.org/jira/browse/SOLR-1834
> > >> http://demo.findwise.se:8880/SolrSecurity/
> > >>
> > >> The Solr security model has to be able to filter a result list with
> > >> items coming from various sources at the same time (LiveLink,
> > >> Documentum, file system, ...). Big subject :)
> > >>
> > >> Dominique
> > >>
> > >>
> > >> On 20/04/10 13:34,
> > >> karl.wri...@nokia.com<mailto:karl.wri...@nokia.com> wrote:
> > >> Hi Dominique,
> > >>
> > >> At the moment, in order to enforce the LCF security model within
> > >> Lucene/Solr, you will need to build this functionality into
> > >> whatever client you are using to display the Lucene search results.
> > >> Specifically, you would need to take the following steps:
> > >>
> > >> (1) Have your users access your search client through Apache.
> > >> (2) Use the Apache module mod_auth_kerb, combined with LCF's
> > >> mod_authz_annotate, to cause authorization HTTP headers to be
> > >> transmitted to the client webapp.
> > >> (3) Have your client webapp alter whatever queries it is doing, to
> > >> add an appropriate query clause for each of the access tokens
> > >> transmitted in the headers.
> > >>
> > >> (This is how it is done at MetaCarta.)
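
Step (3) above could be sketched roughly as follows, in Python for brevity. This is only an illustration under stated assumptions: the field names allow_token and deny_token are placeholders rather than the real schema, secure_query is a hypothetical helper, and the policy shown (a document must carry at least one of the user's tokens as an allow token and none as a deny token) is one plausible reading of the model, not a confirmed description of the MetaCarta implementation.

```python
from typing import Iterable


def quote_token(token: str) -> str:
    """Wrap a token in double quotes, escaping backslashes and quotes."""
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def secure_query(user_query: str, tokens: Iterable[str],
                 allow_field: str = "allow_token",  # assumed field name
                 deny_field: str = "deny_token") -> str:  # assumed field name
    """Return the user's query with access-token clauses added."""
    token_list = list(tokens)
    if not token_list:
        # No tokens for this user: exclude every document.
        return "+(" + user_query + ") -*:*"
    joined = " OR ".join(quote_token(t) for t in token_list)
    return "+({q}) +{a}:({t}) -{d}:({t})".format(
        q=user_query, a=allow_field, d=deny_field, t=joined)


print(secure_query("title:report", ["S-1-5-32", "staff"]))
```

Because the restriction is part of the query itself, nothing has to be filtered out of the result list afterwards. Note that under this sketch a document with no allow tokens at all would never match, so public documents would need their own convention.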
> > >>
> > >> Alternatively, you may find a way to do this completely with a web
> > >> application under a Java app server such as Tomcat.  I have not yet
> > >> done the research to find out whether this is a feasible alternative.
> > >> Effectively, you need something like mod_auth_kerb to authenticate
> > >> your user against Active Directory, or whatever the authenticator
> > >> ought to be.  JAAS may be helpful here.
> > >>
> > >> There are, of course, intentions to fill out the missing pieces
> > >> more completely and transparently via a Solr search plugin and/or
> > >> filter.  What has been lacking is time.  If you are in a position
> > >> to do development in this area, we're happy to have any assistance
> > >> you might provide.
> > >>
> > >> Thanks,
> > >> Karl
> > >> ________________________________
> > >> From: ext Dominique Bejean [mailto:dominique.bej...@eolya.fr]
> > >> Sent: Tuesday, April 20, 2010 5:06 AM
> > >> To: connectors-u...@incubator.apache.org<mailto:
> > >> connectors-u...@incubator.apache.org>
> > >> Subject: Solr and LCF security at query time
> > >>
> > >> Hi,
> > >>
> > >> I can't see in the LCF wiki how Solr and LCF work together at query
> > >> time to remove from the result list the items the user is not
> > >> allowed to access.
> > >>
> > >> In
> > >> http://cwiki.apache.org/CONNECTORS/lucene-connectors-framework-concepts.html,
> > >> I just see these sentences:
> > >>
> > >> "Once all these documents and their access tokens are handed to
> > >> the search engine, it is the search engine's job to enforce
> > >> security by excluding inappropriate documents from the search
> > >> results. For Lucene, this infrastructure is expected to be built on
> > >> top of Lucene's generic metadata abilities, but has not been
> > >> implemented at this time."
> > >>
> > >> I am not sure I understand. Does this mean that, for the moment, it
> > >> is not possible for Solr to apply security by using an Authority
> > >> Connector?
> > >>
> > >> Dominique
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>
> > >
> > >
> >
