Hi Karl, Apologies for the delayed reply. I've been away on business, and in the middle of a product release, so it's been a busy time...
In response to your earlier questions: the 'AND/OR' filter queries will ultimately map down to Lucene Boolean clauses, although the point at which this happens is slightly different. I think I am correct in my understanding that with filter queries, the results are filtered 'post-Lucene', but are cached separately (by Solr), so you take a performance hit on the first search, but then benefit from cached hits on subsequent searches. The lower-level 'MUST NOT/SHOULD' etc. clauses are applied to the Lucene query directly, so they don't have separate Solr caching. I've not benchmarked the two, so one or the other might be slower/faster for various search scenarios. In any case, I believe either technique can be employed in either 1834 or 1872.

With regard to schema extension, I believe we need to be very careful here, as requiring index-time storage of access control data will pose a problem for any use case where the access control needs to change (maybe often, maybe only occasionally). I'm trying to think of a use case where this wouldn't at least potentially be the case, and I can't think of one, but perhaps I'm not truly understanding what exactly is stored in the __ALLOW_TOKEN__ and __DENY_TOKEN__ fields, and how/where subsequent acl changes would fit in (e.g. let's say someone has left my organization: do I have to update documents to remove his/her access?).

Also, would such indexed tokens be entirely 'document-context-free'? I.e. would the same type/format of tokens be used for data from different sources (e.g. NTFS files, network streams, NFS, web pages, etc.)? Will the tokens be compatible with multiple and/or changing authorities (e.g. AD, Documentum, LDAP, custom, etc.)?

I like the idea of an LCF plugin to hold the acl data. I admit I've not had enough time to look into how this might look at the moment, but it sounds like it could be a good way to hold generic (authority-agnostic) acl data, and [hopefully] not have to tie it to document data at index-time.
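To make the query-time side of this concrete, here is a minimal Python sketch of how a user's tokens could be turned into a Solr filter query (fq). The field names __ALLOW_TOKEN__document and __DENY_TOKEN__document are the ones proposed in this thread; the function names, the example search term, and the choice of one combined fq clause are assumptions for illustration only, and this is not the code in SOLR-1834 or SOLR-1872.

```python
from urllib.parse import urlencode


def quote_token(token: str) -> str:
    """Wrap a token in double quotes, escaping backslashes and quotes,
    so characters such as ':' and '-' are not parsed as query syntax."""
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_acl_filter(user_tokens,
                     allow_field="__ALLOW_TOKEN__document",
                     deny_field="__DENY_TOKEN__document") -> str:
    """Build a filter clause: the document must carry at least one of the
    user's tokens as an allow value and none of them as a deny value."""
    tokens = list(user_tokens)
    if not tokens:
        # Assumption: a user with no tokens should never reach the search.
        raise ValueError("at least one access token is required")
    values = " OR ".join(quote_token(t) for t in tokens)
    return f"+{allow_field}:({values}) -{deny_field}:({values})"


def search_params(user_query: str, user_tokens) -> str:
    """URL-encode the user's query plus the ACL filter as Solr parameters."""
    return urlencode({"q": user_query, "fq": build_acl_filter(user_tokens)})
```

Because the ACL clause travels in fq rather than in q, it is a candidate for Solr's separate filter caching mentioned above; whether that is faster than adding Boolean clauses to the main query has not been benchmarked here.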
I hope this makes sense, but if I've misunderstood the proposed mechanism, please correct me. Would the __ALLOW_TOKEN__ et al fields store, for example, SID information? Thanks, Peter On Tue, Apr 27, 2010 at 10:21 PM, <karl.wri...@nokia.com> wrote: > Ok, not hearing back from Peter, I've done some Solr research and written > some code that might work. The approach I've taken is most similar to SOLR > 1834, other than the LCF-centric logic. Hopefully there will be a chance to > try this out in a full end-to-end way on the weekend, after which I will > submit it to the Solr team (where I think it most naturally would be built > and delivered). > > What it's going to need is either a static or dynamic schema addition to > define __ALLOW_TOKEN__document, __DENY_TOKEN__document, > __ALLOW_TOKEN__share, and __DENY_TOKEN__share fields. These should be > string, multivalued fields (I think). It would be great if these could be > made a default part of Solr; similarly, it would be good if the new search > component was predelivered with Solr and mentioned (even if commented out) > in the example solrconfig.xml file. The only other thing that needs to be > done to hook up the search component is to include a configuration parameter > describing the base URL of the LCF authority service. Plus, as I said > earlier, we still don't have a canned solution for authentication yet - > although I feel that will be straightforward. > > Comments welcome... > Karl > > > ________________________________________ > From: Wright Karl (Nokia-S/Cambridge) > Sent: Tuesday, April 27, 2010 8:20 AM > To: connectors-dev@incubator.apache.org; d...@lucene.apache.org > Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org > Subject: RE: FW: Solr and LCF security at query time > > Hi Peter, > > I finally had a moment to review the SOLR 1872 and SOLR 1834 contributions > in detail, and have a couple of SOLR-related questions. 
> > Both contributions rely on a SearchComponent to work their magic. However, > it also appears that each modifies the user query in a different way. 1834 > uses MUST, MUST_NOT, and SHOULD filter items, while 1872 uses standard AND > and OR filterquery clauses. Both of them are constructed using Solr > FilterQuery objects. Here are my questions: > > (1) I am not conversant enough with Solr yet to know the difference between > the different kinds of clause structure. Do you know if there is a > difference? For example, is there any possibility that AND/OR clauses can > permit documents to be seen that should not be seen? (MUST and MUST_NOT > sound a lot more definite...) > > (2) Are Solr FilterQuery objects applied to constructing the query that > will be sent to Lucene? Or are they applied by Solr after-the-fact to the > resultset? Or, is it a combination of the two, depending on the details of > your actual filter clause? > > I also haven't heard much from you in the last week or so - have you > thought further about what you intend to do, and can you let me know whether > you are still interested in developing an LCF plugin for Solr? > > Thanks, > Karl > > -----Original Message----- > From: ext Peter Sturge [mailto:peter.stu...@googlemail.com] > Sent: Thursday, April 22, 2010 12:23 PM > To: d...@lucene.apache.org > Cc: connectors-dev@incubator.apache.org; > connectors-u...@incubator.apache.org; lucene-...@apache.org > Subject: Re: FW: Solr and LCF security at query time > > Hi Karl, > > See inline... > > On Thu, Apr 22, 2010 at 4:57 PM, <karl.wri...@nokia.com> wrote: > > > Hi Peter, > > > > The authority connectors don't perform authentication at this time. > > In fact, LCF has nothing to do with authentication at all - just > authorization. > > The reason for this is because it is almost never the case that > > somebody wants to provide multiple credentials in order to be able see > their results. 
> > Most enterprises who have multiple repositories authenticate against
> > AD and then map AD user names to repository user names in order to
> > access those repositories. If you noted my earlier posts from this
> > morning, you may have noted that I'm looking at recommending JAAS plus
> > Sun's Krb5 login module for handling the "authenticate against AD"
> > case, which would cover some 95%+ of the real world authentication
> > needed out there.
>
> I did read your earlier post regarding this, and I totally agree with you -
> this is best handled 'upstream'. In fact, I use a JAAS plugin in other
> places in the product (not Solr) for authentication.
>
> > Yes, the idea is to store SIDs in Solr at index time. I don't know
> > enough about Solr to know what kinds of issues this might entail, but
> > Lucene certainly has a model of metadata that's pretty flexible, so I
> > don't think this would be difficult at all. Erik Hatcher also seemed
> > to confirm my suspicions that this would not be a problem.
>
> It's certainly not a problem to store this data in Solr. The problem is
> more that you don't really *want* to store this data at index time.
> There are lots of reasons for not wanting to 'hard-code' SID data with
> documents in the index. Here are just a few:
> * What happens if/when you want to add explicit user access to some
>   [group of] documents (i.e. not via a group)?
> * What happens if you need to revoke or change a user's or group's access?
> * It's difficult to move/replicate the index to another domain.
> * For AD, SIDs are generally not meant to be stored long term outside of
>   AD, as they can be changed (this doesn't happen often, but it can happen
>   after an AD rebuild, domain type upgrade, data recovery etc.)
>
> These and other scenarios mean re-indexing the stored data. When the index
> is huge, this is non-trivial (time-wise). It is not uncommon for
> user/group access control to change multiple times in one day.
> > There might be a way of storing acl data in a payload or similar, but I'm > not sure how that would work across millions of [arbitrarily grouped ] > documents (I'm not familiar enough with payloads to know if this would be a > good or bad idea). > > > > > > This is exactly why I think that we need to do the authentication > > upstream of the authority world. > > > > > Agreed. > > > > > > > If Solr handles arbitrary document metadata, then I think we could > > just use that feature. But you know more about it than me, at this > > point. It would be great to get an overview of potential ways of doing > this. > > > > > Payloads, maybe? > > > > > > For your particular task, it sounds like you are trying to read from > > NTFS and apply security after-the-fact with some acl specification > > file. In that case, I'd write a repository connector that was based > > on the file system connector (already part of the stable of connectors > > for LCF) which reads ACL information from your acl.xml file. Or, if > > you prefer a UI for specifying ACL information, you could extend the > > connector so that security is configured in the UI without having an > > external acl.xml file at all - which would be a nice addition to the > > existing file system connector. (Repository connections and jobs are > > configured internally in LCF by XML documents stored in the database, > > so they can be arbitrarily structured. I'm happy to help you figure > > out how to do this if this is what you decide to do.) > > > > For my particular requirements, there are no files - the data is > > generated > from the network and stored. After the fact, there is no persistent > location of this data other than in Solr. > > Storing the acl info using the connector sounds very interesting. Could be > worth looking at in more details. Thanks! 
> > I think we still need to add in the authentication piece to make this
> > all work for you, so perhaps you can describe how you expect a user to
> > interact with your system, so I can understand your design issues.
> >
> > Thanks,
> > Karl
> >
> > -----Original Message-----
> > From: ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > Sent: Thursday, April 22, 2010 11:32 AM
> > To: d...@lucene.apache.org
> > Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org;
> > connectors-dev@incubator.apache.org
> > Subject: Re: FW: Solr and LCF security at query time
> >
> > Hi Karl,
> >
> > Thanks very much for your detailed explanation - really good!
> >
> > As I've thought through some of the implications, I've added comments
> > below, so I hope they don't seem too jumbled...
> >
> > I suppose on the 'authority' side, it works kind of as I envisioned it
> > would.
> >
> > For general Solr access control, there are two layers of security that
> > need to be addressed:
> > 1. Authentication - make sure the incoming query is from a valid
> >    user, and the passed-in credentials (hash, certificate etc.) are
> >    correct
> > 2. Query filtering - potentially reduce the number/type of
> >    returned results based on the allow/deny metadata for the
> >    authenticated user
> >
> > I can see how the LCF auth connector works for 2., but can it do 1. as
> > well?
> > It would be good if this could somehow be integrated into any
> > container (Tomcat/Jetty et al) authentication that might be configured
> > (probably related to your previous post). In many ways, it could/should
> > be that the Authority (AD) part of the connector should only be
> > concerned with 1. and not 2. (see below).
> >
> > So, on the repository side, there is also an LCF connector that
> > 'closes the loop' to provide the 'what is it I'm trying to control'
> > side of things.
> > I understand that LCF doesn't do the mapping - it delegates this task
> > to the caller, but provides both sides of the equation (authority &
> > repository).
> >
> > >>>>>
> > - Each file in DirectoryA will have the following
> >   __ALLOW_TOKEN__document attributes inside Solr:
> >   "myAD:S-123-456-76890", and "myAD:S-23-64-12345".
> > - Each file in DirectoryB will have the following
> >   __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890"
> > <<<<<
> > I think this is the bit that is worrying me - is this storing the SIDs
> > into Solr at document index time? This would be a problem for a whole
> > load of reasons, but maybe I'm missing something here? (see below for
> > a possible alternative)
> >
> > Basically, what I'm getting at here is that the allow/deny values need
> > to be stored in one of three places:
> > 1. In the authority (e.g. inside AD)
> > 2. In the document metadata (index-time)
> > 3. In external storage (e.g. acl.xml, NTFS etc.)
> >
> > 1. Extending AD is pretty much out, as this causes too many interop
> >    problems.
> > 2. 'Hard-coding' acl information in the index makes it
> >    non-portable, resistant to changes, etc.
> > 3. acl.xml is coupled with a Solr instance, but is easily
> >    ported/replicated.
> > Storing/retrieving acl information from the source (e.g. NTFS) is
> > problematic, as the source may not be accessible (it may not even exist).
> >
> > I believe 3. or a variant is the way to go on the repo side, which
> > means the LCF Authority connector is mainly for Authentication (see
> > above), which is what you want from AD et al integration.
> > The problem that arises from 'pluggable' authentication is that, if
> > you're not using a certificate, you have to start with a password, but
> > the connector only has access to the password hash (unless the pwd is
> > sent in the query url). I don't know of a way to confirm identities in
> > AD using only the username and hash (AD does the hash compare).
> > I believe this is where container-based integration will likely work
> > better.
> >
> > So that I can confirm my understanding...a scenario might be like this:
> >
> > We have an AD connector that fetches the SIDs and we can read them etc.
> > For my environment, where there are no 'files' (there's only a
> > transient network stream), we have an LCF 'Solr Field Filter Query'
> > connector that decides which Filter Queries to apply (allow and deny)
> > for the passed-in SID(s).
> >
> > For another environment, let's say, NTFS, there might be an 'NTFS'
> > connector that would provide some kind of mapping of files/folders to
> > SID(s). Since Solr wouldn't intrinsically know about this, the acl
> > information would need to be stored somewhere in the index. This would
> > mean extending the Solr schema and storing metadata at index time.
> > The alternative is to re-use the 'Solr Field Filter Query' connector
> > for this as well (and any other document types that might be read in).
> > This keeps the index 'clean' of acl-specific metadata, and allows for
> > in-place changes and easy cross-document/index/instance access control.
> >
> > If the above interpretation is [roughly] correct (please let me know
> > if I've got this wrong!), this would reduce down to having:
> > 1. One or more LCF Authority connectors (e.g. AD, Documentum, etc.)
> >    (possibly/partly at the container level)
> > 2. At least an LCF Repository connector for 'acl.xml'
> > 3. Optional other LCF Repository connectors
> >
> > It sounds like you've now finished the first half of 1. by adding the
> > ability to get the required auth data from a Solr api call. The other
> > half of 1. will be implementing the LCF interface in the
> > SolrACLSecurity class, to effectively replace the 'user', 'group' and
> > 'password' bits of acl.xml.
> >
> > Does the above sound like an accurate interpretation? Just trying to
> > get a good picture of what work needs doing, where it goes, etc.
> > > > Many thanks! > > Peter > > > > > > > > > > On Thu, Apr 22, 2010 at 2:52 PM, <karl.wri...@nokia.com> wrote: > > > > > >>>>>> > > > What is the relationship between stored data (documents) and > authorities' > > > access/deny attributes? (do you have any examples of what an > > > access_token value might contain?) <<<<<< > > > > > > Documents have access/deny attributes; authorities simply provide > > > the list of tokens that belong to an authenticated user. Thus, > > > there's no access/deny for an authority; that's attached to the > > > document (as it is in real-world repositories). > > > > > > Let's run a quick example, using Active Directory and a Windows file > > > system. Suppose that you have a directory with documents in it, > > > call it DirectoryA, and the directory allows read access to the > > > following > > SIDs: > > > > > > S-123-456-76890 > > > S-23-64-12345 > > > > > > These SIDs correspond to active directory groups, let's call them > > > Group1 and Group2, respectively. > > > > > > DirectoryB also has documents in it, and those documents have just > > > the SID S-123-456-76890 attached, because only Group1 can read its > contents. > > > > > > Now, pretend that someone has created an LCF Active Directory > > > authority connection (in the LCF UI), which is called "myAD", and > > > this connection is set up to talk to the governing AD domain > > > controller for this Windows file system. We now know enough to > > > describe the document > > indexing process: > > > > > > - Each file in DirectoryA will have the following > > > __ALLOW_TOKEN__document attributes inside Solr: > > > "myAD:S-123-456-76890", > > and "myAD:S-23-64-12345". > > > - Each file in DirectoryB will have the following > > > __ALLOW_TOKEN__document attributes inside Solr: "myAD:S-123-456-76890" > > > > > > Now, suppose that a user (let's call him "Peter") is authenticated > > > with the AD domain controller. 
Peter belongs to Group2, so his SIDs > > > are > > (say): > > > > > > S-1-1-0 (the 'everyone' SID) > > > S-323-999-12345 (his own personal user SID) > > > S-23-64-12345 (the SID he gets because he belongs to group 2) > > > > > > We want to look up the documents in the search index that he can see. > > > So, we ask the LCF authority service what his tokens are, and we get > > back: > > > > > > "myAD:S-1-1-0", "myAD:S-323-999-12345", and "myAD:S-23-64-12345" > > > > > > The documents we should return in his search are the ones matching > > > his search criteria, PLUS the intersection of his tokens with the > > > document ALLOW tokens, MINUS the intersection of his tokens with the > > > document DENY tokens (there aren't any involved in this example). > > > So only files that have one of his three tokens as an ALLOW > > > attribute would be > > returned. > > > > > > Note that what we are attempting to do is enforce AD's security with > > > the search results we present. There is no need to define a whole > > > new security mechanism, because AD already has one that people use. > > > > > > >>>>>> > > > One of the key requirements I've worked to adhere to in SOLR-1872 is > > > to ensure there are no security or other dependencies of indexed > > > data with any external repository - most notably the file system. > > > There are many reasons for wanting this, but one of the main ones is > > > that Solr-stored data is not always based on file data (or > > > accessible > > file data). > > > In fact, in my particular case, almost none of the indexed data > > > comes from files. > > > <<<<<< > > > > > > LCF is all about abstracting from repositories. It's not > > > specifically about a file system, although that is a convenient > > > example. 
> > > If you are building your own kind of repository with your
> > > own security setup, that's fine - but in the LCF world you'd need to
> > > create an authority connector for your repository (which maybe reads
> > > your acl.xml file), as well as a repository connector (which hands
> > > documents to LCF and provides it with the access tokens that make
> > > security work). Of course, you can do something much lighter that
> > > doesn't include LCF at all if you are just integrating a custom
> > > repository of your own, but it sounded like you were interested in
> > > the broader problem here.
> > >
> > > So, LCF doesn't do "acl mapping" at all. It relies on its various
> > > connectors to work cooperatively to define access tokens in a way
> > > that is consistent from authority connector to repository connector
> > > for a given repository kind. Anybody can write a connector, so the
> > > beauty of all this is that you can build a system where data from
> > > many disparate sources is indexed, and security for each is
> > > simultaneously enforced.
> > >
> > > Karl
> > >
> > > ------------------------------
> > > *From:* ext Peter Sturge [mailto:peter.stu...@googlemail.com]
> > > *Sent:* Thursday, April 22, 2010 9:24 AM
> > > *To:* d...@lucene.apache.org
> > > *Cc:* connectors-u...@incubator.apache.org; lucene-...@apache.org;
> > > connectors-dev@incubator.apache.org
> > > *Subject:* Re: FW: Solr and LCF security at query time
> > >
> > > Hi Karl,
> > >
> > > Thanks very much for the diagram -
> > > Sorry about all the questions, but this raises a few new ones...
> > >
> > > What is the relationship between stored data (documents) and
> > > authorities' access/deny attributes? (do you have any examples of
> > > what an access_token value might contain?)
> > > > > > One of the key requirements I've worked to adhere to in SOLR-1872 is > > > to ensure there are no security or other dependencies of indexed > > > data with any external repository - most notably the file system. > > > There are many reasons for wanting this, but one of the main ones is > > > that Solr-stored data is not always based on file data (or > > > accessible > > file data). > > > In fact, in my particular case, almost none of the indexed data > > > comes from files. > > > > > > This is one reason why SOLR-1872 uses filter queries for its > > > access/deny tokens - so that all the required information for access > > > control completely resides within the Solr index itself. > > > Is the LCF architecture acl 'mapping' between Solr fields (queries) > > > and users, some external 'repository' (files) and users, or > > > arbitrary > > data (e.g. > > > either of these)? > > > > > > I hope that makes sense... > > > > > > Thanks! > > > Peter > > > > > > > > > > > > > > > On Thu, Apr 22, 2010 at 10:25 AM, <karl.wri...@nokia.com> wrote: > > > > > >> Hi Peter, > > >> > > >> I've attached a diagram that is not in the wiki as of yet, and I'll > > >> try to answer your questions. > > >> > > >> >>>>>> > > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been > > >> stored for a particular user in the underlying acl store (e.g. > > >> Active > > Directory)? > > >> How does AD and/or LCF handle storing such data in its schema? > > >> (does AD needs its schema extended?) Presumably, any such AD fields > > >> would need to be queried for effective rights in order to cater for > > >> group membership allows and denies. > > >> <<<<<< > > >> > > >> The ACCESS_TOKEN and DENY_TOKEN values are, in one sense, arbitrary > > >> strings that represent a contract between an LCF authority > > >> connection and the LCF repository connection that picks up the > > >> documents (from > > wherever). > > >> These tokens thus have no real meaning outside of LCF. 
You must > > >> regard them as opaque. > > >> > > >> The contract, however, states that if you use the LCF authority > > >> service to obtain tokens for an authenticated user, you will get > > >> back a set that is CONSISTENT with the tokens that were attached to > > >> the documents LCF sent to Solr for indexing in the first place. > > >> So, you don't have to worry about it, and that's kind of the idea. > > >> So you > > imagine the following flow: > > >> > > >> (1) Use LCF to fetch documents and send them to Solr > > >> (2) When searching, use the LCF authority service to get the > > >> desired user's access tokens > > >> (3) Either filter the results, or modify the query, to be sure the > > >> access tokens all match up properly > > >> > > >> For the AD authority, the LCF access tokens consist, in part, of > > >> the user's SIDs. For other authorities, the access tokens are > > >> wildly > > different. > > >> You really don't want to know what's in them, since that's the job > > >> of the LCF authority to determine. ;-) > > >> > > >> LCF is not, by the way, joined at the hip with AD. However, in > > >> practice, most enterprises in the world use some form of AD single > > >> signon for their web applications, and even if they're using some > > >> repository with its own idea of security, there's a mapping between > > >> the AD users and the repository's users. Doing that mapping is > > >> also the job of the LCF authority for that repository. > > >> > > >> Hope this helps. Also, I'm not expecting time miracles here, so > > >> don't sweat the schedule. 
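The three-step flow above can be sketched in Python as follows. This is only an illustration under stated assumptions: the service URL and the one-"TOKEN:"-per-line response shape are taken from the description elsewhere in this thread, the example SIDs are the ones from the DirectoryA/DirectoryB walk-through, and the helper names, file names, user name, and sample response body are hypothetical. It shows the "filter the results" variant of step (3); no network call is made.

```python
from urllib.parse import urlencode

# Base URL as quoted in this thread for a locally installed LCF instance.
AUTHORITY_BASE = "http://localhost:8080/lcf-authority-service/UserACLs"

# Hypothetical response body, using the example tokens from this thread.
SAMPLE_RESPONSE = """TOKEN:myAD:S-1-1-0
TOKEN:myAD:S-323-999-12345
TOKEN:myAD:S-23-64-12345
"""

# Hypothetical indexed documents with their allow/deny token sets.
DOCS = {
    "DirectoryA/report.txt": {
        "allow": {"myAD:S-123-456-76890", "myAD:S-23-64-12345"},
        "deny": set(),
    },
    "DirectoryB/plan.txt": {
        "allow": {"myAD:S-123-456-76890"},
        "deny": set(),
    },
}


def authority_url(username: str, base: str = AUTHORITY_BASE) -> str:
    """Step (2): build the URL to ask the authority service for tokens."""
    return f"{base}?{urlencode({'username': username})}"


def parse_tokens(body: str) -> set:
    """Collect the opaque tokens from 'TOKEN:...' lines, ignoring the rest."""
    prefix = "TOKEN:"
    return {line.strip()[len(prefix):]
            for line in body.splitlines()
            if line.strip().startswith(prefix)}


def is_visible(user_tokens: set, allow: set, deny: set) -> bool:
    """A document is visible if the user holds an allow token and no deny token."""
    return bool(user_tokens & allow) and not (user_tokens & deny)


def visible_documents(user_tokens: set, docs: dict) -> list:
    """Step (3), post-filter variant: keep only documents the user may see."""
    return sorted(name for name, acl in docs.items()
                  if is_visible(user_tokens, acl["allow"], acl["deny"]))
```

With the sample data, a user holding the Group2 token would see only the DirectoryA document, matching the walk-through. The tokens are compared as opaque strings, in keeping with the contract described above.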
> > >> > > >> > > >> Karl > > >> > > >> > > >> ________________________________________ > > >> From: ext Peter Sturge [peter.stu...@googlemail.com] > > >> Sent: Thursday, April 22, 2010 4:27 AM > > >> To: d...@lucene.apache.org > > >> Cc: connectors-u...@incubator.apache.org; lucene-...@apache.org; > > >> connectors-dev@incubator.apache.org > > >> Subject: Re: FW: Solr and LCF security at query time > > >> > > >> Hi Karl, > > >> > > >> Thanks for the quick turnaround. > > >> I'm in the middle of a product release for us, so I fear I won't be > > >> as quick as you... :-) > > >> > > >> I couldn't find a simple flow diagram or similar for LCF with > > >> regards security (probably looking in the wrong place). > > >> Perhaps you could help on these questions...? > > >> > > >> In SOLR-1872, the allows and denies are stored (in acl.xml) as > > >> sub-queries, which are then used as filter queries in a user's search. > > >> > > >> Are the ACCESS_TOKEN and DENY_TOKEN values whatever have been > > >> stored for a particular user in the underlying acl store (e.g. > > >> Active > > Directory)? > > >> How does AD and/or LCF handle storing such data in its schema? > > >> (does AD needs its schema extended?) Presumably, any such AD fields > > >> would need to be queried for effective rights in order to cater for > > >> group membership allows and denies. > > >> > > >> I guess I'm just trying to understand the architectural > > >> flow/storage/retrieval of data in the various parts of the system, > > >> but I admit, I need to do more research on this. > > >> After our product release, when I get a few more spare cycles, I > > >> can look at it in more detail. > > >> > > >> Many thanks! > > >> Peter > > >> > > >> > > >> > > >> On Thu, Apr 22, 2010 at 1:02 AM, <karl.wri...@nokia.com<mailto: > > >> karl.wri...@nokia.com>> wrote: > > >> Hi Peter, > > >> > > >> I just committed the promised changes to the LCF Solr output > connector. 
> > >> > > >> ACL metadata will now be posted to the Solr Http interface along > > >> with the document as the two following fields: > > >> > > >> __ACCESS_TOKEN__document > > >> __DENY_TOKEN__document > > >> > > >> There will, of course, potentially be multiple values for each of > > >> these two fields. > > >> > > >> Hope this helps, > > >> Karl > > >> > > >> ________________________________ > > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com<mailto: > > >> peter.stu...@googlemail.com>] > > >> Sent: Tuesday, April 20, 2010 6:51 PM > > >> > > >> To: connectors-u...@incubator.apache.org<mailto: > > >> connectors-u...@incubator.apache.org> > > >> Subject: Re: FW: Solr and LCF security at query time > > >> > > >> Hi Karl, > > >> > > >> Thanks for the info. I'll have a look at the link and try to take > > >> in as much sugar as my insulin levels will handle... > > >> It sounds like the necessary interface(s) are already in LCF - just > > >> a matter of implementing them in the Solr 1872 plugin. > > >> I'll need to digest the LCF stuff to get to grips with it..please > > >> bear with me while I do that... > > >> > > >> When you say: > > >> The LCF solr output connection doesn't yet do this, but it is > > >> trivial for me to make that happen. > > >> Do you mean a mechanism by which solr.war can get url et al info > > >> from its parent container (Tomcat, Jetty etc.), or have I > > >> misinterpreted > > this? > > >> > > >> > > >> Thanks, > > >> Peter > > >> > > >> > > >> > > >> > > >> On Tue, Apr 20, 2010 at 11:05 PM, <karl.wri...@nokia.com<mailto: > > >> karl.wri...@nokia.com>> wrote: > > >> Hi Peter, > > >> > > >> I'm the principal committer for LCF, but I don't know as much about > > >> Solr as I ought to, so it sounds like a potentially productive > > collaboration. > > >> > > >> LCF does exactly what you are looking for - the only issue at all > > >> is that you need to fetch a URL from a webapp to get what you are > > >> looking for. 
The "plugs" are all inside LCF for different kinds of > > >> repositories. Here's a link that might help with drinking the LCF > > "koolaid", as it were: > > >> https://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Conne > > >> ct > > >> ors+Framework+concepts > > >> > > >> The url would be something like this (on a locally installed > > >> tomcat-based LCF instance): > > >> > > >> > > >> http://localhost:8080/lcf-authority-service/UserACLs?username=someu > > >> se > > >> rn...@somedomain.com > > >> > > >> ... and this fetch returns something like: > > >> > > >> TOKEN:xxxxxxx > > >> TOKEN:yyyyyyy > > >> TOKEN:zzzzzzz > > >> .... > > >> > > >> ... which represent the amalgamated tokens for all of the defined > > >> authorities, and by some strange coincidence ( ;-) ) are compatible > > >> with certain pieces of metadata that have been passed into Solr > > >> with each document - one set of Allow tokens, and a second set of > > >> Deny tokens. The LCF solr output connection doesn't yet do this, > > >> but it is trivial for me to make that happen. > > >> > > >> Does this sound plausible to you? > > >> > > >> Karl > > >> > > >> > > >> ________________________________ > > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com<mailto: > > >> peter.stu...@googlemail.com>] > > >> Sent: Tuesday, April 20, 2010 5:41 PM > > >> To: connectors-u...@incubator.apache.org<mailto: > > >> connectors-u...@incubator.apache.org>; d...@lucene.apache.org<mailto: > > >> d...@lucene.apache.org> > > >> > > >> Subject: Re: FW: Solr and LCF security at query time > > >> > > >> Hi Karl, > > >> > > >> Integrating LCF to get external token support for SOLR-1872 sounds > > >> very interesting indeed. I don't know anything about LCF, but one > > >> of the things I was planning for SOLR-1872 is to make acl.xml (or > > >> rather its behaviour) 'pluggable' - i.e. 
it would just be one of a > > >> series of plugins that could be used for obtaining back-end > > >> authentication > > information. > > >> > > >> If you're good with LCF, perhaps we could work together to build > > >> this > > in. > > >> One of the first things would be defining an interface that would > > >> be as easy as possible to plug LCF into. Have you any > > >> suggestions/insight on this front? > > >> > > >> Many thanks, > > >> Peter > > >> > > >> > > >> > > >> On Tue, Apr 20, 2010 at 4:08 PM, <karl.wri...@nokia.com<mailto: > > >> karl.wri...@nokia.com>> wrote: > > >> SOLR-1872 looks exactly like what I was envisioning, from the > > >> search query perspective, although instead of the acl xml file you > > >> specify LCF stipulates you would dynamically query the > > >> lcf-authority-service servlet for the access tokens themselves. > > >> That would get you support for AD, Documentum, LiveLink, Meridio, > > >> and Memex for free. It seems likely that this component could be > > >> modified to work with LCF with minor > > effort. > > >> > > >> The missing component still seems to be AD authentication, which > > >> needs a solution. > > >> > > >> Karl > > >> > > >> ________________________________ > > >> From: ext Peter Sturge [mailto:peter.stu...@googlemail.com<mailto: > > >> peter.stu...@googlemail.com>] > > >> Sent: Tuesday, April 20, 2010 10:44 AM > > >> To: d...@lucene.apache.org<mailto:d...@lucene.apache.org> > > >> Subject: Re: FW: Solr and LCF security at query time > > >> > > >> If you want to do this completely within Solr, have a look at: > > >> SOLR-1834 and SOLR-1872. These use a SearchComponent plugin for Solr. 
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>
> > >>
> > >> On Tue, Apr 20, 2010 at 1:25 PM, <karl.wri...@nokia.com> wrote:
> > >> FYI
> > >>
> > >> ________________________________
> > >> From: Wright Karl (Nokia-S/Cambridge)
> > >> Sent: Tuesday, April 20, 2010 8:16 AM
> > >> To: 'dominique.bej...@eolya.fr'
> > >> Cc: 'solr-...@apache.org'; 'connectors-dev@incubator.apache.org';
> > >> 'connectors-u...@incubator.apache.org'
> > >> Subject: RE: Solr and LCF security at query time
> > >>
> > >> Dominique,
> > >>
> > >> Yes, I am aware of this ticket and contribution. Luckily LCF
> > >> establishes a powerful multi-repository security model, even though
> > >> it doesn't yet do the final step of enforcing that model at the
> > >> search end. LCF allows you to define multiple authorities to
> > >> operate against disparate repositories, and use the appropriate
> > >> authority to secure any given document. The solr people are aware
> > >> of this design, which addresses the issues raised by SOLR-1834 very
> > >> nicely. However, as I said before, time is a problem, and the work
> > >> still needs to be done.
> > >>
> > >> I suggest you read up on the actual security model of LCF, and
> > >> perhaps experiment with that and the SOLR-1834 contribution, to see
> > >> if there is common ground. One thing we've learned at MetaCarta is
> > >> that post-filtering for security purposes is expensive, and it is
> > >> better to modify the queries themselves to restrict the results, if
> > >> possible. I'm not sure which approach SOLR-1834 takes, although it
> > >> sounds like it might be the filtering approach. Still, it would be
> > >> better than nothing.
> > >>
> > >> Please let me know what you find out.
> > >>
> > >> Thanks,
> > >> Karl
> > >>
> > >> ________________________________
> > >> From: ext Dominique Bejean [mailto:dominique.bej...@eolya.fr]
> > >> Sent: Tuesday, April 20, 2010 8:03 AM
> > >> To: Wright Karl (Nokia-S/Cambridge)
> > >> Cc: connectors-u...@incubator.apache.org;
> > >> connectors-dev@incubator.apache.org
> > >> Subject: Re: Solr and LCF security at query time
> > >>
> > >> Karl,
> > >>
> > >> Thank you for your reply.
> > >>
> > >> I did some research today and found this:
> > >> http://freesurf001.appspot.com/issues.apache.org/jira/browse/SOLR-1834
> > >> http://demo.findwise.se:8880/SolrSecurity/
> > >>
> > >> The Solr security model has to be able to filter a result list with
> > >> items coming from various sources at the same time (livelink,
> > >> documentum, file system, ...). Big subject :)
> > >>
> > >> Dominique
> > >>
> > >>
> > >> On 20/04/10 13:34, karl.wri...@nokia.com wrote:
> > >> Hi Dominique,
> > >>
> > >> At the moment, in order to enforce the LCF security model within
> > >> Lucene/Solr, you will need to build this functionality into
> > >> whatever client you are using to display the Lucene search results.
> > >> Specifically, you would need to take the following steps:
> > >>
> > >> (1) Have your users access your search client through Apache.
> > >> (2) Use the Apache module mod_auth_kerb, combined with LCF's
> > >> mod_authz_annotate, to cause authorization HTTP headers to be
> > >> transmitted to the client webapp.
> > >> (3) Have your client webapp alter whatever queries it is doing, to
> > >> add an appropriate query clause for each of the access tokens
> > >> transmitted in the headers.
> > >>
> > >> (This is how it is done at MetaCarta.)
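Step (3) above, altering the query to add a clause per access token, might look something like the sketch below. It assumes each document carries one multi-valued field of allow tokens and one of deny tokens; the field names `allow_token` and `deny_token`, the function names, and the decision to reject an empty token list are all hypothetical, and the output uses standard Lucene query-parser syntax (`+` required, `-` prohibited). It is not the MetaCarta or SOLR-1872 implementation.

```python
def quote_term(value):
    """Wrap a token in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_security_clause(tokens, allow_field="allow_token",
                          deny_field="deny_token"):
    """Require a match on at least one allow token and on no deny token.

    Field names are placeholders for whatever the index actually uses.
    """
    if not tokens:
        # Assumption: a user with no tokens should see nothing, so the
        # caller is expected to short-circuit instead of querying.
        raise ValueError("no access tokens supplied")
    allow = " ".join(f"{allow_field}:{quote_term(t)}" for t in tokens)
    deny = " ".join(f"-{deny_field}:{quote_term(t)}" for t in tokens)
    return f"+({allow}) {deny}"


def secure_query(user_query, tokens):
    """Combine the user's query with the security restriction."""
    return f"+({user_query}) {build_security_clause(tokens)}"


if __name__ == "__main__":
    print(secure_query("title:report", ["xxxxxxx", "yyyyyyy"]))
```

Restricting the query itself, instead of filtering the result list afterwards, follows the preference expressed in this thread that post-filtering for security is expensive.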
> > >>
> > >> Alternatively, you may find a way to do this completely with a web
> > >> application under a Java app server such as Tomcat. I have not yet
> > >> done the research to find out whether this is a feasible alternative.
> > >> Effectively, what you need something like mod_auth_kerb to do is to
> > >> authenticate your user against Active Directory, or whatever the
> > >> authenticator ought to be.
> > >> JAAS may be helpful here.
> > >>
> > >> There are, of course, intentions to fill out the missing pieces
> > >> more completely and transparently via a Solr search plugin and/or
> > >> filter. What has been lacking is time. If you are in a position to
> > >> do development in this area, we're happy to have any assistance you
> > >> might provide.
> > >>
> > >> Thanks,
> > >> Karl
> > >> ________________________________
> > >> From: ext Dominique Bejean [mailto:dominique.bej...@eolya.fr]
> > >> Sent: Tuesday, April 20, 2010 5:06 AM
> > >> To: connectors-u...@incubator.apache.org
> > >> Subject: Solr and LCF security at query time
> > >>
> > >> Hi,
> > >>
> > >> I don't see in the LCF wiki how Solr and LCF work together at query
> > >> time in order to remove from the result list the items the user is
> > >> not allowed to access.
> > >>
> > >> In
> > >> http://cwiki.apache.org/CONNECTORS/lucene-connectors-framework-concepts.html,
> > >> I just see these sentences:
> > >>
> > >> "Once all these documents and their access tokens are handed to
> > >> the search engine, it is the search engine's job to enforce
> > >> security by excluding inappropriate documents from the search
> > >> results. For Lucene, this infrastructure is expected to be built on
> > >> top of Lucene's generic metadata abilities, but has not been
> > >> implemented at this time."
> > >>
> > >> I am not sure I understand.
Does this mean that for the moment, it
> > >> is not possible for Solr to apply security by using an Authority
> > >> Connector?
> > >>
> > >> Dominique
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: dev-h...@lucene.apache.org