Re: Solr JOIN: keeping permission data out of primary documents

2014-11-19 Thread Yonik Seeley
On Tue, Nov 18, 2014 at 3:47 PM, Philip Durbin
philip_dur...@harvard.edu wrote:
> Solr JOINs are a way to enforce simple document security, as explained
> by Yonik Seeley at
> http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html
>
> I'm trying to tweak this pattern so that I don't have to keep the
> security information in each of my primary Solr documents.
>
> I just posted the gist at
> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9 as an example of
> my working Solr JOIN based on data in `before.json`. Permissions per
> user are embedded in the primary documents like this:
>
>     {
>       "id": "dataset_3",
>       "perms_ss": [
>         "alice",
>         "bob"
>       ]
>     },
>     {
>       "id": "dataset_4",
>       "perms_ss": [
>         "alice",
>         "bob",
>         "public"
>       ]
>     },
>
> User documents have been created to do the JOIN on:
>
>     {
>       "id": "alice",
>       "groups_s": "alice"
>     },
>
> The JOIN looks like this:
>
>     {!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice

It would probably be faster written as a single join:
fq={!join+from=groups_s+to=perms_ss}id:(public alice)

Or, if you're using Heliosearch you could cache the filters separately
for better hit rates on commonly used perms via the filter keyword:
fq=filter({!join+from=groups_s+to=perms_ss}id:public) OR
filter({!join+from=groups_s+to=perms_ss}id:alice)
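A toy illustration of the hit-rate argument, assuming a plain dict keyed by filter text stands in for the filter cache (this is not how Heliosearch implements its cache internally). Two users who can both see "public" content share the `filter(...id:public)` clause when each group is cached on its own, while a combined per-user filter is a brand-new cache entry every time. `PERMS` comes from the `before.json` excerpt quoted above; the user list and helper names are hypothetical.

```python
from typing import Dict, List, Set

JOIN = "{!join from=groups_s to=perms_ss}id:"

# perms_ss values from the before.json excerpt quoted in the thread.
PERMS: Dict[str, Set[str]] = {
    "dataset_3": {"alice", "bob"},
    "dataset_4": {"alice", "bob", "public"},
}

def visible_to(groups: List[str]) -> Set[str]:
    """What the join selects: datasets whose perms_ss names any of the groups."""
    return {doc for doc, perms in PERMS.items() if perms & set(groups)}

def per_clause_fq(groups: List[str]) -> str:
    """One filter() clause per group, OR'ed together, as in the fq above."""
    return " OR ".join(f"filter({JOIN}{g})" for g in groups)

def count_hits(users: List[List[str]], split: bool) -> int:
    """Replay each user's filter against a toy dict cache; return the hit count.

    split=True caches every group clause on its own (the filter() style);
    split=False caches each user's combined filter as a single entry.
    """
    cache: Dict[str, Set[str]] = {}
    hits = 0
    for groups in users:
        units = [[g] for g in groups] if split else [groups]
        for unit in units:
            key = f"{JOIN}({' '.join(unit)})"
            if key in cache:
                hits += 1
            else:
                cache[key] = visible_to(unit)
    return hits

users = [["public", "alice"], ["public", "bob"]]  # hypothetical traffic
print(per_clause_fq(users[0]))
print(count_hits(users, split=True))   # 1: the shared "public" clause is reused
print(count_hits(users, split=False))  # 0: each combined filter is new
```

The benefit grows with the number of users sharing a common group such as "public"; with only a handful of distinct users the difference is small.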

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Solr JOIN: keeping permission data out of primary documents

2014-11-19 Thread Philip Durbin
On Wed, Nov 19, 2014 at 5:45 AM, Yonik Seeley yo...@heliosearch.com wrote:
> It would probably be faster written as a single join:
> fq={!join+from=groups_s+to=perms_ss}id:(public alice)

Hmm, I can't get the single JOIN to work on the "before" example
(perms embedded in each primary doc) in the gist I posted, so I guess
I'll live with the slower version with OR.
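For reference, a minimal in-memory model of `{!join from=F to=T}query`, assuming documents are plain Python dicts and that the join keeps every document whose `to` field shares a value with the `from` field of the documents matched by the inner query. Under that model the OR of two joins and the single join over `id:(public alice)` select the same datasets, so the single join should be a drop-in replacement once the query is encoded correctly. The "public" user document is taken from the later diff; the helper names are hypothetical.

```python
from typing import Dict, List, Set

Doc = Dict[str, object]

def values(doc: Doc, field: str) -> Set[str]:
    """A field's values as a set; single values and lists both work."""
    v = doc.get(field)
    if v is None:
        return set()
    return set(v) if isinstance(v, list) else {v}

def join(index: List[Doc], from_field: str, to_field: str, inner: List[Doc]) -> Set[str]:
    """Model of {!join from=from_field to=to_field}: ids of docs whose
    to_field shares a value with from_field of the inner query's docs."""
    keys: Set[str] = set()
    for d in inner:
        keys |= values(d, from_field)
    return {d["id"] for d in index if values(d, to_field) & keys}

def ids(index: List[Doc], wanted: Set[str]) -> List[Doc]:
    """Model of the inner query id:(a b ...)."""
    return [d for d in index if d["id"] in wanted]

index: List[Doc] = [
    {"id": "dataset_3", "perms_ss": ["alice", "bob"]},
    {"id": "dataset_4", "perms_ss": ["alice", "bob", "public"]},
    {"id": "public", "groups_s": "public"},
    {"id": "alice", "groups_s": "alice"},
]

# {!join ...}id:public OR {!join ...}id:alice
two_joins = (join(index, "groups_s", "perms_ss", ids(index, {"public"}))
             | join(index, "groups_s", "perms_ss", ids(index, {"alice"})))

# {!join ...}id:(public alice)
one_join = join(index, "groups_s", "perms_ss", ids(index, {"public", "alice"}))

print(sorted(one_join))       # ['dataset_3', 'dataset_4']
print(two_joins == one_join)  # True
```

The likely speed difference is that the single join does its from-side lookup once instead of twice; the model above only shows that the result sets match.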

> Or, if you're using Heliosearch you could cache the filters separately
> for better hit rates on commonly used perms via the filter keyword:
> fq=filter({!join+from=groups_s+to=perms_ss}id:public) OR
> filter({!join+from=groups_s+to=perms_ss}id:alice)

Getting back to my original question about keeping permission
information out of my primary documents, I noticed that
http://heliosearch.org describes the Pseudo-Join feature as "selects a
set of documents based on their relationship to a **second** set of
documents" (emphasis mine), so I assume I can't take the perms out of
my primary Solr documents and put them in a **third** set of
permission assignment documents with definition points and role
assignees (see
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9#file-after-json).
That is, the three sets of documents would be:

1. primary (datasets, with no permission info)
2. users
3. permission assignments

So, I guess I'll continue to embed permissions into the primary
documents, since it's working. :)

Thanks, Yonik. I appreciate you taking a look at this.

Phil

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin


Re: Solr JOIN: keeping permission data out of primary documents

2014-11-19 Thread Yonik Seeley
On Wed, Nov 19, 2014 at 9:22 AM, Philip Durbin
philip_dur...@harvard.edu wrote:
> Getting back to my original question about keeping permission
> information out of my primary documents, I noticed that
> http://heliosearch.org describes the Pseudo-Join feature as "selects a
> set of documents based on their relationship to a **second** set of
> documents" (emphasis mine), so I assume I can't take the perms out of
> my primary Solr documents and put them in a **third** set of
> permission assignment documents with definition points and role
> assignees (see
> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9#file-after-json).
> That is, the three sets of documents would be:
>
> 1. primary (datasets, with no permission info)
> 2. users
> 3. permission assignments

You should be able to chain joins to follow any number of links.
I don't quite understand how you mean to use your schema... but something like

fq={!join from=definition_point_s to=id}role_assignee_ss:alice

That's only following a single link and ignoring the groups_s field, so
I'm probably missing something.
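A sketch of the single-link join above and of chaining a second link, using a dict-based model of join semantics in which each hop feeds the documents it selects into the next join. The assignment documents, the user "carol", and the "staff" group are hypothetical stand-ins for `after.json`; only the field names `definition_point_s`, `role_assignee_ss`, and `groups_s` come from the thread. It models the set logic only, not how nested joins are written in a Solr request.

```python
from typing import Dict, List, Set

Doc = Dict[str, object]

def values(doc: Doc, field: str) -> Set[str]:
    """A field's values as a set; single values and lists both work."""
    v = doc.get(field)
    if v is None:
        return set()
    return set(v) if isinstance(v, list) else {v}

def where(index: List[Doc], field: str, value: str) -> List[Doc]:
    """Model of a term query such as role_assignee_ss:alice."""
    return [d for d in index if value in values(d, field)]

def join(index: List[Doc], from_field: str, to_field: str, inner: List[Doc]) -> List[Doc]:
    """Model of {!join from=... to=...}; returns docs so joins can be chained."""
    keys: Set[str] = set()
    for d in inner:
        keys |= values(d, from_field)
    return [d for d in index if values(d, to_field) & keys]

# Hypothetical index: datasets carry no permission data at all.
index: List[Doc] = [
    {"id": "dataset_3"},
    {"id": "dataset_4"},
    {"id": "assignment_1", "definition_point_s": "dataset_3",
     "role_assignee_ss": ["alice", "bob"]},
    {"id": "assignment_2", "definition_point_s": "dataset_4",
     "role_assignee_ss": ["alice", "bob", "public", "staff"]},
    {"id": "carol", "groups_s": "staff"},  # hypothetical user in a group
]

# One link: fq={!join from=definition_point_s to=id}role_assignee_ss:alice
one_link = join(index, "definition_point_s", "id",
                where(index, "role_assignee_ss", "alice"))

# Two links: user doc -> assignments naming the user's group -> datasets.
assignments = join(index, "groups_s", "role_assignee_ss", where(index, "id", "carol"))
two_links = join(index, "definition_point_s", "id", assignments)

print(sorted(d["id"] for d in one_link))   # ['dataset_3', 'dataset_4']
print(sorted(d["id"] for d in two_links))  # ['dataset_4']
```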

-Yonik


Re: Solr JOIN: keeping permission data out of primary documents

2014-11-19 Thread Philip Durbin
On Wed, Nov 19, 2014 at 11:56 AM, Yonik Seeley yo...@heliosearch.com wrote:
> You should be able to chain joins to follow any number of links.
> I don't quite understand how you mean to use your schema... but something like
>
> fq={!join from=definition_point_s to=id}role_assignee_ss:alice
>
> That's only following a single link and ignoring the groups_s field, so
> I'm probably missing something.

No, no, this is PERFECT! I think...

Again, my goal is to get away from putting the permissions in the
primary documents.

In the "before" example, I put the permissions in the primary
documents. Then I JOIN on those documents using a secondary set of
group documents: the "public" group, the "alice" group, the "bob"
group, etc.

As of the commit below, using your suggestion, in the "after" example
I've taken the permissions out of the primary documents. Instead, the
permissions go into a set of permission assignment documents. This
means that when permissions change, rather than reindexing my primary
documents (a somewhat expensive operation with many database
calls), I think I'll be able to reindex only the permission
assignment documents. As you noted, the set of group documents isn't
being used, so I deleted it.
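A sketch of what reindexing only the permission assignment documents could look like, assuming one assignment document per definition point with a stable, hypothetical id (`assignment_<dataset id>`). Because Solr replaces a document whose uniqueKey already exists, re-adding that one small document updates the grant without re-sending the dataset document. The function names are illustrative.

```python
import json
from typing import Dict, List

def assignment_doc(dataset_id: str, assignees: List[str]) -> Dict[str, object]:
    """Permission assignment doc. The id scheme is hypothetical; what matters
    is that it is stable, so re-adding the doc replaces the old version."""
    return {
        "id": f"assignment_{dataset_id}",
        "definition_point_s": dataset_id,
        "role_assignee_ss": sorted(set(assignees)),
    }

def permission_update(changes: Dict[str, List[str]]) -> str:
    """JSON array of docs for Solr's update handler: only assignment docs,
    never the (expensive to rebuild) dataset docs."""
    docs = [assignment_doc(ds, who) for ds, who in sorted(changes.items())]
    return json.dumps(docs)

# Example: make dataset_3 public without re-sending dataset_3 itself.
body = permission_update({"dataset_3": ["alice", "bob", "public"]})
print(body)
```

The body could then be posted with something like `curl 'http://localhost:8983/solr/<core>/update?commit=true' -H 'Content-Type: application/json' --data-binary @perms.json`, where the host, core name, and file name are placeholders. Whether one document per definition point or one per (definition point, role) pair works better depends on how roles are modeled.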

I'm going to play around with this in our actual code. Thanks, Yonik!

Phil

p.s. You were right about the single JOIN as well, so that's in the
commit too (looking for both the "alice" group and the "public" group
at the same time). In my haste I forgot that when testing this stuff
with curl I need to encode spaces as plus (+) signs.
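Rather than replacing spaces by hand, the request parameters can be URL-encoded programmatically. A minimal sketch with Python's standard library, assuming `q=*:*` as the main query (the thread only shows the fq): `urlencode` turns spaces into `+` and percent-encodes characters such as `{`, `!`, `:` and `(`. On the command line, curl's `-G` combined with `--data-urlencode` does a similar job.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "q": "*:*",  # assumed main query; not taken from the thread
    "fq": "{!join from=groups_s to=perms_ss}id:(public alice)",
}

# urlencode uses quote_plus by default: ' ' -> '+', reserved chars -> %XX.
query = urlencode(params)
print(query)

# Decoding gives back the exact filter text, so nothing was lost.
assert parse_qs(query)["fq"] == [params["fq"]]
```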

p.p.s. I can't seem to figure out how to link to a specific diff in a
gist but what you see below is the third revision. This one:
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9/0c0a9120299e3b0c112dc1687b89de83598fcb02

murphy:4d27fea7b431ef3bf4f9 pdurbin$ git show 0c0a912 | cat
commit 0c0a9120299e3b0c112dc1687b89de83598fcb02
Author: Philip Durbin philipdur...@gmail.com
Date:   Wed Nov 19 12:48:00 2014 -0500

A solution from Yonik Seeley! Permissions are gone from primary docs

Details at 
http://lucene.472066.n3.nabble.com/Solr-JOIN-keeping-permission-data-out-of-primary-documents-tp4169739p4169934.html

diff --git a/after.json b/after.json
index dd817e5..c2516d9 100644
--- a/after.json
+++ b/after.json
@@ -12,22 +12,6 @@
 id: dataset_4
 },
 {
-id: public,
-groups_s: public
-},
-{
-id: alice,
-groups_s: alice
-},
-{
-id: bob,
-