[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-07-03 Thread ramkrishna.s.vasudevan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052112#comment-14052112 ]

ramkrishna.s.vasudevan edited comment on HBASE-10885 at 7/4/14 3:02 AM:


[~enis]
Ping! Can I commit this to branch-1?


was (Author: ram_krish):
[~enis]
Ping! Can I commit this to 0.98?

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.99.0, 1.0.0, 0.98.4
>
> Attachments: 
> 10885-org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes-output.txt,
>  HBASE-10885_0.98_1.patch, HBASE-10885_1.patch, HBASE-10885_2.patch, 
> HBASE-10885_branch_1.patch, HBASE-10885_new_tag_type_1.patch, 
> HBASE-10885_new_tag_type_2.patch, HBASE-10885_v1.patch, 
> HBASE-10885_v12.patch, HBASE-10885_v12.patch, HBASE-10885_v13.patch, 
> HBASE-10885_v15.patch, HBASE-10885_v17.patch, HBASE-10885_v2.patch, 
> HBASE-10885_v2.patch, HBASE-10885_v2.patch, HBASE-10885_v3.patch, 
> HBASE-10885_v4.patch, HBASE-10885_v5.patch, HBASE-10885_v7.patch, 
> HBASE-10885_v8.patch, HBASE-10885_v9.patch
>
>
> Accumulo can specify visibility expressions for delete markers. During 
> compaction the cells covered by the tombstone are determined in part by 
> matching the visibility expression. This is useful for the use case of data 
> set coalescing, where entries from multiple data sets carrying different 
> labels are combined into one common large table. Later, a subset of entries 
> can be conveniently removed using visibility expressions.
> Currently doing the same in HBase would only be possible with a custom 
> coprocessor. Otherwise, a Delete will affect all cells covered by the 
> tombstone regardless of any visibility expression scoping. This is correct 
> behavior in that no data spill is possible, but certainly could be 
> surprising, and is only meant to be transitional. We decided not to support 
> visibility expressions on Deletes to control the complexity of the initial 
> implementation.
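To make the requested semantics concrete, here is a minimal Python model (not HBase or Accumulo code) of the data set coalescing case: entries from two data sets, labeled DS1 and DS2, share one table. It contrasts the current behavior, where a Delete's tombstone covers every cell regardless of visibility, with the requested behavior, where a tombstone carrying an expression covers only cells whose expression matches. The `Cell` type, the function names, and the use of plain string equality for matching are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    visibility: Optional[str]  # visibility expression, or None if untagged


def delete_row_current(cells: List[Cell], row: str) -> List[Cell]:
    """Current behavior: the tombstone covers every cell in the row,
    whatever its visibility expression."""
    return [c for c in cells if c.row != row]


def delete_row_scoped(cells: List[Cell], row: str, expression: str) -> List[Cell]:
    """Requested behavior: the tombstone covers only cells in the row whose
    visibility expression matches the one on the delete marker."""
    return [c for c in cells if not (c.row == row and c.visibility == expression)]


# Entries from two data sets, labeled DS1 and DS2, coalesced into one table.
table = [
    Cell("r1", "from-ds1", "DS1"),
    Cell("r1", "from-ds2", "DS2"),
    Cell("r2", "from-ds1", "DS1"),
]

print(len(delete_row_current(table, "r1")))        # 1: every r1 cell is gone
print(len(delete_row_scoped(table, "r1", "DS1")))  # 2: only DS1's r1 cell is gone
```

The scoped variant is what lets a subset of coalesced entries be removed without touching cells from other data sets in the same rows.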



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-24 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980520#comment-13980520 ]

Andrew Purtell edited comment on HBASE-10885 at 4/25/14 12:01 AM:
--

On sorting of terminals or not, a discussion that Ram, Anoop, and I had 
included this topic and it seems reasonable to change the serialization. I 
think we should start by splitting out the ad hoc visibility tag serialization 
in VisibilityController to a separate file. We could put magic bytes in front 
and test for those, falling back to an expensive comparison if we don't find 
the magic and otherwise using one optimized for the sorted representation. While we are 
at it we could use protobuf for the new serialization and so the magic preamble 
would be 'PBUF' I suppose. 
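A minimal sketch of the magic-preamble idea, under loud assumptions: an expression is modeled as a flat list of label ordinals (a single OR clause), the "new" format is a hypothetical 4-byte `PBUF` preamble followed by the ordinals in ascending order as big-endian ints (a stand-in for a real protobuf message, which this sketch does not use), and the "legacy" format is the same ints in arbitrary order with no preamble. Tags that both carry the preamble are compared byte-for-byte; anything else falls back to decoding and normalizing both sides.

```python
import struct
from typing import Iterable, List

MAGIC = b"PBUF"  # hypothetical preamble marking the sorted serialization


def serialize_sorted(ordinals: Iterable[int]) -> bytes:
    """New format (assumed): magic preamble, then ordinals in ascending order."""
    ordered = sorted(ordinals)
    return MAGIC + struct.pack(f">{len(ordered)}i", *ordered)


def serialize_legacy(ordinals: Iterable[int]) -> bytes:
    """Stand-in for the existing ad hoc format: arbitrary order, no preamble."""
    ords = list(ordinals)
    return struct.pack(f">{len(ords)}i", *ords)


def _decode(tag: bytes) -> List[int]:
    body = tag[len(MAGIC):] if tag.startswith(MAGIC) else tag
    return list(struct.unpack(f">{len(body) // 4}i", body))


def tags_match(a: bytes, b: bytes) -> bool:
    if a.startswith(MAGIC) and b.startswith(MAGIC):
        # Fast path: both sides are canonical, so equal expressions have equal bytes.
        return a == b
    # Slow path: decode, normalize the order, then compare.
    return sorted(_decode(a)) == sorted(_decode(b))


print(tags_match(serialize_sorted([3, 1]), serialize_sorted([1, 3])))  # True (fast path)
print(tags_match(serialize_legacy([3, 1]), serialize_sorted([1, 3])))  # True (slow path)
```

One design caveat the sketch glosses over: a real preamble has to be unambiguous against legacy data, since a legacy tag whose first four bytes happen to spell `PBUF` would be misread here.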


was (Author: apurtell):
On sorting of terminals or not, a discussion that Ram, Anoop, and I had 
included this topic and it seems reasonable to change the serialization. I 
think we should start by splitting out the custom visibility tag serialization 
in VisibilityController to a separate file. We could put magic bytes in front 
and test for those, falling back to an expensive comparison if we don't find 
the magic and otherwise using one optimized for the sorted representation. While we are 
at it we could use protobuf for the new serialization and so the magic preamble 
would be 'PBUF' I suppose. 

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.99.0, 0.98.2
>
>





[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-24 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980520#comment-13980520 ]

Andrew Purtell edited comment on HBASE-10885 at 4/25/14 12:00 AM:
--

On sorting of terminals or not, a discussion that Ram, Anoop, and I had 
included this topic and it seems reasonable to change the serialization. I 
think we should start by splitting out the custom visibility tag serialization 
in VisibilityController to a separate file. We could put magic bytes in front 
and test for those, falling back to an expensive comparison if we don't find 
the magic and otherwise using one optimized for the sorted representation. While we are 
at it we could use protobuf for the new serialization and so the magic preamble 
would be 'PBUF' I suppose. 


was (Author: apurtell):
On sorting of terminals or not, a discussion that Ram, Anoop, and I included 
this topic and it seems reasonable to change the serialization. I think we 
should start by splitting out the custom visibility tag serialization in 
VisibilityController to a separate file. We could put magic bytes in front and 
test for those, falling back to an expensive comparison if we don't find the 
magic and otherwise using one optimized for the sorted representation. While we are at 
it we could use protobuf for the new serialization and so the magic preamble 
would be 'PBUF' I suppose. 

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.99.0, 0.98.2
>
>





[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-03 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958648#comment-13958648 ]

Andrew Purtell edited comment on HBASE-10885 at 4/3/14 9:00 AM:


bq. Doing like what ACL does may be easier because we could see which subject 
issues the delete. If a super user/admin that makes the put does the delete 
then we can just allow the delete to happen.

Above I suggest splitting the authorization check and the actual delete 
handling. Do the authorization check in the preDelete hook because we have the 
user's effective label set in the RPC context. Do the delete handling in 
compaction because, for the deleteColumn or deleteFamily cases, converting 
that delete request into a set of per-cell deletes could produce an 
explosion of tombstones.
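A toy model of the tombstone cost behind that choice. It covers only the delete-handling half (visibility matching and the preDelete authorization check are left out), and the store layout, row count, and function names are hypothetical. Expanding one family delete into per-cell tombstones writes one marker per covered cell, while keeping a single family marker defers the work to compaction.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, int]  # (row, qualifier, timestamp) within one column family


def expand_to_cell_tombstones(cells: Dict[Key, str], row: str) -> List[Key]:
    """Rejected alternative: rewrite a family delete as one tombstone per cell."""
    return [key for key in cells if key[0] == row]


def compact(cells: Dict[Key, str], family_markers: Set[str]) -> Dict[Key, str]:
    """Proposed alternative: keep a single family marker per row and drop the
    covered cells only when compaction rewrites the store."""
    return {key: value for key, value in cells.items() if key[0] not in family_markers}


# One row with 1000 qualifiers, three versions each.
store = {("r1", f"q{i}", ts): "v" for i in range(1000) for ts in (1, 2, 3)}
print(len(expand_to_cell_tombstones(store, "r1")))  # 3000 tombstones written
print(len(compact(store, {"r1"})))                  # 0 cells survive one marker
```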

bq. Apart from this with the ACL delete handling case, some doubts regarding 
the handling of the deleteColumn() -  which deletes only the latest version.  
But with the current implementation even though the current version allows the 
delete with valid permissions for the user, because there is an older version 
with lesser permission we deny the delete.  Is that valid? same applies with 
deleteFamily() also.

Yes, the rule is that all visible cells with an ACL must allow the delete, or the 
delete will be denied. However, we should respect the MAX_VERSIONS of column 
families as defined in the schema when determining the scope of visibility, and 
changes are needed for that (HBASE-10899).
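A small sketch of the rule as stated, assuming each stored version carries a hypothetical set of users whose ACL permits deleting it. Every version inside the family's MAX_VERSIONS window must permit the delete; older versions fall outside the visible scope and get no veto. The data layout and the `max_versions` parameter are illustrative only; the comment points to HBASE-10899 for the actual change.

```python
from typing import List, Set, Tuple

Version = Tuple[int, Set[str]]  # (timestamp, users whose ACL allows deleting it)


def delete_allowed(versions: List[Version], user: str, max_versions: int) -> bool:
    """All visible versions must allow the delete. Versions beyond the
    family's MAX_VERSIONS are not visible, so they cannot deny it."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    visible = newest_first[:max_versions]
    return all(user in allowed for _, allowed in visible)


history = [(3, {"alice"}), (2, {"alice"}), (1, {"bob"})]  # oldest version is restricted
print(delete_allowed(history, "alice", max_versions=3))  # False: version 1 denies
print(delete_allowed(history, "alice", max_versions=2))  # True: version 1 is out of scope
```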


was (Author: apurtell):
bq. Doing like what ACL does may be easier because we could see which subject 
issues the delete. If a super user/admin that makes the put does the delete 
then we can just allow the delete to happen.

Above I suggest splitting the authorization check and the actual delete 
handling. Do the authorization check in the preDelete hook because we have the 
user's effective label set in the RPC context. Do the delete handling in 
compaction because, for the deleteColumn or deleteFamily cases, converting 
that delete request into a set of per-cell deletes could produce an 
explosion of tombstones.

bq. Apart from this with the ACL delete handling case, some doubts regarding 
the handling of the deleteColumn() -  which deletes only the latest version.  
But with the current implementation even though the current version allows the 
delete with valid permissions for the user, because there is an older version 
with lesser permission we deny the delete.  Is that valid? same applies with 
deleteFamily() also.

Yes, the rule is that all visible cells with an ACL must allow the delete, or the 
delete will be denied. However, we should respect the MAX_VERSION of the schema 
when determining the scope of visibility and so changes are needed for that 
(HBASE-10899). 

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.99.0, 0.98.2
>
>





[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-01 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956348#comment-13956348 ]

Andrew Purtell edited comment on HBASE-10885 at 4/1/14 10:42 AM:
-

bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. We can store the 
delete marker as a cell with a visibility expression tag and do the work later, 
by hooking the compaction scanner. We would check for visibility expressions in 
tags on delete markers at compaction time. If we find one, then we have to 
filter only the cells covered by the tombstone that have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions on the delete marker and any 
found while enumerating cells covered by it. 
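To illustrate why ordering terminals helps: if every terminal is replaced by its label ordinal and the operands of each `&`/`|` node are sorted before serialization, then A|B and B|A produce identical bytes, and equality becomes a byte comparison. The tuple-based tree, the ordinal table, and the text encoding are assumptions for illustration, not the serialization VisibilityController actually uses.

```python
from typing import Dict, Tuple, Union

# A node is a label name (a terminal/leaf) or a tuple (op, child, child, ...)
# with op "&" or "|". Negation is omitted to keep the sketch short.
Node = Union[str, Tuple]


def canonical(node: Node, ordinals: Dict[str, int]) -> str:
    if isinstance(node, str):
        return str(ordinals[node])  # terminal -> its label ordinal
    op, *children = node
    # Any fixed order works as long as it is deterministic; string order is used here.
    parts = sorted(canonical(child, ordinals) for child in children)
    return "(" + op.join(parts) + ")"


def serialize(node: Node, ordinals: Dict[str, int]) -> bytes:
    return canonical(node, ordinals).encode("ascii")


ordinals = {"A": 1, "B": 2, "C": 3}
print(serialize(("|", "A", "B"), ordinals) == serialize(("|", "B", "A"), ordinals))  # True
```

This only normalizes commutativity; A|(B|C) and (A|B)|C would still serialize differently unless nested nodes with the same operator are flattened first.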

If a delete marker has a visibility expression, then we only apply it to cells 
with matching visibility expressions. If a cell has no visibility tag then it 
does not match. (A|B != nil)
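A minimal model of the compaction-time filtering described here, assuming expressions are already serialized canonically (for example with terminals sorted by ordinal, as suggested above) so that matching is a plain byte comparison. A marker without an expression keeps today's behavior and covers every version at or below its timestamp; a marker with one covers only matching cells, and an untagged cell never matches. The `Cell` and `DeleteMarker` types and the filter function are hypothetical, not the real compaction scanner API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Cell:
    qualifier: str
    timestamp: int
    visibility: Optional[bytes]  # canonical serialized expression; None if untagged


@dataclass(frozen=True)
class DeleteMarker:
    qualifier: str
    timestamp: int               # covers versions at or below this timestamp
    visibility: Optional[bytes]  # None: no expression, so it covers every version


def covered(marker: DeleteMarker, cell: Cell) -> bool:
    if cell.qualifier != marker.qualifier or cell.timestamp > marker.timestamp:
        return False
    if marker.visibility is None:
        return True  # unscoped tombstone: today's behavior
    # Scoped tombstone: plain byte comparison. An untagged cell (None) never
    # equals a tagged marker, so it is not covered (A|B != nil).
    return cell.visibility == marker.visibility


def compaction_filter(cells: Iterable[Cell], markers: List[DeleteMarker]) -> Iterator[Cell]:
    """Yield only the cells that no delete marker covers."""
    for cell in cells:
        if not any(covered(m, cell) for m in markers):
            yield cell


cells = [Cell("q", 5, b"A|B"), Cell("q", 4, b"C"), Cell("q", 3, None)]
scoped = DeleteMarker("q", 10, b"A|B")
print([c.timestamp for c in compaction_filter(cells, [scoped])])  # [4, 3]
```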

Should we check that the supplied expression does not exceed the maximal 
authorization set for the user submitting the Delete in the preDelete hook? In 
other words, should we allow a user only granted authorization A to submit a 
delete with visibility expression A|B? We should not, in my opinion. Recommend 
we answer this question for other op types on another JIRA, should there be any.
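A sketch of the proposed preDelete check, assuming a deliberately simple grammar in which labels consist of letters, digits, and underscores, joined by `&`, `|`, `!` and parentheses; both helper names are hypothetical. It collects every label the expression mentions and rejects the Delete unless all of them are in the submitting user's authorizations, so a user granted only A cannot submit a Delete scoped to A|B.

```python
import re
from typing import Set


def labels_in(expression: str) -> Set[str]:
    """Every label the expression mentions; &, |, ! and parentheses are skipped.
    Assumes labels consist of letters, digits and underscores only."""
    return set(re.findall(r"[A-Za-z0-9_]+", expression))


def check_delete_visibility(expression: str, user_auths: Set[str]) -> None:
    """Reject a Delete whose expression mentions any label the user lacks."""
    missing = labels_in(expression) - user_auths
    if missing:
        raise PermissionError(f"not authorized for labels: {sorted(missing)}")


check_delete_visibility("A", {"A"})  # accepted
try:
    check_delete_visibility("A|B", {"A"})
except PermissionError as err:
    print(err)  # not authorized for labels: ['B']
```

Requiring authorization even for labels under `!` is a conservative choice of this sketch; it matches the "do not exceed the maximal authorization set" framing but is not the only possible reading.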


was (Author: apurtell):
bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. We can store the 
delete marker as a cell with a visibility expression tag and do the work later, 
by hooking the compaction scanner. We would check for visibility expressions in 
tags on delete markers at compaction time. If we find one, then we have to 
filter only the cells covered by the tombstone that have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions on the delete marker and any 
found while enumerating cells covered by it. 

If a delete marker has a visibility expression, then we only apply it to cells 
with matching visibility expressions. If a cell has no visibility tag then it 
does not match. (A|B != nil)

Should we check that the supplied expression does not exceed the maximal 
authorization set for the user submitting the Delete in the preDelete hook? In 
other words, should we allow a user only granted authorization A to submit a 
delete with visibility expression A|B? We should not, in my opinion. It is 
different for the delete case than others because delete is a destructive 
operation. Recommend we answer this question for other op types on another 
JIRA, should there be any.

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
> Fix For: 0.99.0, 0.98.2
>
>

[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-01 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956348#comment-13956348 ]

Andrew Purtell edited comment on HBASE-10885 at 4/1/14 10:40 AM:
-

bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. We can store the 
delete marker as a cell with a visibility expression tag and do the work later, 
by hooking the compaction scanner. We would check for visibility expressions in 
tags on delete markers at compaction time. If we find one, then we have to 
filter only the cells covered by the tombstone that have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions on the delete marker and any 
found while enumerating cells covered by it. 

If a delete marker has a visibility expression, then we only apply it to cells 
with matching visibility expressions. If a cell has no visibility tag then it 
does not match. (A|B != nil)

Should we check that the supplied expression does not exceed the maximal 
authorization set for the user submitting the Delete in the preDelete hook? In 
other words, should we allow a user only granted authorization A to submit a 
delete with visibility expression A|B? We should not, in my opinion. It is 
different for the delete case than others because delete is a destructive 
operation. Recommend we answer this question for other op types on another 
JIRA, should there be any.


was (Author: apurtell):
bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. I think we can check 
that the supplied expression does not exceed the maximal authorization set for 
the user submitting the Delete in the preDelete hook and then store the delete 
marker as a cell with a visibility expression tag and do the rest of the work 
later, by hooking the compaction scanner. The big change then would be checking 
for visibility expressions in tags on delete markers at compaction time. If we 
find one, then we have to filter only the cells covered by the tombstone that 
have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions on the delete marker and any 
found while enumerating cells covered by it. 

If a delete marker has a visibility expression, then we only apply it to cells 
with matching visibility expressions. If a cell has no visibility tag then it 
does not match. (A|B != nil)

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
> Fix For: 0.99.0, 0.98.2
>
>





[jira] [Comment Edited] (HBASE-10885) Support visibility expressions on Deletes

2014-04-01 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956348#comment-13956348 ]

Andrew Purtell edited comment on HBASE-10885 at 4/1/14 10:25 AM:
-

bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. I think we can check 
that the supplied expression does not exceed the maximal authorization set for 
the user submitting the Delete in the preDelete hook and then store the delete 
marker as a cell with a visibility expression tag and do the rest of the work 
later, by hooking the compaction scanner. The big change then would be checking 
for visibility expressions in tags on delete markers at compaction time. If we 
find one, then we have to filter only the cells covered by the tombstone that 
have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions on the delete marker and any 
found while enumerating cells covered by it. 

If a delete marker has a visibility expression, then we only apply it to cells 
with matching visibility expressions. If a cell has no visibility tag then it 
does not match. (A|B != nil)


was (Author: apurtell):
bq. Delete.setCellVisibility() should be supported now.

Yes.

bq. And these labels passed here will be only a list of labels and not 
visibility expressions like A|B!C?

No. Deletes should support visibility expressions just like Put, etc. The 
supplied visibility expression is then associated with the delete marker(s).

Actually, scratch what I said above in the first comment. I think we can check 
that the supplied expression does not exceed the maximal authorization set for 
the user submitting the Delete in the preDelete hook and then store the delete 
marker as a cell with a visibility expression tag and do the rest of the work 
later, by hooking the compaction scanner. The big change then would be checking 
for visibility expressions in tags on delete markers at compaction time. If we 
find one, then we have to filter only the cells covered by the tombstone that 
have a matching expression.

If we are not storing visibility expression terminals (LeafExpressionNodes) in 
sorted order by ordinal we probably should consider it. (I don't think we are.) 
Because e.g. A|B == B|A. It would be most efficient if we can simply do byte 
comparison of serialized visibility expressions. 

> Support visibility expressions on Deletes
> -
>
> Key: HBASE-10885
> URL: https://issues.apache.org/jira/browse/HBASE-10885
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.1
>Reporter: Andrew Purtell
> Fix For: 0.99.0, 0.98.2
>
>
> Accumulo can specify visibility expressions for delete markers. During 
> compaction the cells covered by the tombstone are determined in part by 
> matching the visibility expression. This is useful for the use case of data 
> set coalescing, where entries from multiple data sets carrying different 
> labels are combined into one common large table. Later, a subset of entries 
> can be conveniently removed using visibility expressions.
> Currently doing the same in HBase would only be possible with a custom 
> coprocessor. Otherwise, a Delete will affect all cells covered by the 
> tombstone regardless of whether they are visible to the user issuing the 
> delete. This is correct behavior in that no data spill is possible, but 
> certainly could be surprising, and is only meant to be transitional. We 
> decided not to support visibility expressions on Deletes to control the 
> complexity of the initial implementation.


