RE: Writing visibility labels with HFileOutputFormat2

2016-06-16 Thread Ellis, Tom (Financial Markets IT)
Hi again Ram,

"hbase.security.visibility.mutations.checkauths" - for now, running set_auths 
'client', 'system' while giving 'client' only read access on 'hbase:labels' is 
working for me.

"Coming to reading the HFile and creating a bulk load, I think we should be 
more cautious here" - sorry, I don't follow again. The Spark user writes the 
HFile and then initiates the load with LoadIncrementalHFiles.doBulkLoad, so as 
long as only the HBase user and the Spark user can read/write the file, I'm not 
sure what the risk is.

HBASE-15707 - am I able to read the HFile manually to determine if Tags have 
been written properly?

Cheers,

Tom


-Original Message-
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
Sent: 16 June 2016 06:01
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2

-- This email has reached the Bank via an external source --


Thanks for the updates. Going through the mails here:
>> Why is it that a client user without admin/super user privileges can
>> set a visibility expression using Put.setCellVisibility, but if we want
>> to write using HFiles,

I get your point now. There is a property 
"hbase.security.visibility.mutations.checkauths" which, if set, will check 
whether the user is authorized to mutate the visibility labels they are trying 
to write. If the user is not allowed to add that label, the mutation will fail.
Can you see if this solves the other problem of allowing any client user to 
write? If the above is not well documented, please feel free to raise a JIRA and 
we are happy to address it.
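As a rough illustration of what that property changes, here is a toy, stdlib-only Java model of the check as described above: with the flag on, a mutation whose visibility expression mentions a label the writer has not been granted is rejected. CheckAuthsModel, checkMutation and the simple label tokenizer are hypothetical names for this sketch; the real check happens on the HBase server side and is not reproduced here.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model (hypothetical names, not HBase classes) of the check described
// above for "hbase.security.visibility.mutations.checkauths".
class CheckAuthsModel {

    // Label tokens are runs of letters, digits and underscores; operators
    // such as & | ! ( ) separate them.
    private static final Pattern LABEL = Pattern.compile("[A-Za-z0-9_]+");

    // Throws SecurityException if the check is enabled and the expression
    // uses any label that has not been granted to the user.
    static void checkMutation(boolean checkAuths, String user, String expression,
                              Map<String, Set<String>> authsByUser) {
        if (!checkAuths) {
            return; // flag off: this model accepts any expression
        }
        Set<String> granted = authsByUser.getOrDefault(user, Set.of());
        Matcher m = LABEL.matcher(expression);
        while (m.find()) {
            String label = m.group();
            if (!granted.contains(label)) {
                throw new SecurityException(
                    "User '" + user + "' is not authorized for label '" + label + "'");
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical grants for the 'client' user.
        Map<String, Set<String>> auths = Map.of("client", Set.of("system", "confidential"));
        checkMutation(true, "client", "confidential", auths);
        System.out.println("Accepted: confidential");
        try {
            checkMutation(true, "client", "confidential&branch", auths);
        } catch (SecurityException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With these sample grants, the plain 'confidential' expression passes and 'confidential&branch' is rejected because 'branch' was never granted to 'client'.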

Coming to reading the HFile and creating a bulk load, I think we should be more 
cautious here. Some critical info is stored in the HFile, and just allowing any 
user to read it is going to be risky.

Coming to the PutSortReducer problem, I think what you say is true. I'm not sure 
if there is a bug already; if not, please feel free to raise one here.
We need to fix it.

HBASE-15707 - you may need this because with Scala's HBaseContext you need to 
ensure tags are included, just in case ImportTSV has to be used.

Write back if I have missed something or if my info was lacking. It's been quite 
some time since we worked in this area, so I have to look at the code every time 
to know what was done.

Regards
Ram


RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
So, I can see that I can correctly get the Tag lists from the 
VisibilityExpressionResolver, set them on the Cell, and write them using 
HFileOutputFormat2. However, when I scan as an unprivileged user I can still 
see the cells. If I write the cells with setCellVisibility, the unprivileged 
user can't see them.

Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase 1.1.2 - 
am I affected by this, i.e. does HFileOutputFormat2 support tags before this fix?

Cheers,

Tom Ellis
Consultant Developer – Excelian
Data Lake | Financial Markets IT
LLOYDS BANK COMMERCIAL BANKING


E: tom.el...@lloydsbanking.com
Website: www.lloydsbankcommercial.com
Reduce printing. Lloyds Banking Group is helping to build the low carbon 
economy.
Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads


RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
Looking at the source for how DefaultVisibilityLabelServiceImpl checks 
authorisation, I noted that the user just needs the 'system' label auth, not 
admin/super user privileges as I thought you meant, Ram. So technically, I could 
have a client user that is given the 'system' label auth but only read access 
to the 'hbase:labels' table?

Then that user will still be able to scan and read the labels and their 
ordinals, and create the tags correctly :) I'll give it a go.

Cheers,

Tom Ellis

RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
Thanks Ted - it was just a ClassCastException at line 161 of HFileOutputFormat2 
(in write), because I had previously read that you could give it Puts, but it 
can actually only take Cells. You can only use Puts if you use 
configureIncrementalLoad, which then sets up the PutSortReducer as I discussed 
in my other email.

Cheers,

Tom Ellis


-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: 15 June 2016 17:01
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2


Tom:
Can you pastebin the stack trace for the exception?

It would be nice if you could show a snippet of your code too.

Thanks


RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
I see now from some other examples I've found that this way of using 
HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map 
output value class of the job you give it to Put. Looking at the source for 
PutSortReducer, it seems that it will actually lose the cell visibility 
information: it uses getFamilyCellMap to create KeyValue objects and just uses 
those, whereas the CellVisibility is actually on the Put mutation.
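A toy Java model of the loss described above may help: the put-level visibility lives outside the family cell map, so any code that rebuilds cells from that map alone drops it. ToyPut, ToyCell and the two flatten methods are hypothetical stand-ins, not HBase's Put, KeyValue or PutSortReducer, and the "keeping" variant only illustrates the idea; it does not show how HBase would encode the expression as tags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical classes, not HBase's Put/KeyValue/PutSortReducer).
class VisibilityLossModel {

    static final class ToyCell {
        final String column;
        final String value;
        final String visibility; // null means "no visibility attached"

        ToyCell(String column, String value, String visibility) {
            this.column = column;
            this.value = value;
            this.visibility = visibility;
        }

        @Override
        public String toString() {
            return column + "=" + value + " [visibility=" + visibility + "]";
        }
    }

    // Stand-in for a Put: cells grouped by family, plus one put-level
    // visibility expression (the analogue of Put.setCellVisibility).
    static final class ToyPut {
        final Map<String, List<ToyCell>> familyCellMap = new LinkedHashMap<>();
        String visibility;

        ToyPut add(String family, String qualifier, String value) {
            familyCellMap.computeIfAbsent(family, f -> new ArrayList<>())
                    .add(new ToyCell(family + ":" + qualifier, value, null));
            return this;
        }
    }

    // The behaviour described in the thread: only the family cell map is
    // copied out, so the put-level visibility never reaches the cells.
    static List<ToyCell> flattenLosingVisibility(ToyPut put) {
        List<ToyCell> out = new ArrayList<>();
        for (List<ToyCell> cells : put.familyCellMap.values()) {
            out.addAll(cells);
        }
        return out;
    }

    // One conceivable remedy: copy the put-level expression onto each cell.
    // (In HBase the expression would have to become visibility tags.)
    static List<ToyCell> flattenKeepingVisibility(ToyPut put) {
        List<ToyCell> out = new ArrayList<>();
        for (List<ToyCell> cells : put.familyCellMap.values()) {
            for (ToyCell c : cells) {
                out.add(new ToyCell(c.column, c.value, put.visibility));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPut put = new ToyPut().add("f", "q1", "v1").add("f", "q2", "v2");
        put.visibility = "confidential";
        System.out.println(flattenLosingVisibility(put));
        System.out.println(flattenKeepingVisibility(put));
    }
}
```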

So I think that, unfortunately, I can only really work around this by giving the 
application user writing the HFile admin access, so that it can use the 
VisibilityExpressionResolver to create cells with tags carrying the correct 
ordinals.

Am I missing something? Why is it that a client user without admin/super user 
privileges can set a visibility expression using Put.setCellVisibility, but if 
we want to write using HFiles, the client user has to have admin/super user 
privileges so they can use VisibilityExpressionResolver to correctly create the 
tags on the Cell with correct ordinals?
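The reason the writer needs the label table is that cells written straight into an HFile must already carry visibility tags built from label ordinals rather than label names. A minimal, stdlib-only sketch of that name-to-ordinal step follows, using the ordinals from the hbase:labels scan quoted in this thread. The class name, the flat expression grammar (no parentheses), the negative-ordinal convention for "!", and one ordinal list per OR-branch are all simplifying assumptions; HBase's VisibilityExpressionResolver and its actual tag encoding differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified resolver: not HBase's VisibilityExpressionResolver
// and not HBase's tag format; it only shows the name-to-ordinal step.
class ExpressionToOrdinals {

    // Accepts a flat expression such as "branch&confidential|!protected":
    // no parentheses, '&' binds tighter than '|', '!' negates a single label.
    // Returns one list of ordinals per OR-branch, sorted by absolute value,
    // with negated labels recorded as negative ordinals.
    static List<List<Integer>> resolve(String expression, Map<String, Integer> ordinals) {
        List<List<Integer>> branches = new ArrayList<>();
        for (String branch : expression.split("\\|")) {
            List<Integer> ids = new ArrayList<>();
            for (String term : branch.split("&")) {
                String t = term.trim();
                boolean negated = t.startsWith("!");
                String label = negated ? t.substring(1).trim() : t;
                Integer ordinal = ordinals.get(label);
                if (ordinal == null) {
                    // In this model, a writer whose label map is incomplete
                    // (e.g. it could only read part of hbase:labels) fails here.
                    throw new IllegalArgumentException("Unknown label: " + label);
                }
                ids.add(negated ? -ordinal : ordinal);
            }
            ids.sort((a, b) -> Integer.compare(Math.abs(a), Math.abs(b)));
            branches.add(ids);
        }
        return branches;
    }

    public static void main(String[] args) {
        // Ordinals as they appear in the hbase:labels scan quoted in this thread.
        Map<String, Integer> ordinals =
            Map.of("system", 1, "protected", 2, "confidential", 4, "branch", 5);
        System.out.println(resolve("branch&confidential|!protected", ordinals));
    }
}
```

Running main prints [[4, 5], [-2]]: the AND-branch resolves to the sorted ordinals of 'confidential' and 'branch', and the negated branch to -2 for 'protected'.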

Cheers,

Tom Ellis

RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
So I have a working prototype using just bulk puts on a table, calling 
setCellVisibility as necessary. Now I'm trying to do it using HFiles.

Sorry Ram, I don't quite follow why the user writing the HFile has to be an 
admin/super user. Is that necessary to load HFiles?

The use case is to have an application user (non-admin) perform the writes to 
an HBase table via a bulk load of an HFile, setting visibility labels on 
individual cells as necessary. Business users who have been given the auth to 
view a label can then see those cells, and others cannot.

I've seen that it's possible to do this with MapReduce by setting the map output 
to be a Put (and thus calling setCellVisibility on the puts), but I'm struggling 
to do this with Spark, as I keep getting an exception saying a Put can't be cast 
to a Cell.

Cheers,

Tom Ellis


-Original Message-
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
Sent: 15 June 2016 12:31
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2


>>We could I guess create multiple puts for cells in the same row with
>>different labels and use the setCellVisibility on each individual put/cell, but 
>>will this create additional overhead?
This can be done. If you want different cells in the same row to have different 
labels, then it is better to create that many puts and call setCellVisibility on 
each of them. What type of overhead do you see here? In terms of the server 
processing them? If so, there should not be much overhead, and giving different 
labels to every column in turn means you need every cell to be treated 
differently in terms of security, so it should be fine IMHO.
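The suggestion above amounts to grouping cells by (row, visibility expression) and issuing one Put per group, each with its own setCellVisibility call. The grouping step can be sketched in plain Java as below; PutPerLabelModel and PlannedPut are hypothetical names, and the sketch only plans the Puts rather than calling the HBase client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy planner (hypothetical names): decides how many Puts are needed when
// cells in the same row carry different visibility expressions.
class PutPerLabelModel {

    static final class PlannedPut {
        final String row;
        final String visibility; // what would be passed to setCellVisibility
        final List<String> columns = new ArrayList<>();

        PlannedPut(String row, String visibility) {
            this.row = row;
            this.visibility = visibility;
        }

        @Override
        public String toString() {
            return row + " " + columns + " visibility=" + visibility;
        }
    }

    // Each input cell is {row, column, visibilityExpression}. Cells sharing
    // both row and expression can go in one Put; every other combination gets
    // its own Put, in first-seen order.
    static List<PlannedPut> plan(List<String[]> cells) {
        Map<List<String>, PlannedPut> byKey = new LinkedHashMap<>();
        for (String[] cell : cells) {
            List<String> key = List.of(cell[0], cell[2]);
            byKey.computeIfAbsent(key, k -> new PlannedPut(cell[0], cell[2]))
                 .columns.add(cell[1]);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String[]> cells = List.of(
            new String[] {"row1", "f:a", "confidential"},
            new String[] {"row1", "f:b", "branch"},
            new String[] {"row1", "f:c", "confidential"});
        plan(cells).forEach(System.out::println);
    }
}
```

With the sample cells, f:a and f:c share one planned Put under 'confidential', and f:b gets its own Put under 'branch'.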

Without doing put.setCellVisibility() there is no other way, I believe. One 
question regarding your use case: in the mail you mentioned the Spark job where 
you will create a bulk-loaded file. If that file is to have all the 
visibility-related information of all the cells, then the user doing this job 
should be an admin or super user, right? Why is it the case that a normal client 
user will read through all the visibility labels, which may or may not be 
associated with that user?

Thank you very much for testing and using this feature. Let us know your 
feedback and if you find any gaps here. Happy to help.

Regards
Ram


RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
Hmm, is there no other way to set labels on individual cells where we don't 
have to give the client users system perms? For instance, client users can set 
the cell visibility on the entire put without having this (i.e. 
put.setCellVisibility(new CellVisibility("label"))) and the VisibilityController 
will check it.

We could I guess create multiple puts for cells in the same row with different 
labels and use the setCellVisibility on each individual put/cell, but will this 
create additional overhead?

Cheers,

Tom Ellis


-Original Message-
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com] 
Sent: 15 June 2016 11:24
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2


The visibility expression resolver tries to scan the labels table, and the user 
using the resolver should have the SYSTEM privileges, since the information 
being accessed is sensitive.
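For reference, the label-to-ordinal map the resolver needs can be rebuilt from the hbase:labels scan that Tom posts further down in this thread. In that output, row keys look like 4-byte big-endian ordinals escaped in Bytes.toStringBinary style, the f:\x00 column holds the label name, and the other qualifiers appear to be the users (and '@'-prefixed groups) granted that label. The stdlib-only Java sketch below decodes that layout under those assumptions; LabelsTableModel and its methods are hypothetical and only operate on the copied scan values, not on a live table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper working on values copied from the hbase:labels scan in
// this thread; it does not talk to HBase.
class LabelsTableModel {

    // {row, qualifier, value} triples from the hbase-user scan. Backslashes
    // are doubled for Java, so "\\x00" is the shell's \x00.
    static final String[][] SAMPLE = {
        {"\\x00\\x00\\x00\\x01", "\\x00", "system"},
        {"\\x00\\x00\\x00\\x02", "\\x00", "protected"},
        {"\\x00\\x00\\x00\\x02", "hbase", ""},
        {"\\x00\\x00\\x00\\x02", "tom", ""},
        {"\\x00\\x00\\x00\\x03", "\\x00", "testtesttest"},
        {"\\x00\\x00\\x00\\x03", "@hadoop", ""},
        {"\\x00\\x00\\x00\\x03", "hadoop", ""},
        {"\\x00\\x00\\x00\\x03", "hive", ""},
        {"\\x00\\x00\\x00\\x04", "\\x00", "confidential"},
        {"\\x00\\x00\\x00\\x05", "\\x00", "branch"},
        {"\\x00\\x00\\x00\\x05", "hdfs", ""},
        {"\\x00\\x00\\x00\\x06", "\\x00", "group"},
        {"\\x00\\x00\\x00\\x06", "hdfs", ""},
    };

    // Decodes a shell-escaped row key into bytes and reads them as a
    // big-endian int. Printable bytes appear as plain characters.
    static int decodeOrdinal(String escaped) {
        int value = 0;
        int i = 0;
        while (i < escaped.length()) {
            int b;
            if (escaped.startsWith("\\x", i)) {
                b = Integer.parseInt(escaped.substring(i + 2, i + 4), 16);
                i += 4;
            } else {
                b = escaped.charAt(i);
                i += 1;
            }
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    // Label name to ordinal, taken from the f:\x00 cells.
    static Map<String, Integer> labelOrdinals(String[][] cells) {
        Map<String, Integer> result = new TreeMap<>();
        for (String[] cell : cells) {
            if (cell[1].equals("\\x00")) {
                result.put(cell[2], decodeOrdinal(cell[0]));
            }
        }
        return result;
    }

    // User or @group to the labels granted, from every other qualifier.
    static Map<String, List<String>> authsByUser(String[][] cells) {
        Map<Integer, String> nameByOrdinal = new TreeMap<>();
        labelOrdinals(cells).forEach((name, ordinal) -> nameByOrdinal.put(ordinal, name));
        Map<String, List<String>> result = new TreeMap<>();
        for (String[] cell : cells) {
            if (!cell[1].equals("\\x00")) {
                String label = nameByOrdinal.get(decodeOrdinal(cell[0]));
                result.computeIfAbsent(cell[1], k -> new ArrayList<>()).add(label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(labelOrdinals(SAMPLE));
        System.out.println(authsByUser(SAMPLE));
    }
}
```

Note that the client user's scan in the thread returns only the 'system' row, so a map built from that view alone would be missing every other label.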

Suppose in your above case you have the client user added as an admin; then 
when you scan the labels table you should be able to scan it.

Regards
Ram

On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) < 
tom.el...@lloydsbanking.com.invalid> wrote:

> Yeah, thanks for this Ram. Although in my testing I have found that a 
> client user attempting to use the visibility expression resolver 
> doesn't seem to have the ability to scan the hbase:labels table for 
> the full list of labels and thus can't get the ordinals/tags to add to 
> the cell. Does the client user attempting to use the 
> VisibilityExpressionResolver have to have some special permissions?
>
> Scan of hbase:labels by client user:
>
> hbase(main):003:0> scan 'hbase:labels'
> ROW                 COLUMN+CELL
>  \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
> 1 row(s) in 0.0650 seconds
>
> Scan of hbase:labels by hbase user:
>
> hbase(main):001:0> scan 'hbase:labels'
> ROW                 COLUMN+CELL
>  \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
>  \x00\x00\x00\x02   column=f:\x00, timestamp=1465216944935, value=protected
>  \x00\x00\x00\x02   column=f:hbase, timestamp=1465547138533, value=
>  \x00\x00\x00\x02   column=f:tom, timestamp=1465980236882, value=
>  \x00\x00\x00\x03   column=f:\x00, timestamp=1465500156667, value=testtesttest
>  \x00\x00\x00\x03   column=f:@hadoop, timestamp=1465980236967, value=
>  \x00\x00\x00\x03   column=f:hadoop, timestamp=1465547304610, value=
>  \x00\x00\x00\x03   column=f:hive, timestamp=1465501322616, value=
>  \x00\x00\x00\x04   column=f:\x00, timestamp=1465570719901, value=confidential
>  \x00\x00\x00\x05   column=f:\x00, timestamp=1465835047835, value=branch
>  \x00\x00\x00\x05   column=f:hdfs, timestamp=1465980237060, value=
>  \x00\x00\x00\x06   column=f:\x00, timestamp=1465980447307, value=group
>  \x00\x00\x00\x06   column=f:hdfs, timestamp=1465980454130, value=
> 6 row(s) in 0.7370 seconds
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low 
> carbon economy.
> Corporate Responsibility Report: 
> www.lloydsbankinggroup-cr.com/downloads
>
> -Original Message-
> From: Anoop John [mailto:anoop.hb...@gmail.com]
> Sent: 08 June 2016 11:58
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
>
> Thanks Ram.. Ya that seems the best way as CellCreator is public 
> exposed class. May be we should explain abt this in hbase book under 
> the Visibility labels area.  Good to know you have Visibility labels 
> based usecase. Let us know in case of any trouble.  Thanks.
>
> -Anoop-
>
> On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasude

RE: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread Ellis, Tom (Financial Markets IT)
Yeah, thanks for this, Ram. However, in my testing I have found that a client 
user attempting to use the visibility expression resolver doesn't seem to be 
able to scan the hbase:labels table for the full list of labels, and thus can't 
get the ordinals/tags to add to the cell. Does the client user of the 
VisibilityExpressionResolver need some special permissions?

Scan of hbase:labels by client user:

hbase(main):003:0> scan 'hbase:labels'
ROW                 COLUMN+CELL
 \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
1 row(s) in 0.0650 seconds

Scan of hbase:labels by hbase user:

hbase(main):001:0> scan 'hbase:labels'
ROW                 COLUMN+CELL
 \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
 \x00\x00\x00\x02   column=f:\x00, timestamp=1465216944935, value=protected
 \x00\x00\x00\x02   column=f:hbase, timestamp=1465547138533, value=
 \x00\x00\x00\x02   column=f:tom, timestamp=1465980236882, value=
 \x00\x00\x00\x03   column=f:\x00, timestamp=1465500156667, value=testtesttest
 \x00\x00\x00\x03   column=f:@hadoop, timestamp=1465980236967, value=
 \x00\x00\x00\x03   column=f:hadoop, timestamp=1465547304610, value=
 \x00\x00\x00\x03   column=f:hive, timestamp=1465501322616, value=
 \x00\x00\x00\x04   column=f:\x00, timestamp=1465570719901, value=confidential
 \x00\x00\x00\x05   column=f:\x00, timestamp=1465835047835, value=branch
 \x00\x00\x00\x05   column=f:hdfs, timestamp=1465980237060, value=
 \x00\x00\x00\x06   column=f:\x00, timestamp=1465980447307, value=group
 \x00\x00\x00\x06   column=f:hdfs, timestamp=1465980454130, value=
6 row(s) in 0.7370 seconds
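The two scans can be read as follows, assuming the layout visible in the output: each row key is a 4-byte big-endian label ordinal, the `f:\x00` cell holds the label name, and every other qualifier in `f` names a user or group (e.g. `@hadoop`) that has been granted that label. The plain-Java sketch below parses such shell lines (one cell per line) into an ordinal-to-label map and a label-to-grantees map. `LabelsScan`, `labels`, `grants` and `ordinal` are hypothetical names; this is not how the HBase resolver itself reads the table.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading `scan 'hbase:labels'` shell output,
// assuming one cell per line and the row/qualifier layout described above.
class LabelsScan {

    // Groups: 1 = row key as four escaped bytes, 2 = qualifier in family f,
    // 3 = timestamp, 4 = cell value.
    private static final Pattern CELL = Pattern.compile(
        "\\s*((?:\\\\x[0-9A-Fa-f]{2}){4})\\s+column=f:(\\S+?), timestamp=(\\d+), value=(.*)");

    // Qualifier (as printed by the shell) that holds the label name itself.
    private static final String LABEL_QUALIFIER = "\\x00";

    // Decodes a row key printed as four escaped bytes into a big-endian int.
    static int ordinal(String escapedRow) {
        int value = 0;
        for (String hex : escapedRow.substring(2).split("\\\\x")) {
            value = (value << 8) | Integer.parseInt(hex, 16);
        }
        return value;
    }

    // ordinal -> label name, taken only from the label-name cells.
    static Map<Integer, String> labels(List<String> lines) {
        Map<Integer, String> result = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CELL.matcher(line);
            if (m.matches() && m.group(2).equals(LABEL_QUALIFIER)) {
                result.put(ordinal(m.group(1)), m.group(4).trim());
            }
        }
        return result;
    }

    // label name -> users/groups stored as the other qualifiers of that row.
    static Map<String, Set<String>> grants(List<String> lines) {
        Map<Integer, String> names = labels(lines);
        Map<String, Set<String>> result = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CELL.matcher(line);
            if (m.matches() && !m.group(2).equals(LABEL_QUALIFIER)) {
                String label = names.getOrDefault(ordinal(m.group(1)), "<unknown>");
                result.computeIfAbsent(label, k -> new TreeSet<>()).add(m.group(2));
            }
        }
        return result;
    }
}
```

Fed the client user's scan, `labels` returns only `{1=system}`, which makes the problem concrete: a label such as `confidential` (ordinal 4 in the hbase user's view) cannot be mapped from that user's view of the table.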

Cheers,

Tom Ellis

-Original Message-
From: Anoop John [mailto:anoop.hb...@gmail.com] 
Sent: 08 June 2016 11:58
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2


Thanks, Ram. Yes, that seems the best way, as CellCreator is a publicly exposed 
class. Maybe we should explain this in the HBase book under the Visibility Labels 
section. Good to know you have a visibility-labels-based use case. Let us know in 
case of any trouble. Thanks.

-Anoop-

On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan 
<ramkrishna.s.vasude...@gmail.com> wrote:
> Hi
>
> It can be done. See the class CellCreator, which is part of the public-facing API.
> In the Spark job that produces the HFileOutputFormat2 files, use the
> CellCreator to create your KeyValues, and use
> CellCreator.getVisibilityExpressionResolver() to map your String
> visibility expressions to the system-generated label ordinals.
>
> For example, see how TextSortReducer works. I think this should
> help you solve your problem. Let us know if you need further information.
>
> Regards
> Ram
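To make the "map your String visibility expressions to ordinals" step concrete, here is a plain-Java sketch of the lookup alone. It assumes a label-to-ordinal map has already been read from hbase:labels (which is exactly what a user without SYSTEM privileges cannot do in full). It handles unquoted labels combined with `&`, `|`, `!` and parentheses, replaces each label with its ordinal, and fails on unknown labels. `ExpressionOrdinals` and `toOrdinals` are hypothetical; the real resolver returned by CellCreator.getVisibilityExpressionResolver() goes further and produces the visibility Tags for the cell, which this sketch does not attempt.

```java
import java.util.Map;

// Hypothetical sketch: rewrite each label in a visibility expression as its
// ordinal, keeping the operators & | ! ( ) unchanged. Quoted labels and the
// rest of the real expression grammar are not handled.
class ExpressionOrdinals {

    static String toOrdinals(String expression, Map<String, Integer> ordinals) {
        StringBuilder out = new StringBuilder();
        StringBuilder label = new StringBuilder();
        for (char ch : expression.toCharArray()) {
            boolean operator = "&|!()".indexOf(ch) >= 0;
            if (operator || Character.isWhitespace(ch)) {
                flush(label, ordinals, out); // a label ends at an operator or space
                if (operator) {
                    out.append(ch);
                }
            } else {
                label.append(ch);
            }
        }
        flush(label, ordinals, out);
        return out.toString();
    }

    // Emits the ordinal for the pending label, if any, then clears it.
    private static void flush(StringBuilder label, Map<String, Integer> ordinals,
                              StringBuilder out) {
        if (label.length() == 0) {
            return;
        }
        Integer ordinal = ordinals.get(label.toString());
        if (ordinal == null) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        out.append(ordinal);
        label.setLength(0);
    }
}
```

With the ordinals from the hbase user's scan above, `confidential&(branch|group)` becomes `4&(5|6)`; with only `system` visible, the same call throws.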

RE: Writing visibility labels with HFileOutputFormat2

2016-06-07 Thread Ellis, Tom (Financial Markets IT)
Hi Ram,

We're attempting to do it programmatically so:

The HFile is created by a Spark job using saveAsNewAPIHadoopFile, with 
ImmutableBytesWritable as the key (rowkey), KeyValue as the value, and 
HFileOutputFormat2 as the output format.
This HFile is then loaded using the HBase client's LoadIncrementalHFiles.doBulkLoad.

Is there a way to do this programmatically without using the ImportTsv tool? I 
was looking at VisibilityUtils.createVisibilityExpTags and maybe just creating 
the Tags myself that way (although it's obviously @InterfaceAudience.Private), 
but it seems that to use it I'd need to know the label ordinals client side.

Thanks for your help,

Tom 

-Original Message-
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com] 
Sent: 07 June 2016 11:19
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2


Hi Ellis

How are the HFileOutputFormat2 files created? Are you using the ImportTsv tool? 
If you are, then yes, there is a way to specify visibility expressions while 
loading with ImportTsv, and those visibility tags are also bulk loaded as part 
of the HFile.

There is a column spec, CELL_VISIBILITY_COLUMN_SPEC, that can be used to 
indicate that the data contains visibility expressions; the tool will then 
automatically parse the specified field as the cell's visibility tag.

If you have access to the code, see the test case 
TestImportTSVWithVisibilityLabels to get an initial idea of how it is done. If 
not, get back to us; happy to help.

Regards
Ram
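For orientation, here is a plain-Java sketch of what a column spec containing a visibility column means for one input line. It assumes the spec token is `HBASE_CELL_VISIBILITY` (the value I believe ImportTsv's CELL_VISIBILITY_COLUMN_SPEC holds in HBase 1.x; check it against your version) alongside `HBASE_ROW_KEY`, with `family:qualifier` for data columns, and that one expression applies to every data cell on the line. `TsvVisibilitySketch`, `ParsedLine` and `parseLine` are hypothetical and do not reproduce ImportTsv's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split one tab-separated line according to a column spec
// such as "HBASE_ROW_KEY,f:a,f:b,HBASE_CELL_VISIBILITY".
class TsvVisibilitySketch {

    static final String ROW_KEY_SPEC = "HBASE_ROW_KEY";            // assumed token
    static final String VISIBILITY_SPEC = "HBASE_CELL_VISIBILITY"; // assumed token

    static final class ParsedLine {
        String rowKey;
        String visibility; // stays null if the spec has no visibility column
        final Map<String, String> cells = new LinkedHashMap<>(); // "family:qualifier" -> value
    }

    static ParsedLine parseLine(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "Expected " + columns.length + " fields but got " + fields.length);
        }
        ParsedLine parsed = new ParsedLine();
        for (int i = 0; i < columns.length; i++) {
            String column = columns[i].trim();
            if (column.equals(ROW_KEY_SPEC)) {
                parsed.rowKey = fields[i];
            } else if (column.equals(VISIBILITY_SPEC)) {
                // In this sketch the expression applies to every data cell on the line.
                parsed.visibility = fields[i];
            } else {
                parsed.cells.put(column, fields[i]);
            }
        }
        return parsed;
    }
}
```

With the real tool, the spec would be passed as something like `-Dimporttsv.columns=HBASE_ROW_KEY,f:a,f:b,HBASE_CELL_VISIBILITY`, together with `-Dimporttsv.bulk.output=<output dir>` to produce HFiles; the ImportTsv usage text of your HBase version is the authority on the exact options.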




Writing visibility labels with HFileOutputFormat2

2016-06-07 Thread Ellis, Tom (Financial Markets IT)
Hi,

I was wondering whether (and how) it's possible to write visibility labels with 
HFileOutputFormat2. I believe visibility labels are just implemented as Tags, 
but with the normal way of writing them, Mutation#setCellVisibility, they are 
only formally written as Tags to the cells inside the VisibilityController 
coprocessor, because the expression must be validated against the configured 
labels.

How can we add visibility labels to cells if we have a job that creates an 
HFile with HFileOutputFormat2, which is then loaded using 
LoadIncrementalHFiles?

Cheers,

Tom Ellis



Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1 1YZ. 
Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds Bank plc. 
Registered Office: 25 Gresham Street, London EC2V 7HN. Registered in England 
and Wales no. 2065. Telephone 0207626 1500. Bank of Scotland plc. Registered 
Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC327000. 
Telephone: 03457 801 801. Cheltenham & Gloucester plc. Registered Office: 
Barnett Way, Gloucester GL4 3RL. Registered in England and Wales 2299428. 
Telephone: 0345 603 1637

Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential 
Regulation Authority and regulated by the Financial Conduct Authority and 
Prudential Regulation Authority.

Cheltenham & Gloucester plc is authorised and regulated by the Financial 
Conduct Authority.

Halifax is a division of Bank of Scotland plc. Cheltenham & Gloucester Savings 
is a division of Lloyds Bank plc.

HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in 
Scotland no. SC218813.

This e-mail (including any attachments) is private and confidential and may 
contain privileged material. If you have received this e-mail in error, please 
notify the sender and delete it (including any attachments) immediately. You 
must not copy, distribute, disclose or use any of the information in it or any 
attachments. Telephone calls may be monitored or recorded.