On blockCache hitRatio
HBASE_HEAP_SIZE=10G, and we use LruBlockCache with 0.4 of HBASE_HEAP_SIZE. After HBase had run for 15 days, I found that on some RegionServers there is 200M of free block cache, but the hit ratio is only 10%, which is too low. I think the hit ratio may be low because of the small block cache size (4G). Are there any suggestions for getting a higher hit ratio? Would using the off-heap BucketCache have an effect? In the HBase docs I see that BucketCache is mainly used to reduce CMS GC pauses, and that when we use the combined cache (LruBlockCache + BucketCache), metadata is stored in the LRU cache and data is stored in the off-heap BucketCache. Does this have no effect on read performance? Is there any test data?
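The sizes in the question come from simple arithmetic. 0.4 of a 10G heap gives the 4G on-heap LruBlockCache, and the hit ratio is hits / (hits + misses). The following is a minimal sketch that assumes the hit and miss counts are read from the RegionServer's block cache metrics. `CacheMath` and its methods are illustrative names, not HBase APIs, and the 1,000/9,000 counters are made up to reproduce the 10% in the question.

```java
// Illustrative helpers, not HBase APIs: the arithmetic behind the numbers above.
final class CacheMath {
    private CacheMath() {}

    /** Bytes of on-heap block cache for a heap size and an integer percentage (40 for 0.4). */
    static long l1CacheBytes(long heapBytes, int percent) {
        if (heapBytes < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("need heapBytes >= 0 and 0 <= percent <= 100");
        }
        return heapBytes * percent / 100; // integer math avoids floating-point rounding
    }

    /** Hit ratio = hits / (hits + misses); 0 when there were no requests yet. */
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        long heap = 10L << 30;            // HBASE_HEAP_SIZE=10G
        long l1 = l1CacheBytes(heap, 40); // 0.4 of the heap
        long free = 200L << 20;           // ~200M reported free
        System.out.println("L1 cache: " + (l1 >> 30) + "G, used: " + ((l1 - free) >> 20) + "M");
        // Hypothetical counters: 1,000 hits out of 10,000 lookups is the 10% in the question.
        System.out.println("hit ratio: " + hitRatio(1_000, 9_000));
    }
}
```

By this arithmetic, about 3.9G of the 4G cache is in use. One plausible reading is that the cache is full and the working set is larger than it. That is the situation where a bigger off-heap BucketCache might help, but only measurements on your own workload can confirm it.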
Re: Writing visibility labels with HFileOutputFormat2
Thanks for the updates here. Going through the mails here: >> Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, ... I get your point now. There is a property, "hbase.security.visibility.mutations.checkauths", which, if set, will check whether the user is authorized to mutate the visibility labels that he is trying to write. If the user is not allowed to add that label, the mutation will fail. Can you see if this solves the other problem of allowing any client user to write? If the above is not well documented, please feel free to raise a JIRA and we are happy to address it. Coming to reading the HFile and creating a bulk load, I think we should be more cautious here. There is some critical info stored in the HFile, and just allowing any user to read it is going to be risky. Coming to the PutSortReducer problem, I think what you say is true. I'm not sure if there is a bug already; if not, please feel free to raise one. We need to fix it. HBASE-15707: you may need this because, for Scala's HBaseContext, you need to ensure tags are included just in case ImportTsv has to be used. Write back if I have missed something or if my info was lacking. It's been quite some time since we worked in this area, so I have to look at the code every time to know what was done. Regards Ram On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote: > So, I can see that I can correctly get the Lists from the VisibilityExpressionResolver, set them on the Cell, and write them using HFileOutputFormat2; however, when I scan as an unprivileged user I can still see the cells. If I write the cells with setCellVisibility, the unprivileged user can't see them. > > Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase 1.1.2. Am I affected by this, i.e. does HFileOutputFormat2 support tags before this fix? > > Cheers, > Tom Ellis > Consultant Developer – Excelian > Data Lake | Financial Markets IT > LLOYDS BANK COMMERCIAL BANKING > E: tom.el...@lloydsbanking.com > Website: www.lloydsbankcommercial.com > Reduce printing. Lloyds Banking Group is helping to build the low carbon economy. > Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads > > -----Original Message----- > From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] > Sent: 15 June 2016 17:42 > To: user@hbase.apache.org > Subject: RE: Writing visibility labels with HFileOutputFormat2 > > -- This email has reached the Bank via an external source -- > > Looking at the source for how DefaultVisibilityLabelServiceImpl checks authorisation, I noticed that the user just needs to have the 'system' label auth privileges, not admin/super user as I thought you meant, Ram. So technically, I could have a client user that is given the system label privileges, but only read access to the 'hbase:labels' table? > > Then that user will still be able to scan and read the labels and ordinals, and create the tags correctly :) I'll give it a go. > > Cheers, > Tom Ellis > > -----Original Message----- > From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] > Sent: 15 June 2016 16:56 > To: user@hbase.apache.org > Subject: RE: Writing visibility labels with HFileOutputFormat2 > > I see now from some other examples I've found that this form of using HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map output class of the job you give it to Put. Looking at the source for PutSortReducer, it seems that it will actually lose the cell visibility information, as it uses getFamilyCellMap to create KeyValue objects and uses only those, while the CellVisibility is actually on the Put mutation. > > So I think that, unfortunately, I can only really work around this by giving the application user writing the HFile admin access, so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals. > > Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with the correct ordinals? > > Cheers, > Tom Ellis
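Ram's description of hbase.security.visibility.mutations.checkauths amounts to a subset check. With the property set, a mutation fails if it uses a label the writing user is not authorized for. The following is a minimal model of that rule in plain Java. It assumes the labels have already been extracted from the visibility expression. `CheckAuthsModel` is a hypothetical class, not the server-side implementation, and the label names are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not HBase code: the rule Ram describes for
// hbase.security.visibility.mutations.checkauths. Labels are assumed to be
// already extracted from the visibility expression.
final class CheckAuthsModel {
    private CheckAuthsModel() {}

    /** Labels in the mutation that the user is not authorized for (sorted for readability). */
    static Set<String> unauthorizedLabels(Set<String> userAuths, Set<String> labelsInMutation) {
        Set<String> missing = new TreeSet<>(labelsInMutation);
        missing.removeAll(userAuths);
        return missing;
    }

    /** With the check enabled, the write is allowed only if no label is missing. */
    static boolean allowed(boolean checkAuths, Set<String> userAuths, Set<String> labelsInMutation) {
        return !checkAuths || unauthorizedLabels(userAuths, labelsInMutation).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<>(Arrays.asList("PUBLIC", "TRADING"));
        Set<String> labels = new HashSet<>(Arrays.asList("TRADING", "RISK"));
        System.out.println(allowed(true, auths, labels));      // false: RISK is not granted
        System.out.println(unauthorizedLabels(auths, labels)); // [RISK]
        System.out.println(allowed(false, auths, labels));     // true: check disabled in this model
    }
}
```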
Re: May I run HBase on top of Alluxio/Tachyon
I want to test whether running on Alluxio could improve performance, because Alluxio is a memory-centric distributed filesystem whose under filesystem can be HDFS, S3, or something else. 2016-06-16 10:32 GMT+08:00 Ted Yu : > Since you already have Hadoop 2.7.1, why is Alluxio 1.1.0 needed? > > Can you illustrate your use case? > > Thanks > > On Wed, Jun 15, 2016 at 7:27 PM, kevin wrote: > > > Hi all, > > > > I would like to know whether running HBase on Alluxio/Tachyon is possible and a good idea; can anybody share their experience? Thanks. > > I will try HBase 0.98.16 with Hadoop 2.7.1 on top of Alluxio 1.1.0. > > >
Re: May I run HBase on top of Alluxio/Tachyon
Since you already have Hadoop 2.7.1, why is Alluxio 1.1.0 needed? Can you illustrate your use case? Thanks On Wed, Jun 15, 2016 at 7:27 PM, kevin wrote: > Hi all, > > I would like to know whether running HBase on Alluxio/Tachyon is possible and a good idea; can anybody share their experience? Thanks. > I will try HBase 0.98.16 with Hadoop 2.7.1 on top of Alluxio 1.1.0. >
May I run HBase on top of Alluxio/Tachyon
Hi all, I would like to know whether running HBase on Alluxio/Tachyon is possible and a good idea; can anybody share their experience? Thanks. I will try HBase 0.98.16 with Hadoop 2.7.1 on top of Alluxio 1.1.0.
Re: HBase regionserver SIGSEGV periodically
This seems to have resolved the issue; no SIGSEGV seen... yet. Thanks, Esteban. Harry On Fri, Jun 10, 2016 at 6:08 PM Esteban Gutierrez wrote: > Hi Harry, > > As you mentioned, moving to JDK8 is a good idea. There are many known issues with G1GC and JDK7 that make using the G1 collector unreliable, and you will see that kind of crash once in a while. > > cheers, > esteban. > > > -- > Cloudera, Inc. > > > On Fri, Jun 10, 2016 at 9:35 AM, Harry Waye wrote: > > > Our regionservers are periodically segfaulting, roughly once a day, and I would appreciate some help debugging. Some version details: > > > > Java HotSpot(TM) 64-Bit Server VM (24.80-b11) for linux-amd64 JRE (1.7.0_80-b15), built on Apr 10 2015 19:53:14 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8) > > > > hbase-regionserver 0.98.6+cdh5.3.2+83-1.cdh5.3.2.p0.17~precise-cdh5.3.2 > > > > The segfault log output is: > > https://gist.github.com/hazzadous/aa5013f50824658e75b75fd860c73f02 > > > > This has details of the Java options as well. Heap is ~46GB using G1GC. > > > > The log output from HBase around this event is: > > https://gist.github.com/hazzadous/4b94b5dcb351a360881c04fb54b5a70f > > > > I'm going to attempt to update to Java 8 to see if there are any improvements, but any analysis will be appreciated. > > > > Harry > > >
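Harry's crash log shows a 1.7.0_80 HotSpot VM running G1, which is the combination Esteban calls unreliable. The following is a small sketch for confirming which JDK and collector a JVM actually runs with, for example by starting it with the same JAVA_HOME and GC flags as the RegionServer (that usage is an assumption). It relies only on the standard java.lang.management API, and `GcCheck` is an illustrative name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Diagnostic sketch: print the JVM version and active collectors, and flag
// G1 running on JDK 7. Assumes HotSpot naming, where G1's beans are called
// "G1 Young Generation" and "G1 Old Generation".
final class GcCheck {
    private GcCheck() {}

    /** True for JDK 7 version strings such as "1.7.0_80". */
    static boolean isJava7(String javaVersion) {
        return javaVersion != null && javaVersion.startsWith("1.7.");
    }

    static boolean usesG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    static List<String> currentCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        List<String> collectors = currentCollectors();
        System.out.println("java.version=" + version + ", collectors=" + collectors);
        if (isJava7(version) && usesG1(collectors)) {
            System.out.println("G1 on JDK 7: consider moving to JDK 8");
        }
    }
}
```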
Re: hbase bulk load with map reduce error
Please let me know what I am missing here. I am using MapR Hadoop. Below is the classpath, which shows all the jars with their versions; meanwhile, I will get the code snippet too. I am using the bulk write approach with a mapper and reducer. The HBase table was created with bulk load enabled (set to true).
+ HADOOP_CLASSPATH='/opt/mapr/hbase/hbase-0.98.9/bin/../conf:/opt/mapr/java/jdk1.7.0_25/lib/tools.jar:/opt/mapr/hbase/hbase-0.98.9/bin/..:/opt/mapr/lib/zookeeper-3.4.5-mapr-1406.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/activation-1.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/aopalliance-1.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/apacheds-i18n-2.0.0-M15.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/api-asn1-api-1.0.0-M20.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/api-util-1.0.0-M20.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/asm-3.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-beanutils-1.7.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-beanutils-core-1.8.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-cli-1.2.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-codec-1.7.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-collections-3.2.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-compress-1.4.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-configuration-1.6.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-daemon-1.0.13.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-digester-1.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-el-1.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-httpclient-3.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-io-2.4.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-lang-2.6.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-logging-1.1.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-math-2.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-math3-3.1.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/commons-net-3.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/findbugs-annotations-1.3.9-1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/guava-12.0.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/guice-3.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/guice-servlet-3.0.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hamcrest-core-1.3.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-annotations-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-checkstyle-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-client-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-common-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-common-0.98.9-mapr-1503-tests.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-examples-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-hadoop2-compat-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-hadoop-compat-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-it-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-it-0.98.9-mapr-1503-tests.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-prefix-tree-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-protocol-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-rest-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-server-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-server-0.98.9-mapr-1503-tests.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-shell-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-testing-util-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/hbase-thrift-0.98.9-mapr-1503.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/high-scale-lib-1.1.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/htrace-core-2.04.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/httpclient-4.2.5.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/httpcore-4.1.3.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jackson-core-asl-1.8.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jackson-jaxrs-1.8.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jackson-mapper-asl-1.8.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jackson-xc-1.8.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jamon-runtime-2.3.1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jasper-compiler-5.5.23.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jasper-runtime-5.5.23.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/javax.inject-1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/java-xmlbuilder-0.4.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jaxb-api-2.2.2.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jaxb-impl-2.2.3-1.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jcodings-1.0.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jersey-client-1.9.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jersey-core-1.8.jar:/opt/mapr/hbase/hbase-0.98.9/bin/../lib/jersey-guice-1.9.jar:/opt/mapr/hbase/hbase-0.98.9/bin/
RE: Writing visibility labels with HFileOutputFormat2
So, I can see that I can correctly get the Lists from the VisibilityExpressionResolver, set them on the Cell, and write them using HFileOutputFormat2; however, when I scan as an unprivileged user I can still see the cells. If I write the cells with setCellVisibility, the unprivileged user can't see them. Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase 1.1.2. Am I affected by this, i.e. does HFileOutputFormat2 support tags before this fix? Cheers, Tom Ellis -----Original Message----- From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] Sent: 15 June 2016 17:42 To: user@hbase.apache.org Subject: RE: Writing visibility labels with HFileOutputFormat2 Looking at the source for how DefaultVisibilityLabelServiceImpl checks authorisation, I noticed that the user just needs to have the 'system' label auth privileges, not admin/super user as I thought you meant, Ram. So technically, I could have a client user that is given the system label privileges, but only read access to the 'hbase:labels' table? Then that user will still be able to scan and read the labels and ordinals, and create the tags correctly :) I'll give it a go. Cheers, Tom Ellis -----Original Message----- From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] Sent: 15 June 2016 16:56 To: user@hbase.apache.org Subject: RE: Writing visibility labels with HFileOutputFormat2 I see now from some other examples I've found that this form of using HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map output class of the job you give it to Put. Looking at the source for PutSortReducer, it seems that it will actually lose the cell visibility information, as it uses getFamilyCellMap to create KeyValue objects and uses only those, while the CellVisibility is actually on the Put mutation. So I think that, unfortunately, I can only really work around this by giving the application user writing the HFile admin access, so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals. Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with the correct ordinals? Cheers, Tom Ellis -----Original Message----- From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] Sent: 15 June 2016 16:25 To: user@hbase.apache.org Subject: RE: Writing visibility labels with HFileOutputFormat2 So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFiles. Sorry Ram, I don't quite follow why the user writing the HFile has to be an admin/super user. Is that necessary to load HFiles? The use case is, hopefully, to have an application user (non-admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Then business users who have been given the auth to view that label can see those cells, and others cannot. I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus I could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting an exception that I can't cast a Put to a Cell. Cheers, Tom Ellis
RE: Writing visibility labels with HFileOutputFormat2
Looking at the source for how DefaultVisibilityLabelServiceImpl checks authorisation, I noticed that the user just needs to have the 'system' label auth privileges, not admin/super user as I thought you meant, Ram. So technically, I could have a client user that is given the system label privileges, but only read access to the 'hbase:labels' table? Then that user will still be able to scan and read the labels and ordinals, and create the tags correctly :) I'll give it a go. Cheers, Tom Ellis -----Original Message----- From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] Sent: 15 June 2016 16:56 To: user@hbase.apache.org Subject: RE: Writing visibility labels with HFileOutputFormat2 I see now from some other examples I've found that this form of using HFileOutputFormat2 to write Puts will use the PutSortReducer if you set the map output class of the job you give it to Put. Looking at the source for PutSortReducer, it seems that it will actually lose the cell visibility information, as it uses getFamilyCellMap to create KeyValue objects and uses only those, while the CellVisibility is actually on the Put mutation. So I think that, unfortunately, I can only really work around this by giving the application user writing the HFile admin access, so it can then use the VisibilityExpressionResolver to create cells with tags with the correct ordinals. Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so they can use VisibilityExpressionResolver to correctly create the tags on the Cell with the correct ordinals? Cheers, Tom Ellis -----Original Message----- From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com.INVALID] Sent: 15 June 2016 16:25 To: user@hbase.apache.org Subject: RE: Writing visibility labels with HFileOutputFormat2 So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFiles. Sorry Ram, I don't quite follow why the user writing the HFile has to be an admin/super user. Is that necessary to load HFiles? The use case is, hopefully, to have an application user (non-admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Then business users who have been given the auth to view that label can see those cells, and others cannot. I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus I could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting an exception that I can't cast a Put to a Cell. Cheers, Tom Ellis -----Original Message----- From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com] Sent: 15 June 2016 12:31 To: user@hbase.apache.org Subject: Re: Writing visibility labels with HFileOutputFormat2 >> We could, I guess, create multiple puts for cells in the same row with different labels and use setCellVisibility on each individual put/cell, but will this create additional overhead? This can be done. If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead here, and adding different cells to every column in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO. Without doing put.setCellVisibility() there is no other way, I believe. One
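Tom's reasoning is that whoever builds the tags must know each label's ordinal, which is why read access to hbase:labels matters. The toy model below shows only that dependency. It resolves an '&'-joined expression to ordinals and rejects unknown labels. It is a hypothetical sketch, not VisibilityExpressionResolver. It is also not HBase's tag encoding, which handles '|', '!' and parentheses as well. The map stands in for the label-to-ordinal mapping, and the ordinals are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not HBase code. Supports only '&'-joined labels.
final class LabelOrdinals {
    private final Map<String, Integer> ordinalByLabel;

    /** ordinalByLabel stands in for the label -> ordinal mapping kept in hbase:labels. */
    LabelOrdinals(Map<String, Integer> ordinalByLabel) {
        this.ordinalByLabel = new HashMap<>(ordinalByLabel);
    }

    /** Resolves e.g. "PUBLIC&TRADING" to sorted ordinals; throws on an unknown label. */
    List<Integer> resolveConjunction(String expression) {
        List<Integer> ordinals = new ArrayList<>();
        for (String raw : expression.split("&")) {
            String label = raw.trim();
            Integer ordinal = ordinalByLabel.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown visibility label: " + label);
            }
            ordinals.add(ordinal);
        }
        Collections.sort(ordinals);
        return ordinals;
    }

    public static void main(String[] args) {
        Map<String, Integer> labels = new HashMap<>();
        labels.put("PUBLIC", 2);  // made-up ordinals for the example
        labels.put("TRADING", 5);
        LabelOrdinals resolver = new LabelOrdinals(labels);
        System.out.println(resolver.resolveConjunction("TRADING & PUBLIC")); // [2, 5]
    }
}
```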
RE: Writing visibility labels with HFileOutputFormat2
Thanks Ted. It was just a class cast at line 161 of HFileOutputFormat2.write, because I had previously read that you could give it Puts, but it can actually only take Cells. You can only use Puts if you use configureIncrementalLoad, which then sets up the PutSortReducer as I discussed in my other email. Cheers, Tom Ellis -----Original Message----- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: 15 June 2016 17:01 To: user@hbase.apache.org Subject: Re: Writing visibility labels with HFileOutputFormat2 Tom: Can you pastebin the stack trace for the exception? It would be nice if you could show a snippet of your code too. Thanks > On Jun 15, 2016, at 8:24 AM, Ellis, Tom (Financial Markets IT) wrote: > > So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFiles. > > Sorry Ram, I don't quite follow why the user writing the HFile has to be an admin/super user. Is that necessary to load HFiles? > > The use case is, hopefully, to have an application user (non-admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Then business users who have been given the auth to view that label can see those cells, and others cannot. > > I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus I could setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting an exception that I can't cast a Put to a Cell. > > Cheers, > Tom Ellis > > -----Original Message----- > From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com] > Sent: 15 June 2016 12:31 > To: user@hbase.apache.org > Subject: Re: Writing visibility labels with HFileOutputFormat2 > >>> We could, I guess, create multiple puts for cells in the same row with different labels and use setCellVisibility on each individual put/cell, but will this create additional overhead? > This can be done. If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead here, and adding different cells to every column in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO. > > Without doing put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in the mail you mentioned the Spark job where you will create a bulk-loaded file. If that file is to have all the visibility-related information of all the cells, then the user doing this job should be an admin or super user, right? Why would a normal client user read through all the visibility cells, which may or may not be associated with that user? > > Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help. > > Regards > Ram > >> On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote: >> >> Hmm, is there no other way to set labels on individual cells where we don't have to give the client users system perms? For instance, client users can set the cell visibility on the entire put without having this (i.e. put.setCellVisibility(new CellVisibility("label"))) and the VisibilityController will check this. >> >> We could, I guess, create multiple puts for cells in the same row with different labels and use setCellVisibility on each individual put/cell, but will this create additional overhead? >> >> Cheers, >> Tom Ellis >> >> -Or
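Tom found that the HFileOutputFormat2 writer accepts Cells, not Puts, and that Puts only work when configureIncrementalLoad wires in PutSortReducer. A Spark job that writes HFiles directly therefore has to emit individual cells in key order itself. The following sketch uses hypothetical stand-in types, not the HBase API, to show that ordering: row, then family, then qualifier, each compared as unsigned bytes. HBase's real comparator also orders by timestamp (newest first) and by type, which this sketch leaves out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types, not the HBase API. Timestamp and type ordering omitted.
final class SortedCells {
    static final class SimpleCell {
        final byte[] row, family, qualifier, value;

        SimpleCell(String row, String family, String qualifier, String value) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.family = family.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(row, StandardCharsets.UTF_8) + "/"
                + new String(family, StandardCharsets.UTF_8) + ":"
                + new String(qualifier, StandardCharsets.UTF_8);
        }
    }

    /** Unsigned lexicographic comparison, the byte order HBase uses for keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static final Comparator<SimpleCell> KEY_ORDER = (x, y) -> {
        int c = compareBytes(x.row, y.row);
        if (c != 0) {
            return c;
        }
        c = compareBytes(x.family, y.family);
        return c != 0 ? c : compareBytes(x.qualifier, y.qualifier);
    };

    /** Returns a sorted copy; the input list is left untouched. */
    static List<SimpleCell> sortForHFile(List<SimpleCell> cells) {
        List<SimpleCell> sorted = new ArrayList<>(cells);
        Collections.sort(sorted, KEY_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = Arrays.asList(
            new SimpleCell("row2", "cf", "a", "v3"),
            new SimpleCell("row1", "cf", "b", "v2"),
            new SimpleCell("row1", "cf", "a", "v1"));
        System.out.println(sortForHFile(cells)); // [row1/cf:a, row1/cf:b, row2/cf:a]
    }
}
```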
RE: Writing visibility labels with HFileOutputFormat2
I see now from some other examples I've found that this form of using HFileOutputFormat2 to write Puts will actually use the PutSortReducer if you set the map output class of the job you give it to Put. Looking at the source for PutSortReducer, it seems that it will lose the cell visibility information: it uses getFamilyCellMap to create KeyValue objects and uses only those, whereas the CellVisibility is actually set on the Put mutation. So, unfortunately, I think I can only really work around this by giving the application user that writes the HFile admin access, so that it can use the VisibilityExpressionResolver to create cells with tags carrying the correct ordinals. Am I missing something? Why is it that a client user without admin/super user privileges can set a visibility expression using Put.setCellVisibility, but if we want to write using HFiles, the client user has to have admin/super user privileges so that it can use the VisibilityExpressionResolver to create the tags on the Cell with the correct ordinals? Cheers, Tom Ellis Consultant Developer – Excelian Data Lake | Financial Markets IT LLOYDS BANK COMMERCIAL BANKING E: tom.el...@lloydsbanking.com Website: www.lloydsbankcommercial.com Reduce printing. Lloyds Banking Group is helping to build the low carbon economy. Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads
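To make the PutSortReducer problem concrete, here is a minimal, JDK-only model of what Tom describes. `ModelPut` and `ModelCell` are hypothetical stand-ins written for illustration, not HBase's `Put` or `Cell`: the visibility expression is kept as a mutation-level attribute, so a conversion that copies only the per-family cells drops it. In real HBase the visibility tag carries label ordinals rather than the label string, which is why the resolver (and its access to hbase:labels) comes into play.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the behaviour described above; these are NOT HBase classes.
final class FlattenSketch {

    /** A cell: row, qualifier, value and (possibly empty) tags. */
    static final class ModelCell {
        final String row;
        final String qualifier;
        final String value;
        final List<String> tags;

        ModelCell(String row, String qualifier, String value, List<String> tags) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
            this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
        }
    }

    /** A mutation: the visibility expression is an attribute, not part of any cell. */
    static final class ModelPut {
        final String row;
        final Map<String, String> attributes = new HashMap<>();
        final List<ModelCell> cells = new ArrayList<>();

        ModelPut(String row) {
            this.row = row;
        }

        ModelPut add(String qualifier, String value) {
            cells.add(new ModelCell(row, qualifier, value, Collections.<String>emptyList()));
            return this;
        }

        ModelPut setVisibility(String expression) {
            attributes.put("visibility", expression);
            return this;
        }
    }

    /** Copies only the cells, as described for the reducer: the label is lost. */
    static List<ModelCell> flattenCellsOnly(ModelPut put) {
        return new ArrayList<>(put.cells);
    }

    /** Copies the mutation-level expression onto every cell as a tag. */
    static List<ModelCell> flattenKeepingVisibility(ModelPut put) {
        String expression = put.attributes.get("visibility");
        List<ModelCell> out = new ArrayList<>();
        for (ModelCell c : put.cells) {
            List<String> tags = new ArrayList<>(c.tags);
            if (expression != null) {
                tags.add(expression); // real HBase would store resolved ordinals here
            }
            out.add(new ModelCell(c.row, c.qualifier, c.value, tags));
        }
        return out;
    }

    public static void main(String[] args) {
        ModelPut put = new ModelPut("row1").add("f:q1", "v1").setVisibility("confidential");
        System.out.println(flattenCellsOnly(put).get(0).tags);         // []
        System.out.println(flattenKeepingVisibility(put).get(0).tags); // [confidential]
    }
}
```

The second method is only the shape of a fix; in HBase itself the label-to-ordinal step is what requires reading hbase:labels.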
Re: Writing visibility labels with HFileOutputFormat2
Tom: Can you pastebin the stack trace for the exception? It would be nice if you could show a snippet of your code too. Thanks
RE: Writing visibility labels with HFileOutputFormat2
So I have a working prototype using just bulk Puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFiles. Sorry Ram, I don't quite follow why the user writing the HFile has to be an admin/super user. Is that necessary to load HFiles? The use case is to have an application user (non-admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Business users who have been given the auth to view that label can then see those cells, and others cannot. I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus calling setCellVisibility on the Puts), but I'm struggling to do this with Spark, as I keep getting an exception saying that I can't cast a Put to a Cell. Cheers, Tom Ellis
Re: HBase acl commands are too slow
Have you looked at http://hbase.apache.org/book.html#security ? I noticed that DEBUG logging was not on in the log you posted earlier. Is it possible to turn on DEBUG logging and repeat the operation? Thanks

On Sun, Jun 12, 2016 at 10:19 PM, kumar r wrote:
> Hi,
>
> I have configured secure HBase 1.1.3 with Hadoop 2.7.2, and I have enabled authorization in HBase.
>
> When executing any authorization command, such as user_permission, grant or revoke, it takes more than 40 seconds to display the result.
>
> Below are the hbase-site.xml configuration properties:
>
> <property><name>hbase.master</name><value>IP:6</value></property>
> <property><name>hbase.rootdir</name><value>hdfs://IP:9000/HBase</value></property>
> <property><name>hbase.cluster.distributed</name><value>true</value></property>
> <property><name>hbase.zookeeper.quorum</name><value>IP1:2181,IP2:2181,IP3:2181</value></property>
> <property><name>hbase.master.port</name><value>6</value></property>
> <property><name>hbase.master.info.port</name><value>60010</value></property>
> <property><name>hbase.regionserver.port</name><value>60020</value></property>
> <property><name>hbase.regionserver.info.port</name><value>60030</value></property>
> <property><name>hbase.security.authentication</name><value>KERBEROS</value></property>
> <property><name>hbase.master.keytab.file</name><value>masterkeytab</value></property>
> <property><name>hbase.regionserver.keytab.file</name><value>regionserverkeytab</value></property>
> <property><name>hbase.master.kerberos.principal</name><value>masterprincipal</value></property>
> <property><name>hbase.regionserver.kerberos.principal</name><value>regionserverprincipal</value></property>
> <property><name>hbase.rpc.engine</name><value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value></property>
> <property><name>hbase.ssl.enabled</name><value>true</value></property>
> <property><name>hbase.superuser</name><value>@HadoopUser</value></property>
> <property><name>hbase.security.authorization</name><value>true</value></property>
> <property><name>hbase.coprocessor.master.classes</name><value>org.apache.hadoop.hbase.security.access.AccessController</value></property>
> <property><name>hbase.coprocessor.region.classes</name><value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value></property>
>
> Find my Stack Overflow question here:
>
> http://stackoverflow.com/questions/37782043/hbase-acl-commands-are-too-slow
>
> Thanks,
>
> Kumar
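Ted's request to double-check the configuration can be partly automated. The sketch below uses only the JDK's XML parser to load hbase-site.xml-style `<property>` entries like the ones listed above and to flag missing authorization settings. The class name and the particular checks are our own choices, and passing them says nothing about the ZooKeeper SASL warning Ted quoted, which these checks do not cover.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: a sanity check on hbase-site.xml text, not an HBase tool.
final class SiteConfigCheck {
    private static final String ACCESS_CONTROLLER =
        "org.apache.hadoop.hbase.security.access.AccessController";

    /** Reads name/value pairs from hbase-site.xml-style text. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> conf = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                conf.put(name, value);
            }
            return conf;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    /** Returns human-readable problems; an empty list means these checks passed. */
    static List<String> check(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        if (!"kerberos".equalsIgnoreCase(conf.getOrDefault("hbase.security.authentication", ""))) {
            problems.add("hbase.security.authentication is not KERBEROS");
        }
        if (!"true".equalsIgnoreCase(conf.getOrDefault("hbase.security.authorization", ""))) {
            problems.add("hbase.security.authorization is not true");
        }
        for (String key : new String[] {"hbase.coprocessor.master.classes", "hbase.coprocessor.region.classes"}) {
            if (!conf.getOrDefault(key, "").contains(ACCESS_CONTROLLER)) {
                problems.add(key + " does not list AccessController");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<property><name>hbase.security.authentication</name><value>KERBEROS</value></property>"
            + "<property><name>hbase.security.authorization</name><value>true</value></property>"
            + "<property><name>hbase.coprocessor.master.classes</name><value>" + ACCESS_CONTROLLER + "</value></property>"
            + "</configuration>";
        // The region coprocessor list is missing here, so exactly one problem is reported.
        System.out.println(check(parse(xml)));
    }
}
```

To check a real file, read it into a String first (for example with `java.nio.file.Files.readAllBytes`) and pass it to `parse`.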
Re: Big Data Interview
Please don't cross post. This seems to be an advertisement.
Big Data Interview
Good book on interview preparation for big data https://notionpress.com/read/big-data-interview-faqs
Re: Writing visibility labels with HFileOutputFormat2
> We could I guess create multiple puts for cells in the same row with different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?

This can be done. If you want different cells in the same row to have different labels, then it is better to create that many Puts and call setCellVisibility on each of them. What kind of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead; and giving every column its own label in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO. Without put.setCellVisibility() there is no other way, I believe.

One question regarding your use case: in the mail you mentioned the Spark job where you will create a bulk-loaded file. If that file is to have all the visibility-related information of all the cells, then the user doing this job should be an admin or super user, right? Why should a normal client user read through all the visibility labels, which may or may not be associated with that user?

Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help. Regards Ram
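Ram's suggestion (one Put per label within a row) amounts to grouping a row's columns by their visibility expression. A minimal JDK-only sketch with hypothetical names and column values: each resulting group corresponds to one Put on which `setCellVisibility` would be called once, so three columns carrying two distinct labels need two Puts rather than three. The HBase calls appear only in a comment and are not executed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "one Put per distinct label"; no HBase classes are used.
final class PerLabelPutsSketch {

    /**
     * Groups one row's columns by visibility expression. Each entry of the
     * result would become a single Put, roughly:
     *   Put put = new Put(rowKey);
     *   put.addColumn(family, qualifier, value);   // for each column in the group
     *   put.setCellVisibility(new CellVisibility(expression));
     */
    static Map<String, List<String>> groupByExpression(Map<String, String> columnToExpression) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToExpression.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("f:balance", "confidential");
        row.put("f:branch_code", "branch");
        row.put("f:limit", "confidential");
        // Prints {confidential=[f:balance, f:limit], branch=[f:branch_code]}
        System.out.println(groupByExpression(row));
    }
}
```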
HBase number of columns
Hi, As per the official HBase documentation, a typical HBase schema should have 1 to 3 column families per table (https://hbase.apache.org/book.html#table_schema_rules_of_thumb). However, there is no mention of how many column qualifiers a row should contain per column family to see good read and write performance. Could anybody share their input on how many columns per row are desirable in HBase, or how many column qualifiers per column family would be desirable? Thanks, Siddharth Ubale
RE: Writing visibility labels with HFileOutputFormat2
Hmm, is there no other way to set labels on individual cells without having to give the client users system permissions? For instance, client users can set the cell visibility on an entire Put without having them (i.e. put.setCellVisibility(new CellVisibility("label"))), and the VisibilityController will check this. I guess we could create multiple Puts for cells in the same row with different labels and use setCellVisibility on each individual Put/cell, but will this create additional overhead? Cheers, Tom Ellis
Re: Writing visibility labels with HFileOutputFormat2
The visibility expression resolver tries to scan the labels table, so the user running the resolver should have SYSTEM privileges, since the information being accessed is sensitive. In your case above, if the client user is added as an admin, it should be able to scan the labels table. Regards Ram
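Why the resolver needs to see the whole labels table can be modelled in a few lines. In this hypothetical sketch (not the real `VisibilityExpressionResolver`), label names are resolved against whatever label-to-ordinal map the calling user was able to scan; the two maps mirror the client-user and hbase-user scans of hbase:labels quoted in this thread. The real resolver's failure mode may differ from the exception thrown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of label resolution; not HBase's resolver API.
final class ResolverSketch {

    /** Maps each label to its ordinal, failing if the caller could not see the label. */
    static int[] resolve(Map<String, Integer> visibleLabels, List<String> labels) {
        int[] ordinals = new int[labels.size()];
        for (int i = 0; i < labels.size(); i++) {
            Integer ordinal = visibleLabels.get(labels.get(i));
            if (ordinal == null) {
                throw new IllegalArgumentException("label not visible to caller: " + labels.get(i));
            }
            ordinals[i] = ordinal;
        }
        return ordinals;
    }

    public static void main(String[] args) {
        // What the hbase user saw in hbase:labels (ordinal = row key).
        Map<String, Integer> fullView = new HashMap<>();
        String[] names = {"system", "protected", "testtesttest", "confidential", "branch", "group"};
        for (int i = 0; i < names.length; i++) {
            fullView.put(names[i], i + 1);
        }
        // What the unprivileged client user saw: only the "system" row.
        Map<String, Integer> clientView = new HashMap<>();
        clientView.put("system", 1);

        System.out.println(Arrays.toString(resolve(fullView, Arrays.asList("confidential", "branch")))); // [4, 5]
        try {
            resolve(clientView, Arrays.asList("confidential"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // label not visible to caller: confidential
        }
    }
}
```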
RE: Writing visibility labels with HFileOutputFormat2
Yeah, thanks for this Ram. Although in my testing I have found that a client user attempting to use the visibility expression resolver doesn't seem to have the ability to scan the hbase:labels table for the full list of labels and thus can't get the ordinals/tags to add to the cell. Does the client user attempting to use the VisibilityExpressionResolver have to have some special permissions? Scan of hbase:labels by client user: hbase(main):003:0> scan 'hbase:labels' ROW COLUMN+CELL \x00\x00\x00\x01 column=f:\x00, timestamp=1465216652662, value=system 1 row(s) in 0.0650 seconds Scan of hbase:labels by hbase user: hbase(main):001:0> scan 'hbase:labels' ROW COLUMN+CELL \x00\x00\x00\x01 column=f:\x00, timestamp=1465216652662, value=system \x00\x00\x00\x02 column=f:\x00, timestamp=1465216944935, value=protected \x00\x00\x00\x02 column=f:hbase, timestamp=1465547138533, value= \x00\x00\x00\x02 column=f:tom, timestamp=1465980236882, value= \x00\x00\x00\x03 column=f:\x00, timestamp=1465500156667, value=testtesttest \x00\x00\x00\x03 column=f:@hadoop, timestamp=1465980236967, value= \x00\x00\x00\x03 column=f:hadoop, timestamp=1465547304610, value= \x00\x00\x00\x03 column=f:hive, timestamp=1465501322616, value= \x00\x00\x00\x04 column=f:\x00, timestamp=1465570719901, value=confidential \x00\x00\x00\x05 column=f:\x00, timestamp=1465835047835, value=branch \x00\x00\x00\x05 column=f:hdfs, timestamp=1465980237060, value= \x00\x00\x00\x06 column=f:\x00, timestamp=1465980447307, value=group \x00\x00\x00\x06 column=f:hdfs, timestamp=1465980454130, value= 6 row(s) in 0.7370 seconds Cheers, Tom Ellis Consultant Developer – Excelian Data Lake | Financial Markets IT LLOYDS BANK COMMERCIAL BANKING E: tom.el...@lloydsbanking.com Website: www.lloydsbankcommercial.com , , , Reduce printing. Lloyds Banking Group is helping to build the low carbon economy. 
Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads -Original Message- From: Anoop John [mailto:anoop.hb...@gmail.com] Sent: 08 June 2016 11:58 To: user@hbase.apache.org Subject: Re: Writing visibility labels with HFileOutputFormat2 -- This email has reached the Bank via an external source -- Thanks Ram.. Ya that seems the best way as CellCreator is public exposed class. May be we should explain abt this in hbase book under the Visibility labels area. Good to know you have Visibility labels based usecase. Let us know in case of any trouble. Thanks. -Anoop- On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan wrote: > Hi > > It can be done. See the class CellCreator which is Public facing interface. > When you create your spark job to create the hadoop files that > produces the > HFileOutputformat2 data. While creating the KeyValues you can use the > CellCreator to create your KeyValues and use the > CellCreator.getVisibilityExpressionResolver() to map your String > Visibility tags with the system generated ordinals. > > For eg, you can see how TextSortReducer works. I think this should > help you solve your problem. Let us know if you need further information. > > Regards > Ram > > On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) < > tom.el...@lloydsbanking.com.invalid> wrote: > >> Hi Ram, >> >> We're attempting to do it programmatically so: >> >> The HFile is created by a Spark job using saveAsNewAPIHadoopFile, and >> using ImmutableBytesWritable as the key (rowkey) with KeyValue as the >> value, and using the HFilOutputFormat2 format. >> This HFile is then loaded using HBase client's >> LoadIncrementalHFiles.doBulkLoad >> >> Is there a way to do this programmatically without using the >> ImportTsv tool? 
>> I was taking a look at VisibilityUtils.createVisibilityExpTags and maybe being able to just create the Tags myself that way (although it's obviously @InterfaceAudience.Private), but it seems that to use it I'd need to know the label ordinals client side.
>>
>> Thanks for your help,
>>
>> Tom
>>
>> -----Original Message-----
>> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
>> Sent: 07 June 2016 11:19
>> To: user@hbase.apache.org
>> Subject: Re: Writing visibility labels with HFileOutputFormat2
>>
>> Hi Ellis
>>
>> How are the HFileOutputFormat2 files created? Are you using the ImportTsv tool? If you are using the ImportTsv tool, then yes, there is a way to specify visibility tags while loading from the ImportTsv
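To make the ordinal problem concrete, the hypothetical Java sketch below covers only the lookup step that a visibility expression resolver (such as the one returned by CellCreator.getVisibilityExpressionResolver()) has to perform before any tags can be built: each label in an expression is mapped to its ordinal. It deliberately does not reproduce HBase's actual tag encoding, ignores quoted labels, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps the labels in a visibility expression to ordinals.
// It does NOT build HBase's real visibility tags; it only shows that the caller
// needs the full label-to-ordinal map. Quoted labels are not handled.
final class ExpressionLabelLookup {

    private ExpressionLabelLookup() {}

    // Returns the ordinals of the labels in an expression such as
    // "(confidential|branch)&!group", in the order they appear.
    static List<Integer> resolveOrdinals(String expression, Map<String, Integer> ordinals) {
        List<Integer> result = new ArrayList<>();
        // Split on the operators &, |, !, parentheses and whitespace.
        for (String label : expression.split("[&|!()\\s]+")) {
            if (label.isEmpty()) {
                continue; // a leading operator or parenthesis yields an empty first token
            }
            Integer ordinal = ordinals.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("Unknown visibility label: " + label);
            }
            result.add(ordinal);
        }
        return result;
    }
}
```

With the ordinals from the hbase user's scan (confidential=4, branch=5, group=6), "confidential&branch" resolves to [4, 5]. With the client user's view, which contains only system, the same call throws, mirroring the failure Tom describes.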
Re: HBase acl commands are too slow
Hi Ted,

Thanks for your reply. I cannot find anything wrong in the configuration. Can you tell me what might be the root cause of this issue? What could cause an ACL command to take more than 30 seconds to process? I cannot find anything other than this in the HBase log. Is there any documentation available on securing ZooKeeper and HBase with Kerberos properly?

The same log message also appears on a normal cluster, where I have also enabled authorization, and there the same authorization command runs in 5 to 6 seconds.

Thanks,
Kumar

On Tue, Jun 14, 2016 at 7:59 PM, Ted Yu wrote:
> bq. Opening socket connection to server machine2/192.168.60.3:2181. Will not attempt to authenticate using SASL (unknown error)
>
> It seems the connection to ZooKeeper might have some issue.
> Can you double check the configuration?
>
> On Mon, Jun 13, 2016 at 11:56 PM, kumar r wrote:
>
> > Hi,
> >
> > Thanks for the reply.
> >
> > Please find the commands and the time taken to process them:
> >
> > hbase(main):006:0> grant 'Selva','RW','@default'
> > 0 row(s) in 11.8830 seconds
> > hbase(main):007:0> revoke 'Selva','@default'
> > 0 row(s) in 32.4330 seconds
> >
> > Find my HBase log in the pastebin below:
> >
> > http://pastebin.com/MHMjhHuF
> >
> > Thanks,
> >
> > Kumar
> >
> > On Mon, Jun 13, 2016 at 7:42 PM, Ted Yu wrote:
> >
> > > Can you inspect the master log for the corresponding 40 seconds to see if there was some clue?
> > >
> > > Feel free to pastebin the log snippet for this period if you cannot determine the cause.
> > >
> > > Cheers
> > >
> > > On Sun, Jun 12, 2016 at 10:19 PM, kumar r wrote:
> > >
> > > > Hi,
> > > >
> > > > I have configured secure HBase 1.1.3 with Hadoop 2.7.2.
> > > >
> > > > I have enabled authorization in HBase.
> > > >
> > > > When executing any authorization command like user_permission, grant, revoke, etc., it takes more than 40 seconds to display the result.
> > > >
> > > > Below are the hbase-site.xml configuration properties:
> > > >
> > > > hbase.master = IP:6
> > > > hbase.rootdir = hdfs://IP:9000/HBase
> > > > hbase.cluster.distributed = true
> > > > hbase.zookeeper.quorum = IP1:2181,IP2:2181,IP3:2181
> > > > hbase.master.port = 6
> > > > hbase.master.info.port = 60010
> > > > hbase.regionserver.port = 60020
> > > > hbase.regionserver.info.port = 60030
> > > > hbase.security.authentication = KERBEROS
> > > > hbase.master.keytab.file = masterkeytab
> > > > hbase.regionserver.keytab.file = regionserverkeytab
> > > > hbase.master.kerberos.principal = masterprincipal
> > > > hbase.regionserver.kerberos.principal = regionserverprincipal
> > > > hbase.rpc.engine = org.apache.hadoop.hbase.ipc.SecureRpcEngine
> > > > hbase.ssl.enabled = true
> > > > hbase.superuser = @HadoopUser
> > > > hbase.security.authorization = true
> > > > hbase.coprocessor.master.classes = org.apache.hadoop.hbase.security.access.AccessController
> > > > hbase.coprocessor.region.classes = org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController
> > > >
> > > > Find my Stack Overflow question here:
> > > >
> > > > http://stackoverflow.com/questions/37782043/hbase-acl-commands-are-too-slow
> > > >
> > > > Thanks,
> > > >
> > > > Kumar