[jira] [Commented] (HBASE-16414) Improve performance for RPC encryption with Apache Common Crypto

2016-09-21 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509150#comment-15509150
 ] 

Jerry Chen commented on HBASE-16414:


[~rayokota], agreed. When the Apache Commons Crypto project was created, there 
was a discussion about replacing the Hadoop common crypto code (AES-NI) with 
Apache Commons Crypto. This is in the plan. 
Thanks for pointing that out.

> Improve performance for RPC encryption with Apache Common Crypto
> 
>
> Key: HBASE-16414
> URL: https://issues.apache.org/jira/browse/HBASE-16414
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC
>Affects Versions: 2.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HBASE-16414.001.patch, HBASE-16414.002.patch, 
> HBASE-16414.003.patch, HBASE-16414.004.patch, 
> HbaseRpcEncryptionWithCrypoto.docx
>
>
> HBase RPC encryption is enabled by setting "hbase.rpc.protection" to 
> "privacy". With token authentication, it uses the DIGEST-MD5 mechanism 
> for secure authentication and data protection. DIGEST-MD5 uses DES, 
> 3DES or RC4 for encryption, which is very slow, especially for Scan. This 
> becomes the bottleneck of RPC throughput.
> Apache Commons Crypto is a cryptographic library optimized with AES-NI. It 
> provides a Java API at both the cipher level and the Java stream level. 
> Developers can use it to implement high-performance AES encryption/decryption 
> with minimal code and effort. Compared with the current implementation in 
> org.apache.hadoop.hbase.io.crypto.aes.AES, Commons Crypto supports both a JCE 
> cipher and an OpenSSL cipher, the latter performing better than the JCE 
> cipher. Users can configure the cipher type; the default is the JCE cipher.
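
For context, a minimal sketch of the cipher-level usage this builds on (assumptions: 
the Commons Crypto 1.x CryptoCipherFactory API; the key, IV, and transformation 
below are illustrative, not the values HBase would use):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Properties;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.crypto.cipher.CryptoCipher;
import org.apache.commons.crypto.cipher.CryptoCipherFactory;

public class CryptoCipherSketch {
  public static void main(String[] args) throws Exception {
    // The backing implementation (OpenSSL vs. JCE) can be chosen through the
    // "commons.crypto.cipher.classes" property; empty Properties uses the defaults.
    Properties props = new Properties();

    byte[] key = new byte[16];
    byte[] iv = new byte[16];
    new SecureRandom().nextBytes(key);
    new SecureRandom().nextBytes(iv);
    byte[] input = "hello hbase rpc".getBytes(StandardCharsets.UTF_8);

    try (CryptoCipher cipher =
        CryptoCipherFactory.getCryptoCipher("AES/CTR/NoPadding", props)) {
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
          new IvParameterSpec(iv));
      byte[] output = new byte[input.length + cipher.getBlockSize()];
      int n = cipher.doFinal(input, 0, input.length, output, 0);
      System.out.println("encrypted " + n + " bytes");
    }
  }
}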



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-3776) Add Bloom Filter Support to HFileOutputFormat

2012-12-14 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3776:
--

Assignee: Anoop Sam John  (was: Jerry Chen)

 Add Bloom Filter Support to HFileOutputFormat
 -

 Key: HBASE-3776
 URL: https://issues.apache.org/jira/browse/HBASE-3776
 Project: HBase
  Issue Type: Sub-task
Reporter: Nicolas Spiegelberg
Assignee: Anoop Sam John
Priority: Critical
  Labels: hbase
 Fix For: 0.96.0

 Attachments: HBASE-3776.patch


 Add Bloom Filter support for bulk imports.  Lacking a bloom filter, even on a 
 single imported file, can cause performance degradation.  Since we now set our 
 compression type based on the HBase CF configuration, it would be good to 
 follow the same path for the bloom filter addition.
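
For illustration, a minimal sketch of the column-family side of this, assuming the 
0.9x-era HColumnDescriptor API (table and family names are made up); the idea is that 
bulk-load output would honor the bloom setting the same way it already honors the 
compression setting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class BloomConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor table = new HTableDescriptor("mytable");
    HColumnDescriptor family = new HColumnDescriptor("cf");
    // Compression is already picked up from the CF config by bulk-load output.
    family.setCompressionType(Compression.Algorithm.GZ);
    // HBASE-3776 asks for the bloom filter type to be picked up the same way.
    family.setBloomFilterType(StoreFile.BloomType.ROW);
    table.addFamily(family);
    new HBaseAdmin(conf).createTable(table);
  }
}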

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3776) Add Bloom Filter Support to HFileOutputFormat

2012-11-21 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502588#comment-13502588
 ] 

Jerry Chen commented on HBASE-3776:
---

Please. Sorry I wasn't able to get to this. 

Sent from my iPhone




 Add Bloom Filter Support to HFileOutputFormat
 -

 Key: HBASE-3776
 URL: https://issues.apache.org/jira/browse/HBASE-3776
 Project: HBase
  Issue Type: Sub-task
Reporter: Nicolas Spiegelberg
Assignee: Jerry Chen
Priority: Critical
  Labels: hbase
 Fix For: 0.96.0


 Add Bloom Filter support for bulk imports.  Lacking a bloom filter, even on a 
 single imported file, can cause performance degradation.  Since we now set our 
 compression type based on the HBase CF configuration, it would be good to 
 follow the same path for the bloom filter addition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

2012-10-28 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485788#comment-13485788
 ] 

Jerry Chen commented on HBASE-6222:
---

[~saint@gmail.com]
bq. A put that broadened visibility would be for the current put only? How 
would it effect already-put values?
When the visibility is part of the column key, a broadened visibility will not 
affect existing columns that have the same {rowid, family, qualifier} but 
different visibilities. Thus, the put will only affect the columns that 
have the same {rowid, family, qualifier, visibility}. Different visibilities have 
the same effect as different qualifiers.

As for DeleteFamily and DeleteColumn, Accumulo doesn't have such 
operations. It has only a Delete mutation that deletes a specified {rowid, family, 
qualifier, visibility}. The idea for keeping DeleteFamily and DeleteColumn 
working with visibility in HBase is that a DeleteFamily operation would now only 
affect the columns in the family that carry the specified visibility, rather than 
all columns in the family as it originally did. The same goes for DeleteColumn.
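
As a toy illustration of this key layout (plain Java, not an HBase or Accumulo API; 
names are made up): two cells that differ only in visibility coexist, and a 
visibility-scoped DeleteFamily removes only the matching ones.

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class VisibilityKeySketch {
  // Hypothetical key: {rowid, family, qualifier, visibility}.
  record CellKey(String row, String family, String qualifier, String visibility) {}

  public static void main(String[] args) {
    Map<CellKey, String> store = new HashMap<>();

    // Same {row, family, qualifier}, different visibilities: two distinct cells,
    // exactly as if they were two different qualifiers.
    store.put(new CellKey("row1", "f1", "q1", "secret"), "v-secret");
    store.put(new CellKey("row1", "f1", "q1", "public"), "v-public");

    // A "DeleteFamily" scoped by visibility removes only the matching cells.
    String deleteVisibility = "secret";
    store.keySet().removeIf(k ->
        k.row().equals("row1") && k.family().equals("f1")
            && Objects.equals(k.visibility(), deleteVisibility));

    System.out.println(store); // only the "public" cell remains
  }
}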

One thing to consider if the visibility is part of the key: there are 
suggestions to support general tags for a KV, so that not only 
visibility tags can be stored in it but also other tags needed in the 
future can be added easily. Would a general tags concept (compared to a 
single visibility tag) make the notion of the column key too complex?





 Add per-KeyValue Security
 -

 Key: HBASE-6222
 URL: https://issues.apache.org/jira/browse/HBASE-6222
 Project: HBase
  Issue Type: New Feature
  Components: security
Reporter: stack

 Saw an interesting article: 
 http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
 The  Senate Armed Services Committee version of the fiscal 2013 national 
 defense authorization act (S. 3254) would require DoD agencies to foreswear 
 the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO 
 certifies that there exists either no viable commercial open source database 
 with security features comparable to [Accumulo] (such as the HBase or 
 Cassandra databases)...
 Not sure what a 'commercial open source database' is, and I'm not sure whats 
 going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' 
 like Accumulo's, we might put ourselves in the running for federal 
 contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6222) Add per-KeyValue Security

2012-10-25 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483960#comment-13483960
 ] 

Jerry Chen commented on HBASE-6222:
---

@Jon

bq. From my point of view, I'd like really like to understand more than just 
accumulo's implementation – I really care about if accumulo's semantics are 1) 
intentional and required for accumulo use cases and 2) if applications only use 
a constrained sets of its capabilities. One specific thing I don't quite 
understand is the ramifications of having column visibility settings are 
encoded as part of the key and sort order. This could be equivalent expressions 
that are no longer equals, and some of somewhat goofy future/past views.  

I looked into the Accumulo implementation. One thing that Accumulo wants to 
achieve by making the ColumnVisibility a part of the column key is that 
authorization can be enforced without reading the existing records that may be 
affected by the current mutation. Because the ColumnVisibility is part of the 
key, you need to give the ColumnVisibility explicitly to identify/match the 
column you are targeting to change (put or delete), and thus the 
VisibilityConstraint check can be performed on the given ColumnVisibility against 
the user's authorization tokens to make sure the user is authorized to 
perform the mutation logically on an existing column, without scanning the 
existing columns that it may change. HBase and Accumulo are in the 
same situation here.

There may be other issues to address if the visibility is neither part of the key 
nor part of the value when multiple versions of a logical column 
exist. For example, a Put with a new Visibility value for the key {row1, 
family1:qualifier1} makes a logical change to all the cells with key 
{row1, family1:qualifier1}, and thus authorization must be checked over all 
these affected items (which may have different Visibility values) against the user's 
authorization to see whether the Put can be performed or not. Deletion 
faces the same consideration with DeleteFamily and DeleteColumn, 
which logically affect many columns that may have different Visibility 
values.

One important change to the client API when the Visibility is part of the 
column key is that the Visibility needs to be specified either explicitly or 
implicitly (for example, an empty Visibility is used when none is provided in the 
parameters) when performing Put or Delete mutations. This does seem a little 
strange at first glance compared with the approaches used by traditional 
database row-level authorization such as Oracle Label Security. But the 
question is: do we have better choices that both solve the problem and fit into 
the current framework?
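
For concreteness, a hypothetical sketch of what such a client API could look like; 
the visibility-related calls (setCellVisibility, CellVisibility) are assumed names 
for illustration only and did not exist in HBase at the time of this discussion:

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilityApiSketch {
  public static void main(String[] args) {
    // Put a cell whose key would logically include the visibility expression.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("family1"), Bytes.toBytes("qualifier1"), Bytes.toBytes("value"));
    // put.setCellVisibility(new CellVisibility("secret&ops"));    // hypothetical

    // A Delete must name the visibility it targets; an empty expression could be
    // the implicit default when none is given.
    Delete delete = new Delete(Bytes.toBytes("row1"));
    delete.deleteColumns(Bytes.toBytes("family1"), Bytes.toBytes("qualifier1"));
    // delete.setCellVisibility(new CellVisibility("secret&ops")); // hypothetical
  }
}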


 Add per-KeyValue Security
 -

 Key: HBASE-6222
 URL: https://issues.apache.org/jira/browse/HBASE-6222
 Project: HBase
  Issue Type: New Feature
  Components: security
Reporter: stack

 Saw an interesting article: 
 http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
 The  Senate Armed Services Committee version of the fiscal 2013 national 
 defense authorization act (S. 3254) would require DoD agencies to foreswear 
 the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO 
 certifies that there exists either no viable commercial open source database 
 with security features comparable to [Accumulo] (such as the HBase or 
 Cassandra databases)...
 Not sure what a 'commercial open source database' is, and I'm not sure whats 
 going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' 
 like Accumulo's, we might put ourselves in the running for federal 
 contributions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6868) Skip checksum is broke; are we double-checksumming by default?

2012-09-22 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461293#comment-13461293
 ] 

Jerry Chen commented on HBASE-6868:
---

On 89-fb, we are depending on inline HDFS checksums to address the checksum IOPS 
overhead. See https://issues.apache.org/jira/browse/HDFS-2699. Our HDFS 
progress can be seen here: https://github.com/facebook/hadoop-20/tree/develop. 
It is code complete (not committed to GitHub yet) and is under production 
testing. 

 Skip checksum is broke; are we double-checksumming by default?
 --

 Key: HBASE-6868
 URL: https://issues.apache.org/jira/browse/HBASE-6868
 Project: HBase
  Issue Type: Bug
  Components: HFile, wal
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei
Priority: Blocker
 Fix For: 0.94.3, 0.96.0


 The HFile contains checksums to decrease the IOPS, so when HBase reads an HFile 
 it doesn't need to read the checksum from the HDFS meta file.  But the HLog 
 file of HBase doesn't contain checksums, so when HBase reads the HLog it 
 must read the checksum from the HDFS meta file.  We could add a per-file 
 setSkipChecksum to HDFS, or we could write checksums into the WAL if this 
 skip-checksum facility is enabled. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-13 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019468#comment-13019468
 ] 

Jerry Chen commented on HBASE-3685:
---

@stack, can you take a look at this? Kannan and Jonathan have reviewed it 
internally. 

 when multiple columns are combined with TimestampFilter, only one column is 
 returned
 

 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Reporter: Jerry Chen
Priority: Minor
  Labels: noob
 Attachments: 3685-missing-column.patch


 As reported by an HBase user: 
 I have a ThreadMetadata column family, and there are two columns in it: 
 v12:th: and v12:me. The following code only returns v12:me
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
 List<Long> threadIds = new ArrayList<Long>();
 threadIds.add(10709L);
 TimestampFilter filter = new TimestampFilter(threadIds);
 get.setFilter(filter);
 get.setMaxVersions();
 Result result = table.get(get);
 I checked HBase for the key/value; they are present. Also with other combinations, 
 like no TimestampFilter, it returns both.
 Kannan was able to do a small repro of the issue and commented that if we 
 drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-04 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3685:
--

Affects Version/s: (was: 0.89.20100924)
 Release Note: bug fix. 
   Status: Patch Available  (was: Open)

 when multiple columns are combined with TimestampFilter, only one column is 
 returned
 

 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Reporter: Jerry Chen
Priority: Minor
  Labels: noob

 As reported by an HBase user: 
 I have a ThreadMetadata column family, and there are two columns in it: 
 v12:th: and v12:me. The following code only returns v12:me
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
 List<Long> threadIds = new ArrayList<Long>();
 threadIds.add(10709L);
 TimestampFilter filter = new TimestampFilter(threadIds);
 get.setFilter(filter);
 get.setMaxVersions();
 Result result = table.get(get);
 I checked HBase for the key/value; they are present. Also with other combinations, 
 like no TimestampFilter, it returns both.
 Kannan was able to do a small repro of the issue and commented that if we 
 drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-04 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3685:
--

Attachment: 3685-missing-column.patch

Bug fix. This patch has been submitted to the review board as well. 

 when multiple columns are combined with TimestampFilter, only one column is 
 returned
 

 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Reporter: Jerry Chen
Priority: Minor
  Labels: noob
 Attachments: 3685-missing-column.patch


 As reported by an HBase user: 
 I have a ThreadMetadata column family, and there are two columns in it: 
 v12:th: and v12:me. The following code only returns v12:me
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
 List<Long> threadIds = new ArrayList<Long>();
 threadIds.add(10709L);
 TimestampFilter filter = new TimestampFilter(threadIds);
 get.setFilter(filter);
 get.setMaxVersions();
 Result result = table.get(get);
 I checked HBase for the key/value; they are present. Also with other combinations, 
 like no TimestampFilter, it returns both.
 Kannan was able to do a small repro of the issue and commented that if we 
 drop the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3684) Support column range filter

2011-03-29 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated HBASE-3684:
--

Attachment: 3684.patch

patch for column range filter. 

 Support column range filter 
 

 Key: HBASE-3684
 URL: https://issues.apache.org/jira/browse/HBASE-3684
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Jerry Chen
Priority: Minor
 Attachments: 3684.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently we have a ColumnPrefixFilter which seeks to the proper column 
 prefix. We also need a column range filter to query a range of columns. The 
 proposed interface is the following: ColumnRangeFilter(final byte[] 
 minColumn, boolean minColumnInclusive, final byte[] maxColumn, boolean 
 maxColumnInclusive) 
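
A minimal usage sketch of the proposed filter (table, family, and column names are 
illustrative; assumes the pre-1.0 HTable client API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnRangeScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");

    // Return only columns in ["col-a", "col-m") from family "cf".
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));
    scan.setFilter(new ColumnRangeFilter(
        Bytes.toBytes("col-a"), true,    // minColumn, inclusive
        Bytes.toBytes("col-m"), false)); // maxColumn, exclusive

    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(r);
    }
    scanner.close();
    table.close();
  }
}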

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3684) Support column range filter

2011-03-22 Thread Jerry Chen (JIRA)
Support column range filter 


 Key: HBASE-3684
 URL: https://issues.apache.org/jira/browse/HBASE-3684
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Jerry Chen
Priority: Minor


Currently we have a ColumnPrefixFilter which seeks to the proper column 
prefix. We also need a column range filter to query a range of columns. The 
proposed interface is the following: ColumnRangeFilter(final byte[] minColumn, 
boolean minColumnInclusive, final byte[] maxColumn, boolean maxColumnInclusive) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-03-22 Thread Jerry Chen (JIRA)
when multiple columns are combined with TimestampFilter, only one column is 
returned


 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Affects Versions: 0.89.20100924
Reporter: Jerry Chen
Priority: Minor


As reported by an HBase user: 

I have a ThreadMetadata column family, and there are two columns in it: 
v12:th: and v12:me. The following code only returns v12:me

get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
List<Long> threadIds = new ArrayList<Long>();
threadIds.add(10709L);
TimestampFilter filter = new TimestampFilter(threadIds);
get.setFilter(filter);
get.setMaxVersions();
Result result = table.get(get);

I checked HBase for the key/value; they are present. Also with other combinations, 
like no TimestampFilter, it returns both.

Kannan was able to do a small repro of the issue and commented that if we drop 
the get.setMaxVersions(), then the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira