[jira] [Commented] (HBASE-16414) Improve performance for RPC encryption with Apache Common Crypto
[ https://issues.apache.org/jira/browse/HBASE-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509150#comment-15509150 ]

Jerry Chen commented on HBASE-16414:
------------------------------------
[~rayokota], agreed. When the Apache Commons Crypto project was created, there was a discussion about replacing the Hadoop common crypto code (AES-NI) with Apache Commons Crypto. This is planned. Thanks for pointing it out.

> Improve performance for RPC encryption with Apache Common Crypto
> ----------------------------------------------------------------
>
>                 Key: HBASE-16414
>                 URL: https://issues.apache.org/jira/browse/HBASE-16414
>             Project: HBase
>          Issue Type: Improvement
>          Components: IPC/RPC
>    Affects Versions: 2.0.0
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>         Attachments: HBASE-16414.001.patch, HBASE-16414.002.patch, HBASE-16414.003.patch, HBASE-16414.004.patch, HbaseRpcEncryptionWithCrypoto.docx
>
> HBase RPC encryption is enabled by setting "hbase.rpc.protection" to "privacy". With token authentication, it uses the DIGEST-MD5 mechanism for secure authentication and data protection. DIGEST-MD5 uses DES, 3DES, or RC4 for encryption, which is very slow, especially for Scan; this becomes the bottleneck of RPC throughput.
> Apache Commons Crypto is a cryptographic library optimized with AES-NI. It provides a Java API at both the cipher level and the Java stream level, so developers can implement high-performance AES encryption/decryption with minimal code and effort. Compared with the current implementation in org.apache.hadoop.hbase.io.crypto.aes.AES, Commons Crypto supports both the JCE cipher and the OpenSSL cipher, the latter offering better performance. Users can configure the cipher type; the default is the JCE cipher.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
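Commons Crypto's CryptoCipher intentionally mirrors the JCE Cipher API, so the AES/CTR pattern it accelerates can be sketched with the stock JCE provider alone. A minimal encrypt/decrypt roundtrip sketch follows; the all-zero key, IV, and the payload string are illustrative placeholders only, not HBase's actual key material or wire format:

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrDemo {
    // One AES/CTR/NoPadding pass; mode is ENCRYPT_MODE or DECRYPT_MODE.
    // Commons Crypto exposes the same init/doFinal shape on CryptoCipher,
    // optionally backed by OpenSSL instead of the JCE provider.
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];  // demo only: all-zero 128-bit key
        byte[] iv  = new byte[16];  // demo only: all-zero counter block
        byte[] plain = "scan result payload".getBytes(StandardCharsets.UTF_8);
        byte[] enc = crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] dec = crypt(Cipher.DECRYPT_MODE, key, iv, enc);
        System.out.println(new String(dec, StandardCharsets.UTF_8));
    }
}
```

Because CTR mode turns AES into a stream cipher, no padding is needed and the same code path serves both directions, which is what makes a hardware-accelerated (AES-NI) cipher a drop-in win for bulk Scan traffic.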
[jira] [Updated] (HBASE-3776) Add Bloom Filter Support to HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HBASE-3776:
------------------------------
    Assignee: Anoop Sam John  (was: Jerry Chen)

Add Bloom Filter Support to HFileOutputFormat
---------------------------------------------

                 Key: HBASE-3776
                 URL: https://issues.apache.org/jira/browse/HBASE-3776
             Project: HBase
          Issue Type: Sub-task
            Reporter: Nicolas Spiegelberg
            Assignee: Anoop Sam John
            Priority: Critical
              Labels: hbase
             Fix For: 0.96.0
         Attachments: HBASE-3776.patch

Add Bloom Filter support for bulk imports. The lack of a bloom filter, even on a single imported file, can cause performance degradation. Since we now set the compression type based on the HBase CF configuration, it would be good to follow the same path for the bloom filter addition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3776) Add Bloom Filter Support to HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502588#comment-13502588 ]

Jerry Chen commented on HBASE-3776:
-----------------------------------
Please. Sorry I wasn't able to get to this.
[jira] [Commented] (HBASE-6222) Add per-KeyValue Security
[ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485788#comment-13485788 ]

Jerry Chen commented on HBASE-6222:
-----------------------------------
[~saint@gmail.com]
bq. A put that broadened visibility would be for the current put only? How would it effect already-put values?

When the visibility is part of the column key, a broadened visibility does not affect existing columns that have the same {rowid, family, qualifier} but a different visibility. The put affects only the columns that have the same {rowid, family, qualifier, visibility}. Different visibilities have the same effect as different qualifiers.

As for DeleteFamily and DeleteColumn, Accumulo has no such operations; it has only a Delete mutation, which deletes a specified {rowid, family, qualifier, visibility}. The idea for keeping DeleteFamily and DeleteColumn working with visibility in HBase is that a DeleteFamily operation would now affect only the columns in the family with the specified visibility, rather than all columns in the family. The same applies to DeleteColumn.

One more thing to consider if the visibility is part of the key: there are suggestions to support general tags for a KV, so that not only visibility tags but also other tags needed in the future can be added easily. Would a general-tags concept (compared with a single visibility tag) make the column key too complex?

Add per-KeyValue Security
-------------------------

                 Key: HBASE-6222
                 URL: https://issues.apache.org/jira/browse/HBASE-6222
             Project: HBase
          Issue Type: New Feature
          Components: security
            Reporter: stack

Saw an interesting article: http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14 "The Senate Armed Services Committee version of the fiscal 2013 national defense authorization act (S. 3254) would require DoD agencies to foreswear the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO certifies that there exists either no viable commercial open source database with security features comparable to [Accumulo] (such as the HBase or Cassandra databases)..." Not sure what a 'commercial open source database' is, and I'm not sure what's going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security' like Accumulo's, we might put ourselves in the running for federal contributions?
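The "different visibilities behave like different qualifiers" semantics above can be illustrated with a toy in-memory cell store whose key has visibility as a fourth component, in the Accumulo style. This is a sketch of the key semantics only; the class and method names are invented for illustration and are not HBase or Accumulo APIs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class VisibilityKeyDemo {
    // Toy cell key: {row, family, qualifier, visibility}. Because visibility
    // participates in equality, a put with a broadened visibility creates a
    // new cell rather than replacing the narrower one.
    static final class CellKey {
        final String row, family, qualifier, visibility;
        CellKey(String r, String f, String q, String v) {
            row = r; family = f; qualifier = q; visibility = v;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof CellKey)) return false;
            CellKey k = (CellKey) o;
            return row.equals(k.row) && family.equals(k.family)
                && qualifier.equals(k.qualifier) && visibility.equals(k.visibility);
        }
        @Override public int hashCode() {
            return Objects.hash(row, family, qualifier, visibility);
        }
    }

    final Map<CellKey, String> store = new HashMap<>();

    void put(String row, String fam, String qual, String vis, String value) {
        store.put(new CellKey(row, fam, qual, vis), value);
    }

    String get(String row, String fam, String qual, String vis) {
        return store.get(new CellKey(row, fam, qual, vis));
    }
}
```

Putting the same {row, family, qualifier} under "secret" and then under "public" leaves two distinct cells, which is exactly why a visibility-scoped DeleteFamily would touch only the cells whose visibility matches.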
[jira] [Commented] (HBASE-6222) Add per-KeyValue Security
[ https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483960#comment-13483960 ]

Jerry Chen commented on HBASE-6222:
-----------------------------------
@Jon
bq. From my point of view, I'd really like to understand more than just accumulo's implementation – I really care about if accumulo's semantics are 1) intentional and required for accumulo use cases and 2) if applications only use a constrained set of its capabilities. One specific thing I don't quite understand is the ramifications of having column visibility settings encoded as part of the key and sort order. This could mean equivalent expressions that are no longer equal, and some somewhat goofy future/past views.

I looked into the Accumulo implementation. One thing Accumulo achieves by making the ColumnVisibility part of the column key is that authorization can be enforced without reading the existing records that may be affected by the current mutation. Because the ColumnVisibility is part of the key, you must explicitly give the ColumnVisibility to identify the column you are targeting for change (put or delete). Thus a VisibilityConstraint check can be performed on the given ColumnVisibility against the user's authorization tokens, to make sure the user is authorized to perform the mutation on an existing column, without scanning the existing columns it may change. HBase and Accumulo are in the same situation here.

There may be other issues to address if the visibility is part of neither the key nor the value when multiple versions of a logical column exist. For example, a Put with a new Visibility value for the key {row1, family1:qualifier1} makes a logical change to all cells with key {row1, family1:qualifier1}, so authorization must be checked over all the affected cells (which may have different Visibility values) against the user's authorizations to decide whether the Put can be performed. Deletion raises the same consideration for DeleteFamily and DeleteColumn, which logically affect many columns that may have different Visibility values.

One important change to the client API when the Visibility is part of the column key is that the Visibility needs to be specified either explicitly or implicitly (for example, an empty Visibility is used when none is provided) when performing Put or Delete mutations. This does seem a little strange at first glance compared with approaches used by traditional database row-level authorization such as Oracle Label Security. But the question is: do we have better choices that both solve the problem and fit into the current framework?
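The VisibilityConstraint idea above, checking the mutation's own visibility against the user's tokens instead of reading existing cells, can be sketched for the simplest expression form. This is a deliberately reduced model: real Accumulo/HBase visibility expressions also support '|' and parentheses, and the method name here is invented for illustration:

```java
import java.util.Set;

public class VisibilityCheckDemo {
    // Minimal constraint check for a flat "A&B&C" visibility expression:
    // the mutation is authorized only if the user holds every listed token.
    // An empty expression means the cell is visible to everyone.
    static boolean authorized(String visibility, Set<String> userTokens) {
        if (visibility.isEmpty()) {
            return true;
        }
        for (String token : visibility.split("&")) {
            if (!userTokens.contains(token)) {
                return false;
            }
        }
        return true;
    }
}
```

The key property is that the check needs only the visibility string supplied with the Put or Delete, which is why keying cells by visibility avoids the read-before-write that a value-stored visibility would require.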
[jira] [Commented] (HBASE-6868) Skip checksum is broke; are we double-checksumming by default?
[ https://issues.apache.org/jira/browse/HBASE-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461293#comment-13461293 ]

Jerry Chen commented on HBASE-6868:
-----------------------------------
On 89-fb, we are depending on inline HDFS checksums to address the checksum iops overhead. See https://issues.apache.org/jira/browse/HDFS-2699. Our HDFS progress can be seen here: https://github.com/facebook/hadoop-20/tree/develop. It is code complete (not committed to github yet) and is under production testing.

Skip checksum is broke; are we double-checksumming by default?
--------------------------------------------------------------

                 Key: HBASE-6868
                 URL: https://issues.apache.org/jira/browse/HBASE-6868
             Project: HBase
          Issue Type: Bug
          Components: HFile, wal
    Affects Versions: 0.94.0, 0.94.1
            Reporter: LiuLei
            Priority: Blocker
             Fix For: 0.94.3, 0.96.0

The HFile contains checksums to decrease the iops, so when HBase reads an HFile it doesn't need to read the checksum from the HDFS meta file. But the HLog file doesn't contain checksums, so when HBase reads the HLog it must read the checksum from the HDFS meta file. We could add a per-file setSkipChecksum to HDFS, or we could write checksums into the WAL if this skip-checksum facility is enabled.
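The inline-checksum idea referenced above (HDFS-2699) stores a checksum per data chunk alongside the data, so a reader verifies without a second iop against a separate meta file. A stdlib-only sketch of the pattern follows; the chunking scheme and CRC32 choice are illustrative, not the actual HDFS or HFile wire format:

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class InlineChecksumDemo {
    // Compute one CRC32 per fixed-size chunk at write time.
    static long[] checksums(byte[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            CRC32 crc = new CRC32();
            int off = i * chunkSize;
            crc.update(data, off, Math.min(chunkSize, data.length - off));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verify at read time by recomputing and comparing chunk checksums;
    // no separate metadata read is needed when the sums travel with the data.
    static boolean verify(byte[] data, int chunkSize, long[] expected) {
        return Arrays.equals(checksums(data, chunkSize), expected);
    }
}
```

A single flipped bit in any chunk changes that chunk's CRC32, so verification fails without the reader ever consulting an external checksum file.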
[jira] [Commented] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned
[ https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019468#comment-13019468 ]

Jerry Chen commented on HBASE-3685:
-----------------------------------
@stack, can you take a look at this? Kannan and Jonathan have reviewed it internally.

when multiple columns are combined with TimestampFilter, only one column is returned
------------------------------------------------------------------------------------

                 Key: HBASE-3685
                 URL: https://issues.apache.org/jira/browse/HBASE-3685
             Project: HBase
          Issue Type: Bug
          Components: filters, regionserver
            Reporter: Jerry Chen
            Priority: Minor
              Labels: noob
         Attachments: 3685-missing-column.patch

As reported by an HBase user: "I have a ThreadMetadata column family, and there are two columns in it: v12:th: and v12:me:. The following code only returns v12:me:

get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
List<Long> threadIds = new ArrayList<Long>();
threadIds.add(10709L);
TimestampFilter filter = new TimestampFilter(threadIds);
get.setFilter(filter);
get.setMaxVersions();
Result result = table.get(get);

I checked HBase for the key/values; they are present. Also with other combinations, like no TimestampFilter, it returns both." Kannan was able to do a small repro of the issue and commented that if we drop the get.setMaxVersions(), the problem goes away.
[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned
[ https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HBASE-3685:
------------------------------
    Affects Version/s:     (was: 0.89.20100924)
         Release Note: bug fix
               Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned
[ https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HBASE-3685:
------------------------------
    Attachment: 3685-missing-column.patch

Bug fix. This patch has been submitted to the review board as well.
[jira] [Updated] (HBASE-3684) Support column range filter
[ https://issues.apache.org/jira/browse/HBASE-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HBASE-3684:
------------------------------
    Attachment: 3684.patch

Patch for column range filter.

Support column range filter
---------------------------

                 Key: HBASE-3684
                 URL: https://issues.apache.org/jira/browse/HBASE-3684
             Project: HBase
          Issue Type: New Feature
          Components: filters
            Reporter: Jerry Chen
            Priority: Minor
         Attachments: 3684.patch
   Original Estimate: 48h
  Remaining Estimate: 48h

Currently we have a ColumnPrefix filter which will seek to the proper column prefix. We also need a column range filter to query a range of columns. The proposed interface is the following:

ColumnRangeFilter(final byte[] minColumn, boolean minColumnInclusive, final byte[] maxColumn, boolean maxColumnInclusive)
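The proposed constructor's inclusive/exclusive bound semantics can be sketched as a standalone predicate over byte-array qualifiers, using unsigned lexicographic comparison as HBase does. This toy predicate only models which qualifiers would pass; the real filter additionally drives scanner seek hints, and the class name here is invented for illustration:

```java
import java.util.Arrays;

public class ColumnRangePredicateDemo {
    // True iff column falls in [min, max] / (min, max) / etc., depending on
    // the two inclusive flags, mirroring the proposed
    // ColumnRangeFilter(minColumn, minInclusive, maxColumn, maxInclusive).
    static boolean inRange(byte[] column, byte[] min, boolean minInclusive,
                           byte[] max, boolean maxInclusive) {
        int lo = Arrays.compareUnsigned(column, min);
        if (lo < 0 || (lo == 0 && !minInclusive)) {
            return false;  // below the lower bound, or on an exclusive lower bound
        }
        int hi = Arrays.compareUnsigned(column, max);
        return hi < 0 || (hi == 0 && maxInclusive);
    }
}
```

Unsigned comparison matters because HBase qualifiers are raw bytes; a signed byte compare would order 0x80-0xFF before 0x00-0x7F and break the range.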
[jira] [Created] (HBASE-3684) Support column range filter
Support column range filter
---------------------------

                 Key: HBASE-3684
                 URL: https://issues.apache.org/jira/browse/HBASE-3684
             Project: HBase
          Issue Type: New Feature
          Components: filters
            Reporter: Jerry Chen
            Priority: Minor

Currently we have a ColumnPrefix filter which will seek to the proper column prefix. We also need a column range filter to query a range of columns. The proposed interface is the following:

ColumnRangeFilter(final byte[] minColumn, boolean minColumnInclusive, final byte[] maxColumn, boolean maxColumnInclusive)
[jira] [Created] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned
when multiple columns are combined with TimestampFilter, only one column is returned
------------------------------------------------------------------------------------

                 Key: HBASE-3685
                 URL: https://issues.apache.org/jira/browse/HBASE-3685
             Project: HBase
          Issue Type: Bug
          Components: filters, regionserver
    Affects Versions: 0.89.20100924
            Reporter: Jerry Chen
            Priority: Minor