[jira] [Commented] (HBASE-5783) Faster HBase bulk loader
[ https://issues.apache.org/jira/browse/HBASE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476780#comment-13476780 ]

Lars Hofhansl commented on HBASE-5783:
--

The tracking cookie is a very interesting idea! Do we need to track every single cookie, or can we just track the highest one per region: if that one made it to disk, all previous ones are on disk as well? By "MR Bulk Loader" do you mean LoadIncrementalHFiles? Or Import/ImportTsv?

Faster HBase bulk loader
Key: HBASE-5783
URL: https://issues.apache.org/jira/browse/HBASE-5783
Project: HBase
Issue Type: New Feature
Components: Client, IPC/RPC, Performance, regionserver
Reporter: Karthik Ranganathan
Assignee: Amitanand Aiyer

A prototype (admittedly hacky) demonstrating this approach shows a 3x to 4x gain over the MR bulk loader for very large data sets. The approach:

1. Do direct multi-puts from the HBase client using GZIP-compressed RPCs.
2. Turn off the WAL (we will ensure no data loss in another way).
3. Each bulk-load client needs to:
   3.1 do a put
   3.2 get back a tracking cookie (memstoreTs or HLogSequenceId) per put
   3.3 be able to ask the RS whether that tracking cookie has been flushed to disk
4. A client succeeds if the tracking cookie for the last put it did (on every RS) makes it to disk. Otherwise the map task fails and is retried.
5. If the last put does not make it to disk within a timeout (say a second or so), we issue a manual flush.

Enhancements:
- Increase the memstore size so that we flush larger files.
- Decrease the compaction ratios (i.e., increase the number of files to compact).

Quick background: The bottlenecks in the multiput approach are that the data is transferred *uncompressed* twice over the top-of-rack switch: once from the client to the RS (on the multi-put call) and again because of the WAL (HDFS replication). We reduced the former with RPC compression and eliminated the latter as above, while still guaranteeing that data won't be lost.

This is better than the MR bulk loader at a high level because we don't need to merge-sort all the files for a given region and then turn them into an HFile; that is the equivalent of bulk loading AND major-compacting in one shot. There is also much more disk I/O involved in the MR method (sort/spill).

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
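The "track only the highest cookie per region" idea from the comment above can be sketched in plain Java (no HBase dependencies; class and method names here are illustrative, not HBase API). Since sequence ids are handed out in increasing order, a client only needs to remember the largest cookie it got back per region, and its load is durable once every region server reports having flushed past that id:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not HBase code: a bulk-load client remembers only
// the highest tracking cookie (e.g. HLogSequenceId) per region, and the
// load is considered durable once each region's flushed id catches up.
class CookieTracker {
    // highest cookie returned per region for puts done by this client
    private final Map<String, Long> highestCookie = new HashMap<>();

    /** Record the cookie returned by a put; keep only the max per region. */
    void recordPut(String region, long cookie) {
        highestCookie.merge(region, cookie, Math::max);
    }

    /**
     * @param flushedUpTo highest cookie each region server reports as
     *                    flushed to disk (a hypothetical RS query)
     * @return true if every tracked region has flushed past our last put
     */
    boolean allDurable(Map<String, Long> flushedUpTo) {
        for (Map.Entry<String, Long> e : highestCookie.entrySet()) {
            Long flushed = flushedUpTo.get(e.getKey());
            if (flushed == null || flushed < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

Because cookies are monotonic per region, tracking the maximum is enough: if the max made it to disk, every earlier put did too.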
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476804#comment-13476804 ]

Anoop Sam John commented on HBASE-6942:
--

API-signature-wise I am okay, Lars. On passing in Scan attributes, I was of two minds the whole time.. :)

BulkDeleteResponse delete(Scan scan, DeleteType type, Long timestamp)

We need to pass the rowBatchSize too. This is needed to accumulate the rows for a batched delete. Do we need a Request object taking up the attributes, something in line with protobufs? Just asking. Yes, the above suggestions sound good to me.

bq. Documenting this will be tricky. I can have a shot at that (if you like, Anoop. If you prefer to do that, that's fine too).

Yes Lars, you can do that if you like :)

bq. Eventually, since we made it so general now, I can see this as an official API in HTable... But let's do that in another jira (if others agree).

+1

Endpoint implementation for bulk delete rows
Key: HBASE-6942
URL: https://issues.apache.org/jira/browse/HBASE-6942
Project: HBase
Issue Type: Improvement
Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Fix For: 0.94.3, 0.96.0
Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch

We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation: right now one needs to scan the rows back to the client and issue delete(s) using the row keys. Think of a query like "delete from table1 where...".
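The rowBatchSize accumulation described above can be illustrated with a small dependency-free sketch (names are hypothetical, not the actual endpoint code; the Consumer stands in for the server-side batched delete call). Matched row keys are buffered until the batch size is reached, then deleted in one shot:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch of batched deletes: rows matched by the scan are
// accumulated until rowBatchSize is reached, then deleted in one call,
// with a final partial batch for the leftovers.
class BulkDeleteBatcher {
    /** @return the number of delete batches issued */
    static int deleteInBatches(List<String> matchedRowKeys, int rowBatchSize,
                               Consumer<List<String>> deleteBatch) {
        int batches = 0;
        List<String> pending = new ArrayList<>();
        for (String rowKey : matchedRowKeys) {
            pending.add(rowKey);
            if (pending.size() == rowBatchSize) {
                deleteBatch.accept(new ArrayList<>(pending));
                pending.clear();
                batches++;
            }
        }
        if (!pending.isEmpty()) {       // flush the final partial batch
            deleteBatch.accept(pending);
            batches++;
        }
        return batches;
    }
}
```

Batching bounds server-side memory while still avoiding one round trip (or one delete call) per row, which is the point of the rowBatchSize parameter.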
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476807#comment-13476807 ]

Anoop Sam John commented on HBASE-6942:
--

bq. Just worried that N will necessarily be (much?) larger than M.

We do not have any way to get only the 1st KVs from all the families of a row, right? We have FirstKeyOnlyFilter now.
[jira] [Commented] (HBASE-6786) Convert MultiRowMutationProtocol to protocol buffer service
[ https://issues.apache.org/jira/browse/HBASE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476809#comment-13476809 ]

Hadoop QA commented on HBASE-6786:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12549266/6786-1.patch
against trunk revision:

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
-1 javadoc. The javadoc tool appears to have generated 82 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3056//console

This message is automatically generated.

Convert MultiRowMutationProtocol to protocol buffer service
Key: HBASE-6786
URL: https://issues.apache.org/jira/browse/HBASE-6786
Project: HBase
Issue Type: Sub-task
Components: Coprocessors
Reporter: Gary Helmling
Assignee: Devaraj Das
Fix For: 0.96.0
Attachments: 6786-1.patch

With coprocessor endpoints now exposed as protobuf-defined services, we should convert all of our built-in endpoints over to PB services.
[jira] [Commented] (HBASE-6979) recovered.edits file should not break distributed log splitting
[ https://issues.apache.org/jira/browse/HBASE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476812#comment-13476812 ]

stack commented on HBASE-6979:
--

I'm ok w/ this. /tmp is better than nowhere. And this is a mess when folks run into it -- if they ever do. This is good cleanup in that case.

recovered.edits file should not break distributed log splitting
Key: HBASE-6979
URL: https://issues.apache.org/jira/browse/HBASE-6979
Project: HBase
Issue Type: Improvement
Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Attachments: trunk-6979.patch

Distributed log splitting fails while creating the recovered.edits folder during an upgrade because there is already a file called recovered.edits there. Instead of only checking whether the path exists, we need to check that it exists and is a directory.
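The check described above can be sketched with java.nio.file in place of the Hadoop FileSystem API (an assumption for illustration; the real patch works against Hadoop's Path/FileSystem, and where the leftover file is moved is also illustrative): before creating the recovered.edits directory, verify not just existence but that the existing path is a directory, moving a same-named plain file aside first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the fix: a leftover *file* named recovered.edits must be
// moved out of the way before the recovered.edits *directory* can be
// created, otherwise directory creation fails.
class RecoveredEditsDir {
    static Path ensureDir(Path recoveredEdits) throws IOException {
        if (Files.exists(recoveredEdits) && !Files.isDirectory(recoveredEdits)) {
            // a plain file occupies the path; move it to a sibling name
            Path aside = recoveredEdits.resolveSibling(
                    recoveredEdits.getFileName() + ".old");
            Files.move(recoveredEdits, aside);
        }
        // no-op if the directory already exists
        return Files.createDirectories(recoveredEdits);
    }
}
```

A bare Files.exists() check is the bug the issue describes: it passes for the stray file, and the subsequent mkdir then fails.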
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476814#comment-13476814 ]

Lars Hofhansl commented on HBASE-6942:
--

bq. We do not have any way to get only the 1st KVs from all the families of a row, right? We have FirstKeyOnlyFilter now.

Right. I can't see any way to get only the 1st KV for a CF. I think we can live with that for now.

bq. We need to pass the rowBatchSize too

Yes, forgot about that.

bq. Do we need a Request object taking up the attributes, something in line with protobufs?

Not sure. I like just passing the four parameters needed. (If that is not possible with protobufs, we should use a request object.)
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476830#comment-13476830 ]

nkeywal commented on HBASE-5843:
--

bq. Not sure what you mean here. Are the rows the same or not? Are there just more flushes in the 10M case?

Yes, it's exactly the same rows for 1M puts and 10M puts.

Improve HBase MTTR - Mean Time To Recover
Key: HBASE-5843
URL: https://issues.apache.org/jira/browse/HBASE-5843
Project: HBase
Issue Type: Umbrella
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal

A part of the approach is described here:
https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit

The ideal target is:
- failures impact client applications only through an added delay in executing a query, whatever the failure;
- this delay is always under 1 second.

We're not going to achieve that immediately... Priority will be given to the most frequent issues. Short term:
- software crashes
- standard administrative tasks such as stop/start of a cluster.
[jira] [Commented] (HBASE-6962) Upgrade hadoop 1 dependency to hadoop 1.1
[ https://issues.apache.org/jira/browse/HBASE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476935#comment-13476935 ]

Hudson commented on HBASE-6962:
--

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #222 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/222/])
HBASE-6962 Upgrade hadoop 1 dependency to hadoop 1.1 (Revision 1398580)

Result = FAILURE
enis :
Files :
* /hbase/trunk/pom.xml

Upgrade hadoop 1 dependency to hadoop 1.1
Key: HBASE-6962
URL: https://issues.apache.org/jira/browse/HBASE-6962
Project: HBase
Issue Type: Bug
Environment: hadoop 1.1 contains multiple important fixes, including HDFS-3703
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.96.0
Attachments: 6962.txt
[jira] [Created] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
liang xie created HBASE-6998:
--

Summary: Uncaught exception in main() makes the HMaster/HRegionServer process hang
Key: HBASE-6998
URL: https://issues.apache.org/jira/browse/HBASE-6998
Project: HBase
Issue Type: Bug
Components: master, regionserver
Affects Versions: 0.94.2, 0.96.0
Environment: CentOS 6.2 + CDH4.1 HDFS + HBase 0.94.2
Reporter: liang xie
Assignee: liang xie

I am trying the HDFS QJM feature in our test environment. After a misconfiguration, I found the HMaster/HRegionServer process stays up even though the main thread is dead. Here is the stack trace:

Exception in thread "main" java.net.UnknownHostException: unknown host: cluster1
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy8.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:3647)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:3631)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:61)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3691)

Then I need to kill the process manually each time, which is annoying. After applying the attached patch, the process exits as expected, and I am happy again :)
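The failure mode above is the classic one where main() dies but non-daemon threads keep the JVM alive. The usual fix, which a patch like this would apply, is to catch Throwable at the top level and force the JVM down (via System.exit) or install a default uncaught-exception handler. A minimal sketch (names are illustrative, not the actual HBase patch; the System.exit call is left commented out so the sketch stays side-effect free):

```java
// Sketch of a top-level catch: without it, an exception escaping main()
// kills only the main thread, and any non-daemon threads already started
// keep the process alive in a useless half-started state.
class TopLevelCatch {
    static volatile Throwable fatal;    // recorded for logging/inspection

    static void runServer(Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            fatal = t;         // log the fatal error, then bring the JVM down
            // System.exit(1); // commented out so the sketch is testable
        }
    }
}
```

With the catch in place the process terminates (or can be made to) instead of hanging, which matches the reporter's "the process exits as expected" after the patch.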
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-6998:
--

Attachment: HBASE-6998.patch
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-6998:
--

Description: edited for wording. The report and stack trace are the same as in the creation notice above; the closing line now reads: "Then I need to kill the process manually to clean up each time, which is annoying. After applying the attached patch, the process exits as expected, and I am happy again :)"
[jira] [Commented] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476960#comment-13476960 ]

Michael Drzal commented on HBASE-6974:
--

I'll try to get a patch up later today.

Metric for blocked updates
Key: HBASE-6974
URL: https://issues.apache.org/jira/browse/HBASE-6974
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
Fix For: 0.94.3, 0.96.0

When the disk subsystem cannot keep up with a sustained high write load, a region will eventually block updates to throttle clients (HRegion.checkResources). It would be nice to have a metric for this, so that these occurrences can be tracked.
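A minimal sketch of the metric the issue asks for: a counter bumped whenever a checkResources-style check refuses an update because the memstore is over its limit. Class and method names are hypothetical; a real patch would wire the counter into HBase's metrics framework rather than expose it directly.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: count how often updates are blocked so operators
// can see when a region is throttling clients under write pressure.
class BlockedUpdatesMetric {
    private final AtomicLong blockedCount = new AtomicLong();
    private final long memstoreLimitBytes;

    BlockedUpdatesMetric(long memstoreLimitBytes) {
        this.memstoreLimitBytes = memstoreLimitBytes;
    }

    /** Returns true if the update may proceed; counts the block otherwise. */
    boolean checkResources(long memstoreSizeBytes) {
        if (memstoreSizeBytes > memstoreLimitBytes) {
            blockedCount.incrementAndGet();   // the metric this issue wants
            return false;                     // caller blocks/retries
        }
        return true;
    }

    long getBlockedCount() { return blockedCount.get(); }
}
```

An AtomicLong keeps the counter cheap and thread-safe on the hot write path, which matters since checkResources runs per update.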
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-6998:
--

Attachment: (was: HBASE-6998.patch)
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-6998:
--

Attachment: HBASE-6998.patch
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process hang
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liang xie updated HBASE-6998: - Status: Patch Available (was: Open) Uncaught exception in main() makes the HMaster/HRegionServer process suspend - Key: HBASE-6998 URL: https://issues.apache.org/jira/browse/HBASE-6998 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.2, 0.96.0 Environment: CentOS6.2 + CDH4.1 HDFS + hbase0.94.2 Reporter: liang xie Assignee: liang xie Attachments: HBASE-6998.patch I am trying the HDFS QJM feature in our test environment. After a misconfiguration, I found the HMaster/HRegionServer process stays up even though its main thread is dead. Here is the stack trace: Exception in thread "main" java.net.UnknownHostException: unknown host: cluster1 at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196) at org.apache.hadoop.ipc.Client.call(Client.java:1050) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123) at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:3647) at org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:3631) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:61) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:75) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3691) I then need to kill the process manually to clean up each time, which is annoying. After applying the attached patch, the process exits as expected, and I am happy again :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
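The failure mode described above, an exception escaping main() while non-daemon threads are already running, so the JVM hangs instead of exiting, can be sketched as follows. This is a hypothetical illustration of the fix's general shape (the class and method names are not HBase's, and this is not the attached patch):

```java
public class UncaughtExitSketch {
    // Runs the server-startup logic and reports the exit code main() should use:
    // 0 on success, 1 if any Throwable escapes. Without such a guard, an escaped
    // exception kills only the main thread, and any non-daemon thread that was
    // already started keeps the process alive.
    static int runGuarded(Runnable serverStart) {
        try {
            serverStart.run();
            return 0;
        } catch (Throwable t) {
            t.printStackTrace();
            return 1; // main() would pass this to System.exit(1), forcing the JVM down
        }
    }

    public static void main(String[] args) {
        int code = runGuarded(() -> {
            throw new RuntimeException("unknown host: cluster1"); // simulated misconfiguration
        });
        System.out.println("exit code: " + code);
    }
}
```

The key point is that System.exit() terminates the JVM regardless of live non-daemon threads, whereas an uncaught exception does not.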
[jira] [Commented] (HBASE-6965) Generic MXBean Utility class to support all JDK vendors
[ https://issues.apache.org/jira/browse/HBASE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476983#comment-13476983 ] Kumar Ravi commented on HBASE-6965: --- As suggested, I have combined the two patches into one. I ran the above core tests with multiple JDKs before and after submitting the patch and have been unable to recreate the problems seen in the above builds. I understand that the problem reported in the above pre-commit build is in the hbase-server area, whereas the patch applies to the hbase-common module. Can someone suggest next steps on how to resolve this issue? Generic MXBean Utility class to support all JDK vendors --- Key: HBASE-6965 URL: https://issues.apache.org/jira/browse/HBASE-6965 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.94.1 Reporter: Kumar Ravi Assignee: Kumar Ravi Labels: patch Fix For: 0.94.3 Attachments: HBASE-6965.patch This issue is related to JIRA https://issues.apache.org/jira/browse/HBASE-6945. It proposes a newly created generic org.apache.hadoop.hbase.util.OSMXBean class that can be used by other classes. JIRA HBASE-6945 contains a patch for the class org.apache.hadoop.hbase.ResourceChecker that uses OSMXBean. With this new class, HBase can be built and become functional with JDKs and JREs other than those provided by Oracle. The class uses reflection to determine the JVM vendor (Sun, IBM) and the platform (Linux or Windows), and contains other methods that return OS properties: 1. number of open file descriptors; 2. maximum number of file descriptors. The class compiles without any problems with IBM JDK 7, OpenJDK 6, and Oracle JDK 6. JUnit tests (runDevTests category) completed without any failures or errors on all three JDKs. The builds and tests were run on branch hbase-0.94, revision 1396305.
[jira] [Commented] (HBASE-6965) Generic MXBean Utility class to support all JDK vendors
[ https://issues.apache.org/jira/browse/HBASE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476997#comment-13476997 ] nkeywal commented on HBASE-6965: Hi Kumar, The issue above is very unlikely to be caused by your patch. Precommit is an environment where all tests are executed before a patch is integrated, but some tests are, unfortunately, flaky, and that is likely the cause here. The next step for your patch is to be reviewed and committed to trunk. Ted reviewed it already, so if he's ok he will commit it. If he does not look at it within two days I will have a look at it myself (or another committer will take the lead in between). Thanks for your contribution :-)
[jira] [Commented] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process suspend
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477013#comment-13477013 ] Hadoop QA commented on HBASE-6998: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549299/HBASE-6998.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3057//console This message is automatically generated.
[jira] [Commented] (HBASE-6965) Generic MXBean Utility class to support all JDK vendors
[ https://issues.apache.org/jira/browse/HBASE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477023#comment-13477023 ] Ted Yu commented on HBASE-6965: --- {code} + * It will decide to use the sun api or its own implementation {code} I think it's better to replace sun with Oracle. {code} +public class OSMXBean {code} Please add annotations for audience and stability to the above class. {code} + * Check if the OS is unix. If using the IBM java runtime, this + * will only work for linux. {code} Do you need to mention IBM in the above javadoc? {code} + public boolean getUnix() { {code} Rename the method to isUnix(). {code} + private Long getOSUnixMXBeanMethod (String mBeanMethodName) {code} Rename the above method to runUnixMXBeanMethod().
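For context on the rename suggestions above, here is a minimal sketch of the reflective approach the OSMXBean patch describes: invoking a vendor-specific OperatingSystemMXBean method by name so there is no compile-time dependency on com.sun classes. The method name follows the reviewer's suggested runUnixMXBeanMethod; the body is an assumption for illustration, not the patch itself:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

public class OsMxBeanSketch {
    private final OperatingSystemMXBean osMbean =
        ManagementFactory.getOperatingSystemMXBean();

    // Reflectively invoke a zero-argument, long-returning method such as
    // "getOpenFileDescriptorCount". Returns null when the current runtime
    // (e.g. a different JDK vendor, or Windows) does not provide the method,
    // or when the platform forbids access to it.
    Long runUnixMXBeanMethod(String mBeanMethodName) {
        try {
            Method m = osMbean.getClass().getMethod(mBeanMethodName);
            m.setAccessible(true); // the implementing class is usually not public
            return (Long) m.invoke(osMbean);
        } catch (Exception e) {
            return null; // method absent or inaccessible on this vendor/platform
        }
    }
}
```

Because all vendor-specific lookups go through reflection, the class compiles and loads on any JDK, which is the stated goal of the patch.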
[jira] [Updated] (HBASE-6998) Uncaught exception in main() makes the HMaster/HRegionServer process suspend
[ https://issues.apache.org/jira/browse/HBASE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6998: -- Summary: Uncaught exception in main() makes the HMaster/HRegionServer process suspend (was: Uncatched exception in main() makes the HMaster/HRegionServer process suspend)
[jira] [Commented] (HBASE-4955) Use the official versions of surefire junit
[ https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477031#comment-13477031 ] nkeywal commented on HBASE-4955: From the JUnit mailing list (15th Oct): "I am happy to announce the release of JUnit 4.11-beta-1. There have been a lot of contributions by a full cast of contributors." So with some luck we will have the release this quarter. Surefire: still waiting for #800. Maybe it will make it into 2.13; no date yet. Use the official versions of surefire junit - Key: HBASE-4955 URL: https://issues.apache.org/jira/browse/HBASE-4955 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor We currently use private versions of Surefire and JUnit since HBASE-4763. This JIRA tracks what we need in order to move to the official versions. Surefire 2.11 is just out but, after some tests, it does not contain all we need. JUnit: could be JUnit 4.11. Issue to monitor: https://github.com/KentBeck/junit/issues/359: fixed in our version, no feedback on an integration into trunk. Surefire: could be Surefire 2.12. Issues to monitor are: 329 (category support): fixed, we use the official implementation from trunk; 786 (@Category with forkMode=always): fixed, we use the official implementation from trunk; 791 (incorrect elapsed time on test failure): fixed, we use the official implementation from trunk; 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed in our version; 760 (does not take into account the test method): fixed in trunk, not fixed in our version; 798 (print the test class name immediately): not fixed in trunk, not fixed in our version; 799 (allow test parallelization when forkMode=always): not fixed in trunk, not fixed in our version; 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, fixed in our version. 800 and 793 are the most important to monitor; they are the only ones fixed in our version but not on trunk.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477064#comment-13477064 ] Ted Yu commented on HBASE-6942: --- {code} +long noOfRowsDeleted = invokeBulkDeleteProtocol(tableName, new Scan(), 500, DeleteType.ROW, +null); {code} I think the test should also cover the case where batchSize is smaller than the number of rows to be deleted. Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now it needs to scan the rows back to the client and issue delete(s) using row keys, i.e. a query like: delete from table1 where ...
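The batchSize point above can be made concrete with a small sketch of the batching behavior the endpoint is expected to have: when batchSize is smaller than the number of matching rows, the accumulated deletes must be flushed in several rounds rather than one. This is a generic illustration with made-up names, not the patch's code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDeleteSketch {
    // Splits rowKeys into flush batches of at most batchSize, mirroring how a
    // server-side endpoint would accumulate deletes from the scanner and flush
    // each batch. Returning the batches makes the batchSize < rowCount case
    // easy to assert in a test, which is what the review comment asks for.
    static List<List<String>> planBatches(List<String> rowKeys, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String row : rowKeys) {
            current.add(row);
            if (current.size() == batchSize) { // flush a full batch of deletes
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // trailing partial batch
        }
        return batches;
    }
}
```

With five rows and batchSize 2, for example, the plan is three batches of sizes 2, 2, and 1.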
[jira] [Work started] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-6974 started by Michael Drzal. Metric for blocked updates -- Key: HBASE-6974 URL: https://issues.apache.org/jira/browse/HBASE-6974 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Michael Drzal Priority: Critical Fix For: 0.94.3, 0.96.0 When the disk subsystem cannot keep up with a sustained high write load, a region will eventually block updates to throttle clients (HRegion.checkResources). It would be nice to have a metric for this, so that these occurrences can be tracked.
[jira] [Commented] (HBASE-6965) Generic MXBean Utility class to support all JDK vendors
[ https://issues.apache.org/jira/browse/HBASE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477069#comment-13477069 ] Kumar Ravi commented on HBASE-6965: --- Ted and nkeywal - Thanks for your comments and patience working with me on this JIRA. Ted, I'm working on addressing the concerns you raised above. I do have one question: I'm not sure what you mean by "Please add annotation for audience and stability for the above class." By audience, do you mean when this class gets invoked? I am also not clear about what you mean by stability. If you could point me to some examples in the HBase code, that would be great.
[jira] [Commented] (HBASE-6965) Generic MXBean Utility class to support all JDK vendors
[ https://issues.apache.org/jira/browse/HBASE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477089#comment-13477089 ] Kumar Ravi commented on HBASE-6965: --- I looked at the other classes in the util sub-directory and I now understand what is meant by audience and stability. Please ignore my earlier comment. Thanks.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477128#comment-13477128 ] Ted Yu commented on HBASE-6942: --- {code} BulkDeleteResponse delete(Scan scan, DeleteType type, Long timestamp, int batchSize) {code} What if the user wants to delete more than one column family? How would he or she formulate that in one request?
[jira] [Commented] (HBASE-5783) Faster HBase bulk loader
[ https://issues.apache.org/jira/browse/HBASE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477137#comment-13477137 ] Karthik Ranganathan commented on HBASE-5783: No, we track only the last (highest) one per region. Also, in the actual implementation, we did it with just timestamps from the RS. So, after doing all the puts, the loader gets the time on the RS (t1). The server tracks the start time of the last successfully completed flush (t2). Querying that and making sure t2 > t1 is enough. Of course, if the region has moved gracefully, that's considered a success too, as an optimization. We used the term MR Bulk Loader simply to say that the load of the data should be repeatable in case of failure (as opposed to an online use case). Faster HBase bulk loader Key: HBASE-5783 URL: https://issues.apache.org/jira/browse/HBASE-5783 Project: HBase Issue Type: New Feature Components: Client, IPC/RPC, Performance, regionserver Reporter: Karthik Ranganathan Assignee: Amitanand Aiyer We can get a 3x to 4x gain over the MR bulk loader for very large data sets, based on a prototype demonstrating this approach (hackily), by doing the following: 1. Do direct multi-puts from the HBase client using GZIP-compressed RPCs. 2. Turn off the WAL (we will ensure no data loss in another way). 3. For each bulk load client: 3.1 do a put; 3.2 get back a tracking cookie (memstoreTs or HLogSequenceId) per put; 3.3 be able to ask the RS whether the tracking cookie has been flushed to disk. 4. Each client succeeds if the tracking cookie for the last put it did (for every RS) makes it to disk; otherwise the map task fails and is retried. 5. If the last put does not make it to disk within a timeout (say a second or so), we issue a manual flush. Enhancements: - Increase the memstore size so that we flush larger files. - Decrease the compaction ratios (i.e., increase the number of files to compact). Quick background: the bottlenecks in the multiput approach are that the data is transferred *uncompressed* twice over the top-of-rack switch: once from the client to the RS (on the multi-put call) and again because of the WAL (HDFS replication). We reduced the former with RPC compression and eliminated the latter above while still guaranteeing that data won't be lost. This is better than the MR bulk loader at a high level because we don't need to merge-sort all the files for a given region and then turn them into an HFile - that's the equivalent of bulk loading AND major-compacting in one shot. Also, there is much more disk I/O involved in the MR method (sort/spill).
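The timestamp protocol described in the comment above (t1 = RS clock reading after the loader's final put; t2 = start time of the last successfully completed flush; the load is durable once t2 > t1; a manual flush is issued on timeout) can be sketched as below. All names are illustrative; this is not the prototype's code:

```java
public class FlushTrackingSketch {
    // Server-side state this sketch assumes: the RS records the start time of
    // the last successfully completed memstore flush for the region.
    private volatile long lastCompletedFlushStartTime = Long.MIN_VALUE;

    // t1 is the RS clock value the loader read after its final put. The load is
    // durable once a flush that STARTED after t1 has completed (t2 > t1):
    // every put applied before t1 was in the memstore when that flush began.
    boolean isDurable(long t1) {
        return lastCompletedFlushStartTime > t1;
    }

    // Called when the RS reports a completed flush that started at flushStartTime.
    void recordFlushCompleted(long flushStartTime) {
        lastCompletedFlushStartTime = Math.max(lastCompletedFlushStartTime, flushStartTime);
    }

    // Client-side step 5: if durability is not reached within the timeout,
    // issue one manual flush request and keep polling until it completes.
    void awaitDurable(long t1, long timeoutMs, Runnable manualFlush) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        boolean flushRequested = false;
        while (!isDurable(t1)) {
            if (!flushRequested && System.currentTimeMillis() >= deadline) {
                manualFlush.run(); // an explicit flush RPC to the RS (assumed to exist)
                flushRequested = true;
            }
            Thread.sleep(10);
        }
    }
}
```

Tracking only the latest completed flush start time per region is what makes the single highest cookie (or timestamp) sufficient: flushes persist the whole memstore, so everything older is on disk too.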
[jira] [Commented] (HBASE-6979) recovered.edits file should not break distributed log splitting
[ https://issues.apache.org/jira/browse/HBASE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477141#comment-13477141 ] Jimmy Xiang commented on HBASE-6979: Thanks a lot for the review. I will commit this to trunk tomorrow if there is no objection. recovered.edits file should not break distributed log splitting --- Key: HBASE-6979 URL: https://issues.apache.org/jira/browse/HBASE-6979 Project: HBase Issue Type: Improvement Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6979.patch Distributed log splitting fails when creating the recovered.edits folder during an upgrade because there is a file called recovered.edits there. Instead of checking only whether the path exists, we need to check whether it exists and is a directory.
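The check described above, creating recovered.edits only after confirming that any existing entry by that name is a directory rather than a leftover file, can be sketched like this. For a self-contained example it uses java.nio.file instead of Hadoop's FileSystem API, and the helper name is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecoveredEditsCheckSketch {
    // The bug: directory creation was skipped whenever something named
    // recovered.edits existed, even if it was a plain file left over from an
    // old layout, which then broke distributed log splitting. The fix
    // direction: require "exists AND is a directory", clearing a same-named
    // file first.
    static Path ensureRecoveredEditsDir(Path regionDir) throws IOException {
        Path editsDir = regionDir.resolve("recovered.edits");
        if (Files.exists(editsDir) && !Files.isDirectory(editsDir)) {
            Files.delete(editsDir); // stale plain file from a pre-upgrade layout
        }
        return Files.createDirectories(editsDir); // no-op if already a directory
    }
}
```

The same shape applies with Hadoop's FileSystem: test both existence and directory-ness before deciding whether creation is needed.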
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477143#comment-13477143 ] Anoop Sam John commented on HBASE-6942: --- bq. What if user wants to delete more than one column family ? How would he / she formulate through one request ? When the type is FAMILY, we will delete all the families coming as part of the scan result, so add the N families to the Scan. Yes, I will add that test case also.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477156#comment-13477156 ] Ted Yu commented on HBASE-6942: --- bq. as part of the scan result List<KeyValue> is returned from scanner.next(). It would be easier to understand the user's intention through Scan.getFamilyMap() instead of analyzing the scan result.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477174#comment-13477174 ] Lars Hofhansl commented on HBASE-6942: -- I don't necessarily agree here, Ted. Analyzing the scan result is the whole point of this jira. Passing a template family map will not make this easier for a user (IMHO).
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477177#comment-13477177 ] Ted Yu commented on HBASE-6942: --- bq. Just worried that N will necessarily be (much?) larger than M. How do we correlate scan result with which column families to delete?
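The "delete driven by scan results" idea discussed above can be sketched outside HBase. The following is a hypothetical toy model (all class and method names are made up, and a plain nested map stands in for a region), not the HBASE-6942 endpoint itself; it only illustrates how the families that appeared in the scan result decide what gets deleted when the type is FAMILY versus ROW.

```java
import java.util.*;

/** Toy model of a server-side bulk delete driven by scan results. */
public class BulkDeleteSketch {
    public enum DeleteType { ROW, FAMILY }

    /**
     * Deletes from 'table' (row -> family -> qualifier -> value) based on the
     * scan result (row -> families that came back for that row). For ROW the
     * whole row goes; for FAMILY, only the families seen in the scan result.
     * Returns the number of rows (ROW) or families (FAMILY) deleted.
     */
    public static int bulkDelete(Map<String, Map<String, Map<String, String>>> table,
                                 Map<String, Set<String>> scanResult,
                                 DeleteType type) {
        int deleted = 0;
        for (Map.Entry<String, Set<String>> e : scanResult.entrySet()) {
            String row = e.getKey();
            if (!table.containsKey(row)) continue;
            if (type == DeleteType.ROW) {
                table.remove(row);
                deleted++;
            } else {
                Map<String, Map<String, String>> fams = table.get(row);
                for (String fam : e.getValue()) {
                    if (fams.remove(fam) != null) deleted++;
                }
                if (fams.isEmpty()) table.remove(row); // row with no families left
            }
        }
        return deleted;
    }
}
```

Running the whole thing next to the scanner avoids the round trip the issue description mentions: the client never has to pull row keys back just to send Deletes.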
[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores
[ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477194#comment-13477194 ] Karthik Ranganathan commented on HBASE-6980: @ramakrishna - this should not be necessary for ensuring no data loss, right? Once we have a snapshot memstore, we automatically should know the max seq id to which it has data - that would never change. 1. From what I remember of the code (when I was looking into something unrelated), we track the *min* seq id from the current memstore instead of the max seq id from the snapshot memstore to put into the HLog when it's rolled after a flush. So this synchronization becomes necessary - if we store the max seq id along with the memstore that is flushed, we should be able to eliminate the locks. 2. Also, it's arguable whether we need the absolutely correct max-seq-id flushed. In a very small % of cases, we would end up rolling logs a bit slower. As long as we are conservative with updating the max seq id in the HLog we should be good, right? Parallel Flushing Of Memstores -- Key: HBASE-6980 URL: https://issues.apache.org/jira/browse/HBASE-6980 Project: HBase Issue Type: New Feature Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are basically not set up to take advantage of the aggregate throughput that multi-disk nodes provide. * For puts with WAL enabled, the bottleneck is more likely the single WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA: HBASE-6981.) * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.
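Karthik's point 1 above - fix the max seq id at snapshot time so log rolling needs no synchronization with the live memstore - can be sketched as a toy model. Everything below is a made-up simplification (a TreeMap stands in for the memstore, clearing it stands in for writing the flush file), not the HBase implementation:

```java
import java.util.*;

/** Toy sketch: record the max seq id of the snapshotted memstore at flush
 *  time; that value never changes, so the WAL can be truncated up to it
 *  without locking the live memstore. */
public class SeqIdTracker {
    private final TreeMap<Long, String> memstore = new TreeMap<>(); // seqId -> edit
    private long lastFlushedSeqId = -1;

    public void put(long seqId, String edit) { memstore.put(seqId, edit); }

    /** Snapshot and "flush"; returns the snapshot's max seq id, fixed at
     *  snapshot time. Once the snapshot is on disk, every edit with a seq id
     *  at or below this value is durable. */
    public long flush() {
        if (memstore.isEmpty()) return lastFlushedSeqId;
        long maxSeqId = memstore.lastKey(); // determined once, never revised
        memstore.clear();                   // stands in for writing the HFile
        lastFlushedSeqId = maxSeqId;
        return maxSeqId;
    }

    /** The WAL may safely drop entries up to and including this id. */
    public long safeToTruncateUpTo() { return lastFlushedSeqId; }
}
```

New puts arriving after the snapshot get higher seq ids and simply wait for the next flush, which is why the value reported for the previous flush never has to change.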
[jira] [Commented] (HBASE-6949) Automatically delete empty directories in CleanerChore
[ https://issues.apache.org/jira/browse/HBASE-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477207#comment-13477207 ] Jesse Yates commented on HBASE-6949: [~stack], [~lhofhansl] what do you guys think? Good to go? Automatically delete empty directories in CleanerChore -- Key: HBASE-6949 URL: https://issues.apache.org/jira/browse/HBASE-6949 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.3, 0.96.0 Attachments: hbase-6949-v0.patch, hbase-6949-v1.patch Currently the CleanerChore asks cleaner delegates whether both directories and files should be deleted. However, this leads to somewhat odd behavior in some delegates - you don't actually care whether the directory hierarchy is preserved, just the files; this means you always delete directories and then implement the logic you actually want for preserving files. Instead we can handle this logic one layer higher in the CleanerChore and let the delegates just worry about preserving files.
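The proposed split of responsibilities - delegates judge only files, the chore itself removes directories that end up empty - can be sketched over an in-memory tree. This is a hypothetical illustration (the real chore walks HDFS paths, and the names below are invented), not the HBASE-6949 patch:

```java
import java.util.*;
import java.util.function.Predicate;

/** Toy CleanerChore: the delegate predicate only judges files; the chore
 *  itself deletes any directory that drains empty. */
public class CleanerChoreSketch {
    /** A directory is a Map<String,Object>; any non-Map value is a file.
     *  Cleans 'dir' in place; returns true if it ended up empty (and would
     *  therefore be removed by the chore). */
    @SuppressWarnings("unchecked")
    public static boolean clean(Map<String, Object> dir, Predicate<String> deletableFile) {
        Iterator<Map.Entry<String, Object>> it = dir.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> e = it.next();
            if (e.getValue() instanceof Map) {
                // Recurse first; drop the subdirectory only if it drained empty.
                if (clean((Map<String, Object>) e.getValue(), deletableFile)) it.remove();
            } else if (deletableFile.test(e.getKey())) {
                it.remove(); // delegate approved deleting this file
            }
        }
        return dir.isEmpty();
    }
}
```

The delegate never sees a directory, so it cannot accidentally keep an empty one alive - which is exactly the odd behavior the issue describes.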
[jira] [Updated] (HBASE-6797) TestHFileCleaner#testHFileCleaning sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HBASE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-6797: --- Resolution: Fixed Status: Resolved (was: Patch Available) TestHFileCleaner#testHFileCleaning sometimes fails in trunk --- Key: HBASE-6797 URL: https://issues.apache.org/jira/browse/HBASE-6797 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Attachments: hbase-6797-v0.patch, hbase-6797-v1.patch In build #3334, I saw:
{code}
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner.testHFileCleaning(TestHFileCleaner.java:88)
{code}
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6858: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks all for the review. Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Priority: Critical Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch, HBASE-6858_v3.patch, trunk-6858.patch, trunk-6858_v2.patch, trunk-6858_v3.patch Thanks to Stack and Kaka for reporting that there is a bug in the recoverable zookeeper when handling a BADVERSION exception for setData(). It shall compare the ID payload of the data in zk with its own identifier.
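The ID-payload check described above can be sketched as a toy model. The class and method names below are illustrative, not the real RecoverableZooKeeper API: each write tags the data with the writer's id, so when a retried setData() hits BADVERSION we can tell whether the conflicting write was in fact our own earlier attempt (whose ack was lost) rather than a genuine conflict:

```java
import java.util.Objects;

/** Toy model of RecoverableZooKeeper's BADVERSION handling for setData(). */
public class RecoverableSetDataSketch {
    // The znode's state: who wrote it last, what they wrote, version counter.
    String storedId; String storedData; int version;

    /** Returns true if the write is (or already was) applied. */
    boolean setDataWithRetry(String myId, String data, int expectedVersion) {
        if (version != expectedVersion) {
            // BADVERSION: was the conflicting write actually our own retry?
            return Objects.equals(storedId, myId) && Objects.equals(storedData, data);
        }
        storedId = myId; storedData = data; version++;
        return true;
    }
}
```

Without the identity comparison, a retry after a lost ack would be indistinguishable from a conflicting write by another client - which is the bug the issue fixes.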
[jira] [Resolved] (HBASE-6986) Reenable TestClientTimeouts for security build
[ https://issues.apache.org/jira/browse/HBASE-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan resolved HBASE-6986. --- Resolution: Won't Fix Marking as Won't Fix. Getting it to work with both the secure and non-secure builds is difficult. The issue is you don't seem to be able to change the invocation handler for a proxy once it's been set. I want to set my own handler and dispatch through the actual invocation handler for the RpcEngine, but I don't know how to create the InvocationHandler for an arbitrary RpcEngine. I can maintain a mapping for each type of RpcEngine, but that code ended up looking pretty ugly. I still think having an RpcEngine that throws random SocketTimeoutExceptions is useful for testing, but I'll investigate doing it only on trunk via HBASE-6987. Reenable TestClientTimeouts for security build -- Key: HBASE-6986 URL: https://issues.apache.org/jira/browse/HBASE-6986 Project: HBase Issue Type: Sub-task Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.94.3 TestClientTimeouts was disabled to get 0.94.2 out the door because it didn't work in the security build. Investigate and reenable.
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477306#comment-13477306 ] Hudson commented on HBASE-6858: --- Integrated in HBase-TRUNK #3450 (See [https://builds.apache.org/job/HBase-TRUNK/3450/]) HBASE-6858 Fix the incorrect BADVERSION checking in the recoverable zookeeper (Revision 1398920) Result = FAILURE jxiang : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
[jira] [Commented] (HBASE-6894) Adding metadata to a table in the shell is both arcane and painful
[ https://issues.apache.org/jira/browse/HBASE-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477352#comment-13477352 ] Sergey Shelukhin commented on HBASE-6894: - ping? :) Adding metadata to a table in the shell is both arcane and painful -- Key: HBASE-6894 URL: https://issues.apache.org/jira/browse/HBASE-6894 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.96.0 Reporter: stack Assignee: Sergey Shelukhin Labels: noob Attachments: HBASE-6894.patch, HBASE-6894.patch, HBASE-6894.patch In production we have hundreds of tables w/ whack names like 'aliaserv', 'ashish_bulk', 'age_gender_topics', etc. It'd be grand if you could look in the master UI and see stuff like owner, eng group responsible, miscellaneous description, etc. Now, HTD has support for this; each carries a dictionary. What's a PITA though is adding attributes to the dictionary. Here is what seems to work on trunk (though I do not trust it is doing the right thing):
{code}
hbase> create 'SOME_TABLENAME', {NAME => 'd', VERSION => 1, COMPRESSION => 'LZO'}
hbase> # Here is how I added metadata
hbase> disable 'SOME_TABLENAME'
hbase> alter 'SOME_TABLENAME', METHOD => 'table_att', OWNER => 'SOMEON', CONFIG => {'ENVIRONMENT' => 'BLAH BLAH', 'SIZING' => 'The size should be between 0-10K most of the time with new URLs coming in and getting removed as they are processed unless the pipeline has fallen behind', 'MISCELLANEOUS' => 'Holds the list of URLs waiting to be processed in the parked page detection analyzer in ingestion pipeline.'}
... describe... enable...
{code}
The above doesn't work in 0.94. It complains about CONFIG, the keyword we are using for the HTD dictionary. It works in 0.96, though I'd have to poke around some more to ensure it is doing the right thing. But this METHOD => 'table_att' stuff is really ugly - can we fix it? And I can't add table attributes on table create, seemingly. A little bit of thought and a bit of ruby could clean this all up.
[jira] [Commented] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
[ https://issues.apache.org/jira/browse/HBASE-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477358#comment-13477358 ] Sergey Shelukhin commented on HBASE-6583: - as in, it creates columns if not present... although to actually use this functionality some other change would need to be made - I noticed columns are hardcoded Enhance Hbase load test tool to automatically create cf's if not present Key: HBASE-6583 URL: https://issues.apache.org/jira/browse/HBASE-6583 Project: HBase Issue Type: Bug Components: test Reporter: Karthik Ranganathan Assignee: Sergey Shelukhin Labels: noob Attachments: HBASE-6583.patch, HBASE-6583.patch The load test tool currently disables the table and applies any changes to the cf descriptor if any, but does not create the cf if not present.
[jira] [Commented] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
[ https://issues.apache.org/jira/browse/HBASE-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477357#comment-13477357 ] Sergey Shelukhin commented on HBASE-6583: - verified it actually works
[jira] [Updated] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6974: - Attachment: HBASE-6974.patch First shot at this. Metric for blocked updates -- Key: HBASE-6974 URL: https://issues.apache.org/jira/browse/HBASE-6974 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Michael Drzal Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6974.patch When the disc subsystem cannot keep up with a sustained high write load, a region will eventually block updates to throttle clients. (HRegion.checkResources). It would be nice to have a metric for this, so that these occurrences can be tracked.
[jira] [Updated] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6974: - Status: Patch Available (was: In Progress)
[jira] [Reopened] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-6577: -- RegionScannerImpl.nextRow() should seek to next row --- Key: HBASE-6577 URL: https://issues.apache.org/jira/browse/HBASE-6577 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt RegionScannerImpl.nextRow() is called when a filter filters the entire row. In that case we should seek to the next row rather than iterating over all versions of all columns to get there.
[jira] [Commented] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477379#comment-13477379 ] Lars Hofhansl commented on HBASE-6577: -- This just came up on the mailing list again:
{code}
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
	- locked 0x00059584fab8 (a org.apache.hadoop.hbase.regionserver.StoreScanner)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
	- locked 0x00059584fab8 (a org.apache.hadoop.hbase.regionserver.StoreScanner)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
	at ...
{code}
zahoor mentioned there that his KVs have very many versions (1500+). Presumably each new column (likely) starts on a new (HBase) block because of the many versions, which is why we see a lot of seeking. I wonder whether a solution like the following would work: In HRegionScannerImpl.nextRow(...) we try the current naive iteration for N KVs (let's say 100). If by then we have not reached the next row, we'll issue a direct seek. That way if there are few versions we avoid unnecessary seeks, but with many versions we can seek past a lot of KVs (and thus also avoid unnecessary seeks). I can make a patch for that.
[~jdcryans] Would you be able to recreate the issue you saw with the initial version of this patch in production?
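The hybrid heuristic Lars proposes (iterate up to N KVs, then fall back to a direct seek) can be sketched over a plain list. This toy model is hypothetical (rows are just strings standing in for KeyValues, and `nextRow` below is not the HBase method), but it shows the shape of the idea:

```java
import java.util.*;

/** Toy sketch of the hybrid nextRow(): skip at most maxIterations KVs by
 *  plain iteration, then issue one direct "seek" to the next row. */
public class NextRowSketch {
    /** rowOfKv holds the row of each KV in scan order. Returns
     *  {index of first KV of the next row, 1 if a seek was used else 0}. */
    public static int[] nextRow(List<String> rowOfKv, int start, int maxIterations) {
        String current = rowOfKv.get(start);
        int i = start;
        for (int steps = 0; steps < maxIterations; steps++) {
            i++;
            if (i >= rowOfKv.size() || !rowOfKv.get(i).equals(current)) {
                return new int[]{i, 0}; // reached the next row cheaply
            }
        }
        // Many versions: one seek jumps past the remaining KVs of this row.
        int j = i;
        while (j < rowOfKv.size() && rowOfKv.get(j).equals(current)) j++;
        return new int[]{j, 1};
    }
}
```

With few versions per row the loop exits before the budget is spent and no seek is issued; with 1500+ versions the single seek replaces hundreds of per-KV iterations, matching the trade-off described in the comment.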
[jira] [Updated] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6577: - Fix Version/s: 0.96.0 0.94.3
[jira] [Commented] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477397#comment-13477397 ] Lars Hofhansl commented on HBASE-6974: -- Looks good. Few minor comments: * I think you snuck a divider by 1024 in there to convert from ms to s :) * We should also collect another metric when this situation happens in the memstore flusher (here it happens because of global memory pressure) * Let's use EnvironmentEdge.currentTimeMillis() * Nit: a call to currentTimeMillis is not free, we should only call it in the !blocked part inside the while loop (which means it cannot be final, has to be initialized with 0, etc)
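Lars's review points can be sketched together in one toy metric class. This is a hypothetical illustration, not the HBASE-6974 patch: divide by 1000 (not 1024) for seconds, take the clock from an injectable source (standing in for EnvironmentEdge), and pay for the clock call only when a block actually happens:

```java
import java.util.function.LongSupplier;

/** Toy blocked-updates metric reflecting the review feedback above. */
public class BlockedUpdatesMetricSketch {
    long blockedRequests = 0;
    long totalBlockedMs = 0;

    /** Simulates one checkResources()-style wait loop; each element of
     *  'blockedPolls' is one poll of the "are we blocked?" condition. */
    public void recordWait(boolean[] blockedPolls, LongSupplier clockMs) {
        long startMs = 0;           // not final: only set if we ever block
        boolean blocked = false;
        for (boolean isBlocked : blockedPolls) {
            if (isBlocked && !blocked) {
                blocked = true;
                blockedRequests++;
                startMs = clockMs.getAsLong(); // the non-free call happens only here
            }
        }
        if (blocked) totalBlockedMs += clockMs.getAsLong() - startMs;
    }

    public long blockedSeconds() { return totalBlockedMs / 1000; } // ms -> s, not 1024
}
```

Requests that never block skip the clock entirely, which is the point of the "not free" nit.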
[jira] [Updated] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6577: - Status: Patch Available (was: Reopened)
[jira] [Updated] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6577: - Attachment: 6577-v3.txt Something like this. The 100 should probably be configurable. This should take care of the case of a few versions and the case of very many versions. ... let me know what you think.
[jira] [Commented] (HBASE-6974) Metric for blocked updates
[ https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477416#comment-13477416 ] Hadoop QA commented on HBASE-6974: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549387/HBASE-6974.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3058//console This message is automatically generated.
[jira] [Commented] (HBASE-5355) Compressed RPC's for HBase
[ https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477428#comment-13477428 ] Devaraj Das commented on HBASE-5355: [~lhofhansl], do you think it is okay to commit the patch since this can be configured to be off anyway? From the comments on this jira and from the Facebook reviewboard, it seems like Facebook folks have stood to gain from this feature - https://reviews.facebook.net/D1671#summary (and hence this could help other similar deployments too). What do you think? Compressed RPC's for HBase -- Key: HBASE-5355 URL: https://issues.apache.org/jira/browse/HBASE-5355 Project: HBase Issue Type: Improvement Components: IPC/RPC Affects Versions: 0.89.20100924 Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBASE-5355-0.94.patch Some applications need the ability to do large batched writes and reads from a remote MR cluster. These eventually get bottlenecked on the network. These results are also pretty compressible sometimes. The aim here is to add the ability to do compressed calls to the server on both the send and receive paths.
[jira] [Commented] (HBASE-6410) Move RegionServer Metrics to metrics2
[ https://issues.apache.org/jira/browse/HBASE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477455#comment-13477455 ] Elliott Clark commented on HBASE-6410: -- https://reviews.apache.org/r/7616/ Move RegionServer Metrics to metrics2 - Key: HBASE-6410 URL: https://issues.apache.org/jira/browse/HBASE-6410 Project: HBase Issue Type: Sub-task Components: metrics Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6410-1.patch, HBASE-6410-2.patch, HBASE-6410.patch Move RegionServer Metrics to metrics2
[jira] [Commented] (HBASE-5355) Compressed RPC's for HBase
[ https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477456#comment-13477456 ] Lars Hofhansl commented on HBASE-5355: -- Would need to digest the patch some more, but I do not see any principled reason against it. Would need to support the SecureRpcEngine too. Also it would be nice to get some numbers about how much latency is increased.
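To make the bandwidth-versus-latency trade-off above concrete: this is not the HBASE-5355 patch, just a small standalone illustration of why compressed RPC helps a batched multi-put, whose payload is typically very repetitive and so gzips extremely well (at the CPU/latency cost Lars asks about):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

/** Standalone demo: gzip a repetitive batched-put-style payload. */
public class RpcCompressionSketch {
    public static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(payload); // closing the stream flushes the gzip trailer
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
        return bos.toByteArray();
    }
}
```

A payload of many near-identical row/cf/qualifier strings typically shrinks by more than an order of magnitude, which is the network saving the feature is after; the compression time itself is the latency overhead worth measuring.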
[jira] [Commented] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477458#comment-13477458 ] Hadoop QA commented on HBASE-6577: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549397/6577-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3059//console This message is automatically generated. RegionScannerImpl.nextRow() should seek to next row --- Key: HBASE-6577 URL: https://issues.apache.org/jira/browse/HBASE-6577 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.3, 0.96.0 Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt, 6577-v3.txt RegionScannerImpl.nextRow() is called when a filter filters the entire row. In that case we should seek to the next row rather than iterating over all versions of all columns to get there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5355) Compressed RPC's for HBase
[ https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477465#comment-13477465 ] Devaraj Das commented on HBASE-5355: Thanks, [~lhofhansl], I have done the required work for making it work in trunk (via HBASE-6966). As far as I am concerned, I'd like to get the patch into 0.96. I'll try to get some latency numbers soon using that patch. Compressed RPC's for HBase -- Key: HBASE-5355 URL: https://issues.apache.org/jira/browse/HBASE-5355 Project: HBase Issue Type: Improvement Components: IPC/RPC Affects Versions: 0.89.20100924 Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBASE-5355-0.94.patch Some applications need the ability to do large batched writes and reads from a remote MR cluster. These eventually get bottlenecked on the network. These payloads are also often quite compressible. The aim here is to add the ability to do compressed calls to the server on both the send and receive paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6577: - Fix Version/s: (was: 0.94.3) I tried to reproduce the issue reported on the mailing list (using a PrefixFilter), but I couldn't. Probably too risky at this point for 0.94. RegionScannerImpl.nextRow() should seek to next row --- Key: HBASE-6577 URL: https://issues.apache.org/jira/browse/HBASE-6577 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0 Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt, 6577-v3.txt RegionScannerImpl.nextRow() is called when a filter filters the entire row. In that case we should seek to the next row rather than iterating over all versions of all columns to get there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477494#comment-13477494 ] Hudson commented on HBASE-6858: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #223 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/223/]) HBASE-6858 Fix the incorrect BADVERSION checking in the recoverable zookeeper (Revision 1398920) Result = FAILURE jxiang : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Priority: Critical Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch, HBASE-6858_v3.patch, trunk-6858.patch, trunk-6858_v2.patch, trunk-6858_v3.patch Thanks for Stack and Kaka's reporting that there is a bug in the recoverable zookeeper when handling BADVERSION exception for setData(). It shall compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6999) Start/end row should be configurable in TableInputFormat
Mikhail Bautin created HBASE-6999: - Summary: Start/end row should be configurable in TableInputFormat Key: HBASE-6999 URL: https://issues.apache.org/jira/browse/HBASE-6999 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6815) [WINDOWS] Provide hbase scripts in order to start HBASE on Windows in a single user mode
[ https://issues.apache.org/jira/browse/HBASE-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6815: - Assignee: Slavik Krassovsky [WINDOWS] Provide hbase scripts in order to start HBASE on Windows in a single user mode Key: HBASE-6815 URL: https://issues.apache.org/jira/browse/HBASE-6815 Project: HBase Issue Type: Sub-task Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Slavik Krassovsky Provide .cmd scripts in order to start HBASE on Windows in a single user mode -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6793) Make hbase-examples module
[ https://issues.apache.org/jira/browse/HBASE-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-6793: Attachment: HBASE-6793.patch Here's patch #1. I'd prefer to move the remaining examples into the module as a next step, because this is already too big. Or I can keep working on the same patch, but if something intervenes there's a risk of this sitting around in JIRA :) The patch: 1) Creates the module. 2) Ports the mapreduce examples as is. 3) Ports, adds thrift-generated code, and touches up the thrift (not thrift2 yet) examples: a) The Java example builds and runs out of the box. b) The Perl/PHP/Ruby/Python examples run, but some of them are out of date: they bail out when something that should produce an error doesn't. I found some old Jira-s to fix that for Java and CPP; these should be updated similarly. I think this should also be a separate Jira (or -s). c) The CPP example cannot be built in mvn in the absence of native thrift/boost, so it's copied in the sources and the user can set up the above and run `make`. Make hbase-examples module -- Key: HBASE-6793 URL: https://issues.apache.org/jira/browse/HBASE-6793 Project: HBase Issue Type: Improvement Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Sergey Shelukhin Labels: noob Attachments: HBASE-6793.patch There are some examples under /examples/, which are not compiled as a part of the build. We can move them to an hbase-examples module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6793) Make hbase-examples module
[ https://issues.apache.org/jira/browse/HBASE-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477531#comment-13477531 ] Sergey Shelukhin commented on HBASE-6793: - https://reviews.apache.org/r/7626/ Make hbase-examples module -- Key: HBASE-6793 URL: https://issues.apache.org/jira/browse/HBASE-6793 Project: HBase Issue Type: Improvement Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Sergey Shelukhin Labels: noob Attachments: HBASE-6793.patch There are some examples under /examples/, which are not compiled as a part of the build. We can move them to an hbase-examples module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4962) Optimize time range scans using a delete Bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin resolved HBASE-4962. --- Resolution: Duplicate Optimize time range scans using a delete Bloom filter - Key: HBASE-4962 URL: https://issues.apache.org/jira/browse/HBASE-4962 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Pritam Damania Priority: Minor To speed up time range scans we need to seek to the maximum timestamp of the requested range,instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5032) Add other DELETE type information into the delete bloom filter to optimize the time range query
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5032: -- Assignee: Adela Maznikar (was: Liyin Tang) Add other DELETE type information into the delete bloom filter to optimize the time range query --- Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Adela Maznikar To speed up time range scans we need to seek to the maximum timestamp of the requested range,instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation prospective, we have already had a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information such as delete columns or delete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
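The seek shortcut the description outlines can be modeled in a few lines. This is a toy sketch, not HBase code: the filter below is an exact set standing in for a Bloom filter keyed on (row, column), and chooseSeek merely returns a label describing which seek would be issued. The safety argument is the same either way: a Bloom filter can yield false positives (costing only the conservative seek) but never false negatives, so the timerange_max shortcut is only taken when no delete marker can exist.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for a per-StoreFile delete Bloom filter keyed on (row, column).
public class DeleteBloomSketch {
    private final Set<String> maybeDeleted = new HashSet<>();

    // Called while writing the StoreFile, once per delete marker.
    void recordDelete(String row, String column) {
        maybeDeleted.add(row + "/" + column);
    }

    // Returns true only when a DeleteColumn marker may exist for (row, column).
    boolean mightContainDelete(String row, String column) {
        return maybeDeleted.contains(row + "/" + column);
    }

    // Decide where a time-range scan may seek to for this (row, column).
    String chooseSeek(String row, String column, long timerangeMax) {
        if (mightContainDelete(row, column)) {
            // A newer delete marker may shadow the range: fall back to the
            // conservative seek at the first KV of (row, column).
            return "seek(" + row + "," + column + ",LATEST)";
        }
        // No deletes possible: jump straight to the top of the requested range.
        return "seek(" + row + "," + column + "," + timerangeMax + ")";
    }

    public static void main(String[] args) {
        DeleteBloomSketch bloom = new DeleteBloomSketch();
        bloom.recordDelete("r1", "c1");
        System.out.println(bloom.chooseSeek("r1", "c1", 100L));
        System.out.println(bloom.chooseSeek("r2", "c1", 100L));
    }
}
```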
[jira] [Commented] (HBASE-6793) Make hbase-examples module
[ https://issues.apache.org/jira/browse/HBASE-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477545#comment-13477545 ] Jesse Yates commented on HBASE-6793: [~sershe] good stuff sergey! its definitely a massive patch - feel free to file follow-ons for the rest of the stuff. Make hbase-examples module -- Key: HBASE-6793 URL: https://issues.apache.org/jira/browse/HBASE-6793 Project: HBase Issue Type: Improvement Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Sergey Shelukhin Labels: noob Attachments: HBASE-6793.patch There are some examples under /examples/, which are not compiled as a part of the build. We can move them to an hbase-examples module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6983) Metric for unencoded size of cached blocks
[ https://issues.apache.org/jira/browse/HBASE-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-6983: --- Attachment: D5979.1.patch mbautin requested code review of [jira] [HBASE-6983] [89-fb] Metric for unencoded size of cached blocks. Reviewers: Kannan, Karthik, Liyin, aaiyer, mcorgan, JIRA We need to measure the amount of unencoded data in the block cache when data block encoding is enabled. TEST PLAN Unit tests REVISION DETAIL https://reviews.facebook.net/D5979 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java src/main/java/org/apache/hadoop/hbase/io/hfile/CachedBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/regionserver/EncodedSeekPerformanceTest.java src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaMetrics.java To: JIRA Metric for unencoded size of cached blocks -- Key: HBASE-6983 URL: https://issues.apache.org/jira/browse/HBASE-6983 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Attachments: D5979.1.patch We need to measure the amount of unencoded data in the block cache when data block encoding is enabled. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477554#comment-13477554 ] Anoop Sam John commented on HBASE-6942: --- When the user says he wants to delete families cf1 and cf2 (with or without passing a TS), the user needs to create the Scan object appropriately and include cf1 and cf2 in the Scan. Now from the KVs we can create the Delete object {code} case FAMILY: Set<byte[]> families = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR); for (KeyValue kv : deleteRow) { if (families.add(kv.getFamily())) { delete.deleteFamily(kv.getFamily(), ts); } } break; {code} This adds the family of every KV into the Delete; a set is used to avoid duplicate deleteFamily calls. Am I making this clear, Ted? Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477558#comment-13477558 ] Anoop Sam John commented on HBASE-6942: --- One more question: has anyone tried passing an enum type via the CP Endpoints? I think it won't work. I was checking why, and found it comes down to the code in HbaseObjectWritable. In the kernel code only one enum is passed across the wire, I think, i.e. RegionOpeningState, and this one is specifically added to the CODE_TO_CLASS and CLASS_TO_CODE maps in HbaseObjectWritable. Is it a bug we need to address? Or do we state somewhere that enums can not be used? Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477566#comment-13477566 ] Lars Hofhansl commented on HBASE-6942: -- You can send the enum's ordinal number across. Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
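Lars's suggestion — sending the ordinal instead of the enum itself — looks like this in outline. DeleteType here is a hypothetical endpoint parameter, not an actual HBase class; the point is only that an int round-trips through the serialization layer where the enum would not:

```java
// Sketch of passing an enum across a CP endpoint as its ordinal.
// DeleteType is illustrative; any enum works the same way.
public class EnumOrdinalSketch {
    enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

    // Client side: the endpoint argument is just an int.
    static int encode(DeleteType t) {
        return t.ordinal();
    }

    // Server side: reconstruct the enum constant from the ordinal.
    static DeleteType decode(int ordinal) {
        return DeleteType.values()[ordinal];
    }

    public static void main(String[] args) {
        int wire = encode(DeleteType.FAMILY);
        System.out.println(wire);           // prints 1
        System.out.println(decode(wire));   // prints FAMILY
    }
}
```

One caveat worth noting: ordinals are positional, so reordering or inserting enum constants silently changes the wire value; keeping the constant order stable (or sending the name instead) avoids that.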
[jira] [Updated] (HBASE-5032) Add other DELETE type information into the delete bloom filter to optimize the time range query
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-5032: - Description: To speed up time range scans we need to seek to the maximum timestamp of the requested range,instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we have already had a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information such as delete columns or delete. was: To speed up time range scans we need to seek to the maximum timestamp of the requested range,instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. 
(From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation prospective, we have already had a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information such as delete columns or delete. Add other DELETE type information into the delete bloom filter to optimize the time range query --- Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Adela Maznikar To speed up time range scans we need to seek to the maximum timestamp of the requested range,instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we have already had a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information such as delete columns or delete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477567#comment-13477567 ] Ted Yu commented on HBASE-6942: --- Do we support delete family cf1 and delete column qualifier cq2 of family cf2 in one request ? Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477568#comment-13477568 ] Anoop Sam John commented on HBASE-6942: --- bq.Do we support delete family cf1 and delete column qualifier cq2 of family cf2 in one request ? No Ted.. In this we can not do that... Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477570#comment-13477570 ] Anoop Sam John commented on HBASE-6942: --- See my previous comment https://issues.apache.org/jira/browse/HBASE-6942?focusedCommentId=13476126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13476126 Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477571#comment-13477571 ] Anoop Sam John commented on HBASE-6942: --- bq. You can send the enum's ordinal number across. Yes Lars. But then we can not accept enum types as a parameter in the CP Endpoint, so the user also needs to pass the ordinal (as we don't have a client side wrapper API to call this Endpoint as of now). I shall do it like that now. Endpoint implementation for bulk delete rows Key: HBASE-6942 URL: https://issues.apache.org/jira/browse/HBASE-6942 Project: HBase Issue Type: Improvement Components: Coprocessors, Performance Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys. Query like delete from table1 where... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6986) Reenable TestClientTimeouts for security build
[ https://issues.apache.org/jira/browse/HBASE-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477572#comment-13477572 ] Lars Hofhansl commented on HBASE-6986: -- +1 on punting here. We do not want to introduce more fragility into HBase just so it can be tested. Maybe this is a case for a more powerful mocking framework so that the RPC engines can be mocked with failure insertion. Reenable TestClientTimeouts for security build -- Key: HBASE-6986 URL: https://issues.apache.org/jira/browse/HBASE-6986 Project: HBase Issue Type: Sub-task Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.94.3 TestClientTimeouts was disabled to get 0.94.2 out the door because it didn't work in the security build. Investigate and reenable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477573#comment-13477573 ] Ted Yu commented on HBASE-6942: --- If we pass Scan and Delete to the endpoint, we can handle arbitrary deletion requests. @Anoop: I clicked on the link above - it is not obvious which comment you were referring to. Please refer to a comment by its time. Thanks
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477574#comment-13477574 ] Lars Hofhansl commented on HBASE-6942: -- Alternatively we use four integer constants, or do what you had suggested earlier and pass a Delete template object (although I still think that would be confusing). Since these are endpoints, it is also possible to just have 4 different endpoints that share some methods between them.
[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores
[ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477575#comment-13477575 ] Kannan Muthukkaruppan commented on HBASE-6980: -- Ramakrishna, Thanks for your email. #1. It is not clear why we even write a META entry for flushes... {code} private WALEdit completeCacheFlushLogEdit() { KeyValue kv = new KeyValue(METAROW, METAFAMILY, null, System.currentTimeMillis(), COMPLETE_CACHE_FLUSH); WALEdit e = new WALEdit(); e.add(kv); return e; } {code} The replayRecoveredEdits() logic skips over these entries anyway. And the only reference I see for this special entry in HLog is in unit tests. #2. Yes, currently there are a lot of comments (related to lastSeqWritten) before the function HLog.java:startCacheFlush(), but the logic is not very clear to me. The changes were committed as part of HBASE-3845. I think we should be able to simplify that logic. I think I see some potential bugs there even as it stands now -- will need to spend some more time looking at this, and will write down an update here. But bottom line, I still don't see any good fundamental reason we need to hold this lock for the duration of the entire flush (even given the lastSeqWritten map logic). Parallel Flushing Of Memstores -- Key: HBASE-6980 URL: https://issues.apache.org/jira/browse/HBASE-6980 Project: HBase Issue Type: New Feature Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan For write-dominated workloads, single-threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are basically not set up to take advantage of the aggregate throughput that multi-disk nodes provide. * For puts with WAL enabled, the bottleneck is more likely the single WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA -- HBASE-6981).
* But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores.
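The parallel-flush idea above can be sketched with a plain thread pool. This is an illustration only, not HBase's actual MemStoreFlusher; the class and method names are invented:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: a pool of flusher threads so several memstores can be
// written out concurrently instead of queuing behind a single thread.
public class ParallelFlusher {
    private final ExecutorService pool;

    public ParallelFlusher(int flushThreads) {
        // roughly one thread per disk we want to keep busy
        this.pool = Executors.newFixedThreadPool(flushThreads);
    }

    // Run each region's flush as an independent task, wait for all of
    // them, and return the total bytes flushed (each Callable returns
    // the size it wrote). Returns the partial total on interruption.
    public long flushAllAndWait(List<Callable<Long>> regionFlushes) {
        long total = 0;
        try {
            for (Future<Long> f : pool.invokeAll(regionFlushes)) {
                total += f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

With WAL disabled the memstore-to-HFile write is the only disk work per put, which is why fanning flushes out across threads (and disks) can lift the ingest ceiling.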
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477579#comment-13477579 ] Anoop Sam John commented on HBASE-6942: --- Lars, it will be better to pass the type constants (int constants).
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477578#comment-13477578 ] Anoop Sam John commented on HBASE-6942: --- Ted, some drawbacks due to not taking a Delete object: 1. When it is a timestamp-based delete, the same TS has to be used for all the columns, whereas in a normal delete different TSs can be used. 2. Types cannot be mixed. In a normal Delete, one CF's delete, another's column delete, and yet another's version delete can be combined.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477581#comment-13477581 ] Lars Hofhansl commented on HBASE-6942: -- Yes. I *really* do not want to make this more complicated than it is. If somebody wants to delete a couple of column families and a couple of columns, it can be done with multiple roundtrips. Now, if the code can be simplified by passing a Delete object, then we should do that.
[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows
[ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477583#comment-13477583 ] Anoop Sam John commented on HBASE-6942: --- I will make a patch based on the delete template also, so it will be easy to compare. I will make those today. Sorry, was busy with meetings yesterday.
[jira] [Commented] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477600#comment-13477600 ] Anoop Sam John commented on HBASE-6577: --- bq. In HRegionScannerImpl.nextRow(...) we try the current naive iteration for N KVs (let's say 100). If by then we have not reached the next row, we'll issue a direct seek. That way if there are few versions we avoid unnecessary seeks Lars, with HBASE-6032 in trunk, will it be a problem with calls to seek? I hope with this change a seek within the same block will not have overhead, so maybe we do not need the configurable KV number (like 100). Please correct me if my understanding is wrong. RegionScannerImpl.nextRow() should seek to next row --- Key: HBASE-6577 URL: https://issues.apache.org/jira/browse/HBASE-6577 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0 Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt, 6577-v3.txt RegionScannerImpl.nextRow() is called when a filter filters the entire row. In that case we should seek to the next row rather than iterating over all versions of all columns to get there.
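The iterate-then-seek heuristic quoted above (naive iteration for up to N KVs, then a direct seek) might look roughly like this. `ListScanner` is a toy in-memory stand-in over row keys, not RegionScannerImpl:

```java
import java.util.List;

// Sketch of the heuristic only; a real scanner iterates KeyValues and
// seeks via the HFile index, none of which is modeled here.
public class NextRowHeuristic {
    // Minimal in-memory "scanner" over a sorted list of row keys.
    static class ListScanner {
        final List<String> rows;
        int pos = 0;
        int seeks = 0; // counts explicit seeks, to show when we fall back

        ListScanner(List<String> rows) { this.rows = rows; }

        String peekRow() { return pos < rows.size() ? rows.get(pos) : null; }

        void next() { pos++; }

        // Direct seek past every remaining cell of 'row'.
        void seekToRow(String row) {
            seeks++;
            while (peekRow() != null && peekRow().equals(row)) pos++;
        }
    }

    // Iterate for up to maxNaiveSteps cells of 'currentRow'; if the row
    // still has more cells (many versions/columns), issue one explicit
    // seek instead of iterating through the rest.
    static void skipRow(ListScanner s, String currentRow, int maxNaiveSteps) {
        int steps = 0;
        while (s.peekRow() != null && s.peekRow().equals(currentRow)) {
            if (steps++ >= maxNaiveSteps) {
                s.seekToRow(currentRow);
                return;
            }
            s.next();
        }
    }
}
```

Anoop's point in the comment is that if HBASE-6032 makes an in-block seek cheap, the naive-iteration phase (and its tunable N) may no longer be worth having.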
[jira] [Updated] (HBASE-6991) Escape \ in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()
[ https://issues.apache.org/jira/browse/HBASE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore updated HBASE-6991: -- Summary: Escape \ in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() (was: Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() are not always consistent) Escape \ in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() -- Key: HBASE-6991 URL: https://issues.apache.org/jira/browse/HBASE-6991 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Since \ is used to escape non-printable characters but is not treated as a special character in conversion, it can lead to unexpected conversions. For example, please consider the following code snippet. {code} public void testConversion() { byte[] original = { '\\', 'x', 'A', 'D' }; String stringFromBytes = Bytes.toStringBinary(original); byte[] converted = Bytes.toBytesBinary(stringFromBytes); System.out.println("Original: " + Arrays.toString(original)); System.out.println("Converted: " + Arrays.toString(converted)); System.out.println("Reversible?: " + (Bytes.compareTo(original, converted) == 0)); } Output: --- Original: [92, 120, 65, 68] Converted: [-83] Reversible?: false {code} The \ character needs to be treated as special and must be encoded as a non-printable character (\x5C) to avoid any ambiguity during conversion.
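The fix the issue calls for -- escaping \ as \x5C so the conversion becomes reversible -- can be illustrated with deliberately simplified versions of the two conversions. This is a sketch, not the actual Bytes implementation (which keeps a larger set of printable punctuation):

```java
import java.io.ByteArrayOutputStream;

// Simplified model of the toStringBinary/toBytesBinary pair, with '\'
// treated as non-printable so the round trip is unambiguous.
public class BinaryStringEscape {
    // Printable ASCII (except '\') passes through; everything else,
    // including '\', is emitted as \xNN.
    static String toStringBinary(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            int ch = value & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Decodes the \xNN escapes produced above back into bytes.
    static byte[] toBytesBinary(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // consumed "xNN"
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

With this scheme the issue's example {92, 120, 65, 68} encodes to "\x5CxAD" instead of "\xAD", so decoding recovers the original bytes rather than collapsing them to a single -83.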
[jira] [Updated] (HBASE-6991) Escape \ in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()
[ https://issues.apache.org/jira/browse/HBASE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore updated HBASE-6991: -- Attachment: HBASE-6991_trunk.patch Attaching the patch which modifies toStringBinary() to treat \ as a non-printable character and translate it to \x5C.
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477610#comment-13477610 ] Lars Hofhansl commented on HBASE-6032: -- How come we missed this for 0.94? This looks like an important performance improvement. Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.96.0 Attachments: 6032-ports-5987.txt, 6032-ports-5987-v2.txt Excerpt from HBASE-5987: First, we propose to look ahead for one more block index entry so that the HFileScanner would know the start key value of the next data block. So if the target key value for the scan (reSeekTo) is smaller than the start kv of the next data block, it means the target key value is very likely in the current data block (if not in the current data block, then the start kv of the next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock contention or Index Block Cache lookups. This JIRA is to port the fix to HBase trunk, etc.
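The lookahead in the excerpt boils down to one key comparison made before touching the block index. A minimal sketch, with String keys standing in for byte[] keys and invented names:

```java
// Sketch of the HBASE-5987 lookahead decision; not the real HFileScanner.
public class ReseekLookahead {
    // Decide whether a reSeekTo(target) can stay inside the current data
    // block. 'nextBlockFirstKey' is the start key of the following block,
    // known because we read one extra block-index entry ahead; null means
    // the current block is the last one in the file.
    static boolean canStayInCurrentBlock(String target, String nextBlockFirstKey) {
        // If the target sorts before the next block's first key, the
        // answer (or its insertion point) must be in the current block:
        // no block-index query, no IdLock or index-block-cache traffic.
        return nextBlockFirstKey == null || target.compareTo(nextBlockFirstKey) < 0;
    }
}
```

Only when the comparison says the target is at or past the next block's start key does the scanner pay for a full block-index lookup, which is what reduces the index's "hotness" under reseek-heavy workloads.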
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477612#comment-13477612 ] ramkrishna.s.vasudevan commented on HBASE-6032: --- We need this for 0.94, I think, as per the changes in HBASE-6577.
[jira] [Updated] (HBASE-6991) Escape \ in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()
[ https://issues.apache.org/jira/browse/HBASE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore updated HBASE-6991: -- Fix Version/s: 0.96.0 Hadoop Flags: Incompatible change Status: Patch Available (was: Open) The patch includes the following changes: 1. Gets rid of the unnecessary byte[] to String conversion. The ISO-8859-1 charset does not do any transformation anyway. This also does away with the need for a try-catch block. {code} -String first = new String(b, off, len, "ISO-8859-1"); -for (int i = 0; i < first.length(); ++i) { - int ch = first.charAt(i) & 0xFF; +for (int i = off; i < off + len; ++i) { + int ch = b[i] & 0xFF; {code} 2. Removes \ from the set of printable non-alphanumeric characters so that it can be escaped using the \xXX format. {code} - || "`~!@#$%^&*()-_=+[]{}\\|;:'\",./<>?".indexOf(ch) >= 0 ) { + || "`~!@#$%^&*()-_=+[]{}|;:'\",./<>?".indexOf(ch) >= 0 ) { {code} 3. Adds a new test case to verify that the conversion is reversible for random arrays of bytes. Without this change the test always fails. The test adds about 1 extra second to the test run.
{code:title=hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java} + public void testToStringBytesBinaryReversible() { +// let's run the test with 1000 randomly generated byte arrays +Random rand = new Random(System.currentTimeMillis()); +byte[] randomBytes = new byte[1000]; +for (int i = 0; i < 1000; i++) { + rand.nextBytes(randomBytes); + verifyReversibleForBytes(randomBytes); +} + +// some specific cases +verifyReversibleForBytes(new byte[] {}); +verifyReversibleForBytes(new byte[] {'\\', 'x', 'A', 'D'}); +verifyReversibleForBytes(new byte[] {'\\', 'x', 'A', 'D', '\\'}); + } + + private void verifyReversibleForBytes(byte[] originalBytes) { +String convertedString = Bytes.toStringBinary(originalBytes); +byte[] convertedBytes = Bytes.toBytesBinary(convertedString); +if (Bytes.compareTo(originalBytes, convertedBytes) != 0) { + fail("Not reversible for\nbyte[]: " + Arrays.toString(originalBytes) + ",\nStringBinary: " + convertedString); +} + } {code} 4. And finally, fixes the two test cases which were breaking because they assumed that \ is encoded as \. {code} hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java -+ "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\s\\xA0\\x0F\\x00\\x00" ++ "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\x5Cs\\xA0\\x0F\\x00\\x00" {code} Setting the Incompatible change flag since any other code which makes the same assumption as the two test cases needs a fix.
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477624#comment-13477624 ] Anoop Sam John commented on HBASE-6032: --- +1 for having this in the 0.94 version. In fact I was trying to make a port and test it, and based on that raise a new issue for porting.
[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6032: - Attachment: 6032.094.txt 6032v3.txt Version of patch that will apply to 0.94.
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477638#comment-13477638 ] stack commented on HBASE-6032: -- I've not run the tests, but +1 on commit if all tests run (nice tests included w/ this patch).