[ https://issues.apache.org/jira/browse/HBASE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461184#comment-13461184 ]
Alexander Alten-Lorenz commented on HBASE-6694:
-----------------------------------------------

Confirmed that the patch is working.
Job.xml contains:

I spent a whole day creating columns in the CF:

hbase shell>
for r in 1 .. 10 do
  for c in 1 .. 100000000 do
    put 'test1', "row-#{r}", "cf1:c#{c}", "1"
  end
end

===========
With -Dhbase.export.scanner.batch=100:

HBASE_CLASSPATH="/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.0.1.jar" bin/hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.export.scanner.batch=100 test1 /home/hdfs/test2.export

---
{code}
12/09/22 01:17:23 DEBUG mapreduce.TableInputFormatBase: getSplits: split -> 0 -> hadoop4.internal:,
12/09/22 01:17:24 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/09/22 01:17:24 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
12/09/22 01:17:25 INFO mapred.JobClient: Running job: job_201209212254_0010
12/09/22 01:17:26 INFO mapred.JobClient: map 0% reduce 0%
12/09/22 01:17:59 INFO mapred.JobClient: map 100% reduce 0%
12/09/22 01:18:02 INFO mapred.JobClient: Job complete: job_201209212254_0010
12/09/22 01:18:03 INFO mapred.JobClient: Counters: 24
12/09/22 01:18:03 INFO mapred.JobClient: File System Counters
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of bytes read=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of bytes written=84332
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of large read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of write operations=0
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of bytes read=70
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of bytes written=62070728
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of read operations=1
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of large read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of write operations=1
12/09/22 01:18:03 INFO mapred.JobClient: Job Counters
12/09/22 01:18:03 INFO mapred.JobClient: Launched map tasks=1
12/09/22 01:18:03 INFO mapred.JobClient: Data-local map tasks=1
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=35760
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Map-Reduce Framework
12/09/22 01:18:03 INFO mapred.JobClient: Map input records=15258
12/09/22 01:18:03 INFO mapred.JobClient: Map output records=15258
12/09/22 01:18:03 INFO mapred.JobClient: Input split bytes=70
12/09/22 01:18:03 INFO mapred.JobClient: Spilled Records=0
12/09/22 01:18:03 INFO mapred.JobClient: CPU time spent (ms)=5970
12/09/22 01:18:03 INFO mapred.JobClient: Physical memory (bytes) snapshot=106557440
12/09/22 01:18:03 INFO mapred.JobClient: Virtual memory (bytes) snapshot=570249216
12/09/22 01:18:03 INFO mapred.JobClient: Total committed heap usage (bytes)=42663936
{code}

The export is readable in HDFS.
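
For reference, a rough sketch (plain Python, not HBase code) of how the record count above relates to the batch size. With scanner batching, a row wider than the batch limit comes back as several partial results, and each partial result is counted as one map input record. The function names are made up for illustration, and the row widths in the example are hypothetical, not taken from this test.

```python
import math


def split_row(cells, batch):
    """Split one wide row into partial results of at most `batch` cells each."""
    return [cells[i:i + batch] for i in range(0, len(cells), batch)]


def batched_record_count(cells_per_row, batch):
    """Count the partial results (map input records) a batched scan would yield.

    `cells_per_row` lists the number of cells in each row (hypothetical values).
    """
    return sum(math.ceil(n / batch) for n in cells_per_row if n > 0)


if __name__ == "__main__":
    # Hypothetical table: three rows with 250, 100 and 1 cells, batch size 100.
    widths = [250, 100, 1]
    print(batched_record_count(widths, 100))
    print([len(part) for part in split_row(list(range(250)), 100)])
```

Read the other way, 15258 map input records at a batch size of 100 means the export saw at most 1,525,800 cells.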
==========
Without -D switch:
RS timed out:

{code}
2012-09-22 01:27:27,937 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fhadoop4%3A8020%2Fhbase%2F.logs%2Fhadoop4.internal%2C60020%2C1348269190284-splitting%2Fhadoop4.internal%252C60020%252C1348269190284.1348269200784 ver = 0
2012-09-22 01:27:28,938 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1
2012-09-22 01:27:28,938 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
2012-09-22 01:27:29,237 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fhadoop4%3A8020%2Fhbase%2F.logs%2Fhadoop4.internal%2C60020%2C1348269190284-splitting%2Fhadoop4.internal%252C60020%252C1348269190284.1348269200784 ver = 0
2012-09-22 01:27:29,239 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000057 entered state done hadoop4.internal,60000,1348269184131
2012-09-22 01:27:29,239 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000058 entered state done hadoop4.internal,60000,1348269184131
2012-09-22 01:27:29,282 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/RESCAN0000000057
2012-09-22 01:27:29,282 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: deleted task without in memory state /hbase/splitlog/RESCAN0000000057
2012-09-22 01:27:29,283 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/RESCAN0000000058
2012-09-22 01:27:29,283 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: deleted task without in memory state /hbase/splitlog/RESCAN0000000058
{code}

====
Test environment:
Virtual machine, 2 GB RAM, 512 MB heap exported for HBase, Hadoop in cluster mode, HBase pseudo-distributed.

I would say it worked.

> Test scanner batching in export job feature HBASE-6372 AND report on improvement HBASE-6372 adds
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6694
>                 URL: https://issues.apache.org/jira/browse/HBASE-6694
>             Project: HBase
>          Issue Type: Task
>            Reporter: stack
>            Assignee: Alexander Alten-Lorenz
>         Attachments: HBASE-6694.patch
>
>
> From tail of HBASE-6372, Jon had raised issue that test added did not actually test the feature. This issue is about adding a test of HBASE-6372. We should also have numbers for the improvement that HBASE-6372 brings.