[
https://issues.apache.org/jira/browse/HBASE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruslan Salyakhov updated HBASE-2378:
------------------------------------
Attachment: MyReadPerformance.java
Attached is the read test (MyReadPerformance), which checks for empty values:
* tst_hfiles_01 (prepared with one reducer)
{code}
$ hadoop jar keyvalue-poc.jar MyReadPerformance -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/res/01/ -table tst_hfiles_01
...
10/03/25 23:16:49 INFO mapred.JobClient: MyReadPerformance$Counters
10/03/25 23:16:49 INFO mapred.JobClient: TOTAL_READ_ROWS=999
10/03/25 23:16:49 INFO mapred.JobClient: BAD_RECORDS=1
10/03/25 23:16:49 INFO mapred.JobClient: TOTAL_READ_TIME=72
10/03/25 23:16:49 INFO mapred.JobClient: Job Counters
...
{code}
* tst_hfiles_02 (prepared with two reducers)
{code}
$ hadoop jar keyvalue-poc.jar MyReadPerformance -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/res/02/ -table tst_hfiles_02
...
10/03/25 23:17:40 INFO mapred.JobClient: MyReadPerformance$Counters
10/03/25 23:17:40 INFO mapred.JobClient: TOTAL_READ_ROWS=482
10/03/25 23:17:40 INFO mapred.JobClient: BAD_RECORDS=1
10/03/25 23:17:40 INFO mapred.JobClient: TOTAL_READ_TIME=59
10/03/25 23:17:40 INFO mapred.JobClient: NULL_VALUE=517
10/03/25 23:17:40 INFO mapred.JobClient: Job Counters
...
{code}
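A possible reading of these counters, not confirmed anywhere in this thread: with two reducers, 482 rows were read and 517 came back null, which together make up the 999 keys. The count output quoted below for tst_hfiles_02 runs from 0.x keys to 1.x keys twice, which would fit two HFiles whose key ranges overlap. If the load step creates one region per HFile, starting at that file's first row (an assumption about loadtable.rb), a lookup is routed to a single region and misses keys stored in the other file. The sketch below is a toy model of that routing in plain Java. OverlapDemo, file, load, found and countFound are hypothetical names, and no HBase classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class OverlapDemo {
    // Build one sorted "file" from a handful of row keys.
    static TreeSet<String> file(String... keys) {
        return new TreeSet<String>(Arrays.asList(keys));
    }

    // Assumption: one region per file, keyed by the file's first (smallest) row.
    static TreeMap<String, TreeSet<String>> load(List<TreeSet<String>> files) {
        TreeMap<String, TreeSet<String>> regions = new TreeMap<String, TreeSet<String>>();
        for (TreeSet<String> f : files) {
            if (!f.isEmpty()) {
                regions.put(f.first(), f);
            }
        }
        return regions;
    }

    // Route the key to the region with the greatest start key <= key, and look only there.
    static boolean found(TreeMap<String, TreeSet<String>> regions, String key) {
        Map.Entry<String, TreeSet<String>> e = regions.floorEntry(key);
        return e != null && e.getValue().contains(key);
    }

    // Count how many of the given keys a point lookup can see.
    static int countFound(List<TreeSet<String>> files, List<String> keys) {
        TreeMap<String, TreeSet<String>> regions = load(files);
        int n = 0;
        for (String k : keys) {
            if (found(regions, k)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Range-partitioned output: the two files cover disjoint key ranges.
        List<TreeSet<String>> ranged = Arrays.asList(file("a", "b", "c"), file("d", "e", "f"));
        // Interleaved output: each file is sorted, but the ranges overlap.
        List<TreeSet<String>> interleaved = Arrays.asList(file("a", "c", "e"), file("b", "d", "f"));
        System.out.println(countFound(ranged, keys));      // prints 6
        System.out.println(countFound(interleaved, keys)); // prints 4
    }
}
```

In this model a full scan still walks every file, so a row count can stay complete while point lookups miss part of the keys. That is only consistent with the symptoms here, not a diagnosis.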
> Bulk insert with multiple reducers
> ----------------------------------
>
> Key: HBASE-2378
> URL: https://issues.apache.org/jira/browse/HBASE-2378
> Project: Hadoop HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.20.3
> Reporter: Ruslan Salyakhov
> Attachments: HFileOutputFormat.java, my_sample_log_1k.txt,
> MyHFilesWriter.java, MyKeyComparator.java, MyReadPerformance.java,
> MySampler.java, TestTotalOrderPartitionerForMyKeys.java,
> TotalOrderPartitioner.java
>
>
> If I run the MR job to prepare HFiles with more than one reducer, some values for
> keys do not appear in the table after the loadtable.rb script runs. With
> one reducer everything works fine.
> References:
> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
> - the row id must be formatted as an ImmutableBytesWritable
> - MR job should ensure a total ordering among all keys
> MAPREDUCE-366 (patch-5668-3.txt)
> - TotalOrderPartitioner that uses the new API (attached)
> HBASE-2063
> - patched HFileOutputFormat (attached)
> Input data (attached):
> * my_sample_log_1k.txt - sample data, input for MyHFilesWriter
> Source (attached):
> * MyKeyComparator.java - comparator for my ImmutableBytesWritable keys
> * TestTotalOrderPartitionerForMyKeys.java - test case for my keys (note that
> I've set up MyKeyComparator to pass that test)
> * MyHFilesWriter.java - My MR job to prepare HFiles
> * HFileOutputFormat.java - from MAPREDUCE-366
> * TotalOrderPartitioner.java - from MAPREDUCE-366
> * MySampler.java - My RandomSampler based on Sampler from MAPREDUCE-366, BUT
> I've added the following line to the getSample method (without that line it
> doesn't work):
> {code}
> reader.initialize(splits.get(i), new
> TaskAttemptContext(job.getConfiguration(), new TaskAttemptID()));
> {code}
> Test case:
> # comment out the following line in MyHFilesWriter:
> //job.setSortComparatorClass(MyKeyComparator.class);
> # hadoop jar keyvalue-poc.jar MyHFilesWriter -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/hfiles/01/ -r 1
> # hadoop jar keyvalue-poc.jar MyHFilesWriter -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/hfiles/02/ -r 2
> # hbase> create 'tst_hfiles_01', {NAME => 'vals'}
> # hbase> create 'tst_hfiles_02', {NAME => 'vals'}
> # hbase org.jruby.Main /usr/lib/hbase-0.20/bin/loadtable.rb tst_hfiles_01 /test_hbase/hfiles/01
> # hbase org.jruby.Main /usr/lib/hbase-0.20/bin/loadtable.rb tst_hfiles_02 /test_hbase/hfiles/02
> # check values for keys
> # uncomment the following line in MyHFilesWriter:
> //job.setSortComparatorClass(MyKeyComparator.class);
> # hadoop jar keyvalue-poc.jar MyHFilesWriter -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/hfiles/03/ -r 2
> Example results:
> {code}
> hbase(main):006:0* count 'tst_hfiles_01', 100
> Current count: 100, row: 0.14.USA.IL.602.ELMHURST.1.1.0.0
> Current count: 200, row: 0.245.USA.ME.500.PORTLAND.1.1.0.0
> Current count: 300, row: 0.34.USA.FL.Rollup.Rollup.1.1.0.0
> Current count: 400, row: 0.443.USA.CA.803.LOS.ANGELES.1.1.0
> Current count: 500, row: 0.8.USA.CO.751.CASTLE.ROCK.1.1.0
> Current count: 600, row: 1.14.DZA.Rollup.Rollup.Rollup.1.1.0.1
> Current count: 700, row: 1.159.SWE.AB.Rollup.Rollup.1.1.0.1
> Current count: 800, row: 1.17.USA.TN.659.CLARKSVILLE.1.1.0.1
> Current count: 900, row: 1.220.USA.MI.505.SOUTHFIELD.1.1.0.1
> 999 row(s) in 0.0930 seconds
> hbase(main):007:0> count 'tst_hfiles_02', 100
> Current count: 100, row: 0.231.USA.GA.524.BUFORD.1.1.0.1
> Current count: 200, row: 0.4.USA.VA.573.Rollup.1.1.0.0
> Current count: 300, row: 0.9.ROU.B.-1.BUCHAREST.1.1.0.0
> Current count: 400, row: 1.16.USA.IA.679.Rollup.1.1.1.0
> Current count: 500, row: 1.245.NOR.03.-1.OSLO.1.1.0.0
> Current count: 600, row: 0.245.GBR.ENG.826005.BEXLEY.1.1.0.1
> Current count: 700, row: 0.48.GBR.ENG.826027.Rollup.1.1.0.1
> Current count: 800, row: 1.14.SWE.Rollup.Rollup.Rollup.1.1.0.1
> Current count: 900, row: 1.201.GBR.ENG.826005.LONDON.1.1.0.1
> 999 row(s) in 0.1630 seconds
> hbase(main):008:0> get 'tst_hfiles_01', '0.14.USA.IL.602.ELMHURST.1.1.0.0'
> COLUMN CELL
> vals:key0 timestamp=1269542753914, value=0
> vals:key1 timestamp=1269542753914, value=14
> vals:key2 timestamp=1269542753914, value=USA
> vals:key3 timestamp=1269542753914, value=IL
> vals:key4 timestamp=1269542753914, value=602
> vals:key5 timestamp=1269542753914, value=ELMHURST
> vals:key6 timestamp=1269542753914, value=1
> vals:key7 timestamp=1269542753914, value=1
> vals:key8 timestamp=1269542753914, value=0
> vals:key9 timestamp=1269542753914, value=0
> vals:val0 timestamp=1269542753914, value=2
> 11 row(s) in 0.0160 seconds
> hbase(main):009:0> get 'tst_hfiles_02', '0.14.USA.IL.602.ELMHURST.1.1.0.0'
> COLUMN CELL
> 0 row(s) in 0.0220 seconds
> {code}
> With MyKeyComparator enabled, the job fails:
> {code}
> java.io.IOException: Added a key not lexically larger than previous key=.103.FRA.V.-1.LYON.1.1.0.0valskey0'XXX, lastkey=1.20.USA.AOL.0.AOL.1.1.0.0valsval0'XXX
> at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkKey(HFile.java:551)
> at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:513)
> at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:481)
> at com.contextweb.hadoop.hbase.mapred.HFileOutputFormat$1.write(HFileOutputFormat.java:77)
> at com.contextweb.hadoop.hbase.mapred.HFileOutputFormat$1.write(HFileOutputFormat.java:49)
> at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer.reduce(KeyValueSortReducer.java:46)
> at org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer.reduce(KeyValueSortReducer.java:35)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> {code}
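The exception above says the writer rejects a key that is not lexically larger than the previous one. A minimal sketch of how that can happen, assuming the writer compares row bytes lexicographically (as the message wording suggests) while the job's sort comparator orders keys by some other rule. MyKeyComparator is attached to the issue but not shown here, so compareFields below is a hypothetical stand-in that compares dot-separated fields numerically, and the keys "1.20.X" and "1.103.Y" are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LexOrderDemo {
    // Unsigned byte-wise lexicographic comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    static boolean isInt(String s) {
        return s.matches("-?\\d{1,18}");
    }

    // Hypothetical comparator: dot-separated fields, numeric where both sides are integers.
    static int compareFields(String a, String b) {
        String[] fa = a.split("\\.");
        String[] fb = b.split("\\.");
        int n = Math.min(fa.length, fb.length);
        for (int i = 0; i < n; i++) {
            int c;
            if (isInt(fa[i]) && isInt(fb[i])) {
                c = Long.compare(Long.parseLong(fa[i]), Long.parseLong(fb[i]));
            } else {
                c = fa[i].compareTo(fb[i]);
            }
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(fa.length, fb.length);
    }

    // Index of the first key that is not byte-wise larger than its predecessor, or -1 if none.
    static int firstOrderViolation(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            byte[] prev = keys.get(i - 1).getBytes(StandardCharsets.UTF_8);
            byte[] cur = keys.get(i).getBytes(StandardCharsets.UTF_8);
            if (compareBytes(cur, prev) <= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<String>(Arrays.asList("1.103.Y", "1.20.X"));
        // Field-aware order puts 20 before 103, which is not byte order.
        keys.sort(LexOrderDemo::compareFields);
        System.out.println(keys + " -> " + firstOrderViolation(keys)); // prints [1.20.X, 1.103.Y] -> 1
        // Byte order puts "1.1..." before "1.2...", so the check passes.
        keys.sort((x, y) -> compareBytes(x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8)));
        System.out.println(keys + " -> " + firstOrderViolation(keys)); // prints [1.103.Y, 1.20.X] -> -1
    }
}
```

If this is what is going on, the sort comparator and the partitioner's split points would both need to agree with the byte order the HFile writer enforces.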