Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
It appears that my issue was caused by the Scan settings I had omitted,
which I mentioned in my second post. I re-ran the job with the settings
below, and it finished in under 6 hours. Thanks for your suggestions;
they also give me ideas for troubleshooting issues going forward.

scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
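
For reference, a minimal sketch of how these settings fit into the driver
(this assumes the same job, fromTableName, and Map class as the driver
snippet quoted further down the thread; treat it as a sketch, not the exact
code I ran):

Scan fromScan = new Scan();
fromScan.setCaching(500);        // fetch 500 rows per RPC instead of the default 1
fromScan.setCacheBlocks(false);  // a full table scan would only churn the regionserver block cache

TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
    null, null, job, true, TableInputFormat.class);
job.setOutputFormatClass(NullOutputFormat.class);
job.setNumReduceTasks(0);
job.submit();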



On Wed, Apr 13, 2016 at 7:32 AM, Colin Kincaid Williams  wrote:
> Hi Chien,
>
> 4. From  50-150k per * second * to 100-150k per * minute *, as stated
> above, so reads went *DOWN* significantly. I think you must have
> misread.
>
> I will take into account some of your other suggestions.
>
> Thanks,
>
> Colin
>
> On Tue, Apr 12, 2016 at 8:19 PM, Chien Le  wrote:
>> Some things I would look at:
>> 1. Node statistics, both the mapper and regionserver nodes. Make sure
>> they're on fully healthy nodes (no disk issues, no half duplex, etc) and
>> that they're not already saturated from other jobs.
>> 2. Is there a common regionserver behind the remaining mappers/regions? If
>> so, try moving some regions off to spread the load.
>> 3. Verify the locality of the region blocks to the regionserver. If you
>> don't automate major compacts or have moved regions recently, mapper
>> locality might not help. Major compact if needed or move regions if you can
>> determine source?
>> 4. You mentioned that the requests per sec has gone from 50-150k to
>> 100-150k. Was that a typo? Did the read rate really increase?
>> 5. You've listed the region sizes but was that done with a cursory hadoop
>> fs du? Have you tried using the hfile analyzer to verify number of rows and
>> sizes are roughly the same?
>> 5. profile the mappers. If you can share the task counters for a completed
>> and a still running task to compare, it might help find the issue
>> 6. I don't think you should underestimate the perf gains of node local
>> tasks vs just rack local, especially if short circuit reads are enabled.
>> This is a big gamble unfortunately given how far your tasks have been
>> running already so I'd look at this as a last resort
>>
>>
>> HTH,
>> Chien
>>
>> On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
>> wrote:
>>
>>> I've noticed that I've omitted
>>>
>>> scan.setCaching(500);// 1 is the default in Scan, which will
>>> be bad for MapReduce jobs
>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>
>>> which appear to be suggestions from examples. Still I am not sure if
>>> this explains the significant request slowdown on the final 25% of the
>>> jobs.
>>>
>>> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
>>> wrote:
>>> > Excuse my double post. I thought I deleted my draft, and then
>>> > constructed a cleaner, more detailed, more readable mail.
>>> >
>>> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
>>> wrote:
>>> >> After trying to get help with distcp on hadoop-user and cdh-user
>>> >> mailing lists, I've given up on trying to use distcp and exporttable
>>> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>>> >>
>>> >> I've been working on an hbase map reduce job to serialize my entries
>>> >> and insert them into kafka. Then I plan to re-import them into
>>> >> cdh5.3.0.
>>> >>
>>> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>>> >> 33 which have finished successfully, and 10 which are currently still
>>> >> running. I had previously seen requests of 50-150k per second. Now for
>>> >> the final 10 maps, I'm seeing 100-150k per minute.
>>> >>
>>> >> I might also mention that there were 6 failures near the application
>>> >> start. Unfortunately, I cannot read the logs for these 6 failures.
>>> >> There is an exception related to the yarn logging for these maps,
>>> >> maybe because they failed to start.
>>> >>
>>> >> I had a look around HDFS. It appears that the regions are all between
>>> >> 5-10GB. The longest completed map so far took 7 hours, with the
>>> >> majority appearing to take around 3.5 hours .
>>> >>
>>> >> The remaining 10 maps have each been running between 23-27 hours.
>>> >>
>>> >> Considering data locality issues. 6 of the remaining jobs are running
>>> >> on the same rack. Then the other 4 are split between my other two
>>> >> racks. There should currently be a replica on each rack, since it
>>> >> appears the replicas are set to 3. Then I'm not sure this is really
>>> >> the cause of the slowdown.
>>> >>
>>> >> Then I'm looking for advice on what I can do to troubleshoot my job.
>>> >> I'm setting up my map job like:
>>> >>
>>> >> main(String[] args){
>>> >> ...
>>> >> Scan fromScan = new Scan();
>>> >> System.out.println(fromScan);
>>> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
>>> Map.class,
>>> >> null, null, job, true, TableInputFormat.class);
>>> >>
>>> >> // My guess is this contols the output 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
Hi Chien,

4. From 50-150k per *second* to 100-150k per *minute*, as stated
above, so reads went *DOWN* significantly. I think you must have
misread.

I will take into account some of your other suggestions.

Thanks,

Colin

On Tue, Apr 12, 2016 at 8:19 PM, Chien Le  wrote:
> Some things I would look at:
> 1. Node statistics, both the mapper and regionserver nodes. Make sure
> they're on fully healthy nodes (no disk issues, no half duplex, etc) and
> that they're not already saturated from other jobs.
> 2. Is there a common regionserver behind the remaining mappers/regions? If
> so, try moving some regions off to spread the load.
> 3. Verify the locality of the region blocks to the regionserver. If you
> don't automate major compacts or have moved regions recently, mapper
> locality might not help. Major compact if needed or move regions if you can
> determine source?
> 4. You mentioned that the requests per sec has gone from 50-150k to
> 100-150k. Was that a typo? Did the read rate really increase?
> 5. You've listed the region sizes but was that done with a cursory hadoop
> fs du? Have you tried using the hfile analyzer to verify number of rows and
> sizes are roughly the same?
> 5. profile the mappers. If you can share the task counters for a completed
> and a still running task to compare, it might help find the issue
> 6. I don't think you should underestimate the perf gains of node local
> tasks vs just rack local, especially if short circuit reads are enabled.
> This is a big gamble unfortunately given how far your tasks have been
> running already so I'd look at this as a last resort
>
>
> HTH,
> Chien
>
> On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
> wrote:
>
>> I've noticed that I've omitted
>>
>> scan.setCaching(500);// 1 is the default in Scan, which will
>> be bad for MapReduce jobs
>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>
>> which appear to be suggestions from examples. Still I am not sure if
>> this explains the significant request slowdown on the final 25% of the
>> jobs.
>>
>> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
>> wrote:
>> > Excuse my double post. I thought I deleted my draft, and then
>> > constructed a cleaner, more detailed, more readable mail.
>> >
>> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
>> wrote:
>> >> After trying to get help with distcp on hadoop-user and cdh-user
>> >> mailing lists, I've given up on trying to use distcp and exporttable
>> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>> >>
>> >> I've been working on an hbase map reduce job to serialize my entries
>> >> and insert them into kafka. Then I plan to re-import them into
>> >> cdh5.3.0.
>> >>
>> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>> >> 33 which have finished successfully, and 10 which are currently still
>> >> running. I had previously seen requests of 50-150k per second. Now for
>> >> the final 10 maps, I'm seeing 100-150k per minute.
>> >>
>> >> I might also mention that there were 6 failures near the application
>> >> start. Unfortunately, I cannot read the logs for these 6 failures.
>> >> There is an exception related to the yarn logging for these maps,
>> >> maybe because they failed to start.
>> >>
>> >> I had a look around HDFS. It appears that the regions are all between
>> >> 5-10GB. The longest completed map so far took 7 hours, with the
>> >> majority appearing to take around 3.5 hours .
>> >>
>> >> The remaining 10 maps have each been running between 23-27 hours.
>> >>
>> >> Considering data locality issues. 6 of the remaining jobs are running
>> >> on the same rack. Then the other 4 are split between my other two
>> >> racks. There should currently be a replica on each rack, since it
>> >> appears the replicas are set to 3. Then I'm not sure this is really
>> >> the cause of the slowdown.
>> >>
>> >> Then I'm looking for advice on what I can do to troubleshoot my job.
>> >> I'm setting up my map job like:
>> >>
>> >> main(String[] args){
>> >> ...
>> >> Scan fromScan = new Scan();
>> >> System.out.println(fromScan);
>> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
>> Map.class,
>> >> null, null, job, true, TableInputFormat.class);
>> >>
>> >> // My guess is this contols the output type for the reduce function
>> >> base on setOutputKeyClass and setOutput value class from p.27 . Since
>> >> there is no reduce step, then this is currently null.
>> >> job.setOutputFormatClass(NullOutputFormat.class);
>> >> job.setNumReduceTasks(0);
>> >> job.submit();
>> >> ...
>> >> }
>> >>
>> >> I'm not performing a reduce step, and I'm traversing row keys like
>> >>
>> >> map(final ImmutableBytesWritable fromRowKey,
>> >> Result fromResult, Context context) throws IOException {
>> >> ...
>> >>   // should I assume that each keyvalue is a version of the stored
>> row?
>> >>   for 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Chien Le
Some things I would look at:
1. Node statistics, both the mapper and regionserver nodes. Make sure
they're on fully healthy nodes (no disk issues, no half duplex, etc.) and
that they're not already saturated from other jobs.
2. Is there a common regionserver behind the remaining mappers/regions? If
so, try moving some regions off to spread the load.
3. Verify the locality of the region blocks to the regionserver. If you
don't automate major compactions or have moved regions recently, mapper
locality might not help. Major compact if needed, or move regions if you
can determine the source (a shell sketch follows this list).
4. You mentioned that the requests per sec have gone from 50-150k to
100-150k. Was that a typo? Did the read rate really increase?
5. You've listed the region sizes, but was that done with a cursory hadoop
fs du? Have you tried using the hfile analyzer (also sketched below) to
verify that the number of rows and sizes are roughly the same?
6. Profile the mappers. If you can share the task counters for a completed
and a still-running task to compare, it might help find the issue.
7. I don't think you should underestimate the perf gains of node-local
tasks vs. just rack-local, especially if short-circuit reads are enabled.
This is a big gamble unfortunately, given how long your tasks have been
running already, so I'd look at this as a last resort.
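
For 3 and 5, roughly the commands I have in mind (paths and table name are
placeholders, and the HFile tool flags can differ slightly between versions):

# cursory per-region size check
hadoop fs -du /hbase/MyTable

# hfile analyzer: print metadata and key/value stats for a single store file
hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -f /hbase/MyTable/<region>/<family>/<hfile>

# force a major compaction from the hbase shell to restore block locality
echo "major_compact 'MyTable'" | hbase shell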


HTH,
Chien

On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
wrote:

> I've noticed that I've omitted
>
> scan.setCaching(500);// 1 is the default in Scan, which will
> be bad for MapReduce jobs
> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>
> which appear to be suggestions from examples. Still I am not sure if
> this explains the significant request slowdown on the final 25% of the
> jobs.
>
> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
> wrote:
> > Excuse my double post. I thought I deleted my draft, and then
> > constructed a cleaner, more detailed, more readable mail.
> >
> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
> wrote:
> >> After trying to get help with distcp on hadoop-user and cdh-user
> >> mailing lists, I've given up on trying to use distcp and exporttable
> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
> >>
> >> I've been working on an hbase map reduce job to serialize my entries
> >> and insert them into kafka. Then I plan to re-import them into
> >> cdh5.3.0.
> >>
> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
> >> 33 which have finished successfully, and 10 which are currently still
> >> running. I had previously seen requests of 50-150k per second. Now for
> >> the final 10 maps, I'm seeing 100-150k per minute.
> >>
> >> I might also mention that there were 6 failures near the application
> >> start. Unfortunately, I cannot read the logs for these 6 failures.
> >> There is an exception related to the yarn logging for these maps,
> >> maybe because they failed to start.
> >>
> >> I had a look around HDFS. It appears that the regions are all between
> >> 5-10GB. The longest completed map so far took 7 hours, with the
> >> majority appearing to take around 3.5 hours .
> >>
> >> The remaining 10 maps have each been running between 23-27 hours.
> >>
> >> Considering data locality issues. 6 of the remaining jobs are running
> >> on the same rack. Then the other 4 are split between my other two
> >> racks. There should currently be a replica on each rack, since it
> >> appears the replicas are set to 3. Then I'm not sure this is really
> >> the cause of the slowdown.
> >>
> >> Then I'm looking for advice on what I can do to troubleshoot my job.
> >> I'm setting up my map job like:
> >>
> >> main(String[] args){
> >> ...
> >> Scan fromScan = new Scan();
> >> System.out.println(fromScan);
> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
> Map.class,
> >> null, null, job, true, TableInputFormat.class);
> >>
> >> // My guess is this contols the output type for the reduce function
> >> base on setOutputKeyClass and setOutput value class from p.27 . Since
> >> there is no reduce step, then this is currently null.
> >> job.setOutputFormatClass(NullOutputFormat.class);
> >> job.setNumReduceTasks(0);
> >> job.submit();
> >> ...
> >> }
> >>
> >> I'm not performing a reduce step, and I'm traversing row keys like
> >>
> >> map(final ImmutableBytesWritable fromRowKey,
> >> Result fromResult, Context context) throws IOException {
> >> ...
> >>   // should I assume that each keyvalue is a version of the stored
> row?
> >>   for (KeyValue kv : fromResult.raw()) {
> >> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
> >> kv.getValue());
> >> //TODO: ADD counter for each qualifier
> >>   }
> >>
> >>
> >>
> >> I've also have a list of simple questions.
> >>
> >> Has anybody experienced a significant slowdown on map jobs related to
> >> a portion of their hbase regions? If so what issues did you come
> >> across?
> >>
> >> 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
I've noticed that I omitted

scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs

which are suggested in the examples. Still, I am not sure whether this
explains the significant request slowdown on the final 25% of the maps.

On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams  wrote:
> Excuse my double post. I thought I deleted my draft, and then
> constructed a cleaner, more detailed, more readable mail.
>
> On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams  
> wrote:
>> After trying to get help with distcp on hadoop-user and cdh-user
>> mailing lists, I've given up on trying to use distcp and exporttable
>> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>>
>> I've been working on an hbase map reduce job to serialize my entries
>> and insert them into kafka. Then I plan to re-import them into
>> cdh5.3.0.
>>
>> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>> 33 which have finished successfully, and 10 which are currently still
>> running. I had previously seen requests of 50-150k per second. Now for
>> the final 10 maps, I'm seeing 100-150k per minute.
>>
>> I might also mention that there were 6 failures near the application
>> start. Unfortunately, I cannot read the logs for these 6 failures.
>> There is an exception related to the yarn logging for these maps,
>> maybe because they failed to start.
>>
>> I had a look around HDFS. It appears that the regions are all between
>> 5-10GB. The longest completed map so far took 7 hours, with the
>> majority appearing to take around 3.5 hours .
>>
>> The remaining 10 maps have each been running between 23-27 hours.
>>
>> Considering data locality issues. 6 of the remaining jobs are running
>> on the same rack. Then the other 4 are split between my other two
>> racks. There should currently be a replica on each rack, since it
>> appears the replicas are set to 3. Then I'm not sure this is really
>> the cause of the slowdown.
>>
>> Then I'm looking for advice on what I can do to troubleshoot my job.
>> I'm setting up my map job like:
>>
>> main(String[] args){
>> ...
>> Scan fromScan = new Scan();
>> System.out.println(fromScan);
>> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
>> null, null, job, true, TableInputFormat.class);
>>
>> // My guess is this contols the output type for the reduce function
>> base on setOutputKeyClass and setOutput value class from p.27 . Since
>> there is no reduce step, then this is currently null.
>> job.setOutputFormatClass(NullOutputFormat.class);
>> job.setNumReduceTasks(0);
>> job.submit();
>> ...
>> }
>>
>> I'm not performing a reduce step, and I'm traversing row keys like
>>
>> map(final ImmutableBytesWritable fromRowKey,
>> Result fromResult, Context context) throws IOException {
>> ...
>>   // should I assume that each keyvalue is a version of the stored row?
>>   for (KeyValue kv : fromResult.raw()) {
>> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
>> kv.getValue());
>> //TODO: ADD counter for each qualifier
>>   }
>>
>>
>>
>> I've also have a list of simple questions.
>>
>> Has anybody experienced a significant slowdown on map jobs related to
>> a portion of their hbase regions? If so what issues did you come
>> across?
>>
>> Can I get a suggestion how to show which map corresponds to which
>> region, so I can troubleshoot from there? Is this already logged
>> somewhere by default, or is there a way to set this up with the
>> TableMapReduceUtil.initTableMapperJob ?
>>
>> Any other suggestions would be appreciated.


Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
Excuse my double post. I thought I deleted my draft, and then
constructed a cleaner, more detailed, more readable mail.

On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams  wrote:
> After trying to get help with distcp on hadoop-user and cdh-user
> mailing lists, I've given up on trying to use distcp and exporttable
> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>
> I've been working on an hbase map reduce job to serialize my entries
> and insert them into kafka. Then I plan to re-import them into
> cdh5.3.0.
>
> Currently I'm having trouble with my map-reduce job. I have 43 maps,
> 33 which have finished successfully, and 10 which are currently still
> running. I had previously seen requests of 50-150k per second. Now for
> the final 10 maps, I'm seeing 100-150k per minute.
>
> I might also mention that there were 6 failures near the application
> start. Unfortunately, I cannot read the logs for these 6 failures.
> There is an exception related to the yarn logging for these maps,
> maybe because they failed to start.
>
> I had a look around HDFS. It appears that the regions are all between
> 5-10GB. The longest completed map so far took 7 hours, with the
> majority appearing to take around 3.5 hours .
>
> The remaining 10 maps have each been running between 23-27 hours.
>
> Considering data locality issues. 6 of the remaining jobs are running
> on the same rack. Then the other 4 are split between my other two
> racks. There should currently be a replica on each rack, since it
> appears the replicas are set to 3. Then I'm not sure this is really
> the cause of the slowdown.
>
> Then I'm looking for advice on what I can do to troubleshoot my job.
> I'm setting up my map job like:
>
> main(String[] args){
> ...
> Scan fromScan = new Scan();
> System.out.println(fromScan);
> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
> null, null, job, true, TableInputFormat.class);
>
> // My guess is this contols the output type for the reduce function
> base on setOutputKeyClass and setOutput value class from p.27 . Since
> there is no reduce step, then this is currently null.
> job.setOutputFormatClass(NullOutputFormat.class);
> job.setNumReduceTasks(0);
> job.submit();
> ...
> }
>
> I'm not performing a reduce step, and I'm traversing row keys like
>
> map(final ImmutableBytesWritable fromRowKey,
> Result fromResult, Context context) throws IOException {
> ...
>   // should I assume that each keyvalue is a version of the stored row?
>   for (KeyValue kv : fromResult.raw()) {
> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
> kv.getValue());
> //TODO: ADD counter for each qualifier
>   }
>
>
>
> I've also have a list of simple questions.
>
> Has anybody experienced a significant slowdown on map jobs related to
> a portion of their hbase regions? If so what issues did you come
> across?
>
> Can I get a suggestion how to show which map corresponds to which
> region, so I can troubleshoot from there? Is this already logged
> somewhere by default, or is there a way to set this up with the
> TableMapReduceUtil.initTableMapperJob ?
>
> Any other suggestions would be appreciated.


HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
After trying to get help with distcp on the hadoop-user and cdh-user
mailing lists, I've given up on trying to use distcp and exporttable
to migrate my HBase from 0.92.1 on CDH 4.1.3 to 0.98 on CDH 5.3.0.

I've been working on an HBase MapReduce job to serialize my entries
and insert them into Kafka. Then I plan to re-import them into
CDH 5.3.0.

Currently I'm having trouble with my MapReduce job. I have 43 maps:
33 have finished successfully, and 10 are still running. I had
previously seen request rates of 50-150k per second; for the final 10
maps, I'm seeing 100-150k per minute.

I might also mention that there were 6 failures near the application
start. Unfortunately, I cannot read the logs for these 6 failures;
there is an exception related to the YARN logging for these maps,
maybe because they failed to start.

I had a look around HDFS. It appears that the regions are all between
5-10 GB. The longest completed map so far took 7 hours, with the
majority appearing to take around 3.5 hours.

The remaining 10 maps have each been running between 23-27 hours.

Considering data locality issues: 6 of the remaining maps are running
on the same rack, and the other 4 are split between my other two
racks. There should currently be a replica on each rack, since it
appears replication is set to 3, so I'm not sure locality is really
the cause of the slowdown.

So I'm looking for advice on what I can do to troubleshoot my job.
I'm setting up my map job like this:

main(String[] args){
...
Scan fromScan = new Scan();
System.out.println(fromScan);
TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
    null, null, job, true, TableInputFormat.class);

// My guess is this controls the output type for the reduce function,
// based on setOutputKeyClass and setOutputValueClass from p.27. Since
// there is no reduce step, this is currently null.
job.setOutputFormatClass(NullOutputFormat.class);
job.setNumReduceTasks(0);
job.submit();
...
}

I'm not performing a reduce step, and I'm traversing row keys like this:

map(final ImmutableBytesWritable fromRowKey,
Result fromResult, Context context) throws IOException {
...
  // should I assume that each KeyValue is a version of the stored row?
  for (KeyValue kv : fromResult.raw()) {
    ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, kv.getValue());
    // TODO: add a counter for each qualifier
  }
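
For the TODO above, a minimal sketch of the per-qualifier counter I have in
mind (the "QUALIFIERS" group name is arbitrary, and Bytes.toString is only
used to get a readable counter name):

  for (KeyValue kv : fromResult.raw()) {
    ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, kv.getValue());
    // count how many cells we see per qualifier across the whole map task
    context.getCounter("QUALIFIERS", Bytes.toString(kv.getQualifier())).increment(1);
  }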



I also have a list of simple questions.

Has anybody experienced a significant slowdown in map tasks related to
a portion of their HBase regions? If so, what issues did you come
across?

Can I get a suggestion on how to determine which map corresponds to which
region, so I can troubleshoot from there? Is this already logged
somewhere by default, or is there a way to set this up with
TableMapReduceUtil.initTableMapperJob?
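
One approach I'm considering, though I haven't verified it against 0.92 yet:
since TableInputFormat hands each mapper a TableSplit, setup() could log the
region location and key range for the task, e.g.

  @Override
  protected void setup(Context context) {
    TableSplit split = (TableSplit) context.getInputSplit();
    System.out.println("region location: " + split.getRegionLocation()
        + " startRow: " + Bytes.toStringBinary(split.getStartRow())
        + " endRow: " + Bytes.toStringBinary(split.getEndRow()));
  }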

Any other suggestions would be appreciated.