Re: input split for hbase mapreduce

2017-05-09 Thread Ted Yu
Please take a look at the map() method of the Mapper classes in the code base.
e.g.

hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/GroupingTableMapper.java
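
For reference, a table mapper's map() always receives the row key as an ImmutableBytesWritable and the row's cells as a Result; that pair is the input produced from each region's split. A minimal sketch (MyTableMapper and its output types are hypothetical, not from the thread):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// TableMapper<KEYOUT, VALUEOUT> fixes the input types to
// ImmutableBytesWritable (the row key) and Result (the cells of that row).
public class MyTableMapper extends TableMapper<Text, IntWritable> {

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
      throws IOException, InterruptedException {
    // Example: emit the row key and the number of cells returned for the row.
    context.write(new Text(rowKey.get()), new IntWritable(row.size()));
  }
}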



On Tue, May 9, 2017 at 2:51 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com>
wrote:

> Hi
>
>If I am running mapreduce on hbase tables what will be the input to
> mapper function
>
> Thanks
>


input split for hbase mapreduce

2017-05-09 Thread Rajeshkumar J
Hi

   If I am running mapreduce on hbase tables what will be the input to
mapper function

Thanks


Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
It appears that my issue was caused by the missing settings I
mentioned in the second post. I ran a job with these settings, and my
job finished in < 6 hours. Thanks for your suggestions; they give me
further ideas for troubleshooting issues moving forward.

scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
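
For context, a sketch of where those two lines typically sit in the job setup, assuming the same initTableMapperJob call quoted later in this thread (table, mapper, and job variables as in that code):

Scan scan = new Scan();
scan.setCaching(500);        // batch 500 rows per RPC instead of 1
scan.setCacheBlocks(false);  // a full-table scan shouldn't churn the region server block cache
TableMapReduceUtil.initTableMapperJob(fromTableName, scan, Map.class,
    null, null, job, true, TableInputFormat.class);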



On Wed, Apr 13, 2016 at 7:32 AM, Colin Kincaid Williams  wrote:
> Hi Chien,
>
> 4. From  50-150k per * second * to 100-150k per * minute *, as stated
> above, so reads went *DOWN* significantly. I think you must have
> misread.
>
> I will take into account some of your other suggestions.
>
> Thanks,
>
> Colin
>
> On Tue, Apr 12, 2016 at 8:19 PM, Chien Le  wrote:
>> Some things I would look at:
>> 1. Node statistics, both the mapper and regionserver nodes. Make sure
>> they're on fully healthy nodes (no disk issues, no half duplex, etc) and
>> that they're not already saturated from other jobs.
>> 2. Is there a common regionserver behind the remaining mappers/regions? If
>> so, try moving some regions off to spread the load.
>> 3. Verify the locality of the region blocks to the regionserver. If you
>> don't automate major compacts or have moved regions recently, mapper
>> locality might not help. Major compact if needed or move regions if you can
>> determine source?
>> 4. You mentioned that the requests per sec has gone from 50-150k to
>> 100-150k. Was that a typo? Did the read rate really increase?
>> 5. You've listed the region sizes but was that done with a cursory hadoop
>> fs du? Have you tried using the hfile analyzer to verify number of rows and
>> sizes are roughly the same?
>> 5. profile the mappers. If you can share the task counters for a completed
>> and a still running task to compare, it might help find the issue
>> 6. I don't think you should underestimate the perf gains of node local
>> tasks vs just rack local, especially if short circuit reads are enabled.
>> This is a big gamble unfortunately given how far your tasks have been
>> running already so I'd look at this as a last resort
>>
>>
>> HTH,
>> Chien
>>
>> On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
>> wrote:
>>
>>> I've noticed that I've omitted
>>>
>>> scan.setCaching(500);// 1 is the default in Scan, which will
>>> be bad for MapReduce jobs
>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>
>>> which appear to be suggestions from examples. Still I am not sure if
>>> this explains the significant request slowdown on the final 25% of the
>>> jobs.
>>>
>>> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
>>> wrote:
>>> > Excuse my double post. I thought I deleted my draft, and then
>>> > constructed a cleaner, more detailed, more readable mail.
>>> >
>>> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
>>> wrote:
>>> >> After trying to get help with distcp on hadoop-user and cdh-user
>>> >> mailing lists, I've given up on trying to use distcp and exporttable
>>> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>>> >>
>>> >> I've been working on an hbase map reduce job to serialize my entries
>>> >> and insert them into kafka. Then I plan to re-import them into
>>> >> cdh5.3.0.
>>> >>
>>> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>>> >> 33 which have finished successfully, and 10 which are currently still
>>> >> running. I had previously seen requests of 50-150k per second. Now for
>>> >> the final 10 maps, I'm seeing 100-150k per minute.
>>> >>
>>> >> I might also mention that there were 6 failures near the application
>>> >> start. Unfortunately, I cannot read the logs for these 6 failures.
>>> >> There is an exception related to the yarn logging for these maps,
>>> >> maybe because they failed to start.
>>> >>
>>> >> I had a look around HDFS. It appears that the regions are all between
>>> >> 5-10GB. The longest completed map so far took 7 hours, with the
>>> >> majority appearing to take around 3.5 hours .
>>> >>
>>> >> The remaining 10 maps have each been running between 23-27 hours.
>>> >>
>>> >> Considering data locality issues. 6 of the remaining jobs are running
>>> >> on the same rack. Then the other 4 are split between my other two
>>> >> racks. There should currently be a replica on each rack, since it
>>> >> appears the replicas are set to 3. Then I'm not sure this is really
>>> >> the cause of the slowdown.
>>> >>
>>> >> Then I'm looking for advice on what I can do to troubleshoot my job.
>>> >> I'm setting up my map job like:
>>> >>
>>> >> main(String[] args){
>>> >> ...
>>> >> Scan fromScan = new Scan();
>>> >> System.out.println(fromScan);
>>> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
>>> Map.class,
>>> >> null, null, job, true, TableInputFormat.class);
>>> >>
>>> >> // My guess is this contols the output 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-13 Thread Colin Kincaid Williams
Hi Chien,

4. From  50-150k per * second * to 100-150k per * minute *, as stated
above, so reads went *DOWN* significantly. I think you must have
misread.

I will take into account some of your other suggestions.

Thanks,

Colin

On Tue, Apr 12, 2016 at 8:19 PM, Chien Le  wrote:
> Some things I would look at:
> 1. Node statistics, both the mapper and regionserver nodes. Make sure
> they're on fully healthy nodes (no disk issues, no half duplex, etc) and
> that they're not already saturated from other jobs.
> 2. Is there a common regionserver behind the remaining mappers/regions? If
> so, try moving some regions off to spread the load.
> 3. Verify the locality of the region blocks to the regionserver. If you
> don't automate major compacts or have moved regions recently, mapper
> locality might not help. Major compact if needed or move regions if you can
> determine source?
> 4. You mentioned that the requests per sec has gone from 50-150k to
> 100-150k. Was that a typo? Did the read rate really increase?
> 5. You've listed the region sizes but was that done with a cursory hadoop
> fs du? Have you tried using the hfile analyzer to verify number of rows and
> sizes are roughly the same?
> 5. profile the mappers. If you can share the task counters for a completed
> and a still running task to compare, it might help find the issue
> 6. I don't think you should underestimate the perf gains of node local
> tasks vs just rack local, especially if short circuit reads are enabled.
> This is a big gamble unfortunately given how far your tasks have been
> running already so I'd look at this as a last resort
>
>
> HTH,
> Chien
>
> On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
> wrote:
>
>> I've noticed that I've omitted
>>
>> scan.setCaching(500);// 1 is the default in Scan, which will
>> be bad for MapReduce jobs
>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>
>> which appear to be suggestions from examples. Still I am not sure if
>> this explains the significant request slowdown on the final 25% of the
>> jobs.
>>
>> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
>> wrote:
>> > Excuse my double post. I thought I deleted my draft, and then
>> > constructed a cleaner, more detailed, more readable mail.
>> >
>> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
>> wrote:
>> >> After trying to get help with distcp on hadoop-user and cdh-user
>> >> mailing lists, I've given up on trying to use distcp and exporttable
>> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>> >>
>> >> I've been working on an hbase map reduce job to serialize my entries
>> >> and insert them into kafka. Then I plan to re-import them into
>> >> cdh5.3.0.
>> >>
>> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>> >> 33 which have finished successfully, and 10 which are currently still
>> >> running. I had previously seen requests of 50-150k per second. Now for
>> >> the final 10 maps, I'm seeing 100-150k per minute.
>> >>
>> >> I might also mention that there were 6 failures near the application
>> >> start. Unfortunately, I cannot read the logs for these 6 failures.
>> >> There is an exception related to the yarn logging for these maps,
>> >> maybe because they failed to start.
>> >>
>> >> I had a look around HDFS. It appears that the regions are all between
>> >> 5-10GB. The longest completed map so far took 7 hours, with the
>> >> majority appearing to take around 3.5 hours .
>> >>
>> >> The remaining 10 maps have each been running between 23-27 hours.
>> >>
>> >> Considering data locality issues. 6 of the remaining jobs are running
>> >> on the same rack. Then the other 4 are split between my other two
>> >> racks. There should currently be a replica on each rack, since it
>> >> appears the replicas are set to 3. Then I'm not sure this is really
>> >> the cause of the slowdown.
>> >>
>> >> Then I'm looking for advice on what I can do to troubleshoot my job.
>> >> I'm setting up my map job like:
>> >>
>> >> main(String[] args){
>> >> ...
>> >> Scan fromScan = new Scan();
>> >> System.out.println(fromScan);
>> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
>> Map.class,
>> >> null, null, job, true, TableInputFormat.class);
>> >>
>> >> // My guess is this contols the output type for the reduce function
>> >> base on setOutputKeyClass and setOutput value class from p.27 . Since
>> >> there is no reduce step, then this is currently null.
>> >> job.setOutputFormatClass(NullOutputFormat.class);
>> >> job.setNumReduceTasks(0);
>> >> job.submit();
>> >> ...
>> >> }
>> >>
>> >> I'm not performing a reduce step, and I'm traversing row keys like
>> >>
>> >> map(final ImmutableBytesWritable fromRowKey,
>> >> Result fromResult, Context context) throws IOException {
>> >> ...
>> >>   // should I assume that each keyvalue is a version of the stored
>> row?
>> >>   for 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Chien Le
Some things I would look at:
1. Node statistics, both the mapper and regionserver nodes. Make sure
they're on fully healthy nodes (no disk issues, no half duplex, etc) and
that they're not already saturated from other jobs.
2. Is there a common regionserver behind the remaining mappers/regions? If
so, try moving some regions off to spread the load.
3. Verify the locality of the region blocks to the regionserver. If you
don't automate major compacts or have moved regions recently, mapper
locality might not help. Major compact if needed or move regions if you can
determine source?
4. You mentioned that the requests per sec has gone from 50-150k to
100-150k. Was that a typo? Did the read rate really increase?
5. You've listed the region sizes but was that done with a cursory hadoop
fs du? Have you tried using the hfile analyzer to verify number of rows and
sizes are roughly the same?
6. Profile the mappers. If you can share the task counters for a completed
and a still-running task to compare, it might help find the issue.
7. I don't think you should underestimate the perf gains of node-local
tasks vs just rack local, especially if short circuit reads are enabled.
This is a big gamble unfortunately given how far your tasks have been
running already so I'd look at this as a last resort


HTH,
Chien

On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams 
wrote:

> I've noticed that I've omitted
>
> scan.setCaching(500);// 1 is the default in Scan, which will
> be bad for MapReduce jobs
> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>
> which appear to be suggestions from examples. Still I am not sure if
> this explains the significant request slowdown on the final 25% of the
> jobs.
>
> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams 
> wrote:
> > Excuse my double post. I thought I deleted my draft, and then
> > constructed a cleaner, more detailed, more readable mail.
> >
> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams 
> wrote:
> >> After trying to get help with distcp on hadoop-user and cdh-user
> >> mailing lists, I've given up on trying to use distcp and exporttable
> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
> >>
> >> I've been working on an hbase map reduce job to serialize my entries
> >> and insert them into kafka. Then I plan to re-import them into
> >> cdh5.3.0.
> >>
> >> Currently I'm having trouble with my map-reduce job. I have 43 maps,
> >> 33 which have finished successfully, and 10 which are currently still
> >> running. I had previously seen requests of 50-150k per second. Now for
> >> the final 10 maps, I'm seeing 100-150k per minute.
> >>
> >> I might also mention that there were 6 failures near the application
> >> start. Unfortunately, I cannot read the logs for these 6 failures.
> >> There is an exception related to the yarn logging for these maps,
> >> maybe because they failed to start.
> >>
> >> I had a look around HDFS. It appears that the regions are all between
> >> 5-10GB. The longest completed map so far took 7 hours, with the
> >> majority appearing to take around 3.5 hours .
> >>
> >> The remaining 10 maps have each been running between 23-27 hours.
> >>
> >> Considering data locality issues. 6 of the remaining jobs are running
> >> on the same rack. Then the other 4 are split between my other two
> >> racks. There should currently be a replica on each rack, since it
> >> appears the replicas are set to 3. Then I'm not sure this is really
> >> the cause of the slowdown.
> >>
> >> Then I'm looking for advice on what I can do to troubleshoot my job.
> >> I'm setting up my map job like:
> >>
> >> main(String[] args){
> >> ...
> >> Scan fromScan = new Scan();
> >> System.out.println(fromScan);
> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
> Map.class,
> >> null, null, job, true, TableInputFormat.class);
> >>
> >> // My guess is this contols the output type for the reduce function
> >> base on setOutputKeyClass and setOutput value class from p.27 . Since
> >> there is no reduce step, then this is currently null.
> >> job.setOutputFormatClass(NullOutputFormat.class);
> >> job.setNumReduceTasks(0);
> >> job.submit();
> >> ...
> >> }
> >>
> >> I'm not performing a reduce step, and I'm traversing row keys like
> >>
> >> map(final ImmutableBytesWritable fromRowKey,
> >> Result fromResult, Context context) throws IOException {
> >> ...
> >>   // should I assume that each keyvalue is a version of the stored
> row?
> >>   for (KeyValue kv : fromResult.raw()) {
> >> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
> >> kv.getValue());
> >> //TODO: ADD counter for each qualifier
> >>   }
> >>
> >>
> >>
> >> I've also have a list of simple questions.
> >>
> >> Has anybody experienced a significant slowdown on map jobs related to
> >> a portion of their hbase regions? If so what issues did you come
> >> across?
> >>
> >> 

Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
I've noticed that I've omitted

scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs

which appear to be suggestions from examples. Still I am not sure if
this explains the significant request slowdown on the final 25% of the
jobs.

On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams  wrote:
> Excuse my double post. I thought I deleted my draft, and then
> constructed a cleaner, more detailed, more readable mail.
>
> On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams  
> wrote:
>> After trying to get help with distcp on hadoop-user and cdh-user
>> mailing lists, I've given up on trying to use distcp and exporttable
>> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>>
>> I've been working on an hbase map reduce job to serialize my entries
>> and insert them into kafka. Then I plan to re-import them into
>> cdh5.3.0.
>>
>> Currently I'm having trouble with my map-reduce job. I have 43 maps,
>> 33 which have finished successfully, and 10 which are currently still
>> running. I had previously seen requests of 50-150k per second. Now for
>> the final 10 maps, I'm seeing 100-150k per minute.
>>
>> I might also mention that there were 6 failures near the application
>> start. Unfortunately, I cannot read the logs for these 6 failures.
>> There is an exception related to the yarn logging for these maps,
>> maybe because they failed to start.
>>
>> I had a look around HDFS. It appears that the regions are all between
>> 5-10GB. The longest completed map so far took 7 hours, with the
>> majority appearing to take around 3.5 hours .
>>
>> The remaining 10 maps have each been running between 23-27 hours.
>>
>> Considering data locality issues. 6 of the remaining jobs are running
>> on the same rack. Then the other 4 are split between my other two
>> racks. There should currently be a replica on each rack, since it
>> appears the replicas are set to 3. Then I'm not sure this is really
>> the cause of the slowdown.
>>
>> Then I'm looking for advice on what I can do to troubleshoot my job.
>> I'm setting up my map job like:
>>
>> main(String[] args){
>> ...
>> Scan fromScan = new Scan();
>> System.out.println(fromScan);
>> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
>> null, null, job, true, TableInputFormat.class);
>>
>> // My guess is this contols the output type for the reduce function
>> base on setOutputKeyClass and setOutput value class from p.27 . Since
>> there is no reduce step, then this is currently null.
>> job.setOutputFormatClass(NullOutputFormat.class);
>> job.setNumReduceTasks(0);
>> job.submit();
>> ...
>> }
>>
>> I'm not performing a reduce step, and I'm traversing row keys like
>>
>> map(final ImmutableBytesWritable fromRowKey,
>> Result fromResult, Context context) throws IOException {
>> ...
>>   // should I assume that each keyvalue is a version of the stored row?
>>   for (KeyValue kv : fromResult.raw()) {
>> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
>> kv.getValue());
>> //TODO: ADD counter for each qualifier
>>   }
>>
>>
>>
>> I've also have a list of simple questions.
>>
>> Has anybody experienced a significant slowdown on map jobs related to
>> a portion of their hbase regions? If so what issues did you come
>> across?
>>
>> Can I get a suggestion how to show which map corresponds to which
>> region, so I can troubleshoot from there? Is this already logged
>> somewhere by default, or is there a way to set this up with the
>> TableMapReduceUtil.initTableMapperJob ?
>>
>> Any other suggestions would be appreciated.


Re: HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
Excuse my double post. I thought I deleted my draft, and then
constructed a cleaner, more detailed, more readable mail.

On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams  wrote:
> After trying to get help with distcp on hadoop-user and cdh-user
> mailing lists, I've given up on trying to use distcp and exporttable
> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0
>
> I've been working on an hbase map reduce job to serialize my entries
> and insert them into kafka. Then I plan to re-import them into
> cdh5.3.0.
>
> Currently I'm having trouble with my map-reduce job. I have 43 maps,
> 33 which have finished successfully, and 10 which are currently still
> running. I had previously seen requests of 50-150k per second. Now for
> the final 10 maps, I'm seeing 100-150k per minute.
>
> I might also mention that there were 6 failures near the application
> start. Unfortunately, I cannot read the logs for these 6 failures.
> There is an exception related to the yarn logging for these maps,
> maybe because they failed to start.
>
> I had a look around HDFS. It appears that the regions are all between
> 5-10GB. The longest completed map so far took 7 hours, with the
> majority appearing to take around 3.5 hours .
>
> The remaining 10 maps have each been running between 23-27 hours.
>
> Considering data locality issues. 6 of the remaining jobs are running
> on the same rack. Then the other 4 are split between my other two
> racks. There should currently be a replica on each rack, since it
> appears the replicas are set to 3. Then I'm not sure this is really
> the cause of the slowdown.
>
> Then I'm looking for advice on what I can do to troubleshoot my job.
> I'm setting up my map job like:
>
> main(String[] args){
> ...
> Scan fromScan = new Scan();
> System.out.println(fromScan);
> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
> null, null, job, true, TableInputFormat.class);
>
> // My guess is this contols the output type for the reduce function
> base on setOutputKeyClass and setOutput value class from p.27 . Since
> there is no reduce step, then this is currently null.
> job.setOutputFormatClass(NullOutputFormat.class);
> job.setNumReduceTasks(0);
> job.submit();
> ...
> }
>
> I'm not performing a reduce step, and I'm traversing row keys like
>
> map(final ImmutableBytesWritable fromRowKey,
> Result fromResult, Context context) throws IOException {
> ...
>   // should I assume that each keyvalue is a version of the stored row?
>   for (KeyValue kv : fromResult.raw()) {
> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
> kv.getValue());
> //TODO: ADD counter for each qualifier
>   }
>
>
>
> I've also have a list of simple questions.
>
> Has anybody experienced a significant slowdown on map jobs related to
> a portion of their hbase regions? If so what issues did you come
> across?
>
> Can I get a suggestion how to show which map corresponds to which
> region, so I can troubleshoot from there? Is this already logged
> somewhere by default, or is there a way to set this up with the
> TableMapReduceUtil.initTableMapperJob ?
>
> Any other suggestions would be appreciated.


HBase mapreduce job crawls on final 25% of maps

2016-04-12 Thread Colin Kincaid Williams
After trying to get help with distcp on hadoop-user and cdh-user
mailing lists, I've given up on trying to use distcp and exporttable
to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0

I've been working on an hbase map reduce job to serialize my entries
and insert them into kafka. Then I plan to re-import them into
cdh5.3.0.

Currently I'm having trouble with my map-reduce job. I have 43 maps,
33 which have finished successfully, and 10 which are currently still
running. I had previously seen requests of 50-150k per second. Now for
the final 10 maps, I'm seeing 100-150k per minute.

I might also mention that there were 6 failures near the application
start. Unfortunately, I cannot read the logs for these 6 failures.
There is an exception related to the yarn logging for these maps,
maybe because they failed to start.

I had a look around HDFS. It appears that the regions are all between
5-10GB. The longest completed map so far took 7 hours, with the
majority appearing to take around 3.5 hours .

The remaining 10 maps have each been running between 23-27 hours.

Considering data locality issues. 6 of the remaining jobs are running
on the same rack. Then the other 4 are split between my other two
racks. There should currently be a replica on each rack, since it
appears the replicas are set to 3. Then I'm not sure this is really
the cause of the slowdown.

Then I'm looking for advice on what I can do to troubleshoot my job.
I'm setting up my map job like:

main(String[] args){
...
Scan fromScan = new Scan();
System.out.println(fromScan);
TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
null, null, job, true, TableInputFormat.class);

// My guess is this controls the output type for the reduce function,
// based on setOutputKeyClass and setOutputValueClass from p. 27. Since
// there is no reduce step, this is currently null.
job.setOutputFormatClass(NullOutputFormat.class);
job.setNumReduceTasks(0);
job.submit();
...
}

I'm not performing a reduce step, and I'm traversing row keys like

map(final ImmutableBytesWritable fromRowKey,
Result fromResult, Context context) throws IOException {
...
  // should I assume that each keyvalue is a version of the stored row?
  for (KeyValue kv : fromResult.raw()) {
ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
kv.getValue());
//TODO: ADD counter for each qualifier
  }
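
As an editorial note on the question in that loop: each KeyValue returned by Result.raw() is one cell (row, family, qualifier, timestamp, value), not automatically a distinct version of the row; a default Scan returns only the newest version of each column, so multiple versions per qualifier appear only if the scan asks for them. A hedged sketch:

for (KeyValue kv : fromResult.raw()) {
  byte[] family = kv.getFamily();        // column family of this cell
  byte[] qualifier = kv.getQualifier();  // column qualifier of this cell
  long timestamp = kv.getTimestamp();    // version timestamp of this cell
  byte[] value = kv.getValue();          // cell value
  // Several KeyValues share a qualifier only when the Scan requested
  // multiple versions, e.g. fromScan.setMaxVersions(3).
}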



I also have a list of simple questions.

Has anybody experienced a significant slowdown on map jobs related to
a portion of their hbase regions? If so what issues did you come
across?

Can I get a suggestion on how to show which map corresponds to which
region, so I can troubleshoot from there? Is this already logged
somewhere by default, or is there a way to set this up with
TableMapReduceUtil.initTableMapperJob?
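
For what it's worth, one way to tie a map task to its region (a sketch, not from this thread): TableInputFormat creates one TableSplit per region, and the split can be inspected from the mapper's setup():

import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;

@Override
protected void setup(Context context) {
  // Log which region (start/end row, region server) this map task is scanning.
  TableSplit split = (TableSplit) context.getInputSplit();
  System.out.println("region location=" + split.getRegionLocation()
      + " startRow=" + Bytes.toStringBinary(split.getStartRow())
      + " endRow=" + Bytes.toStringBinary(split.getEndRow()));
}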

Any other suggestions would be appreciated.


HBASE mapReduce stoppage

2015-05-20 Thread dchrimes
We are bulk loading 1 billion rows into HBase. The 1-billion-row file was
split into 20 files of ~22.5 GB each. Ingesting a file to HDFS took ~2 min.
Ingesting the first file to HBase took ~3 hours, the next took ~5 hours,
and the time keeps increasing. By the sixth or seventh file the ingestion just
stops (the MapReduce bulk load stops at 99% of the mappers and around 22% of
the reducers). We also noticed that as soon as the reducers start, the
progress of the job slows down.

The logs did not show any problem and we do not see any hot spotting (the
table is already salted). We are running out of ideas. A few questions to
get started:
1- Is the increasing MR time expected? Does MR need to sort the new data
again against the already ingested data?
2- Is there a way to speed this up, especially since our data is already
sorted? From 2 min on HDFS to 5 hours on HBase is a big gap. A word count
MapReduce job on 24 GB took only ~7 minutes. Removing the reducers from the
existing CSV bulk load will not help, as the mappers will emit the data in
a random order.

regards,

Dillon

Dillon Chrimes (PhD)
University of Victoria
Victoria BC Canada







Re: HBASE mapReduce stoppage

2015-05-20 Thread Esteban Gutierrez
Hi Dillon,

From the behavior that you are describing, it sounds like your table was not
pre-split. Also, when you say that you are bulk loading the data using MR, is
this an MR job that does Put(s) into HBase, or one that just generates HFiles
(if using importtsv you have both options) that are later bulk loaded via
the completebulkload command?

Have you looked into https://hbase.apache.org/book.html#arch.bulk.load for
how to perform a bulk load into HBase?
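
On the pre-split point, a minimal sketch of creating a pre-split table for salted keys (table name, family, and the 16 hex salt prefixes are hypothetical, since the actual schema isn't in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
    desc.addFamily(new HColumnDescriptor("cf"));
    // 15 split points give 16 regions, one per hex salt prefix "0".."f",
    // so the write load spreads across region servers from the first load.
    byte[][] splits = new byte[15][];
    for (int i = 1; i <= 15; i++) {
      splits[i - 1] = Bytes.toBytes(Integer.toHexString(i));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}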

cheers,
esteban.

--
Cloudera, Inc.


On Wed, May 20, 2015 at 3:01 PM, dchri...@uvic.ca wrote:

 We are bulk loading 1 billion rows into hbase. The 1 billion file was
 split into 20 files of ~22.5GB. Ingesting the file to hdfs took ~2min.
 Ingesting the first file to hbase took  ~3 hours. The next took ~5hours,
 then it is increasing. By the sixth or seventh file the ingestion just
 stops (mapReduce Bulk load stops at 99% of mapper and around 22% of the
 reducer). We also noticed that as soon as the reducers are starting, the
 progress of the job slows down.

 The logs did not show any problem and we do not see any hot spotting (the
 table is already salted). We are running out of ideas. Few questions to
 get started:
 1- Is the increase MR expected? Does MR need to sort the new data again
 the already ingested one?
 2- Is there a way to speed up this, especially that our data is already
 sorted? From 2min on hdfs to 5 hours on hbase is a big gap. A word count
 map reduce on 24GB took only ~7 minutes. Removing the reducers from the
 existing cvs bulk load will not help as the mappers will spit the data in
 a random order.

 regards,

 Dillon

 Dillon Chrimes (PhD)
 University of Victoria
 Victoria BC Canada








Re: HBase MapReduce in Kerberized cluster

2015-05-14 Thread Edward C. Skoviak
 I searched (current) 0.98 and branch-1 where I found:
./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java


Looking at both 0.98[1] and 0.98.6[2] on github I see TokenUtil as
part of hbase-server.

Is it necessary for us to add this call to TokenUtil to all MR jobs
interacting with HBase after kerberizing our clusters? Or is there a
better approach that won't require making this change?


Thanks,

Ed Skoviak


[1] https://github.com/apache/hbase/find/0.98.0

[2] https://github.com/apache/hbase/find/0.98.6


Re: HBase MapReduce in Kerberized cluster

2015-05-14 Thread Ted Yu
Please take a look at
HBASE-12493 User class should provide a way to re-use existing token

which went into 0.98.9

FYI
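
For readers landing on this thread: when the job is wired through TableMapReduceUtil, the delegation token is normally obtained for you; a hedged sketch against the 0.98-era API (verify the calls against your client version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "hbase-read-job");
// initTableMapperJob(...) calls initCredentials(job) internally; if the job is
// assembled by hand (as a Crunch pipeline may do), the HBase delegation token
// can be requested explicitly before submission:
TableMapReduceUtil.initCredentials(job);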

On Thu, May 14, 2015 at 8:37 AM, Edward C. Skoviak edward.skov...@gmail.com
 wrote:

  I searched (current) 0.98 and branch-1 where I found:

 ./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java


 Looking at both 0.98[1] and 0.98.6[2] on github I see TokenUtil as
 part of hbase-server.

 Is it necessary for us to add this call to TokenUtil to all MR jobs
 interacting with HBase after kerberizing our clusters? Or is there a
 better approach that won't require making this change?


 Thanks,

 Ed Skoviak


 [1] https://github.com/apache/hbase/find/0.98.0

 [2] https://github.com/apache/hbase/find/0.98.6



HBase MapReduce in Kerberized cluster

2015-05-13 Thread Edward C. Skoviak
I'm attempting to write a Crunch pipeline to read various rows from a table
in HBase and then do processing on these results. I am doing this from a
cluster deployed using CDH 5.3.2 running Kerberos and YARN.

I was hoping to get an answer on what is considered the best approach to
authenticate to HBase within MapReduce task execution context? I've perused
various posts/documentation and it seems that TokenUtil was, at least at
one point, the right approach, however I notice now it has been moved to be
a part of the hbase-server package (instead of hbase-client). Is there a
better way to retrieve and pass an HBase delegation token to the MR job
launched by my pipeline?

Thanks,
Ed Skoviak


Re: HBase MapReduce in Kerberized cluster

2015-05-13 Thread Ted Yu
bq. it has been moved to be a part of the hbase-server package

I searched (current) 0.98 and branch-1 where I found:
./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java

FYI

On Wed, May 13, 2015 at 11:45 AM, Edward C. Skoviak 
edward.skov...@gmail.com wrote:

 I'm attempting to write a Crunch pipeline to read various rows from a table
 in HBase and then do processing on these results. I am doing this from a
 cluster deployed using CDH 5.3.2 running Kerberos and YARN.

 I was hoping to get an answer on what is considered the best approach to
 authenticate to HBase within MapReduce task execution context? I've perused
 various posts/documentation and it seems that TokenUtil was, at least at
 one point, the right approach, however I notice now it has been moved to be
 a part of the hbase-server package (instead of hbase-client). Is there a
 better way to retrieve and pass an HBase delegation token to the MR job
 launched by my pipeline?

 Thanks,
 Ed Skoviak



A solution for data skew issue in HBase-Mapreduce jobs

2014-11-30 Thread yeweichen2...@gmail.com
Hi, all,

   I submitted a new patch to fix the data skew issue in HBase-MapReduce jobs.
Would you please take a look at this new patch and give me some advice?
   https://issues.apache.org/jira/browse/HBASE-12590 

  Example: 

   



yeweichen2...@gmail.com


Re: A solution for data skew issue in HBase-Mapreduce jobs

2014-11-30 Thread Ted Yu
Did you attach a screenshot ?

The attachment shows up as grey area.
Probably you can attach the image to JIRA.

Cheers

On Sun, Nov 30, 2014 at 6:57 PM, yeweichen2...@gmail.com 
yeweichen2...@gmail.com wrote:

 Hi, all,

I submit a new patch to fix the data skew issue in HBase-Mapreduce
 jobs. Would you please take a look at this new patch and give me some
 advice?
https://issues.apache.org/jira/browse/HBASE-12590
 https://issues.apache.org/jira/browse/HBASE-12590

   Example:



 --
 yeweichen2...@gmail.com



Re: Re: A solution for data skew issue in HBase-Mapreduce jobs

2014-11-30 Thread yeweichen2...@gmail.com
Yes, the screenshot does not come through well in mail.

You can get the simple pdf document from this link: 
https://issues.apache.org/jira/secure/attachment/12683977/A%20Solution%20for%20Data%20Skew%20in%20HBase-MapReduce%20Job.pdf

Cheers


yeweichen2...@gmail.com
 
From: Ted Yu
Date: 2014-12-01 11:00
To: user@hbase.apache.org
Subject: Re: A solution for data skew issue in HBase-Mapreduce jobs
Did you attach a screenshot ?
 
The attachment shows up as grey area.
Probably you can attach the image to JIRA.
 
Cheers
 
On Sun, Nov 30, 2014 at 6:57 PM, yeweichen2...@gmail.com 
yeweichen2...@gmail.com wrote:
 
 Hi, all,

I submit a new patch to fix the data skew issue in HBase-Mapreduce
 jobs. Would you please take a look at this new patch and give me some
 advice?
https://issues.apache.org/jira/browse/HBASE-12590
 https://issues.apache.org/jira/browse/HBASE-12590

   Example:



 --
 yeweichen2...@gmail.com



Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-02 Thread Parkirat
Hi Shahab,

Thanks for the response.

I have added the @Override and somehow that worked. I have pasted the new
Reducer code below.

Though, I did not understand what the difference is here, i.e. what I have done
differently. It might be a very silly reason though.

=
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
=

Regards,
Parkirat Bagga.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-02 Thread Arun Allamsetty
Hi,

The @Override annotation worked because, without it the reduce method in
the superclass (Reducer) was being invoked, which basically writes the
input from the mapper class to the context object. Try to look up the
source code for the Reducer class online and you'll realize that.
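
For reference, the reduce that runs when a subclass doesn't actually override it is (roughly, paraphrased from the Hadoop source) an identity pass-through, which is why mismatched reducers appear to be ignored:

// Roughly what org.apache.hadoop.mapreduce.Reducer does by default:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
// A reduce with a different parameter list (e.g. Iterator instead of Iterable)
// is a new overload, not an override, so the framework never calls it;
// @Override makes the compiler reject that mismatch.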

Hope that clears it up.

Cheers,
Arun

Sent from a mobile device. Please don't mind the typos.
Hi Shahab,

Thanks for the response.

I have added the @Override and somehow that worked. I have pasted the new
Reducer code below.

Though, I did not understood the difference here, as if what I have done
differently. I might be a very silly reason though.

=
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends ReducerText, IntWritable, Text,
IntWritable {

@Override
protected void reduce(Text key, IterableIntWritable values,
Context
context)
throws IOException, InterruptedException {

int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
=

Regards,
Parkirat Bagga.



--
View this message in context:
http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-02 Thread Shahab Yunus
Parkirat,

This is a core Java concept which is mainly related to how Class
inheritance works in Java and how the @Override annotation is used, and is
not Hadoop specific. (It is also used while implementing interfaces since
JDK 6.)

You can read about it here:
http://tutorials.jenkov.com/java/annotations.html#override
http://www.javapractices.com/topic/TopicAction.do?Id=223

Regards,
Shahab


On Sat, Aug 2, 2014 at 2:56 AM, Arun Allamsetty arun.allamse...@gmail.com
wrote:

 Hi,

 The @Override annotation worked because, without it the reduce method in
 the superclass (Reducer) was being invoked, which basically writes the
 input from the mapper class to the context object. Try to look up the
 source code for the Reducer class online and you'll realize that.

 Hope that clears it up.

 Cheers,
 Arun

 Sent from a mobile device. Please don't mind the typos.
 Hi Shahab,

 Thanks for the response.

 I have added the @Override and somehow that worked. I have pasted the new
 Reducer code below.

 Though, I did not understood the difference here, as if what I have done
 differently. I might be a very silly reason though.

 =
 package com.test.hadoop;

 import java.io.IOException;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;

 public class WordCountReducer extends ReducerText, IntWritable, Text,
 IntWritable {

 @Override
 protected void reduce(Text key, IterableIntWritable values,
 Context
 context)
 throws IOException, InterruptedException {

 int sum = 0;
 for (IntWritable val : values) {
 sum += val.get();
 }
 context.write(key, new IntWritable(sum));
 }
 }
 =

 Regards,
 Parkirat Bagga.



 --
 View this message in context:

 http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html
 Sent from the HBase User mailing list archive at Nabble.com.



Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-01 Thread Arun Allamsetty
Hi Parkirat,

I don't think that HBase is causing the problems. You might already know
this but need to add the reducer class to the job as you add the mapper.
Also, if you want to read from a HBase table in a MapReduce job, you need
to implement the TableMapper for the mapper and if you want to write to a
file on HDFS instead of a HBase table, you need to implement the generic
Reducer class for the reducer.
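
A hedged sketch of that wiring (MyTableMapper, MyReducer, and the table name are placeholders for your own classes; the output path matches the one shown later in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "hbase-to-hdfs");

Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);

// Mapper side: read from the HBase table with a TableMapper subclass that
// emits Text/Text, as described above.
TableMapReduceUtil.initTableMapperJob("mytable", scan,
    MyTableMapper.class, Text.class, Text.class, job);

// Reducer side: a plain Reducer that writes Text/NullWritable to HDFS.
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/wc/output"));

System.exit(job.waitForCompletion(true) ? 0 : 1);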

But again, if you can show everyone the code, people will be able to help
you better.

Cheers,
Arun

Sent from a mobile device. Please don't mind the typos.
On Jul 31, 2014 1:07 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 Hi Parkirat,

 I don't follow the reducer problem you're having. Can you post your code
 that configures the job? I assume you're using TableMapReduceUtil
 someplace.

 Your reducer is removing duplicate values? Sounds like you need to update
 it's logic to only emit a value once. Pastebin-ing your reducer code may be
 helpful as well.

 -n


 On Thu, Jul 31, 2014 at 8:20 AM, Parkirat parkiratbigd...@gmail.com
 wrote:

  Hi All,
 
  I am using Mapreduce API to read Hbase Table, based on some scan
 operation
  in mapper and putting the data to a file in reducer.
  I am using Hbase Version Version 0.94.5.23.
 
  *Problem:*
  Now in my job, my mapper output a key as text and value as text, but my
  reducer output key as text and value as nullwritable, but it seems *hbase
  mapreduce api dont consider reducer*, and outputs both key and value as
  text.
 
  Moreover if the same key comes twice, it goes to the file twice, even if
 my
  reducer want to log it only once.
 
  Could anybody help me with this problem?
 
  Regards,
  Parkirat Singh Bagga.
 
 
 
  --
  View this message in context:
 
 http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html
  Sent from the HBase User mailing list archive at Nabble.com.
 



Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-01 Thread Parkirat
]
14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available
14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
14/08/01 20:52:42 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 20:52:54 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 20:53:02 INFO mapred.JobClient:  map 100% reduce 33%
14/08/01 20:53:04 INFO mapred.JobClient:  map 100% reduce 100%
14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090
14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
14/08/01 20:53:05 INFO mapred.JobClient:   Job Counters
14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719
14/08/01 20:53:05 INFO mapred.JobClient:   File Output Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119
14/08/01 20:53:05 INFO mapred.JobClient:   FileSystemCounters
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214
14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948
14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119
14/08/01 20:53:05 INFO mapred.JobClient:   File Input Format Counters
14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83
14/08/01 20:53:05 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized
bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1
14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197
14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36
14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155
14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770
14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage
(bytes)=398393344
14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0
14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18
14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15
14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0
14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes)
snapshot=385605632
14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18
14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=2707595264
14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18


*Generated Output File:*

-bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-0
Hadoop  1
This1
an  1
as  1
example 1
example 1
fine1
if  1
is  1
not.1
or  1
so  1
test1
test1
this1
to  1
to  1
works   1


Regards,
Parkirat Bagga




--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p406.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-08-01 Thread Shahab Yunus
/wc/output
 14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for
 parsing the arguments. Applications should implement Tool for the same.
 14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process
 :
 1
 14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
 14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded  initialized
 native-lzo library [hadoop-lzo rev
 cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
 14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is
 available
 14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
 14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
 14/08/01 20:52:42 INFO mapred.JobClient:  map 0% reduce 0%
 14/08/01 20:52:54 INFO mapred.JobClient:  map 100% reduce 0%
 14/08/01 20:53:02 INFO mapred.JobClient:  map 100% reduce 33%
 14/08/01 20:53:04 INFO mapred.JobClient:  map 100% reduce 100%
 14/08/01 20:53:05 INFO mapred.JobClient: Job complete:
 job_201404021234_0090
 14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
 14/08/01 20:53:05 INFO mapred.JobClient:   Job Counters
 14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1
 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171
 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all
 reduces
 waiting after reserving slots (ms)=0
 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps
 waiting after reserving slots (ms)=0
 14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1
 14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1
 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719
 14/08/01 20:53:05 INFO mapred.JobClient:   File Output Format Counters
 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119
 14/08/01 20:53:05 INFO mapred.JobClient:   FileSystemCounters
 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197
 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214
 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948
 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119
 14/08/01 20:53:05 INFO mapred.JobClient:   File Input Format Counters
 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83
 14/08/01 20:53:05 INFO mapred.JobClient:   Map-Reduce Framework
 14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized
 bytes=197
 14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1
 14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197
 14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36
 14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155
 14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770
 14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage
 (bytes)=398393344
 14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0
 14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131
 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18
 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15
 14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0
 14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes)
 snapshot=385605632
 14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18
 14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes)
 snapshot=2707595264
 14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18
 

 *Generated Output File:*
 
 -bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-0
 Hadoop  1
 This1
 an  1
 as  1
 example 1
 example 1
 fine1
 if  1
 is  1
 not.1
 or  1
 so  1
 test1
 test1
 this1
 to  1
 to  1
 works   1
 

 Regards,
 Parkirat Bagga




 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p406.html
 Sent from the HBase User mailing list archive at Nabble.com.



Hbase Mapreduce API - Reduce to a file is not working properly.

2014-07-31 Thread Parkirat
Hi All,

I am using the MapReduce API to read an HBase table, based on some scan operation
in the mapper, and putting the data to a file in the reducer.
I am using HBase version 0.94.5.23.

*Problem:*
Now in my job, my mapper outputs the key as Text and the value as Text, and my
reducer outputs the key as Text and the value as NullWritable, but it seems the
*hbase mapreduce api does not consider the reducer*, and outputs both key and
value as Text.

Moreover, if the same key comes twice, it goes to the file twice, even if my
reducer wants to log it only once.

Could anybody help me with this problem?

Regards,
Parkirat Singh Bagga.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-07-31 Thread Nick Dimiduk
Hi Parkirat,

I don't follow the reducer problem you're having. Can you post your code
that configures the job? I assume you're using TableMapReduceUtil someplace.

Your reducer is removing duplicate values? Sounds like you need to update
it's logic to only emit a value once. Pastebin-ing your reducer code may be
helpful as well.

-n


On Thu, Jul 31, 2014 at 8:20 AM, Parkirat parkiratbigd...@gmail.com wrote:

 Hi All,

 I am using Mapreduce API to read Hbase Table, based on some scan operation
 in mapper and putting the data to a file in reducer.
 I am using Hbase Version Version 0.94.5.23.

 *Problem:*
 Now in my job, my mapper output a key as text and value as text, but my
 reducer output key as text and value as nullwritable, but it seems *hbase
 mapreduce api dont consider reducer*, and outputs both key and value as
 text.

 Moreover if the same key comes twice, it goes to the file twice, even if my
 reducer want to log it only once.

 Could anybody help me with this problem?

 Regards,
 Parkirat Singh Bagga.



 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html
 Sent from the HBase User mailing list archive at Nabble.com.



Re: HBase MapReduce problem

2014-02-04 Thread Murali
Hi Ted,
 
   I tried your solution, but I got the same error message.

Thanks








Re: HBase MapReduce problem

2014-02-04 Thread Ted Yu
Did you create the table prior to launching your program ?
If so, when you scan hbase:meta table, do you see row(s) for it ?

Cheers

On Feb 4, 2014, at 12:53 AM, Murali muralidha...@veradistech.com wrote:

 Hi Ted,
 
   I am trying your solution. But I got the same error message.
 
 Thanks
 
 
 
 
 
 


Re: HBase MapReduce problem

2014-02-03 Thread Murali
Hi Ted,

   I am using HBase version 0.96, but I am also getting the below error
message:

14/02/03 10:18:32 ERROR mapreduce.TableOutputFormat:
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for  after 35 tries.
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for  after 35 tries.
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:211)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:453)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
        at com.hbase.HBaseWC.run(HBaseWC.java:78)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at com.hbase.HBaseWC.main(HBaseWC.java:83)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for  after 35 tries.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1127)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:325)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:191)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:149)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:206)
        ... 19 more

 May I know how to fix it?

Thanks



Re: HBase MapReduce problem

2014-02-03 Thread Ted Yu
Murali:
Are you using 0.96.1.1 ?
Can you show us the command line you used ?

Meanwhile I assume the HBase cluster is functional - you can use shell to
insert data.

Cheers


On Mon, Feb 3, 2014 at 8:33 PM, Murali muralidha...@veradistech.com wrote:

 Hi Ted,

I am using HBase 0.96 version. But I am also getting the below error
 message

 14/02/03 10:18:32 ERROR mapreduce.TableOutputFormat:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
 region for  after 35 tries.
 Exception in thread main java.lang.RuntimeException:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
 region for  after 35 tries.
 at

 org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma
 t.java:211)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at

 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:453)
 at

 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java
 :342)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja
 va:1491)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
 at com.hbase.HBaseWC.run(HBaseWC.java:78)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at com.hbase.HBaseWC.main(HBaseWC.java:83)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39
 )
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
 .java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException:
 Unable
 to find region for  after 35 tries.
 at

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
 locateRegionInMeta(HConnectionManager.java:1127)
 at

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
 locateRegion(HConnectionManager.java:1047)
 at

 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
 locateRegion(HConnectionManager.java:1004)
 at
 org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:325)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:191)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:149)
 at

 org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma
 t.java:206)
 ... 19 more

  May I know how to fix it?

 Thanks




Re: HBase MapReduce problem

2014-02-03 Thread Murali
Hi Ted

  Thanks for your reply. I am using HBase version 0.96.0. I can insert a
record using the shell. I am running the below command to run my
MapReduce job. It is a word count example: it reads a text file from an HDFS
path and inserts the counts into an HBase table.

hadoop jar hb.jar com.hbase.HBaseWC

Thanks



Re: HBase MapReduce problem

2014-02-03 Thread Ted Yu
See the sample command in
http://hbase.apache.org/book.html#trouble.mapreduce :

HADOOP_CLASSPATH=`hbase classpath` hadoop jar
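
For the word-count job from this thread, that would be something along the lines of
(assuming hb.jar sits in the current directory):

HADOOP_CLASSPATH=`hbase classpath` hadoop jar hb.jar com.hbase.HBaseWC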



On Mon, Feb 3, 2014 at 9:33 PM, Murali muralidha...@veradistech.com wrote:

 Hi Ted

   Thanks for your reply. I am using HBase version 0.96.0. I can insert a
 record using shell command. I am running the below command to run my
 MapReduce job. It is a word count example. Reading a text file from hdfs
 file
 path and insert the counts to HBase table.

 hadoop jar hb.jar com.hbase.HBaseWC

 Thanks




HBase MapReduce with setup function problem

2014-01-27 Thread daidong
Dear all,

  I am writing a MapReduce application processing an HBase table. In each map,
it needs to read data from another HBase table, so I use the 'setup'
function to initialize the HTable instance like this:

@Override

public void setup(Context context){

  Configuration conf = HBaseConfiguration.create();

  try {

   centrals = new HTable(conf, "central".getBytes());

  } catch (IOException e) {

  }

  return;

}

But when I run this MapReduce application, it always stays at 0%, 0%, and
the map phase is always initializing and does not progress. I have
googled it, but still do not have any ideas.

P.S. I use hadoop-1.1.2 and Hbase-0.96.

Thanks!

- Dong


Re: HBase MapReduce with setup function problem

2014-01-27 Thread Ted Yu
Have you considered using MultiTableInputFormat ?

Cheers
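
For reference, a rough sketch of the MultiTableInputFormat route inside the job
setup. This assumes an HBase version that ships it (0.94.5+/0.96); the table
names, the mapper class and the Job instance 'job' are placeholders, not taken
from the original post:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

List<Scan> scans = new ArrayList<Scan>();
for (String tableName : new String[] { "main_table", "central" }) {
  Scan scan = new Scan();
  // each Scan carries the name of the table it should run against
  scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
  scans.add(scan);
}
// one mapper job fed by splits from both tables
TableMapReduceUtil.initTableMapperJob(scans, MyMultiTableMapper.class,
    ImmutableBytesWritable.class, Result.class, job);

Inside MyMultiTableMapper (a TableMapper<ImmutableBytesWritable, Result>) you can
tell the rows apart via the input split (a TableSplit) if needed.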


On Mon, Jan 27, 2014 at 9:14 AM, daidong daidon...@gmail.com wrote:

 Dear all,

   I am writing a MapReduce application processing HBase table. In each map,
 it needs to read data from another HBase table, so i use the 'setup'
 function to initialize the HTable instance like this:

 @Override

 public void setup(Context context){

   Configuration conf = HBaseConfiguration.create();

   try {

centrals = new HTable(conf, central.getBytes());

   } catch (IOException e) {

   }

   return;

 }

 But, when i run this mapreduce application, it is always stay in 0%,0%. And
 the map phase is always under initializing, does not progress. I have
 googled it, but still do not have any ideas.

 P.S. I use hadoop-1.1.2 and Hbase-0.96.

 Thanks!

 - Dong



Re: HBase MapReduce with setup function problem

2014-01-27 Thread Ted Yu
I agree that we should find the cause for why initialization got stuck.
I noticed empty catch block:
  } catch (IOException e) {

  }
Can you add some logging there to see what might have gone wrong ?

Thanks
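
For illustration, a minimal sketch of that setup() inside the mapper class, with
the job's Configuration reused and the catch block made noisy along the lines
suggested above (the commons-logging logger and the RuntimeException are
assumptions, not from the original post):

private static final Log LOG = LogFactory.getLog(MyMapper.class); // org.apache.commons.logging
private HTable centrals;

@Override
public void setup(Context context) {
  // reuse the configuration the job was submitted with, instead of building
  // a fresh one that may miss the cluster settings shipped with the job
  Configuration conf = HBaseConfiguration.create(context.getConfiguration());
  try {
    centrals = new HTable(conf, "central".getBytes());
  } catch (IOException e) {
    // log and fail fast rather than silently swallowing the exception
    LOG.error("Could not open table 'central'", e);
    throw new RuntimeException(e);
  }
}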


On Mon, Jan 27, 2014 at 11:56 AM, daidong daidon...@gmail.com wrote:

 Dear Ted, Thanks very much for your reply!

  Yes, MultiTableInputFormat may work here, but I still want to know how to
  connect to an HBase table inside MapReduce applications, because I may also
  need to write to tables inside the map function.

  Do you know why the previous MR application does not work? Is it because of
  the wrong Configuration instance? Any suggestion would be great for me! Thanks!

 - Dong


 2014-01-27 Ted Yu yuzhih...@gmail.com

  Have you considered using MultiTableInputFormat ?
 
  Cheers
 
 
  On Mon, Jan 27, 2014 at 9:14 AM, daidong daidon...@gmail.com wrote:
 
   Dear all,
  
 I am writing a MapReduce application processing HBase table. In each
  map,
   it needs to read data from another HBase table, so i use the 'setup'
   function to initialize the HTable instance like this:
  
   @Override
  
   public void setup(Context context){
  
 Configuration conf = HBaseConfiguration.create();
  
 try {
  
  centrals = new HTable(conf, central.getBytes());
  
 } catch (IOException e) {
  
 }
  
 return;
  
   }
  
   But, when i run this mapreduce application, it is always stay in 0%,0%.
  And
   the map phase is always under initializing, does not progress. I have
   googled it, but still do not have any ideas.
  
   P.S. I use hadoop-1.1.2 and Hbase-0.96.
  
   Thanks!
  
   - Dong
  
 



HBase MapReduce problem

2014-01-24 Thread daidong
Dear all,

  I have a simple HBase MapReduce application and am trying to run it on a
12-node cluster using this command:

HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar
.jar org.test.WordCount

HBase version is 0.95.0. But i got this error:

java.lang.RuntimeException:
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
region for wordcount,,99 after 10 tries.

after a long waiting from this output: 14/01/24 10:12:51 INFO
mapred.JobClient: map 0% reduce 0%

It seems like a RegionServer problem; however, I have tried everything
to make sure the HBase cluster is running well: I can access all
region servers through the web UI, I can see the table 'wordcount' there,
I run 'hbase hbck' and it returns all 'ok'. I even tried a simple HBase
program without MapReduce, and it works well too.

So, could anybody tell me why this happens and how to fix it? Is it
related to my Hadoop configuration? Thanks!

- Dong Dai


Re: HBase MapReduce problem

2014-01-24 Thread Ted Yu
Why do you use 0.95 which was a developer release ?

See http://hbase.apache.org/book.html#d243e520

Cheers


On Fri, Jan 24, 2014 at 8:40 AM, daidong daidon...@gmail.com wrote:

 Dear all,

   I have a simple HBase MapReduce application and try to run it on a
 12-node cluster using this command:

 HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar
 .jar org.test.WordCount

 HBase version is 0.95.0. But i got this error:

 java.lang.RuntimeException:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
 region for wordcount,,99 after 10 tries.

 after a long waiting from this output: 14/01/24 10:12:51 INFO
 mapred.JobClient: map 0% reduce 0%

 It seems like RegionServer problem, however, i have tried all the ways
 to make sure the HBase cluster is running well: i can access all
 region servers through web ui, i can see the table 'wordcount' there,
 i run 'hbase hbck' and it returns all 'ok'. I even try a simple HBase
 program without MapReduce, it works well too.

 So, Could anybody tell me why this happens? how to fix it? Is it
 relevant to my Hadoop configuration? Thanks!

 - Dong Dai



Re: HBase MapReduce problem

2014-01-24 Thread daidong
Thanks Ted. I am actually trying to modify HBase, so I chose this developer
release.

So you are thinking this is a version problem that should disappear if I
switch to 0.96?


2014/1/24 Ted Yu yuzhih...@gmail.com

 Why do you use 0.95 which was a developer release ?

 See http://hbase.apache.org/book.html#d243e520

 Cheers


 On Fri, Jan 24, 2014 at 8:40 AM, daidong daidon...@gmail.com wrote:

  Dear all,
 
I have a simple HBase MapReduce application and try to run it on a
  12-node cluster using this command:
 
  HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar
  .jar org.test.WordCount
 
  HBase version is 0.95.0. But i got this error:
 
  java.lang.RuntimeException:
  org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
  region for wordcount,,99 after 10 tries.
 
  after a long waiting from this output: 14/01/24 10:12:51 INFO
  mapred.JobClient: map 0% reduce 0%
 
  It seems like RegionServer problem, however, i have tried all the ways
  to make sure the HBase cluster is running well: i can access all
  region servers through web ui, i can see the table 'wordcount' there,
  i run 'hbase hbck' and it returns all 'ok'. I even try a simple HBase
  program without MapReduce, it works well too.
 
  So, Could anybody tell me why this happens? how to fix it? Is it
  relevant to my Hadoop configuration? Thanks!
 
  - Dong Dai
 



FILE_BYTES_READ counter missing for HBase mapreduce job

2013-09-05 Thread Haijia Zhou
Hi,
 Basically I have a MapReduce job that scans an HBase table and does some
processing. After the job finishes, I only get three filesystem counters:
HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN.
 The value of HDFS_BYTES_READ is not very useful here because it shows the
size of the .META file, not the size of input records.
 I am looking for counter FILE_BYTES_READ but somehow it's missing in the
job status report.

 Does anyone know what I might miss here?

 Thanks
Haijia

P.S. The job status report
 FileSystemCounters
HDFS_BYTES_READ
 340,124  0   340,124
FILE_BYTES_WRITTEN
190,431,329  0   190,431,329
HDFS_BYTES_WRITTEN
272,538,467,123  0   272,538,467,123


Re: FILE_BYTES_READ counter missing for HBase mapreduce job

2013-09-05 Thread Haijia Zhou
Addition info:
The MapReduce job I run is a map-only job. It does not have reducers, and it
writes data directly to HDFS in the mapper.
 Could this be the reason why there's no value for FILE_BYTES_READ?
 If so, is there any easy way to get the total input data size?

 Thanks
Haijia


On Thu, Sep 5, 2013 at 2:46 PM, Haijia Zhou leons...@gmail.com wrote:

 Hi,
  Basically I have a mapreduce job to scan a hbase table and do some
 processing. After the job finishes, I only got three filesystem counters:
 HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN.
  The value of HDFS_BYTES_READ is not very useful here because it shows the
 size of the .META file, not the size of input records.
  I am looking for counter FILE_BYTES_READ but somehow it's missing in the
 job status report.

  Does anyone know what I might miss here?

  Thanks
 Haijia

 P.S. The job status report
  FileSystemCounters
 HDFS_BYTES_READ
340,124  0   340,124
 FILE_BYTES_WRITTEN
 190,431,329  0   190,431,329
 HDFS_BYTES_WRITTEN
 272,538,467,123  0   272,538,467,123



issure about DNS error in running hbase mapreduce

2013-08-01 Thread ch huang
I used hadoop-dns-checker to check for DNS problems and everything seems OK,
but when I run an MR task against HBase it reports a problem. Does anyone have a good idea?

# ./run-on-cluster.sh hosts1
 CH22 
The authenticity of host 'ch22 (192.168.10.22)' can't be established.
RSA key fingerprint is f3:4a:ca:a3:17:08:98:c2:0a:bd:27:99:a3:65:bc:89.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ch22,192.168.10.22' (RSA) to the list of known
hosts.
root@ch22's password:
sending incremental file list
created directory hadoop-dns
a.jar
hosts1
run.sh
sent 2394 bytes  received 69 bytes  547.33 bytes/sec
total size is 2618  speedup is 1.06
root@ch22's password:
# self check...
-- host : CH22
   host lookup : success (192.168.10.22)
   reverse lookup : success (CH22)
   is reachable : yes
# end self check
 Running on : CH22/192.168.10.22 =
-- host : CH22
   host lookup : success (192.168.10.22)
   reverse lookup : success (CH22)
   is reachable : yes
-- host : CH34
   host lookup : success (192.168.10.34)
   reverse lookup : success (CH34)
   is reachable : yes
-- host : CH35
   host lookup : success (192.168.10.35)
   reverse lookup : success (CH35)
   is reachable : yes
-- host : CH36
   host lookup : success (192.168.10.36)
   reverse lookup : success (CH36)
   is reachable : yes

 CH34 
root@ch34's password:
sending incremental file list
created directory hadoop-dns
a.jar
hosts1
run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch34's password:
# self check...
-- host : CH34
   host lookup : success (192.168.10.34)
   reverse lookup : success (CH34)
   is reachable : yes
# end self check
 Running on : CH34/192.168.10.34 =
-- host : CH22
   host lookup : success (192.168.10.22)
   reverse lookup : success (CH22)
   is reachable : yes
-- host : CH34
   host lookup : success (192.168.10.34)
   reverse lookup : success (CH34)
   is reachable : yes
-- host : CH35
   host lookup : success (192.168.10.35)
   reverse lookup : success (CH35)
   is reachable : yes
-- host : CH36
   host lookup : success (192.168.10.36)
   reverse lookup : success (CH36)
   is reachable : yes
 CH35 
root@ch35's password:
sending incremental file list
created directory hadoop-dns
a.jar
hosts1
run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch35's password:
# self check...
-- host : CH35
   host lookup : success (192.168.10.35)
   reverse lookup : success (CH35)
   is reachable : yes
# end self check
 Running on : CH35/192.168.10.35 =
-- host : CH22
   host lookup : success (192.168.10.22)
   reverse lookup : success (CH22)
   is reachable : yes
-- host : CH34
   host lookup : success (192.168.10.34)
   reverse lookup : success (CH34)
   is reachable : yes
-- host : CH35
   host lookup : success (192.168.10.35)
   reverse lookup : success (CH35)
   is reachable : yes
-- host : CH36
   host lookup : success (192.168.10.36)
   reverse lookup : success (CH36)
   is reachable : yes
 CH36 
root@ch36's password:
sending incremental file list
created directory hadoop-dns
a.jar
hosts1
run.sh
sent 2394 bytes  received 69 bytes  703.71 bytes/sec
total size is 2618  speedup is 1.06
root@ch36's password:
# self check...
-- host : CH36
   host lookup : success (192.168.10.36)
   reverse lookup : success (CH36)
   is reachable : yes
# end self check
 Running on : CH36/192.168.10.36 =
-- host : CH22
   host lookup : success (192.168.10.22)
   reverse lookup : success (CH22)
   is reachable : yes
-- host : CH34
   host lookup : success (192.168.10.34)
   reverse lookup : success (CH34)
   is reachable : yes
-- host : CH35
   host lookup : success (192.168.10.35)
   reverse lookup : success (CH35)
   is reachable : yes
-- host : CH36
   host lookup : success (192.168.10.36)
   reverse lookup : success (CH36)
   is reachable : yes

#  yarn jar mapreducehbaseTest.jar com.mediaadx.hbase.hadoop.test.TxtHbase
'' ''
13/08/02 13:10:17 WARN conf.Configuration: dfs.df.interval is deprecated.
Instead, use fs.df.interval
13/08/02 13:10:17 WARN conf.Configuration: hadoop.native.lib is deprecated.
Instead, use io.native.lib.available
13/08/02 13:10:17 WARN conf.Configuration: fs.default.name is deprecated.
Instead, use fs.defaultFS
13/08/02 13:10:17 WARN conf.Configuration: topology.script.number.args is
deprecated. Instead, use net.topology.script.number.args
13/08/02 13:10:17 WARN conf.Configuration: dfs.umaskmode is deprecated.
Instead, use fs.permissions.umask-mode
13/08/02 13:10:17 WARN conf.Configuration:
topology.node.switch.mapping.impl is deprecated. Instead, use
net.topology.node.switch.mapping.impl
13/08/02 13:10:17 WARN conf.Configuration: session.id is deprecated.
Instead, use dfs.metrics.session-id
13/08/02 13:10:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
13/08/02 13:10:17 WARN conf.Configuration: slave.host.name is deprecated.
Instead, use 

HBase mapreduce job: unable to find region for a table

2013-07-11 Thread S. Zhou
I am running a very simple MR HBase job (reading from a tiny HBase table and 
outputting nothing). I run it on a pseudo-distributed HBase cluster on my local 
machine, which uses a pseudo-distributed HDFS (on the local machine again). When I 
run it, I get the following exception: "Unable to find region for test". But I am 
sure the table 'test' exists.


13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on 
server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated 
timeout = 9
13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: 
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
region for test,,99 after 10 tries.
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
  .

Here is my HBase hbase-site.xml file:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://127.0.0.1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>127.0.0.1</value>
  </property>

</configuration>


Re: HBase mapreduce job: unable to find region for a table

2013-07-11 Thread Jean-Marc Spaggiari
Hi,

Is your table properly served? Are you able to see it on the Web UI? Is you
HBCK reporting everything correctly?

JM

2013/7/11 S. Zhou myx...@yahoo.com

 I am running a very simple MR HBase job (reading from a tiny HBase table
 and outputs nothing). I run it on a pseudo-distributed HBase cluster on my
 local machine which uses a pseudo-distributed HDFS (on local machine
 again). When I run it, I get the following exception: Unable to find region
 for test. But I am sure the table test exists.


 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment
 complete on server localhost/127.0.0.1:2181, sessionid =
 0x13fcec598d70005, negotiated timeout = 9
 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
 region for test,,99 after 10 tries.
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
   .

 Here is my HBase hbase-site.xml file:
 configuration
   property
 namehbase.rootdir/name
 valuehdfs://127.0.0.1:9000/hbase/value
   /property
   property
 namehbase.cluster.distributed/name
 valuetrue/value
   /property
  property
 namehbase.zookeeper.quorum/name
 value127.0.0.1/value
   /property

 /configuration



Re: HBase mapreduce job: unable to find region for a table

2013-07-11 Thread S. Zhou
Yes, I can see the table through hbase shell and web ui (localhost:60010). hbck 
reports ok





 From: Jean-Marc Spaggiari jean-m...@spaggiari.org
To: user@hbase.apache.org; S. Zhou myx...@yahoo.com 
Sent: Thursday, July 11, 2013 11:01 AM
Subject: Re: HBase mapreduce job: unable to find region for a table
 


Hi,

Is your table properly served? Are you able to see it on the Web UI? Is you 
HBCK reporting everything correctly?

JM


2013/7/11 S. Zhou myx...@yahoo.com

I am running a very simple MR HBase job (reading from a tiny HBase table and 
outputs nothing). I run it on a pseudo-distributed HBase cluster on my local 
machine which uses a pseudo-distributed HDFS (on local machine again). When I 
run it, I get the following exception: Unable to find region for test. But I am 
sure the table test exists.


13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on 
server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated 
timeout = 9
13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: 
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
region for test,,99 after 10 tries.
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
  .

Here is my HBase hbase-site.xml file:
configuration
  property
    namehbase.rootdir/name
    valuehdfs://127.0.0.1:9000/hbase/value
  /property
  property
    namehbase.cluster.distributed/name
    valuetrue/value
  /property
 property
    namehbase.zookeeper.quorum/name
    value127.0.0.1/value
  /property

/configuration


Re: HBase mapreduce job: unable to find region for a table

2013-07-11 Thread Jean-Marc Spaggiari
On the webui, when you click on your table, can you see the regions and are
they assigned to the server correctly

JM

2013/7/11 S. Zhou myx...@yahoo.com

 Yes, I can see the table through hbase shell and web ui (localhost:60010).
 hbck reports ok

   --
  *From:* Jean-Marc Spaggiari jean-m...@spaggiari.org
 *To:* user@hbase.apache.org; S. Zhou myx...@yahoo.com
 *Sent:* Thursday, July 11, 2013 11:01 AM
 *Subject:* Re: HBase mapreduce job: unable to find region for a table

 Hi,

 Is your table properly served? Are you able to see it on the Web UI? Is
 you HBCK reporting everything correctly?

 JM

 2013/7/11 S. Zhou myx...@yahoo.com

 I am running a very simple MR HBase job (reading from a tiny HBase table
 and outputs nothing). I run it on a pseudo-distributed HBase cluster on my
 local machine which uses a pseudo-distributed HDFS (on local machine
 again). When I run it, I get the following exception: Unable to find region
 for test. But I am sure the table test exists.


 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment
 complete on server localhost/127.0.0.1:2181, sessionid =
 0x13fcec598d70005, negotiated timeout = 9
 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
 region for test,,99 after 10 tries.
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
   .

 Here is my HBase hbase-site.xml file:
 configuration
   property
 namehbase.rootdir/name
 valuehdfs://127.0.0.1:9000/hbase/value
   /property
   property
 namehbase.cluster.distributed/name
 valuetrue/value
   /property
  property
 namehbase.zookeeper.quorum/name
 value127.0.0.1/value
   /property

 /configuration







hbase + mapreduce

2013-04-21 Thread Adrian Acosta Mitjans
Hello:

I'm working on a project, and I'm using HBase to store the data. I have this
method that works, but without the performance I'm looking for, so what I want
is to do the same thing using MapReduce.


public ArrayList<MyObject> findZ(String z) throws IOException {

    ArrayList<MyObject> rows = new ArrayList<MyObject>();
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    Scan s = new Scan();
    s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    ResultScanner scanner = table.getScanner(s);
    try {
        for (Result rr : scanner) {
            if (Bytes.toString(rr.getValue(Bytes.toBytes("x"),
                    Bytes.toBytes("y"))).equals(z)) {
                rows.add(getInformation(Bytes.toString(rr.getRow())));
            }
        }
    } finally {
        scanner.close();
    }
    return rows;
}

The getInformation method takes all the columns and converts the row into a MyObject 
instance.

I just want an example or a link to a tutorial that does something like 
this. I want to get a result object as the answer, not a number that counts 
words, like in most examples I have found.
My native language is Spanish, so sorry if something is not well written.

Thanks
http://www.uci.cu

Re: hbase + mapreduce

2013-04-21 Thread Marcos Luis Ortiz Valmaseda
Here you have several examples:
http://hbase.apache.org/book/mapreduce.example.html
http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/
http://stackoverflow.com/questions/12215313/load-data-into-hbase-table-using-hbase-map-reduce-api
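
In addition to the links above, here is a rough sketch of the same scan-and-filter
expressed as a map-only HBase job. Column family "x", qualifier "y" and table
"test" come from the code in the question; the server-side filter, the output
path and the class names are assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindZJob {
  // emits the row key of every row whose x:y column matched the filter
  public static class FindZMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toString(row.get())), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    String z = args[0];
    Configuration conf = HBaseConfiguration.create();
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    // push the equality check to the region servers instead of the client
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("x"), Bytes.toBytes("y"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(z)));
    scan.setCacheBlocks(false);

    Job job = new Job(conf, "findZ");
    job.setJarByClass(FindZJob.class);
    TableMapReduceUtil.initTableMapperJob("test", scan, FindZMapper.class,
        Text.class, NullWritable.class, job);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setNumReduceTasks(0);   // map-only: matching row keys land in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

This only writes the matching row keys to a text file; turning each match into a
full MyObject (the getInformation part) would happen either in the mapper or in a
later step.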




2013/4/21 Adrian Acosta Mitjans amitj...@estudiantes.uci.cu

 Hello:

 I'm working in a proyect, and i'm using hbase for storage
 the data, y have this method that work great but without the performance
  i'm looking for, so i want is to make the same but using mapreduce.


 public ArrayListMyObject findZ(String z) throws IOException {

 ArrayListMyObject rows = new ArrayListMyObject();
 Configuration conf = HBaseConfiguration.create();
 HTable table = new HTable(conf, test);
 Scan s = new Scan();
 s.addColumn(Bytes.toBytes(x), Bytes.toBytes(y));
 ResultScanner scanner = table.getScanner(s);
 try {
 for (Result rr : scanner) {
 if (Bytes.toString(rr.getValue(Bytes.toBytes(x),
 Bytes.toBytes(y))).equals(z)) {
 rows.add(getInformation(Bytes.toString(rr.getRow(;
 }
 }
 } finally {
 scanner.close();
 }
 return archivos;
 }

 the getInformation method take all the columns and convert the row in
 MyObject type.

 I
  just want a example or a link to a tutorial that make something like
 this,  i want to get a result type as answer and not a number to count
 words, like many a found.
 My natural language is spanish, so sorry if something is not well writing.

 Thanths
 http://www.uci.cu




-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186


hbase + mapreduce

2013-04-20 Thread Adrian Acosta Mitjans
Hello:

I'm working on a project, and I'm using HBase to store the data. I have this 
method that works, but without the performance I'm looking for, so what I want 
is to do the same thing using MapReduce.


public ArrayList<MyObject> findZ(String z) throws IOException {

    ArrayList<MyObject> rows = new ArrayList<MyObject>();
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    Scan s = new Scan();
    s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    ResultScanner scanner = table.getScanner(s);
    try {
        for (Result rr : scanner) {
            if (Bytes.toString(rr.getValue(Bytes.toBytes("x"),
                    Bytes.toBytes("y"))).equals(z)) {
                rows.add(getInformation(Bytes.toString(rr.getRow())));
            }
        }
    } finally {
        scanner.close();
    }
    return rows;
}

The getInformation method takes all the columns and converts the row into a MyObject 
instance.

I just want an example or a link to a tutorial that does something like this. I 
want to get a result object as the answer, not a number that counts words, like 
in most examples I have found.
My native language is Spanish, so sorry if something is not well written.

Thanks
http://www.uci.cu

Re: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction

2013-01-21 Thread Farrokh Shahriari
Thanks. But I don't know why, when the client write buffer size is increased, I get
worse results. Is it related to other parameters? And I give 8 GB of heap to
each regionserver.

On Mon, Jan 21, 2013 at 12:34 PM, Harsh J ha...@cloudera.com wrote:

 Hi Farrokh,

 This isn't a HDFS question - please ask these questions only on their
 relevant lists for best results and to keep each list's discussion separate.


 On Mon, Jan 21, 2013 at 11:40 AM, Farrokh Shahriari 
 mohandes.zebeleh...@gmail.com wrote:

 Hi there
 Is there any way to use arrayList of Puts in map function to insert data
 to hbase ? Because,the context.write method doesn't allow to use arraylist
 of puts,so in every map function I can only put one row. What can I do for
 inserting some rows in each map function ?
 And also how can I use autoflush  bufferclientside in Map function for
 inserting data to Hbase Table ?

 Mohandes Zebeleh




 --
 Harsh J



Re: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction

2013-01-20 Thread Mohammad Tariq
Give put(List<Put> puts) a shot and see if it works for you.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Mon, Jan 21, 2013 at 11:41 AM, Farrokh Shahriari 
mohandes.zebeleh...@gmail.com wrote:

 Hi there
 Is there any way to use arrayList of Puts in map function to insert data to
 hbase ? Because,the context.write method doesn't allow to use arraylist of
 puts,so in every map function I can only put one row. What can I do for
 inserting some rows in each map function ?
 And also how can I use autoflush  bufferclientside in Map function for
 inserting data to Hbase Table ?

 Mohandes Zebeleh



RE: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction

2013-01-20 Thread Anoop Sam John
And also how can I use autoflush  bufferclientside in Map function for
inserting data to Hbase Table ?

You are using TableOutputFormat right? Here autoFlush is turned OFF ... You can 
use config param hbase.client.write.buffer to set the client side buffer size.

-Anoop-

From: Farrokh Shahriari [mohandes.zebeleh...@gmail.com]
Sent: Monday, January 21, 2013 11:41 AM
To: user@hbase.apache.org
Subject: Hbase Mapreduce- Problem in using arrayList of pust in MapFunction

Hi there
Is there any way to use arrayList of Puts in map function to insert data to
hbase ? Because,the context.write method doesn't allow to use arraylist of
puts,so in every map function I can only put one row. What can I do for
inserting some rows in each map function ?
And also how can I use autoflush  bufferclientside in Map function for
inserting data to Hbase Table ?

Mohandes Zebeleh
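
As a rough sketch of what is described above: no ArrayList is needed, just one
context.write() per Put, with TableOutputFormat buffering the writes and
hbase.client.write.buffer sizing that buffer. The table, family and qualifier
names below are made up:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MultiPutMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // one input record may produce several rows: emit one Put per row,
    // and TableOutputFormat batches them client-side before flushing
    for (String field : line.toString().split(",")) {
      Put put = new Put(Bytes.toBytes(field));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(1L));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }
}

In the driver, TableMapReduceUtil.initTableReducerJob("my_table", null, job) plus
job.setNumReduceTasks(0) wires this mapper straight to the table, and
conf.setLong("hbase.client.write.buffer", ...) adjusts the buffer if needed.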

Re: Hbase MapReduce

2012-11-25 Thread Thomas Wendzinski

Hallo,


It 's weird that hbase aggregate functions don't use MapReduce, this means
that the performance will be very poor.
Is it a must to use coprocessors?
Is there a much easier way to improve the functions' performance ?
Why would performance be poor? I have not been dealing with these
coprocessors for long and am still testing

a lot, but in my perception they are much more lightweight.
Actually, depending on your row key design and request range, the load 
is distributed across the region servers.
Each will handle aggregation for its own key range. The client invoking 
the coprocessor then merges the results, which

can be seen as a sort of reduce function.

I think one has to think carefully about one's own requirements. I am not 
sure how, or even if, M/R jobs can work for real-time scenarios.
Here, coprocessors seem to be a good alternative, which could also be 
limited, depending on how many rows the coprocessor needs to iterate over 
for the kind of data you have and expect to request.


However, having the data processed by an M/R job beforehand and persisted 
would be faster for the single client request, because you only need to 
fetch aggregated data that already exists. But how recent is the 
data at that point? Does it change frequently, and must aggregation results 
be as recent as the data they reflect? That could be a con against M/R...


Regards
tom



Am 25.11.2012 07:26, schrieb Wei Tan:

Actually coprocessor can be used to implement MR-like function, while not
using Hadoop framework.



Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
w...@us.ibm.com; 914-784-6752



From:   Dalia Sobhy dalia.mohso...@hotmail.com
To: user@hbase.apache.org user@hbase.apache.org,
Date:   11/24/2012 01:33 PM
Subject:RE: Hbase MapReduce




It 's weird that hbase aggregate functions don't use MapReduce, this means
that the performance will be very poor.
Is it a must to use coprocessors?
Is there a much easier way to improve the functions' performance ?


CC: user@hbase.apache.org
From: michael_se...@hotmail.com
Subject: Re: Hbase MapReduce
Date: Sat, 24 Nov 2012 12:05:45 -0600
To: user@hbase.apache.org

Do you think it would be a good idea to temper the use of CoProcessors?

This kind of reminds me of when people first started using stored

procedures...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote:


Hi, but you do not need to us M/R. You could also use coprocessors.

See this site:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
- in the section Endpoints

An aggregation coprocessor ships with hbase that should match your

requirements.

You just need to load it and eventually you can access it from HTable:

HTable.coprocessorExec(..) 

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte
[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29

Regards
tom

Am 24.11.2012 18:32, schrieb Marcos Ortiz:

Regards, Dalia.
You have to use MapReduce for that.
In the HBase in Practice´s book, there are lot of great examples for

this.

On 11/24/2012 12:15 PM, Dalia Sobhy wrote:

Dear all,
I wanted to ask a question..
Do Hbase Aggregate Functions such as rowcount, getMax, get Average

use MapReduce to execute those functions?

Thanks :D
   





Hbase MapReduce

2012-11-24 Thread Dalia Sobhy

Dear all,
I wanted to ask a question..
Do Hbase Aggregate Functions such as rowcount, getMax, get Average use 
MapReduce to execute those functions?
Thanks :D 

Re: Hbase MapReduce

2012-11-24 Thread Marcos Ortiz

Regards, Dalia.
You have to use MapReduce for that.
In the HBase in Practice book, there are lots of great examples of this.

On 11/24/2012 12:15 PM, Dalia Sobhy wrote:

Dear all,
I wanted to ask a question..
Do Hbase Aggregate Functions such as rowcount, getMax, get Average use 
MapReduce to execute those functions?
Thanks :D   




--

Marcos Luis Orti'z Valmaseda
about.me/marcosortiz http://about.me/marcosortiz
@marcosluis2186 http://twitter.com/marcosluis2186




Re: Hbase MapReduce

2012-11-24 Thread tom

Hi, but you do not need to use M/R. You could also use coprocessors.

See this site:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
- in the section Endpoints

An aggregation coprocessor ships with hbase that should match your 
requirements.

You just need to load it and eventually you can access it from HTable:

HTable.coprocessorExec(..) 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29


Regards
tom
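
For what it's worth, a rough sketch of driving that endpoint from the client side
with AggregationClient. The exact signatures vary a bit across versions, so treat
this as a sketch; the table and column names are made up, and the
AggregateImplementation coprocessor has to be loaded on the table first:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
AggregationClient aggClient = new AggregationClient(conf);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("amount"));
try {
  // both calls run on the region servers; the client only merges partial results
  long rowCount = aggClient.rowCount(Bytes.toBytes("my_table"),
      new LongColumnInterpreter(), scan);
  Long max = aggClient.max(Bytes.toBytes("my_table"),
      new LongColumnInterpreter(), scan);
  System.out.println("rows=" + rowCount + ", max=" + max);
} catch (Throwable t) {   // the AggregationClient methods declare Throwable
  throw new RuntimeException(t);
}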

Am 24.11.2012 18:32, schrieb Marcos Ortiz:

Regards, Dalia.
You have to use MapReduce for that.
In the HBase in Practice´s book, there are lot of great examples for 
this.


On 11/24/2012 12:15 PM, Dalia Sobhy wrote:

Dear all,
I wanted to ask a question..
Do Hbase Aggregate Functions such as rowcount, getMax, get Average 
use MapReduce to execute those functions?

Thanks :D








Re: Hbase MapReduce

2012-11-24 Thread Michel Segel
Do you think it would be a good idea to temper the use of CoProcessors?

This kind of reminds me of when people first started using stored procedures...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote:

 Hi, but you do not need to us M/R. You could also use coprocessors.
 
 See this site:
 https://blogs.apache.org/hbase/entry/coprocessor_introduction
 - in the section Endpoints
 
 An aggregation coprocessor ships with hbase that should match your 
 requirements.
 You just need to load it and eventually you can access it from HTable:
 
 HTable.coprocessorExec(..) 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29
 
 Regards
 tom
 
 Am 24.11.2012 18:32, schrieb Marcos Ortiz:
 Regards, Dalia.
 You have to use MapReduce for that.
 In the HBase in Practice´s book, there are lot of great examples for this.
 
 On 11/24/2012 12:15 PM, Dalia Sobhy wrote:
 Dear all,
 I wanted to ask a question..
 Do Hbase Aggregate Functions such as rowcount, getMax, get Average use 
 MapReduce to execute those functions?
 Thanks :D
 


RE: Hbase MapReduce

2012-11-24 Thread Dalia Sobhy

It's weird that HBase aggregate functions don't use MapReduce; this means that 
the performance will be very poor.
Is it a must to use coprocessors?
Is there an easier way to improve the functions' performance?

 CC: user@hbase.apache.org
 From: michael_se...@hotmail.com
 Subject: Re: Hbase MapReduce
 Date: Sat, 24 Nov 2012 12:05:45 -0600
 To: user@hbase.apache.org
 
 Do you think it would be a good idea to temper the use of CoProcessors?
 
 This kind of reminds me of when people first started using stored 
 procedures...
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote:
 
  Hi, but you do not need to us M/R. You could also use coprocessors.
  
  See this site:
  https://blogs.apache.org/hbase/entry/coprocessor_introduction
  - in the section Endpoints
  
  An aggregation coprocessor ships with hbase that should match your 
  requirements.
  You just need to load it and eventually you can access it from HTable:
  
  HTable.coprocessorExec(..) 
  http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29
  
  Regards
  tom
  
  Am 24.11.2012 18:32, schrieb Marcos Ortiz:
  Regards, Dalia.
  You have to use MapReduce for that.
  In the HBase in Practice´s book, there are lot of great examples for this.
  
  On 11/24/2012 12:15 PM, Dalia Sobhy wrote:
  Dear all,
  I wanted to ask a question..
  Do Hbase Aggregate Functions such as rowcount, getMax, get Average use 
  MapReduce to execute those functions?
  Thanks :D
  
  

RE: Hbase MapReduce

2012-11-24 Thread Wei Tan
Actually, a coprocessor can be used to implement MR-like functionality without 
using the Hadoop framework.



Best Regards,
Wei

Wei Tan 
Research Staff Member 
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
w...@us.ibm.com; 914-784-6752



From:   Dalia Sobhy dalia.mohso...@hotmail.com
To: user@hbase.apache.org user@hbase.apache.org, 
Date:   11/24/2012 01:33 PM
Subject:RE: Hbase MapReduce




It 's weird that hbase aggregate functions don't use MapReduce, this means 
that the performance will be very poor.
Is it a must to use coprocessors?
Is there a much easier way to improve the functions' performance ?

 CC: user@hbase.apache.org
 From: michael_se...@hotmail.com
 Subject: Re: Hbase MapReduce
 Date: Sat, 24 Nov 2012 12:05:45 -0600
 To: user@hbase.apache.org
 
 Do you think it would be a good idea to temper the use of CoProcessors?
 
 This kind of reminds me of when people first started using stored 
procedures...
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote:
 
  Hi, but you do not need to us M/R. You could also use coprocessors.
  
  See this site:
  https://blogs.apache.org/hbase/entry/coprocessor_introduction
  - in the section Endpoints
  
  An aggregation coprocessor ships with hbase that should match your 
requirements.
  You just need to load it and eventually you can access it from HTable:
  
  HTable.coprocessorExec(..) 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte
[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29
  
  Regards
  tom
  
  Am 24.11.2012 18:32, schrieb Marcos Ortiz:
  Regards, Dalia.
  You have to use MapReduce for that.
  In the HBase in Practice´s book, there are lot of great examples for 
this.
  
  On 11/24/2012 12:15 PM, Dalia Sobhy wrote:
  Dear all,
  I wanted to ask a question..
  Do Hbase Aggregate Functions such as rowcount, getMax, get Average 
use MapReduce to execute those functions?
  Thanks :D
  
  


Re: Query regarding HBase Mapreduce

2012-10-25 Thread Bertrand Dechoux
Hi Amit,

You might want to add details to your question.

1) Lots of small files are a known 'problem' for Hadoop MapReduce, and you
will find information on it by searching.
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
I assume you have a more specific issue; what is it?

2) I am not sure what you mean by HBase mapreduce on small files. If you
are using MapReduce with HBase as a source, you are not dealing with files
directly. If you are using HBase as a sink, then the lots of small files is
a problem which is orthogonal to the use of HBase. I don't think there is
such a thing as HBase MapReduce. You might want to reformulate your use
case.

Regards

Bertrand

On Thu, Oct 25, 2012 at 4:15 PM, amit bohra bohr...@gmail.com wrote:

 Hi,

 We are working on processing of lot of small files. For processing them we
 are using HBase Mapreduce as of now. Currently we are working with files in
 the range for around few millions, but over the period of time it would
 grow to a larger extent.

 Did anyone faced any issues while working on HBase mapreduce on small
 files?

 Thanks and Regards,
 Amit Bohra




-- 
Bertrand Dechoux


Re: Query regarding HBase Mapreduce

2012-10-25 Thread Nick maillard
Hi amit

I am starting out with HBase and MR, so my opinion is based more on what I have read
than on real-world experience.
However, the documentation says Hadoop deals better with a set of large files
than with a lot of small ones.

regards
amit bohra bohra.a@... writes:







Re: Query regarding HBase Mapreduce

2012-10-25 Thread lohit
When you say small files, do you mean that they are stored within HBase
columns?
If so, you need not worry, as HBase will eventually write bigger HFiles on
disk (or HDFS).
If you are storing lots of small files on HDFS itself, then you will have
scalability problems, as a single NameNode cannot handle billions of files.

2012/10/25 amit bohra bohr...@gmail.com

 Hi,

 We are working on processing of lot of small files. For processing them we
 are using HBase Mapreduce as of now. Currently we are working with files in
 the range for around few millions, but over the period of time it would
 grow to a larger extent.

 Did anyone faced any issues while working on HBase mapreduce on small
 files?

 Thanks and Regards,
 Amit Bohra




-- 
Have a Nice Day!
Lohit


HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Amlan Roy
Hi,

 

While writing a MapReduce job for HBase, can I use multiple tables as input?
I think TableMapReduceUtil.initTableMapperJob() takes a single table as
parameter. For my requirement, I want to specify multiple tables and scan
instances. I read about MultiTableInputCollection in the document
https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in
HBase-0.92.0.

 

Regards,

Amlan



Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Mohammad Tariq
Hello Amlan,

Issue is still unresolved...Will get fixed in 0.96.0.

Regards,
Mohammad Tariq


On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote:
 Hi,



 While writing a MapReduce job for HBase, can I use multiple tables as input?
 I think TableMapReduceUtil.initTableMapperJob() takes a single table as
 parameter. For my requirement, I want to specify multiple tables and scan
 instances. I read about MultiTableInputCollection in the document
 https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in
 HBase-0.92.0.



 Regards,

 Amlan



Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Ioakim Perros

Hi,

Isn't it the case that you can always open a scanner inside a map 
task, referring to a table other than the one that was set in the 
configuration by TableMapReduceUtil.initTableMapperJob(...)?


Hope this serves as temporary solution.
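
A minimal sketch of that workaround, assuming the job is driven by one table via
initTableMapperJob and the mapper opens the second table itself (the table name
and the Get-based lookup are assumptions):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class TwoTableJoinMapper extends TableMapper<Text, Text> {
  private HTable lookupTable;

  @Override
  protected void setup(Context context) throws IOException {
    // the first table feeds the job; this second one is opened for lookups
    lookupTable = new HTable(
        HBaseConfiguration.create(context.getConfiguration()), "second_table");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // point lookup into the second table using the current row key
    Result other = lookupTable.get(new Get(row.get()));
    // ... combine 'value' and 'other' here and emit whatever the job needs ...
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    lookupTable.close();
  }
}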

On 08/06/2012 02:35 PM, Mohammad Tariq wrote:

Hello Amlan,

 Issue is still unresolved...Will get fixed in 0.96.0.

Regards,
 Mohammad Tariq


On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote:

Hi,



While writing a MapReduce job for HBase, can I use multiple tables as input?
I think TableMapReduceUtil.initTableMapperJob() takes a single table as
parameter. For my requirement, I want to specify multiple tables and scan
instances. I read about MultiTableInputCollection in the document
https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in
HBase-0.92.0.



Regards,

Amlan





Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Sonal Goyal
Hi Amlan,

I think if you share your usecase regarding two tables as inputs, people on
the mailing list may be able to help you better. For example, are you
looking at joining the two tables? What are the sizes of the tables etc?

Best Regards,
Sonal
Crux: Reporting for HBase https://github.com/sonalgoyal/crux
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Mon, Aug 6, 2012 at 5:10 PM, Ioakim Perros imper...@gmail.com wrote:

 Hi,

 Isn't that the case that you can always initiate a scanner inside a map
 job (referring to another table from which had been set into the
 configuration of TableMapReduceUtil.**initTableMapperJob(...) ) ?

 Hope this serves as temporary solution.

 On 08/06/2012 02:35 PM, Mohammad Tariq wrote:

 Hello Amlan,

  Issue is still unresolved...Will get fixed in 0.96.0.

 Regards,
  Mohammad Tariq


 On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com
 wrote:

 Hi,



 While writing a MapReduce job for HBase, can I use multiple tables as
 input?
 I think TableMapReduceUtil.**initTableMapperJob() takes a single table
 as
 parameter. For my requirement, I want to specify multiple tables and scan
 instances. I read about MultiTableInputCollection in the document
 https://issues.apache.org/**jira/browse/HBASE-3996https://issues.apache.org/jira/browse/HBASE-3996.
 But I don't find it in
 HBase-0.92.0.



 Regards,

 Amlan





RE: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Amlan Roy
Hi,

If TableMapper and TableMapReduceUtil.initTableMapperJob() do not support
multiple tables as input, can I use the Hadoop Mapper/Reducer classes and
specify the input/output format myself?

What I want to do is read two tables in the map phase and reduce them
together. What is the best solution available in 0.92.0 (I understand the
best solution is coming in version 0.96.0)?

Regards,
Amlan 

-Original Message-
From: Ioakim Perros [mailto:imper...@gmail.com] 
Sent: Monday, August 06, 2012 5:11 PM
To: user@hbase.apache.org
Subject: Re: HBase MapReduce - Using mutiple tables as source

Hi,

Isn't that the case that you can always initiate a scanner inside a map 
job (referring to another table from which had been set into the 
configuration of TableMapReduceUtil.initTableMapperJob(...) ) ?

Hope this serves as temporary solution.

On 08/06/2012 02:35 PM, Mohammad Tariq wrote:
 Hello Amlan,

  Issue is still unresolved...Will get fixed in 0.96.0.

 Regards,
  Mohammad Tariq


 On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote:
 Hi,



 While writing a MapReduce job for HBase, can I use multiple tables as
input?
 I think TableMapReduceUtil.initTableMapperJob() takes a single table as
 parameter. For my requirement, I want to specify multiple tables and scan
 instances. I read about MultiTableInputCollection in the document
 https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in
 HBase-0.92.0.



 Regards,

 Amlan




Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Ferdy Galema
Hi,

Perhaps you want to take a look at MultipleInputs. I'm not sure if it works
for TableInputFormat, but at least you can use it for inspiration.

Ferdy.

On Mon, Aug 6, 2012 at 3:02 PM, Amlan Roy amlan@cleartrip.com wrote:

 Hi,

 If TableMapper and TableMapReduceUtil.initTableMapperJob() does not support
 multiple tables as input, can I use Hadoop Mapper/Reducer classes and
 specify the the input/output format myself?

 What I want to do is, I want to read two tables in the map phase and want
 to
 reduce them together. What is the best solution available in 0.92.0 (I
 understand the best solution is coming in version 0.96.0).

 Regards,
 Amlan

 -Original Message-
 From: Ioakim Perros [mailto:imper...@gmail.com]
 Sent: Monday, August 06, 2012 5:11 PM
 To: user@hbase.apache.org
 Subject: Re: HBase MapReduce - Using mutiple tables as source

 Hi,

 Isn't that the case that you can always initiate a scanner inside a map
 job (referring to another table from which had been set into the
 configuration of TableMapReduceUtil.initTableMapperJob(...) ) ?

 Hope this serves as temporary solution.

 On 08/06/2012 02:35 PM, Mohammad Tariq wrote:
  Hello Amlan,
 
   Issue is still unresolved...Will get fixed in 0.96.0.
 
  Regards,
   Mohammad Tariq
 
 
  On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com
 wrote:
  Hi,
 
 
 
  While writing a MapReduce job for HBase, can I use multiple tables as
 input?
  I think TableMapReduceUtil.initTableMapperJob() takes a single table as
  parameter. For my requirement, I want to specify multiple tables and
 scan
  instances. I read about MultiTableInputCollection in the document
  https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it
 in
  HBase-0.92.0.
 
 
 
  Regards,
 
  Amlan
 




RE: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Wei Tan
A related question: may I have multiple tables as output, in a single Map 
Job?
I understand that this is achievable by running multiple MR jobs, each 
with a different output table specified in the reduce class. What I want 
is to scan a source table once and generate multiple tables at one time.
Thanks,


Best Regards,
Wei

Wei Tan 
Research Staff Member 
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY  10532
w...@us.ibm.com; 914-784-6752



From:   Amlan Roy amlan@cleartrip.com
To: user@hbase.apache.org, 
Date:   08/06/2012 09:05 AM
Subject:RE: HBase MapReduce - Using mutiple tables as source



Hi,

If TableMapper and TableMapReduceUtil.initTableMapperJob() does not 
support
multiple tables as input, can I use Hadoop Mapper/Reducer classes and
specify the the input/output format myself?

What I want to do is, I want to read two tables in the map phase and want 
to
reduce them together. What is the best solution available in 0.92.0 (I
understand the best solution is coming in version 0.96.0).

Regards,
Amlan 

-Original Message-
From: Ioakim Perros [mailto:imper...@gmail.com] 
Sent: Monday, August 06, 2012 5:11 PM
To: user@hbase.apache.org
Subject: Re: HBase MapReduce - Using mutiple tables as source

Hi,

Isn't that the case that you can always initiate a scanner inside a map 
job (referring to another table from which had been set into the 
configuration of TableMapReduceUtil.initTableMapperJob(...) ) ?

Hope this serves as temporary solution.

On 08/06/2012 02:35 PM, Mohammad Tariq wrote:
 Hello Amlan,

  Issue is still unresolved...Will get fixed in 0.96.0.

 Regards,
  Mohammad Tariq


 On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com 
wrote:
 Hi,



 While writing a MapReduce job for HBase, can I use multiple tables as
input?
 I think TableMapReduceUtil.initTableMapperJob() takes a single table as
 parameter. For my requirement, I want to specify multiple tables and 
scan
 instances. I read about MultiTableInputCollection in the document
 https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it 
in
 HBase-0.92.0.



 Regards,

 Amlan





Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread Stack
On Mon, Aug 6, 2012 at 3:22 PM, Wei Tan w...@us.ibm.com wrote:
 I understand that this is achievable by running multiple MR jobs, each
 with a different output table specified in the reduce class. What I want
 is to scan a source table once and generate multiple tables at one time.
 Thanks,


There is nothing in HBase natively that will do this, but there is no reason you
can't do it in a map or reduce task. You'd set up two or more HTable
instances on task init, each pointing to a particular table. Then,
inside your task, you'd send the puts to one of the HTables,
switching on whatever criterion you fancy.

St.Ack
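
A rough sketch of that, for a reduce task writing to either of two tables (table
names, the column layout and the switching condition are made up; error handling
and autoflush tuning are left out):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TwoTableReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
  private HTable tableA;
  private HTable tableB;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
    tableA = new HTable(conf, "table_a");
    tableB = new HTable(conf, "table_b");
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(value.toString()));
      // switch on whatever criterion fits, as suggested above
      if (key.toString().startsWith("a")) {
        tableA.put(put);
      } else {
        tableB.put(put);
      }
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    tableA.close();
    tableB.close();
  }
}

Since all writes go through the HTable instances directly, the job itself can use
NullOutputFormat.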


Re: HBase MapReduce - Using mutiple tables as source

2012-08-06 Thread jmozah
It's available just as a patch on trunk for now.
You won't find it in 0.92.0.

./zahoor


On 06-Aug-2012, at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote:

 https://issues.apache.org/jira/browse/HBASE-3996



A question about HBase MapReduce

2012-05-25 Thread Florin P
Hello!

I've read Lars George's blog 
http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where, at the 
end of the article, he mentioned: "In the next post I will show you how to 
import data from a raw data
file into a HBase table and how you eventually process the data in the
HBase table. We will address questions like how many mappers and/or
reducers are needed and how can I improve import and processing
performance." I looked the blog up for these questions, but it seems that 
there is no related article. Do you know if he touched these subjects in 
a different post or book? In particular I am interested in:

1. How can you set up the number of mappers?
2. Can the number of mappers be set per region server? If yes, how?
3. How does a large number of mappers affect data locality?
4. Is the algorithm for computing the number of mappers 
(https://issues.apache.org/jira/browse/HBASE-1172) still in place?
Currently,
the number of mappers specified when using TableInputFormat is strictly
followed if less than total regions on the input table. If greater, the
number of regions is used.
This will modify the splitting algorithm to do the following:
* Specify 0 mappers when you want # mappers = # regions
* If you specify fewer mappers than regions, will use exactly the 
number you specify based on the current algorithm
* If
you specify more mappers than regions, will divide regions up by
determining [start,X) [X,end). The number of mappers will always be a
multiple of number of regions. This is so we do not have scanners
spanning multiple regions.
There is an additional issue in that the default number of mappers
in JobConf is set to 1. That means if a user does not explicitly set
number of map tasks, a single mapper will be used. 

I look forward to your answers. Thank you.

Kind regards, Florin

Re: A question about HBase MapReduce

2012-05-25 Thread Doug Meil

re:  data from raw data file into hbase table

One approach is bulk loading..

http://hbase.apache.org/book.html#arch.bulk.load



If he's talking about using an Hbase table as the source of a MR job, then
see this...


http://hbase.apache.org/book.html#splitter
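

On the number-of-mappers question: TableInputFormat (what
TableMapReduceUtil.initTableMapperJob wires in) creates one input split, and hence
one map task, per region of the input table by default, so the mapper count
follows the region count, and locality comes from the scheduler preferring the
region's host for each task. A minimal read-job setup looks roughly like this
(table, mapper and the Job instance 'job' are placeholders):

Scan scan = new Scan();
scan.setCaching(500);          // fetch rows in batches to cut down on RPCs
scan.setCacheBlocks(false);    // avoid churning the block cache from a batch job
TableMapReduceUtil.initTableMapperJob(
    "my_table",                // one map task per region of this table
    scan,
    MyTableMapper.class,
    ImmutableBytesWritable.class,
    Result.class,
    job);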




On 5/25/12 2:35 AM, Florin P florinp...@yahoo.com wrote:

Hello!

I've read Lars George's blog
http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where
at the end of the article, he mentioned In the next post I will show you
how to import data from a raw data
file into a HBase table and how you eventually process the data in the
HBase table. We will address questions like how many mappers and/or
reducers are needed and how can I improve import and processing
performance.. I looked in the blog up for these questions, but it seems
that there is no article related. Do you knoe if he you touched these
subjects into a different post or book? Particular I am interested

1. how you can set up the number of mappers?
2. number of mappers can be set up per region server? If yes how?
3. How the big number of set up mappers can affect the data locality?
4. is this algorithm for computing the number of mappers
(https://issues.apache.org/jira/browse/HBASE-1172) still available
Currently,
the number of mappers specified when using TableInputFormat is strictly
followed if less than total regions on the input table. If greater, the
number of regions is used.
This will modify the splitting algorithm to do the following:
   * Specify 0 mappers when you want # mappers = # regions
   * If you specify fewer mappers than regions, will use exactly the number
you specify based on the current algorithm
   * If
you specify more mappers than regions, will divide regions up by
determining [start,X) [X,end). The number of mappers will always be a
multiple of number of regions. This is so we do not have scanners
spanning multiple regions.
There is an additional issue in that the default number of mappers
in JobConf is set to 1. That means if a user does not explicitly set
number of map tasks, a single mapper will be used. 

I look forward to your answers. Thank you.

Kind regards, Florin




Re: HBase mapreduce sink - using a custom TableReducer to pass in Puts

2012-05-15 Thread Jean-Daniel Cryans
My first guess would be to check whether all the KVs are using the same
qualifier, because then it's basically the same cell 10 times.

J-D
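
To make that concrete: if the mapper writes every value under the same column
qualifier, the reducer's single Put ends up holding ten versions of one cell,
and only the newest survives a normal read. A hedged sketch of a map() body
that keeps the values apart follows; rowKey, fields and context come from the
surrounding mapper, and the "cf"/"v0".."v9" names are invented for illustration
(imports assumed: KeyValue, ImmutableBytesWritable, Bytes).

byte[] family = Bytes.toBytes("cf");
for (int i = 0; i < fields.length; i++) {
    // a distinct qualifier per value keeps each one as its own cell in the Put
    KeyValue kv = new KeyValue(rowKey, family,
            Bytes.toBytes("v" + i), Bytes.toBytes(fields[i]));
    context.write(new ImmutableBytesWritable(rowKey), kv);
}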

On Mon, May 14, 2012 at 6:50 PM, Ben Kim benkimkim...@gmail.com wrote:
 Hello!

 I'm writing a mapreduce code to read a SequenceFile and write it to hbase
 table.
 Normally, or what the hbase tutorial tells us to do, you would create a Put in
 TableMapper and pass it to IdentityTableReducer. This in fact works for me.

 But now I'm trying to separate the computations into mapper and let reducer
 take care of writing to hbase.

 Following is my TableReducer

 public class MyTableReducer extends TableReducer<ImmutableBytesWritable,
 KeyValue, ImmutableBytesWritable> {
    public void reduce(ImmutableBytesWritable key, Iterable<KeyValue>
 values, Context context) throws IOException, InterruptedException {
        Put put = new Put(key.get());
        for(KeyValue kv : values) {
            put.add(kv);
        }
        context.write(key, put);
    }
 }

 For my testing purpose, I'm writing 10 rows with 10 cells.
 I added multiple cells to each Put operations (put.add(kv))
 But this Reducer will only write the one last cell passed by Mapper!

 Following is setup of the Job

        Job itemTableJob = prepareJob(
                inputPath, outputPath, SequenceFileInputFormat.class,
                MyMapper.class, ImmutableBytesWritable.class,
 KeyValue.class,
 MyTableReducer.class, ImmutableBytesWritable.class,
 Writable.class, TableOutputFormat.class);

 TableMapReduceUtil.initTableReducerJob("rs_system", null,
 itemTableJob);
        itemTableJob.waitForCompletion(true);

 Am I missing something?





 --

 *Benjamin Kim*
 Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521*
 benkimkimben at gmail*


HBase mapreduce sink - using a custom TableReducer to pass in Puts

2012-05-14 Thread Ben Kim
Hello!

I'm writing a mapreduce code to read a SequenceFile and write it to hbase
table.
Normally, or what the hbase tutorial tells us to do, you would create a Put in
TableMapper and pass it to IdentityTableReducer. This in fact works for me.

But now I'm trying to separate the computations into mapper and let reducer
take care of writing to hbase.

Following is my TableReducer

public class MyTableReducer extends TableReducer<ImmutableBytesWritable,
        KeyValue, ImmutableBytesWritable> {
    public void reduce(ImmutableBytesWritable key, Iterable<KeyValue> values,
            Context context) throws IOException, InterruptedException {
        Put put = new Put(key.get());
        for (KeyValue kv : values) {
            put.add(kv);
        }
        context.write(key, put);
    }
}

For my testing purpose, I'm writing 10 rows with 10 cells.
I added multiple cells to each Put operations (put.add(kv))
But this Reducer will only write the one last cell passed by Mapper!

Following is setup of the Job

Job itemTableJob = prepareJob(
inputPath, outputPath, SequenceFileInputFormat.class,
MyMapper.class, ImmutableBytesWritable.class,
KeyValue.class,
MyTableReducer.class, ImmutableBytesWritable.class,
Writable.class, TableOutputFormat.class);

TableMapReduceUtil.initTableReducerJob("rs_system", null,
itemTableJob);
itemTableJob.waitForCompletion(true);

Am I missing something?





-- 

*Benjamin Kim*
Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521*
benkimkimben at gmail*


Re: HBase mapreduce sink - using a custom TableReducer to pass in Puts

2012-05-14 Thread Ben Kim
Oops I made mistake while copy-paste

The reducer initialization code should be like this

TableMapReduceUtil.initTableReducerJob("rs_system", MyTableReducer.class,
itemTableJob);


On Tue, May 15, 2012 at 10:50 AM, Ben Kim benkimkim...@gmail.com wrote:

 Hello!

 I'm writing a mapreduce code to read a SequenceFile and write it to hbase
 table.
 Normally, or what the hbase tutorial tells us to do, you would create a Put
 in TableMapper and pass it to IdentityTableReducer. This in fact works for
 me.

 But now I'm trying to separate the computations into mapper and let
 reducer take care of writing to hbase.

 Following is my TableReducer

 public class MyTableReducer extends TableReducer<ImmutableBytesWritable,
         KeyValue, ImmutableBytesWritable> {
     public void reduce(ImmutableBytesWritable key, Iterable<KeyValue> values,
             Context context) throws IOException, InterruptedException {
         Put put = new Put(key.get());
         for (KeyValue kv : values) {
             put.add(kv);
         }
         context.write(key, put);
     }
 }

 For my testing purpose, I'm writing 10 rows with 10 cells.
 I added multiple cells to each Put operations (put.add(kv))
 But this Reducer will only write the one last cell passed by Mapper!

 Following is setup of the Job

 Job itemTableJob = prepareJob(
 inputPath, outputPath, SequenceFileInputFormat.class,
 MyMapper.class, ImmutableBytesWritable.class,
 KeyValue.class,
 MyTableReducer.class, ImmutableBytesWritable.class,
 Writable.class, TableOutputFormat.class);

 TableMapReduceUtil.initTableReducerJob("rs_system", null,
 itemTableJob);
 itemTableJob.waitForCompletion(true);

 Am I missing something?





 --

 *Benjamin Kim*
 Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521*
 benkimkimben at gmail*




-- 

*Benjamin Kim*
Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521*
benkimkim...@gmail.com*


HBase MapReduce Job with Multiple Scans

2012-04-03 Thread Shawn Quinn
Hello,

I have a table whose key is structured as eventType + time, and I need to
periodically run a map reduce job on the table which will process each
event type within a specific time range.  So, the map reduce job needs to
process multiple segments of the table as input, and therefore can't be
setup with a single scan.  (Using a filter on the scan would theoretically
work, but doesn't scale well as the data size increases.)

Given that the HBase provided TableMapReduceUtil.initTableMapperJob only
supports a single scan there doesn't appear to be a built in way to run a
mapreduce job that has multiple scans as input.  I found the following
related post which points me to creating my own map reduce InputFormat
type by extending HBase's TableInputFormatBase and overriding the
getSplits() method:

http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects

So, that's currently the direction I'm heading.  However, before I got too
far in the weeds I thought I'd ask:

1. Is this still the best/right way to handle this situation?

2. Does anyone have an example of a custom InputFormat that sets up
multiple scans against an HBase input table (something like the
MultiSegmentTableInputFormat referred to in the post) that they'd be
willing to share?

Thanks,

   -Shawn
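
For reference, a minimal sketch of the getSplits() override described in the
Stack Overflow link follows. buildScans() is a hypothetical helper that
produces one Scan per eventType/time-range segment, and the sketch assumes the
scans differ only in start/stop row: the row range is the only per-split piece
of scan state, everything else (families, caching, filters) comes from the
single scan the job was configured with.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MultiSegmentTableInputFormat extends TableInputFormat {

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (Scan segment : buildScans(context.getConfiguration())) {
            setScan(segment);                        // protected setter from TableInputFormatBase
            splits.addAll(super.getSplits(context)); // region splits clipped to this segment's row range
        }
        return splits;
    }

    // hypothetical helper: build one Scan(startRow, stopRow) per eventType + time range,
    // e.g. from custom properties placed in the job configuration
    private List<Scan> buildScans(Configuration conf) {
        return new ArrayList<Scan>();
    }
}

After the usual TableMapReduceUtil.initTableMapperJob(...) call,
job.setInputFormatClass(MultiSegmentTableInputFormat.class) swaps it in.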


Re: HBase MapReduce Job with Multiple Scans

2012-04-03 Thread Ted Yu
Take a look at HBASE-3996 where Stack has some comments outstanding.

Cheers

On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn squ...@moxiegroup.com wrote:

 Hello,

 I have a table whose key is structured as eventType + time, and I need to
 periodically run a map reduce job on the table which will process each
 event type within a specific time range.  So, the map reduce job needs to
 process multiple segments of the table as input, and therefore can't be
 setup with a single scan.  (Using a filter on the scan would theoretically
 work, but doesn't scale well as the data size increases.)

 Given that the HBase provided TableMapReduceUtil.initTableMapperJob only
 supports a single scan there doesn't appear to be a built in way to run a
 mapreduce job that has multiple scans as input.  I found the following
 related post which points me to creating my own map reduce InputFormat
 type by extending HBase's TableInputFormatBase and overriding the
 getSplits() method:


 http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects

 So, that's currently the direction I'm heading.  However, before I got too
 far in the weeds I thought I'd ask:

 1. Is this still the best/right way to handle this situation?

 2. Does anyone have an example of a custom InputFormat that sets up
 multiple scans against an HBase input table (something like the
 MultiSegmentTableInputFormat referred to in the post) that they'd be
 willing to share?

 Thanks,

   -Shawn



Re: HBase MapReduce Job with Multiple Scans

2012-04-03 Thread Shawn Quinn
Sounds good, thanks Ted.  I'll give it a whirl and add any
comments/findings to the Jira issue.

 -Shawn

On Tue, Apr 3, 2012 at 10:45 AM, Ted Yu yuzhih...@gmail.com wrote:

 Stack said he might help implement his suggestions if Eran is busy.

 The patch doesn't depend on recent changes to the Hadoop/MapReduce.

 Give it a try. Feedback would help us refine the patch.

 Thanks

 On Tue, Apr 3, 2012 at 7:43 AM, Shawn Quinn squ...@moxiegroup.com wrote:

  Thanks for the quick reply Ted!  That's exactly what I'm looking for.
  Reading through the Jira comments I'm a bit confused on what the
  status/plan is with that patch.  Do you expect that will be included in
 the
  next HBase release, or has it been postponed?  Also, does that change
  depend on any recent changes to the Hadoop/MapReduce, or will it work
  as-is?
 
  In the meantime, I'll give that patch a closer look and setup some custom
  classes in my own project to try and pull off something similar.
 
  -Shawn
 
  On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   Take a look at HBASE-3996 where Stack has some comments outstanding.
  
   Cheers
  
   On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn squ...@moxiegroup.com
  wrote:
  
Hello,
   
I have a table whose key is structured as eventType + time, and I
  need
   to
periodically run a map reduce job on the table which will process
 each
event type within a specific time range.  So, the map reduce job
 needs
  to
process multiple segments of the table as input, and therefore can't
 be
setup with a single scan.  (Using a filter on the scan would
   theoretically
work, but doesn't scale well as the data size increases.)
   
Given that the HBase provided TableMapReduceUtil.initTableMapperJob
   only
supports a single scan there doesn't appear to be a built in way to
   run a
mapreduce job that has multiple scans as input.  I found the
 following
related post which points me to creating my own map reduce
  InputFormat
type by extending HBase's TableInputFormatBase and overriding the
getSplits() method:
   
   
   
  
 
 http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
   
So, that's currently the direction I'm heading.  However, before I
 got
   too
far in the weeds I thought I'd ask:
   
1. Is this still the best/right way to handle this situation?
   
2. Does anyone have an example of a custom InputFormat that sets up
multiple scans against an HBase input table (something like the
MultiSegmentTableInputFormat referred to in the post) that they'd
 be
willing to share?
   
Thanks,
   
  -Shawn
   
  
 



Re: hbase mapreduce running though command line

2011-12-10 Thread Vamshi Krishna
I tried to run the program from Eclipse, but while it ran I could not see
any job on the jobtracker/tasktracker web UI pages. I observed that Eclipse
was executing it with the LocalJobRunner, so the job was not submitted to the
whole cluster but ran on the namenode machine alone.
So I thought of generating a jar of my whole project from Eclipse, in which
the HBase conf folder was added as an external class folder on its build
path. When I try to jar my project folder with the help of Eclipse
(File > Export > Java > JAR file), an alert comes up saying 'the jar has been
generated except for the conf folder in the build path'. So, without the conf
folder in the jar, I tried to execute from the command line, as I mentioned
in the last mail.

How can I run my mapreduce program on the Hadoop cluster from Eclipse?
Are there any configuration settings I should make? Please help.
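
A minimal driver sketch for submitting to the real cluster instead of the
LocalJobRunner follows, assuming Hadoop 0.20/1.x property names; the host names
and jar path are placeholders, and the poster's own map-only class would be
plugged in where the comment indicates. The HBase conf directory does not
belong inside the jar; it only has to be on the classpath of the JVM that
submits the job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SubmitFromIde {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();      // reads hbase-site.xml from the classpath
        // placeholders: point the client at the cluster instead of the local defaults
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");
        // the IDE runs classes from a directory, so ship a pre-built jar to the tasktrackers
        conf.set("mapred.jar", "/path/to/MyJavaProg.jar");

        Job job = new Job(conf, "load-hbase-table");
        job.setJarByClass(SubmitFromIde.class);
        // job.setMapperClass(MyLoadMapper.class);             // the map-only class that creates the Puts
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);  // TableOutputFormat + output types
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}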

On Fri, Dec 9, 2011 at 11:13 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 You don't need the conf dir in the jar, in fact you really don't want
 it there. I don't know where that alert is coming from, would be nice
 if you gave more details.

 J-D

 On Fri, Dec 9, 2011 at 6:45 AM, Vamshi Krishna vamshi2...@gmail.com
 wrote:
  Hi,
  i want to run mapreduce program to insert data to tables in hbase. my
  cluster has  3 machines. If i want to run that program through command
  line, where can i do so..? should i do
  ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source
  destn
  here MyJavaProg.jar is the jar of my java project , in which i wrote java
  main class named java_mainclass_file , where i wrote map reduce class
  consisting of map method only(i dont require reduce). i passed arguments
 to
  that main method, as arguments through Fileformat.addinput(..),
  Fileformat.setoutput(..) methods.  source and destn are locations inside
  DFS.
  when i tried to create jar of my whole java project from eclipse, i got
  alert that, 'conf directory of HBase was not exported to jar', So how to
  run my program through command line to insert data into hbase table..??
 can
  anybody help??
 
  --
  *Regards*
  *
  Vamshi Krishna
  *




-- 
*Regards*
*
Vamshi Krishna
*


hbase mapreduce running though command line

2011-12-09 Thread Vamshi Krishna
Hi,
i want to run mapreduce program to insert data to tables in hbase. my
cluster has  3 machines. If i want to run that program through command
line, where can i do so..? should i do
${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source
destn
here MyJavaProg.jar is the jar of my java project , in which i wrote java
main class named java_mainclass_file , where i wrote map reduce class
consisting of map method only(i dont require reduce). i passed arguments to
that main method, as arguments through Fileformat.addinput(..),
Fileformat.setoutput(..) methods.  source and destn are locations inside
DFS.
when i tried to create jar of my whole java project from eclipse, i got
alert that, 'conf directory of HBase was not exported to jar', So how to
run my program through command line to insert data into hbase table..?? can
anybody help??

-- 
*Regards*
*
Vamshi Krishna
*


Re: hbase mapreduce running though command line

2011-12-09 Thread Jean-Daniel Cryans
You don't need the conf dir in the jar, in fact you really don't want
it there. I don't know where that alert is coming from, would be nice
if you gave more details.

J-D

On Fri, Dec 9, 2011 at 6:45 AM, Vamshi Krishna vamshi2...@gmail.com wrote:
 Hi,
 i want to run mapreduce program to insert data to tables in hbase. my
 cluster has  3 machines. If i want to run that program through command
 line, where can i do so..? should i do
 ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source
 destn
 here MyJavaProg.jar is the jar of my java project , in which i wrote java
 main class named java_mainclass_file , where i wrote map reduce class
 consisting of map method only(i dont require reduce). i passed arguments to
 that main method, as arguments through Fileformat.addinput(..),
 Fileformat.setoutput(..) methods.  source and destn are locations inside
 DFS.
 when i tried to create jar of my whole java project from eclipse, i got
 alert that, 'conf directory of HBase was not exported to jar', So how to
 run my program through command line to insert data into hbase table..?? can
 anybody help??

 --
 *Regards*
 *
 Vamshi Krishna
 *


Re: HBase MapReduce Zookeeper

2011-11-20 Thread Randy D. Wallace Jr.

I had the same issue.

The problem for me turned out to be that the hbase.zookeeper.quorum was
not set in hbase-site.xml on the server that submitted the mapreduce
job.  Ironically, this is also the same server that was running the hbase
master.  The quorum defaulted to 127.0.0.1, which is where the tasktrackers
were then trying to initiate the connection to.  The zookeeper quorum
setting on the tasktrackers themselves turned out to be irrelevant.
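
In code, the fix amounts to making sure the Configuration in the submitting
JVM carries the real quorum before the job is serialized. A hedged fragment,
with placeholder host names (imports assumed: Configuration,
HBaseConfiguration, Job):

Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml if it is on the classpath
// explicit fallback for the case described above, where the submitting host's config is incomplete
conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
Job job = new Job(conf, "my-hbase-job");
// ... TableMapReduceUtil.initTableMapperJob(...) as usual; the quorum now travels with the job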


Hbase Mapreduce jobs Dashboard

2011-09-12 Thread Jimson K. James
Hi All,

 

When I run Hadoop mapreduce jobs, the job statistics and status are
displayed in the jobtracker/tasktracker UI. But when I run an HBase mapreduce
job, nothing shows up there.

Is there any hbase mapreduce dashboard available or am I missing
something?

 

 

Thanks & Regards

Jimson K James

 

The Quieter You Become The More You Are Able To Hear.

 



Re: Hbase Mapreduce jobs Dashboard

2011-09-12 Thread Joey Echeverria
HBase doesn't have it's own MapReduce system, it uses Hadoop's. How
are you launching your jobs?

On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James
jimson.ja...@nestgroup.net wrote:
 Hi All,



 When I run Hadoop mapreduce jobs, the job statistics and status is
 displayed in jobtracker/task tracker. But when I use HBase mapreduce it
 doesn't.

 Is there any hbase mapreduce dashboard available or am I missing
 something?





 Thanks  Regards

 Jimson K James



 The Quieter You Become The More You Are Able To Hear.







-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


RE: Hbase Mapreduce jobs Dashboard

2011-09-12 Thread Jimson K. James
Hi,

 How are you launching your jobs?
Testing from both a cygwin environment and from inside the hadoop/HBase
cluster using hadoop jar mrtest.jar


-Original Message-
From: Joey Echeverria [mailto:j...@cloudera.com] 
Sent: Monday, September 12, 2011 4:47 PM
To: user@hbase.apache.org
Subject: Re: Hbase Mapreduce jobs Dashboard

HBase doesn't have it's own MapReduce system, it uses Hadoop's. How
are you launching your jobs?

On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James
jimson.ja...@nestgroup.net wrote:
 Hi All,



 When I run Hadoop mapreduce jobs, the job statistics and status is
 displayed in jobtracker/task tracker. But when I use HBase mapreduce
it
 doesn't.

 Is there any hbase mapreduce dashboard available or am I missing
 something?





 Thanks  Regards

 Jimson K James



 The Quieter You Become The More You Are Able To Hear.







-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: Hbase Mapreduce jobs Dashboard

2011-09-12 Thread Harsh J
Jimson,

This is probably related to your other question on the list where it's
apparent that the HBase jobs are somehow running on a LocalJobRunner
instead of a proper MapReduce cluster. This is why the JT doesn't
display any knowledge about it.
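
A quick way to confirm this from the submitting code, assuming Hadoop 0.20/1.x
property names (imports assumed: Configuration, HBaseConfiguration):

Configuration conf = HBaseConfiguration.create();
if ("local".equals(conf.get("mapred.job.tracker", "local"))) {
    System.err.println("mapred-site.xml is not on the classpath - the job will run in the"
            + " LocalJobRunner and never show up on the JobTracker UI");
}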

On Mon, Sep 12, 2011 at 5:04 PM, Jimson K. James
jimson.ja...@nestgroup.net wrote:
 Hi,

 How are you launching your jobs?
 Testing from both a cygwin environment and from inside the hadoop/HBase
 cluster using hadoop jar mrtest.jar


 -Original Message-
 From: Joey Echeverria [mailto:j...@cloudera.com]
 Sent: Monday, September 12, 2011 4:47 PM
 To: user@hbase.apache.org
 Subject: Re: Hbase Mapreduce jobs Dashboard

 HBase doesn't have it's own MapReduce system, it uses Hadoop's. How
 are you launching your jobs?

 On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James
 jimson.ja...@nestgroup.net wrote:
 Hi All,



 When I run Hadoop mapreduce jobs, the job statistics and status is
 displayed in jobtracker/task tracker. But when I use HBase mapreduce
 it
 doesn't.

 Is there any hbase mapreduce dashboard available or am I missing
 something?





 Thanks  Regards

 Jimson K James



 The Quieter You Become The More You Are Able To Hear.







 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434




-- 
Harsh J


Fwd: HBase Mapreduce cannot find Map class

2011-07-28 Thread air
-- Forwarded message --
From: air cnwe...@gmail.com
Date: 2011/7/28
Subject: HBase Mapreduce cannot find Map class
To: CDH Users cdh-u...@cloudera.org


import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class LoadToHBase extends Configured implements Tool {
    public static class XMap<K, V> extends MapReduceBase implements
            Mapper<LongWritable, Text, K, V> {
        private JobConf conf;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf),
                        "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        private HTable table;

        public void map(LongWritable key, Text value, OutputCollector<K, V> output,
                Reporter reporter) throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }

            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }

            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }

            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(),
                        delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        int exitCode = ToolRunner.run(new HBaseConfiguration(),
                new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // TODO Auto-generated method stub
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }

}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
and it says:

..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
attempt_201107261532_2625_m_04_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127

Re: HBase MapReduce Zookeeper

2011-07-28 Thread Andre Reiter

this issue is still not resolved...

unfortunately, calling HConnectionManager.deleteConnection(conf, true); after
the MR job is finished does not close the connection to the zookeeper.
we have 3 zookeeper nodes;
by default there is a limit of 10 connections allowed from a single client.

so after running 30 MR jobs scheduled by our application, we have 30 unclosed
connections, and trying to start a new MR job results in a failure: the connection
to the zookeeper ensemble is dropped...

the workaround of restarting the whole application after 30 MR jobs is not very
elegant... :-(




Re: HBase MapReduce Zookeeper

2011-07-28 Thread Stack
Try getting the ZooKeeperWatcher from the connection on your way out
and explicitly shutdown the zk connection (see TestZooKeeper unit test
for example).
St.Ack

On Thu, Jul 28, 2011 at 6:01 AM, Andre Reiter a.rei...@web.de wrote:
 this issue is still not resolved...

 unfortunatelly calling HConnectionManager.deleteConnection(conf, true);
 after the MR job is finished, does not close the connection to the zookeeper
 we have 3 zookeeper nodes
 by default there is a limit of 10 connections allowed from a single client

 so after running 30 MR jobs scheduled by our application, we have 30
 unclosed connections, trying to start a new MR job results in a failure, the
 connection to the zookeeper ensamble is droped...

 the work around to restart the whole application after 30 MR jobss is not
 very elegant... :-(





Re: HBase MapReduce Zookeeper

2011-07-28 Thread Jeff Whiting
A 10-connection maximum is too low.  Going up to as many as 2000 connections has been recommended
on the list.  This doesn't fix your problem, but it is something you should probably have in your
configuration.


~Jeff

On 7/28/2011 10:00 AM, Stack wrote:

Try getting the ZooKeeperWatcher from the connection on your way out
and explicitly shutdown the zk connection (see TestZooKeeper unit test
for example).
St.Ack

On Thu, Jul 28, 2011 at 6:01 AM, Andre Reitera.rei...@web.de  wrote:

this issue is still not resolved...

unfortunatelly calling HConnectionManager.deleteConnection(conf, true);
after the MR job is finished, does not close the connection to the zookeeper
we have 3 zookeeper nodes
by default there is a limit of 10 connections allowed from a single client

so after running 30 MR jobs scheduled by our application, we have 30
unclosed connections, trying to start a new MR job results in a failure, the
connection to the zookeeper ensamble is droped...

the work around to restart the whole application after 30 MR jobss is not
very elegant... :-(





--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com



Re: HBase MapReduce Zookeeper

2011-07-28 Thread Ruben Quintero
This problem has come up a few times. There are leaked connections in the TIF

See: 
https://issues.apache.org/jira/browse/HBASE-3792
https://issues.apache.org/jira/browse/HBASE-3777


A quick and (very) dirty solution is to call deleteAllConnections(bool) at the 
end of your MapReduce jobs, or  periodically. If you have no other tables or 
pools, etc. open, then no  problem. If you do, they'll start throwing 
IOExceptions, but you can  re-instantiate them with a new config and then 
continue as usual. (You  do have to change the config or it'll simply grab the 
closed, cached one  from the HCM).

Another way: The leak comes from inside of TableInputFormat.setConf, where the 
Configuration gets cloned (so its hash key in the HCM is lost):
setHTable(new HTable(new Configuration(conf), tableName)); 

This is done to prevent changes to a config from affecting the job and 
vice-versa. If you're 100% sure the config won't be modified, you could 
subclass 
TIF to not make this copy.

For me, I didn't have extra tables hanging around, so I just blast them with 
deleteAllConnections. :)

- Ruben
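
A sketch of that blunt workaround in driver code; createScanJob() is a
hypothetical method that sets up the TableInputFormat job, and the call is
only safe when nothing else in the JVM is still using HTables or pools
(imports assumed: Job, HConnectionManager):

Job job = createScanJob();                        // hypothetical job setup using TableInputFormat
boolean ok = job.waitForCompletion(true);
// drops every cached connection, including the one leaked through the cloned Configuration
HConnectionManager.deleteAllConnections(true);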





From: Jeff Whiting je...@qualtrics.com
To: user@hbase.apache.org
Sent: Thu, July 28, 2011 12:10:16 PM
Subject: Re: HBase  MapReduce  Zookeeper

10 connection maximum is too low.  It has been recommended to go up to as many 
as 2000 connections 

in the list.  This doesn't fix your problem but is something you should 
probably 
have in your 

configuration.

~Jeff

On 7/28/2011 10:00 AM, Stack wrote:
 Try getting the ZooKeeperWatcher from the connection on your way out
 and explicitly shutdown the zk connection (see TestZooKeeper unit test
 for example).
 St.Ack

 On Thu, Jul 28, 2011 at 6:01 AM, Andre Reitera.rei...@web.de  wrote:
 this issue is still not resolved...

 unfortunatelly calling HConnectionManager.deleteConnection(conf, true);
 after the MR job is finished, does not close the connection to the zookeeper
 we have 3 zookeeper nodes
 by default there is a limit of 10 connections allowed from a single client

 so after running 30 MR jobs scheduled by our application, we have 30
 unclosed connections, trying to start a new MR job results in a failure, the
 connection to the zookeeper ensamble is droped...

 the work around to restart the whole application after 30 MR jobss is not
 very elegant... :-(




-- 
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com

Re: HBase MapReduce Zookeeper

2011-07-28 Thread Andre Reiter

i guess, i know the reason, why  HConnectionManager.deleteConnection(conf, 
true); does not work for me

in the MR job im using TableInputFormat, if you have a look at the source code 
in the method
public void setConf(Configuration configuration)
there is a line creating the HTable like this :

...
setHTable(new HTable(new Configuration(conf), tableName));
...

now the HTable is using a new configuration, why???

HConnectionManager holds all connections in a map with the config as the key:
private static final Map<Configuration, HConnectionImplementation>
HBASE_INSTANCE = ...

Configuration does not override the equals and hashCode methods, so the HConnection
created by the table using the new Configuration can not be referenced...


just looking at executed code using debugger, i found out, that there is even 
one place more, where another new configuration is created and used:
org.apache.hadoop.mapred.JobClient.submitJobInternal

...
JobContext context = new JobContext(jobCopy, jobId);
...

the JobContext constructor creates a new Configuration object: new 
org.apache.hadoop.mapred.JobConf(conf)

so there is no real chance for me to get the configuration, which was used 
during the call to HConnectionManager.getConnection(conf)



that is why HConnectionManager.deleteConnection(conf, true); does not work for 
me, or am i completely wrong???


just going crazy with that! i'm not doing anything very special, all what i 
want, is to run a MR job out of my application, and NOT by calling it from the 
shell like this:
./bin/hadoop jar /tmp/my.jar package.OurClass





Re: HBase MapReduce Zookeeper

2011-07-28 Thread Ruben Quintero
Yes, that's the connection leak.

Use deleteAllConnections(true), and it will close all open connections.

- Ruben





From: Andre Reiter a.rei...@web.de
To: user@hbase.apache.org
Sent: Thu, July 28, 2011 4:55:52 PM
Subject: Re: HBase  MapReduce  Zookeeper

i guess, i know the reason, why  HConnectionManager.deleteConnection(conf, 
true); does not work for me

in the MR job im using TableInputFormat, if you have a look at the source code 
in the method
public void setConf(Configuration configuration)
there is a line creating the HTable like this :

...
setHTable(new HTable(new Configuration(conf), tableName));
...

now the HTable is using a new configuration, why???

HConnectionManager holds all  Connections in a map with the config as a key:
private static final Map<Configuration, HConnectionImplementation>
HBASE_INSTANCE = ...

Configuration does not override the equals and hash methods, so the HConnection 
created by the table using the new Configuration can not be referenced...


just looking at executed code using debugger, i found out, that there is even 
one place more, where another new configuration is created and used:
org.apache.hadoop.mapred.JobClient.submitJobInternal

...
JobContext context = new JobContext(jobCopy, jobId);
...

the JobContext constructor creates a new Configuration object: new 
org.apache.hadoop.mapred.JobConf(conf)

so there is no real chance for me to get the configuration, which was used 
during the call to HConnectionManager.getConnection(conf)



that is why HConnectionManager.deleteConnection(conf, true); does not work for 
me, or am i completely wrong???


just going crazy with that! i'm not doing anything very special, all what i 
want, is to run a MR job out of my application, and NOT by calling it from the 
shell like this:
./bin/hadoop jar /tmp/my.jar package.OurClass

Re: HBase MapReduce Zookeeper

2011-07-28 Thread Stack
Or override that method and in your version do not clone the Configuration.
St.Ack
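
A minimal sketch of that override, extending TableInputFormatBase and
re-implementing only the setConf() plumbing without the new Configuration(conf)
clone. It assumes the job was set up through TableMapReduceUtil, so the whole
Scan travels in the serialized hbase.mapreduce.scan property; the individual
scan.* shortcut properties that the stock TableInputFormat also understands are
ignored here.

import java.io.IOException;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

public class SharedConfTableInputFormat extends TableInputFormatBase
        implements Configurable {

    private Configuration conf;

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;   // keep the job's own Configuration instance, no clone
        try {
            setHTable(new HTable(conf, conf.get(TableInputFormat.INPUT_TABLE)));
            String serializedScan = conf.get(TableInputFormat.SCAN);
            setScan(serializedScan != null
                    ? TableMapReduceUtil.convertStringToScan(serializedScan)
                    : new Scan());
        } catch (IOException e) {
            throw new RuntimeException("Failed to set up table input", e);
        }
    }
}

After the usual initTableMapperJob(...) call,
job.setInputFormatClass(SharedConfTableInputFormat.class) swaps it in, and
HConnectionManager.deleteConnection(conf, true) with the job's own conf can
then find the cached connection again.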

On Thu, Jul 28, 2011 at 2:28 PM, Ruben Quintero rfq_...@yahoo.com wrote:
 Yes, that's the connection leak.

 Use deleteAllConnections(true), and it will close all open connections.

 - Ruben




 
 From: Andre Reiter a.rei...@web.de
 To: user@hbase.apache.org
 Sent: Thu, July 28, 2011 4:55:52 PM
 Subject: Re: HBase  MapReduce  Zookeeper

 i guess, i know the reason, why  HConnectionManager.deleteConnection(conf,
 true); does not work for me

 in the MR job im using TableInputFormat, if you have a look at the source code
 in the method
 public void setConf(Configuration configuration)
 there is a line creating the HTable like this :

 ...
 setHTable(new HTable(new Configuration(conf), tableName));
 ...

 now the HTable is using a new configuration, why???

 HConnectionManager holds all  Connections in a map with the config as a key:
 private static final Map<Configuration, HConnectionImplementation>
 HBASE_INSTANCE = ...

 Configuration does not override the equals and hash methods, so the 
 HConnection
 created by the table using the new Configuration can not be referenced...


 just looking at executed code using debugger, i found out, that there is even
 one place more, where another new configuration is created and used:
 org.apache.hadoop.mapred.JobClient.submitJobInternal

 ...
 JobContext context = new JobContext(jobCopy, jobId);
 ...

 the JobContext constructor creates a new Configuration object: new
 org.apache.hadoop.mapred.JobConf(conf)

 so there is no real chance for me to get the configuration, which was used
 during the call to HConnectionManager.getConnection(conf)



 that is why HConnectionManager.deleteConnection(conf, true); does not work for
 me, or am i completely wrong???


 just going crazy with that! i'm not doing anything very special, all what i
 want, is to run a MR job out of my application, and NOT by calling it from the
 shell like this:
 ./bin/hadoop jar /tmp/my.jar package.OurClass


Re: HBase MapReduce Zookeeper

2011-07-28 Thread Andre Reiter

hi Ruben, St.Ack

thanks a lot for your help!

finally, the problem seems to be solved by a pretty sick workaround
i did it like Bryan Keller described in this issue: 
https://issues.apache.org/jira/browse/HBASE-3792

@Ruben: thanks for the urls to that issues

cheers
andre



Re: HBase Mapreduce cannot find Map class

2011-07-28 Thread Gan, Xiyun
Maybe job.setJarByClass() can solve this problem.

On Thu, Jul 28, 2011 at 7:06 PM, air cnwe...@gmail.com wrote:
 -- Forwarded message --
 From: air cnwe...@gmail.com
 Date: 2011/7/28
 Subject: HBase Mapreduce cannot find Map class
 To: CDH Users cdh-u...@cloudera.org


 import java.io.IOException;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.lib.NullOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;


 public class LoadToHBase extends Configured implements Tool {
     public static class XMap<K, V> extends MapReduceBase implements
             Mapper<LongWritable, Text, K, V> {
         private JobConf conf;

         @Override
         public void configure(JobConf conf) {
             this.conf = conf;
             try {
                 this.table = new HTable(new HBaseConfiguration(conf),
                         "observations");
             } catch (IOException e) {
                 throw new RuntimeException("Failed HTable construction", e);
             }
         }

         @Override
         public void close() throws IOException {
             super.close();
             table.close();
         }

         private HTable table;

         public void map(LongWritable key, Text value, OutputCollector<K, V> output,
                 Reporter reporter) throws IOException {
             String[] valuelist = value.toString().split("\t");
             SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
             Date addtime = null; // user registration time
             Date ds = null;
             Long delta_days = null;
             String uid = valuelist[0];
             try {
                 addtime = sdf.parse(valuelist[1]);
             } catch (ParseException e) {
                 // TODO Auto-generated catch block
                 e.printStackTrace();
             }

             String ds_str = conf.get("load.hbase.ds", null);
             if (ds_str != null) {
                 try {
                     ds = sdf.parse(ds_str);
                 } catch (ParseException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
                 }
             } else {
                 ds_str = "2011-07-28";
             }

             if (addtime != null && ds != null) {
                 delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
             }

             if (delta_days != null) {
                 byte[] rowKey = uid.getBytes();
                 Put p = new Put(rowKey);
                 p.add("content".getBytes(), "attr1".getBytes(),
                         delta_days.toString().getBytes());
                 table.put(p);
             }
         }
     }

     /**
      * @param args
      * @throws Exception
      */
     public static void main(String[] args) throws Exception {
         // TODO Auto-generated method stub
         int exitCode = ToolRunner.run(new HBaseConfiguration(),
                 new LoadToHBase(), args);
         System.exit(exitCode);
     }

     @Override
     public int run(String[] args) throws Exception {
         // TODO Auto-generated method stub
         JobConf conf = new JobConf(getClass());
         TableMapReduceUtil.addDependencyJars(conf);
         FileInputFormat.addInputPath(conf, new Path(args[0]));
         conf.setJobName("LoadToHBase");
         conf.setJarByClass(getClass());
         conf.setMapperClass(XMap.class);
         conf.setNumReduceTasks(0);
         conf.setOutputFormat(NullOutputFormat.class);
         JobClient.runJob(conf);
         return 0;
     }

 }

 execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
 and it says:

 ..
 11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
 attempt_201107261532_2625_m_04_1, Status : FAILED
 java.lang.RuntimeException: Error in configuring object
at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method

Re: HBase MapReduce Zookeeper

2011-07-20 Thread Andre Reiter

Hi St.Ack,

thanks for your reply
but I still miss the point: what would be the options to solve our issue?

andre



Re: HBase MapReduce Zookeeper

2011-07-20 Thread Stack
Can you reuse Configuration instances though the configuration changes?

Else in your Mapper#cleanup, call HTable.close() then try
HConnectionManager.deleteConnection(table.getConfiguration()) after
close (could be issue with executors used by multi* operations not
completing before the delete of the connection finishes).

St.Ack
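
A small sketch of what that cleanup could look like inside a mapper that opened
its own table in setup(); the table name "lookup" and the class names are
invented for illustration (imports assumed: Configuration, HTable,
HConnectionManager, TableMapper, ImmutableBytesWritable, Put, IOException):

public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

    private HTable sideTable;   // hypothetical extra table opened per task

    @Override
    protected void setup(Context context) throws IOException {
        sideTable = new HTable(context.getConfiguration(), "lookup");
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (sideTable != null) {
            Configuration tableConf = sideTable.getConfiguration();
            sideTable.close();
            // drop the cached connection keyed by the exact Configuration the table was built with
            HConnectionManager.deleteConnection(tableConf, true);
        }
    }

    // map(...) omitted
}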

On Tue, Jul 19, 2011 at 11:36 PM, Andre Reiter a.rei...@web.de wrote:
 Hi St.Ack,

 thanks for your reply
 but funally i miss the point, what would be the options to solve our issue?

 andre




Re: HBase MapReduce Zookeeper

2011-07-20 Thread Andre Reiter

Hi Stack,

just to make clear, actually the connections to the zookeeper being kept are 
not on our mappers (tasktrackers) but on the client, which schedules the MR job
i think, the mappers are just fine, as they are

andre



Stack wrote:

Can you reuse Configuration instances though the configuration changes?

Else in your Mapper#cleanup, call HTable.close() then try
HConnectionManager.deleteConnection(table.getConfiguration()) after
close (could be issue with executors used by multi* operations not
completing before delete of connection finishes.

St.Ack




Re: HBase MapReduce Zookeeper

2011-07-20 Thread Stack
Then similarly, can you do the deleteConnection above in your client
or reuse the Configuration client-side that you use setting up the
job?
St.Ack

On Wed, Jul 20, 2011 at 12:13 AM, Andre Reiter a.rei...@web.de wrote:
 Hi Stack,

 just to make clear, actually the connections to the zookeeper being kept are
 not on our mappers (tasktrackers) but on the client, which schedules the MR
 job
 i think, the mappers are just fine, as they are

 andre



 Stack wrote:

 Can you reuse Configuration instances though the configuration changes?

 Else in your Mapper#cleanup, call HTable.close() then try
 HConnectionManager.deleteConnection(table.getConfiguration()) after
 close (could be issue with executors used by multi* operations not
 completing before delete of connection finishes.

 St.Ack




Re: HBase MapReduce Zookeeper

2011-07-20 Thread Andre Reiter

Hi St.Ack,

actually calling HConnectionManager.deleteConnection(conf, true); does not 
close the connection to the zookeeper
i still can see the connection established...

andre



Stack wrote:

Then similarly, can you do the deleteConnection above in your client
or reuse the Configuration client-side that you use setting up the
job?
St.Ack




