Re: input split for hbase mapreduce
Please take a look at the map() method of the Mapper classes in the code base, e.g.
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/GroupingTableMapper.java

On Tue, May 9, 2017 at 2:51 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> Hi
>
> If I am running mapreduce on hbase tables what will be the input to
> mapper function
>
> Thanks
>
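In short: each map() call receives one row of the scanned table, with the row key as an ImmutableBytesWritable and that row's cells bundled in a Result. A minimal sketch of such a mapper (the class name and the Text/Text output types are arbitrary placeholders, not taken from GroupingTableMapper):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Sketch: one map() call per row -- the row key, plus a Result holding that row's cells.
public class RowKeyEchoMapper extends TableMapper<Text, Text> {
  @Override
  protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    // Inspect the row with columns.rawCells() or columns.getValue(family, qualifier);
    // here we just echo the row key.
    context.write(new Text(Bytes.toString(rowKey.get())), new Text(""));
  }
}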
input split for hbase mapreduce
Hi,

If I am running MapReduce on HBase tables, what will be the input to the mapper function?

Thanks
Re: HBase mapreduce job crawls on final 25% of maps
It appears that my issue was caused by the missing sections I mentioned in the second post. I ran a job with these settings, and my job finished in < 6 hours. Thanks for your suggestions because I have further ideas regarding issues moving forward. scan.setCaching(500);// 1 is the default in Scan, which will be bad for MapReduce jobs scan.setCacheBlocks(false); // don't set to true for MR jobs On Wed, Apr 13, 2016 at 7:32 AM, Colin Kincaid Williamswrote: > Hi Chien, > > 4. From 50-150k per * second * to 100-150k per * minute *, as stated > above, so reads went *DOWN* significantly. I think you must have > misread. > > I will take into account some of your other suggestions. > > Thanks, > > Colin > > On Tue, Apr 12, 2016 at 8:19 PM, Chien Le wrote: >> Some things I would look at: >> 1. Node statistics, both the mapper and regionserver nodes. Make sure >> they're on fully healthy nodes (no disk issues, no half duplex, etc) and >> that they're not already saturated from other jobs. >> 2. Is there a common regionserver behind the remaining mappers/regions? If >> so, try moving some regions off to spread the load. >> 3. Verify the locality of the region blocks to the regionserver. If you >> don't automate major compacts or have moved regions recently, mapper >> locality might not help. Major compact if needed or move regions if you can >> determine source? >> 4. You mentioned that the requests per sec has gone from 50-150k to >> 100-150k. Was that a typo? Did the read rate really increase? >> 5. You've listed the region sizes but was that done with a cursory hadoop >> fs du? Have you tried using the hfile analyzer to verify number of rows and >> sizes are roughly the same? >> 5. profile the mappers. If you can share the task counters for a completed >> and a still running task to compare, it might help find the issue >> 6. I don't think you should underestimate the perf gains of node local >> tasks vs just rack local, especially if short circuit reads are enabled. >> This is a big gamble unfortunately given how far your tasks have been >> running already so I'd look at this as a last resort >> >> >> HTH, >> Chien >> >> On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams >> wrote: >> >>> I've noticed that I've omitted >>> >>> scan.setCaching(500);// 1 is the default in Scan, which will >>> be bad for MapReduce jobs >>> scan.setCacheBlocks(false); // don't set to true for MR jobs >>> >>> which appear to be suggestions from examples. Still I am not sure if >>> this explains the significant request slowdown on the final 25% of the >>> jobs. >>> >>> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams >>> wrote: >>> > Excuse my double post. I thought I deleted my draft, and then >>> > constructed a cleaner, more detailed, more readable mail. >>> > >>> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams >>> wrote: >>> >> After trying to get help with distcp on hadoop-user and cdh-user >>> >> mailing lists, I've given up on trying to use distcp and exporttable >>> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 >>> >> >>> >> I've been working on an hbase map reduce job to serialize my entries >>> >> and insert them into kafka. Then I plan to re-import them into >>> >> cdh5.3.0. >>> >> >>> >> Currently I'm having trouble with my map-reduce job. I have 43 maps, >>> >> 33 which have finished successfully, and 10 which are currently still >>> >> running. I had previously seen requests of 50-150k per second. Now for >>> >> the final 10 maps, I'm seeing 100-150k per minute. 
>>> >> >>> >> I might also mention that there were 6 failures near the application >>> >> start. Unfortunately, I cannot read the logs for these 6 failures. >>> >> There is an exception related to the yarn logging for these maps, >>> >> maybe because they failed to start. >>> >> >>> >> I had a look around HDFS. It appears that the regions are all between >>> >> 5-10GB. The longest completed map so far took 7 hours, with the >>> >> majority appearing to take around 3.5 hours . >>> >> >>> >> The remaining 10 maps have each been running between 23-27 hours. >>> >> >>> >> Considering data locality issues. 6 of the remaining jobs are running >>> >> on the same rack. Then the other 4 are split between my other two >>> >> racks. There should currently be a replica on each rack, since it >>> >> appears the replicas are set to 3. Then I'm not sure this is really >>> >> the cause of the slowdown. >>> >> >>> >> Then I'm looking for advice on what I can do to troubleshoot my job. >>> >> I'm setting up my map job like: >>> >> >>> >> main(String[] args){ >>> >> ... >>> >> Scan fromScan = new Scan(); >>> >> System.out.println(fromScan); >>> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, >>> Map.class, >>> >> null, null, job, true, TableInputFormat.class); >>> >> >>> >> // My guess is this contols the output
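For completeness, a minimal sketch of where those two Scan settings go in the driver, mirroring the initTableMapperJob call quoted above (class, table, and job names are placeholders and the mapper body is a stub):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ExportDriver {

  // Stand-in for the Map class above; real logic would serialize each row and send it to Kafka.
  static class ExportMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      // process row ...
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-export");   // Job.getInstance(conf, ...) on newer Hadoop
    job.setJarByClass(ExportDriver.class);

    Scan fromScan = new Scan();
    fromScan.setCaching(500);        // default is 1 row per RPC, far too chatty for a full-table scan
    fromScan.setCacheBlocks(false);  // a one-pass MR scan should not churn the region server block cache

    TableMapReduceUtil.initTableMapperJob("fromTable", fromScan, ExportMapper.class,
        null, null, job, true, TableInputFormat.class);

    job.setOutputFormatClass(NullOutputFormat.class);  // map-only job, nothing written by the framework
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}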
Re: HBase mapreduce job crawls on final 25% of maps
Hi Chien, 4. From 50-150k per * second * to 100-150k per * minute *, as stated above, so reads went *DOWN* significantly. I think you must have misread. I will take into account some of your other suggestions. Thanks, Colin On Tue, Apr 12, 2016 at 8:19 PM, Chien Lewrote: > Some things I would look at: > 1. Node statistics, both the mapper and regionserver nodes. Make sure > they're on fully healthy nodes (no disk issues, no half duplex, etc) and > that they're not already saturated from other jobs. > 2. Is there a common regionserver behind the remaining mappers/regions? If > so, try moving some regions off to spread the load. > 3. Verify the locality of the region blocks to the regionserver. If you > don't automate major compacts or have moved regions recently, mapper > locality might not help. Major compact if needed or move regions if you can > determine source? > 4. You mentioned that the requests per sec has gone from 50-150k to > 100-150k. Was that a typo? Did the read rate really increase? > 5. You've listed the region sizes but was that done with a cursory hadoop > fs du? Have you tried using the hfile analyzer to verify number of rows and > sizes are roughly the same? > 5. profile the mappers. If you can share the task counters for a completed > and a still running task to compare, it might help find the issue > 6. I don't think you should underestimate the perf gains of node local > tasks vs just rack local, especially if short circuit reads are enabled. > This is a big gamble unfortunately given how far your tasks have been > running already so I'd look at this as a last resort > > > HTH, > Chien > > On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williams > wrote: > >> I've noticed that I've omitted >> >> scan.setCaching(500);// 1 is the default in Scan, which will >> be bad for MapReduce jobs >> scan.setCacheBlocks(false); // don't set to true for MR jobs >> >> which appear to be suggestions from examples. Still I am not sure if >> this explains the significant request slowdown on the final 25% of the >> jobs. >> >> On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams >> wrote: >> > Excuse my double post. I thought I deleted my draft, and then >> > constructed a cleaner, more detailed, more readable mail. >> > >> > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams >> wrote: >> >> After trying to get help with distcp on hadoop-user and cdh-user >> >> mailing lists, I've given up on trying to use distcp and exporttable >> >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 >> >> >> >> I've been working on an hbase map reduce job to serialize my entries >> >> and insert them into kafka. Then I plan to re-import them into >> >> cdh5.3.0. >> >> >> >> Currently I'm having trouble with my map-reduce job. I have 43 maps, >> >> 33 which have finished successfully, and 10 which are currently still >> >> running. I had previously seen requests of 50-150k per second. Now for >> >> the final 10 maps, I'm seeing 100-150k per minute. >> >> >> >> I might also mention that there were 6 failures near the application >> >> start. Unfortunately, I cannot read the logs for these 6 failures. >> >> There is an exception related to the yarn logging for these maps, >> >> maybe because they failed to start. >> >> >> >> I had a look around HDFS. It appears that the regions are all between >> >> 5-10GB. The longest completed map so far took 7 hours, with the >> >> majority appearing to take around 3.5 hours . 
>> >> >> >> The remaining 10 maps have each been running between 23-27 hours. >> >> >> >> Considering data locality issues. 6 of the remaining jobs are running >> >> on the same rack. Then the other 4 are split between my other two >> >> racks. There should currently be a replica on each rack, since it >> >> appears the replicas are set to 3. Then I'm not sure this is really >> >> the cause of the slowdown. >> >> >> >> Then I'm looking for advice on what I can do to troubleshoot my job. >> >> I'm setting up my map job like: >> >> >> >> main(String[] args){ >> >> ... >> >> Scan fromScan = new Scan(); >> >> System.out.println(fromScan); >> >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, >> Map.class, >> >> null, null, job, true, TableInputFormat.class); >> >> >> >> // My guess is this contols the output type for the reduce function >> >> base on setOutputKeyClass and setOutput value class from p.27 . Since >> >> there is no reduce step, then this is currently null. >> >> job.setOutputFormatClass(NullOutputFormat.class); >> >> job.setNumReduceTasks(0); >> >> job.submit(); >> >> ... >> >> } >> >> >> >> I'm not performing a reduce step, and I'm traversing row keys like >> >> >> >> map(final ImmutableBytesWritable fromRowKey, >> >> Result fromResult, Context context) throws IOException { >> >> ... >> >> // should I assume that each keyvalue is a version of the stored >> row? >> >> for
Re: HBase mapreduce job crawls on final 25% of maps
Some things I would look at: 1. Node statistics, both the mapper and regionserver nodes. Make sure they're on fully healthy nodes (no disk issues, no half duplex, etc) and that they're not already saturated from other jobs. 2. Is there a common regionserver behind the remaining mappers/regions? If so, try moving some regions off to spread the load. 3. Verify the locality of the region blocks to the regionserver. If you don't automate major compacts or have moved regions recently, mapper locality might not help. Major compact if needed or move regions if you can determine source? 4. You mentioned that the requests per sec has gone from 50-150k to 100-150k. Was that a typo? Did the read rate really increase? 5. You've listed the region sizes but was that done with a cursory hadoop fs du? Have you tried using the hfile analyzer to verify number of rows and sizes are roughly the same? 5. profile the mappers. If you can share the task counters for a completed and a still running task to compare, it might help find the issue 6. I don't think you should underestimate the perf gains of node local tasks vs just rack local, especially if short circuit reads are enabled. This is a big gamble unfortunately given how far your tasks have been running already so I'd look at this as a last resort HTH, Chien On Tue, Apr 12, 2016 at 3:59 PM, Colin Kincaid Williamswrote: > I've noticed that I've omitted > > scan.setCaching(500);// 1 is the default in Scan, which will > be bad for MapReduce jobs > scan.setCacheBlocks(false); // don't set to true for MR jobs > > which appear to be suggestions from examples. Still I am not sure if > this explains the significant request slowdown on the final 25% of the > jobs. > > On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams > wrote: > > Excuse my double post. I thought I deleted my draft, and then > > constructed a cleaner, more detailed, more readable mail. > > > > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams > wrote: > >> After trying to get help with distcp on hadoop-user and cdh-user > >> mailing lists, I've given up on trying to use distcp and exporttable > >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 > >> > >> I've been working on an hbase map reduce job to serialize my entries > >> and insert them into kafka. Then I plan to re-import them into > >> cdh5.3.0. > >> > >> Currently I'm having trouble with my map-reduce job. I have 43 maps, > >> 33 which have finished successfully, and 10 which are currently still > >> running. I had previously seen requests of 50-150k per second. Now for > >> the final 10 maps, I'm seeing 100-150k per minute. > >> > >> I might also mention that there were 6 failures near the application > >> start. Unfortunately, I cannot read the logs for these 6 failures. > >> There is an exception related to the yarn logging for these maps, > >> maybe because they failed to start. > >> > >> I had a look around HDFS. It appears that the regions are all between > >> 5-10GB. The longest completed map so far took 7 hours, with the > >> majority appearing to take around 3.5 hours . > >> > >> The remaining 10 maps have each been running between 23-27 hours. > >> > >> Considering data locality issues. 6 of the remaining jobs are running > >> on the same rack. Then the other 4 are split between my other two > >> racks. There should currently be a replica on each rack, since it > >> appears the replicas are set to 3. Then I'm not sure this is really > >> the cause of the slowdown. 
> >> > >> Then I'm looking for advice on what I can do to troubleshoot my job. > >> I'm setting up my map job like: > >> > >> main(String[] args){ > >> ... > >> Scan fromScan = new Scan(); > >> System.out.println(fromScan); > >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, > Map.class, > >> null, null, job, true, TableInputFormat.class); > >> > >> // My guess is this contols the output type for the reduce function > >> base on setOutputKeyClass and setOutput value class from p.27 . Since > >> there is no reduce step, then this is currently null. > >> job.setOutputFormatClass(NullOutputFormat.class); > >> job.setNumReduceTasks(0); > >> job.submit(); > >> ... > >> } > >> > >> I'm not performing a reduce step, and I'm traversing row keys like > >> > >> map(final ImmutableBytesWritable fromRowKey, > >> Result fromResult, Context context) throws IOException { > >> ... > >> // should I assume that each keyvalue is a version of the stored > row? > >> for (KeyValue kv : fromResult.raw()) { > >> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, > >> kv.getValue()); > >> //TODO: ADD counter for each qualifier > >> } > >> > >> > >> > >> I've also have a list of simple questions. > >> > >> Has anybody experienced a significant slowdown on map jobs related to > >> a portion of their hbase regions? If so what issues did you come > >> across? > >> > >>
Re: HBase mapreduce job crawls on final 25% of maps
I've noticed that I've omitted scan.setCaching(500);// 1 is the default in Scan, which will be bad for MapReduce jobs scan.setCacheBlocks(false); // don't set to true for MR jobs which appear to be suggestions from examples. Still I am not sure if this explains the significant request slowdown on the final 25% of the jobs. On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williamswrote: > Excuse my double post. I thought I deleted my draft, and then > constructed a cleaner, more detailed, more readable mail. > > On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams > wrote: >> After trying to get help with distcp on hadoop-user and cdh-user >> mailing lists, I've given up on trying to use distcp and exporttable >> to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 >> >> I've been working on an hbase map reduce job to serialize my entries >> and insert them into kafka. Then I plan to re-import them into >> cdh5.3.0. >> >> Currently I'm having trouble with my map-reduce job. I have 43 maps, >> 33 which have finished successfully, and 10 which are currently still >> running. I had previously seen requests of 50-150k per second. Now for >> the final 10 maps, I'm seeing 100-150k per minute. >> >> I might also mention that there were 6 failures near the application >> start. Unfortunately, I cannot read the logs for these 6 failures. >> There is an exception related to the yarn logging for these maps, >> maybe because they failed to start. >> >> I had a look around HDFS. It appears that the regions are all between >> 5-10GB. The longest completed map so far took 7 hours, with the >> majority appearing to take around 3.5 hours . >> >> The remaining 10 maps have each been running between 23-27 hours. >> >> Considering data locality issues. 6 of the remaining jobs are running >> on the same rack. Then the other 4 are split between my other two >> racks. There should currently be a replica on each rack, since it >> appears the replicas are set to 3. Then I'm not sure this is really >> the cause of the slowdown. >> >> Then I'm looking for advice on what I can do to troubleshoot my job. >> I'm setting up my map job like: >> >> main(String[] args){ >> ... >> Scan fromScan = new Scan(); >> System.out.println(fromScan); >> TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class, >> null, null, job, true, TableInputFormat.class); >> >> // My guess is this contols the output type for the reduce function >> base on setOutputKeyClass and setOutput value class from p.27 . Since >> there is no reduce step, then this is currently null. >> job.setOutputFormatClass(NullOutputFormat.class); >> job.setNumReduceTasks(0); >> job.submit(); >> ... >> } >> >> I'm not performing a reduce step, and I'm traversing row keys like >> >> map(final ImmutableBytesWritable fromRowKey, >> Result fromResult, Context context) throws IOException { >> ... >> // should I assume that each keyvalue is a version of the stored row? >> for (KeyValue kv : fromResult.raw()) { >> ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, >> kv.getValue()); >> //TODO: ADD counter for each qualifier >> } >> >> >> >> I've also have a list of simple questions. >> >> Has anybody experienced a significant slowdown on map jobs related to >> a portion of their hbase regions? If so what issues did you come >> across? >> >> Can I get a suggestion how to show which map corresponds to which >> region, so I can troubleshoot from there? 
Is this already logged >> somewhere by default, or is there a way to set this up with the >> TableMapReduceUtil.initTableMapperJob ? >> >> Any other suggestions would be appreciated.
Re: HBase mapreduce job crawls on final 25% of maps
Excuse my double post. I thought I deleted my draft, and then constructed a cleaner, more detailed, more readable mail. On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williamswrote: > After trying to get help with distcp on hadoop-user and cdh-user > mailing lists, I've given up on trying to use distcp and exporttable > to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 > > I've been working on an hbase map reduce job to serialize my entries > and insert them into kafka. Then I plan to re-import them into > cdh5.3.0. > > Currently I'm having trouble with my map-reduce job. I have 43 maps, > 33 which have finished successfully, and 10 which are currently still > running. I had previously seen requests of 50-150k per second. Now for > the final 10 maps, I'm seeing 100-150k per minute. > > I might also mention that there were 6 failures near the application > start. Unfortunately, I cannot read the logs for these 6 failures. > There is an exception related to the yarn logging for these maps, > maybe because they failed to start. > > I had a look around HDFS. It appears that the regions are all between > 5-10GB. The longest completed map so far took 7 hours, with the > majority appearing to take around 3.5 hours . > > The remaining 10 maps have each been running between 23-27 hours. > > Considering data locality issues. 6 of the remaining jobs are running > on the same rack. Then the other 4 are split between my other two > racks. There should currently be a replica on each rack, since it > appears the replicas are set to 3. Then I'm not sure this is really > the cause of the slowdown. > > Then I'm looking for advice on what I can do to troubleshoot my job. > I'm setting up my map job like: > > main(String[] args){ > ... > Scan fromScan = new Scan(); > System.out.println(fromScan); > TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class, > null, null, job, true, TableInputFormat.class); > > // My guess is this contols the output type for the reduce function > base on setOutputKeyClass and setOutput value class from p.27 . Since > there is no reduce step, then this is currently null. > job.setOutputFormatClass(NullOutputFormat.class); > job.setNumReduceTasks(0); > job.submit(); > ... > } > > I'm not performing a reduce step, and I'm traversing row keys like > > map(final ImmutableBytesWritable fromRowKey, > Result fromResult, Context context) throws IOException { > ... > // should I assume that each keyvalue is a version of the stored row? > for (KeyValue kv : fromResult.raw()) { > ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, > kv.getValue()); > //TODO: ADD counter for each qualifier > } > > > > I've also have a list of simple questions. > > Has anybody experienced a significant slowdown on map jobs related to > a portion of their hbase regions? If so what issues did you come > across? > > Can I get a suggestion how to show which map corresponds to which > region, so I can troubleshoot from there? Is this already logged > somewhere by default, or is there a way to set this up with the > TableMapReduceUtil.initTableMapperJob ? > > Any other suggestions would be appreciated.
HBase mapreduce job crawls on final 25% of maps
After trying to get help with distcp on hadoop-user and cdh-user mailing lists, I've given up on trying to use distcp and exporttable to migrate my hbase from .92.1 cdh4.1.3 to .98 on cdh5.3.0 I've been working on an hbase map reduce job to serialize my entries and insert them into kafka. Then I plan to re-import them into cdh5.3.0. Currently I'm having trouble with my map-reduce job. I have 43 maps, 33 which have finished successfully, and 10 which are currently still running. I had previously seen requests of 50-150k per second. Now for the final 10 maps, I'm seeing 100-150k per minute. I might also mention that there were 6 failures near the application start. Unfortunately, I cannot read the logs for these 6 failures. There is an exception related to the yarn logging for these maps, maybe because they failed to start. I had a look around HDFS. It appears that the regions are all between 5-10GB. The longest completed map so far took 7 hours, with the majority appearing to take around 3.5 hours . The remaining 10 maps have each been running between 23-27 hours. Considering data locality issues. 6 of the remaining jobs are running on the same rack. Then the other 4 are split between my other two racks. There should currently be a replica on each rack, since it appears the replicas are set to 3. Then I'm not sure this is really the cause of the slowdown. Then I'm looking for advice on what I can do to troubleshoot my job. I'm setting up my map job like: main(String[] args){ ... Scan fromScan = new Scan(); System.out.println(fromScan); TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class, null, null, job, true, TableInputFormat.class); // My guess is this contols the output type for the reduce function base on setOutputKeyClass and setOutput value class from p.27 . Since there is no reduce step, then this is currently null. job.setOutputFormatClass(NullOutputFormat.class); job.setNumReduceTasks(0); job.submit(); ... } I'm not performing a reduce step, and I'm traversing row keys like map(final ImmutableBytesWritable fromRowKey, Result fromResult, Context context) throws IOException { ... // should I assume that each keyvalue is a version of the stored row? for (KeyValue kv : fromResult.raw()) { ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, kv.getValue()); //TODO: ADD counter for each qualifier } I've also have a list of simple questions. Has anybody experienced a significant slowdown on map jobs related to a portion of their hbase regions? If so what issues did you come across? Can I get a suggestion how to show which map corresponds to which region, so I can troubleshoot from there? Is this already logged somewhere by default, or is there a way to set this up with the TableMapReduceUtil.initTableMapperJob ? Any other suggestions would be appreciated.
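On the "should I assume that each keyvalue is a version of the stored row?" question: each KeyValue returned by Result.raw() is one cell, i.e. one column qualifier at one timestamp, so a row yields one KeyValue per qualifier, and more than one per qualifier only if the Scan asks for multiple versions. A hedged sketch of the map() with the per-qualifier counter from the TODO (the class name, counter group, and the Kafka hand-off are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class SerializeRowMapper extends TableMapper<NullWritable, NullWritable> {
  @Override
  protected void map(ImmutableBytesWritable fromRowKey, Result fromResult, Context context)
      throws IOException, InterruptedException {
    // Result.raw() holds one KeyValue per cell: per qualifier, and per version if requested.
    for (KeyValue kv : fromResult.raw()) {
      String qualifier = Bytes.toString(kv.getQualifier());
      // Per-qualifier counter, as in the TODO; shows up in the task counters in the job UI.
      context.getCounter("qualifiers", qualifier).increment(1);
      // ... serialize kv.getValue() and hand it to the Kafka producer here ...
    }
  }
}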
HBASE mapReduce stoppage
We are bulk loading 1 billion rows into HBase. The 1-billion-row file was split into 20 files of ~22.5GB each. Ingesting a file into HDFS took ~2 min. Ingesting the first file into HBase took ~3 hours, the next took ~5 hours, and it keeps increasing. By the sixth or seventh file the ingestion just stops (the MapReduce bulk load stalls at 99% of the mappers and around 22% of the reducers). We also noticed that as soon as the reducers start, the progress of the job slows down. The logs did not show any problem and we do not see any hot spotting (the table is already salted). We are running out of ideas.

A few questions to get started:
1- Is the increasing MapReduce time expected? Does MapReduce need to sort the new data against the already ingested data?
2- Is there a way to speed this up, especially since our data is already sorted? From 2 min on HDFS to 5 hours on HBase is a big gap. A word count MapReduce on 24GB took only ~7 minutes. Removing the reducers from the existing CSV bulk load will not help, as the mappers would emit the data in a random order.

regards,
Dillon

Dillon Chrimes (PhD)
University of Victoria
Victoria BC Canada
Re: HBASE mapReduce stoppage
Hi Dilon, Sounds like your table was not pre-split from the behavior that you are describing, but when you say that you are bulk loading the data using MR is this a MR job that does Put(s) into HBase or just generating HFiles (if using importtsv you have both options) that are later on bulk loaded via the completebulkload command? Have you looked into https://hbase.apache.org/book.html#arch.bulk.load for how to perform a bulk load into HBase? cheers, esteban. -- Cloudera, Inc. On Wed, May 20, 2015 at 3:01 PM, dchri...@uvic.ca wrote: We are bulk loading 1 billion rows into hbase. The 1 billion file was split into 20 files of ~22.5GB. Ingesting the file to hdfs took ~2min. Ingesting the first file to hbase took ~3 hours. The next took ~5hours, then it is increasing. By the sixth or seventh file the ingestion just stops (mapReduce Bulk load stops at 99% of mapper and around 22% of the reducer). We also noticed that as soon as the reducers are starting, the progress of the job slows down. The logs did not show any problem and we do not see any hot spotting (the table is already salted). We are running out of ideas. Few questions to get started: 1- Is the increase MR expected? Does MR need to sort the new data again the already ingested one? 2- Is there a way to speed up this, especially that our data is already sorted? From 2min on hdfs to 5 hours on hbase is a big gap. A word count map reduce on 24GB took only ~7 minutes. Removing the reducers from the existing cvs bulk load will not help as the mappers will spit the data in a random order. regards, Dillon Dillon Chrimes (PhD) University of Victoria Victoria BC Canada
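A sketch of the HFile route Esteban is pointing at (table name, column family, and the CSV layout are placeholders; assumes HFileOutputFormat2 and LoadIncrementalHFiles as described in the book's bulk load chapter, HFileOutputFormat on older releases). configureIncrementalLoad() installs the TotalOrderPartitioner and a sorting reducer keyed on the table's current region boundaries, which is why the reduce phase cannot simply be dropped even for pre-sorted input, and why pre-splitting the table spreads the work:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Hypothetical mapper: one CSV line -> one Put keyed by the (salted) row key.
  static class CsvToPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",");
      byte[] rowKey = Bytes.toBytes(fields[0]);
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "csv-bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(CsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HFiles land here, not in the table yet

    HTable table = new HTable(conf, "my_salted_table");
    // Installs the reducer, partitioner and output format so the HFiles come out
    // sorted and partitioned by the table's current region boundaries.
    HFileOutputFormat2.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Atomically moves the finished HFiles under the matching regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
  }
}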
Re: HBase MapReduce in Kerberized cluster
I searched (current) 0.98 and branch-1 where I found: ./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java Looking at both 0.98[1] and 0.98.6[2] on github I see TokenUtil as part of hbase-server. Is it necessary for us to add this call to TokenUtil to all MR jobs interacting with HBase after kerberizing our clusters? Or is there a better approach that won't require making this change? Thanks, Ed Skoviak [1] https://github.com/apache/hbase/find/0.98.0 [2] https://github.com/apache/hbase/find/0.98.6
Re: HBase MapReduce in Kerberized cluster
Please take a look at HBASE-12493 User class should provide a way to re-use existing token which went into 0.98.9 FYI On Thu, May 14, 2015 at 8:37 AM, Edward C. Skoviak edward.skov...@gmail.com wrote: I searched (current) 0.98 and branch-1 where I found: ./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java Looking at both 0.98[1] and 0.98.6[2] on github I see TokenUtil as part of hbase-server. Is it necessary for us to add this call to TokenUtil to all MR jobs interacting with HBase after kerberizing our clusters? Or is there a better approach that won't require making this change? Thanks, Ed Skoviak [1] https://github.com/apache/hbase/find/0.98.0 [2] https://github.com/apache/hbase/find/0.98.6
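For the "better approach" question, one option (a sketch, assuming the Crunch pipeline gives you access to the underlying Job before submission) is to let TableMapReduceUtil, which HBase MR jobs already pull in, obtain the delegation token itself rather than calling TokenUtil directly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class SecureJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-read-job");
    job.setJarByClass(SecureJobSetup.class);
    // Obtains an HBase delegation token for the submitting (already kinit'ed) user and adds it
    // to the job's credentials, so the map tasks can talk to HBase without a keytab of their own.
    TableMapReduceUtil.initCredentials(job);
    // ... the usual initTableMapperJob(...) wiring and job.waitForCompletion(true) follow ...
  }
}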
HBase MapReduce in Kerberized cluster
I'm attempting to write a Crunch pipeline to read various rows from a table in HBase and then do processing on these results. I am doing this from a cluster deployed using CDH 5.3.2 running Kerberos and YARN. I was hoping to get an answer on what is considered the best approach to authenticate to HBase within MapReduce task execution context? I've perused various posts/documentation and it seems that TokenUtil was, at least at one point, the right approach, however I notice now it has been moved to be a part of the hbase-server package (instead of hbase-client). Is there a better way to retrieve and pass an HBase delegation token to the MR job launched by my pipeline? Thanks, Ed Skoviak
Re: HBase MapReduce in Kerberized cluster
bq. it has been moved to be a part of the hbase-server package I searched (current) 0.98 and branch-1 where I found: ./hbase-client/src/main/java/org/apache/hadoop/hbase/security/token/TokenUtil.java FYI On Wed, May 13, 2015 at 11:45 AM, Edward C. Skoviak edward.skov...@gmail.com wrote: I'm attempting to write a Crunch pipeline to read various rows from a table in HBase and then do processing on these results. I am doing this from a cluster deployed using CDH 5.3.2 running Kerberos and YARN. I was hoping to get an answer on what is considered the best approach to authenticate to HBase within MapReduce task execution context? I've perused various posts/documentation and it seems that TokenUtil was, at least at one point, the right approach, however I notice now it has been moved to be a part of the hbase-server package (instead of hbase-client). Is there a better way to retrieve and pass an HBase delegation token to the MR job launched by my pipeline? Thanks, Ed Skoviak
A solution for data skew issue in HBase-Mapreduce jobs
Hi all, I have submitted a new patch to fix the data skew issue in HBase-MapReduce jobs. Would you please take a look at this new patch and give me some advice? https://issues.apache.org/jira/browse/HBASE-12590 yeweichen2...@gmail.com
Re: A solution for data skew issue in HBase-Mapreduce jobs
Did you attach a screenshot ? The attachment shows up as grey area. Probably you can attach the image to JIRA. Cheers On Sun, Nov 30, 2014 at 6:57 PM, yeweichen2...@gmail.com yeweichen2...@gmail.com wrote: Hi, all, I submit a new patch to fix the data skew issue in HBase-Mapreduce jobs. Would you please take a look at this new patch and give me some advice? https://issues.apache.org/jira/browse/HBASE-12590 https://issues.apache.org/jira/browse/HBASE-12590 Example: -- yeweichen2...@gmail.com
Re: Re: A solution for data skew issue in HBase-Mapreduce jobs
Yes, the screenshot is not good in mail. You can get the simple pdf document from this link: https://issues.apache.org/jira/secure/attachment/12683977/A%20Solution%20for%20Data%20Skew%20in%20HBase-MapReduce%20Job.pdf Cheers yeweichen2...@gmail.com From: Ted Yu Date: 2014-12-01 11:00 To: user@hbase.apache.org Subject: Re: A solution for data skew issue in HBase-Mapreduce jobs Did you attach a screenshot ? The attachment shows up as grey area. Probably you can attach the image to JIRA. Cheers On Sun, Nov 30, 2014 at 6:57 PM, yeweichen2...@gmail.com yeweichen2...@gmail.com wrote: Hi, all, I submit a new patch to fix the data skew issue in HBase-Mapreduce jobs. Would you please take a look at this new patch and give me some advice? https://issues.apache.org/jira/browse/HBASE-12590 https://issues.apache.org/jira/browse/HBASE-12590 Example: -- yeweichen2...@gmail.com
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Hi Shahab,

Thanks for the response. I have added the @Override and somehow that worked. I have pasted the new Reducer code below. Though, I did not understand the difference here, i.e. what I have done differently. It might be a very silly reason.

=
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
=

Regards,
Parkirat Bagga.

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html
Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Hi, The @Override annotation worked because, without it the reduce method in the superclass (Reducer) was being invoked, which basically writes the input from the mapper class to the context object. Try to look up the source code for the Reducer class online and you'll realize that. Hope that clears it up. Cheers, Arun Sent from a mobile device. Please don't mind the typos. Hi Shahab, Thanks for the response. I have added the @Override and somehow that worked. I have pasted the new Reducer code below. Though, I did not understood the difference here, as if what I have done differently. I might be a very silly reason though. = package com.test.hadoop; import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer; public class WordCountReducer extends ReducerText, IntWritable, Text, IntWritable { @Override protected void reduce(Text key, IterableIntWritable values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } = Regards, Parkirat Bagga. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html Sent from the HBase User mailing list archive at Nabble.com.
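For reference, the reduce() that runs when your own method does not actually override it is roughly this identity pass-through (paraphrased from the Hadoop Reducer source), which is why every mapper record, duplicates included, went straight into the output file:

// Roughly what org.apache.hadoop.mapreduce.Reducer does when reduce() is not overridden:
// every (key, value) pair coming out of the shuffle is written unchanged to the output.
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}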
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Parkirat, This is a core Java concept which is mainly related to how Class inheritance works in Java and how the @Override annotation is used, and is not Hadoop specific. (It is also used while implementing interfaces since JDK 6.) You can read about it here: http://tutorials.jenkov.com/java/annotations.html#override http://www.javapractices.com/topic/TopicAction.do?Id=223 Regards, Shahab On Sat, Aug 2, 2014 at 2:56 AM, Arun Allamsetty arun.allamse...@gmail.com wrote: Hi, The @Override annotation worked because, without it the reduce method in the superclass (Reducer) was being invoked, which basically writes the input from the mapper class to the context object. Try to look up the source code for the Reducer class online and you'll realize that. Hope that clears it up. Cheers, Arun Sent from a mobile device. Please don't mind the typos. Hi Shahab, Thanks for the response. I have added the @Override and somehow that worked. I have pasted the new Reducer code below. Though, I did not understood the difference here, as if what I have done differently. I might be a very silly reason though. = package com.test.hadoop; import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer; public class WordCountReducer extends ReducerText, IntWritable, Text, IntWritable { @Override protected void reduce(Text key, IterableIntWritable values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } = Regards, Parkirat Bagga. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062240.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Hi Parkirat, I don't think that HBase is causing the problems. You might already know this but need to add the reducer class to the job as you add the mapper. Also, if you want to read from a HBase table in a MapReduce job, you need to implement the TableMapper for the mapper and if you want to write to a file on HDFS instead of a HBase table, you need to implement the generic Reducer class for the reducer. But again, if you can show everyone the code, people will be able to help you better. Cheers, Arun Sent from a mobile device. Please don't mind the typos. On Jul 31, 2014 1:07 PM, Nick Dimiduk ndimi...@gmail.com wrote: Hi Parkirat, I don't follow the reducer problem you're having. Can you post your code that configures the job? I assume you're using TableMapReduceUtil someplace. Your reducer is removing duplicate values? Sounds like you need to update it's logic to only emit a value once. Pastebin-ing your reducer code may be helpful as well. -n On Thu, Jul 31, 2014 at 8:20 AM, Parkirat parkiratbigd...@gmail.com wrote: Hi All, I am using Mapreduce API to read Hbase Table, based on some scan operation in mapper and putting the data to a file in reducer. I am using Hbase Version Version 0.94.5.23. *Problem:* Now in my job, my mapper output a key as text and value as text, but my reducer output key as text and value as nullwritable, but it seems *hbase mapreduce api dont consider reducer*, and outputs both key and value as text. Moreover if the same key comes twice, it goes to the file twice, even if my reducer want to log it only once. Could anybody help me with this problem? Regards, Parkirat Singh Bagga. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html Sent from the HBase User mailing list archive at Nabble.com.
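A sketch of the wiring being described (the table name, output path, and the Text/NullWritable types are placeholders): the mapper side comes from initTableMapperJob, while the reducer must be registered on the job explicitly with a file-based output, otherwise the identity reduce discussed earlier in the thread runs and every mapper record reaches the file:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ScanToFileDriver {

  // Reads the table; emits whatever key/value the job needs (Text/Text here).
  static class ScanMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toString(rowKey.get())), new Text(""));
    }
  }

  // Writes each distinct key exactly once, discarding the values.
  static class DedupReducer extends Reducer<Text, Text, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      context.write(key, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-to-file");   // Hadoop 1.x style, as used with 0.94
    job.setJarByClass(ScanToFileDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);

    // Mapper side comes from TableMapReduceUtil; note the map output key/value classes.
    TableMapReduceUtil.initTableMapperJob("my_table", scan,
        ScanMapper.class, Text.class, Text.class, job);

    // Reducer side must be registered explicitly, or the identity reduce runs instead.
    job.setReducerClass(DedupReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    job.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/scan-output"));  // placeholder path

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}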
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
] 14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available 14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded 14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090 14/08/01 20:52:42 INFO mapred.JobClient: map 0% reduce 0% 14/08/01 20:52:54 INFO mapred.JobClient: map 100% reduce 0% 14/08/01 20:53:02 INFO mapred.JobClient: map 100% reduce 33% 14/08/01 20:53:04 INFO mapred.JobClient: map 100% reduce 100% 14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090 14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29 14/08/01 20:53:05 INFO mapred.JobClient: Job Counters 14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719 14/08/01 20:53:05 INFO mapred.JobClient: File Output Format Counters 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119 14/08/01 20:53:05 INFO mapred.JobClient: FileSystemCounters 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119 14/08/01 20:53:05 INFO mapred.JobClient: File Input Format Counters 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83 14/08/01 20:53:05 INFO mapred.JobClient: Map-Reduce Framework 14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized bytes=197 14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1 14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197 14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36 14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155 14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770 14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage (bytes)=398393344 14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0 14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15 14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0 14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=385605632 14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18 14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2707595264 14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18 *Generated Output File:* -bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-0 Hadoop 1 This1 an 1 as 1 example 1 example 1 fine1 if 1 is 1 not.1 or 1 so 1 test1 test1 this1 to 1 to 1 works 1 Regards, Parkirat Bagga -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p406.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
/wc/output 14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process : 1 14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library 14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3] 14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available 14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded 14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090 14/08/01 20:52:42 INFO mapred.JobClient: map 0% reduce 0% 14/08/01 20:52:54 INFO mapred.JobClient: map 100% reduce 0% 14/08/01 20:53:02 INFO mapred.JobClient: map 100% reduce 33% 14/08/01 20:53:04 INFO mapred.JobClient: map 100% reduce 100% 14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090 14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29 14/08/01 20:53:05 INFO mapred.JobClient: Job Counters 14/08/01 20:53:05 INFO mapred.JobClient: Launched reduce tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9171 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 14/08/01 20:53:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 14/08/01 20:53:05 INFO mapred.JobClient: Launched map tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: Data-local map tasks=1 14/08/01 20:53:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9719 14/08/01 20:53:05 INFO mapred.JobClient: File Output Format Counters 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Written=119 14/08/01 20:53:05 INFO mapred.JobClient: FileSystemCounters 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_READ=197 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_READ=214 14/08/01 20:53:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112948 14/08/01 20:53:05 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=119 14/08/01 20:53:05 INFO mapred.JobClient: File Input Format Counters 14/08/01 20:53:05 INFO mapred.JobClient: Bytes Read=83 14/08/01 20:53:05 INFO mapred.JobClient: Map-Reduce Framework 14/08/01 20:53:05 INFO mapred.JobClient: Map output materialized bytes=197 14/08/01 20:53:05 INFO mapred.JobClient: Map input records=1 14/08/01 20:53:05 INFO mapred.JobClient: Reduce shuffle bytes=197 14/08/01 20:53:05 INFO mapred.JobClient: Spilled Records=36 14/08/01 20:53:05 INFO mapred.JobClient: Map output bytes=155 14/08/01 20:53:05 INFO mapred.JobClient: CPU time spent (ms)=2770 14/08/01 20:53:05 INFO mapred.JobClient: Total committed heap usage (bytes)=398393344 14/08/01 20:53:05 INFO mapred.JobClient: Combine input records=0 14/08/01 20:53:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=131 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input records=18 14/08/01 20:53:05 INFO mapred.JobClient: Reduce input groups=15 14/08/01 20:53:05 INFO mapred.JobClient: Combine output records=0 14/08/01 20:53:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=385605632 14/08/01 20:53:05 INFO mapred.JobClient: Reduce output records=18 14/08/01 20:53:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2707595264 14/08/01 20:53:05 INFO mapred.JobClient: Map output records=18 *Generated Output File:* -bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-0 Hadoop 1 This1 an 1 as 1 example 1 
example 1 fine1 if 1 is 1 not.1 or 1 so 1 test1 test1 this1 to 1 to 1 works 1 Regards, Parkirat Bagga -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p406.html Sent from the HBase User mailing list archive at Nabble.com.
Hbase Mapreduce API - Reduce to a file is not working properly.
Hi All,

I am using the MapReduce API to read an HBase table, based on a scan operation in the mapper, and to put the data into a file in the reducer. I am using HBase version 0.94.5.23.

*Problem:*
In my job, my mapper outputs the key as Text and the value as Text, but my reducer outputs the key as Text and the value as NullWritable. However, it seems *the hbase mapreduce api does not consider the reducer*, and outputs both key and value as Text. Moreover, if the same key comes twice, it goes to the file twice, even though my reducer wants to write it only once.

Could anybody help me with this problem?

Regards,
Parkirat Singh Bagga.

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html
Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Hi Parkirat, I don't follow the reducer problem you're having. Can you post your code that configures the job? I assume you're using TableMapReduceUtil someplace. Your reducer is removing duplicate values? Sounds like you need to update it's logic to only emit a value once. Pastebin-ing your reducer code may be helpful as well. -n On Thu, Jul 31, 2014 at 8:20 AM, Parkirat parkiratbigd...@gmail.com wrote: Hi All, I am using Mapreduce API to read Hbase Table, based on some scan operation in mapper and putting the data to a file in reducer. I am using Hbase Version Version 0.94.5.23. *Problem:* Now in my job, my mapper output a key as text and value as text, but my reducer output key as text and value as nullwritable, but it seems *hbase mapreduce api dont consider reducer*, and outputs both key and value as text. Moreover if the same key comes twice, it goes to the file twice, even if my reducer want to log it only once. Could anybody help me with this problem? Regards, Parkirat Singh Bagga. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141.html Sent from the HBase User mailing list archive at Nabble.com.
Re: HBase MapReduce problem
Hi Ted, I tried your solution, but I got the same error message. Thanks
Re: HBase MapReduce problem
Did you create the table prior to launching your program ? If so, when you scan hbase:meta table, do you see row(s) for it ? Cheers On Feb 4, 2014, at 12:53 AM, Murali muralidha...@veradistech.com wrote: Hi Ted, I am trying your solution. But I got the same error message. Thanks
Re: HBase MapReduce problem
Hi Ted, I am using HBase 0.96 version. But I am also getting the below error message 14/02/03 10:18:32 ERROR mapreduce.TableOutputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. Exception in thread main java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma t.java:211) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:453) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java :342) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at com.hbase.HBaseWC.run(HBaseWC.java:78) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.hbase.HBaseWC.main(HBaseWC.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl .java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegionInMeta(HConnectionManager.java:1127) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegion(HConnectionManager.java:1047) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegion(HConnectionManager.java:1004) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:325) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:191) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:149) at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma t.java:206) ... 19 more May I know how to fix it? Thanks
Re: HBase MapReduce problem
Murali: Are you using 0.96.1.1 ? Can you show us the command line you used ? Meanwhile I assume the HBase cluster is functional - you can use shell to insert data. Cheers On Mon, Feb 3, 2014 at 8:33 PM, Murali muralidha...@veradistech.com wrote: Hi Ted, I am using HBase 0.96 version. But I am also getting the below error message 14/02/03 10:18:32 ERROR mapreduce.TableOutputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. Exception in thread main java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma t.java:211) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:453) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java :342) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja va:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at com.hbase.HBaseWC.run(HBaseWC.java:78) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.hbase.HBaseWC.main(HBaseWC.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl .java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 35 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegionInMeta(HConnectionManager.java:1127) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegion(HConnectionManager.java:1047) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation. locateRegion(HConnectionManager.java:1004) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:325) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:191) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:149) at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputForma t.java:206) ... 19 more May I know how to fix it? Thanks
Re: HBase MapReduce problem
Hi Ted,

Thanks for your reply. I am using HBase version 0.96.0. I can insert a record using the shell. I am running the command below to run my MapReduce job. It is a word count example: it reads a text file from an HDFS path and inserts the counts into an HBase table.

hadoop jar hb.jar com.hbase.HBaseWC

Thanks
Re: HBase MapReduce problem
See the sample command in http://hbase.apache.org/book.html#trouble.mapreduce : HADOOP_CLASSPATH=`hbase classpath` hadoop jar On Mon, Feb 3, 2014 at 9:33 PM, Murali muralidha...@veradistech.com wrote: Hi Ted Thanks for your reply. I am using HBase version 0.96.0. I can insert a record using shell command. I am running the below command to run my MapReduce job. It is a word count example. Reading a text file from hdfs file path and insert the counts to HBase table. hadoop jar hb.jar com.hbase.HBaseWC Thanks
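Besides the launch command, the driver side of such a job usually looks roughly like the sketch below (this is not the poster's actual HBaseWC; the 'cf' column family and class names are placeholders). HBaseConfiguration.create() only finds the real cluster when hbase-site.xml is on the classpath, which is exactly what HADOOP_CLASSPATH=`hbase classpath` provides, and initTableReducerJob also ships the HBase jars with the job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WordCountToHBase {

  static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      for (String word : line.toString().split("\\s+")) {
        if (!word.isEmpty()) {
          context.write(new Text(word), ONE);
        }
      }
    }
  }

  static class CountTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (IntWritable c : counts) {
        sum += c.get();
      }
      byte[] rowKey = Bytes.toBytes(word.toString());
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));  // assumes a 'cf' family
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    // Only finds the real cluster if hbase-site.xml is on the classpath.
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "wordcount-to-hbase");
    job.setJarByClass(WordCountToHBase.class);

    job.setMapperClass(TokenizerMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Wires up TableOutputFormat for the 'wordcount' table and adds the HBase dependency jars.
    TableMapReduceUtil.initTableReducerJob("wordcount", CountTableReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}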
HBase MapReduce with setup function problem
Dear all,

I am writing a MapReduce application that processes an HBase table. In each map task it needs to read data from another HBase table, so I use the 'setup' function to initialize the HTable instance like this:

@Override
public void setup(Context context){
  Configuration conf = HBaseConfiguration.create();
  try {
    centrals = new HTable(conf, "central".getBytes());
  } catch (IOException e) {
  }
  return;
}

But when I run this MapReduce application, it always stays at 0%, 0%, and the map phase stays in initializing and does not progress. I have googled it, but still do not have any ideas.

P.S. I use hadoop-1.1.2 and HBase-0.96.

Thanks!
- Dong
Re: HBase MapReduce with setup function problem
Have you considered using MultiTableInputFormat ? Cheers On Mon, Jan 27, 2014 at 9:14 AM, daidong daidon...@gmail.com wrote: Dear all, I am writing a MapReduce application processing HBase table. In each map, it needs to read data from another HBase table, so i use the 'setup' function to initialize the HTable instance like this: @Override public void setup(Context context){ Configuration conf = HBaseConfiguration.create(); try { centrals = new HTable(conf, central.getBytes()); } catch (IOException e) { } return; } But, when i run this mapreduce application, it is always stay in 0%,0%. And the map phase is always under initializing, does not progress. I have googled it, but still do not have any ideas. P.S. I use hadoop-1.1.2 and Hbase-0.96. Thanks! - Dong
Re: HBase MapReduce with setup function problem
I agree that we should find the cause for why initialization got stuck. I noticed empty catch block: } catch (IOException e) { } Can you add some logging there to see what might have gone wrong ? Thanks On Mon, Jan 27, 2014 at 11:56 AM, daidong daidon...@gmail.com wrote: Dear Ted, Thanks very much for your reply! Yes. MultiTableInputFormat may work here, but i still want to know how to connect a hbase table inside MapReduce applications. Because i may need also write to tables inside map function. Do you know why previous mr application does not work? Because the wrong configuration instance? Any suggestion will be great for me! Thanks! - Dong 2014-01-27 Ted Yu yuzhih...@gmail.com Have you considered using MultiTableInputFormat ? Cheers On Mon, Jan 27, 2014 at 9:14 AM, daidong daidon...@gmail.com wrote: Dear all, I am writing a MapReduce application processing HBase table. In each map, it needs to read data from another HBase table, so i use the 'setup' function to initialize the HTable instance like this: @Override public void setup(Context context){ Configuration conf = HBaseConfiguration.create(); try { centrals = new HTable(conf, central.getBytes()); } catch (IOException e) { } return; } But, when i run this mapreduce application, it is always stay in 0%,0%. And the map phase is always under initializing, does not progress. I have googled it, but still do not have any ideas. P.S. I use hadoop-1.1.2 and Hbase-0.96. Thanks! - Dong
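To make that concrete, a minimal sketch of the setup() method with the empty catch block replaced, assuming a LOG field on the mapper and the table name "central" from the original post; failing fast here turns the stuck-at-0% hang into a visible task error:

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Reuse the job's configuration so the mapper talks to the same cluster the job was submitted to.
    Configuration conf = context.getConfiguration();
    try {
      centrals = new HTable(conf, "central");
    } catch (IOException e) {
      LOG.error("Could not open HTable 'central' in setup()", e);
      throw e;   // surface the problem instead of swallowing it
    }
  }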
HBase MapReduce problem
Dear all, I have a simple HBase MapReduce application and am trying to run it on a 12-node cluster using this command: HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar .jar org.test.WordCount The HBase version is 0.95.0. But I got this error: java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for wordcount,,99 after 10 tries. after a long wait following this output: 14/01/24 10:12:51 INFO mapred.JobClient: map 0% reduce 0% It looks like a RegionServer problem; however, I have tried everything to make sure the HBase cluster is running well: I can access all region servers through the web UI, I can see the table 'wordcount' there, and 'hbase hbck' returns all 'ok'. I even tried a simple HBase program without MapReduce, and it works well too. So, could anybody tell me why this happens and how to fix it? Is it related to my Hadoop configuration? Thanks! - Dong Dai
Re: HBase MapReduce problem
Why do you use 0.95 which was a developer release ? See http://hbase.apache.org/book.html#d243e520 Cheers On Fri, Jan 24, 2014 at 8:40 AM, daidong daidon...@gmail.com wrote: Dear all, I have a simple HBase MapReduce application and try to run it on a 12-node cluster using this command: HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar .jar org.test.WordCount HBase version is 0.95.0. But i got this error: java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for wordcount,,99 after 10 tries. after a long waiting from this output: 14/01/24 10:12:51 INFO mapred.JobClient: map 0% reduce 0% It seems like RegionServer problem, however, i have tried all the ways to make sure the HBase cluster is running well: i can access all region servers through web ui, i can see the table 'wordcount' there, i run 'hbase hbck' and it returns all 'ok'. I even try a simple HBase program without MapReduce, it works well too. So, Could anybody tell me why this happens? how to fix it? Is it relevant to my Hadoop configuration? Thanks! - Dong Dai
Re: HBase MapReduce problem
Thanks Ted, I actually tried to modify HBase, so i choose this developer release. So, you are thinking this is a version problem, should disappear if i switched to 0.96? 2014/1/24 Ted Yu yuzhih...@gmail.com Why do you use 0.95 which was a developer release ? See http://hbase.apache.org/book.html#d243e520 Cheers On Fri, Jan 24, 2014 at 8:40 AM, daidong daidon...@gmail.com wrote: Dear all, I have a simple HBase MapReduce application and try to run it on a 12-node cluster using this command: HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar .jar org.test.WordCount HBase version is 0.95.0. But i got this error: java.lang.RuntimeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for wordcount,,99 after 10 tries. after a long waiting from this output: 14/01/24 10:12:51 INFO mapred.JobClient: map 0% reduce 0% It seems like RegionServer problem, however, i have tried all the ways to make sure the HBase cluster is running well: i can access all region servers through web ui, i can see the table 'wordcount' there, i run 'hbase hbck' and it returns all 'ok'. I even try a simple HBase program without MapReduce, it works well too. So, Could anybody tell me why this happens? how to fix it? Is it relevant to my Hadoop configuration? Thanks! - Dong Dai
FILE_BYTES_READ counter missing for HBase mapreduce job
Hi, Basically I have a mapreduce job to scan a hbase table and do some processing. After the job finishes, I only got three filesystem counters: HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN. The value of HDFS_BYTES_READ is not very useful here because it shows the size of the .META file, not the size of input records. I am looking for counter FILE_BYTES_READ but somehow it's missing in the job status report. Does anyone know what I might miss here? Thanks Haijia P.S. The job status report FileSystemCounters HDFS_BYTES_READ 340,124 0 340,124 FILE_BYTES_WRITTEN 190,431,329 0 190,431,329 HDFS_BYTES_WRITTEN 272,538,467,123 0 272,538,467,123
Re: FILE_BYTES_READ counter missing for HBase mapreduce job
Additional info: The mapreduce job I run is a map-only job. It does not have reducers and it writes data directly to HDFS in the mapper. Could this be the reason why there's no value for FILE_BYTES_READ? If so, is there any easy way to get the total input data size? Thanks Haijia On Thu, Sep 5, 2013 at 2:46 PM, Haijia Zhou leons...@gmail.com wrote: Hi, Basically I have a mapreduce job to scan a hbase table and do some processing. After the job finishes, I only got three filesystem counters: HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN. The value of HDFS_BYTES_READ is not very useful here because it shows the size of the .META file, not the size of input records. I am looking for counter FILE_BYTES_READ but somehow it's missing in the job status report. Does anyone know what I might miss here? Thanks Haijia P.S. The job status report FileSystemCounters HDFS_BYTES_READ 340,124 0 340,124 FILE_BYTES_WRITTEN 190,431,329 0 190,431,329 HDFS_BYTES_WRITTEN 272,538,467,123 0 272,538,467,123
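One workaround: since TableInputFormat reads over regionserver RPC rather than from files, FILE_BYTES_READ and HDFS_BYTES_READ never reflect the scanned data, but the mapper can account for it with a custom counter. A rough sketch under that assumption (output types and the counter group/name are placeholders):

  public class SizeCountingMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      long bytes = 0;
      for (KeyValue kv : value.raw()) {        // serialized length of every cell in this row
        bytes += kv.getLength();
      }
      context.getCounter("app", "HBASE_INPUT_BYTES").increment(bytes);
      // ... existing processing and writes to HDFS go here ...
    }
  }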
issue about DNS error in running hbase mapreduce
i use hadoop-dns-checker check the dns problem ,seems all ok,but when i run MR task in hbase,it report problem,anyone have good idea? # ./run-on-cluster.sh hosts1 CH22 The authenticity of host 'ch22 (192.168.10.22)' can't be established. RSA key fingerprint is f3:4a:ca:a3:17:08:98:c2:0a:bd:27:99:a3:65:bc:89. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'ch22,192.168.10.22' (RSA) to the list of known hosts. root@ch22's password: sending incremental file list created directory hadoop-dns a.jar hosts1 run.sh sent 2394 bytes received 69 bytes 547.33 bytes/sec total size is 2618 speedup is 1.06 root@ch22's password: # self check... -- host : CH22 host lookup : success (192.168.10.22) reverse lookup : success (CH22) is reachable : yes # end self check Running on : CH22/192.168.10.22 = -- host : CH22 host lookup : success (192.168.10.22) reverse lookup : success (CH22) is reachable : yes -- host : CH34 host lookup : success (192.168.10.34) reverse lookup : success (CH34) is reachable : yes -- host : CH35 host lookup : success (192.168.10.35) reverse lookup : success (CH35) is reachable : yes -- host : CH36 host lookup : success (192.168.10.36) reverse lookup : success (CH36) is reachable : yes CH34 root@ch34's password: sending incremental file list created directory hadoop-dns a.jar hosts1 run.sh sent 2394 bytes received 69 bytes 703.71 bytes/sec total size is 2618 speedup is 1.06 root@ch34's password: # self check... -- host : CH34 host lookup : success (192.168.10.34) reverse lookup : success (CH34) is reachable : yes # end self check Running on : CH34/192.168.10.34 = -- host : CH22 host lookup : success (192.168.10.22) reverse lookup : success (CH22) is reachable : yes -- host : CH34 host lookup : success (192.168.10.34) reverse lookup : success (CH34) is reachable : yes -- host : CH35 host lookup : success (192.168.10.35) reverse lookup : success (CH35) is reachable : yes -- host : CH36 host lookup : success (192.168.10.36) reverse lookup : success (CH36) is reachable : yes CH35 root@ch35's password: sending incremental file list created directory hadoop-dns a.jar hosts1 run.sh sent 2394 bytes received 69 bytes 703.71 bytes/sec total size is 2618 speedup is 1.06 root@ch35's password: # self check... -- host : CH35 host lookup : success (192.168.10.35) reverse lookup : success (CH35) is reachable : yes # end self check Running on : CH35/192.168.10.35 = -- host : CH22 host lookup : success (192.168.10.22) reverse lookup : success (CH22) is reachable : yes -- host : CH34 host lookup : success (192.168.10.34) reverse lookup : success (CH34) is reachable : yes -- host : CH35 host lookup : success (192.168.10.35) reverse lookup : success (CH35) is reachable : yes -- host : CH36 host lookup : success (192.168.10.36) reverse lookup : success (CH36) is reachable : yes CH36 root@ch36's password: sending incremental file list created directory hadoop-dns a.jar hosts1 run.sh sent 2394 bytes received 69 bytes 703.71 bytes/sec total size is 2618 speedup is 1.06 root@ch36's password: # self check... 
-- host : CH36 host lookup : success (192.168.10.36) reverse lookup : success (CH36) is reachable : yes # end self check Running on : CH36/192.168.10.36 = -- host : CH22 host lookup : success (192.168.10.22) reverse lookup : success (CH22) is reachable : yes -- host : CH34 host lookup : success (192.168.10.34) reverse lookup : success (CH34) is reachable : yes -- host : CH35 host lookup : success (192.168.10.35) reverse lookup : success (CH35) is reachable : yes -- host : CH36 host lookup : success (192.168.10.36) reverse lookup : success (CH36) is reachable : yes # yarn jar mapreducehbaseTest.jar com.mediaadx.hbase.hadoop.test.TxtHbase '' '' 13/08/02 13:10:17 WARN conf.Configuration: dfs.df.interval is deprecated. Instead, use fs.df.interval 13/08/02 13:10:17 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 13/08/02 13:10:17 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS 13/08/02 13:10:17 WARN conf.Configuration: topology.script.number.args is deprecated. Instead, use net.topology.script.number.args 13/08/02 13:10:17 WARN conf.Configuration: dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode 13/08/02 13:10:17 WARN conf.Configuration: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 13/08/02 13:10:17 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 13/08/02 13:10:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 13/08/02 13:10:17 WARN conf.Configuration: slave.host.name is deprecated. Instead, use
HBase mapreduce job: unable to find region for a table
I am running a very simple MR HBase job (reading from a tiny HBase table and outputting nothing). I run it on a pseudo-distributed HBase cluster on my local machine, which uses a pseudo-distributed HDFS (on the local machine again). When I run it, I get the following exception: Unable to find region for test. But I am sure the table test exists. 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated timeout = 9 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for test,,99 after 10 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234) . Here is my HBase hbase-site.xml file: <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://127.0.0.1:9000/hbase</value> </property> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> <property> <name>hbase.zookeeper.quorum</name> <value>127.0.0.1</value> </property> </configuration>
Re: HBase mapreduce job: unable to find region for a table
Hi, Is your table properly served? Are you able to see it on the Web UI? Is you HBCK reporting everything correctly? JM 2013/7/11 S. Zhou myx...@yahoo.com I am running a very simple MR HBase job (reading from a tiny HBase table and outputs nothing). I run it on a pseudo-distributed HBase cluster on my local machine which uses a pseudo-distributed HDFS (on local machine again). When I run it, I get the following exception: Unable to find region for test. But I am sure the table test exists. 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated timeout = 9 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for test,,99 after 10 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234) . Here is my HBase hbase-site.xml file: configuration property namehbase.rootdir/name valuehdfs://127.0.0.1:9000/hbase/value /property property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.zookeeper.quorum/name value127.0.0.1/value /property /configuration
Re: HBase mapreduce job: unable to find region for a table
Yes, I can see the table through hbase shell and web ui (localhost:60010). hbck reports ok From: Jean-Marc Spaggiari jean-m...@spaggiari.org To: user@hbase.apache.org; S. Zhou myx...@yahoo.com Sent: Thursday, July 11, 2013 11:01 AM Subject: Re: HBase mapreduce job: unable to find region for a table Hi, Is your table properly served? Are you able to see it on the Web UI? Is you HBCK reporting everything correctly? JM 2013/7/11 S. Zhou myx...@yahoo.com I am running a very simple MR HBase job (reading from a tiny HBase table and outputs nothing). I run it on a pseudo-distributed HBase cluster on my local machine which uses a pseudo-distributed HDFS (on local machine again). When I run it, I get the following exception: Unable to find region for test. But I am sure the table test exists. 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated timeout = 9 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for test,,99 after 10 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234) . Here is my HBase hbase-site.xml file: configuration property namehbase.rootdir/name valuehdfs://127.0.0.1:9000/hbase/value /property property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.zookeeper.quorum/name value127.0.0.1/value /property /configuration
Re: HBase mapreduce job: unable to find region for a table
On the webui, when you click on your table, can you see the regions and are they assigned to the server correctly JM 2013/7/11 S. Zhou myx...@yahoo.com Yes, I can see the table through hbase shell and web ui (localhost:60010). hbck reports ok -- *From:* Jean-Marc Spaggiari jean-m...@spaggiari.org *To:* user@hbase.apache.org; S. Zhou myx...@yahoo.com *Sent:* Thursday, July 11, 2013 11:01 AM *Subject:* Re: HBase mapreduce job: unable to find region for a table Hi, Is your table properly served? Are you able to see it on the Web UI? Is you HBCK reporting everything correctly? JM 2013/7/11 S. Zhou myx...@yahoo.com I am running a very simple MR HBase job (reading from a tiny HBase table and outputs nothing). I run it on a pseudo-distributed HBase cluster on my local machine which uses a pseudo-distributed HDFS (on local machine again). When I run it, I get the following exception: Unable to find region for test. But I am sure the table test exists. 13/07/11 10:27:35 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13fcec598d70005, negotiated timeout = 9 13/07/11 10:38:15 ERROR mapreduce.TableInputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for test,,99 after 10 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234) . Here is my HBase hbase-site.xml file: configuration property namehbase.rootdir/name valuehdfs://127.0.0.1:9000/hbase/value /property property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.zookeeper.quorum/name value127.0.0.1/value /property /configuration
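One more thing worth ruling out in cases like this: whether the Configuration the job is built with actually carries the cluster's hbase-site.xml. If that file is not on the classpath of the JVM that builds the job, HBaseConfiguration.create() falls back to defaults and TableInputFormat cannot locate any region, which also surfaces as NoServerForRegionException. A hedged sketch of pinning the values in the driver (they must mirror the hbase-site.xml quoted above; MyMapper is a placeholder):

  Configuration conf = HBaseConfiguration.create();
  // Assumption: only needed when hbase-site.xml is not already on the client classpath.
  conf.set("hbase.rootdir", "hdfs://127.0.0.1:9000/hbase");
  conf.set("hbase.zookeeper.quorum", "127.0.0.1");
  Job job = new Job(conf, "scan-test");
  TableMapReduceUtil.initTableMapperJob("test", new Scan(),
      MyMapper.class, ImmutableBytesWritable.class, Result.class, job);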
hbase + mapreduce
Hello: I'm working on a project and I'm using HBase to store the data. I have this method that works fine but without the performance I'm looking for, so what I want is to do the same thing using MapReduce. public ArrayList<MyObject> findZ(String z) throws IOException { ArrayList<MyObject> rows = new ArrayList<MyObject>(); Configuration conf = HBaseConfiguration.create(); HTable table = new HTable(conf, "test"); Scan s = new Scan(); s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y")); ResultScanner scanner = table.getScanner(s); try { for (Result rr : scanner) { if (Bytes.toString(rr.getValue(Bytes.toBytes("x"), Bytes.toBytes("y"))).equals(z)) { rows.add(getInformation(Bytes.toString(rr.getRow()))); } } } finally { scanner.close(); } return rows; } The getInformation method takes all the columns and converts the row into a MyObject instance. I just want an example or a link to a tutorial that does something like this; I want to get a result object as the answer, not a word count like in most examples I found. My natural language is Spanish, so sorry if something is not well written. Thanks http://www.uci.cu
Re: hbase + mapreduce
Here you have several examples: http://hbase.apache.org/book/mapreduce.example.html http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/ http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/ http://stackoverflow.com/questions/12215313/load-data-into-hbase-table-using-hbase-map-reduce-api 2013/4/21 Adrian Acosta Mitjans amitj...@estudiantes.uci.cu Hello: I'm working in a proyect, and i'm using hbase for storage the data, y have this method that work great but without the performance i'm looking for, so i want is to make the same but using mapreduce. public ArrayListMyObject findZ(String z) throws IOException { ArrayListMyObject rows = new ArrayListMyObject(); Configuration conf = HBaseConfiguration.create(); HTable table = new HTable(conf, test); Scan s = new Scan(); s.addColumn(Bytes.toBytes(x), Bytes.toBytes(y)); ResultScanner scanner = table.getScanner(s); try { for (Result rr : scanner) { if (Bytes.toString(rr.getValue(Bytes.toBytes(x), Bytes.toBytes(y))).equals(z)) { rows.add(getInformation(Bytes.toString(rr.getRow(; } } } finally { scanner.close(); } return archivos; } the getInformation method take all the columns and convert the row in MyObject type. I just want a example or a link to a tutorial that make something like this, i want to get a result type as answer and not a number to count words, like many a found. My natural language is spanish, so sorry if something is not well writing. Thanths http://www.uci.cu -- Marcos Ortiz Valmaseda, *Data-Driven Product Manager* at PDVSA *Blog*: http://dataddict.wordpress.com/ *LinkedIn: *http://www.linkedin.com/in/marcosluis2186 *Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186
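Following those examples, the client-side loop from the original post maps fairly directly onto a TableMapper: the Scan carries the column, the comparison against z happens in map(), and matching row keys are emitted for whatever downstream step builds MyObject. A rough sketch under those assumptions (family "x" and qualifier "y" as in the post; "findz.value" is a made-up configuration property, and the output types are placeholders):

  public class FindZMapper extends TableMapper<Text, NullWritable> {
    private String z;

    @Override
    protected void setup(Context context) {
      z = context.getConfiguration().get("findz.value");   // the value we are looking for
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] cell = value.getValue(Bytes.toBytes("x"), Bytes.toBytes("y"));
      if (cell != null && z.equals(Bytes.toString(cell))) {
        context.write(new Text(Bytes.toString(value.getRow())), NullWritable.get());
      }
    }
  }

In the driver, the same Scan from the original method (s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"))) is passed to TableMapReduceUtil.initTableMapperJob("test", s, FindZMapper.class, Text.class, NullWritable.class, job). Pushing the comparison server-side with a SingleColumnValueFilter on the Scan would avoid shipping non-matching rows to the mappers at all.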
Re: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction
Thanks, but I don't know why I got worse results when the client buffer size was increased; is it related to other parameters? I give 8 GB of heap to each regionserver. On Mon, Jan 21, 2013 at 12:34 PM, Harsh J ha...@cloudera.com wrote: Hi Farrokh, This isn't a HDFS question - please ask these questions only on their relevant lists for best results and to keep each list's discussion separate. On Mon, Jan 21, 2013 at 11:40 AM, Farrokh Shahriari mohandes.zebeleh...@gmail.com wrote: Hi there Is there any way to use arrayList of Puts in map function to insert data to hbase ? Because,the context.write method doesn't allow to use arraylist of puts,so in every map function I can only put one row. What can I do for inserting some rows in each map function ? And also how can I use autoflush bufferclientside in Map function for inserting data to Hbase Table ? Mohandes Zebeleh -- Harsh J
Re: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction
Give put(List<Put> puts) a shot and see if it works for you. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 11:41 AM, Farrokh Shahriari mohandes.zebeleh...@gmail.com wrote: Hi there Is there any way to use arrayList of Puts in map function to insert data to hbase ? Because,the context.write method doesn't allow to use arraylist of puts,so in every map function I can only put one row. What can I do for inserting some rows in each map function ? And also how can I use autoflush bufferclientside in Map function for inserting data to Hbase Table ? Mohandes Zebeleh
RE: Hbase Mapreduce- Problem in using arrayList of puts in MapFunction
And also how can I use autoflush bufferclientside in Map function for inserting data to Hbase Table ? You are using TableOutputFormat right? Here autoFlush is turned OFF ... You can use config param hbase.client.write.buffer to set the client side buffer size. -Anoop- From: Farrokh Shahriari [mohandes.zebeleh...@gmail.com] Sent: Monday, January 21, 2013 11:41 AM To: user@hbase.apache.org Subject: Hbase Mapreduce- Problem in using arrayList of pust in MapFunction Hi there Is there any way to use arrayList of Puts in map function to insert data to hbase ? Because,the context.write method doesn't allow to use arraylist of puts,so in every map function I can only put one row. What can I do for inserting some rows in each map function ? And also how can I use autoflush bufferclientside in Map function for inserting data to Hbase Table ? Mohandes Zebeleh
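To make the suggestions above concrete: with TableOutputFormat as the sink there is no need to hand a whole list to a single write; the map function can simply call context.write() once per Put, and the client-side write buffer (autoflush off, size controlled by hbase.client.write.buffer, as Anoop notes) batches them. A minimal sketch, where parseIntoPuts() is a hypothetical helper that turns one input record into several rows:

  public class MultiPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws IOException, InterruptedException {
      for (Put put : parseIntoPuts(line)) {              // hypothetical: several Puts from one record
        context.write(new ImmutableBytesWritable(put.getRow()), put);
      }
    }

    private List<Put> parseIntoPuts(Text line) {
      // application-specific parsing goes here
      return new ArrayList<Put>();
    }
  }

The buffer can be tuned in the job configuration, e.g. conf.setLong("hbase.client.write.buffer", 4 * 1024 * 1024), though as Farrokh observed a bigger buffer is not automatically better, since it grows client heap usage and the size of each burst sent to a regionserver.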
Re: Hbase MapReduce
Hallo, It 's weird that hbase aggregate functions don't use MapReduce, this means that the performance will be very poor. Is it a must to use coprocessors? Is there a much easier way to improve the functions' performance ? Why would performance be poor? I am not dealing a long time with these coprocessors and am still testing a lot, but in my perception its much more lightweight on the other hand. Actually, depending on your row key design and request range, the load is distributed across the region servers. Each will handle aggregation for its own key range. The client, invoking the coprpocessor then must merge the results, which can be seen as sort of reduce function. I think one has to precisely think about its own requirements. I am not sure how and even if M/R Jobs can work for real-time scenarios. Here, coprocessors seem to be a good alternative, which could also be limited, depending of how many rows you need to iterate in the coprocessor for the kind of data you have and expect to request. However, having the data processed in a M/R job before and persistent would be faster for the single client request because you only need to fetch the aggregated data that then already exists. But how recent is data at this time? Does it change frequently and aggregation results must be as recent as the data it reflects? Could be a con against M/R... Regards tom Am 25.11.2012 07:26, schrieb Wei Tan: Actually coprocessor can be used to implement MR-like function, while not using Hadoop framework. Best Regards, Wei Wei Tan Research Staff Member IBM T. J. Watson Research Center Yorktown Heights, NY 10598 w...@us.ibm.com; 914-784-6752 From: Dalia Sobhy dalia.mohso...@hotmail.com To: user@hbase.apache.org user@hbase.apache.org, Date: 11/24/2012 01:33 PM Subject:RE: Hbase MapReduce It 's weird that hbase aggregate functions don't use MapReduce, this means that the performance will be very poor. Is it a must to use coprocessors? Is there a much easier way to improve the functions' performance ? CC: user@hbase.apache.org From: michael_se...@hotmail.com Subject: Re: Hbase MapReduce Date: Sat, 24 Nov 2012 12:05:45 -0600 To: user@hbase.apache.org Do you think it would be a good idea to temper the use of CoProcessors? This kind of reminds me of when people first started using stored procedures... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote: Hi, but you do not need to us M/R. You could also use coprocessors. See this site: https://blogs.apache.org/hbase/entry/coprocessor_introduction - in the section Endpoints An aggregation coprocessor ships with hbase that should match your requirements. You just need to load it and eventually you can access it from HTable: HTable.coprocessorExec(..) http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte [],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29 Regards tom Am 24.11.2012 18:32, schrieb Marcos Ortiz: Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
Hbase MapReduce
Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
Re: Hbase MapReduce
Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D -- Marcos Luis Orti'z Valmaseda about.me/marcosortiz http://about.me/marcosortiz @marcosluis2186 http://twitter.com/marcosluis2186 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
Re: Hbase MapReduce
Hi, but you do not need to us M/R. You could also use coprocessors. See this site: https://blogs.apache.org/hbase/entry/coprocessor_introduction - in the section Endpoints An aggregation coprocessor ships with hbase that should match your requirements. You just need to load it and eventually you can access it from HTable: HTable.coprocessorExec(..) http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29 Regards tom Am 24.11.2012 18:32, schrieb Marcos Ortiz: Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
Re: Hbase MapReduce
Do you think it would be a good idea to temper the use of CoProcessors? This kind of reminds me of when people first started using stored procedures... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote: Hi, but you do not need to us M/R. You could also use coprocessors. See this site: https://blogs.apache.org/hbase/entry/coprocessor_introduction - in the section Endpoints An aggregation coprocessor ships with hbase that should match your requirements. You just need to load it and eventually you can access it from HTable: HTable.coprocessorExec(..) http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29 Regards tom Am 24.11.2012 18:32, schrieb Marcos Ortiz: Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
RE: Hbase MapReduce
It 's weird that hbase aggregate functions don't use MapReduce, this means that the performance will be very poor. Is it a must to use coprocessors? Is there a much easier way to improve the functions' performance ? CC: user@hbase.apache.org From: michael_se...@hotmail.com Subject: Re: Hbase MapReduce Date: Sat, 24 Nov 2012 12:05:45 -0600 To: user@hbase.apache.org Do you think it would be a good idea to temper the use of CoProcessors? This kind of reminds me of when people first started using stored procedures... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote: Hi, but you do not need to us M/R. You could also use coprocessors. See this site: https://blogs.apache.org/hbase/entry/coprocessor_introduction - in the section Endpoints An aggregation coprocessor ships with hbase that should match your requirements. You just need to load it and eventually you can access it from HTable: HTable.coprocessorExec(..) http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte[],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29 Regards tom Am 24.11.2012 18:32, schrieb Marcos Ortiz: Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
RE: Hbase MapReduce
Actually coprocessor can be used to implement MR-like function, while not using Hadoop framework. Best Regards, Wei Wei Tan Research Staff Member IBM T. J. Watson Research Center Yorktown Heights, NY 10598 w...@us.ibm.com; 914-784-6752 From: Dalia Sobhy dalia.mohso...@hotmail.com To: user@hbase.apache.org user@hbase.apache.org, Date: 11/24/2012 01:33 PM Subject:RE: Hbase MapReduce It 's weird that hbase aggregate functions don't use MapReduce, this means that the performance will be very poor. Is it a must to use coprocessors? Is there a much easier way to improve the functions' performance ? CC: user@hbase.apache.org From: michael_se...@hotmail.com Subject: Re: Hbase MapReduce Date: Sat, 24 Nov 2012 12:05:45 -0600 To: user@hbase.apache.org Do you think it would be a good idea to temper the use of CoProcessors? This kind of reminds me of when people first started using stored procedures... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2012, at 11:46 AM, tom t...@arcor.de wrote: Hi, but you do not need to us M/R. You could also use coprocessors. See this site: https://blogs.apache.org/hbase/entry/coprocessor_introduction - in the section Endpoints An aggregation coprocessor ships with hbase that should match your requirements. You just need to load it and eventually you can access it from HTable: HTable.coprocessorExec(..) http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorExec%28java.lang.Class,%20byte [],%20byte[],%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Callback%29 Regards tom Am 24.11.2012 18:32, schrieb Marcos Ortiz: Regards, Dalia. You have to use MapReduce for that. In the HBase in Practice´s book, there are lot of great examples for this. On 11/24/2012 12:15 PM, Dalia Sobhy wrote: Dear all, I wanted to ask a question.. Do Hbase Aggregate Functions such as rowcount, getMax, get Average use MapReduce to execute those functions? Thanks :D
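For reference, a rough sketch of what calling the shipped aggregation endpoint looks like from the client, assuming the AggregateImplementation coprocessor is loaded on the table and using the 0.92-era class names (details vary across versions; the table, family and qualifier below are placeholders):

  AggregationClient aggregationClient = new AggregationClient(HBaseConfiguration.create());
  Scan scan = new Scan();
  scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
  try {
    long rows = aggregationClient.rowCount(Bytes.toBytes("mytable"),
        new LongColumnInterpreter(), scan);
    System.out.println("row count = " + rows);
  } catch (Throwable t) {
    // rowCount declares Throwable, so it has to be wrapped explicitly
    throw new IOException(t);
  }

Each regionserver counts its own key range and the client merges the partial results, which is the reduce-like step tom describes above.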
Re: Query regarding HBase Mapreduce
Hi Amit, You might want to add details to your question. 1) Lot of small files is a known 'problem' for Hadoop MapReduce. And you will find information on it by searching. http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ I assume you have a more specific issue, what is it? 2) I am not sure what you mean by HBase mapreduce on small files. If you are using MapReduce with HBase as a source, you are not dealing with files directly. If you are using HBase as a sink, then the lots of small files is a problem which is orthogonal to the use of HBase. I don't think there is such a thing as HBase MapReduce. You might want to reformulate your use case. Regards Bertrand On Thu, Oct 25, 2012 at 4:15 PM, amit bohra bohr...@gmail.com wrote: Hi, We are working on processing of lot of small files. For processing them we are using HBase Mapreduce as of now. Currently we are working with files in the range for around few millions, but over the period of time it would grow to a larger extent. Did anyone faced any issues while working on HBase mapreduce on small files? Thanks and Regards, Amit Bohra -- Bertrand Dechoux
Re: Query regarding HBase Mapreduce
Hi Amit, I am starting with HBase and MR, so my opinion is based more on what I have read than on real-world experience. However, the documentation says Hadoop deals better with a set of large files than with a lot of small ones. Regards amit bohra bohra.a@... writes:
Re: Query regarding HBase Mapreduce
When you say small files, do you mean to say those are stored within HBase columns? If so, you need not worry as HBase would eventually write bigger HFile on disk (or HDFS). If you are storing lot of small files on HDFS itself, then you will have scalability problems as single NameNode cannot handle billions of files. 2012/10/25 amit bohra bohr...@gmail.com Hi, We are working on processing of lot of small files. For processing them we are using HBase Mapreduce as of now. Currently we are working with files in the range for around few millions, but over the period of time it would grow to a larger extent. Did anyone faced any issues while working on HBase mapreduce on small files? Thanks and Regards, Amit Bohra -- Have a Nice Day! Lohit
HBase MapReduce - Using multiple tables as source
Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
Re: HBase MapReduce - Using multiple tables as source
Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
Re: HBase MapReduce - Using multiple tables as source
Hi, Isn't it the case that you can always open a scanner against a second table inside a map job (in addition to the table that was set in the configuration by TableMapReduceUtil.initTableMapperJob(...))? Hope this serves as a temporary solution. On 08/06/2012 02:35 PM, Mohammad Tariq wrote: Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
Re: HBase MapReduce - Using multiple tables as source
Hi Amlan, I think if you share your usecase regarding two tables as inputs, people on the mailing list may be able to help you better. For example, are you looking at joining the two tables? What are the sizes of the tables etc? Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Mon, Aug 6, 2012 at 5:10 PM, Ioakim Perros imper...@gmail.com wrote: Hi, Isn't that the case that you can always initiate a scanner inside a map job (referring to another table from which had been set into the configuration of TableMapReduceUtil.**initTableMapperJob(...) ) ? Hope this serves as temporary solution. On 08/06/2012 02:35 PM, Mohammad Tariq wrote: Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.**initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/**jira/browse/HBASE-3996https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
RE: HBase MapReduce - Using multiple tables as source
Hi, If TableMapper and TableMapReduceUtil.initTableMapperJob() does not support multiple tables as input, can I use Hadoop Mapper/Reducer classes and specify the the input/output format myself? What I want to do is, I want to read two tables in the map phase and want to reduce them together. What is the best solution available in 0.92.0 (I understand the best solution is coming in version 0.96.0). Regards, Amlan -Original Message- From: Ioakim Perros [mailto:imper...@gmail.com] Sent: Monday, August 06, 2012 5:11 PM To: user@hbase.apache.org Subject: Re: HBase MapReduce - Using mutiple tables as source Hi, Isn't that the case that you can always initiate a scanner inside a map job (referring to another table from which had been set into the configuration of TableMapReduceUtil.initTableMapperJob(...) ) ? Hope this serves as temporary solution. On 08/06/2012 02:35 PM, Mohammad Tariq wrote: Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
Re: HBase MapReduce - Using multiple tables as source
Hi, Perhaps you want to take a look at MultipleInputs. I'm not sure if it works for TableInputFormat, but at least you can use it for inspiration. Ferdy. On Mon, Aug 6, 2012 at 3:02 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, If TableMapper and TableMapReduceUtil.initTableMapperJob() does not support multiple tables as input, can I use Hadoop Mapper/Reducer classes and specify the the input/output format myself? What I want to do is, I want to read two tables in the map phase and want to reduce them together. What is the best solution available in 0.92.0 (I understand the best solution is coming in version 0.96.0). Regards, Amlan -Original Message- From: Ioakim Perros [mailto:imper...@gmail.com] Sent: Monday, August 06, 2012 5:11 PM To: user@hbase.apache.org Subject: Re: HBase MapReduce - Using mutiple tables as source Hi, Isn't that the case that you can always initiate a scanner inside a map job (referring to another table from which had been set into the configuration of TableMapReduceUtil.initTableMapperJob(...) ) ? Hope this serves as temporary solution. On 08/06/2012 02:35 PM, Mohammad Tariq wrote: Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
RE: HBase MapReduce - Using multiple tables as source
A related question: may I have multiple tables as output, in a single Map Job? I understand that this is achievable by running multiple MR jobs, each with a different output table specified in the reduce class. What I want is to scan a source table once and generate multiple tables at one time. Thanks, Best Regards, Wei Wei Tan Research Staff Member IBM T. J. Watson Research Center 19 Skyline Dr, Hawthorne, NY 10532 w...@us.ibm.com; 914-784-6752 From: Amlan Roy amlan@cleartrip.com To: user@hbase.apache.org, Date: 08/06/2012 09:05 AM Subject:RE: HBase MapReduce - Using mutiple tables as source Hi, If TableMapper and TableMapReduceUtil.initTableMapperJob() does not support multiple tables as input, can I use Hadoop Mapper/Reducer classes and specify the the input/output format myself? What I want to do is, I want to read two tables in the map phase and want to reduce them together. What is the best solution available in 0.92.0 (I understand the best solution is coming in version 0.96.0). Regards, Amlan -Original Message- From: Ioakim Perros [mailto:imper...@gmail.com] Sent: Monday, August 06, 2012 5:11 PM To: user@hbase.apache.org Subject: Re: HBase MapReduce - Using mutiple tables as source Hi, Isn't that the case that you can always initiate a scanner inside a map job (referring to another table from which had been set into the configuration of TableMapReduceUtil.initTableMapperJob(...) ) ? Hope this serves as temporary solution. On 08/06/2012 02:35 PM, Mohammad Tariq wrote: Hello Amlan, Issue is still unresolved...Will get fixed in 0.96.0. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, While writing a MapReduce job for HBase, can I use multiple tables as input? I think TableMapReduceUtil.initTableMapperJob() takes a single table as parameter. For my requirement, I want to specify multiple tables and scan instances. I read about MultiTableInputCollection in the document https://issues.apache.org/jira/browse/HBASE-3996. But I don't find it in HBase-0.92.0. Regards, Amlan
Re: HBase MapReduce - Using multiple tables as source
On Mon, Aug 6, 2012 at 3:22 PM, Wei Tan w...@us.ibm.com wrote: I understand that this is achievable by running multiple MR jobs, each with a different output table specified in the reduce class. What I want is to scan a source table once and generate multiple tables at one time. Thanks, There is nothing in HBase natively that will do this but no reason you can't do this in a map or reduce task. You'd set up two or more HTable instances on task init each pointing to a particular table. Then, inside in your task you'd send the puts to one of the possible HTables switching on whatever your fancy. St.Ack
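A minimal sketch of what Stack describes, with two hypothetical output tables "table_a" and "table_b" and a made-up routing rule; the task writes through HTable instances directly instead of going through TableOutputFormat:

  public class TwoTableReducer extends Reducer<ImmutableBytesWritable, Put, NullWritable, NullWritable> {
    private HTable tableA;
    private HTable tableB;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
      Configuration conf = HBaseConfiguration.create(context.getConfiguration());
      tableA = new HTable(conf, "table_a");
      tableB = new HTable(conf, "table_b");
    }

    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> puts, Context context)
        throws IOException, InterruptedException {
      for (Put put : puts) {
        // "switching on whatever your fancy" - here a placeholder rule on the first byte of the row key
        if (put.getRow().length > 0 && put.getRow()[0] < 'm') {
          tableA.put(put);
        } else {
          tableB.put(put);
        }
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      tableA.close();   // close() flushes any buffered writes
      tableB.close();
    }
  }

Depending on the version, org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat (which keys each write by table name) may also be available and avoids managing the HTable instances by hand.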
Re: HBase MapReduce - Using multiple tables as source
Its available just as a patch on trunk for now. You wont find it in 0.92.0 ./zahoor On 06-Aug-2012, at 5:01 PM, Amlan Roy amlan@cleartrip.com wrote: https://issues.apache.org/jira/browse/HBASE-3996
A question about HBase MapReduce
Hello! I've read Lars George's blog http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where, at the end of the article, he mentioned: "In the next post I will show you how to import data from a raw data file into a HBase table and how you eventually process the data in the HBase table. We will address questions like how many mappers and/or reducers are needed and how can I improve import and processing performance." I looked in the blog for these questions, but it seems there is no related article. Do you know if he touched on these subjects in a different post or book? In particular I am interested in: 1. How can you set the number of mappers? 2. Can the number of mappers be set per region server? If yes, how? 3. How does a large number of mappers affect data locality? 4. Is this algorithm for computing the number of mappers (https://issues.apache.org/jira/browse/HBASE-1172) still in place? "Currently, the number of mappers specified when using TableInputFormat is strictly followed if less than total regions on the input table. If greater, the number of regions is used. This will modify the splitting algorithm to do the following: * Specify 0 mappers when you want # mappers = # regions * If you specify fewer mappers than regions, will use exactly the number you specify based on the current algorithm * If you specify more mappers than regions, will divide regions up by determining [start,X) [X,end). The number of mappers will always be a multiple of number of regions. This is so we do not have scanners spanning multiple regions. There is an additional issue in that the default number of mappers in JobConf is set to 1. That means if a user does not explicitly set number of map tasks, a single mapper will be used." I'll look forward to your answers. Thank you. Kind regards, Florin
Re: A question about HBase MapReduce
re: data from raw data file into hbase table One approach is bulk loading.. http://hbase.apache.org/book.html#arch.bulk.load If he's talking about using an Hbase table as the source of a MR job, then see this... http://hbase.apache.org/book.html#splitter On 5/25/12 2:35 AM, Florin P florinp...@yahoo.com wrote: Hello! I've read Lars George's blog http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where at the end of the article, he mentioned In the next post I will show you how to import data from a raw data file into a HBase table and how you eventually process the data in the HBase table. We will address questions like how many mappers and/or reducers are needed and how can I improve import and processing performance.. I looked in the blog up for these questions, but it seems that there is no article related. Do you knoe if he you touched these subjects into a different post or book? Particular I am interested 1. how you can set up the number of mappers? 2. number of mappers can be set up per region server? If yes how? 3. How the big number of set up mappers can affect the data locality? 4. is this algorithm for computing the number of mappers (https://issues.apache.org/jira/browse/HBASE-1172) still available Currently, the number of mappers specified when using TableInputFormat is strictly followed if less than total regions on the input table. If greater, the number of regions is used. This will modify the splitting algorithm to do the following: * Specify 0 mappers when you want # mappers = # regions * If you specify fewer mappers than regions, will use exactly the number you specify based on the current algorithm * If you specify more mappers than regions, will divide regions up by determining [start,X) [X,end). The number of mappers will always be a multiple of number of regions. This is so we do not have scanners spanning multiple regions. There is an additional issue in that the default number of mappers in JobConf is set to 1. That means if a user does not explicitly set number of map tasks, a single mapper will be used. I'll look forward for you answers. Thank you. Kind regards, Florin
Re: HBase mapreduce sink - using a custom TableReducer to pass in Puts
My first guess would be to check if all the KVs using the same qualifier, because then it's basically the same cell 10 times. J-D On Mon, May 14, 2012 at 6:50 PM, Ben Kim benkimkim...@gmail.com wrote: Hello! I'm writing a mapreduce code to read a SequenceFile and write it to hbase table. Normally, or what hbase tutorial tells us to do.. you would create a Put in TableMapper and pass it to IdentityTableReducer. This in fact work for me. But now I'm trying to separate the computations into mapper and let reducer take care of writing to hbase. Following is my TableReducer public class MyTableReducer extends TableReducerImmutableBytesWritable, KeyValue, ImmutableBytesWritable { public void reduce(ImmutableBytesWritable key, IterableKeyValue values, Context context) throws IOException, InterruptedException { Put put = new Put(key.get()); for(KeyValue kv : values) { put.add(kv); } context.write(key, put); } } For my testing purpose, I'm writing 10 rows with 10 cells. I added multiple cells to each Put operations (put.add(kv)) But this Reducer will only write the one last cell passed by Mapper! Following is setup of the Job Job itemTableJob = prepareJob( inputPath, outputPath, SequenceFileInputFormat.class, MyMapper.class, ImmutableBytesWritable.class, KeyValue.class, MyTableReducerclass, ImmutableBytesWritable.class, Writable.class, TableOutputFormat.class); TableMapReduceUtil.initTableReducerJob(rs_system, null, itemTableJob); itemTableJob.waitForCompletion(true); Am I missing smting? -- *Benjamin Kim* Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521* benkimkimben at gmail*
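To illustrate J-D's point: if the mapper builds its ten KeyValues with the same family and qualifier, the reduce-side Put ends up holding ten versions of one cell, and the table shows only the newest. Giving each cell its own qualifier keeps all ten. A small sketch of the map side (the row, family and qualifier values are made up for illustration):

  byte[] row = Bytes.toBytes("row-1");
  byte[] family = Bytes.toBytes("cf");
  for (int i = 0; i < 10; i++) {
    KeyValue kv = new KeyValue(row, family,
        Bytes.toBytes("col" + i),                 // distinct qualifier per cell
        Bytes.toBytes("value" + i));
    context.write(new ImmutableBytesWritable(row), kv);
  }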
HBase mapreduce sink - using a custom TableReducer to pass in Puts
Hello! I'm writing MapReduce code to read a SequenceFile and write it to an HBase table. Normally, or what the HBase tutorial tells us to do, you would create a Put in the TableMapper and pass it to IdentityTableReducer. This in fact works for me. But now I'm trying to separate the computations into the mapper and let the reducer take care of writing to HBase. Following is my TableReducer: public class MyTableReducer extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> { public void reduce(ImmutableBytesWritable key, Iterable<KeyValue> values, Context context) throws IOException, InterruptedException { Put put = new Put(key.get()); for (KeyValue kv : values) { put.add(kv); } context.write(key, put); } } For testing purposes, I'm writing 10 rows with 10 cells each. I added multiple cells to each Put (put.add(kv)), but this reducer will only write the one last cell passed by the mapper! Following is the setup of the job: Job itemTableJob = prepareJob( inputPath, outputPath, SequenceFileInputFormat.class, MyMapper.class, ImmutableBytesWritable.class, KeyValue.class, MyTableReducer.class, ImmutableBytesWritable.class, Writable.class, TableOutputFormat.class); TableMapReduceUtil.initTableReducerJob("rs_system", null, itemTableJob); itemTableJob.waitForCompletion(true); Am I missing something? -- *Benjamin Kim* Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521* benkimkimben at gmail*
Re: HBase mapreduce sink - using a custom TableReducer to pass in Puts
Oops I made mistake while copy-paste The reducer initialization code should be like this TableMapReduceUtil.initTableReducerJob(rs_system, MyTableReducer, itemTableJob); On Tue, May 15, 2012 at 10:50 AM, Ben Kim benkimkim...@gmail.com wrote: Hello! I'm writing a mapreduce code to read a SequenceFile and write it to hbase table. Normally, or what hbase tutorial tells us to do.. you would create a Put in TableMapper and pass it to IdentityTableReducer. This in fact work for me. But now I'm trying to separate the computations into mapper and let reducer take care of writing to hbase. Following is my TableReducer public class MyTableReducer extends TableReducerImmutableBytesWritable, KeyValue, ImmutableBytesWritable { public void reduce(ImmutableBytesWritable key, IterableKeyValue values, Context context) throws IOException, InterruptedException { Put put = new Put(key.get()); for(KeyValue kv : values) { put.add(kv); } context.write(key, put); } } For my testing purpose, I'm writing 10 rows with 10 cells. I added multiple cells to each Put operations (put.add(kv)) But this Reducer will only write the one last cell passed by Mapper! Following is setup of the Job Job itemTableJob = prepareJob( inputPath, outputPath, SequenceFileInputFormat.class, MyMapper.class, ImmutableBytesWritable.class, KeyValue.class, MyTableReducerclass, ImmutableBytesWritable.class, Writable.class, TableOutputFormat.class); TableMapReduceUtil.initTableReducerJob(rs_system, null, itemTableJob); itemTableJob.waitForCompletion(true); Am I missing smting? -- *Benjamin Kim* Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521* benkimkimben at gmail* -- *Benjamin Kim* Tel : +82 2.6400.3654* |* Mo : +82 10.5357.0521* benkimkim...@gmail.com*
HBase MapReduce Job with Multiple Scans
Hello, I have a table whose key is structured as eventType + time, and I need to periodically run a map reduce job on the table which will process each event type within a specific time range. So, the map reduce job needs to process multiple segments of the table as input, and therefore can't be set up with a single scan. (Using a filter on the scan would theoretically work, but doesn't scale well as the data size increases.) Given that the HBase-provided TableMapReduceUtil.initTableMapperJob only supports a single scan, there doesn't appear to be a built-in way to run a mapreduce job that has multiple scans as input. I found the following related post which points me to creating my own map reduce InputFormat type by extending HBase's TableInputFormatBase and overriding the getSplits() method: http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects So, that's currently the direction I'm heading. However, before I got too far in the weeds I thought I'd ask: 1. Is this still the best/right way to handle this situation? 2. Does anyone have an example of a custom InputFormat that sets up multiple scans against an HBase input table (something like the MultiSegmentTableInputFormat referred to in the post) that they'd be willing to share? Thanks, -Shawn
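No one in the thread shares an implementation, but a minimal sketch of the approach from the Stack Overflow post might look like the following (the class name, the Pair list, and how the segments and table reach the InputFormat are all hypothetical; the real HBASE-3996 patch is more complete). The idea is simply to run the parent class's split calculation once per key range and concatenate the results:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MultiSegmentTableInputFormat extends TableInputFormatBase {

    // Hypothetical: one (startRow, stopRow) pair per eventType segment,
    // populated from the job configuration in setConf() (not shown); the
    // HTable itself also has to be set up there, as TableInputFormat does.
    private List<Pair<byte[], byte[]>> segments = new ArrayList<Pair<byte[], byte[]>>();

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        Scan template = getScan();                   // carries caching, families, filters, ...
        for (Pair<byte[], byte[]> segment : segments) {
            Scan scan = new Scan(template);          // copy, then narrow to this segment
            scan.setStartRow(segment.getFirst());
            scan.setStopRow(segment.getSecond());
            setScan(scan);
            splits.addAll(super.getSplits(context)); // parent computes region-aligned splits
        }
        setScan(template);                           // restore the original scan
        return splits;
    }
}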
Re: HBase MapReduce Job with Multiple Scans
Take a look at HBASE-3996 where Stack has some comments outstanding. Cheers On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn squ...@moxiegroup.com wrote: Hello, I have a table whose key is structured as eventType + time, and I need to periodically run a map reduce job on the table which will process each event type within a specific time range. So, the map reduce job needs to process multiple segments of the table as input, and therefore can't be setup with a single scan. (Using a filter on the scan would theoretically work, but doesn't scale well as the data size increases.) Given that the HBase provided TableMapReduceUtil.initTableMapperJob only supports a single scan there doesn't appear to be a built in way to run a mapreduce job that has multiple scans as input. I found the following related post which points me to creating my own map reduce InputFormat type by extending HBase's TableInputFormatBase and overriding the getSplits() method: http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects So, that's currently the direction I'm heading. However, before I got too far in the weeds I thought I'd ask: 1. Is this still the best/right way to handle this situation? 2. Does anyone have an example of a custom InputFormat that sets up multiple scans against an HBase input table (something like the MultiSegmentTableInputFormat referred to in the post) that they'd be willing to share? Thanks, -Shawn
Re: HBase MapReduce Job with Multiple Scans
Sounds good, thanks Ted. I'll give it a whirl and add any comments/findings to the Jira issue. -Shawn On Tue, Apr 3, 2012 at 10:45 AM, Ted Yu yuzhih...@gmail.com wrote: Stack said he might help implement his suggestions if Eran is busy. The patch doesn't depend on recent changes to the Hadoop/MapReduce. Give it a try. Feedback would help us refine the patch. Thanks On Tue, Apr 3, 2012 at 7:43 AM, Shawn Quinn squ...@moxiegroup.com wrote: Thanks for the quick reply Ted! That's exactly what I'm looking for. Reading through the Jira comments I'm a bit confused on what the status/plan is with that patch. Do you expect that will be included in the next HBase release, or has it been postponed? Also, does that change depend on any recent changes to the Hadoop/MapReduce, or will it work as-is? In the meantime, I'll give that patch a closer look and setup some custom classes in my own project to try and pull off something similar. -Shawn On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu yuzhih...@gmail.com wrote: Take a look at HBASE-3996 where Stack has some comments outstanding. Cheers On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn squ...@moxiegroup.com wrote: Hello, I have a table whose key is structured as eventType + time, and I need to periodically run a map reduce job on the table which will process each event type within a specific time range. So, the map reduce job needs to process multiple segments of the table as input, and therefore can't be setup with a single scan. (Using a filter on the scan would theoretically work, but doesn't scale well as the data size increases.) Given that the HBase provided TableMapReduceUtil.initTableMapperJob only supports a single scan there doesn't appear to be a built in way to run a mapreduce job that has multiple scans as input. I found the following related post which points me to creating my own map reduce InputFormat type by extending HBase's TableInputFormatBase and overriding the getSplits() method: http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects So, that's currently the direction I'm heading. However, before I got too far in the weeds I thought I'd ask: 1. Is this still the best/right way to handle this situation? 2. Does anyone have an example of a custom InputFormat that sets up multiple scans against an HBase input table (something like the MultiSegmentTableInputFormat referred to in the post) that they'd be willing to share? Thanks, -Shawn
Re: hbase mapreduce running through command line
I tried to run the program from Eclipse, but while it ran I could not see any job on the jobtracker/tasktracker web UI pages. I observed that Eclipse runs the job with the LocalJobRunner, so the job is not submitted to the whole cluster but executes on the namenode machine alone. So I thought of generating a jar of my whole project from Eclipse, in which the conf folder of HBase was added as an external class folder to its build path. When I try to export the project as a jar from Eclipse (File > Export > Java > JAR file), an alert appears saying 'the jar has been generated except the conf folder in the build path'. So, without the conf folder in the jar, I tried to execute from the command line, as I mentioned in the last mail. How can I run my mapreduce program on the Hadoop cluster from Eclipse? Are there any configuration settings I should make? Please help. On Fri, Dec 9, 2011 at 11:13 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: You don't need the conf dir in the jar, in fact you really don't want it there. I don't know where that alert is coming from, would be nice if you gave more details. J-D On Fri, Dec 9, 2011 at 6:45 AM, Vamshi Krishna vamshi2...@gmail.com wrote: Hi, I want to run a mapreduce program to insert data into tables in HBase. My cluster has 3 machines. If I want to run that program through the command line, where can I do so? Should I do ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source destn? Here MyJavaProg.jar is the jar of my Java project, in which I wrote a main class named java_mainclass_file, where I wrote a map-reduce class consisting of a map method only (I don't require reduce). I passed arguments to that main method, which go into the FileInputFormat.addInputPath(..) and FileOutputFormat.setOutputPath(..) methods; source and destn are locations inside DFS. When I tried to create a jar of my whole Java project from Eclipse, I got an alert that the 'conf directory of HBase was not exported to the jar'. So how do I run my program through the command line to insert data into an HBase table? Can anybody help? -- *Regards* * Vamshi Krishna * -- *Regards* * Vamshi Krishna *
hbase mapreduce running through command line
Hi, I want to run a mapreduce program to insert data into tables in HBase. My cluster has 3 machines. If I want to run that program through the command line, where can I do so? Should I do ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source destn? Here MyJavaProg.jar is the jar of my Java project, in which I wrote a main class named java_mainclass_file, where I wrote a map-reduce class consisting of a map method only (I don't require reduce). I passed arguments to that main method, which go into the FileInputFormat.addInputPath(..) and FileOutputFormat.setOutputPath(..) methods; source and destn are locations inside DFS. When I tried to create a jar of my whole Java project from Eclipse, I got an alert that the 'conf directory of HBase was not exported to the jar'. So how do I run my program through the command line to insert data into an HBase table? Can anybody help? -- *Regards* * Vamshi Krishna *
Re: hbase mapreduce running through command line
You don't need the conf dir in the jar, in fact you really don't want it there. I don't know where that alert is coming from, would be nice if you gave more details. J-D On Fri, Dec 9, 2011 at 6:45 AM, Vamshi Krishna vamshi2...@gmail.com wrote: Hi, I want to run a mapreduce program to insert data into tables in HBase. My cluster has 3 machines. If I want to run that program through the command line, where can I do so? Should I do ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar java_mainclass_file source destn? Here MyJavaProg.jar is the jar of my Java project, in which I wrote a main class named java_mainclass_file, where I wrote a map-reduce class consisting of a map method only (I don't require reduce). I passed arguments to that main method, which go into the FileInputFormat.addInputPath(..) and FileOutputFormat.setOutputPath(..) methods; source and destn are locations inside DFS. When I tried to create a jar of my whole Java project from Eclipse, I got an alert that the 'conf directory of HBase was not exported to the jar'. So how do I run my program through the command line to insert data into an HBase table? Can anybody help? -- *Regards* * Vamshi Krishna *
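For what it's worth, a minimal driver along the lines J-D implies might look like the sketch below (the class names, column family and command-line layout are made up, not from this thread). The Hadoop and HBase conf directories stay on the client's classpath; nothing from them needs to be packed into the jar:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    // Hypothetical map-only mapper: each tab-separated input line becomes one Put.
    public static class MyPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] f = line.toString().split("\t");
            Put put = new Put(Bytes.toBytes(f[0]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(f[1]));
            ctx.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Merge hbase-site.xml from the classpath into the Hadoop config.
        Configuration conf = HBaseConfiguration.create(getConf());
        Job job = new Job(conf, "insert-into-hbase");
        job.setJarByClass(MyDriver.class);          // ships this jar to the task nodes
        job.setMapperClass(MyPutMapper.class);
        job.setNumReduceTasks(0);                   // map-only, as in the original question
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Sends output to the HBase table named in args[1] and adds the HBase
        // jars to the job classpath (addDependencyJars) under the covers.
        TableMapReduceUtil.initTableReducerJob(args[1], null, job);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}

Launched with something like ${Hadoop_Home}/bin/hadoop jar MyJavaProg.jar MyDriver input-dir table-name, the job should show up on the JobTracker UI as long as the cluster's mapred-site.xml (and hbase-site.xml, for example via HADOOP_CLASSPATH) are on the client classpath; if they are not, Hadoop falls back to the LocalJobRunner, which is what the Eclipse run above was doing.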
Re: HBase MapReduce Zookeeper
I had the same issue. The problem for me turned out to be that hbase.zookeeper.quorum was not set in hbase-site.xml on the server that submitted the mapreduce job. Ironically, this is also the same server that was running the hbase master. The quorum therefore defaulted to 127.0.0.1, which was where the tasktrackers were trying to initiate the connection from. Whether or not the zookeeper quorum was set on the tasktrackers themselves made no difference.
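In other words, the configuration that matters is the one on the submitting client, because it is serialized into the job. A minimal sketch (hostnames are made up) of forcing the quorum programmatically when the client's hbase-site.xml cannot be relied on:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithExplicitQuorum {

    public static Job createJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Without this (or an hbase-site.xml on the submitting client's
        // classpath) the quorum defaults to localhost, and every task ends up
        // trying to reach ZooKeeper at 127.0.0.1.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        return new Job(conf, "hbase-mr-job");   // add mapper/scan/table setup as usual
    }
}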
Hbase Mapreduce jobs Dashboard
Hi All, When I run Hadoop mapreduce jobs, the job statistics and status is displayed in jobtracker/task tracker. But when I use HBase mapreduce it doesn't. Is there any hbase mapreduce dashboard available or am I missing something? Thanks Regards Jimson K James The Quieter You Become The More You Are Able To Hear.
Re: Hbase Mapreduce jobs Dashboard
HBase doesn't have its own MapReduce system, it uses Hadoop's. How are you launching your jobs? On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James jimson.ja...@nestgroup.net wrote: Hi All, When I run Hadoop mapreduce jobs, the job statistics and status is displayed in jobtracker/task tracker. But when I use HBase mapreduce it doesn't. Is there any hbase mapreduce dashboard available or am I missing something? Thanks Regards Jimson K James The Quieter You Become The More You Are Able To Hear. -- Joseph Echeverria Cloudera, Inc. 443.305.9434
RE: Hbase Mapreduce jobs Dashboard
Hi, How are you launching your jobs? Testing from both a cygwin environment and from inside the hadoop/HBase cluster using hadoop jar mrtest.jar -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, September 12, 2011 4:47 PM To: user@hbase.apache.org Subject: Re: Hbase Mapreduce jobs Dashboard HBase doesn't have it's own MapReduce system, it uses Hadoop's. How are you launching your jobs? On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James jimson.ja...@nestgroup.net wrote: Hi All, When I run Hadoop mapreduce jobs, the job statistics and status is displayed in jobtracker/task tracker. But when I use HBase mapreduce it doesn't. Is there any hbase mapreduce dashboard available or am I missing something? Thanks Regards Jimson K James The Quieter You Become The More You Are Able To Hear. -- Joseph Echeverria Cloudera, Inc. 443.305.9434
Re: Hbase Mapreduce jobs Dashboard
Jimson, This is probably related to your other question on the list, where it's apparent that the HBase jobs are somehow running on a LocalJobRunner instead of a proper MapReduce cluster. This is why the JT doesn't display any knowledge about them. On Mon, Sep 12, 2011 at 5:04 PM, Jimson K. James jimson.ja...@nestgroup.net wrote: Hi, How are you launching your jobs? Testing from both a cygwin environment and from inside the hadoop/HBase cluster using hadoop jar mrtest.jar -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, September 12, 2011 4:47 PM To: user@hbase.apache.org Subject: Re: Hbase Mapreduce jobs Dashboard HBase doesn't have it's own MapReduce system, it uses Hadoop's. How are you launching your jobs? On Mon, Sep 12, 2011 at 2:32 AM, Jimson K. James jimson.ja...@nestgroup.net wrote: Hi All, When I run Hadoop mapreduce jobs, the job statistics and status is displayed in jobtracker/task tracker. But when I use HBase mapreduce it doesn't. Is there any hbase mapreduce dashboard available or am I missing something? Thanks Regards Jimson K James The Quieter You Become The More You Are Able To Hear. -- Joseph Echeverria Cloudera, Inc. 443.305.9434 -- Harsh J
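A quick way to confirm Harsh's diagnosis (the property name is the Hadoop 1.x-era one; this snippet is not from the thread): print what the client-side configuration resolves mapred.job.tracker to. If it comes back as "local", the job runs in the LocalJobRunner and the JobTracker UI will never show it.

import org.apache.hadoop.mapred.JobConf;

public class WhereWillItRun {
    public static void main(String[] args) {
        // JobConf pulls mapred-site.xml from the classpath, the same way a
        // submitted job would; "local" means the LocalJobRunner is used.
        JobConf conf = new JobConf();
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker", "local"));
    }
}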
Fwd: HBase Mapreduce cannot find Map class
-- Forwarded message -- From: air cnwe...@gmail.com Date: 2011/7/28 Subject: HBase Mapreduce cannot find Map class To: CDH Users cdh-u...@cloudera.org

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoadToHBase extends Configured implements Tool {

    public static class XMap<K, V> extends MapReduceBase implements Mapper<LongWritable, Text, K, V> {

        private JobConf conf;
        private HTable table;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf), "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        public void map(LongWritable key, Text value, OutputCollector<K, V> output, Reporter reporter) throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // 用户注册时间
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                e.printStackTrace();
            }
            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }
            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }
            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }
}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/ and it says: ..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id : attempt_201107261532_2625_m_04_1, Status : FAILED java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127
Re: HBase MapReduce Zookeeper
This issue is still not resolved... Unfortunately, calling HConnectionManager.deleteConnection(conf, true); after the MR job is finished does not close the connection to the zookeeper. We have 3 zookeeper nodes, and by default there is a limit of 10 connections allowed from a single client, so after running 30 MR jobs scheduled by our application we have 30 unclosed connections, and trying to start a new MR job results in a failure: the connection to the zookeeper ensemble is dropped... The workaround of restarting the whole application after 30 MR jobs is not very elegant... :-(
Re: HBase MapReduce Zookeeper
Try getting the ZooKeeperWatcher from the connection on your way out and explicitly shutdown the zk connection (see TestZooKeeper unit test for example). St.Ack On Thu, Jul 28, 2011 at 6:01 AM, Andre Reiter a.rei...@web.de wrote: this issue is still not resolved... unfortunatelly calling HConnectionManager.deleteConnection(conf, true); after the MR job is finished, does not close the connection to the zookeeper we have 3 zookeeper nodes by default there is a limit of 10 connections allowed from a single client so after running 30 MR jobs scheduled by our application, we have 30 unclosed connections, trying to start a new MR job results in a failure, the connection to the zookeeper ensamble is droped... the work around to restart the whole application after 30 MR jobss is not very elegant... :-(
Re: HBase MapReduce Zookeeper
A 10-connection maximum is too low; it has been recommended on this list to go up to as many as 2000 connections. This doesn't fix your problem, but it is something you should probably have in your configuration. ~Jeff On 7/28/2011 10:00 AM, Stack wrote: Try getting the ZooKeeperWatcher from the connection on your way out and explicitly shutdown the zk connection (see TestZooKeeper unit test for example). St.Ack On Thu, Jul 28, 2011 at 6:01 AM, Andre Reiter a.rei...@web.de wrote: this issue is still not resolved... unfortunatelly calling HConnectionManager.deleteConnection(conf, true); after the MR job is finished, does not close the connection to the zookeeper we have 3 zookeeper nodes by default there is a limit of 10 connections allowed from a single client so after running 30 MR jobs scheduled by our application, we have 30 unclosed connections, trying to start a new MR job results in a failure, the connection to the zookeeper ensamble is droped... the work around to restart the whole application after 30 MR jobss is not very elegant... :-( -- Jeff Whiting Qualtrics Senior Software Engineer je...@qualtrics.com
Re: HBase MapReduce Zookeeper
This problem has come up a few times. There are leaked connections in the TIF See: https://issues.apache.org/jira/browse/HBASE-3792 https://issues.apache.org/jira/browse/HBASE-3777 A quick and (very) dirty solution is to call deleteAllConnections(bool) at the end of your MapReduce jobs, or periodically. If you have no other tables or pools, etc. open, then no problem. If you do, they'll start throwing IOExceptions, but you can re-instantiate them with a new config and then continue as usual. (You do have to change the config or it'll simply grab the closed, cached one from the HCM). Another way: The leak comes from inside of TableInputFormat.setConf, where the Configuration gets cloned (so then it's hash in the HCM is lost): setHTable(new HTable(new Configuration(conf), tableName)); This is done to prevent changes to a config from affecting the job and vice-versa. If you're 100% sure the config won't be modified, you could subclass TIF to not make this copy. For me, I didn't have extra tables hanging around, so I just blast them with deleteAllConnections. :) - Ruben From: Jeff Whiting je...@qualtrics.com To: user@hbase.apache.org Sent: Thu, July 28, 2011 12:10:16 PM Subject: Re: HBase MapReduce Zookeeper 10 connection maximum is too low. It has been recommended to go up to as many as 2000 connections in the list. This doesn't fix your problem but is something you should probably have in your configuration. ~Jeff On 7/28/2011 10:00 AM, Stack wrote: Try getting the ZooKeeperWatcher from the connection on your way out and explicitly shutdown the zk connection (see TestZooKeeper unit test for example). St.Ack On Thu, Jul 28, 2011 at 6:01 AM, Andre Reitera.rei...@web.de wrote: this issue is still not resolved... unfortunatelly calling HConnectionManager.deleteConnection(conf, true); after the MR job is finished, does not close the connection to the zookeeper we have 3 zookeeper nodes by default there is a limit of 10 connections allowed from a single client so after running 30 MR jobs scheduled by our application, we have 30 unclosed connections, trying to start a new MR job results in a failure, the connection to the zookeeper ensamble is droped... the work around to restart the whole application after 30 MR jobss is not very elegant... :-( -- Jeff Whiting Qualtrics Senior Software Engineer je...@qualtrics.com
Re: HBase MapReduce Zookeeper
I guess I know the reason why HConnectionManager.deleteConnection(conf, true); does not work for me. In the MR job I'm using TableInputFormat; if you have a look at the source code, in the method public void setConf(Configuration configuration) there is a line creating the HTable like this: ... setHTable(new HTable(new Configuration(conf), tableName)); ... Now the HTable is using a new configuration, why??? HConnectionManager holds all connections in a map with the config as the key: private static final Map<Configuration, HConnectionImplementation> HBASE_INSTANCE = ... Configuration does not override the equals and hashCode methods, so the HConnection created by the table using the new Configuration cannot be referenced... Just stepping through the executed code with a debugger, I found out that there is even one more place where another new configuration is created and used: org.apache.hadoop.mapred.JobClient.submitJobInternal ... JobContext context = new JobContext(jobCopy, jobId); ... The JobContext constructor creates a new Configuration object: new org.apache.hadoop.mapred.JobConf(conf). So there is no real chance for me to get the configuration which was used during the call to HConnectionManager.getConnection(conf); that is why HConnectionManager.deleteConnection(conf, true); does not work for me. Or am I completely wrong??? Just going crazy with that! I'm not doing anything very special; all I want is to run a MR job out of my application, and NOT by calling it from the shell like this: ./bin/hadoop jar /tmp/my.jar package.OurClass
Re: HBase MapReduce Zookeeper
Yes, that's the connection leak. Use deleteAllConnections(true), and it will close all open connections. - Ruben From: Andre Reiter a.rei...@web.de To: user@hbase.apache.org Sent: Thu, July 28, 2011 4:55:52 PM Subject: Re: HBase MapReduce Zookeeper i guess, i know the reason, why HConnectionManager.deleteConnection(conf, true); does not work for me in the MR job im using TableInputFormat, if you have a look at the source code in the method public void setConf(Configuration configuration) there is a line creating the HTable like this : ... setHTable(new HTable(new Configuration(conf), tableName)); ... now the HTable is using a new configuration, why??? HConnectionManager holds all Connections in a map with the config as a key: private static final MapConfiguration, HConnectionImplementation HBASE_INSTANCE = ... Configuration does not override the equals and hash methods, so the HConnection created by the table using the new Configuration can not be referenced... just looking at executed code using debugger, i found out, that there is even one place more, where another new configuration is created and used: org.apache.hadoop.mapred.JobClient.submitJobInternal ... JobContext context = new JobContext(jobCopy, jobId); ... the JobContext constructor creates a new Configuration object: new org.apache.hadoop.mapred.JobConf(conf) so there is no real chance for me to get the configuration, which was used during the call to HConnectionManager.getConnection(conf) that is why HConnectionManager.deleteConnection(conf, true); does not work for me, or am i completely wrong??? just going crazy with that! i'm not doing anything very special, all what i want, is to run a MR job out of my application, and NOT by calling it from the shell like this: ./bin/hadoop jar /tmp/my.jar package.OurClass
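A sketch of that workaround (the helper class is made up; as noted in the thread, it is only safe when nothing else in the JVM still needs its cached HBase connection):

import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.mapreduce.Job;

public class JobRunner {

    // Hypothetical helper: run an already-configured MR job, then drop every
    // cached HBase connection so the client's ZooKeeper sessions are closed.
    public static boolean runAndCleanUp(Job job) throws Exception {
        try {
            return job.waitForCompletion(true);
        } finally {
            // Closes all cached connections, including the one leaked by the
            // cloned Configuration inside TableInputFormat.setConf().
            HConnectionManager.deleteAllConnections(true);
        }
    }
}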
Re: HBase MapReduce Zookeeper
Or override that method and in your version do not clone the Configuration. St.Ack On Thu, Jul 28, 2011 at 2:28 PM, Ruben Quintero rfq_...@yahoo.com wrote: Yes, that's the connection leak. Use deleteAllConnections(true), and it will close all open connections. - Ruben From: Andre Reiter a.rei...@web.de To: user@hbase.apache.org Sent: Thu, July 28, 2011 4:55:52 PM Subject: Re: HBase MapReduce Zookeeper i guess, i know the reason, why HConnectionManager.deleteConnection(conf, true); does not work for me in the MR job im using TableInputFormat, if you have a look at the source code in the method public void setConf(Configuration configuration) there is a line creating the HTable like this : ... setHTable(new HTable(new Configuration(conf), tableName)); ... now the HTable is using a new configuration, why??? HConnectionManager holds all Connections in a map with the config as a key: private static final MapConfiguration, HConnectionImplementation HBASE_INSTANCE = ... Configuration does not override the equals and hash methods, so the HConnection created by the table using the new Configuration can not be referenced... just looking at executed code using debugger, i found out, that there is even one place more, where another new configuration is created and used: org.apache.hadoop.mapred.JobClient.submitJobInternal ... JobContext context = new JobContext(jobCopy, jobId); ... the JobContext constructor creates a new Configuration object: new org.apache.hadoop.mapred.JobConf(conf) so there is no real chance for me to get the configuration, which was used during the call to HConnectionManager.getConnection(conf) that is why HConnectionManager.deleteConnection(conf, true); does not work for me, or am i completely wrong??? just going crazy with that! i'm not doing anything very special, all what i want, is to run a MR job out of my application, and NOT by calling it from the shell like this: ./bin/hadoop jar /tmp/my.jar package.OurClass
Re: HBase MapReduce Zookeeper
Hi Ruben, St.Ack, thanks a lot for your help! Finally, the problem seems to be solved by a pretty sick workaround. I did it like Bryan Keller described in this issue: https://issues.apache.org/jira/browse/HBASE-3792 @Ruben: thanks for the URLs to those issues cheers andre
Re: HBase Mapreduce cannot find Map class
Maybe job.setJarByClass() can solve this problem. On Thu, Jul 28, 2011 at 7:06 PM, air cnwe...@gmail.com wrote: -- Forwarded message -- From: air cnwe...@gmail.com Date: 2011/7/28 Subject: HBase Mapreduce cannot find Map class To: CDH Users cdh-u...@cloudera.org

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoadToHBase extends Configured implements Tool {

    public static class XMap<K, V> extends MapReduceBase implements Mapper<LongWritable, Text, K, V> {

        private JobConf conf;
        private HTable table;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf), "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        public void map(LongWritable key, Text value, OutputCollector<K, V> output, Reporter reporter) throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // 用户注册时间
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                e.printStackTrace();
            }
            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }
            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }
            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }
}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/ and it says: ..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id : attempt_201107261532_2625_m_04_1, Status : FAILED java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method
Re: HBase MapReduce Zookeeper
Hi St.Ack, thanks for your reply, but finally I miss the point: what would be the options to solve our issue? andre
Re: HBase MapReduce Zookeeper
Can you reuse Configuration instances even though the configuration changes? Else, in your Mapper#cleanup, call HTable.close() then try HConnectionManager.deleteConnection(table.getConfiguration()) after close (could be an issue with executors used by multi* operations not completing before the delete of the connection finishes). St.Ack On Tue, Jul 19, 2011 at 11:36 PM, Andre Reiter a.rei...@web.de wrote: Hi St.Ack, thanks for your reply, but finally I miss the point: what would be the options to solve our issue? andre
Re: HBase MapReduce Zookeeper
Hi Stack, just to make clear: the connections to the zookeeper being kept open are not on our mappers (tasktrackers) but on the client which schedules the MR job. I think the mappers are just fine as they are. andre Stack wrote: Can you reuse Configuration instances even though the configuration changes? Else, in your Mapper#cleanup, call HTable.close() then try HConnectionManager.deleteConnection(table.getConfiguration()) after close (could be an issue with executors used by multi* operations not completing before the delete of the connection finishes). St.Ack
Re: HBase MapReduce Zookeeper
Then similarly, can you do the deleteConnection above in your client or reuse the Configuration client-side that you use setting up the job? St.Ack On Wed, Jul 20, 2011 at 12:13 AM, Andre Reiter a.rei...@web.de wrote: Hi Stack, just to make clear, actually the connections to the zookeeper being kept are not on our mappers (tasktrackers) but on the client, which schedules the MR job i think, the mappers are just fine, as they are andre Stack wrote: Can you reuse Configuration instances though the configuration changes? Else in your Mapper#cleanup, call HTable.close() then try HConnectionManager.deleteConnection(table.getConfiguration()) after close (could be issue with executors used by multi* operations not completing before delete of connection finishes. St.Ack
Re: HBase MapReduce Zookeeper
Hi St.Ack, actually calling HConnectionManager.deleteConnection(conf, true); does not close the connection to the zookeeper; I can still see the connection established... andre Stack wrote: Then similarly, can you do the deleteConnection above in your client, or reuse the Configuration client-side that you use when setting up the job? St.Ack