Map Tasks not splitting data for distributed processing
I have about 17,000 records to process. In non-distributed mode (without MapReduce) I can process 1 record in 1 minute. To reduce this time I moved to HDFS/HBase and MapReduce. But to my frustration, HBase has kept all the records in one region and runs only a single map task. This nullifies all my effort, and I am back where I started (1 record per minute). Forcing a split of the records corrupts the data. Can someone tell me a way out of this mess?

On Fri, Apr 10, 2009 at 11:02 PM, Rakhi Khatwani <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: Vaibhav Puranik <[email protected]>
> Date: Fri, Apr 10, 2009 at 7:43 PM
> Subject: Re: Region Servers going down frequently
> To: [email protected]
>
> Rakhi,
>
> Erick Holstad has written an Import Export map reduce job. This job is
> compatible with 0.19.1.
>
> I have tried it myself and it works fine.
> Here is the code -
> https://issues.apache.org/jira/browse/HBASE-974
>
> Regards,
> Vaibhav
>
> On Thu, Apr 9, 2009 at 3:10 AM, Rakhi Khatwani <[email protected]> wrote:
>
> > Hi,
> > The backup tool is written for an older version and not for 0.19. I tried
> > changing the code, taking a backup and restoring it again, but the
> > restore fails. No data is restored, though the mapreduce runs without
> > errors.
> >
> > Any help?
> >
> > On Thu, Apr 9, 2009 at 12:57 PM, Amandeep Khurana <[email protected]> wrote:
> >
> > > When you say column foo: it basically picks up all the columns under
> > > the family foo:. You don't have to give individual column names.
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > > On Thu, Apr 9, 2009 at 12:25 AM, Rakhi Khatwani <[email protected]> wrote:
> > >
> > > > Thanks Amandeep.
> > > >
> > > > The usage of the code is
> > > >
> > > > "bin/hadoop com.mahalo.hadoop.hbase.Exporter -output mybackup -table test -columns foo:"
> > > >
> > > > but my columns are like, for example,
> > > > *URL:http://www.yahoo.com/ (column name)* 3 (some int value). And there
> > > > are thousands of rows.
> > > > It's not feasible to use the code from the command prompt. Is there
> > > > another way?
> > > >
> > > > On Thu, Apr 9, 2009 at 12:33 PM, Amandeep Khurana <[email protected]> wrote:
> > > >
> > > > > You can use this... https://issues.apache.org/jira/browse/HBASE-897
> > > > >
> > > > > Amandeep Khurana
> > > > > Computer Science Graduate Student
> > > > > University of California, Santa Cruz
> > > > >
> > > > > On Wed, Apr 8, 2009 at 11:59 PM, Rakhi Khatwani <[email protected]> wrote:
> > > > >
> > > > > > Hi Andy,
> > > > > >
> > > > > > I want to back up my HBase and move to a more powerful machine. I am
> > > > > > trying distcp but it does not back up the hbase folder properly. When I
> > > > > > try restoring the hbase folder I don't get all the records. Some tables
> > > > > > are coming up blank. What could be the reason?
> > > > > >
> > > > > > On Wed, Apr 8, 2009 at 11:23 PM, Andrew Purtell <[email protected]> wrote:
> > > > > >
> > > > > > > I updated the Troubleshooting page on the wiki with a section
> > > > > > > about EC2. Please feel free to extend/enhance/revise.
> > > > > > >
> > > > > > > - Andy
