Ninad, I'm not sure why you posted in this thread since it doesn't seem related, but to answer your question: a region will only split when one of its families reaches 256MB (the default), so I guess your small number of records wasn't enough. To force a split, go into the shell and type `split 'tablename'` with the name of your table. Alternatively, you can alter your table and change the MAX_FILESIZE value to something smaller. Finally, searching this mailing list you will find many other tips.
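For example, a quick sketch of both options in the HBase shell (the table name 'mytable' and the 64MB threshold are placeholders, and this is the alter syntax from recent shells, so double-check it against your version; note the table must be disabled before altering):

```
# Force an immediate split of the table's regions:
hbase> split 'mytable'

# Or lower the split threshold so regions split sooner.
# MAX_FILESIZE is in bytes; 67108864 = 64MB:
hbase> disable 'mytable'
hbase> alter 'mytable', MAX_FILESIZE => '67108864'
hbase> enable 'mytable'
```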
J-D

On Sat, Apr 11, 2009 at 1:46 AM, Ninad Raut <[email protected]> wrote:
> Map Tasks not splitting data for distributed processing
>
> I have 17,000-odd records to process, and in non-distributed mode (without
> MapReduce) I can process 1 record per minute. To reduce this time I used
> HDFS/HBase and MapReduce, but to my frustration HBase has kept all the
> records in one region and runs a single map task. This nullifies all my
> effort and I am back where I started (1 record per minute). Forced
> splitting of records corrupts data. Can someone show me a way out of this
> mess?
>
> On Fri, Apr 10, 2009 at 11:02 PM, Rakhi Khatwani <[email protected]> wrote:
>>
>> ---------- Forwarded message ----------
>> From: Vaibhav Puranik <[email protected]>
>> Date: Fri, Apr 10, 2009 at 7:43 PM
>> Subject: Re: Region Servers going down frequently
>> To: [email protected]
>>
>> Rakhi,
>>
>> Erick Holstad has written an import/export MapReduce job. This job is
>> compatible with 0.19.1.
>>
>> I have tried it myself and it works fine.
>> Here is the code: https://issues.apache.org/jira/browse/HBASE-974
>>
>> Regards,
>> Vaibhav
>>
>> On Thu, Apr 9, 2009 at 3:10 AM, Rakhi Khatwani <[email protected]> wrote:
>>
>>> Hi,
>>> The backup tool is written for an older version, not for 0.19. I tried
>>> changing the code, taking a backup, and restoring it again, but the
>>> restore fails. No data is restored, though the MapReduce job runs without
>>> errors.
>>>
>>> Any help?
>>>
>>> On Thu, Apr 9, 2009 at 12:57 PM, Amandeep Khurana <[email protected]> wrote:
>>>
>>>> When you say column foo:, it basically picks up all the columns under
>>>> the family foo:. You don't have to give individual column names.
>>>>
>>>> Amandeep Khurana
>>>> Computer Science Graduate Student
>>>> University of California, Santa Cruz
>>>>
>>>> On Thu, Apr 9, 2009 at 12:25 AM, Rakhi Khatwani <[email protected]> wrote:
>>>>
>>>>> Thanks Amandeep.
>>>>>
>>>>> The usage of the code is:
>>>>>
>>>>> "bin/hadoop com.mahalo.hadoop.hbase.Exporter -output mybackup -table test -columns foo:"
>>>>>
>>>>> But my columns are like, for example,
>>>>> URL:http://www.yahoo.com/ (column name) = 3 (some int value), and
>>>>> there are thousands of rows.
>>>>> It's not feasible to use the code from the command prompt. Is there
>>>>> another way?
>>>>>
>>>>> On Thu, Apr 9, 2009 at 12:33 PM, Amandeep Khurana <[email protected]> wrote:
>>>>>
>>>>>> You can use this: https://issues.apache.org/jira/browse/HBASE-897
>>>>>>
>>>>>> Amandeep Khurana
>>>>>> Computer Science Graduate Student
>>>>>> University of California, Santa Cruz
>>>>>>
>>>>>> On Wed, Apr 8, 2009 at 11:59 PM, Rakhi Khatwani <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> I want to back up my HBase data and move to a more powerful machine.
>>>>>>> I am trying distcp, but it does not back up the hbase folder
>>>>>>> properly. When I try restoring the hbase folder I don't get all the
>>>>>>> records. Some tables come up blank. What could be the reason?
>>>>>>>
>>>>>>> On Wed, Apr 8, 2009 at 11:23 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>>
>>>>>>>> I updated the Troubleshooting page on the wiki with a section
>>>>>>>> about EC2. Please feel free to extend/enhance/revise.
>>>>>>>>
>>>>>>>> - Andy
