Map Tasks not splitting data for distributed processing

I have some 17,000 records to process, and in non-distributed mode (without
MapReduce) I can process one record per minute. To cut that time down I moved
to HDFS/HBase and MapReduce. But to my frustration, HBase has kept all the
records in one region and so runs only a single map task. This nullifies all
my effort and puts me back where I started (one record per minute). Forcing
the records to split corrupts the data. Can someone tell me a way out of this
mess?
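
For reference, here is a rough sketch of pre-splitting a table so that
TableInputFormat sees several regions and therefore launches several map
tasks. It uses a newer HBase client API than the 0.19 release discussed
below, and the table name, column family, and split keys are only
illustrative; real split keys have to match the actual row key distribution:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical table and family names; replace with your own.
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("records"));
        desc.addFamily(new HColumnDescriptor("URL"));

        // Pre-split into four regions so TableInputFormat runs one map task
        // per region instead of a single one. These split keys are made up.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("05000"),
            Bytes.toBytes("10000"),
            Bytes.toBytes("15000")
        };
        admin.createTable(desc, splits);
        admin.close();
      }
    }

Another option is to lower hbase.hregion.max.filesize so the existing table
eventually splits on its own, but pre-splitting at creation time gives direct
control over how many regions, and hence map tasks, you get.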

On Fri, Apr 10, 2009 at 11:02 PM, Rakhi Khatwani
<[email protected]> wrote:

>
>
> ---------- Forwarded message ----------
> From: Vaibhav Puranik <[email protected]>
> Date: Fri, Apr 10, 2009 at 7:43 PM
> Subject: Re: Region Servers going down frequently
> To: [email protected]
>
>
> Rakhi,
>
> Erick Holstad has written an Import Export map reduce job. This job is
> compatible with 0.19.1.
>
> I have tried it myself and it works fine.
> Here is the code -
> https://issues.apache.org/jira/browse/HBASE-974
>
> Regards,
> Vaibhav
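
The HBASE-974 code targets the 0.19-era API. For later readers, a rough
sketch of the same idea against the newer org.apache.hadoop.hbase.mapreduce
API, dumping a full table scan to SequenceFiles, might look like this; the
table name "test" and output path "mybackup" are taken from the usage example
further down, and everything else is an assumption:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class SimpleExport {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "export-test");
        job.setJarByClass(SimpleExport.class);

        // Full scan of the table; the identity mapper re-emits each
        // (row key, Result) pair unchanged.
        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob(
            "test", scan, IdentityTableMapper.class,
            ImmutableBytesWritable.class, Result.class, job);

        // Map-only job writing the raw rows out as SequenceFiles.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path("mybackup"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }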
>
>
> On Thu, Apr 9, 2009 at 3:10 AM, Rakhi Khatwani <[email protected]> wrote:
>
> > Hi,
> > The backup tool is written for an older version, not for 0.19. I tried
> > changing the code and taking a backup and restoring it again, but the
> > restore fails. No data is restored, even though the MapReduce job runs
> > without errors.
> >
> > Any help???
> >
> > On Thu, Apr 9, 2009 at 12:57 PM, Amandeep Khurana <[email protected]>
> > wrote:
> >
> > > When you say column foo: it basically picks up all the columns under
> > > the family foo:. You don't have to give individual column names.
> > >
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
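
In code terms, with a client API newer than 0.19, the difference looks
roughly like this; "foo" and "bar" are placeholder family and qualifier
names:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanExamples {
      // Scanning the family "foo" picks up every column under it, whatever
      // the qualifier is, so individual column names never need listing.
      static Scan wholeFamily() {
        return new Scan().addFamily(Bytes.toBytes("foo"));
      }

      // Only needed when one specific column such as foo:bar is wanted.
      static Scan singleColumn() {
        return new Scan().addColumn(Bytes.toBytes("foo"), Bytes.toBytes("bar"));
      }
    }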
> > >
> > >
> > > On Thu, Apr 9, 2009 at 12:25 AM, Rakhi Khatwani <[email protected]> wrote:
> > >
> > > > Thanks Amandeep.
> > > >
> > > > the usage of the code is
> > > >
> > > > "bin/hadoop com.mahalo.hadoop.hbase.Exporter -output mybackup -table
> > test
> > > > -columns foo:"
> > > >
> > > > but my columns are like, for example, *URL:http://www.yahoo.com/*
> > > > (column name) with 3 (some int value) as the value, and there are
> > > > thousands of rows.
> > > > It's not feasible to use the code from the command prompt. Is there
> > > > another way??
> > > >
> > > >
> > > >
> > > > On Thu, Apr 9, 2009 at 12:33 PM, Amandeep Khurana <[email protected]>
> > > > wrote:
> > > >
> > > > > You can use this... https://issues.apache.org/jira/browse/HBASE-897
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Amandeep Khurana
> > > > > Computer Science Graduate Student
> > > > > University of California, Santa Cruz
> > > > >
> > > > >
> > > > > On Wed, Apr 8, 2009 at 11:59 PM, Rakhi Khatwani <[email protected]> wrote:
> > > > >
> > > > > > Hi Andy,
> > > > > >
> > > > > > I want to back up my HBase data and move to a more powerful
> > > > > > machine. I am trying distcp but it does not back up the hbase
> > > > > > folder properly. When I try restoring the hbase folder I don't get
> > > > > > all the records. Some tables are coming up blank. What could be
> > > > > > the reason??
> > > > > >
> > > > > > On Wed, Apr 8, 2009 at 11:23 PM, Andrew Purtell <[email protected]> wrote:
> > > > > >
> > > > > > >
> > > > > > > I updated the Troubleshooting page on the wiki with a section
> > > > > > > about EC2. Please feel free to extend/enhance/revise.
> > > > > > >
> > > > > > >   - Andy
