As far as I know, FileInputFormat.getSplits() returns a number of
splits computed automatically from the number of files and blocks. BTW,
what version of Hadoop/HBase are you using?

I tested that code
(http://wiki.apache.org/hadoop/Hbase/MapReduce) on my cluster (Hadoop
0.19.1 and HBase 0.19.0). With 2 input paths, the job ran 274 map
tasks.
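
To illustrate how a couple of input paths can turn into hundreds of map
tasks, here is a minimal sketch of the split arithmetic as I understand it
from the old org.apache.hadoop.mapred.FileInputFormat. It is plain Java
with no Hadoop dependency. The class name SplitEstimate, the file sizes,
and the 64 MB block size are assumptions made up for illustration, not
measurements from my cluster, and it assumes non-empty, splittable files.

```java
// Hypothetical helper (not part of Hadoop): approximates how the old
// mapred FileInputFormat turns file lengths into a number of splits.
final class SplitEstimate {
    // A trailing chunk up to 10% larger than splitSize stays in one split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits produced for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last (possibly short) chunk
        }
        return splits;
    }

    // Total splits for a set of files, given the requested number of
    // map tasks (only a hint), the minimum split size, and the block size.
    static int totalSplits(long[] fileLengths, int numSplitsHint,
                           long minSize, long blockSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        long goalSize = totalSize / (numSplitsHint == 0 ? 1 : numSplitsHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int total = 0;
        for (long len : fileLengths) {
            total += countSplits(len, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block

        // Five tiny one-line files (assumed 100 bytes each).
        long[] tiny = {100, 100, 100, 100, 100};
        System.out.println("tiny files: " + totalSplits(tiny, 2, 1, blockSize));

        // Two large files (assumed 8 GB each).
        long gb = 1024L * 1024 * 1024;
        long[] large = {8 * gb, 8 * gb};
        System.out.println("large files: " + totalSplits(large, 2, 1, blockSize));
    }
}
```

With these made-up sizes, the five tiny files give 5 splits (one per file,
since a split never spans files), while the two 8 GB files give 256 splits
because each file is cut at roughly block-size boundaries.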

Below is my changed code for v0.19.0.
---
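  public JobConf createSubmittableJob(String[] args) {
    JobConf c = new JobConf(getConf(), TestImport.class);
    c.setJobName(NAME);
    // args[0]: input path(s) to read from HDFS
    FileInputFormat.setInputPaths(c, args[0]);

    // args[1]: name of the target HBase table, read back by the mapper
    c.set("input.table", args[1]);
    c.setMapperClass(InnerMap.class);
    // Map-only job: no reduce phase
    c.setNumReduceTasks(0);
    // Discard the job's own output
    c.setOutputFormat(NullOutputFormat.class);
    return c;
  }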



On Thu, Apr 23, 2009 at 6:19 PM, nguyenhuynh.mr
<nguyenhuynh...@gmail.com> wrote:
> Edward J. Yoon wrote:
>
>> How do you add input paths?
>>
>> On Wed, Apr 22, 2009 at 5:09 PM, nguyenhuynh.mr
>> <nguyenhuynh...@gmail.com> wrote:
>>
>>> Edward J. Yoon wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> In that case, the atomic unit of a split is a file, so you need to
>>>> increase the number of files, or use TextInputFormat as below.
>>>>
>>>> jobConf.setInputFormat(TextInputFormat.class);
>>>>
>>>> On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr
>>>> <nguyenhuynh...@gmail.com> wrote:
>>>>
>>>>
>>>>> Hi all!
>>>>>
>>>>>
>>>>> I have an MR job that imports content into HBase.
>>>>>
>>>>> The content is text files in HDFS. I use map files to store the local
>>>>> path of each content file.
>>>>>
>>>>> Each content file has its own map file (the map file is a text file in
>>>>> HDFS and contains 1 line of info).
>>>>>
>>>>>
>>>>> I created a maps directory to hold the map files, and this maps
>>>>> directory is the input path for the job.
>>>>>
>>>>> When I run the job, the number of map tasks equals the number of map files.
>>>>> E.g., I have 5 map files -> 5 map tasks.
>>>>>
>>>>> Therefore, the map phase is slow :(
>>>>>
>>>>> Why is the map phase slow when the number of map tasks is large and
>>>>> equal to the number of files?
>>>>>
>>>>> * P.S.: The jobs run on 3 nodes: 1 server and 2 slaves
>>>>>
>>>>> Please help me!
>>>>> Thanks.
>>>>>
>>>>> Best,
>>>>> Nguyen.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>> Currently, I use TextInputFormat as the InputFormat for the map phase.
>>>
>>>
>>
>>
>>
>> Thanks for your help!
> I use FileInputFormat to add input paths.
> Something like:
>    FileInputFormat.setInputPaths(conf, new Path("dir"));
>
> "dir" is a directory that contains the input files.
>
> Best,
> Nguyen
>
>
>



-- 
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org
