Thanks Ted.  I just didn't ask it right.  Here is a stupid 101
question.  I am sure the answer lies in the documentation somewhere,
but I was having some difficulty finding it...

When I do an "ls" on the DFS, I see this:
/user/bear/output/part-00000 <r 4>

I probably got confused about what part-##### means... I thought
part-##### tells how many splits a file has... So far, I have only
seen part-00000.  When will it have part-00001, part-00002, etc?
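
For reference, the quoted reply below says the number of reducers
determines the number of output files, so each part-##### would be one
reducer's output rather than a split of an input file.  With one reducer
there is only part-00000.  In the Hadoop API the reducer count is set
with JobConf.setNumReduceTasks(n).  The sketch below is plain Java with
no Hadoop dependency.  It assumes the default hash partitioning rule
(hash code with the sign bit cleared, modulo the reducer count).  The
class name, method names, and URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PartFileSketch {

    // Assumed partitioning rule: clear the sign bit of the hash code,
    // then take it modulo the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reducer N writes a file named part-NNNNN (five digits, zero padded).
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    // Group keys by the output file they would end up in. In this sketch
    // every reducer gets a file, even if no key hashes to it.
    static Map<String, List<String>> assign(List<String> keys, int numReducers) {
        Map<String, List<String>> files = new TreeMap<>();
        for (int p = 0; p < numReducers; p++) {
            files.put(partFileName(p), new ArrayList<>());
        }
        for (String key : keys) {
            files.get(partFileName(partitionFor(key, numReducers))).add(key);
        }
        return files;
    }

    public static void main(String[] args) {
        // Hypothetical crawl output keys.
        List<String> urls = Arrays.asList(
            "http://example.org/catalog/a",
            "http://example.org/catalog/b",
            "http://example.org/catalog/c",
            "http://example.org/catalog/d");

        // Compare one reducer with three reducers.
        for (int n : new int[] {1, 3}) {
            System.out.println("reducers = " + n);
            for (Map.Entry<String, List<String>> e : assign(urls, n).entrySet()) {
                System.out.println("  " + e.getKey() + " <- " + e.getValue());
            }
        }
    }
}
```

With one reducer every key lands in part-00000, which matches what the
"ls" above shows.  With three reducers the same keys are spread over
part-00000 to part-00002.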



On Jan 16, 2008 11:04 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>
> Parallelizing the processing of data occurs in two steps.  The first is
> during the map phase where the input data file is (hopefully) split across
> multiple tasks.  This should happen transparently most of the time unless
> you have a perverse data format or use unsplittable compression on your
> file.
>
> This parallelism can occur whether you have one input file or many.
>
> The second level of parallelism is at the reduce phase.  You control this by
> setting the number of reducers.  This will also determine the number of
> output files that you get.
>
> Depending on your algorithm, it may help or hurt to have one or many
> reducers.  The recent example of a program to find the 10 largest elements
> is an example that pretty much requires a single reducer.  Other programs
> where the mapper produces huge amounts of output would be better served by
> having many reducers.
>
> This is a general answer since the question is kind of non-specific.
>
>
>
> On 1/16/08 7:59 AM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > How do I make hadoop split its output?  The program I am writing
> > crawls a catalog tree from a single url, so initially the input
> > contains only one entry.  After a few iterations, it will have tens of
> > thousands of urls.  But what I noticed is that the file is always in
> > one block (part-00000).   What I would like is for the job to be
> > parallelized once the number of entries increases.  Currently that
> > doesn't seem to be the case.
>
>



-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
