On Jan 29, 2008 10:50 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> If you drill into the task using the job tracker's web interface, you can
> get to the task's XML configuration.  That configuration will have the input
> file split specification in it.
>
> You may also be able to see the input file elsewhere, but the XML
> configuration is definitive.

In the task XML configuration I see only the split file name
(the 'mapred.job.split.file' property,
which has a value like
'/disk3/nutch/data/filesystem/mapreduce/system/job_200801212103_0067/job.split'),
but not the original input file name. Is there any way to get more
information about the splits?
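
For reference, here is a minimal sketch of how I am reading that property
out of a saved copy of the task configuration. It assumes the usual
<configuration>/<property>/<name>/<value> layout of Hadoop configuration
files. SAMPLE_CONF and read_property are illustrative names of my own, and
the sample holds only the one property quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample: a real task configuration has many more
# properties. The value is the one quoted in this message.
SAMPLE_CONF = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.split.file</name>
    <value>/disk3/nutch/data/filesystem/mapreduce/system/job_200801212103_0067/job.split</value>
  </property>
</configuration>
"""


def read_property(conf_xml, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    # Shows the split file path, which is all the configuration gives me.
    print(read_property(SAMPLE_CONF, "mapred.job.split.file"))
```

The lookup returns only the path of the job.split file, which is why I am
asking where the original input file name can be found.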

Also, I was under the impression that gzip input files are
not split. Does that mean that
even though they are not split, copies of them are made anyway? That
could be a potential optimization point.
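
My understanding of why gzip files are not split, as a small sketch: a gzip
stream can be decoded from its first byte but not from an arbitrary offset
in the middle. This uses only the Python standard library, not Hadoop, and
can_decompress_from is a helper name of my own.

```python
import gzip
import zlib


def can_decompress_from(blob, offset):
    """Return True if blob[offset:] decodes as a complete gzip stream."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header, truncated stream, or corrupt deflate data.
        return False


# Deterministic sample payload, compressed as a single gzip member.
payload = b"".join(b"record %d\n" % i for i in range(10000))
blob = gzip.compress(payload)

if __name__ == "__main__":
    print(can_decompress_from(blob, 0))               # from the start
    print(can_decompress_from(blob, len(blob) // 2))  # from the middle
```

A reader handed only the second half of the file has no usable starting
point, so the whole file has to go to a single map task.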

Vadim
