Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Todd Lipcon
Hi Marc,

LocalDirAllocator is an internal-facing API and you shouldn't be using it
from user code. If you write into mapred.local.dir like this, you will end
up with conflicts between different tasks running on the same node.

The working directory of your MR task is already within one of the drives,
and there isn't usually a good reason to write to multiple drives from
within a task - you should get parallelism by running multiple tasks at the
same time, not by having each task write to multiple places.
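
That said, if a task does need a local scratch file, relative paths are
usually enough: they resolve inside the task's working directory, which the
TaskTracker has already placed under one of the mapred.local.dir drives. A
minimal, untested sketch (the class and file names here are made up):

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ScratchExample {
  // Write a scratch file using a relative path. Inside a running task the
  // current working directory already lives under one of the local drives,
  // so no LocalDirAllocator is needed.
  public static File writeScratch(String name, String data) throws IOException {
    File scratch = new File(name);  // relative to the task's working dir
    FileWriter out = new FileWriter(scratch);
    try {
      out.write(data);
    } finally {
      out.close();
    }
    return scratch;
  }
}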

Thanks
-Todd

On Wed, Jan 5, 2011 at 8:35 AM, Marc Sturlese wrote:

>
> I have a question about how this works. The API documentation says that
> the class LocalDirAllocator is "An implementation of a round-robin scheme
> for disk allocation for creating files".
> I am wondering: is the disk allocation done in the constructor?
> Let's say I have a cluster of just 1 node and 4 disks and I do inside a
> reducer:
> LocalDirAllocator localDirAlloc = new
> LocalDirAllocator("mapred.local.dir");
> Path pathA = localDirAlloc.getLocalPathForWrite("a") ;
> Path pathB = localDirAlloc.getLocalPathForWrite("b") ;
>
> Will the local paths pathA and pathB be on the same local disk for sure,
> because the disk was allocated by new LocalDirAllocator("mapred.local.dir")?
> Or is it getLocalPathForWrite that picks the disk, so the two paths might
> not be on the same disk (as I have 4 disks)?
>
> Thanks in advance



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Marc Sturlese

Hey Todd,

>>LocalDirAllocator is an internal-facing API and you shouldn't be using it
>>from user code. If you write into mapred.local.dir like this, you will end
>>up with conflicts between different tasks running on the same node

I know it's a bit of an odd usage, but the thing is that I need to create
files in the local file system, work with them there, and after that upload
them to HDFS (I use the OutputCommitter). To avoid the conflicts you
mention, I create a folder that looks like "mapred.local.dir"/taskId/attemptId,
work in there, and apparently I am having no problems.
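
Roughly, what I do looks like this (an untested, simplified sketch: it
assumes the old mapred API, takes the first mapred.local.dir entry, and
reads the attempt id from the mapred.task.id property):

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;

public class AttemptScratch {
  // Build a per-attempt scratch directory under the first local dir so
  // that concurrent attempts on the same node cannot collide.
  public static File create(JobConf conf) throws IOException {
    String localDir = conf.get("mapred.local.dir").split(",")[0].trim();
    String attemptId = conf.get("mapred.task.id"); // unique per attempt
    File dir = new File(localDir, attemptId);
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Could not create " + dir);
    }
    return dir;
  }
}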

>>and there isn't usually a good reason to write to multiple drives from
>>within a task

When I said I had a cluster of one node, it was just to clarify my question
and explain the example. My cluster is actually bigger than that, and each
node has more than one physical disk. Running multiple tasks at the same
time is what I already do. What I would like is for each task to write to
just a single local disk, but I don't know how to do that.

>>The working directory of your MR task is already within one of the drives,

Is there a way to get a working directory on the local disk from the
reducer?
Could I do something like this?
LocalFileSystem localFs = FileSystem.getLocal(conf);
Path path = localFs.getWorkingDirectory();
I would appreciate it if you could tell me a bit more about this. I need to
work with these files only locally, and want them copied to HDFS only once
I have finished working with them.

Thanks in advance.


Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Todd Lipcon
Hi Marc,

Yes, using LocalFileSystem would work fine, or you can just use the normal
java.io.File APIs.
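
For instance, something along these lines (an untested sketch; the paths
are placeholders) builds the file on the local disk and then promotes it to
HDFS once it is finished:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class LocalThenHdfs {
  // Write on the local disk first, then copy the finished file into HDFS.
  public static void promote(Configuration conf) throws Exception {
    LocalFileSystem localFs = FileSystem.getLocal(conf);
    FileSystem hdfs = FileSystem.get(conf);

    Path local = new Path("scratch-part");           // relative: task cwd
    Path remote = new Path("/user/marc/out/part-0"); // placeholder path

    FSDataOutputStream out = localFs.create(local);  // stand-in for the
    out.writeBytes("finished result data");          // real local work
    out.close();

    // true = delete the local copy after it has been copied into HDFS.
    hdfs.copyFromLocalFile(true, local, remote);
  }
}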

-Todd

On Wed, Jan 5, 2011 at 3:26 PM, Marc Sturlese wrote:

> [...]
> Is there a way to get a working directory on the local disk from the
> reducer?
> Could I do something like this?
> LocalFileSystem localFs = FileSystem.getLocal(conf);
> Path path = localFs.getWorkingDirectory();
> [...]



-- 
Todd Lipcon
Software Engineer, Cloudera