LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Marc Sturlese

I have a doubt about how this works. The API documentation says that the
class LocalDirAllocator is "an implementation of a round-robin scheme for
disk allocation for creating files".
I am wondering: is the disk allocation done in the constructor?
Let's say I have a cluster of just one node with four disks, and inside a
reducer I do:

LocalDirAllocator localDirAlloc = new LocalDirAllocator("mapred.local.dir");
Path pathA = localDirAlloc.getLocalPathForWrite("a", conf);
Path pathB = localDirAlloc.getLocalPathForWrite("b", conf);

Will pathA and pathB always be on the same local disk, because the disk was
allocated by new LocalDirAllocator("mapred.local.dir"), or is it
getLocalPathForWrite that picks the disk, so the two paths might not be on
the same disk (since I have four disks)?
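For context, "round-robin" here means the allocator cycles through the configured directories on successive calls, so each call may pick a different disk. A simplified sketch of the idea in plain Java (illustrative only; this is not Hadoop's actual LocalDirAllocator, which also checks free space and skips bad disks, and the class name here is made up):

```java
// Illustrative sketch of a round-robin directory allocator; NOT Hadoop's
// LocalDirAllocator, which additionally checks capacity and disk health.
public class RoundRobinSketch {
    private final String[] dirs;
    private int next = 0;

    public RoundRobinSketch(String[] dirs) {
        this.dirs = dirs;
    }

    // Each call returns a path under the *next* directory in the rotation,
    // so consecutive calls can land on different disks.
    public String getLocalPathForWrite(String name) {
        String dir = dirs[next];
        next = (next + 1) % dirs.length;
        return dir + "/" + name;
    }

    public static void main(String[] args) {
        RoundRobinSketch alloc = new RoundRobinSketch(
                new String[]{"/disk1/mapred/local", "/disk2/mapred/local"});
        System.out.println(alloc.getLocalPathForWrite("a"));
        System.out.println(alloc.getLocalPathForWrite("b"));
    }
}
```

In a scheme like this the disk is chosen per call, not in the constructor.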

Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2199517.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Todd Lipcon
Hi Marc,

LocalDirAllocator is an internal-facing API and you shouldn't be using it
from user code. If you write into mapred.local.dir like this, you will end
up with conflicts between different tasks running on the same node.

The working directory of your MR task is already within one of the drives,
and there isn't usually a good reason to write to multiple drives from
within a task - you should get parallelism by running multiple tasks at the
same time, not by having each task write to multiple places.
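As a sketch of what that means in practice (plain Java; inside a real pre-YARN MR task the current directory is an attempt-private directory that the TaskTracker created under one of the mapred.local.dir entries):

```java
import java.io.File;
import java.io.IOException;

public class WorkingDirSketch {
    public static void main(String[] args) throws IOException {
        // "." resolves to the process's working directory; in a running MR
        // task this already sits on one of the configured local disks.
        File cwd = new File(".").getCanonicalFile();
        System.out.println("working dir: " + cwd.getPath());

        // Relative paths land on that same disk, isolated per attempt:
        File scratch = new File("scratch.tmp");
        System.out.println("created: " + scratch.createNewFile());
        scratch.delete();
    }
}
```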

Thanks
-Todd

On Wed, Jan 5, 2011 at 8:35 AM, Marc Sturlese marc.sturl...@gmail.com wrote:



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Marc Sturlese

Hey Todd,

> LocalDirAllocator is an internal-facing API and you shouldn't be using it
> from user code. If you write into mapred.local.dir like this, you will end
> up with conflicts between different tasks running from the same node

I know it's a bit of an odd usage, but the thing is that I need to create
files in the local file system, work with them there, and after that upload
them to HDFS (I use the OutputCommitter). To avoid the conflicts you mention,
I create a folder that looks like mapred.local.dir/taskId/attemptId, I work
there, and apparently I am having no problems.

> and there isn't usually a good reason to write to multiple drives from
> within a task

When I said I had a cluster of one node, it was just to clarify my doubt and
explain the example. My cluster is actually bigger than that, and each node
has more than one physical disk. Running multiple tasks at the same time is
what I already do. I would like each task to write to just a single local
disk, but I don't know how to do it.

> The working directory of your MR task is already within one of the drives,

Is there a way to get a working directory on the local disk from the
reducer?
Could I do something similar to:

LocalFileSystem localFs = FileSystem.getLocal(conf);
Path path = localFs.getWorkingDirectory();

I would appreciate it if you could tell me a bit more about this. I need to
deal with these files only locally, and want them copied to HDFS only when I
finish working with them.

Thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2202221.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: LocalDirAllocator and getLocalPathForWrite

2011-01-05 Thread Todd Lipcon
Hi Marc,

Yes, using LocalFileSystem would work fine, or you can just use the normal
java.io.File APIs.
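A minimal sketch of that pattern with plain java.io (the file names are placeholders; the HDFS upload is left as a comment because it needs the job's Configuration, and the destination path is up to you):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class LocalScratchSketch {
    // Write a scratch file with plain java.io; a relative path keeps it on
    // whatever local disk the framework chose for this task attempt.
    public static File writeScratch(String name, String contents) throws IOException {
        File f = new File(name);
        FileWriter w = new FileWriter(f);
        try {
            w.write(contents);
        } finally {
            w.close();
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = writeScratch("scratch-part.tmp", "intermediate data");
        System.out.println("wrote " + f.getAbsolutePath());
        // When finished, upload to HDFS and clean up, e.g.:
        //   FileSystem fs = FileSystem.get(conf);
        //   fs.copyFromLocalFile(true, new Path(f.getPath()), destPath);
        f.delete();
    }
}
```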

-Todd

On Wed, Jan 5, 2011 at 3:26 PM, Marc Sturlese marc.sturl...@gmail.com wrote:






-- 
Todd Lipcon
Software Engineer, Cloudera