LocalDirAllocator and getLocalPathForWrite
I have a question about how this works. The API documentation describes the class LocalDirAllocator as:

    An implementation of a round-robin scheme for disk allocation for creating files

Is the disk allocation done in the constructor? Say I have a cluster of just 1 node with 4 disks, and inside a reducer I do:

    LocalDirAllocator localDirAlloc = new LocalDirAllocator("mapred.local.dir");
    Path pathA = localDirAlloc.getLocalPathForWrite(a);
    Path pathB = localDirAlloc.getLocalPathForWrite(b);

Will the local paths pathA and pathB always be on the same local disk, because the disk was allocated by new LocalDirAllocator("mapred.local.dir")? Or is it getLocalPathForWrite that picks the disk, so that the two paths might end up on different disks (as I have 4 disks)?

Thanks in advance

--
View this message in context: http://lucene.472066.n3.nabble.com/LocalDirAllocator-and-getLocalPathForWrite-tp2199517p2199517.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
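To make the two possibilities in the question concrete, here is a toy model in plain Java. It is not Hadoop's code: the class RoundRobinDirs, the method nextDirForWrite and the /diskN paths are all hypothetical names made up for illustration. In this sketch the constructor only records the candidate directories, and the choice is made on each call, so two consecutive calls return different directories. Whether the real LocalDirAllocator behaves this way should be checked against the Hadoop source for the version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; NOT Hadoop's LocalDirAllocator. All names are hypothetical. */
final class RoundRobinDirs {
    private final List<String> dirs;
    private int next = 0; // index of the directory the next call will return

    RoundRobinDirs(List<String> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("need at least one directory");
        }
        // The constructor only records the candidates; nothing is chosen yet.
        this.dirs = new ArrayList<>(dirs);
    }

    /** Picks a directory per call, advancing a round-robin cursor. */
    synchronized String nextDirForWrite() {
        String chosen = dirs.get(next);
        next = (next + 1) % dirs.size();
        return chosen;
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        RoundRobinDirs alloc =
            new RoundRobinDirs(Arrays.asList("/disk1", "/disk2", "/disk3", "/disk4"));
        String a = alloc.nextDirForWrite();
        String b = alloc.nextDirForWrite();
        // In this toy model the two results come from different directories.
        System.out.println(a + " " + b);
    }
}
```

If the choice were made in the constructor instead, every call would return the same directory; the sketch only shows what the per-call alternative looks like.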
Re: LocalDirAllocator and getLocalPathForWrite
Hi Marc,

LocalDirAllocator is an internal-facing API and you shouldn't be using it from user code. If you write into mapred.local.dir like this, you will end up with conflicts between different tasks running from the same node.

The working directory of your MR task is already within one of the drives, and there isn't usually a good reason to write to multiple drives from within a task - you should get parallelism by running multiple tasks at the same time, not by having each task write to multiple places.

Thanks
-Todd

On Wed, Jan 5, 2011 at 8:35 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
> Is the disk allocation done in the constructor?

--
Todd Lipcon
Software Engineer, Cloudera
Re: LocalDirAllocator and getLocalPathForWrite
Hey Todd,

> LocalDirAllocator is an internal-facing API and you shouldn't be using it from user code. If you write into mapred.local.dir like this, you will end up with conflicts between different tasks running from the same node

I know it's a somewhat odd usage, but I need to create files in the local file system, work with them there, and afterwards upload them to HDFS (I use the OutputCommitter). To avoid the conflicts you mention, I create a folder that looks like mapred.local.dir/taskId/attemptId and work there, and apparently I am having no problems.

> and there isn't usually a good reason to write to multiple drives from within a task

When I said I had a cluster of one node, it was just to clarify my question and explain the example. My cluster is actually bigger than that, and each node has more than 1 physical disk. Running multiple tasks at the same time is what I already do. I would like each task to write to just a single local disk, but I don't know how to do it.

> The working directory of your MR task is already within one of the drives,

Is there a way to get a working directory on the local disk from the reducer? Could I do something similar to:

    LocalFileSystem localFs = FileSystem.getLocal(conf);
    Path path = localFs.getWorkingDirectory();

I would appreciate it if you could tell me a bit more about this. I need to work with these files only locally and want them copied to HDFS only once I have finished with them.

Thanks in advance.
Re: LocalDirAllocator and getLocalPathForWrite
Hi Marc,

Yes, using LocalFileSystem would work fine, or you can just use the normal java.io.File APIs.

-Todd

On Wed, Jan 5, 2011 at 3:26 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
> Is there a way to get a working directory on the local disk from the reducer?
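As a rough illustration of the plain-Java route mentioned above, the sketch below creates a scratch directory under the process working directory, writes a file there, and cleans up afterwards. It uses java.nio.file from the standard library rather than any Hadoop class. The class name TaskScratch, the directory name attempt_example_0 and the file name part.tmp are hypothetical, and the sketch assumes the JVM's current directory is the task's working directory, as described earlier in the thread. The copy to HDFS is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper for per-task scratch space; not part of Hadoop. */
final class TaskScratch {
    /** Creates (if needed) and returns the directory cwd/name. */
    static Path create(Path cwd, String name) throws IOException {
        return Files.createDirectories(cwd.resolve(name));
    }

    /** Deletes root and everything below it, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the current directory is the task's working directory.
        Path cwd = Paths.get("").toAbsolutePath();
        Path scratch = create(cwd, "attempt_example_0");

        Path tmp = scratch.resolve("part.tmp");
        Files.write(tmp, "intermediate data".getBytes(StandardCharsets.UTF_8));

        // ... work on the local file here, then copy the result to HDFS (not shown) ...

        deleteRecursively(scratch);
    }
}
```

Because everything stays under a single directory, all of the task's scratch files end up on whichever drive holds that directory.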