No. I was referring to the fact that the locality of any given large file is about the same from any node, because the file has many blocks and they get scattered all over the cluster.
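As a rough illustration of that claim, here is a toy simulation (not Hadoop code). It assumes each block's replicas land on distinct nodes chosen uniformly at random, which simplifies the real rack-aware HDFS placement policy. The function name, node count, and block count are all made up for the example.

```python
import random


def local_fraction(n_nodes, n_blocks, replication=3, seed=0):
    """Return, for each node, the fraction of a file's blocks it holds locally.

    Assumption: every block's replicas go to distinct nodes picked
    uniformly at random (a simplification of real HDFS placement).
    """
    rng = random.Random(seed)
    local = [0] * n_nodes
    for _ in range(n_blocks):
        # Pick `replication` distinct nodes to hold this block.
        for node in rng.sample(range(n_nodes), replication):
            local[node] += 1
    return [count / n_blocks for count in local]


if __name__ == "__main__":
    big = local_fraction(n_nodes=20, n_blocks=2000)
    print("large file: min %.3f, max %.3f" % (min(big), max(big)))

    one = local_fraction(n_nodes=20, n_blocks=1)
    print("one block: nodes with a local copy =", sum(f > 0 for f in one))
```

With 20 nodes and 3 replicas, each node should end up holding roughly 3/20 of a many-block file, so no node is a noticeably better place to read from than any other. With a single block, only the nodes that happen to hold a replica have anything local at all.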
Only with a small file is this rough, random symmetry broken. For files smaller than one block, there are typically three privileged nodes from the standpoint of locality: the ones holding the block's replicas.

Another approach would be to set the replication on the SGD input somewhat higher than normal. This would give many nodes lots of local blocks, but it wouldn't change the fact that the nodes are all, on average, the same with respect to locality.

On Thu, Jan 28, 2010 at 3:17 PM, Jake Mannix <[email protected]> wrote:

> > If you are running SGD on a single node, just open the HDFS files
> > directly. You won't have significant benefit to locality unless the
> > files are relatively small.
>
> You mean relatively large, right?

--
Ted Dunning, CTO
DeepDyve
