Sorry for the delay. 1) This works similarly to how Hadoop distributes keys to the reducers: there is a HashPartitioner that rewrites the vertices into n files, where n is the number of tasks. 2) The block size doesn't matter in this case, because a FileSplit will be associated with each of the partitioned files.
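
To illustrate the idea, here is a minimal, self-contained sketch (not the actual Hama HashPartitioner; the class and file names are made up for the example): each vertex key is hashed modulo the number of tasks and written to the corresponding part file, so task i only ever gets its own partition when the splits are handed out.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.util.ArrayList;
    import java.util.List;

    public class HashPartitionSketch {

        // Partition index = hash of the key modulo the number of tasks.
        // Masking the sign bit keeps the result non-negative.
        static int partitionFor(Object key, int numTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
        }

        public static void main(String[] args) throws IOException {
            int numTasks = 4;                    // n = number of BSP tasks
            int[] vertexIds = {0, 1, 5, 9, 14};  // toy input

            // One output file per task: part0 ... part(n-1).
            // Task i is later fed the split for part<i>, so there is no mix.
            List<Writer> parts = new ArrayList<>();
            for (int i = 0; i < numTasks; i++) {
                parts.add(new FileWriter("part" + i));
            }
            for (int id : vertexIds) {
                parts.get(partitionFor(id, numTasks)).write(id + "\n");
            }
            for (Writer w : parts) {
                w.close();
            }
        }
    }

This is only a sketch of the hash-partitioning step; in Hama the partitioned files live on HDFS and each one becomes its own FileSplit, which is why the HDFS block size does not change the assignment.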
On 19 April 2012 at 03:01, Praveen Sripati <[email protected]> wrote:

> 1. Let's say the input is partitioned into part0, part1, part2, part3 and
> part4. How is it ensured that bsp0 processes part0, bsp1 processes part1
> and so on and there is no mix? We don't want bsp0 to process part1.
>
>     private void send(BSPPeerProtocol peer, BSPMessage msg) throws IOException {
>         int mod = ((Integer) msg.getTag()) % peer.getAllPeerNames().length;
>         peer.send(peer.getAllPeerNames()[mod], msg);
>     }
>
> 2. If the partition file size is more than the HDFS block size and more than
> one BSP task processes a single partition, how is this scenario handled?
>
> Thanks,
> Praveen

--
Thomas Jungblut
Berlin <[email protected]>
