Hello everyone, I have a couple of questions. Can anyone please point me in the right direction?
What exactly happens when I copy a 1 TB file to a Hadoop cluster using the copyFromLocal command?

1) What will the split size be? Will it be the same as the block size?
2) What is a block, and what is a split? If we have a 100 MB file and a block size of 64 MB, it will be divided into 2 blocks of 64 MB and 36 MB. The second block still has 28 MB of space left. What happens to that free space? Will the cluster have unequal block sizes, or will the space be occupied by another file?
3) Let's say a 64 MB block is on node A and replicated on 2 other nodes (B, C), and the input split size for the MapReduce program is 64 MB. Will this split have only the location of node A, or will it have locations for all three nodes A, B, and C?
4) How is it handled if the input split size is greater or smaller than the block size?

Can anyone please help?

Thanks,
SP
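P.S. To make the arithmetic in question 2 concrete, here is a small Python sketch of how I picture the file being cut into blocks. This is only my mental model, not Hadoop's actual code, and `block_sizes` is a name I made up for illustration. It assumes sizes in whole MB and 1 TB = 1024 * 1024 MB.

```python
def block_sizes(file_size_mb, block_size_mb):
    """Return the sizes (in MB) of the pieces a file is cut into.

    Hypothetical helper for illustration only; it mirrors the
    arithmetic in the question, not Hadoop's implementation.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Number of completely filled blocks, plus whatever is left over.
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last, partially filled block
    return sizes


# The 100 MB example from question 2, with a 64 MB block size.
sizes = block_sizes(100, 64)
print(sizes)           # [64, 36]
print(64 - sizes[-1])  # 28, the "free space" I am asking about

# The 1 TB file from the opening question, same block size.
print(len(block_sizes(1024 * 1024, 64)))  # 16384
```

So under this model the 1 TB file would be 16384 blocks of 64 MB, and the 100 MB file ends with a 36 MB block. What I cannot tell is whether those 28 MB are really reserved on disk or not.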