The file splitter and block reader combination allows multiple partitions to read files in parallel by dividing the files into blocks. Does anyone have ideas on how to make the block readers data-local to the blocks they are reading?
I think we would need to spawn block readers on all nodes where the blocks are present, and then route each block's meta information to the appropriate block reader. If the readers are reading multiple files, this could mean every node in the cluster. Thanks
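
P.S. To make the routing idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions, not an existing API: `route_blocks`, the `(block_id, replica_hosts)` tuples, and the host names are all hypothetical, and I am assuming the splitter can learn which hosts hold each block (for example from the file system's block location metadata). Each block goes to a reader on a host that holds a replica, preferring the least-loaded one, and falls back to any reader when no local reader exists.

```python
def route_blocks(blocks, reader_hosts):
    """Assign each block to a block reader, preferring data-local readers.

    blocks:       list of (block_id, replica_hosts) tuples (hypothetical shape)
    reader_hosts: hosts on which a block reader partition is running
    Returns a dict mapping block_id -> host of the reader chosen for it.
    """
    if not reader_hosts:
        raise ValueError("at least one block reader host is required")

    # Number of blocks already assigned to the reader on each host.
    load = {host: 0 for host in reader_hosts}
    assignment = {}

    for block_id, replica_hosts in blocks:
        # Readers that sit on a host holding a replica of this block.
        local = [host for host in replica_hosts if host in load]
        # No local reader: fall back to any reader (a remote read).
        candidates = local if local else list(load)
        # Least-loaded candidate; host name breaks ties deterministically.
        target = min(candidates, key=lambda host: (load[host], host))
        assignment[block_id] = target
        load[target] += 1

    return assignment
```

With readers on only some of the nodes, blocks whose replicas all live elsewhere would still be read remotely, which is why I suspect readers may be needed on every node that holds a block.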
