I have many large files, ranging from 2 GB to 800 GB, and I use hadoop fs -cat a lot to pipe their contents to various programs.
I was wondering if it's possible to prefetch the data for clients with more bandwidth. Most of my clients have 10G interfaces, while the datanodes are on 1G. The idea: prefetch x blocks ahead (even though it costs extra memory) while block y is being read; once block y has been consumed, read the next prefetched block and then throw it away. It would be used like this:

  export PREFETCH_BLOCKS=2   # default would be 1
  hadoop fs -pcat hdfs://namenode/verylargefile | program

Any thoughts?
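Just to make the idea concrete, here is a rough sketch of what such a prefetching cat could look like on top of the standard FileSystem API: a background thread reads ahead up to PREFETCH_BLOCKS block-sized chunks into a bounded queue while the main thread streams the current chunk to stdout. The class name PrefetchCat and the chunking details are purely illustrative; there is no -pcat in the stock hadoop fs today, this is just what I have in mind.

  // Illustrative sketch only, not an existing Hadoop tool.
  import java.io.OutputStream;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PrefetchCat {
      public static void main(String[] args) throws Exception {
          Path path = new Path(args[0]);
          int prefetchBlocks = Integer.parseInt(
                  System.getenv().getOrDefault("PREFETCH_BLOCKS", "1"));

          Configuration conf = new Configuration();
          FileSystem fs = path.getFileSystem(conf);
          // Assumes the block size fits in an int (true for typical 128/256 MB blocks).
          long blockSize = fs.getFileStatus(path).getBlockSize();

          // Bounded queue: at most PREFETCH_BLOCKS chunks are held in memory.
          BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(prefetchBlocks);
          byte[] EOF = new byte[0];   // sentinel marking end of stream

          Thread reader = new Thread(() -> {
              try (FSDataInputStream in = fs.open(path)) {
                  while (true) {
                      byte[] buf = new byte[(int) blockSize];
                      int off = 0;
                      while (off < buf.length) {
                          int n = in.read(buf, off, buf.length - off);
                          if (n < 0) break;
                          off += n;
                      }
                      if (off == 0) break;                 // nothing left to read
                      byte[] chunk = (off == buf.length) ? buf
                              : java.util.Arrays.copyOf(buf, off);
                      queue.put(chunk);                    // blocks once the queue is full
                      if (off < buf.length) break;         // short read => last chunk
                  }
                  queue.put(EOF);
              } catch (Exception e) {
                  e.printStackTrace();
                  try { queue.put(EOF); } catch (InterruptedException ignored) {}
              }
          });
          reader.start();

          OutputStream out = System.out;
          while (true) {
              byte[] chunk = queue.take();
              if (chunk == EOF) break;
              out.write(chunk);   // write block y while the reader fetches ahead
          }
          out.flush();
          reader.join();
      }
  }

With this scheme the extra memory is capped at roughly PREFETCH_BLOCKS times the block size, and the reader thread keeps pulling from the (slower) 1G datanodes while the client side drains the queue at full speed.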