Hey Per,

Your performance will most likely be limited by your connection to HDFS and by replication. If you are connected via a 1 Gbps LAN and have 3-fold replication, then you can write at most about 1/3 Gbps (roughly 40 MB/s) to HDFS. (Note: if you write a great many small HDFS files, then of course everything will be horribly slow anyway.) I once had to do something like this (writing the files in a tar archive to a sequence file), and Java was never the bottleneck. Or do you have a massively faster connection to HDFS?
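
In case it helps, here is a minimal sketch of the plain-streams approach, using only java.util.zip from the JDK. It is an illustration under assumptions, not a tested HDFS client: the class name StreamingUnzip and the Sink interface are hypothetical names made up for this example. On a real cluster the sink would presumably wrap Hadoop's FileSystem.create(Path), which returns an OutputStream; that wiring is left out so the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class StreamingUnzip {

    /**
     * Hypothetical abstraction (not part of any library): opens the
     * destination stream for one archive entry. With Hadoop this could be
     * something like name -> fs.create(new Path(targetDir, name)).
     */
    public interface Sink {
        OutputStream open(String entryName) throws IOException;
    }

    /**
     * Streams every file entry of the zip archive into the sink and
     * returns the number of files written. Nothing is buffered beyond
     * one 64 KB block, so memory use is independent of archive size.
     */
    public static int unzip(InputStream zipSource, Sink sink) throws IOException {
        int files = 0;
        byte[] buffer = new byte[64 * 1024];
        try (ZipInputStream zip = new ZipInputStream(zipSource)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                // Note: entry names come from the archive and are not
                // validated here; a real sink should reject names that
                // escape the target folder (e.g. containing "..").
                try (OutputStream out = sink.open(entry.getName())) {
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                files++;
            }
        }
        return files;
    }
}
```

Since each entry becomes one HDFS file, an archive with many small entries would run into the small-files problem mentioned above; in that case packing the entries into a sequence file is probably the better target.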

best,
Stephan

On 30.08.2011 10:19, Per Steffensen wrote:
Hi

I want to unzip a file that lives on an external filesystem (external to HDFS) into HDFS, so that the unzipped files end up in some folder on HDFS. This needs to be as efficient as possible, so if it is done in Java code it probably needs to involve java.nio.channels or something else that works directly with I/O resources. Can anyone point me to the best/easiest/most efficient way to do this? I would like to at least be able to invoke/initiate the unzip process from Java code, but I guess I can invoke anything from Java, so that is not much of a requirement.

Regards, Per Steffensen
