Hey Per,
Your performance will most likely be limited by your connection to HDFS
and by replication. If you are connected via a 1 Gbps LAN and have 3-fold
replication, then you can write at most 1/3 Gbps to HDFS. (Note: if
you write many, many small HDFS files, everything will of course be
horribly slow anyway.) I had to do something like this once (writing the
files in a tar archive to a sequence file), and Java was never the
bottleneck. Or do you have a much faster connection to HDFS?
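To make the 1/3 Gbps estimate concrete: if, as assumed here, all three replica copies share the same 1 Gbps link, the ceiling is roughly 42 MB/s. A tiny helper that does the unit conversion (the class and method names are hypothetical, purely for illustration):

```java
import java.util.Locale;

public class HdfsWriteCeiling {

    /**
     * Rough upper bound on HDFS write throughput in decimal MB/s, assuming (as in the
     * estimate above) that every replica copy travels over the same network link.
     */
    static double ceilingMBps(double linkGbps, int replication) {
        double bitsPerSecond = linkGbps * 1e9 / replication; // link bandwidth split across replicas
        return bitsPerSecond / 8 / 1e6;                      // bits -> bytes -> megabytes
    }

    public static void main(String[] args) {
        // 1 Gbps LAN with 3-fold replication
        System.out.println(String.format(Locale.ROOT, "%.1f MB/s", ceilingMBps(1.0, 3))); // prints "41.7 MB/s"
    }
}
```

If the real cluster pipelines replicas over separate links, the bound is looser; the helper only encodes the single-link assumption.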
best,
Stephan
On 30.08.2011 10:19, Per Steffensen wrote:
Hi
I want to unzip a file that lives on an external (non-HDFS)
filesystem into HDFS, so that the unzipped files end up in some
folder on HDFS. This needs to be as efficient as possible, so
e.g. if it is done in Java code it probably needs to involve
java.nio.channels or something else that works directly with I/O
resources. Can anyone point me to the best/easiest/most efficient way
to do this? I would like to at least be able to invoke/initiate the
unzip process from Java code, but I guess I can invoke anything from
Java, so that is not much of a requirement.
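One way to do the unzip is a single streaming pass: read the archive once and copy each entry straight into its own destination file, without staging anything on local disk. The sketch below assumes the archive is a zip and uses only java.util.zip. `SinkFactory`, `unzip`, `zipOf` and `unzipToMemory` are hypothetical names; against a real cluster the sink would presumably be backed by Hadoop's `FileSystem.create(new Path(targetDir, entryName))`, with the source being an InputStream on the external file. The in-memory helpers only exist so the sketch runs without HDFS.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipToHdfsSketch {

    /** Hypothetical hook that opens the destination for one entry (e.g. an HDFS file). */
    interface SinkFactory {
        OutputStream open(String entryName) throws IOException;
    }

    /** Streams each file entry of the archive to its own sink; returns how many files were written. */
    static int unzip(InputStream zipSource, SinkFactory sinks) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // one reusable copy buffer for all entries
        int files = 0;
        try (ZipInputStream zin = new ZipInputStream(new BufferedInputStream(zipSource))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories are implied by the file paths
                }
                for (String part : entry.getName().split("/")) {
                    if (part.equals("..")) { // refuse entries that would escape the target folder
                        throw new IOException("Unsafe entry name: " + entry.getName());
                    }
                }
                try (OutputStream out = sinks.open(entry.getName())) {
                    int n;
                    while ((n = zin.read(buffer)) != -1) { // -1 marks the end of the current entry
                        out.write(buffer, 0, n);
                    }
                }
                files++;
            }
        }
        return files;
    }

    /** Test helper: builds a zip archive in memory from name -> text pairs. */
    static byte[] zipOf(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zout.putNextEntry(new ZipEntry(f.getKey()));
                zout.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Test helper: unzips into memory instead of HDFS. */
    static Map<String, String> unzipToMemory(byte[] zip) throws IOException {
        Map<String, ByteArrayOutputStream> sinks = new TreeMap<>();
        unzip(new ByteArrayInputStream(zip), name -> {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            sinks.put(name, buf);
            return buf;
        });
        Map<String, String> result = new TreeMap<>();
        sinks.forEach((name, buf) -> result.put(name, new String(buf.toByteArray(), StandardCharsets.UTF_8)));
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new TreeMap<>();
        files.put("logs/a.txt", "hello");
        files.put("logs/b.txt", "world");
        System.out.println(unzipToMemory(zipOf(files))); // {logs/a.txt=hello, logs/b.txt=world}
    }
}
```

The copy loop is plain buffered stream I/O rather than java.nio.channels; per the reply above, the HDFS link and replication, not the Java side, are the likely bottleneck.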
Regards, Per Steffensen