Ananth T. Sarathy wrote:
I am trying to download binary files stored in Hadoop, but there is roughly a 2-minute wait on a 20 MB file when I try to execute in.read(buf).

What does this mean: does it take 2 minutes to pipe the whole 20 MB, or did one of your in.read() calls take 2 minutes? Your code actually measures the time for both the read and the write.

There is nothing in FSInputStream that would cause this slowdown. Do you think anyone would use Hadoop otherwise? Reading should be about as fast as the underlying filesystem allows.
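One way to answer that question is to time read() and write() separately. Here is a minimal sketch, assuming plain java.io streams. The class TimedPipe, its Stats holder, and the 64 KB buffer used in main() are hypothetical names and values for illustration. A ByteArrayInputStream stands in for the FSDataInputStream returned by fs.open().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TimedPipe {
    /** Totals gathered while piping: bytes copied plus time spent inside read() and write(). */
    public static final class Stats {
        public long bytes;
        public long readNanos;
        public long writeNanos;
    }

    /** Copies in to out, accumulating read-side and write-side time separately. */
    public static Stats pipe(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        Stats stats = new Stats();
        while (true) {
            long t0 = System.nanoTime();
            int n = in.read(buf);
            long t1 = System.nanoTime();
            stats.readNanos += t1 - t0;          // time spent inside read()
            if (n == -1) {
                break;                           // end of stream
            }
            out.write(buf, 0, n);
            stats.writeNanos += System.nanoTime() - t1; // time spent inside write()
            stats.bytes += n;
        }
        long t2 = System.nanoTime();
        out.flush();
        stats.writeNanos += System.nanoTime() - t2;     // flush counts as write time
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HDFS input stream (hypothetical test data).
        byte[] data = new byte[5 * 1024 * 1024];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Stats s = pipe(new ByteArrayInputStream(data), sink, 64 * 1024);
        System.out.printf("copied %d bytes: read %d ms, write %d ms%n",
                s.bytes, s.readNanos / 1_000_000, s.writeNanos / 1_000_000);
    }
}
```

If most of the time shows up in readNanos, the delay is on the read path. If it shows up in writeNanos, the OutputStream the data is piped into is the likelier bottleneck.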

Raghu.

Is there a better way to do this?

    private void pipe(InputStream in, OutputStream out) throws IOException
    {
        System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
        byte[] buf = new byte[1024];
        int read = 0;
        // read() returns -1 at end of stream.
        while ((read = in.read(buf)) != -1)
        {
            out.write(buf, 0, read);
            // Prints once per chunk of at most 1 KB, i.e. 20,000+ lines for a 20 MB file.
            System.out.println(System.currentTimeMillis() + " Piping Data");
        }
        out.flush();
        System.out.println(System.currentTimeMillis() + " Finished Piping Data");
    }

    public void readFile(String fileToRead, OutputStream out)
            throws IOException
    {
        System.out.println(System.currentTimeMillis() + " Start Read File");
        Path inFile = new Path(fileToRead);
        System.out.println(System.currentTimeMillis() + " Set Path");

        // Validate the input path before reading.
        if (!fs.exists(inFile))
        {
            throw new HadoopFileException("Specified file " + fileToRead
                    + " not found.");
        }
        if (!fs.isFile(inFile))
        {
            throw new HadoopFileException("Specified path " + fileToRead
                    + " is not a file.");
        }

        // Open inFile for reading.
        System.out.println(System.currentTimeMillis() + " Opening Data Stream");
        FSDataInputStream in = fs.open(inFile);
        System.out.println(System.currentTimeMillis() + " Opened Data Stream");

        try
        {
            // Read from input stream and write to output stream until EOF.
            pipe(in, out);
        }
        finally
        {
            // Close both streams, even if pipe() throws.
            in.close();
            out.close();
        }
    }
Ananth T Sarathy
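On the question of a better way: pipe() as written prints one line per chunk of at most 1 KB, so a 20 MB file produces 20,000+ console writes on top of 20,000+ read() calls. Here is a minimal sketch, assuming a plain java.io copy loop is acceptable. It uses a larger buffer and logs once at the end. The class name PipeUtil and the 64 KB buffer size are hypothetical choices, not Hadoop defaults. Hadoop also ships org.apache.hadoop.io.IOUtils.copyBytes(InputStream, OutputStream, int, boolean), which performs a similar buffered copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PipeUtil {
    // Hypothetical buffer size chosen for illustration; tune for your setup.
    static final int BUF_SIZE = 64 * 1024;

    /**
     * Copies everything from in to out and returns the byte count.
     * Logs a single summary line instead of one line per chunk.
     * Neither stream is closed; the caller owns both.
     */
    public static long pipe(InputStream in, OutputStream out) throws IOException {
        long start = System.currentTimeMillis();
        byte[] buf = new byte[BUF_SIZE];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; any other value is a byte count.
        while ((read = in.read(buf)) != -1) {
            out.write(buf, 0, read);
            total += read;
        }
        out.flush();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Piped " + total + " bytes in " + elapsed + " ms");
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the FSDataInputStream returned by fs.open().
        byte[] data = new byte[2 * 1024 * 1024];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = pipe(new ByteArrayInputStream(data), sink);
        System.out.println("copied == input length: " + (copied == data.length));
    }
}
```

Because the caller still owns both streams, this drops into readFile() in place of the existing pipe() call without changing the try/finally cleanup.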

