Edward Capriolo wrote:
On Wed, Aug 19, 2009 at 11:11 AM, Edward Capriolo <edlinuxg...@gmail.com>wrote:

It would be as fast as underlying filesystem goes.
I would not agree with that statement. There is overhead.

You might be misinterpreting my comment. There is of course some overhead (at the least, the procedure calls). Depending on your underlying filesystem, there could be extra buffer copies and CRC overhead. But none of that explains a transfer as slow as 1 MBps (if my interpretation of the results is correct).

Raghu.

In some testing I did, writing a small file can
take 30-300 ms. So if you have 9000 small files (like I did) and you
are single-threaded, this takes a long time.
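Those numbers multiply out quickly; a back-of-the-envelope check (using the 30-300 ms per file and 9000 files from above):

```java
public class SmallFileMath {
    // Sequential small-file writes: total time = file count * per-file latency.
    static double totalMinutes(long files, long msPerFile) {
        return files * msPerFile / 60000.0;
    }

    public static void main(String[] args) {
        // 9000 files at 30 ms each vs. 300 ms each (numbers from the post above)
        System.out.println(totalMinutes(9000, 30) + " to "
                + totalMinutes(9000, 300) + " minutes");
        // → 4.5 to 45.0 minutes of wall-clock time, single threaded
    }
}
```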

If you orchestrate your task to use FSDataInputStream and FSDataOutputStream in
the map or reduce phase, then each mapper or reducer is writing a file
at a time. Now that is fast.

Ananth, are you doing your reads/writes inside a map/reduce job, or are you just
using the FileSystem API in a standalone program?



On Wed, Aug 19, 2009 at 1:26 AM, Raghu Angadi<rang...@yahoo-inc.com>
wrote:
Ananth T. Sarathy wrote:
I am trying to download binary files stored in Hadoop, but there is about a 2
minute wait on a 20mb file when I try to execute the in.read(buf).

What does this mean: 2 min to pipe 20mb, or one of your in.read()
calls took 2 minutes? Your code actually measures time for read and write.
There is nothing in FSInputStream to cause this slowdown. Do you think
anyone would use Hadoop otherwise? It would be as fast as the underlying
filesystem goes.

Raghu.

is there a better way to be doing this?

    private void pipe(InputStream in, OutputStream out) throws IOException
    {
        System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
        byte[] buf = new byte[1024];
        int read = 0;
        while ((read = in.read(buf)) >= 0)
        {
            out.write(buf, 0, read);
            System.out.println(System.currentTimeMillis() + " Piping Data");
        }
        out.flush();
        System.out.println(System.currentTimeMillis() + " Finished Piping Data");
    }
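One likely contributor to the slowness in a loop like the one above is the println on every 1 KB iteration plus the small buffer. A sketch of a leaner copy loop in plain java.io (the name pipeFast is hypothetical), which returns the byte count so the caller can log once at the end:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamPipe {
    // Copy in -> out with a 64 KB buffer and no per-iteration logging.
    // Returns the total number of bytes copied.
    static long pipeFast(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        int read;
        while ((read = in.read(buf)) != -1) {
            out.write(buf, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

Hadoop also ships org.apache.hadoop.io.IOUtils.copyBytes, which does essentially this, so you may not need to hand-roll the loop at all.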

    public void readFile(String fileToRead, OutputStream out)
            throws IOException
    {
        System.out.println(System.currentTimeMillis() + " Start Read File");
        Path inFile = new Path(fileToRead);
        System.out.println(System.currentTimeMillis() + " Set Path");
        // Validate the input path before reading.
        if (!fs.exists(inFile))
        {
            throw new HadoopFileException("Specified file " + fileToRead
                    + " not found.");
        }
        if (!fs.isFile(inFile))
        {
            throw new HadoopFileException("Specified path " + fileToRead
                    + " is not a file.");
        }
        // Open inFile for reading.
        System.out.println(System.currentTimeMillis() + " Opening Data Stream");
        FSDataInputStream in = fs.open(inFile);
        System.out.println(System.currentTimeMillis() + " Opened Data Stream");

        // Read from the input stream and write to the output stream until EOF.
        pipe(in, out);

        // Close the streams when done.
        out.close();
        in.close();
    }
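To answer Raghu's question (2 minutes total, or one read() call stalling for 2 minutes?), a FilterInputStream that records the slowest single read would separate the two cases. A sketch (the class name SlowestReadInputStream is hypothetical); wrapping the stream returned by fs.open(inFile) in this before piping would show whether one call dominates:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SlowestReadInputStream extends FilterInputStream {
    long slowestNanos = 0;  // duration of the single slowest read() call

    public SlowestReadInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long start = System.nanoTime();
        int n = super.read(b, off, len);
        long elapsed = System.nanoTime() - start;
        if (elapsed > slowestNanos) {
            slowestNanos = elapsed;
        }
        return n;
    }
}
```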
Ananth T Sarathy


