I looked through the fs.readFile implementation and was wondering why Node 
creates a read stream, emits buffers via 'data' events, and then copies those 
buffers again into a single large buffer before returning. Perhaps there's 
a good reason for doing all of this, but it seemed that allocating the final 
buffer upfront and reading directly into it with fs.read would be faster.
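
For comparison, the stream-and-concatenate pattern described above looks 
roughly like this (a simplified sketch of the idea, not the actual 
fs.readFile source):

var fs = require('fs');

// Sketch of the stream-and-concatenate approach: each 'data' chunk
// arrives in its own Buffer, and the chunks are then copied a second
// time into one final buffer at the end.
var readFileViaStream = function(path, end) {
  var stream = fs.createReadStream(path);
  var chunks = [];
  var total = 0;
  stream.on('data', function(chunk) {
    chunks.push(chunk);
    total += chunk.length;
  });
  stream.on('error', end);
  stream.on('end', function() {
    var buffer = new Buffer(total);
    var offset = 0;
    chunks.forEach(function(chunk) {
      chunk.copy(buffer, offset); // the extra copy in question
      offset += chunk.length;
    });
    end(undefined, buffer);
  });
};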

Here's a rough alternative implementation. It ignores the encoding 
parameter, since that shouldn't affect performance when the caller doesn't 
pass one, so this only compares the two functions going from path to final 
buffer.

var fs = require('fs');

// Open the file, stat it for its size, allocate the final buffer once,
// then fs.read directly into it until every byte has been read.
var readFile = function(path, end) {
  fs.open(path, 'r',
    function(error, descriptor) {
      if (error) return end(error);
      fs.fstat(descriptor,
        function(error, stats) {
          if (error) return fs.close(descriptor, function() { end(error); });
          var buffer = new Buffer(stats.size);
          var offset = 0;             // bytes read into the buffer so far
          var length = buffer.length; // bytes still left to read
          var read = function() {
            fs.read(descriptor, buffer, offset, length, offset,
              function(error, copied) {
                if (error) return fs.close(descriptor, function() { end(error); });
                offset += copied;
                length -= copied;
                if (length === 0) {
                  // Done: close the descriptor before handing back the buffer.
                  return fs.close(descriptor, function() { end(undefined, buffer); });
                }
                read(); // short read: keep going for the remaining bytes
              }
            );
          };
          read();
        }
      );
    }
  );
};
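
It's called the same way as fs.readFile without an encoding; for example 
(the file name is just a placeholder):

readFile('so-what.flac', function(error, buffer) {
  if (error) throw error;
  console.log('read', buffer.length, 'bytes');
});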

Maybe I've overlooked something, but very rough new Date().getTime() timing 
consistently shows this readFile version taking about 12 ms for Miles Davis' 
So What (9 MB) and fs.readFile taking about 33 ms for the same file.
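
Roughly, the measurement looked like this (a sketch, not the exact script; 
the path is a placeholder, and ideally each variant is run a few times so 
both hit a warm page cache):

var started = new Date().getTime();
readFile('so-what.flac', function(error, buffer) {
  if (error) throw error;
  console.log('fs.read loop:', new Date().getTime() - started, 'ms');

  var startedBuiltin = new Date().getTime();
  fs.readFile('so-what.flac', function(error, buffer) {
    if (error) throw error;
    console.log('fs.readFile:', new Date().getTime() - startedBuiltin, 'ms');
  });
});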

My hypothesis is that the extra copying of the data chunks is slowing things 
down, along with the unnecessary object instantiation, event emitters, 
conditional branching and so on.

Am I doing anything wrong, or is it right that allocating a single large 
buffer and reading directly into it is faster?
