Hey Jason,
Thanks, I'll keep this on hand as I do more tests. I now have a C,
Java, and Python version of my testing program ;)
However, I particularly *like* the fact that there's caching going on
- it'll help out our application immensely, I think. I'll be looking
at the performance both with and without the cache.
Brian
On Apr 14, 2009, at 12:01 AM, jason hadoop wrote:
The following very simple program will tell the VM to drop the pages being cached for a file. I tend to spin this in a for loop when making large tar files, or otherwise working with large files, and the system performance really smooths out.

Since it uses open(path), it will churn through the inode cache and directories.

Something like this might actually significantly speed up HDFS on busy clusters, by running it over the blocks on the datanodes.
#define _XOPEN_SOURCE 600
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/** Simple program to dump buffered data for specific files from the
 * buffer cache. Copyright Jason Venner 2009, License GPL */
int main( int argc, char** argv )
{
    int failCount = 0;
    int i;
    for( i = 1; i < argc; i++ ) {
        char* file = argv[i];
        int fd = open( file, O_RDONLY|O_LARGEFILE );
        if (fd == -1) {
            perror( file );
            failCount++;
            continue;
        }
        /* posix_fadvise returns the error number directly (it does not
         * set errno), so capture the return value for strerror(). */
        int rc = posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED );
        if (rc != 0) {
            fprintf( stderr, "Failed to flush cache for %s: %s\n",
                     file, strerror( rc ) );
            failCount++;
        }
        close( fd );
    }
    exit( failCount );
}
On Mon, Apr 13, 2009 at 4:01 PM, Scott Carey <sc...@richrelevance.com> wrote:
On 4/12/09 9:41 PM, "Brian Bockelman" <bbock...@cse.unl.edu> wrote:
Ok, here's something perhaps even more strange. I removed the "seek" part out of my timings, so I was only timing the "read" instead of the "seek + read" as in the first case. I also turned the read-ahead down to 1 byte (aka, off).
The jump *always* occurs at 128KB, exactly.
Some random ideas:
I have no idea how FUSE interops with the Linux block layer, but 128K happens to be the default 'readahead' value for block devices, which may just be a coincidence.

For a disk 'sda', you check and set the value (in 512-byte blocks) with:

/sbin/blockdev --getra /dev/sda
/sbin/blockdev --setra [num blocks] /dev/sda
I know from my file system tests that the OS readahead is not activated until a series of sequential reads go through the block device, so truly random access is not affected by this. I've set it to 128MB and random iops does not change on an ext3 or xfs file system. If this applies to FUSE too, there may be reasons that this behavior differs.

Furthermore, even if readahead were involved, one would not expect randomly reading 4k to be slower than randomly reading up to the readahead size itself.

I also have no idea how much of the OS device queue and block device scheduler is involved with FUSE. If those are involved, then there's a bunch of stuff to tinker with there as well.
Lastly, an FYI if you don't already know the following. If the OS is caching pages, there is a way to flush these in Linux to evict the cache. See /proc/sys/vm/drop_caches.
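As a sketch, using that interface to drop everything looks like this (it requires root; writing 1 drops the page cache, 2 drops dentries and inodes, 3 drops both):

```shell
# Flush dirty pages to disk first -- drop_caches only evicts clean pages.
sync
# Drop pagecache, dentries, and inodes (must be run as root).
echo 3 > /proc/sys/vm/drop_caches
```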
I'm a bit befuddled. I know we say that HDFS is optimized for large, sequential reads, not random reads - but it seems that it's one bug-fix away from being a good general-purpose system. Heck if I can find what's causing the issues though...
Brian
--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422