Bear in mind that hdfs-fuse has something like a 30% performance impact
when compared with direct access via the Java API. The data path is
something like:
your app -> kernel -> libfuse -> JVM -> kernel -> HDFS
HDFS -> kernel -> JVM -> libfuse -> kernel -> your app
On Windows especially, context switching during I/O like that carries a
high penalty. It might be better to bind the C libhdfs API directly via
a C# wrapper (see http://wiki.apache.org/hadoop/LibHDFS). But at that
point you have pulled the Java Virtual Machine into the address space
of your process and are bridging between Java land and C# land over JNI
on one side and its C# equivalent (P/Invoke) on the other. So why not
just use Java instead of C#? Or just use C and limit the damage to one
native-to-managed interface instead of two?
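For illustration, a minimal, untested sketch of the C route against
libhdfs (hdfs.h) might look like the following. The "default"/0
connection arguments and the file path are placeholders and error
handling is trimmed; the point is that everything still runs through
the JVM that hdfsConnect() embeds in your process.

#include <stdio.h>
#include <fcntl.h>
#include "hdfs.h"

int main(void) {
    /* Connect to the configured default filesystem; this loads a JVM
       into the process via JNI. Replace "default"/0 with a real
       namenode host and port if needed. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return 1;
    }

    /* Path is a placeholder. */
    hdfsFile in = hdfsOpenFile(fs, "/tmp/example.txt", O_RDONLY, 0, 0, 0);
    if (in == NULL) {
        fprintf(stderr, "hdfsOpenFile failed\n");
        hdfsDisconnect(fs);
        return 1;
    }

    /* Stream the file contents to stdout. */
    char buf[4096];
    tSize n;
    while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t) n, stdout);
    }

    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return 0;
}

Building that means linking against libhdfs and libjvm, and the Hadoop
jars have to be on the CLASSPATH at runtime, which is exactly the
"JVM in your address space" problem described above.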
The situation will change somewhat when/if all HDFS RPC is moved to an
RPC and serialization scheme that is truly language independent, i.e.
Avro. I have no idea when or if that will happen. Even then, as Ryan
said before, the HDFS client is fat: just speaking the RPC protocol
gets you maybe 25% of the way toward a functional client, since the
client also has to track block locations, stream data directly to and
from the DataNodes, verify checksums, handle retries, and so on.
The bottom line is that the Hadoop software ecosystem has a strong
Java affinity.
- Andy
----- Original Message ----
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Sent: Sun, January 10, 2010 8:57:32 AM
> Subject: Re: Basic question about using C# with Hadoop filesystems
>
> http://code.google.com/p/hdfs-fuse/
>
> On Sun, Jan 10, 2010 at 7:36 AM, Aram Mkhitaryan
> wrote:
> > ah, sorry, forgot to mention, it's in hdfs-user mailing list
> > [email protected]