+1 Let's get this done. We've had many false starts on a native HDFS client. This is a good base to build on. -C
On Wed, Feb 28, 2018 at 9:55 AM, Jim Clampffer <james.clampf...@gmail.com> wrote:
> Hi everyone,
>
> I'd like to start a thread to discuss merging the HDFS-8707 branch, a.k.a.
> libhdfs++, into trunk. I originally sent a similar email out last October,
> but it sounds like it was buried by discussions about other feature merges
> that were going on at the time.
>
> libhdfs++ is an HDFS client written in C++, designed to be used in
> applications written in non-JVM-based languages. In its current state it
> supports Kerberos-authenticated reads from HDFS and has been used in
> production clusters for over a year, so it has had a significant amount of
> burn-in time. The HDFS-8707 branch has been around for about 2 years now,
> so I'd like to know people's thoughts on what it would take to merge the
> current branch and handle writes and encrypted reads in a new one.
>
> Current notable features:
> - A libhdfs/libhdfs3-compatible C API that allows libhdfs++ to serve as a
>   drop-in replacement for clients that only need read support (until
>   libhdfs++ also supports writes).
> - An asynchronous C++ API with synchronous shims on top if the client
>   application wants to do blocking operations. Internally a single thread
>   (optionally more) uses select/epoll by way of boost::asio to watch
>   thousands of sockets without the overhead of spawning threads to emulate
>   async operation.
> - Kerberos/SASL authentication support.
> - HA namenode support.
> - A set of utility programs that mirror the HDFS CLI utilities, e.g.
>   "./hdfs dfs -chmod". The major benefit of these is that tool startup
>   time is ~3 orders of magnitude faster (<1 ms vs. hundreds of ms) and the
>   tools occupy a lot less memory since they aren't dealing with the JVM.
>   This makes it possible to do things like write a simple bash script that
>   stats a file, applies some rules to the result, and decides whether to
>   move it, in a way that scales to thousands of files without being
>   penalized with O(N) JVM startups.
> - Cancelable reads. This has proven to be very useful in multiuser
>   applications that (pre)fetch large blocks of data but need to remain
>   responsive for interactive users. Rather than waiting for a large and/or
>   slow read to finish, the read returns immediately and the associated
>   resources (buffer, file descriptor) become available for the rest of the
>   application to use.
>
> There are a couple of known issues: the doc build isn't integrated with
> the rest of Hadoop, and the public API headers aren't being exported when
> building a distribution. A short-term solution for the missing docs is to
> go through the libhdfs(3)-compatible API and use the libhdfs docs. Other
> than a few modifications to the pom files to integrate the build, the
> changes are isolated to a new directory, so the chance of causing any
> regressions in the rest of the code is minimal.
>
> Please share your thoughts, thanks!