+1 (binding)
-C

On Thu, Mar 8, 2018 at 9:31 AM, Jim Clampffer <james.clampf...@gmail.com> wrote:
> Hi Everyone,
>
> The feedback was generally positive on the discussion thread [1], so I'd
> like to start a formal vote for merging HDFS-8707 (libhdfs++) into trunk.
> The vote will be open for 7 days and end 6PM EST on 3/15/18.
>
> This branch includes a C++ implementation of an HDFS client for use in
> applications that don't run an in-process JVM. Right now the branch only
> supports reads and metadata calls.
>
> Features (paraphrasing the list from the discussion thread):
> -Avoiding the JVM means applications that use libhdfs++ can explicitly
> control resources (memory, FDs, threads). The driving goal for this
> project was to let C/C++ applications access HDFS while maintaining a
> single heap.
> -Includes support for Kerberos authentication.
> -Includes a libhdfs/libhdfs3-compatible C API as well as a C++ API that
> supports asynchronous operations. Applications that only do reads may be
> able to use this as a drop-in replacement for libhdfs.
> -Asynchronous IO is built on top of boost::asio, which in turn uses
> select/epoll, so many sockets can be monitored from a single thread (or
> thread pool) rather than spawning a thread to sleep on a blocked socket.
> -Includes a set of utilities written in C++ that mirror the CLI tools
> (e.g. ./hdfs dfs -ls). Their startup time is three orders of magnitude
> lower than the Java client's, which is useful for scripts that need to
> work with many files.
> -Support for cancelable reads that release associated resources
> immediately. Useful for applications that need to be responsive to
> interactive users.
>
> Other points:
> -This is almost all new code in a new subdirectory. No Java source for the
> rest of Hadoop was changed, so there's no risk of regressions there. The
> only changes outside of that subdirectory were integrating the build into
> some of the pom files and adding a couple of dependencies to the Dockerfile.
> -The library has had plenty of burn-in time. It's been used in production
> for well over a year and is indirectly being distributed as part of the
> Apache ORC project (in the form of a third-party dependency).
> -There isn't much in the way of well-formatted documentation right now.
> The documentation for the libhdfs API is applicable to the libhdfs++ C API.
> Header files describe various components, including details about threading
> and lifecycle expectations for important objects. Good places to start are
> hdfspp.h, filesystem.h, filehandle.h, rpc_connection.h and rpc_engine.h.
>
> I'll start with my +1 (binding).
>
> [1]
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201803.mbox/browser
> (second message in thread, can't figure out how to link directly to mine)
>
> Thanks!
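The proposal's scalability point is that boost::asio uses select/epoll so one thread can watch many sockets instead of parking a blocked thread per socket. As a rough, Linux-only illustration of that underlying readiness-notification pattern (not code from libhdfs++ or boost::asio), the C sketch below has a single thread wait on several sockets through one epoll instance and read only the ones reported ready. The socket pairs standing in for network connections, the NCONN constant, and the function name epoll_single_thread_demo are all hypothetical.

```c
#include <stdio.h>      /* snprintf */
#include <sys/epoll.h>  /* epoll_create1, epoll_ctl, epoll_wait (Linux-only) */
#include <sys/socket.h> /* socketpair */
#include <unistd.h>     /* read, write, close */

#define NCONN 8 /* hypothetical number of simulated connections */

/* Sketch: one thread services NCONN connections through a single epoll
   instance instead of one blocked thread per socket. Returns the total
   number of bytes received, or -1 on error. */
long epoll_single_thread_demo(void) {
    int pairs[NCONN][2];
    struct epoll_event ready[NCONN];
    long received = 0, result = -1;
    int opened = 0, remaining = NCONN;
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;

    /* Each socketpair stands in for one client <-> server connection. */
    for (int i = 0; i < NCONN; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pairs[i]) < 0) goto cleanup;
        opened = i + 1; /* pairs[0..opened-1] are closed on exit */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pairs[i][0] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pairs[i][0], &ev) < 0) goto cleanup;
        /* The "server" end sends a short reply such as "reply-3" (7 bytes). */
        char msg[16];
        int len = snprintf(msg, sizeof msg, "reply-%d", i);
        if (write(pairs[i][1], msg, (size_t)len) != len) goto cleanup;
    }

    /* Event loop: one thread waits on all sockets at once and only calls
       read() on the ones epoll reports as readable. */
    while (remaining > 0) {
        int n = epoll_wait(epfd, ready, NCONN, 1000 /* ms timeout */);
        if (n <= 0) goto cleanup; /* error or timeout */
        for (int j = 0; j < n; j++) {
            char buf[64];
            ssize_t got = read(ready[j].data.fd, buf, sizeof buf);
            if (got <= 0) goto cleanup;
            received += got;
            /* This connection's reply is consumed; stop watching it. */
            epoll_ctl(epfd, EPOLL_CTL_DEL, ready[j].data.fd, NULL);
            remaining--;
        }
    }
    result = received;

cleanup:
    for (int i = 0; i < opened; i++) {
        close(pairs[i][0]);
        close(pairs[i][1]);
    }
    close(epfd);
    return result;
}
```

On Linux, calling epoll_single_thread_demo() should return 56 (eight 7-byte replies), all handled by the calling thread. A production reactor would keep connections registered and handle partial reads; this sketch only shows why the thread count does not have to grow with the socket count.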