Hi Wes,

Thanks.
*[Part 1] C++ HDFS/ORC [Completed]*

Steps I followed:
1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
2) std::shared_ptr<io::RandomAccessFile> --> then create a stream
3) Pass that stream to adapters::orc::ORCFileReader

*[Part 2] C++ HDFS/ORC via Java JNI [Partially completed]*

Followed the same approach in orc.jni_wrapper:
1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
2) std::shared_ptr<io::RandomAccessFile> --> then create a stream
3) Pass that stream to adapters::orc::ORCFileReader

*<jni snippet>*

  std::unique_ptr<ORCFileReader> reader;
  arrow::Status ret;
  if (path.find("hdfs://") == 0) {
    arrow::fs::HdfsOptions options_;
    options_ = *arrow::fs::HdfsOptions::FromUri(path);
    auto _fsRes = arrow::fs::HadoopFileSystem::Make(options_);
    if (!_fsRes.ok()) {
      std::cerr << "HadoopFileSystem::Make failed; this can happen when the "
                   "proper driver is not available on this node. Error: "
                << _fsRes.status().ToString();
      env->ThrowNew(io_exception_class, _fsRes.status().ToString().c_str());
      return -1;  // bail out instead of dereferencing the failed Result below
    }
    _fs = *_fsRes;
    auto _stream = *_fs->OpenInputFile(path);
    hadoop_fs_holder_.Insert(_fs);  // global holder in arrow::jni::ConcurrentMap, cleared during unload
    ret = ORCFileReader::Open(_stream, arrow::default_memory_pool(), &reader);
    if (!ret.ok()) {
      env->ThrowNew(io_exception_class, std::string("Failed to open file " + path).c_str());
    }
    return orc_reader_holder_.Insert(std::shared_ptr<ORCFileReader>(reader.release()));
  }

The JNI path also works fine, but at the end of the application I am getting a segmentation fault.
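For reference, here is a minimal standalone sketch of the Part 1 flow (no JNI). It mirrors your three steps; the only deviation is arrow::fs::FileSystemFromUri, which splits the URI into a filesystem and an in-filesystem path, since HadoopFileSystem paths normally should not carry the hdfs:// scheme. The URI "hdfs://namenode:8020/data/file.orc" is a placeholder, and this assumes an Arrow build with both HDFS and ORC enabled plus a reachable cluster, so treat it as an untested sketch:

```cpp
#include <iostream>
#include <memory>
#include <string>

#include <arrow/adapters/orc/adapter.h>
#include <arrow/filesystem/filesystem.h>
#include <arrow/io/interfaces.h>
#include <arrow/memory_pool.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>

arrow::Status ReadOrcFromHdfs(const std::string& uri) {
  // 1) Create the filesystem from the hdfs:// URI; `path` receives the
  //    in-filesystem portion of the URI.
  std::string path;
  ARROW_ASSIGN_OR_RAISE(auto fs, arrow::fs::FileSystemFromUri(uri, &path));

  // 2) Open a RandomAccessFile stream for that path.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::io::RandomAccessFile> stream,
                        fs->OpenInputFile(path));

  // 3) Hand the stream to the ORC reader and read the file into a Table.
  std::unique_ptr<arrow::adapters::orc::ORCFileReader> reader;
  ARROW_RETURN_NOT_OK(arrow::adapters::orc::ORCFileReader::Open(
      stream, arrow::default_memory_pool(), &reader));
  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->Read(&table));
  std::cout << "Read " << table->num_rows() << " rows" << std::endl;
  return arrow::Status::OK();
}

int main() {
  // Placeholder URI -- substitute your namenode host/port and file path.
  arrow::Status st = ReadOrcFromHdfs("hdfs://namenode:8020/data/file.orc");
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```

The ARROW_ASSIGN_OR_RAISE / ARROW_RETURN_NOT_OK macros propagate errors early, which avoids the pattern in the JNI snippet of dereferencing a Result after its ok() check failed.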
*Do you have any idea about this? It looks like an issue with libhdfs connection close or cleanup.*

*stack trace:*

/tmp/tmp3973555041947319188libarrow_orc_jni.so : ()+0xb8b1a3
/lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
/lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0xcb
/lib/x86_64-linux-gnu/libc.so.6 : abort()+0x12b
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90e769
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xad3803
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : JVM_handle_linux_signal()+0x1a5
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90b8b8
/lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cac27
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cc50b
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xada661
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x6c237d
*/home/legion/ha_devel/hadoop-ecosystem-3x/hadoop-3.1.1/lib/native/libhdfs.so : ()+0xaa4f*
/lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x85a1
/lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x962a
/lib/x86_64-linux-gnu/libc.so.6 : clone()+0x43

On Wed, 8 Sept 2021 at 04:07, Weston Pace <weston.p...@gmail.com> wrote:

> I'll just add that a PR is in progress (thanks Joris!)
> for adding this adapter: https://github.com/apache/arrow/pull/10991
>
> On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > I'm missing context, but if you're talking about C++/Python, we are
> > currently missing a wrapper interface to the ORC reader in the Arrow
> > datasets library:
> >
> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset
> >
> > We have CSV, Arrow (IPC), and Parquet interfaces.
> >
> > But we have an HDFS filesystem implementation and an ORC reader
> > implementation, so mechanically all of the pieces are there but need
> > to be connected together.
> >
> > Thanks,
> > Wes
> >
> > On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar <man...@zettabolt.com> wrote:
> > >
> > > Hi Dev-Community,
> > >
> > > Can anyone help guide me on how to read ORC directly from HDFS into an
> > > Arrow dataset?
> > >
> > > Thanks
> > > Manoj