On further analysis:

==114164== Process terminating with default action of signal 6 (SIGABRT)
==114164==    at 0x4AD118B: raise (raise.c:51)
==114164==    by 0x4AB092D: abort (abort.c:100)
==114164==    by 0x598D768: os::abort(bool) (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x5B52802: VMError::report_and_die() (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x59979F4: JVM_handle_linux_signal (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x598A8B7: signalHandler(int, siginfo*, void*) (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x485F3BF: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==114164==    by 0x5949C26: Monitor::ILock(Thread*) [clone .part.2] (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x594B50A: Monitor::lock_without_safepoint_check() (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x5B59660: VM_Exit::wait_if_vm_exited() (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x574137C: jni_DetachCurrentThread (in /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x4140AA4E: hdfsThreadDestructor (thread_local_storage.c:53)
It turned out that the issue was in libhdfs, so I fixed that. Now the ORC
JNI path also works fine. There are still many features missing in the ORC
JNI bindings, such as reading a full split or index-based reading. Do we
have any plan to support those?

On Wed, 8 Sept 2021 at 22:06, Manoj Kumar <man...@zettabolt.com> wrote:
> Hi Wes,
>
> Thanks,
>
> [ Part 1 ]
> C++ HDFS/ORC [Completed]
> Steps which I followed:
> 1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
> 2) std::shared_ptr<io::RandomAccessFile> --> then create a stream
> 3) Pass that stream to adapters::orc::ORCFileReader
>
> [ Part 2 ]
> C++ HDFS/ORC via Java JNI [Partially Completed]
> Followed the same approach in orc.jni_wrapper:
> 1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
> 2) std::shared_ptr<io::RandomAccessFile> --> then create a stream
> 3) Pass that stream to adapters::orc::ORCFileReader
>
> <jni snippet>
> std::unique_ptr<ORCFileReader> reader;
> arrow::Status ret;
> if (path.find("hdfs://") == 0) {
>   arrow::fs::HdfsOptions options_;
>   options_ = *arrow::fs::HdfsOptions::FromUri(path);
>   auto _fsRes = arrow::fs::HadoopFileSystem::Make(options_);
>   if (!_fsRes.ok()) {
>     std::cerr << "HadoopFileSystem::Make failed, it is possible when we "
>                  "don't have proper driver on this node, err msg is "
>               << _fsRes.status().ToString();
>   }
>   _fs = *_fsRes;
>   auto _stream = *_fs->OpenInputFile(path);
>   // global holder in arrow::jni::ConcurrentMap, cleared during unload
>   hadoop_fs_holder_.Insert(_fs);
>   ret = ORCFileReader::Open(_stream, arrow::default_memory_pool(), &reader);
>   if (!ret.ok()) {
>     env->ThrowNew(io_exception_class,
>                   std::string("Failed open file " + path).c_str());
>   }
>   return orc_reader_holder_.Insert(
>       std::shared_ptr<ORCFileReader>(reader.release()));
> }
>
> The JNI path also works fine, but at the end of the application I am
> getting a segmentation fault.
>
> Do you have any idea? It looks like some issue with libhdfs connection
> close or cleanup.
>
> Stack trace:
> /tmp/tmp3973555041947319188libarrow_orc_jni.so : ()+0xb8b1a3
> /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
> /lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0xcb
> /lib/x86_64-linux-gnu/libc.so.6 : abort()+0x12b
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90e769
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xad3803
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : JVM_handle_linux_signal()+0x1a5
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90b8b8
> /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cac27
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cc50b
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xada661
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x6c237d
> /home/legion/ha_devel/hadoop-ecosystem-3x/hadoop-3.1.1/lib/native/libhdfs.so : ()+0xaa4f
> /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x85a1
> /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x962a
> /lib/x86_64-linux-gnu/libc.so.6 : clone()+0x43
>
> On Wed, 8 Sept 2021 at 04:07, Weston Pace <weston.p...@gmail.com> wrote:
>> I'll just add that a PR is in progress (thanks Joris!)
>> for adding this adapter: https://github.com/apache/arrow/pull/10991
>>
>> On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> > I'm missing context, but if you're talking about C++/Python, we are
>> > currently missing a wrapper interface to the ORC reader in the Arrow
>> > datasets library:
>> >
>> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset
>> >
>> > We have CSV, Arrow (IPC), and Parquet interfaces.
>> >
>> > But we have an HDFS filesystem implementation and an ORC reader
>> > implementation, so mechanically all of the pieces are there but need
>> > to be connected together.
>> >
>> > Thanks,
>> > Wes
>> >
>> > On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar <man...@zettabolt.com> wrote:
>> > >
>> > > Hi Dev-Community,
>> > >
>> > > Can anyone help guide me on how to read ORC directly from HDFS into
>> > > an Arrow dataset?
>> > >
>> > > Thanks
>> > > Manoj