On further analysis:

==114164== Process terminating with default action of signal 6 (SIGABRT)
==114164==    at 0x4AD118B: raise (raise.c:51)
==114164==    by 0x4AB092D: abort (abort.c:100)
==114164==    by 0x598D768: os::abort(bool) (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x5B52802: VMError::report_and_die() (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x59979F4: JVM_handle_linux_signal (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x598A8B7: signalHandler(int, siginfo*, void*) (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x485F3BF: ??? (in /usr/lib/x86_64-linux-gnu/
libpthread-2.31.so)
==114164==    by 0x5949C26: Monitor::ILock(Thread*) [clone .part.2] (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x594B50A: Monitor::lock_without_safepoint_check() (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x5B59660: VM_Exit::wait_if_vm_exited() (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x574137C: jni_DetachCurrentThread (in
/home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so)
==114164==    by 0x4140AA4E: hdfsThreadDestructor (thread_local_storage.c:53

It turned out that the issue was in libhdfs: its hdfsThreadDestructor was detaching the
thread from a JVM that had already exited (see the jni_DetachCurrentThread /
VM_Exit::wait_if_vm_exited frames above), so I fixed that.

Now the ORC JNI path also works fine.

There are still many features missing in the ORC JNI, like reading a full split or
index-based reading.
Do we have any plan to support those?


On Wed, 8 Sept 2021 at 22:06, Manoj Kumar <man...@zettabolt.com> wrote:

> Hi Wes,
>
> Thanks,
>
> *[ Part 1 ]*
> *C++ HDFS/ORC  [Completed]*
> Steps which I followed (see the sketch after the list):
> 1) arrow::fs::HadoopFileSystem --> create a hadoop FS
> 2) std::shared_ptr<io::RandomAccessFile> -->then create a stream
> 3) Pass that stream to adapters::orc::ORCFileReader
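>
> For reference, a minimal sketch of this flow. It assumes an Arrow 5.x-era C++ API
> (where ORCFileReader::Open and Read return a Status); the namenode host/port and
> file path below are placeholders:
>
> #include <iostream>
> #include <memory>
> #include <string>
>
> #include <arrow/adapters/orc/adapter.h>
> #include <arrow/filesystem/hdfs.h>
> #include <arrow/io/interfaces.h>
> #include <arrow/memory_pool.h>
> #include <arrow/status.h>
> #include <arrow/table.h>
>
> int main() {
>   // Placeholder URI; namenode host/port and file path are assumptions.
>   const std::string uri = "hdfs://namenode:8020/data/sample.orc";
>   const std::string file_path = "/data/sample.orc";
>
>   // 1) Create the Hadoop filesystem from the URI.
>   auto options = arrow::fs::HdfsOptions::FromUri(uri).ValueOrDie();
>   auto fs = arrow::fs::HadoopFileSystem::Make(options).ValueOrDie();
>
>   // 2) Open a RandomAccessFile stream on the ORC file.
>   std::shared_ptr<arrow::io::RandomAccessFile> stream =
>       fs->OpenInputFile(file_path).ValueOrDie();
>
>   // 3) Hand the stream to the ORC adapter and read everything into a Table.
>   std::unique_ptr<arrow::adapters::orc::ORCFileReader> reader;
>   arrow::Status st = arrow::adapters::orc::ORCFileReader::Open(
>       stream, arrow::default_memory_pool(), &reader);
>   if (!st.ok()) {
>     std::cerr << st.ToString() << std::endl;
>     return 1;
>   }
>
>   std::shared_ptr<arrow::Table> table;
>   st = reader->Read(&table);
>   if (!st.ok()) {
>     std::cerr << st.ToString() << std::endl;
>     return 1;
>   }
>   std::cout << "rows: " << table->num_rows() << std::endl;
>   return 0;
> }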
>
> *[Part 2 ]*
> *C++ HDFS/ORC via Java JNI [Partially Completed]*
> *Followed the same approach in orc.jni_wrapper:*
> 1) arrow::fs::HadoopFileSystem --> create a hadoop FS
> 2) std::shared_ptr<io::RandomAccessFile> -->then create a stream
> 3) Pass that stream to adapters::orc::ORCFileReader
>
> *<jni snippet>*
>   std::unique_ptr<ORCFileReader> reader;
>   arrow::Status ret;
>   if (path.find("hdfs://") == 0) {
>     arrow::fs::HdfsOptions options_;
>     options_ = *arrow::fs::HdfsOptions::FromUri(path);
>     auto _fsRes = arrow::fs::HadoopFileSystem::Make(options_);
>     if (!_fsRes.ok()) {
>       std::cerr << "HadoopFileSystem::Make failed, it is possible when we don't have "
>                    "proper driver on this node, err msg is "
>                 << _fsRes.status().ToString();
>     }
>     _fs = *_fsRes;
>     auto _stream = *_fs->OpenInputFile(path);
>     hadoop_fs_holder_.Insert(_fs);  // global holder in arrow::jni::ConcurrentMap, cleared during unload
>     ret = ORCFileReader::Open(_stream, arrow::default_memory_pool(), &reader);
>
>     if (!ret.ok()) {
>       env->ThrowNew(io_exception_class, std::string("Failed to open file " + path).c_str());
>     }
>
>     return orc_reader_holder_.Insert(std::shared_ptr<ORCFileReader>(reader.release()));
>   }
>
>
> The JNI path also works fine, but at the end of the application I am getting a
> segmentation fault.
>
> *Do you have any idea about this? It looks like some issue with libhdfs
> connection close or cleanup.*
>
> *stack trace:*
>   /tmp/tmp3973555041947319188libarrow_orc_jni.so : ()+0xb8b1a3
>   /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
>   /lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0xcb
>   /lib/x86_64-linux-gnu/libc.so.6 : abort()+0x12b
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0x90e769
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0xad3803
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : JVM_handle_linux_signal()+0x1a5
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0x90b8b8
>   /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0x8cac27
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0x8cc50b
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0xada661
>
> /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so
> : ()+0x6c237d
> *
> /home/legion/ha_devel/hadoop-ecosystem-3x/hadoop-3.1.1/lib/native/libhdfs.so
> : ()+0xaa4f*
>   /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x85a1
>   /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x962a
>   /lib/x86_64-linux-gnu/libc.so.6 : clone()+0x43
>
>
>
> On Wed, 8 Sept 2021 at 04:07, Weston Pace <weston.p...@gmail.com> wrote:
>
>> I'll just add that a PR is in progress (thanks Joris!) for adding this
>> adapter: https://github.com/apache/arrow/pull/10991
>>
>> On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> > I'm missing context but if you're talking about C++/Python, we are
>> > currently missing a wrapper interface to the ORC reader in the Arrow
>> > datasets library
>> >
>> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset
>> >
>> > We have CSV, Arrow (IPC), and Parquet interfaces.
>> >
>> > But we have an HDFS filesystem implementation and an ORC reader
>> > implementation, so mechanically all of the pieces are there but need
>> > to be connected together.
>> >
>> > Thanks,
>> > Wes
>> >
>> > On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar <man...@zettabolt.com>
>> wrote:
>> > >
>> > > Hi Dev-Community,
>> > >
>> > > Can anyone help guide me on how to read ORC directly from HDFS into an
>> > > Arrow dataset?
>> > >
>> > > Thanks
>> > > Manoj
>>
>
