Hi Wes,

Thanks for the pointers; here is an update.

*[ Part 1 ]*
*C++ HDFS/ORC [Completed]*
Steps I followed:
1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
2) std::shared_ptr<io::RandomAccessFile> --> then open a stream
3) Pass that stream to adapters::orc::ORCFileReader

*[ Part 2 ]*
*C++ HDFS/ORC via Java JNI [Partially Completed]*
*Followed the same approach in the ORC JNI wrapper:*
1) arrow::fs::HadoopFileSystem --> create a Hadoop FS
2) std::shared_ptr<io::RandomAccessFile> --> then open a stream
3) Pass that stream to adapters::orc::ORCFileReader

jni snippet:

  std::unique_ptr<ORCFileReader> reader;
  arrow::Status ret;
  if (path.find("hdfs://") == 0) {
    // HdfsOptions::FromUri returns arrow::Result<HdfsOptions>
    arrow::fs::HdfsOptions options_ = *arrow::fs::HdfsOptions::FromUri(path);
    auto _fsRes = arrow::fs::HadoopFileSystem::Make(options_);
    if (!_fsRes.ok()) {
      std::cerr << "HadoopFileSystem::Make failed, it is possible when we "
                   "don't have a proper driver on this node, err msg is "
                << _fsRes.status().ToString();
    }
    auto _fs = *_fsRes;
    auto _stream = *_fs->OpenInputFile(path);
    // global holder in arrow::jni::ConcurrentMap, cleared during unload
    hadoop_fs_holder_.Insert(_fs);
    ret = ORCFileReader::Open(_stream, arrow::default_memory_pool(), &reader);

    if (!ret.ok()) {
      env->ThrowNew(io_exception_class,
                    std::string("Failed to open file " + path).c_str());
    }

    return orc_reader_holder_.Insert(
        std::shared_ptr<ORCFileReader>(reader.release()));
  }


JNI also works fine, but at the end of application, I am getting
segmentation fault.

*Do you have any idea about , looks like some issue with libhdfs connection
close or cleanup ?*
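For what it's worth, one plausible cause of a crash like this (I'm speculating; the real `arrow::jni::ConcurrentMap` and your teardown path may differ) is that the global holder releases its last reference to the HadoopFileSystem from a static destructor at process exit, after the JVM has already torn down the libhdfs threads, so hdfsDisconnect runs against a dead JVM. A simplified stand-in for the holder illustrates where that last release happens:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

// Simplified stand-in for a ConcurrentMap-style JNI holder: maps opaque
// int64 handles to shared_ptr owners so Java can hold handles to C++ objects.
template <typename T>
class HandleMap {
 public:
  int64_t Insert(std::shared_ptr<T> obj) {
    std::lock_guard<std::mutex> lock(mtx_);
    const int64_t id = next_id_++;
    objects_[id] = std::move(obj);
    return id;
  }

  std::shared_ptr<T> Lookup(int64_t id) {
    std::lock_guard<std::mutex> lock(mtx_);
    auto it = objects_.find(id);
    return it == objects_.end() ? nullptr : it->second;
  }

  // Dropping the last shared_ptr reference here is what runs the owned
  // object's destructor (and, for a HadoopFileSystem, hdfsDisconnect).
  // Call this while the JVM is still fully alive -- e.g. from
  // JNI_OnUnload or an explicit close() native method -- not from a
  // static destructor at process exit.
  void Clear() {
    std::lock_guard<std::mutex> lock(mtx_);
    objects_.clear();
  }

 private:
  std::mutex mtx_;
  int64_t next_id_ = 1;
  std::map<int64_t, std::shared_ptr<T>> objects_;
};
```

If that theory holds, clearing `orc_reader_holder_` before `hadoop_fs_holder_` (so the open streams close before the connection drops) during unload, rather than relying on process-exit destruction, might avoid the segfault.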

*stack trace:*
  /tmp/tmp3973555041947319188libarrow_orc_jni.so : ()+0xb8b1a3
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
  /lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0xcb
  /lib/x86_64-linux-gnu/libc.so.6 : abort()+0x12b
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90e769
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xad3803
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : JVM_handle_linux_signal()+0x1a5
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x90b8b8
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x153c0
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cac27
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x8cc50b
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0xada661
  /home/legion/ha_devel/hadoop-ecosystem-3x/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so : ()+0x6c237d
  */home/legion/ha_devel/hadoop-ecosystem-3x/hadoop-3.1.1/lib/native/libhdfs.so : ()+0xaa4f*
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x85a1
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x962a
  /lib/x86_64-linux-gnu/libc.so.6 : clone()+0x43



On Wed, 8 Sept 2021 at 04:07, Weston Pace <weston.p...@gmail.com> wrote:

> I'll just add that a PR is in progress (thanks Joris!) for adding this
> adapter: https://github.com/apache/arrow/pull/10991
>
> On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > I'm missing context but if you're talking about C++/Python, we are
> > currently missing a wrapper interface to the ORC reader in the Arrow
> > datasets library
> >
> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset
> >
> > We have CSV, Arrow (IPC), and Parquet interfaces.
> >
> > But we have an HDFS filesystem implementation and an ORC reader
> > implementation, so mechanically all of the pieces are there but need
> > to be connected together.
> >
> > Thanks,
> > Wes
> >
> > On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar <man...@zettabolt.com> wrote:
> > >
> > > Hi Dev-Community,
> > >
> > > Anyone can help me to guide how to read ORC directly from HDFS to an
> > > arrow dataset.
> > >
> > > Thanks
> > > Manoj
>
