Thanks! Is there sample code that shows how to use these APIs, so I can
learn the best practices?

I am looking at
https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
but that only covers Arrow itself.

-Sandeep

On Sun, Nov 26, 2017 at 9:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> I think you want to use parquet::arrow::FileWriter::Open
>
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.h#L112
>
> The implementation is here:
>
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L992
>
> - Wes
>
> On Sun, Nov 26, 2017 at 8:25 AM, Sandeep Joshi <sanjos...@gmail.com>
> wrote:
> > This might seem like a dumb question, but I am not yet familiar enough
> > with the API to figure out how to get around this problem.
> >
> > I have a pre-defined Arrow Schema which I convert to Parquet Schema using
> > the "ToParquetSchema" function.  This returns a SchemaDescriptor object.
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80
> >
> > ParquetFileWriter on the other hand, expects a shared_ptr<GroupNode>
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126
> >
> > SchemaDescriptor can return a raw pointer to the GroupNode, but to pass
> > it to ParquetFileWriter I need a shared_ptr.  This introduces memory
> > management complications.  I'd rather not create a copy of the GroupNode
> > just to pass it to ParquetFileWriter.
> >
> >   // convert arrow schema to parquet schema
> >   std::shared_ptr<SchemaDescriptor> parquet_schema;
> >   std::shared_ptr<::parquet::WriterProperties> properties =
> >     ::parquet::default_writer_properties();
> >   ToParquetSchema(arrow_sch.get(), *properties.get(), &parquet_schema);
> >
> >   // write arrow table to parquet
> >   parquet::schema::GroupNode* g =
> >     (parquet::schema::GroupNode*)parquet_schema->group_node();
> >   grp_node.reset(g);  // Don't want to do this!
> >   std::shared_ptr<::arrow::io::FileOutputStream> sink;
> >   ::arrow::io::FileOutputStream::Open(path, &sink);
> >   std::unique_ptr<FileWriter> arrow_writer(
> >     new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));
> >
> >   arrow_writer->WriteTable(*new_table_ptr.get(), 65536);
> >
> > Is this an API limitation that no one has hit before?  Or am I missing a
> > better way of writing Parquet files given a pre-defined Arrow schema?
> >
> > -Sandeep
>
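
A note on the ownership problem described in the quoted message (wrapping a raw pointer that another object already owns in a shared_ptr): calling reset() on that pointer makes the shared_ptr believe it owns the node, so the node is eventually deleted twice. The sketch below shows one general C++ way around this, the shared_ptr aliasing constructor, which hands out a shared_ptr to the inner node while sharing ownership of the outer object. Node, Descriptor, ShareRoot and LiveNodesAfterRoundTrip are hypothetical stand-ins invented for illustration; they are not the parquet-cpp classes, and this is not necessarily how parquet-cpp handles it. Per Wes's reply, parquet::arrow::FileWriter::Open is probably the simpler route in practice.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; NOT the parquet-cpp classes.
struct Node {
  explicit Node(int* live) : live_(live) { ++*live_; }
  ~Node() { --*live_; }
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;
  int* live_;  // counts live Node instances so destruction can be observed
};

// Owns its Node and only hands out a raw pointer, similar in spirit to the
// accessor described in the original message.
class Descriptor {
 public:
  explicit Descriptor(int* live) : root_(new Node(live)) {}
  const Node* root() const { return root_.get(); }

 private:
  std::unique_ptr<Node> root_;
};

// Build a shared_ptr<Node> that shares ownership of the whole Descriptor
// (aliasing constructor) instead of claiming ownership of the Node itself.
std::shared_ptr<Node> ShareRoot(const std::shared_ptr<Descriptor>& desc) {
  return std::shared_ptr<Node>(desc, const_cast<Node*>(desc->root()));
}

// Returns the number of live Nodes once every handle is gone. A result of 0
// means the Node was destroyed exactly once.
int LiveNodesAfterRoundTrip() {
  int live = 0;
  {
    auto desc = std::make_shared<Descriptor>(&live);
    std::shared_ptr<Node> root = ShareRoot(desc);
    desc.reset();  // the alias alone still keeps the Descriptor alive
    assert(live == 1);
    assert(root.use_count() == 1);
  }
  return live;
}
```

Calling LiveNodesAfterRoundTrip() from a main() exercises the whole round trip. The trade-off is that the outer object stays alive for as long as any aliased pointer does, which is usually what you want when the inner node cannot outlive its owner anyway.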
