Thanks! Is there sample code that shows how to use these APIs, so I can learn the best practices?
I am looking at https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
but that only covers Arrow itself.

-Sandeep

On Sun, Nov 26, 2017 at 9:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> I think you want to use parquet::arrow::FileWriter::Open
>
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.h#L112
>
> The implementation is here:
>
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L992
>
> - Wes
>
> On Sun, Nov 26, 2017 at 8:25 AM, Sandeep Joshi <sanjos...@gmail.com> wrote:
> > This might seem like a dumb question but I am not intimate with the API yet
> > to figure out how to get around this problem.
> >
> > I have a pre-defined Arrow Schema which I convert to Parquet Schema using
> > the "ToParquetSchema" function. This returns a SchemaDescriptor object.
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80
> >
> > ParquetFileWriter on the other hand, expects a shared_ptr<GroupNode>
> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126
> >
> > SchemaDescriptor can return a raw pointer for GroupNode but to pass it to
> > the ParquetFileWriter, I need a shared_ptr. This introduces memory
> > management complications. I'd rather not create a copy of the GroupNode in
> > order to pass it to ParquetFileWriter.
> >
> >   // convert arrow schema to parquet schema
> >   std::shared_ptr<SchemaDescriptor> parquet_schema;
> >   std::shared_ptr<::parquet::WriterProperties> properties =
> >       ::parquet::default_writer_properties();
> >   ToParquetSchema(arrow_sch.get(), *properties.get(), &parquet_schema);
> >
> >   // write arrow table to parquet
> >   parquet::schema::GroupNode* g =
> >       (parquet::schema::GroupNode*)parquet_schema->group_node();
> >   grp_node.reset(g);  // Don't want to do this!
> >   std::shared_ptr<::arrow::io::FileOutputStream> sink;
> >   ::arrow::io::FileOutputStream::Open(path, &sink);
> >   std::unique_ptr<FileWriter> arrow_writer(
> >       new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));
> >
> >   arrow_writer->WriteTable(*new_table_ptr.get(), 65536);
> >
> > Is this an API limitation that no one has hit before? Or am I missing a
> > better way of writing parquet files given a pre-defined arrow schema?
> >
> > -Sandeep
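
On the raw-pointer problem in the quoted message: one standard C++ option, separate from the parquet::arrow::FileWriter::Open route Wes suggests, is the shared_ptr aliasing constructor. It yields a shared_ptr to an object owned by something else while sharing ownership with that owner, so nothing is copied and nothing is freed twice. The sketch below uses hypothetical stand-in types (GroupNode, SchemaDescriptor and BorrowGroupNode here are simplified illustrations, not the parquet-cpp classes), so treat it as a picture of the idiom rather than working parquet-cpp code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for parquet::schema::GroupNode (NOT the real class).
struct GroupNode {
  int num_fields = 0;
};

// Hypothetical stand-in for parquet::SchemaDescriptor (NOT the real class).
// Like the real descriptor described in the thread, it owns its root node
// and only hands out a raw pointer to it.
class SchemaDescriptor {
 public:
  explicit SchemaDescriptor(int num_fields) { root_.num_fields = num_fields; }
  const GroupNode* group_node() const { return &root_; }

 private:
  GroupNode root_;
};

// Hypothetical helper: returns a shared_ptr that points at the descriptor's
// node but shares the descriptor's control block (aliasing constructor).
// The node is never deleted on its own; the descriptor stays alive for as
// long as the returned pointer does.
std::shared_ptr<const GroupNode> BorrowGroupNode(
    const std::shared_ptr<SchemaDescriptor>& descr) {
  return std::shared_ptr<const GroupNode>(descr, descr->group_node());
}
```

With these stand-ins, the pointer returned by BorrowGroupNode(descr) remains valid even after the caller drops its own reference to the descriptor. Whether the real ParquetFileWriter::Open can take such a pointer (it expects a non-const GroupNode) is not something this sketch establishes.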