Re: [c++][compute]Is there any other way to use Join besides Acero?

2022-09-14 Thread Weston Pace
Within Arrow-C++ that is the only way I am aware of. You might be able to use DuckDB. It should be able to scan Parquet files. Is this the same program that you shared before? Were you able to figure out threading? Can you create a JIRA with some sample input files and a reproducible example?

[c++][compute]Is there any other way to use Join besides Acero?

2022-09-14 Thread 1057445597
Acero performs poorly, and core dumps occur frequently. In the scenario I'm working on, I'll read one Parquet file and then several other Parquet files. These files will have the same column name (UUID). I need to join (by UUID), project (remove UUID), and filter (some custom filtering) the
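(Editor's illustration, not part of the original message.) The join, project, and filter steps described above can be sketched in plain C++ with only the standard library. This is not Arrow or Acero code and makes no claim about how Acero implements its join. The row types `LeftRow`, `RightRow`, and `Joined`, their fields, and the function `JoinProjectFilter` are hypothetical, since the thread does not show the real schemas.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row types; the real Parquet schemas are not shown in the thread.
struct LeftRow {
  std::string uuid;
  int feature;
};
struct RightRow {
  std::string uuid;
  double score;
};
// Output row: the UUID column has been projected away.
struct Joined {
  int feature;
  double score;
};

// Inner hash join on uuid, followed by projection and a caller-supplied filter.
template <typename Pred>
std::vector<Joined> JoinProjectFilter(const std::vector<LeftRow>& left,
                                      const std::vector<RightRow>& right,
                                      Pred keep) {
  // Build phase: index the right-hand rows by UUID (duplicates allowed).
  std::unordered_multimap<std::string, const RightRow*> index;
  index.reserve(right.size());
  for (const auto& r : right) {
    index.emplace(r.uuid, &r);
  }

  // Probe phase: look up each left-hand row and emit the matching pairs.
  std::vector<Joined> out;
  for (const auto& l : left) {
    auto range = index.equal_range(l.uuid);
    for (auto it = range.first; it != range.second; ++it) {
      Joined row{l.feature, it->second->score};  // project: drop the UUID
      if (keep(row)) {                           // filter: custom predicate
        out.push_back(row);
      }
    }
  }
  return out;
}
```

A caller would pass the two row vectors and a predicate such as `[](const Joined& j) { return j.score > 0.5; }`. Rows whose UUID appears on only one side are dropped, because this sketch is an inner join.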

Re: [C++] How to write a null value to a int64 column with Parquet StreamWriter?

2022-09-14 Thread Micah Kornfield
> > terminate called after throwing an instance of 'parquet::ParquetException'
> what(): Column converted type mismatch. Column 'field_name' has
> converted type[NONE] not 'INT_64'
I think this is probably a bug in the streaming library where it should also be checking on LogicalType, it has

Re: [C++] How to write a null value to a int64 column with Parquet StreamWriter?

2022-09-14 Thread Arun Joseph
I've tried the following schema:
    fields.push_back(
        parquet::schema::PrimitiveNode::Make(
            "field_name",
            parquet::Repetition::OPTIONAL,
            parquet::LogicalType::Timestamp(true, parquet::LogicalType::TimeUnit::NANOS),

Re: [C++] How to write a null value to a int64 column with Parquet StreamWriter?

2022-09-14 Thread Micah Kornfield
I'm not sure how it works with null elements, but passing a LogicalType of timestamp with isAdjustedToUtc=true and nanosecond units when creating the schema would be the most likely thing to work. The fact that nullopt doesn't work seems like an oversight that might be nice to address if you would

Re: Run Arrow Flight server along with existing grpc server

2022-09-14 Thread Matthew Topol
This is also possible in Go, just FYI.
On Tue, Sep 13 2022 at 05:07:55 PM -0700, Haojin Gui wrote:
Thank you very much, David!
On Tue, Sep 13, 2022 at 4:56 PM David Li wrote:
Hi Haojin, This is possible in C++ and Java but not Python (because of how gRPC

Re: [C++] How to write a null value to a int64 column with Parquet StreamWriter?

2022-09-14 Thread Arun Joseph
Hi Micah, I couldn't find arrow::util::Optional::nullopt, but I did find arrow::util::nullopt, which also did not seem to work. However, I then found arrow::util::optional() right afterwards, which seems to output NaNs! I do see that the resulting dataframe when loaded in pandas has the column dtype as

[C++][Gandiva] string expression evaluation performance issue using mimalloc

2022-09-14 Thread Jiangtao Peng
Hi all, Arrow uses jemalloc as its default memory allocator. For some reason, I am going to use mimalloc instead. But there seems to be a big performance difference between the two memory allocators. Here are my steps. I use simple compile options: -DCMAKE_BUILD_TYPE=debug \