On Tue, Oct 29, 2019 at 3:11 AM <roman.karlstet...@gmail.com> wrote:
>
> Hi Wes,
>
> thanks for the response. There's one thing that is still a little unclear to
> me: I had a look at the code for the function WriteArrowSerialize<FLBAType,
> arrow::Decimal128Type> in the reference you provided. I don't have Arrow data
> in the first place, but as I understand it, I need to have an array of
> FixedLenByteArray objects which then point to the actual decimal values in
> the big_endian_values buffer. Is this the only way to write decimal types, or
> is it also possible to directly provide an array of values to WriteBatch()?
>
Could you clarify what you mean by "an array"? If you use the
arrow::Array-based write API, then it will invoke this serializer
specialization:

https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1569

That's what we're calling when writing an arrow::Decimal128Array (if I'm not
mistaken, since I just worked on this code recently). If you set a breakpoint
there with gdb, you can see the call stack.

> For the issues, I also found
> https://issues.apache.org/jira/browse/ARROW-6990, but I'm not sure if this is
> also related to the issues you created.
>
> Thanks,
> Roman
>
> -----Original Message-----
> From: Wes McKinney <wesmck...@gmail.com>
> Sent: Monday, October 28, 2019 21:11
> To: dev <dev@arrow.apache.org>
> Subject: Re: State of decimal support in Arrow (from/to Parquet Decimal
> Logicaltype)
>
> hi Roman,
>
> On Mon, Oct 28, 2019 at 5:56 AM <roman.karlstet...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I have a question about the state of decimal support in Arrow when
> > reading from/writing to Parquet.
> >
> > * Is writing decimals to Parquet supposed to work? Are there any
> >   examples of how to do this in C++?
>
> Yes, it's supported; the details are here:
>
> https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1511
>
> > * When reading decimals in a Parquet file with pyarrow and converting
> > the resulting table to a pandas DataFrame, the datatype in the cells is
> > "object". As a consequence, performance when doing analysis on this
> > table is suboptimal. Can I somehow directly get the decimals from the
> > Parquet file into floats/doubles in a pandas DataFrame?
>
> Some work will be required. The cleanest way would be to cast
> decimal128 columns to float32/float64 prior to converting to pandas.
> I didn't see an issue for this right away, so I opened
> https://issues.apache.org/jira/browse/ARROW-7010
>
> I also opened
> https://issues.apache.org/jira/browse/ARROW-7011
> about going the other way. This would be a useful thing to contribute to the
> project.
>
> Thanks
> Wes
>
> > Thanks in advance,
> >
> > Roman