On Tue, Oct 29, 2019 at 3:11 AM <roman.karlstet...@gmail.com> wrote:
>
> Hi Wes,
>
> thanks for the response. There's one thing that is still a little unclear to
> me: I had a look at the code for the function WriteArrowSerialize<FLBAType,
> arrow::Decimal128Type> in the reference you provided. I don't have Arrow data
> in the first place, but as I understand it, I need to have an array of
> FixedLenByteArray objects which then point to the actual decimal values in
> the big_endian_values buffer. Is this the only way to write decimal types, or
> is it also possible to directly provide an array of values to WriteBatch()?
>
Could you clarify what you mean by "an array"? If you use the
arrow::Array-based write API, then it will invoke this serializer
specialization:

https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1569

That's what we're calling when writing an arrow::Decimal128Array (if I'm not
mistaken, since I just worked on this code recently). If you set a breakpoint
there with gdb, you can see the call stack.

> For the issues, I also found
> https://issues.apache.org/jira/browse/ARROW-6990, but I'm not sure if this is
> also related to the issues you created.
>
> Thanks,
> Roman
>
> -----Original Message-----
> From: Wes McKinney <wesmck...@gmail.com>
> Sent: Monday, October 28, 2019 21:11
> To: dev <dev@arrow.apache.org>
> Subject: Re: State of decimal support in Arrow (from/to Parquet Decimal
> Logicaltype)
>
> hi Roman,
>
> On Mon, Oct 28, 2019 at 5:56 AM <roman.karlstet...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I have a question about the state of decimal support in Arrow when
> > reading from/writing to Parquet.
> >
> > * Is writing decimals to Parquet supposed to work? Are there any
> >   examples of how to do this in C++?
>
> Yes, it's supported; the details are here:
>
> https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1511
>
> > * When reading decimals in a Parquet file with pyarrow and converting
> > the resulting table to a pandas DataFrame, the datatype in the cells is
> > "object". As a consequence, performance when doing analysis on this
> > table is suboptimal. Can I somehow directly get the decimals from the
> > Parquet file into floats/doubles in a pandas DataFrame?
>
> Some work will be required. The cleanest way would be to cast
> decimal128 columns to float32/float64 prior to converting to pandas.
> I didn't see an issue for this right away, so I opened
> https://issues.apache.org/jira/browse/ARROW-7010
>
> I also opened
> https://issues.apache.org/jira/browse/ARROW-7011
> about going the other way. This would be a useful thing to contribute to the
> project.
>
> Thanks
> Wes
>
> > Thanks in advance,
> >
> > Roman