Hi Matei,
Another thing occurred to me. Will the binary format you're writing sort the
data in numeric order? Or would the decimals have to be decoded for comparison?
Cheers,
Michael
On Oct 12, 2014, at 10:48 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> The fixed-length binary type …
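Michael's sort-order question comes down to how the unscaled value is laid out in bytes. Below is a small illustration (my own sketch, not code from the patch): a fixed 8-byte big-endian two's complement encoding, compared byte-by-byte the way an unsigned lexicographic comparator would. Positive values sort correctly, but a negative value's leading 0xFF sign byte breaks numeric order, so raw byte comparison alone is not enough without decoding or sign-bit handling.

```java
public class DecimalSortOrder {
    // Encode an unscaled decimal value as a fixed 8-byte big-endian
    // two's complement array, the general layout Parquet's DECIMAL
    // converted type describes for binary-backed values.
    static byte[] encode(long unscaled) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) unscaled;
            unscaled >>= 8;
        }
        return out;
    }

    // Unsigned byte-by-byte comparison, i.e. how a format-agnostic
    // reader would order raw binary values.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Positive values compare in numeric order...
        System.out.println(compareBytes(encode(1L), encode(2L)) < 0);  // true
        // ...but -1 encodes as 0xFF..FF, so raw byte order puts it
        // AFTER 2: numeric order is broken for negatives.
        System.out.println(compareBytes(encode(-1L), encode(2L)) > 0); // true
    }
}
```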
Hello,
I'm interested in reading/writing Parquet SchemaRDDs that support the Parquet
Decimal converted type. The first thing I did was update the Spark Parquet
dependency to version 1.5.0, as this version introduced support for decimals in
Parquet. However, conversion between the Catalyst …
Hi Michael,
I've been working on this in my repo:
https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests with
these features soon, but meanwhile you can try this branch. See
https://github.com/mateiz/spark/compare/decimal for the individual commits that
went into it. It …
Hi Matei,
Thanks, I can see you've been hard at work on this! I examined your patch and
do have a question. It appears you're limiting the precision of decimals
written to Parquet to those that will fit in a long, yet you're writing the
values as a Parquet binary type. Why not write them using an int64?
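On the precision limit mentioned above: the largest 18-digit unscaled value is 10^18 - 1, which is below Long.MAX_VALUE (about 9.22 * 10^18), so any decimal with precision up to 18 can carry its unscaled value in a long, while precision 19 cannot be guaranteed to fit. A quick arithmetic check (my own illustration, not code from the patch):

```java
public class PrecisionFitsInLong {
    public static void main(String[] args) {
        // Largest unscaled value at precision 18: 10^18 - 1.
        long maxPrec18 = 1_000_000_000_000_000_000L - 1L;
        // It fits below Long.MAX_VALUE (~9.22e18)...
        System.out.println(maxPrec18 < Long.MAX_VALUE);              // true
        // ...but Long.MAX_VALUE itself has only 19 digits, so a
        // 19-digit unscaled value can already overflow a long.
        System.out.println(String.valueOf(Long.MAX_VALUE).length()); // 19
    }
}
```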
The fixed-length binary type can hold fewer bytes than an int64, though many
encodings of int64 can probably do the right thing. We can look into supporting
multiple ways to do this -- the spec does say that you should at least be able
to read int32s and int64s.
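Matei's point that fixed-length binary can hold fewer bytes than an int64 can be seen with `BigInteger.toByteArray()`, which produces the minimal big-endian two's complement encoding (the byte layout the Parquet DECIMAL spec describes for binary values). A sketch under that assumption, not the branch's actual writer:

```java
import java.math.BigInteger;

public class MinimalDecimalBytes {
    public static void main(String[] args) {
        // Unscaled value 12345 (e.g. decimal 123.45 with scale 2):
        // minimal two's complement needs only 2 bytes, versus the
        // 8 bytes a raw int64 always occupies on disk.
        byte[] small = BigInteger.valueOf(12345L).toByteArray();
        System.out.println(small.length);  // 2

        // Even the largest precision-18 unscaled value (10^18 - 1)
        // still fits in 8 bytes, so binary never does worse than int64.
        byte[] big = BigInteger.valueOf(999_999_999_999_999_999L).toByteArray();
        System.out.println(big.length);    // 8
    }
}
```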
Matei
On Oct 12, 2014, at 8:20 …