Re: reading/writing parquet decimal type

2014-10-23 Thread Michael Allman
Hi Matei, Another thing occurred to me. Will the binary format you're writing sort the data in numeric order? Or would the decimals have to be decoded for comparison? Cheers, Michael On Oct 12, 2014, at 10:48 PM, Matei Zaharia matei.zaha...@gmail.com wrote: The fixed-length binary type
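Michael's sort-order question can be illustrated with a short sketch. Parquet encodes decimal binaries as big-endian two's complement, under which plain byte-wise comparison matches numeric order only while signs agree; mixed-sign data needs the sign bit flipped (or full decoding) before comparison. Helper names below are illustrative, not Spark's API:

```python
def encode(unscaled, width=4):
    """Fixed-width big-endian two's-complement encoding of an unscaled decimal."""
    return unscaled.to_bytes(width, byteorder="big", signed=True)

vals = [-3, -1, 0, 2, 5]
# Negative values start with 0xFF bytes, so they compare *after* positives:
print(sorted(vals, key=encode))  # [0, 2, 5, -3, -1]

def order_preserving(unscaled, width=4):
    """Flip the sign bit so unsigned byte comparison matches numeric order."""
    raw = encode(unscaled, width)
    return bytes([raw[0] ^ 0x80]) + raw[1:]

print(sorted(vals, key=order_preserving))  # [-3, -1, 0, 2, 5]
```

So the raw binary column does not sort numerically as stored; a reader either decodes values for comparison or applies a sign-bit transform like the one sketched here.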

reading/writing parquet decimal type

2014-10-12 Thread Michael Allman
Hello, I'm interested in reading/writing parquet SchemaRDDs that support the Parquet Decimal converted type. The first thing I did was update the Spark parquet dependency to version 1.5.0, as this version introduced support for decimals in parquet. However, conversion between the catalyst
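For context on the Decimal converted type discussed in this thread: a decimal of precision p is stored as its unscaled integer value, and the minimal storage width follows from how many bytes a signed integer needs to hold 10^p - 1. A small sketch (not code from the thread) of that calculation:

```python
def min_bytes_for_precision(precision):
    """Smallest byte count whose signed range covers 10**precision - 1."""
    width = 1
    while 256 ** width // 2 - 1 < 10 ** precision - 1:
        width += 1
    return width

print(min_bytes_for_precision(9))   # 4 -> fits an int32
print(min_bytes_for_precision(18))  # 8 -> fits an int64
```

This is why precisions up to 9 fit an int32 and up to 18 fit an int64 (a long), the boundary that comes up later in the thread.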

Re: reading/writing parquet decimal type

2014-10-12 Thread Matei Zaharia
Hi Michael, I've been working on this in my repo: https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests with these features soon, but meanwhile you can try this branch. See https://github.com/mateiz/spark/compare/decimal for the individual commits that went into it. It

Re: reading/writing parquet decimal type

2014-10-12 Thread Michael Allman
Hi Matei, Thanks, I can see you've been hard at work on this! I examined your patch and do have a question. It appears you're limiting the precision of decimals written to parquet to those that will fit in a long, yet you're writing the values as a parquet binary type. Why not write them using
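The write path Michael is asking about, long-sized precision but binary storage, might look roughly like this. The helper below is a hypothetical sketch, not the code in Matei's branch: it takes a value whose precision fits in a long, extracts the unscaled integer, and emits fixed-length big-endian two's-complement bytes sized to the precision:

```python
from decimal import Decimal

def write_decimal(value, precision, scale):
    """Sketch: encode a decimal that fits in a long as fixed-length binary.
    Width is the smallest byte count holding any value of this precision."""
    unscaled = int(value.scaleb(scale))  # Decimal("12.34"), scale 2 -> 1234
    width = 1
    while 256 ** width // 2 - 1 < 10 ** precision - 1:
        width += 1
    return unscaled.to_bytes(width, byteorder="big", signed=True)

print(write_decimal(Decimal("12.34"), precision=4, scale=2).hex())  # '04d2'
```

Writing the same unscaled value as an int64 instead would just be `unscaled.to_bytes(8, ...)` or a plain long column, which is the alternative Michael is raising.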

Re: reading/writing parquet decimal type

2014-10-12 Thread Matei Zaharia
The fixed-length binary type can hold fewer bytes than an int64, though many encodings of int64 can probably do the right thing. We can look into supporting multiple ways to do this -- the spec does say that you should at least be able to read int32s and int64s. Matei On Oct 12, 2014, at 8:20
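A reader that follows the spec's "at least read int32s and int64s" guidance would accept both storage forms. A minimal sketch (hypothetical helper, not Spark's actual converter) that decodes either a plain integer column value or a fixed-length binary value back to the same decimal:

```python
from decimal import Decimal

def read_decimal(raw, scale):
    """Decode an unscaled decimal stored either as an int (INT32/INT64
    columns) or as big-endian two's-complement bytes (fixed-length binary)."""
    if isinstance(raw, (bytes, bytearray)):
        unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    else:
        unscaled = raw  # already an integer from an int32/int64 column
    return Decimal(unscaled).scaleb(-scale)

print(read_decimal(1234, 2))          # 12.34
print(read_decimal(b"\x04\xd2", 2))   # 12.34
```

Both paths recover the same value, which is what makes supporting multiple physical encodings behind one logical decimal type workable.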