Re: Thoughts on release cadence?

2017-07-30 Thread Reynold Xin
This is reasonable ... +1


On Sun, Jul 30, 2017 at 2:19 AM, Sean Owen  wrote:

> The project had traditionally posted some guidance about upcoming
> releases. The last release cycle was about 6 months. What about penciling
> in December 2017 for 2.3.0? http://spark.apache.org/versioning-policy.html
>


Failing to write a data-frame containing a UDT to parquet format

2017-07-30 Thread Erik Erlandson
I'm trying to support parquet i/o for data-frames that contain a UDT (for
t-digests). The UDT is defined here:

https://github.com/erikerlandson/isarn-sketches-spark/blob/feature/pyspark/src/main/scala/org/apache/spark/isarnproject/sketches/udt/TDigestUDT.scala#L37

I can read and write using 'objectFile', but when I try to use
'...write.parquet(...)' I get failures I can't make sense of. The full
stack dump is here:
https://gist.github.com/erikerlandson/054652fc2d34ef896717124991196c0e

Following is the first portion of the dump.  The associated error message
is: "failure: `TimestampType' expected but `{' found"

scala> val data = sc.parallelize(Seq(1,2,3,4,5)).toDF("x")
data: org.apache.spark.sql.DataFrame = [x: int]

scala> val udaf = tdigestUDAF[Double].maxDiscrete(10)
udaf: org.isarnproject.sketches.udaf.TDigestUDAF[Double] =
TDigestUDAF(0.5,10)

scala> val agg = data.agg(udaf($"x").alias("tdigest"))
agg: org.apache.spark.sql.DataFrame = [tdigest: tdigest]

scala> agg.show()
+--------------------+
|             tdigest|
+--------------------+
|TDigestSQL(TDiges...|
+--------------------+

scala> agg.write.parquet("/tmp/agg.parquet")
2017-07-30 13:32:13 ERROR Utils:91 - Aborting task
java.lang.IllegalArgumentException: Unsupported dataType:
{"type":"struct","fields":[{"name":"tdigest","type":{"type":"udt","class":"org.apache.spark.isarnproject.sketches.udt.TDigestUDT$","pyClass":"isarnproject.sketches.udt.tdigest.TDigestUDT","sqlType":{"type":"struct","fields":[{"name":"delta","type":"double","nullable":false,"metadata":{}},{"name":"maxDiscrete","type":"integer","nullable":false,"metadata":{}},{"name":"nclusters","type":"integer","nullable":false,"metadata":{}},{"name":"clustX","type":{"type":"array","elementType":"double","containsNull":false},"nullable":false,"metadata":{}},{"name":"clustM","type":{"type":"array","elementType":"double","containsNull":false},"nullable":false,"metadata":{}}]}},"nullable":true,"metadata":{}}]},
[1.1] failure: `TimestampType' expected but `{' found
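[Editor's note: the dataType string in the dump is itself valid JSON. A minimal Python sketch (outside Spark, purely for illustration) that parses that exact string shows the `udt` node involved; the error reads as if a parser expecting bare type names like `TimestampType` was handed this JSON form instead, though that interpretation is an assumption, not confirmed by the thread.]

```python
import json

# The dataType string from the stack dump above (trailing comma dropped),
# split across literals only for readability.
schema_json = (
    '{"type":"struct","fields":[{"name":"tdigest","type":{"type":"udt",'
    '"class":"org.apache.spark.isarnproject.sketches.udt.TDigestUDT$",'
    '"pyClass":"isarnproject.sketches.udt.tdigest.TDigestUDT",'
    '"sqlType":{"type":"struct","fields":['
    '{"name":"delta","type":"double","nullable":false,"metadata":{}},'
    '{"name":"maxDiscrete","type":"integer","nullable":false,"metadata":{}},'
    '{"name":"nclusters","type":"integer","nullable":false,"metadata":{}},'
    '{"name":"clustX","type":{"type":"array","elementType":"double",'
    '"containsNull":false},"nullable":false,"metadata":{}},'
    '{"name":"clustM","type":{"type":"array","elementType":"double",'
    '"containsNull":false},"nullable":false,"metadata":{}}]}},'
    '"nullable":true,"metadata":{}}]}'
)

schema = json.loads(schema_json)
field = schema["fields"][0]

# The top-level field is the UDT; its underlying storage is the sqlType struct.
print(field["name"])                     # tdigest
print(field["type"]["type"])             # udt
print(field["type"]["sqlType"]["type"])  # struct
```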


Re: Thoughts on release cadence?

2017-07-30 Thread Dongjoon Hyun
+1

Bests,
Dongjoon

On Sun, Jul 30, 2017 at 02:20 Sean Owen  wrote:

> The project had traditionally posted some guidance about upcoming
> releases. The last release cycle was about 6 months. What about penciling
> in December 2017 for 2.3.0? http://spark.apache.org/versioning-policy.html
>


Thoughts on release cadence?

2017-07-30 Thread Sean Owen
The project had traditionally posted some guidance about upcoming releases.
The last release cycle was about 6 months. What about penciling in December
2017 for 2.3.0? http://spark.apache.org/versioning-policy.html