date:20161120

Re: Analyzing and reusing cached Datasets

2016-11-20 Thread Jacek Laskowski

Hi Michael, Thanks a lot for your prompt answer. I greatly appreciate it. Having said that, I think we might be...cough...cough...wrong :) I think the "issue" is in QueryPlan.sameResult [1] as its scaladoc says: * Since its likely undecidable to generally determine if two given plans will
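The scaladoc's point is that a plan-equality check may return false for two plans that actually produce the same result, so a cached Dataset can fail to be reused. The toy model below illustrates that behavior. It is a minimal Python sketch and not Spark's `QueryPlan` code. The tuple plan encoding, the stripping of `#n` expression IDs, and the `same_result` helper are all hypothetical.

```python
import re

# Toy model (NOT Spark's implementation): a plan is a nested tuple such as
# ("Filter", ("gt", "age#1", 21), ("Scan", "people")).

def canonicalize(plan):
    """Strip hypothetical expression IDs like '#1' so cosmetic differences vanish."""
    if isinstance(plan, tuple):
        return tuple(canonicalize(p) for p in plan)
    if isinstance(plan, str):
        return re.sub(r"#\d+$", "", plan)
    return plan

def same_result(a, b):
    """Conservative check: True only when equality is provable structurally."""
    return canonicalize(a) == canonicalize(b)

p1 = ("Filter", ("gt", "age#1", 21), ("Scan", "people"))
p2 = ("Filter", ("gt", "age#7", 21), ("Scan", "people"))
# Same rows as p1 (21 < age is age > 21), but a different tree shape.
p3 = ("Filter", ("lt", 21, "age#7"), ("Scan", "people"))

print(same_result(p1, p2))  # True
print(same_result(p1, p3))  # False: equivalent, but not provably so here
```

Answering `False` when unsure is the safe choice: a false negative only costs a missed cache hit, while a false positive would return wrong data.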

Re: Develop custom Estimator / Transformer for pipeline

2016-11-20 Thread Georg Heiler

The estimator should perform data cleaning tasks. This means some rows will be dropped, some columns dropped, some columns added, and some values replaced in existing columns. It should also store the mean or min for some numeric columns as a NaN replacement. However, override def
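The pattern described here, where an estimator learns a statistic such as a column mean during fitting and a fitted model applies it later, mirrors Spark ML's contract: `Estimator.fit` returns a `Model`, which is a `Transformer`. A minimal plain-Python sketch of only the NaN-replacement part follows, without Spark. The class names `MeanImputer` and `MeanImputerModel`, and the representation of rows as dicts, are hypothetical.

```python
import math

def _missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

class MeanImputerModel:
    """Fitted 'transformer': holds the per-column means learned by fit()."""
    def __init__(self, means):
        self.means = means

    def transform(self, rows):
        out = []
        for row in rows:
            new_row = dict(row)  # do not mutate the input rows
            for col, mean in self.means.items():
                if _missing(new_row.get(col)):
                    new_row[col] = mean
            out.append(new_row)
        return out

class MeanImputer:
    """'Estimator': learns the mean of each listed column, ignoring missing values."""
    def __init__(self, cols):
        self.cols = cols

    def fit(self, rows):
        means = {}
        for col in self.cols:
            values = [r[col] for r in rows if not _missing(r.get(col))]
            means[col] = sum(values) / len(values) if values else float("nan")
        return MeanImputerModel(means)

rows = [{"age": 10.0}, {"age": float("nan")}, {"age": 30.0}]
model = MeanImputer(["age"]).fit(rows)
out = model.transform(rows)
print([r["age"] for r in out])  # [10.0, 20.0, 30.0]
```

Keeping the learned state in a separate model object means the same statistics computed on training data are reused unchanged when transforming new data.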

github mirroring is broken

2016-11-20 Thread Reynold Xin

FYI: GitHub mirroring from Apache's official git repo to GitHub has been broken since Sat Nov 19, and as a result GitHub is now stale. Merged pull requests won't show up in GitHub until ASF infra fixes the issue.

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-20 Thread Aniket

Was anyone able to find a solution or recommended conf for this? I am running into the same "java.lang.OutOfMemoryError: Direct buffer memory" but during snappy compression. Thanks, Aniket On Tue, Sep 23, 2014 at 7:04 PM Aaron Davidson [via Apache Spark Developers List]

Re:Re: Multiple streaming aggregations in structured streaming

2016-11-20 Thread Xinyu Zhang

MapWithState is also very useful. I want to calculate UV (unique visitors) in real time, but "distinct count" and "multiple streaming aggregations" are not supported. Is there any method to calculate real-time UV in the current version? At 2016-11-19 06:01:45, "Michael Armbrust"
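One way to get a running unique-visitor (UV) count without a built-in streaming distinct count is the `mapWithState` style: keep per-key state across micro-batches and update it with each batch. A minimal plain-Python sketch of that idea follows, without Spark. The `update_uv` helper, the `(page, user_id)` event shape, and the use of an exact set as state are assumptions for illustration.

```python
from collections import defaultdict

def update_uv(state, batch):
    """Hypothetical mapWithState-style update.

    state: dict mapping page -> set of user IDs seen so far (kept across batches)
    batch: list of (page, user_id) events in one micro-batch
    Returns the current exact UV count per page.
    """
    for page, user in batch:
        state[page].add(user)
    return {page: len(users) for page, users in state.items()}

state = defaultdict(set)
first = update_uv(state, [("home", "u1"), ("home", "u2"), ("cart", "u1")])
second = update_uv(state, [("home", "u1"), ("cart", "u3")])
print(first)   # {'home': 2, 'cart': 1}
print(second)  # {'home': 2, 'cart': 2}
```

Storing every user ID gives exact counts but the state grows without bound; a real job would need a timeout or windowing to evict old keys, or an approximate sketch (such as HyperLogLog) that trades accuracy for fixed memory.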

