Re: Maintenance releases for SPARK-23852?

2018-04-17 Thread Dongjoon Hyun
Since it's a backport from master to branch-2.3 for ORC 1.4.3, I made a backport PR. https://github.com/apache/spark/pull/21093 Thank you for raising this issue and confirming, Henry and Xiao. :) Bests, Dongjoon. On Tue, Apr 17, 2018 at 12:01 AM, Xiao Li wrote: > Yes,

unsubscribe

2018-04-17 Thread 韩盼
unsubscribe - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Sort-merge join improvement

2018-04-17 Thread Petar Zecevic
Hello everybody, we (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved the performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the

Re: [MLLib] Logistic Regression and standardization

2018-04-17 Thread Weichen Xu
Not a bug. When standardization is disabled, MLlib LR still standardizes the features internally, but it scales the coefficients back at the end (after training has finished). So it gets the same result as training without standardization. The purpose is to improve the rate of convergence. So
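The scale-back step described in the reply above can be illustrated with a small sketch. This is not Spark's code: it is plain Python with a toy gradient-descent trainer standing in for the real optimizer, and the helper names (`column_std`, `train_lr`) are hypothetical. It also assumes that "standardization" here means dividing each feature by its standard deviation without centering. Under that assumption, a model trained on scaled features x/σ with coefficients w_std is the same model as one on the original features with coefficients w_std/σ.

```python
import math
import random

def column_std(rows):
    """Sample standard deviation of each feature column."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(rows, labels, steps=300, lr=0.1):
    """Toy batch gradient descent on unregularized log-loss (illustration only)."""
    n, dims = len(rows), len(rows[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dims, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dims):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Synthetic data with two features on very different scales.
rng = random.Random(0)
rows = [[rng.random(), rng.random() * 1000.0] for _ in range(200)]
labels = [1.0 if x[0] + x[1] / 1000.0 > 1.0 else 0.0 for x in rows]

# Step 1: scale features internally (assumed: divide by std, no centering).
sigma = column_std(rows)
scaled = [[xj / sj for xj, sj in zip(x, sigma)] for x in rows]

# Step 2: train in the scaled space, where the problem is better conditioned.
w_std, b = train_lr(scaled, labels)

# Step 3: scale the coefficients back so they apply to the original features.
w_orig = [wj / sj for wj, sj in zip(w_std, sigma)]
```

Because w_std · (x/σ) equals (w_std/σ) · x, the coefficients in `w_orig` give the same margin on the raw features as `w_std` gives on the scaled ones, so the caller sees a model expressed in the original feature space even though the optimizer worked in the scaled one.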

Re: [discuss][data source v2] remove type parameter in DataReader/WriterFactory

2018-04-17 Thread Wenchen Fan
Yea definitely not. The only requirement is, the DataReader/WriterFactory must support at least one DataFormat. > how are we going to express capability of the given reader of its supported format(s), or specific support for each of “real-time data in row format, and history data in columnar