Re: data source api v2 refactoring

2018-09-01 Thread Mridul Muralidharan
Is it only me, or are all others getting Wenchen’s mails? (Obviously Ryan did :-) ) I did not see it in the mail thread I received or in the archives ... [1] Wondering which other senders were getting dropped (if yes). Regards, Mridul [1]

Re: data source api v2 refactoring

2018-09-01 Thread Ryan Blue
Thanks for clarifying, Wenchen. I think that's what I expected. As for the abstraction, here's the way that I think about it: there are two important parts of a scan: the definition of what will be read, and task sets that actually perform the read. In batch, there's one definition of the scan

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-01 Thread sadhen
https://github.com/apache/spark/pull/22308 https://github.com/apache/spark/pull/22310 These two might be the last fixes for Scala 2.12 :) Please review. Original message From: Sean Owen sro...@apache.org To: antonkulaga antonkul...@gmail.com Cc: dev...@spark.apache.org Sent: Friday, August 31, 2018, 05:00 Subject: Re: code

Re: mllib + SQL

2018-09-01 Thread Hemant Bhanawat
SQL, in addition to simplicity, also provides a standard way of doing analysis across multiple databases. That aspect is something that users would like with machine learning as well. The flexibility of Spark's API is definitely helpful, but a simple and standard way for new users is desired when it comes to