Re: Integrating ML/DL frameworks with Spark

2018-05-09 Thread Xiangrui Meng
Shivaram: Yes, we can call it "gang scheduling" or "barrier synchronization". Spark doesn't support it now. The proposal is to have proper support in Spark's job scheduler, so we can integrate well with MPI-like frameworks. On Tue, May 8, 2018 at 11:17 AM Nan Zhu wrote:

Problem with Spark Master shutting down when zookeeper leader is shutdown

2018-05-09 Thread agateaaa
Dear Spark community, Just wanted to bring up this issue, which was filed for Spark 1.6.1 (https://issues.apache.org/jira/browse/SPARK-15544) but also exists in Spark 2.3.0 (https://issues.apache.org/jira/browse/SPARK-23530). We have run into this in production, where Spark Master shuts down if

Revisiting Online serving of Spark models?

2018-05-09 Thread Holden Karau
Hi y'all, With the renewed interest in ML in Apache Spark, now seems like as good a time as any to revisit the online serving situation in Spark ML. DB & others have done some excellent work moving a lot of the necessary tools into a local linear algebra package that doesn't depend on having a

Re: eager execution and debuggability

2018-05-09 Thread Tim Hunter
The repr() trick is neat when working in a notebook. When working in a library, I used to use an evaluate(dataframe) -> DataFrame function that simply forces the materialization of a dataframe. As Reynold mentions, this is very convenient when working on a lot of chained UDFs, and it is a standard