Re: Spark Improvement Proposals

2017-02-17 Thread vaquar khan
I like the document and am happy to see the SPIP draft version; however, I feel the shepherd role is again a hurdle in process improvement. It's as if everything depends only on the shepherd. I also want to add that a SPIP should be time-bound with a defined SLA, or else it will defeat the purpose. Regards, Vaquar khan On

Will .count() always trigger an evaluation of each row?

2017-02-17 Thread Nicholas Chammas
Especially during development, people often use .count() or .persist().count() to force evaluation of all rows (exposing any problems, e.g. due to bad data) and to load data into cache to speed up subsequent operations. But as the optimizer gets smarter, I'm guessing it will eventually learn

Re: Design document - MLlib's statistical package for DataFrames

2017-02-17 Thread Tim Hunter
Hi Brad, this task is focusing on moving the existing algorithms, so that we are not held up by parity issues. Do you have some paper suggestions for cardinality? I do not think there is a feature request on JIRA either. Tim On Thu, Feb 16, 2017 at 2:21 PM, bradc wrote: >