Re: Apache Spark - MLLib challenges

2017-09-23 Thread vaquar khan
MLlib is the old RDD-based API. Since Apache Spark 2, it is recommended to use the Dataset-based APIs, introduced as ML, to get good performance. ML contains the new API built around Datasets and ML Pipelines; mllib is slowly being deprecated (this has already happened in the case of linear regression). MLlib currently

Re: Apache Spark - MLLib challenges

2017-09-23 Thread Koert Kuipers
Our main challenge has been the general lack of support for missing values.

On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli wrote:
> Dear All,
>
> We are looking to position MLlib in our organisation for machine learning
> tasks and are keen to understand if there are

Re: Apache Spark - MLLib challenges

2017-09-23 Thread Aseem Bansal
This is something I wrote specifically about the challenges we faced when taking Spark ML models to production:
http://www.tothenew.com/blog/when-you-take-your-machine-learning-models-to-production-for-real-time-predictions/

On Sat, Sep 23, 2017 at 1:33 PM, Jörn Franke

Re: Apache Spark - MLLib challenges

2017-09-23 Thread Jörn Franke
As far as I know, there is currently no in-memory encryption in Spark. There are some research projects to create secure in-memory enclaves based on Intel SGX, but there is still a lot to do in terms of performance and security objectives. The more interesting question is why would you need this

Apache Spark - MLLib challenges

2017-09-23 Thread Irfan Kabli
Dear All, We are looking to position MLlib in our organisation for machine learning tasks and are keen to understand if there are any challenges that you might have seen with MLlib in production. We will be going with the pure open-source approach here, rather than using one of the Hadoop