Hi Yanbo,

As long as two models fit into the memory of a single machine, there should be no 
problems, so even 16GB machines can handle large models. (The master should have 
more memory because it runs LBFGS.) In my experiments, I've trained models with 
12M and 32M parameters without issues.
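
For a rough sense of where those numbers come from, here is a back-of-the-envelope 
sketch in Scala (the layer configurations below are made-up examples, not the exact 
networks I trained): each adjacent pair of layers of sizes nIn and nOut contributes 
nIn * nOut weights plus nOut biases.

// Rough parameter count for a fully connected MLP:
// each adjacent layer pair (nIn, nOut) adds nIn * nOut weights + nOut biases.
def mlpParamCount(layers: Seq[Int]): Long =
  layers.sliding(2).map { case Seq(nIn, nOut) => nIn.toLong * nOut + nOut }.sum

// Hypothetical layer setups, just to illustrate the scale:
println(mlpParamCount(Seq(784, 10000, 1000, 10)))  // ~17.9M parameters
println(mlpParamCount(Seq(784, 4000, 3000, 10)))   // ~15.2M parameters

With 8-byte doubles, ~16M parameters is roughly 128MB per copy of the weight 
vector, and LBFGS on the driver keeps a history of several such vectors, which is 
why the master needs the extra headroom.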

Best regards, Alexander

From: Yanbo Liang [mailto:yblia...@gmail.com]
Sent: Sunday, December 27, 2015 2:23 AM
To: Joseph Bradley
Cc: Eugene Morozov; user; d...@spark.apache.org
Subject: Re: SparkML algos limitations question.

Hi Eugene,

AFAIK, the current implementation of MultilayerPerceptronClassifier has some 
scalability problems if the model is very large (such as >10M parameters), although 
I think models within that limit already cover many use cases.
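
For reference, the model size here is determined entirely by the layers parameter; 
a minimal sketch (the layer sizes and settings below are placeholders, not a 
recommendation):

import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// Layer sizes drive the parameter count: 780 inputs, two hidden layers, 10 classes.
// These sizes and the default "features"/"label" columns are for illustration only.
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array(780, 1000, 500, 10))
  .setMaxIter(100)
  .setBlockSize(128)
  .setSeed(1234L)

// val model = mlp.fit(trainingDF)  // trainingDF: a DataFrame with "features"/"label"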

Yanbo

2015-12-16 6:00 GMT+08:00 Joseph Bradley <jos...@databricks.com>:
Hi Eugene,

The maxDepth parameter exists because the implementation uses Integer node IDs 
which correspond to positions in the binary tree.  This simplified the 
implementation.  I'd like to eventually modify it to avoid depending on tree 
node IDs, but that is not yet on the roadmap.
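
To make that concrete: with the usual heap-style indexing of a complete binary tree, 
the node IDs through depth d run up to 2^(d+1) - 1, so a signed 32-bit Int runs out 
right around depth 30. This is just the arithmetic, not the actual tree code:

// Node count (= largest ID) of a complete binary tree through depth d: 2^(d+1) - 1.
def nodesThroughDepth(d: Int): Long = (1L << (d + 1)) - 1

println(nodesThroughDepth(29))  // 1073741823 -- fits in an Int
println(nodesThroughDepth(30))  // 2147483647 -- exactly Int.MaxValue
println(nodesThroughDepth(31))  // 4294967295 -- overflows an Int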

There is not an analogous limit for the GLMs you listed, but I'm not very 
familiar with the perceptron implementation.

Joseph

On Mon, Dec 14, 2015 at 10:52 AM, Eugene Morozov 
<evgeny.a.moro...@gmail.com> wrote:
Hello!

I'm currently working on a POC and trying to use Random Forest (classification and 
regression). I also have to check SVM and the multiclass perceptron (other algos 
are less important at the moment). So far I've discovered that Random Forest 
has a maxDepth limitation for trees, and just out of curiosity I wonder why 
such a limitation was introduced?
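
For reference, this is roughly where I ran into it (toy parameter values, just to 
show which knob I mean):

import org.apache.spark.ml.classification.RandomForestClassifier

// maxDepth is the parameter the implementation caps; the numTrees/maxDepth values
// here are arbitrary placeholders.
val rf = new RandomForestClassifier()
  .setNumTrees(100)
  .setMaxDepth(30)  // the implementation enforces an upper bound on maxDepth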

The actual question is that I'm going to use Spark ML in production next year 
and would like to know whether there are other limitations like maxDepth in RF 
for other algorithms: Logistic Regression, Perceptron, SVM, etc.

Thanks in advance for your time.
--
Be well!
Jean Morozov
