Re: [ML] Allow CrossValidation ParamGrid on SVMWithSGD

2018-01-19 Thread Nick Pentreath
SVMWithSGD sits in the older "mllib" package and is not directly compatible
with the DataFrame API. I suppose one could write an ML API wrapper around
it.

However, there is LinearSVC in Spark 2.2.x:
http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-support-vector-machine

I would say you should use that instead.
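
As a rough illustration of that suggestion, here is a minimal spark-shell style sketch of LinearSVC used with ParamGridBuilder and CrossValidator. The grid values, the fold count, and the DataFrame name "training" are assumptions made for the example, not recommendations from this thread.

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val svc = new LinearSVC()

// Unlike SVMWithSGD, LinearSVC exposes regParam and maxIter as Params,
// so they can be passed to addGrid directly. Values here are arbitrary.
val paramGrid = new ParamGridBuilder()
  .addGrid(svc.regParam, Array(0.001, 0.01, 0.1))
  .addGrid(svc.maxIter, Array(50, 100, 200))
  .build()

// The evaluator reads the rawPrediction column that LinearSVC produces
// and scores by area under the ROC curve by default.
val cv = new CrossValidator()
  .setEstimator(svc)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// "training" is a hypothetical DataFrame with "label" and "features" columns:
// val cvModel = cv.fit(training)
```

Nothing above needs a SparkSession until cv.fit is called, so the grid can be inspected before any training starts.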

On Fri, 19 Jan 2018 at 13:59 Tomasz Dudek wrote:

> Hello,
>
> Is there any way to use CrossValidator's ParamGrid with SVMWithSGD?
>
> Usually, e.g. with RandomForestClassifier, you can specify a lot of
> parameters to automate the grid search (when used with CrossValidator):
>
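> import org.apache.spark.ml.classification.RandomForestClassifier
> import org.apache.spark.ml.tuning.ParamGridBuilder
>
> val algorithm = new RandomForestClassifier()
> // Each addGrid call takes a Param of the estimator and the values to try.
> val paramGrid = new ParamGridBuilder()
>   .addGrid(algorithm.impurity, Array("gini", "entropy"))
>   .addGrid(algorithm.maxDepth, Array(3, 5, 10))
>   .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
>   .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
>   .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
>   .build()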
>
> With SVMWithSGD, however, the parameters live inside GradientDescent. You
> can set them explicitly, either through SVMWithSGD's constructor or by
> calling setters on the optimizer:
>
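> import org.apache.spark.mllib.classification.SVMWithSGD
>
> val algorithm = new SVMWithSGD()
> // miniBatchFraction is the fraction of the data used per iteration, in (0, 1].
> algorithm.optimizer.setMiniBatchFraction(1.0)
>   .setNumIterations(200)
>   .setRegParam(0.01)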
>
> Both approaches, however, prevent me from using ParamGridBuilder.
>
> There are no such things as algorithm.optimizer.numIterations or
> algorithm.optimizer.regParam, only setters (and ParamGridBuilder requires
> Params, not setters).
>
> I could of course create each SVM model manually, build one huge Pipeline
> with each model writing its result to a different column, and then manually
> decide which performed best. That requires a lot of coding, and so far
> CrossValidator's ParamGrid has done that job for me.
>
> Am I missing something? Is it WIP or is there any hack to do that?
>
> Yours,
> Tomasz
>


[ML] Allow CrossValidation ParamGrid on SVMWithSGD

2018-01-19 Thread Tomasz Dudek
Hello,

Is there any way to use CrossValidator's ParamGrid with SVMWithSGD?

Usually, e.g. with RandomForestClassifier, you can specify a lot of
parameters to automate the grid search (when used with CrossValidator):

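import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.tuning.ParamGridBuilder

val algorithm = new RandomForestClassifier()
// Each addGrid call takes a Param of the estimator and the values to try.
val paramGrid = new ParamGridBuilder()
  .addGrid(algorithm.impurity, Array("gini", "entropy"))
  .addGrid(algorithm.maxDepth, Array(3, 5, 10))
  .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
  .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
  .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
  .build()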

With SVMWithSGD, however, the parameters live inside GradientDescent. You can
set them explicitly, either through SVMWithSGD's constructor or by calling
setters on the optimizer:

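import org.apache.spark.mllib.classification.SVMWithSGD

val algorithm = new SVMWithSGD()
// miniBatchFraction is the fraction of the data used per iteration, in (0, 1].
algorithm.optimizer.setMiniBatchFraction(1.0)
  .setNumIterations(200)
  .setRegParam(0.01)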

Both approaches, however, prevent me from using ParamGridBuilder.

There are no such things as algorithm.optimizer.numIterations or
algorithm.optimizer.regParam, only setters (and ParamGridBuilder requires
Params, not setters).

I could of course create each SVM model manually, build one huge Pipeline
with each model writing its result to a different column, and then manually
decide which performed best. That requires a lot of coding, and so far
CrossValidator's ParamGrid has done that job for me.

Am I missing something? Is it WIP or is there any hack to do that?

Yours,
Tomasz
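
For anyone who needs to stay on SVMWithSGD, one possible workaround is a hand-rolled grid search over the optimizer's setters, using MLUtils.kFold for the splits. The sketch below is only an illustration under assumptions: the helper name gridSearch, the candidate values, the fold count, and the choice of area under ROC as the metric are all made up for the example and are not part of Spark's tuning API.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Hypothetical candidate grid: every (numIterations, regParam) pair.
val candidates: Seq[(Int, Double)] = for {
  numIterations <- Seq(100, 200)
  regParam      <- Seq(0.01, 0.1)
} yield (numIterations, regParam)

// Hypothetical helper: returns (numIterations, regParam, mean AUC) of the
// candidate with the highest mean area under ROC across the folds.
def gridSearch(data: RDD[LabeledPoint], numFolds: Int = 3): (Int, Double, Double) = {
  val folds = MLUtils.kFold(data, numFolds, 42)
  val scored = candidates.map { case (numIterations, regParam) =>
    val aucs = folds.map { case (training, validation) =>
      val algorithm = new SVMWithSGD()
      algorithm.optimizer
        .setNumIterations(numIterations)
        .setRegParam(regParam)
      // clearThreshold makes predict return raw scores instead of 0/1 labels.
      val model = algorithm.run(training).clearThreshold()
      val scoreAndLabels = validation.map(p => (model.predict(p.features), p.label))
      new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
    }
    (numIterations, regParam, aucs.sum / aucs.length)
  }
  scored.maxBy(_._3)
}
```

This trains numFolds models per candidate, so the cost grows with the grid size just as it does with CrossValidator; caching the input RDD before calling the helper may help.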