SVMWithSGD lives in the older "mllib" package and is not directly compatible
with the DataFrame API. One could, in principle, write an ML-API wrapper
around it.
However, Spark 2.2.x has LinearSVC in the DataFrame-based API:
http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-support-vector-machine
I would use that instead.
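As a rough sketch (assuming a DataFrame `trainingData` with the usual "label" and "features" columns), LinearSVC plugs straight into ParamGridBuilder and CrossValidator, since its regParam and maxIter are proper Params:

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val svc = new LinearSVC()

// Params are first-class here, so the grid works just like with RandomForestClassifier
val paramGrid = new ParamGridBuilder()
  .addGrid(svc.regParam, Array(0.001, 0.01, 0.1))
  .addGrid(svc.maxIter, Array(50, 100, 200))
  .build()

val cv = new CrossValidator()
  .setEstimator(svc)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// val cvModel = cv.fit(trainingData)  // picks the best (regParam, maxIter) combination
```

The concrete grid values above are just placeholders; pick ranges that suit your data.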
On Fri, 19 Jan 2018 at 13:59 Tomasz Dudek
wrote:
> Hello,
>
> is there any way to use CrossValidation's ParamGrid with SVMWithSGD?
>
> usually, when e.g. using RandomForest you can specify a lot of parameters,
> to automatise the param grid search (when used with CrossValidation)
>
> val algorithm = new RandomForestClassifier()
> val paramGrid = { new ParamGridBuilder()
> .addGrid(algorithm.impurity, Array("gini", "entropy"))
> .addGrid(algorithm.maxDepth, Array(3, 5, 10))
> .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
> .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
> .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
> .build()
> }
>
> with SVMWithSGD however, the parameters live inside GradientDescent. You
> can tune the params explicitly, either through SVMWithSGD's constructor or
> by calling setters on the optimizer:
>
> val algorithm = new SVMWithSGD()
> algorithm.optimizer.setMiniBatchFraction(1.0)
> .setNumIterations(200)
> .setRegParam(0.01)
>
> Both of these approaches, however, prevent me from using ParamGridBuilder properly.
>
> There is no such thing as algorithm.optimizer.numIterations or
> algorithm.optimizer.regParam, only setters (and ParamGrid requires Params,
> not setters).
>
> I could of course create each SVM model manually, build one huge Pipeline
> in which each model writes its result to a different column, and then manually
> decide which performed best. That requires a lot of coding, and so far
> CrossValidator's ParamGrid has done that job for me.
>
> Am I missing something? Is it WIP or is there any hack to do that?
>
> Yours,
> Tomasz
>