Thanks, Xiangrui, for clarifying the metric and creating that JIRA issue.

I made an error while composing my earlier mail: "paramMap.get(als.regParam)" in my Evaluator actually returns "None". I just happened to use "getOrElse(1.0)" in my tests, which explains why negating the metric did not change anything.
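A minimal sketch of that effect (the values are only illustrative): because the lookup came back None, the metric collapsed to the constant fallback for every candidate, with or without negation:

   // paramMap.get(als.regParam) returned None, so:
   val metric = paramMap.get(als.regParam).getOrElse(1.0)    // always 1.0
   val negated = -paramMap.get(als.regParam).getOrElse(1.0)  // always -1.0
   // Either way the metric is the same constant for every candidate,
   // so CrossValidator has no basis to prefer one over another.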

-Stefan

PS: I got an error while sending my previous mail via the web interface and did not think it had made it to the list, so I did not follow up on my problem myself. Sorry for the confusion.


On Tue, May 19, 2015 at 9:54 PM, Xiangrui Meng wrote:
The documentation needs to be updated to state that higher metric
values are better (https://issues.apache.org/jira/browse/SPARK-7740).
I don't know why you still get the highest regularization parameter
candidate when you negate the return value of the Evaluator. Maybe you
should check the log messages from CrossValidator and see the average
metric values during cross validation. -Xiangrui
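
(For reference, the setup under discussion might look roughly like the sketch below. It is only a sketch against the spark.ml tuning API of that era; "training" and "myEvaluator" are placeholders, not names from this thread.)

   import org.apache.spark.ml.recommendation.ALS
   import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

   val als = new ALS()
     .setUserCol("user").setItemCol("item").setRatingCol("rating")
   val paramGrid = new ParamGridBuilder()
     .addGrid(als.regParam, Array(0.01, 0.1, 1.0))  // candidate values
     .build()
   val cv = new CrossValidator()
     .setEstimator(als)
     .setEstimatorParamMaps(paramGrid)
     .setEvaluator(myEvaluator)  // placeholder: the Evaluator in question
     .setNumFolds(3)
   // During fit, CrossValidator logs the average metric per candidate.
   val cvModel = cv.fit(training)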

On Sat, May 9, 2015 at 12:15 PM, Stefan H. <twel...@gmx.de> wrote:
Hello everyone,

I am stuck with the (experimental, I think) API for machine learning
pipelines. I have a pipeline with just one estimator (ALS) and I want it to
try different values for the regularization parameter. Therefore I need to
supply an Evaluator that returns a value of type Double. I guess this could
be something like accuracy or mean squared error? The only implementation I
found is BinaryClassificationEvaluator, and I did not understand the
computation there.
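
A mean-squared-error Evaluator could look roughly like the sketch below. It is only a sketch: the "rating" and "prediction" column names are assumptions, and the MSE is negated so that a higher metric is better, per the reply above:

   import org.apache.spark.ml.evaluation.Evaluator
   import org.apache.spark.ml.param.ParamMap
   import org.apache.spark.sql.DataFrame

   val mseEvaluator = new Evaluator {
     def evaluate(dataset: DataFrame, paramMap: ParamMap): Double = {
       // CrossValidator passes in the validation set with predictions attached.
       val sqErr = dataset
         .selectExpr("CAST(rating AS DOUBLE)", "CAST(prediction AS DOUBLE)")
         .rdd
         .map { row => val e = row.getDouble(0) - row.getDouble(1); e * e }
       -sqErr.mean()  // negated MSE: CrossValidator keeps the largest metric
     }
   }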

I could not find detailed documentation, so I implemented a dummy Evaluator
that just returns the regularization parameter:

   new Evaluator {
     // Dummy metric: just return the candidate's regParam value,
     // failing fast if it is absent from the paramMap.
     def evaluate(dataset: DataFrame, paramMap: ParamMap): Double =
       paramMap.get(als.regParam).getOrElse(throw new Exception)
   }

I just wanted to see whether the lower or higher value "wins". On the
resulting model I inspected the chosen regularization parameter this way:

   cvModel.bestModel.fittingParamMap.get(als.regParam)

And it was the highest of my three regularization parameter candidates.
The strange thing is that if I negate the return value of the Evaluator, that
line still returns the highest regularization parameter candidate.
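
In hindsight, a diagnostic variant of the dummy Evaluator (hypothetical; the println is only for illustration) would have shown immediately what CrossValidator passes in, and that the regParam lookup comes back empty:

   new Evaluator {
     def evaluate(dataset: DataFrame, paramMap: ParamMap): Double = {
       // Print what CrossValidator actually hands to the Evaluator.
       println(s"candidate paramMap: $paramMap")
       paramMap.get(als.regParam).getOrElse(1.0)  // silent fallback masks a miss
     }
   }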

So I am probably working with false assumptions. I'd be grateful if someone
could point me to some documentation or examples, or has a few hints to
share.

Cheers,
Stefan




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


