[ 
https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804124#comment-16804124
 ] 

Bryan Cutler commented on SPARK-27293:
--------------------------------------

Setting a different seed for randomSplit and the regressor, as in your example, 
changes both the train/test split and the algorithm's initialization, since 
both operations are randomized and driven by the seed. So it is not surprising 
that the result ends up different. If you have sufficient data and get 
blatantly wrong results compared to another implementation of the algorithm, 
then there might be an issue. From what I can see here, there doesn't seem to 
be a problem.
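
To illustrate the point, here is a minimal sketch in plain Python. It is not 
Spark's implementation: random_split and rmse_of_mean_predictor are 
hypothetical helpers, the labels are made-up toy data, and the "model" simply 
predicts the training mean. It only shows that the same seed reproduces the 
same split and RMSE, while a different seed produces a different split and 
therefore, in general, a different RMSE.

```python
import math
import random


def random_split(rows, train_fraction, seed):
    """Assign each row to train or test using a seeded pseudo-random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test


def rmse_of_mean_predictor(train, test):
    """Fit a trivial model (predict the training mean) and score it on test."""
    mean = sum(train) / len(train)
    return math.sqrt(sum((y - mean) ** 2 for y in test) / len(test))


# Hypothetical toy labels (100 distinct values); not the reporter's data.
labels = [(i * 37 % 101) / 10.0 for i in range(100)]

split_50 = random_split(labels, 0.7, seed=50)
split_50_again = random_split(labels, 0.7, seed=50)
split_313 = random_split(labels, 0.7, seed=313)

rmse_50 = rmse_of_mean_predictor(*split_50)
rmse_50_again = rmse_of_mean_predictor(*split_50_again)
rmse_313 = rmse_of_mean_predictor(*split_313)

print("seed 50:        RMSE = %g" % rmse_50)
print("seed 50 again:  RMSE = %g" % rmse_50_again)
print("seed 313:       RMSE = %g" % rmse_313)
```

So a value that differs for one particular seed is not, by itself, evidence of 
a bug. The comparison is only meaningful when everyone uses the same seed, 
data, and parameters.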

> I am interested in finding out if there is a bug in the implementation of 
> RandomForests. The issue is that, with a seed applied, I get different 
> results than other people from my class on the same data. 
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-27293
>                 URL: https://issues.apache.org/jira/browse/SPARK-27293
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Martin Skauen
>            Priority: Major
>
> I am calculating the RMSE metric like this:
> {code:java}
> from pyspark.ml.evaluation import RegressionEvaluator
> from pyspark.ml.regression import RandomForestRegressor
>
> # Split the data 70/30 with a fixed seed.
> (trainingData, testData) = data.randomSplit([0.7, 0.3], 313)
> rfr = RandomForestRegressor(labelCol="labels", featuresCol="features",
>                             maxDepth=5, numTrees=3, seed=313)
> # Assumed step, not shown in the original snippet: fit on the training
> # split and predict on the test split.
> predictions = rfr.fit(trainingData).transform(testData)
> evaluator = RegressionEvaluator(
>     labelCol="labels", predictionCol="prediction", metricName="rmse")
> rmse = evaluator.evaluate(predictions)
> print("RMSE = %g " % rmse)
> {code}
> I am setting the seed. For seed = 50 and also for other seeds I get the exact 
> same RMSE as people from class. When I set the seed to 313, it gives me a 
> different value. What could be the issue here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

