[ 
https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360987#comment-15360987
 ] 

Zhang Mengqi commented on SPARK-16064:
--------------------------------------

Thank you very much!


> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
>                 Key: SPARK-16064
>                 URL: https://issues.apache.org/jira/browse/SPARK-16064
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Zhang Mengqi
>            Assignee: Yanbo Liang
>            Priority: Minor
>
> This happens when users run GLM with SparkR; the same dataset runs fine with
> GLM in native R.
> When users fit the model using glm with the poisson family, it fails with an
> assertion error caused by NA values produced by the reweight function.
> 16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
>       at scala.Predef$.assert(Predef.scala:170)
>       at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
>       at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
>       at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
>       at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
>       at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>       at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
>       at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>       at scala.collection.Abstra
> P.S. The dataset describes city ride flows between several planning areas in
> Singapore.
> ride_flow_exp <- glm(flow~Origin+Destination+distance,ride_flow,family = poisson(link = "log"))
> SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int, distance:double]
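
For readers unfamiliar with the reweight step, here is a minimal sketch in plain Python (not Spark's code) of one textbook IRLS reweighting step for a Poisson model with a log link. It assumes the standard formulas mu = exp(eta), weight = mu, and working response = eta + (y - mu) / mu. The function name `poisson_log_reweight` and the sample values are hypothetical. The report does not say how the NA arises for this dataset, so this only shows one way it could: when the linear predictor is very negative, mu underflows to zero, every weight becomes zero, and the working response is undefined.

```python
import math


def poisson_log_reweight(y, eta):
    """One textbook IRLS reweighting step for a Poisson GLM with log link.

    Hypothetical helper for illustration only; this is not Spark's code.
    Returns a list of (working_response, working_weight) pairs.
    """
    out = []
    for yi, ei in zip(y, eta):
        mu = math.exp(ei)  # underflows to 0.0 when eta is very negative
        w = mu             # Poisson + log link: weight equals the mean
        if mu == 0.0:
            # Emulate IEEE-754 division (0/0 is NaN, x/0 is +/-inf)
            # instead of letting Python raise ZeroDivisionError.
            z = float("nan") if yi == 0.0 else math.copysign(float("inf"), yi)
        else:
            z = ei + (yi - mu) / mu
        out.append((z, w))
    return out


# Made-up flows and an extreme linear predictor (assumed values, not
# taken from the reporter's dataset).
flows = [0.0, 3.0, 12.0]
etas = [-800.0, -800.0, -800.0]

steps = poisson_log_reweight(flows, etas)
weight_sum = sum(w for _, w in steps)
has_nan = any(math.isnan(z) for z, _ in steps)
print("sum of weights:", weight_sum, "NaN in working response:", has_nan)
```

With these made-up inputs the weights sum to zero, which is the condition the assertion in the stack trace rejects.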



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
