[ 
https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-15153:
--------------------------------
    Description: 
When the label column of the dataset is numeric, SparkR spark.naiveBayes 
throws an error during training. The bug is easy to reproduce:
{code}
t <- as.data.frame(Titanic)
t1 <- t[t$Freq > 0, -5]
t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
t2 <- t1[-4]
df <- suppressWarnings(createDataFrame(sqlContext, t2))
m <- spark.naiveBayes(df, NumericSurvived ~ .)

16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
        at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
        at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invo
{code}

In RFormula, the response variable can be numeric or string. If it is a 
string, RFormula converts it to DoubleType with StringIndexer; otherwise, 
RFormula uses it directly in model training (and assumes its values are 
numbered 0, ..., maxLabelIndex). When extracting labels in the SparkR 
naiveBayes wrapper, we should handle them according to the type of the 
response variable (string or numeric).
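
The two cases can be illustrated with a toy model in plain R (no Spark 
required). This is only a sketch of the idea, not the wrapper's actual code: 
extract_labels is a hypothetical helper, and ordering string labels by 
descending frequency is an assumption that mirrors StringIndexer's default 
behaviour.

```r
# Hypothetical helper (not part of SparkR or Spark ML): recover the class
# labels of a response column according to its type.
extract_labels <- function(response) {
  if (is.character(response) || is.factor(response)) {
    # String response: an indexer maps each distinct string to 0, 1, ...
    # Assumption: most frequent value first, ties broken alphabetically.
    counts <- table(as.character(response))
    names(counts)[order(-as.integer(counts), names(counts))]
  } else if (is.numeric(response)) {
    # Numeric response: used as-is and assumed to be 0, ..., maxLabelIndex,
    # so the labels are simply those indices.
    as.character(seq(0, max(response)))
  } else {
    stop("unsupported response type: ", class(response)[1])
  }
}

# Same data preparation as in the reproduction above.
t <- as.data.frame(Titanic)
t1 <- t[t$Freq > 0, -5]
numeric_survived <- ifelse(t1$Survived == "No", 0, 1)

string_labels  <- extract_labels(t1$Survived)      # string (factor) case
numeric_labels <- extract_labels(numeric_survived) # numeric case
```

In the numeric case there is no indexer output to read the labels from, so 
the sketch derives them from the values themselves.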

  was:
When the label column of the dataset is numeric, SparkR spark.naiveBayes 
throws an error during training. The bug is easy to reproduce:
{code}
t <- as.data.frame(Titanic)
t1 <- t[t$Freq > 0, -5]
t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
t2 <- t1[-4]
df <- suppressWarnings(createDataFrame(sqlContext, t2))
m <- spark.naiveBayes(df, NumericSurvived ~ .)

16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
        at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
        at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
        at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invo
{code}

In RFormula, the response variable can be numeric or string. If it is a 
string, RFormula converts it to DoubleType with StringIndexer; otherwise, 
RFormula assumes its values are numbered 0, ..., maxLabelIndex. We should use 
different methods to extract labels from the label column metadata.


> SparkR spark.naiveBayes error when label is numeric type
> --------------------------------------------------------
>
>                 Key: SPARK-15153
>                 URL: https://issues.apache.org/jira/browse/SPARK-15153
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SparkR
>            Reporter: Yanbo Liang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

