[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289738#comment-15289738 ]

Jakob Odersky commented on SPARK-13581:
---------------------------------------

I can't reproduce it anymore either

> LibSVM throws MatchError
> ------------------------
>
>                 Key: SPARK-13581
>                 URL: https://issues.apache.org/jira/browse/SPARK-13581
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jakob Odersky
>            Assignee: Jeff Zhang
>            Priority: Critical
>
> When running an action on a DataFrame obtained by reading from a libsvm file
> a MatchError is thrown, however doing the same on a cached DataFrame works
> fine.
> {code}
> val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt") //file is in spark repository
> df.select(df("features")).show() //MatchError
> df.cache()
> df.select(df("features")).show() //OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]    at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR
> [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight of what could be going on?
> cc [~iyounus]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288815#comment-15288815 ]

Sean Owen commented on SPARK-13581:
-----------------------------------

[~jodersky] do you think this is still a problem?
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282520#comment-15282520 ]

Sandeep Singh commented on SPARK-13581:
---------------------------------------

Can't seem to reproduce on current master
{code}
scala> val df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
df: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> df.select(df("features")).show()
+--------------------+
|            features|
+--------------------+
|(692,[127,128,129...|
|(692,[158,159,160...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[158,159,160...|
|(692,[99,100,101,...|
|(692,[154,155,156...|
|(692,[127,128,129...|
|(692,[154,155,156...|
|(692,[153,154,155...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[154,155,156...|
|(692,[150,151,152...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[97,98,99,12...|
|(692,[124,125,126...|
+--------------------+
only showing top 20 rows

scala> df.cache()
res1: df.type = [label: double, features: vector]

scala> df.select(df("features")).show()
16/05/13 13:37:27 WARN Executor: 1 block locks were not released by TID = 9:
[rdd_11_0]
+--------------------+
|            features|
+--------------------+
|(692,[127,128,129...|
|(692,[158,159,160...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[158,159,160...|
|(692,[99,100,101,...|
|(692,[154,155,156...|
|(692,[127,128,129...|
|(692,[154,155,156...|
|(692,[153,154,155...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[154,155,156...|
|(692,[150,151,152...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[97,98,99,12...|
|(692,[124,125,126...|
+--------------------+
only showing top 20 rows
{code}
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173383#comment-15173383 ]

Jeff Zhang commented on SPARK-13581:
------------------------------------

I suspect this is an issue in code generation. The root cause is that it should read the column features but actually reads the column label, which causes the MatchError. df.show() succeeds without any selection. The stack trace shows that the error comes from the code generator. Can anyone familiar with code generation help with this?
{code}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost): scala.MatchError: 0.0 (of class java.lang.Double)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:63)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:60)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:40)
    at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:305)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
{code}

> LibSVM throws MatchError
> ------------------------
>
>                 Key: SPARK-13581
>                 URL: https://issues.apache.org/jira/browse/SPARK-13581
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jakob Odersky
>            Assignee: Jeff Zhang
>            Priority: Minor
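
The failure mode described in this comment can be illustrated without Spark. The following is a minimal sketch, not Spark's actual code: `Vec`, `SparseVec`, `DenseVec`, `serialize` and `row` are hypothetical stand-ins, and the vector values are made up. It assumes only what the stack trace shows, namely that a serializer which pattern matches on vector types was handed a boxed Double. Reading ordinal 0 (label) instead of ordinal 1 (features) from a row laid out as [label: double, features: vector] is enough to trigger a scala.MatchError.

```scala
// Hypothetical, simplified stand-ins for vector types; NOT Spark's real classes.
sealed trait Vec
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec
final case class DenseVec(values: Array[Double]) extends Vec

object MatchErrorSketch {
  // Mimics a serializer that matches on the runtime type of an untyped value.
  // There is deliberately no default case, so any non-vector input
  // (for example a boxed Double) throws scala.MatchError.
  def serialize(obj: Any): String = obj match {
    case SparseVec(size, indices, _) => s"sparse($size, ${indices.length} non-zeros)"
    case DenseVec(values)            => s"dense(${values.length})"
  }

  // A row laid out as [label: double, features: vector]; the values are illustrative.
  val row: Seq[Any] =
    Seq(1.0, SparseVec(692, Array(127, 128, 129), Array(1.0, 2.0, 3.0)))

  def main(args: Array[String]): Unit = {
    // Correct ordinal (1 = features): the serializer sees a vector.
    println(serialize(row(1)))

    // Wrong ordinal (0 = label): the serializer sees a Double and the match fails.
    try serialize(row(0))
    catch { case e: MatchError => println(s"MatchError: ${e.getMessage}") }
  }
}
```

This only shows why a Double reaching the vector serializer produces this kind of exception; it does not identify where in Spark the wrong column was selected.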
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173058#comment-15173058 ]

Jakob Odersky commented on SPARK-13581:
---------------------------------------

It's in spark "data/mllib/sample_libsvm_data.txt"
[jira] [Commented] (SPARK-13581) LibSVM throws MatchError
[ https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173052#comment-15173052 ]

Jeff Zhang commented on SPARK-13581:
------------------------------------

[~jodersky] Can you attach the data file? I guess it is small.