[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289738#comment-15289738
 ] 

Jakob Odersky commented on SPARK-13581:
---

I can't reproduce it anymore either.

> LibSVM throws MatchError
> 
>
>                 Key: SPARK-13581
>                 URL: https://issues.apache.org/jira/browse/SPARK-13581
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jakob Odersky
>            Assignee: Jeff Zhang
>            Priority: Critical
>
> When running an action on a DataFrame obtained by reading from a libsvm file,
> a MatchError is thrown. However, doing the same on a cached DataFrame works
> fine.
> {code}
> // The file is in the Spark repository.
> val df = sqlContext.read.format("libsvm").load("../data/mllib/sample_libsvm_data.txt")
> df.select(df("features")).show() // MatchError
> df.cache()
> df.select(df("features")).show() // OK
> {code}
> The exception stack trace is the following:
> {code}
> scala.MatchError: 1.0 (of class java.lang.Double)
> [info]   at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
> [info]   at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
> [info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
> [info]   at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
> [info]   at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
> {code}
> This issue first appeared in commit {{1dac964c1}}, in PR 
> [#9595|https://github.com/apache/spark/pull/9595] fixing SPARK-11622.
> [~jeffzhang], do you have any insight into what could be going on?
> cc [~iyounus]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-05-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288815#comment-15288815
 ] 

Sean Owen commented on SPARK-13581:
---

[~jodersky] do you think this is still a problem?




[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-05-13 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282520#comment-15282520
 ] 

Sandeep Singh commented on SPARK-13581:
---

I can't seem to reproduce this on current master:
{code}
scala> val df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
df: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> df.select(df("features")).show()
+--------------------+
|            features|
+--------------------+
|(692,[127,128,129...|
|(692,[158,159,160...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[158,159,160...|
|(692,[99,100,101,...|
|(692,[154,155,156...|
|(692,[127,128,129...|
|(692,[154,155,156...|
|(692,[153,154,155...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[154,155,156...|
|(692,[150,151,152...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[97,98,99,12...|
|(692,[124,125,126...|
+--------------------+
only showing top 20 rows


scala> df.cache()
res1: df.type = [label: double, features: vector]

scala> df.select(df("features")).show()
16/05/13 13:37:27 WARN Executor: 1 block locks were not released by TID = 9:
[rdd_11_0]
+--------------------+
|            features|
+--------------------+
|(692,[127,128,129...|
|(692,[158,159,160...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[158,159,160...|
|(692,[99,100,101,...|
|(692,[154,155,156...|
|(692,[127,128,129...|
|(692,[154,155,156...|
|(692,[153,154,155...|
|(692,[151,152,153...|
|(692,[129,130,131...|
|(692,[154,155,156...|
|(692,[150,151,152...|
|(692,[124,125,126...|
|(692,[152,153,154...|
|(692,[97,98,99,12...|
|(692,[124,125,126...|
+--------------------+
only showing top 20 rows
{code}
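
For readers unfamiliar with the input, the schema printed above ([label: double, features: vector]) mirrors the libsvm text format, where each line holds a label followed by index:value pairs. The following is a minimal sketch of parsing one such line, not Spark's actual loader. The names LibSvmLineSketch, Record, and parse are hypothetical, the sample line is made up, and the shift from one-based file indices to zero-based positions is an assumption about how the reader numbers features.

```scala
object LibSvmLineSketch {
  // Hypothetical container for one parsed record; this is not a Spark class.
  final case class Record(label: Double, indices: Array[Int], values: Array[Double])

  // Parses a line of the form "label index:value index:value ...".
  // Assumption: indices in the text are one-based and are shifted to zero-based.
  def parse(line: String): Record = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { token =>
      val parts = token.split(":", 2)
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    Record(label, pairs.map(_._1), pairs.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample line, not copied from sample_libsvm_data.txt.
    val record = parse("1 128:51 129:159 130:253")
    println(s"label=${record.label}")
    println(s"indices=${record.indices.mkString(",")}")
    println(s"values=${record.values.mkString(",")}")
  }
}
```

Under that indexing assumption, file indices 128, 129, 130 would appear as 127, 128, 129 in the features column, which is consistent with the first row shown above.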




[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173383#comment-15173383
 ] 

Jeff Zhang commented on SPARK-13581:


I suspect this is an issue in code generation. The root cause is that the query 
should read the features column but actually reads the label column, which 
causes the MatchError. df.show() succeeds when no columns are selected. The 
stack trace shows that the error comes from the code generator. Can anyone 
familiar with code generation help with this?

{code}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost): scala.MatchError: 0.0 (of class java.lang.Double)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
    at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:192)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:63)
    at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:60)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:40)
    at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:305)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:350)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
{code}
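
To illustrate the suspected mechanism, here is a minimal sketch with no Spark dependency. It assumes, as the stack trace suggests, that the vector serializer pattern-matches on its argument and has no default case, so handing it the label (a Double) instead of the features vector raises scala.MatchError. The names MatchErrorSketch, Vec, DenseVec, SparseVec, and serialize are hypothetical stand-ins, not the real MLlib classes.

```scala
object MatchErrorSketch {
  // Simplified stand-ins for vector types; these are NOT Spark's classes.
  sealed trait Vec
  final case class DenseVec(values: Array[Double]) extends Vec
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

  // Accepts Any and handles only the two vector shapes.
  // There is deliberately no default case, so any other input throws scala.MatchError.
  def serialize(obj: Any): Seq[Any] = obj match {
    case SparseVec(size, indices, values) => Seq(0, size, indices.toSeq, values.toSeq)
    case DenseVec(values)                 => Seq(1, values.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // A row shaped like the libsvm schema: (label, features). Values are made up.
    val row: Seq[Any] = Seq(1.0, SparseVec(692, Array(127, 128), Array(51.0, 159.0)))

    // Reading the intended column (features) serializes without error.
    println(serialize(row(1)))

    // Reading the wrong column (label) fails in the same way as the report.
    try {
      serialize(row(0))
    } catch {
      case e: MatchError => println(s"scala.MatchError: ${e.getMessage}")
    }
  }
}
```

The sketch only shows why a Double reaching the serializer produces this error. It does not show where the wrong column is chosen, which is the open question about code generation in this thread.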




[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173058#comment-15173058
 ] 

Jakob Odersky commented on SPARK-13581:
---

It's in the Spark repository at "data/mllib/sample_libsvm_data.txt".




[jira] [Commented] (SPARK-13581) LibSVM throws MatchError

2016-02-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173052#comment-15173052
 ] 

Jeff Zhang commented on SPARK-13581:


[~jodersky] Can you attach the data file? I guess it is small.
