Where condition on columns of Arrays no longer works in Spark 2

2016-10-20 Thread filthysocks
I have a Column in a DataFrame that contains Arrays, and I want to filter for
equality. It works fine in Spark 1.6 but not in 2.0.

In Spark 1.6.2:

import org.apache.spark.sql.SQLContext

case class DataTest(lists: Seq[Int])

val sql = new SQLContext(sc)
val data = sql.createDataFrame(sc.parallelize(Seq(
  DataTest(Seq(1)),
  DataTest(Seq(4, 5, 6))
)))
data.registerTempTable("uiae")
sql.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)

returns: [WrappedArray(1)]
In Spark 2.0.0:
import spark.implicits._

case class DataTest(lists: Seq[Int])

val data = Seq(DataTest(Seq(1)), DataTest(Seq(4, 5, 6))).toDS()
data.createOrReplaceTempView("uiae")
spark.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)

returns: nothing

Is that a bug? Or is it just done differently in Spark 2?
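
For reference, here is the same filter expressed through the Column API instead
of the SQL string, as a cross-check on whether the behaviour is specific to the
SQL parser. This is only a sketch, not a confirmed workaround; data is the
Dataset from the 2.0.0 snippet above:

import org.apache.spark.sql.functions.{array, lit}

// Same predicate as the SQL WHERE clause, built with array() and lit()
// from org.apache.spark.sql.functions; data is the Dataset defined above.
data.filter($"lists" === array(lit(1))).show()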




Re: Is it relevant to use BinaryClassificationMetrics.aucROC / aucPR with LogisticRegressionModel ?

2015-11-25 Thread filthysocks
jmvllt wrote
> Here, because the predicted class will always be 0 or 1, there is no way
> to vary the threshold to get the aucROC, right? Or am I totally wrong?

No, you are right. If you pass (score, label) tuples to
BinaryClassificationMetrics, the score has to be the class probability, not the
hard 0/1 prediction.

Have you seen the clearThreshold function?

spark_docu wrote
> Clears the threshold so that predict will output raw prediction scores.

https://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel

You probably need to call it before the predict call.
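
A rough sketch of that flow, just for illustration: it assumes training and
test are RDD[LabeledPoint]s you already have, and uses LogisticRegressionWithLBFGS
as an example trainer.

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// training and test are assumed to be RDD[LabeledPoint].
val model = new LogisticRegressionWithLBFGS().run(training)

// Drop the default 0.5 threshold so predict() returns class probabilities
// instead of hard 0/1 labels.
model.clearThreshold()

// (score, label) pairs, where the score is the predicted probability.
val scoreAndLabels: RDD[(Double, Double)] = test.map { point =>
  (model.predict(point.features), point.label)
}

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC = ${metrics.areaUnderROC()}")
println(s"Area under PR  = ${metrics.areaUnderPR()}")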









Re: Anyone feels sparkSQL in spark1.5.1 very slow?

2015-10-26 Thread filthysocks
We upgraded from 1.4.1 to 1.5 and it was a pain; see
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-driver-memory-problems-while-doing-Cross-Validation-do-not-occur-with-1-4-1-td25076.html


