Where condition on columns of Arrays no longer works in Spark 2
I have a column in a DataFrame that contains arrays, and I want to filter for equality. This works fine in Spark 1.6 but not in 2.0.

In Spark 1.6.2:

    import org.apache.spark.sql.SQLContext

    case class DataTest(lists: Seq[Int])

    val sql = new SQLContext(sc)
    val data = sql.createDataFrame(sc.parallelize(Seq(
      DataTest(Seq(1)),
      DataTest(Seq(4, 5, 6))
    )))
    data.registerTempTable("uiae")
    sql.sql(s"SELECT lists FROM uiae WHERE lists = Array(1)").collect().foreach(println)

returns:

    [WrappedArray(1)]

In Spark 2.0.0:

    import spark.implicits._

    case class DataTest(lists: Seq[Int])

    val data = Seq(DataTest(Seq(1)), DataTest(Seq(4, 5, 6))).toDS()
    data.createOrReplaceTempView("uiae")
    spark.sql(s"SELECT lists FROM uiae WHERE lists = Array(1)").collect().foreach(println)

returns: nothing.

Is that a bug? Or is it just done differently in Spark 2?
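For what it's worth, one workaround that seems to behave as expected in Spark 2 is to express the same filter through the DataFrame API, building the array literal explicitly with array(lit(...)) rather than relying on the SQL Array(1) constructor. A minimal sketch, assuming the same SparkSession `spark` and data as above:

    import org.apache.spark.sql.functions.{array, col, lit}
    import spark.implicits._

    case class DataTest(lists: Seq[Int])

    val data = Seq(DataTest(Seq(1)), DataTest(Seq(4, 5, 6))).toDS()

    // Build the comparison value as an explicit array column literal;
    // === then compares the whole array for equality.
    data.filter(col("lists") === array(lit(1))).show()

Whether the SQL form returning nothing is a bug is a separate question; the above only sidesteps it.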
Re: Is it relevant to use BinaryClassificationMetrics.aucROC / aucPR with LogisticRegressionModel ?
jmvllt wrote:
> Here, because the predicted class will always be 0 or 1, there is no way
> to vary the threshold to get the aucROC, right? Or am I totally wrong?

No, you are right: if you pass a (score, label) tuple to BinaryClassificationMetrics, then the score has to be the class probability.

Have you seen the clearThreshold function? The Spark docs say:

> Clears the threshold so that predict will output raw prediction scores.

https://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel

You probably need to call it before the predict call.
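A minimal sketch of that flow with the RDD-based mllib API, assuming a trained LogisticRegressionModel `model` and a test set `test: RDD[LabeledPoint]` (names here are illustrative):

    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    def evaluate(model: LogisticRegressionModel, test: RDD[LabeledPoint]): Unit = {
      // Without this, predict() returns the thresholded class label (0.0 or 1.0),
      // which makes the ROC/PR curves degenerate.
      model.clearThreshold()

      // (raw score, true label) pairs, as BinaryClassificationMetrics expects.
      val scoreAndLabels = test.map(p => (model.predict(p.features), p.label))

      val metrics = new BinaryClassificationMetrics(scoreAndLabels)
      println(s"AUC ROC: ${metrics.areaUnderROC()}")
      println(s"AUC PR:  ${metrics.areaUnderPR()}")
    }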
Re: Anyone feels sparkSQL in spark1.5.1 very slow?
We upgraded from 1.4.1 to 1.5 and it's a pain; see
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-driver-memory-problems-while-doing-Cross-Validation-do-not-occur-with-1-4-1-td25076.html