Where condition on columns of arrays no longer works in Spark 2

2016-10-20 Thread filthysocks
I have a column in a DataFrame that contains arrays, and I want to filter on it for equality. This works fine in Spark 1.6 but not in 2.0. In Spark 1.6.2:

    import org.apache.spark.sql.SQLContext

    case class DataTest(lists: Seq[Int])

    val sql = new SQLContext(sc)
    val data = sql.createDataFrame(sc.parallelize(Seq(
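A minimal sketch of the usual Spark 2 workaround, assuming a DataFrame `data` with an array column named `lists` as in the snippet above: comparing an array column against a plain Scala `Seq` literal fails in 2.0, but building the literal column-side with `functions.array` (or, on Spark 2.2+, `typedLit`) keeps the equality check working.

    import org.apache.spark.sql.functions.{array, lit, typedLit}
    import spark.implicits._

    // Build the literal array element by element; works on Spark 2.0:
    data.where($"lists" === array(lit(1), lit(2), lit(3))).show()

    // On Spark 2.2+, typedLit can wrap the whole Seq directly:
    data.where($"lists" === typedLit(Seq(1, 2, 3))).show()

The column name and values here are taken from the truncated example; treat this as a sketch rather than a drop-in fix.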

Re: Is it relevant to use BinaryClassificationMetrics.aucROC / aucPR with LogisticRegressionModel ?

2015-11-25 Thread filthysocks
jmvllt wrote:
> Here, because the predicted class will always be 0 or 1, there is no way
> to vary the threshold to get the aucROC, right? Or am I totally wrong?

No, you are right. If you pass a (score, label) tuple to BinaryClassificationMetrics, then the score has to be the class probability.
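To illustrate the point, a hedged sketch of feeding probabilities rather than hard 0/1 predictions to BinaryClassificationMetrics; `model` (an mllib LogisticRegressionModel) and `testData` are assumed to exist. Calling clearThreshold() makes predict return the positive-class probability instead of a thresholded label, which is what lets the metric vary the threshold internally.

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    // Without this, predict returns 0.0 or 1.0 and the ROC degenerates
    // to a single operating point:
    model.clearThreshold()

    // scoreAndLabels: RDD[(Double, Double)] of (probability, true label)
    val scoreAndLabels = testData.map { point =>
      (model.predict(point.features), point.label)
    }

    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    println(metrics.areaUnderROC())
    println(metrics.areaUnderPR())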

Re: Anyone else finding Spark SQL in Spark 1.5.1 very slow?

2015-10-26 Thread filthysocks
We upgraded from 1.4.1 to 1.5 and it's a pain; see http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-driver-memory-problems-while-doing-Cross-Validation-do-not-occur-with-1-4-1-td25076.html