StackOverflowError on RDD.union

2015-02-03, Thomas Kwan
I am trying to combine multiple RDDs into one RDD using the union function. Has anyone seen a StackOverflowError like the following?

Exception in thread "main" java.lang.StackOverflowError
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at …
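A trace that bottoms out in RDD.partitions is consistent with a very deep lineage. In Spark 1.x, rdd.union(other) wraps both inputs in a new UnionRDD, so a loop of pairwise unions nests one level per input, and resolving partitions recurses through every level. Passing the whole collection to SparkContext.union in a single call (sc.union(rdds) in PySpark) builds one UnionRDD over all inputs instead. The sketch below does not use Spark at all: ToyRDD, chained_union and flat_union are hypothetical names that only model the shape of the lineage, with Python's RecursionError standing in for the JVM's StackOverflowError.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD that records only its parents."""

    def __init__(self, parents=()):
        self.parents = list(parents)

    def lineage_depth(self):
        # Recursive walk over the parents, loosely like a child that must
        # resolve each parent's partitions before reporting its own.
        if not self.parents:
            return 1
        return 1 + max(p.lineage_depth() for p in self.parents)


def chained_union(rdds):
    # Models rdds[0].union(rdds[1]).union(rdds[2])...: each step nests the
    # previous result one level deeper.
    acc = rdds[0]
    for r in rdds[1:]:
        acc = ToyRDD([acc, r])
    return acc


def flat_union(rdds):
    # Models one n-ary union over all inputs: a single level above the leaves.
    return ToyRDD(rdds)


if __name__ == "__main__":
    leaves = [ToyRDD() for _ in range(100)]
    print(chained_union(leaves).lineage_depth())  # 100
    print(flat_union(leaves).lineage_depth())     # 2

    many = [ToyRDD() for _ in range(5000)]
    try:
        chained_union(many).lineage_depth()
    except RecursionError:
        print("chained union: RecursionError")
    print(flat_union(many).lineage_depth())       # 2
```

The flat shape keeps the recursion depth constant no matter how many inputs there are, which is why a single multi-way union is usually preferred over a loop of pairwise ones.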

retry in combineByKey at BinaryClassificationMetrics.scala

2014-12-23, Thomas Kwan
Hi there, we are using MLlib 1.1.1 for logistic regression on a dataset of about 150M rows. Training usually goes smoothly without any retries, but during the prediction and BinaryClassificationMetrics stages I am seeing retries caused by fetch failures. The …

weights not changed with different reg param

2014-12-23, Thomas Kwan
Hi there, we are on MLlib 1.1.1 and are trying different regularization parameters. We noticed that regParam doesn't affect the weights at all. Is setting regParam via the optimizer the right thing to do? Do we need to set our own updater? Is anyone else seeing the same behaviour? Thanks again.
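One plausible explanation, assuming the 1.1.x defaults: the GradientDescent optimizer inside LogisticRegressionWithSGD is built with SimpleUpdater, which never reads regParam, so calling setRegParam alone leaves the weights unchanged until a regularizing updater such as SquaredL2Updater (or L1Updater) is set via optimizer.setUpdater. The plain-Python sketch below models only the two per-iteration update rules as understood here, with the step size decayed as stepSize / sqrt(iter); simple_update and squared_l2_update are hypothetical names, not MLlib code.

```python
import math


def simple_update(w, grad, step_size, it, reg_param):
    # Plain gradient step with the step size decayed by sqrt(iteration).
    # reg_param is accepted to match the other signature but never used.
    eta = step_size / math.sqrt(it)
    return [wi - eta * gi for wi, gi in zip(w, grad)]


def squared_l2_update(w, grad, step_size, it, reg_param):
    # Shrink every weight by (1 - eta * reg_param), then take the gradient
    # step, so a larger reg_param pulls the weights harder toward zero.
    eta = step_size / math.sqrt(it)
    return [wi * (1.0 - eta * reg_param) - eta * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    w, grad = [1.0, -2.0], [0.5, 0.5]
    for reg in (0.0, 0.1, 1.0):
        print(reg,
              simple_update(w, grad, 1.0, 1, reg),
              squared_l2_update(w, grad, 1.0, 1, reg))
```

With simple_update every regParam value yields the same weights, matching the behaviour described above. On the Scala side the change would be something like lr.optimizer.setUpdater(new SquaredL2Updater).setRegParam(0.1), where lr is a LogisticRegressionWithSGD instance and 0.1 is an arbitrary example value.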