DataFrame constructor

2015-11-23 Thread spark_user_2015
Dear all, is the following usage of the DataFrame constructor correct, or does it trigger any side effects that I should be aware of? My goal is to keep track of my DataFrame's state and allow custom transformations accordingly. val df: DataFrame = ...some dataframe... val newDf = new

Discretization

2015-05-07 Thread spark_user_2015
The Spark documentation shows the following example code:
// Discretize data in 16 equal bins since ChiSqSelector requires categorical features
val discretizedData = data.map { lp => LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => x / 16 })) }
I'm sort of missing why x / 16
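
The arithmetic behind the quoted snippet can be tried without Spark. The sketch below is a minimal illustration, not the documentation's code: it assumes the feature values lie in the range [0, 256), which the snippet does not state, and `DiscretizeSketch` and `bin` are hypothetical names introduced here. Under that assumption, dividing by 16 and rounding down yields one of 16 bin indices. Plain `x / 16` on a Double keeps the fractional part, so without the rounding step each distinct input still maps to a distinct output.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object DiscretizeSketch {
  // Assumption: x lies in [0, 256), so floor(x / 16) is a bin index in 0..15.
  def bin(x: Double): Double = math.floor(x / 16)

  def main(args: Array[String]): Unit = {
    val features = Array(0.0, 15.0, 16.0, 100.0, 255.0)

    // Plain division keeps fractions: 100.0 / 16 is 6.25, not a bin index.
    val divided = features.map(x => x / 16)

    // Rounding down collapses each run of 16 consecutive values into one bin.
    val binned = features.map(bin)

    println(divided.mkString(", "))
    println(binned.mkString(", "))
  }
}
```

With these inputs, 0.0 and 15.0 fall into the same bin after rounding down, while plain division leaves them as two different values.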

Caching and Actions

2015-04-07 Thread spark_user_2015
I understand that RDDs are not computed until an action is called. Is it correct to conclude that it doesn't matter whether .cache is used anywhere in the program if I have only one action, and it is called only once? Related to this question, consider this situation: val d1 = data.map((x,y,z) => (x,y))