Re: can not join dataset with itself

2016-04-08 Thread JH P
I’m using Spark 1.6.1. The class is case class DistinctValues(statType: Int, dataType: Int, _id: Int, values: Array[(String, Long)], numOfMembers: Int, category: String), and the error for newGnsDS.joinWith(newGnsDS, $"dataType") is: Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot

can not join dataset with itself

2016-04-08 Thread JH P
Hi. I want to join a dataset with itself, so I tried the code below. 1. newGnsDS.joinWith(newGnsDS, $"dataType") 2. newGnsDS.as("a").joinWith(newGnsDS.as("b"), $"a.dataType" === $"b.datatype") 3. val a = newGnsDS.map(x => x).as("a") val b = newGnsDS.map(x => x).as("b") a.joinWith(b,
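
For readers following this thread, here is a minimal sketch of the aliased self-join pattern from attempt 2. It assumes Spark 2.x or later (SparkSession and its implicits), not the 1.6.1 release discussed in the thread, and it is not presented as the thread's resolution. The `Rec` class and the sample rows are hypothetical stand-ins for `DistinctValues` and `newGnsDS`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical, simplified stand-in for the thread's DistinctValues class.
case class Rec(dataType: Int, category: String)

object SelfJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running in local mode.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("self-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data in place of newGnsDS.
    val ds = Seq(Rec(1, "x"), Rec(1, "y"), Rec(2, "z")).toDS()

    // Alias each side so the join condition can name columns unambiguously.
    // Both sides of the condition use the same spelling of the column name.
    val joined = ds.as("a").joinWith(ds.as("b"), $"a.dataType" === $"b.dataType")

    joined.show(false)
    spark.stop()
  }
}
```

The result is a Dataset of pairs, one element per matching combination of left and right rows. Note that the join condition must be a Boolean column expression; a bare column such as $"dataType" is not a condition on its own.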

DataSet with Array member

2016-04-05 Thread JH P
Hi everyone. I have a class like this: case class DistinctValues(statType: Int, dataType: Int, _id: Int, values: Array[(String, Long)], category: String) extends Serializable { I think this class won't work when DistinctValues.values.length > Int.MaxValue. Moreover, I instantiate this class by
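
As a small illustration of the size concern above: on the JVM, array lengths are typed as Int, so a single `values` array can never hold more than Int.MaxValue elements. The sketch below uses the class from the message without the truncated body; the sample values are hypothetical.

```scala
// Class definition as quoted in the message (body omitted because the snippet is truncated).
case class DistinctValues(statType: Int, dataType: Int, _id: Int,
                          values: Array[(String, Long)], category: String)

object ArrayLengthSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample instance.
    val dv = DistinctValues(0, 1, 42, Array(("a", 3L), ("b", 5L)), "demo")

    // Array.length returns an Int, so the comparison length > Int.MaxValue
    // can never be true for a single array.
    val n: Int = dv.values.length
    println(s"length = $n, upper bound = ${Int.MaxValue}")
  }
}
```

If the number of distinct values could really approach that bound, one option (an assumption, not something stated in the thread) would be to store one row per (value, count) pair rather than one array per record.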