The more Spark code I write, the more I hit the same use cases where the Scala APIs feel a bit awkward. I'd love to understand if there are historical reasons for these and whether there is opportunity + interest to improve the APIs. Here are my top two:

1. registerTempTable() returns Unit

def cachedDF(path: String, tableName: String) = {
  val df = sqlContext.read.load(path).cache()
  df.registerTempTable(tableName)
  df
}

// vs.

def cachedDF(path: String, tableName: String) =
  sqlContext.read.load(path).cache().registerTempTable(tableName)

2. No toDF() implicit for creating a DataFrame from an RDD + schema

val schema: StructType = ...
val rdd = sc.textFile(...)
  .map(...)
  .aggregate(...)
val df = sqlContext.createDataFrame(rdd, schema)

// vs.

val schema: StructType = ...
val df = sc.textFile(...)
  .map(...)
  .aggregate(...)
  .toDF(schema)

Have you encountered other examples where small, low-risk API tweaks could make common use cases more consistent + simpler to code?

/Sim
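P.S. In case it helps the discussion, here is a rough sketch of how (1) could be papered over today with a small enrichment on top of the existing API. The FluentSyntax object and the registerTempTableAs name are placeholders I made up, not existing Spark API:

import org.apache.spark.sql.DataFrame

object FluentSyntax {
  // Same effect as registerTempTable(), but returns the DataFrame
  // so it can sit in the middle of a call chain.
  implicit class RichDataFrame(val df: DataFrame) extends AnyVal {
    def registerTempTableAs(tableName: String): DataFrame = {
      df.registerTempTable(tableName)
      df
    }
  }
}

// With the enrichment in scope, the helper collapses to one expression:
//   import FluentSyntax._
//   def cachedDF(path: String, tableName: String): DataFrame =
//     sqlContext.read.load(path).cache().registerTempTableAs(tableName)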
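And a similar sketch for (2): a toDF(schema) enrichment on RDD[Row] that just delegates to createDataFrame. Again, RowRddSyntax is a made-up name, and it assumes an implicit SQLContext is in scope:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType

object RowRddSyntax {
  // Thin wrapper over sqlContext.createDataFrame(rdd, schema), mirroring
  // the rdd.toDF() that sqlContext.implicits._ already provides for
  // RDDs of case classes / Products.
  implicit class RowRddOps(rdd: RDD[Row])(implicit sqlContext: SQLContext) {
    def toDF(schema: StructType): DataFrame =
      sqlContext.createDataFrame(rdd, schema)
  }
}

// With the enrichment (and an implicit SQLContext) in scope:
//   import RowRddSyntax._
//   val df: DataFrame = myRowRdd.toDF(schema)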