[ https://issues.apache.org/jira/browse/SPARK-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-4564.
-------------------------------------
    Resolution: Won't Fix

I'm going to close this as Won't Fix unless there is a major objection. Happy to accept PRs that clarify the documentation, though :)

> SchemaRDD.groupBy(groupingExprs)(aggregateExprs) doesn't return the
> groupingExprs as part of the output schema
> --------------------------------------------------------------------------------------------------------------
>
>                  Key: SPARK-4564
>                  URL: https://issues.apache.org/jira/browse/SPARK-4564
>              Project: Spark
>           Issue Type: Bug
>           Components: SQL
>     Affects Versions: 1.1.0
>          Environment: Mac OS X, local mode, but should hold true for all environments
>             Reporter: Dean Wampler
>
> In the following example, I would expect the "grouped" schema to contain two
> fields, the String name and the Long count, but it contains only the Long
> count.
> {code}
> // Assumes val sc = new SparkContext(...), e.g., in the Spark shell
> import org.apache.spark.sql.{SQLContext, SchemaRDD}
> import org.apache.spark.sql.catalyst.expressions._
>
> val sqlc = new SQLContext(sc)
> import sqlc._
>
> case class Record(name: String, n: Int)
> val records = List(
>   Record("three", 1),
>   Record("three", 2),
>   Record("two",   3),
>   Record("three", 4),
>   Record("two",   5))
> val recs = sc.parallelize(records)
> recs.registerTempTable("records")
>
> val grouped = recs.select('name, 'n).groupBy('name)(Count('n) as 'count)
> grouped.printSchema
> // root
> //  |-- count: long (nullable = false)
>
> grouped foreach println
> // [2]
> // [3]
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
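A documentation-style note on the resolution: with the Spark 1.1 SchemaRDD API, the output schema of `groupBy(groupingExprs)(aggregateExprs)` consists of exactly the `aggregateExprs`, so a grouping expression appears in the result only if it is listed there explicitly. A minimal sketch of that workaround, reusing `sc`, `sqlc`, and `recs` from the report above (untested here; a running SparkContext is assumed):

```scala
// Same setup as the report: sc is a live SparkContext in the Spark shell.
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions._

val sqlc = new SQLContext(sc)
import sqlc._

case class Record(name: String, n: Int)
val recs = sc.parallelize(List(
  Record("three", 1), Record("three", 2), Record("two", 3),
  Record("three", 4), Record("two", 5)))

// List the grouping expression 'name among the aggregate expressions
// so it becomes part of the output schema alongside the count.
val grouped = recs.select('name, 'n).groupBy('name)('name, Count('n) as 'count)
grouped.printSchema
// The schema now contains both the name and count fields,
// and each output row carries the grouping value with its aggregate.
```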