[ https://issues.apache.org/jira/browse/SPARK-17760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17760:
------------------------------------

    Assignee: Apache Spark

> DataFrame's pivot doesn't see column created in groupBy
> -------------------------------------------------------
>
>                 Key: SPARK-17760
>                 URL: https://issues.apache.org/jira/browse/SPARK-17760
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>        Environment: Databricks Community Edition, Spark 2.0.0, PySpark, Python 2
>            Reporter: Alberto Bonsanto
>            Assignee: Apache Spark
>              Labels: easytest, newbie
>
> Related to [https://stackoverflow.com/questions/39817993/pivoting-with-missing-values].
> I'm not completely sure whether this is a bug or expected behavior.
> When you `groupBy` an expression created inside the `groupBy` call itself, the `pivot` method apparently doesn't find the resulting column during analysis.
> E.g.
> {code:none}
> df = (sc.parallelize([(1.0, "2016-03-30 01:00:00"),
>                       (30.2, "2015-01-02 03:00:02")])
>       .toDF(["amount", "Date"])
>       .withColumn("Date", col("Date").cast("timestamp")))
> (df.withColumn("hour", hour("date"))
>    .groupBy(dayofyear("date").alias("date"))
>    .pivot("hour").sum("amount").show())
> {code}
> This raises the following exception:
> {quote}
> AnalysisException: u'resolved attribute(s) date#140688 missing from dayofyear(date)#140994,hour#140977,sum(`amount`)#140995 in operator !Aggregate \[dayofyear(cast(date#140688 as date))], [dayofyear(cast(date#140688 as date)) AS dayofyear(date)#140994, pivotfirst(hour#140977, sum(`amount`)#140995, 1, 3, 0, 0) AS __pivot_sum(`amount`) AS `sum(``amount``)`#141001\];'
> {quote}
> To work around it, you have to add the column {{date}} with {{withColumn}} before grouping and pivoting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)