Michael Styles created SPARK-17037:
--------------------------------------

             Summary: distinct() operator fails on Dataframe with column names containing periods
                 Key: SPARK-17037
                 URL: https://issues.apache.org/jira/browse/SPARK-17037
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.0.0
            Reporter: Michael Styles

Calling distinct() on a DataFrame whose column names contain periods raises an AnalysisException. For example:

{noformat}
d = [{'pageview.count': 100, 'exit_page': 'example.com/landing'}]
df = sqlContext.createDataFrame(d)
df.distinct()
{noformat}

results in the following error:

pyspark.sql.utils.AnalysisException: u'Cannot resolve column name "pageview.count" among (exit_page, pageview.count);'
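
A possible workaround, not confirmed anywhere in this report, is to rename the affected columns so that they no longer contain periods before calling distinct(). The sketch below only computes the new names in plain Python; sanitize_column_names is a hypothetical helper, and the commented-out usage assumes the df from the example above and PySpark's DataFrame.toDF.

```python
def sanitize_column_names(names, replacement="_"):
    """Return column names with periods replaced, keeping every name unique."""
    seen = set()
    result = []
    for name in names:
        candidate = name.replace(".", replacement)
        base = candidate
        suffix = 1
        # If the sanitized name collides with an earlier one, append a counter.
        while candidate in seen:
            candidate = "{}{}{}".format(base, replacement, suffix)
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result


# Hypothetical usage with the DataFrame from the report (requires PySpark):
# df = df.toDF(*sanitize_column_names(df.columns))
# df.distinct()
```

Renaming changes the schema seen by downstream code, so this only suits cases where the original dotted names do not have to be preserved.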