[ https://issues.apache.org/jira/browse/SPARK-16347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin closed SPARK-16347.
-------------------------------
    Resolution: Not A Problem

> DataFrame allows duplicate column-names
> ---------------------------------------
>
>                 Key: SPARK-16347
>                 URL: https://issues.apache.org/jira/browse/SPARK-16347
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Databricks community edition
>                      Scala notebook in Google-Chrome
>                      Linux (Ubuntu 14.04LTS)
>            Reporter: Sanjay Dasgupta
>
> Certain DataFrame APIs allow duplicate column-names. The following code
> illustrates the problem:
>
> case class Row(integer: Int, string1: String, string2: String)
> val rows = spark.sparkContext.parallelize(Seq(Row(1, "one", "one"), Row(2, "two", "two"), Row(3, "three", "three")))
> // DUPLICATED COLUMN-NAMES ...
> val df = rows.toDF("integer", "string", "string")
> df.printSchema()
>
> Here is the output:
>
> root
>  |-- integer: integer (nullable = false)
>  |-- string: string (nullable = true)
>  |-- string: string (nullable = true)
> defined class Row
> rows: org.apache.spark.rdd.RDD[Row] = ParallelCollectionRDD[168] at parallelize at <console>:39
> df: org.apache.spark.sql.DataFrame = [integer: int, string: string ... 1 more field]
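
Since the issue was closed as "Not A Problem", code that wants unique column names would have to check for duplicates itself before calling toDF. The following is only a minimal sketch of such a check in plain Scala. The helper name duplicateColumnNames is hypothetical and is not part of the Spark API, and it compares names exactly as written (case-sensitively), which is an assumption rather than Spark's own resolution rule.

```scala
// Hypothetical helper (not a Spark API): returns, in sorted order,
// every column name that appears more than once in `names`.
// Assumption: names are compared exactly as written (case-sensitive).
def duplicateColumnNames(names: Seq[String]): Seq[String] =
  names
    .groupBy(identity)                                        // name -> all of its occurrences
    .collect { case (name, occurrences) if occurrences.size > 1 => name }
    .toSeq
    .sorted

// The column names used in the report above.
val requested = Seq("integer", "string", "string")
val duplicates = duplicateColumnNames(requested)

if (duplicates.nonEmpty)
  println("Duplicate column names: " + duplicates.mkString(", "))
```

In a notebook like the one in the report, a caller could place something like require(duplicateColumnNames(requested).isEmpty) in front of rows.toDF(requested: _*) so that the duplicate is rejected before the DataFrame is created.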