[ https://issues.apache.org/jira/browse/SPARK-19742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885399#comment-15885399 ]
Song Jun commented on SPARK-19742:
----------------------------------

This is expected; see the doc comment on insertInto:

{code}
  /**
   * Inserts the content of the `DataFrame` to the specified table. It requires that
   * the schema of the `DataFrame` is the same as the schema of the table.
   *
   * @note Unlike `saveAsTable`, `insertInto` ignores the column names and just uses position-based
   * resolution. For example:
   *
   * {{{
   *    scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
   *    scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")
   *    scala> Seq((5, 6)).toDF("a", "b").write.insertInto("t1")
   *    scala> sql("select * from t1").show
   *    +---+---+
   *    |  i|  j|
   *    +---+---+
   *    |  5|  6|
   *    |  3|  4|
   *    |  1|  2|
   *    +---+---+
   * }}}
   *
   * Because it inserts data to an existing table, format or options will be ignored.
   *
   * @since 1.4.0
   */
  def insertInto(tableName: String): Unit = {
    insertInto(df.sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName))
  }
{code}

> When using SparkSession to write a dataset to Hive the schema is ignored
> ------------------------------------------------------------------------
>
>                 Key: SPARK-19742
>                 URL: https://issues.apache.org/jira/browse/SPARK-19742
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.1
>         Environment: Running on Ubuntu with HDP 2.4.
>            Reporter: Navin Goel
>
> I am saving a Dataset that is created from reading a JSON file and some selects
> and filters into a Hive table. The dataset.write().insertInto function does
> not look at the schema when writing to the table but instead writes the columns
> in order to the Hive table.
> The schemas for both the tables are the same.
> Schema printed from Spark of the dataset being written:
> StructType(StructField(countrycode,StringType,true),
> StructField(systemflag,StringType,true),
> StructField(classcode,StringType,true),
> StructField(classname,StringType,true),
> StructField(rangestart,StringType,true),
> StructField(rangeend,StringType,true),
> StructField(tablename,StringType,true),
> StructField(last_updated_date,TimestampType,true))
> Schema of the dataset after loading the same table from Hive:
> StructType(StructField(systemflag,StringType,true),
> StructField(RangeEnd,StringType,true),
> StructField(classcode,StringType,true),
> StructField(classname,StringType,true),
> StructField(last_updated_date,TimestampType,true),
> StructField(countrycode,StringType,true),
> StructField(rangestart,StringType,true),
> StructField(tablename,StringType,true))
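
Since insertInto resolves columns by position, one possible workaround for the case reported here is to reorder the DataFrame's columns to match the target table before writing. The following is only a minimal sketch, not something proposed in this thread: the object name InsertByName, the helper alignTo, and the local SparkSession are hypothetical, the table t1 is borrowed from the doc comment's example, and it assumes the DataFrame contains every column of the table under the same name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object InsertByName {

  // Hypothetical helper: select df's columns in the order given by
  // targetColumns, so that a later position-based insertInto lines up by name.
  // Fails at analysis time if df lacks one of the target columns.
  def alignTo(df: DataFrame, targetColumns: Array[String]): DataFrame =
    df.select(targetColumns.map(c => col(c)): _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; the reporter's setup
    // would use a Hive-enabled session instead.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("insert-by-name")
      .getOrCreate()
    import spark.implicits._

    // Same starting table as in the doc comment's example.
    Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")

    // The incoming data has its columns in a different order: (j, i).
    val incoming = Seq((3, 4)).toDF("j", "i")

    // Read the table's column order and reorder before the positional insert,
    // so the new row is stored with i = 4 and j = 3.
    val aligned = alignTo(incoming, spark.table("t1").columns)
    aligned.write.insertInto("t1")

    spark.table("t1").show()
    spark.stop()
  }
}
```

Reading the column order from spark.table(...).columns keeps the target table as the single source of truth, so the write keeps working if the table's column order differs from what the producing job expects.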