Github user zheh12 commented on the issue:

https://github.com/apache/spark/pull/21554

I know this SQL standard. But I wonder how using `query.schema` would affect the by-position logic. I think we should let the data source implementation decide whether to use by-position or by-name.

For example, the kudu-spark implementation decides to use by-name with this mapping:

```
val indices: Array[(Int, Int)] = schema.fields.zipWithIndex.map({ case (field, sparkIdx) =>
  sparkIdx -> table.getSchema.getColumnIndex(field.name)
})
```

But now we pass it the wrong schema, so the mapping is always something like (0,0), (1,1), ...; it is always by-position, even though this code is clearly meant to be by-name. Because Kudu requires the primary key columns to come first, a Kudu table's column order is often different from the column order of other tables.

If we create the DataFrame with `query.schema`, by-position writes still work without errors, and the data source gains the option to choose by-name or by-position. As things stand, the data source is forced to use by-position.

Moreover, as a developer who chooses to implement InsertableRelation

```
trait InsertableRelation {
  def insert(data: DataFrame, overwrite: Boolean): Unit
}
```

I may receive the wrong schema, and I have no way to detect that anything is wrong with the DataFrame.

@cloud-fan Is my understanding correct?
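To make the by-name versus by-position point concrete, here is a minimal, Spark-free sketch of the index mapping quoted above. It assumes a schema can be modeled as an ordered list of column names; `Schema`, `columnIndex`, and `ByNameMapping` are hypothetical stand-ins for Spark's `StructType` and Kudu's `Schema.getColumnIndex`, not the real APIs. It shows that when the DataFrame carries the query's column names the lookup reorders columns, but when it carries the table's own names the same lookup degenerates to the identity (0,0), (1,1), ...

```scala
// Hypothetical stand-in for a table/DataFrame schema: just ordered column names.
final case class Schema(columns: Vector[String]) {
  // Analogous in spirit to Kudu's getColumnIndex: position of a column looked up by name.
  def columnIndex(name: String): Int = {
    val i = columns.indexOf(name)
    require(i >= 0, s"unknown column: $name")
    i
  }
}

object ByNameMapping {
  // Pair each incoming (Spark-side) field position with the target column position, matched by name.
  def indices(incoming: Schema, target: Schema): Vector[(Int, Int)] =
    incoming.columns.zipWithIndex.map { case (name, sparkIdx) =>
      sparkIdx -> target.columnIndex(name)
    }

  def main(args: Array[String]): Unit = {
    // Kudu-like table: primary key column first.
    val table = Schema(Vector("id", "name", "age"))
    // Query output with a different column order.
    val query = Schema(Vector("name", "age", "id"))

    // DataFrame built with the query's names: the lookup really is by-name.
    println(indices(query, table)) // Vector((0,1), (1,2), (2,0))

    // DataFrame relabeled with the table's names: the same lookup becomes by-position.
    println(indices(table, table)) // Vector((0,0), (1,1), (2,2))
  }
}
```

The data source code is identical in both calls; only the schema attached to the incoming data decides whether its by-name logic has any effect.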