Github user zheh12 commented on the issue:

    https://github.com/apache/spark/pull/21554
  
    I understand that this is the SQL-standard behavior.
    
    But I wonder how using `query.schema` here would affect the by-position logic.
    
    I think we should let the data source implementation decide whether to resolve columns by position or by name.
    
    For example, the kudu-spark implementation chooses by-name resolution with this mapping:
    
    ```scala
    val indices: Array[(Int, Int)] = schema.fields.zipWithIndex.map({ case (field, sparkIdx) =>
      sparkIdx -> table.getSchema.getColumnIndex(field.name)
    })
    ```
    
    But because we currently pass the wrong schema, the mapping always ends up as something like (0,0), (1,1), which is effectively by-position.
    
    But I think this code is meant to be by-name, because a Kudu schema must put the primary key columns first, so its column order usually differs from the input schema.
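
    To make this concrete, here is a small self-contained sketch (not the actual kudu-spark code; the column names and orders are made up) of how the mapping collapses to the identity when the DataFrame carries the table schema instead of `query.schema`:

    ```scala
    // Hypothetical column order of the external Kudu table (primary key first),
    // standing in for table.getSchema.
    val tableColumns = Seq("id", "name", "value")

    // Hypothetical column order produced by the user's query.
    val queryColumns = Seq("name", "value", "id")

    // By-name mapping, as the kudu-spark code intends: sparkIdx -> table column index.
    val byName = queryColumns.zipWithIndex.map { case (name, sparkIdx) =>
      sparkIdx -> tableColumns.indexOf(name)
    }
    // byName == Seq(0 -> 1, 1 -> 2, 2 -> 0): a real reordering.

    // If the incoming DataFrame is built with the *table* schema instead of
    // query.schema, the field names already match the table, and the mapping
    // collapses to the identity, i.e. plain by-position:
    val collapsed = tableColumns.zipWithIndex.map { case (name, sparkIdx) =>
      sparkIdx -> tableColumns.indexOf(name)
    }
    // collapsed == Seq(0 -> 0, 1 -> 1, 2 -> 2)
    ```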
    
    If we create the DataFrame with `query.schema`, by-position resolution still works with no error, and it also gives the data source the chance to choose between by-name and by-position.
    
    But as it stands, the data source is forced to be by-position.
    
    Moreover, as a developer, if I choose to implement `InsertableRelation`:
    ```scala
    trait InsertableRelation {
      def insert(data: DataFrame, overwrite: Boolean): Unit
    }
    ```
    
    I may receive the wrong schema, and there is no way for me to tell that anything is wrong with the DataFrame.
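
    For instance, here is a rough sketch (the class and `targetColumns` below are made up, not a real data source) of an `InsertableRelation` that tries to reorder columns by name, showing why the problem stays invisible:

    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.sources.InsertableRelation

    // Hypothetical relation; `targetColumns` stands in for the external
    // table's column order (e.g. primary key first).
    class MyRelation(targetColumns: Seq[String]) extends InsertableRelation {
      override def insert(data: DataFrame, overwrite: Boolean): Unit = {
        // Intended by-name resolution: reorder the incoming columns to match
        // the target table. But if `data` was built with the target table's
        // schema instead of query.schema, the field names already match, this
        // select is a no-op, and the write silently falls back to by-position.
        val reordered = data.select(targetColumns.map(data.col): _*)
        // ... write `reordered` to the external system (omitted) ...
      }
    }
    ```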
    
    @cloud-fan Does my understanding sound right?
    


