[GitHub] spark issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields...

rdblue Mon, 01 Oct 2018 12:20:02 -0700

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/22573
  
    @dongjoon-hyun, Iceberg schema evolution is based on the field IDs, not on 
names. The current table schema's names are the runtime names for columns in 
that table, and all reads happen by first translating those names to IDs and 
projecting the IDs from the data files. That way, renames can never cause you 
to get incorrect data.
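    A minimal sketch of the ID-based read path described above. It assumes a toy model in which a data file stores each value keyed by its field ID. `Field`, `resolve_ids`, and `project_by_id` are hypothetical names for illustration only; this is not Iceberg's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical schema entry: a stable field ID plus its current name."""
    field_id: int
    name: str


def resolve_ids(table_schema, requested_names):
    """Translate runtime column names to field IDs using the current table schema."""
    by_name = {f.name: f.field_id for f in table_schema}
    return [by_name[n] for n in requested_names]


def project_by_id(data_file_rows, ids):
    """Read values from a data file whose columns are keyed by field ID, not name."""
    return [tuple(row.get(i) for i in ids) for row in data_file_rows]


# Data written when field 2 was still called "email".
old_file = [{1: 101, 2: "a@example.com"}, {1: 102, 2: "b@example.com"}]

# Later the column is renamed to "contact"; its ID (2) does not change.
current_schema = [Field(1, "id"), Field(2, "contact")]

ids = resolve_ids(current_schema, ["id", "contact"])
rows = project_by_id(old_file, ids)
print(rows)  # [(101, 'a@example.com'), (102, 'b@example.com')]
```

    Because the old file is read by ID, the rename never changes which values come back.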
    
    You're mostly right that Spark has a problem with schema evolution for 
HadoopFS tables. That wouldn't affect my suggestion here, though. If you're 
filtering or projecting field `m.n`, then Spark currently handles that by 
matching columns by name. If you're matching by name, then `m.n` can't change 
across versions, or at least you can always project `m.n` from the data (in the 
case of Avro).
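    By contrast, name-based matching resolves `m.n` by walking field names in each file's own schema. The sketch below is hypothetical: `get_path` and `filter_by_name` are not Spark functions, records are plain dicts, and a file that lacks `m.n` yields `None`, loosely mimicking an Avro reader schema that gives the field a null default.

```python
def get_path(record, path):
    """Resolve a dotted path such as "m.n" by field name.

    Returns None when any level is missing, standing in for a reader
    schema that gives the field a null default.
    """
    value = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def filter_by_name(records, path, predicate):
    """Keep records whose value at `path` is present and satisfies `predicate`."""
    kept = []
    for record in records:
        value = get_path(record, path)
        if value is not None and predicate(value):
            kept.append(record)
    return kept


# Rows from data files written under different schema versions.
files = [
    {"id": 1, "m": {"n": 5, "o": 1}},    # older schema
    {"id": 2, "m": {"n": 7, "p": "x"}},  # a sibling field was added to m
    {"id": 3, "m": {"o": 3}},            # m.n absent in this file
]

matches = filter_by_name(files, "m.n", lambda v: v > 4)
print([r["id"] for r in matches])  # [1, 2]
```

    As long as the name `m.n` is matched the same way in every file, a filter on it stays well defined across versions.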


