SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027

> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?

Yes. For ORC data created by Hive, there are no field names in the physical schema. Please see the code below for reference:

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133

This code returns the index of each requested column within the data schema. However, in the code below we pass the result schema, and that result schema does not contain the index returned from `OrcUtils.scala`:

https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211

For example:

```
val u = """select date_dim.d_date_id from date_dim limit 5"""
spark.sql(u).collect
```

Here the index for `d_date_id` returned by `OrcUtils.scala#L133` is 2, whereas the `resultSchema` passed at `OrcFileFormat.scala#L211` contains only one field: `struct<d_date_id:string>`.
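A minimal sketch of the mismatch being described, assuming a `date_dim` table whose data schema declares `d_date_id` as its third column. The object and field names below are hypothetical illustrations, not Spark's actual implementation; they only mimic how an index computed against the full data schema is out of range for the pruned result schema:

```scala
// Hypothetical sketch of the index mismatch (not Spark code).
object IndexMismatchSketch {
  // Assumed full table schema, in declaration order (plays the role of dataSchema).
  val dataSchema: Seq[String] = Seq("d_date_sk", "d_date", "d_date_id")

  // Pruned schema for `select d_date_id ...` (plays the role of resultSchema).
  val resultSchema: Seq[String] = Seq("d_date_id")

  def main(args: Array[String]): Unit = {
    // Index computed against the *data* schema, as OrcUtils does.
    val idx = dataSchema.indexOf("d_date_id")
    println(s"index from dataSchema: $idx")

    // That index is out of range for the one-field result schema.
    println(s"resultSchema has ${resultSchema.length} field(s)")
    println(s"index valid in resultSchema: ${idx < resultSchema.length}")
  }
}
```

Under these assumptions the lookup yields index 2, but the result schema has only one field, so resolving position 2 against it cannot succeed.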
> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user? So in case of orc data created by the hive no field names in the physical schema. Please find the below code for reference. https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133 So from this code we are sending the index of the col from the dataschema. But Where as in the below code , we are passing the input result schema and that result schema will not have that index number that is passed from OrcUtils.scala https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211 For example - ``` val u = """select date_dim.d_date_id from date_dim limit 5""" spark.sql(u).collect ``` Here the value of index(d_date_id) returned by the OrcUtils.scala#L133 is 2 where the resultSchema passed in OrcFileFormat.scala#L211 is having only one struct<`d_date_id`:string> ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org