SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027


   > Can you be more specific about the problem? Are you saying that the actual 
file schema doesn't match the table schema specified by the user?
   
   In the case of ORC data created by Hive, the physical file schema contains no real field names (the columns are written with placeholder names such as `_col0`, `_col1`, ...). Please see the code below for reference.
   
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133
   
   So this code computes the index of the requested column by its position in the dataSchema.
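   As a rough standalone sketch of that positional resolution (plain Scala, no Spark; the names `physicalNames` and `tableSchema` are illustrative, not Spark's actual identifiers):

   ```scala
   object PositionalMatchSketch {
     def main(args: Array[String]): Unit = {
       // Hive-written ORC: the physical schema carries placeholder names only.
       val physicalNames = Seq("_col0", "_col1", "_col2")
       // Real column names come from the table (data) schema in the metastore.
       val tableSchema = Seq("d_date_sk", "d_date_id", "d_date")
       // Since the physical names are meaningless, a requested column is
       // resolved to its POSITION in the table schema.
       val requestedIds = Seq("d_date_id").map(tableSchema.indexOf)
       println(requestedIds) // List(1)
     }
   }
   ```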
   
   Whereas in the code below, we pass in the result schema, and that result schema does not contain the index that was computed from the dataSchema in OrcUtils.scala.
   
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211
   
   For example - 
   
   ```scala
   val u = """select date_dim.d_date_id from date_dim limit 5"""

   spark.sql(u).collect
   ```
   
   Here the index of `d_date_id` returned by OrcUtils.scala#L133 is 2,

   whereas the resultSchema passed to OrcFileFormat.scala#L211 contains only a single field, struct<`d_date_id`:string>, so index 2 is out of range for it.
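   Putting the two halves together, a minimal sketch of the mismatch (plain Scala, no Spark; `dataSchema`/`resultSchema` and the column layout are hypothetical, chosen so that `d_date_id` lands at index 2 as in the example above):

   ```scala
   object IndexMismatchSketch {
     def main(args: Array[String]): Unit = {
       // Full table schema: column ids are computed positionally against this.
       val dataSchema = Seq("d_date_sk", "d_month_seq", "d_date_id")
       // Result schema for "select d_date_id ...": only the projected column.
       val resultSchema = Seq("d_date_id")

       // OrcUtils-style lookup: position of the column in the DATA schema.
       val requestedId = dataSchema.indexOf("d_date_id")
       println(s"requested id from dataSchema = $requestedId") // 2

       // That id is then used against the RESULT schema, which is too short.
       val inRange = requestedId < resultSchema.length
       println(s"id valid in resultSchema? $inRange") // false
     }
   }
   ```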
   

