williamhyun opened a new pull request #672:
URL: https://github.com/apache/orc/pull/672


   ### What changes were proposed in this pull request?
   
   This PR aims to fix a regression in reading column names that contain a dot character.
   
   ### Why are the changes needed?
   
   Since ORC-696, ORC files with column names that include a dot cannot be read
   correctly. For example, the following test file was read incorrectly.
   ```
   % orc-tools meta core/src/test/resources/col.dot.orc
   Processing data file core/src/test/resources/col.dot.orc [length: 235]
   Structure for core/src/test/resources/col.dot.orc
   File Version: 0.12 with ORC_517
   Rows: 1
   Compression: SNAPPY
   Compression size: 262144
   Calendar: Julian/Gregorian
   Type: struct<`col.dot`:bigint>
   
   Stripe Statistics:
     Stripe 1:
       Column 0: count: 1 hasNull: false
       Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
   
   File Statistics:
     Column 0: count: 1 hasNull: false
     Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 0 max: 0 sum: 0
   
   Stripes:
     Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
       Stream: column 0 section ROW_INDEX start: 3 length 11
       Stream: column 1 section ROW_INDEX start: 14 length 24
       Stream: column 1 section DATA start: 38 length 6
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2
   
   File length: 235 bytes
   Padding length: 0 bytes
   Padding ratio: 0%
   
   User Metadata:
     org.apache.spark.version=3.1.1
   
________________________________________________________________________________________________________________________
   ```
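As an illustration only (this is not the code from the patch, and the PR text does not state the exact cause), the sketch below shows one way a dot inside a name can be misread: splitting a column path on every dot turns the single column `col.dot` into a nested path `col` then `dot`. The class `DottedNames` and the helper `splitPath` are hypothetical names, not part of ORC; the helper assumes backticks quote a name, as in the `Type:` line above, and does not handle escaped backticks.

```java
import java.util.ArrayList;
import java.util.List;

class DottedNames {
    // Hypothetical helper (not an ORC API): split a column path on dots,
    // but keep a backtick-quoted segment together as one name.
    static List<String> splitPath(String path) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '`') {
                // Toggle quoting and drop the backtick itself.
                quoted = !quoted;
            } else if (c == '.' && !quoted) {
                // An unquoted dot separates two path segments.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Unquoted: read as a nested path with two segments.
        System.out.println(splitPath("col.dot"));    // prints [col, dot]
        // Quoted: read as one column whose name contains a dot.
        System.out.println(splitPath("`col.dot`"));  // prints [col.dot]
    }
}
```

A reader that takes the first interpretation for a file whose schema is struct<`col.dot`:bigint> would look for a column named `col` and not find it, which is the kind of mismatch a regression test with `col.dot.orc` can catch.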
   
   
   ### How was this patch tested?
   Pass the CIs with the newly added test case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

