liangrui1988 opened a new issue, #1939:
URL: https://github.com/apache/orc/issues/1939

   
   Hi, I have a few questions.
   Background: one file in the 2023-02-11 partition fails with a read 
exception (log below). It is currently read with Spark 3.2.1 and was written 
by Spark 2.4, and there are records of it being read successfully at that 
time. The ORC version changed several times in between.
   
   So far I have found 3 partitions with this kind of exception (each 
partition has only one problematic file).
   My question: how can I determine which ORC version is needed to read this 
ORC file correctly?
   I ask because I have tried every one of
   orc-tools-1.3.x-uber.jar
   orc-tools-1.4.x-uber.jar
   ...
   orc-tools-1.9.x-uber.jar
   (the latest release of each line), and each of them fails on this file 
with the same exception.
   I also modified the orc-tools source code to set 
orc.compress.size=262144*20, but then other exceptions appear, so the ORC 
content itself does seem to be unreadable. Yet there are records of this file 
being read successfully before.
   How would you suggest debugging this? How can I read this ORC file and 
extract its data?
   Thank you.
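
One way to check what a file says about itself, without depending on any particular orc-tools release, is to read the postscript at the very end of the file. The sketch below assumes the layout described in the ORC specification as I understand it: the last byte of the file is the postscript length, and the postscript is an uncompressed protobuf message with `compression` = field 2, `compressionBlockSize` = field 3, `version` = field 4 (packed), `writerVersion` = field 6 and `magic` = field 8000. The helper names and the `SAMPLE` bytes are made up for illustration. `SAMPLE` was built by hand to mirror the `meta` output (SNAPPY, 262144, version 0.12), and the value 7 for `writerVersion` is only my assumption for the numeric id of ORC_517.

```python
import os


def read_varint(buf, pos):
    """Decode one protobuf base-128 varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_postscript(ps):
    """Return {field_number: value} for varint and length-delimited fields."""
    fields, pos = {}, 0
    while pos < len(ps):
        key, pos = read_varint(ps, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:          # varint
            value, pos = read_varint(ps, pos)
        elif wire_type == 2:        # length-delimited (bytes, packed ints)
            length, pos = read_varint(ps, pos)
            value = bytes(ps[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
        fields[number] = value
    return fields


def unpack_varints(data):
    """Decode a packed repeated varint field into a list of ints."""
    values, pos = [], 0
    while pos < len(data):
        value, pos = read_varint(data, pos)
        values.append(value)
    return values


def read_postscript(path):
    """Read and parse the postscript of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]               # last byte = postscript length
        f.seek(-(ps_len + 1), os.SEEK_END)
        return parse_postscript(f.read(ps_len))


# Hand-built postscript (an assumption, not bytes taken from 000221_0).
SAMPLE = bytes([0x08, 0x64,               # footerLength = 100 (arbitrary)
                0x10, 0x02,               # compression = 2 (SNAPPY)
                0x18, 0x80, 0x80, 0x10,   # compressionBlockSize = 262144
                0x22, 0x02, 0x00, 0x0C,   # version = [0, 12]
                0x30, 0x07,               # writerVersion = 7 (assumed id)
                0x82, 0xF4, 0x03, 0x03]) + b"ORC"   # magic = "ORC"

fields = parse_postscript(SAMPLE)
print("compression kind:", fields[2])
print("compression block size:", fields[3])
print("file version:", ".".join(map(str, unpack_varints(fields[4]))))
print("writer version id:", fields[6])
```

Calling `read_postscript("000221_0")` on the real file should report the same values that `meta` prints before the stripe list. Since `meta` already gets that far, the postscript itself is probably intact, and the damage is more likely somewhere in the stripe data.
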
   
   ```
   java -jar orc-tools-1.5.6-uber.jar meta 000221_0 
   log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
   log4j:WARN Please initialize the log4j system properly.
   log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   Processing data file 000221_0 [length: 311578064]
   Structure for 000221_0
   File Version: 0.12 with ORC_517
   Rows: 19072149
   Compression: SNAPPY
   Compression size: 262144
   Type: struct<uid:bigint,object:bigint,attr:int,updatetime:string,appdata:string>
   
   Stripe Statistics:
     Stripe 1:
       Column 0: count: 7562 hasNull: false
       Column 1: count: 7562 hasNull: false min: 677081 max: 2933721627 sum: 20406373220886
       Column 2: count: 7562 hasNull: false min: 588 max: 2933714767 sum: 20128755480030
       Column 3: count: 7562 hasNull: false min: 1 max: 1 sum: 7562
       Column 4: count: 7562 hasNull: false min: 2023-02-11 00:00:38 max: 2023-02-11 23:59:47 sum: 143678
       Column 5: count: 7562 hasNull: false min: attentionInterface max: yypadapt sum: 97419
     Stripe 2:
       Column 0: count: 3031906 hasNull: false
       Column 1: count: 3031906 hasNull: false min: 0 max: 254227973 sum: 335994680295336
       Column 2: count: 3031906 hasNull: false min: 0 max: 4293656576 sum: 2715250187650210
       Column 3: count: 3031906 hasNull: false min: 0 max: 1 sum: 2848516
       Column 4: count: 3031906 hasNull: false min: 1970-01-01 08:00:00 max: 2023-02-10 23:33:18 sum: 57606214
       Column 5: count: 3031906 hasNull: false min: LittleArt max: zone sum: 26158889
     Stripe 3:
       Column 0: count: 3252488 hasNull: false
       Column 1: count: 3252488 hasNull: false min: 254227973 max: 804847112 sum: 1874185328076757
       Column 2: count: 3252488 hasNull: false min: 0 max: 3871325159 sum: 3002506141474388
       Column 3: count: 3252488 hasNull: false min: 0 max: 1 sum: 3085396
       Column 4: count: 3252488 hasNull: false min: 1970-01-01 08:00:00 max: 2023-02-10 23:54:27 sum: 61797272
       Column 5: count: 3252488 hasNull: false min: LittleArt max: zone sum: 28344969
     Stripe 4:
       Column 0: count: 3326815 hasNull: false
       Column 1: count: 3326815 hasNull: false min: 804847112 max: 1169323168 sum: 3247348759465597
       Column 2: count: 3326815 hasNull: false min: 0 max: 4294967295 sum: 3346008539078461
       Column 3: count: 3326815 hasNull: false min: 0 max: 1 sum: 3136008
       Column 4: count: 3326815 hasNull: false min: 1970-01-01 08:00:00 max: 2023-02-10 23:49:22 sum: 63209485
       Column 5: count: 3326815 hasNull: false min: LittleArt max: zone sum: 31812118
     Stripe 5:
       Column 0: count: 3449497 hasNull: false
       Column 1: count: 3449497 hasNull: false min: 1169323334 max: 1364554387 sum: 4350425743363666
       Column 2: count: 3449497 hasNull: false min: 0 max: 140960826654730 sum: 3590689504624478
       Column 3: count: 3449497 hasNull: false min: 0 max: 1 sum: 3332407
       Column 4: count: 3449497 hasNull: false min: 2015-02-14 20:58:43 max: 2023-02-10 22:16:37 sum: 65540443
       Column 5: count: 3449497 hasNull: false min: LittleArt max: zone sum: 33889705
     Stripe 6:
       Column 0: count: 3285861 hasNull: false
       Column 1: count: 3285861 hasNull: false min: 1364554387 max: 1859624746 sum: 5176693436169696
       Column 2: count: 3285861 hasNull: false min: 1 max: 2933312186 sum: 3719658121533510
       Column 3: count: 3285861 hasNull: false min: 0 max: 1 sum: 3105437
       Column 4: count: 3285861 hasNull: false min: 2014-12-03 12:22:33 max: 2023-02-10 23:54:39 sum: 62431359
       Column 5: count: 3285861 hasNull: false min: LittleArt max: zone sum: 46929366
     Stripe 7:
       Column 0: count: 2718020 hasNull: false
       Column 1: count: 2718020 hasNull: false min: 1859624760 max: 2843928969 sum: 6525427773935358
       Column 2: count: 2718020 hasNull: false min: 1 max: 4294967295 sum: 4562575441612432
       Column 3: count: 2718020 hasNull: false min: 0 max: 1 sum: 2717848
       Column 4: count: 2718020 hasNull: false min: 2013-08-17 15:56:00 max: 2023-02-10 23:59:34 sum: 51642380
       Column 5: count: 2718020 hasNull: false min: LittleArt max: zone sum: 41734941
   
   File Statistics:
     Column 0: count: 19072149 hasNull: false
     Column 1: count: 19072149 hasNull: false min: 0 max: 2933721627 sum: 21530482094527296
     Column 2: count: 19072149 hasNull: false min: 0 max: 140960826654730 sum: 20956816691453509
     Column 3: count: 19072149 hasNull: false min: 0 max: 1 sum: 18233174
     Column 4: count: 19072149 hasNull: false min: 1970-01-01 08:00:00 max: 2023-02-11 23:59:47 sum: 362370831
     Column 5: count: 19072149 hasNull: false min: LittleArt max: zone sum: 208967407
   
   Stripes:
     Stripe: offset: 3 data: 108812 rows: 7562 tail: 143 index: 224
       Stream: column 0 section ROW_INDEX start: 3 length 12
       Stream: column 1 section ROW_INDEX start: 15 length 37
       Stream: column 2 section ROW_INDEX start: 52 length 36
       Stream: column 3 section ROW_INDEX start: 88 length 26
       Stream: column 4 section ROW_INDEX start: 114 length 60
       Stream: column 5 section ROW_INDEX start: 174 length 53
       Stream: column 1 section DATA start: 227 length 23543
       Stream: column 2 section DATA start: 23770 length 33929
       Stream: column 3 section DATA start: 57699 length 101
       Stream: column 4 section DATA start: 57800 length 49194
       Stream: column 4 section LENGTH start: 106994 length 355
       Stream: column 5 section DATA start: 107349 length 1576
       Stream: column 5 section LENGTH start: 108925 length 15
       Stream: column 5 section DICTIONARY_DATA start: 108940 length 99
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2
       Encoding column 2: DIRECT_V2
       Encoding column 3: DIRECT_V2
       Encoding column 4: DIRECT_V2
       Encoding column 5: DICTIONARY_V2[10]
     Stripe: offset: 109182 data: 53900850 rows: 3031906 tail: 157 index: 33222
       Stream: column 0 section ROW_INDEX start: 109182 length 149
       Stream: column 1 section ROW_INDEX start: 109331 length 7942
       Stream: column 2 section ROW_INDEX start: 117273 length 7305
       Stream: column 3 section ROW_INDEX start: 124578 length 3516
       Stream: column 4 section ROW_INDEX start: 128094 length 10205
       Stream: column 5 section ROW_INDEX start: 138299 length 4105
       Stream: column 1 section DATA start: 142404 length 5793729
       Stream: column 2 section DATA start: 5936133 length 13558058
       Stream: column 3 section DATA start: 19494191 length 347433
       Stream: column 4 section DATA start: 19841624 length 31301800
       Stream: column 4 section LENGTH start: 51143424 length 143687
       Stream: column 5 section DATA start: 51287111 length 2755665
       Stream: column 5 section LENGTH start: 54042776 length 60
       Stream: column 5 section DICTIONARY_DATA start: 54042836 length 418
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2
       Encoding column 2: DIRECT_V2
       Encoding column 3: DIRECT_V2
       Encoding column 4: DIRECT_V2
       Encoding column 5: DICTIONARY_V2[55]
     Stripe: offset: 54043411 data: 52730936 rows: 3252488 tail: 157 index: 37205
       Stream: column 0 section ROW_INDEX start: 54043411 length 158
       Stream: column 1 section ROW_INDEX start: 54043569 length 9097
       Stream: column 2 section ROW_INDEX start: 54052666 length 8156
       Stream: column 3 section ROW_INDEX start: 54060822 length 3971
       Stream: column 4 section ROW_INDEX start: 54064793 length 11437
       Stream: column 5 section ROW_INDEX start: 54076230 length 4386
       Stream: column 1 section DATA start: 54080616 length 6630706
       Stream: column 2 section DATA start: 60711322 length 11987943
       Stream: column 3 section DATA start: 72699265 length 297188
       Stream: column 4 section DATA start: 72996453 length 31415283
       Stream: column 4 section LENGTH start: 104411736 length 154156
       Stream: column 5 section DATA start: 104565892 length 2245188
       Stream: column 5 section LENGTH start: 106811080 length 60
       Stream: column 5 section DICTIONARY_DATA start: 106811140 length 412
       Encoding column 0: DIRECT
       Encoding column 1: DIRECT_V2
       Encoding column 2: DIRECT_V2
       Encoding column 3: DIRECT_V2
       Encoding column 4: DIRECT_V2
       Encoding column 5: DICTIONARY_V2[55]
   Exception in thread "main" java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 7133168
           at org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:212)
           at org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:263)
           at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:250)
           at java.io.InputStream.read(InputStream.java:101)
           at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
           at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
           at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
           at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:11144)
           at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:11108)
           at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:11213)
           at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:11208)
           at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
           at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
           at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
           at org.apache.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:11441)
           at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:275)
           at org.apache.orc.impl.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:311)
           at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:343)
           at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:274)
           at org.apache.orc.tools.FileDump.main(FileDump.java:135)
           at org.apache.orc.tools.Driver.main(Driver.java:108)
   ```
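
To make the exception above easier to reason about, here is a rough sketch of the check that seems to fail in `readHeader`. It assumes the chunk framing described in the ORC specification as I understand it: every compressed chunk starts with a 3-byte little-endian header whose value is `chunk_length * 2 + is_original`. The function names are hypothetical. The bytes `E0 AF D9` are not taken from the file; they are just one sequence that decodes to the `needed = 7133168` in the message.

```python
def decode_chunk_header(header):
    """Decode a 3-byte ORC compression chunk header.

    Returns (chunk_length, is_original). The header is read as a
    little-endian integer equal to chunk_length * 2 + is_original.
    """
    if len(header) != 3:
        raise ValueError("a chunk header is exactly 3 bytes")
    value = int.from_bytes(header, "little")
    return value >> 1, bool(value & 1)


def check_chunk(header, buffer_size):
    """Mimic the size check: return an error message, or None if it passes."""
    length, _ = decode_chunk_header(header)
    if length > buffer_size:
        return f"Buffer size too small. size = {buffer_size} needed = {length}"
    return None


# Hypothetical header bytes that would produce the message in the trace.
suspect = bytes([0xE0, 0xAF, 0xD9])
print(decode_chunk_header(suspect))
print(check_chunk(suspect, 262144))

# A plausible header for comparison: a 100000-byte compressed chunk.
print(decode_chunk_header(bytes([0x40, 0x0D, 0x03])))
```

A chunk length of 7133168 is about 27 times the file's declared compression size of 262144, which a writer using that block size should never produce. My guess is that the three bytes being read as a header are not a header at all: either the stripe footer is being looked for at the wrong offset, or the bytes at that location are damaged. The trace shows the failure happening inside `readStripeFooter`, after the streams of the first three stripes were printed, so the stripes after those are where I would look first.
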
   

