beliefer opened a new issue, #11062:
URL: https://github.com/apache/incubator-gluten/issues/11062

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   We have some Hive tables in ORC format. Users want to convert these tables
   from ORC to Parquet. During the transition, some partitions have already
   been converted to Parquet, while the others are still ORC.
   
   We run a query such as:
   select * from intermediate_state_table;
   The query succeeds with vanilla Spark but fails with Gluten, with the
   following error.
   
   ```
   Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: No magic bytes found at end of the Parquet file
   Retriable: False
   Expression: strncmp(copy.data() + readSize - 4, "PAR1", 4) == 0
   Context: Split [Hive: hdfs://path/intermediate_state_table/20240818/part-00000-cc17cd1a-35a6-4ec6-8694-3dcdcb95ccfc-c000 0 - 2545077] Task Gluten_Stage_0_TID_160_VTID_1
   Additional Context: Operator: TableScan[0] 0
   Function: loadFileMetaData
   File: /home/hadoop/gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp
   Line: 216
   Stack trace:
   # 0  _ZN8facebook5velox7process10StackTraceC1Ei
   # 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
   # 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
   # 3  _ZN8facebook5velox7parquet10ReaderBase16loadFileMetaDataEv
   # 4  _ZN8facebook5velox7parquet10ReaderBaseC1ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
   # 5  _ZN8facebook5velox7parquet13ParquetReaderC2ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
   # 6  _ZN8facebook5velox7parquet20ParquetReaderFactory12createReaderESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
   # 7  _ZN8facebook5velox9connector4hive11SplitReader12createReaderEv
   # 8  _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
   # 9  _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
   # 10 _ZN8facebook5velox4exec9TableScan9getOutputEv
   # 11 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE8_clEv
   # 12 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
   # 13 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
   # 14 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
   # 15 _ZN6gluten24WholeStageResultIterator4nextEv
   # 16 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
   # 17 0x00007f48b5018747
   
       at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
       at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
       at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
       ... 27 more
   ```
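
The failing check above compares the last four bytes of the file with `PAR1`, so the error is consistent with a Parquet reader being handed a file from a partition that is still ORC. As a rough way to confirm which partitions are affected, the sketch below guesses a file's format from its magic bytes. It assumes the file has been copied to a local path, and `sniff_format` is a hypothetical helper name, not part of Gluten or Velox.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes
ORC_MAGIC = b"ORC"       # ORC files start with these 3 bytes


def sniff_format(path):
    """Guess a data file's format from its magic bytes (hypothetical helper)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 4:
            # Same position the Velox check inspects: the last 4 bytes.
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[:3] == ORC_MAGIC:
        return "orc"
    return "unknown"
```

If this returns `"orc"` for a copy of the file named in the `Context:` line, the split was most likely read with the wrong reader rather than being a corrupted Parquet file.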
   
   ### Gluten version
   
   _No response_
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

