Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

via GitHub Wed, 25 Jun 2025 07:16:50 -0700


andygrove commented on PR #1920:
URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3004952533


   I added the following method to `FileReader` locally:
   
   ```java
     /** Sets the projected columns to be read later via {@link #readNextRowGroup()} */
     public void setRequestedSchemaFromSpecs(List<ParquetColumnSpec> projection) {
       paths.clear();
       for (ParquetColumnSpec columnSpec : projection) {
         ColumnDescriptor col = Utils.buildColumnDescriptor(columnSpec);
         paths.put(ColumnPath.get(col.getPath()), col);
       }
     }
   ```
   
   I can now compile Iceberg, but I get an exception at runtime, and I do not yet understand why:
   
   ```
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.4.3
         /_/
            
   Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 11.0.27)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> spark.sql(s"CREATE TABLE IF NOT EXISTS t1 (c0 INT, c1 STRING) USING iceberg")
   25/06/25 08:12:29 INFO core/src/lib.rs: Comet native library version 0.9.0 initialized
   25/06/25 08:12:29 WARN CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
    CreateTable [COMET: CreateTable is not supported]
   
   res0: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql(s"INSERT INTO t1 VALUES ${(0 until 10000).map(i => (i, i)).mkString(",")}")
   25/06/25 08:12:32 WARN CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
    AppendData [COMET: AppendData is not supported]
   +-  LocalTableScan [COMET: LocalTableScan is not supported]
   
   res1: org.apache.spark.sql.DataFrame = []                                    
   
   
   scala> spark.sql(s"SELECT * from t1").show()
   25/06/25 08:12:35 WARN CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
    CollectLimit [COMET: CollectLimit is not supported]
   +- Project
      +-  BatchScan spark_catalog.default.t1 [COMET: Unsupported scan: org.apache.iceberg.spark.source.SparkBatchQueryScan. Comet Scan only supports Parquet and Iceberg Parquet file formats, BatchScan spark_catalog.default.t1 is not supported]
   
   25/06/25 08:12:35 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found
   25/06/25 08:12:35 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 32)
   java.lang.NoSuchMethodError: 'void org.apache.comet.parquet.FileReader.setRequestedSchemaFromSpecs(java.util.List)'
        at org.apache.iceberg.parquet.CometVectorizedParquetReader$FileIterator.newCometReader(CometVectorizedParquetReader.java:222)
   ```
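
A `NoSuchMethodError` at runtime, after a successful compile, often means the class loaded at runtime comes from a different jar than the one compiled against. That is only a guess here; nothing in the trace confirms it. A minimal sketch for checking it is below: it reports where a class was loaded from and whether it exposes a given public method. The `ClasspathCheck` class and its helper names are hypothetical, not part of Comet or Iceberg; only standard JDK reflection calls are used.

```java
import java.security.CodeSource;
import java.util.List;

// Hypothetical diagnostic helper; not part of Comet or Iceberg.
class ClasspathCheck {

    /** Returns the jar or directory the class was loaded from, or "unknown" if the JVM does not report one. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null) ? "unknown" : src.getLocation().toString();
    }

    /** Returns true if the class exposes a public method with this name and these parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace above.
        String className = "org.apache.comet.parquet.FileReader";
        String methodName = "setRequestedSchemaFromSpecs";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("loaded from: " + locationOf(cls));
            System.out.println("has " + methodName + "(List): " + hasMethod(cls, methodName, List.class));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Run with the same classpath as the Spark executors, this would show whether the `FileReader` in use is the locally modified build or an older Comet jar.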


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

