[I] Cannot query iceberg tables through thrift server with odbc, but maintenance procedures work fine [iceberg]

via GitHub Tue, 15 Apr 2025 17:07:35 -0700


fuzing opened a new issue, #12804:
URL: https://github.com/apache/iceberg/issues/12804


   ### Query engine
   
   spark 3.5
   
   ### Question
   
   
   I'm having difficulty querying Iceberg tables through the Spark Thrift Server over ODBC.
   Calling procedures such as system.rewrite_data_files and system.expire_snapshots
   is not a problem: they work fine.
   
   I'm using a spark-iceberg Docker container, and its entrypoint.sh looks similar to:
   
   ```
   start-master.sh -p 7077
   
   start-worker.sh spark://spark-iceberg:7077
   
   start-history-server.sh
   
   start-thriftserver.sh --driver-java-options "-Dderby.system.home=/tmp/derby" \
     --packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-aws-bundle:1.8.1" \
     --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
     --conf "spark.sql.catalogImplementation=in-memory" \
     --conf "spark.sql.defaultCatalog=somecatalog" \
     --conf "spark.sql.catalog.somecatalog=org.apache.iceberg.spark.SparkCatalog" \
     --conf "spark.sql.catalog.somecatalog.warehouse=s3://bucket-name" \
     --conf "spark.sql.catalog.somecatalog.s3.endpoint=http://minio:9005" \
     --conf "spark.sql.catalog.somecatalog.uri=http://iceberg-rest:8181" \
     --conf "spark.sql.catalog.somecatalog.type=rest" \
     --conf "spark.sql.catalog.somecatalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO"
   
   tail -f /dev/null
   
   ```
   
   Running the following through ODBC works fine (i.e. snapshots are expired):
   ```
       CALL system.expire_snapshots(
        table => 'somecatalog.analytics.events',
        older_than => TIMESTAMP '2025-04-01 00:00:00.000');
   ```
   
   However, the following query does not:
   ```
   SELECT * FROM somecatalog.analytics.events LIMIT 5;
   ```
   
   It comes back with the following error:
   ```
    Table or view not found: somecatalog.analytics.events
   ```
   
   If I run the same query with spark-sql, it does work:
   
   ```
   spark-sql \
     --packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1,org.apache.iceberg:iceberg-aws-bundle:1.8.1" \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.defaultCatalog=somecatalog \
     --conf spark.sql.catalogImplementation=in-memory \
     --conf spark.sql.catalog.somecatalog.uri=http://iceberg-rest:8181/ \
     --conf spark.sql.catalog.somecatalog.type=rest \
     --conf spark.sql.catalog.fusing.warehouse=s3://bucket-name \
     --conf spark.sql.catalog.somecatalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.somecatalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
     --conf spark.sql.catalog.somecatalog.s3.endpoint=http://minio:9005
   ```
   
   Then:
   ```
   SELECT * FROM somecatalog.analytics.events LIMIT 5;
   ```
   
   This works fine. I'm also using StarRocks for read-only access to the same
   catalog and table, and querying through it works fine as well.
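
Since the two launch commands above are typed out separately, one way to rule out a configuration mismatch between them is to keep the catalog settings in a single place and pass the same list to both. The following is only a minimal sketch of that idea, not a confirmed fix: the helper name `iceberg_conf` is hypothetical, the values are copied from the thriftserver command in this report, and the warehouse key is assumed to belong to `somecatalog`.

```shell
# Hypothetical helper: prints the shared Iceberg catalog options, one per line.
# Values are copied from the thriftserver command in this report.
CATALOG=somecatalog

iceberg_conf() {
  printf '%s\n' \
    "--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
    "--conf spark.sql.catalogImplementation=in-memory" \
    "--conf spark.sql.defaultCatalog=${CATALOG}" \
    "--conf spark.sql.catalog.${CATALOG}=org.apache.iceberg.spark.SparkCatalog" \
    "--conf spark.sql.catalog.${CATALOG}.type=rest" \
    "--conf spark.sql.catalog.${CATALOG}.uri=http://iceberg-rest:8181" \
    "--conf spark.sql.catalog.${CATALOG}.warehouse=s3://bucket-name" \
    "--conf spark.sql.catalog.${CATALOG}.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
    "--conf spark.sql.catalog.${CATALOG}.s3.endpoint=http://minio:9005"
}

# Intended usage (not executed here). The substitution is left unquoted on
# purpose so each line splits into "--conf" and its value; this is safe only
# because none of the values contain whitespace.
#   start-thriftserver.sh $(iceberg_conf)
#   spark-sql $(iceberg_conf)
```

Running `iceberg_conf` on its own prints the options, which makes it easy to compare them against what each process was actually started with.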
   
   Any assistance on how to get the thriftserver working correctly for SQL 
queries would be much appreciated.
   
   Thank you!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

