[GitHub] [iceberg] flyxu1991 commented on pull request #3454: Core: add delete option for bin-packing

GitBox Wed, 10 Nov 2021 04:10:48 -0800


flyxu1991 commented on pull request #3454:
URL: https://github.com/apache/iceberg/pull/3454#issuecomment-965071365



   Hi, I used Spark 3.0.3 to rewrite data files with this PR, but I get an error like this:
   > org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'default_iceberg' plugin class not found: spark.sql.catalog.default_iceberg is not defined
           at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:51)
           at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:52)
   ERROR BaseRewriteDataFilesSparkAction: Cannot complete rewrite, partial-progress.enabled is not enabled and one of the file set groups failed to be rewritten. This error occurred during the writing of new files, not during the commit process
   
   Here is my code:
   
   > SparkSession spark = SparkSession
                   .builder()
                   .appName("small_file_rewrite_user_rcc")
                   .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
                   .config("spark.sql.catalog.hadoop_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
                   .config("spark.sql.catalog.hadoop_catalog.type", "hadoop")
                   .config("spark.sql.catalog.hadoop_catalog.warehouse", basePath)
                   .getOrCreate();
   
           HadoopCatalog hadoopCatalog = new HadoopCatalog(spark.sparkContext().hadoopConfiguration(), basePath);
           TableIdentifier tableIdentifier = TableIdentifier.of("hadoop_database", "zx_user_rcc_realtime");
           Table userRccTable = hadoopCatalog.loadTable(tableIdentifier);
   
           System.out.println(userRccTable.location());
   
           SparkActions.get(spark)
                   .rewriteDataFiles(userRccTable)
                   .option(BinPackStrategy.MIN_FILE_SIZE_BYTES, "0")
                   .option(RewriteDataFiles.TARGET_FILE_SIZE_BYTES, 512 * 1024 * 1024 + "")
                   .option(BinPackStrategy.MIN_DELETES_PER_FILE, "1")
                   .execute();
           spark.close();
   
   Is there something wrong with my code, or is there something I missed?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


