flyxu1991 commented on pull request #3454:
URL: https://github.com/apache/iceberg/pull/3454#issuecomment-965071365
Hi, I used Spark 3.0.3 to rewrite data files with this PR, but I get an error
like this:
> org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'default_iceberg' plugin class not found: spark.sql.catalog.default_iceberg is not defined
    at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:51)
    at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:52)
ERROR BaseRewriteDataFilesSparkAction: Cannot complete rewrite, partial-progress.enabled is not enabled and one of the file set groups failed to be rewritten. This error occurred during the writing of new files, not during the commit process
Here is my code:
> SparkSession spark = SparkSession
    .builder()
    .appName("small_file_rewrite_user_rcc")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.hadoop_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.hadoop_catalog.type", "hadoop")
    .config("spark.sql.catalog.hadoop_catalog.warehouse", basePath)
    .getOrCreate();
HadoopCatalog hadoopCatalog = new HadoopCatalog(spark.sparkContext().hadoopConfiguration(), basePath);
TableIdentifier tableIdentifier = TableIdentifier.of("hadoop_database", "zx_user_rcc_realtime");
Table userRccTable = hadoopCatalog.loadTable(tableIdentifier);
System.out.println(userRccTable.location());
SparkActions.get(spark)
    .rewriteDataFiles(userRccTable)
    .option(BinPackStrategy.MIN_FILE_SIZE_BYTES, "0")
    .option(RewriteDataFiles.TARGET_FILE_SIZE_BYTES, 512 * 1024 * 1024 + "")
    .option(BinPackStrategy.MIN_DELETES_PER_FILE, "1")
    .execute();
spark.close();
Is there something wrong with my code, or is there something I missed?
>
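The exception names the key spark.sql.catalog.default_iceberg, while the
session above only sets keys under spark.sql.catalog.hadoop_catalog. A
minimal sketch of that mismatch in plain Java, with no Spark or Iceberg
dependency: the class CatalogKeyCheck, its two helper methods, and the
warehouse path "/tmp/warehouse" are hypothetical names made up for
illustration (basePath is not shown in the comment), and the map lookup only
imitates the check described by the error message, not Spark's actual
implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogKeyCheck {
    // Prefix taken from the error message: spark.sql.catalog.<name>
    static final String PREFIX = "spark.sql.catalog.";

    // Mirrors the .config(...) calls from the session builder in the comment.
    // basePath is a placeholder supplied by the caller.
    static Map<String, String> sessionConf(String basePath) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions");
        conf.put(PREFIX + "hadoop_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
        conf.put(PREFIX + "hadoop_catalog.type", "hadoop");
        conf.put(PREFIX + "hadoop_catalog.warehouse", basePath);
        return conf;
    }

    // Hypothetical helper: true when a plugin class key exists for the catalog name.
    static boolean isCatalogDefined(Map<String, String> conf, String name) {
        return conf.containsKey(PREFIX + name);
    }

    public static void main(String[] args) {
        Map<String, String> conf = sessionConf("/tmp/warehouse"); // assumed path
        for (String name : new String[] {"hadoop_catalog", "default_iceberg"}) {
            System.out.println(name + " defined: " + isCatalogDefined(conf, name));
        }
    }
}
```

Running main prints "hadoop_catalog defined: true" and "default_iceberg
defined: false", which matches the catalog name reported in the trace. Why
the rewrite action looks up default_iceberg in the first place is not
established by this sketch.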
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]