umathivagit commented on issue #7396: URL: https://github.com/apache/iceberg/issues/7396#issuecomment-2248469430
Dear all, I faced a similar issue and found that the samples in both the Iceberg and MinIO official documentation do not work as given; I also could not find a working example anywhere on the internet for using MinIO as the storage location for Iceberg metadata/data files. Below is a working version of managing an Iceberg catalog on MinIO. The main problems were a version mismatch between the Iceberg Spark runtime and the AWS bundle, plus a few additional options I had to set to get this working. After a successful run you should see the table created inside MinIO as below.

Most importantly, create access keys in the MinIO console (as highlighted in the second screenshot: click the icon to bring up the Access Keys page, then create a key pair). Those keys are the ones you need to pass in the following configuration while creating the Spark session:

```python
.config('spark.hadoop.fs.s3a.access.key', "<<accesskey>>") \
.config('spark.hadoop.fs.s3a.secret.key', "<<secretkey>>") \
```

Please also make sure to add the following user environment variables:

```
AWS_ACCESS_KEY_ID=accesskey
AWS_SECRET_ACCESS_KEY=secretkey
AWS_REGION=us-east-1
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
MINIO_REGION=us-east-1
```

This code base stores the catalog in Postgres and the metadata/data files in MinIO object storage.

**Here is the working code:**

```python
from pyspark.sql import SparkSession

# Initialize Spark session with Iceberg JDBC catalog configuration.
# Note: the iceberg-spark-runtime and iceberg-aws-bundle versions must match.
spark = SparkSession.builder \
    .appName("udaydemo-app") \
    .config('spark.jars.packages',
            'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,'
            'org.postgresql:postgresql:42.2.23,'
            'org.apache.iceberg:iceberg-aws-bundle:1.5.2') \
    .config('spark.sql.catalog.uday_minio_catalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config("spark.sql.catalog.uday_minio_catalog.catalog-impl", "org.apache.iceberg.jdbc.JdbcCatalog") \
    .config("spark.sql.catalog.uday_minio_catalog.uri",
            "jdbc:postgresql://<<replace your postgres host address>>/<<database name>>") \
    .config("spark.sql.catalog.uday_minio_catalog.verifyServerCertificate", "true") \
    .config("spark.sql.catalog.uday_minio_catalog.useSSL", "true") \
    .config("spark.sql.catalog.uday_minio_catalog.jdbc.user", "<<replace your username>>") \
    .config("spark.sql.catalog.uday_minio_catalog.jdbc.password", "<<replace your password>>") \
    .config("spark.sql.catalog.uday_minio_catalog.jdbc.driver", "org.postgresql.Driver") \
    .config("spark.sql.catalog.uday_minio_catalog.warehouse", "s3a://demo-icare") \
    .config("spark.sql.catalog.uday_minio_catalog.s3.endpoint", "http://127.0.0.1:9000") \
    .config("spark.sql.catalog.uday_minio_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.catalog.uday_minio_catalog.s3.path-style-access", "true") \
    .config('spark.hadoop.fs.s3a.access.key', "<<minio access key>>") \
    .config("spark.hadoop.fs.s3a.secret.key", "<<minio secret key>>") \
    .config('spark.hadoop.fs.s3a.endpoint.region', 'us-east-1') \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .config("spark.executor.heartbeatInterval", "300000") \
    .config("spark.network.timeout", "400000") \
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .config("spark.hadoop.fs.s3a.attempts.maximum", "1") \
    .config("spark.hadoop.fs.s3a.connection.establish.timeout", "5000") \
    .config("spark.hadoop.fs.s3a.connection.timeout", "10000") \
    .getOrCreate()

sc = spark.sparkContext
sc.setLogLevel("ERROR")

# Create an Iceberg table
spark.sql("""
CREATE TABLE IF NOT EXISTS uday_minio_catalog.product (
    id INT,
    name STRING,
    price INT
) USING iceberg""")

spark.sql("""
INSERT INTO uday_minio_catalog.product VALUES
    (1, 'laptop', 50000),
    (2, 'workstation', 100000),
    (3, 'server', 250000)
""")

spark.sql("SELECT * FROM uday_minio_catalog.product").show(truncate=False)

spark.stop()
```
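The environment variables above can also be set from inside the script, as long as it happens before the Spark session is created. A minimal sketch, using the same placeholder values as above (substitute the access keys you created in the MinIO console):

```python
import os

# Placeholder credentials -- replace with the access keys created in the
# MinIO console and your actual MinIO root credentials.
env = {
    "AWS_ACCESS_KEY_ID": "accesskey",
    "AWS_SECRET_ACCESS_KEY": "secretkey",
    "AWS_REGION": "us-east-1",
    "MINIO_ROOT_USER": "minioadmin",
    "MINIO_ROOT_PASSWORD": "minioadmin",
    "MINIO_REGION": "us-east-1",
}
# Must run before SparkSession.builder...getOrCreate(), since the AWS SDK
# reads these at client-initialization time.
os.environ.update(env)
```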
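On why the path-style-access settings matter: by default the AWS SDK uses virtual-hosted-style addressing, which puts the bucket name into the hostname; that cannot work against a bare IP endpoint like `127.0.0.1:9000`, which is why MinIO setups force path-style requests. A small illustration of the two URL shapes (`make_url` is a hypothetical helper for demonstration, not part of any library):

```python
def make_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the request URL for an object under the two S3 addressing styles."""
    scheme, host = endpoint.split("://", 1)
    if path_style:
        # Path-style: bucket appears in the URL path -- works with an IP endpoint.
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted style: bucket becomes a hostname label, which cannot
    # resolve when the endpoint is a bare IP:port like 127.0.0.1:9000.
    return f"{scheme}://{bucket}.{host}/{key}"

print(make_url("http://127.0.0.1:9000", "demo-icare", "product/metadata/v1.json", True))
# http://127.0.0.1:9000/demo-icare/product/metadata/v1.json
```

This is why both `s3.path-style-access` (for Iceberg's S3FileIO) and `spark.hadoop.fs.s3a.path.style.access` (for the s3a filesystem) are set to `true` in the session config above.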
