rychu151 opened a new issue, #10222: URL: https://github.com/apache/iceberg/issues/10222
### Query engine

Spark

### Question

I'm trying to set up a local development environment for my testing purposes using Docker.

**The target is to save a DataFrame in Iceberg format with Hive metadata.**

Here is my current docker-compose:

```yaml
version: "3"
services:
  # Jupyter Notebook with PySpark & Iceberg server
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    build: spark/
    networks:
      iceberg_net:
    depends_on:
      #- rest
      - minio
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
      - ./spark-iceberg/spark/jars/nessie-spark-extensions-3.5_2.12-0.80.0.jar:/opt/spark/jars/nessie-spark-extensions-3.5_2.12-0.80.0.jar
      - ./spark-iceberg/spark/conf/spark-defaults.conf:/opt/spark/conf/spark-defaults.conf
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - USE_STREAM_CAPABLE_STATE_STORE=true
      - CATALOG_WAREHOUSE=s3://warehouse/
    ports:
      - "8888:8888"
      - "8080:8080"
      - "10000:10000"
      - "10001:10001"

  # MinIO storage server
  minio:
    image: bitnami/minio:latest  # not minio/minio because of reported issues with that image
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_REGION=us-east-1
      - MINIO_REGION_NAME=us-east-1
    networks:
      iceberg_net:
        aliases:
          - warehouse.minio
    ports:
      - "9001:9001"
      - "9000:9000"

  # Hive metastore
  hive-metastore:
    image: apache/hive:4.0.0
    container_name: hive-metastore
    networks:
      iceberg_net:
    ports:
      - "9083:9083"
    environment:
      - SERVICE_NAME=metastore
    depends_on:
      - zookeeper
      - postgres
    volumes:
      - ./hive_metastore/conf/hive-site.xml:/opt/hive/conf/hive-site.xml
```

spark-defaults.conf:

```
spark.sql.extensions                     org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.hive_prod              org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hive_prod.type         hive
spark.sql.catalog.hive_prod.uri          thrift://hive-metastore:9083
spark.sql.catalog.hive_prod.io-impl      org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.hive_prod.s3.endpoint  http://minio:9000
spark.sql.catalog.hive_prod.warehouse    s3://warehouse/
hive.metastore.uris                      thrift://hive-metastore:9083
```

and hive-site.xml:

```xml
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.tez.exec.inplace.progress</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/opt/hive/scratch_dir</value>
  </property>
  <property>
    <name>hive.user.install.directory</name>
    <value>/opt/hive/install_dir</value>
  </property>
  <property>
    <name>tez.runtime.optimize.local.fetch</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.submit.local.task.via.child</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>tez.local.mode</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>metastore.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>s3a://warehouse/</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://localhost:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>admin</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>password</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.authorization.storage.checks</name>
    <value>false</value>
    <description>Disables storage-based authorization checks to allow Hive better compatibility with MinIO.</description>
  </property>
</configuration>
```

Using the MinIO UI I have created a bucket called `warehouse` and set it to public access.

**The target is to save a DataFrame in Iceberg format with Hive metadata**, so I will be able to browse this data using Apache Druid.

In order to create a table I use PySpark:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType

col_name = "col_name"
label_name = "label"
data_name = "upload_date"

schema = StructType([
    StructField(data_name, LongType(), False),
    StructField(col_name, StringType(), False),
    StructField(label_name, StringType(), False)
])

spark = SparkSession.builder.appName("schema_example").enableHiveSupport().getOrCreate()
spark.conf.set("spark.sql.iceberg.catalog.hive_prod", "DEBUG")
spark.conf.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")

data = []
df = spark.createDataFrame(data, schema)
```

`spark.sql("SHOW DATABASES").show()` prints only the `default` database.

When I try to create a database like below:

`spark.sql('CREATE DATABASE IF NOT EXISTS hive_prod.testing')`

I get the following error:

```
Py4JJavaError: An error occurred while calling o34.sql.
: java.lang.RuntimeException: Failed to create namespace testing in Hive Metastore
	at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:299)
Caused by: MetaException(message:Failed to create external path s3://warehouse/testing.db for database testing. This may result in access not being allowed if the StorageBasedAuthorizationProvider is enabled: null)
```

Does anyone understand why?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
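One detail worth double-checking in the setup above: the `CREATE DATABASE` call fails inside the Hive metastore, and the metastore reads its S3A settings from hive-site.xml, where `fs.s3a.endpoint` is set to `http://localhost:9000`. Inside the `hive-metastore` container, `localhost` refers to that container itself, not to MinIO, so the compose service name is the endpoint Spark's own config already uses. A sketch of the adjusted property, assuming the service names from the docker-compose above (the stock `apache/hive` image may additionally need the `hadoop-aws` and AWS SDK bundle jars on its classpath for any `s3a://` path to resolve):

```xml
<!-- Hypothetical adjustment to hive-site.xml: inside the hive-metastore
     container, "localhost" is the container itself, not MinIO. Pointing
     the S3A endpoint at the compose service name matches the
     spark-defaults.conf entry (http://minio:9000). -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://minio:9000</value>
</property>
```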