MonkeyCanCode opened a new pull request, #3574:
URL: https://github.com/apache/polaris/pull/3574

   While validating other things, I noticed our Ozone getting-started example appears to be non-functional. The main breaking pieces are:
   1. The Ozone cluster gets stuck in safe mode. In Ozone 2.1.0, `hdds.scm.safemode.min.datanode` is set to 3, so the SCM waits for 3 datanodes before leaving safe mode, while our docker-compose file only runs one (a sketch of the config change follows the error sample). Sample error below:
   ```
   ozone-s3g-1       | SCM is in safe mode. Will retry in 1000ms
   ozone-s3g-1       | SCM is in safe mode. Will retry in 1000ms
   ```
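   
   A minimal sketch of the kind of change involved, assuming the apache/ozone images' convention of mapping `OZONE-SITE.XML_*` environment variables into `ozone-site.xml` (service name and image tag are illustrative, not the exact compose file):
   ```yaml
   services:
     scm:
       image: apache/ozone:2.1.0
       environment:
         # require only the single datanode this compose file runs,
         # so the SCM can leave safe mode
         OZONE-SITE.XML_hdds.scm.safemode.min.datanode: "1"
   ```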
   
   2. While testing end to end, I noticed AWS S3 credentials are required by Polaris and Spark even when S3 security is not enabled (when it is not enabled, the S3 gateway accepts any value, as long as one is provided). Sample error below, with a sketch of the workaround after the stack trace:
   ```
   Caused by: java.io.UncheckedIOException: Failed to close current writer
        at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:124)
        at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
        at org.apache.iceberg.io.RollingDataWriter.close(RollingDataWriter.java:32)
        at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.close(SparkWrite.java:747)
        at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.commit(SparkWrite.java:729)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$5(WriteToDataSourceV2Exec.scala:475)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:491)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:430)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:496)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:393)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
   Caused by: java.io.IOException: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the javaproperty aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[])): Profile file contained no credentials for profile 'default': ProfileFile(sections=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.]
        at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:41)
        at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1673)
        at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1659)
        at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:261)
        at org.apache.iceberg.io.DataWriter.close(DataWriter.java:82)
        at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
        ... 21 more
   ```
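   
   A minimal sketch of the workaround, assuming compose services named `polaris` and `spark-sql` (illustrative names): since the AWS SDK credential chain only needs *some* value while S3 security is disabled, providing dummy credentials through the standard environment variables is enough:
   ```yaml
   services:
     polaris:
       environment:
         # any non-empty value works while Ozone S3 security is disabled
         AWS_ACCESS_KEY_ID: polaris_root
         AWS_SECRET_ACCESS_KEY: polaris_pass
     spark-sql:
       environment:
         AWS_ACCESS_KEY_ID: polaris_root
         AWS_SECRET_ACCESS_KEY: polaris_pass
   ```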
   
   Besides these two, this PR also addresses the following nits:
   1. Use a valid and consistent S3 region (changed to us-west-2; see the sketch after this list)
   2. Use credentials that don't mention MinIO (changed to polaris_root:polaris_pass)
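   
   For the region nit, a minimal sketch, assuming the region is passed to the containers via the standard `AWS_REGION` environment variable (service name illustrative):
   ```yaml
   services:
     spark-sql:
       environment:
         # one valid region used consistently across the example
         AWS_REGION: us-west-2
   ```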
   
   ## Checklist
   - [x] 🛡️ Don't disclose security issues! (contact [email protected])
   - [x] 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
   - [x] 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
   - [x] 💡 Added comments for complex logic
   - [x] 🧾 Updated `CHANGELOG.md` (if needed)
   - [x] 📚 Updated documentation in `site/content/in-dev/unreleased` (if needed)
   

