[ https://issues.apache.org/jira/browse/FLINK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-26061:
----------------------------------
    Description:
{{FileSystem.getFileStatus}} -or {{FileSystem.exists}}- are not supported for empty directories in object stores (like S3). The S3 filesystem implementations used by Flink work around that in different ways. {{flink-s3-fs-hadoop}} fixes this issue internally. {{flink-s3-fs-presto}} fails in certain situations. This is covered by existing tests like [FileSystemBehaviorTestSuite:125|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/test/java/org/apache/flink/core/fs/FileSystemBehaviorTestSuite.java#L125]. Some work on this was already done in FLINK-8373, where the aforementioned tests were introduced.

The behavior can be reproduced in the following way:
# Start a local MinIO instance:
{code:bash}
docker run \
  -p 9000:9000 \
  -p 9001:9001 \
  -v /tmp/minio/data2:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minio123" \
  minio/minio:latest server /data --console-address ":9001"
{code}
# Create a bucket {{foo}} through [http://localhost:9001/add-bucket]
# Set environment variables accordingly: {{IT_CASE_S3_BUCKET=foo;IT_CASE_S3_ACCESS_KEY=minio;IT_CASE_S3_SECRET_KEY=minio123}}
# Add the following lines to the test's {{checkCredentialsAndSetup}}:
{code:java}
@BeforeClass
public static void checkCredentialsAndSetup() throws IOException {
    // check whether credentials exist
    S3TestCredentials.assumeCredentialsAvailable();

    // initialize configuration with valid credentials
    final Configuration conf = new Configuration();
+   conf.setString("s3.endpoint", "http://localhost:9000");
+   conf.setString("s3.path.style.access", "true");
    conf.setString("s3.access.key", S3TestCredentials.getS3AccessKey());
    conf.setString("s3.secret.key", S3TestCredentials.getS3SecretKey());
    FileSystem.initialize(conf);
}
{code}
# Remove the object store check from {{FileSystemBehaviorTestSuite.testMkdirsCreatesParentDirectories}} and run the test.

This test will pass for {{HadoopS3FileSystemBehaviorITCase}} but fail for {{PrestoS3FileSystemBehaviorITCase}}. We might want to align this behavior since it's an easy trap to fall into.

  was:
{{FileSystem.getFileStatus}} or {{FileSystem.exists}} are not supported for empty directories in object stores (like S3). The S3 filesystem implementations used by Flink work around that in different ways. {{flink-s3-fs-hadoop}} fixes this issue internally. {{flink-s3-fs-presto}} fails in certain situations. This is covered by existing tests like [FileSystemBehaviorTestSuite:125|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/test/java/org/apache/flink/core/fs/FileSystemBehaviorTestSuite.java#L125]. Some work on this was already done in FLINK-8373, where the aforementioned tests were introduced.

The behavior can be reproduced in the following way:
# Start a local MinIO instance:
{code:bash}
docker run \
  -p 9000:9000 \
  -p 9001:9001 \
  -v /tmp/minio/data2:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minio123" \
  minio/minio:latest server /data --console-address ":9001"
{code}
# Create a bucket {{foo}} through [http://localhost:9001/add-bucket]
# Set environment variables accordingly: {{IT_CASE_S3_BUCKET=foo;IT_CASE_S3_ACCESS_KEY=minio;IT_CASE_S3_SECRET_KEY=minio123}}
# Add the following lines to the test's {{checkCredentialsAndSetup}}:
{code:java}
@BeforeClass
public static void checkCredentialsAndSetup() throws IOException {
    // check whether credentials exist
    S3TestCredentials.assumeCredentialsAvailable();

    // initialize configuration with valid credentials
    final Configuration conf = new Configuration();
+   conf.setString("s3.endpoint", "http://localhost:9000");
+   conf.setString("s3.path.style.access", "true");
    conf.setString("s3.access.key", S3TestCredentials.getS3AccessKey());
    conf.setString("s3.secret.key", S3TestCredentials.getS3SecretKey());
    FileSystem.initialize(conf);
}
{code}
# Remove the object store check from {{FileSystemBehaviorTestSuite.testMkdirsCreatesParentDirectories}} and run the test.

This test will pass for {{HadoopS3FileSystemBehaviorITCase}} but fail for {{PrestoS3FileSystemBehaviorITCase}}. We might want to align this behavior since it's an easy trap to fall into.


> FileSystem.getFileStatus fails for empty directories on Presto S3 FS
> ---------------------------------------------------------------------
>
>                 Key: FLINK-26061
>                 URL: https://issues.apache.org/jira/browse/FLINK-26061
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Connectors / FileSystem
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major
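
The difference the issue describes (an existence check on a freshly created, still empty directory succeeding for one S3 filesystem and failing for the other) can be illustrated with a toy model of a flat object store. The sketch below is only an illustration under stated assumptions: {{ToyObjectStore}}, {{DirectoryStrategy}}, {{NO_OP}} and {{MARKER}} are hypothetical names, and the two strategies are generic ways a filesystem layer might treat {{mkdirs}} on a key-value store. It does not claim to reproduce what {{flink-s3-fs-hadoop}} or {{flink-s3-fs-presto}} do internally.

```java
import java.util.TreeMap;

/**
 * Toy model of a flat object store. A "directory" is visible only if at
 * least one key starts with its prefix. All names are hypothetical;
 * this is not Flink or S3 client code.
 */
final class EmptyDirectorySketch {

    /** Flat key space, kept sorted so a prefix lookup is a single ceilingKey call. */
    static final class ToyObjectStore {
        private final TreeMap<String, byte[]> objects = new TreeMap<>();

        void put(String key, byte[] data) {
            objects.put(key, data);
        }

        /** True if any stored key starts with the given prefix. */
        boolean anyKeyWithPrefix(String prefix) {
            // The smallest key >= prefix carries the prefix if any key does.
            String first = objects.ceilingKey(prefix);
            return first != null && first.startsWith(prefix);
        }
    }

    /** How a filesystem layer on top of the store handles mkdirs (assumption). */
    interface DirectoryStrategy {
        void mkdirs(ToyObjectStore store, String dir);
    }

    /** Assumed strategy A: mkdirs writes nothing, since keys imply their parents. */
    static final DirectoryStrategy NO_OP = (store, dir) -> { };

    /** Assumed strategy B: mkdirs writes an empty marker object named "dir/". */
    static final DirectoryStrategy MARKER =
            (store, dir) -> store.put(dir + "/", new byte[0]);

    /** Existence check for a directory: is there any key under "dir/"? */
    static boolean exists(ToyObjectStore store, String dir) {
        return store.anyKeyWithPrefix(dir + "/");
    }

    /** Runs mkdirs on a fresh store and reports whether the directory is then visible. */
    static boolean existsAfterMkdirs(DirectoryStrategy strategy, String dir) {
        ToyObjectStore store = new ToyObjectStore();
        strategy.mkdirs(store, dir);
        return exists(store, dir);
    }

    public static void main(String[] args) {
        System.out.println("no-op mkdirs:  " + existsAfterMkdirs(NO_OP, "a/b"));   // false
        System.out.println("marker mkdirs: " + existsAfterMkdirs(MARKER, "a/b")); // true
    }
}
```

In this model, the marker key {{a/b/}} also makes the parent {{a}} visible, because the prefix check for {{a/}} matches it. A test that calls {{mkdirs}} and then checks that the directory and its parents exist would therefore pass under the marker strategy and fail under the no-op strategy.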