[ 
https://issues.apache.org/jira/browse/FLINK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-26061:
----------------------------------
    Description: 
{{FileSystem.getFileStatus}} -or {{FileSystem.exists}}- is not supported for 
empty directories in object stores (like S3). The S3 filesystem implementations 
used by Flink work around that in different ways: {{flink-s3-fs-hadoop}} 
handles the issue internally, whereas {{flink-s3-fs-presto}} fails in certain 
situations. This is covered by existing tests like 
[FileSystemBehaviorTestSuite:125|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/test/java/org/apache/flink/core/fs/FileSystemBehaviorTestSuite.java#L125].
 Some work on this was already done in FLINK-8373, which introduced the 
aforementioned tests.
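
For illustration only, here is a toy model of why an empty directory can be invisible in an object store. It is a sketch under assumptions, not Flink, Hadoop, or Presto code: the class {{FlatObjectStore}} and all of its methods are hypothetical names. It assumes the store is a flat set of keys, that a "directory" exists only if some key starts with its prefix, and that an implementation may or may not write a placeholder key when creating a directory. Whether the two Flink S3 filesystems behave exactly like the two {{mkdirs}} variants below is an assumption as well.

```java
import java.util.TreeSet;

/** Toy model of a flat object store (hypothetical; not Flink, Hadoop, or Presto code). */
class FlatObjectStore {
    // Keys live in one flat, sorted namespace; "directories" are only key prefixes.
    private final TreeSet<String> keys = new TreeSet<>();

    /** Stores an object under the given key. */
    void putObject(String key) {
        keys.add(key);
    }

    /** Assumed variant 1: reports success but stores nothing. */
    boolean mkdirsWithoutMarker(String dir) {
        return true;
    }

    /** Assumed variant 2: stores a placeholder key ending in '/'. */
    boolean mkdirsWithMarker(String dir) {
        keys.add(normalize(dir));
        return true;
    }

    /** A directory "exists" only if at least one key starts with its prefix. */
    boolean directoryExists(String dir) {
        String prefix = normalize(dir);
        // In a sorted set, the smallest key >= prefix is the first possible match.
        String candidate = keys.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    private static String normalize(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    public static void main(String[] args) {
        FlatObjectStore store = new FlatObjectStore();

        // No key is written, so the freshly "created" empty directory is not visible.
        store.mkdirsWithoutMarker("a/b/c");
        System.out.println(store.directoryExists("a/b/c"));

        // The placeholder key makes the directory and its parents visible.
        store.mkdirsWithMarker("x/y/z");
        System.out.println(store.directoryExists("x/y/z"));
        System.out.println(store.directoryExists("x/y"));
    }
}
```

In this model the first variant makes a successful {{mkdirs}} followed by an existence check return false, which is the kind of mismatch described above. Once any object is written below the prefix, both variants agree.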

The behavior can be reproduced in the following way:
 # Start a local Minio instance:
{code:bash}
docker run \
  -p 9000:9000 \
  -p 9001:9001 \
  -v /tmp/minio/data2:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minio123" \
  minio/minio:latest server /data --console-address ":9001"
{code}
 # Create a bucket {{foo}} through [http://localhost:9001/add-bucket]
 # Set environment variables accordingly: 
{{IT_CASE_S3_BUCKET=foo;IT_CASE_S3_ACCESS_KEY=minio;IT_CASE_S3_SECRET_KEY=minio123}}
 # Add the following lines to the test's {{checkCredentialsAndSetup}}:
{code:java}
    @BeforeClass
    public static void checkCredentialsAndSetup() throws IOException {
        // check whether credentials exist
        S3TestCredentials.assumeCredentialsAvailable();

        // initialize configuration with valid credentials
        final Configuration conf = new Configuration();
+        conf.setString("s3.endpoint", "http://localhost:9000");
+        conf.setString("s3.path.style.access", "true");
        conf.setString("s3.access.key", S3TestCredentials.getS3AccessKey());
        conf.setString("s3.secret.key", S3TestCredentials.getS3SecretKey());
        FileSystem.initialize(conf);
    }
{code}
 # Remove the object store check from 
{{FileSystemBehaviorTestSuite.testMkdirsCreatesParentDirectories}} and run the 
test.

This test will pass for {{HadoopS3FileSystemBehaviorITCase}} but fail for 
{{PrestoS3FileSystemBehaviorITCase}}.

We might want to align this behavior, since the difference is a trap that is 
easy to fall into.

  was:
{{FileSystem.getFileStatus}} or {{FileSystem.exists}} are not supported for 
empty directories in object stores (like s3). The s3 filesystem implementations 
used by Flink are working around that in different ways. {{flink-s3-fs-hadoop}} 
fixes this issue internally. {{flink-s3-fs-presto}} fails in certain 
situations. This is covered by already existing tests like 
[FileSystemBehaviorTestSuite:125|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/test/java/org/apache/flink/core/fs/FileSystemBehaviorTestSuite.java#L125].
 There was some work done in this matter already in FLINK-8373 where the 
before-mentioned tests were introduced.

The behavior can be reproduced in the following way:
 # Start a local Minio instance:
{code:bash}
docker run \
  -p 9000:9000 \
  -p 9001:9001 \
  -v /tmp/minio/data2:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minio123" \
  minio/minio:latest server /data --console-address ":9001"
{code}
 # Create a bucket {{foo}} through [http://localhost:9001/add-bucket]
 # Set environment variables accordingly: 
{{IT_CASE_S3_BUCKET=foo;IT_CASE_S3_ACCESS_KEY=minio;IT_CASE_S3_SECRET_KEY=minio123}}
 # Add the following lines to the test's {{checkCredentialsAndSetup}}:
{code:java}
    @BeforeClass
    public static void checkCredentialsAndSetup() throws IOException {
        // check whether credentials exist
        S3TestCredentials.assumeCredentialsAvailable();

        // initialize configuration with valid credentials
        final Configuration conf = new Configuration();
+        conf.setString("s3.endpoint", "http://localhost:9000");
+        conf.setString("s3.path.style.access", "true");
        conf.setString("s3.access.key", S3TestCredentials.getS3AccessKey());
        conf.setString("s3.secret.key", S3TestCredentials.getS3SecretKey());
        FileSystem.initialize(conf);
    }
{code}
# Remove the object store check from 
{{FileSystemBehaviorTestSuite.testMkdirsCreatesParentDirectories}} and run the 
test.

This test will pass for {{HadoopS3FileSystemBehaviorITCase}} but fail for 
{{PrestoS3FileSystemBehaviorITCase}}.

We might want to align this behavior since it's an easy-to-fall-into-trap.


> FileSystem.getFileStatus fails for empty directories on Presto S3 FS 
> ---------------------------------------------------------------------
>
>                 Key: FLINK-26061
>                 URL: https://issues.apache.org/jira/browse/FLINK-26061
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Connectors / FileSystem
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
