Alek Łańduch created FLINK-33955:
------------------------------------

             Summary: UnsupportedFileSystemException when trying to save data 
to Azure's abfss File System
                 Key: FLINK-33955
                 URL: https://issues.apache.org/jira/browse/FLINK-33955
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.17.1, 1.18.0
         Environment: Flink 1.17.1 & Flink 1.18.0 with Java 11, ADLS Gen.2 with 
hierarchical namespace enabled
            Reporter: Alek Łańduch
         Attachments: error.log, pom.xml, success.log

When using Azure's File System connector for reading and writing files to Azure 
Data Lake Storage 2 Flink job fails at writing files with given error:

 
{noformat}
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: 
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme 
"file"{noformat}
 

Full logs from Job Manager along with stack trace is attached to as 
[^error.log] file.

The connection itself seems to be good, as the job successfully creates desired 
structure inside ADLS (and the the `.part` file), but the file itself is empty.

The job is simple, as its only purpose is to save events `a`, `b` and `c` into 
a file on ADLS. The whole code is presented below:
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class DataStreamJob {
  public static void main(String[] args) throws Exception {
    final FileSink<String> sink = FileSink
        .forRowFormat(
            new Path("abfss://t...@stads2dev01.dfs.core.windows.net/output"),
            new SimpleStringEncoder<String>("UTF-8"))
        .build();
    final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("a", "b", "c").sinkTo(sink);
    env.execute("Test");
  }
}
{code}
Code is run locally using Flink 1.18.0 (the same behavior was present in 
version 1.17.1). The only change that was made to `flink-conf.yaml` was to add 
key for accessing Azure:

 
{code:java}
fs.azure.account.auth.type.stads2dev01.dfs.core.windows.net: SharedKey
fs.azure.account.key.stads2dev01.dfs.core.windows.net: ******{code}
 

The [^pom.xml] file was created by using [Getting 
Started|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/configuration/overview/#getting-started]
 documentation - the only thing I added was `flink-azure-fs-hadoop` connector. 
The whole [^pom.xml] file is attached. The connector JAR was also copied from 
`opt` directory to `plugins/azure-fs-hadoop` in cluster files according to the 
documentation.

The interesting fact is that the deprecated method `writeAsText` (instead of 
FileSink) not only works and creates desired file on ADLS, but *the subsequent 
jobs that use FileSInk that previously failed now works and creates file 
successfully* (until cluster's restart). The logs from job with deprecated 
method are also attached here as [^success.log] file.

I suspect that it is somehow connected to how Azure File System is initialized, 
where the new FileSink method would create it incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to