Alek Łańduch created FLINK-33955:
------------------------------------
Summary: UnsupportedFileSystemException when trying to save data
to Azure's abfss File System
Key: FLINK-33955
URL: https://issues.apache.org/jira/browse/FLINK-33955
Project: Flink
Issue Type: Bug
Affects Versions: 1.17.1, 1.18.0
Environment: Flink 1.17.1 & Flink 1.18.0 with Java 11, ADLS Gen.2 with
hierarchical namespace enabled
Reporter: Alek Łańduch
Attachments: error.log, pom.xml, success.log
When using Azure's File System connector for reading and writing files to Azure
Data Lake Storage 2 Flink job fails at writing files with given error:
{noformat}
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme
"file"{noformat}
Full logs from Job Manager along with stack trace is attached to as
[^error.log] file.
The connection itself seems to be good, as the job successfully creates desired
structure inside ADLS (and the the `.part` file), but the file itself is empty.
The job is simple, as its only purpose is to save events `a`, `b` and `c` into
a file on ADLS. The whole code is presented below:
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class DataStreamJob {
public static void main(String[] args) throws Exception {
final FileSink<String> sink = FileSink
.forRowFormat(
new Path("abfss://[email protected]/output"),
new SimpleStringEncoder<String>("UTF-8"))
.build();
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.fromElements("a", "b", "c").sinkTo(sink);
env.execute("Test");
}
}
{code}
Code is run locally using Flink 1.18.0 (the same behavior was present in
version 1.17.1). The only change that was made to `flink-conf.yaml` was to add
key for accessing Azure:
{code:java}
fs.azure.account.auth.type.stads2dev01.dfs.core.windows.net: SharedKey
fs.azure.account.key.stads2dev01.dfs.core.windows.net: ******{code}
The [^pom.xml] file was created by using [Getting
Started|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/configuration/overview/#getting-started]
documentation - the only thing I added was `flink-azure-fs-hadoop` connector.
The whole [^pom.xml] file is attached. The connector JAR was also copied from
`opt` directory to `plugins/azure-fs-hadoop` in cluster files according to the
documentation.
The interesting fact is that the deprecated method `writeAsText` (instead of
FileSink) not only works and creates desired file on ADLS, but *the subsequent
jobs that use FileSInk that previously failed now works and creates file
successfully* (until cluster's restart). The logs from job with deprecated
method are also attached here as [^success.log] file.
I suspect that it is somehow connected to how Azure File System is initialized,
where the new FileSink method would create it incorrectly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)