[https://issues.apache.org/jira/browse/FLINK-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196832#comment-17196832]

JieFang.He edited comment on FLINK-19067 at 9/16/20, 9:50 AM:
--------------------------------------------------------------

[~rmetzger] I see the same exception on the original version.

It seems that the client fetches the BLOB from the wrong node.

The exception on the client shows that it tries to fetch the file from node-01, but the BLOB file was created on node-02.

Here is the exception on the client:
{code:java}
Caused by: java.io.IOException: Failed to fetch BLOB 0d04922e319f520b84a2f20c9b4556e0/p-903092697dde4fa408ba1f52fae34c5e876a997a-94c6b64d36b76c695392475a3c72cfab from ***-deployer-hejiefang01/172.17.0.4:32101 and store it under /data3/***/flink/tmp/blobStore-523c19be-109b-4b7b-9934-b78b0281ffd4/incoming/temp-00000002
{code}
But the file is actually on node-02:
{code:java}
[root@***-deployer-hejiefang02 default]# for i in {1..100000}; do ll -R; sleep 1;done
total 36
drwxr-xr-x 3 mr users  4096 Sep 17 01:28 blob
-rw-r--r-- 1 mr users 32576 Sep 17 01:28 submittedJobGraph202f6020058b
./blob:
total 4
drwxr-xr-x 2 mr users 4096 Sep 17 01:28 job_0d04922e319f520b84a2f20c9b4556e0
./blob/job_0d04922e319f520b84a2f20c9b4556e0:
total 12
-rw-r--r-- 1 mr users 9401 Sep 17 01:28 blob_p-903092697dde4fa408ba1f52fae34c5e876a997a-94c6b64d36b76c695392475a3c72cfab
[root@***-deployer-hejiefang02 default]# 
{code}
Node-01 does not have the file:
{code:java}
[root@***-deployer-hejiefang01 default]# for i in {1..100000}; do ll -R; sleep 1;done
.:
total 0
.:
total 0
.:
[root@***-deployer-hejiefang01 default]# 
{code}
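The BLOB reference in the client exception and the file found on node-02 appear to describe the same BLOB: the part before the slash is the job ID and the part after it is the BLOB key. The Java sketch below maps one to the other using only the naming pattern visible in the paths of this issue, `<storageDir>/<clusterId>/blob/job_<jobId>/blob_<blobKey>`. The class `BlobPaths` and method `haBlobPath` are hypothetical helpers, not Flink APIs, and the storage directory in `main` is a placeholder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not a Flink API), inferred only from the paths quoted
 * in this issue: maps a BLOB reference as printed by the client
 * ("<jobId>/<blobKey>") to the file observed under the HA storage directory.
 */
public final class BlobPaths {

    private BlobPaths() {}

    /** Builds <storageDir>/<clusterId>/blob/job_<jobId>/blob_<blobKey>. */
    public static Path haBlobPath(String storageDir, String clusterId, String clientRef) {
        int slash = clientRef.indexOf('/');
        // Reject references without a non-empty job ID and BLOB key.
        if (slash <= 0 || slash == clientRef.length() - 1) {
            throw new IllegalArgumentException("expected <jobId>/<blobKey>: " + clientRef);
        }
        String jobId = clientRef.substring(0, slash);
        String blobKey = clientRef.substring(slash + 1);
        return Paths.get(storageDir, clusterId, "blob", "job_" + jobId, "blob_" + blobKey);
    }

    public static void main(String[] args) {
        // BLOB reference copied from the client exception above.
        String ref = "0d04922e319f520b84a2f20c9b4556e0/"
                + "p-903092697dde4fa408ba1f52fae34c5e876a997a-94c6b64d36b76c695392475a3c72cfab";
        // "/path/to/storageDir" is a placeholder; "default" is the cluster ID seen in the listings.
        System.out.println(haBlobPath("/path/to/storageDir", "default", ref));
    }
}
```

Applied to the job in the issue description, the same pattern reproduces the path in the FileNotFoundException (/data2/flink/storageDir/default/blob/job_d29414828f614d5466e239be4d3889ac/...), but it remains an inference from the observed paths only.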
ZooKeeper shows that the dispatcher leader is on node-02, while the resource_manager and rest_server leaders are on node-01:
{code:java}
[zk: localhost:2181(CONNECTED) 0] get /flink/default/leader/dispatcher_lock
akka.tcp://flink@***-deployer-hejiefang02:37241/user/rpc/dispatcher_1
[zk: localhost:2181(CONNECTED) 1] get /flink/default/leader/resource_manager_lock
akka.tcp://flink@***-deployer-hejiefang01:30717/user/rpc/resourcemanager_0
[zk: localhost:2181(CONNECTED) 2] get /flink/default/leader/rest_server_lock
http://***-deployer-hejiefang01:8181
{code}
[^flink-jobmanager-deployer-hejiefang01.log]
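The ZooKeeper check boils down to comparing the host part of each leader address. A minimal Java sketch, assuming the serialized UUID bytes have already been stripped from the `get` output and using placeholder hostnames (the real ones are masked as `***` above, and `java.net.URI` would not accept `*` in a hostname). `LeaderSplitCheck` is a hypothetical helper, not part of Flink.

```java
import java.net.URI;
import java.util.Objects;

/**
 * Hypothetical diagnostic (not part of Flink): given leader addresses read
 * from the ZooKeeper leader nodes, report whether two leaders run on
 * different hosts.
 */
public final class LeaderSplitCheck {

    private LeaderSplitCheck() {}

    /** Extracts the host from "akka.tcp://flink@host:port/..." or "http://host:port". */
    public static String hostOf(String address) {
        String host = URI.create(address).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in address: " + address);
        }
        return host;
    }

    /** True if the two leader addresses point to different hosts. */
    public static boolean isSplit(String leaderA, String leaderB) {
        return !Objects.equals(hostOf(leaderA), hostOf(leaderB));
    }

    public static void main(String[] args) {
        // Placeholder hostnames; the real ones are masked in the issue.
        String dispatcher = "akka.tcp://flink@deployer-hejiefang02:37241/user/rpc/dispatcher_1";
        String rest = "http://deployer-hejiefang01:8181";
        System.out.println("dispatcher and REST leaders on different hosts: "
                + isSplit(dispatcher, rest));
    }
}
```

With the addresses from this comment, `isSplit` is true for the dispatcher and REST leaders and false for the resource manager and REST leaders. It only restates the observation; it does not explain why leadership ended up split across the two nodes.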

 


> FileNotFoundException when run flink examples
> ---------------------------------------------
>
>                 Key: FLINK-19067
>                 URL: https://issues.apache.org/jira/browse/FLINK-19067
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: JieFang.He
>            Priority: Major
>         Attachments: flink-jobmanager-deployer-hejiefang01.log, 
> flink-jobmanager-deployer-hejiefang02.log, 
> flink-taskmanager-deployer-hejiefang01.log, 
> flink-taskmanager-deployer-hejiefang02.log
>
>
> 1. When running examples/batch/WordCount.jar, it fails with the following exception:
> Caused by: java.io.FileNotFoundException: /data2/flink/storageDir/default/blob/job_d29414828f614d5466e239be4d3889ac/blob_p-a2ebe1c5aa160595f214b4bd0f39d80e42ee2e93-f458f1c12dc023e78d25f191de1d7c4b (No such file or directory)
>  at java.io.FileInputStream.open0(Native Method)
>  at java.io.FileInputStream.open(FileInputStream.java:195)
>  at java.io.FileInputStream.<init>(FileInputStream.java:138)
>  at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
>  at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:143)
>  at org.apache.flink.runtime.blob.FileSystemBlobStore.get(FileSystemBlobStore.java:105)
>  at org.apache.flink.runtime.blob.FileSystemBlobStore.get(FileSystemBlobStore.java:87)
>  at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:501)
>  at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:231)
>  at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
