Unable to run flink job in dataproc cluster with jobmanager provided
Hi Team,

I'm exploring Flink for one of my use cases and am facing an issue while running a Flink job in cluster mode. Below are the steps I followed to set up and run the job in cluster mode:

1. Set up Flink on Google Cloud Dataproc using https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/flink
2. After setting up the cluster, I could see that the Flink session had started, and I could open its UI.
3. Submitted the job from the Dataproc master node using the command below:

sudo HADOOP_CONF_DIR=/etc/hadoop/conf /usr/lib/flink/bin/flink run -m yarn-cluster -yid application_1592311654771_0001 -class com.sm.flink.FlinkDriver /usr/lib/flink/lib/flink-1.0.10-sm-SNAPSHOT.jar hdfs://cluster-flink-poc-m:8020/user/flink/rocksdb/

After running the job, I see that it started successfully, but it created a local mini cluster and ran in local mode. I don't see any jobs submitted to the JobManager, and the UI shows 0 task managers.

Can someone please help me understand what is happening here? Let me know what input is required to investigate.
Re: Unable to run flink job in dataproc cluster with jobmanager provided
Are you by any chance creating a local environment via (Stream)ExecutionEnvironment#createLocalEnvironment?

On 17/06/2020 17:05, Sourabh Mehta wrote:
> After running the job, I see that it started successfully, but it created a local mini cluster and ran in local mode.
Re: Unable to run flink job in dataproc cluster with jobmanager provided
No, I am not.

On Wed, 17 Jun 2020 at 10:48 PM, Chesnay Schepler wrote:
> Are you by any chance creating a local environment via (Stream)ExecutionEnvironment#createLocalEnvironment?
Re: Unable to run flink job in dataproc cluster with jobmanager provided
Hi Sourabh,

do you have access to the cluster logs? They could be helpful for debugging the problem. Which version of Flink are you using?

Cheers,
Till

On Wed, Jun 17, 2020 at 7:39 PM Sourabh Mehta wrote:
> No, I am not.
Re: Unable to run flink job in dataproc cluster with jobmanager provided
Hi,

The application is using Flink 1.10.0, but the cluster is set up on 1.9.0. Yes, I do have access to the logs. Please find the startup logs from the cluster below:

2020-06-17 11:28:18,989 INFO org.apache.shaded.flink.table.module.ModuleManager - Got FunctionDefinition equals from module core
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, cluster-flink-poc-m
2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 12288
2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 12288
2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 28
2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 2048
2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf
2020-06-17 11:28:20,550 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 1.7976931348623157E308
2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes
2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.task.off-heap.size' , default: 0 bytes (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes
2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.network.min' , default: 64 mb (fallback keys: [{key=taskmanager.network.memory.min, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb
2020-06-17 11:28:20,553 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.network.max' , default: 1 gb (fallback keys: [{key=taskmanager.network.memory.max, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb
2020-06-17 11:28:20,553 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.managed.size' , default: null (fallback keys: [{key=taskmanager.memory.size, isDeprecated=true}]) required for local execution is not set, setting it to its default value 128 mb
2020-06-17 11:28:20,558 INFO org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Flink Mini Cluster
2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-
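
Two things stand out in a log like the one above: the "Starting Flink Mini Cluster" message, and logger names that start with org.apache.shaded.flink rather than the stock org.apache.flink. A minimal Python sketch for spotting both in a saved client log might look like the following. The function name `diagnose`, the marker string, and the regular expression are illustrative assumptions based only on the excerpt in this thread, not part of Flink itself.

```python
import re

# Assumption: this message (copied from the log excerpt in this thread)
# indicates that the client started a local MiniCluster.
MINI_CLUSTER_MARKER = "Starting Flink Mini Cluster"

# Captures the package prefix up to and including "flink" in logger names,
# e.g. "org.apache.flink" (stock) or "org.apache.shaded.flink" (relocated).
LOGGER_PATTERN = re.compile(r"\b(org\.apache\.[\w.]*?flink)\.")


def diagnose(log_text):
    """Return simple findings about a Flink client log given as a string."""
    prefixes = set(LOGGER_PATTERN.findall(log_text))
    return {
        # True if the log shows a local MiniCluster being started.
        "started_mini_cluster": MINI_CLUSTER_MARKER in log_text,
        # All distinct Flink package prefixes seen in logger names.
        "flink_package_prefixes": sorted(prefixes),
        # True if any prefix differs from the stock org.apache.flink package.
        "relocated_flink_classes": any(p != "org.apache.flink" for p in prefixes),
    }


if __name__ == "__main__":
    sample = (
        "2020-06-17 11:28:20,558 INFO "
        "org.apache.shaded.flink.runtime.minicluster.MiniCluster "
        "- Starting Flink Mini Cluster\n"
    )
    print(diagnose(sample))
```

If both `started_mini_cluster` and `relocated_flink_classes` come back true, that would be consistent with the job running against Flink classes bundled in the user jar instead of those provided by the cluster.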
Re: Unable to run flink job in dataproc cluster with jobmanager provided
Is your user-jar packaging and relocating Flink classes? If so, then your job does not actually operate against the classes provided by the cluster, which, well, just wouldn't work.

On 18/06/2020 09:34, Sourabh Mehta wrote:
> 2020-06-17 11:28:20,558 INFO org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Flink Mini Cluster
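
One way to answer the packaging question is to list the class entries inside the user jar and look for Flink packages, either under the stock path org/apache/flink/ or under a relocated path such as org/apache/shaded/flink/. The Python sketch below does this with the standard `zipfile` module. The function names `flink_class_roots` and `report` are hypothetical, and treating any org/apache/.../flink/ path other than the stock one as "relocated" is an assumption drawn from the logger names in this thread.

```python
import zipfile

# Stock Flink classes are packaged under this path inside a jar.
STOCK_PREFIX = "org/apache/flink/"


def flink_class_roots(jar_path):
    """Return package roots under org/apache/ that end in '/flink/'.

    Only .class entries are considered. For example, an entry named
    org/apache/shaded/flink/runtime/Foo.class yields the root
    org/apache/shaded/flink/.
    """
    roots = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            marker = name.find("/flink/")
            if name.startswith("org/apache/") and marker != -1:
                roots.add(name[: marker + len("/flink/")])
    return roots


def report(jar_path):
    """Summarize whether the jar bundles stock or relocated Flink classes."""
    roots = flink_class_roots(jar_path)
    return {
        # True if classes under the stock org/apache/flink/ path are bundled.
        "bundles_stock_flink": STOCK_PREFIX in roots,
        # Roots that look like Flink packages moved to another path.
        "relocated_roots": sorted(r for r in roots if r != STOCK_PREFIX),
    }
```

Calling `report` on the jar from the original command (/usr/lib/flink/lib/flink-1.0.10-sm-SNAPSHOT.jar) and getting a non-empty `relocated_roots` list would suggest that the build relocates Flink classes into the user jar.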