How to get the ActorSystem in custom operator

2018-11-08 Thread wpb
Hi,
Since I want to use Akka to sync some status in my custom operator, is
there any way to get the ActorSystem in
org.apache.flink.streaming.api.operators.AbstractStreamOperator, or to get the
address list of the TaskManagers to build the ActorSystem?




thanks

Re: how get job id which job run slot

2018-11-08 Thread vino yang
Hi lining,

Yes, currently you can't get slot information via the
"/taskmanagers/:taskmanagerid" rest API.

In addition, please ask questions in the user mailing list. The dev mailing
list mainly discusses information related to Flink development.

Thanks, vino.

lining jing  于2018年11月9日周五 上午5:42写道:

> Hi, dev. I have a question. Now the rest API can get TaskManagerInfo, but it
> cannot get information about the tasks which run on a slot.
>


Re: Queryable state when key is UUID - getting Kyro Exception

2018-11-08 Thread Jayant Ameta
Yeah, it IS using Kryo serializer.

Jayant Ameta


On Wed, Nov 7, 2018 at 9:57 PM Till Rohrmann  wrote:

> Hi Jayant, could you check that the UUID key on the TM is actually
> serialized using a Kryo serializer? You can do this by setting a breakpoint
> in the constructor of the `AbstractKeyedStateBackend`.
>
> Cheers,
> Till
>
> On Tue, Oct 30, 2018 at 9:44 AM bupt_ljy  wrote:
>
>> Hi, Jayant
>>
>> Your code looks good to me. And I’ve tried the serialization/deserialization
>> of Kryo on the UUID class; it all looks okay.
>>
>> I’m not very sure about this problem. Maybe you can write a very
>> simple demo to see if it works.
>>
>>
>> Jiayi Liao, Best
>>
>>  Original Message
>> *Sender:* Jayant Ameta
>> *Recipient:* bupt_ljy
>> *Cc:* Tzu-Li (Gordon) Tai; user<
>> user@flink.apache.org>
>> *Date:* Monday, Oct 29, 2018 11:53
>> *Subject:* Re: Queryable state when key is UUID - getting Kyro Exception
>>
>> Hi Jiayi,
>> Any further help on this?
>>
>> Jayant Ameta
>>
>>
>> On Fri, Oct 26, 2018 at 9:22 AM Jayant Ameta 
>> wrote:
>>
>>> MapStateDescriptor<UUID, String> descriptor =
>>>     new MapStateDescriptor<>("rulePatterns", UUID.class, String.class);
>>>
>>> Jayant Ameta
>>>
>>>
>>> On Fri, Oct 26, 2018 at 8:19 AM bupt_ljy  wrote:
>>>
 Hi,

Can you show us the descriptor in the code below?

 client.getKvState(JobID.fromHexString("c7b8af14b8afacf4fac16cdd0da7e997"),
 "rule", UUID.fromString("3b3f17a0-d81a-11e8-bb91-7fd1412de84d"),
 TypeInformation.of(new TypeHint<UUID>() {}), descriptor);
>
>
 Jiayi Liao, Best


  Original Message
 *Sender:* Jayant Ameta
 *Recipient:* bupt_ljy
 *Cc:* Tzu-Li (Gordon) Tai; user<
 user@flink.apache.org>
 *Date:* Friday, Oct 26, 2018 02:26
 *Subject:* Re: Queryable state when key is UUID - getting Kyro
 Exception

 Also, I haven't provided any custom serializer in my flink job.
 Shouldn't the same configuration work for queryable state client?

 Jayant Ameta


 On Thu, Oct 25, 2018 at 4:15 PM Jayant Ameta 
 wrote:

> Hi Gordon,
> Following is the stack trace that I'm getting:
>
> Exception in thread "main" java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: Failed request 0.
> Caused by: java.lang.RuntimeException: Failed request 0.
> Caused by: java.lang.RuntimeException: Error while processing
> request with ID 0. Caused by: com.esotericsoftware.kryo.KryoException:
> Encountered unregistered class ID: -985346241
> Serialization trace:
> $outer (scala.collection.convert.Wrappers$SeqWrapper)
> at
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
> at
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
> at
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:249)
> at
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
> at
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> at
> org.apache.flink.queryablestate.client.state.serialization.KvStateSerializer.deserializeKeyAndNamespace(KvStateSerializer.java:94)
> at
> org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:93)
> at
> org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:87)
> at
> org.apache.flink.queryablestate.server.KvStateServerHandler.handleRequest(KvStateServerHandler.java:49)
> at
> org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:229)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
> I am not using any custom serializer, as mentioned by Jiayi.
>
> Jayant Ameta
>
>
> On Thu, Oct 25, 2018 at 3:01 PM bupt_ljy  wrote:
>
>> Hi  Jayant,
>>
>> There should be a Serializer parameter in the constructor of the
>> StateDescriptor; you should create a new serializer like this:
>>
>>
>>new GenericTypeInfo(classOf[UUID]).createSerializer(env.getConfig)
>>
>>
>>  By the way, can y
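
A minimal sketch of what that could look like on the client side (a sketch
only, assuming Flink 1.6-era APIs and that the job falls back to Kryo for the
UUID key; the wrapping class is just for illustration):

import java.util.UUID;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;

public class DescriptorWithExplicitSerializers {
    public static MapStateDescriptor<UUID, String> create(ExecutionConfig config) {
        // GenericTypeInfo yields a KryoSerializer, matching what the job uses
        // for UUID when no dedicated type information is available
        TypeSerializer<UUID> keySerializer =
            new GenericTypeInfo<>(UUID.class).createSerializer(config);
        TypeSerializer<String> valueSerializer =
            BasicTypeInfo.STRING_TYPE_INFO.createSerializer(config);
        return new MapStateDescriptor<>("rulePatterns", keySerializer, valueSerializer);
    }
}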

The heartbeat of TaskManager with id ... timed out.

2018-11-08 Thread Hao Sun
I am running Flink 1.7 on K8S. I am not sure how to debug this issue. I
turned on debug on JM/TM.

I am not sure whether this part is related or not. How could an Actor suddenly
disappear?

=
2018-11-09 04:47:19,480 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher  - Query
metrics for akka://flink-metrics/user/MetricQueryService.
2018-11-09 04:47:19,577 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher  -
Retrieve metric query service gateway for
akka.tcp://flink-metrics@fps-flink-taskmanager-6f8f687fc8-b5qmm
:34429/user/MetricQueryService_0cfaa7b8f193a8002f121282298c58ac
2018-11-09 04:47:19,577 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher  -
Retrieve metric query service gateway for
akka.tcp://flink-metrics@fps-flink-taskmanager-6f8f687fc8-ckwzp
:43062/user/MetricQueryService_1930889e5b6c51cbe57428f9a664e4dc
2018-11-09 04:47:19,578 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher  -
Retrieve metric query service gateway for
akka.tcp://flink-metrics@fps-flink-taskmanager-6f8f687fc8-bzq2w
:39393/user/MetricQueryService_fcaf62301df0a6aeb29a65470cfe1e7a
2018-11-09 04:47:19,585 TRACE
org.apache.flink.runtime.rest.FileUploadHandler   - Received
request. URL:/jobs/ Method:GET
2018-11-09 04:47:19,613 DEBUG
org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher  - Could
not retrieve QueryServiceGateway.
java.util.concurrent.CompletionException: akka.actor.ActorNotFound: Actor
not found for:
ActorSelection[Anchor(akka.tcp://flink-metrics@fps-flink-taskmanager-6f8f687fc8-ckwzp:43062/),
Path(/user/MetricQueryService_1930889e5b6c51cbe57428f9a664e4dc)]
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at scala.concurrent.Promise.complete(Promise.scala:49)
at scala.concurrent.Promise.complete$(Promise.scala:48)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:183)
at scala.concurrent.Promise.failure(Promise.scala:100)
at scala.concurrent.Promise.failure$(Promise.scala:100)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:183)
at akka.actor.ActorSelection.$anonfun$resolveOne$1(ActorSelection.scala:68)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:76)
at akka.dispatch.BatchingExecutor.execute(BatchingExecutor.scala:120)
at akka.dispatch.BatchingExecutor.execute$(BatchingExecutor.scala:114)
at
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:75)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:538)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:558)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:595)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:584)
at
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:98)
at akka.remote.EndpointWriter.postStop(Endpoint.scala:593)
at akka.actor.Actor.aroundPostStop(Actor.scala:515)
at akka.actor.Actor.aroundPostStop$(Actor.scala:515)
at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:446)
at akka.actor.du

Re: Always trigger calculation of a tumble window in Flink SQL

2018-11-08 Thread yinhua.dai
I am able to write a single operator as you suggested, thank you.

And then I saw ContinuousProcessingTimeTrigger in the Flink source code; it
looks like something that I am looking for. If there were a way to
customize the SQL "TUMBLE" window to use this trigger instead of
ProcessingTimeTrigger, it would solve my problem.

Do you know if there is a way to use a custom trigger in the "TUMBLE"
window of SQL?
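
For comparison, in the DataStream API the trigger is pluggable. A minimal
sketch (my own illustration, not SQL; assuming processing-time tumbling
windows over a keyed tuple stream):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class ContinuousTriggerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Long>> counts = env
            .fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 1L))
            .keyBy(0)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            // fire (and emit intermediate results) every 10 seconds
            // within the 1-minute tumbling window
            .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
            .sum(1);
        counts.print();
        env.execute("continuous trigger example");
    }
}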





Re: akka timeout exception

2018-11-08 Thread Anil
Thanks for the reply Dawid. The Flink jobs are deployed in a YARN cluster. I am
seeing the error in the Job Manager log too frequently for some jobs. I'm using
Flink 1.4.2. I'm running only streaming jobs.





How can I configure logback for TaskManager in LocalStreamEnvironment

2018-11-08 Thread wpb
I'm using Flink's LocalStreamEnvironment when developing jobs.

When I use logback for logging, I found that logback.xml only works for the
JobManager but not for the TaskManager.

How can I configure the log layout and level for the TaskManager in
LocalStreamEnvironment?

Thanks!

PB

Re: ProcessFunction's Event Timer not firing

2018-11-08 Thread Hequn Cheng
Hi Fritz,

Watermarks are merged on stream shuffles. If one of the inputs' watermarks is
not progressing, it will not advance the event time at the operators. I
think you should decrease the parallelism of the source and make sure there is
data in each of your source partitions.
Note that the Kafka source supports per-partition watermarking, which you
can read more about here[1].

Best, Hequn
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/event_timestamps_watermarks.html#timestamps-per-kafka-partition
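
A rough sketch of that per-partition assignment (assuming the Flink 1.6-era
Kafka 0.11 consumer over string records; the timestamp-prefix convention in
extractTimestamp is an assumption, not part of this thread):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class PerPartitionWatermarks {
    public static DataStream<String> source(StreamExecutionEnvironment env, Properties props) {
        FlinkKafkaConsumer011<String> consumer =
            new FlinkKafkaConsumer011<>("topic", new SimpleStringSchema(), props);
        // assigning the extractor on the consumer itself generates watermarks
        // per Kafka partition instead of per source subtask
        consumer.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(5)) {
                @Override
                public long extractTimestamp(String record) {
                    // assumption: the record carries an epoch-millis prefix
                    return Long.parseLong(record.split(",")[0]);
                }
            });
        // match the source parallelism to the partition count (4 in Fritz's case)
        // so no subtask sits idle without data or watermarks
        return env.addSource(consumer).setParallelism(4);
    }
}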


On Fri, Nov 9, 2018 at 1:56 AM Fritz Budiyanto  wrote:

> Hi All,
>
> I noticed that if one of the slots' watermarks is not progressing, it impacts
> all slots' ProcessFunction timers, and no timers are firing.
>
> In my example, I have the source parallelism set to 8 and the Kafka partition
> count is 4. The next operator is a ProcessFunction with parallelism of 8 plus
> an event timer. I can see from the debug log that one of the slots' watermarks
> is not progressing. As a result, all slots' timers in the process function are
> not firing. Is this expected behavior or an issue? How do I prevent this condition?
>
> Thanks,
> Fritz


Manually clean SQL keyed state

2018-11-08 Thread shkob1
I have a scenario in which I do a non-windowed group by using SQL, something
like:

"SELECT count(*) AS events, shouldTrigger(..) AS shouldTrigger FROM source
GROUP BY sessionId"

I'm then converting to a retract stream, filtering by "add" messages, then
further filtering by the "shouldTrigger" field, and sending the result to a
sink.

While I'm using the query config (idle state retention time), it seems like
I could reduce the state size by clearing the state of the specific session id
earlier ("shouldTrigger" marks the end of the session rather than a timed
window).

Is there a way for me to clear that state, assuming I need to use the SQL
API?

Thanks!
Shahar
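
For reference, a sketch of how that retention is wired in today (assuming the
Flink 1.6-era Table API; the table environment, the "source" table and the
shouldTrigger UDF are assumed to be registered already):

import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.StreamQueryConfig;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class SessionCounts {
    public static DataStream<Tuple2<Boolean, Row>> run(StreamTableEnvironment tableEnv) {
        StreamQueryConfig qConfig = tableEnv.queryConfig();
        // state for a session key is dropped after being idle for 1 to 2 hours;
        // idle state retention is, to my knowledge, the only knob at the SQL level
        qConfig.withIdleStateRetentionTime(Time.hours(1), Time.hours(2));

        Table result = tableEnv.sqlQuery(
            "SELECT sessionId, count(*) AS events, shouldTrigger(sessionId) AS shouldTrigger "
                + "FROM source GROUP BY sessionId");
        return tableEnv.toRetractStream(result, Row.class, qConfig);
    }
}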







Rich variant for Async IO in Scala

2018-11-08 Thread Bruno Aranda
Hi,

I see that the AsyncFunction for Scala does not seem to have a rich variant
like the Java one. Is there a particular reason for this? Is there any
workaround?

Thanks!

Bruno
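
A workaround that is sometimes used: drop down to the Java API, whose
RichAsyncFunction can be applied to the Java stream underlying the Scala
DataStream (scalaStream.javaStream). A hedged sketch in Java (MyClient is a
stub standing in for a real async client):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class EnrichAsync extends RichAsyncFunction<String, String> {

    private transient MyClient client;

    @Override
    public void open(Configuration parameters) {
        // the rich lifecycle hook the plain Scala AsyncFunction trait lacks
        client = new MyClient();
    }

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
            .supplyAsync(() -> client.lookup(key))
            .thenAccept(value -> resultFuture.complete(Collections.singleton(value)));
    }

    @Override
    public void close() {
        client = null;
    }

    // stub standing in for a real asynchronous client
    static class MyClient {
        String lookup(String key) {
            return key.toUpperCase();
        }
    }
}

// applying it (sketch): AsyncDataStream.unorderedWait(
//     scalaStream.javaStream, new EnrichAsync(), 1000, TimeUnit.MILLISECONDS)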


Re: HA jobmanagers redirect to ip address of leader instead of hostname

2018-11-08 Thread Jeroen Steggink | knowsy

Hi Till,

Thanks for your reply. We are running version 1.5.4. We can't upgrade to 
1.6.x because we are using Apache Beam which doesn't support 1.6.x yet.


I have also made a Jira issue about this: 
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-10748


Best regards,
Jeroen Steggink


On 07-Nov-18 16:06, Till Rohrmann wrote:

Hi Jeroen,

this sounds like a bug in Flink that we sometimes return IP addresses
instead of hostnames. Could you tell me which Flink version you are
using? In the current version, the redirect address and the address 
retrieved from ZooKeeper should actually be the same.


In the future, we plan to remove the redirect message and simply 
forward the request to the current leader. This should hopefully 
avoid this kind of problem.


Cheers,
Till

On Fri, Oct 26, 2018 at 1:40 PM Jeroen Steggink | knowsy
<jer...@knowsy.nl> wrote:


Hi,

I'm having some troubles with Flink jobmanagers in a HA setup within
OpenShift.

I have three jobmanagers, a Zookeeper cluster and a loadbalancer
(Openshift/Kubernetes Route) for the web ui / rest server on the
jobmanagers. Everything works fine, as long as the loadbalancer
connects
to the leader. However, when the leader changes and the loadbalancer
connects to a non-leader, the jobmanager redirects to a leader
using the
ip address of the host. Since the routing in our network is done
using
hostnames, it doesn't know how to find the node using the ip
address and
results in a timeout.

So I have a few questions:
1. Why is Flink using the IP addresses instead of the hostnames which are
configured in the config? Other times it does use the hostname, like the
info sent to Zookeeper.
2. Is there another way of coping with connections to non-leaders
instead of redirects? Maybe proxying through a non-leader to the
leader?

Cheers,
Jeroen







Re: Understanding checkpoint behavior

2018-11-08 Thread Piotr Nowojski
Hi,

> On 6 Nov 2018, at 18:22, PranjalChauhan  wrote:
> 
> Thank you for your response Piotr. I plan to upgrade to Flink 1.5.x early
> next year.
> 
> Two follow-up questions for now. 
> 
> 1. 
> " When operator snapshots are taken, there are two parts: the synchronous
> and the asynchronous parts. "
> I understand that when the operator snapshot is being taken, the processing
> of that operator is stopped, as taking this snapshot is the synchronous part. Is
> there any other synchronous part in the snapshot / checkpoint process?
> 

Not as far as I know.

> 
> 2. 
> Based on the test I mentioned above, my understanding is that for a window
> operator, when all events that belong to checkpoint N and the checkpoint
> barrier N are received by window operator (but pending for window to be
> triggered), then checkpoint barrier N will be immediately emitted to the
> sink operator (so snapshot can be completed) while the events are still
> pending to be evaluated by window operator.
> 
> Can you please confirm my understanding as I was initially confused by the
> following second statement (emits all pending outgoing records) under
> Barriers section in this doc
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/internals/stream_checkpointing.html
> ?
> 
> "When an intermediate operator has received a barrier for snapshot n from
> all of its input streams, it emits itself a barrier for snapshot n into all
> of its outgoing streams."
> 
> " Once the last stream has received barrier n, the operator emits all
> pending outgoing records, and then emits snapshot n barriers itself. “

I think you might be mixing two different concepts, watermarks and checkpoint
barriers. The documentation that you are quoting describes the checkpointing
mechanism, checkpoint barriers and record alignment. Checkpoint barriers do not
cause any results to be emitted from the WindowOperator; that happens when timers
are triggered (wall clock timers in the case of processing time, or watermarks in
the case of event time).

Piotrek

Re: akka timeout exception

2018-11-08 Thread K Fred
Hi,

I got the same exception when running in a Flink cluster. The settings are
below:

flink version: 1.5.4

flink-conf.yaml:
jobmanager.heap.mb: 102400
taskmanager.heap.mb: 102400
taskmanager.numberOfTaskSlots: 40
parallelism.default: 40

I have 5 task managers.

My code just reads HBase table data and writes it to another table. The size of
the data is about 1 TB.

Thanks!
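
Not a confirmed fix, but for this symptom people often raise the ask timeout.
A sketch of the relevant flink-conf.yaml entries (values are illustrative
only; whether they help depends on the actual root cause):

akka.ask.timeout: 60 s
web.timeout: 60000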



On Thu, Nov 8, 2018 at 5:50 PM Dawid Wysakowicz 
wrote:

> Hi,
>
> Could you provide us with some more information? Which version of flink
> are you running? In which cluster setup? When does this exception occur?
> This exception says that request for status overview (no of
> taskmanagers, slots info etc.) failed.
>
> Best,
>
> Dawid
>
> On 31/10/2018 20:05, Anil wrote:
> > getting this error in my job manager too frequently. any help. Thanks!
> >
> > java.util.concurrent.CompletionException:
> akka.pattern.AskTimeoutException:
> > Ask timed out on [Actor[akka://flink/user/jobmanager#1927353472]] after
> > [1 ms]. Sender[null] sent message of type
> > "org.apache.flink.runtime.messages.webmonitor.RequestStatusOverview".
> >   at
> >
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> >   at
> >
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> >   at
> >
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> >   at
> >
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> >   at
> >
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> >   at
> >
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> >   at
> >
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:442)
> >   at akka.dispatch.OnComplete.internal(Future.scala:258)
> >   at akka.dispatch.OnComplete.internal(Future.scala:256)
> >   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> >   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> >   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> >   at
> >
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> >   at
> > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> >   at
> >
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> >   at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >   at
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
> >   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> >   at
> >
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >   at
> >
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >   at
> >
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >   at
> >
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >   at
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >   at
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >   at
> >
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >   at
> >
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >   at
> >
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >   at
> > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> >   at
> >
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> >   at
> >
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> >   at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> >   at
> >
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >   at
> >
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> >   at
> >
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >   at
> >
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> >   at
> >
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
> >   at
> >
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
> >   at
> >
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> > [Actor[akka://fli

Re: FlinkCEP, circular references and checkpointing failures

2018-11-08 Thread Shailesh Jain
Thanks a lot for looking into this issue Stefan.

Could you please let me know the issue ID once you open it? It'll help me
understand the problem better, and also I could do a quick test in our
environment once the issue is resolved.

Thanks,
Shailesh

On Wed, Nov 7, 2018, 10:46 PM Till Rohrmann wrote:

> Really good finding Stefan!
>
> On Wed, Nov 7, 2018 at 5:28 PM Stefan Richter 
> wrote:
>
>> Hi,
>>
>> I think I can already spot the
>> problem: LockableTypeSerializer.duplicate() is not properly implemented
>> because it also has to call duplicate() on the element serialiser that is
>> passed into the constructor of the new instance. I will open an issue and
>> fix the problem.
>>
>> Best,
>> Stefan
>>
>> On 7. Nov 2018, at 17:17, Till Rohrmann  wrote:
>>
>> Hi Shailesh,
>>
>> could you maybe provide us with an example program which is able to
>> reproduce this problem? This would help the community to better debug the
>> problem. It looks not right and might point towards a bug in Flink. Thanks
>> a lot!
>>
>> Cheers,
>> Till
>>
>> On Tue, Oct 30, 2018 at 9:10 AM Dawid Wysakowicz 
>> wrote:
>>
>>> This is some problem with serializing your events using Kryo. I'm adding
>>> Gordon to cc, as he was recently working with serializers. He might give
>>> you more insights what is going wrong.
>>>
>>> Best,
>>>
>>> Dawid
>>> On 25/10/2018 05:41, Shailesh Jain wrote:
>>>
>>> Hi Dawid,
>>>
>>> I've upgraded to Flink 1.6.1 and rebased my changes against the tag
>>> 1.6.1, the only commit on top of 1.6 is this:
>>> https://github.com/jainshailesh/flink/commit/797e3c4af5b28263fd98fb79daaba97cabf3392c
>>>
>>> I ran two separate identical jobs (with and without checkpointing
>>> enabled), I'm hitting an ArrayIndexOutOfBoundsException (and sometimes NPE) only
>>> when checkpointing (HDFS backend) is enabled, with the below stack
>>> trace.
>>>
>>> I did see a similar problem with different operators here (
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/IndexOutOfBoundsException-on-deserialization-after-updating-to-1-6-1-td23933.html).
>>> Is this a known issue which is getting addressed?
>>>
>>> Any ideas on what could be causing this?
>>>
>>> Thanks,
>>> Shailesh
>>>
>>>
>>> 2018-10-24 17:04:13,365 INFO
>>> org.apache.flink.runtime.taskmanager.Task -
>>> SelectCepOperatorMixedTime (1/1) - SelectCepOperatorMixedTime (1/1)
>>> (3d984b7919342a3886593401088ca2cd) switched from RUNNING to FAILED.
>>> org.apache.flink.util.FlinkRuntimeException: Failure happened in filter
>>> function.
>>> at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:731)
>>> at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:541)
>>> at org.apache.flink.cep.nfa.NFA.doProcess(NFA.java:284)
>>> at org.apache.flink.cep.nfa.NFA.process(NFA.java:220)
>>> at
>>> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:379)
>>> at
>>> org.apache.flink.cep.operator.AbstractKeyedCEPPatternMixedTimeApproachOperator.processElement(AbstractKeyedCEPPatternMixedTimeApproachOperator.java:45)
>>> at
>>> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
>>> at
>>> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
>>> at
>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.flink.util.WrappingRuntimeException:
>>> java.lang.ArrayIndexOutOfBoundsException: -1
>>> at
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.lambda$materializeMatch$1(SharedBuffer.java:305)
>>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>>> at
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.materializeMatch(SharedBuffer.java:301)
>>> at
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.materializeMatch(SharedBuffer.java:291)
>>> at
>>> org.apache.flink.cep.nfa.NFA$ConditionContext.getEventsForPattern(NFA.java:811)
>>> at
>>> com.stellapps.contrakcep.flink.patterns.critical.AgitatorMalfunctionAfterChillingPattern$1.filter(AgitatorMalfunctionAfterChillingPattern.java:70)
>>> at
>>> com.stellapps.contrakcep.flink.patterns.critical.AgitatorMalfunctionAfterChillingPattern$1.filter(AgitatorMalfunctionAfterChillingPattern.java:62)
>>> at
>>> org.apache.flink.cep.nfa.NFA.checkFilterCondition(NFA.java:742)
>>> at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:716)
>>> ... 10 more
>>> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>>> at com.esotericsoftware.kryo.util.IntArray.add(IntArray.java:61)
>>> at
>>> com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:800)
>>> at com.esoteric

Multiple operators to the same sink

2018-11-08 Thread burgesschen
Hi Guys! I'm designing a topology where multiple operators should forward the
messages to the same sink.


For example I have Operator A,B,C,D,E. I want A,B,C to forward to Sink1 and
D, E to forward to Sink2.

My options are

1. Union A, B and C, then add Sink1 to them. Similarly for D and E. However,
in the current framework our team has built, each operator is built individually.
There is nothing outside of the operators
that has knowledge of their destination sink. It means we would need to build
something at the job level to union the operators.

2. Have each operator output to a side output tag. A, B, and C will output to
tag "sink1", and a singleton Sink1 will consume from tag "sink1".
Similarly for sink2. My concern here is that it feels hacky, since those
messages are not really side outputs.

is this a legitimate use case for output tag or not? Is there a better way
to achieve this? Thank you!
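
A minimal sketch of option 1, assuming all five operators emit the same type
(print() stands in for the real sinks, fromElements for the real operators):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnionToSinks {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // placeholder streams standing in for the outputs of operators A..E
        DataStream<String> a = env.fromElements("a");
        DataStream<String> b = env.fromElements("b");
        DataStream<String> c = env.fromElements("c");
        DataStream<String> d = env.fromElements("d");
        DataStream<String> e = env.fromElements("e");

        a.union(b, c).print();  // A, B, C -> Sink1 (print as a stand-in)
        d.union(e).print();     // D, E -> Sink2

        env.execute("union example");
    }
}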








Re: FlinkCEP, circular references and checkpointing failures

2018-11-08 Thread Stefan Richter
Sure, it is already merged as FLINK-10816.

Best,
Stefan

> On 8. Nov 2018, at 11:53, Shailesh Jain  wrote:
> 
> Thanks a lot for looking into this issue Stefan.
> 
> Could you please let me know the issue ID once you open it? It'll help me 
> understand the problem better, and also I could do a quick test in our 
> environment once the issue is resolved.
> 
> Thanks,
> Shailesh
> 
> On Wed, Nov 7, 2018, 10:46 PM Till Rohrmann   wrote:
> Really good finding Stefan!
> 
> On Wed, Nov 7, 2018 at 5:28 PM Stefan Richter wrote:
> Hi,
> 
> I think I can already spot the problem: LockableTypeSerializer.duplicate() is 
> not properly implemented because it also has to call duplicate() on the 
> element serialiser that is passed into the constructor of the new instance. I 
> will open an issue and fix the problem.
> 
> Best,
> Stefan
> 
>> On 7. Nov 2018, at 17:17, Till Rohrmann wrote:
>> 
>> Hi Shailesh,
>> 
>> could you maybe provide us with an example program which is able to 
>> reproduce this problem? This would help the community to better debug the 
>> problem. It looks not right and might point towards a bug in Flink. Thanks a 
>> lot!
>> 
>> Cheers,
>> Till
>> 
>> On Tue, Oct 30, 2018 at 9:10 AM Dawid Wysakowicz wrote:
>> This is some problem with serializing your events using Kryo. I'm adding 
>> Gordon to cc, as he was recently working with serializers. He might give you 
>> more insights what is going wrong.
>> 
>> Best,
>> 
>> Dawid
>> 
>> On 25/10/2018 05:41, Shailesh Jain wrote:
>>> Hi Dawid,
>>> 
>>> I've upgraded to Flink 1.6.1 and rebased my changes against the tag 1.6.1,
>>> the only commit on top of 1.6 is this:
>>> https://github.com/jainshailesh/flink/commit/797e3c4af5b28263fd98fb79daaba97cabf3392c
>>> 
>>> 
>>> I ran two separate identical jobs (with and without checkpointing enabled), 
>>> I'm hitting a ArrayIndexOutOfBoundsException (and sometimes NPE) only when 
>>> checkpointing (HDFS backend) is enabled, with the below stack trace.
>>> 
>>> I did see a similar problem with different operators here 
>>> (http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/IndexOutOfBoundsException-on-deserialization-after-updating-to-1-6-1-td23933.html).
>>> Is this a known issue which is getting addressed?
>>> 
>>> Any ideas on what could be causing this?
>>> 
>>> Thanks,
>>> Shailesh
>>> 
>>> 
>>> 2018-10-24 17:04:13,365 INFO  org.apache.flink.runtime.taskmanager.Task 
>>> - SelectCepOperatorMixedTime (1/1) - 
>>> SelectCepOperatorMixedTime (1/1) (3d984b7919342a3886593401088ca2cd) 
>>> switched from RUNNING to FAILED.
>>> org.apache.flink.util.FlinkRuntimeException: Failure happened in filter 
>>> function.
>>> at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:731)
>>> at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:541)
>>> at org.apache.flink.cep.nfa.NFA.doProcess(NFA.java:284)
>>> at org.apache.flink.cep.nfa.NFA.process(NFA.java:220)
>>> at 
>>> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:379)
>>> at 
>>> org.apache.flink.cep.operator.AbstractKeyedCEPPatternMixedTimeApproachOperator.processElement(AbstractKeyedCEPPatternMixedTimeApproachOperator.java:45)
>>> at 
>>> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
>>> at 
>>> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
>>> at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.flink.util.WrappingRuntimeException: 
>>> java.lang.ArrayIndexOutOfBoundsException: -1
>>> at 
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.lambda$materializeMatch$1(SharedBuffer.java:305)
>>> at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
>>> at 
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.materializeMatch(SharedBuffer.java:301)
>>> at 
>>> org.apache.flink.cep.nfa.sharedbuffer.SharedBuffer.materializeMatch(SharedBuffer.java:291)
>>> at 
>>> org.apache.flink.cep.nfa.NFA$ConditionContext.getEventsForPattern(NFA.java:811)
>>> at 
>>> com.stellapps.contrakcep.flink.patterns.critical.AgitatorMalfunctionAfterChillingPattern$1.filter(AgitatorMalfunctionAfterChillingPattern.java:70)
>>> at 
>>> com.stellapps.contrakcep.flink.patterns.

Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-08 Thread Cliff Resnick
I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a
configuration of 3 slots per TM. The cluster is dedicated to a single job
that runs at full capacity in "FLIP6" mode. So in this cluster, the
parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager).

When I run the job in 1.6.0, seven Task Managers are spun up as expected.
But if I run with 1.6.2 only four Task Managers spin up and the job hangs
waiting for more resources.

Our Flink distribution is set up by script after building from source. So
aside from flink jars, both 1.6.0 and 1.6.2 directories are identical. The
job is the same, restarting from savepoint. The problem is repeatable.

Has something changed in 1.6.2, and if so can it be remedied with a config
change?


Re: java.io.IOException: NSS is already initialized

2018-11-08 Thread Hao Sun
Thanks, any insight/help here is appreciated.

On Thu, Nov 8, 2018 at 4:38 AM Dawid Wysakowicz 
wrote:

> Hi Hao,
>
> I am not sure, what might be wrong, but I've cc'ed Gary and Kostas who
> were recently working with S3, maybe they will have some ideas.
>
> Best,
>
> Dawid
> On 03/11/2018 03:09, Hao Sun wrote:
>
> Same environment, new error.
>
> I can run the same docker image with my local Mac, but on K8S, this gives
> me this error.
> I cannot think of any difference between local Docker and K8S Docker.
>
> Any hint will be helpful. Thanks
>
> 
>
> 2018-11-02 23:29:32,981 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph- Job
> ConnectedStreams maxwell.accounts ()
> switched from state RUNNING to FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize
> checkpoint 235 for operator Source: KafkaSource(maxwell.accounts) ->
> MaxwellFilter->Maxwell(maxwell.accounts) ->
> FixedDelayWatermark(maxwell.accounts) ->
> MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
> influxdbSink(maxwell.accounts) (1/1).}
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 235 for
> operator Source: KafkaSource(maxwell.accounts) ->
> MaxwellFilter->Maxwell(maxwell.accounts) ->
> FixedDelayWatermark(maxwell.accounts) ->
> MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
> influxdbSink(maxwell.accounts) (1/1).
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
> ... 6 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.NoClassDefFoundError: Could not initialize class
> sun.security.ssl.SSLSessionImpl
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
>
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
>
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
> ... 5 more
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> sun.security.ssl.SSLSessionImpl
> at sun.security.ssl.SSLSocketImpl.init(SSLSocketImpl.java:604)
>
> at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:572)
>
> at
> sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:110)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:365)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.$Proxy4.connect(Unknown
> Source)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
> at
> org.a

RE: Stopping a streaming app from its own code : behaviour change from 1.3 to 1.6

2018-11-08 Thread LINZ, Arnaud
FLINK-10832 created (with heavy difficulty, as typing Java code in a Jira
description was an awful experience ☺)


De : LINZ, Arnaud
Envoyé : mercredi 7 novembre 2018 11:43
À : 'user' 
Objet : RE: Stopping a streaming app from its own code : behaviour change from 
1.3 to 1.6

FYI, the code below ends with version 1.6.0 but does not end with 1.6.1. I
suspect it’s a bug rather than a new feature.

De : LINZ, Arnaud
Envoyé : mercredi 7 novembre 2018 11:14
À : 'user' mailto:user@flink.apache.org>>
Objet : RE: Stopping a streaming app from its own code : behaviour change from 
1.3 to 1.6


Hello,



This has nothing to do with HA. All my unit tests involving a streaming app now
fail in “infinite execution”.
This simple code never ends:
@Test
public void testFlink162() throws Exception {
    // get the execution environment
    final StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // get input data
    final DataStreamSource<String> text = env.addSource(new SourceFunction<String>() {
        @Override
        public void run(final SourceContext<String> ctx) throws Exception {
            for (int count = 0; count < 5; count++) {
                ctx.collect(String.valueOf(count));
            }
        }

        @Override
        public void cancel() {
        }
    });
    text.print().setParallelism(1);
    env.execute("Simple Test");
    // Never ends!
}
Is this really a new feature or a critical bug?
In the log, the task executor is stopped
[2018-11-07 11:11:23,608] INFO Stopped TaskExecutor 
akka://flink/user/taskmanager_0. 
(org.apache.flink.runtime.taskexecutor.TaskExecutor:330)
But execute() does not return.

Arnaud

Log is :
[2018-11-07 11:11:11,432] INFO Running job on local embedded Flink mini cluster 
(org.apache.flink.streaming.api.environment.LocalStreamEnvironment:114)
[2018-11-07 11:11:11,449] INFO Starting Flink Mini Cluster 
(org.apache.flink.runtime.minicluster.MiniCluster:227)
[2018-11-07 11:11:11,636] INFO Starting Metrics Registry 
(org.apache.flink.runtime.minicluster.MiniCluster:238)
[2018-11-07 11:11:11,652] INFO No metrics reporter configured, no metrics will 
be exposed/reported. (org.apache.flink.runtime.metrics.MetricRegistryImpl:113)
[2018-11-07 11:11:11,703] INFO Starting RPC Service(s) 
(org.apache.flink.runtime.minicluster.MiniCluster:249)
[2018-11-07 11:11:12,244] INFO Slf4jLogger started 
(akka.event.slf4j.Slf4jLogger:92)
[2018-11-07 11:11:12,264] INFO Starting high-availability services 
(org.apache.flink.runtime.minicluster.MiniCluster:290)
[2018-11-07 11:11:12,367] INFO Created BLOB server storage directory 
C:\Users\alinz\AppData\Local\Temp\blobStore-fd104a2d-caaf-4740-a762-d292cb2ed108
 (org.apache.flink.runtime.blob.BlobServer:141)
[2018-11-07 11:11:12,379] INFO Started BLOB server at 0.0.0.0:64504 - max 
concurrent requests: 50 - max backlog: 1000 
(org.apache.flink.runtime.blob.BlobServer:203)
[2018-11-07 11:11:12,380] INFO Starting ResourceManger 
(org.apache.flink.runtime.minicluster.MiniCluster:301)
[2018-11-07 11:11:12,409] INFO Starting RPC endpoint for 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at 
akka://flink/user/resourcemanager_f70e9071-b9d0-489a-9dcc-888f75b46864 . 
(org.apache.flink.runtime.rpc.akka.AkkaRpcService:224)
[2018-11-07 11:11:12,432] INFO Proposing leadership to contender 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager@5b1f29fa
 @ akka://flink/user/resourcemanager_f70e9071-b9d0-489a-9dcc-888f75b46864 
(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService:274)
[2018-11-07 11:11:12,439] INFO ResourceManager 
akka://flink/user/resourcemanager_f70e9071-b9d0-489a-9dcc-888f75b46864 was 
granted leadership with fencing token 86394924fb97bad612b67f526f84406f 
(org.apache.flink.runtime.resourcemanager.StandaloneResourceManager:953)
[2018-11-07 11:11:12,440] INFO Starting the SlotManager. 
(org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager:185)
[2018-11-07 11:11:12,442] INFO Received confirmation of leadership for leader 
akka://flink/user/resourcemanager_f70e9071-b9d0-489a-9dcc-888f75b46864 , 
session=12b67f52-6f84-406f-8639-4924fb97bad6 
(org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService:229)
[2018-11-07 11:11:12,452] INFO Created BLOB cache storage directory 
C:\Users\alinz\AppData\Local\Temp\blobStore-b2618f73-5ec6-4fdf-ad43-1da6d6c19a4f
 (org.apache.flink.runtime.blob.PermanentBlobCache:107)
[2018-11-07 11:11:12,454] INFO Created BLOB cache storage directory 
C:\Users\alinz\AppData\Local\Temp\blobStore-df6c61d2-3c51-4335-a96e-6b00c82e4d90
 (org.apache.flink.runtime.blob.TransientBlobCache:107)
[2018-11-07 11:11:12,454] INFO Starting 1 TaskManger(s) 
(org.apache.flink.runtime.minicluster.MiniCluster:316)
[2

Re: Always trigger calculation of a tumble window in Flink SQL

2018-11-08 Thread yinhua.dai
Hi Piotr, 

Thank you for your explanation. 
I basically understand your meaning: as far as I understand, we can
write a custom window assigner and a custom trigger, and we can register the
timer when the window processes elements.

But how can we register a timer when no elements are received during a time
window?
My requirement is to always fire at the end of the time window even when there
is no result from the SQL query.






Re: FlinkKafkaProducer and Confluent Schema Registry

2018-11-08 Thread Olga Luganska
Dawid,

Is there a projected date to deliver ConfluentRegistryAvroSerializationSchema?

thank you,
Olga


From: Dawid Wysakowicz 
Sent: Monday, October 22, 2018 10:40 AM
To: trebl...@hotmail.com
Cc: user
Subject: Re: FlinkKafkaProducer and Confluent Schema Registry

Hi Olga,
There is an open PR[1] that has some in-progress work on corresponding 
AvroSerializationSchema, you can have a look at it. The bigger issue there is 
that SerializationSchema does not have access to event's key so using topic 
pattern might be problematic.
Best,
Dawid

[1] https://github.com/apache/flink/pull/6259

On Mon, 22 Oct 2018 at 16:51, Kostas Kloudas
<k.klou...@data-artisans.com> wrote:
Hi Olga,

Sorry for the late reply.
I think that Gordon (cc’ed) could be able to answer your question.

Cheers,
Kostas

On Oct 13, 2018, at 3:10 PM, Olga Luganska
<trebl...@hotmail.com> wrote:

Any suggestions?

Thank you

Sent from my iPhone

On Oct 9, 2018, at 9:28 PM, Olga Luganska
<trebl...@hotmail.com> wrote:

Hello,

I would like to use Confluent Schema Registry in my streaming job.
I was able to make it work with the help of a generic Kafka producer and a
FlinkKafkaConsumer which is using ConfluentRegistryAvroDeserializationSchema.

FlinkKafkaConsumer011<GenericRecord> consumer = new FlinkKafkaConsumer011<>(MY_TOPIC,
    ConfluentRegistryAvroDeserializationSchema.forGeneric(schema, SCHEMA_URI),
    kafkaProperties);

My question: is it possible to implement producer logic in the
FlinkKafkaProducer to serialize the message and store the schema id in the
Confluent Schema Registry?

I don't think this is going to work with the current interface because creation 
and caching of the schema id in the Confluent Schema Registry is done with the 
help of io.confluent.kafka.serializers.KafkaAvroSerializer.class  and all 
FlinkKafkaProducer constructors have either SerializationSchema or 
KeyedSerializationSchema (part of Flink's own serialization stack) as one of 
the parameters.
If my assumption is wrong, could you please provide details of implementation?
Thank you very much,
Olga
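
Not an official answer, but a workaround some projects use is to wrap
Confluent's own KafkaAvroSerializer in a KeyedSerializationSchema and hand
that to the FlinkKafkaProducer. A rough, untested sketch (the registry URL
property and the lazy initialization are my assumptions):

import java.util.Collections;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

public class ConfluentAvroKeyedSerializationSchema
        implements KeyedSerializationSchema<GenericRecord> {

    private final String topic;
    private final String registryUrl;
    private transient KafkaAvroSerializer serializer;

    public ConfluentAvroKeyedSerializationSchema(String topic, String registryUrl) {
        this.topic = topic;
        this.registryUrl = registryUrl;
    }

    private KafkaAvroSerializer serializer() {
        // created lazily so the schema-registry client lives on the task manager
        if (serializer == null) {
            serializer = new KafkaAvroSerializer();
            serializer.configure(
                Collections.singletonMap("schema.registry.url", registryUrl), false);
        }
        return serializer;
    }

    @Override
    public byte[] serializeKey(GenericRecord element) {
        return null;  // no key in this sketch
    }

    @Override
    public byte[] serializeValue(GenericRecord element) {
        // registers/looks up the schema id in the registry, then Avro-encodes
        return serializer().serialize(topic, element);
    }

    @Override
    public String getTargetTopic(GenericRecord element) {
        return topic;
    }
}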












Re: Flink cluster security conf.: keberos.keytab add to run yarn-cluster

2018-11-08 Thread Paul Lam
Hi,

Wouldn't the `-yD` option do the trick? I use it to override the Kerberos
configuration for different users very often.

Best,
Paul Lam
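
For illustration, the override would look roughly like this on submission
(keytab path and principal are placeholders):

bin/flink run -m yarn-cluster \
  -yD security.kerberos.login.keytab=/path/to/user.keytab \
  -yD security.kerberos.login.principal=user@EXAMPLE.COM \
  ./my-job.jar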


> On 8 Nov 2018, at 17:33, Dawid Wysakowicz wrote:
> 
> Hi Marke,
> 
> AFAIK Shuyi is right, there is no such option so far. What you could do
> though is to extend the "flink" script to substitute those parameters in the
> file on each run, but I think it is a common practice to run Flink jobs on
> YARN from a single service user.
> On 31/10/2018 19:52, Shuyi Chen wrote:
>> Do you mean having these two options as command line options? If so,
>> AFAIK, I don't think it's supported now. Why do you need it? Thanks.
>> 
>> On Wed, Oct 31, 2018 at 11:43 AM Marke Builder wrote:
>> Hi,
>> 
>> So far I have added my keytab and principal in the  flink-conf.yaml:
>> security.kerberos.login.keytab:
>> security.kerberos.login.principal:
>> 
>> But is there a way that I can add this to the "start script" -> run 
>> yarn-cluster .
>> 
>> 
>> Thanks!  
>> 
>> 
>> -- 
>> "So you have to trust that the dots will somehow connect in your future."



Re: Always trigger calculation of a tumble window in Flink SQL

2018-11-08 Thread Piotr Nowojski
Re-adding user mailing list to CC

Hi,

> I basically understand your meaning, as far as my understanding, we can write 
> a custom window assigner and custom trigger, and we can register the timer 
> when the window process elements. 


No, I was actually suggesting to write your own operator to do that. My bet is
that hacking the window operator to make it re-emit the same result in case of no
data would be more difficult, if not impossible, while your custom
“ReEmitLastRow” operator should be relatively simple.

> But how can we register a timer when no elements are received during a time
> window?

Upon the first element, register a timer for N seconds in the future. Once it
fires, register the next one (you can do that while processing a timer callback),
again for N seconds in the future, and so on.

Piotrek
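
A rough sketch of such an operator as a KeyedProcessFunction (my illustration
of the idea above, not a definitive design; it assumes processing-time timers,
a keyed stream of Row results and a fixed re-emit interval):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class ReEmitLastRow extends KeyedProcessFunction<String, Row, Row> {

    private final long intervalMs;
    private transient ValueState<Row> lastRow;

    public ReEmitLastRow(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void open(Configuration parameters) {
        lastRow = getRuntimeContext().getState(
            new ValueStateDescriptor<>("lastRow", Row.class));
    }

    @Override
    public void processElement(Row value, Context ctx, Collector<Row> out) throws Exception {
        if (lastRow.value() == null) {
            // first element for this key: start the periodic timer chain
            ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + intervalMs);
        }
        lastRow.update(value);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Row> out) throws Exception {
        Row row = lastRow.value();
        if (row != null) {
            // re-emit the last result even if no new data arrived in this interval
            out.collect(row);
        }
        // register the next timer so firing continues without any input
        ctx.timerService().registerProcessingTimeTimer(timestamp + intervalMs);
    }
}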

> On 8 Nov 2018, at 07:44, yinhua.2...@outlook.com wrote:
> 
> Hi Piotr, 
> 
> Thank you for your explanation. 
> I basically understand your meaning: as far as I understand, we can write
> a custom window assigner and a custom trigger, and we can register the timer
> when the window processes elements.
> 
> But how can we register a timer when no elements are received during a time
> window?
> My requirement is to always fire at the end of the time window even when there
> is no result from the SQL query.



> On 7 Nov 2018, at 09:48, Piotr Nowojski  wrote:
> 
> Hi,
> 
> You would have to register timers (probably based on event time).
> 
> Your operator would be a vastly simplified window operator, where for given 
> window you keep emitted record from your SQL, sth like:
> 
> MapState emittedRecords; // map window start -> emitted 
> record
> 
> When you process elements, you just put them into this map. To emit the 
> results, you just register event time timers and when a timer fires, you 
> search in the map for the latest record matching the timer's event time 
> (there might be many elements in the map, some of them older some of them 
> newer then the fired timer). You can/should also prune the state in the same 
> timer - for example after emitting the result drop all of the windows older 
> then the timer.
> 
> Piotrek
> 
>> On 7 Nov 2018, at 02:55, yinhua.dai  wrote:
>> 
>> Hi Piotr,
>> 
>> Can you elaborate more on the solution with the custom operator?
>> I don't think there will be any records from the SQL query if no input data
>> is coming in within the time window, even if we convert the result to a
>> datastream.
>> 
>> 
>> 
> 



Re: flink job restarts when flink cluster restarts?

2018-11-08 Thread Chang Liu
Thanks!

If I have a cluster with more than one node (standalone or YARN), can I stop and
start any single node among them and keep the job running?

Best regards/祝好,

Chang Liu 刘畅


> On 7 Nov 2018, at 16:17, 秦超峰 <18637156...@163.com> wrote:
> 
> the second
> 
> 
> 
>   
> 秦超峰
> Email: windyqinchaof...@163.com
> Signature customized by NetEase Mail Master
> 
> On 11/07/2018 17:14, Chang Liu  wrote:
> Hi,
> 
> I have a question regarding whether the currently running jobs will restart if I
> stop and start the Flink cluster.
> 
> 1. Let’s say I am just having a Standalone one node cluster.
> 2. I have several Flink jobs already running on the cluster.
> 3. If I do a bin/stop-cluster.sh and then do a bin/start-cluster.sh, will the
> previously running jobs restart again?
> 
> OR
> 
> Before I do bin/stop-cluster.sh, I have to take savepoints for each of the jobs.
> After bin/start-cluster.sh is finished, I have to start the jobs based on the
> savepoints triggered before, for each of the jobs I want to restart.
> 
> Many thanks in advance :)
> 
> Best regards/祝好,
> 
> Chang Liu 刘畅
> 
> 



Re: flink job restarts when flink cluster restarts?

2018-11-08 Thread Chang Liu
In other words, how can I keep the jobs running across system patching, server
restarts, etc.? Is it related to standalone vs. YARN? Or is it related to whether
ZooKeeper is used?

Many thanks!

Best regards/祝好,

Chang Liu 刘畅
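
For reference, the savepoint-based flow from my first mail, sketched as CLI
steps (paths are placeholders; the concrete savepoint path is printed when the
savepoint completes):

bin/flink savepoint <jobId> hdfs:///flink/savepoints   # once per job
bin/stop-cluster.sh                                    # patch / restart the machines
bin/start-cluster.sh
bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx myJob.jar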


> On 8 Nov 2018, at 13:38, Chang Liu  wrote:
> 
> Thanks!
> 
> If I have a cluster with more than one node (standalone or YARN), can I stop and
> start any single node among them and keep the job running?
> 
> Best regards/祝好,
> 
> Chang Liu 刘畅
> 
> 
>> On 7 Nov 2018, at 16:17, 秦超峰 <18637156...@163.com 
>> > wrote:
>> 
>> the second
>> 
>> 
>> 
>>  
>> 秦超峰
>> Email: windyqinchaof...@163.com
>> Signature customized by NetEase Mail Master
>> 
>> On 11/07/2018 17:14, Chang Liu  wrote:
>> Hi,
>> 
>> I have a question regarding whether the currently running jobs will restart if
>> I stop and start the Flink cluster.
>> 
>> 1. Let’s say I am just having a Standalone one node cluster.
>> 2. I have several Flink jobs already running on the cluster.
>> 3. If I do a bin/stop-cluster.sh and then do a bin/start-cluster.sh, will the
>> previously running jobs restart again?
>> 
>> OR
>> 
>> Before I do bin/stop-cluster.sh, I have to take savepoints for each of the jobs.
>> After bin/start-cluster.sh is finished, I have to start the jobs based on the
>> savepoints triggered before, for each of the jobs I want to restart.
>> 
>> Many thanks in advance :)
>> 
>> Best regards/祝好,
>> 
>> Chang Liu 刘畅
>> 
>> 
> 



ProcessFunction's Event Timer not firing

2018-11-08 Thread Fritz Budiyanto
Hi All,

I noticed that if one of the slots' watermarks is not progressing, it impacts all
slots' ProcessFunction timers, and no timers are firing.

In my example, I have the source parallelism set to 8 and the Kafka partition
count is 4. The next operator is a ProcessFunction with parallelism of 8 plus an
event timer. I can see from the debug log that one of the slots' watermarks is not
progressing. As a result, all slots' timers in the process function are not
firing. Is this expected behavior or an issue? How do I prevent this condition?

Thanks,
Fritz

Re: FlinkKafkaProducer and Confluent Schema Registry

2018-11-08 Thread Dawid Wysakowicz
Hi Olga,

The only thing I can tell is that it definitely won't make it into the 1.7
release. The earliest possibility is then 1.8, which is scheduled for the
beginning of next year.

Best,

Dawid


On 08/11/2018 00:48, Olga Luganska wrote:
> Dawid,
>
> Is there a projected date to
> deliver ConfluentRegistryAvroSerializationSchema?
>
> thank you,
> Olga
>
> 
> *From:* Dawid Wysakowicz 
> *Sent:* Monday, October 22, 2018 10:40 AM
> *To:* trebl...@hotmail.com
> *Cc:* user
> *Subject:* Re: FlinkKafkaProducer and Confluent Schema Registry
>  
> Hi Olga,
> There is an open PR [1] with some in-progress work on a corresponding
> AvroSerializationSchema; you can have a look at it. The bigger issue
> there is that SerializationSchema does not have access to the event's key,
> so using a topic pattern might be problematic.
> Best,
> Dawid
>
> [1] https://github.com/apache/flink/pull/6259
>
> On Mon, 22 Oct 2018 at 16:51, Kostas Kloudas
> <k.klou...@data-artisans.com> wrote:
>
> Hi Olga,
>
> Sorry for the late reply.
> I think that Gordon (cc’ed) could be able to answer your question.
>
> Cheers,
> Kostas
>
>> On Oct 13, 2018, at 3:10 PM, Olga Luganska wrote:
>>
>> Any suggestions?
>>
>> Thank you
>>
>> Sent from my iPhone
>>
>> On Oct 9, 2018, at 9:28 PM, Olga Luganska wrote:
>>
>>> Hello,
>>>
>>> I would like to use Confluent Schema Registry in my streaming job.
>>> I was able to make it work with the help of
>>> a generic Kafka producer and a FlinkKafkaConsumer that uses
>>> ConfluentRegistryAvroDeserializationSchema:
>>>
>>> FlinkKafkaConsumer011<GenericRecord> consumer =
>>>     new FlinkKafkaConsumer011<>(MY_TOPIC,
>>>         ConfluentRegistryAvroDeserializationSchema.forGeneric(schema, SCHEMA_URI),
>>>         kafkaProperties);
>>>
>>> My question: is it possible to implement *producer* logic in the
>>> FlinkKafkaProducer to serialize the message and store the schema id in
>>> the Confluent Schema Registry?
>>>
>>> I don't think this is going to work with the current interface
>>> because creation and caching of the schema id in the Confluent
>>> Schema Registry is done with the help
>>> of io.confluent.kafka.serializers.KafkaAvroSerializer.class, and
>>> all FlinkKafkaProducer constructors have either
>>> SerializationSchema or KeyedSerializationSchema (part of Flink's
>>> own serialization stack) as one of the parameters.
>>>
>>>
>>> If my assumption is wrong, could you please provide details
>>> of implementation?
>>>
>>> Thank you very much,
>>> Olga
>>>
>


signature.asc
Description: OpenPGP digital signature


Re: Flink weird checkpointing behaviour

2018-11-08 Thread Dawid Wysakowicz
Hi,

I think it is definitely worth checking the alignment time, as Yun Tang
suggested. There were some changes in the network stack between those
versions that could influence this behavior.


I've also added Stefan as cc, who might have more ideas what would be
worth checking.
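
If back pressure or slow DFS writes turn out to be the cause, giving
checkpoints more headroom can help while you investigate; a sketch with
illustrative values:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000L);                                  // checkpoint every 60 s
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000L);  // breathing room in between
env.getCheckpointConfig().setCheckpointTimeout(10 * 60_000L);      // fail slow checkpoints after 10 min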

Best,

Dawid


On 31/10/2018 16:51, Yun Tang wrote:
> Hi Pawel
>
> First of all, I don't think the akka timeout exception is related to
> the checkpoints taking a long time. And both
> RocksDBStateBackend and FsStateBackend have an async part of the
> checkpoint, which in general uploads data to the DFS. That's why the
> async part takes more time than the sync part of the checkpoint in most
> cases.
>
> You could check whether the checkpoint alignment time is much
> longer than before: back pressure in a job causes downstream
> tasks to receive checkpoint barriers later, and a task must receive
> all barriers from its inputs to trigger its checkpoint [1]. If long
> checkpoint alignment time dominates the overall checkpoint
> duration, you should look at the tasks that cause the back pressure.
>
> Also, the long time of checkpoint might also be caused by the low
> write performance of DFS.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html#barriers
>
> Best
> Yun Tang
>
> 
> *From:* Pawel Bartoszek 
> *Sent:* Wednesday, October 24, 2018 23:11
> *To:* User
> *Subject:* Flink weird checkpointing behaviour
>  
> Hi,
>
> We have just upgraded to Flink 1.5.2 on EMR from Flink 1.3.2. We have
> noticed that some checkpoints are taking a very long time to complete;
> some of them even fail with the exception:
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka://flink/user/jobmanager_0#-665361795]] after [6 ms].
>
> We have noticed that Checkpoint Duration (Async) is taking most of the
> checkpoint time compared to Checkpoint Duration (Sync). I thought
> that async checkpoints are only offered by the RocksDB state backend. We
> use the filesystem state backend.
>
> We didn't have such problems on Flink 1.3.2
>
> Thanks,
> Pawel
>
> *Flink configuration*
>  
> akka.ask.timeout: 60 s
> classloader.resolve-order: parent-first
> containerized.heap-cutoff-ratio: 0.15
> env.hadoop.conf.dir: /etc/hadoop/conf
> env.yarn.conf.dir: /etc/hadoop/conf
> high-availability: zookeeper
> high-availability.cluster-id: application_1540292869184_0001
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: ip-10-4-X-X.eu-west-1.compute.internal:2181
> high-availability.zookeeper.storageDir: hdfs:///flink/recovery
> internal.cluster.execution-mode: NORMAL
> internal.io.tmpdirs.use-local-default: true
> io.tmp.dirs: /mnt/yarn/usercache/hadoop/appcache/application_1540292869184_0001
> jobmanager.heap.mb: 3072
> jobmanager.rpc.address: ip-10-4-X-X.eu-west-1.compute.internal
> jobmanager.rpc.port: 41219
> jobmanager.web.checkpoints.history: 1000
> parallelism.default: 32
> rest.address: ip-10-4-X-X.eu-west-1.compute.internal
> rest.port: 0
> state.backend: filesystem
> state.backend.fs.checkpointdir: s3a://
> state.checkpoints.dir: s3a://...
> state.savepoints.dir: s3a://...
> taskmanager.heap.mb: 6600
> taskmanager.numberOfTaskSlots: 1
> web.port: 0
> web.tmpdir: /tmp/flink-web-c3d16e22-1a33-46a2-9825-a6e268892199
> yarn.application-attempts: 10
> yarn.maximum-failed-containers: -1
> zookeeper.sasl.disable: true


signature.asc
Description: OpenPGP digital signature


Re: "org.apache.flink.client.program.ProgramInvocationException: Unknown I/O error while extracting contained jar files"

2018-11-08 Thread Dawid Wysakowicz
Hi,

Could you post the full stacktrace of the exception?
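
In the meantime, for reference, the upload-then-run flow you describe looks
roughly like this (host, port and the returned jar id are placeholders):

curl -X POST -H "Expect:" -F "jarfile=@/path/to/my-job.jar" http://localhost:8081/jars/upload
curl -X POST "http://localhost:8081/jars/<jar-id>/run?entry-class=com.example.Main&parallelism=1"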

Best,

Dawid

On 05/11/2018 09:19, wangziyu wrote:
> Hi,
> I use the monitoring REST API "/jars/:jarid/run" to test my environment, and
> this exception happens.
> I did exactly this:
> 1. I used "/jars/upload" to upload my jar.
> 2. I tried to run my jar.
> That is all. How can I solve this exception?
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



signature.asc
Description: OpenPGP digital signature


Re: sys.exit(1) led to standalonesession daemon closed

2018-11-08 Thread Dawid Wysakowicz
Hi Tony,

I think your reasoning is correct: this happens because the rest server
runs in the same process as the standalonesession daemon.

It's hard to say whether it is expected behavior or not, but there is
really not much we can do about it. User code can call System.exit from
any place in the code, which will terminate the process executing that
code. In general I would say calling System.exit is rather discouraged.
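
If you cannot keep exit calls out of third-party code, one defensive option
(a sketch using only JDK APIs; note a SecurityManager affects everything
running in that JVM, so use with care) is to translate exit attempts into
exceptions at the start of your main():

System.setSecurityManager(new SecurityManager() {
    @Override
    public void checkExit(int status) {
        // turn System.exit / sys.exit into a catchable exception instead of killing the JVM
        throw new SecurityException("Blocked exit(" + status + ")");
    }

    @Override
    public void checkPermission(java.security.Permission perm) {
        // allow everything else
    }
});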

Best,

Dawid

On 05/11/2018 06:38, Tony Wei wrote:
> Hi,
>
> I used a scala library called scallop[1] to parse my job’s arguments.
> When the argument didn’t 
> exist in the config setting, the default behavior of scallop would
> call sys.exit(1).
>
> It is not a problem when I’m using the Flink CLI to submit the job. However,
> when I use the REST API to submit the 
> job, it seems that sys.exit(1) leads to the standalonesession daemon
> shutting down. Maybe the reason is 
> that the rest server is in the same process as the standalonesession
> daemon. Am I correct?
>
> If this is the root cause, is this expected behavior, and should users
> be aware that they must not call 
> sys.exit(1) in their jobs?
>
> I tested this on 1.6.0 standalone session cluster with flip-6 mode.
> And here are my testing job 
> and logs before and after the submission.
>
> package com.appier.rt.rt_match
> import org.apache.flink.api.scala.createTypeInformation
> import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
> import org.rogach.scallop.{ScallopConf, ScallopOption}
> object TestMain {
>   def main(args: Array[String]): Unit = {
>     object Args extends ScallopConf(args) {
>       val mode: ScallopOption[String] = opt[String](default =
> Some("development"))
>       verify
>     }
>     val env = StreamExecutionEnvironment.getExecutionEnvironment
>     env.fromElements(Args.mode()).map(a => a)
>     env.execute()
>   }
> }
>
>
> Submit by flink-cli
>
> $ ./bin/flink run -c com.appier.rt.rt_match.TestMain -p 2 -d
> rt-match-assembly-4.5.1-SNAPSHOT.jar --mo xyz
> Starting execution of program
> [scallop] Error: Unknown option 'mo'
>
>
> Submit by rest-api
>
> 2018-11-05 13:27:58,800 TRACE
> org.apache.flink.runtime.webmonitor.handlers.JarListHandler   -
> Received request /jars/.
> 2018-11-05 13:27:59,679 TRACE
> org.apache.flink.runtime.rest.FileUploadHandler               -
> Received request. URL:/jobs/overview Method:GET
> 2018-11-05 13:27:59,680 TRACE
> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  -
> Received request /jobs/overview.
> 2018-11-05 13:28:01,752 TRACE
> org.apache.flink.runtime.rest.FileUploadHandler               -
> Received request. URL:/jars/ Method:GET
> 2018-11-05 13:28:01,753 TRACE
> org.apache.flink.runtime.webmonitor.handlers.JarListHandler   -
> Received request /jars/.
> 2018-11-05 13:28:02,682 TRACE
> org.apache.flink.runtime.rest.FileUploadHandler               -
> Received request. URL:/jobs/overview Method:GET
> 2018-11-05 13:28:02,683 TRACE
> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  -
> Received request /jobs/overview.
> 2018-11-05 13:28:03,899 TRACE
> org.apache.flink.runtime.rest.FileUploadHandler               -
> Received request.
> 
> URL:/jars/7413f82a-d650-4729-873e-a94150ffe9d0_rt-match-assembly-4.5.1-SNAPSHOT.jar/run?entry-class=com.appier.rt.rt_match.TestMain&parallelism=2&program-args=--mo+xyz
> Method:POST
> 2018-11-05 13:28:03,902 TRACE
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler    -
> Received request
> 
> /jars/7413f82a-d650-4729-873e-a94150ffe9d0_rt-match-assembly-4.5.1-SNAPSHOT.jar/run?entry-class=com.appier.rt.rt_match.TestMain&parallelism=2&program-args=--mo+xyz.
> 2018-11-05 13:28:04,751 TRACE
> org.apache.flink.runtime.rest.FileUploadHandler               -
> Received request. URL:/jars/ Method:GET
> 2018-11-05 13:28:04,752 TRACE
> org.apache.flink.runtime.webmonitor.handlers.JarListHandler   -
> Received request /jars/.
> 2018-11-05 13:28:04,760 INFO 
> org.apache.flink.runtime.blob.TransientBlobCache              -
> Shutting down BLOB cache
> 2018-11-05 13:28:04,761 INFO 
> org.apache.flink.runtime.blob.BlobServer                      -
> Stopped BLOB server at 0.0.0.0:42075 
>
>
> Best,
> Tony Wei.
>
> [1] https://github.com/scallop/scallop


signature.asc
Description: OpenPGP digital signature


Re: java.io.IOException: NSS is already initialized

2018-11-08 Thread Dawid Wysakowicz
Hi Hao,

I am not sure, what might be wrong, but I've cc'ed Gary and Kostas who
were recently working with S3, maybe they will have some ideas.

Best,

Dawid

On 03/11/2018 03:09, Hao Sun wrote:
> Same environment, new error.
> I can run the same Docker image on my local Mac, but on K8s it
> gives me this error.
> I cannot think of any difference between local Docker and K8s Docker.
>
> Any hint will be helpful. Thanks
>
> 
> 2018-11-02 23:29:32,981 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job
> ConnectedStreams maxwell.accounts ()
> switched from state RUNNING to FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize
> checkpoint 235 for operator Source: KafkaSource(maxwell.accounts) ->
> MaxwellFilter->Maxwell(maxwell.accounts) ->
> FixedDelayWatermark(maxwell.accounts) ->
> MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
> influxdbSink(maxwell.accounts) (1/1).}
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> *Caused by: java.lang.Exception: Could not materialize checkpoint 235
> for operator Source*: KafkaSource(maxwell.accounts) ->
> MaxwellFilter->Maxwell(maxwell.accounts) ->
> FixedDelayWatermark(maxwell.accounts) ->
> MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
> influxdbSink(maxwell.accounts) (1/1).
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
> ... 6 more
> *Caused by: java.util.concurrent.ExecutionException:
> java.lang.NoClassDefFoundError: Could not initialize class
> sun.security.ssl.SSLSessionImpl*
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
> ... 5 more
> *Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> sun.security.ssl.SSLSessionImpl*
> at sun.security.ssl.SSLSocketImpl.init(SSLSocketImpl.java:604)
> at sun.security.ssl.SSLSocketImpl.(SSLSocketImpl.java:572)
> at
> sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:110)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:365)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
> at
> org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.$Proxy4.connect(Unknown
> Source)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
> at
> org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHt

Re: Question about slot and yarn vcores

2018-11-08 Thread Dawid Wysakowicz
Hi,

It is not that simple. To understand it well I would recommend going through
the docs[1]. In short, a slot does not equal a thread or a core; it is just an
abstraction over a share of a TaskManager's resources.

For your setup: it is true there will be 3 JVMs, and each will be assigned 2
vcores (by default this equals the number of slots per taskmanager if you don't
overwrite it with [2]), but each of the JVMs will run multiple threads.
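
For reference, the two knobs involved are flink-conf.yaml entries like the
following (values illustrative):

taskmanager.numberOfTaskSlots: 2
yarn.containers.vcores: 2   # defaults to the number of slots when not set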

Best,

Dawid

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/runtime.html#task-slots-and-resources

[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html#yarn-containers-vcores

On 02/11/2018 10:48, sohimankotia wrote:
> Let's assume I have a YARN cluster with 3 nodes and 3 vcores per node, so the total
> available cores = 9.
>
> Now if I spin up a Flink job with taskmanagers = 3 and number of slots per task
> manager = 2, what will happen?
>
>
> 1. 3 JVMs will be initiated (one for each task manager).
> 2. Each JVM will run 2 threads for tasks.
> 3. Will each thread use 1 vcore from YARN, or will both threads share the
> same vcore?
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



signature.asc
Description: OpenPGP digital signature


Re: Starting a seperate Java process within a Flink cluster

2018-11-08 Thread Dawid Wysakowicz
Hi,

I am afraid that what you are trying to do would be extremely hard: in
a cluster setup, not all dependencies are taken from the taskmanager
classpath. The user code classes are actually loaded dynamically, so
they cannot be accessed in your new process, which does not share those
user-code classloaders.
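
If you still want to experiment with it, the child JVM would at least need
the jar that actually contains your classes on its classpath; a sketch
(getCodeSource() may return null in some classloader setups, and exception
handling is omitted):

// locate the jar (or classes directory) that contains the user class ...
String userCodeLocation = MessageBufferProcess.class
    .getProtectionDomain().getCodeSource().getLocation().toURI().getPath();
// ... and append it to the inherited classpath for the child process
String childClasspath =
    System.getProperty("java.class.path") + File.pathSeparator + userCodeLocation;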

Best,

Dawid

On 02/11/2018 10:34, Jeff Zhang wrote:
>
> The error is most likely due to a classpath issue, because the classpath is
> different when you run a Flink program in the IDE versus in a cluster. 
>
> And starting another JVM process in a SourceFunction doesn't seem like a
> good approach to me; is it possible for you to do the work directly in your
> custom SourceFunction?
>
>
> On Friday, 2 Nov 2018 at 17:22, Ly, The Anh wrote:
>
> Yes, i did. It is definitely there. I tried and made a separate
> Maven project to test if something was wrong with my jar. 
> The resulting shaded jar of that test project was fine and the
> message-buffer-process was running with that test jar. 
>
>
> On 02.11.2018 04:47, Yun Tang wrote:
> Hi
>
> Since you use the message-buffer-process as a dependency and the
> error tells you the class was not found, have you checked whether your
> application jar actually contains the wanted
> MessageBufferProcess.class? If it is missing, try the
> maven-assembly or maven-shade
> plugin to
> include your classes.
>
> Best
> Yun Tang
> 
> *From:* Ly, The Anh  >
> *Sent:* Friday, November 2, 2018 6:33
> *To:* user@flink.apache.org 
> *Subject:* Starting a seperate Java process within a Flink cluster
>  
>
> Hello,
>
>
> I am currently working on my masters and I encountered a difficult
> problem.
>
>
> Background (for context): I am trying to connect different data
> stream processors. Therefore I am using Flink's
> internal mechanisms of creating custom sinks and sources to
> receive from and send to different data stream processors. I am
> starting a separate
> process (message-buffer-process) in those custom sinks and sources
> to communicate and buffer data into that message-buffer-process. 
> My implementation is created with Maven and it could potentially
> be added as a dependency. 
>
>
> Problem: I already tested my implementation by adding it as
> a dependency to a simple Flink word-count example. The test
> within the IDE works perfectly fine. But when I package that
> Flink word-count example and try
> to run it with "./flink run " or by uploading and submitting it as
> a job, it tells me that my buffer-process class could not be found:
>
> In German: "Fehler: Hauptklasse
> de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess
> konnte nicht gefunden oder geladen werden"
>
> Roughly translated: "Error: Main class
> de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess
> could not be found or loaded"
>
>
> Code snippets:
>
> Example - Adding my custom sink to send data to another data
> stream processor:
>
> dataStream.addSink(
>     (SinkFunction) DSPConnectorFactory.getInstance()
>         .createSinkConnector(new DSPConnectorConfig
>             .Builder("localhost", 9656)
>             .withDSP("flink")
>             .withBufferConnectorString("buffer-connection-string")
>             .withHWM(20)
>             .withTimeout(1)
>             .build()));
>
>
>
> The way I am trying to start the separate buffer-process:
>
> JavaProcessBuilder.exec(MessageBufferProcess.class, connectionString, addSentMessagesFrame);
>
> How JavaProcessBuilder.exec looks:
>
> public static Process exec(Class javaClass, String connectionString,
>         boolean addSentMessagesFrame) throws IOException, InterruptedException {
>     String javaHome = System.getProperty("java.home");
>     String javaBin = javaHome + File.separator + "bin" + File.separator + "java";
>     String classpath = System.getProperty("java.class.path");
>     String className = javaClass.getCanonicalName();
>     System.out.println("Trying to build process " + classpath + " " + className);
>     ProcessBuilder builder = new ProcessBuilder(
>         javaBin, "-cp", classpath, className, connectionString,
>         Boolean.toString(addSentMessagesFrame));
>     builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
>     builder.redirectError(ProcessBuilder.Redirect.INHERIT);
>     Process process = builder.start();
>     return process;
> }
>
> I also tried running that message-buffer process separately in another
> Maven project and its packaged .jar file. That worked perfectly fine too. That
> is why I am assuming that my approach is not ap

Re: Job manager UI improvement

2018-11-08 Thread Dawid Wysakowicz
Hi Michael,


There are no metrics for the actual state size so far, but Yun Tang's
suggestion is the best you can do right now. You can also refer to
a similar previous thread on the ML [1].


I have also added Aljoscha to cc, who might know if there has been any
progress on this topic since then.


Best,

Dawid


[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-State-monitoring-td17256.html


On 02/11/2018 04:55, Yun Tang wrote:
> Hi Michael
>
> You could view state size metrics in 'Checkpoints' UI tab[1], I think
> the state size shown here could meet your needs in most cases.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/checkpoint_monitoring.html#history-tab
>
>
> Best
> Yun Tang
> 
> *From:* Michael Latta 
> *Sent:* Friday, November 2, 2018 10:08
> *To:* user@flink.apache.org
> *Subject:* Job manager UI improvement
>  
> I would really like to see the job manager show metrics on state size,
> not just IO per task. Is there a way to do that now, or is the metric
> there and it just needs some UI work to show it?
>
> Michael
>
> Sent from my iPad


signature.asc
Description: OpenPGP digital signature


Re: TaskManagers cannot contact JobManager in Kubernetes when JobManager HA is enabled

2018-11-08 Thread Dawid Wysakowicz
Hi John,

Glad you resolved the issue. Also thanks for sharing the solution with ML!
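
For reference, the fix John describes below boils down to pinning the HA RPC
port in flink-conf.yaml and exposing that same port in the Kubernetes
Service/Deployment (port value illustrative):

high-availability.jobmanager.port: 6124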

Best,

Dawid

On 01/11/2018 16:22, John Stone wrote:
> I've managed to resolve the issue.  With HA enabled, you will see this 
> message in the logs:
>
> 2018-11-01 13:38:52,467 INFO  
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system 
> started at akka.tcp://flink@flink-jobmanager:40641
>
> Without HA enabled, you will see this message in the logs:
>
> 2018-11-01 13:38:52,467 INFO  
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system 
> started at akka.tcp://flink@flink-jobmanager:6123
>
> HA causes a random port assignment for the ResourceManager portion of the 
> JobManager.  This can be controlled by setting the 
> high-availability.jobmanager.port to a fixed port and exposing it in the 
> Kubernetes network configuration.



signature.asc
Description: OpenPGP digital signature


Re: Question about serialization and performance

2018-11-08 Thread Dawid Wysakowicz
Hi Michael,

Things I could suggest are:

* First of all, Kryo serialization is a fallback that is used when no
better-suited serializer can be found. Since you said you are using some
complex JSONObjects, I would recommend writing your own
org.apache.flink.api.common.typeutils.TypeSerializer for them.

* You could try to reduce the number of events you keep in state; make
sure you remove elements you no longer need.

* I would revisit the map-of-lists-of-JSONObject structure, as you have
to de/serialize a whole list on each access. Maybe you could implement
it with e.g. two maps: the first keeping just indices to the JSONObjects
stored in the other map. This way you won't have to deserialize whole
lists of complex objects (a sketch follows below).

* Also make sure the performance bottleneck is in accessing the state
and not in forwarding events: check the performance of a job with the
same logic and events, but without state.
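
A minimal sketch of that two-map idea, assuming a rich function on a keyed
stream (the state names and the int-id scheme are illustrative; JSONObject
is your own class):

// heavy objects stored individually, so one access deserializes one object
private transient MapState<Integer, JSONObject> objects;
// the per-key lists, kept as cheap id arrays instead of lists of heavy objects
private transient MapState<String, int[]> listIndex;

@Override
public void open(Configuration parameters) {
    objects = getRuntimeContext().getMapState(
        new MapStateDescriptor<>("objects", Integer.class, JSONObject.class));
    listIndex = getRuntimeContext().getMapState(
        new MapStateDescriptor<>("listIndex", String.class, int[].class));
}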

Hope those pointers will help improve your job's performance.

Best,

Dawid

On 31/10/2018 23:58, TechnoMage wrote:
> In running tests of Flink jobs we are seeing some that yield really good 
> performance (2.5M records in minutes) and others that are struggling to get 
> past 200k records processed.  In the latter case there are a large number of 
> keys, and each key gets state in the form of 3 value states.  One holds a 
> string and the others hold a Map of Lists of events (JSONObject 
> subclasses with custom Java serialization).  There is also a MapState for 
> each key that will hold one entry for each event matching that key 
> (string->string).
>
> The program starts out processing 1000-1500 records/sec (on my 4-year-old 
> laptop), and progressively gets slower and slower.  It is about 400/sec when 
> processing the 500,000th event.
>
> When using JProfiler on the test (local environment running under Eclipse) it 
> indicates 70-80% of the execution time is spent in Kryo serialization methods.
>
> When using the MemoryStateBackend the above is true, the RocksDBStateBackend 
> is about 1/2 to 2/3 the speed.
>
> Any suggestions on how to reduce or identify the source of the serialization 
> performance issue are welcome.
>
> Michael



signature.asc
Description: OpenPGP digital signature


Re: akka timeout exception

2018-11-08 Thread Dawid Wysakowicz
Hi,

Could you provide us with some more information? Which version of Flink
are you running? In which cluster setup? When does this exception occur?
This exception says that the request for the status overview (number of
taskmanagers, slot info, etc.) failed.
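
If it turns out to be plain RPC slowness under load, raising the timeouts in
flink-conf.yaml is a common first mitigation (values illustrative; it does
not fix a saturated JobManager):

akka.ask.timeout: 60 s
web.timeout: 60000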

Best,

Dawid

On 31/10/2018 20:05, Anil wrote:
> getting this error in my job manager too frequently. any help. Thanks!
>
> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException:
> Ask timed out on [Actor[akka://flink/user/jobmanager#1927353472]] after
> [1 ms]. Sender[null] sent message of type
> "org.apache.flink.runtime.messages.webmonitor.RequestStatusOverview".
>   at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>   at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>   at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>   at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>   at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>   at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>   at
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:442)
>   at akka.dispatch.OnComplete.internal(Future.scala:258)
>   at akka.dispatch.OnComplete.internal(Future.scala:256)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
>   at
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>   at
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>   at scala.concurrent.Promise$class.complete(Promise.scala:55)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>   at
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
>   at
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
>   at
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
>   at
> scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
>   at
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>   at
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
>   at
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>   at
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>   at
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>   at
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>   at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>   at
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>   at
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>   at
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>   at
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>   at
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>   at
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>   at
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka://flink/user/jobmanager#1927353472]] after [1 ms].
> Sender[null] sent message of type
> "org.apache.flink.runtime.messages.webmonitor.RequestStatusOverview".
>   at
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>   ... 9 more
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



signature.asc
Description: OpenPGP digital signature


Re: Flink cluster security conf.: kerberos.keytab add to run yarn-cluster

2018-11-08 Thread Dawid Wysakowicz
Hi Marke,

AFAIK Shuyi is right, there is no such option so far. What you could do,
though, is extend the "flink" script to substitute those parameters in
flink-conf.yaml on each run, but I think it is common practice to run Flink
jobs on YARN from a single service user.
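
A sketch of that substitution approach (keytab path and principal are
placeholders):

sed -i "s|^security.kerberos.login.keytab:.*|security.kerberos.login.keytab: ${KEYTAB_PATH}|" "$FLINK_HOME/conf/flink-conf.yaml"
sed -i "s|^security.kerberos.login.principal:.*|security.kerberos.login.principal: ${PRINCIPAL}|" "$FLINK_HOME/conf/flink-conf.yaml"
"$FLINK_HOME/bin/flink" run -m yarn-cluster ...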

On 31/10/2018 19:52, Shuyi Chen wrote:
> Do you mean passing these two options as command-line options? If so,
> AFAIK, I don't think it's supported now. Why do you need it? Thanks.
>
> On Wed, Oct 31, 2018 at 11:43 AM Marke Builder
> <marke.buil...@gmail.com> wrote:
>
> Hi,
>
> So far I have added my keytab and principal in the  flink-conf.yaml:
> security.kerberos.login.keytab:
> security.kerberos.login.principal:
>
> But is there a way that I can add this to the "start script" ->
> run yarn-cluster .
>
>
> Thanks!  
>
>
>
> -- 
> "So you have to trust that the dots will somehow connect in your future."


signature.asc
Description: OpenPGP digital signature


Re: flink-1.6.1 :: job deployment :: detached mode

2018-11-08 Thread Till Rohrmann
Hi Mike,

Could you also send me the YarnJobClusterEntrypoint logs? Thanks!

Cheers,
Till

On Wed, Nov 7, 2018 at 9:27 PM Mikhail Pryakhin  wrote:

> Hi Till,
> Thank you for your reply.
> Yes, I’ve upgraded to the latest Flink-1.6.2 and the problem is still
> there, please find the log file attached.
>
>
> Kind Regards,
> Mike Pryakhin
>
> On 7 Nov 2018, at 18:46, Till Rohrmann  wrote:
>
> Hi Mike,
>
> have you tried whether the problem also occurs with Flink 1.6.2? If yes,
> then please share with us the Flink logs with DEBUG log level to further
> debug the problem.
>
> Cheers,
> Till
>
> On Fri, Oct 26, 2018 at 5:46 PM Mikhail Pryakhin 
> wrote:
>
>> Hi community!
>>
>> Right after upgrading Flink to flink-1.6.1 I get an exception
>> during job deployment to a YARN cluster.
>> The job is submitted with ZooKeeper HA enabled, in detached mode.
>>
>> The flink yaml contains the following properties:
>>
>> high-availability: zookeeper
>> high-availability.zookeeper.quorum: 
>> high-availability.zookeeper.storageDir: hdfs:///
>> high-availability.zookeeper.path.root: 
>> high-availability.zookeeper.path.namespace: 
>>
>> the job is deployed via flink CLI command like the following:
>>
>> "${FLINK_HOME}/bin/flink" run \
>> -m yarn-cluster \
>> -ynm "${JOB_NAME}-${JOB_VERSION}" \
>> -yn "${tm_containers}" \
>> -ys "${tm_slots}" \
>> -ytm "${tm_memory}" \
>> -yjm "${jm_memory}" \
>> -p "${parallelism}" \
>> -yqu "${queue}" \
>> -yt "${YARN_APP_PATH}" \
>> -c "${MAIN_CLASS}" \
>> -yst \
>> -yd \
>> ${class_path} \
>> "${YARN_APP_PATH}"/"${APP_JAR}"
>>
>>
>> After the job has been successfully deployed, I get this exception:
>>
>> 2018-10-26 18:29:17,781 | ERROR | Curator-Framework-0 |
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl
>> | Background exception was not retry-able or retry gave up
>> java.lang.InterruptedException
>> at java.lang.Object.wait(Native Method)
>> at java.lang.Object.wait(Object.java:502)
>> at
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
>> at
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1097)
>> at
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1130)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CreateBuilderImpl$7.performBackgroundOperation(CreateBuilderImpl.java:561)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:72)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:831)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>> at
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> If the job is deployed in "attached mode" everything goes fine.
>>
>>
>>
>>
>>
>> Kind Regards,
>> Mike Pryakhin
>>
>>
>


Re: Flink 1.6, User Interface response time

2018-11-08 Thread Dawid Wysakowicz
Hi Oleksandr,

Have you checked the jobmanager logs to see if there are any exceptions?
What is the response code of the request when the page doesn't load?
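
One way to tell UI slowness apart from backend slowness is to time the REST
endpoint the UI calls directly (host, port and job id are placeholders):

curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" "http://<jobmanager>:8081/jobs/<job-id>"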

Best,

Dawid

On 31/10/2018 16:49, Oleksandr Nitavskyi wrote:
>
> Hello!
>
>  
>
> We are migrating to the latest 1.6.2 version and all the jobs seem to
> work fine, but when we check individual jobs through the web interface
> we hit the issue that, after clicking on a job, it either takes
> too long to load the job's information or never loads at all.
>
>  
>
> Has anyone had this issue? I know that UI responsiveness is quite
> subjective, but has anybody else also noticed some degradation
> between Flink 1.4 and 1.6?
>
>  
>
> Thank you,
>
> Juan
>
>  
>


signature.asc
Description: OpenPGP digital signature