Re: Latency tracking together with broadcast state can cause job failure

2020-06-16 Thread Arvid Heise
Hi Lasse,

The issue you reported [1] will be fixed in the next 1.10 release and in the
upcoming 1.11 release.
Thank you for your detailed report.

[1] https://issues.apache.org/jira/browse/FLINK-17322

-- 

Arvid Heise | Senior Java Developer



Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Lasse Nedergaard
Hi Yun

Thanks for looking into it and forwarding it to the right place.

Best regards
Lasse Nedergaard


Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Yun Tang
Hi Lasse

After debugging locally, I believe this is a bug in Flink (present even in the
latest version). However, the bug appears to lie in the network stack, which I
am not very familiar with, so the root cause is not easy to find directly.
After discussing it with our network experts in Flink, we decided to first
create FLINK-17322 [1] to track this problem, and the relevant owner will take
a look at it.

Thank you very much for reporting this bug.

[1] https://issues.apache.org/jira/browse/FLINK-17322

Best
Yun Tang


Re: Latency tracking together with broadcast state can cause job failure

2020-04-21 Thread Yun Tang
Hi Lasse

Really sorry for missing your reply. I'll run your project and look for the
root cause during my daytime. And thanks to @Robert Metzger for the kind
reminder.

Best
Yun Tang


Re: Latency tracking together with broadcast state can cause job failure

2020-04-21 Thread Robert Metzger
Hey Lasse,
has the problem been resolved?

(I'm also responding to this to make sure the thread gets attention again
:) )

Best,
Robert


Re: Latency tracking together with broadcast state can cause job failure

2020-04-01 Thread Lasse Nedergaard
Hi

I have attached a simple project with a test that reproduces the problem. The
usual failure is a mixed-up string, but you can also get an EOF exception.
Please let me know if you have any questions about the solution.

Best regards
Lasse Nedergaard

Telematics2-feature-flink-1.10-latency-tracking-broken
Description: Zip archive


Re: Latency tracking together with broadcast state can cause job failure

2020-04-01 Thread Yun Tang
Hi Lasse

I have never encountered this problem before. Could you share an exception
stack trace so that we can take a look? A simple project that reproduces the
problem would also be helpful.

Best
Yun Tang


Latency tracking together with broadcast state can cause job failure

2020-03-31 Thread Lasse Nedergaard
Hi

In both Flink 1.9.2 and 1.10 we have struggled with random deserialization and
index-out-of-range exceptions in one of our jobs. We also get out-of-memory
exceptions.
We have now identified latency tracking together with broadcast state as the
cause of the problem. When we run integration tests locally we don't see any
problem; the job only fails when running on the cluster.
We have concluded that the latency tracking packets sent over broadcast corrupt
the data stream and cause the exceptions.
We are preparing a simple project on GitHub to reproduce the problem so that
the underlying problem can be solved.

Has anyone else seen this kind of problem?

Best regards
Lasse Nedergaard
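
Note for readers who hit the failure described above before they can upgrade
to a release containing the fix for FLINK-17322: the following is a minimal
sketch of a possible workaround, assuming the corruption only occurs while
latency tracking is enabled, as reported in this thread. The key
`metrics.latency.interval` is Flink's configuration option for the latency
marker interval, and a value of 0 disables latency tracking. Nobody in this
thread confirmed this as a workaround, so treat it as an assumption to verify
on your own cluster.

```yaml
# flink-conf.yaml
# Assumed workaround (not confirmed in this thread): switch latency tracking
# off so that no latency markers are sent over the broadcast connection.
metrics.latency.interval: 0
```

A job that sets a positive interval itself through
`ExecutionConfig#setLatencyTrackingInterval` would presumably still emit
latency markers, so that call would need to be removed as well.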