Re: Deprecated SplitStream class - what should be use instead.

2019-12-20 Thread Kostas Kloudas
Hi Krzysztof,

If I get it correctly, your main reason behind not using side-outputs
is that it seems that "side-output", by the name, seems to be a
"second class citizen"  compared to the main output.
I see your point but in terms of functionality, there is no difference
between the different outputs from Flink's perspective. Both create
DataStreams that are full integrated with Flink's fault-tolerant state
handling (if checkpointing is enabled) and event-time handling. So I
believe it is safe to use them for your usecase.

I hope this helps,
Kostas

On Thu, Dec 19, 2019 at 10:30 PM KristoffSC
 wrote:
>
> Kostas, thank you for your response,
>
> Well although the Side Outputs would do the job, I was just surprised that
> those are the replacements for stream splitting.
>
> The thing is, and this is might be only a subjective opinion, it that I
> would assume that Side Outputs should be used only to produce something
> aside of the main processing function like control messages or some
> leftovers.
>
> In my case, I wanted to simply split the stream into two new streams based
> on some condition.
> With side outputs I will have to "treat" the second stream as a something
> additional to the main processing result.
>
> Like it is written in the docs:
> "*In addition* to the main stream that results from DataStream
> operations(...)"
>
> or
> "The type of data in the result streams does not have to match the type of
> data in the *main *stream and the types of the different side outputs can
> also differ. "
>
>
> I'm my case I don't have any "addition" to my main stream and actually both
> spitted streams are equally important :)
>
> So by writing that side outputs are not good for my use case I meant that
> they are not fitting conceptually, at least in my opinion.
>
> Regards,
> Krzysztof
>
>
>
>
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: [DISCUSS] Drop vendor specific repositories from pom.xml

2019-12-20 Thread Robert Metzger
Okay, I understand. I'm okay with removing the profile.

On Thu, Dec 19, 2019 at 11:34 AM Till Rohrmann  wrote:

> The profiles make bumping ZooKeeper's version a bit more cumbersome. I
> would be interested for this reason to get rid of them, too.
>
> Cheers,
> Till
>
> On Wed, Dec 18, 2019 at 5:35 PM Robert Metzger 
> wrote:
>
>> I guess we are talking about this profile [1] in the pom.xml?
>>
>> +1 to remove.
>>
>> I'm not sure if we need to rush this for the 1.10 release. The profile is
>> not doing us any harm at the moment.
>>
>> [1]https://github.com/apache/flink/blob/master/pom.xml#L1035
>>
>> On Wed, Dec 18, 2019 at 4:51 PM Till Rohrmann 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> following the discussion started by Seth [1] I would like to discuss
>>> dropping the vendor specific repositories from Flink's parent pom.xml. As
>>> building Flink against a vendor specific Hadoop version is no longer
>>> needed
>>> (as it simply needs to be added to the classpath) and documented, I
>>> believe
>>> that the vendor specific repositories and the mapr profile have become
>>> obsolete. Moreover, users can still use vendor specific Hadoop versions
>>> if
>>> they configure their local maven to point to the respective repository
>>> [2].
>>> Flink's sources would simply no longer be shipped with this option.
>>>
>>> Are there any concerns about dropping the vendor specific repositories
>>> from
>>> pom.xml? I would like to make this change for the upcoming Flink 1.10
>>> release if possible.
>>>
>>> [1]
>>>
>>> https://lists.apache.org/thread.html/83afcf6c0d5d7a0a7179cbdac9593ebe7478b0dc548781bf9915a006%40%3Cdev.flink.apache.org%3E
>>> [2]
>>> https://maven.apache.org/guides/mini/guide-multiple-repositories.html
>>>
>>> Cheers,
>>> Till
>>>
>>


Re: Deprecated SplitStream class - what should be use instead.

2019-12-20 Thread KristoffSC
Hi Kostas,
Thank you for the answer and clarification. 

If Side-outputs are treated in the same way and there is no significant
performance penalty then it seems that they are ok for my use case.

I can accept the name mismatch ;)

Regards,
Krzysztof



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Flink task node shut it self off.

2019-12-20 Thread John Smith
Hi, using Flink 1.8.0

1st off I must say Flink resiliency is very impressive, we lost a node and
never lost one message by using checkpoints and Kafka. Thanks!

The cluster is a self hosted cluster and we use our own zookeeper cluster.
We have...
3 zookeepers: 4 cpu, 8GB (each)
3 job nodes: 4 cpu, 8GB (each)
3 task nodes: 4 cpu, 8GB (each)
The nodes also share GlusterFS for storing savepoints and checkpoints,
GlusterFS is running on the same machines.

Yesterday a node shut itself off we the following log messages...
- Stopping TaskExecutor akka.tcp://fl...@xxx.xxx.xxx.73
:34697/user/taskmanager_0.
- Stop job leader service.
- Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
- Shutting down TaskExecutorLocalStateStoresManager.
- Shutting down BLOB cache
- Shutting down BLOB cache
- removed file cache directory
/tmp/flink-dist-cache-4b60d79b-1cef-4ffb-8837-3a9c9a205000
- I/O manager removed spill file directory
/tmp/flink-io-c9d01b92-2809-4a55-8ab3-6920487da0ed
- Shutting down the network environment and its components.

Prior to the node shutting off we noticed massive IOWAIT of 140% and CPU
load 1minute of 15. And we also got an hs_err file which sais we should
increase the memory.

I'm attaching the logs here:
https://www.dropbox.com/sh/vp1ytpguimiayw7/AADviCPED47QEy_4rHsGI1Nya?dl=0

I wonder if my 5 second checkpointing is too much for gluster.

Any thoughts?


Taskmanagers in Docker Fail to Resolve Own Hostnames and Won't Accept Tasks

2019-12-20 Thread Martin, Nick J [US] (IS)
I'm running Flink 1.7.2 in a Docker swarm. Intermittently, new task managers 
will fail to resolve their own host names when starting up. In the log I see 
"no hostname could be resolved" messages coming from TaskManagerLocation. The 
webUI on the jobmanager shows the taskmanagers as are associated/connected with 
the jobmanager, but their akka paths show their IP, rather than the container 
name that 'good' taskmanager show. Those taskmanagers that are listed by IP 
give 'failed to connect' errors when new jobs are started that try to use those 
taskmanagers, and that job eventually fails. But the taskmanagers with this 
condition still give regular heartbeats to the Jobmanager, so the jobmanager 
keeps trying to assign work to them. Does anyone know what's going on here?


Re: Flink task node shut it self off.

2019-12-20 Thread jingjing bai
hi john

in our experience , the checkpoint interval we set interval 1-10 minute and
timeout usurally   5*interval . mostly we set 2 or 5 minute and 10 or
20timeout.
it depend on u data bulk per second and which window used.

John Smith  于2019年12月21日周六 上午5:26写道:

> Hi, using Flink 1.8.0
>
> 1st off I must say Flink resiliency is very impressive, we lost a node and
> never lost one message by using checkpoints and Kafka. Thanks!
>
> The cluster is a self hosted cluster and we use our own zookeeper cluster.
> We have...
> 3 zookeepers: 4 cpu, 8GB (each)
> 3 job nodes: 4 cpu, 8GB (each)
> 3 task nodes: 4 cpu, 8GB (each)
> The nodes also share GlusterFS for storing savepoints and checkpoints,
> GlusterFS is running on the same machines.
>
> Yesterday a node shut itself off we the following log messages...
> - Stopping TaskExecutor akka.tcp://fl...@xxx.xxx.xxx.73
> :34697/user/taskmanager_0.
> - Stop job leader service.
> - Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
> - Shutting down TaskExecutorLocalStateStoresManager.
> - Shutting down BLOB cache
> - Shutting down BLOB cache
> - removed file cache directory
> /tmp/flink-dist-cache-4b60d79b-1cef-4ffb-8837-3a9c9a205000
> - I/O manager removed spill file directory
> /tmp/flink-io-c9d01b92-2809-4a55-8ab3-6920487da0ed
> - Shutting down the network environment and its components.
>
> Prior to the node shutting off we noticed massive IOWAIT of 140% and CPU
> load 1minute of 15. And we also got an hs_err file which sais we should
> increase the memory.
>
> I'm attaching the logs here:
> https://www.dropbox.com/sh/vp1ytpguimiayw7/AADviCPED47QEy_4rHsGI1Nya?dl=0
>
> I wonder if my 5 second checkpointing is too much for gluster.
>
> Any thoughts?
>
>
>
>
>