Re: Flink upgrade to Flink-1.12

2021-01-25 Thread Ufuk Celebi
Thanks for reaching out. Semi-asynchronous does *not* refer to incremental checkpoints and Savepoints are always triggered as full snapshots (not incremental). Earlier versions of the RocksDb state backend supported two snapshotting modes, fully and semi-asynchronous snapshots.

Re: Flink Jobmanager HA deployment on k8s

2021-01-21 Thread Ufuk Celebi
@Yang: I think this would be valuable to document. I think it's a natural question to ask whether you can have standby JMs with Kubernetes. What do you think? If you agree, we could create a JIRA ticket and work on the "official" docs for this. On Thu, Jan 21, 2021, at 5:05 AM, Yang Wang

Re: Savepoint Location from Flink REST API

2020-04-02 Thread Ufuk Celebi
re > >> { >> "status": { >> "id": "completed" >> }, >> *"operation"*: { >> "failure-cause": { >> "class": "string", >> "stack-trace": "string"

Re: Deploying native Kubernetes via yaml files?

2020-03-24 Thread Ufuk Celebi
PS: See also https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes On Tue, Mar 24, 2020 at 2:49 PM Ufuk Celebi wrote: > Hey Niels, > > you can check out the README with example configuration files here: > https:

Re: Deploying native Kubernetes via yaml files?

2020-03-24 Thread Ufuk Celebi
Hey Niels, you can check out the README with example configuration files here: https://github.com/apache/flink/tree/master/flink-container/kubernetes Is that what you were looking for? Best, Ufuk On Tue, Mar 24, 2020 at 2:42 PM Niels Basjes wrote: > Hi, > > As clearly documented here >

Re: Savepoint Location from Flink REST API

2020-03-20 Thread Ufuk Celebi
Hey Aaron, you can expect one of the two responses for COMPLETED savepoints [1, 2]. 1. Success { "status": { "id": "completed" }, "savepoint": { "location": "string" } } 2. Failure { "status": { "id": "completed" }, "savepoint": { "failure-cause": {

Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-22 Thread Ufuk Celebi
Hey Stephan, +1. Reading over the linked ticket and your description here, I think it makes a lot of sense to go ahead with this. Since it's possible to upgrade via intermediate Flink releases as a fail-safe I don't have any concerns. – Ufuk On Fri, Feb 21, 2020 at 4:34 PM Till Rohrmann

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-19 Thread Ufuk Celebi
I'm late to the party... Welcome and congrats! :-) – Ufuk On Mon, Aug 19, 2019 at 9:26 AM Andrey Zagrebin wrote: > Hi Everybody! > > Thanks a lot for the warn welcome! > I am really happy about joining Flink committer team and hope to help the > project to grow more. > > Cheers, > Andrey > >

Re: Flink 1.8.1: Seeing {"errors":["Not found."]} when trying to access the Jobmanagers web interface

2019-08-06 Thread Ufuk Celebi
Hey Tobias, out of curiosity: were you using the job/application cluster (as documented here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/docker.html#flink-job-cluster )? – Ufuk On Tue, Aug 6, 2019 at 1:50 PM Kaymak, Tobias wrote: > I was using Apache Beam and

Re: No yarn option in self-built flink version

2019-06-12 Thread Ufuk Celebi
@Arnaud: Turns out those examples are on purpose. As Chesnay pointed out in the ticket, there are also cases where you don't necessarily want to bundle the Hadoop dependency, but still want to set a version like that. On Wed, Jun 12, 2019 at 9:32 AM Ufuk Celebi wrote: > I created ht

Re: No yarn option in self-built flink version

2019-06-12 Thread Ufuk Celebi
e above paragraph… > > > > Arnaud > > > > *De :* Ufuk Celebi > *Envoyé :* vendredi 7 juin 2019 12:00 > *À :* LINZ, Arnaud > *Cc :* user ; ches...@apache.org > *Objet :* Re: No yarn option in self-built flink version > > > > Hey Arnaud, > >

Re: No yarn option in self-built flink version

2019-06-07 Thread Ufuk Celebi
Hey Arnaud, I think you need to active the Hadoop profile via -Pinclude-hadoop (the default was changed to not include Hadoop as far as I know). For more details, check out:

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-29 Thread Ufuk Celebi
Actually, I couldn't even find a mention of this flag in the docs here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/queryable_state.html – Ufuk On Mon, Apr 29, 2019 at 8:45 AM Ufuk Celebi wrote: > I didn't find this as part of the > https://flink.apac

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-29 Thread Ufuk Celebi
I didn't find this as part of the https://flink.apache.org/news/2019/04/09/release-1.8.0.html notes. I think an update to the Important Changes section would be valuable for users upgrading to 1.8 from earlier releases. Also, logging that the library is on the classpath but the feature flag is

Re: Flink: How to use zookeeper when deployed in k8s?

2019-04-01 Thread Ufuk Celebi
You can set jobmanager.rpc.address: jmServiceName high-availability.jobmanager.port: 6123 in flink-conf.yaml and expose the port in the JobManager service. – Ufuk On Mon, Apr 1, 2019 at 9:29 AM sora wrote: > > Hello, > I encountered a problem when deploying flink to k8s. > When

Re: What are savepoint state manipulation support plans

2019-03-28 Thread Ufuk Celebi
7 > Let's move any further discussions there. > > Cheers, > Gordon > > On Thu, Mar 28, 2019 at 5:01 PM Ufuk Celebi wrote: >> >> I think such a tool would be really valuable to users. >> >> @Gordon: What do you think about creating an umbrella ticket for

Re: What are savepoint state manipulation support plans

2019-03-28 Thread Ufuk Celebi
I think such a tool would be really valuable to users. @Gordon: What do you think about creating an umbrella ticket for this and linking it in this thread? That way, it's easier to follow this effort. You could also link Bravo and Seth's tool in the ticket as starting points. – Ufuk

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-06 Thread Ufuk Celebi
I like Shaoxuan's idea to keep this a static site first. We could then iterate on this and make it a dynamic thing. Of course, if we have the resources in the community to quickly start with a dynamic site, I'm not apposed. – Ufuk On Wed, Mar 6, 2019 at 2:31 PM Robert Metzger wrote: > >

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-29 Thread Ufuk Celebi
> extends InputStream > with Seekable > with PositionedReadable { > > def read(): Int = underlying.read() > def seek(pos: Long): Unit = underlying.seek(pos) > def getPos: Long = underlying.getPos > } > ... > ``` > > Thanks for all your help,

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-28 Thread Ufuk Celebi
ng.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) >>>>> [2019-01-23 19:52:33.081904] at >>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) >>>>> [2019-01-23 19:52:33.081946] at >>>>> org.apac

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
On Wed, Jan 23, 2019 at 11:01 AM Timo Walther wrote: > I think what is more important than a big dist bundle is a helpful > "Downloads" page where users can easily find available filesystems, > connectors, metric repoters. Not everyone checks Maven central for > available JAR files. I just saw

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
I like the idea of a leaner binary distribution. At the same time I agree with Jamie that the current binary is quite convenient and connection speeds should not be that big of a deal. Since the binary distribution is one of the first entry points for users, I'd like to keep it as user-friendly as

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-21 Thread Ufuk Celebi
Hey Aaron, sorry for the late reply. (1) I think I was able to reproduce this issue using snappy-java. I've filed a ticket here: https://issues.apache.org/jira/browse/FLINK-11402. Can you check the ticket description whether it's in line with what you are experiencing? Most importantly, do you

Re: [DISCUSS] Dropping flink-storm?

2019-01-10 Thread Ufuk Celebi
+1 to drop. I totally agree with your reasoning. I like that we tried to keep it, but I don't think the maintenance overhead would be justified. – Ufuk On Wed, Jan 9, 2019 at 4:09 PM Till Rohrmann wrote: > > With https://issues.apache.org/jira/browse/FLINK-10571, we will remove the > Storm

Re: Per job cluster doesn't shut down after the job is canceled

2018-11-14 Thread Ufuk Celebi
Hey Paul, It might be related to this: https://github.com/apache/flink/pull/7004 (see linked issue for details). Best, Ufuk > On Nov 14, 2018, at 09:46, Paul Lam wrote: > > Hi Gary, > > Thanks for your reply and sorry for the delay. The attachment is the > jobmanager logs after invoking

Re: Where is the "Latest Savepoint" information saved?

2018-11-11 Thread Ufuk Celebi
see your savepoint there. Best, Ufuk On Sun, Nov 11, 2018 at 7:45 PM Hao Sun wrote: > > This is great, I will try option 3 and let you know. > Can I log some message so I know job is recovered from the latest savepoint? > > On Sun, Nov 11, 2018 at 10:42 AM Ufuk Celebi wrote

Re: java.io.IOException: NSS is already initialized

2018-11-11 Thread Ufuk Celebi
with the hadoop flavour. > > On Fri, Nov 9, 2018 at 2:02 PM Ufuk Celebi wrote: >> >> Hey Hao Sun, >> >> - Is this an intermittent failure or permanent? The logs indicate that >> some checkpoints completed before the error occurs (e.g. checkpoint >> numb

Re: Where is the "Latest Savepoint" information saved?

2018-11-11 Thread Ufuk Celebi
Hey Hao and Paul, 1) Fetch checkpoint info manually from ZK (problematic, not recommended) - As Paul pointed out, this is problematic as the node is a serialized pointer (StateHandle) to a CompletedCheckpoint in the HA storage directory and not a path [1]. - I would not recommend this approach at

Re: java.io.IOException: NSS is already initialized

2018-11-09 Thread Ufuk Celebi
Hey Hao Sun, - Is this an intermittent failure or permanent? The logs indicate that some checkpoints completed before the error occurs (e.g. checkpoint numbers are greater than 1). - Which Java versions are you using? And which Java image? I've Googled similar issues that seem to be related to

Re: Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-04 Thread Ufuk Celebi
Hey Hao Sun, this has been changed recently [1] in order to properly support failover in job cluster mode. A workaround for you would be to add an application identifier to the checkpoint path of each application, resulting in S3 paths like application-/00...00/chk-64. Is that a feasible

Re: Default zookeeper

2018-05-14 Thread Ufuk Celebi
No, there is no difference if the version in your distro is part of the ZooKeeper 3.4.x series. The script is there for convenience during local testing/dev. – Ufuk On Sun, May 13, 2018 at 3:49 PM, miki haiat wrote: > When downloading the the flink source in order to run it

Re: PartitionNotFoundException after deployment

2018-05-04 Thread Ufuk Celebi
Hey Gyula! I'm including Piotr and Nico (cc'd) who have worked on the network stack in the last releases. Registering the network structures including the intermediate results actually happens **before** any state is restored. I'm not sure why this reproducibly happens when you restore state.

Re: Beam quickstart

2018-04-25 Thread Ufuk Celebi
Hey Gyula, including Aljoscha (cc) here who is a committer at the Beam project. Did you also ask on the Beam mailing list? – Ufuk On Wed, Apr 25, 2018 at 3:32 PM, Gyula Fóra wrote: > Hey, > Is there somewhere an end to end guide how to run a simple beam-on-flink >

Re: Does web ui hang for 1.4 for job submission ?

2018-01-24 Thread Ufuk Celebi
It's not a stupid question at all! Try the following please: 1) Use something like Chrome's Developer Tools to check the responses you get from the web UI. If you see an error there, that should point you to what's going on. 2) Enable DEBUG logging for the JobManager and check the logs (if 1

Re: Back-pressure Status shows OK but records are backed up in kafka

2018-01-07 Thread Ufuk Celebi
Hey Jins, our current back pressure tracking mechanism does not work with Kafka sources. To gather back pressure indicators we sample the main task thread of a subtask. For most tasks, this is the thread that emits records downstream (e.g. if you have a map function) and everything works as

Re: How to stop FlinkKafkaConsumer and make job finished?

2017-12-29 Thread Ufuk Celebi
ream markers. > > https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.html#isEndOfStream-T- > > Eron > > On Wed, Dec 27, 2017 at 12:35 AM, Ufuk Celebi <u...@apache.org> wrote: >> >&

Re: RichAsyncFunction in scala

2017-12-28 Thread Ufuk Celebi
Hey Antoine, isn't it possible to use the Java RichAsyncFunction from Scala like this: class Test extends RichAsyncFunction[Int, Int] { override def open(parameters: Configuration): Unit = super.open(parameters) override def asyncInvoke(input: Int, resultFuture:

Re: keyby() issue

2017-12-28 Thread Ufuk Celebi
Hey Jinhua, On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo wrote: > The keyby() upon the field would generate unique key as the field > value, so if the number of the uniqueness is huge, flink would have > trouble both on cpu and memory. Is it considered in the design of >

Re: org.apache.zookeeper.ClientCnxn, Client session timed out

2017-12-28 Thread Ufuk Celebi
On Thu, Dec 28, 2017 at 12:11 AM, Hao Sun wrote: > Thanks! Great to know I do not have to worry duplicates inside Flink. > > One more question, why this happens? Because TM and JM both check leadership > in different interval? Yes, it's not deterministic how this happens.

Re: org.apache.zookeeper.ClientCnxn, Client session timed out

2017-12-27 Thread Ufuk Celebi
On Wed, Dec 27, 2017 at 4:41 PM, Hao Sun wrote: > Somehow TM detected JM leadership loss from ZK and self disconnected? > And couple of seconds later, JM failed to connect to ZK? > Yes, exactly as you describe. The TM noticed the loss of leadership before the JM did. >

Re: MergingWindow

2017-12-27 Thread Ufuk Celebi
Please check your email before sending it the next time as three emails for the same message is a little spammy ;-) This is internal code that is used to implement session windows as far as I can tell. The idea is to not merge the new window as it never had any state associated with it. The

Re: Apache Flink - broadcasting DataStream

2017-12-27 Thread Ufuk Celebi
Hey Mans! This refers to how sub tasks are connected to each other in your program. If you have a single sub task A1 and three sub tasks B1, B2, B3, broadcast will emit each incoming record at A1 to all B1, B2, B3: A1 --+-> B1 +-> B2 +-> B3 Does this help? On Mon, Dec 25, 2017 at

Re: flink yarn-cluster run job --files

2017-12-27 Thread Ufuk Celebi
The file URL needs to be accessible from all nodes, e.g. something like S3://... or hdfs://... >From the CLI: ``` Adds a URL to each user code classloader on all nodes in the cluster. The paths must specify a protocol (e.g. file://) and be accessible on all nodes (e.g. by means of a NFS share).

Re: Fetching TaskManager log failed

2017-12-27 Thread Ufuk Celebi
Thanks for reporting this issue. A few questions: - Which version of Flink are you using? - Does it work up to the point that the Exception is thrown? e.g. for smaller files it's OK? Let me pull in Chesnay (cc'd) who has worked on log fetching for the web runtime. – Ufuk On Tue, Dec 26, 2017

Re: How to stop FlinkKafkaConsumer and make job finished?

2017-12-27 Thread Ufuk Celebi
Hey Jaxon, I don't think it's possible to control this via the life-cycle methods of your functions. Note that Flink currently does not support graceful stop in a meaningful manner and you can only cancel running jobs. What comes to my mind to cancel on EOF: 1) Extend Kafka consumer to stop

Re: Flink network access control documentation

2017-12-27 Thread Ufuk Celebi
Hey Elias, thanks for opening a ticket (for reference: https://issues.apache.org/jira/browse/FLINK-8311). I fully agree with adding docs for this. I will try to write something down this week. Your point about JobManagers only coordinating via ZK is correct though. I had a look into the

Re: state.checkpoints.dir not configured

2017-12-21 Thread Ufuk Celebi
23 is already bind but it says that state.checkpoints.dir is > not set. > > Thanks > > > > On 19.12.2017 17:55, Ufuk Celebi wrote: >> >> When the JobManager/TaskManager are starting up they log what config >> they are loading. Look for lines like >> >> &q

Re: Add new slave to running cluster?

2017-12-20 Thread Ufuk Celebi
Hey Jinhua, - The `slaves` file is only relevant for the startup scripts. You can add as many task managers as you like by starting them manually. - You can check the logs of the JobManager or its web UI (jobmanager-host:8081) to see how many TMs have registered. - The registration time out looks

Re: state.checkpoints.dir not configured

2017-12-19 Thread Ufuk Celebi
When the JobManager/TaskManager are starting up they log what config they are loading. Look for lines like "Loading configuration property: {}, {}" Do you find the required configuration as part of these messages? – Ufuk On Tue, Dec 19, 2017 at 3:45 PM, Plamen Paskov

Re: consecutive stream aggregations

2017-12-18 Thread Ufuk Celebi
how to achieve this. > > > > On 15.12.2017 17:52, Ufuk Celebi wrote: >> >> You can first aggregate the length per user and emit it downstream. >> Then you do the all window and average all lengths. Does that make >> sense? >> >> On Fri, Dec 15, 2017 at 4:48 PM, Pl

Re: consecutive stream aggregations

2017-12-15 Thread Ufuk Celebi
solution in this case? > > Thanks > > > On 15.12.2017 15:46, Ufuk Celebi wrote: >> >> Hey Plamen, >> >> I think what you are looking for is the AggregateFunction. This you >> can use on keyed streams. The Javadoc [1] contains an example for your >>

Re: consecutive stream aggregations

2017-12-15 Thread Ufuk Celebi
Hey Plamen, I think what you are looking for is the AggregateFunction. This you can use on keyed streams. The Javadoc [1] contains an example for your use case (averaging). – Ufuk [1]

Re: Flink 1.4.0 keytab is unreadable

2017-12-15 Thread Ufuk Celebi
Hey 杨光, thanks for looking into this in such a detail. Unfortunately, I'm not sure what the expected behaviour is (whether the change in behaviour was accidental or on purpose). Let me pull in Gordon who has worked quite a bit on the Kerberos related components in Flink. @Gordon: 1) Do you know

Re: Python API not working

2017-12-15 Thread Ufuk Celebi
Hey Yassine, let me include Chesnay (cc'd) who worked on the Python API. I'm not familiar with the API and what it expects, but try entering `streaming` or `batch` for the mode. Chesnay probably has the details. – Ufuk On Fri, Dec 15, 2017 at 11:05 AM, Yassine MARZOUGUI

Re: docker-flink images and CI

2017-12-15 Thread Ufuk Celebi
I agree with Patrick's (cc'd) comment from the linked issue. What I understand from the linked issue is that Patrick will take care of the Docker image update for 1.4 manually. Is that ok with you Patrick? :-) Regarding the Flink release process question: I fully agree with the idea to integrate

Re: Flink State monitoring

2017-12-15 Thread Ufuk Celebi
Hey Liron, unfortunately, there are no built-in metrics related to state. In general, exposing the actual values as metrics is problematic, but exposing summary statistics would be a good idea. I'm not aware of a good work around at the moment that would work in the general case (taking into

Re: S3 Access in eu-central-1

2017-11-29 Thread Ufuk Celebi
Hey Dominik, yes, we should definitely add this to the docs. @Nico: You recently updated the Flink S3 setup docs. Would you mind adding these hints for eu-central-1 from Steve? I think that would be super helpful! Best, Ufuk On Tue, Nov 28, 2017 at 10:00 PM, Dominik Bruhn

Re: Issue with back pressure and AsyncFunction

2017-11-10 Thread Ufuk Celebi
Hey Ken, thanks for your message. Both your comments are correct (see inline). On Fri, Nov 10, 2017 at 10:31 PM, Ken Krugler wrote: > 1. A downstream function in the iteration was (significantly) increasing the > number of tuples - it would get one in, and sometimes

Re: Flink memory leak

2017-11-07 Thread Ufuk Celebi
itly define any state descriptor. We only use map and filters > operator. We thought that gc handle clearing the flink’s internal states. > So how can we manage the memory if it is always increasing? > > - Ebru > > On 7 Nov 2017, at 16:23, Ufuk Celebi <u...@apache.org> wrot

Re: MODERATE for d...@flink.apache.org

2017-11-07 Thread Ufuk Celebi
> Thank you for your reply. > > All the cluster Flink servers are able to access network drive, and it mapped > as drive Y in all nodes. > Do I need to provide more information? > > Thanks, > Jordan > > >> On 7 Nov 2017, at 6:36 PM, Ufuk Celebi <u...@apache.org&

Re: MODERATE for d...@flink.apache.org

2017-11-07 Thread Ufuk Celebi
As answered by David on SO, the files need to be accessible by all nodes. In your setup this seems not to be the case, therefore it won't work. You need a distributed file system (e.g. NFS or HDFS) or object store (e.g. S3) that is accessible from all nodes. – Ufuk On Tue, Nov 7, 2017 at 3:34

Re: Flink memory leak

2017-11-07 Thread Ufuk Celebi
Hey Ebru, let me pull in Aljoscha (CC'd) who might have an idea what's causing this. Since multiple jobs are running, it will be hard to understand to which job the state descriptors from the heap snapshot belong to. - Is it possible to isolate the problem and reproduce the behaviour with only a

Re: FlinkCEP behaviour with time constraints not as expected

2017-11-07 Thread Ufuk Celebi
Hey Frederico, let me pull in Dawid (cc'd) who works on CEP. He can probably clarify the expected behaviour here. Best, Ufuk On Mon, Nov 6, 2017 at 12:06 PM, Federico D'Ambrosio wrote: > Hi everyone, > > I wanted to ask if FlinkCEP in the following scenario is

Re: PartitionNotFoundException when running in yarn-session.

2017-10-12 Thread Ufuk Celebi
ld' MapReduce: The > missing task is simply resubmitted and ran again. > Why doesn't that happen? > > > Niels > > On Wed, Oct 11, 2017 at 8:49 AM, Ufuk Celebi <u...@apache.org> wrote: >> >> Hey Niels, >> >> any update on this? >> >> – Uf

Re: Flink 1.3.2 Netty Exception

2017-10-11 Thread Ufuk Celebi
@Chesnay: Recycling of network resources happens after the tasks go into state FINISHED. Since we are submitting new jobs in a local loop here it can easily happen that the new job is submitted before enough buffers are available again. At least, previously that was the case. I'm CC'ing Nico who

Re: PartitionNotFoundException when running in yarn-session.

2017-10-11 Thread Ufuk Celebi
Hey Niels, any update on this? – Ufuk On Mon, Oct 9, 2017 at 10:16 PM, Ufuk Celebi <u...@apache.org> wrote: > Hey Niels, > > thanks for the detailed report. I don't think that it is related to > the Hadoop or Scala version. I think the following happens: > > - Occasio

Re: PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Ufuk Celebi
Hey Niels, thanks for the detailed report. I don't think that it is related to the Hadoop or Scala version. I think the following happens: - Occasionally, one of your tasks seems to be extremely slow in registering its produced intermediate result (the data shuffled between TaskManagers) -

Re: question on sideoutput from ProcessWindow function

2017-09-23 Thread Ufuk Celebi
+1 Created an issue here: https://issues.apache.org/jira/browse/FLINK-7677 On Thu, Sep 14, 2017 at 11:51 AM, Aljoscha Krettek wrote: > Hi, > > Chen is correct! I think it would be nice, though, to also add that > functionality for ProcessWindowFunction and I think this

Re: Noisy org.apache.flink.configuration.GlobalConfiguration

2017-09-19 Thread Ufuk Celebi
I saw this too recently when using HadoopFileSystem for checkpoints (HDFS or S3). I thought I had opened an issue for this, but I didn't. Here it is: https://issues.apache.org/jira/browse/FLINK-7643 On Tue, Sep 19, 2017 at 1:28 PM, Till Rohrmann wrote: > Hi Elias, > >

Re: Flink flick cancel vs stop

2017-09-14 Thread Ufuk Celebi
Hey Elias, sorry for the delay here. No, stop is not deprecated but not fully implemented yet. One missing part is migration of the existing source functions as you say. Let me pull in Till for more details on this. @Till: Is there more missing than migrating the sources? Here is the PR and

Re: Flink 1.2.1 JobManager Election Deadlock

2017-09-07 Thread Ufuk Celebi
Thanks for looking into this and finding out that it is (probably) related to Curator. Very valuable information! On Thu, Sep 7, 2017 at 3:09 PM, Timo Walther wrote: > Thanks for informing us. As far as I know, we were not aware of any deadlock > in the JobManager election.

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-23 Thread Ufuk Celebi
Thanks James for sharing your experience. I find it very interesting :-) On Tue, Aug 22, 2017 at 9:50 PM, Hao Sun wrote: > Great suggestions, the etcd operator is very interesting, thanks James. > > > On Tue, Aug 22, 2017, 12:42 James Bucher wrote: >> >>

Re: Great number of jobs and numberOfBuffers

2017-08-17 Thread Ufuk Celebi
PS: Also pulling in Nico (CC'd) who is working on the network stack. On Thu, Aug 17, 2017 at 11:23 AM, Ufuk Celebi <u...@apache.org> wrote: > Hey Gwenhael, > > the network buffers are recycled automatically after a job terminates. > If this does not happen, it would be

Re: Great number of jobs and numberOfBuffers

2017-08-17 Thread Ufuk Celebi
Hey Gwenhael, the network buffers are recycled automatically after a job terminates. If this does not happen, it would be quite a major bug. To help debug this: - Which version of Flink are you using? - Does the job fail immediately after submission or later during execution? - Is the following

Re: Queryable State max number of clients

2017-08-14 Thread Ufuk Celebi
This is as Aljoscha describes. Each thread can handle many different clients at the same time. You shouldn't need to change the defaults in most cases. The network threads handle the TCP connections and dispatch query tasks to the query threads which do the actual querying of the state backend.

Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-07-28 Thread Ufuk Celebi
On Fri, Jul 28, 2017 at 4:03 PM, Stephan Ewen wrote: > Seems like no one raised a concern so far about dropping the savepoint > format compatibility for 1.1 in 1.4. > > Leaving this thread open for some more days, but from the sentiment, it > seems like we should go ahead? +1

Re: mirror links don't work

2017-07-04 Thread Ufuk Celebi
Thanks for reporting this. Did you find these pages by Googling for the Flink docs? They are definitely very outdated versions of Flink. On Tue, Jul 4, 2017 at 4:46 PM, AndreaKinn wrote: > I found it clicking on "download flink for hadoop 1.2" button: >

Re: External checkpoints not getting cleaned up/discarded - potentially causing high load

2017-07-03 Thread Ufuk Celebi
On Mon, Jul 3, 2017 at 12:02 PM, Stefan Richter wrote: > Another thing that could be really helpful, if possible, can you attach a > profiler/sampling to your job manager and figure out the hotspot methods > where most time is spend? This would be very helpful as a

Re: Flink 1.2.0 - Flink Job restart config not using Flink Cluster restart config.

2017-06-29 Thread Ufuk Celebi
On Thu, Jun 29, 2017 at 8:59 AM, Ufuk Celebi <u...@apache.org> wrote: > @Vera: As a work around you could enable checkpointing and afterwards > explicitly disable restarts via > ExecutionConfig.setRestartStrategy(null). Then the cluster default > should be picked up. @Vera: S

Re: Flink 1.2.0 - Flink Job restart config not using Flink Cluster restart config.

2017-06-29 Thread Ufuk Celebi
Hey Vera and Gordon, I agree that this behaviour is confusing. If we want to split hairs here, we wouldn't call it a bug, because the restart strategy docs say that "Default restart strategy to use in case no restart strategy has been specified for the job". The confusing part is that enabling

Re: UnknownKvStateKeyGroupLocation

2017-05-16 Thread Ufuk Celebi
Hey Joe! This sounds odd... are there any failures (JobManager or TaskManager) or leader elections being reported? You should see such events in the JobManager/TaskManager logs. On Tue, May 16, 2017 at 2:28 PM, Joe Olson wrote: > When running Flink in high availability mode,

Re: Queryable State

2017-05-04 Thread Ufuk Celebi
the UI to make sure all the task > managers are present, and that the job is running on all of them, but is > there some verbiage in the logs that indicates the job manager can talk to > all the task managers, and vice versa? > > Thanks! > > > 02.05.2017, 06:03, "Ufuk C

Re: Multiple consumers on a subpartition

2017-04-26 Thread Ufuk Celebi
Adding to what Zhijiang said: I think the way to go would be to create multiple "read views" over the pipelined subpartition. You would have to make sure that the initial reference count of the partition buffers is incremented accordingly. The producer will be back pressured by both consumers now.

Re: in-memory optimization

2017-04-24 Thread Ufuk Celebi
Loop invariant data should be kept in Flink's managed memory in serialized form (in a custom hash table). That means that they are not read back again from the CSV file, but they are kept in serialized form and need be deserialized again on access. CC'ing Fabian to double check... On Mon, Apr

Re: Flink on Ignite - Collocation?

2017-04-24 Thread Ufuk Celebi
Hey Matt, in general, Flink doesn't put too much work in co-locating sources (doesn't happen for Kafka, etc. either). I think the only local assignments happen in the DataSet API for files in HDFS. Often this is of limited help anyways. Your approach sounds like it could work, but I would

Re: Disk I/O in Flink

2017-04-24 Thread Ufuk Celebi
Hey Robert, for batch that should cover the relevant spilling code. If the records are >= 5 MB, the SpillingAdaptiveSpanningRecordDeserializer will spill incoming records as well. But that should be covered by the FileChannel instrumentation as well? – Ufuk On Tue, Apr 18, 2017 at 3:57 PM,

Re: Queryable State

2017-04-24 Thread Ufuk Celebi
You should be able to use queryable state w/o any changes to the default config. The `query.server.port` option defines the port of the queryable state server that serves the state on the task managers and it is enabled by default. The important thing is to configure the client to discover the

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Ufuk Celebi
@Yessine: no, there is no way to disable the back pressure mechanism. Do you have more details about the two last operators? What do you mean with the process function is slow on purpose? @Rune: with 1.3 Flink will configure the internal buffers in a way that not too much data is buffered in the

Re: Monitoring REST API and YARN session

2017-03-31 Thread Ufuk Celebi
In this case they are proxied through YARN, you can check the list auf running applications and click on the Flink app master UI link. Then you have the host and port for the REST calls. Does this work? On Fri, Mar 31, 2017 at 1:51 AM, Mohammad Kargar wrote: > How can I access

Re: Async Functions and Scala async-client for mySql/MariaDB database connection

2017-03-31 Thread Ufuk Celebi
I'm not too familiar with what's happening here, but maybe Klou (cc'd) can help? On Thu, Mar 30, 2017 at 6:50 PM, Andrea Spina wrote: > Dear Flink community, > > I started to use Async Functions in Scala, Flink 1.2.0, in order to retrieve > enriching information from

Re: flink one transformation end,the next transformation start

2017-03-31 Thread Ufuk Celebi
What is the error message/stack trace you get here? On Thu, Mar 30, 2017 at 9:33 AM, wrote: > hi,all, > i run a job,it is : > - > val data = env.readTextFile("hdfs:///")//DataSet[(String,Array[String])] > val dataVec

Re: ClassNotFoundException upon Savepoint Disposal

2017-03-29 Thread Ufuk Celebi
Thanks for providing the logs. It looks like the JARs are uploaded (modulo any bugs ;-))... could you double check that the class is actually part of the JAR and has not been moved around via jar -tf | grep EventDataRecord If everything looks good in the JAR, I could write a short tool that

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Ufuk Celebi
Hey Abhinav, the Exception is thrown if the S3 object does not exist. Can you double check that it actually does exist (no typos, etc.)? Could this be related to accessing a different region than expected? – Ufuk On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther wrote: > Hi

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Ufuk Celebi
@Gordon: What's your take on integrating this directly into the consumer? Can't we poll the latest offset wie the Offset API [1] and report a consumer lag metric for the consumer group of the application? This we could also display in the web frontend. In the first version, users would have to

Re: Return of Flink shading problems in 1.2.0

2017-03-17 Thread Ufuk Celebi
Pulling in Robert and Stephan who know the project's shading setup the best. On Fri, Mar 17, 2017 at 6:52 AM, Foster, Craig wrote: > Hi: > > A few months ago, I was building Flink and ran into shading issues for > flink-dist as described in your docs. We resolved this in

Re: Questions regarding queryable state

2017-03-16 Thread Ufuk Celebi
On Thu, Mar 16, 2017 at 10:00 AM, Kathleen Sharp wrote: > Hi, > > I have some questions regarding the Queryable State feature: > > Is it possible to use the QueryClient to get a list of keys for a given State? No, this is not possible at the moment. You would have to

Re: Remove Accumulators at runtime

2017-03-09 Thread Ufuk Celebi
I see, this is not possible with accumulators. You could wrap all counts in a single metric and update that one. Check out Flink's metrics: https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html On Wed, Mar 8, 2017 at 5:04 PM, PedroMrChaves

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Ufuk Celebi
com> wrote: > I’m not using YARN but instead of starting the cluster using > bin/start-cluster.sh > >> On 8 Mar 2017, at 15:32, Ufuk Celebi <u...@apache.org> wrote: >> >> On Wed, Mar 8, 2017 at 3:19 PM, Dominik Safaric >> <dominiksafa...@gmail.com> wrote:

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Ufuk Celebi
On Wed, Mar 8, 2017 at 3:19 PM, Dominik Safaric wrote: > The cluster consists of 4 workers and a master node. Are you starting the cluster via bin/start-cluster.sh or are you using YARN etc.?

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Ufuk Celebi
How are you deploying your job? Shutdown hooks are executed when the JVM terminates whereas the cancel command only cancels the Flink job and the JVM process potentially keeps running. For example, running a standalone cluster would keep the JVMs running. On Wed, Mar 8, 2017 at 9:36 AM, Timo

  1   2   3   4   5   >