Re: Signal for End of Stream

2018-05-07 Thread Dhruv Kumar
Thanks a lot, Fabian for your response. What I understand is that if I write my own Sourcefunction such that it handles the "end of stream” record and make the source exit from run() method, the flink program will terminate. I have been using SocketTextStreamFunction till now. So, I

Taskmanager with multiple slots vs running multiple taskmanagers with 1 slot each

2018-05-07 Thread au.fp2018
Hi All, As the subject indicates, what is the general consensus and best practices around running 1 taskmanger with multiple slots vs running multiple taskmanager each with 1 slot? What is your reason for you picking one over the other? Thanks, Andre -- Sent from:

Re: Standalone HA Cluster using Shared Zookeeper

2018-05-07 Thread au.fp2018
Thanks Fabian After I understood the purpose of cluster-id configuration, I did end up changing the default to something more specific to my cluster. Fabian Hueske-2 wrote > Hi Andre, > > Sharing a Zookeeper cluster between Kafka and Flink should be OK. > > If you're running just one cluster,

Re: FLIP-6 Docker / Kubernetes

2018-05-07 Thread anaray
Thank You Fabian. We are looking forward for this feature, especially a way to bundle the job jar along with the container (even in taskmanagers). In production we deploy flink (1 job=1 cluster) in Docker Swarm. One of the main issue we face is related to blob download when the taskmanager fails

Re: Apache Flink - Flink Forward SF 2018 - Scaling stream data pipelines (source code)

2018-05-07 Thread M Singh
Thanks Folks for your pointers. Mans On Sunday, May 6, 2018, 10:49:12 PM PDT, Till Rohrmann wrote: Hi, you can find the Flink code here [1] and the Pravega connector code here [2]. Let me know if you run into any problems. [1] https://github.com/

Re: pre-initializing global window state

2018-05-07 Thread jelmer
Hi Ken > 1. I would first try using RockDB with incremental checkpointing , before deciding that an alternative approach is required. That would reduce the size of the checkpoints but as far as I know not the

Re: Signal for End of Stream

2018-05-07 Thread Fabian Hueske
Hi, Flink will automatically stop the execution of a DataStream program once all sources have finished to provide data, i.e., when all SourceFunction return from the run() method. The DeserializationSchema.isEndOfStream() method can be used to tell a built-in SourceFunction such as a

Re: pre-initializing global window state

2018-05-07 Thread Ken Krugler
Hi Jelmer, Three comments, if I understand your use case correctly… 1. I would first try using RockDB with incremental checkpointing , before deciding that an alternative approach is required. 2. Have you considered

Re: Signal for End of Stream

2018-05-07 Thread Dhruv Kumar
I notice that there is some DeserializationSchema in org.apache.flink.api.common.serialization which has a function isEndOfStream but I am not sure if I can use it in my use case. -- Dhruv Kumar PhD Candidate Department of Computer Science and

Re: Dynamically deleting kafka topics does not remove partitions from kafkaConsumer

2018-05-07 Thread Edward Alexander Rojas Clavijo
Hello, I've being working on a fix for this, I posted more details on the JIRA ticket. Regards, Edward 2018-05-07 5:51 GMT+02:00 Tzu-Li (Gordon) Tai : > Ah, correct, sorry for the incorrect link. > Thanks Ted! > > > On 7 May 2018 at 11:43:12 AM, Ted Yu (yuzhih...@gmail.com)

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
Yes, I am aware that to restart the jobs Flink won't require the jars. But would have been awesome if it could have retained those. Thanks all for the help. Regards, - Rohil On Mon 7 May, 2018, 5:32 PM Sampath Bhat, wrote: > Hi Rohil > > You need not upload the jar

Re: Reading csv-files in parallel

2018-05-07 Thread Fabian Hueske
Hi Esa, you can certainly read CSV files in parallel. This works very well in a batch query. For streaming queries, that expect data to be ingested in timestamp order this is much more challenging, because you need 1) read the files in the right order and 2) cannot split files (unless you

Re: Standalone HA Cluster using Shared Zookeeper

2018-05-07 Thread Fabian Hueske
Hi Andre, Sharing a Zookeeper cluster between Kafka and Flink should be OK. If you're running just one cluster, you could in principle keep the default. However, I'd change the configuration just in case. Otherwise, you might get into trouble when you (accidentally) run another Flink setup.

Re: strange behavior with jobmanager.rpc.address on standalone HA cluster

2018-05-07 Thread Fabian Hueske
Hi Derek, 1. I've created a JIRA issue to improve the docs as you recommended [1]. 2. This discussion goes quite a bit into the internals of the HA setup. Let me pull in Till (in CC) who knows the details of HA. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-9309 2018-05-05

pre-initializing global window state

2018-05-07 Thread jelmer
Hi I am looking for some advice on how to solve the following problem I'd like to keep track of the all time last n events received for a user. An event on average takes up 500 bytes and here will be ten's of millions of users for which we need to keep this information. The list of events will be

Re: YARN per-job cluster reserves all of remaining memory in YARN

2018-05-07 Thread Fabian Hueske
Hi Dongwon, I see that you are using the latest master (Flink 1.6-SNAPSHOT). This is a known problem in the new FLIP-6 mode. The ResourceManager tries to allocate too many resources, basically on TM per required slot, i.e., it does not take the number of slots per TM into account. The resources

Re: Why FoldFunction deprecated?

2018-05-07 Thread Fabian Hueske
Hi, FoldFunction was deprecated because it doesn't support partial aggregation. AggregateFunction is much more expressive, however requires a bit more implementation effort. In favor of a concise API, FoldFunction was deprecated because it doesn't offer more functionality than AggregateFunction.

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Sampath Bhat
Hi Rohil You need not upload the jar again when job manager restarts in an HA environment. Only the the jar stored in web.upload.dir will be deleted which is fine. The jars needed for the job manager to restart will be stored in high-availability.storageDir along with job graphs and job related

Re: This server is not the leader for that topic-partition

2018-05-07 Thread Alexander Smirnov
thank you Piotr On Mon, May 7, 2018 at 2:59 PM Piotr Nowojski wrote: > Hi, > > Regardless if that will fix the problem or not, please consider upgrading > to Kafka 0.11.0.2 or 1.0.1. Kafka 0.11.0 release was quite messy and it > might be that the bug you have hit was

Re: Init RocksDB state backend during startup

2018-05-07 Thread Fabian Hueske
Hi Peter, State initialization with with historic data is a use case that's coming up more and more. Unfortunately, there's no good solution for this yet but just a couple of workaround that require careful design and work for all cases. There was a talk about exactly this problem and some ideas

Re: This server is not the leader for that topic-partition

2018-05-07 Thread Piotr Nowojski
Hi, Regardless if that will fix the problem or not, please consider upgrading to Kafka 0.11.0.2 or 1.0.1. Kafka 0.11.0 release was quite messy and it might be that the bug you have hit was fixed in 0.11.0.2. As a side note, as far as we know our FlinkKafkaProducer011 works fine with Kafka

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
Ok. but why was this decision taken to automatically delete and not retain the jars, to me it makes sense to have the uploaded jars so user doesn't have to do it when JobManager restarts. Thanks. - Rohil On Mon, May 7, 2018 at 12:16 PM, Chesnay Schepler wrote: > The jar

Re: Assign JIRA issue permission

2018-05-07 Thread Sampath Bhat
Thank you for your reply. On Mon, May 7, 2018 at 9:02 AM, Tzu-Li (Gordon) Tai wrote: > Hi Sampath, > > Do you already have a target JIRA that you would like to work on? > > Once you have one, let us know the JIRA issue ID and your JIRA account ID, > then we'll assign you

Signal for End of Stream

2018-05-07 Thread Dhruv Kumar
Hi Is there a way I can capture the end of stream signal for streams which are replayed from historical data? I need the end of stream signal to tell the Flink program to finish its execution. Below is the use case in detail: 1. An independent log replayer program sends the records to a socket

Reading csv-files in parallel

2018-05-07 Thread Esa Heikkinen
Hi I would want to read many different type csv-files (time series data) parallel using by CsvTableSource. Is that possible in Flink application ? If yes, are there exist the examples about that ? If it is not, do you have any advices how to do that ? Should I combine all csv-files to one

Re: FLIP-6 Docker / Kubernetes

2018-05-07 Thread Fabian Hueske
Hi, Most, but not all, of the FLIP-6 features will be released with Flink 1.5.0. I'm not sure if this deployment mode will be fully supported in 1.5.0. Gary (in CC) might know details here. Anyway, the deployment would work by starting the image using regular Docker/Kubernetes tools. The image

Re: Lost JobManager

2018-05-07 Thread Fabian Hueske
Hi Regina, I see from the logs that you are using the DataSet API. Are you trying to fetch a large result to your client using the collect() method? Best, Fabian 2018-05-02 0:38 GMT+02:00 Chan, Regina : > Hi, > > > > I’m running a single TM with the following params -yn 1

Re: This server is not the leader for that topic-partition

2018-05-07 Thread Alexander Smirnov
Hi Piotr, using 0.11.0 Kafka version On Sat, May 5, 2018 at 10:19 AM Piotr Nowojski wrote: > FlinkKafka011Producer uses Kafka 0.11.0.2. > > However I’m not sure if bumping KafkaProducer version solves this issue or > upgrading Kafka. What Kafka version are you using? >

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
Hey Chirag, I tried adding both the configs as per the documentation, and I can see the jars getting uploaded to the specified paths, but on JobManager restarts the JARS are actually *deleted* *from* the `jobmanager.web.upload.dir` path. Anything else that I am missing? Thanks. - Rohil On Mon,

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Chirag Dewan
I think you are looking for jobmanager.web.tmpdir along with upload.dir  >From the documentation : - jobmanager.web.tmpdir: This configuration parameter allows defining the Flink web directory to be used by the web interface. The web interface will copy its static files into the