Re: Error handling

2018-05-08 Thread Vishnu Viswanath
Was referring to the original email thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-handling-td3448.html On Tue, May 8, 2018 at 5:29 PM, vishnuviswanath < vishnu.viswanat...@gmail.com> wrote: > Hi, > > Wondering if any of these ideas were implemented after the

Re: Error handling

2018-05-08 Thread vishnuviswanath
Hi, Wondering if any of these ideas were implemented after the discussion? Thanks, Vishnu -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming and batch jobs together

2018-05-08 Thread Flavio Pompermaier
Thanks! Both solutions are reasonable, but what about max state size (per key)? Is there any suggested database/NoSQL store to use? On Tue, 8 May 2018, 18:09 TechnoMage, wrote: > If you use a KeyedStream you can group records by key (city) and then use > a RichFlatMap to

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
Hi, the Table API / SQL and the DataSet API can be used together in the same program. So you could read the data with a custom input format or a TextInputFormat and a custom MapFunction parser and hand it to SQL afterwards. The program would be a regular Scala DataSet program with an

Re: Lost JobManager

2018-05-08 Thread Fabian Hueske
Hi, I noticed that you configured the Akka framesize to 2GB (the default being 10MB). This appears like quite a lot to me and might be causing problems, since the exceptions indicate an Akka timeout issue. Did you configure the framesize that high for a particular reason? It seems that you are

Re: Processing Sorted Input Datasets

2018-05-08 Thread Fabian Hueske
Hi Helmut, In fact this is possible with the DataSet API. However, AFAIK it is an undocumented feature and probably not widely used. You can do this by specifying so-called SplitDataProperties on a DataSource as follows: DataSource src = env.createInput(...); SplitDataProperties splitProps =

Re: Streaming and batch jobs together

2018-05-08 Thread TechnoMage
If you use a KeyedStream you can group records by key (city) and then use a RichFlatMap to aggregate state in a MapState or ListState per key. You can then have that operator publish the updated results as a new aggregated record, or send them to a database or similar as you see fit. Michael > On
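The keyed-aggregation pattern Michael describes can be sketched in plain Java, with a HashMap standing in for Flink's MapState; the class and field names here are illustrative only, not Flink API:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the pattern from the thread: group records by key
// (e.g. city) and keep a running aggregate per key, emitting the updated
// value after each record. In Flink the map below would be MapState or
// ValueState inside a RichFlatMapFunction applied to a KeyedStream.
public class KeyedAggregator {
    private final Map<String, Long> countsByKey = new HashMap<>();

    // Process one record; returns the updated aggregate for that key,
    // mimicking the "publish the updated result" step from the thread.
    public long process(String key) {
        long updated = countsByKey.getOrDefault(key, 0L) + 1;
        countsByKey.put(key, updated);
        return updated;
    }
}
```

In a real job the per-key state would be managed by Flink's state backend, which also answers Flavio's follow-up about max state size: with RocksDB as the backend, state is spilled to disk rather than held entirely in heap.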

RE: Lost JobManager

2018-05-08 Thread Chan, Regina
There’s no collect() explicitly from me. It has a cogroup operator before writing to DataSink. From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Monday, May 07, 2018 6:31 AM To: Chan, Regina [Tech] Cc: user@flink.apache.org; Newport, Billy [Tech] Subject: Re: Lost JobManager Hi Regina, I

Fwd: Processing Sorted Input Datasets

2018-05-08 Thread Helmut Zechmann
Hi all, we want to use Flink batch to merge records from two or more datasets using groupBy. The input datasets are already sorted, since they have been written out sorted by some other job. Is it possible to

Re: jvm options for individual jobs / topologies

2018-05-08 Thread Kostas Kloudas
Hi Benjamin, I do not think you can set per job memory configuration in a shared cluster. The reason is that if different jobs share the same TM, there are resources that are shared between them, e.g, network buffers. If you are ok with having a separate cluster per job then this will allow you
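Since memory is configured per TaskManager rather than per job, the separate-cluster-per-job setup Kostas mentions means each cluster gets its own flink-conf.yaml. A hedged sketch; the key names are from the Flink 1.4/1.5 era and should be checked against your version's documentation:

```yaml
# flink-conf.yaml for a dedicated per-job cluster (Flink ~1.4/1.5 era keys;
# memory keys were renamed in later releases, so verify against your docs).
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 4096        # effectively the per-job Xmx
env.java.opts: "-XX:+UseG1GC"    # extra JVM options for all Flink processes
```

With one such cluster per job, each job's JVM heap is isolated, at the cost of duplicating the fixed overhead (network buffers, thread pools) per cluster.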

Re: Streaming and batch jobs together

2018-05-08 Thread Kostas Kloudas
Hi Flavio, If I understand correctly, you have a set of keys which evolves in two ways: keys may be added/deleted, and values associated with the keys can also be updated. If this is the case, you can use a streaming job that: 1. has as a source the stream of events
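The evolving keyed state Kostas describes (keys added/deleted, values updated, driven by an event stream) can be modeled with a plain-Java map keyed by entity id; the event kinds and names below are assumptions for illustration, not Flink API:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of evolving keyed state: an event stream adds, updates,
// or deletes entries keyed by entity id. In a Flink streaming job this map
// would be keyed state inside a ProcessFunction on a KeyedStream.
public class EntityStore {
    public enum Kind { ADD, UPDATE, DELETE }  // hypothetical event kinds

    private final Map<String, String> valuesById = new HashMap<>();

    public void apply(Kind kind, String id, String value) {
        switch (kind) {
            case ADD:
            case UPDATE:
                valuesById.put(id, value);  // upsert the latest value
                break;
            case DELETE:
                valuesById.remove(id);      // keys may also be deleted
                break;
        }
    }

    public String get(String id) { return valuesById.get(id); }
}
```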

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
Hi, Would it be better to use the DataSet API, the Table (relational) API and readCsvFile(), because it is a slightly higher-level implementation? SQL also sounds very good in this (batch processing) case, but is it possible to use it (given the many different types of csv-files)? And does it understand

Re: Taskmanager with multiple slots vs running multiple taskmanagers with 1 slot each

2018-05-08 Thread Kostas Kloudas
Hi Andre, I cannot speak on behalf of everyone but I would recommend 1 TM with multiple slots. This way you pay the “fixed costs” of running a TM (like allocating memory for network buffers, launching thread pools, exchanging heartbeat messages etc) only once. On the flip-side, this means that

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
Hi, the easiest approach is to read the CSV files linewise as regular text files (ExecutionEnvironment.readTextFile()) and apply custom parse logic in a MapFunction. Then you have all freedom to deal with records of different schema. Best, Fabian 2018-05-08 12:35 GMT+02:00 Esa Heikkinen
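Fabian's suggestion (read lines with readTextFile() and parse in a MapFunction) comes down to per-line parse logic like the following; the record type, field names, and semicolon delimiter are assumptions for illustration, not taken from the thread:

```java
// Sketch of the custom parse logic Fabian suggests: each line of the CSV
// file is parsed into a typed record. In Flink this parse method would sit
// inside a MapFunction<String, Record> applied after readTextFile(), which
// leaves full freedom to branch on different schemas per file.
public class CsvLineParser {
    // Hypothetical record type; fields chosen for illustration only.
    public static final class Record {
        public final String name;
        public final int value;
        Record(String name, int value) { this.name = name; this.value = value; }
    }

    // Split on a semicolon delimiter (an assumption; adjust per file schema).
    public static Record parse(String line) {
        String[] fields = line.split(";");
        return new Record(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }
}
```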

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
Hi, At this moment a batch query is ok. Do you know any good (Scala) examples of how to query batches (different types of csv-files) in parallel? Or do you have an example of a custom source function that reads csv-files in parallel? Best, Esa From: Fabian Hueske Sent: Monday,

Streaming and batch jobs together

2018-05-08 Thread Flavio Pompermaier
Hi all, I'd like to introduce in our pipeline an efficient way to aggregate incoming data around an entity. We have basically new incoming facts that are added (but also removed potentially) to an entity (by id). For example, when we receive a new name of a city we add this name to the known

jvm options for individual jobs / topologies

2018-05-08 Thread Benjamin Cuthbert
Hi all, How do I give different topologies / jobs different Xmx memory parameters? Regards

Re: Signal for End of Stream

2018-05-08 Thread Dhruv Kumar
Fabian, Thanks a lot for your continuous help! Really appreciate it. Sent from Phone. > On May 8, 2018, at 03:06, Fabian Hueske wrote: > > Hi Dhruv, > > The changes look good to me. > > Best, Fabian > > 2018-05-08 5:37 GMT+02:00 Dhruv Kumar : >>