Was referring to the original email thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-handling-td3448.html
On Tue, May 8, 2018 at 5:29 PM, vishnuviswanath <
vishnu.viswanat...@gmail.com> wrote:
> Hi,
>
> Wondering if any of these ideas were implemented after the
Hi,
Wondering if any of these ideas were implemented after the discussion?
Thanks,
Vishnu
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Thanks! Both solutions are reasonable, but what about max state size (per
key)? Is there any suggested database/NoSQL store to use?
On Tue, 8 May 2018, 18:09 TechnoMage wrote:
> If you use a KeyedStream you can group records by key (city) and then use
> a RichFlatMap to
Hi,
the Table API / SQL and the DataSet API can be used together in the same
program.
So you could read the data with a custom input format or a TextInputFormat
and a custom MapFunction parser and hand it to SQL afterwards.
The program would be a regular Scala DataSet program with an
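As a hedged Java sketch of that approach (the file path, table name, and record type are made up; Table API names are from the Flink 1.5 era):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class CsvToSql {

  // Illustrative POJO; public fields make it usable as a table schema.
  public static class CityRecord {
    public String city;
    public long population;
    public CityRecord() {}
    public CityRecord(String city, long population) {
      this.city = city;
      this.population = population;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

    // Read the data linewise and parse each line with a custom MapFunction.
    DataSet<CityRecord> records = env
        .readTextFile("file:///path/to/input")
        .map(new MapFunction<String, CityRecord>() {
          @Override
          public CityRecord map(String line) {
            String[] f = line.split(",");
            return new CityRecord(f[0], Long.parseLong(f[1]));
          }
        });

    // Hand the parsed DataSet to SQL afterwards.
    tEnv.registerDataSet("records", records);
    Table result = tEnv.sqlQuery(
        "SELECT city, SUM(population) FROM records GROUP BY city");
    tEnv.toDataSet(result, Row.class).print();
  }
}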
Hi,
I noticed that you configured the Akka framesize to 2GB (the default being
10MB).
That seems like quite a lot to me and might be causing problems, since the
exceptions indicate an Akka timeout issue.
Did you configure the framesize that high for a particular reason?
It seems that you are
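For reference, the framesize is set in flink-conf.yaml; the value below is the 10 MB default, written in bytes:

akka.framesize: 10485760b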
Hi Helmut,
In fact this is possible with the DataSet API. However, AFAIK it is an
undocumented feature and probably not widely used.
You can do this by specifying so-called SplitDataProperties on a DataSource
as follows:
DataSource src = env.createInput(...);
SplitDataProperties splitProps = src.getSplitDataProperties();
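From there, a hedged sketch of how the properties could be declared for input that is already hash-partitioned and sorted on field 0 (splitsPartitionedBy and splitsOrderedBy are methods of SplitDataProperties; Order is org.apache.flink.api.common.operators.Order):

// Tell the optimizer that each input split is partitioned on field 0
// and sorted ascending on field 0, so the shuffle and sort before a
// groupBy on that field can be skipped.
splitProps.splitsPartitionedBy(0);
splitProps.splitsOrderedBy(new int[]{0}, new Order[]{Order.ASCENDING});

Be careful: Flink trusts these hints and skips the corresponding shuffles and sorts, so if the declared properties do not match the actual data layout, the results will be incorrect.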
If you use a KeyedStream you can group records by key (city) and then use a
RichFlatMap to aggregate state in a MapState or ListState per key. You can
then have that operator publish the updated results as a new aggregated record,
or send them to a database, as you see fit.
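A minimal Java sketch of that pattern using ListState (the tuple schema is illustrative: (cityId, name) in, (cityId, all known names) out):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Input: (cityId, newName). Output: (cityId, all names known so far for that city).
public class CityNameAggregator
    extends RichFlatMapFunction<Tuple2<String, String>, Tuple2<String, List<String>>> {

  private transient ListState<String> names;  // scoped to the current key by the KeyedStream

  @Override
  public void open(Configuration parameters) {
    names = getRuntimeContext().getListState(
        new ListStateDescriptor<>("city-names", String.class));
  }

  @Override
  public void flatMap(Tuple2<String, String> event,
                      Collector<Tuple2<String, List<String>>> out) throws Exception {
    names.add(event.f1);
    List<String> all = new ArrayList<>();
    for (String name : names.get()) {
      all.add(name);
    }
    // Publish the updated aggregate; a downstream sink could write it to a database.
    out.collect(Tuple2.of(event.f0, all));
  }
}

// usage: events.keyBy(0).flatMap(new CityNameAggregator())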
Michael
> On
There’s no explicit collect() from me. The job has a coGroup operator before
writing to the DataSink.
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Monday, May 07, 2018 6:31 AM
To: Chan, Regina [Tech]
Cc: user@flink.apache.org; Newport, Billy [Tech]
Subject: Re: Lost JobManager
Hi Regina,
I
Hi all,
we want to use Flink batch to merge records from two or more datasets using
groupBy.
The input datasets are already sorted, since they have been written out sorted
by some other job.
Is it possible to
Helmut Zechmann
helmut.zechm...@mailbox.org
www.helmutzechmann.com
0151 27527950
Hi Benjamin,
I do not think you can set per-job memory configuration in a shared cluster.
The reason is that if different jobs share the same TM, there are resources
that are shared between them, e.g., network buffers.
If you are ok with having a separate cluster per job, then this will allow you
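A hedged example of the separate-cluster-per-job route with Flink on YARN (flag names as of Flink 1.4/1.5; jar paths and sizes are made up):

# one per-job cluster per submission, each with its own memory settings
flink run -m yarn-cluster -yn 4 -yjm 2048 -ytm 4096 /path/to/jobA.jar
flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 8192 /path/to/jobB.jar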
Hi Flavio,
If I understand correctly, you have a set of keys which evolves in two ways:
keys may be added/deleted, and
the values associated with the keys can also be updated.
If this is the case, you can use a streaming job that:
1. has as a source the stream of events
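As one plausible shape of such a job (the event type, field names, and operator are illustrative, not the author's exact steps), a keyed RichFlatMap could apply each event to per-key state:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Illustrative event type: op is "UPSERT" (add/update a key) or "DELETE" (remove it).
public class KeyEvent {
  public String key;
  public String value;
  public String op;
}

public class KeySetMaintainer extends RichFlatMapFunction<KeyEvent, KeyEvent> {

  private transient ValueState<String> current;  // one entry per key

  @Override
  public void open(Configuration parameters) {
    current = getRuntimeContext().getState(
        new ValueStateDescriptor<>("current-value", String.class));
  }

  @Override
  public void flatMap(KeyEvent e, Collector<KeyEvent> out) throws Exception {
    if ("DELETE".equals(e.op)) {
      current.clear();           // the key is dropped from the maintained set
    } else {
      current.update(e.value);   // adds a new key or updates its value
    }
    out.collect(e);              // forward the change downstream
  }
}

// usage: events.keyBy(e -> e.key).flatMap(new KeySetMaintainer())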
Hi
Would it be better to use the DataSet API, the Table (relational) API, and
readCsvFile(), because they are a slightly higher-level implementation?
SQL also sounds very good in this (batch processing) case, but is it possible
to use it here (given the many different types of CSV files)?
And does it understand
Hi Andre,
I cannot speak on behalf of everyone but I would recommend 1 TM with multiple
slots.
This way you pay the “fixed costs” of running a TM (like allocating memory for
network buffers,
launching thread pools, exchanging heartbeat messages etc) only once.
On the flip-side, this means that
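For reference, the slot count per TM is set in flink-conf.yaml, e.g.:

taskmanager.numberOfTaskSlots: 4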
Hi,
the easiest approach is to read the CSV files linewise as regular text
files (ExecutionEnvironment.readTextFile()) and apply custom parse logic in
a MapFunction.
Then you have all freedom to deal with records of different schema.
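A minimal Java sketch of that approach (the path, schemas, and parse rules are placeholders):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class MixedCsvReader {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read each line as plain text and decide per line how to parse it.
    DataSet<Tuple3<String, String, Double>> records = env
        .readTextFile("file:///path/to/csv-dir")
        .map(new MapFunction<String, Tuple3<String, String, Double>>() {
          @Override
          public Tuple3<String, String, Double> map(String line) {
            String[] f = line.split(",");
            // Lines with three fields follow schema A; two fields follow schema B.
            if (f.length == 3) {
              return Tuple3.of(f[0], f[1], Double.parseDouble(f[2]));
            } else {
              return Tuple3.of(f[0], "", Double.parseDouble(f[1]));
            }
          }
        });

    records.print();
  }
}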
Best, Fabian
2018-05-08 12:35 GMT+02:00 Esa Heikkinen
Hi
At the moment a batch query is ok.
Do you know any good (Scala) examples of how to query batches (of different
types of CSV files) in parallel?
Or do you have an example of a custom source function that reads CSV files in
parallel?
Best, Esa
From: Fabian Hueske
Sent: Monday,
Hi all,
I'd like to introduce into our pipeline an efficient way to aggregate
incoming data around an entity.
Basically, we have new incoming facts that are added to (but potentially also
removed from) an entity (by id). For example, when we receive a new name
for a city we add this name to the known
Hi all,
How do I give different topologies / jobs different Xmx memory parameters?
Regards
Fabian, Thanks a lot for your continuous help! Really appreciate it.
Sent from Phone.
> On May 8, 2018, at 03:06, Fabian Hueske wrote:
>
> Hi Dhruv,
>
> The changes look good to me.
>
> Best, Fabian
>
> 2018-05-08 5:37 GMT+02:00 Dhruv Kumar :
>>