Hi Sathi,
The `getPartitionId` method is invoked for each record in the stream. There,
you can extract values/fields from the record and use them to determine the
target partition id.
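For example, here is a self-contained sketch of such routing logic. The `OrderEvent` type and `partitionFor` helper are hypothetical; in a real job this logic would sit inside your `getPartitionId` implementation.

```java
// Hypothetical record type and helper, illustrating how a field of the
// record can be turned into a stable partition id.
public class PartitionIdSketch {

    static final class OrderEvent {
        final String customerId;
        OrderEvent(String customerId) { this.customerId = customerId; }
    }

    // Derive a stable partition id from a field of the record, so all
    // events of one customer land on the same partition.
    static String partitionFor(OrderEvent event, int numPartitions) {
        int partition = Math.floorMod(event.customerId.hashCode(), numPartitions);
        return Integer.toString(partition);
    }

    public static void main(String[] args) {
        OrderEvent e = new OrderEvent("customer-42");
        System.out.println(partitionFor(e, 8)); // some value in [0, 8)
    }
}
```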
Is this what you had in mind?
Cheers,
Gordon
On February 21, 2017 at 11:54:21 AM, Sathi Chowdhury wrote:
Hi flink users and experts,
In my Flink processor I am trying to use the Flink Kinesis connector. I read
from a Kinesis stream, and after the transformation (for which I use a
RichCoFlatMapFunction), the JSON event needs to sink to a Kinesis stream k1.
DataStream myStream = see.addSource(new
Hi Sonex
All windows under the same key (e.g., TimeWindow(0, 3600) and TimeWindow(3600,
7200)) will be processed by the flatmap function. You can put the variable
drawn from TimeWindow(0, 3600) into a State. When you receive TimeWindow(3600,
7200), you can access the state and apply the
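In plain Java, that pattern looks roughly like the sketch below. A HashMap stands in for Flink's keyed ValueState, and the delta computation is only an example of combining the current window's value with the previous one.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the pattern: the window function passes results
// through, and this flatMap-like step keeps per-key state (a HashMap here,
// where Flink would use keyed ValueState) so the value from TimeWindow(0, 3600)
// is still available when TimeWindow(3600, 7200) arrives.
public class CrossWindowStateSketch {

    private final Map<String, Double> previousWindowValue = new HashMap<>();

    // Called once per (key, window result), in window order per key.
    double process(String key, double currentWindowValue) {
        Double previous = previousWindowValue.get(key);
        double result = (previous == null)
                ? currentWindowValue                 // first window for this key
                : currentWindowValue - previous;     // combine with prior window
        previousWindowValue.put(key, currentWindowValue);
        return result;
    }

    public static void main(String[] args) {
        CrossWindowStateSketch s = new CrossWindowStateSketch();
        System.out.println(s.process("key-A", 10.0)); // 10.0 (no earlier window)
        System.out.println(s.process("key-A", 13.0)); // 3.0  (13.0 - 10.0)
    }
}
```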
There's nothing stopping me assigning timestamps and generating watermarks on
a keyed stream in the implementation and the KeyedStream API supports it. It
appears the underlying operator that gets created in
DataStream.assignTimestampsAndWatermarks() isn't key-aware and globally
tracks timestamps.
Hi Stephan,
Just saw your mail while I was explaining the answer to your earlier
questions. I have attached some more screenshots which are taken from the
latest run today.
Yes, I will try setting it to a higher value and check if performance improves.
Let me know your thoughts
Regards,
Vinay Patil
Hi Stephan,
I am using Flink version 1.2.0 and running the job on YARN using c3.4xlarge
EC2 instances, each with 16 cores and 30 GB RAM.
In total I have set 32 slots and allotted 1200 network buffers.
I have attached the latest checkpointing snapshot, its configuration, and the
CPU load average.
@Vinay!
Just saw the screenshot you attached to the first mail. The checkpoint that
failed came after one that had an incredibly heavy alignment phase (14 GB).
I think that working that off delayed the next checkpoint, because the workers
were still working through the alignment backlog.
I think you
How about adding this to the "logging" docs - a section on how to run log4j2?
On Mon, Feb 20, 2017 at 8:50 AM, Robert Metzger wrote:
> Hi Chet,
>
> These are the files I have in my lib/ folder with the working log4j2
> integration:
>
> -rw-r--r-- 1 robert robert 79966937
Hi Shannon!
In the latest HA and BlobStore changes (1.3) it uses "/tmp" only for
caching and will re-obtain the files from the persistent storage.
I think we should make this a bigger point, even:
- Flink should not use "/tmp" at all (except for mini cluster mode)
- Yarn and Mesos should
With pre-aggregation (which the Reduce does), Flink can handle many windows
and many keys, as long as you have the memory and storage to support that.
Your case should work.
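As a rough illustration of why pre-aggregation scales (plain Java, not the Flink API; the (key, count) event shape and the sum reducer are made up for the example): each element is folded into a single running value per key, so a pane stores O(1) state instead of buffering every element.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of incremental pre-aggregation: one running aggregate per key,
// as Flink keeps one reduced value per key+window pane.
public class PreAggregationSketch {

    static Map<String, Long> reduceAll(List<Map.Entry<String, Long>> events,
                                       BinaryOperator<Long> reducer) {
        Map<String, Long> panes = new HashMap<>();
        for (Map.Entry<String, Long> e : events) {
            // merge() applies the reducer incrementally, element by element.
            panes.merge(e.getKey(), e.getValue(), reducer);
        }
        return panes;
    }

    public static void main(String[] args) {
        Map<String, Long> sums = reduceAll(
                List.of(Map.entry("session-1", 3L),
                        Map.entry("session-2", 7L),
                        Map.entry("session-1", 4L)),
                Long::sum);
        System.out.println(sums); // one aggregate per session, not per event
    }
}
```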
On Mon, Feb 20, 2017 at 4:58 PM, Vadim Vararu
wrote:
> It's something like:
>
>
Hi!
Exactly-once end-to-end requires sinks that support that kind of behavior
(typically some form of transactions support).
Kafka currently does not have the mechanisms in place to support
exactly-once sinks, but the Kafka project is working on that feature.
For ElasticSearch, it is also not
Hi Vinay!
Can you start by giving us a bit of an environment spec?
- What Flink version are you using?
- What is your rough topology (what operations does the program use)
- Where is the state (windows, keyBy)?
- What is the rough size of your checkpoints and where does the time go?
Can
Hi Xiaogang,
Thank you for your inputs.
Yes, I have already tried setting MaxBackgroundFlushes and
MaxBackgroundCompactions to higher values (tried 2, 4, and 8), but I am still
not getting the expected results.
System.getProperty("java.io.tmpdir") points to /tmp, but I could not find the
RocksDB logs there, can
Hi,
I'm using Flink 1.2.0 and am trying to achieve "exactly once" data transfer
from Kafka to Elasticsearch, but I cannot.
(Scala 2.11, Kafka 0.10, without YARN)
There are 2 Flink TaskManager nodes, and while processing with a parallelism
of 2, I shut one of them down (simulating a node failure).
Using
Hi,
I'm using Flink and Elasticsearch, and I want to receive in Elasticsearch a
JSON object ({"id":1, "name":"X"}, etc.). I already have a string with this
information, but I don't want to save it as a string.
I recieve this:
{
"_index": "logs",
"_type": "object",
"_id":
It's something like:
DataStreamSource stream = env.addSource(getKafkaConsumer(parameterTool));

stream
    .map(getEventToDomainMapper())
    .keyBy(getKeySelector())
    .window(ProcessingTimeSessionWindows.withGap(Time.minutes(90)))
    .reduce(getReducer())
Hi Vadim,
this of course depends on your use case. The question is how large is
your state per pane and how much memory is available for Flink?
Are you using incremental aggregates such that only the aggregated value
per pane has to be kept in memory?
Regards,
Timo
On 20/02/17 at 16:34
Hi guys,
Is it okay to have very many (tens of thousands or hundreds of thousands)
of session windows?
Thanks, Vadim.
Yes, you are correct. A window will be created for each key/group and then
you can apply a function, or aggregate elements per key.
Hey Shannon,
good idea! We currently have this:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/production_ready.html
It has a strong focus on managed state and not the points you mentioned.
Would you like to create an issue for adding this to the production
check list? I think
Hi,
Flink 1.2 is partitioning all keys into key-groups, the atomic units for
rescaling. This partitioning is done by hash partitioning and is also in sync
with the routing of tuples to operator instances (each parallel instance of a
keyed operator is responsible for some range of key groups).
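As a simplified sketch of that assignment: Flink applies a murmur hash on top of the key's hashCode, but plain hashCode is used below only for illustration; the operator-index formula reflects the contiguous-range idea described above.

```java
// Simplified sketch of key-group routing. Each parallel subtask owns a
// contiguous range of key groups, so rescaling only moves whole groups.
public class KeyGroupSketch {

    // Which of the maxParallelism key groups a key falls into (simplified:
    // Flink additionally murmur-hashes the key's hashCode).
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Which operator subtask is responsible for a given key group.
    static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        int kg = keyGroup("user-17", maxParallelism);
        System.out.println("key group " + kg + " -> subtask "
                + operatorIndex(kg, maxParallelism, parallelism));
    }
}
```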
Hi Sonex
I think you can accomplish this by using a PassThroughFunction as the apply
function, followed by a rich flatMap function that processes the elements. You
can keep the information in the flatMap function (via state) so that it can be
shared among different windows.
The program may
I have looked at the log but did not find any useful information, just some
information about the machine running this node. Disk less than 10%.
2017-02-20 14:03 GMT+08:00 wangzhijiang999 :
> The log just indicates the SignalHandler handles the kill signal and the
> process of JobManager