Re: kinesis producer setCustomPartitioner use stream's own data

2017-02-20 Thread Tzu-Li (Gordon) Tai
Hi Sathi, The `getPartitionId` method is invoked with each record from the stream. In there, you can extract values / fields from the record, and use that to determine the target partition id. Is this what you had in mind? Cheers, Gordon On February 21, 2017 at 11:54:21 AM, Sathi Chowdhury
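The per-record pattern Gordon describes can be sketched in plain Java. This is a stand-alone illustration, not the Flink API itself: the method name mirrors `KinesisPartitioner.getPartitionId`, but the `Map`-based record type and the "deviceId" field are hypothetical stand-ins for whatever your events carry.

```java
import java.util.Map;

// Sketch of a custom partitioner that derives the Kinesis partition id
// from a field of the record itself. In Flink's connector, the analogous
// method (getPartitionId) is invoked once per record from the stream.
class FieldBasedPartitioner {

    // Assumption: each record is a parsed event with a "deviceId" field.
    String getPartitionId(Map<String, String> event) {
        // Use the record's own data as the partition key.
        String deviceId = event.get("deviceId");
        return deviceId != null ? deviceId : "default";
    }
}
```

Records with the same field value then land on the same partition, which is usually the point of partitioning on stream data.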

kinesis producer setCustomPartitioner use stream's own data

2017-02-20 Thread Sathi Chowdhury
Hi Flink users and experts, In my Flink processor I am trying to use the Flink Kinesis connector. I read from a Kinesis stream, and after the transformation (for which I use RichCoFlatMapFunction), the JSON event needs to sink to a Kinesis stream k1. DataStream myStream = see.addSource(new

Re: Re: Transfer information from one window to the next

2017-02-20 Thread 施晓罡(星罡)
Hi Sonex, All windows under the same key (e.g., TimeWindow(0, 3600) and TimeWindow(3600, 7200)) will be processed by the flatMap function. You can put the variable drawn from TimeWindow(0, 3600) into a State. When you receive TimeWindow(3600, 7200), you can access the state and apply the

Re: Watermarks per key

2017-02-20 Thread jganoff
There's nothing stopping me from assigning timestamps and generating watermarks on a keyed stream in the implementation, and the KeyedStream API supports it. It appears the underlying operator that gets created in DataStream.assignTimestampsAndWatermarks() isn't key-aware and globally tracks timestamps.

Re: Checkpointing with RocksDB as statebackend

2017-02-20 Thread vinay patil
Hi Stephan, Just saw your mail while I was writing up the answer to your earlier questions. I have attached some more screenshots, taken from the latest run today. Yes, I will try to set it to a higher value and check if performance improves. Let me know your thoughts. Regards, Vinay Patil

Re: Checkpointing with RocksDB as statebackend

2017-02-20 Thread vinay patil
Hi Stephan, I am using Flink 1.2.0, and running the job on YARN using c3.4xlarge EC2 instances having 16 cores and 30GB RAM each. In total I have set 32 slots and allotted 1200 network buffers. I have attached the latest checkpointing snapshot, its configuration, and CPU load average

Re: Checkpointing with RocksDB as statebackend

2017-02-20 Thread Stephan Ewen
@Vinay! Just saw the screenshot you attached to the first mail. The checkpoint that failed came after one that had an incredibly heavy alignment phase (14 GB). I think that working that off threw off the next checkpoint, because the workers were still working through the alignment backlog. I think you

Re: Log4J

2017-02-20 Thread Stephan Ewen
How about adding this to the "logging" docs - a section on how to run log4j2 On Mon, Feb 20, 2017 at 8:50 AM, Robert Metzger wrote: > Hi Chet, > > These are the files I have in my lib/ folder with the working log4j2 > integration: > > -rw-r--r-- 1 robert robert 79966937

Re: blob store defaults to /tmp and files get deleted

2017-02-20 Thread Stephan Ewen
Hi Shannon! In the latest HA and BlobStore changes (1.3) it uses "/tmp" only for caching and will re-obtain the files from the persistent storage. I think we should make this a bigger point, even: - Flink should not use "/tmp" at all (except for mini cluster mode) - Yarn and Mesos should

Re: Is it OK to have very many session windows?

2017-02-20 Thread Stephan Ewen
With pre-aggregation (which the Reduce does), Flink can handle many windows and many keys, as long as you have the memory and storage to support that. Your case should work. On Mon, Feb 20, 2017 at 4:58 PM, Vadim Vararu wrote: > It's something like: > >
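The reason pre-aggregation keeps many windows cheap can be shown in plain Java: a reduce-style aggregate stores exactly one running value per key, no matter how many events each session sees. This is a stand-alone analogue of Flink's incremental reduce, with illustrative key and value types, not the Flink API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Incremental (reduce-style) aggregation: each incoming value is folded
// into a single stored aggregate per key, so state size is O(keys),
// independent of the number of events per session window.
class IncrementalReduce {
    private final Map<String, Long> perKeyAggregate = new HashMap<>();
    private final BinaryOperator<Long> reducer;

    IncrementalReduce(BinaryOperator<Long> reducer) {
        this.reducer = reducer;
    }

    // Fold one event's value into the running aggregate for its key.
    void add(String key, long value) {
        perKeyAggregate.merge(key, value, reducer);
    }

    long result(String key) {
        return perKeyAggregate.get(key);
    }
}
```

With tens or hundreds of thousands of session windows, this is the difference between buffering every event and keeping one value per open session.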

Re: How to achieve exactly once on node failure using Kafka

2017-02-20 Thread Stephan Ewen
Hi! Exactly-once end-to-end requires sinks that support that kind of behavior (typically some form of transactions support). Kafka currently does not have the mechanisms in place to support exactly-once sinks, but the Kafka project is working on that feature. For ElasticSearch, it is also not

Re: Checkpointing with RocksDB as statebackend

2017-02-20 Thread Stephan Ewen
Hi Vinay! Can you start by giving us a bit of an environment spec? - What Flink version are you using? - What is your rough topology (what operations does the program use) - Where is the state (windows, keyBy)? - What is the rough size of your checkpoints and where does the time go? Can

Re: Checkpointing with RocksDB as statebackend

2017-02-20 Thread vinay patil
Hi Xiaogang, Thank you for your inputs. Yes, I have already tried setting MaxBackgroundFlushes and MaxBackgroundCompactions to higher values (tried with 2, 4, 8), but I am still not getting the expected results. System.getProperty("java.io.tmpdir") points to /tmp, but I could not find the RocksDB logs there, can

How to achieve exactly once on node failure using Kafka

2017-02-20 Thread Y. Sakamoto
Hi, I'm using Flink 1.2.0 and trying to do "exactly once" data transfer from Kafka to Elasticsearch, but I cannot. (Scala 2.11, Kafka 0.10, without YARN) There are 2 Flink TaskManager nodes, and when processing with 2 parallelism, I shut down one of them (simulating node failure). Using

Apache Flink and Elasticsearch send Json Object instead of string

2017-02-20 Thread Fábio Dias
Hi, I'm using Flink and Elasticsearch and I want to receive in Elasticsearch a JSON object ({"id":1, "name":"X"}, etc.). I already have a string with this information, but I don't want to save it as a string. I receive this: { "_index": "logs", "_type": "object", "_id":

Re: Is it OK to have very many session windows?

2017-02-20 Thread Vadim Vararu
It's something like:

DataStreamSource stream = env.addSource(getKafkaConsumer(parameterTool));
stream
    .map(getEventToDomainMapper())
    .keyBy(getKeySelector())
    .window(ProcessingTimeSessionWindows.withGap(Time.minutes(90)))
    .reduce(getReducer())

Re: Is it OK to have very many session windows?

2017-02-20 Thread Timo Walther
Hi Vadim, this of course depends on your use case. The question is how large is your state per pane and how much memory is available for Flink? Are you using incremental aggregates such that only the aggregated value per pane has to be kept in memory? Regards, Timo On 20/02/17 at 16:34

Is it OK to have very many session windows?

2017-02-20 Thread Vadim Vararu
Hi guys, Is it okay to have very many (tens of thousands or hundreds of thousands) session windows? Thanks, Vadim.

Re: Is a new window created for each key/group?

2017-02-20 Thread Sonex
Yes, you are correct. A window will be created for each key/group and then you can apply a function, or aggregate elements per key.

Re: blob store defaults to /tmp and files get deleted

2017-02-20 Thread Ufuk Celebi
Hey Shannon, good idea! We currently have this: https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/production_ready.html It has a strong focus on managed state and not the points you mentioned. Would you like to create an issue for adding this to the production check list? I think

Re: Previously working job fails on Flink 1.2.0

2017-02-20 Thread Stefan Richter
Hi, Flink 1.2 is partitioning all keys into key-groups, the atomic units for rescaling. This partitioning is done by hash partitioning and is also in sync with the routing of tuples to operator instances (each parallel instance of a keyed operator is responsible for some range of key groups).
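The routing Stefan describes can be sketched as follows. This is simplified: Flink applies a murmur-style hash on top of `key.hashCode()`, while the sketch uses the raw hash code only to stay self-contained; the range-assignment of key groups to operator instances is the part being illustrated.

```java
// Simplified illustration of Flink 1.2's key-group routing: a key is
// hashed into one of maxParallelism key groups, and each parallel
// operator instance owns a contiguous range of key groups.
class KeyGroupRouting {

    // Assign a key to a key group (Flink additionally murmur-hashes
    // key.hashCode(); omitted here for brevity).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Map a key group to the operator instance responsible for it.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```

Because key groups are the atomic units of rescaling, changing the parallelism only reassigns whole key-group ranges; a key never moves between groups.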

Re: Transfer information from one window to the next

2017-02-20 Thread 施晓罡(星罡)
Hi Sonex, I think you can accomplish this by using a PassThroughFunction as the apply function, followed by a rich flatMap function that processes the elements. You can keep the information in the flatMap function (via states) so that it can be shared among different windows. The program may
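The pattern can be simulated in plain Java, with a HashMap standing in for Flink's keyed state held by the rich flatMap function. The delta computation is only an illustrative use of the remembered value; the point is that the value written while handling one window's result is still there when the next window's result arrives for the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of sharing information across windows: the window's apply
// function passes its result through, and a downstream keyed flatMap
// remembers a per-key value (a HashMap stands in for Flink keyed state)
// so one window's result is visible when the next window fires.
class CrossWindowState {
    private final Map<String, Double> previousWindowValue = new HashMap<>();

    // Called once per (key, window result); returns the change against
    // the value remembered from the previous window for that key.
    double flatMap(String key, double windowResult) {
        double previous = previousWindowValue.getOrDefault(key, 0.0);
        previousWindowValue.put(key, windowResult); // remember for next window
        return windowResult - previous;
    }
}
```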

Re: Jobmanager was killed when disk less 10% in yarn

2017-02-20 Thread lining jing
I have looked at the log but did not find any information, only some details about the machine running this node. Disk free space was below 10%. 2017-02-20 14:03 GMT+08:00 wangzhijiang999 : > The log just indicates the SignalHandler handles the kill signal and the > process of JobManager