Tumbling window rich functionality

2016-09-14 Thread Swapnil Chougule
Hi Team,

I am using tumbling window functionality having window size 5 minutes.
I want to perform setup & teardown functionality for each window. I tried
using RichWindowFunction but it didn't work for me.
Can anybody tell me how can I do it ?

Attaching the code snippet I tried:

impressions.map(new LineItemAdUnitAggr())
    .keyBy(0)
    .timeWindow(Time.seconds(300))
    // note: the generic parameters below were stripped by the list archive;
    // Tuple2<String, Long> is only a stand-in for the actual element type
    .apply(new RichWindowFunction<Tuple2<String, Long>, Boolean, Tuple, TimeWindow>() {

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            // setup method
        }

        @Override
        public void apply(Tuple key, TimeWindow window,
                          Iterable<Tuple2<String, Long>> input,
                          Collector<Boolean> out) throws Exception {
            // do processing
        }

        @Override
        public void close() throws Exception {
            // tear down method
            super.close();
        }
    });

Thanks,
Swapnil


Re: Firing windows multiple times

2016-09-14 Thread Aljoscha Krettek
Hi,
yes AJ that observation is correct. Let's see what Shannon has to say about
this but it might be that all "higher-level" aggregates will have to be
based on the first level and can then update at the speed of that aggregate.

Cheers,
Aljoscha

On Mon, 12 Sep 2016 at 05:03 aj.h  wrote:

> In the way that FLIP-2 would solve this problem, secondAggregate would
> ignore
> the early firing updates from firstAggregate to prevent double-counting,
> correct? If that's the case, I am trying to understand why we'd want to
> trigger early-fires every 30 seconds for the secondAggregate if it's only
> accepting new results at a daily rate, after firstAggregate's primary
> firing
> at the end of the window. If we filter out results from early-fires,
> wouldn't every 30-second result from secondAggregate remain unchanged
> within
> the same 1-day window?
>
> Similarly (compounded) for a 365-day window aggregating over a 30 day
> window: if it filters out early fires, wouldn't it only produce new/unique
> results every 30 days?
>
> I very well may have misunderstood this solution.
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Firing-windows-multiple-times-tp8424p8994.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>


Re: Fw: Flink Cluster Load Distribution Question

2016-09-14 Thread Aljoscha Krettek
Hi,
this is a different job from the Kafka Job that you have running, right?

Could you maybe post the code for that as well?

Cheers,
Aljoscha

On Tue, 13 Sep 2016 at 20:14 amir bahmanyari  wrote:

> Hi Robert,
> Sure, I am forwarding it to user. Sorry about that. I followed the
> "robot's" instructions :))
> Topology: 4 Azure A11 CentOS 7 nodes (16 cores, 110 GB). Lets call them
> node1, 2, 3, 4.
> Flink Clustered with node1 running JM & a TM. Three more TM's running on
> node2,3, and 4 respectively.
> I have a Beam running FLink Runner underneath.
> The input data is received by Beam TextIO() reading off a 1.6 GB of data
> containing roughly 22 million tuples.
> *All nodes have identical flink-conf.yaml*, masters & slaves contents as
> follows:
>
> *flink-conf.yaml:*
> jobmanager.rpc.address: node1
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 1024
> taskmanager.heap.mb: 102400
> taskmanager.numberOfTaskSlots: 16
> taskmanager.memory.preallocate: false
> parallelism.default: 64
> jobmanager.web.port: 8081
> taskmanager.network.numberOfBuffers: 4096
>
>
>
> *masters*:
> node1:8081
>
> *slaves*:
> node1
> node2
> node3
> node4
>
> Everything looks normal at ./start-cluster.sh & all daemons start on all
> nodes.
> JM, TMs log files get generated on all nodes.
> Dashboard shows how all slots are being used.
> I deploy the Beam app to the cluster where JM is running at node1.
> a *.out file gets generated as data is being processed. No *.out on other
> nodes, just node1 where I deployed the fat jar.
> I tail -f the *.out log on node1 (master). starts fine...but slowly
> degrades & becomes extremely slow.
> As we speak, I started the Beam app 13 hrs ago and its still running.
> How can I prove that ALL NODES are involved in processing the data at the
> same time i.e. clustered?
> Do the above configurations look ok for a reasonable performance?
> Given above parameters set, how can I improve the performance in this
> cluster?
> What other information and or dashboard screen shots is needed to clarify
> this issue.
> I used these websites to do the configuration:
> Apache Flink: Cluster Setup
>
> Apache Flink: Configuration
>
> In the second link, there is a config recommendation for the following but
> this parameter is not in the configuration file out of the box:
>
>- taskmanager.network.bufferSizeInBytes
>
> Should I include it manually? Does it make any difference if the default
> value i.e.32 KB doesn't get picked up?
> Sorry too many questions.
> Pls let me know.
> I appreciate your help.
> Cheers,
> Amir-
>
> - Forwarded Message -
> *From:* Robert Metzger 
> *To:* "d...@flink.apache.org" ; amir bahmanyari <
> amirto...@yahoo.com>
> *Sent:* Tuesday, September 13, 2016 1:15 AM
> *Subject:* Re: Flink Cluster Load Distribution Question
>
> Hi Amir,
>
> I would recommend to post such questions to the user@flink mailing list in
> the future. This list is meant for development-related topics.
>
> I think we need more details to understand why your application is not
> running properly. Can you quickly describe what your topology is doing?
> Are you setting the parallelism to a value >= 1 ?
>
> Regards,
> Robert
>
>
> On Tue, Sep 13, 2016 at 6:35 AM, amir bahmanyari <
> amirto...@yahoo.com.invalid> wrote:
>
> > Hi Colleagues, I just joined this forum. I have done everything possible to
> > get a 4-node Flink cluster to work properly & run a Beam app. It always
> > generates system-output logs (*.out) in only one node. It is so slow
> > for 4 nodes being there. Seems like the load is not distributed amongst
> > all 4 nodes but only one node, most of the time the one where the JM runs. I
> > ran/tested it on a single node, and it ran even faster with the same
> > load. Not sure what's not being configured right.
> > 1- Why am I getting the SystemOut .out log in only one server? All nodes
> > get their TaskManager log files updated though.
> > 2- Why don't I see load being distributed amongst all 4 nodes, but only
> > one all the time?
> > 3- Why does the Dashboard show a 0 (zero) for Send/Receive numbers for all
> > Task Managers?
> > The Dashboard shows all the right stuff. Top shows not much of the
> > resources being stressed on any of the nodes. I can share its contents if
> > it helps diagnose the issue. Thanks + I appreciate your valuable time,
> > response & help. Amir-
>
>
>


Re: Why tuples are not ignored after watermark?

2016-09-14 Thread Aljoscha Krettek
Hi,
the problem might be that your timestamp/watermark assigner runs in parallel
and that only one parallel instance of those operators emits the watermark,
because only one of those parallel instances sees the element with _3 == 9000.
For the watermark to advance at an operator, it needs to advance in all
upstream operators.
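
One quick way to check this is to run the assigner with parallelism 1, so that
a single instance sees the triggering element. A rough Java sketch ("parsed"
and the tuple type stand in for the mapped stream from your snippet):

DataStream<Tuple3<String, Integer, Long>> withWatermarks = parsed
        .assignTimestampsAndWatermarks(punctuatedAssigner)
        .setParallelism(1); // only one instance emits the watermark now

With parallelism 1 the watermark is emitted from a single instance and can then
advance at the downstream window operator.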

Cheers,
Aljoscha

On Fri, 9 Sep 2016 at 18:29 Saiph Kappa  wrote:

> Hi,
>
> I have a streaming (event time) application where I am receiving events
> with the same assigned timestamp. I receive 10000 events in total on a
> window of 5 minutes, but I emit a watermark when 9000 elements have been
> received. This watermark is 6 minutes after the assigned timestamps. My
> question is: why does the function that is associated with the window read
> 10000 elements and not 9000? All elements that have a timestamp lower than
> the watermark should be ignored (1000), but it's not happening.
>
> Here is part of the code:
> «
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> val rawStream = env.socketTextStream("localhost", 4321)
>
> val punctuatedAssigner = new AssignerWithPunctuatedWatermarks[(String,
> Int, Long)] {
>   val timestamp = System.currentTimeMillis();
>
>   override def extractTimestamp(element: (String, Int, Long),
> previousElementTimestamp: Long): Long =
> timestamp
>
>   override def checkAndGetNextWatermark(lastElement: (String, Int,
> Long), extractedTimestamp: Long): Watermark = {
> if(lastElement._3 == 9000) {
>   val ts = extractedTimestamp + TimeUnit.MINUTES.toMillis(6)
>   new watermark.Watermark(ts)
> } else null
>   }
> }
>
> val stream = rawStream.map(line => {
>   val Array(p1, p2, p3) = line.split(" ")
>   (p1, p2.toInt, p3.toLong)
> })
>   .assignTimestampsAndWatermarks(punctuatedAssigner)
>
> stream.keyBy(1).timeWindow(Time.of(5, TimeUnit.MINUTES)).apply(function)
> »
>
> Thanks!
>


Re: Tumbling window rich functionality

2016-09-14 Thread Aljoscha Krettek
Hi,
WindowFunction.apply() will be called once for each window, so you should be
able to do the setup/teardown in there. open() and close() are called at the
start of processing and at the end of processing, respectively.
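
For illustration, roughly something like this (Tuple2<String, Long> is only a
placeholder, not the actual output type of LineItemAdUnitAggr):

impressions.map(new LineItemAdUnitAggr())
        .keyBy(0)
        .timeWindow(Time.seconds(300))
        .apply(new WindowFunction<Tuple2<String, Long>, Boolean, Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple key, TimeWindow window,
                              Iterable<Tuple2<String, Long>> input,
                              Collector<Boolean> out) throws Exception {
                // per-window "setup"
                Map<String, Long> counts = new HashMap<>();
                for (Tuple2<String, Long> element : input) {
                    counts.merge(element.f0, element.f1, Long::sum);
                }
                out.collect(!counts.isEmpty());
                // per-window "teardown": release whatever was created above
                counts.clear();
            }
        });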

Cheers,
Aljoscha

On Wed, 14 Sep 2016 at 09:04 Swapnil Chougule 
wrote:

> Hi Team,
>
> I am using tumbling window functionality having window size 5 minutes.
> I want to perform setup & teardown functionality for each window. I tried
> using RichWindowFunction but it didn't work for me.
> Can anybody tell me how can I do it ?
>
> Attaching code snippet what I tried
>
> impressions.map(new
> LineItemAdUnitAggr()).keyBy(0).timeWindow(Time.seconds(300)).apply(new
> RichWindowFunction,Long>, Boolean, Tuple,
> TimeWindow>() {
>
> @Override
> public void open(Configuration parameters) throws
> Exception {
> super.open(parameters);
> //setup method
> }
>
> public void apply(Tuple key, TimeWindow window,
> Iterable, Long>>
> input,
> Collector out) throws Exception {
> //do processing
> }
>
> @Override
> public void close() throws Exception {
> //tear down method
> super.close();
> }
> });
>
> Thanks,
> Swapnil
>


SQL for Flink

2016-09-14 Thread Radu Tudoran
Hi,

As a follow-up to multiple discussions that happened during Flink Forward about
how SQL should be supported by Flink, I would like to make a couple of
proposals.
Disclaimer: I do not claim to have synthesized all the discussions, and
probably a great deal is still missing.

Why support SQL for Flink?

-  A goal of supporting SQL for Flink should be to enable larger adoption
of Flink - particularly for data scientists / data engineers who might not
want/know how to program against the existing APIs

-  The main implication, as I see it, is that SQL should serve as a
translation tool from the data processing flow to a stream topology that
will be executed by Flink

-  This would require supporting an SQL client for Flink rather soon

How many features should be supported?

-  In order to enable a (close to) full benefit of the processing
capabilities of Flink, I believe most of the processing types should be
supported - this includes all the different types of windows, aggregations,
transformations and joins

-  I would propose that UDFs should also be supported, such that one can
easily add more complex computation if needed

-  In the spirit of the extensibility that Flink offers for operators,
functions, etc., such custom operators should be supported to replace the
default implementations of the SQL logical operators

How much customization should be enabled?

-  Customization could be provided by configuration files. Such a
configuration can cover the policies for how the triggers, evictors,
parallelization, etc. will be handled for the specific translation of the
SQL query into Flink code

-  In order to support the integration of custom operators for specific
SQL logical operators, users should also be able to provide translation
RULES that replace the default ones (e.g. if users want to define their own
CUSTOM_TABLE_SCAN, they should be able to provide something like
configuration.replaceRule(DataStreamScanRule.INSTANCE,
CUSTOM_TABLE_SCAN_Rule.INSTANCE) - or, if the selection of the new
translation rule can be handled by the cost model, simply
configuration.addRule(CUSTOM_TABLE_SCAN_Rule.INSTANCE)). A rough sketch of
what such a hook could look like is below.
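
To make the idea concrete, the hook could look roughly like this. None of this
API exists today; TranslationConfig, replaceRule/addRule and
CustomTableScanRule are only placeholder names for the proposal:

// hypothetical configuration object for the SQL-to-DataStream translation
TranslationConfig config = new TranslationConfig();

// swap the default scan translation for a user-supplied rule
config.replaceRule(DataStreamScanRule.INSTANCE, CustomTableScanRule.INSTANCE);

// or, if the cost model is allowed to choose between rules, just add it
config.addRule(CustomTableScanRule.INSTANCE);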

What do you think?


Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R&D Division

HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, 
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!



Re: SQL for Flink

2016-09-14 Thread Deepak Sharma
+1
Yes, I agree to having SQL for Flink.
I can take up some tasks as well once this starts.

Thanks
Deepak

On Wed, Sep 14, 2016 at 3:47 PM, Radu Tudoran 
wrote:

> Hi,
>
>
>
> As a follow up to multiple discussions that happened during Flink Forward
> about how SQL should be supported by Flink, I was thinking to make a couple
> of proposals.
>
> Disclaimer: I do not claim I have managed to synthesized all the
> discussions and probably a great deal of things are still missing
>
>
>
> *Why supporting SQL for Flink?*
>
> -  A goal to support SQL for Flink should be to enable larger
> adoption of Flink – particularly for data scientists / data engineers who
> might not want/know how to program against the existing APIs
>
> -  The main implication as I see from this is that SQL should
> serve as a translation tool of the data processing processing flow to a
> stream topology that will be executed by Flink
>
> -  This would require to support rather soon an SQL client for
> Flink
>
>
>
> *How many features should be supported?*
>
> -  In order to enable a (close to ) full benefit of the
> processing capabilities of Flink, I believe most of the processing types
> should be supported – this includes all different types of windows,
> aggregations, transformations, joins….
>
> -  I would propose that UDFs should also be supported such that
> one can easily add more complex computation if needed
>
> -  In the spirit of the extensibility that Flink supports for the
> operators, functions… such custom operators should be supported to replace
> the default implementations of the SQL logical operators
>
>
>
> *How much customization should be enabled?*
>
> -  Regarding customization this could be provided by
> configuration files. Such a configuration can cover the policies for how
> the triggers, evictors, parallelization …  will be done for the specific
> translation of the SQL query into Flink code
>
> -  In order to support the integration of custom operators for
> specific SQL logical operators, the users should be enabled also to provide
> translation RULES that will replace the default ones  (e.g. if a user want
> to define their own CUSTOM_TABLE_SCAN, it should be able to provide
> something like configuration.replaceRule(DataStreamScanRule.INSTANCE ,
> CUSTOM_TABLE_SCAN_Rule.INSTANCE) – or if the selection of the new
> translation rule can be handled from the cost than simply
> configuration.addRule( CUSTOM_TABLE_SCAN_Rule.INSTANCE)
>
>
>
> What do you think?
>
>
>
>
>
> Dr. Radu Tudoran
>
> Senior Research Engineer - Big Data Expert
>
> IT R&D Division
>
>
>
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>
> European Research Center
>
> Riesstrasse 25, 80992 München
>
>
>
> E-mail: *radu.tudo...@huawei.com *
>
> Mobile: +49 15209084330
>
> Telephone: +49 891588344173
>
>
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH
> Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
> Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
> Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
> Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
> Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
>
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
>
>
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Problem with CEPPatternOperator when taskmanager is killed

2016-09-14 Thread jaxbihani
*Problem :*
I have created a PatternStream with a custom type and added an event
pattern. This works fine in both local and cluster setups. But when I took
one of the taskmanagers down (on which the task was executing), Flink
tried to restart the job, but the restart failed with the exception "Could not
restore checkpointed state to operators and functions" caused by a
"ClassNotFoundException". Then I tried to copy the application jar into the lib
directory of all the nodes (to avoid any classloader-related issues), but on
restart it still fails with the same exception, with the cause now being
"java.lang.IllegalArgumentException in PriorityQueue init of
AbstractCEPOperator class". Detailed stacktraces are attached.

*Setup Details*
Flink version  : 1.1.2
Source : Kafka consumer with a custom schema
Sink : print()
Using fixed delay restart
Using default (in memory) checkpointing
CEP pattern :
  Pattern.begin("add message")
         .where(new EventFilterFunction(4))
         .next("edit issue")
         .where(new EventFilterFunction(2))
         .next("view issue")
         .where(new EventFilterFunction(1));


stacktraces.stacktraces

  
Please point me towards what the problem might be, and please let me know if
any other information is needed.




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-CEPPatternOperator-when-taskmanager-is-killed-tp9024.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Streaming issue help needed

2016-09-14 Thread Márton Balassi
Dear Vaidya,

This seems weird; my guess is that somehow the Time and AbstractTime
implementations are not from the same Flink version.

According to your Maven build you should be using Flink 0.10.2. Since then
there have been changes to windowing; are you tied to that version, or would
it be feasible to upgrade to the latest (1.1.2), just to have a bit more
information on the issue?
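
For reference, with 1.1.x the window call takes the unified Time class. A tiny
Java sketch (the "ratings" stream and the field indices are placeholders, not
taken from the linked project):

// Time must come from the same Flink version that is on the runtime classpath;
// mixing it with the 0.10-era AbstractTime hierarchy produces exactly this
// kind of VerifyError.
ratings.keyBy(0)
       .timeWindow(Time.minutes(5))
       .sum(1);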

Best,

Marton

On Wed, Sep 14, 2016 at 8:51 AM, Vaidyanathan Sivasubramanian <
svaid...@hotmail.com> wrote:

> Hi,
>
> I am trying to implement Flink steaming using an example in GitHub:
> https://github.com/andi1400/bestXStreamRating.  While executing I am
> getting the below error.  Can you please help guide me on the resolution.
> Thanks in advance!
>
>
> Starting execution of program
>
> 
>  The program finished with the following exception:
>
> java.lang.VerifyError: Bad type on operand stack
> Exception Details:
>   Location:
> ca/uwaterloo/cs/bigdata2016w/andi1400/bestXStreamRating/
> AnalyzeTwitterBestXSentimentRatingFlink$.main([Ljava/lang/String;)V @687:
> invokevirtual
>   Reason:
> Type 'org/apache/flink/streaming/api/windowing/time/Time' (current
> frame, stack[2]) is not assignable to 'org/apache/flink/streaming/
> api/windowing/time/AbstractTime'
>   Current Frame:
> bci: @687
> flags: { }
> locals: { 'ca/uwaterloo/cs/bigdata2016w/andi1400/bestXStreamRating/
> AnalyzeTwitterBestXSentimentRatingFlink$', '[Ljava/lang/String;',
> 'ca/uwaterloo/cs/bigdata2016w/andi1400/bestXStreamRating/Conf',
> 'java/lang/String', integer, 
> 'org/apache/flink/streaming/api/scala/StreamExecutionEnvironment',
> 'org/apache/flink/api/scala/ExecutionEnvironment',
> 'org/apache/flink/api/scala/DataSet', 'org/apache/flink/api/scala/DataSet',
> 'ca/uwaterloo/cs/bigdata2016w/andi1400/bestXStreamRating/TermConfigurationFileScala',
> '[Ljava/lang/String;', 'org/apache/flink/streaming/api/scala/DataStream',
> 'scala/collection/immutable/Map', 
> 'org/apache/flink/streaming/api/scala/DataStream',
> integer, 'org/apache/flink/streaming/api/scala/KeyedStream' }
> stack: { 'org/apache/flink/streaming/api/scala/KeyedStream',
> 'org/apache/flink/streaming/api/windowing/time/Time',
> 'org/apache/flink/streaming/api/windowing/time/Time' }
>   Bytecode:
> 000: bb00 4759 b200 4c2b c000 4eb6 0052 b700
> 010: 554d 2ab6 0057 bb00 5959 b700 5a12 5cb6
> 020: 0060 2cb6 0064 b600 6ab6 0060 b600 6eb6
> 030: 0074 2ab6 0057 bb00 5959 b700 5a12 76b6
> 040: 0060 2cb6 0079 b600 6ab6 0060 b600 6eb6
> 050: 0074 2ab6 0057 bb00 5959 b700 5a12 7bb6
> 060: 0060 2cb6 007e b600 6ab6 0060 b600 6eb6
> 070: 0074 2ab6 0057 bb00 5959 b700 5a12 80b6
> 080: 0060 2cb6 0083 b600 6ab6 0060 b600 6eb6
> 090: 0074 1285 b800 89b2 008f b600 9312 95b8
> 0a0: 0089 b200 8fb6 0093 2cb6 0098 b600 6ac0
> 0b0: 009a 4e2c b600 9db6 006a b800 a399 0007
> 0c0: 03a7 0004 0436 04b2 00a8 b600 ac3a 05b2
> 0d0: 00b1 b600 b43a 0619 062c b600 7eb6 006a
> 0e0: c000 9a19 06b6 00b9 b600 bd3a 0719 07bb
> 0f0: 00bf 59b7 00c0 bb00 c259 b700 c3b2 00c8
> 100: 12ca b600 cdb6 00d3 3a08 bb00 d559 1908
> 110: b600 d9b2 004c b600 ddb9 00e3 0200 b700
> 120: e63a 0919 09b6 00ea 3a0a 2cb6 00ed b600
> 130: 6ab8 00a3 9900 2119 05bb 00ef 5919 0ab7
> 140: 00f1 b200 c812 f3b6 00cd 12f3 b800 f9b6
> 150: 00ff a700 5d19 05bb 0101 5919 0a2c b601
> 160: 04b6 006a c000 9a2c b601 07b6 006a c000
> 170: 9a2c b601 0ab6 006a c000 9a2c b601 0db6
> 180: 006a c000 9a2c b600 98b6 006a c000 9a2c
> 190: b601 10b6 006a b801 1413 0116 b701 19b2
> 1a0: 00c8 12f3 b600 cd12 f3b8 00f9 b600 ff3a
> 1b0: 0b19 062c b601 1cb6 006a c000 9a19 06b6
> 1c0: 00b9 b600 bdbb 011e 59b7 011f bb01 2159
> 1d0: b701 22b2 00c8 12ca b600 cdb6 0125 b600
> 1e0: d9bb 0127 59b7 0128 b901 2c02 00bb 012e
> 1f0: 59b7 012f b901 3402 00bb 0136 59b7 0137
> 200: b201 3cb6 0140 b901 4303 00c0 0131 3a0c
> 210: 190b bb01 4559 b701 4612 9ab8 014c b200
> 220: c812 9ab6 00cd b601 51bb 0153 59b7 0154
> 230: 129a b801 4cb2 00c8 129a b600 cdb6 0151
> 240: bb01 5659 b701 57b6 015b 3a0d 2cb6 0083
> 250: b600 6ab8 00a3 360e 190d bb01 5d59 1909
> 260: 190c 150e b701 60bb 0162 59b7 0163 b200
> 270: c812 cab6 00cd b601 67b2 004c 04bc 0a59
> 280: 0303 4fb6 016b b601 6f3a 0f19 0f2c b600
> 290: 79b6 006a b801 1485 b201 75b8 017b 2cb6
> 2a0: 017e b600 6ab8 0114 85b2 0175 b801 7bb6
> 2b0: 0184 3a10 1910 bb01 8659 b701 87b6 018d
> 2c0: 3a11 1911 bb01 8f59 b701 90bb 0192 59b7
> 2d0: 0193 b200 c812 cab6 00cd b601 513a 1215
> 2e0: 0499 0023 1912 bb01 9559 2d19 09b7 0198
> 2f0: b201 9eb8 014c b200 c8b6 01a2 b601 a53a
> 300: 13a7 0023 

Re: SQL for Flink

2016-09-14 Thread Greg Hogan
Hi Deepak,

There are many open tickets for Flink's SQL API. Documentation is at
https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20%22Table%20API%20%26%20SQL%22%20ORDER%20BY%20priority%20DESC

Greg

On Wed, Sep 14, 2016 at 12:27 PM, Deepak Sharma 
wrote:

> +1
> Yes.I agree to having SQL for Flink.
> I can take up some tasks as well once this starts.
>
> Thanks
> Deepak
>
> On Wed, Sep 14, 2016 at 3:47 PM, Radu Tudoran 
> wrote:
>
>> Hi,
>>
>>
>>
>> As a follow up to multiple discussions that happened during Flink Forward
>> about how SQL should be supported by Flink, I was thinking to make a couple
>> of proposals.
>>
>> Disclaimer: I do not claim I have managed to synthesized all the
>> discussions and probably a great deal of things are still missing
>>
>>
>>
>> *Why supporting SQL for Flink?*
>>
>> -  A goal to support SQL for Flink should be to enable larger
>> adoption of Flink – particularly for data scientists / data engineers who
>> might not want/know how to program against the existing APIs
>>
>> -  The main implication as I see from this is that SQL should
>> serve as a translation tool of the data processing processing flow to a
>> stream topology that will be executed by Flink
>>
>> -  This would require to support rather soon an SQL client for
>> Flink
>>
>>
>>
>> *How many features should be supported?*
>>
>> -  In order to enable a (close to ) full benefit of the
>> processing capabilities of Flink, I believe most of the processing types
>> should be supported – this includes all different types of windows,
>> aggregations, transformations, joins….
>>
>> -  I would propose that UDFs should also be supported such that
>> one can easily add more complex computation if needed
>>
>> -  In the spirit of the extensibility that Flink supports for
>> the operators, functions… such custom operators should be supported to
>> replace the default implementations of the SQL logical operators
>>
>>
>>
>> *How much customization should be enabled?*
>>
>> -  Regarding customization this could be provided by
>> configuration files. Such a configuration can cover the policies for how
>> the triggers, evictors, parallelization …  will be done for the specific
>> translation of the SQL query into Flink code
>>
>> -  In order to support the integration of custom operators for
>> specific SQL logical operators, the users should be enabled also to provide
>> translation RULES that will replace the default ones  (e.g. if a user want
>> to define their own CUSTOM_TABLE_SCAN, it should be able to provide
>> something like configuration.replaceRule(DataStreamScanRule.INSTANCE ,
>> CUSTOM_TABLE_SCAN_Rule.INSTANCE) – or if the selection of the new
>> translation rule can be handled from the cost than simply
>> configuration.addRule( CUSTOM_TABLE_SCAN_Rule.INSTANCE)
>>
>>
>>
>> What do you think?
>>
>>
>>
>>
>>
>> Dr. Radu Tudoran
>>
>> Senior Research Engineer - Big Data Expert
>>
>> IT R&D Division
>>
>>
>>
>>
>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>>
>> European Research Center
>>
>> Riesstrasse 25, 80992 München
>>
>>
>>
>> E-mail: *radu.tudo...@huawei.com *
>>
>> Mobile: +49 15209084330
>>
>> Telephone: +49 891588344173
>>
>>
>>
>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>> Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
>> Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
>> Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
>> Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
>> Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
>>
>> This e-mail and its attachments contain confidential information from
>> HUAWEI, which is intended only for the person or entity whose address is
>> listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure, reproduction,
>> or dissemination) by persons other than the intended recipient(s) is
>> prohibited. If you receive this e-mail in error, please notify the sender
>> by phone or email immediately and delete it!
>>
>>
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


Re: SQL for Flink

2016-09-14 Thread Deepak Sharma
Thanks Greg.
I will start picking up some of them.

Thanks
Deepak

On 14 Sep 2016 6:31 pm, "Greg Hogan"  wrote:

> Hi Deepak,
>
> There are many open tickets for Flink's SQL API. Documentation is at
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html.
>
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%
> 20component%20%3D%20%22Table%20API%20%26%20SQL%22%20ORDER%
> 20BY%20priority%20DESC
>
> Greg
>
> On Wed, Sep 14, 2016 at 12:27 PM, Deepak Sharma 
> wrote:
>
>> +1
>> Yes.I agree to having SQL for Flink.
>> I can take up some tasks as well once this starts.
>>
>> Thanks
>> Deepak
>>
>> On Wed, Sep 14, 2016 at 3:47 PM, Radu Tudoran 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> As a follow up to multiple discussions that happened during Flink
>>> Forward about how SQL should be supported by Flink, I was thinking to make
>>> a couple of proposals.
>>>
>>> Disclaimer: I do not claim I have managed to synthesized all the
>>> discussions and probably a great deal of things are still missing
>>>
>>>
>>>
>>> *Why supporting SQL for Flink?*
>>>
>>> -  A goal to support SQL for Flink should be to enable larger
>>> adoption of Flink – particularly for data scientists / data engineers who
>>> might not want/know how to program against the existing APIs
>>>
>>> -  The main implication as I see from this is that SQL should
>>> serve as a translation tool of the data processing processing flow to a
>>> stream topology that will be executed by Flink
>>>
>>> -  This would require to support rather soon an SQL client for
>>> Flink
>>>
>>>
>>>
>>> *How many features should be supported?*
>>>
>>> -  In order to enable a (close to ) full benefit of the
>>> processing capabilities of Flink, I believe most of the processing types
>>> should be supported – this includes all different types of windows,
>>> aggregations, transformations, joins….
>>>
>>> -  I would propose that UDFs should also be supported such that
>>> one can easily add more complex computation if needed
>>>
>>> -  In the spirit of the extensibility that Flink supports for
>>> the operators, functions… such custom operators should be supported to
>>> replace the default implementations of the SQL logical operators
>>>
>>>
>>>
>>> *How much customization should be enabled?*
>>>
>>> -  Regarding customization this could be provided by
>>> configuration files. Such a configuration can cover the policies for how
>>> the triggers, evictors, parallelization …  will be done for the specific
>>> translation of the SQL query into Flink code
>>>
>>> -  In order to support the integration of custom operators for
>>> specific SQL logical operators, the users should be enabled also to provide
>>> translation RULES that will replace the default ones  (e.g. if a user want
>>> to define their own CUSTOM_TABLE_SCAN, it should be able to provide
>>> something like configuration.replaceRule(DataStreamScanRule.INSTANCE ,
>>> CUSTOM_TABLE_SCAN_Rule.INSTANCE) – or if the selection of the new
>>> translation rule can be handled from the cost than simply
>>> configuration.addRule( CUSTOM_TABLE_SCAN_Rule.INSTANCE)
>>>
>>>
>>>
>>> What do you think?
>>>
>>>
>>>
>>>
>>>
>>> Dr. Radu Tudoran
>>>
>>> Senior Research Engineer - Big Data Expert
>>>
>>> IT R&D Division
>>>
>>>
>>>
>>>
>>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>>>
>>> European Research Center
>>>
>>> Riesstrasse 25, 80992 München
>>>
>>>
>>>
>>> E-mail: *radu.tudo...@huawei.com *
>>>
>>> Mobile: +49 15209084330
>>>
>>> Telephone: +49 891588344173
>>>
>>>
>>>
>>> HUAWEI TECHNOLOGIES Duesseldorf GmbH
>>> Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
>>> Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
>>> Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
>>> Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
>>> Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
>>>
>>> This e-mail and its attachments contain confidential information from
>>> HUAWEI, which is intended only for the person or entity whose address is
>>> listed above. Any use of the information contained herein in any way
>>> (including, but not limited to, total or partial disclosure, reproduction,
>>> or dissemination) by persons other than the intended recipient(s) is
>>> prohibited. If you receive this e-mail in error, please notify the sender
>>> by phone or email immediately and delete it!
>>>
>>>
>>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
>


RemoteEnv connect failed

2016-09-14 Thread Dayong
Hi folks,
I need to run a Java app to submit a job to a remote Flink cluster. I am testing
with the code at
https://gist.github.com/datafibers/4b842ebc5b3c9e754ceaf78695e7567e
and my comments are there.
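
For comparison, a minimal remote-submission sketch (the host, port and jar path
are placeholders, and this is not the code from the gist):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // point the environment at the JobManager; the jar must contain the job classes
        StreamExecutionEnvironment env = StreamExecutionEnvironment
                .createRemoteEnvironment("node1", 6123, "target/my-job.jar");

        env.socketTextStream("node1", 9999)
           .print();

        env.execute("remote submission test");
    }
}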


Thanks,
Will

CEP two transitions to the same state

2016-09-14 Thread Frank Dekervel
Hello,

I'm trying to model an FSM using the Flink CEP patterns. However, there is
something I can't figure out, as all the documentation examples are linear
(either you go to the single possible next state, or there is no match).

Suppose that two transitions lead from one state to two different states. I
guess this is doable by just defining multiple followedBy/next calls on the
same state.

But what about two different states that can end up in the same state (in
the order / delivery example: suppose there are two different delivery
methods, having separate starting states but resulting in the same end
state)? It is possible to deduplicate the "delivered" state, but this would
lead to patterns that are difficult to manage when things get more complex.
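
For reference, the linear shape from the documentation looks roughly like this
(Event and getType() are made up here purely for illustration):

Pattern<Event, ?> linear = Pattern.<Event>begin("ordered")
        .where(evt -> "order".equals(evt.getType()))
        .followedBy("delivered")
        .where(evt -> "delivery".equals(evt.getType()));

What I am after is the non-linear equivalent, where several such branches end
up in the same "delivered" state.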

Thanks!
greetings,
Frank


Re: Fw: Flink Cluster Load Distribution Question

2016-09-14 Thread amir bahmanyari
Hi Aljoscha,
Thanks for your response. It's the same job, but I am reading through TextIO()
instead of a Kafka topic. I thought that would make a difference. It doesn't.
Same slowness in the Flink cluster. I had sent you the code reading from
KafkaIO(); nothing is different except commenting out the KafkaIO() &
un-commenting TextIO(). It's attached along with the Support class. Is there
anything interesting you see in my configuration that may cause slowness and/or
lack of the right distribution in the cluster as a whole? I also attached my
config files from the JM node... same for the other nodes. Have a wonderful day
& thanks for your attention.
Amir-


  From: Aljoscha Krettek 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Wednesday, September 14, 2016 1:48 AM
 Subject: Re: Fw: Flink Cluster Load Distribution Question
   
Hi,this is a different job from the Kafka Job that you have running, right?
Could you maybe post the code for that as well?
Cheers,Aljoscha
On Tue, 13 Sep 2016 at 20:14 amir bahmanyari  wrote:

Hi Robert,Sure, I am forwarding it to user. Sorry about that. I followed the 
"robot's" instructions :))Topology: 4 Azure A11 CentOS 7 nodes (16 cores, 110 
GB). Lets call them node1, 2, 3, 4.Flink Clustered with node1 running JM & a 
TM. Three more TM's running on node2,3, and 4 respectively.I have a Beam 
running FLink Runner underneath.The input data is received by Beam TextIO() 
reading off a 1.6 GB of data containing roughly 22 million tuples.All nodes 
have identical flink-conf.yaml, masters & slaves contents as follows:
flink-conf.yaml:
jobmanager.rpc.address: node1  jobmanager.rpc.port: 6123 
jobmanager.heap.mb: 1024 taskmanager.heap.mb: 102400 
taskmanager.numberOfTaskSlots: 16  taskmanager.memory.preallocate: false 
parallelism.default: 64 jobmanager.web.port: 8081 
taskmanager.network.numberOfBuffers: 4096


masters: node1:8081
slaves:node1node2
node3
node4

Everything looks normal at ./start-cluster.sh & all daemons start on all 
nodes.JM, TMs log files get generated on all nodes.Dashboard shows how all 
slots are being used.I deploy the Beam app to the cluster where JM is running 
at node1.a *.out file gets generated as data is being processed. No *.out on 
other nodes, just node1 where I deployed the fat jar.I tail -f the *.out log on 
node1 (master). starts fine...but slowly degrades & becomes extremely slow.As 
we speak, I started the Beam app 13 hrs ago and its still running.How can I 
prove that ALL NODES are involved in processing the data at the same time i.e. 
clustered?Do the above configurations look ok for a reasonable 
performance?Given above parameters set, how can I improve the performance in 
this cluster?What other information and or dashboard screen shots is needed to 
clarify this issue. I used these websites to do the configuration:
Apache Flink: Cluster Setup

Apache Flink: Configuration

In the second link, there is a config recommendation for the following but this 
parameter is not in the configuration file out of the box:   
   - taskmanager.network.bufferSizeInBytes
Should I include it manually? Does it make any difference if the default value 
i.e.32 KB doesn't get picked up?Sorry too many questions.Pls let me know.I 
appreciate your help.Cheers,Amir-
- Forwarded Message -
 From: Robert Metzger 
 To: "d...@flink.apache.org" ; amir bahmanyari 
 
 Sent: Tuesday, September 13, 2016 1:15 AM
 Subject: Re: Flink Cluster Load Distribution Question
  
Hi Amir,

I would recommend to post such questions to the user@flink mailing list in
the future. This list is meant for development-related topics.

I think we need more details to understand why your application is not
running properly. Can you quickly describe what your topology is doing?
Are you setting the parallelism to a value >= 1 ?

Regards,
Robert


On Tue, Sep 13, 2016 at 6:35 AM, amir bahmanyari <
amirto...@yahoo.com.invalid> wrote:

> Hi Colleagues,Just joined this forum.I have done everything possible to
> get a 4 nodes Flink cluster to work peoperly & run a Beam app.It always
> generates system-output logs (*.out) in only one node. Its so slow
> for 4 nodes being there.Seems like the load is not distributed amongst all
> 4 nodes but only one node. Most of the time the one where JM runs.I
> run/tested it in a single node, and it took even faster to run the same
> load.Not sure whats not being configured right.1- why am I getting
> SystemOut .out log in only one server? All nodes get their TaskManager log
> files updated thu.2- why dont I see load being distributed amongst all 4
> nodes, but only one all the times.3- Why does the Dashboard show a 0 (zero)
> for Send/Receive numbers per all Task Managers.
> The Dashboard shows all the right stuff. Top shows not much of resources
> being stressed on any of the nodes.I can share its contents if it helps
> diagnosing 

Re: Fw: Flink Cluster Load Distribution Question

2016-09-14 Thread amir bahmanyari
Hi Aljoscha,
The JM log is also attached. Seems like everything is OK and assigned... to all
nodes... Not sure why I don't get performance? :-(
Thanks+regards,
Amir-

  From: Aljoscha Krettek 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Wednesday, September 14, 2016 1:48 AM
 Subject: Re: Fw: Flink Cluster Load Distribution Question
   
Hi,this is a different job from the Kafka Job that you have running, right?
Could you maybe post the code for that as well?
Cheers,Aljoscha
On Tue, 13 Sep 2016 at 20:14 amir bahmanyari  wrote:

Hi Robert,Sure, I am forwarding it to user. Sorry about that. I followed the 
"robot's" instructions :))Topology: 4 Azure A11 CentOS 7 nodes (16 cores, 110 
GB). Lets call them node1, 2, 3, 4.Flink Clustered with node1 running JM & a 
TM. Three more TM's running on node2,3, and 4 respectively.I have a Beam 
running FLink Runner underneath.The input data is received by Beam TextIO() 
reading off a 1.6 GB of data containing roughly 22 million tuples.All nodes 
have identical flink-conf.yaml, masters & slaves contents as follows:
flink-conf.yaml:
jobmanager.rpc.address: node1  jobmanager.rpc.port: 6123 
jobmanager.heap.mb: 1024 taskmanager.heap.mb: 102400 
taskmanager.numberOfTaskSlots: 16  taskmanager.memory.preallocate: false 
parallelism.default: 64 jobmanager.web.port: 8081 
taskmanager.network.numberOfBuffers: 4096


masters: node1:8081
slaves:node1node2
node3
node4

Everything looks normal at ./start-cluster.sh & all daemons start on all 
nodes.JM, TMs log files get generated on all nodes.Dashboard shows how all 
slots are being used.I deploy the Beam app to the cluster where JM is running 
at node1.a *.out file gets generated as data is being processed. No *.out on 
other nodes, just node1 where I deployed the fat jar.I tail -f the *.out log on 
node1 (master). starts fine...but slowly degrades & becomes extremely slow.As 
we speak, I started the Beam app 13 hrs ago and its still running.How can I 
prove that ALL NODES are involved in processing the data at the same time i.e. 
clustered?Do the above configurations look ok for a reasonable 
performance?Given above parameters set, how can I improve the performance in 
this cluster?What other information and or dashboard screen shots is needed to 
clarify this issue. I used these websites to do the configuration:
Apache Flink: Cluster Setup

Apache Flink: Configuration

In the second link, there is a config recommendation for the following but this 
parameter is not in the configuration file out of the box:   
   - taskmanager.network.bufferSizeInBytes
Should I include it manually? Does it make any difference if the default value 
i.e.32 KB doesn't get picked up?Sorry too many questions.Pls let me know.I 
appreciate your help.Cheers,Amir-
- Forwarded Message -
 From: Robert Metzger 
 To: "d...@flink.apache.org" ; amir bahmanyari 
 
 Sent: Tuesday, September 13, 2016 1:15 AM
 Subject: Re: Flink Cluster Load Distribution Question
  
Hi Amir,

I would recommend to post such questions to the user@flink mailing list in
the future. This list is meant for development-related topics.

I think we need more details to understand why your application is not
running properly. Can you quickly describe what your topology is doing?
Are you setting the parallelism to a value >= 1 ?

Regards,
Robert


On Tue, Sep 13, 2016 at 6:35 AM, amir bahmanyari <
amirto...@yahoo.com.invalid> wrote:

> Hi Colleagues,Just joined this forum.I have done everything possible to
> get a 4 nodes Flink cluster to work peoperly & run a Beam app.It always
> generates system-output logs (*.out) in only one node. Its so slow
> for 4 nodes being there.Seems like the load is not distributed amongst all
> 4 nodes but only one node. Most of the time the one where JM runs.I
> run/tested it in a single node, and it took even faster to run the same
> load.Not sure whats not being configured right.1- why am I getting
> SystemOut .out log in only one server? All nodes get their TaskManager log
> files updated thu.2- why dont I see load being distributed amongst all 4
> nodes, but only one all the times.3- Why does the Dashboard show a 0 (zero)
> for Send/Receive numbers per all Task Managers.
> The Dashboard shows all the right stuff. Top shows not much of resources
> being stressed on any of the nodes.I can share its contents if it helps
> diagnosing the issue.Thanks + I appreciate your valuable time, response &
> help.Amir-


 


   

flink-abahman-jobmanager-1-beam1.log
Description: Binary data


Re: Fw: Flink Cluster Load Distribution Question

2016-09-14 Thread amir bahmanyari
Hi Aljoscha,
Experimenting on a relatively smaller file, with everything fixed except
KafkaIO() vs. TextIO(), I get 50% better runtime performance in the Flink
cluster when reading tuples via TextIO(). I understand the NW involvement in
reading from a Kafka topic etc., but 50% is significant. Also, I experimented
with 64 partitions in the Kafka topic vs. 400; I get the exact same performance
& increasing the topic partitions doesn't improve anything. I thought some of
the 64 slots might get multiple partitions (over-parallelism), really pushing
it to its limit. 64 Kafka topic partitions & 400 Kafka topic partitions perform
the same while #slots=64.
It's still slow for a relatively large file though. Please advise if there is
something I can try to improve the cluster performance. Thanks+regards

  From: Aljoscha Krettek 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Wednesday, September 14, 2016 1:48 AM
 Subject: Re: Fw: Flink Cluster Load Distribution Question
   
Hi,this is a different job from the Kafka Job that you have running, right?
Could you maybe post the code for that as well?
Cheers,Aljoscha
On Tue, 13 Sep 2016 at 20:14 amir bahmanyari  wrote:

Hi Robert,Sure, I am forwarding it to user. Sorry about that. I followed the 
"robot's" instructions :))Topology: 4 Azure A11 CentOS 7 nodes (16 cores, 110 
GB). Lets call them node1, 2, 3, 4.Flink Clustered with node1 running JM & a 
TM. Three more TM's running on node2,3, and 4 respectively.I have a Beam 
running FLink Runner underneath.The input data is received by Beam TextIO() 
reading off a 1.6 GB of data containing roughly 22 million tuples.All nodes 
have identical flink-conf.yaml, masters & slaves contents as follows:
flink-conf.yaml:
jobmanager.rpc.address: node1  jobmanager.rpc.port: 6123 
jobmanager.heap.mb: 1024 taskmanager.heap.mb: 102400 
taskmanager.numberOfTaskSlots: 16  taskmanager.memory.preallocate: false 
parallelism.default: 64 jobmanager.web.port: 8081 
taskmanager.network.numberOfBuffers: 4096


masters: node1:8081
slaves:node1node2
node3
node4

Everything looks normal at ./start-cluster.sh & all daemons start on all 
nodes.JM, TMs log files get generated on all nodes.Dashboard shows how all 
slots are being used.I deploy the Beam app to the cluster where JM is running 
at node1.a *.out file gets generated as data is being processed. No *.out on 
other nodes, just node1 where I deployed the fat jar.I tail -f the *.out log on 
node1 (master). starts fine...but slowly degrades & becomes extremely slow.As 
we speak, I started the Beam app 13 hrs ago and its still running.How can I 
prove that ALL NODES are involved in processing the data at the same time i.e. 
clustered?Do the above configurations look ok for a reasonable 
performance?Given above parameters set, how can I improve the performance in 
this cluster?What other information and or dashboard screen shots is needed to 
clarify this issue. I used these websites to do the configuration:
Apache Flink: Cluster Setup

Apache Flink: Configuration

In the second link, there is a config recommendation for the following but this 
parameter is not in the configuration file out of the box:   
   - taskmanager.network.bufferSizeInBytes
Should I include it manually? Does it make any difference if the default value 
i.e.32 KB doesn't get picked up?Sorry too many questions.Pls let me know.I 
appreciate your help.Cheers,Amir-
- Forwarded Message -
 From: Robert Metzger 
 To: "d...@flink.apache.org" ; amir bahmanyari 
 
 Sent: Tuesday, September 13, 2016 1:15 AM
 Subject: Re: Flink Cluster Load Distribution Question
  
Hi Amir,

I would recommend to post such questions to the user@flink mailing list in
the future. This list is meant for development-related topics.

I think we need more details to understand why your application is not
running properly. Can you quickly describe what your topology is doing?
Are you setting the parallelism to a value >= 1 ?

Regards,
Robert


On Tue, Sep 13, 2016 at 6:35 AM, amir bahmanyari <
amirto...@yahoo.com.invalid> wrote:

> Hi Colleagues,Just joined this forum.I have done everything possible to
> get a 4 nodes Flink cluster to work peoperly & run a Beam app.It always
> generates system-output logs (*.out) in only one node. Its so slow
> for 4 nodes being there.Seems like the load is not distributed amongst all
> 4 nodes but only one node. Most of the time the one where JM runs.I
> run/tested it in a single node, and it took even faster to run the same
> load.Not sure whats not being configured right.1- why am I getting
> SystemOut .out log in only one server? All nodes get their TaskManager log
> files updated thu.2- why dont I see load being distributed amongst all 4
> nodes, but only one all the times.3- Why does the Dashboard show a 0 (zero)
> for Send/Receive numbers per all Task Managers.
> The Dashboard shows all the right stuff. Top shows no

Re: Data Transfer between TM should be encrypted

2016-09-14 Thread vinay patil
Hi Vijay,

Did you raise the PR for this task? I don't mind testing it out as well.

Regards,
Vinay Patil

On Tue, Aug 30, 2016 at 6:28 PM, Vinay Patil 
wrote:

> Hi Vijay,
>
> That's a good news for me. Eagerly waiting for this change so that I can
> integrate and test it before going live.
>
> Regards,
> Vinay Patil
>
> On Tue, Aug 30, 2016 at 4:06 PM, Vijay Srinivasaraghavan [via Apache Flink
> User Mailing List archive.] 
> wrote:
>
>> Hi Stephan,
>>
>> The dev work is almost complete except the Yarn mode deployment stuff
>> that needs to be patched. We are expecting to send a PR in a week or two.
>>
>> Regards
>> Vijay
>>
>>
>> On Tuesday, August 30, 2016 12:39 AM, Stephan Ewen <[hidden email]
>> > wrote:
>>
>>
>> Let me loop in Vijay, I think he is the one working on this and can
>> probably give the best estimate when it can be expected.
>>
>> @vijay: For the SSL/TLS transport encryption - do you have an estimate
>> for the timeline of that feature?
>>
>>
>> On Mon, Aug 29, 2016 at 8:54 PM, vinay patil <[hidden email]
>> > wrote:
>>
>> Hi Stephan,
>>
>> Thank you for your reply.
>>
>> Till when can I expect this feature to be integrated in master or release
>> version ?
>>
>> We are going to get production data (financial data) in October end , so
>> want to have this feature before that.
>>
>> Regards,
>> Vinay Patil
>>
>> On Mon, Aug 29, 2016 at 11:15 AM, Stephan Ewen [via Apache Flink User
>> Mailing List archive.] <[hidden email]> wrote:
>>
>> Hi!
>>
>> The way that the JIRA issue you linked will achieve this is by hooking
>> into the network stream pipeline directly, and encrypt the raw network byte
>> stream. We built the network stack on Netty, and will use Netty's SSL/TLS
>> handlers for that.
>>
>> That should be much more efficient than manual encryption/decryption in
>> each user function.
>>
>> Stephan
>>
>>
>>
>>
>>
>>
>> On Mon, Aug 29, 2016 at 6:12 PM, vinay patil <[hidden email]> wrote:
>>
>> Hi Ufuk,
>>
>> This is regarding this issue
>> https://issues.apache.org/jira /browse/FLINK-4404
>> 
>>
>> How can we achieve this, I am able to decrypt the data from Kafka coming
>> in, but I want to make sure that the data is encrypted when flowing between
>> TM's.
>>
>> One approach I can think of is to decrypt the data at the start of each
>> operator and encrypt it at the end of each operator, but I feel this is not
>> an efficient approach.
>>
>> I just want to check if there are alternatives to this and can this be
>> achieved by doing some configurations.
>>
>> Regards,
>> Vinay Patil
>>
>> --
>> View this message in context: Data Transfer between TM should be
>> encrypted
>> 
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive
>> 
>> at Nabble.com.
>>
>>
>>
>>
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://apache-flink-user-maili ng-list-archive.2336050.n4.
>> nabble.com/Data-Transfer-betwe en-TM-should-be-encrypted- tp8781p8782.html
>> 
>> To start a new topic under Apache Flink User Mailing List archive., email 
>> [hidden
>> email]
>> To unsubscribe from Apache Flink User Mailing List archive., click here.
>> NAML
>> 
>>
>>
>>
>> --
>> View this message in context: Re: Data Transfer between TM should be
>> encrypted
>> 
>>
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive
>> 
>> at Nabble.com.
>>
>>
>>
>>
>>
>>
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.
>> nabble.com/Data-Transfer-between-TM-should-be-encrypted-tp8781p8801.html
>> To start a new topic under Apache Flink User Mailing List archive., email
>> ml-