Re: Dependency convergence

2017-10-04 Thread Piotr Nowojski
I meant for the whole Flink project.

> On Oct 4, 2017, at 8:43 PM, Bowen Li  wrote:
> 
> +1. This is great, Piotrek!
> 
> BTW, can you clarify what you mean by 'project wide'? Is it the whole
> `flink` project or just `flink-connector-kafka`? I think it would be useful
> to apply it to the whole flink project. I've seen dependency conflict
> problems like this in flink-connector-kinesis. Enabling this in Flink would
> protect us from many hidden issues.
> 
> Bowen
> 
> 
> 
> On Wed, Oct 4, 2017 at 9:39 AM, Piotr Nowojski 
> wrote:
> 
>> Hi,
>> 
>> I have spent the last couple of days trying to find and fix Kafka test
>> instabilities on Travis, and I think I have finally found the main reason: a
>> dependency conflict on Netty. flakka was pulling in 3.8 and ZooKeeper 3.10.
>> The effect was very subtle, because only rarely, in some corner cases (but
>> not always), Netty would deadlock itself…
>> 
>> Because of that I would like to enable the dependencyConvergence rule in
>> maven-enforcer-plugin project-wide - it catches this error immediately:
>> 
>> Dependency convergence error for io.netty:netty:3.10.5.Final paths to
>> dependency are:
>> +-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
>>  +-org.apache.kafka:kafka_2.11:0.9.0.1
>>+-org.apache.zookeeper:zookeeper:3.4.10
>>  +-io.netty:netty:3.10.5.Final
>> and
>> +-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
>>  +-org.apache.flink:flink-runtime_2.11:1.4-SNAPSHOT
>>+-com.data-artisans:flakka-remote_2.11:2.3-custom
>>  +-io.netty:netty:3.8.0.Final
>> 
>> Currently this rule fails with multiple errors, but after losing those
>> couple of days I’m pretty determined to fix all of them “just in case”. The
>> dependencyConvergence rule would protect us in the future against such
>> nasty, subtle bugs. Does anyone have any objections/issues that I’m not
>> aware of?
>> 
>> Piotrek



Re: Dependency convergence

2017-10-04 Thread Bowen Li
+1. This is great, Piotrek!

BTW, can you clarify what you mean by 'project wide'? Is it the whole
`flink` project or just `flink-connector-kafka`? I think it would be useful
to apply it to the whole flink project. I've seen dependency conflict
problems like this in flink-connector-kinesis. Enabling this in Flink would
protect us from many hidden issues.
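
As an aside, once the rule flags such a conflict, the usual remedy is to pin a
single version explicitly, e.g. via dependencyManagement in the parent pom (a
hypothetical sketch using the Netty case from the quoted mail below; the actual
fix may just as well use per-dependency exclusions):

<dependencyManagement>
  <dependencies>
    <!-- Force one Netty version so that all transitive paths converge. -->
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
      <version>3.10.5.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>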

Bowen



On Wed, Oct 4, 2017 at 9:39 AM, Piotr Nowojski 
wrote:

> Hi,
>
> I have spent the last couple of days trying to find and fix Kafka test
> instabilities on Travis, and I think I have finally found the main reason: a
> dependency conflict on Netty. flakka was pulling in 3.8 and ZooKeeper 3.10.
> The effect was very subtle, because only rarely, in some corner cases (but
> not always), Netty would deadlock itself…
>
> Because of that I would like to enable the dependencyConvergence rule in
> maven-enforcer-plugin project-wide - it catches this error immediately:
>
> Dependency convergence error for io.netty:netty:3.10.5.Final paths to
> dependency are:
> +-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
>   +-org.apache.kafka:kafka_2.11:0.9.0.1
> +-org.apache.zookeeper:zookeeper:3.4.10
>   +-io.netty:netty:3.10.5.Final
> and
> +-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
>   +-org.apache.flink:flink-runtime_2.11:1.4-SNAPSHOT
> +-com.data-artisans:flakka-remote_2.11:2.3-custom
>   +-io.netty:netty:3.8.0.Final
>
> Currently this rule fails with multiple errors, but after losing those
> couple of days I’m pretty determined to fix all of them “just in case”. The
> dependencyConvergence rule would protect us in the future against such
> nasty, subtle bugs. Does anyone have any objections/issues that I’m not
> aware of?
>
> Piotrek


Dependency convergence

2017-10-04 Thread Piotr Nowojski
Hi,

I have spent the last couple of days trying to find and fix Kafka test
instabilities on Travis, and I think I have finally found the main reason: a
dependency conflict on Netty. flakka was pulling in 3.8 and ZooKeeper 3.10.
The effect was very subtle, because only rarely, in some corner cases (but not
always), Netty would deadlock itself…

Because of that I would like to enable the dependencyConvergence rule in
maven-enforcer-plugin project-wide - it catches this error immediately:

Dependency convergence error for io.netty:netty:3.10.5.Final paths to 
dependency are:
+-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
  +-org.apache.kafka:kafka_2.11:0.9.0.1
+-org.apache.zookeeper:zookeeper:3.4.10
  +-io.netty:netty:3.10.5.Final
and
+-org.apache.flink:flink-connector-kafka-0.9_2.11:1.4-SNAPSHOT
  +-org.apache.flink:flink-runtime_2.11:1.4-SNAPSHOT
+-com.data-artisans:flakka-remote_2.11:2.3-custom
  +-io.netty:netty:3.8.0.Final
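
Enabling the rule would roughly amount to the following in the root pom.xml (a
minimal sketch of the standard maven-enforcer-plugin setup; the plugin version
and any per-module exclusions are left out):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>dependency-convergence</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fail the build whenever two paths pull in different versions. -->
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>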

Currently this rule fails with multiple errors, but after losing those couple of
days I’m pretty determined to fix all of them “just in case”. The
dependencyConvergence rule would protect us in the future against such nasty,
subtle bugs. Does anyone have any objections/issues that I’m not aware of?

Piotrek

Re: System resource logger

2017-10-04 Thread Greg Hogan
What if we added these as system metrics and added a way to write metrics to a 
(separate?) log file?
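
For illustration, a rough sketch of what exposing such readings through Flink's
metric system could look like (hypothetical: it assumes the oshi library
mentioned below for the readings and registers plain Gauges on some
MetricGroup; the metric names and the group used are made up):

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;
import oshi.SystemInfo;
import oshi.hardware.HardwareAbstractionLayer;

public final class SystemResourceMetrics {

    private SystemResourceMetrics() {}

    /** Registers a few system-wide gauges on the given (e.g. TaskManager) metric group. */
    public static void register(MetricGroup group) {
        final HardwareAbstractionLayer hardware = new SystemInfo().getHardware();

        // CPU load between two consecutive reads of the metric (oshi 3.x-era API).
        group.gauge("System.CPU.Load", (Gauge<Double>) () ->
                hardware.getProcessor().getSystemCpuLoadBetweenTicks());

        // System-wide available memory in bytes.
        group.gauge("System.Memory.Available", (Gauge<Long>) () ->
                hardware.getMemory().getAvailable());
    }
}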


> On Oct 4, 2017, at 10:13 AM, Piotr Nowojski  wrote:
> 
> Hi,
> 
> Lately I was debugging some weird test failures on Travis and I needed to 
> look into metrics like:
> - User, System, IOWait, IRQ CPU usage (based on CPU ticks since the previous 
> check)
> - system-wide memory consumption (including making sure that swap was 
> disabled)
> - network usage 
> - etc…
> 
> All of this without access to the machines themselves. For this purpose I 
> implemented a periodic daemon-thread logger. The log output looked like this:
> 
> https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7 
> 
> 
> I think it would be nice to add this feature to Flink itself, by extending the 
> existing MemoryLogger. The same lack of information that I had with Travis 
> could easily happen in production environments. The problem is that there is 
> no easy way to obtain this kind of information without using some external 
> library (think about cross-platform support). For that I have used:
> 
> https://github.com/oshi/oshi 
> 
> It has minimal additional dependencies; one thing worth noting is JNA 
> - its JAR weighs ~1 MB. We would have two options to add this feature:
> 
> 1. Include this oshi dependency in flink-runtime
> 2. Wrap oshi in a flink-contrib/flink-resource-logger module and make this 
> new module an optional, dynamically loaded dependency of flink-runtime (used 
> only if the user manually copies flink-resource-logger.jar to the classpath).
> 
> I would lean toward 1., since it’s a powerful tool and its dependencies 
> are pretty minimal (except for JNA’s jar size). What do you think?
> 
> Piotrek


System resource logger

2017-10-04 Thread Piotr Nowojski
Hi,

Lately I was debugging some weird test failures on Travis and I needed to look 
into metrics like:
- User, System, IOWait, IRQ CPU usage (based on CPU ticks since the previous check)
- system-wide memory consumption (including making sure that swap was disabled)
- network usage 
- etc…

All of this without access to the machines themselves. For this purpose I 
implemented a periodic daemon-thread logger. The log output looked like this:

https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7 
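
A simplified sketch of what such a periodic daemon-thread logger could look
like (not the actual gist code; it assumes oshi's 3.x-era API - SystemInfo,
CentralProcessor#getSystemCpuLoadBetweenTicks(), GlobalMemory including its
swap accessors - plus the usual slf4j logging; names are illustrative):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import oshi.SystemInfo;
import oshi.hardware.CentralProcessor;
import oshi.hardware.GlobalMemory;

/** Daemon thread that periodically logs system-wide CPU and memory usage via oshi. */
public class SystemResourceLogger extends Thread {

    private static final Logger LOG = LoggerFactory.getLogger(SystemResourceLogger.class);

    private final long intervalMillis;

    public SystemResourceLogger(long intervalMillis) {
        this.intervalMillis = intervalMillis;
        setName("system-resource-logger");
        setDaemon(true); // must never keep the JVM alive
    }

    @Override
    public void run() {
        SystemInfo systemInfo = new SystemInfo();
        CentralProcessor cpu = systemInfo.getHardware().getProcessor();
        GlobalMemory memory = systemInfo.getHardware().getMemory();

        while (!isInterrupted()) {
            // CPU load is derived from the ticks since the previous call.
            double cpuLoad = cpu.getSystemCpuLoadBetweenTicks();

            LOG.info("CPU load: {}, memory available/total: {} / {} bytes, swap used: {} bytes",
                    String.format("%.2f", cpuLoad),
                    memory.getAvailable(), memory.getTotal(), memory.getSwapUsed());

            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}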


I think it would be nice to add this feature to Flink itself, by extending the 
existing MemoryLogger. The same lack of information that I had with Travis could 
easily happen in production environments. The problem is that there is no 
easy way to obtain this kind of information without using some external 
library (think about cross-platform support). For that I have used:

https://github.com/oshi/oshi 

It has minimal additional dependencies; one thing worth noting is JNA - 
its JAR weighs ~1 MB. We would have two options to add this feature:

1. Include this oshi dependency in flink-runtime
2. Wrap oshi in a flink-contrib/flink-resource-logger module and make this new 
module an optional, dynamically loaded dependency of flink-runtime (used only 
if the user manually copies flink-resource-logger.jar to the classpath).

I would lean toward 1., since it’s a powerful tool and its dependencies are 
pretty minimal (except for JNA’s jar size). What do you think?

Piotrek



[jira] [Created] (FLINK-7761) Twitter example is not self-contained

2017-10-04 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-7761:
---

 Summary: Twitter example is not self-contained
 Key: FLINK-7761
 URL: https://issues.apache.org/jira/browse/FLINK-7761
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.4.0


The Twitter example jar is not self-contained as it excludes the shaded guava 
dependency from the twitter connector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7760) Restore failing from external checkpointing metadata.

2017-10-04 Thread Shashank Agarwal (JIRA)
Shashank Agarwal created FLINK-7760:
---

 Summary: Restore failing from external checkpointing metadata.
 Key: FLINK-7760
 URL: https://issues.apache.org/jira/browse/FLINK-7760
 Project: Flink
  Issue Type: Bug
  Components: CEP, State Backends, Checkpointing
Affects Versions: 1.3.2
 Environment: Yarn, Flink 1.3.2, HDFS,  FsStateBackend
Reporter: Shashank Agarwal


My job failed due to a failure of Cassandra. I have enabled 
externalized checkpoints. But when the job tries to restore from that checkpoint, 
it fails continuously with the following error.

{code:java}
2017-10-04 09:39:20,611 INFO  org.apache.flink.runtime.taskmanager.Task 
- KeyedCEPPatternOperator -> Map (1/2) 
(8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED.
java.lang.IllegalStateException: Could not initialize keyed state backend.
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: invalid type code: 00
at 
java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519)
at 
java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553)
at 
java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455)
at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211)
at 
org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169)
at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957)
at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852)
at 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132)
at 
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518)
at 
org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311)
... 6 more
{code}

I have also tried to start a new job after the failure with the parameter 
{code:java}-s [checkpoint meta data path]{code}





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Fwd: Re: Flink Watermark and timing

2017-10-04 Thread Timo Walther

Hi Björn,

the behavior at the borderlines is clearly defined by the API: "start 
timestamp (inclusive) and an end timestamp (exclusive)". So it is always 
[0-4] [5-9]. You could increase the interval by one millisecond to 
include 5.
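
To make that concrete, here is a small hedged sketch against the 1.3-style
DataStream API (the class, timestamps, and key are illustrative; timestamps are
epoch millis in UTC): the element stamped exactly 2000-01-01 02:10:00 falls
into the window [02:10:00, 02:15:00), because that instant is the inclusive
start of that window.

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowBorderlineExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        env.fromElements(
                Tuple2.of("key", 946692000000L),  // 2000-01-01 02:00:00 UTC
                Tuple2.of("key", 946692600000L))  // 2000-01-01 02:10:00 UTC, exactly on a border
            .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                @Override
                public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                    return element.f1; // the event timestamp decides the window, not the watermark
                }
            })
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .apply(new WindowFunction<Tuple2<String, Long>, String, Tuple, TimeWindow>() {
                @Override
                public void apply(Tuple key, TimeWindow window, Iterable<Tuple2<String, Long>> events,
                                  Collector<String> out) {
                    for (Tuple2<String, Long> event : events) {
                        // The 02:10:00 event is reported for the window [02:10:00, 02:15:00).
                        out.collect("window [" + window.getStart() + ", " + window.getEnd()
                                + ") contains " + event.f1);
                    }
                }
            })
            .print();

        env.execute("window-borderline-example");
    }
}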



Regards,

Timo



-------- Forwarded Message --------
Subject: Re: Flink Watermark and timing
Date:    Tue, 3 Oct 2017 06:37:13 +0200
From:    Björn Zachrisson 
To:      Timo Walther 



Hi Timo,

One more question regarding that, to clarify.
Where do I specify which window an event goes to when it arrives exactly on 
the window borderline? Say the window sizes are [0-5] [5-10] and the event 
arrives at exactly 5.

Where should the event go, and can I control this?

Regards
Björn

2017-10-02 19:28 GMT+02:00 Timo Walther:


   Hi Björn,


   I don't know if I get your example correctly, but I think your
   explanation "All events up to and equal to the watermark should be
   handled in the previous window" is not 100% correct. Watermarks just
   indicate the progress ("until here we have seen all events with a
   lower timestamp than X") and trigger the evaluation of a window. The
   assignment of events to windows is based on the timestamp, not the
   watermark. The documentation will be improved for the upcoming release:

   
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#window-assigners
   


   "Time-based windows have a start timestamp (inclusive) and an end
   timestamp (exclusive) that together describe the size of the window. "

   I hope this helps.

   Regards,
   Timo


   On 10/2/17 at 1:06 PM, Björn Zachrisson wrote:

Hi,

I have a question regarding timing of events.

According to:

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_time.html#event-time-and-watermarks



All events up to and equal to the watermark should be handled in "the
previous window".

In my case I use event timestamps.


I'm testing out the timing.

The case is events from 2000-01-01 02:00:00 up to 2000-01-01
02:20:00, where each event is 2 minutes apart. I try to group the
events into 5-minute windows:

2000-01-01 02:00:00 => 2000-01-01 02:05:00
2000-01-01 02:05:00 => 2000-01-01 02:10:00
2000-01-01 02:10:00 => 2000-01-01 02:15:00
2000-01-01 02:15:00 => 2000-01-01 02:20:00

However, the event at the exact time 02:10:00 (94669260) is put
in the window "2000-01-01 02:10:00 => 2000-01-01 02:15:00", which
is not according to what I can read on the wiki.

This is the exact result:
2000-01-01 02:00:00, 94669200
2000-01-01 02:02:00, 94669212
2000-01-01 02:04:00, 94669224

2000-01-01 02:06:00, 94669236
2000-01-01 02:08:00, 94669248

2000-01-01 02:10:00, 94669260
2000-01-01 02:12:00, 94669272
2000-01-01 02:14:00, 94669284

2000-01-01 02:16:00, 94669296
2000-01-01 02:18:00, 94669308

2000-01-01 02:20:00, 94669320

Is this because I'm using an event-time extractor, or what might
be the reason?

Regards
Björn







[jira] [Created] (FLINK-7759) Fix Bug that fieldName/functionName with prefix equals "true" or "false" can't be parsed by ExpressionParser.

2017-10-04 Thread Ruidong Li (JIRA)
Ruidong Li created FLINK-7759:
-

 Summary: Fix Bug that fieldName/functionName with prefix equals 
"true" or "false" can't be parsed by ExpressionParser.
 Key: FLINK-7759
 URL: https://issues.apache.org/jira/browse/FLINK-7759
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Ruidong Li
Assignee: Ruidong Li


Just call {{ExpressionParser.parseExpression}} with an expression whose prefix 
equals "true" or "false", e.g. 
{{ExpressionParser.parseExpression("true_target")}} or 
{{ExpressionParser.parseExpression("falsex")}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Unable to write snapshots to S3 on EMR

2017-10-04 Thread Till Rohrmann
Hi Andy,

this looks to me indeed like a dependency problem. I assume that EMR or
something else is pulling in an incompatible version of Hadoop.

The classpath you've posted - is this the one logged in the log files
(TaskManager log), or did you compile it yourself? In the latter case, it
would also be helpful to get access to the TaskManager logs.
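
For context, the failing code path in the stack trace quoted further down is
reached by a checkpoint setup roughly like this (a hypothetical sketch; bucket
path, interval, and the placeholder pipeline are made up, not taken from the
actual job):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointToS3Sketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s; writing checkpoint data to S3 is what pulls in the
        // Hadoop/EMR filesystem classes that clash on Configuration.addResource(...).
        env.enableCheckpointing(60_000);
        env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/flink/checkpoints"));

        // A placeholder pipeline just so the job graph is not empty.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpoint-to-s3-sketch");
    }
}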

Cheers,
Till

On Mon, Oct 2, 2017 at 10:20 PM, Andy M.  wrote:

> Hi Fabian,
>
> 1) I have looked at the linked docs, and from what I can tell no setup
> should really be needed to get Flink working (other than downloading
> the correct binaries, which I believe I did).
> 2) I have downloaded the Flink 1.3.2 binaries
> (flink-1.3.2-bin-hadoop27-scala_2.11.tgz).
> This is for Hadoop 2.7.x, which matches EMR 5.8.0.
>
> I appreciate any help or guidance you can provide me in fixing my problems;
> please let me know if there is anything else I can provide you.
>
> Thank you
>
> On Mon, Oct 2, 2017 at 4:12 PM, Fabian Hueske  wrote:
>
> > Hi Andy,
> >
> > I'm not an AWS expert, so I'll just check on some common issues.
> >
> > I guess you already had a look at the Flink docs for AWS/EMR, but I'll
> > post the link just to be sure [1].
> >
> > Since you are using Flink 1.3.2 (EMR 5.8.0 comes with Flink 1.3.1), did
> > you build Flink yourself or did you download the binaries?
> > Does the Hadoop version of the Flink build match the Hadoop version of
> > EMR 5.8.0, i.e., Hadoop 2.7.x?
> >
> > Best, Fabian
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html
> >
> > 2017-10-02 21:51 GMT+02:00 Andy M. :
> >
> > > Hi Fabian,
> > >
> > > Sorry, I just realized I forgot to include that part.  The error
> > > returned is:
> > >
> > > java.lang.NoSuchMethodError:
> > > org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> > > at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:93)
> > > at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.initialize(HadoopFileSystem.java:328)
> > > at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:350)
> > > at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:389)
> > > at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
> > > at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.<init>(FsCheckpointStreamFactory.java:99)
> > > at org.apache.flink.runtime.state.filesystem.FsStateBackend.createStreamFactory(FsStateBackend.java:282)
> > > at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createStreamFactory(RocksDBStateBackend.java:273)
> > >
> > > I believe it has something to do with the classpath, but I am unsure
> > > why or how to fix it.  The classpath being used during the execution is:
> > > /home/hadoop/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar:/home/hadoop/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/home/hadoop/flink-1.3.2/lib/log4j-1.2.17.jar:/home/hadoop/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar:/home/hadoop/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar::/etc/hadoop/conf:
> > >
> > > I decompiled flink-shaded-hadoop2-uber-1.3.2.jar and the
> > > addResource function does seem to be there.
> > >
> > > Thank you
> > >
> > > On Mon, Oct 2, 2017 at 3:43 PM, Fabian Hueske 
> wrote:
> > >
> > > > Hi Andy,
> > > >
> > > > can you describe in more detail what exactly isn't working?
> > > > Do you see error messages in the log files or on the console?
> > > >
> > > > Thanks, Fabian
> > > >
> > > > 2017-10-02 15:52 GMT+02:00 Andy M. :
> > > >
> > > > > Hello,
> > > > >
> > > > > I am about to deploy my first Flink project to production, but I am
> > > > > running into a very big hurdle.  I am unable to launch my project so
> > > > > it can write to an S3 bucket.  My project is running on an EMR
> > > > > cluster, where I have installed Flink 1.3.2.  I am using Yarn to
> > > > > launch the application, and it seems to run fine unless I am trying
> > > > > to enable checkpointing (with an S3 target).  I am looking to use
> > > > > RocksDB as my checkpointing backend.  I have asked in a few places,
> > > > > and I am still unable to find a solution to this problem.  Here are
> > > > > my steps for creating a cluster and launching my application;
> > > > > perhaps I am missing a step.  I'd be happy to provide any additional
> > > > > information if needed.
> > > > >
> > > > > AWS Portal:
> > > > >
> > > > > 1) EMR -> Create Cluster
> > > > > 2) Advanced Options
> > > > > 3) Release = emr-5.8.0
> > > > > 4) Only select Hadoop 2.7.3
> > > > > 5) Next 

[jira] [Created] (FLINK-7758) Fix bug Kafka09Fetcher add offset metrics

2017-10-04 Thread Hai Zhou UTC+8 (JIRA)
Hai Zhou UTC+8 created FLINK-7758:
-

 Summary: Fix bug  Kafka09Fetcher add offset metrics 
 Key: FLINK-7758
 URL: https://issues.apache.org/jira/browse/FLINK-7758
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.3.2
Reporter: Hai Zhou UTC+8
Assignee: Hai Zhou UTC+8
 Fix For: 1.4.0


In Kafka09Fetcher, the _KafkaConsumer_ kafkaMetricGroup is added 
without first checking that the useMetrics variable is true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)