Error when accessing secure HDFS with standalone Flink

2016-03-10 Thread Stefano Baghino
Hello everybody,

my colleagues and I have been running some tests on Flink 1.0.0 in a
secure (Kerberos) environment. Yesterday we ran several tests on the
standalone Flink deployment but couldn't get it to access HDFS. Judging
from the error, it looks like Flink is not trying to authenticate itself
with Kerberos. The root cause of the error is
"org.apache.hadoop.security.AccessControlException: SIMPLE authentication
is not enabled.  Available:[TOKEN, KERBEROS]". I've put the full logs in this
gist. I've
gone through the source code, and judging from what I saw, this error is
emitted by Hadoop when a client does not use any authentication method on a
secure cluster. Also, from the Flink source code it looks like a log message
(at INFO level) should be printed when a job runs on a secure cluster,
stating that fact.

To recap the steps I followed to set up the environment: I built
Flink and put it in the same folder on the two nodes of the cluster,
adjusted the configs, assigned its ownership (and write permissions) to a
group, then ran kinit with a user belonging to that group on both
nodes, and finally ran start-cluster.sh and deployed the job. I tried both
running the job as the same user who ran the start-cluster.sh script and
as another one (still authenticated with Kerberos on both nodes).

The core-site.xml correctly states that the authentication method is
kerberos, and using the hdfs CLI everything runs as expected. Thinking it
could be an error tied to permissions on the core-site.xml file, I also
added the user running the start-cluster.sh script to the hadoop group,
which owns the file, but that yielded the same results, unfortunately.

Can you help me troubleshoot this issue? Thank you so much in advance!
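
For reference, a minimal standalone check (just a sketch, not Flink code; the
core-site.xml path below is hypothetical) that shows whether a plain JVM process
on a node picks up the Kerberos settings and the ticket cache created by kinit:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation

object KerberosCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // hypothetical location; point it at the same core-site.xml the hdfs CLI uses
    conf.addResource(new Path("file:///etc/hadoop/conf/core-site.xml"))
    UserGroupInformation.setConfiguration(conf)
    // should print "kerberos" if the config is actually read
    println(conf.get("hadoop.security.authentication"))
    // should report the principal from the kinit ticket cache
    println(UserGroupInformation.getCurrentUser)
  }
}

If this prints "simple" or only the OS user instead of the Kerberos principal,
the process is not seeing the same configuration as the hdfs CLI.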

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: DataSet -> DataStream

2016-03-10 Thread Ashutosh Kumar
As the data is already collected, why do you want to add one more layer of Kafka?
Instead, you can start processing your data.
Thanks
Ashutosh
On Mar 11, 2016 4:19 AM, "Prez Cannady"  wrote:

>
> I’d like to pour some data I’ve collected into a DataSet via JDBC into a
> Kafka topic, but I think I need to transform my DataSet into a DataStream
> first.  If anyone has a clue how to proceed, I’d appreciate it; or let me
> know if I’m completely off track.
>
>
> Prez Cannady
> p: 617 500 3378
> e: revp...@opencorrelate.org
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez


Re: 404 error for Flink examples

2016-03-10 Thread janardhan shetty
Thanks Balaji.

This needs to be updated in the Job.java file of the quickstart application.
I am using version 1.0.

On Thu, Mar 10, 2016 at 9:23 PM, Balaji Rajagopalan <
balaji.rajagopa...@olacabs.com> wrote:

> You could try this link.
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/examples.html
>
> On Fri, Mar 11, 2016 at 9:56 AM, janardhan shetty 
> wrote:
>
>> Hi,
>>
>> I was looking at the examples for Flink applications and the comment in
>> quickstart/job results in 404 for the web page.
>>
>> http://flink.apache.org/docs/latest/examples.html
>>
>> This needs to be updated
>>
>>
>>
>


Re: 404 error for Flink examples

2016-03-10 Thread Balaji Rajagopalan
You could try this link.

https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/examples.html

On Fri, Mar 11, 2016 at 9:56 AM, janardhan shetty 
wrote:

> Hi,
>
> I was looking at the examples for Flink applications and the comment in
> quickstart/job results in 404 for the web page.
>
> http://flink.apache.org/docs/latest/examples.html
>
> This needs to be updated
>
>
>


Re: DataSet -> DataStream

2016-03-10 Thread Balaji Rajagopalan
You could, I suppose, write the DataSet to a file sink and then read the
file back as a DataStream.
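
A rough sketch of that workaround, assuming Flink 1.0 with the Kafka 0.8
connector (file path, broker address and topic name are placeholders), would be
two small jobs:

object DumpDataSetJob {
  import org.apache.flink.api.scala._

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // stand-in for the DataSet read via JDBC
    val data: DataSet[String] = env.fromElements("a", "b", "c")
    data.writeAsText("/tmp/dataset-dump")
    env.execute("dump DataSet to file")
  }
}

object FileToKafkaJob {
  import org.apache.flink.streaming.api.scala._
  import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer08
  import org.apache.flink.streaming.util.serialization.SimpleStringSchema

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // read the dumped file back as a DataStream and push it to a Kafka topic
    env.readTextFile("/tmp/dataset-dump")
      .addSink(new FlinkKafkaProducer08[String]("broker:9092", "my-topic", new SimpleStringSchema()))
    env.execute("file to Kafka")
  }
}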

On Fri, Mar 11, 2016 at 4:18 AM, Prez Cannady 
wrote:

>
> I’d like to pour some data I’ve collected into a DataSet via JDBC into a
> Kafka topic, but I think I need to transform my DataSet into a DataStream
> first.  If anyone has a clue how to proceed, I’d appreciate it; or let me
> know if I’m completely off track.
>
>
> Prez Cannady
> p: 617 500 3378
> e: revp...@opencorrelate.org
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez


404 error for Flink examples

2016-03-10 Thread janardhan shetty
Hi,

I was looking at the examples for Flink applications, and the link in the
quickstart Job comment results in a 404 for the web page.

http://flink.apache.org/docs/latest/examples.html

This needs to be updated


DataSet -> DataStream

2016-03-10 Thread Prez Cannady

I’d like to pour some data I’ve collected into a DataSet via JDBC into a Kafka 
topic, but I think I need to transform my DataSet into a DataStream first.  If 
anyone has a clue how to proceed, I’d appreciate it; or let me know if I’m 
completely off track.


Prez Cannady  
p: 617 500 3378  
e: revp...@opencorrelate.org   
GH: https://github.com/opencorrelate   
LI: https://www.linkedin.com/in/revprez


Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
After some poking around I noticed
that flink-connector-elasticsearch_2.10-1.0.0.jar contains shaded asm
classes. If I remove that dependency from my project then I do not get the
IllegalArgumentException.
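
In case it helps others hitting this, a rough sbt sketch of dropping the
connector when it is only pulled in transitively and is not actually needed
(the internal-lib coordinates are made up):

// build.sbt: keep the dependency, but exclude the connector that ships shaded asm classes
libraryDependencies += "com.example" %% "internal-lib" % "1.2.3" exclude("org.apache.flink", "flink-connector-elasticsearch_2.10")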


On Thu, Mar 10, 2016 at 11:51 AM Zach Cox  wrote:

> Here are the jars on the classpath when I try to run our Flink job in a
> local environment (via `sbt run`):
>
>
> https://gist.githubusercontent.com/zcox/0992aba1c517b51dc879/raw/7136ec034c2beef04bd65de9f125ce3796db511f/gistfile1.txt
>
> There are many transitive dependencies pulled in from internal library
> projects that probably need to be cleaned out. Maybe we are including
> something that conflicts? Or maybe something important is being excluded?
>
> Are the asm classes included in Flink jars in some shaded form?
>
> Thanks,
> Zach
>
>
> On Thu, Mar 10, 2016 at 5:06 AM Stephan Ewen  wrote:
>
>> Dependency shading changed a bit between RC4 and RC5 - maybe a different
>> minor ASM version is now included in the "test" scope.
>>
>> Can you share the dependencies of the problematic project?
>>
>> On Thu, Mar 10, 2016 at 12:26 AM, Zach Cox  wrote:
>>
>>> I also noticed when I try to run this application in a local
>>> environment, I get the same IllegalArgumentException.
>>>
>>> When I assemble this application into a fat jar and run it on a Flink
>>> cluster using the CLI tools, it seems to run fine.
>>>
>>> Maybe my local classpath is missing something that is provided on the
>>> Flink task managers?
>>>
>>> -Zach
>>>
>>>
>>> On Wed, Mar 9, 2016 at 5:16 PM Zach Cox  wrote:
>>>
 Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit
 test:

IllegalArgumentException:   (null:-1)
 org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)
 org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)

 org.apache.flink.api.scala.InnerClosureFinder.<init>(ClosureCleaner.scala:279)

 org.apache.flink.api.scala.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:95)

 org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:115)

 org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:568)

 org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.scala:498)

 The line that causes that exception is just adding
 a FlinkKafkaConsumer08 source.

 ClassVisitor [1] seems to throw that IllegalArgumentException when it
 is not given a valid api version number, but InnerClosureFinder [2] looks
 fine to me.

 Any idea what might be causing this? This unit test worked fine with
 1.0.0-rc0 jars.

 Thanks,
 Zach

 [1]
 http://websvn.ow2.org/filedetails.php?repname=asm&path=%2Ftrunk%2Fasm%2Fsrc%2Forg%2Fobjectweb%2Fasm%2FClassVisitor.java
 [2]
 https://github.com/apache/flink/blob/master/flink-scala/src/main/scala/org/apache/flink/api/scala/ClosureCleaner.scala#L279



>>


Re: Checkpoint

2016-03-10 Thread Vijay Srinivasaraghavan
Thanks Ufuk and Stephan.

I have added an identity mapper and disabled chaining. With that, I am able to see
the backpressure alert on the identity mapper task.

I have noticed one thing: when I introduce a delay (sleep) on the subsequent
task, sometimes checkpointing does not work. I can see the checkpoint trigger, but
the files are not moved out of the "pending" state. I will try to reproduce this to
find the pattern, but are you aware of any such scenario?

Regards,
Vijay

On Thursday, March 10, 2016 2:51 AM, Stephan Ewen  wrote:
 

 Just to be sure: Is the task whose backpressure you want to monitor the Kafka 
Source?
There is an open issue that backpressure monitoring does not work for the Kafka 
Source: https://issues.apache.org/jira/browse/FLINK-3456
To circumvent that, add an "IdentityMapper" after the Kafka source, make sure 
it is non-chained, and monitor the backpressure on that MapFunction.
Greetings,
Stephan

On Thu, Mar 10, 2016 at 11:23 AM, Robert Metzger  wrote:

Hi Vijay,

regarding your other questions:

1) On the TaskManagers, the FlinkKafkaConsumers will write the partitions they 
are going to read in the log. There is currently no way of seeing the state of 
a checkpoint in Flink (which is the offsets).
However, once a checkpoint is completed, the Kafka consumer is committing the 
offset to the Kafka broker. (I could not find tool to get the committed offsets 
from the broker, but its either stored in ZK or in a special topic by the 
broker. In Kafka 0.8 that's easily doable with the 
kafka.tools.ConsumerOffsetChecker)

2) Do you see duplicate data written by the rolling file sink? Or do you see it
somewhere else?
HDP 2.4 is using Hadoop 2.7.1 so the truncate() of invalid data should
actually work properly.




On Thu, Mar 10, 2016 at 10:44 AM, Ufuk Celebi  wrote:

How many vertices does the web interface show and what parallelism are
you running? If the sleeping operator is chained you will not see
anything.

If your goal is to just see some back pressure warning, you can call
env.disableOperatorChaining() and re-run the program. Does this work?

– Ufuk


On Thu, Mar 10, 2016 at 1:35 AM, Vijay Srinivasaraghavan
 wrote:
> Hi Ufuk,
>
> I have increased the sampling size to 1000 and decreased the refresh
> interval by half. In my Kafka topic I have pumped million messages which is
> read by KafkaConsumer pipeline and then pass it to a transofmation step
> where I have introduced sleep (3 sec) for every single message received and
> the final step is HDFS sink using RollingSinc API.
>
> jobmanager.web.backpressure.num-samples: 1000
> jobmanager.web.backpressure.refresh-interval: 3
>
>
> I was hoping to see the backpressure tab from UI to display some warning but
> I still see "OK" message.
>
> This makes me wonder if I am testing the backpressure scenario properly or
> not?
>
> Regards
> Vijay
>
> On Monday, March 7, 2016 3:19 PM, Ufuk Celebi  wrote:
>
>
> Hey Vijay!
>
> On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan
>  wrote:
>> 3) How can I simulate and verify backpressure? I have introduced some
>> delay
>> (Thread Sleep) in the job before the sink but the "backpressure" tab from
>> UI
>> does not show any indication of whether backpressure is working or not.
>
> If a task is slow, it is back pressuring upstream tasks, e.g. if your
> transformations have the sleep, the sources should be back pressured.
> It can happen that even with the sleep the tasks still produce their
> data as fast as they can and hence no back pressure is indicated in
> the web interface. You can increase the sleep to check this.
>
> The mechanism used to determine back pressure is based on sampling the
> stack traces of running tasks. You can increase the number of samples
> and/or decrease the delay between samples via config parameters shown
> in [1]. It can happen that the samples miss the back pressure
> indicators, but usually the defaults work fine.
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-web-frontend
>
>
>






  

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
Here are the jars on the classpath when I try to run our Flink job in a
local environment (via `sbt run`):

https://gist.githubusercontent.com/zcox/0992aba1c517b51dc879/raw/7136ec034c2beef04bd65de9f125ce3796db511f/gistfile1.txt

There are many transitive dependencies pulled in from internal library
projects that probably need to be cleaned out. Maybe we are including
something that conflicts? Or maybe something important is being excluded?

Are the asm classes included in Flink jars in some shaded form?

Thanks,
Zach


On Thu, Mar 10, 2016 at 5:06 AM Stephan Ewen  wrote:

> Dependency shading changed a bit between RC4 and RC5 - maybe a different
> minor ASM version is now included in the "test" scope.
>
> Can you share the dependencies of the problematic project?
>
> On Thu, Mar 10, 2016 at 12:26 AM, Zach Cox  wrote:
>
>> I also noticed when I try to run this application in a local environment,
>> I get the same IllegalArgumentException.
>>
>> When I assemble this application into a fat jar and run it on a Flink
>> cluster using the CLI tools, it seems to run fine.
>>
>> Maybe my local classpath is missing something that is provided on the
>> Flink task managers?
>>
>> -Zach
>>
>>
>> On Wed, Mar 9, 2016 at 5:16 PM Zach Cox  wrote:
>>
>>> Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit
>>> test:
>>>
>>>IllegalArgumentException:   (null:-1)
>>> org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)
>>> org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)
>>>
>>> org.apache.flink.api.scala.InnerClosureFinder.<init>(ClosureCleaner.scala:279)
>>>
>>> org.apache.flink.api.scala.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:95)
>>>
>>> org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:115)
>>>
>>> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:568)
>>>
>>> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.scala:498)
>>>
>>> The line that causes that exception is just adding
>>> a FlinkKafkaConsumer08 source.
>>>
>>> ClassVisitor [1] seems to throw that IllegalArgumentException when it is
>>> not given a valid api version number, but InnerClosureFinder [2] looks fine
>>> to me.
>>>
>>> Any idea what might be causing this? This unit test worked fine with
>>> 1.0.0-rc0 jars.
>>>
>>> Thanks,
>>> Zach
>>>
>>> [1]
>>> http://websvn.ow2.org/filedetails.php?repname=asm&path=%2Ftrunk%2Fasm%2Fsrc%2Forg%2Fobjectweb%2Fasm%2FClassVisitor.java
>>> [2]
>>> https://github.com/apache/flink/blob/master/flink-scala/src/main/scala/org/apache/flink/api/scala/ClosureCleaner.scala#L279
>>>
>>>
>>>
>


Re: Flink loading an S3 File out of order

2016-03-10 Thread Fabian Hueske
Hi Benjamin,

Flink reads data usually in parallel. This is done by splitting the input
(e.g., a file) into several input splits. Each input split is independently
processed. Since splits are usually concurrently processed by more than one
task, Flink does not care about the order by default.

You can implement a special InputFormat that uses a custom
InputSplitAssigner to ensure that splits are handed out in order.
This would require a bit of coding, though.

A DataSet is usually distributed among multiple partitions/tasks and
therefore has no notion of a (complete) order either. It is possible to sort the data
of a data set in each individual partition by calling
DataSet.sortPartition(key, order). If you do that with a parallelism of one
(DataSet.sortPartition().setParallelism(1)), you'll have a fully ordered
data set, however only on one machine.
Flink does also support range partitioning (DataSet.partitionByRange()) in
case you want to sort the data in parallel.
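
A minimal sketch of the sortPartition approach (key field, types and output path
are placeholders; with parallelism 1 the whole result is ordered, but written by
a single task):

import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._

object SortedWrite {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // stand-in for the real input; the first tuple field is used as the sort key
    val data: DataSet[(Long, String)] = env.fromElements((3L, "c"), (1L, "a"), (2L, "b"))
    data
      .sortPartition(0, Order.ASCENDING)
      .setParallelism(1)
      .writeAsText("/tmp/sorted-output")
      .setParallelism(1) // keep the sink at parallelism 1 so the file stays in order
    env.execute("fully sorted write")
  }
}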

Best, Fabian

2016-03-10 16:52 GMT+01:00 Benjamin Kadish :

> I am trying to read a file from S3 in the correct order. It seems to be
> that Flink is downloading the file out of order, or at least its
> constructing the DataSet out of order. I
> tried using hadoop to download the file and it seemed to download it in
> order.
> I am able to reproduce the problem with the following line:
>
> env.readTextFileWithValue(conf.options.get(S3FileName).get)
>
>.writeAsText(s"${conf.output}/output",writeMode = 
> FileSystem.WriteMode.OVERWRITE)
>
> The output looks something like
>
> line 1001
> line 1002
> ...
> line 1304
> line 1
>
> Is there a way to guarantee order?
>
> --
> Benjamin Kadish
> (260) 441-6159
>


Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Stephan Ewen
The following issue should track that.
https://issues.apache.org/jira/browse/FLINK-3602

@Niels: Thanks for looking into this. At this point, I think it may
actually be a Flink issue, since it concerns the interaction of Avro and
Flink's TypeInformation.

On Thu, Mar 10, 2016 at 6:00 PM, Stephan Ewen  wrote:

> Hi!
>
> I think that is a TypeExtractor bug. It may actually be a bug for all
> recursive types.
> Let's check this and come up with a fix...
>
> Greetings,
> Stephan
>
>
> On Thu, Mar 10, 2016 at 4:11 PM, David Kim <
> david@braintreepayments.com> wrote:
>
>> Hello!
>>
>> Just wanted to check up on this again. Has anyone else seen this before
>> or have any suggestions?
>>
>> Thanks!
>> David
>>
>> On Tue, Mar 8, 2016 at 12:12 PM, David Kim <
>> david@braintreepayments.com> wrote:
>>
>>> Hello all,
>>>
>>> I'm running into a StackOverflowError using flink 1.0.0. I have an Avro
>>> schema that has a self reference. For example:
>>>
>>> item.avsc
>>>
>>> {
>>>
>>>   "namespace": "..."
>>>
>>>   "type": "record"
>>>   "name": "Item",
>>>   "fields": [
>>> {
>>>   "name": "parent"
>>>   "type": ["null, "Item"]
>>> }
>>>   ]
>>> }
>>>
>>>
>>> When running my flink job, I'm running into the follow error:
>>>
>>> Exception in thread "Thread-94" java.lang.StackOverflowError
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.countTypeInHierarchy(TypeExtractor.java:1105)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1397)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>> at 
>>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>> at 
>>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>> at 
>>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>> at 
>>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>> at 
>>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>> at 
>>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>>
>>>
>>> Interestingly if I change the type to an Avro array in the schema, this
>>> error is not thrown.
>>>
>>> Thanks!
>>> David
>>>
>>
>>
>>
>> --
>> Note: this information is confidential. It is prohibited to share, post
>> online or otherwise publicize without Braintree's prior written consent.
>>
>
>


Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Stephan Ewen
Hi!

I think that is a TypeExtractor bug. It may actually be a bug for all
recursive types.
Let's check this and come up with a fix...

Greetings,
Stephan


On Thu, Mar 10, 2016 at 4:11 PM, David Kim 
wrote:

> Hello!
>
> Just wanted to check up on this again. Has anyone else seen this before or
> have any suggestions?
>
> Thanks!
> David
>
> On Tue, Mar 8, 2016 at 12:12 PM, David Kim <
> david@braintreepayments.com> wrote:
>
>> Hello all,
>>
>> I'm running into a StackOverflowError using flink 1.0.0. I have an Avro
>> schema that has a self reference. For example:
>>
>> item.avsc
>>
>> {
>>
>>   "namespace": "..."
>>
>>   "type": "record"
>>   "name": "Item",
>>   "fields": [
>> {
>>   "name": "parent"
>>   "type": ["null, "Item"]
>> }
>>   ]
>> }
>>
>>
>> When running my flink job, I'm running into the follow error:
>>
>> Exception in thread "Thread-94" java.lang.StackOverflowError
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.countTypeInHierarchy(TypeExtractor.java:1105)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1397)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>
>>
>> Interestingly if I change the type to an Avro array in the schema, this
>> error is not thrown.
>>
>> Thanks!
>> David
>>
>
>
>
> --
> Note: this information is confidential. It is prohibited to share, post
> online or otherwise publicize without Braintree's prior written consent.
>


Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Niels Basjes
Hi,

Please try to reproduce the problem in a simple (commandline) Java
application (i.e. without Flink and such, just Avro).
If you can reproduce it with Avro 1.8.0 then please file a bug report
(preferable with the simplest reproduction path you can come up with) via.
https://issues.apache.org/jira/browse/AVRO/
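
A minimal sketch of such an Avro-only check (the namespace below is a
placeholder for the value elided in the original mail):

import org.apache.avro.Schema

object AvroOnlyCheck {
  def main(args: Array[String]): Unit = {
    // the recursive schema from the original mail, with the JSON syntax cleaned up
    val schemaJson =
      """{
        |  "namespace": "com.example",
        |  "type": "record",
        |  "name": "Item",
        |  "fields": [
        |    { "name": "parent", "type": ["null", "Item"] }
        |  ]
        |}""".stripMargin
    val schema = new Schema.Parser().parse(schemaJson)
    // if plain Avro handles the self-reference, the StackOverflowError is likely
    // in Flink's TypeExtractor rather than in Avro itself
    println(schema.toString(true))
  }
}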

Thanks

Niels Basjes

On Thu, Mar 10, 2016 at 4:11 PM, David Kim 
wrote:

> Hello!
>
> Just wanted to check up on this again. Has anyone else seen this before or
> have any suggestions?
>
> Thanks!
> David
>
> On Tue, Mar 8, 2016 at 12:12 PM, David Kim <
> david@braintreepayments.com> wrote:
>
>> Hello all,
>>
>> I'm running into a StackOverflowError using flink 1.0.0. I have an Avro
>> schema that has a self reference. For example:
>>
>> item.avsc
>>
>> {
>>
>>   "namespace": "..."
>>
>>   "type": "record"
>>   "name": "Item",
>>   "fields": [
>> {
>>   "name": "parent"
>>   "type": ["null, "Item"]
>> }
>>   ]
>> }
>>
>>
>> When running my flink job, I'm running into the follow error:
>>
>> Exception in thread "Thread-94" java.lang.StackOverflowError
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.countTypeInHierarchy(TypeExtractor.java:1105)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1397)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>>  at 
>> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>>  at 
>> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>>
>>
>> Interestingly if I change the type to an Avro array in the schema, this
>> error is not thrown.
>>
>> Thanks!
>> David
>>
>
>
>
> --
> Note: this information is confidential. It is prohibited to share, post
> online or otherwise publicize without Braintree's prior written consent.
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Fwd: Flink loading an S3 File out of order

2016-03-10 Thread Benjamin Kadish
I am trying to read a file from S3 in the correct order. It seems
that Flink is downloading the file out of order, or at least it's
constructing the DataSet out of order. I
tried using Hadoop to download the file and it seemed to download it in
order.
I am able to reproduce the problem with the following line:

env.readTextFileWithValue(conf.options.get(S3FileName).get)

   .writeAsText(s"${conf.output}/output",writeMode =
FileSystem.WriteMode.OVERWRITE)

The output looks something like

line 1001
line 1002
...
line 1304
line 1

Is there a way to guarantee order?

-- 
Benjamin Kadish
(260) 441-6159


Re: TaskManager unable to register with JobManager

2016-03-10 Thread Ufuk Celebi
Hey Ravinder,

check out the following config keys:

blob.server.port
taskmanager.rpc.port
taskmanager.data.port
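
These pick random (ephemeral) ports when left at their defaults; pinning them in
flink-conf.yaml to fixed values (the numbers below are just examples) gives you
something concrete to open in iptables:

blob.server.port: 6124
taskmanager.rpc.port: 6122
taskmanager.data.port: 6121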


– Ufuk


On Wed, Feb 10, 2016 at 4:06 PM, Ravinder Kaur  wrote:
> Hello Fabian,
>
> Thank you very much for the resource. I had already gone through this and
> have found port '6123' as default for taskmanager registration. But I want
> to know the specific range of ports the taskmanager access during job
> execution.
>
> The taskmanager always tries to access a random port during job execution
> for which I need to disable firewall using 'ufw allow port' during the
> execution, otherwise the job hangs and finally fails. So I  wanted to know a
> particular range of ports which I can specify in the iptables to always
> allow access.
>
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Feb 10, 2016 at 2:16 PM, Fabian Hueske  wrote:
>>
>> Hi Ravinder,
>>
>> please have a look at the configuration documentation:
>>
>> -->
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-amp-taskmanager
>>
>> Best, Fabian
>>
>> 2016-02-10 13:55 GMT+01:00 Ravinder Kaur :
>>>
>>> Hello All,
>>>
>>> I need to know the range of ports that are being used during the
>>> master/slave communication in the Flink cluster. Also is there a way I can
>>> specify a range of ports, at the slaves, to restrict them to connect to
>>> master only in this range?
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>>
>>> On Wed, Feb 3, 2016 at 10:09 PM, Stephan Ewen  wrote:

 Can machines connect to port 6123? The firewall may block that port, put
 permit SSH.

 On Wed, Feb 3, 2016 at 9:52 PM, Ravinder Kaur 
 wrote:
>
> Hello,
>
> Here is the log file of Jobmanager. I did not see some thing suspicious
> and as it suggests the ports are also listening.
>
> 20:58:46,906 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManager on IP-of-master:6123 with execution mode CLUSTER and
> streaming mode BATCH_ONLY
> 20:58:46,978 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Security is not enabled. Starting non-authenticated JobManager.
> 20:58:46,979 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManager
> 20:58:46,980 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManager actor system at 10.155.208.138:6123
> 20:58:48,196 INFO  akka.event.slf4j.Slf4jLogger
> - Slf4jLogger started
> 20:58:48,295 INFO  Remoting
> - Starting remoting
> 20:58:48,541 INFO  Remoting
> - Remoting started; listening on addresses
> :[akka.tcp://flink@IP-of-master:6123]
> 20:58:48,549 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManger web frontend
> 20:58:48,690 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using
> directory /tmp/flink-web-876a4755-4f38-4ff7-8202-f263afa9b986 for the web
> interface files
> 20:58:48,691 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving 
> job
> manager log from
> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.log
> 20:58:48,691 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving 
> job
> manager stdout from
> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.out
> 20:58:49,044 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web 
> frontend
> listening at 0:0:0:0:0:0:0:0:8081
> 20:58:49,045 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManager actor
> 20:58:49,052 INFO  org.apache.flink.runtime.blob.BlobServer
> - Created BLOB server storage directory
> /tmp/blobStore-e0c52bfb-2411-4a83-ac8d-5664a5894258
> 20:58:49,054 INFO  org.apache.flink.runtime.blob.BlobServer
> - Started BLOB server at 0.0.0.0:43683 - max concurrent requests: 50 - max
> backlog: 1000
> 20:58:49,075 INFO  org.apache.flink.runtime.jobmanager.MemoryArchivist
> - Started memory archivist akka://flink/user/archive
> 20:58:49,075 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Starting JobManager at 
> akka.tcp://flink@IP-of-master:6123/user/jobmanager.
> 20:58:49,081 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager was 
> granted
> leadership with leader session ID None.
> 20:58:49,082 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting
> with JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager on port
> 8081
> 20:58:49,083 INFO
> org.apache.flink.runtime.webmonitor.JobManagerRetriever   - New leader
> reachable under akka.tcp://flink@IP-of-master:6123/user/jobmanager:null.
> 20:59:22,794 INFO  org.apache.flink.runtime.jobmanager.JobManager
> - Submitting job 72733d69588678ec224003ab55

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread David Kim
Hello!

Just wanted to check up on this again. Has anyone else seen this before or
have any suggestions?

Thanks!
David

On Tue, Mar 8, 2016 at 12:12 PM, David Kim 
wrote:

> Hello all,
>
> I'm running into a StackOverflowError using flink 1.0.0. I have an Avro
> schema that has a self reference. For example:
>
> item.avsc
>
> {
>
>   "namespace": "...",
>
>   "type": "record",
>   "name": "Item",
>   "fields": [
> {
>   "name": "parent",
>   "type": ["null", "Item"]
> }
>   ]
> }
>
>
> When running my flink job, I'm running into the following error:
>
> Exception in thread "Thread-94" java.lang.StackOverflowError
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.countTypeInHierarchy(TypeExtractor.java:1105)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1397)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>   at 
> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>   at 
> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>   at 
> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>   at 
> org.apache.flink.api.java.typeutils.AvroTypeInfo.(AvroTypeInfo.java:48)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1394)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1319)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:609)
>   at 
> org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1531)
>   at 
> org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:53)
>
>
> Interestingly if I change the type to an Avro array in the schema, this
> error is not thrown.
>
> Thanks!
> David
>



-- 
Note: this information is confidential. It is prohibited to share, post
online or otherwise publicize without Braintree's prior written consent.


RE: operators

2016-03-10 Thread Radu Tudoran
Hi,

It would actually not be feasible to use Kafka queues or the DFS. Could you
point me at which level of the API I could access the CoLocationConstraint? Is it
accessible from the DataSourceStream or from the operator directly?

I have also dug through the documentation and API, and I was curious to
understand a bit what “slotSharingGroup” and “startNewResourceGroup()”
can do.

I did not find a good example though, only this link:
https://issues.apache.org/jira/browse/FLINK-3315

Also, “slotSharingGroup” doesn’t seem to be available (I am
currently using Flink 0.10) – so if it is something that came in a newer version,
then I guess this is the explanation why I cannot find it in the DataStream API
or the source function.
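
For reference, a rough sketch of how a slot sharing group is set in the 1.0
DataStream API (the names are placeholders); note that it only influences which
operators may share a task slot, it does not enforce strict machine co-location:

// operators assigned to the same named group may be scheduled into the same slot
val metrics = stream
  .map(new MyMetricsMapper())   // hypothetical transformation computing the metrics
  .slotSharingGroup("metrics")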

Thanks for the info.


From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan 
Ewen
Sent: Wednesday, March 09, 2016 6:30 PM
To: user@flink.apache.org
Subject: Re: operators

Hi!

You cannot specify that on the higher API levels. The lower API levels have
something called "CoLocationConstraint". At this point it is not exposed,
because we thought that would lead to not very scalable and robust designs in
many cases.

The best thing usually is location transparency and local affinity (as a
performance optimization).

Is the file large, i.e., would it hurt to do it on a DFS? Or actually use a
Kafka Queue between the operators?

Stephan


On Wed, Mar 9, 2016 at 5:38 PM, Radu Tudoran  wrote:
Hi,

Is there any way in which you can ensure that 2 distinct operators will be 
executed on the same machine?
More precisely, what I am trying to do is to have a window that computes some
metrics and dumps them locally (from the operator, not from an output sink),
and I would like to create, independently of this (or even within the operator),
a stream source to emit this data. I cannot

The schema would be something as below:

Stream ->  operator   -> output
|
  Local file
  |
Stream source -> new stream

=> the red items should go on the same machine

Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R&D Division

HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, 
www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
Sitz der Gesellschaft: Düsseldorf, Amtsgericht Düsseldorf, HRB 56063,
Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!




Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Zach Cox
I see the new Event Time docs page, thanks for fixing that! I like the
additional explanation of event time and watermarks.

I also updated our TimestampExtractors to AssignerWithPeriodicWatermarks as
described in [1]. I like the separation between periodic and punctuated
watermark assigners in the new API - it's definitely more clear how each
one operates.

Thanks,
Zach

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_timestamps_watermarks.html
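
For anyone else migrating, a rough sketch of the 1.0 way of doing this (the
event type, source and the 5 second lateness bound are placeholders):

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark

object EventTimeSketch {
  case class Event(ts: Long, payload: String)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // replaces the old ExecutionConfig.enableTimestamps()
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val events: DataStream[Event] = env.fromElements(Event(1000L, "a"), Event(2000L, "b"))

    val withTimestamps = events.assignTimestampsAndWatermarks(
      new AssignerWithPeriodicWatermarks[Event] {
        private var maxTs = 0L
        override def extractTimestamp(e: Event, previousElementTimestamp: Long): Long = {
          maxTs = math.max(maxTs, e.ts)
          e.ts
        }
        // watermark trails the highest timestamp seen so far by 5 seconds
        override def getCurrentWatermark(): Watermark = new Watermark(maxTs - 5000)
      })

    withTimestamps.print()
    env.execute("event time sketch")
  }
}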


On Thu, Mar 10, 2016 at 3:33 AM Ufuk Celebi  wrote:

> Just removed the page. Triggering a new docs build...
>
> On Thu, Mar 10, 2016 at 10:22 AM, Aljoscha Krettek 
> wrote:
> > Then Stephan should have removed the old doc when adding the new one… :-)
> >> On 10 Mar 2016, at 10:20, Ufuk Celebi  wrote:
> >>
> >> Just talked with Stephan: the document you are referring to is stale.
> >> Can you check out this one here:
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_time.html
> >>
> >> – Ufuk
> >>
> >>
> >> On Thu, Mar 10, 2016 at 10:17 AM, Ufuk Celebi  wrote:
> >>> I've added this to the migration guide here:
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x
> >>>
> >>> Feel free to add any other API changes that are missing there.
> >>>
> >>> – Ufuk
> >>>
> >>>
> >>> On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek <
> aljos...@apache.org> wrote:
>  Hi,
>  you’re right, this should be changed to
> “setStreamTimeCharacteristic(EventTime)” in the doc.
> 
>  Cheers,
>  Aljoscha
> > On 09 Mar 2016, at 23:21, Zach Cox  wrote:
> >
> > Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no
> longer has an enableTimestamps() method. Do we just not need to call that
> at all now?
> >
> > The docs still say to call it [1] - do they just need to be updated?
> >
> > Thanks,
> > Zach
> >
> > [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
> >
> 
> >
>


Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Stephan Ewen
Dependency shading changed a bit between RC4 and RC5 - maybe a different
minor ASM version is now included in the "test" scope.

Can you share the dependencies of the problematic project?

On Thu, Mar 10, 2016 at 12:26 AM, Zach Cox  wrote:

> I also noticed when I try to run this application in a local environment,
> I get the same IllegalArgumentException.
>
> When I assemble this application into a fat jar and run it on a Flink
> cluster using the CLI tools, it seems to run fine.
>
> Maybe my local classpath is missing something that is provided on the
> Flink task managers?
>
> -Zach
>
>
> On Wed, Mar 9, 2016 at 5:16 PM Zach Cox  wrote:
>
>> Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit
>> test:
>>
>>IllegalArgumentException:   (null:-1)
>> org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)
>> org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.<init>(Unknown Source)
>>
>> org.apache.flink.api.scala.InnerClosureFinder.<init>(ClosureCleaner.scala:279)
>>
>> org.apache.flink.api.scala.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:95)
>> org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:115)
>>
>> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:568)
>>
>> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.scala:498)
>>
>> The line that causes that exception is just adding a FlinkKafkaConsumer08
>> source.
>>
>> ClassVisitor [1] seems to throw that IllegalArgumentException when it is
>> not given a valid api version number, but InnerClosureFinder [2] looks fine
>> to me.
>>
>> Any idea what might be causing this? This unit test worked fine with
>> 1.0.0-rc0 jars.
>>
>> Thanks,
>> Zach
>>
>> [1]
>> http://websvn.ow2.org/filedetails.php?repname=asm&path=%2Ftrunk%2Fasm%2Fsrc%2Forg%2Fobjectweb%2Fasm%2FClassVisitor.java
>> [2]
>> https://github.com/apache/flink/blob/master/flink-scala/src/main/scala/org/apache/flink/api/scala/ClosureCleaner.scala#L279
>>
>>
>>


Re: Checkpoint

2016-03-10 Thread Stephan Ewen
Just to be sure: Is the task whose backpressure you want to monitor the
Kafka Source?

There is an open issue that backpressure monitoring does not work for the
Kafka Source: https://issues.apache.org/jira/browse/FLINK-3456

To circumvent that, add an "IdentityMapper" after the Kafka source, make
sure it is non-chained, and monitor the backpressure on that MapFunction.

Greetings,
Stephan


On Thu, Mar 10, 2016 at 11:23 AM, Robert Metzger 
wrote:

> Hi Vijay,
>
> regarding your other questions:
>
> 1) On the TaskManagers, the FlinkKafkaConsumers will write the partitions
> they are going to read in the log. There is currently no way of seeing the
> state of a checkpoint in Flink (which is the offsets).
> However, once a checkpoint is completed, the Kafka consumer is committing
> the offset to the Kafka broker. (I could not find tool to get the committed
> offsets from the broker, but its either stored in ZK or in a special topic
> by the broker. In Kafka 0.8 that's easily doable with the
> kafka.tools.ConsumerOffsetChecker)
>
> 2) Do you see duplicate data written by the rolling file sink? Or do you
> see it somewhere else?
> HDP 2.4 is using Hadoop 2.7.1 so the truncate() of invalid data should
> actually work properly.
>
>
>
>
>
> On Thu, Mar 10, 2016 at 10:44 AM, Ufuk Celebi  wrote:
>
>> How many vertices does the web interface show and what parallelism are
>> you running? If the sleeping operator is chained you will not see
>> anything.
>>
>> If your goal is to just see some back pressure warning, you can call
>> env.disableOperatorChaining() and re-run the program. Does this work?
>>
>> – Ufuk
>>
>>
>> On Thu, Mar 10, 2016 at 1:35 AM, Vijay Srinivasaraghavan
>>  wrote:
>> > Hi Ufuk,
>> >
>> > I have increased the sampling size to 1000 and decreased the refresh
>> > interval by half. In my Kafka topic I have pumped million messages
>> which is
>> > read by KafkaConsumer pipeline and then pass it to a transofmation step
>> > where I have introduced sleep (3 sec) for every single message received
>> and
>> > the final step is HDFS sink using RollingSinc API.
>> >
>> > jobmanager.web.backpressure.num-samples: 1000
>> > jobmanager.web.backpressure.refresh-interval: 3
>> >
>> >
>> > I was hoping to see the backpressure tab from UI to display some
>> warning but
>> > I still see "OK" message.
>> >
>> > This makes me wonder if I am testing the backpressure scenario properly
>> or
>> > not?
>> >
>> > Regards
>> > Vijay
>> >
>> > On Monday, March 7, 2016 3:19 PM, Ufuk Celebi  wrote:
>> >
>> >
>> > Hey Vijay!
>> >
>> > On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan
>> >  wrote:
>> >> 3) How can I simulate and verify backpressure? I have introduced some
>> >> delay
>> >> (Thread Sleep) in the job before the sink but the "backpressure" tab
>> from
>> >> UI
>> >> does not show any indication of whether backpressure is working or not.
>> >
>> > If a task is slow, it is back pressuring upstream tasks, e.g. if your
>> > transformations have the sleep, the sources should be back pressured.
>> > It can happen that even with the sleep the tasks still produce their
>> > data as fast as they can and hence no back pressure is indicated in
>> > the web interface. You can increase the sleep to check this.
>> >
>> > The mechanism used to determine back pressure is based on sampling the
>> > stack traces of running tasks. You can increase the number of samples
>> > and/or decrease the delay between samples via config parameters shown
>> > in [1]. It can happen that the samples miss the back pressure
>> > indicators, but usually the defaults work fine.
>> >
>> >
>> > [1]
>> >
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-web-frontend
>> >
>> >
>> >
>>
>
>


Re: Checkpoint

2016-03-10 Thread Robert Metzger
Hi Vijay,

regarding your other questions:

1) On the TaskManagers, the FlinkKafkaConsumers will write the partitions
they are going to read in the log. There is currently no way of seeing the
state of a checkpoint in Flink (which is the offsets).
However, once a checkpoint is completed, the Kafka consumer is committing
the offset to the Kafka broker. (I could not find tool to get the committed
offsets from the broker, but its either stored in ZK or in a special topic
by the broker. In Kafka 0.8 that's easily doable with the
kafka.tools.ConsumerOffsetChecker)

2) Do you see duplicate data written by the rolling file sink? Or do you
see it somewhere else?
HDP 2.4 is using Hadoop 2.7.1 so the truncate() of invalid data should
actually work properly.




On Thu, Mar 10, 2016 at 10:44 AM, Ufuk Celebi  wrote:

> How many vertices does the web interface show and what parallelism are
> you running? If the sleeping operator is chained you will not see
> anything.
>
> If your goal is to just see some back pressure warning, you can call
> env.disableOperatorChaining() and re-run the program. Does this work?
>
> – Ufuk
>
>
> On Thu, Mar 10, 2016 at 1:35 AM, Vijay Srinivasaraghavan
>  wrote:
> > Hi Ufuk,
> >
> > I have increased the sampling size to 1000 and decreased the refresh
> > interval by half. In my Kafka topic I have pumped million messages which
> is
> > read by KafkaConsumer pipeline and then pass it to a transofmation step
> > where I have introduced sleep (3 sec) for every single message received
> and
> > the final step is HDFS sink using RollingSinc API.
> >
> > jobmanager.web.backpressure.num-samples: 1000
> > jobmanager.web.backpressure.refresh-interval: 3
> >
> >
> > I was hoping to see the backpressure tab from UI to display some warning
> but
> > I still see "OK" message.
> >
> > This makes me wonder if I am testing the backpressure scenario properly
> or
> > not?
> >
> > Regards
> > Vijay
> >
> > On Monday, March 7, 2016 3:19 PM, Ufuk Celebi  wrote:
> >
> >
> > Hey Vijay!
> >
> > On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan
> >  wrote:
> >> 3) How can I simulate and verify backpressure? I have introduced some
> >> delay
> >> (Thread Sleep) in the job before the sink but the "backpressure" tab
> from
> >> UI
> >> does not show any indication of whether backpressure is working or not.
> >
> > If a task is slow, it is back pressuring upstream tasks, e.g. if your
> > transformations have the sleep, the sources should be back pressured.
> > It can happen that even with the sleep the tasks still produce their
> > data as fast as they can and hence no back pressure is indicated in
> > the web interface. You can increase the sleep to check this.
> >
> > The mechanism used to determine back pressure is based on sampling the
> > stack traces of running tasks. You can increase the number of samples
> > and/or decrease the delay between samples via config parameters shown
> > in [1]. It can happen that the samples miss the back pressure
> > indicators, but usually the defaults work fine.
> >
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-web-frontend
> >
> >
> >
>


Re: Checkpoint

2016-03-10 Thread Ufuk Celebi
How many vertices does the web interface show and what parallelism are
you running? If the sleeping operator is chained you will not see
anything.

If your goal is to just see some back pressure warning, you can call
env.disableOperatorChaining() and re-run the program. Does this work?

– Ufuk


On Thu, Mar 10, 2016 at 1:35 AM, Vijay Srinivasaraghavan
 wrote:
> Hi Ufuk,
>
> I have increased the sampling size to 1000 and decreased the refresh
> interval by half. In my Kafka topic I have pumped million messages which is
> read by KafkaConsumer pipeline and then pass it to a transofmation step
> where I have introduced sleep (3 sec) for every single message received and
> the final step is HDFS sink using RollingSinc API.
>
> jobmanager.web.backpressure.num-samples: 1000
> jobmanager.web.backpressure.refresh-interval: 3
>
>
> I was hoping to see the backpressure tab from UI to display some warning but
> I still see "OK" message.
>
> This makes me wonder if I am testing the backpressure scenario properly or
> not?
>
> Regards
> Vijay
>
> On Monday, March 7, 2016 3:19 PM, Ufuk Celebi  wrote:
>
>
> Hey Vijay!
>
> On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan
>  wrote:
>> 3) How can I simulate and verify backpressure? I have introduced some
>> delay
>> (Thread Sleep) in the job before the sink but the "backpressure" tab from
>> UI
>> does not show any indication of whether backpressure is working or not.
>
> If a task is slow, it is back pressuring upstream tasks, e.g. if your
> transformations have the sleep, the sources should be back pressured.
> It can happen that even with the sleep the tasks still produce their
> data as fast as they can and hence no back pressure is indicated in
> the web interface. You can increase the sleep to check this.
>
> The mechanism used to determine back pressure is based on sampling the
> stack traces of running tasks. You can increase the number of samples
> and/or decrease the delay between samples via config parameters shown
> in [1]. It can happen that the samples miss the back pressure
> indicators, but usually the defaults work fine.
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-web-frontend
>
>
>


Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
Just removed the page. Triggering a new docs build...

On Thu, Mar 10, 2016 at 10:22 AM, Aljoscha Krettek  wrote:
> Then Stephan should have removed the old doc when adding the new one… :-)
>> On 10 Mar 2016, at 10:20, Ufuk Celebi  wrote:
>>
>> Just talked with Stephan: the document you are referring to is stale.
>> Can you check out this one here:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_time.html
>>
>> – Ufuk
>>
>>
>> On Thu, Mar 10, 2016 at 10:17 AM, Ufuk Celebi  wrote:
>>> I've added this to the migration guide here:
>>> https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x
>>>
>>> Feel free to add any other API changes that are missing there.
>>>
>>> – Ufuk
>>>
>>>
>>> On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek  
>>> wrote:
 Hi,
 you’re right, this should be changed to 
 “setStreamTimeCharacteristic(EventTime)” in the doc.

 Cheers,
 Aljoscha
> On 09 Mar 2016, at 23:21, Zach Cox  wrote:
>
> Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer 
> has an enableTimestamps() method. Do we just not need to call that at all 
> now?
>
> The docs still say to call it [1] - do they just need to be updated?
>
> Thanks,
> Zach
>
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
>

>


Re: streaming job reading from kafka stuck while cancelling

2016-03-10 Thread Ufuk Celebi
Hey Maciek! I'm working on the other proposed fix by closing the
buffer pool early. I expect the fix to make it into the next bugfix
release 1.0.1 (or 1.0.2 if 1.0.1 comes very soon).

– Ufuk


Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Aljoscha Krettek
Then Stephan should have removed the old doc when adding the new one… :-)
> On 10 Mar 2016, at 10:20, Ufuk Celebi  wrote:
> 
> Just talked with Stephan: the document you are referring to is stale.
> Can you check out this one here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_time.html
> 
> – Ufuk
> 
> 
> On Thu, Mar 10, 2016 at 10:17 AM, Ufuk Celebi  wrote:
>> I've added this to the migration guide here:
>> https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x
>> 
>> Feel free to add any other API changes that are missing there.
>> 
>> – Ufuk
>> 
>> 
>> On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek  
>> wrote:
>>> Hi,
>>> you’re right, this should be changed to 
>>> “setStreamTimeCharacteristic(EventTime)” in the doc.
>>> 
>>> Cheers,
>>> Aljoscha
 On 09 Mar 2016, at 23:21, Zach Cox  wrote:
 
 Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer 
 has an enableTimestamps() method. Do we just not need to call that at all 
 now?
 
 The docs still say to call it [1] - do they just need to be updated?
 
 Thanks,
 Zach
 
 [1] 
 https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
 
>>> 



Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
Just talked with Stephan: the document you are referring to is stale.
Can you check out this one here:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_time.html

– Ufuk


On Thu, Mar 10, 2016 at 10:17 AM, Ufuk Celebi  wrote:
> I've added this to the migration guide here:
> https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x
>
> Feel free to add any other API changes that are missing there.
>
> – Ufuk
>
>
> On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek  
> wrote:
>> Hi,
>> you’re right, this should be changed to 
>> “setStreamTimeCharacteristic(EventTime)” in the doc.
>>
>> Cheers,
>> Aljoscha
>>> On 09 Mar 2016, at 23:21, Zach Cox  wrote:
>>>
>>> Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer 
>>> has an enableTimestamps() method. Do we just not need to call that at all 
>>> now?
>>>
>>> The docs still say to call it [1] - do they just need to be updated?
>>>
>>> Thanks,
>>> Zach
>>>
>>> [1] 
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
>>>
>>


Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
I've added this to the migration guide here:
https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x

Feel free to add any other API changes that are missing there.

– Ufuk


On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek  wrote:
> Hi,
> you’re right, this should be changed to 
> “setStreamTimeCharacteristic(EventTime)” in the doc.
>
> Cheers,
> Aljoscha
>> On 09 Mar 2016, at 23:21, Zach Cox  wrote:
>>
>> Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer 
>> has an enableTimestamps() method. Do we just not need to call that at all 
>> now?
>>
>> The docs still say to call it [1] - do they just need to be updated?
>>
>> Thanks,
>> Zach
>>
>> [1] 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
>>
>


Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Aljoscha Krettek
Hi,
you’re right, this should be changed to 
“setStreamTimeCharacteristic(EventTime)” in the doc. 

Cheers,
Aljoscha
> On 09 Mar 2016, at 23:21, Zach Cox  wrote:
> 
> Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer has 
> an enableTimestamps() method. Do we just not need to call that at all now?
> 
> The docs still say to call it [1] - do they just need to be updated?
> 
> Thanks,
> Zach
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/time.html
> 



Running Flink 1.0.0 on YARN

2016-03-10 Thread Ashutosh Kumar
I have a YARN setup with 1 master and 2 slaves.
When I start a YARN session with bin/yarn-session.sh -n 2 -jm 1024 -tm 1024
and submit the job with bin/flink run examples/batch/WordCount.jar, the
job succeeds. It shows its status on the YARN UI at http://x.x.x.x:8088/cluster.
However, it does not show anything on the Flink UI at
http://x.x.x.x:8081/#/overview.

Is this expected behavior ?

If I run using bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
examples/batch/WordCount.jar, then the job fails with the following error:

java.lang.IllegalStateException: Update task on instance
451105022ff3b4cd6e2c307e239d1595 @ slave2 - 2 slots - URL:
akka.tcp://flink@x.x.x.x:43272/user/taskmanager failed due to:
    at org.apache.flink.runtime.executiongraph.Execution$6.onFailure(Execution.java:954)
    at akka.dispatch.OnFailure.internal(Future.scala:228)
    at akka.dispatch.OnFailure.internal(Future.scala:227)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at scala.runtime.AbstractPartialFunction.applyOrElse(AbstractPartialFunction.scala:25)
    at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
    at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:134)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka.tcp://flink@x.x.x.x:43272/user/taskmanager#1361901425]] after [1 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)

03/10/2016 08:36:46 Job execution switched to status FAILING.

Thanks

Ashutosh