Re: Data Transfer between TM should be encrypted

2016-10-12 Thread vinay patil
Hi Robert,

Thank you for that information. Can you please let me know when 1.2 is
planned for release?

Regards,
Vinay Patil

On Wed, Oct 12, 2016 at 4:17 AM, rmetzger0 [via Apache Flink User Mailing
List archive.]  wrote:

> Hi,
> I think that pull request will be merged for 1.2.
>
> On Fri, Oct 7, 2016 at 6:26 PM, vinay patil <[hidden email]
> > wrote:
>
>> Hi Stephan,
>>
>> https://github.com/apache/flink/pull/2518
> Is this pull request going to be part of the 1.2 release? Just wanted to get
> an idea of the timeline so that I can pass it on to the team.
>>
>>
>> Regards,
>> Vinay Patil
>>
>> On Thu, Sep 15, 2016 at 11:45 AM, Vijay Srinivasaraghavan <[hidden email]
>> > wrote:
>>
>>> Hi Vinay,
>>>
>>> There are some delays and we expect the PR to be created next week.
>>>
>>> Regards
>>> Vijay
>>>
>>> On Wednesday, September 14, 2016 5:41 PM, vinay patil <[hidden email]
>>> > wrote:
>>>
>>>
>>> Hi Vijay,
>>>
>>> Did you raise the PR for this task? I don't mind testing it out as well.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Tue, Aug 30, 2016 at 6:28 PM, Vinay Patil <[hidden email]> wrote:
>>>
>>> Hi Vijay,
>>>
>>> That's good news for me. I am eagerly waiting for this change so that I can
>>> integrate and test it before going live.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Tue, Aug 30, 2016 at 4:06 PM, Vijay Srinivasaraghavan [via Apache
>>> Flink User Mailing List archive.] <[hidden email]> wrote:
>>>
>>> Hi Stephan,
>>>
>>> The dev work is almost complete except the Yarn mode deployment stuff
>>> that needs to be patched. We are expecting to send a PR in a week or two.
>>>
>>> Regards
>>> Vijay
>>>
>>>
>>> On Tuesday, August 30, 2016 12:39 AM, Stephan Ewen <[hidden email]>
>>> wrote:
>>>
>>>
>>> Let me loop in Vijay, I think he is the one working on this and can
>>> probably give the best estimate when it can be expected.
>>>
>>> @vijay: For the SSL/TLS transport encryption - do you have an estimate
>>> for the timeline of that feature?
>>>
>>>
>>> On Mon, Aug 29, 2016 at 8:54 PM, vinay patil <[hidden email]> wrote:
>>>
>>> Hi Stephan,
>>>
>>> Thank you for your reply.
>>>
>>> By when can I expect this feature to be integrated into master or a
>>> release version?
>>>
>>> We are going to get production data (financial data) at the end of October, so
>>> we want to have this feature before then.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Mon, Aug 29, 2016 at 11:15 AM, Stephan Ewen [via Apache Flink User
>>> Mailing List archive.] <[hidden email]> wrote:
>>>
>>> Hi!
>>>
>>> The way the JIRA issue you linked will achieve this is by hooking
>>> into the network stream pipeline directly and encrypting the raw network byte
>>> stream. We built the network stack on Netty, and will use Netty's SSL/TLS
>>> handlers for that.
>>>
>>> That should be much more efficient than manual encryption/decryption in
>>> each user function.
>>>
>>> Stephan
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 6:12 PM, vinay patil <[hidden email]> wrote:
>>>
>>> Hi Ufuk,
>>>
>>> This is regarding this issue
>>> https://issues.apache.org/jira/browse/FLINK-4404
>>>
>>> How can we achieve this? I am able to decrypt the data coming in from Kafka,
>>> but I want to make sure that the data is encrypted when flowing between
>>> TMs.
>>>
>>> One approach I can think of is to decrypt the data at the start of each
>>> operator and encrypt it at the end of each operator, but I feel this is not
>>> an efficient approach.
>>>
>>> I just want to check whether there are alternatives to this and whether it can
>>> be achieved through configuration.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> 
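
For reference, the transport encryption tracked by FLINK-4404 and the pull request above is enabled through SSL options in flink-conf.yaml rather than in user code. A minimal sketch, assuming the option names as proposed for 1.2 (check the 1.2 security documentation for the final names; all paths and passwords below are placeholders):

    security.ssl.enabled: true
    security.ssl.keystore: /path/to/node.keystore
    security.ssl.keystore-password: keystore_password
    security.ssl.key-password: key_password
    security.ssl.truststore: /path/to/truststore
    security.ssl.truststore-password: truststore_password

With this in place, the Netty-based data plane between TaskManagers wraps the raw byte stream with Netty's SSL/TLS handlers, as described in the thread above.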

re: About Sliding window

2016-10-12 Thread Zhangrucong
Hi Kostas:
Thanks for your answer.

So in your previous figure (yesterday) when e3 arrives, also e2 should be 
included in the result, right?
--zhangrucong: In the Oct 11 email, e2 arrives at 9:02, e3 arrives at 9:07,
and the aging time is 5 minutes. So when e3 arrives, e2 has already aged out; e2 is not in the
result!

In the mail, you say there is a discussion. Can you send me the link? I want to
take part in it.

Best wishes!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 22:32
To: Zhangrucong
Cc: user@flink.apache.org; Aljoscha Krettek
Subject: Re: About Sliding window

Hello,

So in your previous figure (yesterday) when e3 arrives, also e2 should be 
included in the result, right?

In this case, I think that what you need is a Session window with gap equal to 
your event aging duration and
an evictor that evicts the elements that lag behind more than the gap duration.

The latter, the evictor that I am describing, is not currently supported in 
Flink but there is an ongoing
discussion in the dev mailing list about it. So it is worth having a look there 
and participate in the discussion.

I also loop in Aljoscha in the discussion, in case he has another solution that 
you can deploy right-away.

Thanks,
Kostas

On Oct 12, 2016, at 3:36 PM, Zhangrucong 
mailto:zhangruc...@huawei.com>> wrote:

Hi Kostas:
It doesn’t matter. Can you see the picture? My use case is:

1. The events are coming in the following order:

At 9:01 e1 is coming
At 9:02 e2 is coming
At 9:06  e3 is coming
At 9:08   e4 is coming

The time is system time.

2. The event aging time is 5 minutes.

3. The processing is:
   At 9:01 e1 arrives: nothing is aged, store e1, we count e1 and send the result.
   At 9:02 e2 arrives: nothing is aged, store e2, we count e1 and e2 and send the result.
   At 9:06 e3 arrives: e1 is aged out, store e3, we count e2 and e3 and send the result.
   At 9:08 e4 arrives: e2 is aged out, store e4, we count e3 and e4 and send the result.


I think I need a certain duration window.

Thank you very much!
From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again,

Sorry for the delay but I cannot really understand your use case.
Could you explain a bit more what you mean by an “out-of-date” event and
“aging” an event?

Also, are your windows of a certain duration, or global?

Thanks,
Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong 
mailto:zhangruc...@huawei.com>> wrote:

Hi Kostas:
Thank you for your rapid response!

My use case is:
For every incoming event, we want to age out the out-of-date events, count the
events in the window, and send the result.

For example:
The events arrive as follows:


We want the following result:



By the way, in the StreamSQL API, FLIP-11 will introduce row windows. It seems
that the Slide Event-time row-window function suits my use case. Does the
DataStream API support row windows?

Thanks !

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong,

Sliding windows only support time-based slide.
So your use-case is not supported out-of-the-box.

But, if you describe a bit more what you want to do,
we may be able to find a way together to do your job using
the currently offered functionality.

Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong 
mailto:zhangruc...@huawei.com>> wrote:

Hello everyone:
  I want to use the DataStream sliding window API. I looked at the API and I
have a question: does the sliding time window support sliding on every incoming
event?

Thanks in advance!
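
One rough approximation of the per-event "aging" behaviour described in this thread can be built from pieces that already exist in 1.1: a global window that fires on every element and an evictor that drops elements older than the aging duration. This is only a sketch under assumptions (records carry timestamps, and "Event" with its getKey() are placeholder names); it is not the exact evictor semantics discussed on the dev list:

    // inside main(), with: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // and: DataStream<Event> events = ...;
    events
        .keyBy(new KeySelector<Event, String>() {
            @Override
            public String getKey(Event e) { return e.getKey(); }
        })
        .window(GlobalWindows.create())
        .trigger(CountTrigger.of(1))                // fire on every incoming element
        .evictor(TimeEvictor.of(Time.minutes(5)))   // keep only elements from the last 5 minutes
        .apply(new WindowFunction<Event, Long, String, GlobalWindow>() {
            @Override
            public void apply(String key, GlobalWindow window,
                              Iterable<Event> input, Collector<Long> out) {
                long count = 0;
                for (Event ignored : input) {
                    count++;                        // count the currently "live" events
                }
                out.collect(count);                 // emit a result per arrival
            }
        });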







Re: java.lang.IllegalArgumentException: JDBC-Class not found. - org.postgresql.jdbc.Driver

2016-10-12 Thread Flavio Pompermaier
Hi Sunny,
As stated by Fabian, try to see whether including the Postgres classes in
the shaded jar solves the problem. If it doesn't, you're probably hitting
the same problem I had with an older version of Flink (
https://issues.apache.org/jira/plugins/servlet/mobile#issue/FLINK-4061) and
thus you have to copy the Postgres jar into the Flink lib directory.

Best,
Flavio
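
For completeness, the shaded-jar route means declaring the driver in the pom so it ends up on the classpath; roughly like the following (the version below is a placeholder, pick the one matching your PostgreSQL server, and note that the driver class in this artifact is org.postgresql.Driver):

    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <!-- placeholder version; use the one matching your server -->
        <version>9.4.1212</version>
    </dependency>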

On 12 Oct 2016 23:36, "Fabian Hueske"  wrote:

> Hi Sunny,
>
> please avoid crossposting to all mailing lists.
> The dev@f.a.o list is for issues related to the development of Flink not
> the development of Flink applications.
>
> The error message is actually quite descriptive. Flink does not find the
> JDBC driver class.
> You need to add it to the classpath for example by adding the
> corresponding Maven dependency to your pom file.
>
> Fabian
>
>
> 2016-10-12 23:18 GMT+02:00 sunny patel :
>
>>
>> Hi Guys,
>>
>> I am facing a JDBC error, could someone please advise me on this?
>>
>> $ java -version
>>
>> java version "1.8.0_102"
>>
>> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
>>
>> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
>>
>> $ scala -version
>>
>> Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
>>
>>
>> === Scala Code
>>
>> import org.apache.flink.api.common.typeinfo.TypeInformation
>> import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
>> import org.apache.flink.api.scala._
>> import org.apache.flink.api.table.typeutils.RowTypeInfo
>>
>> object WordCount {
>>   def main(args: Array[String]) {
>>
>> val PATH = getClass.getResource("").getPath
>>
>> // set up the execution environment
>> val env = ExecutionEnvironment.getExecutionEnvironment
>>
>> // Read data from JDBC (Kylin in our case)
>> val stringColum: TypeInformation[Int] = createTypeInformation[Int]
>> val DB_ROWTYPE = new RowTypeInfo(Seq(stringColum))
>>
>> val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
>>   .setDrivername("org.postgresql.jdbc.Driver")
>>   .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
>>   .setUsername("MI")
>>   .setPassword("MI")
>>   .setQuery("select * FROM identity")
>>   .setRowTypeInfo(DB_ROWTYPE)
>>   .finish()
>>
>> val dataset =env.createInput(inputFormat)
>> dataset.print()
>>
>> println(PATH)
>>   }
>> }
>>
>> ==
>>
>> ==POM.XML
>>
>>
>> 
>> http://maven.apache.org/POM/4.0.0"; 
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
>> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
>> http://maven.apache.org/xsd/maven-4.0.0.xsd";>
>>4.0.0
>>
>>   flink-parent
>>   org.apache.flink
>>   1.2-SNAPSHOT
>>
>>
>>org.apache.flink.quickstart
>>flink-scala-project
>>0.1
>>jar
>>
>>Flink Quickstart Job
>>http://www.myorganization.org
>>
>>
>>   
>>  apache.snapshots
>>  Apache Development Snapshot Repository
>>  
>> https://repository.apache.org/content/repositories/snapshots/
>>  
>> false
>>  
>>  
>>  
>>   
>>
>>
>>
>>   UTF-8
>>   1.1.2
>>
>>
>>
>>
>>
>>   
>>  org.apache.flink
>>  flink-jdbc
>>  ${flink.version}
>>   
>>   
>>  org.apache.flink
>>  flink-table_2.11
>>  ${flink.version}
>>   
>>   
>>  org.apache.flink
>>  flink-scala_2.11
>>  ${flink.version}
>>   
>>   
>>  org.apache.flink
>>  flink-streaming-scala_2.11
>>  ${flink.version}
>>   
>>   
>>  org.apache.flink
>>  flink-clients_2.11
>>  ${flink.version}
>>   
>>
>>
>>
>>   
>>  
>>  build-jar
>>  
>>  
>>  
>> 
>>org.apache.flink
>>flink-scala_2.11
>>${flink.version}
>>provided
>> 
>> 
>>org.apache.flink
>>flink-streaming-scala_2.11
>>${flink.version}
>>provided
>> 
>> 
>>org.apache.flink
>>flink-clients_2.11
>>${flink.version}
>>provided
>> 
>>  
>>
>>  
>> 
>>
>>
>>   org.apache.maven.plugins
>>   maven-shade-plugin
>>   2.4.1
>>   
>>  
>> package
>> 
>>shade
>> 
>> 
>>
>>   
>>
>> 
>>  
>>   
>>   

Re: question about making a temporal Graph with Gelly

2016-10-12 Thread Greg Hogan
Hi Wouter,

Packing two or more values into the edge value using a Tuple is a common
practice. Does this work well for the algorithms you are writing?

Greg
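
A minimal sketch of that approach in Java, with the two timestamps packed next to a weight in a Tuple3 edge value (the IDs, weights and times below are made up):

    // inside main(), with: ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // imports assumed: org.apache.flink.graph.{Edge, Graph}, org.apache.flink.api.java.tuple.Tuple3,
    // org.apache.flink.api.java.DataSet, org.apache.flink.types.NullValue
    // Edge value = (weight, validFrom, validTo)
    DataSet<Edge<Long, Tuple3<Double, Long, Long>>> edges = env.fromElements(
            new Edge<>(1L, 2L, new Tuple3<>(0.5, 1000L, 2000L)),
            new Edge<>(2L, 3L, new Tuple3<>(0.9, 1500L, 3000L)));

    Graph<Long, NullValue, Tuple3<Double, Long, Long>> graph = Graph.fromDataSet(edges, env);

    // e.g. a "snapshot" of the graph at time t: keep only the edges valid at t
    final long t = 1600L;
    Graph<Long, NullValue, Tuple3<Double, Long, Long>> snapshot =
            graph.filterOnEdges(e -> e.getValue().f1 <= t && t <= e.getValue().f2);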

On Wed, Oct 12, 2016 at 4:29 PM, Wouter Ligtenberg 
wrote:

> Hi there,
>
> I'm currently working on making a temporal graph with Gelly. A temporal
> graph is a graph where edges have two extra values, namely a beginning and
> an ending time.
>
> I started this project a couple of weeks ago. Since I don't have much
> experience with Gelly or Flink, I wanted to ask you guys if you had some
> ideas on how to implement this idea in Gelly.
>
> I already tried extending Gelly Edges with a 5-tuple instead of a 3-tuple,
> but it seems that Gelly isn't made for that. After that I moved on to
> adding a 3-tuple as the value field of an Edge.
>
> any other ideas?
>
> Best regards,
> Wouter Ligtenberg
> Student at TU Eindhoven
>


question about making a temporal Graph with Gelly

2016-10-12 Thread Wouter Ligtenberg
​​Hi there,

I'm currently working on making a temporal graph with Gelly. A temporal
graph is a graph where edges have two extra values, namely a beginning and
an ending time.

I started this project a couple of weeks ago. Since I don't have much
experience with Gelly or Flink, I wanted to ask you guys if you had some
ideas on how to implement this idea in Gelly.

I already tried extending Gelly Edges with a 5-tuple instead of a 3-tuple,
but it seems that Gelly isn't made for that. After that I moved on to
adding a 3-tuple as the value field of an Edge.

any other ideas?

Best regards,
Wouter Ligtenberg
​Student at TU Eindhoven​


Re: Error with table sql query - No match found for function signature TUMBLE(, )

2016-10-12 Thread Fabian Hueske
Hi Pedro,

support for window aggregations in SQL and Table API is currently work in
progress.
We have a pull request for the Table API and will add this feature for the
next release.
For SQL we depend on Apache Calcite to include the TUMBLE keyword in its
parser and optimizer.

At the moment the only way to do window aggregations is by using the
DataStream API.

Best, Fabian
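
For reference, the same logic expressed with the DataStream API, which already supports window aggregations, looks roughly as sketched below ("AccessEvent", its getters and the "events" stream are placeholders for your own types):

    // imports assumed: org.apache.flink.api.common.functions.FilterFunction,
    // org.apache.flink.api.java.functions.KeySelector, org.apache.flink.api.java.tuple.{Tuple2, Tuple3},
    // org.apache.flink.streaming.api.windowing.time.Time, ...windows.TimeWindow,
    // ...functions.windowing.WindowFunction, org.apache.flink.util.Collector
    DataStream<Tuple3<String, String, Long>> frequentDenials = events
        .filter(new FilterFunction<AccessEvent>() {
            @Override
            public boolean filter(AccessEvent e) { return "denied".equals(e.getAction()); }
        })
        .keyBy(new KeySelector<AccessEvent, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, String> getKey(AccessEvent e) { return Tuple2.of(e.getUser(), e.getIp()); }
        })
        .timeWindow(Time.hours(1))
        .apply(new WindowFunction<AccessEvent, Tuple3<String, String, Long>, Tuple2<String, String>, TimeWindow>() {
            @Override
            public void apply(Tuple2<String, String> key, TimeWindow window,
                              Iterable<AccessEvent> input, Collector<Tuple3<String, String, Long>> out) {
                long count = 0;
                for (AccessEvent ignored : input) {
                    count++;
                }
                if (count > 5) {                    // the HAVING COUNT(...) > 5 condition
                    out.collect(Tuple3.of(key.f0, key.f1, count));
                }
            }
        });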


2016-10-12 18:51 GMT+02:00 PedroMrChaves :

> Hello,
>
> I am trying to build a query using the StreamTableEnvironment API. I am
> trying to build these queries with tableEnvironment.sql("QUERY") so that I
> can load those queries from a file in the future.
>
> Code source:
>
> Table accesses = tableEnvironment.sql
> ("SELECT STREAM TUMBLE_END(rowtime,
> INTERVAL '1' HOUR) AS rowtime,
> user,ip "
> + "FROM eventData "
> + "WHERE action='denied' "
> + "GROUP BY
> TUMBLE(rowtime, INTERVAL '1' HOUR) user,ip"
> + " HAVING COUNT(user,ip)
> > 5");
>
> But I always get the following error:
>
> /Caused by: org.apache.calcite.runtime.CalciteContextException: From line
> 1,
> column 120 to line 1, column 153: No match found for function signature
> TUMBLE(, )
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(
> Resources.java:405)
> at
> org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765)
> at
> org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(
> SqlValidatorImpl.java:3929)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.handleUnresolvedFunction(
> SqlValidatorImpl.java:1544)
> at
> org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:278)
> at
> org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:222)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(
> SqlValidatorImpl.java:4266)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(
> SqlValidatorImpl.java:4253)
> at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:135)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(
> SqlValidatorImpl.java:1462)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.
> deriveType(SqlValidatorImpl.java:1445)
> at org.apache.calcite.sql.SqlNode.validateExpr(SqlNode.java:233)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateGroupClause(
> SqlValidatorImpl.java:3305)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(
> SqlValidatorImpl.java:2959)
> at
> org.apache.calcite.sql.validate.SelectNamespace.
> validateImpl(SelectNamespace.java:60)
> at
> org.apache.calcite.sql.validate.AbstractNamespace.
> validate(AbstractNamespace.java:86)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(
> SqlValidatorImpl.java:845)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(
> SqlValidatorImpl.java:831)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:208)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(
> SqlValidatorImpl.java:807)
> at
> org.apache.calcite.sql.validate.SqlValidatorImpl.
> validate(SqlValidatorImpl.java:523)
> at
> org.apache.flink.api.table.FlinkPlannerImpl.validate(
> FlinkPlannerImpl.scala:84)
> ... 10 more
> Caused by: org.apache.calcite.sql.validate.SqlValidatorException: No match
> found for function signature TUMBLE(, )
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(
> Resources.java:405)
> at
> org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:514)
> /
>
> What am I doing wrong?
>
> Regards.
>
>
>
> --

Re: Distributing Tasks over Task manager

2016-10-12 Thread Jürgen Thomann

Hi Robert,

Thanks for your suggestions. We are using the DataStream API, and I tried
disabling chaining completely, but that didn't help.


I attached the plan. To add some context, it starts with a Kafka
source followed by a map operation (parallelism 4). The next map is the
expensive part, with a parallelism of 18, which produces a Tuple2 that is
used for splitting. From here on the parallelism is always 2, except for the
sink with 1. Both resulting streams have two maps, a filter, one more
map, and end with an assignTimestampsAndWatermarks. Where there is
a small box in the picture it is a filter operation; otherwise the stream
goes directly to a keyBy, timeWindow and apply operation followed by a sink.


If one task manager contains more subtasks of the expensive map than
any other task manager, everything later in the stream runs on that
same task manager. If two task managers have the same number of
subtasks, the following tasks with a parallelism of 2 are distributed over
the two task managers.


It is also interesting that the task managers have 6 task slots configured
and the expensive part has 6 subtasks on one task manager, yet still
everything later in the flow runs on this task manager. This also
happens if operator chaining is disabled.


Best,
Jürgen
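
For reference, the knobs mentioned in this thread look roughly like this in the DataStream API (expensiveMap is a placeholder for the costly MapFunction); note that slot sharing groups isolate slots but do not by themselves guarantee an even spread across task managers:

    // disable chaining for the whole job
    env.disableOperatorChaining();

    // or per operator:
    stream.map(expensiveMap).startNewChain();        // start a new chain at this operator
    stream.map(expensiveMap).disableChaining();      // never chain this operator

    // put the expensive operator into its own slot sharing group, so it does not
    // share slots with up-/downstream operators
    stream.map(expensiveMap).slotSharingGroup("expensive");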


On 12.10.2016 17:43, Robert Metzger wrote:

Hi Jürgen,

Are you using the DataStream or the DataSet API?
Maybe the operator chaining is causing too many operations to be 
"packed" into one task. Check out this documentation page: 
https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups 

You could try to disable chaining completely to see if that resolves 
the issue (you'll probably pay for this by having more serialization 
overhead and network traffic).


If my suggestions don't help, can you post a screenshot of your job 
plan (from the web interface) here, so that we see what operations you 
are performing?


Regards,
Robert



On Wed, Oct 12, 2016 at 12:52 PM, Jürgen Thomann 
mailto:juergen.thom...@innogames.com>> 
wrote:


Hi,

we currently have an issue with Flink, as it allocates many tasks
to the same task manager and as a result it overloads it. I
reduced the amount of task slots per task manager (keeping the CPU
count) and added some more servers but that did not help to
distribute the load.

Is there some way to force Flink to distribute the load/tasks on a
standalone cluster? I saw that
https://issues.apache.org/jira/browse/FLINK-1003
 would maybe
provide what we need, but that is currently not worked on as it seems.

Cheers,
Jürgen




Re: Flink Kafka Consumer Behaviour

2016-10-12 Thread Anchit Jatana
Hi Janardhan/Stephan,

I just figured out what the issue is (talking about Flink KafkaConnector08;
I don't know about Flink KafkaConnector09).

The reason bin/kafka-consumer-groups.sh --zookeeper
 --describe --group  is not showing any result
is the absence of

/consumers//owners/ in ZooKeeper.

The Flink application is creating and updating
/consumers//offsets// but not creating the "owners"
znode.

If I manually create the Znode using the following:

create /consumers//owners “firstchildren”

create /consumers//owners/ null

It works fine, bin/kafka-consumer-groups.sh --zookeeper 
--describe --group  starts pulling offset results for me.

I think this needs to be corrected in the application: to check and create
"/consumers//owners/" if it does not exist.

Regards,
Anchit





Re: Flink Kafka connector08 not updating the offsets with the zookeeper

2016-10-12 Thread Anchit Jatana
Hi Robert,

Thanks for your response. I just figured out what the issue is.

The reason bin/kafka-consumer-groups.sh --zookeeper
 --describe --group  is not showing any result
is the absence of

/consumers//owners/ in ZooKeeper.

The Flink application is creating and updating
/consumers//offsets// but not creating the "owners"
znode.

If I manually create the Znode using the following:

create /consumers//owners “firstchildren”

create /consumers//owners/ null

It works fine, bin/kafka-consumer-groups.sh --zookeeper 
--describe --group  starts pulling offset results for me.

I think this needs to be corrected in the application: to check and create
"/consumers//owners/" if it does not exist.

Regards,
Anchit





Error with table sql query - No match found for function signature TUMBLE(, )

2016-10-12 Thread PedroMrChaves
Hello,

I am trying to build a query using the StreamTableEnvironment API. I am
trying to build these queries with tableEnvironment.sql("QUERY") so that I
can load those queries from a file in the future.

Code source:

Table accesses = tableEnvironment.sql 
("SELECT STREAM TUMBLE_END(rowtime, INTERVAL 
'1' HOUR) AS rowtime,
user,ip "
+ "FROM eventData "
+ "WHERE action='denied' "
+ "GROUP BY TUMBLE(rowtime, 
INTERVAL '1' HOUR) user,ip"
+ " HAVING COUNT(user,ip) > 5");

But I always get the following error:

/Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 120 to line 1, column 153: No match found for function signature
TUMBLE(, )
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405)
at
org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765)
at
org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3929)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.handleUnresolvedFunction(SqlValidatorImpl.java:1544)
at
org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:278)
at
org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:222)
at
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4266)
at
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4253)
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:135)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1462)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1445)
at org.apache.calcite.sql.SqlNode.validateExpr(SqlNode.java:233)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateGroupClause(SqlValidatorImpl.java:3305)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:2959)
at
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:86)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:845)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:831)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:208)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:807)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:523)
at
org.apache.flink.api.table.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:84)
... 10 more
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature TUMBLE(, )
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405)
at
org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:514)
/

What am I doing wrong?

Regards.





Re: bucketing in RollingSink

2016-10-12 Thread robert.lancaster
Hi Robert,

Thanks!   I’ll likely pursue option #2 and see if I can copy over the code from 
org.apache.flink….fs.bucketing.

Do you know a general timeline for when 1.2 will be released or perhaps a 
location where I could follow its progress?

Thanks again!
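
For anyone following along, this is roughly what an element-aware bucketer could look like, modelled on the 1.2-SNAPSHOT bucketing interface (the interface and method names may still change before the release; WindowedMetric and getWindowLabel() are hypothetical placeholders for your own record type):

    import org.apache.flink.streaming.connectors.fs.Clock;
    import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
    import org.apache.hadoop.fs.Path;

    public class WindowLabelBucketer implements Bucketer<WindowedMetric> {
        @Override
        public Path getBucketPath(Clock clock, Path basePath, WindowedMetric element) {
            // derive the bucket from the record itself instead of from system time
            return new Path(basePath, element.getWindowLabel());
        }
    }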

From: Robert Metzger 
Reply-To: "user@flink.apache.org" 
Date: Wednesday, October 12, 2016 at 5:50 PM
To: "user@flink.apache.org" 
Subject: Re: bucketing in RollingSink

Hi Robert,

I see two possible workarounds:
1) You use the unreleased Flink 1.2-SNAPSHOT version. From time to time, there 
are some unstable commits in that version, but most of the time, its quite 
stable.
We provide nightly binaries and maven artifacts for snapshot versions here: 
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

2) If that's too risky for you, you can also copy the code of the new 
(1.2-SNAPSHOT) bucketer from master into your project. The Apache license 
allows you to copy the code into your own projects, and I think the bucketer 
code doesn't rely on any features / APIs added in 1.2-SNAPSHOT, so you can 
probably run the new code on Flink 1.1 as well.

I hope that helps,

Regards,
Robert


On Wed, Oct 12, 2016 at 2:10 PM, 
mailto:robert.lancas...@hyatt.com>> wrote:
Hi Flinksters,

At one stage in my data stream, I want to save the stream to a set of rolling 
files where the file name used (i.e. the bucket) is chosen based on an 
attribute of each data record.  Specifically, I’m using a windowing function to 
create aggregates of certain metrics and I want to save that data in a file 
with a name that identifies the window.

I was planning to write my own bucketer for this, but in version 1.1.2 the 
Bucketer interface doesn’t allow for the element being processed to be passed 
to the relevant methods (e.g. getNextBucketPath and shouldStartNewBucket).  I 
see that this is taken care of in 1.2, but since that isn’t available yet, can 
anyone recommend a workaround?  Alternatively, is there a way to have the 
DateTimeBucketer use assigned timestamps instead of system time?







Re: bucketing in RollingSink

2016-10-12 Thread Robert Metzger
Hi Robert,

I see two possible workarounds:
1) You use the unreleased Flink 1.2-SNAPSHOT version. From time to time,
there are some unstable commits in that version, but most of the time, its
quite stable.
We provide nightly binaries and maven artifacts for snapshot versions here:
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

2) If that's too risky for you, you can also copy the code of the new
(1.2-SNAPSHOT) bucketer from master into your project. The Apache license
allows you to copy the code into your own projects, and I think the
bucketer code doesn't rely on any features / APIs added in 1.2-SNAPSHOT, so
you can probably run the new code on Flink 1.1 as well.

I hope that helps,

Regards,
Robert


On Wed, Oct 12, 2016 at 2:10 PM,  wrote:

> Hi Flinksters,
>
>
>
> At one stage in my data stream, I want to save the stream to a set of
> rolling files where the file name used (i.e. the bucket) is chosen based on
> an attribute of each data record.  Specifically, I’m using a windowing
> function to create aggregates of certain metrics and I want to save that
> data in a file with a name that identifies the window.
>
>
>
> I was planning to write my own bucketer for this, but in version 1.1.2 the
> Bucketer interface doesn’t allow for the element being processed to be
> passed to the relevant methods (e.g. getNextBucketPath and
> shouldStartNewBucket).  I see that this is taken care of in 1.2, but since
> that isn’t available yet, can anyone recommend a workaround?
> Alternatively, is there a way to have the DateTimeBucketer use assigned
> timestamps instead of system time?
>
>


Re: Distributing Tasks over Task manager

2016-10-12 Thread Robert Metzger
Hi Jürgen,

Are you using the DataStream or the DataSet API?
Maybe the operator chaining is causing too many operations to be "packed"
into one task. Check out this documentation page:
https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups

You could try to disable chaining completely to see if that resolves the
issue (you'll probably pay for this by having more serialization overhead
and network traffic).

If my suggestions don't help, can you post a screenshot of your job plan
(from the web interface) here, so that we see what operations you are
performing?

Regards,
Robert



On Wed, Oct 12, 2016 at 12:52 PM, Jürgen Thomann <
juergen.thom...@innogames.com> wrote:

> Hi,
>
> we currently have an issue with Flink, as it allocates many tasks to the
> same task manager and as a result it overloads it. I reduced the amount of
> task slots per task manager (keeping the CPU count) and added some more
> servers but that did not help to distribute the load.
>
> Is there some way to force Flink to distribute the load/tasks on a
> standalone cluster? I saw that https://issues.apache.org/jira
> /browse/FLINK-1003 would maybe provide what we need, but that is
> currently not worked on as it seems.
>
> Cheers,
> Jürgen
>


Re: About Sliding window

2016-10-12 Thread Kostas Kloudas
Hello,

So in your previous figure (yesterday) when e3 arrives, also e2 should be 
included in the result, right?

In this case, I think that what you need is a Session window with gap equal to 
your event aging duration and
an evictor that evicts the elements that lag behind more than the gap duration.

The latter, the evictor that I am describing, is not currently supported in 
Flink but there is an ongoing 
discussion in the dev mailing list about it. So it is worth having a look there 
and participate in the discussion.

I also loop in Aljoscha in the discussion, in case he has another solution that 
you can deploy right-away.

Thanks,
Kostas

> On Oct 12, 2016, at 3:36 PM, Zhangrucong  wrote:
> 
> Hi Kostas:
> It doesn’t matter. Can you see the picture? My use case is:
>  
> 1、The events are coming according to the following order
> 
> At 9:01 e1 is coming
> At 9:02 e2 is coming
> At 9:06  e3 is coming
> At 9:08   e4 is coming
>  
> The time is system time.
>  
> 2、And  event aging time is 5 minutes.
>  
> 3、
>At 9:01 e1 is coming, aged nothing, store e1,we count e1 and send the 
> result.
>At 9:02 e2 is coming,  aged nothing, store e2,  We count e1 and e2. 
> and send the result.
>   At 9:06  e3 is coming,  aged e1,  store e3, we count e2 and e3, and 
> send the result.
>  At 9:08   e4 is coming,  aged e2,  store e4, we count e3 and e4, and 
> send the result.
>  
>  
> I think I need a certain duration window.
>  
> Thank you very much!
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
> Sent: October 12, 2016 21:11
> To: Zhangrucong
> Cc: user@flink.apache.org
> Subject: Re: About Sliding window
>  
> Hello again,
>  
> Sorry for the delay but I cannot really understand your use case.
> Could you explain a bit more what do you mean by “out-of-date” event and 
> “aging” an event?
>  
> Also your windows are of a certain duration or global?
>  
> Thanks,
> Kostas
>  
> On Oct 11, 2016, at 3:04 PM, Zhangrucong  > wrote:
>  
> Hi Kostas:
> Thank you for your rapid response!
>  
> My use-case is that :
> For every incoming event, we want to age the out-of-date event , count the 
> event in window and send the result.
>  
> For example:
> The events arrive as follows:
> 
>  
> We want the following result:
> 
>  
>  
> By the way, in the StreamSQL API, FLIP-11 will introduce row windows. It seems
> that the Slide Event-time row-window function suits my use case. Does the
> DataStream API support row windows?
>  
> Thanks !
>  
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
> Sent: October 11, 2016 19:38
> To: user@flink.apache.org
> Subject: Re: About Sliding window
>  
> Hi Zhangrucong,
>  
> Sliding windows only support time-based slide. 
> So your use-case is not supported out-of-the-box.
>  
> But, if you describe a bit more what you want to do, 
> we may be able to find a way together to do your job using 
> the currently offered functionality.
>  
> Kostas
>  
> On Oct 11, 2016, at 1:20 PM, Zhangrucong  > wrote:
>  
> Hello everyone:
>   I want to use the DataStream sliding window API. I looked at the API and
> I have a question: does the sliding time window support sliding on every
> incoming event?
>  
> Thanks in advance!
>  
> 
>  
> 



Keyed join Flink Streaming

2016-10-12 Thread Adrienne Kole
Hi,

I have 2 streams which are partitioned based on a key field. I want to join
those streams based on key fields over windows. This is an example I saw on
the Flink website:

val firstInput: DataStream[MyType] = ...
val secondInput: DataStream[AnotherType] = ...

val firstKeyed = firstInput.keyBy("userId")
val secondKeyed = secondInput.keyBy("id")

val result: DataStream[(MyType, AnotherType)] =
   firstKeyed.join(secondKeyed)
   onWindow(Time.of(5, SECONDS))

However, with the current Flink version (1.1.2) I cannot do it. Basically,
whether the streams are keyed or not, I still have to specify the "where" and
"equalTo" clauses.

My question is: how can I implement keyed window joins in Flink
streaming? And is there a difference between:

val firstInput: KeyedStream[MyType] = ...
val secondInput: KeyedStream[AnotherType] = ...
val result: DataStream[(MyType, AnotherType)] =
   firstKeyed.join(secondKeyed).where(..).equalTo(..).window(..).apply(..)

and


val firstInput: DataStream[MyType] = ...
val secondInput: DataStream[AnotherType] = ...
val result: DataStream[(MyType, AnotherType)] =
   firstKeyed.join(secondKeyed).where(..).equalTo(..).window(..).apply(..)


Thanks
Adrienne
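
For what it's worth, this is roughly how a windowed join currently looks in 1.1 with the Java API; whether or not the inputs were keyed beforehand, the join key is the one given via where()/equalTo(). MyType, AnotherType and their getters are placeholders:

    DataStream<Tuple2<MyType, AnotherType>> joined = firstInput
        .join(secondInput)
        .where(new KeySelector<MyType, String>() {
            @Override
            public String getKey(MyType v) { return v.getUserId(); }
        })
        .equalTo(new KeySelector<AnotherType, String>() {
            @Override
            public String getKey(AnotherType v) { return v.getId(); }
        })
        .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
        .apply(new JoinFunction<MyType, AnotherType, Tuple2<MyType, AnotherType>>() {
            @Override
            public Tuple2<MyType, AnotherType> join(MyType a, AnotherType b) {
                return Tuple2.of(a, b);
            }
        });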


Re: Tumbling window rich functionality

2016-10-12 Thread Robert Metzger
Hi,
apply() will be called for each key.

On Wed, Oct 12, 2016 at 2:25 PM, Swapnil Chougule 
wrote:

> Thanks Aljoscha.
>
> Whenever I am using WindowFunction.apply() on keyed stream, apply() will
> be called once or multiple times (equal to number of keys in that windowed
> stream)?
>
> Ex:
> DataStream dataStream = env
> .socketTextStream("localhost", )
> .flatMap(new Splitter())
> .keyBy(0)
> .timeWindow(Time.seconds(10))
> .apply(new WindowFunction,
> Boolean, Tuple, TimeWindow>() {
>
> @Override
> public void apply(Tuple key, TimeWindow window,
> Iterable> input,
> Collector out) throws Exception {
>  //Some business logic
> }
> });
>
> Regards,
> Swapnil
>
> On Wed, Sep 14, 2016 at 2:26 PM, Aljoscha Krettek 
> wrote:
>
>> Hi,
>> WindowFunction.apply() will be called once for each window so you should
>> be able to do the setup/teardown in there. open() and close() are called at
>> the start of processing, end of processing, respectively.
>>
>> Cheers,
>> Aljoscha
>>
>> On Wed, 14 Sep 2016 at 09:04 Swapnil Chougule 
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am using tumbling window functionality having window size 5 minutes.
>>> I want to perform setup & teardown functionality for each window. I
>>> tried using RichWindowFunction but it didn't work for me.
>>> Can anybody tell me how can I do it ?
>>>
>>> Attaching code snippet what I tried
>>>
>>> impressions.map(new LineItemAdUnitAggr()).keyBy(0)
>>> .timeWindow(Time.seconds(300)).apply(new 
>>> RichWindowFunction,Long>,
>>> Boolean, Tuple, TimeWindow>() {
>>>
>>> @Override
>>> public void open(Configuration parameters) throws
>>> Exception {
>>> super.open(parameters);
>>> //setup method
>>> }
>>>
>>> public void apply(Tuple key, TimeWindow window,
>>> Iterable,
>>> Long>> input,
>>> Collector out) throws Exception {
>>> //do processing
>>> }
>>>
>>> @Override
>>> public void close() throws Exception {
>>> //tear down method
>>> super.close();
>>> }
>>> });
>>>
>>> Thanks,
>>> Swapnil
>>>
>>
>


re: About Sliding window

2016-10-12 Thread Zhangrucong
Hi Kostas:
It doesn’t matter. Can you see the picture? My use case is:

1、The events are coming according to the following order
At 9:01 e1 is coming
At 9:02 e2 is coming
At 9:06  e3 is coming
At 9:08   e4 is coming

The time is system time.

2、And  event aging time is 5 minutes.

3、
   At 9:01 e1 is coming, aged nothing, store e1,we count e1 and send the 
result.
   At 9:02 e2 is coming,  aged nothing, store e2,  We count e1 and e2. and 
send the result.
  At 9:06  e3 is coming,  aged e1,  store e3, we count e2 and e3, and send 
the result.
 At 9:08   e4 is coming,  aged e2,  store e4, we count e3 and e4, and send 
the result.


I think I need a certain duration window.

Thank you very much!
From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again,

Sorry for the delay but I cannot really understand your use case.
Could you explain a bit more what do you mean by “out-of-date” event and 
“aging” an event?

Also your windows are of a certain duration or global?

Thanks,
Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong 
mailto:zhangruc...@huawei.com>> wrote:

Hi Kostas:
Thank you for your rapid response!

My use-case is that :
For every incoming event, we want to age the out-of-date event , count the 
event in window and send the result.

For example:
The events arrive as follows:


We want the following result:



By the way, in the StreamSQL API, FLIP-11 will introduce row windows. It seems
that the Slide Event-time row-window function suits my use case. Does the
DataStream API support row windows?

Thanks !

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong,

Sliding windows only support time-based slide.
So your use-case is not supported out-of-the-box.

But, if you describe a bit more what you want to do,
we may be able to find a way together to do your job using
the currently offered functionality.

Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong 
mailto:zhangruc...@huawei.com>> wrote:

Hello everyone:
  I want to use the DataStream sliding window API. I looked at the API and I
have a question: does the sliding time window support sliding on every incoming
event?

Thanks in advance!







Re: About Sliding window

2016-10-12 Thread Kostas Kloudas
Hello again,

Sorry for the delay but I cannot really understand your use case.
Could you explain a bit more what do you mean by “out-of-date” event and 
“aging” an event?

Also your windows are of a certain duration or global?

Thanks,
Kostas

> On Oct 11, 2016, at 3:04 PM, Zhangrucong  wrote:
> 
> Hi Kostas:
> Thank you for your rapid response!
>  
> My use-case is that :
> For every incoming event, we want to age the out-of-date event , count the 
> event in window and send the result.
>  
> For example:
> The events arrive as follows:
> 
>  
> We want the following result:
> 
>  
>  
> By the way, in the StreamSQL API, FLIP-11 will introduce row windows. It seems
> that the Slide Event-time row-window function suits my use case. Does the
> DataStream API support row windows?
>  
> Thanks !
>  
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
> Sent: October 11, 2016 19:38
> To: user@flink.apache.org
> Subject: Re: About Sliding window
>  
> Hi Zhangrucong,
>  
> Sliding windows only support time-based slide. 
> So your use-case is not supported out-of-the-box.
>  
> But, if you describe a bit more what you want to do, 
> we may be able to find a way together to do your job using 
> the currently offered functionality.
>  
> Kostas
>  
> On Oct 11, 2016, at 1:20 PM, Zhangrucong  > wrote:
>  
> Hello everyone:
>   I want to use the DataStream sliding window API. I looked at the API and
> I have a question: does the sliding time window support sliding on every
> incoming event?
>  
> Thanks in advance!
>  
> 



Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Flavio Pompermaier
Ok, thanks for the update, Ufuk! Let me know if you need tests or anything!

Best,
Flavio

On Wed, Oct 12, 2016 at 11:26 AM, Ufuk Celebi  wrote:

> No, sorry. I was waiting for Tarandeep's feedback before looking into
> it further. I will do it over the next days in any case.
>
> On Wed, Oct 12, 2016 at 10:49 AM, Flavio Pompermaier
>  wrote:
> > Hi Ufuk,
> > any news on this?
> >
> > On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi  wrote:
> >>
> >> I guess that this is caused by a bug in the checksum calculation. Let
> >> me check that.
> >>
> >> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier <
> pomperma...@okkam.it>
> >> wrote:
> >> > I've ran the job once more (always using the checksum branch) and this
> >> > time
> >> > I got:
> >> >
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
> >> > at
> >> >
> >> > org.apache.flink.api.common.typeutils.base.EnumSerializer.
> deserialize(EnumSerializer.java:83)
> >> > at
> >> >
> >> > org.apache.flink.api.common.typeutils.base.EnumSerializer.
> deserialize(EnumSerializer.java:32)
> >> > at
> >> >
> >> > org.apache.flink.api.java.typeutils.runtime.
> PojoSerializer.deserialize(PojoSerializer.java:431)
> >> > at
> >> >
> >> > org.apache.flink.api.java.typeutils.runtime.
> TupleSerializer.deserialize(TupleSerializer.java:135)
> >> > at
> >> >
> >> > org.apache.flink.api.java.typeutils.runtime.
> TupleSerializer.deserialize(TupleSerializer.java:30)
> >> > at
> >> >
> >> > org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(
> ChannelReaderInputViewIterator.java:100)
> >> > at
> >> >
> >> > org.apache.flink.runtime.operators.sort.MergeIterator$
> HeadStream.nextHead(MergeIterator.java:161)
> >> > at
> >> >
> >> > org.apache.flink.runtime.operators.sort.MergeIterator.
> next(MergeIterator.java:113)
> >> > at
> >> >
> >> > org.apache.flink.runtime.operators.util.metrics.
> CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
> >> > at
> >> >
> >> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.
> advanceToNext(NonReusingKeyGroupedIterator.java:130)
> >> > at
> >> >
> >> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.
> access$300(NonReusingKeyGroupedIterator.java:32)
> >> > at
> >> >
> >> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$
> ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
> >> > at
> >> >
> >> > org.okkam.entitons.mapping.flink.IndexMappingExecutor$
> TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
> >> > at
> >> >
> >> > org.apache.flink.runtime.operators.GroupReduceDriver.
> run(GroupReduceDriver.java:131)
> >> > at org.apache.flink.runtime.operators.BatchTask.run(
> BatchTask.java:486)
> >> > at
> >> > org.apache.flink.runtime.operators.BatchTask.invoke(
> BatchTask.java:351)
> >> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
> >> > at java.lang.Thread.run(Thread.java:745)
> >> >
> >> >
> >> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi  wrote:
> >> >>
> >> >> Yes, if that's the case you should go with option (2) and run with
> the
> >> >> checksums I think.
> >> >>
> >> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
> >> >>  wrote:
> >> >> > The problem is that data is very large and usually cannot run on a
> >> >> > single
> >> >> > machine :(
> >> >> >
> >> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi 
> wrote:
> >> >> >>
> >> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh
> >> >> >> 
> >> >> >> wrote:
> >> >> >> > @Stephan my flink cluster setup- 5 nodes, each running 1
> >> >> >> > TaskManager.
> >> >> >> > Slots
> >> >> >> > per task manager: 2-4 (I tried varying this to see if this has
> any
> >> >> >> > impact).
> >> >> >> > Network buffers: 5k - 20k (tried different values for it).
> >> >> >>
> >> >> >> Could you run the job first on a single task manager to see if the
> >> >> >> error occurs even if no network shuffle is involved? That should
> be
> >> >> >> less overhead for you than running the custom build (which might
> be
> >> >> >> buggy ;)).
> >> >> >>
> >> >> >> – Ufuk
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >
> >
> >
>


Re: Tumbling window rich functionality

2016-10-12 Thread Swapnil Chougule
Thanks Aljoscha.

Whenever I am using WindowFunction.apply() on keyed stream, apply() will be
called once or multiple times (equal to number of keys in that windowed
stream)?

Ex:
DataStream dataStream = env
.socketTextStream("localhost", )
.flatMap(new Splitter())
.keyBy(0)
.timeWindow(Time.seconds(10))
.apply(new WindowFunction, Boolean,
Tuple, TimeWindow>() {

@Override
public void apply(Tuple key, TimeWindow window,
Iterable> input,
Collector out) throws Exception {
 //Some business logic
}
});

Regards,
Swapnil

On Wed, Sep 14, 2016 at 2:26 PM, Aljoscha Krettek 
wrote:

> Hi,
> WindowFunction.apply() will be called once for each window so you should
> be able to do the setup/teardown in there. open() and close() are called at
> the start of processing, end of processing, respectively.
>
> Cheers,
> Aljoscha
>
> On Wed, 14 Sep 2016 at 09:04 Swapnil Chougule 
> wrote:
>
>> Hi Team,
>>
>> I am using tumbling window functionality having window size 5 minutes.
>> I want to perform setup & teardown functionality for each window. I tried
>> using RichWindowFunction but it didn't work for me.
>> Can anybody tell me how can I do it ?
>>
>> Attaching code snippet what I tried
>>
>> impressions.map(new LineItemAdUnitAggr()).keyBy(0)
>> .timeWindow(Time.seconds(300)).apply(new 
>> RichWindowFunction,Long>,
>> Boolean, Tuple, TimeWindow>() {
>>
>> @Override
>> public void open(Configuration parameters) throws
>> Exception {
>> super.open(parameters);
>> //setup method
>> }
>>
>> public void apply(Tuple key, TimeWindow window,
>> Iterable, Long>>
>> input,
>> Collector out) throws Exception {
>> //do processing
>> }
>>
>> @Override
>> public void close() throws Exception {
>> //tear down method
>> super.close();
>> }
>> });
>>
>> Thanks,
>> Swapnil
>>
>


bucketing in RollingSink

2016-10-12 Thread robert.lancaster
Hi Flinksters,

At one stage in my data stream, I want to save the stream to a set of rolling 
files where the file name used (i.e. the bucket) is chosen based on an 
attribute of each data record.  Specifically, I’m using a windowing function to 
create aggregates of certain metrics and I want to save that data in a file 
with a name that identifies the window.

I was planning to write my own bucketer for this, but in version 1.1.2 the 
Bucketer interface doesn’t allow for the element being processed to be passed 
to the relevant methods (e.g. getNextBucketPath and shouldStartNewBucket).  I 
see that this is taken care of in 1.2, but since that isn’t available yet, can 
anyone recommend a workaround?  Alternatively, is there a way to have the 
DateTimeBucketer use assigned timestamps instead of system time?




AW: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-12 Thread Claudia Wegmann
Hey,

I face the same problem and decided to go with your third solution. I use 
Groovy as the scripting language, which has access to Java classes and 
therefore also to Flink constructs like Time.seconds(10). See below for an 
example of a pattern definition with Groovy:

private static Binding bind = new Binding();
private static GroovyShell gs = new GroovyShell(bind);

Pattern dynPattern = Pattern
.begin(two.getState()).where((FilterFunction) 
bds -> {
bind.setVariable("bds", bds);
Object ergPat = gs.evaluate(two.getWhere());
return ( (ergPat instanceof Boolean)) ?(Boolean)ergPat : false;
})
.within((gs.evaluate(two.getWithin()) instanceof Time) ? 
(Time)gs.evaluate(two.getWithin()) : null);

Sorry, I'm not sure how to format this nicely.
I don't know if it's the best way, but it works :)

Best,
Claudia
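
As a stand-alone illustration of the idea, stripped of the Flink/CEP parts (the expression string, the bound variable name and the Event POJO below are made up):

    import groovy.lang.Binding;
    import groovy.lang.GroovyShell;

    public class GroovyPredicateDemo {

        // minimal placeholder type so the snippet compiles on its own
        public static class Event {
            public final int severity;
            public final String type;
            public Event(int severity, String type) { this.severity = severity; this.type = type; }
        }

        public static void main(String[] args) {
            Binding binding = new Binding();
            GroovyShell shell = new GroovyShell(binding);

            // the "where" clause, loaded at runtime (e.g. from a file or database)
            String where = "event.severity > 3 && event.type == 'LOGIN'";

            binding.setVariable("event", new Event(5, "LOGIN"));
            Object result = shell.evaluate(where);
            boolean matches = (result instanceof Boolean) && (Boolean) result;

            System.out.println(matches); // prints: true
        }
    }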

-----Original Message-----
From: PedroMrChaves [mailto:pedro.mr.cha...@gmail.com]
Sent: Wednesday, October 12, 2016 10:32
To: user@flink.apache.org
Subject: Re: What is the best way to load/add patterns dynamically (at runtime)
with Flink?

I've been thinking in several options to solve this problem:

1. I can use Flink savepoints in order to save the application state, change
the jar file and submit a new job (with the new jar file containing the patterns
added/changed). The problem in this case is being able to correctly handle the
savepoints, and because I must stop and start the job, the events will be
delayed.

2. I can compile Java code at runtime using the Java compiler library. I don't
know if this would be a viable solution.

3. I can use a scripting language like you did, but I would lose the ability to
use the native Flink library, which is available in Scala and Java.







Distributing Tasks over Task manager

2016-10-12 Thread Jürgen Thomann

Hi,

we currently have an issue with Flink, as it allocates many tasks to the 
same task manager and as a result it overloads it. I reduced the amount 
of task slots per task manager (keeping the CPU count) and added some 
more servers but that did not help to distribute the load.


Is there some way to force Flink to distribute the load/tasks on a 
standalone cluster? I saw that 
https://issues.apache.org/jira/browse/FLINK-1003 would maybe provide 
what we need, but that is currently not worked on as it seems.


Cheers,
Jürgen


Re: jdbc.JDBCInputFormat

2016-10-12 Thread sunny patel
Hi guys,

I am facing the following error message in a Flink Scala JDBC word count.
Could you please advise me on this?

Information: 12/10/2016, 10:43 - Compilation completed with 2 errors and 0
warnings in 1s 903ms
/Users/janaidu/faid/src/main/scala/fgid/JDBC.scala

Error:(17, 67) can't expand macros compiled by previous versions of Scala
val stringColum: TypeInformation[Int] = createTypeInformation[Int]

Error:(29, 33) can't expand macros compiled by previous versions of Scala
val dataset =env.createInput(inputFormat)


 code


package DataSources

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.api.table.typeutils.RowTypeInfo

object WordCount {
  def main(args: Array[String]) {

val PATH = getClass.getResource("").getPath

// set up the execution environment
val env = ExecutionEnvironment.getExecutionEnvironment

// Read data from JDBC (Kylin in our case)
val stringColum: TypeInformation[Int] = createTypeInformation[Int]
val DB_ROWTYPE = new RowTypeInfo(Seq(stringColum))

val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
  .setDrivername("org.postgresql.jdbc.Driver")
  .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
  .setUsername("MI")
  .setPassword("MI")
  .setQuery("select * FROM identity")
  .setRowTypeInfo(DB_ROWTYPE)
  .finish()

val dataset =env.createInput(inputFormat)
dataset.print()

println(PATH)
  }
}


---- pom.xml ----

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <parent>
      <artifactId>flink-parent</artifactId>
      <groupId>org.apache.flink</groupId>
      <version>1.2-SNAPSHOT</version>
   </parent>

   <groupId>org.apache.flink.quickstart</groupId>
   <artifactId>flink-scala-project</artifactId>
   <version>0.1</version>
   <packaging>jar</packaging>

   <name>Flink Quickstart Job</name>
   <url>http://www.myorganization.org</url>

   <repositories>
      <repository>
         <id>apache.snapshots</id>
         <name>Apache Development Snapshot Repository</name>
         <url>https://repository.apache.org/content/repositories/snapshots/</url>
         <releases>
            <enabled>false</enabled>
         </releases>
      </repository>
   </repositories>

   <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <flink.version>1.1.2</flink.version>
   </properties>

   <dependencies>
      <dependency>
         <groupId>org.apache.flink</groupId>
         <artifactId>flink-jdbc</artifactId>
         <version>${flink.version}</version>
      </dependency>
      <dependency>
         <groupId>org.apache.flink</groupId>
         <artifactId>flink-table_2.10</artifactId>
         <version>${flink.version}</version>
      </dependency>
      <dependency>
         <groupId>org.apache.flink</groupId>
         <artifactId>flink-scala_2.10</artifactId>
         <version>${flink.version}</version>
      </dependency>
      <dependency>
         <groupId>org.apache.flink</groupId>
         <artifactId>flink-streaming-scala_2.10</artifactId>
         <version>${flink.version}</version>
      </dependency>
      <dependency>
         <groupId>org.apache.flink</groupId>
         <artifactId>flink-clients_2.10</artifactId>
         <version>${flink.version}</version>
      </dependency>
   </dependencies>

   <profiles>
      <profile>
         <id>build-jar</id>
         <dependencies>
            <dependency>
               <groupId>org.apache.flink</groupId>
               <artifactId>flink-scala_2.10</artifactId>
               <version>${flink.version}</version>
               <scope>provided</scope>
            </dependency>
            <dependency>
               <groupId>org.apache.flink</groupId>
               <artifactId>flink-streaming-scala_2.10</artifactId>
               <version>${flink.version}</version>
               <scope>provided</scope>
            </dependency>
            <dependency>
               <groupId>org.apache.flink</groupId>
               <artifactId>flink-clients_2.10</artifactId>
               <version>${flink.version}</version>
               <scope>provided</scope>
            </dependency>
         </dependencies>
         <build>
            <plugins>
               <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-shade-plugin</artifactId>
                  <version>2.4.1</version>
                  <executions>
                     <execution>
                        <phase>package</phase>
                        <goals>
                           <goal>shade</goal>
                        </goals>
                        <configuration>
                           <artifactSet>
                              <excludes/>
                           </artifactSet>
                        </configuration>
                     </execution>
                  </executions>
               </plugin>
            </plugins>
         </build>
      </profile>
   </profiles>

   <build>
      <plugins>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.1</version>
            <executions>
               <execution>
                  <phase>package</phase>
                  <goals>
                     <goal>shade</goal>
                  </goals>
                  <configuration>
                     <artifactSet>
                        <excludes>
                           <exclude>org.apache.flink:flink-annotations</exclude>
                           <exclude>org.apache.flink:flink-shaded-hadoop1_2.10</exclude>
                           <exclude>org.apache.flink:flink-shaded-hadoop2</exclude>
                           <exclude>org.apache.flink:flink-shaded-curator-recipes</exclude>
                           <exclude>org.apache.flink:flink-core</exclude>
                           <exclude>org.apache.flink:flink-java</exclude>
                           <exclude>org.apache.flink:flink-scala_2.10</exclude>
                           <exclude>org.apache.flink:flink-runtime_2.10</exclude>
                           <exclude>org.apache.flink:flink-optimizer_2.10</exclude>
                           <exclude>org.apache.flink:flink-clients_2.10</exclude>
                           <exclude>org.apache.flink:flink-avro_2.10</exclude>
                           <exclude>org.apache.flink:flink-examples-batch_2.10</exclude>
                           <exclude>org.apache.flink:flink-examples-streaming_2.10</exclude>
                           <exclude>org.apache.flink:flink-streaming-java_2.10</exclude>
                           <exclude>org.scala-lang:scala-library</exclude>
                           <exclude>org.scala-lang:scala-compiler</exclude>

Re: Flink Kafka connector08 not updating the offsets with the zookeeper

2016-10-12 Thread Robert Metzger
Hi Anchit,

Can you re-run your job with the log level for Flink set to DEBUG?
Then, you should see the following log message every time the offsets are
committed to ZooKeeper:

"Committing offsets to Kafka/ZooKeeper for checkpoint"

Alternatively, can you check whether the offsets are available in Zookeeper
once the first checkpoint completed?
They should be located in

/consumers/<consumer-group>/offsets/<topic>/<partition>

In Flink 1.2-SNAPSHOT, Flink is exposing the current offset as a metric for
all kafka connector versions.
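
As a point of reference, here is a minimal sketch of a connector08 job with checkpointing enabled (topic name, group id and addresses are made up, not taken from this thread); with the 0.8 connector, the offsets are committed to ZooKeeper whenever a checkpoint completes:

import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class OffsetCommitSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Offsets are only committed to ZooKeeper when a checkpoint completes,
        // so checkpointing has to be enabled for them to show up there.
        env.enableCheckpointing(5000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-consumer-group");

        env.addSource(new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("Kafka 0.8 offset commit example");
    }
}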

Regards,
Robert


On Wed, Oct 12, 2016 at 2:35 AM, Anchit Jatana  wrote:

> Hi All,
>
> I'm using Flink Kafka connector08. I need to check/monitor the offsets of
> the my flink application's kafka consumer.
>
> When running this:
>
> bin/kafka-consumer-groups.sh --zookeeper <zookeeper-connect> --describe
> --group <consumer-group>
>
> I get the message: No topic available for consumer group provided. Why is
> the consumer not updating the offsets with ZooKeeper?
>
> PS: I have enabled checkpointing. Is there any configuration that I'm
> missing or is this some sort of a bug?
>
> Using Flink version 1.1.2
>
> Thank you
>
> Regards,
> Anchit
>


Re: "Slow ReadProcessor" warnings when using BucketSink

2016-10-12 Thread Robert Metzger
Hi,
I haven't seen this error before. Also, I didn't find anything helpful
searching for the error on Google.

Did you also check the GC times for Flink? Is your Flink job doing any
heavy tasks (like maintaining large windows, or other operations involving
a lot of heap space)?

Regards,
Robert


On Tue, Oct 11, 2016 at 10:51 AM, static-max 
wrote:

> Hi,
>
> I have a low-throughput job (approx. 1000 messages per minute) that
> consumes from Kafka and writes directly to HDFS. After an hour or so, I get
> the following warnings in the Task Manager log:
>
> 2016-10-10 01:59:44,635 WARN  org.apache.hadoop.hdfs.DFSClient
>- Slow ReadProcessor read fields took 30001ms
> (threshold=30000ms); ack: seqno: 66 reply: SUCCESS reply: SUCCESS reply:
> SUCCESS downstreamAckTimeNanos: 1599276 flag: 0 flag: 0 flag: 0, targets:
> [DatanodeInfoWithStorage[Node1, Node2, Node3]]
> 2016-10-10 02:04:44,635 WARN  org.apache.hadoop.hdfs.DFSClient
>- Slow ReadProcessor read fields took 30002ms
> (threshold=30000ms); ack: seqno: 13 reply: SUCCESS reply: SUCCESS reply:
> SUCCESS downstreamAckTimeNanos: 2394027 flag: 0 flag: 0 flag: 0, targets:
> [DatanodeInfoWithStorage[Node1, Node2, Node3]]
> 2016-10-10 02:05:14,635 WARN  org.apache.hadoop.hdfs.DFSClient
>- Slow ReadProcessor read fields took 30001ms
> (threshold=30000ms); ack: seqno: 17 reply: SUCCESS reply: SUCCESS reply:
> SUCCESS downstreamAckTimeNanos: 2547467 flag: 0 flag: 0 flag: 0, targets:
> [DatanodeInfoWithStorage[Node1, Node2, Node3]]
>
> I have not found any errors or warnings at the datanodes or the namenode.
> Every other application using HDFS performs fine. I have very little load
> and the network latency is fine as well. I also checked GC and disk I/O.
>
> The files written are very small (only a few MB), so writing the blocks
> should be fast.
>
> The threshold is only crossed by 1 or 2 ms, which makes me wonder.
>
> Does anyone have an Idea where to look next or how to fix these warnings?
>


Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Ufuk Celebi
No, sorry. I was waiting for Tarandeep's feedback before looking into
it further. I will do it over the next few days in any case.

On Wed, Oct 12, 2016 at 10:49 AM, Flavio Pompermaier
 wrote:
> Hi Ufuk,
> any news on this?
>
> On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi  wrote:
>>
>> I guess that this is caused by a bug in the checksum calculation. Let
>> me check that.
>>
>> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier 
>> wrote:
>> > I've run the job once more (always using the checksum branch) and this
>> > time I got:
>> >
>> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
>> > at
>> >
>> > org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
>> > at
>> >
>> > org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:32)
>> > at
>> >
>> > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
>> > at
>> >
>> > org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:135)
>> > at
>> >
>> > org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
>> > at
>> >
>> > org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:100)
>> > at
>> >
>> > org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:161)
>> > at
>> >
>> > org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:113)
>> > at
>> >
>> > org.apache.flink.runtime.operators.util.metrics.CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
>> > at
>> >
>> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.advanceToNext(NonReusingKeyGroupedIterator.java:130)
>> > at
>> >
>> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.access$300(NonReusingKeyGroupedIterator.java:32)
>> > at
>> >
>> > org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
>> > at
>> >
>> > org.okkam.entitons.mapping.flink.IndexMappingExecutor$TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
>> > at
>> >
>> > org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
>> > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
>> > at
>> > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
>> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi  wrote:
>> >>
>> >> Yes, if that's the case you should go with option (2) and run with the
>> >> checksums I think.
>> >>
>> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
>> >>  wrote:
>> >> > The problem is that data is very large and usually cannot run on a
>> >> > single
>> >> > machine :(
>> >> >
>> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi  wrote:
>> >> >>
>> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh
>> >> >> 
>> >> >> wrote:
>> >> >> > @Stephan my flink cluster setup- 5 nodes, each running 1
>> >> >> > TaskManager.
>> >> >> > Slots
>> >> >> > per task manager: 2-4 (I tried varying this to see if this has any
>> >> >> > impact).
>> >> >> > Network buffers: 5k - 20k (tried different values for it).
>> >> >>
>> >> >> Could you run the job first on a single task manager to see if the
>> >> >> error occurs even if no network shuffle is involved? That should be
>> >> >> less overhead for you than running the custom build (which might be
>> >> >> buggy ;)).
>> >> >>
>> >> >> – Ufuk
>> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>> >
>
>
>


Re: Data Transfer between TM should be encrypted

2016-10-12 Thread Robert Metzger
Hi,
I think that pull request will be merged for 1.2.

On Fri, Oct 7, 2016 at 6:26 PM, vinay patil  wrote:

> Hi Stephan,
>
> https://github.com/apache/flink/pull/2518
> Is this pull request going to be part of 1.2 release ? Just wanted to get
> an idea on timelines so that I can pass on to the team.
>
>
> Regards,
> Vinay Patil
>
> On Thu, Sep 15, 2016 at 11:45 AM, Vijay Srinivasaraghavan <[hidden email]
> > wrote:
>
>> Hi Vinay,
>>
>> There are some delays and we expect the PR to be created next week.
>>
>> Regards
>> Vijay
>>
>> On Wednesday, September 14, 2016 5:41 PM, vinay patil <[hidden email]
>> > wrote:
>>
>>
>> Hi Vijay,
>>
>> Did you raise the PR for this task, I don't mind testing it out as well.
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Aug 30, 2016 at 6:28 PM, Vinay Patil <[hidden email]> wrote:
>>
>> Hi Vijay,
>>
>> That's a good news for me. Eagerly waiting for this change so that I can
>> integrate and test it before going live.
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Aug 30, 2016 at 4:06 PM, Vijay Srinivasaraghavan [via Apache
>> Flink User Mailing List archive.] <[hidden email]> wrote:
>>
>> Hi Stephan,
>>
>> The dev work is almost complete except the Yarn mode deployment stuff
>> that needs to be patched. We are expecting to send a PR in a week or two.
>>
>> Regards
>> Vijay
>>
>>
>> On Tuesday, August 30, 2016 12:39 AM, Stephan Ewen <[hidden email]>
>> wrote:
>>
>>
>> Let me loop in Vijay, I think he is the one working on this and can
>> probably give the best estimate when it can be expected.
>>
>> @vijay: For the SSL/TLS transport encryption - do you have an estimate
>> for the timeline of that feature?
>>
>>
>> On Mon, Aug 29, 2016 at 8:54 PM, vinay patil <[hidden email]> wrote:
>>
>> Hi Stephan,
>>
>> Thank you for your reply.
>>
>> Till when can I expect this feature to be integrated in master or release
>> version ?
>>
>> We are going to get production data (financial data) in October end , so
>> want to have this feature before that.
>>
>> Regards,
>> Vinay Patil
>>
>> On Mon, Aug 29, 2016 at 11:15 AM, Stephan Ewen [via Apache Flink User
>> Mailing List archive.] <[hidden email]> wrote:
>>
>> Hi!
>>
>> The way that the JIRA issue you linked will achieve this is by hooking
>> into the network stream pipeline directly, and encrypt the raw network byte
>> stream. We built the network stack on Netty, and will use Netty's SSL/TLS
>> handlers for that.
>>
>> That should be much more efficient than manual encryption/decryption in
>> each user function.
>>
>> Stephan
>>
>>
>>
>>
>>
>>
>> On Mon, Aug 29, 2016 at 6:12 PM, vinay patil <[hidden email]> wrote:
>>
>> Hi Ufuk,
>>
>> This is regarding this issue
>> https://issues.apache.org/jira/browse/FLINK-4404
>> 
>>
>> How can we achieve this, I am able to decrypt the data from Kafka coming
>> in, but I want to make sure that the data is encrypted when flowing between
>> TM's.
>>
>> One approach I can think of is to decrypt the data at the start of each
>> operator and encrypt it at the end of each operator, but I feel this is not
>> an efficient approach.
>>
>> I just want to check if there are alternatives to this and can this be
>> achieved by doing some configurations.
>>
>> Regards,
>> Vinay Patil
>>
>> --
>> View this message in context: Data Transfer between TM should be
>> encrypted
>> 
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive
>> 
>> at Nabble.com.
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context: Re: Data Transfer between TM should 

Re: Exception while running Flink jobs (1.0.0)

2016-10-12 Thread Flavio Pompermaier
Hi Ufuk,
any news on this?

On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi  wrote:

> I guess that this is caused by a bug in the checksum calculation. Let
> me check that.
>
> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier 
> wrote:
> > I've run the job once more (always using the checksum branch) and this
> > time I got:
> >
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
> > at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
> > at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:32)
> > at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
> > at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:135)
> > at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> > at org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:100)
> > at org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:161)
> > at org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:113)
> > at org.apache.flink.runtime.operators.util.metrics.CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
> > at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.advanceToNext(NonReusingKeyGroupedIterator.java:130)
> > at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.access$300(NonReusingKeyGroupedIterator.java:32)
> > at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
> > at org.okkam.entitons.mapping.flink.IndexMappingExecutor$TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
> > at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
> > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
> > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi  wrote:
> >>
> >> Yes, if that's the case you should go with option (2) and run with the
> >> checksums I think.
> >>
> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
> >>  wrote:
> >> > The problem is that data is very large and usually cannot run on a
> >> > single
> >> > machine :(
> >> >
> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi  wrote:
> >> >>
> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh  >
> >> >> wrote:
> >> >> > @Stephan my flink cluster setup- 5 nodes, each running 1
> TaskManager.
> >> >> > Slots
> >> >> > per task manager: 2-4 (I tried varying this to see if this has any
> >> >> > impact).
> >> >> > Network buffers: 5k - 20k (tried different values for it).
> >> >>
> >> >> Could you run the job first on a single task manager to see if the
> >> >> error occurs even if no network shuffle is involved? That should be
> >> >> less overhead for you than running the custom build (which might be
> >> >> buggy ;)).
> >> >>
> >> >> – Ufuk
> >> >
> >> >
> >> >
> >> >
> >
> >
> >
>


Re: mapreduce.HadoopOutputFormat config value issue

2016-10-12 Thread Fabian Hueske
Hi Shannon,

I tried to reproduce the problem in a unit test without success.
My test configures a HadoopOutputFormat object, serializes and deserializes
it, calls open(), and verifies that a configured String property is present
in the getRecordWriter() method.
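
(Not the actual test, but a minimal sketch of the same round-trip idea, applied directly to a Hadoop Configuration with a made-up property name, to illustrate what is being verified:)

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

import org.apache.hadoop.conf.Configuration;

public class ConfigurationRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("my.custom.property", "expected-value"); // stands in for the column family schema

        // Configuration is a Hadoop Writable, so it serializes via write()/readFields().
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        conf.write(new DataOutputStream(bos));

        Configuration copy = new Configuration();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));

        // If the property survives this round trip, plain Configuration serialization is not to blame.
        System.out.println("after round trip: " + copy.get("my.custom.property"));
    }
}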

Next I would try to reproduce the error with Cassandra. Which version are
you using?
Can you also open a JIRA issue for this bug?

Thanks, Fabian

2016-10-11 23:40 GMT+02:00 Shannon Carey :

> In Flink 1.1.1, I am seeing what looks like a serialization issue of
> org.apache.hadoop.conf.Configuration when used
> with org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat. When
> I use the mapred.HadoopOutputFormat version, it works just fine.
>
> Specifically, the job fails because "java.lang.UnsupportedOperationException:
> You must set the ColumnFamily schema using setColumnFamilySchema." I am
> definitely setting that property, and it appears to be getting serialized,
> but when the config is deserialized the setting is gone. Does anybody have any
> ideas? In the meantime, I will continue using the "mapred" package.
>
> Stack trace:
> java.lang.UnsupportedOperationException: You must set the ColumnFamily
> schema using setColumnFamilySchema.
> at org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat.
> getColumnFamilySchema(CqlBulkOutputFormat.java:184)
> at org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter.setConfigs(
> CqlBulkRecordWriter.java:94)
> at org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter.<
> init>(CqlBulkRecordWriter.java:74)
> at org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat.getRecordWriter(
> CqlBulkOutputFormat.java:86)
> at org.apache.cassandra.hadoop.cql3.CqlBulkOutputFormat.getRecordWriter(
> CqlBulkOutputFormat.java:52)
> at org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormatBase.open(
> HadoopOutputFormatBase.java:146)
> at org.apache.flink.runtime.operators.DataSinkTask.invoke(
> DataSinkTask.java:176)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Code that works:
>
> val insertStmt = s"INSERT INTO $keyspace.$colFamily (user_id,
> updated_time, value) VALUES (?, ?, ?)"
> val config = new JobConf()
>
> ConfigHelper.setOutputInitialAddress(config, initialOutputAddress.
> getHostAddress)
>
> CqlBulkOutputFormat.setColumnFamilySchema(config, colFamily,
> cqlTableSchema)
> CqlBulkOutputFormat.setColumnFamilyInsertStatement(config, colFamily,
> insertStmt)
> CqlBulkOutputFormat.setDeleteSourceOnSuccess(config, true)
> ConfigHelper.setOutputColumnFamily(config,
>   keyspace,
>   colFamily)
> ConfigHelper.setOutputPartitioner(config, partitionerClass)
>
> val outputFormat = new mapred.HadoopOutputFormat[Object,
> java.util.List[ByteBuffer]](
>   new CqlBulkOutputFormat,
>   config)
>
> Code that doesn't work:
>
> val insertStmt = s"INSERT INTO $keyspace.$colFamily (user_id,
> updated_time, value) VALUES (?, ?, ?)"
> val config = new Configuration()
>
> ConfigHelper.setOutputInitialAddress(config, initialOutputAddress.
> getHostAddress)
>
> CqlBulkOutputFormat.setColumnFamilySchema(config, colFamily,
> cqlTableSchema)
> CqlBulkOutputFormat.setColumnFamilyInsertStatement(config, colFamily,
> insertStmt)
> CqlBulkOutputFormat.setDeleteSourceOnSuccess(config, true)
> ConfigHelper.setOutputColumnFamily(config,
>   keyspace,
>   colFamily)
> ConfigHelper.setOutputPartitioner(config, partitionerClass)
>
> val hadoopJob: Job = Job.getInstance(config)
>
> val outputFormat = new mapreduce.HadoopOutputFormat[Object,
> java.util.List[ByteBuffer]](
>   new CqlBulkOutputFormat,
>   hadoopJob)
>
>


Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-12 Thread PedroMrChaves
I've been thinking about several options to solve this problem:

1. I can use Flink savepoints to save the application state, change the jar
file and submit a new job (the new jar containing the added/changed patterns).
The problem in this case is handling the savepoints correctly, and because I
must stop and start the job, the events will be delayed.

2. I can compile Java code at runtime using the Java compiler library; I
don't know if this would be a viable solution.

3. I can use a scripting language like you did, but I would lose the ability
to use the native Flink library, which is available in Scala and Java.





--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/What-is-the-best-way-to-load-add-patterns-dynamically-at-runtime-with-Flink-tp9461p9470.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.