Re: Event Time Session Window does not trigger..

2018-08-05 Thread anna stax
Hi Hequn,

Thanks for link. Looks like I better use ProcessingTime instead of
EventTime especially because of the 4th reason you listed..
"Data should cover a longer time span than the window size to advance the
event time."
I need the trigger when the data stops.

I have 1 more question.

Can I set the TimeCharacteristic to the stream level instead on the
application level?
Can I use both TimeCharacteristic.ProcessingTime  and
TimeCharacteristic.EventTime in an application.

Thank you

On Sat, Aug 4, 2018 at 10:05 PM, Hequn Cheng  wrote:

> Hi shyla,
>
> I answered a similar question on stackoverflow[1], you can take a look
> first.
>
> Best, Hequn
>
> [1] https://stackoverflow.com/questions/51691269/event-time-
> window-in-flink-does-not-trigger
>
> On Sun, Aug 5, 2018 at 11:24 AM, shyla deshpande  > wrote:
>
>> Hi,
>>
>> I used PopularPlacesFromKafka from dataartisans.flinktraining.exercises as 
>> the basis. I made very minor changes
>>
>> and the session window is not triggered. If I use ProcessingTime instead of 
>> EventTime it works. Here is my code.
>>
>> Appreciate any help. Thanks
>>
>> object KafkaEventTimeWindow {
>>
>>   private val LOCAL_ZOOKEEPER_HOST = "localhost:2181"
>>   private val LOCAL_KAFKA_BROKER = "localhost:9092"
>>   private val CON_GROUP = "KafkaEventTimeSessionWindow"
>>   private val MAX_EVENT_DELAY = 60 // events are out of order by max 60 
>> seconds
>>
>>   def main(args: Array[String]) {
>>
>> val env = StreamExecutionEnvironment.getExecutionEnvironment
>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>
>> val kafkaProps = new Properties
>> kafkaProps.setProperty("zookeeper.connect", LOCAL_ZOOKEEPER_HOST)
>> kafkaProps.setProperty("bootstrap.servers", LOCAL_KAFKA_BROKER)
>> kafkaProps.setProperty("group.id", CON_GROUP)
>> kafkaProps.setProperty("auto.offset.reset", "earliest")
>>
>> val consumer = new FlinkKafkaConsumer011[PositionEventProto](
>>   "positionevent",
>>   new PositionEventProtoSchema,
>>   kafkaProps)
>> consumer.assignTimestampsAndWatermarks(new PositionEventProtoTSAssigner)
>>
>> val posstream = env.addSource(consumer)
>>
>> def convtoepochmilli(cdt: String): Long = {
>>   val  odt:OffsetDateTime = OffsetDateTime.parse(cdt);
>>   val i:Instant  = odt.toInstant();
>>   val millis:Long = i.toEpochMilli();
>>   millis
>> }
>>
>> val outputstream = posstream
>>   .mapWith{case(p) => (p.getConsumerUserId, 
>> convtoepochmilli(p.getCreateDateTime.getInIso8601Format))}
>>   .keyBy(0)
>>   .window(EventTimeSessionWindows.withGap(Time.seconds(60)))
>>   .reduce { (v1, v2) => (v1._1, Math.max(v1._2 , v2._2)) }
>>
>> outputstream.print()
>>
>> // execute the transformation pipeline
>> env.execute("Output Stream")
>>   }
>>
>> }
>>
>> class PositionEventProtoTSAssigner
>>   extends 
>> BoundedOutOfOrdernessTimestampExtractor[PositionEventProto](Time.seconds(60))
>>  {
>>
>>   override def extractTimestamp(pos: PositionEventProto): Long = {
>> val  odt:OffsetDateTime = 
>> OffsetDateTime.parse(pos.getCreateDateTime.getInIso8601Format);
>> val i:Instant  = odt.toInstant();
>> val millis:Long = i.toEpochMilli();
>> millis
>>   }
>> }
>>
>>
>>
>


Multiple output operations in a job vs multiple jobs

2018-07-31 Thread anna stax
Hi all,

I am not sure when I should go for multiple jobs or have 1 job with all the
sources and sinks. Following is my code.

   val env = StreamExecutionEnvironment.getExecutionEnvironment
...
// create a Kafka source
val srcstream = env.addSource(consumer)

srcstream
  .keyBy(0)
  .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
  .reduce  ...
  .map ...
  .addSink ...

srcstream
  .keyBy(0)
  .window(ProcessingTimeSessionWindows.withGap(Time.days(28)))
  .reduce  ...
  .map ...
  .addSink ...

env.execute("Job1")

My questions

1. The srcstream is a very high volume stream and the window size is 2
weeks and 4 weeks. Is the window size a problem? In this case, I think it
is not a problem because I am using reduce which stores only 1 value per
window. Is that right?

2. I am having 2 output operations one with 2 weeks window and the other
with 4 weeks window. Are they executed in parallel or in sequence?

3. When I have multiple output operations like in this case should I break
it into 2 different jobs ?

4. Can I run multiple jobs on the same cluster?

Thanks


Re: ProcessFunction example from the documentation giving me error

2018-07-23 Thread anna stax
It is good now. Sorry, my fault. I had multiple applications running and
both were using the socket stream . Thanks.

On Sun, Jul 22, 2018 at 8:22 PM, vino yang  wrote:

> Hi anna,
>
> From the stack trace you provided, it's socket connect error not about
> Flink.
>
> So, Have you start a socket server at "localhost:"? Using a program or
> CLI tool, such as "nc -l "
>
> There is a example you can have a look[1].
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-
> release-1.5/quickstart/setup_quickstart.html#run-the-example
>
> Thanks, vino.
>
> 2018-07-21 4:30 GMT+08:00 anna stax :
>
>> It is not the code, but I don't know what the problem is.
>> A simple word count with  socketTextStream used to work but now gives
>> the same error.
>> Apps with kafka source which used to work is giving the same error.
>> When I have a source generator within the app itself works good.
>>
>> So, with socketTextStream and kafka source gives me
>> java.net.ConnectException: Operation timed out (Connection timed out)
>> error
>>
>> On Fri, Jul 20, 2018 at 10:29 AM, anna stax  wrote:
>>
>>> My object name is  CreateUserNotificationRequests, thats why  you see
>>> CreateUserNotificationRequests in the Error message.
>>> I edited the object name after pasting the code...Hope there is no
>>> confusion and I get some help.
>>> Thanks
>>>
>>>
>>>
>>> On Fri, Jul 20, 2018 at 10:10 AM, anna stax 
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> This is my code, just trying to make the code example in
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1
>>>> .5/dev/stream/operators/process_function.html work
>>>>
>>>> object ProcessFunctionTest {
>>>>
>>>>   def main(args: Array[String]) {
>>>>
>>>> val env = StreamExecutionEnvironment.getExecutionEnvironment
>>>> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
>>>> val text = env.socketTextStream("localhost", )
>>>>
>>>> val text1 = text.map(s => (s,s)).keyBy(0).process(new
>>>> CountWithTimeoutFunction())
>>>>
>>>> text1.print()
>>>>
>>>> env.execute("CountWithTimeoutFunction")
>>>>   }
>>>>
>>>>   case class CountWithTimestamp(key: String, count: Long, lastModified:
>>>> Long)
>>>>
>>>>   class CountWithTimeoutFunction extends ProcessFunction[(String,
>>>> String), (String, Long)] {
>>>>
>>>> lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
>>>>   .getState(new ValueStateDescriptor[CountWithTimestamp]("myState",
>>>> classOf[CountWithTimestamp]))
>>>>
>>>> override def processElement(value: (String, String), ctx:
>>>> ProcessFunction[(String, String), (String, Long)]#Context, out:
>>>> Collector[(String, Long)]): Unit = {
>>>> ..
>>>> }
>>>>
>>>> override def onTimer(timestamp: Long, ctx: ProcessFunction[(String,
>>>> String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]):
>>>> Unit = {
>>>> ...
>>>> }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Exception in thread "main" 
>>>> org.apache.flink.runtime.client.JobExecutionException:
>>>> java.net.ConnectException: Operation timed out (Connection timed out)
>>>> at org.apache.flink.runtime.minicluster.MiniCluster.executeJobB
>>>> locking(MiniCluster.java:625)
>>>> at org.apache.flink.streaming.api.environment.LocalStreamEnviro
>>>> nment.execute(LocalStreamEnvironment.java:121)
>>>> at org.apache.flink.streaming.api.scala.StreamExecutionEnvironm
>>>> ent.execute(StreamExecutionEnvironment.scala:654)
>>>> at com.whil.flink.streaming.CreateUserNotificationRequests$.mai
>>>> n(CreateUserNotificationRequests.scala:42)
>>>> at com.whil.flink.streaming.CreateUserNotificationRequests.main
>>>> (CreateUserNotificationRequests.scala)
>>>> Caused by: java.net.ConnectException: Operation timed out (Connection
>>>> timed out)
>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock
>>>> etIm

Re: ProcessFunction example from the documentation giving me error

2018-07-20 Thread anna stax
It is not the code, but I don't know what the problem is.
A simple word count with  socketTextStream used to work but now gives the
same error.
Apps with kafka source which used to work is giving the same error.
When I have a source generator within the app itself works good.

So, with socketTextStream and kafka source gives me
java.net.ConnectException: Operation timed out (Connection timed out)  error

On Fri, Jul 20, 2018 at 10:29 AM, anna stax  wrote:

> My object name is  CreateUserNotificationRequests, thats why  you see
> CreateUserNotificationRequests in the Error message.
> I edited the object name after pasting the code...Hope there is no
> confusion and I get some help.
> Thanks
>
>
>
> On Fri, Jul 20, 2018 at 10:10 AM, anna stax  wrote:
>
>> Hello all,
>>
>> This is my code, just trying to make the code example in
>> https://ci.apache.org/projects/flink/flink-docs-release-1
>> .5/dev/stream/operators/process_function.html work
>>
>> object ProcessFunctionTest {
>>
>>   def main(args: Array[String]) {
>>
>> val env = StreamExecutionEnvironment.getExecutionEnvironment
>> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
>> val text = env.socketTextStream("localhost", )
>>
>> val text1 = text.map(s => (s,s)).keyBy(0).process(new
>> CountWithTimeoutFunction())
>>
>> text1.print()
>>
>> env.execute("CountWithTimeoutFunction")
>>   }
>>
>>   case class CountWithTimestamp(key: String, count: Long, lastModified:
>> Long)
>>
>>   class CountWithTimeoutFunction extends ProcessFunction[(String,
>> String), (String, Long)] {
>>
>> lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
>>   .getState(new ValueStateDescriptor[CountWithTimestamp]("myState",
>> classOf[CountWithTimestamp]))
>>
>> override def processElement(value: (String, String), ctx:
>> ProcessFunction[(String, String), (String, Long)]#Context, out:
>> Collector[(String, Long)]): Unit = {
>> ..
>> }
>>
>> override def onTimer(timestamp: Long, ctx: ProcessFunction[(String,
>> String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]):
>> Unit = {
>> ...
>> }
>>   }
>> }
>>
>>
>> Exception in thread "main" 
>> org.apache.flink.runtime.client.JobExecutionException:
>> java.net.ConnectException: Operation timed out (Connection timed out)
>> at org.apache.flink.runtime.minicluster.MiniCluster.executeJobB
>> locking(MiniCluster.java:625)
>> at org.apache.flink.streaming.api.environment.LocalStreamEnviro
>> nment.execute(LocalStreamEnvironment.java:121)
>> at org.apache.flink.streaming.api.scala.StreamExecutionEnvironm
>> ent.execute(StreamExecutionEnvironment.scala:654)
>> at com.whil.flink.streaming.CreateUserNotificationRequests$.
>> main(CreateUserNotificationRequests.scala:42)
>> at com.whil.flink.streaming.CreateUserNotificationRequests.
>> main(CreateUserNotificationRequests.scala)
>> Caused by: java.net.ConnectException: Operation timed out (Connection
>> timed out)
>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock
>> etImpl.java:350)
>> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPl
>> ainSocketImpl.java:206)
>> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocket
>> Impl.java:188)
>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> at java.net.Socket.connect(Socket.java:589)
>> at org.apache.flink.streaming.api.functions.source.SocketTextSt
>> reamFunction.run(SocketTextStreamFunction.java:96)
>> at org.apache.flink.streaming.api.operators.StreamSource.run(
>> StreamSource.java:87)
>> at org.apache.flink.streaming.api.operators.StreamSource.run(
>> StreamSource.java:56)
>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.
>> run(SourceStreamTask.java:99)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
>> StreamTask.java:306)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> On Thu, Jul 19, 2018 at 11:22 PM, vino yang 
>> wrote:
>>
>>> Hi anna,
>>>
>>> Can you share your program and the exception stack trace and more
>>> details about what's your source and state backend?
>>>
>>> From the information you provided, it seems Flink started a network
>>> connect but timed out.
>>>
>>> Thanks, vino.
>>>
>>> 2018-07-20 14:14 GMT+08:00 anna stax :
>>>
>>>> Hi all,
>>>>
>>>> I am new to Flink. I am using the classes CountWithTimestamp and
>>>> CountWithTimeoutFunction from the examples found in
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>> dev/stream/operators/process_function.html
>>>>
>>>> I am getting the error Exception in thread "main"
>>>> org.apache.flink.runtime.client.JobExecutionException:
>>>> java.net.ConnectException: Operation timed out (Connection timed out)
>>>>
>>>> Looks like when timer’s time is reached I am getting this error. Any
>>>> idea why. Please help
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>


Re: ProcessFunction example from the documentation giving me error

2018-07-20 Thread anna stax
My object name is  CreateUserNotificationRequests, thats why  you see
CreateUserNotificationRequests
in the Error message.
I edited the object name after pasting the code...Hope there is no
confusion and I get some help.
Thanks



On Fri, Jul 20, 2018 at 10:10 AM, anna stax  wrote:

> Hello all,
>
> This is my code, just trying to make the code example in
> https://ci.apache.org/projects/flink/flink-docs-release-
> 1.5/dev/stream/operators/process_function.html work
>
> object ProcessFunctionTest {
>
>   def main(args: Array[String]) {
>
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
> val text = env.socketTextStream("localhost", )
>
> val text1 = text.map(s => (s,s)).keyBy(0).process(new
> CountWithTimeoutFunction())
>
> text1.print()
>
> env.execute("CountWithTimeoutFunction")
>   }
>
>   case class CountWithTimestamp(key: String, count: Long, lastModified:
> Long)
>
>   class CountWithTimeoutFunction extends ProcessFunction[(String, String),
> (String, Long)] {
>
> lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
>   .getState(new ValueStateDescriptor[CountWithTimestamp]("myState",
> classOf[CountWithTimestamp]))
>
> override def processElement(value: (String, String), ctx:
> ProcessFunction[(String, String), (String, Long)]#Context, out:
> Collector[(String, Long)]): Unit = {
> ..
> }
>
> override def onTimer(timestamp: Long, ctx: ProcessFunction[(String,
> String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]):
> Unit = {
> ...
> }
>   }
> }
>
>
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException:
> java.net.ConnectException: Operation timed out (Connection timed out)
> at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(
> MiniCluster.java:625)
> at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.
> execute(LocalStreamEnvironment.java:121)
> at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.
> execute(StreamExecutionEnvironment.scala:654)
> at com.whil.flink.streaming.CreateUserNotificationRequests$.main(
> CreateUserNotificationRequests.scala:42)
> at com.whil.flink.streaming.CreateUserNotificationRequests.main(
> CreateUserNotificationRequests.scala)
> Caused by: java.net.ConnectException: Operation timed out (Connection
> timed out)
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(
> AbstractPlainSocketImpl.java:350)
> at java.net.AbstractPlainSocketImpl.connectToAddress(
> AbstractPlainSocketImpl.java:206)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:
> 188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at org.apache.flink.streaming.api.functions.source.
> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:87)
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:56)
> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
> SourceStreamTask.java:99)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.
> invoke(StreamTask.java:306)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> at java.lang.Thread.run(Thread.java:745)
>
> On Thu, Jul 19, 2018 at 11:22 PM, vino yang  wrote:
>
>> Hi anna,
>>
>> Can you share your program and the exception stack trace and more details
>> about what's your source and state backend?
>>
>> From the information you provided, it seems Flink started a network
>> connect but timed out.
>>
>> Thanks, vino.
>>
>> 2018-07-20 14:14 GMT+08:00 anna stax :
>>
>>> Hi all,
>>>
>>> I am new to Flink. I am using the classes CountWithTimestamp and
>>> CountWithTimeoutFunction from the examples found in
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>> dev/stream/operators/process_function.html
>>>
>>> I am getting the error Exception in thread "main"
>>> org.apache.flink.runtime.client.JobExecutionException:
>>> java.net.ConnectException: Operation timed out (Connection timed out)
>>>
>>> Looks like when timer’s time is reached I am getting this error. Any
>>> idea why. Please help
>>>
>>> Thanks
>>>
>>
>>
>


Re: ProcessFunction example from the documentation giving me error

2018-07-20 Thread anna stax
Hello all,

This is my code, just trying to make the code example in
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/
operators/process_function.html work

object ProcessFunctionTest {

  def main(args: Array[String]) {

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
val text = env.socketTextStream("localhost", )

val text1 = text.map(s => (s,s)).keyBy(0).process(new
CountWithTimeoutFunction())

text1.print()

env.execute("CountWithTimeoutFunction")
  }

  case class CountWithTimestamp(key: String, count: Long, lastModified:
Long)

  class CountWithTimeoutFunction extends ProcessFunction[(String, String),
(String, Long)] {

lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext
  .getState(new ValueStateDescriptor[CountWithTimestamp]("myState",
classOf[CountWithTimestamp]))

override def processElement(value: (String, String), ctx:
ProcessFunction[(String, String), (String, Long)]#Context, out:
Collector[(String, Long)]): Unit = {
..
}

override def onTimer(timestamp: Long, ctx: ProcessFunction[(String,
String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]):
Unit = {
...
}
  }
}


Exception in thread "main"
org.apache.flink.runtime.client.JobExecutionException:
java.net.ConnectException: Operation timed out (Connection timed out)
at
org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:625)
at
org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:121)
at
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
at
com.whil.flink.streaming.CreateUserNotificationRequests$.main(CreateUserNotificationRequests.scala:42)
at
com.whil.flink.streaming.CreateUserNotificationRequests.main(CreateUserNotificationRequests.scala)
Caused by: java.net.ConnectException: Operation timed out (Connection timed
out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at
org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)

On Thu, Jul 19, 2018 at 11:22 PM, vino yang  wrote:

> Hi anna,
>
> Can you share your program and the exception stack trace and more details
> about what's your source and state backend?
>
> From the information you provided, it seems Flink started a network
> connect but timed out.
>
> Thanks, vino.
>
> 2018-07-20 14:14 GMT+08:00 anna stax :
>
>> Hi all,
>>
>> I am new to Flink. I am using the classes CountWithTimestamp and
>> CountWithTimeoutFunction from the examples found in
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>> dev/stream/operators/process_function.html
>>
>> I am getting the error Exception in thread "main"
>> org.apache.flink.runtime.client.JobExecutionException:
>> java.net.ConnectException: Operation timed out (Connection timed out)
>>
>> Looks like when timer’s time is reached I am getting this error. Any idea
>> why. Please help
>>
>> Thanks
>>
>
>


Re: Is KeyedProcessFunction available in Flink 1.4?

2018-07-20 Thread anna stax
Thanks Bowen.

On Thu, Jul 19, 2018 at 4:45 PM, Bowen Li  wrote:

> Hi Anna,
>
> KeyedProcessFunction is only available starting from Flink 1.5. The doc is
> here
> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#the-keyedprocessfunction>.
> It extends ProcessFunction and shares the same functionalities except
> giving more access to timers' key, thus you can refer to examples of
> ProcessFunction in that document.
>
> Bowen
>
>
> On Thu, Jul 19, 2018 at 3:26 PM anna stax  wrote:
>
>> Hello all,
>> I am using Flink 1.4 because thats the version provided by the latest AWS
>> EMR.
>> Is KeyedProcessFunction available in Flink 1.4?
>>
>> Also please share any links to good examples on using
>> KeyedProcessFunction .
>>
>> Thanks
>>
>


ProcessFunction example from the documentation giving me error

2018-07-20 Thread anna stax
Hi all,

I am new to Flink. I am using the classes CountWithTimestamp and
CountWithTimeoutFunction from the examples found in
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/process_function.html

I am getting the error Exception in thread "main"
org.apache.flink.runtime.client.JobExecutionException:
java.net.ConnectException: Operation timed out (Connection timed out)

Looks like when timer’s time is reached I am getting this error. Any idea
why. Please help

Thanks


Is KeyedProcessFunction available in Flink 1.4?

2018-07-19 Thread anna stax
Hello all,
I am using Flink 1.4 because thats the version provided by the latest AWS
EMR.
Is KeyedProcessFunction available in Flink 1.4?

Also please share any links to good examples on using KeyedProcessFunction
.

Thanks


Re: How to create User Notifications/Reminder ?

2018-07-12 Thread anna stax
Thanks Hequn and Dawid for your input.  Thanks Dawid for the link.  Great
help!

On Thu, Jul 12, 2018 at 5:59 AM, Dawid Wysakowicz <
wysakowicz.da...@gmail.com> wrote:

> Hi shyla,
>
> It should be doable with CEP. You can create pattern like:
> Pattern.begin("start").next/followedBy("end").where(...).within(/* two
> weeks*/) and subscribe for timed out events. You can check very similar
> example here[1].
>
> Best,
>
> Dawid
>
> [1] https://github.com/dataArtisans/flink-training-
> exercises/blob/master/src/main/java/com/dataartisans/
> flinktraining/exercises/datastream_java/cep/LongRidesExercise.java
>
> On 12/07/18 14:26, Hequn Cheng wrote:
>
> Hi shyla,
>
> Considering window, I think it is not very convenient. Two weeks window is
> used to process data in the recent 2 weeks while you want to process data
> beyond 2 weeks.
> I'm not familiar with CEP, but it sounds like a good idea.
>
> Best, Hequn
>
>
> On Thu, Jul 12, 2018 at 10:56 AM, shyla deshpande <
> deshpandesh...@gmail.com> wrote:
>
>> Hi Hequen,
>>
>> I was more interested in solving using CEP.
>> I want to have a window of 2 weeks and in the Timeout Handler I want to
>> create Notification/Reminder.
>> Is this doable in Flink 1.4.2.?
>>
>> Thanks
>>
>>
>> On Wed, Jul 11, 2018 at 6:14 PM, Hequn Cheng 
>> wrote:
>>
>>> Hi shyla,
>>>
>>> There is a same question[1] asked two days ago. Maybe it is helpful for
>>> you. Let me know if you have any other concern.
>>> Best, Hequn
>>>
>>> [1] http://apache-flink-user-mailing-list-archive.2336050.n4
>>> .nabble.com/How-to-trigger-a-function-on-the-state-periodica
>>> lly-td21311.html
>>>
>>> On Thu, Jul 12, 2018 at 4:50 AM, shyla deshpande <
>>> deshpandesh...@gmail.com> wrote:
>>>
 I need to create User Notification/Reminder when I don’t see a specific
 event (high volume) from that user for more than 2 weeks.

 Should I be using windowing or CEP or  ProcessFunction?

 I am pretty new to Flink. Can anyone please advise me what is the best
 way to solve this?

 Thank you for your time.

>>>
>>>
>>
>
>


Re: How to trigger a function on the state periodically?

2018-07-10 Thread anna stax
sure. I will go ahead with this for now. Thanks for your suggestions.

On Mon, Jul 9, 2018 at 11:10 PM, Hequn Cheng  wrote:

> Hi,
> It depends on how many different users. In most cases, the performance
> will be fine. I think it worth to give a try. :-)
> Of course, there are ways to reduce the number of timers, for example
> keyBy(userId%1024), and use a MapState to store different users for the
> same group.
>
> On Tue, Jul 10, 2018 at 1:54 PM, anna stax  wrote:
>
>> Thanks Hequn. I think so too, the large number of timers could be a
>> problem.
>>
>> On Mon, Jul 9, 2018 at 10:23 PM, Hequn Cheng 
>> wrote:
>>
>>> Hi anna,
>>>
>>> According to your description, I think we can use the Timer to solve
>>> your problem. The TimerService deduplicates timers per key and timestamp.
>>> Also, note that a large number of timers can significantly increase
>>> checkpointing time.
>>>
>>> On Tue, Jul 10, 2018 at 11:38 AM, anna stax 
>>> wrote:
>>>
>>>> Thanks Hequn, for the links.
>>>>
>>>> This is my use case..
>>>>
>>>> When there is no user activity for n weeks, I need to send a
>>>> Notification to user.
>>>> The activity stream is usually very high volume for most users.
>>>> I thought it is not a good idea to use windowing for this, because of
>>>> the stream volume and window size.
>>>> I want to store in the state, for every user the last activity date and
>>>> process them once daily.
>>>>
>>>> I want to make sure I am heading in the right direction. Thank you for
>>>> your suggestions.
>>>>
>>>> -Anna
>>>>
>>>> On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng 
>>>> wrote:
>>>>
>>>>> Hi anna,
>>>>>
>>>>> > I need to trigger a function once every day
>>>>> If you want to trigger by the function itself, you can use the
>>>>> Timer[1]. Both types of timers (processing-time and event-time) are
>>>>> internally maintained by the TimerService, and onTimer() method will be
>>>>> called once a timer fires.
>>>>> If you want to trigger the function of different
>>>>> parallelism synchronously, then the broadcast state[2] may be helpful.
>>>>>
>>>>> Hope this helps.
>>>>> Hequn
>>>>>
>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/s
>>>>> tream/operators/process_function.html#timers
>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>>>> ev/stream/state/broadcast_state.html
>>>>>
>>>>> On Tue, Jul 10, 2018 at 7:47 AM, anna stax 
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I need to trigger a function once every day to read the state and
>>>>>> create kafka events and also remove some records from state if they are 
>>>>>> too
>>>>>> old.
>>>>>>
>>>>>> Is there a way to do this? I am new to Flink, appreciate any feedback
>>>>>> and suggestions.
>>>>>>
>>>>>> Thanks
>>>>>> Anna
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: How to trigger a function on the state periodically?

2018-07-09 Thread anna stax
Thanks Hequn. I think so too, the large number of timers could be a
problem.

On Mon, Jul 9, 2018 at 10:23 PM, Hequn Cheng  wrote:

> Hi anna,
>
> According to your description, I think we can use the Timer to solve your
> problem. The TimerService deduplicates timers per key and timestamp. Also,
> note that a large number of timers can significantly increase checkpointing
> time.
>
> On Tue, Jul 10, 2018 at 11:38 AM, anna stax  wrote:
>
>> Thanks Hequn, for the links.
>>
>> This is my use case..
>>
>> When there is no user activity for n weeks, I need to send a Notification
>> to user.
>> The activity stream is usually very high volume for most users.
>> I thought it is not a good idea to use windowing for this, because of the
>> stream volume and window size.
>> I want to store in the state, for every user the last activity date and
>> process them once daily.
>>
>> I want to make sure I am heading in the right direction. Thank you for
>> your suggestions.
>>
>> -Anna
>>
>> On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng  wrote:
>>
>>> Hi anna,
>>>
>>> > I need to trigger a function once every day
>>> If you want to trigger by the function itself, you can use the
>>> Timer[1]. Both types of timers (processing-time and event-time) are
>>> internally maintained by the TimerService, and onTimer() method will be
>>> called once a timer fires.
>>> If you want to trigger the function of different
>>> parallelism synchronously, then the broadcast state[2] may be helpful.
>>>
>>> Hope this helps.
>>> Hequn
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/s
>>> tream/operators/process_function.html#timers
>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>> ev/stream/state/broadcast_state.html
>>>
>>> On Tue, Jul 10, 2018 at 7:47 AM, anna stax  wrote:
>>>
>>>> Hi all,
>>>>
>>>> I need to trigger a function once every day to read the state and
>>>> create kafka events and also remove some records from state if they are too
>>>> old.
>>>>
>>>> Is there a way to do this? I am new to Flink, appreciate any feedback
>>>> and suggestions.
>>>>
>>>> Thanks
>>>> Anna
>>>>
>>>
>>>
>>
>


Re: How to trigger a function on the state periodically?

2018-07-09 Thread anna stax
Thanks Hequn, for the links.

This is my use case..

When there is no user activity for n weeks, I need to send a Notification
to user.
The activity stream is usually very high volume for most users.
I thought it is not a good idea to use windowing for this, because of the
stream volume and window size.
I want to store in the state, for every user the last activity date and
process them once daily.

I want to make sure I am heading in the right direction. Thank you for your
suggestions.

-Anna

On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng  wrote:

> Hi anna,
>
> > I need to trigger a function once every day
> If you want to trigger by the function itself, you can use the
> Timer[1]. Both types of timers (processing-time and event-time) are
> internally maintained by the TimerService, and onTimer() method will be
> called once a timer fires.
> If you want to trigger the function of different
> parallelism synchronously, then the broadcast state[2] may be helpful.
>
> Hope this helps.
> Hequn
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> master/dev/stream/operators/process_function.html#timers
> [2] https://ci.apache.org/projects/flink/flink-docs-
> master/dev/stream/state/broadcast_state.html
>
> On Tue, Jul 10, 2018 at 7:47 AM, anna stax  wrote:
>
>> Hi all,
>>
>> I need to trigger a function once every day to read the state and create
>> kafka events and also remove some records from state if they are too old.
>>
>> Is there a way to do this? I am new to Flink, appreciate any feedback and
>> suggestions.
>>
>> Thanks
>> Anna
>>
>
>


How to trigger a function on the state periodically?

2018-07-09 Thread anna stax
Hi all,

I need to trigger a function once every day to read the state and create
kafka events and also remove some records from state if they are too old.

Is there a way to do this? I am new to Flink, appreciate any feedback and
suggestions.

Thanks
Anna