Re: Event Time Session Window does not trigger..
Hi Hequn, Thanks for link. Looks like I better use ProcessingTime instead of EventTime especially because of the 4th reason you listed.. "Data should cover a longer time span than the window size to advance the event time." I need the trigger when the data stops. I have 1 more question. Can I set the TimeCharacteristic to the stream level instead on the application level? Can I use both TimeCharacteristic.ProcessingTime and TimeCharacteristic.EventTime in an application. Thank you On Sat, Aug 4, 2018 at 10:05 PM, Hequn Cheng wrote: > Hi shyla, > > I answered a similar question on stackoverflow[1], you can take a look > first. > > Best, Hequn > > [1] https://stackoverflow.com/questions/51691269/event-time- > window-in-flink-does-not-trigger > > On Sun, Aug 5, 2018 at 11:24 AM, shyla deshpande > wrote: > >> Hi, >> >> I used PopularPlacesFromKafka from dataartisans.flinktraining.exercises as >> the basis. I made very minor changes >> >> and the session window is not triggered. If I use ProcessingTime instead of >> EventTime it works. Here is my code. >> >> Appreciate any help. Thanks >> >> object KafkaEventTimeWindow { >> >> private val LOCAL_ZOOKEEPER_HOST = "localhost:2181" >> private val LOCAL_KAFKA_BROKER = "localhost:9092" >> private val CON_GROUP = "KafkaEventTimeSessionWindow" >> private val MAX_EVENT_DELAY = 60 // events are out of order by max 60 >> seconds >> >> def main(args: Array[String]) { >> >> val env = StreamExecutionEnvironment.getExecutionEnvironment >> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) >> >> val kafkaProps = new Properties >> kafkaProps.setProperty("zookeeper.connect", LOCAL_ZOOKEEPER_HOST) >> kafkaProps.setProperty("bootstrap.servers", LOCAL_KAFKA_BROKER) >> kafkaProps.setProperty("group.id", CON_GROUP) >> kafkaProps.setProperty("auto.offset.reset", "earliest") >> >> val consumer = new FlinkKafkaConsumer011[PositionEventProto]( >> "positionevent", >> new PositionEventProtoSchema, >> kafkaProps) >> consumer.assignTimestampsAndWatermarks(new PositionEventProtoTSAssigner) >> >> val posstream = env.addSource(consumer) >> >> def convtoepochmilli(cdt: String): Long = { >> val odt:OffsetDateTime = OffsetDateTime.parse(cdt); >> val i:Instant = odt.toInstant(); >> val millis:Long = i.toEpochMilli(); >> millis >> } >> >> val outputstream = posstream >> .mapWith{case(p) => (p.getConsumerUserId, >> convtoepochmilli(p.getCreateDateTime.getInIso8601Format))} >> .keyBy(0) >> .window(EventTimeSessionWindows.withGap(Time.seconds(60))) >> .reduce { (v1, v2) => (v1._1, Math.max(v1._2 , v2._2)) } >> >> outputstream.print() >> >> // execute the transformation pipeline >> env.execute("Output Stream") >> } >> >> } >> >> class PositionEventProtoTSAssigner >> extends >> BoundedOutOfOrdernessTimestampExtractor[PositionEventProto](Time.seconds(60)) >> { >> >> override def extractTimestamp(pos: PositionEventProto): Long = { >> val odt:OffsetDateTime = >> OffsetDateTime.parse(pos.getCreateDateTime.getInIso8601Format); >> val i:Instant = odt.toInstant(); >> val millis:Long = i.toEpochMilli(); >> millis >> } >> } >> >> >> >
Multiple output operations in a job vs multiple jobs
Hi all, I am not sure when I should go for multiple jobs or have 1 job with all the sources and sinks. Following is my code. val env = StreamExecutionEnvironment.getExecutionEnvironment ... // create a Kafka source val srcstream = env.addSource(consumer) srcstream .keyBy(0) .window(ProcessingTimeSessionWindows.withGap(Time.days(14))) .reduce ... .map ... .addSink ... srcstream .keyBy(0) .window(ProcessingTimeSessionWindows.withGap(Time.days(28))) .reduce ... .map ... .addSink ... env.execute("Job1") My questions 1. The srcstream is a very high volume stream and the window size is 2 weeks and 4 weeks. Is the window size a problem? In this case, I think it is not a problem because I am using reduce which stores only 1 value per window. Is that right? 2. I am having 2 output operations one with 2 weeks window and the other with 4 weeks window. Are they executed in parallel or in sequence? 3. When I have multiple output operations like in this case should I break it into 2 different jobs ? 4. Can I run multiple jobs on the same cluster? Thanks
Re: ProcessFunction example from the documentation giving me error
It is good now. Sorry, my fault. I had multiple applications running and both were using the socket stream . Thanks. On Sun, Jul 22, 2018 at 8:22 PM, vino yang wrote: > Hi anna, > > From the stack trace you provided, it's socket connect error not about > Flink. > > So, Have you start a socket server at "localhost:"? Using a program or > CLI tool, such as "nc -l " > > There is a example you can have a look[1]. > > [1]: https://ci.apache.org/projects/flink/flink-docs- > release-1.5/quickstart/setup_quickstart.html#run-the-example > > Thanks, vino. > > 2018-07-21 4:30 GMT+08:00 anna stax : > >> It is not the code, but I don't know what the problem is. >> A simple word count with socketTextStream used to work but now gives >> the same error. >> Apps with kafka source which used to work is giving the same error. >> When I have a source generator within the app itself works good. >> >> So, with socketTextStream and kafka source gives me >> java.net.ConnectException: Operation timed out (Connection timed out) >> error >> >> On Fri, Jul 20, 2018 at 10:29 AM, anna stax wrote: >> >>> My object name is CreateUserNotificationRequests, thats why you see >>> CreateUserNotificationRequests in the Error message. >>> I edited the object name after pasting the code...Hope there is no >>> confusion and I get some help. >>> Thanks >>> >>> >>> >>> On Fri, Jul 20, 2018 at 10:10 AM, anna stax >>> wrote: >>> >>>> Hello all, >>>> >>>> This is my code, just trying to make the code example in >>>> https://ci.apache.org/projects/flink/flink-docs-release-1 >>>> .5/dev/stream/operators/process_function.html work >>>> >>>> object ProcessFunctionTest { >>>> >>>> def main(args: Array[String]) { >>>> >>>> val env = StreamExecutionEnvironment.getExecutionEnvironment >>>> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) >>>> val text = env.socketTextStream("localhost", ) >>>> >>>> val text1 = text.map(s => (s,s)).keyBy(0).process(new >>>> CountWithTimeoutFunction()) >>>> >>>> text1.print() >>>> >>>> env.execute("CountWithTimeoutFunction") >>>> } >>>> >>>> case class CountWithTimestamp(key: String, count: Long, lastModified: >>>> Long) >>>> >>>> class CountWithTimeoutFunction extends ProcessFunction[(String, >>>> String), (String, Long)] { >>>> >>>> lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext >>>> .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", >>>> classOf[CountWithTimestamp])) >>>> >>>> override def processElement(value: (String, String), ctx: >>>> ProcessFunction[(String, String), (String, Long)]#Context, out: >>>> Collector[(String, Long)]): Unit = { >>>> .. >>>> } >>>> >>>> override def onTimer(timestamp: Long, ctx: ProcessFunction[(String, >>>> String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]): >>>> Unit = { >>>> ... >>>> } >>>> } >>>> } >>>> >>>> >>>> Exception in thread "main" >>>> org.apache.flink.runtime.client.JobExecutionException: >>>> java.net.ConnectException: Operation timed out (Connection timed out) >>>> at org.apache.flink.runtime.minicluster.MiniCluster.executeJobB >>>> locking(MiniCluster.java:625) >>>> at org.apache.flink.streaming.api.environment.LocalStreamEnviro >>>> nment.execute(LocalStreamEnvironment.java:121) >>>> at org.apache.flink.streaming.api.scala.StreamExecutionEnvironm >>>> ent.execute(StreamExecutionEnvironment.scala:654) >>>> at com.whil.flink.streaming.CreateUserNotificationRequests$.mai >>>> n(CreateUserNotificationRequests.scala:42) >>>> at com.whil.flink.streaming.CreateUserNotificationRequests.main >>>> (CreateUserNotificationRequests.scala) >>>> Caused by: java.net.ConnectException: Operation timed out (Connection >>>> timed out) >>>> at java.net.PlainSocketImpl.socketConnect(Native Method) >>>> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock >>>> etIm
Re: ProcessFunction example from the documentation giving me error
It is not the code, but I don't know what the problem is. A simple word count with socketTextStream used to work but now gives the same error. Apps with kafka source which used to work is giving the same error. When I have a source generator within the app itself works good. So, with socketTextStream and kafka source gives me java.net.ConnectException: Operation timed out (Connection timed out) error On Fri, Jul 20, 2018 at 10:29 AM, anna stax wrote: > My object name is CreateUserNotificationRequests, thats why you see > CreateUserNotificationRequests in the Error message. > I edited the object name after pasting the code...Hope there is no > confusion and I get some help. > Thanks > > > > On Fri, Jul 20, 2018 at 10:10 AM, anna stax wrote: > >> Hello all, >> >> This is my code, just trying to make the code example in >> https://ci.apache.org/projects/flink/flink-docs-release-1 >> .5/dev/stream/operators/process_function.html work >> >> object ProcessFunctionTest { >> >> def main(args: Array[String]) { >> >> val env = StreamExecutionEnvironment.getExecutionEnvironment >> env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) >> val text = env.socketTextStream("localhost", ) >> >> val text1 = text.map(s => (s,s)).keyBy(0).process(new >> CountWithTimeoutFunction()) >> >> text1.print() >> >> env.execute("CountWithTimeoutFunction") >> } >> >> case class CountWithTimestamp(key: String, count: Long, lastModified: >> Long) >> >> class CountWithTimeoutFunction extends ProcessFunction[(String, >> String), (String, Long)] { >> >> lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext >> .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", >> classOf[CountWithTimestamp])) >> >> override def processElement(value: (String, String), ctx: >> ProcessFunction[(String, String), (String, Long)]#Context, out: >> Collector[(String, Long)]): Unit = { >> .. >> } >> >> override def onTimer(timestamp: Long, ctx: ProcessFunction[(String, >> String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]): >> Unit = { >> ... >> } >> } >> } >> >> >> Exception in thread "main" >> org.apache.flink.runtime.client.JobExecutionException: >> java.net.ConnectException: Operation timed out (Connection timed out) >> at org.apache.flink.runtime.minicluster.MiniCluster.executeJobB >> locking(MiniCluster.java:625) >> at org.apache.flink.streaming.api.environment.LocalStreamEnviro >> nment.execute(LocalStreamEnvironment.java:121) >> at org.apache.flink.streaming.api.scala.StreamExecutionEnvironm >> ent.execute(StreamExecutionEnvironment.scala:654) >> at com.whil.flink.streaming.CreateUserNotificationRequests$. >> main(CreateUserNotificationRequests.scala:42) >> at com.whil.flink.streaming.CreateUserNotificationRequests. >> main(CreateUserNotificationRequests.scala) >> Caused by: java.net.ConnectException: Operation timed out (Connection >> timed out) >> at java.net.PlainSocketImpl.socketConnect(Native Method) >> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock >> etImpl.java:350) >> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPl >> ainSocketImpl.java:206) >> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocket >> Impl.java:188) >> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) >> at java.net.Socket.connect(Socket.java:589) >> at org.apache.flink.streaming.api.functions.source.SocketTextSt >> reamFunction.run(SocketTextStreamFunction.java:96) >> at org.apache.flink.streaming.api.operators.StreamSource.run( >> StreamSource.java:87) >> at org.apache.flink.streaming.api.operators.StreamSource.run( >> StreamSource.java:56) >> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask. >> run(SourceStreamTask.java:99) >> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke( >> StreamTask.java:306) >> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) >> at java.lang.Thread.run(Thread.java:745) >> >> On Thu, Jul 19, 2018 at 11:22 PM, vino yang >> wrote: >> >>> Hi anna, >>> >>> Can you share your program and the exception stack trace and more >>> details about what's your source and state backend? >>> >>> From the information you provided, it seems Flink started a network >>> connect but timed out. >>> >>> Thanks, vino. >>> >>> 2018-07-20 14:14 GMT+08:00 anna stax : >>> >>>> Hi all, >>>> >>>> I am new to Flink. I am using the classes CountWithTimestamp and >>>> CountWithTimeoutFunction from the examples found in >>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ >>>> dev/stream/operators/process_function.html >>>> >>>> I am getting the error Exception in thread "main" >>>> org.apache.flink.runtime.client.JobExecutionException: >>>> java.net.ConnectException: Operation timed out (Connection timed out) >>>> >>>> Looks like when timer’s time is reached I am getting this error. Any >>>> idea why. Please help >>>> >>>> Thanks >>>> >>> >>> >> >
Re: ProcessFunction example from the documentation giving me error
My object name is CreateUserNotificationRequests, thats why you see CreateUserNotificationRequests in the Error message. I edited the object name after pasting the code...Hope there is no confusion and I get some help. Thanks On Fri, Jul 20, 2018 at 10:10 AM, anna stax wrote: > Hello all, > > This is my code, just trying to make the code example in > https://ci.apache.org/projects/flink/flink-docs-release- > 1.5/dev/stream/operators/process_function.html work > > object ProcessFunctionTest { > > def main(args: Array[String]) { > > val env = StreamExecutionEnvironment.getExecutionEnvironment > env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) > val text = env.socketTextStream("localhost", ) > > val text1 = text.map(s => (s,s)).keyBy(0).process(new > CountWithTimeoutFunction()) > > text1.print() > > env.execute("CountWithTimeoutFunction") > } > > case class CountWithTimestamp(key: String, count: Long, lastModified: > Long) > > class CountWithTimeoutFunction extends ProcessFunction[(String, String), > (String, Long)] { > > lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext > .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", > classOf[CountWithTimestamp])) > > override def processElement(value: (String, String), ctx: > ProcessFunction[(String, String), (String, Long)]#Context, out: > Collector[(String, Long)]): Unit = { > .. > } > > override def onTimer(timestamp: Long, ctx: ProcessFunction[(String, > String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]): > Unit = { > ... > } > } > } > > > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: > java.net.ConnectException: Operation timed out (Connection timed out) > at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking( > MiniCluster.java:625) > at org.apache.flink.streaming.api.environment.LocalStreamEnvironment. > execute(LocalStreamEnvironment.java:121) > at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment. > execute(StreamExecutionEnvironment.scala:654) > at com.whil.flink.streaming.CreateUserNotificationRequests$.main( > CreateUserNotificationRequests.scala:42) > at com.whil.flink.streaming.CreateUserNotificationRequests.main( > CreateUserNotificationRequests.scala) > Caused by: java.net.ConnectException: Operation timed out (Connection > timed out) > at java.net.PlainSocketImpl.socketConnect(Native Method) > at java.net.AbstractPlainSocketImpl.doConnect( > AbstractPlainSocketImpl.java:350) > at java.net.AbstractPlainSocketImpl.connectToAddress( > AbstractPlainSocketImpl.java:206) > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java: > 188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at org.apache.flink.streaming.api.functions.source. > SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) > at org.apache.flink.streaming.api.operators.StreamSource. > run(StreamSource.java:87) > at org.apache.flink.streaming.api.operators.StreamSource. > run(StreamSource.java:56) > at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run( > SourceStreamTask.java:99) > at org.apache.flink.streaming.runtime.tasks.StreamTask. > invoke(StreamTask.java:306) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:745) > > On Thu, Jul 19, 2018 at 11:22 PM, vino yang wrote: > >> Hi anna, >> >> Can you share your program and the exception stack trace and more details >> about what's your source and state backend? >> >> From the information you provided, it seems Flink started a network >> connect but timed out. >> >> Thanks, vino. >> >> 2018-07-20 14:14 GMT+08:00 anna stax : >> >>> Hi all, >>> >>> I am new to Flink. I am using the classes CountWithTimestamp and >>> CountWithTimeoutFunction from the examples found in >>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ >>> dev/stream/operators/process_function.html >>> >>> I am getting the error Exception in thread "main" >>> org.apache.flink.runtime.client.JobExecutionException: >>> java.net.ConnectException: Operation timed out (Connection timed out) >>> >>> Looks like when timer’s time is reached I am getting this error. Any >>> idea why. Please help >>> >>> Thanks >>> >> >> >
Re: ProcessFunction example from the documentation giving me error
Hello all, This is my code, just trying to make the code example in https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/ operators/process_function.html work object ProcessFunctionTest { def main(args: Array[String]) { val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) val text = env.socketTextStream("localhost", ) val text1 = text.map(s => (s,s)).keyBy(0).process(new CountWithTimeoutFunction()) text1.print() env.execute("CountWithTimeoutFunction") } case class CountWithTimestamp(key: String, count: Long, lastModified: Long) class CountWithTimeoutFunction extends ProcessFunction[(String, String), (String, Long)] { lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext .getState(new ValueStateDescriptor[CountWithTimestamp]("myState", classOf[CountWithTimestamp])) override def processElement(value: (String, String), ctx: ProcessFunction[(String, String), (String, Long)]#Context, out: Collector[(String, Long)]): Unit = { .. } override def onTimer(timestamp: Long, ctx: ProcessFunction[(String, String), (String, Long)]#OnTimerContext, out: Collector[(String, Long)]): Unit = { ... } } } Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.net.ConnectException: Operation timed out (Connection timed out) at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:625) at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:121) at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654) at com.whil.flink.streaming.CreateUserNotificationRequests$.main(CreateUserNotificationRequests.scala:42) at com.whil.flink.streaming.CreateUserNotificationRequests.main(CreateUserNotificationRequests.scala) Caused by: java.net.ConnectException: Operation timed out (Connection timed out) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) at java.lang.Thread.run(Thread.java:745) On Thu, Jul 19, 2018 at 11:22 PM, vino yang wrote: > Hi anna, > > Can you share your program and the exception stack trace and more details > about what's your source and state backend? > > From the information you provided, it seems Flink started a network > connect but timed out. > > Thanks, vino. > > 2018-07-20 14:14 GMT+08:00 anna stax : > >> Hi all, >> >> I am new to Flink. I am using the classes CountWithTimestamp and >> CountWithTimeoutFunction from the examples found in >> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ >> dev/stream/operators/process_function.html >> >> I am getting the error Exception in thread "main" >> org.apache.flink.runtime.client.JobExecutionException: >> java.net.ConnectException: Operation timed out (Connection timed out) >> >> Looks like when timer’s time is reached I am getting this error. Any idea >> why. Please help >> >> Thanks >> > >
Re: Is KeyedProcessFunction available in Flink 1.4?
Thanks Bowen. On Thu, Jul 19, 2018 at 4:45 PM, Bowen Li wrote: > Hi Anna, > > KeyedProcessFunction is only available starting from Flink 1.5. The doc is > here > <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#the-keyedprocessfunction>. > It extends ProcessFunction and shares the same functionalities except > giving more access to timers' key, thus you can refer to examples of > ProcessFunction in that document. > > Bowen > > > On Thu, Jul 19, 2018 at 3:26 PM anna stax wrote: > >> Hello all, >> I am using Flink 1.4 because thats the version provided by the latest AWS >> EMR. >> Is KeyedProcessFunction available in Flink 1.4? >> >> Also please share any links to good examples on using >> KeyedProcessFunction . >> >> Thanks >> >
ProcessFunction example from the documentation giving me error
Hi all, I am new to Flink. I am using the classes CountWithTimestamp and CountWithTimeoutFunction from the examples found in https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/process_function.html I am getting the error Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.net.ConnectException: Operation timed out (Connection timed out) Looks like when timer’s time is reached I am getting this error. Any idea why. Please help Thanks
Is KeyedProcessFunction available in Flink 1.4?
Hello all, I am using Flink 1.4 because thats the version provided by the latest AWS EMR. Is KeyedProcessFunction available in Flink 1.4? Also please share any links to good examples on using KeyedProcessFunction . Thanks
Re: How to create User Notifications/Reminder ?
Thanks Hequn and Dawid for your input. Thanks Dawid for the link. Great help! On Thu, Jul 12, 2018 at 5:59 AM, Dawid Wysakowicz < wysakowicz.da...@gmail.com> wrote: > Hi shyla, > > It should be doable with CEP. You can create pattern like: > Pattern.begin("start").next/followedBy("end").where(...).within(/* two > weeks*/) and subscribe for timed out events. You can check very similar > example here[1]. > > Best, > > Dawid > > [1] https://github.com/dataArtisans/flink-training- > exercises/blob/master/src/main/java/com/dataartisans/ > flinktraining/exercises/datastream_java/cep/LongRidesExercise.java > > On 12/07/18 14:26, Hequn Cheng wrote: > > Hi shyla, > > Considering window, I think it is not very convenient. Two weeks window is > used to process data in the recent 2 weeks while you want to process data > beyond 2 weeks. > I'm not familiar with CEP, but it sounds like a good idea. > > Best, Hequn > > > On Thu, Jul 12, 2018 at 10:56 AM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> Hi Hequen, >> >> I was more interested in solving using CEP. >> I want to have a window of 2 weeks and in the Timeout Handler I want to >> create Notification/Reminder. >> Is this doable in Flink 1.4.2.? >> >> Thanks >> >> >> On Wed, Jul 11, 2018 at 6:14 PM, Hequn Cheng >> wrote: >> >>> Hi shyla, >>> >>> There is a same question[1] asked two days ago. Maybe it is helpful for >>> you. Let me know if you have any other concern. >>> Best, Hequn >>> >>> [1] http://apache-flink-user-mailing-list-archive.2336050.n4 >>> .nabble.com/How-to-trigger-a-function-on-the-state-periodica >>> lly-td21311.html >>> >>> On Thu, Jul 12, 2018 at 4:50 AM, shyla deshpande < >>> deshpandesh...@gmail.com> wrote: >>> I need to create User Notification/Reminder when I don’t see a specific event (high volume) from that user for more than 2 weeks. Should I be using windowing or CEP or ProcessFunction? I am pretty new to Flink. Can anyone please advise me what is the best way to solve this? Thank you for your time. >>> >>> >> > >
Re: How to trigger a function on the state periodically?
sure. I will go ahead with this for now. Thanks for your suggestions. On Mon, Jul 9, 2018 at 11:10 PM, Hequn Cheng wrote: > Hi, > It depends on how many different users. In most cases, the performance > will be fine. I think it worth to give a try. :-) > Of course, there are ways to reduce the number of timers, for example > keyBy(userId%1024), and use a MapState to store different users for the > same group. > > On Tue, Jul 10, 2018 at 1:54 PM, anna stax wrote: > >> Thanks Hequn. I think so too, the large number of timers could be a >> problem. >> >> On Mon, Jul 9, 2018 at 10:23 PM, Hequn Cheng >> wrote: >> >>> Hi anna, >>> >>> According to your description, I think we can use the Timer to solve >>> your problem. The TimerService deduplicates timers per key and timestamp. >>> Also, note that a large number of timers can significantly increase >>> checkpointing time. >>> >>> On Tue, Jul 10, 2018 at 11:38 AM, anna stax >>> wrote: >>> >>>> Thanks Hequn, for the links. >>>> >>>> This is my use case.. >>>> >>>> When there is no user activity for n weeks, I need to send a >>>> Notification to user. >>>> The activity stream is usually very high volume for most users. >>>> I thought it is not a good idea to use windowing for this, because of >>>> the stream volume and window size. >>>> I want to store in the state, for every user the last activity date and >>>> process them once daily. >>>> >>>> I want to make sure I am heading in the right direction. Thank you for >>>> your suggestions. >>>> >>>> -Anna >>>> >>>> On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng >>>> wrote: >>>> >>>>> Hi anna, >>>>> >>>>> > I need to trigger a function once every day >>>>> If you want to trigger by the function itself, you can use the >>>>> Timer[1]. Both types of timers (processing-time and event-time) are >>>>> internally maintained by the TimerService, and onTimer() method will be >>>>> called once a timer fires. >>>>> If you want to trigger the function of different >>>>> parallelism synchronously, then the broadcast state[2] may be helpful. >>>>> >>>>> Hope this helps. >>>>> Hequn >>>>> >>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/s >>>>> tream/operators/process_function.html#timers >>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d >>>>> ev/stream/state/broadcast_state.html >>>>> >>>>> On Tue, Jul 10, 2018 at 7:47 AM, anna stax >>>>> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I need to trigger a function once every day to read the state and >>>>>> create kafka events and also remove some records from state if they are >>>>>> too >>>>>> old. >>>>>> >>>>>> Is there a way to do this? I am new to Flink, appreciate any feedback >>>>>> and suggestions. >>>>>> >>>>>> Thanks >>>>>> Anna >>>>>> >>>>> >>>>> >>>> >>> >> >
Re: How to trigger a function on the state periodically?
Thanks Hequn. I think so too, the large number of timers could be a problem. On Mon, Jul 9, 2018 at 10:23 PM, Hequn Cheng wrote: > Hi anna, > > According to your description, I think we can use the Timer to solve your > problem. The TimerService deduplicates timers per key and timestamp. Also, > note that a large number of timers can significantly increase checkpointing > time. > > On Tue, Jul 10, 2018 at 11:38 AM, anna stax wrote: > >> Thanks Hequn, for the links. >> >> This is my use case.. >> >> When there is no user activity for n weeks, I need to send a Notification >> to user. >> The activity stream is usually very high volume for most users. >> I thought it is not a good idea to use windowing for this, because of the >> stream volume and window size. >> I want to store in the state, for every user the last activity date and >> process them once daily. >> >> I want to make sure I am heading in the right direction. Thank you for >> your suggestions. >> >> -Anna >> >> On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng wrote: >> >>> Hi anna, >>> >>> > I need to trigger a function once every day >>> If you want to trigger by the function itself, you can use the >>> Timer[1]. Both types of timers (processing-time and event-time) are >>> internally maintained by the TimerService, and onTimer() method will be >>> called once a timer fires. >>> If you want to trigger the function of different >>> parallelism synchronously, then the broadcast state[2] may be helpful. >>> >>> Hope this helps. >>> Hequn >>> >>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/s >>> tream/operators/process_function.html#timers >>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d >>> ev/stream/state/broadcast_state.html >>> >>> On Tue, Jul 10, 2018 at 7:47 AM, anna stax wrote: >>> >>>> Hi all, >>>> >>>> I need to trigger a function once every day to read the state and >>>> create kafka events and also remove some records from state if they are too >>>> old. >>>> >>>> Is there a way to do this? I am new to Flink, appreciate any feedback >>>> and suggestions. >>>> >>>> Thanks >>>> Anna >>>> >>> >>> >> >
Re: How to trigger a function on the state periodically?
Thanks Hequn, for the links. This is my use case.. When there is no user activity for n weeks, I need to send a Notification to user. The activity stream is usually very high volume for most users. I thought it is not a good idea to use windowing for this, because of the stream volume and window size. I want to store in the state, for every user the last activity date and process them once daily. I want to make sure I am heading in the right direction. Thank you for your suggestions. -Anna On Mon, Jul 9, 2018 at 7:16 PM, Hequn Cheng wrote: > Hi anna, > > > I need to trigger a function once every day > If you want to trigger by the function itself, you can use the > Timer[1]. Both types of timers (processing-time and event-time) are > internally maintained by the TimerService, and onTimer() method will be > called once a timer fires. > If you want to trigger the function of different > parallelism synchronously, then the broadcast state[2] may be helpful. > > Hope this helps. > Hequn > > [1] https://ci.apache.org/projects/flink/flink-docs- > master/dev/stream/operators/process_function.html#timers > [2] https://ci.apache.org/projects/flink/flink-docs- > master/dev/stream/state/broadcast_state.html > > On Tue, Jul 10, 2018 at 7:47 AM, anna stax wrote: > >> Hi all, >> >> I need to trigger a function once every day to read the state and create >> kafka events and also remove some records from state if they are too old. >> >> Is there a way to do this? I am new to Flink, appreciate any feedback and >> suggestions. >> >> Thanks >> Anna >> > >
How to trigger a function on the state periodically?
Hi all, I need to trigger a function once every day to read the state and create kafka events and also remove some records from state if they are too old. Is there a way to do this? I am new to Flink, appreciate any feedback and suggestions. Thanks Anna