Re: Flink CEP with files and no streams?
Hi,

I'm not aware of a good example, but I can give you some pointers.

- Implement the SourceFunction interface. This function will not be executed in parallel, so you don't have to worry about parallelism.
- Since you said you want to run it as a batch job, you might not need to implement checkpointing functionality.
- In the run method, open the file that you need to read and start parsing it. When you have a record, extract the timestamp and emit both by passing them to the SourceContext.
- Every n-th record, you can emit a watermark. The watermark timestamp must be smaller than the timestamps of all records that will be emitted in the future.

I'd start by processing a single file and extend the source from there.

Hope this helps,
Fabian

2018-02-07 13:59 GMT+01:00 Esa Heikkinen <esa.heikki...@student.tut.fi>:

> Hi
>
> Thanks for the reply, but because I am a newbie with Flink, do you have
> any good Scala code examples for this?
>
> Esa
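The pointers in the reply above could be sketched in Scala roughly as follows. This is only a minimal sketch, not an official example. It is written against the SourceFunction API discussed in this thread (Flink 1.4 era); newer Flink releases have moved to a different source API, so check the documentation for your version. The names LogEvent, LogFileSource, parseLine, watermarkEvery and LogFileJob are made up for illustration. The sketch also assumes that each log line has the form "<epoch-millis> <message>" and that the file is already sorted by timestamp, which is what makes "current timestamp minus one" a safe watermark.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark

import scala.io.Source
import scala.util.Try

// Hypothetical record type: one log line plus its event-time timestamp.
case class LogEvent(timestamp: Long, message: String)

object LogFileSource {
  // Assumed line format: "<epoch-millis> <message>". Unparseable lines yield None.
  def parseLine(line: String): Option[LogEvent] =
    line.trim.split("\\s+", 2) match {
      case Array(ts, msg) => Try(ts.toLong).toOption.map(LogEvent(_, msg))
      case _              => None
    }
}

// Non-parallel source: it extends SourceFunction, not ParallelSourceFunction.
// No checkpointing logic is implemented, as suggested for a batch-style run.
class LogFileSource(path: String, watermarkEvery: Int = 100)
    extends SourceFunction[LogEvent] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[LogEvent]): Unit = {
    val file = Source.fromFile(path)
    try {
      var emitted = 0L
      val lines = file.getLines()
      while (running && lines.hasNext) {
        LogFileSource.parseLine(lines.next()).foreach { event =>
          // Emit the record together with its event-time timestamp.
          ctx.collectWithTimestamp(event, event.timestamp)
          emitted += 1
          // Every n-th record, emit a watermark. Because the file is assumed
          // to be sorted, all later records have timestamps above this value.
          if (emitted % watermarkEvery == 0) {
            ctx.emitWatermark(new Watermark(event.timestamp - 1))
          }
        }
      }
    } finally {
      file.close()
    }
  }

  override def cancel(): Unit = {
    running = false
  }
}

object LogFileJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // args(0) is the path of the log file to replay.
    val events: DataStream[LogEvent] = env.addSource(new LogFileSource(args(0)))
    events.print()
    env.execute("log file replay")
  }
}
```

The resulting DataStream[LogEvent] carries event-time timestamps and watermarks, so it could be handed to Flink CEP in place of print(). If the file is not sorted, the watermark would have to lag behind the largest timestamp seen so far by at least the maximum expected disorder.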
RE: Flink CEP with files and no streams?
Hi

Thanks for the reply, but because I am a newbie with Flink, do you have any good Scala code examples for this?

Esa

From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Wednesday, February 7, 2018 11:21 AM
To: Esa Heikkinen <esa.heikki...@student.tut.fi>
Cc: user@flink.apache.org
Subject: Re: Flink CEP with files and no streams?

Hi Esa,

you can also read files as a stream. However, you have to be careful about the order in which you read the files and how you generate watermarks.

The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks.

Things become more tricky when you try to read the files in parallel.

Best,
Fabian
Re: Flink CEP with files and no streams?
Hi Esa,

you can also read files as a stream. However, you have to be careful about the order in which you read the files and how you generate watermarks.

The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks.

Things become more tricky when you try to read the files in parallel.

Best,
Fabian

2018-02-07 9:40 GMT+01:00 Esa Heikkinen:

> Hello
>
> I am trying to use Flink CEP on log files (as a batch job) rather than on
> streams (in real time).
>
> Is that possible? If yes, do you know of any Scala code examples for that?
>
> Or should I convert the log files (with timestamps) into streams?
> But how are timestamps handled in Flink?
>
> If I cannot use Flink at all for this purpose, can you recommend any
> other tools?
>
> I would like CEP-type analysis for log files.
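Reading several files "in the right order", as mentioned in the reply above, could for example mean sorting them by the timestamp of their first record before a single non-parallel source reads them one after another. The following is a plain Scala sketch of that idea with no Flink dependency. LogFileOrdering, firstTimestamp and inEventTimeOrder are hypothetical names. It assumes a line format of "<epoch-millis> <message>" and that the time ranges of the files do not overlap.

```scala
import java.io.File

import scala.io.Source
import scala.util.Try

object LogFileOrdering {
  // Returns the timestamp of the first parseable line, if there is one.
  // Assumed line format: "<epoch-millis> <message>".
  def firstTimestamp(file: File): Option[Long] = {
    val src = Source.fromFile(file)
    try {
      src.getLines()
        .map(line => Try(line.trim.split("\\s+", 2).head.toLong).toOption)
        .collectFirst { case Some(ts) => ts }
    } finally {
      src.close()
    }
  }

  // Sorts files by their first timestamp; files without one are skipped.
  def inEventTimeOrder(files: Seq[File]): Seq[File] =
    files
      .flatMap(f => firstTimestamp(f).map(ts => (ts, f)))
      .sortBy(_._1)
      .map(_._2)
}
```

A source function could then iterate over the result of inEventTimeOrder and read each file in turn, so that timestamps keep increasing across file boundaries and watermarks stay valid.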
Flink CEP with files and no streams?
Hello

I am trying to use Flink CEP on log files (as a batch job) rather than on streams (in real time).

Is that possible? If yes, do you know of any Scala code examples for that?

Or should I convert the log files (with timestamps) into streams? But how are timestamps handled in Flink?

If I cannot use Flink at all for this purpose, can you recommend any other tools?

I would like CEP-type analysis for log files.