Re: About delta awareness caches

2016-12-21 Thread xingcan
Hi Aljoscha,

First of all, sorry that I missed your prompt reply :(

These past few days, I've been studying the implementation mechanism of
windows in Flink.

I think the main difference between the windows in Storm and Flink (at the
API level) is that Storm maintains only one window while Flink maintains
several isolated windows. Because of that, Storm users can observe the
transformation of a window (tuples added and expired) and take action on
each window modification (sliding-window forwarding), while Flink users can
only apply functions to one complete window after another, as if the
windows were independent of each other (when in fact they may share quite a
few tuples).

Objectively speaking, the window API provided by Flink is more formalized
and easier to use. However, for a sliding window with a large size and a
short slide interval (e.g., 5m and 1s), each tuple will be computed
redundantly (maybe 300 times in that example?). Though Flink provides the
pane optimization, I think it's far from enough, since that optimization
can only be applied to reduce functions, which restrict the input and
output data types to be the same. Other functions that should also avoid
redundant calculations cannot benefit from it, e.g., a MaxAndMin function
that takes numbers as input and outputs a (max, min) pair, or an Average
function.

Actually, I'm just wondering whether a "mergeable fold function" could be
added (just *like* this: https://en.wikipedia.org/wiki/Mergeable_heap). I
know it may violate some principles of Flink (probably about state), but I
insist that unnecessary calculations should be avoided in stream
processing.

So, could you give me some advice? I am all ears :). Or, if you think this
is feasible, I'll think it through carefully and try to implement it.

Thank you and merry Christmas.

Best,

- Xingcan

On Thu, Dec 1, 2016 at 7:56 PM, Aljoscha Krettek wrote:

> I'm not aware of how windows work in Storm. If you could maybe give some
> details on your use case we could figure out together how that would map to
> Flink windows.
>
> Cheers,
> Aljoscha
>
> On Tue, 29 Nov 2016 at 15:47 xingcan  wrote:
>
>> Hi all,
>>
>> Recently I tried to port some old applications from Storm to Flink.
>> In Storm, the window implementation (TupleWindow) has two methods,
>> getNew() and getExpired(), which supply the delta information of a
>> window, and we therefore wrote some stateful caches that are aware of it.
>> However, it seems that Flink deals with windows in a different way and
>> supplies more "formalized" APIs.
>> So, any tips on how to adapt these delta-awareness caches to Flink, or
>> how to refactor them to make them fit?
>>
>> Thanks.
>>
>> Best,
>> Xingcan
>>
>>


Re: Monitoring REST API

2016-12-21 Thread Ovidiu-Cristian MARCU
Hi Lydia,

I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the averages
over multiple nodes.

1) So for each node you can collect the sar output and obtain, for example:

Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr)   2016-01-27   _x86_64_ (16 CPU)

12:54:09    CPU    %user   %nice   %system   %iowait   %steal   %idle
12:54:10    all     4.63    0.00      3.25      0.13     0.00   91.99

12:54:09    kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact
12:54:10    129538812    2525308      1.91       1292     85876   3662636     2.69   2111652    55132

12:54:09    DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
12:54:10    sda  28.71   2708.91     87.13     97.38      0.03   1.10   0.97   2.77

12:54:09    IFACE  rxpck/s  txpck/s   rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s
12:54:10    eth0    632.67   587.13  3173.60   58.47     0.00     0.00      0.00

2) Calculate the average over your nodes (with synced clocks) and obtain a final
output over which you run some plotting scripts:

LINE  DATE      FILENAME  CPU_user  CPU_SYS  KBMEMFREE  KBMEMUSED  MEMUSED  DISK_UTIL  DISK_RKBs  DISK_WKBs  _IO_RSTs  _IO_WSTs
1     12:54:10  res1Avg   6.12      1.25     129554704  2509412    1.90     6.00       4253.63    87.04      3944.00   88.00
2     12:54:11  res1Avg   3.41      0.28     129523432  2540690    1.92     4.00       2335.82    51.62      2692.00   0.00
3     12:54:12  res1Avg   0.06      0.03     129522000  2542120    1.92     1.60       0.16       0.59       2048.00   32.00
4     12:54:13  res1Avg   0.09      0.06     129520936  2543182    1.92     0.60       0.19       0.59       2048.00   0.00
5     12:54:14  res1Avg   0.06      0.06     129518448  2545670    1.93     6.80       4.31       169.47     4044.00   16.00

For other metrics specific to Flink’s execution, you may need to rely on the
various metrics Flink currently exposes.

Best,
Ovidiu

> On 21 Dec 2016, at 19:55, Lydia Ickler  wrote:
> 
> Hi all,
> 
> I have a question regarding the Monitoring REST API;
> 
> I want to analyze the behavior of my program with regard to I/O MiB/s,
> Network MiB/s, and CPU %, as the authors of this paper did:
> https://hal.inria.fr/hal-01347638v2/document
> From the JSON file at http://master:8081/jobs/jobid/ I get a summary including
> the information on read/write records and read/write bytes.
> Unfortunately, the entries for Network and CPU are either (unknown) or 0.0. I am
> running my program on a cluster with up to 32 nodes.
> 
> Where can I find the values for e.g. CPU or Network?
> 
> Thanks in advance!
> Lydia
> 



Re: FlinkML and DataStream API

2016-12-21 Thread Márton Balassi
Thanks for mentioning it, Theo.

Here it is: https://github.com/streamline-eu/ML-Pipelines/tree/stream-ml

Look at these examples:
https://github.com/streamline-eu/ML-Pipelines/commit/314e3d940f1f1ac7b762ba96067e13d806476f57

On Wed, Dec 21, 2016 at 9:38 PM,  wrote:

> I'm interested in that code you mentioned too, I hope you can find it.
>
> Regards,
> Matt
>
> On Dec 21, 2016, at 17:12, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> Hello Mäki,
>
> I think what you would like to do is train a model using batch, and use
> the Flink streaming API as a way to serve your model and make predictions.
>
> While we don't have an integrated way to do that in FlinkML currently, I
> definitely think that's possible. I know Marton Balassi has been working on
> something like this for the ALS algorithm, but I can't find the code right
> now on mobile.
> The general idea is to keep your model as state and use it to make
> predictions on a stream of incoming data.
>
> Model serving is definitely something we'll be working on in the future,
> I'll have a master student working on exactly that next semester.
>
> --
> Sent from a mobile device. May contain autocorrect errors.
>
> On Dec 21, 2016 5:24 PM, "Mäki Hanna"  wrote:
>
>> Hi,
>>
>>
>>
>> I’m wondering if there is a way to use FlinkML and make predictions
>> continuously for test data coming from a DataStream.
>>
>>
>>
>> I know FlinkML only supports the DataSet API (batch) at the moment, but
>> is there a way to convert a DataStream into DataSets? I’m thinking of
>> something like
>>
>>
>>
>> (0. fit model in batch mode)
>>
>> 1. window the DataStream
>>
>> 2. convert the windowed stream to DataSets
>>
>> 3. use the FlinkML methods to make predictions
>>
>>
>>
>> BR,
>>
>> Hanna
>>
>>
>


Re: FlinkML and DataStream API

2016-12-21 Thread dromitlabs
I'm interested in that code you mentioned too, I hope you can find it.

Regards,
Matt

> On Dec 21, 2016, at 17:12, Theodore Vasiloudis 
>  wrote:
> 
> Hello Mäki, 
> 
> I think what you would like to do is train a model using batch, and use the 
> Flink streaming API as a way to serve your model and make predictions. 
> 
> While we don't have an integrated way to do that in FlinkML currently, I 
> definitely think that's possible. I know Marton Balassi has been working on 
> something like this for the ALS algorithm, but I can't find the code right 
> now on mobile.  
> The general idea is to keep your model as state and use it to make 
> predictions on a stream of incoming data. 
> 
> Model serving is definitely something we'll be working on in the future, I'll 
> have a master student working on exactly that next semester. 
> 
> -- 
> Sent from a mobile device. May contain autocorrect errors.
> 
>> On Dec 21, 2016 5:24 PM, "Mäki Hanna"  wrote:
>> Hi,
>> 
>>  
>> 
>> I’m wondering if there is a way to use FlinkML and make predictions 
>> continuously for test data coming from a DataStream.
>> 
>>  
>> 
>> I know FlinkML only supports the DataSet API (batch) at the moment, but is 
>> there a way to convert a DataStream into DataSets? I’m thinking of something 
>> like
>> 
>>  
>> 
>> (0. fit model in batch mode)
>> 
>> 1. window the DataStream
>> 
>> 2. convert the windowed stream to DataSets
>> 
>> 3. use the FlinkML methods to make predictions
>> 
>>  
>> 
>> BR,
>> 
>> Hanna
>> 
>>  
>> 


Re: FlinkML and DataStream API

2016-12-21 Thread Theodore Vasiloudis
Hello Mäki,

I think what you would like to do is train a model using batch, and use the
Flink streaming API as a way to serve your model and make predictions.

While we don't have an integrated way to do that in FlinkML currently, I
definitely think that's possible. I know Marton Balassi has been working on
something like this for the ALS algorithm, but I can't find the code right
now on mobile.
The general idea is to keep your model as state and use it to make
predictions on a stream of incoming data.
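A minimal sketch of that idea in Java (the model class and its weights are
made-up placeholders; checkpointing the model itself is omitted for
brevity):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Placeholder model: a linear scorer whose weights were fitted offline.
class MyModel implements java.io.Serializable {
    private final double[] weights;
    MyModel(double[] weights) { this.weights = weights; }
    double predict(double[] features) {
        double score = 0.0;
        for (int i = 0; i < weights.length; i++) { score += weights[i] * features[i]; }
        return score;
    }
}

// Minimal sketch: load the batch-trained model once per task instance and
// use it to score each incoming element of the stream.
public class PredictFunction extends RichMapFunction<double[], Double> {
    private transient MyModel model;

    @Override
    public void open(Configuration parameters) throws Exception {
        // placeholder: in practice, read the weights exported by the batch job
        model = new MyModel(new double[] {0.3, -1.2, 0.8});
    }

    @Override
    public Double map(double[] features) {
        return model.predict(features);
    }
}

// usage: DataStream<Double> predictions = featureStream.map(new PredictFunction());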

Model serving is definitely something we'll be working on in the future;
I'll have a master's student working on exactly that next semester.

-- 
Sent from a mobile device. May contain autocorrect errors.

On Dec 21, 2016 5:24 PM, "Mäki Hanna"  wrote:

> Hi,
>
>
>
> I’m wondering if there is a way to use FlinkML and make predictions
> continuously for test data coming from a DataStream.
>
>
>
> I know FlinkML only supports the DataSet API (batch) at the moment, but is
> there a way to convert a DataStream into DataSets? I’m thinking of
> something like
>
>
>
> (0. fit model in batch mode)
>
> 1. window the DataStream
>
> 2. convert the windowed stream to DataSets
>
> 3. use the FlinkML methods to make predictions
>
>
>
> BR,
>
> Hanna
>
>
>


Re: Stateful Stream Processing with RocksDB causing Job failure

2016-12-21 Thread abiybirtukan
Thanks, that helps. 

Sent from my iPhone

> On Dec 21, 2016, at 12:08 PM, Ufuk Celebi  wrote:
> 
>> On Wed, Dec 21, 2016 at 2:52 PM,   wrote:
>> So the RocksDB state backend should only be used in conjunctions with hdfs
>> or s3 on a multi node cluster?
> 
> Yes. Otherwise there is no way to restore the checkpoint on a different host.


Monitoring REST API

2016-12-21 Thread Lydia Ickler
Hi all,

I have a question regarding the Monitoring REST API;

I want to analyze the behavior of my program with regard to I/O MiB/s, Network
MiB/s, and CPU %, as the authors of this paper did:
https://hal.inria.fr/hal-01347638v2/document
From the JSON file at http://master:8081/jobs/jobid/ I get a summary including
the information on read/write records and read/write bytes.
Unfortunately, the entries for Network and CPU are either (unknown) or 0.0. I am
running my program on a cluster with up to 32 nodes.

Where can I find the values for e.g. CPU or Network?

Thanks in advance!
Lydia



Re: static/dynamic lookups in flink streaming

2016-12-21 Thread Meghashyam Sandeep V
As a follow-up question, can we populate the operator state from an
external source?

My use case is as follows: I have a Flink streaming process with Kafka as a
source. I only have IDs coming from the Kafka messages. My lookups, which
are a static map, come from a different source. I would like to use those
lookups while applying operators on the stream from Kafka.

Thanks,
Sandeep

On Wed, Dec 21, 2016 at 6:17 AM, Fabian Hueske  wrote:

> OK, I see. Yes, you can do that with Flink. It's actually a very common
> use case.
>
> You can store the names in operator state and Flink takes care of
> checkpointing the state and restoring it in case of a failure.
> In fact, the operator state is persisted in the state backends you
> mentioned before.
>
> Best, Fabian
>
> 2016-12-21 15:02 GMT+01:00 Meghashyam Sandeep V :
>
>> Hi Fabian,
>>
>> I meant look ups like IDs to names. For example if I have IDs coming
>> through the stream and if I want to replace them with corresponding names
>> stored in cache or somewhere within flink.
>>
>> Thanks,
>> Sandeep
>>
>> On Dec 21, 2016 12:35 AM, "Fabian Hueske"  wrote:
>>
>>> Hi Sandeep,
>>>
>>> I'm sorry but I think I do not understand your question.
>>> What do you mean by static or dynamic look ups? Do you want to access an
>>> external data store and cache data?
>>>
>>> Can you give a bit more detail about your use case?
>>>
>>> Best, Fabian
>>>
>>> 2016-12-20 23:07 GMT+01:00 Meghashyam Sandeep V:
>>>
 Hi there,

 I know that there are various state backends to persist state. Is there
 a similar way to persist static/dynamic look ups and use them while
 streaming the data in Flink?

 Thanks,
 Sandeep


>>>
>


Re: Stateful Stream Processing with RocksDB causing Job failure

2016-12-21 Thread Ufuk Celebi
On Wed, Dec 21, 2016 at 2:52 PM,   wrote:
> So the RocksDB state backend should only be used in conjunctions with hdfs
> or s3 on a multi node cluster?

Yes. Otherwise there is no way to restore the checkpoint on a different host.
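A minimal setup sketch (the HDFS URI is a placeholder for whatever shared
filesystem you use):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Minimal sketch: the checkpoint URI must be on a filesystem that the
// JobManager and every TaskManager can reach, e.g. HDFS or S3.
public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints"));
        env.enableCheckpointing(60000);   // take a snapshot every 60 seconds
        // ... build the topology and call env.execute() ...
    }
}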


FlinkML and DataStream API

2016-12-21 Thread Mäki Hanna
Hi,

I'm wondering if there is a way to use FlinkML and make predictions 
continuously for test data coming from a DataStream.

I know FlinkML only supports the DataSet API (batch) at the moment, but is 
there a way to convert a DataStream into DataSets? I'm thinking of something 
like

(0. fit model in batch mode)
1. window the DataStream
2. convert the windowed stream to DataSets
3. use the FlinkML methods to make predictions

BR,
Hanna

Disclaimer: This message and any attachments thereto are intended solely for 
the addressed recipient(s) and may contain confidential information. If you are 
not the intended recipient, please notify the sender by reply e-mail and delete 
the e-mail (including any attachments thereto) without producing, distributing 
or retaining any copies thereof. Any review, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or entities 
other than the intended recipient(s) is prohibited. Thank you.


Events are assigned to wrong window

2016-12-21 Thread Nico
Hi @all,

I am using a TumblingEventTimeWindows.of(Time.seconds(20)) for testing.
While doing so, I found a strange behavior (at least for me) in the
assignment of events.

The first element of a new window is actually always part of the old
window. I thought the events were late, but then they would be dropped
instead of being assigned to the new window. Even with an allowedLateness
of 10s the behavior remains the same.

I used timeWindow.getStart() and getEnd() in order to get the boundaries of
the window.

Can someone explain this?

Best,
Nico


TimeWindows with Elements:

Start: 1482332940000 - End: 1482332960000
timestamp=1482332952907

Start: 1482332960000 - End: 1482332980000
timestamp=1482332958929
timestamp=1482332963995
timestamp=1482332969027
timestamp=1482332974039

Start: 1482332980000 - End: 1482333000000
timestamp=1482332979059
timestamp=1482332984072
timestamp=1482332989081
timestamp=1482332994089

Start: 1482333000000 - End: 1482333020000
timestamp=1482332999113
timestamp=1482333004123
timestamp=1482333009132
timestamp=1482333014144
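For reference, event-time tumbling windows in Flink are aligned to the
epoch (with zero offset), so the expected window for any timestamp can be
computed directly. A small Java sketch using the second timestamp above:

public class WindowBoundaries {
    public static void main(String[] args) {
        long size  = 20000L;            // 20 s in milliseconds
        long ts    = 1482332958929L;    // second timestamp in the log above
        long start = ts - (ts % size);  // = 1482332940000
        long end   = start + size;      // = 1482332960000 (exclusive)
        System.out.println("[" + start + ", " + end + ") contains " + ts);
    }
}

That places 1482332958929 in the earlier window [1482332940000,
1482332960000), matching the observation that the first element printed
under each window falls just before that window's start; it may be worth
double-checking how the printed element/boundary pairs were produced.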


Re: static/dynamic lookups in flink streaming

2016-12-21 Thread Fabian Hueske
OK, I see. Yes, you can do that with Flink. It's actually a very common use
case.

You can store the names in operator state and Flink takes care of
checkpointing the state and restoring it in case of a failure.
In fact, the operator state is persisted in the state backends you
mentioned before.
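A minimal sketch of that pattern using the Checkpointed interface for
operator state (the initial contents of the map are placeholders for
however you load them):

import java.util.HashMap;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

// Minimal sketch: an id -> name map held as operator state, so it is
// checkpointed and restored automatically on failure.
public class IdToNameEnricher extends RichMapFunction<Long, String>
        implements Checkpointed<HashMap<Long, String>> {

    private HashMap<Long, String> names = new HashMap<>();

    @Override
    public void open(Configuration parameters) {
        if (names.isEmpty()) {          // empty unless restored from a checkpoint
            names.put(1L, "alice");     // placeholder initial load, e.g. from
            names.put(2L, "bob");       // a file or an external store
        }
    }

    @Override
    public String map(Long id) {
        String name = names.get(id);
        return name != null ? name : "unknown-" + id;
    }

    @Override
    public HashMap<Long, String> snapshotState(long checkpointId, long timestamp) {
        return names;                   // persisted with each checkpoint
    }

    @Override
    public void restoreState(HashMap<Long, String> state) {
        names = state;                  // called before open() on recovery
    }
}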

Best, Fabian

2016-12-21 15:02 GMT+01:00 Meghashyam Sandeep V :

> Hi Fabian,
>
> I meant look ups like IDs to names. For example if I have IDs coming
> through the stream and if I want to replace them with corresponding names
> stored in cache or somewhere within flink.
>
> Thanks,
> Sandeep
>
> On Dec 21, 2016 12:35 AM, "Fabian Hueske"  wrote:
>
>> Hi Sandeep,
>>
>> I'm sorry but I think I do not understand your question.
>> What do you mean by static or dynamic look ups? Do you want to access an
>> external data store and cache data?
>>
>> Can you give a bit more detail about your use case?
>>
>> Best, Fabian
>>
>> 2016-12-20 23:07 GMT+01:00 Meghashyam Sandeep V 
>> :
>>
>>> Hi there,
>>>
>>> I know that there are various state backends to persist state. Is there
>>> a similar way to persist static/dynamic look ups and use them while
>>> streaming the data in Flink?
>>>
>>> Thanks,
>>> Sandeep
>>>
>>>
>>


Re: static/dynamic lookups in flink streaming

2016-12-21 Thread Meghashyam Sandeep V
Hi Fabian,

I meant lookups like IDs to names. For example, if I have IDs coming
through the stream, I want to replace them with the corresponding names
stored in a cache or somewhere within Flink.

Thanks,
Sandeep

On Dec 21, 2016 12:35 AM, "Fabian Hueske"  wrote:

> Hi Sandeep,
>
> I'm sorry but I think I do not understand your question.
> What do you mean by static or dynamic look ups? Do you want to access an
> external data store and cache data?
>
> Can you give a bit more detail about your use case?
>
> Best, Fabian
>
> 2016-12-20 23:07 GMT+01:00 Meghashyam Sandeep V :
>
>> Hi there,
>>
>> I know that there are various state backends to persist state. Is there a
>> similar way to persist static/dynamic look ups and use them while streaming
>> the data in Flink?
>>
>> Thanks,
>> Sandeep
>>
>>
>


Re: Stateful Stream Processing with RocksDB causing Job failure

2016-12-21 Thread abiybirtukan
Thanks for the prompt response.

I am persisting checkpoints to the local file system. I have a three-node
cluster, with task managers running on two different hosts. At times the job
runs for days with no issue, but recently it started to fail when taking
snapshots.

So the RocksDB state backend should only be used in conjunction with HDFS or
S3 on a multi-node cluster?


Thanks,
Abiy 

> On Dec 21, 2016, at 4:25 AM, Fabian Hueske  wrote:
> 
> Copying my reply from the other thread with the same issue to have the 
> discussion in one place.
> 
> --
> 
> Hi Abiy,
> 
> to which type of filesystem are you persisting your checkpoints? 
> 
> We have seen problems with S3 and its consistency model. These issues have 
> been addressed in newer versions of Flink. 
> Not sure if the fix went into 1.1.3 already but release 1.1.4 is currently 
> voted on and has tons of other bug fixes as well.
> I would suggest to upgrade to 1.1.3 or even 1.1.4 once it is released (should 
> happen in a few days if no regression is found).
> 
> Best, Fabian
> 
> 2016-12-21 10:12 GMT+01:00 Ufuk Celebi :
>> Hey Abiy!
>> 
>> - Do all the task managers run on a single host? Only then using the
>> local file system will work.
>> 
>> - What does every now and then mean? Every time when the job tries to
>> take a snapshot? After restarts?
>> 
>> The JobManager logs will also help if we can't figure this out like this.
>> 
>> Best,
>> 
>> Ufuk
>> 
>> On Tue, Dec 20, 2016 at 6:05 PM, Abiy Legesse Hailemichael
>>  wrote:
>> > I am running a standalone flink cluster (1.1.2) and I have a stateful
>> > streaming job that uses RocksDB as a state manager. I have two stateful
>> > operators that are using ValueState<> and ListState<>. Every now and then 
>> > my
>> > job fails with the following exception
>> >
>> > java.lang.Exception: Could not restore checkpointed state to operators and
>> > functions
>> >   at
>> > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:552)
>> >   at
>> > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:250)
>> >   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>> >   at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.io.FileNotFoundException: File
>> > file:/data/flink/checkpoints/226c84df02e47d1b9c036ba894503145/StreamMap_12_5/dummy_state/chk-83
>> > does not exist
>> >   at
>> > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
>> >   at
>> > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
>> >   at
>> > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
>> >   at
>> > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>> >   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>> >   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>> >   at
>> > org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
>> >   at 
>> > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
>> >   at
>> > org.apache.flink.streaming.util.HDFSCopyToLocal$1.run(HDFSCopyToLocal.java:48)
>> >
>> >
>> > Can someone help me with this? Is this a known issue?
>> >
>> > Thanks
>> >
>> > Abiy Hailemichael
>> > Software Engineer
>> > Email: abiybirtu...@gmail.com
>> >
> 


Re: Stateful Stream Processing with RocksDB causing Job failure

2016-12-21 Thread Fabian Hueske
Copying my reply from the other thread with the same issue to have the
discussion in one place.

--

Hi Abiy,

to which type of filesystem are you persisting your checkpoints?

We have seen problems with S3 and its consistency model. These issues have
been addressed in newer versions of Flink.
Not sure if the fix went into 1.1.3 already, but release 1.1.4 is currently
being voted on and has tons of other bug fixes as well.
I would suggest upgrading to 1.1.3, or even 1.1.4 once it is released
(should happen in a few days if no regression is found).

Best, Fabian

2016-12-21 10:12 GMT+01:00 Ufuk Celebi :

> Hey Abiy!
>
> - Do all the task managers run on a single host? Only then using the
> local file system will work.
>
> - What does every now and then mean? Every time when the job tries to
> take a snapshot? After restarts?
>
> The JobManager logs will also help if we can't figure this out like this.
>
> Best,
>
> Ufuk
>
> On Tue, Dec 20, 2016 at 6:05 PM, Abiy Legesse Hailemichael
>  wrote:
> > I am running a standalone flink cluster (1.1.2) and I have a stateful
> > streaming job that uses RocksDB as a state manager. I have two stateful
> > operators that are using ValueState<> and ListState<>. Every now and
> then my
> > job fails with the following exception
> >
> > java.lang.Exception: Could not restore checkpointed state to operators and functions
> >   at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:552)
> >   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:250)
> >   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> >   at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.FileNotFoundException: File file:/data/flink/checkpoints/226c84df02e47d1b9c036ba894503145/StreamMap_12_5/dummy_state/chk-83 does not exist
> >   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
> >   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
> >   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
> >   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> >   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> >   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> >   at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
> >   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
> >   at org.apache.flink.streaming.util.HDFSCopyToLocal$1.run(HDFSCopyToLocal.java:48)
> >
> >
> > Can someone help me with this? Is this a known issue?
> >
> > Thanks
> >
> > Abiy Hailemichael
> > Software Engineer
> > Email: abiybirtu...@gmail.com
> >
>


Re: Stream Iterations

2016-12-21 Thread Ufuk Celebi
The delta iterations are a batch-only feature.

Did you try the DataStream#iterate?

https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/index.html#iterations
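A minimal sketch of a streaming iteration (the halving logic is just a
stand-in for a real feedback computation):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Minimal sketch: values above a threshold are fed back and halved until
// they pass; the rest continue downstream (e.g. to your sink).
public class IterationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> input = env.fromElements(1L, 17L, 42L);

        IterativeStream<Long> iteration = input.iterate();
        DataStream<Long> halved = iteration.map(new MapFunction<Long, Long>() {
            public Long map(Long v) { return v > 10 ? v / 2 : v; }
        });
        iteration.closeWith(halved.filter(new FilterFunction<Long>() {
            public boolean filter(Long v) { return v > 10; }   // feed back
        }));
        halved.filter(new FilterFunction<Long>() {
            public boolean filter(Long v) { return v <= 10; }  // move on
        }).print();

        env.execute("Iteration sketch");
    }
}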

On Mon, Dec 19, 2016 at 5:35 AM, Govindarajan Srinivasaraghavan wrote:
> Hi All,
>
> I have a use case for which I need some suggestions. It's a streaming
> application with a Kafka source, followed by groupBy, keyBy, and some
> calculations. The output of each calculation has to be a side input for the
> next calculation, and it also needs to be sent to a sink.
>
> Right now I'm achieving this by storing the result state in memory and also
> saving it in a Redis cache. I was looking at delta iterations in the Flink
> documentation. It would be great if someone could help me understand whether
> I can achieve this using iterations or any other API. Thanks in advance.
>
> Regards,
> Govind


Re: Stateful Stream Processing with RocksDB causing Job failure

2016-12-21 Thread Ufuk Celebi
Hey Abiy!

- Do all the task managers run on a single host? Only then will using the
local file system work.

- What does "every now and then" mean? Every time the job tries to take a
snapshot? After restarts?

The JobManager logs will also help if we can't figure it out otherwise.

Best,

Ufuk

On Tue, Dec 20, 2016 at 6:05 PM, Abiy Legesse Hailemichael wrote:
> I am running a standalone flink cluster (1.1.2) and I have a stateful
> streaming job that uses RocksDB as a state manager. I have two stateful
> operators that are using ValueState<> and ListState<>. Every now and then my
> job fails with the following exception
>
> java.lang.Exception: Could not restore checkpointed state to operators and
> functions
>   at
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:552)
>   at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:250)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: File
> file:/data/flink/checkpoints/226c84df02e47d1b9c036ba894503145/StreamMap_12_5/dummy_state/chk-83
> does not exist
>   at
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
>   at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
>   at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
>   at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>   at
> org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
>   at
> org.apache.flink.streaming.util.HDFSCopyToLocal$1.run(HDFSCopyToLocal.java:48)
>
>
> Can someone help me with this? Is this a known issue?
>
> Thanks
>
> Abiy Hailemichael
> Software Engineer
> Email: abiybirtu...@gmail.com
>


Re: CodeAnalysisMode in Flink

2016-12-21 Thread Fabian Hueske
Hi Vinay,

I had a look into the code. The code analysis is only performed for DataSet
(batch) programs and not for DataStream (streaming) programs.
If your program is a DataStream program, this would explain why no
information is shown.

The DataStream API also does not leverage semantic function annotations
(yet), so analyzing and annotating functions that you use in a streaming
program does not give any benefit at the moment.
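For reference, a semantic annotation on a DataSet function looks like this
(the function itself is just an illustrative example):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
import org.apache.flink.api.java.tuple.Tuple2;

// The annotation declares that field 0 passes through unchanged, which lets
// the optimizer preserve existing partitioning and sort properties.
@ForwardedFields("f0")
public class ScaleSecondField implements MapFunction<Tuple2<Long, Double>, Tuple2<Long, Double>> {
    @Override
    public Tuple2<Long, Double> map(Tuple2<Long, Double> in) {
        return new Tuple2<>(in.f0, in.f1 * 2.0);   // f0 untouched, f1 rewritten
    }
}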

In case you are analyzing a batch DataSet program, we would need to dig a
bit deeper to identify the problem.

Best, Fabian

2016-12-20 23:21 GMT+01:00 vinay patil :

> Hi,
>
> Any updates on this thread ?
>
> Regards,
> Vinay Patil
>
> On Fri, Nov 18, 2016 at 10:25 PM, Fabian Hueske-2 [via Apache Flink User
> Mailing List archive.] wrote:
>
>> Hi Vinay,
>>
>> not sure why it's not working, but maybe Timo (in CC) can help.
>>
>> Best, Fabian
>>
>> 2016-11-18 17:41 GMT+01:00 Vinay Patil:
>>
>>> Hi,
>>>
>>> According to the JavaDoc, if I use the method below,
>>> *env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.HINT);*
>>>
>>> it will print the program improvements to the log. However, these details
>>> are not getting printed; I have kept the log4j properties in the resources
>>> folder and set the log level to ALL.
>>>
>>>  I want to analyse my pipeline to get rid of common mistakes.
>>>
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>


Re:

2016-12-21 Thread Fabian Hueske
Hi Abiy,

to which type of filesystem are you persisting your checkpoints?

We have seen problems with S3 and its consistency model. These issues have
been addressed in newer versions of Flink.
Not sure if the fix went into 1.1.3 already, but release 1.1.4 is currently
being voted on and has tons of other bug fixes as well.
I would suggest upgrading to 1.1.3, or even 1.1.4 once it is released
(should happen in a few days if no regression is found).

Best, Fabian

2016-12-20 21:19 GMT+01:00 Abiy Legesse Hailemichael :

> I am running a standalone flink cluster (1.1.2) and I have a stateful
> streaming job that uses RocksDB as a state manager. I have two stateful
> operators that are using ValueState<> and ListState<>. Every now and then
> my job fails with the following exception
>
> Caused by: AsynchronousException{java.io.FileNotFoundException: File file:/data/flink/checkpoints/471ef8996921bb9c29434abf35490a26/StreamMap_12_0/dummy_state/chk-4 does not exist}
> at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointThread.run(StreamTask.java:870)
> Caused by: java.io.FileNotFoundException: File file:/data/flink/checkpoints/471ef8996921bb9c29434abf35490a26/StreamMap_12_0/dummy_state/chk-4 does not exist
> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1467)
> at org.apache.flink.contrib.streaming.state.RocksDBStateBackend$FinalSemiAsyncSnapshot.getStateSize(RocksDBStateBackend.java:688)
> at org.apache.flink.streaming.runtime.tasks.StreamTaskStateList.getStateSize(StreamTaskStateList.java:89)
> at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointThread.run(StreamTask.java:860)
>
>
>
> Abiy Hailemichael
> Software Engineer
> Phone: (202) 355-8933
> Email: abiybirtu...@gmail.com 
>
>


Re: static/dynamic lookups in flink streaming

2016-12-21 Thread Fabian Hueske
Hi Sandeep,

I'm sorry but I think I do not understand your question.
What do you mean by static or dynamic look ups? Do you want to access an
external data store and cache data?

Can you give a bit more detail about your use case?

Best, Fabian

2016-12-20 23:07 GMT+01:00 Meghashyam Sandeep V :

> Hi there,
>
> I know that there are various state backends to persist state. Is there a
> similar way to persist static/dynamic look ups and use them while streaming
> the data in Flink?
>
> Thanks,
> Sandeep
>
>