Re: Questions about the V-C Iteration in Gelly

2017-02-14 Thread Xingcan Cui
Hi Vasia,

sorry, I should have read the archive first (it has already been posted
in FLINK-1526, though with an ugly format). Now everything's clear and I
think this thread can be closed here.

Thanks. @Vasia @Greg

Best,
Xingcan

On Tue, Feb 14, 2017 at 3:55 PM, Vasiliki Kalavri  wrote:

> Hi Xingcan,
>
> that's my bad, I was thinking of scatter-gather iterations in my previous
> reply. You're right, in VertexCentricIteration a vertex is only active in
> the next superstep if it has received at least one message in the current
> superstep. Updating its value does not impact the activation. This is
> intentional in the vertex-centric model.
>
> I agree that the current design of the iterative models is restrictive and
> doesn't allow for the expression of complex iterative algorithms that
> require updating edges or defining phases. We have discussed this before,
> e.g. in [1]. The outcome of that discussion was that we should use for-loop
> iterations for such cases, as the closed-loop iteration operators of Flink
> might not provide the necessary flexibility. As you will see in the thread
> though, that proposal didn't work out, as efficiently supporting for-loops
> in Flink is not possible right now.
>
> -Vasia.
>
> [1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Gelly-iteration-abstractions-td3949.html
>
> On 14 February 2017 at 08:10, Xingcan Cui  wrote:
>
>> Hi Greg,
>>
>> I also found that in VertexCentricIteration.java, the message set is
>> taken as the workset while the vertex set is taken as the delta for the
>> solution set. As a consequence, the setNewVertexValue method will not actually
>> activate a vertex. In other words, if no message is generated (the workset is
>> empty), the "pact.runtime.workset-empty-aggregator" will decide that
>> the delta iteration has converged and the iteration just terminates.
>> Is this a bug?
>>
>> Best,
>> Xingcan
>>
>>
>> On Mon, Feb 13, 2017 at 5:24 PM, Xingcan Cui  wrote:
>>
>>> Hi Greg,
>>>
>>> Thanks for your attention.
>>>
>>> It took me a little time to read the old PR on FLINK-1885. Though
>>> the VertexCentricIteration, as well as its related classes, has been
>>> refactored, I understand what Markus wants to achieve.
>>>
>>> I am not sure if using a bulk iteration instead of a delta one could
>>> eliminate the "out of memory" problem. Apart from that, I think the "auto
>>> update" has nothing to do with the bulk mode. Considering the compatibility
>>> guarantee, here are my suggestions to improve Gelly's iteration API:
>>>
>>> 1) Add an "autoHalt" flag to the ComputeFunction.
>>>
>>> 2) If the flag is set true (default), apply the current mechanism.
>>>
>>> 3) If the flag is set false, call out.collect() to update the vertex
>>> value whether the setNewVertexValue() method is called or not, unless the
>>> user explicitly calls a (newly added) voteToHalt() method in the
>>> ComputeFunction.
>>>
>>> By adding these, users can decide when to halt a vertex themselves. What
>>> do you think?
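>>>
>>> As a rough sketch of what I have in mind (setAutoHalt() and voteToHalt() are
>>> hypothetical here; they do not exist in Gelly today, the rest follows the
>>> current ComputeFunction API), a compute function could then look like:
>>>
>>> import org.apache.flink.graph.Vertex
>>> import org.apache.flink.graph.pregel.{ComputeFunction, MessageIterator}
>>>
>>> class MinPropagationCompute extends ComputeFunction[Long, Double, Double, Double] {
>>>
>>>   override def compute(vertex: Vertex[Long, Double],
>>>                        messages: MessageIterator[Double]): Unit = {
>>>     // setAutoHalt(false)  // proposed: keep the vertex active even without messages
>>>     var min = Double.MaxValue
>>>     while (messages.hasNext) {
>>>       min = math.min(min, messages.next())
>>>     }
>>>     if (min < vertex.getValue) {
>>>       setNewVertexValue(min)
>>>       sendMessageToAllNeighbors(min)
>>>     } else {
>>>       // voteToHalt()  // proposed: explicitly deactivate this vertex
>>>     }
>>>   }
>>> }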
>>>
>>> As for the "update edge values during vertex iterations" problem, I
>>> think it needs a redesign of the Gelly framework (maybe merge the vertices
>>> and edges into a single data set? Or just change the iterations'
>>> implementation? I can't think it through clearly yet), so that's it for now.
>>> Besides, I don't think many people would really want to write a graph
>>> algorithm with native Flink operators; that's why Gelly was designed,
>>> isn't it?
>>>
>>> Best,
>>> Xingcan
>>>
>>> On Fri, Feb 10, 2017 at 10:31 PM, Greg Hogan  wrote:
>>>
 Hi Xingcan,

 FLINK-1885 looked into adding a bulk mode to Gelly's iterative models.

 As an alternative you could implement your algorithm with Flink
 operators and a bulk iteration. Most of the Gelly library is written with
 native operators.

 Greg

 On Fri, Feb 10, 2017 at 5:02 AM, Xingcan Cui 
 wrote:

> Hi Vasia,
>
> b) As I said, when some vertices have finished their work in the current
> phase, they have nothing to do (no value updates, no messages received, as if
> asleep) but wait for the other vertices that have not finished the current
> phase yet. After that, in the next phase, all the vertices should go back
> to work again, and if some vertices became inactive in the last phase,
> it could be hard to reactivate them by message, since we don't even know
> which vertices to send to. The only solution is to keep all vertices
> active, whether by updating vertex values in each superstep or by sending
> heartbeat messages to the vertices themselves (which would bring a lot of extra
> work to the MessageCombiner).
>
> c) I know it's not elegant, and maybe even an awful idea, to store the edge
> info in the vertex values. However, we cannot change edge values or maintain
> state (even a pick or unpick mark) in edges during a vertex-centric
> iteration.

Re: A way to control redistribution of operator state?

2017-02-14 Thread Stefan Richter
Hi,

there is something that we call "raw keyed" operator state, which might exactly 
serve your purpose. It is already used internally by Flink's window operator, 
but there currently exists no public API for this feature. The way it works 
currently is that you obtain input and output streams that are aware of 
key-groups being written or read, but the API needs to consider the fact that 
each key-group must be written only once and completed before the next key-group 
can start. This is a bit tricky to expose for inheritance hierarchies. My guess 
is that you can expect this in the next version of Flink.

Best,
Stefan

> On 14.02.2017 at 08:31, Tzu-Li (Gordon) Tai wrote:
> 
> Hi Dmitry,
> 
> Technically, from the looks of the internal code around 
> `OperatorStateRepartitioner`, I think it could certainly be made 
> pluggable.
> Right now it is just hard-coded to use a round-robin repartitioner 
> implementation as the default.
> 
> However, I'm not sure of the plans for exposing this to the user and making it 
> configurable.
> Looping in Stefan (in cc), who mostly worked on this part, to see if he can 
> provide more info.
> 
> - Gordon
> 
> On February 14, 2017 at 2:30:27 AM, Dmitry Golubets (dgolub...@gmail.com 
> ) wrote:
> 
>> Hi,
>> 
>> It looks impossible to implement a keyed state with operator state now.
>> 
>> I know it sounds like "just use a keyed state", but the latter requires updating 
>> it on every value change, as opposed to operator state, and thus can be 
>> expensive (especially if you have to deal with mutable structures inside 
>> which have to be serialized).
>> 
>> The problem is that there is no way to tell Flink how to reassign savepoint 
>> parts between partitions, and thus it is impossible to route data to the correct 
>> partitions.
>> 
>> Is there anything I missed, or maybe a plan to implement it in the future?
>> 
>> Best regards,
>> Dmitry



Re: #kafka/#ssl: how to enable ssl authentication for a new kafka consumer?

2017-02-14 Thread Tzu-Li (Gordon) Tai
Hi Alex,

Kafka authentication and data transfer encryption using SSL can simply be
done by configuring the brokers and the connecting client.

You can take a look at this:
https://kafka.apache.org/documentation/#security_ssl.

The Kafka client that the Flink connector uses can be configured through the
`Properties` configuration provided when instantiating `FlinkKafkaConsumer`.
You just need to set values for these config properties:
https://kafka.apache.org/documentation/#security_configclients.

Note that SSL truststore / keystore locations must exist on all of your
Flink TMs for this to work.
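
For example, a minimal sketch with the Scala API (the broker address, topic,
group id, file paths and passwords below are placeholders you need to adapt):

import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val props = new Properties()
props.setProperty("bootstrap.servers", "broker1:9093")  // the brokers' SSL port
props.setProperty("group.id", "my-consumer-group")
// Kafka client SSL settings (see the Kafka security docs linked above)
props.setProperty("security.protocol", "SSL")
props.setProperty("ssl.truststore.location", "/path/to/kafka.client.truststore.jks")
props.setProperty("ssl.truststore.password", "changeit")
// only needed if the brokers require client authentication:
props.setProperty("ssl.keystore.location", "/path/to/kafka.client.keystore.jks")
props.setProperty("ssl.keystore.password", "changeit")
props.setProperty("ssl.key.password", "changeit")

val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream = env.addSource(
  new FlinkKafkaConsumer09[String]("my-topic", new SimpleStringSchema(), props))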

Hope this helps!

- Gordon





Re: Support for daylight saving timezone changes in Flink

2017-02-14 Thread Till Rohrmann
Hi Swapnil,

Flink does not have explicit support for this. It's the responsibility of
the user to make sure that the right watermarks are extracted from the
incoming events. This also includes handling different time zones
correctly.
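
For example, one way to make this robust is to convert each event's local
wall-clock time to epoch milliseconds (which are timezone-free) in the
timestamp assigner. A sketch, assuming a hypothetical event type that carries
its local time as a string and a known source timezone:

import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

case class MyEvent(localTime: String, payload: String)  // hypothetical

class LocalTimeExtractor extends BoundedOutOfOrdernessTimestampExtractor[MyEvent](Time.seconds(10)) {
  // DateTimeFormatter is not Serializable, hence transient + lazy
  @transient private lazy val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  @transient private lazy val zone = ZoneId.of("Europe/Berlin")

  override def extractTimestamp(event: MyEvent): Long =
    LocalDateTime.parse(event.localTime, fmt)
      .atZone(zone)        // the zone rules resolve DST gaps/overlaps here, once
      .toInstant
      .toEpochMilli
}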

Cheers,
Till

On Tue, Feb 14, 2017 at 8:41 AM, Swapnil Chougule 
wrote:

> I want to know the behavior of Flink streaming systems during daylight
> saving changes in multiple timezones, as streaming systems may run in such
> timezones.
> Is any built-in support needed? Can anybody answer?
> Thanks in advance
>
> --Swapnil
>


Re: A way to control redistribution of operator state?

2017-02-14 Thread Dmitry Golubets
Hi,

I was playing with it more today and I think I've found a workaround.

So what I do:
1. I define a constant number N of logical groups
2. I use consistent hash mapping of data keys to these groups
3. I map these groups to partitions using an even distribution (the same way
Flink distributes state)
4. In a stateful function I'm able to calculate which groups are assigned to
that partition and produce the right number of states for each group
(empty states too)
5. I do manual partitioning before that stateful function using the same
group calculations
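
In code, the group bookkeeping looks roughly like this (a simplified sketch: it
assumes Flink hands out list-state entries round-robin and in list order, which
is exactly what I ask below):

object GroupMapping {
  val NumGroups = 128  // the constant N; must not change across savepoints

  // 2. consistent mapping of a data key to one of the N logical groups
  def groupOf(key: String): Int =
    (key.hashCode % NumGroups + NumGroups) % NumGroups

  // 3./4. groups owned by a given subtask under a round-robin distribution
  def groupsOfSubtask(subtask: Int, parallelism: Int): Seq[Int] =
    (0 until NumGroups).filter(g => g % parallelism == subtask)

  // 5. custom partitioner used before the stateful function, so records of a
  // group land on the subtask that also received that group's state entry
  def partitionOf(key: String, parallelism: Int): Int =
    groupOf(key) % parallelism
}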

So far it looks like scaling up and down results in correct behavior.
Can I rely on Flink distributing state evenly and in the order I return it
in the list?

Best regards,
Dmitry

On Tue, Feb 14, 2017 at 9:33 AM, Stefan Richter  wrote:

> Hi,
>
> there is something that we call "raw keyed" operator state, which might
> exactly serve your purpose. It is already used internally by Flink's window
> operator, but there currently exists no public API for this feature. The way it
> works currently is that you obtain input and output streams that are aware
> of key-groups being written or read, but the API needs to consider the fact
> that each key-group must be written only once and completed before the next
> key-group can start. This is a bit tricky to expose for inheritance
> hierarchies. My guess is that you can expect this in the next version of
> Flink.
>
> Best,
> Stefan
>
> On 14.02.2017 at 08:31, Tzu-Li (Gordon) Tai wrote:
>
> Hi Dmitry,
>
> Technically, from the looks of the internal code around
> `OperatorStateRepartitioner`, I think it could certainly be made
> pluggable.
> Right now it is just hard-coded to use a round-robin repartitioner
> implementation as the default.
>
> However, I'm not sure of the plans for exposing this to the user and making
> it configurable.
> Looping in Stefan (in cc), who mostly worked on this part, to see if he can
> provide more info.
>
> - Gordon
>
> On February 14, 2017 at 2:30:27 AM, Dmitry Golubets (dgolub...@gmail.com)
> wrote:
>
> Hi,
>
> It looks impossible to implement a keyed state with operator state now.
>
> I know it sounds like "just use a keyed state", but the latter requires
> updating it on every value change, as opposed to operator state, and thus can
> be expensive (especially if you have to deal with mutable structures inside
> which have to be serialized).
>
> The problem is that there is no way to tell Flink how to reassign
> savepoint parts between partitions, and thus it is impossible to route data to
> the correct partitions.
>
> Is there anything I missed, or maybe a plan to implement it in the future?
>
> Best regards,
> Dmitry
>
>
>


Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Sujit Sakre
Hi,

I have been using Flink 1.1.1 with Kafka 0.9 to process real time streams.
We have written our own Window Function and are processing data with
Sliding Windows. We are using Event Time and use a custom watermark
generator.

We select a particular window out of multiple sliding windows and process
all items in the window in an iterative loop to increment counts of the
items selected. After this we call the sink method to log the result in a
database.

This is working fine most of the time, i.e. it produces the expected
result most of the time. However, there are situations when certain windows
are not processed (under the same test conditions), which means the results are
less than expected, i.e. the counts are less than expected. Sometimes, instead
of whole windows not getting processed, certain items do not get processed
within a window. This is unpredictable.

I wish to know what could be the cause of this inconsistent behaviour and how
we can resolve it. We have now integrated Flink 1.2 and Kafka 0.10, but the
problem persists.

Please could you suggest what the problem could be and how to resolve
it.

Many thanks.


*Sujit Sakre*



Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
Hi to all,
I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say that
they can't find *apacheds-jdbm1*, which has packaging type bundle. Should I
install some plugin?

Best,
Flavio


Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
Hi Sujit,
this does indeed sound strange and we are not aware of any data loss issues.
Are there any exceptions or other errors in the job/taskmanager logs?
Do you have a minimal working example? Is it that whole windows are not 
processed or just single items inside a window?


Nico

On Tuesday, 14 February 2017 16:57:31 CET Sujit Sakre wrote:
> Hi,
> 
> I have been using Flink 1.1.1 with Kafka 0.9 to process real time streams.
> We have written our own Window Function and are processing data with
> Sliding Windows. We are using Event Time and use a custom watermark
> generator.
> 
> We select a particular window out of multiple sliding windows and process
> all items in the window in an iterative loop to increment counts of the
> items selected. After this we call the sink method to log the result in a
> database.
> 
> This is working fine most of the time, i.e. it produces the expected
> result most of the time. However, there are situations when certain windows
> are not processed (under the same test conditions), which means the results are
> less than expected, i.e. the counts are less than expected. Sometimes, instead
> of whole windows not getting processed, certain items do not get processed
> within a window. This is unpredictable.
> 
> I wish to know what could be the cause of this inconsistent behaviour and how
> we can resolve it. We have now integrated Flink 1.2 and Kafka 0.10, but the
> problem persists.
> 
> Please could you suggest what the problem could be and how to resolve
> it.
> 
> Many thanks.
> 
> 
> *Sujit Sakre*





Re: JavaDoc 404

2017-02-14 Thread Robert Metzger
HI Yassine,

I've now fixed the javadocs build. It has already been rebuilt for the 1.2
javadocs:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html
The build for master should be done in 30 minutes.


On Wed, Feb 8, 2017 at 10:49 AM, Yassine MARZOUGUI <
y.marzou...@mindlytix.com> wrote:

> Thanks Robert and Ufuk for the update.
>
> 2017-02-07 18:43 GMT+01:00 Robert Metzger :
>
>> I've filed a JIRA for the issue:
>> https://issues.apache.org/jira/browse/FLINK-5736
>>
>> On Tue, Feb 7, 2017 at 5:00 PM, Robert Metzger 
>> wrote:
>>
>>> Yes, I'll try to fix it asap. Sorry for the inconvenience.
>>>
>>> On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi  wrote:
>>>
 Thanks for reporting this. I think Robert (cc'd) is working on fixing
 this, correct?

 On Sat, Feb 4, 2017 at 12:12 PM, Yassine MARZOUGUI
  wrote:
 > Hi,
 >
> The JavaDoc link of BucketingSink on this page[1] yields a 404 error. I
> couldn't find the correct URL.
> The broken link:
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html
 >
> Other pages in the JavaDoc, like this one[2], seem to lack formatting,
> because
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/stylesheet.css
> and
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/script.js
> are not found (404).
 >
 > Best,
 > Yassine
 >
 >
> [1] :
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html
> [2] :
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/RichSinkFunction.html

>>>
>>>
>>
>


Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
You do not require a plugin, but most probably this dependency was not fetched 
by Eclipse. Please try a "mvn clean package" in your project and see whether 
this helps Eclipse.

Also, you may create a clean test project with

mvn archetype:generate   \
  -DarchetypeGroupId=org.apache.flink  \
  -DarchetypeArtifactId=flink-quickstart-java  \
  -DarchetypeVersion=1.2.0

for which I could not find any dependency issues using Eclipse.

Regards,
Nico

On Tuesday, 14 February 2017 14:17:10 CET Flavio Pompermaier wrote:
> Hi to all,
> I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say that
> they can't find *apacheds-jdbm1*, which has packaging type bundle. Should I
> install some plugin?
> 
> Best,
> Flavio





Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Sujit Sakre
Hi Nico,

Thanks for the reply.

There are no exceptions or other errors in the job/task manager logs. I am
running this example from the Eclipse IDE with Kafka and Zookeeper running
separately; no errors are shown in the console while processing.
Previously, we were missing some windows due to the watermark exceeding the
upcoming data timestamps; however, that is not the case anymore.

There is a working example. I will share the code and data with you
separately on your email ID.

The results of processing are of three types:
1) Complete set of results is obtained without any missing calculations
2) A few windows (from 1 or 2 to several) are missing uniformly from the
calculations
3) In a particular window, only selective data is missing, whereas other
data is processed accurately

These results are for the same set of inputs under the same processing steps.

This is not predictable, which makes identifying the error difficult: sometimes
it works (results of pattern #1) and sometimes we get results of pattern #2 or
#3 (#3 occurs less frequently, but it does take place).




*Sujit Sakre*


On 14 February 2017 at 18:55, Nico Kruber  wrote:

> Hi Sujit,
> this does indeed sound strange and we are not aware of any data loss
> issues.
> Are there any exceptions or other errors in the job/taskmanager logs?
> Do you have a minimal working example? Is it that whole windows are not
> processed or just single items inside a window?
>
>
> Nico
>
> On Tuesday, 14 February 2017 16:57:31 CET Sujit Sakre wrote:
> > Hi,
> >
> > I have been using Flink 1.1.1 with Kafka 0.9 to process real time
> > streams.
> > We have written our own Window Function and are processing data with
> > Sliding Windows. We are using Event Time and use a custom watermark
> > generator.
> >
> > We select a particular window out of multiple sliding windows and process
> > all items in the window in an iterative loop to increment counts of the
> > items selected. After this we call the sink method to log the result in a
> > database.
> >
> > This is working fine most of the time, i.e. it produces the expected
> > result most of the time. However, there are situations when certain
> > windows are not processed (under the same test conditions), which means
> > the results are less than expected, i.e. the counts are less than
> > expected. Sometimes, instead of whole windows not getting processed,
> > certain items do not get processed within a window. This is
> > unpredictable.
> >
> > I wish to know what could be the cause of this inconsistent behaviour
> > and how we can resolve it. We have now integrated Flink 1.2 and Kafka
> > 0.10, but the problem persists.
> >
> > Please could you suggest what the problem could be and how to resolve
> > it.
> >
> > Many thanks.
> >
> >
> > *Sujit Sakre*
>
>



Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
Hmm, without any exceptions in the logs, I'd say that you may be on the right 
track with elements arriving with timestamps older than the last watermark.

You may play around with allowed lateness
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#allowed-lateness
to see if this is actually the case.
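
For example, a minimal sketch of what that looks like (the event type, key,
window sizes and the 10s bound are placeholders to experiment with):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(key: String, count: Long)  // hypothetical event type

def withLateness(stream: DataStream[Event]): DataStream[Event] =
  stream
    .keyBy(_.key)
    .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(30)))
    .allowedLateness(Time.seconds(10))  // late elements within 10s still update the window
    .reduce((a, b) => a.copy(count = a.count + b.count))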


Nico

On Tuesday, 14 February 2017 19:39:46 CET Sujit Sakre wrote:
> Hi Nico,
> 
> Thanks for the reply.
> 
> There are no exceptions or other errors in the job/task manager logs. I am
> running this example from Eclipse IDE with Kafka and Zookeeper running
> separately; in the console there are no errors shown while processing.
> Previously, we were missing some windows due to watermark exceeding the
> upcoming data timestamps, however, that is not the case anymore.
> 
> There is a working example. I will share the code and data with you
> separately on your email ID.
> 
> The results of processing are of three types:
> 1) Complete set of results is obtained without any missing calculations
> 2) A few windows (from 1 or 2 to several) are missing uniformly from
> the calculations
> 3) In a particular window, only selective data is missing, whereas other
> data is processed accurately
> 
> These results are for the same set of inputs under the same processing steps.
> 
> This is not predictable, which makes identifying the error difficult: sometimes
> it works (results of pattern #1) and sometimes we get results of pattern #2 or
> #3 (#3 occurs less frequently, but it does take place).
> 
> 
> 
> 
> *Sujit Sakre*
> 
> 
> On 14 February 2017 at 18:55, Nico Kruber  wrote:
> > Hi Sujit,
> > this does indeed sound strange and we are not aware of any data loss
> > issues.
> > Are there any exceptions or other errors in the job/taskmanager logs?
> > Do you have a minimal working example? Is it that whole windows are not
> > processed or just single items inside a window?
> > 
> > 
> > Nico
> > 
> > On Tuesday, 14 February 2017 16:57:31 CET Sujit Sakre wrote:
> > > Hi,
> > > 
> > > I have been using Flink 1.1.1 with Kafka 0.9 to process real time
> > 
> > streams.
> > 
> > > We have written our own Window Function and are processing data with
> > > Sliding Windows. We are using Event Time and use a custom watermark
> > > generator.
> > > 
> > > We select a particular window out of multiple sliding windows and
> > > process
> > > all items in the window in an iterative loop to increment counts of the
> > > items selected. After this we call the sink method to log the result in
> > > a
> > > database.
> > > 
> > > This is working fine most of the time, i.e. it produces the expected
> > > result most of the time. However, there are situations when certain
> > > windows are not processed (under the same test conditions), which means
> > > the results are less than expected, i.e. the counts are less than
> > > expected. Sometimes, instead of whole windows not getting processed,
> > > certain items do not get processed within a window. This is
> > > unpredictable.
> > > 
> > > I wish to know what could be the cause of this inconsistent behaviour
> > > and how we can resolve it. We have now integrated Flink 1.2 and Kafka
> > > 0.10, but the problem persists.
> > >
> > > Please could you suggest what the problem could be and how to resolve
> > > it.
> > > 
> > > Many thanks.
> > > 
> > > 
> > > *Sujit Sakre*





Unable to use Scala's BeanProperty with classes

2017-02-14 Thread Adarsh Jain
I am getting the same problem when trying to do a FlatMap operation on my
POJO class.

Exception in thread "main" java.lang.IllegalStateException: Detected
more than one setter



I am using Flink 1.2; the exception comes when using FlatMap.

https://issues.apache.org/jira/browse/FLINK-5070
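
A stripped-down class of the shape that triggers it (field names are
hypothetical; I assume it is the same cause as reported in FLINK-5070, where
scalac emits both the JavaBean setter setField and the plain Scala setter
field_$eq, so Flink's TypeExtractor sees more than one setter):

import scala.beans.BeanProperty

class MyPojo {                         // hypothetical POJO-style class
  @BeanProperty var field: String = _  // generates setField(...) next to field_=(...)
}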


Re: Questions about the V-C Iteration in Gelly

2017-02-14 Thread Vasiliki Kalavri
Dear Xingcan,

no need to apologize, we are here to help :) You are always welcome to ask
questions / make suggestions.

Cheers,
-Vasia.

On 14 February 2017 at 09:35, Xingcan Cui  wrote:

> Hi Vasia,
>
> sorry, I should have read the archive first (it has already been posted
> in FLINK-1526, though with an ugly format). Now everything's clear and I
> think this thread can be closed here.
>
> Thanks. @Vasia @Greg
>
> Best,
> Xingcan
>
> On Tue, Feb 14, 2017 at 3:55 PM, Vasiliki Kalavri <
> vasilikikala...@gmail.com> wrote:
>
>> Hi Xingcan,
>>
>> that's my bad, I was thinking of scatter-gather iterations in my previous
>> reply. You're right, in VertexCentricIteration a vertex is only active in
>> the next superstep if it has received at least one message in the current
>> superstep. Updating its value does not impact the activation. This is
>> intentional in the vertex-centric model.
>>
>> I agree that the current design of the iterative models is restrictive
>> and doesn't allow for the expression of complex iterative algorithms that
>> require updating edges or defining phases. We have discussed this before,
>> e.g. in [1]. The outcome of that discussion was that we should use for-loop
>> iterations for such cases, as the closed-loop iteration operators of Flink
>> might not provide the necessary flexibility. As you will see in the thread
>> though, that proposal didn't work out, as efficiently supporting for-loops
>> in Flink is not possible right now.
>>
>> -Vasia.
>>
>> [1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Gelly-iteration-abstractions-td3949.html
>>
>> On 14 February 2017 at 08:10, Xingcan Cui  wrote:
>>
>>> Hi Greg,
>>>
>>> I also found that in VertexCentricIteration.java, the message set is
>>> taken as the workset while the vertex set is taken as the delta for the
>>> solution set. As a consequence, the setNewVertexValue method will not actually
>>> activate a vertex. In other words, if no message is generated (the workset is
>>> empty), the "pact.runtime.workset-empty-aggregator" will decide that
>>> the delta iteration has converged and the iteration just terminates.
>>> Is this a bug?
>>>
>>> Best,
>>> Xingcan
>>>
>>>
>>> On Mon, Feb 13, 2017 at 5:24 PM, Xingcan Cui  wrote:
>>>
 Hi Greg,

 Thanks for your attention.

 It took me a little time to read the old PR on FLINK-1885. Though
 the VertexCentricIteration, as well as its related classes, has been
 refactored, I understand what Markus wants to achieve.

 I am not sure if using a bulk iteration instead of a delta one could
 eliminate the "out of memory" problem. Apart from that, I think the "auto
 update" has nothing to do with the bulk mode. Considering the compatibility
 guarantee, here are my suggestions to improve Gelly's iteration API:

 1) Add an "autoHalt" flag to the ComputeFunction.

 2) If the flag is set true (default), apply the current mechanism.

 3) If the flag is set false, call out.collect() to update the vertex
 value whether the setNewVertexValue() method is called or not, unless the
 user explicitly calls a (newly added) voteToHalt() method in the
 ComputeFunction.

 By adding these, users can decide when to halt a vertex themselves.
 What do you think?

 As for the "update edge values during vertex iterations" problem, I
 think it needs a redesign of the Gelly framework (maybe merge the vertices
 and edges into a single data set? Or just change the iterations'
 implementation? I can't think it through clearly yet), so that's it for now.
 Besides, I don't think many people would really want to write a graph
 algorithm with native Flink operators; that's why Gelly was designed,
 isn't it?

 Best,
 Xingcan

 On Fri, Feb 10, 2017 at 10:31 PM, Greg Hogan 
 wrote:

> Hi Xingcan,
>
> FLINK-1885 looked into adding a bulk mode to Gelly's iterative models.
>
> As an alternative you could implement your algorithm with Flink
> operators and a bulk iteration. Most of the Gelly library is written with
> native operators.
>
> Greg
>
> On Fri, Feb 10, 2017 at 5:02 AM, Xingcan Cui 
> wrote:
>
>> Hi Vasia,
>>
>> b) As I said, when some vertices have finished their work in the current
>> phase, they have nothing to do (no value updates, no messages received, as if
>> asleep) but wait for the other vertices that have not finished the
>> current phase yet. After that, in the next phase, all the vertices should
>> go back to work again, and if some vertices became inactive in the last
>> phase, it could be hard to reactivate them by message, since we don't
>> even know which vertices to send to. The only solution is to keep all
>> vertices active, whether by updating vertex values in each superstep or
>> sending heartbeat me

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
Hi Nico,
thanks for the response. The problem is that I don't use the quickstart
example.
I have a working set of jobs (in Flink 1.1.4) with some unit tests.
In the unit tests I use the following dependency that causes the problem:

   
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-test-utils_2.10</artifactId>
    <version>1.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>


Best,
Flavio

On Tue, Feb 14, 2017 at 2:51 PM, Nico Kruber  wrote:

> You do not require a plugin, but most probably this dependency was not
> fetched
> by Eclipse. Please try a "mvn clean package" in your project and see
> whether
> this helps Eclipse.
>
> Also, you may create a clean test project with
>
> mvn archetype:generate   \
>   -DarchetypeGroupId=org.apache.flink  \
>   -DarchetypeArtifactId=flink-quickstart-java  \
>   -DarchetypeVersion=1.2.0
>
> for which I could not find any dependency issues using Eclipse.
>
> Regards,
> Nico
>
> On Tuesday, 14 February 2017 14:17:10 CET Flavio Pompermaier wrote:
> > Hi to all,
> > I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say
> > that they can't find *apacheds-jdbm1*, which has packaging type bundle.
> > Should I install some plugin?
> >
> > Best,
> > Flavio
>
>


Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
I did some digging and this is actually documented in FLINK-4813 [1].

To work around this issue, add the following plugin to your build plugins:



<plugin>
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <version>3.0.1</version>
    <extensions>true</extensions>
    <inherited>true</inherited>
</plugin>



Nico

[1] https://issues.apache.org/jira/browse/FLINK-4813

On Tuesday, 14 February 2017 17:45:12 CET Flavio Pompermaier wrote:
> Hi Nico,
> thanks for the response. The problem is that I don't use the quickstart
> example.
> I have a working set of jobs (in Flink 1.1.4) with some unit tests.
> In the unit tests I use the following dependency that causes the problem:
> 
>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-test-utils_2.10</artifactId>
>     <version>1.2.0</version>
>     <type>test-jar</type>
>     <scope>test</scope>
> </dependency>
> 
> 
> Best,
> Flavio
> 
> On Tue, Feb 14, 2017 at 2:51 PM, Nico Kruber  wrote:
> > You do not require a plugin, but most probably this dependency was not
> > fetched
> > by Eclipse. Please try a "mvn clean package" in your project and see
> > whether
> > this helps Eclipse.
> > 
> > Also, you may create a clean test project with
> > 
> > mvn archetype:generate   \
> > 
> >   -DarchetypeGroupId=org.apache.flink  \
> >   -DarchetypeArtifactId=flink-quickstart-java  \
> >   -DarchetypeVersion=1.2.0
> > 
> > for which I could not find any dependency issues using Eclipse.
> > 
> > Regards,
> > Nico
> > 
> > On Tuesday, 14 February 2017 14:17:10 CET Flavio Pompermaier wrote:
> > > Hi to all,
> > > I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say
> > > that they can't find *apacheds-jdbm1*, which has packaging type bundle.
> > > Should I install some plugin?
> > > 
> > > Best,
> > > Flavio





Re: Unable to use Scala's BeanProperty with classes

2017-02-14 Thread Nico Kruber
Hi Adarsh,
thanks for reporting this. It should be fixed eventually.

@Timo: do you have an idea for a work-around or quick-fix?


Regards
Nico

On Tuesday, 14 February 2017 21:11:21 CET Adarsh Jain wrote:
> > I am getting the same problem when trying to do a FlatMap operation on my
> > POJO class.
> 
> Exception in thread "main" java.lang.IllegalStateException: Detected
> more than one setter
> 
> 
> 
> > I am using Flink 1.2; the exception comes when using FlatMap.
> 
> https://issues.apache.org/jira/browse/FLINK-5070





Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
Great! Thanks a lot

On 14 Feb 2017 6:07 p.m., "Nico Kruber"  wrote:

> I did some digging and this is actually documented in FLINK-4813 [1].
>
> To work around this issue, add the following plugin to your build plugins:
>
> 
> 
> <plugin>
>     <groupId>org.apache.felix</groupId>
>     <artifactId>maven-bundle-plugin</artifactId>
>     <version>3.0.1</version>
>     <extensions>true</extensions>
>     <inherited>true</inherited>
> </plugin>
> 
>
>
> Nico
>
> [1] https://issues.apache.org/jira/browse/FLINK-4813
>
> On Tuesday, 14 February 2017 17:45:12 CET Flavio Pompermaier wrote:
> > Hi Nico,
> > thanks for the response. The problem is that I don't use the quickstart
> > example.
> > I have a working set of jobs (in Flink 1.1.4) with some unit tests.
> > In the unit tests I use the following dependency that causes the problem:
> >
> >
> > <dependency>
> >     <groupId>org.apache.flink</groupId>
> >     <artifactId>flink-test-utils_2.10</artifactId>
> >     <version>1.2.0</version>
> >     <type>test-jar</type>
> >     <scope>test</scope>
> > </dependency>
> > 
> >
> > Best,
> > Flavio
> >
> > On Tue, Feb 14, 2017 at 2:51 PM, Nico Kruber 
> wrote:
> > > You do not require a plugin, but most probably this dependency was not
> > > fetched
> > > by Eclipse. Please try a "mvn clean package" in your project and see
> > > whether
> > > this helps Eclipse.
> > >
> > > Also, you may create a clean test project with
> > >
> > > mvn archetype:generate   \
> > >
> > >   -DarchetypeGroupId=org.apache.flink  \
> > >   -DarchetypeArtifactId=flink-quickstart-java  \
> > >   -DarchetypeVersion=1.2.0
> > >
> > > for which I could not find any dependency issues using Eclipse.
> > >
> > > Regards,
> > > Nico
> > >
> > > On Tuesday, 14 February 2017 14:17:10 CET Flavio Pompermaier wrote:
> > > > Hi to all,
> > > > I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say
> > > > that they can't find *apacheds-jdbm1*, which has packaging type bundle.
> > > > Should I install some plugin?
> > > >
> > > > Best,
> > > > Flavio
>
>


Re: JavaDoc 404

2017-02-14 Thread Yassine MARZOUGUI
Hi Robert,

Thanks for reporting back!
The docs look much better now.

Cheers, Yassine


On Feb 14, 2017 14:30, "Robert Metzger"  wrote:

HI Yassine,

I've now fixed the javadocs build. It has already been rebuilt for the 1.2
javadocs:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html
The build for master should be done in 30 minutes.


On Wed, Feb 8, 2017 at 10:49 AM, Yassine MARZOUGUI <
y.marzou...@mindlytix.com> wrote:

> Thanks Robert and Ufuk for the update.
>
> 2017-02-07 18:43 GMT+01:00 Robert Metzger :
>
>> I've filed a JIRA for the issue:
>> https://issues.apache.org/jira/browse/FLINK-5736
>>
>> On Tue, Feb 7, 2017 at 5:00 PM, Robert Metzger 
>> wrote:
>>
>>> Yes, I'll try to fix it asap. Sorry for the inconvenience.
>>>
>>> On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi  wrote:
>>>
 Thanks for reporting this. I think Robert (cc'd) is working on fixing
 this, correct?

 On Sat, Feb 4, 2017 at 12:12 PM, Yassine MARZOUGUI
  wrote:
 > Hi,
 >
> The JavaDoc link of BucketingSink on this page[1] yields a 404 error. I
> couldn't find the correct URL.
> The broken link:
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html
 >
> Other pages in the JavaDoc, like this one[2], seem to lack formatting,
> because
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/stylesheet.css
> and
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/script.js
> are not found (404).
 >
 > Best,
 > Yassine
 >
 >
> [1] :
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html
> [2] :
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/RichSinkFunction.html

>>>
>>>
>>
>


Clarification: use of AllWindowedStream.apply() function

2017-02-14 Thread nsengupta
I am trying to understand if the AllWindowedStream.apply() function can be
used for creating a DataStream of new types. 

Here is a portion of the code:


case class RawMITSIMTuple(
  tupletype: Int,    timeOfReport: Int, vehicleID: Int,   vehicleSpeed: Int,
  expressWayID: Int, vehicleLane: Int,  vehicleDir: Int,  vehicleSegment: Int,
  vehiclePos: Int,   queyID: Int,       segmentInit: Int, segmentEnd: Int,
  dayOfWeek: Int,    timeOfDay: Int,    dayID: Int)

case class EWayCoordinates(eWayID: Int, eWayDir: Int, eWayLane: Int, eWaySegment: Int)

case class VehicleDetails(vehicleID: Int, vehicleSpeed: Int, vehiclePos: Int)

case class PositionReport(
  tupletype: Int, timeOfReport: Int,
  eWayCoordinates: EWayCoordinates,
  vehicleDetails: VehicleDetails)

val envDefault = StreamExecutionEnvironment.getExecutionEnvironment
envDefault.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

// ...

val positionReportStream = this
  .readRawMITSIMTuplesInjected(envDefault, args(0))
  .assignAscendingTimestamps(e => e.timeOfReport)
  .windowAll(TumblingEventTimeWindows.of(Time.seconds(30)))



positionReportStream above is of type *AllWindowedStream*. As such, I cannot
use it as a DataStream[PositionReport]: I cannot segregate it by some kind
of KeySelection and use it further down. 

I have been thinking of using a FoldFunction on it, but that gives a
collection of PositionReport. So, I get a DataStream[Vector[PositionReport]]
(Vector is just an example).

The other alternative is to use an AllWindowedStream.apply() function, where
I can emit a DataStream[PositionReport]. But, that will mean that I am using
the apply function more as a *mapper*. Is that the right way to use it?
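
Concretely, I have something like the following sketch in mind (reusing the
definitions above; preparePositionReport is a helper of mine that converts one
RawMITSIMTuple into a PositionReport):

import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

val positionReports: DataStream[PositionReport] = this
  .readRawMITSIMTuplesInjected(envDefault, args(0))
  .assignAscendingTimestamps(e => e.timeOfReport)
  .windowAll(TumblingEventTimeWindows.of(Time.seconds(30)))
  .apply { (window: TimeWindow, tuples: Iterable[RawMITSIMTuple], out: Collector[PositionReport]) =>
    // emit one PositionReport per raw tuple, i.e. apply() used as a mapper
    tuples.foreach(t => out.collect(this.preparePositionReport(t)))
  }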

Could someone please point me to the correct way to deal with this?

-- Nirmalya
 





Watermarks per key

2017-02-14 Thread Jordan Ganoff
Hi,

I’m designing a streaming job whose elements need to be windowed by event time 
across a large set of keys. All elements are read from the same source. Event 
time progresses independently across keys. Is it possible to assign timestamps, 
and thus generate independent watermarks, per keyed stream, so late arriving 
elements can be handled per keyed stream?

And in general, what’s the best approach to designing a job that needs to 
process different keyed streams whose event times do not relate to each other? 
My current approach generates timestamps at the source but never generates 
watermarks so no record is ever considered late. This has the unfortunate side 
effect of windows never closing. As a result, each event time window relies on 
a custom trigger which fires and purges the window after a given amount of 
processing time elapses during which no new records arrived.
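
A simplified sketch of that trigger (the state bookkeeping only makes sure the
timer is pushed back whenever a new element arrives):

import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.Window

class InactivityTrigger[T, W <: Window](timeoutMillis: Long) extends Trigger[T, W] {

  private val timerDesc =
    new ValueStateDescriptor[java.lang.Long]("inactivity-timer", classOf[java.lang.Long])

  override def onElement(element: T, timestamp: Long, window: W,
                         ctx: Trigger.TriggerContext): TriggerResult = {
    val timerState = ctx.getPartitionedState(timerDesc)
    val previous = timerState.value()
    if (previous != null) ctx.deleteProcessingTimeTimer(previous)  // push the deadline back
    val deadline = ctx.getCurrentProcessingTime + timeoutMillis
    ctx.registerProcessingTimeTimer(deadline)
    timerState.update(deadline)
    TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: W,
                                ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.FIRE_AND_PURGE  // nothing arrived within the timeout

  override def onEventTime(time: Long, window: W,
                           ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.CONTINUE  // no watermarks are generated in this job

  override def clear(window: W, ctx: Trigger.TriggerContext): Unit = {
    val timerState = ctx.getPartitionedState(timerDesc)
    if (timerState.value() != null) ctx.deleteProcessingTimeTimer(timerState.value())
    timerState.clear()
  }
}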

Thanks,
Jordan

Re: Unable to use Scala's BeanProperty with classes

2017-02-14 Thread Adarsh Jain
Any help will be highly appreciated; I am stuck on this one.






On Tue, Feb 14, 2017 at 10:47 PM, Nico Kruber 
wrote:

> Hi Adarsh,
> thanks for reporting this. It should be fixed eventually.
>
> @Timo: do you have an idea for a work-around or quick-fix?
>
>
> Regards
> Nico
>
> On Tuesday, 14 February 2017 21:11:21 CET Adarsh Jain wrote:
> > I am getting the same problem when trying to do a FlatMap operation on my
> > POJO class.
> >
> > Exception in thread "main" java.lang.IllegalStateException: Detected
> > more than one setter
> >
> >
> >
> > I am using Flink 1.2; the exception comes when using FlatMap.
> >
> > https://issues.apache.org/jira/browse/FLINK-5070
>
>


Re: Clarification: use of AllWindowedStream.apply() function

2017-02-14 Thread nsengupta
I have gone through this post, where Aljoscha explains that /mapping/ on a
WindowedStream is /not/ allowed.

So, I think I haven't asked the question properly. Here is (hopefully) a
better and easier version:

1. I begin with records of type RawMITSIMTuple.
2. When I group them using a Window, I get an
AllWindowedStream[RawMITSIMTuple].
3. I /fold/ the tuples obtained in the Window, which gives me a
DataStream[Vector[RawMITSIMTuple]].
4. What I need is a DataStream[PositionReport]. So, I need to flatMap the
output of the previous step, where I first get hold of each
RawMITSIMTuple and map it to a PositionReport.

val positionReportStream = this
  .readRawMITSIMTuplesInjected(envDefault, args(0))
  .assignAscendingTimestamps(e => e.timeOfReport)
  .windowAll(TumblingEventTimeWindows.of(Time.seconds(30)))
  .fold(Vector[RawMITSIMTuple]())((collectorBin, rawRecord) => collectorBin :+ rawRecord)
  .flatMap(r => r.map(e => this.preparePositionReport(e)))

This gives me what I want, but I feel this is verbose and inefficient. Am I
thinking correctly? If so, what is a better idiom to use in such cases?

-- Nirmalya



