Re: Gelly PageRank implementations in 1.2 to 1.3

2017-07-22 Thread Greg Hogan
Hi Marc,

PageRank and GSAPageRank were moved to the flink-gelly-examples jar in the 
org.apache.flink.graph.examples package. A library algorithm was added that 
supports both source and sink vertices. The old algorithms did not support 
such vertices, a limitation that was noted in their class documentation and 
that I understand to be an effect of delta iterations. The new implementation 
is also significantly faster 
(https://github.com/apache/flink/pull/2733#issuecomment-278789830).
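
For readers unfamiliar with the terms, here is a minimal plain-Java sketch of 
PageRank on a graph that has a source vertex (no in-edges) and a sink vertex 
(no out-edges). It is not the Gelly implementation. The class name 
PageRankSketch, the method signature, and the choice to spread the score of 
sink vertices uniformly over all vertices are assumptions made for 
illustration only.

```java
import java.util.Arrays;

/** Plain-Java PageRank sketch for illustration; not the Gelly implementation. */
final class PageRankSketch {

    /**
     * Computes PageRank scores for vertices 0..n-1 (n must be positive).
     * Each edge is a {source, target} pair.
     */
    static double[] pageRank(int n, int[][] edges, double dampingFactor,
                             int maxIterations, double convergenceThreshold) {
        int[] outDegree = new int[n];
        for (int[] edge : edges) {
            outDegree[edge[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Score currently held by sink vertices (no out-edges).
            // Assumption: it is redistributed uniformly over all vertices.
            double sinkScore = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    sinkScore += rank[v];
                }
            }

            // Every vertex, including sources with no in-edges, gets this base score.
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - dampingFactor) / n + dampingFactor * sinkScore / n);

            // Each vertex with out-edges splits its score evenly among its targets.
            for (int[] edge : edges) {
                next[edge[1]] += dampingFactor * rank[edge[0]] / outDegree[edge[0]];
            }

            // Stop early once the total change in scores is below the threshold.
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < convergenceThreshold) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Vertex 0 is a source, vertex 2 is a sink.
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}};
        System.out.println(Arrays.toString(pageRank(3, edges, 0.85, 100, 1e-9)));
    }
}
```

The three parameters mirror the damping factor, iteration count, and 
convergence threshold that the library algorithm is constructed with; with 
this redistribution of sink scores the total score stays at 1 in every 
iteration.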

PageRank can be run from the command line using the examples jar. Don’t 
wildcard the jar file name as the documentation does until the javadoc jar 
is removed in the next release. For example:

$ mv opt/flink-gelly* lib/
$ ./bin/flink run examples/gelly/flink-gelly-examples_2.11-1.3.1.jar \
    --algorithm PageRank \
    --input CSV --type integer --simplify directed \
    --input_filename <filename> --input_field_delimiter $'\t' \
    --output print

The output can also be written to CSV in similar fashion to the input.

The examples driver calls the library PageRank as it would any GraphAlgorithm 
(https://github.com/apache/flink/blob/release-1.3/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/PageRank.java):

graph.run(new PageRank(dampingFactor, iterations, convergenceThreshold));

Please let us know of any issues or additional questions!

Greg


> On Jul 22, 2017, at 4:33 PM, Kaepke, Marc wrote:
> 
> Hi there,
> 
> Why was the PageRank version (which implements the GraphAlgorithm interface) 
> removed in 1.3?
> 
> How can I use the new PageRank implementation in 1.3.x?
> 
> Why doesn’t PageRank use the graph processing models (vertex-centric, sg or 
> gsa) anymore?
> 
> Thanks!
> 
> Bests,
> marc


Gelly PageRank implementations in 1.2 to 1.3

2017-07-22 Thread Kaepke, Marc
Hi there,

Why was the PageRank version (which implements the GraphAlgorithm interface) 
removed in 1.3?

How can I use the new PageRank implementation in 1.3.x?

Why doesn’t PageRank use the graph processing models (vertex-centric, sg or 
gsa) anymore?

Thanks!

Bests,
marc

notNext() and next(negation) not yielding same output in Flink CEP

2017-07-22 Thread Yassine MARZOUGUI
Hi all,

I would like to match the maximal consecutive sequences of events of type A
in a stream. I'm using the following:

    Pattern.begin("start").where(event is not A)
        .next("middle").where(event is A).oneOrMore().consecutive()
        .next("not").where(event is not A)

This gives the output I want. However, if I use
notNext("not").where(event is A) instead of
next("not").where(event is not A), the middle patterns contain only
single-element sequences of type A.
My understanding is that notNext() in this case is equivalent to
next(negation), so why is the output different?
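
To make the intended result concrete, here is a minimal plain-Java sketch
(not Flink CEP code) of what "maximal consecutive sequences of type A" means
for a finite list of events. It assumes events are represented by their type
name as a string, and that a run only counts when it is bounded on both sides
by a non-A event, as in the first pattern. The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the desired matches; not Flink CEP code. */
final class MaximalRuns {

    /**
     * Returns every maximal run of consecutive "A" events that is both
     * preceded and followed by a non-A event.
     */
    static List<List<String>> maximalRunsOfA(List<String> events) {
        List<List<String>> runs = new ArrayList<>();
        int i = 0;
        while (i < events.size()) {
            if (!events.get(i).equals("A")) {
                i++;
                continue;
            }
            // Extend the run as far as possible so that it is maximal.
            int start = i;
            while (i < events.size() && events.get(i).equals("A")) {
                i++;
            }
            // A run starting after index 0 is necessarily preceded by a non-A
            // event; a run ending before the last index is followed by one.
            boolean precededByNonA = start > 0;
            boolean followedByNonA = i < events.size();
            if (precededByNonA && followedByNonA) {
                runs.add(new ArrayList<>(events.subList(start, i)));
            }
        }
        return runs;
    }
}
```

For the event sequence B, A, A, A, C, A, B, A this yields the runs
[A, A, A] and [A]; the trailing A is excluded because no non-A event
follows it.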

Thank you in advance.

Best,
Yassine


Re: Integrating Flink CEP with a Rules Engine

2017-07-22 Thread Anton
We also have a requirement to use Drools in Flink. Drools brings a very
mature and usable business rules editor, so being able to integrate Drools
into Flink would be very useful.

On 23 June 2017 at 22:09, Suneel Marthi wrote:

> Sorry I didn't read the whole thread.
>
> We have a similar requirement wherein the users would like to
> add/update/delete CEP patterns via UX or REST API, and we started
> discussing building a REST API for that. Glad to see that this is a common
> ask, and if there's already a community effort around this, that's great
> to know.
>
> On Fri, Jun 23, 2017 at 9:54 AM, Sridhar Chellappa wrote:
>
>> Folks,
>>
>> Plenty of very good points, but I see this discussion digressing from what
>> I originally asked for. We need a dashboard to let the Business Analysts
>> define rules and the CEP to run them.
>>
>> My original question was how to solve this with Flink CEP?
>>
>> From what I see, this is not a solved problem. Correct me if I am wrong.
>>
>> On Fri, Jun 23, 2017 at 6:52 PM, Kostas Kloudas <
>> k.klou...@data-artisans.com> wrote:
>>
>>> Hi all,
>>>
>>> Currently there is an ongoing effort to integrate FlinkCEP with Flink's
>>> SQL API.
>>> There is already an open FLIP for this:
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-20%3A+Integration+of+SQL+and+CEP
>>>
>>> So, if there was an effort for integration of different
>>> libraries/tools/functionality as well, it
>>> would be nice to go a bit more into details on i) what is already there,
>>> ii) what is planned to be
>>> integrated for the SQL effort, and iii) what else is required, and
>>> consolidate the resources
>>> available.
>>>
>>> This will allow the community to move faster and with a clear roadmap.
>>>
>>> Kostas
>>>
>>> On Jun 23, 2017, at 2:51 PM, Suneel Marthi wrote:
>>>
>>> FWIW, here's an old Cloudera blog about using Drools with Spark.
>>>
>>> https://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
>>>
>>> It should be possible to invoke Drools from Flink in a similar way (I
>>> have not tried it).
>>>
>>> It all depends on what the use case is and how much of it present Flink
>>> CEP satisfies before considering integration with more complex rule
>>> engines.
>>>
>>>
>>> Disclaimer: I work for Red Hat
>>>
>>> On Fri, Jun 23, 2017 at 8:43 AM, Ismaël Mejía wrote:
>>>
>>>> Hello,
>>>>
>>>> It is really interesting to see this discussion because that was one
>>>> of the questions on the presentation on CEP at Berlin Buzzwords, and
>>>> this is one line of work that may eventually make sense to explore.
>>>>
>>>> Rule engines like Drools implement the Rete algorithm, which, if I
>>>> understood correctly, optimizes the analysis of a relatively big set of
>>>> facts (conditions) into a simpler evaluation graph. For more details,
>>>> this is a really nice explanation:
>>>> https://www.sparklinglogic.com/rete-algorithm-demystified-part-2/
>>>>
>>>> In Flink's CEP I have the impression that you define this graph by
>>>> hand. Using a rule engine you could infer an optimal graph from the
>>>> set of rules, and then this graph could be translated into CEP
>>>> patterns.
>>>>
>>>> Of course take all of this with a grain of salt because I am not an
>>>> expert on either CEP or the Rete algorithm, but I start to see the
>>>> connection between both worlds more clearly now. So if anyone else has
>>>> ideas on the feasibility of this or can see some other
>>>> issues/consequences, please comment. I also have the impression that
>>>> distribution is less of an issue because the Rete network is
>>>> calculated only once and updates are not 'dynamic' (but I might be
>>>> wrong).
>>>>
>>>> Ismaël
>>>>
>>>> PS: I am adding Thomas in copy, who asked the question at the
>>>> conference, in case he has some comments/ideas.
>>>>
>>>>
>>>> On Fri, Jun 23, 2017 at 1:48 PM, Kostas Kloudas wrote:
>>>> > Hi Jorn and Sridhar,
>>>> >
>>>> > It would be worth describing a bit more what these tools are and what
>>>> > your needs are.
>>>> > In addition, to see what the CEP library already offers, you can
>>>> > find the documentation here:
>>>> >
>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/cep.html
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Kostas
>>>> >
>>>> > On Jun 23, 2017, at 1:41 PM, Jörn Franke wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > It is possible, but with one caveat: Flink is a distributed system, but
>>>> > in Drools the facts are only locally available. This may lead to strange
>>>> > effects when rules update the fact base.
>>>> >
>>>> > Best regards
>>>> >
>>>> > On 23. Jun 2017, at 12:49, Sridhar Chellappa wrote:
>>>> >
>>>> > Folks,