[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms

Ruben Berenguel (JIRA) Wed, 05 Jun 2019 09:18:18 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856827#comment-16856827
 ]


Ruben Berenguel commented on SPARK-25994:
-----------------------------------------

Hi [~mju], I'd like to lend a hand if you'd like. I've been following the 
discussions and SPIPs for this on and off, and I currently use GraphFrames. 
I wouldn't mind helping with the Python APIs: I'm somewhat familiar with them 
and with a bit of the internals, even though I'm not a frequent PySpark user.

> SPIP: Property Graphs, Cypher Queries, and Algorithms
> -----------------------------------------------------
>
>                 Key: SPARK-25994
>                 URL: https://issues.apache.org/jira/browse/SPARK-25994
>             Project: Spark
>          Issue Type: Epic
>          Components: Graph
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Martin Junghanns
>            Priority: Major
>              Labels: SPIP
>
> Copied from the SPIP doc:
> {quote}
> GraphX was one of the foundational pillars of the Spark project, and is the 
> current graph component. This reflects the importance of the graph data 
> model, which naturally pairs with an important class of analytic function, 
> the network or graph algorithm. 
> However, GraphX is not actively maintained. It is based on RDDs, and cannot 
> exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala 
> users.
> GraphFrames is a Spark package, which implements DataFrame-based graph 
> algorithms, and also incorporates simple graph pattern matching with fixed 
> length patterns (called “motifs”). GraphFrames is based on DataFrames, but 
> has a semantically weak graph data model (based on untyped edges and 
> vertices). The motif pattern matching facility is very limited by comparison 
> with the well-established Cypher language. 
> The Property Graph data model has become quite widespread in recent years, 
> and is the primary focus of commercial graph data management and of graph 
> data research, both for on-premises and cloud data management. Many users of 
> transactional graph databases also wish to work with immutable graphs in 
> Spark.
> The idea is to define a Cypher-compatible Property Graph type based on 
> DataFrames; to replace GraphFrames querying with Cypher; to reimplement 
> GraphX/GraphFrames algos on the PropertyGraph type. 
> To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), 
> reusing existing proven designs and code, will be employed in Spark 3.0. This 
> graph query processor, like CAPS, will overlay and drive the SparkSQL 
> Catalyst query engine, using the CAPS graph query planner.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

