[ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857833#comment-16857833 ]
Martin Junghanns commented on SPARK-25994: ------------------------------------------ Hi [~RBerenguel]. Thanks a lot for your interest. I think a good way to get involved is to join the discussion on the PRs. As soon as we have https://issues.apache.org/jira/browse/SPARK-27300 merged, we'll open a PR that inludes the API for property graph construction. I think this is the perfect opportunity to discuss your ideas and get an understanding of how the Python API should behave. wdyt? > SPIP: Property Graphs, Cypher Queries, and Algorithms > ----------------------------------------------------- > > Key: SPARK-25994 > URL: https://issues.apache.org/jira/browse/SPARK-25994 > Project: Spark > Issue Type: Epic > Components: Graph > Affects Versions: 3.0.0 > Reporter: Xiangrui Meng > Assignee: Martin Junghanns > Priority: Major > Labels: SPIP > > Copied from the SPIP doc: > {quote} > GraphX was one of the foundational pillars of the Spark project, and is the > current graph component. This reflects the importance of the graphs data > model, which naturally pairs with an important class of analytic function, > the network or graph algorithm. > However, GraphX is not actively maintained. It is based on RDDs, and cannot > exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala > users. > GraphFrames is a Spark package, which implements DataFrame-based graph > algorithms, and also incorporates simple graph pattern matching with fixed > length patterns (called “motifs”). GraphFrames is based on DataFrames, but > has a semantically weak graph data model (based on untyped edges and > vertices). The motif pattern matching facility is very limited by comparison > with the well-established Cypher language. > The Property Graph data model has become quite widespread in recent years, > and is the primary focus of commercial graph data management and of graph > data research, both for on-premises and cloud data management. Many users of > transactional graph databases also wish to work with immutable graphs in > Spark. > The idea is to define a Cypher-compatible Property Graph type based on > DataFrames; to replace GraphFrames querying with Cypher; to reimplement > GraphX/GraphFrames algos on the PropertyGraph type. > To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), > reusing existing proven designs and code, will be employed in Spark 3.0. This > graph query processor, like CAPS, will overlay and drive the SparkSQL > Catalyst query engine, using the CAPS graph query planner. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org