[ https://issues.apache.org/jira/browse/CALCITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated CALCITE-6171:
------------------------------------
    Labels: pull-request-available  (was: )

> Support gremlin adapter
> -----------------------
>
>                 Key: CALCITE-6171
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6171
>             Project: Calcite
>          Issue Type: New Feature
>    Affects Versions: 1.36.0
>            Reporter: xingyuan cheng
>            Priority: Major
>              Labels: pull-request-available
>
> h1. Motivation
> Hi, community. More and more users are adopting graph databases such as
> JanusGraph and HugeGraph to represent personnel social networks, for
> example to measure the activity of each user in the network. Most graph
> databases implement both the graph-building and the graph-traversal stages
> with Gremlin syntax. However, this is very unfriendly to users who are not
> familiar with Gremlin. Calcite, meanwhile, is a query framework that
> provides an adapter interface for adapting to different database dialects,
> covering parsing, relational-algebra conversion, and query-plan binding.
> Our company has solved the problem of adapting various graph databases.
> This is my repository: [https://github.com/kaori-seasons/calcite-gremlin-sql]
>
> h2. Background
>
> Calcite supports extending the database language through adapters, which
> lets users keep writing SQL they understand while the adapter handles the
> dialect: for example, SqlNode can be extended to implement syntax analysis,
> and RelNode can be extended to implement logical-plan mapping.
> TinkerPop is an adaptation framework for various graph databases; Gremlin
> syntax originated in this framework and has since become a universal query
> layer for graph databases. While extending query statements through
> Calcite's adapter interface, we also use TinkerPop's universal
> graph-database API to provide dialect compatibility across different graph
> databases.
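To make the adapter's job concrete before the example below, here is a deliberately tiny, dependency-free sketch of the string-level shape of the translation for a single-column table scan. This is not the adapter's real code path (which goes through Calcite's SqlNode/RelNode pipeline); the class and method names are invented for illustration:

```java
// Toy illustration only: shows the *shape* of the SQL-to-Gremlin mapping
// for SELECT "col" FROM table. The real adapter builds this through
// Calcite's parser and planner, not by string concatenation.
class ToySqlToGremlin {

    /** Maps a single-column scan of a table onto a Gremlin-style traversal string. */
    static String translate(String column, String table) {
        // A table scan becomes a vertex-label scan; the projected column
        // becomes a property access on the matched vertices.
        return "g.V().hasLabel(\"" + table + "\").values(\"" + column + "\")";
    }

    public static void main(String[] args) {
        System.out.println(translate("key", "inttype"));
        // prints: g.V().hasLabel("inttype").values("key")
    }
}
```

The adapter's actual output (shown in the example below) is more elaborate because it must preserve row semantics, ordering, and NULL handling, but the table-to-label and column-to-property correspondence is the same.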
> Give a simple example:
> From
> {code:java}
> SELECT "key" FROM inttype{code}
> maps to
> {code:java}
> g.V().hasLabel("inttype").group().unfold().select(Column.values)
>   .order().by(__.unfold().id())
>   .project("inttype")
>     .by(__.project("key")
>       .by(__.unfold().choose(__.has("key"), __.values("key"), __.constant("$%#NULL#%$")))){code}
>
> The design is divided into three layers: a parsing layer, a
> relational-algebra layer, and a logical-plan binding layer.
> Parsing layer: while parsing the query statement, fields and equality
> conditions are split out and converted into vertices and edges.
> Relational-algebra layer: the vertex and edge sets are converted into a
> TableScan at the aggregation/sorting/query stages that Calcite abstracts,
> which makes it convenient to generate query plans from the conditions and
> field information. Join scans, single-table filters, and similar operations
> can then traverse the graph from any edge or any starting vertex.
> Logical-plan binding layer: behaviors such as join scanning, single-table
> filtering, and projection are bound to Calcite's planner to generate query
> plans.
> The process of generating a Gremlin logical plan from a SELECT statement:
> 1. All graph databases build the graph starting from a source. We use the
> GraphTraversalSource provided by TinkerPop as that origin and extract the
> vertex and edge information from the incoming syntax. This step is
> implemented in SqlSchemaGrabber.
> 2. For select/where/having/order by/group by, the plan in the parsing phase
> is as follows:
> {code:java}
> - group by: a vertex has out-degree and in-degree attributes in the graph.
>   From the data-table perspective, this is equivalent to aggregating the
>   table data along the two dimensions IN and OUT. This behavior also
>   corresponds to the table-traverses-graph operation. During graph
>   traversal, fold/unfold tags are generated, representing the direction of
>   the traversal.
> - select: a full table scan can be regarded as projecting all columns onto
>   vertex attributes; the value changes of each column are mapped to the
>   Gremlin operation of adding vertices.
> - where: in graph-computing semantics, the filter applies to the edges
>   connected by the out-degree and in-degree of the filtered vertex, so it
>   does not involve relational-algebra conversion; instead, it is pushed
>   directly into the logical plan.
> - order by: as mentioned above, fold/unfold is generated along the
>   traversal path to represent the forward/backward direction. If a field
>   has no sortable value, we fall back to the GraphTraversalSource origin
>   and end the sorting operation. If there are sortable values, they are
>   unified in SqlTraversalEngine: in-degree and out-degree are counted
>   separately for aggregation and then combined with group by on the label
>   (IN/OUT).
> - having: same meaning as group by, but with a different label (in addition
>   to the IN/OUT columns, specific vertex fields must be specified).
> {code}
>
> h2. samples
> Below is a simple example based on a unit test in my project.
> Suppose there is a graph data set with multiple vertex sets and an edge set.
> Since the company's environment is connected to the graph database, further
> testing is needed; I will complete it in the near future. //todo
> The vertex sets represent countries, companies, people, and spatial
> locations.
>
> {code:java}
> {
>   "tables": [
>     { "name": "company",   "columns": [ {"name": "name", "type": "string"} ] },
>     { "name": "country",   "columns": [ {"name": "name", "type": "string"} ] },
>     { "name": "planet",    "columns": [ {"name": "name", "type": "string"} ] },
>     { "name": "person",    "columns": [
>       {"name": "name", "type": "string"},
>       {"name": "age",  "type": "integer"}
>     ] },
>     { "name": "spaceship", "columns": [
>       {"name": "name",  "type": "string"},
>       {"name": "model", "type": "string"}
>     ] },
>     { "name": "satellite", "columns": [ {"name": "name", "type": "string"} ] },
>     { "name": "sensor",    "columns": [
>       {"name": "name", "type": "string"},
>       {"name": "type", "type": "string"}
>     ] },
>     { "name": "sensorReading", "columns": [
>       {"name": "tstamp", "type": "long_timestamp", "propertyName": "timestamp"},
>       {"name": "dt",     "type": "long_date",      "propertyName": "date"},
>       {"name": "value",  "type": "double"}
>     ] },
>     { "name": "fliesTo", "columns": [ {"name": "trips",    "type": "integer"} ] },
>     { "name": "orbits",  "columns": [ {"name": "launched", "type": "integer"} ] }
>   ],
>   "relationships": [
>     {"outTable": "company",   "inTable": "country",       "edgeLabel": "baseIn"},
>     {"outTable": "person",    "inTable": "company",       "edgeLabel": "worksFor"},
>     {"outTable": "person",    "inTable": "planet",        "edgeLabel": "travelledTo"},
>     {"outTable": "company",   "inTable": "spaceship",     "edgeLabel": "owns"},
>     {"outTable": "person",    "inTable": "spaceship",     "edgeLabel": "pilots"},
>     {"outTable": "sensor",    "inTable": "sensorReading", "edgeLabel": "hasReading", "fkTable": "sensorReading"},
>     {"outTable": "person",    "inTable": "planet",        "edgeLabel": "fliesTo"},
>     {"outTable": "satellite", "inTable": "planet",        "edgeLabel": "orbits"},
>     {"outTable": "person",    "inTable": "person",        "edgeLabel": "friendsWith"}
>   ]
> } {code}
> Scope of application:
> Graph databases compatible with TinkerPop's Gremlin syntax.

--
This message
was sent by Atlassian Jira (v8.20.10#820010)
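
As a closing illustration, the "relationships" section of the schema above can be viewed as an edge-label catalog keyed by the out-vertex table — the structure a schema-grabbing step would need when deciding which edges a traversal may follow from a given table. A minimal, dependency-free Java sketch (the class and method names here are invented for illustration; they are not the adapter's actual API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a plain-Java view of the "relationships" array from the
// schema JSON above. Names are made up for this sketch.
class GraphCatalog {

    /** One entry of the "relationships" array: outTable --edgeLabel--> inTable. */
    record Relationship(String outTable, String inTable, String edgeLabel) {}

    private final List<Relationship> relationships;

    GraphCatalog(List<Relationship> relationships) {
        this.relationships = relationships;
    }

    /** Edge labels leaving a given vertex table, in declaration order. */
    List<String> outgoingEdges(String table) {
        return relationships.stream()
                .filter(r -> r.outTable().equals(table))
                .map(Relationship::edgeLabel)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        GraphCatalog catalog = new GraphCatalog(List.of(
                new Relationship("company", "country", "baseIn"),
                new Relationship("person", "company", "worksFor"),
                new Relationship("person", "spaceship", "pilots")));
        System.out.println(catalog.outgoingEdges("person"));
        // prints: [worksFor, pilots]
    }
}
```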