[DISCUSS] Add native CSV loading support for gremlin (GraphReader)

Alaa Mahmoud Tue, 01 Dec 2015 07:47:31 -0800

Adding support for loading CSV into a graph using Gremlin's GraphReader
will lower the entry barrier for new users. A lot of data is already in CSV
format and a lot of existing databases/repositories allow users to export
their data as CSV.


I'd like to add this capability to the gremlin core as a new GraphReader
implementation. Since CSV data doesn't map directly to vertices and edges,
I'm planning to do the loading in two steps:

*Nodes*
The first step is to load a CSV file of vertices. I'll create a vertex for
every line in the CSV and a property for each column on that line. If the
CSV has column headers, the column names will become the names of the
corresponding vertex properties. Otherwise, they'll be prop1, prop2, etc.
(There are other ways to do it as well, but I'm just trying to show the
general idea.)
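
To make the vertex step concrete, here is a rough sketch in Python. It is
only an illustration of the idea, not the proposed GraphReader code: it
uses plain dicts as stand-ins for vertices rather than any TinkerPop API,
and the function name load_vertices and its has_header flag are made up
for this example.

```python
import csv
import io


def load_vertices(csv_text, has_header=True):
    """Return one property dict per CSV line (a stand-in for a vertex).

    Hypothetical helper for illustration only; not part of TinkerPop.
    """
    # Skip completely blank lines so they do not become empty vertices.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []
    if has_header:
        names, data = rows[0], rows[1:]
    else:
        # No header row: fall back to prop1, prop2, ... as property names.
        names = [f"prop{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    # One "vertex" per line, one property per column.
    return [dict(zip(names, row)) for row in data]


vertices = load_vertices("name,age\nmarko,29\nvadas,27\n")
unnamed = load_vertices("marko,29\n", has_header=False)
```

All values stay strings in this sketch; whether and how the reader should
infer property types is a separate question.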

*Edges*
The second step is to load the edges CSV file, which will be in the
following format:

vertex1 prop name (source vertex), vertex2 prop name (destination vertex),
bidirectional (TRUE/FALSE), prop1,prop2,prop3,etc...

For each line in the edges CSV file, the reader will search for a vertex
with the vertex1 prop value (the caller needs to ensure it's unique) to
find the source vertex, search for the destination vertex with the vertex2
prop value, and then create an edge that ties the two together. It will
also create an edge property for each additional column on the line.
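
A rough sketch of the edge step, again in Python with plain data
structures rather than any TinkerPop API. It rests on several assumptions
that the format above leaves open: the header row names the vertex
property used for each lookup, edges are (source index, destination index,
properties) tuples, and a TRUE in the bidirectional column is represented
by adding a second edge in the reverse direction. The name load_edges is
made up for this example.

```python
import csv
import io


def load_edges(csv_text, vertices):
    """Build (source, destination, properties) tuples from an edges CSV.

    Hypothetical helper for illustration only; `vertices` is a list of
    property dicts, and edges refer to vertices by list index.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # Assumption: the first two header cells name the lookup properties.
    src_key, dst_key = header[0].strip(), header[1].strip()
    prop_names = [name.strip() for name in header[3:]]

    def find(key, value):
        # The lookup property must identify exactly one vertex.
        matches = [i for i, v in enumerate(vertices) if v.get(key) == value]
        if len(matches) != 1:
            raise ValueError(
                f"expected one vertex with {key}={value!r}, found {len(matches)}"
            )
        return matches[0]

    edges = []
    for row in data:
        src = find(src_key, row[0].strip())
        dst = find(dst_key, row[1].strip())
        # Every column after the third becomes an edge property.
        props = dict(zip(prop_names, row[3:]))
        edges.append((src, dst, props))
        # Assumption: bidirectional means a second, reversed edge.
        if row[2].strip().upper() == "TRUE":
            edges.append((dst, src, dict(props)))
    return edges


vertices = [{"name": "marko"}, {"name": "vadas"}, {"name": "josh"}]
edges = load_edges(
    "name,name,bidirectional,weight\n"
    "marko,vadas,FALSE,0.5\n"
    "marko,josh,TRUE,1.0\n",
    vertices,
)
```

The linear scan in find is only for clarity; a real reader would more
likely build an index on the lookup property once and reuse it for every
line.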

Thoughts?

Alaa
