Re: [HACKERS] Graph datatype addition

Jim Nasby Wed, 08 May 2013 12:08:30 -0700

On 5/8/13 1:40 PM, Atri Sharma wrote:

On Thu, May 2, 2013 at 7:58 AM, Atri Sharma <[email protected]> wrote:



Sent from my iPad

On 02-May-2013, at 4:33, Misa Simic <[email protected]> wrote:



On Wednesday, May 1, 2013, Atri Sharma wrote:


Hi all,

Please find a probable prototype for the same:

struct GraphNode
{
     Oid NodeOid;    // Oid of the row which is the node here. We will
store an identifier to it here rather than the complete row(with data)
itself.
     AdjacencyList *list;   // Pointer to the node's adjacency list.
};

struct AdjacencyList
{
       Oid[] neighbours_list;
};

struct AdjacencyList is probably the 'hottest' data structure in our
entire implementation. We can think of making a cache of recently
accessed struct AdjacencyList instances, or the AdjacencyList(s) of
the neighbours of the recently accessed nodes, because, they are most
likely to be accessed in near future. Advice here, please?

So.

struct AdjacencyCache
{
      Oid[] cache_values;
};

push and pop functions for AdjacencyCache follow.

We need a replacement and invalidation algorithm for the cache. I feel
a version of LRU should be good here.

I have not given a prototype for operations and algorithm implementations.

I feel,as suggested by Peter and Jaime, we can look at pgRouting code
for algorithm implementations.

Florian's concerns are mitigated here to some extent,IMO. Since the
nodes and linkings are loosely coupled, and not represented as a
single representation, updating or changing of any part or adding a
new edge is no longer an expensive operation, as it only requires a
lookup of GraphNode and then its AdjacencyList. If we use the cache as
well, it will further reduce the lookup costs.

I have not yet thought of the user visible layer as suggested by Jim.
Probably. once we are ok with the internal layer, we can move to the
user visible layer.

Advice/Comments/Feedback please?


Honestly - I think I dont understand proposal...

Datatypes - are about values - what will be stored in that column in a
table....

Datatype - cant have any clue about "rows"

How I understand what you described - you can achieve the same with pure SQL
- struct are equvalent to graph tables... Instead od Oid column will store
PKs of nodes table...


Yes, I agree.I need to think more.

Let me get back with a deeper proposal.

Regards,

Atri


Hi all,

In continuation of the above discussion,I have been thinking about the
design of the data type. I have been thinking on the lines of making a
multi dimensional data structure for the same:

Specifically, I have been thinking about making multi lists for
representing data. After some research, I think that the following
data structure may help:

Each node will be represented as:

[Down Pointer][Atom][Right Pointer]

Suppose, a graph is like(sorry for the bad drawing):

      B
    /
A    D
   \  /
    C
     \
      E

can be represented as:
                              C's data                    E's data
           D's data
                              ^                               ^
                  ^
A's data
[|][1][---------------------->[|][1][------------------>[|][1][NULL]
  ^                          ^
[|][1][------------------>[|][0][--------------------->[|][1][NULL]
                                                            ^
                                                            B's data


Essentially, the Atom flag denotes if the node has any out edges from
it. If it has no out edge, ATOM is 0 and Down Pointer points to an
auxiliary structure which holds the node's data(hence, the data can be
of arbitrary size).

If the node has some out degree, then, those nodes are added to a new
sublist which starts from the node which spawns those nodes.Node's
down pointer points to the head of the new sublist.

Essentially, a sublist has all the nodes directly spawning from the
head node of the sublist.

This approach has multiple advantages in term of memory and
efficiency. Also, isolating sub graphs based on some criteria is
pretty efficient, which is good for many analytics based operations.

Access time is restricted as well.

Thoughts/Comments/Feedback please?


Your second drawing didn't really make any sense to me. :(

I do think it would be most productive to focus on what the API for dealing 
with graph data would look like before trying to handle the storage aspect. The 
storage is potentially dirt-simple, as others have shown. The only challenge 
would be efficiency, but it's impossible to discuss efficiency without some 
clue of how the data will be accessed. Frankly, for the first round of this I 
think it would be best if the storage really was just some raw tables. Once 
something is available people will start figuring out how to use it, and where 
the API needs to be improved.
--
Jim C. Nasby, Data Architect                       [email protected]
512.569.9461 (cell)                         http://jim.nasby.net


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Graph datatype addition

Reply via email to