Hi, I've been working with a social network start-up that uses PostgreSQL as their only database. Recently, they became interested in graph databases, largely because of an article [1] suggesting that a SQL database "just chokes" when it encounters a depth-five friends-of-friends query (for a million users having 50 friends each).
The article didn't provide the complete SQL queries, so I had to buy the referenced book to get the full picture. It turns out, the query was a simple self-join, which, of course, isn't very efficient. When we rewrote the query using a modern RECURSIVE CTE, it worked but still took quite some time. Of course, there will always be a need for specific databases, and some queries will run faster on them. But, if PostgreSQL could handle graph queries with a Big-O runtime similar to graph databases, scalability wouldn't be such a big worry. Just like the addition of the JSON type to PostgreSQL helped reduce the hype around NoSQL, maybe there's something simple that's missing in PostgreSQL that could help us achieve the same Big-O class performance as graph databases for some of these type of graph queries? Looking into the key differences between PostgreSQL and graph databases, it seems that one is how they store adjacent nodes. In SQL, a graph can be represented as one table for the Nodes and another table for the Edges. For a friends-of-friends query, we would need to query Edges to find adjacent nodes, recursively. Graph databases, on the other hand, keep adjacent nodes immediately accessible by storing them with the node itself. This looks like a major difference in terms of how the data is stored. Could a hashset type help bridge this gap? The idea would be to store adjacent nodes as a hashset column in a Nodes table. Apache AGE is an option for users who really need a new graph query language. But I believe there are more users who occasionally run into a graph problem and would be glad if there was an efficient way to solve it in SQL without having to bring in a new database. /Joel [1] https://neo4j.com/news/how-much-faster-is-a-graph-database-really/