Re: Creating Nodes in Bulk,

2021-09-02 Thread Muhammad Shoaib
HI Josh, Hi All,

Thanks for the response.

Going forward with option 1, would you suggest inserting values directly
into tables or constructing cypher commands.

Kind regards
Shoaib

On Thu, Sep 2, 2021 at 12:48 PM Josh Innis  wrote:

> Hi all,
>
> Upon thinking about this more I realized there is an underlying issue that
> needs to be discussed that directly affects the approach needed on this
> feature. To do so: I need to talk about an early design mistake that was
> made on this project.
>
> Back in 2019 when this project was still called Agensgraph-Extension and
> there was no consideration to donate to Apache, the project ignored a basic
> rule of programming: don't use deprecated features. In particular: the use
> of OIDs by non pg_catalog tables. OID stands for Object Id. In short,
> Postgres uses OIDs to assign a unique value to a row that no other system
> table row has. In earlier versions of Postgres they gave the right of the
> user to mark rows in other tables as "system" tables and give them an oid
> (An oid that no POSTGRES system table had either).
>
> This feature was deprecated in version Postgres 8.5. We started development
> of AGE on Postgres 11. To try and summarize a decision that was made in the
> first 3 months of a project. Some of the people involved (but not all) are
> no longer working on the project thought: "The reason why they did this is
> because they don't want systems built on top of Postgres to consider
> themselves system tables. We are not technically a 'system built on top of
> Postgres.' We are an extension of Postgres. Therefore our metadata tables
> are technically system tables. Therefore our unique identifiers should be
> unique in all of Postgres. Therefore we should use OIDs... Oh, it's
> deprecated. We should still use them. Why not, what's the worst that could
> happen?"
>
> The worst that could happen is Postgres removes the functionality of adding
> Oids to users tables (which we are) in Postgres 12. They removed the
> functionality of 'WITH OIDS' from their SQL grammar implementation and they
> removed Oid from being returned on certain Postgres internal C functions
> that they used to give us. This is why AGE is incompatible with Postgres
> 12+ (NOTE: possibly other things too but this is definitely THE issue
> causing us problems).
>
> Our goal since that point has been to implement as much of the openCypher
> spec as possible. Which is why we have not dealt with this problem yet. The
> good news is: none of the features implemented since the point we realized
> this mistake have made the job of changing AGE to fix this take more time.
>
> The reason why I am bringing this up now in this email is: Option #1 is
> clearly the superior option. We should funnel as much functionality that we
> can through the ANSI SQL and the openCypher specification that we can. The
> internal C functions of Postgres are not part of those standards.
> Therefore, this bulk import tool needs to create dynamic sql queries on a
> row per row basis, rather than use the internal Postgres functions to
> bypass some of those standards. Even though it is slower.
>
> With all of that said: we need to deal with this issue soon. In other
> orders: soon == 2021.
>
> TLDR: Option #1 is better.
> We create value for our clients by connecting the world's data.
> Josh Innis Software Engineer
> Core R Team
> *P*: (831)278-0327
> 3945 Freedom Circle #260, Santa Clara, CA 95054
> <
> https://www.google.com/maps/place/3945+Freedom+Cir+%23260,+Santa+Clara,+CA+95054/
> >
> bitnine.net
>
>
> On Wed, Sep 1, 2021 at 10:16 AM Josh Innis  wrote:
>
> > Muhammed,
> >
> > Directly calling the functions that insert data into the vertex and edge
> > tables might not be the best way. They are tightly integrated with the
> > Postgres stages of execution (Parse, Transform, Optimize, and Execution).
> > There are two options I would recommend:
> >
> >
> >1. Dynamically create either a Cypher CREATE query or create an SQL
> >insert query.
> >2. Take a look at how we insert data into our ag_catalog tables
> >ag_graph and ag_label. Both are located in src/backend/catalog. This
> would
> >decouple a great amount of our insertion logic from the Postgres
> (Though
> >obviously not completely). If you take a look at agtype_build_map in
> >src/backend/utils/adt/agtype.c that would give you an idea on how to
> create
> >the property map from the spreadsheet.
> >
> > Both solutions have potential drawbacks. Dynamically creating a
> cypher/sql
> > query will be slower however, you are using the standard interface
> provided
> > by both us and Postgres (via the SQL and openCypher language
> > specifications). Which would not be the case in the 2nd option, however
> > that option could decouple some logic that might be useful for later
> > features.
> >
> > We create value for our clients by connecting the world's data.
> > Josh Innis Software Engineer
> > Core R Team
> > *P*: 

Re: Creating Nodes in Bulk,

2021-09-01 Thread Josh Innis
Hi all,

Upon thinking about this more I realized there is an underlying issue that
needs to be discussed that directly affects the approach needed on this
feature. To do so: I need to talk about an early design mistake that was
made on this project.

Back in 2019 when this project was still called Agensgraph-Extension and
there was no consideration to donate to Apache, the project ignored a basic
rule of programming: don't use deprecated features. In particular: the use
of OIDs by non pg_catalog tables. OID stands for Object Id. In short,
Postgres uses OIDs to assign a unique value to a row that no other system
table row has. In earlier versions of Postgres they gave the right of the
user to mark rows in other tables as "system" tables and give them an oid
(An oid that no POSTGRES system table had either).

This feature was deprecated in version Postgres 8.5. We started development
of AGE on Postgres 11. To try and summarize a decision that was made in the
first 3 months of a project. Some of the people involved (but not all) are
no longer working on the project thought: "The reason why they did this is
because they don't want systems built on top of Postgres to consider
themselves system tables. We are not technically a 'system built on top of
Postgres.' We are an extension of Postgres. Therefore our metadata tables
are technically system tables. Therefore our unique identifiers should be
unique in all of Postgres. Therefore we should use OIDs... Oh, it's
deprecated. We should still use them. Why not, what's the worst that could
happen?"

The worst that could happen is Postgres removes the functionality of adding
Oids to users tables (which we are) in Postgres 12. They removed the
functionality of 'WITH OIDS' from their SQL grammar implementation and they
removed Oid from being returned on certain Postgres internal C functions
that they used to give us. This is why AGE is incompatible with Postgres
12+ (NOTE: possibly other things too but this is definitely THE issue
causing us problems).

Our goal since that point has been to implement as much of the openCypher
spec as possible. Which is why we have not dealt with this problem yet. The
good news is: none of the features implemented since the point we realized
this mistake have made the job of changing AGE to fix this take more time.

The reason why I am bringing this up now in this email is: Option #1 is
clearly the superior option. We should funnel as much functionality that we
can through the ANSI SQL and the openCypher specification that we can. The
internal C functions of Postgres are not part of those standards.
Therefore, this bulk import tool needs to create dynamic sql queries on a
row per row basis, rather than use the internal Postgres functions to
bypass some of those standards. Even though it is slower.

With all of that said: we need to deal with this issue soon. In other
orders: soon == 2021.

TLDR: Option #1 is better.
We create value for our clients by connecting the world's data.
Josh Innis Software Engineer
Core R Team
*P*: (831)278-0327
3945 Freedom Circle #260, Santa Clara, CA 95054

bitnine.net


On Wed, Sep 1, 2021 at 10:16 AM Josh Innis  wrote:

> Muhammed,
>
> Directly calling the functions that insert data into the vertex and edge
> tables might not be the best way. They are tightly integrated with the
> Postgres stages of execution (Parse, Transform, Optimize, and Execution).
> There are two options I would recommend:
>
>
>1. Dynamically create either a Cypher CREATE query or create an SQL
>insert query.
>2. Take a look at how we insert data into our ag_catalog tables
>ag_graph and ag_label. Both are located in src/backend/catalog. This would
>decouple a great amount of our insertion logic from the Postgres (Though
>obviously not completely). If you take a look at agtype_build_map in
>src/backend/utils/adt/agtype.c that would give you an idea on how to create
>the property map from the spreadsheet.
>
> Both solutions have potential drawbacks. Dynamically creating a cypher/sql
> query will be slower however, you are using the standard interface provided
> by both us and Postgres (via the SQL and openCypher language
> specifications). Which would not be the case in the 2nd option, however
> that option could decouple some logic that might be useful for later
> features.
>
> We create value for our clients by connecting the world's data.
> Josh Innis Software Engineer
> Core R Team
> *P*: (831)278-0327
> 3945 Freedom Circle #260, Santa Clara, CA 95054
> 
> bitnine.net
>
>
> On Wed, Sep 1, 2021 at 9:42 AM Muhammad Shoaib  wrote:
>
>> Hi Folks,
>>
>> I am trying to write a function to load nodes from CSV into AGE. I have
>> data into C data structure but don't know how to proceed further. The
>> particular questions are
>>
>> Which AGE 

Re: Creating Nodes in Bulk,

2021-09-01 Thread Josh Innis
Muhammed,

Directly calling the functions that insert data into the vertex and edge
tables might not be the best way. They are tightly integrated with the
Postgres stages of execution (Parse, Transform, Optimize, and Execution).
There are two options I would recommend:


   1. Dynamically create either a Cypher CREATE query or create an SQL
   insert query.
   2. Take a look at how we insert data into our ag_catalog tables ag_graph
   and ag_label. Both are located in src/backend/catalog. This would decouple
   a great amount of our insertion logic from the Postgres (Though obviously
   not completely). If you take a look at agtype_build_map in
   src/backend/utils/adt/agtype.c that would give you an idea on how to create
   the property map from the spreadsheet.

Both solutions have potential drawbacks. Dynamically creating a cypher/sql
query will be slower however, you are using the standard interface provided
by both us and Postgres (via the SQL and openCypher language
specifications). Which would not be the case in the 2nd option, however
that option could decouple some logic that might be useful for later
features.

We create value for our clients by connecting the world's data.
Josh Innis Software Engineer
Core R Team
*P*: (831)278-0327
3945 Freedom Circle #260, Santa Clara, CA 95054

bitnine.net


On Wed, Sep 1, 2021 at 9:42 AM Muhammad Shoaib  wrote:

> Hi Folks,
>
> I am trying to write a function to load nodes from CSV into AGE. I have
> data into C data structure but don't know how to proceed further. The
> particular questions are
>
> Which AGE data structure I need to use
> which function I need to call in order to insert data into PG Tables.
>
> Looking forward to hearing from you.
>
> Kind regards
> Shoaib
>