On Sat, Mar 28, 2026 at 6:33 PM Henson Choi <[email protected]> wrote:
>
> Hi Ashutosh and Junwang,
>
>> > patch 0002 in the attached patchset implements it this way. It
>> > collects all the element variables before transforming the path
>> > pattern. Later it uses this list to detect non-local variable
>> > references and throws error.
>
>
> I applied both patches (0001 and 0002) and ran some extended testing
> beyond the included regression tests. All non-local cross-references
> are properly rejected across every combination I tried:
>
>   vertex <-> edge cross-references:
>   - vertex WHERE referencing a later edge (forward)     -> ERROR
>   - vertex WHERE referencing an earlier edge (backward) -> ERROR
>   - edge WHERE referencing a later vertex (forward)     -> ERROR
>   - edge WHERE referencing an earlier vertex (backward) -> ERROR
>
>   longer patterns ()-[]->()-[]->():
>   - 3rd vertex -> 1st vertex (long backward)            -> ERROR
>   - 1st edge -> 3rd vertex (long forward)               -> ERROR
>   - 2nd vertex -> 1st vertex (adjacent backward)        -> ERROR
>   - 2nd vertex -> 3rd vertex (adjacent forward)         -> ERROR
>   - 1st vertex -> 3rd vertex (long forward)             -> ERROR
>   - all references local                                -> OK
>
> All combinations -- vertex-vertex, vertex-edge, edge-vertex, adjacent
> and long-distance -- are blocked as expected. Looks good.

Thanks a lot for testing all these cases. Anything worth including in
the graph_table.sql? I mean, we don't want to include tests which can
be proved to be working (as expected) by induction e.g. If there's
test testing a reference from element just before or after the current
element, by induction it also tests references separated longer than
that. It's good to make sure such cases work as expected once but
testing them every time someone runs make check may not be worth it.
But in case there's a non-trivial test, please provide a patch such
that it uses existing property graphs and underlying tables.

>
>> Seeing the comment above transformGraphTablePropertyRef:
>>
>> *
>> * A property reference is parsed as a ColumnRef of the form:
>> * <variable>.<property>. If <variable> is one of the variables bound to an
>> * element pattern in the graph pattern and <property> can be resolved as a
>> * property of the property graph, then we return a GraphPropertyRef node
>>
>> it seems to imply that G041 is supported. Should we update this comment
>> to better reflect the current behavior?
>
>
> Agreed, it would be nice to update the comment to reflect that
> non-local element variable references are now rejected.

I am -0.1 on modifying the prologue since the comment in the function
explains it. The prologue high level functionality - i.e. what
constitutes a property reference, and how it's resolved, and what
happens if it can not be resolved as a property reference. A cross
reference case is a case where a reference can be resolved but we are
choosing not to do so, temporarily. So, I think a comment within the
function suffices. When we support cross reference, we remove the code
and comment together without disturbing the prologue. However, I don't
have a strong opinion here.

>
>>
>> I have a question about this. Why do we allow a ColumnRef to reference a
>> lateral table outside the graph pattern?
>>
>> For example, in the following query:
>>
>> SELECT x1.a, g.*
>> FROM x1,
>>      GRAPH_TABLE (
>>        myshop
>>        MATCH (x1 IS customers WHERE address = 'US')
>>              -[IS customer_orders]->(o IS orders)
>>        COLUMNS (x1.name AS customer_name,
>>                 x1.customer_id AS cid,
>>                 o.order_id)
>>      ) g;
>>
>> It’s easy for users to get confused and assume that `address` is a column
>> of customers rather than the outside x1.
>
>
>
>
> GRAPH_TABLE has implicit LATERAL semantics per the SQL standard
> (ISO/IEC 9075-16, Syntax Rule 5 of Subclause 7.6: <graph table derived
> table> is added to the list of <table primary>s that support lateral
> join), so referencing outer FROM items is expected behavior.
>
> When the FROM table name doesn't conflict with an element variable,
> lateral references work as expected:
>
>   SELECT x2.a, g.*
>   FROM x2,
>        GRAPH_TABLE (myshop
>          MATCH (c IS customers WHERE c.address = 'US'
>                                  AND c.customer_id = x2.a)
>                -[IS customer_orders]->(o IS orders)
>          COLUMNS (c.name AS customer_name,
>                   c.customer_id AS cid, o.order_id)) g;
>   -- OK: x2.a resolves to lateral table x2
>
> But when the same name is used for both (as in your example with x1),
> qualified references are always captured by the element variable:
>
>   SELECT x1.a, g.*
>   FROM x1,
>        GRAPH_TABLE (myshop
>          MATCH (x1 IS customers WHERE x1.address = 'US')
>                -[IS customer_orders]->(o IS orders
>                                        WHERE o.order_id = x1.a)
>          COLUMNS (x1.name AS customer_name,
>                   x1.customer_id AS cid, o.order_id)) g;
>   -- ERROR: non-local element variable reference is not supported
>   -- x1.a is recognized as element variable x1, not lateral table x1
>
> I agree this can be a bit confusing for users, but given that PGQ is
> still in its early stage, prioritizing stability seems reasonable
> for now. This should be resolved once G041 support is added in the
> future.

Thanks for the explanation. There's one more point to add here.

In SQL/PGQ standard, a local where clause is WHERE <search condition>.
The <search condition> is defined in part 2 as different types of
boolean expressions which can ultimately contain <non-parenthesized
value expression primary>. SQL/PGQ allows <property reference> to be
added to this production and it defines <property reference> as <basic
identifier chain>. <basic identifier chain> can be of any length - so
single identifier is allowed here. Interpreting syntax rule 3, a
property reference requires its first identifier as an element
variable name. This means that a true property reference contains at
least 2 identifiers: element variable, and the property name. This
means references with a single identifier are something other than the
property reference e.g. lateral table column references. And we do
allow such references e.g. select * from generate_series(1, 10) g(i),
generate_series(1, i); If the name of a property is same as a column
of a lateral table, a single identifier reference will be resolved to
the latter and not the first, against intuition. So the confusion
arises from the standard itself.

-- 
Best Wishes,
Ashutosh Bapat


Reply via email to