xudong963 commented on issue #1082: URL: https://github.com/apache/arrow-datafusion/issues/1082#issuecomment-943386326
> It looks like PostgreSQL has some explicit operator (`HashSetOp`) for these kind of things:

Yes, I have seen the PostgreSQL code, which uses `HashSetOp` for `INTERSECT` and `EXCEPT`:

```
 * In SETOP_HASHED mode, the input is delivered in no particular order,
 * except that we know all the tuples from one input relation will come before
 * all the tuples of the other.  The planner guarantees that the first input
 * relation is the left-hand one for EXCEPT, and tries to make the smaller
 * input relation come first for INTERSECT.  We build a hash table in memory
 * with one entry for each group of identical tuples, and count the number of
 * tuples in the group from each relation.  After seeing all the input, we
 * scan the hashtable and generate the correct output using those counts.
 * We can avoid making hashtable entries for any tuples appearing only in the
 * second input relation, since they cannot result in any output.
 *
 * This node type is not used for UNION or UNION ALL, since those can be
 * implemented more cheaply (there's no need for the junk attribute to
 * identify the source relation).
```
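To make the counting scheme in that comment concrete, here is a minimal Rust sketch of the same idea. It is only an illustration under my own assumptions: `SetOp`, `Counts`, and `hashed_set_op` are hypothetical names, not PostgreSQL's or DataFusion's API, and rows are modeled as plain hashable values instead of record batches.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Which set operation to compute (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SetOp {
    Intersect,
    IntersectAll,
    Except,
    ExceptAll,
}

/// Per-group counters: how many identical rows arrived from each input.
#[derive(Default)]
struct Counts {
    first: usize,
    second: usize,
}

/// Hash-based INTERSECT / EXCEPT over two inputs.
/// `first` must be the left-hand side for EXCEPT; for INTERSECT either
/// order gives the same result, so the smaller input can go first.
fn hashed_set_op<T: Eq + Hash + Clone>(first: &[T], second: &[T], op: SetOp) -> Vec<T> {
    let mut groups: HashMap<T, Counts> = HashMap::new();

    // Every row of the first input creates or updates its group.
    for row in first {
        groups.entry(row.clone()).or_default().first += 1;
    }

    // Rows of the second input only update groups that already exist:
    // a row seen only in the second input can never produce output.
    for row in second {
        if let Some(counts) = groups.get_mut(row) {
            counts.second += 1;
        }
    }

    // Final scan: decide how many copies of each group to emit.
    let mut out = Vec::new();
    for (row, c) in groups {
        let copies = match op {
            SetOp::Intersect => usize::from(c.first > 0 && c.second > 0),
            SetOp::IntersectAll => c.first.min(c.second),
            SetOp::Except => usize::from(c.first > 0 && c.second == 0),
            SetOp::ExceptAll => c.first.saturating_sub(c.second),
        };
        out.extend(std::iter::repeat(row).take(copies));
    }
    out
}
```

For example, `hashed_set_op(&[1, 1, 1, 2, 3], &[1, 3, 4], SetOp::ExceptAll)` yields two copies of `1` and one `2`, in unspecified order because the output follows hash-table iteration order. The sketch needs no marker for the source relation because the two inputs are passed separately; in the PostgreSQL node, that is the job of the junk attribute mentioned in the comment.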