On Mon, May 28, 2018 at 4:25 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, May 25, 2018 at 6:33 AM, Asim Praveen <aprav...@pivotal.io> wrote:
>> We are evaluating the use of shared buffers for temporary tables. The
>> advantage being queries involving temporary tables can make use of
>> parallel workers.
>
> This is one way, but I think there are other choices as well.  We can
> identify and flush all the dirty (local) buffers for the relation
> being accessed in parallel.  Once the parallel operation has started,
> we won't allow any write operation on them.  This could be expensive
> if we have a lot of dirty local buffers for a particular relation.  If
> we are worried about the cost of those writes, we can try a different
> way to parallelize the temporary table scan.  At the beginning of the
> scan, the leader backend remembers the dirty blocks present in its
> local buffers and shares that list with the parallel workers, which
> skip scanning those blocks; at the end, the leader ensures that all of
> the skipped blocks are scanned by the leader itself.  This shouldn't
> incur much additional cost, as the skipped blocks should already be
> present in the leader's local buffers.
This sounds awkward and limiting.  How about using DSA to allocate
space for the backend's temporary buffers and a dshash table for
lookups?  Then all backends can share, but we don't have to go through
shared_buffers.

Honestly, I don't see how pushing this through the main shared_buffers
arena is ever going to work.  Widening the buffer tags by another 4
bytes is a non-starter, I think.  Maybe with some large rejiggering we
could get to a point where every relation, temporary or permanent, can
be identified by a single OID, regardless of tablespace, etc.  But if
you wait for that, this might not happen for a loooong time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company