> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]
>
> I'm a tad worried about what happens when the values that are frequently
> occurring in the outer relation are also frequently occurring in the
> inner (which hardly seems an improbable case). Don't you stand a severe
> risk of blowing out the in-memory hash table? It doesn't appear to me
> that the code has any way to back off once it's decided that a certain
> set of join key values are to be treated in-memory. Splitting the main
> join into more batches certainly doesn't help with that.
>
> Also, AFAICS the benefit of this patch comes entirely from avoiding dump
> and reload of tuples bearing the most common values, which means it's a
> significant waste of cycles when there's only one batch. It'd be better
> to avoid doing any of the extra work in the single-batch case.
>
> One thought that might address that point as well as the difficulty of
> getting stats in nontrivial cases is to wait until we've overrun memory
> and are forced to start batching, and at that point determine on-the-fly
> which are the most common hash values from inspection of the hash table
> as we dump it out. This would amount to optimizing on the basis of
> frequency in the *inner* relation not the outer, but offhand I don't see
> any strong theoretical basis why that wouldn't be just as good. It
> could lose if the first work_mem worth of inner tuples isn't
> representative of what follows; but this hardly seems more dangerous
> than depending on MCV stats that are for the whole outer relation rather
> than the portion of it being selected.
>
> regards, tom lane
You are correct on both observations. The patch only has a benefit when there is more than one batch, and there is a potential issue with MCV hash table overflows if the number of tuples in the build relation that match the MCVs is very large.

Bryce has created a patch (attached) that disables the code for one-batch joins. The patch also checks for MCV hash table overflows and handles them by "flushing" from the MCV hash table back to the main hash table, which then resolves overflows as usual. Note that this means the worst case, a build table where every tuple has the same value, is handled the same as by the current hash code: it will attempt to re-partition until it eventually gives up and then allocates the entire partition in memory. There may be a better way to handle this case, but the new patch remains consistent with the current hash join implementation.

The issue with determining and using the MCV stats is more challenging than it appears. First, knowing the MCVs of the build table does not help us. What we need are the MCVs of the probe table, because knowing those values lets us keep the build-relation tuples with those values in memory. For example, consider a join between tables Part and LineItem, and assume one popular part accounts for 10% of all LineItems. If Part is the build relation and LineItem is the probe relation, then by keeping that one part record in memory we guarantee that we do not need to write out 10% of LineItem. If a selection occurs on LineItem before the join, it may change the distribution of LineItem (the MCVs), but they are probably still a good estimate of the MCVs in the derived LineItem relation. (We did experiments on sampling the first few thousand tuples of the probe relation to determine the MCVs dynamically, but generally found this inaccurate due to non-random samples.)
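As a rough illustration of the flush-on-overflow behavior described above, here is a Python sketch (not the actual PostgreSQL C code; the budget constant, function name, and the choice to stop special-casing MCVs after a flush are assumptions made for the example):

```python
# Sketch: build tuples whose join key is a probe-side MCV go into a small
# in-memory MCV table; if that table exceeds its memory budget, its entries
# are flushed back into the main hash table, which resolves overflow by
# batching as the current code does. All names here are hypothetical.

MCV_BUDGET = 3  # max tuples the MCV table may hold (tiny for illustration)

def build_tables(build_tuples, probe_mcvs):
    """Partition build tuples between the MCV table and the main table."""
    mcv_table = {}    # key -> build tuples kept pinned in memory
    main_table = {}   # stands in for the (batched) main hash table
    n_mcv = 0
    for key, payload in build_tuples:
        if key in probe_mcvs:
            mcv_table.setdefault(key, []).append(payload)
            n_mcv += 1
            if n_mcv > MCV_BUDGET:
                # Overflow: flush the MCV table back to the main table,
                # which will re-partition / batch as usual.
                for k, tuples in mcv_table.items():
                    main_table.setdefault(k, []).extend(tuples)
                mcv_table.clear()
                n_mcv = 0
                # Assumption: after a flush, stop special-casing MCVs so
                # the degenerate all-same-value build table cannot loop.
                probe_mcvs = set()
        else:
            main_table.setdefault(key, []).append(payload)
    return mcv_table, main_table
```

With a build table consisting entirely of one MCV value, the MCV table overflows immediately and everything lands in the main table, matching the "consistent with the current hash join" behavior described above.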
In essence, the goal is to smartly pick the tuples that remain in the in-memory batch before probing begins. Since the number of MCVs is small, incorrectly selecting build tuples to remain in memory has negligible cost.

If we assume that LineItem has been filtered so much that it is now smaller than Part and becomes the build relation, then the MCV approach does not apply. There is no skew in Part on partkey (since it is the PK), and knowing the MCV partkeys in LineItem does not help because each of them joins with only a single tuple in Part. In this case, the MCV approach should not be used because no benefit is possible, and it will not be used because there will be no MCVs for Part.partkey. The bad case with MCV hash table overflow requires a many-to-many join between the two relations, which would not occur with the more typical PK-FK joins.

--
Dr. Ramon Lawrence
Assistant Professor
Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]
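The Part/LineItem argument above can be checked with a toy simulation (Python, purely illustrative; the batch count, table sizes, hash function, and the 10% skew figure are assumptions for the example, not measurements of the patch):

```python
# Toy model: one popular part accounts for ~10% of LineItem. In a
# multi-batch hybrid hash join, any probe tuple hashed to a batch other
# than batch 0 must be written out -- unless its key is held in the
# in-memory MCV table. All constants below are made up for illustration.
import random

random.seed(42)                # deterministic toy data
N_LINEITEMS = 10_000
N_PARTS = 1_000
N_BATCHES = 4                  # assume a 4-batch join
POPULAR_PART = 1               # the one popular partkey

# Probe relation: 10% of LineItem rows reference the popular part,
# the rest are spread uniformly over the parts.
lineitems = [POPULAR_PART if random.random() < 0.10
             else random.randrange(1, N_PARTS)
             for _ in range(N_LINEITEMS)]

def spilled_probe_tuples(probe_keys, in_memory_mcvs):
    """Probe tuples that must be written out: not matched by the
    in-memory MCV table and hashed to a batch other than batch 0."""
    return sum(1 for key in probe_keys
               if key not in in_memory_mcvs and key % N_BATCHES != 0)

without_mcv = spilled_probe_tuples(lineitems, set())
with_mcv = spilled_probe_tuples(lineitems, {POPULAR_PART})
# Keeping the single popular Part tuple in memory avoids writing out
# every LineItem row that joins with it -- roughly 10% of the relation.
```

The difference between the two counts is exactly the number of probe tuples carrying the popular key, which is the dump-and-reload work Tom's message identifies as the source of the benefit.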
histojoin_v3.patch
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers