On Mon, Oct 20, 2008 at 4:42 PM, Lawrence, Ramon <[EMAIL PROTECTED]> wrote:
> We propose a patch that improves hybrid hash join's performance for large
> multi-batch joins where the probe relation has skew.
>
> Project name: Histojoin
> Patch file: histojoin_v1.patch
>
> This patch implements the Histojoin join algorithm as an optional feature
> added to the standard Hybrid Hash Join (HHJ). A flag is used to enable or
> disable the Histojoin features. When Histojoin is disabled, HHJ acts as
> normal. The Histojoin features allow HHJ to use PostgreSQL's statistics to
> do skew aware partitioning. The basic idea is to keep build relation tuples
> in a small in-memory hash table that have join values that are frequently
> occurring in the probe relation. This improves performance of HHJ when
> multiple batches are used by 10% to 50% for skewed data sets. The
> performance improvements of this patch can be seen in the paper (pages
> 25-30) at:
>
> http://people.ok.ubc.ca/rlawrenc/histojoin2.pdf
>
> All generators and materials needed to verify these results can be provided.
>
> This is a patch against the HEAD of the repository.
>
> This patch does not contain platform specific code. It compiles and has
> been tested on our machines in both Windows (MSVC++) and Linux (GCC).
>
> Currently the Histojoin feature is enabled by default and is used whenever
> HHJ is used and there are Most Common Value (MCV) statistics available on
> the probe side base relation of the join. To disable this feature simply
> set the enable_hashjoin_usestatmcvs flag to off in the database
> configuration file or at run time with the 'set' command.
>
> One potential improvement not included in the patch is that Most Common
> Value (MCV) statistics are only determined when the probe relation is
> produced by a scan operator. There is a benefit to using MCVs even when the
> probe relation is not a base scan, but we were unable to determine how to
> find statistics from a base relation after other operators are performed.
>
> This patch was created by Bryce Cutt as part of his work on his M.Sc.
> thesis.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of British
> Columbia Okanagan
> E-mail: [EMAIL PROTECTED]
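
For anyone following the thread, the skew-aware partitioning idea in the quoted proposal can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the patch's code: the function name skew_aware_hash_join, its parameters, and the use of plain Python lists to stand in for batches spilled to disk are all hypothetical, and probe_mcvs is a stand-in for whatever MCV statistics the planner would supply.

```python
from collections import defaultdict


def skew_aware_hash_join(build, probe, probe_mcvs, num_batches=4):
    """Toy inner join of (key, payload) tuples on key.

    probe_mcvs is a set of join keys assumed to be frequent on the probe
    side (hypothetical stand-in for MCV statistics). Returns the joined
    rows and the number of probe tuples that would have been spilled.
    """
    # Build tuples matching a probe-side MCV stay in memory for the whole join.
    mcv_table = defaultdict(list)
    # Everything else is partitioned by hash; batch 0 models the in-memory
    # batch, batches 1..n-1 model partitions written to disk.
    build_batches = [defaultdict(list) for _ in range(num_batches)]

    for key, payload in build:
        if key in probe_mcvs:
            mcv_table[key].append(payload)
        else:
            build_batches[hash(key) % num_batches][key].append(payload)

    results = []
    probe_batches = [[] for _ in range(num_batches)]
    spilled = 0

    # First pass over the probe relation.
    for key, payload in probe:
        if key in probe_mcvs:
            # Frequent keys are joined immediately and never spilled.
            for b in mcv_table.get(key, ()):
                results.append((key, b, payload))
            continue
        batch = hash(key) % num_batches
        if batch == 0:
            for b in build_batches[0].get(key, ()):
                results.append((key, b, payload))
        else:
            probe_batches[batch].append((key, payload))
            spilled += 1

    # Remaining batches are processed one at a time, as in a plain
    # multi-batch hash join.
    for i in range(1, num_batches):
        for key, payload in probe_batches[i]:
            for b in build_batches[i].get(key, ()):
                results.append((key, b, payload))

    return results, spilled
```

Calling the function once with a set of hot keys and once with an empty set gives the same joined rows, but the spilled count drops when the hot keys are kept in memory, which is the effect the proposal attributes to skewed probe relations.
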
I'm interested in trying to review this patch. Having not done patch review before, I can't exactly promise grand results, but could you provide me with the data to check your results? In the meantime I'll go read the paper.

- Josh / eggyknap