Re: Choosing values for multivariate MCV lists

2019-07-02 Thread Tomas Vondra
On Mon, Jul 01, 2019 at 12:02:28PM +0100, Dean Rasheed wrote: On Sat, 29 Jun 2019 at 14:01, Tomas Vondra wrote: >However, it looks like the problem is with mcv_list_items()'s use >of %f to convert to text, which is pretty ugly. >>>There's one issue with the signature, though -

Re: Choosing values for multivariate MCV lists

2019-07-01 Thread Dean Rasheed
On Sat, 29 Jun 2019 at 14:01, Tomas Vondra wrote: > > >However, it looks like the problem is with mcv_list_items()'s use > >of %f to convert to text, which is pretty ugly. > > >>>There's one issue with the signature, though - currently the function > >>>returns null flags as bool

Re: Choosing values for multivariate MCV lists

2019-06-29 Thread Tomas Vondra
On Tue, Jun 25, 2019 at 11:18:19AM +0200, Tomas Vondra wrote: On Mon, Jun 24, 2019 at 02:54:01PM +0100, Dean Rasheed wrote: On Mon, 24 Jun 2019 at 00:42, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed

Re: Choosing values for multivariate MCV lists

2019-06-25 Thread Tomas Vondra
On Mon, Jun 24, 2019 at 02:54:01PM +0100, Dean Rasheed wrote: On Mon, 24 Jun 2019 at 00:42, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote: On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote:

Re: Choosing values for multivariate MCV lists

2019-06-24 Thread Dean Rasheed
On Mon, 24 Jun 2019 at 00:42, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote: On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote: One annoying thing I noticed is that the

Re: Choosing values for multivariate MCV lists

2019-06-23 Thread Tomas Vondra
On Mon, Jun 24, 2019 at 01:42:32AM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote: On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote: One annoying thing I noticed is that the base_frequency

Re: Choosing values for multivariate MCV lists

2019-06-23 Thread Tomas Vondra
On Sat, Jun 22, 2019 at 04:10:52PM +0200, Tomas Vondra wrote: On Fri, Jun 21, 2019 at 08:50:33AM +0100, Dean Rasheed wrote: On Thu, 20 Jun 2019 at 23:35, Tomas Vondra wrote: On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote: I'm not sure it's easy to justify ordering by

Re: Choosing values for multivariate MCV lists

2019-06-23 Thread Tomas Vondra
On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote: On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote: On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote: One annoying thing I noticed is that the base_frequency tends to end up being 0, most likely due to getting too small.

Re: Choosing values for multivariate MCV lists

2019-06-23 Thread Tomas Vondra
On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote: On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote: One annoying thing I noticed is that the base_frequency tends to end up being 0, most likely due to getting too small. It's a bit strange, though, because with statistic target set

Re: Choosing values for multivariate MCV lists

2019-06-23 Thread Dean Rasheed
On Sat, 22 Jun 2019 at 15:10, Tomas Vondra wrote: One annoying thing I noticed is that the base_frequency tends to end up being 0, most likely due to getting too small. It's a bit strange, though, because with statistic target set to 10k the smallest frequency for a single column is

Re: Choosing values for multivariate MCV lists

2019-06-22 Thread Tomas Vondra
On Fri, Jun 21, 2019 at 08:50:33AM +0100, Dean Rasheed wrote: On Thu, 20 Jun 2019 at 23:35, Tomas Vondra wrote: On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote: I'm not sure it's easy to justify ordering by Abs(freq-base_freq)/freq though, because that would seem likely to put

Re: Choosing values for multivariate MCV lists

2019-06-21 Thread Dean Rasheed
On Thu, 20 Jun 2019 at 23:35, Tomas Vondra wrote: On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote: I'm not sure it's easy to justify ordering by Abs(freq-base_freq)/freq though, because that would seem likely to put too much weight on the least commonly occurring

Re: Choosing values for multivariate MCV lists

2019-06-20 Thread Tomas Vondra
On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote: On Tue, 18 Jun 2019 at 21:59, Tomas Vondra wrote: The current implementation of multi-column MCV lists (added in this cycle) uses a fairly simple algorithm to pick combinations to include in the MCV list. We just compute a minimum

Re: Choosing values for multivariate MCV lists

2019-06-19 Thread Dean Rasheed
On Tue, 18 Jun 2019 at 21:59, Tomas Vondra wrote: The current implementation of multi-column MCV lists (added in this cycle) uses a fairly simple algorithm to pick combinations to include in the MCV list. We just compute a minimum number of occurrences, and then include all entries

Choosing values for multivariate MCV lists

2019-06-18 Thread Tomas Vondra
Hi, The current implementation of multi-column MCV lists (added in this cycle) uses a fairly simple algorithm to pick combinations to include in the MCV list. We just compute a minimum number of occurrences, and then include all entries sampled more often. See get_mincount_for_mcv_list(). By