Hi Jon,

To me it is good news that increasing
hawq_rm_nvseg_perquery_perseg_limit can improve performance when accessing
randomly distributed tables. This limit is just an upper bound for random
tables. In my opinion, it is not active for a hash-distributed table, and it
is not considered when deciding the bucket number of a hash table.
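
To illustrate the point above, here is a minimal sketch of raising the limit per session for a randomly distributed table (the table and column names are hypothetical; this assumes the GUC is session-settable with the usual SET syntax):

```sql
-- Raise the per-query, per-segment vseg cap for this session only.
-- It acts as an upper bound for randomly distributed tables.
SET hawq_rm_nvseg_perquery_perseg_limit = 24;

-- A randomly distributed table: the number of virtual segments is
-- chosen at run time, bounded by the limit above.
CREATE TABLE sales_random (id int, amount numeric)
DISTRIBUTED RANDOMLY;
```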

So, even with the limit set as high as 24, as you mentioned, a hash table
always follows its bucket number when acquiring virtual segments. Hubert (
hzh...@pivotal.io) can give you more information on how the bucket number
of a hash-distributed table is decided, and how the number of virtual
segments is decided for a query accessing tables with mixed distributions.
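
For reference, a sketch of how the bucket number is fixed for a hash-distributed table (the table definition is hypothetical; the GUC and WITH-clause option are the ones HAWQ uses for this purpose, hedged to the best of my knowledge):

```sql
-- Default bucket number applied when a hash table is created without
-- an explicit setting.
SHOW default_hash_table_bucket_number;

-- A hash-distributed table with an explicit bucket number: queries on
-- it use that many virtual segments, regardless of
-- hawq_rm_nvseg_perquery_perseg_limit.
CREATE TABLE sales_hash (id int, amount numeric)
WITH (bucketnum = 6)
DISTRIBUTED BY (id);
```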

I also want to mention another case: under tight resources or a busy
workload, a query on a randomly distributed table will not get a stable
number of virtual segments the way a hash table does, even when
hawq_rm_nvseg_perquery_perseg_limit is set as high as 24.

Best,
Yi

On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <jrobe...@pivotal.io> wrote:

> I've been testing TPC-DS queries and found that I can get Randomly
> Distributed tables to outperform Hash Distributed tables by increasing
> hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as 24.
>
> For Hash Distributed tables, 24 is way too high.  It is also not a great
> idea to make the default so high in case users are creating a mix of Random
> and Hash Distributed Tables.
>
> Would it be possible to make this one GUC separated into two so that you
> can leave it 6 for Hash Distributed tables but another value like 16 for
> Randomly Distributed tables?
>
> This enhancement would also make it possible for later improvements in the
> optimizer to determine how many vsegs to use.  For example, some queries
> worked best set to 12 while others greatly benefited when set to 24.
>
>
> Jon Roberts
>
