Hi Jon, I think to me it is a good news that we can increase hawq_rm_nvseg_perquery_perseg_limit to improve performance when accessing randomly distributed table. I think this limit is just a upper limit for random table. In my opinion, it is not active when considering a hash distributed table, and it is not considered when deciding the bucket number of a hash table.
So, even setting 24 as high as you mentioned, I think hash table always follows its bucket number to acquire virtual segments. I think Hubert ( hzh...@pivotal.io) can provide you more information how to decide bucket number of a hash distributed table and how to decide number of virtual segments for a query accessing mixed distributed tables. I want to mention another case that we have tight resource or busy workload, query for random distributed table will not get stable number of virtual segments as a hash table, even when hawq_rm_nvseg_perquery_perseg_limit is set as high as 24. Best, Yi On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <jrobe...@pivotal.io> wrote: > I've been testing TPC-DS queries and found that I can get Randomly > Distributed tables to outperform Hash Distributed tables by increasing > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as 24. > > For Hash Distributed tables, 24 is way too high. It is also not a great > idea to make the default so high in case users are creating a mix of Random > and Hash Distributed Tables. > > Would it be possible to make this one GUC separated into two so that you > can leave it 6 for Hash Distributed tables but another value like 16 for > Randomly Distributed tables? > > This enhancement would also make it possible for later improvements in the > optimizer to determine how many vsegs to use. For example, some queries > worked best set to 12 while others greatly benefited when set to 24. > > > Jon Roberts >