I created https://issues.apache.org/jira/browse/HAWQ-1299 to track this. When we have an agreement about the limit, I will fix it at once.
Please go there and comment on this improvement. Thank you.

Yi

On Mon, Jan 30, 2017 at 3:56 PM, Yi Jin <y...@pivotal.io> wrote:

> Jon, sorry for the late response, I find I missed this one.
>
> With regard to the memory quota of one vseg, I think I understand your consideration now. It makes sense. What is your expectation for the memory limit? 32GB or 64GB? I think I can open a JIRA to track this and make a decision on which limit we can extend to.
>
> And for hawq_rm_nvseg_perquery_limit, if you don't want this to limit your query resources in a large cluster, just set it to a large value, 65535 for example. Then I think you can achieve your goal, that is, using hawq_rm_nvseg_perquery_perseg_limit to dynamically limit the number of vsegs for one query.
>
> Yi
>
> On Tue, Jan 24, 2017 at 3:21 AM, Jon Roberts <jrobe...@pivotal.io> wrote:
>
>> It is common in Greenplum to use "explain analyze" to look at the query plan, and if the query spills to disk, then increasing the statement memory will typically increase the speed of the query. With hash distributed tables in HAWQ, this will be true too, and it is the only way, short of rewriting the query, to make it perform better.
>>
>> For the really large Greenplum clusters out there, we don't see benefits from running 8 segments per host. Instead, we dial this back to 4 segments because the network traffic in the interconnect becomes the bottleneck. For HAWQ, where the clusters will likely be much bigger, I think we will have the same network limitations. So with random distribution, we will probably reduce the number of vsegs to get better performance on the network side and then use more memory per vseg to prevent spilling to disk and use more of the available resources. That is why we need the ability to allocate more than 16GB of RAM per vseg.
>>
>> On a different but related note, I would like to see hawq_rm_nvseg_perquery_limit changed so that it determines the number of vsegs per query not across the entire cluster but at the datanode level. That would make it much easier to expand and shrink a cluster without having to adjust GUCs each time.
>>
>> Jon Roberts
>> Principal Engineer | jrobe...@pivotal.io | 615-426-8661
>>
>> On Sun, Jan 22, 2017 at 10:05 PM, Yi Jin <y...@pivotal.io> wrote:
>>
>> > Hi Jon,
>> >
>> > For scenario 1, it is true. However, do you think it is necessary to give one vseg such a high memory quota? Of course, I think if we force hawq_rm_stmt_nvseg to a value different from the bucket number, the optimizer will give a different plan. This is a way to force high concurrency and high resource occupation.
>> >
>> > For scenario 2, I think if we want to release more concurrency in executing one query, we should advise customers to increase hawq_rm_nvseg_perquery_limit (default 512) and hawq_rm_nvseg_perquery_perseg_limit (default 6), then choose an appropriate vseg memory quota. As a result, HAWQ will decide to use more concurrency and more resources for big queries automatically. I mean HAWQ will choose the number of virtual segments automatically, not always a high number of virtual segments.
>> >
>> > For scenario 3, it is recommended not to assign all system memory to HAWQ on the segment side. It is a good idea to leave some for the system and for other applications.
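As a concrete illustration of the session-level tuning suggested above for scenario 2, here is a minimal sketch; the values are the ones quoted in this thread, not recommendations, and it assumes these resource-manager GUCs can be set per session (otherwise they belong in hawq-site.xml):

    -- Lift the cluster-wide cap so it no longer constrains queries on a large cluster
    SET hawq_rm_nvseg_perquery_limit = 65535;
    -- Let the per-segment limit (default 6) drive how many vsegs each query gets
    SET hawq_rm_nvseg_perquery_perseg_limit = 6;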
>> >
>> > I think it is fine to support a higher memory limit for one virtual segment, but I still don't see why it is necessary.
>> >
>> > As HAWQ uses a different memory allocation strategy from Greenplum, it is true that the external behavior of HAWQ is different. The virtual segment (vseg) is a critical concept in HAWQ for managing resource consumption and query QE concurrency. In Greenplum, the QE concurrency is fixed, i.e. the number of QEs for one query execution is fixed. This is why a Greenplum user just needs to think about memory consumption in one segment, as all segments consume the same amount of memory for the same query. In HAWQ, this number is dynamic: you can easily change the resource queue definition or the statement resource definition to change it, and you don't have to install and configure more segments on those nodes to force QE concurrency.
>> >
>> > Consider a scenario where many small queries arrive concurrently: HAWQ allocates just a small number of vsegs for each query without wasting resources, and this lets HAWQ support high concurrency much better than a traditional MPP-architecture database and achieve high throughput. Conversely, for large queries, HAWQ can automatically increase the number of vsegs to a very high number if you set those limit GUCs high as well.
>> >
>> > Best,
>> > Yi
>> >
>> > On Mon, Jan 23, 2017 at 12:30 PM, Jon Roberts <jrobe...@pivotal.io> wrote:
>> >
>> > > I've been thinking about these scenarios:
>> > >
>> > > 1. Hash distributed tables with a fixed number of buckets.
>> > > If the tables were built using the defaults, buckets = 6 * number of nodes. So you basically have 6 vsegs per host. Multiply that by 16GB and you can only use 96GB of the 256GB of RAM per node.
>> > >
>> > > 2. A user has randomly distributed tables but doesn't understand that they can increase the number of vsegs.
>> > > This will be common for users coming from Greenplum. They again can only set statement memory to 16GB, so they are stuck with a maximum of 96GB of RAM usage.
>> > >
>> > > 3. A user increases vsegs and statement memory.
>> > > They possibly run out of memory if too aggressive with the settings.
>> > >
>> > > - I think we should be able to specify statement memory higher than 16GB. Maybe the limit should be something much higher, such as 1TB.
>> > >
>> > > - The optimizer should limit the number of vsegs based on the statement memory setting to prevent OOM. You could do the opposite too (limit memory and use the vseg setting provided). Greenplum can limit the amount of memory, but we have two dials to adjust with vsegs and memory.
>> > >
>> > > Jon Roberts
>> > >
>> > > On Sun, Jan 22, 2017 at 5:20 PM, Yi Jin <y...@pivotal.io> wrote:
>> > >
>> > > > Hi Jon,
>> > > >
>> > > > That GUC limit means that for one virtual segment, the maximum consumable memory is 16GB. For one segment/node, there may be multiple vsegs allocated to run queries, so if a node has 256GB expected to be consumed by HAWQ, it will have at most 16 vsegs running concurrently.
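The per-node arithmetic behind the numbers in these two messages, using only the figures quoted in the thread (256GB of RAM per node, the 16GB per-vseg cap, and the 6 vsegs per host implied by the default bucket count):

    256GB per node / 16GB per vseg  = at most 16 vsegs running concurrently per node
    6 vsegs per host * 16GB per vseg = 96GB usable of the 256GB per node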
>> > > >
>> > > > hawq_rm_stmt_vseg_memory sets statement-level vseg memory consumption; hawq_rm_stmt_nvseg must be specified as well. Only when hawq_rm_stmt_nvseg is greater than 0 is hawq_rm_stmt_vseg_memory activated, regardless of the target resource queue's vseg resource quota definition. For example, you can set hawq_rm_stmt_vseg_memory to 16GB and hawq_rm_stmt_nvseg to 256; if you have a cluster of 16 nodes with 256GB each and your target resource queue can use 100% of the cluster resources, you will have 16 vsegs running per node to consume all memory resources for this query.
>> > > >
>> > > > Best,
>> > > > Yi
>> > > >
>> > > > On Sat, Jan 21, 2017 at 3:40 PM, Lei Chang <lei_ch...@apache.org> wrote:
>> > > >
>> > > > > hawq_rm_stmt_vseg_memory and hawq_rm_stmt_nvseg need to be used together to set the specific number of segments and the vseg memory. And hawq_rm_stmt_nvseg should be less than hawq_rm_nvseg_perquery_perseg_limit.
>> > > > >
>> > > > > set hawq_rm_stmt_vseg_memory = '2GB'; set hawq_rm_stmt_nvseg = 6;
>> > > > >
>> > > > > 16GB looks somewhat small for big dedicated machines: if 16GB is the per-virtual-segment memory and 8 segments are used, it only uses 128GB.
>> > > > >
>> > > > > Cheers
>> > > > > Lei
>> > > > >
>> > > > > On Fri, Jan 20, 2017 at 9:11 PM, Jon Roberts <jrobe...@pivotal.io> wrote:
>> > > > >
>> > > > > > Why is there a limit of 16GB for hawq_rm_stmt_vseg_memory? A cluster with 256GB per node and dedicated to HAWQ may certainly want to utilize more memory per segment. Is there something I'm missing regarding statement memory?
>> > > > > >
>> > > > > > Secondly, does the number of vsegs for a query get influenced by the statement memory, or does it just look at the plan and hawq_rm_nvseg_perquery_perseg_limit?
>> > > > > >
>> > > > > > Jon Roberts
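Putting the statement-level override discussed in this thread together, a minimal sketch for Yi's 16-node, 256GB-per-node example; it assumes both GUCs are settable with SET in the session, as the `set ...` example earlier in the thread suggests, and keeps the per-vseg value within the current 16GB cap that HAWQ-1299 proposes to raise:

    -- Both GUCs act together: hawq_rm_stmt_vseg_memory only takes effect
    -- while hawq_rm_stmt_nvseg is greater than 0.
    SET hawq_rm_stmt_vseg_memory = '16GB';  -- per-vseg quota (the current cap)
    SET hawq_rm_stmt_nvseg = 256;           -- 256 vsegs / 16 nodes = 16 vsegs per node; 16 * 16GB = 256GB per node
    -- ... run the big query here ...
    SET hawq_rm_stmt_nvseg = 0;             -- back to 0 deactivates the statement-level override
    RESET hawq_rm_stmt_vseg_memory;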