Re: How efficient is memory allocation in tez.
> I wasn't being limited by memory, but I tried to get the memory usage of
> each Tez task down so it could spawn more tasks (it didn't). Giving Tez
> more or less memory didn't really improve performance. How would one go
> about finding the limiting factor on the performance of a job? Would job
> counters be the only way in Tez?

Hive uses a lot of in-memory processing, particularly for broadcast JOINs and group-by aggregations. Cutting down memory is likely to hit a tipping point where smaller JVMs are much slower, due to having to shuffle more data. I usually use 4Gb containers for everything.

YARN allocates memory according to the scheduler min-allocation, so lowering the Tez container size can waste memory if the size is not a multiple of the min-alloc.

A great resource for monitoring this closely is the YARN JMX output - on a YARN NodeManager, there's http://workernode0:8042/jmx, which gives the max/min/std-dev of memory usage of each container.

In hive-1.0 + Tez, Hive works out how much memory was actually returned by YARN and tries to resize Xmx to 80% of that. You can ask for a 513Mb container, and if you get a 1Gb container, the Xmx will be set to 800Mb instead of assuming 400Mb based on the resource request.

We're slowly building a generic counter-analysis kit (TEZ-2076) for Tez, which tracks things somewhat more closely than visually inspecting counters. I want to use that to detect skewed keys, because we track keys/records per edge - so I get to find out if a reducer is slow because it had ~10x the values of the others, but ~1x the keys.

Until then, the primary tool for Hive would be to look at the Swimlanes in Tez:

https://github.com/apache/tez/tree/master/tez-tools/swimlanes

This generates something which looks like this:

http://people.apache.org/~gopalv/query27.svg

It doesn't work with failed queries, but for successful queries this is extremely useful, as each box is a clickable link to that particular task's log.
I also debug Tez locality delay allocations, speedups during container reuse, etc. with that.

Cheers,
Gopal
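To make the container-rounding and Xmx arithmetic described above concrete, here is a minimal sketch. This is an illustrative calculation, not Hive's or YARN's actual code; the 512 MB min-allocation is an assumed scheduler setting.

```python
import math

def yarn_granted_mb(requested_mb, min_alloc_mb=512):
    # YARN rounds each request up to a multiple of the scheduler
    # min-allocation, so an odd-sized request wastes the difference.
    return math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb

def hive_xmx_mb(granted_mb):
    # hive-1.0 + Tez sizes Xmx from what YARN actually granted,
    # at 80% of the container (per the thread above).
    return int(granted_mb * 0.8)

granted = yarn_granted_mb(513)   # ask for 513 MB, get a full 1 GB container
xmx = hive_xmx_mb(granted)       # Xmx sized at 80% of the grant, ~800 MB
```

The point of the 80% rule is that the JVM is sized from the grant, not the request, so the extra memory from rounding is not left idle.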
Re: How efficient is memory allocation in tez.
> I have a map join in which the smaller tables together are 200 MB, and I am
> trying to have one block of the main table be processed by one Tez task.
> ... What am I missing, and is this even the right way of approaching the
> problem?

You need to be more specific about the Hive version.

Hive-13 needs ~6x the amount of map-join memory for Tez compared to Hive-14. The hive-1.0 branch is a bit better at estimating map-join sizes as well, since it counts the memory overheads of JavaDataModel. Hive-1.1 got a little worse, which will get fixed when we get to hive-1.2.

But for the 1.x line, the approximate size of data that fits within a map-join is (container Xmx - io.sort.mb)/3.

This plays into the NewRatio settings in JDK7 as well - make sure you have set the new ratio so the young generation is only 1/8th of the memory instead of the default 1/3rd (which means 30% of your memory cannot be used by the sort buffer or the map-join, since those are tenured data).

Also, running "ANALYZE TABLE tbl COMPUTE STATISTICS;" on the small tables will fill in the uncompressed size fields, so that we don't estimate map-joins based on zlib sizes (which, coincidentally, are ~3x off).

And if you still keep getting heap errors, I can take a look at it if you have a .hprof.bz2 file to share, and fix any corner cases we might've missed.

Cheers,
Gopal

PS: The current trunk implements a Grace HashJoin, which is another approach to the memory-limit problem - a more traditional solution than fixing memory sizes.
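As a rough sketch of the 1.x sizing rule quoted above - illustrative only; the 3x zlib ratio comes from the thread, while the example Xmx and io.sort.mb values are assumptions:

```python
def mapjoin_budget_mb(container_xmx_mb, io_sort_mb):
    # Hive 1.x rule of thumb from the thread: the data that fits
    # in a map-join is roughly (container Xmx - io.sort.mb) / 3.
    return (container_xmx_mb - io_sort_mb) / 3

def in_memory_estimate_mb(zlib_on_disk_mb, compression_ratio=3.0):
    # The zlib-compressed on-disk size undercounts the in-memory
    # footprint by roughly 3x; ANALYZE TABLE ... COMPUTE STATISTICS
    # fills in real uncompressed sizes so Hive need not guess.
    return zlib_on_disk_mb * compression_ratio

budget = mapjoin_budget_mb(container_xmx_mb=3276, io_sort_mb=512)
small_tables = in_memory_estimate_mb(200)   # 200 MB on disk, ~600 MB in memory
fits = small_tables <= budget
```

Under these assumed numbers the 200 MB small tables still fit, but note how quickly the divide-by-3 and the ~3x decompression eat into a container.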
How efficient is memory allocation in tez.
My Tez query seems to error out. I have a map join in which the smaller tables together are 200 MB, and I am trying to have one block of the main table be processed by one Tez task.

I am using the following formula to calculate the Tez container size:

small table size + each block size + memory for sort + some more space for threads + non-heap

allocating 40% of the total for threads and other heap objects, and 20% for non-heap.

I tried doing 1:2 and 1:3 (calculating for two blocks but setting tez.am.min-size to one block); those error out too. What am I missing, and is this even the right way of approaching the problem?
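One reading of the sizing formula in the question can be sketched like this - a hypothetical back-of-the-envelope calculation, where the 256 MB block and 512 MB sort buffer are assumed values, not figures from the thread:

```python
def container_size_mb(small_tables_mb, block_mb, sort_mb):
    # Data portion of the heap: small tables + one block of the
    # main table + the sort buffer.  Per the question, 40% of the
    # total goes to threads and other heap objects and 20% to
    # non-heap, leaving the data as the remaining 40% of the total.
    data = small_tables_mb + block_mb + sort_mb
    return data / (1 - 0.40 - 0.20)

size = container_size_mb(small_tables_mb=200, block_mb=256, sort_mb=512)
```

With these assumed inputs the formula asks for roughly a 2.4 GB container; as the replies note, the request is then rounded up to the YARN min-allocation, and the map-join's in-memory footprint can be ~3x the on-disk size.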