> wasn't being limited by memory, but I tried to get the memory usage of
> each Tez task down so it could spawn more tasks (it didn't). Giving
> Tez more or less memory didn't really improve the performance.
> How would one go about finding out the limiting factor on the performance
> of a job? Would job counters be the only way in Tez?

Hive uses a lot of in-memory processing, particularly for broadcast JOINs
and group-by aggregations.
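
For example (the property names are standard Hive settings, but the values
here are purely illustrative, not tuned recommendations), the pieces that eat
heap are bounded by the broadcast-join conversion threshold and map-side
aggregation:

  # hive.auto.convert.join.noconditionaltask.size: bytes of small-table data
  #   Hive will load into an in-memory hash table for a broadcast join
  # hive.map.aggr: keep map-side hash aggregation for GROUP BY enabled
  hive --hiveconf hive.auto.convert.join.noconditionaltask.size=268435456 \
       --hiveconf hive.map.aggr=true \
       -f my_query.sql   # my_query.sql is just a placeholder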

Cutting memory down is likely to hit a tipping point where smaller JVMs
become much slower, because more data has to be shuffled. I usually use 4 GB
containers for everything.
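
For instance (a sketch, not a recommendation - 4096 is just the 4 GB figure
above, and my_query.sql is a placeholder):

  # hive.tez.container.size is the YARN container request per Tez task, in MB
  hive --hiveconf hive.tez.container.size=4096 -f my_query.sql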

YARN allocates memory in multiples of the scheduler's minimum allocation, so
lowering the Tez container size can waste memory if the requested size is not
a multiple of the min-allocation.
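
A quick way to sanity-check that (the config path is an assumption - adjust
for your distro):

  # see what the scheduler rounds requests up to
  grep -A1 yarn.scheduler.minimum-allocation-mb /etc/hadoop/conf/yarn-site.xml
  # e.g. with min-allocation = 1024 MB, a 1536 MB container request is
  # rounded up to 2048 MB, and the extra 512 MB is simply wasted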

A great resource for monitoring this closely is the YARN JMX output - each
YARN NodeManager serves http://workernode0:8042/jmx, which reports the
max/min/std-dev of memory usage for each container.
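
Something like this pulls those numbers out (the exact bean and attribute
names vary by Hadoop version, so treat the grep pattern as a starting point):

  # dump the NodeManager JMX as JSON; ?qry= narrows it to the Hadoop beans
  curl -s 'http://workernode0:8042/jmx?qry=Hadoop:*' | grep -i pmem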

In hive-1.0 + Tez, Hive works out how much memory YARN actually granted and
resizes Xmx to ~80% of that. You can ask for a 513 MB container, and if you
get a 1 GB container, Xmx will be set to ~800 MB instead of the ~400 MB
implied by the original resource request.
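
In other words (numbers from the example above; my_query.sql is a placeholder
as before, and hive.tez.java.opts is only needed if you want to pin Xmx
yourself):

  # request 513 MB -> min-allocation rounding grants a 1024 MB container
  # hive-1.0 then sizes the JVM off what was actually granted:
  #   0.8 * 1024 MB ~= 820 MB Xmx, not 0.8 * 513 MB ~= 410 MB
  hive --hiveconf hive.tez.container.size=513 -f my_query.sql
  # to override the automatic sizing, set hive.tez.java.opts (e.g. -Xmx820m)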

We're slowly building a generic counter analysis kit for Tez (TEZ-2076),
which tracks these things somewhat more closely than visually inspecting
counters.

I want to use that to detect skewed keys, because we track keys/records per
edge - so I can find out whether a reducer is slow because it had ~10x the
values of the others, but only ~1x the keys.
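
You can already eyeball that ratio by hand from the per-task counters - Tez's
REDUCE_INPUT_RECORDS vs. REDUCE_INPUT_GROUPS. Depending on where your history
logging goes, something like this over the aggregated logs can surface it
(the application id is a placeholder):

  # a reducer with ~10x the records but ~1x the groups of its peers is skewed
  yarn logs -applicationId application_1234567890123_0042 \
    | grep -E 'REDUCE_INPUT_RECORDS|REDUCE_INPUT_GROUPS'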

Until then, the primary tool for Hive would be to look at the Tez Swimlanes
tool:

https://github.com/apache/tez/tree/master/tez-tools/swimlanes
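
Rough workflow (the application id is a placeholder; check the swimlanes
README for the exact swimlane.py invocation):

  # grab the aggregated logs, which include the Tez AM's HISTORY events
  yarn logs -applicationId application_1234567890123_0042 > app.log
  # then point swimlane.py at app.log to render the SVG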


This generates something that looks like this:

http://people.apache.org/~gopalv/query27.svg

It doesn't work with failed queries, but for successful queries this is
extremely useful as each box is a clickable link to that particular task's
log.

I also use it to debug Tez locality delay allocations, speedups during
container reuse, etc.

Cheers,
Gopal 