Re: How efficient is memory allocation in tez.

2015-04-10 Thread Gopal Vijayaraghavan
 wasn't being limited by memory, but I tried to get the memory usage of
each tez task down so it could spawn more tasks (it didn't). Giving
tez more or less memory didn't really improve the performance.
 How would one go about finding out the limiting factor on the performance
of a job? Would job counters be the only way in tez?

Hive uses a lot of in-memory processing, particularly for broadcast JOINs
and group-by aggregations.

Cutting down memory is likely to have a tipping point where smaller JVMs
are much slower, due to having to shuffle more data. I usually use 4Gb
containers for everything.
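
If you want to pin that down explicitly, the usual knobs are the container
size and the JVM opts (a sketch, assuming a 4Gb container; on hive-1.0 the
Xmx is derived automatically, as described below):

    set hive.tez.container.size=4096;
    set hive.tez.java.opts=-Xmx3276m;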

YARN allocates memory in multiples of the scheduler min-allocation, so
lowering the tez container size can waste memory if the new size is not a
multiple of the min-alloc.
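
For example (hypothetical numbers), a request that isn't a multiple of the
min-alloc gets rounded up and the difference is wasted:

    yarn.scheduler.minimum-allocation-mb = 1024
    hive.tez.container.size = 1536   -> YARN grants 2048 MB (~512 MB wasted)
    hive.tez.container.size = 2048   -> no rounding loss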

A great resource for monitoring this closely is the YARN JMX output - on
YARN NodeManager, there's http://workernode0:8042/jmx, which gives out
max/min/std-dev of memory usage of each container.
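
You can also pull that straight from the command line; the JMX servlet takes
an optional qry= filter (the exact bean names vary by Hadoop version, so
treat the filter below as an assumption):

    curl http://workernode0:8042/jmx
    curl 'http://workernode0:8042/jmx?qry=Hadoop:service=NodeManager,name=*'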

In hive-1.0 + tez, it works out how much memory was actually returned by
YARN and tries to resize Xmx to 80% of that. You can ask for a 513Mb
container and if you get a 1Gb container, the Xmx will be set to 800Mb
instead of assuming 400Mb based on the resource request.
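
Roughly, and assuming the fraction is still the 80% default
(tez.container.max.java.heap.fraction, if I recall the property name right):

    ask 513 MB -> YARN grants 1024 MB -> Xmx ~= 0.8 * 1024 MB ~= 800 MB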

We're slowly building a generic Counter analysis kit (TEZ-2076) for Tez,
which tracks things slightly closer than just visually inspecting counters.

I want to use that to detect skewed keys, because we track keys/records
per-edge - so I can find out if a reducer is slow because it had ~10x the
values of the others, but only ~1x the keys.

Until then, the primary tool for Hive would be to look at the Swimlanes in
Tez.

https://github.com/apache/tez/tree/master/tez-tools/swimlanes
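
A minimal sketch of running it, assuming the yarn-swimlanes.sh helper in that
directory and a made-up application id (check the tool's README for the exact
invocation):

    sh yarn-swimlanes.sh application_1428000000000_0001 > query.svg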


This generates something which looks like this:

http://people.apache.org/~gopalv/query27.svg

It doesn't work with failed queries, but for successful queries this is
extremely useful as each box is a clickable link to that particular task's
log.

I also use it to debug Tez locality-delay allocations, speedups during
container reuse, and so on.

Cheers,
Gopal 


Re: How efficient is memory allocation in tez.

2015-04-06 Thread Gopal Vijayaraghavan

 I have a map join in which the smaller tables together are 200 MB, and I am
trying to have one block of the main table be processed by one tez task.
...
 What am I missing, and is this even the right way of approaching the
problem?

You need to be more specific about the Hive version. Hive-13 needs ~6x the
amount of map-join memory for Tez compared to Hive-14.

Hive-1.0 branch is a bit better at estimating map-join sizes as well,
since it counts the memory overheads of JavaDataModel.

Hive-1.1 got a little worse, which will get fixed when we get to hive-1.2.

But for the 1.x line, the approx size of data that fits within a map-join
is (container Xmx - io.sort.mb)/3.
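
As a worked example (hypothetical numbers: a 4Gb container, so ~3276 MB of
Xmx, and io.sort.mb=1024):

    (3276 - 1024) / 3 ~= 750 MB of map-join hash tables

so the ~200 MB of small tables above ought to fit, at least by that estimate.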

This plays into the NewRatio setting in JDK7 as well: make sure you set the
new ratio so that the young generation is only ~1/8th of the heap instead of
the default 1/3rd (with the default, ~30% of your memory cannot be used by
the sort buffer or the map-join, since those are tenured data).
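
In JVM terms that is something like the following (the -Xmx value is just an
example for a 4Gb container; -XX:NewRatio=7 keeps the young generation at
~1/8th of the heap, versus the JDK7 server default of 2, i.e. 1/3rd):

    set hive.tez.java.opts=-Xmx3276m -XX:NewRatio=7;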

Also, running "ANALYZE TABLE tbl compute statistics;" on the small tables
will fill in the uncompressed size fields, so that we don't estimate
map-joins based on zlib sizes (which are coincidentally ~3x off).
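
For example, on each small dimension table (table name hypothetical), and you
can check afterwards that rawDataSize got populated:

    ANALYZE TABLE small_dim COMPUTE STATISTICS;
    DESCRIBE FORMATTED small_dim;   -- look for rawDataSize in Table Parameters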

And if you still keep getting heap errors, I can take a look if you have a
.hprof.bz2 file to share and fix any corner cases we might've missed.

Cheers,
Gopal
PS: The current trunk implements a Grace HashJoin, which is another approach
to the memory limit problem - a more traditional solution than fixing memory
sizes.
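
If you end up on a build that has it, it should be just a config switch; I
believe the property is the one below, but treat the exact name as an
assumption since this is still on trunk:

    set hive.mapjoin.hybridgrace.hashtable=true;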




How efficient is memory allocation in tez.

2015-04-05 Thread P lva
My tez query seems to error out.

I have a map join in which the smaller tables together are 200 MB, and I am
trying to have one block of the main table be processed by one tez task.

Using the following formula to calculate the tez container size

Small table size + each block size + memory for sort + some more space for
threads + non heap

allocating 40% of the total for threads and other heap objects, and 20% for
non-heap.
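
For illustration (hypothetical numbers: a 256 MB block and a 1 GB sort
buffer, and reading the 40% + 20% as fractions of the final container size):

    200 MB (small tables) + 256 MB (block) + 1024 MB (sort) ~= 1480 MB
    1480 MB / (1 - 0.4 - 0.2) ~= 3700 MB container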

Tried doing 1:2 and 1:3 (calculating for two blocks but setting
tez.am.min-size to one block). They error out too.

What am I missing, and is this even the right way of approaching the
problem?