Hyunsik Choi created TAJO-900:
---------------------------------
Summary: Reducing memory usage during query processing
Key: TAJO-900
URL: https://issues.apache.org/jira/browse/TAJO-900
Project: Tajo
Issue Type: Bug
Components: physical operator, storage
Reporter: Hyunsik Choi
Fix For: 0.9.0
Currently, we have used tuple structures implemented as Java objects. It
internally uses Datum objects. Current Tuple structure occupies in JVM heap
space. As a result, it is hard to control memory usage, and it is impossible to
predict garbage collection. This problem usually becomes severe when Tajo deals
with very large data in relatively small cluster and lots of grouping or join
keys.
I've tried various tests and I made some prototype to show the possibility to
eliminate this problem.
The main idea is as follows:
* Do not use Datum class in expression evaluation. Instead, we should use java
primitive type values
** It will significantly reduces object creations and memory usages
* Redesign Tuple using direct memory allocation (DirectByteBuffer or
Unsafe.allocateMemory)
** It allows each worker to control memory usages during in-memory
operations like sort and hash aggregation/joins.
** It enables column values to be stored in adjacent memory, improving cache
locality.
In order to achieve the above idea, we should do as follows:
* implement an alternative to EvalNode framework
* Design new tuple data structure using direct memory allocation
* Refactor in-memory sort, hash aggregation, hash join operators
This is an umbrella issue. I'll create subtasks, and I've already started some
issues. I'll use this jira to track them.
--
This message was sent by Atlassian JIRA
(v6.2#6252)