+1 for this incremental allocation proposal. Also, our whole budgeting / memory management model was version 0 - and we need to think about version 1 sometime in the not-so-distant future. (Once upon a time we imagined an intelligent memory controller overseeing the union of buffer cache, working memories for queries, and in-memory components, and working with a future version of the query optimizer to make more intelligent choices. In the absence of statistics, workload info, and cost info, or runtime observations thereof, this was our first brain-dead approach to getting something running that we could improve later - and it seems to be getting towards later now. :-))

On 3/10/16 2:55 PM, Yingyi Bu wrote:
A more fundamental question: Is it possible that all those datasets share
a global budget in a multi-tenant way?
In principle, the budget should just be an upper bound. If a dataset doesn't
need that much, it shouldn't pre-allocate all
"storage.memorycomponent.numpages"
pages.

However, in the current implementation, we pre-allocate all in-memory pages
upfront:
https://github.com/apache/incubator-asterixdb-hyracks/blob/master/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/VirtualBufferCache.java#L247

I think we should fix it to dynamically allocate memory when needed.  (The
disk buffer cache already does that.)
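
Just to illustrate what I mean by "allocate when needed" (this is only a
sketch with made-up names, not the actual VirtualBufferCache code): grow
the pool lazily up to the budget instead of materializing every page up
front:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical sketch: pages are materialized on first use and reused
    // afterwards, so a small dataset never touches most of its budget.
    public class LazyPagePool {
        private final int pageSize;
        private final int maxPages;            // the budget upper bound
        private int allocatedPages = 0;        // pages materialized so far
        private final Deque<byte[]> freePages = new ArrayDeque<>();

        public LazyPagePool(int pageSize, int maxPages) {
            this.pageSize = pageSize;
            this.maxPages = maxPages;
        }

        // Hand out a page only when a caller actually needs one.
        public synchronized byte[] acquirePage() {
            byte[] page = freePages.pollFirst();
            if (page != null) {
                return page;                   // reuse a returned page
            }
            if (allocatedPages < maxPages) {
                allocatedPages++;
                return new byte[pageSize];     // grow lazily, within budget
            }
            return null;                       // budget exhausted; flush/evict
        }

        public synchronized void releasePage(byte[] page) {
            freePages.addFirst(page);          // keep for reuse
        }
    }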

Best,
Yingyi


On Thu, Mar 10, 2016 at 2:46 PM, Jianfeng Jia <[email protected]>
wrote:

Dear Devs,

I have some questions about the memory management of the in-memory
components for different datasets.

The current AsterixDB backing the cloudberry demo goes down every few days.
It always throws an exception like the following:
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Failed
to open index with resource ID 7 since it does not exist.

As described in ASTERIXDB-1337, each dataset has a fixed budget no matter
how small or big it is. The number of datasets that can be loaded at the
same time is therefore also fixed: $number =
storage.memorycomponent.globalbudget/storage.memorycomponent.numpages. My
question is: if we have more than $number datasets, will eviction happen?
Will it evict the entire dataset of the victim? Based on the symptom of the
above exception, it seems that the metadata got evicted? Could we protect
the metadata from eviction?
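
(To make the arithmetic concrete with made-up numbers, not actual defaults:
if the page size were 128 KB and storage.memorycomponent.numpages = 256,
each dataset would reserve 256 * 128 KB = 32 MB; with a 512 MB global
budget, $number = 512 MB / 32 MB = 16, so loading a 17th dataset would
force an eviction.)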

A more fundamental question: Is it possible that all those datasets share
a global budget in a multi-tenant way?
In my workload there is one main dataset (~10 GB) and five tiny auxiliary
datasets (each < 20 MB). In addition, the client creates a bunch of
temporary datasets depending on how many concurrent users there are, and
each temp-dataset is "refreshed" for a new query. (The refresh is done by
dropping and re-creating the temp-dataset.) It's hard to find one
storage.memorycomponent.numpages value that makes every dataset happy.
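
What I have in mind is roughly one shared pool that every dataset draws
from, so a tiny dataset only consumes what it actually touches. A minimal
sketch (made-up names; nothing like this exists in the codebase today):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical shared global budget: all in-memory components reserve
    // pages from one pool instead of a fixed per-dataset quota.
    public class GlobalMemoryBudget {
        private final AtomicLong remainingPages;

        public GlobalMemoryBudget(long totalPages) {
            this.remainingPages = new AtomicLong(totalPages);
        }

        // A dataset claims one page; false means the global budget is
        // exhausted and some component should be flushed or evicted.
        public boolean tryReservePage() {
            long current;
            do {
                current = remainingPages.get();
                if (current == 0) {
                    return false;
                }
            } while (!remainingPages.compareAndSet(current, current - 1));
            return true;
        }

        public void releasePage() {
            remainingPages.incrementAndGet();
        }
    }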



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


