[
https://issues.apache.org/jira/browse/ASTERIXDB-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281114#comment-15281114
]
Wenhai edited comment on ASTERIXDB-1433 at 5/12/16 2:17 AM:
------------------------------------------------------------
The I/O statistics come from the iostat command, which reports an average read
speed of 160MB/s (hot run) or 60MB/s (cold run). That is, after we aggregate a
60GB table, reloading it for another aggregation takes at least 600s. Of
course, one can question whether our disk system is configured too slowly, but
we do have a large amount of memory, which is not that expensive.
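
As a rough back-of-envelope check of these figures, the sketch below estimates how long a full reload of the table takes at each reported rate. It assumes a purely sequential scan and 1 GB = 1000 MB; both are simplifying assumptions, not measurements, and the function name is only illustrative.

```python
def reload_seconds(table_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read table_gb from disk at throughput_mb_s.

    Assumes a sequential scan and 1 GB = 1000 MB (simplifying assumptions).
    """
    return table_gb * 1000 / throughput_mb_s


hot_run = reload_seconds(60, 160)   # 60GB table at the hot-run rate
cold_run = reload_seconds(60, 60)   # 60GB table at the cold-run rate
print(f"hot: {hot_run:.0f}s, cold: {cold_run:.0f}s")
```

Under these assumptions the two rates give 375s and 1000s, which bracket the 600s figure quoted above.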
Best,
Wenhai
was (Author: lwhay):
The I/O statistics come from the iostat command, which reports an average read
speed of 160MB/s (hot run) or 60MB/s (cold run). That is, after we aggregate a
60GB table, reloading it for another aggregation takes at least 600s. Of
course, one can question whether our disk system is configured too slowly, but
we do have a large amount of memory, which is not that expensive.
> Multiple cores with huge memory slow down in the big fact table aggregation.
> ----------------------------------------------------------------------------
>
> Key: ASTERIXDB-1433
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1433
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: Hyracks Core
> Environment: 10 nodes x Linux Ubuntu, 6 CPUs x 4 cores per CPU, 128 GB
> memory per node.
> Reporter: Wenhai
>
> This is a typical hardware platform that holds a TB-scale dataset in
> total. AsterixDB does extremely well on complex queries that include
> multiple join operators over a high-selectivity select operator. However, the
> execution traces show that, despite the large memory configuration, the
> original tables are always reloaded from disk into memory, even if they
> were processed by the most recent query. To this end, why not provide a
> strategy that keeps the intermediate data of the last completed query in
> memory and frees it when memory is insufficient for a new query? In some
> cases, the user will repeatedly run a query with different parameters on
> the same tables, for example, a varying-parameter aggregation on a single
> big fact table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)