[ 
https://issues.apache.org/jira/browse/CARBONDATA-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushant Sammanwar updated CARBONDATA-4187:
------------------------------------------
    Comment: was deleted

(was: [^Testing-Results-loading-stats_Carbon.xlsx]

Adding test results, details)

> Performance Issue with Materialized views - increased loading time due to 
> full refresh
> --------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4187
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4187
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0
>            Reporter: Sushant Sammanwar
>            Priority: Major
>              Labels: materializedviews, performance
>
> Hi Team,
> We have been running a POC with Carbon 2.1.0: we wrote wrapper code around 
> Carbon and deployed it as a Docker container.
> Data is being loaded concurrently into many tables.
> Our objective is to get optimal performance for aggregated queries by using 
> materialized views.
> Our observation is that after creating MVs, data loading is slow and cannot 
> keep up with the pace of incoming data.
> The process also consumes a lot of memory once MVs are created.
> Data is received continuously and the MVs are refreshed, which results in 
> increased load time.
> Ideally, MVs should only perform an incremental refresh, since old data does 
> not need to be recalculated.
> But it seems that a full refresh is happening, causing high memory usage and 
> increased loading time.
> Testing involved loading data without MVs for 6 hours, then creating MVs and 
> loading data again for 4 hours.
> Loading time with MVs increased, thereby creating a backlog of data (only 
> 1/5th of the expected number of rows was loaded).
> Below are the major bottlenecks observed:
> 1. High Memory consumption after creating MVs
> 2. MVs doing a full refresh
> Please find attached the details of the testing, with the list of tables.
> Below is the definition of the table:
> create table if not exists fact_365_1_eutrancell_1 (ts timestamp, metric 
> STRING, tags_id STRING, value DOUBLE, epoch bigint) partitioned by (ts2 
> timestamp) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='metric')
> Below is the definition of the MV:
> create materialized view if not exists fact_365_1_eutrancell_1_hour as select 
> tags_id ,metric,timeseries(ts,'hour') as 
> ts,sum(value),avg(value),min(value),max(value) from fact_365_1_eutrancell_1 
> group by metric, tags_id, timeseries(ts,'hour')
> Can you suggest why MV creation is slowing down ingestion so much and what 
> can be done to improve it?
> Is there any way to have an incremental refresh of the MV, i.e. refresh only 
> the hour for which we are loading the data?
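
For illustration only, here is a minimal Python sketch of what an incremental refresh would mean for the hourly MV quoted above. It is not CarbonData code and says nothing about how CarbonData actually refreshes MVs. The names `Agg`, `hour_bucket`, `apply_batch` and `average`, as well as the sample rows, are hypothetical. The idea is that each (metric, tags_id, hour) group keeps a running sum, count, min and max, so a newly loaded batch only updates the groups it touches and avg is derived from sum / count without rereading old rows.

```python
from collections import namedtuple
from datetime import datetime

# Partial aggregate kept per (metric, tags_id, hour) group (hypothetical layout).
# avg is derived as total / count, so old rows never need to be rescanned.
Agg = namedtuple("Agg", "total count minimum maximum")


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def apply_batch(view, rows):
    """Merge newly loaded rows into the view and return the group keys touched."""
    touched = set()
    for ts, metric, tags_id, value in rows:
        key = (metric, tags_id, hour_bucket(ts))
        old = view.get(key)
        if old is None:
            view[key] = Agg(value, 1, value, value)
        else:
            view[key] = Agg(old.total + value, old.count + 1,
                            min(old.minimum, value), max(old.maximum, value))
        touched.add(key)
    return touched


def average(agg):
    """Derive avg(value) from the stored partial aggregate."""
    return agg.total / agg.count


# Hypothetical sample data: a first load, then a second load that touches
# an existing hour and a new one.
view = {}
apply_batch(view, [
    (datetime(2021, 5, 1, 10, 5), "m1", "t1", 2.0),
    (datetime(2021, 5, 1, 10, 40), "m1", "t1", 4.0),
])
touched = apply_batch(view, [
    (datetime(2021, 5, 1, 11, 15), "m1", "t1", 10.0),
    (datetime(2021, 5, 1, 10, 55), "m1", "t1", 6.0),
])
```

In this sketch the second load updates only the two hour groups it touches. The cost of a refresh therefore scales with the size of the new batch rather than with the size of the fact table, which is the behaviour the question above asks about.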



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
