[ 
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380685#comment-17380685
 ] 

Sushant Sammanwar commented on CARBONDATA-4239:
-----------------------------------------------

Thanks [~Indhumathi27] for your response.

If it is expected that the MV writes data to a new segment, then what benefit 
is the MV giving here?
I have data being inserted every 15 minutes, and for the hourly MV all 4 rows 
are present in the parent table as well as in the MV.
I do not get any benefit in terms of storage.
As far as query time is concerned, since the number of rows in the MV is the 
same, a query will take the same time on the MV as on the parent table.
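
My understanding (an assumption on my part, not something confirmed in the docs) 
is that the incremental-refresh MV stores per-segment partial aggregates and the 
query rewrite re-aggregates them, so results stay correct, but the row count then 
tracks the load pattern rather than the GROUP BY cardinality. A quick way to 
confirm the rewrite is still applied, using the same carbon session (column and 
granularity names assumed from the table dumps below):

scala> // If the rewrite is applied, the printed plan should reference the
scala> // MV table fact_365_1_eutrancell_21_30_minute.
scala> carbon.sql("explain select tags_id, metric, timeseries(ts,'thirty_minute'), sum(value), min(value), max(value) from fact_365_1_eutrancell_21 group by tags_id, metric, timeseries(ts,'thirty_minute')").show(false)

Even so, the storage question remains: with one MV row per key per load, the MV 
is barely smaller than the raw table.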

> Carbondata 2.1.1 MV : Incremental refresh : Does not aggregate data correctly 
> -----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4239
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core, data-load
>    Affects Versions: 2.1.1
>         Environment: RHEL  spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1 
>            Reporter: Sushant Sammanwar
>            Priority: Major
>              Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team ,
> We are doing a POC with CarbonData using MV.
> Our MV does not contain the AVG function, because we wanted to utilize the 
> incremental refresh feature; a sketch of the MV definition is given below.
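> (The exact DDL is not reproduced here; as a minimal sketch, assuming the 
> timeseries UDF at thirty_minute granularity, which the MV name and column 
> aliases suggest, the MV was created along these lines:)
> scala> carbon.sql("create materialized view fact_365_1_eutrancell_21_30_minute as select tags_id, metric, timeseries(ts,'thirty_minute') as ts, sum(value) as sum_value, min(value) as min_value, max(value) as max_value, ts2 from fact_365_1_eutrancell_21 group by tags_id, metric, timeseries(ts,'thirty_minute'), ts2")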
> But with incremental refresh, we noticed the MV does not aggregate values 
> correctly.
> If a row is inserted, it creates another row in the MV instead of adding the 
> incremental value.
> As a result, the number of rows in the MV is almost the same as in the raw table.
> This does not happen with a full refresh MV. 
> Below is the data in the MV, with 3 rows:
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric|                 ts|         sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:30:00|5412.6810000000005|   31.345| 4578.112|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 05:30:00|         1176.7035| 392.2345| 392.2345|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:00:00|            58.112|   58.112|   58.112|         2020-09-25 05:30:00|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> Below, I am inserting data for the 6th hour; it should add incremental 
> values to the 6th-hour row of the MV.
> Note the data being inserted: the columns which are part of the GROUP BY 
> clause have the same values as the existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25
>  05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: {"time":"June 28, 2021 4:01:31 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:33 AUDIT audit: {"time":"June 28, 2021 4:01:33 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=====================================================>(199 + 1) / 200]
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
> +----------+
> |Segment ID|
> +----------+
> | 8|
> +----------+
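> As a side check (my own addition, assuming the SHOW SEGMENTS syntax from the 
> CarbonData segment management docs applies to the MV table), each incremental 
> refresh should show up as its own segment on the MV table:
> scala> carbon.sql("show segments for table fact_365_1_eutrancell_21_30_minute").show(false)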
> Below we can see it has added another row for 2020-09-25 06:00:00.
> Note: all the columns which are part of the GROUP BY clause have the same values,
> which means there should have been a single row for 2020-09-25 06:00:00.
> scala> carbon.sql("select * from 
> fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id |fact_365_1_eutrancell_21_metric |ts 
> |sum_value |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  06:30:00|5412.6810000000005|31.345 |4578.112 |2020-09-25 05:30:00 |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  05:30:00|1176.7035 |392.2345 |392.2345 |2020-09-25 05:30:00 |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  06:00:00|58.112 |58.112 |58.112 |2020-09-25 05:30:00 |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  06:00:00|118.112 |118.112 |118.112 |2020-09-25 05:30:00 |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
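> If the partial rows are meant to be combined at query time, then re-aggregating 
> the MV over its GROUP BY keys (a check of my own, not from the docs) should 
> merge the two 2020-09-25 06:00:00 rows back into one:
> scala> carbon.sql("select fact_365_1_eutrancell_21_tags_id, fact_365_1_eutrancell_21_metric, ts, sum(sum_value), min(min_value), max(max_value), fact_365_1_eutrancell_21_ts2 from fact_365_1_eutrancell_21_30_minute group by fact_365_1_eutrancell_21_tags_id, fact_365_1_eutrancell_21_metric, ts, fact_365_1_eutrancell_21_ts2").show(1000,false)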
> scala> carbon.sql("select * from fact_365_1_eutrancell_21").show(1000,false)
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |ts |metric |tags_id |value |ts2 |
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |2020-09-25 
> 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25
>  05:30:00|
> |2020-09-25 
> 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25
>  05:30:00|
> |2020-09-25 
> 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25
>  05:30:00|
> |2020-09-25 
> 06:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|31.345
>  |2020-09-25 05:30:00|
> |2020-09-25 
> 06:40:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|745.112
>  |2020-09-25 05:30:00|
> |2020-09-25 
> 06:50:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|4578.112|2020-09-25
>  05:30:00|
> |2020-09-25 
> 06:55:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112
>  |2020-09-25 05:30:00|
> |2020-09-25 
> 06:25:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112
>  |2020-09-25 05:30:00|
> |2020-09-25 
> 06:05:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|118.112
>  |2020-09-25 05:30:00|
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
>  
> After dropping and recreating the MV, we can see a single row for 
> 2020-09-25 06:00:00.
> scala> carbon.sql("select * from 
> fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id |fact_365_1_eutrancell_21_metric |ts 
> |sum_value |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  06:30:00|5412.6810000000005|31.345 |4578.112 |2020-09-25 05:30:00 |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  05:30:00|1176.7035 |392.2345 |392.2345 |2020-09-25 05:30:00 |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25
>  06:00:00|176.224 |58.112 |118.112 |2020-09-25 05:30:00 |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
>  
> Please check what the issue is with the incremental refresh MV.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
