[ https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381175#comment-17381175 ]
Indhumathi commented on CARBONDATA-4239:
----------------------------------------

MV can be used for real-time data loading, even for data arriving every 15 minutes, but with larger batches. If you use INSERT to add a single row every 5/15 minutes, it will not give much benefit. As I already suggested in previous comments, you can still use MV for your scenario with manual refresh.

> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly
> -----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4239
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core, data-load
>    Affects Versions: 2.1.1
>         Environment: RHEL spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1
>            Reporter: Sushant Sammanwar
>            Priority: Major
>              Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team,
> We are doing a POC with Carbondata using MV.
> Our MV does not contain an AVG function, because we wanted to use the incremental-refresh feature.
> But with incremental refresh, we noticed that the MV does not aggregate values correctly:
> if a row is inserted, it creates another row in the MV instead of adding the incremental value.
> As a result, the number of rows in the MV is almost the same as in the raw table.
> This does not happen with a full-refresh MV.
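For reference, the manual-refresh workflow suggested in the comment above looks roughly like this. This is a sketch based on the CarbonData MV guide, not the reporter's actual DDL: the MV name and SELECT are schematic placeholders (the real MV buckets `ts` into 30-minute windows), so substitute your own definition.

```sql
-- Create the MV with deferred (manual) refresh instead of automatic
-- incremental refresh; the SELECT below is schematic:
CREATE MATERIALIZED VIEW fact_365_1_eutrancell_21_30_minute
WITH DEFERRED REFRESH
AS SELECT tags_id, metric, ts,
          SUM(value) AS sum_value, MIN(value) AS min_value, MAX(value) AS max_value,
          ts2
   FROM fact_365_1_eutrancell_21
   GROUP BY tags_id, metric, ts, ts2;

-- Then rebuild it on your own schedule, e.g. after each batch of INSERTs:
REFRESH MATERIALIZED VIEW fact_365_1_eutrancell_21_30_minute;
```

With deferred refresh the MV is rebuilt as a whole on demand, so the per-row-INSERT duplication described below does not arise, at the cost of refresh latency.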
> Below is the data in the MV, with 3 rows:
>
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric|                 ts|         sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:30:00|5412.6810000000005|   31.345| 4578.112|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 05:30:00|         1176.7035| 392.2345| 392.2345|         2020-09-25 05:30:00|
> |            ff6cb0f7-fba0-413...|           eUtranCell.HHO.X2...|2020-09-25 06:00:00|            58.112|   58.112|   58.112|         2020-09-25 05:30:00|
> +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
>
> Below, I am inserting data for the 6th hour; it should add incremental values to the 6th-hour row of the MV.
> Note the data being inserted: the columns that are part of the GROUP BY clause have the same values as the existing data.
>
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25 05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: {"time":"June 28, 2021 4:01:31 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:33 AUDIT audit: {"time":"June 28, 2021 4:01:33 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=====================================================>(199 + 1) / 200]
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
> 21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
> +----------+
> |Segment ID|
> +----------+
> |         8|
> +----------+
>
> Below we can see it has added another row for 2020-09-25 06:00:00.
> Note: all values of the columns that are part of the GROUP BY clause are the same.
> This means there should have been a single row for 2020-09-25 06:00:00.
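The expected behavior can be sketched independently of CarbonData: an incremental refresh should fold the new segment's partial aggregates into the existing MV row that has the same GROUP BY key, rather than appending a second row. A minimal Python sketch of that merge, using the values from this report (the function and dict layout are illustrative, not CarbonData internals):

```python
# Sketch of the merge an incremental MV refresh is expected to perform:
# partial (sum, min, max) aggregates for an identical GROUP BY key are
# combined into one row, not appended. Hypothetical helper, for illustration.

def merge_segment(mv, segment):
    """Fold a new segment's {group_key: (sum, min, max)} rows into the MV."""
    for key, (s, mn, mx) in segment.items():
        if key in mv:
            cur_s, cur_mn, cur_mx = mv[key]
            mv[key] = (cur_s + s, min(cur_mn, mn), max(cur_mx, mx))
        else:
            mv[key] = (s, mn, mx)
    return mv

key = ("ff6cb0f7-fba0-4134-81ee-55e820574627",
       "eUtranCell.HHO.X2.InterFreq.PrepAttOut",
       "2020-09-25 06:00:00")

# Existing MV row for the 06:00:00 bucket (from the first SELECT):
mv = {key: (58.112, 58.112, 58.112)}
# Partial aggregate from the new segment created by inserting 118.112:
seg = {key: (118.112, 118.112, 118.112)}

merge_segment(mv, seg)
# Single merged row: sum = 176.224 (up to float rounding), min = 58.112,
# max = 118.112 -- exactly what the full MV rebuild produces below.
```

The bug reported here is that instead of this merge, the refresh leaves both the 58.112 and 118.112 rows in the MV for the same group key.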
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|58.112            |58.112   |58.112   |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|118.112           |118.112  |118.112  |2020-09-25 05:30:00         |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
>
> scala> carbon.sql("select * from fact_365_1_eutrancell_21").show(1000,false)
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |ts                 |metric                                |tags_id                             |value   |ts2                |
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
> |2020-09-25 06:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|31.345  |2020-09-25 05:30:00|
> |2020-09-25 06:40:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|745.112 |2020-09-25 05:30:00|
> |2020-09-25 06:50:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|4578.112|2020-09-25 05:30:00|
> |2020-09-25 06:55:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
> |2020-09-25 06:25:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
> |2020-09-25 06:05:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|118.112 |2020-09-25 05:30:00|
> +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
>
> After dropping and re-creating the MV, we can see a single row for 2020-09-25 06:00:00:
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
> |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|176.224           |58.112   |118.112  |2020-09-25 05:30:00         |
> +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
>
> Please check what the issue is with the incremental-refresh MV.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)