[jira] [Commented] (CARBONDATA-4132) Numer of records not matching in MVs

Indhumathi (Jira) Wed, 14 Jul 2021 23:31:05 -0700


    [ 
https://issues.apache.org/jira/browse/CARBONDATA-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381080#comment-17381080
 ]


Indhumathi commented on CARBONDATA-4132:
----------------------------------------

Please refer to the comment I added in CARBONDATA-4239. It explains how to use 
MVs more effectively for your scenario, so that you get both the storage and 
the performance benefits.

> Numer of records not matching in MVs
> ------------------------------------
>
>                 Key: CARBONDATA-4132
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4132
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.0.1
>         Environment: Apache carbondata 2.0.1
>            Reporter: suyash yadav
>            Priority: Major
>             Fix For: 2.0.1
>
>
> Hi Team,
> We are working on a POC where we need to insert 300K records/second into a 
> table on which we have already created timeseries MVs with minute, hour and 
> day granularity.
>  
> As per our expectation, the minute-based MV should contain 300K records until 
> the next minute's data is inserted. Likewise, the hour- and day-based MVs 
> should contain 300K records until the next hour's and next day's data arrive, 
> respectively.
>  
> But the count of records in the MV does not match our expectation. It is 
> always higher than expected.
> The strange thing is that when we drop the MV and recreate it after 
> inserting the data into the table, the record count comes out correct. So it 
> is clear there is no problem with the MV definition or the data.
>  
> Kindly help us resolve this issue on priority. Please find more details 
> below:
> Table definition:
> ===========
> spark.sql("create table Flow_Raw_TS(export_ms bigint,exporter_ip 
> string,pkt_seq_num bigint,flow_seq_num int,src_ip string,dst_ip 
> string,protocol_id smallint,src_tos smallint,dst_tos smallint,raw_src_tos 
> smallint,raw_dst_tos smallint,src_mask smallint,dst_mask smallint,tcp_bits 
> int,src_port int,in_if_id bigint,in_if_entity_id bigint,in_if_enabled 
> boolean,dst_port int,out_if_id bigint,out_if_entity_id bigint,out_if_enabled 
> boolean,direction smallint,in_octets bigint,out_octets bigint,in_packets 
> bigint,out_packets bigint,next_hop_ip string,bgp_src_as_num 
> bigint,bgp_dst_as_num bigint,bgp_next_hop_ip string,end_ms timestamp,start_ms 
> timestamp,app_id string,app_name string,src_ip_group string,dst_ip_group 
> string,policy_qos_classification_hierarchy string,policy_qos_queue_id 
> bigint,worker_id int,day bigint ) stored as carbondata TBLPROPERTIES 
> ('local_dictionary_enable'='false')")
> MV definition:
> ==============
> +*Minute based*+
> spark.sql("create materialized view Flow_Raw_TS_agg_001_min as select 
> timeseries(end_ms,'minute') as 
> end_ms,src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,dst_ip_group,protocol_id,bgp_src_as_num,
>  bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id,sum(in_octets) as octects, sum(in_packets) as packets, 
> sum(out_packets) as out_packets, sum(out_octets) as out_octects FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'minute'),src_ip,dst_ip,app_name,in_if_id,src_tos,src_ip_group,
> dst_ip_group,protocol_id,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy,
>  policy_qos_queue_id").show()
> +*Hour Based*+
> val startTime = System.nanoTime
> spark.sql("create materialized view Flow_Raw_TS_agg_001_hour as select 
> timeseries(end_ms,'hour') as end_ms,app_name,sum(in_octets) as octects, 
> sum(in_packets) as packets, sum(out_packets) as out_packets, sum(out_octets) 
> as out_octects, in_if_id,src_tos,src_ip_group, 
> dst_ip_group,protocol_id,src_ip, dst_ip,bgp_src_as_num, 
> bgp_dst_as_num,policy_qos_classification_hierarchy, policy_qos_queue_id FROM 
> Flow_Raw_TS group by 
> timeseries(end_ms,'hour'),in_if_id,app_name,src_tos,src_ip_group,dst_ip_group,protocol_id,src_ip,
>  dst_ip,bgp_src_as_num,bgp_dst_as_num,policy_qos_classification_hierarchy, 
> policy_qos_queue_id").show()
> val endTime = System.nanoTime
> val elapsedSeconds = (endTime - startTime) / 1e9d
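
To make the row-count expectation in the description concrete, here is a minimal sketch in plain Scala, with no Spark or CarbonData dependency. It only illustrates two general facts about GROUP BY aggregates: the result has one row per distinct group key (not one per raw record), and aggregating each batch separately and then concatenating the results can give more rows than aggregating all the data once. Whether CarbonData stores per-load partial aggregates in this way is an assumption here, not something this thread confirms. The `Flow` case class, the object name and the sample values are hypothetical and cover only one dimension and one measure of Flow_Raw_TS.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical, cut-down record: one dimension and one measure of Flow_Raw_TS.
final case class Flow(endMs: Instant, srcIp: String, inOctets: Long)

object MvRowCountSketch {
  // Group key of a minute-level aggregate: (minute bucket, dimension).
  def minuteKey(f: Flow): (Instant, String) =
    (f.endMs.truncatedTo(ChronoUnit.MINUTES), f.srcIp)

  // One output row per distinct group key, with the measure summed.
  def aggregate(rows: Seq[Flow]): Map[(Instant, String), Long] =
    rows.groupBy(minuteKey).map { case (k, v) => k -> v.map(_.inOctets).sum }

  def main(args: Array[String]): Unit = {
    val t = Instant.parse("2021-07-14T10:00:00Z")
    // Two separate loads whose records fall into the same minute.
    val batch1 = Seq(
      Flow(t.plusSeconds(1), "10.0.0.1", 100L),
      Flow(t.plusSeconds(2), "10.0.0.2", 50L))
    val batch2 = Seq(Flow(t.plusSeconds(30), "10.0.0.1", 25L))

    // Aggregating everything at once (like recreating the MV after loading).
    val rebuilt = aggregate(batch1 ++ batch2)
    // Aggregating each load on its own and keeping the partial results.
    val perBatch = aggregate(batch1).toSeq ++ aggregate(batch2).toSeq

    println(s"rows when aggregated once:      ${rebuilt.size}")  // 2
    println(s"rows when aggregated per batch: ${perBatch.size}") // 3

    // Re-aggregating the partial rows gives the same totals as the rebuild.
    val merged = perBatch.groupBy(_._1).map { case (k, v) => k -> v.map(_._2).sum }
    assert(merged == rebuilt)
  }
}
```

If the MV does hold partial aggregates per load, a plain count(*) on it would exceed the count obtained after a rebuild, while sums re-aggregated over the same group keys would still agree.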



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
