[jira] [Created] (CARBONDATA-4346) Performance degrade for incremental update operation
SHREELEKHYA GAMPA created CARBONDATA-4346:
------------------------------------------
Summary: Performance degrade for incremental update operation
Key: CARBONDATA-4346
URL: https://issues.apache.org/jira/browse/CARBONDATA-4346
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Steps] :- There is a performance degradation in the incremental update operation for a partition table. We used a Jinling table, loaded 100 CSV files, and executed 100 incremental update operations on the table.

[Expected Result] :- There should not be any performance degradation for the incremental update operation.

[Actual Issue] :- Performance degradation for the incremental update operation (partition table).

[Cause location] :- NA
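For context, the incremental updates exercised here follow CarbonData's standard UPDATE syntax; a minimal sketch of one such statement, with hypothetical table and column names (not taken from the report):

UPDATE part_table SET (col1) = ('newValue') WHERE part_col = 'p0';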
[jira] [Created] (CARBONDATA-4344) Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error
SHREELEKHYA GAMPA created CARBONDATA-4344:
------------------------------------------
Summary: Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error
Key: CARBONDATA-4344
URL: https://issues.apache.org/jira/browse/CARBONDATA-4344
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Steps] :- From spark beeline the queries are executed.

drop table if exists uniqdata;
CREATE TABLE uniqdata(CUST_ID int ,CUST_NAME string,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED AS carbondata;
LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
alter table uniqdata add columns(a int);
Take the store from HDFS.
Drop table uniqdata;
Put the store back in HDFS.
refresh table uniqdata;
drop MATERIALIZED VIEW if exists uniq2_mv;
create MATERIALIZED VIEW uniq2_mv as select CUST_NAME, sum(CUST_ID) from uniqdata group by CUST_NAME;

[Expected Result] :- Create MV should be successful for a table created in an older version.

[Actual Issue] :- Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error.
[jira] [Created] (CARBONDATA-4341) Drop Index Fails after TABLE RENAME
SHREELEKHYA GAMPA created CARBONDATA-4341:
------------------------------------------
Summary: Drop Index Fails after TABLE RENAME
Key: CARBONDATA-4341
URL: https://issues.apache.org/jira/browse/CARBONDATA-4341
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Drop Index fails after TABLE RENAME.

[Steps] :- From spark beeline the queries are executed.

drop table if exists uniqdata;
CREATE TABLE uniqdata(CUST_ID int ,CUST_NAME string,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED AS carbondata;
LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
create index uniq2_index on table uniqdata(CUST_NAME) as 'carbondata';
alter table uniqdata rename to uniqdata_i;
drop index if exists uniq2_index on uniqdata_i;

[Expected Result] :- Drop Index should succeed after TABLE RENAME.

[Actual Issue] :- Drop Index fails after TABLE RENAME.

Error message: Table or view 'uniqdata_i' not found in database 'default';
[jira] [Created] (CARBONDATA-4330) Incremental Dataload of Average aggregate in MV
SHREELEKHYA GAMPA created CARBONDATA-4330:
------------------------------------------
Summary: Incremental Dataload of Average aggregate in MV
Key: CARBONDATA-4330
URL: https://issues.apache.org/jira/browse/CARBONDATA-4330
Project: CarbonData
Issue Type: Improvement
Reporter: SHREELEKHYA GAMPA

Currently, whenever an MV is created with an average aggregate, a full refresh is done, meaning the whole MV is reloaded for any newly added segments. This slows down loading. With incremental data load, only the newly added segments need to be loaded into the MV. If avg is present, rewrite the query with the sum and count of the columns to create the MV, and use them to derive avg.

Refer: https://docs.google.com/document/d/1kPEMCX50FLZcmyzm6kcIQtUH9KXWDIqh-Hco7NkTp80/edit
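As an illustration of the proposed rewrite (table and column names are hypothetical): instead of materializing avg directly, the MV is created with sum and count, from which avg can be derived, and which can be aggregated incrementally per new segment.

create MATERIALIZED VIEW emp_avg_mv as select empno, sum(salary), count(salary) from emp group by empno;
-- a query such as: select empno, avg(salary) from emp group by empno
-- can then be answered from the MV as sum(salary) / count(salary),
-- and new segments only need their sum/count merged in incrementally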
[jira] [Created] (CARBONDATA-4328) Load parquet table with options error message fix
SHREELEKHYA GAMPA created CARBONDATA-4328:
------------------------------------------
Summary: Load parquet table with options error message fix
Key: CARBONDATA-4328
URL: https://issues.apache.org/jira/browse/CARBONDATA-4328
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

If a parquet table is created and a load statement with options is triggered, it fails with NoSuchTableException: Table ${tableIdentifier.table} does not exist.
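A minimal sketch of the failing scenario (table name, path, and the expected message are illustrative, not from the report):

create table parquet_tbl(id int, name string) stored as parquet;
LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv' into table parquet_tbl OPTIONS('DELIMITER'=',');
-- expected: a clear message that LOAD with OPTIONS is not supported for non-carbon tables
-- actual: NoSuchTableException: Table parquet_tbl does not exist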
[jira] [Created] (CARBONDATA-4327) Update documentation related to partition
SHREELEKHYA GAMPA created CARBONDATA-4327:
------------------------------------------
Summary: Update documentation related to partition
Key: CARBONDATA-4327
URL: https://issues.apache.org/jira/browse/CARBONDATA-4327
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Drop partition with data is not supported, and a few of the links are not working in https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md
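For reference, the statement the documentation needs to call out is of this form (table name and partition spec are hypothetical):

ALTER TABLE part_tbl DROP IF EXISTS PARTITION (country='US');
-- dropping a partition that still contains data is the unsupported case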
[jira] [Created] (CARBONDATA-4326) mv created in beeline not hitting in sql/shell and vice versa if both beeline and sql/shell are running in parallel
SHREELEKHYA GAMPA created CARBONDATA-4326:
------------------------------------------
Summary: mv created in beeline not hitting in sql/shell and vice versa if both beeline and sql/shell are running in parallel
Key: CARBONDATA-4326
URL: https://issues.apache.org/jira/browse/CARBONDATA-4326
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Steps] :- When an MV is created in spark-shell/spark-sql on a table created using a Spark DataFrame, an explain query hits the MV in spark-shell/spark-sql, but doesn't hit the MV in spark-beeline. The same is the case when the MV is created in spark-beeline on a table created using a Spark DataFrame: the query hits the MV in spark-beeline, but doesn't hit the MV in spark-shell/spark-sql. This issue is faced when both sessions are running in parallel during MV creation. On restarting the spark-shell/spark-beeline sessions, the query hits the MV in both sessions.

Queries

Table created using Spark DataFrame:

val geoSchema = StructType(Seq(StructField("timevalue", LongType, nullable = true), StructField("longitude", LongType, nullable = false), StructField("latitude", LongType, nullable = false)))
val geoDf = sqlContext.read.option("delimiter", ",").option("header", "true").schema(geoSchema).csv("hdfs://hacluster/geodata/geodata.csv")
sql("drop table if exists source_index_df").show()
geoDf.write
  .format("carbondata")
  .option("tableName", "source_index_df")
  .mode(SaveMode.Overwrite)
  .save()

Queries for MV created in spark-shell:

sql("CREATE MATERIALIZED VIEW datamap_mv1 as select latitude,longitude from source_index_df group by latitude,longitude").show()
sql("explain select latitude,longitude from source_index_df group by latitude,longitude").show(100,false)

Queries for MV created in spark-beeline/spark-sql:

CREATE MATERIALIZED VIEW datamap_mv1 as select latitude,longitude from source_index_df group by latitude,longitude;
explain select latitude,longitude from source_index_df group by latitude,longitude;

[Expected Result] :- An MV created in beeline should be hit from sql/shell, and vice versa, if both beeline and sql/shell are running in parallel.

[Actual Issue] :- An MV created in beeline is not hit from sql/shell, and vice versa, if both beeline and sql/shell are running in parallel.
[jira] [Created] (CARBONDATA-4322) Insert into local sort partition table select * from text table launches thousands of tasks
SHREELEKHYA GAMPA created CARBONDATA-4322:
------------------------------------------
Summary: Insert into local sort partition table select * from text table launches thousands of tasks
Key: CARBONDATA-4322
URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Reproduce steps]

1. CREATE TABLE partitionthree1 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int) STORED AS carbondata tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
2. CREATE TABLE partitionthree2 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int);
3. LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'TIMESTAMPFORMAT'='dd-MM-');
4. set hive.exec.dynamic.partition.mode=nonstrict;
5. insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
insert into partitionthree2 select * from partitionthree1;
6. insert into partitionthree1 select * from partitionthree2;

[Expect Result]
Step 6 launches only a number of tasks equal to the number of nodes.

[Current Behavior]
The number of tasks is far larger than the number of nodes.

[Impact]
At several production sites, query performance is impacted significantly.

[Initial analysis]
Insert into a non-partition local-sort table launches a number of tasks equal to the number of nodes; make the partition table behave the same.
[jira] [Created] (CARBONDATA-4298) IS_EMPTY_DATA_BAD_RECORD property not supported for complex types.
SHREELEKHYA GAMPA created CARBONDATA-4298:
------------------------------------------
Summary: IS_EMPTY_DATA_BAD_RECORD property not supported for complex types.
Key: CARBONDATA-4298
URL: https://issues.apache.org/jira/browse/CARBONDATA-4298
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

The {{IS_EMPTY_DATA_BAD_RECORD}} property is not supported for complex types. It is a flag that determines whether an empty record is to be considered a bad record or not.
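For reference, the flag is passed as a load option; a minimal sketch with hypothetical table, path, and schema:

LOAD DATA INPATH 'hdfs://hacluster/data/complex.csv' into table complex_tbl OPTIONS('IS_EMPTY_DATA_BAD_RECORD'='true','BAD_RECORDS_ACTION'='REDIRECT');
-- for complex-type columns in complex_tbl, this flag is currently not honoured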
[jira] [Created] (CARBONDATA-4292) Support spatial index creation using data frame
SHREELEKHYA GAMPA created CARBONDATA-4292:
------------------------------------------
Summary: Support spatial index creation using data frame
Key: CARBONDATA-4292
URL: https://issues.apache.org/jira/browse/CARBONDATA-4292
Project: CarbonData
Issue Type: New Feature
Reporter: SHREELEKHYA GAMPA

To support spatial index creation using data frame.
[jira] [Created] (CARBONDATA-4284) Load/insert after alter add column on partition table with complex column fails
SHREELEKHYA GAMPA created CARBONDATA-4284:
------------------------------------------
Summary: Load/insert after alter add column on partition table with complex column fails
Key: CARBONDATA-4284
URL: https://issues.apache.org/jira/browse/CARBONDATA-4284
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Insert after alter add column on a partition table with a complex column fails with bufferUnderFlowException.

[Steps] :-

drop table if exists strarmap1;
create table strarmap1(id int,str struct>,arr array>) PARTITIONED BY(name string) stored as carbondata tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,str,arr');
load data inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 partition(name='name0') options('fileheader'='id,name,str,arr','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');
select * from strarmap1 limit 1;
show partitions strarmap1;
ALTER TABLE strarmap1 ADD COLUMNS(map1 Map, map2 Map, map3 Map, map4 Map, map5 Map, map6 Map, map7 map>, map8 map>>);
load data inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 partition(name='name0') options('fileheader'='id,name,str,arr,map1,map2,map3,map4,map5,map6,map7,map8','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');

[Expected Result] :- Load after adding map columns on the partition table should succeed.

[Actual Issue] :- Error on load after adding map columns on the partition table.
[jira] [Created] (CARBONDATA-4282) Issues with table having complex columns related to long string, SI, local dictionary
SHREELEKHYA GAMPA created CARBONDATA-4282:
------------------------------------------
Summary: Issues with table having complex columns related to long string, SI, local dictionary
Key: CARBONDATA-4282
URL: https://issues.apache.org/jira/browse/CARBONDATA-4282
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

*1. Insert/load fails after alter add complex column if the table contains long string columns.*

[Steps] :-

DROP TABLE IF EXISTS alter_com;
CREATE TABLE alter_com(intfield int,EDUCATED string ,rankk string ) STORED AS carbondata TBLPROPERTIES('inverted_index'='intField','sort_columns'='intField','TABLE_BLOCKSIZE'= '256 MB','TABLE_BLOCKLET_SIZE'='8','SORT_SCOPE'='no_sort','COLUMN_META_CACHE'='rankk','carbon.column.compressor'='gzip','long_string_columns'='rankk','table_page_size_inmb'='1');
insert into alter_com values(1,'cse','xi');
select * from alter_com limit 1;
ALTER TABLE alter_com ADD COLUMNS(map1 Map, map2 Map, map3 Map, map4 Map, map5 Map, map6 Map, map7 map>, map8 map>>);
ALTER TABLE alter_com SET TBLPROPERTIES('long_string_columns'='EDUCATED');
insert into alter_com values(1,'ece','x', map(1,2),map(3,2.34), map(1.23,'hello'),map('abc','def'), map(true,'2017-02-01'),map('time','2018-02-01 02:00:00.0'),map('ph',array(1,2)), map('a',named_struct('d',23,'s',named_struct('im','sh';

[Expected Result] :- Insert/load should succeed after alter add map column, if the table contains long string columns.

*2. Create index on array of complex column (map/struct) throws a null pointer exception instead of the correct error message.*

[Steps] :-

drop table if exists strarmap1;
create table strarmap1(id int,name string,str struct>,arr array>) stored as carbondata tblproperties('inverted_index'='name','sort_columns'='name','TABLE_BLOCKSIZE'= '256 MB','TABLE_BLOCKLET_SIZE'='8','CACHE_LEVEL'='BLOCKLET');
load data inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 options('fileheader'='id,name,str,arr','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');
CREATE INDEX index2 ON TABLE strarmap1 (arr) as 'carbondata' properties('sort_scope'='global_sort','global_sort_partitions'='3');

[Expected Result] :- Create index on array of map(string,timestamp) should throw the correct validation error message.

[Actual Issue] :- Create index on array of map(string,timestamp) throws a null pointer exception instead of the correct error message.

*3. Alter table property local dictionary include/exclude with a newly added map column is failing.*

[Steps] :-

drop table if exists strarmap1;
create table strarmap1(id int,name string,str struct>,arr array>) stored as carbondata tblproperties('inverted_index'='name','sort_columns'='name','local_dictionary_enable'='false','local_dictionary_include'='map1','local_dictionary_exclude'='str,arr','local_dictionary_threshold'='1000');
ALTER TABLE strarmap1 ADD COLUMNS(map1 Map, map2 Map, map3 Map, map4 Map, map5 Map, map6 Map, map7 map>, map8 map>>);
ALTER TABLE strarmap1 SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','local_dictionary_include'='map4','local_dictionary_threshold'='1000');

[Expected Result] :- Alter table property local dictionary include/exclude with a newly added map column should succeed.

[Actual Issue] :- Alter table property local dictionary include/exclude with a newly added map column is failing.
[jira] [Created] (CARBONDATA-4274) Create partition table error with spark 3.1
SHREELEKHYA GAMPA created CARBONDATA-4274:
------------------------------------------
Summary: Create partition table error with spark 3.1
Key: CARBONDATA-4274
URL: https://issues.apache.org/jira/browse/CARBONDATA-4274
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

With spark 3.1, we can create a partition table by giving the partition columns from the schema, as in the example below:

{{create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as carbondata partitioned by (v2,c2)}}

When the table is created by a SparkSession with CarbonExtension, the catalog table is created with the specified partitions. But in the cluster / with carbon session, when we create a partition table with the above syntax, a normal table with no partitions is created.
[jira] [Commented] (CARBONDATA-4119) User Input for GeoID column not validated.
[ https://issues.apache.org/jira/browse/CARBONDATA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400186#comment-17400186 ]

SHREELEKHYA GAMPA commented on CARBONDATA-4119:
-----------------------------------------------

As part of an enhancement, insert with customized geoID changes were made. Will update the documentation accordingly.

> User Input for GeoID column not validated.
> -------------------------------------------
>
> Key: CARBONDATA-4119
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4119
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 2.1.0
> Reporter: PURUJIT CHAUGULE
> Priority: Minor
>
> * User input for the geoId column can be paired with multiple pairs of source column values (the correct, internally calculated geoID values are different for such source column values).
> * The advantage of using geoID does not apply when user input for the GeoId column is not validated, and user input values may differ from the actual internally calculated values. The geoID value is only generated internally if the user does not input the geoID column.
[jira] [Commented] (CARBONDATA-4165) Carbondata summing up two values of same timestamp.
[ https://issues.apache.org/jira/browse/CARBONDATA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383867#comment-17383867 ]

SHREELEKHYA GAMPA commented on CARBONDATA-4165:
-----------------------------------------------

Hi Suyash, can you please share more details of the problem with some example queries?

> Carbondata summing up two values of same timestamp.
> ----------------------------------------------------
>
> Key: CARBONDATA-4165
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4165
> Project: CarbonData
> Issue Type: Wish
> Components: core
> Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, apache spark 2.4.5, hadoop 2.7.2
> Reporter: suyash yadav
> Priority: Major
> Fix For: 2.0.1
>
> Hi Team,
> We have seen a behaviour while using Carbondata 2.0.1 that if we get two values for the same timestamp, it sums both the values and stores them as one value. Instead, we need it to discard the previous value and use the latest one.
> Please let us know if there is any functionality already available in carbondata to handle duplicate values by itself, or if there is any plan to implement such functionality.
[jira] [Created] (CARBONDATA-4231) On update operation with 3.1v, cloned spark session is used and set properties are lost.
SHREELEKHYA GAMPA created CARBONDATA-4231:
------------------------------------------
Summary: On update operation with 3.1v, cloned spark session is used and set properties are lost.
Key: CARBONDATA-4231
URL: https://issues.apache.org/jira/browse/CARBONDATA-4231
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

*Update operation with bad records property fails with Spark 3.1.*

*[Steps to reproduce]:*

0: jdbc:hive2://linux-221:22550/> set carbon.options.bad.records.action=force;
+------------------------------------+--------+
| key                                | value  |
+------------------------------------+--------+
| carbon.options.bad.records.action  | force  |
+------------------------------------+--------+
1 row selected (0.04 seconds)
0: jdbc:hive2://linux-221:22550/> create table t_carbn1(item_type_cd int, sell_price bigint, profit decimal(10,4), item_name string, update_time timestamp) stored as carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.117 seconds)
0: jdbc:hive2://linux-221:22550/> insert into t_carbn1 select 2, 10,23.3,'Apple','2012-11-11 11:11:11';
INFO  : Execution ID: 858
+-------------+
| Segment ID  |
+-------------+
| 0           |
+-------------+
1 row selected (4.278 seconds)
0: jdbc:hive2://linux-221:22550/> update t_carbn1 set (item_type_cd) = (item_type_cd/1);
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.RuntimeException: Update operation failed. DataLoad failure

*[Root cause]:*

On the update command, persist is called, and with the latest Spark 3.1 changes, Spark returns a cloned SparkSession from cacheManager with all specified configurations disabled. Since a different SparkSession is now used for 3.1, and it is not initialized in CarbonEnv, CarbonEnv.init is called and a new CarbonSessionInfo is created with no sessionParams. As a result, the properties that were set are not accessible.

Spark creates the cloned spark session based on the following properties:
1. spark.sql.optimizer.canChangeCachedPlanOutputPartitioning
2. spark.sql.sources.bucketing.autoBucketedScan.enabled
3. spark.sql.adaptive.enabled
[jira] [Commented] (CARBONDATA-4205) MINOR compaction getting triggered by it self while inserting data to a table
[ https://issues.apache.org/jira/browse/CARBONDATA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365535#comment-17365535 ]

SHREELEKHYA GAMPA commented on CARBONDATA-4205:
-----------------------------------------------

Hi, can you share the carbon configuration set? Please check the carbon.enable.auto.load.merge and carbon.compaction.level.threshold properties. When carbon.enable.auto.load.merge is set to true, compaction is triggered automatically once a data load completes. (Illustrative property values are shown after the quoted issue below.)

> MINOR compaction getting triggered by it self while inserting data to a table
> -------------------------------------------------------------------------------
>
> Key: CARBONDATA-4205
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4205
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, hadoop 2.7.2, spark 2.4.5
> Reporter: suyash yadav
> Priority: Major
>
> Hi Team, we have created a table and also created a timeseries MV on it. Later we tried to insert some data from another table into this newly created table, but we observed that while inserting, MINOR compaction on the MV gets triggered by itself. It doesn't happen for every insert, but whenever we insert the 6 to 7th hour data and then the 14 to 15 hour data, the MINOR compaction gets triggered. Could you tell us why the MINOR compaction is getting triggered by itself?
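For example, the relevant entries in carbon.properties would look like this (the values shown are illustrative; 4,3 is the documented default threshold):

carbon.enable.auto.load.merge=true
carbon.compaction.level.threshold=4,3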
[jira] [Created] (CARBONDATA-4211) from xx Insert into select fails if an SQL statement contains multiple inserts
SHREELEKHYA GAMPA created CARBONDATA-4211:
------------------------------------------
Summary: from xx Insert into select fails if an SQL statement contains multiple inserts
Key: CARBONDATA-4211
URL: https://issues.apache.org/jira/browse/CARBONDATA-4211
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

When multiple inserts with a single query are used, it fails from SparkPlan with: {{java.lang.ClassCastException: GenericInternalRow cannot be cast to UnsafeRow}}.

[Steps] :- From Spark SQL execute the following queries.

1. create tables:

create table catalog_returns_5(cr_returned_date_sk int,cr_returned_time_sk int,cr_item_sk int)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' ;
create table catalog_returns_6(cr_returned_time_sk int,cr_item_sk int) partitioned by (cr_returned_date_sk int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='64');

2. insert table:

from catalog_returns_5
insert overwrite table catalog_returns_6 partition (cr_returned_date_sk) select cr_returned_time_sk, cr_item_sk, cr_returned_date_sk where cr_returned_date_sk is not null distribute by cr_returned_date_sk
insert overwrite table catalog_returns_6 partition (cr_returned_date_sk) select cr_returned_time_sk, cr_item_sk, cr_returned_date_sk where cr_returned_date_sk is null distribute by cr_returned_date_sk;
[jira] [Created] (CARBONDATA-4202) Fix issue when refresh main table with MV
SHREELEKHYA GAMPA created CARBONDATA-4202:
------------------------------------------
Summary: Fix issue when refresh main table with MV
Key: CARBONDATA-4202
URL: https://issues.apache.org/jira/browse/CARBONDATA-4202
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Problem phenomenon] :- Error when trying to refresh, in 2.1.1, a main table which contains an MV; the store for the main table with the MV was created in 2.1.0.

[Steps] :-

CREATE TABLE originTable_mv (empno int, empname String, designation String, doj Timestamp,workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int,utilization int,salary int) STORED AS carbondata;
LOAD DATA local inpath 'hdfs://hacluster/BabuStore/Data/data.csv' INTO TABLE originTable_mv OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"','timestampformat'='dd-MM-');
create MATERIALIZED VIEW datamap_comp_mv as select empno,sum(attendance) ,min(projectjoindate) ,max(projectenddate) ,avg(attendance) ,count(empno),count(distinct workgroupcategoryname) from originTable_mv group by empno;
LOAD DATA local inpath 'hdfs://hacluster/BabuStore/Data/data.csv' INTO TABLE originTable_mv OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"','timestampformat'='dd-MM-');
Back up the store and copy the store.
Execute the refresh table command on the main table that has the MV table.

[Expected Result] :- Refreshing, in 2.1.1, a main table which contains an MV (store created in 2.1.0) should be successful.

[Actual Issue] :- Error when trying to refresh, in 2.1.1, a main table which contains an MV (store created in 2.1.0).

0: jdbc:hive2://linux-221:22550/> refresh table originTable_mv;
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException:
== Spark Parser: org.apache.spark.sql.hive.FISqlParser ==
extraneous input '2_1' expecting {')', ','}(line 8, pos 25)

== SQL ==
CREATE TABLE 2_1.origintable_mv (`empno` int,`empname` string,`designation` string,`doj` timestamp,`workgroupcategory` int,`workgroupcategoryname` string,`deptno` int,`deptname` string,`projectcode` int,`projectjoindate` timestamp,`projectenddate` timestamp,`attendance` int,`utilization` int,`salary` int)
USING carbondata
OPTIONS (
  indexexists "false",
  sort_columns "",
  comment "",
  relatedmvtablesmap "{"2_1":["datamap_comp_mv"]}",
-------------------------^^^
  bad_record_path "",
  local_dictionary_enable "true",
  indextableexists "false",
  tableName "origintable_mv",
  dbName "2_1",
  tablePath "hdfs://hacluster/user/hive/warehouse/carbon.store/2_1/origintable_mv",
  path "hdfs://hacluster/user/hive/warehouse/carbon.store/2_1/origintable_mv",
  isExternal "false",
  isTransactional "true",
  isVisible "true",
  carbonSchemaPartsNo '2',
  carbonSchema0
'\{"databaseName":"2_1","tableUniqueName":"2_1_origintable_mv","factTable":{"tableId":"5b9f23bf-c08a-49bf-9b33-6f1a397014e5","tableName":"origintable_mv","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"empname","columnUniqueId":"663d1b53-2898-49e5-be7c-c7ce7c70d538","columnReferenceId":"663d1b53-2898-49e5-be7c-c7ce7c70d538","encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"designation","columnUniqueId":"9f708f4b-5ce4-4169-b6db-dbe34092ded0","columnReferenceId":"9f708f4b-5ce4-4169-b6db-dbe34092ded0","encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":2,"precedenceOrder":2,"name":"TIMESTAMP","sizeInBytes":-1},"columnName":"doj","columnUniqueId":"2f606b57-5fc3-4f98-93b9-3332e78cb475","columnReferenceId":"2f606b57-5fc3-4f98-93b9-3332e78cb475","encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":3,"numberOfChild":0,"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"workgroupcategoryname","columnUniqueId":"b20c49ac-e59c-4142-a7b8-6b38cc65e908","columnReferenceId":"b20c49ac-e59c-4142-a7b8-6b38cc65e908","encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":5,"numberOfChild":0,"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"deptname","columnUniqueId":"dc6e48c9-9814
[jira] [Updated] (CARBONDATA-4143) UT with index server
[ https://issues.apache.org/jira/browse/CARBONDATA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-4143:
------------------------------------------
Description:
To enable running UTs with the index server using the flag {{useIndexServer}}. Excluded some of the test cases from running with the index server.

To fix the below issues:

1. With the index server enabled, a select query gives an incorrect result with SI when the parent and child table segments are not in sync.

queries to execute:

0: jdbc:hive2://dggphisprb50622:22550/> create table test (c1 string,c2 int,c3 string,c5 string) STORED AS carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.564 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 0           |
+-------------+
1 row selected (1.764 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> create index index_test on table test (c3) AS 'carbondata';
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.412 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 1           |
+-------------+
1 row selected (2.839 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
2 rows selected (3.452 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> delete from table index_test where segment.ID in(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.413 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
1 row selected (3.262 seconds)
0: jdbc:hive2://dggphisprb50622:22550/>

Expected: to return 2 rows.

2. When reindex is triggered, if stale files are present in the segment directory, the segment file is written with incorrect file names (both valid index and stale mergeindex file names). As a result, duplicate data is present in the SI table, but there is no error/incorrect query result.

was:
To enable running UTs with the index server using the flag {{useIndexServer}}. Excluded some of the test cases from running with the index server. Added a test case with prepriming.

To fix the below issues:

1. With the index server enabled, a select query gives an incorrect result with SI when the parent and child table segments are not in sync.

queries to execute:

0: jdbc:hive2://dggphisprb50622:22550/> create table test (c1 string,c2 int,c3 string,c5 string) STORED AS carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.564 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 0           |
+-------------+
1 row selected (1.764 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> create index index_test on table test (c3) AS 'carbondata';
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.412 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 1           |
+-------------+
1 row selected (2.839 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
2 rows selected (3.452 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> delete from table index_test where segment.ID in(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.413 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
1 row selected (3.262 seconds)
0: jdbc:hive2://dggphisprb50622:22550/>

Expected: to return 2 rows.

2. When reindex is triggered, if stale files are present in the segment directory, the segment file is written with incorrect file names (both valid index and stale mergeindex file names). As a result, duplicate data is present in the SI table, but there is no error/incorrect query result.

> UT with index server
> --------------------
>
> Key: CARBONDATA-4143
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4143
> Project: CarbonData
> Issue Type: Improvement
> Reporter: SHREELEKHYA GAMPA
> Priority: Major
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> To enable to run UT with index server using flag {{useIndexServer.}}
> excluded some of the test cases to not run with
[jira] [Created] (CARBONDATA-4193) Fix compaction failure after alter add complex column.
SHREELEKHYA GAMPA created CARBONDATA-4193:
------------------------------------------
Summary: Fix compaction failure after alter add complex column.
Key: CARBONDATA-4193
URL: https://issues.apache.org/jira/browse/CARBONDATA-4193
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

[Steps] :- From spark beeline/SQL/Shell/Submit the following queries are executed.

drop table if exists alter_complex;
create table alter_complex (a int, b string) stored as carbondata;
insert into alter_complex select 1,'a';
insert into alter_complex select 1,'a';
insert into alter_complex select 1,'a';
insert into alter_complex select 1,'a';
insert into alter_complex select 1,'a';
select * from alter_complex;
ALTER TABLE alter_complex ADD COLUMNS(struct1 STRUCT);
insert into alter_complex select 3,'c',named_struct('s1',4,'s2','d');
insert into alter_complex select 3,'c',named_struct('s1',4,'s2','d');
insert into alter_complex select 3,'c',named_struct('s1',4,'s2','d');
insert into alter_complex select 3,'c',named_struct('s1',4,'s2','d');
insert into alter_complex select 3,'c',named_struct('s1',4,'s2','d');
select * from alter_complex;
alter table alter_complex compact 'minor';
OR
alter table alter_complex compact 'major';
OR
alter table alter_complex compact 'custom' where segment.id In (3,4,5,6);

[Expected Result] :- Compaction should succeed after alter add complex column.

[Actual Issue] :- Compaction fails after alter add complex column.
[jira] [Updated] (CARBONDATA-4143) UT with index server
[ https://issues.apache.org/jira/browse/CARBONDATA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-4143:
------------------------------------------
Description:
To enable running UTs with the index server using the flag {{useIndexServer}}. Excluded some of the test cases from running with the index server. Added a test case with prepriming.

To fix the below issues:

1. With the index server enabled, a select query gives an incorrect result with SI when the parent and child table segments are not in sync.

queries to execute:

0: jdbc:hive2://dggphisprb50622:22550/> create table test (c1 string,c2 int,c3 string,c5 string) STORED AS carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.564 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 0           |
+-------------+
1 row selected (1.764 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> create index index_test on table test (c3) AS 'carbondata';
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.412 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
+-------------+
| Segment ID  |
+-------------+
| 1           |
+-------------+
1 row selected (2.839 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
2 rows selected (3.452 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> delete from table index_test where segment.ID in(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.413 seconds)
0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
+-----+-----+-----+------+
| c1  | c2  | c3  | c5   |
+-----+-----+-----+------+
| d   | 4   | dd  | ddd  |
+-----+-----+-----+------+
1 row selected (3.262 seconds)
0: jdbc:hive2://dggphisprb50622:22550/>

Expected: to return 2 rows.

2. When reindex is triggered, if stale files are present in the segment directory, the segment file is written with incorrect file names (both valid index and stale mergeindex file names). As a result, duplicate data is present in the SI table, but there is no error/incorrect query result.

was:
To enable running UTs with the index server using the flag {{useIndexServer}}. Excluded some of the test cases from running with the index server. Added a test case with prepriming.

> UT with index server
> --------------------
>
> Key: CARBONDATA-4143
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4143
> Project: CarbonData
> Issue Type: Improvement
> Reporter: SHREELEKHYA GAMPA
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> To enable to run UT with index server using flag {{useIndexServer.}}
> excluded some of the test cases to not run with index server.
> added test case with prepriming.
> To Fix below issues:
> 1. With index server enabled, select query gives incorrect result with SI when parent and child table segments are not in sync.
> queries to execute:
> 0: jdbc:hive2://dggphisprb50622:22550/> create table test (c1 string,c2 int,c3 string,c5 string) STORED AS carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.564 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
> +-------------+
> | Segment ID  |
> +-------------+
> | 0           |
> +-------------+
> 1 row selected (1.764 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> create index index_test on table test (c3) AS 'carbondata';
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (2.412 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> load data inpath 'hdfs://hacluster/chetan/dest.csv' into table test;
> +-------------+
> | Segment ID  |
> +-------------+
> | 1           |
> +-------------+
> 1 row selected (2.839 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
> +-----+-----+-----+------+
> | c1  | c2  | c3  | c5   |
> +-----+-----+-----+------+
> | d   | 4   | dd  | ddd  |
> | d   | 4   | dd  | ddd  |
> +-----+-----+-----+------+
> 2 rows selected (3.452 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> delete from table index_test where segment.ID in(1);
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.413 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/> select * from test where c3='dd';
> +-----+-----+-----+------+
> | c1  | c2  | c3  | c5   |
> +-----+-----+-----+------+
> | d   | 4   | dd  | ddd  |
> +-----+-----+-----+------+
> 1 row selected (3.262 seconds)
> 0: jdbc:hive2://dggphisprb50622:22550/>
> Expected: to return 2 rows.
> 2. When reindex is triggered, if stale files are present in the segment
> direct
[jira] [Created] (CARBONDATA-4174) Handle exception for desc column
SHREELEKHYA GAMPA created CARBONDATA-4174:
------------------------------------------
Summary: Handle exception for desc column
Key: CARBONDATA-4174
URL: https://issues.apache.org/jira/browse/CARBONDATA-4174
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Validation is not present in desc column for a child column of a primitive datatype, or for a non-existing higher-level child column of a complex datatype.

drop table if exists complexcarbontable;
create table complexcarbontable (deviceInformationId int,channelsId string,ROMSize string,purchasedate string,mobile struct,MAC array,gamePointId map,contractNumber double) STORED AS carbondata;
describe column deviceInformationId.x on complexcarbontable;
describe column channelsId.x on complexcarbontable;
describe column mobile.imei.x on complexcarbontable;
describe column MAC.item.x on complexcarbontable;
describe column gamePointId.key.x on complexcarbontable;

[Expected Result] :- Validation should be provided for a child column in desc column for a primitive datatype, and for a non-existing higher-level child column in desc column for a complex datatype. Command execution should fail.

[Actual Issue] :- Validation is not present for a child column in desc column for a primitive datatype, or for a non-existing higher-level child column in desc column for a complex datatype. As a result, the command execution succeeds.
[jira] [Created] (CARBONDATA-4173) Fix inverted index query issue
SHREELEKHYA GAMPA created CARBONDATA-4173:
------------------------------------------
Summary: Fix inverted index query issue
Key: CARBONDATA-4173
URL: https://issues.apache.org/jira/browse/CARBONDATA-4173
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

A select query with a filter column that is present in the inverted_index columns does not return any value.

From Spark beeline/SQL/Shell execute the following queries:

drop table if exists uniqdata6;
CREATE TABLE uniqdata6(cust_id int,cust_name string,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int)stored as carbondata TBLPROPERTIES ('sort_columns'='CUST_ID,CUST_NAME', 'inverted_index'='CUST_ID,CUST_NAME','sort_scope'='global_sort');
LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata6 OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
select cust_name from uniqdata6 limit 5;
select * from uniqdata6 where CUST_NAME='CUST_NAME_2';
select * from uniqdata6 where CUST_NAME='CUST_NAME_3';

[Expected Result] :- A select query with a filter column that is present in the inverted_index columns should return the correct value.

[Actual Issue] :- A select query with a filter column that is present in the inverted_index columns does not return any value.
[jira] [Created] (CARBONDATA-4168) UDF validation Issues related to Geospatial support
SHREELEKHYA GAMPA created CARBONDATA-4168:
------------------------------------------
Summary: UDF validation Issues related to Geospatial support
Key: CARBONDATA-4168
URL: https://issues.apache.org/jira/browse/CARBONDATA-4168
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Scenario 1:
--Gives a wrong error message for size less than or equal to 0.
drop table if exists source_index;
create table source_index(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES ('SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.832277','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
LOAD DATA inpath '/geodata/geodata2.csv' INTO TABLE source_index OPTIONS ('DELIMITER'= ',');
select longitude, latitude from source_index where IN_POLYLINE_LIST('LINESTRING (120.184179 30.327465, 120.191603 30.328946, 120.199242 30.324464, 120.190359 30.315388)', -65);
select longitude, latitude from source_index where IN_POLYLINE_LIST('LINESTRING (120.184179 30.327465, 120.191603 30.328946, 120.199242 30.324464, 120.190359 30.315388)', 0);

Scenario 2:
--Accepts an invalid buffer size
select longitude, latitude from source_index where IN_POLYLINE_LIST('LINESTRING (120.184179 30.327465, 120.191603 30.328946, 120.199242 30.324464), LINESTRING (120.199242 30.324464, 120.190359 30.315388)', 'X');

Scenario 3:
--Accepts negative and 0 gridSize
select LatLngToGeoId(39930753, 116302895, 39.832277, -50) as geoId;
select GeoIdToLatLng(855279270226, 39.832277, -50) as LatitudeAndLongitude;
select ToRangeList('116.321011 40.123503, 116.320311 40.122503,116.32 40.121503, 116.321011 40.123503', 39.832277, -50) as rangeList;
select LatLngToGeoId(39930753, 116302895, 39.832277, 0) as geoId;
select GeoIdToLatLng(855279270226, 39.832277, 0) as LatitudeAndLongitude;
--Gives a wrong error message for gridSize 0
select ToRangeList('116.321011 40.123503, 116.320311 40.122503,116.32 40.121503, 116.321011 40.123503', 39.832277, 0) as rangeList;

Scenario 4:
--Accepting double values for GeoId
select GeoIdToLatLng(8.55279270226, 39.832277, -50) as LatitudeAndLongitude;
select ToUpperLayerGeoId(8.55279270226) as upperLayerGeoId;
select GeoIdToGridXy(8.55279270226) as GridXY;

Scenario 5:
--Accepting invalid values in all UDFs
select GeoIdToGridXy('X') as GridXY;
select LatLngToGeoId('X', 'X', 'X', 'X') as geoId;
select GeoIdToLatLng('X', 'X', 'X') as LatitudeAndLongitude;
select ToUpperLayerGeoId('X') as upperLayerGeoId;
select ToRangeList('116.321011 40.123503, 116.320311 40.122503,116.32 40.121503, 116.321011 40.123503', 39.832277, 'X') as rangeList;
select ToRangeList('116.321011 40.123503, 116.320311 40.122503,116.32 40.121503, 116.321011 40.123503', 'X', 50) as rangeList;
[jira] [Created] (CARBONDATA-4167) Case sensitive issues in Geospatial index support
SHREELEKHYA GAMPA created CARBONDATA-4167:
------------------------------------------
Summary: Case sensitive issues in Geospatial index support
Key: CARBONDATA-4167
URL: https://issues.apache.org/jira/browse/CARBONDATA-4167
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

1) create table source_index(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES ('SPATIAL_INDEX.MYGEOHASH.type'='geohash','SPATIAL_INDEX.MYGEOHASH.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.MYGEOHASH.originLatitude'='39.930753','SPATIAL_INDEX.MYGEOHASH.gridSize'='50','SPATIAL_INDEX'='MYGEOHASH','SPATIAL_INDEX.MYGEOHASH.conversionRatio'='100');

The properties are case sensitive.

2) A select query with lower case in the query UDFs fails.

A select query with a lower case linestring in the Polyline UDF does not return any value.

A select query with a lower case rangelist in the IN_POLYGON_RANGE_LIST UDF returns no value.
[jira] [Created] (CARBONDATA-4161) Describe columns
SHREELEKHYA GAMPA created CARBONDATA-4161:
------------------------------------------
Summary: Describe columns
Key: CARBONDATA-4161
URL: https://issues.apache.org/jira/browse/CARBONDATA-4161
Project: CarbonData
Issue Type: New Feature
Reporter: SHREELEKHYA GAMPA

The DESCRIBE output can be formatted to avoid long lines for multiple fields. We can pass a column name to the command and visualize its structure with child fields.

{{DESCRIBE COLUMN fieldname ON [db_name.]table_name;}}
{{DESCRIBE short [db_name.]table_name;}}
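A usage sketch of the proposed commands against a table with a complex column (the schema is hypothetical):

create table sensor_tbl (id int, mobile struct<imei:string,imsi:string>) stored as carbondata;
DESCRIBE COLUMN mobile ON sensor_tbl;   -- shows the struct and its child fields
DESCRIBE short sensor_tbl;              -- compact output without long lines for complex fields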
[jira] [Created] (CARBONDATA-4149) Query with SI after add partition based on location on partition table gives incorrect results
SHREELEKHYA GAMPA created CARBONDATA-4149:
------------------------------------------
Summary: Query with SI after add partition based on location on partition table gives incorrect results
Key: CARBONDATA-4149
URL: https://issues.apache.org/jira/browse/CARBONDATA-4149
Project: CarbonData
Issue Type: Bug
Reporter: SHREELEKHYA GAMPA

Queries to execute:

* drop table if exists partitionTable;
* create table partitionTable (id int,name String) partitioned by(email string) stored as carbondata;
* insert into partitionTable select 1,'blue','abc';
* CREATE INDEX maintable_si112 on table partitionTable (name) as 'carbondata';
* alter table partitionTable add partition (email='def') location '$sdkWritePath';
* select * from partitionTable where name = 'red'; ---> returns empty result
* select * from partitionTable where ni(name = 'red');
* alter table partitionTable compact 'major';
* select * from partitionTable where name = 'red';

spark-sql> create table partitionTable (id int,name String) partitioned by(email string) STORED AS carbondata;
Time taken: 1.962 seconds
spark-sql> CREATE INDEX maintable_si112 on table partitionTable (name) as 'carbondata';
Time taken: 2.759 seconds
spark-sql> insert into partitionTable select 1,'huawei','abc';
0
Time taken: 5.808 seconds, Fetched 1 row(s)
spark-sql> alter table partitionTable add partition (email='def') location 'hdfs://hacluster/datastore';
Time taken: 1.108 seconds
spark-sql> insert into partitionTable select 1,'huawei','def';
1
Time taken: 2.707 seconds, Fetched 1 row(s)
spark-sql> select * from partitionTable where name='huawei';
1 huawei abc
Time taken: 0.75 seconds, Fetched 1 row(s)
spark-sql> select * from partitionTable where ni(name='huawei');
1 huawei def
1 huawei abc
Time taken: 0.507 seconds, Fetched 2 row(s)
spark-sql>
[jira] [Created] (CARBONDATA-4143) UT with index server
SHREELEKHYA GAMPA created CARBONDATA-4143:
------------------------------------------
Summary: UT with index server
Key: CARBONDATA-4143
URL: https://issues.apache.org/jira/browse/CARBONDATA-4143
Project: CarbonData
Issue Type: Improvement
Reporter: SHREELEKHYA GAMPA

To enable running UTs with the index server using the flag {{useIndexServer}}. Excluded some of the test cases from running with the index server. Added a test case with prepriming.
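For context, the index server itself is switched on through the carbon.enable.index.server property (illustrative carbon.properties entry):

carbon.enable.index.server=true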
[jira] [Commented] (CARBONDATA-4074) Should clean stale data in success segments
[ https://issues.apache.org/jira/browse/CARBONDATA-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295933#comment-17295933 ]

SHREELEKHYA GAMPA commented on CARBONDATA-4074:
-----------------------------------------------

Can also include (a usage sketch follows the quoted issue below):
4. clean stale index files
5. clean stale segment files with retention time.

> Should clean stale data in success segments
> ---------------------------------------------
>
> Key: CARBONDATA-4074
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4074
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Major
>
> Cleaning stale data in success segments includes the following parts:
> 1. clean stale delete delta (when force is true)
> 2. clean stale small files for index table
> 3. clean stale data files for loading/compaction
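These would fall under the existing clean files command; a usage sketch (table name is hypothetical):

CLEAN FILES FOR TABLE db_name.tbl OPTIONS('force'='true');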
[jira] [Updated] (CARBONDATA-4037) Improve the table status and segment file writing
[ https://issues.apache.org/jira/browse/CARBONDATA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-4037:
------------------------------------------
Attachment: Improve table status and segment file writing_1.docx

> Improve the table status and segment file writing
> ---------------------------------------------------
>
> Key: CARBONDATA-4037
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4037
> Project: CarbonData
> Issue Type: Improvement
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Attachments: Improve table status and segment file writing_1.docx
> Time Spent: 15h 40m
> Remaining Estimate: 0h
>
> Currently, we update the table status and segment files multiple times for a single iud/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, there may be situations where a user query tries to access the segment index files and they are not present, which is an availability issue.
> * To solve the above issue, we can make merge index file generation mandatory and fail load/compaction if merge index fails. Then, if merge index succeeds, update the table status file and delete the index files immediately. However, in legacy stores, when alter segment merge is called, do not delete the index files immediately after merge index succeeds, as it may cause issues for parallel queries.
[jira] [Updated] (CARBONDATA-4037) Improve the table status and segment file writing
[ https://issues.apache.org/jira/browse/CARBONDATA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-4037:
------------------------------------------
Attachment: (was: Improve table status and segment file writing_1.docx)

> Improve the table status and segment file writing
> ---------------------------------------------------
>
> Key: CARBONDATA-4037
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4037
> Project: CarbonData
> Issue Type: Improvement
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Time Spent: 15h 40m
> Remaining Estimate: 0h
>
> Currently, we update the table status and segment files multiple times for a single iud/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, there may be situations where a user query tries to access the segment index files and they are not present, which is an availability issue.
> * To solve the above issue, we can make merge index file generation mandatory and fail load/compaction if merge index fails. Then, if merge index succeeds, update the table status file and delete the index files immediately. However, in legacy stores, when alter segment merge is called, do not delete the index files immediately after merge index succeeds, as it may cause issues for parallel queries.
[jira] [Updated] (CARBONDATA-4037) Improve the table status and segment file writing
[ https://issues.apache.org/jira/browse/CARBONDATA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SHREELEKHYA GAMPA updated CARBONDATA-4037:
------------------------------------------
Attachment: Improve table status and segment file writing_1.docx

> Improve the table status and segment file writing
> ---------------------------------------------------
>
> Key: CARBONDATA-4037
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4037
> Project: CarbonData
> Issue Type: Improvement
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Attachments: Improve table status and segment file writing_1.docx
> Time Spent: 15h 40m
> Remaining Estimate: 0h
>
> Currently, we update the table status and segment files multiple times for a single iud/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, there may be situations where a user query tries to access the segment index files and they are not present, which is an availability issue.
> * To solve the above issue, we can make merge index file generation mandatory and fail load/compaction if merge index fails. Then, if merge index succeeds, update the table status file and delete the index files immediately. However, in legacy stores, when alter segment merge is called, do not delete the index files immediately after merge index succeeds, as it may cause issues for parallel queries.
[jira] [Updated] (CARBONDATA-4037) Improve the table status and segment file writing
[ https://issues.apache.org/jira/browse/CARBONDATA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-4037: -- Description: Currently, we update the table status and segment files multiple times for a single IUD/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, a user query may try to access segment index files that are no longer present, which is an availability issue. * To solve the above issue, we can make merge index file generation mandatory and fail the load/compaction if merge index fails. Then, if merge index succeeds, update the table status file and delete the index files immediately. However, in legacy stores, when alter segment merge is called, do not delete the index files immediately after merge index succeeds, as that may cause issues for parallel queries. was: Currently, we update the table status and segment files multiple times for a single IUD/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, a user query may try to access segment index files that are no longer present, which is an availability issue. * Instead of deleting carbon index files immediately after merge, delete index files only when the clean files command is executed, and delete only those that have existed for more than 1 hour. * Generate the segment file after merge index, and update the table status at the beginning and after merge index. Order: create table status file => index files => merge index => generate segment file => update table status > Improve the table status and segment file writing > - > > Key: CARBONDATA-4037 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4037 > Project: CarbonData > Issue Type: Improvement >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 15h 40m > Remaining Estimate: 0h > > Currently, we update the table status and segment files multiple times for a > single IUD/merge/compact operation and delete the index files immediately > after merge. When concurrent queries are run, a user query may try to access > segment index files that are no longer present, which is an availability issue. > * To solve the above issue, we can make merge index file generation mandatory > and fail the load/compaction if merge index fails. Then, if merge index succeeds, > update the table status file and delete the index files immediately. However, in > legacy stores, when alter segment merge is called, do not delete the index files > immediately after merge index succeeds, as that may cause issues for parallel > queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4133) Concurrent Insert Overwrite with static partition on Index server fails
SHREELEKHYA GAMPA created CARBONDATA-4133: - Summary: Concurrent Insert Overwrite with static partition on Index server fails Key: CARBONDATA-4133 URL: https://issues.apache.org/jira/browse/CARBONDATA-4133 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA [Steps] :- With Index Server running, execute the concurrent insert overwrite with static partition. Set 0: CREATE TABLE if not exists uniqdata_string(CUST_ID int,CUST_NAME String,DOB timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) PARTITIONED BY(ACTIVE_EMUI_VERSION string) STORED AS carbondata TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB'); Set 1: LOAD DATA INPATH 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table uniqdata_string partition(active_emui_version='abc') OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE'); LOAD DATA INPATH 'hdfs://hacluster/datasets/2000_UniqData.csv' into table uniqdata_string partition(active_emui_version='abc') OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE'); Set 2: CREATE TABLE if not exists uniqdata_hive (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; load data local inpath "/opt/csv/2000_UniqData.csv" into table uniqdata_hive; Set 3: (concurrent) insert overwrite table uniqdata_string partition(active_emui_version='abc') select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, decimal_column1, decimal_column2,double_column1, double_column2,integer_column1 from uniqdata_hive limit 10; insert overwrite table uniqdata_string partition(active_emui_version='abc') select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, decimal_column1, decimal_column2,double_column1, double_column2,integer_column1 from uniqdata_hive limit 10; [Expected Result] :- Concurrent insert overwrite with static partition should succeed. [Actual Issue] :- Concurrent Insert Overwrite with static partition on Index server fails. [screenshot: https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/1/17/c71035/a40a6d6be1434b1db8e8c1c6f5a2e97b/image.png] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4123) Bloom index query with Index server giving incorrect results
SHREELEKHYA GAMPA created CARBONDATA-4123: - Summary: Bloom index query with Index server giving incorrect results Key: CARBONDATA-4123 URL: https://issues.apache.org/jira/browse/CARBONDATA-4123 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Queries: create a table and load data such that more than one blocklet is created. spark-sql> select count(*) from test_rcd where city = 'city40'; 2021-02-04 22:13:29,759 | WARN | pool-24-thread-1 | It is not recommended to set off-heap working memory size less than 512MB, so setting default value to 512 | org.apache.carbondata.core.memory.UnsafeMemoryManager.<init>(UnsafeMemoryManager.java:83) 10 Time taken: 2.417 seconds, Fetched 1 row(s) spark-sql> CREATE INDEX dm_rcd ON TABLE test_rcd (city) AS 'bloomfilter' properties ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); 2021-02-04 22:13:58,683 | AUDIT | main | \{"time":"February 4, 2021 10:13:58 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"START"} | carbon.audit.logOperationStart(Auditor.java:74) 2021-02-04 22:13:58,759 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) 2021-02-04 22:13:59,292 | WARN | Executor task launch worker for task 2 | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) 2021-02-04 22:13:59,629 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) 2021-02-04 22:14:00,331 | AUDIT | main | \{"time":"February 4, 2021 10:14:00 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"SUCCESS","opTime":"1648 ms","table":"default.test_rcd","extraInfo":{"provider":"bloomfilter","indexName":"dm_rcd","bloom_size":"64","bloom_fpp":"0.1"}} | carbon.audit.logOperationEnd(Auditor.java:97) Time taken: 1.818 seconds spark-sql> select count(*) from test_rcd where city = 'city40'; 30 Time taken: 0.556 seconds, Fetched 1 row(s) spark-sql> (The same filter returns 10 before the bloom index is created and 30 after, with the index server enabled, so the results are incorrect.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
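For reference, a minimal sketch of the kind of setup the report assumes; the table name test_rcd matches the log above, while the schema and the blocklet-size property are illustrative, not taken from the report:

-- a small TABLE_BLOCKLET_SIZE (in MB) makes the loaded data span more than one blocklet
CREATE TABLE test_rcd (id INT, city STRING) STORED AS carbondata TBLPROPERTIES ('TABLE_BLOCKLET_SIZE'='16');
-- load enough rows, then run the same filter before and after CREATE INDEX ... AS 'bloomfilter'
select count(*) from test_rcd where city = 'city40';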
[jira] [Created] (CARBONDATA-4117) Test cg index query with Index server fails with NPE
SHREELEKHYA GAMPA created CARBONDATA-4117: - Summary: Test cg index query with Index server fails with NPE Key: CARBONDATA-4117 URL: https://issues.apache.org/jira/browse/CARBONDATA-4117 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Test queries to execute: spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT'); spark-sql> create index cgindex on table index_test_cg (name) as 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory'; LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg OPTIONS('header'='false') spark-sql> select * from index_test_cg where name='n502670'; 2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting splits using index server. Initiating Fallback to embedded mode | org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454) java.lang.reflect.UndeclaredThrowableException at com.sun.proxy.$Proxy69.getSplits(Unknown Source) at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85) at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59) at org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769) at org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58) at org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307) at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443) at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555) at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205) at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159) at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:269) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:269) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:269) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:384) at org.apache.spark.rdd.RDD.collect(RDD.scala:988) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345) at 
org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372) at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:383) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:277) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDrive
[jira] [Created] (CARBONDATA-4113) Partition query results invalid when carbon.read.partition.hive.direct is disabled
SHREELEKHYA GAMPA created CARBONDATA-4113: - Summary: Partition query results invalid when carbon.read.partition.hive.direct is disabled Key: CARBONDATA-4113 URL: https://issues.apache.org/jira/browse/CARBONDATA-4113 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Set 'carbon.read.partition.hive.direct' to false. Queries to execute: create table partition_cache(a string) partitioned by(b int) stored as carbondata; insert into partition_cache select 'k',1; insert into partition_cache select 'k',1; insert into partition_cache select 'k',2; insert into partition_cache select 'k',2; alter table partition_cache compact 'minor'; select * from partition_cache; => no results -- This message was sent by Atlassian Jira (v8.3.4#803005)
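The property named in the summary is a CarbonData configuration; a sketch of one way to disable it before running the above queries (if it is not dynamically settable in the session, the equivalent entry goes into carbon.properties instead):

-- stop reading partition information directly from the Hive metastore
set carbon.read.partition.hive.direct=false;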
[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results after add segment to table having SI with Indexserver
[ https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-4111: -- Description: queries to execute: create table maintable_sdk(a string, b int, c string) stored as carbondata; insert into maintable_sdk select 'k',1,'k'; insert into maintable_sdk select 'l',2,'l'; CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata'; alter table maintable_sdk add segment options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon'); spark-sql> select * from maintable_sdk where c='m'; 2021-01-27 12:10:54,326 | WARN | IPC Client (653337757) connection to linux-30/10.19.90.30:22900 from car...@hadoop.com | Unexpected error reading responses on connection Thread[IPC Client (653337757) connection to linux-30/10.19.90.30:22900 from car...@hadoop.com,5,main] | org.apache.hadoop.ipc.Client.run(Client.java:1113) java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.carbondata.core.indexstore.SegmentWrapperContainer.<init>() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135) at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:58) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:284) at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77) at org.apache.hadoop.ipc.RpcWritable$WritableWrapper.readFrom(RpcWritable.java:85) at org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187) at org.apache.hadoop.ipc.RpcWritable$Buffer.newInstance(RpcWritable.java:183) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1223) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1107) Caused by: java.lang.NoSuchMethodException: org.apache.carbondata.core.indexstore.SegmentWrapperContainer.<init>() at java.lang.Class.getConstructor0(Class.java:3082) at java.lang.Class.getDeclaredConstructor(Class.java:2178) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ...
8 more 2021-01-27 12:10:54,330 | WARN | main | Distributed Segment Pruning failed, initiating embedded pruning | org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:349) java.lang.reflect.UndeclaredThrowableException at com.sun.proxy.$Proxy59.getPrunedSegments(Unknown Source) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:341) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:426) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions$lzycompute(BroadCastSIFilterPushJoin.scala:80) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions(BroadCastSIFilterPushJoin.scala:78) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy$lzycompute(BroadCastSIFilterPushJoin.scala:94) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy(BroadCastSIFilterPushJoin.scala:93) at org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.doExecute(BroadCastSIFilterPushJoin.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:177) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:201) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:198) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:173) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:293) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:342) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372) at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(S
[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results after add segment to table having SI with Indexserver
[ https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-4111: -- Summary: Filter query having invalid results after add segment to table having SI with Indexserver (was: Filter query having invalid results when add segment to SI with Indexserver) > Filter query having invalid results after add segment to table having SI with > Indexserver > - > > Key: CARBONDATA-4111 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4111 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Attachments: addseg_si_is.png > > > queries to execute: > create table maintable_sdk(a string, b int, c string) stored as carbondata; > insert into maintable_sdk select 'k',1,'k'; > insert into maintable_sdk select 'l',2,'l'; > CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata'; > alter table maintable_sdk add segment > options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4111) Filter query having invalid results when add segment to SI with Indexserver
SHREELEKHYA GAMPA created CARBONDATA-4111: - Summary: Filter query having invalid results when add segment to SI with Indexserver Key: CARBONDATA-4111 URL: https://issues.apache.org/jira/browse/CARBONDATA-4111 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Attachments: addseg_si_is.png queries to execute: create table maintable_sdk(a string, b int, c string) stored as carbondata; insert into maintable_sdk select 'k',1,'k'; insert into maintable_sdk select 'l',2,'l'; CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata'; alter table maintable_sdk add segment options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results when add segment to SI with Indexserver
[ https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-4111: -- Attachment: addseg_si_is.png > Filter query having invalid results when add segment to SI with Indexserver > --- > > Key: CARBONDATA-4111 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4111 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Attachments: addseg_si_is.png > > > queries to execute: > create table maintable_sdk(a string, b int, c string) stored as carbondata; > insert into maintable_sdk select 'k',1,'k'; > insert into maintable_sdk select 'l',2,'l'; > CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata'; > alter table maintable_sdk add segment > options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon'); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4096) SDK read fails from cluster and sdk read filter query on sort column giving wrong result with IndexServer
SHREELEKHYA GAMPA created CARBONDATA-4096: - Summary: SDK read fails from cluster and SDK read filter query on sort column giving wrong result with IndexServer Key: CARBONDATA-4096 URL: https://issues.apache.org/jira/browse/CARBONDATA-4096 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Attachments: image-2020-12-22-18-54-52-361.png, wrongresults_with_IS.PNG Test: write with the SDK and read with Spark. Queries to reproduce: put the written SDK files in the $warehouse/sdk path (contains .carbondata and .index files). +From spark-sql:+ create table sdkout using carbon options(path='$warehouse/sdk'); select * from sdkout where salary = 100; [screenshot attached: image-2020-12-22-18-54-52-361.png] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4078) add external segment and query with index server fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-4078: -- Attachment: is_noncarbonsegments stacktrace > add external segment and query with index server fails > -- > > Key: CARBONDATA-4078 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4078 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Attachments: is_noncarbonsegments stacktrace > > > index server tries to cache parquet/orc segments and fails as it cannot read > the file format when the fallback mode is disabled. > Ex: 'test parquet table' test case > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4078) add external segment and query with index server fails
SHREELEKHYA GAMPA created CARBONDATA-4078: - Summary: add external segment and query with index server fails Key: CARBONDATA-4078 URL: https://issues.apache.org/jira/browse/CARBONDATA-4078 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Attachments: is_noncarbonsegments stacktrace index server tries to cache parquet/orc segments and fails as it cannot read the file format when the fallback mode is disabled. Ex: 'test parquet table' test case -- This message was sent by Atlassian Jira (v8.3.4#803005)
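For context, external segments in other formats are attached with the ADD SEGMENT command; a hedged sketch of the kind of statement that leads the index server onto this path (table name and path are illustrative):

-- attach a directory of parquet files as an external segment; with fallback mode disabled,
-- the index server then tries to cache this non-carbon segment during pruning and fails
alter table fact_tab add segment options('path'='hdfs://hacluster/parquetfiles/', 'format'='parquet');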
[jira] [Comment Edited] (CARBONDATA-3970) Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying tableProperties operation failed
[ https://issues.apache.org/jira/browse/CARBONDATA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209365#comment-17209365 ] SHREELEKHYA GAMPA edited comment on CARBONDATA-3970 at 10/21/20, 8:07 AM: -- Hi [~sushantsam], Could you provide the Spark configurations that are set, particularly those related to the metastore, and the complete stack trace of the error? Please ensure carbon extensions are configured in spark-defaults.conf, e.g. 'spark.sql.extensions=org.apache.spark.sql.CarbonExtensions'. was (Author: shreelekhya): Hi [~sushantsam], Could you please provide the Spark configurations that are set, particularly those related to the metastore, and the complete stack trace of the error? > Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed > -- > > Key: CARBONDATA-3970 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3970 > Project: CarbonData > Issue Type: Bug > Components: data-query, hive-integration >Affects Versions: 2.0.1 > Environment: CarbonData 2.0.1 with Spark 2.4.5 >Reporter: Sushant Sammanwar >Priority: Major > > Hi, > > I am facing issues with materialized views - the query is not hitting the > view in the explain plan. I would really appreciate if you could help me on > this. > Below are the details: > I am using Spark shell to connect to Carbon 2.0.1 using Spark 2.4.5. > The underlying table has data loaded. > I think the problem is while creating the materialized view, as I am getting an error > related to the metastore. > > > scala> carbon.sql("create MATERIALIZED VIEW agg_sales_mv as select country, > sex,sum(quantity),avg(price) from sales group by country,sex").show() > 20/08/26 01:04:41 AUDIT audit: \{"time":"August 26, 2020 1:04:41 AM > IST","username":"root","opName":"CREATE MATERIALIZED > VIEW","opId":"16462372696035311","opStatus":"START"} > 20/08/26 01:04:45 AUDIT audit: \{"time":"August 26, 2020 1:04:45 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377160819798","opStatus":"START"} > 20/08/26 01:04:46 AUDIT audit: \{"time":"August 26, 2020 1:04:46 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377696791275","opStatus":"START"} > 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377696791275","opStatus":"SUCCESS","opTime":"2326 > ms","table":"NA","extraInfo":{}} > 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377160819798","opStatus":"SUCCESS","opTime":"2955 > ms","table":"default.agg_sales_mv","extraInfo":{"local_dictionary_threshold":"1","bad_record_path":"","table_blocksize":"1024","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","carbon.column.compressor":"snappy","mv_related_tables":"sales"}} > 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed: > org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener > 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed: > org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener > 20/08/26 01:04:51 AUDIT audit: \{"time":"August 26, 2020 1:04:51 AM > IST","username":"root","opName":"CREATE MATERIALIZED > VIEW","opId":"16462372696035311","opStatus":"SUCCESS","opTime":"10551 >
ms","table":"NA","extraInfo":{"mvName":"agg_sales_mv"}} > ++ > || > ++ > ++ > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4037) Improve the table status and segment file writing
SHREELEKHYA GAMPA created CARBONDATA-4037: - Summary: Improve the table status and segment file writing Key: CARBONDATA-4037 URL: https://issues.apache.org/jira/browse/CARBONDATA-4037 Project: CarbonData Issue Type: Improvement Reporter: SHREELEKHYA GAMPA Currently, we update the table status and segment files multiple times for a single IUD/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, a user query may try to access segment index files that are no longer present, which is an availability issue. * Instead of deleting carbon index files immediately after merge, delete index files only when the clean files command is executed, and delete only those that have existed for more than 1 hour. * Generate the segment file after merge index, and update the table status at the beginning and after merge index. Order: create table status file => index files => merge index => generate segment file => update table status -- This message was sent by Atlassian Jira (v8.3.4#803005)
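The clean files command referenced in the first point is existing CarbonData DML; a minimal usage sketch (table name illustrative):

-- under the proposal, stale index files are removed here rather than immediately after merge
CLEAN FILES FOR TABLE default.fact_tab;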
[jira] [Commented] (CARBONDATA-3903) Documentation Issue in Github Docs Link https://github.com/apache/carbondata/tree/master/docs
[ https://issues.apache.org/jira/browse/CARBONDATA-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212872#comment-17212872 ] SHREELEKHYA GAMPA commented on CARBONDATA-3903: --- Made changes in the UPDATE/DELETE section as suggested and added a compaction hyperlink in the DML section of the language manual. Other information is either already present in other documents or not necessary. > Documentation Issue in Github Docs Link > https://github.com/apache/carbondata/tree/master/docs > -- > > Key: CARBONDATA-3903 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3903 > Project: CarbonData > Issue Type: Bug > Components: docs >Affects Versions: 2.0.1 > Environment: https://github.com/apache/carbondata/tree/master/docs >Reporter: PURUJIT CHAUGULE >Priority: Minor > > dml-of-carbondata.md > LOAD DATA: > * Mention Each Load is considered as a Segment. > * Give all possible options for SORT_SCOPE like > GLOBAL_SORT/LOCAL_SORT/NO_SORT (with explanation of difference between each > type). > * Add Example Of complete Load query with/without use of OPTIONS. > INSERT DATA: > * Mention each insert is a Segment. > LOAD Using Static/Dynamic Partitioning: > * Can give a hyperlink to Static/Dynamic partitioning. > UPDATE/DELETE: > * Mention about delta files concept in update and delete. > DELETE: > * Add example for deletion of all records from a table (delete from > tablename). > COMPACTION: > * Can mention Minor compaction of two types Auto and Manual( > carbon.auto.load.merge =true/false), and that if > carbon.auto.load.merge=false, trigger should be done manually. > * Hyperlink to Configurable properties of Compaction. > * Mention that compacted segments do not get cleaned automatically and > should be triggered manually using clean files. > > flink-integration-guide.md > * Mention what are stages, how is it used. > * Process of insertion, deletion of stages in carbontable. (How is it stored > in carbontable). > > language-manual.md > * Mention Compaction Hyperlink in DML section. > > spatial-index-guide.md > * Mention the TBLPROPERTIES supported / not supported for Geo table. > * Mention Spatial Index does not make a new column. > * CTAS from one geo table to another does not create another Geo table can > be mentioned. > * Mention that a certain combination of Spatial Index table properties need > to be added in create table, without which a geo table does not get created. > * Mention that we cannot alter columns (change datatype, change name, drop) > mentioned in spatial_index. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CARBONDATA-3970) Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying tableProperties operation failed
[ https://issues.apache.org/jira/browse/CARBONDATA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209365#comment-17209365 ] SHREELEKHYA GAMPA commented on CARBONDATA-3970: --- Hi [~sushantsam], Could you please provide the Spark configurations that are set, particularly those related to the metastore, and the complete stack trace of the error? > Carbondata 2.0.1 MV ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed > -- > > Key: CARBONDATA-3970 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3970 > Project: CarbonData > Issue Type: Bug > Components: data-query, hive-integration >Affects Versions: 2.0.1 > Environment: CarbonData 2.0.1 with Spark 2.4.5 >Reporter: Sushant Sammanwar >Priority: Major > > Hi, > > I am facing issues with materialized views - the query is not hitting the > view in the explain plan. I would really appreciate if you could help me on > this. > Below are the details: > I am using Spark shell to connect to Carbon 2.0.1 using Spark 2.4.5. > The underlying table has data loaded. > I think the problem is while creating the materialized view, as I am getting an error > related to the metastore. > > > scala> carbon.sql("create MATERIALIZED VIEW agg_sales_mv as select country, > sex,sum(quantity),avg(price) from sales group by country,sex").show() > 20/08/26 01:04:41 AUDIT audit: \{"time":"August 26, 2020 1:04:41 AM > IST","username":"root","opName":"CREATE MATERIALIZED > VIEW","opId":"16462372696035311","opStatus":"START"} > 20/08/26 01:04:45 AUDIT audit: \{"time":"August 26, 2020 1:04:45 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377160819798","opStatus":"START"} > 20/08/26 01:04:46 AUDIT audit: \{"time":"August 26, 2020 1:04:46 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377696791275","opStatus":"START"} > 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377696791275","opStatus":"SUCCESS","opTime":"2326 > ms","table":"NA","extraInfo":{}} > 20/08/26 01:04:48 AUDIT audit: \{"time":"August 26, 2020 1:04:48 AM > IST","username":"root","opName":"CREATE > TABLE","opId":"16462377160819798","opStatus":"SUCCESS","opTime":"2955 > ms","table":"default.agg_sales_mv","extraInfo":{"local_dictionary_threshold":"1","bad_record_path":"","table_blocksize":"1024","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","carbon.column.compressor":"snappy","mv_related_tables":"sales"}} > 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed: > org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener > 20/08/26 01:04:50 ERROR CarbonInternalMetastore$: Adding/Modifying > tableProperties operation failed: > org.apache.spark.sql.hive.HiveExternalCatalog cannot be cast to > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener > 20/08/26 01:04:51 AUDIT audit: \{"time":"August 26, 2020 1:04:51 AM > IST","username":"root","opName":"CREATE MATERIALIZED > VIEW","opId":"16462372696035311","opStatus":"SUCCESS","opTime":"10551 > ms","table":"NA","extraInfo":{"mvName":"agg_sales_mv"}} > ++ > || > ++ > ++ > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-3972) Date/timestamp compatibility between hive and carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA closed CARBONDATA-3972. - Resolution: Invalid > Date/timestamp compatibility between hive and carbon > - > > Key: CARBONDATA-3972 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3972 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > To ensure that the date/timestamp values supported by Hive are also supported > by Carbon. > Ex: -01-01 is accepted by Hive as a valid record and converted to > 0001-01-01. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4023) Create MV failed on table with geospatial index
SHREELEKHYA GAMPA created CARBONDATA-4023: - Summary: Create MV failed on table with geospatial index Key: CARBONDATA-4023 URL: https://issues.apache.org/jira/browse/CARBONDATA-4023 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Create MV failed on a table with geospatial index using carbonsession, failing with java.lang.ClassNotFoundException: org.apache.carbondata.geo.geohashindex -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4005) SI with cache level blocklet issue
SHREELEKHYA GAMPA created CARBONDATA-4005: - Summary: SI with cache level blocklet issue Key: CARBONDATA-4005 URL: https://issues.apache.org/jira/browse/CARBONDATA-4005 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA A select query on the SI column returns a blank result set after changing the cache level to blocklet. PR: https://github.com/apache/carbondata/pull/3951 -- This message was sent by Atlassian Jira (v8.3.4#803005)
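The cache level is switched through a documented table property; a hedged sketch of the steps around the failure (table and column names illustrative):

ALTER TABLE maintable SET TBLPROPERTIES('CACHE_LEVEL'='BLOCKLET');
-- filter on the SI column: matching rows exist, but a blank result set was returned before the fix
select * from maintable where si_col = 'x';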
[jira] [Closed] (CARBONDATA-3952) After reset query not hitting MV
[ https://issues.apache.org/jira/browse/CARBONDATA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA closed CARBONDATA-3952. - Resolution: Fixed > After reset query not hitting MV > > > Key: CARBONDATA-3952 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3952 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > After reset, the query is not hitting the MV. > With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and > the databaseLocation will change to the old table path format. So, new tables > that are created after reset take a different path in the default case. > Closing this, as it is identified as a Spark bug. More details can be found at > https://issues.apache.org/jira/browse/SPARK-31234 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3952) After reset query not hitting MV
[ https://issues.apache.org/jira/browse/CARBONDATA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3952: -- Description: After reset, the query is not hitting the MV. With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and the databaseLocation will change to the old table path format. So, new tables that are created after reset take a different path in the default case. Closing this, as it is identified as a Spark bug. More details can be found at https://issues.apache.org/jira/browse/SPARK-31234 was: After reset, the query is not hitting the MV. With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and the databaseLocation will change to the old table path format. So, new tables that are created after reset take a different path in the default case. Closing this PR, as it is identified as a Spark bug. More details can be found at https://issues.apache.org/jira/browse/SPARK-31234 > After reset query not hitting MV > > > Key: CARBONDATA-3952 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3952 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > After reset, the query is not hitting the MV. > With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and > the databaseLocation will change to the old table path format. So, new tables > that are created after reset take a different path in the default case. > Closing this, as it is identified as a Spark bug. More details can be found at > https://issues.apache.org/jira/browse/SPARK-31234 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3952) After reset query not hitting MV
[ https://issues.apache.org/jira/browse/CARBONDATA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3952: -- Description: After reset, the query is not hitting the MV. With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and the databaseLocation will change to the old table path format. So, new tables that are created after reset take a different path in the default case. Closing this PR, as it is identified as a Spark bug. More details can be found at https://issues.apache.org/jira/browse/SPARK-31234 was: After reset query not hitting MV > After reset query not hitting MV > > > Key: CARBONDATA-3952 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3952 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > After reset, the query is not hitting the MV. > With the reset, spark.sql.warehouse.dir and carbonStorePath don't match, and > the databaseLocation will change to the old table path format. So, new tables > that are created after reset take a different path in the default case. > Closing this PR, as it is identified as a Spark bug. More details can be found > at https://issues.apache.org/jira/browse/SPARK-31234 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3983) SI compatibility issue
SHREELEKHYA GAMPA created CARBONDATA-3983: - Summary: SI compatibility issue Key: CARBONDATA-3983 URL: https://issues.apache.org/jira/browse/CARBONDATA-3983 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA A read from a main table having SI returns an empty result set when the SI is stored with the old tuple id storage format. Bug id: BUG2020090205414 PR link: https://github.com/apache/carbondata/pull/3922 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified
[ https://issues.apache.org/jira/browse/CARBONDATA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3980: -- Description: When the partition column is loaded with a bad record value, the load fails with a 'Job aborted' message in the cluster. However, in the complete stack trace, we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 PR link: https://github.com/apache/carbondata/pull/3919 was: When the partition column is loaded with a bad record value, the load fails with a 'Job aborted' message in the cluster. However, in the complete stack trace, we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 Remaining Estimate: (was: 0h) > Load fails with aborted exception when Bad records action is unspecified > > > Key: CARBONDATA-3980 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3980 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 10m > > When the partition column is loaded with a bad record value, the load fails with > a 'Job aborted' message in the cluster. However, in the complete stack trace, > we can see the actual error message. ('Data load failed due to bad record: The value > with column name projectjoindate and column data type TIMESTAMP is not a > valid TIMESTAMP type') > Bug id: BUG2020082802430 > PR link: https://github.com/apache/carbondata/pull/3919 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified
SHREELEKHYA GAMPA created CARBONDATA-3980: - Summary: Load fails with aborted exception when Bad records action is unspecified Key: CARBONDATA-3980 URL: https://issues.apache.org/jira/browse/CARBONDATA-3980 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA When the partition column is loaded with a bad record value, the load fails with a 'Job aborted' message in the cluster. However, in the complete stack trace, we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 -- This message was sent by Atlassian Jira (v8.3.4#803005)
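For reference, the bad records action is normally chosen per load; a hedged sketch of how the option is passed when it is specified (path and table name illustrative):

-- when BAD_RECORDS_ACTION is left unspecified, the default action applies and the load
-- should surface the bad-record cause rather than a bare 'Job aborted' message
LOAD DATA INPATH 'hdfs://hacluster/data/part_data.csv' into table part_tab OPTIONS('BAD_RECORDS_ACTION'='FORCE');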
[jira] [Created] (CARBONDATA-3979) Added Hive local dictionary support example
SHREELEKHYA GAMPA created CARBONDATA-3979: - Summary: Added Hive local dictionary support example Key: CARBONDATA-3979 URL: https://issues.apache.org/jira/browse/CARBONDATA-3979 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA To verify local dictionary support in Hive for carbon tables created from Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
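A hedged sketch of the kind of table such an example verifies, using the documented local-dictionary table properties (table and column names illustrative):

-- created from Spark, then read from Hive to confirm that local-dictionary-encoded columns decode correctly
CREATE TABLE hive_local_dict (id INT, name STRING) STORED AS carbondata TBLPROPERTIES ('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_INCLUDE'='name');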
[jira] [Created] (CARBONDATA-3972) Date/timestamp compatibility between hive and carbon
SHREELEKHYA GAMPA created CARBONDATA-3972: - Summary: Date/timestamp compatibility between hive and carbon Key: CARBONDATA-3972 URL: https://issues.apache.org/jira/browse/CARBONDATA-3972 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA To ensure that the date/timestamp values supported by Hive are also supported by Carbon. Ex: -01-01 is accepted by Hive as a valid record and converted to 0001-01-01. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3955) Fix load failures due to daylight saving time changes
SHREELEKHYA GAMPA created CARBONDATA-3955: - Summary: Fix load failures due to daylight saving time changes Key: CARBONDATA-3955 URL: https://issues.apache.org/jira/browse/CARBONDATA-3955 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA 1) Fix load failures due to daylight saving time changes. 2) During load, date/timestamp values whose year has more than 4 digits should fail or be null according to the bad records action property. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3952) After reset query not hitting MV
SHREELEKHYA GAMPA created CARBONDATA-3952: - Summary: After reset query not hitting MV Key: CARBONDATA-3952 URL: https://issues.apache.org/jira/browse/CARBONDATA-3952 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA After reset query not hitting MV -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3943) Handling the addition of geo column to hive at the time of table creation
SHREELEKHYA GAMPA created CARBONDATA-3943: - Summary: Handling the addition of geo column to hive at the time of table creation Key: CARBONDATA-3943 URL: https://issues.apache.org/jira/browse/CARBONDATA-3943 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA Handling the addition of geo column to hive at the time of table creation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3943) Handling the addition of geo column to hive at the time of table creation
[ https://issues.apache.org/jira/browse/CARBONDATA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3943: -- Priority: Minor (was: Major) > Handling the addition of geo column to hive at the time of table creation > -- > > Key: CARBONDATA-3943 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3943 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > > Handling the addition of geo column to hive at the time of table creation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3913) Table level timestamp support
SHREELEKHYA GAMPA created CARBONDATA-3913: - Summary: Table level timestamp support Key: CARBONDATA-3913 URL: https://issues.apache.org/jira/browse/CARBONDATA-3913 Project: CarbonData Issue Type: New Feature Reporter: SHREELEKHYA GAMPA To support the timestamp format at the table level. The priority of the timestamp format, from highest to lowest, is: 1. Load command options 2. Table level properties 3. Configurable properties (carbon.timestamp.format) -- This message was sent by Atlassian Jira (v8.3.4#803005)
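A sketch of the three levels in the stated priority order; the load option and the system property are documented, while the exact table-property key is an assumption here (paths and table names illustrative):

-- 1. load command option (highest priority)
LOAD DATA INPATH 'hdfs://hacluster/data/t1.csv' into table t1 OPTIONS('TIMESTAMPFORMAT'='yyyy-MM-dd HH:mm:ss');
-- 2. table level property (key name assumed for illustration)
CREATE TABLE t2 (ts TIMESTAMP) STORED AS carbondata TBLPROPERTIES ('timestampformat'='yyyy-MM-dd HH:mm:ss');
-- 3. system-wide configurable property (lowest priority)
set carbon.timestamp.format=yyyy-MM-dd HH:mm:ss;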
[jira] [Updated] (CARBONDATA-3899) drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients.
[ https://issues.apache.org/jira/browse/CARBONDATA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3899: -- Description: drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients from beeline. !screenshot-1.png! was: drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients. > drop materialized view when executed concurrently from 4 concurrent clients > fails in all 4 clients. > -- > > Key: CARBONDATA-3899 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3899 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Major > Attachments: screenshot-1.png > > > drop materialized view when executed concurrently from 4 concurrent clients > fails in all 4 clients from beeline. > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3899) drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients.
[ https://issues.apache.org/jira/browse/CARBONDATA-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3899: -- Attachment: screenshot-1.png > drop materialized view when executed concurrently from 4 concurrent clients > fails in all 4 clients. > -- > > Key: CARBONDATA-3899 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3899 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Major > Attachments: screenshot-1.png > > > drop materialized view when executed concurrently from 4 concurrent clients > fails in all 4 clients. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3899) drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients.
SHREELEKHYA GAMPA created CARBONDATA-3899: - Summary: drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients. Key: CARBONDATA-3899 URL: https://issues.apache.org/jira/browse/CARBONDATA-3899 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA drop materialized view when executed concurrently from 4 concurrent clients fails in all 4 clients. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3833) Make GeoID visible to the user
[ https://issues.apache.org/jira/browse/CARBONDATA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3833: -- Description: GeoID is a column created internally for spatial tables and currently it is not visible to the users while querying. This feature is to make GeoID visible to the user. (was: Make GeoID visible to the user) > Make GeoID visible to the user > -- > > Key: CARBONDATA-3833 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3833 > Project: CarbonData > Issue Type: New Feature >Reporter: SHREELEKHYA GAMPA >Priority: Minor > > GeoID is a column created internally for spatial tables and currently it is > not visible to the users while querying. This feature is to make GeoID > visible to the user. -- This message was sent by Atlassian Jira (v8.3.4#803005)
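Once visible, the generated column can be projected like an ordinary column; a hedged sketch (table and column names are illustrative, and the generated column's actual name comes from the table's SPATIAL_INDEX property rather than being fixed as geoid):

select geoid, longitude, latitude from geo_source where geoid is not null;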
[jira] [Created] (CARBONDATA-3833) Make GeoID visible to the user
SHREELEKHYA GAMPA created CARBONDATA-3833: - Summary: Make GeoID visible to the user Key: CARBONDATA-3833 URL: https://issues.apache.org/jira/browse/CARBONDATA-3833 Project: CarbonData Issue Type: New Feature Reporter: SHREELEKHYA GAMPA Make GeoID visible to the user -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3772) Update index documents
SHREELEKHYA GAMPA created CARBONDATA-3772: - Summary: Update index documents Key: CARBONDATA-3772 URL: https://issues.apache.org/jira/browse/CARBONDATA-3772 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA PR: [https://github.com/apache/carbondata/pull/3708] -- This message was sent by Atlassian Jira (v8.3.4#803005)