[jira] [Resolved] (CARBONDATA-4262) [summer-2021] Huawei's first big data open source project
[ https://issues.apache.org/jira/browse/CARBONDATA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4262. -- Resolution: Fixed > [summer-2021] Huawei's first big data open source project > - > > Key: CARBONDATA-4262 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4262 > Project: CarbonData > Issue Type: Task > Components: docs, examples, test >Affects Versions: 2.1.0, 2.1.1 >Reporter: CHEN XIN >Assignee: CHEN XIN >Priority: Minor > Fix For: 2.1.1 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (CARBONDATA-4329) External Table Creation overwrites schema and drop external table deletes the location data
[ https://issues.apache.org/jira/browse/CARBONDATA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4329. -- Fix Version/s: 2.3.1 Resolution: Fixed > External Table Creation overwrites schema and drop external table deletes the > location data > --- > > Key: CARBONDATA-4329 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4329 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi >Priority: Major > Fix For: 2.3.1 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Issue 1: > When we create external table on transactional table location, schema file > will be present. While creating external table, which is also transactional, > the schema file is overwritten > Issue 2: > If external table is created on a location, where the source table already > exists, on drop external table, it is deleting the table data. Query on the > source table fails -- This message was sent by Atlassian Jira (v8.20.1#820001)
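A minimal SQL sketch of the two scenarios described above (table and path names are illustrative, not taken from the report):

```sql
-- Issue 1: creating an external table over an existing transactional
-- table location overwrites the schema file already present there.
CREATE TABLE source_tbl (id INT, name STRING) STORED AS carbondata;
INSERT INTO source_tbl SELECT 1, 'a';

CREATE EXTERNAL TABLE ext_tbl STORED AS carbondata
  LOCATION '/user/hive/warehouse/source_tbl';  -- schema file overwritten

-- Issue 2: dropping the external table removes the data at the location,
-- so subsequent queries on the source table fail.
DROP TABLE ext_tbl;
SELECT * FROM source_tbl;  -- fails: table data was deleted
```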
[jira] [Resolved] (CARBONDATA-4327) Update documentation related to partition
[ https://issues.apache.org/jira/browse/CARBONDATA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4327. -- Fix Version/s: 2.3.1 Resolution: Fixed > Update documentation related to partition > - > > Key: CARBONDATA-4327 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4327 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.3.1 > > Time Spent: 50m > Remaining Estimate: 0h > > Drop partition with data is not supported and a few of the links are not > working in > https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (CARBONDATA-4306) Query Performance issue with Spark 3.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4306: - Fix Version/s: 2.3.1 (was: 2.3.0) > Query Performance issue with Spark 3.1 > -- > > Key: CARBONDATA-4306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4306 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi >Priority: Major > Fix For: 2.3.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Some rules are applied many times while running benchmark queries like TPCDS > and TPCH -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (CARBONDATA-4306) Query Performance issue with Spark 3.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor closed CARBONDATA-4306. > Query Performance issue with Spark 3.1 > -- > > Key: CARBONDATA-4306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4306 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi >Priority: Major > Fix For: 2.3.1 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Some rules are applied many times while running benchmark queries like TPCDS > and TPCH -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (CARBONDATA-4318) Partition overwrite performance degrades as number of loads increase
[ https://issues.apache.org/jira/browse/CARBONDATA-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4318. -- Fix Version/s: 2.3.0 Resolution: Fixed > Partition overwrite performance degrades as number of loads increase > > > Key: CARBONDATA-4318 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4318 > Project: CarbonData > Issue Type: Improvement >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Partition overwrite performance degrades as the number of loads increase -- This message was sent by Atlassian Jira (v8.20.1#820001)
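The degradation above concerns repeated partition overwrites; a minimal sketch of the workload (table and values are illustrative):

```sql
CREATE TABLE sales (id INT, amount DOUBLE) PARTITIONED BY (dt STRING)
  STORED AS carbondata;

-- Each overwrite of the same partition adds a load; as the number of
-- historical loads grows, every subsequent overwrite slows down.
INSERT OVERWRITE TABLE sales PARTITION (dt='2021-01-01') SELECT 1, 10.0;
INSERT OVERWRITE TABLE sales PARTITION (dt='2021-01-01') SELECT 2, 20.0;
```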
[jira] [Resolved] (CARBONDATA-4319) Fixed clean files not deleting stale delete delta files after horizontal compaction
[ https://issues.apache.org/jira/browse/CARBONDATA-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4319. -- Fix Version/s: 2.3.0 Resolution: Fixed > Fixed clean files not deleting stale delete delta files after horizontal > compaction > - > > Key: CARBONDATA-4319 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4319 > Project: CarbonData > Issue Type: Improvement >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (CARBONDATA-4317) TPCDS perf issues
[ https://issues.apache.org/jira/browse/CARBONDATA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4317. -- Fix Version/s: 2.3.0 Resolution: Fixed > TPCDS perf issues > - > > Key: CARBONDATA-4317 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4317 > Project: CarbonData > Issue Type: Improvement >Reporter: Indhumathi Muthumurugesh >Priority: Major > Fix For: 2.3.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > The following issues have degraded TPCDS query performance: > # If a dynamic filter is not present in the partitionFilters set, that > filter is skipped and not pushed down to Spark. > # In some cases, some nodes like Exchange / Shuffle are not reused, because > the CarbonDataSourceScan plan is not matched. > # Accessing the metadata on the canonicalized plan throws an NPE. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (CARBONDATA-4316) Horizontal compaction fails for partition table
[ https://issues.apache.org/jira/browse/CARBONDATA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4316. -- Resolution: Fixed > Horizontal compaction fails for partition table > --- > > Key: CARBONDATA-4316 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4316 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When a delete operation is performed on a partition table, horizontal compaction > fails, leading to a lot of small delete delta files and impacting query performance. -- This message was sent by Atlassian Jira (v8.20.1#820001)
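A minimal repro sketch for the scenario described (names are illustrative; horizontal compaction of delete deltas is triggered internally after deletes):

```sql
CREATE TABLE part_tbl (id INT, name STRING) PARTITIONED BY (dept STRING)
  STORED AS carbondata;
INSERT INTO part_tbl SELECT 1, 'a', 'cse';
INSERT INTO part_tbl SELECT 2, 'b', 'cse';

-- Each delete writes a delete delta file; on a partition table the
-- horizontal compaction that should merge them fails, so small delta
-- files accumulate and degrade query performance.
DELETE FROM part_tbl WHERE id = 1;
DELETE FROM part_tbl WHERE id = 2;
```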
[jira] [Resolved] (CARBONDATA-4305) Support Carbondata Streamer tool to fetch data incrementally and merge
[ https://issues.apache.org/jira/browse/CARBONDATA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4305. -- Fix Version/s: 2.3.0 Resolution: Fixed > Support Carbondata Streamer tool to fetch data incrementally and merge > -- > > Key: CARBONDATA-4305 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4305 > Project: CarbonData > Issue Type: Sub-task >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.3.0 > > Time Spent: 16h 10m > Remaining Estimate: 0h > > Support a Spark streaming application that fetches new incremental > data from sources like Kafka and DFS, deduplicates it, and merges the changes onto the target carbondata table. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (CARBONDATA-4296) Handle schema evolution, enforcement and deduplication
[ https://issues.apache.org/jira/browse/CARBONDATA-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4296. -- Fix Version/s: 2.3.0 Resolution: Fixed > Handle schema evolution, enforcement and deduplication > -- > > Key: CARBONDATA-4296 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4296 > Project: CarbonData > Issue Type: Sub-task > Components: data-load >Reporter: Pratyaksh Sharma >Priority: Major > Fix For: 2.3.0 > > Time Spent: 18h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (CARBONDATA-4306) Query Performance issue with Spark 3.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4306. -- Fix Version/s: 2.3.0 Resolution: Fixed > Query Performance issue with Spark 3.1 > -- > > Key: CARBONDATA-4306 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4306 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi >Priority: Major > Fix For: 2.3.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Some rules are applied many times while running benchmark queries like TPCDS > and TPCH -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4293) Table without External keyword is created as external table in local mode
[ https://issues.apache.org/jira/browse/CARBONDATA-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4293. -- Fix Version/s: 2.3.0 Resolution: Fixed > Table without External keyword is created as external table in local mode > - > > Key: CARBONDATA-4293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4293 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4203) Compaction of SDK segments added through add segment causes compaction issues after update and delete operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4203. -- Fix Version/s: 2.3.0 Resolution: Fixed > Compaction of SDK segments added through add segment causes compaction issues after update and > delete operations. > - > > Key: CARBONDATA-4203 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4203 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.1.1 > Environment: FI cluster - 3 node >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.3.0 > > Attachments: primitive- SDK files.rar > > > Compaction of SDK segments added through add segment causes compaction > issues after update and delete operations. The issue occurs only when delete > and update happen on one of the added segments; it is not seen > without a delete and update on a segment. > Place the attached SDK files in the > /sdkfiles/primitive/, /sdkfiles/primitive2/, > /sdkfiles/primitive3/, /sdkfiles/primitive4/ and /sdkfiles/primitive5/ folders > in HDFS and then execute the below queries.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> insert into external_primitive select 1,"Pr",1,10,true,"1992-12-09","1992-10-07 22:00:20.0","chennai","CSE";
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
> delete from external_primitive where id =2;
> update external_primitive set (name)=("RAMU") where name="CCC";
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
> alter table external_primitive compact 'minor';
>
> Error traces:
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
> at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.spark.sql.AnalysisException: Compaction failed. Please > check log
[jira] [Resolved] (CARBONDATA-4228) Deleted records are reappearing in the select queries from the Alter added carbon segments after delete, update operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4228. -- Fix Version/s: 2.3.0 Resolution: Fixed > Deleted records are reappearing in the select queries from the Alter added > carbon segments after delete, update operations. > - > > Key: CARBONDATA-4228 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4228 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1 >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.3.0 > > Time Spent: 6h > Remaining Estimate: 0h > > Deleted records are not removed and still appear in select queries from > the Alter added carbon segments after delete and update operations. > Test queries: > drop table uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > --Create a copy of files from the first seg; > --hdfs dfs -rm -r -f /uniq1/*; > --hdfs dfs -mkdir -p /uniq1/ > --hdfs dfs -cp /user/hive/warehouse/carbon.store/rps/uniqdata/Fact/Part0/Segment_0/* /uniq1/; > --hdfs dfs -ls /uniq1/; > use rps; > Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniq1/','format'='carbon'); > --update and delete work fine without throwing an error, but they won't work on the added carbon segments; > delete from uniqdata where cust_id=9001; > update uniqdata set (cust_name)=('Rahu') where cust_id=1; > set carbon.input.segments.rps.uniqdata=1; > 
--First segment represents the added segment;
> select cust_name from uniqdata where cust_id=1; --CUST_NAME_01000 - incorrect, value should be Rahu;
> select count(*) from uniqdata where cust_id=9001; --returns 1 - incorrect, should be 0 as 9001 cust_id records are deleted through Delete DDL;
> reset;
>
> Console:
> > Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniq1/','format'='carbon');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.226 seconds)
>
> > delete from uniqdata where cust_id=9001;
> INFO : Execution ID: 139
> +--------------------+
> | Deleted Row Count  |
> +--------------------+
> | 2                  |
> +--------------------+
> 1 row selected (5.321 seconds)
>
> > update uniqdata set (cust_name)=('Rahu') where cust_id=1;
> INFO : Execution ID: 142
> +--------------------+
> | Updated Row Count  |
> +--------------------+
> | 2                  |
> +--------------------+
> 1 row selected (7.938 seconds)
>
> > set carbon.input.segments.rps.uniqdata=1;
> +--------------------------------------+--------+
> | key                                  | value  |
> +--------------------------------------+--------+
> | carbon.input.segments.rps.uniqdata   | 1      |
> +--------------------------------------+--------+
> 1 row selected (0.05 seconds)
>
> > --First segment represents the added segment;
> > select cust_name from uniqdata where cust_id=1; --CUST_NAME_01000 - incorrect, value should be Rahu;
> INFO : Execution ID: 147
> +------------------+
> | cust_name        |
> +------------------+
> | CUST_NAME_01000  |
> +------------------+
> 1 row selected (0.93 seconds)
>
> > select count(*) from uniqdata where cust_id=9001; --returns 1 - incorrect, should be 0 as 9001 cust_id records are deleted through Delete DDL;
> INFO : Execution ID: 148
> +-----------+
> | count(1)  |
> +-----------+
> | 1         |
> +-----------+
> 1 row selected (1.149 seconds) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4289) Wrong cache value showed after firing concurrent select queries to index server
[ https://issues.apache.org/jira/browse/CARBONDATA-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4289. -- Fix Version/s: 2.3.0 Resolution: Fixed > Wrong cache value showed after firing concurrent select queries to index > server > --- > > Key: CARBONDATA-4289 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4289 > Project: CarbonData > Issue Type: Bug >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.3.0 > > > Steps to reproduce: > Start the index server > Fire 8 loads concurrently from different spark-sql sessions to the same index server > Show metacache shows extra segments in the index server cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4288) Index Server loading duplicate cache to other executors in the case of SI table
[ https://issues.apache.org/jira/browse/CARBONDATA-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4288. -- Fix Version/s: 2.3.0 Resolution: Fixed > Index Server loading duplicate cache to other executors in the case of SI > table > --- > > Key: CARBONDATA-4288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4288 > Project: CarbonData > Issue Type: Bug >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Steps to reproduce: > Start index server > Disable prepriming > Create main table > create SI table > Load to main table > Cache in index server has 1 entry even if prepriming is disabled > do select * on main table > Show metacache shows 2/1 cache in the Index server -- This message was sent by Atlassian Jira (v8.3.4#803005)
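The reproduction steps listed above can be sketched as SQL (names are illustrative; `CREATE INDEX ... AS 'carbondata'` is the secondary-index syntax in CarbonData 2.x):

```sql
CREATE TABLE main_tbl (id INT, name STRING) STORED AS carbondata;
CREATE INDEX si_name ON TABLE main_tbl (name) AS 'carbondata';

-- with prepriming disabled, the index server should hold no entry yet,
-- but the cache already shows one entry after this load
LOAD DATA INPATH 'hdfs://hacluster/data/main.csv' INTO TABLE main_tbl;

SELECT * FROM main_tbl;
SHOW METACACHE;  -- reports 2/1 cache: a duplicate entry for the SI table
```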
[jira] [Resolved] (CARBONDATA-4285) Compaction fails for complex columns with global sort
[ https://issues.apache.org/jira/browse/CARBONDATA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4285. -- Fix Version/s: 2.3.0 Resolution: Fixed
> Compaction fails for complex columns with global sort
> -----
>
> Key: CARBONDATA-4285
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4285
> Project: CarbonData
> Issue Type: Bug
> Reporter: Mahesh Raju Somalaraju
> Priority: Major
> Fix For: 2.3.0
>
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> Compaction fails for complex columns with global sort.
>
> Steps to reproduce:
> 1) create table with global sort
> 2) load the data multiple times
> 3) alter add columns
> 4) insert the data
> 5) repeat 3 and 4 four times
> 6) execute the compaction
>
> test("test the complex columns with global sort compaction") {
>   sql("DROP TABLE IF EXISTS alter_global1")
>   sql("CREATE TABLE alter_global1(intField INT) STORED AS carbondata " +
>     "TBLPROPERTIES('sort_columns'='intField','sort_scope'='global_sort')")
>   sql("insert into alter_global1 values(1)")
>   sql("insert into alter_global1 values(2)")
>   sql("insert into alter_global1 values(3)")
>   sql("ALTER TABLE alter_global1 ADD COLUMNS(str1 array<int>)")
>   sql("insert into alter_global1 values(4, array(1))")
>   checkAnswer(sql("select * from alter_global1"),
>     Seq(Row(1, null), Row(2, null), Row(3, null), Row(4, make(Array(1)))))
>   val addedColumns = addedColumnsInSchemaEvolutionEntry("alter_global1")
>   assert(addedColumns.size == 1)
>   sql("alter table alter_global1 compact 'minor'")
>   checkAnswer(sql("select * from alter_global1"),
>     Seq(Row(1, null), Row(2, null), Row(3, null), Row(4, make(Array(1)))))
>   sql("DROP TABLE IF EXISTS alter_global1")
> }
>
> test("test the multi-level complex columns with global sort compaction") {
>   sql("DROP TABLE IF EXISTS alter_global2")
>   sql("CREATE TABLE alter_global2(intField INT) STORED AS carbondata " +
>     "TBLPROPERTIES('sort_columns'='intField','sort_scope'='global_sort')")
>   sql("insert into alter_global2 values(1)")
>   // multi-level nested array
>   sql("ALTER TABLE alter_global2 ADD COLUMNS(arr1 array<array<int>>, " +
>     "arr2 array<struct<a1:string, map1:Map<string,string>>>)")
>   sql("insert into alter_global2 values(1, array(array(1,2)), " +
>     "array(named_struct('a1','st','map1', map('a','b'))))")
>   // multi-level nested struct
>   sql("ALTER TABLE alter_global2 ADD COLUMNS(struct1 struct<s1:string, arr:array<int>>, " +
>     "struct2 struct<num:double, contact:map<string,array<int>>>)")
>   sql("insert into alter_global2 values(1, " +
>     "array(array(1,2)), array(named_struct('a1','st','map1', map('a','b'))), " +
>     "named_struct('s1','hi','arr',array(1,2)), " +
>     "named_struct('num',2.3,'contact',map('ph',array(1,2))))")
>   // multi-level nested map
>   sql("ALTER TABLE alter_global2 ADD COLUMNS(map1 map<string,array<string>>, " +
>     "map2 map<string,struct<d:int,s:struct<im:string>>>)")
>   sql("insert into alter_global2 values(1, " +
>     "array(array(1,2)), array(named_struct('a1','st','map1', map('a','b'))), " +
>     "named_struct('s1','hi','arr',array(1,2)), " +
>     "named_struct('num',2.3,'contact',map('ph',array(1,2))), " +
>     "map('a',array('hi')), " +
>     "map('a',named_struct('d',23,'s',named_struct('im','sh'))))")
>   val addedColumns = addedColumnsInSchemaEvolutionEntry("alter_global2")
>   assert(addedColumns.size == 6)
>   sql("alter table alter_global2 compact 'minor'")
>   sql("DROP TABLE IF EXISTS alter_global2")
> }
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4284) Load/insert after alter add column on partition table with complex column fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4284. -- Fix Version/s: 2.3.0 Resolution: Fixed > Load/insert after alter add column on partition table with complex column > fails > > > Key: CARBONDATA-4284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4284 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Insert after alter add column on partition table with complex column fails > with bufferUnderFlowException > [Steps] :- > drop table if exists strarmap1; create table strarmap1(id int,str > struct>,arr > array>) PARTITIONED BY(name string) stored as > carbondata > tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,str,arr'); > load data inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table > strarmap1 partition(name='name0') > options('fileheader'='id,name,str,arr','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE'); > select * from strarmap1 limit 1; show partitions strarmap1; ALTER TABLE > strarmap1 ADD COLUMNS(map1 Map, map2 Map, map3 > Map, map4 Map, map5 > Map,map6 Map,map7 map>, > map8 map>>); load data inpath > 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 > partition(name='name0') > options('fileheader'='id,name,str,arr,map1,map2,map3,map4,map5,map6,map7,map8','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE'); > [Expected Result] :- load after add map columns on partition table should be > success > [Actual Issue]:- error on load after add map columns on partition table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4271) Support DPP for carbon filters
[ https://issues.apache.org/jira/browse/CARBONDATA-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4271. -- Fix Version/s: 2.3.0 Resolution: Fixed > Support DPP for carbon filters > -- > > Key: CARBONDATA-4271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4271 > Project: CarbonData > Issue Type: Sub-task >Reporter: Indhumathi >Priority: Major > Fix For: 2.3.0 > > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4274) Create partition table error with spark 3.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4274. -- Fix Version/s: 2.3.0 Resolution: Fixed > Create partition table error with spark 3.1 > > > Key: CARBONDATA-4274 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4274 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Major > Fix For: 2.3.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > With spark 3.1, we can create a partition table by giving partition columns > from the schema, as in the example below: > {{create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as > carbondata partitioned by (v2,c2)}} > When the table is created by SparkSession with CarbonExtension, the catalog table > is created with the specified partitions. > But in a cluster / with carbon session, when we create a partition table with the > above syntax, it creates a normal table with no partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4251) optimize clean index file performance
[ https://issues.apache.org/jira/browse/CARBONDATA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4251: - Fix Version/s: (was: 2.2.1) 2.2.0 > optimize clean index file performance > - > > Key: CARBONDATA-4251 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4251 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.2.0 >Reporter: Jiayu Shen >Priority: Minor > Fix For: 2.2.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > When clean files cleans up data, it cleans up all the carbonindex and > carbonmergeindex files that once existed, even though many carbonindex files have > already been deleted after being merged into carbonmergeindex files. Considering that > tens of thousands of carbonindex files may have once existed after the > completion of compaction, the clean files command can take several hours. > Here, we just need to clean up the files that still exist, whether carbonmergeindex or > carbonindex files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
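For reference, a sketch of the command whose performance is being optimized (database and table names are illustrative):

```sql
-- Clean stale data for a table; prior to this fix, it iterated over every
-- carbonindex file that had ever existed, including those already merged
-- into carbonmergeindex files, which could take hours.
CLEAN FILES FOR TABLE db_name.table_name;
```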
[jira] [Resolved] (CARBONDATA-4204) When the path is empty in Carbon add segments then "String Index out of range" error is thrown.
[ https://issues.apache.org/jira/browse/CARBONDATA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4204. -- Fix Version/s: 2.2.0 Resolution: Fixed > When the path is empty in Carbon add segments then "String Index out of > range" error is thrown. > --- > > Key: CARBONDATA-4204 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4204 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.1.1 > Environment: 3 node FI cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.2.0 > > Time Spent: 12.5h > Remaining Estimate: 0h > > Test queries: > CREATE TABLE uniqdata(cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > Alter table uniqdata add segment options ('path'='','format'='carbon'); > -- > Error: org.apache.hive.service.cli.HiveSQLException: Error running query: > java.lang.StringIndexOutOfBoundsException: String index out of range: -1 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -1 > at java.lang.String.charAt(String.java:658) > at > org.apache.spark.sql.execution.command.management.CarbonAddLoadCommand.processMetadata(CarbonAddLoadCommand.scala:93) > at > org.apache.spark.sql.execution.command.MetadataCommand.$anonfun$run$1(package.scala:137) > at > org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118) > at > org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114) > at > org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134) > at > 
org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:80) > at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNe
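The stack trace above bottoms out in String.charAt inside CarbonAddLoadCommand.processMetadata: with an empty 'path' option, an index of length - 1 becomes -1. A minimal sketch of the kind of guard that prevents this, with illustrative names (AddSegmentPathCheck, normalize) that are not CarbonData's actual API:

```java
public class AddSegmentPathCheck {
    // Hypothetical guard (not the actual CarbonData fix): reject an empty
    // 'path' option before inspecting its last character, which is where
    // charAt(length - 1) turns into charAt(-1) for the empty string.
    static String normalize(String path) {
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException("ADD SEGMENT requires a non-empty 'path' option");
        }
        // safe now: the string has at least one character
        if (path.charAt(path.length() - 1) == '/') {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("hdfs://hacluster/user/prasanna/seg0/"));
    }
}
```

With such a check, the empty-path statement fails with a clear message instead of a StringIndexOutOfBoundsException.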
[jira] [Resolved] (CARBONDATA-4231) On update operation with 3.1v, cloned spark session is used and set properties are lost.
[ https://issues.apache.org/jira/browse/CARBONDATA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4231. -- Fix Version/s: 2.2.0 Resolution: Fixed > On update operation with 3.1v, cloned spark session is used and set > properties are lost. > > > Key: CARBONDATA-4231 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4231 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Major > Fix For: 2.2.0 > > > *Update operation with bad records property fails with 3.1v.* > *[Steps to reproduce]:* > 0: jdbc:hive2://linux-221:22550/> set carbon.options.bad.records.action=force; > +++ > | key | value | > +++ > | carbon.options.bad.records.action | force | > +++ > 1 row selected (0.04 seconds) > 0: jdbc:hive2://linux-221:22550/> create table t_carbn1(item_type_cd int, > sell_price bigint, profit decimal(10,4), item_name string, update_time > timestamp) stored a > +-+ > | Result | > +-+ > +-+ > No rows selected (2.117 seconds) > 0: jdbc:hive2://linux-221:22550/> insert into t_carbn1 select 2, > 10,23.3,'Apple','2012-11-11 11:11:11'; > INFO : Execution ID: 858 > +-+ > | Segment ID | > +-+ > | 0 | > +-+ > 1 row selected (4.278 seconds) > 0: jdbc:hive2://linux-221:22550/> update t_carbn1 set (item_type_cd) = > (item_type_cd/1); > Error: org.apache.hive.service.cli.HiveSQLException: Error running query: > java.lang.RuntimeException: Update operation failed. DataLoad failure > *[Root cause]:* > On update command, persist is called and with latest 3.1 spark changes, spark > returns a cloned SparkSession from cacheManager with all specified > configurations disabled. As now its using different sparkSession for 3.1 > which is not initialized in CarbonEnv. So CarbonEnv.init is called where new > CarbonSessionInfo is created with no sessionParams. So, the properties set > were not accessible. > Spark creates cloned spark session based on following properties: > 1. 
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning > 2. spark.sql.sources.bucketing.autoBucketedScan.enabled > 3. spark.sql.adaptive.enabled > -- This message was sent by Atlassian Jira (v8.3.4#803005)
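The root-cause note above says the cloned SparkSession arrives with a fresh CarbonSessionInfo that has no sessionParams. A minimal sketch of the carry-over idea, where the names (infoBySession, sessionInfo, the string session ids) are illustrative stand-ins for CarbonEnv's per-session bookkeeping, not CarbonData code:

```java
import java.util.HashMap;
import java.util.Map;

public class SessionParams {
    // per-session property maps, keyed by an illustrative session id
    static Map<String, Map<String, String>> infoBySession = new HashMap<>();

    // When a session id is seen for the first time (e.g. a clone), copy the
    // properties of the session it was cloned from instead of starting empty,
    // so settings like carbon.options.bad.records.action survive the clone.
    static Map<String, String> sessionInfo(String sessionId, String clonedFromId) {
        return infoBySession.computeIfAbsent(sessionId, id -> {
            Map<String, String> params = new HashMap<>();
            Map<String, String> parent = infoBySession.get(clonedFromId);
            if (parent != null) params.putAll(parent);
            return params;
        });
    }

    public static void main(String[] args) {
        sessionInfo("s1", null).put("carbon.options.bad.records.action", "force");
        // the "cloned" session s2 still sees the property set on s1
        System.out.println(sessionInfo("s2", "s1").get("carbon.options.bad.records.action"));
    }
}
```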
[jira] [Updated] (CARBONDATA-4210) Support Alter change column for complex columns and fix other issues for Spark 3.1.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4210: - Fix Version/s: 2.2.0 > Support Alter change column for complex columns and fix other issues for > Spark 3.1.1 > > > Key: CARBONDATA-4210 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4210 > Project: CarbonData > Issue Type: Sub-task >Reporter: Vikram Ahuja >Priority: Major > Fix For: 2.2.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Support Alter change column for complex columns and fix other issues for > Spark 3.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4210) Support Alter change column for complex columns and fix other issues for Spark 3.1.1
[ https://issues.apache.org/jira/browse/CARBONDATA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4210. -- Resolution: Fixed > Support Alter change column for complex columns and fix other issues for > Spark 3.1.1 > > > Key: CARBONDATA-4210 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4210 > Project: CarbonData > Issue Type: Sub-task >Reporter: Vikram Ahuja >Priority: Major > Fix For: 2.2.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Support Alter change column for complex columns and fix other issues for > Spark 3.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4191) update table for primitive column not working when complex child column name and primitive column name match
[ https://issues.apache.org/jira/browse/CARBONDATA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4191. -- Fix Version/s: 2.2.0 Resolution: Fixed > update table for primitive column not working when complex child column name > and primitive column name match > > > Key: CARBONDATA-4191 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4191 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Major > Fix For: 2.2.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > > below steps to reproduce the issue: > drop table if exists update_complex; > create table update_complex (a int, b string, struct1 STRUCT c:string>) stored as carbondata; > insert into update_complex select 1,'c', named_struct('a',4,'b','d'); > update update_complex set (a)=(4); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4186) Insert query is failing when partition column is part of local sort scope.
[ https://issues.apache.org/jira/browse/CARBONDATA-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4186. -- Fix Version/s: 2.2.0 Resolution: Fixed > Insert query is failing when partition column is part of local sort scope. > -- > > Key: CARBONDATA-4186 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4186 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.2.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, when we create a table with a partition column and put the same column > as part of local sort scope, the insert query fails with an > ArrayIndexOutOfBounds exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4185) Heterogeneous format segments in carbondata documentation
[ https://issues.apache.org/jira/browse/CARBONDATA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4185. -- Fix Version/s: 2.2.0 Resolution: Fixed > Heterogeneous format segments in carbondata documentation > > > Key: CARBONDATA-4185 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4185 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Major > Fix For: 2.2.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Heterogeneous format segments in carbondata documentation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4188) Select query fails for longstring data with small table page size after alter add columns
[ https://issues.apache.org/jira/browse/CARBONDATA-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4188. -- Fix Version/s: 2.2.0 Resolution: Fixed > Select query fails for longstring data with small table page size after alter > add columns > - > > Key: CARBONDATA-4188 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4188 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.2.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Steps to reproduce: > # Create table with small page size and longstring data type. > # Load large amount of data(more than one page should be created.) > # Alter add int column on the same table. > # Select query with filter on newly added columns fails with > ArrayIndexOutOfBoundException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4162) Leverage Secondary Index till segment level with SI as datamap and SI with plan rewrite
[ https://issues.apache.org/jira/browse/CARBONDATA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4162. -- Fix Version/s: 2.2.0 Resolution: Fixed https://github.com/apache/carbondata/pull/4116 > Leverage Secondary Index till segment level with SI as datamap and SI with > plan rewrite > --- > > Key: CARBONDATA-4162 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4162 > Project: CarbonData > Issue Type: New Feature >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.2.0 > > Attachments: Support SI at segment level.pdf > > Time Spent: 4h 20m > Remaining Estimate: 0h > > *Background:* > Secondary index tables are created as indexes and managed as child tables > internally by CarbonData. In the existing architecture, if the parent (main) table and SI table don’t > have the same valid segments, we disable the SI table. Then, from the > next query onwards, we scan and prune only the parent table until we trigger > the next load or REINDEX command (as these commands bring the > parent and SI table segments back in sync). Because of this, queries take more > time to return results when SI is disabled. > *Proposed Solution:* > We plan to leverage SI up to the segment level. That is, instead > of disabling the SI table (when parent and child table segments are not in > sync), > we will do pruning on SI tables for all the valid segments (segments with > status > success, marked for update, and load partial success) and the rest of the > segments will be pruned by the parent table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
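The proposed solution above splits pruning per segment: SI-valid segments go through the SI, the rest through the parent table. A sketch of that split (illustrative only; the real implementation works on CarbonData segment objects and status metadata, not plain strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SegmentSplit {
    // Partition the main table's segments into those prunable via the SI
    // (valid in the SI table) and those that must fall back to the parent.
    static Map<String, List<String>> splitForPruning(List<String> mainSegments,
                                                     Set<String> siValidSegments) {
        List<String> viaSi = new ArrayList<>();
        List<String> viaMain = new ArrayList<>();
        for (String seg : mainSegments) {
            (siValidSegments.contains(seg) ? viaSi : viaMain).add(seg);
        }
        Map<String, List<String>> out = new HashMap<>();
        out.put("si", viaSi);
        out.put("main", viaMain);
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> r = splitForPruning(
            Arrays.asList("0", "1", "2"), new HashSet<>(Arrays.asList("0", "1")));
        System.out.println("via SI: " + r.get("si") + ", via main: " + r.get("main"));
    }
}
```

Under this scheme, a partially out-of-sync SI still prunes the segments it covers rather than being disabled outright.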
[jira] [Resolved] (CARBONDATA-4175) Issue with array_contains after altering schema for array types
[ https://issues.apache.org/jira/browse/CARBONDATA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4175. -- Fix Version/s: 2.2.0 Resolution: Fixed > Issue with array_contains after altering schema for array types > --- > > Key: CARBONDATA-4175 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4175 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Reporter: Akshay >Priority: Major > Fix For: 2.2.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > NPE on executing filter query after adding array column to the carbon table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CARBONDATA-4175) Issue with array_contains after altering schema for array types
[ https://issues.apache.org/jira/browse/CARBONDATA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342316#comment-17342316 ] Kunal Kapoor commented on CARBONDATA-4175: -- PR: https://github.com/apache/carbondata/pull/4116 > Issue with array_contains after altering schema for array types > --- > > Key: CARBONDATA-4175 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4175 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Reporter: Akshay >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > NPE on executing filter query after adding array column to the carbon table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4172) Select query having parent and child struct column in projection returns incorrect results
[ https://issues.apache.org/jira/browse/CARBONDATA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4172. -- Fix Version/s: 2.2.0 Resolution: Fixed > Select query having parent and child struct column in projection returns > incorrect results > --- > > Key: CARBONDATA-4172 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4172 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Major > Fix For: 2.2.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > struct column: col1 struct<a:int, b:int, c:string> > insert: named_struct('a',1,'b',2,'c','a') > Query : select col1,col1.a from table; > Result: > col1 col1.a > {a:1,b:null,c:null} 1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4158) Make Secondary Index as a coarse grain datamap and use secondary indexes for Presto queries
[ https://issues.apache.org/jira/browse/CARBONDATA-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4158. -- Fix Version/s: 2.2.0 Resolution: Fixed > Make Secondary Index as a coarse grain datamap and use secondary indexes for > Presto queries > --- > > Key: CARBONDATA-4158 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4158 > Project: CarbonData > Issue Type: New Feature >Reporter: Venugopal Reddy K >Priority: Minor > Fix For: 2.2.0 > > Time Spent: 13h 10m > Remaining Estimate: 0h > > *Background:* > Secondary Indexes are created as carbon tables and are managed as child > tables to the main table. And these indexes are leveraged for query pruning > via spark plan modification during optimizer/execution phases of query > execution. In order to make use of Secondary Indexes for queries from engines > other than spark like presto etc, it is not feasible to modify the engine > specific query execution plans as we desire in the current approach. It makes > Secondary Indexes not usable for presto query pruning. Thus need arises for > an engine agnostic approach to use Secondary Indexes for presto queries. > *Description:* > Current Secondary Index pruning is tightly coupled with spark because the > query plan modification is specific to the spark engine. It is hard to reuse > the solution for presto queries. Need a new solution to use secondary indexes > with Presto queries. And it shouldn’t affect the existing customer using > secondary index with spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4156) Segment min max is not written considering all blocks in a segment
[ https://issues.apache.org/jira/browse/CARBONDATA-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4156. -- Fix Version/s: 2.1.1 Resolution: Fixed > Segment min max is not written considering all blocks in a segment > -- > > Key: CARBONDATA-4156 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4156 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (CARBONDATA-4155) Create table like on table with MV fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reassigned CARBONDATA-4155: Assignee: (was: Kunal Kapoor) > CReate table like on table with MV fails > - > > Key: CARBONDATA-4155 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4155 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > steps to reproduce: > {color:#067d17}create table maintable(name string, c_code int, price int) > STORED AS carbondata;{color} > {color:#067d17}create materialized view mv_table as select name, sum(price) > from maintable group by name;{color} > {color:#067d17}create table new_Table like maintable;{color} > {color:#172b4d}Result: > {color} > 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - > org.apache.spark.sql.AnalysisException: == Spark Parser: > org.apache.spark.sql.execution.SparkSqlParser == > extraneous input 'default' expecting \{')', ','}(line 8, pos 25) > == SQL == > CREATE TABLE default.new_table > (`name` string,`c_code` int,`price` int) > USING carbondata > OPTIONS ( > indexexists "false", > sort_columns "", > comment "", > relatedmvtablesmap "\{"default":["mv_table"]}", > -^^^ > bad_record_path "", > local_dictionary_enable "true", > indextableexists "false", > tableName "new_table", > dbName "default", > tablePath > "/home/root1/carbondata/integration/spark/target/warehouse/new_table", > path > "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table", > isExternal "false", > isTransactional "true", > isVisible "true" > ,carbonSchemaPartsNo '1',carbonSchema0 > 
'\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift"
:false,"isSchemaModified":false}') -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4155) Create table like on table with MV fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4155. -- Fix Version/s: 2.1.1 Resolution: Fixed > CReate table like on table with MV fails > - > > Key: CARBONDATA-4155 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4155 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Assignee: Kunal Kapoor >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > steps to reproduce: > {color:#067d17}create table maintable(name string, c_code int, price int) > STORED AS carbondata;{color} > {color:#067d17}create materialized view mv_table as select name, sum(price) > from maintable group by name;{color} > {color:#067d17}create table new_Table like maintable;{color} > {color:#172b4d}Result: > {color} > 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - > org.apache.spark.sql.AnalysisException: == Spark Parser: > org.apache.spark.sql.execution.SparkSqlParser == > extraneous input 'default' expecting \{')', ','}(line 8, pos 25) > == SQL == > CREATE TABLE default.new_table > (`name` string,`c_code` int,`price` int) > USING carbondata > OPTIONS ( > indexexists "false", > sort_columns "", > comment "", > relatedmvtablesmap "\{"default":["mv_table"]}", > -^^^ > bad_record_path "", > local_dictionary_enable "true", > indextableexists "false", > tableName "new_table", > dbName "default", > tablePath > "/home/root1/carbondata/integration/spark/target/warehouse/new_table", > path > "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table", > isExternal "false", > isTransactional "true", > isVisible "true" > ,carbonSchemaPartsNo '1',carbonSchema0 > 
'\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift"
:false,"isSchemaModified":false}') -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (CARBONDATA-4155) Create table like on table with MV fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reassigned CARBONDATA-4155: Assignee: Kunal Kapoor > CReate table like on table with MV fails > - > > Key: CARBONDATA-4155 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4155 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Assignee: Kunal Kapoor >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > steps to reproduce: > {color:#067d17}create table maintable(name string, c_code int, price int) > STORED AS carbondata;{color} > {color:#067d17}create materialized view mv_table as select name, sum(price) > from maintable group by name;{color} > {color:#067d17}create table new_Table like maintable;{color} > {color:#172b4d}Result: > {color} > 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - > org.apache.spark.sql.AnalysisException: == Spark Parser: > org.apache.spark.sql.execution.SparkSqlParser == > extraneous input 'default' expecting \{')', ','}(line 8, pos 25) > == SQL == > CREATE TABLE default.new_table > (`name` string,`c_code` int,`price` int) > USING carbondata > OPTIONS ( > indexexists "false", > sort_columns "", > comment "", > relatedmvtablesmap "\{"default":["mv_table"]}", > -^^^ > bad_record_path "", > local_dictionary_enable "true", > indextableexists "false", > tableName "new_table", > dbName "default", > tablePath > "/home/root1/carbondata/integration/spark/target/warehouse/new_table", > path > "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table", > isExternal "false", > isTransactional "true", > isVisible "true" > ,carbonSchemaPartsNo '1',carbonSchema0 > 
'\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift"
:false,"isSchemaModified":false}') -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4153) Do not push down 'not equal to' filter with Cast on SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4153. -- Fix Version/s: 2.1.1 Resolution: Fixed > Do not push down 'not equal to' filter with Cast on SI > - > > Key: CARBONDATA-4153 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4153 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > A NOT EQUAL TO filter on an SI index column should not be pushed down to the SI > table. > Currently, where x!='2' is not pushed down to SI, but where x!=2 is pushed > down to SI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
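The rule described above can be sketched as a pushdown decision: a not-equal comparison that involves a cast on the index column stays on the main table. The class, method, and flag names below are illustrative, not CarbonData's filter-resolution API:

```java
public class SiPushdownRule {
    // Illustrative decision: 'x != 2' on a string SI column implies a cast,
    // and the cast can change which rows match, so such a filter must be
    // evaluated on the main table rather than the SI table.
    static boolean canPushDownToSi(String operator, boolean castOnIndexColumn) {
        if ("!=".equals(operator) && castOnIndexColumn) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canPushDownToSi("=", false));  // equality: eligible
        System.out.println(canPushDownToSi("!=", true));  // cast + not-equal: blocked
    }
}
```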
[jira] [Resolved] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter
[ https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4137. -- Fix Version/s: 2.1.1 Resolution: Fixed > Refactor CarbonDataSourceScan without Spark Filter > -- > > Key: CARBONDATA-4137 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4137 > Project: CarbonData > Issue Type: Sub-task >Reporter: David Cai >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4141) index server is not caching the index files from external sdk table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4141. -- Fix Version/s: (was: 2.0.1) 2.1.1 Resolution: Fixed > index server is not caching the index files from external sdk table. > > > Key: CARBONDATA-4141 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4141 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.0.0 >Reporter: Karan >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Indexes cached in the executor cache are not dropped when drop table is called > for an external SDK table, because external tables with SDK segments do not > have metadata such as a table status file. So the drop table command sends zero > segments to the index server's clearIndexes job, which clears nothing on the executor > side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4121) Prepriming is not working in index server
[ https://issues.apache.org/jira/browse/CARBONDATA-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4121. -- Fix Version/s: (was: 2.0.1) 2.1.1 Resolution: Fixed > Prepriming is not working in index server > - > > Key: CARBONDATA-4121 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4121 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.0.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Prepriming is always executed in an async thread. Server.getRemoteUser in an > async thread causes NPE, which crashes the index server application. -- This message was sent by Atlassian Jira (v8.3.4#803005)
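The NPE described above comes from resolving the caller's identity inside the async thread, where the RPC context is absent. A general sketch of the fix pattern: capture the value on the calling thread and hand it to the task. The getRemoteUser method here is a simulated stand-in for the real Server.getRemoteUser, and the method names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PreprimingUser {
    // Stand-in for Server.getRemoteUser(), which is only meaningful on the
    // RPC handler thread and returns null (hence the NPE) on a worker thread.
    static String getRemoteUser() { return "carbon"; }

    static String preprimeAsync() throws Exception {
        final String user = getRemoteUser(); // resolved on the caller thread
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // the async task uses the captured value instead of re-resolving it
            return pool.submit(() -> "prepriming segments as " + user).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(preprimeAsync());
    }
}
```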
[jira] [Resolved] (CARBONDATA-4126) Concurrent Compaction fails with Load on table with SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4126. -- Fix Version/s: 2.1.1 Resolution: Fixed > Concurrent Compaction fails with Load on table with SI > -- > > Key: CARBONDATA-4126 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4126 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.1.0 > Environment: Spark 2.4.5 >Reporter: Chetan Bhat >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3.5h > Remaining Estimate: 0h > > [Steps] :- > Create table, load data and create SI. > create table brinjal (imei string,AMSize string,channelsId > string,ActiveCountry string, Activecity string,gamePointId > double,deviceInformationId double,productionDate Timestamp,deliveryDate > timestamp,deliverycharge double) stored as carbondata > TBLPROPERTIES('table_blocksize'='1'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE > brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= > '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= > 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); > create index indextable1 ON TABLE brinjal (AMSize) AS 'carbondata'; > > From one terminal, load data into the table, and from another terminal perform minor and > major compaction on the table concurrently for some time. 
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE > brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= > '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= > 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); > alter table brinjal compact 'minor'; > alter table brinjal compact 'major'; > > [Expected Result] :- Concurrent Compaction should succeed with Load on > table with SI > > [Actual Issue] : - Concurrent Compaction fails with Load on table with SI > *0: jdbc:hive2://linux-32:22550/> alter table brinjal compact 'major';* > *Error: org.apache.spark.sql.AnalysisException: Compaction failed. Please > check logs for more info. Exception in compaction Failed to acquire lock on > segment 2, during compaction of table test.brinjal; (state=,code=0)* -- This message was sent by Atlassian Jira (v8.3.4#803005)
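The error above is a lost race for a per-segment lock between the concurrent load and compaction. A minimal sketch of the contended-lock pattern, with java.util.concurrent's ReentrantLock standing in for CarbonData's segment lock (names and the bounded-wait policy are illustrative, not the actual fix):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SegmentLockDemo {
    static final ReentrantLock segment2Lock = new ReentrantLock();

    // Try to take the segment lock with a bounded wait instead of failing
    // immediately; returns whether compaction may proceed on this segment.
    static boolean acquireForCompaction(long waitMillis) throws InterruptedException {
        return segment2Lock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        if (acquireForCompaction(100)) {
            try {
                System.out.println("compacting segment 2");
            } finally {
                segment2Lock.unlock();
            }
        } else {
            System.out.println("segment 2 busy, skipping it this round");
        }
    }
}
```

Skipping or retrying a busy segment lets the rest of the compaction proceed instead of failing the whole command on first contention.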
[jira] [Resolved] (CARBONDATA-4123) Bloom index query with Index server giving incorrect results
[ https://issues.apache.org/jira/browse/CARBONDATA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4123. -- Fix Version/s: 2.1.1 Resolution: Fixed > Bloom index query with Index server giving incorrect results > > > Key: CARBONDATA-4123 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4123 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.1.1 > > > Queries: create table and load data so that it can create >1 blocklet. > > spark-sql> select count(*) from test_rcd where city = 'city40'; > 2021-02-04 22:13:29,759 | WARN | pool-24-thread-1 | It is not recommended to > set off-heap working memory size less than 512MB, so setting default value to > 512 | > org.apache.carbondata.core.memory.UnsafeMemoryManager.(UnsafeMemoryManager.java:83) > 10 > Time taken: 2.417 seconds, Fetched 1 row(s) > spark-sql> CREATE INDEX dm_rcd ON TABLE test_rcd (city) AS 'bloomfilter' > properties ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > 2021-02-04 22:13:58,683 | AUDIT | main | \{"time":"February 4, 2021 10:13:58 > PM CST","username":"carbon","opName":"CREATE > INDEX","opId":"15148202700230273","opStatus":"START"} | > carbon.audit.logOperationStart(Auditor.java:74) > 2021-02-04 22:13:58,759 | WARN | main | Bloom compress is not configured for > index dm_rcd, use default value true | > org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) > 2021-02-04 22:13:59,292 | WARN | Executor task launch worker for task 2 | > Bloom compress is not configured for index dm_rcd, use default value true | > org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) > 2021-02-04 22:13:59,629 | WARN | main | Bloom compress is not configured for > index dm_rcd, use default value true | > 
org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202) > 2021-02-04 22:14:00,331 | AUDIT | main | \{"time":"February 4, 2021 10:14:00 > PM CST","username":"carbon","opName":"CREATE > INDEX","opId":"15148202700230273","opStatus":"SUCCESS","opTime":"1648 > ms","table":"default.test_rcd","extraInfo":{"provider":"bloomfilter","indexName":"dm_rcd","bloom_size":"64","bloom_fpp":"0.1"}} > | carbon.audit.logOperationEnd(Auditor.java:97) > Time taken: 1.818 seconds > spark-sql> select count(*) from test_rcd where city = 'city40'; > 30 > Time taken: 0.556 seconds, Fetched 1 row(s) > spark-sql> -- This message was sent by Atlassian Jira (v8.3.4#803005)
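Background for why the differing counts (10 vs. 30) indicate wrongly pruned blocklets: a bloom filter may return false positives but never false negatives, so a correct pruner must keep every blocklet whose filter says "maybe". A minimal sketch (illustrative only; `TinyBloom` and `prune` are invented names, not CarbonData's bloom index classes):

```python
import hashlib

class TinyBloom:
    """Minimal bloom filter sketch: no false negatives, possible false positives."""
    def __init__(self, bits=64, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.bitset = 0

    def _positions(self, value):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, value):
        for p in self._positions(value):
            self.bitset |= 1 << p

    def might_contain(self, value):
        return all(self.bitset >> p & 1 for p in self._positions(value))

def prune(blocklet_filters, value):
    # Keep every blocklet that *might* contain the value; dropping a
    # "maybe" is what produces an undercounted result like the one above.
    return [i for i, f in enumerate(blocklet_filters) if f.might_contain(value)]
```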
[jira] [Resolved] (CARBONDATA-4117) Test cg index query with Index server fails with NPE
[ https://issues.apache.org/jira/browse/CARBONDATA-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4117. -- Fix Version/s: 2.1.1 Resolution: Fixed > Test cg index query with Index server fails with NPE > > > Key: CARBONDATA-4117 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4117 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Test queries to execute: > spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age > INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', > 'SORT_SCOPE'='LOCAL_SORT'); > spark-sql> create index cgindex on table index_test_cg (name) as > 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory'; > LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg > OPTIONS('header'='false') > spark-sql> select * from index_test_cg where name='n502670'; > 2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting > splits using index server. 
Initiating Fallback to embedded mode | > org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454) > java.lang.reflect.UndeclaredThrowableException > at com.sun.proxy.$Proxy69.getSplits(Unknown Source) > at > org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85) > at > org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59) > at > org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769) > at > org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58) > at > org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307) > at > org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443) > at > org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555) > at > org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159) > at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:269) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) > at scala.Option.getOrElse(Option.scala:121) > at 
org.apache.spark.rdd.RDD.partitions(RDD.scala:269) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:269) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:384) > at org.apache.spark.rdd.RDD.collect(RDD.scala:988) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372) > at > org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789) > at > org.apache.spar
[jira] [Resolved] (CARBONDATA-4082) When a segment is added to a carbon table by alter table add segment query and that segment also have a deleteDelta file present in it then on querying the carbon t
[ https://issues.apache.org/jira/browse/CARBONDATA-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4082. -- Fix Version/s: 2.1.1 Resolution: Fixed > When a segment is added to a carbon table by alter table add segment query > and that segment also have a deleteDelta file present in it then on querying > the carbon table the deleted rows are coming in the result. > --- > > Key: CARBONDATA-4082 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4082 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.0.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > When a segment is added to a carbon table by alter table add segment query > and that segment also have a deleteDelta file present in it then on querying > the carbon table the deleted rows are coming in the result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4113) Partition query results invalid when carbon.read.partition.hive.direct is disabled
[ https://issues.apache.org/jira/browse/CARBONDATA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4113. -- Fix Version/s: 2.1.1 Resolution: Fixed > Partition query results invalid when carbon.read.partition.hive.direct is > disabled > -- > > Key: CARBONDATA-4113 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4113 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > set 'carbon.read.partition.hive.direct' to false. > queries to execute: > create table partition_cache(a string) partitioned by(b int) stored as > carbondata > insert into partition_cache select 'k',1; > insert into partition_cache select 'k',1; > insert into partition_cache select 'k',2; > insert into partition_cache select 'k',2; > alter table partition_cache compact 'minor'; > select *from partition_cache; => no results -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4097) Direct filling of column vector is not allowed for an alter table, because it uses RestructureBasedCollector. However ColumnVectors were initialized as ColumnVectorW
[ https://issues.apache.org/jira/browse/CARBONDATA-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4097. -- Fix Version/s: (was: 2.0.1) 2.1.1 Resolution: Fixed > Direct filling of column vector is not allowed for an alter table, because it > uses RestructureBasedCollector. However ColumnVectors were initialized as > ColumnVectorWrapperDirect even for alter tables. > -- > > Key: CARBONDATA-4097 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4097 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.1.0 >Reporter: Karan >Priority: Major > Fix For: 2.1.1 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > ColumnVector for alter tables should not be initialized as > ColumnVectorWrapperDirect because direct filling is not allowed for alter > tables. It should be initialized as ColumnVectorWrapper. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4096) SDK read fails from cluster and sdk read filter query on sort column giving wrong result with IndexServer
[ https://issues.apache.org/jira/browse/CARBONDATA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4096. -- Fix Version/s: 2.1.1 Resolution: Fixed > SDK read fails from cluster and sdk read filter query on sort column giving > wrong result with IndexServer > - > > Key: CARBONDATA-4096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4096 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.1.1 > > Attachments: image-2020-12-22-18-54-52-361.png, > wrongresults_with_IS.PNG > > Time Spent: 1.5h > Remaining Estimate: 0h > > Test write sdk and read with spark. > Queries to reproduce: > put written sdk files in $warehouse/sdk path - contains .carbondata and > .index files. > +From spark-sql:+ > create table sdkout using carbon options(path='$warehouse/sdk'); > select * from sdkout where salary = 100; > !image-2020-12-22-18-54-52-361.png|width=744,height=279! > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4055) Empty segment created and unnecessary entry to table status in update
[ https://issues.apache.org/jira/browse/CARBONDATA-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4055. -- Fix Version/s: 2.1.1 Resolution: Fixed > Empty segment created and unnecessary entry to table status in update > - > > Key: CARBONDATA-4055 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4055 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.1.1 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When the update command is executed and no data is updated, empty segment > directories are created and an in progress stale entry added to table status, > and even segment dirs are not cleaned during clean files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
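The intended behaviour for CARBONDATA-4055 can be sketched as a simple guard: only register a new segment when the update actually wrote rows. Illustrative Python only; the status-entry shape is invented and is not CarbonData's real tablestatus format:

```python
def apply_update(table_status, updated_row_count, new_segment_id):
    """Register a segment only for a non-empty update; a no-op update
    must leave no segment directory or 'In Progress' stale entry behind."""
    if updated_row_count == 0:
        return table_status  # no-op update: nothing to record
    return table_status + [{"segment": new_segment_id, "status": "Success"}]
```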
[jira] [Updated] (CARBONDATA-3898) Support Option 'carbon.enable.querywithmv'
[ https://issues.apache.org/jira/browse/CARBONDATA-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3898: - Issue Type: Improvement (was: New Feature) > Support Option 'carbon.enable.querywithmv' > -- > > Key: CARBONDATA-3898 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3898 > Project: CarbonData > Issue Type: Improvement >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > When MV enabled, SQL rewrite takes a lot of time, a new option > 'carbon.enable.querywithmv' shall be supported, which can turn off SQL > Rewrite when the configured value is false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010
[ https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3854: - Issue Type: Bug (was: New Feature) > Quote char support to unprintable character like \u0009 \u0010 > -- > > Key: CARBONDATA-3854 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3854 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Quote char support to unprintable character like \u0009 \u0010 > Currently carbondata does not support setting quotechar to a non-printable char > like \u0009. > The current behaviour is that quotechar throws an exception if given more than > one character. > > Need to support more than one character, the same as the delimiter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010
[ https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3854. -- Resolution: Fixed > Quote char support to unprintable character like \u0009 \u0010 > -- > > Key: CARBONDATA-3854 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3854 > Project: CarbonData > Issue Type: New Feature >Reporter: Mahesh Raju Somalaraju >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Quote char support to unprintable character like \u0009 \u0010 > Currently carbondata does not support setting quotechar to a non-printable char > like \u0009. > The current behaviour is that quotechar throws an exception if given more than > one character. > > Need to support more than one character, the same as the delimiter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010
[ https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reopened CARBONDATA-3854: -- > Quote char support to unprintable character like \u0009 \u0010 > -- > > Key: CARBONDATA-3854 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3854 > Project: CarbonData > Issue Type: New Feature >Reporter: Mahesh Raju Somalaraju >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Quote char support to unprintable character like \u0009 \u0010 > Currently carbondata does not support setting quotechar to a non-printable char > like \u0009. > The current behaviour is that quotechar throws an exception if given more than > one character. > > Need to support more than one character, the same as the delimiter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3820) Fix CDC failure when sort columns present in source dataframe
[ https://issues.apache.org/jira/browse/CARBONDATA-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3820: - Issue Type: Bug (was: New Feature) > Fix CDC failure when sort columns present in source dataframe > - > > Key: CARBONDATA-3820 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3820 > Project: CarbonData > Issue Type: Bug >Reporter: Xingjun Hao >Priority: Major > Fix For: 2.1.0 > > Time Spent: 4h > Remaining Estimate: 0h > > If there is a GlobalSort table in the CDC flow, the following exception will > be thrown: > Exception in thread "main" java.lang.RuntimeException: column: id specified > in sort columns does not exist in schema > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildTableSchema(CarbonWriterBuilder.java:828) > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildCarbonTable(CarbonWriterBuilder.java:794) > at > org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildLoadModel(CarbonWriterBuilder.java:720) > at > org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil$.prepareLoadModel(CarbonSparkDataSourceUtil.scala:281) > at > org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat.prepareWrite(SparkCarbonFileFormat.scala:141) > at > org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processIUD(CarbonMergeDataSetCommand.scala:269) > at > org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processData(CarbonMergeDataSetCommand.scala:152) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4080) Wrong results for select count on invalid segments
[ https://issues.apache.org/jira/browse/CARBONDATA-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4080. -- Fix Version/s: 2.1.1 Resolution: Fixed > Wrong results for select count on invalid segments > -- > > Key: CARBONDATA-4080 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4080 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Reporter: Akshay >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Wrong results for > * select count on marked for delete segment > * select count on compacted segment > Issue comes only when the user explicitly sets deleted/compacted segments > using the property carbon.input.segments. > As select * on such segments gives 0 rows as output, in order to maintain > consistency, select count should also give 0 rows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
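The consistency rule for CARBONDATA-4080 can be sketched as filtering explicitly pinned segments by validity before counting. Illustrative Python only; the status strings mirror the segment states named in the report, but `readable_segments` is an invented stand-in for CarbonData's segment filtering:

```python
VALID_STATUSES = {"Success", "Load Partial Success"}

def readable_segments(table_status, requested):
    """Even when the user pins segments via carbon.input.segments,
    Marked for Delete and Compacted segments must be excluded, so that
    select count(*) and select * agree (both return 0 rows)."""
    by_id = {s["id"]: s for s in table_status}
    return [sid for sid in requested
            if by_id.get(sid, {}).get("status") in VALID_STATUSES]
```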
[jira] [Resolved] (CARBONDATA-4077) Insert into partition with FileMergeSortComparator is failing with NPE
[ https://issues.apache.org/jira/browse/CARBONDATA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4077. -- Fix Version/s: (was: 2.2.0) 2.1.1 Resolution: Fixed > Insert into partition with FileMergeSortComparator is failing with NPE > -- > > Key: CARBONDATA-4077 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4077 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations
[ https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4050. -- Resolution: Fixed > TPC-DS queries performance degraded when compared to older versions due to > redundant getFileStatus() invocations > > > Key: CARBONDATA-4050 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4050 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.0 >Reporter: Venugopal Reddy K >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > *Issue:* > In createCarbonDataFileBlockMetaInfoMapping method, we get list of carbondata > files in the segment, loop through all the carbon files and make a map of > fileNameToMetaInfoMapping > In that carbon files loop, if the file is of AbstractDFSCarbonFile > type, we get the org.apache.hadoop.fs.FileStatus thrice for each file. And > the method to get file status is an RPC call(fileSystem.getFileStatus(path)). > It takes ~2ms in the cluster for each call. Thus, incur an overhead of ~6ms > per file. So overall driver side query processing time has increased > significantly when there are more carbon files. Hence caused TPC-DS queries > performance degradation. 
> Have shown the methods/calls which get the file status for the carbon file in > loop: > {code:java} > public static Map > createCarbonDataFileBlockMetaInfoMapping( > String segmentFilePath, Configuration configuration) throws IOException { > Map fileNameToMetaInfoMapping = new TreeMap(); > CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, > configuration); > if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof > S3CarbonFile)) { > PathFilter pathFilter = new PathFilter() { > @Override > public boolean accept(Path path) { > return CarbonTablePath.isCarbonDataFile(path.getName()); > } > }; > CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter); > for (CarbonFile file : carbonFiles) { > String[] location = file.getLocations(); // RPC call - 1 > long len = file.getSize(); // RPC call - 2 > BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len); > fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC > call - 3 in file.getpath() method > } > } > return fileNameToMetaInfoMapping; > } > {code} > > *Suggestion:* > I think, currently we make RPC call to get the file status upon each > invocation because file status may change over a period of time. And we > shouldn't cache the file status in AbstractDFSCarbonFile. > In the current case, just before the loop of carbon files, we get the > file status of all the carbon files in the segment with RPC call shown below. > LocatedFileStatus is a child class of FileStatus. It has BlockLocation along > with file status. > {code:java} > RemoteIterator iter = > fileSystem.listLocatedStatus(path);{code} > Intention of getting all the file status here is to create instance > of BlockMetaInfo and maintain the map of fileNameToMetaInfoMapping. > So it is safe to avoid these unnecessary rpc calls to get file status again > in getLocations(), getSize() and getPath() methods. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
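The overhead and the suggested fix in CARBONDATA-4050 can be demonstrated with a fake filesystem that counts RPC round trips. This is an illustrative Python sketch, not the actual Hadoop `FileSystem` API; all names here are invented for the example:

```python
class CountingFS:
    """Fake DFS client that counts RPC round trips (illustrative)."""
    def __init__(self, files):
        self.files = files          # path -> (locations, size)
        self.rpc_calls = 0

    def list_located_status(self):
        self.rpc_calls += 1         # one listing RPC covers the whole segment
        return [(p, loc, size) for p, (loc, size) in self.files.items()]

    def get_file_status(self, path):
        self.rpc_calls += 1         # per-file RPC: the redundant call the report counts
        loc, size = self.files[path]
        return loc, size

def build_meta_map_slow(fs):
    # Mirrors the reported behaviour: a listing, then a status RPC per file.
    meta = {}
    for path, _, _ in fs.list_located_status():
        loc, size = fs.get_file_status(path)   # redundant round trip
        meta[path] = (loc, size)
    return meta

def build_meta_map_fast(fs):
    # The suggested fix: reuse the located status returned by the listing.
    return {path: (loc, size) for path, loc, size in fs.list_located_status()}
```

With ~2 ms per RPC in the cluster, avoiding the per-file calls removes the multi-millisecond-per-file overhead described above.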
[jira] [Resolved] (CARBONDATA-4046) Select count(*) fails on partition table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4046. -- Fix Version/s: 2.1.1 Resolution: Fixed > Select count(*) fails on partition table. > - > > Key: CARBONDATA-4046 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4046 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Steps to reproduce > 1. set property `carbon.read.partition.hive.direct=false` > 2. Create table which contain more than one partition column. > 3. run query select count (*) > > It fails with exception as `Key not found`. > > create table partition_cache(a string) partitioned by(b int, c String) stored > as carbondata; > insert into partition_cache select 'k',1,'nihal'; > select count(*) from partition_cache where b = 1; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4022. -- Fix Version/s: 2.1.1 Resolution: Fixed > Getting the error - "PathName is not a valid DFS filename." with index server > and after adding carbon SDK segments and then doing select/update/delete > operations. > -- > > Key: CARBONDATA-4022 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4022 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.1.1 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Getting this error - "PathName is not a valid DFS filename." during the > update/delete/select queries on a added SDK segment table. Also the path > represented in the error is not proper, which is the cause of error. This is > seen only when index server is running and disable fallback is true. > Queries and errors: > > create table sdk_2level_1(name string, rec1 > > struct>) stored as carbondata; > +-+ > | Result | > +-+ > +-+ > No rows selected (0.425 seconds) > > alter table sdk_2level_1 add segment > > options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata'); > +-+ > | Result | > +-+ > +-+ > No rows selected (0.77 seconds) > > select * from sdk_2level_1; > INFO : Execution ID: 1855 > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 600.0 (TID 21345, linux, executor 16): > java.lang.IllegalArgumentException: Pathname > /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata > from > hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata > is 
not a valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955) > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316) > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293) > at > org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198) > at > org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188) > at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100) > at > org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60) > at > org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65) > at > org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902) > at > org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382) > at > org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47) > at > 
org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPl
[jira] [Updated] (CARBONDATA-3875) Support show segments include stage
[ https://issues.apache.org/jira/browse/CARBONDATA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3875: - Fix Version/s: (was: 2.0.2) 2.1.1 > Support show segments include stage > --- > > Key: CARBONDATA-3875 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3875 > Project: CarbonData > Issue Type: New Feature > Components: spark-integration >Affects Versions: 2.0.0, 2.0.1 >Reporter: Xingjun Hao >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > > There is a lack of monitoring of the stage information in the current system, > 'Show segments include stage' command shall be supported. which will provide > monitoring information, such as createTime, partitioninfo, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3856) Support the LIMIT operator for show segments command
[ https://issues.apache.org/jira/browse/CARBONDATA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3856: - Fix Version/s: (was: 2.0.2) 2.1.1 > Support the LIMIT operator for show segments command > > > Key: CARBONDATA-3856 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3856 > Project: CarbonData > Issue Type: New Feature > Components: spark-integration >Affects Versions: 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Now, in the 2.0.0 release, CarbonData doesn't support LIMIT operator in the > SHOW SEGMENTS command. The time cost is expensive when there are too many > segments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.
[ https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3670: - Fix Version/s: (was: 2.1.0) 2.1.1 > Support compress offheap columnpage directly, avoiding a copy of data from > offheap to heap when compressed. > -- > > Key: CARBONDATA-3670 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3670 > Project: CarbonData > Issue Type: Wish > Components: core >Affects Versions: 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > When writing data, the column pages are stored off-heap and the pages are > compressed to save storage cost. Currently, during compression, the > data is copied from off-heap to the heap before being compressed, which > leads to heavier GC overhead compared with compressing off-heap directly. > To sum up, support compressing the offheap columnpage directly, avoiding a copy of > data from offheap to heap when compressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command
[ https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3615: - Fix Version/s: (was: 2.1.0) 2.1.1 > Show metacache shows the index server index-dictionary files when data loaded > after index server disabled using set command > --- > > Key: CARBONDATA-3615 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3615 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.0.0 >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show metacache shows the index server index-dictionary files when data loaded > after index server disabled using set command
> +------------+--------+------------------------+----------------+
> | Field      | Size   | Comment                | Cache Location |
> +------------+--------+------------------------+----------------+
> | Index      | 0 B    | 0/2 index files cached | DRIVER         |
> | Dictionary | 0 B    |                        | DRIVER         |
> *| Index      | 1.5 KB | 2/2 index files cached | INDEX SERVER   |*
> *| Dictionary | 0 B    |                        | INDEX SERVER   |*
> +------------+--------+------------------------+----------------+
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3608) Drop 'STORED BY' syntax in create table
[ https://issues.apache.org/jira/browse/CARBONDATA-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3608: - Fix Version/s: (was: 2.1.0) 2.1.1 > Drop 'STORED BY' syntax in create table > --- > > Key: CARBONDATA-3608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3608 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li >Priority: Major > Fix For: 2.1.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow
[ https://issues.apache.org/jira/browse/CARBONDATA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3816: - Fix Version/s: (was: 2.1.0) 2.1.1 > Support Float and Decimal in the Merge Flow > --- > > Key: CARBONDATA-3816 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3816 > Project: CarbonData > Issue Type: New Feature > Components: data-load >Affects Versions: 2.0.0 >Reporter: Xingjun Hao >Priority: Major > Fix For: 2.1.1 > > > We don't support FLOAT and DECIMAL datatype in the CDC Flow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3746) Support column chunk cache creation and basic read/write
[ https://issues.apache.org/jira/browse/CARBONDATA-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3746: - Fix Version/s: (was: 2.1.0) 2.1.1 > Support column chunk cache creation and basic read/write > > > Key: CARBONDATA-3746 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3746 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > Fix For: 2.1.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4026) Thread leakage while Loading
[ https://issues.apache.org/jira/browse/CARBONDATA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4026: - Fix Version/s: (was: 2.1.0) 2.1.1 > Thread leakage while Loading > > > Key: CARBONDATA-4026 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4026 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.0.1 >Reporter: Xingjun Hao >Priority: Major > Fix For: 2.1.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Some code paths in Inserting/Loading/InsertStage/IndexServer don't shut down their > ExecutorService, which leads to thread leakage and degrades the performance > of the driver and executors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
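The leak described in this issue is the classic missing-shutdown pattern: a pool created per load is never torn down, so its worker threads accumulate. A generic sketch of the fix — illustrative names, not the actual CarbonData loading code — is to pin the shutdown in a `finally` block so it runs even when tasks fail:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LoadWithShutdown {

    /** Run tasks on a temporary pool and guarantee the pool's threads are released. */
    public static List<Integer> runTasks(List<Callable<Integer>> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {  // futures come back in task order
                results.add(f.get());
            }
            return results;
        } finally {
            // Without this, every load leaves 4 live threads behind (the reported leak).
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();  // force teardown if tasks hang
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        System.out.println(runTasks(tasks)); // prints [1, 4, 9, 16, 25, 36, 49, 64]
    }
}
```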
[jira] [Updated] (CARBONDATA-4003) Improve IUD Concurrency
[ https://issues.apache.org/jira/browse/CARBONDATA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4003: - Fix Version/s: (was: 2.1.0) 2.1.1 > Improve IUD Concurrency > --- > > Key: CARBONDATA-4003 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4003 > Project: CarbonData > Issue Type: Improvement > Components: spark-integration >Affects Versions: 2.0.1 >Reporter: Kejian Li >Priority: Major > Fix For: 2.1.1 > > Time Spent: 18h 10m > Remaining Estimate: 0h > > When some segments' state of the table is INSERT IN PROGRESS, update > operation on the table fails. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4008) IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4008: - Fix Version/s: (was: 2.1.0) 2.1.1 > IN filter on date column is returning 0 results when > 'carbon.push.rowfilters.for.vector' is true > > > Key: CARBONDATA-4008 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4008 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.0.0 >Reporter: Venugopal Reddy K >Priority: Major > Fix For: 2.1.1 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > *Issue:* > IN filter with date column in condition is returning 0 results when > 'carbon.push.rowfilters.for.vector' is set to true. > > *Steps to reproduce:* > sql("set carbon.push.rowfilters.for.vector=true") > sql("create table test_table(i int, dt date, ts timestamp) stored as > carbondata") > sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'") > sql("insert into test_table select 2, '2020-07-04', '2020-07-04 14:12:15'") > sql("insert into test_table select 3, '2020-09-23', '2020-09-23 12:30:45'") > sql("select * from test_table where dt IN ('2020-03-30', > '2020-09-23')").show() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations
[ https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4050: - Fix Version/s: (was: 2.1.0) 2.1.1 > TPC-DS queries performance degraded when compared to older versions due to > redundant getFileStatus() invocations > > > Key: CARBONDATA-4050 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4050 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.0 >Reporter: Venugopal Reddy K >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > *Issue:* > In the createCarbonDataFileBlockMetaInfoMapping method, we get the list of carbondata > files in the segment, loop through all the carbon files and build a map of > fileNameToMetaInfoMapping. > In that loop over carbon files, if the file is of AbstractDFSCarbonFile > type, we get the org.apache.hadoop.fs.FileStatus thrice for each file. And > the method to get the file status is an RPC call (fileSystem.getFileStatus(path)). > It takes ~2ms in the cluster for each call, thus incurring an overhead of ~6ms > per file. So overall driver-side query processing time increases > significantly when there are more carbon files. Hence the TPC-DS queries > performance degradation. 
> Have shown below the methods/calls which get the file status for each carbon file in the > loop:
> {code:java}
> public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
>     String segmentFilePath, Configuration configuration) throws IOException {
>   Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap<>();
>   CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, configuration);
>   if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof S3CarbonFile)) {
>     PathFilter pathFilter = new PathFilter() {
>       @Override
>       public boolean accept(Path path) {
>         return CarbonTablePath.isCarbonDataFile(path.getName());
>       }
>     };
>     CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter);
>     for (CarbonFile file : carbonFiles) {
>       String[] location = file.getLocations(); // RPC call - 1
>       long len = file.getSize(); // RPC call - 2
>       BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
>       fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC call - 3 in file.getPath() method
>     }
>   }
>   return fileNameToMetaInfoMapping;
> }
> {code}
> 
> *Suggestion:*
> I think we currently make an RPC call to get the file status upon each > invocation because the file status may change over a period of time, and we > shouldn't cache the file status in AbstractDFSCarbonFile.
> In the current case, just before the loop over carbon files, we already get the > file status of all the carbon files in the segment with the RPC call shown below. > LocatedFileStatus is a child class of FileStatus; it has the BlockLocation along > with the file status.
> {code:java}
> RemoteIterator<LocatedFileStatus> iter = fileSystem.listLocatedStatus(path);
> {code}
> The intention of getting all the file statuses here is to create instances > of BlockMetaInfo and maintain the map of fileNameToMetaInfoMapping. > So it is safe to avoid these unnecessary RPC calls to get the file status again > in the getLocations(), getSize() and getPath() methods. >
-- This message was sent by Atlassian Jira (v8.3.4#803005)
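The suggestion in this issue — take the file status once at listing time and reuse it, instead of re-fetching it per accessor — can be illustrated with a self-contained toy. Everything here is hypothetical (`MetaClient`, the snapshot class, and the call counts are mine, not Hadoop or CarbonData APIs); it only shows why three remote calls per file collapse into one:

```java
import java.util.HashMap;
import java.util.Map;

public class StatusCacheDemo {

    /** Stand-in for a remote file system; counts simulated RPC round trips. */
    static class MetaClient {
        int rpcCalls = 0;
        long getSize(String file)          { rpcCalls++; return file.length(); }          // pretend RPC
        String[] getLocations(String file) { rpcCalls++; return new String[] {"node1"}; } // pretend RPC
        String getPath(String file)        { rpcCalls++; return "/store/" + file; }       // pretend RPC
    }

    /** Immutable snapshot taken from one listing call, like Hadoop's LocatedFileStatus. */
    static final class FileStatusSnapshot {
        final String path;
        final long size;
        FileStatusSnapshot(String path, long size) { this.path = path; this.size = size; }
    }

    /** Naive variant: three remote calls per file (the pattern flagged in the issue). */
    static Map<String, Long> naive(MetaClient client, String[] files) {
        Map<String, Long> map = new HashMap<>();
        for (String f : files) {
            client.getLocations(f);                       // RPC 1
            map.put(client.getPath(f), client.getSize(f)); // RPC 2 and 3
        }
        return map;
    }

    /** Cached variant: one listing-time call per file, then reuse the snapshot. */
    static Map<String, Long> cached(MetaClient client, String[] files) {
        Map<String, Long> map = new HashMap<>();
        for (String f : files) {
            client.rpcCalls++;  // one simulated listLocatedStatus() entry per file
            FileStatusSnapshot s = new FileStatusSnapshot("/store/" + f, f.length());
            map.put(s.path, s.size);  // no further remote calls
        }
        return map;
    }

    public static void main(String[] args) {
        String[] files = {"part-0.carbondata", "part-1.carbondata"};
        MetaClient a = new MetaClient();
        naive(a, files);
        MetaClient b = new MetaClient();
        cached(b, files);
        System.out.println("naive RPCs=" + a.rpcCalls + " cached RPCs=" + b.rpcCalls);
    }
}
```

At ~2ms per round trip, the difference is exactly the ~6ms-vs-~2ms per file the issue measures.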
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3643: - Fix Version/s: (was: 2.1.0) 2.1.1 > Insert array('')/array() into Struct column will result in > array(null), which is inconsistent with Parquet > -- > > Key: CARBONDATA-3643 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3643 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.1 > >
> {code:java}
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
> checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))
> !== Correct Answer - 1 == == Spark Answer - 1 ==
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3617) loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow
[ https://issues.apache.org/jira/browse/CARBONDATA-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3617: - Fix Version/s: (was: 2.1.0) 2.1.1 > loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow > -- > > Key: CARBONDATA-3617 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3617 > Project: CarbonData > Issue Type: Improvement > Components: data-load >Affects Versions: 1.6.1, 2.0.0 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.1 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > During data loading using global sort, the sort-by processing is based on the > whole carbon row; the GC overhead is huge when there are many columns. > Theoretically, the sort-by processing can work well based on just the sort > columns, which brings less time overhead and GC overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
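The proposal in this issue — compare rows on the sort columns only, never touching the remaining columns — can be sketched as a comparator over the leading fields of a row. This is a toy illustration with a made-up row layout (`Object[]` with the sort columns first), not the actual global-sort implementation:

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortColumnsOnly {

    /**
     * Compare rows on the first numSortColumns fields only; the remaining
     * (possibly many) measure columns never enter the comparison, so they
     * add no comparison cost and no extra object churn during the sort.
     */
    static Comparator<Object[]> bySortColumns(int numSortColumns) {
        return (a, b) -> {
            for (int i = 0; i < numSortColumns; i++) {
                @SuppressWarnings("unchecked")
                int c = ((Comparable<Object>) a[i]).compareTo(b[i]);
                if (c != 0) return c;
            }
            return 0; // equal on sort columns; stable sort keeps input order
        };
    }

    public static void main(String[] args) {
        // rows: sort column "country" first, then non-sort payload columns
        Object[][] rows = {
            {"IN", 3, "x"}, {"CN", 1, "y"}, {"US", 2, "z"}, {"CN", 9, "w"}
        };
        Arrays.sort(rows, bySortColumns(1)); // only column 0 drives the sort
        for (Object[] r : rows) System.out.println(Arrays.toString(r));
    }
}
```

Since `Arrays.sort` on objects is stable, rows that tie on the sort columns keep their original order — which is also why restricting the comparator to sort columns is safe.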
[jira] [Updated] (CARBONDATA-3880) How to start JDBC service in distributed index
[ https://issues.apache.org/jira/browse/CARBONDATA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3880: - Fix Version/s: (was: 2.1.0) 2.1.1 > How to start JDBC service in distributed index > --- > > Key: CARBONDATA-3880 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3880 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.0.0 >Reporter: li >Priority: Major > Fix For: 2.1.1 > > > How to start JDBC service in distributed index -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4031) Query result is incorrect after Delete and Insert overwrite
[ https://issues.apache.org/jira/browse/CARBONDATA-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4031: - Fix Version/s: (was: 2.1.0) 2.1.1 > Query result is incorrect after Delete and Insert overwrite > --- > > Key: CARBONDATA-4031 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4031 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 2.0.0 >Reporter: Kejian Li >Priority: Critical > Fix For: 2.1.1 > > Attachments: s_x034_carbon-07.csv, s_x034_carbon-08.csv > > Time Spent: 5h 50m > Remaining Estimate: 0h > > There is a table with two partitions. The user deletes some records in one of the > partitions and then insert-overwrites the other partition. The deleted records in > the first partition come back. > 1. CREATE TABLE s_x034_carbon (guid STRING, sales_guid STRING) PARTITIONED > BY (dt STRING) STORED AS carbondata; > 2. load data local inpath > '/home/lizi/Workspace/carbondata_test_workspace/data/s_x034_carbon-07.csv' > into table s_x034_carbon; > load data local inpath > '/home/lizi/Workspace/carbondata_test_workspace/data/s_x034_carbon-08.csv' > into table s_x034_carbon; > 3. select count(1), dt from s_x034_carbon group by dt; > 4. select * from s_x034_carbon where dt=20200907 limit 5; > 5. delete from s_x034_carbon where dt= 20200907 and > guid='595E1862D81A09D0E1008000AC1E0124'; > delete from s_x034_carbon where dt= 20200907 and > guid='005056AF06441EDA89ABF853E435A6BD'; > 6. select count(1), dt from s_x034_carbon group by dt; > 7. insert overwrite table s_x034_carbon partition (dt=20200908) > select a.guid as guid, a.sales_guid as sales_guid from s_x034_carbon a > where dt = 20200907; > 8. select count(1), dt from s_x034_carbon group by dt; > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4032) Drop partition command cleans other partition directories
[ https://issues.apache.org/jira/browse/CARBONDATA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-4032: - Fix Version/s: (was: 2.1.0) 2.1.1 > Drop partition command cleans other partition directories > - > > Key: CARBONDATA-4032 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4032 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 2.0.1 >Reporter: Xingjun Hao >Priority: Critical > Fix For: 2.1.1 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > 1. CREATE TABLE droppartition (id STRING, sales STRING) PARTITIONED BY (dtm > STRING) STORED AS carbondata > 2. insert into droppartition values ('01', '0', '20200907'),('03', '0', > '20200908') > 3. insert overwrite table droppartition partition (dtm=20200908) select * > from droppartition where dtm = 20200907; > insert overwrite table droppartition partition (dtm=20200909) select * from > droppartition where dtm = 20200907; > 4. alter table droppartition drop partition (dtm=20200909) > The directory "20200908" was deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3559) Support adding carbon file into CarbonData table
[ https://issues.apache.org/jira/browse/CARBONDATA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3559: - Fix Version/s: (was: 2.1.0) 2.1.1 > Support adding carbon file into CarbonData table > > > Key: CARBONDATA-3559 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3559 > Project: CarbonData > Issue Type: Improvement >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > Fix For: 2.1.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Since adding parquet/orc files into CarbonData table are supported now, > adding carbon files should be supported as well -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3603) Feature Change in CarbonData 2.0
[ https://issues.apache.org/jira/browse/CARBONDATA-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3603: - Fix Version/s: (was: 2.1.0) 2.1.1 > Feature Change in CarbonData 2.0 > > > Key: CARBONDATA-3603 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3603 > Project: CarbonData > Issue Type: Improvement >Reporter: Jacky Li >Priority: Major > Fix For: 2.1.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3370) fix missing version of maven-duplicate-finder-plugin
[ https://issues.apache.org/jira/browse/CARBONDATA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3370: - Fix Version/s: (was: 2.1.0) 2.1.1 > fix missing version of maven-duplicate-finder-plugin > > > Key: CARBONDATA-3370 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3370 > Project: CarbonData > Issue Type: Improvement > Components: build >Affects Versions: 1.5.3 >Reporter: lamber-ken >Priority: Critical > Fix For: 2.1.1 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > fix missing version of maven-duplicate-finder-plugin in pom file -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3991) File system could not set modified time because the settime function is not overridden
[ https://issues.apache.org/jira/browse/CARBONDATA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor updated CARBONDATA-3991: - Fix Version/s: (was: 2.0.1) 2.1.1 > File system could not set modified time because the settime function is not > overridden > --- > > Key: CARBONDATA-3991 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3991 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 2.0.1 >Reporter: jingpan xiong >Priority: Major > Fix For: 2.1.1 > > Time Spent: 17h 10m > Remaining Estimate: 0h > > File systems like S3 and Alluxio don't override the settime function, which > causes problems for update and create MV operations. This bug doesn't raise an > exception on setting the modified time, and may leave a null value as the > modified time. It may cause multi-tenant and data consistency problems. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4006) Get count method of Index server gives currentUser as NULL in fallback mode, can later lead to Null pointer exception
[ https://issues.apache.org/jira/browse/CARBONDATA-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4006. -- Fix Version/s: 2.1.0 Resolution: Fixed > Get count method of Index server gives currentUser as NULL in fallback mode, > can later lead to Null pointer exception > - > > Key: CARBONDATA-4006 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4006 > Project: CarbonData > Issue Type: Improvement >Reporter: Vikram Ahuja >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store
[ https://issues.apache.org/jira/browse/CARBONDATA-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4043. -- Fix Version/s: 2.1.0 Resolution: Fixed > Fix data load failure issue for columns added in legacy store > - > > Key: CARBONDATA-4043 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4043 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Indhumathi Muthumurugesh >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > When a dimension is added in older versions like 1.1, by default it will be a > sort column. In the sort step we assume sort columns come at the beginning of > the data. But the added column will be at the end even though it is a sort > column. So, while building the data load configuration, we rearrange the > columns (dimensions and data fields) to bring the sort columns to the beginning > and no-sort columns to the end, and revert them back to schema order before the > FinalMerge/DataWriter step. > Issue: > Data loading fails with a cast exception in the data writing step in > case of NO_SORT and in the final sort step in case of LOCAL_SORT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
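The rearrange-then-revert flow this issue describes amounts to a stable partition by the sort-column flag, followed by a restore using each column's schema ordinal. A toy sketch with illustrative names (not the real DataField/CarbonColumn API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ColumnRearrange {

    static final class Column {
        final String name;
        final int schemaOrdinal;
        final boolean sortColumn;
        Column(String name, int schemaOrdinal, boolean sortColumn) {
            this.name = name;
            this.schemaOrdinal = schemaOrdinal;
            this.sortColumn = sortColumn;
        }
    }

    /** Stable partition: sort columns first (as the sort step expects), others after. */
    static List<Column> toSortStepOrder(List<Column> schemaOrder) {
        List<Column> out = new ArrayList<>();
        for (Column c : schemaOrder) if (c.sortColumn) out.add(c);
        for (Column c : schemaOrder) if (!c.sortColumn) out.add(c);
        return out;
    }

    /** Revert to schema order before the FinalMerge/DataWriter step. */
    static List<Column> toSchemaOrder(List<Column> sortStepOrder) {
        List<Column> out = new ArrayList<>(sortStepOrder);
        out.sort(Comparator.comparingInt((Column c) -> c.schemaOrdinal));
        return out;
    }

    public static void main(String[] args) {
        // "age" was added later, so it sits last in the schema even though it is a sort column
        List<Column> schema = List.of(
            new Column("name", 0, true),
            new Column("salary", 1, false),
            new Column("age", 2, true));
        for (Column c : toSortStepOrder(schema)) System.out.print(c.name + " "); // name age salary
        System.out.println();
        for (Column c : toSchemaOrder(toSortStepOrder(schema))) System.out.print(c.name + " ");
    }
}
```

The reported bug lives exactly at the boundary between these two orderings: if a step consumes rows in one order while the configuration was built for the other, a cast exception is the typical symptom.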
[jira] [Resolved] (CARBONDATA-4007) ArrayIndexOutofBoundsException when IUD operations performed using SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4007. -- Fix Version/s: 2.1.0 Resolution: Fixed > ArrayIndexOutofBoundsException when IUD operations performed using SDK > -- > > Key: CARBONDATA-4007 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4007 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.1.0 > Environment: Spark 2.4.5 jars used for compilation of SDK >Reporter: Chetan Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > Issue - > ArrayIndexOutofBoundsException when IUD operations performed using SDK. > Exception - > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.close(CarbonTableOutputFormat.java:579) > at org.apache.carbondata.sdk.file.CarbonIUD.delete(CarbonIUD.java:110) > at > org.apache.carbondata.sdk.file.CarbonIUD.deleteExecution(CarbonIUD.java:238) > at org.apache.carbondata.sdk.file.CarbonIUD.closeDelete(CarbonIUD.java:123) > at org.apache.carbondata.sdk.file.CarbonIUD.commit(CarbonIUD.java:221) > at com.apache.spark.SdkIUD_Test.testDelete(SdkIUD_Test.java:130) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) > at > com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) > at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3979) Added Hive local dictionary support example
[ https://issues.apache.org/jira/browse/CARBONDATA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3979. -- Fix Version/s: 2.1.0 Resolution: Fixed > Added Hive local dictionary support example > --- > > Key: CARBONDATA-3979 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3979 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > To verify local dictionary support in hive for the carbon tables created > from spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3999) The permission of IndexServer's temporary directory /tmp/indexservertmp is not 777 after running for some time.
[ https://issues.apache.org/jira/browse/CARBONDATA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3999. -- Fix Version/s: 2.1.0 Resolution: Fixed > The permission of IndexServer's temporary directory /tmp/indexservertmp is > not 777 after running for some time. > -- > > Key: CARBONDATA-3999 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3999 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.0.0 >Reporter: renhao >Priority: Critical > Labels: IndexServer > Fix For: 2.1.0 > > Attachments: 4700942c-3158-424f-8861-3dfcb6fae205.png > > > 1. Start the index server in FI and check that the permission of "/tmp/indexservertmp" > in HDFS is 777; > 2. After running for some time, an error occurred when using the index server, and the > permission of "/tmp/indexservertmp" became 755. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3994) Skip Order by for map task if it is sort column and use limit pushdown for array_contains filter
[ https://issues.apache.org/jira/browse/CARBONDATA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3994. -- Fix Version/s: 2.1.0 Resolution: Fixed > Skip Order by for map task if it is sort column and use limit pushdown for > array_contains filter > > > Key: CARBONDATA-3994 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3994 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 6h > Remaining Estimate: 0h > > When the order by column is in sort column, every map task output will be > already sorted. No need to sort the data again. > Hence skipping the order at map task by changing plan node from > {{TakeOrderedAndProject}} --> {{CarbonTakeOrderedAndProjectExec}} > Also in this scenario collecting the limit at map task and Array_contains() > will use this limit value for row scan filtering to break scan once limit > value is reached. > Also added a carbon property to control this . > {{carbon.mapOrderPushDown._.column}} > Note: later we can improve this for other filters also to use the limit value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
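The limit pushdown this issue describes amounts to breaking out of the row scan as soon as enough `array_contains` matches have been collected, instead of scanning every row and limiting afterwards. A toy sketch with a made-up row representation (not the actual CarbonData row-scan filter code):

```java
import java.util.ArrayList;
import java.util.List;

public class LimitPushdownScan {

    /**
     * Scan rows, keep those whose array column contains the target value,
     * and stop scanning as soon as `limit` matches have been collected.
     */
    static List<int[]> scanArrayContains(List<int[]> rows, int target, int limit) {
        List<int[]> result = new ArrayList<>();
        for (int[] row : rows) {
            for (int v : row) {
                if (v == target) {
                    result.add(row);
                    break;  // array_contains: first hit in the array is enough
                }
            }
            if (result.size() >= limit) break;  // limit pushed into the scan itself
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> rows = List.of(
            new int[] {1, 2}, new int[] {3}, new int[] {2, 5}, new int[] {2});
        // Only the first two matching rows are returned; the last match is never scanned.
        System.out.println(scanArrayContains(rows, 2, 2).size()); // prints 2
    }
}
```

Because sorted map-task output makes the first `limit` matches globally valid (the point of the plan-node change above), cutting the scan short is safe and saves the remaining I/O.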
[jira] [Resolved] (CARBONDATA-3937) Insert into select from another carbon/parquet table is not working on Hive Beeline on a newly created Hive write format - carbon table. We are getting “Database is
[ https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3937. -- Resolution: Invalid Not able to reproduce in master code, please recheck > Insert into select from another carbon /parquet table is not working on Hive > Beeline on a newly create Hive write format - carbon table. We are getting > “Database is not set" error. > > > Key: CARBONDATA-3937 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3937 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 >Reporter: Prasanna Ravichandran >Priority: Major > > Insert into select from another carbon or parquet table to a carbon table is > not working on Hive Beeline on a newly create Hive write format carbon table. > We are getting “Database is not set” error. > > Test queries: > drop table if exists hive_carbon; > create table hive_carbon(id int, name string, scale decimal, country string, > salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; > insert into hive_carbon select 1,"Ram","2.3","India",3500; > insert into hive_carbon select 2,"Raju","2.4","Russia",3600; > insert into hive_carbon select 3,"Raghu","2.5","China",3700; > insert into hive_carbon select 4,"Ravi","2.6","Australia",3800; > > drop table if exists hive_carbon2; > create table hive_carbon2(id int, name string, scale decimal, country string, > salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; > insert into hive_carbon2 select * from hive_carbon; > select * from hive_carbon; > select * from hive_carbon2; > > --execute below queries in spark-beeline; > create table hive_table(id int, name string, scale decimal, country string, > salary double); > create table parquet_table(id int, name string, scale decimal, country > string, salary double) stored as parquet; > insert into hive_table select 1,"Ram","2.3","India",3500; > select * from hive_table; > insert 
into parquet_table select 1,"Ram","2.3","India",3500; > select * from parquet_table; > --execute the below query in hive beeline; > insert into hive_carbon select * from parquet_table; > Attached the logs for your reference. But the insert into select from the > parquet and hive table into carbon table is working fine. > > Only insert into select from hive table to carbon table is only working. > Error details in MR job which run through hive query: > Error: java.io.IOException: java.io.IOException: Database name is not set. at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:175) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at > org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169) Caused by: > java.io.IOException: Database name is not set. 
at > org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411) > ... 9 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4036) When the ` character is present in column name, the table creation fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4036. -- Fix Version/s: 2.1.0 Resolution: Fixed > When the ` character is present in column name, the table creation fails > > > Key: CARBONDATA-4036 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4036 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When the ` character is present in column name, the table creation fails > sql("create table special_char(`i#d` string, `nam(e` string,`ci)@!ty` > string,`a\be` int, `ag!e` float, `na^me1` Decimal(8,4), ```a``bc``!!d``` int) > stored as carbondata" + > " tblproperties('INVERTED_INDEX'='`a`bc`!!d`', > 'SORT_COLUMNS'='`a`bc`!!d`')") -- This message was sent by Atlassian Jira (v8.3.4#803005)
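In the repro above, a literal backtick inside a quoted identifier is escaped by doubling it. A minimal sketch of the convention, with a hypothetical table name:

```sql
-- The quoted identifier `a``b` names a column literally called a`b.
create table escape_demo (`a``b` int) stored as carbondata;
```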
[jira] [Resolved] (CARBONDATA-4004) Wrong result in Presto select query after executing update
[ https://issues.apache.org/jira/browse/CARBONDATA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4004. -- Fix Version/s: 2.1.0 Resolution: Fixed > Wrong result in Presto select query after executing update > -- > > Key: CARBONDATA-4004 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4004 > Project: CarbonData > Issue Type: Bug > Components: core, presto-integration >Reporter: Akshay >Priority: Major > Fix For: 2.1.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > A Presto select query after an update operation returns a different number of rows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4012) Documentations issues.
[ https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4012. -- Fix Version/s: 2.1.0 Resolution: Fixed > Documentations issues. > -- > > Key: CARBONDATA-4012 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4012 > Project: CarbonData > Issue Type: Bug >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > > Support reading Array and Struct of all primitive types on presto from Spark Carbon tables. The details of this feature have to be added in the below opensource link: > [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3975) Data mismatch when the binary data is read via hive in carbon.
[ https://issues.apache.org/jira/browse/CARBONDATA-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3975. -- Fix Version/s: 2.1.0 Resolution: Fixed > Data mismatch when the binary data is read via hive in carbon. > -- > > Key: CARBONDATA-3975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3975 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.1.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Data mismatch when the binary data is read via hive in carbon. Carbon gives some wrong data compared to the hive table for the same input data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4019) CDC fails when the join expression contains the AND or any logical expression
[ https://issues.apache.org/jira/browse/CARBONDATA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4019. -- Fix Version/s: 2.1.0 Resolution: Fixed > CDC fails when the join expression contains the AND or any logical expression > - > > Key: CARBONDATA-4019 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4019 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > CDC fails when the join expression contains AND or any other logical expression. > It fails with a cast expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4016) NPE and FileNotFound in Show Segments and Insert Stage
[ https://issues.apache.org/jira/browse/CARBONDATA-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4016. -- Resolution: Fixed > NPE and FileNotFound in Show Segments and Insert Stage > -- > > Key: CARBONDATA-4016 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4016 > Project: CarbonData > Issue Type: Bug > Components: flink-integration, spark-integration >Affects Versions: 2.0.1 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > # Insert Stage: while Spark reads stages which are being written by Flink at the same time, a JsonFormat exception will be thrown. > # Show Segments with STAGE: when reading stages which are being written by Flink or deleted by Spark, a JsonFormat exception will be thrown. > # Show Segments will load partition info for a non-partition table, which shall be avoided. > # In getLastModifiedTime of TableStatus, if the loadendtime is empty, getLastModifiedTime throws an NPE. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4014) Support Change Column Comment
[ https://issues.apache.org/jira/browse/CARBONDATA-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4014. -- Resolution: Fixed > Support Change Column Comment > - > > Key: CARBONDATA-4014 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4014 > Project: CarbonData > Issue Type: New Feature > Components: sql >Affects Versions: 2.0.1 >Reporter: Xingjun Hao >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Now, we support adding a comment in CREATE TABLE and ADD COLUMN, but do not support altering the comment of a specified column. > We shall support altering the comment with the hive syntax > "ALTER TABLE table_name CHANGE [COLUMN] col_name col_name data_type [COMMENT col_comment]" > -- This message was sent by Atlassian Jira (v8.3.4#803005)
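A usage sketch of the hive syntax quoted above; the table and column names are illustrative:

```sql
-- Change only the comment of an existing column
-- (the column name and data type are repeated unchanged).
ALTER TABLE sales CHANGE COLUMN quantity quantity INT COMMENT 'units sold per order';
```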
[jira] [Resolved] (CARBONDATA-3997) Issue in decimal value reading for negative numbers from presto
[ https://issues.apache.org/jira/browse/CARBONDATA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3997. -- Fix Version/s: 2.1.0 Resolution: Fixed > Issue in decimal value reading for negative numbers from presto > --- > > Key: CARBONDATA-3997 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3997 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When a complex decimal column is stored with the DIRECT_COMPRESS codec, DataTypeUtil#bigDecimalToByte is used to create a byte array. > So, while decoding it, DataTypeUtil#byteToBigDecimal needs to be used to get back the proper value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4001) SI global sort load on partition table results in 0 rows
[ https://issues.apache.org/jira/browse/CARBONDATA-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-4001. -- Fix Version/s: 2.1.0 Resolution: Fixed > SI global sort load on partition table results in 0 rows > > > Key: CARBONDATA-4001 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4001 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > On a partition table, when SI is created with global sort and data is loaded, it shows 0 rows, as the main table query result is 0 rows. > In the partition table local sort SI flow, {{current.segmentfile}} is set in {{CarbonSecondaryIndexRDD}}. > For global sort, this value was not set, so the main table query was resulting in 0 rows. This value is now set for the global sort flow as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3998) FileNotFoundException being thrown in hive during insert.
Kunal Kapoor created CARBONDATA-3998: Summary: FileNotFoundException being thrown in hive during insert. Key: CARBONDATA-3998 URL: https://issues.apache.org/jira/browse/CARBONDATA-3998 Project: CarbonData Issue Type: Bug Reporter: Kunal Kapoor Assignee: Kunal Kapoor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3995) Support presto querying older complex type stores
[ https://issues.apache.org/jira/browse/CARBONDATA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3995. -- Fix Version/s: 2.1.0 Resolution: Fixed > Support presto querying older complex type stores > - > > Key: CARBONDATA-3995 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3995 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Before carbon 2.0, the complex child length was stored as SHORT for the string, varchar, binary, date, and decimal types. > Since it is stored as INT in 2.0, the presto complex query code always assumes it is INT > and hits an out-of-bounds exception when an old store is queried. > > If the INT_LENGTH_COMPLEX_CHILD_BYTE_ARRAY encoding is present, parse as INT, else parse as SHORT, > so that both stores can be queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3982) Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
[ https://issues.apache.org/jira/browse/CARBONDATA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3982. -- Fix Version/s: 2.1.0 Resolution: Fixed > Use Partition instead of Span to split legacy and non-legacy segments for > executor distribution in indexserver > --- > > Key: CARBONDATA-3982 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3982 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Priority: Major > Fix For: 2.1.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)