[jira] [Created] (CARBONDATA-4140) The carbon implement of DataSourceV2
David Cai created CARBONDATA-4140:
  Summary: The carbon implement of DataSourceV2
  Key: CARBONDATA-4140
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4140
  Project: CarbonData
  Issue Type: Sub-task
  Reporter: David Cai

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4139) Integration for Spark 3 and Hadoop 3
David Cai updated CARBONDATA-4139:
  Summary: Integration for Spark 3 and Hadoop 3 (was: integration for Spark 3 and Hadoop 3)

> Integration for Spark 3 and Hadoop 3
> Key: CARBONDATA-4139
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4139
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: David Cai
> Priority: Major
[jira] [Created] (CARBONDATA-4139) integration for Spark 3 and Hadoop 3
David Cai created CARBONDATA-4139:
  Summary: integration for Spark 3 and Hadoop 3
  Key: CARBONDATA-4139
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4139
  Project: CarbonData
  Issue Type: Sub-task
  Reporter: David Cai
[jira] [Created] (CARBONDATA-4138) Carbon Expression Reorder instead of Spark Filter Reorder
David Cai created CARBONDATA-4138:
  Summary: Carbon Expression Reorder instead of Spark Filter Reorder
  Key: CARBONDATA-4138
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4138
  Project: CarbonData
  Issue Type: Sub-task
  Reporter: David Cai
[jira] [Created] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter
David Cai created CARBONDATA-4137:
  Summary: Refactor CarbonDataSourceScan without Spark Filter
  Key: CARBONDATA-4137
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
  Project: CarbonData
  Issue Type: Sub-task
  Reporter: David Cai
[jira] [Created] (CARBONDATA-4136) Support Spark 3 and Hadoop 3
David Cai created CARBONDATA-4136:
  Summary: Support Spark 3 and Hadoop 3
  Key: CARBONDATA-4136
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4136
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Updated] (CARBONDATA-4075) Should refactor to use withEvents instead of fireEvent
David Cai updated CARBONDATA-4075:
  Summary: Should refactor to use withEvents instead of fireEvent (was: Should refactor carbon to use withEvents instead of fireEvent)

> Should refactor to use withEvents instead of fireEvent
> Key: CARBONDATA-4075
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Minor
[jira] [Created] (CARBONDATA-4075) Should refactor carbon to use withEvents instead of fireEvent
David Cai created CARBONDATA-4075:
  Summary: Should refactor carbon to use withEvents instead of fireEvent
  Key: CARBONDATA-4075
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Created] (CARBONDATA-4074) Should clean stale data in success segments
David Cai created CARBONDATA-4074:
  Summary: Should clean stale data in success segments
  Key: CARBONDATA-4074
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4074
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai

Cleaning stale data in success segments includes the following parts:
1. clean stale delete delta (when force is true)
2. clean stale small files for the index table
3. clean stale data files for loading/compaction
[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager
David Cai updated CARBONDATA-4062:
  Description:
    To prevent accidental deletion of data, carbon will introduce data trash management. It will provide buffer time to roll back an accidental delete operation. Data trash management is a part of carbon data lifecycle management. Clean files, as a data trash manager, should contain the following two parts.
    Part 1: manage metadata-indexed data trash. This data is at the original place of the table and is indexed by metadata. Carbon manages this data by the metadata index and should avoid using the listFile() interface.
    Part 2: manage the ".Trash" folder. For now, the ".Trash" folder has no metadata index, and operations on it are based on timestamp and the listFile() interface. In the future, carbon will index the ".Trash" folder to improve data trash management.
  (was:
    To prevent accidental deletion of data, carbon will introduce data trash management. It will provide buffer time to roll back an accidental delete operation. Data trash management is a part of carbon data lifecycle management. Clean files, as a data trash manager, should contain the following two parts.
    Part 1: manage metadata-indexed data trash. This data should be at the original place.
    Part 2: manage the ".Trash" folder. For now, this ".Trash" folder has no metadata index, and operations on it will depend on timestamp and the listFile interface. It should be improved in the future.)

> Should make clean files become data trash manager
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Major
[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager
David Cai updated CARBONDATA-4062:
  Description:
    To prevent accidental deletion of data, carbon will introduce data trash management. It will provide buffer time to roll back an accidental delete operation. Data trash management is a part of carbon data lifecycle management. Clean files, as a data trash manager, should contain the following two parts.
    Part 1: manage metadata-indexed data trash. This data should be at the original place.
    Part 2: manage the ".Trash" folder. For now, this ".Trash" folder has no metadata index, and operations on it will depend on timestamp and the listFile interface. It should be improved in the future.
  (was: the same text, except that operations on the ".Trash" folder depended on the listFile interface only, without the timestamp.)

> Should make clean files become data trash manager
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Major
[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager
David Cai updated CARBONDATA-4062:
  Description:
    To prevent accidental deletion of data, carbon will introduce data trash management. It will provide buffer time to roll back an accidental delete operation. Data trash management is a part of carbon data lifecycle management. Clean files, as a data trash manager, should contain the following two parts.
    Part 1: manage metadata-indexed data trash. This data should be at the original place.
    Part 2: manage the ".Trash" folder. For now, this ".Trash" folder has no metadata index, and operations on it will depend on the listFile interface. It should be improved in the future.
  (was: To prevent accidental deletion of data, carbon introduced a data garbage manager. It will provide buffer time to roll back an accidental delete operation.)

> Should make clean files become data trash manager
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Major
[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager
David Cai updated CARBONDATA-4062:
  Summary: Should make clean files become data trash manager (was: should Make clean files become data trash manager)

> Should make clean files become data trash manager
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Major
>
> To prevent accidental deletion of data, carbon introduced a data garbage manager. It will provide buffer time to roll back an accidental delete operation.
[jira] [Created] (CARBONDATA-4062) should Make clean files become data trash manager
David Cai created CARBONDATA-4062:
  Summary: should Make clean files become data trash manager
  Key: CARBONDATA-4062
  URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai

To prevent accidental deletion of data, carbon introduced a data garbage manager. It will provide buffer time to roll back an accidental delete operation.
[jira] [Resolved] (CARBONDATA-4015) RetryCount and retryInterval of updateLock and compactLock is fixed as 3 when they try to get lock
David Cai resolved CARBONDATA-4015.
  Resolution: Fixed

> RetryCount and retryInterval of updateLock and compactLock is fixed as 3 when they try to get lock
> Key: CARBONDATA-4015
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4015
> Project: CarbonData
> Issue Type: Improvement
> Components: spark-integration
> Affects Versions: 2.0.1
> Reporter: Kejian Li
> Priority: Trivial
> Fix For: 2.1.0
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3960) Column comment should be null by default when adding column
David Cai created CARBONDATA-3960:
  Summary: Column comment should be null by default when adding column
  Key: CARBONDATA-3960
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3960
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai

1. create table

  create table test_add_column_with_comment(
    col1 string comment 'col1 comment',
    col2 int,
    col3 string)
  stored as carbondata

2. alter table

  alter table test_add_column_with_comment add columns(
    col4 string comment "col4 comment",
    col5 int,
    col6 string comment "")

3. describe table

  describe test_add_column_with_comment

  +--------+---------+------------+
  |col_name|data_type|comment     |
  +--------+---------+------------+
  |col1    |string   |col1 comment|
  |col2    |int      |null        |
  |col3    |string   |null        |
  |col4    |string   |col4 comment|
  |col5    |int      |            |
  |col6    |string   |            |
  +--------+---------+------------+

The comment of col5 is "" by default.
[jira] [Created] (CARBONDATA-3958) CDC Merge task can't finish
David Cai created CARBONDATA-3958:
  Summary: CDC Merge task can't finish
  Key: CARBONDATA-3958
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3958
  Project: CarbonData
  Issue Type: Bug
  Reporter: David Cai

1. The merge tasks take a long time and can't finish in some cases.
2. We find the warning "This scenario should not happen" in the log.
[jira] [Created] (CARBONDATA-3930) MVExample is throwing DataLoadingException
David Cai created CARBONDATA-3930:
  Summary: MVExample is throwing DataLoadingException
  Key: CARBONDATA-3930
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3930
  Project: CarbonData
  Issue Type: Bug
  Affects Versions: 2.1.0
  Reporter: David Cai

[Reproduce]
Run examples/spark/src/main/scala/org/apache/carbondata/examples/MVExample.scala in IDEA

[LOG]
Exception in thread "main" org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /***/carbondata/integration/spark-common-test/src/test/resources/sample.csv
  at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:81)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
  at org.apache.spark.util.FileUtils$.getPaths(FileUtils.scala:77)
  at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:97)
  at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:148)
  at org.apache.spark.sql.execution.command.AtomicRunnableCommand$$anonfun$run$3.apply(package.scala:145)
  at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:104)
  at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:141)
  at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:145)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3265)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3264)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  at org.apache.carbondata.examples.MVExample$.exampleBody(MVExample.scala:67)
  at org.apache.carbondata.examples.MVExample$.main(MVExample.scala:37)
  at org.apache.carbondata.examples.MVExample.main(MVExample.scala)
[jira] [Created] (CARBONDATA-3924) Should add default dynamic parameters only one time in one JVM process
David Cai created CARBONDATA-3924:
  Summary: Should add default dynamic parameters only one time in one JVM process
  Key: CARBONDATA-3924
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3924
  Project: CarbonData
  Issue Type: Bug
  Reporter: David Cai

Because the ConfigEntry.registerEntry method can't register the same entry more than once, the default dynamic parameters should be added only one time in one JVM process.
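A once-per-JVM registration like the one described above can be guarded with a compare-and-set flag. The sketch below is illustrative only: `DefaultParams` and `registerAll` are hypothetical names, not CarbonData's or Spark's actual API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a once-per-JVM guard for registering
// default dynamic parameters; names are illustrative.
class DefaultParams {
    private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);
    static final AtomicInteger registrations = new AtomicInteger(0);

    static void registerAll() {
        // compareAndSet succeeds for exactly one caller per JVM,
        // even if several threads race here.
        if (REGISTERED.compareAndSet(false, true)) {
            // Stand-in for the real work, e.g. ConfigEntry registration.
            registrations.incrementAndGet();
        }
    }
}

public class RegisterOnceDemo {
    public static void main(String[] args) {
        DefaultParams.registerAll();
        DefaultParams.registerAll();
        System.out.println(DefaultParams.registrations.get()); // 1
    }
}
```

Calling `registerAll()` repeatedly performs the registration only on the first call, which avoids the double-registration failure in ConfigEntry.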
[jira] [Created] (CARBONDATA-3889) Should optimize the code of the inspection result of Intellij IDEA
David Cai created CARBONDATA-3889:
  Summary: Should optimize the code of the inspection result of Intellij IDEA
  Key: CARBONDATA-3889
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3889
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Updated] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder
David Cai updated CARBONDATA-3888:
  Description: After .flattened-pom.xml is generated in the project folder, it impacts the project import of IntelliJ IDEA. (was: When )

> Should move .flattened-pom.xml to target folder
> Key: CARBONDATA-3888
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Minor
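One way to move the flattened pom into the target folder, sketched below under the assumption that the project uses the flatten-maven-plugin and its `outputDirectory` parameter (not taken from the actual CarbonData patch):

```xml
<!-- Sketch: redirect .flattened-pom.xml into target/ so the IDE is not
     confused by a second pom in the project folder. Assumes the
     flatten-maven-plugin outputDirectory parameter. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <configuration>
    <outputDirectory>${project.build.directory}</outputDirectory>
    <flattenedPomFilename>.flattened-pom.xml</flattenedPomFilename>
  </configuration>
</plugin>
```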
[jira] [Updated] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder
David Cai updated CARBONDATA-3888:
  Description: When

> Should move .flattened-pom.xml to target folder
> Key: CARBONDATA-3888
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Priority: Minor
[jira] [Created] (CARBONDATA-3888) Should move .flattened-pom.xml to target folder
David Cai created CARBONDATA-3888:
  Summary: Should move .flattened-pom.xml to target folder
  Key: CARBONDATA-3888
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3888
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Resolved] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
David Cai resolved CARBONDATA-3878.
  Resolution: Fixed

> Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Assignee: David Cai
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
[jira] [Assigned] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
David Cai reassigned CARBONDATA-3878:
  Assignee: David Cai

> Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Assignee: David Cai
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
David Cai created CARBONDATA-3878:
  Summary: Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'
  Key: CARBONDATA-3878
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Created] (CARBONDATA-3870) Global lock impact the performance of the concurrent query
David Cai created CARBONDATA-3870:
  Summary: Global lock impact the performance of the concurrent query
  Key: CARBONDATA-3870
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3870
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Assigned] (CARBONDATA-3837) Should fallback to the original plan when MV rewrite throw exception
David Cai reassigned CARBONDATA-3837:
  Assignee: David Cai

> Should fallback to the original plan when MV rewrite throw exception
> Key: CARBONDATA-3837
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3837
> Project: CarbonData
> Issue Type: Improvement
> Reporter: David Cai
> Assignee: David Cai
> Priority: Major
> Fix For: 2.0.1
> Time Spent: 2.5h
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3835) Global sort doesn't sort string columns properly
David Cai resolved CARBONDATA-3835.
  Fix Version/s: 2.0.1 (was: 2.0.0)
  Resolution: Fixed

> Global sort doesn't sort string columns properly
> Key: CARBONDATA-3835
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3835
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Ajantha Bhat
> Assignee: Ajantha Bhat
> Priority: Major
> Fix For: 2.0.1
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Problem:
> For global sort without partition, a string comes in as byte[]. If we use the string comparator (StringSerializableComparator), it will call toString on the byte[], which gives the object address, and the comparison goes wrong.
>
> Solution: change the data type to byte before choosing the comparator.
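The failure mode described above is easy to reproduce in plain Java (a standalone sketch; CarbonData's actual comparator classes are not used here): `toString()` on an array returns a type tag plus an identity hash, not the contents, so a String-based comparator orders arrays by object address. Comparing the bytes directly, unsigned, gives the correct lexicographic order.

```java
import java.util.Arrays;

public class ByteArraySortDemo {
    public static void main(String[] args) {
        byte[] a = "apple".getBytes();
        byte[] b = "banana".getBytes();

        // toString() on an array yields something like "[B@1b6d3586":
        // address-based, unstable, unrelated to the contents.
        System.out.println(a.toString());

        // Two arrays with identical contents still stringify differently,
        // so a String-based comparator cannot even detect equality.
        byte[] a2 = "apple".getBytes();
        System.out.println(Arrays.equals(a, a2)); // true: same contents

        // Correct comparison for sorting: byte-wise and unsigned (Java 9+),
        // matching lexicographic order of the encoded strings.
        System.out.println(Arrays.compareUnsigned(a, b) < 0); // true: "apple" < "banana"
    }
}
```

This is the essence of the fix: pick a comparator that looks at the bytes themselves rather than any stringified form.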
[jira] [Created] (CARBONDATA-3837) Should fallback to the original plan when MV rewrite throw exception
David Cai created CARBONDATA-3837:
  Summary: Should fallback to the original plan when MV rewrite throw exception
  Key: CARBONDATA-3837
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3837
  Project: CarbonData
  Issue Type: Improvement
  Reporter: David Cai
[jira] [Updated] (CARBONDATA-3812) Data load jobs are missing output metrics
David Cai updated CARBONDATA-3812:
  Attachment: Screenshot from 2020-05-09 11-54-59.png

> Data load jobs are missing output metrics
> Key: CARBONDATA-3812
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: David Cai
> Priority: Minor
> Attachments: Screenshot from 2020-05-09 11-54-59.png
[jira] [Updated] (CARBONDATA-3812) Data load jobs are missing output metrics
David Cai updated CARBONDATA-3812:
  Description: Please check the attachments; the output item is empty.

> Data load jobs are missing output metrics
> Key: CARBONDATA-3812
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: David Cai
> Priority: Minor
> Attachments: Screenshot from 2020-05-09 11-54-59.png
[jira] [Created] (CARBONDATA-3812) Data load jobs are missing output metrics
David Cai created CARBONDATA-3812:
  Summary: Data load jobs are missing output metrics
  Key: CARBONDATA-3812
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3812
  Project: CarbonData
  Issue Type: Improvement
  Affects Versions: 2.0.0
  Reporter: David Cai
[jira] [Created] (CARBONDATA-3810) Partition column name should be case insensitive
David Cai created CARBONDATA-3810:
  Summary: Partition column name should be case insensitive
  Key: CARBONDATA-3810
  URL: https://issues.apache.org/jira/browse/CARBONDATA-3810
  Project: CarbonData
  Issue Type: Bug
  Reporter: David Cai

[Reproduce]
create table cs_insert_p (id int, Name string) stored as carbondata partitioned by (c1 int, c2 int, C3 string)
insert into table cs_insert_p partition(c1=3, C2=111, c3='2019-11-18') select 200, 'cc'

It will throw NoSuchElementException: key not found: c2
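A minimal sketch of the kind of fix this calls for (illustrative only, not the actual CarbonData patch): resolve partition-spec keys through a case-insensitive map rather than a plain HashMap, so `C2` in the INSERT statement and `c2` in the table definition hit the same entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSpecLookup {
    public static void main(String[] args) {
        // With a plain HashMap the lookup is case-sensitive, so
        // partition(C2=111) misses the column declared as c2 and the
        // caller ends up with a "key not found" failure.
        Map<String, String> caseSensitive = new HashMap<>();
        caseSensitive.put("c2", "111");
        System.out.println(caseSensitive.get("C2")); // null

        // A TreeMap with CASE_INSENSITIVE_ORDER lets the declared name
        // and the name used in the statement resolve identically.
        Map<String, String> caseInsensitive = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        caseInsensitive.put("c2", "111");
        System.out.println(caseInsensitive.get("C2")); // 111
        System.out.println(caseInsensitive.get("c2")); // 111
    }
}
```

An equivalent approach is to lowercase all partition column names at parse time before any map is built; either way the lookup must stop depending on the user's casing.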
[jira] [Closed] (CARBONDATA-910) Implement Partition feature
David Cai closed CARBONDATA-910.
  Resolution: Invalid
  Comment: deprecated since 2.0

> Implement Partition feature
> Key: CARBONDATA-910
> URL: https://issues.apache.org/jira/browse/CARBONDATA-910
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query
> Reporter: Cao, Lionel
> Assignee: Cao, Lionel
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Why need partition table
> A partition table provides an option to divide a table into smaller pieces. With a partition table:
> 1. Data could be better managed, organized and stored.
> 2. We can avoid a full table scan in some scenarios and improve query performance (partition column in filter, multiple partition tables joined on the same partition column, etc.).
>
> Partitioning design
> Range Partitioning
>   Range partitioning maps data to partitions according to the range of partition column values; the operator '<' defines the non-inclusive upper bound of the current partition.
> List Partitioning
>   List partitioning allows you to map data to partitions with a specific value list.
> Hash Partitioning
>   Hash partitioning maps data to partitions with a hash algorithm and puts it into the given number of partitions.
> Composite Partitioning (2 levels at most for now)
>   Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, Hash-Range, Hash-List, Hash-Hash
>
> DDL-Create
>   Create table sales(
>     itemid long,
>     logdate datetime,
>     customerid int
>     ...
>     ...)
>   [partition by range logdate(...)]
>   [subpartition by list area(...)]
>   Stored By 'carbondata'
>   [tblproperties(...)];
>
>   range partition:
>     partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
>   list partition:
>     partition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
>   hash partition:
>     partition by hash(itemid, 9)
>   composite partition:
>     partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
>     subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
>
> DDL-Rebuild, Add
>   Alter table sales rebuild partition by (range|list|hash)(...);
>   Alter table sales add partition (< '2018-01-01');  # only supports range partitioning and list partitioning
>   Alter table sales add partition ('South America');
>   # Note: There is no delete operation for a partition; please use rebuild. If you need to delete data, use a delete statement, but the definition of the partition will not be deleted.
>
> Partition Table Data Store
> [Option One]
>   Use the current design and keep partition folders out of segments:
>   Fact
>   |___Part0
>   |   |___Segment_0
>   |   |   |___ ***-[bucketId]-.carbondata
>   |   |   |___ ***-[bucketId]-.carbondata
>   |   |___Segment_1
>   |   ...
>   |___Part1
>   |   |___Segment_0
>   |   |___Segment_1
>   |...
> [Option Two]
>   Remove the partition folder, add the partition id into the file name and build a btree on the driver side:
>   Fact
>   |___Segment_0
>   |   |___ ***-[bucketId]-[partitionId].carbondata
>   |   |___ ***-[bucketId]-[partitionId].carbondata
>   |___Segment_1
>   |___Segment_2
>   ...
> Pros & Cons:
>   Option one would be faster at locating target files.
>   Option two needs to store more metadata about folders.
>
> Partition Table MetaData Store
>   Partition info should be stored in the file footer/index file and loaded into memory before user query.
>
> Relationship with Bucket
>   Bucket should be the lower level of partition.
>
> Partition Table Query
>   Example:
>     Select * from sales
>     where logdate <= date '2016-12-01';
>   The user should remember to add a partition filter when writing SQL on a partition table.
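The range-partitioning rule in the design above ('<' as a non-inclusive upper bound) amounts to finding the first partition whose upper bound is strictly greater than the value. A small sketch of that lookup in Java (class and method names are illustrative, not from the CarbonData codebase), using binary search over the sorted upper bounds:

```java
import java.util.Arrays;

public class RangePartitioner {
    // Sorted, non-inclusive upper bounds, e.g. the logdate ranges above.
    private final String[] upperBounds;

    RangePartitioner(String... upperBounds) {
        this.upperBounds = upperBounds;
    }

    /** Returns the index of the first partition whose upper bound is > value. */
    int partitionFor(String value) {
        int pos = Arrays.binarySearch(upperBounds, value);
        // An exact match is NOT inside that partition ('<' is non-inclusive),
        // so it falls into the next one; otherwise use the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        RangePartitioner p = new RangePartitioner(
            "2016-01-01", "2017-01-01", "2017-02-01", "2017-03-01", "2099-01-01");
        System.out.println(p.partitionFor("2015-06-30")); // 0: < '2016-01-01'
        System.out.println(p.partitionFor("2017-01-15")); // 2: < '2017-02-01'
        System.out.println(p.partitionFor("2017-01-01")); // 2: upper bound is non-inclusive
    }
}
```

ISO-formatted date strings compare correctly as plain strings, which keeps the sketch self-contained; a real implementation would compare typed partition-column values.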
[jira] [Resolved] (CARBONDATA-2776) Support ingesting data from Kafka service
David Cai resolved CARBONDATA-2776.
  Resolution: Fixed

> Support ingesting data from Kafka service
> Key: CARBONDATA-2776
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2776
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: David Cai
> Priority: Minor
> Time Spent: 2.5h
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-2917) Should support binary datatype
David Cai resolved CARBONDATA-2917.
  Resolution: Fixed

> Should support binary datatype
> Key: CARBONDATA-2917
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2917
> Project: CarbonData
> Issue Type: Improvement
> Components: file-format
> Affects Versions: 1.5.0
> Reporter: David Cai
> Priority: Minor
> Time Spent: 2.5h
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3021) Streaming throws an Unsupported data type exception
[ https://issues.apache.org/jira/browse/CARBONDATA-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai resolved CARBONDATA-3021. --- Resolution: Fixed
> Streaming throws an Unsupported data type exception
>
> Key: CARBONDATA-3021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3021
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: David Cai
> Priority: Major
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:343)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
> Caused by: org.apache.carbondata.streaming.CarbonStreamException: Job failed to write data file
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:288)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileJob(CarbonAppendableStreamSink.scala:238)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink.addBatch(CarbonAppendableStreamSink.scala:133)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply$mcV$sp(StreamExecution.scala:666)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
> at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
> at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:665)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
> at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
> at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
> at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
> at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
> ... 1 more
> Caused by: java.lang.IllegalArgumentException: Unsupported data type: LONG
> at org.apache.carbondata.core.util.comparator.Comparator.getComparatorByDataTypeForMeasure(Comparator.java:73)
> at org.apache.carbondata.streaming.segment.StreamSegment.mergeBatchMinMax(StreamSegment.java:471)
> at org.apache.carbondata.streaming.segment.StreamSegment.updateStreamFileIndex(StreamSegment.java:610)
> at org.apache.carbondata.streaming.segment.StreamSegment.updateIndexFile(StreamSegment.java:627)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:277)
> ... 20 more
-- This message was sent by Atlassian Jira (v8.3.4#803005)
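The root cause in the trace above is a comparator factory that did not cover the LONG measure type when merging a batch's min/max into the stream file index. A schematic Python sketch of that min/max merge, using hypothetical type names rather than the actual CarbonData API:

```python
# Hypothetical supported set; the bug was that the real factory in
# Comparator.getComparatorByDataTypeForMeasure had no case for LONG.
SUPPORTED = {"BOOLEAN", "SHORT", "INT", "LONG", "DOUBLE", "DECIMAL"}

def get_comparator(data_type):
    """Return a compare function for a measure type, or fail loudly
    for unknown types, as the stack trace above shows."""
    if data_type not in SUPPORTED:
        raise ValueError("Unsupported data type: " + data_type)
    return lambda a, b: (a > b) - (a < b)

def merge_batch_min_max(file_stats, batch_stats, data_type):
    """Merge a batch's (min, max) into the file-level (min, max)."""
    cmp = get_comparator(data_type)
    fmin, fmax = file_stats
    bmin, bmax = batch_stats
    return (bmin if cmp(bmin, fmin) < 0 else fmin,
            bmax if cmp(bmax, fmax) > 0 else fmax)

print(merge_batch_min_max((3, 10), (1, 7), "LONG"))  # → (1, 10)
```

With LONG missing from the supported set, every index update for a streaming table with a long measure would fail exactly as in the reported exception.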
[jira] [Resolved] (CARBONDATA-2923) should log the min/max identification info on streaming tables
[ https://issues.apache.org/jira/browse/CARBONDATA-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai resolved CARBONDATA-2923. --- Resolution: Fixed > should log the min/max identification info on streaming tables > > > Key: CARBONDATA-2923 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2923 > Project: CarbonData > Issue Type: Improvement > Affects Versions: 1.5.0 > Reporter: David Cai > Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, queries do not log the min/max identification info on streaming tables, so we cannot tell whether streaming min/max is working correctly or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3641) Should improve data loading performance for partition table
[ https://issues.apache.org/jira/browse/CARBONDATA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai resolved CARBONDATA-3641. --- Resolution: Fixed > Should improve data loading performance for partition table > --- > > Key: CARBONDATA-3641 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3641 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Reporter: David Cai > Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > [Background] > # only commit algorithm version 1 was implemented > # too many segment files were generated during loading > # too many small data files and index files were generated > [Modification] > 1. implement the carbon commit algorithm to avoid moving data files and index files > 2. generate the final segment file directly > 3. optimize global_sort to avoid the small-files issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
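To illustrate why avoiding file moves helps, here is a hedged Python sketch contrasting a v1-style commit (write to a temporary location, then rename every file into place) with a direct commit (write the final file once and record it in a segment file). File layout and function names are invented for illustration; the real implementation operates on CarbonData segment and index files.

```python
import os
import tempfile

def commit_v1(tmp_dir, final_dir, files):
    """Commit algorithm v1 style: task output lands in a temporary
    directory first, and job commit renames every file into the final
    directory -- the per-file moves the modification above removes."""
    os.makedirs(final_dir, exist_ok=True)
    for name in files:
        os.rename(os.path.join(tmp_dir, name), os.path.join(final_dir, name))
    return files

def commit_direct(final_dir, name, data):
    """Direct-commit sketch: the data file is written once to its final
    location; a segment file (here just the returned name) records
    which files are valid, so no rename pass is needed."""
    os.makedirs(final_dir, exist_ok=True)
    with open(os.path.join(final_dir, name), "wb") as out:
        out.write(data)
    return name

# Tiny demo in temporary directories
tmp, final = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(tmp, "part-0.carbondata"), "wb") as f:
    f.write(b"rows")
commit_v1(tmp, final, ["part-0.carbondata"])
commit_direct(final, "part-1.carbondata", b"rows")
print(sorted(os.listdir(final)))
```

On object stores a rename is a copy plus delete, so eliminating the rename pass is the main win for partitioned loads with many files.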
[jira] [Resolved] (CARBONDATA-3347) support SORT_COLUMNS modification
[ https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai resolved CARBONDATA-3347. --- Resolution: Fixed > support SORT_COLUMNS modification > - > > Key: CARBONDATA-3347 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3347 > Project: CarbonData > Issue Type: New Feature > Components: spark-integration > Reporter: David Cai > Assignee: David Cai > Priority: Major > Attachments: sort_columns modification.pdf, sort_columns modification_v2.pdf > > > *Background* > Currently, SORT_COLUMNS can't be modified after the table is created. If we want to modify SORT_COLUMNS on a table, we need to create a new table and migrate the data. If the data is huge, the migration takes a long time and may even impact the user's business. > SORT_SCOPE in the table properties can already be modified, and we can specify a new SORT_SCOPE during data loading. The carbon index file marks whether each segment is sorted or not, so different segments may have different SORT_SCOPE. > *Motivation* > After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS according to their business. History segments will still use the old SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by one if needed. > We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS when creating the table, because the modification will consume significant resources to re-sort the data of old segments. > > Please check the design doc for more detail: [^sort_columns modification_v2.pdf] -- This message was sent by Atlassian Jira (v8.3.4#803005)
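Since each segment records the sort properties it was loaded with, a query can decide per segment whether a filter column benefits from the sort order. A small Python sketch under that assumption (the metadata shape is hypothetical, not the CarbonData index file format):

```python
# Hypothetical per-segment metadata: each segment records the
# SORT_COLUMNS it was loaded with, as the carbon index file marks
# whether a segment is sorted.
segments = [
    {"id": 0, "sort_columns": ["c1"]},        # loaded before the ALTER
    {"id": 1, "sort_columns": ["c2", "c1"]},  # loaded after SORT_COLUMNS changed
]

def segments_sorted_by(column):
    """Segment ids where `column` is a sort column, so min/max pruning
    and binary-search style lookups apply; the remaining segments
    fall back to a full scan until they are re-sorted."""
    return [s["id"] for s in segments if column in s["sort_columns"]]

print(segments_sorted_by("c2"))  # only the segment loaded after the change
```

This is why re-sorting old segments is optional: queries stay correct either way, and only the pruning efficiency differs per segment.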
[jira] [Created] (CARBONDATA-3803) Should mark CarbonSession as deprecated in version 2.0
David Cai created CARBONDATA-3803: - Summary: Should mark CarbonSession as deprecated in version 2.0 Key: CARBONDATA-3803 URL: https://issues.apache.org/jira/browse/CARBONDATA-3803 Project: CarbonData Issue Type: Wish Affects Versions: 2.0.0 Reporter: David Cai It is better to use CarbonExtensions instead of CarbonSession in version 2.0, so we should mark CarbonSession as deprecated in version 2.0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3756) the query of stage files reads only the first blocklet of each carbondata file
David Cai created CARBONDATA-3756: - Summary: the query of stage files reads only the first blocklet of each carbondata file Key: CARBONDATA-3756 URL: https://issues.apache.org/jira/browse/CARBONDATA-3756 Project: CarbonData Issue Type: Improvement Reporter: David Cai The query of stage files reads only the first blocklet of each carbondata file. If a file contains multiple blocklets, the query result will be wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
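The fix is simply to iterate every blocklet instead of returning after the first. A minimal Python sketch of the corrected read loop (the structures are hypothetical, not the CarbonData reader API):

```python
def read_stage_file(blocklets):
    """Return all rows of a stage file. The buggy version effectively
    returned blocklets[0] only, silently dropping rows whenever a file
    held more than one blocklet."""
    rows = []
    for blocklet in blocklets:  # buggy code stopped after the first one
        rows.extend(blocklet)
    return rows

print(read_stage_file([[1, 2], [3, 4], [5]]))  # → [1, 2, 3, 4, 5]
```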
[jira] [Created] (CARBONDATA-3752) Query on carbon table should support reusing Exchange
David Cai created CARBONDATA-3752: - Summary: Query on carbon table should support reusing Exchange Key: CARBONDATA-3752 URL: https://issues.apache.org/jira/browse/CARBONDATA-3752 Project: CarbonData Issue Type: Improvement Reporter: David Cai
A query on a carbon table should support reusing Exchange.
[Reproduce]
create table t1(c1 int, c2 string) using carbondata;
insert into t1 values(1, 'abc');
explain select c2, sum(c1) from t1 group by c2 union all select c2, sum(c1) from t1 group by c2;
[Physical Plan]
{noformat}
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- Exchange hashpartitioning(c2#37, 200)
      +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
         +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct
{noformat}
It should reuse the Exchange, like the following:
{noformat}
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
{noformat}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
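Spark's ReuseExchange rule works by canonicalizing each Exchange subtree and replacing later duplicates with a reference to the first physical instance. A toy Python sketch of that idea, representing exchanges as plain strings for illustration:

```python
def reuse_exchanges(exchanges):
    """Replace duplicate (canonically equal) exchanges with a reference
    to the first occurrence, in the spirit of Spark's ReuseExchange rule."""
    first_seen = {}
    plan = []
    for i, ex in enumerate(exchanges):
        if ex in first_seen:
            # Second and later duplicates become ReusedExchange nodes
            plan.append(("ReusedExchange", first_seen[ex]))
        else:
            first_seen[ex] = i
            plan.append(("Exchange", ex))
    return plan

plan = reuse_exchanges(["hashpartitioning(c2#37, 200)",
                        "hashpartitioning(c2#37, 200)"])
print(plan)
```

In the issue above the rule did not fire for carbon scans, so both union branches kept their own Exchange and the aggregation below it ran twice.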
[jira] [Created] (CARBONDATA-3668) CarbonSession should use old flow (not CarbonExtensions flow)
David Cai created CARBONDATA-3668: - Summary: CarbonSession should use old flow (not CarbonExtensions flow) Key: CARBONDATA-3668 URL: https://issues.apache.org/jira/browse/CARBONDATA-3668 Project: CarbonData Issue Type: Improvement Reporter: David Cai Considering backward compatibility, CarbonSession should use the old flow (not the CarbonExtensions flow). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3641) Should improve data loading performance for partition table
David Cai created CARBONDATA-3641: - Summary: Should improve data loading performance for partition table Key: CARBONDATA-3641 URL: https://issues.apache.org/jira/browse/CARBONDATA-3641 Project: CarbonData Issue Type: Improvement Components: data-load Reporter: David Cai [Background] # only commit algorithm version 1 is implemented # too many segment files are generated during loading # too many small data files and index files are generated [Modification] 1. implement the carbon commit algorithm to avoid moving data files and index files 2. generate the final segment file directly 3. optimize global_sort to avoid the small-files issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (CARBONDATA-3547) Delete duplicate data in a segment
[ https://issues.apache.org/jira/browse/CARBONDATA-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai reassigned CARBONDATA-3547: - Assignee: David Cai > Delete duplicate data in a segment > -- > > Key: CARBONDATA-3547 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3547 > Project: CarbonData > Issue Type: Sub-task >Reporter: David Cai >Assignee: David Cai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3545) support deduplication
[ https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai updated CARBONDATA-3545: -- Description: support deleting duplicate data in the table > support deduplication > -- > > Key: CARBONDATA-3545 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3545 > Project: CarbonData > Issue Type: New Feature > Reporter: David Cai > Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > support deleting duplicate data in the table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (CARBONDATA-3546) Delete duplicate data between segments
[ https://issues.apache.org/jira/browse/CARBONDATA-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai reassigned CARBONDATA-3546: - Assignee: David Cai > Delete duplicate data between segments > -- > > Key: CARBONDATA-3546 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3546 > Project: CarbonData > Issue Type: Sub-task >Reporter: David Cai >Assignee: David Cai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (CARBONDATA-3545) support deduplication
[ https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai reassigned CARBONDATA-3545: - Assignee: David Cai > support deduplication > -- > > Key: CARBONDATA-3545 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3545 > Project: CarbonData > Issue Type: New Feature > Reporter: David Cai > Assignee: David Cai > Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > support deleting duplicate data in the table -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3547) Delete duplicate data in a segment
David Cai created CARBONDATA-3547: - Summary: Delete duplicate data in a segment Key: CARBONDATA-3547 URL: https://issues.apache.org/jira/browse/CARBONDATA-3547 Project: CarbonData Issue Type: Sub-task Reporter: David Cai -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3546) Delete duplicate data between segments
David Cai created CARBONDATA-3546: - Summary: Delete duplicate data between segments Key: CARBONDATA-3546 URL: https://issues.apache.org/jira/browse/CARBONDATA-3546 Project: CarbonData Issue Type: Sub-task Reporter: David Cai -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3545) support deduplication
[ https://issues.apache.org/jira/browse/CARBONDATA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cai updated CARBONDATA-3545: -- Description: (was: delete duplicate the data from new segments if old segments exist the same data. delete repeated col1 from t1 where new.segment.id between 3 and 4 and old.segment.id between 0 and 2) Issue Type: New Feature (was: Improvement) Summary: support deduplication (was: support delete repeated data from a segment if the data is exists in other segments) > support deduplication > -- > > Key: CARBONDATA-3545 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3545 > Project: CarbonData > Issue Type: New Feature >Reporter: David Cai >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3545) support deleting repeated data from a segment if the data exists in other segments
David Cai created CARBONDATA-3545: - Summary: support deleting repeated data from a segment if the data exists in other segments Key: CARBONDATA-3545 URL: https://issues.apache.org/jira/browse/CARBONDATA-3545 Project: CarbonData Issue Type: Improvement Reporter: David Cai Delete duplicate data from new segments if the same data already exists in old segments, e.g.: delete repeated col1 from t1 where new.segment.id between 3 and 4 and old.segment.id between 0 and 2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
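The proposed statement above can be read as an anti-join of the new segments against the old segments on the given column. A hedged Python sketch of those semantics (the segment layout is simplified; this is not the CarbonData implementation):

```python
def dedup_new_segments(segments, new_ids, old_ids, key):
    """Drop rows from the new segments whose `key` value already exists
    in the old segments -- a sketch of the proposed
    'delete repeated <col> from <table> ...' semantics."""
    # Collect the key values present in the old (trusted) segments
    seen = {row[key] for sid in old_ids for row in segments[sid]}
    # Anti-join: keep only new rows whose key was not seen before
    for sid in new_ids:
        segments[sid] = [r for r in segments[sid] if r[key] not in seen]
    return segments

segments = {
    0: [{"col1": 1}, {"col1": 2}],  # old segment
    3: [{"col1": 2}, {"col1": 3}],  # new segment: col1=2 is a repeat
}
dedup_new_segments(segments, new_ids=[3], old_ids=[0], key="col1")
print(segments[3])  # → [{'col1': 3}]
```

Splitting the work into "within a segment" (CARBONDATA-3547) and "between segments" (CARBONDATA-3546) matches the two places a duplicate key can come from.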
[jira] [Created] (CARBONDATA-3544) CLI should support an option to show statistics for all columns
David Cai created CARBONDATA-3544: - Summary: CLI should support an option to show statistics for all columns Key: CARBONDATA-3544 URL: https://issues.apache.org/jira/browse/CARBONDATA-3544 Project: CarbonData Issue Type: Improvement Reporter: David Cai It would be better to add a -C option to show statistics for all columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)