[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[jira] [Closed] (CARBONDATA-1694) Incorrect exception on presto CLI while executing select query after applying alter drop column query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1694.
Resolution: Resolved

Incorrect exception on presto CLI while executing select query after applying alter drop column query on a table

Key: CARBONDATA-1694
URL: https://issues.apache.org/jira/browse/CARBONDATA-1694
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h
Remaining Estimate: 0h

Steps to Reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Execute queries:

a) alter table uniqdata drop columns (cust_id);
b) select * from uniqdata;

Output (the remaining 11 columns):

| cust_name | active_emui_version | dob | doj | bigint_column1 | bigint_column2 | decimal_column1 | decimal_column2 | double_column1 | double_column2 | integer_column1 |
| CUST_NAME_01987 | ACTIVE_EMUI_VERSION_01987 | 1975-06-11 01:00:03.0 | 1975-06-11 02:00:03.0 | 123372038841 | -223372034867 | 12345680888.123400 | 22345680888.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1988 |
| CUST_NAME_01988 | ACTIVE_EMUI_VERSION_01988 | 1975-06-12 01:00:03.0 | 1975-06-12 02:00:03.0 | 123372038842 | -223372034866 | 12345680889.123400 | 22345680889.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1989 |
| CUST_NAME_01989 | ACTIVE_EMUI_VERSION_01989 | 1975-06-13 01:00:03.0 | 1975-06-13 02:00:03.0 | 123372038843 | -223372034865 | 12345680890.123400 | 22345680890.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1990 |
| CUST_NAME_01990 | ACTIVE_EMUI_VERSION_01990 | 1975-06-14 01:00:03.0 | 1975-06-14 02:00:03.0 | 123372038844 | -223372034864 | 12345680891.123400 | 22345680891.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1991 |
| CUST_NAME_01991 | ACTIVE_EMUI_VERSION_01991 | 1975-06-15 01:00:03.0 | 1975-06-15 02:00:03.0 | 123372038845 | -223372034863 | 12345680892.123400 | 22345680892.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1992 |
| CUST_NAME_01992 | ACTIVE_EMUI_VERSION_01992 | 1975-06-16 01:00:03.0 | 1975-06-16 02:00:03.0 | 123372038846 | -223372034862 | 12345680893.123400 | 22345680893.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1993 |
| CUST_NAME_01993 | ACTIVE_EMUI_VERSION_01993 | 1975-06-17 01:00:03.0 | 1975-06-17 02:00:03.0 | 123372038847 | -223372034861 | 12345680894.123400 | 22345680894.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1994 |
| CUST_NAME_01994 | ACTIVE_EMUI_VERSION_01994 | 1975-06-18 01:00:03.0 | 1975-06-18 02:00:03.0 | 123372038848 | -223372034860 | 12345680895.123400 | 22345680895.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1995 |
| CUST_NAME_01995 | ACTIVE_EMUI_VERSION_01995 | 1975-06-19 01:00:03.0 | 1975-06-19 02:00:03.0 | 123372038849 | -223372034859 | 12345680896.123400 | 22345680896.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 19
[jira] [Closed] (CARBONDATA-1682) Incorrect output on presto CLI after applying alter query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1682.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect output on presto CLI after applying alter query on a table

Key: CARBONDATA-1682
URL: https://issues.apache.org/jira/browse/CARBONDATA-1682
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Execute queries:

a) desc uniqdata;
b) alter table uniqdata drop columns (cust_id);
c) select cust_id from uniqdata;

Output:

Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' given input columns: [doj, dob, double_column1, double_column2, active_emui_version, bigint_column1, decimal_column1, decimal_column2, cust_name, bigint_column2, integer_column1]; line 1 pos 7;
'Project ['cust_id]
+- SubqueryAlias uniqdata
   +- Relation[cust_name#3097,active_emui_version#3098,dob#3099,doj#3100,bigint_column1#3101L,bigint_column2#3102L,decimal_column1#3103,decimal_column2#3104,double_column1#3105,double_column2#3106,integer_column1#3107] CarbonDatasourceHadoopRelation [ Database name :newpresto, Table name :uniqdata, Schema :Some(StructType(StructField(cust_name,StringType,true), StructField(active_emui_version,StringType,true), StructField(dob,TimestampType,true), StructField(doj,TimestampType,true), StructField(bigint_column1,LongType,true), StructField(bigint_column2,LongType,true), StructField(decimal_column1,DecimalType(30,10),true), StructField(decimal_column2,DecimalType(36,10),true), StructField(double_column1,DoubleType,true), StructField(double_column2,DoubleType,true), StructField(integer_column1,IntegerType,true))) ] (state=,code=0)

On Presto CLI:

1) Execute query:

select cust_id from uniqdata;

2) Expected output: it should throw the same error as on Beeline.

3) Actual output:

 cust_id
---------
(0 rows)

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1677) Incorrect result displays on presto CLI after applying drop table command
[ https://issues.apache.org/jira/browse/CARBONDATA-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1677.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays on presto CLI after applying drop table command

Key: CARBONDATA-1677
URL: https://issues.apache.org/jira/browse/CARBONDATA-1677
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Attachments: 2000_UniqData.csv
Time Spent: 3h 40m
Remaining Estimate: 0h

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Execute query:

select * from uniqdata;

Start the presto server:

bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute queries:

a) show tables;
b) select * from uniqdata;
c) Now, on Beeline, drop the table, then execute the query again:

select * from uniqdata;

Expected result: it should throw an error that the table or view does not exist or was not found.

Actual result:

On Beeline:

Error: org.apache.spark.sql.AnalysisException: Table or view not found: uniqdata; line 1 pos 14 (state=,code=0)

On Presto:

 cust_id | cust_name | active_emui_version | dob | doj | bigint_column1 | bigint_column2 | decimal_column1 | decimal_column2 | double_column1 | double_column2 | integer_column1
---------+-----------+---------------------+-----+-----+----------------+----------------+-----------------+-----------------+----------------+----------------+-----------------
(0 rows)

Query 20171108_115415_2_34smd, FINISHED, 1 node
Splits: 16 total, 16 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
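The three Presto reports closed above share one failure mode: after a metadata change on the Beeline side (drop table, drop column), the Presto side keeps answering from a stale view of the schema instead of erroring out. The sketch below illustrates that pattern with a toy schema cache; all names are illustrative and assumed, not CarbonData or Presto classes.

```python
# Hypothetical illustration of a stale connector-side schema cache: it keeps
# answering for a table the metastore has already dropped until refreshed.
class SchemaCache:
    def __init__(self, store):
        self.store = store           # authoritative metastore contents
        self.cache = dict(store)     # connector-side snapshot

    def columns(self, table):
        # Answers from the snapshot only; never consults the metastore.
        return self.cache.get(table)

    def refresh(self, table):
        # Re-sync one table from the metastore, dropping it if it is gone.
        if table in self.store:
            self.cache[table] = self.store[table]
        else:
            self.cache.pop(table, None)

store = {"uniqdata": ["cust_id", "cust_name"]}
cache = SchemaCache(store)
del store["uniqdata"]                # table dropped via Beeline
assert cache.columns("uniqdata") == ["cust_id", "cust_name"]  # stale answer
cache.refresh("uniqdata")
assert cache.columns("uniqdata") is None  # absence surfaces after refresh
```

Under this reading, the fix in PR 1486 amounts to refreshing (or not caching) schema metadata so dropped objects produce an error rather than an empty result.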
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1405/ ---
[GitHub] carbondata pull request #1560: [CARBONDATA-1804] Support Plug-gable File Ope...
GitHub user ManoharVanam opened a pull request:

https://github.com/apache/carbondata/pull/1560

[CARBONDATA-1804] Support Plug-gable File Operations based on File types

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
  - Whether new unit test cases have been added or why no new tests are required? No; the existing test cases cover this change.
  - How it is tested? Please attach test report. Verified in a cluster.
  - Is it a performance related change? Please attach the performance test report. No.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ManoharVanam/incubator-carbondata FileFactory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1560.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1560

commit 5ea38446f3726aae84a96fa39a203a364e74e5a1
Author: Manohar
Date: 2017-11-24T07:10:36Z

[CARBONDATA-1804] Support Plug-gable File Operations based on File types

---
[jira] [Closed] (CARBONDATA-1675) Incorrect result displays after applying drop column query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1675.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays after applying drop column query on a table

Key: CARBONDATA-1675
URL: https://issues.apache.org/jira/browse/CARBONDATA-1675
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor

Steps to reproduce:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Execute query:

desc uniqdata

Output:

| col_name            | data_type      | comment |
| CUST_ID             | int            | NULL    |
| CUST_NAME           | string         | NULL    |
| ACTIVE_EMUI_VERSION | string         | NULL    |
| DOB                 | timestamp      | NULL    |
| DOJ                 | timestamp      | NULL    |
| BIGINT_COLUMN1      | bigint         | NULL    |
| BIGINT_COLUMN2      | bigint         | NULL    |
| DECIMAL_COLUMN1     | decimal(30,10) | NULL    |
| DECIMAL_COLUMN2     | decimal(36,10) | NULL    |
| Double_COLUMN1      | double         | NULL    |
| Double_COLUMN2      | double         | NULL    |
| INTEGER_COLUMN1     | int            | NULL    |

12 rows selected (0.041 seconds)

Start the Presto server:

sudo ./bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute query:

a) desc uniqdata;

Output:

       Column        |      Type      | Extra | Comment
---------------------+----------------+-------+---------
 cust_id             | integer        |       |
 cust_name           | varchar        |       |
 active_emui_version | varchar        |       |
 dob                 | timestamp      |       |
 doj                 | timestamp      |       |
 bigint_column1      | bigint         |       |
 bigint_column2      | bigint         |       |
 decimal_column1     | decimal(30,10) |       |
 decimal_column2     | decimal(36,10) |       |
 double_column1      | double         |       |
 double_column2      | double         |       |
 integer_column1     | integer        |       |
(12 rows)

b) Now, on Beeline, execute the drop column query on the table:

alter table uniqdata drop columns (CUST_ID)

c) desc uniqdata;

Expected output: it should display the updated table description, as on Beeline.

Actual output:

On Beeline:

0: jdbc:hive2://localhost:1> desc uniqdata;

| col_name            | data_type      | comment |
| cust_name           | string         | NULL    |
| active_emui_version | string         | NULL    |
| dob                 | timestamp      | NULL    |
| doj                 | timestamp      | NULL    |
| bigint_column1      | bigint         | NULL    |
| bigint_column2      | bigint         | NULL    |
| decimal_column1     | decimal(30,10) | NULL    |
| decimal_column2     | decimal(36,10) | NULL    |
| double_column1      | double         | NULL    |
| double_column2      | double         | NULL    |
| integer_column1     | int            | NULL    |

11 rows selected (0.039 seconds)

On Presto CLI:

presto:newpresto> desc uniqdata;

       Column        | Ty
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1540 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1846/ ---
[jira] [Closed] (CARBONDATA-1670) Incorrect result displays while select query on presto CLI after recreating a table.
[ https://issues.apache.org/jira/browse/CARBONDATA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1670.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays while select query on presto CLI after recreating a table.

Key: CARBONDATA-1670
URL: https://issues.apache.org/jira/browse/CARBONDATA-1670
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Minor
Attachments: partition_table.csv

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE list_partition_table_short(intField INT, bigintField LONG, doubleField DOUBLE, stringField STRING, timestampField TIMESTAMP, decimalField DECIMAL(18,2), dateField DATE, charField CHAR(5), floatField FLOAT) PARTITIONED BY (shortField SHORT) STORED BY 'carbondata' TBLPROPERTIES('PARTITION_TYPE'='LIST', 'LIST_INFO'='10,20,30');

2) Load data:

load data inpath 'hdfs://localhost:54310/Data/partition_table.csv' into table list_partition_table_short options('FILEHEADER'='shortfield,intfield,bigintfield,doublefield,stringfield,timestampfield,decimalfield,datefield,charfield,floatfield');

3) Execute select query:

select * from list_partition_table_short;

Output:

| intField | bigintField | doubleField | stringField | timestampField | decimalField | dateField | charField | floatField | shortField |
| 19 | 109 | 1009.0 | HashPartition | NULL | 19.25 | NULL | W | 109.01 | 10 |
| 11 | 101 | 1001.0 | HashPartition | NULL | 11.25 | NULL | Z | 101.01 | 2 |
| 21 | 111 | 1011.0 | HashPartition | NULL | 21.25 | NULL | Z | 111.01 | 12 |
| 10 | 100 | 1000.0 | ListPartition | NULL | 10.25 | NULL | A | 100.01 | 1 |
| 22 | 112 | 1012.0 | ListPartition | NULL | 22.25 | NULL | F | 112.01 | 13 |
| 23 | 113 | 1013.0 | ListPartition | NULL | 23.25 | NULL | M | 113.01 | 14 |
| 16 | 106 | 1006.0 | ListPartition | NULL | 16.25 | NULL | Y | 106.01 | 7 |
| 12 | 102 | 1002.0 | NoPartition | NULL | 12.25 | NULL | F | 102.01 | 3 |
| 15 | 105 | 1005.0 | NoPartition | NULL | 15.25 | NULL | K | 105.01 | 6 |
| 20 | 110 | 1010.0 | NoPartition | NULL | 20.25 | NULL | K | 110.01 | 11 |
| 18 | 108 | 1008.0 | RangeIntervalPartition | NULL | 18.25 | NULL | A | 108.01 | 9 |
| 14 | 104 | 1004.0 | RangePartition | NULL | 14.25 | NULL | L | 104.01 | 5 |
| 13 | 103 | 1003.0 | RangePartition | NULL | 13.25 | NULL | M | 103.01 | 4 |
| 17 | 107 | 1007.0 | RangePartition | NULL | 17.25 | NULL | T | 107.01 | 8 |

Start the presto server:

bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute queries:

a) show tables;
b) select * from list_partition_table_short;

Output: same as Beeline.

 intfield | bigintfiel
[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/1559 Nice job! Loading performance is improved obviously. ---
[GitHub] carbondata issue #1521: [WIP] [CARBONDATA-1743] fix conurrent pre-agg creati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1521 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1404/ ---
[jira] [Closed] (CARBONDATA-1664) Abnormal behavior of timestamp data type in carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1664.
Resolution: Fixed (not related to carbondata)

Abnormal behavior of timestamp data type in carbondata

Key: CARBONDATA-1664
URL: https://issues.apache.org/jira/browse/CARBONDATA-1664
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Trivial
Attachments: 2000_UniqData.csv

Steps to Reproduce:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Execute queries:

a) select DOB from UNIQDATA where DOB = '1970-01-01 10:00:03.0' or DOB = '1970-01-04 01:00:03.0';

Output:

| DOB |
| 1970-01-01 10:00:03.0 |
| 1970-01-04 01:00:03.0 |

b) select DOB from UNIQDATA where DOB in ('1970-01-01 10:00:03.0','1970-01-04 01:00:03.0');

Output:

| DOB |
(no rows)

c) select DOB from UNIQDATA where DOB in (cast('1970-01-01 10:00:03.0' as timestamp), cast('1970-01-04 01:00:03.0' as timestamp));

Output:

| DOB |
| 1970-01-01 10:00:03.0 |
| 1970-01-04 01:00:03.0 |

Abnormality of the timestamp datatype: select query (a) fetches the records with DOB 1970-01-01 10:00:03.0 and 1970-01-04 01:00:03.0, but query (b), which uses the IN operator with the same string literals, returns no data; the same query displays the result again once the literals are cast to timestamp, as in query (c). There should be strict type checking for timestamp values.
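The behavior in queries (a) through (c) is consistent with the IN predicate comparing values strictly by type, so that string literals never match timestamp values until they are explicitly cast. The sketch below illustrates that reading in plain Python; it is an assumed model of the semantics, not CarbonData or Spark code.

```python
from datetime import datetime

# Toy DOB column with the two timestamps from the report.
rows = [datetime(1970, 1, 1, 10, 0, 3), datetime(1970, 1, 4, 1, 0, 3)]

def select_in(column, literals):
    # Strictly typed IN: a str literal never equals a datetime value,
    # so no implicit string-to-timestamp coercion happens here.
    return [v for v in column if v in literals]

# Query (b): raw string literals -> no matches, mirroring the empty result.
assert select_in(rows, ["1970-01-01 10:00:03.0", "1970-01-04 01:00:03.0"]) == []

# Query (c): literals cast to timestamps first -> both rows match.
casted = [datetime.strptime(s[:19], "%Y-%m-%d %H:%M:%S")
          for s in ("1970-01-01 10:00:03.0", "1970-01-04 01:00:03.0")]
assert select_in(rows, casted) == rows
```

Query (a) matching without a cast suggests that equality predicates coerce the literal while IN predicates do not, which is the inconsistency the report asks to be resolved with strict type checking.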
[jira] [Closed] (CARBONDATA-1661) Incorrect output of select query with timestamp data type on presto CLI
[ https://issues.apache.org/jira/browse/CARBONDATA-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1661.
Resolution: Resolved

Incorrect output of select query with timestamp data type on presto CLI

Key: CARBONDATA-1661
URL: https://issues.apache.org/jira/browse/CARBONDATA-1661
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h 10m
Remaining Estimate: 0h

Steps to Reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Start the presto server:

bin/launcher run

4) Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On presto CLI:

1) Execute select query:

select cust_name from uniqdata where dob = cast('1970-01-11 01:00:03.000' as timestamp);

2) Expected result: it should display the correct output, as on Beeline:

| cust_name |
| CUST_NAME_00010 |

3) Actual result:

 cust_name
-----------
(0 rows)

Query 20171031_084306_00030_k9q68, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
[GitHub] carbondata issue #1523: [CARBONDATA-1756] Improve Boolean data compress rate...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1523 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1403/ ---
[jira] [Closed] (CARBONDATA-1660) Incorrect result displays while executing select query with where clause for decimal data type
[ https://issues.apache.org/jira/browse/CARBONDATA-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1660.
Resolution: Resolved

Incorrect result displays while executing select query with where clause for decimal data type

Key: CARBONDATA-1660
URL: https://issues.apache.org/jira/browse/CARBONDATA-1660
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h 40m
Remaining Estimate: 0h

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Start the presto server:

bin/launcher run

4) Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On presto CLI:

1) Execute select query:

select cust_name from uniqdata where decimal_column1=12345678902.123400;

Expected result: it should display the cust_name, as on Beeline:

| cust_name |
| CUST_NAME_1 |

Actual result: it throws an error while setting the filter expression to the job:

presto:newpresto> select cust_name from uniqdata where decimal_column1=12345678902.123400;
Query 20171031_074909_00013_k9q68 failed: Error while setting filter expression to Job
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1541 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1845/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1402/ ---
[GitHub] carbondata pull request #1559: [CARBONDATA-1805][Dictionary] Optimize prunin...
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/1559

[CARBONDATA-1805][Dictionary] Optimize pruning for dictionary loading

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [X] Any interfaces changed? `NO`
- [X] Any backward compatibility impacted? `NO`
- [X] Document update required? `NO`
- [X] Testing done
  - Whether new unit test cases have been added or why no new tests are required? `NO TESTS ADDED; THE PERFORMANCE ENHANCEMENT DOES NOT AFFECT FUNCTIONALITY`
  - How it is tested? Please attach test report. `TESTED IN CLUSTER WITH REAL DATA`
  - Is it a performance related change? Please attach the performance test report. `PERFORMANCE ENHANCED; DICTIONARY TIME REDUCED FROM 2.9 MIN TO 29 SEC`
  - Any additional information to help reviewers in testing this change. `NO`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NOT RELATED`

COPY FROM JIRA ===

# SCENARIO

Recently I tried the dictionary feature in CarbonData and found that the dictionary-generating phase of data loading is quite slow. My scenario is as below:

+ Input data: a 35.8 GB CSV file with 199 columns and 126 million lines
+ Dictionary columns: 3 columns containing 19213, 4, and 9 distinct values respectively

The whole data load consumes about 2.9 min for dictionary generating and 4.6 min for fact data loading -- about 39% of the time is spent on the dictionary. Having observed the nmon result, I found the CPU usage was quite high during the dictionary-generating phase while the disk and network were quite normal.

# ANALYZE

After going through the dictionary-generating code, I found that CarbonData already prunes non-dictionary columns before generating the dictionary. The problem is that `the pruning comes after data file reading`, which causes some overhead; we can optimize it by `pruning while reading the data file`.

# RESOLVE

Refactor the `loadDataFrame` method in `GlobalDictionaryUtil` to prune the non-dictionary columns while reading the data file. After implementing the above optimization, dictionary generating costs only `29s` -- **`about 6 times better than before`** (2.9 min) -- and fact data loading costs the same as before (4.6 min), so about 10% of the time is spent on the dictionary.

# NOTE

+ Currently only `load data file` benefits from this optimization; `load data frame` does not.
+ Before implementing this solution, I tried another one -- caching the dataframe of the data file -- and the performance was even worse: the dictionary-generating time was 5.6 min.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata opt_dict_load

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1559

commit e8e49ed54085700eadde81842af0b0daecaed12a
Author: xuchuanyin
Date: 2017-11-24T03:27:02Z

optimize pruning for dictionary loading

---
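The read-time pruning idea described in the PR can be sketched compactly: instead of materializing all 199 fields per row and discarding the non-dictionary ones afterwards, keep only the dictionary-column indexes while streaming through the file. This is an illustrative Python sketch of the technique, not the actual `GlobalDictionaryUtil` code; column names and data are made up.

```python
import csv
import io

# Toy CSV standing in for the data file; columns b and d are assumed to be
# the dictionary-encoded ones (indexes 1 and 3).
data = "a,b,c,d\n1,x,10,foo\n2,y,10,bar\n3,x,20,foo\n"
dict_col_indexes = [1, 3]

def distinct_values_pruned(text, indexes):
    # One distinct-value set per dictionary column; every other field is
    # dropped immediately after parsing, so only pruned columns are retained.
    sets = [set() for _ in indexes]
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip header row
    for row in reader:
        for s, i in zip(sets, indexes):
            s.add(row[i])
    return sets

b_vals, d_vals = distinct_values_pruned(data, dict_col_indexes)
assert b_vals == {"x", "y"}
assert d_vals == {"foo", "bar"}
```

With 3 dictionary columns out of 199, pruning at read time avoids carrying the other 196 fields through the dictionary-generation path, which is consistent with the CPU-bound profile the author observed.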
[jira] [Created] (CARBONDATA-1805) Optimize pruning for dictionary loading
xuchuanyin created CARBONDATA-1805:

Summary: Optimize pruning for dictionary loading
Key: CARBONDATA-1805
URL: https://issues.apache.org/jira/browse/CARBONDATA-1805
Project: CarbonData
Issue Type: Improvement
Components: data-load, spark-integration
Reporter: xuchuanyin
Assignee: xuchuanyin
Fix For: 1.3.0

# SCENARIO

Recently I tried the dictionary feature in CarbonData and found that the dictionary-generating phase of data loading is quite slow. My scenario is as below:

+ Input data: a 35.8 GB CSV file with 199 columns and 126 million lines
+ Dictionary columns: 3 columns containing 19213, 4, and 9 distinct values respectively

The whole data load consumes about 2.9 min for dictionary generating and 4.6 min for fact data loading -- about 39% of the time is spent on the dictionary. Having observed the nmon result, I found the CPU usage was quite high during the dictionary-generating phase while the disk and network were quite normal.

# ANALYZE

After going through the dictionary-generating code, I found that CarbonData already prunes non-dictionary columns before generating the dictionary. The problem is that `the pruning comes after data file reading`, which causes some overhead; we can optimize it by `pruning while reading the data file`.

# RESOLVE

Refactor the `loadDataFrame` method in `GlobalDictionaryUtil` to prune the non-dictionary columns while reading the data file. After implementing the above optimization, dictionary generating costs only `29s` -- `about 6 times better than before` (2.9 min) -- and fact data loading costs the same as before (4.6 min), so about 10% of the time is spent on the dictionary.

# NOTE

+ Currently only `load data file` benefits from this optimization; `load data frame` does not.
+ Before implementing this solution, I tried another one -- caching the dataframe of the data file -- and the performance was even worse: the dictionary-generating time was 5.6 min.
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1540 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1401/ ---
[jira] [Created] (CARBONDATA-1804) Make FileOperations Pluggable
Manohar Vanam created CARBONDATA-1804: - Summary: Make FileOperations Pluggable Key: CARBONDATA-1804 URL: https://issues.apache.org/jira/browse/CARBONDATA-1804 Project: CarbonData Issue Type: Improvement Components: core Reporter: Manohar Vanam Assignee: Manohar Vanam 1. Refactor FileFactory based on FileType to support pluggable file handlers, so that custom file handlers can carry their own specific logic. Example: users can provide their own implementations by extending the existing FileTypes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
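One common shape for such a refactor is a registry keyed by FileType, where a handler can be swapped in per type. The sketch below is an assumption about what "pluggable" could look like here, not CarbonData's actual FileFactory API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry-based FileFactory: handlers are registered per
// FileType, and users plug in custom logic by supplying their own handler.
public class PluggableFileFactory {

    // Simplified stand-in for the real set of file operations.
    interface FileHandler {
        String open(String path);
    }

    enum FileType { LOCAL, HDFS, CUSTOM }

    private static final Map<FileType, FileHandler> HANDLERS = new HashMap<>();

    // Registration point for user-provided implementations.
    public static void register(FileType type, FileHandler handler) {
        HANDLERS.put(type, handler);
    }

    // Lookup used by the rest of the code; fails fast on unknown types.
    public static FileHandler get(FileType type) {
        FileHandler h = HANDLERS.get(type);
        if (h == null) {
            throw new IllegalArgumentException("No handler registered for " + type);
        }
        return h;
    }
}
```

The design keeps call sites unchanged: they ask the factory for a handler by FileType, and whether that handler is a built-in or a user extension is decided at registration time.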
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1844/ ---
[GitHub] carbondata pull request #1558: [CARBONDATA-1803] Changing format of Show seg...
GitHub user dhatchayani opened a pull request: https://github.com/apache/carbondata/pull/1558 [CARBONDATA-1803] Changing format of Show segments - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Manual Testing - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/incubator-carbondata show_segments_format Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1558 commit 1ac6c6132eff013416264b25eba8ee9a1d3ae3e1 Author: dhatchayani Date: 2017-11-24T05:56:10Z [CARBONDATA-1803] Changing format of Show segments ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1541 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1400/ ---
[GitHub] carbondata pull request #1557: [CARBONDATA-1796] While submitting new job to...
GitHub user dhatchayani opened a pull request: https://github.com/apache/carbondata/pull/1557 [CARBONDATA-1796] While submitting new job to HadoopRdd, token should be generated for accessing paths Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Manual Testing - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/incubator-carbondata delegation_token1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1557.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1557 commit 634e15b745bce4ec2ee4e9d7b5922b33e420d7eb Author: dhatchayani Date: 2017-11-24T05:41:50Z [CARBONDATA-1796] While submitting new job to HadoopRdd, token should be generated for accessing paths ---
[jira] [Assigned] (CARBONDATA-1803) Changing format of Show segments
[ https://issues.apache.org/jira/browse/CARBONDATA-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-1803: --- Assignee: dhatchayani > Changing format of Show segments > > > Key: CARBONDATA-1803 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1803 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1803) Changing format of Show segments
dhatchayani created CARBONDATA-1803: --- Summary: Changing format of Show segments Key: CARBONDATA-1803 URL: https://issues.apache.org/jira/browse/CARBONDATA-1803 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1428) Incorrect Result displays while alter drop command on partitioned and non-partitioned table
[ https://issues.apache.org/jira/browse/CARBONDATA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1428. - Resolution: Fixed working fine > Incorrect Result displays while alter drop command on partitioned and > non-partitioned table > --- > > Key: CARBONDATA-1428 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1428 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.2.0 > Environment: spark 2.1 >Reporter: Vandana Yadav > Attachments: 2000_UniqData.csv > > > Incorrect Result displays while alter drop command on partitioned and > non-partitioned table > Steps to reproduce: > 1) Create a partitioned table > CREATE TABLE uniqdata_part1 (CUST_NAME String,ACTIVE_EMUI_VERSION string,DOB > Timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) PARTITIONED BY (CUST_ID int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES > ('PARTITION_TYPE'='RANGE','RANGE_INFO'='9090,9500,9800',"TABLE_BLOCKSIZE"= > "256 MB") > 2) Load data into partitioned table > LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into > table uniqdata_part1 OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > 3) Execute drop column query on partitioned table > ALTER TABLE uniqdata_part1 drop columns(BIGINT_COLUMN1) > 4) Result on beeline > Error: java.lang.RuntimeException: Alter table drop column operation failed: > Column bigint_column1 does not exists in the table partition.uniqdata_part1 > (state=,code=0) > 5) Expected Result: > it should drop the column from the partitioned table as it is not the > partitioned column and existing column of the 
table.
> +----------------------+-----------------+----------+
> | col_name             | data_type       | comment  |
> +----------------------+-----------------+----------+
> | CUST_NAME            | string          | NULL     |
> | ACTIVE_EMUI_VERSION  | string          | NULL     |
> | DOB                  | timestamp       | NULL     |
> | DOJ                  | timestamp       | NULL     |
> | BIGINT_COLUMN1       | bigint          | NULL     |
> | BIGINT_COLUMN2       | bigint          | NULL     |
> | DECIMAL_COLUMN1      | decimal(30,10)  | NULL     |
> | DECIMAL_COLUMN2      | decimal(36,10)  | NULL     |
> | Double_COLUMN1       | double          | NULL     |
> | Double_COLUMN2       | double          | NULL     |
> | INTEGER_COLUMN1      | int             | NULL     |
> | CUST_ID              | int             | NULL     |
> +----------------------+-----------------+----------+
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1544 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1843/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1399/ ---
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1544 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1398/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710]Resolved The Bug For Alter Tabel on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1842/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710]Resolved The Bug For Alter Tabel on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1397/ ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1546 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1841/ ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1546 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1396/ ---
[GitHub] carbondata pull request #1547: [CARBONDATA-1792] Add example of data managem...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1547 ---
[jira] [Resolved] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-1792. Resolution: Fixed Fix Version/s: 1.3.0 > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Assignee: Jin Zhou >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen reassigned CARBONDATA-1792: -- Assignee: Jin Zhou > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Assignee: Jin Zhou >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen updated CARBONDATA-1792: --- Priority: Minor (was: Major) > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1538 ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792] Add example of data management for...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1547 LGTM ---
[GitHub] carbondata pull request #1556: [CARBONDATA-1770] Updated documentaion for da...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1556 ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1556 LGTM ---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1496 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1395/ ---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1496 retest this please ---
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1499 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1840/ ---
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1499 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1394/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1104 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1839/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1393/ ---
[jira] [Updated] (CARBONDATA-1802) Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column
[ https://issues.apache.org/jira/browse/CARBONDATA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajeet Rai updated CARBONDATA-1802: -- Description: Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that the below error occurs: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. Expected: Since the user is able to create a table with all numeric columns, the same should be supported by the alter feature. was: Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that the below error occurs: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. > Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no > key column > -- > > Key: CARBONDATA-1802 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1802 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.3.0 > Environment: 3 Node ant cluster >Reporter: Ajeet Rai > Labels: functional > > Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no > key column. > Steps: > 1: create table ttt(c int,d int,e int) stored by 'carbondata'; > 2: Alter table ttt drop columns(c); > 3: observe that the below error occurs: > Error: java.lang.RuntimeException: Alter table drop column operation failed: > Alter drop operation failed. AtLeast one key column should exist after drop. > Expected: Since the user is able to create a table with all numeric columns, the same > should be supported by the alter feature. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1802) Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column
Ajeet Rai created CARBONDATA-1802: - Summary: Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column Key: CARBONDATA-1802 URL: https://issues.apache.org/jira/browse/CARBONDATA-1802 Project: CarbonData Issue Type: Bug Affects Versions: 1.3.0 Environment: 3 Node ant cluster Reporter: Ajeet Rai Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that below error is coming: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1392/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1104 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1838/ ---
[jira] [Closed] (CARBONDATA-1103) Integer datatype as a long datatype in carbondata on cluster
[ https://issues.apache.org/jira/browse/CARBONDATA-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1103. - Resolution: Fixed resolved > Integer datatype as a long datatype in carbondata on cluster > > > Key: CARBONDATA-1103 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1103 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.2.0 > Environment: spark 1.6 >Reporter: Vandana Yadav >Priority: Minor > > Integer datatype as a long datatype in carbondata on cluster > Steps to reproduce Bug: > In CarbonData: > Create Table: > create table myvmall (imei String,uuid String,MAC String,device_color > String,device_shell_color String,device_name String,product_name String,ram > String,rom String,cpu_clock String,series String,check_date String,check_year > int,check_month int ,check_day int,check_hour int,bom String,inside_name > String,packing_date String,packing_year String,packing_month > String,packing_day String,packing_hour String,customer_name > String,deliveryAreaId String,deliveryCountry String,deliveryProvince > String,deliveryCity String,deliveryDistrict String,packing_list_no > String,order_no String,Active_check_time String,Active_check_year > int,Active_check_month int,Active_check_day int,Active_check_hour > int,ActiveAreaId String,ActiveCountry String,ActiveProvince String,Activecity > String,ActiveDistrict String,Active_network String,Active_firmware_version > String,Active_emui_version String,Active_os_version String,Latest_check_time > String,Latest_check_year int,Latest_check_month int,Latest_check_day > int,Latest_check_hour int,Latest_areaId String,Latest_country > String,Latest_province String,Latest_city String,Latest_district > String,Latest_firmware_version String,Latest_emui_version > String,Latest_os_version String,Latest_network String,site String,site_desc > String,product String,product_desc String) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES 
> ('DICTIONARY_INCLUDE'='check_year,check_month,check_day,check_hour,Active_check_year,Active_check_month,Active_check_day,Active_check_hour,Latest_check_year,Latest_check_month,Latest_check_day') > Load Data: > LOAD DATA INPATH > 'HDFS_URL/BabuStore/Data/100_VMALL_1_Day_DATA_2015-09-15.csv' INTO table > myvmall options('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,uuid,MAC,device_color,device_shell_color,device_name,product_name,ram,rom,cpu_clock,series,check_date,check_year,check_month,check_day,check_hour,bom,inside_name,packing_date,packing_year,packing_month,packing_day,packing_hour,customer_name,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,packing_list_no,order_no,Active_check_time,Active_check_year,Active_check_month,Active_check_day,Active_check_hour,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,Active_network,Active_firmware_version,Active_emui_version,Active_os_version,Latest_check_time,Latest_check_year,Latest_check_month,Latest_check_day,Latest_check_hour,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_firmware_version,Latest_emui_version,Latest_os_version,Latest_network,site,site_desc,product,product_desc') > description in carbondata: > +--++--+--+ > | col_name | data_type | comment | > +--++--+--+ > | imei | string | | > | uuid | string | | > | mac | string | | > | device_color | string | | > | device_shell_color | string | | > | device_name | string | | > | product_name | string | | > | ram | string | | > | rom | string | | > | cpu_clock| string | | > | series | string | | > | check_date | string | | > | check_year | int| | > | check_month | int| | > | check_day| int| | > | check_hour | int| | > | bom | string | | > | inside_name | string | | > | packing_date | string | | > | packing_year | string | | > | packing_month| string | | > | packing_day | string | | > | packing_hour | string | | > | customer_name| string
[jira] [Closed] (CARBONDATA-1086) Add documentation for batch sort support for data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1086. - Resolution: Fixed PR is closed > Add documentation for batch sort support for data loading > - > > Key: CARBONDATA-1086 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1086 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > Time Spent: 7.5h > Remaining Estimate: 0h > > Improves Loading Performance > Commands to be added ( JIRA 742,JIRA 1047) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1085) add documentation for size based blocklet for V3 data format
[ https://issues.apache.org/jira/browse/CARBONDATA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1085. - Resolution: Fixed PR is closed > add documentation for size based blocklet for V3 data format > - > > Key: CARBONDATA-1085 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1085 > Project: CarbonData > Issue Type: Improvement >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > > Configurable number of pages to improve IO by specifying the property in > carbon.properties ( JIRA 766) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1084) Add documentation for V3 Data Format
[ https://issues.apache.org/jira/browse/CARBONDATA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1084. - Resolution: Fixed PR is closed > Add documentation for V3 Data Format > > > Key: CARBONDATA-1084 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1084 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Benefits to be added in documentation and add commands to set this format and > specify that this is the dafault format -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-995) Incorrect result displays while using variance aggregate function in presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264256#comment-16264256 ] Vandana Yadav commented on CARBONDATA-995: -- while operating the same query on hive it results differently 1)Create table: hive> CREATE TABLE uniqdata_h (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 2)Load data: hive> load data local inpath '/home/knoldus/Desktop/csv/TestData/Data/uniqdata/2000_UniqData.csv' into table uniqdata_h 3)Execute query: hive> select variance(DECIMAL_COLUMN1) as a from (select DECIMAL_COLUMN1 from UNIQDATA_h order by DECIMAL_COLUMN1) t; Query ID = knoldus_20171123174059_cdc24e03-f8b1-41d5-b496-3fa3acbc4608 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Job running in-process (local Hadoop) 2017-11-23 17:41:00,945 Stage-1 map = 100%, reduce = 100% Ended Job = job_local1774409020_0004 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 3009784 HDFS Write: 752446 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK 333665.7302720188 Time taken: 1.512 seconds, Fetched: 1 row(s) > Incorrect result displays while using variance aggregate function in presto > integration > --- > > Key: CARBONDATA-995 > URL: https://issues.apache.org/jira/browse/CARBONDATA-995 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1 , presto 0.166 >Reporter: Vandana 
Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays while using variance aggregate function in presto > integration > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. In presto > a) Execute the query: > select variance(DECIMAL_COLUMN1) as a from (select DECIMAL_COLUMN1 from > UNIQDATA order by DECIMAL_COLUMN1) t > Actual result : > In CarbonData : > "++--+ > | a | > ++--+ > | 333832.4983039884 | > ++--+ > 1 row selected (0.695 seconds) > " > in presto: > " a > --- > 333832.3010442859 > (1 row) > Query 20170420_082837_00062_hd7jy, FINISHED, 1 node > Splits: 35 total, 35 done (100.00%) > 0:00 [2.01K rows, 1.97KB] [8.09K rows/s, 7.91KB/s]" > Expected result: it should display the same result as showing in CarbonData. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
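One plausible explanation for the small discrepancy above (an assumption on my part -- the thread does not confirm it) is that the engines disagree on which definition `variance` denotes: population variance (divide by n) versus sample variance (divide by n - 1). At n around 2000 the two differ only by the factor n/(n - 1), which would produce exactly this kind of near-miss in the later digits. A minimal sketch of the two formulas:

```java
// Compares the two common definitions of variance that SQL engines map
// the `variance` alias to. var_samp = var_pop * n / (n - 1), so for
// large n the results are close but not identical.
public class VarianceDefinitions {

    // Population variance: mean squared deviation, divide by n.
    static double varPop(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double ss = 0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        return ss / xs.length;
    }

    // Sample (unbiased) variance: divide by n - 1 instead.
    static double varSamp(double[] xs) {
        return varPop(xs) * xs.length / (xs.length - 1);
    }
}
```

If this is the cause, comparing `var_pop` with `var_pop` (or `var_samp` with `var_samp`) explicitly on both engines, rather than the bare `variance` alias, should make the results agree.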
[jira] [Resolved] (CARBONDATA-1793) Insert / update is allowing more than 32000 characters for String column
[ https://issues.apache.org/jira/browse/CARBONDATA-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1793. - Resolution: Fixed Fix Version/s: 1.3.0 > Insert / update is allowing more than 32000 characters for String column > > > Key: CARBONDATA-1793 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1793 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1549: [CARBONDATA-1793] Insert / update is allowing...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1549 ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1549 LGTM ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1549 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1391/ ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1556 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1837/ ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1556 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1390/ ---
[jira] [Closed] (CARBONDATA-983) Incorrect result displays while using not equal to (!=) operator in presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-983. Resolution: Fixed resolved > Incorrect result displays while using not equal to (!=) operator in presto > integration > -- > > Key: CARBONDATA-983 > URL: https://issues.apache.org/jira/browse/CARBONDATA-983 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1, presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays while using not equal to (!=) operator in presto > integration(result set should exclude the provided record but it is present > in our result set) > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. 
In presto > a) Execute the query: > select DECIMAL_COLUMN1 from UNIQDATA where DECIMAL_COLUMN1 !=12345678902.123400 > order by DECIMAL_COLUMN1; > b) Actual Result: > In Carbondata: > +-+--+ > | DECIMAL_COLUMN1 | > +-+--+ > | 12345678901.123400 | > | 12345678901.123400 | > | 12345678903.123400 | > | 12345678904.123400 | > | 12345678905.123400 | > In presto: > DECIMAL_COLUMN1 > > 12345678901.123400 > 12345678901.123400 > 12345678902.123400 > 12345678903.123400 > 12345678904.123400 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1797) Segment_Index compaction should take compaction lock to support concurrent scenarios better
[ https://issues.apache.org/jira/browse/CARBONDATA-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1797. - Resolution: Fixed Fix Version/s: 1.3.0 > Segment_Index compaction should take compaction lock to support concurrent > scenarios better > --- > > Key: CARBONDATA-1797 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1797 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > SEGMENT_INDEX compaction is not taking compaction lock. While concurrent > operation, this may be successful but the output may not be as expected. > Scenario: > Execute MINOR compaction and SEGMENT_INDEX compaction concurrently. > As SEGMENT_INDEX compaction is not taking any lock it will do tasks in > between, finally some segments index files will be merged, probably the newly > created segments may be left out. > Solution: > To take compaction lock -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-979) Incorrect result displays to user in presto integration as compared to CarbonData.
[ https://issues.apache.org/jira/browse/CARBONDATA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-979. Resolution: Fixed Resolved > Incorrect result displays to user in presto integration as compared to > CarbonData. > - > > Key: CARBONDATA-979 > URL: https://issues.apache.org/jira/browse/CARBONDATA-979 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: Spark 2.1,Presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays to user in presto integration as compared to > CarbonData (the CarbonData result set includes null values, but Presto > excludes them). > Steps to reproduce: > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data: > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > > 2. 
In presto > a) Execute the query: > select CUST_NAME from uniqdata where CUST_NAME !='CUST_NAME_01844' order by > CUST_NAME > expected result : it should display all cust_name except "cust_name_01844" > Actual result: > In CarbonData: > "| CUST_NAME_01995 | > | CUST_NAME_01996 | > | CUST_NAME_01997 | > | CUST_NAME_01998 | > | CUST_NAME_01999 | > +--+--+ > 2,012 rows selected (1.777 seconds) > " > In presto: > "CUST_NAME_01997 > CUST_NAME_01998 > CUST_NAME_01999 > (2000 rows) > Query 20170418_105903_00012_disp5, FINISHED, 1 node > Splits: 18 total, 18 done (100.00%) > 3:21 [2.01K rows, 1.97KB] [10 rows/s, 10B/s] > " -- This message was sent by Atlassian JIRA (v6.4.14#64029)
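The row-count difference above comes down to how NULL behaves under `!=`. Under standard SQL three-valued logic, `NULL != 'x'` evaluates to UNKNOWN, and a WHERE clause keeps only TRUE rows, so NULL rows are dropped (the behavior Presto shows); the report expected the NULL-keeping behavior CarbonData showed. A minimal sketch of that logic (not code from either engine; the names are illustrative):

```python
def sql_not_equal(value, literal):
    """SQL `value != literal` with three-valued logic."""
    if value is None:
        return None          # UNKNOWN, neither True nor False
    return value != literal

names = ["CUST_NAME_01843", "CUST_NAME_01844", None, "CUST_NAME_01845"]

# WHERE keeps only rows where the predicate is strictly TRUE.
kept = [n for n in names if sql_not_equal(n, "CUST_NAME_01844") is True]

# The NULL row is dropped along with the matching row, as in the Presto
# output; an engine that keeps NULLs under != (the CarbonData behavior
# quoted above) would return one extra row.
assert kept == ["CUST_NAME_01843", "CUST_NAME_01845"]
```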
[GitHub] carbondata pull request #1553: [CARBONDATA-1797] Segment_Index compaction sh...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1553 ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1549 retest this please ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1389/ ---
[GitHub] carbondata issue #1553: [CARBONDATA-1797] Segment_Index compaction should ta...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1553 LGTM ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1836/ ---
[jira] [Closed] (CARBONDATA-920) errors while executing create table examples from docs
[ https://issues.apache.org/jira/browse/CARBONDATA-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-920. Resolved > errors while executing create table examples from docs > -- > > Key: CARBONDATA-920 > URL: https://issues.apache.org/jira/browse/CARBONDATA-920 > Project: CarbonData > Issue Type: Improvement > Components: docs > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: Vandana Yadav >Priority: Minor > Fix For: 1.2.0 > > Time Spent: 3h > Remaining Estimate: 0h > > The create-table examples in the docs throw an error during > execution (docs/useful-tips-on-carbondata.md) > Steps to reproduce: > 1. run the create-table query from the examples > create table carbondata_table( > Dime_1 String, > HOST String, > MSISDN String, > counter_1 double, > counter_2 double, > BEGIN_TIME bigint, > counter_100 double > )STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', > 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); > output on beeline: > Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: > DICTIONARY_EXCLUDE column: imsi does not exist in table. Please check create > table statement. (state=,code=0) > Expected result: > It should create the table successfully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
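The quoted error is a schema check firing: the docs example lists IMSI in DICTIONARY_EXCLUDE, but the table has no such column. A generic sketch of that kind of validation (my own illustration, not CarbonData's parser; function and variable names are hypothetical):

```python
# Columns of the docs example table, lowercased as the error message suggests.
table_columns = {"dime_1", "host", "msisdn", "counter_1", "counter_2",
                 "begin_time", "counter_100"}

def validate_dictionary_exclude(exclude_csv, columns):
    """Raise if any DICTIONARY_EXCLUDE column is absent from the schema."""
    missing = [c for c in (x.strip().lower() for x in exclude_csv.split(","))
               if c not in columns]
    if missing:
        raise ValueError(
            f"DICTIONARY_EXCLUDE column: {missing[0]} does not exist in "
            "table. Please check create table statement.")

validate_dictionary_exclude("MSISDN,HOST", table_columns)        # passes
try:
    validate_dictionary_exclude("MSISDN,HOST,IMSI", table_columns)
except ValueError as e:
    print(e)   # same shape as the beeline error for the docs example
```

The docs fix was to make the example's property lists reference only columns the example table actually defines.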
[jira] [Closed] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
[ https://issues.apache.org/jira/browse/CARBONDATA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1786. - Resolution: Fixed this bug is resolved with PR 1550 > Getting null pointer exception while loading data into table and while > fetching data getting NULL values > > > Key: CARBONDATA-1786 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1786 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: anubhav tarar >Priority: Blocker > Attachments: 2000_UniqData.csv > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Getting null pointer exception while loading data into table and while > fetching data getting NULL values > Steps to reproduce: > 1)Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > 2)Load Data > LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' > into table uniqdata OPTIONS('DELIMITER'='/' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','TIMESTAMPFORMAT'='-mm-dd > hh:mm:ss'); > 3) Expected result: it should load data into table successfully. 
> 4) Actual Result: it throws an error > Error: java.lang.NullPointerException (state=,code=0) > logs: > java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.delete(AbstractDFSCarbonFile.java:142) > at > org.apache.carbondata.processing.util.DeleteLoadFolders.physicalFactAndMeasureMetadataDeletion(DeleteLoadFolders.java:79) > at > org.apache.carbondata.processing.util.DeleteLoadFolders.deleteLoadFoldersFromFileSystem(DeleteLoadFolders.java:134) > at > org.apache.carbondata.spark.rdd.DataManagementFunc$.deleteLoadsAndUpdateMetadata(DataManagementFunc.scala:188) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:281) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.loadData(LoadTableCommand.scala:347) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:183) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:64) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at 
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:220) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:163) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteS
[jira] [Commented] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
[ https://issues.apache.org/jira/browse/CARBONDATA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264107#comment-16264107 ] Vandana Yadav commented on CARBONDATA-1786: --- this bug is resolved with #PR1550
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1556 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1388/ ---
[jira] [Resolved] (CARBONDATA-1796) While submitting new job to Hadoop, token should be generated for accessing paths
[ https://issues.apache.org/jira/browse/CARBONDATA-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1796. - Resolution: Fixed Fix Version/s: 1.3.0 > While submitting new job to Hadoop, token should be generated for accessing > paths > - > > Key: CARBONDATA-1796 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1796 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In a Hadoop secure-mode cluster, > when submitting a job to HadoopRDD, a token should be generated for the path in the > JobConf; otherwise a Delegation Token exception is thrown during load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1552: [CARBONDATA-1796] While submitting new job to...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1552 ---
[GitHub] carbondata issue #1552: [CARBONDATA-1796] While submitting new job to Hadoop...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1552 LGTM ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1834/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1387/ ---
[jira] [Resolved] (CARBONDATA-1799) CarbonInputMapperTest is failing
[ https://issues.apache.org/jira/browse/CARBONDATA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1799. - Resolution: Fixed Fix Version/s: 1.3.0 > CarbonInputMapperTest is failing > > > Key: CARBONDATA-1799 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1799 > Project: CarbonData > Issue Type: Bug >Reporter: Rahul Kumar >Assignee: Rahul Kumar > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1555: [CARBONDATA-1799] conf added in testcase
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1555 LGTM ---
[jira] [Commented] (CARBONDATA-1650) load data into hive table fail
[ https://issues.apache.org/jira/browse/CARBONDATA-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263977#comment-16263977 ] Vandana Yadav commented on CARBONDATA-1650: --- can you please check your permissions of the carbondata table status file, looking at the logs it seems like there is some permission issue, i am not able to reproduce this bug please share reproducible steps > load data into hive table fail > -- > > Key: CARBONDATA-1650 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1650 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 1.2.0 > Environment: hive.version:1.1.0-cdh5.10.0 > hadoop:version:2.6.0-cdh5.10.0 >Reporter: xujie >Priority: Critical > > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://namenodeb:8020/app/carbondata" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon = > SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", > warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, > storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb) > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val rdd = sc.textFile("/data/home/hadoop/test.txt"); > val schemaString = "id name city" > val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, > StringType, nullable = true)) > val schema = StructType(fields) > val rowRDD = rdd.map(_.split(",")).map(attributes => > Row(attributes(0),attributes(1),attributes(2))) > val peopleDF = spark.createDataFrame(rowRDD, schema) > peopleDF.createOrReplaceTempView("tmp_table") > spark.sql("insert into target_table SELECT * FROM tmp_table") > java.lang.RuntimeException: Failed to add entry in table status for > default.target_table > at 
scala.sys.package$.error(package.scala:27) > at > org.apache.carbondata.spark.util.CommonUtil$.readAndUpdateLoadProgressInTableMeta(CommonUtil.scala:533) > at > org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:928) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637) > at > org.apache.spark.sql.CarbonDatasourceHadoopRelation.insert(CarbonDatasourceHadoopRelation.scala:98) > at > org.apache.spark.sql.execution.datasources.InsertIntoDataSourceCommand.run(InsertIntoDataSourceCommand.scala:43) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) > ... 
52 elided -- This message was sent by Atlassian JIRA (v6.4.14#64029)
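The comment above suggests checking permissions on the table status file. For a quick local check the pattern looks like the sketch below (a hypothetical stand-in path on the local filesystem; for an HDFS path you would inspect `hdfs dfs -ls <path>` instead):

```python
import os
import stat

def describe_permissions(path):
    """Report the permission bits and effective writability of a file."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),      # e.g. '-rw-r--r--'
        "writable_by_me": os.access(path, os.W_OK),
    }

# Demo against a throwaway file standing in for the table status file.
demo = "tablestatus_demo"
open(demo, "w").close()
os.chmod(demo, 0o444)            # read-only, mimicking a bad deployment
info = describe_permissions(demo)
os.remove(demo)
print(info)
```

If the status file is not writable by the user running the load, "Failed to add entry in table status" is the expected symptom, which is why reproducible steps plus the file's actual permissions were requested.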
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1538#discussion_r152741031 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -141,24 +135,25 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu // for last record length = (short) (this.data.length - currentDataOffset); } -DataType dt = vector.getType(); -if ((!(dt instanceof StringType) && length == 0) || ByteUtil.UnsafeComparer.INSTANCE +org.apache.carbondata.core.metadata.datatype.DataType dt = vector.getType(); --- End diff -- why do this change ? remove import, add the full import at here ? ---
[GitHub] carbondata pull request #1556: [CARBONDATA-1770] Updated documentaion for da...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1556#discussion_r152738129 --- Diff: docs/data-management-on-carbondata.md --- @@ -294,11 +294,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` NOTE: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together. - - **DATEFORMAT:** Date format for specified column. + - **DATEFORMAT/TIMESTAMPFORMAT:** Date and Timestamp format for specified column. ``` -OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2') -``` +OPTIONS('dateformat' = '-MM-dd','timestampformat'='/MM/dd HH:mm:ss') --- End diff -- please use uppercase for "dateformat,timestampformat". ---