[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[jira] [Closed] (CARBONDATA-1694) Incorrect exception on presto CLI while executing select query after applying alter drop column query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1694.
Resolution: Resolved

Incorrect exception on presto CLI while executing select query after applying alter drop column query on a table

Key: CARBONDATA-1694
URL: https://issues.apache.org/jira/browse/CARBONDATA-1694
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h
Remaining Estimate: 0h

Steps to Reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Execute queries:

a) alter table uniqdata drop columns (cust_id);
b) select * from uniqdata;

Output (the remaining 11 columns):

| cust_name | active_emui_version | dob | doj | bigint_column1 | bigint_column2 | decimal_column1 | decimal_column2 | double_column1 | double_column2 | integer_column1 |
| CUST_NAME_01987 | ACTIVE_EMUI_VERSION_01987 | 1975-06-11 01:00:03.0 | 1975-06-11 02:00:03.0 | 123372038841 | -223372034867 | 12345680888.123400 | 22345680888.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1988 |
| CUST_NAME_01988 | ACTIVE_EMUI_VERSION_01988 | 1975-06-12 01:00:03.0 | 1975-06-12 02:00:03.0 | 123372038842 | -223372034866 | 12345680889.123400 | 22345680889.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1989 |
| CUST_NAME_01989 | ACTIVE_EMUI_VERSION_01989 | 1975-06-13 01:00:03.0 | 1975-06-13 02:00:03.0 | 123372038843 | -223372034865 | 12345680890.123400 | 22345680890.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1990 |
| CUST_NAME_01990 | ACTIVE_EMUI_VERSION_01990 | 1975-06-14 01:00:03.0 | 1975-06-14 02:00:03.0 | 123372038844 | -223372034864 | 12345680891.123400 | 22345680891.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1991 |
| CUST_NAME_01991 | ACTIVE_EMUI_VERSION_01991 | 1975-06-15 01:00:03.0 | 1975-06-15 02:00:03.0 | 123372038845 | -223372034863 | 12345680892.123400 | 22345680892.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1992 |
| CUST_NAME_01992 | ACTIVE_EMUI_VERSION_01992 | 1975-06-16 01:00:03.0 | 1975-06-16 02:00:03.0 | 123372038846 | -223372034862 | 12345680893.123400 | 22345680893.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1993 |
| CUST_NAME_01993 | ACTIVE_EMUI_VERSION_01993 | 1975-06-17 01:00:03.0 | 1975-06-17 02:00:03.0 | 123372038847 | -223372034861 | 12345680894.123400 | 22345680894.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1994 |
| CUST_NAME_01994 | ACTIVE_EMUI_VERSION_01994 | 1975-06-18 01:00:03.0 | 1975-06-18 02:00:03.0 | 123372038848 | -223372034860 | 12345680895.123400 | 22345680895.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 1995 |
| CUST_NAME_01995 | ACTIVE_EMUI_VERSION_01995 | 1975-06-19 01:00:03.0 | 1975-06-19 02:00:03.0 | 123372038849 | -223372034859 | 12345680896.123400 | 22345680896.123400 | 1.12345674897976E10 | -1.12345674897976E10 | 19
[jira] [Closed] (CARBONDATA-1682) Incorrect output on presto CLI after applying alter query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1682.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect output on presto CLI after applying alter query on a table

Key: CARBONDATA-1682
URL: https://issues.apache.org/jira/browse/CARBONDATA-1682
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Execute queries:

a) desc uniqdata;
b) alter table uniqdata drop columns (cust_id);
c) select cust_id from uniqdata;

Output:

Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' given input columns: [doj, dob, double_column1, double_column2, active_emui_version, bigint_column1, decimal_column1, decimal_column2, cust_name, bigint_column2, integer_column1]; line 1 pos 7;
'Project ['cust_id]
+- SubqueryAlias uniqdata
   +- Relation[cust_name#3097,active_emui_version#3098,dob#3099,doj#3100,bigint_column1#3101L,bigint_column2#3102L,decimal_column1#3103,decimal_column2#3104,double_column1#3105,double_column2#3106,integer_column1#3107] CarbonDatasourceHadoopRelation [ Database name :newpresto, Table name :uniqdata, Schema :Some(StructType(StructField(cust_name,StringType,true), StructField(active_emui_version,StringType,true), StructField(dob,TimestampType,true), StructField(doj,TimestampType,true), StructField(bigint_column1,LongType,true), StructField(bigint_column2,LongType,true), StructField(decimal_column1,DecimalType(30,10),true), StructField(decimal_column2,DecimalType(36,10),true), StructField(double_column1,DoubleType,true), StructField(double_column2,DoubleType,true), StructField(integer_column1,IntegerType,true))) ] (state=,code=0)

On Presto CLI:

1) Execute query:

select cust_id from uniqdata;

2) Expected output: it should throw the same error as on Beeline.

3) Actual output:

 cust_id
---------
(0 rows)

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1677) Incorrect result displays on presto CLI after applying drop table command
[ https://issues.apache.org/jira/browse/CARBONDATA-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1677.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays on presto CLI after applying drop table command

Key: CARBONDATA-1677
URL: https://issues.apache.org/jira/browse/CARBONDATA-1677
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Attachments: 2000_UniqData.csv
Time Spent: 3h 40m
Remaining Estimate: 0h

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Execute query:

select * from uniqdata;

Start the presto server:

bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute queries:

a) show tables;
b) select * from uniqdata;
c) Now, on Beeline, drop the table, then execute the query again:

select * from uniqdata;

Expected result: it should throw an error that the table or view does not exist or was not found.

Actual result:

On Beeline:

Error: org.apache.spark.sql.AnalysisException: Table or view not found: uniqdata; line 1 pos 14 (state=,code=0)

On Presto:

 cust_id | cust_name | active_emui_version | dob | doj | bigint_column1 | bigint_column2 | decimal_column1 | decimal_column2 | double_column1 | double_column2 | integer_column1
---------+-----------+---------------------+-----+-----+----------------+----------------+-----------------+-----------------+----------------+----------------+-----------------
(0 rows)

Query 20171108_115415_2_34smd, FINISHED, 1 node
Splits: 16 total, 16 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
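The three Presto reports closed above share one failure mode: after a metadata change on the Beeline side (drop table, drop column), the Presto side keeps answering from a stale view of the schema instead of erroring out. The sketch below illustrates that pattern with a toy schema cache; all names are illustrative and assumed, not CarbonData or Presto classes.

```python
# Hypothetical illustration of a stale connector-side schema cache: it keeps
# answering for a table the metastore has already dropped until refreshed.
class SchemaCache:
    def __init__(self, store):
        self.store = store           # authoritative metastore contents
        self.cache = dict(store)     # connector-side snapshot

    def columns(self, table):
        # Answers from the snapshot only; never consults the metastore.
        return self.cache.get(table)

    def refresh(self, table):
        # Re-sync one table from the metastore, dropping it if it is gone.
        if table in self.store:
            self.cache[table] = self.store[table]
        else:
            self.cache.pop(table, None)

store = {"uniqdata": ["cust_id", "cust_name"]}
cache = SchemaCache(store)
del store["uniqdata"]                # table dropped via Beeline
assert cache.columns("uniqdata") == ["cust_id", "cust_name"]  # stale answer
cache.refresh("uniqdata")
assert cache.columns("uniqdata") is None  # absence surfaces after refresh
```

Under this reading, the fix in PR 1486 amounts to refreshing (or not caching) schema metadata so dropped objects produce an error rather than an empty result.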
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1405/ ---
[GitHub] carbondata pull request #1560: [CARBONDATA-1804] Support Plug-gable File Ope...
GitHub user ManoharVanam opened a pull request:

https://github.com/apache/carbondata/pull/1560

[CARBONDATA-1804] Support Plug-gable File Operations based on File types

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
  - Whether new unit test cases have been added or why no new tests are required? No; the existing test cases cover this change.
  - How it is tested? Please attach test report. Verified in a cluster.
  - Is it a performance related change? Please attach the performance test report. No.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ManoharVanam/incubator-carbondata FileFactory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1560.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1560

commit 5ea38446f3726aae84a96fa39a203a364e74e5a1
Author: Manohar
Date: 2017-11-24T07:10:36Z

[CARBONDATA-1804] Support Plug-gable File Operations based on File types

---
[jira] [Closed] (CARBONDATA-1675) Incorrect result displays after applying drop column query on a table
[ https://issues.apache.org/jira/browse/CARBONDATA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1675.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays after applying drop column query on a table

Key: CARBONDATA-1675
URL: https://issues.apache.org/jira/browse/CARBONDATA-1675
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor

Steps to reproduce:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Execute query:

desc uniqdata

Output:

| col_name            | data_type      | comment |
| CUST_ID             | int            | NULL    |
| CUST_NAME           | string         | NULL    |
| ACTIVE_EMUI_VERSION | string         | NULL    |
| DOB                 | timestamp      | NULL    |
| DOJ                 | timestamp      | NULL    |
| BIGINT_COLUMN1      | bigint         | NULL    |
| BIGINT_COLUMN2      | bigint         | NULL    |
| DECIMAL_COLUMN1     | decimal(30,10) | NULL    |
| DECIMAL_COLUMN2     | decimal(36,10) | NULL    |
| Double_COLUMN1      | double         | NULL    |
| Double_COLUMN2      | double         | NULL    |
| INTEGER_COLUMN1     | int            | NULL    |

12 rows selected (0.041 seconds)

Start the Presto server:

sudo ./bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute query:

a) desc uniqdata;

Output:

       Column        |      Type      | Extra | Comment
---------------------+----------------+-------+---------
 cust_id             | integer        |       |
 cust_name           | varchar        |       |
 active_emui_version | varchar        |       |
 dob                 | timestamp      |       |
 doj                 | timestamp      |       |
 bigint_column1      | bigint         |       |
 bigint_column2      | bigint         |       |
 decimal_column1     | decimal(30,10) |       |
 decimal_column2     | decimal(36,10) |       |
 double_column1      | double         |       |
 double_column2      | double         |       |
 integer_column1     | integer        |       |
(12 rows)

b) Now, on Beeline, execute the drop column query on the table:

alter table uniqdata drop columns (CUST_ID)

c) desc uniqdata;

Expected output: it should display the updated table description, as on Beeline.

Actual output:

On Beeline:

0: jdbc:hive2://localhost:1> desc uniqdata;

| col_name            | data_type      | comment |
| cust_name           | string         | NULL    |
| active_emui_version | string         | NULL    |
| dob                 | timestamp      | NULL    |
| doj                 | timestamp      | NULL    |
| bigint_column1      | bigint         | NULL    |
| bigint_column2      | bigint         | NULL    |
| decimal_column1     | decimal(30,10) | NULL    |
| decimal_column2     | decimal(36,10) | NULL    |
| double_column1      | double         | NULL    |
| double_column2      | double         | NULL    |
| integer_column1     | int            | NULL    |

11 rows selected (0.039 seconds)

On Presto CLI:

presto:newpresto> desc uniqdata;

       Column        | Ty
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1540 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1846/ ---
[jira] [Closed] (CARBONDATA-1670) Incorrect result displays while select query on presto CLI after recreating a table.
[ https://issues.apache.org/jira/browse/CARBONDATA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1670.
Resolution: Fixed (resolved by https://github.com/apache/carbondata/pull/1486)

Incorrect result displays while select query on presto CLI after recreating a table.

Key: CARBONDATA-1670
URL: https://issues.apache.org/jira/browse/CARBONDATA-1670
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Minor
Attachments: partition_table.csv

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE list_partition_table_short(intField INT, bigintField LONG, doubleField DOUBLE, stringField STRING, timestampField TIMESTAMP, decimalField DECIMAL(18,2), dateField DATE, charField CHAR(5), floatField FLOAT) PARTITIONED BY (shortField SHORT) STORED BY 'carbondata' TBLPROPERTIES('PARTITION_TYPE'='LIST', 'LIST_INFO'='10,20,30');

2) Load data:

load data inpath 'hdfs://localhost:54310/Data/partition_table.csv' into table list_partition_table_short options('FILEHEADER'='shortfield,intfield,bigintfield,doublefield,stringfield,timestampfield,decimalfield,datefield,charfield,floatfield');

3) Execute select query:

select * from list_partition_table_short;

Output:

| intField | bigintField | doubleField | stringField | timestampField | decimalField | dateField | charField | floatField | shortField |
| 19 | 109 | 1009.0 | HashPartition | NULL | 19.25 | NULL | W | 109.01 | 10 |
| 11 | 101 | 1001.0 | HashPartition | NULL | 11.25 | NULL | Z | 101.01 | 2 |
| 21 | 111 | 1011.0 | HashPartition | NULL | 21.25 | NULL | Z | 111.01 | 12 |
| 10 | 100 | 1000.0 | ListPartition | NULL | 10.25 | NULL | A | 100.01 | 1 |
| 22 | 112 | 1012.0 | ListPartition | NULL | 22.25 | NULL | F | 112.01 | 13 |
| 23 | 113 | 1013.0 | ListPartition | NULL | 23.25 | NULL | M | 113.01 | 14 |
| 16 | 106 | 1006.0 | ListPartition | NULL | 16.25 | NULL | Y | 106.01 | 7 |
| 12 | 102 | 1002.0 | NoPartition | NULL | 12.25 | NULL | F | 102.01 | 3 |
| 15 | 105 | 1005.0 | NoPartition | NULL | 15.25 | NULL | K | 105.01 | 6 |
| 20 | 110 | 1010.0 | NoPartition | NULL | 20.25 | NULL | K | 110.01 | 11 |
| 18 | 108 | 1008.0 | RangeIntervalPartition | NULL | 18.25 | NULL | A | 108.01 | 9 |
| 14 | 104 | 1004.0 | RangePartition | NULL | 14.25 | NULL | L | 104.01 | 5 |
| 13 | 103 | 1003.0 | RangePartition | NULL | 13.25 | NULL | M | 103.01 | 4 |
| 17 | 107 | 1007.0 | RangePartition | NULL | 17.25 | NULL | T | 107.01 | 8 |

Start the presto server:

bin/launcher run

Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On Presto CLI:

1) Execute queries:

a) show tables;
b) select * from list_partition_table_short;

Output: same as Beeline.

 intfield | bigintfiel
[GitHub] carbondata issue #1559: [CARBONDATA-1805][Dictionary] Optimize pruning for d...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/1559 Nice job! Loading performance is improved obviously. ---
[GitHub] carbondata issue #1521: [WIP] [CARBONDATA-1743] fix conurrent pre-agg creati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1521 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1404/ ---
[jira] [Closed] (CARBONDATA-1664) Abnormal behavior of timestamp data type in carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1664.
Resolution: Fixed (not related to carbondata)

Abnormal behavior of timestamp data type in carbondata

Key: CARBONDATA-1664
URL: https://issues.apache.org/jira/browse/CARBONDATA-1664
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Trivial
Attachments: 2000_UniqData.csv

Steps to Reproduce:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Execute queries:

a) select DOB from UNIQDATA where DOB = '1970-01-01 10:00:03.0' or DOB = '1970-01-04 01:00:03.0';

Output:

| DOB |
| 1970-01-01 10:00:03.0 |
| 1970-01-04 01:00:03.0 |

b) select DOB from UNIQDATA where DOB in ('1970-01-01 10:00:03.0','1970-01-04 01:00:03.0');

Output:

| DOB |
(no rows)

c) select DOB from UNIQDATA where DOB in (cast('1970-01-01 10:00:03.0' as timestamp), cast('1970-01-04 01:00:03.0' as timestamp));

Output:

| DOB |
| 1970-01-01 10:00:03.0 |
| 1970-01-04 01:00:03.0 |

Abnormality of the timestamp datatype: select query (a) fetches the records with DOB 1970-01-01 10:00:03.0 and 1970-01-04 01:00:03.0, but query (b), which uses the IN operator with the same string literals, returns no data; the same query displays the result again once the literals are cast to timestamp, as in query (c). There should be strict type checking for timestamp values.
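The behavior in queries (a) through (c) is consistent with the IN predicate comparing values strictly by type, so that string literals never match timestamp values until they are explicitly cast. The sketch below illustrates that reading in plain Python; it is an assumed model of the semantics, not CarbonData or Spark code.

```python
from datetime import datetime

# Toy DOB column with the two timestamps from the report.
rows = [datetime(1970, 1, 1, 10, 0, 3), datetime(1970, 1, 4, 1, 0, 3)]

def select_in(column, literals):
    # Strictly typed IN: a str literal never equals a datetime value,
    # so no implicit string-to-timestamp coercion happens here.
    return [v for v in column if v in literals]

# Query (b): raw string literals -> no matches, mirroring the empty result.
assert select_in(rows, ["1970-01-01 10:00:03.0", "1970-01-04 01:00:03.0"]) == []

# Query (c): literals cast to timestamps first -> both rows match.
casted = [datetime.strptime(s[:19], "%Y-%m-%d %H:%M:%S")
          for s in ("1970-01-01 10:00:03.0", "1970-01-04 01:00:03.0")]
assert select_in(rows, casted) == rows
```

Query (a) matching without a cast suggests that equality predicates coerce the literal while IN predicates do not, which is the inconsistency the report asks to be resolved with strict type checking.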
[jira] [Closed] (CARBONDATA-1661) Incorrect output of select query with timestamp data type on presto CLI
[ https://issues.apache.org/jira/browse/CARBONDATA-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1661.
Resolution: Resolved

Incorrect output of select query with timestamp data type on presto CLI

Key: CARBONDATA-1661
URL: https://issues.apache.org/jira/browse/CARBONDATA-1661
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h 10m
Remaining Estimate: 0h

Steps to Reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Start the presto server:

bin/launcher run

4) Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On presto CLI:

1) Execute select query:

select cust_name from uniqdata where dob = cast('1970-01-11 01:00:03.000' as timestamp);

2) Expected result: it should display the correct output, as on Beeline:

| cust_name |
| CUST_NAME_00010 |

3) Actual result:

 cust_name
-----------
(0 rows)

Query 20171031_084306_00030_k9q68, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
[GitHub] carbondata issue #1523: [CARBONDATA-1756] Improve Boolean data compress rate...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1523 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1403/ ---
[jira] [Closed] (CARBONDATA-1660) Incorrect result displays while executing select query with where clause for decimal data type
[ https://issues.apache.org/jira/browse/CARBONDATA-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vandana Yadav closed CARBONDATA-1660.
Resolution: Resolved

Incorrect result displays while executing select query with where clause for decimal data type

Key: CARBONDATA-1660
URL: https://issues.apache.org/jira/browse/CARBONDATA-1660
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Assignee: anubhav tarar
Priority: Minor
Fix For: 1.3.0
Attachments: 2000_UniqData.csv
Time Spent: 1h 40m
Remaining Estimate: 0h

Steps to reproduce:

On Beeline:

1) Create table:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

2) Load data:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

3) Start the presto server:

bin/launcher run

4) Run the presto CLI:

./presto --server localhost:9000 --catalog carbondata --schema newpresto

On presto CLI:

1) Execute select query:

select cust_name from uniqdata where decimal_column1=12345678902.123400;

Expected result: it should display the cust_name, as on Beeline:

| cust_name |
| CUST_NAME_1 |

Actual result: it throws an error while setting the filter expression to the job:

presto:newpresto> select cust_name from uniqdata where decimal_column1=12345678902.123400;
Query 20171031_074909_00013_k9q68 failed: Error while setting filter expression to Job
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1541 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1845/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1402/ ---
[GitHub] carbondata pull request #1559: [CARBONDATA-1805][Dictionary] Optimize prunin...
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/1559

[CARBONDATA-1805][Dictionary] Optimize pruning for dictionary loading

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [X] Any interfaces changed? `NO`
- [X] Any backward compatibility impacted? `NO`
- [X] Document update required? `NO`
- [X] Testing done
  - Whether new unit test cases have been added or why no new tests are required? `NO TESTS ADDED; THE PERFORMANCE ENHANCEMENT DOES NOT AFFECT FUNCTIONALITY`
  - How it is tested? Please attach test report. `TESTED IN CLUSTER WITH REAL DATA`
  - Is it a performance related change? Please attach the performance test report. `PERFORMANCE ENHANCED; DICTIONARY TIME REDUCED FROM 2.9 MIN TO 29 SEC`
  - Any additional information to help reviewers in testing this change. `NO`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NOT RELATED`

COPY FROM JIRA ===

# SCENARIO

Recently I tried the dictionary feature in CarbonData and found that the dictionary-generating phase of data loading is quite slow. My scenario is as below:

+ Input data: a 35.8 GB CSV file with 199 columns and 126 million lines
+ Dictionary columns: 3 columns containing 19213, 4, and 9 distinct values respectively

The whole data load consumes about 2.9 min for dictionary generating and 4.6 min for fact data loading -- about 39% of the time is spent on the dictionary. Having observed the nmon result, I found the CPU usage was quite high during the dictionary-generating phase while the disk and network were quite normal.

# ANALYZE

After going through the dictionary-generating code, I found that CarbonData already prunes non-dictionary columns before generating the dictionary. The problem is that `the pruning comes after data file reading`, which causes some overhead; we can optimize it by `pruning while reading the data file`.

# RESOLVE

Refactor the `loadDataFrame` method in `GlobalDictionaryUtil` to prune the non-dictionary columns while reading the data file. After implementing the above optimization, dictionary generating costs only `29s` -- **`about 6 times better than before`** (2.9 min) -- and fact data loading costs the same as before (4.6 min), so about 10% of the time is spent on the dictionary.

# NOTE

+ Currently only `load data file` benefits from this optimization; `load data frame` does not.
+ Before implementing this solution, I tried another one -- caching the dataframe of the data file -- and the performance was even worse: the dictionary-generating time was 5.6 min.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata opt_dict_load

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1559

commit e8e49ed54085700eadde81842af0b0daecaed12a
Author: xuchuanyin
Date: 2017-11-24T03:27:02Z

optimize pruning for dictionary loading

---
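The read-time pruning idea described in the PR can be sketched compactly: instead of materializing all 199 fields per row and discarding the non-dictionary ones afterwards, keep only the dictionary-column indexes while streaming through the file. This is an illustrative Python sketch of the technique, not the actual `GlobalDictionaryUtil` code; column names and data are made up.

```python
import csv
import io

# Toy CSV standing in for the data file; columns b and d are assumed to be
# the dictionary-encoded ones (indexes 1 and 3).
data = "a,b,c,d\n1,x,10,foo\n2,y,10,bar\n3,x,20,foo\n"
dict_col_indexes = [1, 3]

def distinct_values_pruned(text, indexes):
    # One distinct-value set per dictionary column; every other field is
    # dropped immediately after parsing, so only pruned columns are retained.
    sets = [set() for _ in indexes]
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip header row
    for row in reader:
        for s, i in zip(sets, indexes):
            s.add(row[i])
    return sets

b_vals, d_vals = distinct_values_pruned(data, dict_col_indexes)
assert b_vals == {"x", "y"}
assert d_vals == {"foo", "bar"}
```

With 3 dictionary columns out of 199, pruning at read time avoids carrying the other 196 fields through the dictionary-generation path, which is consistent with the CPU-bound profile the author observed.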
[jira] [Created] (CARBONDATA-1805) Optimize pruning for dictionary loading
xuchuanyin created CARBONDATA-1805:

Summary: Optimize pruning for dictionary loading
Key: CARBONDATA-1805
URL: https://issues.apache.org/jira/browse/CARBONDATA-1805
Project: CarbonData
Issue Type: Improvement
Components: data-load, spark-integration
Reporter: xuchuanyin
Assignee: xuchuanyin
Fix For: 1.3.0

# SCENARIO

Recently I tried the dictionary feature in CarbonData and found that the dictionary-generating phase of data loading is quite slow. My scenario is as below:

+ Input data: a 35.8 GB CSV file with 199 columns and 126 million lines
+ Dictionary columns: 3 columns containing 19213, 4, and 9 distinct values respectively

The whole data load consumes about 2.9 min for dictionary generating and 4.6 min for fact data loading -- about 39% of the time is spent on the dictionary. Having observed the nmon result, I found the CPU usage was quite high during the dictionary-generating phase while the disk and network were quite normal.

# ANALYZE

After going through the dictionary-generating code, I found that CarbonData already prunes non-dictionary columns before generating the dictionary. The problem is that `the pruning comes after data file reading`, which causes some overhead; we can optimize it by `pruning while reading the data file`.

# RESOLVE

Refactor the `loadDataFrame` method in `GlobalDictionaryUtil` to prune the non-dictionary columns while reading the data file. After implementing the above optimization, dictionary generating costs only `29s` -- `about 6 times better than before` (2.9 min) -- and fact data loading costs the same as before (4.6 min), so about 10% of the time is spent on the dictionary.

# NOTE

+ Currently only `load data file` benefits from this optimization; `load data frame` does not.
+ Before implementing this solution, I tried another one -- caching the dataframe of the data file -- and the performance was even worse: the dictionary-generating time was 5.6 min.
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1540 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1401/ ---
[jira] [Created] (CARBONDATA-1804) Make FileOperations Pluggable
Manohar Vanam created CARBONDATA-1804: - Summary: Make FileOperations Pluggable Key: CARBONDATA-1804 URL: https://issues.apache.org/jira/browse/CARBONDATA-1804 Project: CarbonData Issue Type: Improvement Components: core Reporter: Manohar Vanam Assignee: Manohar Vanam 1. Refactor FileFactory based on FileType to support pluggable file handlers, so that custom file handlers can carry their own specific logic. Example: users can provide their own implementations by extending the existing FileTypes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
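One common shape for such a refactor is a registry keyed by FileType, where a handler can be swapped in per type. The sketch below is an assumption about what "pluggable" could look like here, not CarbonData's actual FileFactory API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry-based FileFactory: handlers are registered per
// FileType, and users plug in custom logic by supplying their own handler.
public class PluggableFileFactory {

    // Simplified stand-in for the real set of file operations.
    interface FileHandler {
        String open(String path);
    }

    enum FileType { LOCAL, HDFS, CUSTOM }

    private static final Map<FileType, FileHandler> HANDLERS = new HashMap<>();

    // Registration point for user-provided implementations.
    public static void register(FileType type, FileHandler handler) {
        HANDLERS.put(type, handler);
    }

    // Lookup used by the rest of the code; fails fast on unknown types.
    public static FileHandler get(FileType type) {
        FileHandler h = HANDLERS.get(type);
        if (h == null) {
            throw new IllegalArgumentException("No handler registered for " + type);
        }
        return h;
    }
}
```

The design keeps call sites unchanged: they ask the factory for a handler by FileType, and whether that handler is a built-in or a user extension is decided at registration time.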
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1844/ ---
[GitHub] carbondata pull request #1558: [CARBONDATA-1803] Changing format of Show seg...
GitHub user dhatchayani opened a pull request: https://github.com/apache/carbondata/pull/1558 [CARBONDATA-1803] Changing format of Show segments - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Manual Testing - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/incubator-carbondata show_segments_format Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1558 commit 1ac6c6132eff013416264b25eba8ee9a1d3ae3e1 Author: dhatchayani Date: 2017-11-24T05:56:10Z [CARBONDATA-1803] Changing format of Show segments ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1541 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1400/ ---
[GitHub] carbondata pull request #1557: [CARBONDATA-1796] While submitting new job to...
GitHub user dhatchayani opened a pull request: https://github.com/apache/carbondata/pull/1557 [CARBONDATA-1796] While submitting new job to HadoopRdd, token should be generated for accessing paths Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done Manual Testing - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dhatchayani/incubator-carbondata delegation_token1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1557.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1557 commit 634e15b745bce4ec2ee4e9d7b5922b33e420d7eb Author: dhatchayani Date: 2017-11-24T05:41:50Z [CARBONDATA-1796] While submitting new job to HadoopRdd, token should be generated for accessing paths ---
[jira] [Assigned] (CARBONDATA-1803) Changing format of Show segments
[ https://issues.apache.org/jira/browse/CARBONDATA-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-1803: --- Assignee: dhatchayani > Changing format of Show segments > > > Key: CARBONDATA-1803 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1803 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1803) Changing format of Show segments
dhatchayani created CARBONDATA-1803: --- Summary: Changing format of Show segments Key: CARBONDATA-1803 URL: https://issues.apache.org/jira/browse/CARBONDATA-1803 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1428) Incorrect Result displays while alter drop command on partitioned and non-partitioned table
[ https://issues.apache.org/jira/browse/CARBONDATA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1428. - Resolution: Fixed working fine > Incorrect Result displays while alter drop command on partitioned and > non-partitioned table > --- > > Key: CARBONDATA-1428 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1428 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.2.0 > Environment: spark 2.1 >Reporter: Vandana Yadav > Attachments: 2000_UniqData.csv > > > Incorrect Result displays while alter drop command on partitioned and > non-partitioned table > Steps to reproduce: > 1) Create a partitioned table > CREATE TABLE uniqdata_part1 (CUST_NAME String,ACTIVE_EMUI_VERSION string,DOB > Timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) PARTITIONED BY (CUST_ID int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES > ('PARTITION_TYPE'='RANGE','RANGE_INFO'='9090,9500,9800',"TABLE_BLOCKSIZE"= > "256 MB") > 2) Load data into partitioned table > LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into > table uniqdata_part1 OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > 3) Execute drop column query on partitioned table > ALTER TABLE uniqdata_part1 drop columns(BIGINT_COLUMN1) > 4) Result on beeline > Error: java.lang.RuntimeException: Alter table drop column operation failed: > Column bigint_column1 does not exists in the table partition.uniqdata_part1 > (state=,code=0) > 5) Expected Result: > it should drop the column from the partitioned table as it is not the > partitioned column and existing column of the 
table.
> +----------------------+-----------------+----------+
> | col_name             | data_type       | comment  |
> +----------------------+-----------------+----------+
> | CUST_NAME            | string          | NULL     |
> | ACTIVE_EMUI_VERSION  | string          | NULL     |
> | DOB                  | timestamp       | NULL     |
> | DOJ                  | timestamp       | NULL     |
> | BIGINT_COLUMN1       | bigint          | NULL     |
> | BIGINT_COLUMN2       | bigint          | NULL     |
> | DECIMAL_COLUMN1      | decimal(30,10)  | NULL     |
> | DECIMAL_COLUMN2      | decimal(36,10)  | NULL     |
> | Double_COLUMN1       | double          | NULL     |
> | Double_COLUMN2       | double          | NULL     |
> | INTEGER_COLUMN1      | int             | NULL     |
> | CUST_ID              | int             | NULL     |
> +----------------------+-----------------+----------+
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1544 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1843/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1399/ ---
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1544 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1398/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710]Resolved The Bug For Alter Tabel on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1842/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710]Resolved The Bug For Alter Tabel on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1397/ ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1546 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1841/ ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1546 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1396/ ---
[GitHub] carbondata pull request #1547: [CARBONDATA-1792] Add example of data managem...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1547 ---
[jira] [Resolved] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-1792. Resolution: Fixed Fix Version/s: 1.3.0 > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Assignee: Jin Zhou >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen reassigned CARBONDATA-1792: -- Assignee: Jin Zhou > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Assignee: Jin Zhou >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1792) Adding example of data management for Spark2.X
[ https://issues.apache.org/jira/browse/CARBONDATA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen updated CARBONDATA-1792: --- Priority: Minor (was: Major) > Adding example of data management for Spark2.X > -- > > Key: CARBONDATA-1792 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1792 > Project: CarbonData > Issue Type: Task > Components: examples >Affects Versions: 1.3.0 >Reporter: Zhoujin >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Adding example of data management for Spark2.X -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1538 ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792] Add example of data management for...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1547 LGTM ---
[GitHub] carbondata pull request #1556: [CARBONDATA-1770] Updated documentaion for da...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1556 ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1556 LGTM ---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1496 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1395/ ---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1496 retest this please ---
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1499 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1840/ ---
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1499 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1394/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1104 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1839/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1393/ ---
[jira] [Updated] (CARBONDATA-1802) Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column
[ https://issues.apache.org/jira/browse/CARBONDATA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajeet Rai updated CARBONDATA-1802: -- Description: Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that the below error occurs: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. Expected: Since the user is able to create a table with all numeric columns, the same should be supported by the alter feature. was: Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that the below error occurs: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. > Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no > key column > -- > > Key: CARBONDATA-1802 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1802 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.3.0 > Environment: 3 Node ant cluster >Reporter: Ajeet Rai > Labels: functional > > Carbon1.3.0 Alter: Alter query fails if a column is dropped and there is no > key column. > Steps: > 1: create table ttt(c int,d int,e int) stored by 'carbondata'; > 2: Alter table ttt drop columns(c); > 3: observe that the below error occurs: > Error: java.lang.RuntimeException: Alter table drop column operation failed: > Alter drop operation failed. AtLeast one key column should exist after drop. > Expected: Since the user is able to create a table with all numeric columns, the same > should be supported by the alter feature. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1802) Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column
Ajeet Rai created CARBONDATA-1802: - Summary: Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column Key: CARBONDATA-1802 URL: https://issues.apache.org/jira/browse/CARBONDATA-1802 Project: CarbonData Issue Type: Bug Affects Versions: 1.3.0 Environment: 3 Node ant cluster Reporter: Ajeet Rai Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column. Steps: 1: create table ttt(c int,d int,e int) stored by 'carbondata'; 2: Alter table ttt drop columns(c); 3: observe that below error is coming: Error: java.lang.RuntimeException: Alter table drop column operation failed: Alter drop operation failed. AtLeast one key column should exist after drop. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1392/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1104 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1838/ ---
[jira] [Closed] (CARBONDATA-1103) Integer datatype as a long datatype in carbondata on cluster
[ https://issues.apache.org/jira/browse/CARBONDATA-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1103. - Resolution: Fixed resolved > Integer datatype as a long datatype in carbondata on cluster > > > Key: CARBONDATA-1103 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1103 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.2.0 > Environment: spark 1.6 >Reporter: Vandana Yadav >Priority: Minor > > Integer datatype as a long datatype in carbondata on cluster > Steps to reproduce Bug: > In CarbonData: > Create Table: > create table myvmall (imei String,uuid String,MAC String,device_color > String,device_shell_color String,device_name String,product_name String,ram > String,rom String,cpu_clock String,series String,check_date String,check_year > int,check_month int ,check_day int,check_hour int,bom String,inside_name > String,packing_date String,packing_year String,packing_month > String,packing_day String,packing_hour String,customer_name > String,deliveryAreaId String,deliveryCountry String,deliveryProvince > String,deliveryCity String,deliveryDistrict String,packing_list_no > String,order_no String,Active_check_time String,Active_check_year > int,Active_check_month int,Active_check_day int,Active_check_hour > int,ActiveAreaId String,ActiveCountry String,ActiveProvince String,Activecity > String,ActiveDistrict String,Active_network String,Active_firmware_version > String,Active_emui_version String,Active_os_version String,Latest_check_time > String,Latest_check_year int,Latest_check_month int,Latest_check_day > int,Latest_check_hour int,Latest_areaId String,Latest_country > String,Latest_province String,Latest_city String,Latest_district > String,Latest_firmware_version String,Latest_emui_version > String,Latest_os_version String,Latest_network String,site String,site_desc > String,product String,product_desc String) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES 
> ('DICTIONARY_INCLUDE'='check_year,check_month,check_day,check_hour,Active_check_year,Active_check_month,Active_check_day,Active_check_hour,Latest_check_year,Latest_check_month,Latest_check_day') > Load Data: > LOAD DATA INPATH > 'HDFS_URL/BabuStore/Data/100_VMALL_1_Day_DATA_2015-09-15.csv' INTO table > myvmall options('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='imei,uuid,MAC,device_color,device_shell_color,device_name,product_name,ram,rom,cpu_clock,series,check_date,check_year,check_month,check_day,check_hour,bom,inside_name,packing_date,packing_year,packing_month,packing_day,packing_hour,customer_name,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,packing_list_no,order_no,Active_check_time,Active_check_year,Active_check_month,Active_check_day,Active_check_hour,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,Active_network,Active_firmware_version,Active_emui_version,Active_os_version,Latest_check_time,Latest_check_year,Latest_check_month,Latest_check_day,Latest_check_hour,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_firmware_version,Latest_emui_version,Latest_os_version,Latest_network,site,site_desc,product,product_desc') > description in carbondata: > +--++--+--+ > | col_name | data_type | comment | > +--++--+--+ > | imei | string | | > | uuid | string | | > | mac | string | | > | device_color | string | | > | device_shell_color | string | | > | device_name | string | | > | product_name | string | | > | ram | string | | > | rom | string | | > | cpu_clock| string | | > | series | string | | > | check_date | string | | > | check_year | int| | > | check_month | int| | > | check_day| int| | > | check_hour | int| | > | bom | string | | > | inside_name | string | | > | packing_date | string | | > | packing_year | string | | > | packing_month| string | | > | packing_day | string | | > | packing_hour | string | | > | customer_name| string
[jira] [Closed] (CARBONDATA-1086) Add documentation for batch sort support for data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1086. - Resolution: Fixed PR is closed > Add documentation for batch sort support for data loading > - > > Key: CARBONDATA-1086 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1086 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > Time Spent: 7.5h > Remaining Estimate: 0h > > Improves Loading Performance > Commands to be added ( JIRA 742,JIRA 1047) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1085) add documentation for size based blocklet for V3 data format
[ https://issues.apache.org/jira/browse/CARBONDATA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1085. - Resolution: Fixed PR is closed > add documentation for size based blocklet for V3 data format > - > > Key: CARBONDATA-1085 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1085 > Project: CarbonData > Issue Type: Improvement >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > > Configurable number of pages to improve IO by specifying the property in > carbon.properties ( JIRA 766) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1084) Add documentation for V3 Data Format
[ https://issues.apache.org/jira/browse/CARBONDATA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1084. - Resolution: Fixed PR is closed > Add documentation for V3 Data Format > > > Key: CARBONDATA-1084 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1084 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vandana Yadav >Assignee: Pallavi Singh >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Benefits to be added in documentation and add commands to set this format and > specify that this is the dafault format -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-995) Incorrect result displays while using variance aggregate function in presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264256#comment-16264256 ] Vandana Yadav commented on CARBONDATA-995: -- while operating the same query on hive it results differently 1)Create table: hive> CREATE TABLE uniqdata_h (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 2)Load data: hive> load data local inpath '/home/knoldus/Desktop/csv/TestData/Data/uniqdata/2000_UniqData.csv' into table uniqdata_h 3)Execute query: hive> select variance(DECIMAL_COLUMN1) as a from (select DECIMAL_COLUMN1 from UNIQDATA_h order by DECIMAL_COLUMN1) t; Query ID = knoldus_20171123174059_cdc24e03-f8b1-41d5-b496-3fa3acbc4608 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Job running in-process (local Hadoop) 2017-11-23 17:41:00,945 Stage-1 map = 100%, reduce = 100% Ended Job = job_local1774409020_0004 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 3009784 HDFS Write: 752446 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK 333665.7302720188 Time taken: 1.512 seconds, Fetched: 1 row(s) > Incorrect result displays while using variance aggregate function in presto > integration > --- > > Key: CARBONDATA-995 > URL: https://issues.apache.org/jira/browse/CARBONDATA-995 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1 , presto 0.166 >Reporter: Vandana 
Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays while using variance aggregate function in presto > integration > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. In presto > a) Execute the query: > select variance(DECIMAL_COLUMN1) as a from (select DECIMAL_COLUMN1 from > UNIQDATA order by DECIMAL_COLUMN1) t > Actual result : > In CarbonData : > "++--+ > | a | > ++--+ > | 333832.4983039884 | > ++--+ > 1 row selected (0.695 seconds) > " > in presto: > " a > --- > 333832.3010442859 > (1 row) > Query 20170420_082837_00062_hd7jy, FINISHED, 1 node > Splits: 35 total, 35 done (100.00%) > 0:00 [2.01K rows, 1.97KB] [8.09K rows/s, 7.91KB/s]" > Expected result: it should display the same result as showing in CarbonData. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
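One plausible explanation for the small discrepancy above (an assumption on my part -- the thread does not confirm it) is that the engines disagree on which definition `variance` denotes: population variance (divide by n) versus sample variance (divide by n - 1). At n around 2000 the two differ only by the factor n/(n - 1), which would produce exactly this kind of near-miss in the later digits. A minimal sketch of the two formulas:

```java
// Compares the two common definitions of variance that SQL engines map
// the `variance` alias to. var_samp = var_pop * n / (n - 1), so for
// large n the results are close but not identical.
public class VarianceDefinitions {

    // Population variance: mean squared deviation, divide by n.
    static double varPop(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double ss = 0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        return ss / xs.length;
    }

    // Sample (unbiased) variance: divide by n - 1 instead.
    static double varSamp(double[] xs) {
        return varPop(xs) * xs.length / (xs.length - 1);
    }
}
```

If this is the cause, comparing `var_pop` with `var_pop` (or `var_samp` with `var_samp`) explicitly on both engines, rather than the bare `variance` alias, should make the results agree.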
[jira] [Resolved] (CARBONDATA-1793) Insert / update is allowing more than 32000 characters for String column
[ https://issues.apache.org/jira/browse/CARBONDATA-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1793. - Resolution: Fixed Fix Version/s: 1.3.0 > Insert / update is allowing more than 32000 characters for String column > > > Key: CARBONDATA-1793 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1793 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1549: [CARBONDATA-1793] Insert / update is allowing...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1549 ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1549 LGTM ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1549 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1391/ ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1556 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1837/ ---
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1556 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1390/ ---
[jira] [Closed] (CARBONDATA-983) Incorrect result displays while using not equal to (!=) operator in presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-983. Resolution: Fixed resolved > Incorrect result displays while using not equal to (!=) operator in presto > integration > -- > > Key: CARBONDATA-983 > URL: https://issues.apache.org/jira/browse/CARBONDATA-983 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1, presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays while using not equal to (!=) operator in presto > integration(result set should exclude the provided record but it is present > in our result set) > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. 
In presto > a) Execute the query: > select DECIMAL_COLUMN1 from UNIQDATA where DECIMAL_COLUMN1 !=12345678902.123400 > order by DECIMAL_COLUMN1; > b) Actual Result: > In Carbondata: > +-+--+ > | DECIMAL_COLUMN1 | > +-+--+ > | 12345678901.123400 | > | 12345678901.123400 | > | 12345678903.123400 | > | 12345678904.123400 | > | 12345678905.123400 | > In presto: > DECIMAL_COLUMN1 > > 12345678901.123400 > 12345678901.123400 > 12345678902.123400 > 12345678903.123400 > 12345678904.123400 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1797) Segment_Index compaction should take compaction lock to support concurrent scenarios better
[ https://issues.apache.org/jira/browse/CARBONDATA-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1797. - Resolution: Fixed Fix Version/s: 1.3.0 > Segment_Index compaction should take compaction lock to support concurrent > scenarios better > --- > > Key: CARBONDATA-1797 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1797 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > SEGMENT_INDEX compaction is not taking compaction lock. While concurrent > operation, this may be successful but the output may not be as expected. > Scenario: > Execute MINOR compaction and SEGMENT_INDEX compaction concurrently. > As SEGMENT_INDEX compaction is not taking any lock it will do tasks in > between, finally some segments index files will be merged, probably the newly > created segments may be left out. > Solution: > To take compaction lock -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-979) Incorrect result displays to user in presto integration as compared to CarbonData.
[ https://issues.apache.org/jira/browse/CARBONDATA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-979. Resolution: Fixed Resolved > Incorrect result displays to user in presto integration as compared to > CarbonData. > - > > Key: CARBONDATA-979 > URL: https://issues.apache.org/jira/browse/CARBONDATA-979 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: Spark 2.1,Presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays to user in presto integration as compared to > CarbonData (the CarbonData result set includes null values, but Presto > excludes them). > Steps to reproduce: > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data: > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > > 2. 
In presto > a) Execute the query: > select CUST_NAME from uniqdata where CUST_NAME !='CUST_NAME_01844' order by > CUST_NAME > expected result : it should display all cust_name except "cust_name_01844" > Actual result: > In CarbonData: > "| CUST_NAME_01995 | > | CUST_NAME_01996 | > | CUST_NAME_01997 | > | CUST_NAME_01998 | > | CUST_NAME_01999 | > +--+--+ > 2,012 rows selected (1.777 seconds) > " > In presto: > "CUST_NAME_01997 > CUST_NAME_01998 > CUST_NAME_01999 > (2000 rows) > Query 20170418_105903_00012_disp5, FINISHED, 1 node > Splits: 18 total, 18 done (100.00%) > 3:21 [2.01K rows, 1.97KB] [10 rows/s, 10B/s] > " -- This message was sent by Atlassian JIRA (v6.4.14#64029)
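The row-count difference above comes down to how NULL behaves under `!=`. Under standard SQL three-valued logic, `NULL != 'x'` evaluates to UNKNOWN, and a WHERE clause keeps only TRUE rows, so NULL rows are dropped (the behavior Presto shows); the report expected the NULL-keeping behavior CarbonData showed. A minimal sketch of that logic (not code from either engine; the names are illustrative):

```python
def sql_not_equal(value, literal):
    """SQL `value != literal` with three-valued logic."""
    if value is None:
        return None          # UNKNOWN, neither True nor False
    return value != literal

names = ["CUST_NAME_01843", "CUST_NAME_01844", None, "CUST_NAME_01845"]

# WHERE keeps only rows where the predicate is strictly TRUE.
kept = [n for n in names if sql_not_equal(n, "CUST_NAME_01844") is True]

# The NULL row is dropped along with the matching row, as in the Presto
# output; an engine that keeps NULLs under != (the CarbonData behavior
# quoted above) would return one extra row.
assert kept == ["CUST_NAME_01843", "CUST_NAME_01845"]
```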
[GitHub] carbondata pull request #1553: [CARBONDATA-1797] Segment_Index compaction sh...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1553 ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1549 retest this please ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1389/ ---
[GitHub] carbondata issue #1553: [CARBONDATA-1797] Segment_Index compaction should ta...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1553 LGTM ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1836/ ---
[jira] [Closed] (CARBONDATA-920) errors while executing create table examples from docs
[ https://issues.apache.org/jira/browse/CARBONDATA-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-920. Resolved > errors while executing create table examples from docs > -- > > Key: CARBONDATA-920 > URL: https://issues.apache.org/jira/browse/CARBONDATA-920 > Project: CarbonData > Issue Type: Improvement > Components: docs > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: Vandana Yadav >Priority: Minor > Fix For: 1.2.0 > > Time Spent: 3h > Remaining Estimate: 0h > > The create-table examples in the docs throw an error during > execution (docs/useful-tips-on-carbondata.md) > Steps to reproduce: > 1. run the create-table query from the examples > create table carbondata_table( > Dime_1 String, > HOST String, > MSISDN String, > counter_1 double, > counter_2 double, > BEGIN_TIME bigint, > counter_100 double > )STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', > 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); > output on beeline: > Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: > DICTIONARY_EXCLUDE column: imsi does not exist in table. Please check create > table statement. (state=,code=0) > Expected result: > It should create the table successfully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
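The quoted error is a schema check firing: the docs example lists IMSI in DICTIONARY_EXCLUDE, but the table has no such column. A generic sketch of that kind of validation (my own illustration, not CarbonData's parser; function and variable names are hypothetical):

```python
# Columns of the docs example table, lowercased as the error message suggests.
table_columns = {"dime_1", "host", "msisdn", "counter_1", "counter_2",
                 "begin_time", "counter_100"}

def validate_dictionary_exclude(exclude_csv, columns):
    """Raise if any DICTIONARY_EXCLUDE column is absent from the schema."""
    missing = [c for c in (x.strip().lower() for x in exclude_csv.split(","))
               if c not in columns]
    if missing:
        raise ValueError(
            f"DICTIONARY_EXCLUDE column: {missing[0]} does not exist in "
            "table. Please check create table statement.")

validate_dictionary_exclude("MSISDN,HOST", table_columns)        # passes
try:
    validate_dictionary_exclude("MSISDN,HOST,IMSI", table_columns)
except ValueError as e:
    print(e)   # same shape as the beeline error for the docs example
```

The docs fix was to make the example's property lists reference only columns the example table actually defines.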
[jira] [Closed] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
[ https://issues.apache.org/jira/browse/CARBONDATA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1786. - Resolution: Fixed this bug is resolved with PR 1550 > Getting null pointer exception while loading data into table and while > fetching data getting NULL values > > > Key: CARBONDATA-1786 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1786 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: spark 2.1 >Reporter: Vandana Yadav >Assignee: anubhav tarar >Priority: Blocker > Attachments: 2000_UniqData.csv > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Getting null pointer exception while loading data into table and while > fetching data getting NULL values > Steps to reproduce: > 1)Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > 2)Load Data > LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' > into table uniqdata OPTIONS('DELIMITER'='/' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','TIMESTAMPFORMAT'='-mm-dd > hh:mm:ss'); > 3) Expected result: it should load data into table successfully. 
> 4) Actual Result: it throws an error > Error: java.lang.NullPointerException (state=,code=0) > logs: > java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.delete(AbstractDFSCarbonFile.java:142) > at > org.apache.carbondata.processing.util.DeleteLoadFolders.physicalFactAndMeasureMetadataDeletion(DeleteLoadFolders.java:79) > at > org.apache.carbondata.processing.util.DeleteLoadFolders.deleteLoadFoldersFromFileSystem(DeleteLoadFolders.java:134) > at > org.apache.carbondata.spark.rdd.DataManagementFunc$.deleteLoadsAndUpdateMetadata(DataManagementFunc.scala:188) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:281) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.loadData(LoadTableCommand.scala:347) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:183) > at > org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:64) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at 
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:220) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:163) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteS
[jira] [Commented] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
[ https://issues.apache.org/jira/browse/CARBONDATA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264107#comment-16264107 ] Vandana Yadav commented on CARBONDATA-1786: --- this bug is resolved with #PR1550
[GitHub] carbondata issue #1556: [CARBONDATA-1770] Updated documentaion for data-mana...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1556 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1388/ ---
[jira] [Resolved] (CARBONDATA-1796) While submitting new job to Hadoop, token should be generated for accessing paths
[ https://issues.apache.org/jira/browse/CARBONDATA-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1796. - Resolution: Fixed Fix Version/s: 1.3.0 > While submitting new job to Hadoop, token should be generated for accessing > paths > - > > Key: CARBONDATA-1796 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1796 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In a Hadoop secure-mode cluster, > when submitting a job to HadoopRDD, a token should be generated for the path in the > JobConf; otherwise a Delegation Token exception is thrown during load. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1552: [CARBONDATA-1796] While submitting new job to...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1552 ---
[GitHub] carbondata issue #1552: [CARBONDATA-1796] While submitting new job to Hadoop...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1552 LGTM ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1834/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1387/ ---
[jira] [Resolved] (CARBONDATA-1799) CarbonInputMapperTest is failing
[ https://issues.apache.org/jira/browse/CARBONDATA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1799. - Resolution: Fixed Fix Version/s: 1.3.0 > CarbonInputMapperTest is failing > > > Key: CARBONDATA-1799 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1799 > Project: CarbonData > Issue Type: Bug >Reporter: Rahul Kumar >Assignee: Rahul Kumar > Fix For: 1.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1555: [CARBONDATA-1799] conf added in testcase
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1555 LGTM ---
[jira] [Commented] (CARBONDATA-1650) load data into hive table fail
[ https://issues.apache.org/jira/browse/CARBONDATA-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263977#comment-16263977 ] Vandana Yadav commented on CARBONDATA-1650: --- can you please check your permissions of the carbondata table status file, looking at the logs it seems like there is some permission issue, i am not able to reproduce this bug please share reproducible steps > load data into hive table fail > -- > > Key: CARBONDATA-1650 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1650 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 1.2.0 > Environment: hive.version:1.1.0-cdh5.10.0 > hadoop:version:2.6.0-cdh5.10.0 >Reporter: xujie >Priority: Critical > > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://namenodeb:8020/app/carbondata" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon = > SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", > warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, > storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb) > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val rdd = sc.textFile("/data/home/hadoop/test.txt"); > val schemaString = "id name city" > val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, > StringType, nullable = true)) > val schema = StructType(fields) > val rowRDD = rdd.map(_.split(",")).map(attributes => > Row(attributes(0),attributes(1),attributes(2))) > val peopleDF = spark.createDataFrame(rowRDD, schema) > peopleDF.createOrReplaceTempView("tmp_table") > spark.sql("insert into target_table SELECT * FROM tmp_table") > java.lang.RuntimeException: Failed to add entry in table status for > default.target_table > at 
scala.sys.package$.error(package.scala:27) > at > org.apache.carbondata.spark.util.CommonUtil$.readAndUpdateLoadProgressInTableMeta(CommonUtil.scala:533) > at > org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:928) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637) > at > org.apache.spark.sql.CarbonDatasourceHadoopRelation.insert(CarbonDatasourceHadoopRelation.scala:98) > at > org.apache.spark.sql.execution.datasources.InsertIntoDataSourceCommand.run(InsertIntoDataSourceCommand.scala:43) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) > ... 
52 elided -- This message was sent by Atlassian JIRA (v6.4.14#64029)
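The comment above suggests checking permissions on the table status file. For a quick local check the pattern looks like the sketch below (a hypothetical stand-in path on the local filesystem; for an HDFS path you would inspect `hdfs dfs -ls <path>` instead):

```python
import os
import stat

def describe_permissions(path):
    """Report the permission bits and effective writability of a file."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),      # e.g. '-rw-r--r--'
        "writable_by_me": os.access(path, os.W_OK),
    }

# Demo against a throwaway file standing in for the table status file.
demo = "tablestatus_demo"
open(demo, "w").close()
os.chmod(demo, 0o444)            # read-only, mimicking a bad deployment
info = describe_permissions(demo)
os.remove(demo)
print(info)
```

If the status file is not writable by the user running the load, "Failed to add entry in table status" is the expected symptom, which is why reproducible steps plus the file's actual permissions were requested.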
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1538#discussion_r152741031 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -141,24 +135,25 @@ public SafeVariableLengthDimensionDataChunkStore(boolean isInvertedIndex, int nu // for last record length = (short) (this.data.length - currentDataOffset); } -DataType dt = vector.getType(); -if ((!(dt instanceof StringType) && length == 0) || ByteUtil.UnsafeComparer.INSTANCE +org.apache.carbondata.core.metadata.datatype.DataType dt = vector.getType(); --- End diff -- why do this change ? remove import, add the full import at here ? ---
[GitHub] carbondata pull request #1556: [CARBONDATA-1770] Updated documentaion for da...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1556#discussion_r152738129 --- Diff: docs/data-management-on-carbondata.md --- @@ -294,11 +294,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa ``` NOTE: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together. - - **DATEFORMAT:** Date format for specified column. + - **DATEFORMAT/TIMESTAMPFORMAT:** Date and Timestamp format for specified column. ``` -OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2') -``` +OPTIONS('dateformat' = '-MM-dd','timestampformat'='/MM/dd HH:mm:ss') --- End diff -- please use uppercase for "dateformat,timestampformat". ---