[jira] [Created] (CARBONDATA-2222) Update the FAQ doc for some mistakes

2018-03-04 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-2222:


 Summary: Update the FAQ doc for some mistakes
 Key: CARBONDATA-2222
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2222
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-1895) Fix issue of create table if not exists

2017-12-13 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1895:


 Summary: Fix issue of create table if not exists
 Key: CARBONDATA-1895
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1895
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1835) Fix null exception when get table details

2017-11-28 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1835:


 Summary: Fix null exception when get table details
 Key: CARBONDATA-1835
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1835
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all

2017-11-20 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258972#comment-16258972
 ] 

chenerlu commented on CARBONDATA-1778:
--

Currently Carbon only supports cleaning garbage segments for a specified table.
Carbon should provide the ability to clean all garbage segments without
specifying the database name and table name.
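
As an illustration, a minimal sketch of the current per-table command next to
the proposed variant (the table-less syntax is hypothetical, not yet in Carbon;
the session variable follows the other examples in this thread):

// Current behavior: garbage segments are cleaned one table at a time.
carbon.sql("CLEAN FILES FOR TABLE mydb.mytable")

// Proposed (hypothetical syntax): clean garbage segments of every table
// without specifying a database name or table name.
carbon.sql("CLEAN FILES")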

> Support clean garbage segments for all
> --
>
> Key: CARBONDATA-1778
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1778
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1778) Support clean garbage segments for all

2017-11-20 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1778:


 Summary: Support clean garbage segments for all
 Key: CARBONDATA-1778
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1778
 Project: CarbonData
  Issue Type: Improvement
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1618) Fix issue of not supporting table comment

2017-10-25 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1618:


 Summary: Fix issue of not supporting table comment
 Key: CARBONDATA-1618
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1618
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1438) Unify the sort column and sort scope in create table command

2017-08-31 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1438:
-
Description: 
1   Requirement
Currently, users can specify sort columns in the table properties when creating
a table, and when loading data they can also specify the sort scope in the load
options. To improve ease of use, it would be better to specify all the
sort-related parameters in the create table command.
Once the sort scope is specified in the create table command, it will be used
during data load even if users have specified one in the load options.

2   Detailed design
2.1 Task-01
Requirement: Create table can support specifying the sort scope.
Implement: Make use of the table properties map: the sort scope will be specified
in the table properties as a key/value pair, and the existing interface will
be called to write this key/value pair into the metastore.
Global Sort, Local Sort and No Sort will be supported; the scope can be specified
in the SQL command:

CREATE TABLE tableWithGlobalSort (
shortField SHORT,
intField INT,
bigintField LONG,
doubleField DOUBLE,
stringField STRING,
timestampField TIMESTAMP,
decimalField DECIMAL(18,2),
dateField DATE,
charField CHAR(5)
)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')
 
Tips: If the sort scope is Global Sort, users should specify
GLOBAL_SORT_PARTITIONS. If users do not specify it, the number of map tasks
will be used. GLOBAL_SORT_PARTITIONS should be an Integer in the range
[1, Integer.MaxValue]; it is only used when the sort scope is Global Sort.
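
For illustration, hedged variants of the statement above for the other two
scopes and for an explicit partition count (a sketch following this design;
the table names and the partition value are illustrative):

carbon.sql("CREATE TABLE tableWithLocalSort (intField INT, stringField STRING) " +
  "STORED BY 'carbondata' " +
  "TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='LOCAL_SORT')")

carbon.sql("CREATE TABLE tableWithNoSort (intField INT, stringField STRING) " +
  "STORED BY 'carbondata' " +
  "TBLPROPERTIES('SORT_SCOPE'='NO_SORT')")

// Global sort with an explicit partition count instead of the map-task default.
carbon.sql("CREATE TABLE tableWithGlobalSort2 (intField INT, stringField STRING) " +
  "STORED BY 'carbondata' " +
  "TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT', " +
  "'GLOBAL_SORT_PARTITIONS'='2')")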

Global Sort   Uses the orderBy operator in Spark; data is ordered at segment level.
Local Sort    Ordered per node; a carbondata file is ordered if it is written by one task.
No Sort       No sorting.

Tips: keys and values are case-insensitive.
2.2 Task-02
Requirement:
Load data will support local sort, no sort and global sort.
Ignore the sort scope specified in the load options and use the parameter
specified in the create table command.

Currently, users can specify the sort scope and global sort partitions in the
load options. After this modification, the sort scope specified in the load
options will be ignored and the sort scope will be taken from the table properties.

Current logic: the sort scope comes from the load options.
Number  Prerequisite                                        Sort scope
1       isSortTable is true && Sort Scope is Global Sort    Global Sort (first check)
2       isSortTable is false                                No Sort
3       isSortTable is true                                 Local Sort
Tips: isSortTable is true means the table contains a sort column or contains
dimensions (except complex types), such as string columns.

For example:
Create table xxx1 (col1 string, col2 int) stored by 'carbondata' --- sort table
Create table xx1 (col1 int, col2 int) stored by 'carbondata' --- not sort table
Create table xx (col1 int, col2 string) stored by 'carbondata' tblproperties
('sort_columns'='col1') --- sort table

New logic: the sort scope comes from the create table command.
Number  Prerequisite                                        Code branch
1       isSortTable is true && Sort Scope is Global Sort    Global Sort (first check)
2       isSortTable is false || Sort Scope is No Sort       No Sort
3       isSortTable is true && Sort Scope is Local Sort     Local Sort
4       isSortTable is true, Sort Scope not specified       Local Sort (keep current logic)
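
A minimal Scala sketch of this resolution order (the function name, property key
and fallback for invalid values are illustrative assumptions, not the final
implementation):

// Resolves the effective sort scope from the table properties only,
// mirroring rows 1-4 of the table above.
def resolveSortScope(isSortTable: Boolean, tableProps: Map[String, String]): String = {
  val declared = tableProps.get("sort_scope").map(_.toUpperCase)
  if (!isSortTable) "NO_SORT"                    // row 2: not a sort table => No Sort
  else declared match {
    case Some("GLOBAL_SORT") => "GLOBAL_SORT"    // row 1
    case Some("NO_SORT")     => "NO_SORT"        // row 2
    case Some("LOCAL_SORT")  => "LOCAL_SORT"     // row 3
    case _                   => "LOCAL_SORT"     // row 4: keep current logic
  }
}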

3   Acceptance standard
Number  Acceptance standard
1       Users can specify the sort scope (global, local, no sort) when creating
        a carbon table in SQL.
2       Load data will ignore the sort scope specified in the load options and
        will use the parameter specified in the create table command. If users
        still specify a sort scope in the load options, a warning will inform
        them that the sort scope specified in create table will be used.

4   Feature restrictions
NA
5   Dependencies
NA
6   Technical risk
NA



[jira] [Created] (CARBONDATA-1438) Unify the sort column and sort scope in create table command

2017-08-31 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1438:


 Summary: Unify the sort column and sort scope in create table 
command
 Key: CARBONDATA-1438
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1438
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1403) Compaction log is not correct

2017-08-23 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1403:


 Summary: Compaction log is not correct
 Key: CARBONDATA-1403
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1403
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1376) Fix warn message when setting LOCK_TYPE to HDFSLOCK

2017-08-17 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu reassigned CARBONDATA-1376:


Assignee: chenerlu

> Fix warn message when setting LOCK_TYPE to HDFSLOCK
> ---
>
> Key: CARBONDATA-1376
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1376
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Reporter: Liang Chen
>Assignee: chenerlu
>Priority: Minor
>
> scala> 
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, 
> "HDFSLOCK")
> 17/08/13 20:21:38 WARN CarbonProperties: main The value "null" configured for 
> key carbon.lock.type" is invalid. Using the default value "LOCALLOCK
> res0: org.apache.carbondata.core.util.CarbonProperties = 
> org.apache.carbondata.core.util.CarbonProperties@7730da00
> The above WARN message is not correct and needs to be optimized.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1265) Fix AllDictionaryExample because it is only supported when single_pass is true

2017-07-04 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1265:


 Summary: Fix AllDictionaryExample because it is only supported 
when single_pass is true
 Key: CARBONDATA-1265
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1265
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1264) Fix AllDictionaryExample because it is only supported when single_pass is true

2017-07-04 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1264:


 Summary: Fix AllDictionaryExample because it is only supported 
when single_pass is true
 Key: CARBONDATA-1264
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1264
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1251) Add test cases for IUD feature

2017-06-29 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1251:


 Summary: Add test cases for IUD feature
 Key: CARBONDATA-1251
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1251
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-995) Incorrect result displays while using variance aggregate function in presto integration

2017-06-29 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068075#comment-16068075
 ] 

chenerlu commented on CARBONDATA-995:
-

Hi, what is the behavior of the same operation in Hive?
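
One quick cross-check (a suggestion, assuming both engines expose the standard
var_samp/var_pop functions): compare the explicit sample and population
variances on each side to see which one each engine's variance() aliases.

// Carbon side (Spark SQL); run the equivalent SELECT from the Presto CLI too.
carbon.sql("SELECT var_samp(DECIMAL_COLUMN1), var_pop(DECIMAL_COLUMN1) FROM uniqdata").show()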

> Incorrect result displays while using variance aggregate function in presto 
> integration
> ---
>
> Key: CARBONDATA-995
> URL: https://issues.apache.org/jira/browse/CARBONDATA-995
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query, presto-integration
>Affects Versions: 1.1.0
> Environment: spark 2.1 , presto 0.166
>Reporter: Vandana Yadav
>Priority: Minor
> Attachments: 2000_UniqData.csv
>
>
> Incorrect result displays while using variance aggregate function in presto 
> integration
> Steps to reproduce :
> 1. In CarbonData:
> a) Create table:
> CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES 
> ("TABLE_BLOCKSIZE"= "256 MB");
> b) Load data : 
> LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> 2. In presto 
> a) Execute the query:
> select variance(DECIMAL_COLUMN1) as a   from (select DECIMAL_COLUMN1 from 
> UNIQDATA order by DECIMAL_COLUMN1) t
> Actual result :
> In CarbonData :
> "++--+
> | a  |
> ++--+
> | 333832.4983039884  |
> ++--+
> 1 row selected (0.695 seconds)
> "
> in presto:
> " a 
> ---
>  333832.3010442859 
> (1 row)
> Query 20170420_082837_00062_hd7jy, FINISHED, 1 node
> Splits: 35 total, 35 done (100.00%)
> 0:00 [2.01K rows, 1.97KB] [8.09K rows/s, 7.91KB/s]"
> Expected result: it should display the same result as shown in CarbonData.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1227) Remove useless TableCreator

2017-06-25 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1227:


 Summary: Remove useless TableCreator
 Key: CARBONDATA-1227
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1227
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1203) insert data caused many duplicated data on spark 1.6.2

2017-06-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060325#comment-16060325
 ] 

chenerlu edited comment on CARBONDATA-1203 at 6/23/17 2:36 AM:
---

Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records (20 = 10 + 10; the second '10' appears
because t3 has 10 records. If we change t3 to t4, which has 5 records, the
result will be 15, so I think CarbonData expands the constant select like '*';
I am not sure, this should be confirmed).
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala] [~chenliang613]
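
The multiplication itself follows plain SQL projection semantics: a constant
select over t3 yields one row per source row. A minimal sketch (the row count
assumes the 10-row t3 above):

// A SELECT of constant expressions still scans the FROM table, producing one
// identical row per source row, so the INSERT appends 10 rows rather than 1.
val constantRows = cc.sql("SELECT 1, 'jack' FROM t3")
println(constantRows.count())   // 10 for a 10-row t3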


was (Author: chenerlu):
Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala] [~chenliang613]

> insert data caused  many duplicated data on spark 1.6.2
> ---
>
> Key: CARBONDATA-1203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1203
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run the sql below to insert one row:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary from $tableName
>  """).show()
> In the end the data was inserted successfully, but it inserted many
> duplicated rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1203) insert data caused many duplicated data on spark 1.6.2

2017-06-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060325#comment-16060325
 ] 

chenerlu edited comment on CARBONDATA-1203 at 6/23/17 2:24 AM:
---

Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala] [~chenliang613]


was (Author: chenerlu):
Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]

> insert data caused  many duplicated data on spark 1.6.2
> ---
>
> Key: CARBONDATA-1203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1203
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run the sql below to insert one row:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary from $tableName
>  """).show()
> In the end the data was inserted successfully, but it inserted many
> duplicated rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1203) insert data caused many duplicated data on spark 1.6.2

2017-06-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060325#comment-16060325
 ] 

chenerlu edited comment on CARBONDATA-1203 at 6/23/17 2:23 AM:
---

Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]


was (Author: chenerlu):
Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]

> insert data caused  many duplicated data on spark 1.6.2
> ---
>
> Key: CARBONDATA-1203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1203
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run the sql below to insert one row:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary from $tableName
>  """).show()
> In the end the data was inserted successfully, but it inserted many
> duplicated rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1203) insert data caused many duplicated data on spark 1.6.2

2017-06-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060325#comment-16060325
 ] 

chenerlu commented on CARBONDATA-1203:
--

Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]

> insert data caused  many duplicated data on spark 1.6.2
> ---
>
> Key: CARBONDATA-1203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1203
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run the sql below to insert one row:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary from $tableName
>  """).show()
> In the end the data was inserted successfully, but it inserted many
> duplicated rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1203) insert data caused many duplicated data on spark 1.6.2

2017-06-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060325#comment-16060325
 ] 

chenerlu edited comment on CARBONDATA-1203 at 6/23/17 2:22 AM:
---

Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]


was (Author: chenerlu):
Hi, I encountered the same problem. The issue can be summarized as follows.
Step 1: create a carbon table.
  cc.sql("CREATE TABLE IF NOT EXISTS t3 (id Int, name String) STORED BY 'carbondata'")

Step 2: load data; t3 will then have 10 records.
  cc.sql("LOAD DATA LOCAL INPATH 'mypathofdata' INTO TABLE t3")

Step 3: insert a constant row into table t3.
  cc.sql("INSERT INTO TABLE t3 SELECT 1, 'jack' FROM t3")

Step 4: count table t3.
  cc.sql("SELECT count(*) FROM t3")

Actual result: t3 will have 20 records.
Expected result: t3 should have 11 records, or the statement should throw
sql.AnalysisException (this would match Hive table behavior, I think).

Any idea about this issue? Which solution is better?
[~ravi.pesala]

> insert data caused  many duplicated data on spark 1.6.2
> ---
>
> Key: CARBONDATA-1203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1203
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run the sql below to insert one row:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary from $tableName
>  """).show()
> In the end the data was inserted successfully, but it inserted many
> duplicated rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1201) don't support insert syntax "insert into table select constants" on spark 1.6.2

2017-06-20 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056904#comment-16056904
 ] 

chenerlu commented on CARBONDATA-1201:
--

I remember this syntax may not be supported on Spark 1.6.2, while Spark 2.1
supports it. So first we should confirm whether this is a Spark issue.
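
A hedged workaround sketch for parsers that require a FROM clause (assumes
$tableName already contains at least one row; LIMIT 1 keeps the output to a
single constant row):

// Give the statement a FROM clause so the Hive-based parser on Spark 1.6.2
// accepts it, and LIMIT 1 so only one row of constants is produced.
spark.sql(s"""
   insert into $tableName select $id,'$date','$country','$testName'
   ,'$phoneType','$serialname',$salary from $tableName limit 1
   """).show()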

> don't support  insert syntax  "insert into table select  constants" on spark 
> 1.6.2
> --
>
> Key: CARBONDATA-1201
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1201
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jarck
>
> I used branch-1.1 to do an insert test on spark 1.6.2 on my local machine.
> I tried to run sql like "insert into table select constants", but it
> failed.
> It works on spark 2.1.
> example sql:
>   spark.sql(s"""
>  insert into $tableName select $id,'$date','$country','$testName'
>  ,'$phoneType','$serialname',$salary
>  """).show()
> error log as below
> FailedPredicateException(regularBody,{$s.tree.getChild(1) !=null}?)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41238)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40413)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1204) Update operation fail and generate extra records when test with big data

2017-06-20 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1204:


 Summary: Update operation fail and generate extra records when 
test with big data
 Key: CARBONDATA-1204
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1204
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: Ravindra Pesala






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1197) Update related docs which still use incubating such as presto integration

2017-06-20 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1197:
-
Description: 
Update related docs which still use incubating.
Just update the reference links, file names, directory names, etc.
Summary: Update related docs which still use incubating such as presto 
integration  (was: Update related docs which still use incubating such as 
presto integra)

> Update related docs which still use incubating such as presto integration
> -
>
> Key: CARBONDATA-1197
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1197
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>
> Update related docs which still use incubating.
> Just update the reference links, file names, directory names, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1197) Update related docs which still use incubating such as presto integra

2017-06-19 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu reassigned CARBONDATA-1197:


Assignee: chenerlu

> Update related docs which still use incubating such as presto integra
> -
>
> Key: CARBONDATA-1197
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1197
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1197) Update related docs which still use incubating such as presto integra

2017-06-19 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1197:


 Summary: Update related docs which still use incubating such as 
presto integra
 Key: CARBONDATA-1197
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1197
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1180) loading data failed for dictionary file id is locked for updation

2017-06-19 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053607#comment-16053607
 ] 

chenerlu commented on CARBONDATA-1180:
--

Does this always happen? Could you please remove the carbondata_test related
metafiles and retry?

>  loading data failed for dictionary file id is locked for updation 
> ---
>
> Key: CARBONDATA-1180
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1180
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
>Reporter: Liu Shaohui
>
> use Spark 2.1 in yarn-client mode and query from beeline to spark sql 
> thriftserver
> {code}
> CREATE TABLE IF NOT EXISTS carbondata_test(id string, name string, city 
> string, age Int) STORED BY 'carbondata';
> LOAD DATA INPATH 'hdfs:///user/sample-data/sample.csv' INTO TABLE 
> carbondata_test;
> {code}
> Data load is failed for following exception.
> {code}
> java.lang.RuntimeException: Dictionary file id is locked for updation. Please 
> try after some time +details
> java.lang.RuntimeException: Dictionary file id is locked for updation. Please 
> try after some time
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:407)
>   at 
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:345)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Version 1.2.0 contains the fix from CARBONDATA-614.
> Any suggestion about this problem? Thanks~



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1191) Remove carbon-spark-shell script

2017-06-19 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1191:


 Summary: Remove carbon-spark-shell script 
 Key: CARBONDATA-1191
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1191
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1183) Update CarbonPartitionTable because partition columns should not be specified in the schema

2017-06-16 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1183:
-
Summary: Update CarbonPartitionTable because partition columns should not 
be specified in the schema  (was: Update CarbonPartitionTable Because partition 
columns should not be specified in the schema)

> Update CarbonPartitionTable because partition columns should not be specified 
> in the schema
> ---
>
> Key: CARBONDATA-1183
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1183
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1183) Update CarbonPartitionTable Because partition columns should not be specified in the schema

2017-06-16 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1183:


 Summary: Update CarbonPartitionTable Because partition columns 
should not be specified in the schema
 Key: CARBONDATA-1183
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1183
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1149) Fix issue of mismatch type of partition column when specify partition info and range info overlapping values issue

2017-06-14 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1149:
-
Summary: Fix issue of mismatch type of partition column when specify 
partition info and range info overlapping values issue  (was: Fix issue of 
mismatch type of partition column when specify partition info)

> Fix issue of mismatch type of partition column when specify partition info 
> and range info overlapping values issue
> --
>
> Key: CARBONDATA-1149
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1149
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1151) Update useful-tips-on-carbondata.md

2017-06-09 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1151:


 Summary: Update useful-tips-on-carbondata.md
 Key: CARBONDATA-1151
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1151
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1149) Fix issue of mismatch type of partition column when specify partition info

2017-06-09 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1149:


 Summary: Fix issue of mismatch type of partition column when 
specify partition info
 Key: CARBONDATA-1149
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1149
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1134) Generate redundant folders under integration module when running test cases with the mvn command on spark1.6

2017-06-06 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1134:


 Summary: Generate redundant folders under integration module when 
running test cases with the mvn command on spark1.6
 Key: CARBONDATA-1134
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1134
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Priority: Minor


When running mvn -Pspark-1.6 -Dspark.version=1.6.3 clean package, redundant
folders are generated under the integration module.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CARBONDATA-1115) load csv data fail

2017-06-02 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034242#comment-16034242
 ] 

chenerlu commented on CARBONDATA-1115:
--

Hi, please make sure you specify the right carbon store path and that your
sample.csv has a column header in the data file.
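
If the CSV has no header row, the column list can instead be supplied through
the FILEHEADER load option (a sketch reusing the test_table schema from the log
below; DELIMITER and FILEHEADER are the same options used elsewhere in these
issues):

// Supply the header explicitly when sample.csv contains data rows only.
carbon.sql("LOAD DATA LOCAL INPATH '/home/carbondata/sample.csv' INTO TABLE test_table " +
  "OPTIONS('DELIMITER'=',', 'FILEHEADER'='id,name,city,age')")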

> load csv data fail
> --
>
> Key: CARBONDATA-1115
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1115
> Project: CarbonData
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 1.2.0
> Environment: centos 7, spark2.1.0, hadoop 2.7
>Reporter: hyd
> Fix For: 1.2.0
>
>
> Is it a bug, or does my environment have a problem? Can anyone help me?
> [root@localhost spark-2.1.0-bin-hadoop2.7]# ls /home/carbondata/sample.csv 
> /home/carbondata/sample.csv
> [root@localhost spark-2.1.0-bin-hadoop2.7]# ./bin/spark-shell --master 
> spark://192.168.32.114:7077 --total-executor-cores 2 --executor-memory 2G
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/spark-2.1.0-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/spark-2.1.0-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 17/06/01 14:44:54 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 17/06/01 14:44:54 WARN SparkConf: 
> SPARK_CLASSPATH was detected (set to './carbonlib/*').
> This is deprecated in Spark 1.0+.
> Please instead use:
>  - ./spark-submit with --driver-class-path to augment the driver classpath
>  - spark.executor.extraClassPath to augment the executor classpath
> 
> 17/06/01 14:44:54 WARN SparkConf: Setting 'spark.executor.extraClassPath' to 
> './carbonlib/*' as a work-around.
> 17/06/01 14:44:54 WARN SparkConf: Setting 'spark.driver.extraClassPath' to 
> './carbonlib/*' as a work-around.
> 17/06/01 14:44:54 WARN Utils: Your hostname, localhost.localdomain resolves 
> to a loopback address: 127.0.0.1; using 192.168.32.114 instead (on interface 
> em1)
> 17/06/01 14:44:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 17/06/01 14:44:59 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> Spark context Web UI available at http://192.168.32.114:4040
> Spark context available as 'sc' (master = spark://192.168.32.114:7077, app id 
> = app-20170601144454-0001).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
>       /_/
>  
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.SparkSession
> scala> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.CarbonSession._
> scala> val carbon = 
> SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://192.168.32.114/test")
> 17/06/01 14:45:35 WARN SparkContext: Using an existing SparkContext; some 
> configuration may not take effect.
> 17/06/01 14:45:38 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> carbon: org.apache.spark.sql.SparkSession = 
> org.apache.spark.sql.CarbonSession@2165b170
> scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name 
> string, city string, age Int) STORED BY 'carbondata'")
> 17/06/01 14:45:45 AUDIT CreateTable: 
> [localhost.localdomain][root][Thread-1]Creating Table with Database name 
> [default] and Table name [test_table]
> res0: org.apache.spark.sql.DataFrame = []
> scala> carbon.sql("LOAD DATA LOCAL INPATH '/home/carbondata/sample.csv' INTO 
> TABLE test_table")
> 17/06/01 14:45:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
> 192.168.32.114, executor 0): java.lang.ClassCastException: cannot assign 
> instance of scala.collection.immutable.List$SerializationProxy to field 
> org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type 
> scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>   at 
> java.io.Ob

[jira] [Commented] (CARBONDATA-1116) Not able to connect with Carbonsession while starting carbon spark shell and beeline

2017-06-01 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032668#comment-16032668
 ] 

chenerlu commented on CARBONDATA-1116:
--

Hi, I met the same issue when I ran CarbonSessionExample on the latest master branch.
This issue may be caused by creating a new SparkSqlParser with null as its
parameter.
Please help check whether it is the same problem. [~ravi.pesala]
Thanks
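
A minimal sketch of the suspected pattern (an assumption inferred from the
stack trace below; the call site is illustrative, not the actual Carbon code):

// Building a SparkSqlParser with a null conf: the first use that touches the
// conf is then expected to throw the NullPointerException seen below.
import org.apache.spark.sql.execution.SparkSqlParser
val parser = new SparkSqlParser(null)   // conf is null (suspected bug)
parser.parsePlan("use default")         // NPE expected here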


> Not able to connect with Carbonsession while starting carbon spark shell and 
> beeline
> 
>
> Key: CARBONDATA-1116
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1116
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.2.0
> Environment: spark 2.1
>Reporter: Vandana Yadav
>Priority: Blocker
>
> Not able to connect with Carbonsession while starting carbon spark shell and 
> beeline
> Steps to reproduce:
> 1)Start thrift-server
> a) cd $SPARK-HOME/bin
> b) ./spark-submit --conf spark.sql.hive.thriftServer.singleSession=true 
> --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer 
> /opt/spark/spark-2.1/carbonlib/carbondata_2.11-1.1.0-SNAPSHOT-shade-hadoop2.7.3.jar
>  hdfs://localhost:54310/opt/prestocarbonStore
> 2)Start Beeline
> a) cd $SPARK-HOME/bin
> b)./beeline
> 3) Connect with carbondata via jdbc
> !connect jdbc:hive2://localhost:1
> Enter username for jdbc:hive2://localhost:1: hduser
> Enter password for jdbc:hive2://localhost:1: **
> 4) Actual Result:
> Error: Could not establish connection to jdbc:hive2://localhost:1: null 
> (state=08S01,code=0)
> 0: jdbc:hive2://localhost:1 (closed)>
> 5) Expected result : it should connect successfully with carbondata
> 6)console logs:
> 17/06/01 13:03:27 INFO ThriftCLIService: Client protocol version: 
> HIVE_CLI_SERVICE_PROTOCOL_V8
> 17/06/01 13:03:27 INFO SessionState: Created local directory: 
> /tmp/addaba65-46c5-4467-a02f-2bbdfd54329a_resources
> 17/06/01 13:03:27 INFO SessionState: Created HDFS directory: 
> /tmp/hive/hduser/addaba65-46c5-4467-a02f-2bbdfd54329a
> 17/06/01 13:03:27 INFO SessionState: Created local directory: 
> /tmp/hduser/addaba65-46c5-4467-a02f-2bbdfd54329a
> 17/06/01 13:03:27 INFO SessionState: Created HDFS directory: 
> /tmp/hive/hduser/addaba65-46c5-4467-a02f-2bbdfd54329a/_tmp_space.db
> 17/06/01 13:03:27 INFO HiveSessionImpl: Operation log session directory is 
> created: /tmp/hduser/operation_logs/addaba65-46c5-4467-a02f-2bbdfd54329a
> 17/06/01 13:03:27 INFO CarbonSparkSqlParser: Parsing command: use default
> Exception in thread "HiveServer2-Handler-Pool: Thread-84" 
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.spark.sql.hive.CarbonSessionState$$anon$1.<init>(CarbonSessionState.scala:133)
>   at 
> org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:128)
>   at 
> org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:83)
>   at 
> org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:202)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:351)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:246)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.hive.CarbonIUDAnalysisRule$.<init>(CarbonAnalysisRules.scala:90)
>   at 
> org.apache.spark.sql.hive.CarbonIUDAnalysisRule$.<clinit>(CarbonAnalysisRules.scala)
>   ... 20 more

[jira] [Commented] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019445#comment-16019445
 ] 

chenerlu commented on CARBONDATA-1076:
--

Yes, I have reproduced this problem with a csv file.
Data in the csv file is as follows:
col1,col2,col3
1,2,3
4,5,6
7,8,9

> Join Issue caused by dictionary and shuffle exchange
> 
>
> Key: CARBONDATA-1076
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1076
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.1.1-incubating, 1.1.0
> Environment: Carbon + spark 2.1
>Reporter: chenerlu
>Assignee: Ravindra Pesala
>
> We can reproduce this issue with the following steps:
> Step1: create a carbon table
>  
> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
> int) STORED by 'carbondata' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
>  
> Step2: load data
> carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE 
> carbon_table")
> data in file carbon_table as follows:
> col1,col2,col3
> 1,2,3
> 4,5,6
> 7,8,9
>  
> Step3: do the query
> carbon.sql("SELECT c1.col1,c2.col1,c2.col3 FROM (SELECT col1,col2 FROM 
> carbon_table GROUP BY col1,col2) c1 FULL JOIN (SELECT col1,count(col2) as 
> col3 FROM carbon_table GROUP BY col1) c2 ON c1.col1 = c2.col1").show()
> [expected] Hive tables and parquet tables get the same result as below, which
> should be correct.
> |col1|col1|col3|
> |   1|   1|   1|
> |   4|   4|   1|
> |   7|   7|   1|
> [actually] carbon gets nulls because of wrong matches.
> |col1|col1|col3|
> |   1|null|null|
> |null|   4|   1|
> |   4|null|null|
> |null|   7|   1|
> |   7|null|null|
> |null|   1|   1|
> Root cause analysis:
>  
> It is because this query has two subqueries: one subquery does the decode
> after the exchange and the other does the decode before the exchange, which
> may lead to wrong matches when executing the full join.
>  
> My idea: can we move the decode before the exchange? I am not very familiar
> with Carbon query execution, so any ideas about this?
> Plan as follows:
>  
> == Physical Plan ==
> SortMergeJoin [col1#3445], [col1#3460], FullOuter
> :- Sort [col1#3445 ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(col1#3445, 200)
> : +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
> col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table 
> name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
> CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
> col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
> IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
> org.apache.spark.sql.CarbonSession@69e87cbe
> :+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
> output=[col1#3445])
> :   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
> :  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
> output=[col1#3445, col2#3446])
> : +- Scan CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
> tempdev.carbon_table[col1#3445,col2#3446] 
> +- Sort [col1#3460 ASC NULLS FIRST], false, 0
>+- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
> col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table 
> name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
> CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
> col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
> IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
> org.apache.spark.sql.CarbonSession@69e87cbe
>   +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
> output=[col1#3460, col3#3436L])
>  +- Exchange hashpartitioning(col1#3460, 200)
> +- HashAggregate(keys=[col1#3460], 
> functions=[partial_count(col

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")

data in file carbon_table as follows:
col1,col2,col3
1,2,3
4,5,6
7,8,9
 
Step3: do the query
carbon.sql("SELECT c1.col1,c2.col1,c2.col3 FROM (SELECT col1,col2 FROM 
carbon_table GROUP BY col1,col2) c1 FULL JOIN (SELECT col1,count(col2) as col3 
FROM carbon_table GROUP BY col1) c2 ON c1.col1 = c2.col1").show()

[expected] Hive tables and parquet tables get the same result as below, which
should be correct.
|col1|col1|col3|
|   1|   1|   1|
|   4|   4|   1|
|   7|   7|   1|

[actually] carbon gets nulls because of wrong matches.
|col1|col1|col3|
|   1|null|null|
|null|   4|   1|
|   4|null|null|
|null|   7|   1|
|   7|null|null|
|null|   1|   1|

Root cause analysis:
 
It is because this query has two subqueries: one subquery does the decode
after the exchange and the other does the decode before the exchange, which
may lead to wrong matches when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar
with Carbon query execution, so any ideas about this?
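
One way to confirm this root cause (a hedged experiment, not a fix from this
report): create the same table without DICTIONARY_INCLUDE, so the int columns
are not dictionary encoded and neither subquery needs a decode step.

// Same data, no dictionary encoding on the join key: both sides of the full
// join then compare plain decoded values directly.
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table_nodict (col1 int, col2 int, col3 int) " +
  "STORED BY 'carbondata' TBLPROPERTIES('TABLE_BLOCKSIZE'='4')")
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table_nodict")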

Plan as follows:
 
== Physical Plan ==
SortMergeJoin [col1#3445], [col1#3460], FullOuter
:- Sort [col1#3445 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(col1#3445, 200)
: +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
:+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445])
:   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
:  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445, col2#3446])
: +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
tempdev.carbon_table[col1#3445,col2#3446] 
+- Sort [col1#3460 ASC NULLS FIRST], false, 0
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
output=[col1#3460, col3#3436L])
 +- Exchange hashpartitioning(col1#3460, 200)
+- HashAggregate(keys=[col1#3460], 
functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 
-> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col2#3461)), CarbonAliasDecoderRelation(), 
org.apa

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")

The data in the file carbon_table is as follows:
col1,col2,col3
1,2,3
4,5,6
7,8,9
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
carbon.sql("SELECT c1.col1,c2.col1,c2.col3 FROM (SELECT col1,col2 FROM 
carbon_table GROUP BY col1,col2) c1 FULL JOIN (SELECT col1,count(col2) as col3 
FROM carbon_table GROUP BY col1) c2 ON c1.col1 = c2.col1").show()

[expected] Hive and Parquet tables return the same result below, which is 
correct.

|col1|col1|col3|
|   1|   1|   1|
|   4|   4|   1|
|   7|   7|   1|
 
[actually] Carbon gets nulls because of a wrong match.

|col1|col1|col3|
|   1|null|null|
|null|   4|   1|
|   4|null|null|
|null|   7|   1|
|   7|null|null|
|null|   1|   1|

Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?

Plan as follows:
 
== Physical Plan ==
SortMergeJoin [col1#3445], [col1#3460], FullOuter
:- Sort [col1#3445 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(col1#3445, 200)
: +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
:+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445])
:   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
:  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445, col2#3446])
: +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
tempdev.carbon_table[col1#3445,col2#3446] 
+- Sort [col1#3460 ASC NULLS FIRST], false, 0
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
output=[col1#3460, col3#3436L])
 +- Exchange hashpartitioning(col1#3460, 200)
+- HashAggregate(keys=[col1#3460], 
functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 
-> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(c

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")

The data in the file carbon_table is as follows:
col1,col2,col3
1,2,3
4,5,6
7,8,9
 
Step3: do the query
carbon.sql("SELECT c1.col1,c2.col1,c2.col3 FROM (SELECT col1,col2 FROM 
carbon_table GROUP BY col1,col2) c1 FULL JOIN (SELECT col1,count(col2) as col3 
FROM carbon_table GROUP BY col1) c2 ON c1.col1 = c2.col1").show()

[expected] Hive and Parquet tables return the same result below, which is 
correct.

|col1|col1|col3|
|   1|   1|   1|
|   4|   4|   1|
|   7|   7|   1|
 
[actually] Carbon gets nulls because of a wrong match.

|col1|col1|col3|
|   1|null|null|
|null|   4|   1|
|   4|null|null|
|null|   7|   1|
|   7|null|null|
|null|   1|   1|

Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?

Plan as follows:
 
== Physical Plan ==
SortMergeJoin [col1#3445], [col1#3460], FullOuter
:- Sort [col1#3445 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(col1#3445, 200)
: +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
:+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445])
:   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
:  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445, col2#3446])
: +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
tempdev.carbon_table[col1#3445,col2#3446] 
+- Sort [col1#3460 ASC NULLS FIRST], false, 0
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
output=[col1#3460, col3#3436L])
 +- Exchange hashpartitioning(col1#3460, 200)
+- HashAggregate(keys=[col1#3460], 
functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 
-> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col2#3461)), CarbonAliasDecoderRelation(), 
org.

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
   Affects Version/s: 0.1.1-incubating
  1.1.0
Request participants:   (was: )
 Component/s: core

> Join Issue caused by dictionary and shuffle exchange
> 
>
> Key: CARBONDATA-1076
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1076
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.1.1-incubating, 1.1.0
> Environment: Carbon + spark 2.1
>Reporter: chenerlu
>Assignee: Ravindra Pesala
>
> We can reproduce this issue with the following steps:
> Step1: create a carbon table
>  
> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
> int) STORED by 'carbondata' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
>  
> Step2: load data
> carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE 
> carbon_table")
>  
> You can get the carbon_table file from the attachment.
>  
> Step3: do the query
>  
> [expected] Hive and Parquet tables return the same result below, which is 
> correct.
> |col1|col1|col3|
> |   1|   1|   1|
> |   4|   4|   1|
> |   7|   7|   1|
>  
> [actually] Carbon gets nulls because of a wrong match.
> |col1|col1|col3|
> |   1|null|null|
> |null|   4|   1|
> |   4|null|null|
> |null|   7|   1|
> |   7|null|null|
> |null|   1|   1|
> Root cause analysis:
>  
> This query contains two subqueries: one does the dictionary decode after the 
> exchange while the other does it before the exchange, which can lead to a 
> wrong match when executing the full join.
>  
> My idea: can we move the decode before the exchange? I am not very familiar 
> with the Carbon query flow, so any ideas on this?
> Plan as follows:
>  
> == Physical Plan ==
> SortMergeJoin [col1#3445], [col1#3460], FullOuter
> :- Sort [col1#3445 ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(col1#3445, 200)
> : +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
> col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table 
> name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
> CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
> col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
> IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
> org.apache.spark.sql.CarbonSession@69e87cbe
> :+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
> output=[col1#3445])
> :   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
> :  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
> output=[col1#3445, col2#3446])
> : +- Scan CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
> tempdev.carbon_table[col1#3445,col2#3446] 
> +- Sort [col1#3460 ASC NULLS FIRST], false, 0
>+- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
> col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table 
> name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
> CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
> col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table, Schema 
> :Some(StructType(StructField(col1,IntegerType,true), 
> StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
> IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
> org.apache.spark.sql.CarbonSession@69e87cbe
>   +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
> output=[col1#3460, col3#3436L])
>  +- Exchange hashpartitioning(col1#3460, 200)
> +- HashAggregate(keys=[col1#3460], 
> functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
>+- CarbonDictionaryDecoder 
> [CarbonDecoderRelation(Map(col1#3445 -> col1#3445, col2#3446 -> col2#3446, 
> col3#3447 -> col3#3447),CarbonDatasourceHadoopRelation [ Database name 
> :tempdev, Table name :carbon_table

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
 
[expected] Hive and Parquet tables return the same result below, which is 
correct.

|col1|col1|col3|
|   1|   1|   1|
|   4|   4|   1|
|   7|   7|   1|
 
[actually] Carbon gets nulls because of a wrong match.

|col1|col1|col3|
|   1|null|null|
|null|   4|   1|
|   4|null|null|
|null|   7|   1|
|   7|null|null|
|null|   1|   1|

Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?

Plan as follows:
 
== Physical Plan ==
SortMergeJoin [col1#3445], [col1#3460], FullOuter
:- Sort [col1#3445 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(col1#3445, 200)
: +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
:+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445])
:   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
:  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445, col2#3446])
: +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
tempdev.carbon_table[col1#3445,col2#3446] 
+- Sort [col1#3460 ASC NULLS FIRST], false, 0
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
output=[col1#3460, col3#3436L])
 +- Exchange hashpartitioning(col1#3460, 200)
+- HashAggregate(keys=[col1#3460], 
functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 
-> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col2#3461)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,t

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
 
[expected] Hive and Parquet tables return the same result below, which is 
correct.

+----+----+----+
|col1|col1|col3|
+----+----+----+
|   1|   1|   1|
|   4|   4|   1|
|   7|   7|   1|
+----+----+----+
 
[actually] Carbon gets nulls because of a wrong match.

+----+----+----+
|col1|col1|col3|
+----+----+----+
|   1|null|null|
|null|   4|   1|
|   4|null|null|
|null|   7|   1|
|   7|null|null|
|null|   1|   1|
+----+----+----+

 
 
Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?

Plan as follows:
 
== Physical Plan ==
SortMergeJoin [col1#3445], [col1#3460], FullOuter
:- Sort [col1#3445 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(col1#3445, 200)
: +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3445)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
:+- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445])
:   +- Exchange hashpartitioning(col1#3445, col2#3446, 200)
:  +- HashAggregate(keys=[col1#3445, col2#3446], functions=[], 
output=[col1#3445, col2#3446])
: +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ] 
tempdev.carbon_table[col1#3445,col2#3446] 
+- Sort [col1#3460 ASC NULLS FIRST], false, 0
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 -> 
col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col1#3460)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- HashAggregate(keys=[col1#3460], functions=[count(col2#3461)], 
output=[col1#3460, col3#3436L])
 +- Exchange hashpartitioning(col1#3460, 200)
+- HashAggregate(keys=[col1#3460], 
functions=[partial_count(col2#3461)], output=[col1#3460, count#3472L])
   +- CarbonDictionaryDecoder [CarbonDecoderRelation(Map(col1#3445 
-> col1#3445, col2#3446 -> col2#3446, col3#3447 -> 
col3#3447),CarbonDatasourceHadoopRelation [ Database name :tempdev, Table name 
:carbon_table, Schema :Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ]), 
CarbonDecoderRelation(Map(col1#3460 -> col1#3460, col2#3461 -> col2#3461, 
col3#3462 -> col3#3462),CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :carbon_table, Schema 
:Some(StructType(StructField(col1,IntegerType,true), 
StructField(col2,IntegerType,true), StructField(col3,IntegerType,true))) ])], 
IncludeProfile(ArrayBuffer(col2#3461)), CarbonAliasDecoderRelation(), 
org.apache.spark.sql.CarbonSession@69e87cbe
  +- Scan CarbonDatasourceHadoopRelation [ Database name 
:tempdev, Table name :

[jira] [Updated] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu updated CARBONDATA-1076:
-
Description: 
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
 
[expected] Hive and Parquet tables return the same, correct result.


 
 
[actually] Carbon gets nulls because of a wrong match
 
 
Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?


  was:
We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
 
[expected] Hive and Parquet tables return the same, correct result.
 
 
[actually] Carbon gets nulls because of a wrong match
 
 
Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?



> Join Issue caused by dictionary and shuffle exchange
> 
>
> Key: CARBONDATA-1076
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1076
> Project: CarbonData
>  Issue Type: Bug
> Environment: Carbon + spark 2.1
>Reporter: chenerlu
>Assignee: Ravindra Pesala
>
> We can reproduce this issue with the following steps:
> Step1: create a carbon table
>  
> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
> int) STORED by 'carbondata' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
>  
> Step2: load data
> carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE 
> carbon_table")
>  
> You can get the carbon_table file from the attachment.
>  
> Step3: do the query
>  
> [expected] Hive and Parquet tables return the same, correct result.
>  
>  
> [actually] Carbon gets nulls because of a wrong match
>  
>  
> Root cause analysis:
>  
> This query contains two subqueries: one does the dictionary decode after the 
> exchange while the other does it before the exchange, which can lead to a 
> wrong match when executing the full join.
>  
> My idea: can we move the decode before the exchange? I am not very familiar 
> with the Carbon query flow, so any ideas on this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

2017-05-22 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1076:


 Summary: Join Issue caused by dictionary and shuffle exchange
 Key: CARBONDATA-1076
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1076
 Project: CarbonData
  Issue Type: Bug
 Environment: Carbon + spark 2.1
Reporter: chenerlu
Assignee: Ravindra Pesala


We can reproduce this issue with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 
int) STORED by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
You can get the carbon_table file from the attachment.
 
Step3: do the query
 
[expected] Hive and Parquet tables return the same, correct result.
 
 
[actually] Carbon gets nulls because of a wrong match
 
 
Root cause analysis:
 
This query contains two subqueries: one does the dictionary decode after the 
exchange while the other does it before the exchange, which can lead to a 
wrong match when executing the full join.
 
My idea: can we move the decode before the exchange? I am not very familiar 
with the Carbon query flow, so any ideas on this?
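 
To make the mismatch concrete, here is a tiny conceptual sketch (plain Scala, 
not Carbon internals; the surrogate keys are made up): if one join side still 
carries dictionary surrogate keys while the other side is already decoded, 
equal raw values no longer compare equal.

val dictionary = Map(100 -> 1, 101 -> 4, 102 -> 7)  // surrogate key -> raw value (made up)
val decodedSide = Seq(1, 4, 7)        // subquery decoded before the exchange
val encodedSide = Seq(100, 101, 102)  // subquery still holding surrogate keys
println(decodedSide.intersect(encodedSide))  // List(): no key matches, so the
                                             // full outer join pads both sides with nulls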




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1040) Add description of carbon not support update table and delete records in spark2.1

2017-05-08 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1040:


 Summary: Add description of carbon not support update table and 
delete records in spark2.1
 Key: CARBONDATA-1040
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1040
 Project: CarbonData
  Issue Type: Improvement
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1021) Update compact for code style and unne

2017-05-04 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-1021:


 Summary: Update compact for code style and unne
 Key: CARBONDATA-1021
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1021
 Project: CarbonData
  Issue Type: Improvement
Reporter: chenerlu






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CARBONDATA-1021) Update compact for code style and unnecessary operation

2017-05-04 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu reassigned CARBONDATA-1021:


Assignee: chenerlu
Request participants:   (was: )
Priority: Minor  (was: Major)
 Summary: Update compact for code style and unnecessary 
operation  (was: Update compact for code style and unne)

> Update compact for code style and unnecessary operation
> ---
>
> Key: CARBONDATA-1021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1021
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-987) Can not delete lock file when drop table

2017-04-25 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-987:
---

 Summary: Can not delete lock file when drop table
 Key: CARBONDATA-987
 URL: https://issues.apache.org/jira/browse/CARBONDATA-987
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-986) Add alter table example

2017-04-25 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-986:
---

 Summary: Add alter table example
 Key: CARBONDATA-986
 URL: https://issues.apache.org/jira/browse/CARBONDATA-986
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-964) How Carbon will behave when execute insert operation in abnormal scenarios?

2017-04-20 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-964:
---

 Summary: How Carbon will behave when execute insert operation in 
abnormal scenarios?
 Key: CARBONDATA-964
 URL: https://issues.apache.org/jira/browse/CARBONDATA-964
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu
Assignee: chenerlu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CARBONDATA-954) The driverExecutorCacheConfTest failed because of interaction between testcases in CacheProviderTest

2017-04-19 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu resolved CARBONDATA-954.
-
Resolution: Resolved

> The driverExecutorCacheConfTest failed because of interaction between 
> testcases in CacheProviderTest
> 
>
> Key: CARBONDATA-954
> URL: https://issues.apache.org/jira/browse/CARBONDATA-954
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>
> Problem: The driverExecutorCacheConfTest will fail when all test cases in 
> CacheProviderTest are run, while running driverExecutorCacheConfTest alone 
> will succeed.
> Solution:
> The driverExecutorCacheConfTest fails once the second test case (createCache) 
> has run, because CacheProvider.getInstance() returns the instance whose 
> caches were created in that test case. So we suggest dropping all caches 
> after the assertions in the second test case (createCache).
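
A minimal sketch of that cleanup (assuming the CacheProvider API exposes a 
dropAllCache() method; verify against the actual carbondata-core class):

import org.apache.carbondata.core.cache.CacheProvider

// At the end of the createCache test case, after its assertions, drop every
// cache the test created so that CacheProvider.getInstance() hands the next
// test case a clean provider.
CacheProvider.getInstance().dropAllCache()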



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (CARBONDATA-954) The driverExecutorCacheConfTest failed because of interaction between testcases in CacheProviderTest

2017-04-19 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu reopened CARBONDATA-954:
-

> The driverExecutorCacheConfTest failed because of interaction between 
> testcases in CacheProviderTest
> 
>
> Key: CARBONDATA-954
> URL: https://issues.apache.org/jira/browse/CARBONDATA-954
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>
> Problem: The driverExecutorCacheConfTest will fail when all test cases in 
> CacheProviderTest are run, while running driverExecutorCacheConfTest alone 
> will succeed.
> Solution:
> The driverExecutorCacheConfTest fails once the second test case (createCache) 
> has run, because CacheProvider.getInstance() returns the instance whose 
> caches were created in that test case. So we suggest dropping all caches 
> after the assertions in the second test case (createCache).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CARBONDATA-954) The driverExecutorCacheConfTest failed because of interaction between testcases in CacheProviderTest

2017-04-19 Thread chenerlu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenerlu resolved CARBONDATA-954.
-
Resolution: Duplicate

> The driverExecutorCacheConfTest failed because of interaction between 
> testcases in CacheProviderTest
> 
>
> Key: CARBONDATA-954
> URL: https://issues.apache.org/jira/browse/CARBONDATA-954
> Project: CarbonData
>  Issue Type: Bug
>Reporter: chenerlu
>Assignee: chenerlu
>Priority: Minor
>
> Problem: The driverExecutorCacheConfTest will fail when all test cases in 
> CacheProviderTest are run, while running driverExecutorCacheConfTest alone 
> will succeed.
> Solution:
> The driverExecutorCacheConfTest fails once the second test case (createCache) 
> has run, because CacheProvider.getInstance() returns the instance whose 
> caches were created in that test case. So we suggest dropping all caches 
> after the assertions in the second test case (createCache).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)