[jira] [Commented] (CARBONDATA-2282) presto carbon does not support reading specific partition on which query is fired mapreduce.input.carboninputformat.partitions.to.prune property is null

2018-07-19 Thread Sangeeta Gulia (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548942#comment-16548942
 ] 

Sangeeta Gulia commented on CARBONDATA-2282:


[~photogamrun] please verify and close this issue. The PR to fix it is already 
merged: https://github.com/apache/carbondata/pull/2139

> presto carbon does not support reading specific partition on which query is 
> fired mapreduce.input.carboninputformat.partitions.to.prune property is null
> 
>
> Key: CARBONDATA-2282
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2282
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, presto-integration
>Affects Versions: 1.3.0
>Reporter: zhangwei
>Assignee: anubhav tarar
>Priority: Major
> Fix For: 1.3.0
>
> Attachments: partitonToPrune.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-2582) Carbon properties does not get distributed on cluster mode

2018-07-19 Thread Sangeeta Gulia (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548925#comment-16548925
 ] 

Sangeeta Gulia commented on CARBONDATA-2582:


This issue is resolved by PR 
https://github.com/apache/carbondata/pull/2265/files

> Carbon properties does not get distributed on cluster mode
> --
>
> Key: CARBONDATA-2582
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2582
> Project: CarbonData
>  Issue Type: Bug
>  Components: presto-integration
>Affects Versions: 1.4.0
> Environment: presto-server-0.187
>Reporter: Geetika Gupta
>Priority: Major
>
> Unsafe-memory-related Carbon properties mentioned in the carbondata catalog of 
> Presto were not getting distributed in Presto cluster mode. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-2581) Adding QueryStatistics in Presto Integration Code to Measure Performance

2018-07-19 Thread Sangeeta Gulia (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548909#comment-16548909
 ] 

Sangeeta Gulia commented on CARBONDATA-2581:


[~geetikagupta] This issue is also resolved. It was fixed by [GitHub Pull Request 
#2265|https://github.com/apache/carbondata/pull/2265]. Please close the issue 
and update its resolution to Resolved.

> Adding QueryStatistics in Presto Integration Code to Measure Performance
> 
>
> Key: CARBONDATA-2581
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2581
> Project: CarbonData
>  Issue Type: Improvement
>  Components: presto-integration
>Affects Versions: 1.4.0
>Reporter: Geetika Gupta
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-2583) Presto Performance Optimization - Creating a Multiblock Split to reduce network IO

2018-07-19 Thread Sangeeta Gulia (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548906#comment-16548906
 ] 

Sangeeta Gulia commented on CARBONDATA-2583:


[~geetikagupta] This issue can be closed, as the PR for the optimization has 
been merged.

> Presto Performance Optimization - Creating a Multiblock Split to reduce 
> network IO
> --
>
> Key: CARBONDATA-2583
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2583
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.4.0
>Reporter: Geetika Gupta
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-2263) Date data is loaded incorrectly.

2018-06-20 Thread Sangeeta Gulia (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518029#comment-16518029
 ] 

Sangeeta Gulia commented on CARBONDATA-2263:


The date format should follow the pattern conventions of Java's SimpleDateFormat, 
as documented at: 
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
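
For reference, a minimal sketch of the suggested fix (assuming the CSV stores 
dates like 2015/7/23, as in the attached sample): in SimpleDateFormat patterns, 
uppercase MM means month while lowercase mm means minutes, which is exactly the 
mix-up that produces a January date for every row.

// A sketch, not the committed fix: set a month-correct date format.
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

CarbonProperties.getInstance()
  // "yyyy/MM/dd" parses 2015/7/23 with the month in MM; "yyyy/mm/dd" would
  // put the 7 into the minutes field and leave the month at its January default.
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")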

> Date data is loaded incorrectly.
> 
>
> Key: CARBONDATA-2263
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2263
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Sangeeta Gulia
>Assignee: anubhav tarar
>Priority: Minor
> Attachments: dataSample.csv
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When we set :
> CarbonProperties.getInstance()
> .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/mm/dd")
> and run below commands: 
> spark.sql("DROP TABLE IF EXISTS t3")
> spark.sql(
> s"""
> | CREATE TABLE IF NOT EXISTS t3(
> | ID Int,
> | date Date,
> | country String,
> | name String,
> | phonetype String,
> | serialname String,
> | salary Int,
> | floatField float
> | ) STORED BY 'carbondata'
> """.stripMargin)
> spark.sql(s"""
> LOAD DATA LOCAL INPATH '$testData' into table t3
> options('ALL_DICTIONARY_PATH'='$allDictFile', 'SINGLE_PASS'='true')
> """)
> spark.sql("""
> SELECT * FROM t3
> """).show()
> spark.sql("""
> SELECT * FROM t3 where floatField=3.5
> """).show()
> spark.sql("DROP TABLE IF EXISTS t3")
> Date data is loaded as below: 
> +---+--+---+-+-+--+--+--+
> | id| date|country| name|phonetype|serialname|salary|floatfield|
> +---+--+---+-+-+--+--+--+
> | 9|2015-01-18| china| aaa9| phone706| ASD86717| 15008| 2.34|
> | 10|2015-01-19| usa|aaa10| phone685| ASD30505| 15009| 2.34|
> | 1|2015-01-23| china| aaa1| phone197| ASD69643| 15000| 2.34|
> | 2|2015-01-24| china| aaa2| phone756| ASD42892| 15001| 2.34|
> | 3|2015-01-25| china| aaa3|phone1904| ASD37014| 15002| 2.34|
> | 4|2015-01-26| china| aaa4|phone2435| ASD66902| 15003| 2.34|
> | 5|2015-01-27| china| aaa5|phone2441| ASD90633| 15004| 2.34|
> | 6|2015-01-28| china| aaa6| phone294| ASD59961| 15005| 3.5|
> | 7|2015-01-29| china| aaa7| phone610| ASD14875| 15006| 2.34|
> | 8|2015-01-30| china| aaa8|phone1848| ASD57308| 15007| 2.34|
> +---+--+---+-+-+--+--+--+
>  
> However correct data is : 
> ID,date,country,name,phonetype,serialname,salary,floatField
> 1,2015/7/23,china,aaa1,phone197,ASD69643,15000,2.34
> 2,2015/7/24,china,aaa2,phone756,ASD42892,15001,2.34
> 3,2015/7/25,china,aaa3,phone1904,ASD37014,15002,2.34
> 4,2015/7/26,china,aaa4,phone2435,ASD66902,15003,2.34
> 5,2015/7/27,china,aaa5,phone2441,ASD90633,15004,2.34
> 6,2015/7/28,china,aaa6,phone294,ASD59961,15005,3.5
> 7,2015/7/29,china,aaa7,phone610,ASD14875,15006,2.34
> 8,2015/7/30,china,aaa8,phone1848,ASD57308,15007,2.34
> 9,2015/7/18,china,aaa9,phone706,ASD86717,15008,2.34
> 10,2015/7/19,usa,aaa10,phone685,ASD30505,15009,2.34
>  
> which shows that the month is loaded incorrectly.
>  
> Similarly, if we use :
> CarbonProperties.getInstance()
> .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/mm/dd")
> it again stores incorrect data for the date column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (CARBONDATA-2145) Refactor PreAggregate functionality for dictionary include.

2018-03-28 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia closed CARBONDATA-2145.
--
Resolution: Information Provided

> Refactor PreAggregate functionality for dictionary include.
> ---
>
> Key: CARBONDATA-2145
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2145
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Sangeeta Gulia
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add the count to the measure column only if the column in the main table is a 
> dictionary column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2273) Add sdv test cases for testing boolean datatype support.

2018-03-23 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2273:
--

 Summary: Add sdv test cases for testing boolean datatype support.
 Key: CARBONDATA-2273
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2273
 Project: CarbonData
  Issue Type: Test
Reporter: Sangeeta Gulia






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2272) Boolean Data loaded incorrectly when boolean column is dictionary include.

2018-03-23 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2272:
--

 Summary: Boolean Data loaded incorrectly when boolean column is 
dictionary include.
 Key: CARBONDATA-2272
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2272
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 1.3.0
Reporter: Sangeeta Gulia


Steps to reproduce: 

sql(
s"""
|CREATE TABLE if not exists boolean_table(
|aa STRING, bb INT, cc BOOLEAN, dd BOOLEAN
|) STORED BY 'carbondata' TBLPROPERTIES 
('DICTIONARY_INCLUDE'='cc')""".stripMargin)

sql("insert into boolean_table values('adam',11,true,true)")
sql("insert into boolean_table values('james',12,false,false)")
sql("insert into boolean_table values('smith',13,true,true)")

sql("select * from boolean_table ").show()

 

Output: 

+-----+---+----+-----+
|   aa| bb|  cc|   dd|
+-----+---+----+-----+
|james| 12|true|false|
|smith| 13|true| true|
| adam| 11|true| true|
+-----+---+----+-----+

 

As can be seen from the above output, the data in the cc column is wrong.
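
A hedged cross-check (boolean_table_nodict is a hypothetical table name): 
loading the same rows without DICTIONARY_INCLUDE on cc should isolate the 
dictionary encoding path as the suspect.

// Sketch: same data, but cc is not a dictionary column; if cc now reads
// back as true/false/true, the dictionary encode/decode path is at fault.
sql(
  s"""
     |CREATE TABLE if not exists boolean_table_nodict(
     |aa STRING, bb INT, cc BOOLEAN, dd BOOLEAN
     |) STORED BY 'carbondata'""".stripMargin)

sql("insert into boolean_table_nodict values('adam',11,true,true)")
sql("insert into boolean_table_nodict values('james',12,false,false)")
sql("insert into boolean_table_nodict values('smith',13,true,true)")

sql("select * from boolean_table_nodict").show()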



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2263) Date data is loaded incorrectly.

2018-03-18 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2263:
--

 Summary: Date data is loaded incorrectly.
 Key: CARBONDATA-2263
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2263
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Sangeeta Gulia
 Attachments: dataSample.csv

When we set :

CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/mm/dd")

and run below commands: 

spark.sql("DROP TABLE IF EXISTS t3")

spark.sql(
s"""
| CREATE TABLE IF NOT EXISTS t3(
| ID Int,
| date Date,
| country String,
| name String,
| phonetype String,
| serialname String,
| salary Int,
| floatField float
| ) STORED BY 'carbondata'
""".stripMargin)

spark.sql(s"""
LOAD DATA LOCAL INPATH '$testData' into table t3
options('ALL_DICTIONARY_PATH'='$allDictFile', 'SINGLE_PASS'='true')
""")

spark.sql("""
SELECT * FROM t3
""").show()

spark.sql("""
SELECT * FROM t3 where floatField=3.5
""").show()

spark.sql("DROP TABLE IF EXISTS t3")

Date data is loaded as below: 

+---+--+---+-+-+--+--+--+
| id| date|country| name|phonetype|serialname|salary|floatfield|
+---+--+---+-+-+--+--+--+
| 9|2015-01-18| china| aaa9| phone706| ASD86717| 15008| 2.34|
| 10|2015-01-19| usa|aaa10| phone685| ASD30505| 15009| 2.34|
| 1|2015-01-23| china| aaa1| phone197| ASD69643| 15000| 2.34|
| 2|2015-01-24| china| aaa2| phone756| ASD42892| 15001| 2.34|
| 3|2015-01-25| china| aaa3|phone1904| ASD37014| 15002| 2.34|
| 4|2015-01-26| china| aaa4|phone2435| ASD66902| 15003| 2.34|
| 5|2015-01-27| china| aaa5|phone2441| ASD90633| 15004| 2.34|
| 6|2015-01-28| china| aaa6| phone294| ASD59961| 15005| 3.5|
| 7|2015-01-29| china| aaa7| phone610| ASD14875| 15006| 2.34|
| 8|2015-01-30| china| aaa8|phone1848| ASD57308| 15007| 2.34|
+---+--+---+-+-+--+--+--+

 

However correct data is : 

ID,date,country,name,phonetype,serialname,salary,floatField
1,2015/7/23,china,aaa1,phone197,ASD69643,15000,2.34
2,2015/7/24,china,aaa2,phone756,ASD42892,15001,2.34
3,2015/7/25,china,aaa3,phone1904,ASD37014,15002,2.34
4,2015/7/26,china,aaa4,phone2435,ASD66902,15003,2.34
5,2015/7/27,china,aaa5,phone2441,ASD90633,15004,2.34
6,2015/7/28,china,aaa6,phone294,ASD59961,15005,3.5
7,2015/7/29,china,aaa7,phone610,ASD14875,15006,2.34
8,2015/7/30,china,aaa8,phone1848,ASD57308,15007,2.34
9,2015/7/18,china,aaa9,phone706,ASD86717,15008,2.34
10,2015/7/19,usa,aaa10,phone685,ASD30505,15009,2.34

 

which shows that the month is loaded incorrectly.

 

Similarly, if we use :

CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/mm/dd")

it again stores incorrect data for the date column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2241) Wrong Query written in Preaggregation Document

2018-03-08 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2241:
--

 Summary: Wrong Query written in Preaggregation Document
 Key: CARBONDATA-2241
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2241
 Project: CarbonData
  Issue Type: Bug
Reporter: Sangeeta Gulia


The below query is written in the document: 

SELECT sum(price), country from sales GROUP BY country

and the document says it will execute against the datamap, but it actually 
executes against the main table, not the datamap.

Fix: Correct the query in the document so that it executes using the datamap.
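
For context, a hedged sketch of how a preaggregate datamap and a matching query 
line up (sales, price, and country come from the quoted query; the datamap name 
agg_sales is hypothetical). A query can be rewritten to the pre-aggregate table 
only when its aggregates and group-by columns match the datamap's defining query.

// Sketch: the datamap's defining query determines which user queries the
// optimizer can rewrite to the pre-aggregate table.
spark.sql(
  """CREATE DATAMAP agg_sales ON TABLE sales USING 'preaggregate' AS
    |SELECT country, sum(price) FROM sales GROUP BY country""".stripMargin)

// Matches the definition above, so it can be served by the datamap.
spark.sql("SELECT country, sum(price) FROM sales GROUP BY country").show()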



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2240) Refactor UT's to remove duplicate test scenarios and code to improve CI time for Preaggregate expressions and selection scenario

2018-03-08 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2240:
--

 Summary: Refactor UT's to remove duplicate test scenarios and code 
to improve CI time for Preaggregate expressions and selection scenario
 Key: CARBONDATA-2240
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2240
 Project: CarbonData
  Issue Type: Improvement
Reporter: Sangeeta Gulia


This task includes the following improvements for the Preaggregate expressions 
and selection scenarios: 

1) Refactor UTs to remove duplicate test scenarios and improve CI time.

2) Refactor test cases to remove code duplicated across classes.

3) Correct test cases that are missing asserts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2226) Refactor UT's to remove duplicate test scenarios to improve CI time for PreAggregate create and drop feature

2018-03-05 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2226:
--

 Summary: Refactor UT's to remove duplicate test scenarios to 
improve CI time for PreAggregate create and drop feature
 Key: CARBONDATA-2226
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2226
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 1.3.0
Reporter: Sangeeta Gulia






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2155) IS NULL not working correctly on string datatype with dictionary_include in presto integration

2018-02-09 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2155:
--

 Summary: IS NULL not working correctly on string datatype with 
dictionary_include in presto integration
 Key: CARBONDATA-2155
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2155
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Affects Versions: 1.3.0
 Environment: Spark-2.1
Presto 0.187
Reporter: Sangeeta Gulia
 Attachments: lineitem.csv

Steps to reproduce:

1) Create a table in CarbonData and load data into it.

create table if not exists lineitem_carbon1(
L_SHIPDATE date,
L_SHIPMODE string,
L_SHIPINSTRUCT string,
L_RETURNFLAG string,
L_RECEIPTDATE date,
L_ORDERKEY string,
L_PARTKEY string,
L_SUPPKEY   string,
L_LINENUMBER int,
L_QUANTITY double,
L_EXTENDEDPRICE double,
L_DISCOUNT double,
L_TAX double,
L_LINESTATUS string,
L_COMMITDATE date,
L_COMMENT  string
) STORED BY 'carbondata'
TBLPROPERTIES 
('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', 
'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, L_SUPPKEY, 
L_COMMENT');

load data inpath "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" 
into table lineitem_carbon1 options('DATEFORMAT' = 
'yyyy-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true',
 'BAD_RECORDS_ACTION'='FORCE');

1: jdbc:hive2://localhost:10000> select l_shipmode from lineitem_carbon1 where 
l_shipmode is NULL;
+-------------+--+
| l_shipmode  |
+-------------+--+
| NULL        |
+-------------+--+

2) Access the same table from the presto-cli and try to run the same select query from there:

presto:performance> select l_shipmode from lineitem_carbon1 where l_shipmode is 
NULL;
 l_shipmode 

(0 rows)

 Expected Result: it should be the same as the result from Carbon (the NULL row 
should be returned).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2152) Min function working incorrectly for string type with dictionary include in presto.

2018-02-09 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia updated CARBONDATA-2152:
---
Attachment: lineitem.csv

> Min function working incorrectly for string type with dictionary include in 
> presto.
> ---
>
> Key: CARBONDATA-2152
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2152
> Project: CarbonData
>  Issue Type: Bug
>  Components: presto-integration
>Affects Versions: 1.3.0
> Environment: Spark2.1
> Presto0.187
>Reporter: Sangeeta Gulia
>Assignee: anubhav tarar
>Priority: Major
> Attachments: lineitem.csv
>
>
> Steps to reproduce:
> 1) Create the table and load data in CarbonData.
> create table if not exists lineitem_carbon1(
> L_SHIPDATE date,
> L_SHIPMODE string,
> L_SHIPINSTRUCT string,
> L_RETURNFLAG string,
> L_RECEIPTDATE date,
> L_ORDERKEY string,
> L_PARTKEY string,
> L_SUPPKEY   string,
> L_LINENUMBER int,
> L_QUANTITY double,
> L_EXTENDEDPRICE double,
> L_DISCOUNT double,
> L_TAX double,
> L_LINESTATUS string,
> L_COMMITDATE date,
> L_COMMENT  string
> ) STORED BY 'carbondata'
> TBLPROPERTIES 
> ('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', 
> 'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, 
> L_SUPPKEY, L_COMMENT');
> load data inpath 
> "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" into table 
> lineitem_carbon1 options('DATEFORMAT' = 
> 'yyyy-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true',
>  'BAD_RECORDS_ACTION'='FORCE');
>  
>  0: jdbc:hive2://localhost:10000> select min(l_shipmode) from 
> lineitem_carbon1;
> +------------------+--+
> | min(l_shipmode)  |
> +------------------+--+
> | AIR              |
> +------------------+--+
> 2) Connect to the CarbonData store from Presto and run the below query from 
> the presto-cli:
> presto:performance> select min(l_shipmode) from lineitem_carbon1;
> _col0 
> --
> @NU#LL$! 
> (1 row)
>  
> Expected: Presto should also give the correct output, as shown in 
> CarbonData.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2152) Min function working incorrectly for string type with dictionary include in presto.

2018-02-09 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2152:
--

 Summary: Min function working incorrectly for string type with 
dictionary include in presto.
 Key: CARBONDATA-2152
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2152
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Affects Versions: 1.3.0
 Environment: Spark2.1
Presto0.187
Reporter: Sangeeta Gulia


Steps to reproduce:

1) Create the table and load data in CarbonData.

create table if not exists lineitem_carbon1(
L_SHIPDATE date,
L_SHIPMODE string,
L_SHIPINSTRUCT string,
L_RETURNFLAG string,
L_RECEIPTDATE date,
L_ORDERKEY string,
L_PARTKEY string,
L_SUPPKEY   string,
L_LINENUMBER int,
L_QUANTITY double,
L_EXTENDEDPRICE double,
L_DISCOUNT double,
L_TAX double,
L_LINESTATUS string,
L_COMMITDATE date,
L_COMMENT  string
) STORED BY 'carbondata'
TBLPROPERTIES 
('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', 
'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, L_SUPPKEY, 
L_COMMENT');

load data inpath "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" 
into table lineitem_carbon1 options('DATEFORMAT' = 
'yyyy-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true',
 'BAD_RECORDS_ACTION'='FORCE');
 
 0: jdbc:hive2://localhost:10000> select min(l_shipmode) from lineitem_carbon1;
+------------------+--+
| min(l_shipmode)  |
+------------------+--+
| AIR              |
+------------------+--+
2) Connect to the CarbonData store from Presto and run the below query from the 
presto-cli:

presto:performance> select min(l_shipmode) from lineitem_carbon1;

_col0 
--
@NU#LL$! 
(1 row)

 

Expected: Presto should also give the correct output, as shown in 
CarbonData.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2145) Refactor PreAggregate functionality for dictionary include.

2018-02-07 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2145:
--

 Summary: Refactor PreAggregate functionality for dictionary 
include.
 Key: CARBONDATA-2145
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2145
 Project: CarbonData
  Issue Type: Improvement
Reporter: Sangeeta Gulia


Add the count to the measure column only if the column in the main table is a 
dictionary column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2112) Data getting garbled after datamap creation when table is created with GLOBAL SORT

2018-01-31 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-2112:
--

 Summary: Data getting garbled after datamap creation when table is 
created with GLOBAL SORT
 Key: CARBONDATA-2112
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2112
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
 Environment: spark-2.1
Reporter: Sangeeta Gulia
 Attachments: 2000_UniqData.csv

Data is getting garbled after datamap creation when table is created with 
BATCH_SORT/GLOBAL_SORT.

 

Steps to reproduce :

spark.sql("drop table if exists uniqdata_batchsort_compact3")

spark.sql("CREATE TABLE uniqdata_batchsort_compact3 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) STORED BY 'carbondata' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')").show()

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into 
table " +
 "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
 
"'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,"
 +
 "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
 "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into 
table " +
 "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
 
"'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,"
 +
 "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
 "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into 
table " +
 "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
 
"'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,"
 +
 "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
 "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group 
by cust_id ").show(50)

+---++
|cust_id|avg(cust_id)|
+---++
| 9376| 9376.0|
| 9427| 9427.0|
| 9465| 9465.0|
| 9852| 9852.0|
| 9900| 9900.0|
| 10206| 10206.0|
| 10362| 10362.0|
| 10623| 10623.0|
| 10817| 10817.0|
| 9182| 9182.0|
| 9564| 9564.0|
| 9879| 9879.0|
| 10081| 10081.0|
| 10121| 10121.0|
| 10230| 10230.0|
| 10462| 10462.0|
| 10703| 10703.0|
| 10914| 10914.0|
| 9162| 9162.0|
| 9383| 9383.0|
| 9454| 9454.0|
| 9517| 9517.0|
| 9558| 9558.0|
| 10708| 10708.0|
| 10798| 10798.0|
| 10862| 10862.0|
| 9071| 9071.0|
| 9169| 9169.0|
| 9946| 9946.0|
| 10468| 10468.0|
| 10745| 10745.0|
| 10768| 10768.0|
| 9153| 9153.0|
| 9206| 9206.0|
| 9403| 9403.0|
| 9597| 9597.0|
| 9647| 9647.0|
| 9775| 9775.0|
| 10032| 10032.0|
| 10395| 10395.0|
| 10527| 10527.0|
| 10567| 10567.0|
| 10632| 10632.0|
| 10788| 10788.0|
| 10815| 10815.0|
| 10840| 10840.0|
| 9181| 9181.0|
| 9344| 9344.0|
| 9575| 9575.0|
| 9675| 9675.0|
+---++
only showing top 50 rows

Note: Here the cust_id is correct before the datamap is created.


spark.sql("create datamap uniqdata_agg on table uniqdata_batchsort_compact3 
using " +
 "'preaggregate' as select avg(cust_id) from uniqdata_batchsort_compact3 group 
by cust_id")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group 
by cust_id ").show(50)

+---++
|cust_id|avg(cust_id)|
+---++
| 27651| 9217.0|
| 31944| 10648.0|
| 32667| 10889.0|
| 28242| 9414.0|
| 29841| 9947.0|
| 28728| 9576.0|
| 27255| 9085.0|
| 32571| 10857.0|
| 30276| 10092.0|
| 27276| 9092.0|
| 31503| 10501.0|
| 27687| 9229.0|
| 27183| 9061.0|
| 29334| 9778.0|
| 29913| 9971.0|
| 28683| 9561.0|
| 31545| 10515.0|
| 30405| 10135.0|
| 27693| 9231.0|
| 29649| 9883.0|
| 30537| 10179.0|
| 32709| 10903.0|
| 29586| 9862.0|
| 32895| 10965.0|
| 32415| 10805.0|
| 31644| 10548.0|
| 30030| 10010.0|
| 31713| 10571.0|
| 28083| 9361.0|
| 27813| 9271.0|
| 27171| 9057.0|
| 27189| 9063.0|
| 30444| 10148.0|
| 28623| 9541.0|
| 28566| 9522.0|
| 32655| 10885.0|
| 31164| 10388.0|
| 30321| 10107.0|
| 31452| 10484.0|
| 29829| 9943.0|
| 27468| 9156.0|
| 31212| 10404.0|
| 32154| 10718.0|
| 27531| 9177.0|
| 27654| 9218.0|
| 27105| 9035.0|
| 31113| 10371.0|
| 28479| 9493.0|
| 29094| 9698.0|
| 31551| 10517.0|
+---++
only showing top 50 rows

Note: But after datamap creation, cust_id is incorrect. It comes out as three 
times (equal to the number of loads) its original value, while avg(cust_id) is 
correct.
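
A hedged diagnostic sketch (table and datamap names from the steps above): 
Spark's explain output shows whether the plan was rewritten to read the 
pre-aggregate child table, which helps confirm that the rollup rewrite, rather 
than the loaded data, is producing the wrong cust_id.

// Diagnostic sketch: if the rewrite happened, the plan reads a child table
// whose name embeds the datamap name (e.g. uniqdata_batchsort_compact3_uniqdata_agg).
spark.sql("explain select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 " +
  "group by cust_id").show(false)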



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-1956) Select query with sum, count and avg throws exception for pre aggregate table

2018-01-29 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343035#comment-16343035
 ] 

Sangeeta Gulia commented on CARBONDATA-1956:


[~geetikagupta] This issue does not occur on the current master branch code; it 
works fine. Please close this bug.

> Select query with sum, count and avg throws exception for pre aggregate table
> -
>
> Key: CARBONDATA-1956
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1956
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.3.0
> Environment: spark2.1
>Reporter: Geetika Gupta
>Priority: Major
> Fix For: 1.3.0
>
> Attachments: 2000_UniqData.csv
>
>
> I create a datamap using the following command:
> create datamap uniqdata_agg_d on table uniqdata_29 using 'preaggregate' as 
> select sum(decimal_column1), count(cust_id), avg(bigint_column1) from 
> uniqdata_29 group by cust_id;
> The datamap creation was successfull, but when I tried the following query:
> select sum(decimal_column1), count(cust_id), avg(bigint_column1) from 
> uniqdata_29 group by cust_id;
> It throws the following exception:
> Error: org.apache.spark.sql.AnalysisException: cannot resolve 
> '(sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_sum`) / 
> sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_count`))' due to 
> data type mismatch: 
> '(sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_sum`) / 
> sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_count`))' requires 
> (double or decimal) type, not bigint;;
> 'Aggregate [uniqdata_29_cust_id_count#244], 
> [sum(uniqdata_29_decimal_column1_sum#243) AS sum(decimal_column1)#274, 
> sum(cast(uniqdata_29_cust_id_count#244 as bigint)) AS count(cust_id)#276L, 
> (sum(uniqdata_29_bigint_column1_sum#245L) / 
> sum(uniqdata_29_bigint_column1_count#246L)) AS avg(bigint_column1)#279]
> +- 
> Relation[uniqdata_29_decimal_column1_sum#243,uniqdata_29_cust_id_count#244,uniqdata_29_bigint_column1_sum#245L,uniqdata_29_bigint_column1_count#246L]
>  CarbonDatasourceHadoopRelation [ Database name :28dec, Table name 
> :uniqdata_29_uniqdata_agg_d, Schema 
> :Some(StructType(StructField(uniqdata_29_decimal_column1_sum,DecimalType(30,10),true),
>  StructField(uniqdata_29_cust_id_count,IntegerType,true), 
> StructField(uniqdata_29_bigint_column1_sum,LongType,true), 
> StructField(uniqdata_29_bigint_column1_count,LongType,true))) ] 
> (state=,code=0)
> Steps for creation of maintable:
> CREATE TABLE uniqdata_29(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format';
> Load command:
> LOAD DATA INPATH 'hdfs://localhost:54311/Files/2000_UniqData.csv' into table 
> uniqdata_29 OPTIONS('DELIMITER'=',', 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> Datamap creation command:
> create datamap uniqdata_agg_d on table uniqdata_29 using 'preaggregate' as 
> select sum(decimal_column1), count(cust_id), avg(bigint_column1) from 
> uniqdata_29 group by cust_id;
> Note: select sum(decimal_column1), count(cust_id), avg(bigint_column1) from 
> uniqdata_29 group by cust_id; executed successfully on maintable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-1985) Insert into failed for multi partitioned table for static partition

2018-01-28 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342995#comment-16342995
 ] 

Sangeeta Gulia commented on CARBONDATA-1985:


[~geetikagupta] Hive also shows the same behavior, hence this is an invalid bug. 
Please close it.

To verify:

you can create a Hive table with partition columns:

CREATE TABLE uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp,
DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
decimal(30,10),
DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) stored as 
parquet;

insert into uniqdata_hive1 partition(cust_id='1',cust_name='CUST_NAME_2') 
select * from uniqdata_hive limit 10;

Below are the commands and result for your reference:

0: jdbc:hive2://localhost:10000> CREATE TABLE 
uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp,
0: jdbc:hive2://localhost:10000> DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
0: jdbc:hive2://localhost:10000> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 
double, Double_COLUMN2 double,
0: jdbc:hive2://localhost:10000> INTEGER_COLUMN1 int) Partitioned by (cust_id 
int, cust_name string) stored as parquet;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.305 seconds)
0: jdbc:hive2://localhost:10000> insert into uniqdata_hive1 
partition(cust_id='1',cust_name='CUST_NAME_2') select * from uniqdata_hive 
limit 10;
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table 
`default`.`uniqdata_hive1` because the number of columns are different: need 10 
columns, but query has 12 columns.; (state=,code=0)
0: jdbc:hive2://localhost:10000>
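
For completeness, a hedged sketch of an insert that satisfies the column-count 
check (column names taken from the CREATE TABLE above): with both partition 
columns given statically, the SELECT must supply exactly the 10 non-partition 
columns, not all 12 columns of uniqdata_hive.

// Sketch: a static-partition insert selects only the non-partition columns,
// since cust_id and cust_name are already supplied in the partition spec.
spark.sql(
  """insert into uniqdata_hive1 partition(cust_id='1',cust_name='CUST_NAME_2')
    |select ACTIVE_EMUI_VERSION, DOB, DOJ, BIGINT_COLUMN1, BIGINT_COLUMN2,
    |       DECIMAL_COLUMN1, DECIMAL_COLUMN2, Double_COLUMN1, Double_COLUMN2,
    |       INTEGER_COLUMN1
    |from uniqdata_hive limit 10""".stripMargin)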

 

 

> Insert into failed for multi partitioned table for static partition
> ---
>
> Key: CARBONDATA-1985
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1985
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.3.0
> Environment: spark2.1
>Reporter: Geetika Gupta
>Priority: Major
> Fix For: 1.3.0
>
> Attachments: 2000_UniqData.csv
>
>
> I created a table using:
> CREATE TABLE uniqdata_int_string(ACTIVE_EMUI_VERSION string, DOB timestamp,
> DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10),
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
> INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) STORED BY 
> 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> Hive create and load table command:
> CREATE TABLE uniqdata_hive (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp,
> DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10),
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
> INTEGER_COLUMN1 int)ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'file:///home/geetika/Downloads/2000_UniqData.csv' 
> into table UNIQDATA_HIVE;  
> Insert into table command:
> insert into uniqdata_int_string 
> partition(cust_id='1',cust_name='CUST_NAME_2') select * from 
> uniqdata_hive limit 10;
> Output:
> Error: java.lang.IndexOutOfBoundsException: Index: 4, Size: 4 (state=,code=0)
> Here are the logs:
> 18/01/04 16:24:45 ERROR CarbonLoadDataCommand: pool-23-thread-6 
> org.apache.spark.sql.AnalysisException: Cannot insert into table 
> `28dec`.`uniqdata_int_string` because the number of columns are different: 
> need 10 columns, but query has 12 columns.;
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:222)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:280)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:272)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:272)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:207)
>   at 

[jira] [Commented] (CARBONDATA-2004) Incorrect result displays while loading data into a partitioned table.

2018-01-24 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337120#comment-16337120
 ] 

Sangeeta Gulia commented on CARBONDATA-2004:


This scenario works fine on the current master branch code. Please retest and 
close this bug, [~Vandana7].

> Incorrect result displays while loading data into a partitioned table.
> --
>
> Key: CARBONDATA-2004
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2004
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.3.0
> Environment: spark 2.1
>Reporter: Vandana Yadav
>Priority: Major
> Attachments: 2000_UniqData.csv, timestamp.png
>
>
> Incorrect result displays while loading data into a partitioned table.
> Steps to reproduce:
> 1)create a partitioned table:
> CREATE TABLE uniqdata_timestamp (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOJ timestamp, BIGINT_COLUMN1 
> bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
> int) partitioned by(dob timestamp) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into table:
> LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' 
> into table uniqdata_timestamp partition(dob='1') OPTIONS 
> ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1, 
> BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> 3) Expected result: It should throw an error as invalid partition value.
> 4) Actual Result: it displays notification of successful load, but there is 
> no data into the table
> 5) execute query: select count(*) from uniqdata_timestamp;
> output:
> +---+--+
> | count(1)  |
> +---+--+
> | 0 |
> +---+--+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1962) Support alter table add columns/drop columns on S3 table

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1962.

Resolution: Implemented

> Support alter table add columns/drop columns on S3 table
> 
>
> Key: CARBONDATA-1962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1962
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1963) Support S3 table with dictionary

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1963.

Resolution: Implemented

> Support S3 table with dictionary
> 
>
> Key: CARBONDATA-1963
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1963
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1960) Add example for creating a local table and load CSV data which is stored in S3.

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1960.

   Resolution: Implemented
Fix Version/s: 1.4.0

> Add example for creating a local table and load CSV data which is stored in 
> S3.
> ---
>
> Key: CARBONDATA-1960
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1960
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Trivial
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1961) Support data update/delete on S3 table

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1961.

   Resolution: Implemented
Fix Version/s: 1.4.0

> Support data update/delete on S3 table
> --
>
> Key: CARBONDATA-1961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1961
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1959) Support compaction on S3 table

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1959.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Support compaction on S3 table
> --
>
> Key: CARBONDATA-1959
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1959
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata

2018-01-21 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia resolved CARBONDATA-1827.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Add Support to provide S3 Functionality in Carbondata
> -
>
> Key: CARBONDATA-1827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1827
> Project: CarbonData
>  Issue Type: Task
>  Components: core
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
> Fix For: 1.4.0
>
>  Time Spent: 29h 10m
>  Remaining Estimate: 0h
>
> Added Support to provide S3 Functionality in Carbondata. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata

2018-01-21 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333924#comment-16333924
 ] 

Sangeeta Gulia commented on CARBONDATA-1827:


GitHub PR 1805 completes all the listed tasks.

> Add Support to provide S3 Functionality in Carbondata
> -
>
> Key: CARBONDATA-1827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1827
> Project: CarbonData
>  Issue Type: Task
>  Components: core
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
>  Time Spent: 29h 10m
>  Remaining Estimate: 0h
>
> Added Support to provide S3 Functionality in Carbondata. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-1673) Carbon 1.3.0-Partitioning:Show Partition for Range Partition is not showing the correct details.

2018-01-08 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317882#comment-16317882
 ] 

Sangeeta Gulia commented on CARBONDATA-1673:


I am unable to replicate this issue; it works correctly on the current master 
branch code.

These are the commands I have executed:

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd")

spark.sql("DROP TABLE IF EXISTS t0")
spark.sql("""
| CREATE TABLE IF NOT EXISTS t0
| (
| id Int,
| vin String,
| phonenumber Long,
| country String,
| area String,
| salary Int
| )
| PARTITIONED BY (logdate Timestamp)
| STORED BY 'carbondata'
| TBLPROPERTIES('PARTITION_TYPE'='RANGE',
| 'RANGE_INFO'='2014/01/01, 2015/01/01, 2016/01/01')
  """.stripMargin)

spark.sql("""show partitions t0""").show()

And below is my result of show partitions: 
+--------------------+
|           partition|
+--------------------+
|0, logdate = DEFAULT|
|1, logdate < 2014...|
|2, 2014/01/01 <= ...|
|3, 2015/01/01 <= ...|
+--------------------+

Please let me know if you are able to replicate it on the current master branch 
code.

> Carbon 1.3.0-Partitioning:Show Partition for Range Partition is not showing 
> the correct details.
> 
>
> Key: CARBONDATA-1673
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1673
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.3.0
>Reporter: Ayushi Sharma
>Priority: Minor
> Attachments: Range_recording.htm, Range_recording.swf
>
>
> For description, please refer to the attachment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1963) Support S3 table with dictionary

2018-01-01 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1963:
--

 Summary: Support S3 table with dictionary
 Key: CARBONDATA-1963
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1963
 Project: CarbonData
  Issue Type: Task
Reporter: Sangeeta Gulia
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata

2018-01-01 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia updated CARBONDATA-1827:
---
Issue Type: Task  (was: New Feature)

> Add Support to provide S3 Functionality in Carbondata
> -
>
> Key: CARBONDATA-1827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1827
> Project: CarbonData
>  Issue Type: Task
>  Components: core
>Reporter: Sangeeta Gulia
>Assignee: Jatin
>Priority: Minor
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Added Support to provide S3 Functionality in Carbondata. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1962) Support alter table add columns/drop columns on S3 table

2018-01-01 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1962:
--

 Summary: Support alter table add columns/drop columns on S3 table
 Key: CARBONDATA-1962
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1962
 Project: CarbonData
  Issue Type: Task
Reporter: Sangeeta Gulia
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1961) Support data update/delete on S3 table

2018-01-01 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia updated CARBONDATA-1961:
---
Priority: Minor  (was: Major)

> Support data update/delete on S3 table
> --
>
> Key: CARBONDATA-1961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1961
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1961) Support data update/delete on S3 table

2018-01-01 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1961:
--

 Summary: Support data update/delete on S3 table
 Key: CARBONDATA-1961
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1961
 Project: CarbonData
  Issue Type: Task
Reporter: Sangeeta Gulia






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1960) Add example for creating a local table and load CSV data which is stored in S3.

2018-01-01 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1960:
--

 Summary: Add example for creating a local table and load CSV data 
which is stored in S3.
 Key: CARBONDATA-1960
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1960
 Project: CarbonData
  Issue Type: Task
Reporter: Sangeeta Gulia
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1959) Support compaction on S3 table

2018-01-01 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia updated CARBONDATA-1959:
---
Priority: Minor  (was: Major)

> Support compaction on S3 table
> --
>
> Key: CARBONDATA-1959
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1959
> Project: CarbonData
>  Issue Type: Task
>Reporter: Sangeeta Gulia
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1959) Support compaction on S3 table

2018-01-01 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1959:
--

 Summary: Support compaction on S3 table
 Key: CARBONDATA-1959
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1959
 Project: CarbonData
  Issue Type: Task
Reporter: Sangeeta Gulia






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1758) Carbon1.3.0- No Inverted Index : Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException

2017-12-20 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299650#comment-16299650
 ] 

Sangeeta Gulia commented on CARBONDATA-1758:


[~chetdb] This is the result of my query after executing the entire sequence of 
queries you have mentioned.

  0: jdbc:hive2://hadoop-master:10000> Select CUST_ID from uniqdata_DI_int 
where CUST_ID is null;
+--+--+
| CUST_ID  |
+--+--+
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
+--+--+
26 rows selected (0.408 seconds)
0: jdbc:hive2://hadoop-master:10000> 


> Carbon1.3.0- No Inverted Index : Select column with is null for 
> no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
> 
>
> Key: CARBONDATA-1758
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1758
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.3.0
> Environment: 3 node cluster
>Reporter: Chetan Bhat
>  Labels: Functional
>
> Steps :
> In Beeline user executes the queries in sequence.
> CREATE TABLE uniqdata_DI_int (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 
> double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='cust_id','NO_INVERTED_INDEX'='cust_id');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/3000_UniqData.csv' into table 
> uniqdata_DI_int OPTIONS('DELIMITER'=',', 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> Select count(CUST_ID) from uniqdata_DI_int;
> Select count(CUST_ID)*10 as multiple from uniqdata_DI_int;
> Select avg(CUST_ID) as average from uniqdata_DI_int;
> Select floor(CUST_ID) as average from uniqdata_DI_int;
> Select ceil(CUST_ID) as average from uniqdata_DI_int;
> Select ceiling(CUST_ID) as average from uniqdata_DI_int;
> Select CUST_ID*integer_column1 as multiple from uniqdata_DI_int;
> Select CUST_ID from uniqdata_DI_int where CUST_ID is null;
> *Issue : Select column with is null for no_inverted_index column throws 
> java.lang.ArrayIndexOutOfBoundsException*
> 0: jdbc:hive2://10.18.98.34:23040> Select CUST_ID from uniqdata_DI_int where 
> CUST_ID is null;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 79.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 79.0 (TID 123, BLR114278, executor 18): 
> org.apache.spark.util.TaskCompletionListenerException: 
> java.util.concurrent.ExecutionException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105)
> at org.apache.spark.scheduler.Task.run(Task.scala:112)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
> Expected : Select column with is null for no_inverted_index column should be 
> successful displaying the correct result set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1055) Record count mismatch for Carbon query compared with Parquet for TPCH query 15

2017-12-19 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297910#comment-16297910
 ] 

Sangeeta Gulia commented on CARBONDATA-1055:


I have tried it using 1 GB of data, and it works fine there. Can you provide 
more details?

> Record count mismatch for Carbon query compared with Parquet for TPCH query 15
> --
>
> Key: CARBONDATA-1055
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1055
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.1.0
> Environment: 3 node cluster
>Reporter: Chetan Bhat
> Attachments: TPCH_query15.rar
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> User creates a table and loads TPCH data into different tables.
> User executes all the select queries and compares the record count and 
> performance of the Carbon queries with parquet queries.
> Actual Issue : Record count mismatch for Carbon query compared with Parquet 
> for TPCH query 15.
> Carbon record count for TPCH query 15 - 71972
> Parquet record count for TPCH query 15 - 72343
> Expected : There should not be record count mismatch for Carbon query 
> compared with Parquet for TPCH query 15.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1672) Carbon 1.3.0-Partitioning:Hash Partition is not working as specified in the document.

2017-12-14 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290813#comment-16290813
 ] 

Sangeeta Gulia commented on CARBONDATA-1672:


This is the link to the latest documentation:

https://carbondata.apache.org/data-management-on-carbondata.html

which provides the below syntax for creating a hash partition table:

CREATE TABLE IF NOT EXISTS hash_partition_table(
col_A String,
col_B Int,
col_C Long,
col_D Decimal(10,2),
col_F Timestamp
) PARTITIONED BY (col_E Long)
STORED BY 'carbondata' 
TBLPROPERTIES('PARTITION_TYPE'='HASH','NUM_PARTITIONS'='9')

Note: the number of partitions is specified with the NUM_PARTITIONS property, 
not the partition_num used in the reported command.
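
For reference, a hedged sketch of the reported command adjusted to the 
documented property name (the table definition is taken verbatim from the 
quoted issue below):

// Sketch: same table as in the report, with NUM_PARTITIONS in place of the
// unsupported partition_num property.
spark.sql(
  """create table Carb_part (P_PARTKEY BIGINT, P_NAME STRING, P_MFGR STRING,
    |P_BRAND STRING, P_TYPE STRING, P_CONTAINER STRING, P_RETAILPRICE DOUBLE,
    |P_COMMENT STRING)
    |PARTITIONED BY (P_SIZE int) STORED BY 'CARBONDATA'
    |TBLPROPERTIES('PARTITION_TYPE'='HASH','NUM_PARTITIONS'='3')""".stripMargin)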

> Carbon 1.3.0-Partitioning:Hash Partition is not working as specified in the 
> document.
> -
>
> Key: CARBONDATA-1672
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1672
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.3.0
>Reporter: Ayushi Sharma
>Priority: Minor
> Attachments: Part2.PNG, Partition1.PNG
>
>
> create table Carb_part (P_PARTKEY BIGINT,P_NAME STRING,P_MFGR STRING,P_BRAND 
> STRING,P_TYPE STRING,P_CONTAINER STRING,P_RETAILPRICE DOUBLE,P_COMMENT 
> STRING)PARTITIONED BY (P_SIZE int) STORED BY 'CARBONDATA' 
> TBLPROPERTIES('partition_type'='HASH','partition_num'='3');
> This command displays error as mentioned below:
> Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: 
> Error: Invalid partition definition (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1758) Carbon1.3.0- No Inverted Index : Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException

2017-12-14 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290392#comment-16290392
 ] 

Sangeeta Gulia edited comment on CARBONDATA-1758 at 12/14/17 12:50 PM:
---

Please provide more details for this bug, as I am not able to replicate the 
issue either on my local system or on a 3-node cluster.

It shows the result as per expectation.


was (Author: sangeeta04):
Please provide more details for this bug as i am not able to replicate this 
issue, neither on my local system or 3 node cluster.

> Carbon1.3.0- No Inverted Index : Select column with is null for 
> no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
> 
>
> Key: CARBONDATA-1758
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1758
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.3.0
> Environment: 3 node cluster
>Reporter: Chetan Bhat
>  Labels: Functional
>
> Steps :
> In Beeline user executes the queries in sequence.
> CREATE TABLE uniqdata_DI_int (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 
> double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='cust_id','NO_INVERTED_INDEX'='cust_id');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/3000_UniqData.csv' into table 
> uniqdata_DI_int OPTIONS('DELIMITER'=',', 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> Select count(CUST_ID) from uniqdata_DI_int;
> Select count(CUST_ID)*10 as multiple from uniqdata_DI_int;
> Select avg(CUST_ID) as average from uniqdata_DI_int;
> Select floor(CUST_ID) as average from uniqdata_DI_int;
> Select ceil(CUST_ID) as average from uniqdata_DI_int;
> Select ceiling(CUST_ID) as average from uniqdata_DI_int;
> Select CUST_ID*integer_column1 as multiple from uniqdata_DI_int;
> Select CUST_ID from uniqdata_DI_int where CUST_ID is null;
> *Issue : Select column with is null for no_inverted_index column throws 
> java.lang.ArrayIndexOutOfBoundsException*
> 0: jdbc:hive2://10.18.98.34:23040> Select CUST_ID from uniqdata_DI_int where 
> CUST_ID is null;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 79.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 79.0 (TID 123, BLR114278, executor 18): 
> org.apache.spark.util.TaskCompletionListenerException: 
> java.util.concurrent.ExecutionException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105)
> at org.apache.spark.scheduler.Task.run(Task.scala:112)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
> Expected : Select column with is null for no_inverted_index column should be 
> successful displaying the correct result set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1679) Carbon 1.3.0-Partitioning:After Splitting the Partition,no records are displayed

2017-12-14 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290778#comment-16290778
 ] 

Sangeeta Gulia commented on CARBONDATA-1679:


I am unable to replicate this issue. It works correctly on the current 
master code.

My csv contains the below records:
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold 
requests alon|
2|BRAZIL|1|y alongside of the pending deposits. carefully special packages are 
about the ironic forges. slyly special |
3|CANADA|1|eas hang ironic, silent packages. slyly regular packages are 
furiously over the tithes. fluffily bold|
4|EGYPT|4|y above the carefully unusual theodolites. final dugouts are quickly 
across the furiously regular d|
5|ETHIOPIA|0|ven packages wake quickly. regu|
6|FRANCE|3|refully final requests. regular, ironi|
7|GERMANY|3|l platelets. regular accounts x-ray: unusual, regular acco|
8|INDIA|2|ss excuses cajole slyly across the packages. deposits print aroun|
9|INDONESIA|2| slyly express asymptotes. regular deposits haggle slyly. 
carefully ironic hockey players sleep blithely. carefull|
10|IRAN|4|efully alongside of the slyly final dependencies. |
11|IRAQ|4|nic deposits boost atop the quickly final requests? quickly regula|
12|JAPAN|2|ously. final, express gifts cajole a|
13|JORDAN|4|ic deposits are blithely about the carefully regular pa|
14|KENYA|0| pending excuses haggle furiously deposits. pending, express pinto 
beans wake fluffily past t|
15|MOROCCO|0|rns. blithely bold courts among the closely regular packages use 
furiously bold platelets?|
16|MOZAMBIQUE|0|s. ironic, unusual asymptotes wake blithely r|
17|PERU|1|platelets. blithely pending dependencies use fluffily across the even 
pinto beans. carefully silent accoun|
18|CHINA|2|c dependencies. furiously express notornis sleep slyly regular 
accounts. ideas sleep. depos|
19|ROMANIA|3|ular asymptotes are about the furious multipliers. express 
dependencies nag above the ironically ironic account|
20|SAUDI ARABIA|4|ts. silent requests haggle. closely express packages sleep 
across the blithely|
21|VIETNAM|2|hely enticingly express accounts. even, final |
22|RUSSIA|3| requests against the platelets use never according to the quickly 
regular pint|
23|UNITED KINGDOM|3|eans boost carefully special requests. accounts are. 
carefull|
24|UNITED STATES|1|y final packages. slow foxes cajole quickly. quickly silent 
platelets breach ironic accounts. unusual pinto be|

and below is my output for the queries after 
"ALTER TABLE part_nation_4 SPLIT PARTITION(5) 
INTO('(EGYPT,ETHIOPIA)','FRANCE');": 

+----------------------+
| partition            |
+----------------------+
| 0, n_name = DEFAULT  |
| 1, n_name = ALGERIA  |
| 2, n_name = ARGEN... |
| 3, n_name = BRAZIL   |
| 4, n_name = CANADA   |
| 7, n_name = EGYPT... |
| 8, n_name = FRANCE   |
| 6, n_name = JAPAN    |
+----------------------+

+-------------+-------------+----------------------+--------+
| n_nationkey | n_regionkey | n_comment            | n_name |
+-------------+-------------+----------------------+--------+
| 6           | 3           | refully final req... | FRANCE |
+-------------+-------------+----------------------+--------+

+-------------+-------------+----------------------+--------+
| n_nationkey | n_regionkey | n_comment            | n_name |
+-------------+-------------+----------------------+--------+
| 4           | 4           | y above the caref... | EGYPT  |
+-------------+-------------+----------------------+--------+

+-------------+-------------+----------------------+--------+
| n_nationkey | n_regionkey | n_comment            | n_name |
+-------------+-------------+----------------------+--------+
| 3           | 1           | eas hang ironic, ... | CANADA |
+-------------+-------------+----------------------+--------+
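
For clarity, the three result sets above came from single-nation filter 
queries of this form (the exact statements were not pasted originally, so 
these are assumed reconstructions):

select * from part_nation_4 where n_name = 'FRANCE';
select * from part_nation_4 where n_name = 'EGYPT';
select * from part_nation_4 where n_name = 'CANADA';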

Can you please retest with the csv records I have provided?

> Carbon 1.3.0-Partitioning:After Splitting the Partition,no records are 
> displayed
> 
>
> Key: CARBONDATA-1679
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1679
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.3.0
>Reporter: Ayushi Sharma
> Attachments: Split1.PNG
>
>
> create table part_nation_4 (N_NATIONKEY BIGINT,N_REGIONKEY BIGINT,N_COMMENT 
> STRING) partitioned by (N_NAME STRING) stored by 'carbondata' 
> tblproperties('partition_type'='list','list_info'='ALGERIA,ARGENTINA,BRAZIL,CANADA,(EGYPT,ETHIOPIA,FRANCE),JAPAN');
> load data inpath '/spark-warehouse/tpchhive.db/nation/nation.tbl' into table 
> part_nation_4 
> options('DELIMITER'='|','FILEHEADER'='N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
> show partitions part_nation_4;
> ALTER TABLE part_nation_4 SPLIT PARTITION(5) 
> INTO('(EGYPT,ETHIOPIA)','FRANCE');
> show partitions part_nation_4;
> select * 

[jira] [Commented] (CARBONDATA-1865) Skip Single Pass for first data load.

2017-12-06 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279941#comment-16279941
 ] 

Sangeeta Gulia commented on CARBONDATA-1865:


This issue will be resolved with PR 
https://github.com/apache/carbondata/pull/1622
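
For context, single-pass loading is requested per load. A minimal sketch, 
assuming the standard SINGLE_PASS load option (the table name and input path 
are placeholders):

LOAD DATA INPATH 'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv'
INTO TABLE my_table OPTIONS('SINGLE_PASS'='TRUE');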

> Skip Single Pass for first data load.
> -
>
> Key: CARBONDATA-1865
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1865
> Project: CarbonData
>  Issue Type: Task
>Affects Versions: 1.3.0
>Reporter: Sangeeta Gulia
>Assignee: anubhav tarar
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata

2017-11-28 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1827:
--

 Summary: Add Support to provide S3 Functionality in Carbondata
 Key: CARBONDATA-1827
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1827
 Project: CarbonData
  Issue Type: New Feature
  Components: core
Reporter: Sangeeta Gulia
Priority: Minor


Add support to provide S3 functionality in CarbonData.
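
As an illustration of the intended usage, here is a minimal sketch, assuming 
the usual Hadoop s3a credentials are already configured on the cluster (the 
bucket, path and table are placeholders, not the final interface of this 
feature):

CREATE TABLE uniqdata_s3 (CUST_ID int, CUST_NAME String)
STORED BY 'carbondata'
LOCATION 's3a://my-bucket/carbon-store/uniqdata_s3';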



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (CARBONDATA-1460) Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions

2017-10-12 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia closed CARBONDATA-1460.
--
Resolution: Invalid

> Drop column in alter table working incorrectly when connected to same thrift 
> using different beeline sessions
> -
>
> Key: CARBONDATA-1460
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1460
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.2.0
> Environment: Spark-2.1
>Reporter: Sangeeta Gulia
>Assignee: anubhav tarar
>Priority: Minor
>
> I am trying to do concurrent testing on the same table. For that, I have 
> started my thrift server with the given command:
> sudo /home/hduser/spark-2.1.0-bin-hadoop2.7/bin/spark-submit  --master 
> spark://host-name:7077 --class 
> org.apache.carbondata.spark.thriftserver.CarbonThriftServer 
> carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.2.0.jar 
> and two different nodes connect to the same thrift server using two beeline 
> sessions.
> When the beeline1 session executes the query: alter table uniqdata drop 
> columns(cust_id);
> it gets the following error after execution:
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' 
> given input columns: [bigint_column1, double_column1, dob, doj, 
> active_emui_version, decimal_column2, bigint_column2, integer_column1, 
> cust_name, double_column2, decimal_column1]; line 1 pos 7;
> Meanwhile, the beeline2 session can still fetch data for cust_id from the 
> table using the command below:
> select cust_id from uniqdata;
> But when both beeline sessions run "describe table uniqdata", they see the 
> same result, which does not include cust_id as a column in the table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1424) Delete Operation working incorrectly when subquery returns bad-record

2017-09-07 Thread Sangeeta Gulia (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156924#comment-16156924
 ] 

Sangeeta Gulia commented on CARBONDATA-1424:


Thanks for the information [~ravi.pesala]. I have verified the above query; it 
works as you described.

But I have found one thing which is a bit confusing. I tried to break the 
first query down into two queries. Ideally both should return the same result, 
but this is not the case. However, Hive behaves the same way as CarbonData 
here.

Below are my queries and their results: the first query returns only 1 record, 
whereas the third query returns 13 records, although both should give the same 
output. (Under standard SQL three-valued logic, an IN predicate never matches 
NULL values, since NULL = NULL evaluates to unknown; by that reading, QUERY1's 
single-row result is the expected one.)

QUERY1 ::: select * from uniqdata1 where cust_id in (select cust_id 
from uniqdata1 limit 10);
+---------+-----------+---------------------+------+------+----------------+----------------+--------------------+--------------------+---------------------+----------------+-----------------+
| CUST_ID | CUST_NAME | ACTIVE_EMUI_VERSION | DOB  | DOJ  | BIGINT_COLUMN1 | BIGINT_COLUMN2 | DECIMAL_COLUMN1    | DECIMAL_COLUMN2    | Double_COLUMN1      | Double_COLUMN2 | INTEGER_COLUMN1 |
+---------+-----------+---------------------+------+------+----------------+----------------+--------------------+--------------------+---------------------+----------------+-----------------+
| 8999    |           |                     | NULL | NULL | NULL           | NULL           | NULL               | NULL               | NULL                | NULL           | NULL            |
+---------+-----------+---------------------+------+------+----------------+----------------+--------------------+--------------------+---------------------+----------------+-----------------+
1 row selected (10.485 seconds)
QUERY2 ::: select cust_id from uniqdata1 limit 10;
+----------+
| cust_id  |
+----------+
| NULL     |
| 8999     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
+----------+
10 rows selected (0.225 seconds)
QUERY3 ::: select * from uniqdata1 where cust_id in (NULL,8999);
+---------+-----------+---------------------+------+------+----------------+----------------+--------------------+--------------------+---------------------+----------------+-----------------+
| CUST_ID | CUST_NAME | ACTIVE_EMUI_VERSION | DOB  | DOJ  | BIGINT_COLUMN1 | BIGINT_COLUMN2 | DECIMAL_COLUMN1    | DECIMAL_COLUMN2    | Double_COLUMN1      | Double_COLUMN2 | INTEGER_COLUMN1 |
+---------+-----------+---------------------+------+------+----------------+----------------+--------------------+--------------------+---------------------+----------------+-----------------+
| NULL    |           |                     | NULL | NULL | NULL           | NULL           | NULL               | NULL               | NULL                | NULL           | NULL            |
| 8999    |           |                     | NULL | NULL | NULL           | NULL           | NULL               | NULL               | NULL                | NULL           | NULL            |
| NULL    |           |                     | NULL | NULL | 1233720368578  | NULL           | NULL               | NULL               | NULL                | NULL           | NULL            |
| NULL    |           |                     | NULL | NULL | NULL           | -223372036854  | NULL               | NULL               | NULL                | NULL           | NULL            |
| NULL    |           |                     | NULL | NULL | NULL           | NULL           | 12345678901.123458 | NULL               | NULL                | NULL           | NULL            |
| NULL    |           |                     | NULL | NULL | NULL           | NULL           | NULL               | 22345678901.123459 | NULL                | NULL           | NULL            |
| NULL    |           |                     | NULL | NULL | NULL           | NULL           | NULL               | NULL               | 1.12345674897976E10 | NULL           | NULL            |
| NULL 

[jira] [Created] (CARBONDATA-1460) Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions

2017-09-07 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1460:
--

 Summary: Drop column in alter table working incorrectly when 
connected to same thrift using different beeline sessions
 Key: CARBONDATA-1460
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1460
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 1.2.0
 Environment: Spark-2.1
Reporter: Sangeeta Gulia
Priority: Minor


I am trying to do concurrent testing on the same table. For that, I have 
started my thrift server with the given command:

sudo /home/hduser/spark-2.1.0-bin-hadoop2.7/bin/spark-submit  --master 
spark://host-name:7077 --class 
org.apache.carbondata.spark.thriftserver.CarbonThriftServer 
carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.2.0.jar 

and two different nodes connect to the same thrift server using two beeline 
sessions.

When the beeline1 session executes the query: alter table uniqdata drop 
columns(cust_id);
it gets the following error after execution:

Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' given 
input columns: [bigint_column1, double_column1, dob, doj, active_emui_version, 
decimal_column2, bigint_column2, integer_column1, cust_name, double_column2, 
decimal_column1]; line 1 pos 7;

Meanwhile, the beeline2 session can still fetch data for cust_id from the 
table using the command below:

select cust_id from uniqdata;

But when both beeline sessions run "describe table uniqdata", they see the 
same result, which does not include cust_id as a column in the table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp data type.

2017-09-06 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia closed CARBONDATA-1431.
--
Resolution: Fixed

> Dictionary_Include working incorrectly for date and timestamp data type.
> 
>
> Key: CARBONDATA-1431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql, test
>Affects Versions: 1.2.0
>Reporter: Sangeeta Gulia
>Assignee: Pallavi Singh
>Priority: Minor
> Fix For: 1.2.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When we create a table with date and timestamp data type with 
> DICTIONARY_INCLUDE : 
> Example : 
> CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 
> double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')
> It should either create the dictionary for the date and timestamp fields or 
> throw an error that the "DICTIONARY_INCLUDE" feature is not supported for 
> date and timestamp.
> However, on the current master branch the query executes successfully 
> without throwing any error, and it does not create dictionary files for the 
> date and timestamp fields either.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp data type.

2017-08-30 Thread Sangeeta Gulia (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeeta Gulia updated CARBONDATA-1431:
---
Summary: Dictionary_Include working incorrectly for date and timestamp data 
type.  (was: Dictionary_Include working incorrectly for date and timestamp 
format.)

> Dictionary_Include working incorrectly for date and timestamp data type.
> 
>
> Key: CARBONDATA-1431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql, test
>Affects Versions: 1.2.0
>Reporter: Sangeeta Gulia
>Priority: Minor
> Fix For: 1.2.0
>
>
> When we create a table with date and timestamp data type with 
> DICTIONARY_INCLUDE : 
> Example : 
> CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 
> double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')
> It should either create the dictionary for the date and timestamp fields or 
> throw an error that the "DICTIONARY_INCLUDE" feature is not supported for 
> date and timestamp.
> However, on the current master branch the query executes successfully 
> without throwing any error, and it does not create dictionary files for the 
> date and timestamp fields either.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp format.

2017-08-30 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1431:
--

 Summary: Dictionary_Include working incorrectly for date and 
timestamp format.
 Key: CARBONDATA-1431
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
 Project: CarbonData
  Issue Type: Bug
  Components: sql, test
Affects Versions: 1.2.0
Reporter: Sangeeta Gulia
Priority: Minor
 Fix For: 1.2.0


When we create a table with date and timestamp data type with 
DICTIONARY_INCLUDE : 

Example : 
CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')

It should either create the dictionary for the date and timestamp fields or 
throw an error that the "DICTIONARY_INCLUDE" feature is not supported for date 
and timestamp.

However, on the current master branch the query executes successfully without 
throwing any error, and it does not create dictionary files for the date and 
timestamp fields either.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1424) Delete Operation working incorrectly when subquery returns bad-record

2017-08-29 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1424:
--

 Summary: Delete Operation working incorrectly when subquery 
returns bad-record
 Key: CARBONDATA-1424
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1424
 Project: CarbonData
  Issue Type: Bug
  Components: sql, test
Affects Versions: 1.2.0
Reporter: Sangeeta Gulia
Priority: Minor
 Attachments: 3000_UniqData.csv

Delete operation works incorrectly when the subquery returns bad records for a 
particular table.

For the given query, 
delete from uniqdata_delete where cust_id in (select cust_id from 
uniqdata_delete limit 10);

As an example, if "select cust_id from uniqdata_delete limit 10" returns

+----------+
| cust_id  |
+----------+
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| 11000    |
| 11001    |
| 11002    |
| 11003    |
| 11004    |
| 11005    |
+----------+

then the query should delete all rows where cust_id is NULL or matches any of 
the returned values (11000-11005), whereas it deletes only the records where 
cust_id is in (11000-11005).

I have attached the sample csv file which I have used for reference.

To regenerate the issue, you can use the commands below:

CREATE TABLE uniqdata_delete (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= 
"256 MB");

LOAD DATA INPATH 
'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table 
uniqdata_delete 
OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

NOTE : Load should be such that starting rows of data should have null stored 
for cust_id field. 

delete from uniqdata_delete where cust_id in (select cust_id from 
uniqdata_delete limit 10);
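
To verify the outcome after the delete, one can for instance count what 
remains (illustrative checks, not part of the original report):

select count(*) from uniqdata_delete where cust_id is null;
select count(*) from uniqdata_delete where cust_id between 11000 and 11005;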



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1412) delete working incorrectly while using segment.starttime before ''

2017-08-28 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-1412:
--

 Summary: delete working incorrectly while using segment.starttime 
before ''
 Key: CARBONDATA-1412
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1412
 Project: CarbonData
  Issue Type: Bug
  Components: data-query, test
 Environment: Spark-2.1
Reporter: Sangeeta Gulia
Priority: Minor
 Fix For: 1.2.0


The issue exists in the query below:

delete from table uniqdata_delete where segment.starttime before 
'starttime_of_last_segment_created';

It should mark for delete all segments whose start time is before the given 
time, and it should not delete the segment created at exactly the given time.

But it also marks for delete the segment whose start time exactly matches the 
given time.
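
For reference, the segment start times (including the start time of the last 
segment created) can be read from the segment listing, e.g.:

SHOW SEGMENTS FOR TABLE uniqdata_delete;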

To replicate the issue: 

CREATE TABLE uniqdata_delete (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= 
"256 MB")

LOAD DATA INPATH 
'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table 
uniqdata_delete 
OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')


LOAD DATA INPATH 
'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table 
uniqdata_delete 
OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')


delete from table uniqdata_delete where segment.starttime before 
'starttime_of_last_segment_created';




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)