[jira] [Commented] (CARBONDATA-2282) presto carbon does not support reading specific partition on which query is fired mapreduce.input.carboninputformat.partitions.to.prune property is null
[ https://issues.apache.org/jira/browse/CARBONDATA-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548942#comment-16548942 ] Sangeeta Gulia commented on CARBONDATA-2282: [~photogamrun] please test and close this issue; the PR fixing it has already been merged: https://github.com/apache/carbondata/pull/2139 > presto carbon does not support reading specific partition on which query is > fired mapreduce.input.carboninputformat.partitions.to.prune property is null > > > Key: CARBONDATA-2282 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2282 > Project: CarbonData > Issue Type: Bug > Components: core, presto-integration >Affects Versions: 1.3.0 >Reporter: zhangwei >Assignee: anubhav tarar >Priority: Major > Fix For: 1.3.0 > > Attachments: partitonToPrune.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2582) Carbon properties do not get distributed in cluster mode
[ https://issues.apache.org/jira/browse/CARBONDATA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548925#comment-16548925 ] Sangeeta Gulia commented on CARBONDATA-2582: This issue is resolved by PR https://github.com/apache/carbondata/pull/2265/files > Carbon properties do not get distributed in cluster mode > -- > > Key: CARBONDATA-2582 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2582 > Project: CarbonData > Issue Type: Bug > Components: presto-integration >Affects Versions: 1.4.0 > Environment: presto-server-0.187 >Reporter: Geetika Gupta >Priority: Major > > Unsafe-memory-related Carbon properties mentioned in the carbondata catalog of > Presto were not getting distributed in Presto cluster mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2581) Adding QueryStatistics in Presto Integration Code to Measure Performance
[ https://issues.apache.org/jira/browse/CARBONDATA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548909#comment-16548909 ] Sangeeta Gulia commented on CARBONDATA-2581: [~geetikagupta] This issue is also resolved; it was fixed by [GitHub Pull Request #2265|https://github.com/apache/carbondata/pull/2265]. Please close the issue and update its resolution to Resolved. > Adding QueryStatistics in Presto Integration Code to Measure Performance > > > Key: CARBONDATA-2581 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2581 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration >Affects Versions: 1.4.0 >Reporter: Geetika Gupta >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2583) Presto Performance Optimization - Creating a Multiblock Split to reduce network IO
[ https://issues.apache.org/jira/browse/CARBONDATA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548906#comment-16548906 ] Sangeeta Gulia commented on CARBONDATA-2583: [~geetikagupta] This issue can be closed, as the PR for the optimization has been merged. > Presto Performance Optimization - Creating a Multiblock Split to reduce > network IO > -- > > Key: CARBONDATA-2583 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2583 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 1.4.0 >Reporter: Geetika Gupta >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2263) Date data is loaded incorrectly.
[ https://issues.apache.org/jira/browse/CARBONDATA-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518029#comment-16518029 ] Sangeeta Gulia commented on CARBONDATA-2263: Date format should follow conventions as per Simple Date Format as in link: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html > Date data is loaded incorrectly. > > > Key: CARBONDATA-2263 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2263 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Sangeeta Gulia >Assignee: anubhav tarar >Priority: Minor > Attachments: dataSample.csv > > Time Spent: 3h > Remaining Estimate: 0h > > When we set : > CarbonProperties.getInstance() > .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/mm/dd") > and run below commands: > spark.sql("DROP TABLE IF EXISTS t3") > spark.sql( > s""" > | CREATE TABLE IF NOT EXISTS t3( > | ID Int, > | date Date, > | country String, > | name String, > | phonetype String, > | serialname String, > | salary Int, > | floatField float > | ) STORED BY 'carbondata' > """.stripMargin) > spark.sql(s""" > LOAD DATA LOCAL INPATH '$testData' into table t3 > options('ALL_DICTIONARY_PATH'='$allDictFile', 'SINGLE_PASS'='true') > """) > spark.sql(""" > SELECT * FROM t3 > """).show() > spark.sql(""" > SELECT * FROM t3 where floatField=3.5 > """).show() > spark.sql("DROP TABLE IF EXISTS t3") > Date data is loaded as below: > +---+--+---+-+-+--+--+--+ > | id| date|country| name|phonetype|serialname|salary|floatfield| > +---+--+---+-+-+--+--+--+ > | 9|2015-01-18| china| aaa9| phone706| ASD86717| 15008| 2.34| > | 10|2015-01-19| usa|aaa10| phone685| ASD30505| 15009| 2.34| > | 1|2015-01-23| china| aaa1| phone197| ASD69643| 15000| 2.34| > | 2|2015-01-24| china| aaa2| phone756| ASD42892| 15001| 2.34| > | 3|2015-01-25| china| aaa3|phone1904| ASD37014| 15002| 2.34| > | 4|2015-01-26| china| aaa4|phone2435| ASD66902| 15003| 2.34| > | 5|2015-01-27| china| aaa5|phone2441| 
ASD90633| 15004| 2.34| > | 6|2015-01-28| china| aaa6| phone294| ASD59961| 15005| 3.5| > | 7|2015-01-29| china| aaa7| phone610| ASD14875| 15006| 2.34| > | 8|2015-01-30| china| aaa8|phone1848| ASD57308| 15007| 2.34| > +---+--+---+-+-+--+--+--+ > > However, the correct data is: > ID,date,country,name,phonetype,serialname,salary,floatField > 1,2015/7/23,china,aaa1,phone197,ASD69643,15000,2.34 > 2,2015/7/24,china,aaa2,phone756,ASD42892,15001,2.34 > 3,2015/7/25,china,aaa3,phone1904,ASD37014,15002,2.34 > 4,2015/7/26,china,aaa4,phone2435,ASD66902,15003,2.34 > 5,2015/7/27,china,aaa5,phone2441,ASD90633,15004,2.34 > 6,2015/7/28,china,aaa6,phone294,ASD59961,15005,3.5 > 7,2015/7/29,china,aaa7,phone610,ASD14875,15006,2.34 > 8,2015/7/30,china,aaa8,phone1848,ASD57308,15007,2.34 > 9,2015/7/18,china,aaa9,phone706,ASD86717,15008,2.34 > 10,2015/7/19,usa,aaa10,phone685,ASD30505,15009,2.34 > > which shows that the month data is loaded incorrectly. > > Similarly, if we use: > CarbonProperties.getInstance() > .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/mm/dd") > it again stores incorrect date data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
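The January dates in the loaded output are consistent with SimpleDateFormat pattern semantics (the documentation linked in the comment above): in a pattern, lowercase "mm" means minute-of-hour while uppercase "MM" means month-of-year, so a month written as "mm" is silently parsed as minutes and the month defaults to January. A minimal Java sketch of the difference; the full pattern strings here are assumptions, since the property values quoted in the report are truncated:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DatePatternDemo {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd");

        // Lowercase "mm" is minute-of-hour: parsing "2015/7/23" reads 7 as
        // minutes, the month stays at its January default, and the date is wrong.
        SimpleDateFormat minutes = new SimpleDateFormat("yyyy/mm/dd");
        System.out.println(iso.format(minutes.parse("2015/7/23"))); // 2015-01-23

        // Uppercase "MM" is month-of-year: the same input parses correctly.
        SimpleDateFormat months = new SimpleDateFormat("yyyy/MM/dd");
        System.out.println(iso.format(months.parse("2015/7/23"))); // 2015-07-23
    }
}
```

This reproduces the reported symptom exactly (2015/7/23 loading as 2015-01-23); if CARBON_DATE_FORMAT is passed through to SimpleDateFormat, an uppercase-MM pattern should load the months correctly.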
[jira] [Closed] (CARBONDATA-2145) Refactor PreAggregate functionality for dictionary include.
[ https://issues.apache.org/jira/browse/CARBONDATA-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia closed CARBONDATA-2145. -- Resolution: Information Provided > Refactor PreAggregate functionality for dictionary include. > --- > > Key: CARBONDATA-2145 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2145 > Project: CarbonData > Issue Type: Improvement >Reporter: Sangeeta Gulia >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add the count to the measure column only if the column in the main table is of > dictionary type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2273) Add sdv test cases for testing boolean datatype support.
Sangeeta Gulia created CARBONDATA-2273: -- Summary: Add sdv test cases for testing boolean datatype support. Key: CARBONDATA-2273 URL: https://issues.apache.org/jira/browse/CARBONDATA-2273 Project: CarbonData Issue Type: Test Reporter: Sangeeta Gulia -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2272) Boolean Data loaded incorrectly when boolean column is dictionary include.
Sangeeta Gulia created CARBONDATA-2272: -- Summary: Boolean Data loaded incorrectly when boolean column is dictionary include. Key: CARBONDATA-2272 URL: https://issues.apache.org/jira/browse/CARBONDATA-2272 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.3.0 Reporter: Sangeeta Gulia Steps to reproduce: sql( s""" |CREATE TABLE if not exists boolean_table( |aa STRING, bb INT, cc BOOLEAN, dd BOOLEAN |) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='cc')""".stripMargin) sql("insert into boolean_table values('adam',11,true,true)") sql("insert into boolean_table values('james',12,false,false)") sql("insert into boolean_table values('smith',13,true,true)") sql("select * from boolean_table ").show() Output: +-+---++-+ | aa| bb| cc| dd| +-+---++-+ |james| 12|true|false| |smith| 13|true| true| | adam| 11|true| true| +-+---++-+ As can be seen from the above output, the data in the cc column is wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2263) Date data is loaded incorrectly.
Sangeeta Gulia created CARBONDATA-2263: -- Summary: Date data is loaded incorrectly. Key: CARBONDATA-2263 URL: https://issues.apache.org/jira/browse/CARBONDATA-2263 Project: CarbonData Issue Type: Bug Affects Versions: 1.3.0 Reporter: Sangeeta Gulia Attachments: dataSample.csv When we set : CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/mm/dd") and run below commands: spark.sql("DROP TABLE IF EXISTS t3") spark.sql( s""" | CREATE TABLE IF NOT EXISTS t3( | ID Int, | date Date, | country String, | name String, | phonetype String, | serialname String, | salary Int, | floatField float | ) STORED BY 'carbondata' """.stripMargin) spark.sql(s""" LOAD DATA LOCAL INPATH '$testData' into table t3 options('ALL_DICTIONARY_PATH'='$allDictFile', 'SINGLE_PASS'='true') """) spark.sql(""" SELECT * FROM t3 """).show() spark.sql(""" SELECT * FROM t3 where floatField=3.5 """).show() spark.sql("DROP TABLE IF EXISTS t3") Date data is loaded as below: +---+--+---+-+-+--+--+--+ | id| date|country| name|phonetype|serialname|salary|floatfield| +---+--+---+-+-+--+--+--+ | 9|2015-01-18| china| aaa9| phone706| ASD86717| 15008| 2.34| | 10|2015-01-19| usa|aaa10| phone685| ASD30505| 15009| 2.34| | 1|2015-01-23| china| aaa1| phone197| ASD69643| 15000| 2.34| | 2|2015-01-24| china| aaa2| phone756| ASD42892| 15001| 2.34| | 3|2015-01-25| china| aaa3|phone1904| ASD37014| 15002| 2.34| | 4|2015-01-26| china| aaa4|phone2435| ASD66902| 15003| 2.34| | 5|2015-01-27| china| aaa5|phone2441| ASD90633| 15004| 2.34| | 6|2015-01-28| china| aaa6| phone294| ASD59961| 15005| 3.5| | 7|2015-01-29| china| aaa7| phone610| ASD14875| 15006| 2.34| | 8|2015-01-30| china| aaa8|phone1848| ASD57308| 15007| 2.34| +---+--+---+-+-+--+--+--+ However correct data is : ID,date,country,name,phonetype,serialname,salary,floatField 1,2015/7/23,china,aaa1,phone197,ASD69643,15000,2.34 2,2015/7/24,china,aaa2,phone756,ASD42892,15001,2.34 3,2015/7/25,china,aaa3,phone1904,ASD37014,15002,2.34 
4,2015/7/26,china,aaa4,phone2435,ASD66902,15003,2.34 5,2015/7/27,china,aaa5,phone2441,ASD90633,15004,2.34 6,2015/7/28,china,aaa6,phone294,ASD59961,15005,3.5 7,2015/7/29,china,aaa7,phone610,ASD14875,15006,2.34 8,2015/7/30,china,aaa8,phone1848,ASD57308,15007,2.34 9,2015/7/18,china,aaa9,phone706,ASD86717,15008,2.34 10,2015/7/19,usa,aaa10,phone685,ASD30505,15009,2.34 which shows that the month data is loaded incorrectly. Similarly, if we use: CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/mm/dd") it again stores incorrect date data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2241) Wrong Query written in Preaggregation Document
Sangeeta Gulia created CARBONDATA-2241: -- Summary: Wrong Query written in Preaggregation Document Key: CARBONDATA-2241 URL: https://issues.apache.org/jira/browse/CARBONDATA-2241 Project: CarbonData Issue Type: Bug Reporter: Sangeeta Gulia The following query is written in the document: SELECT sum(price), country from sales GROUP BY country. The document says it will execute on the datamap, but it actually executes against the main table, not the datamap. Fix: Correct the query so that it executes using the datamap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2240) Refactor UTs to remove duplicate test scenarios and code to improve CI time for Preaggregate expressions and selection scenario
Sangeeta Gulia created CARBONDATA-2240: -- Summary: Refactor UTs to remove duplicate test scenarios and code to improve CI time for Preaggregate expressions and selection scenario Key: CARBONDATA-2240 URL: https://issues.apache.org/jira/browse/CARBONDATA-2240 Project: CarbonData Issue Type: Improvement Reporter: Sangeeta Gulia This task includes the following improvements on the Preaggregate expressions and selection scenario: 1) Refactor UTs to remove duplicate test scenarios to improve CI time. 2) Refactor test cases to remove duplicate code across different classes. 3) Correct test cases with missing asserts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2226) Refactor UTs to remove duplicate test scenarios to improve CI time for PreAggregate create and drop feature
Sangeeta Gulia created CARBONDATA-2226: -- Summary: Refactor UTs to remove duplicate test scenarios to improve CI time for PreAggregate create and drop feature Key: CARBONDATA-2226 URL: https://issues.apache.org/jira/browse/CARBONDATA-2226 Project: CarbonData Issue Type: Improvement Affects Versions: 1.3.0 Reporter: Sangeeta Gulia -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2155) IS NULL not working correctly on string datatype with dictionary_include in presto integration
Sangeeta Gulia created CARBONDATA-2155: -- Summary: IS NULL not working correctly on string datatype with dictionary_include in presto integration Key: CARBONDATA-2155 URL: https://issues.apache.org/jira/browse/CARBONDATA-2155 Project: CarbonData Issue Type: Bug Components: presto-integration Affects Versions: 1.3.0 Environment: Spark-2.1 Presto 0.187 Reporter: Sangeeta Gulia Attachments: lineitem.csv Steps to reproduce: 1) Create a table in carbondata and load data into it. create table if not exists lineitem_carbon1( L_SHIPDATE date, L_SHIPMODE string, L_SHIPINSTRUCT string, L_RETURNFLAG string, L_RECEIPTDATE date, L_ORDERKEY string, L_PARTKEY string, L_SUPPKEY string, L_LINENUMBER int, L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_LINESTATUS string, L_COMMITDATE date, L_COMMENT string ) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', 'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_COMMENT'); load data inpath "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" into table lineitem_carbon1 options('DATEFORMAT' = '-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORDS_ACTION'='FORCE'); 1: jdbc:hive2://localhost:1> select l_shipmode from lineitem_carbon1 where l_shipmode is NULL; +-+--+ | l_shipmode | +-+--+ | NULL | +-+--+ 2) Access the same table from presto-cli and try to run a select query from there: presto:performance> select l_shipmode from lineitem_carbon1 where l_shipmode is NULL; l_shipmode (0 rows) Expected Result: It should be the same as the result from carbon. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2152) Min function working incorrectly for string type with dictionary include in presto.
[ https://issues.apache.org/jira/browse/CARBONDATA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia updated CARBONDATA-2152: --- Attachment: lineitem.csv > Min function working incorrectly for string type with dictionary include in > presto. > --- > > Key: CARBONDATA-2152 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2152 > Project: CarbonData > Issue Type: Bug > Components: presto-integration >Affects Versions: 1.3.0 > Environment: Spark2.1 > Presto0.187 >Reporter: Sangeeta Gulia >Assignee: anubhav tarar >Priority: Major > Attachments: lineitem.csv > > > Steps to reproduce: > 1) Create and Load in carbondata. > create table if not exists lineitem_carbon1( > L_SHIPDATE date, > L_SHIPMODE string, > L_SHIPINSTRUCT string, > L_RETURNFLAG string, > L_RECEIPTDATE date, > L_ORDERKEY string, > L_PARTKEY string, > L_SUPPKEY string, > L_LINENUMBER int, > L_QUANTITY double, > L_EXTENDEDPRICE double, > L_DISCOUNT double, > L_TAX double, > L_LINESTATUS string, > L_COMMITDATE date, > L_COMMENT string > ) STORED BY 'carbondata' > TBLPROPERTIES > ('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', > 'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, > L_SUPPKEY, L_COMMENT'); > load data inpath > "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" into table > lineitem_carbon1 options('DATEFORMAT' = > '-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true', > 'BAD_RECORDS_ACTION'='FORCE'); > > 0: jdbc:hive2://localhost:1> select min(l_shipmode) from > lineitem_carbon1; > +--+--+ > | min(l_shipmode) | > +--+--+ > | AIR | > +--+--+ > 2) Connect to carbondata store from presto and perform the below query from > presto-cli: > presto:performance> select min(l_shipmode) from 
lineitem_carbon1; > _col0 > -- > @NU#LL$! > (1 row) > > Expected: On presto also, it should give the correct output as shown on > carbondata. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2152) Min function working incorrectly for string type with dictionary include in presto.
Sangeeta Gulia created CARBONDATA-2152: -- Summary: Min function working incorrectly for string type with dictionary include in presto. Key: CARBONDATA-2152 URL: https://issues.apache.org/jira/browse/CARBONDATA-2152 Project: CarbonData Issue Type: Bug Components: presto-integration Affects Versions: 1.3.0 Environment: Spark2.1 Presto0.187 Reporter: Sangeeta Gulia Steps to reproduce: 1) Create and Load in carbondata. create table if not exists lineitem_carbon1( L_SHIPDATE date, L_SHIPMODE string, L_SHIPINSTRUCT string, L_RETURNFLAG string, L_RECEIPTDATE date, L_ORDERKEY string, L_PARTKEY string, L_SUPPKEY string, L_LINENUMBER int, L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_LINESTATUS string, L_COMMITDATE date, L_COMMENT string ) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_LINESTATUS', 'table_blocksize'='300', 'no_inverted_index'='L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_COMMENT'); load data inpath "hdfs://localhost:54310/user/hduser/input-files/lineitem.csv" into table lineitem_carbon1 options('DATEFORMAT' = '-MM-dd','DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT','BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORDS_ACTION'='FORCE'); 0: jdbc:hive2://localhost:1> select min(l_shipmode) from lineitem_carbon1; +--+--+ | min(l_shipmode) | +--+--+ | AIR | +--+--+ 2) Connect to carbondata store from presto and perform the below query from presto-cli: presto:performance> select min(l_shipmode) from lineitem_carbon1; _col0 -- @NU#LL$! (1 row) Expected: On presto also, it should give the correct output as shown on carbondata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2145) Refactor PreAggregate functionality for dictionary include.
Sangeeta Gulia created CARBONDATA-2145: -- Summary: Refactor PreAggregate functionality for dictionary include. Key: CARBONDATA-2145 URL: https://issues.apache.org/jira/browse/CARBONDATA-2145 Project: CarbonData Issue Type: Improvement Reporter: Sangeeta Gulia Add the count to the measure column only if the column in the main table is of dictionary type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2112) Data getting garbled after datamap creation when table is created with GLOBAL SORT
Sangeeta Gulia created CARBONDATA-2112: -- Summary: Data getting garbled after datamap creation when table is created with GLOBAL SORT Key: CARBONDATA-2112 URL: https://issues.apache.org/jira/browse/CARBONDATA-2112 Project: CarbonData Issue Type: Bug Components: data-query Environment: spark-2.1 Reporter: Sangeeta Gulia Attachments: 2000_UniqData.csv Data is getting garbled after datamap creation when table is created with BATCH_SORT/GLOBAL_SORT. Steps to reproduce : spark.sql("drop table if exists uniqdata_batchsort_compact3") spark.sql("CREATE TABLE uniqdata_batchsort_compact3 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'carbondata' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')").show() spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " + "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," + "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," + "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," + "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')") spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " + "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," + "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," + "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," + "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')") spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " + "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," + "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," + 
"DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," + "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')") spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50) +---++ |cust_id|avg(cust_id)| +---++ | 9376| 9376.0| | 9427| 9427.0| | 9465| 9465.0| | 9852| 9852.0| | 9900| 9900.0| | 10206| 10206.0| | 10362| 10362.0| | 10623| 10623.0| | 10817| 10817.0| | 9182| 9182.0| | 9564| 9564.0| | 9879| 9879.0| | 10081| 10081.0| | 10121| 10121.0| | 10230| 10230.0| | 10462| 10462.0| | 10703| 10703.0| | 10914| 10914.0| | 9162| 9162.0| | 9383| 9383.0| | 9454| 9454.0| | 9517| 9517.0| | 9558| 9558.0| | 10708| 10708.0| | 10798| 10798.0| | 10862| 10862.0| | 9071| 9071.0| | 9169| 9169.0| | 9946| 9946.0| | 10468| 10468.0| | 10745| 10745.0| | 10768| 10768.0| | 9153| 9153.0| | 9206| 9206.0| | 9403| 9403.0| | 9597| 9597.0| | 9647| 9647.0| | 9775| 9775.0| | 10032| 10032.0| | 10395| 10395.0| | 10527| 10527.0| | 10567| 10567.0| | 10632| 10632.0| | 10788| 10788.0| | 10815| 10815.0| | 10840| 10840.0| | 9181| 9181.0| | 9344| 9344.0| | 9575| 9575.0| | 9675| 9675.0| +---++ only showing top 50 rows Note: Here the cust_id is coming correct . 
spark.sql("create datamap uniqdata_agg on table uniqdata_batchsort_compact3 using " + "'preaggregate' as select avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id") spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50) +---++ |cust_id|avg(cust_id)| +---++ | 27651| 9217.0| | 31944| 10648.0| | 32667| 10889.0| | 28242| 9414.0| | 29841| 9947.0| | 28728| 9576.0| | 27255| 9085.0| | 32571| 10857.0| | 30276| 10092.0| | 27276| 9092.0| | 31503| 10501.0| | 27687| 9229.0| | 27183| 9061.0| | 29334| 9778.0| | 29913| 9971.0| | 28683| 9561.0| | 31545| 10515.0| | 30405| 10135.0| | 27693| 9231.0| | 29649| 9883.0| | 30537| 10179.0| | 32709| 10903.0| | 29586| 9862.0| | 32895| 10965.0| | 32415| 10805.0| | 31644| 10548.0| | 30030| 10010.0| | 31713| 10571.0| | 28083| 9361.0| | 27813| 9271.0| | 27171| 9057.0| | 27189| 9063.0| | 30444| 10148.0| | 28623| 9541.0| | 28566| 9522.0| | 32655| 10885.0| | 31164| 10388.0| | 30321| 10107.0| | 31452| 10484.0| | 29829| 9943.0| | 27468| 9156.0| | 31212| 10404.0| | 32154| 10718.0| | 27531| 9177.0| | 27654| 9218.0| | 27105| 9035.0| | 31113| 10371.0| | 28479| 9493.0| | 29094| 9698.0| | 31551| 10517.0| +---++ only showing top 50 rows Note: But after datamap creation, cust_id is coming incorrect. It is coming as thrice(equivalent to number of loads) of its original value and avg(cust_id) is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
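The pattern in the wrong output is exact: every cust_id shown is three times a real key (e.g. 27651 = 3 × 9217, matching the three loads) while the average still equals the true key. That is consistent with the rewritten query summing the group-by key column across the loads instead of returning it. A small Java sketch of that hypothesis; the per-load row layout here is an illustrative assumption, not CarbonData's actual pre-aggregate schema:

```java
public class PreAggKeyHypothesis {
    public static void main(String[] args) {
        // Hypothetical pre-aggregate rows for cust_id 9217: one row per load,
        // three loads of the same file. Each row is {key, sum(cust_id), count}.
        long custId = 9217L;
        long[][] rows = {
                {custId, custId, 1},
                {custId, custId, 1},
                {custId, custId, 1}};

        long keySummedByMistake = 0, sum = 0, count = 0;
        for (long[] r : rows) {
            keySummedByMistake += r[0]; // key wrongly rolled up with sum()
            sum += r[1];
            count += r[2];
        }
        System.out.println(keySummedByMistake);   // 27651 = 3 * 9217, as observed
        System.out.println(sum / (double) count); // 9217.0, avg still correct
    }
}
```

Under this hypothesis the bug is in how the key column is combined across segments during the datamap query rewrite, not in the stored data itself.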
[jira] [Commented] (CARBONDATA-1956) Select query with sum, count and avg throws exception for pre aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343035#comment-16343035 ] Sangeeta Gulia commented on CARBONDATA-1956: [~geetikagupta] This issue is not coming on current master branch code. It is working fine. Please close this bug. > Select query with sum, count and avg throws exception for pre aggregate table > - > > Key: CARBONDATA-1956 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1956 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: spark2.1 >Reporter: Geetika Gupta >Priority: Major > Fix For: 1.3.0 > > Attachments: 2000_UniqData.csv > > > I create a datamap using the following command: > create datamap uniqdata_agg_d on table uniqdata_29 using 'preaggregate' as > select sum(decimal_column1), count(cust_id), avg(bigint_column1) from > uniqdata_29 group by cust_id; > The datamap creation was successfull, but when I tried the following query: > select sum(decimal_column1), count(cust_id), avg(bigint_column1) from > uniqdata_29 group by cust_id; > It throws the following exception: > Error: org.apache.spark.sql.AnalysisException: cannot resolve > '(sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_sum`) / > sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_count`))' due to > data type mismatch: > '(sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_sum`) / > sum(uniqdata_29_uniqdata_agg_d.`uniqdata_29_bigint_column1_count`))' requires > (double or decimal) type, not bigint;; > 'Aggregate [uniqdata_29_cust_id_count#244], > [sum(uniqdata_29_decimal_column1_sum#243) AS sum(decimal_column1)#274, > sum(cast(uniqdata_29_cust_id_count#244 as bigint)) AS count(cust_id)#276L, > (sum(uniqdata_29_bigint_column1_sum#245L) / > sum(uniqdata_29_bigint_column1_count#246L)) AS avg(bigint_column1)#279] > +- > 
Relation[uniqdata_29_decimal_column1_sum#243,uniqdata_29_cust_id_count#244,uniqdata_29_bigint_column1_sum#245L,uniqdata_29_bigint_column1_count#246L] > CarbonDatasourceHadoopRelation [ Database name :28dec, Table name > :uniqdata_29_uniqdata_agg_d, Schema > :Some(StructType(StructField(uniqdata_29_decimal_column1_sum,DecimalType(30,10),true), > StructField(uniqdata_29_cust_id_count,IntegerType,true), > StructField(uniqdata_29_bigint_column1_sum,LongType,true), > StructField(uniqdata_29_bigint_column1_count,LongType,true))) ] > (state=,code=0) > Steps for creation of maintable: > CREATE TABLE uniqdata_29(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > Load command: > LOAD DATA INPATH 'hdfs://localhost:54311/Files/2000_UniqData.csv' into table > uniqdata_29 OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > Datamap creation command: > create datamap uniqdata_agg_d on table uniqdata_29 using 'preaggregate' as > select sum(decimal_column1), count(cust_id), avg(bigint_column1) from > uniqdata_29 group by cust_id; > Note: select sum(decimal_column1), count(cust_id), avg(bigint_column1) from > uniqdata_29 group by cust_id; executed successfully on maintable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
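The AnalysisException above comes from the rewritten average: over the pre-aggregate table, avg(bigint_column1) becomes sum(uniqdata_29_bigint_column1_sum) / sum(uniqdata_29_bigint_column1_count), and Spark's Divide requires a double or decimal operand rather than bigint. A hedged Java sketch of why the quotient must be computed in a fractional type (the numbers are made up for illustration):

```java
public class AvgRollupCast {
    public static void main(String[] args) {
        // Hypothetical rolled-up partials from a pre-aggregate table.
        long sumOfSums = 45017L;  // sum of the per-segment sum column
        long sumOfCounts = 6L;    // sum of the per-segment count column

        // Dividing two longs truncates the fraction, so an average computed
        // this way would be wrong; the operands must be cast first.
        long truncated = sumOfSums / sumOfCounts;
        double exact = (double) sumOfSums / sumOfCounts;

        System.out.println(truncated); // 7502
        System.out.println(exact);     // ~7502.83
    }
}
```

This is why the plan needs a cast of the bigint sum/count columns to a fractional type before the division, rather than dividing the bigint columns directly.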
[jira] [Commented] (CARBONDATA-1985) Insert into failed for multi partitioned table for static partition
[ https://issues.apache.org/jira/browse/CARBONDATA-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342995#comment-16342995 ] Sangeeta Gulia commented on CARBONDATA-1985: [~geetikagupta] Hive also shows the same behavior. Hence it is an invalid bug. Please close. To verify: you can create a hive table with partition: CREATE TABLE uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) stored as parquet; insert into uniqdata_hive1 partition(cust_id='1',cust_name='CUST_NAME_2') select * from uniqdata_hive limit 10; Below are the commands and result for your reference: 0: jdbc:hive2://localhost:1> CREATE TABLE uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp, 0: jdbc:hive2://localhost:1> DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 0: jdbc:hive2://localhost:1> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, 0: jdbc:hive2://localhost:1> INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) stored as parquet; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.305 seconds) 0: jdbc:hive2://localhost:1> insert into uniqdata_hive1 partition(cust_id='1',cust_name='CUST_NAME_2') select * from uniqdata_hive limit 10; Error: org.apache.spark.sql.AnalysisException: Cannot insert into table `default`.`uniqdata_hive1` because the number of columns are different: need 10 columns, but query has 12 columns.; (state=,code=0) 0: jdbc:hive2://localhost:1> > Insert into failed for multi partitioned table for static partition > --- > > Key: CARBONDATA-1985 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1985 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: spark2.1 
>Reporter: Geetika Gupta >Priority: Major > Fix For: 1.3.0 > > Attachments: 2000_UniqData.csv > > > I created a table using: > CREATE TABLE uniqdata_int_string(ACTIVE_EMUI_VERSION string, DOB timestamp, > DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 > decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, > INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB") > Hive create and load table command: > CREATE TABLE uniqdata_hive (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, > DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 > decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, > INTEGER_COLUMN1 int)ROW FORMAT DELIMITED > FIELDS TERMINATED BY ','; > LOAD DATA LOCAL INPATH 'file:///home/geetika/Downloads/2000_UniqData.csv' > into table UNIQDATA_HIVE; > Insert into table command: > insert into uniqdata_int_string > partition(cust_id='1',cust_name='CUST_NAME_2') select * from > uniqdata_hive limit 10; > Output: > Error: java.lang.IndexOutOfBoundsException: Index: 4, Size: 4 (state=,code=0) > Here are the logs: > 18/01/04 16:24:45 ERROR CarbonLoadDataCommand: pool-23-thread-6 > org.apache.spark.sql.AnalysisException: Cannot insert into table > `28dec`.`uniqdata_int_string` because the number of columns are different: > need 10 columns, but query has 12 columns.; > at > org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:222) > at > org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:280) > at > org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:272) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277) > at > org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:272) > at > org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:207) > at
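The AnalysisException above ("need 10 columns, but query has 12 columns") follows from Spark's rule for static partition inserts: when all partition values are supplied literally in the PARTITION clause, the SELECT list must produce only the non-partition columns. A minimal sketch using the schemas quoted in this report (uniqdata_hive carries the two extra CUST_ID/CUST_NAME columns, so `select *` supplies 12 values where 10 are expected):

```sql
-- Fails: SELECT * yields all 12 columns of uniqdata_hive, but cust_id and
-- cust_name are already fixed by the static PARTITION clause, so only the
-- 10 data columns may be supplied.
insert into uniqdata_hive1 partition(cust_id='1', cust_name='CUST_NAME_2')
select * from uniqdata_hive limit 10;

-- Works: project exactly the 10 non-partition columns.
insert into uniqdata_hive1 partition(cust_id='1', cust_name='CUST_NAME_2')
select ACTIVE_EMUI_VERSION, DOB, DOJ, BIGINT_COLUMN1, BIGINT_COLUMN2,
       DECIMAL_COLUMN1, DECIMAL_COLUMN2, Double_COLUMN1, Double_COLUMN2,
       INTEGER_COLUMN1
from uniqdata_hive limit 10;
```

The same rule explains why the Hive/parquet table in the comment above fails identically, which is why the report is considered invalid.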
[jira] [Commented] (CARBONDATA-2004) Incorrect result displays while loading data into a partitioned table.
[ https://issues.apache.org/jira/browse/CARBONDATA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337120#comment-16337120 ] Sangeeta Gulia commented on CARBONDATA-2004: This issue is working fine on current master branch code. Please retest and close this bug [~Vandana7]. > Incorrect result displays while loading data into a partitioned table. > -- > > Key: CARBONDATA-2004 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2004 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: spark 2.1 >Reporter: Vandana Yadav >Priority: Major > Attachments: 2000_UniqData.csv, timestamp.png > > > Incorrect result displays while loading data into a partitioned table. > Steps to reproduce: > 1)create a partitioned table: > CREATE TABLE uniqdata_timestamp (CUST_ID int,CUST_NAME > String,ACTIVE_EMUI_VERSION string, DOJ timestamp, BIGINT_COLUMN1 > bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 > int) partitioned by(dob timestamp) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB") > 2) Load data into table: > LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' > into table uniqdata_timestamp partition(dob='1') OPTIONS > ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, > BIGINT_COLUMN1, > BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, > Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE'); > 3) Expected result: It should throw an error as invalid partition value. > 4) Actual Result: it displays notification of successful load, but there is > no data into the table > 5) execute query: select count(*) from uniqdata_timestamp; > output: > +---+--+ > | count(1) | > +---+--+ > | 0 | > +---+--+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
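The silent zero-row load described above is consistent with 'BAD_RECORDS_ACTION'='FORCE', which converts values that cannot be parsed (here the partition value dob='1' for a timestamp column) instead of rejecting the load. A hedged sketch, reusing the table and CSV from the report: switching the action to FAIL should make the load abort with an explicit error rather than report success with no data.

```sql
-- Sketch only: the same LOAD as in the report, but with
-- 'BAD_RECORDS_ACTION'='FAIL' so an unparsable partition value such as
-- dob='1' fails the load instead of being silently forced.
LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv'
INTO TABLE uniqdata_timestamp PARTITION (dob='1')
OPTIONS (
  'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1',
  'BAD_RECORDS_ACTION'='FAIL'
);
```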
[jira] [Resolved] (CARBONDATA-1962) Support alter table add columns/drop columns on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1962. Resolution: Implemented > Support alter table add columns/drop columns on S3 table > > > Key: CARBONDATA-1962 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1962 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1963) Support S3 table with dictionary
[ https://issues.apache.org/jira/browse/CARBONDATA-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1963. Resolution: Implemented > Support S3 table with dictionary > > > Key: CARBONDATA-1963 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1963 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1960) Add example for creating a local table and load CSV data which is stored in S3.
[ https://issues.apache.org/jira/browse/CARBONDATA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1960. Resolution: Implemented Fix Version/s: 1.4.0 > Add example for creating a local table and load CSV data which is stored in > S3. > --- > > Key: CARBONDATA-1960 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1960 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Trivial > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1961) Support data update/delete on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1961. Resolution: Implemented Fix Version/s: 1.4.0 > Support data update/delete on S3 table > -- > > Key: CARBONDATA-1961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1961 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1959) Support compaction on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1959. Resolution: Fixed Fix Version/s: 1.4.0 > Support compaction on S3 table > -- > > Key: CARBONDATA-1959 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1959 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Fix For: 1.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia resolved CARBONDATA-1827. Resolution: Fixed Fix Version/s: 1.4.0 > Add Support to provide S3 Functionality in Carbondata > - > > Key: CARBONDATA-1827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1827 > Project: CarbonData > Issue Type: Task > Components: core >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Fix For: 1.4.0 > > Time Spent: 29h 10m > Remaining Estimate: 0h > > Added Support to provide S3 Functionality in Carbondata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333924#comment-16333924 ] Sangeeta Gulia commented on CARBONDATA-1827: GitHub PR 1805 completes all the listed tasks. > Add Support to provide S3 Functionality in Carbondata > - > > Key: CARBONDATA-1827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1827 > Project: CarbonData > Issue Type: Task > Components: core >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Time Spent: 29h 10m > Remaining Estimate: 0h > > Added Support to provide S3 Functionality in Carbondata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-1673) Carbon 1.3.0-Partitioning:Show Partition for Range Partition is not showing the correct details.
[ https://issues.apache.org/jira/browse/CARBONDATA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317882#comment-16317882 ] Sangeeta Gulia commented on CARBONDATA-1673:
I am unable to replicate this issue. It is working correctly on current master branch code. These are the commands I have executed:
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd")
spark.sql("DROP TABLE IF EXISTS t0")
spark.sql("""
  | CREATE TABLE IF NOT EXISTS t0
  | (
  | id Int,
  | vin String,
  | phonenumber Long,
  | country String,
  | area String,
  | salary Int
  | )
  | PARTITIONED BY (logdate Timestamp)
  | STORED BY 'carbondata'
  | TBLPROPERTIES('PARTITION_TYPE'='RANGE',
  | 'RANGE_INFO'='2014/01/01, 2015/01/01, 2016/01/01')
""".stripMargin)
spark.sql("""show partitions t0""").show()
And below is my result after show partition:
+--------------------+
|           partition|
+--------------------+
|0, logdate = DEFAULT|
|1, logdate < 2014...|
|2, 2014/01/01 <= ...|
|3, 2015/01/01 <= ...|
+--------------------+
Please let me know if you are able to replicate it on current master branch code.
> Carbon 1.3.0-Partitioning:Show Partition for Range Partition is not showing > the correct details. > > > Key: CARBONDATA-1673 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1673 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Priority: Minor > Attachments: Range_recording.htm, Range_recording.swf > > > For description, please refer to the attachment. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1963) Support S3 table with dictionary
Sangeeta Gulia created CARBONDATA-1963: -- Summary: Support S3 table with dictionary Key: CARBONDATA-1963 URL: https://issues.apache.org/jira/browse/CARBONDATA-1963 Project: CarbonData Issue Type: Task Reporter: Sangeeta Gulia Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia updated CARBONDATA-1827: --- Issue Type: Task (was: New Feature) > Add Support to provide S3 Functionality in Carbondata > - > > Key: CARBONDATA-1827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1827 > Project: CarbonData > Issue Type: Task > Components: core >Reporter: Sangeeta Gulia >Assignee: Jatin >Priority: Minor > Time Spent: 9h > Remaining Estimate: 0h > > Added Support to provide S3 Functionality in Carbondata. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1962) Support alter table add columns/drop columns on S3 table
Sangeeta Gulia created CARBONDATA-1962: -- Summary: Support alter table add columns/drop columns on S3 table Key: CARBONDATA-1962 URL: https://issues.apache.org/jira/browse/CARBONDATA-1962 Project: CarbonData Issue Type: Task Reporter: Sangeeta Gulia Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1961) Support data update/delete on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia updated CARBONDATA-1961: --- Priority: Minor (was: Major) > Support data update/delete on S3 table > -- > > Key: CARBONDATA-1961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1961 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1961) Support data update/delete on S3 table
Sangeeta Gulia created CARBONDATA-1961: -- Summary: Support data update/delete on S3 table Key: CARBONDATA-1961 URL: https://issues.apache.org/jira/browse/CARBONDATA-1961 Project: CarbonData Issue Type: Task Reporter: Sangeeta Gulia -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1960) Add example for creating a local table and load CSV data which is stored in S3.
Sangeeta Gulia created CARBONDATA-1960: -- Summary: Add example for creating a local table and load CSV data which is stored in S3. Key: CARBONDATA-1960 URL: https://issues.apache.org/jira/browse/CARBONDATA-1960 Project: CarbonData Issue Type: Task Reporter: Sangeeta Gulia Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1959) Support compaction on S3 table
[ https://issues.apache.org/jira/browse/CARBONDATA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia updated CARBONDATA-1959: --- Priority: Minor (was: Major) > Support compaction on S3 table > -- > > Key: CARBONDATA-1959 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1959 > Project: CarbonData > Issue Type: Task >Reporter: Sangeeta Gulia >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1959) Support compaction on S3 table
Sangeeta Gulia created CARBONDATA-1959: -- Summary: Support compaction on S3 table Key: CARBONDATA-1959 URL: https://issues.apache.org/jira/browse/CARBONDATA-1959 Project: CarbonData Issue Type: Task Reporter: Sangeeta Gulia -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1758) Carbon1.3.0- No Inverted Index : Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/CARBONDATA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299650#comment-16299650 ] Sangeeta Gulia commented on CARBONDATA-1758:
[~chetdb] This is the result of my query after executing the entire sequence of queries you have mentioned.
0: jdbc:hive2://hadoop-master:1> Select CUST_ID from uniqdata_DI_int where CUST_ID is null;
+----------+
| CUST_ID  |
+----------+
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
+----------+
26 rows selected (0.408 seconds)
0: jdbc:hive2://hadoop-master:1>
> Carbon1.3.0- No Inverted Index : Select column with is null for > no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException > > > Key: CARBONDATA-1758 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1758 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node cluster >Reporter: Chetan Bhat > Labels: Functional > > Steps : > In Beeline user executes the queries in sequence. 
> CREATE TABLE uniqdata_DI_int (CUST_ID int,CUST_NAME > String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, > BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 > double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='cust_id','NO_INVERTED_INDEX'='cust_id'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/3000_UniqData.csv' into table > uniqdata_DI_int OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > Select count(CUST_ID) from uniqdata_DI_int; > Select count(CUST_ID)*10 as multiple from uniqdata_DI_int; > Select avg(CUST_ID) as average from uniqdata_DI_int; > Select floor(CUST_ID) as average from uniqdata_DI_int; > Select ceil(CUST_ID) as average from uniqdata_DI_int; > Select ceiling(CUST_ID) as average from uniqdata_DI_int; > Select CUST_ID*integer_column1 as multiple from uniqdata_DI_int; > Select CUST_ID from uniqdata_DI_int where CUST_ID is null; > *Issue : Select column with is null for no_inverted_index column throws > java.lang.ArrayIndexOutOfBoundsException* > 0: jdbc:hive2://10.18.98.34:23040> Select CUST_ID from uniqdata_DI_int where > CUST_ID is null; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 79.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 79.0 (TID 123, BLR114278, executor 18): > org.apache.spark.util.TaskCompletionListenerException: > java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105) > at org.apache.spark.scheduler.Task.run(Task.scala:112) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) > Expected : Select column with is null for no_inverted_index column should be > successful displaying the correct result set. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1055) Record count mismatch for Carbon query compared with Parquet for TPCH query 15
[ https://issues.apache.org/jira/browse/CARBONDATA-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297910#comment-16297910 ] Sangeeta Gulia commented on CARBONDATA-1055: I have tried this with 1 GB of data, and it works fine there. Can you provide more details? > Record count mismatch for Carbon query compared with Parquet for TPCH query 15 > -- > > Key: CARBONDATA-1055 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1055 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.1.0 > Environment: 3 node cluster >Reporter: Chetan Bhat > Attachments: TPCH_query15.rar > > Original Estimate: 504h > Remaining Estimate: 504h > > User creates a table and loads TPCH data into different tables. > User executes all the select queries and compares the record count and > performance of the Carbon queries with parquet queries. > Actual Issue : Record count mismatch for Carbon query compared with Parquet > for TPCH query 15. > Carbon record count for TPCH query 15 - 71972 > Parquet record count for TPCH query 15 - 72343 > Expected : There should not be record count mismatch for Carbon query > compared with Parquet for TPCH query 15. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1672) Carbon 1.3.0-Partitioning:Hash Partition is not working as specified in the document.
[ https://issues.apache.org/jira/browse/CARBONDATA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290813#comment-16290813 ] Sangeeta Gulia commented on CARBONDATA-1672:
This is the link to the latest documentation: https://carbondata.apache.org/data-management-on-carbondata.html which provides the following syntax for creating a hash partition table:
CREATE TABLE IF NOT EXISTS hash_partition_table(
  col_A String,
  col_B Int,
  col_C Long,
  col_D Decimal(10,2),
  col_F Timestamp
) PARTITIONED BY (col_E Long)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='HASH','NUM_PARTITIONS'='9')
Note the difference: the number of partitions is specified with the NUM_PARTITIONS property, not partition_num.
> Carbon 1.3.0-Partitioning:Hash Partition is not working as specified in the > document. > - > > Key: CARBONDATA-1672 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1672 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Priority: Minor > Attachments: Part2.PNG, Partition1.PNG > > > create table Carb_part (P_PARTKEY BIGINT,P_NAME STRING,P_MFGR STRING,P_BRAND > STRING,P_TYPE STRING,P_CONTAINER STRING,P_RETAILPRICE DOUBLE,P_COMMENT > STRING)PARTITIONED BY (P_SIZE int) STORED BY 'CARBONDATA' > TBLPROPERTIES('partition_type'='HASH','partition_num'='3'); > This command displays error as mentioned below: > Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: > Error: Invalid partition definition (state=,code=0) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CARBONDATA-1758) Carbon1.3.0- No Inverted Index : Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/CARBONDATA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290392#comment-16290392 ] Sangeeta Gulia edited comment on CARBONDATA-1758 at 12/14/17 12:50 PM: --- Please provide more details for this bug, as I am not able to replicate this issue either on my local system or on a 3 node cluster. It shows the result as per expectation. was (Author: sangeeta04): Please provide more details for this bug as i am not able to replicate this issue, neither on my local system or 3 node cluster. > Carbon1.3.0- No Inverted Index : Select column with is null for > no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException > > > Key: CARBONDATA-1758 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1758 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node cluster >Reporter: Chetan Bhat > Labels: Functional > > Steps : > In Beeline user executes the queries in sequence. 
> CREATE TABLE uniqdata_DI_int (CUST_ID int,CUST_NAME > String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, > BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 > double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='cust_id','NO_INVERTED_INDEX'='cust_id'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/3000_UniqData.csv' into table > uniqdata_DI_int OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > Select count(CUST_ID) from uniqdata_DI_int; > Select count(CUST_ID)*10 as multiple from uniqdata_DI_int; > Select avg(CUST_ID) as average from uniqdata_DI_int; > Select floor(CUST_ID) as average from uniqdata_DI_int; > Select ceil(CUST_ID) as average from uniqdata_DI_int; > Select ceiling(CUST_ID) as average from uniqdata_DI_int; > Select CUST_ID*integer_column1 as multiple from uniqdata_DI_int; > Select CUST_ID from uniqdata_DI_int where CUST_ID is null; > *Issue : Select column with is null for no_inverted_index column throws > java.lang.ArrayIndexOutOfBoundsException* > 0: jdbc:hive2://10.18.98.34:23040> Select CUST_ID from uniqdata_DI_int where > CUST_ID is null; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 79.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 79.0 (TID 123, BLR114278, executor 18): > org.apache.spark.util.TaskCompletionListenerException: > java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105) > at org.apache.spark.scheduler.Task.run(Task.scala:112) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) > Expected : Select column with is null for no_inverted_index column should be > successful displaying the correct result set. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1679) Carbon 1.3.0-Partitioning:After Splitting the Partition,no records are displayed
[ https://issues.apache.org/jira/browse/CARBONDATA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290778#comment-16290778 ] Sangeeta Gulia commented on CARBONDATA-1679:
I am unable to replicate this issue. It is working correctly as per current master code. My csv contains the below records:
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold requests alon|
2|BRAZIL|1|y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special |
3|CANADA|1|eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold|
4|EGYPT|4|y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d|
5|ETHIOPIA|0|ven packages wake quickly. regu|
6|FRANCE|3|refully final requests. regular, ironi|
7|GERMANY|3|l platelets. regular accounts x-ray: unusual, regular acco|
8|INDIA|2|ss excuses cajole slyly across the packages. deposits print aroun|
9|INDONESIA|2| slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull|
10|IRAN|4|efully alongside of the slyly final dependencies. |
11|IRAQ|4|nic deposits boost atop the quickly final requests? quickly regula|
12|JAPAN|2|ously. final, express gifts cajole a|
13|JORDAN|4|ic deposits are blithely about the carefully regular pa|
14|KENYA|0| pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t|
15|MOROCCO|0|rns. blithely bold courts among the closely regular packages use furiously bold platelets?|
16|MOZAMBIQUE|0|s. ironic, unusual asymptotes wake blithely r|
17|PERU|1|platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun|
18|CHINA|2|c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos|
19|ROMANIA|3|ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account|
20|SAUDI ARABIA|4|ts. silent requests haggle. closely express packages sleep across the blithely|
21|VIETNAM|2|hely enticingly express accounts. even, final |
22|RUSSIA|3| requests against the platelets use never according to the quickly regular pint|
23|UNITED KINGDOM|3|eans boost carefully special requests. accounts are. carefull|
24|UNITED STATES|1|y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be|
and below is my output for the queries after "ALTER TABLE part_nation_4 SPLIT PARTITION(5) INTO('(EGYPT,ETHIOPIA)','FRANCE');":
+--------------------+
|           partition|
+--------------------+
| 0, n_name = DEFAULT|
| 1, n_name = ALGERIA|
|2, n_name = ARGEN...|
|  3, n_name = BRAZIL|
|  4, n_name = CANADA|
|7, n_name = EGYPT...|
|  8, n_name = FRANCE|
|   6, n_name = JAPAN|
+--------------------+
+-----------+-----------+--------------------+------+
|n_nationkey|n_regionkey|           n_comment|n_name|
+-----------+-----------+--------------------+------+
|          6|          3|refully final req...|FRANCE|
+-----------+-----------+--------------------+------+
+-----------+-----------+--------------------+------+
|n_nationkey|n_regionkey|           n_comment|n_name|
+-----------+-----------+--------------------+------+
|          4|          4|y above the caref...| EGYPT|
+-----------+-----------+--------------------+------+
+-----------+-----------+--------------------+------+
|n_nationkey|n_regionkey|           n_comment|n_name|
+-----------+-----------+--------------------+------+
|          3|          1|eas hang ironic, ...|CANADA|
+-----------+-----------+--------------------+------+
Can you please retest with the csv records I have provided?
> Carbon 1.3.0-Partitioning:After Splitting the Partition,no records are > displayed > > > Key: CARBONDATA-1679 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1679 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma > Attachments: Split1.PNG > > > create table part_nation_4 (N_NATIONKEY BIGINT,N_REGIONKEY BIGINT,N_COMMENT > STRING) partitioned by (N_NAME STRING) stored by 'carbondata' > tblproperties('partition_type'='list','list_info'='ALGERIA,ARGENTINA,BRAZIL,CANADA,(EGYPT,ETHIOPIA,FRANCE),JAPAN'); > load data inpath '/spark-warehouse/tpchhive.db/nation/nation.tbl' into table > part_nation_4 > options('DELIMITER'='|','FILEHEADER'='N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT'); > show partitions part_nation_4; > ALTER TABLE part_nation_4 SPLIT PARTITION(5) > INTO('(EGYPT,ETHIOPIA)','FRANCE'); > show partitions part_nation_4; > select *
[jira] [Commented] (CARBONDATA-1758) Carbon1.3.0- No Inverted Index : Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/CARBONDATA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290392#comment-16290392 ] Sangeeta Gulia commented on CARBONDATA-1758: Please provide more details for this bug, as I am not able to replicate this issue either on my local system or on a 3 node cluster. > Carbon1.3.0- No Inverted Index : Select column with is null for > no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException > > > Key: CARBONDATA-1758 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1758 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node cluster >Reporter: Chetan Bhat > Labels: Functional > > Steps : > In Beeline user executes the queries in sequence. > CREATE TABLE uniqdata_DI_int (CUST_ID int,CUST_NAME > String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, > BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), > DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 > double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='cust_id','NO_INVERTED_INDEX'='cust_id'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/3000_UniqData.csv' into table > uniqdata_DI_int OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > Select count(CUST_ID) from uniqdata_DI_int; > Select count(CUST_ID)*10 as multiple from uniqdata_DI_int; > Select avg(CUST_ID) as average from uniqdata_DI_int; > Select floor(CUST_ID) as average from uniqdata_DI_int; > Select ceil(CUST_ID) as average from uniqdata_DI_int; > Select ceiling(CUST_ID) as average from uniqdata_DI_int; > Select CUST_ID*integer_column1 as multiple from uniqdata_DI_int; > Select CUST_ID from uniqdata_DI_int where CUST_ID is 
null; > *Issue : Select column with is null for no_inverted_index column throws > java.lang.ArrayIndexOutOfBoundsException* > 0: jdbc:hive2://10.18.98.34:23040> Select CUST_ID from uniqdata_DI_int where > CUST_ID is null; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 79.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 79.0 (TID 123, BLR114278, executor 18): > org.apache.spark.util.TaskCompletionListenerException: > java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105) > at org.apache.spark.scheduler.Task.run(Task.scala:112) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) > Expected : Select column with is null for no_inverted_index column should be > successful displaying the correct result set. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1865) Skip Single Pass for first data load.
[ https://issues.apache.org/jira/browse/CARBONDATA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279941#comment-16279941 ] Sangeeta Gulia commented on CARBONDATA-1865: This issue will be resolved with PR https://github.com/apache/carbondata/pull/1622 > Skip Single Pass for first data load. > - > > Key: CARBONDATA-1865 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1865 > Project: CarbonData > Issue Type: Task >Affects Versions: 1.3.0 >Reporter: Sangeeta Gulia >Assignee: anubhav tarar >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1827) Add Support to provide S3 Functionality in Carbondata
Sangeeta Gulia created CARBONDATA-1827: -- Summary: Add Support to provide S3 Functionality in Carbondata Key: CARBONDATA-1827 URL: https://issues.apache.org/jira/browse/CARBONDATA-1827 Project: CarbonData Issue Type: New Feature Components: core Reporter: Sangeeta Gulia Priority: Minor Added Support to provide S3 Functionality in Carbondata. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1460) Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions
[ https://issues.apache.org/jira/browse/CARBONDATA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia closed CARBONDATA-1460.
--
Resolution: Invalid
> Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions
>
> Key: CARBONDATA-1460
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1460
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.2.0
> Environment: Spark-2.1
> Reporter: Sangeeta Gulia
> Assignee: anubhav tarar
> Priority: Minor
>
> I am doing concurrency testing on the same table. For that, I started my thrift server with the command:
> sudo /home/hduser/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --master spark://host-name:7077 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.2.0.jar
> and two different nodes connect to the same thrift server through two beeline sessions.
> The beeline1 session executes the query: alter table uniqdata drop columns(cust_id);
> After the query completes, beeline1 gets the error:
> Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' given input columns: [bigint_column1, double_column1, dob, doj, active_emui_version, decimal_column2, bigint_column2, integer_column1, cust_name, double_column2, decimal_column1]; line 1 pos 7;
> Meanwhile, the beeline2 session is still able to read cust_id from the table with:
> select cust_id from uniqdata;
> Yet when both beeline sessions run "describe table uniqdata", they see the same result, which does not include cust_id as a column of the table.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1424) Delete Operation working incorrectly when subquery returns bad-record
[ https://issues.apache.org/jira/browse/CARBONDATA-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156924#comment-16156924 ] Sangeeta Gulia commented on CARBONDATA-1424:
Thanks for the information [~ravi.pesala]. I have verified the above query, and it works as you said. But I found one thing that is a bit confusing. I broke the first query down into two queries; ideally both forms should return the same result, but that is not the case. (It does, however, work the same way in Hive as in CarbonData.) Below are my queries and their results: the first query returns only 1 record whereas the third query returns 13 records, although both should give the same output.

QUERY1: select * from uniqdata1 where cust_id in (select cust_id from uniqdata1 limit 10);

| CUST_ID | CUST_NAME | ACTIVE_EMUI_VERSION | DOB | DOJ | BIGINT_COLUMN1 | BIGINT_COLUMN2 | DECIMAL_COLUMN1 | DECIMAL_COLUMN2 | Double_COLUMN1 | Double_COLUMN2 | INTEGER_COLUMN1 |
| 8999 | | | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |

1 row selected (10.485 seconds)

QUERY2: select cust_id from uniqdata1 limit 10;

| cust_id |
| NULL |
| 8999 |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |

10 rows selected (0.225 seconds)

QUERY3: select * from uniqdata1 where cust_id in (NULL, 8999);

| CUST_ID | CUST_NAME | ACTIVE_EMUI_VERSION | DOB | DOJ | BIGINT_COLUMN1 | BIGINT_COLUMN2 | DECIMAL_COLUMN1 | DECIMAL_COLUMN2 | Double_COLUMN1 | Double_COLUMN2 | INTEGER_COLUMN1 |
| NULL | | | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 8999 | | | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | | | NULL | NULL | 1233720368578 | NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | | | NULL | NULL | NULL | -223372036854 | NULL | NULL | NULL | NULL | NULL |
| NULL | | | NULL | NULL | NULL | NULL | 12345678901.123458 | NULL | NULL | NULL | NULL |
| NULL | | | NULL | NULL | NULL | NULL | NULL | 22345678901.123459 | NULL | NULL | NULL |
| NULL | | | NULL | NULL | NULL | NULL | NULL | NULL | 1.12345674897976E10 | NULL | NULL |
| NULL (remaining rows truncated in the archive)
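The single-row QUERY1 result follows from SQL's three-valued logic: a NULL on either side of IN evaluates to UNKNOWN, so NULL rows never match. A minimal SQLite sketch of this (not CarbonData itself; the table contents are hypothetical stand-ins for the uniqdata1 example, and SQLite follows the standard semantics, under which QUERY3 would likewise return only the non-NULL match):

```python
# Sketch with SQLite (not CarbonData): standard three-valued logic means
# "cust_id IN (...)" evaluates to UNKNOWN for NULL rows, so they never match.
# Table contents are hypothetical stand-ins for the uniqdata1 example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uniqdata1 (cust_id INTEGER)")
conn.executemany("INSERT INTO uniqdata1 VALUES (?)",
                 [(None,), (8999,)] + [(None,)] * 8)

# QUERY1-style: IN over a subquery that returns NULLs.
q1 = conn.execute(
    "SELECT cust_id FROM uniqdata1 "
    "WHERE cust_id IN (SELECT cust_id FROM uniqdata1 LIMIT 10)"
).fetchall()
print(q1)  # only the non-NULL id matches: [(8999,)]

# QUERY3-style: an explicit NULL literal in the IN list behaves the same.
q3 = conn.execute(
    "SELECT cust_id FROM uniqdata1 WHERE cust_id IN (NULL, 8999)"
).fetchall()
print(q3)  # [(8999,)]
```

The 13-record QUERY3 result reported above for Hive and CarbonData therefore differs from the standard semantics shown here, which is likely the source of the confusion.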
[jira] [Created] (CARBONDATA-1460) Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions
Sangeeta Gulia created CARBONDATA-1460:
--
Summary: Drop column in alter table working incorrectly when connected to same thrift using different beeline sessions
Key: CARBONDATA-1460
URL: https://issues.apache.org/jira/browse/CARBONDATA-1460
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.2.0
Environment: Spark-2.1
Reporter: Sangeeta Gulia
Priority: Minor

I am doing concurrency testing on the same table. For that, I started my thrift server with the command:

sudo /home/hduser/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --master spark://host-name:7077 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.2.0.jar

and two different nodes connect to the same thrift server through two beeline sessions. The beeline1 session executes the query:

alter table uniqdata drop columns(cust_id);

After the query completes, beeline1 gets the error:

Error: org.apache.spark.sql.AnalysisException: cannot resolve '`cust_id`' given input columns: [bigint_column1, double_column1, dob, doj, active_emui_version, decimal_column2, bigint_column2, integer_column1, cust_name, double_column2, decimal_column1]; line 1 pos 7;

Meanwhile, the beeline2 session is still able to read cust_id from the table with:

select cust_id from uniqdata;

Yet when both beeline sessions run "describe table uniqdata", they see the same result, which does not include cust_id as a column of the table.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp data type.
[ https://issues.apache.org/jira/browse/CARBONDATA-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia closed CARBONDATA-1431.
--
Resolution: Fixed
> Dictionary_Include working incorrectly for date and timestamp data type.
>
> Key: CARBONDATA-1431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
> Project: CarbonData
> Issue Type: Bug
> Components: sql, test
> Affects Versions: 1.2.0
> Reporter: Sangeeta Gulia
> Assignee: Pallavi Singh
> Priority: Minor
> Fix For: 1.2.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When we create a table with date and timestamp columns listed in DICTIONARY_INCLUDE, for example:
> CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')
> it should either create dictionaries for the date and timestamp fields or throw an error stating that DICTIONARY_INCLUDE is not supported for date and timestamp. On the current master branch, however, the query executes successfully without any error, and no dictionary files are created for the date and timestamp fields.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp data type.
[ https://issues.apache.org/jira/browse/CARBONDATA-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangeeta Gulia updated CARBONDATA-1431:
---
Summary: Dictionary_Include working incorrectly for date and timestamp data type. (was: Dictionary_Include working incorrectly for date and timestamp format.)
> Dictionary_Include working incorrectly for date and timestamp data type.
>
> Key: CARBONDATA-1431
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
> Project: CarbonData
> Issue Type: Bug
> Components: sql, test
> Affects Versions: 1.2.0
> Reporter: Sangeeta Gulia
> Priority: Minor
> Fix For: 1.2.0
>
> When we create a table with date and timestamp columns listed in DICTIONARY_INCLUDE, for example:
> CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')
> it should either create dictionaries for the date and timestamp fields or throw an error stating that DICTIONARY_INCLUDE is not supported for date and timestamp. On the current master branch, however, the query executes successfully without any error, and no dictionary files are created for the date and timestamp fields.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1431) Dictionary_Include working incorrectly for date and timestamp format.
Sangeeta Gulia created CARBONDATA-1431:
--
Summary: Dictionary_Include working incorrectly for date and timestamp format.
Key: CARBONDATA-1431
URL: https://issues.apache.org/jira/browse/CARBONDATA-1431
Project: CarbonData
Issue Type: Bug
Components: sql, test
Affects Versions: 1.2.0
Reporter: Sangeeta Gulia
Priority: Minor
Fix For: 1.2.0

When we create a table with date and timestamp columns listed in DICTIONARY_INCLUDE, for example:

CREATE TABLE uniqdata_INCLUDEDICTIONARY2 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2')

it should either create dictionaries for the date and timestamp fields or throw an error stating that DICTIONARY_INCLUDE is not supported for date and timestamp. On the current master branch, however, the query executes successfully without any error, and no dictionary files are created for the date and timestamp fields.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1424) Delete Operation working incorrectly when subquery returns bad-record
Sangeeta Gulia created CARBONDATA-1424:
--
Summary: Delete Operation working incorrectly when subquery returns bad-record
Key: CARBONDATA-1424
URL: https://issues.apache.org/jira/browse/CARBONDATA-1424
Project: CarbonData
Issue Type: Bug
Components: sql, test
Affects Versions: 1.2.0
Reporter: Sangeeta Gulia
Priority: Minor
Attachments: 3000_UniqData.csv

The delete operation works incorrectly when a subquery returns a bad record for a particular table. For the query:

delete from uniqdata_delete where cust_id in (select cust_id from uniqdata_delete limit 10);

if, as an example, "select cust_id from uniqdata_delete limit 10" returns

| cust_id |
| NULL |
| NULL |
| NULL |
| NULL |
| 11000 |
| 11001 |
| 11002 |
| 11003 |
| 11004 |
| 11005 |

then the query should delete all rows where cust_id is NULL or matches any of the returned values (11000-11005), whereas it deletes only the records whose cust_id is in 11000-11005. I have attached the sample csv file that I used for reference.

To reproduce the issue, you can use the commands below:

CREATE TABLE uniqdata_delete (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

LOAD DATA INPATH 'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table uniqdata_delete OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

NOTE: The load should be such that the starting rows of data have NULL stored in the cust_id field.
delete from uniqdata_delete where cust_id in (select cust_id from uniqdata_delete limit 10); -- This message was sent by Atlassian JIRA (v6.4.14#64029)
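The observed behavior actually matches standard NULL handling for IN: rows where cust_id is NULL compare as UNKNOWN and are not deleted, so deleting them requires an explicit IS NULL predicate. A minimal SQLite sketch (not CarbonData itself; the table name is borrowed from the report and the data is illustrative):

```python
# Sketch with SQLite (not CarbonData): DELETE ... WHERE x IN (list with NULL)
# removes only the concrete matches; NULL rows compare as UNKNOWN and survive.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uniqdata_delete (cust_id INTEGER)")
conn.executemany("INSERT INTO uniqdata_delete VALUES (?)",
                 [(None,)] * 4 + [(i,) for i in range(11000, 11006)])

conn.execute(
    "DELETE FROM uniqdata_delete WHERE cust_id IN "
    "(NULL, 11000, 11001, 11002, 11003, 11004, 11005)"
)
remaining = conn.execute(
    "SELECT COUNT(*) FROM uniqdata_delete").fetchone()[0]
print(remaining)  # the 4 NULL rows are untouched: 4

# To also delete NULL rows, the predicate must say so explicitly:
conn.execute("DELETE FROM uniqdata_delete "
             "WHERE cust_id IN (11000, 11005) OR cust_id IS NULL")
remaining = conn.execute(
    "SELECT COUNT(*) FROM uniqdata_delete").fetchone()[0]
print(remaining)  # 0
```

So the expected behavior described in the report (NULL rows deleted by an IN list containing NULL) would itself be nonstandard; an explicit "or cust_id is null" is the portable way to express it.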
[jira] [Created] (CARBONDATA-1412) delete working incorrectly while using segment.starttime before ''
Sangeeta Gulia created CARBONDATA-1412:
--
Summary: delete working incorrectly while using segment.starttime before ''
Key: CARBONDATA-1412
URL: https://issues.apache.org/jira/browse/CARBONDATA-1412
Project: CarbonData
Issue Type: Bug
Components: data-query, test
Environment: Spark-2.1
Reporter: Sangeeta Gulia
Priority: Minor
Fix For: 1.2.0

The issue exists in the query below:

delete from table uniqdata_delete where segment.starttime before 'starttime_of_last_segment_created';

It should mark for delete only those segments whose start time is strictly before the given time, and should not delete a segment whose start time equals the given time. However, it also marks for delete the segment that has exactly that start time.

To replicate the issue:

CREATE TABLE uniqdata_delete (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")

LOAD DATA INPATH 'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table uniqdata_delete OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

LOAD DATA INPATH 'hdfs://localhost:54310/user/hduser/input-files/3000_UniqData.csv' into table uniqdata_delete OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')

delete from table uniqdata_delete where segment.starttime before 'starttime_of_last_segment_created';

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
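The expected "before" semantics described above amount to a strict comparison on the segment start time, and the reported bug behaves like "<=" instead of "<". A small Python sketch of the two predicates (segment ids and timestamps are illustrative, not CarbonData's internal representation):

```python
# Illustrative sketch: segments with hypothetical load start times.
from datetime import datetime

segments = [
    ("0", datetime(2017, 8, 28, 10, 0, 0)),
    ("1", datetime(2017, 8, 28, 10, 5, 0)),  # last segment created
]
cutoff = datetime(2017, 8, 28, 10, 5, 0)  # starttime of the last segment

# Expected: strict "<", so a segment starting exactly at the cutoff is kept.
marked_expected = [sid for sid, start in segments if start < cutoff]
print(marked_expected)  # ['0']

# Reported behavior is equivalent to "<=", which also marks segment "1".
marked_buggy = [sid for sid, start in segments if start <= cutoff]
print(marked_buggy)  # ['0', '1']
```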