[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1329/ ---
[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/1542

[CARBONDATA-1757] [PreAgg] Fix for wrong avg values after pre-agg table creation when a sum/count aggregation function is applied on the same column along with avg.

The plan being transformed was adding two columns for sum/count, which resulted in wrong data being inserted.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [X] Any interfaces changed? No
- [X] Any backward compatibility impacted? No
- [X] Document update required? No
- [X] Testing done: test case added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata pre_agg_avg_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1542.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1542

commit f05ba4faabc70c19574ed200ddb88909d69cf6e3
Author: kunal642
Date: 2017-11-21T07:25:25Z

fixed wrong avg count bug when a sum/count aggregation function is applied on the same column along with avg. The plan being transformed was adding two columns for sum/count, which resulted in wrong data being inserted.

---
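A minimal sketch of the pattern this PR fixes, with hypothetical table and column names (sales, amount) rather than the PR's actual test case:

```
// Hedged sketch of the failing pattern: sum and avg on the same column,
// rolled up through an aggregate datamap.
sql("create table sales (id int, name string, amount double) STORED BY 'org.apache.carbondata.format'")
sql("create datamap sales_agg on table sales using 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select name, sum(amount), avg(amount) from sales group by name")
// Before this fix, the transformed plan added the sum/count columns backing
// avg a second time, so the data rolled into the child table (and hence avg) was wrong.
sql("select name, sum(amount), avg(amount) from sales group by name").show()
```

---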
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1784/ ---
[jira] [Assigned] (CARBONDATA-1757) Carbon 1.3.0- Pre_aggregate: After creating datamap on parent table, avg is not correct.
[ https://issues.apache.org/jira/browse/CARBONDATA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor reassigned CARBONDATA-1757:

Assignee: Kunal Kapoor

> Carbon 1.3.0- Pre_aggregate: After creating datamap on parent table, avg is not correct.
>
> Key: CARBONDATA-1757
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1757
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.3.0
> Reporter: Ayushi Sharma
> Assignee: Kunal Kapoor
> Labels: functional
>
> Steps:
> 1. create table cust_2 (c_custkey int, c_name string, c_address string, c_nationkey bigint, c_phone string,c_acctbal decimal, c_mktsegment string, c_comment string) STORED BY 'org.apache.carbondata.format';
> 2. load data inpath 'hdfs://hacluster/customer/customer3.csv' into table cust_2 options('DELIMITER'='|','QUOTECHAR'='"','FILEHEADER'='c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment');
>    (customer3.csv is loaded twice; the same load statement, with identical options, is then repeated once each for customer4.csv through customer14.csv)
> 3. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 4. set carbon.input.segments.default.cust_2=0,1;
> 5. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 6. CREATE DATAMAP tt1 ON TABLE cust_2 USING "org.apache.carbondata.datamap.AggregateDataMapHandler" AS SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 7. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
> 8. set carbon.input.segments.default.cust_2=*;
> 9. SELECT c_custkey, c_name, sum(c_acctbal), avg(c_acctbal) FROM cust_2 GROUP BY c_custkey, c_name;
>
> Issue:
> After creating the datamap, avg is not correct.
> Expected Output:
> Avg should be displayed correctly.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
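For background on why avg is the fragile aggregate here: an average cannot be stored directly in a pre-aggregate table (averages of averages are wrong), so avg(c_acctbal) has to be maintained as a sum column plus a count column and recombined at query time. A minimal sketch of that recombination, using hypothetical rolled-up table/column names (cust_2_tt1, sum_acctbal, count_acctbal), not CarbonData's actual generated names:

```
// Hedged sketch: answering the avg query from the rolled-up sum/count columns.
sql("select c_custkey, c_name, sum(sum_acctbal), sum(sum_acctbal) / sum(count_acctbal) from cust_2_tt1 group by c_custkey, c_name").show()
```

If either backing column is inserted twice during datamap load, as in CARBONDATA-1757, this recombination silently returns wrong averages.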
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1537 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1783/ ---
[jira] [Created] (CARBONDATA-1787) Carbon 1.3.0- Global Sort: Global_Sort_Partitions parameter doesn't work, if specified in the Tblproperties, while creating the table.
Ayushi Sharma created CARBONDATA-1787:

Summary: Carbon 1.3.0- Global Sort: Global_Sort_Partitions parameter doesn't work, if specified in the Tblproperties, while creating the table.
Key: CARBONDATA-1787
URL: https://issues.apache.org/jira/browse/CARBONDATA-1787
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.3.0
Reporter: Ayushi Sharma
Priority: Minor

Steps:
1. create table tstcust(c_custkey int, c_name string, c_address string, c_nationkey bigint, c_phone string,c_acctbal decimal, c_mktsegment string, c_comment string) STORED BY 'org.apache.carbondata.format' tblproperties('sort_scope'='global_sort','GLOBAL_SORT_PARTITIONS'='2');

Issue:
GLOBAL_SORT_PARTITIONS does not take effect when specified in tblproperties at table creation, whereas the same property works when specified as a data-load option.

Expected:
Either specifying the property where it is not honoured should throw an error, as it does for sort_scope in the load, or the documentation should be updated accordingly.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
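Per the report the property does take effect as a load option; a hedged sketch of the working form (the CSV path is illustrative):

```
// GLOBAL_SORT_PARTITIONS is honoured as a LOAD DATA option (per the report),
// while the same key in TBLPROPERTIES at create time is silently ignored.
sql("load data inpath 'hdfs://hacluster/customer/customer3.csv' into table tstcust options('DELIMITER'='|', 'GLOBAL_SORT_PARTITIONS'='2')")
```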
[GitHub] carbondata issue #1167: [CARBONDATA-1304] [IUD BuggFix] Iud with single pass
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1167 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1327/ ---
[jira] [Created] (CARBONDATA-1786) Getting null pointer exception while loading data into table and while fetching data getting NULL values
Vandana Yadav created CARBONDATA-1786:

Summary: Getting null pointer exception while loading data into table and while fetching data getting NULL values
Key: CARBONDATA-1786
URL: https://issues.apache.org/jira/browse/CARBONDATA-1786
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.3.0
Environment: spark 2.1
Reporter: Vandana Yadav
Priority: Blocker
Attachments: 2000_UniqData.csv

Getting a null pointer exception while loading data into the table, and NULL values when fetching data.

Steps to reproduce:

1) Create table:
CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data:
LOAD DATA INPATH 'hdfs://localhost:54310/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'='/' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','TIMESTAMPFORMAT'='yyyy-mm-dd hh:mm:ss');

3) Expected result: it should load data into the table successfully.

4) Actual result: it throws an error:
Error: java.lang.NullPointerException (state=,code=0)

Logs:
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.delete(AbstractDFSCarbonFile.java:142)
at org.apache.carbondata.processing.util.DeleteLoadFolders.physicalFactAndMeasureMetadataDeletion(DeleteLoadFolders.java:79)
at org.apache.carbondata.processing.util.DeleteLoadFolders.deleteLoadFoldersFromFileSystem(DeleteLoadFolders.java:134)
at org.apache.carbondata.spark.rdd.DataManagementFunc$.deleteLoadsAndUpdateMetadata(DataManagementFunc.scala:188)
at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:281)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.loadData(LoadTableCommand.scala:347)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:183)
at org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:64)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:220)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:163)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:160)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:173)
at ja
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1540 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1782/ ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1541 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1781/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1326/ ---
[GitHub] carbondata issue #1541: [CARBONDATA-1785][Build] add coveralls badge to carb...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1541 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1325/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1536 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1780/ ---
[GitHub] carbondata pull request #1541: [CARBONDATA-1785][Build] add coveralls badge ...
GitHub user sraghunandan opened a pull request:

https://github.com/apache/carbondata/pull/1541

[CARBONDATA-1785][Build] add coveralls badge to carbondata

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done NA
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sraghunandan/carbondata-1 coverage_badge

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1541.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1541

commit b6b38a11ef4f97ef57168aa987d45dbed338b8eb
Author: sraghunandan
Date: 2017-11-21T05:28:14Z

add coveralls badge to carbondata

---
[jira] [Created] (CARBONDATA-1785) Add Coveralls codecoverage badge to carbondata
Venkata Ramana G created CARBONDATA-1785:

Summary: Add Coveralls codecoverage badge to carbondata
Key: CARBONDATA-1785
URL: https://issues.apache.org/jira/browse/CARBONDATA-1785
Project: CarbonData
Issue Type: Improvement
Reporter: Venkata Ramana G
Priority: Minor

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1324/ ---
[GitHub] carbondata pull request #1416: [WIP] [CARBONDATA-1592] Adding event listener...
Github user manishgupta88 closed the pull request at: https://github.com/apache/carbondata/pull/1416 ---
[GitHub] carbondata issue #1416: [WIP] [CARBONDATA-1592] Adding event listener interf...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/1416 Code already merged as part of PR #1473 ---
[GitHub] carbondata pull request #1539: [CARBONDATA-1780] Create configuration from S...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1539 ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1539 LGTM ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1323/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1321/ ---
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1540 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1322/ ---
[GitHub] carbondata issue #1503: [CARBONDATA-1730] Support skip.header.line.count opt...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1503 @sehriff please squash all commits to one commit ---
[GitHub] carbondata pull request #1540: [CARBONDATA-1784] clear column group code
GitHub user chenliang613 opened a pull request:

https://github.com/apache/carbondata/pull/1540

[CARBONDATA-1784] clear column group code

Clear column group code.

- [X] Any interfaces changed? NA
- [X] Any backward compatibility impacted? NA
- [X] Document update required? NA
- [X] Testing done NA
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenliang613/carbondata col_group

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1540.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1540

commit 0324cce43cde7cb753b6fe958ec01d6312acabe4
Author: chenliang613
Date: 2017-11-21T03:04:49Z

[CARBONDATA-1784] clear column group code

---
[jira] [Created] (CARBONDATA-1784) Clear column group code
Liang Chen created CARBONDATA-1784:

Summary: Clear column group code
Key: CARBONDATA-1784
URL: https://issues.apache.org/jira/browse/CARBONDATA-1784
Project: CarbonData
Issue Type: Task
Components: core
Reporter: Liang Chen
Assignee: Liang Chen
Priority: Minor

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1499: [WIP][CARBONDATA-1235]Add Lucene Datamap
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1499 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1320/ ---
[GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1538 @bhavya411 please add a detailed description for this pull request. ---
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1538#discussion_r152168110

--- Diff: integration/presto/pom.xml ---
@@ -31,7 +31,7 @@ presto-plugin
-0.186
--- End diff --

Why was the presto version changed again?

---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1496 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1319/ ---
[GitHub] carbondata pull request #1516: [CARBONDATA-1729]Fix the compatibility issue ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1516 ---
[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all
[ https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260118#comment-16260118 ] xuchuanyin commented on CARBONDATA-1778:

[~chenerlu] Aren't the garbage segments cleaned periodically? Would it be better to leave this work to a timer?

> Support clean garbage segments for all
>
> Key: CARBONDATA-1778
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1778
> Project: CarbonData
> Issue Type: Improvement
> Reporter: chenerlu
> Assignee: chenerlu
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1777:

Assignee: kumar vishal (was: Kunal Kapoor)

> Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
>
> Key: CARBONDATA-1777
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1777
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Assignee: kumar vishal
> Priority: Minor
> Labels: DFX
> Fix For: 1.3.0
>
> Steps:
> Beeline:
> 1. Create table and load it with data
> Spark-shell:
> 1. Create a pre-aggregate table
> Beeline:
> 1. Run the aggregate query
> Expected: Pre-aggregate table should be used in the aggregate query
> Actual: Pre-aggregate table is not used
>
> 1.
> create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> 2.
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show();
> 3.
> select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus;
>
> Actual:
> 0: jdbc:hive2://10.18.98.136:23040> show tables;
> +-----------+---------------------------+--------------+
> | database  | tableName                 | isTemporary  |
> +-----------+---------------------------+--------------+
> | test_db2  | lineitem1                 | false        |
> | test_db2  | lineitem1_agr1_lineitem1  | false        |
> +-----------+---------------------------+--------------+
> 2 rows selected (0.047 seconds)
>
> Logs:
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371)
> 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589)
> 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, initialize called | org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289)
> 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77)
> 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, underlying DB is MYSQL | org.apache.hadoop.hive.metastore.MetaStoreDirectSql.
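A quick way to check whether the aggregate query is actually served by the pre-aggregate table is to inspect which table the physical plan scans, assuming the rewrite is visible in Spark's explain output:

```
// Hedged sketch: run from the beeline/spark session under test and check
// whether lineitem1 or lineitem1_agr1_lineitem1 appears in the scan.
sql("explain select l_returnflag, l_linestatus, sum(l_quantity), avg(l_quantity), count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show(false)
```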
[jira] [Updated] (CARBONDATA-1783) (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
[ https://issues.apache.org/jira/browse/CARBONDATA-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1783:

Description:

Steps:

The Spark thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the user creates the table and loads data into it as shown below.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "all_datatypes_2048"
sql(s"create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, Latest_country string, Latest_province string, Latest_city string, Latest_district string, Latest_stree
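The description above is cut off mid-schema. To illustrate the failure mode named in the title, a filter query against the streaming table would look something like the sketch below; the predicate is illustrative, not the reporter's exact query:

```
// Hedged sketch: a filter query on the streaming table, which per the report
// fails with "Failed to filter row in vector reader" on streaming segments.
sql("select imei, AMSize from all_datatypes_2048 where deviceInformationId = 100000").show()
```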
[jira] [Updated] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1782:

Description:

Steps:

The thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the streaming table is created and data is loaded into it.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "uniqdata"
sql(s"CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true')")
sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS( 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')")
val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore.lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
val port = 8006
val serverSocket = new ServerSocket(port)
val socketThread = writeSocket(serverSocket)
val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port)

From Beeline the user executes the query:

select regexp_extract(CUST_NAME,'a',1) from uniqdata where regexp_extract(CUS
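The reporter's query is truncated above; judging from the issue title (regexp_extract in a where clause with is null), the shape of the failing statement is presumably:

```
// Hedged reconstruction of the query shape from the title; the exact
// predicate in the report is cut off above.
sql("select regexp_extract(CUST_NAME, 'a', 1) from uniqdata where regexp_extract(CUST_NAME, 'a', 1) is null").show()
```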
[jira] [Created] (CARBONDATA-1783) (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
Chetan Bhat created CARBONDATA-1783:

Summary: (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
Key: CARBONDATA-1783
URL: https://issues.apache.org/jira/browse/CARBONDATA-1783
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: 3 node ant cluster
Reporter: Chetan Bhat

Steps:

The Spark thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the user creates the table and loads data into it as shown below.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "all_datatypes_2048"
sql(s"create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeData
[jira] [Created] (CARBONDATA-1782) (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
Chetan Bhat created CARBONDATA-1782:

Summary: (Carbon1.3.0 - Streaming) Select regexp_extract from table with where clause having is null throws indexoutofbounds exception
Key: CARBONDATA-1782
URL: https://issues.apache.org/jira/browse/CARBONDATA-1782
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.3.0
Environment: 3 node ant cluster
Reporter: Chetan Bhat

Steps:

The thrift server is started using the command:

bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/sparkhive/warehouse"

The Spark shell is launched using the command:

bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

From the Spark shell the streaming table is created and data is loaded into it.

import java.io.{File, PrintWriter}
import java.net.ServerSocket
import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._
val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store")
carbonSession.sparkContext.setLogLevel("INFO")
def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client to connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write 5 records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath).option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "uniqdata"
sql(s"CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true')")
sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS( 'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')")
val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore.lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdenti
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1779/ ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1534 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1318/ ---
[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1484 ---
[jira] [Resolved] (CARBONDATA-1700) Failed to load data to existed table after spark session restarted
[ https://issues.apache.org/jira/browse/CARBONDATA-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1700.

Resolution: Fixed

> Failed to load data to existed table after spark session restarted
>
> Key: CARBONDATA-1700
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1700
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Fix For: 1.3.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> # scenario
> I encountered a failure loading data into an existing carbondata table after querying the table in a restarted spark session. I hit this failure in spark local mode (found it during a local test) and haven't tested other scenarios.
> The problem can be reproduced by the following steps:
> 0. START: start a session;
> 1. CREATE: create table `t1`;
> 2. LOAD: create a dataframe and write append to `t1`;
> 3. STOP: stop the current session;
> 4. START: start a session;
> 5. QUERY: query table `t1`; this step is essential to reproduce the problem.
> 6. LOAD: create a dataframe and write append to `t1`; this step will fail.
> The error is thrown in step 6. The error message in the console looks like:
> ```
> java.lang.NullPointerException was thrown.
> java.lang.NullPointerException
> at org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:92)
> at org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:60)
> at org.apache.spark.sql.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:141)
> at org.apache.spark.sql.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:50)
> at org.apache.spark.sql.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42)
> at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:110)
> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
> ```
> The following code can be pasted into `TestLoadDataFrame.scala` to reproduce this problem, but keep in mind you should manually run the first test and then the second in different iterations (to make sure that the sparksession is restarted).
> ```
> test("prepare") {
>   sql("drop table if exists carbon_stand_alone")
>   sql("create table if not exists carbon_stand_alone (c1 string, c2 string, c3 int)" +
>     " stored by 'carbondata'").collect()
>   sql("select * from carbon_stand_alone").show()
>   df.write
>     .format("carbondata")
>     .option("tableName", "carbon_stand_alone")
>     .option("tempCSV", "false")
>     .mode(SaveMode.Append)
>     .save()
> }
> test("test load dataframe after query") {
>   sql("select * from carbon_stand_alone").show()
>   // the following line will cause failure
>   df.write
>     .format("carbondata")
>     .option("tableName", "carbon_stand_alone")
>     .option("tempCSV", "false")
>     .mode(SaveMode.Append)
>     .save()
>   // if it works fine, it should be true
>   checkAnswer(
>     sql("select count(*) from carbon_stand_alone where c3 > 500"),
>     Row(31500 * 2)
>   )
> }
> ```
> # ANALYSE
> I went through the code and found the problem was caused by a NULL `tableProperties`: `tableMeta.carbonTable.getTableInfo.getFactTable.getTableProperties` (we will name it `propertyInTableInfo` for short) is null at line 89 in `LoadTableCommand.scala`.
> After debugging, I found that the `propertyInTableInfo` set in `CarbonTableInputFormat.setTableInfo(...)` had the correct value, but `CarbonTableInputFormat.getTableInfo(...)` returned the incorrect value. The setter serializes TableInfo, while the getter deserializes it, which means something is wrong in serialization-deserialization.
> Digging further into the code, I found that serialization and deserialization in `TableSchema`, a member of `TableInfo`, ignore the `tableProperties` member, leaving this value empty after deserialization. Since this value is not initialized in the constructor, it remains `NULL` and causes the NPE.
> # RESOLVE
> 1. Initialize `tableProperties` in `TableSchema`.
> 2. Include `tableProperties` in serialization-deserialization of `TableSchema`.
> # Notes
> Although the bug has been fixed, I still can't understand why the problem can be triggered in the above way.
> Tests need the sparksession to be restarted, which is impossible currently, so no tests will be added.

-- This message was sent by Atlassian JIRA (v6.4.
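The failure mode described in the ANALYSE section, a field skipped by custom serialization coming back as NULL, is easy to reproduce in isolation. A self-contained sketch with a hypothetical class (not CarbonData's actual TableSchema):

```
// Hedged illustration of the bug class: a custom writeObject/readObject pair
// that skips a field leaves it null after a serialize/deserialize round trip.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

class Schema(var name: String) extends Serializable {
  var tableProperties: java.util.Map[String, String] = new java.util.HashMap()
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeUTF(name) // tableProperties is (wrongly) never written
  }
  private def readObject(in: ObjectInputStream): Unit = {
    name = in.readUTF() // tableProperties is never restored; the constructor
                        // does not run during deserialization, so it stays null
  }
}

val bytes = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bytes)
oos.writeObject(new Schema("t1"))
oos.close()
val restored = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  .readObject().asInstanceOf[Schema]
println(restored.tableProperties) // null, so any later access NPEs, as in LoadTableCommand
```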
[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1484 LGTM ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1778/ ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1539 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1317/ ---
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1514 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1777/ ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1539 retest this please ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1516 LGTM ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1534 @sgururajshetty ok ---
[jira] [Updated] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when str
[ https://issues.apache.org/jira/browse/CARBONDATA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1781: Description: *Steps :* Thrift server is started using the command - bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store" Spark shell is opened using the command - bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar >From spark shell the below code is executed - import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") carbonSession.sparkContext.setLogLevel("INFO") def sql(sql: String) = carbonSession.sql(sql) def writeSocket(serverSocket: ServerSocket): Thread = { val thread = new Thread() { override def run(): Unit = { // wait for client to connection request and accept val clientSocket = serverSocket.accept() val socketWriter = new PrintWriter(clientSocket.getOutputStream()) var index = 0 for (_ <- 1 to 1000) { // write 5 records per iteration for (_ <- 0 to 100) { index = index + 1 socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index) } socketWriter.flush() Thread.sleep(2000) } socketWriter.close() System.out.println("Socket closed") } } thread.start() thread } def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = { val thread = new Thread() { override def run(): Unit = { var qry: StreamingQuery = null try { val readSocketDF = spark.readStream .format("socket") .option("host", "10.18.98.34") .option("port", port) .load() qry = readSocketDF.writeStream .format("carbondata") .trigger(ProcessingTime("5 seconds")) .option("checkpointLocation", tablePath.getStreamingCheckpointDir) .option("tablePath", tablePath.getPath).option("tableName", tableName) .start() qry.awaitTermination() } catch { case ex: Throwable => ex.printStackTrace() println("Done reading and writing streaming data") } finally { qry.stop() } } } thread.start() thread } val streamTableName = "brinjal" sql(s"drop table brinjal").show sql(s"create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true','table_blocksize'='1')") sql(s"LOAD DATA INPATH 
'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge')") val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore. lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier) val port = 8002 val serverSocket = new ServerSocket(port) val socketThread = writeSocket(serverSocket) val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port) >From other terminal user deletes the .streaming file - >BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs -rm >-r /user/hive/ware
[jira] [Updated] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when str
[ https://issues.apache.org/jira/browse/CARBONDATA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-1781: Summary: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS or thrift server is killed when streaming in progress (was: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS) > (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) > is success when .streaming file is removed from HDFS or thrift server is > killed when streaming in progress > --- > > Key: CARBONDATA-1781 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1781 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster >Reporter: Chetan Bhat > Labels: DFX > > *Steps :* > Thrift server is started using the command - bin/spark-submit --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --class > org.apache.carbondata.spark.thriftserver.CarbonThriftServer > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > "hdfs://hacluster/user/hive/warehouse/carbon.store" > Spark shell is opened using the command - bin/spark-shell --master > yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G > --num-executors 3 --jars > /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar > From spark shell the below code is executed - > import java.io.{File, PrintWriter} > import java.net.ServerSocket > import org.apache.spark.sql.{CarbonEnv, SparkSession} > import org.apache.spark.sql.hive.CarbonRelation > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} > import org.apache.carbondata.core.constants.CarbonCommonConstants > import org.apache.carbondata.core.util.CarbonProperties > import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} > CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, > "/MM/dd") > import org.apache.spark.sql.CarbonSession._ > val carbonSession = SparkSession. > builder(). > appName("StreamExample"). 
> > getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") > > carbonSession.sparkContext.setLogLevel("INFO") > def sql(sql: String) = carbonSession.sql(sql) > def writeSocket(serverSocket: ServerSocket): Thread = { > val thread = new Thread() { > override def run(): Unit = { > // wait for client to connection request and accept > val clientSocket = serverSocket.accept() > val socketWriter = new PrintWriter(clientSocket.getOutputStream()) > var index = 0 > for (_ <- 1 to 1000) { > // write 5 records per iteration > for (_ <- 0 to 100) { > index = index + 1 > socketWriter.println(index.toString + ",name_" + index >+ ",city_" + index + "," + (index * > 1.00).toString + >",school_" + index + ":school_" + index + > index + "$" + index) > } > socketWriter.flush() > Thread.sleep(2000) > } > socketWriter.close() > System.out.println("Socket closed") > } > } > thread.start() > thread > } > > def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, > tableName: String, port: Int): Thread = { > val thread = new Thread() { > override def run(): Unit = { > var qry: StreamingQuery = null > try { > val readSocketDF = spark.readStream > .format("socket") > .option("host", "10.18.98.34") > .option("port", port) > .load() > qry = readSocketDF.writeStream > .format("carbondata") > .trigger(ProcessingTime("5 seconds")) > .option("checkpointLocation", tablePath.getStreamingCheckpointDir) > .option("tablePath", tablePath.getPath).option("tableName", > tableName) > .start() > qry.awaitTermination() > } catch { > case ex: Throwable => > ex.printStackTrace() > println("Done reading and writing streaming data") > } finally { > qry.stop() > } > } > } > thread.start() > thread > } > val streamTableName = "brinjal" > sql(s"drop table brinjal").show > sql(s"create table brinjal (imei string,AMSi
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1514 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1776/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1775/ ---
[GitHub] carbondata pull request #1514: [CARBONDATA-1746] Count star optimization
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1514 ---
[GitHub] carbondata issue #1534: [CARBONDATA-1770] Update error docs and consolidate ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1534 @chenliang613 kindly find my comments. The following descriptions can be added so the user knows what each feature does: a description of Minor & Major compaction, and a description of Partition and its types. ---
[jira] [Created] (CARBONDATA-1781) (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS
Chetan Bhat created CARBONDATA-1781: --- Summary: (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS Key: CARBONDATA-1781 URL: https://issues.apache.org/jira/browse/CARBONDATA-1781 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.3.0 Environment: 3 node ant cluster Reporter: Chetan Bhat *Steps :* Thrift server is started using the command - bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store" Spark shell is opened using the command - bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar >From spark shell the below code is executed - import java.io.{File, PrintWriter} import java.net.ServerSocket import org.apache.spark.sql.{CarbonEnv, SparkSession} import org.apache.spark.sql.hive.CarbonRelation import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath} CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") import org.apache.spark.sql.CarbonSession._ val carbonSession = SparkSession. builder(). appName("StreamExample"). getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/carbon.store") carbonSession.sparkContext.setLogLevel("INFO") def sql(sql: String) = carbonSession.sql(sql) def writeSocket(serverSocket: ServerSocket): Thread = { val thread = new Thread() { override def run(): Unit = { // wait for client to connection request and accept val clientSocket = serverSocket.accept() val socketWriter = new PrintWriter(clientSocket.getOutputStream()) var index = 0 for (_ <- 1 to 1000) { // write 5 records per iteration for (_ <- 0 to 100) { index = index + 1 socketWriter.println(index.toString + ",name_" + index + ",city_" + index + "," + (index * 1.00).toString + ",school_" + index + ":school_" + index + index + "$" + index) } socketWriter.flush() Thread.sleep(2000) } socketWriter.close() System.out.println("Socket closed") } } thread.start() thread } def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = { val thread = new Thread() { override def run(): Unit = { var qry: StreamingQuery = null try { val readSocketDF = spark.readStream .format("socket") .option("host", "10.18.98.34") .option("port", port) .load() qry = readSocketDF.writeStream .format("carbondata") .trigger(ProcessingTime("5 seconds")) .option("checkpointLocation", tablePath.getStreamingCheckpointDir) .option("tablePath", tablePath.getPath).option("tableName", tableName) .start() qry.awaitTermination() } catch { case ex: Throwable => ex.printStackTrace() println("Done reading and writing streaming data") } finally { qry.stop() } } } thread.start() thread } val streamTableName = "brinjal" sql(s"drop table brinjal").show sql(s"create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId 
double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('streaming'='true','table_blocksize'='1')") sql(s"LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge')") val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore. lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier) val port = 8002 val serverSocke
[jira] [Resolved] (CARBONDATA-1771) While segment_index compaction, .carbonindex files of invalid segments are also getting merged
[ https://issues.apache.org/jira/browse/CARBONDATA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-1771. - Resolution: Fixed Fix Version/s: 1.3.0 > While segment_index compaction, .carbonindex files of invalid segments are > also getting merged > -- > > Key: CARBONDATA-1771 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1771 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
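The essence of the fix can be sketched as a filtering step before the merge (names here are illustrative, not CarbonData's actual API): only segments in a valid state contribute their .carbonindex files.
```
// Hypothetical segment metadata; real status values live in the table status file.
case class SegmentStatus(id: String, status: String)

// Skip deleted or already-compacted segments when collecting index files to merge.
def segmentsToMerge(allSegments: Seq[SegmentStatus]): Seq[String] = {
  val validStates = Set("SUCCESS", "PARTIAL_SUCCESS", "MARKED_FOR_UPDATE")
  allSegments.filter(s => validStates.contains(s.status)).map(_.id)
}
```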
[GitHub] carbondata pull request #1535: [CARBONDATA-1771] While segment_index compact...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1535 ---
[GitHub] carbondata issue #1535: [CARBONDATA-1771] While segment_index compaction, .c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1535 LGTM ---
[GitHub] carbondata issue #1514: [CARBONDATA-1746] Count star optimization
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1514 LGTM ---
[GitHub] carbondata issue #1539: [CARBONDATA-1780] Create configuration from SparkSes...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1539 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1316/ ---
[GitHub] carbondata issue #1535: [CARBONDATA-1771] While segment_index compaction, .c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1535 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1774/ ---
[GitHub] carbondata pull request #1539: [CARBONDATA-1780] Create configuration from S...
GitHub user QiangCai opened a pull request: https://github.com/apache/carbondata/pull/1539 [CARBONDATA-1780] Create configuration from SparkSession for data loading Create configuration from SparkSession for data loading, so that we can set configuration into SparkSession during dataloading. - [x] Any interfaces changed? - [x] Any backward compatibility impacted? - [x] Document update required? - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/carbondata configuration Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1539 commit 13f71680c1fe55e670935b05572ec11b3632057b Author: QiangCai Date: 2017-11-20T10:38:20Z create configuration from sparksession for Data Loading ---
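A rough sketch of the idea (the helper below is illustrative, not the PR's actual code): derive the Hadoop configuration used by data loading from the SparkSession, so settings made on the session context become visible to the load, and copy it so per-load mutations do not leak back.
```
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

object LoadConfiguration {
  // Copy the session's Hadoop configuration for one data-loading job.
  def fromSparkSession(spark: SparkSession): Configuration =
    new Configuration(spark.sparkContext.hadoopConfiguration)
}

// Usage: a value set on the session context is picked up at load time.
// spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://hacluster")
// val conf = LoadConfiguration.fromSparkSession(spark)
```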
[jira] [Commented] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259086#comment-16259086 ] Ramakrishna S commented on CARBONDATA-1777: --- [~kumarvishal], this happens when the pre-aggregate table is created in a different session (spark-shell), but select * on the aggregate table works fine. > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | I
[GitHub] carbondata pull request #1536: [CARBONDATA-1776] Fix some possible test erro...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1536 ---
[jira] [Updated] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna S updated CARBONDATA-1777: -- Priority: Minor (was: Major) > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, > underlying DB is MYSQL | > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql
[GitHub] carbondata issue #1469: [WIP] Spark-2.2 Carbon Integration - Phase 1
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1469 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1314/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1536 LGTM ---
[jira] [Created] (CARBONDATA-1780) Create configuration from SparkSession for data loading
QiangCai created CARBONDATA-1780: Summary: Create configuration from SparkSession for data loading Key: CARBONDATA-1780 URL: https://issues.apache.org/jira/browse/CARBONDATA-1780 Project: CarbonData Issue Type: Improvement Reporter: QiangCai Create configuration from SparkSession for data loading -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1525: [CARBONDATA-1751] Make the type of exception ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1525 ---
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1525 LGTM ---
[jira] [Updated] (CARBONDATA-1711) Carbon1.3.0-DataMap - Show datamap on table does not work
[ https://issues.apache.org/jira/browse/CARBONDATA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna S updated CARBONDATA-1711: -- Summary: Carbon1.3.0-DataMap - Show datamap on table does not work (was: Carbon1.3.0-Pre-AggregateTable - Show datamap on table does not work) > Carbon1.3.0-DataMap - Show datamap on table does not work > -- > > Key: CARBONDATA-1711 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1711 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.3.0 > Environment: Test >Reporter: Ramakrishna S >Priority: Minor > Labels: Functional > Fix For: 1.3.0 > > > 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE > lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as > select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from > lineitem group by L_RETURNFLAG, L_LINESTATUS; > Error: java.lang.RuntimeException: Table [lineitem_agr_lineitem] already > exists under database [default] (state=,code=0) > 0: jdbc:hive2://10.18.98.34:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | default | flow_carbon_test4 | false| > | default | jl_r3 | false| > | default | lineitem | false| > | default | lineitem_agr_lineitem | false| > | default | sensor_reading_blockblank_false | false| > | default | sensor_reading_blockblank_false1 | false| > | default | sensor_reading_blockblank_false2 | false| > | default | sensor_reading_false | false| > | default | sensor_reading_true | false| > | default | t1| false| > | default | t1_agg_t1 | false| > | default | tc4 | false| > | default | uniqdata | false| > +---+---+--+--+ > 13 rows selected (0.04 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show datamap on table lineitem; > Error: java.lang.RuntimeException: > BaseSqlParser > missing 'FUNCTIONS' at 'on'(line 1, pos 13) > == SQL == > show datamap on table lineitem > -^^^ > CarbonSqlParser [1.6] failure: identifier matching regex (?i)SEGMENTS > expected > show datamap on table lineitem -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1773/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1313/ ---
[GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1538 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1312/ ---
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1772/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1311/ ---
[GitHub] carbondata pull request #1538: [CARBONDATA-1779] GenericVectorizedReader
GitHub user bhavya411 opened a pull request: https://github.com/apache/carbondata/pull/1538 [CARBONDATA-1779] GenericVectorizedReader Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - No interfaces changed? - No backward compatibility impacted? - No Document update required? - [ Yes] Testing done - All Unit test cases are passing, no new unit test cases were needed as this PR implements a Generic Vectorized Reader. - Manual Testing completed for the same . You can merge this pull request into a Git repository by running: $ git pull https://github.com/bhavya411/incubator-carbondata CARBONDATA-1779 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1538.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1538 commit ef28391c656cc2d20082e52dd4ab729b0992cfb3 Author: Bhavya Date: 2017-11-14T10:05:44Z Added Generic vectorized Reader ---
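Since the PR body gives no internals, here is only a hedged sketch of what a Spark-independent vectorized reader contract might look like (all names illustrative, not the PR's actual classes): batches are filled into engine-neutral column vectors instead of Spark's ColumnarBatch, which is what lets Presto consume the reader.
```
// Illustrative interfaces only; the actual PR may define different ones.
trait CarbonColumnVector {
  def putInt(rowId: Int, value: Int): Unit
  def putDouble(rowId: Int, value: Double): Unit
}

trait GenericVectorizedReader {
  // Fill the supplied vectors with up to batchSize rows; return the number read.
  def nextBatch(vectors: Array[CarbonColumnVector], batchSize: Int): Int
  def close(): Unit
}
```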
[GitHub] carbondata issue #1525: [CARBONDATA-1751] Make the type of exception and mes...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1525 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1771/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1516 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1770/ ---
[jira] [Updated] (CARBONDATA-1779) GeneriVectorizedReader for Presto
[ https://issues.apache.org/jira/browse/CARBONDATA-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavya Aggarwal updated CARBONDATA-1779: Summary: GeneriVectorizedReader for Presto (was: GeneriVectorizedReade for Presto) > GeneriVectorizedReader for Presto > - > > Key: CARBONDATA-1779 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1779 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration >Affects Versions: 1.3.0 >Reporter: Bhavya Aggarwal >Assignee: Bhavya Aggarwal >Priority: Minor > > Write a Generic Vectorized Reader for Presto to remove the dependencies on > spark -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1779) GeneriVectorizedReade for Presto
Bhavya Aggarwal created CARBONDATA-1779: --- Summary: GeneriVectorizedReade for Presto Key: CARBONDATA-1779 URL: https://issues.apache.org/jira/browse/CARBONDATA-1779 Project: CarbonData Issue Type: Improvement Components: presto-integration Affects Versions: 1.3.0 Reporter: Bhavya Aggarwal Assignee: Bhavya Aggarwal Priority: Minor Write a Generic Vectorized Reader for Presto to remove the dependencies on spark -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1537: [CARBONDATA-1778] Support clean data for all
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1537 [CARBONDATA-1778] Support clean data for all Modification reasons: Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata cleanfile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1537 commit d5e9b19809b75f3cb8af27ff059c24b25e552309 Author: chenerlu Date: 2017-11-20T09:01:42Z Support clean data for all ---
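One way to picture "clean for all" is as a loop over the existing per-table command (a hedged sketch; the PR's actual syntax and implementation may differ, and error handling for non-Carbon tables is omitted):
```
import org.apache.spark.sql.SparkSession

def cleanAllTables(spark: SparkSession): Unit = {
  spark.sql("SHOW DATABASES").collect().foreach { dbRow =>
    val db = dbRow.getString(0)
    spark.sql(s"SHOW TABLES IN $db").collect().foreach { tblRow =>
      val table = tblRow.getString(1) // columns: database, tableName, isTemporary
      // Reuse the existing per-table command for every table found.
      spark.sql(s"CLEAN FILES FOR TABLE $db.$table")
    }
  }
}
```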
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1536 Please review it @jackylk ---
[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all
[ https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258972#comment-16258972 ] chenerlu commented on CARBONDATA-1778: -- Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. > Support clean garbage segments for all > -- > > Key: CARBONDATA-1778 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 > Project: CarbonData > Issue Type: Improvement >Reporter: chenerlu >Assignee: chenerlu >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1778) Support clean garbage segments for all
chenerlu created CARBONDATA-1778: Summary: Support clean garbage segments for all Key: CARBONDATA-1778 URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 Project: CarbonData Issue Type: Improvement Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ] kumar vishal commented on CARBONDATA-1777: -- [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using di
[jira] [Comment Edited] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ] kumar vishal edited comment on CARBONDATA-1777 at 11/20/17 8:59 AM: [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: And you can check the query plan to see which table it hits when executing the query. was (Author: kumarvishal09): [~Ram@huawei] please check the executor log; in the executor log you will get the detail: Query will be executed on table: > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. > carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. 
> select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,
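As the comment above suggests, the plan itself shows which table the query hits; assuming the carbonSession and tables from the reported steps, something like the following prints the plan, where a used datamap appears as a scan of lineitem1_agr1_lineitem1 rather than lineitem1:
```
// Print the logical and physical plans for the aggregate query.
carbonSession.sql(
  "select l_returnflag, l_linestatus, sum(l_quantity) " +
  "from lineitem1 group by l_returnflag, l_linestatus"
).explain(true)
```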
[GitHub] carbondata issue #1503: [CARBONDATA-1730] Support skip.header.line.count opt...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1503 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1310/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1536 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1309/ ---
[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1460 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1769/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] Block direct insert/load on pre-ag...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1508 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1308/ ---
[GitHub] carbondata issue #1460: [Docs] Fix partition-guide.md docs NUM_PARTITIONS wr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1460 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1768/ ---
[GitHub] carbondata issue #1536: [CARBONDATA-1776] Fix some possible test errors that...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1536 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1307/ ---
[GitHub] carbondata issue #1516: [CARBONDATA-1729]Fix the compatibility issue with ha...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1516 @jackylk @chenliang613 @QiangCai According to Jacky's suggestion, this now just uses Java reflection for FileSystem.truncate in FileFactory.java. Please review, thanks. ---
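For context, the reflection approach mentioned here can be sketched as follows (a simplified illustration, not the actual FileFactory code): FileSystem.truncate(Path, long) only exists from Hadoop 2.7 onward, so looking the method up reflectively keeps the code working against older Hadoop releases.
```
import org.apache.hadoop.fs.{FileSystem, Path}

// Invoke FileSystem.truncate via reflection so the code still runs on Hadoop
// versions that lack the method.
def truncateIfSupported(fs: FileSystem, path: Path, newLength: Long): Boolean = {
  try {
    val m = classOf[FileSystem].getMethod("truncate", classOf[Path], classOf[Long])
    m.invoke(fs, path, java.lang.Long.valueOf(newLength)).asInstanceOf[Boolean]
  } catch {
    case _: NoSuchMethodException =>
      false // older Hadoop: caller must fall back, e.g. by rewriting the file
  }
}
```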