[GitHub] carbondata issue #1548: [WIP]test PR
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1548 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1359/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1812/ ---
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1540 retest this please ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1547 please change the PR title to: [CARBONDATA-1792] Add example of data management for Spark2.X ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1547 retest this please ---
[GitHub] carbondata issue #1549: [CARBONDATA-1793] Insert / update is allowing more t...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1549 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1358/ ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1547

    Please follow the below template to provide pull request description.

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [ ] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required?
          - How it is tested? Please attach test report.
          - Is it a performance related change? Please attach the performance test report.
          - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1547 add to whitelist ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1543 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1811/ ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1543 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1357/ ---
[GitHub] carbondata issue #1116: [CARBONDATA-1249] Wrong order of columns in redirect...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1116 @jackylk Added details in the PR description ---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480438

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).
    +  This means that there will be one index for each load and as number of data load increases, the number of indices also increases.
    +  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
    +
    +  There are two types of compaction Minor and Major compaction.
    +
    +  ```
      ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR'
    -```
    +  ```

    - **Minor Compaction**
    +
    +  In minor compaction the user can specify how many loads to be merged.
    +  Minor compaction triggers for every data load if the parameter carbon.enable.auto.load.merge is set to true.
    +  If any segments are available to be merged, then compaction will run parallel with data load, there are 2 levels in minor compaction:
    +  * Level 1: Merging of the segments which are not yet compacted.
    +  * Level 2: Merging of the compacted segments again to form a bigger segment.
    +
      ```
      ALTER TABLE table_name COMPACT 'MINOR'
      ```

    - **Major Compaction**
    +
    +  In Major compaction, many segments can be merged into one big segment.

--- End diff --

In Major compaction, multiple segments can be merged into one large segment.

---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480127

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).
    +  This means that there will be one index for each load and as number of data load increases, the number of indices also increases.
    +  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
    +
    +  There are two types of compaction Minor and Major compaction.

--- End diff --

There are two types of compaction, Minor and Major compaction.

---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480541

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).
    +  This means that there will be one index for each load and as number of data load increases, the number of indices also increases.
    +  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
    +
    +  There are two types of compaction Minor and Major compaction.
    +
    +  ```
      ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR'
    -```
    +  ```

    - **Minor Compaction**
    +
    +  In minor compaction the user can specify how many loads to be merged.
    +  Minor compaction triggers for every data load if the parameter carbon.enable.auto.load.merge is set to true.
    +  If any segments are available to be merged, then compaction will run parallel with data load, there are 2 levels in minor compaction:
    +  * Level 1: Merging of the segments which are not yet compacted.
    +  * Level 2: Merging of the compacted segments again to form a bigger segment.
    +
      ```
      ALTER TABLE table_name COMPACT 'MINOR'
      ```

    - **Major Compaction**
    +
    +  In Major compaction, many segments can be merged into one big segment.
    +  User will specify the compaction size until which segments can be merged, Major compaction is usually done during the off-peak time.
    +  This command merges the specified number of segments into one segment:
    +
      ```
      ALTER TABLE table_name COMPACT 'MAJOR'
      ```

     ## PARTITION
    +
    +  Similar other system's partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column.

--- End diff --

Similar to other system's partition features, CarbonData's partition feature also can be used to improve query performance by filtering on the partition column.

---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480386

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).
    +  This means that there will be one index for each load and as number of data load increases, the number of indices also increases.
    +  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
    +
    +  There are two types of compaction Minor and Major compaction.
    +
    +  ```
      ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR'
    -```
    +  ```

    - **Minor Compaction**
    +
    +  In minor compaction the user can specify how many loads to be merged.
    +  Minor compaction triggers for every data load if the parameter carbon.enable.auto.load.merge is set to true.
    +  If any segments are available to be merged, then compaction will run parallel with data load, there are 2 levels in minor compaction:
    +  * Level 1: Merging of the segments which are not yet compacted.
    +  * Level 2: Merging of the compacted segments again to form a bigger segment.

--- End diff --

Level 2: Merging of the compacted segments again to form a larger segment.

---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480183

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).
    +  This means that there will be one index for each load and as number of data load increases, the number of indices also increases.
    +  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
    +
    +  There are two types of compaction Minor and Major compaction.
    +
    +  ```
      ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR'
    -```
    +  ```

    - **Minor Compaction**
    +
    +  In minor compaction the user can specify how many loads to be merged.

--- End diff --

In Minor compaction, user can specify the number of loads to be merged.

---
[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...
Github user sgururajshetty commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152480015

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonData

     ## COMPACTION

    -This command merges the specified number of segments into one segment, compaction help to improve query performance.
    -```
    +  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load (per load per segment and one B+ tree index).

--- End diff --

Compaction improves the query performance significantly. During the load data, several CarbonData files are generated, this is because data is sorted only within each load (per load segment and one B+ tree index).

---
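Taken together, the doc text under review boils down to two SQL commands; a minimal sketch of how they are exercised (the table name `sales` is a hypothetical example, not from the PR):

    -- assumes an existing carbon table named sales with several loaded segments
    ALTER TABLE sales COMPACT 'MINOR';
    ALTER TABLE sales COMPACT 'MAJOR';
    -- per the doc text above, with carbon.enable.auto.load.merge=true set in
    -- carbon.properties, minor compaction also triggers on every data load
    SHOW SEGMENTS FOR TABLE sales;  -- source segments of a merge show up as Compacted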
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1356/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1810/ ---
[jira] [Updated] (CARBONDATA-1794) (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
[ https://issues.apache.org/jira/browse/CARBONDATA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramakrishna S updated CARBONDATA-1794:
--------------------------------------

    Description: 
Steps :
1. Create a streaming table and do a batch load
2. Set up the streaming, so that it streams in chunks of 1000 records 20 times

  was:
Steps :
1. Create a streaming table and do a batch load
2. Set up the streaming, so that it streams in chunks of 1000 records 20 times
3. Do another batch load on the table
4. Do one more round of streaming

+------------+------------+--------------------------+--------------------------+--------------+------------+
| Segment Id | Status     | Load Start Time          | Load End Time            | File Format  | Merged To  |
+------------+------------+--------------------------+--------------------------+--------------+------------+
| 2          | Success    | 2017-11-21 21:42:36.77   | 2017-11-21 21:42:40.396  | COLUMNAR_V3  | NA         |
| 1          | Streaming  | 2017-11-21 21:40:46.2    | NULL                     | ROW_V1       | NA         |
| 0          | Success    | 2017-11-21 21:40:39.782  | 2017-11-21 21:40:43.168  | COLUMNAR_V3  | NA         |
+------------+------------+--------------------------+--------------------------+--------------+------------+

*+Expected:+* Data should be loaded
*+Actual:+* Data load fails

1. One additional offset file is created (marked in bold)
-rw-r--r-- 2 root users 62 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/0
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/1
-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/10
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/2
-rw-r--r-- 2 root users 63 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/3
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/4
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/5
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/6
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/7
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/8
*-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/9*

2.
The following error is thrown:

=== Streaming Query ===
Identifier: [id = 3a5334bc-d471-4676-b6ce-f21105d491d1, runId = b2be9f97-8141-46be-89db-9a0f98d13369]
Current Offsets: {org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193: 1000}
Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan: org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193

    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:284)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:177)
Caused by: java.lang.RuntimeException: Offsets committed out of order: 20019 followed by 1000
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.streaming.TextSocketSource.commit(socket.scala:151)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:421)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:420)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply$mcV$sp(StreamExecution.scala:420)
    at
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1510 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1355/ ---
[GitHub] carbondata pull request #1549: [CARBONDATA-1793] Insert / update is allowing...
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/1549

    [CARBONDATA-1793] Insert / update is allowing more than 32000 characters for String column

    Load is restricted to 32000 characters per column. But insert and update are allowing more than 32000 characters. Behavior should be consistent.

    Solution: Restrict insert and update to 32000 characters.

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [X] Testing done
          UT Added
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/incubator-carbondata 32000char

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1549

commit 41f1428b49cce37a72d574c797afef469f9c67bb
Author: dhatchayani
Date:   2017-11-22T06:13:49Z

    [CARBONDATA-1793] Insert / update is allowing more than 32000 characters for String column

---
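For context, a minimal SQL sketch of the inconsistency this PR addresses (table and column names are hypothetical; repeat() is a Spark SQL built-in):

    CREATE TABLE char_limit (id INT, val STRING) STORED BY 'carbondata';
    -- LOAD DATA already rejects string values longer than 32000 characters;
    -- before this fix, the equivalent INSERT (and UPDATE) was accepted:
    INSERT INTO char_limit SELECT 1, repeat('a', 32001);
    -- after the fix, insert and update enforce the same 32000-character limit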
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1543 @jackylk @QiangCai Please check this PR. It seems PR https://github.com/apache/carbondata/pull/1539 makes a lot of SDV tests fail. The current master now fails in SDV because PR 1539 was merged. ---
[jira] [Created] (CARBONDATA-1794) (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
Ramakrishna S created CARBONDATA-1794:
--------------------------------------

             Summary: (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
                 Key: CARBONDATA-1794
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1794
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.3.0
         Environment: 3 node ant cluster
            Reporter: Ramakrishna S

Steps :
1. Create a streaming table and do a batch load
2. Set up the streaming, so that it streams in chunks of 1000 records 20 times
3. Do another batch load on the table
4. Do one more round of streaming

+------------+------------+--------------------------+--------------------------+--------------+------------+
| Segment Id | Status     | Load Start Time          | Load End Time            | File Format  | Merged To  |
+------------+------------+--------------------------+--------------------------+--------------+------------+
| 2          | Success    | 2017-11-21 21:42:36.77   | 2017-11-21 21:42:40.396  | COLUMNAR_V3  | NA         |
| 1          | Streaming  | 2017-11-21 21:40:46.2    | NULL                     | ROW_V1       | NA         |
| 0          | Success    | 2017-11-21 21:40:39.782  | 2017-11-21 21:40:43.168  | COLUMNAR_V3  | NA         |
+------------+------------+--------------------------+--------------------------+--------------+------------+

*+Expected:+* Data should be loaded
*+Actual:+* Data load fails

1. One additional offset file is created (marked in bold)
-rw-r--r-- 2 root users 62 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/0
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/1
-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/10
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/2
-rw-r--r-- 2 root users 63 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/3
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/4
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/5
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/6
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/7
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/8
*-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/9*

2.
The following error is thrown:

=== Streaming Query ===
Identifier: [id = 3a5334bc-d471-4676-b6ce-f21105d491d1, runId = b2be9f97-8141-46be-89db-9a0f98d13369]
Current Offsets: {org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193: 1000}
Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan: org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193

    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:284)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:177)
Caused by: java.lang.RuntimeException: Offsets committed out of order: 20019 followed by 1000
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.streaming.TextSocketSource.commit(socket.scala:151)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:421)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:420)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply$mcV$sp(StreamExecution.scala:420)
    at
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1809/ ---
[jira] [Created] (CARBONDATA-1793) Insert / update is allowing more than 32000 characters for String column
dhatchayani created CARBONDATA-1793:
------------------------------------

             Summary: Insert / update is allowing more than 32000 characters for String column
                 Key: CARBONDATA-1793
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1793
             Project: CarbonData
          Issue Type: Bug
            Reporter: dhatchayani
            Assignee: dhatchayani
            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] carbondata pull request #1548: [WIP]test PR
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1548

    [WIP]test PR

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [ ] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required?
          - How it is tested? Please attach test report.
          - Is it a performance related change? Please attach the performance test report.
          - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata test-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1548.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1548

commit 84da2a186400ba100f402c33b8f9f25fd3e99ae0
Author: Ravindra Pesala
Date:   2017-11-22T06:06:24Z

    test

---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1542 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1808/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1354/ ---
[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r152474076

    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
    @@ -489,4 +489,44 @@ object PreAggregateUtil {
         }
         updatedPlan
       }
    +

--- End diff --

moved the method to PreAggregationRules

---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1510 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1807/ ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user geetikagupta16 commented on the issue: https://github.com/apache/carbondata/pull/1543 @ravipesala Please review. This PR will resolve the current NullPointer exception for data loading. SDV test cases are failing due to the same issue. ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1510 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1352/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1510 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1806/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1510 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1805/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1510 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1350/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1510 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1804/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1510 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1349/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1802/ ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 retest sdv please ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1547 Can one of the admins verify this patch? ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1801/ ---
[GitHub] carbondata issue #1547: [CARBONDATA-1792]add example of data management for ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1547 Can one of the admins verify this patch? ---
[GitHub] carbondata pull request #1547: [CARBONDATA-1792]add example of data manageme...
GitHub user Xaprice opened a pull request:

    https://github.com/apache/carbondata/pull/1547

    [CARBONDATA-1792]add example of data management for Spark2.X

    [CARBONDATA-1792]add example of data management for Spark2.X

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1547.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1547

commit 8511ffa1572d15c4dc7141c8833bdee799cbf15f
Author: Jin Zhou
Date:   2017-11-21T14:57:58Z

    add example for data management

---
[GitHub] carbondata pull request #:
Github user Xaprice commented on the pull request: https://github.com/apache/carbondata/commit/8511ffa1572d15c4dc7141c8833bdee799cbf15f#commitcomment-25769606 [CARBONDATA-1792]add example for data management ---
[GitHub] carbondata issue #1508: [CARBONDATA-1738] [PreAgg] Block direct insert/load ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1508 retest sdv please ---
[jira] [Created] (CARBONDATA-1792) Adding example of data management for Spark2.X
Zhoujin created CARBONDATA-1792:
--------------------------------

             Summary: Adding example of data management for Spark2.X
                 Key: CARBONDATA-1792
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1792
             Project: CarbonData
          Issue Type: Task
          Components: examples
    Affects Versions: 1.3.0
            Reporter: Zhoujin

Adding example of data management for Spark2.X

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1496 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1348/ ---
[GitHub] carbondata issue #1496: [CARBONDATA-1709][DataFrame] Support sort_columns op...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1496 retest this please ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1546 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1347/ ---
[GitHub] carbondata issue #1546: [CARBONDATA-1736] Query from segment set is not effe...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1546 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1800/ ---
[jira] [Assigned] (CARBONDATA-1526) 10. Handle compaction in aggregation tables.
[ https://issues.apache.org/jira/browse/CARBONDATA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1526:
----------------------------------------

    Assignee: Kunal Kapoor

> 10. Handle compaction in aggregation tables.
> ---------------------------------------------
>
>                 Key: CARBONDATA-1526
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1526
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Ravindra Pesala
>            Assignee: Kunal Kapoor
>
> User can trigger compaction on pre-aggregate table directly, it will further merge the segments inside pre-aggregation table. To do that, use ALTER TABLE COMPACT command on the pre-aggregate table just like the main table.
> For implementation, there are two kinds of implementation for compaction.
> 1. Mergable pre-aggregate tables: if aggregate functions are count, max, min, sum, avg, the pre-aggregate table segments can be merged directly without re-computing it.
> 2. Non-mergable pre-aggregate tables: if aggregate function include distinct_count, it needs to re-compute when doing compaction on pre-aggregate table.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
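Per the JIRA text above, compaction on a pre-aggregate table uses the same command as on the main table; a minimal sketch (the child table name `agg_sales` is hypothetical, actual names depend on the datamap definition):

    -- compact the segments of a pre-aggregate (child) table directly,
    -- just like a main table:
    ALTER TABLE agg_sales COMPACT 'MINOR';
    ALTER TABLE agg_sales COMPACT 'MAJOR';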
[jira] [Assigned] (CARBONDATA-1519) 3. Create UDF for timestamp to extract year,month,day,hour and minute from timestamp and date
[ https://issues.apache.org/jira/browse/CARBONDATA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1519:
----------------------------------------

    Assignee: kumar vishal

> 3. Create UDF for timestamp to extract year,month,day,hour and minute from timestamp and date
> -----------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-1519
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1519
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Ravindra Pesala
>            Assignee: kumar vishal
>
> Create UDF for timestamp to extract year,month,day,hour and minute from timestamp and date

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1518) 2. Support creating timeseries while creating main table.
[ https://issues.apache.org/jira/browse/CARBONDATA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1518:
----------------------------------------

    Assignee: kumar vishal

> 2. Support creating timeseries while creating main table.
> ----------------------------------------------------------
>
>                 Key: CARBONDATA-1518
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1518
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Ravindra Pesala
>            Assignee: kumar vishal
>
> User can give timeseries option while creating the main table itself and carbon will create aggregate tables automatically.
> {code}
> CREATE TABLE agg_sales
> STORED BY 'carbondata'
> TBLPROPERTIES ('parent_table'='sales', 'timeseries_column'='order_time',
> 'granualarity'='hour', 'rollup'='quantity:sum, max # user_id: count # price: sum, max, min, avg')
> {code}
> In the above case, user choose timeseries_column, granularity and aggregation types for measures, so carbon generates the aggregation tables automatically for year, month, day and hour level aggregation tables (totally 4 tables, their table name will be prefixed with agg_sales).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] carbondata pull request #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg ...
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1542#discussion_r152357935

    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala ---
    @@ -489,4 +489,44 @@ object PreAggregateUtil {
         }
         updatedPlan
       }
    +

--- End diff --

This code is specific to inserting data into the aggregate table; better to move it into the rules class.

---
[GitHub] carbondata pull request #1546: [CARBONDATA-1736] Query from segment set is n...
GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/carbondata/pull/1546

    [CARBONDATA-1736] Query from segment set is not effective when pre-aggregate table is present

    Fixed issue: Query from segment set is not effective when pre-aggregate table is present

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [No] Any interfaces changed?
    - [No] Any backward compatibility impacted?
    - [No] Document update required?
    - [Yes] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata master_1736

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1546.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1546

---
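For context, a minimal sketch of the segment-set behavior this PR fixes (database and table names are hypothetical; the property format follows the CARBONDATA-1737 report further below):

    SET carbon.input.segments.default.sales=1;
    -- should scan only segment 1 of sales, even when the optimizer rewrites
    -- the query against a pre-aggregate table of sales:
    SELECT country, SUM(quantity) FROM sales GROUP BY country;
    RESET;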
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1799/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1346/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1545 retest this please ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1798/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1797/ ---
[GitHub] carbondata issue #1521: [WIP] [CARBONDATA-1743] fix concurrent pre-agg creati...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1521 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1796/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1345/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1344/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1545 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1521: [WIP] [CARBONDATA-1743] fix concurrent pre-agg creati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1521 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1343/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1342/ ---
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1544 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1341/ ---
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1544 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1795/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1340/ ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1537 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1794/ ---
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1540 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1339/ ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1545 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1338/ ---
[GitHub] carbondata issue #1167: [CARBONDATA-1304] [IUD BugFix] IUD with single pass
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1167 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1337/ ---
[jira] [Created] (CARBONDATA-1791) Carbon1.3.0 Concurrent Load-Alter: user is able to Alter table even if insert/load job is running
Ajeet Rai created CARBONDATA-1791:
----------------------------------

             Summary: Carbon1.3.0 Concurrent Load-Alter: user is able to Alter table even if insert/load job is running
                 Key: CARBONDATA-1791
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1791
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
         Environment: 3 Node ant cluster
            Reporter: Ajeet Rai
             Fix For: 1.3.0

Carbon1.3.0 Concurrent Load-Alter: user is able to alter a table even if an insert/load job is running.
Steps:
1: Create a table
2: Start an insert job
3: Concurrently alter the table (add, drop, rename)
4: Observe that the alter succeeds
5: Observe that the insert job keeps running; after some time the job fails if the table was renamed, otherwise the alter succeeds (for add/drop column)
Expected behaviour: the alter job should wait for the insert job to complete

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1789) Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
[ https://issues.apache.org/jira/browse/CARBONDATA-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajeet Rai updated CARBONDATA-1789:
----------------------------------

    Labels: dfx  (was: )

> Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
> -------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-1789
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1789
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>         Environment: 3 Node ant cluster
>            Reporter: Ajeet Rai
>              Labels: dfx
>             Fix For: 1.3.0
>
> Carbon1.3.0 Concurrent Load-Drop: user is able to drop a table even if an insert/load job is running
> Steps:
> 1: Create a table
> 2: Start an insert job
> 3: Concurrently drop the table
> 4: Observe that the drop succeeds
> 5: Observe that the insert job keeps running and after some time the job fails
> Expected behaviour: the drop job should wait for the insert job to complete

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1790) (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
[ https://issues.apache.org/jira/browse/CARBONDATA-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramakrishna S updated CARBONDATA-1790:
--------------------------------------

    Description: 
Steps :
1. Create a streaming table and do a batch load
2. Set up the streaming, so that it streams in chunks of 1000 records 20 times
3. Do another batch load on the table
4. Do one more round of streaming

+------------+------------+--------------------------+--------------------------+--------------+------------+
| Segment Id | Status     | Load Start Time          | Load End Time            | File Format  | Merged To  |
+------------+------------+--------------------------+--------------------------+--------------+------------+
| 2          | Success    | 2017-11-21 21:42:36.77   | 2017-11-21 21:42:40.396  | COLUMNAR_V3  | NA         |
| 1          | Streaming  | 2017-11-21 21:40:46.2    | NULL                     | ROW_V1       | NA         |
| 0          | Success    | 2017-11-21 21:40:39.782  | 2017-11-21 21:40:43.168  | COLUMNAR_V3  | NA         |
+------------+------------+--------------------------+--------------------------+--------------+------------+

*+Expected:+* Data should be loaded
*+Actual:+* Data load fails

1. One additional offset file is created (marked in bold)
-rw-r--r-- 2 root users 62 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/0
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/1
-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/10
-rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/2
-rw-r--r-- 2 root users 63 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/3
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/4
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/5
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/6
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/7
-rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/8
*-rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/9*

2.
The following error is thrown:

=== Streaming Query ===
Identifier: [id = 3a5334bc-d471-4676-b6ce-f21105d491d1, runId = b2be9f97-8141-46be-89db-9a0f98d13369]
Current Offsets: {org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193: 1000}
Current State: ACTIVE
Thread State: RUNNABLE

Logical Plan: org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193

    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:284)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:177)
Caused by: java.lang.RuntimeException: Offsets committed out of order: 20019 followed by 1000
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.streaming.TextSocketSource.commit(socket.scala:151)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:421)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:420)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply$mcV$sp(StreamExecution.scala:420)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply(StreamExecution.scala:404)
    at
[jira] [Created] (CARBONDATA-1790) (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
Ramakrishna S created CARBONDATA-1790:
--------------------------------------

             Summary: (Carbon1.3.0 - Streaming) Data load in Stream Segment fails if batch load is performed in between the streaming
                 Key: CARBONDATA-1790
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1790
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.3.0
         Environment: 3 node ant cluster
            Reporter: Ramakrishna S

Steps :
User starts the thrift server using the command -
bin/spark-submit --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar "hdfs://hacluster/user/hive/warehouse/carbon.store"

User connects to spark shell using the command -
bin/spark-shell --master yarn-client --executor-memory 10G --executor-cores 5 --driver-memory 5G --num-executors 3 --jars /srv/spark2.2Bigdata/install/spark/sparkJdbc/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar

In the spark shell the user creates a table and does a streaming load into it, as per the socket streaming script below.

import java.io.{File, PrintWriter}
import java.net.ServerSocket

import org.apache.spark.sql.{CarbonEnv, SparkSession}
import org.apache.spark.sql.hive.CarbonRelation
import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.path.{CarbonStorePath, CarbonTablePath}

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")

import org.apache.spark.sql.CarbonSession._

val carbonSession = SparkSession.
  builder().
  appName("StreamExample").
  getOrCreateCarbonSession("hdfs://hacluster/user/hive/warehouse/david")

carbonSession.sparkContext.setLogLevel("INFO")

def sql(sql: String) = carbonSession.sql(sql)

def writeSocket(serverSocket: ServerSocket): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      // wait for client connection request and accept
      val clientSocket = serverSocket.accept()
      val socketWriter = new PrintWriter(clientSocket.getOutputStream())
      var index = 0
      for (_ <- 1 to 1000) {
        // write a batch of records per iteration
        for (_ <- 0 to 100) {
          index = index + 1
          socketWriter.println(index.toString + ",name_" + index
            + ",city_" + index + "," + (index * 1.00).toString
            + ",school_" + index + ":school_" + index + index + "$" + index)
        }
        socketWriter.flush()
        Thread.sleep(2000)
      }
      socketWriter.close()
      System.out.println("Socket closed")
    }
  }
  thread.start()
  thread
}

def startStreaming(spark: SparkSession, tablePath: CarbonTablePath, tableName: String, port: Int): Thread = {
  val thread = new Thread() {
    override def run(): Unit = {
      var qry: StreamingQuery = null
      try {
        val readSocketDF = spark.readStream
          .format("socket")
          .option("host", "10.18.98.34")
          .option("port", port)
          .load()
        qry = readSocketDF.writeStream
          .format("carbondata")
          .trigger(ProcessingTime("5 seconds"))
          .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
          .option("tablePath", tablePath.getPath)
          .option("tableName", tableName)
          .start()
        qry.awaitTermination()
      } catch {
        case ex: Throwable =>
          ex.printStackTrace()
          println("Done reading and writing streaming data")
      } finally {
        qry.stop()
      }
    }
  }
  thread.start()
  thread
}

val streamTableName = "stream_table"

sql(s"CREATE TABLE $streamTableName (id INT,name STRING,city STRING,salary FLOAT) STORED BY 'carbondata' TBLPROPERTIES('streaming'='true', 'sort_columns'='name')")

sql(s"LOAD DATA LOCAL INPATH 'hdfs://hacluster/tmp/streamSample.csv' INTO TABLE $streamTableName OPTIONS('HEADER'='true')")

sql(s"select * from $streamTableName").show

val carbonTable = CarbonEnv.getInstance(carbonSession).carbonMetastore.
  lookupRelation(Some("default"), streamTableName)(carbonSession).asInstanceOf[CarbonRelation].carbonTable
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)

val port = 7995
val serverSocket = new ServerSocket(port)
val socketThread = writeSocket(serverSocket)
val streamingThread = startStreaming(carbonSession, tablePath, streamTableName, port)

While the load is in progress, the user executes a select query on the streaming table from beeline.
0:
[jira] [Created] (CARBONDATA-1789) Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
Ajeet Rai created CARBONDATA-1789:
----------------------------------

             Summary: Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
                 Key: CARBONDATA-1789
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1789
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
         Environment: 3 Node ant cluster
            Reporter: Ajeet Rai

Carbon1.3.0 Concurrent Load-Drop: user is able to drop a table even if an insert/load job is running
Steps:
1: Create a table
2: Start an insert job
3: Concurrently drop the table
4: Observe that the drop succeeds
5: Observe that the insert job keeps running and after some time the job fails
Expected behaviour: the drop job should wait for the insert job to complete

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] carbondata issue #1540: [CARBONDATA-1784] clear column group code
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1540 retest this please ---
[GitHub] carbondata issue #1545: [CARBONDATA-1710] Resolved The Bug For Alter Table on...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1545 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1793/ ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1543 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1336/ ---
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1544 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1792/ ---
[jira] [Assigned] (CARBONDATA-1737) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate table loads partially when segment filter is set on the main table
[ https://issues.apache.org/jira/browse/CARBONDATA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1737:
----------------------------------------

    Assignee: Kunal Kapoor

> Carbon1.3.0-Pre-AggregateTable - Pre-aggregate table loads partially when segment filter is set on the main table
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-1737
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1737
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.3.0
>         Environment: Test - 3 node ant cluster
>            Reporter: Ramakrishna S
>            Assignee: Kunal Kapoor
>              Labels: DFX
>             Fix For: 1.3.0
>
> 1. Create a table
> create table if not exists lineitem2(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> 2. Load 2 times to create 2 segments
> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table lineitem2 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> 3. Check the table content without setting any filter:
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem2 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | N             | F             | 327800.0         | 4.91387677624E8        |
> | A             | F             | 1.263625E7       | 1.893851542524009E10   |
> | N             | O             | 2.5398626E7      | 3.810981608977967E10   |
> | R             | F             | 1.2643878E7      | 1.8948524305619976E10  |
> +---------------+---------------+------------------+------------------------+
> 4. Set segment filter on the main table:
> set carbon.input.segments.test_db1.lineitem2=1;
> +--------------------------------------------+--------+
> | key                                        | value  |
> +--------------------------------------------+--------+
> | carbon.input.segments.test_db1.lineitem2   | 1      |
> +--------------------------------------------+--------+
> 5. Create pre-aggregate table
> create datamap agr_lineitem2 ON TABLE lineitem2 USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem2 group by L_RETURNFLAG, L_LINESTATUS;
> 6. Check table content:
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem2 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | N             | F             | 163900.0         | 2.456938388124E8       |
> | A             | F             | 6318125.0        | 9.469257712620043E9    |
> | N             | O             | 1.2699313E7      | 1.9054908044889835E10  |
> | R             | F             | 6321939.0        | 9.474262152809986E9    |
> +---------------+---------------+------------------+------------------------+
> 7. remove the filter on segment
> 0: jdbc:hive2://10.18.98.48:23040> reset;
> 8. Check the table content:
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem2 group by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | N             | F             | 163900.0         | 2.456938388124E8       |
> | A             | F             | 6318125.0        | 9.469257712620043E9    |
> | N             | O             | 1.2699313E7      | 1.9054908044889835E10  |
> | R             | F             | 6321939.0        | 9.474262152809986E9    |
>
[GitHub] carbondata issue #1544: [CARBONDATA-1740] Fixed order by issue in case of pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1544 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1335/ ---
[GitHub] carbondata pull request #1545: [CARBONDATA-1710]Resolved The Bug For Alter T...
GitHub user anubhav100 opened a pull request:

https://github.com/apache/carbondata/pull/1545

[CARBONDATA-1710] Resolved The Bug For Alter Table on decimal type throws exception when the first alter fails

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? no
- [ ] Any backward compatibility impacted? no
- [ ] Document update required? not required
- [ ] Testing done: new test cases are added in the same PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anubhav100/incubator-carbondata CARBONDATA-1710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1545

commit 9556bb0d3daac776e3bd8368b31f2b05f84e5c77
Author: anubhav100
Date: 2017-11-21T13:05:09Z

Resolved The Bug For Alter Table on decimal type throws exception when the first alter fails

---
[GitHub] carbondata issue #1167: [CARBONDATA-1304] [IUD BugFix] IUD with single pass
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1167 retest this please ---
[jira] [Updated] (CARBONDATA-1788) Insert is not working as expected when loaded with more than 32000 column length.
[ https://issues.apache.org/jira/browse/CARBONDATA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

pakanati revathi updated CARBONDATA-1788:
-----------------------------------------
Description:
Insert should accept only column values up to 32000 characters in length. But when data longer than 32000 characters is loaded, the insert is successful.
Expected result: When a value longer than 32000 characters is inserted, the insert should throw an error.
Actual result: When a value longer than 32000 characters is inserted, the insert is successful.
Note: Update should also throw an error when updating with a value longer than 32000 characters. Please implement the check for update as well.

was:
Insert should accept only 32000 length column. But when trying to load more than 32000 column length data is inserting successfully. Expected result: When inserted more than 32000 column length, the insert should throw error. Actual result: When inserted more than 32000 column length, the insert is successful. Note: Update also should throw error while updating more than 32000 column length. Please implement for update also.

> Insert is not working as expected when loaded with more than 32000 column length.
> ----------------------------------------------------------------------------------
>
> Key: CARBONDATA-1788
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1788
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 1.3.0
> Environment: 3 node ant cluster
> Reporter: pakanati revathi
> Priority: Minor
> Attachments: Insert.PNG
>
> Insert should accept only column values up to 32000 characters in length. But when data longer than 32000 characters is loaded, the insert is successful.
> Expected result: When a value longer than 32000 characters is inserted, the insert should throw an error.
> Actual result: When a value longer than 32000 characters is inserted, the insert is successful.
> Note: Update should also throw an error when updating with a value longer than 32000 characters. Please implement the check for update as well.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
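A hypothetical reproduction of the behaviour the report asks for, once the length check is enforced. The table name string_limit_test and the use of Spark SQL's repeat() to generate the value are illustrative assumptions, not part of the report.
```
-- Hypothetical table for exercising the 32000-character string limit.
CREATE TABLE string_limit_test (id INT, long_col STRING)
STORED BY 'org.apache.carbondata.format';

-- 32000 characters: within the limit, expected to succeed.
INSERT INTO string_limit_test SELECT 1, repeat('a', 32000);

-- 32001 characters: expected to fail with an error once the length
-- check is applied to INSERT (and, per the note above, to UPDATE).
INSERT INTO string_limit_test SELECT 2, repeat('a', 32001);
```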
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user geetikagupta16 commented on the issue: https://github.com/apache/carbondata/pull/1543 retest this please ---
[jira] [Created] (CARBONDATA-1788) Insert is not working as expected when loaded with more than 32000 column length.
pakanati revathi created CARBONDATA-1788:

Summary: Insert is not working as expected when loaded with more than 32000 column length.
Key: CARBONDATA-1788
URL: https://issues.apache.org/jira/browse/CARBONDATA-1788
Project: CarbonData
Issue Type: Bug
Components: sql
Affects Versions: 1.3.0
Environment: 3 node ant cluster
Reporter: pakanati revathi
Priority: Minor
Attachments: Insert.PNG

Insert should accept only column values up to 32000 characters in length. But data longer than 32000 characters is inserted successfully.
Expected result: When a value longer than 32000 characters is inserted, the insert should throw an error.
Actual result: When a value longer than 32000 characters is inserted, the insert is successful.
Note: Update should also throw an error when updating with a value longer than 32000 characters. Please implement the check for update as well.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1544: [CARBONDATA-1740] Fixed order by issue in cas...
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/carbondata/pull/1544

[CARBONDATA-1740] Fixed order by issue in case of preAggregate

Fixed order by issue in case of preAggregate
- [ ] Any interfaces changed - None
- [ ] Any backward compatibility impacted - None
- [ ] Document update required - None
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata master-1740

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1544.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1544

commit 4d5a9cb6222799103b8b377862bcfd0ff5ffb0dc
Author: kumarvishal
Date: 2017-11-20T11:06:11Z

Fixed 1740

---
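For context, a hypothetical sketch of the query shape the PR title describes: an aggregate query with ORDER BY against a table backed by a pre-aggregate datamap. It reuses lineitem2 from CARBONDATA-1737 above purely for illustration; the PR itself does not spell out the failing query.
```
-- On a table with a pre-aggregate datamap, an aggregate query with
-- ORDER BY is the shape of query the fix targets: per the PR title,
-- the ordering of results was incorrect when the query was answered
-- from the pre-aggregate table.
select L_RETURNFLAG, L_LINESTATUS, sum(L_QUANTITY)
from lineitem2
group by L_RETURNFLAG, L_LINESTATUS
order by L_RETURNFLAG, L_LINESTATUS;
```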
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1542 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1334/ ---
[GitHub] carbondata issue #1542: [CARBONDATA-1757] [PreAgg] Fix for wrong avg values ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/1542 retest this please ---
[GitHub] carbondata issue #1543: [CARBONDATA-1786] [BugFix] Refactored code to avoid ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1543 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1791/ ---
[GitHub] carbondata pull request #1543: [CARBONDATA-1786] [BugFix] Refactored code to...
GitHub user geetikagupta16 opened a pull request:

https://github.com/apache/carbondata/pull/1543

[CARBONDATA-1786] [BugFix] Refactored code to avoid null pointer exception while data loading

Refactored code to add hadoop conf to AbstractDFS file constructor

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/geetikagupta16/incubator-carbondata CARBONDATA-1786

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1543.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1543

commit fa64dec0b900e37be0f9ed165f504b833ae50d70
Author: Geetika Gupta
Date: 2017-11-21T10:32:43Z

Refactored code to add hadoop conf to AbstractDfs file constructor

---
[jira] [Assigned] (CARBONDATA-1710) Incorrect result with alter query while adding new column to the table
[ https://issues.apache.org/jira/browse/CARBONDATA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anubhav tarar reassigned CARBONDATA-1710:
-----------------------------------------
Assignee: anubhav tarar

> Incorrect result with alter query while adding new column to the table
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-1710
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1710
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.3.0
> Environment: spark 2.1
> Reporter: Vandana Yadav
> Assignee: anubhav tarar
> Priority: Minor
> Attachments: 2000_UniqData.csv
>
> Incorrect result with alter query while adding a new column to the table.
> Steps to reproduce:
> 1) Create table:
> CREATE TABLE uniqdata_alter (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the table:
> LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata_alter OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Execute queries:
> a) add column query with wrong precision and scale:
> alter table uniqdata_alter add columns (decimal_column3 decimal(10,14))
> Output:
> Error: org.apache.spark.sql.AnalysisException: Decimal scale (14) cannot be greater than precision (10).; (state=,code=0)
> b) add column query with correct precision and scale:
> alter table uniqdata_alter add columns (decimal_column3 decimal(10,4));
> Expected output: it should successfully add the new column to the table.
> Actual output:
> Error: org.apache.spark.sql.AnalysisException: Decimal scale (14) cannot be greater than precision (10).; (state=,code=0)

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
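Condensed from the steps above, the failure sequence on the reporter's uniqdata_alter table; after the fix tracked in PR #1545 the second statement is expected to succeed.
```
-- Invalid: scale (14) greater than precision (10); correctly rejected.
alter table uniqdata_alter add columns (decimal_column3 decimal(10,14));

-- Valid definition; before the fix this repeated the earlier
-- "Decimal scale (14) cannot be greater than precision (10)" error
-- instead of adding the column.
alter table uniqdata_alter add columns (decimal_column3 decimal(10,4));
```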
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1510 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1332/ ---
[GitHub] carbondata issue #1510: [WIP] Supported DataMap chooser and expression for s...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1510 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1790/ ---