[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2620 @chenliang613, please take a look. ---
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2620 retest this please ---
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
Github user Xaprice commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2620#discussion_r208840858

--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala ---
@@ -0,0 +1,69 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.SparkSession
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.examples.util.ExampleUtils
+
+
+object CustomCompactionExample {
+
+  def main(args: Array[String]): Unit = {
+    val spark = ExampleUtils.createCarbonSession("CustomCompactionExample")
+    exampleBody(spark)
+    spark.close()
+  }
+
+  def exampleBody(spark : SparkSession): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd")
+
+    spark.sql("DROP TABLE IF EXISTS custom_compaction_table")
+
+    spark.sql(
+      s"""
+         | CREATE TABLE IF NOT EXISTS custom_compaction_table(
+         | ID Int,
+         | date Date,
+         | country String,
+         | name String,
+         | phonetype String,
+         | serialname String,
+         | salary Int,
+         | floatField float
+         | ) STORED BY 'carbondata'
+       """.stripMargin)
+
+    val rootPath = new File(this.getClass.getResource("/").getPath
+                            + "../../../..").getCanonicalPath
+    val path = s"$rootPath/examples/spark2/src/main/resources/dataSample.csv"
+
+    // load 4 segments
+    // scalastyle:off
+    (1 to 4).foreach(_ => spark.sql(
+      s"""
+         | LOAD DATA LOCAL INPATH '$path'
+         | INTO TABLE custom_compaction_table
+         | OPTIONS('HEADER'='true')
+       """.stripMargin))
+    // scalastyle:on
+
+    // show all segments: 0,1,2,3
+    spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
+
+    // do custom compaction, segments specified will be merged
+    spark.sql("ALTER TABLE custom_compaction_table COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2)")
+    spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
+
+    CarbonProperties.getInstance().addProperty(
--- End diff --

This property is set to a non-default value at the beginning of the method 'exampleBody'. To keep this test case complete, the property is set back to its default value at the end, even though that may seem redundant.
---
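The set-then-restore behaviour described in the comment above could also be written so that the restore happens even when the example body throws. The following is only a sketch under stated assumptions: `PropertyStore` is a hypothetical in-memory stand-in for a global property holder (it is not the CarbonProperties API), and `WithProperty` and the key name used in the usage note are illustrative.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a process-wide property holder; NOT the CarbonProperties API.
object PropertyStore {
  private val props = mutable.Map.empty[String, String]

  def get(key: String): Option[String] = props.get(key)

  def set(key: String, value: String): Unit = {
    props.update(key, value)
  }

  def remove(key: String): Unit = {
    props.remove(key)
    ()
  }
}

object WithProperty {
  // Sets `key` to `value`, evaluates `body`, then restores the previous state,
  // including the "was not set at all" case, even if `body` throws.
  def apply[A](key: String, value: String)(body: => A): A = {
    val previous = PropertyStore.get(key)
    PropertyStore.set(key, value)
    try body
    finally {
      previous match {
        case Some(old) => PropertyStore.set(key, old)
        case None      => PropertyStore.remove(key)
      }
    }
  }
}
```

A caller would wrap the example body, for instance `WithProperty("example.date.format", "yyyy/MM/dd") { runExample() }`, where both the key and `runExample` are made-up names. The trade-off is one extra helper in exchange for not having to remember the reset at the end of every example.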
[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2620 @xuchuanyin, "CUSTOM COMPACTION" is a new compaction type in addition to MAJOR and MINOR compaction. With custom compaction, the user can directly specify the IDs of the segments to be merged. ---
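As a rough illustration of "directly specifying segment ids", the statement used in the example diff in this thread can be assembled from a list of IDs. This is a minimal sketch that only builds the SQL text; `CustomCompactionSql`, `build`, and its parameters are hypothetical helper names, and the assumption that segment IDs are plain integers comes from the `(1,2)` in the example, not from any CarbonData API.

```scala
// Sketch only: produces the statement text and does not talk to Spark or CarbonData.
object CustomCompactionSql {
  // `tableName` and `segmentIds` are illustrative parameters, not part of any CarbonData API.
  def build(tableName: String, segmentIds: Seq[Int]): String = {
    require(segmentIds.nonEmpty, "at least one segment id is required")
    val idList = segmentIds.mkString(",")
    s"ALTER TABLE $tableName COMPACT 'CUSTOM' WHERE SEGMENT.ID IN ($idList)"
  }
}
```

The resulting string would then be handed to something like `spark.sql(...)`, as the example in the pull request does with a literal statement.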
[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...
GitHub user Xaprice opened a pull request: https://github.com/apache/carbondata/pull/2620

[CARBONDATA-2839] Add custom compaction example

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ x ] Any interfaces changed? no
- [ x ] Any backward compatibility impacted? no
- [ x ] Document update required? no
- [ x ] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ x ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. small change

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata custom_compaction_example

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2620.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2620

commit 0402a5f1f66027e5b7d72b514eb80b09c2d7222e
Author: Jin Zhou
Date: 2018-08-08T09:01:43Z

    [CARBONDATA-2839] Add custom compaction example

---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 I've raised a sub-task for custom compaction for child tables/datamaps: https://issues.apache.org/jira/browse/CARBONDATA-2412 ---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 @manishgupta88, I've submitted some changes; please have a look. ---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 retest this please ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2136 Hi @manishgupta88, could you please take a look? ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2136 retest this please ---
[GitHub] carbondata pull request #2136: [CARBONDATA-2307] Fix OOM issue when using Da...
Github user Xaprice commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2136#discussion_r179941484

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -402,7 +402,7 @@ class CarbonScanRDD(
       // one query id per table
       model.setQueryId(queryId)
       // get RecordReader by FileFormat
-      val reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
+      var reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
--- End diff --

reader will be set to null in the closeReader() method to reduce memory usage when coalesce is used; otherwise many reader instances would remain in memory.
---
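The reason for switching `val` to `var` can be shown in isolation: a reference can only be dropped after closing if it is reassignable. The sketch below is not the CarbonScanRDD code; `FakeReader` and `ReaderHolder` are hypothetical classes that stand in for a record reader and for whatever object keeps it alive.

```scala
// Hypothetical stand-in for a record reader that owns sizeable buffers.
final class FakeReader {
  private var open = true
  def isOpen: Boolean = open
  def close(): Unit = { open = false }
}

// Keeps the reader in a `var` so the reference can be cleared once it is closed.
final class ReaderHolder(initial: FakeReader) {
  private var reader: FakeReader = initial

  // Closes the reader once and then drops the reference; later calls are no-ops.
  def closeReader(): Unit = {
    if (reader != null) {
      try {
        reader.close()
      } finally {
        // Clear the reference so the closed reader becomes eligible for garbage collection
        // even while this holder object itself stays reachable.
        reader = null
      }
    }
  }

  def isReleased: Boolean = reader == null
}
```

With a `val`, the closed reader would stay reachable for as long as its holder does, which is presumably how many instances accumulate when a coalesced task walks through many splits.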
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/2136 retest this please ---
[GitHub] carbondata pull request #2136: [CARBONDATA-2307] Fix OOM issue when using Da...
GitHub user Xaprice opened a pull request: https://github.com/apache/carbondata/pull/2136

[CARBONDATA-2307] Fix OOM issue when using DataFrame.coalesce

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? NO
- [x] Any backward compatibility impacted? NO
- [x] Document update required? NO
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
  Tested on cluster
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. Bug fix, not large changes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata fix_memoryleak_using_coalesce

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2136.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2136

commit d4a233b86592ddca52b584a4dc22f3c84912483a
Author: Jin Zhou <xaprice@...>
Date: 2018-04-03T10:48:51Z

    [CARBONDATA-2307] Fix OOM issue when using DataFrame.coalesce

---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 retest this please ---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 @ravipesala Compacting adjacent segments is certainly the best practice in most cases. But wouldn't making it a mandatory rule be too inflexible? ---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 @chenliang613 For question 1: I thought minor compaction was mainly used in auto-merging scenarios. But after reconsidering this feature, it may be better to support both major and minor compaction. I will add support for minor compaction soon. For question 2: I will follow your advice and modify the syntax to keep it consistent with the syntax for "query with specified segments". ---
[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1812 Hi @chenliang613 , can you please take a look? ---
[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...
GitHub user Xaprice opened a pull request: https://github.com/apache/carbondata/pull/1812

[CARBONDATA-2033]support user specified segments in major compaction

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed? **no**
- [ ] Any backward compatibility impacted? **no**
- [x] Document update required? **Yes, data-management-on-carbondata.md has been updated.**
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? **yes**
  - How it is tested? Please attach test report. **test on cluster with 7 nodes**
  - Is it a performance related change? Please attach the performance test report. **no**
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1812

commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8
Author: Jin Zhou <xaprice@...>
Date: 2018-01-16T09:02:51Z

    [CARBONDATA-2033]support user specified segments in major compaction

---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...
Github user Xaprice commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1575#discussion_r158186274

--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -205,7 +205,8 @@ object CarbonDataRDDFactory {
     val newCarbonLoadModel = prepareCarbonLoadModel(table)
-    val compactionSize = CarbonDataMergerUtil.getCompactionSize(CompactionType.MAJOR)
+    val compactionSize = CarbonDataMergerUtil
+      .getCompactionSize(CompactionType.MAJOR, carbonLoadModel)
--- End diff --

carbonLoadModel may contain a table-level major compaction size if one was specified in the CREATE TABLE statement, so the parameter 'carbonLoadModel' is added in order to read that table-level major compaction size.
---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 Hi @chenliang613 , can you please take a look? ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 fix code style problem, retest this please ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...
Github user Xaprice commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1575#discussion_r154612395

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -863,6 +863,16 @@
   public static final String TABLE_BLOCKSIZE = "table_blocksize";
   // set in column level to disable inverted index
   public static final String NO_INVERTED_INDEX = "no_inverted_index";
+  // table property name of major compaction size
+  public static final String TBL_PROP_MAJOR_COMPACTION_SIZE = "major_compaction_size";
--- End diff --

TBL_PROPs removed
---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575 retest this please ---
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575

@jackylk Users can create a table with the SQL below:

```
CREATE TABLE tableWithCompactionOptions(
  intField INT,
  stringField STRING
) STORED BY 'carbondata'
TBLPROPERTIES('MAJOR_COMPACTION_SIZE'='10240',
              'AUTO_LOAD_MERGE'='true',
              'COMPACTION_LEVEL_THRESHOLD'='5,6',
              'COMPACTION_PRESERVE_SEGMENTS'='10',
              'ALLOWED_COMPACTION_DAYS'='5')
```

This lets users specify compaction configurations at the table level. All of these configurations are optional; if one is not specified, the corresponding configuration in carbon.properties is used. The related document has been updated.
---
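The fallback rule in the comment above (table-level value first, carbon.properties otherwise) amounts to a two-step lookup. The sketch below models both sources as plain maps; `CompactionSetting`, `CompactionConfig.resolve`, and the system-level key name in the usage note are assumptions made for illustration, not names taken from the pull request.

```scala
// One setting is known under two names: one in the table properties, one in the
// system-wide properties. Both key names are supplied by the caller.
final case class CompactionSetting(tableKey: String, systemKey: String)

object CompactionConfig {
  // Returns the table-level value if the table defines one, otherwise the
  // system-level value, otherwise None when neither source has it.
  def resolve(
      setting: CompactionSetting,
      tableProps: Map[String, String],
      systemProps: Map[String, String]): Option[String] =
    tableProps.get(setting.tableKey).orElse(systemProps.get(setting.systemKey))
}
```

For example, with `CompactionSetting("major_compaction_size", "system.major.compaction.size")` (the second key is made up), a table created with `'MAJOR_COMPACTION_SIZE'='10240'` would resolve to `10240`, while a table without that property would pick up whatever the system-level map holds. How CarbonData normalises the case of table property names is not covered here.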
[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...
Github user Xaprice commented on the issue: https://github.com/apache/carbondata/pull/1575

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? **no**
- [x] Any backward compatibility impacted? **no**
- [x] Document update required? **Yes, data-management-on-carbondata.md has been updated**
- [x] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? **new unit test cases added**
  - How it is tested? Please attach test report. **unit test and tested on cluster with 7 nodes**
  - Is it a performance related change? Please attach the performance test report. **no**
  - Any additional information to help reviewers in testing this change. **no**
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. **NOT RELATED**

---
[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...
GitHub user Xaprice opened a pull request: https://github.com/apache/carbondata/pull/1575

[CARBONDATA-1698]Adding support for table level compaction configuration

Adding support for table level compaction configuration

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1575.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1575

commit 0a6ba166795872b41c8fa3fa8a5e1a2e5faa81b0
Author: 周瑾 <zhou...@dataspy.com>
Date: 2017-11-27T02:30:17Z

    add support for table level compaction properties

commit fa8e847cf26f3b8daa067af792b84f5666ad3920
Author: Jin Zhou <xapr...@yeah.net>
Date: 2017-11-27T09:05:08Z

    [CARBONDATA-1698]Adding support for table level compaction configuration

commit f50cd67caf4ea9280d16f34dfe984e218634824c
Author: Jin Zhou <xapr...@yeah.net>
Date: 2017-11-27T09:45:53Z

    [CARBONDATA-1698]Adding table level compaction configuration

commit 763e22ce95b829f6a5cb43fa92a523137807a7db
Author: Jin Zhou <xapr...@yeah.net>
Date: 2017-11-27T09:46:04Z

    Merge branch 'master' of https://github.com/apache/carbondata

---
[GitHub] carbondata pull request #1547: [CARBONDATA-1792]add example of data manageme...
GitHub user Xaprice opened a pull request: https://github.com/apache/carbondata/pull/1547

[CARBONDATA-1792]add example of data management for Spark2.X

[CARBONDATA-1792]add example of data management for Spark2.X

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1547.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1547

commit 8511ffa1572d15c4dc7141c8833bdee799cbf15f
Author: Jin Zhou <xapr...@yeah.net>
Date: 2017-11-21T14:57:58Z

    add example for data management

---
[GitHub] carbondata pull request #:
Github user Xaprice commented on the pull request: https://github.com/apache/carbondata/commit/8511ffa1572d15c4dc7141c8833bdee799cbf15f#commitcomment-25769606 [CARBONDATA-1792]add example for data management ---