[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example

2018-08-10 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2620
  
retest this please


---


[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example

2018-08-10 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2620
  
@chenliang613, please take a look.


---


[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example

2018-08-10 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2620
  
retest this please


---


[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...

2018-08-09 Thread Xaprice
Github user Xaprice commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2620#discussion_r208840858
  
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CustomCompactionExample.scala ---
@@ -0,0 +1,69 @@
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.SparkSession
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.examples.util.ExampleUtils
+
+
+object CustomCompactionExample {
+
+  def main(args: Array[String]): Unit = {
+    val spark = ExampleUtils.createCarbonSession("CustomCompactionExample")
+    exampleBody(spark)
+    spark.close()
+  }
+
+  def exampleBody(spark : SparkSession): Unit = {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
+
+    spark.sql("DROP TABLE IF EXISTS custom_compaction_table")
+
+    spark.sql(
+      s"""
+         | CREATE TABLE IF NOT EXISTS custom_compaction_table(
+         | ID Int,
+         | date Date,
+         | country String,
+         | name String,
+         | phonetype String,
+         | serialname String,
+         | salary Int,
+         | floatField float
+         | ) STORED BY 'carbondata'
+       """.stripMargin)
+
+    val rootPath = new File(this.getClass.getResource("/").getPath
+      + "../../../..").getCanonicalPath
+    val path = s"$rootPath/examples/spark2/src/main/resources/dataSample.csv"
+
+    // load 4 segments
+    // scalastyle:off
+    (1 to 4).foreach(_ => spark.sql(
+      s"""
+         | LOAD DATA LOCAL INPATH '$path'
+         | INTO TABLE custom_compaction_table
+         | OPTIONS('HEADER'='true')
+       """.stripMargin))
+    // scalastyle:on
+
+    // show all segments: 0,1,2,3
+    spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
+
+    // do custom compaction, segments specified will be merged
+    spark.sql("ALTER TABLE custom_compaction_table COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2)")
+    spark.sql("SHOW SEGMENTS FOR TABLE custom_compaction_table").show()
+
+    CarbonProperties.getInstance().addProperty(
--- End diff --

This property is set to a non-default value at the beginning of the 
'exampleBody' method. To keep this test case complete, the property is 
set back to its default value at the end, though that may seem redundant.
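
The set-then-restore pattern described here can be sketched in isolation. The snippet below is only an illustration: java.util.Properties stands in for CarbonProperties, and the key and default value are placeholders, not CarbonData's actual constants.

```java
import java.util.Properties;

/** Stand-in sketch: java.util.Properties plays the role of CarbonProperties. */
final class PropertyRestoreSketch {

  // Placeholder key and default value, assumed for illustration only.
  static final String DATE_FORMAT_KEY = "example.date.format";
  static final String DATE_FORMAT_DEFAULT = "yyyy-MM-dd";

  private PropertyRestoreSketch() { }

  /** Runs the body with a temporary value, then puts the default back. */
  static void runWithDateFormat(Properties props, String temporaryValue, Runnable body) {
    props.setProperty(DATE_FORMAT_KEY, temporaryValue);
    try {
      body.run();
    } finally {
      // Restore the default even if the body throws, so later code sees a clean state.
      props.setProperty(DATE_FORMAT_KEY, DATE_FORMAT_DEFAULT);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(DATE_FORMAT_KEY, DATE_FORMAT_DEFAULT);
    runWithDateFormat(props, "yyyy/MM/dd",
        () -> System.out.println("inside: " + props.getProperty(DATE_FORMAT_KEY)));
    System.out.println("after: " + props.getProperty(DATE_FORMAT_KEY));
  }
}
```

Putting the restore step in a finally block means it also runs when the example body fails, which is the main reason to keep it even when it looks redundant.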


---


[GitHub] carbondata issue #2620: [CARBONDATA-2839] Add custom compaction example

2018-08-09 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2620
  
@xuchuanyin, 'CUSTOM' compaction is a new compaction type in addition 
to MAJOR and MINOR compaction. With custom compaction, the user directly 
specifies the IDs of the segments to be merged.
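
As a rough sketch of what "specifying segment ids" looks like from client code, the helper below builds the statement used in CustomCompactionExample.scala from a list of ids. The class and method names (CustomCompactionSql, buildStatement) are hypothetical and not part of CarbonData; only the shape of the SQL comes from the example in this pull request, and passing ids as strings is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of CarbonData: builds a custom compaction statement. */
final class CustomCompactionSql {

  private CustomCompactionSql() { }

  /**
   * Returns a statement of the form
   * ALTER TABLE <table> COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (<ids>),
   * which is the shape used in CustomCompactionExample.scala.
   */
  static String buildStatement(String tableName, List<String> segmentIds) {
    if (segmentIds == null || segmentIds.isEmpty()) {
      throw new IllegalArgumentException("at least one segment id is required");
    }
    String ids = String.join(",", segmentIds);
    return "ALTER TABLE " + tableName + " COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (" + ids + ")";
  }

  public static void main(String[] args) {
    // Merge segments 1 and 2, as in the example.
    System.out.println(buildStatement("custom_compaction_table", Arrays.asList("1", "2")));
  }
}
```

The resulting string would then be passed to spark.sql(...), as the Scala example does with a literal.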


---


[GitHub] carbondata pull request #2620: [CARBONDATA-2839] Add custom compaction examp...

2018-08-08 Thread Xaprice
GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/2620

[CARBONDATA-2839] Add custom compaction example

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
   no
 - [x] Any backward compatibility impacted?
   no
 - [x] Document update required?
   no
 - [x] Testing done
   Please provide details on
   - Whether new unit test cases have been added or why no new tests are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance test report.
   - Any additional information to help reviewers in testing this change.
 - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   small change


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata custom_compaction_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2620


commit 0402a5f1f66027e5b7d72b514eb80b09c2d7222e
Author: Jin Zhou 
Date:   2018-08-08T09:01:43Z

[CARBONDATA-2839] Add custom compaction example




---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

2018-04-27 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
I've raised a sub-task for custom compaction for child tables/datamaps:
https://issues.apache.org/jira/browse/CARBONDATA-2412


---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

2018-04-27 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
@manishgupta88, I've submitted some changes; please have a look.


---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

2018-04-23 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
retest this please


---


[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...

2018-04-10 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2136
  
Hi @manishgupta88, could you please take a look?


---


[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...

2018-04-10 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2136
  
retest this please


---


[GitHub] carbondata pull request #2136: [CARBONDATA-2307] Fix OOM issue when using Da...

2018-04-08 Thread Xaprice
Github user Xaprice commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2136#discussion_r179941484
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -402,7 +402,7 @@ class CarbonScanRDD(
   // one query id per table
   model.setQueryId(queryId)
   // get RecordReader by FileFormat
-  val reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
+  var reader: RecordReader[Void, Object] = inputSplit.getFileFormat match {
--- End diff --

reader is changed to a var so that the closeReader() method can set it to null. 
This reduces memory usage when coalesce is used; otherwise many reader 
instances stay in memory.
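
A minimal sketch of the idea, in plain Java and without any CarbonData or Hadoop types: a wrapper that holds a reader, closes it once, and then drops the reference. ScanIterator and its members are hypothetical names chosen for illustration; the actual change is in CarbonScanRDD.scala.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical wrapper illustrating "close the reader, then drop the reference". */
final class ScanIterator implements Closeable {

  // Non-final on purpose: the Scala diff changes val to var for the same reason.
  private Closeable reader;

  ScanIterator(Closeable reader) {
    this.reader = reader;
  }

  boolean hasOpenReader() {
    return reader != null;
  }

  /** Closes the reader once and clears the field so the reader can be garbage collected. */
  @Override
  public void close() throws IOException {
    if (reader != null) {
      try {
        reader.close();
      } finally {
        // Even if close() fails, do not keep the reader reachable from this object.
        reader = null;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ScanIterator it = new ScanIterator(() -> System.out.println("reader closed"));
    it.close();
    it.close(); // the second call is a no-op
    System.out.println("open reader: " + it.hasOpenReader());
  }
}
```

Clearing the field matters when the wrapper object itself outlives the read, which is presumably what happens when several partitions are processed within one coalesced task.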


---


[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...

2018-04-04 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/2136
  
retest this please


---


[GitHub] carbondata pull request #2136: [CARBONDATA-2307] Fix OOM issue when using Da...

2018-04-03 Thread Xaprice
GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/2136

[CARBONDATA-2307] Fix OOM issue when using DataFrame.coalesce

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
   NO
 - [x] Any backward compatibility impacted?
   NO
 - [x] Document update required?
   NO
 - [x] Testing done
   Please provide details on
   - Whether new unit test cases have been added or why no new tests are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance test report.
   - Any additional information to help reviewers in testing this change.
     Tested on cluster
 - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   Bug fix, not large changes


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata fix_memoryleak_using_coalesce

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2136


commit d4a233b86592ddca52b584a4dc22f3c84912483a
Author: Jin Zhou <xaprice@...>
Date:   2018-04-03T10:48:51Z

[CARBONDATA-2307] Fix OOM issue when using DataFrame.coalesce




---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

2018-03-01 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
retest this please


---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

2018-02-01 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
@ravipesala Compacting adjacent segments is certainly the best practice in 
most cases, but wouldn't making it a mandatory rule be too inflexible?


---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

2018-02-01 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
@chenliang613  
For question 1: I thought minor compaction was mainly used in the auto-merging 
scenario. But after reconsidering this feature, it is probably better to support 
both major and minor compaction. I will add support for minor compaction soon.
For question 2: I will follow your advice and modify the syntax to keep it 
consistent with the "query with specified segments" syntax.


---


[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

2018-01-16 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1812
  
Hi @chenliang613, can you please take a look?


---


[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

2018-01-16 Thread Xaprice
GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/1812

[CARBONDATA-2033]support user specified segments in major compaction

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
   **no**
 - [ ] Any backward compatibility impacted?
   **no**
 - [x] Document update required?
   **Yes, data-management-on-carbondata.md has been updated.**
 - [x] Testing done
   Please provide details on
   - Whether new unit test cases have been added or why no new tests are required?
     **yes**
   - How it is tested? Please attach test report.
     **test on cluster with 7 nodes**
   - Is it a performance related change? Please attach the performance test report.
     **no**
   - Any additional information to help reviewers in testing this change.
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1812


commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8
Author: Jin Zhou <xaprice@...>
Date:   2018-01-16T09:02:51Z

[CARBONDATA-2033]support user specified segments in major compaction




---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-20 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
retest this please


---


[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...

2017-12-20 Thread Xaprice
Github user Xaprice commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1575#discussion_r158186274
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -205,7 +205,8 @@ object CarbonDataRDDFactory {
 
   val newCarbonLoadModel = prepareCarbonLoadModel(table)
 
-  val compactionSize = CarbonDataMergerUtil.getCompactionSize(CompactionType.MAJOR)
+  val compactionSize = CarbonDataMergerUtil
+    .getCompactionSize(CompactionType.MAJOR, carbonLoadModel)
--- End diff --

carbonLoadModel may carry a table-level major compaction size if one was 
specified in the CREATE TABLE statement, so the 'carbonLoadModel' parameter 
is added in order to read that table-level value.


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-15 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
Hi @chenliang613, can you please take a look?


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-15 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
retest this please


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-05 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
retest this please


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-04 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
Fixed the code style problem; retest this please


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-12-04 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
retest this please


---


[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...

2017-12-04 Thread Xaprice
Github user Xaprice commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1575#discussion_r154612395
  
--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -863,6 +863,16 @@
   public static final String TABLE_BLOCKSIZE = "table_blocksize";
   // set in column level to disable inverted index
   public static final String NO_INVERTED_INDEX = "no_inverted_index";
+  // table property name of major compaction size
+  public static final String TBL_PROP_MAJOR_COMPACTION_SIZE = "major_compaction_size";
--- End diff --

TBL_PROPs removed


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-11-30 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
retest this please


---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-11-30 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
@jackylk 
Users can create a table with the SQL below:
```
 CREATE TABLE tableWithCompactionOptions(
 intField INT,
 stringField STRING
 )
 STORED BY 'carbondata'
 TBLPROPERTIES('MAJOR_COMPACTION_SIZE'='10240',
 'AUTO_LOAD_MERGE'='true',
 'COMPACTION_LEVEL_THRESHOLD'='5,6',
 'COMPACTION_PRESERVE_SEGMENTS'='10',
 'ALLOWED_COMPACTION_DAYS'='5')
```
This lets users specify compaction configurations at the table level. All of 
these configurations are optional; if one is not specified, the corresponding 
configuration in carbon.properties is used. The related document has been updated.
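
The fallback order described above can be sketched as follows. This is only an illustration: CompactionConfigResolver is a hypothetical class, plain maps stand in for the table properties and for carbon.properties, and the system-level key "carbon.major.compaction.size" and the last-resort default "1024" are assumptions rather than values taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver: table-level property first, then the system-level setting. */
final class CompactionConfigResolver {

  private CompactionConfigResolver() { }

  /**
   * Returns the table-level value if present, otherwise the system-level value,
   * otherwise the supplied last-resort default.
   */
  static String resolve(Map<String, String> tableProperties,
                        Map<String, String> systemProperties,
                        String tableKey,
                        String systemKey,
                        String lastResortDefault) {
    String tableValue = tableProperties.get(tableKey);
    if (tableValue != null) {
      return tableValue;
    }
    return systemProperties.getOrDefault(systemKey, lastResortDefault);
  }

  public static void main(String[] args) {
    Map<String, String> system = new HashMap<>();
    system.put("carbon.major.compaction.size", "2048"); // assumed system-level key

    Map<String, String> table = new HashMap<>();
    table.put("major_compaction_size", "10240"); // as in the CREATE TABLE above

    // Table-level value wins when it is set.
    System.out.println(resolve(table, system,
        "major_compaction_size", "carbon.major.compaction.size", "1024"));
    // Without a table-level value, the system-level setting is used.
    System.out.println(resolve(new HashMap<>(), system,
        "major_compaction_size", "carbon.major.compaction.size", "1024"));
  }
}
```

The lookup uses the lower-case table property name; whether the real implementation normalizes the case of TBLPROPERTIES keys is not stated here and is assumed.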




---


[GitHub] carbondata issue #1575: [CARBONDATA-1698]Adding support for table level comp...

2017-11-30 Thread Xaprice
Github user Xaprice commented on the issue:

https://github.com/apache/carbondata/pull/1575
  
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

- [x] Any interfaces changed?
  **no**
- [x] Any backward compatibility impacted?
  **no**
- [x]  Document update required?
  **Yes, data-management-on-carbondata.md has been updated**
- [x] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are 
required?
**new unit test cases added**
- How it is tested? Please attach test report.
**unit test and tested on cluster with 7 nodes**
- Is it a performance related change? Please attach the performance 
test report.
**no**
- Any additional information to help reviewers in testing this change.
**no**

- [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
**NOT RELATED**


---


[GitHub] carbondata pull request #1575: [CARBONDATA-1698]Adding support for table lev...

2017-11-27 Thread Xaprice
GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/1575

[CARBONDATA-1698]Adding support for table level compaction configuration

Adding support for table level compaction configuration

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1575


commit 0a6ba166795872b41c8fa3fa8a5e1a2e5faa81b0
Author: 周瑾 <zhou...@dataspy.com>
Date:   2017-11-27T02:30:17Z

add support for table level compaction properties

commit fa8e847cf26f3b8daa067af792b84f5666ad3920
Author: Jin Zhou <xapr...@yeah.net>
Date:   2017-11-27T09:05:08Z

[CARBONDATA-1698]Adding support for table level compaction configuration

commit f50cd67caf4ea9280d16f34dfe984e218634824c
Author: Jin Zhou <xapr...@yeah.net>
Date:   2017-11-27T09:45:53Z

[CARBONDATA-1698]Adding table level compaction configuration

commit 763e22ce95b829f6a5cb43fa92a523137807a7db
Author: Jin Zhou <xapr...@yeah.net>
Date:   2017-11-27T09:46:04Z

Merge branch 'master' of https://github.com/apache/carbondata




---


[GitHub] carbondata pull request #1547: [CARBONDATA-1792]add example of data manageme...

2017-11-21 Thread Xaprice
GitHub user Xaprice opened a pull request:

https://github.com/apache/carbondata/pull/1547

[CARBONDATA-1792]add example of data management for Spark2.X

[CARBONDATA-1792]add example of data management for Spark2.X

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaprice/carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1547


commit 8511ffa1572d15c4dc7141c8833bdee799cbf15f
Author: Jin Zhou <xapr...@yeah.net>
Date:   2017-11-21T14:57:58Z

add example for data management




---


[GitHub] carbondata pull request #:

2017-11-21 Thread Xaprice
Github user Xaprice commented on the pull request:


https://github.com/apache/carbondata/commit/8511ffa1572d15c4dc7141c8833bdee799cbf15f#commitcomment-25769606
  
[CARBONDATA-1792]add example for data management


---