[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2565


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206367929
  
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
@@ -371,7 +371,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
       """
          | CREATE TABLE datamap_test_table(id INT, name STRING, city STRING, age INT)
          | STORED BY 'carbondata'
-         | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT')
+         | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT', 'CACHE_LEVEL'='BLOCKLET')
--- End diff --

By default the cache_level is BLOCK, which may affect the pruning info. Some test cases in this file assert on the content of the pruning info, so here I just change the cache_level to BLOCKLET so that I do not need to modify those assertions.


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206367642
  
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
@@ -318,13 +318,22 @@ private DataMapRowImpl loadBlockMetaInfo(CarbonRowSchema[] taskSummarySchema,
           blockMinValues, blockMaxValues);
       blockletCountInEachBlock.add(totalBlockletsInOneBlock);
     }
-    byte[] blockletCount = ArrayUtils
-        .toPrimitive(blockletCountInEachBlock.toArray(new Byte[blockletCountInEachBlock.size()]));
+    byte[] blockletCount = convertRowCountFromShortToByteArray(blockletCountInEachBlock);
     // blocklet count index is the last index
     summaryRow.setByteArray(blockletCount, taskSummarySchema.length - 1);
     return summaryRow;
   }
 
+  private byte[] convertRowCountFromShortToByteArray(List blockletCountInEachBlock) {
--- End diff --

Because we are using the off-heap store, which needs the values stored as bytes.
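To illustrate the point, here is a minimal, hypothetical sketch of such a conversion: per-block blocklet counts held as `Short` values are packed into a flat `byte[]`, since an off-heap store only accepts raw bytes. The class and method names are illustrative, not CarbonData's actual implementation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class BlockletCountEncoder {

  // Pack each short count into 2 bytes (big-endian, ByteBuffer's default)
  // so the resulting array can be written to off-heap memory as-is.
  static byte[] toByteArray(List<Short> counts) {
    ByteBuffer buffer = ByteBuffer.allocate(counts.size() * Short.BYTES);
    for (short count : counts) {
      buffer.putShort(count);
    }
    return buffer.array();
  }

  public static void main(String[] args) {
    byte[] encoded = toByteArray(Arrays.asList((short) 3, (short) 300));
    // 2 shorts -> 4 bytes; 300 = 0x012C, so the last pair is 1 and 44
    System.out.println(encoded.length);
    System.out.println(encoded[2] + "," + encoded[3]);
  }
}
```

Using `short` instead of `byte` per count is what motivates the PR's rename: a block can hold more blocklets than a signed byte can represent.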


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206367396
  
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -71,7 +71,7 @@
   /**
    * variable for cache level BLOCKLET
    */
-  private static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
+  public static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
--- End diff --

Because this member needs to be accessed outside this class: currently in `CarbonInputFormat` we need to use this variable to determine the current cache level.
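The lookup pattern shown in the PR's diff (read CACHE_LEVEL from the table properties, falling back to a default of BLOCK) can be sketched as follows. The constant values mirror the PR, but the surrounding class and the `getOrDefault` lookup are an illustrative assumption, not CarbonData's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheLevelLookup {
  // Property key and values as they appear in the PR discussion;
  // the default cache level is BLOCK.
  static final String CACHE_LEVEL = "cache_level";
  static final String CACHE_LEVEL_DEFAULT_VALUE = "BLOCK";
  static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";

  // Returns true only when the table explicitly sets cache_level=BLOCKLET
  // (case-insensitively); absent or BLOCK means block-level caching.
  static boolean isBlockletCacheLevel(Map<String, String> tableProperties) {
    String cacheLevel =
        tableProperties.getOrDefault(CACHE_LEVEL, CACHE_LEVEL_DEFAULT_VALUE);
    return CACHE_LEVEL_BLOCKLET.equalsIgnoreCase(cacheLevel);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(isBlockletCacheLevel(props));   // default is BLOCK
    props.put(CACHE_LEVEL, "blocklet");
    System.out.println(isBlockletCacheLevel(props));
  }
}
```

Making the constant `public` lets callers such as `CarbonInputFormat` compare against the same string rather than duplicating the literal.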


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206181478
  
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
@@ -371,7 +371,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
       """
          | CREATE TABLE datamap_test_table(id INT, name STRING, city STRING, age INT)
          | STORED BY 'carbondata'
-         | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT')
+         | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT', 'CACHE_LEVEL'='BLOCKLET')
--- End diff --

Why do we need to change "CACHE_LEVEL" to "BLOCKLET"?


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206178902
  
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
@@ -318,13 +318,22 @@ private DataMapRowImpl loadBlockMetaInfo(CarbonRowSchema[] taskSummarySchema,
           blockMinValues, blockMaxValues);
       blockletCountInEachBlock.add(totalBlockletsInOneBlock);
     }
-    byte[] blockletCount = ArrayUtils
-        .toPrimitive(blockletCountInEachBlock.toArray(new Byte[blockletCountInEachBlock.size()]));
+    byte[] blockletCount = convertRowCountFromShortToByteArray(blockletCountInEachBlock);
     // blocklet count index is the last index
     summaryRow.setByteArray(blockletCount, taskSummarySchema.length - 1);
     return summaryRow;
   }
 
+  private byte[] convertRowCountFromShortToByteArray(List blockletCountInEachBlock) {
--- End diff --

Why do we need to do the conversion?


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-30 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r206178641
  
--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
@@ -71,7 +71,7 @@
   /**
    * variable for cache level BLOCKLET
    */
-  private static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
+  public static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
--- End diff --

Why change it to public?


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-28 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205934275
  
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-27 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205676555
  
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List

[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-26 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205657632
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
@@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
     // skip clear datamap and we will do this adter rebuild
     reader.setSkipClearDataMapAtClose(true)
 
+    // currently blockletId in rowWithPosition is wrong, we cannot use it
--- End diff --

OK


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-26 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205657316
  
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
   /**
    * init field converters for index columns
    */
-  public void initIndexColumnConverters(CarbonTable carbonTable, List indexedColumn) {
+  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
+      List indexedColumn) {
+    String cacheLevel = MapUtils.getString(
+        carbonTable.getTableInfo().getFactTable().getTableProperties(),
+        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
+    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
+    if (!this.isBlockletCacheLevel) {
+      LOGGER.warn(
+          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
+              + " which may decrease its pruning performance",
--- End diff --

OK


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-26 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205656230
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
@@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
     // skip clear datamap and we will do this adter rebuild
     reader.setSkipClearDataMapAtClose(true)
 
+    // currently blockletId in rowWithPosition is wrong, we cannot use it
--- End diff --

This is a bit confusing; can you rephrase it?


---


[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...

2018-07-26 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2565#discussion_r205655699
  
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
   /**
    * init field converters for index columns
    */
-  public void initIndexColumnConverters(CarbonTable carbonTable, List indexedColumn) {
+  public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
+      List indexedColumn) {
+    String cacheLevel = MapUtils.getString(
+        carbonTable.getTableInfo().getFactTable().getTableProperties(),
+        CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
+    this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
+    if (!this.isBlockletCacheLevel) {
+      LOGGER.warn(
+          String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
+              + " which may decrease its pruning performance",
--- End diff --

change to `which may decrease its pruning benefit, which lead to read more data`


---