[GitHub] carbondata pull request #2565: [HotFix][CARBONDATA-2788][BloomDataMap] Fix b...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2565 ---
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206367929

    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -371,7 +371,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           """
             | CREATE TABLE datamap_test_table(id INT, name STRING, city STRING, age INT)
             | STORED BY 'carbondata'
    -        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT')
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT', 'CACHE_LEVEL'='BLOCKLET')

    --- End diff --

    By default the cache_level is BLOCK, which may affect the pruning info. Some test cases in this file assert on the content of the pruning info, so here I just change the cache_level to BLOCKLET so that I do not need to modify those assertions.

---
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206367642

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
    @@ -318,13 +318,22 @@ private DataMapRowImpl loadBlockMetaInfo(CarbonRowSchema[] taskSummarySchema,
                blockMinValues, blockMaxValues);
            blockletCountInEachBlock.add(totalBlockletsInOneBlock);
          }
    -     byte[] blockletCount = ArrayUtils
    -         .toPrimitive(blockletCountInEachBlock.toArray(new Byte[blockletCountInEachBlock.size()]));
    +     byte[] blockletCount = convertRowCountFromShortToByteArray(blockletCountInEachBlock);
          // blocklet count index is the last index
          summaryRow.setByteArray(blockletCount, taskSummarySchema.length - 1);
          return summaryRow;
        }

    +   private byte[] convertRowCountFromShortToByteArray(List<Short> blockletCountInEachBlock) {

    --- End diff --

    Because we are using the off-heap store, which needs the values stored as bytes.

---
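To illustrate the answer above, here is a minimal, self-contained sketch of such a Short-to-byte[] conversion. It is not the actual BlockDataMap code, and the exact encoding CarbonData uses may differ; the sketch simply assumes each count is written as two big-endian bytes so the result can be placed in the summary row as a plain byte array for the off-heap store.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class BlockletCountSerializerSketch {

  // Writes each per-block blocklet count as two big-endian bytes (ByteBuffer default order),
  // so the counts can live in a plain byte[] inside the summary row.
  static byte[] convertRowCountFromShortToByteArray(List<Short> blockletCountInEachBlock) {
    ByteBuffer buffer = ByteBuffer.allocate(blockletCountInEachBlock.size() * 2);
    for (short count : blockletCountInEachBlock) {
      buffer.putShort(count);
    }
    return buffer.array();
  }

  public static void main(String[] args) {
    // e.g. three blocks holding 3, 200 and 1 blocklets; 200 no longer fits in a signed byte,
    // which is why the list element type moved from Byte to Short in this PR
    byte[] encoded =
        convertRowCountFromShortToByteArray(Arrays.asList((short) 3, (short) 200, (short) 1));
    System.out.println(encoded.length + " bytes"); // prints "6 bytes"
  }
}
```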
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206367396

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -71,7 +71,7 @@
        /**
         * variable for cache level BLOCKLET
         */
    -   private static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
    +   public static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";

    --- End diff --

    Because this member needs to be accessed from outside this class. Currently, `CarbonInputFormat` needs this variable to know the current cache level.

---
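For context, a hedged sketch of the kind of check an external caller can perform once the constant is public. This is illustrative only, not the actual `CarbonInputFormat` code: the property key and default value below are assumptions based on the `CarbonCommonConstants.CACHE_LEVEL` / `CACHE_LEVEL_DEFAULT_VALUE` references quoted elsewhere in this thread.

```java
import java.util.Map;

public class CacheLevelCheckSketch {

  // Stand-ins for the constants referenced in this thread; the real keys and values live in
  // CarbonCommonConstants and BlockletDataMapFactory and may differ from these assumptions.
  static final String CACHE_LEVEL = "cache_level";
  static final String CACHE_LEVEL_DEFAULT_VALUE = "BLOCK";
  static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET"; // the member made public in this diff

  // Reads the table's CACHE_LEVEL property (falling back to the default) and compares it
  // against the BLOCKLET constant, which is roughly what an external caller such as
  // CarbonInputFormat needs the public visibility for.
  static boolean isBlockletCacheLevel(Map<String, String> tableProperties) {
    String cacheLevel = tableProperties.getOrDefault(CACHE_LEVEL, CACHE_LEVEL_DEFAULT_VALUE);
    return CACHE_LEVEL_BLOCKLET.equalsIgnoreCase(cacheLevel);
  }

  public static void main(String[] args) {
    System.out.println(isBlockletCacheLevel(Map.of("cache_level", "blocklet"))); // true
    System.out.println(isBlockletCacheLevel(Map.of()));                          // false, defaults to BLOCK
  }
}
```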
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206181478

    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -371,7 +371,7 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           """
             | CREATE TABLE datamap_test_table(id INT, name STRING, city STRING, age INT)
             | STORED BY 'carbondata'
    -        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT')
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='GLOBAL_SORT', 'CACHE_LEVEL'='BLOCKLET')

    --- End diff --

    Why does "CACHE_LEVEL" need to be set to "BLOCKLET" here?

---
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206178902

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
    @@ -318,13 +318,22 @@ private DataMapRowImpl loadBlockMetaInfo(CarbonRowSchema[] taskSummarySchema,
                blockMinValues, blockMaxValues);
            blockletCountInEachBlock.add(totalBlockletsInOneBlock);
          }
    -     byte[] blockletCount = ArrayUtils
    -         .toPrimitive(blockletCountInEachBlock.toArray(new Byte[blockletCountInEachBlock.size()]));
    +     byte[] blockletCount = convertRowCountFromShortToByteArray(blockletCountInEachBlock);
          // blocklet count index is the last index
          summaryRow.setByteArray(blockletCount, taskSummarySchema.length - 1);
          return summaryRow;
        }

    +   private byte[] convertRowCountFromShortToByteArray(List<Short> blockletCountInEachBlock) {

    --- End diff --

    Why is this conversion needed?

---
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r206178641

    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -71,7 +71,7 @@
        /**
         * variable for cache level BLOCKLET
         */
    -   private static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";
    +   public static final String CACHE_LEVEL_BLOCKLET = "BLOCKLET";

    --- End diff --

    Why is this changed to public?

---
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205934275

    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205676555

    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -165,9 +180,11 @@ public void initIndexColumnConverters(CarbonTable carbonTable, List
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205657632

    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
    @@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
           // skip clear datamap and we will do this adter rebuild
           reader.setSkipClearDataMapAtClose(true)

    +      // currently blockletId in rowWithPosition is wrong, we cannot use it

    --- End diff --

    OK

---
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205657316

    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
        /**
         * init field converters for index columns
         */
    -   public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
    +   public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
    +       List<CarbonColumn> indexedColumn) {
    +     String cacheLevel = MapUtils.getString(
    +         carbonTable.getTableInfo().getFactTable().getTableProperties(),
    +         CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
    +     this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
    +     if (!this.isBlockletCacheLevel) {
    +       LOGGER.warn(
    +           String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
    +               + " which may decrease its pruning performance",

    --- End diff --

    OK

---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205656230

    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/IndexDataMapRebuildRDD.scala ---
    @@ -357,13 +357,20 @@ class IndexDataMapRebuildRDD[K, V](
           // skip clear datamap and we will do this adter rebuild
           reader.setSkipClearDataMapAtClose(true)

    +      // currently blockletId in rowWithPosition is wrong, we cannot use it

    --- End diff --

    This is a bit confusing; can you rephrase it?

---
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2565#discussion_r205655699

    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -103,7 +106,19 @@ public void init(DataMapModel dataMapModel) throws IOException {
        /**
         * init field converters for index columns
         */
    -   public void initIndexColumnConverters(CarbonTable carbonTable, List<CarbonColumn> indexedColumn) {
    +   public void initIndexColumnConverters(CarbonTable carbonTable, String dataMapName,
    +       List<CarbonColumn> indexedColumn) {
    +     String cacheLevel = MapUtils.getString(
    +         carbonTable.getTableInfo().getFactTable().getTableProperties(),
    +         CarbonCommonConstants.CACHE_LEVEL, CarbonCommonConstants.CACHE_LEVEL_DEFAULT_VALUE);
    +     this.isBlockletCacheLevel = cacheLevel.equalsIgnoreCase("blocklet");
    +     if (!this.isBlockletCacheLevel) {
    +       LOGGER.warn(
    +           String.format("BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
    +               + " which may decrease its pruning performance",

    --- End diff --

    Change to `which may decrease its pruning benefit and lead to reading more data`.

---
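A hedged sketch of how the full warning would read with the suggested wording applied. This is illustrative only, not the final code in the PR; the datamap, database, and table names below are placeholders, and the argument order is assumed from the format string quoted in the diff.

```java
public class BloomCacheLevelWarningSketch {
  public static void main(String[] args) {
    // hypothetical values, for illustration only
    String dataMapName = "dm_bloom_city";
    String databaseName = "default";
    String tableName = "datamap_test_table";

    // the warning text with the wording suggested in this review applied
    String warning = String.format(
        "BloomFilter datamap %s runs with cache_level=block for table %s.%s,"
            + " which may decrease its pruning benefit and lead to reading more data",
        dataMapName, databaseName, tableName);
    System.out.println(warning);
  }
}
```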