[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264433

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##
@@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
    * Default first day of week
    */
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = "carbon.load.all.indexes.to.cache";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT = "true";

Review comment: Add a comment explaining when this should be set to false.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

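For illustration, a minimal sketch of how a boolean property with a string default like the one above is typically read. The property key and default come from the diff; the class `LoadAllIndexFlagSketch`, its method, and the use of `java.util.Properties` are assumptions made for this example and are not CarbonData's `CarbonProperties` API. What setting the flag to false actually does is not stated in this thread.

```java
import java.util.Properties;

/** Hypothetical stand-in for illustration; not part of CarbonData. */
final class LoadAllIndexFlagSketch {
  // Key and default value as they appear in the reviewed diff.
  static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = "carbon.load.all.indexes.to.cache";
  static final String CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT = "true";

  private LoadAllIndexFlagSketch() {
  }

  /** Returns the configured flag, falling back to the default when the key is unset. */
  static boolean isLoadAllIndexToCache(Properties props) {
    return Boolean.parseBoolean(
        props.getProperty(CARBON_LOAD_ALL_INDEX_TO_CACHE, CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
  }
}
```

With an empty `Properties` object the helper falls back to the default; after `setProperty("carbon.load.all.indexes.to.cache", "false")` it returns false.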
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378744

## File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/InsertTaskCompletionListener.scala ##
@@ -17,19 +17,27 @@
 package org.apache.carbondata.spark.rdd
+import scala.collection.JavaConverters._
+
 import org.apache.spark.TaskContext
 import org.apache.spark.sql.carbondata.execution.datasources.tasklisteners.CarbonLoadTaskCompletionListener
 import org.apache.spark.sql.execution.command.ExecutionErrors
+import org.apache.spark.util.CollectionAccumulator
-import org.apache.carbondata.core.util.{DataTypeUtil, ThreadLocalTaskInfo}
+import org.apache.carbondata.core.util.{DataTypeUtil, SegmentMinMax, SegmentMinMaxStats, ThreadLocalTaskInfo}
 import org.apache.carbondata.processing.loading.{DataLoadExecutor, FailureCauses}
 import org.apache.carbondata.spark.util.CommonUtil
 class InsertTaskCompletionListener(dataLoadExecutor: DataLoadExecutor,
-    executorErrors: ExecutionErrors)
+    executorErrors: ExecutionErrors,
+    accumulator: CollectionAccumulator[Map[String, List[SegmentMinMax]]])
   extends CarbonLoadTaskCompletionListener {
   override def onTaskCompletion(context: TaskContext): Unit = {
     try {
+      // add segment level minMax to accumulator
+      accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+        asScala.mapValues(_.asScala.toList).toMap)
+      SegmentMinMaxStats.getInstance().clear()

Review comment: This can just be `map.clear`, i.e. clear the map directly.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379317644

## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {

Review comment: In `getInstance`, create the object and the map only once and then reuse them for each load; just clear the map entries once they have been added to the accumulator.

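The create-once, reuse, then clear pattern described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `StatsHolderSketch` and `drain()` are hypothetical names, the values are plain strings rather than `SegmentMinMax` objects, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder standing in for SegmentMinMaxStats; stores plain strings. */
final class StatsHolderSketch {
  // Created exactly once, when the class is initialized.
  private static final StatsHolderSketch INSTANCE = new StatsHolderSketch();

  // The same map instance is reused for every load.
  private final Map<String, List<String>> statsBySegment = new HashMap<>();

  private StatsHolderSketch() {
  }

  static StatsHolderSketch getInstance() {
    return INSTANCE;
  }

  Map<String, List<String>> getStatsBySegment() {
    return statsBySegment;
  }

  /** Copies the current entries for the caller, then empties the shared map. */
  Map<String, List<String>> drain() {
    Map<String, List<String>> snapshot = new HashMap<>(statsBySegment);
    statsBySegment.clear();
    return snapshot;
  }
}
```

A task completion hook would hand the result of `drain()` to the accumulator; the holder and its map survive for the next load, only the entries are dropped.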
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378567

## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+    return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+    return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map minValues,
+      Map maxValues) {
+    if (this.segmentMinMaxMap.get(segmentId) == null) {
+      List<SegmentMinMax> segmentMinMaxList = new ArrayList<>();
+      segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+      this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+    } else {
+      this.segmentMinMaxMap.get(segmentId)

Review comment: The `if` check has already found `this.segmentMinMaxMap.get(segmentId)` to be null, so put the segment id in the map here; otherwise a NullPointerException can occur.

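One common way to avoid the separate null check and the second lookup in a "get, then create the list if missing" method is the standard `Map.computeIfAbsent`. The sketch below is only an illustration of that alternative, not what the PR does; `SegmentEntriesSketch` is a hypothetical class and it stores strings instead of `SegmentMinMax` objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example class; values are strings rather than SegmentMinMax. */
final class SegmentEntriesSketch {
  private final Map<String, List<String>> entriesBySegment = new HashMap<>();

  /** Appends an entry, creating the per-segment list on first use. */
  void add(String segmentId, String entry) {
    // computeIfAbsent returns the existing list, or stores and returns a new one,
    // so the following add() never sees a null list.
    entriesBySegment.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(entry);
  }

  /** Returns the list for a segment, or null if nothing was added for it. */
  List<String> get(String segmentId) {
    return entriesBySegment.get(segmentId);
  }
}
```

Calling `add("0", "a")` followed by `add("0", "b")` leaves one list with two entries under segment "0".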
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379346187

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ##
@@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
     return dataMaps;
   }
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,

Review comment: Better to rename the method `getTableBlockIndexUniqueIdentifierUsingSegmentPruning` to `getTableBlockIndexUniqueIdentifierUsingSegmentMinMax`.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379376668

## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ##
@@ -182,9 +184,14 @@ public static String genSegmentFileName(String segmentId, String UUID) {
    * @param UUID a UUID string used to construct the segment file name
    * @return segment file name

Review comment: Add the additional parameter `segmentMinMaxList` to this comment.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378987

## File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala ##
@@ -58,6 +61,9 @@ object UpdateDataLoad {
       loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS)
       val executor = new DataLoadExecutor
       TaskContext.get().addTaskCompletionListener { context =>
+        accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+          asScala.mapValues(_.asScala.toList).toMap)
+        SegmentMinMaxStats.getInstance().clear()

Review comment: Same as above.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379341045

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java ##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {

Review comment: I think you can rename this class to `SegmentBlockIndexInfo` or `SegmentIndexInfo`.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264536

## File path: core/src/main/java/org/apache/carbondata/core/datamap/Segment.java ##
@@ -85,6 +86,8 @@
    */
   private transient Map options;
+  private List<SegmentMinMax> segmentMinMax;

Review comment: Add a comment describing what this field stores.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379374364

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ##
@@ -140,9 +149,66 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
       segmentMap.put(segment.getSegmentNo(), segment);
       Set identifiers = getTableBlockIndexUniqueIdentifiers(segment);
-      // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-      getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-          tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) {
+        // get tableBlockIndexUniqueIdentifierWrappers from segment file info
+        getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+            tableBlockIndexUniqueIdentifierWrappers, identifiers);
+      } else {
+        List<SegmentMinMax> segmentMinMaxList = segment.getSegmentMinMax();
+        //boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance()
+        //    .getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+        //        CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+        if (null != segmentMinMaxList && !filter.isEmpty() && null != filter && null == FilterUtil
+            .getImplicitFilterExpression(filter.getExpression())) {
+          boolean isScanRequired = false;
+          for (SegmentMinMax segmentMinMax : segmentMinMaxList) {
+            Map minValues = segmentMinMax.getMinValues();
+            Map maxValues = segmentMinMax.getMaxValues();
+            int length = minValues.size();
+            List columnSchemas = new ArrayList<>();

Review comment: Why is this additional list required?

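As background for the loop in this hunk, the general idea of min/max pruning can be sketched as follows: a segment needs to be scanned only if the filter value falls inside the min/max range of at least one of its blocks. This is a simplified illustration under loud assumptions, not the PR's implementation: it handles a single numeric column and an equality filter only, and `MinMaxPruningSketch`, `BlockMinMax` and `isScanRequired` are hypothetical names.

```java
import java.util.List;

/** Hypothetical illustration of min/max pruning for one numeric column. */
final class MinMaxPruningSketch {

  /** Min and max of one column within one block (assumed to be long values). */
  static final class BlockMinMax {
    final long min;
    final long max;

    BlockMinMax(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  private MinMaxPruningSketch() {
  }

  /** Returns true if an equality filter on {@code value} could match any block. */
  static boolean isScanRequired(List<BlockMinMax> blocks, long value) {
    for (BlockMinMax block : blocks) {
      if (value >= block.min && value <= block.max) {
        // The value lies inside this block's range, so the segment cannot be skipped.
        return true;
      }
    }
    // No block range contains the value; the whole segment can be pruned.
    return false;
  }
}
```

For blocks with ranges [1, 10] and [20, 30], a filter value of 25 requires a scan, while 15 lets the segment be skipped without loading its block indexes.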
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377714

## File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java ##
@@ -1265,6 +1292,21 @@ void addPath(String path, FolderDetails details) {
     public void setOptions(Map options) {
       this.options = options;
     }
+
+    public List<SegmentMinMax> getSegmentMinMax() {
+      List<SegmentMinMax> segmentMinMaxList = null;
+      try {
+        segmentMinMaxList =
+            (List<SegmentMinMax>) ObjectSerializationUtil.convertStringToObject(segmentMinMax);
+      } catch (IOException e) {
+        LOGGER.error("Error while getting segment minmax");
+      }
+      return segmentMinMaxList;

Review comment: I recommend returning an empty list instead of null and removing the null checks everywhere.

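A minimal sketch of the "return an empty list instead of null" suggestion, using a made-up comma-separated decoder in place of `ObjectSerializationUtil.convertStringToObject`. `EmptyListSketch`, `decode` and `decodeOrEmpty` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical example; the decoder is a stand-in, not CarbonData's serializer. */
final class EmptyListSketch {

  private EmptyListSketch() {
  }

  /** Stand-in for a deserializer that can fail with an IOException. */
  static List<String> decode(String encoded) throws IOException {
    if (encoded == null) {
      throw new IOException("nothing to decode");
    }
    return Arrays.asList(encoded.split(","));
  }

  /** Never returns null: callers can iterate the result without a null check. */
  static List<String> decodeOrEmpty(String encoded) {
    try {
      return decode(encoded);
    } catch (IOException e) {
      // A real implementation would log the failure here.
      return Collections.emptyList();
    }
  }
}
```

Callers can then write `for (String s : EmptyListSketch.decodeOrEmpty(x))` directly; a failed decode simply yields zero iterations.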
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377880

## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMax.java ##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.Serializable;
+import java.util.Map;
+
+/**
+ * Holds Min, Max and columnCardinality values for each segment block
+ */
+public class SegmentMinMax implements Serializable {
+
+  /**
+   * Map of column names and it's block level min values
+   */
+  private Map minValues;
+
+  /**
+   * Map of column names and it's block level max values
+   */
+  private Map maxValues;
+
+  SegmentMinMax(Map minValues, Map maxValues) {
+    this.minValues = minValues;
+    this.maxValues = maxValues;
+  }
+
+  public Map getMinValues() {
+    return minValues;
+  }
+
+  public void setMinValues(Map minValues) {
+    this.minValues = minValues;
+  }
+
+  public Map getMaxValues() {
+    return maxValues;
+  }
+
+  public void setMaxValues(Map maxValues) {
+    this.maxValues = maxValues;
+  }
+}

Review comment: Add a new line at the end of the class.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379343378

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ##
@@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, String shardName,
     return dataMaps;
   }
+  private void getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,
+      List tableBlockIndexUniqueIdentifierWrappers,
+      Segment segment, Set identifiers) {
+    List<SegmentMinMax> segmentMinMaxList = segment.getSegmentMinMax();
+    //boolean isLoadAllIndex = Boolean.parseBoolean(CarbonProperties.getInstance()
+    //    .getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+    //        CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+    if (null != segmentMinMaxList && !filter.isEmpty()) {

Review comment: I think the filter should have a null check instead of an empty check.

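To illustrate why the null check matters here: in a `&&` chain Java evaluates operands left to right and stops at the first false one, so a null test only protects a dereference when it comes first. The sketch below is a generic illustration; `FilterGuardSketch` and its nested `Filter` interface are hypothetical and do not model `DataMapFilter`.

```java
/** Hypothetical example of guarding a possibly-null filter before calling it. */
final class FilterGuardSketch {

  /** Stand-in for a filter type with an isEmpty() method. */
  interface Filter {
    boolean isEmpty();
  }

  private FilterGuardSketch() {
  }

  /** Null is tested first, so isEmpty() is never invoked on a null reference. */
  static boolean canPrune(Filter filter) {
    return filter != null && !filter.isEmpty();
  }
}
```

Written the other way round, as `!filter.isEmpty() && filter != null`, the call to `isEmpty()` would throw a NullPointerException before the null test was ever reached.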
akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379304868

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java ##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {

Review comment: Add class-level and variable-level comments.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379339875

## File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ##
@@ -211,15 +292,22 @@ private void getTableBlockUniqueIdentifierWrappers(List partition
   public Set getTableBlockIndexUniqueIdentifiers(Segment segment)

Review comment: Replace this method with the following and verify once:

    SegmentBlockInfo segmentBlockInfo = segmentMap.get(segment.getSegmentNo());
    Set tableBlockIndexUniqueIdentifiers;
    if (null != segmentBlockInfo) {
      segment.setSegmentMinMax(segmentMap.get(segment.getSegmentNo()).getSegmentMinMax());
      tableBlockIndexUniqueIdentifiers = segmentBlockInfo.getTableBlockIndexUniqueIdentifiers();
    } else {
      tableBlockIndexUniqueIdentifiers =
          BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
      if (tableBlockIndexUniqueIdentifiers.size() > 0) {
        segmentMap.put(segment.getSegmentNo(),
            new SegmentBlockInfo(tableBlockIndexUniqueIdentifiers, segment.getSegmentMinMax()));
      }
    }
    return tableBlockIndexUniqueIdentifiers;

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264459

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##
@@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
    * Default first day of week
    */
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = "SUNDAY";
+

Review comment: Add a comment.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379379224

## File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ##
@@ -315,7 +316,10 @@ object CarbonDataRDDFactory {
     val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
     var status: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null
     var res: Array[List[(String, (LoadMetadataDetails, ExecutionErrors))]] = null
-
+    // accumulator to collect segment minmax
+    val minMaxAccumulator = sqlContext

Review comment: Rename this to `segmentMinMaxAccumulator`.

akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378160

## File path: core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java ##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+    return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+    return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map minValues,
+      Map maxValues) {
+    if (this.segmentMinMaxMap.get(segmentId) == null) {
+      List<SegmentMinMax> segmentMinMaxList = new ArrayList<>();
+      segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+      this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+    } else {
+      this.segmentMinMaxMap.get(segmentId)
+          .add(new SegmentMinMax(minValues, maxValues));
+    }
+  }
+
+  public void clear() {

Review comment: This method can be removed.