[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264433
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
* Default first day of week
*/
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = 
"SUNDAY";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE = 
"carbon.load.all.indexes.to.cache";
+
+  public static final String CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT = "true";
 
 Review comment:
   Add a comment explaining when this should be set to false.
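
   For context, a minimal sketch of how such a flag is typically read with its default. The key and default value are taken from the hunk above; `java.util.Properties` is only a stand-in for CarbonData's own property store, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper; java.util.Properties stands in for the project's property store.
final class LoadAllIndexFlagSketch {
  // Key and default copied from the constants added in the hunk above.
  static final String KEY = "carbon.load.all.indexes.to.cache";
  static final String DEFAULT = "true";

  // Returns the configured flag, falling back to the default when the key is unset.
  static boolean isLoadAllIndexToCache(Properties props) {
    return Boolean.parseBoolean(props.getProperty(KEY, DEFAULT));
  }
}
```

   Note that `Boolean.parseBoolean` treats any value other than a case-insensitive "true" as false, so a typo in the configured value silently disables the flag.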


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378744
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/InsertTaskCompletionListener.scala
 ##
 @@ -17,19 +17,27 @@
 
 package org.apache.carbondata.spark.rdd
 
+import scala.collection.JavaConverters._
+
 import org.apache.spark.TaskContext
 import 
org.apache.spark.sql.carbondata.execution.datasources.tasklisteners.CarbonLoadTaskCompletionListener
 import org.apache.spark.sql.execution.command.ExecutionErrors
+import org.apache.spark.util.CollectionAccumulator
 
-import org.apache.carbondata.core.util.{DataTypeUtil, ThreadLocalTaskInfo}
+import org.apache.carbondata.core.util.{DataTypeUtil, SegmentMinMax, 
SegmentMinMaxStats, ThreadLocalTaskInfo}
 import org.apache.carbondata.processing.loading.{DataLoadExecutor, 
FailureCauses}
 import org.apache.carbondata.spark.util.CommonUtil
 
 class InsertTaskCompletionListener(dataLoadExecutor: DataLoadExecutor,
-executorErrors: ExecutionErrors)
+executorErrors: ExecutionErrors,
+accumulator: CollectionAccumulator[Map[String, List[SegmentMinMax]]])
   extends CarbonLoadTaskCompletionListener {
   override def onTaskCompletion(context: TaskContext): Unit = {
 try {
+  // add segment level minMax to accumulator
+  accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+asScala.mapValues(_.asScala.toList).toMap)
+  SegmentMinMaxStats.getInstance().clear()
 
 Review comment:
   This can just be `map.clear`.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379317644
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
 
 Review comment:
   In `getInstance`, create the object and the map only once and reuse them for 
each load; just clear the map entries once they have been added to the accumulator.
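
   A minimal sketch of the pattern described here, under simplifying assumptions: the class name, the `drain` method, and the use of plain strings in place of `SegmentMinMax` are all hypothetical. The instance and its map are created once; each load copies the entries out and then clears the map for reuse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SegmentMinMaxStats; values are simplified to strings.
final class MinMaxStatsSketch {
  // Created exactly once and reused for every load.
  private static final MinMaxStatsSketch INSTANCE = new MinMaxStatsSketch();

  private final Map<String, List<String>> statsBySegment = new HashMap<>();

  private MinMaxStatsSketch() {
  }

  static MinMaxStatsSketch getInstance() {
    return INSTANCE;
  }

  Map<String, List<String>> getStats() {
    return statsBySegment;
  }

  void add(String segmentId, String stat) {
    statsBySegment.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(stat);
  }

  // Copies the current entries (for handing to an accumulator) and clears the map.
  Map<String, List<String>> drain() {
    Map<String, List<String>> snapshot = new HashMap<>();
    statsBySegment.forEach((id, stats) -> snapshot.put(id, new ArrayList<>(stats)));
    statsBySegment.clear();
    return snapshot;
  }
}
```

   The sketch uses a plain `HashMap`, so it assumes a single writer at a time; if several tasks in one JVM share the instance, some form of synchronization would be needed.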




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378567
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new 
SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map 
minValues,
+  Map maxValues) {
+if (this.segmentMinMaxMap.get(segmentId) == null) {
+  List segmentMinMaxList = new ArrayList<>();
+  segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+  this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+} else {
+  this.segmentMinMaxMap.get(segmentId)
 
 Review comment:
   The `if` check already covers the case where `this.segmentMinMaxMap.get(segmentId)` 
is null, so put the segment ID in the map here; otherwise a NullPointerException can occur.
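
   One way to make the null handling explicit, sketched under assumptions: `Map.computeIfAbsent` creates the list on first use and returns the existing one otherwise, so the append never dereferences a missing entry. The class name is hypothetical and strings stand in for `SegmentMinMax`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; strings stand in for SegmentMinMax values.
final class SegmentListAppendSketch {
  // Appends value to the list stored under segmentId, creating the list on first use.
  static void append(Map<String, List<String>> map, String segmentId, String value) {
    map.computeIfAbsent(segmentId, id -> new ArrayList<>()).add(value);
  }
}
```

   This collapses the `if`/`else` branches into a single statement with the same effect on the map.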




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379346187
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
 return dataMaps;
   }
 
+  private void 
getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,
 
 Review comment:
   Better to rename `getTableBlockIndexUniqueIdentifierUsingSegmentPruning` 
to `getTableBlockIndexUniqueIdentifierUsingSegmentMinMax`.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379376668
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
 ##
 @@ -182,9 +184,14 @@ public static String genSegmentFileName(String segmentId, 
String UUID) {
* @param UUID  a UUID string used to construct the segment file name
* @return segment file name
 
 Review comment:
   Add the new `segmentMinMaxList` parameter to the method comment.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378987
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
 ##
 @@ -58,6 +61,9 @@ object UpdateDataLoad {
   loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS)
   val executor = new DataLoadExecutor
   TaskContext.get().addTaskCompletionListener { context =>
+accumulator.add(SegmentMinMaxStats.getInstance().getSegmentMinMaxMap.
+  asScala.mapValues(_.asScala.toList).toMap)
+SegmentMinMaxStats.getInstance().clear()
 
 Review comment:
   Same as above.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379341045
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {
 
 Review comment:
   I think you can rename this class to `SegmentBlockIndexInfo` or `SegmentIndexInfo`.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264536
 
 

 ##
 File path: core/src/main/java/org/apache/carbondata/core/datamap/Segment.java
 ##
 @@ -85,6 +86,8 @@
*/
   private transient Map options;
 
+  private List segmentMinMax;
 
 Review comment:
   Add a comment describing what this field stores.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379374364
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -140,9 +149,66 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
   segmentMap.put(segment.getSegmentNo(), segment);
   Set identifiers =
   getTableBlockIndexUniqueIdentifiers(segment);
-  // get tableBlockIndexUniqueIdentifierWrappers from segment file info
-  getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
-  tableBlockIndexUniqueIdentifierWrappers, identifiers);
+  if (null != partitionsToPrune && !partitionsToPrune.isEmpty()) {
+// get tableBlockIndexUniqueIdentifierWrappers from segment file info
+getTableBlockUniqueIdentifierWrappers(partitionsToPrune,
+tableBlockIndexUniqueIdentifierWrappers, identifiers);
+  } else {
+List segmentMinMaxList = segment.getSegmentMinMax();
+//boolean isLoadAllIndex = 
Boolean.parseBoolean(CarbonProperties.getInstance()
+//
.getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+//
CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+if (null != segmentMinMaxList && !filter.isEmpty() && null != filter 
&& null == FilterUtil
+.getImplicitFilterExpression(filter.getExpression())) {
+  boolean isScanRequired = false;
+  for (SegmentMinMax segmentMinMax : segmentMinMaxList) {
+Map minValues = segmentMinMax.getMinValues();
+Map maxValues = segmentMinMax.getMaxValues();
+int length = minValues.size();
+List columnSchemas = new ArrayList<>();
 
 Review comment:
   Why is this additional list required?
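
   For readers following the hunk above, a deliberately simplified sketch of the idea behind the loop: a segment needs scanning only if the filter value falls inside the min/max range of at least one of its blocks. The names are hypothetical, and the sketch assumes a single `long` column with an equality filter, which is far narrower than the real filter evaluation.

```java
import java.util.List;

// Hypothetical, simplified model of segment-level min/max pruning.
final class SegmentPruneSketch {
  // Min and max of one column within one block (assumed to be long values).
  static final class MinMax {
    final long min;
    final long max;

    MinMax(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  // Returns true if any block range could contain the value of an equality filter.
  static boolean isScanRequired(List<MinMax> blocks, long value) {
    for (MinMax block : blocks) {
      if (block.min <= value && value <= block.max) {
        return true;
      }
    }
    return false;
  }
}
```

   When `isScanRequired` returns false for a segment, its block indexes do not need to be loaded into the driver cache for that query, which is the memory saving the pull request title refers to.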




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377714
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
 ##
 @@ -1265,6 +1292,21 @@ void addPath(String path, FolderDetails details) {
 public void setOptions(Map options) {
   this.options = options;
 }
+
+public List getSegmentMinMax() {
+  List segmentMinMaxList = null;
+  try {
+segmentMinMaxList =
+(List) 
ObjectSerializationUtil.convertStringToObject(segmentMinMax);
+  } catch (IOException e) {
+LOGGER.error("Error while getting segment minmax");
+  }
+  return segmentMinMaxList;
 
 Review comment:
   I recommend returning an empty list instead of null and removing the 
null checks everywhere.
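
   A minimal sketch of this suggestion, under assumptions: plain Java serialization plus Base64 stands in for `ObjectSerializationUtil`, whose exact behavior is not shown in this thread, and all names are hypothetical. The decoder returns an empty list for missing or unreadable input, so callers can iterate without a null check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; Java serialization + Base64 is an assumed stand-in encoding.
final class EmptyListFallbackSketch {
  // Serializes the list to a Base64 string.
  static String encode(ArrayList<String> values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // Returns the decoded list, or an empty list when the input is null or unreadable.
  @SuppressWarnings("unchecked")
  static List<String> decodeOrEmpty(String encoded) {
    if (encoded == null) {
      return Collections.emptyList();
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(Base64.getDecoder().decode(encoded)))) {
      return (List<String>) in.readObject();
    } catch (IOException | ClassNotFoundException | IllegalArgumentException e) {
      return Collections.emptyList();
    }
  }
}
```

   In real code the failure would normally also be logged with the exception attached, so that a silently empty result can still be diagnosed.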




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379377880
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMax.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.io.Serializable;
+import java.util.Map;
+
+/**
+ * Holds Min, Max and columnCardinality values for each segment block
+ */
+public class SegmentMinMax implements Serializable {
+
+  /**
+   * Map of column names and it's block level min values
+   */
+  private Map minValues;
+
+  /**
+   * Map of column names and it's block level max values
+   */
+  private Map maxValues;
+
+  SegmentMinMax(Map minValues, Map maxValues) {
+this.minValues = minValues;
+this.maxValues = maxValues;
+  }
+
+  public Map getMinValues() {
+return minValues;
+  }
+
+  public void setMinValues(Map minValues) {
+this.minValues = minValues;
+  }
+
+  public Map getMaxValues() {
+return maxValues;
+  }
+
+  public void setMaxValues(Map maxValues) {
+this.maxValues = maxValues;
+  }
+}
 
 Review comment:
   Add a newline at the end of the class.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379343378
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -158,6 +171,74 @@ public DataMapBuilder createBuilder(Segment segment, 
String shardName,
 return dataMaps;
   }
 
+  private void 
getTableBlockIndexUniqueIdentifierUsingSegmentPruning(DataMapFilter filter,
+  List 
tableBlockIndexUniqueIdentifierWrappers,
+  Segment segment, Set identifiers) {
+List segmentMinMaxList = segment.getSegmentMinMax();
+//boolean isLoadAllIndex = 
Boolean.parseBoolean(CarbonProperties.getInstance()
+//
.getProperty(CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE,
+//
CarbonCommonConstants.CARBON_LOAD_ALL_INDEX_TO_CACHE_DEFAULT));
+if (null != segmentMinMaxList && !filter.isEmpty()) {
 
 Review comment:
   I think you should have a null check for `filter` instead of an empty check.
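
   A minimal sketch of the guard ordering, with hypothetical names: the null check has to come first so that short-circuit evaluation prevents `isEmpty()` from being called on a null reference. The `Filter` class here is an assumed stand-in for `DataMapFilter`; whether an additional empty check is still needed depends on that class's semantics.

```java
import java.util.List;

// Hypothetical guard; Filter is a simplified stand-in for DataMapFilter.
final class FilterGuardSketch {
  static final class Filter {
    private final String expression;

    Filter(String expression) {
      this.expression = expression;
    }

    boolean isEmpty() {
      return expression == null || expression.isEmpty();
    }
  }

  // Null checks precede method calls, so && short-circuits before any dereference.
  static boolean canPrune(Filter filter, List<String> segmentMinMaxList) {
    return filter != null && !filter.isEmpty()
        && segmentMinMaxList != null && !segmentMinMaxList.isEmpty();
  }
}
```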




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379304868
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/SegmentBlockInfo.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.indexstore;
+
+import java.util.List;
+import java.util.Set;
+
+import org.apache.carbondata.core.util.SegmentMinMax;
+
+public class SegmentBlockInfo {
 
 Review comment:
   Add class-level and variable-level comments.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379339875
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##
 @@ -211,15 +292,22 @@ private void 
getTableBlockUniqueIdentifierWrappers(List partition
 
   public Set 
getTableBlockIndexUniqueIdentifiers(Segment segment)
 
 Review comment:
   Replace this method with the following and verify once:

   SegmentBlockInfo segmentBlockInfo = segmentMap.get(segment.getSegmentNo());
   Set tableBlockIndexUniqueIdentifiers;
   if (null != segmentBlockInfo) {
     segment.setSegmentMinMax(segmentMap.get(segment.getSegmentNo()).getSegmentMinMax());
     tableBlockIndexUniqueIdentifiers = segmentBlockInfo.getTableBlockIndexUniqueIdentifiers();
   } else {
     tableBlockIndexUniqueIdentifiers =
         BlockletDataMapUtil.getTableBlockUniqueIdentifiers(segment);
     if (tableBlockIndexUniqueIdentifiers.size() > 0) {
       segmentMap.put(segment.getSegmentNo(),
           new SegmentBlockInfo(tableBlockIndexUniqueIdentifiers, segment.getSegmentMinMax()));
     }
   }
   return tableBlockIndexUniqueIdentifiers;




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379264459
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -2333,4 +2333,9 @@ private CarbonCommonConstants() {
* Default first day of week
*/
   public static final String CARBON_TIMESERIES_FIRST_DAY_OF_WEEK_DEFAULT = 
"SUNDAY";
+
 
 Review comment:
   Add a comment.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379379224
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ##
 @@ -315,7 +316,10 @@ object CarbonDataRDDFactory {
 val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
 var status: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null
 var res: Array[List[(String, (LoadMetadataDetails, ExecutionErrors))]] = 
null
-
+// accumulator to collect segment minmax
+val minMaxAccumulator = sqlContext
 
 Review comment:
   Rename to `segmentMinMaxAccumulator`.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3584: [WIP] Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-14 Thread GitBox
akashrn5 commented on a change in pull request #3584: [WIP] Support 
SegmentLevel MinMax for better Pruning and less driver memory usage for cache
URL: https://github.com/apache/carbondata/pull/3584#discussion_r379378160
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/SegmentMinMaxStats.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds list of block level min max for each segment
+ */
+public class SegmentMinMaxStats {
+
+  private SegmentMinMaxStats() {
+  }
+
+  public static SegmentMinMaxStats getInstance() {
+return segmentMinMaxStats;
+  }
+
+  private Map<String, List<SegmentMinMax>> segmentMinMaxMap = new HashMap<>();
+
+  private static final SegmentMinMaxStats segmentMinMaxStats = new 
SegmentMinMaxStats();
+
+  public Map<String, List<SegmentMinMax>> getSegmentMinMaxMap() {
+return segmentMinMaxMap;
+  }
+
+  public void setSegmentMinMaxList(String segmentId, Map 
minValues,
+  Map maxValues) {
+if (this.segmentMinMaxMap.get(segmentId) == null) {
+  List segmentMinMaxList = new ArrayList<>();
+  segmentMinMaxList.add(new SegmentMinMax(minValues, maxValues));
+  this.segmentMinMaxMap.put(segmentId, segmentMinMaxList);
+} else {
+  this.segmentMinMaxMap.get(segmentId)
+  .add(new SegmentMinMax(minValues, maxValues));
+}
+  }
+
+  public void clear() {
 
 Review comment:
   This method can be removed.

