[GitHub] incubator-carbondata pull request #207: [CARBONDATA-283] VT enhancement for ...

2016-11-03 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r86347851
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List updateLatestTableStatusDetails(
+  LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] 
newMetadata) {
+
+List newListMetadata =
+new ArrayList(Arrays.asList(newMetadata));
+for (LoadMetadataDetails oldSegment : oldMetadata) {
+  if 
(CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus()))
 {
+
updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment)));
+  }
+}
+return newListMetadata;
+  }
+
+  /**
+   * returns current time
+   *
+   * @return
+   */
+  private static String readCurrentTime() {
+SimpleDateFormat sdf = new 
SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP);
+String date = null;
+
+date = sdf.format(new Date());
+
+return date;
+  }
+
+  /**
+   * updates segment status and modificaton time details
+   *
+   * @param loadMetadata
+   */
+  public static void updateSegmentMetadataDetails(LoadMetadataDetails 
loadMetadata) {
--- End diff --

Move these function to status Manager


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #237: [CARBONDATA-317] - CSV having only s...

2016-10-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/237#discussion_r83584249
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/csv/CarbonCsvRelation.scala
 ---
@@ -148,6 +150,10 @@ case class CarbonCsvRelation protected[spark] (
   .withSkipHeaderRecord(false)
 CSVParser.parse(firstLine, 
csvFormat).getRecords.get(0).asScala.toArray
   }
+  if(null == firstRow) {
+throw new DataLoadingException("Please check your input path and 
make sure " +
--- End diff --

"Give a message First line cannot be empty",  irrespective of header 
present or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-10-02 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81474992
  
--- Diff: format/src/main/thrift/schema.thrift ---
@@ -124,6 +124,7 @@ struct TableSchema{
1: required string table_id;  // ID used to
2: required list table_columns; // Columns in the table
3: required SchemaEvolution schema_evolution; // History of schema 
evolution of this table
+   4: optional i32 block_size
--- End diff --

@Zhangshunyu Actually need not change Schema for such extra properties 
which are being added. We can directly store TABLE_PROPERTIES as a string key 
value pairs in thrift. So that all such properties configuration parameters can 
be effectively handled without modifying thrift in future. @jackylk please 
comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...

2016-09-23 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/190#discussion_r80220071
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala
 ---
@@ -68,9 +70,50 @@ object CarbonOptimizer {
  */
 class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation])
   extends Rule[LogicalPlan] with PredicateHelper {
-
+  val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
   def apply(plan: LogicalPlan): LogicalPlan = {
-transformCarbonPlan(plan, relations)
+if (relations.nonEmpty && !isOptimized(plan)) {
+  LOGGER.info("Starting to optimize plan")
+  val startTime = System.currentTimeMillis
+  val result = transformCarbonPlan(plan, relations)
+  LOGGER.info("Time taken to optimize plan: " + ( 
System.currentTimeMillis - startTime))
--- End diff --

add this to STATISTICS logger type


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...

2016-09-23 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/190#discussion_r80220168
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala
 ---
@@ -397,20 +452,38 @@ class ResolveCarbonFunctions(relations: 
Seq[CarbonDecoderRelation])
 case others => others
   }
 
+}
+
+val tStartTime = System.currentTimeMillis
--- End diff --

this variable not used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-09-23 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r80221140
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1401,8 +1401,7 @@ public static void logError(Throwable e, boolean 
invalidRowsPresent) {
   public static boolean nanSafeEqualsDoubles(Double d1, Double d2) {
 Boolean xIsNan = Double.isNaN(d1);
 Boolean yIsNan = Double.isNaN(d2);
-if ((xIsNan && yIsNan) || (d1.doubleValue() == d2.doubleValue())) {
-
+if ((d1.doubleValue() == d2.doubleValue()) || (xIsNan && yIsNan)) {
--- End diff --

 Boolean xIsNan = Double.isNaN(d1); these variables are already computed, 
no use of moving them. Instead use ( Double.isNaN(d1) && Double.isNaN(d2))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...

2016-09-23 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/190#discussion_r80219989
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala
 ---
@@ -68,9 +70,50 @@ object CarbonOptimizer {
  */
 class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation])
   extends Rule[LogicalPlan] with PredicateHelper {
-
+  val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
   def apply(plan: LogicalPlan): LogicalPlan = {
-transformCarbonPlan(plan, relations)
+if (relations.nonEmpty && !isOptimized(plan)) {
+  LOGGER.info("Starting to optimize plan")
+  val startTime = System.currentTimeMillis
+  val result = transformCarbonPlan(plan, relations)
+  LOGGER.info("Time taken to optimize plan: " + ( 
System.currentTimeMillis - startTime))
--- End diff --

change the statement to "Time taken for Carbon Optimizer to optimize"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...

2016-09-23 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/190#discussion_r80219576
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala
 ---
@@ -87,20 +130,32 @@ class ResolveCarbonFunctions(relations: 
Seq[CarbonDecoderRelation])
   return plan
 }
 var decoder = false
+val mapOfNodes = new java.util.HashMap[LogicalPlan, ExtraNodeInfo]
+fillNodeInfo(plan, mapOfNodes)
 val aliasMap = CarbonAliasDecoderRelation()
 // collect alias information before hand.
 collectInformationOnAttributes(plan, aliasMap)
-val transFormedPlan =
-  plan transformDown {
-case cd: CarbonDictionaryTempDecoder if cd.isOuter =>
-  decoder = true
-  cd
+
+def hasCarbonRelation(currentPlan: LogicalPlan): Boolean = {
+  val extraNodeInfo = mapOfNodes.get(currentPlan)
--- End diff --

Change the name of mapofNodes to mapOfNonCarbonPlanNodes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290374
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
 ---
@@ -102,7 +115,7 @@ class CarbonScanRDD[V: ClassTag](
 val splits = carbonInputFormat.getSplits(job)
 if (!splits.isEmpty) {
   val carbonInputSplits = 
splits.asScala.map(_.asInstanceOf[CarbonInputSplit])
-
+  
queryModel.setInvalidSegmentIds(validAndInvalidSegments.getInvalidSegments)
--- End diff --

move this to common getSplits, other wise validAndInvalidSegments can be 
null, if parallel deletion happens.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290196
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/ft/CarbonInputMapperTest.java 
---
@@ -129,6 +132,37 @@ private int countTheColumns(String outPath) throws 
Exception {
 return 0;
   }
 
+  private void runJob(String outPath, CarbonProjection projection, 
Expression filter)
--- End diff --

move back the code to original place


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290267
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -706,8 +725,9 @@ private String getUpdateExtension() {
   /**
* @return updateExtension
*/
-  private String[] getValidSegments(JobContext job) throws IOException {
-String segmentString = 
job.getConfiguration().get(INPUT_SEGMENT_NUMBERS, "");
+  private String[] getSegmentsFromConfiguration(JobContext job, String 
segmentType)
+  throws IOException {
+String segmentString = job.getConfiguration().get(segmentType, "");
--- End diff --

change signature to previous one


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289289
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala
 ---
@@ -102,6 +102,11 @@ class CarbonMergerRDD[K, V](
 var dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
 val carbonSparkPartition = 
theSplit.asInstanceOf[CarbonSparkPartition]
 
+// get destination segment properties as sent from driver which is 
of last segment.
+
+val segmentProperties = new 
SegmentProperties(carbonMergerMapping.columnSchemaList.asJava,
--- End diff --

rename it to maxSegmentcolumnSchemaList and maxSegmentColCardinality.
Write comment where it is declared


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289428
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala
 ---
@@ -259,6 +253,9 @@ class CarbonMergerRDD[K, V](
 )
   )
 
+  // keep on assigning till last one is reached.
+  blocksOfLastSegment = blocksOfOneSegment.asJava
--- End diff --

add null check and size check


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79288466
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/datastore/BlockIndexStore.java
 ---
@@ -260,11 +295,29 @@ public void removeTableBlocks(List 
removeTableBlocksInfos,
 }
 Map<TableBlockInfo, AbstractIndex> map = 
tableBlocksMap.get(absoluteTableIdentifier);
 // if there is no loaded blocks then return
-if (null == map) {
+if (null == map || map.isEmpty()) {
+  return;
+}
+Map<String, List> segmentIdToBlockInfoMap =
+segmentIdToBlockListMap.get(absoluteTableIdentifier);
+if (null == segmentIdToBlockInfoMap || 
segmentIdToBlockInfoMap.isEmpty()) {
   return;
 }
-for (TableBlockInfo blockInfos : removeTableBlocksInfos) {
-  map.remove(blockInfos);
+synchronized (lockObject) {
+  for (String segmentId : segmentsToBeRemoved) {
+List tableBlockInfoList = 
segmentIdToBlockInfoMap.get(segmentId);
+if (null == tableBlockInfoList) {
+  continue;
+}
+Iterator tableBlockInfoIterator = 
tableBlockInfoList.iterator();
+while (tableBlockInfoIterator.hasNext()) {
+  TableBlockInfo info = tableBlockInfoIterator.next();
+  AbstractIndex remove = map.remove(info);
+  if (null != remove) {
--- End diff --

tableBlockInfoIterator.remove needs to called irrespective of null != remove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #168: [CARBONDATA-251] making the auto com...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/168#discussion_r79287073
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -623,14 +623,7 @@ object CarbonDataRDDFactory extends Logging {
   }
 }
   }
-  if(compactionModel.isDDLTrigger) {
-// making this an blocking call for DDL
-compactionThread.run()
-  }
-  else {
-// non blocking call in case of auto compaction.
-compactionThread.start()
-  }
+compactionThread.run()
--- End diff --

Add a comment, why run is called instead of start, future will make it 
concurrent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #168: [CARBONDATA-251] making the auto com...

2016-09-17 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/168#discussion_r79287056
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -920,7 +920,7 @@
* Default value of Property for enabling system level compaction lock.1 
compaction can run
* at once.
*/
-  public static String DEFAULT_ENABLE_CONCURRENT_COMPACTION = "false";
+  public static String DEFAULT_ENABLE_CONCURRENT_COMPACTION = "true";
--- End diff --

Add comment that this property "carbon.concurrent.compaction" is @Depricated


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-16 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79222167
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -101,6 +106,8 @@
   //comma separated list of input segment numbers
   public static final String INPUT_SEGMENT_NUMBERS =
   "mapreduce.input.carboninputformat.segmentnumbers";
+  public static final String INVALID_SEGMENT_NUMBERS =
+  "mapreduce.input.carboninputformat.invalidsegmentnumbers";
--- End diff --

Invalid segment deletion, need not be through CarbonInputFormat, When 
Invalid segments list given to Btree(both in Driver and executor it should able 
it delete invalid blocks).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #155: [CARBONDATA-239] Handling the except...

2016-09-16 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/155#discussion_r79208835
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -590,16 +605,25 @@ object CarbonDataRDDFactory extends Logging {
 executor, sqlContext, kettleHomePath, storeLocation
   )
 }
+catch {
+  case e: Exception =>
+logger.error("Exception in compaction thread for table 
" + tableForCompaction
+  .carbonTable.getDatabaseName + "." +
+ tableForCompaction.carbonTableIdentifier
+   .getTableName)
+  // not handling the exception. only logging as this is 
not the table triggered
+  // by user.
+}
 finally {
-  // delete the compaction required file
+  // delete the compaction required file in case of 
failure or success also.
--- End diff --

If compact file deletion fails, add failed table to skipped list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-15 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79003748
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws 
IOException {
 
   /**
* get valid segment for given table
+   *
+   * @return
+   * @throws IOException
+   */
+  public InvalidSegmentsInfo getInvalidSegments() throws IOException {
--- End diff --

This requires reading SegmentInfo twice, once for valid blocks and next for 
Invalid Blocks. Instead send a single class containting ValidAndInvalidBlocks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #140: [CARBONDATA-227] In block distributi...

2016-09-08 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/140#discussion_r77950753
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
 ---
@@ -73,7 +73,7 @@ class CarbonScanRDD[V: ClassTag](
 baseStoreLocation: String)
   extends RDD[V](sc, Nil) with Logging {
 
-  val defaultParallelism = sc.defaultParallelism
+  var defaultParallelism = sc.defaultParallelism
--- End diff --

now member is not required, make it a private val.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #137: [CARBONDATA-222] Handled query issue...

2016-09-08 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/137#discussion_r77950155
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/datastore/impl/btree/BTreeDataRefNodeFinder.java
 ---
@@ -240,9 +240,16 @@ private int compareIndexes(IndexKey first, IndexKey 
second) {
   firstNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset + 
SHORT_SIZE_IN_BYTES);
   secondNodeDictionaryLength =
   secondNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset 
+ SHORT_SIZE_IN_BYTES);
-  compareResult = ByteUtil.UnsafeComparer.INSTANCE
-  .compareTo(first.getNoDictionaryKeys(), actualOffset, 
firstNoDcitionaryLength,
-  second.getNoDictionaryKeys(), actualOffset, 
secondNodeDictionaryLength);
+  int minLength = Math.min(firstNoDcitionaryLength, 
secondNodeDictionaryLength);
--- End diff --

@kumarvishal09 please comment on this. I am not getting how actualOffset is 
same for both keybuffers. 
Also as I know length can be calculated by nextOffset-currentOffset. Please 
give the structure of data stored and how this function works 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #139: [CARBONDATA-224]Fixed data mismatch ...

2016-09-07 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/139#discussion_r77947424
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -864,8 +864,8 @@ class CarbonSqlParser()
/**
 * detects whether double or decimal column is part of 
dictionary_exclude
 */
-  def isDoubleDecimalColDictionaryExclude(columnDataType: String): Boolean 
= {
-val dataTypes = Array("double", "decimal")
+  def isNumberInColDictionaryExclude(columnDataType: String): Boolean = {
+ val dataTypes = Array("double", "decimal", "int")
--- End diff --

Add a reverse check that only string or (varchar-check) is only supported. 
As other datatypes like bigint etc will have problem in above check


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...

2016-09-01 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/116#discussion_r77129093
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -333,7 +531,7 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-var loadsToMerge = CarbonDataMergerUtil.identifySegmentsToBeMerged(
+val loadsToMerge = CarbonDataMergerUtil.identifySegmentsToBeMerged(
--- End diff --

this condition is not required before thread launch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...

2016-09-01 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/116#discussion_r77128213
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -297,15 +366,144 @@ object CarbonDataRDDFactory extends Logging {
 }
 else {
   logger
-.audit("Not able to acquire the compaction lock for table " +
+.audit("Not able to acquire the system level compaction lock for 
table " +
   
s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}"
 )
   logger
 .error("Not able to acquire the compaction lock for table " + 
carbonLoadModel
   .getDatabaseName + "." + carbonLoadModel.getTableName
 )
-  sys.error("Table is already locked for compaction. Please try after 
some time.")
+  if (!CarbonCompactionUtil
+.createCompactionRequiredFile(carbonTable.getMetaDataFilepath, 
compactionType)) {
+logger.error("Not able to create a compaction required file for 
table " + carbonLoadModel
+  .getDatabaseName + "." + carbonLoadModel.getTableName
+)
+  }
+  else {
+logger
+  .info("successfully created a compaction required file for table 
" + carbonLoadModel
+.getDatabaseName + "." + carbonLoadModel.getTableName
+  )
+  }
+  sys.error("System is already locked for compaction. Please try after 
some time.")
--- End diff --

change message to "Compaction is in progress, compaction request for table 
"" in queue".
Print error only during ddl trigger.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...

2016-09-01 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/116#discussion_r77127755
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -297,15 +366,144 @@ object CarbonDataRDDFactory extends Logging {
 }
 else {
   logger
-.audit("Not able to acquire the compaction lock for table " +
+.audit("Not able to acquire the system level compaction lock for 
table " +
   
s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}"
 )
   logger
 .error("Not able to acquire the compaction lock for table " + 
carbonLoadModel
   .getDatabaseName + "." + carbonLoadModel.getTableName
 )
-  sys.error("Table is already locked for compaction. Please try after 
some time.")
+  if (!CarbonCompactionUtil
+.createCompactionRequiredFile(carbonTable.getMetaDataFilepath, 
compactionType)) {
--- End diff --

move printing log to  function


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #111: [CARBONDATA-194] ArrayIndexOfBoundEx...

2016-08-31 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/111#discussion_r76936740
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -104,6 +108,20 @@ public void initialize() throws IOException {
   }
 
   /**
+   * This method will decide the number of coulmns to be parsed for a row 
by univocity parser
+   *
+   * @param columnCountInSchema total number of columns in schema
+   * @return
+   */
+  private int getMaxColumnsForParsing(int columnCountInSchema) {
+int maxNumberOfColumnsForParsing = columnCountInSchema;
+if (columnCountInSchema < MAX_NUMBER_OF_COLUMNS_FOR_PARSING) {
+  maxNumberOfColumnsForParsing = MAX_NUMBER_OF_COLUMNS_FOR_PARSING;
+}
+return maxNumberOfColumnsForParsing;
--- End diff --

Add +10 if schema columns are considered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---