[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716151
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -130,41 +130,27 @@ public static CarbonTable getCarbonTable(Configuration configuration) throws IOE
     return (CarbonTable) ObjectSerializationUtil.convertStringToObject(carbonTableStr);
   }
 
-  /**
-   * It sets unresolved filter expression.
-   *
-   * @param configuration
-   * @param filterExpression
-   */
-  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
-    try {
-      String filterString = ObjectSerializationUtil.convertObjectToString(filterExpression);
-      configuration.set(FILTER_PREDICATE, filterString);
-    } catch (Exception e) {
-      throw new RuntimeException("Error while setting filter expression to Job", e);
-    }
+  public static void setTablePath(Configuration configuration, String tablePath)
+      throws IOException {
+    configuration.set(FileInputFormat.INPUT_DIR, tablePath);
   }
 
   /**
-   * It sets the resolved filter expression
+   * It sets unresolved filter expression.
    *
    * @param configuration
    * @param filterExpression
    */
-  public static void setFilterPredicates(Configuration configuration,
-      FilterResolverIntf filterExpression) {
+  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
--- End diff --

ok, added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716124
  
--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -244,68 +195,40 @@ class CarbonScanRDD[V: ClassTag](
           throw new java.util.NoSuchElementException("End of stream")
         }
         havePair = false
-        recordCount += 1
-        keyClass.getValue(rowIterator.next())
+        val value: V = reader.getCurrentValue
+        value
       }
+    }
+  }
 
-      def clearDictionaryCache(columnToDictionaryMap: java.util.Map[String, Dictionary]) = {
-        if (null != columnToDictionaryMap) {
-          org.apache.carbondata.spark.util.CarbonQueryUtil
-            .clearColumnDictionaryCache(columnToDictionaryMap)
-        }
-      }
+  private def prepareInputFormatForDriver(conf: Configuration): CarbonInputFormat[V] = {
+    CarbonInputFormat.setCarbonTable(conf, carbonTable)
+    createInputFormat(conf)
+  }
 
-      def logStatistics(): Unit = {
-        if (null != queryModel.getStatisticsRecorder) {
-          var queryStatistic = new QueryStatistic()
-          queryStatistic
-            .addFixedTimeStatistic(QueryStatisticsConstants.EXECUTOR_PART,
-              System.currentTimeMillis - queryStartTime
-            )
-          queryModel.getStatisticsRecorder.recordStatistics(queryStatistic)
-          // result size
-          queryStatistic = new QueryStatistic()
-          queryStatistic.addCountStatistic(QueryStatisticsConstants.RESULT_SIZE, recordCount)
-          queryModel.getStatisticsRecorder.recordStatistics(queryStatistic)
-          // print executor query statistics for each task_id
-          queryModel.getStatisticsRecorder.logStatisticsAsTableExecutor()
-        }
-      }
-    }
+  private def prepareInputFormatForExecutor(conf: Configuration): CarbonInputFormat[V] = {
+    CarbonInputFormat.setCarbonReadSupport(classOf[RawDataReadSupport], conf)
+    createInputFormat(conf)
+  }
 
-    iter
+  private def createInputFormat(conf: Configuration): CarbonInputFormat[V] = {
+    val format = new CarbonInputFormat[V]
+    CarbonInputFormat.setTablePath(conf, identifier.getTablePath)
+    CarbonInputFormat.setFilterPredicates(conf, filterExpression)
+    val projection = new CarbonProjection
+    columnProjection.foreach { attr =>
+      projection.addColumn(attr.asInstanceOf[AttributeReference].name)
+    }
--- End diff --

ok



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716097
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -97,15 +91,14 @@ public long getTableStatusLastModifiedTime() throws IOException {
    * @return
    * @throws IOException
    */
-  public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOException {
+  public static ValidSegmentsInfo getValidSegments(AbsoluteTableIdentifier identifier)
--- End diff --

ok, added back.
The invalid segments need to be passed to the task side, because the executor needs to remove them from the index store.
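A minimal sketch of the task-side cleanup described above, assuming the driver ships the list of invalid segment IDs with each task and the executor keeps an in-memory index cache keyed by segment ID. `SegmentIndexStore` and its methods are hypothetical names for illustration, not CarbonData's actual index-store API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical executor-side cache: segment ID -> loaded segment index.
// A String stands in for the real index structure.
class SegmentIndexStore {
  private final Map<String, String> indexBySegment = new HashMap<>();

  void put(String segmentId, String index) {
    indexBySegment.put(segmentId, index);
  }

  boolean contains(String segmentId) {
    return indexBySegment.containsKey(segmentId);
  }

  int size() {
    return indexBySegment.size();
  }

  // Invoked on the task side with the invalid segment IDs shipped from the driver,
  // so that indexes of segments that are no longer valid are dropped from the cache.
  void removeInvalidSegments(Collection<String> invalidSegmentIds) {
    if (invalidSegmentIds == null || invalidSegmentIds.isEmpty()) {
      return; // nothing to clean up
    }
    for (String segmentId : invalidSegmentIds) {
      indexBySegment.remove(segmentId);
    }
  }
}
```

Keeping the removal on the executor means each executor cleans only its own cache, at the cost of sending the invalid list with every task.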



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716071
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
    * @return List list of CarbonInputSplit
    * @throws IOException
    */
-  @Override public List getSplits(JobContext job) throws IOException {
-    try {
+  @Override
+  public List getSplits(JobContext job) throws IOException {
+    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
+    addSegmentsIfEmpty(job, identifier);
+    if (getSegmentsToAccess(job).length == 0) {
--- End diff --

If this check is removed, some MergeRDD test cases fail.
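A simplified sketch of the guard being kept, assuming split planning only needs the list of segments to access: when that list is empty, the method returns an empty list before any index loading or filter resolution happens. `SplitPlanner` and its string "splits" are hypothetical stand-ins; this does not reproduce why the MergeRDD tests fail without the check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified split planner mirroring the guard in getSplits().
class SplitPlanner {
  // Returns one placeholder "split" per segment; real code would build
  // CarbonInputSplits from each segment's index after resolving the filter.
  static List<String> getSplits(String[] segmentsToAccess) {
    if (segmentsToAccess == null || segmentsToAccess.length == 0) {
      // Nothing to read: return early, before touching any index or filter state.
      return new ArrayList<>(0);
    }
    List<String> splits = new ArrayList<>();
    for (String segmentId : segmentsToAccess) {
      splits.add("segment-" + segmentId);
    }
    return splits;
  }
}
```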



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87716028
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
    * @return List list of CarbonInputSplit
    * @throws IOException
    */
-  @Override public List getSplits(JobContext job) throws IOException {
-    try {
+  @Override
+  public List getSplits(JobContext job) throws IOException {
+    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
+    addSegmentsIfEmpty(job, identifier);
+    if (getSegmentsToAccess(job).length == 0) {
+      return new ArrayList<>(0);
+    }
+
+    FilterResolverIntf filterInterface;
+    Expression filter = getFilterPredicates(job.getConfiguration());
+    if (filter == null) {
--- End diff --

ok



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87714121
  
--- Diff: integration/spark/src/main/scala/org/apache/spark/mapreduce/CarbonHadoopMapReduceUtil.scala ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mapreduce
+
+/**
+ * Utility that enable the usage of SparkHadoopMapReduceUtil out side of spark package
+ */
+trait CarbonHadoopMapReduceUtil extends SparkHadoopMapReduceUtil {
--- End diff --

ok



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87713382
  
--- Diff: integration/spark/src/main/scala/org/apache/spark/mapreduce/CarbonHadoopMapReduceUtil.scala ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mapreduce
+
+/**
+ * Utility that enable the usage of SparkHadoopMapReduceUtil out side of spark package
+ */
+trait CarbonHadoopMapReduceUtil extends SparkHadoopMapReduceUtil {
--- End diff --

This class has already been added in the latest master.



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87712886
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -224,42 +221,29 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
    * @return List list of CarbonInputSplit
    * @throws IOException
    */
-  @Override public List getSplits(JobContext job) throws IOException {
-    try {
+  @Override
+  public List getSplits(JobContext job) throws IOException {
+    AbsoluteTableIdentifier identifier = getAbsoluteTableIdentifier(job.getConfiguration());
+    addSegmentsIfEmpty(job, identifier);
+    if (getSegmentsToAccess(job).length == 0) {
+      return new ArrayList<>(0);
+    }
+
+    FilterResolverIntf filterInterface;
+    Expression filter = getFilterPredicates(job.getConfiguration());
+    if (filter == null) {
--- End diff --

This check is not needed. Please initialize `FilterResolverIntf filterInterface = null` first.
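A sketch of the suggested shape with stand-in types: the resolver starts as `null` and is assigned only when a filter expression is present, so no branch exists just to assign `null`. `FilterExpr`, `ResolvedFilter` and `resolve(...)` are hypothetical placeholders for CarbonData's `Expression`, `FilterResolverIntf` and its actual resolution call.

```java
// Placeholder for CarbonData's unresolved filter Expression.
class FilterExpr {
  final String text;

  FilterExpr(String text) {
    this.text = text;
  }
}

// Placeholder for FilterResolverIntf.
class ResolvedFilter {
  final String description;

  ResolvedFilter(String description) {
    this.description = description;
  }
}

class FilterSetup {
  // Hypothetical stand-in for the real resolution step.
  static ResolvedFilter resolve(FilterExpr filter) {
    return new ResolvedFilter("resolved(" + filter.text + ")");
  }

  static ResolvedFilter resolveIfPresent(FilterExpr filter) {
    ResolvedFilter filterInterface = null; // remains null when no filter was pushed down
    if (filter != null) {
      filterInterface = resolve(filter);
    }
    return filterInterface;
  }
}
```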



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87712366
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -130,41 +130,27 @@ public static CarbonTable getCarbonTable(Configuration configuration) throws IOE
     return (CarbonTable) ObjectSerializationUtil.convertStringToObject(carbonTableStr);
   }
 
-  /**
-   * It sets unresolved filter expression.
-   *
-   * @param configuration
-   * @param filterExpression
-   */
-  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
-    try {
-      String filterString = ObjectSerializationUtil.convertObjectToString(filterExpression);
-      configuration.set(FILTER_PREDICATE, filterString);
-    } catch (Exception e) {
-      throw new RuntimeException("Error while setting filter expression to Job", e);
-    }
+  public static void setTablePath(Configuration configuration, String tablePath)
+      throws IOException {
+    configuration.set(FileInputFormat.INPUT_DIR, tablePath);
   }
 
   /**
-   * It sets the resolved filter expression
+   * It sets unresolved filter expression.
    *
    * @param configuration
    * @param filterExpression
    */
-  public static void setFilterPredicates(Configuration configuration,
-      FilterResolverIntf filterExpression) {
+  public static void setFilterPredicates(Configuration configuration, Expression filterExpression) {
--- End diff --

Can `filterExpression` be null in any case? Doesn't it need a null check?
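One way to handle the null case, as a sketch: the setter leaves the key unset when no filter is given, and the getter returns `null` when the key is absent. Assumptions: a plain `Map<String, String>` stands in for Hadoop's `Configuration`, Java serialization plus Base64 stands in for `ObjectSerializationUtil`, and the key string is illustrative, not CarbonData's actual `FILTER_PREDICATE` value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.Map;

// Sketch only: Map stands in for Configuration; serialization + Base64 stands in
// for ObjectSerializationUtil.
class FilterConfig {
  // Illustrative key name, not CarbonData's actual constant value.
  static final String FILTER_PREDICATE = "example.filter.predicate";

  static void setFilterPredicates(Map<String, String> conf, Serializable filterExpression) {
    if (filterExpression == null) {
      return; // no filter: leave the key unset rather than serializing null
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(filterExpression);
    } catch (IOException e) {
      throw new RuntimeException("Error while setting filter expression to Job", e);
    }
    conf.put(FILTER_PREDICATE, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  static Object getFilterPredicates(Map<String, String> conf) {
    String encoded = conf.get(FILTER_PREDICATE);
    if (encoded == null) {
      return null; // key absent means "no filter"
    }
    byte[] data = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException("Error while reading filter expression from Job", e);
    }
  }
}
```

Treating "key absent" as "no filter" keeps the null handling in one place, so callers such as `getSplits` only need to test the getter's result.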



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540836
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOExcepti
   }
 
 }
+
+    // remove entry in the segment index if there are invalid segments
+    if (listOfInvalidSegments.size() > 0) {
--- End diff --

ok, modified



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-10 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r87540816
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -177,6 +178,13 @@ public ValidAndInvalidSegmentsInfo getValidAndInvalidSegments() throws IOExcepti
   }
 
 }
+
+    // remove entry in the segment index if there are invalid segments
+    if (listOfInvalidSegments.size() > 0) {
--- End diff --

ok, modified



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-07 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86786637
  
--- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java ---
@@ -16,10 +16,12 @@
  */
 package org.apache.carbondata.core.carbon.datastore.block;
 
+import java.io.IOException;
+
 /**
- * Abstract class which is maintains the locations of node.
+ * interface to get the locations of node. Used for making task distribution based on locality
  */
-public abstract class Distributable implements Comparable {
+public interface Distributable extends Comparable {
 
-  public abstract String[] getLocations();
+  String[] getLocations() throws IOException;
--- End diff --

Because CarbonInputSplit needs to implement Distributable, and InputSplit has a getLocations() method that throws IOException.
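A minimal illustration of the constraint, using stand-in classes rather than Hadoop's: `FileSplit.getLocations()` is declared to throw `IOException`, and a class that inherits that method to satisfy an interface only compiles if the interface method also allows `IOException`. `FileSplitLike` and `ExampleSplit` are hypothetical names; `Distributable` copies the shape from the diff, with a generic `Comparable` for brevity.

```java
import java.io.IOException;

// Stand-in for Hadoop's FileSplit, whose getLocations() declares IOException.
abstract class FileSplitLike {
  public String[] getLocations() throws IOException {
    return new String[] {"host1", "host2"};
  }
}

// Same shape as the reworked interface in the diff above.
interface Distributable extends Comparable<Distributable> {
  String[] getLocations() throws IOException;
}

// Inherits getLocations() from FileSplitLike to satisfy Distributable. If
// Distributable.getLocations() did not declare IOException, this class would not
// compile: the inherited method throws a checked exception the interface disallows.
class ExampleSplit extends FileSplitLike implements Distributable {
  @Override
  public int compareTo(Distributable other) {
    return 0; // ordering is irrelevant for this illustration
  }

  // Callers that go through the interface must handle the checked exception.
  static int locationCount(Distributable split) {
    try {
      return split.getLocations().length;
    } catch (IOException e) {
      return 0;
    }
  }
}
```

The cost of this design is that every caller of `Distributable.getLocations()`, including locality-based task distribution, now has to handle or propagate `IOException`.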



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-07 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86786673
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
     return result;
   }
 
-  /**
-   * get total number of rows. Same as count(*)
-   *
-   * @throws IOException
-   * @throws IndexBuilderException
-   */
-  public long getRowCount(JobContext job) throws IOException, IndexBuilderException {
--- End diff --

ok



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-03 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86393676
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, AbsoluteTableIdentifier absolute
     return result;
   }
 
-  /**
-   * get total number of rows. Same as count(*)
-   *
-   * @throws IOException
-   * @throws IndexBuilderException
-   */
-  public long getRowCount(JobContext job) throws IOException, IndexBuilderException {
--- End diff --

This method is useful for count(*) queries, because we can return the number of rows from the driver itself. Currently we push this down to the executor; better to keep this method, as it will be useful.
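A sketch of the driver-side shortcut, assuming the driver already holds per-block metadata that records each block's row count. For an unfiltered count(*), the answer is the sum of those counts over blocks in valid segments, with no executor scan. `BlockMeta` and `DriverRowCount` are hypothetical names, and this does not apply to queries with a filter.

```java
import java.util.List;
import java.util.Set;

// Hypothetical per-block metadata available on the driver.
class BlockMeta {
  final String segmentId;
  final long rowCount;

  BlockMeta(String segmentId, long rowCount) {
    this.segmentId = segmentId;
    this.rowCount = rowCount;
  }
}

class DriverRowCount {
  // Unfiltered count(*): sum the row counts recorded in block metadata,
  // considering only blocks that belong to valid segments.
  static long countRows(List<BlockMeta> blocks, Set<String> validSegments) {
    long total = 0;
    for (BlockMeta block : blocks) {
      if (validSegments.contains(block.segmentId)) {
        total += block.rowCount;
      }
    }
    return total;
  }
}
```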



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-03 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86391469
  
--- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java ---
@@ -16,10 +16,12 @@
  */
 package org.apache.carbondata.core.carbon.datastore.block;
 
+import java.io.IOException;
+
 /**
- * Abstract class which is maintains the locations of node.
+ * interface to get the locations of node. Used for making task distribution based on locality
  */
-public abstract class Distributable implements Comparable {
+public interface Distributable extends Comparable {
 
-  public abstract String[] getLocations();
+  String[] getLocations() throws IOException;
--- End diff --

Is there any reason to throw IOException from this method? I think it is not required.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-02 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86214621
  
--- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
@@ -953,66 +959,6 @@ public static void checkAndCreateCarbonDataLocation(String carbonStorePath, Stri
   }
 
   /**
-   * method to distribute the blocklets of a block in multiple blocks
--- End diff --

Maybe we should reconsider removing blocklet distribution. For filter queries with a small number of blocks to scan, it is very helpful for faster processing.
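A sketch of what blocklet distribution buys, assuming each block is described only by its blocklet count: when there are fewer blocks than tasks, each block is cut into contiguous blocklet ranges so more tasks can scan in parallel. `BlockletRange` and `BlockletDistributor` are hypothetical names, the even split is an illustrative policy rather than the removed CarbonLoaderUtil method, and data locality is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// A contiguous run of blocklets [startBlocklet, startBlocklet + count) inside one block.
class BlockletRange {
  final int blockIndex;
  final int startBlocklet;
  final int count;

  BlockletRange(int blockIndex, int startBlocklet, int count) {
    this.blockIndex = blockIndex;
    this.startBlocklet = startBlocklet;
    this.count = count;
  }
}

class BlockletDistributor {
  // blockletsPerBlock[i] is the number of blocklets in block i. Each block is cut
  // into up to numTasks / numBlocks ranges (at least one) of near-equal size.
  static List<BlockletRange> distribute(int[] blockletsPerBlock, int numTasks) {
    List<BlockletRange> units = new ArrayList<>();
    int numBlocks = blockletsPerBlock.length;
    if (numBlocks == 0) {
      return units;
    }
    int piecesPerBlock = Math.max(1, numTasks / numBlocks);
    for (int b = 0; b < numBlocks; b++) {
      int total = blockletsPerBlock[b];
      if (total == 0) {
        continue; // empty block: nothing to scan
      }
      int pieces = Math.min(piecesPerBlock, total); // never more ranges than blocklets
      int base = total / pieces;
      int extra = total % pieces; // the first `extra` ranges get one more blocklet
      int start = 0;
      for (int p = 0; p < pieces; p++) {
        int size = base + (p < extra ? 1 : 0);
        units.add(new BlockletRange(b, start, size));
        start += size;
      }
    }
    return units;
  }
}
```

With two blocks of 10 and 3 blocklets and 4 tasks, this yields four scan units instead of two, which is the kind of parallelism a small filter query would lose if only whole blocks were distributed.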



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-02 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86211496
  
--- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterExpressionProcessor.java ---
@@ -352,4 +354,18 @@ private FilterResolverIntf getFilterResolverBasedOnExpressionType(
     return new RowLevelFilterResolverImpl(expression, false, false, tableIdentifier);
   }
 
+  public static FilterResolverIntf getResolvedFilter(AbsoluteTableIdentifier identifier,
--- End diff --

Why was it added?



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-01 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058166
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
@@ -22,28 +22,44 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.Distributable;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
 
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapreduce.lib.input.FileSplit;
 
+
 /**
  * Carbon input split to allow distributed read of CarbonInputFormat.
  */
-public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
+public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
 
   private static final long serialVersionUID = 3520344046772190207L;
   private String segmentId;
-  /**
+  public String taskId = "0";
+
+  /*
    * Number of BlockLets in a block
    */
   private int numberOfBlocklets = 0;
 
-  public CarbonInputSplit() {
-    super(null, 0, 0, new String[0]);
+  public  CarbonInputSplit() {
   }
 
-  public CarbonInputSplit(String segmentId, Path path, long start, long length,
+  private void parserPath(Path path) {
--- End diff --

please use CarbonTablePath.DataFileUtil.getTaskNo



[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-01 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
@@ -22,28 +22,44 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.Distributable;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
 
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapreduce.lib.input.FileSplit;
 
+
 /**
  * Carbon input split to allow distributed read of CarbonInputFormat.
  */
-public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
+public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
 
   private static final long serialVersionUID = 3520344046772190207L;
   private String segmentId;
-  /**
+  public String taskId = "0";
+
+  /*
    * Number of BlockLets in a block
    */
   private int numberOfBlocklets = 0;
 
-  public CarbonInputSplit() {
-    super(null, 0, 0, new String[0]);
+  public  CarbonInputSplit() {
   }
 
-  public CarbonInputSplit(String segmentId, Path path, long start, long length,
+  private void parserPath(Path path) {
+    String[] nameParts = path.getName().split("-");
+    if (nameParts != null && nameParts.length >= 3) {
+      this.taskId = nameParts[2];
+    }
+  }
+
+  private CarbonInputSplit(String segmentId, Path path, long start, long length,
--- End diff --

please initialize taskId
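A sketch covering both review points on this constructor, under stated assumptions: the task number comes from one shared helper (standing in for `CarbonTablePath.DataFileUtil.getTaskNo`, whose signature is not shown in this thread) and is assigned in the constructor, so `taskId` never depends on a separate parsing call. The rule "third `-`-separated field of the file name, default `"0"`" is taken from the `parserPath` logic in the diff; `TaskIdSplit` and `taskNoFromFileName` are hypothetical.

```java
// Hypothetical split showing taskId assigned in the constructor.
class TaskIdSplit {
  static final String DEFAULT_TASK_ID = "0";

  private final String segmentId;
  private final String taskId;

  TaskIdSplit(String segmentId, String fileName) {
    this.segmentId = segmentId;
    this.taskId = taskNoFromFileName(fileName); // initialized on every construction path
  }

  String getSegmentId() {
    return segmentId;
  }

  String getTaskId() {
    return taskId;
  }

  // Stand-in for a shared utility such as CarbonTablePath.DataFileUtil.getTaskNo:
  // returns the third '-'-separated field of the file name, as parserPath does,
  // falling back to the default when the name has fewer fields.
  static String taskNoFromFileName(String fileName) {
    if (fileName == null) {
      return DEFAULT_TASK_ID;
    }
    String[] nameParts = fileName.split("-");
    return nameParts.length >= 3 ? nameParts[2] : DEFAULT_TASK_ID;
  }
}
```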


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---