Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-09 Thread foryou2030
+1
regards
Gin

Sent from my iPhone

> On Nov 10, 2016, at 3:25 AM, Kumar Vishal wrote:
> 
> +1
> -Regards
> Kumar Vishal
> 
>> On Nov 9, 2016 08:04, "Jacky Li"  wrote:
>> 
>> +1
>> 
>> Regards,
>> Jacky
>> 
>>> On Nov 9, 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote:
>>> 
>>> +1
>>> regards
>>> Jay
>>> 
>>> 
>>> 
>>> 
>>> -- Original Message --
>>> From: "向志强";
>>> Sent: Wednesday, November 9, 2016, 8:59 AM
>>> To: "dev";
>>> 
>>> Subject: Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:
>>> 
>>> 
>>> 
>>> It is great that there is no need to install Thrift to build the project.
>>> 
>>> 2016-11-08 23:16 GMT+08:00 QiangCai :
>>> 
 I look forward to the release of this version.
 CarbonData has improved query and load performance, and it is good news that
 there is no need to install Thrift to build the project.
 By the way, how many PRs were merged into this version?
 
 
 
 --
 View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html
 Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
> 



Re: load data error

2016-10-20 Thread foryou2030
Try hdfs://name001:9000/carbondata/sample.csv
instead of
hdfs:///name001:9000/carbondata/sample.csv
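
The extra slash matters because of how a URI is split into authority and path. Here is a minimal sketch using java.net.URI; the class name HdfsUriCheck and the describe helper are illustrative only, and Hadoop's own path handling may differ in detail. It shows where name001:9000 ends up in each spelling:

```java
import java.net.URI;

// Illustrative sketch only: HdfsUriCheck and describe() are hypothetical names,
// not part of CarbonData or Hadoop.
class HdfsUriCheck {

  // Returns the authority and path that java.net.URI extracts from the location.
  static String describe(String location) {
    URI uri = URI.create(location);
    return "authority=" + uri.getAuthority() + " path=" + uri.getPath();
  }

  public static void main(String[] args) {
    // Two slashes: name001:9000 is parsed as the authority (host and port).
    System.out.println(describe("hdfs://name001:9000/carbondata/sample.csv"));
    // Three slashes: the authority is empty, so name001:9000 lands in the path.
    System.out.println(describe("hdfs:///name001:9000/carbondata/sample.csv"));
  }
}
```

With three slashes, name001:9000 is treated as the first path segment, which matches the "is not a valid DFS filename" error in the quoted message.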

Sent from my iPhone

> On Oct 20, 2016, at 10:52 AM, 仲景武 wrote:
> 
> 
> When running the command (Thrift server):
> 
> jdbc:hive2://taonongyuan.com:10099/default> load 
> data inpath 'hdfs://name001:9000/carbondata/sample.csv' into table 
> test_table3;
> 
> 
> it throws an exception:
> 
> Driver stacktrace: (state=,code=0)
> 0: jdbc:hive2://taonongyuan.com:10099/default> load 
> data inpath 'hdfs:///name001:9000/carbondata/sample.csv' into table 
> test_table3;
> Error: java.lang.IllegalArgumentException: Pathname 
> /name001:9000/carbondata/sample.csv from 
> hdfs:/name001:9000/carbondata/sample.csv is not a valid DFS filename. 
> (state=,code=0)
> 0: jdbc:hive2://taonongyuan.com:10099/default> load 
> data inpath 'hdfs://name001:9000/carbondata/sample.csv' into table 
> test_table3;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 5.0 (TID 18, data002): java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://name001:9000/user/hive/warehouse/carbon.store/default/test_table3/Metadata/fdd8c8c4-5cdd-4542-aab1-785be20b9f36.dictmeta,
>  expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
> at 
> org.apache.carbondata.core.datastorage.store.impl.FileFactory.getDataInputStream(FileFactory.java:146)
> at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:79)
> at 
> org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.openThriftReader(CarbonDictionaryMetadataReaderImpl.java:181)
> at 
> org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:128)
> at 
> org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.readLastChunkFromDictionaryMetadataFile(AbstractDictionaryCache.java:129)
> at 
> org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:204)
> at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:181)
> at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:69)
> at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:40)
> at 
> org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:508)
> at 
> org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:514)
> at 
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.(CarbonGlobalDictionaryRDD.scala:362)
> at 
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 
> 
> On Oct 19, 2016, at 4:55 PM, 仲景武 <zhongjin...@shhxzq.com> wrote:
> 
> 
> Hi all,
> 
> I have installed CarbonData successfully following the document 
> "https://cwiki.apache.org/confluence/display/CARBONDATA/"
> 
> but loading data into a CarbonData table throws an exception:
> 
> 
> run command:
> cc.sql("load data local inpath '../carbondata/sample.csv' into table 
> test_table")
> 
> errors:
> 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
> not exist: /home/bigdata/bigdata/carbondata/sample.csv
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
> at 
> org.apache.hadoop.mapreduce.l

[GitHub] incubator-carbondata pull request #232: [CARBONDATA-310]Fixed compilation fa...

2016-10-12 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/232

[CARBONDATA-310]Fixed compilation failure when using spark 1.6.2

# Why raise this PR?
Compilation fails with Spark 1.6.2 because the class AggregateExpression
cannot be found.
# How to solve?
Removing the import "org.apache.spark.sql.catalyst.expressions.aggregate._"
causes a compilation failure with Spark 1.6.2, in which AggregateExpression
was moved to the subpackage "aggregate". So the import needs to be restored.

Thanks for the reminder, @harperjiang

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata agg_ex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #232


commit ee4f6832d893c6ac99e1694b607b6f2d38ec9231
Author: foryou2030 
Date:   2016-10-13T03:17:38Z

fix compile on spark1.6.2




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #227: [CARBONDATA-304] Fixed data loading ...

2016-10-12 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/227#discussion_r83132187
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -197,8 +197,9 @@ public AbstractFactDataWriter(String storeLocation, int 
measureCount, int mdKeyL
 blockIndexInfoList = new ArrayList<>();
 // get max file size;
 CarbonProperties propInstance = CarbonProperties.getInstance();
-this.fileSizeInBytes = blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
-* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+// if blocksize=2048, then 2048*1024*1024 will beyond the range of Int
+this.fileSizeInBytes = 1L * blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #227: [CARBONDATA-304] Fixed data loading ...

2016-10-11 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/227

[CARBONDATA-304] Fixed data loading failure when set table_blocksize=2048

# Why raise this PR?
Data loading fails when table_blocksize=2048 is set.
# How to solve?
If blocksize=2048, then 2048*1024*1024 exceeds the range of int, so use
long instead.
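
Simply multiplying by a long factor at the end of the product is not enough, because Java evaluates the multiplication left to right in int and only widens the final result. The sketch below illustrates the arithmetic; the value 1024 for the conversion factor is an assumption taken from the 2048*1024*1024 figure above, and the class and method names are illustrative only:

```java
// Illustrative sketch only: BlockSizeOverflow is a hypothetical class.
class BlockSizeOverflow {
  // Assumed value, based on the 2048*1024*1024 figure in the PR description.
  static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024;

  // The int product overflows before the trailing 1L widens it to long.
  static long widenedTooLate(int blocksize) {
    return blocksize * BYTE_TO_KB_CONVERSION_FACTOR * BYTE_TO_KB_CONVERSION_FACTOR * 1L;
  }

  // A leading 1L makes every multiplication happen in long arithmetic.
  static long widenedFirst(int blocksize) {
    return 1L * blocksize * BYTE_TO_KB_CONVERSION_FACTOR * BYTE_TO_KB_CONVERSION_FACTOR;
  }

  public static void main(String[] args) {
    System.out.println(widenedTooLate(2048)); // negative: 2^31 wrapped around in int
    System.out.println(widenedFirst(2048));   // 2147483648
  }
}
```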

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata size1L

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #227


commit c9644bc21501341aac03fce7ad85118eca118ab8
Author: foryou2030 
Date:   2016-10-11T10:00:54Z

fix out of Int range






[GitHub] incubator-carbondata pull request #221: use recoder for all statistic log

2016-10-09 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/221

use recoder for all statistic log




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata off_stat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #221


commit 4e1154d538a96d8862a2b66815cb5db1dc7e3ed5
Author: foryou2030 
Date:   2016-10-09T08:41:20Z

use recoder for all statistic log






[GitHub] incubator-carbondata pull request #198: [CARBONDATA-273]Using carbon common ...

2016-09-26 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/198

[CARBONDATA-273]Using carbon common constants instead of direct values

# Why raise this PR?
Some constants have already been defined in CarbonCommonConstants, but
the code still uses their literal values directly.
# How to solve?
Use the constants from CarbonCommonConstants instead of literal values.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata utf8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #198


commit 28ea756774f43e76b6988a96451aa8b13993e83a
Author: foryou2030 
Date:   2016-09-26T08:25:38Z

use carbon common constants instead of direct values






[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-23 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r80208061
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorderImpl.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+import static org.apache.carbondata.core.util.CarbonUtil.printLine;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class QueryStatisticsRecorderImpl implements 
QueryStatisticsRecorder,Serializable {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(QueryStatisticsRecorderImpl.class.getName());
+
+  /**
+   * serialization version
+   */
+  private static final long serialVersionUID = -5719752001674467864L;
+
+  /**
+   * list for statistics to record time taken
+   * by each phase of the query for example aggregation
+   * scanning,block loading time etc.
+   */
+  private List queryStatistics;
+
+  /**
+   * query with taskd
+   */
+  private String queryIWthTask;
+
+  /**
+   * lock for log statistics table
+   */
+  private static final Object lock = new Object();
+
+  public QueryStatisticsRecorderImpl(String queryId) {
+queryStatistics = new ArrayList();
+this.queryIWthTask = queryId;
+  }
+
+  /**
+   * Below method will be used to add the statistics
+   *
+   * @param statistic
+   */
+  public synchronized void recordStatistics(QueryStatistic statistic) {
+queryStatistics.add(statistic);
+  }
+
+  /**
+   * Below method will be used to log the statistic
+   */
+  public void logStatistics() {
+for (QueryStatistic statistic : queryStatistics) {
+  LOGGER.statistic(statistic.getStatistics(queryIWthTask));
+}
+  }
+
+  /**
+   * Below method will be used to show statistic log as table
+   */
+  public void logStatisticsAsTableExecutor() {
+synchronized (lock) {
--- End diff --

Yes, I think no lock is needed. On the executor, each task has its own recorder.
What do you think? @Vimal-Das




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-23 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r80198494
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonTimeStatisticsFactory.java
 ---
@@ -17,13 +17,14 @@
 
 package org.apache.carbondata.core.util;
 
-import 
org.apache.carbondata.core.carbon.querystatistics.DriverQueryStatisticsRecorder;
+import org.apache.carbondata.core.carbon.querystatistics.*;
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
 
 public class CarbonTimeStatisticsFactory {
   private static String LoadStatisticsInstanceType;
   private static LoadStatistics LoadStatisticsInstance;
-  private static DriverQueryStatisticsRecorder 
QueryStatisticsRecorderInstance;
+  private static String queryStatisticsRecorderInstanceType;
+  private static QueryStatisticsRecorder QueryStatisticsRecorderInstance;
--- End diff --

already initialized 
```
  static {
    CarbonTimeStatisticsFactory.updateTimeStatisticsUtilStatus();
    LoadStatisticsInstance = genLoadStatisticsInstance();
    QueryStatisticsRecorderInstance = genQueryStatisticsRecorderInstance();
  }
```




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-23 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r80197685
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -461,8 +460,8 @@ private Object getFilterPredicates(Configuration 
configuration) {
   FilterExpressionProcessor filterExpressionProcessor,
   AbsoluteTableIdentifier absoluteTableIdentifier, FilterResolverIntf 
resolver,
   String segmentId) throws IndexBuilderException, IOException {
-
-QueryStatisticsRecorder recorder = new QueryStatisticsRecorder("");
+QueryStatisticsRecorder recorder =
+
CarbonTimeStatisticsFactory.getQueryStatisticsRecorderInstance();
--- End diff --

already fixed




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-23 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r80197172
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonTimeStatisticsFactory.java
 ---
@@ -52,12 +56,27 @@ public static LoadStatistics 
getLoadStatisticsInstance() {
 return LoadStatisticsInstance;
   }
 
-  private static DriverQueryStatisticsRecorder 
genQueryStatisticsRecorderInstance() {
-return DriverQueryStatisticsRecorder.getInstance();
+  private static QueryStatisticsRecorder 
genQueryStatisticsRecorderInstance() {
+if (queryStatisticsRecorderInstanceType.equalsIgnoreCase("true")) {
+  return DriverQueryStatisticsRecorderImpl.getInstance();
+} else {
+  return DriverQueryStatisticsRecorderDummy.getInstance();
+}
   }
 
-  public static DriverQueryStatisticsRecorder 
getQueryStatisticsRecorderInstance() {
+  public static QueryStatisticsRecorder 
getQueryStatisticsRecorderInstance() {
 return QueryStatisticsRecorderInstance;
   }
 
+  public static QueryStatisticsRecorder getQueryStatisticsRecorder(String 
queryId) {
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-23 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r80197104
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -461,8 +459,15 @@ private Object getFilterPredicates(Configuration 
configuration) {
   FilterExpressionProcessor filterExpressionProcessor,
   AbsoluteTableIdentifier absoluteTableIdentifier, FilterResolverIntf 
resolver,
   String segmentId) throws IndexBuilderException, IOException {
-
-QueryStatisticsRecorder recorder = new QueryStatisticsRecorder("");
+String queryStatisticsRecorderInstanceType = 
CarbonProperties.getInstance()
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-21 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r79826568
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -461,8 +460,7 @@ private Object getFilterPredicates(Configuration 
configuration) {
   FilterExpressionProcessor filterExpressionProcessor,
   AbsoluteTableIdentifier absoluteTableIdentifier, FilterResolverIntf 
resolver,
   String segmentId) throws IndexBuilderException, IOException {
-
-QueryStatisticsRecorder recorder = new QueryStatisticsRecorder("");
+QueryStatisticsRecorder recorder = 
CarbonTimeStatisticsFactory.getQueryStatisticsRecorder("");
--- End diff --

ok




[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] Add configuration e...

2016-09-20 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/176#discussion_r79550295
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
 ---
@@ -158,7 +159,16 @@ case class CarbonDictionaryDecoder(
 val carbonTable = 
relation.carbonRelation.carbonRelation.metaData.carbonTable
 (carbonTable.getFactTableName, 
carbonTable.getAbsoluteTableIdentifier)
   }.toMap
-  val recorder = new QueryStatisticsRecorder(queryId)
+  val queryStatisticsRecorderInstanceType = 
CarbonProperties.getInstance()
--- End diff --

ok




[GitHub] incubator-carbondata pull request #177: [CARBONDATA-259] Fixed query statist...

2016-09-19 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/177

[CARBONDATA-259] Fixed query statistics for queries with limit

# Why raise this PR?
Query statistics are not present for queries with a limit.
# How to solve it?
Use context.addTaskCompletionListener in the compute method to register an
on-task-completion callback that prints the statistics logs.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata limit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #177


commit 4e6a76b305408174776138255b2802fdcfa473b6
Author: foryou2030 
Date:   2016-09-20T05:29:14Z

Fixed query statistics for queries with limit






[GitHub] incubator-carbondata pull request #176: [CARBONDATA-208] add configurable on...

2016-09-19 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/176

[CARBONDATA-208] add configurable on-off for query statistics

# Why raise this PR?
Currently there are many STATISTIC logs for performance tuning, but
whether they are written should be configurable by the user.
# How to solve it?
Add a configuration to carbon.properties, "enable.query.statistics" (default
value is "false").
Users can enable query statistics by setting it to "true".
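
As a rough illustration of such a switch, here is a minimal sketch. Only the property key and its default come from this PR; Recorder, LoggingRecorder, NoOpRecorder and StatisticsSwitch are hypothetical names, not CarbonData classes:

```java
import java.util.Properties;

// Hypothetical sketch: none of these types are CarbonData classes.
class StatisticsSwitch {

  interface Recorder {
    String record(String message);
  }

  // Used when statistics are enabled: formats the message for the log.
  static final class LoggingRecorder implements Recorder {
    public String record(String message) {
      return "STATISTIC: " + message;
    }
  }

  // Used when statistics are disabled: does nothing.
  static final class NoOpRecorder implements Recorder {
    public String record(String message) {
      return "";
    }
  }

  // Picks the recorder once, based on "enable.query.statistics" (default "false").
  static Recorder create(Properties props) {
    String enabled = props.getProperty("enable.query.statistics", "false");
    return enabled.equalsIgnoreCase("true") ? new LoggingRecorder() : new NoOpRecorder();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("enable.query.statistics", "true");
    System.out.println(create(props).record("scan took 12 ms"));
  }
}
```

Choosing a no-op implementation up front keeps the call sites free of repeated checks on the property.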



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata statistic_switch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #176


commit a3165a6686d371f2bd256a7a8fc3fec8a55b96e0
Author: foryou2030 
Date:   2016-09-19T12:55:52Z

add configurable on-off for query statistics






[GitHub] incubator-carbondata pull request #150: [CARBONDATA-235] Removed no-used car...

2016-09-13 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/150

[CARBONDATA-235] Removed no-used carbon common constants

# Why raise this PR?
Some unused constants still exist in CarbonCommonConstants.
# How to solve it?
Remove the unused constants from CarbonCommonConstants.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata no_constants

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #150


commit 0d38921d082c5da538545ef52b0759604dcb97c8
Author: foryou2030 
Date:   2016-09-13T09:12:13Z

removed no used carbon common constants






[GitHub] incubator-carbondata pull request #137: [CARBONDATA-222] Handled query issue...

2016-09-08 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/137#discussion_r77957259
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/datastore/impl/btree/BTreeDataRefNodeFinder.java
 ---
@@ -240,9 +240,16 @@ private int compareIndexes(IndexKey first, IndexKey 
second) {
   firstNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset + 
SHORT_SIZE_IN_BYTES);
   secondNodeDictionaryLength =
   secondNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset 
+ SHORT_SIZE_IN_BYTES);
-  compareResult = ByteUtil.UnsafeComparer.INSTANCE
-  .compareTo(first.getNoDictionaryKeys(), actualOffset, 
firstNoDcitionaryLength,
-  second.getNoDictionaryKeys(), actualOffset, 
secondNodeDictionaryLength);
+  int minLength = Math.min(firstNoDcitionaryLength, 
secondNodeDictionaryLength);
--- End diff --

OK, handled. Thank you, @gvramana @kumarvishal09.




[GitHub] incubator-carbondata pull request #138: Show query statistics as millseconds...

2016-09-07 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/138

Show query statistics as millseconds instead of seconds

# Why raise this PR?

It is more appropriate to show the time cost in the statistics in milliseconds
instead of seconds.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata mills

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #138








[GitHub] incubator-carbondata pull request #137: [CARBONDATA-222] Handled query issue...

2016-09-07 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/137

[CARBONDATA-222]  Handled query issue for all dimensions are no dictionary 
columns

# Why raise this PR?
Queries fail when all dimensions are no-dictionary columns and the data
contains a row in which every no-dictionary column value is empty.
# How to resolve?
When finding the first data block, we need to compare indexes; in this scenario
we should use a different compareTo method.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata no_dict_null

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #137








[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-05 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77511283
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +83,148 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  Iterator<Map.Entry<String, List<QueryStatistic>>> entries =
+  queryStatisticsMap.entrySet().iterator();
+  while (entries.hasNext()) {
+Map.Entry<String, List<QueryStatistic>> entry = entries.next();
+String queryId = entry.getKey();
+// clear the unknown query statistics
+if(StringUtils.isEmpty(queryId)) {
+  queryStatisticsMap.remove(queryId);
--- End diff --

ok, handled




[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-05 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77506707
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +83,148 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  Iterator<Map.Entry<String, List<QueryStatistic>>> entries =
+  queryStatisticsMap.entrySet().iterator();
+  while (entries.hasNext()) {
+Map.Entry<String, List<QueryStatistic>> entry = entries.next();
+String queryId = entry.getKey();
+// clear the unknown query statistics
+if(StringUtils.isEmpty(queryId)) {
+  queryStatisticsMap.remove(queryId);
--- End diff --

I tried this, but it caused some exceptions.




[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-05 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77488586
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +82,142 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  for (String key: queryStatisticsMap.keySet()) {
+// print 
sql_parse_t,load_meta_t,block_allocation_t,block_identification_t
+// or just print block_allocation_t,block_identification_t
+if (queryStatisticsMap.get(key).size() >= 2) {
--- End diff --

OK, thanks.
Handled, please check.




[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-05 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77488614
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +82,142 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  for (String key: queryStatisticsMap.keySet()) {
+// print 
sql_parse_t,load_meta_t,block_allocation_t,block_identification_t
+// or just print block_allocation_t,block_identification_t
+if (queryStatisticsMap.get(key).size() >= 2) {
+  String tableInfo = collectDriverStatistics(key);
+  if (null != tableInfo) {
+LOGGER.statistic(tableInfo);
+  }
+}
+// clear timeout query statistics
+if(StringUtils.isEmpty(key)) {
+  queryStatisticsMap.remove(key);
+} else {
+  long interval = System.nanoTime() - Long.parseLong(key);
+  if (interval > 
QueryStatisticsConstants.CLEAR_STATISTICS_TIMEOUT) {
+queryStatisticsMap.remove(key);
+  }
+}
   }
 }
   }
 
   /**
* Below method will parse queryStatisticsMap and put time into table
*/
-  public String collectDriverStatistics() {
-for (String key: queryStatisticsMap.keySet()) {
-  try {
-// TODO: get the finished query, and print Statistics
-if (queryStatisticsMap.get(key).size() > 3) {
-  String sql_parse_time = "";
-  String load_meta_time = "";
-  String block_allocation_time = "";
-  String block_identification_time = "";
-  Double driver_part_time_tmp = 0.0;
-  String splitChar = " ";
-  // get statistic time from the QueryStatistic
-  for (QueryStatistic statistic : queryStatisticsMap.get(key)) {
-switch (statistic.getMessage()) {
-  case QueryStatisticsConstants.SQL_PARSE:
-sql_parse_time += statistic.getTimeTaken() + splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.LOAD_META:
-load_meta_time += statistic.getTimeTaken() + splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.BLOCK_ALLOCATION:
-block_allocation_time += statistic.getTimeTaken() + 
splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.BLOCK_IDENTIFICATION:
-block_identification_time += statistic.getTimeTaken() + 
splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  default:
-break;
-}
-  }
-  String driver_part_time = driver_part_time_tmp + splitChar;
-  // structure the query statistics info table
-  StringBuilder tableInfo = new StringBuilder();
-  int len1 = 8;
-  int len2 = 20;
-  int len3 = 21;
-  int len4 = 22;
-  String line = "+" + printLine("-", len1) + "+" + printLine("-", 
len2) + "+" +
-  printLine("-", len3) + "+" + printLine("-", len4) + "+";
-  String line2 = "|" + printLine(" ", len1) + "+" + printLine("-", 
len2) + "+" +
-  printLine(" ", len3) + "+" + printLine("-", len4) + "+";
-  // table header
-  tableInfo.append(line).append("\n");
-  tableInfo.append("|" + printLine(" ", (len1 - 
"Module".length())) + "Module" + "|" +
-  printLine(" ", (len2 - "Operation Step".length())) + 
"Operation Step" + "|" +
-  printLine(" ", (len3 + len4 + 1 - "Query Cost".length())) +
-  "Query Cost" + "|" + "\n");
-  // driver part
-  tableInfo.append(line).append("\n");
-  tableInfo.append("|" + 

[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/122#discussion_r77336083
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -791,16 +819,23 @@ object GlobalDictionaryUtil extends Logging {
   if (requireDimension.nonEmpty) {
 val model = createDictionaryLoadModel(carbonLoadModel, table, 
requireDimension,
   hdfsLocation, dictfolderPath, false)
+// check if dictionary files contains bad record
+val accumulator = sqlContext.sparkContext.accumulator(0)
 // read local dictionary file, and group by key
 val allDictionaryRdd = readAllDictionaryFiles(sqlContext, 
headers,
-  requireColumnNames, allDictionaryPath)
+  requireColumnNames, allDictionaryPath, accumulator)
 // read exist dictionary and combine
 val inputRDD = new 
CarbonAllDictionaryCombineRDD(allDictionaryRdd, model)
   .partitionBy(new 
ColumnPartitioner(model.primDimensions.length))
 // generate global dictionary files
 val statusList = new 
CarbonGlobalDictionaryGenerateRDD(inputRDD, model).collect()
 // check result status
 checkStatus(carbonLoadModel, sqlContext, model, statusList)
+// if the dictionary contains wrong format record, throw ex
+if (accumulator.value > 0) {
+  throw new DataLoadingException("Data Loading failure, the 
dictionary file " +
--- End diff --

You are right. Handled.




[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/122#discussion_r77333560
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -588,30 +588,59 @@ object GlobalDictionaryUtil extends Logging {
  allDictionaryPath: String) = {
 var allDictionaryRdd: RDD[(String, Iterable[String])] = null
 try {
-  // read local dictionary file, and spilt (columnIndex, columnValue)
-  val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
-.map(x => {
+  // parse record and validate record
+  def parseRecord(x: String, accum: Accumulator[Int]) : (String, 
String) = {
 val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-if (tokens.size != 2) {
-  logError("Read a bad dictionary record: " + x)
-}
-var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
+var columnName: String = ""
 var value: String = ""
-try {
-  columnName = csvFileColumns(tokens(0).toInt)
-  value = tokens(1)
-} catch {
-  case ex: Exception =>
-logError("Reset bad dictionary record as default value")
+// such as "," , "", throw ex
+if (tokens.size == 0) {
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+} else if (tokens.size == 1) {
+  // such as "1", "jone", throw ex
+  if (x.contains(",") == false) {
+accum += 1
+  } else {
+try {
+  columnName = csvFileColumns(tokens(0).toInt)
+} catch {
+  case ex: Exception =>
+logError("Read a bad dictionary record: " + x)
+accum += 1
+}
+  }
+} else {
+  try {
+columnName = csvFileColumns(tokens(0).toInt)
+value = tokens(1)
+  } catch {
+case ex: Exception =>
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+  }
 }
 (columnName, value)
-  })
+  }
 
+  val accumulator = sqlContext.sparkContext.accumulator(0)
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/122#discussion_r77333578
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -588,30 +588,59 @@ object GlobalDictionaryUtil extends Logging {
  allDictionaryPath: String) = {
 var allDictionaryRdd: RDD[(String, Iterable[String])] = null
 try {
-  // read local dictionary file, and spilt (columnIndex, columnValue)
-  val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
-.map(x => {
+  // parse record and validate record
+  def parseRecord(x: String, accum: Accumulator[Int]) : (String, 
String) = {
 val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-if (tokens.size != 2) {
-  logError("Read a bad dictionary record: " + x)
-}
-var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
+var columnName: String = ""
 var value: String = ""
-try {
-  columnName = csvFileColumns(tokens(0).toInt)
-  value = tokens(1)
-} catch {
-  case ex: Exception =>
-logError("Reset bad dictionary record as default value")
+// such as "," , "", throw ex
+if (tokens.size == 0) {
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+} else if (tokens.size == 1) {
+  // such as "1", "jone", throw ex
+  if (x.contains(",") == false) {
+accum += 1
+  } else {
+try {
+  columnName = csvFileColumns(tokens(0).toInt)
+} catch {
+  case ex: Exception =>
+logError("Read a bad dictionary record: " + x)
+accum += 1
+}
+  }
+} else {
+  try {
+columnName = csvFileColumns(tokens(0).toInt)
+value = tokens(1)
+  } catch {
+case ex: Exception =>
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+  }
 }
 (columnName, value)
-  })
+  }
 
+  val accumulator = sqlContext.sparkContext.accumulator(0)
+  // read local dictionary file, and spilt (columnIndex, columnValue)
+  val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
+.map(x => parseRecord(x, accumulator)).persist()
+  // for accumulator updates performed inside actions only
+  basicRdd.count()
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear query statist...

2016-09-02 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/123

[CARBONDATA-204] Clear query statistics map when timeout

# Why raise this PR?
I found two issues with query statistics:
1. Query statistics that are never printed are kept in 
queryStatisticsMap, which can cause an out-of-memory error in long-running processes.
2. In some scenarios the driver cannot record "sql_parse_time", so the driver 
statistics logs are not output. We should always output block_allocation_time and 
block_identification_time.
# How to solve?
1. Add a function to check for query statistics timeout; once an entry times out, remove its 
queryId from the map.
2. Add a check on the size of each queryStatisticsMap entry: if the 
query statistics contain only block_allocation_time and 
block_identification_time, output them.
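
The timeout cleanup in step 1 could look roughly like the sketch below. It is only an illustration, not the code in this PR: the class and method names are hypothetical, the statistics are reduced to plain strings, and it assumes (as the review diff suggests) that each queryId is a System.nanoTime() value stored as a string. It removes entries through the iterator, because calling remove on a HashMap while iterating over it can throw ConcurrentModificationException.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the recorder's map handling; not the CarbonData class.
class StatisticsEvictionSketch {

  /**
   * Removes entries whose key is null or empty, or whose start time (assumed to
   * be encoded in the key as a System.nanoTime() value) is older than timeoutNanos.
   * A non-numeric, non-empty key would raise NumberFormatException here.
   *
   * @return the number of entries removed
   */
  static int evictStale(Map<String, List<String>> statsByQueryId,
      long nowNanos, long timeoutNanos) {
    int removed = 0;
    Iterator<Map.Entry<String, List<String>>> entries =
        statsByQueryId.entrySet().iterator();
    while (entries.hasNext()) {
      String queryId = entries.next().getKey();
      boolean stale = queryId == null || queryId.isEmpty()
          || nowNanos - Long.parseLong(queryId) > timeoutNanos;
      if (stale) {
        // Iterator.remove() is the safe way to delete during iteration.
        entries.remove();
        removed++;
      }
    }
    return removed;
  }
}
```

Passing the current time in as a parameter keeps the sketch easy to test; a caller would typically pass System.nanoTime().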

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata fix_stat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #123


commit e10868a5154ccc15196b23428db09005c3affc85
Author: foryou2030 
Date:   2016-09-02T10:22:03Z

clear query statistics map






[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/122

[CARBONDATA-202] Handled exception thrown in beeline for all dictionary 

# Why raise this PR?
The exception thrown in Beeline during data loading when the dictionary file content 
is not in the correct format is not appropriate.
# How to solve?
Use an accumulator as a flag to record bad records in the dictionary files, then 
throw an exception.
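
As a rough illustration of the idea (count bad records first, fail afterwards), here is a minimal Java sketch. It is not the PR's Scala code: a plain local counter stands in for the Spark accumulator, the class and method names are hypothetical, and the record format "<columnIndex>,<value>" with a comma separator is an assumption based on the review diff.

```java
import java.util.List;

// Hypothetical helper; a local counter replaces the Spark accumulator used in the PR.
class DictionaryRecordCheckSketch {

  /** Returns true when the line has a comma and starts with a valid column index. */
  static boolean isValidRecord(String line, int columnCount) {
    if (line == null || !line.contains(",")) {
      return false; // e.g. "" or "1": no separator at all
    }
    String[] tokens = line.split(",");
    if (tokens.length == 0) {
      return false; // e.g. ",": nothing before or after the separator
    }
    try {
      int index = Integer.parseInt(tokens[0]);
      return index >= 0 && index < columnCount;
    } catch (NumberFormatException e) {
      return false; // e.g. "x,pear": the index is not a number
    }
  }

  /** Counts the records that fail validation. */
  static int countBadRecords(List<String> lines, int columnCount) {
    int bad = 0;
    for (String line : lines) {
      if (!isValidRecord(line, columnCount)) {
        bad++;
      }
    }
    return bad;
  }

  /** Throws once, after all records have been checked, if any were bad. */
  static void failIfBad(List<String> lines, int columnCount) {
    int bad = countBadRecords(lines, columnCount);
    if (bad > 0) {
      throw new IllegalStateException(
          "Dictionary file contains " + bad + " badly formatted record(s)");
    }
  }
}
```

Checking every record before throwing means the error can report how many records were bad, rather than stopping at the first one.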


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata dictionary_ex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #122


commit 56b3a85ef8e371c302c94354d121ff267ddae329
Author: foryou2030 
Date:   2016-09-02T09:18:37Z

handled all dictionary exception

commit 6e99d68d3d616f5fd3e8a2b1a9dc07eb7edbb22d
Author: foryou2030 
Date:   2016-09-02T09:19:33Z

add testcase






[GitHub] incubator-carbondata pull request #112: Add statistics for counting the size...

2016-09-01 Thread foryou2030
Github user foryou2030 closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/112




[GitHub] incubator-carbondata pull request #91: [CARBONDATA-200] Add performance stat...

2016-09-01 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77147060
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/SingleQueryStatisticsRecorder.java
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class SingleQueryStatisticsRecorder implements Serializable {
--- End diff --

handled




[GitHub] incubator-carbondata pull request #91: [CARBONDATA-200] Add performance stat...

2016-09-01 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77147108
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/SingleQueryStatisticsRecorder.java
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class SingleQueryStatisticsRecorder implements Serializable {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(SingleQueryStatisticsRecorder.class.getName());
+  /**
+   * serialization version
+   */
+  private static final long serialVersionUID = -1L;
+
+  /**
+   * singleton QueryStatisticsRecorder for driver
+   */
+  private HashMap<String, List<QueryStatistic>> queryStatisticsMap;
+
+  private SingleQueryStatisticsRecorder() {
+queryStatisticsMap = new HashMap<String, List<QueryStatistic>>();
+  }
+
+  private static SingleQueryStatisticsRecorder 
carbonLoadStatisticsImplInstance =
+  new SingleQueryStatisticsRecorder();
+
+  public static SingleQueryStatisticsRecorder getInstance() {
+return carbonLoadStatisticsImplInstance;
+  }
+
+  /**
+   * Below method will be used to add the statistics
+   *
+   * @param statistic
+   */
+  public synchronized void recordStatisticsForDriver(QueryStatistic 
statistic, String queryId) {
+// refresh query Statistics Map
+if (queryStatisticsMap.get(queryId) != null) {
+  queryStatisticsMap.get(queryId).add(statistic);
+} else {
+  List<QueryStatistic> newQueryStatistics = new 
ArrayList<QueryStatistic>();
+  newQueryStatistics.add(statistic);
+  queryStatisticsMap.put(queryId, newQueryStatistics);
+}
+  }
+
+  /**
+   * Below method will be used to show statistic log as table
+   */
+  public void logStatisticsAsTableDriver() {
+String tableInfo = collectDriverStatistics();
+if (null != tableInfo) {
+  LOGGER.statistic(tableInfo);
+}
+  }
+
+  /**
+   * Below method will parse queryStatisticsMap and put time into table
+   */
+  public String collectDriverStatistics() {
+for (String key: queryStatisticsMap.keySet()) {
+  try {
+// TODO: get the finished query, and print Statistics
+if (queryStatisticsMap.get(key).size() > 2) {
+  String sql_parse_time = "";
+  String load_meta_time = "";
+  String block_allocation_time = "";
+  String block_identification_time = "";
+  Double driver_part_time_tmp = 0.0;
+  String splitChar = " ";
+  // get statistic time from the QueryStatistic
+  for (QueryStatistic statistic : queryStatisticsMap.get(key)) {
+switch (statistic.getMessage()) {
+  case QueryStatisticsConstants.SQL_PARSE:
+sql_parse_time += statistic.getTimeTaken() + splitChar;
+driver_part_time_tmp += statistic.getTimeTaken();
+break;
+  case QueryStatisticsConstants.LOAD_META:
+load_meta_time += statistic.getTimeTaken() + splitChar;
+driver_part_time_tmp += statistic.getTimeTaken();
+break;
+  case QueryStatisticsConstants.BLOCK_ALLOCATION:
+block_allocation_time += statistic.getTimeTaken() + 
splitChar;
+driver_part_time_tmp += statistic.getTimeTaken();
+break;
+  case QueryStatistic

[GitHub] incubator-carbondata pull request #91: [CARBONDATA-200] Add performance stat...

2016-09-01 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77147039
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/SingleQueryStatisticsRecorder.java
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class SingleQueryStatisticsRecorder implements Serializable {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(SingleQueryStatisticsRecorder.class.getName());
+  /**
+   * serialization version
+   */
+  private static final long serialVersionUID = -1L;
+
+  /**
+   * singleton QueryStatisticsRecorder for driver
+   */
+  private HashMap<String, List<QueryStatistic>> queryStatisticsMap;
--- End diff --

handled




[GitHub] incubator-carbondata pull request #91: [CARBONDATA-200] Add performance stat...

2016-09-01 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77147007
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
 ---
@@ -32,6 +32,7 @@ import 
org.apache.carbondata.core.carbon.{AbsoluteTableIdentifier, ColumnIdentif
 import org.apache.carbondata.core.carbon.metadata.datatype.DataType
--- End diff --

handled




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-31 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120916
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/result/iterator/DetailQueryResultIterator.java
 ---
@@ -45,10 +47,28 @@
 
   public DetailQueryResultIterator(List infos, 
QueryModel queryModel) {
 super(infos, queryModel);
+this.queryModel = queryModel;
+  }
+
+  private Boolean flag;
+
+  private Long total = 0L;
+
+  private QueryModel queryModel;
+
+  @Override public boolean hasNext() {
+flag = super.hasNext();
--- End diff --

In hasNext, I need to determine when the iterator has been fully traversed.
In next, I need to accumulate the total time.
However, AbstractDetailQueryResultIterator doesn't define 'next'.
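
One generic way to express this split between hasNext and next is a wrapping iterator, sketched below. This is only an assumption about how such timing could be structured, not the code in this PR; the class name and its accessors are hypothetical and it wraps any java.util.Iterator rather than the CarbonData iterator classes.

```java
import java.util.Iterator;

// Hypothetical wrapper that times a delegate iterator; not a CarbonData class.
class TimedIteratorSketch<T> implements Iterator<T> {

  private final Iterator<T> delegate;
  private long totalNanos = 0L;
  private boolean finished = false;

  TimedIteratorSketch(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override public boolean hasNext() {
    long start = System.nanoTime();
    boolean more = delegate.hasNext();
    totalNanos += System.nanoTime() - start;
    if (!more) {
      // The traversal is over; this is the point where a total could be logged.
      finished = true;
    }
    return more;
  }

  @Override public T next() {
    long start = System.nanoTime();
    try {
      return delegate.next();
    } finally {
      // Accumulate the time even if the delegate throws.
      totalNanos += System.nanoTime() - start;
    }
  }

  boolean isFinished() {
    return finished;
  }

  long getTotalNanos() {
    return totalNanos;
  }
}
```

Using primitive long and boolean fields avoids the boxing that Long and Boolean fields would incur on every call.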




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-31 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120933
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/result/iterator/DetailQueryResultIterator.java
 ---
@@ -45,10 +47,28 @@
 
   public DetailQueryResultIterator(List infos, 
QueryModel queryModel) {
 super(infos, queryModel);
+this.queryModel = queryModel;
+  }
+
+  private Boolean flag;
+
+  private Long total = 0L;
--- End diff --

handled




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-31 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120500
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatistic.java
 ---
@@ -45,8 +47,16 @@
*/
   private long startTime;
 
-  public QueryStatistic() {
+  /**
+   * number of count
+   */
+  private long count;
+
+  private String queryIdWthTask;
+
+  public QueryStatistic(String queryId) {
--- End diff --

handled




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-31 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120457
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorder.java
 ---
@@ -20,11 +20,14 @@
 
--- End diff --

Handled, but I can't get the query ID, so I print it separately.




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-31 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120377
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsCommonConstants.java
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.carbon.querystatistics;
+
+public final class QueryStatisticsCommonConstants {
--- End diff --

QueryStatisticsCommonConstants is used to define constants. I wrote it by
following CarbonCommonConstants.
I think we need to change it to an interface.
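
To make the two options concrete, here is a minimal sketch of the same constant declared in a final class and in an interface. The type names and the constant value are hypothetical and are not taken from the pull request.

```java
/** Option 1: a final holder class, in the style of a typical constants class. */
final class StatisticsConstantsClass {
  // Hypothetical constant, used only for illustration.
  static final String LOAD_DICTIONARY = "Time taken to load dictionary";

  // A private constructor prevents instantiation of the holder class.
  private StatisticsConstantsClass() {}

  public static void main(String[] args) {
    System.out.println(StatisticsConstantsClass.LOAD_DICTIONARY);
    System.out.println(StatisticsConstantsInterface.LOAD_DICTIONARY);
  }
}

/** Option 2: an interface; its fields are implicitly public, static and final. */
interface StatisticsConstantsInterface {
  String LOAD_DICTIONARY = "Time taken to load dictionary";
}
```

Both forms are read the same way by callers. The interface form needs no constructor, while the class form cannot be implemented by other types by accident.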




[GitHub] incubator-carbondata pull request #112: Add statistics for counting the size...

2016-08-31 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/112

Add statistics for counting the size of the query result

Why raise this PR?
Users want to know the total number of records sent by Carbon to Spark.
How to solve?
Add count statistics in the iterator.
How to test?
Run a query and check the executor logs; they will print a line like the
following:

`2016-08-31 15:01:27,270 | STATISTIC | [[Executor task launch 
worker-70][partitionID:automation;queryID:2421028889045981_0]] | The record 
numbers of query result for the taskid : 2421028889045981_0 Is : 13 | 
org.apache.carbondata.common.logging.impl.StandardLogService.statistic(StandardLogService.java:238)`
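
The idea of counting records inside the iterator can be sketched as follows. This is only an illustration under stated assumptions: `CountingIterator` is a hypothetical wrapper, not the CarbonData `DetailQueryResultIterator`, and `System.out.println` stands in for the project's statistics logger.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical wrapper, not the CarbonData class: counts rows handed to the caller. */
final class CountingIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private final String taskId;
  private long total = 0L;        // rows returned so far
  private boolean logged = false; // ensures the summary line is printed once

  CountingIterator(Iterator<T> delegate, String taskId) {
    this.delegate = delegate;
    this.taskId = taskId;
  }

  @Override
  public boolean hasNext() {
    boolean more = delegate.hasNext();
    if (!more && !logged) {
      logged = true;
      // Stand-in for the project's statistics logger.
      System.out.println("The record numbers of query result for the taskid : "
          + taskId + " Is : " + total);
    }
    return more;
  }

  @Override
  public T next() {
    T row = delegate.next(); // propagates NoSuchElementException when exhausted
    total++;
    return row;
  }

  long getTotal() {
    return total;
  }

  public static void main(String[] args) {
    CountingIterator<String> rows =
        new CountingIterator<>(Arrays.asList("r1", "r2", "r3").iterator(), "demo_0");
    while (rows.hasNext()) {
      rows.next();
    }
  }
}
```

Printing the total when `hasNext()` first returns false keeps the count out of the per-row path, so the cost per record is a single increment.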

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata record_num

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #112


commit 720f5e7b5d6b8fa2449d7f890d488c201c541753
Author: foryou2030 
Date:   2016-08-31T08:40:36Z

add statistics for the number of records






[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76754835
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorder.java
 ---
@@ -61,14 +67,275 @@ public QueryStatisticsRecorder(String queryId) {
*/
   public synchronized void recordStatistics(QueryStatistic statistic) {
 queryStatistics.add(statistic);
+// refresh query Statistics Map
+String key = statistic.getQueryId();
+if (!StringUtils.isEmpty(key)) {
+  // 240954528274124_0 and 240954528274124 is the same query id
+  key = key.substring(0, 15);
+}
+if (queryStatisticsMap.get(key) != null) {
--- End diff --

I think 240954528274124_0 is the query ID for each segment, right?
Each executor will print the time for every segment.
If we have 3 executors and 2 segments, the logs will look like:
executor1 log:
|++ +--+
|| Dictionary load| | 0.002  0.001 |
|++ +--+

executor 2 log:
|++ +--+
|| Dictionary load| | 0.003  0.002 |
|++ +--+

executor3 log:
|++ +--+
|| Dictionary load| | 0.001  0.003 |
|++ +--+
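
For the key normalization in the diff above, one alternative to `substring(0, 15)` is to cut at the separator. The sketch below assumes the task suffix always follows the first underscore, which this thread does not confirm; `QueryIdKeys` and `baseQueryId` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps "240954528274124_0" and "240954528274124" to one key. */
final class QueryIdKeys {
  private QueryIdKeys() {}

  /** Returns the part before the first '_' or the whole ID when there is no '_'. */
  static String baseQueryId(String queryId) {
    if (queryId == null || queryId.isEmpty()) {
      return queryId;
    }
    int separator = queryId.indexOf('_');
    return separator < 0 ? queryId : queryId.substring(0, separator);
  }

  public static void main(String[] args) {
    // Count how many statistics entries fall under each base query ID.
    Map<String, Integer> statisticsPerQuery = new LinkedHashMap<>();
    for (String id : new String[] {"240954528274124_0", "240954528274124", "2421028889045981_0"}) {
      statisticsPerQuery.merge(baseQueryId(id), 1, Integer::sum);
    }
    System.out.println(statisticsPerQuery);
  }
}
```

Cutting at the separator does not depend on the ID length, which matters if IDs are not always 15 characters long (a 16-digit ID appears in a log line elsewhere in this archive).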




[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-30 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/82#discussion_r76740663
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -650,6 +650,51 @@ object GlobalDictionaryUtil extends Logging {
   }
 
   /**
+   * get file headers from fact file
+   *
+   * @param carbonLoadModel
+   * @return headers
+   */
+  private def getHeaderFormFactFile(carbonLoadModel: CarbonLoadModel): Array[String] = {
+var headers: Array[String] = null
+var factFile: String = null
+val fileType = FileFactory.getFileType(carbonLoadModel.getFactFilePath)
+val filePath = FileFactory.getCarbonFile(carbonLoadModel.getFactFilePath, fileType)
+if (filePath.isDirectory) {
+  val listFiles = filePath.getParentFile.listFiles()
--- End diff --

OK, thanks @QiangCai @manishgupta88.
I have handled it.





[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-29 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/82#discussion_r76726188
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
 }
 return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+DataInputStream fileReader = null;
+BufferedReader bufferedReader = null;
+String readLine = null;
+
+try {
+  fileReader =
+  FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+  bufferedReader =
+  new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
--- End diff --

ok, handled




[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-28 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76552945
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -591,29 +591,31 @@ object GlobalDictionaryUtil extends Logging {
   val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
 .map(x => {
 val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-var index: Int = 0
+if (tokens.size != 2) {
+  logError("[ALL_DICTIONARY] Read a bad dictionary record: " + x)
+}
+var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
 var value: String = ""
 try {
-  index = tokens(0).toInt
+  columnName = csvFileColumns(tokens(0).toInt)
   value = tokens(1)
 } catch {
   case ex: Exception =>
-logError("read a bad dictionary record" + x)
+logError("[ALL_DICTIONARY] Reset bad dictionary record as default value")
 }
-(index, value)
+(columnName, value)
   })
+
   // group by column index, and filter required columns
   val requireColumnsList = requireColumns.toList
   allDictionaryRdd = basicRdd
 .groupByKey()
-.map(x => (csvFileColumns(x._1), x._2))
 .filter(x => requireColumnsList.contains(x._1))
 } catch {
   case ex: Exception =>
-logError("read local dictionary files failed")
+logError("[ALL_DICTIONARY] Read dictionary files failed. Caused by" + ex.getMessage)
--- End diff --

OK, it is not required; I have removed it.




[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-28 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76552918
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
 // filepath regex, look like "/path/*.dictionary"
 if (filePath.getName.startsWith("*")) {
   val dictExt = filePath.getName.substring(1)
-  val listFiles = filePath.getParentFile.listFiles()
-  if (listFiles.exists(file =>
-file.getName.endsWith(dictExt) && file.getSize > 0)) {
-true
+  if (filePath.getParentFile.exists()) {
+val listFiles = filePath.getParentFile.listFiles()
--- End diff --

handled




[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-27 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76513982
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
 // filepath regex, look like "/path/*.dictionary"
 if (filePath.getName.startsWith("*")) {
   val dictExt = filePath.getName.substring(1)
-  val listFiles = filePath.getParentFile.listFiles()
-  if (listFiles.exists(file =>
-file.getName.endsWith(dictExt) && file.getSize > 0)) {
-true
+  if (filePath.getParentFile.exists()) {
+val listFiles = filePath.getParentFile.listFiles()
+if (listFiles.exists(file =>
+  file.getName.endsWith(dictExt) && file.getSize > 0)) {
+  true
+} else {
+  logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
+"Won't generate new dictionary.")
+  false
+}
   } else {
-logInfo("No dictionary files found or empty dictionary files! " +
-  "Won't generate new dictionary.")
-false
+throw new FileNotFoundException(
+  "[ALL_DICTIONARY] The given dictionary file path not found!")
   }
 } else {
-  if (filePath.exists() && filePath.getSize > 0) {
-true
+  if (filePath.exists()) {
+if (filePath.getSize > 0) {
+  true
+} else {
+  logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
+"Won't generate new dictionary.")
+  false
+}
   } else {
-logInfo("No dictionary files found or empty dictionary files! " +
-  "Won't generate new dictionary.")
-false
+throw new FileNotFoundException(
+  "[ALL_DICTIONARY] The given dictionary file path not found!")
--- End diff --

ok, fixed




[GitHub] incubator-carbondata pull request #101: [WIP] When query scan splits is null...

2016-08-27 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/101

[WIP] When query scan splits is null, no need to return null 
CarbonSparkPartition



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata rm_partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #101


commit d9d7e44192dd03b1268feecb3db7f961577d070f
Author: foryou2030 
Date:   2016-08-27T10:05:38Z

When splits is null, no need to return null CarbonSparkPartition






[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-26 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/100

Handle all dictionary exception more properly

# Why raise this PR?

When using all dictionary, if we give a wrong dictionary file path, or the
dictionary file includes a bad record,
Carbon should deal with them properly.
# How to solve?

1. When a wrong dictionary file path is given, throw a file-not-found exception.
2. When the dictionary file includes a bad record, log an error and replace the bad
record with the default value.
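
The two rules above can be sketched in plain Java. This is only an illustration: `DictionaryRecords`, its methods, the `"default"` placeholder and the comma delimiter are assumptions, and the actual change lives in the Scala `GlobalDictionaryUtil`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the two rules; not the CarbonData code. */
final class DictionaryRecords {
  static final String DEFAULT_COLUMN_NAME = "default"; // placeholder value (assumption)

  private DictionaryRecords() {}

  /** Rule 1: a missing path is an error; an empty file only means "nothing to load". */
  static boolean hasUsableDictionary(Path path) throws IOException {
    if (!Files.exists(path)) {
      throw new FileNotFoundException(
          "[ALL_DICTIONARY] The given dictionary file path not found: " + path);
    }
    return Files.size(path) > 0;
  }

  /** Rule 2: parses "columnIndex,value"; a bad record is logged and reset to defaults. */
  static String[] parse(String line, String[] csvColumns) {
    String[] tokens = line.split(",", -1);
    try {
      if (tokens.length != 2) {
        throw new IllegalArgumentException("expected exactly 2 fields");
      }
      String columnName = csvColumns[Integer.parseInt(tokens[0].trim())];
      return new String[] {columnName, tokens[1]};
    } catch (RuntimeException ex) {
      // Covers a non-numeric index, an index out of range, and a wrong field count.
      System.err.println("[ALL_DICTIONARY] Read a bad dictionary record: " + line);
      return new String[] {DEFAULT_COLUMN_NAME, ""};
    }
  }

  public static void main(String[] args) throws IOException {
    String[] columns = {"id", "name"};
    System.out.println(String.join("=", parse("1,abc", columns)));
    System.out.println(String.join("=", parse("oops", columns)));
    try {
      hasUsableDictionary(Paths.get("no-such-dir", "missing.dictionary"));
    } catch (FileNotFoundException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```

Failing fast on a missing path separates a user mistake from an empty dictionary, while a single bad record does not abort the whole load.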

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata dict_ex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #100


commit 00408412971177cf192cebae8092fda20bcb7d58
Author: foryou2030 
Date:   2016-08-27T06:42:16Z

Handle all dictionary exception more properly






[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-24 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/91

[WIP] Add performance statistics logs to record the query time taken by 
carbon



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata query_statistics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/91.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #91


commit 729e49ca40fc3dbc76ed7a03896ec15167d9e24b
Author: Jay357089 
Date:   2016-08-24T10:54:41Z

add more query statistics






[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-22 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/82

[CARBONDATA-165] Support loading fact file with header for all dictionary

# Why raise this PR?
When the fact CSV already has a header and FILEHEADER is given along with the
ALL_DICTIONARY_PATH option, the header will be treated as a data row, which is
not correct.
The FILEHEADER option must be given only when the CSV does not have a header. We can
read the header from the fact file when FILEHEADER is not given with
ALL_DICTIONARY_PATH.

# How to solve?
Add an adapter for loading data with a header. While loading a fact CSV file with a
header, get the file header from the fact file instead of option("FILEHEADER") (the
method "getHeaderFormFactFile" does this).
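
The header resolution described above can be sketched in Java as follows. `CsvHeader.resolve` and its parameters are hypothetical names chosen for illustration, and a plain `Reader` stands in for the project's file access layer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

/** Hypothetical helper: picks column names from FILEHEADER or from the fact file. */
final class CsvHeader {
  private CsvHeader() {}

  static String[] resolve(String fileHeaderOption, Reader factFile, String delimiter)
      throws IOException {
    // An explicit FILEHEADER means the CSV itself carries no header line.
    if (fileHeaderOption != null && !fileHeaderOption.trim().isEmpty()) {
      return splitAndTrim(fileHeaderOption, delimiter);
    }
    // Otherwise the first line of the fact file is taken as the header.
    try (BufferedReader reader = new BufferedReader(factFile)) {
      String firstLine = reader.readLine();
      if (firstLine == null) {
        throw new IOException("fact file is empty, cannot read header");
      }
      return splitAndTrim(firstLine, delimiter);
    }
  }

  private static String[] splitAndTrim(String line, String delimiter) {
    // Pattern.quote keeps delimiters such as "|" from being read as regex syntax.
    String[] names = line.split(Pattern.quote(delimiter));
    for (int i = 0; i < names.length; i++) {
      names[i] = names[i].trim();
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    String csv = "id,name,city\n1,alice,paris\n";
    System.out.println(String.join("|", resolve(null, new StringReader(csv), ",")));
  }
}
```

When the header comes from the file, the caller also has to skip that first line while loading rows, which is the data-row problem the description mentions.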

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata all_dict_header

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/82.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #82


commit 79d4b0ae87a58cd3437fc6f19bba9865afcac417
Author: foryou2030 
Date:   2016-08-22T10:00:00Z

adapt data with header for all dictionary






Re: Open discussion and Vote: What kind of JIRA issue events need send mail to dev@carbondata.incubator.apache.org

2016-08-18 Thread foryou2030
option2

Sent from my iPhone

> On Aug 18, 2016, at 3:57 PM, Liang Big data wrote:
> 
> Hi all
> 
> Please discuss and vote: what kinds of JIRA issue events do you think need to
> send mail to dev@carbondata.incubator.apache.org?
> 
> Option1: None. No JIRA issue events need to send mail to
> dev@carbondata.incubator.apache.org; users can go directly to Apache JIRA
> to check the issues' content.
> 
> Option2: Issue Created and Issue Commented. These two JIRA issue events send
> mail to dev@carbondata.incubator.apache.org.
> 
> Option3: Keep the current notification scheme; the events below send mail
> to dev@carbondata.incubator.apache.org.
> 
> Or any other option?
> 
> Below is the current notification scheme:
> Events / Notifications
> Issue Created
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Updated
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Assigned
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Resolved
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Closed
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Commented
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Comment Edited
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Comment Deleted
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Reopened
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Deleted
> 
>   - All Watchers
>   - Current Assignee
>   - Reporter
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> Issue Moved
> 
>   - Single Email Address (dev@carbondata.incubator.apache.org)
> 
> 
> 
> Regards
> Liang



[GitHub] incubator-carbondata pull request #76: [CARBONDATA-158] fix load data with f...

2016-08-16 Thread foryou2030
GitHub user foryou2030 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/76

[CARBONDATA-158] fix load data with first line is null

JIRA URL: https://issues.apache.org/jira/browse/CARBONDATA-158

In this PR, handled the load data failure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foryou2030/incubator-carbondata first_blank

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #76


commit 0a97ceac69f9e5b5cf4f14d241806abd7dbada92
Author: foryou2030 
Date:   2016-08-16T12:08:19Z

fix load data with first line is null



