[GitHub] incubator-carbondata pull request #627: fix jira CARBONDATA-748 for version ...

2017-03-07 Thread simafengyun
GitHub user simafengyun opened a pull request:

https://github.com/apache/incubator-carbondata/pull/627

fix jira CARBONDATA-748 for version 0.2

fix jira CARBONDATA-748: use binary search to replace linear search.
The performance is much better now, from more than 10 seconds down to a few
milliseconds.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/simafengyun/incubator-carbondata Fix-CARBONDATA-748

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #627


commit 29466d3a9f5a9508581dfa55a12ed49a64aac14e
Author: mayun 
Date:   2017-03-07T07:57:32Z

fix jira CARBONDATA-748 for version 0.2






[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Can one of the admins verify this patch?




[GitHub] incubator-carbondata pull request #627: fix jira CARBONDATA-748 for version ...

2017-03-07 Thread Hexiaoqiao
Github user Hexiaoqiao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/627#discussion_r104611743
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/ByteUtil.java 
---
@@ -152,6 +152,159 @@ static boolean lessThanUnsigned(long x1, long x2) {
   return (x1 + Long.MIN_VALUE) < (x2 + Long.MIN_VALUE);
 }
 
+
--- End diff --

please delete this commented-out code if it is not used any more.




[GitHub] incubator-carbondata issue #625: [CARBONDATA-743] Remove redundant CarbonFil...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/625
  
Build Failed  with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1020/





[GitHub] incubator-carbondata pull request #627: fix jira CARBONDATA-748 for version ...

2017-03-07 Thread Hexiaoqiao
Github user Hexiaoqiao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/627#discussion_r104613452
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/ByteUtil.java 
---
@@ -152,6 +152,159 @@ static boolean lessThanUnsigned(long x1, long x2) {
   return (x1 + Long.MIN_VALUE) < (x2 + Long.MIN_VALUE);
 }
 
+
+  /* public static int binarySearch(byte[] a, int fromIndex, int toIndex, byte[] key) {
+       int keyLength = key.length;
+       rangeCheck(a.length, fromIndex, toIndex, keyLength);
+       return binarySearch0(a, fromIndex, toIndex / keyLength, key);
+     }
+
+     // Like public version, but without range checks.
+     private static int binarySearch0(byte[] a, int fromIndex, int toIndex, byte[] key) {
+       int low = fromIndex;
+       int high = toIndex - 1;
+       int keyLength = key.length;
+       // int high = toIndex/keyLength;
+
+       while (low <= high) {
+         int mid = (low + high) >>> 1;
+         // byte midVal = a[mid];
+
+         int result = ByteUtil.UnsafeComparer.INSTANCE.compareTo(a, mid * keyLength, keyLength,
+             key, 0, keyLength);
+
+         if (result < 0)
+           low = mid + keyLength;
+         else if (result > 0)
+           high = mid - keyLength;
+         else
+           return mid; // key found
+       }
+       return -(low + 1); // key not found.
+     } */
+  /**
+   * Checks that {@code fromIndex} and {@code toIndex} are in the range and toIndex % keyLength = 0
+   * and throws an exception if they aren't.
+   */
+  private static void rangeCheck(int arrayLength, int fromIndex, int toIndex, int keyWordLength) {
+    if (fromIndex > toIndex || toIndex % keyWordLength != 0) {
+      throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
--- End diff --

The exception message may be misleading; please add a dedicated message for the `toIndex % keyWordLength != 0` case.
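
One possible shape for the requested change, splitting the two failure conditions into distinct messages; this is only an illustration of the reviewer's point (a drop-in variant of the method quoted above), not the committed fix:

```java
// Illustrative only: report the two failure modes with distinct messages.
private static void rangeCheck(int arrayLength, int fromIndex, int toIndex, int keyWordLength) {
  if (fromIndex > toIndex) {
    throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
  }
  if (toIndex % keyWordLength != 0) {
    throw new IllegalArgumentException(
        "toIndex(" + toIndex + ") is not a multiple of keyWordLength(" + keyWordLength + ")");
  }
  if (fromIndex < 0) {
    throw new ArrayIndexOutOfBoundsException(fromIndex);
  }
  if (toIndex > arrayLength) {
    throw new ArrayIndexOutOfBoundsException(toIndex);
  }
}
```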




[GitHub] incubator-carbondata pull request #627: fix jira CARBONDATA-748 for version ...

2017-03-07 Thread Hexiaoqiao
Github user Hexiaoqiao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/627#discussion_r104614642
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/executer/IncludeFilterExecuterImpl.java
 ---
@@ -154,24 +154,45 @@ private BitSet setFilterdIndexToBitSetWithColumnIndex(
   }
 
   private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
-      int numerOfRows) {
-    BitSet bitSet = new BitSet(numerOfRows);
-    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
-      FixedLengthDimensionDataChunk fixedDimensionChunk =
-          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
-      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (ByteUtil.UnsafeComparer.INSTANCE
-              .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
-                  filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
-            bitSet.set(j);
-          }
-        }
-      }
-    }
-    return bitSet;
-  }
+      int numerOfRows) {
+
+    BitSet bitSet = new BitSet(numerOfRows);
+    //BitSet bitSet1 = new BitSet(numerOfRows);
+
+    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
+      FixedLengthDimensionDataChunk fixedDimensionChunk =
+          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
+      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
+      byte[] dataChunk = fixedDimensionChunk.getCompleteDataChunk();
+
+      //long start = System.currentTimeMillis();
+      for (int k = 0; k < filterValues.length; k++) {
+
+        int[] index = ByteUtil.UnsafeComparer.INSTANCE.binaryRangeSearch(dataChunk, 0,
+            dataChunk.length, filterValues[k]);
+        for (int i = index[0]; i <= index[1]; i++) {
+          bitSet.set(i);
+        }
+      }
+
+      //below comments code is used to confirm the binary range search is correct
--- End diff --

please delete the commented-out code if it is not used.




[GitHub] incubator-carbondata issue #625: [CARBONDATA-743] Remove redundant CarbonFil...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/625
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1021/





[GitHub] incubator-carbondata pull request #627: fix jira CARBONDATA-748 for version ...

2017-03-07 Thread Hexiaoqiao
Github user Hexiaoqiao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/627#discussion_r104617853
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/ByteUtil.java 
---
@@ -152,6 +152,159 @@ static boolean lessThanUnsigned(long x1, long x2) {
   return (x1 + Long.MIN_VALUE) < (x2 + Long.MIN_VALUE);
 }
 
+
+  /* public static int binarySearch(byte[] a, int fromIndex, int toIndex, byte[] key) {
+       int keyLength = key.length;
+       rangeCheck(a.length, fromIndex, toIndex, keyLength);
+       return binarySearch0(a, fromIndex, toIndex / keyLength, key);
+     }
+
+     // Like public version, but without range checks.
+     private static int binarySearch0(byte[] a, int fromIndex, int toIndex, byte[] key) {
+       int low = fromIndex;
+       int high = toIndex - 1;
+       int keyLength = key.length;
+       // int high = toIndex/keyLength;
+
+       while (low <= high) {
+         int mid = (low + high) >>> 1;
+         // byte midVal = a[mid];
+
+         int result = ByteUtil.UnsafeComparer.INSTANCE.compareTo(a, mid * keyLength, keyLength,
+             key, 0, keyLength);
+
+         if (result < 0)
+           low = mid + keyLength;
+         else if (result > 0)
+           high = mid - keyLength;
+         else
+           return mid; // key found
+       }
+       return -(low + 1); // key not found.
+     } */
+
+  /**
+   * Checks that {@code fromIndex} and {@code toIndex} are in range and that
+   * toIndex % keyLength == 0, and throws an exception if they aren't.
+   */
+  private static void rangeCheck(int arrayLength, int fromIndex, int toIndex, int keyWordLength) {
+    if (fromIndex > toIndex || toIndex % keyWordLength != 0) {
+      throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
+    }
+    if (fromIndex < 0) {
+      throw new ArrayIndexOutOfBoundsException(fromIndex);
+    }
+    if (toIndex > arrayLength) {
+      throw new ArrayIndexOutOfBoundsException(toIndex);
+    }
+  }
+
+  /**
+   * Searches a sorted byte array for the range boundaries of a specific key.
+   *
+   * @param dataChunk
+   * @param fromIndex
+   * @param toIndex   its maximum value should be the number of words in dataChunk,
+   *                  i.e. dataChunk.length / keyWord.length
+   * @param keyWord
+   * @return int[] containing the range's lower boundary and upper boundary
+   */
+  public static int[] binaryRangeSearch(byte[] dataChunk, int fromIndex, int toIndex, byte[] keyWord) {
+
+    int keyLength = keyWord.length;
+    rangeCheck(dataChunk.length, fromIndex, toIndex, keyLength);
+
+    // reset toIndex to the total number of words in the dataChunk
+    toIndex = toIndex / keyWord.length;
+
+    int[] rangeIndex = new int[2];
+    int low = fromIndex;
+    int high = toIndex - 1;
+
+    while (low <= high) {
+      int mid = (low + high) >>> 1;
+
+      int result = ByteUtil.UnsafeComparer.INSTANCE.compareTo(dataChunk, mid * keyLength, keyLength,
+          keyWord, 0, keyLength);
+
+      if (result < 0)
+        low = mid + 1;
+      else if (result > 0)
+        high = mid - 1;
+      else {
+        // key found, then find the range boundaries
+        rangeIndex[0] = binaryRangeBoundarySearch(dataChunk, low, mid, keyWord, false);
+        rangeIndex[1] = binaryRangeBoundarySearch(dataChunk, mid, high, keyWord, true);
+        return rangeIndex;
+      }
+    }
+    // key not found: return a non-existent range
+    rangeIndex[0] = 0;
+    rangeIndex[1] = -1;
+    return rangeIndex;
+  }
+
+  /**
+   * Used to search for a specific keyword's lower boundary or upper boundary according to upFindFlg.
+   * @param dataChunk
+   * @param fromIndex
+   * @param toIndex
+   * @param keyWord
+   * @param upFindFlg true: find upp

[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread Hexiaoqiao
Github user Hexiaoqiao commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
@simafengyun 
thanks for your work, please rename the title to follow the format 
`[CARBONDATA-<issue number>] Title of the pull request`.




[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Is this PR fixed only for branch-0.2?




[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
add to whitelist




[GitHub] incubator-carbondata pull request #622: [CARBONDATA-744] The property "spark...

2017-03-07 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/622#discussion_r104622543
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
 ---
@@ -116,8 +116,8 @@ class CarbonScanRDD(
   i += 1
   result.add(partition)
 }
-  } else if (sparkContext.getConf.contains("spark.carbon.custom.distribution") &&
-      sparkContext.getConf.getBoolean("spark.carbon.custom.distribution", false)) {
+  } else if (java.lang.Boolean
+      .parseBoolean(CarbonProperties.getInstance().getProperty("carbon.custom.distribution"))) {
--- End diff --

I think carbon.custom.block.distribution is better




[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Build Failed  with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1022/





[GitHub] incubator-carbondata issue #627: fix jira CARBONDATA-748 for version 0.2

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Build Failed  with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1023/





[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-07 Thread lionelcao
Github user lionelcao closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/625




[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread simafengyun
Github user simafengyun commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Yes, I only fixed it on branch-0.2.
If the code is OK, I can continue to fix it for version 1.0.


[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
@simafengyun Thanks for working on it.
The range binary search is impressive; we can use the same search in other places as well.
But using this search in `setFilterdIndexToBitSet` is not always correct, because the data there may not always be sorted: only the first dimension column is sorted naturally, because of the MDK order.
If the data is sorted explicitly (i.e. it has an inverted index), then it goes to another method, `setFilterdIndexToBitSetWithColumnIndex`. So in `setFilterdIndexToBitSet` we need an extra check before doing a binary search, namely whether the column is naturally sorted or not, and this information may need to come from the store.
Please use this range binary search in `setFilterdIndexToBitSetWithColumnIndex` as well.
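
To make the requested check concrete, here is an illustrative sketch in plain arrays, reusing the helpers from the range-search sketch earlier in the thread; the `naturallySorted` flag is an assumed stand-in for metadata that, as discussed in this thread, would have to come from the store.

```java
import java.util.BitSet;

// Illustrative only: use the range search when the chunk is known to be sorted,
// otherwise keep the existing linear scan.
final class FilterScanSketch {

  static BitSet filterToBitSet(byte[] chunk, byte[] filterValue, boolean naturallySorted) {
    int keyLength = filterValue.length;
    int rows = chunk.length / keyLength;
    BitSet bitSet = new BitSet(rows);
    if (naturallySorted) {
      // sorted data: locate the whole matching range with one binary search
      int[] range = BinaryRangeSearchSketch.binaryRangeSearch(chunk, filterValue);
      for (int i = range[0]; i <= range[1]; i++) {
        bitSet.set(i);
      }
    } else {
      // unsorted data: a linear scan is still required
      for (int j = 0; j < rows; j++) {
        if (BinaryRangeSearchSketch.compareWord(chunk, j, filterValue) == 0) {
          bitSet.set(j);
        }
      }
    }
    return bitSet;
  }
}
```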




[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
@simafengyun Please raise this PR on master as we don't have any plans to 
provide patches on branch-0.2. And also please format the code.




[GitHub] incubator-carbondata pull request #628: [CARBONDATA-743] Remove redundant Ca...

2017-03-07 Thread lionelcao
GitHub user lionelcao opened a pull request:

https://github.com/apache/incubator-carbondata/pull/628

[CARBONDATA-743] Remove redundant CarbonFilters file

Remove redundant CarbonFilters file in spark2 and keep the one in spark 
common

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lionelcao/incubator-carbondata carbon743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #628


commit 3ac7a8c25650b92cc7819ce10213b3cdfd8b8135
Author: lucao 
Date:   2017-03-07T09:51:37Z

[CARBONDATA-743] Remove redundant CarbonFilters file






[GitHub] incubator-carbondata pull request #629: [CARBONDATA-740] add logger for tota...

2017-03-07 Thread lionelcao
GitHub user lionelcao opened a pull request:

https://github.com/apache/incubator-carbondata/pull/629

[CARBONDATA-740] add logger for total rows processed

add logger for total rows processed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lionelcao/incubator-carbondata carbon740

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #629


commit 11f79f1b4df6826639c8dcb8637e70c773738b5a
Author: lucao 
Date:   2017-03-07T09:56:00Z

[CARBONDATA-740] add logger for total rows processed






[GitHub] incubator-carbondata pull request #623: [CARBONDATA-746] Add spark-sql CLI s...

2017-03-07 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/623#discussion_r104630150
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/cli/CarbonContext.scala
 ---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.cli
+
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.hive.HiveContext
+
+class CarbonContext(
--- End diff --

I guess it is not required to create context , just use the context 
available in  CarbonSession would be enough.




[GitHub] incubator-carbondata pull request #623: [CARBONDATA-746] Add spark-sql CLI s...

2017-03-07 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/623#discussion_r104629836
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/cli/CarbonSQLCLIDriver.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.cli
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.sql.{SparkSession, SQLContext}
+import org.apache.spark.sql.hive.thriftserver.{SparkSQLCLIDriver, SparkSQLEnv}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+
+object CarbonSQLCLIDriver {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+  var hiveContext: SQLContext = _
+  var sparkContext: SparkContext = _
+
+  def main(args: Array[String]): Unit = {
+init()
+SparkSQLEnv.sparkContext = sparkContext
+SparkSQLEnv.sqlContext = hiveContext
+SparkSQLCLIDriver.installSignalHandler()
+SparkSQLCLIDriver.main(args)
+  }
+
+  def init() {
+if (hiveContext == null) {
+  val sparkConf = new SparkConf(loadDefaults = true)
+
+  import org.apache.spark.sql.CarbonSession._
+
+  val storePath = System.getenv("CARBON_HOME") + "/bin/carbonsqlclistore"
+  val warehouse = System.getenv("CARBON_HOME") + "/warehouse"
+  val carbon = SparkSession
+  .builder()
+  .master(System.getProperty("spark.master"))
+  .appName("CarbonSQLCLIDriver")
+  .config("spark.sql.warehouse.dir", warehouse)
+  .getOrCreateCarbonSession(storePath, storePath)
+
+  hiveContext = new CarbonContext(carbon)
--- End diff --

I guess it is not required to create context here, just use 
`carbon.sqlContext` is enough.




[GitHub] incubator-carbondata issue #629: [CARBONDATA-740] add logger for total rows ...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/629
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1025/





[GitHub] incubator-carbondata pull request #629: [CARBONDATA-740] add logger for tota...

2017-03-07 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/629#discussion_r104634820
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java
 ---
@@ -157,6 +157,7 @@ protected CarbonRowBatch processRowBatch(CarbonRowBatch 
rowBatch) {
   public void close() {
 if (!closed) {
   closed = true;
+  LOGGER.info("Total rows processed in step: " + rowCounter.get());
--- End diff --

Please add the step name as well here.
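
A hypothetical shape for the requested log line, shown with plain JDK stand-ins for the class's logger and row counter; the step name is derived from the class name here, since no particular accessor on the step class is assumed.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Illustrative only: the logger and counter are JDK stand-ins, not CarbonData classes.
class StepCloseLogExample {
  private static final Logger LOGGER = Logger.getLogger(StepCloseLogExample.class.getName());
  private final AtomicLong rowCounter = new AtomicLong();
  private boolean closed = false;

  void close() {
    if (!closed) {
      closed = true;
      // include the concrete step name so log lines from different steps are distinguishable
      LOGGER.info("Total rows processed in step " + getClass().getSimpleName()
          + ": " + rowCounter.get());
    }
  }

  public static void main(String[] args) {
    new StepCloseLogExample().close();
  }
}
```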




[GitHub] incubator-carbondata issue #628: [CARBONDATA-743] Remove redundant CarbonFil...

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/628
  
@lionelcao Thank you for working on this.
But there are some differences between the `CarbonFilters` files in the spark-common package and the spark2 package. That is why I mentioned in the JIRA to follow these steps:
1. Delete the CarbonFilters scala file from the spark-common package.
2. Move the CarbonFilters scala file from the spark2 package to the spark-common package.




[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread simafengyun
Github user simafengyun commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  


You mentioned:
>> But using this search in setFilterdIndexToBitSet is not always correct, because the data may not always be sorted; only the first dimension column is sorted naturally because of the MDK order.

I don't think so. The order you mention is the logical-level order (MDK). As far as I know, a dimension column also has a physical order at the chunk level: for dictionary-encoded dimension data, the dictionary values are sorted at the blocklet level and that order is kept in the chunk on the physical disk. So once one chunk of dimension data is read it keeps that order, which is why I think it is fit for binary search.

If I am wrong, please feel free to tell me. Thanks.


[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
@simafengyun You are right: by default we sort the data, but sorting can be avoided with the `no_inverted_index` property in tblproperties. So if the user sets this configuration we cannot always do a binary search.




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-07 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104642435
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala 
---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+      .map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) % 21, (x + 5) / 43, x * 5))
+      .toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
+  }
+
+  private def loadParquetTable(spark: SparkSession, input: DataFrame): Long = timeit {
+input.write.mode(SaveMode.Overwrite).parquet(parquetTableName)
--- End diff --

ok




[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread simafengyun
Github user simafengyun commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
So binary search only fits the cases below; please confirm, thanks.

1. The first dimension column, since its logical order is the same as its physical order at the blocklet level, so it does not need an inverted index.

2. A dimension column which has an inverted index.

Question 1:
If a dimension column is not the first column and, within a blocklet, it happens to have the same logical and physical order, will CarbonData still create an inverted index for it?

Question 2:
Is there an easy way in the source code to identify whether a dimension column has the no_inverted_index property configured by the user? That would help me add a condition for using binary search.

Thanks



[GitHub] incubator-carbondata pull request #630: [CARBONDATA-730]added decimal type i...

2017-03-07 Thread anubhav100
GitHub user anubhav100 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/630

[CARBONDATA-730]added decimal type in carbon data frame writer

Below exception is thrown while trying to save dataframe with a decimal 
column type.

scala> df.printSchema
– account: integer (nullable = true)
– currency: integer (nullable = true)
– branch: integer (nullable = true)
– country: integer (nullable = true)
– date: date (nullable = true)
– fcbalance: decimal(16,3) (nullable = true)
– lcbalance: decimal(16,3) (nullable = true)

scala> df.write.format("carbondata").option("tableName", "accBal").option("compress", "true").mode(SaveMode.Overwrite).save()

java.lang.RuntimeException: unsupported type: DecimalType(16,3)
at scala.sys.package$.error(package.scala:27)
   


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anubhav100/incubator-carbondata CARBONDATA-730

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #630


commit acd67332af60ad78dd93c1897ad31af4621c237e
Author: anubhav100 
Date:   2017-03-06T12:39:55Z

added decimal type in carbon data frame writer






[GitHub] incubator-carbondata issue #630: [CARBONDATA-730]added decimal type in carbo...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/630
  
Build Failed  with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1026/





[jira] [Commented] (CARBONDATA-745) Does carbondata apply to scenes that need to sort historical and current data?

2017-03-07 Thread Liang Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899315#comment-15899315
 ] 

Liang Chen commented on CARBONDATA-745:
---

First, let me see if I understood it correctly:
there is a sorted table, and a new table will be generated from real-time data; the two tables share one column (e.g. named "key"), and you want to join on column "key" to merge the two tables into one, completing the sort operation along with the join. Is that correct?

This cannot be done during data loading. For data from two different tables, you must do the join operation manually according to your requirements.

Regards
Liang
 


> Does carbondata apply to scenes that need to sort historical and current data?
> --
>
> Key: CARBONDATA-745
> URL: https://issues.apache.org/jira/browse/CARBONDATA-745
> Project: CarbonData
>  Issue Type: Wish
>  Components: data-load
>Affects Versions: 1.0.0-incubating
>Reporter: ke xu
>Priority: Minor
>
> Does carbondata apply to scenes that need to sort historical and current data?
> Now there is a new scene that needs to sort the current data and historical
> data in real time according to certain rules, and return the sorted data when
> querying.
> It's just like HBase putting data sorted by rowkey.
> We want to sort the data when we load it, without having to spend extra time
> sorting.
> Is it suitable for this scene?





[GitHub] incubator-carbondata issue #624: [CARBONDATA-747][WIP] Add simple performanc...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/624
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1027/





[GitHub] incubator-carbondata issue #630: [CARBONDATA-730]added decimal type in carbo...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/630
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1028/





[GitHub] incubator-carbondata pull request #631: [WIP]Adding Header And Making Footer...

2017-03-07 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/631

[WIP]Adding Header And Making Footer Optional

Currently carbon does not support an appendable format, so the changes below add support for an appendable V3 data file format by making the footer optional and adding a header to the V3 carbon data file.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata AddingHeaderAndMakingFooterOptional

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #631


commit 7b5cad74f6e050001994c2b4426f54b903051518
Author: kumarvishal 
Date:   2017-03-07T12:54:13Z

Adding Header And Making Footer Optional






[GitHub] incubator-carbondata issue #631: [WIP]Adding Header And Making Footer Option...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/631
  
Build Failed  with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1029/





[jira] [Created] (CARBONDATA-751) Adding Header and making footer optional

2017-03-07 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-751:
---

 Summary: Adding Header and making footer optional
 Key: CARBONDATA-751
 URL: https://issues.apache.org/jira/browse/CARBONDATA-751
 Project: CarbonData
  Issue Type: Bug
Reporter: kumar vishal


Currently carbon does not support an appendable format, so the changes below add support for an appendable V3 data file format by making the footer optional and adding a header to the V3 carbon data file.





[GitHub] incubator-carbondata pull request #631: [CARBONDATA-751]Adding Header And Ma...

2017-03-07 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/631#discussion_r104660524
  
--- Diff: pom.xml ---
@@ -93,6 +93,7 @@
 
   
 common
+format
--- End diff --

This is not required since we are using format jar from maven repo




[GitHub] incubator-carbondata pull request #631: [CARBONDATA-751]Adding Header And Ma...

2017-03-07 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/631#discussion_r104660895
  
--- Diff: pom.xml ---
@@ -93,6 +93,7 @@
 
   
 common
+format
--- End diff --

I have changed the format module so that all the tests can run; I am adding this because otherwise the build will fail. After all the test cases pass I will remove it.



[GitHub] incubator-carbondata issue #631: [CARBONDATA-751]Adding Header And Making Fo...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/631
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1030/





[GitHub] incubator-carbondata issue #623: [CARBONDATA-746] Add spark-sql CLI support ...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/623
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1031/





[GitHub] incubator-carbondata issue #627: CARBONDATA-748

2017-03-07 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/627
  
Yes, those are the two cases where we can do a binary search on a dimension.

Regarding question 1: no, it does not create an inverted index if the data is already (naturally) sorted.

Regarding question 2: it is an issue that has been pending for a long time; it requires changes in both the data writer and the data reader to update the sort state. Attributes such as SortState and Encoders in the DataChunk of the format need to be set properly while writing and read back while reading, so that we know whether a dimension has an inverted index or is naturally sorted.

My suggestion: don't change the method `setFilterdIndexToBitSet`, as the old logic is still required for no_inverted_index and unsorted columns. Update the method `setFilterdIndexToBitSetWithColumnIndex` with your range search. To know whether a dimension is naturally sorted, we can raise another PR to update the read/write logic.
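
As an illustration of this suggestion, using plain arrays and the range-search helper sketched earlier in the thread rather than CarbonData's actual chunk classes: when a column has an inverted index its data chunk is stored sorted, so the matching range can be found by binary search and then mapped back to the original row ids. The inverted index is modeled here simply as an int[] from sorted position to original row id.

```java
import java.util.BitSet;

// Illustrative only: array-based stand-ins for the sorted chunk and its inverted index.
final class InvertedIndexRangeFilterSketch {

  static BitSet filterWithInvertedIndex(byte[] sortedChunk, int[] invertedIndex, byte[] filterValue) {
    BitSet bitSet = new BitSet(invertedIndex.length);
    // positions of the filter value inside the *sorted* data
    int[] range = BinaryRangeSearchSketch.binaryRangeSearch(sortedChunk, filterValue);
    for (int sortedPos = range[0]; sortedPos <= range[1]; sortedPos++) {
      bitSet.set(invertedIndex[sortedPos]);     // translate sorted position -> original row id
    }
    return bitSet;
  }

  public static void main(String[] args) {
    byte[] sortedChunk = {1, 2, 2, 3};          // 1-byte keys, sorted form of original rows {2, 3, 1, 2}
    int[] invertedIndex = {2, 0, 3, 1};         // sorted position -> original row id
    System.out.println(filterWithInvertedIndex(sortedChunk, invertedIndex, new byte[] {2}));  // {0, 3}
  }
}
```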




[jira] [Updated] (CARBONDATA-748) "between and" filter query is very slow

2017-03-07 Thread Jarck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarck updated CARBONDATA-748:
-





Thanks for your quick response and suggestion.

But if we don't change setFilterdIndexToBitSet, what about a filter on the first dimension column? Currently, in that case it will run the method setFilterdIndexToBitSet and the query will be very slow.

Maybe we also need to change the logic for a filter on the first dimension column so that it runs the method setFilterdIndexToBitSetWithColumnIndex. Would that be OK?

> "between and" filter query is very slow
> ---
>
> Key: CARBONDATA-748
> URL: https://issues.apache.org/jira/browse/CARBONDATA-748
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jarck
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hi,
> Currently In include and exclude filter case when dimension column does not
> have inverted index it is doing linear search , We can add binary search
> when data for that column is sorted, to get this information we can check
> in carbon table for that column whether user has selected no inverted index
> or not. If user has selected No inverted index while creating a column this
> code is fine, if user has not selected then data will be sorted so we can
> add binary search which will improve the performance.
> Please raise a Jira for this improvement
> -Regards
> Kumar Vishal
> On Fri, Mar 3, 2017 at 7:42 PM, 马云  wrote:
> Hi Dev,
> I used carbondata version 0.2 in my local machine, and found that the
> "between and" filter query is very slow.
> the root caused is by the below code in IncludeFilterExecuterImpl.java.
> It takes about 20s in my test.
> The code's  time complexity is O(n*m). I think it needs to optimized,
> please confirm. thanks
>  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
>      int numerOfRows) {
>    BitSet bitSet = new BitSet(numerOfRows);
>    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
>      FixedLengthDimensionDataChunk fixedDimensionChunk =
>          (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
>      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
>      long start = System.currentTimeMillis();
>      for (int k = 0; k < filterValues.length; k++) {
>        for (int j = 0; j < numerOfRows; j++) {
>          if (ByteUtil.UnsafeComparer.INSTANCE
>              .compareTo(fixedDimensionChunk.getCompleteDataChunk(), j * filterValues[k].length,
>                  filterValues[k].length, filterValues[k], 0, filterValues[k].length) == 0) {
>            bitSet.set(j);
>          }
>        }
>      }
>      System.out.println("loop time: " + (System.currentTimeMillis() - start));
>    }





[GitHub] incubator-carbondata pull request #631: [CARBONDATA-751]Adding Header And Ma...

2017-03-07 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/631#discussion_r104709184
  
--- Diff: format/src/main/thrift/carbondata.thrift ---
@@ -183,11 +183,23 @@ struct FileFooter{
 }
 
 /**
+* Footer for indexed carbon file
+*/
+struct FileFooterVersion3{
+1: required i64 num_rows; // Total number of rows in this file
2: required SegmentInfo segment_info;  // Segment info (will be same/repeated for all files in this segment)
3: required list blocklet_index_list;   // blocklet index of all blocklets in this file
4: optional list blocklet_info_list3;   // Information about blocklets of all columns in this file
5: optional dictionary.ColumnDictionaryChunk dictionary; // blocklet local dictionary
+}
+
+/**
  * Header for appendable carbon file
  */
 struct FileHeader{
1: required i32 version; // version used for data compatibility
2: required list table_columns;  // Description of columns in this file
   3: required bool is_Footer_Present; //  to check whether footer is present or not
--- End diff --

Making the field `required` will lose backward compatibility. Better to make it `optional`.




[jira] [Commented] (CARBONDATA-745) Does carbondata apply to scenes that need to sort historical and current data?

2017-03-07 Thread ke xu (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900540#comment-15900540
 ] 

ke xu commented on CARBONDATA-745:
--

What we need is to sort a field of the table and return the sorted data when querying. I confirmed that carbon can achieve this effect: we should put the field in the first column and enable the index, so that the data will be sorted when loaded.
Thanks for the reply.



[jira] [Resolved] (CARBONDATA-745) Does carbondata apply to scenes that need to sort historical and current data?

2017-03-07 Thread ke xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ke xu resolved CARBONDATA-745.
--
   Resolution: Resolved
Fix Version/s: 1.0.0-incubating



[jira] [Closed] (CARBONDATA-745) Does carbondata apply to scenes that need to sort historical and current data?

2017-03-07 Thread ke xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ke xu closed CARBONDATA-745.




[GitHub] incubator-carbondata issue #629: [CARBONDATA-740] add logger for total rows ...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/629
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1032/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata issue #628: [CARBONDATA-743] Remove redundant CarbonFil...

2017-03-07 Thread lionelcao
Github user lionelcao commented on the issue:

https://github.com/apache/incubator-carbondata/pull/628
  
Hi @ravipesala , I compared those two files before committing and I think the 
difference has no impact on the functionality. Any ideas?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata issue #629: [CARBONDATA-740] add logger for total rows ...

2017-03-07 Thread lionelcao
Github user lionelcao commented on the issue:

https://github.com/apache/incubator-carbondata/pull/629
  
@ravipesala  Added step name, please check.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (CARBONDATA-740) Add logger for rows processed while closing in AbstractDataLoadProcessorStep

2017-03-07 Thread Cao, Lionel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao, Lionel reassigned CARBONDATA-740:
--

Assignee: Cao, Lionel  (was: Liang Chen)

> Add logger for rows processed while closing in AbstractDataLoadProcessorStep
> 
>
> Key: CARBONDATA-740
> URL: https://issues.apache.org/jira/browse/CARBONDATA-740
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ravindra Pesala
>Assignee: Cao, Lionel
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add a logger for the rows processed when closing in AbstractDataLoadProcessorStep.
> It is good to print the total number of records processed when the step is closed, so 
> please log the processed row count in AbstractDataLoadProcessorStep
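
As a rough illustration of the kind of logging being asked for (a sketch, not the 
actual patch; it uses java.util.logging for brevity, where the real change would 
presumably go through the project's own logging utility):

    // Hypothetical sketch: count rows as the step emits them and print the total,
    // together with the step name, when the step is closed.
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.logging.Logger;

    public abstract class LoggingStepSketch {
      private static final Logger LOGGER =
          Logger.getLogger(LoggingStepSketch.class.getName());

      // Incremented by the concrete step every time a row is passed downstream.
      protected final AtomicLong rowCounter = new AtomicLong();

      // Concrete steps report their name, e.g. "Input Processor" or "Sort Processor".
      protected abstract String getStepName();

      public void close() {
        LOGGER.info("Total rows processed in step " + getStepName()
            + ": " + rowCounter.get());
      }
    }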



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CARBONDATA-732) User unable to execute the select/Load query using thrift server.

2017-03-07 Thread anubhav tarar (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anubhav tarar reassigned CARBONDATA-732:


Assignee: anubhav tarar

> User unable to execute the select/Load query using thrift server. 
> --
>
> Key: CARBONDATA-732
> URL: https://issues.apache.org/jira/browse/CARBONDATA-732
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.0.0-incubating
> Environment: Spark 2.1
>Reporter: Vinod Rohilla
>Assignee: anubhav tarar
> Attachments: LOG_FIle
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The result is not displayed to the user when a Select/Load query is run.
> Steps to reproduce:
> 1: Run the query:
> 0: jdbc:hive2://localhost:1> select * from t4;
> Note: The cursor keeps blinking on beeline.
> 2: Logs on Thrift server:
> Error sending result 
> StreamResponse{streamId=/jars/carbondata_2.11-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar,
>  byteCount=19350001, 
> body=FileSegmentManagedBuffer{file=/opt/spark-2.1.0/carbonlib/carbondata_2.11-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar,
>  offset=0, length=19350001}} to /192.168.2.179:48291; closing connection
> java.lang.AbstractMethodError
>   at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73)
>   at 
> io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:811)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:724)
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:739)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:731)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:817)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:724)
>   at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:739)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:815)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:795)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:832)
>   at 
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032)
>   at 
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:296)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChan

[GitHub] incubator-carbondata pull request #632: [CARBONDATA-732] Exclude netty depen...

2017-03-07 Thread PKOfficial
GitHub user PKOfficial opened a pull request:

https://github.com/apache/incubator-carbondata/pull/632

[CARBONDATA-732] Exclude netty dependencies in assembly module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PKOfficial/incubator-carbondata exclude-netty

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #632


commit f551b091b7fbf2775e0ddced4d0a52d663d940f8
Author: Prabhat Kashyap 
Date:   2017-03-08T04:59:26Z

exclude netty dependencies in assembly module
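
As a side note, when chasing an AbstractMethodError like the one reported in 
CARBONDATA-732, it can help to confirm at runtime which jar a netty class is 
actually loaded from. A small hypothetical sketch (the class name is taken from the 
stack trace above; everything else is illustrative):

    // Prints the jar that provides io.netty.util.ReferenceCountUtil on the current
    // classpath, which helps spot a conflicting netty version.
    public class NettyOriginCheck {
      public static void main(String[] args) throws Exception {
        Class<?> clazz = Class.forName("io.netty.util.ReferenceCountUtil");
        // getCodeSource() can be null for bootstrap classes, but netty ships in a jar.
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
      }
    }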




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata issue #632: [CARBONDATA-732] Exclude netty dependencies...

2017-03-07 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/632
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---