[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

2017-03-10 Thread mayunSaicmotor
Github user mayunSaicmotor commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105522667
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

@ravipesala, would it be better to move the if clause outside the for loop?

`  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
      int numerOfRows) {
    BitSet bitSet = new BitSet(numerOfRows);
    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();

      if (filterValues.length > 1) {
        for (int i = 0; i < numerOfRows; i++) {
          int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
              dimensionColumnDataChunk.getChunkData(i));

          if (index >= 0) {
            bitSet.set(i);
          }
        }
      } else if (filterValues.length == 1) {
        for (int i = 0; i < numerOfRows; i++) {
          if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
            bitSet.set(i);
          }
        }
      }
    }
    return bitSet;
  }`
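
For readers without the CarbonData sources at hand, the idea in the snippet above can be tried in isolation. The following is only a minimal sketch, not the project's code: `IncludeFilterSketch`, `compareBytes`, `binarySearch` and `includeFilter` are hypothetical names, rows are modelled as a plain `byte[][]`, and the filter keys are assumed to be sorted in unsigned lexicographic order.

```java
import java.util.BitSet;

public class IncludeFilterSketch {

  // Hypothetical helper: unsigned lexicographic comparison of two byte arrays.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Binary search over sorted keys; low and high are inclusive bounds.
  // Returns the matching index, or -(insertionPoint + 1) when absent.
  static int binarySearch(byte[][] sortedKeys, int low, int high, byte[] target) {
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareBytes(sortedKeys[mid], target);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  // The branch on the number of keys is taken once, outside the row loop.
  static BitSet includeFilter(byte[][] rows, byte[][] sortedKeys) {
    BitSet bitSet = new BitSet(rows.length);
    if (sortedKeys.length > 1) {
      for (int i = 0; i < rows.length; i++) {
        if (binarySearch(sortedKeys, 0, sortedKeys.length - 1, rows[i]) >= 0) {
          bitSet.set(i);
        }
      }
    } else if (sortedKeys.length == 1) {
      // A single key needs only a direct comparison per row.
      for (int i = 0; i < rows.length; i++) {
        if (compareBytes(rows[i], sortedKeys[0]) == 0) {
          bitSet.set(i);
        }
      }
    }
    return bitSet;
  }

  public static void main(String[] args) {
    byte[][] rows = {{1}, {5}, {3}, {5}, {9}};
    byte[][] keys = {{3}, {5}};
    System.out.println(includeFilter(rows, keys)); // prints {1, 2, 3}
  }
}
```

With the branch hoisted, each row costs one binary search over the keys (or one comparison when there is a single key), instead of one comparison per key as in the removed nested loops.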




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

2017-03-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/638




[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...

2017-03-10 Thread mayunSaicmotor
Github user mayunSaicmotor commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105429564
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

Comments were added. Is there anything else that needs to change?




[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105424505
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

Looks fine.




[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread mayunSaicmotor
Github user mayunSaicmotor commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105422550
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

Is the version below OK?

  private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
      int numerOfRows) {
    BitSet bitSet = new BitSet(numerOfRows);
    if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
      byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
      for (int i = 0; i < numerOfRows; i++) {

        if (filterValues.length > 1) {
          int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
              dimensionColumnDataChunk.getChunkData(i));

          if (index >= 0) {
            bitSet.set(i);
          }
        } else if (filterValues.length == 1) {
          if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
            bitSet.set(i);
          }
        } else {
          break;
        }

      }
    }
    return bitSet;
  }






[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread mayunSaicmotor
Github user mayunSaicmotor commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105416528
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

You are right, I did use binary search even for getting the ranges
previously, but yesterday I ran a performance test and found that it performs
no better than the current logic. A binary-search range lookup has an advantage
only when the data array is very long and contains many repeated values. But
the data array size for a chunk is usually 12000, which is not very long. So
the binary-search range lookup has no advantage, and I have decided to keep the
current logic.
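
To make the two strategies concrete, here is a rough sketch; it is not the code from this PR. It uses a sorted `int[]` as a stand-in for the chunk data, and `RangeSearchSketch`, `rangeByLinearExpansion`, `rangeByBinarySearch`, `lowerBound` and `upperBound` are hypothetical names. The first variant finds one match by binary search and then walks outwards to the ends of the run; the second finds both ends with two more binary searches.

```java
import java.util.Arrays;

public class RangeSearchSketch {

  // Variant 1: one binary search for any match, then a linear walk to both
  // ends of the run of equal values. Cost grows with the run length.
  static int[] rangeByLinearExpansion(int[] sorted, int target) {
    int hit = Arrays.binarySearch(sorted, target);
    if (hit < 0) {
      return new int[0];
    }
    int first = hit;
    while (first > 0 && sorted[first - 1] == target) {
      first--;
    }
    int last = hit;
    while (last < sorted.length - 1 && sorted[last + 1] == target) {
      last++;
    }
    return new int[] {first, last};
  }

  // First index whose value is >= target (sorted.length if none).
  static int lowerBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // First index whose value is > target (sorted.length if none).
  static int upperBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] <= target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Variant 2: both ends of the run located by binary search.
  static int[] rangeByBinarySearch(int[] sorted, int target) {
    int first = lowerBound(sorted, target);
    if (first == sorted.length || sorted[first] != target) {
      return new int[0];
    }
    return new int[] {first, upperBound(sorted, target) - 1};
  }

  public static void main(String[] args) {
    int[] data = {1, 2, 2, 2, 3, 5, 5};
    System.out.println(Arrays.toString(rangeByLinearExpansion(data, 2))); // [1, 3]
    System.out.println(Arrays.toString(rangeByBinarySearch(data, 2)));    // [1, 3]
  }
}
```

Both variants return the same inclusive `[first, last]` range. The second avoids the linear walk, which only pays off when runs of equal values are long, in line with the observation above.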






[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105406369
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

There is not much difference between `getFirstIndexUsingBinarySearch` and
this method. I remember that in your last PR you used binary search even for
getting the ranges. What happened to it? Did you run into any functional or
performance issues?




[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105405605
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

If `filterValues` has only one element, we should avoid this binary search;
a direct comparison would be enough.




[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...

2017-03-10 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/638#discussion_r105404228
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

Please add comments to this method.

