[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105522667

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

@ravipesala, would it be better to move the if clause outside the for loop?

    private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
        int numerOfRows) {
      BitSet bitSet = new BitSet(numerOfRows);
      if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
        byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
        if (filterValues.length > 1) {
          // Several filter keys: binary-search the keys once per row.
          for (int i = 0; i < numerOfRows; i++) {
            int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                dimensionColumnDataChunk.getChunkData(i));
            if (index >= 0) {
              bitSet.set(i);
            }
          }
        } else if (filterValues.length == 1) {
          // Single filter key: a direct comparison per row is enough.
          for (int i = 0; i < numerOfRows; i++) {
            if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
              bitSet.set(i);
            }
          }
        }
      }
      return bitSet;
    }

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
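For readers following the thread, here is a minimal, self-contained sketch of the shape proposed above: the branch on the number of filter keys is taken once, outside the row loop. It is an illustration under stated assumptions, not the CarbonData implementation. Rows are modelled as a plain `byte[][]` instead of a `DimensionColumnDataChunk`, the filter keys are assumed to be sorted in unsigned byte order, and `IncludeFilterSketch`, `compareUnsigned`, `binarySearch` and `filterRows` are hypothetical names standing in for the real `CarbonUtil` helpers.

```java
import java.util.BitSet;

// Hypothetical stand-in for the include-filter logic discussed in the thread.
class IncludeFilterSketch {

  // Lexicographic comparison treating each byte as unsigned (assumed key ordering).
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  // Binary search over sorted keys with inclusive bounds.
  // Returns the matching index, or -(insertionPoint + 1) if the target is absent.
  static int binarySearch(byte[][] sortedKeys, int low, int high, byte[] target) {
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareUnsigned(sortedKeys[mid], target);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  // Sets bit i when rows[i] equals one of the filter keys.
  static BitSet filterRows(byte[][] rows, byte[][] sortedFilterKeys) {
    BitSet bitSet = new BitSet(rows.length);
    if (sortedFilterKeys.length > 1) {
      // Several keys: one binary search over the keys per row.
      for (int i = 0; i < rows.length; i++) {
        if (binarySearch(sortedFilterKeys, 0, sortedFilterKeys.length - 1, rows[i]) >= 0) {
          bitSet.set(i);
        }
      }
    } else if (sortedFilterKeys.length == 1) {
      // One key: a direct comparison per row, no search needed.
      byte[] only = sortedFilterKeys[0];
      for (int i = 0; i < rows.length; i++) {
        if (compareUnsigned(rows[i], only) == 0) {
          bitSet.set(i);
        }
      }
    }
    return bitSet;
  }
}
```

With n rows and k filter keys, the nested loops removed by the diff perform about n * k comparisons, while this shape performs about n * log2(k) for k > 1 and exactly n for k = 1.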
[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/638
[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105429564

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

Comments have been added. Is there anything else that needs to change?
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105424505

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

Looks fine.
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105422550

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

Is the code below OK?

    private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
        int numerOfRows) {
      BitSet bitSet = new BitSet(numerOfRows);
      if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
        byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
        for (int i = 0; i < numerOfRows; i++) {
          if (filterValues.length > 1) {
            // Several filter keys: binary-search the keys for this row's value.
            int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
                dimensionColumnDataChunk.getChunkData(i));
            if (index >= 0) {
              bitSet.set(i);
            }
          } else if (filterValues.length == 1) {
            // Single filter key: compare directly.
            if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
              bitSet.set(i);
            }
          } else {
            // No filter keys: nothing can match.
            break;
          }
        }
      }
      return bitSet;
    }
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user mayunSaicmotor commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105416528

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

You are right, I did previously use binary search even for getting the ranges. But yesterday I ran a performance test and found that it performs no better than the current logic. Binary search for the range only has an advantage when the data array is very long and contains many repeated values. The data array size for a chunk is usually 12000, which is not very long, so binary search for the range has no advantage and I decided to keep the current logic.
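To make the trade-off in this comment concrete, here is a rough sketch of the two ways to find the index range of a repeated value in sorted data. The body of `getRangeIndexUsingBinarySearch` is not shown in the thread, so this is only an assumption about the two approaches being compared: variant A finds one match by binary search and then walks outward linearly, variant B uses two binary searches for the lower and upper bounds. A plain sorted `int[]` stands in for the chunk data, and all names (`RangeSearchSketch`, `rangeByLinearExpansion`, `rangeByTwoBinarySearches`, `lowerBound`, `upperBound`) are hypothetical.

```java
// Hypothetical comparison of two range-lookup strategies on sorted data.
class RangeSearchSketch {

  // Variant A: binary search for any match, then expand linearly in both directions.
  // Cost is O(log n + r), where r is the length of the run of equal values.
  static int[] rangeByLinearExpansion(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < target) {
        low = mid + 1;
      } else if (sorted[mid] > target) {
        high = mid - 1;
      } else {
        int start = mid;
        int end = mid;
        while (start > 0 && sorted[start - 1] == target) {
          start--;
        }
        while (end < sorted.length - 1 && sorted[end + 1] == target) {
          end++;
        }
        return new int[] {start, end};
      }
    }
    return new int[0]; // target not present
  }

  // First index whose value is >= target (sorted.length if none).
  static int lowerBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // First index whose value is > target (sorted.length if none).
  static int upperBound(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] <= target) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // Variant B: two binary searches, O(log n) regardless of the run length.
  static int[] rangeByTwoBinarySearches(int[] sorted, int target) {
    int start = lowerBound(sorted, target);
    if (start == sorted.length || sorted[start] != target) {
      return new int[0]; // target not present
    }
    return new int[] {start, upperBound(sorted, target) - 1};
  }
}
```

Variant B only pulls ahead when the run length r is large compared with log2(n), which matches the observation above that it helps only for very long arrays with many repeated values. Any actual speed difference would need to be measured on real chunk data.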
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105406369

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

There is not much difference between `getFirstIndexUsingBinarySearch` and this method. I remember that in your last PR you used binary search even for getting the ranges. What happened to it? Did you run into any functional or performance issues?
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105405605

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
     BitSet bitSet = new BitSet(numerOfRows);
     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
-      for (int k = 0; k < filterValues.length; k++) {
-        for (int j = 0; j < numerOfRows; j++) {
-          if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
-            bitSet.set(j);
-          }
+      for (int i = 0; i < numerOfRows; i++) {
+
+        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --

If `filterValues` contains only one value, we should avoid the binary search; a direct comparison is enough.
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/638#discussion_r105404228

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
     return -(low + 1);
   }
 
+  public static int[] getRangeIndexUsingBinarySearch(
--- End diff --

Please add comments to this method.