[ https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shenxingwuying updated KUDU-3455:
---------------------------------
    Description:
Improve the space complexity of pruning hash partitions for in-list predicates

When the Java client prunes hash partitions for in-list predicates, the current logic has high space complexity and may cause the client to run out of memory. It materializes every combination of predicate values as a separate PartialRow before hashing any of them, so the number of live rows grows with the product of the in-list sizes.

{code:java}
// java
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each row built so far, create one copy per value of the
  // equality or in-list predicate on this column.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}
{code}

This patch fixes the problem with a recursive algorithm that uses a depth-first search to enumerate all combinations and releases PartialRow objects as soon as possible.

  was:
[java] Improve the space complexity of pruning hash partitions for in-list predicates

When the Java client prunes hash partitions for in-list predicates, the current logic has high space complexity and may cause the client to run out of memory.

This patch fixes the problem with a recursive algorithm that uses a depth-first search to enumerate all combinations and releases PartialRow objects as soon as possible.
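For illustration only, the depth-first idea mentioned in the description could look roughly like the sketch below. It is not the code from the patch. The class name HashBucketPruningSketch, the method pruneBuckets, and the placeholder bucketOf hash are all hypothetical, and plain byte arrays stand in for PartialRow so the example runs without the Kudu client on the classpath. The point it tries to show is that a single reusable buffer, filled one column at a time, replaces the list of all combinations.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the client's pruning logic; not Kudu's actual API. */
final class HashBucketPruningSketch {

  /**
   * Returns a bit set with one bit per hash bucket that some combination of
   * predicate values maps to. valuesPerColumn.get(i) holds the encoded
   * candidate values for hash column i: one value for an equality predicate,
   * several for an in-list predicate.
   */
  static BitSet pruneBuckets(List<List<byte[]>> valuesPerColumn, int numBuckets) {
    BitSet hashBuckets = new BitSet(numBuckets);
    // One buffer shared by the whole search, instead of one row per combination.
    byte[][] current = new byte[valuesPerColumn.size()][];
    visit(valuesPerColumn, 0, current, numBuckets, hashBuckets);
    return hashBuckets;
  }

  /** Depth-first walk: pick a value for column `depth`, then recurse. */
  private static void visit(List<List<byte[]>> valuesPerColumn, int depth,
                            byte[][] current, int numBuckets, BitSet hashBuckets) {
    if (depth == valuesPerColumn.size()) {
      // Every column has a value: hash this combination and record its bucket.
      hashBuckets.set(bucketOf(current, numBuckets));
      return;
    }
    for (byte[] value : valuesPerColumn.get(depth)) {
      current[depth] = value; // overwrite in place; nothing is copied or retained
      visit(valuesPerColumn, depth + 1, current, numBuckets, hashBuckets);
    }
  }

  /** Placeholder hash (an assumption); the snippet above uses KeyEncoder.getHashBucket. */
  private static int bucketOf(byte[][] row, int numBuckets) {
    return Math.floorMod(Arrays.deepHashCode(row), numBuckets);
  }
}
```

Under these assumptions the extra memory is proportional to the number of hash columns (the recursion depth plus the shared buffer) rather than to the product of the in-list sizes, while the number of hash computations is unchanged.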
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major