keith-turner commented on code in PR #6040:
URL: https://github.com/apache/accumulo/pull/6040#discussion_r2699546850
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java:
##########
@@ -243,6 +245,26 @@ private SortedKeyValueIterator<Key,Value> createIterator()
} else {
// Scan time iterator options were set, so need to merge those with
pre-parsed table
// iterator options.
+
+ // First ensure the set iterators do not conflict with the existing
table iterators.
+ for (var scanParamIterInfo : scanParams.getSsiList()) {
Review Comment:
The changes make the code much easier to understand. Seems like it checking
for duplicate priorities and duplicate names.
Was experimenting w/ the following to change this new check from `O(M*N)` to
`O(M+N)` by having some quick checks to see if the more expensive checks are
needed. Noticed that the merge code sorts the iterators, which we could
leverage to efficiently check for duplicate priorities. Still need to check
for duplicate names though. Seems like the code is checking for names in the
scan iterator that conflict with names in the table iterators. We could do
that more efficiently by having ParsedIteratorConfig contain a set of unique
iterator names, then this set is computed only on table config changes making
it cheap for the scan code to use. We can use this set of unique names to
quickly check if we need to call the iterator check code.
```java
IteratorConfigUtil.mergeIteratorConfig(iterInfos, iterOpts,
pic.getIterInfo(),
pic.getOpts(), scanParams.getSsiList(), scanParams.getSsio());
if (scanParams.getSsiList().isEmpty() &&
scanParams.getSsio().isEmpty()) {
// The check iterator for conflict code is O(M*N) where M is the
number of table iterators and N is the number of scan iterators. Before
calling the iterator check code do O(M+N) checks to see if the more expensive
check is even needed (normally its not).
// check for duplicate priority or names, assuming iterInfo is
sorted on priority. Leveraging the fact that its sorted to efficiently check
for duplicate priorities.
for(int i = 1; i<iterInfos.size(); i++)
if(iterInfos.get(i-1).getPriority() ==
iterInfos.get(i).getPriority()) {
// TODO call check iterator code which does more expensive
checks
}
}
for(var iterInfo : scanParams.getSsiList()) {
if(pic.getUniqueNames().contains(iterInfo.getIterName())) {
// TODO call check iterator code which does more expensive
checks
}
}
}
```
Looking into this made we wonder if the should change the constant
`IteratorConfigUtil.ITER_INFO_COMPARATOR` to the following.
```java
public static final Comparator<IterInfo> ITER_INFO_COMPARATOR =
Comparator.comparingInt(IterInfo::getPriority).thenComparing(IterInfo::getIterName);
```
This would make iterators w/ the same priority have more deterministic
behavior if we are only going to warn about that in 2.1.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]