keith-turner commented on code in PR #6040:
URL: https://github.com/apache/accumulo/pull/6040#discussion_r2699546850


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java:
##########
@@ -243,6 +245,26 @@ private SortedKeyValueIterator<Key,Value> createIterator()
       } else {
         // Scan time iterator options were set, so need to merge those with 
pre-parsed table
         // iterator options.
+
+        // First ensure the set iterators do not conflict with the existing 
table iterators.
+        for (var scanParamIterInfo : scanParams.getSsiList()) {

Review Comment:
   The changes make the code much easier to understand.  Seems like it checking 
for duplicate priorities and duplicate names.
   
   Was experimenting w/ the following to change this new check from `O(M*N)` to 
`O(M+N)` by having some quick checks to see if the more expensive checks are 
needed.  Noticed that the merge code sorts the iterators, which we could 
leverage to efficiently check for duplicate priorities.  Still need to check 
for duplicate names though.  Seems like the code is checking for names in the 
scan iterator that conflict with names in the table iterators.  We could do 
that more efficiently by having ParsedIteratorConfig contain a set of unique 
iterator names, then this set is computed only on table config changes making 
it cheap for the scan code to use.  We can use this set of unique names to 
quickly check if we need to call the iterator check code.
   
   ```java
           IteratorConfigUtil.mergeIteratorConfig(iterInfos, iterOpts, 
pic.getIterInfo(),
               pic.getOpts(), scanParams.getSsiList(), scanParams.getSsio());
          
           if (scanParams.getSsiList().isEmpty() && 
scanParams.getSsio().isEmpty()) {
             // The check iterator for conflict code is O(M*N) where M is the 
number of table iterators and N is the number of scan iterators.  Before 
calling the iterator check code do O(M+N) checks to see if the more expensive 
check is even needed (normally its not).
   
             // check for duplicate priority or names, assuming iterInfo is 
sorted on priority.  Leveraging the fact that its sorted to efficiently check 
for duplicate priorities.
             for(int i = 1; i<iterInfos.size(); i++)
               if(iterInfos.get(i-1).getPriority() == 
iterInfos.get(i).getPriority()) {
                 // TODO call check iterator code which does more expensive 
checks
               }
             }
   
             for(var iterInfo : scanParams.getSsiList()) {
               if(pic.getUniqueNames().contains(iterInfo.getIterName())) {
                 // TODO call check iterator code which does more expensive 
checks
               }
             }
           }
   ```
   
   Looking into this made we wonder if the should change the constant 
`IteratorConfigUtil.ITER_INFO_COMPARATOR` to the following.
   
   ```java
     public static final Comparator<IterInfo> ITER_INFO_COMPARATOR =
         
Comparator.comparingInt(IterInfo::getPriority).thenComparing(IterInfo::getIterName);
   ```
   
   This would make iterators w/ the same priority have more deterministic 
behavior if we are only going to warn about that in 2.1.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to