pabloem commented on pull request #15848:
URL: https://github.com/apache/beam/pull/15848#issuecomment-1029488107


   Regarding your comment in 
https://github.com/apache/beam/pull/15848#discussion_r798790351:
   
   Yeah, SDF is interesting here, and I am not 100% sure how to approach it. 
But yes, we need ordering, so the implementation would be something like:
   
   ```
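   resultSet = query.execute();  // rows must come back ordered by key
   while (resultSet.next()) {
     if (!tryClaim(resultSet.get(key))) {
       return DONE;  // key is outside the range we still own
     }
     c.output(format(resultSet));
   }
   return DONE;  // result set exhausted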
   ```
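
To make the claim loop above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Beam's classes: `KeyRangeTracker` and `OrderedReadSketch` are hypothetical names, keys are assumed to be `long` values, and a sorted list stands in for the ordered result set.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a restriction tracker over the key range [start, end). */
class KeyRangeTracker {
  private final long end;    // exclusive upper bound of the owned range
  private long lastClaimed;  // last key that was successfully claimed

  KeyRangeTracker(long start, long end) {
    this.end = end;
    this.lastClaimed = start - 1;
  }

  /** Claims a key. Keys must arrive in strictly increasing order. */
  boolean tryClaim(long key) {
    if (key <= lastClaimed) {
      throw new IllegalArgumentException("keys must be claimed in increasing order");
    }
    if (key >= end) {
      return false;  // outside the owned range: the caller must stop
    }
    lastClaimed = key;
    return true;
  }
}

class OrderedReadSketch {
  /** Emits one output per claimed key; sortedKeys plays the role of the ordered result set. */
  static List<String> process(List<Long> sortedKeys, KeyRangeTracker tracker) {
    List<String> out = new ArrayList<>();
    for (long key : sortedKeys) {
      if (!tracker.tryClaim(key)) {
        break;  // corresponds to "return DONE" in the pseudocode
      }
      out.add("row-" + key);
    }
    return out;
  }

  public static void main(String[] args) {
    // Key 12 lies outside [0, 10), so processing stops after key 7.
    System.out.println(process(List.of(1L, 3L, 7L, 12L), new KeyRangeTracker(0, 10)));
  }
}
```

The ordering requirement shows up in `tryClaim`: once a key is rejected, every later key would be rejected too, so the loop can stop at the first failure.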
   
   and the split would be something like:
   ```
   tryClaim(key) {
     if (key > endOfRange) {
       return false;
     }
     this.latestKey = key;
     return true;
   }
   trySplit() {
     // Generate two ranges between the current key and the last key
     ranges = generateRanges(2, latestKey, endOfRange);
     return new SplitResult(ranges.currentRange, ranges.nextRange);
   }
   ```
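
As a rough illustration of the split logic above, the following plain-Java sketch halves the unclaimed part of a key range. It is not Beam's tracker API: `SplittableKeyRange` is a hypothetical class, keys are assumed to be `long` values, the range is treated as half-open `[start, end)`, and the midpoint split is just one possible choice for `generateRanges`.

```java
/** Hypothetical key range [start, end) that tracks the last claimed key and can be split. */
class SplittableKeyRange {
  final long start;
  long end;        // exclusive; shrinks when the range is split
  long latestKey;  // last key that was successfully claimed

  SplittableKeyRange(long start, long end) {
    this.start = start;
    this.end = end;
    this.latestKey = start - 1;
  }

  /** Records the key if it is still inside the owned range. */
  boolean tryClaim(long key) {
    if (key >= end) {
      return false;
    }
    latestKey = key;
    return true;
  }

  /**
   * Splits the unclaimed remainder in two. This object keeps the first half
   * (the primary); the second half is returned as the residual.
   * Returns null when no unclaimed keys remain.
   */
  SplittableKeyRange trySplit() {
    long firstUnclaimed = latestKey + 1;
    if (firstUnclaimed >= end) {
      return null;
    }
    long mid = firstUnclaimed + (end - firstUnclaimed) / 2;
    SplittableKeyRange residual = new SplittableKeyRange(mid, end);
    end = mid;  // later claims at or beyond mid now fail on the primary
    return residual;
  }

  public static void main(String[] args) {
    SplittableKeyRange primary = new SplittableKeyRange(0, 100);
    primary.tryClaim(10);
    SplittableKeyRange residual = primary.trySplit();
    System.out.println("primary ends at " + primary.end
        + ", residual starts at " + residual.start);
  }
}
```

After the split, the primary's `tryClaim` starts rejecting keys at or beyond the split point, which is what ends the processing loop early.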
   
   The issue is that the query over the full range would already be executing 
in the database, so after a split we may duplicate work there. I'm not sure 
how bad that is : )


