zabetak commented on code in PR #202:
URL: https://github.com/apache/tez/pull/202#discussion_r1241990886
##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -260,23 +260,20 @@ public List<GroupedSplitContainer> getGroupedSplits(Configuration conf,
desiredNumSplits = newDesiredNumSplits;
} else if (lengthPerGroup < minLengthPerGroup) {
// splits too small to work. Need to override with size.
- int newDesiredNumSplits = (int)(totalLength/minLengthPerGroup) + 1;
/**
* This is a workaround for systems like S3 that pass the same
* fake hostname for all splits.
*/
if (!allSplitsHaveLocalhost) {
+ int newDesiredNumSplits = (int)(totalLength/minLengthPerGroup) + 1;
+ LOG.info("Desired splits: " + desiredNumSplits + " too large. " +
+ " Desired splitLength: " + lengthPerGroup +
+ " Min splitLength: " + minLengthPerGroup +
+ " New desired splits: " + newDesiredNumSplits +
+ " Total length: " + totalLength +
+ " Original splits: " + originalSplits.size());
desiredNumSplits = newDesiredNumSplits;
}
Review Comment:
I added a log message here
https://github.com/apache/tez/pull/202/commits/c715c0b3e3f54b191124216789ee469ca2d257b7
and also did a small refactoring to gather the logging in one place and to
always log when the splitLength bounds are exceeded.
Anyway, the most important point of this patch is being able to see the
original `desiredNumSplits` in every case, whether or not it differs from
`newDesiredNumSplits`.
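
To make the intent concrete, here is a minimal, self-contained sketch of the "too small" branch with the logging gathered in one place. It is not the actual `TezSplitGrouper` code: the class `SplitCountSketch` and the methods `adjustForMinLength` and `describe` are hypothetical names, and computing `lengthPerGroup` as `totalLength / desiredNumSplits` is an assumption made for illustration.

```java
// Hypothetical sketch; variable names mirror the diff, but this is NOT the Tez source.
final class SplitCountSketch {

    /**
     * Returns the split count to use after enforcing the minimum group length.
     * Assumption: lengthPerGroup is totalLength / desiredNumSplits.
     */
    static int adjustForMinLength(int desiredNumSplits, long totalLength,
                                  long minLengthPerGroup, boolean allSplitsHaveLocalhost) {
        if (desiredNumSplits <= 0 || minLengthPerGroup <= 0) {
            throw new IllegalArgumentException("counts and lengths must be positive");
        }
        long lengthPerGroup = totalLength / desiredNumSplits;
        // Groups are large enough, or every split reports the same fake host
        // (the S3-style workaround from the diff): keep the original value.
        if (lengthPerGroup >= minLengthPerGroup || allSplitsHaveLocalhost) {
            return desiredNumSplits;
        }
        // Groups are too small: override the count based on the minimum length.
        return (int) (totalLength / minLengthPerGroup) + 1;
    }

    /** Builds one message that always carries the original and the final split count. */
    static String describe(int desiredNumSplits, int newDesiredNumSplits, long totalLength) {
        return "Desired splits: " + desiredNumSplits
            + " New desired splits: " + newDesiredNumSplits
            + " Total length: " + totalLength;
    }
}
```

Because `describe` takes both values, the original `desiredNumSplits` shows up in the log even when the override is skipped and the two numbers are equal.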