sanha opened a new pull request #68: [NEMO-125] Fix data loss bug caused by SailfishSchedulingPolicy
URL: https://github.com/apache/incubator-nemo/pull/68

JIRA: [NEMO-125: Fix data loss bug caused by SailfishSchedulingPolicy](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-125)

**Major changes:**
- Let `InputStreamIterator` in `DataUtil` limit the data it reads by the size of each partition in bytes instead of by the number of elements.
  - This limit is needed to avoid reading padding bytes introduced by compression or by other `Stream` implementations (such as `ByteArrayOutputStream`).
- Remove the number of elements in a partition from the partition metadata.
- Re-enable the byte array coder optimization in `SailfishPass`.

**Minor changes to note:**
- Make `DataTransferFactory` constructible through Tang only (instead of through a public constructor).
- Make `OutputWriter` constructible through `DataTransferFactory` only.

**Tests for the changes:**
- Existing data plane unit tests cover this change.
- Add `testSailfishInOneExecutor` in `WordCountITCase`.
  - This test reproduces the same resource environment and fails without the change in this PR.

**Other comments:**
- N/A.

resolves [NEMO-125](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-125)
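
For readers unfamiliar with the byte-limit idea in the major changes, here is a minimal sketch of bounding a read by partition size. It is not the actual `InputStreamIterator` code from this PR: `BoundedDecodeSketch`, `LimitedInputStream`, `encode`, and `decode` are hypothetical names, and fixed-width ints written with `DataOutputStream` stand in for whatever coder Nemo really uses. The point it illustrates is that the decoder stops once the declared partition size is consumed, so trailing padding bytes are never interpreted as elements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BoundedDecodeSketch {

  /** Hypothetical wrapper that exposes at most `limit` bytes of the wrapped stream. */
  static final class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(final InputStream in, final long limit) {
      super(in);
      this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
      if (remaining <= 0) {
        return -1; // Report end of stream once the byte budget is spent.
      }
      final int b = super.read();
      if (b >= 0) {
        remaining--;
      }
      return b;
    }

    @Override
    public int read(final byte[] buf, final int off, final int len) throws IOException {
      if (remaining <= 0) {
        return -1;
      }
      // Never ask the wrapped stream for more than the remaining budget.
      final int n = super.read(buf, off, (int) Math.min(len, remaining));
      if (n > 0) {
        remaining -= n;
      }
      return n;
    }

    long remaining() {
      return remaining;
    }
  }

  /** Stand-in encoder (an assumption): writes each value as a 4-byte int. */
  static byte[] encode(final List<Integer> values) {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (final int v : values) {
        out.writeInt(v);
      }
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Decodes ints until `partitionSize` bytes are consumed; no element count is needed. */
  static List<Integer> decode(final InputStream raw, final long partitionSize) {
    final LimitedInputStream limited = new LimitedInputStream(raw, partitionSize);
    final DataInputStream in = new DataInputStream(limited);
    final List<Integer> elements = new ArrayList<>();
    try {
      while (limited.remaining() > 0) {
        elements.add(in.readInt());
      }
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
    return elements;
  }

  public static void main(final String[] args) {
    final byte[] payload = encode(Arrays.asList(1, 2, 3));
    // Simulate padding: five extra zero bytes after the real partition data.
    final byte[] padded = Arrays.copyOf(payload, payload.length + 5);
    System.out.println(decode(new ByteArrayInputStream(padded), payload.length)); // prints [1, 2, 3]
  }
}
```

In this sketch the only metadata the reader needs is the partition size in bytes, which is consistent with dropping the element count from the partition metadata.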