sanha opened a new pull request #68: [NEMO-125] Fix data loss bug caused by SailfishSchedulingPolicy
URL: https://github.com/apache/incubator-nemo/pull/68
 
 
   JIRA: [NEMO-125: Fix data loss bug caused by SailfishSchedulingPolicy](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-125)
   
   **Major changes:**
   - Let `InputStreamIterator` in `DataUtil` limit the data it reads by the size of each partition instead of by the number of elements.
     - This limit is needed to avoid reading padding bytes introduced by compression or by other `Stream` implementations (such as `ByteArrayOutputStream`).
   - Remove the number of elements in a partition from partition metadata.
   - Re-enable byte array coder optimization in `SailfishPass`.
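
   The size-based limit described in the first item can be illustrated with a minimal sketch. This is not the code in this PR: `PartitionSizeLimitSketch`, `ByteLimitedInputStream`, and `readInts` are hypothetical names, the partition size is assumed to be measured in bytes, and elements are assumed to be fixed-width 4-byte integers purely for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-limited reading; not Nemo's actual DataUtil code. */
final class PartitionSizeLimitSketch {

  /** Wraps a stream and reports end-of-stream once `limit` bytes have been read. */
  static final class ByteLimitedInputStream extends InputStream {
    private final InputStream in;
    private long remaining;

    ByteLimitedInputStream(final InputStream in, final long limit) {
      this.in = in;
      this.remaining = limit;
    }

    /** True while the partition still has unread bytes. */
    boolean hasRemaining() {
      return remaining > 0;
    }

    @Override
    public int read() throws IOException {
      if (remaining <= 0) {
        return -1; // Anything past the partition size is treated as padding.
      }
      final int b = in.read();
      if (b >= 0) {
        remaining--;
      }
      return b;
    }

    @Override
    public int read(final byte[] buf, final int off, final int len) throws IOException {
      if (len == 0) {
        return 0;
      }
      if (remaining <= 0) {
        return -1;
      }
      final int n = in.read(buf, off, (int) Math.min(len, remaining));
      if (n > 0) {
        remaining -= n;
      }
      return n;
    }
  }

  /** Decodes 4-byte ints until the declared partition size is consumed. */
  static List<Integer> readInts(final InputStream raw, final long partitionSizeInBytes)
      throws IOException {
    final ByteLimitedInputStream limited = new ByteLimitedInputStream(raw, partitionSizeInBytes);
    final DataInputStream data = new DataInputStream(limited);
    final List<Integer> elements = new ArrayList<>();
    while (limited.hasRemaining()) {
      elements.add(data.readInt());
    }
    return elements;
  }
}
```

   With a 20-byte buffer that holds three integers followed by 8 padding bytes, calling `readInts` with a partition size of 12 decodes only the three real elements and never touches the padding. Because the stop condition is a byte count, the reader no longer needs the element count that this PR removes from the partition metadata.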
   
   **Minor changes to note:**
   - Make `DataTransferFactory` constructible only through Tang (instead of through a public constructor).
   - Make `OutputWriter` constructible only through `DataTransferFactory`.
   
   **Tests for the changes:**
   - Existing data plane unit tests cover this change.
   - Add `testSailfishInOneExecutor` in `WordCountITCase`.
     - This test reproduces the resource environment in which the bug occurs and fails without the change in this PR.
   
   **Other comments:**
   - N/A.
   
   Resolves [NEMO-125](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-125)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
