[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
[ https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.013.patch

Thanks a lot [~sseth] for the review comments. Uploading new patch with just the minor change to PipelinedSorter.

> Detect and prune empty partitions for the Ordered case
> ------------------------------------------------------
>
>                 Key: TEZ-3605
>                 URL: https://issues.apache.org/jira/browse/TEZ-3605
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch,
> TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch,
> TEZ-3605.006.patch, TEZ-3605.007.patch, TEZ-3605.008.patch,
> TEZ-3605.009.patch, TEZ-3605.010.patch, TEZ-3605.011.patch,
> TEZ-3605.012.patch, TEZ-3605.013.patch
>
>
> Analogous to the Unordered case we should not have empty partition
> entries/segments in the Ordered/DefaultSorter case. This will save writing
> unnecessary data.
> Additionally, with tez_shuffle feature (TEZ-3334), in a heavily auto reduced
> job, this change would allow not fetching empty partitions and then throwing
> them away.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.012.patch

Fixing the numSpills=0 case for DefaultSorter to honor the sendPartitionDetails flag.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.011.patch

Uploading a new patch that has changes to the merge/flush logic. SpillRecord contains entries for all partitions, with length=0 in the empty case. The file.out write is done only for non-empty partitions. The DefaultSorter's special case for numSpills=0 is also changed as part of this fix. Added some tests to cover some aspects of the change.
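The index/data split described in this update can be pictured with a minimal sketch. This is not the code from the patch: `SpillIndexSketch`, `Entry`, and `writePartitions` are hypothetical names, and an in-memory byte stream stands in for file.out. It only illustrates the idea that the index keeps one entry per partition (length 0 when empty) while bytes are written only for non-empty partitions.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spill index; this is not Tez's SpillRecord class.
final class SpillIndexSketch {

    /** One index entry per partition: start offset in the output and byte length. */
    static final class Entry {
        final long startOffset;
        final long length;

        Entry(long startOffset, long length) {
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /**
     * Appends only the non-empty partitions to {@code out}, but returns an
     * index with one entry for every partition. Empty partitions get length 0.
     */
    static List<Entry> writePartitions(byte[][] partitions, ByteArrayOutputStream out) {
        List<Entry> index = new ArrayList<>(partitions.length);
        for (byte[] data : partitions) {
            long start = out.size();
            if (data != null && data.length > 0) {
                out.write(data, 0, data.length); // skipped entirely for empty partitions
            }
            index.add(new Entry(start, out.size() - start));
        }
        return index;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[][] partitions = { {1, 2, 3}, {}, {4, 5} };
        List<Entry> index = writePartitions(partitions, out);
        for (int p = 0; p < index.size(); p++) {
            Entry e = index.get(p);
            System.out.println("partition " + p + ": offset=" + e.startOffset + " length=" + e.length);
        }
    }
}
```

A reader of such an index can tell from length=0 alone that a partition has nothing to fetch, without opening the data file.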
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.010.patch

Patch needed rebasing after latest commits. Only change from previous version is in the test.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.009.patch

Modified the pipelined sorter and the related test. It now checks whether a given index record has data before calling the merge code on it. I would appreciate any comments on the approach and any corrections (esp. for the pipelined case).
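The check described in this update amounts to filtering index records before they reach the merge step. The sketch below is only an illustration under that reading; `MergeFilterSketch`, `IndexRecord`, `hasData`, and `segmentsToMerge` are hypothetical names, not the Tez classes or the patch itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; not the Tez index record or merger API.
final class MergeFilterSketch {

    /** Index record for one partition within one spill. */
    static final class IndexRecord {
        final int spillId;
        final long partLength;

        IndexRecord(int spillId, long partLength) {
            this.spillId = spillId;
            this.partLength = partLength;
        }

        /** Assumption: a record with zero length holds no data. */
        boolean hasData() {
            return partLength > 0;
        }
    }

    /** Returns only the records that should be handed to the merge step. */
    static List<IndexRecord> segmentsToMerge(List<IndexRecord> recordsForPartition) {
        List<IndexRecord> segments = new ArrayList<>();
        for (IndexRecord record : recordsForPartition) {
            if (record.hasData()) {
                segments.add(record);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        List<IndexRecord> records = Arrays.asList(
                new IndexRecord(0, 120), new IndexRecord(1, 0), new IndexRecord(2, 64));
        for (IndexRecord record : segmentsToMerge(records)) {
            System.out.println("merge spill " + record.spillId);
        }
    }
}
```

If every record for a partition is empty, the returned list is empty and the caller can skip the merge for that partition altogether.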
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.008.patch

Patch needed rebase on the right branch. Updated.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.007.patch

Thanks [~sseth] for the review comments. Attached is a patch that uses {{sendEmptyPartitionDetails}} to make the writer initialization decision for both Default and Pipelined Sorter. Will ping for comments after a clean pre-commit.
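One plausible reading of the writer initialization decision mentioned here is sketched below. It assumes that when the flag is off the previous behaviour (always create a writer) is kept, and that when it is on a writer is created only for partitions that hold records. The class and method names are hypothetical, and the actual condition in the patch may differ.

```java
// Hypothetical helper; the real sorters make this decision inline.
final class WriterDecisionSketch {

    /**
     * Assumed semantics: with sendEmptyPartitionDetails disabled, always create
     * a writer; with it enabled, create one only if the partition has records.
     */
    static boolean shouldCreateWriter(boolean sendEmptyPartitionDetails, long recordsInPartition) {
        return !sendEmptyPartitionDetails || recordsInPartition > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldCreateWriter(true, 0));   // empty partition, flag on
        System.out.println(shouldCreateWriter(false, 0));  // empty partition, flag off
    }
}
```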
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Description:
Analogous to the Unordered case we should not have empty partition entries/segments in the Ordered/DefaultSorter case. This will save writing unnecessary data.
Additionally, with tez_shuffle feature (TEZ-3334), in a heavily auto reduced job, this change would allow not fetching empty partitions and then throwing them away.

  was:
Analogous to the Unordered case we should not have empty partition entries/segments in the Ordered/DefaultSorter case. This will save writing unnecessary data.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.006.patch

Updated patch that moves incrementing the numRecordsPerPartition values into a method. Currently I have replicated the idea from the Unordered case of keeping track of records per partition. Alternatively we could make it a bitset, but it might be helpful later to have an array of records per partition for any further use. Also added a getOutputContext getter and modified the test to assert more conditions, such as the bitset for empty partitions being set correctly.
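The trade-off between a per-partition count array and a bitset can be seen in a small sketch. It assumes a plain `int[]` counter incremented through one method and a `java.util.BitSet` of empty partitions derived from it on demand. `PartitionRecordCounterSketch` and its methods are hypothetical names, not the ExternalSorter code.

```java
import java.util.BitSet;

// Hypothetical counter for illustration; not the ExternalSorter implementation.
final class PartitionRecordCounterSketch {

    private final int[] numRecordsPerPartition;

    PartitionRecordCounterSketch(int numPartitions) {
        this.numRecordsPerPartition = new int[numPartitions];
    }

    /** Single place where the per-partition count is incremented. */
    void increment(int partition) {
        numRecordsPerPartition[partition]++;
    }

    int recordsIn(int partition) {
        return numRecordsPerPartition[partition];
    }

    /** Derives the set of empty partitions: bit i is set when partition i saw no records. */
    BitSet emptyPartitions() {
        BitSet empty = new BitSet(numRecordsPerPartition.length);
        for (int i = 0; i < numRecordsPerPartition.length; i++) {
            if (numRecordsPerPartition[i] == 0) {
                empty.set(i);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        PartitionRecordCounterSketch counter = new PartitionRecordCounterSketch(4);
        counter.increment(0);
        counter.increment(2);
        counter.increment(2);
        System.out.println(counter.emptyPartitions()); // prints {1, 3}
    }
}
```

Keeping the counts costs one int per partition, but the bitset can always be computed from them, whereas the counts cannot be recovered from a bitset.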
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.005.patch

Fixed the findbugs exclude file and tested locally that it does not generate warnings. Test failure was irreproducible locally.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.004.patch

Additional change to findbugs exclude file for numRecordsPerPartition in line with partitionStats field.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.003.patch

Patch v3 attempts to fix the findbugs warning by declaring numRecordsPerPartition as a private member of ExternalSorter instead of protected. The getter remains public.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.002.patch

Rebasing the patch on master.
[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case
Kuhu Shukla updated TEZ-3605:
-----------------------------
    Attachment: TEZ-3605.001.patch