[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-27 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.013.patch

Thanks a lot [~sseth] for the review comments. Uploading a new patch with just 
the minor change to PipelinedSorter.

> Detect and prune empty partitions for the Ordered case
> --
>
> Key: TEZ-3605
> URL: https://issues.apache.org/jira/browse/TEZ-3605
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch, 
> TEZ-3605.006.patch, TEZ-3605.007.patch, TEZ-3605.008.patch, 
> TEZ-3605.009.patch, TEZ-3605.010.patch, TEZ-3605.011.patch, 
> TEZ-3605.012.patch, TEZ-3605.013.patch
>
>
> Analogous to the Unordered case, we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.
> Additionally, with the tez_shuffle feature (TEZ-3334), in a heavily 
> auto-reduced job, this change would avoid fetching empty partitions only to 
> throw them away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-26 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.012.patch

Fixing the numSpills=0 case for DefaultSorter to honor the sendPartitionDetails 
flag.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-06-26 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.011.patch

Uploading a new patch that changes the merge/flush logic. SpillRecord still 
contains an entry for every partition, with length=0 for the empty ones. The 
file.out write is done only for non-empty partitions. DefaultSorter's special 
case for numSpills=0 is also changed as part of this fix. Added some tests to 
cover some aspects of the change.
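
For readers following along, here is a minimal sketch of the idea (an index 
entry for every partition, bytes written only for the non-empty ones). It is 
not code from the patch: SpillIndexSketch, Entry and write are hypothetical 
names, and an in-memory stream stands in for file.out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a spill index; this is not the Tez SpillRecord API. */
final class SpillIndexSketch {

    /** One index entry per partition: where its bytes start and how many there are. */
    static final class Entry {
        final long startOffset;
        final long length;

        Entry(long startOffset, long length) {
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /**
     * Appends only non-empty partitions to the output, but records an index
     * entry for every partition so that partition numbers stay aligned.
     */
    static List<Entry> write(byte[][] partitions, ByteArrayOutputStream out) {
        List<Entry> index = new ArrayList<>();
        for (byte[] data : partitions) {
            long start = out.size();
            if (data.length > 0) {
                // Only partitions that hold data touch the output.
                out.write(data, 0, data.length);
            }
            // Empty partitions still get an entry, with length 0.
            index.add(new Entry(start, data.length));
        }
        return index;
    }
}
```

A reader of such an index can skip any entry whose length is 0 without 
opening the data file at all.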



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-30 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.010.patch

Patch needed rebasing after the latest commits. The only change from the 
previous version is in the test.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-22 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.009.patch

Modified the pipelined sorter and the related test. It now checks whether a 
given index record has data before calling the merge code on it. I would 
appreciate any comments on the approach and any corrections (especially for 
the pipelined case).
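
To illustrate the check described above, here is a rough sketch under stated 
assumptions. It is not the PipelinedSorter code: MergeFilterSketch and 
spillsWithData are hypothetical names, and a plain long[][] of per-spill, 
per-partition lengths stands in for the index records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; illustrates filtering empty index records before a merge. */
final class MergeFilterSketch {

    /**
     * partLengths[spill][partition] is the number of bytes that spill holds
     * for that partition. Returns the spill numbers worth merging for one
     * partition; an empty result means the merge can be skipped entirely.
     */
    static List<Integer> spillsWithData(long[][] partLengths, int partition) {
        List<Integer> spills = new ArrayList<>();
        for (int spill = 0; spill < partLengths.length; spill++) {
            if (partLengths[spill][partition] > 0) {
                spills.add(spill);
            }
        }
        return spills;
    }
}
```

With this shape, the caller hands only the returned spills to the merger and 
does nothing for a partition whose list comes back empty.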



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-18 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.008.patch

Patch needed a rebase onto the right branch. Updated.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-18 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.007.patch

Thanks [~sseth] for the review comments. Attached is a patch that uses 
{{sendEmptyPartitionDetails}} to make the writer initialization decision for 
both Default and Pipelined Sorter. Will ping for comments after a clean 
pre-commit.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-05-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Description: 
Analogous to the Unordered case we should not have empty partition 
entries/segments in the Ordered/DefaultSorter case. This will save writing 
unnecessary data.
Additionally, with tez_shuffle feature (TEZ-3334), in a heavily auto reduced 
job, this change would allow not fetching empty partitions and then throwing 
them away.

  was:Analogous to the Unordered case we should not have empty partition 
entries/segments in the Ordered/DefaultSorter case. This will save writing 
unnecessary data.




[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-15 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.006.patch

Updated patch that moves incrementing the numRecordsPerPartition values into a 
method. Currently I have replicated the idea from the Unordered case to keep 
track of records per partition. Alternatively we could make it a bitset, but an 
array of records per partition might be helpful later for any usage by 
extension. Also added a getOutputContext getter and modified the test to assert 
more conditions, such as the bitset for empty partitions being set correctly.
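
The trade-off between a per-partition count array and a bitset can be 
sketched roughly as follows. This is an illustration only, not the patch: 
PartitionCounterSketch and its method names are hypothetical, and only the 
field name numRecordsPerPartition comes from the comment above.

```java
import java.util.BitSet;

/** Hypothetical counter; shows deriving an empty-partition bitset from counts. */
final class PartitionCounterSketch {
    private final int[] numRecordsPerPartition;

    PartitionCounterSketch(int numPartitions) {
        this.numRecordsPerPartition = new int[numPartitions];
    }

    /** Single place where the per-partition count is bumped. */
    void increment(int partition) {
        numRecordsPerPartition[partition]++;
    }

    /** Bit i is set when partition i received no records. */
    BitSet emptyPartitions() {
        BitSet empty = new BitSet(numRecordsPerPartition.length);
        for (int i = 0; i < numRecordsPerPartition.length; i++) {
            if (numRecordsPerPartition[i] == 0) {
                empty.set(i);
            }
        }
        return empty;
    }
}
```

Keeping the counts and deriving the bitset on demand preserves the record 
totals for later use, at the cost of one int per partition instead of one bit.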



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-07 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.005.patch

Fixed the findbugs exclude file and tested locally that it does not generate 
warnings. The test failure could not be reproduced locally.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-07 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.004.patch

Additional change to the findbugs exclude file for numRecordsPerPartition, in 
line with the partitionStats field.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.003.patch

Patch v3 attempts to fix the findbugs warning by declaring 
numRecordsPerPartition as a private member of ExternalSorter instead of 
protected. The getter remains public.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.002.patch

Rebasing the patch on master.



[jira] [Updated] (TEZ-3605) Detect and prune empty partitions for the Ordered case

2017-03-03 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3605:
-
Attachment: TEZ-3605.001.patch
