[ 
https://issues.apache.org/jira/browse/TEZ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074652#comment-14074652
 ] 

Bikas Saha commented on TEZ-1107:
---------------------------------

This was originally opened by Daniel because Pig was setting an initial 
parallelism which could then be increased later on. The approach then changed 
to setting initial parallelism to -1 and setting the correct parallelism later 
on. So Pig should not need this feature any longer, at least for the original 
use case.

In general, this does not need an API change, since the parallelism is already 
specified through the API. What is missing is support for the case where the 
parallelism actually increases.

> Support increase of parallelism of vertex in case of custom partitioner
> -----------------------------------------------------------------------
>
>                 Key: TEZ-1107
>                 URL: https://issues.apache.org/jira/browse/TEZ-1107
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>
> The current VertexManagerPlugin/EdgeManager mechanism supports decreasing the 
> parallelism of a vertex, but increasing it is not supported. In general, 
> increasing the parallelism requires repartitioning. However, in my simplified 
> case, the preceding vertex uses a custom partitioner that already partitions 
> to the final parallelism, so repartitioning is not needed. Even so, I hit an 
> exception from the sorter:
> Caused by: java.io.IOException: Illegal partition for Null: false index: 0 53.8 (2), TotalPartitions: 2
>     at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.collect(DefaultSorter.java:208)
>     at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.write(DefaultSorter.java:185)
>     at org.apache.tez.runtime.library.output.OnFileSortedOutput$1.write(OnFileSortedOutput.java:111)
>     at org.apache.pig.backend.hadoop.executionengine.tez.POIdentityInOutTez.getNextTuple(POIdentityInOutTez.java:148)
>     ... 8 more
> While increasing parallelism in general is harder, increasing it when a 
> custom partitioner is used might be easier to fix. 
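
For readers who want to see why the sorter rejects the record, here is a minimal, 
self-contained sketch of the failure mode. It is not the Tez source: the class 
and method names (PartitionRangeSketch, customPartition, collect, accepts) are 
hypothetical, and the range check is only assumed to resemble what the sorter 
does, based on the message in the trace above. The producer partitions to the 
final parallelism (4 here, an arbitrary value) while the output is still 
configured with the initial parallelism of 2.

```java
import java.io.IOException;

public class PartitionRangeSketch {

    // Hypothetical stand-in for a custom partitioner that already targets
    // the final (increased) parallelism of the downstream vertex.
    static int customPartition(Object key, int finalParallelism) {
        return Math.floorMod(key.hashCode(), finalParallelism);
    }

    // Hypothetical stand-in for the sorter's range check (assumption: the
    // sorter rejects any partition outside [0, totalPartitions)).
    static void collect(Object key, int partition, int totalPartitions)
            throws IOException {
        if (partition < 0 || partition >= totalPartitions) {
            throw new IOException("Illegal partition for " + key + " ("
                    + partition + "), TotalPartitions: " + totalPartitions);
        }
        // A real sorter would buffer the record here.
    }

    // Convenience wrapper: true if the record would be accepted.
    static boolean accepts(Object key, int partition, int totalPartitions) {
        try {
            collect(key, partition, totalPartitions);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int initialParallelism = 2; // what the output was configured with
        int finalParallelism = 4;   // what the custom partitioner targets

        for (int key = 0; key < finalParallelism; key++) {
            int partition = customPartition(key, finalParallelism);
            System.out.println("key=" + key + " partition=" + partition
                    + " accepted=" + accepts(key, partition, initialParallelism));
        }
    }
}
```

In this sketch, keys mapped to partitions 2 and 3 are rejected because the 
output side still believes there are only 2 partitions, which matches the 
"(2), TotalPartitions: 2" part of the reported message.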



--
This message was sent by Atlassian JIRA
(v6.2#6252)
