[ https://issues.apache.org/jira/browse/PIG-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1932:
----------------------------

    Fix Version/s: 0.9.0
         Assignee: Alan Gates
           Status: Patch Available  (was: Open)

> GFCross should allow the user to set the DEFAULT_PARALLELISM value
> ------------------------------------------------------------------
>
>                 Key: PIG-1932
>                 URL: https://issues.apache.org/jira/browse/PIG-1932
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1932.patch
>
>
> The internal UDF GFCross uses a final static int DEFAULT_PARALLELISM to
> determine how wide to spread the records in a cross. It is currently
> hardwired to 96. There are no comments in the code on how that value was
> settled on. Despite the name, this value is not necessarily related to the
> reduce parallelism controlled by the parallel clause. It controls how many
> artificial join key values are generated and how many times each record is
> duplicated before going through the join. The higher it is set, the more key
> values are generated (and thus the less likely the cross is to run out of
> memory), but also the more times each record is duplicated in the map phase
> before being sent to the reduce phase.
> We should leave the default value at 96 but allow a property to override
> this default and change the value.
> We cannot use a constructor argument here because the use of the UDF is not
> exposed to the user, so the user has no opportunity to pass a constructor
> argument to it.
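
The proposal above (keep 96 as the default, let a property override it) could look roughly like the following. This is only a sketch and is not taken from PIG-1932.patch: the class name, the property name `example.cross.parallelism`, and the use of java.util.Properties in place of the job configuration are all hypothetical. The `groupsPerInput` and `copiesPerRecord` helpers assume one plausible scheme, in which each input gets about the n-th root of the parallelism value in artificial key values and every record is copied once per key combination of the other inputs. They are there to illustrate the memory versus duplication trade-off, not to describe what GFCross actually computes.

```java
import java.util.Properties;

public class CrossParallelism {
    // Default taken from the issue description.
    static final int DEFAULT_PARALLELISM = 96;

    // Hypothetical property name, used for illustration only.
    static final String PARALLELISM_PROPERTY = "example.cross.parallelism";

    // Returns the configured value, or the default when the property is
    // missing, not a number, or not positive.
    static int resolveParallelism(Properties props) {
        String raw = props.getProperty(PARALLELISM_PROPERTY);
        if (raw == null) {
            return DEFAULT_PARALLELISM;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_PARALLELISM;
        } catch (NumberFormatException e) {
            return DEFAULT_PARALLELISM;
        }
    }

    // Assumed scheme: artificial key values per input is the ceiling of the
    // n-th root of the parallelism value, where n is the number of inputs.
    static int groupsPerInput(int parallelism, int numInputs) {
        return (int) Math.ceil(Math.pow(parallelism, 1.0 / numInputs));
    }

    // Assumed scheme: a record is copied once for every combination of key
    // values belonging to the other (numInputs - 1) inputs.
    static long copiesPerRecord(int parallelism, int numInputs) {
        long copies = 1;
        int groups = groupsPerInput(parallelism, numInputs);
        for (int i = 0; i < numInputs - 1; i++) {
            copies *= groups;
        }
        return copies;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println("default: " + resolveParallelism(props));

        props.setProperty(PARALLELISM_PROPERTY, "256");
        int parallelism = resolveParallelism(props);
        System.out.println("override: " + parallelism);
        System.out.println("copies per record (2 inputs): "
                + copiesPerRecord(parallelism, 2));
    }
}
```

Under these assumptions, a two-input cross at the default of 96 uses 10 key values per input and copies each record 10 times. Raising the value spreads the cross over more keys at the cost of more copies per record in the map phase.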