[ https://issues.apache.org/jira/browse/PIG-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789069#action_12789069 ]
Thejas M Nair commented on PIG-1143:
------------------------------------
If the data is going to be stored in BinStorage, my comments on the approach
taken in this patch do not apply. But the patch does not need to be ported to
the load-store redesign branch.
> Poisson Sample Loader should compute the number of samples required only once
> -----------------------------------------------------------------------------
>
> Key: PIG-1143
> URL: https://issues.apache.org/jira/browse/PIG-1143
> Project: Pig
> Issue Type: Bug
> Reporter: Sriranjan Manjunath
> Assignee: Sriranjan Manjunath
>
> The current Poisson sampler forces every map task to compute the number of
> samples itself. This is redundant and causes problems when a large directory
> is specified in the join. The sampler should instead calculate the sample
> count only once and share that value with the remaining mappers.
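
The idea in the description can be illustrated with a minimal sketch. It is
not Pig's implementation: the dictionary standing in for the job
configuration, the key name "sampler.sample.count", the SampleCounter class
and the size-based formula are all assumptions made for illustration only.
The point is that the costly computation runs once at job setup and each
mapper only reads the stored result.

```python
class SampleCounter:
    """Hypothetical stand-in for the costly sample-count computation."""

    def __init__(self):
        self.calls = 0  # how many times the computation actually ran

    def compute(self, input_sizes, bytes_per_sample):
        # Placeholder formula (an assumption, not the Poisson sampler's
        # formula): one sample per `bytes_per_sample` bytes of input.
        self.calls += 1
        return max(1, sum(input_sizes) // bytes_per_sample)


# Hypothetical configuration key; not a real Pig or Hadoop property.
SAMPLE_COUNT_KEY = "sampler.sample.count"


def prepare_job(conf, counter, input_sizes, bytes_per_sample):
    """Run once at job setup and publish the result to the shared conf."""
    conf[SAMPLE_COUNT_KEY] = str(counter.compute(input_sizes, bytes_per_sample))


def mapper_setup(conf):
    """Each mapper reads the shared value instead of recomputing it."""
    return int(conf[SAMPLE_COUNT_KEY])


if __name__ == "__main__":
    conf = {}  # plain dict standing in for a job configuration object
    counter = SampleCounter()
    prepare_job(conf, counter, [400, 600, 1000], 100)
    per_mapper = [mapper_setup(conf) for _ in range(4)]
    print(per_mapper, "computations:", counter.calls)
```

With this arrangement the cost of scanning a large input directory is paid
once, regardless of how many mappers the job launches.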