[ https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929865#action_12929865 ]
Thejas M Nair commented on PIG-1713:
------------------------------------
Once the first use case is supported (an expression as the SAMPLE parameter),
the ideal use case will also work automatically, thanks to the 'relation as
scalar' feature introduced in PIG-1434. Until this feature is available, a
workaround is to use a FILTER statement with a UDF that returns true with the
probability given as its argument.
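
A minimal sketch of the predicate such a UDF would evaluate, in Java. The
class name ProbabilityFilter and its methods are hypothetical, not part of
Pig. In an actual UDF this logic would presumably sit in the exec(Tuple)
method of a class extending org.apache.pig.FilterFunc; that wiring is left out
so the block compiles without the Pig jars.

```java
import java.util.Random;

/**
 * Hypothetical helper (not a Pig class): the accept/reject decision that a
 * sampling filter UDF would make for each input tuple.
 */
final class ProbabilityFilter {
    private final Random rng;

    ProbabilityFilter(Random rng) {
        this.rng = rng;
    }

    /** Returns true with the given probability, which must lie in [0, 1]. */
    boolean accept(double probability) {
        // The negated form also rejects NaN, since comparisons with NaN are false.
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException(
                "probability must be in [0, 1]: " + probability);
        }
        // nextDouble() is uniform on [0, 1), so the comparison is true
        // with the requested probability.
        return rng.nextDouble() < probability;
    }

    /** Counts how many of the given number of independent draws are accepted. */
    int countAccepted(double probability, int trials) {
        int kept = 0;
        for (int i = 0; i < trials; i++) {
            if (accept(probability)) {
                kept++;
            }
        }
        return kept;
    }
}
```

The script would then call the UDF from a FILTER statement, passing the
probability expression as its argument, so each tuple is kept independently
with that probability.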
> SAMPLE command should accept parameters
> ---------------------------------------
>
> Key: PIG-1713
> URL: https://issues.apache.org/jira/browse/PIG-1713
> Project: Pig
> Issue Type: Improvement
> Reporter: Viraj Bhat
>
> I have a script which takes in a command line parameter.
> {code}
> pig -p number=100 script.pig
> {code}
> The script uses the parameter in the following statements:
> {code}
> A = load '/user/viraj/test' using PigStorage() as (a,b,c);
> B = SAMPLE A 1/$number;
> dump B;
> {code}
> Realistic use cases of SAMPLE require statisticians to calculate SAMPLE data
> on demand.
> Ideally, I would like to calculate SAMPLE from within a Pig script, without
> having to run one Pig script first to get its results and then another to
> pass those results in.
> Ideal use case:
> {code}
> A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
> ...
> ...
> W = group X by col1;
> Z = foreach Y generate AVG(X);
> AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
> BB = SAMPLE AA 1/Z;
> dump BB;
> {code}
> Viraj