[ https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1713:
----------------------------
Description:
I have a script which takes in a command line parameter.
{code}
pig -p number=100 script.pig
{code}
The script uses the parameter as follows:
{code}
A = load '/user/viraj/test' using PigStorage() as (a,b,c);
B = SAMPLE A 1/$number;
dump B;
{code}
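For illustration only, the effect of SAMPLE A 1/$number with number=100 can be sketched outside Pig as a Bernoulli sample, assuming each row is kept independently with probability 1/number. The Python below is a minimal sketch under that assumption; bernoulli_sample and its seed argument are hypothetical names, not part of Pig, and the fraction is computed with floating-point division.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper for illustration; not Pig's implementation.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a seed makes the sketch reproducible
    # rng.random() is uniform on [0, 1), so the test passes with
    # probability `fraction` for every row.
    return [row for row in rows if rng.random() < fraction]


if __name__ == "__main__":
    number = 100  # stands in for the -p number=100 parameter
    rows = [(i, i * 2, i * 3) for i in range(10000)]  # made-up (a, b, c) rows
    sample = bernoulli_sample(rows, 1.0 / number, seed=42)
    print(len(sample), "of", len(rows), "rows kept")
```

The size of the result varies from run to run unless a seed is fixed; only its expected value is len(rows)/number.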
Realistic use cases of SAMPLE require statisticians to compute the sampling
fraction on demand.
Ideally, I would like to compute the fraction from within the Pig script, without
having to run one Pig script, collect its results, and pass them to a second script.
Ideal use case:
{code}
A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
...
...
W = group X by col1;
Z = foreach Y generate AVG(X);
AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
BB = SAMPLE AA 1/Z;
dump BB;
{code}
Viraj
Changed this Jira to track only the sampling algorithm. PIG-1926 has been opened
to track limit/sample taking a scalar.
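This issue does not name a specific alternative algorithm. As one hypothetical candidate, reservoir sampling (Algorithm R) returns a fixed number of rows instead of a fixed fraction, in a single pass and without knowing the input size in advance. The Python below is a sketch of that technique only; reservoir_sample, k, and seed are illustrative names and do not correspond to any existing or proposed Pig syntax.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random in one pass.

    Hypothetical illustration of Algorithm R; not Pig code.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the row replaces an existing entry
            # only when the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    rows = ((i, i * 2, i * 3) for i in range(10000))  # made-up streamed rows
    print(reservoir_sample(rows, 5, seed=42))
```

Unlike a fraction-based sample, the output size here is exact whenever the input has at least k rows.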
This is a candidate project for Google Summer of Code 2012. More information
about the program can be found at
https://cwiki.apache.org/confluence/display/PIG/GSoc2012
was:
I have a script which takes in a command line parameter.
{code}
pig -p number=100 script.pig
{code}
The script uses the parameter as follows:
{code}
A = load '/user/viraj/test' using PigStorage() as (a,b,c);
B = SAMPLE A 1/$number;
dump B;
{code}
Realistic use cases of SAMPLE require statisticians to compute the sampling
fraction on demand.
Ideally, I would like to compute the fraction from within the Pig script, without
having to run one Pig script, collect its results, and pass them to a second script.
Ideal use case:
{code}
A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
...
...
W = group X by col1;
Z = foreach Y generate AVG(X);
AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
BB = SAMPLE AA 1/Z;
dump BB;
{code}
Viraj
Changed this Jira to track only the sampling algorithm. PIG-1926 has been opened
to track limit/sample taking a scalar.
This is a candidate project for Google Summer of Code 2011. More information
about the program can be found at http://wiki.apache.org/pig/GSoc2011
> SAMPLE command should accept parameters to specify alternative sampling
> algorithm
> ---------------------------------------------------------------------------------
>
> Key: PIG-1713
> URL: https://issues.apache.org/jira/browse/PIG-1713
> Project: Pig
> Issue Type: Improvement
> Reporter: Viraj Bhat
> Labels: gsoc2012
>
> I have a script which takes in a command line parameter.
> {code}
> pig -p number=100 script.pig
> {code}
> The script uses the parameter as follows:
> {code}
> A = load '/user/viraj/test' using PigStorage() as (a,b,c);
> B = SAMPLE A 1/$number;
> dump B;
> {code}
> Realistic use cases of SAMPLE require statisticians to compute the sampling
> fraction on demand.
> Ideally, I would like to compute the fraction from within the Pig script, without
> having to run one Pig script, collect its results, and pass them to a second
> script.
> Ideal use case:
> {code}
> A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
> ...
> ...
> W = group X by col1;
> Z = foreach Y generate AVG(X);
> AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
> BB = SAMPLE AA 1/Z;
> dump BB;
> {code}
> Viraj
> Changed this Jira to track only the sampling algorithm. PIG-1926 has been opened
> to track limit/sample taking a scalar.
> This is a candidate project for Google Summer of Code 2012. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012