[ https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1713:
----------------------------

    Description:
I have a script which takes in a command line parameter.
{code}
pig -p number=100 script.pig
{code}
The script contains the following parameters:
{code}
A = load '/user/viraj/test' using PigStorage() as (a,b,c);
B = SAMPLE A 1/$number;
dump B;
{code}
Realistic use cases of SAMPLE require statisticians to calculate SAMPLE data on demand.
Ideally I would like to calculate SAMPLE from within a Pig script, without having to run one Pig script first to get its results and another to pass the results.
Ideal use case:
{code}
A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
...
...
W = group X by col1;
Z = foreach Y generate AVG(X);
AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
BB = SAMPLE AA 1/Z;
dump BB;
{code}
Viraj

Change this Jira to only track sampling algorithm. PIG-1926 is opened to track limit/sample taking scalar.

This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

  was:
I have a script which takes in a command line parameter.
{code}
pig -p number=100 script.pig
{code}
The script contains the following parameters:
{code}
A = load '/user/viraj/test' using PigStorage() as (a,b,c);
B = SAMPLE A 1/$number;
dump B;
{code}
Realistic use cases of SAMPLE require statisticians to calculate SAMPLE data on demand.
Ideally I would like to calculate SAMPLE from within a Pig script, without having to run one Pig script first to get its results and another to pass the results.
Ideal use case:
{code}
A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
...
...
W = group X by col1;
Z = foreach Y generate AVG(X);
AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
BB = SAMPLE AA 1/Z;
dump BB;
{code}
Viraj

Change this Jira to only track sampling algorithm. PIG-1926 is opened to track limit/sample taking scalar.
This is a candidate project for Google summer of code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011

> SAMPLE command should accept parameters to specify alternative sampling algorithm
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-1713
>                 URL: https://issues.apache.org/jira/browse/PIG-1713
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Viraj Bhat
>              Labels: gsoc2012
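
For readers unfamiliar with what an "alternative sampling algorithm" could mean here, the following is a minimal Python sketch, not Pig code and not taken from the issue. It contrasts per-row probabilistic sampling at a rate of 1/number (the behaviour the `SAMPLE A 1/$number` script asks for) with reservoir sampling, which is only one possible alternative and is not named in the issue. The function names `bernoulli_sample` and `reservoir_sample` are hypothetical, chosen for illustration.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def bernoulli_sample(
    rows: Iterable[T], fraction: float, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Keep each row independently with probability `fraction`.

    The output size is only approximately fraction * n.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    for row in rows:
        if rng.random() < fraction:
            yield row


def reservoir_sample(
    rows: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform sample of exactly min(k, n) rows in a single pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    number = 100                      # stands in for the -p number=100 parameter
    rows = range(10_000)              # stands in for the loaded relation
    approx = list(bernoulli_sample(rows, 1.0 / number, random.Random(0)))
    exact = reservoir_sample(rows, 100, random.Random(0))
    print(len(approx), len(exact))
```

The trade-off the sketch is meant to show: the first function yields a sample whose size varies from run to run, while the second always returns a fixed number of rows without knowing the input size in advance.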