[ https://issues.apache.org/jira/browse/PIG-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009280#comment-13009280 ]
Gianmarco De Francisci Morales commented on PIG-1713:
-----------------------------------------------------

To support the simple use case, one would only need to allow expressions in the SAMPLE argument.
I assume this should mainly require changes to the front-end.
For more complex techniques like reservoir sampling, one should implement a new (physical?) operator.

What is the exact scope/goal of the project?
Maybe it could be split into two parts: supporting sampling with variable arguments as the first part, and adding more complex techniques as the second part?

> SAMPLE command should accept parameters
> ---------------------------------------
>
> Key: PIG-1713
> URL: https://issues.apache.org/jira/browse/PIG-1713
> Project: Pig
> Issue Type: Improvement
> Reporter: Viraj Bhat
> Labels: gsoc2011
> Fix For: 0.10
>
>
> I have a script which takes in a command line parameter.
> {code}
> pig -p number=100 script.pig
> {code}
> The script contains the following statements:
> {code}
> A = load '/user/viraj/test' using PigStorage() as (a,b,c);
> B = SAMPLE A 1/$number;
> dump B;
> {code}
> Realistic use cases of SAMPLE require statisticians to calculate SAMPLE data
> on demand.
> Ideally I would like to calculate SAMPLE from within a Pig script, without
> having to run one Pig script first to get its results and another to pass the
> results.
> Ideal use case:
> {code}
> A = load '/user/viraj/input' using PigStorage() as (col1, col2, col3);
> ...
> ...
> W = group X by col1;
> Z = foreach Y generate AVG(X);
> AA = load '/user/viraj/test' using PigStorage() as (a,b,c);
> BB = SAMPLE AA 1/Z;
> dump BB;
> {code}
> Viraj
> LIMIT should handle the same case.
> This is a candidate project for Google Summer of Code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
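
For reference, the reservoir technique mentioned in the comment can be sketched outside of Pig. The following is a minimal, illustrative Python version of the classic single-pass reservoir algorithm (often called Algorithm R). It is not Pig code and not a proposed operator implementation; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this sketch.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Return up to k records chosen uniformly at random from an iterable.

    Hypothetical helper for illustration only: a single pass over
    `records`, keeping at most k items in memory.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, i]; the new record replaces an existing
            # one only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of 1000 without knowing its length upfront.
    print(reservoir_sample(range(1000), 5))
```

Unlike SAMPLE with a fraction, this returns a fixed number of records and does not need the input size in advance, which is presumably why it would call for a separate operator rather than a front-end change.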