[jira] [Commented] (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464460#comment-13464460 ] Russell Jurney commented on PIG-1150: - Does this look right? http://introcs.cs.princeton.edu/java/97data/OnePass.java.html It looks right to me. I think? VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162952#comment-13162952 ] Jayadev Chandrasekhar commented on PIG-1150: [From Yahoo! Labs] Just checking whether variance (or std deviation) UDF got finally implemented in pig, from this ticket it appears not, but not sure whether a similar item is tracked somewhere else. I will need to either use it or implement it if not present, can try to contribute my implementation if it is not too difficult to make it generic, please point me to some guidelines while contributing. VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162954#comment-13162954 ] Jayadev Chandrasekhar commented on PIG-1150: Sorry was just checking the whole chain of comments , it appears that piggybank is not maintained by the pig team any more ? VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162974#comment-13162974 ] Daniel Dai commented on PIG-1150: - We are still maintaining piggybank and we will review and commit code once there's a patch. The var patch attached has not been checked in. It needs rework mentioned by Dmitriy. VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163040#comment-13163040 ] Dmitriy V. Ryaboy commented on PIG-1150: An accumulative version has been checked into the LinkedIn DataFu library (a contrib from Cloudera). You can use that. It's still not as efficient as one would like, I might have some news on that front once internal testing at Twitter is done :). VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Assignee: Dmitriy V. Ryaboy Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira