[ https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Janos Matyas updated TEZ-1608:
------------------------------
    Description:
The goal of this sample is to find the top K elements of a dataset, while guiding the reader through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism).

An example use case for top K:
Given a large data set in CSV format of user comments on a site, with each row listed as userid,postid,commentid,comment,timestamp, find the top K commenters or the K posts with the most comments.

  was:The goal of this sample is to find the topK elements of a dataset, while guiding through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism).

     Target Version/s:   (was: 0.6.0)
    Affects Version/s:   (was: 0.6.0)
                       0.5.0
               Labels:   (was: example)

> TopK example
> ------------
>
>                 Key: TEZ-1608
>                 URL: https://issues.apache.org/jira/browse/TEZ-1608
>             Project: Apache Tez
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: Janos Matyas
>         Attachments: TEZ-1608-1.patch
>
>
> The goal of this sample is to find the top K elements of a dataset, while
> guiding the reader through the basics of Tez (DAG creation, tokenizers,
> custom comparators and parallelism).
> An example use case for top K:
> Given a large data set in CSV format of user comments on a site, with each
> row listed as userid,postid,commentid,comment,timestamp, find the top K
> commenters or the K posts with the most comments.
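
To make the use case concrete, here is a minimal single-process sketch of the top K computation in plain Python. It is not the Tez implementation from the attached patch and uses no Tez APIs; the helper names (`read_comments`, `top_k`) and the sample rows are made up for illustration. In a Tez DAG the counting and the final selection would presumably be spread across vertices, but the logic per key is the same: count occurrences of a column, then keep the K largest counts.

```python
import csv
import heapq
import io
from collections import Counter

# Column layout taken from the issue description.
FIELDS = ["userid", "postid", "commentid", "comment", "timestamp"]

# Invented sample data, only for illustration.
SAMPLE = """\
u1,p1,c1,nice post,1411000000
u2,p1,c2,agreed,1411000050
u1,p2,c3,thanks,1411000100
u3,p1,c4,first,1411000150
u1,p3,c5,+1,1411000200
u2,p2,c6,hm,1411000250
"""


def read_comments(text):
    """Parse header-less CSV text into a list of dicts keyed by FIELDS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def top_k(rows, column, k):
    """Return the k most frequent values of `column` as (value, count) pairs."""
    counts = Counter(row[column] for row in rows)
    # Sort key: count descending, then value ascending so ties are deterministic.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


rows = read_comments(SAMPLE)
top_commenters = top_k(rows, "userid", 2)  # top 2 commenters
top_posts = top_k(rows, "postid", 1)       # most commented post
```

Grouping on `userid` answers the "top K commenters" question and grouping on `postid` answers "posts with the most comments", so a single parameterised column covers both variants mentioned in the description.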