[jira] Commented: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851293#action_12851293 ]

Daniel Dai commented on PIG-200:

Hi Duncan,

I tried it and didn't see any errors. Are you using the Pig 0.6 release? What error message did you see?

Pig Performance Benchmarks
--------------------------
Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
Fix For: 0.2.0
Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, perf.patch

To benchmark Pig performance, we need a TPC-H-like large data set plus a script collection. These will be used to compare different Pig releases and to compare Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only). Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance

I am currently running long-running Pig scripts over data sets on the order of tens of TBs. The next step is hundreds of TBs. We need an open large data set (open-source scripts that generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION, etc. We can call these the Pig Workouts: Cardio (short processing), Marathon (long-running scripts) and Triathlon (a mix). I will update this JIRA with more details on current activities soon.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845174#action_12845174 ]

duncan commented on PIG-200:

Hi Daniel,

How do I run perf.patch? I saw a lot of different things in it. I want to generate the data set and use those 14 Pig queries for benchmarking. Would you mind telling me more about how to use perf.patch?

Thanks,
Duncan
[jira] Commented: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738609#action_12738609 ]

Ying He commented on PIG-200:

Documentation for DataGenerator in Hadoop mode is here: http://wiki.apache.org/pig/DataGeneratorHadoop
[jira] Commented: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721959#action_12721959 ]

Zheng Shao commented on PIG-200:

We made a benchmark for Hive based on the queries from the SIGMOD 2009 paper: https://issues.apache.org/jira/browse/HIVE-396

We also spent a lot of time writing Pig programs for those queries, and we have some preliminary results. Could somebody from the Pig team take a look and help improve the Pig queries?