Re: Dataset for hive
Hi Gopal, Xiaohe,

Thanks for sharing.

Thanks,
VK

On Wednesday, April 15, 2015 9:23 AM, xiaohe lan zombiexco...@gmail.com wrote:

I only had time to generate the data a few minutes ago. It can generate 100 GB of data for me in tens of minutes on my 5-node cluster. Thanks all for helping me.

Regards,
Xiaohe

On Fri, Apr 3, 2015 at 9:00 PM, Fabio C. anyte...@gmail.com wrote:

Thanks Gopal, but since it was a while ago and I didn't have to generate much data, I just ran the TPC-DS generator binaries in parallel and uploaded the output manually. Anyway, if you want to have a look at the error: http://hortonworks.com/community/forums/topic/hive-testbench-error/ Maybe it's trivial, and it may help someone else.

Regards,
Fabio

On Thu, Apr 2, 2015 at 7:20 PM, Gopal Vijayaraghavan gop...@apache.org wrote:

https://github.com/hortonworks/hive-testbench The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a whole weekend for 1 TB of data to be generated on a single machine. If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal
Re: Dataset for hive
Thanks Gopal, but since it was a while ago and I didn't have to generate much data, I just ran the TPC-DS generator binaries in parallel and uploaded the output manually. Anyway, if you want to have a look at the error: http://hortonworks.com/community/forums/topic/hive-testbench-error/ Maybe it's trivial, and it may help someone else.

Regards,
Fabio

On Thu, Apr 2, 2015 at 7:20 PM, Gopal Vijayaraghavan gop...@apache.org wrote:

https://github.com/hortonworks/hive-testbench The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a whole weekend for 1 TB of data to be generated on a single machine. If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal
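For anyone who wants to try the manual route Fabio describes, here is a minimal sketch of how the per-chunk generator commands could be built. It assumes the TPC-DS `dsdgen` binary and its `-scale`, `-dir`, `-parallel` and `-child` options as I recall them from the toolkit; check `dsdgen -help` on your build before relying on them. The function name, the binary path and the output directory are hypothetical and not taken from the thread.

```python
import shlex


def dsdgen_commands(scale_gb, workers, out_dir, binary="./dsdgen"):
    """Build one dsdgen command line per parallel chunk.

    Assumption: dsdgen accepts -scale, -dir, -parallel and -child, and
    numbers its children starting from 1.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    commands = []
    for child in range(1, workers + 1):
        cmd = [binary, "-scale", str(scale_gb), "-dir", out_dir]
        if workers > 1:
            # Each child writes only its own slice of every table.
            cmd += ["-parallel", str(workers), "-child", str(child)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be
    # reviewed or spread across hosts by hand.
    for cmd in dsdgen_commands(100, 4, "/tmp/tpcds"):
        print(shlex.join(cmd))
```

Each printed line can then be started on a different core or host, and the resulting files uploaded with `hadoop fs -put`. The sketch only builds the command lines; it does not launch or monitor the generator.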
Re: Dataset for hive
Hi Vivek Veeramani,

Actually, I already have that. But with the wiki dataset, I can only run SELECT * queries.

Thanks,
Xiaohe

On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani vivek.veeraman...@gmail.com wrote:

Hi Xiaohe,

If it's a dataset that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps is at http://meta.wikimedia.org/wiki/Data_dumps. Hope this helps.

On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan zombiexco...@gmail.com wrote:

Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine.

Thanks,
Xiaohe

--
Thanks,
Vivek Veeramani
Re: Dataset for Hive
Hi Xiaohe,

You can try TPC-DS from https://github.com/hortonworks/hive-testbench. It contains a large number of queries with complex joins.

Chao

On Wed, Apr 1, 2015 at 9:30 PM, xiaohe lan zombiexco...@gmail.com wrote:

Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine.

Thanks,
Xiaohe
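To give an idea of the kind of multi-table join the TPC-DS tables allow, here is a small illustrative sketch. The table and column names (`store_sales`, `date_dim`, `item` and their keys) follow the standard TPC-DS schema; the query itself is made up for illustration and is not one of the testbench queries. The helper name is hypothetical, and it only builds the argument list for `hive -e` without running anything.

```python
# Illustrative three-table join over the TPC-DS schema: revenue per brand
# and year for sales made in November.
BRAND_REVENUE_BY_YEAR = """
SELECT d.d_year, i.i_brand, SUM(ss.ss_ext_sales_price) AS revenue
FROM store_sales ss
JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
JOIN item i ON ss.ss_item_sk = i.i_item_sk
WHERE d.d_moy = 11
GROUP BY d.d_year, i.i_brand
ORDER BY d.d_year, revenue DESC
LIMIT 20
"""


def hive_cli_args(query):
    """Return the argument list for running a query with `hive -e`."""
    # Collapse whitespace so the query is passed as a single line.
    return ["hive", "-e", " ".join(query.split())]


if __name__ == "__main__":
    print(hive_cli_args(BRAND_REVENUE_BY_YEAR))
```

The argument list can be passed to `subprocess.run` on a node where the Hive CLI is installed and the TPC-DS database is selected.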
Re: Dataset for hive
https://github.com/hortonworks/hive-testbench

The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host. The good point is that you already have several queries, and you can set the size of the data you want to generate.

On Thu, Apr 2, 2015 at 8:29 AM, xiaohe lan zombiexco...@gmail.com wrote:

Hi Vivek Veeramani,

Actually, I already have that. But with the wiki dataset, I can only run SELECT * queries.

Thanks,
Xiaohe

On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani vivek.veeraman...@gmail.com wrote:

Hi Xiaohe,

If it's a dataset that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps is at http://meta.wikimedia.org/wiki/Data_dumps. Hope this helps.

On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan zombiexco...@gmail.com wrote:

Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine.

Thanks,
Xiaohe

--
Thanks,
Vivek Veeramani
Re: Dataset for hive
https://github.com/hortonworks/hive-testbench The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a whole weekend for 1 TB of data to be generated on a single machine. If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal
Dataset for hive
Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine.

Thanks,
Xiaohe
Re: Dataset for hive
Hi Xiaohe,

If it's a dataset that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps is at http://meta.wikimedia.org/wiki/Data_dumps. Hope this helps.

On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan zombiexco...@gmail.com wrote:

Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine.

Thanks,
Xiaohe

--
Thanks,
Vivek Veeramani