Re: Dataset for hive
Hi Gopal & Xiaohe,

Thanks for sharing.

Thanks,
VK

On Wednesday, April 15, 2015 9:23 AM, xiaohe lan wrote:

> I just had time to generate the data a few minutes ago. It generated 100 GB
> of data for me in tens of minutes on my 5-node cluster.
>
> Thanks all for helping me.
>
> Regards,
> Xiaohe
Re: Dataset for hive
I just had time to generate the data a few minutes ago. It generated 100 GB of data for me in tens of minutes on my 5-node cluster.

Thanks all for helping me.

Regards,
Xiaohe

On Fri, Apr 3, 2015 at 9:00 PM, Fabio C. wrote:

> Thanks Gopal, but since it was a while ago and I didn't have to generate
> too much data, I just ran the tpc-ds generator binaries in parallel and
> uploaded the data manually. Anyway, if you want to have a look at the error:
> http://hortonworks.com/community/forums/topic/hive-testbench-error/
> Maybe it's trivial and it can help someone else.
>
> Regards
>
> Fabio
Re: Dataset for hive
Thanks Gopal, but since it was a while ago and I didn't have to generate too much data, I just ran the tpc-ds generator binaries in parallel and uploaded the data manually. Anyway, if you want to have a look at the error:
http://hortonworks.com/community/forums/topic/hive-testbench-error/
Maybe it's trivial and it can help someone else.

Regards

Fabio

On Thu, Apr 2, 2015 at 7:20 PM, Gopal Vijayaraghavan wrote:

> > https://github.com/hortonworks/hive-testbench
> >
> > The official procedure to generate and upload the data has never worked
> > for me (and it looks like it's not supported software), so it could be
> > a bit tricky to do it manually and on a single host.
>
> I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a
> whole weekend for 1 TB of data to be generated on a single machine.
>
> If you or anyone else has issues with it, I can take a look at it.
>
> Cheers,
> Gopal
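For anyone who wants to try the manual route described above, here is a minimal sketch of how the parallel generator runs could be planned. It only builds the command lines and does not execute anything. The helper name `dsdgen_commands`, the binary path `./dsdgen`, and the output directory are hypothetical, and the `-scale`, `-dir`, `-parallel` and `-child` options are assumed to match your copy of the TPC-DS toolkit, so check its help output first.

```python
import shlex


def dsdgen_commands(scale_gb, workers, out_dir, binary="./dsdgen"):
    """Build one generator command line per worker (hypothetical helper).

    Assumption: the generator accepts -scale, -dir, -parallel and -child,
    where -child selects which slice of the data a given process writes.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    commands = []
    for child in range(1, workers + 1):
        cmd = [binary, "-scale", str(scale_gb), "-dir", out_dir]
        if workers > 1:
            # Each process generates only its own slice of every table.
            cmd += ["-parallel", str(workers), "-child", str(child)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in dsdgen_commands(scale_gb=100, workers=4, out_dir="/tmp/tpcds"):
        print(shlex.join(cmd))
```

Each printed command can then be started on a separate core or host, and the resulting files uploaded to HDFS by hand (for example with `hdfs dfs -put`) before creating the Hive tables.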
Re: Dataset for hive
> https://github.com/hortonworks/hive-testbench
>
> The official procedure to generate and upload the data has never worked
> for me (and it looks like it's not supported software), so it could be
> a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a whole weekend for 1 TB of data to be generated on a single machine.

If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal
Re: Dataset for hive
https://github.com/hortonworks/hive-testbench

The official procedure to generate and upload the data has never worked for me (and it looks like it's not supported software), so it could be a bit tricky to do it manually and on a single host.
The good point is that you already have several queries, and you can set the size of the data you want to generate.

On Thu, Apr 2, 2015 at 8:29 AM, xiaohe lan wrote:

> Hi Vivek Veeramani,
>
> Actually, I already have that. But with the wiki dataset, I can only do
> "select *" queries.
>
> Thanks,
> Xiaohe
Re: Dataset for Hive
Hi Xiaohe,

You can try TPC-DS from https://github.com/hortonworks/hive-testbench. It contains a large number of queries with complex joins.

Chao

On Wed, Apr 1, 2015 at 9:30 PM, xiaohe lan wrote:

> Hi All,
>
> I am new to Hive. I just set up a 5-node Hadoop environment and want to
> try HiveQL.
> Is there any dataset I can download to play with HiveQL? The dataset should
> have several tables so I can write some complex joins. About 100 GB should
> be fine.
>
> Thanks,
> Xiaohe
Re: Dataset for hive
Hi Vivek Veeramani,

Actually, I already have that. But with the wiki dataset, I can only do "select *" queries.

Thanks,
Xiaohe

On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani wrote:

> Hi Xiaohe,
>
> If it's a data set that you're looking for, you can find Wikipedia data
> dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps
> is at http://meta.wikimedia.org/wiki/Data_dumps.
>
> Hope this helps.
>
> --
> Thanks,
> Vivek Veeramani
Re: Dataset for hive
Hi Xiaohe,

If it's a data set that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps is at http://meta.wikimedia.org/wiki/Data_dumps.

Hope this helps.

On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan wrote:

> Hi All,
>
> I am new to Hive. I just set up a 5-node Hadoop environment and want to
> try HiveQL.
> Is there any dataset I can download to play with HiveQL? The dataset should
> have several tables so I can write some complex joins. About 100 GB should
> be fine.
>
> Thanks,
> Xiaohe

--
Thanks,
Vivek Veeramani