Re: Dataset for hive

2015-04-15 Thread venkatanathen kannan
Hi Gopal and Xiaohe,
Thanks for sharing.
Thanks,
VK


 On Wednesday, April 15, 2015 9:23 AM, xiaohe lan zombiexco...@gmail.com wrote:

 I just had time to generate the data a few minutes ago. It generated 100 GB
 of data for me in tens of minutes on my 5-node cluster.
 Thanks all for helping me.
 Regards,
 Xiaohe

Re: Dataset for hive

2015-04-03 Thread Fabio C.
Thanks Gopal, but since it was a while ago and I didn't have to generate
too much data, I just ran the TPC-DS generator binaries in parallel and
uploaded the data manually. Anyway, if you want to have a look at the error:
http://hortonworks.com/community/forums/topic/hive-testbench-error/
Maybe it's trivial, and it can help someone else.

Regards

Fabio
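
For readers who want to try the manual route described above, here is a minimal sketch of how the per-chunk generator commands might be built. It is not taken from the thread: the binary path `./dsdgen`, the output directory, and the flag names (`-scale`, `-dir`, `-parallel`, `-child`) are assumptions based on the usual TPC-DS toolkit usage and should be checked against the help output of your own build. The sketch only builds and prints the command lines; it does not run them.

```python
import shlex


def dsdgen_commands(scale_gb, workers, out_dir, binary="./dsdgen"):
    """Build one generator command line per parallel chunk.

    All names here are illustrative assumptions; child numbers are 1-based.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    commands = []
    for child in range(1, workers + 1):
        commands.append([
            binary,
            "-scale", str(scale_gb),     # total data size in GB (assumed flag)
            "-dir", out_dir,             # where the flat files are written
            "-parallel", str(workers),   # total number of chunks
            "-child", str(child),        # which chunk this process generates
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in dsdgen_commands(100, 4, "/tmp/tpcds"):
        print(shlex.join(cmd))
```

Each printed command could then be started as its own process, and the resulting files copied to HDFS afterwards, for example with `hadoop fs -put`.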

Re: Dataset for hive

2015-04-02 Thread xiaohe lan
Hi Vivek Veeramani,

Actually, I already have that. But with the wiki dataset, I can only do
select * queries.

Thanks,
Xiaohe


Re: Dataset for Hive

2015-04-02 Thread Chao Sun
Hi Xiaohe,

You can try TPC-DS from https://github.com/hortonworks/hive-testbench.
It contains a large number of queries with complex joins.

Chao


Re: Dataset for hive

2015-04-02 Thread Fabio C.
https://github.com/hortonworks/hive-testbench
The official procedure to generate and upload the data has never worked for
me (and it looks like it's not supported software), so it could be a bit
tricky, as you may have to do it manually and on a single host. The good part
is that it already comes with several queries, and you can set the size of the
data you want to generate.


Re: Dataset for hive

2015-04-02 Thread Gopal Vijayaraghavan


 https://github.com/hortonworks/hive-testbench

 The official procedure to generate and upload the data has never worked
 for me (and it looks like it's not a supported software), so it could be
 a bit tricky to do it manually and on a single host.

I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a
whole weekend for 1 TB of data to be generated on a single machine.

If you or anyone else has issues with it, I can take a look at it.

Cheers,
Gopal




Dataset for hive

2015-04-01 Thread xiaohe lan
Hi All,

I am new to Hive. I just set up a 5-node Hadoop environment and want to try
HiveQL.
Is there a dataset I can download to play with HiveQL? It should have
several tables so that I can write some complex joins. About 100 GB should be
fine.

Thanks,
Xiaohe


Re: Dataset for hive

2015-04-01 Thread vivek veeramani
Hi Xiaohe,

If it's a dataset that you're looking for, you can find Wikipedia data dumps
at http://dumps.wikimedia.org/enwiki/. Documentation on the dumps is at
http://meta.wikimedia.org/wiki/Data_dumps.

Hope this helps.



-- 
Thanks ,
Vivek Veeramani


cell : +91-9632 975 975
+91-9895 277 101