Reply: Re: A tool to generate simulation data

2017-07-27 Thread luohui20001

Thank you, Suzen. I gave it a try and generated 1 billion records within 1.5 minutes.
It is fast. I will go on to try some other cases.



 

Thanks & Best regards!
San.Luo

- Original Message -
From: "Suzen, Mehmet" <su...@acm.org>
To: luohui20...@sina.com
Cc: user <user@spark.apache.org>
Subject: Re: A tool to generate simulation data
Date: 2017-07-28 01:18


I suggest the RandomRDDs API. It provides nice tools. Writing
wrappers around it might be a good approach.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Re: A tool to generate simulation data

2017-07-27 Thread Suzen, Mehmet
I suggest the RandomRDDs API. It provides nice tools. Writing
wrappers around it might be a good approach.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$
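
For illustration, a wrapper of the kind suggested above might look roughly like the following PySpark sketch. It is only a sketch under stated assumptions: the record schema (txn_id, amount), the helper names to_transaction, to_json_line and generate_transactions, and the mean/stddev defaults are all hypothetical and not part of Spark. Only RandomRDDs.normalRDD, zipWithIndex and map are actual Spark calls.

```python
import json


def to_transaction(sample_and_index, mean=100.0, stddev=25.0):
    """Map a (standard-normal sample, index) pair to a transaction record.

    The schema below is a made-up example, not something Spark defines.
    """
    z, idx = sample_and_index  # zipWithIndex yields (element, index) pairs
    # Shift and scale the N(0, 1) sample, clamp at zero, keep two decimals.
    amount = round(max(0.0, mean + stddev * float(z)), 2)
    return {"txn_id": idx, "amount": amount}


def to_json_line(record):
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(record, sort_keys=True)


def generate_transactions(sc, num_records, num_partitions=None, seed=None):
    """Build an RDD of JSON transaction lines from a SparkContext `sc`."""
    # Imported here so the pure helpers above work without Spark installed.
    from pyspark.mllib.random import RandomRDDs

    # normalRDD produces i.i.d. samples from the standard normal distribution,
    # generated in parallel across the partitions.
    samples = RandomRDDs.normalRDD(
        sc, num_records, numPartitions=num_partitions, seed=seed
    )
    return samples.zipWithIndex().map(to_transaction).map(to_json_line)
```

With an existing SparkContext sc, something like generate_transactions(sc, 1000000, seed=42).saveAsTextFile(path) would write the lines out in a distributed fashion, where path is whatever output location you choose. Time series or CSV output could presumably follow the same pattern by swapping the two mapping helpers.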




A tool to generate simulation data

2017-07-27 Thread luohui20001
Hello guys,

Is there a tool or an open source project that can quickly mock a large
amount of data and supports the following?

1. Transaction data
2. Time series data
3. Data in a specified format, such as CSV or JSON files
4. Data generated at a changing speed
5. Distributed data generation



 

Thanks & Best regards!
San.Luo