Hi

Thanks for the answer.


My simulator includes many parallel state machines, and each of them 
generates a log file (with timestamps). Finally, all events (rows) from all the 
log files should be combined in time order into one very large log file. In 
practice, the combined log file can also be split into smaller ones.


What Spark transformation or action functions could I use for that purpose?

Or are there some code samples (Python or Scala) for that?
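[Editor's note: a minimal sketch of the time-ordered merge described above, using only the Python standard library. The assumption is that each state machine's log is already sorted by a leading ISO-8601 timestamp, so lexicographic line order equals time order; in Spark itself the analogous approach would be to read all files and sort by the timestamp (e.g. `sortBy` on an RDD or `orderBy` on a DataFrame). The file contents and event names here are made up for illustration.]

```python
import heapq
from io import StringIO

# Two per-state-machine logs, each already sorted by its leading
# ISO-8601 timestamp (StringIO stands in for real log files).
log_a = StringIO("2017-06-20T10:00:01 fsm-A start\n"
                 "2017-06-20T10:00:05 fsm-A stop\n")
log_b = StringIO("2017-06-20T10:00:03 fsm-B start\n"
                 "2017-06-20T10:00:04 fsm-B stop\n")

# heapq.merge performs a lazy k-way merge of already-sorted inputs,
# so the combined log never has to fit in memory at once.  ISO-8601
# timestamps sort correctly as plain strings.
merged = list(heapq.merge(log_a, log_b))
print(merged[0])  # 2017-06-20T10:00:01 fsm-A start
```

Because the merge is lazy, the combined stream could be written straight back out in fixed-size chunks, which also covers the "split into smaller files" part of the question.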

Regards

Esa Heikkinen

________________________________
From: Jörn Franke <jornfra...@gmail.com>
Sent: 20 June 2017 17:12
To: Esa Heikkinen
Cc: user@spark.apache.org
Subject: Re: Using Spark as a simulator

It is fine, but you have to design it so that generated rows are written in 
large blocks for optimal performance.
The trickiest part of data generation is the conceptual part, such as the 
probabilistic distributions etc.
You also have to check that you use a good random generator; for some cases 
the Java built-in one may not be good enough.
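[Editor's note: one common way to handle random generation in a parallel job like this is to seed an independent generator per partition, so each worker produces a reproducible, non-overlapping stream. A minimal sketch in plain Python, where the loop over partition ids stands in for Spark's `mapPartitionsWithIndex`; the seeding scheme, row counts, and exponential distribution are illustrative assumptions, not a recommendation from the thread.]

```python
import random

def generate_partition(partition_id, n_rows, base_seed=42):
    # Derive a distinct integer seed per partition so parallel workers
    # produce independent, reproducible streams.  In Spark, this body
    # would run inside mapPartitionsWithIndex.
    rng = random.Random(base_seed * 1_000_003 + partition_id)
    # Example payload: exponentially distributed inter-event times.
    return [(partition_id, rng.expovariate(1.0)) for _ in range(n_rows)]

# Simulate 3 parallel partitions locally.
rows = [r for pid in range(3) for r in generate_partition(pid, 4)]
print(len(rows))  # 12
```

Seeding per partition (rather than sharing one generator) avoids both duplicated streams across workers and non-reproducible runs; for stronger statistical quality one could swap in a generator library with explicit stream-splitting support.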

On 20 Jun 2017, at 16:04, Esa Heikkinen 
<esa.heikki...@student.tut.fi> wrote:


Hi


Spark is a data analyzer, but would it be possible to use Spark as a data 
generator or simulator?


My simulation can be very large, and I think a parallelized simulation using 
Spark (in the cloud) could work.

Is that a good or bad idea?


Regards

Esa Heikkinen
