Hi Patrik First of all, thank you for your reply.
1. Splitting them seems to be the best option. 2. Yes, I'm passing simple HashSet and yes, I'm modifying this set in different actors. So if I understand correctly, I should make them immutable. Are there any tutorials how to do this correctly? Also, what about JPA entities? I probably also shouldn't pass them between actors, right? 3. I will probably create a parent actor with ask pattern, so in case something goes wrong and I don't get step success/error message in time, I will stop parent actor. Or should I implement this in different way? 4. Akka streams looks really great, hopefully it will be completed soon! Have a nice weekend! On Friday, 28 November 2014 15:50:24 UTC+1, Patrik Nordwall wrote: > > Hi, > > On Fri, Nov 28, 2014 at 11:06 AM, Erol Merdanović <[email protected] > <javascript:>> wrote: > >> Hi >> >> We have an app (Play Framework) which accepts files from different >> devices. Devices are logging some values into a file and send that file >> every 5 minutes. The app accepts the file, parses it and processes it. >> >> Actual steps are >> 1. parse the file >> 2. recalculate values (we get values as 0-100%, so we need to go to >> database to read actually range for each value and recalculate it) >> 3. filter zero values (if each value is < X, then value = 0 where X is >> again obtained from database for each value) >> 4. calculate calculation values (we read formulas from database and >> create new calculated values from the uploaded values) >> 5. we store values to timeseries database (kairosdb) >> 6. every value belongs to different tag (tag is like pressure, >> temperature ..) so when we get a value, we activate the tag (update in >> database) >> 7. store last value for each tag (again, make an update in the database) >> 8. update device status (we update the last uploaded time for each device >> in the database) >> 9. check for alarms (for each alarm we fetch avg value from timeseries >> database and if it's >, < or == to alarm value, activate alarm and >> insert/update alarm log into database) >> >> To do all of this, it takes around 200ms-300ms on our production machine. >> >> Problems: >> 1. Should I actually split this into separate actors? Right now it's in >> one big actor which is calling injected services. >> > > Depends on how much you want to be able to run in parallel. One single > instance of this big actor is obviously not an option. One instance per > device could work. > > There are a lot of database calls involved and if those are blocking calls > (e.g. jdbc) you should make sure that you use a dedicated dispatcher for > those steps to manage the blocking threads carefully and isolate them from > other non-blocking parts of the application. That can be a reason to > separate the steps into separate actors. > > Another reason is that separate actors will enable pipelined execution, > e.g. one chunk of values can be calculated in step 4 while another chunk of > values are stored to the timeseries database in step 5. > > Also, it can be good to split the steps into separate actors to make them > easier to test. > > It doesn't sound like you need to distribute the processing steps to > several machines, but that could otherwise also be a reason to make them > separate actors. > > >> 2. How to make sure each actor is called after another (to have >> sequential processing)? Should each actor pass the message to another actor >> or is there some better way (supervisor/parent? )? >> > > There are several ways. One way is to have a coordinator that delegates > results from one step to the next step. Then the coordinator have all the > references to the workers. Another way is to pass the references in the > messages, and each worker pass on the result to the next step. > > >> 3. What happens when I want to skip certain step (for example, to skip >> checking for alarms)? Should there be skipAlarms = true in the message or I >> should implement this somehow else? >> 4. Should there be any problems when I'm passing collection (in our case >> Set of values) between actors? >> > > Make sure that the messages are immutable. Are you using Java? Java > collections are not immutable, so make sure that you copy them when passing > in messages, or at least that you don't modify the same collection from two > different actors. > > >> Right now I'm inserting around 20k files and I'm noticing big memory >> consumption so I might even have a memory leak somewhere. >> > > Here is the challenge with all this. You must make sure that you don't > read in more records than you can consume, otherwise you risk out of > memory. For example, reading and parsing the file might be fast but the > next step involving the database might be slower, then you have a lot of > records in memory, in the mailbox of the Step2 actor. To solve this you > need to implement flow control. > > The simplest flow control is that each step requests previous step for one > more record when it is done with a record. It does not send the result to > the next step until that has requested for a record. Everything message > based, of course. This propagates all the way up to the file reading. > > If you find that the one-by-one request scheme is limiting performance you > can request records in batches, such as "I'm ready to processes 100 > records". > > Incidentally, we are currently working on Akka Streams that supports this > kind of backpressured stream processing. It's in preview experimental state > right now, i.e. not ready for production. > > Read more about it, and try it here: > http://typesafe.com/activator/template/akka-stream-java8 > http://typesafe.com/activator/template/akka-stream-scala > > Regards, > Patrik > > >> >> One of the ideas that I had was to create one parent actor for each >> device. The parent actor passes the message to child actor (9 steps) and >> waits for an answer. When it gets it, pass the message to another step. >> There I could then have some kind of logic to skip certain steps. >> >> For any suggestions and comments I would be really grateful. >> >> -- >> >>>>>>>>>> Read the docs: http://akka.io/docs/ >> >>>>>>>>>> Check the FAQ: >> http://doc.akka.io/docs/akka/current/additional/faq.html >> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user >> --- >> You received this message because you are subscribed to the Google Groups >> "Akka User List" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> To post to this group, send email to [email protected] >> <javascript:>. >> Visit this group at http://groups.google.com/group/akka-user. >> For more options, visit https://groups.google.com/d/optout. >> > > > > -- > > Patrik Nordwall > Typesafe <http://typesafe.com/> - Reactive apps on the JVM > Twitter: @patriknw > > -- >>>>>>>>>> Read the docs: http://akka.io/docs/ >>>>>>>>>> Check the FAQ: >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/akka-user. For more options, visit https://groups.google.com/d/optout.
