Re: [akka-user] TimeoutException when using tell on an ActorSelection if the target actor is created through AskSupport
On Fri, Jan 9, 2015 at 10:06 PM, Ferdinand Hübner ferdinand.hueb...@gmail.com wrote:

On Friday, January 9, 2015 at 7:34:08 PM UTC+1, Patrik Nordwall wrote:

9 jan 2015 kl. 18:10 skrev Ferdinand Hübner ferdinan...@gmail.com:

Yes, that is what I'm doing. I'm passing the sender() ActorRef to other actors until I am able to reply to it with deliver. When I reply with deliver, I call path on the ActorRef. In my service layer, I would send a confirmation using tell with ActorRef.noSender once the future from AskSupport completes.

That is not going to work. Let's say that the confirmation message is lost. Then AtLeastOnceDelivery will resend it to the path of the PromiseActorRef (created by ask), but that is already completed, so the resent message will go to deadLetters and be retried again.

I am aware of that and decided to ignore it at this point, until I am able to decide whether AtLeastOnceDelivery is really something that I want and need. My idea was to handle UnconfirmedWarning by simply confirming the messages it contains.

Would it be possible to implement it without ask?

Yes, that should be possible.

I never really thought about implementing it without ask. It's the first thing that came to my mind and it has worked well so far. I'll be going for a less temporary actor that completes promises and confirms deliveryIds that are not pending completion. Thank you for the help and your suggestions.

You're welcome. /Patrik

Ferdinand

-- Read the docs: http://akka.io/docs/ Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to the Google Groups Akka User List group. To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscr...@googlegroups.com. To post to this group, send email to akka-user@googlegroups.com. Visit this group at http://groups.google.com/group/akka-user. For more options, visit https://groups.google.com/d/optout.

-- Patrik Nordwall Typesafe http://typesafe.com/ - Reactive apps on the JVM Twitter: @patriknw
[akka-user] Docs/samples on large file (10GB) uploads using akka-stream
Hello, I am writing a web-service to upload files as large as 10GB. Just looking for some pointers on how to go about it using akka-stream (haven't ever used it before). Thanks, -Yogesh
Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?
Hi Allen, I'd suspect the reason that it works well with Akka Streams is that they have back-pressure while your actor solution does not (you'll send 40 million messages as fast as you can, but the actors processing them might not be able to keep up).

On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aiming...@gmail.com wrote:

Hey Viktor, I'm trying to use Akka to parallelize this process. There shouldn't be any bottleneck, and I don't understand why I got a memory overflow with my first version (actor version). The main task is to read in a line, break it up, turn each segment (string) into an integer, then print it out to a CSV file (vectorization process).

def processLine(line: String): Unit = {
  val vector: ListBuffer[String] = ListBuffer()
  val segs = line.split(",")
  println(segs(0))
  (1 to segs.length - 1).map { i =>
    val factorArray = dictionaries(i - 1)
    vector += factorArray._2.indexOf(segs(i)).toString // get the factor level of the string
  }
  timer ! OneDone
  printer ! Print(vector.toList)
}

When I'm doing this in pure Akka (with actors), since I created 40 million objects: Row(line: String), I get a memory overflow issue. If I use Akka Streams, there is no memory overflow issue, but the performance is too similar to the non-parallelized version (even slower). It's my first time using akka-stream, so I'm unfamiliar with the optimization you were talking about. Sincerely, Allen

On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote: Hi Allen, What's the bottleneck? Have you tried enabling the experimental optimizations?

On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote: Thank you Soumya, I think Akka Streams is the way to go. However, I would also appreciate some performance boost as well - still have 40 million lines to go through! But thanks anyway!

On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote: I would recommend using the Akka Streams API for this. Here is a sample. I was able to process a 1G file with around 1.5 million records in *20MB* of memory. The file read and console write rates are different, but the streams API handles that. This is not the fastest, but you at least won't run out of memory. https://lh6.googleusercontent.com/-zdX0n1pvueE/VLATDja3K4I/v18/BH7V1RAuxT8/s1600/1gb_file_processing.png

import java.io.FileInputStream
import java.util.Scanner
import akka.actor.ActorSystem
import akka.stream.{FlowMaterializer, MaterializerSettings}
import akka.stream.scaladsl.Source
import scala.util.Try

object StreamingFileReader extends App {
  val inputStream = new FileInputStream("/path/to/file")
  val sc = new Scanner(inputStream, "UTF-8")
  implicit val system = ActorSystem("Sys")
  val settings = MaterializerSettings(system)
  implicit val materializer =
    FlowMaterializer(settings.copy(maxInputBufferSize = 256, initialInputBufferSize = 256))
  val fileSource = Source(() => Iterator.continually(sc.nextLine()))
  import system.dispatcher
  fileSource.map { line =>
    line // do nothing; the foreach below prints the line
  }.foreach(println).onComplete { _ =>
    Try {
      sc.close()
      inputStream.close()
    }
    system.shutdown()
  }
}

On Friday, January 9, 2015 at 10:53:33 AM UTC-5, Allen Nie wrote: Hi, I am trying to process a CSV file with 40 million lines of data in it. It's a 5GB file. I'm trying to use Akka to parallelize the task. However, it seems like I can't stop the quick memory growth. It expanded from 1GB to almost 15GB (the limit I set) in under 5 minutes. This is the code in my main() method:

val inputStream = new FileInputStream("E:\\Allen\\DataScience\\train\\train.csv")
val sc = new Scanner(inputStream, "UTF-8")
var counter = 0
while (sc.hasNextLine) {
  rowActors(counter % 20) ! Row(sc.nextLine())
  counter += 1
}
sc.close()
inputStream.close()

Someone pointed out that I was essentially creating 40 million Row objects, which naturally will take up a lot of space. My row actor is not doing much. Just simply transforming each line into an array of integers (if you are familiar with the concept of vectorizing, that's what I'm doing). Then the transformed array gets printed out. Done. I originally thought there was a memory leak, but maybe I'm not managing memory right. Can I get any wise suggestions from the Akka experts here? http://i.stack.imgur.com/yQ4xx.png
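The per-line transform Allen describes is independent of actors and can be written as a pure function. A minimal sketch of the vectorization step; the `dictionaries` data below is a made-up stand-in, since the real structure is not shown in full in the thread:

```scala
// Stand-in dictionaries: one vector of known factor levels per column.
val dictionaries: Vector[Vector[String]] =
  Vector(Vector("red", "green", "blue"), Vector("low", "high"))

// Map each CSV field after the first to its index in that column's dictionary.
def vectorize(line: String): Vector[String] = {
  val segs = line.split(",")
  (1 until segs.length).toVector.map { i =>
    dictionaries(i - 1).indexOf(segs(i)).toString // "-1" if the level is unseen
  }
}
```

A pure function like this is easy to test in isolation, and makes the later question of where to parallelize (per line, per batch, per stage) independent of the transform itself.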
Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?
Hi,

On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aiming...@gmail.com wrote:

Hey Viktor, I'm trying to use Akka to parallelize this process. There shouldn't be any bottleneck, and I don't understand why I got a memory overflow with my first version (actor version). The main task is to read in a line, break it up, and turn each segment (string) into an integer, then print it out to a CSV file (vectorization process). When I'm doing this in pure Akka (with actors), since I created 40 million objects: Row(line: String), I get a memory overflow issue.

No surprise there: you just slurp up all the rows faster than the actors can keep up processing them, so most of them sit in a mailbox. In fact, if your actors do something trivially simple, the whole overhead of asynchronously passing elements to the actors might be larger than what you gain. In these cases it is recommended to pass batches of Rows instead of one-by-one. Remember, parallelisation only gains when its overhead is smaller than the task it parallelizes.

If I use Akka Streams, there is no memory overflow issue, but the performance is too similar to the non-parallelized version (even slower).

No surprise there either: you did nothing to parallelize or pipeline any computation in the stream, so you get the overhead of asynchronous processing and none of the benefits of it (but at least you get backpressure).

You have a few approaches to get the benefits of multi-core processing with streams:
- if you have multiple processing steps for a row, you can pipeline them; see the intro part of this doc page: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-rate.html
- you can use mapAsync for a similar effect with a single computation step; see here: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html#Illustrating_ordering_and_parallelism
- you can explicitly add fan-out elements to parallelise among multiple explicit workers; see here: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-cookbook.html#Balancing_jobs_to_a_fixed_pool_of_workers

Overall, for this kind of task I recommend using Streams, but you need to read the documentation first to understand how it works.

-Endre
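Endre's batching suggestion can be sketched with the standard library alone: group the line iterator into fixed-size batches and send each batch as one message, amortizing the per-message overhead. The actor send itself is omitted; here the batch sizes are just collected to show the grouping:

```scala
// Group a line iterator into fixed-size batches. In the actor version,
// you would send one Batch(lines) message per group instead of one
// message per row (Batch being a hypothetical message type).
def toBatches(lines: Iterator[String], batchSize: Int): Iterator[Seq[String]] =
  lines.grouped(batchSize)

val batchSizes =
  toBatches(Iterator.tabulate(10)(i => s"row-$i"), batchSize = 4)
    .map(_.size)
    .toVector
// 10 rows in batches of 4 gives sizes 4, 4, 2
```

The right batch size is workload-dependent; the point is only that the message count drops by a factor of batchSize while the per-message payload grows by the same factor.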
[akka-user] Re: Database Actor Read Model
Hi Andy - we are using an implementation of the Message Sequence http://www.eaipatterns.com/MessageSequence.html EIP, which chunks the messages. The code is here: https://github.com/sourcedelica/akka/blob/wip-3842-message-sequence-pattern-ericp/akka-contrib/src/main/scala/akka/contrib/pattern/MessageSequence.scala The test is here: https://github.com/sourcedelica/akka/blob/wip-3842-message-sequence-pattern-ericp/akka-contrib/src/test/scala/akka/contrib/pattern/MessageSequenceSpec.scala The other alternative is to store the large data in a shared place, like a distributed cache or NFS, and then send a reference to that location in the message.

On Friday, January 9, 2015 at 6:38:32 AM UTC-5, Andy Zelinski wrote:

Lots of database actor questions, but I couldn't find one that directly addressed data size limits. I want to send requests for data from a frontend to a Cluster Singleton, which will forward each request to a database actor. The database actor runs a query, then constructs a domain object from the Row result. This works great for case class Tweet(id: TweetId, user: User, text: Text, createdAt: Date). But what if we need to gather and pass around data from much wider tables than that? Say we need to gather a bunch of user data: profile data, preference data, history, purchases, a moderately large collection, etc., for use both by the Akka cluster that performs our logic and also simply to pass data to the frontend for templating. Put simply, potentially much more than the 120Kb maximum-frame-size. Thus the following conundrum:

1. I want to limit the number of queries to the database per data request, since that is the biggest bottleneck and area for headache (although I'm using an async driver, so it's not as bad as blocking JDBC).
2. Most data will be requested together, so I want to be able to get all row data in one query.
3. Thus the database actor will have too much data to send as a message back to its parent and ultimately to the frontend actor.

Of course I could split the query results into multiple messages, but that necessitates changing the whole actor system. Can chunking or some kind of dedicated channel save me from having to completely re-think my backend? Thanks!
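The Message Sequence pattern referenced above boils down to splitting a payload into numbered chunks that each fit under maximum-frame-size and reassembling them on the receiving side. A self-contained sketch of that idea; the `Chunk` message type and sizes are illustrative, not the akka-contrib API:

```scala
// Hypothetical chunk message: seqId ties chunks of one payload together;
// index/total allow ordered reassembly and completeness checks.
final case class Chunk(seqId: Long, index: Int, total: Int, data: Vector[Byte])

// Split a payload into chunks no larger than chunkSize.
def split(seqId: Long, payload: Vector[Byte], chunkSize: Int): Vector[Chunk] = {
  val pieces = payload.grouped(chunkSize).toVector
  pieces.zipWithIndex.map { case (p, i) => Chunk(seqId, i, pieces.length, p) }
}

// Reassemble chunks (which may arrive out of order) into the payload.
def reassemble(chunks: Seq[Chunk]): Vector[Byte] =
  chunks.sortBy(_.index).flatMap(_.data).toVector
```

A real implementation also needs buffering per seqId on the receiver and a timeout for incomplete sequences, which is exactly the bookkeeping the linked MessageSequence code handles.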
[akka-user] Sharing message queue between two akka actors?
Hi all Akka experts, I have the following questions for you:

1. Is it possible to share a message queue between two Akka actors?
2. Is there any effect of increasing the number of dispatchers on the message processing rate of Akka actors?
3. What are the factors that affect the rate of message processing using Akka actors?

Thanks & Regards,
Shrikrishna Kadam
Re: [akka-user] Sharing message queue between two akka actors?
Hi Krishna!

On Sat, Jan 10, 2015 at 5:16 PM, Krishna Kadam shrikrishna.kad...@gmail.com wrote:

1. Is it possible to share a message queue between two Akka actors?

Yes and no, there's a BalancingRouter, but it's only for that one.

2. Is there any effect of increasing the number of dispatchers on the message processing rate of Akka actors?

Having multiple dispatchers is about bulkheading different actor subtrees, not about processing rate (throughput).

3. What are the factors that affect the rate of message processing using Akka actors?

Short answer: Little's Law. Long answer: depending on where the bottleneck is, you may want to tune dispatcher settings (throughput instead of fairness, the backing executor service, the number of threads, and the mailbox implementation).

-- Cheers, √
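Viktor's short answer, Little's Law, relates the average number of messages resident in a system (L) to the arrival rate (λ) and the average time each message spends in the system (W) as L = λW. A toy calculation; the numbers are illustrative, not measurements:

```scala
// Little's Law: L = lambda * W.
def littleL(arrivalRatePerSec: Double, avgTimeInSystemSec: Double): Double =
  arrivalRatePerSec * avgTimeInSystemSec

// e.g. 1000 msg/s arriving, each spending 50 ms queued plus processed:
// on average 50 messages are resident (mostly sitting in the mailbox).
val inFlight = littleL(1000.0, 0.05)
```

This is why mailbox depth explodes when processing time W grows but the arrival rate stays fixed, as in the 40-million-lines thread above.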
Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?
Hi Endre, That's a very valid suggestion. I'm quite new to Akka (finished about 35% of its docs). I'm still trying to understand how to properly parallelize tasks. You and Viktor mentioned back-pressure. Can you go a bit deeper into that? For example, what is back-pressure, and how do I build it into my actor solutions? (Info links would be all I need.) I asked a similar question on StackOverflow, but no one could point me in the right direction. Thank you for linking Akka Streams' docs.

Allen
Re: [akka-user] Not losing messages due to remote actor unavailable.
Thanks Endre

On Fri, Jan 9, 2015 at 5:36 PM, Endre Varga endre.va...@typesafe.com wrote:

Hi Matan,

On Fri, Jan 9, 2015 at 4:30 PM, Matan Safriel dev.ma...@gmail.com wrote: Thanks, good to know! Does it include a leader election process for when the leader isn't available?

From the docs: "After gossip convergence a leader for the cluster can be determined. There is no leader election process, the leader can always be recognised deterministically by any node whenever there is gossip convergence. The leader is just a role, any node can be the leader and it can change between convergence rounds. The leader is simply the first node in sorted order that is able to take the leadership role, where the preferred member states for a leader are up and leaving (see the Membership Lifecycle section below for more information about member states)."

There is no explicit leader election because the leader is always implicitly determinable from the cluster CRDT state.

I was not sure from the documentation whether the leader is a single point of failure, or its relationship to seed nodes.

The leader is not a single point of failure; it changes many times during the life of a cluster. Seed nodes are not special either; they are just a set of known IP addresses to bootstrap from (i.e. a set of nodes that can be used to enter the cluster, only one of which needs to be up at any time) when a node wants to join. This idea is not unique to Akka; for example, quoting from the Cassandra docs: "Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other, this is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other, remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another."

-Endre

Maybe this should be a different thread.

On Fri, Jan 9, 2015 at 5:15 PM, Endre Varga endre.va...@typesafe.com wrote:

Hi,

On Fri, Jan 9, 2015 at 4:12 PM, Matan Safriel dev.ma...@gmail.com wrote: Thanks Patrik, good to know, because the cluster module may seem like a very specific and rather unfinished reference implementation;

What do you mean by "unfinished reference implementation"? It is a fully supported module of Akka. -Endre

Good to have this in the core without relying on the cluster module.

On Friday, January 9, 2015 at 11:25:30 AM UTC+2, Patrik Nordwall wrote:

On Thu, Jan 8, 2015 at 9:56 PM, Matan Safriel dev@gmail.com wrote: Sorry for awakening this old thread... Is it really the case that there is all this fancy supervision architecture, and then a remote actor that has gone non-responsive needs to be handled outside of the supervision hierarchy altogether? Was it considered enabling supervision to work such that the supervisor would know in case its remote child has gone unresponsive/gated/quarantined?

The information in this thread is outdated. We have added support for proper remote death watch also when using remoting only (without cluster).

Perhaps I misinterpret the reference to the cluster module in the response by Dr. Roland Kuhn below. Thanks for clarifying, Matan

On Thursday, January 31, 2013 at 10:13:26 PM UTC+2, rkuhn wrote:

Hi Juan Pablo,

30 jan 2013 kl. 22:08 skrev Juan Pablo Vergara Villarraga: If a remote actor is not available due to power loss, can the supervision strategy handle the situation?

No, loss of actors is managed by Death Watch (http://doc.akka.io/docs/akka/2.1.0/general/supervision.html#What_Lifecycle_Monitoring_Means), but support for detecting unreachable remote nodes is only present in the cluster module.

I have coded the example and I have shut down the remote actor system, but it seems that the supervision strategy only takes into account Exceptions thrown by the remote actor once reached.

Yes, that is correct.

I have already implemented the subscription to the events that indicate that an error in the connection has occurred. I still need to have access to the message the sender sent originally so the message does not get lost.

There is nothing you can subscribe to which tells you whether a given message was processed on the remote system. If you cannot lose messages, then you need to persist them and use explicit acknowledgements from the receiving actor to retire them from the local storage. You will also need to implement resending and deduplication if you need exactly-once delivery; you might want to read the documentation on message delivery
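Roland's last paragraph — persist outgoing messages, resend until an explicit acknowledgement retires them, and deduplicate on the receiving side — can be sketched as plain-Scala bookkeeping. This is not Akka's AtLeastOnceDelivery API, just the idea reduced to two small classes with made-up names:

```scala
// Sender side: pending messages keyed by delivery id; everything still
// in `unconfirmed` is a candidate for resending.
final class Outbox {
  private var pending = Map.empty[Long, String]
  private var nextId = 0L

  def send(msg: String): Long = { nextId += 1; pending += nextId -> msg; nextId }
  def ack(id: Long): Unit = pending -= id
  def unconfirmed: Map[Long, String] = pending
}

// Receiver side: remember seen delivery ids so a resend is processed once.
final class Inbox {
  private var seen = Set.empty[Long]
  // Returns true only the first time a delivery id arrives.
  def receive(id: Long): Boolean = if (seen(id)) false else { seen += id; true }
}
```

In a real system both maps would be persisted (that is the "local storage" Roland mentions), and the seen-id set would be bounded, e.g. by only tracking ids above the highest contiguously acknowledged one.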
Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?
Allen, Here is one definition of back pressure http://www.reactivemanifesto.org/glossary#Back-Pressure For example, in your initial user question about memory errors the back pressure mechanism of akka-streams allows processing your data even with a limited memory budget. I have personally found this presentation ( https://www.youtube.com/watch?v=khmVMvlP_QA ) by Roland Kuhn very helpful in understanding the motivations and core concepts behind akka-streams which is an implementation of reactive streams ( http://www.reactive-streams.org/). -Soumya On Sat, Jan 10, 2015 at 7:42 PM, Allen Nie aiming...@gmail.com wrote: Hi Endre, That's a very valid suggestion. I'm quite new to Akka (finished about 35% of its docs). I'm still trying to understand how to properly parallelize tasks. You and Viktor mentioned back-pressure. Can you go a bit deeper in that. For example, what is back-pressure and how to build it into my actor solutions ? (Info links would be all I need). I asked a similar question like this on StackOverflow but no one could point me to the right direction. Thank you for linking Akka-stream's docs. Allen On Saturday, January 10, 2015 at 5:38:42 AM UTC-5, drewhk wrote: Hi, On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aimi...@gmail.com wrote: Hey Viktor, I'm trying to use Akka to parallelize this process. There shouldn't be any bottleneck, and I don't understand why I got memory overflow with my first version (actor version). The main task is to read in a line, break it up, and turn each segments (strings) into an integer, then prints it out to a CSV file (vectorization process). def processLine(line: String): Unit = { val vector: ListBuffer[String] = ListBuffer() val segs = line.split(,) println(segs(0)) (1 to segs.length - 1).map {i = val factorArray = dictionaries(i-1) vector += factorArray._2.indexOf(segs(i)).toString //get the factor level of string } timer ! OneDone printer ! 
Print(vector.toList)} When I'm doing this in pure Akka (with actors), since I created 40 million objects: Row(line: String), I get memory overflow issue. No surprise there, you just slurp up all rows faster than the actors can keep up processing them, so most of them are in a mailbox. In fact if your actors do something trivially simple, the whole overhead of asynchronously passing elements to the actors might be larger than what you gain. In these cases it is recommended to pass batches of Rows instead of one-by-one. Remember, parallelisation only gains when the overhead of it is smaller than the task it parallelizes. If I use Akka-stream, there is no memory overflow issue, but the performance is too similar to the non-parallelized version (even slower). No surprise there either, you did nothing to parallelize or pipeline any computation in the stream, so you get the overhead of asynchronous processing and none of the benefits of it (but at least you get backpressure). You have a few approaches to get the benefints of multi-core processing with streams: - if you have multiple processing steps for a row you can pipeline them, see the intro part of this doc page: http://doc.akka.io/docs/ akka-stream-and-http-experimental/1.0-M2/scala/stream-rate.html - you can use mapAsync to have similar effects but with one computation step, see here: http://doc.akka.io/docs/akka-stream-and-http- experimental/1.0-M2/scala/stream-integrations.html# Illustrating_ordering_and_parallelism - you can explicitly add fan-out elements to parallelise among multiple explicit workers, see here: http://doc.akka.io/docs/akka-stream-and-http- experimental/1.0-M2/scala/stream-cookbook.html#Balancing_jobs_to_a_fixed_ pool_of_workers Overall, for this kind of tasks I recommend using Streams, but you need to read the documentation first to understand how it works. -Endre It's my first time using Akka-stream. So I'm unfamiliar with the optimization you were talking about. 
Sincerely, Allen

On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote: Hi Allen, What's the bottleneck? Have you tried enabling the experimental optimizations? On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote: Thank you Soumya, I think Akka-streams is the way to go. However, I would appreciate some performance boost as well - I still have 40 million lines to go through! But thanks anyway! On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote: I would recommend using the Akka-streams API for this. Here is a sample. I was able to process a 1G file with around 1.5 million records in *20MB* of memory. The file-read and console-write rates are different, but the streams API handles that. This is not the fastest approach, but at least you won't run out of memory.
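[Editor's note] The per-line vectorization Allen describes can be written as a pure function before worrying about actors or streams at all, which also makes it easy to test. The `dictionaries` value below is a made-up stand-in for his factor-level tables; each non-key column is label-encoded as the index of its value in that column's list of levels:

```scala
object Vectorize {
  // Hypothetical stand-in for Allen's dictionaries: one list of factor
  // levels per non-key column (column 1, column 2, ...).
  val dictionaries: Vector[Vector[String]] =
    Vector(Vector("red", "green", "blue"), Vector("low", "high"))

  // Turn "key,green,low" into List("key", "1", "0"): the key followed by
  // the index of each value in its column's factor-level list.
  def processLine(line: String): List[String] = {
    val segs = line.split(",")
    val encoded = (1 until segs.length).map { i =>
      dictionaries(i - 1).indexOf(segs(i)).toString // factor level of the string
    }
    (segs(0) +: encoded).toList
  }
}
```

A pure function like this can then be dropped into either the actor or the streams version unchanged, and the batching Endre suggests is just a matter of sending `lines.grouped(n)` chunks instead of single lines.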
Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?
Thanks Soumya. That's very helpful! Drewhk, Sorry to bother you one more time. After reading the documents, I changed my code to this in order to fully utilize parallelism in Akka Stream:

    fileSource
      .map { line => line.split(",") }
      .map { segs =>
        segs(0) +: (1 to segs.length - 1).map { i =>
          stableDictionaries(i - 1).indexOf(segs(i)).toString
        }
      }
      .foreach { segs =>
        println(segs(0))
        timer ! OneDone // not sure if this works
        printer ! Print(segs)
      }
      .onComplete { _ =>
        Try {
          sc.close()
          inputStream.close()
        }
        system.shutdown()
      }

As you can see, one of the heavier pieces of work actually happens per element of the input array. Is there a way for me to parallelize that process as well, given that it happens inside a map? Allen
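[Editor's note] The back pressure discussed in this thread can be illustrated without any Akka API at all: a bounded blocking queue makes the producer wait whenever the consumer lags, which is exactly why memory stays bounded regardless of input size. This is a sketch of the concept, not how Akka Streams is implemented; all names here are made up, and the "<EOF>" sentinel assumes no real line equals it:

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedPipeline {
  // Feed `lines` through `process` on a separate thread, holding at most
  // `capacity` lines in flight. Returns the peak number of buffered lines.
  def run(lines: Iterator[String], capacity: Int)(process: String => Unit): Int = {
    val queue = new ArrayBlockingQueue[String](capacity)
    @volatile var maxQueued = 0
    val consumer = new Thread(new Runnable {
      def run(): Unit = {
        var line = queue.take()
        while (line != "<EOF>") {
          process(line)
          line = queue.take()
        }
      }
    })
    consumer.start()
    lines.foreach { line =>
      queue.put(line) // blocks while the queue is full: this is the back pressure
      maxQueued = math.max(maxQueued, queue.size)
    }
    queue.put("<EOF>")
    consumer.join()
    maxQueued
  }
}
```

Run against a 40-million-line iterator, the buffer never grows past `capacity`, which is the behaviour Allen saw from the streams version ("no memory overflow") in contrast to unbounded actor mailboxes.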
Re: [akka-user] Re: How to support looser coupling in scaled out Akka systems?
Thanks Ryan, that certainly makes sense and is something I'm considering, though I would rather not introduce another technology if possible. That contrib module from Patrik looks promising; I would be interested to know if anyone has used it in a meaningful production scenario and, if so, what the experience was like. Also whether it's ever likely to make it into the main library. On 11 Jan 2015 13:18, Ryan Tanner ryan.tan...@gmail.com wrote: If you want distributed pub/sub, I would use an actual pub/sub system. Akka can certainly do it, but Kafka or RabbitMQ are built *specifically for that purpose*, especially if you want distributed pub/sub. Of course the publishers and consumers on either end can be Akka-based. Though there is the distributed pub/sub extension in contrib: http://doc.akka.io/docs/akka/snapshot/contrib/distributed-pub-sub.html On Saturday, January 10, 2015 at 8:34:17 PM UTC-7, manwood wrote: I would like to be able to publish messages on a 'message bus' within my Akka system, rather than force actors to know about the actors that consume the events they generate (i.e. avoid using *context.actorOf*). I know there is the event bus construct, but my understanding is that it is limited to operating within a local process. Is there a recommended way of supporting a properly message-driven Akka architecture that scales across remote processes? -- Read the docs: http://akka.io/docs/ Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to a topic in the Google Groups Akka User List group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/akka-user/GXeqwgU7Bd4/unsubscribe. To unsubscribe from this group and all its topics, send an email to akka-user+unsubscr...@googlegroups.com. To post to this group, send email to akka-user@googlegroups.com. Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
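[Editor's note] To make the decoupling in this thread concrete: with topic-based pub/sub, publishers know only a topic name, never the consuming actors. The sketch below is a plain local illustration with hypothetical names throughout; it is neither the Akka EventBus API nor the distributed contrib module, which extends this same idea across a cluster:

```scala
import scala.collection.mutable

// Minimal topic-based event bus: publishers and subscribers share only a
// topic string, so neither side holds a reference to the other.
class SimpleEventBus[A] {
  private val subscribers = mutable.Map.empty[String, List[A => Unit]]

  def subscribe(topic: String)(handler: A => Unit): Unit =
    subscribers(topic) = handler :: subscribers.getOrElse(topic, Nil)

  def publish(topic: String, event: A): Unit =
    subscribers.getOrElse(topic, Nil).foreach(_(event)) // unknown topic: dropped
}
```

In an actor system the handlers would be `ref ! event` sends; the contrib module's mediator plays the role of this registry, with the topic map replicated across cluster nodes.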
[akka-user] Re: How to support looser coupling in scaled out Akka systems?
If you want distributed pub/sub, I would use an actual pub/sub system. Akka can certainly do it, but Kafka or RabbitMQ are built *specifically for that purpose*, especially if you want distributed pub/sub. Of course the publishers and consumers on either end can be Akka-based. Though there is the distributed pub/sub extension in contrib: http://doc.akka.io/docs/akka/snapshot/contrib/distributed-pub-sub.html On Saturday, January 10, 2015 at 8:34:17 PM UTC-7, manwood wrote: I would like to be able to publish messages on a 'message bus' within my Akka system, rather than force actors to know about the actors that consume the events they generate (i.e. avoid using *context.actorOf*). I know there is the event bus construct, but my understanding is that it is limited to operating within a local process. Is there a recommended way of supporting a properly message-driven Akka architecture that scales across remote processes?
[akka-user] How to support looser coupling in scaled out Akka systems?
I would like to be able to publish messages on a 'message bus' within my Akka system, rather than force actors to know about the actors that consume the events they generate (i.e. avoid using *context.actorOf*). I know there is the event bus construct, but my understanding is that it is limited to operating within a local process. Is there a recommended way of supporting a properly message-driven Akka architecture that scales across remote processes?