Re: [akka-user] TimeoutException when using tell on an ActorSelection if the target actor is created through AskSupport

2015-01-10 Thread Patrik Nordwall
On Fri, Jan 9, 2015 at 10:06 PM, Ferdinand Hübner 
ferdinand.hueb...@gmail.com wrote:



 On Friday, January 9, 2015 at 7:34:08 PM UTC+1, Patrik Nordwall wrote:


 9 jan 2015 kl. 18:10 skrev Ferdinand Hübner ferdinan...@gmail.com:

 Yes, that is what I'm doing. I'm passing the sender() ActorRef to other
 actors until I am able to reply to it with deliver. When I reply with
 deliver, I call path on the ActorRef.
 In my service layer, I would send a confirmation using tell with
 ActorRef.noSender once the future from AskSupport completes.

 That is not going to work. Let's say that the confirmation message is
 lost. Then AtLeastOnceDelivery will resend it to the path of the
 PromiseActorRef (created by ask), but that is already completed and the
 resent message will go to deadLetters, and be retried again.
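
 The point is that deliver must target a stable, long-lived destination
 path rather than the short-lived PromiseActorRef created by ask. A
 minimal Akka 2.3-style sketch of that shape (the names Msg, Confirm,
 MsgSent, MsgConfirmed and "sender-1" are illustrative, not from this
 thread):

```scala
import akka.actor.{Actor, ActorPath}
import akka.persistence.{AtLeastOnceDelivery, PersistentActor}

case class Msg(deliveryId: Long, payload: String)
case class Confirm(deliveryId: Long)
case class MsgSent(payload: String)
case class MsgConfirmed(deliveryId: Long)

class MySender(destinationPath: ActorPath)
    extends PersistentActor with AtLeastOnceDelivery {

  override def persistenceId = "sender-1"

  override def receiveCommand: Receive = {
    case payload: String =>
      persist(MsgSent(payload)) { e =>
        deliver(destinationPath, deliveryId => Msg(deliveryId, e.payload))
      }
    case Confirm(deliveryId) =>
      persist(MsgConfirmed(deliveryId))(e => confirmDelivery(e.deliveryId))
  }

  override def receiveRecover: Receive = {
    case MsgSent(payload) =>
      deliver(destinationPath, deliveryId => Msg(deliveryId, payload))
    case MsgConfirmed(deliveryId) => confirmDelivery(deliveryId)
  }
}

// The destination is an ordinary long-lived actor, so a resend after a
// lost confirmation still reaches a live actor instead of deadLetters.
class MyDestination extends Actor {
  def receive = {
    case Msg(deliveryId, payload) =>
      // ...process payload...
      sender() ! Confirm(deliveryId)
  }
}
```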


 I am aware of that and decided to ignore it at this point until I am able
 to decide if AtLeastOnceSupport is really something that I want and need.
 My idea was to handle UnconfirmedWarning by simply confirming the messages
 it contains.


 Would it be possible to implement it without ask?


 Yes, that should be possible. I never really thought about implementing it
 without ask. It's the first thing that came to my mind and worked well so
 far.
 I'll be going for a less temporary actor that completes promises and
 confirms deliveryIds that are not pending completion.

 Thank you for the help and your suggestions.


You're welcome.
/Patrik


 Ferdinand

 --
  Read the docs: http://akka.io/docs/
  Check the FAQ:
 http://doc.akka.io/docs/akka/current/additional/faq.html
  Search the archives: https://groups.google.com/group/akka-user
 ---
 You received this message because you are subscribed to the Google Groups
 Akka User List group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to akka-user+unsubscr...@googlegroups.com.
 To post to this group, send email to akka-user@googlegroups.com.
 Visit this group at http://groups.google.com/group/akka-user.
 For more options, visit https://groups.google.com/d/optout.




-- 

Patrik Nordwall
Typesafe http://typesafe.com/ -  Reactive apps on the JVM
Twitter: @patriknw



[akka-user] Docs/samples on large file (10GB) uploads using akka-stream

2015-01-10 Thread Yogesh Pandit
Hello,

I am writing a web-service to upload files as large as 10GB. Just looking
for some pointers on how to go about it using akka-stream (haven't used it
ever before).
Thanks,

-Yogesh



Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?

2015-01-10 Thread Viktor Klang
Hi Allen,

I'd suspect the reason that it works well with Akka Streams is that they
have back-pressure while your actor solution does not (you'll send 40
million messages as fast as you can, but the actor processing them might
not be able to keep up).

On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aiming...@gmail.com wrote:

 Hey Viktor,

 I'm trying to use Akka to parallelize this process. There shouldn't be
 any bottleneck, and I don't understand why I got memory overflow with my
 first version (actor version). The main task is to read in a line, break it
 up, turn each segment (a string) into an integer, and print the result out
 to a CSV file (a vectorization process).

def processLine(line: String): Unit = {

  val vector: ListBuffer[String] = ListBuffer()
  val segs = line.split(",")

  println(segs(0))

  (1 to segs.length - 1).map { i =>
    val factorArray = dictionaries(i - 1)
    vector += factorArray._2.indexOf(segs(i)).toString // get the factor level of string
  }

  timer ! OneDone

  printer ! Print(vector.toList)
}


 When I'm doing this in pure Akka (with actors), since I created 40
 million objects: Row(line: String), I get memory overflow issue. If I use
 Akka-stream, there is no memory overflow issue, but the performance is too
 similar to the non-parallelized version (even slower).

 It's my first time using Akka-stream. So I'm unfamiliar with the
 optimization you were talking about.

 Sincerely,
 Allen

 On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote:

 Hi Allen,

 What's the bottleneck?
 Have you tried enabling the experimental optimizations?

 On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote:

 Thank you Soumya,

I think Akka-streams is the way to go. However, I would also
 appreciate some performance boost as well - still have 40 million lines to
 go through! But thanks anyway!


 On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote:

 I would recommend using the Akka-streams API for this.
 Here is a sample. I was able to process a 1GB file with around 1.5 million
 records in *20MB* of memory. The file-read rate and the console-output
 rate are different, but the streams API handles that. This is not the
 fastest approach, but at least you won't run out of memory.



 https://lh6.googleusercontent.com/-zdX0n1pvueE/VLATDja3K4I/v18/BH7V1RAuxT8/s1600/1gb_file_processing.png

 import java.io.FileInputStream
 import java.util.Scanner

 import akka.actor.ActorSystem
 import akka.stream.{FlowMaterializer, MaterializerSettings}
 import akka.stream.scaladsl.Source

 import scala.util.Try


 object StreamingFileReader extends App {

   val inputStream = new FileInputStream("/path/to/file")
   val sc = new Scanner(inputStream, "UTF-8")

   implicit val system = ActorSystem("Sys")
   val settings = MaterializerSettings(system)
   implicit val materializer =
     FlowMaterializer(settings.copy(maxInputBufferSize = 256, initialInputBufferSize = 256))

   val fileSource = Source(() => Iterator.continually(sc.nextLine()))

   import system.dispatcher

   fileSource.map { line =>
     line // do nothing; the foreach below prints the line
   }.foreach(println).onComplete { _ =>
     Try {
       sc.close()
       inputStream.close()
     }
     system.shutdown()
   }
 }




 On Friday, January 9, 2015 at 10:53:33 AM UTC-5, Allen Nie wrote:

 Hi,

 I am trying to process a CSV file with 40 million lines of data; it's a
 5GB file. I'm trying to use Akka to parallelize the task. However, it
 seems like I can't stop the rapid memory growth: it expanded from 1GB to
 almost 15GB (the limit I set) in under 5 minutes. This is the code in my
 main() method:

 val inputStream = new FileInputStream("E:\\Allen\\DataScience\\train\\train.csv")
 val sc = new Scanner(inputStream, "UTF-8")
 var counter = 0
 while (sc.hasNextLine) {

   rowActors(counter % 20) ! Row(sc.nextLine())

   counter += 1
 }

 sc.close()
 inputStream.close()

 Someone pointed out that I was essentially creating 40 million Row
 objects, which naturally will take up a lot of space. My row actor is not
 doing much. Just simply transforming each line into an array of integers
 (if you are familiar with the concept of vectorizing, that's what I'm
 doing). Then the transformed array gets printed out. Done. I originally
 thought there was a memory leak but maybe I'm not managing memory right.
 Can I get any wise suggestions from the Akka experts here??



 http://i.stack.imgur.com/yQ4xx.png


Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?

2015-01-10 Thread Endre Varga
Hi,

On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aiming...@gmail.com wrote:

 Hey Viktor,

 I'm trying to use Akka to parallelize this process. There shouldn't be
 any bottleneck, and I don't understand why I got memory overflow with my
 first version (actor version). The main task is to read in a line, break it
 up, turn each segment (a string) into an integer, and print the result out
 to a CSV file (a vectorization process).

def processLine(line: String): Unit = {

  val vector: ListBuffer[String] = ListBuffer()
  val segs = line.split(",")

  println(segs(0))

  (1 to segs.length - 1).map { i =>
    val factorArray = dictionaries(i - 1)
    vector += factorArray._2.indexOf(segs(i)).toString // get the factor level of string
  }

  timer ! OneDone

  printer ! Print(vector.toList)
}


 When I'm doing this in pure Akka (with actors), since I created 40
 million objects: Row(line: String), I get memory overflow issue.


No surprise there, you just slurp up all rows faster than the actors can
keep up processing them, so most of them are in a mailbox. In fact if your
actors do something trivially simple, the whole overhead of asynchronously
passing elements to the actors might be larger than what you gain. In these
cases it is recommended to pass batches of Rows instead of one-by-one.
Remember, parallelisation only gains when the overhead of it is smaller
than the task it parallelizes.
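
The batching suggestion above could look roughly like this sketch
(RowBatch is a hypothetical message type; rowActors comes from the
original post). Note this amortises the per-message hand-off cost but
still does not bound the mailboxes:

```scala
import scala.io.Source

case class RowBatch(lines: Vector[String])

val batchSize = 1000
Source.fromFile("train.csv").getLines()
  .grouped(batchSize)                 // chunks of 1000 lines
  .zipWithIndex
  .foreach { case (batch, i) =>
    // one message now carries 1000 rows instead of one, so the
    // asynchronous hand-off overhead is paid once per batch
    rowActors(i % 20) ! RowBatch(batch.toVector)
  }
```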



 If I use Akka-stream, there is no memory overflow issue, but the
 performance is too similar to the non-parallelized version (even slower).


No surprise there either, you did nothing to parallelize or pipeline any
computation in the stream, so you get the overhead of asynchronous
processing and none of the benefits of it (but at least you get
backpressure).

You have a few approaches to get the benefits of multi-core processing
with streams:
 - if you have multiple processing steps for a row you can pipeline them,
see the intro part of this doc page:
http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-rate.html
 - you can use mapAsync to have similar effects but with one computation
step, see here:
http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html#Illustrating_ordering_and_parallelism
 - you can explicitly add fan-out elements to parallelise among multiple
explicit workers, see here:
http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-cookbook.html#Balancing_jobs_to_a_fixed_pool_of_workers
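
The mapAsync option, for example, could be sketched roughly like this
(the exact signature varies across the experimental releases, and
vectorize is a hypothetical stand-in for the per-line work; treat this
as an illustration, not a drop-in):

```scala
import scala.concurrent.Future

// Each line is processed in a Future; the stream keeps only a bounded
// number in flight, so parallelism comes with back-pressure built in.
fileSource
  .mapAsync(4)(line => Future(vectorize(line)))
  .foreach(v => printer ! Print(v))
```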

Overall, for this kind of tasks I recommend using Streams, but you need to
read the documentation first to understand how it works.

-Endre



 It's my first time using Akka-stream. So I'm unfamiliar with the
 optimization you were talking about.

 Sincerely,
 Allen

 On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote:

 Hi Allen,

 What's the bottleneck?
 Have you tried enabling the experimental optimizations?

 On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote:

 Thank you Soumya,

I think Akka-streams is the way to go. However, I would also
 appreciate some performance boost as well - still have 40 million lines to
 go through! But thanks anyway!


 On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote:

 I would recommend using the Akka-streams API for this.
 Here is a sample. I was able to process a 1GB file with around 1.5 million
 records in *20MB* of memory. The file-read rate and the console-output
 rate are different, but the streams API handles that. This is not the
 fastest approach, but at least you won't run out of memory.



 https://lh6.googleusercontent.com/-zdX0n1pvueE/VLATDja3K4I/v18/BH7V1RAuxT8/s1600/1gb_file_processing.png

 import java.io.FileInputStream
 import java.util.Scanner

 import akka.actor.ActorSystem
 import akka.stream.{FlowMaterializer, MaterializerSettings}
 import akka.stream.scaladsl.Source

 import scala.util.Try


 object StreamingFileReader extends App {

   val inputStream = new FileInputStream("/path/to/file")
   val sc = new Scanner(inputStream, "UTF-8")

   implicit val system = ActorSystem("Sys")
   val settings = MaterializerSettings(system)
   implicit val materializer =
     FlowMaterializer(settings.copy(maxInputBufferSize = 256, initialInputBufferSize = 256))

   val fileSource = Source(() => Iterator.continually(sc.nextLine()))

   import system.dispatcher

   fileSource.map { line =>
     line // do nothing; the foreach below prints the line
   }.foreach(println).onComplete { _ =>
     Try {
       sc.close()
       inputStream.close()
     }
     system.shutdown()
   }
 }




 On Friday, January 9, 2015 at 10:53:33 AM UTC-5, Allen Nie wrote:

 Hi,

 I am trying to process a CSV file with 40 million lines of data; it's a
 5GB file. I'm trying to use Akka to parallelize the task. However, it
 seems like I can't stop the rapid memory growth: it expanded from 1GB to
 almost 15GB (the limit I set) in under 5 minutes. This

[akka-user] Re: Database Actor Read Model

2015-01-10 Thread Eric Pederson
Hi Andy - we are using an implementation of the Message Sequence EIP
(http://www.eaipatterns.com/MessageSequence.html), which chunks the
messages.

The code is 
here: 
https://github.com/sourcedelica/akka/blob/wip-3842-message-sequence-pattern-ericp/akka-contrib/src/main/scala/akka/contrib/pattern/MessageSequence.scala

Test is 
here: 
https://github.com/sourcedelica/akka/blob/wip-3842-message-sequence-pattern-ericp/akka-contrib/src/test/scala/akka/contrib/pattern/MessageSequenceSpec.scala

The other alternative is to store the large data in a shared place, like a 
distributed cache or NFS, and then send a reference to that location in the 
message.
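
That alternative is essentially the Claim Check EIP: park the payload,
pass a small key. A rough sketch with a stand-in for the shared store
(the sharedCache, DataRef, and sendLargeResult names are hypothetical):

```scala
import java.util.UUID
import scala.collection.concurrent.TrieMap

case class DataRef(key: String)

// Stand-in for a distributed cache or shared filesystem location.
val sharedCache = TrieMap.empty[String, Any]

// Sender side: store the wide row data, send only a small reference,
// which stays well under any maximum-frame-size limit.
def sendLargeResult(frontend: akka.actor.ActorRef, bigResult: Any): Unit = {
  val key = UUID.randomUUID().toString
  sharedCache.put(key, bigResult)
  frontend ! DataRef(key)
}

// Receiver side: dereference on arrival, e.g. in receive:
//   case DataRef(key) => val data = sharedCache.get(key)
```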

On Friday, January 9, 2015 at 6:38:32 AM UTC-5, Andy Zelinski wrote:

 Lots of database actor questions, but I couldn't find one that directly 
 addressed data size limits:

 I want to send requests for data from a frontend to a Cluster Singleton 
 which will forward each request to a Database Actor.

 The database actor runs query, then from the Row result constructs a 
 domain object. this works great for case class Tweet(id: TweetId, user: 
 User, text: Text, createdAt: Date).

 But what if we need to gather and pass around data from much wider tables 
 than that?

 say we need to gather a bunch of user data.  profile data, preference 
 data, history, purchases, moderately large collection, etc. for use both by 
 the akka cluster that performs our logic,

 and also simply to pass data to the frontend for templating. Put simply, 
 potentially much more than the 120Kb maximum-frame-size. 

 thus the following conundrum: 

 1. want to limit number of queries to database per data request, since 
 that is the biggest bottleneck, area for headache. (although im using an 
 async driver so its not as bad as blocking jdbc)
 2. most data will be requested together, so want to to be able to get all 
 row data in one query
 3. thus the database actor will have too much data to send as a message 
 back to its parent and ultimately to frontend actor

 of course i could split the query results into multiple messages, but that 
 necessitates changing the whole actor system.

 can chunking or some kind of dedicated channel save me from having to 
 completely re-think my backend?

 thanks! 










[akka-user] Sharing message queue between two akka actors?

2015-01-10 Thread Krishna Kadam
Hi all Akka experts,

I have following questions for you
1. Is it possible to share a message queue between two Akka actors?
2. Is there any effect of increasing the number of dispatchers on the
message processing rate of Akka actors?
3. What are the factors that affect the rate of message processing with
Akka actors?

*Thanks & Regards*
*Shrikrishna Kadam*



Re: [akka-user] Sharing message queue between two akka actors?

2015-01-10 Thread Viktor Klang
Hi Krishna!

On Sat, Jan 10, 2015 at 5:16 PM, Krishna Kadam shrikrishna.kad...@gmail.com
 wrote:

 HI all akka experts,

 I have following questions for you
 1. Is it possible to share message queue between two akka actors?


Yes and no: there's the balancing router, whose routees share a single
mailbox, but that's the only mechanism for it.


 2. Is there any effect of increasing number of dispatchers on the message
 processing rate of akka actors?


Having multiple dispatchers is about bulkheading different actor subtrees,
not about processing rate (throughput).


 3. What are the factors that affect the rate of message processing using
 akka Actors?


Short answer: Little's Law.
Long answer: depending on where the bottleneck is, you may want to tune
the dispatcher settings (throughput instead of fairness, the backing
executor service, the number of threads, and the mailbox implementation).
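
In application.conf, that tuning lives on a dedicated dispatcher; the
values below are illustrative, not recommendations:

```hocon
my-tuned-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min    = 4
    parallelism-factor = 2.0
    parallelism-max    = 16
  }
  # messages an actor may process before yielding its thread:
  # higher favours throughput, lower favours fairness
  throughput = 100
}
```

It is assigned with Props[Worker].withDispatcher("my-tuned-dispatcher");
and, per the point above, separate dispatchers are for bulkheading actor
subtrees, not a throughput knob by themselves.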



 *Thanks  Regards *
 *Shrikrishna Kadam*





-- 
Cheers,
√



Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?

2015-01-10 Thread Allen Nie
Hi Endre,

That's a very valid suggestion. I'm quite new to Akka (finished about 
35% of its docs). I'm still trying to understand how to properly 
parallelize tasks. You and Viktor mentioned back-pressure. Can you go a bit 
deeper into that? For example, what is back-pressure, and how do I build it 
into my actor solution? (Info links would be all I need.) I asked a similar 
question on StackOverflow but no one could point me in the right direction.

Thank you for linking Akka-stream's docs.

Allen


On Saturday, January 10, 2015 at 5:38:42 AM UTC-5, drewhk wrote:

 Hi,

 On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aimi...@gmail.com wrote:

 Hey Viktor,

 I'm trying to use Akka to parallelize this process. There shouldn't 
 be any bottleneck, and I don't understand why I got memory overflow with my 
 first version (actor version). The main task is to read in a line, break it 
 up, turn each segment (a string) into an integer, and print the result out 
 to a CSV file (a vectorization process).

def processLine(line: String): Unit = {

  val vector: ListBuffer[String] = ListBuffer()
  val segs = line.split(",")

  println(segs(0))

  (1 to segs.length - 1).map { i =>
    val factorArray = dictionaries(i - 1)
    vector += factorArray._2.indexOf(segs(i)).toString // get the factor level of string
  }

  timer ! OneDone

  printer ! Print(vector.toList)
}


 When I'm doing this in pure Akka (with actors), since I created 40 
 million objects: Row(line: String), I get memory overflow issue. 


 No surprise there, you just slurp up all rows faster than the actors can 
 keep up processing them, so most of them are in a mailbox. In fact if your 
 actors do something trivially simple, the whole overhead of asynchronously 
 passing elements to the actors might be larger than what you gain. In these 
 cases it is recommended to pass batches of Rows instead of one-by-one. 
 Remember, parallelisation only gains when the overhead of it is smaller 
 than the task it parallelizes. 

  

 If I use Akka-stream, there is no memory overflow issue, but the 
 performance is too similar to the non-parallelized version (even slower).


 No surprise there either, you did nothing to parallelize or pipeline any 
 computation in the stream, so you get the overhead of asynchronous 
 processing and none of the benefits of it (but at least you get 
 backpressure).

 You have a few approaches to get the benefits of multi-core processing 
 with streams:
  - if you have multiple processing steps for a row you can pipeline them, 
 see the intro part of this doc page: 
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-rate.html
  - you can use mapAsync to have similar effects but with one computation 
 step, see here: 
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html#Illustrating_ordering_and_parallelism
  - you can explicitly add fan-out elements to parallelise among multiple 
 explicit workers, see here: 
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-cookbook.html#Balancing_jobs_to_a_fixed_pool_of_workers

 Overall, for this kind of tasks I recommend using Streams, but you need to 
 read the documentation first to understand how it works. 

 -Endre
  


 It's my first time using Akka-stream. So I'm unfamiliar with the 
 optimization you were talking about.

 Sincerely,
 Allen

 On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote:

 Hi Allen,

 What's the bottleneck?
 Have you tried enabling the experimental optimizations?

 On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote:

 Thank you Soumya, 

I think Akka-streams is the way to go. However, I would also 
 appreciate some performance boost as well - still have 40 million lines to 
 go through! But thanks anyway!


 On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote:

 I would recommend using the Akka-streams API for this. 
 Here is a sample. I was able to process a 1GB file with around 1.5 
 million records in *20MB* of memory. The file-read rate and the 
 console-output rate are different, but the streams API handles that. This 
 is not the fastest approach, but at least you won't run out of memory. 



 https://lh6.googleusercontent.com/-zdX0n1pvueE/VLATDja3K4I/v18/BH7V1RAuxT8/s1600/1gb_file_processing.png

 import java.io.FileInputStream
 import java.util.Scanner

 import akka.actor.ActorSystem
 import akka.stream.{FlowMaterializer, MaterializerSettings}
 import akka.stream.scaladsl.Source

 import scala.util.Try


 object StreamingFileReader extends App {

   val inputStream = new FileInputStream("/path/to/file")
   val sc = new Scanner(inputStream, "UTF-8")

   implicit val system = ActorSystem("Sys")
   val settings = MaterializerSettings(system)
   implicit val materializer =
     FlowMaterializer(settings.copy(maxInputBufferSize = 256, initialInputBufferSize = 256))

   val

Re: [akka-user] Not losing messages due to remote actor being unavailable.

2015-01-10 Thread Matan Safriel
Thanks Endre

On Fri, Jan 9, 2015 at 5:36 PM, Endre Varga endre.va...@typesafe.com
wrote:

 Hi Matan,

 On Fri, Jan 9, 2015 at 4:30 PM, Matan Safriel dev.ma...@gmail.com wrote:

 Thanks, good to know! Does it include a leader election process for when
 the leader isn't available?


 From the docs:

 After gossip convergence a leader for the cluster can be determined.
 There is no leader election process, the leader can always be recognised
 deterministically by any node whenever there is gossip convergence. The
 leader is just a role, any node can be the leader and it can change between
 convergence rounds. The leader is simply the first node in sorted order
 that is able to take the leadership role, where the preferred member states
 for a leader are up and leaving (see the Membership Lifecycle section below
 for more information about member states).

 There is no explicit leader election because the leader can be always
 implicitly determinable from the cluster CRDT state.


 I was not sure from the documentation whether the leader is a single
 point of failure, or its relationship to seed nodes.


 Leader is not a single point of failure, it changes many times during the
 life of a cluster.

 Seed nodes are not special either, they are just a set of known IP
 addresses to bootstrap from (i.e. a set of nodes that can be used to enter
 the cluster, only one of them that is needed to be up at any time) when a
 node wants to join. This idea is not unique to Akka, for example quoting
 from the Cassandra docs:

 Cassandra nodes exchange information about one another using a mechanism
 called Gossip, but to get the ball rolling a newly started node needs to
 know of at least one other, this is called a Seed. It's customary to pick a
 small number of relatively stable nodes to serve as your seeds, but there
 is no hard-and-fast rule here. Do make sure that each seed also knows of at
 least one other, remember, the goal is to avoid a chicken-and-egg scenario
 and provide an avenue for all nodes in the cluster to discover one another.

 -Endre




 Maybe this should be a different thread.

 On Fri, Jan 9, 2015 at 5:15 PM, Endre Varga endre.va...@typesafe.com
 wrote:

 Hi,

 On Fri, Jan 9, 2015 at 4:12 PM, Matan Safriel dev.ma...@gmail.com
 wrote:

 Thanks Patrik, good to know, because the cluster module may seem like a
 very specific and rather unfinished reference implementation;


 What do you mean by unfinished reference implementation? It is a fully
 supported module of Akka.

 -Endre


 Good to have this in the core without relying on the cluster module.

 On Friday, January 9, 2015 at 11:25:30 AM UTC+2, Patrik Nordwall wrote:



 On Thu, Jan 8, 2015 at 9:56 PM, Matan Safriel dev@gmail.com
 wrote:

 Sorry for awakening this old thread...
 Is it really the case that there is all this fancy supervision
 architecture, and then a remote actor that has gone non-responsive needs
 to be handled outside of the supervision hierarchy altogether? Was it
 considered to enable supervision to work such that the supervisor would
 know in case its remote child has gone unresponsive/gated/quarantined?


 The information in this thread is outdated. We have added support for
 proper remote death watch also when using remoting only (without cluster).
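
 Remote death watch means the standard watch/Terminated mechanism now
 also fires for remote refs; a minimal sketch (the Watcher actor here is
 illustrative):

```scala
import akka.actor.{Actor, ActorRef, Terminated}

class Watcher(remote: ActorRef) extends Actor {
  // works for remote refs too: Terminated arrives when the actor stops
  // or remoting decides its node is unreachable/quarantined
  context.watch(remote)

  def receive = {
    case Terminated(`remote`) =>
      // react: reschedule work, escalate, or stop
      context.stop(self)
  }
}
```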



 Perhaps I misinterpret the reference to the cluster module in the
 response by Dr. Roland Kuhn below.

 Thanks for clarifying,
 Matan

 On Thursday, January 31, 2013 at 10:13:26 PM UTC+2, rkuhn wrote:

 Hi Juan Pablo,

 30 jan 2013 kl. 22:08 skrev Juan Pablo Vergara Villarraga:

 If a remote actor is not available due to power loss Can the
 supervision strategy handle the situation?


 No, loss of actors is managed by the Death Watch
 (http://doc.akka.io/docs/akka/2.1.0/general/supervision.html#What_Lifecycle_Monitoring_Means),
 but support for detecting unreachable remote nodes is only present in 
 the
 cluster module.

 I have coded the example and I have shut down the remote actor
 system but it seems that the supervision strategy only takes into 
 account
 Exceptions thrown by the remote actor once reached.


 Yes, that is correct.

 I have already implemented the subscription to the events that
 indicates that error in the connection have occurred. I still need to 
 have
 access to message the sender sent originally so the message do not get 
 lost.


 There is nothing you can subscribe to which tells you whether a
 given message was processed on the remote system. If you cannot lose
 messages then you need to persist them and use explicit acknowledgements
 from the receiving actor to retire them from the local storage. You will
 also need to implement resending and deduplication if you need 
 exactly-once
 delivery; you might want to read the documentation on message delivery
 

Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?

2015-01-10 Thread Soumya Simanta
Allen,

Here is one definition of back pressure
http://www.reactivemanifesto.org/glossary#Back-Pressure
For example, in your initial question about memory errors, the back-pressure
mechanism of akka-streams allows processing your data even with a limited
memory budget.

I have personally found this presentation (
https://www.youtube.com/watch?v=khmVMvlP_QA
) by Roland Kuhn very helpful in understanding the motivations and core
concepts behind akka-streams, which is an implementation of Reactive Streams
(http://www.reactive-streams.org/).


-Soumya


On Sat, Jan 10, 2015 at 7:42 PM, Allen Nie aiming...@gmail.com wrote:

 Hi Endre,

 That's a very valid suggestion. I'm quite new to Akka (finished about
 35% of its docs). I'm still trying to understand how to properly
 parallelize tasks. You and Viktor mentioned back-pressure. Can you go a bit
 deeper into that? For example, what is back-pressure, and how do I build it
 into my actor solution? (Info links would be all I need.) I asked a similar
 question on StackOverflow but no one could point me in the right direction.

 Thank you for linking Akka-stream's docs.

 Allen


 On Saturday, January 10, 2015 at 5:38:42 AM UTC-5, drewhk wrote:

 Hi,

 On Fri, Jan 9, 2015 at 10:53 PM, Allen Nie aimi...@gmail.com wrote:

 Hey Viktor,

 I'm trying to use Akka to parallelize this process. There shouldn't
 be any bottleneck, and I don't understand why I got memory overflow with my
 first version (actor version). The main task is to read in a line, break it
 up, turn each segment (a string) into an integer, and print the result out
 to a CSV file (a vectorization process).

def processLine(line: String): Unit = {
  val vector: ListBuffer[String] = ListBuffer()
  val segs = line.split(",")

  println(segs(0))

  (1 to segs.length - 1).map { i =>
    val factorArray = dictionaries(i - 1)
    vector += factorArray._2.indexOf(segs(i)).toString // get the factor level of the string
  }

  timer ! OneDone
  printer ! Print(vector.toList)
}


 When I do this in pure Akka (with actors), I get a memory overflow,
 since I create 40 million objects: Row(line: String).


 No surprise there, you just slurp up all the rows faster than the actors can
 keep up processing them, so most of them sit in a mailbox. In fact, if your
 actors do something trivially simple, the whole overhead of asynchronously
 passing elements to the actors might be larger than what you gain. In these
 cases it is recommended to pass batches of Rows instead of one-by-one.
 Remember, parallelisation only pays off when its overhead is smaller
 than the work it parallelizes.
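The batching Endre suggests can happen before the rows ever reach an actor, e.g. with `Iterator.grouped` (the `worker ! ProcessBatch(...)` send is elided and hypothetical):

```scala
// Sketch: amortise the per-message overhead by sending rows to the
// worker actor in batches of 1000 instead of one message per row.
val lines: Iterator[String] = Iterator.tabulate(2500)(i => s"row-$i")

val batchSizes = lines.grouped(1000).map { batch =>
  // worker ! ProcessBatch(batch)   // one send per 1000 rows
  batch.size
}.toList
// batchSizes == List(1000, 1000, 500)
```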



 If I use Akka-stream, there is no memory overflow issue, but the
 performance is too similar to the non-parallelized version (even slower).


 No surprise there either, you did nothing to parallelize or pipeline any
 computation in the stream, so you get the overhead of asynchronous
 processing and none of the benefits of it (but at least you get
 backpressure).

 You have a few approaches to get the benefits of multi-core processing
 with streams:
  - if you have multiple processing steps for a row, you can pipeline them;
 see the intro part of this doc page:
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-rate.html
  - you can use mapAsync for a similar effect with one computation step;
 see here:
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-integrations.html#Illustrating_ordering_and_parallelism
  - you can explicitly add fan-out elements to parallelise among multiple
 explicit workers; see here:
 http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-M2/scala/stream-cookbook.html#Balancing_jobs_to_a_fixed_pool_of_workers
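The mapAsync idea can be approximated with plain Futures to see what it buys: each row is transformed on the thread pool and the results come back in input order. What this naive sketch lacks, and mapAsync adds, is back-pressure (a bound on how many futures are in flight at once):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for the per-row dictionary lookups in processLine.
def heavyStep(segs: Vector[String]): Vector[Int] =
  segs.map(_.length)

val rows = List(Vector("a", "bb"), Vector("ccc", "d"))

// Run heavyStep for every row concurrently, preserving input order.
val done: Future[List[Vector[Int]]] =
  Future.traverse(rows)(row => Future(heavyStep(row)))

val result = Await.result(done, 10.seconds)
// result == List(Vector(1, 2), Vector(3, 1))
```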

 Overall, for this kind of task I recommend using Streams, but you need
 to read the documentation first to understand how it works.

 -Endre



 It's my first time using Akka-stream. So I'm unfamiliar with the
 optimization you were talking about.

 Sincerely,
 Allen

 On Friday, January 9, 2015 at 4:03:13 PM UTC-5, √ wrote:

 Hi Allen,

 What's the bottleneck?
 Have you tried enabling the experimental optimizations?

 On Fri, Jan 9, 2015 at 9:52 PM, Allen Nie aimi...@gmail.com wrote:

 Thank you Soumya,

I think Akka-streams is the way to go. However, I would also
 appreciate a performance boost - I still have 40 million lines to
 go through! But thanks anyway!


 On Friday, January 9, 2015 at 12:43:49 PM UTC-5, Soumya Simanta wrote:

 I would recommend using the Akka-streams API for this.
 Here is a sample: I was able to process a 1G file with around 1.5
 million records in *20MB* of memory. The file-read and console-write
 rates are different, but the streams API handles that. This is not the
 fastest approach, but at least you won't run out of memory.



 

Re: [akka-user] Re: How to Properly Manage Akka's Memory Usage with 40 Million Lines Task?

2015-01-10 Thread Allen Nie
Thanks Soumya. That's very helpful!

Drewhk, 

Sorry to bother you one more time. After reading the documents, I 
changed my code to this in order to fully utilize parallelism in Akka 
Stream:

fileSource.map { line =>
    line.split(",")
  }
  .map { segs =>
    segs(0) +: (1 to segs.length - 1).map { i =>
      stableDictionaries(i - 1).indexOf(segs(i)).toString
    }
  }
  .foreach { segs =>
    println(segs(0))
    timer ! OneDone // not sure if this works
    printer ! Print(segs)
  }
  .onComplete { _ =>
    Try {
      sc.close()
      inputStream.close()
    }
    system.shutdown()
  }


As you can see, the heavier work actually happens per element of the
input array. Is there a way for me to parallelize that process as well,
since it happens inside a map?

Allen


Re: [akka-user] Re: How to support looser coupling in scaled out Akka systems?

2015-01-10 Thread mark
Thanks Ryan, that certainly makes sense and is something I'm considering,
though I would rather not introduce another technology if possible.

That contrib module from Patrik looks promising. I would be interested to
know if anyone has used it in a meaningful production scenario and, if so,
what the experience was like. Also, whether it's ever likely to make it
into the main library.
On 11 Jan 2015 13:18, Ryan Tanner ryan.tan...@gmail.com wrote:

 If you want distributed pub/sub, I would use an actual pub/sub system.
 Akka can certainly do it, but Kafka or RabbitMQ are built *specifically
 for that purpose*, especially if you want distributed pub/sub. Of course
 the publishers and consumers on either end can be Akka-based.

 Though there is the distributed pub/sub extension in contrib:
 http://doc.akka.io/docs/akka/snapshot/contrib/distributed-pub-sub.html

 On Saturday, January 10, 2015 at 8:34:17 PM UTC-7, manwood wrote:

 I would like to be able to publish messages on a 'message bus' within my
 Akka system, rather than force actors to know about the actors that consume
 the events they generate (i.e. avoid using *context.actorOf*).

 I know there is the event bus construct, but my understanding is that it is
 limited to operating within a local process. Is there a recommended way of
 supporting a properly message-driven Akka architecture that scales across
 remote processes?

  --
  Read the docs: http://akka.io/docs/
  Check the FAQ:
 http://doc.akka.io/docs/akka/current/additional/faq.html
  Search the archives: https://groups.google.com/group/akka-user
 ---
 You received this message because you are subscribed to a topic in the
 Google Groups Akka User List group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/akka-user/GXeqwgU7Bd4/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 akka-user+unsubscr...@googlegroups.com.
 To post to this group, send email to akka-user@googlegroups.com.
 Visit this group at http://groups.google.com/group/akka-user.
 For more options, visit https://groups.google.com/d/optout.




[akka-user] Re: How to support looser coupling in scaled out Akka systems?

2015-01-10 Thread Ryan Tanner
If you want distributed pub/sub, I would use an actual pub/sub system.
Akka can certainly do it, but Kafka or RabbitMQ are built *specifically
for that purpose*, especially if you want distributed pub/sub. Of course
the publishers and consumers on either end can be Akka-based.

Though there is the distributed pub/sub extension in 
contrib: http://doc.akka.io/docs/akka/snapshot/contrib/distributed-pub-sub.html

On Saturday, January 10, 2015 at 8:34:17 PM UTC-7, manwood wrote:

 I would like to be able to publish messages on a 'message bus' within my
 Akka system, rather than force actors to know about the actors that consume
 the events they generate (i.e. avoid using *context.actorOf*).

 I know there is the event bus construct, but my understanding is that it is
 limited to operating within a local process. Is there a recommended way of
 supporting a properly message-driven Akka architecture that scales across
 remote processes?




[akka-user] How to support looser coupling in scaled out Akka systems?

2015-01-10 Thread manwood
I would like to be able to publish messages on a 'message bus' within my
Akka system, rather than force actors to know about the actors that consume
the events they generate (i.e. avoid using *context.actorOf*).

I know there is the event bus construct, but my understanding is that it is
limited to operating within a local process. Is there a recommended way of
supporting a properly message-driven Akka architecture that scales across
remote processes?
