Hi,

I am trying to process a CSV file with 40 million lines of data in it; the file is about 5 GB. I'm trying to use Akka to parallelize the task, but I can't stop the rapid memory growth: the heap expanded from 1 GB to almost 15 GB (the limit I set) in under 5 minutes. This is the code in my main() method:

    import java.io.FileInputStream
    import java.util.Scanner

    val inputStream = new FileInputStream("E:\\Allen\\DataScience\\train\\train.csv")
    val sc = new Scanner(inputStream, "UTF-8")
    var counter = 0
    // rowActors is a sequence of 20 ActorRefs created earlier;
    // each line is dealt out to them round-robin
    while (sc.hasNextLine) {
      rowActors(counter % 20) ! Row(sc.nextLine())
      counter += 1
    }
    sc.close()
    inputStream.close()

Someone pointed out that I am essentially creating 40 million Row objects, which naturally take up a lot of space. Since this loop sends messages much faster than the 20 actors can process them, I suspect the unprocessed Row messages pile up in the actors' mailboxes, which are unbounded by default. My row actor is not doing much: it simply transforms each line into an array of integers (if you are familiar with the concept of vectorizing, that's what I'm doing), and then the transformed array gets printed out. Done. I originally thought there was a memory leak, but maybe I'm just not managing memory right. Can I get any wise suggestions from the Akka experts here?

Here is a screenshot of the memory usage: http://i.stack.imgur.com/yQ4xx.png
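For what it's worth, here is a minimal work-pulling sketch of what I imagine a bounded version might look like, where each worker asks the master for the next line instead of having all 40 million lines pushed at it up front. The LineMaster, RowWorker, and ReadyForWork names are hypothetical (not from my actual code), and the shutdown is simplified:

    import java.io.FileInputStream
    import java.util.Scanner

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}

    case class Row(line: String)
    case object ReadyForWork

    // Hands out one line per request, so the number of unprocessed
    // Row messages can never exceed the number of workers.
    class LineMaster(path: String) extends Actor {
      private val sc = new Scanner(new FileInputStream(path), "UTF-8")

      def receive = {
        case ReadyForWork =>
          if (sc.hasNextLine) sender() ! Row(sc.nextLine())
          else {
            sc.close()
            // Simplified: real code should wait for in-flight work to finish
            context.system.terminate()
          }
      }
    }

    class RowWorker(master: ActorRef) extends Actor {
      override def preStart(): Unit = master ! ReadyForWork // pull the first line

      def receive = {
        case Row(line) =>
          val vector = line.split(",").map(_.toInt) // "vectorize" the line
          println(vector.mkString(","))
          master ! ReadyForWork // pull the next line only when this one is done
      }
    }

    object Main extends App {
      val system = ActorSystem("csv-processing")
      val master = system.actorOf(
        Props(new LineMaster("E:\\Allen\\DataScience\\train\\train.csv")), "master")
      (1 to 20).foreach(i => system.actorOf(Props(new RowWorker(master)), s"worker-$i"))
    }

With this shape, the number of in-flight Row messages never exceeds the number of workers, so memory use should stay flat regardless of file size. Would something along these lines be the idiomatic Akka way to do it?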
I am trying to process a csv file with 40 million lines of data in there. It's a 5GB size file. I'm trying to use Akka to parallelize the task. However, it seems like I can't stop the quick memory growth. It expanded from 1GB to almost 15GB (the limit I set) under 5 minutes. This is the code in my main() method: val inputStream = new FileInputStream("E:\\Allen\\DataScience\\train\\train.csv")val sc = new Scanner(inputStream, "UTF-8") var counter = 0 while (sc.hasNextLine) { rowActors(counter % 20) ! Row(sc.nextLine()) counter += 1} sc.close() inputStream.close() Someone pointed out that I was essentially creating 40 million Row objects, which naturally will take up a lot of space. My row actor is not doing much. Just simply transforming each line into an array of integers (if you are familiar with the concept of vectorizing, that's what I'm doing). Then the transformed array gets printed out. Done. I originally thought there was a memory leak but maybe I'm not managing memory right. Can I get any wise suggestions from the Akka experts here?? <http://i.stack.imgur.com/yQ4xx.png> -- >>>>>>>>>> Read the docs: http://akka.io/docs/ >>>>>>>>>> Check the FAQ: >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user --- You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscr...@googlegroups.com. To post to this group, send email to akka-user@googlegroups.com. Visit this group at http://groups.google.com/group/akka-user. For more options, visit https://groups.google.com/d/optout.