Hi,

    I am trying to process a CSV file with 40 million lines of data; the 
file is about 5GB in size. I'm trying to use Akka to parallelize the 
task, but I can't stop the rapid memory growth: it expanded from 1GB to 
almost 15GB (the limit I set) in under 5 minutes. This is the code in my 
main() method:

import java.io.FileInputStream
import java.util.Scanner

val inputStream = new FileInputStream("E:\\Allen\\DataScience\\train\\train.csv")
val sc = new Scanner(inputStream, "UTF-8")
var counter = 0
while (sc.hasNextLine) {
  rowActors(counter % 20) ! Row(sc.nextLine())
  counter += 1
}
sc.close()
inputStream.close()

    Someone pointed out that I am essentially creating 40 million Row 
objects, which will naturally take up a lot of space. My row actor is not 
doing much: it simply transforms each line into an array of integers 
(if you are familiar with the concept of vectorizing, that's what I'm 
doing), then prints the transformed array. That's it. I originally 
thought there was a memory leak, but maybe I'm just not managing memory 
correctly. Can I get any wise suggestions from the Akka experts here?

Here is a screenshot of the memory usage: <http://i.stack.imgur.com/yQ4xx.png>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.