Re: [Pharo-users] running out of memory while processing a 220MB csv file with NeoCSVReader - tips?

stepharo Sat, 15 Nov 2014 11:46:29 -0800

Thanks for this cool discussion.
I will add this to the NeoCSV chapter :)


On 14/11/14 21:00, Sven Van Caekenberghe wrote:

This is what I tried:

"Write a header plus 10 million random records to paul.csv"
'paul.csv' asFileReference writeStreamDo: [ :file |
   ZnBufferedWriteStream on: file do: [ :out |
     (NeoCSVWriter on: out) in: [ :writer |
       writer writeHeader: { #Number. #Color. #Integer. #Boolean }.
       1 to: 1e7 do: [ :each |
         "Each record: sequence number, random color, random integer in 1..1e6, random boolean"
         writer nextPut: { each. #(Red Green Blue) atRandom. 1e6 atRandom. #(true false) atRandom } ] ] ] ].

This results in a file of over 300 MB:

$ ls -lah paul.csv
-rw-r--r--@ 1 sven  staff   327M Nov 14 20:45 paul.csv
$ wc paul.csv
  10000001 10000001 342781577 paul.csv
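The `wc` figures can be checked by hand: 10,000,001 lines is the header plus 10 million records, and since no field contains a space, `wc` also counts exactly one word per line. Dividing the byte count gives the average record length, and dividing by 2^20 gives the binary-megabyte size that `ls -h` rounds to 327M:

```latex
\frac{342\,781\,577\ \text{bytes}}{10\,000\,001\ \text{lines}} \approx 34.3\ \text{bytes per line},
\qquad
\frac{342\,781\,577\ \text{bytes}}{2^{20}} \approx 326.9\ \text{MiB}
```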

This is a selective read that streams through the whole file but collects only the matching records (about 10K of them):

"Stream through paul.csv, keeping only records whose third field is below 1000"
Array streamContents: [ :out |
   'paul.csv' asFileReference readStreamDo: [ :in |
     (NeoCSVReader on: (ZnBufferedReadStream on: in)) in: [ :reader |
       "Skip the header, then convert the four fields: integer, symbol, integer, boolean"
       reader skipHeader; addIntegerField; addSymbolField; addIntegerField;
         addFieldConverter: [ :x | x = #true ].
       reader do: [ :each | each third < 1000 ifTrue: [ out nextPut: each ] ] ] ] ].
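The "about 10K" follows from the generator: `Integer>>atRandom` answers a value from 1 to the receiver, so 999 out of every million values are below 1000, giving an expected 1e7 * 999 / 1e6 = 9,990 matches. Because `do:` hands the reader's records to the block one at a time, memory use is driven by what the block keeps. The sketch below is a hypothetical variant, not from the thread, that keeps only a counter. It uses a tiny in-memory sample (an assumption, standing in for paul.csv) so it can run without the large file:

```smalltalk
| csv count |
"Hypothetical four-line sample in the same shape as paul.csv"
csv := String streamContents: [ :s |
  #( 'Number,Color,Integer,Boolean'
     '1,Red,500,true'
     '2,Blue,5000,false'
     '3,Green,999,true' )
    do: [ :line | s nextPutAll: line; nextPut: Character lf ] ].
count := 0.
(NeoCSVReader on: csv readStream) in: [ :reader |
  reader skipHeader; addIntegerField; addSymbolField; addIntegerField;
    addFieldConverter: [ :x | x = 'true' ].
  "Only a counter survives each iteration, so memory does not grow with the record count"
  reader do: [ :each | each third < 1000 ifTrue: [ count := count + 1 ] ] ].
count
```

For paul.csv itself, the same `do:` loop would go inside the `readStreamDo:` and `ZnBufferedReadStream` wrapping used above.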
