Re: Performant method for reading huge text files

2014-02-06 Thread Marco Leise
On Tue, 04 Feb 2014 00:04:22 +
Rene Zwanenburg renezwanenb...@gmail.com wrote:

 On Monday, 3 February 2014 at 23:50:54 UTC, bearophile wrote:
  Rene Zwanenburg:
 
  The problem is speed. I'm using LockingTextReader in 
  std.stdio, but it's not nearly fast enough. On my system it 
  only reads about 3 MB/s with one core spending all its time 
  in IO calls.
 
  Are you reading the text by lines? In Bugzilla there is a 
  byLineFast:
  https://d.puremagic.com/issues/show_bug.cgi?id=11810
 
  Bye,
  bearophile
 
 Nope, I'm feeding it to csvReader which uses an input range of 
 characters. Come to think of it..
 
 Well this is embarrassing, I've been sloppy with my profiling :). 
 It appears the time is actually spent converting strings to 
 doubles, done by csvReader to read a row into my Record struct. 
 No way to speed that up I suppose. Still I find it surprising 
 that parsing doubles is so slow.

Parsing textual representations of numbers is slow; the
reverse conversion, formatting a number as text, is faster.
When parsing you have to check all kinds of stuff: a leading
+/-, whether it starts with a dot, whether all characters are
'0' to '9', whether there is an exponent, whether it is NaN or
nan. Floating-point math is slow, but if you accumulate the
intermediate result in an integer while parsing, you can run
out of digits when the number string is long. On the other
hand, repeated floating-point math introduces rounding error
as you append digits.

Here is the ~400-line version in Phobos:
https://github.com/D-Programming-Language/phobos/blob/master/std/conv.d#L2250
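[Editor's note: a stripped-down sketch of the integer-accumulation
approach described above. This is not the Phobos implementation, just
an illustration; it ignores exponents, NaN/infinity, overflow, and
input validation.]

```d
import std.ascii : isDigit;

// Parse a simple decimal number by accumulating digits in an integer,
// then applying the sign and decimal scale at the end.
// Illustrative only: no exponent, NaN, or overflow handling.
double parseSimpleDouble(const(char)[] s)
{
    size_t i = 0;
    bool negative = false;
    if (i < s.length && (s[i] == '+' || s[i] == '-'))
        negative = s[i++] == '-';

    ulong mantissa = 0;
    int fracDigits = 0;
    bool seenDot = false;
    for (; i < s.length; ++i)
    {
        if (s[i] == '.' && !seenDot)
        {
            seenDot = true;
            continue;
        }
        if (!s[i].isDigit)
            break;
        mantissa = mantissa * 10 + (s[i] - '0');
        if (seenDot)
            ++fracDigits;        // track how far to shift the decimal point
    }

    double result = mantissa;
    foreach (_; 0 .. fracDigits)
        result /= 10.0;          // scale down for the fractional digits
    return negative ? -result : result;
}

unittest
{
    assert(parseSimpleDouble("-12.5") == -12.5);
}
```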

-- 
Marco



Re: Performant method for reading huge text files

2014-02-05 Thread Kagamin

You can also try a BufferedRange.
http://forum.dlang.org/thread/l9q66g$2he3$1...@digitalmars.com
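[Editor's note: BufferedRange in the linked thread is a forum snippet,
not part of Phobos. A comparable effect can be sketched with Phobos
primitives, reading large chunks and presenting them as one input range
so the per-character cost stays out of the I/O layer:]

```d
import std.stdio : File;
import std.algorithm : joiner, map;

void main()
{
    auto file = File("data.csv");

    // Read in 64 KiB chunks; joiner presents the chunks as a single
    // contiguous input range of ubyte, avoiding per-character I/O calls.
    auto buffered = file.byChunk(64 * 1024).joiner;

    // Convert to a char range for text-consuming consumers.
    auto chars = buffered.map!(b => cast(char) b);
}
```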


Re: Performant method for reading huge text files

2014-02-04 Thread Chris Williams
On Tuesday, 4 February 2014 at 00:04:23 UTC, Rene Zwanenburg 
wrote:

On Monday, 3 February 2014 at 23:50:54 UTC, bearophile wrote:

Rene Zwanenburg:

The problem is speed. I'm using LockingTextReader in 
std.stdio, but it's not nearly fast enough. On my system it 
only reads about 3 MB/s with one core spending all its time 
in IO calls.


Are you reading the text by lines? In Bugzilla there is a 
byLineFast:

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Bye,
bearophile


Nope, I'm feeding it to csvReader which uses an input range of 
characters. Come to think of it..


Well this is embarrassing, I've been sloppy with my profiling 
:). It appears the time is actually spent converting strings to 
doubles, done by csvReader to read a row into my Record struct. 
No way to speed that up I suppose. Still I find it surprising 
that parsing doubles is so slow.


Parsing should be faster than I/O. Set up two buffers and have 
one thread reading into buffer A while you parse buffer B with a 
second thread.


Re: Performant method for reading huge text files

2014-02-04 Thread Chris Williams
Parsing should be faster than I/O. Set up two buffers and have 
one thread reading into buffer A while you parse buffer B with 
a second thread.


...and then flip buffers whenever the slower of the two has 
completed.
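[Editor's note: a minimal sketch of the double-buffering scheme
suggested above, using std.parallelism. The parse function is a
placeholder for the real work, e.g. feeding csvReader.]

```d
import std.stdio : File;
import std.parallelism : task;
import std.algorithm.mutation : swap;

// Placeholder for the real per-chunk work.
void parse(ubyte[] chunk) { /* ... */ }

void main()
{
    auto file = File("data.csv");
    auto bufA = new ubyte[](1 << 20);
    auto bufB = new ubyte[](1 << 20);

    auto current = file.rawRead(bufA);   // prime the first buffer
    while (current.length)
    {
        // Kick off the next read in a background thread...
        auto io = task!((File f, ubyte[] b) => f.rawRead(b))(file, bufB);
        io.executeInNewThread();

        parse(current);                  // ...while parsing this buffer.

        current = io.yieldForce();       // wait for the read to finish
        swap(bufA, bufB);                // flip the buffers
    }
}
```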


Re: Performant method for reading huge text files

2014-02-03 Thread bearophile

Rene Zwanenburg:

The problem is speed. I'm using LockingTextReader in std.stdio, 
but it's not nearly fast enough. On my system it only reads 
about 3 MB/s with one core spending all its time in IO calls.


Are you reading the text by lines? In Bugzilla there is a 
byLineFast:

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Bye,
bearophile
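[Editor's note: for reference, the stock Phobos line iteration that
byLineFast aims to speed up looks like this; the Bugzilla attachment
reportedly offers a compatible interface with less overhead.]

```d
import std.stdio : File;

void main()
{
    foreach (line; File("data.csv").byLine())
    {
        // process line (a char[] whose buffer is reused between iterations)
    }
}
```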


Re: Performant method for reading huge text files

2014-02-03 Thread Rene Zwanenburg

On Monday, 3 February 2014 at 23:50:54 UTC, bearophile wrote:

Rene Zwanenburg:

The problem is speed. I'm using LockingTextReader in 
std.stdio, but it's not nearly fast enough. On my system it 
only reads about 3 MB/s with one core spending all its time 
in IO calls.


Are you reading the text by lines? In Bugzilla there is a 
byLineFast:

https://d.puremagic.com/issues/show_bug.cgi?id=11810

Bye,
bearophile


Nope, I'm feeding it to csvReader which uses an input range of 
characters. Come to think of it..


Well this is embarrassing, I've been sloppy with my profiling :). 
It appears the time is actually spent converting strings to 
doubles, done by csvReader to read a row into my Record struct. 
No way to speed that up I suppose. Still I find it surprising 
that parsing doubles is so slow.
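[Editor's note: for context, reading CSV rows into a struct with
std.csv looks roughly like this. The Record layout is hypothetical,
since the original struct isn't shown in the thread.]

```d
import std.csv : csvReader;

// Hypothetical record layout; the original isn't shown in the thread.
struct Record
{
    double x;
    double y;
    double z;
}

void main()
{
    auto text = "1.0,2.0,3.0\n4.0,5.0,6.0";
    foreach (record; csvReader!Record(text))
    {
        // Each field is converted from text with std.conv,
        // which is where the profiling showed the time going.
    }
}
```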