Novak Elliott wrote:

> Since the input files are often large (10 MB, 50 MB, 100 MB are not
> unusual), I need to optimize the way in which I read in this data.
> 
> Possible methods that come to mind are:
> 
>       - simply malloc() an arbitrarily sized array of MYOBJECTs, each
>         with an arbitrarily sized array of POINTs, then use fgets() and
>         sscanf() to read in the data, realloc()'ing when the memory
>         allocated is insufficient.
> 
>       - use fread() in binary mode plus memchr() to count the 'N'
>         chars, and hence the "END"s, and hence the number of
>         MYOBJECTs. Use fread() and memchr() again to count the '\n'
>         chars between each "END", and hence how many POINT structs
>         to allocate for each MYOBJECT. Then use fgets() and sscanf()
>         to read the data into the structs.
> 
> Both of these methods seem very inefficient, either due to constant
> realloc()'ing or to re-reading the same (large) file for different
> information.

Depending on the number of points per MYOBJECT, it may be more
efficient to read the points into temporary storage (allocated with
alloca()) and then, once the END is reached, allocate an appropriately
sized array with malloc() and copy the points across.
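
A minimal sketch of that approach, assuming a POINT of two doubles,
whitespace-separated coordinates, and a hypothetical per-object cap
(all names here are illustrative, not from the original post):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloca.h>

typedef struct { double x, y; } POINT;

/* Assumed cap on points per object; alloca() space comes off the
   stack, so this has to stay modest. */
#define MAX_POINTS 16384

/* Read one object's points into stack scratch space, then copy
   them into an exactly-sized malloc()'d array. Returns NULL at
   end of file (or on allocation failure). */
POINT *read_object(FILE *fp, size_t *count)
{
    POINT *tmp = alloca(MAX_POINTS * sizeof *tmp);
    size_t n = 0;
    char line[256];

    while (fgets(line, sizeof line, fp)) {
        if (strncmp(line, "END", 3) == 0)
            break;
        if (n < MAX_POINTS &&
            sscanf(line, "%lf %lf", &tmp[n].x, &tmp[n].y) == 2)
            n++;
    }
    if (n == 0)
        return NULL;

    POINT *pts = malloc(n * sizeof *pts);
    if (!pts)
        return NULL;
    memcpy(pts, tmp, n * sizeof *pts);
    *count = n;
    return pts;
}

The copy costs one extra pass over the points, but each object then
ends up in a single exactly-sized allocation, with no realloc() calls.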

Alternatively, store the points in a linked list. NB: don't use
malloc() to allocate individual POINT structures; use malloc() to
allocate large blocks and write your own routine (a simple pool
allocator) to hand out individual POINT structures from those blocks.
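
A sketch of that block-allocation scheme, under the same assumed
POINT layout (BLOCK_POINTS and the function names are illustrative):

#include <stdlib.h>

typedef struct { double x, y; } POINT;

/* Hand out POINTs from large malloc()'d blocks rather than one
   malloc() call per point; everything is freed at once, never
   point by point. BLOCK_POINTS is an arbitrary assumed size. */
#define BLOCK_POINTS 4096

typedef struct block {
    struct block *next;
    size_t used;
    POINT points[BLOCK_POINTS];
} BLOCK;

static BLOCK *head = NULL;

POINT *alloc_point(void)
{
    if (head == NULL || head->used == BLOCK_POINTS) {
        BLOCK *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->next = head;
        b->used = 0;
        head = b;
    }
    return &head->points[head->used++];
}

void free_all_points(void)
{
    while (head != NULL) {
        BLOCK *next = head->next;
        free(head);
        head = next;
    }
}

This keeps the per-point cost to a pointer bump. Note that the blocks
themselves preserve insertion order, so a separate next pointer in
each POINT is only needed if you want a true per-object list.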

> Also, are fgets() and sscanf() the best way to read in data of this
> form? I'd like to be able to do everything using fread() and memchr()
> with no realloc()'ing, perhaps with a getnextword() function in
> combination with atod() - is this possible, and if so, is it any
> quicker?

The fastest way to read the data will be to write a custom tokeniser
using (f)lex.
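
For instance, a minimal flex specification might look something like
the following; the patterns and the handle_* callbacks are illustrative
assumptions about the file format, not taken from the original post:

%option noyywrap
%{
#include <stdlib.h>
void handle_coordinate(double v);
void handle_end(void);
%}

%%
-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?  { handle_coordinate(atof(yytext)); }
"END"                                  { handle_end(); }
[ \t\r\n]+                             ;  /* skip whitespace */
.                                      ;  /* ignore stray characters */
%%

/* Illustrative stubs; a real parser would build the MYOBJECT and
   POINT structures here instead. */
void handle_coordinate(double v) { (void)v; }
void handle_end(void) { }

int main(void)
{
    yylex();    /* scans stdin in a single buffered pass */
    return 0;
}

flex does its own block-buffered reads, so the file is scanned exactly
once, with no realloc() and no second pass for counting.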

-- 
Glynn Clements <[EMAIL PROTECTED]>
