That pretty much answers my question. In my use case it'd be easier to
use delimiters like \0 or \n, since the data isn't binary. However, now
I wonder which method would need more CPU time. I suppose that when
using delimiters there isn't an easier way than using fgetc() and
reading through the whole data stream. Hard-coded field lengths would
be faster if the fields contain a lot of characters, I guess.
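
To make the comparison concrete, here is a rough sketch, assuming the data has already been read into a buffer (say with fread()). The helper names skip_field and fixed_field are made up for illustration. With delimiters, every byte of a field has to be looked at to find where it ends, whether through an fgetc() loop or memchr(). With fixed-width fields, locating field i is just arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer just past the next delimiter
 * in buf[0..len), or NULL if no delimiter is left. The scan still
 * touches every byte of the field, so the cost grows with its length. */
const char *skip_field(const char *buf, size_t len, char delim)
{
    const char *p = memchr(buf, delim, len);
    return p ? p + 1 : NULL;
}

/* Hypothetical helper: with fixed-width fields, field i starts at a
 * computed offset; no bytes of the earlier fields are examined. */
const char *fixed_field(const char *buf, size_t width, size_t i)
{
    return buf + i * width;
}
```

Whether the difference matters in practice probably depends on how long the fields are and how often fields get skipped rather than read.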

On 06/06/2011 20:22, Connor Lane Smith wrote:
It ultimately depends on the use case. If you don't need \0 or \n in
cells, your format is fine. If you do, there are two approaches:

As Dieter suggested, you can use fixed-length fields. This is great if
you have a maximum cell width, especially if this length is small or
most fields use most of the space. This approach is used in, for
example, tarballs' filename fields.
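
A minimal sketch of the fixed-length approach, in the spirit of a tar header's name field: the value is padded with \0 up to the field width. FIELD_WIDTH, put_field and field_len are assumptions for illustration, not taken from any real format.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_WIDTH 16  /* assumed maximum cell width */

/* Store s in a FIELD_WIDTH-byte field, padded with \0.
 * Returns 0 on success, -1 if s does not fit. */
int put_field(char field[FIELD_WIDTH], const char *s)
{
    size_t n = strlen(s);
    if (n > FIELD_WIDTH)
        return -1;
    memset(field, 0, FIELD_WIDTH);
    memcpy(field, s, n);
    return 0;
}

/* Length of the value stored in a field: up to the first \0,
 * or the full width if the value fills the whole field. */
size_t field_len(const char field[FIELD_WIDTH])
{
    const char *end = memchr(field, 0, FIELD_WIDTH);
    return end ? (size_t)(end - field) : FIELD_WIDTH;
}
```

Note that padding with \0 means the values themselves cannot end in \0, and short values waste the rest of the field.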

However, if the cells dramatically vary in length, and the maximum is
rather large, a better alternative is to use length-prefixing, using a
number of bytes according to how large you expect your rows and cells
to be:

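0x0011 0x0006 "hello"\0 0x0007 "world!"\0

That is, a 2-byte row length followed by two cells, each with a 2-byte
cell length (and I've null-terminated the strings in the example). The
row length counts everything after it in the row, cell lengths
included (2+6+2+7 = 17 = 0x0011), so that skipping a row works. You
may need 4 or 8 bytes if your data is very long. The benefit of this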
is that you can check the row length and jump straight to the next
row, or carry on into the row and iterate its cells. It is also
completely independent of content: you can store anything.
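
Skipping a row and walking its cells could look roughly like this. It is a sketch under two assumptions that the format above leaves open: lengths are stored big-endian, and the row length counts every byte that follows it in the row, cell lengths included. The names get16, next_row and count_cells are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 2-byte length; big-endian byte order is an assumption. */
uint16_t get16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return a pointer to the row after the one starting at p, or NULL if
 * the row does not fit in the buffer that ends at end. Assumes the
 * row length counts every byte after the row-length field. */
const unsigned char *next_row(const unsigned char *p,
                              const unsigned char *end)
{
    size_t n;

    if (end - p < 2)
        return NULL;
    n = get16(p);
    if ((size_t)(end - p - 2) < n)
        return NULL;
    return p + 2 + n;
}

/* Count the cells in the row starting at p by hopping from one cell
 * length to the next. Returns -1 if the row is malformed. */
int count_cells(const unsigned char *p, const unsigned char *end)
{
    const unsigned char *row_end = next_row(p, end);
    int cells = 0;

    if (!row_end)
        return -1;
    p += 2;  /* step over the row length */
    while (p < row_end) {
        size_t n;

        if (row_end - p < 2)
            return -1;
        n = get16(p);
        if ((size_t)(row_end - p - 2) < n)
            return -1;
        p += 2 + n;  /* step over the cell length and the cell */
        cells++;
    }
    return cells;
}
```

Neither function looks at the cell contents, which is why the cells can hold arbitrary bytes.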

The problem with using ASCII delimiters is that you can't store binary
data, and you have to scan each cell's content to find where it ends.
It's a hassle; using length-prefixing is way easier.

(This approach is very often used in binary protocols, such as 9P and Sam.)

--
Džen
