That pretty much answers my question. In my use case it would be easier to use delimiters like \0 or \n, since the data isn't binary. Now I wonder, though, which method needs more CPU time. I suppose that with delimiters there isn't an easier way than calling fgetc() and reading through the whole data stream byte by byte. Hard-coded field lengths would be faster if the fields contain a lot of characters, I guess.
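For the delimiter case, here is a rough sketch, assuming the data has already been read into a buffer (for example with fread()). It uses memchr() from <string.h> to find each delimiter instead of calling fgetc() once per byte. The helper name count_fields is made up for illustration. Every byte still gets examined, so this is not a claim about which format wins overall, only that the scan does not have to go through fgetc().

```c
#include <stddef.h>
#include <string.h>

/* Count delimiter-terminated fields in an in-memory buffer.
 * A trailing field without a final delimiter is counted too.
 * count_fields is a hypothetical helper, not from the thread. */
size_t count_fields(const char *buf, size_t len, char delim)
{
    size_t n = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        /* memchr scans for the next delimiter in one library call */
        const char *d = memchr(p, delim, (size_t)(end - p));
        n++;
        if (d == NULL)
            break;      /* last field has no terminator */
        p = d + 1;      /* continue right after the delimiter */
    }
    return n;
}
```

When reading straight from a FILE *, POSIX getdelim() plays a similar role: it reads up to and including a chosen delimiter byte in one call.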
On 06/06/2011 20:22, Connor Lane Smith wrote:
It ultimately depends on the use case. If you don't need \0 or \n in cells, your format is fine. If you do, there are two approaches.

As Dieter suggested, you can use fixed-length fields. This is great if you have a maximum cell width, especially if this length is small or most fields use most of the space. This approach is used in, for example, tarballs' filename fields.

However, if the cells vary dramatically in length, and the maximum is rather large, a better alternative is to use length-prefixing, using a number of bytes according to how large you expect your rows and cells to be:

    0x000d 0x0006 "hello"\0 0x0007 "world!"\0

That is, a 2-byte row length followed by two cells, each with a 2-byte cell length (and I've null-terminated the strings in the example). You may need 4 or 8 bytes if your data is very long.

The benefit of this is that you can check the row length and jump straight to the next row, or carry on into the row and iterate over its cells. It is also completely independent of content: you can store anything. The problem with using ASCII delimiters is that you can't store binary data, and you have to inspect every byte of every cell to find where it ends. It's a hassle; using length-prefixing is way easier. (This approach is very often used in binary protocols, such as 9P and Sam.)
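The length-prefixed layout described above could be walked roughly like this. This is only a sketch under two assumptions the message does not state: the 2-byte lengths are big-endian, and the row length counts every byte after the row-length field, including the cell length prefixes. Under that second assumption the example row has length 0x0011 (17), whereas the 0x000d in the message is the sum of the two cell payloads alone. The names read_u16, skip_row, count_cells and example_row are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Example row: 2-byte row length, then two length-prefixed cells. */
static const unsigned char example_row[] = {
    0x00, 0x11,                                   /* row body: 17 bytes */
    0x00, 0x06, 'h', 'e', 'l', 'l', 'o', 0,       /* cell 1: 6 bytes */
    0x00, 0x07, 'w', 'o', 'r', 'l', 'd', '!', 0   /* cell 2: 7 bytes */
};

/* Read a 2-byte big-endian length (byte order is an assumption). */
static uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return the offset of the row following the one at `off`,
 * or 0 if the row header or body would run past `len`. */
size_t skip_row(const unsigned char *buf, size_t len, size_t off)
{
    if (off > len || len - off < 2)
        return 0;
    size_t body = read_u16(buf + off);
    if (len - off - 2 < body)
        return 0;
    return off + 2 + body;   /* jump without looking at any cell */
}

/* Count the cells in the row at `off`; returns -1 if it is malformed. */
int count_cells(const unsigned char *buf, size_t len, size_t off)
{
    size_t end = skip_row(buf, len, off);
    if (end == 0)
        return -1;

    size_t p = off + 2;      /* first cell starts after the row length */
    int n = 0;
    while (p < end) {
        if (end - p < 2)
            return -1;       /* no room for a cell length */
        size_t clen = read_u16(buf + p);
        if (end - p - 2 < clen)
            return -1;       /* cell would overrun the row */
        p += 2 + clen;       /* cell content is never inspected */
        n++;
    }
    return n;
}
```

Neither function looks inside a cell, which is the point of the format: skip_row() moves on to the next row using only the row length, and count_cells() steps from one cell length to the next.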
-- Džen