Hey,

On 6 June 2011 18:19, Džen <yvl...@gmail.com> wrote:
> I was wondering about which way would be the easiest/simplest to
> serialize data, f.e. being read via a file or stdin (data being a
> table of x rows and y columns, each cell a string). I thought of
> using NULL bytes as cell delimiters and newline characters as row
> delimiters. This way it wouldn't be possible to use \0 nor \n
> inside the "cells", but I couldn't think of a simpler solution.

It ultimately depends on the use case. If you don't need \0 or \n in
cells, your format is fine. If you do, there are two approaches.

As Dieter suggested, you can use fixed-length fields. This is great if
you have a maximum cell width, especially if that width is small or
most fields use most of the space. This approach is used in, for
example, tarballs' filename fields.

However, if the cells vary dramatically in length, and the maximum is
rather large, a better alternative is length-prefixing, using a number
of bytes according to how large you expect your rows and cells to be:

    0x000d 0x0006 "hello"\0 0x0007 "world!"\0

That is, a 2-byte row length followed by two cells, each with a 2-byte
cell length. (I've null-terminated the strings in the example, so each
cell length counts the trailing \0, and the row length 0x000d is the
sum of the two cell lengths.) You may need 4- or 8-byte lengths if
your data is very long.

The benefit of this is that you can check the row length and jump
straight to the next row, or carry on into the row and iterate over
its cells. It is also completely independent of content: you can store
anything. The problem with using ASCII delimiters is that you can't
store binary data, and you have to scan every byte of every cell to
find where it ends. It's a hassle; length-prefixing is way easier.
(This approach is very often used in binary protocols, such as 9P and
Sam.)

Thanks,
cls