On Thu, 2009-09-10 at 17:36 -0400, Ben Scott wrote:
> Just for the sake of example: Bruce said 160 MB of data.  Let's
> assume it's all 4-byte integers.  That's roughly 42 million integers.
> Calling sprintf() and sscanf() 42 million times is going to slow
> things down.  Likewise, if we assume a newline separated format and
> all significant digits used, an ASCII representation is going to use
> 11 bytes per integer, turning 160 MB into 440 MB.
>
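Those size figures are easy to check.  Here is a minimal sketch, assuming
non-negative values that all use ten decimal digits plus a newline, and
taking 1 MB as 2**20 bytes.  The sample list and the helper names
binary_size and ascii_size are made up for illustration; they are not
from Ben's message.

```python
import struct


def binary_size(values):
    """Bytes needed to store the values as packed 4-byte signed integers."""
    return len(struct.pack("<%di" % len(values), *values))


def ascii_size(values):
    """Bytes needed to store the values as decimal text, one per line."""
    return sum(len(str(v)) + 1 for v in values)  # +1 for the newline


# Hypothetical sample: every value uses all ten significant digits.
sample = [2147483647] * 1000

bin_bytes = binary_size(sample)
txt_bytes = ascii_size(sample)
ratio = txt_bytes / float(bin_bytes)

# Scale the per-value figures up to the 160 MB from the quoted message.
total_mb = 160
count = total_mb * 2**20 // 4  # number of 4-byte integers in 160 MB
print("integers in %d MB: about %.1f million" % (total_mb, count / 1e6))
print("ASCII size: about %d MB (%.2f times the binary)" % (total_mb * ratio, ratio))
```

Under these assumptions the text form comes out at 2.75 times the packed
form, which matches the 160 MB to 440 MB estimate.  A leading minus sign
would add one more byte per value.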
The ASCII is close to triple the binary in size (2.75 times, going by
the numbers above).  That could be bearable in most situations.  It
should also compress fairly well.

The conversions are trivial to code and run in your favorite scripting
language.  I used Python (see below).  Round trip time for 10 million
floats (doubles) was about 45 seconds: 32 seconds to convert the floats
to strings and 13 seconds to convert them back.  Integers would have
been quicker.  Presumably C would be faster, at the cost of a bit more
code and complexity.

Scaling that up to 42 million values, Bruce would be looking at about
3 minutes of processing time if his hardware matched mine.

I'm not second-guessing Bruce's decision here.  It's all about getting
the most out of your time using the available tools.

>>>> Python Code >>>>>>
# Python 2 session; random was imported earlier and now is presumably
# datetime.datetime.now
In [11]: m10 = 10 * 1000 * 1000
# easier on the eyes than a long list of 0s

In [12]: m10
Out[12]: 10000000

In [17]: f_list = [random.random()*20 for x in xrange(m10)]
# force some of the random numbers to be greater than 1

In [24]: now();s_list = map(repr, f_list);now()
Out[24]: datetime.datetime(2009, 9, 11, 10, 12, 22, 549050)
Out[24]: datetime.datetime(2009, 9, 11, 10, 12, 54, 261281)
# created 10,000,000 float strings in 32 seconds

In [25]: now();f2_list = map( float, s_list); now()
Out[25]: datetime.datetime(2009, 9, 11, 10, 13, 11, 215100)
Out[25]: datetime.datetime(2009, 9, 11, 10, 13, 24, 218123)
# converted 10,000,000 strings to float in 13 seconds

In [26]: f_list[:10]
Out[26]:
[3.2547270222254054,
 4.1187838723903596,
 19.029531987086656,
 14.980165347124705,
 2.1337003969489698,
 8.2395337150073527,
 4.7579966946618608,
 0.88969361970157923,
 9.5651010251147905,
 16.707563948930382]

In [27]: f2_list[:10]
Out[27]:
[3.2547270222254054,
 4.1187838723903596,
 19.029531987086656,
 14.980165347124705,
 2.1337003969489698,
 8.2395337150073527,
 4.7579966946618608,
 0.88969361970157923,
 9.5651010251147905,
 16.707563948930382]

In [28]: from itertools import izip

In [29]: any(f1-f2 for (f1,f2) in izip(f_list, f2_list))
Out[29]: False
# all differences were 0 so the round trip processing was correct
<<<<<<< end of python code <<<<<<<<

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/