kevinfishburne wrote:
> Doriano Blengino wrote:
>   
>> Read the documentation about the WRITE instruction carefully - you will 
>> find the explanation in the first few lines. When you write something out, 
>> you can specify the length of the data you write. If you don't specify 
>> it, Gambas will do it for you, and this is what you don't want.
>>
>>     
>
> I did exactly that last night, and also found the String$ function, which
> was helpful in creating the string with all the zeros. I can now create a
> zero-value 8 GB file in about 60 seconds. My roots are in QuickBASIC 4.5,
> but thankfully my Gambas skills are improving the more I use it. This forum
> is probably the most helpful forum in forum history as well.
>
>
> Doriano Blengino wrote:
>   
>> I don't know the algorithm you want to use on so much data, but if you 
>> look deeply at it, maybe you will find some shortcut to minimize reads 
>> and writes from the file. You could read data in chunks, calculate 
>> intermediate results in RAM, and so on. If you try to randomly read 
>> 65537*65537 small pieces of data, your program will take forever to run, 
>> because the overhead of input/output will be imposed on every two bytes. 
>> By reading 4 bytes at a time instead of two, you will gain twice the 
>> speed, and so on. If the calculation is heavy, then there will be a point 
>> of best compromise between the cost of I/O and the cost of calculation. 
>> If the calculation is very, very heavy, then the I/O time will be
>> negligible.
>>
>> Another nice thing could be to compress data to and from disk, if data 
>> can be managed by blocks. You could even reduce data to 1/10, and this 
>> would be a big improvement.
>>
>>     
>
> Those are accurate observations and my main devil at this point. Here's an
> explanation of the algorithm:
>
> http://www.gameprogrammer.com/fractal.html#diamond
> http://en.wikipedia.org/wiki/Diamond-square_algorithm
>
> Basically think of the file as a square grid of byte pairs (shorts), each
> representing an elevation point on a map. Each pair of bytes is read
> directly into a short for processing as necessary. Initially the four corner
> points are read and the algorithm is applied, generating the center point
> which is then written back to the file (figure b in the first link).
>
> The second part of the algorithm uses these five points plus four points
> initially outside the file to generate four more points (figure c in the
> first link). It's the same as the first part of the algorithm, but rotated
> 45 degrees and performed four times in different areas.
>
> After the two algorithm variations are applied they are repeated at
> half-scale across all the points, exponentially increasing the number of
> iterations per pass (1x1, 4x4, 16x16, 64x64, etc.).
>
> The only thing I can think to do to increase I/O efficiency without adding
> more overhead to the number crunching (which is already significant) would
> be to assign each read point to a small array of shorts so the points would
> not have to be read from disk more than once per pass.
>
> Another idea would be to wait for the subdivisions to become small enough
> that one could be loaded at once into a huge array of shorts (2 GB, for
> instance), then operate solely on that chunk until it was finished and
> commit it back to disk. Because of the 45-degree angle in the second part
> of the algorithm, however, I'd still have to read some bytes that were
> outside the array. That would also disrupt the nice nested FOR...NEXT loop
> I have going to control the iterations.
>   
This is exactly what I thought at first. The algorithm starts by 
examining points far apart from each other, but then, halving each time, 
moves to smaller and smaller squares. At a certain point the squares get 
small enough to be kept easily in RAM; say, a 1024*1024 square. At that 
point, if I understand correctly, there is no "pollution" from the 
surrounding points, so you can use an array and call a routine on it 
recursively, passing the four vertices.
The subroutine could look something like this:

    sub square_diamond(left as integer, right as integer, top as integer, bottom as integer)

       dim half as integer   ' half the side of this square ("step" is reserved in Gambas)

       half = (right - left) / 2
       if right - left <= MAX_ARRAY_SIZE then
          ' use internal cached data

          ' but first load data from disk if it is the first time
          if right - left = MAX_ARRAY_SIZE then load_array_from_disk(...)

          cache[left + half, top + half] = (blah + blah + blah + ...) / 4
       else
          ' seek in the disk file, load 4/8 values, calculate, and write back to disk
       endif

       if half > 1 then
          ' apply this algorithm in the four resulting squares
          square_diamond(left, left + half, top, top + half)        ' upper left
          square_diamond(left + half, right, top, top + half)       ' upper right
          square_diamond(left, left + half, top + half, bottom)     ' lower left
          square_diamond(left + half, right, top + half, bottom)    ' lower right
       else
          ' finished with this square. If this is the lower-rightmost
          ' square of the cache, write it back to the disk
          ' sorry... too lazy to continue this...
       endif

    end

In the above code there are still omissions (the actual averaging, the 
disk I/O), but this is the whole idea.
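For what it's worth, here is the same in-RAM idea as a small, self-contained 
Python sketch (Python rather than Gambas, so take it as an illustration 
only). It keeps the whole grid in memory and uses floats instead of shorts; 
the `roughness` and `seed` parameters are my own additions, not something 
from the discussion above:

```python
import random

def diamond_square(n, seed=0, roughness=0.5):
    """Return a (2**n + 1) x (2**n + 1) heightmap built entirely in RAM.

    Simplified illustration: floats instead of shorts, no disk paging.
    """
    size = 2 ** n + 1
    rng = random.Random(seed)
    grid = [[0.0] * size for _ in range(size)]

    # Seed the four corner points.
    for y in (0, size - 1):
        for x in (0, size - 1):
            grid[y][x] = rng.uniform(-1, 1)

    step = size - 1
    scale = 1.0
    while step > 1:
        half = step // 2

        # Diamond step: the centre of each square becomes the mean of
        # its four corners, plus some noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                avg = (grid[y - half][x - half] + grid[y - half][x + half] +
                       grid[y + half][x - half] + grid[y + half][x + half]) / 4
                grid[y][x] = avg + rng.uniform(-scale, scale)

        # Square step: each edge midpoint becomes the mean of its
        # (up to four) diamond neighbours, plus noise.  Border points
        # simply have fewer neighbours, which sidesteps the
        # "points outside the file" problem in the crudest way.
        for y in range(0, size, half):
            for x in range((y + half) % step, size, step):
                total, count = 0.0, 0
                for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < size and 0 <= nx < size:
                        total += grid[ny][nx]
                        count += 1
                grid[y][x] = total / count + rng.uniform(-scale, scale)

        step = half
        scale *= roughness

    return grid
```

On a real 65537*65537 file the grid would of course not fit in RAM; the 
sketch only shows the diamond and square steps themselves, the part that 
would run on each cached sub-square.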

Hope this helps - I cannot think of anything better at the moment. 
Sometimes one can try to imagine the data in a totally different way; for 
example, are we sure we must see the file as 65537 rows, each of 65537 
columns? Could it be that seeing it as a series of bigger and bigger frames 
(1 dot, the central one; then the 8 surrounding dots, starting from the 
top; then the next frame, and so on) would help? Probably not, but this is 
just a quick attempt at a different arrangement.

Regards,
Doriano

_______________________________________________
Gambas-user mailing list
Gambas-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gambas-user
