Nice job! What version of VMWare are you using? Did you check to see if
the source code that comes with the Linux distribution of VMWare (came
with my version anyway) has the source code for reading/writing to the
disk images? I wonder if they used VMWare as an emulator, or even used
Bochs, while developing their algorithm (to test it). It seems to me
that this method could be used for a word processor too, to manage
memory.
The format I mentioned earlier might be better, however:
Let's say your linux partition is 1GB, and you want to make a disk image
but you aren't sure how big to make it. Make it dynamic like this:
- Have two files. One file would contain a series of INDEXes and the
other would contain a series of CLUSTERs.
- Choose a cluster size (in sectors), a word size (in bytes), and a
sector size (in bytes). For this example, let me say that you have
chosen a cluster size of 1 sector, a word size of 4 bytes, and a sector
size of 512 bytes.
- The index file contains a series of words (4-byte entries). Each word
contains a cluster number. The first word is always zero.
- To write to the first sector, you look at the 0th word of the index
file, and see that it contains a 0. Multiply that by the number of
bytes per cluster (the product of the number of sectors per cluster and
the number of bytes per sector). That is the offset of the first sector
of the first cluster of the CLUSTER file. If there were 2 sectors per
cluster, then the next sector would be found immediately after it. What
about sector number 3 when there are only 2 sectors per cluster? That's
not necessarily immediately after the sector before it, in the file. To
find where it is, look at the second word of the index file. It contains
the CLUSTER OFFSET in the data file, for the corresponding CLUSTER.
- This has the benefit of only allocating disk space for sectors
(actually clusters) that are really used.
- Unlike the VMWare format, this does not require you to decide in
advance how much you might use as a maximum. By using a different file
for the index, the number of indexes can grow as necessary.
- It did not occur to me to use a third level. Seems quite useless.
So, to find out where in the cluster file (that file contains the actual
sectors) to look for a certain sector, first find out what cluster it
belongs to like this:
WhichClusterItBelongsTo = WhichSectorItIsWhereTheFirstIsZero /
NumberOfSectorsPerCluster;
Then, having the cluster it belongs to, multiply that by the word size
and look at that offset in the index file. Note that this is analogous
to the hardware paging system but it borrows idea from the FAT system
and takes advantage of the fact that files can grow in size (so we don't
have to decide in advance how many sectors to use; it's best to grow in
CHUNKS to reduce fragmentation, however).
BufferThatHasTheIndexFileInMemory is an unsigned *.
ClusterToReallyUse =
BufferThatHasTheIndexFileInMemory[WhichClusterItBelongsTo];
Now multiply that by the number of BytesPerCluster, and give the user
the sector in the cluster file at that (byte) offset. That's how you
read from clusters that already exist. To read from nonexistent
clusters, just return all 0's (pretend it's been preformatted).
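A minimal sketch of that read path in C (the names read_sector, index_buf
and cluster_file are mine, not from any real implementation; this assumes
the whole index file has been loaded into memory):

```c
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE          512
#define SECTORS_PER_CLUSTER  1
#define BYTES_PER_CLUSTER    (SECTOR_SIZE * SECTORS_PER_CLUSTER)
#define UNALLOCATED          0xFFFFFFFFu  /* -1: no cluster allocated yet */

/* index_buf[i] = physical cluster number, in the cluster file, of
   logical cluster i (the index file, loaded into memory). */
static unsigned *index_buf;
static unsigned  index_entries;  /* how many words the index file holds */
static FILE     *cluster_file;   /* the file holding the actual clusters */

/* Read one 512-byte sector of the image into buf. */
void read_sector(unsigned sector, unsigned char *buf)
{
    unsigned logical_cluster = sector / SECTORS_PER_CLUSTER;

    /* A sector past the end of the index, or inside a -1 cluster,
       reads as zeros (pretend the disk was preformatted). */
    if (logical_cluster >= index_entries ||
        index_buf[logical_cluster] == UNALLOCATED) {
        memset(buf, 0, SECTOR_SIZE);
        return;
    }

    unsigned physical_cluster = index_buf[logical_cluster];
    long offset = (long)physical_cluster * BYTES_PER_CLUSTER
                + (long)(sector % SECTORS_PER_CLUSTER) * SECTOR_SIZE;
    fseek(cluster_file, offset, SEEK_SET);
    fread(buf, 1, SECTOR_SIZE, cluster_file);
}
```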
We should use some kind of compression like this: if all bytes of all
sectors of a cluster are the same, then when more clusters are made
that are filled with that exact same byte, make them POINT to the
former one. This way if the USER fills 1 gigabyte's worth of ÷'s
(that's the character DOS FORMAT fills things with), we won't really
use that much space. Then if the similarity ceases, we 'unlink' the
cluster by duplicating it and letting the change happen to only one
copy. Kind of like a 'sector link'.
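A tiny helper for spotting such a uniform cluster before deciding whether
to share it (a sketch; the function name is mine):

```c
#include <stddef.h>

/* Return nonzero if every byte of the cluster equals the first byte,
   e.g. a cluster full of the 0xF6 filler that DOS FORMAT writes. */
int is_uniform_cluster(const unsigned char *cluster, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (cluster[i] != cluster[0])
            return 0;
    return 1;
}
```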
I just thought of something! This is pointless if we use some kind of
disk compression. Does Linux support it? I think Windows 2000 does but I'm not
sure.
How to WRITE. Eventually the user will write to a sector for the first
time. This will mean GROWING the data file and index file at first. We
could do that right now: make it so that the disk image file
automatically grows. I bet it already does that for DOS.
But what if the user deletes a 200MB file? If we use intelligence, we
could monitor all disk accesses and if we tell the monitor what
filesystem we are using, then direct disk accesses could be converted
more or less to host-level file operations. This would be especially
easy for the FAT filesystem. Then if a 200MB file gets deleted that much
disk space would become available to the host as well as the guest; thus
also to other guests.
But what if it's an unknown filesystem? Then we consider what would
happen if a region of disk is SKIPPED. What if there are two partitions
created and the second partition, which is 10MB, is filled to the gills,
but the 200MB partition before it isn't used at all? If we only grow the
cluster/image file, then 210MB will be used up and taken away from the
host and other guests.
But if instead we use the method explained above, we would do this on
the first write to the 10MB partition which is really at offset 200MB in
the disk due to the partition before it: extend the INDEX file to 200MB
worth of cluster indexes. If each cluster is 4KB in size and each sector
is 512 bytes, and we are using a 32-bit integer for the cluster indexes,
this means that on that first write, not 200MB, but instead
200*1024*1024/4096*4 bytes, would be used, at first, for the index file.
That would take up only 200KB. That's a great savings. The first 200MB
worth of indexes (that is, the first 200*1024*1024/4096 = 51200 indexes)
would all be filled with -1 [all bits set] which would mean that no
space is allocated for those clusters. Then the next integer, would
contain a 0 in it, meaning the first 4096/512= 8 sectors can be found in
the first 512*8 bytes of the DATA file. Then if the second sector of the
10MB partition is used, it would be found at offset 512 in the data
image, on to the 8th sector. For the 9th sector (sector number 8) we
would use cluster number 1. That would mean extending the index file by
4 bytes, and we would write a 1 in there, meaning cluster 1 is mapped to
cluster 1 in the cluster image file. Et cetera. That is, unless we ever
add the compression I mentioned above, or unless we recognize when a
file is deleted and free it from our image file too (as I said, that's
easy for FAT12, which is used for most floppy disks, and also FAT16 and
FAT32); OR:
What if the user decides to write to a sector in the first cluster of
the LOGICAL image? Meaning, suppose they write to the first sector of
their 200MB partition. Then we look at the corresponding entry in the
index file and see the -1 (more precisely, (unsigned)(-1): cluster
entries are unsigned, and the all-bits-set value is reserved to mean
'unallocated').
Then we realize that we need to FRAGMENT. To do that, extend the data
file as though you were growing it and let the user use the next sector
in it. This won't get in the way of the user's next LOGICAL sector of
their 10MB partition, *if* we do this:
- Make the first cluster entry (corresponding to the first logical
cluster) equal to the number of clusters we have allocated space for, in
the cluster image file (CIF??). Thus if the user has used up the entire
10MB partition and then they use the first sector of the 200MB
partition, we would change the first index of the index file, from -1,
to 10*1024*1024/4096 = 2560. Then when the user accesses sectors number
0..7 of their disk, they will really be accessing the 512-byte block
numbers 2560*8..2560*8+7 of the actual file. So they will think
they're accessing the first sectors when really they're accessing the
last sectors.
Then, suppose the user creates a 1-sector partition (hehe) after the
10MB partition. Then what? We need to find the next free cluster. We
will have a pointer to it stored somewhere along with the number of
clusters actually used and the number of clusters that are allocated but
not used (that are -1). Just use the next free cluster number. Then
update the next free cluster number like this: if using the cluster
after the one just used means growing the file, do it. Otherwise, if
it's already allocated, is it available (-1)? If so, use it. Otherwise,
if the count of clusters that are allocated but not used is nonzero,
scan the entire index file for the next -1; when we find one, decrease
that count and use it. And if that count of allocated-but-available
clusters is 0? Then grow.
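The allocation policy above, sketched in C (all names are mine; note that
I've swapped the scan of the index file for an explicit free list just to
keep the sketch short):

```c
/* In-memory copy of the index file: index_buf[lc] = physical cluster. */
static unsigned *index_buf;
static unsigned  clusters_in_file; /* physical clusters the data file holds */
static unsigned *free_list;        /* physical clusters no longer referenced */
static unsigned  free_count;       /* how many entries free_list holds       */

/* Map logical cluster lc to a physical cluster: reuse a freed cluster
   when one exists, otherwise grow the data file by one cluster. */
unsigned allocate_cluster(unsigned lc)
{
    unsigned pc;
    if (free_count > 0)
        pc = free_list[--free_count]; /* reuse instead of growing */
    else
        pc = clusters_in_file++;      /* append at the end of the data file */
    index_buf[lc] = pc;               /* record the mapping */
    return pc;
}
```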
That is all.
-
As for using VMWare's format:
I could help you out, as I know C and C++, but to learn more about the
format I'd need to get Linux running *inside* Bochs. Unfortunately, I
don't know enough about Linux to make a distribution that can compile
itself and run VMWare.
[snip]
> Here's a Delphi listing of the header-record.
> A LongWord is a UINT32
>
> TVMBootRec = packed record
> // Magic: always 'COWD' or 1146572611, stands for COWDisk (where COW is ...?)
> Magic: array[0..3] of Char;
>
> // HeaderVersion: currently 1 (build 799, version 2.0.3)
> HeaderVersion,
>
> // Flags. ?? Values other than 3 get response.
> // b7..b0: b0: 1 = 'Root', 0 = child-node disk (??)
> // b1: 1 = 'Check capable'. Seems to get set automatically even if 0
> // b2: 1 = 'Inconsistent'. Forces check. Gets cleared on chk & fix.
> // No more flags: every value above 7 gives bugreport-request.
> Flags,
>
> // TotalSectors: projected disksize / max accessible sectors by VM
> TotalSectors,
>
> // leafSectors: as from log: "Size of 3rd-level blocks: 32 sectors"
> // # of sectors allocated per chunk, mostly 32 = 16k
> leafSectors,
>
> // OffsetToRoot: in sectors, 0-based. Normally 4 (= 5th sector = 0xA00)
> OffsetToRoot,
>
> // numRootEntries
> // Deduced from the logs:
> // "numRootEntries = 256, numSectors = 4192020, leaf = 16384"
> numRootEntries,
>
> // DiskFileSize: Size of diskfile in sectors, or, as VMWare logs it:
> // "Useless offset of the next sector to allocate: x sectors"
> DiskFileSize,
>
> Cylinders,
> HeadsPerCylinder: LongWord;
> SectorsPerTrack: Word;
>
> // Fill1: Blank or filled. VMWare doesn't seem to mind.
> Fill1: array[ 1.. ( $420-$2A ) ] of Byte;
>
> // ParentNodeTimeStamp: timestamp, purpose not known.
> // Note: logfiles also state a ParentNodeFilename, but VMWare doesn't load
> // this from any data described here.
> ParentNodeTimeStamp: LongWord;
>
> // LastModification, timestamp. "Last modification timestamp". Format not yet known
> LastModification: LongWord;
>
> // Note: from the VMWare-logs it seems here are an additional
> // "Node file name" and "Node description". Purpose not known/tested.
> // Shouldn't be here if Root-bit is set.
> NodeFileName: array[1..60] of Char; // *should be* 0-terminated. VM doesn't check.
> NodeDescription: array[1..512] of Char; // ditto
>
> // SavedTimeStamp: "Saved modification timestamp". Format not yet known
> SavedTimeStamp: LongWord;
> // DriveType
> DriveType: Array[0..3] of Char; // 'ide', 0 or 'scsi'
>
> // Fill2: Blank/filled (but if 1st byte filled and drivetype = 4 chars, panic)
> Fill2: array[ 1.. ( $800-$66C ) ] of Byte;
>
> // Header ends here. The above is what's created on an empty disk,
> // the data below gets added when needed (but who wants to do without
> // at least 1 root and 1 branch? ;)
> // These sizes are for my test-image. In reality they depend on
> // numRootEntries and ?? (not leafSectors, that is used for the leaves).
> // Looks like the size of a branch is hardcoded.
>
> // RootEntries: array[ 1.. (( $A00-$800 ) div SizeOf(LongWord)) ] of LongWord;
> // FirstBranch: array[ 1.. (( $1200-$A00 ) div SizeOf(LongWord)) ] of LongWord;
> end;
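For what it's worth, here's my C translation of that record (untested
against a real image; I've kept the TVMBootRec name and field names, and
the layout assumes the compiler honors #pragma pack, so sizeof should
come out to 0x800 bytes):

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    char     Magic[4];             /* 'COWD'                              */
    uint32_t HeaderVersion;        /* currently 1                         */
    uint32_t Flags;                /* b0 root, b1 check-capable, b2 dirty */
    uint32_t TotalSectors;         /* max sectors accessible by the VM    */
    uint32_t leafSectors;          /* sectors per chunk, mostly 32 = 16k  */
    uint32_t OffsetToRoot;         /* in sectors, 0-based, normally 4     */
    uint32_t numRootEntries;       /* e.g. 256                            */
    uint32_t DiskFileSize;         /* size of the disk file in sectors    */
    uint32_t Cylinders;
    uint32_t HeadsPerCylinder;
    uint16_t SectorsPerTrack;
    uint8_t  Fill1[0x420 - 0x2A];  /* padding, content ignored            */
    uint32_t ParentNodeTimeStamp;
    uint32_t LastModification;
    char     NodeFileName[60];     /* should be 0-terminated              */
    char     NodeDescription[512]; /* ditto                               */
    uint32_t SavedTimeStamp;
    char     DriveType[4];         /* 'ide', 0 or 'scsi'                  */
    uint8_t  Fill2[0x800 - 0x66C]; /* padding up to the 2KB header end    */
} TVMBootRec;
#pragma pack(pop)
```

The offsets line up with the two fill arrays in the Delphi declaration:
ParentNodeTimeStamp lands at 0x420 and Fill2 at 0x66C.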
>
> ----
> Paul te Bokkel
> Sphere Information Technologies BV
> PGP key available and PGP preferred