Gerson Kurz wrote:

> I've made some headway and can now both list (most) directories, and
> retrieve file data of ReiserFS partitions from Windows NT/2000. Description,
> Binary and Sourcecode can be found here:
>
> http://p-nand-q.com/reiser4win.htm
>
> It works, at least on my partitions. I'd be pleased to know from you if it
> works for you or where you have problems.
>
> I've also extended my "ReiserFS manual" a lot, see here:
>
> http://p-nand-q.com/reiserfs/reiserfs.htm
>
> It contains several things I didn't find out:
>
> For example, I still don't know what to make of "k_offset". It is not used
> in my code, and still I seem to be able to retrieve files and list
> directories. Where it "should" be used (for example, for directory listings
> spanning multiple blocks) I can do without it and just always use the next
> node to the right. What's going on here?

Without the k_offset component of the key, the reiserfs tree could not even
be built.

For example, a 1GB file takes 262144 (256K) blocks of 4KB, and thus the
corresponding indirect item will hold 262144 pointers of 4 bytes each.

So the indirect item itself will take 1MB, or 256 blocks. To build the
tree, we have to assign a unique key to each of these 256 blocks.

Without the k_offset field, all of these keys would be the same.
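As a rough check of the arithmetic above, here is a small Python sketch. It
assumes 1GB means 2**30 bytes and ignores block and item headers, so the
real on-disk numbers will be slightly different. The constant names are made
up for this example.

```python
BLOCK_SIZE = 4096        # 4KB blocks, as in the example above
POINTER_SIZE = 4         # bytes per pointer to an unformatted node
FILE_SIZE = 1 << 30      # assumption: 1GB taken as 2**30 bytes

# Number of unformatted nodes that hold the file data.
data_blocks = FILE_SIZE // BLOCK_SIZE
# Total size of the pointers to those nodes.
pointer_bytes = data_blocks * POINTER_SIZE
# Blocks needed just to store the pointers (headers ignored).
pointer_blocks = pointer_bytes // BLOCK_SIZE

print(data_blocks, pointer_bytes, pointer_blocks)  # 262144 1048576 256
```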

If we want to read 100 bytes from the middle of this 1GB file, we can do
it in reiserfs by reading just a few blocks: root, internal, leaf and
unformatted node. To find the proper internal and leaf nodes we have to use
the k_offset field.

All indirect items are sorted by key, but in this case by k_offset only,
because the other 3 components of all keys are the same for this indirect
item.
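The lookup described above can be sketched as a binary search over keys
that differ only in k_offset. This is a toy model, not the reiserfs code:
the tuple layout, the id values, the 0-based byte offsets and the
"1024 pointers per block" split are all assumptions made for illustration.

```python
import bisect

BLOCK_SIZE = 4096
POINTERS_PER_BLOCK = 1024                         # assumption: headers ignored
BYTES_PER_KEY = POINTERS_PER_BLOCK * BLOCK_SIZE   # file bytes covered per key

# Hypothetical key fields; only k_offset differs between the 256 keys.
DIR_ID, OBJECT_ID, ITEM_TYPE = 2, 100, 1

# Toy key layout: (dir_id, object_id, k_offset, item_type), sorted ascending.
keys = [(DIR_ID, OBJECT_ID, i * BYTES_PER_KEY, ITEM_TYPE) for i in range(256)]

def find_pointer(byte_offset):
    """Return (index of the key covering byte_offset, pointer slot in it)."""
    probe = (DIR_ID, OBJECT_ID, byte_offset, ITEM_TYPE)
    # Last key whose k_offset is <= the requested offset.
    idx = bisect.bisect_right(keys, probe) - 1
    slot = (byte_offset - keys[idx][2]) // BLOCK_SIZE
    return idx, slot

middle = 512 * 1024 * 1024
print(find_pointer(middle))  # (128, 0)
```

With k_offset removed from the tuples, all 256 keys would compare equal and
the search could not tell the blocks apart.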

We have a similar problem with a big directory that occupies a set of blocks.
The Dir_id, Object_id and Item_type fields of all keys are the same for
each leaf node here. The keys differ only in their k_offset field.

For directories, though, we use hash_value as k_offset:
k_offset = hash_value = hash(name) & 0x7fffff80;

As you can see, the highest bit and the lowest 7 bits are zeroed.
The lowest 7 bits are used for a generation counter, which is used in case
of hash collisions.

So, if 2 or more filenames have the same hash value, the generation
counter is used to tell them apart. Currently there can be no more than
128 names with the same hash value in one directory in reiserfs.
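To illustrate the bit layout, here is a minimal Python sketch of the
masking. The mask 0x7fffff80 is the one given above; putting the generation
counter into the low 7 bits with a bitwise OR is an assumption based on the
description, and the function names are made up for this example.

```python
HASH_MASK = 0x7fffff80   # mask from the text: highest bit and low 7 bits zeroed
GEN_MASK = 0x0000007f    # low 7 bits, assumed to hold the generation counter

def make_k_offset(raw_hash, generation):
    """Combine a raw 32-bit hash and a generation counter (assumed layout)."""
    if not 0 <= generation <= GEN_MASK:
        raise ValueError("generation counter does not fit in 7 bits")
    return (raw_hash & HASH_MASK) | generation

def split_k_offset(k_offset):
    """Return (hash_value, generation) for a directory k_offset."""
    return k_offset & HASH_MASK, k_offset & GEN_MASK

first = make_k_offset(0xDEADBEEF, 0)    # first name with this hash
second = make_k_offset(0xDEADBEEF, 1)   # a colliding name
print(hex(first), hex(second))          # 0x5eadbe80 0x5eadbe81
```

Both values share the same hash_value and differ only in the counter, which
can take 128 values (0 to 127).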

All names in a directory are sorted by hash, so when you know the hash value
you can find the block (leaf node) that contains the name corresponding to
that hash. After that, the name itself can be found inside the block.

Again, we have to read no more blocks than the height of the tree to find
any filename in a directory of any size.
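A directory lookup along these lines can be sketched in Python as follows.
This is only a toy model: CRC32 stands in for the real reiserfs hash
function, leaves are plain lists of (k_offset, name) pairs, and the names
build_leaves and lookup are hypothetical.

```python
import bisect
import zlib

HASH_MASK = 0x7fffff80

def name_hash(name):
    # Stand-in hash (CRC32), NOT one of the real reiserfs hash functions.
    return zlib.crc32(name.encode()) & HASH_MASK

def build_leaves(names, per_leaf=4):
    """Sort entries by k_offset and cut them into fixed-size toy leaves."""
    used = {}
    entries = []
    for name in names:
        h = name_hash(name)
        gen = used.get(h, 0)          # next free generation counter
        assert gen < 128, "too many collisions for one hash value"
        used[h] = gen + 1
        entries.append((h | gen, name))
    entries.sort()
    return [entries[i:i + per_leaf] for i in range(0, len(entries), per_leaf)]

def lookup(leaves, name):
    """Return the k_offset stored for name, or None if it is absent."""
    h = name_hash(name)
    first_keys = [leaf[0][0] for leaf in leaves]
    # Last leaf whose first key is <= h; this plays the role of the tree walk.
    start = max(bisect.bisect_right(first_keys, h) - 1, 0)
    for leaf in leaves[start:]:
        for k_offset, entry_name in leaf:
            if k_offset & HASH_MASK > h:
                return None           # passed every entry with this hash
            if k_offset & HASH_MASK == h and entry_name == name:
                return k_offset
    return None

names = ["file%d.txt" % i for i in range(20)]
leaves = build_leaves(names)
print(lookup(leaves, "file7.txt") is not None, lookup(leaves, "missing"))
```

Only the leaf picked by the hash (and, for collisions that cross a leaf
boundary, its right neighbour) has to be scanned for the name.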

Thanks,
Yura.


>
>
> I cannot do links. I think I understand how links work (hardlinks at least)
> but I've got to make them work.
>
> Some "very" large files I seem unable to retrieve. I would guess that, just
> like with large directories, the right adjectant (if that word exists) nodes
> of the first indirect item are used for more indirect items, but somehow
> something seems to go wrong there.
>
> My code doesn't use hashes. Somehow, they just don't come up. (I saw that
> the "third part" of the file key is the hash, but it took me some hours
> searching the code until I found out that you use just some bits of the
> hash. Arg.)
>
> The documentation is missing the "root key" bit, I'll add that soon.
>
> I would be very thankful for corrections to that manual in terms of helping
> me understand what goes on. Please, have patience with me, I come from a
> windows background and am not a native speaker of english ;)
>
> Bye, Gerson Kurz
