On 05/21/2010 08:03 AM, Thomas Berg wrote:
> As I have followed some threads lately and also many
> times earlier about the file system in general and PDS(E)s
> in particular I got an (maybe OT) idea. (Beware! :) )
>
> The goal is to be able to have any string in general as
> a "data set name"/file name and in particular *nix type
> name/structure. And that in z/OS native.
>
> As we have 44 bytes available in the catalog(s), I think we can
> do something like this:
>
> - Use 16 bytes for a hash (MDM5/SHA-2) of the file name.
> - Use 16 bytes for a hash of the "dir path".
> - Alternatively using 16 bytes hash for the whole string of
>   path and file name. But I think that separate hashes would
>   get some performance benefits when handling "directorys".
> - We must probably use an (initial?) byte with e g nulls for avoiding
>   collision with the old data set names.
> - As an additional option we could use maybe 4 bytes for file
>   version/generation handling.
>
> E g we have a file "Just a test etc." which have the hash
> x'F296A5AE68F284954EBF47EC5EEFD72E', in the "dir"
> "/First level dir/Second level dir/" which have the hash
> x'847FD35FD88274EC0EDA528E7CD7A65A'.
>
> So the 44 (?) bytes would maybe look like:
> x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A0000000000000000000000'.
>
> Enq:s would then also have the hash as keys etc.
>
>
> Am I an idiot or just a typical programmer ? :)
> Regards,
> Thomas Berg
> _________________________________________
> Thomas Berg Specialist A M SWEDBANK ...

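For concreteness, the 44-byte layout proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: it takes the "MDM5" in the quoted post to mean MD5, it encodes the names as UTF-8 (a z/OS implementation would presumably hash EBCDIC bytes), and the function name catalog_key is hypothetical. The digests it produces will therefore not match the example values in the quoted post.

```python
import hashlib

def catalog_key(dir_path: str, file_name: str) -> bytes:
    """Build a 44-byte key: 1 null byte, 16-byte file-name hash,
    16-byte directory-path hash, then null padding up to 44 bytes.
    (Hypothetical helper; MD5 and UTF-8 are assumptions.)"""
    name_hash = hashlib.md5(file_name.encode("utf-8")).digest()  # 16 bytes
    dir_hash = hashlib.md5(dir_path.encode("utf-8")).digest()    # 16 bytes
    key = b"\x00" + name_hash + dir_hash                         # 33 bytes
    # The remaining 11 bytes are left as nulls; the post suggests
    # 4 of them could carry a version/generation number.
    return key.ljust(44, b"\x00")

key = catalog_key("/First level dir/Second level dir/", "Just a test etc.")
print(len(key))
print(key.hex().upper())
```

Note that the key is a fixed 44 bytes no matter how long the path and file name are, which is exactly where the uniqueness question comes in.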
The obvious problem with this, of course, is that hash functions by their very nature (mapping a larger domain to a smaller range) cannot be one-to-one. They work reliably for such things as symbol table lookup only because you also have the full string available, both as the search argument and in the table, in order to resolve collisions when two strings hash to the same value. A hash value unaccompanied by the full string does not represent a unique string and is thus ambiguous.

Most programmers/users would not find it acceptable to request a read/update/enqueue on "file a" and have it give the same results as a reference to some unknown and unrelated "file x", just because they happened to hash to the same value.

If the hash function mapped to an equal or larger range, then it would be possible to have uniqueness; but then the hash value would require at least as many bits to represent it as the original string of symbols, and nothing would be gained by the substitution.

-- 
Joel C. Ewing, Fort Smith, AR        jremoveccapsew...@acm.org

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
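
To illustrate the collision argument made in the reply above, here is a minimal sketch that finds two different names with the same hash. It assumes SHA-256 truncated to 2 bytes as a stand-in for the fixed-width hash, purely so that a collision turns up quickly; the names short_hash and find_collision and the "fileN" naming are hypothetical. With a full 16-byte hash a brute-force search like this would not finish in practice, but the counting argument is the same: there are more possible names than hash values, so distinct names sharing a hash must exist.

```python
import hashlib
from itertools import count

def short_hash(name: str, n_bytes: int = 2) -> bytes:
    """First n_bytes of SHA-256, standing in for a fixed-width hash key."""
    return hashlib.sha256(name.encode("utf-8")).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Try names file0, file1, ... until two of them share a hash."""
    seen = {}  # hash value -> first name that produced it
    for i in count():
        name = f"file{i}"
        h = short_hash(name, n_bytes)
        if h in seen:
            return seen[h], name, h
        seen[h] = name

a, b, h = find_collision()
print(a, b, h.hex())
```

With only 65,536 possible 2-byte values the loop is guaranteed to stop within 65,537 names. Given the hash alone, nothing distinguishes the two names it returns, which is the ambiguity described above.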