On 05/21/2010 08:03 AM, Thomas Berg wrote:
> As I have followed some threads lately and also many
> times earlier about the file system in general and PDS(E)s
> in particular I got a (maybe OT) idea.  (Beware! :) )
> 
> The goal is to be able to have any string in general as
> a "data set name"/file name, and in particular a *nix-type
> name/structure, natively in z/OS.
> 
> As we have 44 bytes available in the catalog(s), I think we can
> do something like this:
> 
> - Use 16 bytes for a hash (MD5/SHA-2) of the file name.
> - Use 16 bytes for a hash of the "dir path".
> - Alternatively using 16 bytes hash for the whole string of
> path and file name.  But I think that separate hashes would
> give some performance benefit when handling "directories".
> - We would probably need an (initial?) byte of e.g. nulls to avoid
> collisions with existing data set names.
> - As an additional option we could use maybe 4 bytes for file
> version/generation handling.
> 
> E.g., we have a file "Just a test etc." which has the hash
> x'F296A5AE68F284954EBF47EC5EEFD72E', in the "dir"
> "/First level dir/Second level dir/" which has the hash
> x'847FD35FD88274EC0EDA528E7CD7A65A'.
> 
> So the 44 (?) bytes would maybe look like:
> x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A0000000000000000000000'.
> 
> ENQs would then also use the hash as keys, etc.
> 
> 
> Am I an idiot or just a typical programmer ?  :)
> Regards,
> Thomas Berg
> _________________________________________
> Thomas Berg   Specialist   A M   SWEDBANK
...
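
A minimal Python sketch of the 44-byte layout proposed above, for illustration only. The function name hashed_name_key, the UTF-8 encoding, and the big-endian 4-byte generation field are assumptions, not part of the original message (a z/OS implementation would more likely hash EBCDIC bytes), so the digests it produces need not match the sample values quoted.

```python
import hashlib

KEY_LEN = 44  # catalog name length cited in the message


def hashed_name_key(file_name: str, dir_path: str, generation: int = 0) -> bytes:
    """Hypothetical key: x'00' marker, MD5 of the file name, MD5 of the
    directory path, a 4-byte generation number, then zero padding to 44 bytes."""
    name_hash = hashlib.md5(file_name.encode("utf-8")).digest()  # 16 bytes
    dir_hash = hashlib.md5(dir_path.encode("utf-8")).digest()    # 16 bytes
    key = b"\x00" + name_hash + dir_hash + generation.to_bytes(4, "big")
    return key.ljust(KEY_LEN, b"\x00")  # remaining 7 bytes padded with x'00'


key = hashed_name_key("Just a test etc.", "/First level dir/Second level dir/")
print(key.hex().upper())
```

With generation 0 the last 11 bytes are zero, matching the shape of the example above.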

The obvious problem with this of course is that hash functions by their
very nature (mapping a larger domain to a smaller range) cannot be
one-to-one.  They work reliably for such things as symbol table lookup
only because you also have the full string available, both as the search
argument and in the table in order to resolve collisions when two
strings hash to the same value.  A hash value unaccompanied by the full
string does not represent a unique string and is thus ambiguous.

Most programmers/users would not find it acceptable to request a
read/update/enqueue on "file a" and have it give the same results as a
reference to some unknown and unrelated "file x", just because they
happened to hash to the same value.
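The same failure can be shown with a hash-only key. This sketch truncates MD5 to 2 bytes so that a collision is guaranteed within at most 65,537 names; the helper names and the dictionary standing in for an ENQ table are hypothetical. A full 16-byte digest makes collisions far rarer, but by the argument above they still exist.

```python
import hashlib
from itertools import count


def short_hash(name: str, nbytes: int = 2) -> bytes:
    # Truncated MD5: a small range makes collisions quick to find; the same
    # pigeonhole argument applies to a full 16-byte digest.
    return hashlib.md5(name.encode("utf-8")).digest()[:nbytes]


def find_collision(nbytes: int = 2) -> tuple[str, str]:
    seen = {}
    for i in count():
        name = f"file {i}"
        h = short_hash(name, nbytes)
        if h in seen:
            return seen[h], name  # two different names, same hash
        seen[h] = name


a, b = find_collision()
held = {}  # hypothetical lock table keyed only by the hash value
held[short_hash(a)] = a  # "enqueue" on file a
print(f"{b!r} is blocked by the lock on {held[short_hash(b)]!r}")
```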

If the hash function mapped to an equal or larger range, then it would
be possible to have uniqueness; but then the hash value would require
more bits to represent it than the original string of symbols and
nothing would be gained by the substitution.
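The counting behind this can be checked directly. The hypothetical helper below, assuming names may use all 256 byte values, computes how many bits a code needs to give every string of length 0 up to a maximum its own distinct value.

```python
def bits_needed(max_len: int, alphabet: int = 256) -> int:
    """Smallest number of bits that gives every string of length 0..max_len
    over an `alphabet`-symbol alphabet its own distinct code."""
    n_strings = sum(alphabet ** k for k in range(max_len + 1))
    return (n_strings - 1).bit_length()  # equals ceil(log2(n_strings))


print(bits_needed(16))  # 129: more than a 128-bit (16-byte) hash can hold
print(bits_needed(44))  # 353: more than the 352 bits of a 44-byte field
```

Even names of at most 16 bytes outnumber the values of a 16-byte hash, so no such hash can be unique over them.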

-- 
Joel C. Ewing, Fort Smith, AR        jremoveccapsew...@acm.org

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html