I have a program that creates various database objects in PostgreSQL. There is a DOM, and for each element in the DOM, a database object is created (schema, table, field, index and tablespace).

I do not want this program to generate very long identifiers: they would increase SQL parsing time, and they don't look good. Let's just say that the limit is 32 characters. But I also want to be able to recognize the original identifiers when I look at their modified/truncated names.

So I have come up with this solution:

- I have restricted original identifiers so that they cannot contain the dollar sign: they may only contain [A-Z], [a-z], [0-9] and the underscore. Here is a valid example:


- I'm trying to use a hash function to reduce the length of the identifier when it is too long:

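import base64
import hashlib

class Connection(object):
    # ... more code here
    @classmethod
    def makename(cls, basename, maxlen=32, taillen=10):
        if len(basename) > maxlen:
            # Hash the full original name, so that different long names get different tails.
            h = hashlib.sha256(basename.encode("utf-8"))
            # base64 with "_" and "$" in place of "+" and "/", cut to taillen characters.
            tail = base64.b64encode(h.digest(), b"_$").decode("ascii")[:taillen]
            # Keep the prefix short enough that prefix + "$" + tail fits in maxlen characters.
            return basename[:maxlen - taillen - 1] + "$" + tail
        return basename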

Here is the result:

print(repr(Connection.makename("some_field_name")))
print(repr(Connection.makename("group1_group2_group3_some_field_name")))

So if an identifier is too long, I use a modified version that should be unique and similar to the original name. Let's assume that nobody will try to break this scheme on purpose.
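The `$` restriction in the first bullet is what keeps the two kinds of names apart: a shortened name always contains `$`, an original name never does, so a shortened name can never coincide with an unmodified one, and two shortened names can only collide if they share both the prefix and the hash tail. A minimal sketch of that check follows. `is_valid_basename` and `is_shortened` are hypothetical helper names, and `makename` here is a standalone function that assumes a 21-character prefix, so that prefix + `$` + 10-character tail fits the 32-character limit.

```python
import base64
import hashlib
import re

# Allowed characters for original identifiers: letters, digits, underscore (no "$").
VALID_BASENAME = re.compile(r"[A-Za-z0-9_]+")

def is_valid_basename(name):
    """Hypothetical helper: True if `name` obeys the restriction on original names."""
    return VALID_BASENAME.fullmatch(name) is not None

def makename(basename, maxlen=32, taillen=10):
    """Standalone shortening rule: prefix + "$" + base64(SHA-256) tail."""
    if len(basename) <= maxlen:
        return basename
    digest = hashlib.sha256(basename.encode("utf-8")).digest()
    tail = base64.b64encode(digest, b"_$").decode("ascii")[:taillen]
    return basename[:maxlen - taillen - 1] + "$" + tail

def is_shortened(name):
    """Hypothetical helper: a "$" can only come from makename, never from a valid original."""
    return "$" in name

short = makename("some_field_name")
long_ = makename("group1_group2_group3_some_field_name")
print(short, is_shortened(short))
print(long_, len(long_), is_shortened(long_))
```

Because every generated name has the same fixed-length prefix followed by `$`, names with different prefixes cannot collide at all; the hash tail only has to separate long names that start with the same 21 characters.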

And now, the questions:

* Would it be a problem to use CRC32 instead of SHA-256? (Security is not a concern here, and CRC32 is faster.)
* I'm truncating the digest to 10 characters. Is that safe enough? I don't want to use more than 10 characters, because then it would no longer be possible to recognize the original name.
* Can anybody think of a better algorithm, one that would make the original identifier easier to recognize from the modified one?
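For the first question, the sketch below puts the two kinds of tail side by side. It assumes the same base64 alphabet as `makename` (`_` and `$` replacing `+` and `/`); `tail_sha256` and `tail_crc32` are hypothetical helper names. What it shows is that `zlib.crc32` returns a 32-bit value, so a CRC32 tail can never carry more than 32 bits (six base64 characters, the last one only partly used), whereas ten characters of a SHA-256 digest carry 60 bits.

```python
import base64
import hashlib
import struct
import zlib

def tail_sha256(name, taillen=10):
    """First `taillen` base64 characters of SHA-256 (6 bits each: 60 bits for 10)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return base64.b64encode(digest, b"_$").decode("ascii")[:taillen]

def tail_crc32(name):
    """CRC32 gives only 32 bits; 4 bytes encode to 6 significant base64 characters."""
    crc = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF  # force an unsigned value
    packed = struct.pack(">I", crc)                      # 4 big-endian bytes
    return base64.b64encode(packed, b"_$").decode("ascii").rstrip("=")

name = "group1_group2_group3_some_field_name"
print(tail_sha256(name), len(tail_sha256(name)))
print(tail_crc32(name), len(tail_crc32(name)))
```

If ten characters of tail are affordable anyway, a truncated SHA-256 digest packs almost twice as many bits into the same space as CRC32 can provide; the speed difference is unlikely to matter for identifiers this short, though that depends on how often names are generated.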

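For the second question, the usual birthday-bound estimate gives a feel for the risk. Assume the tail behaves like a uniformly random b-bit value (a standard modelling assumption for SHA-256, and only a rough one for CRC32), and that n long names share the same truncated prefix (the worst case, since names with different prefixes can never collide). The probability that at least two of them get the same tail is then

$$
P_{\text{collision}} \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{b+1}}\right) \le \frac{n(n-1)}{2^{b+1}}
$$

Ten base64 characters carry b = 6 × 10 = 60 bits; a CRC32 value carries b = 32 bits however it is encoded. For n = 1000 names with a common prefix, the bound gives about 4.3 × 10^-13 for b = 60 and about 1.2 × 10^-4 for b = 32.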


