On 6/3/2015 2:50 AM, Adrian Stern wrote:
Hi William, thanks for joining the conversation.

1) We do want constraints, since a connection to an ERP system is possible in the future. We want to plan ahead.

2) As for the subclass approach: I would need about 30 subclasses, and it would get really hard to add new products since a change in the database would be necessary each time. That's why we want a more generic approach.

At the root of your description it appears to me that you are essentially choosing an EAV design pattern. Constraints and case-specific validation become difficult (or impossible) to enforce consistently. I have personal experience cleaning up a system that used this *exact* pattern (product -> product attribute). Different developers approached updating information in different ways. They also chose to store the attributes as text to avoid the text vs. number issue. However, they enforced no validation and made no attempt to prevent duplicates. That destroyed any hope of supporting an underlying business need to search for specific values during data analysis. (Think of how many different ways you can misspell "Poughkeepsie" - but hey... it's "just a name".)
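To make the shape concrete, here is a minimal sketch of the kind of EAV layout I mean (table and column names are invented for illustration, not taken from your design):

CREATE TABLE product (
    product_id serial PRIMARY KEY,
    name       text NOT NULL
);

CREATE TABLE product_attribute (
    product_id integer NOT NULL REFERENCES product,
    attr_name  text    NOT NULL,
    attr_value text    NOT NULL,   -- everything stored as text
    PRIMARY KEY (product_id, attr_name)
);

INSERT INTO product VALUES (1, 'widget'), (2, 'gadget');

-- The database has no way to say "this value must be numeric" or
-- "this city must be spelled consistently"; both of these go in fine:
INSERT INTO product_attribute VALUES (1, 'city', 'Poughkeepsie');
INSERT INTO product_attribute VALUES (2, 'city', 'poughkeepsy');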

I inherited the results of poor development controls and poor control over the weaknesses of the design - e.g. validation... and the cleanup has been long and painful.

I think you should carefully evaluate your unease with having to update the database on each release (potentially many times) for what it is and why you have it. [I'm not saying it is invalid - just know why you have it.] Because no matter how well you design your system, databases evolve. Manage that.

Anybody can muck up part of a project and produce garbage - but speaking from experience... this design pattern really encourages it. If you choose to use it - then you HAVE to control where and when inserts/updates are done, be very careful to spell out for the entire development team (and potentially the users) how validation is to be done... and then review the code (and/or data) regularly.

Wide tables with sparse fill on the columns... I haven't tried to calculate the overhead of this, but disk space is generally considered to be cheap. [That doesn't mean your case wouldn't be criminally wasteful.] As for choosing 1 wide table versus 30 sub-tables for the detail data: I don't know how that directly affects Postgres' performance... If you choose 30 tables... meh... at least you'll know exactly where your data is - and exactly what is allowed for each and every variant. Remember to enforce a consistent, guessable naming convention.
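For comparison, rough sketches of the two shapes (again invented names, reusing the product table from the sketch above):

-- one wide table: variant-specific columns, mostly NULL
CREATE TABLE product_detail (
    product_id  integer PRIMARY KEY REFERENCES product,
    weight_kg   numeric,   -- only set for physical goods
    license_key text,      -- only set for software
    voltage_v   integer    -- only set for electronics
);

-- or one sub-table per variant, consistently named, each carrying
-- exactly the constraints that variant needs
CREATE TABLE product_detail_software (
    product_id  integer PRIMARY KEY REFERENCES product,
    license_key text NOT NULL
);

CREATE TABLE product_detail_electronics (
    product_id  integer PRIMARY KEY REFERENCES product,
    voltage_v   integer NOT NULL CHECK (voltage_v > 0)
);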

All that said - there are reasons to use this pattern, to gain the power of a dynamically expandable configuration of allowed values. I just find the use cases where the gain (flexibility) is worth the pain (validation control) to be few and far between.

just my $0.01

Roxanne



Maybe I don't understand you correctly because of the language barrier. Can you provide me with a link to a subclassing example? -> https://docs.djangoproject.com/en/1.8/topics/db/models/#multi-table-inheritance ?

ORM is a given in my case. This is not a high performance application.

Kind regards

Adrian Stern
unchained - web solutions

adrian.st...@unchained.ch
+41 79 292 83 47

On Tue, Jun 2, 2015 at 5:35 PM, William Dunn <dunn...@gmail.com> wrote:

    Hello Adrian,

    May I ask why you need a non-standard model? By standard models I
    mean the following:

    1) When you don't need subclass-specific database constraints:
    keep all subclasses in the same table, and leave the column null
    for subclasses that do not have an attribute. This has the best
    performance because no joins are required when querying both
    superclass and subclass attributes, and all the data for an object
    will be in the same block on disk. The disadvantage is that you
    cannot enforce constraints, such as not-null, on subclass-specific
    attribute columns, because the constraints would also apply to
    rows of the superclass and of the other subclasses. If you can
    ensure that your application is the only way data can be inserted,
    those constraints can be enforced there instead.
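
    For illustration, option 1 might look roughly like this (a made-up
    example, not your schema):

        CREATE TABLE vehicle (
            vehicle_id  serial PRIMARY KEY,
            kind        text NOT NULL,   -- 'car' or 'boat'
            wheel_count integer,         -- only meaningful for cars
            draft_m     numeric          -- only meaningful for boats
        );

    A NOT NULL on wheel_count would reject every boat row as well, so
    that kind of rule has to live in the application.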

    2) When you do need subclass-specific database constraints: use an
    ORM such as Django's ORM or SQLAlchemy, which keeps one table with
    the superclass attributes and a table for each subclass with its
    subclass-specific attributes. This is slower because joins will be
    needed and the data for an object will be in two different blocks
    on disk, but it allows you to enforce constraints within the
    database, where they will be checked whenever any application
    tries to insert values. There is a lot of complexity added because
    there will be so many small tables and indexes, but the ORM takes
    care of that for you.
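
    Underneath, the ORM ends up with roughly this shape (a sketch
    only, not the exact DDL Django would generate):

        CREATE TABLE item (
            item_id serial PRIMARY KEY,
            name    text NOT NULL
        );

        CREATE TABLE item_book (
            item_id integer PRIMARY KEY REFERENCES item,
            isbn    text NOT NULL   -- subclass-specific, fully constrainable
        );

    Fetching a book then joins item_book to item on item_id.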

    *Will J. Dunn*
    _willjdunn.com_

    On Mon, Jun 1, 2015 at 10:35 AM, Adrian Stern
    <adrian.st...@unchained.ch> wrote:

        Hi, I'm new

        I've been working as the sole administrator of various
        PostgreSQL projects for a while now, all of which were Django
        projects.
        Since a new project is starting and we've found the need for a
        more generic approach, I would like to ask a few questions.

        I would like to implement a pattern similar to the product
        feature pattern explained in the Silverston book - The Data
        Model Resource Book, Vol. 1. It is simple to explain: there is
        a table PRODUCT holding the fields all the products share, and
        there is the table PRODUCT_FEATURE; the two are in a "many to
        many" relationship.

        PRODUCT <--- m -------- n ---> PRODUCT_FEATURE (a table in
        between of course)

        PRODUCT_FEATURE --> PF
        PRODUCT --> P
        TABLE IN BETWEEN --> TIB

        PF defines the feature type while P stands for the product the
        feature is applied to. Some of these PF can have values of
        different types (text, numbers, floating point, blob, ...)
        which would be stored in TIB.
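
        In SQL terms, roughly something like this (a sketch; column
        names are just placeholders):

            CREATE TABLE product (
                product_id serial PRIMARY KEY,
                name       text NOT NULL
            );

            CREATE TABLE product_feature (
                feature_id serial PRIMARY KEY,
                name       text NOT NULL,
                value_type text NOT NULL  -- 'text', 'number', 'blob', ...
            );

            CREATE TABLE tib (  -- the table in between
                tib_id     serial PRIMARY KEY,
                product_id integer NOT NULL REFERENCES product,
                feature_id integer NOT NULL REFERENCES product_feature,
                UNIQUE (product_id, feature_id)
            );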

        I don't like the idea of having numerous empty fields prepared
        in TIB just to store occasional values of different types, so
        I need to specialize those TIB values.

        Now how would I do that?

        I could create some tables solely for holding [NUM], [TEXT],
        [BLOB], [ETC] and reference them with the TIB PK. When using
        them I could create a view TIBV presenting all of [NUM, TEXT,
        BLOB, ETC] in the same column, called Value, and join it with
        TIB to get the value of a PF.
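
        For example (a sketch only, building on the tables above):

            CREATE TABLE tib_value_num (
                tib_id    integer PRIMARY KEY REFERENCES tib,
                num_value numeric NOT NULL
            );

            CREATE TABLE tib_value_text (
                tib_id     integer PRIMARY KEY REFERENCES tib,
                text_value text NOT NULL
            );

            CREATE VIEW tibv AS
                SELECT tib_id, num_value::text AS value FROM tib_value_num
                UNION ALL
                SELECT tib_id, text_value AS value FROM tib_value_text;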

        But is this a good idea?
        Is there a better way?

        Also, I would have to create a pivot table in order to list
        all the products with all their features. As this is
        implemented in C (afaik) I suppose it is rather fast, or at
        least fast enough, but I do not actually know. What I do know
        is that there are about 30 product types and around 50
        possible product features. One product can have up to
        approximately 25 PF, but most have around 5 to 10.
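
        To illustrate what I mean by pivoting, against the sketch
        tables above (written with plain conditional aggregation
        rather than the crosstab() function; FILTER needs PostgreSQL
        9.4 or newer, and the feature names are made up):

            SELECT p.name,
                   max(v.value) FILTER (WHERE f.name = 'color')     AS color,
                   max(v.value) FILTER (WHERE f.name = 'weight_kg') AS weight_kg
            FROM product p
            JOIN tib t             ON t.product_id = p.product_id
            JOIN product_feature f ON f.feature_id = t.feature_id
            JOIN tibv v            ON v.tib_id     = t.tib_id
            GROUP BY p.name;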

        Do you think a pivot table is a good idea?
        What alternatives do I have?

        There is room for caching since the dataset is not updated too
        often.

        regards, adrian





--
[At other schools] I think the most common fault in general is to teach 
students how to pass exams instead of teaching them the science.
Donald Knuth
