I'm working on a web app for a quality control checklist. I already
have a table set up, but I have a hunch that our model is sub-optimal
and I could get some better performance.I'm hoping someone on this
list can help me think clearly about how to express this efficiently
in SQL.

Each checklist has dozens, sometimes hundreds of questions. Each
question has between 2 and 10 possible answers. Each Question is a
varchar string, and so is each answer. A completed checklist is when
all of the questions are associated with one of its possible answers
-- i.e., when one answer is chosen.

Checklists are different for different purposes, and they can change
over time. So to keep completed checklists from inadvertently changing
when we want to have changes in new checklists, we have templates.
Templates, Questions, and Answers are a mirror of Checklists,
Questions, and Answers, and represent the 'current version' of the
checklist. We don't do the checklists ourselves, but provide them for
our clients, so that is another level of grouping going on.

So the current table hierarchy looks like this

- Clients
 - Templates
   - TemplateQuestions
     - TemplateQuestionAnswers
 - Checklists
   - ChecklistQuestions
     - ChecklistQuestionAnswers

Because we don't want changes in the current template to 'go back in
time' and change completed checklists, data is copied from Templates
into Checklists when a user goes to start a new checklist.

As you can guess, this creates a lot of duplication. In
ChecklistQuestionAnswers, out of about a million answer rows, there
are only 4,000 distinct answers. Of course, TemplatesQuestionAnswers
has duplication too, but not as bad.

So what I'm think I want to do is create a versioning system for
checklist templates, so I can save on space by storing unique
questions with unique sets of answers only once. That way, instead of
duplicating text wholesale, I can just link a Checklist against a
*version* of a Template, and then a checklist set is which answer was
chosen for which question.

Here's what I've sketched out so far.

"A clients has many templates. A template has many revisions, but only
one current revision. Each revision has many questions, and each
question has many ( between 2 and 10 ) answers. Each Checklist relates
to one Template. Each checklist has a set answers that indicate the
answer select for each question in its version of the template."

    Questions /* all unique question wordings */
    Questions.id
    Questions.question

    Answers /* all unique answer wordings. */
    Answers.id
    Answers.answer

    Templates
    Templates.client_id /* relates to client table. */
    Templates.template_name
    Templates.current_version /* this is related to
TemplateVersions.version_number */

    TemplateVersions /* A logical grouping of a set of questions and answers */
    TemplateVersions.version
    TemplateVersions.template_id /* relates this version to a template. */

    TemplateQuestions
    TemplateQuestions.template_version /* relates a question to a
template version */
    TemplateQuestions.question_id /* relates a unique question to this
template version */
    TemplateQuestions.id

    TemplateQuestionAnswers
    TemplateQuestionAnswers.template_question_id /* relates this
answer to a particular template version question */
    TemplateQuestionAnswers.answer_id /* relates the unique question
to a unique answer */
    TemplateQuestionAnswers.id

    Checklists
    Checklists.id
    Checklists.template_version /* relates this question to a template
version -- associating this checklist to a client happens through this
relationship */

    ChecklistAnswers /* ( I might call this something other than
'Answers' since the lack of ChecklistQuestionAnswers breaks 'name
symmetry' with TemplateQuestionAnswers ) */
    ChecklistAnswers.checklist_id
    ChecklistAnswers.question_id
    ChecklistAnswers.answer_id

The rub I'm getting hung up on is guaranteeing that ChecklistAnswers
associates a proper question-and-answer pair  -- the relationship that
exists in the version of the Template that its Checklist parent is
referencing.

In other words, each row in ChecklistAnswers must 'mirror' a
question_id from TemplateQuestions to one child question from
TemplateQuestionAnswers, form the template_version in Checklists. I'm
trying to think of how to do this and my thinking process short
circuits here. This is really the 'deliverable' of the database -- a
completed checklist -- so all the other templates and everything is
sort of epiphenomenal or an abstraction of that. If I can't get this
working, I've missed the whole point!

This seems a *little* unwieldy, so I'm wondering if I'm making a
solution whose complexity is not worth the space-savings I might get
from implementing it.

Also note, I've simplified this a bit. There are other dimensions of
complexity, such as a category system for grouping questions for
reporting, but I don't think we need to get into that here.

-- 
"Computers are useless. They can only give you answers"
-- Pablo Picasso

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Reply via email to