> -----Original Message-----
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Clark C. Evans
> Sent: Friday, November 12, 2010 12:25 PM
> To: Mark Mitchell; 'Tom Lane'
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] More then 1600 columns?
>
> On Fri, 12 Nov 2010, Tom Lane wrote:
> > Generally, wanting more than a few dozen columns is a
> > good sign that you need to rethink your schema design.
> > What are you trying to accomplish exactly?
>
> Generally speaking, yes.  However, survey instruments are a
> very legitimate design where this is not only correct, but
> quite desirable.  Sure -- in an ideal world you might be
> able to turn your instrument into something that is a highly
> normalized structure.  However, it's simply not practical.
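For concreteness, the kind of normalized alternative being dismissed as impractical might look something like the sketch below -- one row per response, one row per answer, with a foreign key tying them together. All table and column names here are invented for illustration, not taken from anyone's actual schema:

```sql
-- Hypothetical normalized survey schema (illustrative names only).
CREATE TABLE response (
    response_id  serial PRIMARY KEY,
    respondent   text NOT NULL,
    started_at   timestamp NOT NULL DEFAULT now()
);

CREATE TABLE answer (
    response_id  integer NOT NULL REFERENCES response,
    question_id  text    NOT NULL,
    value        text,
    PRIMARY KEY (response_id, question_id)
);
```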
Two tables with a foreign key is not practical?

> PostgreSQL absolutely should look into supporting this.  While
> the storage may not be ideal, it is an increasingly common case.
> Perhaps the storage engine could permit a set of columns to be
> tagged for ``sparse column storage``?

Reminds me of a quote from the movie Pinocchio:
"Give a boy enough rope and he'll hang himself."

Is a fundamental design change for the purpose of allowing massive
denormalization really a good idea?

> On Fri, 12 Nov 2010, Mark Mitchell wrote:
> > Yes I understand that this is "bad design" but what we are doing is
> > storing each form field in a survey in its own column.  For very long
> > surveys we end up with thousands of elements.
>
> It's *not* necessarily bad design.  If you've got a survey
> instrument that has 1000 unique data points, it is a reasonable
> thing to do.  We (Prometheus Research) have these sorts of
> things all the time.
>
> #1 You want to store and query them based on specific fields.

Use a join.

> #2 You want to enforce the data integrity of each column
> by using types, enums and other techniques.

Non sequitur.

> #3 You want to integrate this structure with the other
> aspects of your database.

Foreign keys.

> > I know storing in an array is possible but it makes it
> > so much easier to query the data set when each element
> > is in its own field.  comments on why I should not do this
>
> Using HSTORE or using an EAV table completely undermines
> the ability to ensure column-level typing, constraints,
> and... use SQL.  With arrays / EAV table structure, you
> end up having to re-invent your query language.  It is a
> non-solution.
>
> > the possible alternatives
>
> The work-around we use is to break our instrument into
> one table per ``section``.  We also save each section as the
> user completes that part of the instrument.  This has the
> advantage of allowing you to model repeating sections.  We
> typically enforce column uniqueness across all sections.
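A rough sketch of that per-section workaround (again, all names invented for illustration): each section becomes its own table keyed to the parent response, so every column keeps a real type, CHECK constraints, and enum-style restrictions, and a view can reassemble the wide "virtual table" for query tools:

```sql
-- Illustrative per-section layout; names are made up.
CREATE TABLE survey_response (
    response_id serial PRIMARY KEY,
    respondent  text NOT NULL
);

CREATE TABLE section_demographics (
    response_id integer PRIMARY KEY REFERENCES survey_response,
    age         integer CHECK (age >= 0),
    country     char(2)
);

CREATE TABLE section_health (
    response_id integer PRIMARY KEY REFERENCES survey_response,
    smoker      boolean,
    exercise    text CHECK (exercise IN ('never', 'weekly', 'daily'))
);

-- A view presents all sections as one wide row per response.
CREATE VIEW survey_wide AS
SELECT r.response_id, r.respondent,
       d.age, d.country,
       h.smoker, h.exercise
FROM survey_response r
LEFT JOIN section_demographics d USING (response_id)
LEFT JOIN section_health       h USING (response_id);
```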
> This requires your application and query tool to be a bit
> smart about making a virtual table with all 1000+ columns.
> Kinda sucks, but it works.
>
> ...
>
> I guess the other option is to look at an RDF data store where
> you use SPARQL to query the data.  OpenJena SDB is implemented
> as a 3-table storage in PostgreSQL if you wish to keep the same
> database for both your assessment storage and other tables.  I'm
> not lovin' this option, but it's on the table for our company.
>
> Instead, I'd absolutely love to see PostgreSQL add a variant
> scalar type (for when the type of a column changes over the
> lifetime of the instrument) and a column storage for Nth+
> and additional columns.  For now, the work-around above sort
> of works for us.

Sometimes it is a good idea to denormalize.  For instance, performance
might improve if you put bill_to_address and ship_to_address in the
customer record instead of joining against an address table.  But if
you have thousands of different child types, it is a good idea to ask
yourself about the value of denormalization versus the cost.  There is
a reason that BCNF was invented in 1974.

Normally, I am all for the idea of making a database more capable.
But if that design change encourages terrible practice, I can think of
better ways to expend programmer time and effort (namely on changes
that encourage good practice and increased productivity).

I would be very curious to hear of a real case where there is an
advantage to having many thousands of columns in a table versus using
a child table.  I have never seen such a thing in real life.  I am not
opposed to the idea that such a thing is possible.  It is just that I
have always found the child table is simply better.  If (for access)
the single table seems simpler, then a view can be used.

IMO-YMMV

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general