On 16 Nov 2001, Jason E. Stewart wrote:

> I'm building a tool that will enable scientists to load their
> experimental data into a database. That data will come as spreadsheets
> of data from scientists. Each group of scientists will use slightly
> different technology to generate the data, so those spreadsheets are
> very likely to have different numbers of columns, and they will
> certainly have different data types in the various columns. However,
> within one group, all the data should have the same format (or a small
> set of formats).
>
> So whatever solution I come up with needs to be flexible and store
> data no matter how many columns it has or what the data types for the
> fields are. One complication is that there are likely to be millions
> of rows from the spreadsheets, so I want it to be reasonably efficient
> (no joins if possible).
>
> Variable-length arrays seemed the obvious way to solve this.
>
> I just wanted to avoid having to create a new table for each
> spreadsheet configuration. A small finite number of tables would be
> fine, but I couldn't come up with a way.

That's your problem right here. You _should_ create a new table for each
spreadsheet, as they are semantically different. Use unions to provide a
unified view of the fields that are indeed common.
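A minimal sketch of what I mean, using SQLite so it runs anywhere; the table and column names are invented for illustration, not taken from your data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each group's spreadsheet format gets its own table with its own columns
# and proper types -- the schemas are allowed to differ.
cur.execute("""CREATE TABLE group_a_results (
    gene TEXT, sample_id TEXT, intensity REAL)""")
cur.execute("""CREATE TABLE group_b_results (
    gene TEXT, sample_id TEXT, ratio REAL, flag INTEGER)""")

cur.execute("INSERT INTO group_a_results VALUES ('BRCA1', 's1', 0.73)")
cur.execute("INSERT INTO group_b_results VALUES ('TP53', 's2', 1.42, 0)")

# A UNION ALL view exposes only the fields the formats actually share,
# so queries that don't care about the group can use one name.
cur.execute("""CREATE VIEW all_results AS
    SELECT gene, sample_id FROM group_a_results
    UNION ALL
    SELECT gene, sample_id FROM group_b_results""")

for row in cur.execute(
        "SELECT gene, sample_id FROM all_results ORDER BY gene"):
    print(row)
```

Queries against the view touch no joins, and each underlying table keeps the native types for its own columns.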
Skimping on proper relational design in the short term is likely to bite you in the long term. Putting dissimilar values into the same field is only inviting trouble. -alex
