Thank you for the reply

>> Primary table is all integers like:
>>
>> date id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num8
>> ----------------------------------------------------------------
>> primary key is on date to num->6 columns
>
>>> What types are num1->8?
>
> They are all integer
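(For concreteness, here is one possible reading of the schema described
above; the table name, NOT NULL constraints, and exact key columns are
assumptions from the summary, not the poster's actual DDL:)

    CREATE TABLE summary (
        date_id integer NOT NULL,  -- joins to the date lookup table
        num1    integer NOT NULL,
        num2    integer NOT NULL,
        num3    integer NOT NULL,
        num4    integer NOT NULL,
        num5    integer NOT NULL,
        num6    integer NOT NULL,
        num7    integer NOT NULL,
        num8    integer NOT NULL,
        PRIMARY KEY (date_id, num1, num2, num3, num4, num5, num6)
    );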
>> Hmm - not sure if you'd get any better packing if you could make some
>> int2 and put them next to each other. Need to test.

Thanks. I can find virtually nothing on the int2 column type beyond a
brief mention here:
http://www.postgresql.org/docs/8.2/interactive/datatype-numeric.html#DATATYPE-INT

Could I prevail on you to expand on packing with int2 a bit more, or
point me in the right direction for documentation?

If there's some way I can pack multiple columns into one to save space,
yet still query on them effectively, that would be great, even if it's
a lot slower. My current scheme, though as normalized and summarized as
I can make it, really chews up a ton of space. It might even be chewing
up more than the data files I'm summarizing, I assume due to the
indexing.

Regarding saving disk space, I saw someone mention doing a custom build
and changing TOAST_TUPLE_THRESHOLD/TOAST_TUPLE_TARGET so data is
compressed sooner. It seems like that might be a viable option as well:
http://www.thescripts.com/forum/thread422854.html

> 2,500 rows per hour, with duplicate date columns, seems like it could
> add up though.

>>> Well, let's see: 2500*24*365 = 21,900,000 rows * 4 bytes extra =
>>> 83MB additional storage over a year. Not sure it's worth worrying
>>> about.

Ahh yes, probably better to make it a date w. timestamp column then.

Z

Richard Huxton <dev@archonet.com> wrote:

Zoolin Lin wrote:
> thanks for your reply
>
>> Primary table is all integers like:
>>
>> date id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num8
>> ----------------------------------------------------------------
>> primary key is on date to num->6 columns
>
>>> What types are num1->8?
>
> They are all integer

Hmm - not sure if you'd get any better packing if you could make some
int2 and put them next to each other. Need to test.

>> date_id | date w timestamp
>> ----------------------------------------
>> 1       | 2007-2-15 Midnight
>> 2       | 2007-2-15 1 am
>> 3       | 2007-2-15 2 am
>> etc for 24 hours each day
>
>>> If you only want things accurate to an hour, you could lose the
>>> join and just store it as an int: 2007021500, 2007021501 etc.
>
> Hmm, yeah, I could. I think with the amount of data in the db,
> though, it behooves me to use one of the date types, even if via a
> lookup table.

You can always create it as a custom ZLDate type. All it really needs
to be is an int with a few casts.

> So I guess I'm just not sure if I'm really gaining anything by using
> an integer date id column and doing a join on a date lookup table,
> vs just making it a date w. timestamp column and having duplicate
> dates in that column.
>
> I would imagine internally that the date w. timestamp is stored as
> perhaps a time_t type plus some timezone information. I don't know
> if it takes that much more space, or whether there's a significant
> performance penalty in using it.

It's a double or int64 I believe, so allow 8 bytes instead of 4 for
your int.

> 2,500 rows per hour, with duplicate date columns, seems like it could
> add up though.

Well, let's see: 2500*24*365 = 21,900,000 rows * 4 bytes extra = 83MB
additional storage over a year. Not sure it's worth worrying about.

--
Richard Huxton
Archonet Ltd
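(A note on the int2 packing suggestion in the thread: int2 is the
internal name for smallint, 2 bytes versus 4 for integer. PostgreSQL
pads each column to its type's alignment boundary, so a lone smallint
sandwiched between integer columns still effectively costs 4 bytes;
the saving only appears when the 2-byte columns sit next to each
other. A hypothetical repacked table, assuming all values fit in
-32768..32767; "summary_packed" is an invented name:)

    -- Grouping the 2-byte columns together so they pack without
    -- alignment padding between them.
    CREATE TABLE summary_packed (
        date_id integer  NOT NULL,
        num1    smallint NOT NULL,  -- int2: 2 bytes each
        num2    smallint NOT NULL,
        num3    smallint NOT NULL,
        num4    smallint NOT NULL,
        num5    smallint NOT NULL,
        num6    smallint NOT NULL,
        num7    smallint NOT NULL,
        num8    smallint NOT NULL,
        PRIMARY KEY (date_id, num1, num2, num3, num4, num5, num6)
    );
    -- 8 smallints = 16 bytes of column data per row versus 32 for
    -- 8 integers, before per-row header and index overhead.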
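(On the "store it as an int: 2007021500" idea and the suggested ZLDate
type: as a rough sketch only, not the poster's actual implementation,
the conversions and the "int with a few casts" could start out as a
domain plus a helper function. The names zldate and zldate_to_ts are
invented; a full custom type would need CREATE TYPE / CREATE CAST and
considerably more work:)

    -- timestamp -> YYYYMMDDHH integer, and back again
    SELECT to_char(now(), 'YYYYMMDDHH24')::integer;
    SELECT to_timestamp(2007021500::text, 'YYYYMMDDHH24');

    -- An int-backed domain; YYYYMMDDHH fits in a 4-byte int up to
    -- the year 2147.
    CREATE DOMAIN zldate AS integer;

    CREATE FUNCTION zldate_to_ts(zldate) RETURNS timestamptz AS $$
        SELECT to_timestamp($1::text, 'YYYYMMDDHH24')
    $$ LANGUAGE sql STABLE;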
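(The arithmetic in the reply is easy to verify: 2,500 rows/hour for a
year is 2500 * 24 * 365 = 21,900,000 rows, and 4 extra bytes per row
(an 8-byte timestamp instead of a 4-byte int) is 87,600,000 bytes,
roughly the 83MB quoted. A quick sanity check from psql:)

    -- 4 extra bytes per row over a year of 2,500 rows/hour
    SELECT pg_size_pretty((2500 * 24 * 365 * 4)::bigint);
    -- => 84 MB (87,600,000 bytes; matches the ~83MB estimate above
    --    give or take rounding)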