Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) wrote: Yes it is exactly that. I will follow you advice and create a abstraction layer for the data access that will return the sparse dataset using the standard dataset as input. There is just one thing I disagree you said it that the performance is not good, right. However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside of the database for my application. I think that you misunderstand the problems that come from doing it that way. The basic issue is this: the point of creating a system using an RDBMS is to separate your data from your application. This allows for various methods of data mining later. If you are presenting information in your DB this way, you are breaking that assumption and so you get very little (if any) benefit from using PostgreSQL instead of something like BDB. Database design is usually about concise and unambiguous representation of data and normalization is a part of this. This allows various applications or ad hoc queries to be able to draw against the data in meaningful ways. The traditional way of representing sparse data is to use a join. CREATE TABLE bird ( id SERIAL PRIMARY KEY, description TEXT, bname TEXT); CREATE TABLE bird_color ( bird_id INT REFERENCES bird (id), color TEXT, is_color BOOL, UNIQUE (bird_id, color)); Now, you can select bname from bird where (select is_color from bird_color where color = 'red' and bird_id = bird.id) The point is not that this makes it easier or harder from the stand point of your application but that it makes it easier to build secondary apps against the same data set, and that it avoids various ambiguities that could result from secondary data entry/analysis apps. People have suggested using an array, and that would be possible as well (undecided whether this breaks first normal form) but this will certainly cause more headache with secondary apps. Please understand that PostgreSQL is designed around a world where these secondary apps inevitably get built while a lot of commerical, off the shelf software assumes that they won't be. This is why many of us question the DB design of these apps. I hope that this helps explain some of the why's of this thread. Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Beyond the 1600 columns limit on windows
Yes it is exactly that. I will follow you advice and create a abstraction layer for the data access that will return the sparse dataset using the standard dataset as input. There is just one thing I disagree you said it that the performance is not good, right. However, it is practical! Nothing is easier and more practical thankeeping thesparse representation inside of the database for my application. On 11/8/05, John D. Burger [EMAIL PROTECTED] wrote: Evandro's mailing lists (Please, don't send personal messages to thisaddress) wrote: It has nothing to do with normalisation. It is a program for scientific applications. Datavalues are broken into column to allow multiple linear regression and multivariate regression trees computations.Having done similar things in the past, I wonder if your current DB design includes a column for every feature-value combination:instanceIDcolor=redcolor=bluecolor=yellow...height=71height=72-42 True False False 43 False TrueFalse44 False False True...This is likely to be extremely sparse, and you might use a sparserepresentation accordingly.As several folks have suggested, the representation in the database needn't be the same as in your code. Even SPSSthe most well-known statistic sw uses the same approach and data structure that my software uses. Probably I should use another data structure but would not be as eficient and practical as the one I use now.The point is that, if you want to use Postgres, this is not in factefficient and practical.In fact, it might be the case that mappingfrom a sparse DB representation to your internal data structures is =more= efficient than naively using the same representation in bothplaces.- John D. BurgerMITRE-- Evandro M Leite JrPhD Student Software developer University of Southampton, UKPersonal website: http://evandro.orgAcademic website: http://www.soton.ac.uk/~evandroPlease, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Yes it is exactly that. I will follow you advice and create a abstraction layer for the data access that will return the sparse dataset using the standard dataset as input. There is just one thing I disagree you said it that the performance is not good, right. However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside of the database for my application. I bet even your application would profit from not handling sparse data. You could just not insert them into your tree instead of having to jump over empty elements. And there is always a way to lazily abstract the data to some frontend (While I doubt anybody can actuall scroll wide enough on a screen to see all the 1600 colums ;) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Beyond the 1600 columns limit on windows
Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. Regards -Evandro-- Evandro M Leite JrPhD Student Software developerUniversity of Southampton, UKPersonal website: http://evandro.orgAcademic website: http://www.soton.ac.uk/~evandroPlease, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) wrote: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I don't think so. Are you sure you need more than 1600 columns? That's many more than I've ever wanted or needed. If you can share some details of the problem you are trying to solve, perhaps someone can see a different solution for you. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
I'm doing a PhD in data mining and I need more than 1600 columns. I gotan error message saying that I can not use more than 1600 columns. It is happening because I have to change categorical values to binarycreating new columns. Do you know if oracle can handle it? -- Evandro M Leite Jr. PhD Student Software developer University of Southampton, UKPersonal website: http://evandro.orgAcademic website: http://www.soton.ac.uk/~evandroMobile 079 068 70740 Office 023 8055 3644 Home 023 8055 9160 On 11/8/05, Tino Wildenhain [EMAIL PROTECTED] wrote: Evandro's mailing lists (Please, don't send personal messages to thisaddress) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres.I would like to know who on earth needs 1600 columns and even beyond?Hint: you can have practically unlimited rows in your n:m table :-) ---(end of broadcast)---TIP 4: Have you searched our list archives?http://archives.postgresql.org -- Evandro M Leite JrPhD Student Software developerUniversity of Southampton, UKPersonal website: http://evandro.org Academic website: http://www.soton.ac.uk/~evandroPlease, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) wrote: I'm doing a PhD in data mining and I need more than 1600 columns. I got an error message saying that I can not use more than 1600 columns. It is happening because I have to change categorical values to binary creating new columns. Perhaps you don't want a relational database at all if you are stretching it to match your client application in this way. Do I have it right that you have something like Table: bird_sighting_facts (bird, category, value) 1 | wingspan| 120mm 2 | beak-colour | red 3 | chest-colour| blue ... And are converting it into: expanded_bird_facts (bird, cat_wingspan, cat_beak_colour, cat_chest_colour, ...) In which case since you'll almost certainly be throwing away any relational integrity you had in the first case I'd just throw a lightweight wrapper around some dbfile files or similar. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) [EMAIL PROTECTED] writes: I'm doing a PhD in data mining and I need more than 1600 columns. I don't think so --- consider redesigning your data model instead. For instance, maybe you could combine similar columns into an array. Or split the table into an m:n linking structure. Even coming close to that implementation limit suggests bad SQL design; if we thought it was a realistic problem we would have increased it long ago... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: I'm doing a PhD in data mining and I need more than 1600 columns. I got an error message saying that I can not use more than 1600 columns. It is happening because I have to change categorical values to binary creating new columns. Do you know if oracle can handle it? pardon, but as PhD you should be able to do sensible database design. Even if you would have more then 1600 columns, you cannot expect very good performance with it (on nearly any database). I'd strongly recommend to replan your table layout. You can get help here if you provide more information on your plans. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
Alex Stapleton schrieb: On 8 Nov 2005, at 12:50, Tino Wildenhain wrote: Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? I guess it eases implementation and there is no reason to go so high on columns either. The limit could even be lower w/o and hurts but 1600 seems skyrocket high enough (read unlimited :-) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Beyond the 1600 columns limit on windows
Tino Wildenhain [EMAIL PROTECTED] writes: Alex Stapleton schrieb: Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? I guess it eases implementation and there is no reason to go so high on columns either. The limit could even be lower w/o and hurts but 1600 seems skyrocket high enough (read unlimited :-) The rationale is laid out in excruciating detail in src/include/access/htup.h: /* * MaxTupleAttributeNumber limits the number of (user) columns in a tuple. * The key limit on this value is that the size of the fixed overhead for * a tuple, plus the size of the null-values bitmap (at 1 bit per column), * plus MAXALIGN alignment, must fit into t_hoff which is uint8. On most * machines the upper limit without making t_hoff wider would be a little * over 1700. We use round numbers here and for MaxHeapAttributeNumber * so that alterations in HeapTupleHeaderData layout won't change the * supported max number of columns. */ #define MaxTupleAttributeNumber 1664/* 8 * 208 */ /*-- * MaxHeapAttributeNumber limits the number of (user) columns in a table. * This should be somewhat less than MaxTupleAttributeNumber. It must be * at least one less, else we will fail to do UPDATEs on a maximal-width * table (because UPDATE has to form working tuples that include CTID). * In practice we want some additional daylight so that we can gracefully * support operations that add hidden resjunk columns, for example * SELECT * FROM wide_table ORDER BY foo, bar, baz. * In any case, depending on column data types you will likely be running * into the disk-block-based limit on overall tuple size if you have more * than a thousand or so columns. TOAST won't help. *-- */ #define MaxHeapAttributeNumber 1600/* 8 * 200 */ regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Beyond the 1600 columns limit on windows
On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote: Alex Stapleton schrieb: On 8 Nov 2005, at 12:50, Tino Wildenhain wrote: Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? I guess it eases implementation and there is no reason to go so high on columns either. The limit could even be lower w/o and hurts but 1600 seems skyrocket high enough (read unlimited :-) I'd have to vote with Tino here. Why worry about an arbitrary limit you should never really be approaching anyway. If a table has more than several dozen columns, you've likely missed some important step of normalization. Once you near 100 columns, something is usually horribly wrong. I cannot imagine having a table that actually needed 1600 or more columns. And, Evandro, nothing is free. If someone went to the trouble of removing the limit of 1600, we'd probably pay in some other way, most likely with poor performance. There are other, far more important features to work on, I'd think. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's == Evandro's mailing lists (Please, don't send personal messages to this address) [EMAIL PROTECTED] writes: [I would have replied to your personal address, but I'm not about to copy it from a footer.] Evandro's I'm doing a PhD in data mining and I need more than 1600 columns. I got an Evandro's error message saying that I can not use more than 1600 columns. Evandro's It is happening because I have to change categorical values to binary Evandro's creating new columns. Do you know if oracle can handle it? /me boggles You are doing a PhD in data mining, and you have a table that needs more than 1600 columns? /me gasps What are they *teaching* these days? If you have a design that has more than 20 or so columns, you're probably already not normalizing properly. There just aren't *that* many attributes of a object before you should start factoring parts of it out, even if it means creating some 1-1 tables. In programming, if I ever see someone name a sequence of variables, like thing1 and thing2, I know there's going to be trouble ahead, because that should have been a different data structure. Similarly, I bet some of your columns are foo1 and foo2. Signs of brokenness in the design. Or do you really have 1600 *different* attributes, none of which have a number in their name? That requires a serious amount of creativity. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 merlyn@stonehenge.com URL:http://www.stonehenge.com/merlyn/ Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Beyond the 1600 columns limit on windows
On 8 Nov 2005, at 12:50, Tino Wildenhain wrote: Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Beyond the 1600 columns limit on windows
On 8 Nov 2005, at 16:06, Scott Marlowe wrote: On Tue, 2005-11-08 at 09:45, Tino Wildenhain wrote: Alex Stapleton schrieb: On 8 Nov 2005, at 12:50, Tino Wildenhain wrote: Evandro's mailing lists (Please, don't send personal messages to this address) schrieb: Hi guys, I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres. I would like to know who on earth needs 1600 columns and even beyond? Hint: you can have practically unlimited rows in your n:m table :-) Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? I guess it eases implementation and there is no reason to go so high on columns either. The limit could even be lower w/o and hurts but 1600 seems skyrocket high enough (read unlimited :-) I'd have to vote with Tino here. Why worry about an arbitrary limit you should never really be approaching anyway. If a table has more than several dozen columns, you've likely missed some important step of normalization. Once you near 100 columns, something is usually horribly wrong. I cannot imagine having a table that actually needed 1600 or more columns. And, Evandro, nothing is free. If someone went to the trouble of removing the limit of 1600, we'd probably pay in some other way, most likely with poor performance. There are other, far more important features to work on, I'd think. Oh wait, PG is written in C isn't it. I guess fixed size things are a bit easier to deal with. Pardon me then :) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Beyond the 1600 columns limit on windows
Sorry, It has nothing to do with normalisation. It is a program for scientific applications. Datavalues are broken into column to allow multiple linear regression and multivariate regression trees computations. Even SPSSthe most well-known statistic sw uses the same approach and data structure that my software uses. Probably I should use another data structure but would not be as eficient and practical as the one I use now. Many thanks -Evandro On 08 Nov 2005 05:30:07 -0800, Randal L. Schwartz merlyn@stonehenge.com wrote: Evandro's == Evandro's mailing lists (Please, don't send personal messages to this address) [EMAIL PROTECTED] writes:[I would have replied to your personal address, but I'm not aboutto copy it from a footer.]Evandro's I'm doing a PhD in data mining and I need more than 1600 columns. I got an Evandro's error message saying that I can not use more than 1600 columns.Evandro'sIt is happening because I have to change categorical values to binaryEvandro's creating new columns. Do you know if oracle can handle it? /me bogglesYou are doing a PhD in data mining, and you have a table that needsmore than 1600 columns?/me gaspsWhat are they *teaching* these days?If you have a design that has more than 20 or so columns, you're probably already not normalizing properly.There just aren't *that*many attributes of a object before you should start factoring parts ofit out, even if it means creating some 1-1 tables.In programming, if I ever see someone name a sequence of variables, like thing1 and thing2, I know there's going to be trouble ahead,because that should have been a different data structure.Similarly,I bet some of your columns are foo1 and foo2.Signs of brokenness in the design.Or do you really have 1600 *different* attributes, none of which havea number in their name?That requires a serious amount ofcreativity. :)--Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 merlyn@stonehenge.com URL:http://www.stonehenge.com/merlyn/Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!-- Evandro M Leite JrPhD Student Software developer University of Southampton, UKPersonal website: http://evandro.orgAcademic website: http://www.soton.ac.uk/~evandroPlease, use Jr(at)evandro.org for personal messages
Re: [GENERAL] Beyond the 1600 columns limit on windows
Well this screams random arbitrary limit to me. Why does this limit exist? What ever happened to the holy 0,1,infinity triumvirate? I guess it eases implementation and there is no reason to go so high on columns either. The limit could even be lower w/o and hurts but 1600 seems skyrocket high enough (read unlimited :-) It is probably what fits in a single block. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) wrote: Sorry, It has nothing to do with normalisation. It is a program for scientific applications. It has everything to do with normalisation. You appear to be pushing application presentation issues into the structure of your database. If SQL allowed you, this would break 1NF. Data values are broken into column to allow multiple linear regression and multivariate regression trees computations. Sounds like you want an array then (or perhaps several arrays). Even SPSS the most well-known statistic sw uses the same approach and data structure that my software uses. Ah - and they've made a good choice? Probably I should use another data structure but would not be as eficient and practical as the one I use now. The structure you use inside your application and the data definition used by the database are two separate things. You presumably are doing some transformation of data on fetching it anyway - I'd switch rows-columns over then. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Beyond the 1600 columns limit on windows
Evandro's mailing lists (Please, don't send personal messages to this address) wrote: It has nothing to do with normalisation. It is a program for scientific applications. Data values are broken into column to allow multiple linear regression and multivariate regression trees computations. Having done similar things in the past, I wonder if your current DB design includes a column for every feature-value combination: instanceID color=red color=blue color=yellow ... height=71 height=72 - 42 True False False 43 False TrueFalse 44 False False True ... This is likely to be extremely sparse, and you might use a sparse representation accordingly. As several folks have suggested, the representation in the database needn't be the same as in your code. Even SPSS the most well-known statistic sw uses the same approach and data structure that my software uses. Probably I should use another data structure but would not be as eficient and practical as the one I use now. The point is that, if you want to use Postgres, this is not in fact efficient and practical. In fact, it might be the case that mapping from a sparse DB representation to your internal data structures is =more= efficient than naively using the same representation in both places. - John D. Burger MITRE ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Beyond the 1600 columns limit on windows
On Tue, Nov 08, 2005 at 02:14:58PM -0500, John D. Burger wrote: Evandro's mailing lists (Please, don't send personal messages to this address) wrote: It has nothing to do with normalisation.? It is a program for scientific applications. Data?values are broken into column to allow multiple linear regression and multivariate regression trees computations. Having done similar things in the past, I wonder if your current DB design includes a column for every feature-value combination: instanceID color=red color=blue color=yellow ... height=71 height=72 - 42 True False False 43 False TrueFalse 44 False False True ... This is likely to be extremely sparse, and you might use a sparse representation accordingly. As several folks have suggested, the representation in the database needn't be the same as in your code. Even SPSS?the most well-known statistic sw uses the same approach and data structure that my software uses. Probably I should use another data structure but would not be as eficient and practical as the one I use now. The point is that, if you want to use Postgres, this is not in fact efficient and practical. In fact, it might be the case that mapping from a sparse DB representation to your internal data structures is =more= efficient than naively using the same representation in both places. s/Postgres/just about any database/ BTW, even if you're doing logic in the database that doesn't mean you have to stick with the way you're representing things. There's ways to get the same info via conventional SQL that doesn't involve building a huge crosstab. Something interesting is that the data structure presented here looks a hell of a lot like a bitmap index, something new in 8.1 (well, at least bitmap index scans). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend