Re: [GENERAL] ERROR: relation . . . does not exist
You have made clear to me why my attempt for a RFE for COPY FROM CVS has found some technical resistance/disagreement, but I still think my idea even if not so popular for concrete and cultural reasons makes at least sense to some people It's a perfectly reasonable problem to want to solve; the question is whether COPY is the right place to solve it. I would think that a tool that reads the CSV data and produces a proposed schema definition for the table would be a more generally-useful approach. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
> ... are times local or UTC ~ this is a rather semantic, not a syntactic issue that some code could NOT decide based on the data it reads ~ > Should we assume integer or float? ~ is a dot anywhere in the data you read in for that particular column? ... ~ > Varchar or text? ~ Is the length of the data read in always less than 255 bytes ( or characters?)? ... ~ You have made clear to me why my attempt for a RFE for COPY FROM CVS has found some technical resistance/disagreement, but I still think my idea even if not so popular for concrete and cultural reasons makes at least sense to some people ~ DBAs ussualy have a mental map of the data they have on each table, etc; whereas as a data analyst you find yourself constantly reading in, cleasing and marshaling data from which you have no prior knowledge ~ lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] indexes on functions and create or replace function
On Thu, Aug 28, 2008 at 7:45 PM, Matthew Dennis <[EMAIL PROTECTED]> wrote: > Another question though. Since I could potentially start transaction, drop > indexes/checks, replace function, create indexes/checks, commit tranasaction > could I deal with the case of the constant folding into the cached plan by > flushing the entire cache in the same transaction? Is cache flushing > transactional? The cases I have for this are infrequent in time and the > overhead of reindexing things, rechecking checks/unique indexes already > dwarf the performance lost to flushing the cache. > > On a related note, if I had a maintenence window where I can shutdown all > DB access, make the referenced changes to the > functions/indexes/caches/checks and restart PG - in your opinion, are there > other likely problems to changing an immutable function under those > circumstances, or should that be pretty safe? In other words, I have a > function that has indexes on it that does the wrong thing - what do I do to > replace it? > In the thread below, we kind of got side tracked on some other stuff and I never got an answer to the questions above. Does anyone have any insight/suggestions about the best way to replace a function that is used by an index? http://groups.google.com/group/pgsql.general/browse_thread/thread/92289ef0c2f5a109/8f96fb24bdd668e8
Re: [GENERAL] ERROR: relation . . . does not exist
On Aug 30, 2008, at 10:33 AM, Albretch Mueller wrote: well, yeah! I would totally agree with you, but since I doubt very much "COPY FROM CSV" is part of the SQL standard to beging with, why not spice it up a little more? I'd guess that coming up with a general algorithm to guess the type from a column of CSV text would satisfy no one, since we'd always miss a particular case that is important to someone (are times local or UTC? Should we assume integer or float? Varchar or text?), and the option is a forest of switches that would be extremely complex and error prone. This sounds very much like an application-domain problem, best solved in the application domain. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
On Sat, Aug 30, 2008 at 01:36:25PM -0400, Albretch Mueller wrote: > > The system is smart enough to only do the count() once. > ~ > But not smart enough to make a variable you declare point to that > internal variable so that things are clearer/ easier ;-) The SQL standard has pretty clear rules about what variables can be referenced from where, and this is one of those places (the rationale is probably in there too). Have a nice day -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [GENERAL] ERROR: relation . . . does not exist
On Saturday 30 August 2008 9:42:19 am Adrian Klaver wrote: > On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote: > > On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <[EMAIL PROTECTED]> wrote: > > > Define easily. > > > > ~ > > OK, let me try to outline the approach I would go for: > > ~ > > I think "COPY FROM CSV" should have three options, namely: > > ~ > > 1) the way we have used it in which you create the table first > > ~ > > 2) another way in which defaults are declared, generally as: > > ~ > > 2.1) aggressive: data type, value and formatting analysis is done; if > > only 1 or 0 are found declare then a BOOLEAN, if repeated data is > > found (say state codes) and the stratification nodes cover the rest of > > the data, stratify the data out to other extra table (they have a name > > I can't recall now), index it ..., if data is kind of numeric with > > front slashes and/or hyphen could they possibly be dates? if they are > > definitelly dates convert them to bigint (and do the formatting in the > > presentation code (also this a win-win situation with i18n code)) ... > > ~ > > 2.2) conservative: data type and value, but no formatting analysis is > > done and the greater encompassing data type is selected, say for 1 or > > 0 data use bytes [0, 255], for bytes use int, if something could be > > encoded as char(2), use varchar instead, . . . > > ~ > > 2.3) dumn: just use the coarsest data type possible; bigint for > > anything that looks like a number and varchar for the rest > > ~ > > the "dumn" option should suggest to the DBA the option they are > > using, quantified consequences for their desicions (larger DBs for no > > reason, approx. reduction in speed, . .) and how not to be "dumn" > > ~ > > 3) or you could define "import templates" declaring which specific > > data types to use for data in a certain way, which could be declared > > per column using regexps > > ~ > > > > > I could go on, but the point is that table data types require some > > > thought on the part of the DBA. > > > > ~ > > Well, it still requires their minds and input, but they will have > > jobs even if they get some help, don't you think so ;-) > > ~ > > lbrtchx > > This is a combination of more work then necessary and putting the cart > after the horse. All I can see happening is delaying the point of decision Lets try this again. The cart before the horse. Memo to self: 1) Drink sufficient coffee. 2) Answer email. > to a later time and or dumping the decision process on someone else. There > is already a "dumb" solution that has been brought many times on this list. > It involve creating a holding table that has text only fields and copying > the data into and then moving the data from there to a final table. As far > as import templates I suggest looking at: > http://pgloader.projects.postgresql.org/ > It also addresses some of your other suggestions. It does not automatically > create a table though. > > > > -- > Adrian Klaver > [EMAIL PROTECTED] -- Adrian Klaver [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
> The system is smart enough to only do the count() once. ~ But not smart enough to make a variable you declare point to that internal variable so that things are clearer/ easier ;-) ~ Thanks lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
> spreadsheet programs (generally; I'm sure there are exceptions) don't have > the notion of a schema; each cell can hold its own particular type. ~ Oh, now I see what Martin meant! ~ > that's not a traditional part of a database engine. ~ well, yeah! I would totally agree with you, but since I doubt very much "COPY FROM CSV" is part of the SQL standard to beging with, why not spice it up a little more? ~ > This is probably one of those classic "twenty lines of Perl" problems. ~ java since 1.5 comes with a full blown, PERL-like regexp engine ~ > I suggest looking at: http://pgloader.projects.postgresql.org/ > [1] A validator (regex) for each data type . . . ~ that sort of things was what i was talking about, but I would go quite a bit farther ~ Thanks lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
"Albretch Mueller" <[EMAIL PROTECTED]> writes: > thank you Stefan your SQL worked, but still; I am just asking and my > programming bias will certainly show, but aren't you effectivly > "calling" count on the table three times if you go: The system is smart enough to only do the count() once. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
thank you Stefan your SQL worked, but still; I am just asking and my programming bias will certainly show, but aren't you effectivly "calling" count on the table three times if you go: ~ SELECT md5, COUNT(md5) FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING COUNT(md5) > 1 ORDER BY COUNT(md5) DESC; ~ Shouldn't ~ SELECT md5, COUNT(md5) AS CNT FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING CNT > 1 ORDER BY CNT DESC; ~ work? ~ jpk=# SELECT md5, COUNT(md5) AS CNT FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING CNT > 1 ORDER BY CNT DESC; jpk-# jpk-# jpk-# jpk-# ERROR: column "cnt" does not exist LINE 4: HAVING CNT > 1 ~ Thanks lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
On Aug 30, 2008, at 9:19 AM, Christophe wrote: On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote: Well, then obviously there is the need for it and you were not successful enough at convincing these developers that they were "confusing postgresql with a spreadsheet program" The behavior you are looking for is typical of a spreadsheet, because spreadsheet programs (generally; I'm sure there are exceptions) don't have the notion of a schema; each cell can hold its own particular type. That being said, the automatic type- guessing that Excel, say, provides is far from foolproof; I've probably spent more time cleaning up Excel's bad guesses than would have been saved by my just specifying a type for each column. As has been noted, text representation of values are extremely ambiguous as of which Postgres type they mean... and, of course, you could have user-defined domains and types as well. It's true that it could take a wild guess, but that's not a traditional part of a database engine. That being said, it would not be too hard to write a client that accepted a CSV or tab-delimited file, parsed the header into column names, and then scanned the values of the columns to take a reasonable guess as to the column type from a highly limited set of possibilities. This is probably one of those classic "twenty lines of Perl" problems. About 150 line of perl[1]. It can actually work quite well, but is entirely a client-side problem. None of that sort of heuristics should go anywhere near COPY in. It doesn't seem as though COPY INTO is the right place for that, since the particular guesses and set of types that one would make strike me as very closely tied to your particular application domain. Cheers, Steve [1] A validator (regex) for each data type, then for each column track which data types it may be, as you scan through the file. Use the relative priorities of different data types to assign something appropriate for each column, then do a second pass translating the format into something Postgresql is comfortable with and feed it into pg_putcopydata. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote: > On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <[EMAIL PROTECTED]> wrote: > > Define easily. > > ~ > OK, let me try to outline the approach I would go for: > ~ > I think "COPY FROM CSV" should have three options, namely: > ~ > 1) the way we have used it in which you create the table first > ~ > 2) another way in which defaults are declared, generally as: > ~ > 2.1) aggressive: data type, value and formatting analysis is done; if > only 1 or 0 are found declare then a BOOLEAN, if repeated data is > found (say state codes) and the stratification nodes cover the rest of > the data, stratify the data out to other extra table (they have a name > I can't recall now), index it ..., if data is kind of numeric with > front slashes and/or hyphen could they possibly be dates? if they are > definitelly dates convert them to bigint (and do the formatting in the > presentation code (also this a win-win situation with i18n code)) ... > ~ > 2.2) conservative: data type and value, but no formatting analysis is > done and the greater encompassing data type is selected, say for 1 or > 0 data use bytes [0, 255], for bytes use int, if something could be > encoded as char(2), use varchar instead, . . . > ~ > 2.3) dumn: just use the coarsest data type possible; bigint for > anything that looks like a number and varchar for the rest > ~ > the "dumn" option should suggest to the DBA the option they are > using, quantified consequences for their desicions (larger DBs for no > reason, approx. reduction in speed, . .) and how not to be "dumn" > ~ > 3) or you could define "import templates" declaring which specific > data types to use for data in a certain way, which could be declared > per column using regexps > ~ > > > I could go on, but the point is that table data types require some > > thought on the part of the DBA. > > ~ > Well, it still requires their minds and input, but they will have > jobs even if they get some help, don't you think so ;-) > ~ > lbrtchx This is a combination of more work then necessary and putting the cart after the horse. All I can see happening is delaying the point of decision to a later time and or dumping the decision process on someone else. There is already a "dumb" solution that has been brought many times on this list. It involve creating a holding table that has text only fields and copying the data into and then moving the data from there to a final table. As far as import templates I suggest looking at: http://pgloader.projects.postgresql.org/ It also addresses some of your other suggestions. It does not automatically create a table though. -- Adrian Klaver [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote: Well, then obviously there is the need for it and you were not successful enough at convincing these developers that they were "confusing postgresql with a spreadsheet program" The behavior you are looking for is typical of a spreadsheet, because spreadsheet programs (generally; I'm sure there are exceptions) don't have the notion of a schema; each cell can hold its own particular type. That being said, the automatic type-guessing that Excel, say, provides is far from foolproof; I've probably spent more time cleaning up Excel's bad guesses than would have been saved by my just specifying a type for each column. As has been noted, text representation of values are extremely ambiguous as of which Postgres type they mean... and, of course, you could have user-defined domains and types as well. It's true that it could take a wild guess, but that's not a traditional part of a database engine. That being said, it would not be too hard to write a client that accepted a CSV or tab-delimited file, parsed the header into column names, and then scanned the values of the columns to take a reasonable guess as to the column type from a highly limited set of possibilities. This is probably one of those classic "twenty lines of Perl" problems. It doesn't seem as though COPY INTO is the right place for that, since the particular guesses and set of types that one would make strike me as very closely tied to your particular application domain. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
Albretch Mueller wrote: Hi, ~ I am trying to get dups from some data from files which md5sums I previously calculated ~ Here is my mere mortal SQL ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; I think you are looking for HAVING as in: SELECT md5, COUNT(md5) FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 HAVING count(md5) > 1 Stefan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
> I think you're confusing postgresql with a spreadsheet program. ~ I wonder what makes you think so ~ > There are client programs which will do this for you, perhaps you wan one of > those? ~ Well, then obviously there is the need for it and you were not successful enough at convincing these developers that they were "confusing postgresql with a spreadsheet program" ~ > PostgreSQL has 60+ types and many look like eachother. How do you propose to > differentiate? ~ Data Types are basically about value ranges (how many bits do you need to hold the value) and formatting. IMHO, finding an optimal [im|ex]port reasoning among 60+ types should not be that much of a big deal. In fact as a data analyst I have exported and imported CSV data a whole lot and in many occasions it required some extra custom coding. I may as well consolidate my code as a whole jcsvport library in java and start an OS project when I find the time to so ~ lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
Also I know there is a DISTINCT keyword, but I also need to know how many times the particular data in the column is repeated if it is, that is why I need to go: ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; ~ Thanks lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: relation . . . does not exist
On Sat, Aug 30, 2008 at 08:23:25AM -0400, Albretch Mueller wrote: > OK, let me try to outline the approach I would go for: > ~ > I think "COPY FROM CSV" should have three options, namely: I think you're confusing postgresql with a spreadsheet program. A database is designed to take care of your data and ensure its integrity. As such it requires a little more thought. There are client programs which will do this for you, perhaps you wan one of those? What's so hard about: create table foo (a text, b text); After which your COPY will complete fine. > 2.1) aggressive: data type, value and formatting analysis is done; if > only 1 or 0 are found declare then a BOOLEAN, if repeated data is > found (say state codes) and the stratification nodes cover the rest of > the data, stratify the data out to other extra table (they have a name > I can't recall now), index it ..., if data is kind of numeric with > front slashes and/or hyphen could they possibly be dates? if they are > definitelly dates convert them to bigint (and do the formatting in the > presentation code (also this a win-win situation with i18n code)) ... PostgreSQL has 60+ types and many look like eachother. How do you propose to differentiate? Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [GENERAL] ERROR: relation . . . does not exist
On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <[EMAIL PROTECTED]> wrote: > Define easily. ~ OK, let me try to outline the approach I would go for: ~ I think "COPY FROM CSV" should have three options, namely: ~ 1) the way we have used it in which you create the table first ~ 2) another way in which defaults are declared, generally as: ~ 2.1) aggressive: data type, value and formatting analysis is done; if only 1 or 0 are found declare then a BOOLEAN, if repeated data is found (say state codes) and the stratification nodes cover the rest of the data, stratify the data out to other extra table (they have a name I can't recall now), index it ..., if data is kind of numeric with front slashes and/or hyphen could they possibly be dates? if they are definitelly dates convert them to bigint (and do the formatting in the presentation code (also this a win-win situation with i18n code)) ... ~ 2.2) conservative: data type and value, but no formatting analysis is done and the greater encompassing data type is selected, say for 1 or 0 data use bytes [0, 255], for bytes use int, if something could be encoded as char(2), use varchar instead, . . . ~ 2.3) dumn: just use the coarsest data type possible; bigint for anything that looks like a number and varchar for the rest ~ the "dumn" option should suggest to the DBA the option they are using, quantified consequences for their desicions (larger DBs for no reason, approx. reduction in speed, . .) and how not to be "dumn" ~ 3) or you could define "import templates" declaring which specific data types to use for data in a certain way, which could be declared per column using regexps ~ > I could go on, but the point is that table data types require some thought on > the part of the DBA. ~ Well, it still requires their minds and input, but they will have jobs even if they get some help, don't you think so ;-) ~ lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] DUPS in tables columns ERROR: column ". . . " does not exist
Hi, ~ I am trying to get dups from some data from files which md5sums I previously calculated ~ Here is my mere mortal SQL ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; ~ and this is what I get: ~ jpk=# SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 WHERE (md5cnt > 1) GROUP BY md5 ORDER BY md5cnt DESC; jpk-# jpk-# jpk-# jpk-# ERROR: column "md5cnt" does not exist LINE 3: WHERE (md5cnt > 1) ~ I think I know what that one means based on the clear error message, namely md5cntis not a table column itself, but I still think there should be a way to formulate a simple query like this because PG does take "ORDER BY md5cnt DESC" even if md5cnt is not a table column, why on earth then it does not swallow and digest the "WHERE (md5cnt > 1)" part? ~ You could go the monkey way running a query like: ~ SELECT md5, COUNT(md5) AS md5cnt FROM jdk1_6_0_07_txtfls_md5 GROUP BY md5 ORDER BY md5cnt DESC; ~ and then use code to jump of the loop when md5cnt becomes 1 or you could use nested SQL statements ~ How can you find duplicate records in a table? ~ Thanks lbrtchx -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general