[GENERAL] tsearch2, locale, UTF-8 and Windows
Over the past few days, I have been reading everything I could about tsearch2, but I cannot figure out what the latest status is concerning the default locale on a Windows UTF-8 database under PostgreSQL 8.2. More specifically, I have a UTF-8 database containing information in five different European languages (English, French, Spanish, German and Italian). I am coding on a Windows system with locale French_Canada.1252. The server that will soon run the database will likely have locale en_US... I am at a loss as to which locale I should initdb with on my system! What am I getting myself into? A lot of what I read on this matter was pretty bleak; has version 8.2 cleared all that up? What is the latest word on this?
Re: [GENERAL] Predicted lifespan of different PostgreSQL branches
Oisin Glynn wrote: My 8.2c, Having 8.1 end of life this soon after the release of 8.2 seems pretty harsh. Yeah, I agree. In part I'm basing the idea to support the current and 2 previous branches on the amount of work required to build a complete set of point releases in one go - 3 seems manageable over a weekend. 4 would probably be possible, but would be more rushed than I'd like. Also, three just seems like a sensible number to maintain. I kinda like Magnus' idea to put older releases into a sort of 'retired' mode though, and build only the binaries for PostgreSQL itself. Regards, Dave. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] counting query
I have a table definition such as:

CREATE TABLE attendance (
    attendanceid serial primary key,
    entered date DEFAULT current_date NOT NULL,
    absent boolean,
    authorization text default 'N',
    timeperiod char(2) check(timeperiod in('AM','PM')),
    days varchar(10),
    studentid int,
    unique(entered,timeperiod,studentid)
)

which is used to record school attendance data. I am now trying to write a query to identify trends in absences by counting the days column and returning any student that has repeated absences on certain days. I am struggling to return anything that does not need further manipulation in Python before being useful. Does anyone have any ideas? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Predicted lifespan of different PostgreSQL branches
Dave Page wrote: Oisin Glynn wrote: My 8.2c, Having 8.1 end of life this soon after the release of 8.2 seems pretty harsh. Yeah, I agree. In part I'm basing the idea to support the current and 2 previous branches on the amount of work required to build a complete set of point releases in one go - 3 seems manageable over a weekend. 4 would probably be possible, but would be more rushed than I'd like. I don't see a problem if the updates for the current and previous release binaries are available on the current schedule and older release update binaries are available a few days to a week later. The other option is to extend the time allocated between the decision to OK the releases and making the binaries available. For example, we could say: source release available Friday, with 8.2.x and 8.1.x binaries available Monday, then 8.0.x and 7.4.x binaries available Thursday or Friday. A lot of users may download the releases as they come out, but most with production servers won't be installing them the same day. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Password issue revisited
Michael Schmidt wrote: Fellow PostgreSQL fans, 1. I don't see that this would pose a major security risk. In fact, in applications where the user enters the password for each session, the password need never be saved to disk, which seems a definite security advantage. Some folks have noted that .pgpass is a plain text file, hence it could be vulnerable. Yes it is a plain text file but if you want to use it then you need to ensure the security is sufficient on the file or it won't be used. As per the manual - The permissions on .pgpass must disallow any access to world or group; achieve this by the command chmod 0600 ~/.pgpass. If the permissions are less strict than this, the file will be ignored. (The file permissions are not currently checked on Microsoft Windows, however.) So this security feature should be something that gets added to the windows version. But otherwise the security of the user's account that has a .pgpass file is the decider on whether it is vulnerable. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] counting query
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of garry saddington Sent: zondag 28 januari 2007 14:06 To: pgsql-general@postgresql.org Subject: [GENERAL] counting query

I have a table definition such as: CREATE TABLE attendance ( attendanceid serial primary key,

Why do you have this? You already have (entered,timeperiod,studentid) that you can use, since that must be unique too. Try to avoid surrogate keys as much as possible (it really increases performance and ease-of-writing for complex queries!)

entered date DEFAULT current_date NOT NULL, absent boolean, authorization text default 'N', timeperiod char(2) check(timeperiod in('AM','PM')), days varchar(10), studentid int, unique(entered,timeperiod,studentid) )

Guessing the meaning a bit; it is not too self-explanatory.

Which is used to record school attendance data. I am now trying to write a query to identify trends in absences by counting the days column and returning any student that has repeated absences on certain days. I am struggling to return anything that does not need further manipulation in Python before being useful. Does anyone have any ideas?

Yes, before starting you must have a well-defined idea of what you want to know. What should the result look like? In most situations start with simple things, and eventually combine these into something more complex, but always know what you are querying. You should have some idea of what results you will be getting. Of course, this is a process of discovery rather than following set rules. Some background in probability and statistics really helps. Personally I like visualization quite a lot to help me with this, and I've found nothing that beats Excel for doing data analysis. Learn to use the pivot table and pivot charts; they are extremely powerful. However, it can be a bit tricky to transform the input into something the tool can use. A good starting point is to split the dates into separate year, month, day, week values. Some ideas that might get you started: * A graph of days vs occurrences (count). * A graph of total sick days per student vs occurrences. * Check the occurrences per month/day/week. Next, try filtering the data to form sequential periods and more interesting things. I believe all this can be done with plain SQL and you don't need any Python or plsql or other languages. - Joris Dobbelsteen ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
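For instance, a minimal sketch of the date-splitting suggested above, using the attendance table from the original post (the GROUP BY buckets absences per student, per week and weekday):

SELECT studentid,
       EXTRACT(YEAR FROM entered)  AS yr,
       EXTRACT(MONTH FROM entered) AS mon,
       EXTRACT(WEEK FROM entered)  AS wk,
       EXTRACT(DOW FROM entered)   AS weekday,  -- 0 = Sunday .. 6 = Saturday
       count(*)                    AS absences
FROM attendance
WHERE absent
GROUP BY 1, 2, 3, 4, 5;

The flat output of a query like this is exactly the shape a pivot table wants as its input.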
Re: [GENERAL] counting query
Joris Dobbelsteen wrote: CREATE TABLE attendance ( attendanceid serial primary key, Why you have this??? You already have (entered,timeperiod,studentid) that you can use, since that must be unique too. Try to avoid surrogate keys as much as possible (it really increases performance and ease-of-writing for complex queries! Correct me if I am wrong, but wouldn't a simple number be a lot easier to look up than a composite key? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] counting query
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Meyer Sent: zondag 28 januari 2007 15:36 To: pgsql-general@postgresql.org Subject: Re: [GENERAL] counting query Joris Dobbelsteen wrote: CREATE TABLE attendance ( attendanceid serial primary key, Why do you have this? You already have (entered,timeperiod,studentid) that you can use, since that must be unique too. Try to avoid surrogate keys as much as possible (it really increases performance and ease-of-writing for complex queries!) Correct me if I am wrong, but wouldn't a simple number be a lot easier to look up than a composite key? No, it is not. Or better: it might be, until you go just a bit larger. I learned this when I built a production database (it's still used in production and still performs excellently. At the time I was only 15 or 16 or 17 years old. That was 7 to 10 years ago. Since then I've learned a lot.) The trouble is, the database consists of well over 40 tables (with surrogate keys nearly everywhere) and joining a bit of data on the far ends of the database requires you to join around 10 to 15 tables. If you are doing something complex you will get lost at some point and really need graphical tools just to grasp what you are trying to query. So a slightly complex query easily results in an excessive number of tables that must be queried. Why did I do this? At the time I was not aware that you could build a primary key consisting of multiple columns. And if you look around the Internet you see (nearly) all databases of free 'web applications' making excessive use of surrogate keys. This resulted in a lot of excessive surrogate keys that could have been easily avoided and were not a required artifact of the inconsistent data I had to import. What would have been better without surrogate keys all over: * Easier to write complex queries, with far fewer tables to be queried. * Much faster query performance, as fewer tables need to be referenced. * Better integrity enforcement with simple foreign key constraints. In fact, in (guessed) 50% of the queries I could have avoided at least 2 table joins! Think big. That is why... - Joris Dobbelsteen ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] counting query
Joris Dobbelsteen [EMAIL PROTECTED] writes: What would have been better without surrogate keys all over: * Easier to write complex queries with much fewer tables to be queried. * Much faster query performance, as fewer tables need to be referenced. * Better integrity enforcement with simple foreign key constraints. Not this debate again. ;) Surrogate vs natural keys shouldn't make a difference in how many tables you have--that depends on the degree of normalization. Sounds like you denormalized your database and happened to eliminate surrogate keys at the same time. Using that to say surrogate keys are bad is kind of misleading. -Doug ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] counting query
On 01/28/07 08:36, John Meyer wrote: Joris Dobbelsteen wrote: CREATE TABLE attendance ( attendanceid serial primary key, Why do you have this? You already have (entered,timeperiod,studentid) that you can use, since that must be unique too. Try to avoid surrogate keys as much as possible (it really increases performance and ease-of-writing for complex queries!) Correct me if I am wrong, but wouldn't a simple number be a lot easier to look up than a composite key? This is the great synthetic-vs-natural key debate. Sure, it's easier to write queries that join on a synthetic integer field. However, adding 3 extra fields to a few other tables is not onerous, and it adds useful information to the other tables, since (entered,timeperiod,studentid) is what *really* makes a record unique. Also, synthetic keys mean that you have to do more joins, since if you want to know about entered and table T_FOO, you'd have to join attendance to T_FOO. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
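To make the point concrete, a hedged sketch (attendance_note is a made-up child table; the FOREIGN KEY leans on the unique constraint that attendance already declares on these three columns):

CREATE TABLE attendance_note (
    entered    date    NOT NULL,
    timeperiod char(2) NOT NULL,
    studentid  int     NOT NULL,
    note       text,
    FOREIGN KEY (entered, timeperiod, studentid)
        REFERENCES attendance (entered, timeperiod, studentid)
);

Because the natural key travels with the child table, a question like "all notes for student 42 in a given week" can be answered from attendance_note alone, with no join back to attendance.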
Re: [GENERAL] counting query
On 01/28/07 07:05, garry saddington wrote: I have a table definition such as: CREATE TABLE attendance ( attendanceid serial primary key, entered date DEFAULT current_date NOT NULL, absent boolean, authorization text default 'N', timeperiod char(2) check(timeperiod in('AM','PM')), days varchar(10), studentid int, unique(entered,timeperiod,studentid) ) Which is used to record school attendance data. I am now trying to write a query to identify trends in absences by counting the days column and returning any student that has repeated absences on certain days. I am struggling to return anything that does not need further manipulation in Python before being useful. Does anyone have any ideas? When you say certain days, do you mean days of the week? If so, create a view like:

CREATE VIEW V_DAY_ABSENCES AS
SELECT ENTERED, AUTHORIZATION, TIMEPERIOD, DAYS, STUDENTID,
       EXTRACT(DOW FROM ENTERED) AS WEEKDAY
FROM ATTENDANCE
WHERE ABSENT = TRUE;

Then, this query should do what you want:

SELECT STUDENTID, TIMEPERIOD, WEEKDAY, COUNT(*)
FROM V_DAY_ABSENCES
GROUP BY STUDENTID, TIMEPERIOD, WEEKDAY
HAVING COUNT(*) > 3;

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Limit on number of users in postgresql?
Thanks Tom. You know I thought about this approach a little more. I don't think there's a simple answer to this security problem short of placing a proxy server application between the clients and the database. The problem with giving database role accounts to each and every user is that the users now have uncontrolled access to the database. That's not good either because I want to do fine grained access control within my application. The more I think about it, the more I think a proxy app is necessary. It seems like a lot of work just for security issues, but basically most web based database apps use this model, with the web application acting as a proxy between the database and the client. Thanks. Tom Lane wrote: Mark Walker [EMAIL PROTECTED] writes: I'm wondering if there's a manageable limit on the number of users in postgresql. I could have conceivably thousands of different users for any particular app and multiple apps in the same server. Numbers in the low thousands are probably workable, as long as you aren't adding and removing users at a great rate. I think that the main bottleneck would be the flat file that's used to tell the postmaster about the set of valid users --- every time a user is added/dropped/changed, that file gets rewritten and then re-parsed by the postmaster. So you could eat a lot of overhead if you change users every few seconds or something like that. I'd suggest doing some testing to see if there are any unexpected bottlenecks, but I don't see a reason to reject the idea out of hand. (If you find any slownesses, report 'em, and we might be able to fix 'em.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] counting query
On 1/28/07, Ron Johnson [EMAIL PROTECTED] wrote: This is the great synthetic-vs-natural key debate. Truly. But what the heck! Surrogate keys are not evil, and they do have value. I see no value in proclaiming surrogate keys are evil, do not use them. Surrogate keys do have advantages: - Disassociation of natural data from other referential tables (which can also be confusing). Imagine a social security number, drivers license number, or any other natural key. Now imagine that key value has changed for a specific person, and you have used it as a natural key throughout your data structures. (And they do change.) - Reduced storage requirements (yields better performance). It is cheaper to store a 50 byte field + a 4 byte surrogate key once than it is to store it a million times: (surrogate key) 54 bytes + (4 bytes * 1 million) = about 4 MB vs. (natural key) 50 bytes * 1 million = 50 MB. Natural keys are not evil either, and they have their own advantages. But when you're modeling very large databases (around the 10TB range) then you absolutely have to consider every single decision, and natural keys (in my opinion) are not always a good one, as a single natural key could result in another 100GB of storage requirements. There should be some thought when you are modeling, and these are some of the things to consider. I don't see a 10 table join as a major performance penalty, especially when 8 of the tables may be a few MB in size. -- Chad http://www.postgresqlforums.com/
[GENERAL] explain analyse much slower than actual query
Dear All, I want to find all of the msg_ids from messages that are not in table part_tsearch, where both tables are large but the result of the query is normally small, and often empty. To 'protect' the application against the unusual case where the result of the query is large I have added a limit clause:

select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10;

Currently there are about 30,000 rows in each table and about 500 rows in the result of the (un-limit-ed) query. So it is unusual in the sense that the size of the result is relatively large and the limit clause will take effect. Both tables are indexed by msg_id. This was taking longer than I expected, so I decided to explain-analyse it. However, when I run it under explain-analyse, it takes even longer than before:

decimail=> select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10;
 msg_id
--------
  67894
  67809
  52548
  67745
  67538
  67540
  67329
  67246
  67235
  67140
(10 rows)

(that takes about 2 seconds)

decimail=> explain analyse select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10;

                                   QUERY PLAN
 Limit  (cost=4301.99..10534.34 rows=10 width=4) (actual time=6677.791..72417.068 rows=10 loops=1)
   ->  Seq Scan on messages  (cost=4301.99..11966058.86 rows=19193 width=4) (actual time=6677.725..72416.427 rows=10 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Materialize  (cost=4301.99..4830.07 rows=37908 width=4) (actual time=0.097..39.494 rows=862 loops=903)
                 ->  Seq Scan on part_tsearch  (cost=0.00..4115.08 rows=37908 width=4) (actual time=0.104..1679.258 rows=37901 loops=1)
 Total runtime: 72424.066 ms
(7 rows)

As I increase the limit, the runtimes (in seconds) increase as follows:

limit    normal runtime    explain-analyse runtime
10       2                 77
20       5                 309
40       12                807
80       88
160      149
320      1016

I was not patient enough to wait for the remaining explain-analyse results, but I feel that there is a linear slowdown of about 60x between the raw query and the explain-analyse version. In general, for other queries, explain-analyse reports runtimes that agree with the runtimes of the actual queries. But the peculiar behaviour of explain-analyse is really a distraction from the fact that the query is slow, especially when the limit value is large. The system seems to be largely CPU-bound during these long run-times. The query plan reported by explain-analyse is the same in each case. How many times is it actually doing the seq-scan on part_tsearch? I see that the rows value reported for Materialize is rather different in the planned and actual numbers. What is this telling me? I analysed the tables immediately before starting on this. I was hoping that it would be implemented using some sort of index scan on the two tables, maybe something involving bitmaps. Is there something that I can do to the query, or to my indexes, to make this happen? I tried using except rather than not in (subquery); this isn't exactly identical, as except is required to return sorted results, which I don't need; when the limit clause is in effect I don't care which of the possible ids are returned and I never care about the order. In this case the runtimes are as follows:

limit    normal runtime    explain-analyse runtime
10       15                54
20       15                55
40       19                94
80       9                 55
160      20                68
320      10                70

Note that again explain-analyse is slower than the normal runtime, but now by a factor of about 5 rather than the previous factor of about 60.
Basically, the query runtimes are now essentially constant; I imagine that they would be flatter if the machine wasn't also running other processes. The query plan shows that it is doing a sequential pass over each of the tables, sorting and then doing a set-op - exactly as expected, and OK for large limits, but not for small limits. I feel that it should be possible to do this in milliseconds, not seconds, using the existing indexes. This is with PostgreSQL 8.1 on Debian GNU/Linux. (BTW, this is for Decimail, my postgresql-based IMAP mail server. I have recently added tsearch2-based searching. It is still somewhat experimental but has never lost any mail. If you're interested, have a look at http://decimail.org/.) Many thanks for any advice, Phil. (You are welcome to CC: me in any replies) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] virtual (COMPUTED BY) columns?
On Sun, 28.01.2007 at 10:25:33 -0600, Ron Johnson wrote: Hi. These fields do not use any disk space, as the data in them is derived on the fly. For example: CREATE TABLE T_EXAMPLE ( SOME_DATE DATE, JDATE COMPUTED BY EXTRACT(JULIAN FROM SOME_DATE) ); A work-around is to create a function, and reference it in every query, but storing them in the table definition is the tidy, low-maintenance way to do it. Was this a question about how to do this in PostgreSQL? You can use a VIEW. Just create:

CREATE TABLE T_EXAMPLE ( SOME_DATE DATE );

and then:

CREATE VIEW view_example AS
SELECT some_date, EXTRACT(JULIAN FROM some_date) AS julian_date
FROM T_EXAMPLE;

Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: - Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] virtual (COMPUTED BY) columns?
On 01/28/07 10:43, A. Kretschmer wrote: On Sun, 28.01.2007 at 10:25:33 -0600, Ron Johnson wrote: Hi. These fields do not use any disk space, as the data in them is derived on the fly. For example: CREATE TABLE T_EXAMPLE ( SOME_DATE DATE, JDATE COMPUTED BY EXTRACT(JULIAN FROM SOME_DATE) ); A work-around is to create a function, and reference it in every query, but storing them in the table definition is the tidy, low-maintenance way to do it. Was this a question about how to do this in PostgreSQL? You can use a VIEW. Just create T_EXAMPLE ( SOME_DATE DATE ); and then CREATE VIEW view_example AS SELECT some_date, EXTRACT(JULIAN FROM some_date) AS julian_date FROM T_EXAMPLE; Good point. But then you have 2 bits of metadata, and yet one more object to update when you add a column. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] explain analyse much slower than actual query
Phil Endecott [EMAIL PROTECTED] writes: I was not patient enough to wait for the remaining explain-analyse results, but I feel that there is a linear slowdown of about 60x between the raw query and the explain-analyse version. Slow gettimeofday() ... fairly common on desktop-grade PC hardware :-(. You seem to have a particularly bad case of it, though, as extrapolating from your numbers suggests the overhead is something like 20 microseconds per clock reading; the other reporters we've heard from seemed to get around 1 to 5 usec IIRC. A lot of PCs still use clock chips that were designed back when multiple microseconds to read the clock wasn't unreasonable, but with CPU speeds in the GHz range this is just sucky hardware. It shows up on this example because most of the node entry/exits are for the Materialize node, which can return the next row from its internal array in about no time flat, so the clock readings represent huge percentage overhead. But the peculiar behaviour of explain-analyse is really a distraction from the fact that the query is slow, especially when the limit value is large. You need a hashed subplan for NOT IN to work reasonably fast. The fact you're not getting one suggests you either have to raise work_mem, or you're using some weird datatype that doesn't support hashing. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
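To check whether work_mem is the limiting factor, it can be raised for just the current session and the query re-explained; a minimal sketch (the value is illustrative; in 8.1 the setting is in kilobytes, so this asks for 32 MB):

SET work_mem = 32768;            -- this session only
EXPLAIN ANALYZE
SELECT msg_id FROM messages
WHERE msg_id NOT IN (SELECT msg_id FROM part_tsearch)
LIMIT 10;

If the plan's Filter line changes from (NOT (subplan)) to (NOT (hashed subplan)), the setting is now high enough for the hash to be used.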
Re: [GENERAL] Password issue revisited
Mr. Lane and Mr. Momjian, Well, I asked and I got an answer. So be it. Regarding how I concluded that PGPASSFILE was deprecated for pg_dump, I offer the following. 1. The documentation for pg_dump in the manual (Section VI) includes a section labeled Environment. This lists PGDATABASE, PGHOST, PGPORT, and PGUSER. It also says "default connection parameters" but there is no hyperlink or reference to another manual section to explain/define this term. 2. Neither the Index nor the Table of Contents has an entry for connection parameters or default connection parameters. 3. The application help (pg_dump --help) provides no further information. 4. The pg_dump documentation in Section VI has a See Also section, with a link to 29.12. In 29.12 there is a list of variables that can be used to select default connection parameters for PQconnectdb, PQsetdbLogin, and PQsetdb (no mention of pg_dump). The list includes those mentioned in the pg_dump documentation (see 1 above) along with several others (including PGPASSFILE). I was left wondering why some of these would be mentioned in the pg_dump section and not the others. Perhaps those not mentioned can't be used by pg_dump? Given the above ambiguities, I was not able to conclude that pg_dump would accept the PGPASSFILE environment variable. Michael Schmidt
Re: [GENERAL] counting query
-Original Message- From: Douglas McNaught [mailto:[EMAIL PROTECTED] Sent: zondag 28 januari 2007 16:29 To: Joris Dobbelsteen Cc: John Meyer; pgsql-general@postgresql.org Subject: Re: [GENERAL] counting query Joris Dobbelsteen [EMAIL PROTECTED] writes: What would have been better without surrogate keys all over: * Easier to write complex queries with much fewer tables to be queried. * Much faster query performance, as fewer tables need to be referenced. * Better integrity enforcement with simple foreign key constraints. Not this debate again. ;) Surrogate vs natural keys shouldn't make a difference in how many tables you have--that depends on the degree of normalization. Sounds like you denormalized your database and happened to eliminate surrogate keys at the same time. Using that to say surrogate keys are bad is kind of misleading. I have perhaps formulated it quite extremely. It was not intended to take such an extreme stance, and I apologize if it leads to such a discussion. I'll try to be a bit more careful next time. My point is only: be careful with surrogate keys and try not to use them for everything. In my sole opinion I see them getting used too much. But then again, what is good and wrong will always be subjective. So perhaps "What would have been better without surrogate keys all over" should have been "My database where I extremely overdid it with surrogate keys". Let's leave it at this. - Joris ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] counting query
Joris Dobbelsteen [EMAIL PROTECTED] writes: So perhaps quoteWhat would have been better without surrogate keys all-overquote should have been My database where I extremely overdid it with surrogate keys. Fair enough. It's generally true that going to extremes with anything causes problems. :) -Doug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Predicted lifespan of different PostgreSQL branches
Dave Page wrote: Also, three just seems like a sensible number to maintain. I kinda like Magnus' idea to put older releases into a sort of 'retired' mode though, and build only the binaries for PostgreSQL itself. But would that give people who have previously used the full installer an upgrade path (that doesn't break everything around it)? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Password issue revisited
Michael Schmidt [EMAIL PROTECTED] writes: ... Regarding how I concluded that PGPASSFILE was deprecated for pg_dump, I offer the following. 1. The documentation for pg_dump in the manual (Section VI) includes a section labeled Environment. This lists PGDATABASE, PGHOST, PGPORT, and PGUSER. It also says default connection parameters but there is no hyperlink or reference to another manual section to explain/define this term. Yeah. There is a link down in See Also but the incomplete Environment section of these man pages seems misleading. Rather than try to maintain complete lists in each of the client-application man pages, I propose we remove those sections completely, and just rely on the See Also links to section 29.12. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Firing triggers based on certain Insert conditions
Hi, I have a table in which I have a field named 'source'. A trigger is written on this table. I want this trigger to fire only when, after an Insert, this field 'source' has value = 'from'; otherwise the trigger should not be fired at all. Just wondering if it's really possible? Thanks in advance. Harpreet
Re: [GENERAL] Questions about horizontal partitioning
However, if the primary key is entirely within those six columns, there will have to be an index on it in both tables to enforce the primary key constraint. In that case, an inner join could be performed with an index lookup or an index scan plus hash join, for a query that didn't use any other columns. Whether that translates into a significant I/O reduction depends on how wide and how frequently non-NULL those other columns are. ... if someone is feeling pedagogical (and the answer isn't that complicated), could they explain why a simple index on the desired columns wouldn't be the best solution? Cheers Antoine ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Firing triggers based on certain Insert conditions
Harpreet Dhaliwal [EMAIL PROTECTED] writes: I want this trigger to fire only when after Insert this field 'source' has value = 'from', otherwise trigger should not be fired at all. Just wondering if its really possible? No, and it seems pretty silly as a feature request. Why don't you just put the test in the trigger, and have it do nothing when you don't want it to do anything? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Limit on number of users in postgresql?
Furface [EMAIL PROTECTED] wrote: Thanks Tom. You know I thought about this approach a little more. I don't think there's a simple answer to this security problem short of placing a proxy server application between the clients and the database. The problem with giving database role accounts to each and every user is that the users now have uncontrolled access to the database. Ummm ... huh? PostgreSQL has a pretty nice security model that gives you a great deal of control over what users have access to: http://www.postgresql.org/docs/8.2/static/user-manag.html The only thing that's missing is row-level granularity. There's at least one project out there supporting that, and you can also simulate it with clever usage of stored procedures and the ability to run them with the permissions of the definer instead of the executer. -Bill ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
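As a sketch of the stored-procedure approach Bill mentions (every name here is hypothetical: a timesheet table with a username column is assumed; note that inside a SECURITY DEFINER function current_user reports the function's owner, so session_user is what identifies the caller):

CREATE FUNCTION my_timesheets() RETURNS SETOF timesheet AS $$
    SELECT * FROM timesheet WHERE username = session_user;  -- caller's rows only
$$ LANGUAGE sql SECURITY DEFINER;

REVOKE ALL ON timesheet FROM PUBLIC;                  -- no direct table access
GRANT EXECUTE ON FUNCTION my_timesheets() TO PUBLIC;

Users would then read their own rows with SELECT * FROM my_timesheets();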
Re: [GENERAL] Firing triggers based on certain Insert conditions
On 1/29/07, Harpreet Dhaliwal [EMAIL PROTECTED] wrote: Hi, I have a table in which I have a field named 'source'. A trigger is written on this table. I want this trigger to fire only when after Insert this field 'source' has value = 'from', otherwise trigger should not be fired at all. Just wondering if it's really possible? AFAIK you can't prevent the trigger from firing, but you can make the trigger behave differently based on what's in the fields of the new row:

CREATE FUNCTION source_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.source = 'from' THEN
        /* do stuff */
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER post_insert AFTER INSERT ON [your table name]
    FOR EACH ROW EXECUTE PROCEDURE source_insert();

Regards, BJ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Firing triggers based on certain Insert conditions
I never said I don't want my trigger to do anything. My subject line only made it pretty clear that I want to fire my trigger based on certain conditions. I know it's kind of a silly feature request, but you really can't help it when you are working with stupid advisors :) Thanks for your response anyway. Harpreet On 1/28/07, Tom Lane [EMAIL PROTECTED] wrote: Harpreet Dhaliwal [EMAIL PROTECTED] writes: I want this trigger to fire only when after Insert this field 'source' has value = 'from', otherwise trigger should not be fired at all. Just wondering if its really possible? No, and it seems pretty silly as a feature request. Why don't you just put the test in the trigger, and have it do nothing when you don't want it to do anything? regards, tom lane
Re: [GENERAL] Predicted lifespan of different PostgreSQL branches
--- Original Message --- From: Peter Eisentraut [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: 28/01/07, 17:39:00 Subject: Re: [GENERAL] Predicted lifespan of different PostgreSQL branches Dave Page wrote: Also, three just seems like a sensible number to maintain. I kinda like Magnus' idea to put older releases into a sort of 'retired' mode though, and build only the binaries for PostgreSQL itself. But would that give people who have previously used the full installer an upgrade path (that doesn't break everything around it)? Yes - they'd just unpack the archive over their install directory. Might screw up the permissions though, and wouldn't include the docs :-( We'd certainly need to try it out thoroughly first... /D ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Limit on number of users in postgresql?
OK, let me think. In my situation, I'm writing an accounting app. A typical situation would be a standard user would be able to update data in a timesheet while an administrator would be able to approve the time sheet. If I gave the standard user access to the timesheet header table, they would also have access to an approval field, so I'd need something like column level security. I could put in a trigger where I check the permissions of someone attempting to set the approval column. Yes, I think that would probably work. I think you're correct. I can do pretty much what I want without developing a proxy server. That's what I think I'll do, my original plan of adding users with different roles. It still gives me the creeps, allowing lots of people direct access to my server. Thanks. Bill Moran wrote: Furface [EMAIL PROTECTED] wrote: Thanks Tom. You know I thought about this approach a little more. I don't think there's a simple answer to this security problem short of placing a proxy server application between the clients and the database. The problem with giving database role accounts to each and every user is that the users now have uncontrolled access to the database. Ummm ... huh? PostgreSQL has a pretty nice security model that gives you a great deal of control over what users have access to: http://www.postgresql.org/docs/8.2/static/user-manag.html The only thing that's missing is row-level granularity. There's at least one project out there supporting that, and you can also simulate it with clever usage of stored procedures and the ability to run them with the permissions of the definer instead of the executer. -Bill ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
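A rough sketch of that trigger idea (the table and role names are made up: a timesheet table with an approved column and an approver group role; pg_has_role is assumed available, which it is from 8.1 on):

CREATE FUNCTION check_approval() RETURNS trigger AS $$
BEGIN
    -- reject the update if a non-approver touches the approval column
    IF NEW.approved IS DISTINCT FROM OLD.approved
       AND NOT pg_has_role(current_user, 'approver', 'member') THEN
        RAISE EXCEPTION 'only approvers may change approval status';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER timesheet_approval
    BEFORE UPDATE ON timesheet
    FOR EACH ROW EXECUTE PROCEDURE check_approval();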
Re: [GENERAL] explain analyse much slower than actual query
Thanks for the quick reply Tom. Tom Lane wrote: Phil Endecott spam_from_postgresql_general ( at ) chezphil ( dot ) org writes: I was not patient enough to wait for the remaining explain-analyse results, but I feel that there is a linear slowdown of about 60x between the raw query and the explain-analyse version. Slow gettimeofday() ... fairly common on desktop-grade PC hardware :-(. It's actually a virtual machine, and I seem to recall reading something about the virtualised gettimeofday() being slow. OK, that explains it. Thanks. But the peculiar behaviour of explain-analyse is really a distraction from the fact that the query is slow, especially when the limit value is large. You need a hashed subplan for NOT IN to work reasonably fast. The fact you're not getting one suggests you either have to raise work_mem, or you're using some weird datatype that doesn't support hashing. It's an int, and yes, increasing work_mem makes it use a hashed subplan:

                                   QUERY PLAN
 Limit  (cost=4209.76..4213.61 rows=10 width=4) (actual time=5432.840..5461.518 rows=10 loops=1)
   ->  Seq Scan on messages  (cost=4209.76..11608.23 rows=19218 width=4) (actual time=5432.776..5460.859 rows=10 loops=1)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on part_tsearch  (cost=0.00..4115.01 rows=37901 width=4) (actual time=0.390..2984.783 rows=37907 loops=1)
 Total runtime: 5468.817 ms

So presumably work_mem must be greater than some function of the size of the table in the subquery. Is there some way to work that out? This (virtual) machine doesn't have an enormous amount of RAM so I like to keep settings like this as high as necessary but no higher. If I understand it correctly, it is still doing a sequential scan on part_tsearch that does not terminate early due to the limit clause. So I'm still seeing run times that are rather worse than I think should be possible. Can it not step through the indexes in the way that it does for a Merge Join until it has got enough results to satisfy the limit, and then terminate? Thanks, Phil. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] counting query
On Sun, 2007-01-28 at 09:57 -0600, Ron Johnson wrote: On 01/28/07 07:05, garry saddington wrote: I have a table definition such as: CREATE TABLE attendance ( attendanceid serial primary key, entered date DEFAULT current_date NOT NULL, absent boolean, authorization text default 'N', timeperiod char(2) check(timeperiod in('AM','PM')), days varchar(10), studentid int, unique(entered,timeperiod,studentid) ) Which is used to record school attendance data. I am now trying to write a query to identify trends in absences by counting the days column and returning any student that has repeated absences on certain days. I am struggling to return anything that does not need further manipulation in Python before being useful. Does anyone have any ideas? When you say certain days, do you mean days of the week? If so, create a view like: CREATE VIEW V_DAY_ABSENCES AS SELECT ENTERED, AUTHORIZATION, TIMEPERIOD, DAYS, STUDENTID, EXTRACT(DOW FROM ENTERED) AS WEEKDAY FROM ATTENDANCE WHERE ABSENT = TRUE; Then, this query should do what you want: SELECT STUDENTID, TIMEPERIOD, WEEKDAY, COUNT(*) FROM V_DAY_ABSENCES GROUP BY STUDENTID, TIMEPERIOD, WEEKDAY HAVING COUNT(*) > 3; Thank you, this works great. But I have another problem: is it possible to identify absences in consecutive weeks on the same day? E.g. if a pupil has a pattern of having every Monday AM off school, how could that be identified? Regards Garry ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] explain analyse much slower than actual query
Phil Endecott [EMAIL PROTECTED] writes: If I understand it correctly, it is still doing a sequential scan on part_tsearch that does not terminate early due to the limit clause. So I'm still seeing run times that are rather worse than I think should be possible. Can it not step through the indexes in the way that it does for a Merge Join until it has got enough results to satisfy the limit, and then terminate? Nope, there is not that much intelligence about NOT IN. You could possibly manually rewrite the thing as a LEFT JOIN with a WHERE inner-join-key IS NULL clause. This would probably lose if most of the outer relation's rows join to many inner rows, though. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
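For reference, a sketch of that rewrite against the tables from this thread (it matches the NOT IN semantics only when part_tsearch.msg_id contains no NULLs):

SELECT m.msg_id
FROM messages m
LEFT JOIN part_tsearch p ON p.msg_id = m.msg_id
WHERE p.msg_id IS NULL
LIMIT 10;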
Re: [GENERAL] PostgreSQL data loss
On 1/26/07, BluDes [EMAIL PROTECTED] wrote: Hi everyone, I have a problem with one of my customers. I made a program that uses a PostgreSQL (win32) database to save its data. My customer claims that he lost lots of data regarding his own clients and that those data had surely been saved on the database. My first guess is that he is the one who deleted the data but wants to blame someone else; obviously I can't prove it. I've been working with PostgreSQL since early 7.1 on dozens of projects and I've had maybe two or three cases of data corruption that were not explained by hardware failure or something like that (and even these cases were debatable since I was not in direct control of the server). Both of those cases had side effects...the corruption busted something else which sent immediate red flags that something was wrong. I think your customer is CYA. merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] counting query
On 01/28/07 15:18, garry saddington wrote: On Sun, 2007-01-28 at 09:57 -0600, Ron Johnson wrote: On 01/28/07 07:05, garry saddington wrote: [snip] When you say certain days, do you mean days of the week? If so, create a view like: CREATE VIEW V_DAY_ABSENCES AS SELECT ENTERED, AUTHORIZATION, TIMEPERIOD, DAYS, STUDENTID, EXTRACT(DOW FROM ENTERED) AS WEEKDAY FROM ATTENDANCE WHERE ABSENT = TRUE; Then, this query should do what you want: SELECT STUDENTID, TIMEPERIOD, WEEKDAY, COUNT(*) FROM V_DAY_ABSENCES GROUP BY STUDENTID, TIMEPERIOD, WEEKDAY HAVING COUNT(*) > 3; Thank you, this works great. But I have another problem: is it possible to identify absences in consecutive weeks on the same day? E.g. if a pupil has a pattern of having every Monday AM off school, how could that be identified?

I'd use the T_CALENDAR table, modified for your purposes. (It's a static table that we create on every database.) We populate it with 22 years of dates. You'll have to write a small procedure to do it.

CREATE TABLE T_CALENDAR (
    DATE_ANSI     DATE,
    YEAR_NUM      SMALLINT,
    MONTH_NUM     SMALLINT,
    DAY_OF_MONTH  SMALLINT,
    DAY_OF_WEEK   SMALLINT,
    JULIAN_DAY    SMALLINT,
    IS_SCHOOL_DAY BOOL,
    SCHOOL_YEAR   SMALLINT,  -- 2006 for the 2006/07 school year
    SCHOOL_MONTH  SMALLINT); -- 1 for August, 2 for September, etc

Then, join T_CALENDAR to ATTENDANCE:

WHERE DATE_ANSI = ENTERED
  AND DAY_OF_WEEK = 1
  AND IS_SCHOOL_DAY = TRUE
  AND SCHOOL_YEAR = 2006;

Make that join into a view and then, as Joris suggested, connect it to a spreadsheet. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
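Another option that stays in plain SQL is to self-join attendance on the same student and session exactly seven days apart, which flags back-to-back weekly absences directly (a sketch against the table from this thread):

SELECT a.studentid, a.timeperiod, a.entered AS week1, b.entered AS week2
FROM attendance a
JOIN attendance b
  ON b.studentid  = a.studentid
 AND b.timeperiod = a.timeperiod
 AND b.entered    = a.entered + 7   -- same weekday, following week
WHERE a.absent AND b.absent;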
[GENERAL] text storage and parsing errors
For what may be really strange reasons I am trying to store emails into a database table. The idea is to take the entire original message and store it into a table with two columns, a serial primary key column and a column for the message. Originally I thought I would just use column type text but changed that to bytea, based on the fact that the dbmail project uses bytea for reasons of character support. I'm running into problems with Perl throwing an error of 'invalid input syntax for type bytea'. I have not yet been able to capture a message for any further testing because the error tends to destroy the message in process. The way that I'm using Perl is to do full prepare and execute statements, which, as I understand Perl, will do all the character escaping necessary to store the message. Meaning, if I have characters like (') or (`) they should be escaped when they are entered into the SQL parameter. I am wondering if this is indeed the case. But I'm looking for any suggestions or considerations when trying to do this approach to avoid this kind of text parsing problem. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] MULE_INTERNAL translation to win1250
Hi. I have a strange problem in postgres 8.1.4 (gentoo 64bit on AMD64 platform). My database is created with LATIN-2 encoding for correct viewing of national specific characters (Czech language). Inside the code of my php application I set client encoding to win1250 because I need the output of queries in this encoding. On some parts of the data I get an error: Query failed: ERROR: character 0x829a of encoding MULE_INTERNAL has no equivalent in WIN1250. Without set client_encoding to win1250 the query works. I am curious why MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding database (and indeed the whole cluster) was created with LATIN2 encoding. Strangely enough, ALL INSERTS are done with WIN1250 client encoding too. Maybe a bug in the charset translation routines of postgres? And how can I repair it, preferably in the whole database? Thanx for help. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] MULE_INTERNAL translation to win1250
NTPT [EMAIL PROTECTED] writes: Without set client_encoding to win1250 query works. I am curious why there is a MULE_INTERNAL mentioned even when \l+ say that corresponding database is created with (and even all the cluster) LATIN2 encoding. The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to reduce duplication of code. It shouldn't make any difference to the end result though. Are you sure that the characters you're using are supposed to have representations in both character sets? May be a bug in charset translation routines of postgres ? If you think that, you need to provide us with the exact codes that are being mistranslated and what you think they should translate to. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Limit on number of users in postgresql?
On Sun, Jan 28, 2007 at 01:21:09PM -0500, Bill Moran wrote: The only thing that's missing is row-level granularity. There's at least one project out there supporting that, and you can also simulate it with clever usage of stored procedures and the ability to run them with the permissions of the definer instead of the executer. You can also use rules to protect rows. E.g.

CREATE RULE atable__lock_user_insert AS ON INSERT TO atable
    WHERE CURRENT_USER != 'mysuper' AND new.username != CURRENT_USER
    DO INSTEAD nothing;

CREATE RULE atable__lock_user_update AS ON UPDATE TO atable
    WHERE CURRENT_USER != 'mysuper' AND old.username != CURRENT_USER
    DO INSTEAD nothing;

CREATE RULE atable__lock_user_delete AS ON DELETE TO atable
    WHERE CURRENT_USER != 'mysuper' AND old.username != CURRENT_USER
    DO INSTEAD nothing;

-- Ron Peterson https://www.yellowbank.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] text storage and parsing errors
tom wrote: The way that I'm using Perl is to do full prepare and execute statements, which, as I understand Perl, will do all the character escaping necessary to store the message. Meaning, if I have characters like (') or (`) they should be escaped when they are entered into the SQL parameter. I am wondering if this is indeed the case. But I'm looking for any suggestions or considerations when trying to do this approach to avoid this kind of text parsing problem. Assuming you're using DBD::Pg and a placeholder for the bytea parameter of your query, you must specify its type explicitly, like this:

use DBD::Pg qw(:pg_types);
[...]
$s = $dbh->prepare("INSERT INTO the_table(contents) VALUES(?)");
$s->bind_param(1, $the_data, { pg_type => DBD::Pg::PG_BYTEA });
$s->execute;

Hope this helps, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] PostgreSQL data loss
While making a POC (proof of concept) for any project, we clearly mention at the end of the document that loss of data is not going to be our responsibility, and that's how we save ourselves right at the beginning. What happened with you has happened with us many a time, but our bold and italicized lines about data loss have always saved us. I suggest something like this for your future projects. Hope this helps. Regards On 1/28/07, Merlin Moncure [EMAIL PROTECTED] wrote: On 1/26/07, BluDes [EMAIL PROTECTED] wrote: Hi everyone, I have a problem with one of my customers. I made a program that uses a PostgreSQL (win32) database to save its data. My customer claims that he lost lots of data regarding his own clients and that those data had surely been saved on the database. My first guess is that he is the one who deleted the data but wants to blame someone else; obviously I can't prove it. I've been working with PostgreSQL since early 7.1 on dozens of projects and I've had maybe two or three cases of data corruption that were not explained by hardware failure or something like that (and even these cases were debatable since I was not in direct control of the server). Both of those cases had side effects...the corruption busted something else which sent immediate red flags that something was wrong. I think your customer is CYA. merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] MULE_INTERNAL translation to win1250
MULE_INTERNAL is used for an intermediate encoding between LATIN2 and WIN1250. The error message indicates that 0x9a of LATIN2 cannot be mapped to WIN1250. You can see 0x00 in the position for 0x9a (between 0x99 and 0x9b) in the encoding map in src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c, which indicates nothing corresponds to LATIN2 0x9a. If you know what should be mapped for LATIN2 0x9a, please let us know.

static const unsigned char iso88592_2_win1250[] = {
    0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
    0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
    0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
    0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
    0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
    0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
    0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
    0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
    0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
    0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
    0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
    0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
    0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
    0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};

-- Tatsuo Ishii SRA OSS, Inc. Japan Hi. I have a strange problem in postgres 8.1.4 (gentoo 64bit on AMD64 platform). My database is created with LATIN-2 encoding for correct viewing of national specific characters (Czech language). Inside the code of my php application I set client encoding to win1250 because I need the output of queries in this encoding. On some parts of the data I get an error: Query failed: ERROR: character 0x829a of encoding MULE_INTERNAL has no equivalent in WIN1250. Without set client_encoding to win1250 the query works. I am curious why MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding database (and indeed the whole cluster) was created with LATIN2 encoding. Strangely enough, ALL INSERTS are done with WIN1250 client encoding too. Maybe a bug in the charset translation routines of postgres? And how can I repair it, preferably in the whole database? Thanx for help. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] tsearch2, locale, UTF-8 and Windows
Just to pinpoint the meaning of my dismay, let me add one comment to my previous post. In the What'sNew document for tsearch2 with 8.2 http://www.sai.msu.su/~megera/wiki/Tsearch2WhatsNew we read: Don't forget to initdb cluster with correct utf8-locale ! initdb -D /usr/local/pgsql-dev/data.el_utf8 --locale=el_GR.utf8 I have never seen any detail of what was meant by the correct locale. Clearly if I was using a database with French content on a French system, I would intuitively choose fr_FR.utf8 as the locale, but if my database contains texts in several languages (see quoted post below), I don't know on what factor to base my choice of initdb locale. Suggestions? On Jan 28, 3:28 am, [EMAIL PROTECTED] (Pierre Thibaudeau) wrote: [...] I cannot figure out what the latest status is concerning the default locale on a Windows UTF-8 database under PostgreSQL 8.2. [...] I have a UTF-8 database containing information in five different European languages (English, French, Spanish, German and Italian). I am coding on a Windows system with locale French_Canada.1252. The server that will soon run the database will likely have locale en_US...
Re: [GENERAL] MULE_INTERNAL translation to win1250
On Sun, Jan 28, 2007 at 06:33:16PM -0500, Tom Lane wrote: NTPT [EMAIL PROTECTED] writes: May be a bug in charset translation routines of postgres? If you think that, you need to provide us with the exact codes that are being mistranslated and what you think they should translate to. I wonder if the OP is doing something like this:

test=> SELECT getdatabaseencoding();
 getdatabaseencoding
---------------------
 LATIN2
(1 row)

test=> SHOW client_encoding;
 client_encoding
-----------------
 win1250
(1 row)

test=> CREATE TABLE test (t text);
CREATE TABLE
test=> INSERT INTO test VALUES (E'\202\232');  -- \202=0x82, \232=0x9a
INSERT 0 1
test=> SELECT * FROM test;
ERROR:  character 0x829a of encoding MULE_INTERNAL has no equivalent in WIN1250

The intent might be that E'\202\232' is a string in the client's encoding, where it would represent the same characters as Unicode U+201A SINGLE LOW-9 QUOTATION MARK, U+0161 LATIN SMALL LETTER S WITH CARON (I'm using Unicode as the pivot for convenience). But the backend is handling the string in the database's encoding, where it represents U+0082,U+009A, which are control characters that don't have mappings in win1250; hence the conversion error when the client tries to read the data. Just a guess. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] Load balancing across disks
I'm in the process of finalising my conversion from M$ SQL Server to Postgres - all of which I'm very happy about so far. The database I work with has 37 tables, 5 of which run into the order of tens of millions of records, and approximately another 10 can run into millions depending on the size of the customer's main system. The DB is going to be expanded to have another 15 or so tables when I step up replication from my primary OpenVMS based application; some of these will also have extremely high usage - again depending on the size of the customer's main system. In order to balance disk load and ensure faster data access, my current SQL Server setup has the data spread across 3 physical disk devices. One question I can't find the answer to in the documentation I've been reading is whether Postgres has any similar data distribution abilities. I.e. can I create a data file on D drive which holds tables a, b and e, and a data file on E drive which holds tables c, d and f? If this is possible, could someone point me to some documentation so I can experiment a little. If not possible, I guess I'll have to upgrade to some faster hardware... if they'll be willing to give me money for that ;) BTW, I'm using 8.2 on M$ Weenblows (Yes I know weenblows sucks, but I don't have sufficient unix/linux/other platform Postgres runs on experience to run the db on another server. I'll wait for someone to port it to OpenVMS ;) and then use it on that (I know, I'm dreaming) - OpenVMS makes unixish systems look like they have the reliability of weenblows, I'll tell you that much :D) Cheers. -- Paul Lambert Database Administrator AutoLedgers ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Load balancing across disks
Paul Lambert [EMAIL PROTECTED] writes:
> I.e. can I create a data file on D drive which holds tables a, b and e, and a data file on E drive which holds tables c, d and f. If this is possible, could someone point me to some documentation so I can experiment a little.

Read the doc section on tablespaces.

-Doug
Re: [GENERAL] MULE_INTERNAL translation to win1250
I did some further investigation. I found and identified the exact row in the database, and the exact column, that causes the problem. I am able to select the column into a test table, and in the test table it retains its bad behavior. Fortunately, this row does not contain vital data, so I can drop it without much of a problem. But I would like to know why I am able to identify the single character that causes the problem, both in the real data and in the test table (or rather a character combination, found using the substring function: it seems that at a certain point two characters are taken as a single 16-bit one), yet I am not able to reproduce this behavior on a fresh table using INSERT and SELECT statements.

Please give me some tips on where to search and what other information to provide. Thank you.

----- Original Message -----
From: Tom Lane [EMAIL PROTECTED]
To: NTPT [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Sent: Monday, January 29, 2007 12:33 AM
Subject: Re: [GENERAL] MULE_INTERNAL translation to win1250

NTPT [EMAIL PROTECTED] writes:
> Without set client_encoding to win1250 query works. I am curious why there is a MULE_INTERNAL mentioned even when \l+ say that corresponding database is created with (and even all the cluster) LATIN2 encoding.

The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to reduce duplication of code. It shouldn't make any difference to the end result though. Are you sure that the characters you're using are supposed to have representations in both character sets?

> May be a bug in charset translation routines of postgres ?

If you think that, you need to provide us with the exact codes that are being mistranslated and what you think they should translate to.

regards, tom lane
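For reference, this is roughly how I have been narrowing it down (the table and column names here are placeholders for my real ones, and the positions are just an example):

  SET client_encoding TO win1250;
  SELECT substring(t FROM 1 FOR 20) FROM testtable;   -- errors only when the window covers the bad position; halve the window to home in
  SET client_encoding TO latin2;                      -- matches the server encoding, so no conversion runs
  SELECT ascii(substring(t FROM 17 FOR 1)) FROM testtable;  -- numeric code of the suspect character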
Re: [GENERAL] Load balancing across disks
Paul Lambert wrote:
> In order to balance disk load and ensure faster data access, my current SQL Server setup has the data spread across 3 physical disk devices. [...] I.e. can I create a data file on the D drive which holds tables a, b and e, and a data file on the E drive which holds tables c, d and f?

You are looking for tablespaces :).

> If this is possible, could someone point me to some documentation so I can experiment a little?

http://www.postgresql.org/docs/8.0/interactive/sql-createtablespace.html

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
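To give you the flavour of it, something along these lines should get you started (the paths and table definitions are made up; each tablespace must point at an existing empty directory on the drive you want):

  CREATE TABLESPACE disk_d LOCATION 'D:/pgdata';
  CREATE TABLESPACE disk_e LOCATION 'E:/pgdata';
  CREATE TABLE a (id integer) TABLESPACE disk_d;   -- create a table directly in a tablespace
  ALTER TABLE c SET TABLESPACE disk_e;             -- or move an existing table over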
Re: [GENERAL] MULE_INTERNAL translation to win1250
On Sun, Jan 28, 2007 at 07:27:12PM -0700, Michael Fuhr wrote:
> I wonder if the OP is doing something like this:
> [...]
> test=> INSERT INTO test VALUES (E'\202\232');  -- \202=0x82, \232=0x9a

Another possibility, perhaps more likely, is that some connection didn't set client_encoding to win1250 before it inserted win1250-encoded data; in that case the data was probably treated as LATIN2 and stored without conversion. When a connection with client_encoding set to win1250 tries to fetch the data, conversion is attempted and fails, because some LATIN2 values don't have win1250 mappings.

--
Michael Fuhr
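That scenario is easy to reproduce (a sketch, reusing the test table from the earlier example; 0x9a stands in for any win1250 byte in the 0x80-0x9f range):

  test=> SHOW client_encoding;               -- suppose it was left at latin2, matching the database
  test=> INSERT INTO test VALUES (E'\232');  -- 0x9a is 's with caron' in win1250, but is stored verbatim as a latin2 control character
  test=> SET client_encoding TO win1250;
  test=> SELECT * FROM test;                 -- fails: the stored control character has no win1250 equivalent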
[GENERAL] convert(USING utf8_to_iso_8859_15) on Windows
Is this a documented phenomenon with the convert function? The first result is what's expected:

  SELECT convert('Gregoire' USING utf8_to_iso_8859_15);
   Gregoire

But I don't understand the next result, when I put an acute accent over the first e:

  SELECT convert('Grégoire' USING utf8_to_iso_8859_15);

(The output is an empty string.) Likewise, whenever I enter a string containing non-ASCII characters, the convert function outputs an empty string. I get the same results when I change the target from ISO-8859-15 to any other encoding which accepts those non-ASCII characters... (When I try a conversion to an encoding that doesn't accept the characters, I get an error message, and that's normal.)

My setup is as follows: PostgreSQL 8.2.1 on Windows XP. The database has UTF8 encoding, and SHOW lc_ctype; gives French_Canada.1252.

Is my problem related to Windows' lack of UTF-8 locale support? I thought those problems were solved with version 8.2...
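One more data point: I tried to check that my literal really reaches the server as UTF-8 bytes (I am only guessing that octet_length is the right probe for this):

  SELECT length('Grégoire') AS chars, octet_length('Grégoire') AS bytes;
  -- in a UTF8 database I expect chars = 8 and bytes = 9 (the é takes two bytes);
  -- anything else would point at an encoding mix-up between client and server

Does that tell anyone anything about where the empty string comes from?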