Re: [GENERAL] Speed difference between != and = operators?
Paul Caskey wrote:
> This query takes 206 seconds:
>
>   SELECT t1.blah, t1.foo, t2.id
>   FROM t1, t2, t3
>   WHERE t1.SessionId = 427
>     AND t1.CatalogId = 22
>     AND t1.CatalogId = t3.CatalogId
>     AND t2.id = t3.SomeId
>     AND t2.Active != 0
>
> If I change the last line to this, it takes 1 second:
>
>   AND t2.Active = 1
>
> The "Active" field is 0 or 1. The query returns the same rows either way
> (about 1700 rows). There is an index on the Active field. These two
> queries both take 1 second:
>
>   SELECT * FROM t2 WHERE Active = 1;
>   SELECT * FROM t2 WHERE Active != 0;
>
> Any ideas? Possible bug?

My coworker adds: interesting note... when you change the operator the way you said, you get a completely different query plan.

Original (Active != 0):

  Nested Loop (cost=0.00..143.49 rows=1 width=20)
    -> Nested Loop (cost=0.00..141.46 rows=1 width=12)
      -> Seq Scan on t2 (cost=0.00..48.11 rows=1 width=4)
      -> Seq Scan on t3 (cost=0.00..71.50 rows=1748 width=8)
    -> Index Scan using idx1 on t3 (cost=0.00..2.02 rows=1 width=8)

Altered (Active = 1):

  Nested Loop (cost=5.06..272.65 rows=62 width=20)
    -> Hash Join (cost=5.06..146.49 rows=62 width=16)
      -> Seq Scan on t3 (cost=0.00..71.49 rows=1748 width=8)
      -> Hash (cost=5.05..5.05 rows=4 width=8)
        -> Index Scan using idx1 on t1 (cost=0.00..5.05 rows=4 width=8)
    -> Index Scan using t2_pkey on t2 (cost=0.00..2.02 rows=1 width=4)

What's odd is that the maximum cost is low in both cases, but is even lower in the query that runs 200 times slower.

--
Paul Caskey  [EMAIL PROTECTED]
Software Engineer
New Mexico Software
5041 Indian School NE
Albuquerque, NM 87110
--
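The two plans differ because the planner treats the two predicates very differently: `!=` is not an indexable operator, and the plan above shows it is also estimated to match only 1 row (versus 62 for `=`), which steers the optimizer into the slow nested loop. A quick way to see this for yourself is to compare the plans directly (a sketch; table and column names are taken from the post above, and the row estimates shown are from that post's plans):

```sql
-- Compare the optimizer's choice for each form of the predicate.
EXPLAIN SELECT t1.blah, t1.foo, t2.id
FROM t1, t2, t3
WHERE t1.SessionId = 427
  AND t1.CatalogId = 22
  AND t1.CatalogId = t3.CatalogId
  AND t2.id = t3.SomeId
  AND t2.Active != 0;   -- non-indexable qual; estimated rows=1 above

EXPLAIN SELECT t1.blah, t1.foo, t2.id
FROM t1, t2, t3
WHERE t1.SessionId = 427
  AND t1.CatalogId = 22
  AND t1.CatalogId = t3.CatalogId
  AND t2.id = t3.SomeId
  AND t2.Active = 1;    -- indexable qual; estimated rows=62 above
```

Since Active is only ever 0 or 1, rewriting `Active != 0` as `Active = 1` (or `Active > 0`) hands the planner a qualifier it can both index and estimate sensibly.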
Re: [GENERAL] 4 billion record limit?
Thomas Lockhart wrote:
>> FWIW, I checked into MySQL, and as far as I can tell, they have nothing
>> like this implicit 4 billion transactional "limit". So maybe competitive
>> spirit will drive the postgres hackers to fix this problem sooner than
>> later. ;-)
> We have *never* had a report of someone pushing this 4GB limit, and
> theoretical problems usually go into the long-term development plan, not
> in the "OHMYGODITSBROKEN" list.

That's absolutely true, and I'm sorry for pushing your MySQL button. I'm not in some sort of panic to get this fixed. I just raised this issue to (1) see if I was the only one who noticed/cared and (2) learn some details to see if/what limits were really there.

Tom Lane wrote:
> Hmph. Has anyone ever come close to shoving 4G rows into MySQL? Maybe
> they just haven't documented their limits...

The gauntlet has been dropped! Looks like I have my work cut out for me this weekend. :-)

Again, just to clarify: I'm not concerned about a database holding 4 billion records at once. I'm concerned about performing 4,000,000,000 inserts and 3,999,999,000 deletes over the course of a few years. I just ran the numbers, and that's about 50 transactions/second on an international web site active 24 hours/day for 3 years. Put 5 such sites on one server and you're down to 10 trans/sec per site. Ramp up to 30 trans/sec/site and you're down to 1 year. Juggle these numbers however you want; it's not that far-fetched.

Sure, it has a lot to do with the application. Most databases are read much more than they're written, and of course a SELECT doesn't chew up an OID. But every INSERT does. Any application that updates information frequently can be vulnerable to this problem. With my table structure, it happens to be much easier to DELETE and INSERT inside a transaction than to UPDATE. And every "row" of customer-supplied data could turn into 100 INSERTs, so I'm accelerating toward that 4G limit 100 times faster.

Bottom line: I can recycle my own IDs if I have to. But I need to know whether I should change my table structure and/or code to conserve OIDs, an internal system identifier particular to the RDBMS I chose. Considering the main reason I chose Postgres was for scalability, this seems a little ironic.

--
Paul Caskey  [EMAIL PROTECTED]
Software Engineer
New Mexico Software
5041 Indian School NE
Albuquerque, NM 87110
--
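The back-of-the-envelope arithmetic above can be checked directly (a sketch; the 50 inserts/second figure is the post's own estimate, and each INSERT is assumed to consume exactly one OID):

```sql
-- Years until 4 billion OIDs are consumed at a sustained insert rate:
-- 50 inserts/sec * 86400 sec/day * 365 days/year ~= 1.58 billion OIDs/year.
SELECT 4000000000.0 / (50 * 86400 * 365) AS years_to_wraparound;
-- roughly 2.5, consistent with the "about 3 years" figure above
```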
Re: [GENERAL] 4 billion record limit?
Tom Lane wrote:
> Paul Caskey [EMAIL PROTECTED] writes:
>>> No doubt about it, you're likely to get a few "duplicate key" errors
>>> and stuff like that. I'm just observing that it's not likely to be a
>>> complete catastrophe, especially not if you don't rely on OIDs to be
>>> unique in your user tables.
>> I don't rely on OID uniqueness, but I assumed Postgres does!
> Only in the system tables, and not even in all of them. From the
> system's point of view, there's no real need to assign OIDs to user
> table rows at all --- so another possible answer is not to do that,
> unless the user requests it.

That's interesting (almost shocking) to know, but it's probably too late to put the lid back on that bottle. I imagine a lot of software has been written especially for Postgres, like pgaccess, which uses the oid field along with cursors, etc. A lot of people have probably also relied on the oid as a convenient unique record identifier. I know I was tempted: if the system is doing it anyway, why maintain my own with a serial or sequence? So if we make the OID optional in user tables, it needs to be left in by default and only *omitted* with a compile-time option. Otherwise we'll break many more things than we solve with this Y2K-like problem.

Dave Burbidge wrote:
> Ummm ... I'm a newbie to this list, but hasn't this evolved into a
> hacker issue?

Yes, we should move any further detailed discussion over there. I asked this question initially in the hackers list and got no response, so I came over here. But we did have a related discussion on 64-bit sequences in the hackers list.

FWIW, I checked into MySQL, and as far as I can tell, they have nothing like this implicit 4 billion transactional "limit". So maybe competitive spirit will drive the postgres hackers to fix this problem sooner than later. ;-)

If nothing else, this needs to be documented. Whether by design or not, people need to know there is a hard limit of 4 billion records on a server, and a strange fuzzy sorta-limit of 4 billion inserts on a server, regardless of whether they use a 64-bit compiler.

--
Paul Caskey  [EMAIL PROTECTED]
Software Engineer
New Mexico Software
5041 Indian School NE
Albuquerque, NM 87110
--
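One rough way to see how far along a server's OID counter already is would be to read the hidden oid column of a freshly inserted row (a sketch only; `oid_probe` is a hypothetical throwaway table, and this assumes the version at hand assigns OIDs to user rows and exposes the hidden oid column, as the 6.x/7.x servers under discussion do):

```sql
-- Create a throwaway table, insert one row, and read its hidden oid
-- to see roughly where the global OID counter currently stands.
CREATE TABLE oid_probe (x int4);
INSERT INTO oid_probe VALUES (1);
SELECT oid FROM oid_probe;  -- an OID near the counter's current value
DROP TABLE oid_probe;
```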
Re: [GENERAL] 4 billion record limit?
Tom Lane wrote:
> Paul Caskey [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> It's only a hard limit if your application assumes OIDs are unique.
>>> If you don't assume that, then I think it's not a big problem. It's
>>> possible (though obviously not especially likely) that you might get
>>> OID collisions in the system tables after an OID-counter wraparound.
>> This implies they do wrap around. So they are reused? Chris said no,
>> but you're saying yes. (Maybe they wrap around "by accident", by
>> adding one to MAXINT, which will give zero on an unsigned int, I
>> believe. Will the system choke on zero? Has anyone tested this
>> wraparound?)
> Yes, yes, and yes ;-).

Good answer. :-) Thanks for looking into it.

>> Even if they do wrap, if I have some old records lying around with low
>> OIDs, they will trip me up.
> No doubt about it, you're likely to get a few "duplicate key" errors
> and stuff like that. I'm just observing that it's not likely to be a
> complete catastrophe, especially not if you don't rely on OIDs to be
> unique in your user tables.

I don't rely on OID uniqueness, but I assumed Postgres does! I don't use the OID for anything; I maintain my own "id" field with a sequence for every table. I can catch a "duplicate key" error in my code, but I'll treat it as if my own "id" caused the problem. It will totally confuse me and my code if there's a "duplicate key" error related to a hidden system field I never create or examine. I won't know if/how to re-insert with a different key to solve the problem.

> We have talked about offering 8-byte OIDs as a compile-time option, and
> I think it'll happen eventually, but I'm not holding my breath. Lots of
> other stuff seems more pressing...

Fair enough, and thanks for all your work so far.

--
Paul Caskey  [EMAIL PROTECTED]  505-255-1999
New Mexico Software
5041 Indian School NE
Albuquerque, NM 87110
--
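The "maintain my own id with a sequence" approach mentioned above can be sketched like this (the table and column names are illustrative; SERIAL is shorthand for an int4 column defaulting to NEXTVAL on an implicitly created sequence):

```sql
-- An application-managed unique id, independent of the hidden oid column.
CREATE TABLE customer (
    id   SERIAL PRIMARY KEY,  -- values come from an implicit sequence
    name text
);

INSERT INTO customer (name) VALUES ('example');
SELECT id, name FROM customer;  -- key is the sequence value, not an OID
```

A duplicate-key error on `id` is then something the application can recognize and handle, unlike a collision on a hidden system column.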
[GENERAL] Bigger sequences (int8)
Can I make a sequence use an int8 instead of int4? I have an application where, over a few years, it's quite possible to hit the ~2 billion limit. (~4 billion if I start the sequence at -2 billion.) There won't be that many records in the table, but there will be that many inserts; in other words, there will be many deletes as well. If I CYCLE, old records could still be lingering, and I'd have the overhead of checking every NEXTVAL to make sure it's not already in use. :-(

Any other ideas? I could use two int4's together as a primary key and do some crazy math to increment, or jump through other hoops (besides CYCLE) to intelligently reuse keys ... but then I have some ugly overhead as well. I really want the sequence to just be an int8.

Are we talking about a heinous amount of work in the source code to allow this option? I wouldn't want to mess with the "CREATE SEQUENCE" syntax; it would seem more appropriate as a compile-time option.

Paul Caskey
Software Engineer
New Mexico Software
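The "two int4's with crazy math" idea can be sketched as a single-row counter table whose low word carries into a high word as it approaches overflow (a sketch only; `big_counter` and the 2-billion carry threshold are illustrative, and unlike a real sequence this UPDATE serializes concurrent writers on the row):

```sql
-- A single-row hi/lo counter standing in for a 64-bit sequence.
CREATE TABLE big_counter (hi int4 NOT NULL, lo int4 NOT NULL);
INSERT INTO big_counter VALUES (0, 0);

-- Take the next value: bump lo, carrying into hi just before overflow.
-- Both SET expressions see the pre-update value of lo, so the carry and
-- the reset stay consistent.
UPDATE big_counter SET
    hi = CASE WHEN lo >= 2000000000 THEN hi + 1 ELSE hi END,
    lo = CASE WHEN lo >= 2000000000 THEN 0      ELSE lo + 1 END;

SELECT hi, lo FROM big_counter;  -- the pair (hi, lo) forms the key
```

The (hi, lo) pair then serves as a composite primary key, at the cost of the extra UPDATE round-trip per insert that the post calls "ugly overhead".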