Re: [GENERAL] Speed difference between != and = operators?

2000-07-31 Thread Paul Caskey

Paul Caskey wrote:
 
 This query takes 206 seconds:
 
 SELECT t1.blah, t1.foo, t2.id
 FROM t1, t2, t3
 WHERE t1.SessionId = 427
 AND t1.CatalogId = 22
 AND t1.CatalogId = t3.CatalogId
 AND t2.id = t3.SomeId
 AND t2.Active != 0
 
 If I change the last line to this, it takes 1 second:
 
 AND t2.Active = 1
 
 The "Active" field is 0 or 1.  The query returns the same rows, either way
 (about 1700 rows).  There is an index on the Active field.
 
 These two queries both take 1 second:
 
 SELECT * FROM t2 WHERE Active = 1;
 SELECT * FROM t2 WHERE Active != 0;
 
 Any ideas?  Possible bug?


My coworker adds:

Interesting note... when you change the operator the way you said, you get
a completely different query plan:

Original: (Active != 0)

Nested Loop  (cost=0.00..143.49 rows=1 width=20)
  ->  Nested Loop  (cost=0.00..141.46 rows=1 width=12)
        ->  Seq Scan on t2  (cost=0.00..48.11 rows=1 width=4)
        ->  Seq Scan on t3  (cost=0.00..71.50 rows=1748 width=8)
  ->  Index Scan using idx1 on t3  (cost=0.00..2.02 rows=1 width=8)


Altered: (Active = 1)

Nested Loop  (cost=5.06..272.65 rows=62 width=20)
  ->  Hash Join  (cost=5.06..146.49 rows=62 width=16)
        ->  Seq Scan on t3  (cost=0.00..71.49 rows=1748 width=8)
        ->  Hash  (cost=5.05..5.05 rows=4 width=8)
              ->  Index Scan using idx1 on t1  (cost=0.00..5.05 rows=4 width=8)
  ->  Index Scan using t2_pkey on t2  (cost=0.00..2.02 rows=1 width=4)


What's odd is that the maximum cost is low in both cases, but is even lower
for the query that runs 200 times slower.
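
For reference, the plans above can be reproduced with EXPLAIN (a sketch,
reusing the table and column names from the query quoted at the top):

  EXPLAIN
  SELECT t1.blah, t1.foo, t2.id
  FROM t1, t2, t3
  WHERE t1.SessionId = 427
    AND t1.CatalogId = 22
    AND t1.CatalogId = t3.CatalogId
    AND t2.id = t3.SomeId
    AND t2.Active != 0;   -- change this to "t2.Active = 1" to get the second plan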

-- 
Paul Caskey [EMAIL PROTECTED]   Software Engineer
New Mexico Software 5041 Indian School NE   Albuquerque, NM 87110
--



Re: [GENERAL] 4 billion record limit?

2000-07-28 Thread Paul Caskey

Thomas Lockhart wrote:
 
  FWIW, I checked into MySQL, and as far as I can tell, they have nothing
  like this implicit 4 billion transactional "limit".  So maybe competitive
  spirit will drive the postgres hackers to fix this problem sooner than
  later.  ;-)
 
 We have *never* had a report of someone pushing this 4GB limit, and
 theoretical problems usually go into the long-term development plan, not
 in the "OHMYGODITSBROKEN" list.

That's absolutely true, and I'm sorry for pushing your MySQL button.  I'm
not in some sort of panic to get this fixed.  I just raised this issue to
(1) see if I was the only one who noticed/cared and (2) learn some details
to see if/what limits were really there.

 Tom Lane wrote:

 Hmph.  Has anyone ever come close to shoving 4G rows into MySQL?  Maybe
 they just haven't documented their limits...

The gauntlet has been dropped!  Looks like I have my work cut out for me
this weekend.  :-)

Again, just to clarify, I'm not concerned about a database holding 4
billion records at once.  I'm concerned about performing 4,000,000,000
inserts and 3,999,999,000 deletes over the course of a few years.  I just
ran the numbers and that's about 50 transactions/second on an
international web site active 24 hours/day for 3 years.  Put 5 such sites
on one server and you're down to 10 trans/sec per site.  Ramp up to 30
trans/sec/site and you're down to 1 year.  Juggle these numbers however
you want; it's not that far-fetched.
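
Spelling that back-of-the-envelope figure out (just the arithmetic; the int8
cast is only there to keep the multiplication from overflowing int4):

  SELECT 50::int8 * 86400 * 365 * 3;
  -- 50 inserts/sec * 86400 sec/day * 365 days/year * 3 years = 4,730,400,000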

Sure, it has a lot to do with the application.  Most databases are read
much more than they're written, and of course a SELECT doesn't chew up an
OID.  But every INSERT does.  Any application that updates information
frequently can be vulnerable to this problem.  With my table structure, it
happens to be much easier to DELETE and INSERT inside a transaction than
to UPDATE.  And every "row" of customer-supplied data could turn into 100
INSERTs, so I'm accelerating toward that 4G limit 100 times faster.
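
The pattern I'm describing looks roughly like this (table and column names
are made up purely for illustration); every pass through it burns one OID
per INSERT:

  BEGIN;
  DELETE FROM item_attr WHERE customer_row_id = 42;
  INSERT INTO item_attr (customer_row_id, name, value) VALUES (42, 'color', 'red');
  INSERT INTO item_attr (customer_row_id, name, value) VALUES (42, 'size', 'XL');
  -- ... up to ~100 INSERTs for one customer-supplied row ...
  COMMIT;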

Bottom line: I can recycle my own IDs if I have to.  But I need to know
whether I should change my table structure and/or code to conserve OIDs,
an internal system identifier particular to the RDBMS I chose. 
Considering the main reason I chose Postgres was for scalability, this
seems a little ironic.

-- 
Paul Caskey [EMAIL PROTECTED]   Software Engineer
New Mexico Software 5041 Indian School NE   Albuquerque, NM 87110
--



Re: [GENERAL] 4 billion record limit?

2000-07-27 Thread Paul Caskey

Tom Lane wrote:
 
 Paul Caskey [EMAIL PROTECTED] writes:

  No doubt about it, you're likely to get a few "duplicate key" errors and
  stuff like that.  I'm just observing that it's not likely to be a
  complete catastrophe, especially not if you don't rely on OIDs to be
  unique in your user tables.
 
  I don't rely on OID uniqueness, but I assumed Postgres does!
 
 Only in the system tables, and not even in all of them.  From the
 system's point of view, there's no real need to assign OIDs to
 user table rows at all --- so another possible answer is not to
 do that, unless the user requests it.

That's interesting (almost shocking) to know, but it's probably too late
to put the lid back on that bottle.  I imagine a lot of software has been
written especially for postgres, like pgaccess, which uses the oid field
along with cursors, etc.  A lot of people have probably also relied on the
oid as a convenient unique record identifier.  I know I was tempted.  If
the system is doing it anyway, why maintain my own with a serial or
sequence?

So if we make the OID optional in user tables, they need to be left in by
default, and only *omitted* with a compile-time option.  Otherwise we'll
break many more things than we solve with this Y2K-like problem.
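
Just to sketch what a per-table opt-out might look like at the SQL level
(hypothetical syntax, purely to illustrate the idea; nothing the current
release accepts):

  CREATE TABLE log_entry (
      id    SERIAL PRIMARY KEY,   -- application-level key; no reliance on OIDs
      note  text
  ) WITHOUT OIDS;                 -- hypothetical opt-out clause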

Dave Burbidge wrote:

 Ummm ... I'm a newbie to this list, but hasn't this evolved into a hacker 
 issue?

Yes, we should move any further detailed discussion over there.  I asked
this question initially in the hacker list and got no response, so I came
over here.  But we did have a related discussion on 64-bit sequences in
the hacker list.

FWIW, I checked into MySQL, and as far as I can tell, they have nothing
like this implicit 4 billion transactional "limit".  So maybe competitive
spirit will drive the postgres hackers to fix this problem sooner than
later.  ;-)

If nothing else, this needs to be documented.  Whether by design or not,
people need to know there is a hard limit of 4 billion records on a server,
and a strange fuzzy sorta-limit of 4 billion inserts on a server, regardless
of whether they use a 64-bit compiler.

-- 
Paul Caskey [EMAIL PROTECTED]   Software Engineer
New Mexico Software 5041 Indian School NE   Albuquerque, NM 87110
--



Re: [GENERAL] 4 billion record limit?

2000-07-26 Thread Paul Caskey

Tom Lane wrote:
 
 Paul Caskey [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  It's only a hard limit if your application assumes OIDs are unique.
  If you don't assume that, then I think it's not a big problem.
 
  It's possible (though obviously not especially likely) that you might
  get OID collisions in the system tables after an OID-counter wraparound.
 
  This implies they do wrap around.  So they are reused?  Chris said no, but
  you're saying yes.
 
  (Maybe they wrap around "by accident", by adding one to MAXINT, which will
  give zero on an unsigned int, I believe.  Will the system choke on zero?
  Has anyone tested this wraparound?)
 
 Yes, yes, and yes ;-).  

Good answer.  :-)  Thanks for looking into it.

  Even if they do wrap, if I have some old records lying around with a low
  OIDs, they will trip me up.
 
 No doubt about it, you're likely to get a few "duplicate key" errors and
 stuff like that.  I'm just observing that it's not likely to be a
 complete catastrophe, especially not if you don't rely on OIDs to be
 unique in your user tables.

I don't rely on OID uniqueness, but I assumed Postgres does!

I don't use the OID for anything; I maintain my own "id" field with a
sequence for every table.  I can catch a "duplicate key" error in my code,
but I'll treat it as if my own "id" caused a problem.  It will totally
confuse me and my code if there's a "duplicate key" error related to a
hidden system field I never create or examine.  I won't know if/how to
re-insert with a different key to solve the problem.
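
Concretely, what I mean by maintaining my own "id" (a hypothetical table,
just for illustration):

  CREATE TABLE customer (
      id    SERIAL PRIMARY KEY,   -- my own key; SERIAL fills it from customer_id_seq
      name  text
  );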

 We have talked about offering 8-byte OIDs as a compile-time option,
 and I think it'll happen eventually, but I'm not holding my breath.
 Lots of other stuff seems more pressing...

Fair enough, and thanks for all your work so far.

-- 
Paul Caskey [EMAIL PROTECTED]   505-255-1999
New Mexico Software 5041 Indian School NE   Albuquerque, NM 87110
--



[GENERAL] Bigger sequences (int8)

2000-06-20 Thread Paul Caskey

Can I make a sequence use an int8 instead of int4?  

I have an application where, over a few years, it's quite possible to hit
the ~2 billion limit.  (~4 billion if I start the sequence at -2
billion.)  

There won't be that many records in the table, but there will be that many
inserts.  In other words, there will be many deletes, as well.

If I CYCLE, old records could still be lingering, and I have the overhead
of checking every NEXTVAL to make sure it's not already being used.  :-(
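
For reference, the CYCLE variant I'm talking about (the sequence name is a
placeholder):

  CREATE SEQUENCE my_id_seq MAXVALUE 2147483647 CYCLE;
  -- after wraparound, nextval('my_id_seq') can hand back a value that is still
  -- in the table, hence the extra "does this id already exist?" check on every INSERT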

Any other ideas?  I could use two int4's together as primary key, and do
some crazy math to increment, or jump through other hoops (besides CYCLE)
to intelligently reuse keys ... but then I have some ugly overhead, as
well.
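
The two-int4 idea would look roughly like this (a made-up table, just a
sketch): draw the low word from a cycling sequence and bump the high word by
hand each time it wraps.

  CREATE SEQUENCE id_lo_seq MAXVALUE 2147483647 CYCLE;
  CREATE TABLE event (
      id_hi  int4 NOT NULL,                             -- bumped manually each time id_lo_seq wraps
      id_lo  int4 NOT NULL DEFAULT nextval('id_lo_seq'),
      PRIMARY KEY (id_hi, id_lo)
  );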

I really want the sequence to just be an int8.  Are we talking about a
heinous amount of work in the source code to allow this option?  I
wouldn't want to mess with the "CREATE SEQUENCE" syntax; it would seem
more appropriate as a compile-time option.


Paul Caskey
Software Engineer
New Mexico Software