Re: [PERFORM] not exists clause

2008-01-10 Thread Guy Rouillier

S Golly wrote:

I cannot get the "not exists" clause of ANSI SQL to execute correctly.
select t.col11, t.col1... from table1 t where not exists (select 1 from 
table2 where col2 = t.col1);

table1 has 40M + rows. if that matters.
 
OS is FreeBSD 6.2, postgresql version 8.2.6
 
Is it not supported or a bug ?

thank you for your support.


This is really not a performance question, but a general SQL question.

select * from t1

f1
--
1
2
3

select * from t2

f1
--
1
2

select * from t1
where not exists
(
select 1
from t2
where t2.f1 = t1.f1
)

f1
--
3
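
For reference, here is the same example as a self-contained script (using the
t1/t2/f1 names from above) that should run as-is in psql on 8.2 or later (the
multi-row insert syntax needs 8.2):

create table t1 (f1 integer);
create table t2 (f1 integer);

insert into t1 values (1), (2), (3);
insert into t2 values (1), (2);

-- rows of t1 with no matching row in t2
select t1.f1
from t1
where not exists (select 1 from t2 where t2.f1 = t1.f1);
-- returns the single row: 3

If that runs for you (and it should on 8.2.6), then NOT EXISTS itself is fine
and the problem is in your actual query or data, so post those along with the
error message you get.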

--
Guy Rouillier



Re: [PERFORM] Best way to index IP data?

2008-01-10 Thread Jonah H. Harris
On Jan 10, 2008 6:25 PM, Steve Atkins <[EMAIL PROTECTED]> wrote:
> http://pgfoundry.org/projects/ip4r/
>
> That has the advantage over using integers, or the built-in inet type,
> of being indexable for range and overlap queries.

Agreed.  ip4r is da bomb.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/



Re: [PERFORM] Best way to index IP data?

2008-01-10 Thread Steve Atkins


On Jan 10, 2008, at 3:14 PM, Kevin Kempter wrote:


> Hi List;
>
> We'll be loading a table with beginning & ending IPs - the table will
> likely have upwards of 30 million rows.  Any thoughts on how to get the
> best performance out of queries that want to look for IP ranges or the
> use of between queries?  Should these be modeled as integers?


http://pgfoundry.org/projects/ip4r/

That has the advantage over using integers, or the built-in inet type,
of being indexable for range and overlap queries.
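
Roughly, usage looks something like the sketch below; the table and values are
made up, and the operator names are from my memory of the ip4r docs, so check
them against the project page before relying on them.

create table ip_ranges (
    id        serial primary key,
    ip_range  ip4r not null     -- begin and end address stored as one range value
);

-- the GiST index is what makes range/overlap lookups indexable
create index ip_ranges_range_idx on ip_ranges using gist (ip_range);

-- ranges can be written as CIDR blocks or explicit begin-end pairs
insert into ip_ranges (ip_range)
values ('10.0.0.0/8'), ('192.168.1.10-192.168.1.99');

-- which ranges contain this address?
select * from ip_ranges where ip_range >>= '192.168.1.42'::ip4r;

-- which ranges overlap this block?
select * from ip_ranges where ip_range && '192.168.1.0/24'::ip4r;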

Cheers,
  Steve




Re: [PERFORM] Best way to index IP data?

2008-01-10 Thread Joshua D. Drake

On Thu, 10 Jan 2008 16:14:54 -0700
Kevin Kempter <[EMAIL PROTECTED]> wrote:

> Hi List;
> 
> We'll be loading a table with beginning & ending IPs - the table
> will likely have upwards of 30 million rows.  Any thoughts on how to
> get the best performance out of queries that want to look for IP
> ranges or the use of between queries?  Should these be modeled as
> integers?
> 

http://www.postgresql.org/docs/current/static/datatype-net-types.html
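
As a quick sketch of one way the built-in types could be used for the
begin/end columns (layout and names are just an example; plain btree indexes
only help the <=/>= comparisons, which is why ip4r keeps coming up in this
thread):

create table ip_blocks (
    id        serial primary key,
    ip_begin  inet not null,
    ip_end    inet not null
);

-- ordinary btree indexes cover <=, >= and BETWEEN comparisons on inet
create index ip_blocks_begin_idx on ip_blocks (ip_begin);
create index ip_blocks_end_idx   on ip_blocks (ip_end);

insert into ip_blocks (ip_begin, ip_end)
values ('10.0.0.0', '10.255.255.255'),
       ('192.168.1.10', '192.168.1.99');

-- which blocks contain this address?
select *
from ip_blocks
where ip_begin <= '192.168.1.42'::inet
  and ip_end   >= '192.168.1.42'::inet;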



--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





[PERFORM] Best way to index IP data?

2008-01-10 Thread Kevin Kempter
Hi List;

We'll be loading a table with beginning & ending IPs - the table will likely
have upwards of 30 million rows.  Any thoughts on how to get the best
performance out of queries that want to look for IP ranges or the use of
between queries?  Should these be modeled as integers?

Thanks in advance

/Kevin



Re: [PERFORM] not exists clause

2008-01-10 Thread Josh Berkus
Golly,

> I cannot get the "not exists" clause of ANSI SQL to execute correctly.
> select t.col11, t.col1... from table1 t where not exists (select 1 from
> table2 where col2 = t.col1);
> table1 has 40M+ rows, if that matters.
>
> OS is FreeBSD 6.2, PostgreSQL version 8.2.6

You'll have to post the actual query and error message.  WHERE NOT EXISTS 
has been supported since version 7.1.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



[PERFORM] not exists clause

2008-01-10 Thread S Golly
I cannot get the "not exists" clause of ANSI SQL to execute correctly.
select t.col11, t.col1... from table1 t where not exists (select 1 from
table2 where col2 = t.col1);
table1 has 40M+ rows, if that matters.

OS is FreeBSD 6.2, PostgreSQL version 8.2.6

Is it not supported, or is this a bug?
thank you for your support.


Re: [PERFORM] big database performance

2008-01-10 Thread Jared Mauch
On Thu, Jan 10, 2008 at 12:08:39PM +0100, Stephane Bailliez wrote:
> Jared Mauch wrote:
>>  I do large databases in Pg, like 300GB/day of new data.
>
> That's impressive.  Would it be possible to have details on your hardware,
> schema and configuration and type of usage?
>
> I'm sure there's something to learn in there for a lot of people (or at
> least for me).

http://archives.postgresql.org/pgsql-performance/2007-12/msg00372.php

http://archives.postgresql.org/pgsql-performance/2006-05/msg00444.php

The hardware specs are kinda boring since it's not
I/O bound, so you could get the same disk performance out of
some EIDE 7200 rpm disks (which I have done for testing).

The current setup is a 4x Opteron 8218 (dual core) w/ 16G RAM.
I have roughly 12TB of usable disk space on the system, connected via some
SATA <-> FC thing our systems folks got us.  The problem I have is that
linear CPU speed isn't enough, and there would be challenges splitting the
workload across multiple CPUs.  All my major reporting is done via
pg_dump, and I'm pondering what would happen if I just removed Pg
from the equation for the major reporting tasks entirely.  I may see
much better performance without the database [in my way].  I've not
done that because some types of data access would need to be significantly
redone, and I don't want to spend the time on that...

- Jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.



Re: [PERFORM] big database performance

2008-01-10 Thread Jared Mauch
On Thu, Jan 10, 2008 at 10:57:46AM +0200, Adrian Moisey wrote:
> What sort of information do you need from me ?

Ratio of read vs write operations (select vs insert/copy).

Average number of indices per table.

Average table size (ANALYZE VERBOSE on a table, if you want to get
into more detail).

What is the process doing (e.g. in top, is it just on the CPU or
is it blocking for I/O?).

I/O information, from iostat -d.  (You may need to build an iostat
binary for Linux; the source is out there, and I can give you a pointer if
you need it.)
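
Most of the read vs write numbers can be pulled from the stats views
with something like the query below (assuming row-level stats collection is
enabled; "reads" counts scans while "writes" counts rows modified, so treat
it only as a rough ratio):

-- rough per-table read vs write activity since the last stats reset
select relname,
       seq_scan + coalesce(idx_scan, 0)   as reads,   -- table + index scans
       n_tup_ins + n_tup_upd + n_tup_del  as writes   -- rows inserted/updated/deleted
from pg_stat_user_tables
order by writes desc
limit 20;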

>>  Is your performance problem with database reads?  Writes (insert/copy)?
>> How many indices do you have?
>
> I think the problem is related to load.  Everything is slow because there
> are way too many connections.  So everything is making everything else
> slow.  Not much detail, is it?
>
> We have 345 indices on the db.

If the tables are heavily indexed, this could easily slow down
insert performance.  When you take a large dataset and add a second
index, Postgres doesn't use threads to build the two indices on
different CPUs/cores in parallel.  This could account for some of your
performance difference.  If you're doing a lot of write operations
and fewer reads, perhaps the cost of an index isn't worth it: the CPU
time spent creating it vs the amount of time for a seq scan.
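
If you want to see which of those 345 indices are actually earning
their keep, something like the query below against pg_stat_user_indexes is a
reasonable starting point (the counters are cumulative since the last stats
reset, so judge them over a representative period):

-- indices that are rarely scanned but still cost space and write overhead
select schemaname, relname, indexrelname,
       idx_scan,                               -- index scans recorded so far
       pg_relation_size(indexrelid) as bytes   -- on-disk size of the index
from pg_stat_user_indexes
order by idx_scan asc, bytes desc
limit 20;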

- Jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.



Re: [PERFORM] Search for fixed set of keywords

2008-01-10 Thread Oleg Bartunov

On Thu, 10 Jan 2008, Jörg Kiegeland wrote:


>> Did you try integer arrays with GIN (inverted index)?
>
> I now tried this, and GIN turned out to be linear time, compared with GiST,
> which was acceptable.  However, I tested this only for Z=infinity; for
> Z=1000, neither GiST nor GIN is acceptable.


Sorry, I didn't follow your problem, but GIN should certainly be
logarithmic in the number of unique words.  Also, it would be much clearer
if you showed us your queries and their EXPLAIN ANALYZE output.
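
For example, something along these lines (table, column and keyword ids
are made up) is the kind of setup the EXPLAIN ANALYZE output would tell us
about, in particular whether the GIN index is used at all:

create table docs (
    id        serial primary key,
    keywords  integer[] not null
);

-- inverted index on the integer array (built-in GIN support in 8.2+)
create index docs_keywords_gin on docs using gin (keywords);

-- rows whose keyword array contains all of the given ids
explain analyze
select id from docs where keywords @> array[42, 7];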


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [PERFORM] big database performance

2008-01-10 Thread Stephane Bailliez

Jared Mauch wrote:

> I do large databases in Pg, like 300GB/day of new data.

That's impressive.  Would it be possible to have details on your
hardware, schema and configuration and type of usage?

I'm sure there's something to learn in there for a lot of people (or at
least for me).


Cheers,

-- stephane



Re: [PERFORM] Search for fixed set of keywords

2008-01-10 Thread Jörg Kiegeland

> Did you try integer arrays with GIN (inverted index)?

I now tried this, and GIN turned out to be linear time, compared with
GiST, which was acceptable.  However, I tested this only for
Z=infinity; for Z=1000, neither GiST nor GIN is acceptable.





Re: [PERFORM] big database performance

2008-01-10 Thread Adrian Moisey

Hi


> I do large databases in Pg, like 300GB/day of new data.  Need a lot
> more data on what you're having issues with.

That is big!

What sort of information do you need from me?

> Is your performance problem with database reads?  Writes (insert/copy)?
> How many indices do you have?

I think the problem is related to load.  Everything is slow because
there are way too many connections.  So everything is making everything
else slow.  Not much detail, is it?

We have 345 indices on the db.

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] big database performance

2008-01-10 Thread Adrian Moisey

Hi


>> Also, we're running the db on ext3 with noatime.  Should I look at
>> changing or getting rid of journaling?
>
> No (unless you like really long fsck times).  data=writeback is safe with
> PostgreSQL, though.

I tested that on a dev box, and I didn't notice a difference when using
pgbench.


--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
