[GENERAL] Data Warehousing

2007-09-03 Thread Rob Kirkbride
Hi,

I've got a postgres database collecting logged data. This data I have to keep
for at least 3 years. The data in the first instance is being recorded in a
postgres cluster. It then needs to be moved to a reports database server for
analysis. Therefore I'd like a job to dump data on the cluster, say, every
hour and record it in the reports database. The clustered database
could be purged of, say, data more than a week old.

So basically I need a dump/restore that only appends new data to the reports
server database.

I've googled but can't find anything, can anyone help?

Thanks

Rob


Re: [GENERAL] Data Warehousing

2007-09-03 Thread Scott Marlowe
On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've got a postgres database collecting logged data. This data I have to keep
> for at least 3 years. The data in the first instance is being recorded in a
> postgres cluster. It then needs to be moved to a reports database server for
> analysis. Therefore I'd like a job to dump data on the cluster, say, every
> hour and record it in the reports database. The clustered database
> could be purged of, say, data more than a week old.
>
> So basically I need a dump/restore that only appends new data to the reports
> server database.
>
> I've googled but can't find anything, can anyone help?

You might find an answer in partitioning your data.  There's a section
in the docs on it.  Then you can just dump the old data from the
newest couple of partitions if you're partitioning by week, and dump
anything older with a simple delete where date < now() - interval '1
week' or something like that.
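
(For reference, a minimal sketch of the 8.x inheritance-style partitioning
Scott describes; the "logs" table and "logged_time" column are illustrative
names, not from the original post:)

CREATE TABLE logs_2007w36 (
    CHECK (logged_time >= DATE '2007-09-03'
       AND logged_time <  DATE '2007-09-10')
) INHERITS (logs);

-- purging the live cluster then stays a simple, cheap operation:
DELETE FROM logs WHERE logged_time < now() - interval '1 week';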



Re: [GENERAL] Data Warehousing

2007-09-03 Thread Andrej Ricnik-Bay
On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:

> So basically I need a dump/restore that only appends new
> data to the reports server database.
I guess that will all depend on whether or not your
data has a record of the time it got stuck in the cluster ...
if there's no concept of a time-stamp attached
to the records as they get entered, I don't think it can be
done.


> Thanks
>
> Rob
Cheers,
Andrej



[GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Ashish Karalkar
Hello All,

I have a data script which runs fine from the PgAdmin SQL Editor, but when
I run it from the command prompt I get the following error:


test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR:  invalid byte sequence for encoding "UTF8": 0xff
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".


Can anybody suggest what is going wrong?
Database encoding: UTF8

PostgreSQL details:

 version
---
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3 
20041212 (Red Hat 3.4.3-9.EL4)



Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Martijn van Oosterhout
On Mon, Sep 03, 2007 at 01:36:58PM +0530, Ashish Karalkar wrote:
> Hello All,
> 
> I have a data script which runs fine from the PgAdmin SQL Editor, but when
> I run it from the command prompt I get the following error:
> test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql
> 
> psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: ERROR:  invalid byte sequence for encoding "UTF8": 0xff
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

Well, the error is correct; that's not a valid UTF-8 character. I seem
to remember someone saying that occasionally Windows puts BOMs in UTF-8
files (which is completely bogus). Check the file using a simple text
editor and see if there are some odd characters at the beginning of the
file.
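
(A quick way to check, as an illustrative command only: a UTF-8 BOM shows up
as the bytes ef bb bf at the very start of the file, a UTF-16 little-endian
BOM as ff fe:)

head -c 4 QSWEB_100_4_Default_Data.sql | od -t x1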

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




[GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Ow Mun Heng
Same query, executed twice, once with seqscan enabled and once with it
disabled. The difference is nearly night and day.


How can I persuade PG to use the index w/o resorting to setting
enable_seqscan = false?

(Actually, I don't know what the pros or cons are - I read posts from the
archives as far back as 2005, but those dealt with inconsistencies in
the text string, e.g. warwa(s/z/etc..), which caused it to pick seq scans.)

People on IRC suggested setting default_statistics_target = 100 but I didn't
find that useful or helpful. Also, the tables have been vacuumed and analysed.

Using Seq-scans
---


QUERY PLAN   
--
 Limit  (cost=4430.53..50173.70 rows=1000 width=47) (actual
time=21832.092..43771.536 rows=228 loops=1)
   ->  Hash Join  (cost=4430.53..260866.77 rows=5606 width=47) (actual
time=21832.088..43770.927 rows=228 loops=1)
 Hash Cond: ((trz.number)::text = (drv.number)::text)
 ->  Seq Scan on zone trz  (cost=0.00..233254.27 rows=6148222
width=39) (actual time=22.807..31891.591 rows=6181910 loops=1)
   Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
 ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual
time=171.911..171.911 rows=12591 loops=1)
   ->  Bitmap Heap Scan on drv  (cost=30.44..4414.39
rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
 Recheck Cond: ((code)::text = 'NRN15'::text)
 ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11
rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
   Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 43772.045 ms
(11 rows)


set enable_seqscan = false;

QUERY PLAN 
---
--
 Limit  (cost=0.00..69314.54 rows=1000 width=47) (actual
time=122.920..553.538 rows=228 loops=1)
   ->  Nested Loop  (cost=0.00..388646.63 rows=5607 width=47) (actual
time=122.915..552.956 rows=228 loops=1)
 ->  Index Scan using idx_drv on drv  (cost=0.00..5077.64
rows=1293 width=24) (actual time=38.164..110.933 rows=12591 loops=1)
   Index Cond: ((code)::text = 'NRN15'::text)
 ->  Index Scan using idx_trz_sn on zone trz  (cost=0.00..295.10
rows=120 width=39) (actual time=0.021..0.021 rows=0 loops=12591)
   Index Cond: ((drv.number)::text = (trz.number)::text)
   Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
 Total runtime: 553.964 ms
(8 rows)


Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Albe Laurenz
Ashish Karalkar wrote:
> I have a data script which runs fine from PgAdmin SQL 
> Editor,but when I  run this  from command prompt I get 
> following error:
>  
>  
> test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql
> 
> psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1: 
> ERROR:  invalid byte sequence for encoding "UTF8": 0xff
> HINT:  This error can also happen if the byte sequence does 
> not match the encoding expected by the server, which is
> controlled by "client_encoding".
>  
> can anybody suggest me what is going wrong.
> database  encoding :UTF8
>  
> PostgreSQL details:
>  
>  version
> --
>  PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc 
> (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)

Can you tell us the following:

- What is the client operating system (where you run psql and
  PgAdmin III)?
- What is the value of the environment variable PGCLIENTENCODING
  set to on the client?
- What does the SQL command "show client_encoding;" return
  when you issue it in
  a) PgAdmin III
  b) psql
- Please create a file that contains only the first line
  of QSWEB_100_4_Default_Data.sql (I call it "l" in the following
  commands), run the following two (Linux) commands on it:
  a) od -t c l
  b) od -t x1 l
  and show us the output of both commands.

Yours,
Laurenz Albe



Re: [GENERAL] Data Warehousing

2007-09-03 Thread Rob Kirkbride
On 03/09/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
>
> On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I've got a postgres database collecting logged data. This data I have to
> > keep for at least 3 years. The data in the first instance is being recorded
> > in a postgres cluster. It then needs to be moved to a reports database
> > server for analysis. Therefore I'd like a job to dump data on the cluster,
> > say, every hour and record it in the reports database. The clustered
> > database could be purged of, say, data more than a week old.
> >
> > So basically I need a dump/restore that only appends new data to the
> > reports server database.
> >
> > I've googled but can't find anything, can anyone help?
>
> You might find an answer in partitioning your data.  There's a section
> in the docs on it.  Then you can just dump the old data from the
> newest couple of partitions if you're partitioning by week, and dump
> anything older with a simple delete where date < now() - interval '1
> week' or something like that.



We're using Hibernate to write to the database. Partitioning looks like it
will be too much of a re-architecture. In reply to Andrej, we do have a
logged_time entity in the required tables. That being the case, how does that
help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?
Rob


Re: [GENERAL] Data Warehousing

2007-09-03 Thread Andrej Ricnik-Bay
On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:

> We're using Hibernate to write to the database. Partitioning looks like it
> will be too much of a re-architecture. In reply to Andrej, we do have a
> logged_time entity in the required tables. That being the case, how does that
> help me with the tools provided?
>
> Might I have to write a custom JDBC application to do the data migration?
That would be one option :}

If the server is on a Unix/Linux-platform you should be able
to achieve the result with a reasonably simple shell-script
and cron, I'd say.


> Rob
Cheers,
Andrej



-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm



Re: [GENERAL] Data Warehousing

2007-09-03 Thread Rob Kirkbride

Andrej Ricnik-Bay wrote:
> On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:
>> We're using Hibernate to write to the database. Partitioning looks like it
>> will be too much of a re-architecture. In reply to Andrej, we do have a
>> logged_time entity in the required tables. That being the case, how does that
>> help me with the tools provided?
>>
>> Might I have to write a custom JDBC application to do the data migration?
>
> That would be one option :}
>
> If the server is on a Unix/Linux-platform you should be able
> to achieve the result with a reasonably simple shell-script
> and cron, I'd say.

I am on a Linux platform but I'm going to need some pointers regarding 
the cron job. Are you suggesting that I parse the dump file? I assume I 
would need to switch to using inserts and then parse the dump looking 
for where I need to start from?







Re: [GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Gregory Stark
"Ow Mun Heng" <[EMAIL PROTECTED]> writes:

> Same query, executed twice, once with seqscan enabled and once with it
> disabled. The difference is nearly night and day.
>
>
> How can I persuade PG to use the index w/o resorting to setting
> enable_seqscan = false?

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1, perhaps 2 or even 1.5 or so. But don't do it based on a
single query under testing conditions; use a wide variety of queries under
production conditions.
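
(A sketch of that experiment; try it per-session before touching
postgresql.conf:)

SET random_page_cost = 1.5;  -- session-local, nothing is persisted
-- now re-run EXPLAIN ANALYZE on the query and compare the plans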

> QUERY PLAN   
> --
>  Limit  (cost=4430.53..50173.70 rows=1000 width=47) (actual 
> time=21832.092..43771.536 rows=228 loops=1)
>->  Hash Join  (cost=4430.53..260866.77 rows=5606 width=47) (actual 
> time=21832.088..43770.927 rows=228 loops=1)

The difference between the predicted and actual rows is suspicious. Let's look
lower down to see where it comes from.

>  Hash Cond: ((trz.number)::text = (drv.number)::text)
>  ->  Seq Scan on zone trz  (cost=0.00..233254.27 rows=6148222 
> width=39) (actual time=22.807..31891.591 rows=6181910 loops=1)
>Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))

This part looks ok; 6.15M versus 6.18M is pretty good.

>  ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual 
> time=171.911..171.911 rows=12591 loops=1)

Ah, this is off by an order of magnitude, that's bad.

>->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 rows=1291 
> width=24) (actual time=62.980..142.594 rows=12591 loops=1)
>  Recheck Cond: ((code)::text = 'NRN15'::text)
>  ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 
> rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
>Index Cond: ((code)::text = 'NRN15'::text)

So you might want to increase the statistics target for the "code" column.
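
(For reference, a sketch of raising the per-column target, using the table
and column names from the plan above:)

ALTER TABLE drv ALTER COLUMN code SET STATISTICS 100;
ANALYZE drv;  -- the new target takes effect at the next ANALYZE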

Incidentally the way this is written makes me wonder what data type "code" is
defined as.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Ashish Karalkar


- Original Message - 
From: "Ashish Karalkar" <[EMAIL PROTECTED]>

To: "Albe Laurenz" <[EMAIL PROTECTED]>
Sent: Monday, September 03, 2007 4:09 PM
Subject: Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff




- Original Message - 
From: "Albe Laurenz" <[EMAIL PROTECTED]>
To: "Ashish Karalkar *EXTERN*" <[EMAIL PROTECTED]>; 


Sent: Monday, September 03, 2007 2:12 PM
Subject: Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff


Ashish Karalkar wrote:

I have a data script which runs fine from PgAdmin SQL
Editor,but when I  run this  from command prompt I get
following error:


test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
ERROR:  invalid byte sequence for encoding "UTF8": 0xff
HINT:  This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".

can anybody suggest me what is going wrong.
database  encoding :UTF8

PostgreSQL details:

 version
--
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)


Can you tell us the following:

Please find my answers below

- What is the client operating system (where you run psql and
 PgAdmin III)?

It's Windows XP - PgAdmin III
RHEL 3.4.3-9.EL4-psql (Server Machine)


- What is the value of the environment variable PGCLIENTENCODING
 set to on the client?
PGCLIENTENCODING is not set, and as per the documentation I think by default
it takes the value of the database encoding, i.e. UTF8


- What does the SQL command "show client_encoding;" return
 when you issue it in
 a) PgAdmin III
UNICODE
 b) psql
UTF8

- Please create a file that contains only the first line
 of QSWEB_100_4_Default_Data.sql (I call it "l" in the following
 commands), run the following two (Linux) commands on it:
 a) od -t c l
 b) od -t x1 l
 and show us the output of both commands.


[EMAIL PROTECTED] qsweb]# od -t c test.sql
000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
020   O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020 4f 50
022
[EMAIL PROTECTED] qsweb]#

Thanks Albe for your reply.
Here is the data you wanted.








Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Albe Laurenz
Ashish Karalkar wrote:
>>> I have a data script which runs fine from PgAdmin SQL
>>> Editor,but when I  run this  from command prompt I get
>>> following error:
>>>
>>> test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql
>>>
>>> psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
>>> ERROR:  invalid byte sequence for encoding "UTF8": 0xff
>>>
>>>  version
>>> --
>>>  PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
>>> (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
>>
>> - What is the client operating system (where you run psql and
>>  PgAdmin III)?
>>
> It's Windows XP - PgAdmin III
> RHEL 3.4.3-9.EL4-psql (Server Machine)

So I take it that you run "psql" on Windows XP, right?

> PGCLIENTENCODING is not set, and as per the documentation I
> think by default it takes the value of the database encoding, i.e. UTF8
>
>> - What does the SQL command "show client_encoding;" return
>>  when you issue it in
>>  a) PgAdmin III
> UNICODE
>>  b) psql
> UTF8

Ok, I suspect that's your problem.
You created QSWEB_100_4_Default_Data.sql by using the "Save" dialog
in PgAdmin III on the Windows machine, right?

Then the file will probably be encoded in Windows-1252.

If your client_encoding is set to UTF8, psql will expect UTF-8
data in the SQL script and complain if it meets anything else.

Does the script work as expected when you change the client
encoding to WIN1252?
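
(In psql that would be something like the following sketch; either form
should do:)

test=# \encoding WIN1252
test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

-- or, equivalently, as an SQL command before the \i:
SET client_encoding TO 'WIN1252';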

>> - Please create a file that contains only the first line
>>  of QSWEB_100_4_Default_Data.sql (I call it "l" in the following
>>  commands), run the following two (Linux) commands on it:
>>  a) od -t c l
>>  b) od -t x1 l
>>  and show us the output of both commands.
>
> [EMAIL PROTECTED] qsweb]# od -t c test.sql
> 000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
> 020   O   P
> 022
> [EMAIL PROTECTED] qsweb]# od -t x1 test.sql
> 000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
> 020 4f 50
> 022

That's weird, because psql complained about line 1.

Maybe you messed something up by extracting the first line.

Try the following:

- Use binary file transfer and transfer the SQL script to a Linux
machine.

- Run "od -t c -t x1" on the file

- Find the 0xff that psql complains about.

Maybe that helps to locate the problem.
0xff is an unusual Windows-1252 character as well...

Yours,
Laurenz Albe



Re: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff

2007-09-03 Thread Ashish Karalkar


- Original Message - 
From: "Albe Laurenz" <[EMAIL PROTECTED]>

To: "Ashish Karalkar *EXTERN*" <[EMAIL PROTECTED]>
Cc: 
Sent: Monday, September 03, 2007 4:54 PM
Subject: RE: [GENERAL] invalid byte sequence for encoding "UTF8": 0xff


Ashish Karalkar wrote:

I have a data script which runs fine from PgAdmin SQL
Editor,but when I  run this  from command prompt I get
following error:

test=# \i /usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql

psql:/usr/local/pgsql/qsweb1/QSWEB_100_4_Default_Data.sql:1:
ERROR:  invalid byte sequence for encoding "UTF8": 0xff

 version
--
 PostgreSQL 8.2.0 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)


- What is the client operating system (where you run psql and
 PgAdmin III)?


It's Windows XP - PgAdmin III
RHEL 3.4.3-9.EL4-psql (Server Machine)


So I take it that you run "psql" on Windows XP, right?

PGCLIENTENCODING is not set, and as per the documentation I
think by default it takes the value of the database encoding, i.e. UTF8



- What does the SQL command "show client_encoding;" return
 when you issue it in
 a) PgAdmin III

UNICODE

 b) psql

UTF8


Ok, I suspect that's your problem.
You created QSWEB_100_4_Default_Data.sql by using the "Save" dialog
in PgAdmin III on the Windows machine, right?

Then the file will probably be encoded in Windows-1252.

If your client_encoding is set to UTF8, psql will expect UTF-8
data in the SQL script and complain if it meets anything else.

Does the script work as expected when you change the client
encoding to WIN1252?


- Please create a file that contains only the first line
 of QSWEB_100_4_Default_Data.sql (I call it "l" in the following
 commands), run the following two (Linux) commands on it:
 a) od -t c l
 b) od -t x1 l
 and show us the output of both commands.


[EMAIL PROTECTED] qsweb]# od -t c test.sql
000   \   s   e   t   O   N   _   E   R   R   O   R   _   S   T
020   O   P
022
[EMAIL PROTECTED] qsweb]# od -t x1 test.sql
000 5c 73 65 74 20 4f 4e 5f 45 52 52 4f 52 5f 53 54
020 4f 50
022


That's weird, because psql complained about line 1.

Maybe you messed something up by extracting the first line.

Try the following:

- Use binary file transfer and transfer the SQL script to a Linux
machine.

- Run "od -t c -t x1" on the file

- Find the 0xff that psql complains about.

Maybe that helps to locate the problem.
0xff is an unusual Windows-1252 character as well...

Hey, thanks Albe, it worked!





[GENERAL] Cannot install under Windows Vista

2007-09-03 Thread tkdchen
Hi,

I tried to install PostgreSQL under Windows Vista, but the MSI failed. It told
me that it has no right to create the postgres user account. I don't know
why; I just ran the MSI package with an Administrator account. Please
help me solve this problem.

-- 
GoogleTalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
My Space: tkdchen.spaces.live.com
BOINC: boinc.berkeley.edu
中国分布式计算总站: www.equn.com



[GENERAL] Statistics collection question

2007-09-03 Thread Phoenix Kiula
A couple of questions about the "most_common_vals" stuff in pg_stats
for a high traffic table:

1. Can I tell the stats collector to collect only values of a column
where a certain regex is matched? It is currently collecting the 500
values, most of which are values that I don't want, so it's
polluted with unwanted vals and therefore useless.

2. Secondly, for a unique column in the table, will the
"most_common_vals" always be -1? I guess this could make sense, but I
was wondering if the stats collector could somehow collect at least
1000 unique values to improve at least some performance.

TIA!



[GENERAL] Connecting to PostgreSQL server with Mono using ident authetication

2007-09-03 Thread Andreas Tille

Hi,

it is my first shot using Mono and I failed to get the example from

   http://www.mono-project.de/wiki/keyword/PostgreSQL/

working.  The reason is obviously that whatever I tried, NpgsqlConnection
tries to use password authentication, but I have configured my system so
that ident authentication is used by default.

I'm using Debian GNU/Linux testing running postgresql 8.2 and mono 1.2.
Could anybody enlighten me as to how to connect to a database where users
have no password set, because ident authentication is always used on the
local host?

I hope this is the right list for this kind of question.  Any hint for
a better place to ask is welcome as well.

Kind regards

Andreas.

PS: It would be welcome if you keep me in CC because I'm not subscribed
to the list, but I can read the archive as well.

--
http://fam-tille.de



Re: [GENERAL] Cannot install under Windows Vista

2007-09-03 Thread Rainer Bauer
tkdchen wrote:

>I tried to install PostgreSQL under Windows Vista, but the MSI failed. It told
>me that it has no right to create the postgres user account. I don't know
>why; I just ran the MSI package with an Administrator account. Please
>help me solve this problem.

Sounds like the UAC (User Account Control) is in the way.

There is a document available from Microsoft describing the installation:


Quote:
"Notes on Vista: 
The install on Vista is similar to other Windows installs but to install on
Vista, you must turn off User Account Control first."

Rainer



Re: [GENERAL] WAL Archiving problem

2007-09-03 Thread Norberto Delle

Tom Lane writes:

Norberto Delle <[EMAIL PROTECTED]> writes:

I have a PostgreSQL 8.2.4 installation running under Windows XP with WAL
archiving activated.
But at some point Postgres began to ask to archive a WAL segment that
isn't in the pg_xlog directory. I thought that a segment that isn't
successfully archived should remain in the pg_xlog directory, or am I wrong?


Do you have the postmaster log from around the time that this started
happening?  I'm wondering about a file rename() failing, or some such.

What files do you have, exactly, in pg_xlog and pg_xlog/archive_status?
It'd be useful to see their modification timestamps as well as their
names.
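
(On the Windows server that could be checked with something like the
following; the data-directory path is a placeholder for wherever PGDATA
lives there:)

dir /O:D "C:\path\to\data\pg_xlog"
dir /O:D "C:\path\to\data\pg_xlog\archive_status"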

Hi all

Thank you Tom, for the quick answer

Here is the part of the postmaster log where something wrong happened:

-- This sequence of WAL files was generated by a restore (COPY FROM stdin)

2007-08-20 09:09:40 LOG:  archived transaction log file 
"0001000200DC"
2007-08-20 09:10:27 LOG:  archived transaction log file 
"0001000200DD"
2007-08-20 09:11:07 LOG:  archived transaction log file 
"0001000200DE"
2007-08-20 09:11:33 LOG:  archived transaction log file 
"0001000200DF"
2007-08-20 09:11:38 LOG:  archived transaction log file 
"0001000200E0"
2007-08-20 09:11:42 LOG:  archived transaction log file 
"0001000200E1"
2007-08-20 09:11:46 LOG:  archived transaction log file 
"0001000200E2"
2007-08-20 09:11:50 LOG:  archived transaction log file 
"0001000200E3"
2007-08-20 09:11:53 LOG:  archived transaction log file 
"0001000200E4"
2007-08-20 09:11:57 LOG:  archived transaction log file 
"0001000200E5"
2007-08-20 09:12:01 LOG:  archived transaction log file 
"0001000200E6"
2007-08-20 09:12:09 LOG:  archived transaction log file 
"0001000200E7"
2007-08-20 09:12:20 LOG:  archived transaction log file 
"0001000200E8"
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
winsock error 10061
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
winsock error 10061

2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
winsock error 10061

2007-08-20 09:12:21 LOG:  unexpected EOF on client connection

-- Note that here the WAL file '0001000200E9' was archived 
(Postgres thinks it was,

-- because it's not present in the backup directory)

2007-08-20 09:12:33 LOG:  archived transaction log file 
"0001000200E9"
2007-08-20 09:12:46 LOG:  archived transaction log file 
"0001000200EA"
2007-08-20 09:12:57 LOG:  archived transaction log file 
"0001000200EB"


-- And here Postgres is asking to archive '0001000200E9' again

2007-08-20 09:22:29 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe 
-wal="pg_xlog\0001000200E9"" failed: return code 13
2007-08-20 09:22:31 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe 
-wal="pg_xlog\0001000200E9"" failed: return code 13
2007-08-20 09:22:32 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe 
-wal="pg_xlog\0001000200E9"" failed: return code 13
2007-08-20 09:22:32 WARNING:  transaction log file 
"0001000200E9" could not be archived: too many failures



Looking in bbp.exe's log I realized that the archive command fails because
pg_xlog\0001000200E9 is not found,
and looking in the pg_xlog\archive_status directory there is a file
named '0001000200E9.X.ready'.
More information will be difficult to obtain because I don't have direct
access to the server.


I hope this information helps





Re: [GENERAL] Statistics collection question

2007-09-03 Thread Tom Lane
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> A couple of questions about the "most_common_vals" stuff in pg_stats
> for a high traffic table:

> 1. Can I tell the stats collector to collect only values of a column
> where a certain regex is matched?

Not directly, but you could set up a partial index defined that way,
and ANALYZE would collect stats on the index contents.  Whether the
planner could actually do anything with the information is another
story; I suspect you're wasting your time with this idea.
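
(For illustration only, such a partial index could look like the sketch
below; the table, column and pattern are hypothetical:)

CREATE INDEX trades_alias_wanted ON trades (t_alias)
    WHERE t_alias ~ '^[0-9a-z]+$';
ANALYZE trades;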

> 2. Secondly, for a unique column in the table, will the
> "most_common_vals" always be -1? I guess this could make sense, but I
> was wondering if the stats collector could somehow collect at least
> 1000 unique values to improve at least some performance.

most_common_vals will (and should) be empty if there aren't actually any
common values, but aren't you getting a histogram?  Exactly what
performance do you think will be improved?

regards, tom lane



Re: [GENERAL] WAL Archiving problem

2007-09-03 Thread Tom Lane
Norberto Delle <[EMAIL PROTECTED]> writes:
> 2007-08-20 09:12:09 LOG:  archived transaction log file 
> "0001000200E7"
> 2007-08-20 09:12:20 LOG:  archived transaction log file 
> "0001000200E8"
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection
> 2007-08-20 09:12:21 LOG:  could not receive data from client: Unknown 
> winsock error 10061
> 2007-08-20 09:12:21 LOG:  unexpected EOF on client connection

> -- Note that here the WAL file '0001000200E9' was archived 
> (Postgres thinks it was,
> -- because it's not present in the backup directory)

> 2007-08-20 09:12:33 LOG:  archived transaction log file 
> "0001000200E9"
> 2007-08-20 09:12:46 LOG:  archived transaction log file 
> "0001000200EA"

Hmm.  The broken client connections should in theory be unrelated to
anything happening with WAL files, but it does seem mighty suspicious
that they happened in the same time period that that was the active
WAL file.  Do you see a lot of those "error 10061" entries elsewhere
in your logs, or was this an unusual occurrence?  Also, what exactly
is your archiving script doing --- does it send the file over a network
connection?  If the messages we can see above indicate a transient
network problem, as seems likely, that might possibly have affected
the archiving process as well.  Are you sure your archiving script
would have noticed a network-related failure?

> -- And here Postgres is asking to archive '0001000200E9' again

> 2007-08-20 09:22:29 LOG:  archive command "C:\Imob\IMOBBackup\bbp.exe 
> -wal="pg_xlog\0001000200E9"" failed: return code 13

Ten minutes later --- that's a heck of a long time when you're finishing
a WAL file every ten or fifteen seconds.  Please check exactly what
timestamp is on the .ready file.

regards, tom lane



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Phoenix Kiula
On 03/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:

> most_common_vals will (and should) be empty if there aren't actually any
> common values, but aren't you getting a histogram?  Exactly what
> performance do you think will be improved?


Lots of posts here in response to performance questions have the
recommendation "increase the stats on that column". From whatever
succinct reading is made available on the postgres site, I gather that
this aids the planner in getting some info about some of the data. Am
I missing something here, or totally off-base?

The issue is that I don't quite get why MySQL can fetch one indexed
row (i.e., SQL that ends with a very simple "WHERE indexed_column =
'constant' ") in a matter of milliseconds, but PgSQL is taking 5 to 6
seconds on average, at least for the first time. I use RAPTOR 15K
drives; they're not SCSI but they're not exactly "cheap disks" either.
And I have 4GB RAM. The EXPLAIN of the SELECT shows that the index is
being used!

TIA.



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Alvaro Herrera
Phoenix Kiula escribió:
> On 03/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> 
> > most_common_vals will (and should) be empty if there aren't actually any
> > common values, but aren't you getting a histogram?  Exactly what
> > performance do you think will be improved?
> 
> 
> Lots of posts here in response to performance questions have the
> recommendation "increase the stats on that column". From whatever
> succinct reading is made available on the postgres site, I gather that
> this aids the planner in getting some info about some of the data. Am
> I missing something here, or totally off-base?
> 
> The issue is that I don't quite get why MySQL can fetch one indexed
> row (i.e., SQL that ends with a very simple "WHERE indexed_column =
> 'constant' ") in a matter of milliseconds, but PgSQL is taking 5 to 6
> seconds on average, at least for the first time. I use RAPTOR 15K
> drives; they're not SCSI but they're not exactly "cheap disks" either.
> And I have 4GB RAM. The EXPLAIN of the SELECT shows that the index is
> being used!

Let's see the explain output?  I doubt your games with stats have
anything to do with it.  Maybe it is having to scan a lot of dead tuples
or something like that.
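
(One way to check for that; a sketch, assuming the table is "trades":)

VACUUM VERBOSE trades;
-- the "dead row versions" figures in the output show how much of the
-- table and its indexes consist of dead space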

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Alban Hertroys
Phoenix Kiula wrote:
> Lots of posts here in response to performance questions have the
> recommendation "increase the stats on that column". From whatever
> succinct reading is made available on the postgres site, I gather that
> this aids the planner in getting some info about some of the data. Am
> I missing something here, or totally off-base?

As I understand it it's a sample of how the data is distributed.
Probably it's based on statistical mathematics that specifies a minimum
size for a representative sample of a given data set. It boils down to:
"If you want to know how many people like vanilla ice cream, how many
people do you need to ask their preference?".

> The issue is that I don't quite get why MySQL can fetch one indexed
> row (i.e., SQL that ends with a very simple "WHERE indexed_column =
> 'constant' ") in a matter of milliseconds, but PgSQL is taking 5 to 6
> seconds on average, at least for the first time. I use RAPTOR 15K
> drives; they're not SCSI but they're not exactly "cheap disks" either.
> And I have 4GB RAM. The EXPLAIN of the SELECT shows that the index is
> being used!

That's definitely not normal. I have a smallish table here containing
2.5 million records, and querying for one with a specific index takes
141 micro(!) seconds. The hardware involved is a dual opteron with 4G,
in a xen domain; I don't know what disks are used, but I doubt they're
raptors.

So something is wrong with your setup, that much is obvious. I sincerely
doubt that postgres is to blame here.

You did check that you're not connecting through the internet and
getting a DNS timeout?

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Phoenix Kiula
On 03/09/07, Alban Hertroys <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> As I understand it it's a sample of how the data is distributed.
> Probably it's based on statistical mathematics that specifies a minimum
> size for a representative sample of a given data set. It boils down to:
> "If you want to know how many people like vanilla ice cream, how many
> people do you need to ask their preference?".


Thanks for this explanation. So I should set the statistics on my
indexed column to 10 or so? I made it 1000 this morning, trying to see
how it would affect performance.



> That's definitely not normal. I have a smallish table here containing
> 2.5 million records, and querying for one with a specific index takes
> 141 micro(!) seconds. The hardware involved is a dual opteron with 4G,
> in a xen domain; I don't know what disks are used, but I doubt they're
> raptors.
>
> So something is wrong with your setup, that much is obvious. I sincerely
> doubt that postgres is to blame here.
>
> You did check that you're not connecting through the internet and
> getting a DNS timeout?



I am getting these times from the postgres log (pglog). I have set up
the minimum query time as 5000 (ms). Here is an excerpt from my
log, which is constantly updated with more and more of these!
It looks abysmal!!


---

LOG:  duration: 85865.904 ms  statement: select t_info, dstats, id
from trades where t_alias = '1q8bf' and status = 'Y'
LOG:  duration: 83859.505 ms  statement: select t_info, dstats, id
from trades where t_alias = '1a7iv' and status = 'Y'
LOG:  duration: 71922.423 ms  statement: select t_info, dstats, id
from trades where t_alias = 'bvu' and status = 'Y'
LOG:  duration: 74924.741 ms  statement: select t_info, dstats, id
from trades where t_alias = 'nt3g' and status = 'Y'
LOG:  duration: 82471.036 ms  statement: select t_info, dstats, id
from trades where t_alias = '15p8m' and status = 'Y'
LOG:  duration: 90015.410 ms  statement: select t_info, dstats, id
from trades where t_alias = 'pkfi' and status = 'Y'
LOG:  duration: 72713.815 ms  statement: select t_info, dstats, id
from trades where t_alias = 'evdi' and status = 'Y'
LOG:  duration: 88054.444 ms  statement: select t_info, dstats, id
from trades where t_alias = '1a8zj' and status = 'Y'
LOG:  duration: 94502.678 ms  statement: select t_info, dstats, id
from trades where t_alias = '1d188' and status = 'Y'
LOG:  duration: 82178.724 ms  statement: select t_info, dstats, id
from trades where t_alias = 'q8zu' and status = 'Y'
LOG:  duration: 107030.741 ms  statement: select t_info, dstats, id
from trades where t_alias = 'jnzu' and status = 'Y'
LOG:  duration: 87634.723 ms  statement: select t_info, dstats, id
from trades where t_alias = 'tav9' and status = 'Y'
LOG:  duration: 104271.695 ms  statement: select t_info, dstats, id
from trades where t_alias = '37tk7' and status = 'Y'
LOG:  duration: 88726.671 ms  statement: select t_info, dstats, id
from trades where t_alias = 'tavc' and status = 'Y'
LOG:  duration: 74710.120 ms  statement: select t_info, dstats, id
from trades where t_alias = '1q8zu' and status = 'Y'
LOG:  duration: 93100.863 ms  statement: select t_info, dstats, id
from trades where t_alias = '1ovmc' and status = 'Y'
LOG:  duration: 83659.489 ms  statement: select t_info, dstats, id
from trades where t_alias = '1p9ub' and status = 'Y'
LOG:  duration: 71963.413 ms  statement: select t_info, dstats, id
from trades where t_alias = '9awlia' and status = 'Y'
LOG:  duration: 83569.602 ms  statement: select t_info, dstats, id
from trades where t_alias = '2yeza' and status = 'Y'
LOG:  duration: 93473.282 ms  statement: select t_info, dstats, id
from trades where t_alias = '17huv' and status = 'Y'

---


By way of an explanation, T_INFO is a text column, DSTATS is
char(1), and ID is the bigint primary key. STATUS can be 'Y' or 'N',
so I have not included it in the index (not selective enough), but
T_ALIAS is the uniquely indexed column.

The EXPLAIN ANALYZE output is as follows:


MYUSER=# explain analyze select t_info, dstats, id from trades where
t_alias = '17huv' and status = 'Y';

 QUERY PLAN

 Index Scan using trades_unique_t_alias on trades  (cost=0.00..3.41
rows=1 width=110) (actual time=0.100..0.104 rows=1 loops=1)
   Index Cond: ((t_alias)::text = '17huv'::text)
   Filter: (status = 'Y'::bpchar)
 Total runtime: 0.166 ms
(4 rows)

Time: 2.990 ms




And my postgresql.conf is looking like this:



max_connections  = 350
shared_buffers   = 21000# Not much more than
20k...http://www.revsys.com/writings/postgresql-performance.html
effective_cache_size = 128000
max_fsm_relations= 100
max_fsm_pages= 15
work_mem = 16000#
http:

[GENERAL] Vacuum process idle but hogging memory 8.2.4

2007-09-03 Thread Henrik

Hello list,

System is running linux kernel 2.6.18 with postgres 8.2.4 and 1GB ram.

I have a 50GB database, with the biggest table taking about 30 GB
and having about 200 million rows.


I've already started to redesign the database to avoid the huge number
of rows in this big table, but I'm still curious why autovacuum hogs
over 200MB when it is not running?


Is it the shared_buffers?


Thanks,
Henke

shared_buffers = 128MB  
work_mem = 10MB 
maintenance_work_mem = 64MB

vacuum_cost_delay = 0   # 0-1000 milliseconds
vacuum_cost_limit = 200 # 0-1 credits

effective_cache_size = 256MB

autovacuum_vacuum_cost_delay = 50
autovacuum_vacuum_cost_limit = 150




Re: [GENERAL] Statistics collection question

2007-09-03 Thread Richard Broersma Jr
--- Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> LOG:  duration: 93473.282 ms  statement: select t_info, dstats, id
> from trades where t_alias = '17huv' and status = 'Y'
> 
> ---
> 
>  Index Scan using trades_unique_t_alias on trades  (cost=0.00..3.41
 
> Time: 2.990 ms


That's odd, I wonder why the EXPLAIN ANALYZE time is so much less than the
logged select statement times?

Regards,
Richard Broersma Jr.



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Phoenix Kiula
On 04/09/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
> --- Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > LOG:  duration: 93473.282 ms  statement: select t_info, dstats, id
> > from trades where t_alias = '17huv' and status = 'Y'
> >
> > ---
> >
> >  Index Scan using trades_unique_t_alias on trades  (cost=0.00..3.41
>
> > Time: 2.990 ms
>
>
> That's odd, I wonder why the EXPLAIN ANALYZE time is so much less than
> the logged select statement times?
>



Because the statement has been executed and is in the cache.



Re: [GENERAL] Data Warehousing

2007-09-03 Thread Andrej Ricnik-Bay
On 9/3/07, Rob Kirkbride <[EMAIL PROTECTED]> wrote:

> I am on a Linux platform but I'm going to need some pointers regarding
> the cron job. Are you suggesting that I parse the dump file? I assume I
> would need to switch to using inserts and then parse the dump looking
> for where I need to start from?
The question is: how complex is the data you need to
extract?  I guess where I was heading was to run a
select with the interval Scott described from psql into
a file, and then copy-from that into the analysis database.

However, if the structure is more complex, if you needed
to join tables, parsing a dump-file may be an option,
even though (always retaining a week's worth) that might make
for quite some overhead.
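
(Roughly along these lines; a sketch only, with made-up table, column, host
and file names, run hourly from cron:)

#!/bin/sh
# crontab entry: 0 * * * * /usr/local/bin/ship_logs.sh
psql -d cluster_db -c "\copy (select * from logs where logged_time >= now() - interval '1 hour') to '/tmp/logs_hourly.csv' csv"
psql -h reports-host -d reports_db -c "\copy logs from '/tmp/logs_hourly.csv' csv"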


Cheers,
Andrej



-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm



Re: [GENERAL] Statistics collection question

2007-09-03 Thread Tom Lane
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> On 04/09/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
>> That's odd, I wonder why the EXPLAIN ANALYZE time is so much less than
>> the logged select statement times?

> Because the statement has been executed and is in the cache.

That answer is way too flippant.  In particular it doesn't explain your
repeated 80sec queries --- you should have enough memory in that thing
to be caching a fair amount of your data.

I'm wondering about some transaction taking exclusive lock on the table
and sitting on it for a minute or so, and also about network problems
delaying transmission of data to the client.

regards, tom lane



Re: [GENERAL] Vacuum process idle but hogging memory 8.2.4

2007-09-03 Thread Tom Lane
Henrik <[EMAIL PROTECTED]> writes:
> I've already started to redesign the database to avoid the huge number
> of rows in this big table, but I'm still curious why autovacuum hogs
> over 200MB when it is not running?

On what do you base that assertion?

> Is it the shared_buffers?

Well, 128M in shared buffers plus 64M maintenance_work_mem would go
a long way towards explaining a 200M process address space, but it's
hardly "hogging" the shared buffers.

regards, tom lane



[GENERAL] Suggestion for new function on pg_catalog: get_config()

2007-09-03 Thread Luiz K. Matsumura

Hi all,

I want to suggest the inclusion of a new function in pg_catalog.
Since we have set_config(text, text, boolean),
can we have a get_config(text)?

I researched and found an internal function that does it; we just don't
have a call for it.
I think it could be like below (I don't know if the function can be
STATIC or strict):


CREATE OR REPLACE FUNCTION get_config(text)
 RETURNS SETOF text AS
'show_config_by_name'
 LANGUAGE 'internal' VOLATILE;


TIA.

--
Luiz K. Matsumura
Plan IT Tecnologia Informática Ltda.




Re: [GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Ow Mun Heng
On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> "Ow Mun Heng" <[EMAIL PROTECTED]> writes:
> >
> > How can I persuade PG to use the index w/o resorting to setting
> > enable_seqscan = false?
> 
> The usual knob to fiddle with is random_page_cost. If your database fits
> mostly in memory you may want to turn it down from the default of 4 to
> something closer to 1. 

I tried down to 0.4 before it resorted to using the index. The DB
shouldn't fit into memory (I think); that table alone has ~8 million rows
at ~1.5G size.

> >  ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual 
> > time=171.911..171.911 rows=12591 loops=1)
> 
> Ah, this is off by an order of magnitude, that's bad.
> 

Having said so, I still don't understand why..

> >->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 rows=1291 
> > width=24) (actual time=62.980..142.594 rows=12591 loops=1)
> >  Recheck Cond: ((code)::text = 'NRN15'::text)
> >  ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 
> > rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
> >Index Cond: ((code)::text = 'NRN15'::text)
> 
> So you might want to increase the statistics target for the "code" column.
Err.. how come? (newbie) it's scanning the index there. What's bad is
that it's using Seq_scans on the "zone" table.

> Incidentally the way this is written makes me wonder what data type "code" is
> defined as.

"code" is defined as the varchar(5) data type. I'm changing all of the
normal char(n) columns to varchar(n)..

BTW, thanks for helping. Not using seq scans really does make a huge
difference, as you can clearly see from the timing.

 Total runtime: 43772.045 ms
 Total runtime: 553.964 ms




Re: [GENERAL] Suggestion for new function on pg_catalog: get_config()

2007-09-03 Thread Alvaro Herrera
Luiz K. Matsumura wrote:
> Hi all,
>
> I want to suggest a inclusion of an new function on the pg_catalog.
> Since we have set_config(text, text, boolean)
> can we have an get_config( text ) ?

Hum, isn't this current_setting()?

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacera otra mañana?"



Re: [GENERAL] Suggestion for new function on pg_catalog: get_config()

2007-09-03 Thread Luiz K. Matsumura



Alvaro Herrera wrote:

Luiz K. Matsumura wrote:
  

Hi all,

I want to suggest the inclusion of a new function in pg_catalog.
Since we have set_config(text, text, boolean),
can we have a get_config(text)?



Hum, isn't this current_setting()?

  
Oh oh, you are right, forget my suggestion... :-[ 
I'm lost in the ocean of functions of pg_catalog. lol


By the way,

select setting AS default_tablespace
from pg_show_all_settings() x(name text, setting text, unit text,
    category text, short_desc text, extra_desc text, context text,
    vartype text, source text, min_val text, max_val text)
where name = 'default_tablespace'


and

select current_setting('default_tablespace')

can they be considered equivalent?

I'm just researching a bug that I found in the new pgAdmin3, and if these
are equivalent I will suggest that the pgAdmin team use it...


TIA


--
Luiz K. Matsumura
Plan IT Tecnologia Informática Ltda.




Re: [GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Ow Mun Heng
On Tue, 2007-09-04 at 10:06 +0800, Ow Mun Heng wrote:
> On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> > "Ow Mun Heng" <[EMAIL PROTECTED]> writes:

> > >->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 
> > > rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
> > >  Recheck Cond: ((code)::text = 'NRN15'::text)
> > >  ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 
> > > rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
> > >Index Cond: ((code)::text = 'NRN15'::text)
> > 
> > So you might want to increase the statistics target for the "code" column.
> Err.. how come? (newbie) it's scanning the index there. What's bad is
> that it's using Seq_scans on the "zone" table.

The "code" column is now increased to a statistics target of 100.



 Limit  (cost=25113.04..30733.57 rows=1000 width=47) (actual 
time=19666.832..39961.032 rows=228 loops=1)
   ->  Hash Join  (cost=25113.04..324620.08 rows=53288 width=47) (actual 
time=19666.826..39960.437 rows=228 loops=1)
 Hash Cond: ((trz.number)::text = (drv.number)::text)
 ->  Seq Scan on zone trz  (cost=0.00..234363.75 rows=6394431 width=39) 
(actual time=17.635..29164.929 rows=6222984 loops=1)
   Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
 ->  Hash  (cost=24965.72..24965.72 rows=11785 width=24) (actual 
time=215.851..215.851 rows=12591 loops=1)
   ->  Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 
width=24) (actual time=50.910..188.894 rows=12591 loops=1)
 Recheck Cond: ((code)::text = 'NRN15'::text)
 ->  Bitmap Index Scan on idx_drv  (cost=0.00..240.82 
rows=11785 width=0) (actual time=49.180..49.180 rows=12591 loops=1)
   Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 39961.703 ms


Does seem to be slightly better (from 43772ms)

With trz.number's statistics target increased to 100 as well (code still at 100):

 QUERY PLAN

 Limit  (cost=25113.04..30733.57 rows=1000 width=47) (actual 
time=22152.398..42430.820 rows=228 loops=1)
   ->  Hash Join  (cost=25113.04..324620.08 rows=53288 width=47) (actual 
time=22152.392..42430.212 rows=228 loops=1)
 Hash Cond: ((trz.number)::text = (drv.number)::text)
 ->  Seq Scan on zone trz  (cost=0.00..234363.75 rows=6394431 width=39) 
(actual time=11.840..28808.222 rows=6222984 loops=1)
   Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
 ->  Hash  (cost=24965.72..24965.72 rows=11785 width=24) (actual 
time=2646.652..2646.652 rows=12591 loops=1)
   ->  Bitmap Heap Scan on drv  (cost=243.76..24965.72 rows=11785 
width=24) (actual time=50.628..2600.132 rows=12591 loops=1)
 Recheck Cond: ((code)::text = 'NRN15'::text)
 ->  Bitmap Index Scan on idx_drvl  (cost=0.00..240.82 
rows=11785 width=0) (actual time=38.436..38.436 rows=12591 loops=1)
   Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 42431.358 ms

hmm..not much difference..

What else can be done?? Many Thanks..



[GENERAL] Max File size per Table (1G limit??)

2007-09-03 Thread Ow Mun Heng
I just browsed to my $PGDATA location and noticed that there are some
tables whose files have an ending of .1

# ls -lahS | egrep '(24694|24702|24926)'
-rw--- 1 postgres postgres 1.0G Sep  3 22:56 24694
-rw--- 1 postgres postgres 1.0G Sep  3 22:52 24702
-rw--- 1 postgres postgres 1.0G Sep  3 22:58 24926
-rw--- 1 postgres postgres 800M Sep  3 22:57 24694.1
-rw--- 1 postgres postgres 161M Sep  3 22:52 24702.1
-rw--- 1 postgres postgres  12M Sep  3 22:58 24926.1

I'm wondering what these are, since I've not set up table partitioning
just yet.





[GENERAL] Symlinking (specific) tables to different Drives

2007-09-03 Thread Ow Mun Heng
Hi,

I'm running out of space on one of my partitions and I still have not
gotten all the data loaded yet. I've read that one could symlink the
pg_xlog directory to another drive. I'm wondering if I can do the
same for specific tables as well.

Thanks.

I've already done a pg_dump of the entire schema but have not dropped /
re-initialized the DB in another location because I'm afraid I'll lose some
items. (I'd have to drop the DB, format the partition, merge it with another
partition, re-init the DB and then restore the DB from the dump.)

Sigh.. wish it was easier (meaning, like SQL Server, where one can
detach an entire DB/tablespace and then re-attach it elsewhere)

Anyway
Thanks for the input.



Re: [GENERAL] Symlinking (specific) tables to different Drives

2007-09-03 Thread Mikko Partio
On 9/4/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm running out of space on one of my partitions and I still have not
> gotten all the data loaded yet. I've read that one could symlink the
> pg_xlog directory to another drive. I'm wondering if I can do the
> same for specific tables as well.



Create another tablespace in the new location and then ALTER TABLE ...
SET TABLESPACE newtablespace.
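
For example (a sketch; the path here is made up, the directory must already
exist and be owned by the postgres OS user, and note that moving a table does
not move its indexes, so those are moved separately):

CREATE TABLESPACE newtablespace LOCATION '/mnt/newdrive/pgdata';
ALTER TABLE drv SET TABLESPACE newtablespace;      -- moves the table's heap only
ALTER INDEX idx_drv SET TABLESPACE newtablespace;  -- indexes move separately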


Thanks.
>
> I've already done a pg_dump of the entire schema but have not dropped /
> re-init the DB to another location cos I'm afraid I'll lose some items.
> (I've to drop the DB, format the partition, merge it w/ another
> partition and re-init the DB then restore the DB from the dump)
>
> sigh.. wish it was easier, (meaning, like SQL Server where one can
> detach an entire DB/tablespace and then re-attach it elsewhere)


If you are moving the whole cluster and can afford the downtime, you can
shut down the postmaster, move $PGDATA to a new location, and then start
postmaster from that new location.

Regards

MP


Re: [GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Gregory Stark

"Ow Mun Heng" <[EMAIL PROTECTED]> writes:

> On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
>> "Ow Mun Heng" <[EMAIL PROTECTED]> writes:
>> >
>> > How can I persuade PG to use the index w/o resorting to setting seqscan
>> > = false
>> 
>> The usual knob to fiddle with is random_page_cost. If your database fits
>> mostly in memory you may want to turn it down from the default of 4 to
>> something closer to 1. 
>
> I tried down to 0.4 before it resorted to using the index. The DB
> shouldn't fit into memory (I think); that table alone has ~8million rows
> at ~1.5G size

Values under 1 are nonsensical. Basically, setting it as low as 1 means you're
telling the database that a random-access I/O takes the same amount of time as
a sequential I/O. (Actually we have seq_page_cost now, so I guess instead of
"1" I should say "the same as seq_page_cost", but I'm assuming you haven't
modified seq_page_cost from the default of 1, have you?)
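
If you want to experiment before touching postgresql.conf, both settings can
be changed per session; a minimal sketch (the query is only a stand-in,
reusing table and column names from the plans earlier in the thread):

SET random_page_cost = 2;    -- try values between 1 and the default of 4
SHOW seq_page_cost;          -- assumed to still be at its default of 1
EXPLAIN ANALYZE SELECT count(*) FROM drv WHERE code = 'NRN15';
RESET random_page_cost;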

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Instances where enable_seqscan = false is good

2007-09-03 Thread Ow Mun Heng
On Tue, 2007-09-04 at 05:15 +0100, Gregory Stark wrote:
> "Ow Mun Heng" <[EMAIL PROTECTED]> writes:
> 
> > On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> >> "Ow Mun Heng" <[EMAIL PROTECTED]> writes:
> >> >
> >> > How can I persuade PG to use the index w/o resorting to setting seqscan
> >> > = false
> >> 
> >> The usual knob to fiddle with is random_page_cost. If your database fits
> >> mostly in memory you may want to turn it down from the default of 4 to
> >> something closer to 1. 
> >
> > I tried down to 0.4 before it resorted to using the index. The DB
> > shouldn't fit into memory (I think); that table alone has ~8million rows
> > at ~1.5G size
> 
> Values under 1 are nonsensical. 
Exactly, might as well use enable_seqscan=false. So it's still at the
default of 4.

> Basically, setting it as low as 1 means you're
> telling the database that a random-access I/O takes the same amount of time as
> a sequential I/O. (Actually we have seq_page_cost now, so I guess instead of
> "1" I should say "the same as seq_page_cost", but I'm assuming you haven't
> modified seq_page_cost from the default of 1, have you?)

Have not changed anything in that area. Question is: do I need to, or
should I just try something out to see how it behaves?
Any recommendations would be good.




Re: [GENERAL] Symlinking (specific) tables to different Drives

2007-09-03 Thread Ow Mun Heng
On Tue, 2007-09-04 at 07:09 +0300, Mikko Partio wrote:
> 
> 
> On 9/4/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I'm running out of space on one of my partitions and I still
> have not
> gotten all the data loaded yet. I've read that one could
> symlink the
> pg_xlog directory to another drive. I'm wondering if I can
> do the 
> same for specific tables as well.
> 
> 
> Create another tablespace in the new location and then ALTER TABLE ...
> SET TABLESPACE newtablespace.
>  

OOooohhh... I didn't know one could use tablespaces like that. (I mean,
I did read the docs, but it just didn't register that it _can_ do
something like that)

additional question.. do I need to change the search_path?

> 
> Thanks.
> 
> I've already done a pg_dump of the entire schema but have not
> dropped /
> re-init the DB to another location cos I'm afraid I'll lose
> some items.
> (I've to drop the DB, format the partition, merge it w/
> another 
> partition and re-init the DB then restore the DB from the
> dump)
> 
> sigh.. wish it was easier, (meaning, like SQL Server where one
> can
> detach an entire DB/tablespace and then re-attach it
> elsewhere)
> 
> If you are moving the whole cluster and can afford the downtime, you
> can shut down the postmaster, move $PGDATA to a new location, and then
> start postmaster from that new location.

It's not a cluster. It's a devel DB on my laptop, so.. no issues w/
dropping everything and re-creating. Just exploring my options..

The tablespace thing looks/sounds interesting though... 
> 



Re: [GENERAL] Max File size per Table (1G limit??)

2007-09-03 Thread Jaime Casanova
On 9/3/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> I just browsed to my $PGDATA location and noticed that there are some
> tables whose data files end in .1
>
> # ls -lahS | egrep '(24694|24702|24926)'
> -rw--- 1 postgres postgres 1.0G Sep  3 22:56 24694
> -rw--- 1 postgres postgres 1.0G Sep  3 22:52 24702
> -rw--- 1 postgres postgres 1.0G Sep  3 22:58 24926
> -rw--- 1 postgres postgres 800M Sep  3 22:57 24694.1
> -rw--- 1 postgres postgres 161M Sep  3 22:52 24702.1
> -rw--- 1 postgres postgres  12M Sep  3 22:58 24926.1
>
> I'm wondering what these are, since I've not set up table partitioning
> just yet.
>
>

Postgres uses data files of up to 1GB each; if a table holds more data than
that limit, Postgres creates additional segment files for it.

http://www.postgresql.org/docs/8.2/static/storage-file-layout.html
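
The numeric file names (24694 and so on) are the tables' relfilenodes, so you
can map them back to table names; a quick sketch:

SELECT relname, relfilenode
FROM pg_class
WHERE relfilenode IN (24694, 24702, 24926);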

-- 
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
   Richard Cook



Re: [GENERAL] Symlinking (specific) tables to different Drives

2007-09-03 Thread Mikko Partio
On 9/4/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
>
> On Tue, 2007-09-04 at 07:09 +0300, Mikko Partio wrote:
> >
> >
> > On 9/4/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm running out of space on one of my partitions and I still
> > have not
> > gotten all the data loaded yet. I've read that one could
> > symlink the
> > pg_xlog directory to another drive. I'm wondering if I can
> > do the
> > same for specific tables as well.
> >
> >
> > Create another tablespace in the new location and then ALTER TABLE ...
> > SET TABLESPACE newtablespace.
> >
>
> OOooohhh... I didn't know one could use tablespaces like that. (I mean,
> I did read the docs, but it just didn't register that it _can_ do
> something like that)
>
> additional question.. do I need to change the search_path?


No (changing tablespaces does not change your logical schema).
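
You can check where a table ended up with something like this (a sketch; 'drv'
is just an example table name, and an empty tablespace column means the table
sits in the database's default tablespace):

SELECT schemaname, tablename, tablespace
FROM pg_tables
WHERE tablename = 'drv';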

Regards

MP


Re: [GENERAL] Max File size per Table (1G limit??)

2007-09-03 Thread Tom Lane
Ow Mun Heng <[EMAIL PROTECTED]> writes:
> I just browsed to my $PGDATA location and noticed that there are some
> tables whose data files end in .1

TFM has some useful background knowledge for that sort of thing:
http://www.postgresql.org/docs/8.2/static/storage.html
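
As an aside (a sketch; 'drv' is an example table name): pg_relation_size()
reports a table's on-disk size summed across all of its 1GB segment files:

SELECT pg_size_pretty(pg_relation_size('drv'));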

regards, tom lane
