Re: [HACKERS] [Plperlng-devel] Data Persists Past Scope

2006-10-17 Thread Andrew Dunstan

This is almost certainly a perl problem that has nothing to do with postgres.

Please construct a small test case - I at least don't have time to spend
wading through huge gobs of code.

Note: if the variable is referred to by a live subroutine it will still be
alive. See man perlref and search for closure - it might help you.
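To illustrate the closure behavior Andrew describes, here is a hypothetical Python analogue (not from the original thread; Perl's 'my' variables behave the same way when a subroutine closes over them):

```python
def make_counter():
    count = 0  # a local variable, like a Perl 'my' variable

    def bump():
        # 'bump' closes over 'count', so 'count' stays alive for as long
        # as 'bump' does -- even after make_counter() has returned
        nonlocal count
        count += 1
        return count

    return bump

counter = make_counter()
print(counter())  # 1
print(counter())  # 2 -- the captured variable persisted between calls
```

As long as a live subroutine references the variable, it cannot be garbage-collected; a fresh call to make_counter() gets a fresh variable.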

cheers

andrew



David Fetter wrote:
 Folks,

 While testing DBI-Link, I've noticed something very odd.  In the
 trigger code, I have subroutines with 'my' variables in them, which I
 thought meant that as soon as the subroutine returned, the variables
 went away.

 They are not going away :(

 Please find attached some sample output along with DBI-Link.  The
 database I'm connecting to is MySQL's Sakila, but the same happens in
 Oracle, so I don't think (this time ;) it's a MySQL problem.

 If I quit the session or reload the functions, the ghost variables go
 away, but I can't ask people to do that between queries.

 Help!




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] postgres database crashed

2006-10-17 Thread Ashish Goel
Hi all,

I am sorry but I forgot to mention that in the database schema we are
maintaining references to the main table xyz(int id, img image, fname
varchar(50)). There are around 14 tables referencing this table; the
references are made to the column id.

The code works well if we don't maintain the references, but when we
include the references the database crashes somewhere between 2500-3000
transactions.

So could this problem be due to the multiple references being made to
the same table?

Markus Schaber [EMAIL PROTECTED] wrote:

 Hi, Ashish,

 Ashish Goel wrote:
  But the same code worked when I inserted around 2500 images in the
  database. After that it started crashing.

 Testing can never prove that there are no bugs. It's like the proof
 that all odd numbers above 1 are prime: 3 is prime, 5 is prime, 7 is
 prime, so I conclude that all odd numbers above 1 are prime.

  So, I don't think it's because of an error in the code. Can you
  suggest some other possible reasons, and also why is it crashing at
  the call to memcpy()?

 - broken hardware
 - compiler bugs
 - bugs in PostgreSQL

 But without having seen your code, I tend to assume that it's something
 like a wrong length flag in some corner case in your code.

 Markus
 --
 Markus Schaber | Logical Tracking&Tracing International AG
 Dipl. Inf. | Software Development GIS

 Fight against software patents in Europe! www.ffii.org
 www.nosoftwarepatents.org


Re: [HACKERS] postgres database crashed

2006-10-17 Thread Markus Schaber
Hi, Ashish,

Ashish Goel wrote:

 I am sorry but I forgot to mention that in the database schema we are
 maintaining references to the main table xyz(int id, img image, fname
 varchar(50))
 
 There are around 14 tables referencing this table. The references are
 being made to the column id.
 
 The code works well if we don't maintain the references but when we
 include the references then the database crashes somewhere between
 2500-3000 transactions.
 
 So could this problem be due to the multiple references being made to
 the same table?

I doubt it.

Foreign key references are among the basics of SQL, they're pretty well
tested.

Could you try to replace your image type, e.g. with bytea, for your test
purposes, and see whether it crashes, too?

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org





Re: [HACKERS] Is python 2.5 supported?

2006-10-17 Thread Hannu Krosing
On Mon, 2006-10-16 at 14:46, Jim C. Nasby wrote:
 Since installing python 2.5, tapir has been failing:
 
 http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=tapir&dt=2006-10-15%2020:20:16
 
 Several of the failures appear to be a simple change in error reporting;
 I haven't investigated why import_succeed() failed.
 
 Should python 2.5 work with plpython?

This is about pl_python ?

Forwarding to Sven to investigate

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com






Re: [HACKERS] [GENERAL] Anyone using POSIX time zone offset capability?

2006-10-17 Thread Sander Steffann

Hi,


The POSIX timezone notation as understood by the zic code includes
the possibility of

zoneabbrev[+-]hh[:mm[:ss]]

but the meaning is that hh:mm:ss *is* the offset from GMT, and
zoneabbrev is being defined as the abbreviation for that offset.
What the datetime.c code is doing is trying to find the zoneabbrev
in a built-in timezone table, and then adding the two together.
This is simply wacko.


I think that if anyone had ever tried to use this notation, they would have 
noticed this misinterpretation of the specs.
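For reference, a small Python sketch (hypothetical code, not from the thread) of reading zoneabbrev[+-]hh[:mm[:ss]] the way tzparse() does: the numeric part *is* the zone's full offset from GMT, and the abbreviation merely names that offset rather than being looked up and added. The sign convention below follows POSIX ('+' or no sign means west of Greenwich) and is worth double-checking against the spec:

```python
import re

def parse_posix_zone(spec):
    """Return (abbrev, offset_east_of_gmt_in_seconds) for a POSIX zone spec."""
    m = re.fullmatch(r"([A-Za-z]+)([+-]?)(\d{1,2})(?::(\d{2}))?(?::(\d{2}))?", spec)
    if not m:
        raise ValueError("not a POSIX zone spec: %r" % spec)
    abbrev, sign, hh, mm, ss = m.groups()
    seconds = int(hh) * 3600 + int(mm or 0) * 60 + int(ss or 0)
    # POSIX: '+' (or no sign) means west of Greenwich, so negate to get
    # the conventional east-of-GMT offset; '-' means east of Greenwich.
    return abbrev, (seconds if sign == "-" else -seconds)

print(parse_posix_zone("GMT+5"))      # ('GMT', -18000): 5 hours west of GMT
print(parse_posix_zone("EST+05:00"))  # ('EST', -18000), the same offset
```

Note there is no table lookup of the abbreviation anywhere, which is exactly the point of the complaint above.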



Given where the code stands now, I think the best solution is to
rip out DecodePosixTimezone and instead pass the syntax off to the
zic code (which can handle it via tzparse()).  Since the datetime
input parser is ultimately only interested in the GMT offset value,
this would mean that the zoneabbrev part would become a noise word.


Sounds like a good idea to me.
Sander



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Anyone using POSIX time zone offset capability?

2006-10-17 Thread Tom Lane
Sander Steffann [EMAIL PROTECTED] writes:
 What the datetime.c code is doing is trying to find the zoneabbrev
 in a built-in timezone table, and then adding the two together.
 This is simply wacko.

 I think that if anyone has ever tried to use this notation they would have 
 noticed this misinterpretation of the specs.

Well, it'd work without surprise for the case of GMT+-n, which is
undoubtedly the most common case ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Anyone using POSIX time zone offset capability?

2006-10-17 Thread Sander Steffann

Hi,


Sander Steffann [EMAIL PROTECTED] writes:

What the datetime.c code is doing is trying to find the zoneabbrev
in a built-in timezone table, and then adding the two together.
This is simply wacko.


I think that if anyone has ever tried to use this notation they would 
have

noticed this misinterpretation of the specs.


Well, it'd work without surprise for the case of GMT+-n, which is
undoubtedly the most common case ...


Hmmm. I hadn't thought of that, but then with the changes you proposed 
they would still get what they expect, even though that notation would not 
conform to the POSIX docs.


Still seems like a good idea :)
Sander



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Getting the type Oid in a CREATE TYPE output function

2006-10-17 Thread Marko Kreen

On 10/16/06, Weslee Bilodeau [EMAIL PROTECTED] wrote:

Marko Kreen wrote:
 The PGP functions happen to do it already - pgp_key_id().

Actually, Tom helped me realize I made a mistake, and I'm following his
suggestion: not tying keys to OIDs, which change when backed up/restored.


Yeah, tying to oids is bad; you should link to names,
preferably schema-qualified.  Anyway, that was just an off-hand
suggestion.



[ snip nice description ]




I'm not sure if anyone else needs something like it, but it allows us to
transparently encrypt data directly in the tables, with minimal application
changes ('select enc_key' at connection) - the main requirement when
working on legacy code that needs to match today's security policies quickly.


Some want row-level access control; then your scheme would not be enough.

Maybe it would be better to avoid combining the keys; instead, have a
hidden key in the database and several user keys that grant access to that
key, so you can revoke access from only some users.

But one thing I suggest strongly: use PGP encryption instead
of the old encrypt()/decrypt().  PGP hides the data much better,
especially in the case of lots of small data encrypted with the same key.


--
marko

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgres database crashed

2006-10-17 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
 Ashish Goel wrote:
 The code works well if we don't maintain the referrences but when we
 include the referrences then the database crashes somewhere between
 2500-3000 transactions.

 So could this problem be due to the multiple referrences being made to
 the same table ?

 I doubt so.
 Foreign key references are among the basics of SQL, they're pretty well
 tested.

I'm betting that this is a memory-clobber problem in that custom datatype.
The reason the symptoms come and go when varying unrelated stuff is that
it might be chancing to clobber momentarily-unused memory rather than
live data structures.  (In the above example, the queue of pending FK
trigger events is likely what got clobbered.)

regards, tom lane



[HACKERS] hacking postgres hashjoin algorithm

2006-10-17 Thread HS

Hello there

I am trying to play around with the hashjoin algorithm in postgres.

I am using the statement like
Select count(*) from r,s where r.id=s.id;

I looked at the function ExecHashJoin() in nodeHashjoin.c and cannot 
find where the algorithm is comparing if r.id equals s.id


please advise.

Regards
hal



[HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR:  column y.ycis_id must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?



Re: [HACKERS] hacking postgres hashjoin algorithm

2006-10-17 Thread Martijn van Oosterhout
On Sun, Oct 15, 2006 at 11:08:18PM -0400, HS wrote:
 Hello there
 
 I am trying to play around with the hashjoin algorithm in postgres.
 
 I am using the statement like
 Select count(*) from r,s where r.id=s.id;
 
 I looked at the function ExecHashJoin() in nodeHashjoin.c and cannot 
 find where the algorithm is comparing if r.id equals s.id

The code doing the work is actually ExecScanHashBucket() which is in
nodeHash.c. The actual check is done by the ExecQual there...
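As a rough illustration, here is a toy Python sketch (not the PostgreSQL source) of where the equality test lives in a hash join: the build phase hashes the inner relation, and the join qual (r.id = s.id) is evaluated while probing a bucket, which is the step ExecScanHashBucket()/ExecQual perform in the real code:

```python
def hash_join(r, s):
    buckets = {}
    for rrow in r:                        # build phase over inner relation r
        buckets.setdefault(rrow["id"], []).append(rrow)
    out = []
    for srow in s:                        # probe phase over outer relation s
        for rrow in buckets.get(srow["id"], []):
            if rrow["id"] == srow["id"]:  # the join qual check itself
                out.append((rrow, srow))
    return out

r = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
s = [{"id": 2, "v": "x"}, {"id": 3, "v": "y"}]
print(hash_join(r, s))  # [({'id': 2, 'v': 'b'}, {'id': 2, 'v': 'x'})]
```

The explicit recheck inside the bucket matters because different ids can hash to the same bucket; hashing alone does not prove equality.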

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Stephen Frost
* Mark Woodward ([EMAIL PROTECTED]) wrote:
 If I am asking for a specific column value, should I, technically
 speaking, need to group by that column?

Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

Thanks,

Stephen




Re: [HACKERS] Additional stats for Relations

2006-10-17 Thread Simon Riggs
On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

 On 10/13/06, Jim C. Nasby [EMAIL PROTECTED] wrote:


 I'm also not sure if this metric is what you actually want,
 since a
 single page can be returned many times from the FSM even
 between 
 vacuums. Tracking how many pages for a relation have been put
 into the
 FSM might be more useful...
  
 Nikhils
 Pages might be put into the FSM, but by this metric don't we get the
 actual usage of the pages from the FSM? Agreed a single page can be
 returned multiple times, but since it serves a new tuple, shouldn't we
 track it?
 Nikhils

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

  heap_blks_extend: The number of times file extend was
 invoked on the
  relation

Sounds good

  heap_blks_truncate: The total number of blocks that have
 been truncated due 
  to vacuum activity e.g.

Sounds good

  As an addendum to the truncate stats above, we can also have
 the additional
  following stats:
 
  heap_blks_maxtruncate: The max block of buffers truncated in
 one go 
 
  heap_blks_ntruncate: The number of times truncate was called
 on this
  relation

Those last two sound too complex for normal use, and ntruncate is most
likely the same as the number of vacuums anyway. Hmmm... perhaps nvacuums
is a more interesting metric? We've got the last vacuum date, but no
indication of how frequently a vacuum has run.

 Do you have a use-case for this info? I can see where it might
 be neat 
 to know, but I'm not sure how you'd actually use it in the
 real world.
  
 Nikhils
 The use-case according to me is that these stats help prove the
 effectiveness of autovacuum/vacuum operations. By varying some autovac
 guc variables, and doing subsequent (pgbench e.g.) runs, one can find
 out the optimum values for these variables using these stats. 
 Nikhils

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread D'Arcy J.M. Cain
On Tue, 17 Oct 2006 12:08:07 -0400
Stephen Frost [EMAIL PROTECTED] wrote:
 * Mark Woodward ([EMAIL PROTECTED]) wrote:
  If I am asking for a specific column value, should I, technically
  speaking, need to group by that column?
 
 Technically speaking, if you're asking for a specific tuple, should you
 be allowed to request an aggregation?

One column value doesn't necessarily mean one tuple unless it has a
unique index on that column.

SELECT COUNT(*) FROM table WHERE field = 'value';

That's perfectly reasonable.  You don't need the GROUP BY clause.
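To make the distinction concrete, here is a sketch using Python's bundled sqlite3, with table and column names borrowed from the thread (note SQLite is laxer than PostgreSQL and would even accept the bare ycis_id column without GROUP BY; PostgreSQL raises the quoted error instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE y (ycis_id INTEGER, tindex INTEGER)")
conn.executemany("INSERT INTO y VALUES (?, ?)", [(15, 10), (15, 20), (16, 5)])

# 1. Leave the non-aggregated column out of the SELECT: no GROUP BY needed.
mn, av = conn.execute(
    "SELECT min(tindex), avg(tindex) FROM y WHERE ycis_id = 15").fetchone()
print(mn, av)  # 10 15.0

# 2. Keep ycis_id in the SELECT and satisfy PostgreSQL with a GROUP BY;
#    combined with the equality filter this still yields a single row.
rows = conn.execute(
    "SELECT ycis_id, min(tindex) FROM y WHERE ycis_id = 15 "
    "GROUP BY ycis_id").fetchall()
print(rows)  # [(15, 10)]
```

Either spelling is accepted by PostgreSQL; only the original form with the bare column and no GROUP BY is rejected.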

However, this doesn't sound like a hackers question.  Next time, please
ask on another list such as pgsql-sql or even pgsql-novice.  You can
review the mailing lists and their purpose at
http://www.postgresql.org/community/lists/

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [HACKERS] constraints in query plans

2006-10-17 Thread Jeff Davis
On Sun, 2006-10-15 at 20:36 -0700, Jeremy Drake wrote:
 I just tried that,
 CREATE INDEX test_domain_k_x1_x2_mp ON test_domain (k, x1, x2, mp);
 
 and dropped the others.  That actually works properly.
 jeremyd=# explain analyze select * from test_domain where k = 1255 and mp;
 QUERY PLAN
 --
  Bitmap Heap Scan on test_domain  (cost=5.37..237.21 rows=66 width=17) 
 (actual time=0.115..0.707 rows=132 loops=1)
Recheck Cond: (k = 1255)
Filter: mp
->  Bitmap Index Scan on test_domain_k_x1_x2_mp  (cost=0.00..5.37 rows=66 
 width=0) (actual time=0.081..0.081 rows=132 loops=1)
  Index Cond: ((k = 1255) AND (mp = true))
  Total runtime: 1.137 ms
 (6 rows)
 
 I thought I had to refer to all of the columns in order for this to work,
 that I could not skip some in the middle, but it seems to work.
 

As long as k=1255 is selective enough, the index is useful. That's
because k is the first item in the index key.

Regards,
Jeff Davis




Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Shane Ambler

Stephen Frost wrote:

* Mark Woodward ([EMAIL PROTECTED]) wrote:

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?


Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

Only with the assumption that the value in the where clause is for a 
unique column.

If you want min(col2) and avg(col2) where col1=x you can get it without 
a group by, the same as if you put col1<>x - if you want an aggregate of 
all records returned, not the aggregate based on each value of col1.




select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;


But back to the query: the issue comes in that the ycis_id value is 
included with the return values requested (a single row value with 
aggregate values that isn't grouped) - if ycis_id is not unique you will 
get x number of returned tuples with ycis_id=15 and the same min() and 
avg() values for each row.
Removing ycis_id from the select list will return the aggregate values 
you want without the group by.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [HACKERS] Asynchronous I/O Support

2006-10-17 Thread Merlin Moncure

On 10/15/06, Luke Lonergan [EMAIL PROTECTED] wrote:

Martijn,
The killer use-case we've identified is for the scattered I/O associated
with index + heap scans in Postgres.  If we can issue ~5-15 I/Os in advance
when the TIDs are widely separated it has the potential to increase the I/O
speed by the number of disks in the tablespace being scanned.  At this
point, that pattern will only use one disk.


did you have a chance to look at posix_fadvise?
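For what it's worth, the hint Merlin mentions is exposed in Python as os.posix_fadvise on platforms that provide it; a minimal sketch of issuing read-ahead advice for a set of scattered offsets (file name and parameters are illustrative only):

```python
import os
import tempfile

def prefetch(path, offsets, length=8192):
    """Advise the kernel that the given byte ranges will be needed soon,
    so it can schedule the scattered reads in advance. POSIX_FADV_WILLNEED
    is only a hint; the call is skipped where the platform lacks
    posix_fadvise (e.g. Windows)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            for off in offsets:
                os.posix_fadvise(fd, off, length, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)

# usage sketch against a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 65536)
prefetch(f.name, [0, 16384, 49152])
os.unlink(f.name)
```

Unlike true asynchronous I/O, this keeps the read calls themselves synchronous and merely lets the kernel overlap the disk fetches.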

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Stephen Frost wrote:

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 But back to the query the issue comes in that the ycis_id value is
 included with the return values requested (a single row value with
 aggregate values that isn't grouped) - if ycis_id is not unique you will
 get x number of returned tuples with ycis_id=15 and the same min() and
 avg() values for each row.
 Removing the ycis_id after the select will return the aggregate values
 you want without the group by.

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
group by, or a bug in the parser?



Re: [HACKERS] Asynchronous I/O Support

2006-10-17 Thread Florian Weimer
* Neil Conway:

 [1] http://lse.sourceforge.net/io/aio.html

Last Modified Mon, 07 Jun 2004 12:00:09 GMT

But you are right -- it seems that io_submit still blocks without
O_DIRECT. *sigh*

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Durlacher Allee 47tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Markus Schaber
Hi, Mark,

Mark Woodward wrote:
 Shouldn't this work?
 
 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
 
 ERROR:  column y.ycis_id must appear in the GROUP BY clause or be used
 in an aggregate function
 
 If I am asking for a specific column value, should I, technically
 speaking, need to group by that column?

Try:

SELECT 15 as ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org




Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Nolan Cafferky

Mark Woodward wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
  


I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
group by, or a bug in the parser?
  
I think your point is that every non-aggregate column in the results of 
the query also appears in the where clause and is given a single value 
there, so conceivably, an all-knowing, all-powerful postgres could 
recognize this and do the implied GROUP BY on these columns.


I'm not in a position to give a definitive answer on this, but I suspect 
that adjusting the query parser/planner to allow an implied GROUP BY 
either gets prohibitively complicated, or fits too much of a special 
case to be worth implementing. 


select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = 15
group by
some_other_id;


Here, postgres would have to use the group by you specified, and also 
recognize the single-valued constant assigned to ycis_id.  Maybe not too 
bad, but:


select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = some_single_valued_constant(foo, bar)
group by
some_other_id;

In this case, postgres doesn't know whether 
some_single_valued_constant() will really return the same single value 
for every tuple.  Ultimately, as more complex queries are introduced, it 
would become a lot simpler for the query writer to just specify the 
group by columns instead of trying to guess it from the where clause.


Final note: I could also see situations where an implied group by would 
silently allow a poorly written query to execute, instead of throwing an 
error that suggests to the query writer that they did something wrong.


--
Nolan Cafferky
Software Developer
IT Department
RBS Interactive
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Andrew Dunstan

Mark Woodward wrote:

Stephen Frost wrote:



select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
  

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.



I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
group by, or a bug in the parser?

  


AFAIK what you want is not per sql spec. What if you had instead written


 select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) = 15;


? I think you are expecting too much reasoning from the engine.

cheers

andrew




Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Markus Schaber
Hi, Mark,

Mark Woodward wrote:
 Stephen Frost wrote:

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
 But back to the query the issue comes in that the ycis_id value is
 included with the return values requested (a single row value with
 aggregate values that isn't grouped) - if ycis_id is not unique you will
 get x number of returned tuples with ycis_id=15 and the same min() and
 avg() values for each row.
 Removing the ycis_id after the select will return the aggregate values
 you want without the group by.
 
 I still assert that there will always only be one row to this query. This
 is an aggregate query, so all the rows with ycis_id = 15, will be
 aggregated. Since ycis_id is the identifying part of the query, it should
 not need to be grouped.
 
 My question, is it a syntactic technicality that PostgreSQL asks for a
 group by, or a bug in the parser?

I think that it's a lack of special-casing the = operator. Imagine
where ycis_id < 15 or where ycis_id @| $RECTANGLE or other (probably
user defined) operators on (probably user defined) datatypes.

The parser has no real knowledge of what the operators do; it simply
requests one that returns a bool.

One could make the parser special-case the = operator, and maybe some
others, but I doubt it's worth the effort.


HTH,
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org





Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Hi, Mark,

 Mark Woodward wrote:
 Stephen Frost wrote:

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
 But back to the query the issue comes in that the ycis_id value is
 included with the return values requested (a single row value with
 aggregate values that isn't grouped) - if ycis_id is not unique you
 will
 get x number of returned tuples with ycis_id=15 and the same min() and
 avg() values for each row.
 Removing the ycis_id after the select will return the aggregate values
 you want without the group by.

 I still assert that there will always only be one row to this query.
 This
 is an aggregate query, so all the rows with ycis_id = 15, will be
 aggregated. Since ycis_id is the identifying part of the query, it
 should
 not need to be grouped.

 My question, is it a syntactic technicality that PostgreSQL asks for a
 group by, or a bug in the parser?

 I think that it's a lack of special-casing the = operator. Imagine
 where ycis_id < 15 or where ycis_id @| $RECTANGLE or other (probably
 user defined) operators on (probably user defined) datatypes.

 The parser has no real knowledge of what the operators do; it simply
 requests one that returns a bool.

 One could make the parser special-case the = operator, and maybe some
 others, but I doubt it's worth the effort.

I understand the SQL, and this isn't a SQL question, else it would be on
a different list; it is a PostgreSQL internals question and, IMHO, a
potential bug.

The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

Should NOT require a group by to get ycis_id in the results.







Re: [HACKERS] Asynchronous I/O Support

2006-10-17 Thread Raja Agrawal

Have a look at this:
[2] http://www-128.ibm.com/developerworks/linux/library/l-async/

This gives a good description of AIO.

I'm doing some testing. Will notify, if I get any positive results.

Please let me know, if you get any ideas after reading [2].

Regards,
Raja

On 10/17/06, Florian Weimer [EMAIL PROTECTED] wrote:

* Neil Conway:

 [1] http://lse.sourceforge.net/io/aio.html

Last Modified Mon, 07 Jun 2004 12:00:09 GMT

But you are right -- it seems that io_submit still blocks without
O_DIRECT. *sigh*

--
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Durlacher Allee 47tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99





Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Hi, Mark,

 Mark Woodward wrote:
 Shouldn't this work?

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 ERROR:  column y.ycis_id must appear in the GROUP BY clause or be used
 in an aggregate function

 If I am asking for a specific column value, should I, technically
 speaking, need to group by that column?

 Try:

 SELECT 15 as ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;


This isn't a SQL question!!! This is a question of whether or not
PostgreSQL is correct in requiring a group by in the query. I assert
that since it is unambiguous as to what ycis_id should be, PostgreSQL
should not require a grouping.



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Mark Woodward wrote:
 Stephen Frost wrote:


 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 But back to the query the issue comes in that the ycis_id value is
 included with the return values requested (a single row value with
 aggregate values that isn't grouped) - if ycis_id is not unique you
 will
 get x number of returned tuples with ycis_id=15 and the same min() and
 avg() values for each row.
 Removing the ycis_id after the select will return the aggregate values
 you want without the group by.


 I still assert that there will always only be one row to this query.
 This
 is an aggregate query, so all the rows with ycis_id = 15, will be
 aggregated. Since ycis_id is the identifying part of the query, it
 should
 not need to be grouped.

 My question, is it a syntactic technicality that PostgreSQL asks for a
 group by, or a bug in the parser?



 AFAIK what you want is not per sql spec. What if you had instead written


   select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id)
 = 15;


 ? I think you are expecting too much reasoning from the engine.

Regardless, I can get the results I need and have already worked around
this. The reason why I posted the question to hackers was that I think it
is a bug.

The output column ycis_id is unambiguously a single value with regards to
the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used this
exact type of query before, either on PostgreSQL or another system, maybe
Oracle, and it did work.



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Mark Woodward wrote:
 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;


 I still assert that there will always only be one row to this query.
 This
 is an aggregate query, so all the rows with ycis_id = 15, will be
 aggregated. Since ycis_id is the identifying part of the query, it
 should
 not need to be grouped.

 My question, is it a syntactic technicality that PostgreSQL asks for a
 group by, or a bug in the parser?

 I think your point is that every non-aggregate column in the results of
 the query also appears in the where clause and is given a single value
 there, so conceivably, an all-knowing, all-powerful postgres could
 recognize this and do the implied GROUP by on these columns.

Not exactly.

 I'm not in a position to give a definitive answer on this, but I suspect
 that adjusting the query parser/planner to allow an implied GROUP BY
 either gets prohibitively complicated, or fits too much of a special
 case to be worth implementing.

 select
   ycis_id,
   some_other_id,
   min(tindex),
   avg(tindex)
   from
   y
   where
   ycis_id = 15
   group by
   some_other_id;

This is not, in fact, like the example I gave and confuses the point I am
trying to make.


The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ycis_id is unambiguous and MUST be only one value, so there should be no
requirement of grouping. In fact, a group by implies multiple result
rows in an aggregate query.

As I said in other branches of this thread, this isn't a SQL question, it
is a question of whether or not the PostgreSQL parser is correct or not,
and I do not believe that it is working correctly.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Andrew Dunstan

Mark Woodward wrote:

Hi, Mark,

Mark Woodward wrote:


Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR:  column y.ycis_id must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?
  

Try:

SELECT 15 as ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;




This isn't a SQL question!!! This is a question of whether or not
PostgreSQL is correct in requiring a group by in the query. I assert
that since it is unambiguous as to what ycis_id should be, PostgreSQL
should not require a grouping.

  


Of course it's an SQL question. How can you ask about the correctness of 
a piece of text which purports to be SQL and then say it isn't an SQL 
question?


If you can point to a place in the spec or our docs that sanctions the 
usage you expect, then please do so. Until then I (and I suspect 
everyone else) will persist in saying it's not a bug.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Shane Ambler

Mark Woodward wrote:

Stephen Frost wrote:


select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.


I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.


SELECT ycis_id FROM table WHERE ycis_id=15; returns a single tuple when 
ycis_id is unique, otherwise multiple tuples. This means that SELECT 
ycis_id is technically defined as returning multiple rows even when 
ycis_id is unique - the data in each tuple returned comes directly from 
one table row.


SELECT max(col2) FROM table WHERE ycis_id=15; returns an aggregate tuple

SELECT ycis_id FROM table WHERE ycis_id=15 GROUP BY ycis_id; returns an 
aggregate tuple (aggregated with the GROUP BY clause making the ycis_id 
after the SELECT an aggregate as well)


You can't have both a single tuple and an aggregate tuple returned in 
the one statement. If you want the column value of ycis_id in the 
results you need the group by to unify all returned results as being 
aggregates.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Peter Eisentraut
Mark Woodward wrote:
 Shouldn't this work?

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 ERROR:  column y.ycis_id must appear in the GROUP BY clause or be
 used in an aggregate function

This would require a great deal of special-casing, in particular 
knowledge of the = operator, and then the restriction to a particular 
form of the WHERE clause.  For overall consistency, I don't think this 
should be allowed.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Chris Campbell

On Oct 17, 2006, at 15:19, Peter Eisentraut wrote:


Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR:  column y.ycis_id must appear in the GROUP BY clause or be
used in an aggregate function


This would require a great deal of special-casing, in particular
knowledge of the = operator, and then the restriction to a particular
form of the WHERE clause.  For overall consistency, I don't think this
should be allowed.


In this particular case, the client constructing the query *knows*  
the value of ycis_id (since the client is generating the ycis_id =  
15 clause). It's technically just a waste of bandwidth and server  
resources to recalculate it. If you really want to replicate the  
output of the query you proposed, you could rewrite it on the client as:


select 15 as ycis_id, min(tindex), avg(tindex) from y where  
ycis_id = 15;


You could argue that the server should do this for you, but it seems  
ugly to do in the general case. And, like Peter points out, would  
need a lot of special-casing. I guess the parser could do it for  
expressions in the SELECT clause that exactly match expressions in  
the WHERE clause.


Thanks!

- Chris


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote:

 The output column ycis_id is unambiguously a single value with regards
 to
 the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used
 this
 exact type of query before either on PostgreSQL or another system, maybe
 Oracle, and it did work.

 Doesn't work in Oracle 10g:

 SELECT ycis_id, tindex from x where ycis_id = 15;
 YCIS_ID  TINDEX
 ===  ==
  15  10
  15  20

 SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
 ORA-00937: not a single-group group function

 SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP
 BY ycis_id;
 YCIS_ID  MIN(TINDEX)  AVG(TINDEX)
 ===  ===  ===
  15   10   15


That's interesting. I am digging through the SQL99 spec, and am trying to
find a definitive answer.





Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 Mark Woodward wrote:
 Shouldn't this work?

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 ERROR:  column y.ycis_id must appear in the GROUP BY clause or be
 used in an aggregate function

 This would require a great deal of special-casing, in particular
 knowledge of the = operator, and then the restriction to a particular
 form of the WHERE clause.  For overall consistency, I don't think this
 should be allowed.


Well, this started out as a "huh, that's funny, that should work, is that
a bug?" and is turning into a search through the SQL99 spec for a clear
answer. I've already worked around it, but to me, at least, it seems it
should work.



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Mark Woodward
 On Oct 17, 2006, at 15:19, Peter Eisentraut wrote:

 Mark Woodward wrote:
 Shouldn't this work?

 select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

 ERROR:  column y.ycis_id must appear in the GROUP BY clause or be
 used in an aggregate function

 This would require a great deal of special-casing, in particular
 knowledge of the = operator, and then the restriction to a particular
 form of the WHERE clause.  For overall consistency, I don't think this
 should be allowed.

 In this particular case, the client constructing the query *knows*
 the value of ycis_id (since the client is generating the ycis_id =
 15 clause). It's technically just a waste of bandwidth and server
 resources to recalculate it. If you really want to replicate the
 output of the query you proposed, you could rewrite it on the client as:

  select 15 as ycis_id, min(tindex), avg(tindex) from y where
 ycis_id = 15;

 You could argue that the server should do this for you, but it seems
 ugly to do in the general case. And, like Peter points out, would
 need a lot of special-casing. I guess the parser could do it for
 expressions in the SELECT clause that exactly match expressions in
 the WHERE clause.


But, and here's the rub, which is the correct way to handle it? I'm
looking through the SQL99 spec to see if I can find an answer.



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Joe Sunday
On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote:

 The output column ycis_id is unambiguously a single value with regards to
 the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used this
 exact type of query before either on PostgreSQL or another system, maybe
 Oracle, and it did work.

Doesn't work in Oracle 10g:

SELECT ycis_id, tindex from x where ycis_id = 15;
YCIS_ID  TINDEX
===  ==
 15  10
 15  20

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
ORA-00937: not a single-group group function

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP BY 
ycis_id;
YCIS_ID  MIN(TINDEX)  AVG(TINDEX)
===  ===  ===
 15   10   15

--Joe

-- 
Joe Sunday [EMAIL PROTECTED]  http://www.csh.rit.edu/~sunday/
Computer Science House, Rochester Inst. Of Technology



Re: [HACKERS] [GENERAL] query log corrupted-looking entries

2006-10-17 Thread Tom Lane
George Pavlov [EMAIL PROTECTED] writes:
 Hmm.  If the messages are less than PIPE_BUF bytes long (4096 bytes on
 Linux) then the writes are supposed to be atomic.

 Some of them involve long messages (>4K), but there are many that do not
 (like the ones I had posted at the start of this thread). 

I checked around with some kernel/glibc gurus in Red Hat, and the
consensus seemed to be that we'd be better off to bypass fprintf() and
just send message strings to stderr using write() --- ie, instead of
elog.c doing

fprintf(stderr, "%s", buf.data);

do

write(fileno(stderr), buf.data, strlen(buf.data));

Anyone have any comments on possible portability risks?  In particular,
will this work on Windows?

regards, tom lane
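The distinction Tom is drawing can be illustrated outside of elog.c: a single write(2) call on a pipe is atomic for payloads up to PIPE_BUF bytes, while stdio's fprintf() may split a message across several underlying writes. A small sketch via Python's os.write (an illustration of the syscall-level behavior, not the actual PostgreSQL patch):

```python
import os

# Sketch of the point above: a single write(2) call on a pipe is atomic
# for payloads up to PIPE_BUF bytes (4096 on Linux), so a whole log line
# emitted this way cannot be interleaved mid-message with writes from
# another process, unlike output that trickles through stdio buffering.
r, w = os.pipe()
msg = b"LOG:  statement: SELECT 1\n"

n = os.write(w, msg)        # one syscall, no stdio layer in between
os.close(w)
data = os.read(r, 4096)
os.close(r)
print(data)  # the complete message, exactly as written
```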

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Martijn van Oosterhout
On Tue, Oct 17, 2006 at 04:45:49PM -0400, Mark Woodward wrote:
 Well, this started out as a "huh, that's funny, that should work, is that
 a bug?" and is turning into a search through the SQL99 spec for a clear
 answer. I've already worked around it, but to me, at least, it seems it
 should work.

What you're asking for is difficult, not done by anyone else (so far
demonstrated) and not mandated by the spec, so I don't see how it could
be construed a bug.

As for the spec, this is what I have from SQL2003:

7.12.15) If T is a grouped table, then let G be the set of grouping
  columns of T. In each value expression contained in select list,
  each column reference that references a column of T shall reference
  some column C that is functionally dependent on G or shall be
  contained in an aggregated argument of a set function specification
  whose aggregation query is QS.

Which to me says that everything in the output is either grouped by or
part of an aggregate. That together with a statement elsewhere saying
that if no group by clause is present, GROUP BY () is implied seems to
seal it for me.

(BTW, the functionally dependent is new and postgresql only supports
the older SQL standards where C has to actually be a grouping column).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Mark Woodward wrote:
 My question, is it a syntactic technicality that PostgreSQL asks for a
 group by, or a bug in the parser?

 AFAIK what you want is not per sql spec.

It would in fact be a violation of spec.  Consider the case where there
are no rows matching 15.  In this case

select min(tindex), avg(tindex) from y where ycis_id = 15;

will yield one row containing NULLs, whereas

select min(tindex), avg(tindex) from y where ycis_id = 15 group by ycis_id;

will yield no rows (because there are no groups).  Therefore, if
Postgres were to implicitly insert a GROUP BY to make it legal to
reference ycis_id directly, we'd be changing the query behavior
and breaking spec.

regards, tom lane
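The same aggregate semantics Tom describes hold in SQLite, so the NULL-row-versus-no-rows difference is easy to reproduce from Python's bundled sqlite3 module (an illustration of the spec behavior, not PostgreSQL itself; table and column names taken from the thread):

```python
import sqlite3

# With no matching rows, a plain aggregate query yields exactly one row
# of NULLs, while the GROUP BY form yields no rows at all -- so an
# implicit GROUP BY would silently change the query's result.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE y (ycis_id INTEGER, tindex INTEGER)")

no_group = con.execute(
    "SELECT min(tindex), avg(tindex) FROM y WHERE ycis_id = 15").fetchall()
grouped = con.execute(
    "SELECT min(tindex), avg(tindex) FROM y WHERE ycis_id = 15 "
    "GROUP BY ycis_id").fetchall()

print(no_group)  # [(None, None)] -- one row containing NULLs
print(grouped)   # []             -- no groups, so no rows
```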



Re: [HACKERS] Syntax bug? Group by?

2006-10-17 Thread Shane Ambler

Joe Sunday wrote:

On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote:


The output column ycis_id is unambiguously a single value with regards to
the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used this
exact type of query before either on PostgreSQL or another system, maybe
Oracle, and it did work.


Doesn't work in Oracle 10g:

SELECT ycis_id, tindex from x where ycis_id = 15;
YCIS_ID  TINDEX
===  ==
 15  10
 15  20

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
ORA-00937: not a single-group group function

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP BY 
ycis_id;
YCIS_ID  MIN(TINDEX)  AVG(TINDEX)
===  ===  ===
 15   10   15

--Joe



MySQL reports:
"Mixing of GROUP columns (MIN(),MAX(),COUNT()...) with no GROUP columns 
is illegal if there is no GROUP BY clause"



I found one that actually returns the desired result - SQLite3.

sqlite> select * from test;
15|20
15|10
sqlite> select ycis_id,min(tindex),avg(tindex) from test where ycis_id=15;
15|10|15
sqlite>
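For what it's worth, Shane's SQLite result reproduces directly from Python's bundled sqlite3 module (same toy data as his session):

```python
import sqlite3

# SQLite accepts the bare ycis_id alongside aggregates without a GROUP BY.
# Since every matching row has ycis_id = 15, the bare column comes back
# as 15 regardless of which row SQLite takes it from.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (ycis_id INTEGER, tindex INTEGER)")
con.executemany("INSERT INTO test VALUES (?, ?)", [(15, 20), (15, 10)])

row = con.execute(
    "SELECT ycis_id, min(tindex), avg(tindex) FROM test "
    "WHERE ycis_id = 15").fetchone()
print(row)  # (15, 10, 15.0)
```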


--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [HACKERS] Postgresql Caching

2006-10-17 Thread Anon Mous
Hi

I've loved reading all of your thoughts and comments. Yet, I'm left with
the question: can we brainstorm a caching solution that is workable?

I've seen some posts talking about some of the challenges.

1.) Only good for static data

As it was proposed, that is largely true. This doesn't mean, however,
that the idea as stated isn't worth pursuing, because I think you'll
find most applications have a great deal of "static enough" data to
benefit greatly from this type of caching. However, I think some simple
changes to the idea may make it useful for busy tables. These changes
would probably require direct communication between the caching
controller and the postmaster.

a.) Rather than table locking, track changes at the row level.
b.) Rather than requiring a complete reseeding of a table after an
update, just invalidate or repopulate the affected rows.
c.) Rather than destroying popular query results, try to update them if
possible. For example, it's easy to remove one entry from the cache if
that row was just deleted. (It's probably cheaper to rerun the query on
just the few changed rows than to rerun the whole query.)
d.) Any other ideas?

2.) If any caching were possible, we'd already be doing it.

I don't think this statement will stand the test of time! Agreed,
caching quickly becomes a head-hurting, complex topic, but the benefits
are well worth the effort!

3.) Consistency and memcached *are* mutually exclusive. Memcached
provides no mechanisms for consistency.

"You can never have perfect consistency across different systems
(memcache / postgresql), and especially not when their visibility rules
differ. What is visible to something via memcache is always latest
uncommitted. What is visible in PostgreSQL is something less than that.
Consistency is not possible. Correct caching is therefore also not
possible unless you define correct as 'latest', and even then, you have
problems if memcache expires the record before the real record has been
committed into PostgreSQL."

I completely agree. I'm not talking about synchronizing memcached data
to be consistent; I'm saying let's create a caching layer that works
something like memcache and preserves consistency!

And, very related to this,

4.) Memcached caching is exactly opposite to Postgres consistency.
Specifically:
- Memcache is serialized
- Memcache can lose data at any time
- Memcache has only 2 fields
- Memcache has no synchronization

Postgres needs consistency. Memcache doesn't do any synchronization, and
that means consistency is impossible. However, a special version of
memcache that is embedded into the postgresql system or API, and that
does talk with the postmaster, could be able to provide guaranteed
consistency?

5.) This idea won't save any time with SQL parsing.

I believe it can... because, as memcache has illustrated, you can avoid
any SQL parsing by using the SQL text and the user's table permissions
(if different users are enabled) as the key to the cached data.

6.) Postgresql is consistency. If an application needs speed, let the
application figure out how to cache the data.

I appreciate that Postgres is all about data consistency. Actually,
that's why I'm here and not on the MySQL board... However, I believe
that we can provide caching without losing consistency, and developers
will love the extra speed. If we do implement the caching once, everyone
will be able to use it without complicating their end application!!!
(Read: It will help the world and make PostgreSQL very popular!)

---

So, would it work to create a caching pre-processor for Postgresql that
would work serially on every request and pass all uncached queries to
the database?

- If it's a cache hit, and the data is currently available and active,
pass the data back.
- If it's a miss, pass the query along to the database, and populate the
cache with the results.
- If the query changes data, invalidate the cached queries that touch
any table rows that could be affected. After the update, re-enable the
unaffected table rows, and repopulate the cache with the updates.
- Cached queries using an affected table would normally be deleted,
except in special simple cases that could be updated.

A related interesting thought... It might help if the cache stored the
data in separate table rows, rather than in whatever format the query
requested, the way memcached does.

- Each cached table row could be stored as a separate entity.
- Complicated joins, rather than caching all of the data in whatever
organization the user specified, would instead store a matrix of
pointers to the exact table fields in most cases.

Will it work? Am I missing anything?

Thanks

Daniel
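The serial pre-processor idea can be caricatured in a few lines. This toy sketch keys cached results on the SQL text (the "no reparsing" point above) and invalidates at table granularity rather than the row granularity the post proposes; the class and method names are invented for illustration:

```python
import sqlite3

# Toy query-result cache: results keyed by SQL text; any write invalidates
# every cached query that reads the written table. This is table-level
# invalidation, coarser than the row-level scheme proposed in the post.
class QueryCache:
    def __init__(self, con):
        self.con = con
        self.results = {}   # sql text -> cached rows
        self.reads = {}     # sql text -> set of tables the query reads

    def select(self, sql, tables):
        if sql not in self.results:        # miss: run it, remember rows
            self.results[sql] = self.con.execute(sql).fetchall()
            self.reads[sql] = set(tables)
        return self.results[sql]           # hit: no parse, no plan

    def write(self, sql, table, params=()):
        self.con.execute(sql, params)
        stale = [q for q, t in self.reads.items() if table in t]
        for q in stale:                    # drop queries touching the table
            del self.results[q], self.reads[q]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
cache = QueryCache(con)
cache.write("INSERT INTO t VALUES (?)", "t", (1,))
print(cache.select("SELECT sum(x) FROM t", ["t"]))  # [(1,)]
cache.write("INSERT INTO t VALUES (?)", "t", (2,))  # invalidates the cache
print(cache.select("SELECT sum(x) FROM t", ["t"]))  # [(3,)]
```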

Re: [HACKERS] Postgresql Caching

2006-10-17 Thread Jeff Davis
On Tue, 2006-10-17 at 16:51 -0700, Anon Mous wrote:
 Hi
 
 I've loved reading all of your thoughts and comments.
 
 Yet, I'm left with the question:
 
Can we can brainstorm a caching solution that is workable...
 

I think you're making this a little complicated.

A lot of these problems can be solved with something like materialized
views (triggers updating another smaller relation), or they are already
solved by the lower layers of caching (like PostgreSQL's shared buffers,
or the OS buffer cache).

If the application is executing many queries that are exactly the same,
it would be a good idea to look at something like pgpool-II's query
cache. The only way to gain a benefit on querying results that are
already in memory is to avoid the query processing.

Regards,
Jeff Davis
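The "materialized views via triggers" pattern Jeff mentions looks roughly like this. SQLite is used purely for illustration (in PostgreSQL the same thing is written with PL/pgSQL triggers), and the table and column names are invented:

```python
import sqlite3

# Triggers keep a small summary relation current as the base table
# changes, so the expensive aggregate never has to be recomputed at
# query time -- the essence of a trigger-maintained materialized view.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
CREATE TABLE sales_totals (region TEXT PRIMARY KEY, total INTEGER);

CREATE TRIGGER sales_ai AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO sales_totals VALUES (NEW.region, 0);
    UPDATE sales_totals SET total = total + NEW.amount
     WHERE region = NEW.region;
END;
""")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("us", 5), ("us", 7), ("eu", 3)])

totals = con.execute(
    "SELECT region, total FROM sales_totals ORDER BY region").fetchall()
print(totals)  # [('eu', 3), ('us', 12)]
```

A real deployment would also need DELETE/UPDATE triggers to keep the summary correct under all write paths.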




[HACKERS] 8.1.4 verified on Intel Mac OS 10.4.8

2006-10-17 Thread Douglas Toltzman
The subject line says it all. I just completed a build, test, and
install of PostgreSQL 8.1.4 on an Intel Mac with OS 10.4.8. Every single
step executed flawlessly, and all 98 tests passed running the regression
tests.

I've got a suggestion for the documentation (INSTALL file). I may be the
only person in the world who does everything backwards, but my first
attempt at regression tests failed because I had already started the
postmaster. I know that's pretty stupid, but I had to scratch my head
for a few minutes when it said initdb failed and left me no clue as to
why. You might add a note in with the instructions for regression
testing to shut down the server before running the tests. It does say,
after all, that you can run them any time!

Douglas Toltzman
[EMAIL PROTECTED]
(910) 526-5938

p.s. I noticed I am a build behind, but I would think that 8.1.5 would
give similar results.

[HACKERS] pg_internal.init is hazardous to your health

2006-10-17 Thread Tom Lane
Dirk Lutzebaeck and I just spent a tense couple of hours trying to
figure out why a large database Down Under wasn't coming up after being
reloaded from a base backup plus PITR recovery.  The symptoms were that
the recovery went fine, but backend processes would fail at startup or
soon after with "could not open relation XX/XX/XX: No such file" type of
errors.

The answer that ultimately emerged was that they'd been running a
nightly maintenance script that did REINDEX SYSTEM (among other things
I suppose).  The PITR base backup included pg_internal.init files that
were appropriate when it was taken, and the PITR recovery process did
nothing whatsoever to update 'em :-(.  So incoming backends picked up
init files with obsolete relfilenode values.

We don't actually need to *update* the file, per se, we only need to
remove it if no longer valid --- the next incoming backend will rebuild
it.  I could see fixing this by making WAL recovery run around and zap
all the .init files (only problem is to find 'em), or we could add a new
kind of WAL record saying "remove the .init file for database XYZ"
to be emitted whenever someone removes the active one.  Thoughts?

Meanwhile, if you're trying to recover from a PITR backup and it's not
working, try removing any pg_internal.init files you can find.

regards, tom lane
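Tom's manual workaround can be sketched as a small script; the helper name and the example path are invented for illustration, and it should of course be run with the server stopped:

```python
import os

# Hedged sketch of the workaround above: walk the cluster data directory
# and remove every pg_internal.init, so the next incoming backend rebuilds
# its relcache init file from the (now correct) system catalogs.
def zap_relcache_init_files(datadir):
    removed = []
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if name == "pg_internal.init":
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)

# e.g. zap_relcache_init_files("/var/lib/pgsql/data")  # placeholder path
```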



Re: [HACKERS] pg_internal.init is hazardous to your health

2006-10-17 Thread Gavin Sherry
On Tue, 17 Oct 2006, Tom Lane wrote:

 Dirk Lutzebaeck and I just spent a tense couple of hours trying to
 figure out why a large database Down Under wasn't coming up after being
 reloaded from a base backup plus PITR recovery.  The symptoms were that
 the recovery went fine, but backend processes would fail at startup or
 soon after with "could not open relation XX/XX/XX: No such file" type of
 errors.

 The answer that ultimately emerged was that they'd been running a
 nightly maintenance script that did REINDEX SYSTEM (among other things
 I suppose).  The PITR base backup included pg_internal.init files that
 were appropriate when it was taken, and the PITR recovery process did
 nothing whatsoever to update 'em :-(.  So incoming backends picked up
 init files with obsolete relfilenode values.

Ouch.

 We don't actually need to *update* the file, per se, we only need to
 remove it if no longer valid --- the next incoming backend will rebuild
 it.  I could see fixing this by making WAL recovery run around and zap
 all the .init files (only problem is to find 'em), or we could add a new
 kind of WAL record saying remove the .init file for database XYZ
 to be emitted whenever someone removes the active one.  Thoughts?

The latter seems the Right Way except, I guess, that the decision to
remove the file is buried deep inside inval.c.

Thanks,

Gavin



Re: [HACKERS] [PERFORM] Hints proposal

2006-10-17 Thread Robert Treat
On Friday 13 October 2006 12:46, Gregory Stark wrote:
 Josh Berkus josh@agliodbs.com writes:
   I actually think the way to attack this issue is to discuss the kinds
   of errors the planner makes, and what tweaks we could do to correct
   them. Here's the ones I'm aware of:
  
   -- Incorrect selectivity of WHERE clause
   -- Incorrect selectivity of JOIN
   -- Wrong estimate of rows returned from SRF
   -- Incorrect cost estimate for index use
  
   Can you think of any others?

  -- Incorrect estimate for result of DISTINCT or GROUP BY.

Yeah, that one is bad.  I also ran into one the other day where the planner 
did not seem to understand the distinctness of a columns values across table 
partitions... 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] [GENERAL] Anyone using POSIX time zone offset capability?

2006-10-17 Thread James Cloos
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom The weird thing about this allegedly-POSIX notation is the combination
Tom of a symbolic name and a further offset from it.

AIUI, it is not a further offset but rather (mostly-)redundant data
specifying the exact offset from UTC¹ the text tz specifies.  Having
both provides easy parsing both for humans (the text) and for code
(the number).

-JimC

[1] Of course POSIX time is not really offset from UTC, since POSIX
pretends there have been no leap seconds since 1970.  As such
the timestamps are technically ambiguous as to whether they specify
real UTC-based time or POSIX time   (Currently there is a
23-second difference between the two.)

-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 0xED7DAEA6



Re: [HACKERS] Additional stats for Relations

2006-10-17 Thread NikhilS
Hi,

So: heap_blks_reused (with Jim's semantics), heap_blks_extend, and
heap_blks_truncate are the interesting stats? Will try to work up a
patch for this.

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

On 10/15/06, Simon Riggs [EMAIL PROTECTED] wrote:
 On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:
  On 10/13/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
   I'm also not sure if this metric is what you actually want, since a
   single page can be returned many times from the FSM even between
   vacuums. Tracking how many pages for a relation have been put into
   the FSM might be more useful...

  Nikhils: Pages might be put into the FSM, but by this metric don't we
  get the actual usage of the pages from the FSM? Agreed a single page
  can be returned multiple times, but since it serves a new tuple,
  shouldn't we track it?

 This makes sense for indexes, but only makes sense for heaps when we
 know that the backend will keep re-accessing the block until it is full
 - so only of interest in steady-state workloads. IMHO Jim's proposal
 makes more sense for general use.

  heap_blks_extend: The number of times file extend was invoked on the
  relation

 Sounds good

  heap_blks_truncate: The total number of blocks that have been
  truncated due to vacuum activity e.g.

 Sounds good

  As an addendum to the truncate stats above, we can also have the
  additional following stats:
  heap_blks_maxtruncate: The max block of buffers truncated in one go
  heap_blks_ntruncate: The number of times truncate was called on this
  relation

 Those last 2 sound too complex for normal use and ntruncate is most
 likely the same as number of vacuums anyway. Hmmm... Perhaps nvacuums
 is a more interesting metric? We've got last vacuum date, but no
 indication of how frequently a vacuum has run.

   Do you have a use-case for this info? I can see where it might be
   neat to know, but I'm not sure how you'd actually use it in the
   real world.

  Nikhils: The use-case according to me is that these stats help prove
  the effectiveness of autovacuum/vacuum operations. By varying some
  autovac guc variables, and doing subsequent (pgbench e.g.) runs, one
  can find out the optimum values for these variables using these
  stats.

 This should be useful for tuning space allocation/deallocation. If we
 get this patch in early it should help get feedback on this area.

 --
 Simon Riggs
 EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.