Re: [HACKERS] xlog viewer proposal

2006-06-25 Thread Diogo Biazus
Alright, I'm working on a fast prototype using the SRF.On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
On Fri, 2006-06-23 at 10:59 -0300, Diogo Biazus wrote: On 6/23/06, Simon Riggs 
[EMAIL PROTECTED] wrote:  - give more flexibility for managing the xlogs remotely Not sure what you mean.  - I think it's faster to implement and to have a working and
 usable  tool. Why do you think that? It sounds like you've got more work since you effectively need to rewrite the _desc routines.
 Yes, but I don't need to worry with program output, and I have the backend's memory management and error handling.I'd suggest doing a quick prototype to allow us to evaluate whicharchitecture would be preferable.
I'm torn between the good-idea and the safe-minimal-but-definitely in8.2 option.--Simon RiggsEnterpriseDB http://www.enterprisedb.com
-- Diogo Biazus - [EMAIL PROTECTED]Móvel Consultoriahttp://www.movelinfo.com.br
http://www.postgresql.org.br


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Thu, 2006-06-22 at 14:57 -0300, Diogo Biazus wrote:
 Agree, the project must choose one path as the starting point. But the
 two options can be given in the long run.

I'm acting as Diogo's mentor for the SoC, so I'm trying to let Diogo
discuss his ideas in the community manner without too much steering.

Diogo's ideas are interesting - they aren't the way I would have done it
either, but that doesn't mean we shouldn't consider this alternative
approach.

 I still think that as a starting point the functions inside the
 database are a good option.

Yes, if we use SRF functions for this, ISTM they are the best place for
them. 

 The reasons are: 
 - using SQL to agregate and transform data in any way from the logs.

That is a major point here. If the xlogdump is purely a stand-alone
program that it will be much less functionally rich and as Tom mentions,
there are other reasons for having access to a server.

 - it's easier for the DBA in the other use cases where the cluster is
 still active. 

Good point.

 - give more flexibility for managing the xlogs remotely

Not sure what you mean.

 - I think it's faster to implement and to have a working and usable
 tool.

Why do you think that? It sounds like you've got more work since you
effectively need to rewrite the _desc routines.

 And there is one option to minimize the problem in the failed cluster
 case: the wrapper program could give the option to initdb a temporary
 area when no connection is given, creating a backend just to analyze a
 set of xlogs. 

It seems a reasonable assumption that someone reading PostgreSQL logs
would have access to another PostgreSQL cluster. It obviously needs to
work when the server that originated the logs is unavailable, but that
does not mean that all PostgreSQL systems are unavailable. There's no
need to try to wrap initdb - just note that people would have to have
access to a PostgreSQL system.

 Other option is to start by the standalone tool and create a wrapper
 function inside postgresql that would just call this external program
 and extract data from the xlogs using this program's output (with some
 option to output all data in a CSV format). 

I think this idea is a good one, but we must also consider whether is
can be done effectively within the time available. Is this: can do now
or want to do in future?

The alternative of reinforcing xlogdump needs to be considered more
fully now and quickly, so coding can begin as soon as possible. 
- Diogo: what additional things can you make xlogdump do?
- Tom: can you say more about what you'd like to see from a tool, to
help Diogo determine the best way forward. What value can he add if you
have already written the tool?


Some other considerations:
The biggest difficulty is finding loser transactions - ones that have
not yet committed by the end of the log. You need to do this in both
cases if you want to allow transaction state to be determined precisely
for 100% of transactions; otherwise you might have to have an Unknown
transaction state in addition to the others.

What nobody has mentioned is that connecting to a db to lookup table
names from OIDs is only possible if that db knows about the set of
tables the log files refer to. How would we be certain that the
OID-to-tablename match would be a reliable one?

--
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Diogo Biazus
On 6/23/06, Simon Riggs [EMAIL PROTECTED]
 wrote:
 - give more flexibility for managing the xlogs remotelyNot sure what you mean. - I think it's faster to implement and to have a working and usable tool.Why do you think that? It sounds like you've got more work since you
effectively need to rewrite the _desc routines.Yes, but I don't need to worry with program output, and I have the backend's memory management and error handling.

 And there is one option to minimize the problem in the failed cluster case: the wrapper program could give the option to initdb a temporary area when no connection is given, creating a backend just to analyze a
 set of xlogs.It seems a reasonable assumption that someone reading PostgreSQL logswould have access to another PostgreSQL cluster. It obviously needs towork when the server that originated the logs is unavailable, but that
does not mean that all PostgreSQL systems are unavailable. There's noneed to try to wrap initdb - just note that people would have to haveaccess to a PostgreSQL system.Yes, that's what I tought, wrap the initdb isn't needed but would make things easier for a newbie. 
 Other option is to start by the standalone tool and create a wrapper
 function inside postgresql that would just call this external program
 and extract data from the xlogs using this program's output (with some option to output all data in a CSV format).I think this idea is a good one, but we must also consider whether iscan be done effectively within the time available. Is this: can do now
or want to do in future?I think that could be done, I have some code to call external programs within the database ready. It would be one of the enhancements in the case we choose the standalone path. 
The alternative of reinforcing xlogdump needs to be considered morefully now and quickly, so coding can begin as soon as possible.
- Diogo: what additional things can you make xlogdump do?I could add options to display the data of the bkp bloks, add the missing rmids: RM_HASH_ID, RM_GIST_ID, RM_SEQ_ID. Make a options to query only the transaction info xids status. And make a contrib module that calls the xlogdump and parses the output.
- Tom: can you say more about what you'd like to see from a tool, tohelp Diogo determine the best way forward. What value can he add if you
have already written the tool?Some other considerations:The biggest difficulty is finding loser transactions - ones that havenot yet committed by the end of the log. You need to do this in both
cases if you want to allow transaction state to be determined preciselyfor 100% of transactions; otherwise you might have to have an Unknowntransaction state in addition to the others.Yes, this is one thing we have to do in any case. 
What nobody has mentioned is that connecting to a db to lookup tablenames from OIDs is only possible if that db knows about the set of
tables the log files refer to. How would we be certain that theOID-to-tablename match would be a reliable one?Good question, It seems to me that the only case where this have a trivial aswer is if your inside the backend querying the current xlog directory.
I'm still thinking about the solution for the other cases (inside or outside the backend).-- Diogo Biazus - 

[EMAIL PROTECTED]Móvel Consultoriahttp://www.movelinfo.com.br
http://www.postgresql.org.br



Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Diogo Biazus
On 6/23/06, Diogo Biazus [EMAIL PROTECTED] wrote:
On 6/23/06, Simon Riggs 
[EMAIL PROTECTED]
 wrote:
 - give more flexibility for managing the xlogs remotelyNot sure what you mean.I can connect to the server if I want to query xlogs in a remote machine.
If i depend on a standalone tool that reads the filesystem I'll need some kind of ssh access.Sometimes the DBA needs the info in the xlogs but doesn't have/want access to the SO.It's also easier to create a script that collect statistics about xlogs in machiune clusters.
-- Diogo Biazus - [EMAIL PROTECTED]Móvel Consultoriahttp://www.movelinfo.com.br
http://www.postgresql.org.br


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Fri, 2006-06-23 at 11:03 -0300, Diogo Biazus wrote:
 On 6/23/06, Diogo Biazus [EMAIL PROTECTED] wrote:


 On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs
 remotely
 
 Not sure what you mean.
 
 I can connect to the server if I want to query xlogs in a remote
 machine.
 If i depend on a standalone tool that reads the filesystem I'll need
 some kind of ssh access.
 Sometimes the DBA needs the info in the xlogs but doesn't have/want
 access to the SO.
 It's also easier to create a script that collect statistics about
 xlogs in machiune clusters. 

OK, security is an interesting one.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Fri, 2006-06-23 at 10:59 -0300, Diogo Biazus wrote:

 On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs remotely
 
 Not sure what you mean.
 
  - I think it's faster to implement and to have a working and
 usable
  tool.
 
 Why do you think that? It sounds like you've got more work
 since you 
 effectively need to rewrite the _desc routines.
 
 Yes, but I don't need to worry with program output, and I have the
 backend's memory management and error handling.

I'd suggest doing a quick prototype to allow us to evaluate which
architecture would be preferable.

I'm torn between the good-idea and the safe-minimal-but-definitely in
8.2 option.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Tom Lane
Diogo Biazus [EMAIL PROTECTED] writes:
 The idea I've been discussing with Simon Riggs is to create a set of
 functions that can be called from within the database.

I'd question that at the very start.  I don't see any strong reason to
do it that way, and as you admit further down it'd make it impossible to
use the viewer to work on extracting data from a failed cluster; which
is, at least in my mind, one of the primary use-cases for the thing.
I would suggest building the xlog-reader as a separate program, not part
of the backend.  It would be useful to have options to (eg) translate
table OIDs to table names, which would require a connection to the
database, but this should be an *option* not an essential condition.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Jonah H. Harris

On 6/22/06, Tom Lane [EMAIL PROTECTED] wrote:

it'd make it impossible to use the viewer to work
on extracting data from a failed cluster; which is,
at least in my mind, one of the primary use-cases
for the thing.


While I too see this as something which could be used for this outside
the database, I don't think being able to access it from within the
system would be bad either.


I would suggest building the xlog-reader as a separate
program, not part of the backend.


I think it should certainly be able to run on it's own, but it
wouldn't be that hard to extend the functions so that they were usable
from within the database or vice-versa.

I just don't see it as mutually exclusive.  It would be relatively
easy to interface it from either side once the core functionality is
there.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 I think it should certainly be able to run on it's own, but it
 wouldn't be that hard to extend the functions so that they were usable
 from within the database or vice-versa.

Yes it would.  The most obvious point is that memory management and
error handling conventions inside the backend are quite different from
what you'd expect to employ in a standalone program.  Also the means
you'd use for consulting the system catalogs (in that option to provide
readable names for OIDs) are entirely different.

I think asking for support of both environments is a good way to ensure
that this summer project doesn't get finished :-(

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Jonah H. Harris

On 6/22/06, Tom Lane [EMAIL PROTECTED] wrote:

Yes it would.  The most obvious point is that memory management and
error handling conventions inside the backend are quite different from
what you'd expect to employ in a standalone program.


No, this wouldn't really be that hard, especially if he created a few
macros to handle the differences.


Also the means you'd use for consulting the system catalogs
(in that option to provide readable names for OIDs) are entirely
different.


Definitely correct there.  If it's designed well, it wouldn't be too
hard to use same-named functions for getting catalog information and
using ifdefs depending on how it's being used.


I think asking for support of both environments is a good way to ensure
that this summer project doesn't get finished :-(


I totally agree... I'm not suggesting that he does both for SoC.  Just
that if it is designed well from the start, it wouldn't be too hard to
add support for either command-line or contrib-ish usage.  The goal
was to start with xlogdump and enhance it, so starting with
command-line is probably the best anyway.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 On 6/22/06, Tom Lane [EMAIL PROTECTED] wrote:
 Yes it would.  The most obvious point is that memory management and
 error handling conventions inside the backend are quite different from
 what you'd expect to employ in a standalone program.

 No, this wouldn't really be that hard, especially if he created a few
 macros to handle the differences.

Jonah, I've been working with this system for years, and it's not that
easy to handle the differences with a few macros.  I've seen people
try, repeatedly, and seen their code break repeatedly.  The relatively
small number of files that we use in both frontend and backend scenarios
are all extremely limited-in-function and tend to break easily.  I would
never try to build something nontrivial that would work both ways
... especially not on a very tight time budget.  Diogo will have enough
challenges delivering something useful that works in one environment.

 The goal
 was to start with xlogdump and enhance it, so starting with
 command-line is probably the best anyway.

Diogo, are you working from my old xlogdump hack?  If so what version?
I can send you the latest off-list.  I add stuff to it periodically when
I need it, and I don't think I've published it lately.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Diogo Biazus
Agree, the project must choose one path as the starting point. But the two options can be given in the long run.I still think that as a starting point the functions inside the database are a good option.The reasons are: 
- using SQL to agregate and transform data in any way from the logs.- it's easier for the DBA in the other use cases where the cluster is still active. - give more flexibility for managing the xlogs remotely
- I think it's faster to implement and to have a working and usable tool.And there is one option to minimize the problem in the failed cluster case: the wrapper program could give the option to initdb a temporary area when no connection is given, creating a backend just to analyze a set of xlogs.
After this summer project I could go on and try to use parts of this code to implement a realy standalone tool.

Other option is to start by the standalone tool and create a wrapper function inside postgresql that would just call this external program and extract data from the xlogs using this program's output (with some option to output all data in a CSV format).
On 6/22/06, Tom Lane [EMAIL PROTECTED] wrote:
Jonah H. Harris [EMAIL PROTECTED] writes: I think it should certainly be able to run on it's own, but it wouldn't be that hard to extend the functions so that they were usable
 from within the database or vice-versa.Yes it would.The most obvious point is that memory management anderror handling conventions inside the backend are quite different fromwhat you'd expect to employ in a standalone program.Also the means
you'd use for consulting the system catalogs (in that option to providereadable names for OIDs) are entirely different.I think asking for support of both environments is a good way to ensurethat this summer project doesn't get finished :-(
regards, tom lane-- Diogo Biazus - [EMAIL PROTECTED]Móvel Consultoria
http://www.movelinfo.com.brhttp://www.postgresql.org.br


Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Diogo Biazus
Diogo, are you working from my old xlogdump hack?If so what version?I can send you the latest off-list.I add stuff to it periodically when
I need it, and I don't think I've published it lately.Yup, I've got a version that was posted here some time ago. If you could send me the latest version I would be very glad.
-- Diogo Biazus - [EMAIL PROTECTED]Móvel Consultoriahttp://www.movelinfo.com.brhttp://www.postgresql.org.br



Re: [HACKERS] xlog viewer proposal

2006-06-22 Thread Jonah H. Harris

On 6/22/06, Tom Lane [EMAIL PROTECTED] wrote:

Jonah, I've been working with this system for years, and it's not that
easy to handle the differences with a few macros.


True, it is harder than just that.  I didn't mean to make light of it
at all, just that a good amount of design upfront would allow us to
support both.


Diogo, are you working from my old xlogdump hack?  If so what version?
I can send you the latest off-list.  I add stuff to it periodically when
I need it, and I don't think I've published it lately.


As it's a pretty cool and useful utility, could you publish it on-list
again some time soon?  It would be nice to pull some stuff from it for
the pg_resetxlog -f stuff... rather than having to start from scratch
copying  changing some code from xlog.c.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match