On Thu, 2006-06-22 at 14:57 -0300, Diogo Biazus wrote:

> Agree, the project must choose one path as the starting point. But the
> two options can be given in the long run.
I'm acting as Diogo's mentor for the SoC, so I'm trying to let Diogo discuss his ideas with the community without too much steering. Diogo's ideas are interesting - they aren't the way I would have done it either, but that doesn't mean we shouldn't consider this alternative approach.

> I still think that as a starting point the functions inside the
> database are a good option.

Yes, if we use SRF functions for this, ISTM they are the best place for them.

> The reasons are:
> - using SQL to aggregate and transform data in any way from the logs.

That is a major point here. If xlogdump is purely a stand-alone program, it will be much less functionally rich and, as Tom mentions, there are other reasons for having access to a server.

> - it's easier for the DBA in the other use cases where the cluster is
> still active.

Good point.

> - give more flexibility for managing the xlogs remotely

Not sure what you mean.

> - I think it's faster to implement and to have a working and usable
> tool.

Why do you think that? It sounds like you've got more work, since you effectively need to rewrite the _desc routines.

> And there is one option to minimize the problem in the failed cluster
> case: the wrapper program could give the option to initdb a temporary
> area when no connection is given, creating a backend just to analyze a
> set of xlogs.

It seems a reasonable assumption that someone reading PostgreSQL logs would have access to another PostgreSQL cluster. It obviously needs to work when the server that originated the logs is unavailable, but that does not mean that all PostgreSQL systems are unavailable. There's no need to try to wrap initdb; just note that people would need access to a PostgreSQL system.
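To illustrate the kind of analysis the in-database approach would enable, here is a hypothetical sketch. It assumes an SRF named xlogdump_records(path) returning one row per WAL record, with an rmgr column naming the resource manager; neither the function nor its column names exist yet, they are placeholders for whatever interface Diogo designs:

```sql
-- Hypothetical: summarize one xlog segment with plain SQL aggregation,
-- counting WAL records per resource manager.
SELECT rmgr, count(*) AS records
FROM xlogdump_records('pg_xlog/000000010000000000000042')
GROUP BY rmgr
ORDER BY records DESC;
```

The point is that grouping, filtering, and joining come for free from the executor, rather than being reimplemented inside a stand-alone dumper.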
> Other option is to start by the standalone tool and create a wrapper
> function inside postgresql that would just call this external program
> and extract data from the xlogs using this program's output (with some
> option to output all data in a CSV format).

I think this idea is a good one, but we must also consider whether it can be done effectively within the time available. Is this something we can do now, or something we want to do in the future?

The alternative of reinforcing xlogdump needs to be considered more fully now, and quickly, so coding can begin as soon as possible.

- Diogo: what additional things can you make xlogdump do?
- Tom: can you say more about what you'd like to see from a tool, to help Diogo determine the best way forward? What value can he add if you have already written the tool?

Some other considerations:

The biggest difficulty is finding "loser transactions": ones that have not yet committed by the end of the log. You need to do this in both cases if you want transaction state to be determined precisely for 100% of transactions; otherwise you might have to have an Unknown transaction state in addition to the others.

What nobody has mentioned is that connecting to a db to look up table names from OIDs is only possible if that db knows about the set of tables the log files refer to. How would we be certain that the OID-to-tablename match would be a reliable one?

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
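The loser-transaction scan described above could itself be written as a query under the in-database approach. This is a hypothetical sketch: it assumes an SRF xlogdump_records(path) with xid, rmgr, and info columns, none of which exist yet; the names are placeholders only.

```sql
-- Hypothetical: xids that appear in the log but have no COMMIT or
-- ABORT record by end of log are the "loser transactions".
SELECT DISTINCT xid
FROM xlogdump_records('pg_xlog/000000010000000000000042')
WHERE xid IS NOT NULL
EXCEPT
SELECT xid
FROM xlogdump_records('pg_xlog/000000010000000000000042')
WHERE rmgr = 'Transaction'
  AND info IN ('COMMIT', 'ABORT');
```

A stand-alone tool would need an equivalent two-pass (or xid-tracking) scan of its own, which is part of why the choice of starting point matters.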