[HACKERS] LIMIT and JOINS

2006-02-22 Thread Gregor Zeitlinger

Hi,

I have a question about the implementation of the SQL LIMIT clause.

Using the LIMIT clause, I want to reduce the computation time of a query.

When I try to limit the result of a join, however, the join is computed first (which takes very long).

Consider the following example:

select * from T1 join T2 on T1.id = T2.id LIMIT 1

Conceptually, it should be possible to fetch one row from T1 and T2, i.e. to propagate the LIMIT clause.

I was wondering what the exact requirements are for propagating the LIMIT clause.
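For comparison, the limit can be pushed into one join input by hand; this is only a sketch, and it is not equivalent in general, since the chosen T1 row might have no partner in T2:

```sql
-- Original form: the planner may compute the whole join before limiting.
SELECT * FROM T1 JOIN T2 ON T1.id = T2.id LIMIT 1;

-- Hand-propagated sketch: limit one input first. Not equivalent in
-- general: if the chosen T1 row has no match in T2, this returns zero
-- rows where the original query could still return one.
SELECT *
FROM (SELECT * FROM T1 LIMIT 1) AS t1
JOIN T2 ON t1.id = T2.id;
```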



Do I need a foreign key relation between T1 and T2?

Do I need to use a full outer join in order to propagate the LIMIT clause?



Thanks



Gregor Zeitlinger
LUCAS Product Development

Torex Retail Solutions GmbH

Schwedenstr. 9, D-13359 Berlin
Tel. +49 (0) 30 49901-243
Fax +49 (0) 30 49901-139

Mailto:[EMAIL PROTECTED]
http://www.torexretail.de

Re: [HACKERS] Incremental Backup Script

2006-01-09 Thread Gregor Zeitlinger
 As per docs, if the databases are rarely updated it could take a long
 time for the WAL segment to roll over. 
Yes, therefore I want to copy the current WAL (as I said earlier).
When restoring, I also want to make sure that I restore exactly to the point 
when I copied the current WAL segment.

Hence I am considering doing it as follows:
1) take t = the current time
2) copy the current WAL
3) when restoring, set recovery_target_time = t
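A recovery.conf sketch of step 3 (the WAL backup path and the timestamp are hypothetical placeholders):

```
restore_command = 'cp /mnt/backup/wal/%f "%p"'   # hypothetical WAL backup path
recovery_target_time = '2006-01-09 12:00:00'     # t, as recorded in step 1
```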
 
Maybe there is even a way to ask Postgres for its last committed xid x.
In that case, we could set recovery_target_xid = x.
Is that possible?
 
Regards,
 
Gregor

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Incremental Backup Script

2006-01-04 Thread Gregor Zeitlinger
-Original Message-
From: Zach Bagnall [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 04, 2006 4:42 AM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Incremental Backup Script


Gregor: can you explain how to identify the current file? I had 
implemented a backup and restore script for PITR but stumbled at this 
point. The page above does not specify how this is to be done.
I have found a way from 
http://archives.postgresql.org/pgsql-admin/2005-10/msg00059.php. I have not 
tried it, but it seems straightforward.
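As a sketch: newer releases expose this directly through SQL, assuming the pg_current_xlog_location() and pg_xlogfile_name() functions introduced in PostgreSQL 8.2:

```sql
-- Returns the file name of the WAL segment currently being written
-- (assumes the 8.2 functions pg_current_xlog_location and
-- pg_xlogfile_name; earlier releases need the script linked above).
SELECT pg_xlogfile_name(pg_current_xlog_location());
```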

I appreciate the addition of PITR - it's better than nothing (nothing 
being full dumps) in some respects. Ideally, we need to be able to dump 
deltas for a single database. 
Yes, it is not a replacement for a true incremental backup, especially since the 
WALs may be orders of magnitude larger than a delta.

In practice, restoration using the PITR 
method is awkward.
Yes. What I am planning to do:
1) drop the corrupted database
2) restore the base backup
3) replay all incremental backups (in the sense of my original mail)

Gregor Zeitlinger



[HACKERS] Incremental Backup Script

2005-12-25 Thread Gregor Zeitlinger
Hello,
 
as far as I understand, the WAL archiving that you control via 
archive_command is the PostgreSQL equivalent of what other databases let you 
do with an incremental backup. That is, provided you don't forget to include the 
current WAL segment.
 
I have found a script to determine the current WAL on the admin mailing list. 
Based on this script, I intend to write two scripts that do the following 
(unless something like this already exists).
 
basebackup -> basebackup.<bbd>.bz2
incrementalbackup -> incrementalbackup.<bbd>.<ibc>.<ibd>.bz2 (incremental 
backup relative to the last incremental backup)
restore (a file produced by the above commands) -> restore the database (either 
base, or base + 1..n incremental backups)
 
bbd: base backup date (e.g. 2005-12-25-14-00)
ibc: incremental backup counter (1..n)
ibd: incremental backup date
 
The central idea is that base backups are guaranteed to include all information 
up to bbd and incremental backups all data up to ibd. I hope that this 
makes it easier for administrators.
 
archive_command:
copy the files to a local backup directory (LWB = local wal backup)
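A minimal sketch of such an archive_command, assuming a hypothetical LWB directory:

```
# postgresql.conf: copy each completed segment into the LWB
# (/var/backups/lwb is a hypothetical path; %p and %f are expanded
# by the server to the segment's path and file name)
archive_command = 'cp "%p" /var/backups/lwb/%f'
```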
 
basebackup:
1) tar the data directory
2) add any WALs that are produced while the backup is running
3) delete all WALs that are included in the tar
4) I still wonder how bbd must be chosen (the time of pg_start_backup?)
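Steps 1 through 3 could be bracketed with the standard backup functions; a sketch (the label 'base' is arbitrary):

```sql
-- Before tarring the data directory:
SELECT pg_start_backup('base');

-- ... tar the data directory while the server keeps running ...

-- After tarring; this writes a backup history file marking the
-- end of the backup:
SELECT pg_stop_backup();
```

The checkpoint time taken by pg_start_backup() would arguably be a natural choice for bbd.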
 
incremental backup:
1) add all WALs that are currently in the LWB to the tar
2) add the current WAL to the tar
3) verify that all WALs prior to the current WAL are included (i.e. that no WAL 
is currently being copied to the LWB)
4) delete all WALs that are included in the tar
 
restore:
1) if it's a base backup, just restore that
2) if it's an incremental backup, check that the corresponding base backup and 
all incremental backups with lower ibc are available. Then restore the base 
backup and all incremental backups up to the one specified
 
Also, I was wondering whether it is always safe to copy the current WAL file, 
i.e. can the current WAL file be invalid under any circumstances?
 
Is this a sensible idea?
 
Regards,
 
Gregor



Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Wed, 22 Oct 2003, Christopher Browne wrote:
 No, tables wouldn't be the right way to do it.

 But it's going to be troubled, in any case, because of the
 every-popular mixtures of:

  a) Often weird declarations of what character sets are in use;
I have to admit that I haven't paid much attention to that specific
part. But couldn't you just store it in the character set that was
originally used to populate the document?

  b) Pointers to other parts of a document;
Do you mean to the parent element and the child elements?
This is specifically what my custom format is designed for.

  c) What's a database going to consist of?  One XML document?  Or
 many?
Many, each of which can be up to about 1 TB.

 And if many, then then how do you have a centralized
 reference point to navigate from to find the document that you
 want?
This one could be a table, or another XML document.

 And navigate was a carefully chosen word; what you then have is
 essentially a network database system, and have to then start making
 up ways of describing queries.  XQuery may be better than CODASYL of
 yesteryear, but you're still left writing a lot of recursive code.
 (Thus making those that understand the Lambda Nature more powerful...)
I don't quite get your point. XQuery works on one document, IIRC.

 At the end, do you have a database?  Or just a set of documents?
 It's hard to tell, a priori.
OK, I know what you mean. I'd say it's a database, because the information
is not stored as plain text, but in pages, in a format optimized for
insertion, deletion, and querying.

 And do you think this is likely to be useful because:

  a) You have some clear notion as to why this ought to be useful?
Yes. Modifying and querying plain XML files performs poorly once your
documents get a little larger (100 MB+).

  b) XML is a big buzzword, and people have been able to succesfully
 attract research funds or vulture capital on the basis of
 having that acronym in a proposal?
That time's over anyway, isn't it?

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Thu, 23 Oct 2003, Christopher Kings-Lynne wrote:
 4. Extend the contrib/ltree gist-based tree indexing scheme to work on
 xml and hence the operations in no.3 above are really fast...
but then the plain XML data is still stored in a database column, if I
understand correctly?

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Wed, 22 Oct 2003, Josh Berkus wrote:
 Reinvent the wheel?  Well, yes.

 The first thing ... the VERY first thing, abosolutely ... that you need to do
 is invent a theory of XML databases.
Well, I have. It doesn't cover all parts in detail yet, because I've
started with a simple I/O layer (simple page locking, no concurrent
transactions) and worked on the page layout and parsing algorithms from
there. Querying on that format will follow thereafter, and concurrency
issues will be dealt with even later.
I am considering hierarchical locking (of any part of the tree).
I was just wondering whether I could take some of that functionality from an
existing database.

 Without these things, you're just another idiot floundering around a morass of
 acronyms and half-baked ideas.
I know

 With them, you will have something that no current XML database
 project/product has, and can give XML databases a fighting chance to
 survive beyond the current fad.
If there was a promising project for an xml database, I would have joined
it.

 Of course,it's possible in the course of theorizing that you may prove that
 XML databases are impossible.  But that's how the cookie crumbles 
only an implementation is a real proof.

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Thu, 23 Oct 2003, Andrew Dunstan wrote:
 Storing the XML text has problems - you have to parse it every time you
 want something - that has to cause a huge performance hit.
You couldn't have said better what I meant.
I store the XML already parsed. You can navigate right along: to the
parent, the previous or next element, or the first or last child.

 I use XML a lot for all sorts of purposes, but it is appropriate for
 data transfer rather than data storage, IMNSHO.
Right now, you're quite right. But I want to change that.

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Thu, 23 Oct 2003, Christopher Kings-Lynne wrote:
  You couldn't have said better what I meant.
  I store the xml already parsed. You can navigate right along. To the
  parent, the previous, the next elemnt or the first or last child.

 Which is the whole point of indexing it...
Not quite. Indexing can solve some of the problems, but not all:
1) You have to update the index every time you modify the data. My custom
format serves as an index for some queries.

2) The page format is designed in such a way that modifications
(insertion, deletion) are as fast as the original parsing. I'm not sure
how that stacks up against modifying data in a column. I guess it depends on
the strategy used to store very large strings in columns.

 I use XML a lot for all sorts of purposes, but it is appropriate for
 data transfer rather than data storage, IMNSHO.
 
  Right now, you're quite right. But I want to change that.

 No point, it's a data exchange format, it's not usefull for data storage.
Well, neither one is only a data exchange format or only a data storage
format.
Rather, the difference is that relations are designed to store structured
data, while XML is designed to store semi-structured (less regular) data.

Which is better suited for data exchange is a matter of convention (where
XML seems to be good), while efficiency and other features of an
implementation determine which one is suited for data storage.

If your point is that XML is currently not suited for storage, because
there are more efficient RDBMSs than XML databases, I agree. Otherwise, I
don't see your point.

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-23 Thread Gregor Zeitlinger
On Thu, 23 Oct 2003, Josh Berkus wrote:
 Um, I/O and Page layout are not theory.   They are implementation issues.
Yes or no, depending on your point of view.

 Theory would answer things like What are the mathematical operations I can
 use to define compliance or non-compliance with the DTD for a heirarchy and
 for data elements?
snip

Yes, that has also been thought of. Most of it was done by a colleague at
university, who came up with the idea but didn't do any implementation.

  only an implementation is a real proof.

 Implementation is proof of a theory.  But you've got to have the theory first
 or you don't know what you're proving.
agreed.

 Anyway, I don't think you can borrow code from any existing relational
 database, since an XML database would be radically different structurally.
I'm getting this impression, too.

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-22 Thread Gregor Zeitlinger
On Wed, 22 Oct 2003, Christopher Browne wrote:
 It leaves open the question of what is the appropriate way of
 expressing XML entities and attributes and CDATA in database form.
snip

Thanks for your advice, but that's not what I had in mind. The original
idea behind a native XML database was that XML doesn't work too well in a
relational database.
I was just wondering whether I have to reinvent the wheel of database
technology when it comes to transaction processing, ACID, and indexes,
which a native XML database ought to have as well.

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




Re: [HACKERS] is GiST still alive?

2003-10-22 Thread Gregor Zeitlinger
On Wed, 22 Oct 2003, Andrew Dunstan wrote:
 But why put yourself to such bother? I have never found a good reason to
 do this sort of thing.
I think there is huge potential for XML databases once there are good
ones and people start using them more extensively.
But to make XML databases really fast, it is too slow to store the data in
relational tables. After all, if you were designing a database for XML
data only, would you come up with tables to store it?

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]




[HACKERS] is GiST still alive?

2003-10-21 Thread Gregor Zeitlinger
Hi,

I'm developing a native XML database (in C++, which is supposed to become
open source one day) and I'm wondering whether I could use GiST for its
indexes. Is GiST still alive?

Also, I'm looking for a database that I could use as the basis for my XML
database. Right now I'm using a custom I/O layer, which doesn't seem to be a
problem yet, because transactions etc. are not supported so far. I have
already taken a look at some of the native XML databases, such as Apache
Xindice and eXist, but they don't seem to focus on performance and
scalability as I do.

Would PostgreSQL fit that requirement? And are you interested in having a
fast, scalable XML access method?

-- 
Gregor Zeitlinger
[EMAIL PROTECTED]



