I'm working on a Subversion interface to MediaWiki and am struggling
with the SQL to respond to Subversion's update-report:
http://www.mediawiki.org/wiki/WebDAV

MediaWiki's revision table contains unique revision ids and the
corresponding page id. The page table contains unique page ids and the
corresponding page title.

Suversion's update-report intends to get a list of changes between the
current state of the working copy and a target revision: often HEAD,
MAX(revision.rev_id)

By converting update-report entries to an SQL condition, I can select
rows of the revision table which come after the current state of the
working copy:

  <S:update-report send-all="true" xmlns:S="svn:">
    <S:src-path>http://ket/~jablko/mediawiki/webdav.php</S:src-path>
    <S:target-revision>27</S:target-revision>
    <S:entry rev="18"></S:entry>
    <S:entry rev="20">Test</S:entry>
    <S:entry rev="20">Main_Page</S:entry>
  </S:update-report>

- becomes:

((page_title = 'Main_Page' OR page_title LIKE 'Main\_Page/%') AND 
revision.rev_id > '20' OR NOT (page_title = 'Main_Page' OR page_title LIKE 
'Main\_Page/%') AND ((page_title = 'Test' OR page_title LIKE 'Test/%') AND 
revision.rev_id > '20' OR NOT (page_title = 'Test' OR page_title LIKE 'Test/%') 
AND revision.rev_id > '18'))

Using GROUP BY page_id, I get a list of pages which changed between the
current state of the working copy and the target revision.

My problem: I also need to know if these pages have revisions before
<current-state>, or if they are newly created.

I don't want to do a second query for revisions of <page-list> before
<current-state> because on an initial checkout, <page-list> could be
huge, making an enormous SQL query.

Instead, I think I should do a LEFT JOIN on another instance of the
revision table ("old"), where revisions are before <current-state>. NULL
rows in this table correspond to newly created pages:

SELECT page_title, MAX(new.rev_id), old.rev_id FROM page JOIN revision AS new 
LEFT JOIN revision AS old ON new.rev_page = old.rev_page WHERE new.rev_page = 
page_id AND old.rev_id < new.rev_id AND new.rev_id <= <target-revision> AND 
<new.rev_id greater than current-state> GROUP BY page_id

The problem with this query is that "old.rev_id < new.rev_id" means each
row in "old" is less than _a_ row in "new", not necessarily less than
the _minimum_ row in "new".

I don't want to replace this condition with <old.rev_id less than
current-state> because the <current-state> expression can be long and
complex; I prefer to evaluate it only once, for <new.rev_id greater than
current-state>

I tried replacing "old.rev_id < new.rev_id" with "old.rev_id <
MIN(new.rev_id)" but got a MySQL error: Invalid use of group function

Can anyone recommend how best to query the database for a list of pages
which changed since the current state of the working copy, and whether
those pages are newly created?

Much thanks, Jack

Attachment: signature.asc
Description: Digital signature

Reply via email to