Bug#573505: php-doc xml validation error caused by libxml2 / fixed by new upstream version

2010-05-02 Thread Stefano Zacchiroli
On Sat, May 01, 2010 at 04:16:28PM +0200, Matthijs Kooijman wrote:
 (If you're in a hurry, you might want to skip to the end of the mail, since
 the conclusion is partly unrelated to my in-depth analysis of the problem)

Hey Matthijs, thanks a lot for your in-depth investigation!

 So, it turns out we can easily fix this problem by packaging the latest
 version of the php documentation. Considering that we're currently shipping a
 version from 2008, that seems like a good idea anyway.

Agreed. My previous comment on that, however, was that to properly package
the new upstream version we should also package the latest PhD, which, IMHO,
deserves to be in a separate (and hence NEW) package. If the maintainer
is willing to go that way, I believe it would be the most appropriate
solution. Unfortunately, I've thus far been unable to get any answer
whatsoever on the matter from Lior.

An alternative path can be to try backporting the upstream fix to the
current version of the package in sid, adding the needed xmlns=
attribute where needed.

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...| ..: | Je dis tu à tous ceux que j'aime



--
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#573505: php-doc xml validation error caused by libxml2 / fixed by new upstream version

2010-05-02 Thread Lior Kaplan
On Sun, May 2, 2010 at 10:41 AM, Stefano Zacchiroli z...@debian.org wrote:

 On Sat, May 01, 2010 at 04:16:28PM +0200, Matthijs Kooijman wrote:
  So, it turns out we can easily fix this problem by packaging the latest
  version of the php documentation. Considering that we're currently
  shipping a version from 2008, that seems like a good idea anyway.

 Agreed. My previous comment on that was however that to properly package
 new upstream we should additionally package the latest PhD which, IMHO,
 deserves to be in a separate (and hence NEW) package. If the maintainer
 is willing to go that way, I believe it would be the most appropriate
 solution. Unfortunately, I've thus far been unable to get any answer
 whatsoever on the matter from Lior.


Is there any use for PhD outside the php-doc package? If not, I'm not sure
it's worth splitting them into two separate packages. In any case, I'd like
to first upload an updated package, and only then try to split them.

Kaplan


Bug#573505: php-doc xml validation error caused by libxml2 / fixed by new upstream version

2010-05-02 Thread Stefano Zacchiroli
On Sun, May 02, 2010 at 10:46:27AM +0300, Lior Kaplan wrote:
 Is there any use for PhD outside the php-doc package? If not, I'm not sure
 it's worth splitting them into two separate packages.

AFAIK PhD is a generic document processor, so there _can_ be. Even
though there are no other users in Debian yet, there are advantages in
having the two separate, e.g.: you can update them separately, you get a
proper uscan/watch notification of new versions of each, users can use PhD
themselves, etc. All in all, packaging them separately seems to me to be
more the Debian way of doing things, YMMV.

Of course there are disadvantages too, such as the fact that, right now,
PhD will need to pass through NEW.

 In any case, I'd like to first upload an updated package, and only then
 try to split them.

That sounds about right.

Thanks for getting back on this!
Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...| ..: | Je dis tu à tous ceux que j'aime






Bug#573505: php-doc xml validation error caused by libxml2 / fixed by new upstream version

2010-05-01 Thread Matthijs Kooijman
Hi,

(If you're in a hurry, you might want to skip to the end of the mail, since
the conclusion is partly unrelated to my in-depth analysis of the problem)

It seems PhD is completely innocent of this bug. Upgrading PhD didn't seem to
help (though I'm not 100% sure that the upgrade worked, I did throw out the
old version entirely). From configure.php, it seems that this error occurs
when asking PHP's built-in DOMDocument class to validate, before PhD is
called at all.

Instead the manual.xml validation errors reported here seem to be caused by
recent changes in libxml2, as suggested in this post:
  http://www.mail-archive.com/x...@gnome.org/msg07188.html

Apparently, there was recently a change in libxml to pass on the current
default namespace when expanding entity references. This was a change to fix
this bug:
  https://bugzilla.gnome.org/show_bug.cgi?id=502960

This problem seems to occur when the referenced entity contains complete
nodes that don't have an xmlns= of their own. The workaround suggested in
the first post above is to explicitly define a default namespace inside the
node(s) generated by the entity.
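
As a rough illustration of that workaround (a sketch, not taken from the
php-doc sources): the entity name example.note, the tiny document and the
DocBook namespace URI below are assumptions made up for the example. Python's
standard library parses with expat rather than libxml2, so this does not
reproduce the validation error; it only shows what an entity that declares
its own default namespace looks like, and that the element it produces ends
up in that namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI (DocBook 5); not stated in this thread.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical miniature of a manual: the entity's replacement text is a
# complete element that carries its own xmlns= declaration.
DOC = f"""<!DOCTYPE book [
  <!ENTITY example.note "<para xmlns='{DOCBOOK_NS}'>Hello</para>">
]>
<book xmlns="{DOCBOOK_NS}">&example.note;</book>"""

root = ET.fromstring(DOC)
child = root[0]      # the <para> element produced by expanding the entity
print(child.tag)     # namespace-qualified tag name
print(child.text)
```

Because the namespace is declared on the element inside the entity, resolving
it never depends on what the referencing node has in scope.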

From (quickly) looking at the source where the "Namespace default prefix
was not found" error is raised, it seems that the following happens:
 * A new node (presumably from an entity reference) is created, which gets the
   current default namespace passed in (presumably from the node that
   references the entity) (xmlSAX2StartElementNs() in SAX2.c)
 * The new node looks up its default namespace, using the xmlSearchNs function
   from tree.c. Its documentation says: "We don't allow to cross entities
   boundaries. If you don't declare the namespace within those you will be in
   troubles !!! A warning is generated to cover this case."
 * The namespace cannot be found, generating the "Namespace default prefix
   was not found" error.

This seems related to this (old) bug, marked wontfix:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=174793
This post from 2002 suggests that the behaviour of xmlSearchNs above is
actually a libxml2 bug, but one whose workaround is good practice anyway:
  http://mail.gnome.org/archives/xslt/2002-January/msg00022.html

That post concerns explicit namespace prefixes that are used inside an
entity but not declared in the entity itself.

However, it is my suspicion that the recent change in libxml2 is now causing
the libxml2 bug from 2002 to appear for entities that don't declare a default
namespace either. Previously, the node inside the entity just wouldn't have
any namespace associated with it, which was probably not correct either, but
did validate, since xmlSAX2StartElementNs() didn't try to check the namespace
at all.


So it seems the official workaround for this is declaring a default XML
namespace inside each entity declaration (I haven't tried this). I don't know
enough of the XML (NS) specification to know if this is a real solution or if
libxml2 should really be fixed instead...
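
If one wanted to backport that workaround by hand, a script along these lines
might do most of the mechanical work. This is only a sketch and has not been
run against the real php-doc entity files: the function name add_default_ns,
the regular expression and the namespace URI are all assumptions for the sake
of the example. It only touches the first element of each internal entity
value, so entities containing several top-level elements would still need
manual attention.

```python
import re

# Assumed namespace URI (DocBook 5); not stated in this thread.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Matches an internal entity whose value begins with an element start tag:
#   <!ENTITY name '<tag ...>   or   <!ENTITY name "<tag ...>
ENTITY_START = re.compile(
    r"""(<!ENTITY\s+[\w.\-]+\s+(?P<q>['"])\s*<(?P<tag>[\w:.\-]+))(?P<rest>[^>]*>)"""
)

def add_default_ns(ent_text: str, ns: str = DOCBOOK_NS) -> str:
    """Add xmlns= to the first element of each entity value that lacks one."""
    def fix(match: re.Match) -> str:
        if "xmlns=" in match.group("rest"):
            return match.group(0)  # already declares a default namespace
        # Use the quote character that does NOT delimit the entity value.
        inner = '"' if match.group("q") == "'" else "'"
        return f"{match.group(1)} xmlns={inner}{ns}{inner}{match.group('rest')}"
    return ENTITY_START.sub(fix, ent_text)

# Hypothetical input; entities holding plain text are left untouched.
sample = "<!ENTITY a '<para>hi</para>'>\n<!ENTITY t \"plain text\">"
print(add_default_ns(sample))
```

Running the function a second time over its own output changes nothing, so it
should be safe to re-apply after merging new entities.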

However, it seems that this problem really isn't Debian-specific, so upstream
should be seeing the same problems. Looking at the latest upstream version of
the documentation, it seems that upstream has already applied the workaround.
See, for example, the frontpage.authors entity at:
  http://svn.php.net/repository/phpdoc/en/trunk/contributors.ent

Closer inspection shows that this is fixed in r290424 and r290427:
  matth...@xanthe:$ svn log -c 290424 -c 290427 \
      http://svn.php.net/repository/phpdoc/en/trunk
  
  r290424 | bjori | 2009-11-09 16:58:06 +0100 (Mon, 09 Nov 2009) | 3 lines

  Add namespace declaration to all free standing elements
  # See https://bugzilla.gnome.org/show_bug.cgi?id=502960

  
  r290427 | bjori | 2009-11-09 17:04:52 +0100 (Mon, 09 Nov 2009) | 3 lines

  Adding namespace declaration to newly introduced entities
  # See https://bugzilla.gnome.org/show_bug.cgi?id=502960

  




So, it turns out we can easily fix this problem by packaging the latest
version of the php documentation. Considering that we're currently shipping a
version from 2008, that seems like a good idea anyway.

It's not exactly obvious to me how the documentation is organized and
how to get an orig tarball from upstream's svn (it seems that the Debian
package merges a few directories from SVN?), so I haven't tested this theory.
There are probably more things to fix when upgrading, though.


Lior, could you take care of this upgrade? If not, I can see if I can prepare
something, since I'd like to preserve php-doc (though I don't have too much
time, of course :-p).

Gr.

Matthijs

