Bug#430782: liferea: Please be more flexible with XML input

2007-10-07 Thread Daniel Jacobowitz
On Fri, Sep 21, 2007 at 12:50:19AM +0200, Lars Lindner wrote:
 Thanks for the link. It helped me to create a solution. Now Liferea
 copies all global namespace definitions to the generated XHTML root
 node. This solves the problem for me.
 
 Fix available in SVN to be released with 1.4.3

Thanks again for your help.  It seems to work, though it does not help
me much... it turns out that even with that, the feeds I was having
trouble with are still not valid XHTML.  They use e.g. the x:num and
x:str attributes generated by Excel without appropriate div tags.
This was fine when they were rendered as HTML, but now they fail the
namespace checks.

:-(

-- 
Daniel Jacobowitz
CodeSourcery



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#430782: liferea: Please be more flexible with XML input

2007-09-20 Thread Lars Lindner
On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
 On Fri, Jun 29, 2007 at 10:37:38PM +0200, Lars Lindner wrote:
  For correctness LJ should provide Atom feeds which wrap everything in a
 
  <div lj:ns="http://livejournal.com/something">
 
  Of course prefix lj and URL are fictional and should be replaced
  with the real values I do not know. Alternatively as you suggest we
  could have a filter adding it afterwards.

 Hmm, the header is:

 <rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>

 It's just when you go to parse the description that that namespace
 prefix is not applied; I presume the description is treated as a
 separate document.

 Is that at all useful?

Yes, it is. I've thought a while over it and I think it is a good idea
to treat global namespaces and add them to the extracted feed items
content.
What I'm missing at the moment is a LiveJournal example feed. I
searched a bit but could not find any RSS feeds on LiveJournal. It
seems like they only advertise Atom feeds by default (which of course
is a good idea).

Maybe you could send me a copy of a feed? Or URLs of freely available
feeds with the problem?

Lars






Bug#430782: liferea: Please be more flexible with XML input

2007-09-20 Thread Daniel Jacobowitz
On Thu, Sep 20, 2007 at 11:57:50PM +0200, Lars Lindner wrote:
 Yes, it is. I've thought a while over it and I think it is a good idea
 to treat global namespaces and add them to the extracted feed items
 content.

Great!  Thanks for getting back to me.

 What I'm missing at the moment is a LiveJournal example feed. I
 searched a bit but could not find any RSS feeds on LiveJournal. It
 seems like they only advertise Atom feeds by default (which of course
 is a good idea).
 
 Maybe you could send me a copy of a feed? Or URLs of freely available
 feeds with the problem?

RSS feeds are at http://username.livejournal.com/data/rss.  I picked
one off the front page and here you go:
  http://community.livejournal.com/nflfans/data/rss

Right at the moment the first and fifth items in the feed show this bug.

-- 
Daniel Jacobowitz
CodeSourcery






Bug#430782: liferea: Please be more flexible with XML input

2007-09-20 Thread Daniel Jacobowitz
On Fri, Sep 21, 2007 at 12:50:19AM +0200, Lars Lindner wrote:
 Thanks for the link. It helped me to create a solution. Now Liferea
 copies all global namespace definitions to the generated XHTML root
 node. This solves the problem for me.
 
 Fix available in SVN to be released with 1.4.3

Thanks!

-- 
Daniel Jacobowitz
CodeSourcery






Bug#430782: liferea: Please be more flexible with XML input

2007-09-20 Thread Lars Lindner
On 9/21/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
 On Thu, Sep 20, 2007 at 11:57:50PM +0200, Lars Lindner wrote:
  Yes, it is. I've thought a while over it and I think it is a good idea
  to treat global namespaces and add them to the extracted feed items
  content.

 Great!  Thanks for getting back to me.

  What I'm missing at the moment is a LiveJournal example feed. I
  searched a bit but could not find any RSS feeds on LiveJournal. It
  seems like they only advertise Atom feeds by default (which of course
  is a good idea).
 
  Maybe you could send me a copy of a feed? Or URLs of freely available
  feeds with the problem?

 RSS feeds are at http://username.livejournal.com/data/rss.  I picked
 one off the front page and here you go:
   http://community.livejournal.com/nflfans/data/rss

 Right at the moment the first and fifth items in the feed show this bug.

Thanks for the link. It helped me to create a solution. Now Liferea
copies all global namespace definitions to the generated XHTML root
node. This solves the problem for me.
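
For readers of the archive, the idea can be sketched roughly as follows. This is not the Liferea code (which is C on top of libxml2), only a minimal Python illustration using the standard library. The helper names, the cut-down feed, and the lj:user element are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"


def global_namespaces(feed_xml):
    """Collect prefix -> URI for the namespaces declared on the feed's root element."""
    namespaces = {}
    source = io.BytesIO(feed_xml.encode("utf-8"))
    for event, data in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break  # the root's declarations are reported before its start event
        prefix, uri = data
        namespaces[prefix] = uri
    return namespaces


def wrap_fragment(fragment, namespaces):
    """Wrap item content in an XHTML div that re-declares the feed's prefixes."""
    decls = "".join(
        f" xmlns:{prefix}={quoteattr(uri)}"
        for prefix, uri in namespaces.items()
        if prefix  # skip the feed's default namespace; the wrapper sets its own
    )
    return f'<div xmlns="{XHTML_NS}"{decls}>{fragment}</div>'


# Hypothetical, cut-down feed and item content.
FEED = "<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'><channel/></rss>"
FRAGMENT = '<p>Hi <lj:user name="someone"/></p>'

wrapped = wrap_fragment(FRAGMENT, global_namespaces(FEED))
root = ET.fromstring(wrapped)  # parses, because lj is now bound on the wrapper
print(wrapped)
```

Only prefixes declared by the feed itself are carried over, so content that uses a prefix the feed never declares would still be rejected.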

Fix available in SVN to be released with 1.4.3

Lars






Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Lars Lindner

On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:

Package: liferea
Version: 1.2.16b-1
Followup-For: Bug #430782

The error message is accurate:

XML Parsing Error: prefix not bound to a namespace
Location: file:///
Line Number 450, Column 119:

The XML isn't valid, but many feeds seem to have this level of
inaccuracy and it would be much more useful for liferea (or the
renderer) to cope.  LiveJournal seems to be pretty careless though; it
also outputs some utf-8 XML files that are not valid utf-8 :-(

I don't think the error message is coming from liferea, but I don't
know if it comes from libxml2 or xul.


You are correct this is an error message given by libxml2.

But you are totally wrong about handling invalid XML. The core
idea of XML is to guarantee applications a correct content encoding
by ensuring well-formedness and validity of the given data.

So suggesting to have weak XML parsing invalidates the idea of XML itself.
Also what should a parser do with a file that contains for example partly
UTF-8 content and partly Latin-1? There is no way to decide what to do
with the byte mess.

With XML the rule is applications should *ALWAYS* refuse non-wellformed
content. Also when using a library for parsing the application has no
way to force tolerant parsing. As for libxml2 I know for sure that the
author clearly disagrees with applications wanting to do tolerant parsing.

Lars





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Daniel Jacobowitz
On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
 You are correct this is an error message given by libxml2.
 
 But you are totally wrong about handling invalid XML. The core
 idea of XML is to guarantee applications a correct content encoding
 by ensuring well-formedness and validity of the given data.
 
 So suggesting to have weak XML parsing invalidates the idea of XML itself.
 Also what should a parser do with a file that contains for example partly
 UTF-8 content and partly Latin-1? There is no way to decide what to do
 with the byte mess.

You'll note that I carefully did not suggest Liferea should be
tolerant of the messed up UTF-8; I was just complaining about it.  I
fixed that elsewhere by judicious use of iconv and outside knowledge.

An unbound prefix is a very different sort of error from invalid
UTF-8.
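
The message does not say how iconv was used, so the following is only a guess at the general shape of such a workaround, written in Python rather than with iconv itself. The function name and the ISO-8859-1 fallback are assumptions; the fallback stands in for the "outside knowledge" about what the feed really contains.

```python
def decode_feed_bytes(raw, fallback="iso-8859-1"):
    """Decode as UTF-8 if the bytes are valid UTF-8; otherwise use an encoding
    known out of band. Returns (text, encoding_used)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the whole file is in the fallback encoding.
        return raw.decode(fallback), fallback


print(decode_feed_bytes(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_feed_bytes(b"caf\xe9"))      # not valid UTF-8, falls back
```

This handles a file that is wholly mislabelled. It does nothing useful for a file that mixes two encodings, which is the case described above as undecidable.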

 With XML the rule is applications should *ALWAYS* refuse non-wellformed
 content. Also when using a library for parsing the application has no
 way to force tolerant parsing. As for libxml2 I know for sure that the
 author clearly disagrees with applications wanting to do tolerant parsing.

In any case, previous versions of liferea were able to display these
common entries without trouble.  I don't know if that means it did not
push article bodies through libxml2; I think it somewhat likely, since
this is the escaped contents of the description, not part of the RSS
feed proper.  Normally that's HTML, with all the attendant sloppiness,
rather than well-formed XML.

-- 
Daniel Jacobowitz
CodeSourcery





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Lars Lindner

On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:

On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
 You are correct this is an error message given by libxml2.

 But you are totally wrong about handling invalid XML. The core
 idea of XML is to guarantee applications a correct content encoding
 by ensuring well-formedness and validity of the given data.

 So suggesting to have weak XML parsing invalidates the idea of XML itself.
 Also what should a parser do with a file that contains for example partly
 UTF-8 content and partly Latin-1? There is no way to decide what to do
 with the byte mess.

You'll note that I carefully did not suggest Liferea should be
tolerant of the messed up UTF-8; I was just complaining about it.  I
fixed that elsewhere by judicious use of iconv and outside knowledge.

An unbound prefix is a very different sort of error from invalid
UTF-8.


Well, it is still complaining :-)


 With XML the rule is applications should *ALWAYS* refuse non-wellformed
 content. Also when using a library for parsing the application has no
 way to force tolerant parsing. As for libxml2 I know for sure that the
 author clearly disagrees with applications wanting to do tolerant parsing.

In any case, previous versions of liferea were able to display these
common entries without trouble.  I don't know if that means it did not
push article bodies through libxml2; I think it somewhat likely, since
this is the escaped contents of the description, not part of the RSS
feed proper.  Normally that's HTML, with all the attendant sloppiness,
rather than well-formed XML.


The reason is that the 1.0.x series did generate HTML for rendering.
1.2.x uses XHTML which automatically includes namespace checks. And to
be honest I see no easy solution for embedded namespaces. Is the feed
reader to be expected to extract and merge namespaces defined by the
feed (and different ones over time!) into the XHTML generated to
render items? I think it is technically possible, but also really
troublesome.

Lars





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Daniel Jacobowitz
On Fri, Jun 29, 2007 at 09:32:10PM +0200, Lars Lindner wrote:
   With XML the rule is applications should *ALWAYS* refuse non-wellformed
   content. Also when using a library for parsing the application has no
   way to force tolerant parsing. As for libxml2 I know for sure that the
   author clearly disagrees with applications wanting to do tolerant parsing.
 
  In any case, previous versions of liferea were able to display these
  common entries without trouble.  I don't know if that means it did not
  push article bodies through libxml2; I think it somewhat likely, since
  this is the escaped contents of the description, not part of the RSS
  feed proper.  Normally that's HTML, with all the attendant sloppiness,
  rather than well-formed XML.
 
 The reason is that the 1.0.x series did generate HTML for rendering.
 1.2.x uses XHTML which automatically includes namespace checks. And to
 be honest I see no easy solution for embedded namespaces. Is the feed
 reader to be expected to extract and merge namespaces defined by the
 feed (and different ones over time!) into the XHTML generated to
 render items? I think it is technically possible, but also really
 troublesome.

I see.  We don't have any marker indicating what sort of data the
description is, unfortunately.  I think expecting it to be
well-formed XHTML may be... a little overly optimistic, given the
sorts of things that generate RSS feeds.

Maybe there's some manual way to avoid this problem - allowing the
user to manually bind a specific prefix?

If there isn't, I can hack up a filter for my local LJ feeds.  But it
would be nice if it worked out of the box.
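
A rough sketch of what such a local filter might look like, in Python with only the standard library. Everything here is hypothetical: the function names, the crude regular expressions used to spot prefixes, and the sample content. The x binding is the URI that Excel's HTML export conventionally declares; treat it as an assumption.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical bindings supplied by the user.
USER_BINDINGS = {"lj": "http://www.livejournal.org/rss/lj/1.0/"}

# Crude patterns: a prefix on a tag name ("<lj:" or "</lj:") or on an attribute (" x:num=").
TAG_PREFIX = re.compile(r"</?([A-Za-z_][\w.-]*):")
ATTR_PREFIX = re.compile(r"\s([A-Za-z_][\w.-]*):[\w.-]+\s*=")


def used_prefixes(fragment):
    """Return the set of prefixes that appear to be used in the fragment."""
    found = set(TAG_PREFIX.findall(fragment)) | set(ATTR_PREFIX.findall(fragment))
    return found - {"xml", "xmlns"}  # these never need a user binding


def bind_prefixes(fragment, bindings):
    """Wrap the fragment in a div declaring the known prefixes.

    Returns (wrapped_fragment, sorted list of prefixes still unbound)."""
    prefixes = used_prefixes(fragment)
    known = sorted(prefixes & set(bindings))
    unbound = sorted(prefixes - set(bindings))
    decls = "".join(f" xmlns:{p}={quoteattr(bindings[p])}" for p in known)
    return f"<div{decls}>{fragment}</div>", unbound


FRAGMENT = '<table><tr><td x:num="3">3</td></tr></table><lj:user name="someone"/>'

wrapped, unbound = bind_prefixes(FRAGMENT, USER_BINDINGS)
print(unbound)  # prefixes the user has not bound yet

ALL_BINDINGS = dict(USER_BINDINGS, x="urn:schemas-microsoft-com:office:excel")
wrapped, unbound = bind_prefixes(FRAGMENT, ALL_BINDINGS)
root = ET.fromstring(wrapped)  # parses once every used prefix is bound
```

Reporting the unbound prefixes rather than inventing a namespace for them keeps the decision with the user, which is the point of a manual binding.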

-- 
Daniel Jacobowitz
CodeSourcery





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Lars Lindner

On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:

On Fri, Jun 29, 2007 at 09:32:10PM +0200, Lars Lindner wrote:
   With XML the rule is applications should *ALWAYS* refuse non-wellformed
   content. Also when using a library for parsing the application has no
   way to force tolerant parsing. As for libxml2 I know for sure that the
   author clearly disagrees with applications wanting to do tolerant parsing.
 
  In any case, previous versions of liferea were able to display these
  common entries without trouble.  I don't know if that means it did not
  push article bodies through libxml2; I think it somewhat likely, since
  this is the escaped contents of the description, not part of the RSS
  feed proper.  Normally that's HTML, with all the attendant sloppiness,
  rather than well-formed XML.

 The reason is that the 1.0.x series did generate HTML for rendering.
 1.2.x uses XHTML which automatically includes namespace checks. And to
 be honest I see no easy solution for embedded namespaces. Is the feed
 reader to be expected to extract and merge namespaces defined by the
 feed (and different ones over time!) into the XHTML generated to
 render items? I think it is technically possible, but also really
 troublesome.

I see.  We don't have any marker indicating what sort of data the
description is, unfortunately.  I think expecting it to be
well-formed XHTML may be... a little overly optimistic, given the
sorts of things that generate RSS feeds.

Maybe there's some manual way to avoid this problem - allowing the
user to manually bind a specific prefix?

If there isn't, I can hack up a filter for my local LJ feeds.  But it
would be nice if it worked out of the box.


For correctness LJ should provide Atom feeds which wrap everything in a

<div lj:ns="http://livejournal.com/something">

Of course prefix lj and URL are fictional and should be replaced
with the real values I do not know. Alternatively as you suggest we
could have a filter adding it afterwards.

Lars





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Daniel Jacobowitz
On Fri, Jun 29, 2007 at 10:37:38PM +0200, Lars Lindner wrote:
 For correctness LJ should provide Atom feeds which wrap everything in a
 
 <div lj:ns="http://livejournal.com/something">
 
 Of course prefix lj and URL are fictional and should be replaced
 with the real values I do not know. Alternatively as you suggest we
 could have a filter adding it afterwards.

Hmm, the header is:

<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>

It's just when you go to parse the description that that namespace
prefix is not applied; I presume the description is treated as a
separate document.
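
The effect is easy to reproduce outside Liferea. The sketch below uses Python's standard-library parser (expat) rather than libxml2, so the wording of the error differs, and the feed and the lj:user element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down feed: lj is declared on <rss>, but the description
# carries its markup as escaped text, not as child elements.
FEED = """<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
  <channel><item>
    <description>&lt;p&gt;Hi &lt;lj:user name="someone"/&gt;&lt;/p&gt;</description>
  </item></channel>
</rss>"""


def parse_description(feed_xml):
    """Return (fragment, error); error is None if the fragment parses on its own."""
    root = ET.fromstring(feed_xml)  # the feed itself is well-formed
    fragment = root.find("./channel/item/description").text
    try:
        ET.fromstring(fragment)  # parsed as a separate document
    except ET.ParseError as exc:
        return fragment, str(exc)
    return fragment, None


fragment, error = parse_description(FEED)
print(fragment)
print(error)
```

The feed parses cleanly because the declaration on the rss element is in scope there. The unescaped description is a new document with no declarations of its own, so the same prefix is unbound.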

Is that at all useful?

-- 
Daniel Jacobowitz
CodeSourcery





Bug#430782: liferea: Please be more flexible with XML input

2007-06-29 Thread Karl Sackett
I've run into the same problem and I can add some observations.  It's 
not just LiveJournal feeds that generate this error message from 
libxml2.  I'm dealing with a malformed feed hosted by Typepad that's 
causing the same problem.


Also, if I try to read such a feed in Combined View it will cause 
Liferea to lock up.  The only way I've found to clear the problem is to 
clear the feed's folder of other feeds and delete the folder, taking the 
bad feed with it.  This may take several attempts as it usually causes 
Liferea to crash.  Then I resubscribe to the feed and am careful to read 
it only in Normal view.


It would be helpful if someone could confirm this.

--
Karl Sackett K4KRS[EMAIL PROTECTED]

I have always looked on disobedience toward the oppressive as the only
 way to use the miracle of having been born.
 Oriana Fallaci





Bug#430782: liferea: Please be more flexible with XML input

2007-06-28 Thread Daniel Jacobowitz
Package: liferea
Version: 1.2.16b-1
Followup-For: Bug #430782

The error message is accurate:

XML Parsing Error: prefix not bound to a namespace
Location: file:///
Line Number 450, Column 119:

The XML isn't valid, but many feeds seem to have this level of
inaccuracy and it would be much more useful for liferea (or the
renderer) to cope.  LiveJournal seems to be pretty careless though; it
also outputs some utf-8 XML files that are not valid utf-8 :-(

I don't think the error message is coming from liferea, but I don't
know if it comes from libxml2 or xul.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.21 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages liferea depends on:
ii  gconf2                   2.18.0.1-3        GNOME configuration database syste
ii  libatk1.0-0              1.18.0-2          The ATK accessibility toolkit
ii  libc6                    2.5-11            GNU C Library: Shared libraries
ii  libcairo2                1.4.6-1.1         The Cairo 2D vector graphics libra
ii  libdbus-1-3              1.1.1-1           simple interprocess messaging syst
ii  libdbus-glib-1-2         0.73-2            simple interprocess messaging syst
ii  libfontconfig1           2.4.2-1.2         generic font configuration library
ii  libgcc1                  1:4.2-20070609-1  GCC support library
ii  libgconf2-4              2.18.0.1-3        GNOME configuration database syste
ii  libgcrypt11              1.2.4-2           LGPL Crypto library - runtime libr
ii  libglib2.0-0             2.12.12-1         The GLib library of C routines
ii  libgnutls13              1.6.3-1           the GNU TLS library - runtime libr
ii  libgtk2.0-0              2.10.13-1         The GTK+ graphical user interface
ii  libice6                  1:1.0.3-2         X11 Inter-Client Exchange library
ii  liblua50                 5.0.3-2           Main interpreter library for the L
ii  liblualib50              5.0.3-2           Extension library for the Lua 5.0
ii  libnm-glib0              0.6.4-8           network management framework (GLib
ii  libnotify1 [libnotify1-  0.4.4-3           sends desktop notifications to a n
ii  libnspr4-0d              4.6.6-3           NetScape Portable Runtime Library
ii  liborbit2                1:2.14.7-0.1      libraries for ORBit2 - a CORBA ORB
ii  libpango1.0-0            1.16.4-1          Layout and rendering of internatio
ii  libsm6                   2:1.0.3-1         X11 Session Management library
ii  libstdc++6               4.2-20070609-1    The GNU Standard C++ Library v3
ii  libx11-6                 2:1.0.3-7         X11 client-side library
ii  libxcursor1              1:1.1.8-2         X cursor management library
ii  libxext6                 1:1.0.3-2         X11 miscellaneous extension librar
ii  libxfixes3               1:4.0.3-2         X11 miscellaneous 'fixes' extensio
ii  libxi6                   1:1.0.1-4         X11 Input extension library
ii  libxinerama1             1:1.0.2-1         X11 Xinerama extension library
ii  libxml2                  2.6.29.dfsg-1     GNOME XML library
ii  libxrandr2               2:1.2.1-1         X11 RandR extension library
ii  libxrender1              1:0.9.2-1         X Rendering Extension client libra
ii  libxslt1.1               1.1.21-1          XSLT processing library - runtime
ii  libxul0d                 1.8.1.4-2         Gecko engine library
ii  zlib1g                   1:1.2.3-15        compression library - runtime

Versions of packages liferea recommends:
ii  dbus                     1.1.1-1           simple interprocess messaging syst
ii  dbus-x11                 1.1.1-1           simple interprocess messaging syst

-- no debconf information

