Bug#430782: liferea: Please be more flexible with XML input
On Fri, Sep 21, 2007 at 12:50:19AM +0200, Lars Lindner wrote:
> Thanks for the link. It helped me to create a solution. Now Liferea copies all global namespace definitions to the generated XHTML root node. This solves the problem for me. Fix available in SVN to be released with 1.4.3

Thanks again for your help. It seems to work, though it does not help me much... it turns out that even with that, the feeds I was having trouble with are still not valid XHTML. They use e.g. the x:num and x:str attributes generated by Excel without appropriate div tags. This was fine when they were rendered as HTML, but now they fail the namespace checks. :-(

--
Daniel Jacobowitz
CodeSourcery

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
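One way to see which prefixes still fail the namespace check is to list the prefixes a description uses and subtract those the feed declares. The following is only a rough Python sketch of that diagnosis, not anything Liferea does: the helper name `unbound_prefixes`, the regex-based scan, and the sample fragment are all assumptions made for illustration, and the scan ignores `xmlns:` declarations made inside the fragment itself.

```python
import re

# Start tags and empty-element tags; this sketch assumes no ">" inside attribute values.
TAG = re.compile(r"<[A-Za-z_][^<>]*>")
# Quoted attribute values, blanked out so colons inside them are not mistaken for prefixes.
QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")
# A prefix is the part before ":" in an element name (right after "<")
# or in an attribute name (right after whitespace).
PREFIXED = re.compile(r"(?:^<|\s)([A-Za-z_][\w.-]*):[A-Za-z_]")


def unbound_prefixes(fragment, declared):
    """Return the prefixes used in tag or attribute names but not in `declared`."""
    used = set()
    for tag in TAG.findall(fragment):
        bare = QUOTED.sub('""', tag)
        used.update(PREFIXED.findall(bare))
    # "xml" and "xmlns" are reserved and never need a declaration.
    return used - set(declared) - {"xml", "xmlns"}


# Hypothetical description content: Excel-style attributes plus an lj-prefixed element.
sample = '<table><tr><td x:num="42" style="a:b c:d">42</td></tr></table><lj:user name="a"/>'
print(sorted(unbound_prefixes(sample, {"lj"})))  # prints ['x']
```

With only the feed's `lj` prefix declared, the Excel `x` prefix is reported as unbound, which matches the failure described above.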
Bug#430782: liferea: Please be more flexible with XML input
On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
> On Fri, Jun 29, 2007 at 10:37:38PM +0200, Lars Lindner wrote:
> > For correctness LJ should provide Atom feeds which wrap everything in a <div xmlns:lj="http://livejournal.com/something">. Of course prefix lj and URL are fictional and should be replaced with the real values I do not know. Alternatively as you suggest we could have a filter adding it afterwards.
>
> Hmm, the header is:
> <rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
>
> It's just when you go to parse the description that that namespace prefix is not applied; I presume the description is treated as a separate document. Is that at all useful?

Yes, it is. I've thought a while over it and I think it is a good idea to treat global namespaces and add them to the extracted feed items content.

What I'm missing at the moment is a LiveJournal example feed. I searched a bit but could not find any RSS feeds on LiveJournal. It seems like they by default only advertise Atom feeds (which of course is a good idea). Maybe you could send me a copy of a feed? Or URLs of freely available feeds with the problem?

Lars
Bug#430782: liferea: Please be more flexible with XML input
On Thu, Sep 20, 2007 at 11:57:50PM +0200, Lars Lindner wrote:
> Yes, it is. I've thought a while over it and I think it is a good idea to treat global namespaces and add them to the extracted feed items content.

Great! Thanks for getting back to me.

> What I'm missing at the moment is a LiveJournal example feed. I searched a bit but could not find any RSS feeds on LiveJournal. It seems like they by default only advertise Atom feeds (which of course is a good idea). Maybe you could send me a copy of a feed? Or URLs of freely available feeds with the problem?

RSS feeds are at http://username.livejournal.com/data/rss. I picked one off the front page and here you go:

http://community.livejournal.com/nflfans/data/rss

Right at the moment the first and fifth items in the feed show this bug.

--
Daniel Jacobowitz
CodeSourcery
Bug#430782: liferea: Please be more flexible with XML input
On Fri, Sep 21, 2007 at 12:50:19AM +0200, Lars Lindner wrote:
> Thanks for the link. It helped me to create a solution. Now Liferea copies all global namespace definitions to the generated XHTML root node. This solves the problem for me. Fix available in SVN to be released with 1.4.3

Thanks!

--
Daniel Jacobowitz
CodeSourcery
Bug#430782: liferea: Please be more flexible with XML input
On 9/21/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
> On Thu, Sep 20, 2007 at 11:57:50PM +0200, Lars Lindner wrote:
> > Yes, it is. I've thought a while over it and I think it is a good idea to treat global namespaces and add them to the extracted feed items content.
>
> Great! Thanks for getting back to me.
>
> > What I'm missing at the moment is a LiveJournal example feed. I searched a bit but could not find any RSS feeds on LiveJournal. It seems like they by default only advertise Atom feeds (which of course is a good idea). Maybe you could send me a copy of a feed? Or URLs of freely available feeds with the problem?
>
> RSS feeds are at http://username.livejournal.com/data/rss. I picked one off the front page and here you go:
>
> http://community.livejournal.com/nflfans/data/rss
>
> Right at the moment the first and fifth items in the feed show this bug.

Thanks for the link. It helped me to create a solution. Now Liferea copies all global namespace definitions to the generated XHTML root node. This solves the problem for me.

Fix available in SVN, to be released with 1.4.3.

Lars
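A minimal sketch of the idea described above, copying the feed's global namespace declarations onto the root of the generated XHTML. It is written in Python with the standard library rather than Liferea's C/libxml2 code, so it illustrates the approach only. The sample feed, the `lj:user` element, and the helper names `global_namespaces` and `wrap_description` are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical feed: the root declares the lj prefix, the escaped description uses it.
FEED = b"""<?xml version='1.0' encoding='utf-8'?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
  <channel>
    <item>
      <description>&lt;p&gt;Posted by &lt;lj:user name="example"/&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def global_namespaces(feed_bytes):
    """Collect the prefix -> URI bindings declared on the feed's root element."""
    bindings = {}
    events = ET.iterparse(io.BytesIO(feed_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start":
            break  # root element reached; its declarations were reported before it
        prefix, uri = payload
        bindings[prefix] = uri
    return bindings


def wrap_description(fragment, bindings):
    """Wrap an item description in an XHTML div that carries the feed's bindings."""
    attrs = [f"xmlns={quoteattr(XHTML_NS)}"]
    for prefix, uri in bindings.items():
        if prefix:  # a default namespace from the feed must not override XHTML's
            attrs.append(f"xmlns:{prefix}={quoteattr(uri)}")
    return f"<div {' '.join(attrs)}>{fragment}</div>"


feed_root = ET.fromstring(FEED)
fragment = feed_root.findtext("channel/item/description")
xhtml = ET.fromstring(wrap_description(fragment, global_namespaces(FEED)))
print(xhtml.tag)
```

Because the wrapper now declares `lj`, the description parses as a standalone document. Prefixes that the feed itself never declares would still be rejected.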
Bug#430782: liferea: Please be more flexible with XML input
On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
> Package: liferea
> Version: 1.2.16b-1
> Followup-For: Bug #430782
>
> The error message is accurate:
>
> XML Parsing Error: prefix not bound to a namespace
> Location: file:///
> Line Number 450, Column 119:
>
> The XML isn't valid, but many feeds seem to have this level of inaccuracy and it would be much more useful for liferea (or the renderer) to cope. LiveJournal seems to be pretty careless though; it also outputs some utf-8 XML files that are not valid utf-8 :-(
>
> I don't think the error message is coming from liferea, but I don't know if it comes from libxml2 or xul.

You are correct this is an error message given by libxml2. But you are totally wrong about handling invalid XML. The core idea of XML is to guarantee applications a correct content encoding by ensuring well-formedness and validity of the given data. So suggesting to have weak XML parsing invalidates the idea of XML itself. Also what should a parser do with a file that contains for example partly UTF-8 content and partly Latin-1? There is no way to decide what to do with the byte mess.

With XML the rule is applications should *ALWAYS* refuse non-wellformed content. Also when using a library for parsing the application has no way to force tolerant parsing. As for libxml2 I know for sure that the author clearly disagrees with applications wanting to do tolerant parsing.

Lars
Bug#430782: liferea: Please be more flexible with XML input
On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
> You are correct this is an error message given by libxml2. But you are totally wrong about handling invalid XML. The core idea of XML is to guarantee applications a correct content encoding by ensuring well-formedness and validity of the given data. So suggesting to have weak XML parsing invalidates the idea of XML itself. Also what should a parser do with a file that contains for example partly UTF-8 content and partly Latin-1? There is no way to decide what to do with the byte mess.

You'll note that I carefully did not suggest Liferea should be tolerant of the messed up UTF-8; I was just complaining about it. I fixed that elsewhere by judicious use of iconv and outside knowledge. An unbound prefix is a very different sort of error from invalid UTF-8.

> With XML the rule is applications should *ALWAYS* refuse non-wellformed content. Also when using a library for parsing the application has no way to force tolerant parsing. As for libxml2 I know for sure that the author clearly disagrees with applications wanting to do tolerant parsing.

In any case, previous versions of liferea were able to display these common entries without trouble. I don't know if that means it did not push article bodies through libxml2; I think it somewhat likely, since this is the escaped contents of the description, not part of the RSS feed proper. Normally that's HTML, with all the attendant sloppiness, rather than well-formed XML.

--
Daniel Jacobowitz
CodeSourcery
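The iconv fix mentioned above amounts to re-encoding a feed once its real encoding is known from outside. A rough Python equivalent, as a sketch only: the function name `to_utf8` and the ISO-8859-1 default are assumptions, and this is not the filter actually used in the thread.

```python
def to_utf8(raw: bytes, assumed_legacy: str = "iso-8859-1") -> bytes:
    """Return `raw` as valid UTF-8.

    Bytes that already decode as UTF-8 are passed through untouched.
    Otherwise the whole document is treated as `assumed_legacy`, which is
    the "outside knowledge" part: the caller has to know the real encoding.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(assumed_legacy).encode("utf-8")


# A lone 0xE9 byte is not valid UTF-8, so the legacy encoding is applied.
print(to_utf8(b"caf\xe9"))
```

This handles a document that is wholly in one wrong encoding. It does not attempt the mixed UTF-8 and Latin-1 case raised earlier in the thread, where there is no reliable way to decide per byte.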
Bug#430782: liferea: Please be more flexible with XML input
On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
> On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
> > You are correct this is an error message given by libxml2. But you are totally wrong about handling invalid XML. The core idea of XML is to guarantee applications a correct content encoding by ensuring well-formedness and validity of the given data. So suggesting to have weak XML parsing invalidates the idea of XML itself. Also what should a parser do with a file that contains for example partly UTF-8 content and partly Latin-1? There is no way to decide what to do with the byte mess.
>
> You'll note that I carefully did not suggest Liferea should be tolerant of the messed up UTF-8; I was just complaining about it. I fixed that elsewhere by judicious use of iconv and outside knowledge. An unbound prefix is a very different sort of error from invalid UTF-8.

Well, it is still complaining :-)

> > With XML the rule is applications should *ALWAYS* refuse non-wellformed content. Also when using a library for parsing the application has no way to force tolerant parsing. As for libxml2 I know for sure that the author clearly disagrees with applications wanting to do tolerant parsing.
>
> In any case, previous versions of liferea were able to display these common entries without trouble. I don't know if that means it did not push article bodies through libxml2; I think it somewhat likely, since this is the escaped contents of the description, not part of the RSS feed proper. Normally that's HTML, with all the attendant sloppiness, rather than well-formed XML.

The reason is that the 1.0.x series did generate HTML for rendering. 1.2.x uses XHTML which automatically includes namespace checks. And to be honest I see no easy solution for embedded namespaces. Is the feed reader to be expected to extract and merge namespaces defined by the feed (and different ones over time!) into the XHTML generated to render items?

I think it is technically possible, but also really troublesome.

Lars
Bug#430782: liferea: Please be more flexible with XML input
On Fri, Jun 29, 2007 at 09:32:10PM +0200, Lars Lindner wrote:
> > > With XML the rule is applications should *ALWAYS* refuse non-wellformed content. Also when using a library for parsing the application has no way to force tolerant parsing. As for libxml2 I know for sure that the author clearly disagrees with applications wanting to do tolerant parsing.
> >
> > In any case, previous versions of liferea were able to display these common entries without trouble. I don't know if that means it did not push article bodies through libxml2; I think it somewhat likely, since this is the escaped contents of the description, not part of the RSS feed proper. Normally that's HTML, with all the attendant sloppiness, rather than well-formed XML.
>
> The reason is that the 1.0.x series did generate HTML for rendering. 1.2.x uses XHTML which automatically includes namespace checks. And to be honest I see no easy solution for embedded namespaces. Is the feed reader to be expected to extract and merge namespaces defined by the feed (and different ones over time!) into the XHTML generated to render items?
>
> I think it is technically possible, but also really troublesome.

I see. We don't have any marker indicating what sort of data the description is, unfortunately. I think expecting it to be well-formed XHTML may be... a little overly optimistic, given the sorts of things that generate RSS feeds.

Maybe there's some manual way to avoid this problem - allowing the user to manually bind a specific prefix? If there isn't, I can hack up a filter for my local LJ feeds. But it would be nice if it worked out of the box.

--
Daniel Jacobowitz
CodeSourcery
Bug#430782: liferea: Please be more flexible with XML input
On 6/29/07, Daniel Jacobowitz [EMAIL PROTECTED] wrote:
> On Fri, Jun 29, 2007 at 09:32:10PM +0200, Lars Lindner wrote:
> > > > With XML the rule is applications should *ALWAYS* refuse non-wellformed content. Also when using a library for parsing the application has no way to force tolerant parsing. As for libxml2 I know for sure that the author clearly disagrees with applications wanting to do tolerant parsing.
> > >
> > > In any case, previous versions of liferea were able to display these common entries without trouble. I don't know if that means it did not push article bodies through libxml2; I think it somewhat likely, since this is the escaped contents of the description, not part of the RSS feed proper. Normally that's HTML, with all the attendant sloppiness, rather than well-formed XML.
> >
> > The reason is that the 1.0.x series did generate HTML for rendering. 1.2.x uses XHTML which automatically includes namespace checks. And to be honest I see no easy solution for embedded namespaces. Is the feed reader to be expected to extract and merge namespaces defined by the feed (and different ones over time!) into the XHTML generated to render items?
> >
> > I think it is technically possible, but also really troublesome.
>
> I see. We don't have any marker indicating what sort of data the description is, unfortunately. I think expecting it to be well-formed XHTML may be... a little overly optimistic, given the sorts of things that generate RSS feeds.
>
> Maybe there's some manual way to avoid this problem - allowing the user to manually bind a specific prefix? If there isn't, I can hack up a filter for my local LJ feeds. But it would be nice if it worked out of the box.

For correctness LJ should provide Atom feeds which wrap everything in a <div xmlns:lj="http://livejournal.com/something">. Of course prefix lj and URL are fictional and should be replaced with the real values I do not know. Alternatively as you suggest we could have a filter adding it afterwards.

Lars
Bug#430782: liferea: Please be more flexible with XML input
On Fri, Jun 29, 2007 at 10:37:38PM +0200, Lars Lindner wrote:
> For correctness LJ should provide Atom feeds which wrap everything in a <div xmlns:lj="http://livejournal.com/something">. Of course prefix lj and URL are fictional and should be replaced with the real values I do not know. Alternatively as you suggest we could have a filter adding it afterwards.

Hmm, the header is:

<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>

It's just when you go to parse the description that that namespace prefix is not applied; I presume the description is treated as a separate document. Is that at all useful?

--
Daniel Jacobowitz
CodeSourcery
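The behaviour described above can be reproduced outside Liferea. The Python sketch below uses the rss header quoted in this message, but the item content and the `lj:user` element are made up for illustration. The feed as a whole parses, while the unescaped description, parsed as its own document, no longer sees the binding made on the rss element.

```python
import xml.etree.ElementTree as ET

# Root element as quoted in the thread; the item below is a hypothetical example.
FEED = """<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
  <channel><item>
    <description>&lt;lj:user name="example"/&gt; wrote something</description>
  </item></channel>
</rss>"""

# The feed itself is well-formed: the description is just escaped text here.
root = ET.fromstring(FEED)
fragment = root.findtext("channel/item/description")

# Parsed as a separate document, the fragment has lost the lj binding.
try:
    ET.fromstring(f"<div>{fragment}</div>")
    error = None
except ET.ParseError as exc:
    error = str(exc)

print(fragment)
print(error)
```

The second parse fails with an unbound-prefix error, the same class of error as the one in the original report.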
Bug#430782: liferea: Please be more flexible with XML input
I've run into the same problem and I can add some observations.

It's not just LiveJournal feeds that generate this error message from libxml2. I'm dealing with a malformed feed hosted by Typepad that's causing the same problem.

Also, if I try to read such a feed in Combined View it will cause Liferea to lock up. The only way I've found to clear the problem is to clear the feed's folder of other feeds and delete the folder, taking the bad feed with it. This may take several attempts as it usually causes Liferea to crash. Then I resubscribe to the feed and am careful to read it only in Normal View.

It would be helpful if someone could confirm this.

--
Karl Sackett K4KRS [EMAIL PROTECTED]

I have always looked on disobedience toward the oppressive as the only way to use the miracle of having been born.
Oriana Fallaci
Bug#430782: liferea: Please be more flexible with XML input
Package: liferea
Version: 1.2.16b-1
Followup-For: Bug #430782

The error message is accurate:

XML Parsing Error: prefix not bound to a namespace
Location: file:///
Line Number 450, Column 119:

The XML isn't valid, but many feeds seem to have this level of inaccuracy and it would be much more useful for liferea (or the renderer) to cope. LiveJournal seems to be pretty careless though; it also outputs some utf-8 XML files that are not valid utf-8 :-(

I don't think the error message is coming from liferea, but I don't know if it comes from libxml2 or xul.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.21 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages liferea depends on:
ii  gconf2                    2.18.0.1-3        GNOME configuration database syste
ii  libatk1.0-0               1.18.0-2          The ATK accessibility toolkit
ii  libc6                     2.5-11            GNU C Library: Shared libraries
ii  libcairo2                 1.4.6-1.1         The Cairo 2D vector graphics libra
ii  libdbus-1-3               1.1.1-1           simple interprocess messaging syst
ii  libdbus-glib-1-2          0.73-2            simple interprocess messaging syst
ii  libfontconfig1            2.4.2-1.2         generic font configuration library
ii  libgcc1                   1:4.2-20070609-1  GCC support library
ii  libgconf2-4               2.18.0.1-3        GNOME configuration database syste
ii  libgcrypt11               1.2.4-2           LGPL Crypto library - runtime libr
ii  libglib2.0-0              2.12.12-1         The GLib library of C routines
ii  libgnutls13               1.6.3-1           the GNU TLS library - runtime libr
ii  libgtk2.0-0               2.10.13-1         The GTK+ graphical user interface
ii  libice6                   1:1.0.3-2         X11 Inter-Client Exchange library
ii  liblua50                  5.0.3-2           Main interpreter library for the L
ii  liblualib50               5.0.3-2           Extension library for the Lua 5.0
ii  libnm-glib0               0.6.4-8           network management framework (GLib
ii  libnotify1 [libnotify1-   0.4.4-3           sends desktop notifications to a n
ii  libnspr4-0d               4.6.6-3           NetScape Portable Runtime Library
ii  liborbit2                 1:2.14.7-0.1      libraries for ORBit2 - a CORBA ORB
ii  libpango1.0-0             1.16.4-1          Layout and rendering of internatio
ii  libsm6                    2:1.0.3-1         X11 Session Management library
ii  libstdc++6                4.2-20070609-1    The GNU Standard C++ Library v3
ii  libx11-6                  2:1.0.3-7         X11 client-side library
ii  libxcursor1               1:1.1.8-2         X cursor management library
ii  libxext6                  1:1.0.3-2         X11 miscellaneous extension librar
ii  libxfixes3                1:4.0.3-2         X11 miscellaneous 'fixes' extensio
ii  libxi6                    1:1.0.1-4         X11 Input extension library
ii  libxinerama1              1:1.0.2-1         X11 Xinerama extension library
ii  libxml2                   2.6.29.dfsg-1     GNOME XML library
ii  libxrandr2                2:1.2.1-1         X11 RandR extension library
ii  libxrender1               1:0.9.2-1         X Rendering Extension client libra
ii  libxslt1.1                1.1.21-1          XSLT processing library - runtime
ii  libxul0d                  1.8.1.4-2         Gecko engine library
ii  zlib1g                    1:1.2.3-15        compression library - runtime

Versions of packages liferea recommends:
ii  dbus                      1.1.1-1           simple interprocess messaging syst
ii  dbus-x11                  1.1.1-1           simple interprocess messaging syst

-- no debconf information