Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-23 Thread David Powell


Thursday, February 23, 2006, 6:37:50 AM, you wrote:

> Does someone who has access to an MSFT system care to take a
> look at this?

I have been playing with IE7, and it is interesting to see what
happens when you click on a feed and "view source".

If the feed hasn't been subscribed to, you just see the feed source as
you would expect.

If you have subscribed to the feed however, you see Windows's internal
representation of the feed, which is normalised to a sort of RSS2++. I
assume that this is what is exposed when you use the APIs to access
the XML.

(Hmm - giving access to the XML in this way is a brave move. XML has a
huge surface area for an API: practically any change to the XML
produced by Windows could break client applications, and I didn't find
any documentation for the normalised RSS2++.)

What is interesting is that Atom is handled (reasonably well) by
converting the Atom to RSS2. The logic seems to replace Atom elements
with their RSS2 equivalents where the loss in fidelity is not too great
(eg atom:updated -> pubDate), and to leave the Atom as-is for awkward
cases (eg: [EMAIL PROTECTED]/xml).
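
To make the atom:updated -> pubDate mapping concrete, here is a minimal
Python sketch of one way such a conversion could be done. It is an
illustration only, not a description of what Windows actually does; the
function name atom_updated_to_pubdate is hypothetical.

```python
from datetime import datetime
from email.utils import format_datetime


def atom_updated_to_pubdate(updated):
    """Convert an RFC 3339 atom:updated value to an RFC 822 style pubDate."""
    # Older versions of datetime.fromisoformat() reject a trailing "Z",
    # so spell the UTC offset out explicitly before parsing.
    if updated.endswith(("Z", "z")):
        updated = updated[:-1] + "+00:00"
    moment = datetime.fromisoformat(updated)
    # format_datetime() emits the RFC 822/2822 date layout used by pubDate.
    return format_datetime(moment)


print(atom_updated_to_pubdate("2006-02-23T06:37:50Z"))
# prints: Thu, 23 Feb 2006 06:37:50 +0000
```

The date itself survives this mapping; anything finer than whole seconds
would not, because the target layout has no field for it.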

There is definitely some loss in fidelity though.  It would be nice to
run an extreme Atom feed through the process to see what gets lost.
xml:base appears to get corrupted, and unless the API provides access
to the baseURI of each entry there is a risk of data loss: the
xml:base at feed level may change between polls, so it needs to be
preserved with each entry.
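
A small Python sketch of why the base URI matters per entry: relative
links only mean something once resolved against the xml:base in scope.
The sample feed, the example.org URLs and the function name entry_links
are all made up for illustration, and the sketch only looks at xml:base
on the feed and entry elements, not on deeper elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical feed: one entry overrides the feed-level base, one inherits it.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://example.org/blog/">
  <entry xml:base="2006/02/">
    <link href="enclosure-test"/>
  </entry>
  <entry>
    <link href="about"/>
  </entry>
</feed>"""


def entry_links(feed_xml, document_uri):
    """Return each entry's first link, resolved against its in-scope base."""
    feed = ET.fromstring(feed_xml)
    # The feed-level base is itself resolved against the document URI.
    feed_base = urljoin(document_uri, feed.get(XML_BASE, ""))
    resolved = []
    for entry in feed.findall(ATOM + "entry"):
        # An entry-level xml:base is relative to the feed-level one.
        entry_base = urljoin(feed_base, entry.get(XML_BASE, ""))
        link = entry.find(ATOM + "link")
        resolved.append(urljoin(entry_base, link.get("href")))
    return resolved


print(entry_links(FEED, "http://example.org/feed.atom"))
```

If only the raw href values were stored, a later change to the
feed-level xml:base would silently repoint every old entry.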

Does anyone have a bad-ass atom feed with IRIs, binary content,
atom:source, xml:base, xml:lang, extensions etc for testing?

-- 
Dave



RE: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread Sean Lyndersay


The normalized XML that you're seeing in View Source is also accessible from 
the feed APIs, so the XML we generate is a format we expect to support in 
perpetuity. 

It's designed to be a relatively simple format that application developers can 
rely on in the same way that they rely on APIs in the object model, so we map 
all common elements from other formats into RSS 2.0 (the basis for our native 
format). Why RSS 2.0? Because it's the format used by the majority of feeds on 
the web. Since this is an internal format between the platform and its clients, 
it theoretically doesn't matter what we chose as long as there's no data loss 
(and as long as we document it -- which we're in the process of doing). In the 
Atom case, in particular, we occasionally need to bring Atom elements through 
as RSS 2.0 extensions. 

Any case of data-loss is a bug that we'll address (that's the point of a Beta 
:). If you have cases of sites where there is data-loss, you can either send it 
to me, send it to [EMAIL PROTECTED] or post to the feedback wiki where we're 
tracking feeds that we're not handling correctly [1].

I'm in the process of publishing the documentation for how the Windows RSS 
Platform handles each feed format on our blog [2].

If someone does have a particularly complex Atom feed, we'd love to use it for 
our own testing to make sure we're handling all of the Atom-specific data 
correctly, so just send me a link.

As a general statement, if you have questions about what IE7/Windows is or is 
not doing with feeds, just drop me a line. 

Thanks,

Sean 

 [1] http://channel9.msdn.com/wiki/default.aspx/Channel9.InternetExplorerFeedIssues
 [2] http://blogs.msdn.com/rssteam






Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread James M Snell

Thanks for the heads up on this.  Check my blog[1] for a few comments

[1] http://www.snellspace.com/wp/?p=268

- James




RE: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread Sean Lyndersay

Thanks James,

There's a general comment about normalization that I should address. First, 
normalization is something we have to do -- every aggregator on the planet has 
a native format that they work with, and they convert to that format before 
doing anything else. In our case, since we're developing a platform as well, we 
make that normalized format available to any user of the platform. We also 
provide an object model over the normalized feed.

The primary reason for providing XML is that some developers find it easier to 
build applications using XML+XSL/XPath, rather than using an object model. The 
other major reason is to enable access to any elements that we don't natively 
support (i.e. most extensions) in the object model. To make this easier, we 
provide access to a normalized XML version of the entire feed, or a fragment that 
represents a single item. 
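
As a rough illustration of reaching extension elements through the XML
rather than an object model, here is a Python sketch that lists every
namespaced child of a single item fragment. The fragment is invented
for the example: the normalized format had not been documented at the
time of this thread, so the element layout shown is an assumption, as
is the function name extension_elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized item; the real layout may differ.
ITEM = """<item xmlns:atom="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Enclosure test</title>
  <pubDate>Thu, 23 Feb 2006 06:37:50 +0000</pubDate>
  <atom:summary type="text">A short summary</atom:summary>
  <dc:creator>Dave</dc:creator>
</item>"""


def extension_elements(item_xml):
    """Return (namespace, local name, text) for every namespaced child."""
    item = ET.fromstring(item_xml)
    found = []
    for child in item:
        # ElementTree writes namespaced tags as "{namespace}local".
        if child.tag.startswith("{"):
            namespace, local = child.tag[1:].split("}", 1)
            found.append((namespace, local, child.text))
    return found


for namespace, local, text in extension_elements(ITEM):
    print(namespace, local, text)
```

Core RSS 2.0 elements such as title and pubDate carry no namespace, so
they are skipped; everything else is treated as an extension.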

Giving the original format to the developer is of questionable value since that 
makes the developer have to do all the work of learning 4 different feed 
formats just to get the data they need. For this reason, we kept the normalized 
format as close to the most commonly used format (RSS 2.0) as possible, with 
extensions where necessary to keep the spirit of the Atom 1.0 format intact. 

Obviously, we have some bugs still, as you've pointed out, which we'll address 
for the next Beta.

I'm sure that many people -- on this list in particular -- think that the right 
thing to do is to normalize to Atom 1.0, instead. Yep, that's certainly one way 
to think about it. But then I'd be having this same discussion with Dave and 
with folks on rss-public. :) In short, I'd rather avoid the issue altogether 
and provide some value to the developers who are using the platform -- which 
means preventing them from having to learn several different formats to get 
common data, while allowing them to get access to extensions.

Addressing your post's issues in order: 

#1. We make no active effort to re-order the items in the feed. By default, 
they should end up being ordered by the date elements in the feed. As you 
noted, neither Atom nor RSS 2.0/1.0 requires that order be respected. 

#2. Stripping out extensions. This is a bug we'll address.

#3. Invalid Atom. This is a bug we'll address.

#4. We'll look into addressing this with a namespace. 


Thanks for the detailed post and comments. If you have any other questions, 
just let me know.
Sean


Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread James M Snell



Sean Lyndersay wrote:
> Thanks James,
> 
> There's a general comment about normalization that I should address. First,
> normalization is something we have to do -- every aggregator on the planet
> has a native format that they work with, and they convert to that format
> before doing anything else. In our case, since we're developing a platform
> as well, we make that normalized format available to any user of the
> platform. We also provide an object model over the normalized feed.
>

Yep, completely understood, but it's still rather annoying that I can't
seem to be able to get to the original input... or at the very least,
have an option of retrieving the XML as Atom if the original input was
Atom... if not by default, then as an additional API call.

If the MS RSS stuff was *just* an aggregator, I wouldn't care so much
about this, but given that it's also a platform, I, as a publisher,
don't take too kindly to that platform completely changing and hiding
what I originally published from developers.

> Giving the original format to the developer is of questionable value since
> that makes the developer have to do all the work of learning 4 different
> feed formats just to get the data they need. For this reason, we kept the
> normalized format as close to the most commonly used format

Like I said, providing the normalized form is a reasonable default
behavior, but for those sick and twisted individuals that want to get to
the original format, you should let them.

> Obviously, we have some bugs still, as you've pointed out, which we'll 
> address for the next Beta.
> 

:-D

> I'm sure that many people -- on this list in particular -- think that the
> right thing to do is to normalize to Atom 1.0, instead.

I would have preferred that, but I can understand the choice y'all made.

> #1. We make no active effort to re-order the items in the feed. By default,
> they should end up being ordered by the date elements in the feed. As you
> noted, neither Atom nor RSS 2.0/1.0 requires that order be respected.
> 

Would it be reasonable to make sure that it sorts 'em in descending
order by date? :-)
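
For what it is worth, a descending sort on RFC 822 dates is only a few
lines in Python. This is a sketch, not a claim about how the platform
orders items; the sample items and the function name newest_first are
made up.

```python
from email.utils import parsedate_to_datetime


def newest_first(items):
    """Sort (title, pubDate) pairs in descending order of pubDate."""
    # Parsing to timezone-aware datetimes makes the comparison respect
    # the UTC offsets instead of comparing the date strings as text.
    return sorted(
        items,
        key=lambda item: parsedate_to_datetime(item[1]),
        reverse=True,
    )


ITEMS = [
    ("older", "Thu, 23 Feb 2006 06:37:50 +0000"),
    ("newest", "Sat, 25 Feb 2006 00:20:00 +0100"),
    ("middle", "Fri, 24 Feb 2006 14:39:00 -0800"),
]

for title, pub_date in newest_first(ITEMS):
    print(title, pub_date)
```

Note that every sample date carries a numeric offset; mixing in dates
without usable zone information would need extra handling.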

> #2. Stripping out extensions. This is a bug we'll address.
> 
> #3. Invalid Atom. This is a bug we'll address.
> 
> #4. We'll look into addressing this with a namespace. 
> 

Excellent.

> 
> Thanks for the detailed post and comments. If you have any other questions, 
> just let me know.

I'll likely be hitting the API again in the next couple of weeks.  In
the meantime, please make sure y'all are taking a look at the
conformance tests on the Atom wiki. :-)

- James



Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread A. Pagaltzis

Hi Sean,

first of all, thanks for responding. Now, to your points:

* Sean Lyndersay <[EMAIL PROTECTED]> [2006-02-25 00:20]:
>Giving the original format to the developer is of questionable
>value since that makes the developer have to do all the work of
>learning 4 different feed formats just to get the data they
>need.

I agree that for the common simple needs, it is helpful for
developers to standardize on a single format. But why would you
refuse the pristine input to developers who are willing to do
extra work in order to get extra value out of the input? I don't
see any reason not to offer both options; it's not an either/or
question.

>I'm sure that many people -- on this list in particular -- think
>that the right thing to do is to normalize to Atom 1.0, instead.
>Yep, that's certainly one way to think about it. But then I'd be
>having this same discussion with Dave and with folks on
>rss-public. :)

Another way to look at this is that you are not actually
normalising to RSS 2.0.

The Atom model is a proper superset of the RSS 2.0 model, so
anything that can be expressed in RSS 2.0 can also be expressed
faithfully in Atom, whereas the reverse is not true and there is
loss of fidelity. You overcome this by adding namespaced Atom
elements to RSS 2.0; put plainly, you end up with a format that
isn't RSS 2.0 so much as a funky Atom-in-RSS-2.0 thing.

But when all is said and done, the names between the angle
brackets don't matter much. My concern is rather that the
benefits that the Atom model brings to the table, both in added
capabilities as well as in rigor, be available to as many users
as soon as possible.

And to that end, the following:

>In short, I'd rather avoid the issue altogether and provide some
>value to the developers who are using the platform -- which
>means preventing them from having to learn several different
>formats to get common data, while allowing them to get access to
>extensions.

could be a great opportunity. But my question here is, how will
the Atom extensions be documented? Do they constitute an integral
part of the core normalized format? Or will they just be
second-class citizens that developers will have to be
individually persuaded to support? How much of the added
capabilities will be available through the API?

In a nutshell, how likely is it that developers aiming to do the
simplest thing that can possibly work will support the full
capabilities of Atom?

Regards,
-- 
Aristotle Pagaltzis // 



Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread Robert Sayre

On 2/24/06, Sean Lyndersay <[EMAIL PROTECTED]> wrote:
>
> Giving the original format to the developer is of questionable value since
> that makes the developer have to do all the work of learning 4 different
> feed formats just to get the data they need. For this reason, we kept the
> normalized format as close to the most commonly used format (RSS 2.0) as
> possible, with extensions where necessary to keep the spirit of the
> Atom 1.0 format intact.

Works for me.

--

Robert Sayre

"I would have written a shorter letter, but I did not have the time."



Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread Sam Ruby

Sean Lyndersay wrote:
> 
> The normalized XML that you're seeing in View Source is also
> accessible from the feed APIs, so the XML we generate is a format we
> expect to support in perpetuity.
> 
> It's designed to be a relatively simple format that application
> developers can rely on in the same way that they rely on APIs in the
> object model, so we map all common elements from other formats into
> RSS 2.0 (the basis for our native format). Why RSS 2.0? Because it's
> the format used by the majority of feeds on the web.

From reports I have seen, you are doing things like adding type
attributes on the description element?  If so, it isn't RSS 2.0.

How do you plan to handle multiple enclosures?

How about HTML in titles?

On Feed-Tech I saw a post by Phil Stanhope indicating the importance of
sub-second times in certain scenarios.  How will this be expressed in
RFC 822 format?

How about content that is in a binary format?

I can go on...

> Any case of data-loss is a bug that we'll address

If that is a blanket statement, then I expect that you will be seeing a
lot of bug reports.

- Sam Ruby



RE: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-24 Thread Sean Lyndersay


As a format that is not intended for publishing on the web, but which is 
really a contract between the platform and the clients of the platform, the 
native format we use is not really required to be "pure" RSS 2.0. As I said, we 
use RSS 2.0 as the basis for the native format to leverage developers' 
understanding of that format. I don't think that the addition of a single 
attribute to an element quite invalidates the entire premise, but if it really 
upsets everyone, we can put it in a namespace.

Obviously, the further off the beaten path you get from typical feeds, the less 
like a pure RSS 2.0 feed it will look. 

Especially in the case of Atom 1.0, we detect various cases that aren't 
supported by RSS 2.0, and model them as extensions. Binary content is an 
example where we'll include the entire atom element as an RSS 2.0 extension in 
the feed. 

Since RFC 822 doesn't support sub-second times, we'll truncate those in the XML 
we return, but the sub-second times are available via the object model, if an 
application needs them.
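
The truncation is easy to see in a short Python sketch (illustrative
only; it uses the standard library's RFC 822/2822 date formatting, not
the platform's own code). A timestamp with a quarter-second component
is written out and read back, and the fraction is gone.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# A timestamp with a sub-second component (250000 microseconds).
precise = datetime(2006, 2, 24, 17, 32, 10, 250000, tzinfo=timezone.utc)

# The RFC 822 style layout has no field for fractions of a second.
as_rfc822 = format_datetime(precise)
print(as_rfc822)  # prints: Fri, 24 Feb 2006 17:32:10 +0000

# Reading the string back yields the same instant minus the fraction.
round_tripped = parsedate_to_datetime(as_rfc822)
print(round_tripped == precise.replace(microsecond=0))  # prints: True
print(round_tripped == precise)  # prints: False
```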

Right now, we don't support multiple enclosures, so representing it in the XML 
isn't an issue. When we do - in a future release - we could include multiple 
enclosure elements in the XML, or create an extension (or leverage the Yahoo 
Media RSS extensions). 
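
To sketch the first option, repeated enclosure elements, here is how a
client might read them in Python. The item below is a made-up example
with example.org URLs, and the function name enclosures is
hypothetical; nothing here reflects a shipped format.

```python
import xml.etree.ElementTree as ET

# Hypothetical item carrying two enclosure elements.
ITEM = """<item>
  <title>Two enclosures</title>
  <enclosure url="http://example.org/a.mp3" length="1024" type="audio/mpeg"/>
  <enclosure url="http://example.org/a.ogg" length="2048" type="audio/ogg"/>
</item>"""


def enclosures(item_xml):
    """Return (url, type, length) for every enclosure, in document order."""
    item = ET.fromstring(item_xml)
    return [
        (e.get("url"), e.get("type"), int(e.get("length")))
        for e in item.findall("enclosure")
    ]


for url, media_type, length in enclosures(ITEM):
    print(url, media_type, length)
```

A client that only ever calls find() for a single enclosure would
quietly see just the first one, which is one argument for settling the
representation before applications start depending on it.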

At the end of the day, I'll readily admit that complex Atom 1.0 feeds won't 
look pretty in RSS 2.0+Atom-extensions. That said, the vast majority of all 
Atom feeds on the Internet today convert to RSS 2.0 with zero issues. 

I'll repeat what I said in a different response: This isn't the end of the API 
set, and this feedback is useful for figuring out what needs to happen for the 
next version. 

Thanks,
Sean 




Re: Fwd: [rss-public] Microsoft Feeds API Enclosure Test

2006-02-25 Thread Sam Ruby

Sean Lyndersay wrote:
> 
> As a format that is not intended for publishing on the web, but
> which is really a contract between the platform and the clients of
> the platform, the native format we use is not really required to be
> "pure" RSS 2.0.

Sean,

I'm not sure what you are looking for.  If the position you find
yourself in is that you are feature complete, the format you have
selected you expect to support in perpetuity, you have baked silent data
loss right into the Windows platform, but you will accept bug reports,
then simply realize that bug reports can be created at will.

If, instead, the position is that you have an internal data model which
is open for discussion, then we can have a discussion.  If you look on
the web, you will find plenty of RSS 2.0 feeds that contain small bits
of HTML markup -- things like bold words or italics -- in titles.  You
will find feeds with multiple enclosures.  You will find feeds with
relative references.  The RSS 2.0 spec doesn't disallow any of these,
nor does it specify how such things are to be interpreted.

Even so, such things are out there.  And either they need to be in your
model, or you have data loss or data corruption.

You seem to think that you have picked RSS 2.0.  If you take a look at
the current food fight regarding the RSS Advisory Board, you would
realize that RSS 2.0 is frozen.  There are only two paths forward:
creating new formats with new names, or using extensions.  The
newly reconstituted RSS Advisory Board thought that it could change RSS
by providing a small number of much needed clarifications.  They were
wrong.

By changing what the description element is, you are changing RSS 2.0.
Call what you are creating a new format.  Or use a different element, in
a namespace.  Perhaps atom:summary would do.  Or you could create your own.

- Sam Ruby