Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread A. Pagaltzis

* Sam Ruby <[EMAIL PROTECTED]> [2005-08-22 06:45]:
> SAX is a popular API for dealing with streaming XML (and there
> are a number of "pull parsing" APIs too).

Of course – using a DOM parser is impossible with either approach
anyway.

> With an HTTP client library and SAX, the "absolute simplest
> solution" is what Bob is describing: a single document that
> never completes.

That can be argued both ways, I think. The important point is
that the connection can and almost certainly will be closed in
the middle of an entry document.

With a single, endless document, the application will have to
backtrack, discarding events until it has thrown away the last
seen start-element event for the incomplete entry, then close the
feed element.

With a series of concatenated complete documents, it can simply
discard everything that belongs to the current incomplete Entry
Document. There are implicit checkpoints in the stream.
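
These implicit checkpoints can be sketched in a few lines of Python. Everything here is an illustrative assumption, not something from the thread: the toy stream, and the framing trick of splitting on the XML declaration that starts each concatenated document (real content could defeat that and would need smarter framing).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A captured stream of concatenated complete Atom Entry Documents,
# with the connection dropped in the middle of the last one.
stream = (
    "<?xml version='1.0'?>"
    "<entry xmlns='http://www.w3.org/2005/Atom'><title>one</title></entry>"
    "<?xml version='1.0'?>"
    "<entry xmlns='http://www.w3.org/2005/Atom'><title>two</title></entry>"
    "<?xml version='1.0'?>"
    "<entry xmlns='http://www.w3.org/2005/Atom'><title>tru"  # truncated
)

# Assumption: each document begins with an XML declaration, so the
# declarations act as checkpoints for reframing the byte stream.
docs = ["<?xml" + part for part in stream.split("<?xml") if part]

titles = []
for doc in docs:
    try:
        root = ET.fromstring(doc)      # any run-of-the-mill parser
    except ET.ParseError:
        pass                           # incomplete tail: simply discard it
    else:
        titles.append(root.findtext(ATOM + "title"))

print(titles)
```

The incomplete tail costs one failed parse and nothing else; no backtracking through SAX events is needed.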

Regards,
-- 
Aristotle Pagaltzis // 



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Sam Ruby

Joe Gregorio wrote:
> Why not POST the Atom Entry, ala the Atom Publishing Protocol?

Essentially, LiveJournal is making this data available to anybody who
wishes to access it, without any need to register or to invent a unique API.

I can, and have, accessed the LiveJournal stream from behind both a
firewall and a NAT device.  Doing so requires the client to initiate the
request.  Therefore, if you really wanted to turn this around, the
client would need to initiate a POST, and the server would need to
return the "Fat Pings" as the response.

I talked to Brad - in fact, I had independently made the same suggestion
that Bob did.  Brad indicated that if there were clients with different
requirements, he was amenable to accommodating each - endpoints are cheap.

- Sam Ruby



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Sam Ruby

A. Pagaltzis wrote:
> * Bob Wyman <[EMAIL PROTECTED]> [2005-08-22 01:05]:
> 
>>What do you think? Is there any conceptual problem with
>>streaming basic Atom over TCP/IP, HTTP continuous sessions
>>(probably using chunked content) etc.?
> 
> I wonder how you would make sure that the document is
> well-formed. Since the stream never actually ends and there is no
> way for a client to signal an intent to close the connection, the
> <feed> at the top would never actually be accompanied by a
> </feed> at the bottom.
> 
> If you accept that the stream can never be a complete well-formed
> document, is there any reason not to simply send a stream of
> concatenated Atom Entry Documents?
> 
> That would seem like the absolute simplest solution.

I think the keyword in the above is "complete".

SAX is a popular API for dealing with streaming XML (and there are a
number of "pull parsing" APIs too).  It makes individual elements
available to your application as they are read.  If, at any point, the
SAX parser determines that your feed is not well-formed, it throws an
error at that point.

With an HTTP client library and SAX, the "absolute simplest solution" is
what Bob is describing: a single document that never completes.

Note that if your application were to discard all the data it receives
before it encounters the first entry, the stream from there on out would
be identical.
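
Sam's point shows up directly in code. The handler below is a sketch (all names invented) that pulls entry titles out of a stream whose `<feed>` is never closed; Python's bundled SAX driver is used as a stand-in for whatever incremental parser a real client would employ.

```python
import xml.sax

class EntryHandler(xml.sax.ContentHandler):
    """Collect atom:entry titles as they stream past (a sketch)."""
    def __init__(self):
        self.in_title = False
        self.in_entry = 0
        self.buf = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "entry":
            self.in_entry += 1
        elif name == "title" and self.in_entry:
            self.in_title = True
            self.buf = []

    def characters(self, content):
        if self.in_title:
            self.buf.append(content)

    def endElement(self, name):
        if name == "title" and self.in_title:
            self.in_title = False
            self.titles.append("".join(self.buf))
        elif name == "entry":
            self.in_entry -= 1

handler = EntryHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed the stream in chunks, as an HTTP client would.  The feed element
# is never closed, yet each completed entry is still delivered.
chunks = [
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    '<entry><title>first</title></entry>',
    '<entry><title>sec', 'ond</title></entry>',
]
for chunk in chunks:
    parser.feed(chunk)

print(handler.titles)
```

No well-formedness error is raised until the parser actually sees a violation, so "a single document that never completes" parses indefinitely.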

- Sam Ruby



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread 'A. Pagaltzis'

* James M Snell <[EMAIL PROTECTED]> [2005-08-22 05:30]:
> Second note to self: After thinking about this a bit more, I
> would also need a way of specifying a null license (e.g. the
> lack of a license). For instance, what if an entry that does
> not contain a license is aggregated into a feed that has a
> license. The original lack-of-license still applies to that
> entry regardless of what is specified on the feed level. Golly
> Bob, you're right, this is rather messy ain't it. Hmm...

This is exactly the point I was making about atom:contributor.

Regards,
-- 
Aristotle Pagaltzis // 



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread James M Snell


Bob Wyman wrote:
> Basically, there are many really easy ways that one can handle
> streams of Atom entries. You could prepend an empty feed to the head of the
> stream, you could use "virtual" end-tags, you could just send entries and
> rely on the receiver to wrap them up as required, etc... But, since all of
> these are really easy and none of them really gets in the way of anything
> rational that I can imagine someone wanting to do, why not just default to
> doing it the way it is defined in the Atom spec? In that way, we don't have
> to create one more context-dependent distinction between formats. Complexity
> is reduced and we can avoid having to read yet-another-specification that
> looks very, very much like hundreds we've read before. If Atom provides all
> we need, let's not do something else unless there is a *very* good argument
> to do so.
>
> bob wyman

+1. The basic format gives us everything we need to enable this.  Even
looking back over my PaceSimpleNotify proposal, in which I introduce a
notification element used to identify the action that has occurred on
the element (e.g. create, update or delete), I can see that there really
is no need to have that element.


- James



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread James M Snell


A. Pagaltzis wrote:
> * Robin Cover <[EMAIL PROTECTED]> [2005-08-22 05:05]:
> > On Mon, 22 Aug 2005, A. Pagaltzis wrote:
> > > That issue is inheritance.
> > >
> > > atom:author is the only precedent for it in Atom.
> >
> > If "it" in "only precedent for it" refers to inheritance, can
> > you explain the sense in which "atom:author is the only
> > precedent"?
> >
> > xml:lang,
>
> Sure, then you can also cite xml:base and possibly more. But
> these are irrelevant – they work on a different layer and
> have no concept of “feed-level” or “entry-level.” They’re
> possibly better described as annotations at the XML level, as
> opposed to metadata at the Atom model level.


And yet in the spec we constrain the relevance of xml:lang and xml:base 
to specific elements.  Sure, they operate on the XML level, but they do 
have relevance on the feed/entry level.  Not that this changes your 
argument, it's just an observation that xml:lang and xml:base are not 
entirely dissimilar to atom:author.


- James



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread 'A. Pagaltzis'

* Bob Wyman <[EMAIL PROTECTED]> [2005-08-22 05:25]:
> If Atom provides all we need, let's not do something else unless
> there is a *very* good argument to do so.

I’m not inventing anything. Atom Entry Documents are part of the
spec and Atom Feed Documents may legally be empty.

And a consumer of a stream according to your proposition does not
get away without implementing some special ability to interpret it
correctly anyway – you are using “plain Atom” at the cost of
requiring a specially capable XML parser.

My proposition requires a trivial amount of extra semantics
implemented at the application logic level; yours requires extra
semantics at the protocol logic level (the protocol being XML).

I think doing this in application logic is so cheap that reaching
for the protocol logic is unwarranted.

Regards,
-- 
Aristotle Pagaltzis // 



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread James M Snell


Bob Wyman wrote:
> Aristotle Pagaltzis wrote:
> > That issue is inheritance.
> Let me give an example of problematic inheritance...
> Some have suggested that there be a "License" that you can associate
> with Atom feeds and entries. However, scoping becomes very important in this
> case because of some peculiarities of the legal system.
> One can copyright an individual thing and one can copyright a
> collection of things. A claim of copyright in a collection is not, however,
> necessarily a claim of copyright over the elements of the collection.
> Similarly, a claim of copyright over an element of the collection doesn't
> reduce any claim of copyright in the collection itself.
> If we assume inheritance from feed elements, then without further
> specification, it isn't possible to claim copyright in the collection that
> is the feed without claiming copyright in its individual parts. What you'd
> have to do is create two distinct types of claim (one for the collection
> and one for the item). That's messy.
> I'm sure that copyright and licenses aren't the only problematic
> contexts here.
>
> bob wyman


From the format text: "If an atom:entry element does not contain 
atom:author elements, then the atom:author elements of the contained 
atom:source element are considered to apply. In an Atom Feed Document, 
the atom:author elements of the containing atom:feed element are 
considered to apply to the entry if there are no atom:author elements in 
the locations described above."


This really does not describe an inheritance model*.  This describes a 
scoping model.  If an entry does not contain an author, the author on 
the feed is said to apply.  If the entry does happen to have an author, 
it doesn't matter if the feed also has an author, it is the author 
element on the entry that applies to the entry.  Same thing with the 
license extension (for example).  If both the feed and entry contain 
license links, it is the license link on the entry that is relevant to 
the entry.  If the entry does not contain a license, it looks to the 
feed level.  Placing a license on the feed level does not change the 
license of the entry if the entry already has a license. 
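
This scoping rule is mechanical enough to sketch. The snippet assumes a hypothetical `rel="license"` link extension and an invented helper name; it only illustrates "entry-level wins, otherwise the feed-level value is considered to apply":

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def license_for(entry, feed):
    """Scoping, not inheritance: an entry-level license link applies if
    present; only otherwise does the feed-level one come into scope."""
    for scope in (entry, feed):
        for link in scope.findall(ATOM + "link"):   # direct children only
            if link.get("rel") == "license":
                return link.get("href")
    return None

feed = ET.fromstring(
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<link rel='license' href='http://example.org/feed-license'/>"
    "<entry>"
    "<link rel='license' href='http://example.org/entry-license'/>"
    "</entry>"
    "<entry><title>no license of its own</title></entry>"
    "</feed>")

e1, e2 = feed.findall(ATOM + "entry")
print(license_for(e1, feed))   # the entry-level link applies
print(license_for(e2, feed))   # falls back to the feed level
```

Note that nothing is copied onto the entries; the feed-level element is merely consulted when the entry-level lookup comes up empty.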

There is another example of this kind of scoping that we're all 
intimately familiar with:


<doc xmlns:ns="http://example.org/one">
   <a>
      <b xmlns:ns="http://example.org/two">
         <ns:c/>
      </b>
   </a>
</doc>

From Section 5.1 of Namespaces in XML: "The namespace declaration is 
considered to apply to the element where it is specified and to all 
elements within the content of that element, unless overridden by 
another namespace declaration" 
(http://www.w3.org/TR/REC-xml-names/#scoping)


To directly address Bob's point that "it isn't possible to claim 
copyright in the collection that is the feed without claiming copyright 
in its individual parts" -- it IS possible if we are looking at this in 
terms of relevant scope as opposed to inherited properties.


Note to self: I need to fix the license spec to address this.  Currently 
the spec limits the license relevance to entries without providing a 
mechanism for attaching a license to the feed itself.


Second note to self: After thinking about this a bit more, I would also
need a way of specifying a null license (i.e. the lack of a license).
For instance, what if an entry that does not contain a license is
aggregated into a feed that has a license?  The original lack-of-license
still applies to that entry regardless of what is specified on the feed
level.  Golly Bob, you're right, this is rather messy ain't it. Hmm...


- James

* I've called this inheritance in the past but now I do believe that I 
was mistaken.




RE: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
> Shades of SGML.
No! No! Not that! :-)

He continues with:
> ... many good points 

Basically, there are many really easy ways that one can handle
streams of Atom entries. You could prepend an empty feed to the head of the
stream, you could use "virtual" end-tags, you could just send entries and
rely on the receiver to wrap them up as required, etc... But, since all of
these are really easy and none of them really gets in the way of anything
rational that I can imagine someone wanting to do, why not just default to
doing it the way it is defined in the Atom spec? In that way, we don't have
to create one more context-dependent distinction between formats. Complexity
is reduced and we can avoid having to read yet-another-specification that
looks very, very much like hundreds we've read before. If Atom provides all
we need, let's not do something else unless there is a *very* good argument
to do so.

bob wyman




Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread A. Pagaltzis

* Robin Cover <[EMAIL PROTECTED]> [2005-08-22 05:05]:
> On Mon, 22 Aug 2005, A. Pagaltzis wrote:
> > That issue is inheritance.
> > 
> > atom:author is the only precedent for it in Atom.
> 
> If "it" in "only precedent for it" refers to inheritance, can
> you explain the sense in which "atom:author is the only
> precedent" ?
> 
> xml:lang,

Sure, then you can also cite xml:base and possibly more. But
these are irrelevant – they work on a different layer and
have no concept of “feed-level” or “entry-level.” They’re
possibly better described as annotations at the XML level, as
opposed to metadata at the Atom model level.

And with both of them being attributes, the cardinality is
trivial and known in advance: zero or one. Thus the override
mechanism is universally obvious in advance.

They have nothing in common with extension elements as per the
semantics defined in the Atom Format specification.

Regards,
-- 
Aristotle Pagaltzis // 



RE: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
> Why can't you keep that socket open? That is the default
> behavior for HTTP 1.1.
In some applications, HTTP 1.1 will work just fine. However, HTTP
doesn't add much to the high volume case. It also costs a great deal. For
instance, every POST requires a response. This means that you're moving from
a pure streaming case to an endless sequence of application level ACK/NAKs
that are simply replicating what TCP/IP already does for you. Also, the HTTP
headers that would be required simply don't contribute anything useful. The
bandwidth overhead of the additional headers as well as the bandwidth,
processing and timing problems related to generating responses begins to
look pretty nasty when you're moving at hundreds of items per minute or
second...
One really good reason for using HTTP would be to exploit the
existing HTTP infrastructure including proxies, caches, application-level
firewalls, etc. However, I'm aware of no such infrastructure components
that are designed to handle permanently open, high-bandwidth connections
well. The HTTP infrastructure is optimized around the normal uses of
HTTP. This isn't "normal."
One of the really irritating things about the current HTTP
infrastructure is that it is very "fragile." This is a problem that has
caused unlimited headaches for the folk trying to do "notification over
HTTP" (mod-pubsub, KnowNow, various HTTP-based IM/chat systems, etc.). The
problem is that HTTP connections, given the current infrastructure and
standard components, are very hard to keep open "permanently" or for a very
long period of time. One is often considered lucky if one can keep an HTTP
connection open for 5 minutes without having to re-initialize... Of course,
during the period between when your connection breaks and when you get it
re-established, you're losing packets. That means that you have to have a
much more robust mechanism for recovering lost messages and that means
increased complexity, network traffic, etc. The added complexity and trouble
can be justified in some cases; however, not in all cases.
HTTP is great in some cases but not all. That's why the IETF has
defined BEEP, XMPP, SIP, SIMPLE, etc. in addition to HTTP. One protocol
model simply can't suit all needs at all times and in all contexts.
Whatever... The point here is that Atom already has defined all that
appears to be needed in order to address the "Fat Ping" requirement whether
you prefer individual HTTP POSTs, POSTs over HTTP 1.1 connections, XMPP, or
raw open TCP/IP sockets. That is a good thing.

bob wyman




Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread 'A. Pagaltzis'

* Bob Wyman <[EMAIL PROTECTED]> [2005-08-22 04:00]:
> Basically, what you do is consider the open tag to have a
> virtual closure and use it primarily as a carrier of stream
> metadata.

Shades of SGML…

> You could certainly do that, however, you will inevitably want
> to pass across some stream oriented metadata and you'll
> eventually realize that much of it is stuff that you can map
> into an Atom Feed

OT1H, you could put this data in the stream as an empty but
complete Atom Feed Document served as the first complete entity
in the feed –

> A rather nice side effect of forming the stream as an atom feed
> is the simple fact that a "log" of the stream can be written to
> disk as a well-formed Atom file.

– but OTOH this is a pretty good point.

Of course, the question is whether it is really any more work to
receive an empty Atom Feed Document + X * Atom Entry Documents
and to insert the Entry Documents into the Feed Document for
storage.

Note that in the case of prepending an empty Atom Feed Document,
all fully received Documents are well-formed entities of their
own, so you don’t need a recovering XML parser that can implement
the “virtual closing element” semantic – all entities can be
processed with any run-of-the-mill XML parser.
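
The "insert the Entry Documents into the Feed Document for storage" step is indeed small. A sketch (stream contents and names invented):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# The empty but complete Feed Document served first in the stream...
feed = ET.fromstring(
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<title>stream log</title>"
    "</feed>")

# ...and each subsequently received, individually well-formed
# Entry Document.
entry_docs = [
    "<entry xmlns='http://www.w3.org/2005/Atom'><id>urn:a</id></entry>",
    "<entry xmlns='http://www.w3.org/2005/Atom'><id>urn:b</id></entry>",
]
for doc in entry_docs:
    feed.append(ET.fromstring(doc))   # parse standalone, then splice in

# The merged tree serializes as one well-formed Atom Feed Document.
print(len(feed.findall(ATOM + "entry")))
```

Every entity parses with an ordinary, non-recovering parser, and the on-disk log falls out of a single `append` per entry.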

Regards,
-- 
Aristotle Pagaltzis // 



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread James M Snell


A. Pagaltzis wrote:
> And with that, getting back to your question, the answer seems
> pretty clear: it depends on whether the extension element is more
> like atom:contributor, ie defines a property which an entry may
> or may not have, or more like atom:author, ie defines a property
> that every entry inevitably has.
>
> But because that is a matter of interpretation, I would strongly
> prefer to say that if the extension does not specify a meaning
> for an element at the feed level, then the meaning is undefined.
>
> On the other, related point, the same principle of avoiding the
> necessity of complicated override mechanisms is why I say that
> aggregators should assume that unknown extension elements (which
> therefore have *unknown* as opposed to *undefined* semantics)
> pertain only to the feed, not to its entries.


Ok, I retract my earlier comment about apps not making any assumptions
about unknown extensions.  This is better.

Regarding Bob Wyman's separate note about what aggregators like PubSub
should do: if the aggregator is generating the source element, the
aggregator should include everything it finds in the feed, even if it
does not understand it.  If the entry already contains a source element,
leave it as is.


- James



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread James M Snell


Bob Wyman wrote:
> Joe Gregorio wrote:
> > Why not POST the Atom Entry, ala the Atom Publishing Protocol?
> This would be an excellent idea if what we were talking about was a
> low volume site. However, a site like LiveJournal generates hundreds of
> updates per minute. Right now, on a Sunday evening, they are updating at the
> rate of 349 entries per minute. During peak periods, they generate much more
> traffic. Generating 349 POST messages per minute to perhaps 10 or 15
> different services means that they would be pumping out thousands of these
> things per minute. It just isn't reasonable.
> Using an open TCP/IP socket to carry a stream of Atom Entries
> results in much greater efficiencies with much reduced bandwidth and
> processing requirements.
> At PubSub, we've been experimentally providing "Fat Ping" versions
> of our FeedMesh feeds to a small group of testers. We publish messages at a
> rate much higher than LiveJournal does -- since we publish all of
> LiveJournal's content plus everyone else's. We couldn't even consider Fat
> Pings if we had to create and tear down a TCP/IP-HTTP session to post each
> individual entry.
> There are many situations in which HTTP would work fine for Fat
> Pings. However, for high-volume sites, it just isn't reasonable. The key, to
> me, is that we establish the expectation that the Atom format is adequate to
> the task (whatever the transport) and leave the transport selection as a
> context dependent decision. Thus, some server/client pairs would exchange
> streams of Atom entries using the POST based Atom Publishing Protocol while
> others would exchange essentially the same streams using a more efficient
> transport mechanism such as streaming raw sockets or even "Atom over XMPP".


First off, as a general FYI, take a look at PaceSimpleNotify... the
current version uses basic HTTP POSTs to send one or more individual
atom:entry elements to a remote endpoint.  I'm hoping that the folks on
the protocol list will pick this up in discussion in the near future as
it is something that I definitely want to see incorporated.

Secondly, I believe that the format is more than adequate to support
this kind of mechanism.  I do not believe that Brad's atomStream
container is necessary.  Just stream a bunch of atom:feed or
atom:entry elements directly over an open TCP/IP socket or a persistent
(keep-alive) HTTP connection.  By no means would I ever suggest a new
HTTP connection for each ping.


- James



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Joe Gregorio

On 8/21/05, Bob Wyman <[EMAIL PROTECTED]> wrote:
> Joe Gregorio wrote:
> > Why not POST the Atom Entry, ala the Atom Publishing Protocol?
> This would be an excellent idea if what we were talking about was a
> low volume site. However, a site like LiveJournal generates hundreds of
> updates per minute. Right now, on a Sunday evening, they are updating at the
> rate of 349 entries per minute. During peak periods, they generate much more
> traffic. Generating 349 POST messages per minute to perhaps 10 or 15
> different services means that they would be pumping out thousands of these
> things per minute. It just isn't reasonable.
> Using an open TCP/IP socket to carry a stream of Atom Entries
> results in much greater efficiencies with much reduced bandwidth and
> processing requirements.

Why can't you keep that socket open? That is the default
behavior for HTTP 1.1.

   -joe

-- 
Joe Gregorio    http://bitworking.org



RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Eric Scheid wrote:
> It's an interesting problem. A pity now that the idea of segregating
> entry-defaults at the feed level didn't get sufficient momentum.
> But what if there was a widely known extension developed for just
> this purpose -- providing entry metadata at the feed level. Possibly
> in a wrapper, possibly by flagging with an attribute.
It ain't pretty. But, it would definitely help resolve a great deal
of ambiguity if there was a means by which publishers could provide this
sort of processing hint to intermediaries in the channel. (Consider this
sort of thing to be very much like the cache control hints that folk insert
in the headers of HTTP packets. Sometimes, you have to give the
intermediaries like proxies, publish/subscribe routers or brokers, search
engines, etc. some hints to get them to work properly.)

bob wyman 




RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
> That issue is inheritance.
Let me give an example of problematic inheritance...
Some have suggested that there be a "License" that you can associate
with Atom feeds and entries. However, scoping becomes very important in this
case because of some peculiarities of the legal system.
One can copyright an individual thing and one can copyright a
collection of things. A claim of copyright in a collection is not, however,
necessarily a claim of copyright over the elements of the collection.
Similarly, a claim of copyright over an element of the collection doesn't
reduce any claim of copyright in the collection itself.
If we assume inheritance from feed elements, then without further
specification, it isn't possible to claim copyright in the collection that
is the feed without claiming copyright in its individual parts. What you'd
have to do is create two distinct types of claim (one for the collection
and one for the item). That's messy.
I'm sure that copyright and licenses aren't the only problematic
contexts here.

bob wyman




RE: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Bob Wyman

Aristotle Pagaltzis wrote:
> I wonder how you would make sure that the document is
> well-formed. Since the stream never actually ends and there
> is no way for a client to signal an intent to close the connection,
> the <feed> at the top would never actually be accompanied by a
> </feed> at the bottom.
This is a problem which has become well understood in the use and
implementation of the XMPP/Jabber protocols which are based on streaming
XML. Basically, what you do is consider the open tag to have a virtual
closure and use it primarily as a carrier of stream metadata. In XMPP
terminology, your code works at picking "stanzas" out of the stream that can
be parsed successfully or unsuccessfully on their own. In an Atom stream,
the processor would consider each atom:entry to be a parseable atomic unit.

> If you accept that the stream can never be a complete
> well-formed document, is there any reason not to simply send a
> stream of concatenated Atom Entry Documents?
> That would seem like the absolute simplest solution.
You could certainly do that, however, you will inevitably want to
pass across some stream oriented metadata and you'll eventually realize that
much of it is stuff that you can map into an Atom Feed. (i.e. "created
date", unique stream id, stream title, etc.). Since we're all in the process
of learning how to deal with atom:feed elements anyway, why not just reuse
what we've got instead of inventing something new?
A rather nice side effect of forming the stream as an atom feed is
the simple fact that a "log" of the stream can be written to disk as a
well-formed Atom file. Thus, the same tools that you usually use to parse
Atom files can be used to parse the log of the stream. It is nice to be able
to reuse tools in this way... (Note: At PubSub, the atom files that we serve
to people are, in essence, just slightly stripped logs of the proto-"Atom
over XMPP" streams that they would have received if they had been listening
with that protocol. In our clients we can use the same parser for the stream
as we do for atom files. It works out nicely and elegantly.)
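
A sketch of turning a truncated capture into that well-formed log. The string-based framing below is a toy assumption good only for this made-up capture; a real processor would track parser events to find stanza boundaries, as described above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A truncated capture: <feed> was opened, entries streamed, then the
# connection died mid-entry, so no </feed> was ever received.
captured = (
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<entry><id>urn:1</id></entry>"
    "<entry><id>urn:2</id></entry>"
    "<entry><id>urn:3</id"
)

# Drop the incomplete trailing stanza (naive framing on the literal
# string '<entry>') and supply the "virtual" closing tag.
head, _, _ = captured.rpartition("<entry>")
log = head + "</feed>"

feed = ET.fromstring(log)             # now a well-formed Atom file
ids = [e.findtext(ATOM + "id") for e in feed.findall(ATOM + "entry")]
print(ids)
```

The recovered log then parses with the same tools used for ordinary Atom files, which is the reuse benefit Bob describes.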

bob wyman




RE: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Bob Wyman

Joe Gregorio wrote:
> Why not POST the Atom Entry, ala the Atom Publishing Protocol?
This would be an excellent idea if what we were talking about was a
low volume site. However, a site like LiveJournal generates hundreds of
updates per minute. Right now, on a Sunday evening, they are updating at the
rate of 349 entries per minute. During peak periods, they generate much more
traffic. Generating 349 POST messages per minute to perhaps 10 or 15
different services means that they would be pumping out thousands of these
things per minute. It just isn't reasonable.
Using an open TCP/IP socket to carry a stream of Atom Entries
results in much greater efficiencies with much reduced bandwidth and
processing requirements. 
At PubSub, we've been experimentally providing "Fat Ping" versions
of our FeedMesh feeds to a small group of testers. We publish messages at a
rate much higher than LiveJournal does -- since we publish all of
LiveJournal's content plus everyone else's. We couldn't even consider Fat
Pings if we had to create and tear down a TCP/IP-HTTP session to post each
individual entry.
There are many situations in which HTTP would work fine for Fat
Pings. However, for high-volume sites, it just isn't reasonable. The key, to
me, is that we establish the expectation that the Atom format is adequate to
the task (whatever the transport) and leave the transport selection as a
context dependent decision. Thus, some server/client pairs would exchange
streams of Atom entries using the POST based Atom Publishing Protocol while
others would exchange essentially the same streams using a more efficient
transport mechanism such as streaming raw sockets or even "Atom over XMPP".

bob wyman




Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread Joe Gregorio

Why not POST the Atom Entry, ala the Atom Publishing Protocol?

   -joe

-- 
Joe Gregorio    http://bitworking.org



Re: If you want "Fat Pings" just use Atom!

2005-08-21 Thread A. Pagaltzis

* Bob Wyman <[EMAIL PROTECTED]> [2005-08-22 01:05]:
> What do you think? Is there any conceptual problem with
> streaming basic Atom over TCP/IP, HTTP continuous sessions
> (probably using chunked content) etc.?

I wonder how you would make sure that the document is
well-formed. Since the stream never actually ends and there is no
way for a client to signal an intent to close the connection, the
<feed> at the top would never actually be accompanied by a
</feed> at the bottom.

If you accept that the stream can never be a complete well-formed
document, is there any reason not to simply send a stream of
concatenated Atom Entry Documents?

That would seem like the absolute simplest solution.

Regards,
-- 
Aristotle Pagaltzis // 



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Eric Scheid

On 22/8/05 10:28 AM, "Bob Wyman" <[EMAIL PROTECTED]> wrote:

> What should an aggregate feed generator like PubSub do when it finds
> an entry in a feed that contains unscoped extensions as children of the
> feed? 

It's an interesting problem. A pity now that the idea of segregating
entry-defaults at the feed level didn't get sufficient momentum.

But what if there was a widely known extension developed for just this
purpose -- providing entry metadata at the feed level. Possibly in a
wrapper, possibly by flagging with an attribute.

Like this...

  <feed>
     ...
     <x:entry-defaults>
        <geo:location>...</geo:location>
     </x:entry-defaults>
     ...
     <entry>
        ...
     </entry>
  </feed>

or like this ...

  <feed>
     ...
     <geo:location x:scope="entry">...</geo:location>
     ...
     <entry>
        ...
     </entry>
  </feed>

e.



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread A. Pagaltzis

* Paul Hoffman <[EMAIL PROTECTED]> [2005-08-22 01:00]:
> The crux of the question is: what happens when an extension
> that does not specify the scope appears at the feed level?

Let me step back to look at the larger issue for a moment.

That issue is inheritance.

atom:author is the only precedent for it in Atom. It works
because each entry ALWAYS has at least one author; the override
mechanism is simple. Additionally, all Atom processors are aware
of it by definition.

But atom:author is not a model to emulate. It works only because
it falls within a narrowly defined set of circumstances.

The WG wisely avoided attempting to specify inheritance for
atom:contributor. An entry MAY have no contributors at all, and
this would require complications to express if feed-level
atom:contributor elements inherited to entries.

My point of view is that the default interpretation should avoid
forcing extensions to specify complicated mechanisms that the WG
avoided introducing when specifying atom:contributor, when an
extension element’s semantics are the same as those of
atom:contributor.

And with that, getting back to your question, the answer seems
pretty clear: it depends on whether the extension element is more
like atom:contributor, ie defines a property which an entry may
or may not have, or more like atom:author, ie defines a property
that every entry inevitably has.

But because that is a matter of interpretation, I would strongly
prefer to say that if the extension does not specify a meaning
for an element at the feed level, then the meaning is undefined. 

On the other, related point, the same principle of avoiding the
necessity of complicated override mechanisms is why I say that
aggregators should assume that unknown extension elements (which
therefore have *unknown* as opposed to *undefined* semantics)
pertain only to the feed, not to its entries.

Regards,
-- 
Aristotle Pagaltzis // 



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Eric Scheid

On 22/8/05 9:22 AM, "Robert Sayre" <[EMAIL PROTECTED]> wrote:

>> The crux of the question is: what happens when an extension that does
>> not specify the scope appears at the feed level?
> 
> I'm not sure why this question is interesting. What sort of
> application would need to know?

a search engine which provides more than simply keyword searching?

e.



RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Paul Hoffman wrote: 
> The crux of the question is: what happens when an extension that
> does not specify the scope appears at the feed level?
Robert Sayre asked:
> I'm not sure why this question is interesting. What sort of
> application would need to know?
I ask:
What should an aggregate feed generator like PubSub do when it finds
an entry in a feed that contains unscoped extensions as children of the
feed? 
* Would you expect us to include these extension elements in an
atom:source element if we use the entry in one of our feeds?
* Should we include in the source elements we generate even things
that we don't understand?
* What should we do if the entry already has a source element but
that source element doesn't include the extension elements? Should we
publish the source element as we find it? Or, should we modify the source
element to include the extensions? (assuming there are no signatures...)
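One conservative answer to these questions can be sketched in code (illustrative only, not PubSub's actual behavior; the helper name and the policy are assumptions): copy the feed's foreign-namespace children into a newly created atom:source, and leave any pre-existing atom:source untouched, precisely because it may be signed.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def add_source_with_extensions(feed, entry):
    """Sketch of one policy for aggregate feed generators: if the entry
    has no atom:source, create one and copy the feed's foreign-namespace
    (extension) children into it; if the entry already carries an
    atom:source, leave it alone."""
    if entry.find(f"{{{ATOM}}}source") is not None:
        return entry  # don't rewrite an existing source (it may be signed)
    source = ET.SubElement(entry, f"{{{ATOM}}}source")
    for child in feed:
        tag = child.tag
        if not isinstance(tag, str) or not tag.startswith("{"):
            continue  # skip comments/PIs and unnamespaced elements
        if tag.startswith(f"{{{ATOM}}}"):
            continue  # skip Atom's own children, including entries
        source.append(copy.deepcopy(child))
    return entry
```

A production version would also copy the atom:id, atom:title, and atom:updated that atom:source is meant to carry; the sketch shows only the extension-handling choice under discussion.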


bob wyman




Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Robert Sayre

On 8/21/05, Paul Hoffman <[EMAIL PROTECTED]> wrote:
> 
> At 3:35 PM -0700 8/21/05, James M Snell wrote:
> >IMHO, it depends entirely on how the extension is defined.  The
> >various extensions I have put together (e.g. comments, expires,
> >etc), the metadata can be placed on the feed/source level but is
> >only relevant on the entry level (same model as ). 

4.1.1
"The 'atom:feed' element is the document (i.e., top-level) element of
an Atom Feed Document, acting as a container for metadata and data
associated with the feed."

4.2.1
"The 'atom:author' element is a Person construct that indicates the
author of the entry or feed."


> One
> >could easily imagine, however, that other extensions would apply
> >only on the feed level.  It's entirely up to the extension and
> >implementors should make no assumptions.
> 
> The crux of the question is: what happens when an extension that does
> not specify the scope appears at the feed level?

I'm not sure why this question is interesting. What sort of
application would need to know?

Robert Sayre



Re: xml:base abuse

2005-08-21 Thread Sam Ruby

Sjoerd Visscher wrote:
> 
> Sam Ruby wrote:
> 
 URI(doc) = http://www.w3future.com/weblog/rss.xml?notransform
 xml:base = http://w3future.com/weblog/rss.xml?notransform
>>>
>>> Ah, ok, I missed that. (Just to be sure, you added www yourself, or is
>>> there a link to the feed somewhere with www in it?)
>>
>> Your feed is available from both of the URI's mentioned above.  The
>> tinyurl quoted above is based on passing the first one to the feed
>> validator.
> 
> Oh, this is actually interesting. I send a 301 when you use
> www.w3future.com. It looks like this is handled transparently by your
> http library, giving you the file at w3future.com. In that case it
> should be possible to request the actual uri that is used to get the file.

Fixed.

> And then there's also the Content-Location header, which sets the base
> URI. This is used with content negotiation. F.e. (I haven't actually
> implemented this) if you would send a request to
>   http://w3future.com/weblog/
> with the header
>   Accept: application/atom+xml
> I could send you the atom file with this header:
>   Content-Location: http://w3future.com/weblog/rss.xml?notransform
> 
> Afaik this is how the HTTP spec suggests implementing content
> negotiation. Then the value of Content-Location should be considered to
> be the uri of the document.

I now check for content-location too.
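That check can be sketched as: prefer Content-Location (resolving it against the request URI, since the header value may be relative), and otherwise fall back to the post-redirect request URI. A hypothetical helper, not the validator's actual code:

```python
from urllib.parse import urljoin

def document_base(request_uri, headers):
    """Base URI of a fetched feed: a Content-Location header wins
    (resolved against the request URI, since the header may be a
    relative reference); otherwise it is the URI the document was
    actually retrieved from, i.e. the post-redirect URI."""
    loc = headers.get("Content-Location")
    return urljoin(request_uri, loc) if loc else request_uri
```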

- Sam Ruby



If you want "Fat Pings" just use Atom!

2005-08-21 Thread Bob Wyman

The subject of “Fat Pings”
or full content streaming from blogs has come up on the FeedMesh list and in a proposal by Brad
Fitzpatrick of LiveJournal. I’ve responded to the FeedMesh list
suggesting that the best way to move forward is to simply use Atom feeds rather
than invent new formats. See my response at: http://groups.yahoo.com/group/feedmesh/message/451

 

The problem being addressed here is
that of increasing the efficiency with which feed search and/or monitoring
services (like PubSub, Feedster, IceRocket, Technorati, BlogDigger, etc.) obtain
posts from the major blog hosting platforms. In the past, most services have
limited what they do to simply sending pings. However, while the pinging
mechanism works fine in the case of low volume publishers, it simply doesn’t
scale to the requirements of high volume publishers like the major blog hosting
platforms (LiveJournal, TypePad, Blogspot, Bryght, etc.). The problem is that when
a service is pinged, it must reach back to the pinging site and retrieve an RSS
or Atom file that probably contains many duplicate entries. The service must
then filter out the dupes before indexing, publishing, or matching the “new”
or “changed” items discovered. 
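The filtering step amounts to remembering which (id, updated) pairs have already been processed. A minimal sketch (the function name and the in-memory set are assumptions; a real service would persist this state):

```python
def filter_new(entries, seen):
    """Drop entries the service has already processed. `entries` is an
    iterable of (atom_id, updated) pairs from a freshly fetched feed;
    `seen` is the service's (persistent, in reality) memory. An entry
    with a known id but a new updated value counts as changed."""
    fresh = []
    for atom_id, updated in entries:
        key = (atom_id, updated)
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh
```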

LiveJournal has led for some time
in showing a more efficient and effective way for search engines to obtain new
and changed postings. What they do is produce an aggregate feed that contains copies
of all entries written on any of their public blogs. This feed typically
contains as many as 200-300 new entries per minute. But, while that might sound
like a great many entries to process each minute, it is massively less than the
number of entries that would need to be processed if LiveJournal were to rely
on a simple pinging process. The reason is that search engines can focus their
ingestion processors only on the aggregate feed and thus never need to deal
with the wasted bandwidth and processing that comes from duplicate entries. Of
course, LiveJournal benefits as well since the bandwidth and processing cost of
serving external search and/or monitoring systems is drastically reduced. At
PubSub, because of the way that LiveJournal publishes updates, we find that the
cost of processing LiveJournal updates is very much lower than the cost to
process entries from other blog hosting services that use traditional content-free
ping formats.

But, as with most RSS based systems,
the LiveJournal system has been based on a polling model. Given the speed with
which the feed updates, services like PubSub have been forced to read the LiveJournal
feed at least once a minute if not more frequently. Given that the entire
(massive) feed must be downloaded very frequently and given that LiveJournal
does not currently support RFC3229+feed,
there are inevitably duplicates that appear in the feed. Also, the polling
services never have any idea what the publishing rate is in the feed and thus
can’t slack off the frequency with which they poll during “slow”
periods. The result is that as the rate at which LiveJournal’s users post
begins to slacken during “slow” periods, the percentage of duplicate entries
increases. Clearly, the solution to the problem is to move to a push feed. In
this case, LiveJournal would push the data updates to services that were
interested in them rather than forcing those services to poll LiveJournal.

Brad has proposed a somewhat bent
and extended version of Atom which would be streamed over a TCP/IP connection in
much the same way that we currently stream FeedMesh data. (I’ve included
a snapshot of his proposed format below.) He defines an “AtomStream”
and suggests that individual posts from the various LiveJournal hosted blogs would
be included in the stream as a sequence of single-entry feeds. This is a
solution that would work… However, I suggest that there is actually no
need to do anything other than “vanilla flavored” Atom in order to
address the needs here. A stream which began with an atom:feed element and continued
with a series of atom:entry elements that contained atom:source elements would be
a much more natural solution than the “stream of feeds” that Brad
proposes. (A sample of what I think a “proper Atom” format for Brad’s
sample appears below.)
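Either shape can be consumed incrementally with a pull parser. As a sketch of the "single feed, many entries" variant (illustrative Python, not LiveJournal's or PubSub's code), each entry can be handed off as soon as its end tag arrives, even though the enclosing feed element never closes:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def entries_from_stream(chunks):
    """Yield each atom:entry as soon as it is complete, from an Atom
    Feed Document that arrives in arbitrary chunks over a socket and
    may never terminate."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == ATOM + "entry":
                yield elem
```

On a dropped connection, only the one incomplete entry is lost; everything already yielded was a complete entry.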

The problem being addressed by “Fat
Pings” is very much like the one addressed by the “Atom
over XMPP” protocol and is very much like the service that we provide
at PubSub.com.  I believe it will be an important
test of Atom to determine if it is adequate to handle this sort of problem. I would
greatly appreciate comments from others on this use of Atom.

It should be noted that “Fat
Pings” are probably only properly generated by large, trusted blog hosting
platforms. One of the essential elements of controlling spam in feeds is the
ability to trace back to an actual network resource which can be used to verify
the data in a “ping” and can be used, to some extent, to identify
the publisher of the data. For a service like PubSub to forgo actual ver

Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Paul Hoffman


At 3:35 PM -0700 8/21/05, James M Snell wrote:
IMHO, it depends entirely on how the extension is defined.  The 
various extensions I have put together (e.g. comments, expires, 
etc), the metadata can be placed on the feed/source level but is 
only relevant on the entry level (same model as ).  One 
could easily imagine, however, that other extensions would apply 
only on the feed level.  It's entirely up to the extension and 
implementors should make no assumptions.


The crux of the question is: what happens when an extension that does 
not specify the scope appears at the feed level?


--Paul Hoffman, Director
--Internet Mail Consortium



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread James M Snell


Paul Hoffman wrote:



At 7:24 PM +0100 8/21/05, Peter Robinson wrote:


I do something similar, intending it to mean "the location of the items
described by this feed" (when there is a single location).



Ah, I had missed that. This leads to a question for the mailing list. 
Does an informative extension that appears at the feed level (as 
compared to in entries) indicate:


a) this information pertains to each entry

b) this information pertains to the feed itself

c) this information pertains to each entry and to the feed itself

d) completely unknown unless specified in the extension definition


--Paul Hoffman, Director
--Internet Mail Consortium


IMHO, it depends entirely on how the extension is defined.  The various 
extensions I have put together (e.g. comments, expires, etc), the 
metadata can be placed on the feed/source level but is only relevant on 
the entry level (same model as ).  One could easily imagine, 
however, that other extensions would apply only on the feed level.  It's 
entirely up to the extension and implementors should make no assumptions.


- James



RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Paul Hoffman


At 5:10 PM -0400 8/21/05, Bob Wyman wrote:

I believe the correct answer is "e":

  e) Unless otherwise specified, this information pertains to the feed only.


Er, right. Change that list to:

a) this information pertains to each entry (unless otherwise specified)
b) this information pertains to the feed itself (unless otherwise specified)
c) this information pertains to each entry and to the feed itself 
(unless otherwise specified)

d) completely unknown unless specified in the extension definition

--Paul Hoffman, Director
--Internet Mail Consortium



RE: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Bob Wyman

Paul Hoffman asked:
> Does an informative extension that appears at the feed level
> (as compared to in entries) indicate:
> a) this information pertains to each entry
> b) this information pertains to the feed itself
> c) this information pertains to each entry and to the feed itself
> d) completely unknown unless specified in the extension definition

I believe the correct answer is "e":

  e) Unless otherwise specified, this information pertains to the feed only.

bob wyman




Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread A. Pagaltzis

* Paul Hoffman <[EMAIL PROTECTED]> [2005-08-21 21:55]:
> Ah, I had missed that. This leads to a question for the mailing
> list. Does an informative extension that appears at the feed
> level (as compared to in entries) indicate:
> 
> a) this information pertains to each entry
> 
> b) this information pertains to the feed itself
> 
> c) this information pertains to each entry and to the feed
> itself
> 
> d) completely unknown unless specified in the extension
> definition

Atom itself has precedent for both b) and c). So I would say
it’s d) – but also that aggregators should assume b) when they
don’t know any better.

Regards,
-- 
Aristotle Pagaltzis // 



Re: Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread David Powell


Sunday, August 21, 2005, 8:46:54 PM, Paul Hoffman wrote:

> At 7:24 PM +0100 8/21/05, Peter Robinson wrote:
>>I do something similar, intending it to mean "the location of the items
>>described by this feed" (when there is a single location).

> Ah, I had missed that. This leads to a question for the mailing list. 
> Does an informative extension that appears at the feed level (as 
> compared to in entries) indicate:

> a) this information pertains to each entry

> b) this information pertains to the feed itself

> c) this information pertains to each entry and to the feed itself

> d) completely unknown unless specified in the extension definition


In my RDF model, feed extensions (together with properties such as
atom:generator) are considered to be properties of the FeedInstance.

EntryInstances are related to FeedInstances using containingFeed and
sourceFeed properties.

(Entries and Feeds can have multiple EntryInstances and FeedInstances,
but that's not really relevant...)

So, feed extensions don't automatically inherit to entries in the
model (unlike atom:author which does), but for a given entry you can
locate its feed and take a look at its extension properties, so it
isn't like the information is lost.

So I'd say b); but as long as you aren't throwing away atom:feed data,
that shouldn't prevent an application that uses feed extensions from
doing a) or c).

I think that the interpretation b) is probably what is supported by
section 6 in the absence of any talk about extension inheritance.

-- 
Dave



Extensions at the feed level (Was: Re: geolocation in atom:author?)

2005-08-21 Thread Paul Hoffman


At 7:24 PM +0100 8/21/05, Peter Robinson wrote:

I do something similar, intending it to mean "the location of the items
described by this feed" (when there is a single location).


Ah, I had missed that. This leads to a question for the mailing list. 
Does an informative extension that appears at the feed level (as 
compared to in entries) indicate:


a) this information pertains to each entry

b) this information pertains to the feed itself

c) this information pertains to each entry and to the feed itself

d) completely unknown unless specified in the extension definition


--Paul Hoffman, Director
--Internet Mail Consortium



Re: geolocation in atom:author?

2005-08-21 Thread Peter Robinson

Paul Hoffman <[EMAIL PROTECTED]> wrote:

> At 12:17 AM +1000 8/22/05, Eric Scheid wrote:
>
> >In this example, can anything intelligent be said about the various
> >different locations? Is my intent clear, or are there clear ambiguities?
> >
> >
> > <feed>
> >   ...
> >   <location>location#1</location>
> >   <author>
> >     <name>foo</name>
> >     <location>location#2</location>
> >   </author>
> >   <entry>
> >     <author>
> >       <name>foo</name>
> >       <location>location#3</location>
> >     </author>
> >     <location>location#4</location>
> >     ...
> >     ...
> >   </entry>
> > </feed>
> >
> 
> #2 through #4 seem understandable, but what the heck is #1? "The 
> place where this feed was put together"? "The place where this feed 
> was originally grabbed from"?

I do something similar, intending it to mean "the location of the items
described by this feed" (when there is a single location).  I don't know
whether any software understands my intent...



Regards,

Peter
-- 
Peter Robinson 



Re: xml:base abuse

2005-08-21 Thread Sjoerd Visscher


Sam Ruby wrote:

URI(doc) = http://www.w3future.com/weblog/rss.xml?notransform
xml:base = http://w3future.com/weblog/rss.xml?notransform


Ah, ok, I missed that. (Just to be sure, you added www yourself, or is
there a link to the feed somewhere with www in it?)



Your feed is available from both of the URI's mentioned above.  The
tinyurl quoted above is based on passing the first one to the feed
validator.


Oh, this is actually interesting. I send a 301 when you use 
www.w3future.com. It looks like this is handled transparently by your 
http library, giving you the file at w3future.com. In that case it 
should be possible to request the actual uri that is used to get the file.


And then there's also the Content-Location header, which sets the base 
URI. This is used with content negotiation. F.e. (I haven't actually 
implemented this) if you would send a request to

  http://w3future.com/weblog/
with the header
  Accept: application/atom+xml
I could send you the atom file with this header:
  Content-Location: http://w3future.com/weblog/rss.xml?notransform

Afaik this is how the HTTP spec suggests implementing content 
negotiation. Then the value of Content-Location should be considered to 
be the uri of the document.



Regarding the solution, my first suggestion would be to change the
xml:base to reference the atom document, e.g.:

<link xml:base="http://example.com/blog/feed.atom" />


... which would resolve to http://example.com/blog/.  Is that what we
want here?


The original in your example is

 <link xml:base="http://example.com/blog/" />.



Not being quite ready to drop the existing suggestion, I simply added
another.

http://www.feedvalidator.org/docs/warning/SameDocumentReference.html


That's what I meant by saying "first suggestion".

--
Sjoerd Visscher
http://w3future.com/weblog/



Re: xml:base abuse

2005-08-21 Thread Sam Ruby

Sjoerd Visscher wrote:
> 
> Sam Ruby wrote:
> 
>> Sjoerd Visscher wrote:
>>
>>> Sam Ruby wrote:
>>>
>>>
 Sjoerd, I'd be interested in your comments on this:

 http://tinyurl.com/9o6y2
>>>
>>>
>>> The explanation in the documentation[1] is perfect. And it says "As the
>>> current xml:base in effect does not match the URI of the document", but
>>> this is not the case in my feed, so I'm not sure why you report a
>>> warning?
>>
>>
>> URI(doc) = http://www.w3future.com/weblog/rss.xml?notransform
>> xml:base = http://w3future.com/weblog/rss.xml?notransform
> 
> Ah, ok, I missed that. (Just to be sure, you added www yourself, or is
> there a link to the feed somewhere with www in it?)

Your feed is available from both of the URI's mentioned above.  The
tinyurl quoted above is based on passing the first one to the feed
validator.

>>> Regarding the solution, my first suggestion would be to change the
>>> xml:base to reference the atom document, e.g.:
>>>
>>>  <link xml:base="http://example.com/blog/feed.atom" />
>>
>> ... which would resolve to http://example.com/blog/.  Is that what we
>> want here?
> 
> The original in your example is
> 
>   <link xml:base="http://example.com/blog/" />.

Not being quite ready to drop the existing suggestion, I simply added
another.

http://www.feedvalidator.org/docs/warning/SameDocumentReference.html

- Sam Ruby



Re: geolocation in atom:author?

2005-08-21 Thread Paul Hoffman


At 12:17 AM +1000 8/22/05, Eric Scheid wrote:

In this example, can anything intelligent be said about the various
different locations? Is my intent clear, or are there clear ambiguities?


<feed>
  ...
  <location>location#1</location>
  <author>
    <name>foo</name>
    <location>location#2</location>
  </author>
  <entry>
    <author>
      <name>foo</name>
      <location>location#3</location>
    </author>
    <location>location#4</location>
    ...
    ...
  </entry>
</feed>



#2 through #4 seem understandable, but what the heck is #1? "The 
place where this feed was put together"? "The place where this feed 
was originally grabbed from"?


--Paul Hoffman, Director
--Internet Mail Consortium



geolocation in atom:author?

2005-08-21 Thread Eric Scheid

In this example, can anything intelligent be said about the various
different locations? Is my intent clear, or are there clear ambiguities?


<feed>
  ...
  <location>location#1</location>
  <author>
    <name>foo</name>
    <location>location#2</location>
  </author>
  <entry>
    <author>
      <name>foo</name>
      <location>location#3</location>
    </author>
    <location>location#4</location>
    ...
    ...
  </entry>
</feed>


e.



Re: xml:base abuse

2005-08-21 Thread Sjoerd Visscher


A. Pagaltzis wrote:

What do you think about what I said? Is @rel='self' being a
same-document reference a problem?


No. As long as xml:base is the same as the document URI.


One thing to note is that when retrieving the document from the
location @rel='self' refers to, the external base URI for
atom:feed is the one which @rel='self' also references. So when
retrieving the feed from its preferred location, @rel='self' will
always implicitly be a same-document reference.


Yes, good point!

--
Sjoerd Visscher
http://w3future.com/weblog/



Re: xml:base abuse

2005-08-21 Thread A. Pagaltzis

* Sjoerd Visscher <[EMAIL PROTECTED]> [2005-08-21 14:50]:
>> Except it’s a @rel='self' link, so you really do want it to
>> resolve to .
> 
> This was about the link in the solution in
> http://www.feedvalidator.org/docs/warning/SameDocumentReference.html
> which is not a self link.

Sorry for not being clearer. I used the documentation example
because that was handier to reference, but I *was* talking about
the actual warning emitted.

What do you think about what I said? Is @rel='self' being a
same-document reference a problem?

One thing to note is that when retrieving the document from the
location @rel='self' refers to, the external base URI for
atom:feed is the one which @rel='self' also references. So when
retrieving the feed from its preferred location, @rel='self' will
always implicitly be a same-document reference.

Regards,
-- 
Aristotle Pagaltzis // 



Re: xml:base abuse

2005-08-21 Thread Sjoerd Visscher


A. Pagaltzis wrote:

* Sjoerd Visscher <[EMAIL PROTECTED]> [2005-08-21 13:40]:


Regarding the solution, my first suggestion would be to change
the xml:base to reference the atom document, e.g.:

 <link xml:base="http://example.com/blog/feed.atom" />

This is also more consistent with the explanation.



Except it’s a @rel='self' link, so you really do want it to
resolve to .


This was about the link in the solution in
http://www.feedvalidator.org/docs/warning/SameDocumentReference.html
which is not a self link.


In fact, I’ve been wondering whether atom:feed/@xml:base doesn’t
obsolete the purpose of atom:link[@rel='self'], so that the
former should have been the SHOULD that the latter is, and the
latter not invented at all. It would seem that this notion would
also be less controversial when backported to RSS2, as opposed to
the item of including an atom:link in an RSS2 feed.


Not only does it port well to RSS2, it ports well to every single xml 
document. I think it is good practice, as it makes the infoset of the 
document context independent.


--
Sjoerd Visscher
http://w3future.com/weblog/



Re: xml:base abuse

2005-08-21 Thread A. Pagaltzis

* Sjoerd Visscher <[EMAIL PROTECTED]> [2005-08-21 13:40]:
> Regarding the solution, my first suggestion would be to change
> the xml:base to reference the atom document, e.g.:
> 
>   <link xml:base="http://example.com/blog/feed.atom" />
> 
> This is also more consistent with the explanation.

Except it’s a @rel='self' link, so you really do want it to
resolve to .

@rel='self' is the one case where I still wonder how it should be
handled. To be helpful and correct, you will want to provide an
atom:feed/@xml:base for your feed, and likewise the conscientious
will want to provide an atom:link[@rel='self'] for the feed – now
regardless of *how* you write one or the other, if you use both
as intended by their specs, then the atom:link is inevitably
going to end up a same-document reference.

I wonder if that’s really harmful, though. After all, @rel='self'
is meant as the URL that should be used for *subscription* when
the aggregator does not know where the document came from. This
does not in itself imply dereferencing. (Semantics, semantics…)
Further, I expect that in practice, clients will fall back to
atom:feed/@xml:base when it exists but atom:link[@rel='self']
does not.

In fact, I’ve been wondering whether atom:feed/@xml:base doesn’t
obsolete the purpose of atom:link[@rel='self'], so that the
former should have been the SHOULD that the latter is, and the
latter not invented at all. It would seem that this notion would
also be less controversial when backported to RSS2, as opposed to
the item of including an atom:link in an RSS2 feed.

Regards,
-- 
Aristotle Pagaltzis // 



Re: xml:base abuse

2005-08-21 Thread Sjoerd Visscher


Sam Ruby wrote:

Sjoerd Visscher wrote:


Sam Ruby wrote:



Sjoerd, I'd be interested in your comments on this:

http://tinyurl.com/9o6y2


The explanation in the documentation[1] is perfect. And it says "As the
current xml:base in effect does not match the URI of the document", but
this is not the case in my feed, so I'm not sure why you report a warning?



URI(doc) = http://www.w3future.com/weblog/rss.xml?notransform
xml:base = http://w3future.com/weblog/rss.xml?notransform



Ah, ok, I missed that. (Just to be sure, you added www yourself, or is 
there a link to the feed somewhere with www in it?)



Regarding the solution, my first suggestion would be to change the
xml:base to reference the atom document, e.g.:

 <link xml:base="http://example.com/blog/feed.atom" />



... which would resolve to http://example.com/blog/.  Is that what we
want here?


The original in your example is

  <link xml:base="http://example.com/blog/" />.

--
Sjoerd Visscher
http://w3future.com/weblog/



Re: xml:base abuse

2005-08-21 Thread Sam Ruby

Sjoerd Visscher wrote:
> Sam Ruby wrote:
> 
>> Sjoerd, I'd be interested in your comments on this:
>>
>> http://tinyurl.com/9o6y2
> 
> The explanation in the documentation[1] is perfect. And it says "As the
> current xml:base in effect does not match the URI of the document", but
> this is not the case in my feed, so I'm not sure why you report a warning?

URI(doc) = http://www.w3future.com/weblog/rss.xml?notransform
xml:base = http://w3future.com/weblog/rss.xml?notransform

> Regarding the solution, my first suggestion would be to change the
> xml:base to reference the atom document, e.g.:
> 
>   <link xml:base="http://example.com/blog/feed.atom" />

... which would resolve to http://example.com/blog/.  Is that what we
want here?
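The resolution above follows RFC 3986: a relative reference is resolved against the base with its last path segment removed. A quick illustration with Python's urllib (not part of the original thread):

```python
from urllib.parse import urljoin

base = "http://example.com/blog/feed.atom"

# A relative reference resolves against the base with its last path
# segment ("feed.atom") removed, per RFC 3986:
assert urljoin(base, "post1") == "http://example.com/blog/post1"

# An empty reference is a same-document reference to the base itself:
assert urljoin(base, "") == "http://example.com/blog/feed.atom"
```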

- Sam Ruby



Re: xml:base abuse

2005-08-21 Thread Sjoerd Visscher


Sam Ruby wrote:

Sjoerd, I'd be interested in your comments on this:

http://tinyurl.com/9o6y2


The explanation in the documentation[1] is perfect. And it says "As the 
current xml:base in effect does not match the URI of the document", but 
this is not the case in my feed, so I'm not sure why you report a warning?


Regarding the solution, my first suggestion would be to change the 
xml:base to reference the atom document, e.g.:


  <link xml:base="http://example.com/blog/feed.atom" />

This is also more consistent with the explanation.

greetings,
Sjoerd

[1] http://www.feedvalidator.org/docs/warning/SameDocumentReference.html

--
Sjoerd Visscher
http://w3future.com/weblog/