I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
Please resolve these comments along with any other Last Call comments
you may receive.
Document: draft-ietf-mmusic-rfc2326bis-34
Reviewer: Robert Sparks
Review Date: 05-Jun-2013
IETF LC End Date: 04-Jun-2013
IESG Telechat date: not yet on a telechat
Summary: This draft is on the right track but has open issues, described
in the review.
I have not reviewed this document at the level of detail I prefer for
Gen-art reviews, but have tried to be thorough in the sections I have
reviewed.
In particular,
I didn't verify the tables and the text agree
I didn't carefully check the ABNF
I haven't thought about possible edge cases and race conditions as much
as I would have liked
I didn't closely review the security mechanisms in section 19 or the
specialized MIKEY keying mechanism in the appendix - both need careful
review from security experts.
One observation I would like to make before calling out issues and nits:
The document is very long, and the structure is unusual - much of the
definition of the protocol itself is in the appendices. You are missing
an opportunity to make this document much shorter (thereby likely
increasing its effectiveness). Much of the jump in from RFC2326 was
importing the description of headers and responses from HTTP, tailoring
them to RTSP. That was a good exercise in that it highlights some issues
that would otherwise be non-obvious (and raises a few questions below).
But you followed the style from HTTP perhaps too closely - much shorter
descriptions without examples might have done the job better. Overall,
separating exposition and examples from the protocol definition would
make it much easier for an implementer to find the definition of the
protocol, and use the document as a reference when diagnosing a failure
to interoperate.
Major Issues
I'm not seeing what instructs an RTSP element on how to find the server
it would try to open a connection to given an rtsp or rtsps URI? Are you
assuming it would be doing A or AAAA DNS lookups? Should this protocol
use NAPTR/SRV? The document should be explicit. On a related note, (and
maybe I missed this), but where do you talk about what an element should
check in the server's certificate when connecting over TLS? Are you
assuming a common name match? Or are you expecting some SubjectAltName
processing?
The document should say more about when connection reuse is appropriate,
particularly when the connections happened because of an rtsps uri. Two
different names might resolve to the same IP address - reusing a
connection formed when looking at the first name (and verifying the
server's cert) is dangerous. A new connection needs to be formed (and
verified) instead.
The text talks about the option to queue S->C requests if there isn't a
connection to the client available. Ostensibly, this means the request
can go down some future connection. It's not clear how the server can
tell the right client connected. In particular, when using rtsps, how do
you prevent a malicious client from getting such a queued message that
wasn't meant for him?
Given that the text talks about queuing S-C requests, it should
explicitly call out whether a server can queue a response if the
connection the associated request arrived on is no longer available. I
think what you want to say is that such a response must not be queued,
and must be dropped.
There are several places in the text that talk about using a 503
response with a Retry-After header to push back on traffic from an
element (the first is section 10.7).
* It's not clear what the subject of a 503 is. Is it intended to be
scoped to requests to the resource in the RURI of the associated
request? Is it intended to be scoped to the domain in that resource? Or
is it, like in SIP, intended to be scoped to the server issuing the
response, independent of what the RURI contained?
* If the intent is that the 503 talks about the server, then you should
clarify that a proxy doesn't simply forward 503 responses (after
rewriting CSeqs), and should probably have a response of its own.
Otherwise, clients that might be talking to two different servers
through one proxy would lose access to both of them when one of the
servers 503ed.
Section 4.9.1 lists values the Seek-Style header can take, but 18.45
lists a completely different set (which most of the rest of the document
uses). Should 4.9.1 be changed to use the values in 18.45? Are the right
strings being used in sections 13.4.4 through 13.4.6? Those appear to be
using strings from 4.9.1.
The document will not stand if you delete some of the appendices. They
aren't supplementary material. Please consider restructuring the
normative sections back into the main document, or clearly identifying
which ones define protocol and which are background information.
Section 15 talks about a "transparent' proxy, but there is no
description in this document of what protocol such a thing should
follow. What bad thing happens if you remove all mention of
"transparent" proxies from the document? As far as I can tell, that
won't affect the protocol definition at all.
The list of steps for establishing an SRTP cryptographic context in
C.1.4.1 says "This specification does not specify an explicit method for
indicating this SRTP cryptographic context establishment method, but
future specifications may." in the context of looking at the SDP. SDP
_has_ methods of indicating what keying mechanism is used with SRTP -
are you talking about something different? Why doesn't this section say
something like "the use of SRTP with RTSP is only defined with MIKEY
with keys established as defined in this section. Future documents may
talk about how an RTSP implementation might treat SDP that indicates
some other key mechanism be used"? Could you provide an example in the
document of an RTSP message carrying SDP reflecting the use of SRTP as
defined in this document?
Minor Issues
The document uses the notion of a "control connection" (it appears first
in section 2.5) without defining what that is, or what kind of
connections you might have that aren't control connections.
The update to the registration of the rtsp and rtsps schema call out
that there are changes that can result in interoperability issues. It
doesn't say what the issues are, or who might encounter them. I can
infer that the point is that there might be problems if a URI following
this updated registration were to be processed by an RTSP 1.0
implementation (or any other application that relied on the previous
definition). It would be better to say that explicitly, and talk about
how a previous implementation is likely to act when presented with a URI
that exercised these new changes. (It's not clear to me that all of the
thread at www.ietf.org—msg01567.html
<http://www.ietf.org/mail-archive/web/uri-review/current/msg01567.html>
concluded - I see how the fragments question ended, but what about
things like the addition of IPV6 literals? In particular, I'm not
finding a response where Ted Hardie agreed that the potential for
interoperability failure had been adequately addressed).
I think you've said something different than you mean in 5.4 item 1. I
think you mean to say that an RTSP implementation would ignore any body
that happened to come in message that MUST NOT contain one. What you've
said is that the part of the parser that's doing framing has to stop
framing at the end of the headers, and that such an errant body would be
parsed as a separate message. As written this would keep someone from
using an Internet Message Format parser since it would have to ignore
any Content-Length that might have appeared, even though it wasn't
supposed to.
In section 10.4, it looks like a server can keep an RTSP transaction
pending forever if it sends 100 responses often enough. I assume the
client's recourse if it doesn't want to remain involved is to close the
connection. If that's right, it's worth calling it out explicitly in
this section.
Somewhere, probably in section 15.2, you should be explicit that a proxy
that is multiplexing requests MUST keep the requests in the same order
as they are folded together. You can infer that, or things simply break,
but saying it explicitly would be better.
Both GET_PARAMETER and SET_PARAMETER are listed as keep-alive methods in
10.5, but the note in section 13 on table 7 only uses that as a reason
for requiring SET_PARAMETER. Why doesn't it also apply to GET_PARAMETER?
And why is this only required at servers? "Required" in this section
means Mandatory to Implement, I think. If that's right, it should be
made explicit. If that's not right, what does it mean?
Section 13.8 makes a good argument for always putting parameters to be
retrieved in a body. Why are you keeping the complexity of also allowing
them in headers?
Section 13.10: "That should not provide any benefit." is not clear - can
you expand it a little?
In Section 13.10 you talk about clients _sending_ media (search for
"send or receive media"). Do you mean anything more than possibly RTCP?
Can you make that clear in the section?
Section 15.2's second paragraph appears to be the only place the proxy
rewriting of CSeq in order to multiplex requests is defined, and it is
very loose. It would be easy for a reader to fail to recognize this as
an interoperability requirement. Please consider expanding how this is
specified when addressing the comment about above about keeping requests
in relative order when multiplexing.
You follow the principle of saying a client should treat a response with
an unknown response code, the same as it would treat the x00 of that
class (e.g. 400 for an unknown 4xx). However, you do not define what a
response code of 300 means. How is a client supposed to handle an
unknown 3xx response?
As written, a client and server can use HTTP Basic authentication over
TCP that is not protected with TLS. Consider restricting it's use to TLS
only. Are you sure you don't need to say anything RTSP specific about
DIGEST computations (I don't think you need to go as far as SIP did
talking about DIGEST, but I'm surprised you haven't run into issues
relying only on 2617 literally.
How does 411 Length Required for this protocol make sense? Given that
you've restricted the protocol to TCP and require the Content-Length
header to be present if there is a body, in what circumstance would a
server need to send a 411?
Section 18.19 requires senders to increment CSeq by 1. We have a
reliable transport with no request retransmission, so there should never
be gaps at the receiver. Should the receiver check and react some way if
there are gaps? I think you should explicitly tell them not to even
look. If you agree, why is incrementing by one important as long as you
are always strictly increasing?
You call out several places where RTCP is essential to allow RTSP using
RTP to work correctly, yet section C.1.2 sets up conditions where RTCP
MUST NOT be sent. Why are you allowing those instead of treating the
conditions you describe as errors?
Nits
The definition of Message still talks about connectionless transports
22.14 says "These URI schemes are defined in existing registries which
are not created by RTSP." Should it say that they are _registered_
rather than defined, and "not created by this document" instead of by
RTSP? In section 4.2 in the paragraph discussing ports 554 and 322, why
is the language different for rtsp and rtsps? "MUST be used" vs "is
registered and MUST be assumed"?
Consider a reference in 4.4 to where SMPTE 30 drop is defined,
particularly for where the 1/100th frame measurement comes from.
Section 4.5 introduces NPT as indicating "the absolute position" and
then defines something that can carry a point or a range. I suggest
adjusting the first sentence to allow for a range.
Section 4.9.1's description of Conditional Random Access contains some
ambiguity. It would help be more explicit. I think it's trying to say
"would move the client's play point further away from the requested play
point than it's play point currently is."?
Section 5 : what do you mean by "tricky" switching?
Section 7: consider octets instead of bytes
Section 10.3: what is "take appropriate action" trying to say?
The document mixes using "3rr" and "3xx" to talk about redirect
responses. Why are there two ways to say the same thing?
(Total nit, but) The examples show dates in 1997, when there could be no
RTSP/2.0. Are there other aspects of the examples that might not have
been updated?
Is "will need to use" at the bottom of page 85 in section 13.5.2 trying
to say something that should be said with 2119?
Consider saying that a client must not put a Terminate-Reason in a C->S
Teardown request, or tweak the definition of Terminate-Reason to discuss
client use and the possibility of C->S specific values.
These two sentences in 13.7.1 are particularly hard to parse together:
"The time period is considered extended when it is 10 times the Session
Timeout period. Consideration of the application of the server and its
content should be performed when configuring what is considered as
extended period of time." I suspect they were not written at the same
time? Can you simplify what they are trying to say, noting that the
suggested "10 times" is a default, and that if you want to change that
default, you need to consider the context?
Section 14: "Interleaved binary data SHOULD only be used if RTSP is
carried over TCP". Is this leftover from when UDP was an option?
(small nit) Section 14, you mention an ASCII dollar sign - maybe you
should just say an octet with value 0x24?
It's not clear how far up to go with "upper-layer protocol headers" in
section 14, given that you are trying to speak generally about carrying
things in interleaved data. Your example for RTP may cover the only case
that matters. Consider speaking more normatively (instead of with an
example) for RTP and provide some guidelines about where to cut the
prefixes for other things you might want to carry (lest you end up with
the v6 headers too when tunneling some new foo protocol?)
In section 17, you say "Status codes that have the same meaning are not
repeated here." But you did repeat them (see 200 OK). Should that
sentence be deleted?
304 (section 17.3.4) probably should have been a different class
response, maybe even a 2xx. The text does call it out as "unusual" for
the 3xx, but it would be better to say why this ended up in this
response class.
I might be confused, but I'm not sure the redirection semantics defined
in this document lead to the "black hole" case discussed in 17.4.14. If
I'm confused, could you send me an example of such a redirection black
hole happening?
Section 18.11 first paragraph should be split into two sentences.
Is "MUST be transported over TCP" in C.2.1 stale now that no
connectionless transport is defined?
The end of Appendix G still implies that CSeq was primarily for
unreliable transports. There are other places that out and point to
proxy multiplexing. All of them should acknowledge that CSeq is
necessary for associating responses with requests.