Re: [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Hi, Jean-Marc and co-authors

On Thu, 2012-05-17 at 12:49 -0400, Jean-Marc Valin wrote:

Hi Elwyn, We've submitted a -14 draft to address your comments. Again, see the response to each of these issues in the attached document.

I think we are done. Thanks once again for the rapid response.

Regards, Elwyn

PS I still prefer octets. /E

Thanks again, Jean-Marc

On 12-05-16 05:26 PM, Elwyn Davies wrote:

Hi, Jean-Marc. ... and thanks for the super-quick response! You have been quite busy. I have had a look through the new draft and I think the additions help considerably with comprehension for the naive (and to give new implementers a way in).
Re: [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
I think the authors just about have a -14 draft ready but I wanted to comment on one topic inline.

On May 16, 2012, at 3:26 PM, Elwyn Davies wrote:

Hi, Jean-Marc. ... and thanks for the super-quick response!

[snip]
Re: [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Hi, Jean-Marc.

... and thanks for the super-quick response! You have been quite busy. I have had a look through the new draft and I think the additions help considerably with comprehension for the naive (and to give new implementers a way in).

I'll leave you to negotiate with the RFC Editor over the Wikipedia references. To quote the RFC Style guide http://www.rfc-editor.org/rfc-style-guide/rfc-style Section 4.8, item (x) References, last section:

URLs and DNS names in RFCs: The use of URLs in RFCs is discouraged, because many URLs are not stable references. Exceptions may be made for normative references in those cases where the URL is demonstrably the most stable reference available. References to long-lived files on ietf.org and rfc-editor.org are generally acceptable.

They are certainly convenient *as long as they remain in place and aren't corrupted*.

I found a couple of trivial editorial nits in the changes:

s4.3.3 (in the added text): "The CELT layer, however, can adapt over a very wide range of rates, and thus has a large number of codebooks sizes" s/codebooks/codebook/

s4.3.3, para after Table 57: s?the maximums in bit/sample are precomputed?the maximums in bits/sample are precomputed?

Also suggest: s4.3: Add reference for Bark scale: Zwicker, E. (1961), Subdivision of the audible frequency range into critical bands, The Journal of the Acoustical Society of America, 33, Feb., 1961.

A few responses in line below (agreed pieces elided):

Regards, Elwyn Davies

On Tue, 2012-05-15 at 20:33 -0400, Jean-Marc Valin wrote:

Hi Elwyn, Thanks for the very thorough review. We've addressed your issues and submitted draft version -13. See our response to each of the issues you raised (aggregated from all the authors) in the attached document. Thanks very much to all the authors.

Cheers, Jean-Marc

Elwyn Davies wrote:

Major issues: Can we accept the code as normative? If not how do we proceed?

The issue with code being normative was specifically addressed in the guidelines document for this WG (RFC 6569).

Yes. I failed to read this although Robert Sparks did point out to me that the code was normative - but he didn't think he said this was agreed in advance (or maybe I didn't understand what he was saying). To be honest I would like to have had the time to tie the text to the code, but realistically that is several weeks or even months of work to do it properly - without that I feel that I have only done half a job. I may decide that it interests me enough to have a superficial go next week but no promises!

Minor issues:

Contrast between decoder descriptions of LP part and CELT part: The SILK descriptions go into gory detail on the values used in lots of tables, etc., whereas the CELT part has a very limited treatment of the numeric values used (assuming reliance on finding the values in the reference implementation, either explicitly or implicitly). There are things to be said for both techniques. I was wondering (while reading the SILK description) if the authors have any means of automatically generating the tables from the code in the SILK part (or vice versa) to avoid double maintenance. On the other hand, there are pieces of the CELT decoder description (especially in s4.3.3 where knowing numbers of bands, etc.) where some actual numbers would help comprehension.

We have made many changes to section 4.3 (and 4.3.3 specifically) to address the specific issues below. As for the tables, they are not generated automatically.

I think this is addressed satisfactorily now. There is still some difference but it is much reduced and not so glaring now. The addition of the band tables really helps.

s2 (and more generally): The splitting of the signal in the frequency domain into signal (components?) below and above 8 kHz presumably requires that the signal is subjected to a Discrete Fourier Transform to allow the signal to be split. I think something is needed in s2 to explain how this is managed (or if I don't understand, to explain why it isn't necessary).

No DFT is used. The lower band is obtained through resampling (which is already described) and the higher band is obtained by not coding the lower band with CELT (the text says that CELT starts at band 17 in hybrid mode). The explanation was reworded to make this as clear as possible at this point in the text.

[I thought I had reworded this comment in the 2nd version to talk about MDCT but no matter]. Yes, para 5 of s2 does say that the bands are discarded. I think it would be useful to have a concrete statement in the new text added to s4.3 that bands 0 to 16 are discarded in hybrid mode (thereby making the 17 in the band boost section more obvious) [There is a comment below that you have added some text about band 17 in
Re: [Gen-art] Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Hi Elwyn,

We've submitted a -14 draft to address your comments. Again, see the response to each of these issues in the attached document.

Thanks again, Jean-Marc

On 12-05-16 05:26 PM, Elwyn Davies wrote:

Hi, Jean-Marc. ... and thanks for the super-quick response! You have been quite busy. I have had a look through the new draft and I think the additions help considerably with comprehension for the naive (and to give new implementers a way in).

On 12-05-16 05:26 PM, Elwyn Davies wrote:

"The CELT layer, however, can adapt over a very wide range of rates, and thus has a large number of codebooks sizes" s/codebooks/codebook/

Fixed.

s4.3.3, para after Table 57: s?the maximums in bit/sample are precomputed?the maximums in bits/sample are precomputed?

Fixed.

Also suggest: s4.3: Add reference for Bark scale: Zwicker, E. (1961), Subdivision of the audible frequency range into critical bands, The Journal of the Acoustical Society of America, 33, Feb., 1961.

Done.

No DFT is used. The lower band is obtained through resampling (which is already described) and the higher band is obtained by not coding the lower band with CELT (the text says that CELT starts at band 17 in hybrid mode). The explanation was reworded to make this as clear as possible at this point in the text.

[I thought I had reworded this comment in the 2nd version to talk about MDCT but no matter]. Yes, para 5 of s2 does say that the bands are discarded. I think it would be useful to have a concrete statement in the new text added to s4.3 that bands 0 to 16 are discarded in hybrid mode (thereby making the 17 in the band boost section more obvious) [There is a comment below that you have added some text about band 17 in section 4.3 but I can't see it].

Sorry, we started working on the revision as soon as you sent the first part of the review, and then we just copied the new parts (didn't notice some of the review changed). Also, the reason you didn't see the new explanations about band 17 in s4.3 is that we moved them to s2 para 5, to help make the explanation of the signal splitting clearer, but forgot to update the response to indicate that (sorry about that). However, it probably does make sense to explain it in both places, so the sentence "In hybrid mode, the first 17 bands (up to 8 kHz) are not coded." has been added to s4.3 as well.

As explained in the LBRR text, a 10 ms frame will only contain 10 ms LBRR data even if the previous frame was 20 ms, so there's 10 ms missing.

Indeed - that there would be a hole was clear. The 'How' referred to how it would be concealed. Having read further by now, this may be down to Packet Loss Concealment - so maybe all it needs is a forward ref to s4.4.

Reference added.

We believe that in the field of audio codecs, the mention of byte without further context is well understood to mean 8 bits.

True. But this is a matter of IETF style. The style is to use octets where we mean 8-bit bytes. I think you now have a mixture!

Indeed, there's a bit of inconsistency here. Considering Cullen's email, the document now uses byte consistently.

s4.2.7.5.1, para 1: s/This indexes an element in a coarse codebook, selects the PDFs for the second stage of the VQ/This indexes an element in a coarse codebook that selects the PDFs for the second stage of the VQ/

The text as written is correct. The index I1 is what selects the PDFs for the second stage, not the vector from the coarse codebook in Tables 23 and 24. I.e., it's saying, "This does A, B, and C."

OK. I think it might be clearer if the three things were separated out as a list. Now you point it out I can read it correctly, but it triggered minor confusion - worth turning the three things into bullet points.

This is not a bad idea. I agree it helps make things clearer. Done.

NEW: s4.3: Add reference for Bark scale: Zwicker, E. (1961), Subdivision of the audible frequency range into critical bands, The Journal of the Acoustical Society of America, 33, Feb., 1961.

Done (as stated above).

s4.3.3: (was specified as s4.3.2.3 which was wrong) Paragraph on decoding band boosts: Might be improved by using equations rather than the wordy descriptions used at present. Any thoughts on this one?

Oops, that one slipped through. While most of the text is actually describing an algorithm rather than an equation, it was possible to simplify the part about the quanta with an equation. The text now reads: "For each band from the coding start (0 normally, but 17 in Hybrid mode) to the coding end (which changes depending on the signaled bandwidth), the boost quanta in units of 1/8 bit is calculated as: quanta = min(8*N, max(48, N))."

s4.3.3: snip. Added an explanation of band 17

I don't think this happened. See above -- operator error. It's fixed now.

The tests in s6.1 are measuring the quality of the decoded signal against the reference decoded signal, not against
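The quanta formula quoted above is simple enough to sketch directly. This is a purely illustrative restatement of the quoted text, assuming (as in the draft) that N is the size of the band:

```python
def boost_quanta(N):
    """Band-boost step size in units of 1/8 bit for a band of size N,
    per the quoted formula: quanta = min(8*N, max(48, N))."""
    return min(8 * N, max(48, N))

# For small bands the 8*N term wins; mid-size bands hit the floor of 48
# eighth-bits (6 bits); large bands scale linearly with N.
print(boost_quanta(4), boost_quanta(20), boost_quanta(100))
```

The min/max pair keeps the boost step proportional to band size at the extremes while enforcing a minimum granularity in between.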
Re: Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Hi Elwyn,

Thanks for the very thorough review. We've addressed your issues and submitted draft version -13. See our response to each of the issues you raised (aggregated from all the authors) in the attached document.

Cheers, Jean-Marc

On 12-05-14 09:13 AM, Elwyn Davies wrote:

I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, please see the FAQ at http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq. Please resolve these comments along with any other Last Call comments you may receive.

Document: draft-ietf-codec-opus-12.txt
Reviewer: Elwyn Davies
Review Date: 14 May 2012 (completed)
IETF LC End Date: 10 May 2012
IESG Telechat date: (if known) -

Elwyn Davies wrote:

Major issues: Can we accept the code as normative? If not how do we proceed?

The issue with code being normative was specifically addressed in the guidelines document for this WG (RFC 6569).

Minor issues:

Contrast between decoder descriptions of LP part and CELT part: The SILK descriptions go into gory detail on the values used in lots of tables, etc., whereas the CELT part has a very limited treatment of the numeric values used (assuming reliance on finding the values in the reference implementation, either explicitly or implicitly). There are things to be said for both techniques. I was wondering (while reading the SILK description) if the authors have any means of automatically generating the tables from the code in the SILK part (or vice versa) to avoid double maintenance. On the other hand, there are pieces of the CELT decoder description (especially in s4.3.3 where knowing numbers of bands, etc.) where some actual numbers would help comprehension.

We have made many changes to section 4.3 (and 4.3.3 specifically) to address the specific issues below. As for the tables, they are not generated automatically.

s2 (and more generally): The splitting of the signal in the frequency domain into signal (components?) below and above 8 kHz presumably requires that the signal is subjected to a Discrete Fourier Transform to allow the signal to be split. I think something is needed in s2 to explain how this is managed (or if I don't understand, to explain why it isn't necessary).

No DFT is used. The lower band is obtained through resampling (which is already described) and the higher band is obtained by not coding the lower band with CELT (the text says that CELT starts at band 17 in hybrid mode). The explanation was reworded to make this as clear as possible at this point in the text.

s4.1: Despite having found a copy of the range coding paper and tried to read it, I found the whole business of range coding opaque. Given that this is (I think) fairly novel, some additional less opaque description could be very helpful to people trying to understand what is going on. In particular, knowledge of how entropy coding works is pretty much a prerequisite.

Added a link to the Wikipedia article on range coding.

s4.2.5, para 3: "When switching from 20 ms to 10 ms, the 10 ms 20 ms Opus frame, potentially leaving a hole that needs to be concealed from even a single packet loss." How?

As explained in the LBRR text, a 10 ms frame will only contain 10 ms LBRR data even if the previous frame was 20 ms, so there's 10 ms missing.

s4.2.5, para 4: "In order to properly produce LBRR frames under all conditions, an encoder might need to buffer up to 60 ms of audio and re-encode it during these transitions. However, the reference implementation opts to disable LBRR frames at the transition point for simplicity." Should this be phrased in RFC 2119 requirements language? The first part sounds like a SHOULD with the second part being the get-out, but it's not entirely clear what the consequences are other than simplicity.

There really is no SHOULD there. We're just describing possible strategies for encoding. In general, we have avoided 2119 language as much as possible for the encoder. However, the sentence "Since transitions are relatively infrequent in normal usage, this does not have a significant impact on packet loss robustness" was added to alleviate any concerns about the consequences of this decision.

== MINOR ISSUES ABOVE submitted as part 1 of review ==

general/s11.2: Several references [Hadamard], [Viterbi], etc., are to Wikipedia pages. Whilst these are convenient (and only illustrative) they are not guaranteed to be very stable. Better (i.e., more stable) references are desirable.

We believe that the links themselves should be quite stable. We could also link specific revisions in Wikipedia, but the quality of the content is more likely to get better than worse. Considering that these references are informational, we believe that Wikipedia is a good
Re: Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
On Mon, 2012-05-14 at 17:08 -0600, Cullen Jennings wrote:

Thank you kindly for the detailed review. More inline ...

On May 14, 2012, at 9:06 AM, Elwyn Davies wrote:

Summary: Before offering some views on the document, let me say that this piece of work seems to be a tour de force on behalf of its developers. It is certainly one of the (if not the) most technically complex pieces of work that has been presented to the IETF,

And I would like to say thank you for the detailed review - having read the whole draft several times, I know that one's eyes can start to glaze over on what seems like something almost as long as the NFS spec ... so thanks for the review.

Not sure I would say 'it was my pleasure' but it certainly taught me a thing or two!

I did want to comment on the one major issue.

Major issues: Can we accept the code as normative? If not how do we proceed?

RFC 6569 says that the code needs to be the normative specification, so I think this issue is pretty much resolved by that RFC. As a bit of irrelevant history - this topic was discussed at various stages. It was discussed in the chartering process - draft-valin-codec-guidelines, referred to by the charter, said code would be normative. I don't mean by this that the charter said the code had to be normative but just that this conversation goes back to before the WG was formed. It was later discussed in the WG and resulted in WG consensus to have the code as normative.

Just as background on a few of the reasons that this was decided this way: many people felt that the spec would be much longer, and much harder to implement or review, if the text was normative. Code is a very compact representation of what happens on boundary edge conditions and eliminates many types of misinterpretations. I do understand normative code introduces other types of issues, but it is a fairly common practice in specifying codecs to use normative code.

Cullen
Codec Co-Chair

I don't have a problem with this in principle: however, as a reviewer I get this nagging feeling that on a few days' acquaintance, it's impossible to demonstrate that the text accurately, albeit non-normatively, describes the code. And that makes me feel I have done half a job. However, I also know that starting from scratch it would take me several months to get anywhere near an accurate view (I know from considerable experience with the DTN2 reference implementation and, years ago, the source code of yacc - now that ages me!) If the WG, doc shepherd and the IESG are happy that the mapping is near enough, so be it. And, sorry, I didn't look at RFC 6569 during my review.. probably should have done.

Regards, Elwyn
Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, please see the FAQ at http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq. Please resolve these comments along with any other Last Call comments you may receive.

Document: draft-ietf-codec-opus-12.txt
Reviewer: Elwyn Davies
Review Date: 14 May 2012 (completed)
IETF LC End Date: 10 May 2012
IESG Telechat date: (if known) -

Summary:

Before offering some views on the document, let me say that this piece of work seems to be a tour de force on behalf of its developers. It is certainly one of the (if not the) most technically complex pieces of work that has been presented to the IETF, and is far more mathematically complex and rigorous than our usual run of standards. It is also perhaps a vindication of the rough consensus and running code philosophy. So congratulations to the authors and the companies that have supported this work.

But back to the review. I came to this review with a very outdated view of what goes on inside codecs, so it was rather a shock to the system. Taking that into account, I think that some additional high-level, up-front description of the two parts (SILK and CELT) and the range encoding system would make it easier for future (naive) readers and potential implementers to understand what is going on. For example, after struggling with the description of CELT in the decoder (s4.3) I found that going off and reading the slides on CELT presented at IETF 79 helped considerably. Such additions might also help future dissemination of the technique by giving potential users a quick overview. It would also make it easier to justify the decision to postpone the high-level (and highly mathematical/technical) view to section 5, after the detailed description of the decoder.

I also found that the descriptions of the SILK and CELT decoders appeared to be written with a different perspective (see detailed comments), doubtless by different authors. This is not ideal and, despite the expected further increase in page count, some additional text in s4.3 seems desirable. By far the most difficult section to parse is 4.3.3, on bit allocation in CELT. Despite considerable study, I didn't feel I had really got to grips with how this worked. An example might help.

Finally, the most contentious aspect of this document is the requirement to treat the code as normative. There are just over 30,000 physical lines of code and I have hardly checked them at all. SLOCCount reckons this represents 7.14 man-years of effort. As with the document, there is a discrepancy between CELT and SILK, with the latter code being more heavily commented, especially as regards routine parameters. A proper validation of the claim that the code implements the description would take several weeks of time: I guess that we have to take the document shepherd's assurances on this. One issue is that both code and document are pretty much saturated with what we used to call 'magic numbers'. Although these are explained in the document, it does not appear that this is always the case in the code. I would also be happier if the code contained Doxygen (or similar) documentation statements for all routines (rather than just the API).

So overall, the work is something that ought to be standardized, but the document needs further work - not quite ready for the IESG.

Major issues:

Can we accept the code as normative? If not how do we proceed?

Minor issues:

Contrast between decoder descriptions of LP part and CELT part: The SILK descriptions go into gory detail on the values used in lots of tables, etc., whereas the CELT part has a very limited treatment of the numeric values used (assuming reliance on finding the values in the reference implementation, either explicitly or implicitly). There are things to be said for both techniques. I was wondering (while reading the SILK description) if the authors have any means of automatically generating the tables from the code in the SILK part (or vice versa) to avoid double maintenance. On the other hand, there are pieces of the CELT decoder description (especially in s4.3.3 where knowing numbers of bands, etc.) where some actual numbers would help comprehension.

s2 (and more generally): The splitting of the signal in the frequency domain into signal (components?) below and above 8 kHz in the hybrid case presumably requires that the output of the CELT encoder is windowed so that only one of the encoders generates output below 8 kHz. I think something is needed in s2 to explain how this is managed (presumably to do with energy bands?). I didn't find anything in s5 about what has to be done for the encoder when running in hybrid mode, which surprised me somewhat.

s4.1: Despite having found a copy of the range coding paper and tried to read it, I found the description of range coding opaque (there are some more words later in s5 but this is a bit late for understanding the decoder).
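For readers who share the reviewer's difficulty with range coding, the underlying idea can be illustrated with a toy sketch: each symbol narrows the current interval in proportion to its probability, and any number inside the final interval identifies the whole symbol sequence. This is purely illustrative (using exact rationals, with no renormalization) and is not the carry-less fixed-width coder the draft actually specifies; the table names here are hypothetical:

```python
from fractions import Fraction

# 'cum' is a cumulative frequency table: symbol s occupies the slice
# [cum[s], cum[s+1]) out of 'total'. All frequencies assumed nonzero.

def rc_encode(symbols, cum, total):
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * Fraction(cum[s], total)         # move into sub-interval
        width *= Fraction(cum[s + 1] - cum[s], total)  # shrink by p(s)
    return low  # any value in [low, low + width) decodes back to 'symbols'

def rc_decode(code, n, cum, total):
    out = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(n):
        target = (code - low) / width  # position within the current interval
        # pick the symbol whose sub-interval contains 'target'
        s = max(i for i in range(len(cum) - 1)
                if Fraction(cum[i], total) <= target)
        out.append(s)
        low += width * Fraction(cum[s], total)
        width *= Fraction(cum[s + 1] - cum[s], total)
    return out

cum, total = [0, 6, 8], 8      # p(0) = 6/8, p(1) = 2/8
msg = [0, 1, 1, 0]
code = rc_encode(msg, cum, total)
assert rc_decode(code, len(msg), cum, total) == msg
```

A likely symbol shrinks the interval only slightly and therefore costs few bits, which is the entropy-coding intuition behind both the range coder and the PDFs tabulated throughout the draft.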
Re: Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Hi Elwyn,

At 08:06 14-05-2012, Elwyn Davies wrote:

I am the assigned Gen-ART reviewer for this draft. For background on [snip]

Summary: Before offering some views on the document, let me say that this piece of work seems to be a tour de force on behalf of its developers. It is certainly one of the (if not the) most technically complex pieces of work that has been presented to the IETF, and is far more mathematically complex and rigorous than our usual run of standards. It is also

Let me say that this is one of the best Gen-ART reviews I have read. It is not easy to review draft-ietf-codec-opus-12. I agree that it is the most technically complex piece of work within all IETF areas. The normative choice made by the WG is highly unusual. The alternative would be much more work; it might even be an unworkable option.

Regards, -sm
Re: Gen-art last call review of draft-ietf-codec-opus-12.txt (completed)
Thank you kindly for the detailed review. More inline ...

On May 14, 2012, at 9:06 AM, Elwyn Davies wrote:

Summary: Before offering some views on the document, let me say that this piece of work seems to be a tour de force on behalf of its developers. It is certainly one of the (if not the) most technically complex pieces of work that has been presented to the IETF,

And I would like to say thank you for the detailed review - having read the whole draft several times, I know that one's eyes can start to glaze over on what seems like something almost as long as the NFS spec ... so thanks for the review.

I did want to comment on the one major issue.

Major issues: Can we accept the code as normative? If not how do we proceed?

RFC 6569 says that the code needs to be the normative specification, so I think this issue is pretty much resolved by that RFC. As a bit of irrelevant history - this topic was discussed at various stages. It was discussed in the chartering process - draft-valin-codec-guidelines, referred to by the charter, said code would be normative. I don't mean by this that the charter said the code had to be normative but just that this conversation goes back to before the WG was formed. It was later discussed in the WG and resulted in WG consensus to have the code as normative.

Just as background on a few of the reasons that this was decided this way: many people felt that the spec would be much longer, and much harder to implement or review, if the text was normative. Code is a very compact representation of what happens on boundary edge conditions and eliminates many types of misinterpretations. I do understand normative code introduces other types of issues, but it is a fairly common practice in specifying codecs to use normative code.

Cullen
Codec Co-Chair