Re: Best practice for data encoding?
Harald Alvestrand wrote:
> Ted Faber wrote:
>> On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
>>> The problem with text is that you have to walk through memory and compare characters. A LOT.
>> That's not where your code spends its time. Run gprof(1). The majority of the time your code spends is spent doing the 2 integer divides per text-to-integer conversion and in strtoimax (called by fscanf). Multiplying or dividing is the worst thing you can do on a CPU in general.
> Note that CPUs are different; some multiply faster than others, compared to the rest of the HW. And if you really need to, you can optimize... a multiplication by 10, for instance, can be done by two left shifts and an addition (a*10 = (a<<3) + (a<<1)).

You might not end up with the same set of condition codes at the end, though...

Joe

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf
Re: Best practice for data encoding?
Ted Faber wrote:
> On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
>> The problem with text is that you have to walk through memory and compare characters. A LOT.
> That's not where your code spends its time. Run gprof(1). The majority of the time your code spends is spent doing the 2 integer divides per text-to-integer conversion and in strtoimax (called by fscanf). Multiplying or dividing is the worst thing you can do on a CPU in general.

Note that CPUs are different; some multiply faster than others, compared to the rest of the HW. And if you really need to, you can optimize... a multiplication by 10, for instance, can be done by two left shifts and an addition (a*10 = (a<<3) + (a<<1)).

I have no idea why strtoimax would do divisions, but I haven't written decimal-number parsers for a very long time; I think Knuth had 3 different ones in Seminumerical Algorithms, but that was a VERY long time ago...

Harald
Re: Best practice for data encoding?
Iljitsch van Beijnum wrote:
>> in your original question to the list, you didn't quite make clear that your question was with respect to BGP-style transfer of large-scale routing information.
> I didn't want to limit the scope of the discussion to one particular type of protocol.

Could you stop this abstract nonsense, then? Many old protocols designed for old, slow computers, such as telnet/ftp/smtp, do use text. Their replies are structured text, mostly with 3 digits. That is, properly designed text-based protocols are fine. Note that all you need is EBNF, not XML.

Masataka Ohta
Re: Best practice for data encoding?
On Tue, Jun 13, 2006 at 09:23:42AM +0200, Harald Alvestrand wrote:
> Ted Faber wrote:
>> Multiplying or dividing is the worst thing you can do on a CPU in general.
> Note that CPUs are different; some multiply faster than others, compared to the rest of the HW.

I am not a CPU designer, but my understanding is that the multiply instructions are usually the most complex single instructions, and in some sense determine the processor speed. (All you VAX guys can sit down; I know about your polynomial evaluation instructions, I'm just speaking generally.)

Trying to extrapolate great truths about data encoding from 10 lines of C will not be fruitful. Yeah, this is mostly off topic for networking, but as Joe Touch demonstrated, sometimes it matters: http://www.isi.edu/touch/pubs/sigcomm95.html

--
Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
Re: Best practice for data encoding?
On 8-jun-2006, at 11:13, Iljitsch van Beijnum wrote:
>> But it's a clear trade-off between efficient representation and complexity for the developer - we've had this debate once already in this thread, I think. The more complex the representation, the harder it is to code correctly and debug.
> Simple/complex isn't the same as text/binary. I'm pretty sure a programmer armed with nothing more than a standard issue C compiler and the basic libraries that come with it would have a harder time parsing XML than something like the DNS protocol. But my point was that the resulting code would be slower. I know it's very old-fashioned to even consider such things, but there are places where performance is important beyond just philosophical objections against bloat.

A little postscript to this discussion: I wrote a few small test programs in C to evaluate the performance of reading integers from a text file using stdio.h versus doing the same with direct read()s from a binary file. The difference is between two and three orders of magnitude. See http://ablog.apress.com/?p=1146
Re: Best practice for data encoding?
> A little postscript to this discussion: I wrote a few small test programs in C to evaluate the performance of reading integers from a text file using stdio.h versus doing the same with direct read()s from a binary file. The difference is between two and three orders of magnitude. See http://ablog.apress.com/?p=1146

Iljitsch, in your original question to the list, you didn't quite make clear that your question was with respect to BGP-style transfer of large-scale routing information. Right now, you seem to focus on decoding performance. How much of the CPU time spent for BGP is decoding? Does the CPU time spent for the entirety of BGP even matter*? If yes, can a good data structure/encoding help with the *overall* problem?

The results from your test programs are not at all surprising. Of course a hand-coded loop where all data already is in the right form (data type, byte order, number of bits), no decisions need to be made, and you even know the number of data items beforehand, is going to be faster than calling the generic, pretty much neglected, parameterized, tired library routine fscanf that doesn't get much use outside textbooks. (The read anomaly is caused by read(2) being an expensive system call; all other cases use a form of buffering to reduce the number of system calls.)

What this example shows nicely is that performance issues are non-trivial, and, yes, you do want to run measurements, but at the system level and not at the level of test cases that have little or no relationship to the performance of the real system. If you really care about the performance of text-based protocols, you cannot ignore modern tools like Ragel. If, having used them, you still manage to find the text processing overhead in your profiling data, I'd like to hear from you.

Still, for BGP, a binary protocol encoding may be a better fit, because routing tables are so much about bits and prefixes and other numeric information already designed to be used in binary protocol encodings. Also, it may be easier to reduce both data rate and processing by exploiting more of the structure of the BGP routing information. (I.e., to make it redundantly clear, I would probably choose binary here, but not for the reasons given in your blog post.)

Gruesse, Carsten

*) Yes, that's a trick question to elicit responses :-)
Re: Best practice for data encoding?
On 12-jun-2006, at 13:31, Carsten Bormann wrote:
> in your original question to the list, you didn't quite make clear that your question was with respect to BGP-style transfer of large-scale routing information.

I didn't want to limit the scope of the discussion to one particular type of protocol.

> Right now, you seem to focus on decoding performance. How much of the CPU time spent for BGP is decoding? Does the CPU time spent for the entirety of BGP even matter*? If yes, can a good data structure/encoding help with the *overall* problem?

I can't answer the first question, because the only BGP we have uses binary. But I'm pretty confident that doing the same thing using a text-based encoding isn't going to do us any favors performance-wise.

> The results from your test programs are not at all surprising. Of course, a hand-coded loop where all data already is in the right form (data type, byte order, number of bits), no decisions need to be made, and you even know the number of data items beforehand, is going to be faster than calling the generic, pretty much neglected, parameterized, tired library routine fscanf that doesn't get much use outside textbooks.

Byte order stuff and such isn't much of an issue compared to the time required for memory access. And I guess fscanf could be a slow implementation, but this is just reading a value from a line of text, no hunting for tags and such as required in HTML or XML. Also, the performance gap is just so huge, I don't think the details matter too much.

> What this example shows nicely is that performance issues are non-trivial, and, yes, you do want to run measurements, but at the system level and not at the level of test cases that have little or no relationship to the performance of the real system.

Sure, but how are you going to do that kind of testing when designing a protocol? Creating two implementations just to see which variation is faster would be a good idea, but I don't really see that happening...

> If you really care about the performance of text-based protocols, you cannot ignore modern tools like Ragel.

Don't know it.

> If, having used them, you still manage to find the text processing overhead in your profiling data, I'd like to hear from you.

The problem with text is that you have to walk through memory and compare characters. A LOT. This is pretty much the worst thing you can do to a modern CPU: you don't use the logical and hardly the physical word width, and all those compares are hard to predict, so you get massive numbers of incorrectly predicted branches. But I guess this discussion can go on forever...
Re: Best practice for data encoding?
On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
> The problem with text is that you have to walk through memory and compare characters. A LOT.

That's not where your code spends its time. Run gprof(1). The majority of the time your code spends is spent doing the 2 integer divides per text-to-integer conversion and in strtoimax (called by fscanf). Multiplying or dividing is the worst thing you can do on a CPU in general.

--
Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
Re: Best practice for data encoding?
So how about concluding that there is no single right answer to Iljitsch's question, but there may be scope for defining considerations for the choice of data encoding?

Brian
Re: Best practice for data encoding?
On Wed Jun 7 23:19:29 2006, Iljitsch van Beijnum wrote:
>> On 7-jun-2006, at 22:38, Dave Cridland wrote:
>>> I think it's worth noting that nobody is preventing you from using XML over a compressed channel, which tends to increase XML's efficiency rather sharply. [...] Wire efficiency, for the most part, needs to take place by avoiding the transmission of information entirely, rather than trying to second-guess the compression.
>> Obviously adding compression doesn't help processing efficiency.

It does if you're doing encryption, unless the protocol itself is almost entirely uncompressible. Compression is a lot cheaper than encryption, and the more you compress, the less you have to encrypt.

> I've long harbored the suspicion that a large percentage of the cycles in today's fast CPUs are burned up parsing various types of text. Does anyone know of any processing efficiency comparisons between binary and text-based protocols?

But it's a clear trade-off between efficient representation and complexity for the developer - we've had this debate once already in this thread, I think. The more complex the representation, the harder it is to code correctly and debug. I'm not saying there's no place for binary protocols, but I am saying that to go for a binary representation, you have to be working in a highly resource-constrained environment where both bandwidth and CPU usage are highly limited.

>> As an example, IMAP and ACAP streams compress by around 70% on my client - and that's trying to be bandwidth efficient in its protocol usage. I've seen figures of 85% talked about quite seriously, too.
> And you think that's a good thing?

No, it demonstrates that even protocols which have a reputation for chatter actually turn out to be efficient on the wire post-compression.

> Now I understand why it takes me up to 10 minutes to download a handful of 1 - 2 kbyte emails with IMAP (+SSL) over a 9600 bps GSM data connection.

I can write an efficient IMAP client, and give it away for free, but I can't force you to use it. :-) I can read a new message inside a minute over 9600bps with 1 second latency, without even using a local disk cache - including fetching configuration over ACAP to begin with. My INBOX contains 24,033 messages currently - obviously I'm not downloading them all, and this is emulated bandwidth and latency, thus not entirely accurate. IMAP can be extremely efficient; if it's taking you 10 minutes to read a handful of emails, you're using the wrong client, the wrong server, or quite possibly both. With Lemonade features coming into mainstream implementations, you really ought to be looking at your systems if you're using GSM data. Compression is not really one of these, oddly enough, but TLS deflate ought to be available to most clients and servers by now anyway.

Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED] - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Re: Best practice for data encoding?
On 8-jun-2006, at 9:50, Dave Cridland wrote:
>> I've long harbored the suspicion that a large percentage of the cycles in today's fast CPUs are burned up parsing various types of text. Does anyone know of any processing efficiency comparisons between binary and text-based protocols?
> But it's a clear trade-off between efficient representation and complexity for the developer - we've had this debate once already in this thread, I think. The more complex the representation, the harder it is to code correctly and debug.

Simple/complex isn't the same as text/binary. I'm pretty sure a programmer armed with nothing more than a standard issue C compiler and the basic libraries that come with it would have a harder time parsing XML than something like the DNS protocol. But my point was that the resulting code would be slower. I know it's very old-fashioned to even consider such things, but there are places where performance is important beyond just philosophical objections against bloat.

>> Now I understand why it takes me up to 10 minutes to download a handful of 1 - 2 kbyte emails with IMAP (+SSL) over a 9600 bps GSM data connection.
> I can write an efficient IMAP client, and give it away for free, but I can't force you to use it. :-)

It's not that I'm a huge fan of Apple's Mail, but it's the only GUI mail application I've found so far that I can actually work with without the irresistible urge to chew through the mouse cable and dust off my VT420 terminal... (Which allows me to read my mail with Pine without any trouble over 9600 bps.)
Schema languages for XML (Was: Best practice for data encoding?
On Tue, Jun 06, 2006 at 09:50:22AM -0700, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote a message of 42 lines which said:
> At this point XML is not a bad choice for data encoding.

+1

> The problem in XML is that XML Schema was botched and in particular namespaces and composition are botched. I think this could be fixed, perhaps.

There are other schema languages than the bloated W3C Schema. The most common is RelaxNG (http://www.relaxng.org/). In the IETF land, while RFC 3730 and 3981 unfortunately use W3C Schema, RFC 4287 uses RelaxNG.
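For flavour, RELAX NG's compact syntax is considerably terser than W3C XML Schema. A hypothetical schema for a minimal Atom-like entry might look like this (illustrative only; this is not the actual RFC 4287 schema, and the element names are made up for the example):

```
# Hypothetical RELAX NG compact schema for a minimal entry element.
element entry {
  element title { text },
  element id { text },
  element updated { text }
}
```

The same constraints in W3C Schema would take several times as many lines of XML, which is much of what the "bloated" complaint above is about.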
Re: Best practice for data encoding?
On Mon, Jun 05, 2006 at 08:21:29PM -0400, Steven M. Bellovin wrote:
> More precisely -- when something is sufficiently complex, it's inherently bug-prone. That is indeed a good reason to push back on a design. The question to ask is whether the *problem* is inherently complex -- when the complexity of the solution significantly exceeds the inherent complexity of the problem, you've probably made a mistake. When the problem itself is sufficiently complex, it's fair to ask if it should be solved. Remember point (3) of RFC 1925.

One of the complaints about ASN.1 is that it is an enabler of complexity. It becomes easy to specify some arbitrarily complex data structures, and very often with that complexity comes all sorts of problems. To be fair, though, the same thing can be said of XML (and I'm not a big fan of a number of specifications utilizing XML either, for the same reason), and ultimately, you can write Fortran in any language.

Ultimately, I have to agree with Steve: it's not the encoding, it's going to be about asking the right questions at the design phase more than anything else, at least as far as the protocol is concerned. As far as implementation issues, it is true that ASN.1 doesn't map particularly well to standard programming types commonly in use, although if you constrain the types used in the protocol that issue can be relatively easily avoided (as would using XML, or a simpler ad-hoc encoding scheme).

I don't think there is any right answer here, although my personal feelings about ASN.1 can still be summed up in the aphorism, "Friends don't let friends use ASN.1," but that's mostly due to psychic scars caused by my having to deal with the Kerberos v5 protocol's use of ASN.1, and the feature and design bloat that resulted from it.

- Ted
RE: Schema languages for XML (Was: Best practice for data encoding?
I'll concur wrt the generality, flexibility, and power of XML as a data encoding. Considering comments on the ancestor thread, though, I'll also observe that the generality and flexibility are Not Your Friends if situations require encodings to be distinguished. The processing rules in X.690 that define DER relative to BER are expressed there within three pages (admittedly, excluding the cross-ref to X.680 for tag ordering); even though they may imply underlying complexity in implementation, their complexity in specification and concept seems vastly simpler than the issues that arise with XML canonicalization.

--jl
RE: Schema languages for XML (Was: Best practice for data encoding?
I would suggest that the problems with canonicalization in both cases stem from the fact that it was an afterthought. The original description of DER was a single paragraph. If ISO required implementations before a standard was agreed, I don't think it would have passed in that form; they would have used the indefinite-length encoding for structures.

XML canonicalization went sour after people started to use namespace-prefixed data. This caused the namespace scheme, which is poorly designed, to cross over into the data space.

I would suggest that the best approach for data encoding today would be to make use of the XML Infoset but think twice about using the XML encoding.
Re: Best practice for data encoding?
Theodore Tso wrote:
> [...] I don't think there is any right answer here, although my personal feelings about ASN.1 can still be summed up in the aphorism, "Friends don't let friends use ASN.1," but that's mostly due to psychic scars caused by my having to deal with the Kerberos v5 protocol's use of ASN.1, and the feature and design bloat that resulted from it.

Here are my down-in-the-trenches observations about XML and, to some degree, ASN.1. XML gives me the ability to djinn up a scheme really quickly, with as much or as little elegance as needed.
If nothing else, XML quite rapidly allows me to hack up interpreters for what would otherwise have been another parsed-text-file casualty residing in /etc. And most likely, if they could read the previous text encoding, the XML is about as transparent. This is a very nice feature, as it allows common parsers, leads to common practices about how to lay out simple things, and common understanding about what is right and wrong. So, from my standpoint XML is "no more ad hoc parsers!", which is good from a complexity standpoint, especially when you look at how spare the base syntax is.

That said, for anything that requires nested structure, etc., XML is probably just as inadequate as anything else. And who should be surprised? Don't blame the Legos that they self-assemble into a rocket ship. One big plus about XML is that I can just _read_ it and hack up pretty printers in trivial time. ASN.1, being abstract, necessarily needs to go through a translation layer, which IMO is never as fun and convenient -- and absolutely discourages dilettante hacking (when is the last time you fiddled with an XML file vs. the last time you fiddled with ASN.1? never, for the latter?).

I guess that part of what this devolves into is who we're writing these protocols/schemes for: machines or people? That, I think, is a huge false dilemma. We clearly are writing things for _both_ (the executors and the maintainers); the only question in my mind is whether an easy-for-humans-to-maintain encoding is too inefficient on the wire for its task. In some cases it clearly is, but those cases are becoming the outliers -- especially at app layer -- as the march of memory and bandwidth plods on.

So with all of these wars, the sexy overblown hype (yes, of course XML can solve world hunger!) often eclipses the routine and mundane tasks of writing and maintaining a system (golly, I need a config file for this, it's been a while since I wrote a really crappy parser. woo woo!). C'est la vie.
Mike
Re: Best practice for data encoding?
On Wed Jun 7 15:37:28 2006, Michael Thomas wrote:
> I guess that part of what this devolves into is who we're writing these protocols/schemes for: machines or people? That, I think, is a huge false dilemma. We clearly are writing things for _both_ (the executors and the maintainers); the only question in my mind is whether an easy-for-humans-to-maintain encoding is too inefficient on the wire for its task. In some cases it clearly is, but those cases are becoming the outliers -- especially at app layer -- as the march of memory and bandwidth plods on.

I think it's worth noting that nobody is preventing you from using XML over a compressed channel, which tends to increase XML's efficiency rather sharply. Compression also tends to make you look differently at protocol issues, because the repetitive, inefficient protocol forms often compress equally well to, or even better than, a better structure - and they're also usually easier to handle.

Wire efficiency, for the most part, needs to take place by avoiding the transmission of information entirely, rather than trying to second-guess the compression. As a rule, if you're moving strings around, or eliminating duplicates, within a single send of your protocol, you're probably wasting your time. In general, you want to be avoiding sending information at all. The only reason you have for worrying about representation for wire efficiency is if resource shortage prevents you from compression entirely - bear in mind this implies that resource shortage has already prevented you from encryption - generally a bad thing.

As an example, IMAP and ACAP streams compress by around 70% on my client - and that's trying to be bandwidth efficient in its protocol usage. I've seen figures of 85% talked about quite seriously, too. So in answer to the original question, I'd say that the current best practice for data encoding has to be RFC 1952, Deflate - beyond that, it really doesn't matter all that much.
As an aside, whilst XMPP does provide a pure XML protocol, most protocols use XML as a payload, and use other forms to exchange protocol messages.

Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED] - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Re: Best practice for data encoding?
On 7-jun-2006, at 22:38, Dave Cridland wrote:
> I think it's worth noting that nobody is preventing you from using XML over a compressed channel, which tends to increase XML's efficiency rather sharply. [...] Wire efficiency, for the most part, needs to take place by avoiding the transmission of information entirely, rather than trying to second-guess the compression.

Obviously adding compression doesn't help processing efficiency. I've long harbored the suspicion that a large percentage of the cycles in today's fast CPUs are burned up parsing various types of text. Does anyone know of any processing efficiency comparisons between binary and text-based protocols?

> As an example, IMAP and ACAP streams compress by around 70% on my client - and that's trying to be bandwidth efficient in its protocol usage. I've seen figures of 85% talked about quite seriously, too.

And you think that's a good thing? Now I understand why it takes me up to 10 minutes to download a handful of 1 - 2 kbyte emails with IMAP (+SSL) over a 9600 bps GSM data connection.
Re: Best practice for data encoding?
Iljitsch van Beijnum wrote:
> I was wondering: What is considered best practice for encoding data in protocols within the IETF's purview?

One should always think about what one needs and choose the appropriate solution to the task. Of course, sometimes it's hard to take into account what level of performance one would need out of a protocol implementation. RAM is considerably cheaper now than it was twenty years ago, and so one approach in protocol design would be to define multiple encodings as they are required. So, if you don't think performance is crucial but toolset reuse is, then for an RPC-based approach perhaps XML is a good start, and if you need to optimize later, perhaps consider something more compact like XDR.

As to whether ASN.1 was a good choice or a bad choice for SNMP, there never was an argument. It was THE ONLY CHOICE. All three protocols (CMIP, SGMP, HEMP) under consideration made use of it. Nobody seriously considered anything else due to the practical limits of the time.

Is it still a reasonable approach? I think a strong argument could be made that some sort of textual representation is necessary in order to satisfy more casual uses and to accommodate tool sets that are more broadly utilized, but that doesn't mean that we should do away with ASN.1, archaic as it may seem.

Eliot
RE: Best practice for data encoding?
On Mon, 5 Jun 2006, David Harrington wrote: CERT Advisory CA-2001-18 Multiple Vulnerabilities in Several Implementations of the Lightweight Directory Access Protocol (LDAP) Vulnerability Note VU#428230 Multiple vulnerabilities in S/MIME implementations Oh yes, I forgot those were ASN.1 too. Tony. -- f.a.n.finch [EMAIL PROTECTED] http://dotat.at/ FORTIES CROMARTY FORTH TYNE DOGGER: VARIABLE 3 OR 4. MAINLY FAIR. MODERATE OR GOOD.
Re: Best practice for data encoding?
On Mon, 5 Jun 2006, Steven M. Bellovin wrote: On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn [EMAIL PROTECTED] wrote: I'm curious, too, about the claim that this has resulted in security problems. Could someone elaborate? See http://www.cert.org/advisories/CA-2002-03.html ASN.1 implementation bugs have also caused security problems for SSL, Kerberos, ISAKMP, and probably others. These bugs are also not due to shared code history: they turn up again and again. Are there any other binary protocols that can be usefully compared with ASN.1's security history? Tony. -- f.a.n.finch [EMAIL PROTECTED] http://dotat.at/ THE MULL OF GALLOWAY TO MULL OF KINTYRE INCLUDING THE FIRTH OF CLYDE AND THE NORTH CHANNEL: VARIABLE 2 OR 3 WITH AFTERNOON ONSHORE SEA BREEZES. FAIR VISIBILITY: MODERATE OR GOOD WITH MIST OR FOG PATCHES SEA STATE: SMOOTH OR SLIGHT. ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Best practice for data encoding?
From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] More precisely -- when something is sufficiently complex, it's inherently bug-prone. That is indeed a good reason to push back on a design. The question to ask is whether the *problem* is inherently complex -- when the complexity of the solution significantly exceeds the inherent complexity of the problem, you've probably made a mistake. When the problem itself is sufficiently complex, it's fair to ask if it should be solved. Remember point (3) of RFC 1925. I think that the term 'too complex' is probably meaningless and is in any case an inaccurate explanation for the miseries of ASN.1, which are rather different from the ones normally given. People push back on protocols all the time for a range of reasons. Too complex is a typically vague and unhelpful pushback. I note that all too often the complexity of deployed protocols is the result of efforts to reduce the complexity of the system to the point where it was insufficient for the intended task. Having had Tony Hoare as my college tutor at Oxford I have experienced a particularly uncompromising approach to complexity. However, the point Hoare makes repeatedly is as simple as possible but no simpler. In the case of ASN.1 I think the real problem is not the 'complexity' of the encoding, it's the mismatch between the encoding used and the data types supported in the languages that are used to implement ASN.1 systems. DER encoding is most certainly a painful disaster, and it is completely unnecessary in X.509, as demonstrated empirically by the fact that the Internet worked just fine without anyone noticing (ok, one person noticed) in the days when CAs issued BER encoded certs. The real pain in ASN.1 comes from having to deal with piles of unnecessary serialization/deserialization code. The real power of S-Expressions is not the simplicity of the S-Expression.
Dealing with large structures in S-Expressions is a tedious pain, to put it mildly. The code to deal with serialization/deserialization is avoided because the data structures are introspective (at least in Symbolics LISP, which is the only one I ever used). If ASN.1 had been done right it would have been possible to generate the serialization/deserialization code automatically from native data structures, in the way that .NET allows XML serialization classes to be generated automatically. Unfortunately ASN.1 went into committee as a good idea and came out a camel. And all of the attempts to remove the hump since have merely created new humps. At this point XML is not a bad choice for data encoding. I would like to see the baroque SGML legacy abandoned (in particular, eliminate DTDs entirely). XML is not a perfect choice but it is not a bad one, and done right it can be efficient. The problem in XML is that XML Schema was botched, and in particular namespaces and composition are botched. I think this could be fixed, perhaps.
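The introspection-driven serialization being described can be sketched in a few lines. This is a hypothetical illustration only: the Certificate type and its fields are invented, and JSON stands in for the XML serializers mentioned; the point is that the encoder is derived from the native data structure rather than hand-written.

```python
import dataclasses
import json

@dataclasses.dataclass
class Certificate:          # invented example type, not a real X.509 model
    serial: int
    issuer: str
    not_after: str

# The wire encoder/decoder is generated by introspecting the native
# type, so no hand-written serialization layer exists to get wrong.
cert = Certificate(serial=42, issuer="Example CA", not_after="2007-01-01")
wire = json.dumps(dataclasses.asdict(cert))
restored = Certificate(**json.loads(wire))
```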
Re: Best practice for data encoding?
On Tue, 6 Jun 2006 09:50:22 -0700, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: Having had Tony Hoare as my college tutor at Oxford I have experienced a particularly uncompromising approach to complexity. However the point Hoare makes repeatedly is as simple as possible but no simpler. Hoare has been a great influence on my thinking, too. I particularly recall his Turing Award lecture, where he noted: There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. (In that same lecture, he warned of security issues from not checking array bounds at run-time, but that's a separate rant.) --Steven M. Bellovin, http://www.cs.columbia.edu/~smb ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Best practice for data encoding?
ASN.1 implementation bugs have also caused security problems for SSL, Kerberos, ISAKMP, and probably others. These bugs are also not due to shared code history: they turn up again and again. Are there any other binary protocols that can be usefully compared with ASN.1's security history? There is indeed a lot of complexity in ASN.1. At the root, ASN.1 is a basic T-L-V encoding format, similar to what we see in multiple IETF protocols. However, for various reasons, ASN.1 includes a number of encoding choices, each of which is an opportunity for programming errors:

* In most TLV applications, the type field is a simple number varying from 0 to 254, with the number 255 reserved for extension. In ASN.1, the type field is structured as a combination of scope and number, and the number itself can be encoded on a variable number of bytes.
* In most TLV applications, the length field is a simple number. In ASN.1, the length field is variable length.
* In most TLV applications, structures are delineated by the length field. In ASN.1, structures can be delineated either by the length field or by an end of structure mark.
* In most TLV applications, a string is encoded as just a string of bytes. In ASN.1, it can be encoded either that way, or as a sequence of chunks, which conceivably could themselves be encoded as chunks.
* Most applications tolerate some variations in component ordering and deal with optional components, but ASN.1 pushes that to an art form.
* I don't remember exactly how many alphabet sets ASN.1 supports, but it is way more than your average application.
* Most applications encode integer values by reference to classic computer encodings, e.g. signed/unsigned char, short, long, long-long. ASN.1 introduces its own encoding, which is variable length.
* One can argue that SNMP makes a creative use of the Object Identifier data type of ASN.1, but one also has to wonder why this data type is specified in the language in the first place.

Then there are MACRO definitions, VALUE specifications, and an even more complex definition of extension capabilities. In short, ASN.1 is vastly more complex than the average TLV encoding. The higher rate of errors is thus not entirely surprising. -- Christian Huitema
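The variable-length length field mentioned above can be made concrete. This is a minimal sketch of a BER length-field decoder: even this single field needs three cases (short form, long form, indefinite form), each a separate opportunity for the kind of error Huitema describes.

```python
def read_ber_length(buf, i):
    """Decode a BER length field starting at buf[i].
    Returns (length, next_index); length is None for the indefinite
    form, which is terminated later by an end-of-contents marker."""
    first = buf[i]
    if first < 0x80:                      # short form: one byte
        return first, i + 1
    if first == 0x80:                     # indefinite form
        return None, i + 1
    n = first & 0x7F                      # long form: next n bytes hold the length
    if n == 0x7F or i + 1 + n > len(buf):
        raise ValueError("reserved or truncated length field")
    length = int.from_bytes(buf[i + 1:i + 1 + n], "big")
    return length, i + 1 + n
```

A fixed-width length field, by contrast, is a single unconditional read; the branching here is exactly where truncation and overflow bugs hide.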
RE: Best practice for data encoding?
On Tuesday, June 06, 2006 10:33:30 AM -0700 Christian Huitema [EMAIL PROTECTED] wrote: ASN.1 implementation bugs have also caused security problems for SSL, Kerberos, ISAKMP, and probably others. These bugs are also not due to shared code history: they turn up again and again. Are there any other binary protocols that can be usefully compared with ASN.1's security history? There is indeed a lot of complexity in ASN.1. At the root, ASN.1 is a basic T-L-V encoding format, similar to what we see in multiple IETF protocols. However, for various reasons, ASN.1 includes a number of encoding choices that are as many occasions for programming errors: To be pedantic, ASN.1 is what its name says it is - a notation. The properties you go on to describe are those of BER; other encodings have other properties. For example, DER adds constraints such that there are no longer multiple ways to encode the same thing. Besides simplifying implementations, this also makes it possible to compare cryptographic hashes of DER-encoded data; X.509 and Kerberos both take advantage of this property. PER eliminates many of the tags and lengths, and my understanding is that there is a set of rules for encoding ASN.1 data in XML. * One can argue that SNMP makes a creative use of the Object Identifier data type of ASN.1, but one also has to wonder why this data type is specified in the language in the first place. Well, I can't speak to the original motivation, but under BER, encoding the same sort of hierarchical name as a SEQUENCE OF INTEGER takes about three times the space the primitive type does, assuming most of the values are small. Then there are MACRO definitions, VALUE specifications, and an even more complex definition of extension capabilities. In short, ASN.1 is vastly more complex than the average TLV encoding. The higher rate of errors is thus not entirely surprising.
There certainly is a rich set of features (read: complexity) in both the ASN.1 syntax and its commonly-used encodings. However, I don't think that's the real source of the problem. There seem to be a lot of ad-hoc ASN.1 decoders out there that people have written as part of some other protocol, instead of using an off-the-shelf compiler/encoder/decoder; this duplication of effort and code is bound to lead to errors, especially when it is done with insufficient attention to the details of what is indeed a fairly complex encoding. I also suspect that a number of the problems found have nothing to do with decoding ASN.1 specifically, and would have come up had other approaches been used. For example, several of the problems cited earlier were buffer overflows found in code written well before the true impact of that problem was well understood. These problems are more likely to be noticed and/or create vulnerabilities when they occur in things like ASN.1 decoders, or XDR decoders, or XML parsers, because that code tends to deal directly with untrusted input. -- Jeffrey T. Hutzelman (N3NHS) [EMAIL PROTECTED] Sr. Research Systems Programmer School of Computer Science - Research Computing Facility Carnegie Mellon University - Pittsburgh, PA ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
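The space saving from the primitive OID type comes from its content encoding: the first two arcs share one octet, and each later arc is base-128 with a continuation bit. A minimal sketch of the content-octet encoding (tag and length omitted; error handling for arcs that don't fit the first-octet rule is also omitted):

```python
def encode_oid(arcs):
    """Encode an OBJECT IDENTIFIER's content octets (BER/DER rules).
    The first two arcs share one byte (40*arc1 + arc2); each remaining
    arc is base-128, high bit set on every byte except the last."""
    out = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        out.extend(reversed(chunk))
    return bytes(out)
```

Compare with a SEQUENCE OF INTEGER, where every small arc would cost a tag byte and a length byte on top of its value byte, roughly the threefold blow-up mentioned above.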
RE: Best practice for data encoding?
From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] Having had Tony Hoare as my college tutor at Oxford I have experienced a particularly uncompromising approach to complexity. However the point Hoare makes repeatedly is as simple as possible but no simpler. Hoare has been a great influence on my thinking, too. I particularly recall his Turing Award lecture, where he noted: There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. (In that same lecture, he warned of security issues from not checking array bounds at run-time, but that's a separate rant.) I think it is a useful illustration of my point. Dennis Ritchie: Bounds checking is too complex to put in the runtime library. Tony Hoare: Bounds checking is too complex to attempt to perform by hand. I think that time has proved Hoare and Algol 60 right on this point. It is much better to have a single point of control in a system and a single place where checking can take place than to make it the responsibility of the programmer to hand code checking throughout their code. Equally the idea of unifying control and discovery information in the DNS may sound complex but the result has the potential to be considerably simpler than the numerous ad hoc management schemes that have grown up as a result of the lack of a coherent infrastructure.
RE: Best practice for data encoding?
From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED] It's a subset, in fact. All DER is valid BER. It is an illogical subset defined in a throwaway comment in an obscure part of the spec. A subset is not necessarily a reduction in complexity. Let us imagine that we have a spec that allows you to choose between three modes of transport to get to school: walk, bicycle or unicycle. The unicycle option does not create any real difficulty for you since you simply ignore it and use one of the sensible options. And it is no more complex to support since a bicycle track can also be used by unicyclists. Now the same deranged loons who wrote the DER encoding decide that your Distinguished transport option is going to be unicycle, that is all you are going to be allowed to do. Suddenly the option which you could ignore as illogical and irrelevant has become an obligation. And that is what DER encoding does. Since you don't appear to have coded DER encoding I suggest you try it before further pontification. If you have coded it and don't understand how so many people get it wrong then you are beyond hope. BTW it's not just the use of definite length tags, there is also a requirement to sort the content of sets which is a real fun thing to do. Particularly when the spec fails to explain what is actually to be sorted.
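The set-sorting requirement can be sketched as follows. Under DER, what gets sorted is the element *encodings*, compared as unsigned octet strings. This is an illustrative sketch that assumes the total content fits in a short-form length (under 128 bytes):

```python
def der_set_of(encoded_elements):
    """DER SET OF: concatenate the element encodings in ascending
    order, comparing the complete encodings as octet strings.
    Assumes short-form length (total content < 128 bytes)."""
    content = b"".join(sorted(encoded_elements))
    assert len(content) < 0x80, "long-form length not handled in this sketch"
    return bytes([0x31, len(content)]) + content   # 0x31 = SET, constructed
```

Note the sketch sorts already-encoded elements; the ambiguity complained about above is whether one sorts encodings, values, or something else, and that is exactly where implementations diverge.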
RE: Best practice for data encoding?
On Tuesday, June 06, 2006 11:55:15 AM -0700 Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED] To be pedantic, ASN.1 is what its name says it is - a notation. The properties you go on to describe are those of BER; other encodings have other properties. For example, DER adds constraints such that there are no longer multiple ways to encode the same thing. Besides simplifying implementations, Hate to bust your bubble here but DER encoding is vastly more complex than any other encoding. It is certainly not simpler than the BER encoding. It's a subset, in fact. All DER is valid BER.
RE: Best practice for data encoding?
From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED] To be pedantic, ASN.1 is what its name says it is - a notation. The properties you go on to describe are those of BER; other encodings have other properties. For example, DER adds constraints such that there are no longer multiple ways to encode the same thing. Besides simplifying implementations, Hate to bust your bubble here but DER encoding is vastly more complex than any other encoding. It is certainly not simpler than the BER encoding. The reason for this is that in DER encoding each chunk of data is encoded using the definite length encoding in which each data structure is preceded by a length descriptor. In addition to being much more troublesome to decode than a simple end of structure marker such as ), }, or / it is considerably more complex to code because the length descriptor is itself a variable length integer. The upshot of this is that it is impossible to write an LR(1) encoder for DER encoding. In order to encode the structure you have to recursively size each substructure before the first byte of the enclosing structure can be emitted. this also makes it possible to compare cryptographic hashes of DER-encoded data; X.509 and Kerberos both take advantage of this property. I am not aware of any X.509 system that relies on this property. If there is such a system they certainly are not making use of the ability to reduce a DER encoded structure to X.500 data and reassemble it. Almost none of the PKIX applications have done this properly until recently. X.509 certs are exchanged as opaque binary blobs by all rational applications. Then there are MACRO definitions, VALUE specifications, and an even more complex definition of extension capabilities. In short, ASN.1 is vastly more complex than the average TLV encoding. The higher rate of errors is thus not entirely surprising. There certainly is a rich set of features (read: complexity) in both the ASN.1 syntax and its commonly-used encodings.
However, I don't think that's the real source of the problem. There seem to be a lot of ad-hoc ASN.1 decoders out there that people have written as part of some other protocol, instead of using an off-the-shelf compiler/encoder/decoder; That's because most of the off the shelf compiler/encoders have historically been trash. Where do you think all the bungled DER implementations came from? I also suspect that a number of the problems found have nothing to do with decoding ASN.1 specifically, and would have come up had other approaches been used. For example, several of the problems cited earlier were buffer overflows found in code written well before the true impact of that problem was well understood. Before the 1960s? I very much doubt it. ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
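The recursive-sizing problem described above can be made concrete. In this sketch (a toy, not a full DER implementation), a constructed value cannot emit its first byte until every child has been fully encoded, which is exactly why a one-pass streaming encoder is impossible:

```python
def der_len(n):
    """Definite-length length octets, shortest form (as DER requires)."""
    if n < 0x80:
        return bytes([n])                      # short form
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body    # long form

def der_encode(node):
    """node is (tag, bytes) for a primitive value, or (tag, [children])
    for a constructed one.  Note the recursion: every child must be
    fully encoded (hence fully sized) before the parent's header
    can be emitted."""
    tag, content = node
    if isinstance(content, list):
        content = b"".join(der_encode(child) for child in content)
    return bytes([tag]) + der_len(len(content)) + content
```

With indefinite-length (end-of-contents) encoding, by contrast, the encoder could emit the opening bytes immediately and stream the children as they arrive.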
Re: Best practice for data encoding?
On 6/6/06, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: At this point XML is not a bad choice for data encoding. I would like to see the baroque SGML legacy abandoned (in particular eliminate DTDs entirely). XML is not a perfect choice but it is not a bad one and done right can be efficient. JSON http://www.json.org seems like a better fit for the use cases discussed here. You get better data types, retain convenient ASCII notation for unicode characters, and lose lots of XML baggage. draft-crockford-jsonorg-json-04.txt is in the RFC queue, headed for informational status. -- Robert Sayre
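For comparison, a structured record in compact JSON looks like this (the record and its field names are invented for illustration):

```python
import json

# An invented record: typed values (numbers, strings, arrays) come
# through without the tag/attribute machinery XML would require.
record = {"sensor": 7, "flags": ["seen", "urgent"], "reading": 21.5}

# separators=(",", ":") drops the default whitespace for a compact wire form.
wire = json.dumps(record, separators=(",", ":"))
parsed = json.loads(wire)
```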
Re: Best practice for data encoding?
Some ASN.1 compilers have had some bugs; however, this does not indicate that ASN.1 is bug prone. Just the opposite: Once you have a secure compiler, you can be assured that certain kinds of bugs don't exist. Further, in the few cases of the bugs that were found, once the bug is fixed in the ASN.1 compiler, the application just needs to be relinked (or given new shared library) with the new generated runtime. And any other application which used a vulnerable runtime, but for which the vulnerability was unknown, is also fixed. So, users of compiled runtime benefit from usage experience by the entire group. Building tools that make trustable runtimes is a good approach to certain classes of security problems. You can't get this by hand written protocol encode/decode layers. --Dean On Mon, 5 Jun 2006, Iljitsch van Beijnum wrote: I was wondering: What is considered best practice for encoding data in protocols within the IETF's purview? Traditionally, many protocols use text but obviously this doesn't really work for protocols that carry a lot of data, because text lacks structure so it's hard to parse. XML and the like are text- based and structured, but take huge amounts of code and processing time to parse (especially on embedded CPUs that lack the more advanced branch prediction available in the fastest desktop and server CPUs). Then there is the ASN.1 route, but as we can see with SNMP, this also requires lots of code and is very (security) bug prone. Many protocols use hand crafted binary formats, which has the advantage that the format can be tailored to the application but it requires custom code for every protocol and it's hard to get right, especially the simplicity/extendability tradeoff. The ideal way to encode data would be a standard that requires relatively little code to implement, makes for small files/packets that are fast to process but remains reasonably extensible. So, any thoughts? Binary XML, maybe?
-- Av8 Internet Prepared to pay a premium for better service? www.av8.net faster, more reliable, better service 617 344 9000
Best practice for data encoding?
I was wondering: What is considered best practice for encoding data in protocols within the IETF's purview? Traditionally, many protocols use text but obviously this doesn't really work for protocols that carry a lot of data, because text lacks structure so it's hard to parse. XML and the like are text- based and structured, but take huge amounts of code and processing time to parse (especially on embedded CPUs that lack the more advanced branch prediction available in the fastest desktop and server CPUs). Then there is the ASN.1 route, but as we can see with SNMP, this also requires lots of code and is very (security) bug prone. Many protocols use hand crafted binary formats, which has the advantage that the format can be tailored to the application but it requires custom code for every protocol and it's hard to get right, especially the simplicity/extendability tradeoff. The ideal way to encode data would be a standard that requires relatively little code to implement, makes for small files/packets that are fast to process but remains reasonably extensible. So, any thoughts? Binary XML, maybe? ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
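The "hand crafted binary format" option mentioned above usually amounts to a fixed-layout header plus payload. A minimal sketch, with an entirely hypothetical field layout, using Python's struct module:

```python
import struct

# Hypothetical fixed-layout header: version (u8), message type (u8),
# payload length (u16, network byte order), followed by the payload.
HEADER = struct.Struct("!BBH")

def pack_msg(version, msg_type, payload):
    return HEADER.pack(version, msg_type, len(payload)) + payload

def unpack_msg(buf):
    version, msg_type, length = HEADER.unpack_from(buf)
    return version, msg_type, buf[HEADER.size:HEADER.size + length]
```

This is fast and tiny to parse, but it illustrates the tradeoff in the question: extensibility (new fields, bigger lengths) has to be designed in by hand, per protocol.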
Re: Best practice for data encoding?
On Jun 05 2006, at 23:43, Iljitsch van Beijnum wrote: What is considered best practice for encoding data in protocols within the IETF's purview? The best practice is to choose an encoding that is appropriate for the protocol being designed. (There is no single answer.) Maybe you can be more specific in your question, then maybe people can be more specific in their answers? Regards, Carsten
Re: Best practice for data encoding?
Hi - From: Iljitsch van Beijnum [EMAIL PROTECTED] To: IETF Discussion ietf@ietf.org Sent: Monday, June 05, 2006 2:43 PM Subject: Best practice for data encoding? ... Then there is the ASN.1 route, but as we can see with SNMP, this also requires lots of code and is very (security) bug prone. ... Having worked on SNMP toolkits for a long time, I'd have to strenuously disagree. In my experience, the ASN.1/BER-related code is a rather small portion of an SNMP protocol engine. The code related to the SNMP protocol's quirks, such as Get-Next/Bulk processing and the mangling of index values into object identifiers (which is far removed from how ASN.1 intended object identifiers to be used) require much more code and complexity. I'm curious, too, about the claim that this has resulted in security problems. Could someone elaborate? Randy ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: Best practice for data encoding?
On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn [EMAIL PROTECTED] wrote: Hi - From: Iljitsch van Beijnum [EMAIL PROTECTED] To: IETF Discussion ietf@ietf.org Sent: Monday, June 05, 2006 2:43 PM Subject: Best practice for data encoding? ... Then there is the ASN.1 route, but as we can see with SNMP, this also requires lots of code and is very (security) bug prone. ... Having worked on SNMP toolkits for a long time, I'd have to strenuously disagree. In my experience, the ASN.1/BER-related code is a rather small portion of an SNMP protocol engine. The code related to the SNMP protocol's quirks, such as Get-Next/Bulk processing and the mangling of index values into object identifiers (which is far removed from how ASN.1 intended object identifiers to be used) require much more code and complexity. Yah -- measure first, then optimize. I'm curious, too, about the claim that this has resulted in security problems. Could someone elaborate? See http://www.cert.org/advisories/CA-2002-03.html --Steven M. Bellovin, http://www.cs.columbia.edu/~smb ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: Best practice for data encoding?
Hi - From: Steven M. Bellovin [EMAIL PROTECTED] To: Randy Presuhn [EMAIL PROTECTED] Cc: ietf@ietf.org Sent: Monday, June 05, 2006 4:09 PM Subject: Re: Best practice for data encoding? ... I'm curious, too, about the claim that this has resulted in security problems. Could someone elaborate? See http://www.cert.org/advisories/CA-2002-03.html ... I remember that exercise. I don't see it as convincing evidence that the use of ASN.1 was the cause of the problems some implementations had; I doubt that someone who had buffer overflow problems when processing a BER-encoded octet string (where the length is explicitly encoded) would have had any better results with XML or any other representation. Randy ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Best practice for data encoding?
Hi The security problems identified in http://www.cert.org/advisories/CA-2002-03.html Multiple Vulnerabilities in Many Implementations of the Simple Network Management Protocol (SNMP) are not caused by the protocol choice to use ASN.1, but by vendors incorrectly implementing the protocol (which was made worse by vendors using toolkits that had the problems). If Multiple Vulnerabilities in Implementations were used to condemn the encoding methods of protocols that have been incorrectly implemented, then we would have to condemn an awful lot of IETF protocols as being very (security) bug prone: CERT Advisory CA-2003-26 Multiple Vulnerabilities in SSL/TLS Implementations US-CERT Vulnerability Note VU#459371 Multiple IPsec implementations do not adequately validate CERT Advisory CA-2001-18 Multiple Vulnerabilities in Several Implementations of the Lightweight Directory Access Protocol (LDAP) CERT Advisory CA-2002-36 Multiple Vulnerabilities in SSH Implementations CERT Advisory CA-2003-06 Multiple vulnerabilities in implementations of the Session Initiation Protocol (SIP) Vulnerability Note VU#428230 Multiple vulnerabilities in S/MIME implementations Vulnerability Note VU#955777 Multiple vulnerabilities in DNS implementations Vulnerability Note VU#226364 Multiple vulnerabilities in Internet Key Exchange (IKE) version 1 implementations CERT Advisory CA-2002-06 Vulnerabilities in Various Implementations of the RADIUS Protocol CERT Advisory CA-2000-06 Multiple Buffer Overflows in Kerberos Authenticated Services Vulnerability Note VU#836088 Multiple vendors' email content/virus scanners do not adequately check message/partial MIME entities David Harrington [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] -Original Message- From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] Sent: Monday, June 05, 2006 7:10 PM To: Randy Presuhn Cc: ietf@ietf.org Subject: Re: Best practice for data encoding?
On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn [EMAIL PROTECTED] wrote: Hi - From: Iljitsch van Beijnum [EMAIL PROTECTED] To: IETF Discussion ietf@ietf.org Sent: Monday, June 05, 2006 2:43 PM Subject: Best practice for data encoding? ... Then there is the ASN.1 route, but as we can see with SNMP, this also requires lots of code and is very (security) bug prone. ... Having worked on SNMP toolkits for a long time, I'd have to strenuously disagree. In my experience, the ASN.1/BER-related code is a rather small portion of an SNMP protocol engine. The code related to the SNMP protocol's quirks, such as Get-Next/Bulk processing and the mangling of index values into object identifiers (which is far removed from how ASN.1 intended object identifiers to be used) require much more code and complexity. Yah -- measure first, then optimize. I'm curious, too, about the claim that this has resulted in security problems. Could someone elaborate? See http://www.cert.org/advisories/CA-2002-03.html --Steven M. Bellovin, http://www.cs.columbia.edu/~smb ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: Best practice for data encoding?
On Mon, 5 Jun 2006 20:07:24 -0400, David Harrington [EMAIL PROTECTED] wrote: Hi The security problems identified in http://www.cert.org/advisories/CA-2002-03.html Multiple Vulnerabilities in Many Implementations of the Simple Network Management Protocol (SNMP) are not caused by the protocol choice to use ASN.1, but by vendors incorrectly implementing the protocol (which was made worse by vendors using toolkits that had the problems). If Multiple Vulnerabilities in Implementations were used to condemn the encoding methods of protocols that have been incorrectly implemented, then we would have to condemn an awful lot of IETF protocols as being very (security) bug prone: Works for me. More precisely -- when something is sufficiently complex, it's inherently bug-prone. That is indeed a good reason to push back on a design. The question to ask is whether the *problem* is inherently complex -- when the complexity of the solution significantly exceeds the inherent complexity of the problem, you've probably made a mistake. When the problem itself is sufficiently complex, it's fair to ask if it should be solved. Remember point (3) of RFC 1925. I'll note that a number of the protocols you cite were indeed criticized *during the design process* as too complex. The objectors were overruled. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
RE: Best practice for data encoding?
I agree that complexity breeds bug-prone implementations. I wasn't around then; did anybody push back on SNMPv1 as being too complex? http://www.cert.org/advisories/CA-2002-03.html is mainly about SNMPv1 implementations. ;-) dbh -Original Message- From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] Sent: Monday, June 05, 2006 8:21 PM To: David Harrington Cc: [EMAIL PROTECTED]; ietf@ietf.org Subject: Re: Best practice for data encoding? On Mon, 5 Jun 2006 20:07:24 -0400, David Harrington [EMAIL PROTECTED] wrote: Hi The security problems identified in http://www.cert.org/advisories/CA-2002-03.html Multiple Vulnerabilities in Many Implementations of the Simple Network Management Protocol (SNMP) are not caused by the protocol choice to use ASN.1, but by vendors incorrectly implementing the protocol (which was made worse by vendors using toolkits that had the problems). If Multiple Vulnerabilities in Implementations were used to condemn the encoding methods of protocols that have been incorrectly implemented, then we would have to condemn an awful lot of IETF protocols as being very (security) bug prone: Works for me More precisely -- when something is sufficiently complex, it's inherently bug-prone. That is indeed a good reason to push back on a design. The question to ask is whether the *problem* is inherently complex -- when the complexity of the solution significanlty exceeds the inherent complexity of the problem, you've probably made a mistake. When the problem itself is sufficiently complex, it's fair to ask if it should be solved. Remember point (3) of RFC 1925. I'll note that a number of the protocols you cite were indeed criticized *during the design process* as too complex. The objectors were overruled. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: Best practice for data encoding?
Let's not forget that the S in SNMP stands for simple. Simple is a relative term. In this case, SNMP is simple when compared to CMIP. -Original Message- From: David Harrington [mailto:[EMAIL PROTECTED] Sent: Monday, June 05, 2006 5:33 PM I agree that complexity breeds bug-prone implementations. I wasn't around then; did anybody push back on SNMPv1 as being too complex? http://www.cert.org/advisories/CA-2002-03.html is mainly about SNMPv1 implementations. ;-) dbh
RE: Best practice for data encoding?
On Monday, June 05, 2006 08:32:58 PM -0400 David Harrington [EMAIL PROTECTED] wrote: I agree that complexity breeds bug-prone implementations. I wasn't around then; did anybody push back on SNMPv1 as being too complex? I don't think anyone pushed back on SNMPv1 as being inherently insecure.
Re: Best practice for data encoding?
David Harrington wrote: I agree that complexity breeds bug-prone implementations. I wasn't around then; did anybody push back on SNMPv1 as being too complex? http://www.cert.org/advisories/CA-2002-03.html is mainly about SNMPv1 implementations. ;-) I wasn't there to push back, but when I got asked to implement it back then, the Simple part sure seemed like something between a fib and the Big Lie. Did we really need ASN.1 to define a debug peek/poke-like protocol? Mike
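[Editorial aside: the "peek/poke" complaint is about ASN.1's tag-length-value framing, where even a single counter carries a tag octet and a length octet, versus a fixed-format field. An illustrative sketch of BER INTEGER encoding for non-negative values (simplified; real agents must also handle negative values):]

```python
def ber_encode_integer(value):
    """BER-encode a non-negative INTEGER (universal tag 0x02).

    Shows the tag-length-value framing SNMP inherits from ASN.1: every
    value, however small, is wrapped in a tag octet and a length octet.
    """
    if value < 0:
        raise ValueError("sketch handles non-negative values only")
    # One spare high octet when needed keeps the sign bit clear,
    # which for non-negative values yields the minimal BER content.
    body = value.to_bytes((value.bit_length() // 8) + 1, "big")
    return bytes([0x02, len(body)]) + body
```

So the value 0 becomes the three octets `02 01 00`, and 128 becomes `02 02 00 80` -- twice to four times the size of a raw byte, which is the overhead Mike is pointing at.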
RE: Best practice for data encoding?
Steven, I'm not sure what you mean by saying that a problem that is highly complex should not be solved (or, at least, that we should consider not solving it). That seems like a cop-out. Minimally, every problem we've ever faced, we've tried to solve (where we refers to us weak-kneed Homo Sapiens) - no matter how hard it was to do so - and I like to think that is the right thing to do. In fairness, I am reasonably sure that point 3 in RFC 1925 refers to making a complex solution work, even if a simpler answer might be found, simply because enough people want that solution. It does not - IMO - rule out solving complex problems using as simple a solution as possible, however complex that might be. -- Eric
Re: Best practice for data encoding?
Hi - From: Fleischman, Eric [EMAIL PROTECTED] To: David Harrington [EMAIL PROTECTED]; Steven M. Bellovin [EMAIL PROTECTED] Cc: ietf@ietf.org Sent: Monday, June 05, 2006 5:41 PM Subject: RE: Best practice for data encoding? Let's not forget that the S in SNMP stands for simple. Simple is a relative term. In this case, SNMP is simple when compared to CMIP. We implemented both protocols. The core protocol engine for SNMP ended up being larger and more complex than that for CMIP. The complexity of GetNext, along with OID mangling, accounted for much of the difference. The S in SNMP was half marketing and half politics, and had very little to do with actual implementation or use. Randy
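[Editorial aside: the GetNext complexity Randy mentions comes from the requirement to return the variable binding whose OID is lexicographically next after the requested OID. A minimal sketch, with an illustrative toy MIB (the OIDs and values below are examples, not from any real agent):]

```python
def snmp_get_next(mib, oid):
    """Return the (oid, value) pair lexicographically after `oid`.

    `mib` maps OID tuples, e.g. (1, 3, 6, 1, 2, 1, 1, 1, 0), to values.
    Python tuple comparison is lexicographic, which matches SNMP's OID
    ordering: a shorter OID sorts before any OID it prefixes.
    """
    candidates = sorted(k for k in mib if k > oid)
    if not candidates:
        return None  # end of MIB (endOfMibView in SNMPv2 terms)
    nxt = candidates[0]
    return nxt, mib[nxt]
```

A walk starts from a prefix like `(1, 3, 6, 1)` and repeats GetNext on each returned OID until the agent signals the end of the MIB; a real agent also needs the OID-to-handler dispatch ("OID mangling") that made this machinery larger than CMIP's in Randy's account.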
Re: Best practice for data encoding?
On Mon, 5 Jun 2006 20:59:32 -0400, Gray, Eric [EMAIL PROTECTED] wrote: Steven, I'm not sure what you mean by saying that a problem that is highly complex should not be solved (or, at least, that we should consider not solving it). That seems like a cop-out. I meant exactly what I said. The reason to avoid certain solutions is that you'll then behave as if the problem is really solved, with bad consequences if you're wrong -- and for some problems, you probably are wrong. Read David Parnas' Software Aspects of Strategic Defense Systems (available at http://klabs.org/richcontent/software_content/papers/parnas_acm_85.pdf); also consider the historical record on why the US and the USSR signed a treaty banning most anti-missile systems, and in particular why the existence of such systems made the existing nuclear deterrent standoff unstable. Note carefully that I didn't say we shouldn't do research on how to solve things. But doing research and declaring that we know how to do something are two very different things. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb