Re: Best practice for data encoding?

2006-06-14 Thread Joe Touch


Harald Alvestrand wrote:
 Ted Faber wrote:
 
 On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
  

 The problem with text is that you have to walk through memory and 
 compare characters. A LOT.
   

 That's not where your code spends its time.

 Run gprof(1).  The majority of your code's time is spent doing the
 two integer divides per text-to-integer conversion and in strtoimax
 (called by fscanf).
 Multiplying or dividing is the worst thing you can do on a CPU in
 general.
 Note that CPUs are different; some multiply faster than others, compared
 to the rest of the HW.
 
 And if you really need to, you can optimize... a multiplication by 10,
 for instance, can be done by two left shifts and an addition (a*10 =
 (a<<3) + (a<<1));

You might not end up with the same set of condition codes at the end,
though...

Joe




___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


Re: Best practice for data encoding?

2006-06-13 Thread Harald Alvestrand

Ted Faber wrote:


On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
 

The problem with text is that you have to walk through memory and  
compare characters. A LOT.
   



That's not where your code spends its time.

Run gprof(1).  The majority of your code's time is spent doing the
two integer divides per text-to-integer conversion and in strtoimax
(called by fscanf). 


Multiplying or dividing is the worst thing you can do on a CPU in
general. 

Note that CPUs are different; some multiply faster than others, compared 
to the rest of the HW.


And if you really need to, you can optimize... a multiplication by 10, 
for instance, can be done by two left shifts and an addition (a*10 = 
(a<<3) + (a<<1)). I have no idea why strtoimax would do divisions, but I 
haven't written decimal-number parsers for a very long time; I think 
Knuth had 3 different ones in Seminumerical Algorithms, but that was a 
VERY long time ago...
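The shift trick Harald describes can be sketched in C. (A hypothetical helper for illustration only; modern compilers typically perform this strength reduction on their own, so this is not a recommendation to hand-optimize.)

```c
#include <stdint.h>

/* a*10 == a*8 + a*2 == (a<<3) + (a<<1): two left shifts and an
 * addition instead of a multiply instruction. Wraps modulo 2^32
 * exactly as the multiplication would. */
uint32_t mul10(uint32_t a)
{
    return (a << 3) + (a << 1);
}
```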


Harald






Re: Best practice for data encoding?

2006-06-13 Thread Masataka Ohta
Iljitsch van Beijnum wrote:

 in your original question to the list, you didn't quite make clear
 that your question was with respect to BGP-style transfer of
 large-scale routing information.

 I didn't want to limit the scope of the discussion to one particular  
 type of protocol.

Could you stop this abstract nonsense, then?

Many old protocols designed for old, slow computers, such as
telnet/ftp/smtp, do use text. Their replies are structured text,
mostly with 3-digit codes. That is, properly designed text-based
protocols are fine. Note that all you need is EBNF, not XML.
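As a sketch of how little machinery such text protocols need, a hypothetical parser for an SMTP/FTP-style three-digit reply code might look like this (illustrative only, not taken from any implementation of those protocols):

```c
#include <ctype.h>

/* Return the three-digit reply code at the start of a line
 * (e.g. 250 from "250 OK"), or -1 if the line does not begin
 * with exactly three digits. */
int parse_reply_code(const char *line)
{
    if (isdigit((unsigned char)line[0]) &&
        isdigit((unsigned char)line[1]) &&
        isdigit((unsigned char)line[2]))
        return (line[0] - '0') * 100 +
               (line[1] - '0') * 10 +
               (line[2] - '0');
    return -1;
}
```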

Masataka Ohta





Re: Best practice for data encoding?

2006-06-13 Thread Ted Faber
On Tue, Jun 13, 2006 at 09:23:42AM +0200, Harald Alvestrand wrote:
 Ted Faber wrote:
 Multiplying or dividing is the worst thing you can do on a CPU in
 general. 
 
 Note that CPUs are different; some multiply faster than others, compared 
 to the rest of the HW.

I am not a CPU designer, but my understanding is that the multiply
instructions are usually the most complex single instructions, and in
some sense determine the processor speed.  (All you VAX guys can sit
down; I know about your polynomial evaluation instructions, I'm just
speaking generally.) 

Trying to extrapolate great truths about data encoding from 10 lines of
C will not be fruitful.

Yeah, this is mostly off topic for networking, but as Joe Touch
demonstrated, sometimes it matters:

http://www.isi.edu/touch/pubs/sigcomm95.html

-- 
Ted Faber
http://www.isi.edu/~faber   PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG




Re: Best practice for data encoding?

2006-06-12 Thread Iljitsch van Beijnum

On 8-jun-2006, at 11:13, Iljitsch van Beijnum wrote:

But it's a clear trade-off between efficient representation and  
complexity for the developer - we've had this debate once already  
in this thread, I think. The more complex the representation, the  
harder it is to code correctly and debug.


Simple/complex isn't the same as text/binary. I'm pretty sure a
programmer armed with nothing more than a standard issue C compiler
and the basic libraries that come with it would have a harder time
parsing XML than something like the DNS protocol. But my point was
that the resulting code would be slower. I know it's very
old-fashioned to even consider such things, but there are places
where performance is important beyond just philosophical objections
against bloat.


A little post script to this discussion: I wrote a few small test  
programs in C to evaluate the performance of reading integers from a  
text file using stdio.h versus doing the same with direct read()s  
from a binary file. The difference is between two and three orders of  
magnitude. See http://ablog.apress.com/?p=1146
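The two decoding paths being compared can be caricatured as follows (hypothetical helper names, not the actual test programs from the blog post): the text path does a compare and a branch per character, while the binary path copies a fixed-width integer straight out of the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Text path: scan decimal digits one character at a time. */
long decode_text(const char *p)
{
    long v = 0;
    while (*p >= '0' && *p <= '9')   /* one compare + branch per digit */
        v = v * 10 + (*p++ - '0');
    return v;
}

/* Binary path: one fixed-size copy, no parsing at all
 * (assumes the sender used the receiver's byte order). */
uint32_t decode_binary(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}
```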




Re: Best practice for data encoding?

2006-06-12 Thread Carsten Bormann
A little post script to this discussion: I wrote a few small test
programs in C to evaluate the performance of reading integers from
a text file using stdio.h versus doing the same with direct read()s
from a binary file. The difference is between two and three orders
of magnitude. See http://ablog.apress.com/?p=1146


Iljitsch,

in your original question to the list, you didn't quite make clear
that your question was with respect to BGP-style transfer of
large-scale routing information.


Right now, you seem to focus on decoding performance.  How much of  
the CPU time spent for BGP is decoding?
Does the CPU time spent for the entirety of BGP even matter*?  If  
yes, can a good data structure/encoding help with the *overall* problem?


The results from your test programs are not at all surprising.
Of course, a hand-coded loop where all data already is in the right  
form (data type, byte order, number of bits), no decisions need to be  
made, and you even know the number of data items beforehand, is going  
to be faster than calling the generic, pretty much neglected,  
parameterized, tired library routine fscanf that doesn't get much use  
outside textbooks.
(The read anomaly is caused by read(2) being an expensive system  
call; all other cases use a form of buffering to reduce the number of  
system calls.)
What this example shows nicely is that performance issues are
non-trivial, and, yes, you do want to run measurements, but at the
system level and not at the level of test cases that have little or
no relationship to the performance of the real system.


If you really care about the performance of text-based protocols, you  
cannot ignore modern tools like Ragel.
If, having used them, you still manage to find the text processing  
overhead in your profiling data, I'd like to hear from you.


Still, for BGP, a binary protocol encoding may be a better fit  
because routing tables are so much about bits and prefixes and other  
numeric information already designed to be used in binary protocol  
encodings.
Also, it may be easier to reduce both data rate and processing by  
exploiting more of the structure of the BGP routing information.
(I.e., to make it redundantly clear, I would probably choose binary  
here, but not for the reasons given in your blog post.)


Gruesse, Carsten

*) Yes, that's a trick question to elicit responses :-)




Re: Best practice for data encoding?

2006-06-12 Thread Iljitsch van Beijnum

On 12-jun-2006, at 13:31, Carsten Bormann wrote:

in your original question to the list, you didn't quite make clear
that your question was with respect to BGP-style transfer of
large-scale routing information.


I didn't want to limit the scope of the discussion to one particular  
type of protocol.


Right now, you seem to focus on decoding performance.  How much of  
the CPU time spent for BGP is decoding?
Does the CPU time spent for the entirety of BGP even matter*?  If  
yes, can a good data structure/encoding help with the *overall*  
problem?


I can't answer the first question, because the only BGP we have uses
binary. But I'm pretty confident that doing the same thing using a
text-based encoding isn't going to do us any favors performance-wise.



The results from your test programs are not at all surprising.
Of course, a hand-coded loop where all data already is in the right  
form (data type, byte order, number of bits), no decisions need to  
be made, and you even know the number of data items beforehand, is  
going to be faster than calling the generic, pretty much neglected,  
parameterized, tired library routine fscanf that doesn't get much  
use outside textbooks.


Byte order stuff and such isn't much of an issue compared to the time
required for memory access. And I guess fscanf could be a slow
implementation, but this is just reading a value from a line of text,
no hunting for tags and such as required in HTML or XML. Also, the
performance gap is just so huge that I don't think the details matter
too much.


What this example shows nicely is that performance issues are
non-trivial, and, yes, you do want to run measurements, but at the
system level and not at the level of test cases that have little
or no relationship to the performance of the real system.


Sure, but how are you going to do that kind of testing when designing  
a protocol? Creating two implementations just to see which variation  
is faster would be a good idea but I don't really see that happening...


If you really care about the performance of text-based protocols,  
you cannot ignore modern tools like Ragel.


Don't know it.

If, having used them, you still manage to find the text processing  
overhead in your profiling data, I'd like to hear from you.


The problem with text is that you have to walk through memory and
compare characters. A LOT. This is pretty much the worst thing you
can do to a modern CPU: you don't use the logical word width (and
barely the physical one), and all those compares are hard to predict,
so you get massive numbers of incorrectly predicted branches.


But I guess this discussion can go on forever...



Re: Best practice for data encoding?

2006-06-12 Thread Ted Faber
On Mon, Jun 12, 2006 at 02:11:19PM +0200, Iljitsch van Beijnum wrote:
 The problem with text is that you have to walk through memory and  
 compare characters. A LOT.

That's not where your code spends its time.

Run gprof(1).  The majority of your code's time is spent doing the
two integer divides per text-to-integer conversion and in strtoimax
(called by fscanf). 

Multiplying or dividing is the worst thing you can do on a CPU in
general.  
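For what it's worth, the two divides Ted mentions most likely come from the overflow guard in the usual strtol/strtoimax pattern, which computes a cutoff once per conversion. A simplified sketch (assumed BSD-style logic; base 10, non-negative input only, hypothetical function name):

```c
#include <limits.h>

/* Parse a non-negative decimal number with the classic overflow
 * guard: one divide and one modulo per conversion, not per digit.
 * Sets *overflow and returns LONG_MAX if the value won't fit. */
long parse_long(const char *s, int *overflow)
{
    long cutoff = LONG_MAX / 10;   /* integer divide #1 */
    int  cutlim = LONG_MAX % 10;   /* integer divide #2 (modulo) */
    long v = 0;

    *overflow = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int d = *s - '0';
        if (v > cutoff || (v == cutoff && d > cutlim)) {
            *overflow = 1;
            return LONG_MAX;
        }
        v = v * 10 + d;
    }
    return v;
}
```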

-- 
Ted Faber
http://www.isi.edu/~faber   PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG




Re: Best practice for data encoding?

2006-06-09 Thread Brian E Carpenter

So how about concluding that there is no single
right answer to Iljitsch's question, but there may
be scope for defining considerations for the choice
of data encoding?

Brian



Re: Best practice for data encoding?

2006-06-08 Thread Dave Cridland

On Wed Jun  7 23:19:29 2006, Iljitsch van Beijnum wrote:

On 7-jun-2006, at 22:38, Dave Cridland wrote:

I think it's worth noting that nobody is preventing you from using 
 XML over a compressed channel, which tends to increase XML's  
efficiency rather sharply.


[...]

Wire efficiency, for the most part, needs to take place by 
avoiding  the transmission of information entirely, rather than 
trying to  second-guess the compression.


Obviously adding compression doesn't help processing efficiency.


It does if you're doing encryption, unless the protocol itself is 
almost entirely uncompressible. Compression is a lot cheaper than 
encryption, and the more you compress, the less you have to encrypt.



I've long harbored the suspicion that a large percentage of the  
cycles in today's fast CPUs are burned up parsing various types of  
text. Does anyone know of any processing efficiency comparisons  
between binary and text based protocols?



But it's a clear trade-off between efficient representation and 
complexity for the developer - we've had this debate once already in 
this thread, I think. The more complex the representation, the harder 
it is to code correctly and debug.


I'm not saying there's no place for binary protocols, but I am saying 
that to go for a binary representation, you have to be working in a 
highly resource-constrained environment where both bandwidth and CPU 
usage are tightly limited.



As an example, IMAP and ACAP streams compress by around 70% on my  
client - and that's trying to be bandwidth efficient in its  
protocol usage. I've seen figures of 85% talked about quite  
seriously, too.


And you think that's a good thing?


No, it demonstrates that even protocols which have a reputation for 
chatter actually turn out to be efficient on the wire 
post-compression.


 Now I understand why it takes me  up to 10 minutes to download a 
handful of 1 - 2 kbyte emails with  IMAP (+SSL) over a 9600 bps GSM 
data connection.


I can write an efficient IMAP client, and give it away for free, but 
I can't force you to use it. :-)


I can read a new message inside a minute over 9600bps with 1 second 
latency, without even using a local disk cache - including fetching 
configuration over ACAP to begin with. My INBOX contains 24,033 
messages currently - obviously I'm not downloading them all, and this 
is emulated bandwidth and latency, thus not entirely accurate.


IMAP can be extremely efficient; if it's taking you 10 minutes to 
read a handful of emails, you're using the wrong client, the wrong 
server, or quite possibly both. With Lemonade features coming into 
mainstream implementations, you really ought to be looking at your 
systems if you're using GSM data. Compression is not really one of 
these, oddly enough, but TLS deflate ought to be available to most 
clients and servers by now anyway.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade



Re: Best practice for data encoding?

2006-06-08 Thread Iljitsch van Beijnum

On 8-jun-2006, at 9:50, Dave Cridland wrote:

I've long harbored the suspicion that a large percentage of the   
cycles in today's fast CPUs are burned up parsing various types  
of  text. Does anyone know of any processing efficiency  
comparisons  between binary and text based protocols?


But it's a clear trade-off between efficient representation and  
complexity for the developer - we've had this debate once already  
in this thread, I think. The more complex the representation, the  
harder it is to code correctly and debug.


Simple/complex isn't the same as text/binary. I'm pretty sure a
programmer armed with nothing more than a standard issue C compiler
and the basic libraries that come with it would have a harder time
parsing XML than something like the DNS protocol. But my point was
that the resulting code would be slower. I know it's very
old-fashioned to even consider such things, but there are places
where performance is important beyond just philosophical objections
against bloat.


 Now I understand why it takes me  up to 10 minutes to download a  
handful of 1 - 2 kbyte emails with  IMAP (+SSL) over a 9600 bps  
GSM data connection.


I can write an efficient IMAP client, and give it away for free,  
but I can't force you to use it. :-)


It's not that I'm a huge fan of Apple's Mail but it's the only GUI  
mail application I've found so far that I can actually work with  
without the irresistible urge to chew through the mouse cable and  
dust off my VT420 terminal... (Which allows me to read my mail with  
Pine without any trouble over 9600 bps.)




Schema languages for XML (Was: Best practice for data encoding?

2006-06-07 Thread Stephane Bortzmeyer
On Tue, Jun 06, 2006 at 09:50:22AM -0700,
 Hallam-Baker, Phillip [EMAIL PROTECTED] wrote 
 a message of 42 lines which said:

 At this point XML is not a bad choice for data encoding.

+1

 The problem in XML is that XML Schema was botched and in particular
 namespaces and composition are botched. I think this could be fixed,
 perhaps.

There are other schema languages than the bloated W3C Schema. The most
common is RelaxNG (http://www.relaxng.org/).

In the IETF land, while RFC 3730 and 3981 unfortunately use W3C
Schema, RFC 4287 uses RelaxNG.



Re: Best practice for data encoding?

2006-06-07 Thread Theodore Tso
On Mon, Jun 05, 2006 at 08:21:29PM -0400, Steven M. Bellovin wrote:
 
 More precisely -- when something is sufficiently complex, it's inherently
 bug-prone.  That is indeed a good reason to push back on a design.  The
 question to ask is whether the *problem* is inherently complex -- when the
 complexity of the solution significantly exceeds the inherent complexity of
 the problem, you've probably made a mistake.  When the problem itself is
 sufficiently complex, it's fair to ask if it should be solved.  Remember
 point (3) of RFC 1925.

One of the complaints about ASN.1 is that it is an enabler of
complexity.  It becomes easy to specify some arbitrarily complex data
structures, and very often with that complexity comes all sorts of
problems.  To be fair, though, the same thing can be said of XML (and
I'm not a big fan of a number of specifications utilizing XML either,
for the same reason), and ultimately, you can write Fortran in any
language.

Ultimately, I have to agree with Steve: it's not the encoding, it's
going to be about asking the right questions at the design phase more
than anything else, at least as far as the protocol is concerned.

As far as implementation issues, it is true that ASN.1 doesn't map
particularly well to standard programming types commonly in use,
although if you constrain the types used in the protocol that issue
can be relatively easily avoided (as would using XML, or a simpler
ad-hoc encoding scheme).  I don't think there is any right answer
here, although my personal feelings about ASN.1 can still be summed up
in the aphorism "Friends don't let friends use ASN.1", but that's
mostly due to psychic scars caused by my having to deal with the
Kerberos v5 protocol's use of ASN.1, and the feature and design bloat
that resulted from it.

- Ted




RE: Schema languages for XML (Was: Best practice for data encoding?

2006-06-07 Thread Linn, John
I'll concur wrt the generality, flexibility, and power of XML as a data
encoding.  Considering comments on the ancestor thread, though, I'll
also observe that the generality and flexibility are Not Your Friends if
situations require encodings to be distinguished.  The processing rules
in X.690 that define DER relative to BER are expressed there within
three pages (admittedly, excluding the cross-ref to X.680 for tag
ordering); even though they may imply underlying complexity in
implementation, their complexity in specification and concept seems
vastly simpler than the issues that arise with XML canonicalization.

--jl

-Original Message-
From: Stephane Bortzmeyer [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 07, 2006 3:51 AM
To: Hallam-Baker, Phillip
Cc: ietf@ietf.org
Subject: Schema languages for XML (Was: Best practice for data encoding?

On Tue, Jun 06, 2006 at 09:50:22AM -0700,
 Hallam-Baker, Phillip [EMAIL PROTECTED] wrote 
 a message of 42 lines which said:

 At this point XML is not a bad choice for data encoding.

+1

 The problem in XML is that XML Schema was botched and in particular
 namespaces and composition are botched. I think this could be fixed,
 perhaps.

There are other schema languages than the bloated W3C Schema. The most
common is RelaxNG (http://www.relaxng.org/).

In the IETF land, while RFC 3730 and 3981 unfortunately use W3C
Schema, RFC 4287 uses RelaxNG.



RE: Schema languages for XML (Was: Best practice for data encoding?

2006-06-07 Thread Hallam-Baker, Phillip






I would suggest that the problems with canonicalization in both cases stem from the fact that it was an afterthought. The original description of DER was a single paragraph. If ISO had required implementations before a standard was agreed, I don't think it would have passed in that form; they would have used the indefinite-length encoding for structures.

XML canonicalization went sour after people started to use namespace-prefixed data. This caused the namespace scheme, which is poorly designed, to cross over into the data space.

I would suggest that the best approach for data encoding today would be to make use of the XML Infoset but think twice about using the XML encoding.


-Original Message-
From:  Linn, John [mailto:[EMAIL PROTECTED]]
Sent: Wed Jun 07 05:19:44 2006
To: Stephane Bortzmeyer; Hallam-Baker, Phillip
Cc: ietf@ietf.org
Subject: RE: Schema languages for XML (Was: Best practice for data encoding?

I'll concur wrt the generality, flexibility, and power of XML as a data
encoding. Considering comments on the ancestor thread, though, I'll
also observe that the generality and flexibility are Not Your Friends if
situations require encodings to be distinguished. The processing rules
in X.690 that define DER relative to BER are expressed there within
three pages (admittedly, excluding the cross-ref to X.680 for tag
ordering); even though they may imply underlying complexity in
implementation, their complexity in specification and concept seems
vastly simpler than the issues that arise with XML canonicalization.

--jl

-Original Message-
From: Stephane Bortzmeyer [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 07, 2006 3:51 AM
To: Hallam-Baker, Phillip
Cc: ietf@ietf.org
Subject: Schema languages for XML (Was: Best practice for data encoding?

On Tue, Jun 06, 2006 at 09:50:22AM -0700,
Hallam-Baker, Phillip [EMAIL PROTECTED] wrote
a message of 42 lines which said:

 At this point XML is not a bad choice for data encoding.

+1

 The problem in XML is that XML Schema was botched and in particular
 namespaces and composition are botched. I think this could be fixed,
 perhaps.

There are other schema languages than the bloated W3C Schema. The most
common is RelaxNG (http://www.relaxng.org/).

In the IETF land, while RFC 3730 and 3981 unfortunately use W3C
Schema, RFC 4287 uses RelaxNG.



Re: Best practice for data encoding?

2006-06-07 Thread Michael Thomas

Theodore Tso wrote:


On Mon, Jun 05, 2006 at 08:21:29PM -0400, Steven M. Bellovin wrote:
 


More precisely -- when something is sufficiently complex, it's inherently
bug-prone.  That is indeed a good reason to push back on a design.  The
question to ask is whether the *problem* is inherently complex -- when the
complexity of the solution significantly exceeds the inherent complexity of
the problem, you've probably made a mistake.  When the problem itself is
sufficiently complex, it's fair to ask if it should be solved.  Remember
point (3) of RFC 1925.
   



One of the complaints about ASN.1 is that it is an enabler of
complexity.  It becomes easy to specify some arbitrarily complex data
structures, and very often with that complexity comes all sorts of
problems.  To be fair, though, the same thing can be said of XML (and
I'm not a big fan of a number of specifications utilizing XML either,
for the same reason), and ultimately, you can write Fortran in any
language.

Ultimately, I have to agree with Steve: it's not the encoding, it's
going to be about asking the right questions at the design phase more
than anything else, at least as far as the protocol is concerned.

As far as implementation issues, it is true that ASN.1 doesn't map
particularly well to standard programming types commonly in use,
although if you constrain the types used in the protocol that issue
can be relatively easily avoided (as would using XML, or a simpler
ad-hoc encoding scheme).  I don't think there is any right answer
here, although my personal feelings about ASN.1 can still be summed up
in the aphorism "Friends don't let friends use ASN.1", but that's
mostly due to psychic scars caused by my having to deal with the
Kerberos v5 protocol's use of ASN.1, and the feature and design bloat
that resulted from it.
 

Here's my down-in-the-trenches observation about XML and, to some
degree, ASN.1. XML gives me the ability to djinn up a scheme really
quickly, with as much or as little elegance as needed. If nothing
else, XML quite rapidly allows me to hack up interpreters that would
otherwise have been another parsed-text-file casualty residing in
/etc. And most likely, if they could read the previous text encoding,
the XML is about as transparent. This is a very nice feature, as it
allows common parsers, leads to common practices about how to lay out
simple things, and builds common understanding about what is right
and wrong. So, from my standpoint, XML means "no more ad hoc
parsers!", which is good from a complexity standpoint, especially
when you look at how spare the base syntax is.

That said, for anything that requires nested structure, etc., XML is
probably just as inadequate as anything else. And who should be
surprised? Don't blame the Legos that they self-assemble into a
rocket ship. One big plus about XML is that I can just _read_ it and
hack up pretty printers in trivial time. ASN.1, which is, um,
abstract, necessarily needs to go through a translation layer, which
IMO is never as fun and convenient -- and absolutely discourages
dilettante hacking (when is the last time you fiddled with an XML
file vs. the last time you fiddled with ASN.1? Never, for the
latter?).

I guess that part of what this devolves into is who we're writing
these protocols/schemes for: machines or people? That, I think, is a
huge false dilemma. We clearly are writing things for _both_ (the
executors and the maintainers); the only question in my mind is
whether an encoding that is easy for humans to maintain is too
inefficient on the wire for its task. In some cases it clearly is,
but those cases are becoming the outliers -- especially at the app
layer -- as the march of memory and bandwidth plods on.

So with all of these wars, the sexy overblown hype ("yes, of course
XML can solve world hunger!") often eclipses the routine and mundane
tasks of writing and maintaining a system ("golly, I need a config
file for this; it's been a while since I wrote a really crappy
parser, woo woo!"). C'est la vie.

		Mike



Re: Best practice for data encoding?

2006-06-07 Thread Dave Cridland

On Wed Jun  7 15:37:28 2006, Michael Thomas wrote:
I guess that part of what this devolves into is who we're writing
these protocols/schemes for: machines or people? That, I think, is
a huge false dilemma. We clearly are writing things for _both_ (the
executors and the maintainers); the only question in my mind is
whether an encoding that is easy for humans to maintain is too
inefficient on the wire for its task. In some cases it clearly is,
but those cases are becoming the outliers -- especially at the app
layer -- as the march of memory and bandwidth plods on.


I think it's worth noting that nobody is preventing you from using 
XML over a compressed channel, which tends to increase XML's 
efficiency rather sharply.


Compression also tends to make you look differently at protocol 
issues, because the repetitive, inefficient, protocol forms often 
compress equally well to, or even better than, a better structure - 
and they're also usually easier to handle.


Wire efficiency, for the most part, needs to take place by avoiding 
the transmission of information entirely, rather than trying to 
second-guess the compression.


As a rule, if you're moving strings around, or eliminating 
duplicates, within a single send of your protocol, you're probably 
wasting your time. In general, you want to be avoiding sending 
information at all.


The only reason you have for worrying about representation for wire 
efficiency is if resource shortage prevents you from compression 
entirely - bear in mind this implies that resource shortage has 
already prevented you from encryption - generally a bad thing.


As an example, IMAP and ACAP streams compress by around 70% on my 
client - and that's trying to be bandwidth efficient in its protocol 
usage. I've seen figures of 85% talked about quite seriously, too.


So in answer to the original question, I'd say that the current best
practice for data encoding has to be RFC 1952 Deflate - beyond that,
it really doesn't matter all that much.


As an aside, whilst XMPP does provide a pure XML protocol, most 
protocols use XML as a payload, and use other forms to exchange 
protocol messages.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade



Re: Best practice for data encoding?

2006-06-07 Thread Iljitsch van Beijnum

On 7-jun-2006, at 22:38, Dave Cridland wrote:

I think it's worth noting that nobody is preventing you from using  
XML over a compressed channel, which tends to increase XML's  
efficiency rather sharply.


[...]

Wire efficiency, for the most part, needs to take place by avoiding  
the transmission of information entirely, rather than trying to  
second-guess the compression.


Obviously adding compression doesn't help processing efficiency.

I've long harbored the suspicion that a large percentage of the  
cycles in today's fast CPUs are burned up parsing various types of  
text. Does anyone know of any processing efficiency comparisons  
between binary and text based protocols?


As an example, IMAP and ACAP streams compress by around 70% on my  
client - and that's trying to be bandwidth efficient in its  
protocol usage. I've seen figures of 85% talked about quite  
seriously, too.


And you think that's a good thing? Now I understand why it takes me  
up to 10 minutes to download a handful of 1 - 2 kbyte emails with  
IMAP (+SSL) over a 9600 bps GSM data connection.




Re: Best practice for data encoding?

2006-06-06 Thread Eliot Lear
Iljitsch van Beijnum wrote:
 I was wondering:

 What is considered best practice for encoding data in protocols within
 the IETF's purview?
One should always think about what one needs and choose the appropriate
solution to the task.  Of course sometimes it's hard to take into
account what level of performance one would need out of a protocol
implementation.  RAM is considerably cheaper now than it was twenty
years ago, and so one approach in protocol design would be to define
multiple encodings as they are required.  So, if you don't think
performance is crucial but toolset reuse is for an RPC-based approach,
perhaps XML is a good start, and if you need to optimize later, perhaps
consider something more compact like XDR.

As to whether ASN.1 was a good choice or a bad choice for SNMP, there
never was an argument.  It was THE ONLY CHOICE.  All three protocols
(CMIP, SGMP, HEMP)  under consideration made use of it.  Nobody
seriously considered anything else due to the practical limits of the
time.  Is it still a reasonable approach?  I think a strong argument
could be made that some sort of textual representation is necessary in
order to satisfy more casual uses and to accommodate tool sets that are
more broadly utilized, but that doesn't mean that we should do away with
ASN.1, archaic as it may seem.

Eliot



RE: Best practice for data encoding?

2006-06-06 Thread Tony Finch
On Mon, 5 Jun 2006, David Harrington wrote:

  CERT® Advisory CA-2001-18 Multiple Vulnerabilities in Several
 Implementations of the Lightweight Directory Access Protocol (LDAP)

 Vulnerability Note VU#428230 Multiple vulnerabilities in S/MIME
 implementations

Oh yes, I forgot those were ASN.1 too.

Tony.
-- 
f.a.n.finch  [EMAIL PROTECTED]  http://dotat.at/
FORTIES CROMARTY FORTH TYNE DOGGER: VARIABLE 3 OR 4. MAINLY FAIR. MODERATE OR
GOOD.



Re: Best practice for data encoding?

2006-06-06 Thread Tony Finch
On Mon, 5 Jun 2006, Steven M. Bellovin wrote:
 On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn
 [EMAIL PROTECTED] wrote:
 
  I'm curious, too, about the claim that this has resulted in security
  problems.  Could someone elaborate?

 See http://www.cert.org/advisories/CA-2002-03.html

ASN.1 implementation bugs have also caused security problems for SSL,
Kerberos, ISAKMP, and probably others. These bugs are also not due to
shared code history: they turn up again and again.

Are there any other binary protocols that can be usefully compared with
ASN.1's security history?

Tony.
-- 
f.a.n.finch  [EMAIL PROTECTED]  http://dotat.at/
THE MULL OF GALLOWAY TO MULL OF KINTYRE INCLUDING THE FIRTH OF CLYDE AND THE
NORTH CHANNEL: VARIABLE 2 OR 3 WITH AFTERNOON ONSHORE SEA BREEZES. FAIR
VISIBILITY: MODERATE OR GOOD WITH MIST OR FOG PATCHES SEA STATE: SMOOTH OR
SLIGHT.



RE: Best practice for data encoding?

2006-06-06 Thread Hallam-Baker, Phillip

 From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] 

 More precisely -- when something is sufficiently complex, 
 it's inherently bug-prone.  That is indeed a good reason to 
 push back on a design.  The question to ask is whether the 
 *problem* is inherently complex -- when the complexity of the 
 solution significantly exceeds the inherent complexity of the 
 problem, you've probably made a mistake.  When the problem 
 itself is sufficiently complex, it's fair to ask if it should 
 be solved.  Remember point (3) of RFC 1925.

I think that the term 'too complex' is probably meaningless and is in any case 
an inaccurate explanation for the miseries of ASN.1, which are rather different 
from the ones normally given.

People push back on protocols all the time for a range of reasons. Too complex 
is a typically vague and unhelpful pushback. I note that all too often the 
complexity of deployed protocols is the result of efforts of people to reduce 
the complexity of the system to the point where it was insufficient for the 
intended task.

Having had Tony Hoare as my college tutor at Oxford I have experienced a 
particularly uncompromising approach to complexity. However, the point Hoare 
makes repeatedly is "as simple as possible, but no simpler".


In the case of ASN.1 I think the real problem is not the 'complexity' of the 
encoding, it's the mismatch between the encoding used and the data types 
supported in the languages that are used to implement ASN.1 systems.

DER encoding is most certainly a painful disaster, and it is completely 
unnecessary in X.509 - as demonstrated empirically by the fact that the 
Internet worked just fine, without anyone noticing (OK, one person noticed), 
in the days when CAs issued BER-encoded certs. 

The real pain in ASN.1 comes from having to deal with piles of unnecessary 
serialization/deserialization code.


The real power of S-Expressions is not the simplicity of the S-Expression. 
Dealing with large structures in S-Expressions is a tedious pain to put it 
mildly. The code to deal with serialization/deserialization is avoided because 
the data structures are introspective (at least in Symbolics LISP which is the 
only one I ever used).

If ASN.1 had been done right it would have been possible to generate the 
serialization/deserialization code automatically from native data structures in 
the way that .NET allows XML serialization classes to be generated 
automatically.


Unfortunately ASN.1 went into committee as a good idea and came out a camel. 
And all of the attempts to remove the hump since have merely created new humps.


At this point XML is not a bad choice for data encoding. I would like to see 
the baroque SGML legacy abandoned (in particular, eliminate DTDs entirely). XML 
is not a perfect choice, but it is not a bad one and, done right, can be efficient.

The problem in XML is that XML Schema was botched and in particular namespaces 
and composition are botched. I think this could be fixed, perhaps.



Re: Best practice for data encoding?

2006-06-06 Thread Steven M. Bellovin
On Tue, 6 Jun 2006 09:50:22 -0700, Hallam-Baker, Phillip
[EMAIL PROTECTED] wrote:


 
 Having had Tony Hoare as my college tutor at Oxford I have experienced a
 particularly uncompromising approach to complexity. However, the point
 Hoare makes repeatedly is "as simple as possible, but no simpler".

Hoare has been a great influence on my thinking, too.  I particularly
recall his Turing Award lecture, where he noted:

There are two ways of constructing a software design: One way is
to make it so simple that there are obviously no deficiencies, and
the other way is to make it so complicated that there are no
obvious deficiencies. The first method is far more difficult.

(In that same lecture, he warned of security issues from not checking
array bounds at run-time, but that's a separate rant.)

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb



RE: Best practice for data encoding?

2006-06-06 Thread Christian Huitema
 ASN.1 implementation bugs have also caused security problems for SSL,
 Kerberos, ISAKMP, and probably others. These bugs are also not due to
 shared code history: they turn up again and again.
 
 Are there any other binary protocols that can be usefully compared
with
 ASN.1's security history?

There is indeed a lot of complexity in ASN.1. At the root, ASN.1 is a
basic T-L-V encoding format, similar to what we see in multiple IETF
protocols. However, for various reasons, ASN.1 includes a number of
encoding choices that are as many occasions for programming errors:

* In most TLV applications, the type field is a simple number varying
from 0 to 254, with the number 255 reserved for extension. In ASN.1, the
type field is structured as a combination of scope and number, and the
number itself can be encoded on a variable number of bytes.
* In most TLV applications, the length field is a simple number. In
ASN.1, the length field is variable length.
* In most TLV applications, structures are delineated by the length
field. In ASN.1, structures can be delineated either by the length field
or by an end of structure mark.
* In most TLV applications, a string is encoded as just a string of
bytes. In ASN.1, it can be encoded either that way, or as a sequence of
chunks, which conceivably could themselves be encoded as chunks.
* Most applications tolerate some variations in component ordering and
deal with optional components, but ASN.1 pushes that to an art form.
* I don't remember exactly how many alphabet sets ASN.1 does support,
but it is way more than your average application.
* Most applications encode integer values by reference to classic
computer encodings, e.g. signed/unsigned char, short, long, long-long.
ASN.1 introduces its own encoding, which is variable length.
* One can argue that SNMP makes a creative use of the Object
Identifier data type of ASN.1, but one also has to wonder why this data
type is specified in the language in the first place.
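Several of these choices show up in even a minimal BER tag-and-length reader. The sketch below (illustrative Python, not taken from any cited implementation) handles the multi-byte high-tag-number form, the short and long length forms, and the indefinite-length marker - each one a separate branch an implementer can get wrong:

```python
def parse_ber_header(buf, pos=0):
    """Parse one BER tag/length header, returning (tag_number, length, new_pos).

    length is None for the indefinite form (contents end with 0x00 0x00).
    Minimal sketch: no class/constructed handling, no input-size limits.
    """
    first = buf[pos]
    pos += 1
    tag = first & 0x1F
    if tag == 0x1F:  # high-tag-number form: base-128 with 0x80 continuation bits
        tag = 0
        while True:
            b = buf[pos]; pos += 1
            tag = (tag << 7) | (b & 0x7F)
            if not (b & 0x80):
                break
    lb = buf[pos]; pos += 1
    if lb < 0x80:          # short form: one byte holds the length
        length = lb
    elif lb == 0x80:       # indefinite form: delimited by an end-of-contents mark
        length = None
    else:                  # long form: next (lb & 0x7F) bytes hold the length
        n = lb & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, length, pos
```

Each branch is a place where a hand-written decoder can mis-handle untrusted input, which is the point being made above.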

Then there are MACRO definitions, VALUE specifications, and an even more
complex definition of extension capabilities. In short, ASN.1 is vastly
more complex than the average TLV encoding. The higher rate of errors is
thus not entirely surprising.

-- Christian Huitema




RE: Best practice for data encoding?

2006-06-06 Thread Jeffrey Hutzelman



On Tuesday, June 06, 2006 10:33:30 AM -0700 Christian Huitema 
[EMAIL PROTECTED] wrote:



ASN.1 implementation bugs have also caused security problems for SSL,
Kerberos, ISAKMP, and probably others. These bugs are also not due to
shared code history: they turn up again and again.

Are there any other binary protocols that can be usefully compared

with

ASN.1's security history?


There is indeed a lot of complexity in ASN.1. At the root, ASN.1 is a
basic T-L-V encoding format, similar to what we see in multiple IETF
protocols. However, for various reasons, ASN.1 includes a number of
encoding choices that are as many occasions for programming errors:


To be pedantic, ASN.1 is what its name says it is - a notation.
The properties you go on to describe are those of BER; other encodings have 
other properties.  For example, DER adds constraints such that there are no 
longer multiple ways to encode the same thing.  Besides simplifying 
implementations, this also makes it possible to compare cryptographic 
hashes of DER-encoded data; X.509 and Kerberos both take advantage of this 
property.  PER eliminates many of the tags and lengths, and my 
understanding is that there is a set of rules for encoding ASN.1 data in 
XML.




* One can argue that SNMP makes a creative use of the Object
Identifier data type of ASN.1, but one also has to wonder why this data
type is specified in the language in the first place.


Well, I can't speak to the original motivation, but under BER, encoding the 
same sort of hierarchical name as a SEQUENCE OF INTEGER takes about three 
times the space the primitive type does, assuming most of the values are 
small.
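The size difference is easy to check. The sketch below (illustrative Python, assuming the standard BER content encodings) compares the primitive OBJECT IDENTIFIER contents - first two arcs fused as 40*X+Y, the rest base-128 - against spelling each arc out as its own INTEGER TLV:

```python
def base128(n):
    """Base-128 with 0x80 continuation bits, as used for OID arcs in BER."""
    chunks = [n & 0x7F]
    n >>= 7
    while n:
        chunks.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(chunks))

def oid_contents(arcs):
    """Contents octets of a primitive OBJECT IDENTIFIER."""
    body = base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        body += base128(arc)
    return body

def seq_of_int_contents(arcs):
    """The same arcs as SEQUENCE OF INTEGER contents: each small arc
    costs a full tag byte, length byte, and value byte."""
    return b"".join(bytes([0x02, 0x01, arc]) for arc in arcs)

sysdescr = [1, 3, 6, 1, 2, 1, 1, 1, 0]   # SNMP sysDescr.0
```

For sysDescr.0 the primitive form needs 8 contents octets against 27 for the sequence form, before even counting the outer headers - roughly the factor of three mentioned above.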




Then there are MACRO definitions, VALUE specifications, and an even more
complex definition of extension capabilities. In short, ASN.1 is vastly
more complex than the average TLV encoding. The higher rate of errors is
thus not entirely surprising.


There certainly is a rich set of features (read: complexity) in both the 
ASN.1 syntax and its commonly-used encodings.  However, I don't think 
that's the real source of the problem.  There seem to be a lot of ad-hoc 
ASN.1 decoders out there that people have written as part of some other 
protocol, instead of using an off-the-shelf compiler/encoder/decoder; this 
duplication of effort and code is bound to lead to errors, especially when 
it is done with insufficient attention to the details of what is indeed a 
fairly complex encoding.


I also suspect that a number of the problems found have nothing to do with 
decoding ASN.1 specifically, and would have come up had other approaches 
been used.  For example, several of the problems cited earlier were buffer 
overflows found in code written well before the true impact of that problem 
was well understood.  These problems are more likely to be noticed and/or 
create vulnerabilities when they occur in things like ASN.1 decoders, or 
XDR decoders, or XML parsers, because that code tends to deal directly with 
untrusted input.


-- Jeffrey T. Hutzelman (N3NHS) [EMAIL PROTECTED]
  Sr. Research Systems Programmer
  School of Computer Science - Research Computing Facility
  Carnegie Mellon University - Pittsburgh, PA




RE: Best practice for data encoding?

2006-06-06 Thread Hallam-Baker, Phillip

 From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] 

  Having had Tony Hoare as my college tutor at Oxford I have 
 experienced 
  a particularly uncompromising approach to complexity. However the 
  point Hoare makes repeatedly is as simple as possible but 
 no simpler.
 
 Hoare has been a great influence on my thinking, too.  I 
 particularly recall his Turing Award lecture, where he noted:
 
   There are two ways of constructing a software design: One way is
   to make it so simple that there are obviously no 
 deficiencies, and
   the other way is to make it so complicated that there are no
   obvious deficiencies. The first method is far more difficult.
 
 (In that same lecture, he warned of security issues from not 
 checking array bounds at run-time, but that's a separate rant.)

I think it is a useful illustration of my point.

Dennis Ritchie:
Bounds checking is too complex to put in the runtime library.

Tony Hoare:
Bounds checking is too complex to attempt to perform by hand.


I think that time has proved Hoare and Algol 60 right on this point. It is much 
better to have a single point of control in a system and a single place where 
checking can take place than to make it the responsibility of the programmer to 
hand-code checking throughout their code.

Equally the idea of unifying control and discovery information in the DNS may 
sound complex but the result has the potential to be considerably simpler than 
the numerous ad hoc management schemes that have grown up as a result of the 
lack of a coherent infrastructure.



RE: Best practice for data encoding?

2006-06-06 Thread Hallam-Baker, Phillip

 From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED] 

 It's a subset, in fact.  All DER is valid BER.

It is an illogical subset defined in a throwaway comment in an obscure part of 
the spec.

A subset is not necessarily a reduction in complexity. Let us imagine that we 
have a spec that allows you to choose between three modes of transport to get 
to school: walk, bicycle or unicycle.

The unicycle option does not create any real difficulty for you since you 
simply ignore it and use one of the sensible options. And it is no more complex 
to support since a bicycle track can also be used by unicyclists.

Now the same deranged loons who wrote the DER encoding decide that your 
Distinguished transport option is going to be unicycle, that is all you are 
going to be allowed to do.

Suddenly the option which you could ignore as illogical and irrelevant has 
become an obligation. And that is what DER encoding does. 

Since you don't appear to have coded DER encoding I suggest you try it before 
further pontification. If you have coded it and don't understand how so many 
people get it wrong then you are beyond hope.

BTW, it's not just the use of definite lengths; there is also a requirement 
to sort the contents of sets, which is real fun to do - particularly when 
the spec fails to explain what is actually to be sorted.



RE: Best practice for data encoding?

2006-06-06 Thread Jeffrey Hutzelman



On Tuesday, June 06, 2006 11:55:15 AM -0700 Hallam-Baker, Phillip 
[EMAIL PROTECTED] wrote:





From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED]



To be pedantic, ASN.1 is what its name says it is - a notation.
The properties you go on to describe are those of BER; other
encodings have other properties.  For example, DER adds
constraints such that there are no longer multiple ways to
encode the same thing.  Besides simplifying implementations,


Hate to bust your bubble here but DER encoding is vastly more complex
than any other encoding. It is certainly not simpler than the BER
encoding.


It's a subset, in fact.  All DER is valid BER.




RE: Best practice for data encoding?

2006-06-06 Thread Hallam-Baker, Phillip

 From: Jeffrey Hutzelman [mailto:[EMAIL PROTECTED] 

 To be pedantic, ASN.1 is what its name says it is - a notation.
 The properties you go on to describe are those of BER; other 
 encodings have other properties.  For example, DER adds 
 constraints such that there are no longer multiple ways to 
 encode the same thing.  Besides simplifying implementations, 

Hate to bust your bubble here but DER encoding is vastly more complex than any 
other encoding. It is certainly not simpler than the BER encoding.

The reason for this is that in DER each chunk of data is encoded 
using the definite-length encoding, in which each data structure is preceded by 
a length descriptor. In addition to being much more troublesome to decode than 
a simple end-of-structure marker such as ), }, or /, it is considerably more 
complex to code, because the length descriptor is itself a variable-length 
integer.

The upshot of this is that it is impossible to write an LR(1) encoder for DER 
encoding. In order to encode the structure you have to recursively size each 
substructure before the first byte of the enclosing structure can be emitted.
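That inside-out order of work can be sketched as follows (illustrative Python, not from any cited codebase): no byte of an enclosing SEQUENCE can be emitted until every nested TLV has been fully encoded and measured.

```python
def der_len(n):
    """DER length octets: short form below 128, otherwise the minimal
    long form (a count byte with the high bit set, then the length)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_tlv(tag, contents):
    """Emit one definite-length TLV.  Note the order of work: the
    complete contents - and hence every nested structure - must be
    encoded and sized before the first byte here can be written."""
    return bytes([tag]) + der_len(len(contents)) + contents

# A SEQUENCE of two INTEGERs: the inner TLVs are built first, then the
# outer header is derived from their total size.
inner = der_tlv(0x02, b"\x05") + der_tlv(0x02, b"\x07")
outer = der_tlv(0x30, inner)   # 30 06 02 01 05 02 01 07
```

With an end-of-structure marker instead, the encoder could stream each element as it goes; the definite-length rule forces this recursive pre-sizing pass.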


 this also makes it possible to compare cryptographic hashes 
 of DER-encoded data; X.509 and Kerberos both take advantage 
 of this property. 

I am not aware of any X.509 system that relies on this property. If there is 
such a system they certainly are not making use of the ability to reduce a DER 
encoded structure to X.500 data and reassemble it. Almost none of the PKIX 
applications have done this properly until recently.

X.509 certs are exchanged as opaque binary blobs by all rational applications. 

  Then there are MACRO definitions, VALUE specifications, and an even 
  more complex definition of extension capabilities. In 
 short, ASN.1 is 
  vastly more complex that the average TLV encoding. The 
 higher rate of 
  errors is thus not entirely surprising.
 
 There certainly is a rich set of features (read: complexity) 
 in both the
 ASN.1 syntax and its commonly-used encodings.  However, I 
 don't think that's the real source of the problem.  There 
 seem to be a lot of ad-hoc
 ASN.1 decoders out there that people have written as part of 
 some other protocol, instead of using an off-the-shelf 
 compiler/encoder/decoder; 

That's because most of the off the shelf compiler/encoders have historically 
been trash.

Where do you think all the bungled DER implementations came from?

 I also suspect that a number of the problems found have 
 nothing to do with decoding ASN.1 specifically, and would 
 have come up had other approaches been used.  For example, 
 several of the problems cited earlier were buffer overflows 
 found in code written well before the true impact of that 
 problem was well understood.  

Before the 1960s? I very much doubt it.



Re: Best practice for data encoding?

2006-06-06 Thread Robert Sayre

On 6/6/06, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote:


At this point XML is not a bad choice for data encoding. I would like to see 
the baroque SGML legacy abandoned (in particular, eliminate DTDs entirely). 
XML is not a perfect choice, but it is not a bad one and, done right, can be 
efficient.


JSON http://www.json.org seems like a better fit for the use cases
discussed here. You get better data types, retain convenient ASCII
notation for unicode characters, and lose lots of XML baggage.

draft-crockford-jsonorg-json-04.txt is in the RFC queue, headed for
informational status.
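As a trivial illustration of the typed-ness being claimed for JSON (Python's stdlib `json`; the field names here are made up):

```python
import json

# A small record round-tripped through JSON: numbers, strings, and
# booleans survive as native types, with no DTD or schema layer involved.
record = {"name": "sysUpTime", "value": 4242, "readable": True}
text = json.dumps(record)
assert json.loads(text) == record
```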

--

Robert Sayre



Re: Best practice for data encoding?

2006-06-06 Thread Dean Anderson
Some ASN.1 compilers have had some bugs; however, this does not indicate that
ASN.1 is bug prone. Just the opposite: Once you have a secure compiler, you can
be assured that certain kinds of bugs don't exist.

Further, in the few cases of the bugs that were found, once the bug is fixed in
the ASN.1 compiler, the application just needs to be relinked (or given new
shared library) with the new generated runtime.  And any other application which
used a vulnerable runtime, but for which the vulnerability was unknown, is also
fixed.  So, users of compiled runtime benefit from usage experience by the
entire group.

Building tools that make trustable runtimes is a good approach to certain
classes of security problems. You can't get this by hand written protocol
encode/decode layers.

--Dean

On Mon, 5 Jun 2006, Iljitsch van Beijnum wrote:

 I was wondering:
 
 What is considered best practice for encoding data in protocols  
 within the IETF's purview?
 
 Traditionally, many protocols use text but obviously this doesn't  
 really work for protocols that carry a lot of data, because text  
 lacks structure so it's hard to parse. XML and the like are text-based
 and structured, but take huge amounts of code and processing  
 time to parse (especially on embedded CPUs that lack the more  
 advanced branch prediction available in the fastest desktop and  
 server CPUs). Then there is the ASN.1 route, but as we can see with  
 SNMP, this also requires lots of code and is very (security) bug  
 prone. Many protocols use hand crafted binary formats, which has  
 the advantage that the format can be tailored to the application but  
 it requires custom code for every protocol and it's hard to get  
 right, especially the simplicity/extendability tradeoff.
 
 The ideal way to encode data would be a standard that requires  
 relatively little code to implement, makes for small files/packets  
 that are fast to process but remains reasonably extensible.
 
 So, any thoughts? Binary XML, maybe?
 
 ___
 Ietf mailing list
 Ietf@ietf.org
 https://www1.ietf.org/mailman/listinfo/ietf
 
 

-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net faster, more reliable, better service
617 344 9000   





Best practice for data encoding?

2006-06-05 Thread Iljitsch van Beijnum

I was wondering:

What is considered best practice for encoding data in protocols  
within the IETF's purview?


Traditionally, many protocols use text but obviously this doesn't  
really work for protocols that carry a lot of data, because text  
lacks structure so it's hard to parse. XML and the like are text-based 
and structured, but take huge amounts of code and processing  
time to parse (especially on embedded CPUs that lack the more  
advanced branch prediction available in the fastest desktop and  
server CPUs). Then there is the ASN.1 route, but as we can see with  
SNMP, this also requires lots of code and is very (security) bug  
prone. Many protocols use hand crafted binary formats, which has  
the advantage that the format can be tailored to the application but  
it requires custom code for every protocol and it's hard to get  
right, especially the simplicity/extendability tradeoff.


The ideal way to encode data would be a standard that requires  
relatively little code to implement, makes for small files/packets  
that are fast to process but remains reasonably extensible.


So, any thoughts? Binary XML, maybe?



Re: Best practice for data encoding?

2006-06-05 Thread Carsten Bormann

On Jun 05 2006, at 23:43 , Iljitsch van Beijnum wrote:

What is considered best practice for encoding data in protocols  
within the IETF's purview?


The best practice is to choose an encoding that is appropriate for  
the protocol being designed.

(There is no single answer.)

Maybe you can be more specific in your question, then maybe people  
can be more specific in their answers?


Gruesse, Carsten




Re: Best practice for data encoding?

2006-06-05 Thread Randy Presuhn
Hi -

 From: Iljitsch van Beijnum [EMAIL PROTECTED]
 To: IETF Discussion ietf@ietf.org
 Sent: Monday, June 05, 2006 2:43 PM
 Subject: Best practice for data encoding?
...
 Then there is the ASN.1 route, but as we can see with  
 SNMP, this also requires lots of code and is very (security) bug  
 prone.
...

Having worked on SNMP toolkits for a long time, I'd have to
strenuously disagree.  In my experience, the ASN.1/BER-related
code is a rather small portion of an SNMP protocol engine.
The code related to the SNMP protocol's quirks, such as Get-Next/Bulk
processing and the mangling of index values into object identifiers
(which is far removed from how ASN.1 intended object identifiers
to be used) require much more code and complexity.
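A one-line flavor of that index mangling (a sketch, not SNMP toolkit code: an IpAddress index contributes one sub-identifier per octet, and real engines must also handle string, OID, and IMPLIED-length indices):

```python
def ip_index_to_suffix(ip):
    """Map an IpAddress table index such as "10.0.0.1" to the four
    extra sub-identifiers it appends to the instance OID.
    (Hypothetical helper, for illustration only.)"""
    return [int(octet) for octet in ip.split(".")]

# ipAdEntIfIndex.10.0.0.1 ends with these four arcs:
suffix = ip_index_to_suffix("10.0.0.1")
```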

I'm curious, too, about the claim that this has resulted in security
problems.  Could someone elaborate?

Randy




Re: Best practice for data encoding?

2006-06-05 Thread Steven M. Bellovin
On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn
[EMAIL PROTECTED] wrote:

 Hi -
 
  From: Iljitsch van Beijnum [EMAIL PROTECTED]
  To: IETF Discussion ietf@ietf.org
  Sent: Monday, June 05, 2006 2:43 PM
  Subject: Best practice for data encoding?
 ...
  Then there is the ASN.1 route, but as we can see with  
  SNMP, this also requires lots of code and is very (security) bug  
  prone.
 ...
 
 Having worked on SNMP toolkits for a long time, I'd have to
 strenuously disagree.  In my experience, the ASN.1/BER-related
 code is a rather small portion of an SNMP protocol engine.
 The code related to the SNMP protocol's quirks, such as Get-Next/Bulk
 processing and the mangling of index values into object identifiers
 (which is far removed from how ASN.1 intended object identifiers
 to be used) require much more code and complexity.

Yah -- measure first, then optimize.

 
 I'm curious, too, about the claim that this has resulted in security
 problems.  Could someone elaborate?
 
See http://www.cert.org/advisories/CA-2002-03.html



--Steven M. Bellovin, http://www.cs.columbia.edu/~smb



Re: Best practice for data encoding?

2006-06-05 Thread Randy Presuhn
Hi -

 From: Steven M. Bellovin [EMAIL PROTECTED]
 To: Randy Presuhn [EMAIL PROTECTED]
 Cc: ietf@ietf.org
 Sent: Monday, June 05, 2006 4:09 PM
 Subject: Re: Best practice for data encoding?
...
  I'm curious, too, about the claim that this has resulted in security
  problems.  Could someone elaborate?
  
 See http://www.cert.org/advisories/CA-2002-03.html
...

I remember that exercise.  I don't see it as convincing evidence that
the use of ASN.1 was the cause of the problems some implementations
had; I doubt that someone who had buffer overflow problems when
processing a BER-encoded octet string (where the length is explicitly
encoded) would have had any better results with XML or any other
representation.
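The check in question is the same whatever the wire format. A minimal sketch (illustrative Python, a hypothetical helper rather than code from any cited implementation) of the bounds test whose omission produces exactly this class of overflow:

```python
def read_octet_string(buf, pos, declared_len):
    """Copy `declared_len` contents octets starting at `pos`.

    The declared length comes from untrusted input, so it must be
    validated against the bytes actually present before any copy -
    the same rule applies to a BER length octet, an XML attribute
    claiming a size, or a hand-rolled binary header.
    """
    if declared_len < 0 or pos + declared_len > len(buf):
        raise ValueError("declared length exceeds remaining input")
    return buf[pos:pos + declared_len]
```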

Randy




RE: Best practice for data encoding?

2006-06-05 Thread David Harrington
Hi 

The security problems identified in
http://www.cert.org/advisories/CA-2002-03.html Multiple
Vulnerabilities in Many Implementations of the Simple Network
Management Protocol (SNMP) are not caused by the protocol choice to
use ASN.1, but by vendors incorrectly implementing the protocol (which
was made worse by vendors using toolkits that had the problems).

If Multiple Vulnerabilities in Implementations were used to condemn
the encoding methods of protocols that have been incorrectly
implemented, then we would have to condemn an awful lot of IETF
protocols as being very (security) bug prone: 

CERT Advisory CA-2003-26 Multiple Vulnerabilities in SSL/TLS
Implementations
US-CERT Vulnerability Note VU#459371 Multiple IPsec implementations do
not adequately validate
 CERT® Advisory CA-2001-18 Multiple Vulnerabilities in Several
Implementations of the Lightweight Directory Access Protocol (LDAP) 
CERT Advisory CA-2002-36 Multiple Vulnerabilities in SSH
Implementations
 CERT® Advisory CA-2003-06 Multiple vulnerabilities in implementations
of the Session Initiation Protocol (SIP) 
Vulnerability Note VU#428230 Multiple vulnerabilities in S/MIME
implementations
Vulnerability Note VU#955777 Multiple vulnerabilities in DNS
implementations
Vulnerability Note VU#226364 Multiple vulnerabilities in Internet Key
Exchange (IKE) version 1 implementations
CERT® Advisory CA-2002-06 Vulnerabilities in Various Implementations
of the RADIUS Protocol
CERT® Advisory CA-2000-06 Multiple Buffer Overflows in Kerberos
Authenticated Services
Vulnerability Note VU#836088 Multiple vendors' email content/virus
scanners do not adequately check message/partial MIME entities

David Harrington
[EMAIL PROTECTED] 
[EMAIL PROTECTED]
[EMAIL PROTECTED]


 -Original Message-
 From: Steven M. Bellovin [mailto:[EMAIL PROTECTED] 
 Sent: Monday, June 05, 2006 7:10 PM
 To: Randy Presuhn
 Cc: ietf@ietf.org
 Subject: Re: Best practice for data encoding?
 
 On Mon, 5 Jun 2006 16:06:28 -0700, Randy Presuhn
 [EMAIL PROTECTED] wrote:
 
  Hi -
  
   From: Iljitsch van Beijnum [EMAIL PROTECTED]
   To: IETF Discussion ietf@ietf.org
   Sent: Monday, June 05, 2006 2:43 PM
   Subject: Best practice for data encoding?
  ...
   Then there is the ASN.1 route, but as we can see with  
   SNMP, this also requires lots of code and is very (security) bug

   prone.
  ...
  
  Having worked on SNMP toolkits for a long time, I'd have to
  strenuously disagree.  In my experience, the ASN.1/BER-related
  code is a rather small portion of an SNMP protocol engine.
  The code related to the SNMP protocol's quirks, such as 
 Get-Next/Bulk
  processing and the mangling of index values into object
identifiers
  (which is far removed from how ASN.1 intended object identifiers
  to be used) require much more code and complexity.
 
 Yah -- measure first, then optimize.
 
  
  I'm curious, too, about the claim that this has resulted in
security
  problems.  Could someone elaborate?
  
 See http://www.cert.org/advisories/CA-2002-03.html
 
 
 
   --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
 
 ___
 Ietf mailing list
 Ietf@ietf.org
 https://www1.ietf.org/mailman/listinfo/ietf
 




Re: Best practice for data encoding?

2006-06-05 Thread Steven M. Bellovin
On Mon, 5 Jun 2006 20:07:24 -0400, David Harrington
[EMAIL PROTECTED] wrote:

 Hi 
 
 The security problems identified in
 http://www.cert.org/advisories/CA-2002-03.html ("Multiple
 Vulnerabilities in Many Implementations of the Simple Network
 Management Protocol (SNMP)") are not caused by the protocol choice to
 use ASN.1, but by vendors incorrectly implementing the protocol (which
 was made worse by vendors using toolkits that had the problems).
 
 If "Multiple Vulnerabilities in Implementations" were used to condemn
 the encoding methods of protocols that have been incorrectly
 implemented, then we would have to condemn an awful lot of IETF
 protocols as being very (security) bug prone: 
 

Works for me

More precisely -- when something is sufficiently complex, it's inherently
bug-prone.  That is indeed a good reason to push back on a design.  The
question to ask is whether the *problem* is inherently complex -- when the
complexity of the solution significantly exceeds the inherent complexity of
the problem, you've probably made a mistake.  When the problem itself is
sufficiently complex, it's fair to ask if it should be solved.  Remember
point (3) of RFC 1925.

I'll note that a number of the protocols you cite were indeed criticized
*during the design process* as too complex.  The objectors were overruled.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb

___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf


RE: Best practice for data encoding?

2006-06-05 Thread David Harrington
I agree that complexity breeds bug-prone implementations.

I wasn't around then; did anybody push back on SNMPv1 as being too
complex? http://www.cert.org/advisories/CA-2002-03.html is mainly
about SNMPv1 implementations. ;-)

dbh



RE: Best practice for data encoding?

2006-06-05 Thread Fleischman, Eric
Let's not forget that the S in SNMP stands for simple. Simple is a
relative term. In this case, SNMP is simple when compared to CMIP.



RE: Best practice for data encoding?

2006-06-05 Thread Jeffrey Hutzelman



On Monday, June 05, 2006 08:32:58 PM -0400 David Harrington
[EMAIL PROTECTED] wrote:

 I agree that complexity breeds bug-prone implementations.

 I wasn't around then; did anybody push back on SNMPv1 as being too
 complex?

I don't think anyone pushed back on SNMPv1 as being inherently insecure.



Re: Best practice for data encoding?

2006-06-05 Thread Michael Thomas

David Harrington wrote:

 I agree that complexity breeds bug-prone implementations.

 I wasn't around then; did anybody push back on SNMPv1 as being too
 complex? http://www.cert.org/advisories/CA-2002-03.html is mainly
 about SNMPv1 implementations. ;-)


I wasn't there to push back, but when I got asked to implement it back
then, the "Simple" part seemed like something between a fib and the Big
Lie. Did we really need ASN.1 to define a debug peek/poke-like protocol?

  Mike
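
For a sense of what the ASN.1/BER layer amounts to on the wire for such
a peek/poke exchange, here is a minimal hypothetical sketch (the
function name is illustrative, not any toolkit's API) of BER-encoding an
INTEGER the way SNMP carries one:

```python
def ber_encode_integer(value):
    """Encode a non-negative integer as a BER INTEGER TLV: tag 0x02,
    a short-form length octet, then big-endian two's-complement
    content octets (a leading 0x00 keeps high-bit values positive)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    content = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if content[0] & 0x80:             # would read back as negative: pad
        content = b"\x00" + content
    if len(content) > 127:
        raise ValueError("short-form length only in this sketch")
    return bytes([0x02, len(content)]) + content
```

So the integer 5 travels as the three octets 02 01 05 -- a small amount
of framing, but every tag and length octet is something a receiver must
validate.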




RE: Best practice for data encoding?

2006-06-05 Thread Gray, Eric
Steven,

I'm not sure what you mean by saying that a problem that is
highly complex should not be solved (or, at least, that we should
consider not solving it).  That seems like a cop-out.  Minimally,
every problem we've ever faced, we've tried to solve (where "we"
refers to us weak-kneed Homo sapiens) - no matter how hard it was
to do so - and I like to think that is the right thing to do.

In fairness, I am reasonably sure that point 3 in RFC 1925 
refers to making a complex solution work, even if a simpler answer
might be found, simply because enough people want that solution.  

It does not - IMO - rule out solving complex problems using 
as simple a solution as possible, however complex that might be.

--
Eric



Re: Best practice for data encoding?

2006-06-05 Thread Randy Presuhn
Hi -

 From: Fleischman, Eric [EMAIL PROTECTED]
 To: David Harrington [EMAIL PROTECTED]; Steven M. Bellovin [EMAIL 
 PROTECTED]
 Cc: ietf@ietf.org
 Sent: Monday, June 05, 2006 5:41 PM
 Subject: RE: Best practice for data encoding?

 Let's not forget that the S in SNMP stands for simple. Simple is a
 relative term. In this case, SNMP is simple when compared to CMIP.

We implemented both protocols.  The core protocol engine for SNMP
ended up being larger and more complex than that for CMIP.  The
complexity of GetNext, along with OID mangling, accounted for much
of the difference.  The S in SNMP was half marketing and half
politics, and had very little to do with actual implementation or use.

Randy
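
For a sense of what GetNext demands of an engine, a minimal hypothetical
sketch (Python; `snmp_get_next` and the dict-based `mib` are
illustrative, not drawn from any implementation) of the lexicographic
ordering every agent must get right:

```python
from bisect import bisect_right

def snmp_get_next(mib, oid):
    """Return the (oid, value) pair lexicographically after `oid`,
    or None for endOfMibView.

    GetNext orders OIDs lexicographically by sub-identifier; that
    ordering is what makes walking a table column by column work,
    and it must hold even for OIDs not present in the MIB."""
    ordered = sorted(mib)                # tuples sort lexicographically
    pos = bisect_right(ordered, oid)     # first OID strictly after request
    if pos == len(ordered):
        return None
    nxt = ordered[pos]
    return nxt, mib[nxt]
```

A real agent cannot afford to sort its whole MIB per request, so
toolkits layer tree walks, view filtering, and index parsing on top of
this idea -- which is where the size Randy describes comes from.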




Re: Best practice for data encoding?

2006-06-05 Thread Steven M. Bellovin
On Mon, 5 Jun 2006 20:59:32 -0400, Gray, Eric [EMAIL PROTECTED]
wrote:

 Steven,
 
   I'm not sure what you mean by saying that a problem that is
 highly complex should not be solved (or, at least, that we should
 consider not solving it).  That seems like a cop-out.  Minimally,
  every problem we've ever faced, we've tried to solve (where "we"
  refers to us weak-kneed Homo sapiens) - no matter how hard it was
 to do so - and I like to think that is the right thing to do.
 
   In fairness, I am reasonably sure that point 3 in RFC 1925 
 refers to making a complex solution work, even if a simpler answer
 might be found, simply because enough people want that solution.  
 
   It does not - IMO - rule out solving complex problems using 
 as simple a solution as possible, however complex that might be.

I meant exactly what I said.  The reason to avoid certain solutions is
that you'll then behave as if the problem is really solved, with bad
consequences if you're wrong -- and for some problems, you probably are
wrong. Read David Parnas' "Software Aspects of Strategic Defense
Systems" (available at
http://klabs.org/richcontent/software_content/papers/parnas_acm_85.pdf);
also consider the historical record on why the US and the USSR signed a
treaty banning most anti-missile systems, and in particular why the
existence of such systems made the existing nuclear deterrent standoff
unstable.

Note carefully that I didn't say we shouldn't do research on how to solve
things.  But doing research and declaring that we know how to do something
are two very different things.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb
