On Thu, 28 Dec 2006, Henri Sivonen wrote:
>
> My primary strategy against denial of service attacks that target the
> conformance checking service is to limit the number of bytes accepted as
> input. This indirectly throttles everything that is proportional to the
> size of input, which is OK for most stuff that has linear growth
> behavior. (It doesn't address things like the billion laughs attack,
> though.)
>
> I have additionally placed arbitrary hard limits on the size of
> particular buffers.
I recommend a simpler and broader strategy: limit the total CPU and memory
usage of the process. After a certain level of CPU or memory usage, possibly
monitored by a separate, higher-priority thread, simply terminate the
algorithm and explain that the system cannot handle the given document.

> I'm wondering if there's a best practice here. Is there data on how long
> non-malicious attribute values legitimately appear on the Web?

I have seen (and created) multimegabyte attribute values. (Typically,
data: URIs of one kind or another, but not always.)

> At least there can be only one attribute buffer being filled at a time.
> Buffering of the textContent of <progress> and friends is potentially
> worse than an attribute buffer, because you could use the leading 1 MB
> of bytes to establish <progress> start tags (each creating a buffer for
> content) and then use the trailing 1 MB to fill those buffers
> simultaneously. Perhaps I should worry about those buffers instead. What
> might be a reasonable strategy for securing those (short of writing the
> associated algorithms as automata that don't need buffers)?

In that kind of case, I would recommend having one buffer for all decoded
"text", and then having all text nodes and text buffers refer to start and
end points in that buffer. This is also remarkably cheap in both CPU and
memory; you only have to pay the cost of a single copy of the text content,
regardless of the complexity of the data. It is also basically no overhead
compared to having individual buffers, since you are still passing around
strings. For mutable cases (e.g. to support scripting), you can use a
copy-on-write scheme.

> Is there data on how large legitimate HTML documents appear on the Web?
> The current limit of 2 MB is based on rounding the size of the Web Apps
> spec up.

I have seen infinitely long documents. Discounting those, I have seen
documents of tens and hundreds of megabytes.

--
Ian Hickson               U+1047E                )\._.,--....,'``.
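[Editor's note: the shared-buffer scheme described above can be sketched as
follows. This is a minimal illustration in Python, not code from any actual
parser; the names `TextArena` and `TextSlice` are invented for the example.]

```python
class TextArena:
    """One shared buffer holding all decoded text, appended in document
    order. Text nodes do not own copies of their data; they hold (start,
    end) offsets into this arena."""

    def __init__(self):
        self._chunks = []   # runs of decoded text, joined lazily
        self._length = 0

    def append(self, text):
        """Record a run of decoded text; return a slice referring to it."""
        start = self._length
        self._chunks.append(text)
        self._length += len(text)
        return TextSlice(self, start, self._length)

    def _joined(self):
        # Collapse the runs into a single string on demand, so reads
        # pay for at most one copy of the text content overall.
        if len(self._chunks) != 1:
            self._chunks = ["".join(self._chunks)]
        return self._chunks[0]


class TextSlice:
    """A text node's data: offsets into the shared arena, with
    copy-on-write so mutation (e.g. by scripts) never disturbs the
    shared buffer or any other node pointing into it."""

    def __init__(self, arena, start, end):
        self._arena = arena
        self._span = (start, end)
        self._private = None  # set only after a write detaches this slice

    def __str__(self):
        if self._private is not None:
            return self._private
        start, end = self._span
        return self._arena._joined()[start:end]

    def set_data(self, new_text):
        # Copy-on-write: only the mutated node acquires private storage.
        self._private = new_text
```

Note that many simultaneous "buffers" (the <progress> scenario quoted
above) cost only two integers each under this scheme, since they all
index into the same arena.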
fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

_______________________________________________
implementors mailing list
[email protected]
http://lists.whatwg.org/listinfo.cgi/implementors-whatwg.org
