On 20 Oct 2021, at 10:43, Yann Ylavic <ylavic....@gmail.com> wrote: > apr_strnstr() maybe, strnstr() is non-standard AFAICT and possibly not > available on all platforms (Windows)?
This needs to be memmem really, as the boundary isn’t always a string. Memmem is non portable and old versions are buggy. Is there license compatibility to use this code? https://cvsweb.openbsd.org/src/lib/libc/string/memmem.c?rev=1.5 > >> + >> + /* definitely found it, we leave */ >> + if (pos != NULL) { >> + >> + off = pos - str; >> + >> + /* everything up to the boundary */ >> + if (off) { >> + >> + apr_bucket_split(e, off); >> + APR_BUCKET_REMOVE(e); >> + APR_BRIGADE_INSERT_TAIL(bbOut, e); >> + >> + e = APR_BRIGADE_FIRST(bbIn); >> + } >> + >> + /* cut out the boundary */ >> + apr_bucket_split(e, boundary_len); >> + apr_bucket_delete(e); >> + >> + return APR_SUCCESS; >> + } >> + >> + /* any partial matches at the end? */ >> + leftover = boundary_len - 1; >> + off = (len - leftover); > > Can't we fall through the below "else" slow path here? The code looks > quite similar. Let me look, as I recall the paths are separate to be clear. >> + APR_BUCKET_REMOVE(e); >> + APR_BRIGADE_INSERT_TAIL(bbOut, e); >> + continue; > > "outbytes" not bumped? Good catch. > >> + >> + } >> + >> + /* >> + * If we reach skip, it means the bucket in e is: >> + * >> + * - shorter than the boundary >> + * - matches the boundary up to the bucket length >> + * - might match more buckets >> + * >> + * Read further buckets and check whether the boundary matches all >> + * the way to the end. If so, we have a match. If no match, shave >> off >> + * one byte and continue round to try again. >> + */ >> +skip: > > Below you read ahead to try to match the end of the boundary, and if > not split one byte into bbOut before restarting the loop, which could > result in a quite fragmented bbOut and is also what you almost did > above already. > > Couldn't we instead save the partially matching bucket(s) in a > temporary brigade (provided by the caller) and restart the loop with > the more effective [apr_]strnstr() match (at boundary+off)? The input brigade is our temporary brigade. Instead of shaving off one on the bucket, we could keep a counter to just skip one (extra) byte on the next go round. Regards, Graham —