On Sun, Oct 7, 2018 at 5:54 PM, Nathaniel Smith <n...@pobox.com> wrote:
> Are you imagining something roughly like this? (Ignoring chunk
> boundary handling for the moment.)
>
> def find_double_line_end(buf):
>     start = 0
>     while True:
>         next_idx = buf.index(b"\n", start)
>         if buf[next_idx - 1:next_idx + 1] == b"\n" or buf[next_idx - 3:next_idx] == b"\r\n\r":
>             return next_idx
>         start = next_idx + 1
>
> That's much more complicated than using re.search, and on some random
> HTTP headers I have lying around it benchmarks ~70% slower too. Which
> makes sense, since we're basically trying to replicate re engine's
> work by hand in a slower language.
>
> BTW, if we only want to find a fixed string like b"\r\n\r\n", then
> re.search and bytearray.index are almost identical in speed. If you
> have a problem that can be expressed as a regular expression, then
> regular expression engines are actually pretty good at solving those
> :-)

Though... here's something strange. Here's another way to search for
the first appearance of either \r\n\r\n or \n\n in a bytearray:

def find_double_line_end_2(buf):
    idx1 = buf.find(b"\r\n\r\n")
    idx2 = buf.find(b"\n\n", 0, idx1)
    if idx1 == -1:
        return idx2
    elif idx2 == -1:
        return idx1
    else:
        return min(idx1, idx2)

So this is essentially equivalent to our regex (notice they both pick
out position 505 as the end of the headers):

In [52]: find_double_line_end_2(sample_headers)
Out[52]: 505

In [53]: double_line_end_re = re.compile(b"\r\n\r\n|\n\n")

In [54]: double_line_end_re.search(sample_headers)
Out[54]: <_sre.SRE_Match object; span=(505, 509), match=b'\r\n\r\n'>

But the Python function that calls bytearray.find twice is about 3x
faster than the re module:

In [55]: %timeit find_double_line_end_2(sample_headers)
1.18 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [56]: %timeit double_line_end_re.search(sample_headers)
3.3 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The regex module is even slower:

In [57]: double_line_end_regex = regex.compile(b"\r\n\r\n|\n\n")

In [58]: %timeit double_line_end_regex.search(sample_headers)
4.95 µs ± 76.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
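
A note on the two-find approach above: when b"\r\n\r\n" is absent, idx1
is -1, and passing -1 as the end argument of find() stops the second
search one byte short of the end of the buffer, so a b"\n\n" sitting at
the very end of the buffer would be missed. The following is a minimal
sketch of a variant that avoids this, checked against the same regex.
The names find_double_line_end_3 and regex_position are hypothetical
and do not appear in the original message, and no timing claim is made
for this variant.

```python
import re

# Same pattern as in the session above.
DOUBLE_LINE_END_RE = re.compile(b"\r\n\r\n|\n\n")


def find_double_line_end_3(buf):
    """Hypothetical variant of find_double_line_end_2.

    Returns the index of the first b"\\r\\n\\r\\n" or b"\\n\\n" in buf,
    or -1 if neither occurs.
    """
    idx1 = buf.find(b"\r\n\r\n")
    if idx1 == -1:
        # No CRLF terminator: search the whole buffer. Using idx1 (-1)
        # as the end bound here would exclude the final byte.
        return buf.find(b"\n\n")
    # A bare-LF terminator only matters if it starts before idx1.
    idx2 = buf.find(b"\n\n", 0, idx1)
    return idx1 if idx2 == -1 else idx2


def regex_position(buf):
    """Start of the first regex match, or -1, for comparison."""
    match = DOUBLE_LINE_END_RE.search(buf)
    return -1 if match is None else match.start()
```

On a buffer such as bytearray(b"a\n\n"), this variant and the regex
both report position 1, while a search bounded by end=-1 would not see
the final byte. Whether the extra branch changes the timings shown
above would need to be measured separately.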