> Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory. Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help.
Which is exactly how s.find() wins this race. (I guess it loses when it's found by having to do the "find" lookup.) Maybe string_contains should just call string_find_internal()? And then there's the question of how the re module gets to be faster still; I suppose it doesn't bother with memcmp() at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com