I wrote: > On Fri, Jul 16, 2021 at 1:44 AM Vladimir Sitnikov < sitnikov.vladi...@gmail.com> wrote: > > > > Have you considered shift-based DFA for a portable implementation https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725 ? > > I did consider some kind of DFA a while back and it was too slow.
I took a closer look at this "shift-based DFA", and it seemed pretty straightforward to implement this on top of my DFA attempt from some months ago. The DFA technique is not a great fit with our API, since we need to return how many bytes we found valid. On x86 (not our target for the fallback, but convenient to test) all my attempts were either worse than HEAD in multiple cases, or showed no improvement for the important ASCII case. On Power8, it's more compelling, and competitive with v14, so I'll characterize it on that platform as I describe the patch series: 0001 is a pure DFA, and has decent performance on multibyte, but terrible on ascii. 0002 dispatches on the leading byte category, unrolls the DFA loop according to how many valid bytes we need, and only checks the DFA state afterwards. It's good on multibyte (3-byte, at least) but still terrible on ascii. 0003 adds a 1-byte ascii fast path -- while robust on all inputs, it still regresses a bit on ascii. 0004 uses the same 8-byte ascii check as previous patches do. 0005 and 0006 use combinations of 1- and 8-byte ascii checks similar to in v17. 0005 seems the best on Power8, and is very close to v4. FWIW, v14's measurements seem lucky and fragile -- if I change any little thing, even - return -1; + return 0; it easily loses 100-200ms on non-pure-ascii tests. That said, v14 still seems the logical choice, unless there is some further tweak on top of v17 or v18 that gives some non-x86 platform a significant boost. Power8, gcc 4.8: HEAD: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 2944 | 1523 | 871 | 1473 | 1509 v18-0001: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 1257 | 1681 | 1385 | 1744 | 2018 v18-0002: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 951 | 1381 | 1217 | 1469 | 1172 v18-0003: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 911 | 1111 | 942 | 1112 | 865 v18-0004: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 987 | 730 | 222 | 1325 | 2306 v18-0005: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 962 | 664 | 180 | 928 | 1179 v18-0006: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 908 | 663 | 244 | 1026 | 1464 and for comparison, v14: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 888 | 607 | 179 | 777 | 1328 v17-0003: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 1205 | 662 | 180 | 767 | 1256 Macbook, clang 12: HEAD: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 974 | 691 | 370 | 456 | 526 v18-0001: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 1334 | 2713 | 2802 | 2665 | 2541 v18-0002: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 733 | 1212 | 1064 | 1034 | 1007 v18-0003: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 653 | 560 | 370 | 420 | 465 v18-0004: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 574 | 402 | 88 | 584 | 1033 v18-0005: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 1345 | 730 | 334 | 578 | 909 v18-0006: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 674 | 485 | 153 | 594 | 989 and for comparison, v14: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 674 | 346 | 78 | 309 | 504 v17-0002: chinese | mixed | ascii | mixed16 | mixed8 ---------+-------+-------+---------+-------- 516 | 324 | 78 | 331 | 544 -- John Naylor EDB: http://www.enterprisedb.com
v18-0001-Use-pure-DFA.patch
Description: Binary data
v18-0002-Unroll-loop-in-DFA.patch
Description: Binary data
v18-0004-Check-ascii-8-bytes-at-a-time-with-bitwise-opera.patch
Description: Binary data
v18-0003-Add-ascii-fast-path-before-resorting-to-DFA.patch
Description: Binary data
v18-0005-Do-1-byte-ascii-check-if-8-byte-check-fails.patch
Description: Binary data
v18-0006-Do-8-byte-check-only-if-1-byte-check-succeeds.patch
Description: Binary data