On 05/11/2021 08:54, Alex Peshkoff via Firebird-devel wrote:
> 
> Before changing / fixing something we should first of all decide what
> result do we really need.
> 

Please also take into account that the customer's primary problem was
the unacceptable performance degradation when they switched to UTF8,
demonstrated with these test cases:
https://github.com/FirebirdSQL/firebird/issues/6915

As for greater-than and the related comparison operators, no excuses: we
must fix them.

Then, about STARTING WITH: besides the problems I mentioned earlier,
making CH not START WITH C will make it a slower operation.

Surely index lookup will make things faster, but non-indexed comparison
will be slower.
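
To illustrate why the non-indexed comparison gets slower, here is a toy
sketch in Python. It is not Firebird or ICU code: the contraction table,
the function names and the assumption of already-uppercased ASCII input
are all made up for the example. A plain prefix test only compares
characters, while a contraction-aware one has to split both strings into
collation elements first.

```python
# Toy model only: a single hypothetical contraction table, not ICU or Firebird code.
CONTRACTIONS = ("CH",)


def collation_elements(text):
    """Split text into collation elements, treating each contraction as one unit."""
    elements = []
    i = 0
    while i < len(text):
        for contraction in CONTRACTIONS:
            if text.startswith(contraction, i):
                elements.append(contraction)
                i += len(contraction)
                break
        else:
            # No contraction starts here, so the single character is the element.
            elements.append(text[i])
            i += 1
    return elements


def starts_with_plain(value, prefix):
    """Character-wise prefix test: 'CH...' does start with 'C'."""
    return value.startswith(prefix)


def starts_with_contraction_aware(value, prefix):
    """Element-wise prefix test: 'CH...' does not start with 'C'."""
    value_elements = collation_elements(value)
    prefix_elements = collation_elements(prefix)
    return value_elements[:len(prefix_elements)] == prefix_elements
```

In this model 'CHLEBA' starts with 'C' under the plain test but not under
the contraction-aware one, and the second test does extra work for every
row it examines.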

Also note that they already had a problem, if their test cases were
based on real usage scenarios.

They did not include a test with ANSI_CZ LIKE 'C%'. Here it is, with
times measured on my machine:

SELECT ANSI_CZ FROM TEST1M
WHERE ANSI_CZ LIKE 'C%'
ORDER BY ANSI_CZ;

v4: ~3s
master: 2.7s

---

SELECT UNICODE_CS_CZ FROM TEST1M
WHERE UNICODE_CS_CZ LIKE 'C%'
ORDER BY UNICODE_CS_CZ;

v4: 3.5s
master: 3.1s

---

SELECT UNICODE_CI_CZ FROM TEST1M
WHERE UNICODE_CI_CZ LIKE 'C%'
ORDER BY UNICODE_CI_CZ;

v4: 3.7s
master: 3.4s

---

So in many cases master with Unicode has about the same performance as
v4 with ANSI, and with greater lengths it is even better after #7038.

There is also the UNICODE_CS_CZ vs ANSI_CZ test with LIKE 'Z%', which
became slower even though Z is not an actual Czech contraction. This must
be about normalization-related sequences being reported by ICU as
contractions. That is a problem which must be investigated further and
fixed too.

So I emphasize my opinion that the main performance problem is not the
C/CH thing. That was already present, and my test case demonstrates it.
If the customer did not see this, their test case does not demonstrate
their usage pattern.

But their test case did demonstrate a significant degradation with the
letter Z. While that test case may also not reflect a common usage
pattern with a lot of data, it is surely a problem we must fix.

So instead of introducing lots of inconsistencies (and slowdowns in some
operations), the way to go is:
- Fix compare operators case
- Fix Z% case
- Implement multiple index lookups for C% case

I'm not saying that any of these is very easy. Probably only the
compare operators case is.
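
As one possible reading of "multiple index lookups" for the C% case, here
is a rough sketch in Python. It is a toy model, not the Firebird index
code: the Czech-like order (CH sorting between H and I), the tuple keys
and all function names are assumptions for illustration, and it only
handles uppercase ASCII. The idea is that rows starting with the character
C sit in two separate key ranges, the C range and the CH range, so LIKE
'C%' is answered with two range scans instead of one.

```python
from bisect import bisect_left
import string

# Hypothetical Czech-like order: the contraction CH sorts between H and I.
ORDER = list(string.ascii_uppercase)
ORDER.insert(ORDER.index("H") + 1, "CH")
WEIGHT = {element: position for position, element in enumerate(ORDER)}


def sort_key(text):
    """Build a tuple of weights, consuming 'CH' as a single collation element."""
    key = []
    i = 0
    while i < len(text):
        step = 2 if text.startswith("CH", i) else 1
        key.append(WEIGHT[text[i:i + step]])
        i += step
    return tuple(key)


def build_index(values):
    """A sorted list of (key, value) pairs stands in for the index."""
    return sorted((sort_key(v), v) for v in values)


def range_scan(index, first_weight):
    """Return every value whose key starts with the given first weight."""
    lo = bisect_left(index, ((first_weight,),))
    hi = bisect_left(index, ((first_weight + 1,),))
    return [value for _, value in index[lo:hi]]


def like_c_prefix(index):
    """LIKE 'C%': one lookup for the C range, another for the CH range."""
    return range_scan(index, WEIGHT["C"]) + range_scan(index, WEIGHT["CH"])
```

With values such as CESTA, CIHLA, CHATA and CHLEBA mixed among others, the
first scan picks up CESTA and CIHLA and the second picks up CHATA and
CHLEBA, without touching the D to H keys stored between the two ranges.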


Adriano


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
