Re: [Lazarus] unit Masks vs. unit FPMasks

2021-02-25 Thread José Mejuto via lazarus

On 24/02/2021 at 22:36, Bart via lazarus wrote:


> Filename:='test.txt'
> Mask:='test??.txt?'
> Match must be true
>
> That sucks big time.
> A ? is supposed to match EXACTLY 1 character (not optional).
> Bloody @#$%$#@#$ Micro$uck,


X-D

This quirk has an explanation: 8.3 backwards compatibility. The
old 8.3 masking (well, the system itself) resolves masks by placing the name
and extension in an 11-byte array where a space means "no char", so the file
name "TEST.TXT" is stored as "TEST    " and "TXT". If you apply a mask such as
"TEST??.TXT" to it, each ? lines up with a padding space, so it's logical
that it evaluates to a positive match.
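
The fixed-width comparison described above can be sketched as follows. This is a simplified model in Python, not the actual DOS or Windows code; the helper names to_fcb and match_83 are made up for the illustration, and the rule that "*" fills the rest of a field with "?" is an assumption about the old layout.

```python
def to_fcb(name: str) -> str:
    """Pack 'NAME.EXT' into an 11-character field: 8 for the name, 3 for the extension."""
    base, _, ext = name.upper().partition(".")

    def field(part: str, width: int) -> str:
        out = ""
        for ch in part:
            if ch == "*":
                # Assumption: '*' expands to '?' up to the end of the field.
                out += "?" * (width - len(out))
                break
            out += ch
        # Overlong parts are truncated, short ones padded with spaces ("no char").
        return out[:width].ljust(width)

    return field(base, 8) + field(ext, 3)


def match_83(filename: str, mask: str) -> bool:
    """Compare position by position; '?' accepts any character, padding included."""
    return all(m == "?" or m == f for f, m in zip(to_fcb(filename), to_fcb(mask)))


print(to_fcb("test.txt"))                   # name and extension, space padded
print(match_83("test.txt", "test??.txt?"))  # the case discussed in this thread
```

In this model the two "?" in the name fall on padding spaces, and the fourth extension character of the mask is cut off by the 3-character field, which is why the match succeeds.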


I'm quite sure that this comes from CP/M times.

Have a nice day.

--
___
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] unit Masks vs. unit FPMasks

2021-02-25 Thread José Mejuto via lazarus

On 24/02/2021 at 21:53, Juha Manninen via lazarus wrote:

Hello,

> I am interested in how well your TMask version compares with Delphi's
> version.
>
> Does it match the speed or even surpass it?


Not tested, because in my code strings are always stored as UTF-8, so for a
Delphi comparison I must first convert them to Unicode. I'll try to
run a simple benchmark.


> Case-insensitive matching of Unicode can be fixed later with functions
> found in LazUTF8.


As this code is not a priority, I think it's better to make it work with
LazUTF8 functions before the first commit. The problem is the support
for SameText in UnicodeString.


> With José's approval the license will be LGPL with a linking exception.
> It will be part of the LazUtils package. Author's name will be mentioned
> of course.
>
> Is that OK?


Yes, of course. Put the name if it is used for a reference, otherwise 
put standard headers.


> I will not copy the whole original unit but use the UTF-8 parts + rename
> and tweak some things.


In fact, I think the whole unit is needed. It has (info for other
readers) 3 classes: TMaskUTF8, TMaskAnsi and TMaskUnicode. Maybe
TMaskAnsi can be omitted, but UTF8 and Unicode should be present. TMask
is often used over zillions of strings, and converting Unicode to UTF8
(for UnicodeStrings and WideStrings) is time consuming, in most cases
much more than the masking itself.


I suggest keeping the 3 classes and creating a new TMask that mimics
the behaviour of the current TMask by disabling the masking extensions (escape
char, [?],...). That is very simple, as you only need to override the
Create method: mask compilation happens at first use, not at
creation time.


This way current code using TMask will behave 99.9% identically, but a
user who needs to mask other strings can use TMaskUnicode, for example,
and activate or deactivate other extensions.
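
A minimal sketch of that design, written in Python only to show the shape of the idea; it is not the code of the unit under discussion. The class names merely echo the ones above, the backslash as escape character is an assumption, and only "*", "?" and the escape extension are modelled.

```python
import re
from typing import Optional


class MaskUTF8:
    """Hypothetical mask class: '*' and '?' wildcards plus an optional escape char."""

    def __init__(self, mask: str, escape_char: Optional[str] = "\\"):
        self.mask = mask
        self.escape_char = escape_char  # None disables the escape extension
        self._regex = None              # nothing is compiled at creation time

    def _compile(self):
        parts = []
        i = 0
        while i < len(self.mask):
            ch = self.mask[i]
            if ch == self.escape_char and i + 1 < len(self.mask):
                i += 1
                parts.append(re.escape(self.mask[i]))  # next char is a literal
            elif ch == "*":
                parts.append(".*")
            elif ch == "?":
                parts.append(".")
            else:
                parts.append(re.escape(ch))
            i += 1
        return re.compile("".join(parts), re.DOTALL)

    def matches(self, text: str) -> bool:
        if self._regex is None:  # first use: compile now
            self._regex = self._compile()
        return self._regex.fullmatch(text) is not None


class Mask(MaskUTF8):
    """Compatibility flavour: only the constructor differs."""

    def __init__(self, mask: str):
        super().__init__(mask, escape_char=None)


compat = Mask("C:\\*.*")
extended = MaskUTF8("C:\\*.*")
print(compat.matches("C:\\file.txt"))    # backslash is an ordinary character here
print(extended.matches("C:\\file.txt"))  # backslash escapes the '*' here
```

Because compilation is deferred to the first matches call, the subclass can change the options in its constructor without touching the compiler at all.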




Re: [Lazarus] unit Masks vs. unit FPMasks

2021-02-25 Thread Juha Manninen via lazarus
On Thu, Feb 25, 2021 at 10:47 AM José Mejuto via lazarus <
lazarus@lists.lazarus-ide.org> wrote:

> In fact I think that the whole unit is needed. It has (info for other
> readers) 3 classes TMaskUTF8, TMaskAnsi and TMaskUnicode, maybe
> TMaskAnsi can be omitted but UTF8 and Unicode should be present. Many
> times TMask is used over zillions of strings, converting Unicode to UTF8
> (for UnicodeStrings and WideStrings) is time consuming, much more than
> the masking itself in most cases.
>

UTF8 is also Unicode, one of its encodings.
The name UnicodeString is misleading. It should be UTF16String.
Please remember our Unicode solution uses UTF-8. It is done by changing the
default encoding of AnsiString and triggered by the same LazUTF8 unit that
is used by Masks unit. Everything is UTF-8.
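
As a small illustration of the naming point (in Python, purely for demonstration; the sample string is arbitrary), the same text can be stored in either encoding and decodes to the same codepoints, only the storage differs:

```python
text = "a\u00f1\u20ac\U0001F600"  # 'a', n with tilde, euro sign, an emoji

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

codepoints = len(text)         # Unicode codepoints in the string
utf8_bytes = len(utf8)         # 1 + 2 + 3 + 4 bytes
utf16_units = len(utf16) // 2  # 1 + 1 + 1 + 2 sixteen-bit units

print(codepoints, utf8_bytes, utf16_units)
print(utf8.decode("utf-8") == utf16.decode("utf-16-le"))
```

Both byte sequences are "Unicode"; what a UTF-16 string type adds is a different unit size, and moving between the two means re-encoding every string.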


> I suggest to keep the 3 classes and create a new TMask one which mimic
> the behaviour of current TMask, disabling the masking extensions (escape
> char, [?],...) which is very simple as you only need to subclass the
> Create method, mask compilation happens at first use time, not at
> creation time.
>

I can include the TMaskUnicode class there if you want, although its name
is also misleading.
TMaskAnsi must be left out. It has no use with our Unicode solution.
TMaskUTF8 I have renamed to TMask in my tests. It replaces the
current TMask which supports Unicode only partially.
I could make an alias type
  TMask = class(TMaskUTF8)
but why should I? Basically every String in our Unicode system has UTF-8
encoding. No need to have a special mask class for UTF-8.


> This way current code using TMask will behave 99.9% identical, but an
> user that needs to mask other strings can use TMaskUnicode, in example,
> and activate or deactivate other extensions.
>

Where do the other strings come from? Anyway TMaskUnicode can be included,
no problem.
I am not sure we want a new TMask to behave 99.9% identical with the
current one. The new one has some clear improvements.
Interestingly there does not seem to be any standard for the mask syntax.
So we cannot be compliant to any "standard".


Regards,
Juha


Re: [Lazarus] unit Masks vs. unit FPMasks

2021-02-25 Thread José Mejuto via lazarus

On 25/02/2021 at 10:39, Juha Manninen via lazarus wrote:

Hello,


> UTF8 is also Unicode, one of its encodings.
> The name UnicodeString is misleading. It should be UTF16String.
> Please remember our Unicode solution uses UTF-8. It is done by changing
> the default encoding of AnsiString and triggered by the same LazUTF8
> unit that is used by Masks unit. Everything is UTF-8.


Yes, Unicode is a very, very bad name. I developed the code with fpc
in mind, not Lazarus; that's the reason for the three versions. From
Lazarus' point of view, UTF8 is enough.



> > This way current code using TMask will behave 99.9% identical, but an
> > user that needs to mask other strings can use TMaskUnicode, in example,
> > and activate or deactivate other extensions.
>
> Where do the other strings come from? Anyway TMaskUnicode can be
> included, no problem.
> I am not sure we want a new TMask to behave 99.9% identical with the
> current one. The new one has some clear improvements.
> Interestingly there does not seem to be any standard for the mask
> syntax. So we cannot be compliant to any "standard".


Backwards compatibility, especially regarding the escape character, which can
appear in old masks like: "C:\*.*". The other features can be kept
active, but the escape char could be a compatibility problem.


Anyway, revisiting the code I've found a bug with the escape character in char
groups, where it was simply ignored. It has been fixed in the UTF8 version; I'm
now porting the fix to UTF16 and Ansi and I'll send you the updated version.
I also fixed the "[!]" mask so that it raises an exception, and the
"RANGES_AUTOREVERSE" code (a variable rename from "cMask" to "lMask"
had been missed in the IFDEF).


Have a nice day.



Re: [Lazarus] unit Masks vs. unit FPMasks

2021-02-25 Thread Juha Manninen via lazarus
On Thu, Feb 25, 2021 at 12:44 PM José Mejuto via lazarus <
lazarus@lists.lazarus-ide.org> wrote:

> Backwards compatibility, in special the escape character which can be
> used in old masks like: "C:\*.*". The other functions can be kept
> active, but escape char could be a compatibility problem.
>

Ok, true.
Escaping special characters would be very handy. A pity.
I kept the name TMaskUTF8 after all and inherited TMask from it. It helps
synchronise changes between our versions, among other things.
Now I need instructions or a piece of code for the TMask constructor to make
it ~ backwards compatible. Later we can see if the advanced features can
be used.
I will look at the case-insensitive match of Unicode next...


> Anyway, revisiting code I've found a bug in the escape character in char
> groups, which simply are ignored. It has been fixed in UTF8 version, I'm
> now porting to UTF16 and Ansi and I'll send you the updated version.
> Also fixed the "[!]" mask to raise an exception and the
> "RANGES_AUTOREVERSE" (was a variable rename from "cMask" to "lMask"
> which was missed in the IFDEF).
>

I renamed "cMask" to "lMask" here. Earlier I made other changes.
I renamed UTF8Length to UTF8CodepointSizeFast. LazUTF8 has that function
for the purpose (and UTF8CodepointSize). Then I used the LazUTF8 version.
UTF8Length in LazUTF8 means the number of codepoints in a whole string.
I also changed PByte to PChar because of that function change. It seems to
compile everywhere.
An exception from "[!]" mask would be good, yes. It is clearly an error
from a user. The current TMask also complains about it.
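
To make the naming distinction above concrete, here is a sketch in Python of the two different quantities: the byte size of one codepoint, read from its lead byte, and the number of codepoints in a whole string. The function names are hypothetical and this is not the LazUTF8 implementation; input is assumed to be valid UTF-8.

```python
def codepoint_size(first_byte: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte alone."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    return 1  # continuation or invalid byte: step over it (a sketch-level choice)


def utf8_length(data: bytes) -> int:
    """Number of codepoints in a whole UTF-8 buffer (assumes valid input)."""
    count = 0
    i = 0
    while i < len(data):
        i += codepoint_size(data[i])
        count += 1
    return count


data = "a\u00f1\u20ac\U0001F600".encode("utf-8")
print(codepoint_size(data[0]), codepoint_size(data[1]))  # sizes of the first two codepoints
print(utf8_length(data))                                 # codepoints in the whole buffer
```

A mask matcher that walks a buffer one character at a time needs the first function; the second answers a different question, which is why reusing one name for both is confusing.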

Juha