PCRE for classic z/OS is now stable, and development effort is basically over.  
Before I begin to tackle PCRE2, I would like to deal with the test suite in a 
more serious manner with regard to the EBCDIC vs. ASCII environment.  From 
previous communication I know that for 8-bit code I need only TESTIN1, TESTIN2, 
TESTIN11 and TESTIN12.  I ran those tests as is, knowing that the results would 
be different from the results on an ASCII platform.  Indeed, there were some 
expected differences, but also some that were unexpected, or at least some that 
I do not understand.  I will have to ask questions as I scan and try to make 
sense of those differences.

Two comments before my questions:
* The logical-not character ¬ is the EBCDIC equivalent of the circumflex ^.
* I am somewhat surprised that most tests actually produced the same results.

1. On TESTOUT11-8, line 284, you have:

/\x{100}/8BM
Memory allocation (code space): 10
------------------------------------------------------------------
  0   6 Bra
  3     \x{100}
  6   6 Ket
  9     End
------------------------------------------------------------------

/\x{1000}/8BM
Memory allocation (code space): 11
------------------------------------------------------------------
  0   7 Bra
  3     \x{1000}
  7   7 Ket
 10     End
------------------------------------------------------------------
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

I get:

/\x{100}/8BM
Failed: this version of PCRE is compiled without UTF support at offset 0

/\x{1000}/8BM
Failed: this version of PCRE is compiled without UTF support at offset 0
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

2. I guess that the tests with saved patterns for 16 and 32 bits are not really 
relevant, because nobody could produce their equivalent on EBCDIC platforms.

3. On TESTOUT2, line 518, you have:

/(?i)[abcd]/IS
Capturing subpattern count = 0
Options: caseless
No first char
No need char
Subject length lower bound = 1
Starting chars: A B C D a b c d

while I have:

/(?i)[abcd]/IS
Capturing subpattern count = 0
Options: caseless
No first char
No need char
Subject length lower bound = 1
Starting chars: a b c d A B C D

The small and capital letters switch places.  In EBCDIC, the capital letters 
come after the small letters (e.g. A is 0xC1 while a is 0x81), while in ASCII 
the opposite is true.  Would that cause the difference?  Should it be like 
that?
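
The ordering argument above can be checked with a small sketch.  It assumes 
that the "Starting chars" list is printed in ascending code-point order, and 
it uses Python's cp037 codec as a stand-in for the EBCDIC code page on z/OS 
(an assumption; the code page actually in use is not stated here, though the 
letters a-d and A-D sit at 0x81-0x84 and 0xC1-0xC4 as quoted above).

```python
# Sketch: order the characters of caseless [abcd] by code point
# in ASCII and in an EBCDIC code page (cp037 is an assumed stand-in).
LETTERS = "abcdABCD"


def by_code_point(chars, encoding):
    """Return chars sorted by their single-byte value in the given encoding."""
    return sorted(chars, key=lambda ch: ch.encode(encoding)[0])


ascii_order = " ".join(by_code_point(LETTERS, "ascii"))
ebcdic_order = " ".join(by_code_point(LETTERS, "cp037"))

print("ASCII :", ascii_order)
print("EBCDIC:", ebcdic_order)
```

If the assumption about print order holds, the two lines reproduce the two 
"Starting chars" lists shown above, which would make this difference an 
expected one rather than a bug.
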
4. I am a bit concerned whether we have the \n defined correctly.  In 
TESTOUT2, line 689, you have:

/(?<=foo\n)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
No first char
Need char = 'r'
    foo\nbarbar
 0: bar

while I have:

/(?<=foo\n)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
No first char
Need char = 'r'
    foo\nbarbar
No match

Similarly, TESTOUT2, line 1098:

/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+)?)?)?)?)?)?)?)?)?otherword/I
Capturing subpattern count = 8
Contains explicit CR or LF match
No options
First char = 'w'
Need char = 'd'

while I have:

/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+)?)?)?)?)?)?)?)?)?otherword/I
Capturing subpattern count = 8
No options
First char = 'w'
Need char = 'd'
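
One thing that may be worth checking for question 4: EBCDIC code pages carry 
more than one newline-like control character, so the byte that \n stands for 
in the pattern and the byte that actually ends up in the test data could 
differ.  A minimal sketch, again assuming Python's cp037 codec as a stand-in 
for the z/OS code page (which value the PCRE build and the test files really 
use is not known from the output above):

```python
# Sketch: which byte does each newline candidate map to in an EBCDIC
# code page?  cp037 is an assumed stand-in for the code page used on z/OS.
CANDIDATES = {
    "LF": "\n",     # U+000A LINE FEED
    "NEL": "\x85",  # U+0085 NEXT LINE
    "CR": "\r",     # U+000D CARRIAGE RETURN
}


def ebcdic_byte(ch, encoding="cp037"):
    """Return the single-byte value of ch in the given EBCDIC code page."""
    return ch.encode(encoding)[0]


newline_bytes = {name: ebcdic_byte(ch) for name, ch in CANDIDATES.items()}

for name, value in newline_bytes.items():
    print(f"{name:3} -> 0x{value:02X}")
```

If the pattern's \n and the newline in the data resolve to different values 
among these, that might account for both the "No match" result and the 
missing "Contains explicit CR or LF match" line, but this is only a guess to 
be verified against the actual build settings.
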



Ze'ev Atlas

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
