#49568 [Opn-Fbk]: Regex does not match when text added to matching text

2009-09-18 Thread jani
 ID:   49568
 Updated by:   j...@php.net
 Reported By:  anoop dot john at zyxware dot com
-Status:   Open
+Status:   Feedback
 Bug Type: PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:  5.2.10
 New Comment:

Please simplify the regex as much as possible. Once you have the
simplest case that still shows the problem, we might be able to say
whether it's a bug or not.


Previous Comments:


[2009-09-16 18:16:59] anoop dot john at zyxware dot com

I know one thing for sure. The pattern matches only one opening
parenthesis and one closing parenthesis, so a match cannot start in the
first pair of parentheses and continue into the second pair in the
example given. When matching fails on the first pair, it should restart
at the opening parenthesis of the second pair.



[2009-09-16 12:03:28] j...@php.net

And you're 100% sure your pattern is not buggy?



[2009-09-16 01:39:22] anoop dot john at zyxware dot com

Description:

I am using a complex regex pattern to match stock tickers in a piece of
text. The pattern given below

$pattern =
'/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';

should match 

(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW) 

and it does match when that subject string is given alone. However,
when a particular string that does not match the pattern is prepended
to this subject string, the regex stops matching the original portion
of the string. The culprit string is given below.

(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange:
CRX;NasdaqGM: QTWW)

The pattern matches only one opening parenthesis and cannot match a
second one. So it cannot be that the pattern ate through the first pair
of parentheses and into the second pair, and that this is why the match
fails when the culprit string is prepended.


Reproduce code:
---
$pattern =
'/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';
// Culprit string prepended to the string that matches on its own.
preg_match_all($pattern, '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);
echo "<br /><br />";
// The matching string alone.
preg_match_all($pattern, '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);


Expected result:

array ( 0 => array ( 0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq:
QTWW)', 1 => 'AMEX,NYSE, Swiss Exchange', 2 => 'CRX', 3 => 'Nasdaq', 4
=> 'QTWW', ), )

array ( 0 => array ( 0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq:
QTWW)', 1 => 'AMEX,NYSE, Swiss Exchange', 2 => 'CRX', 3 => 'Nasdaq', 4
=> 'QTWW', ), )

Actual result:
--
array ( )

array ( 0 => array ( 0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq:
QTWW)', 1 => 'AMEX,NYSE, Swiss Exchange', 2 => 'CRX', 3 => 'Nasdaq', 4
=> 'QTWW', ), )
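
An empty array from preg_match_all() does not by itself distinguish "no
match" from "matching was aborted". The following is a minimal sketch,
assuming that one of the PCRE limits (pcre.backtrack_limit or
pcre.recursion_limit) might be involved, which the report does not
confirm. It repeats the first call and then asks preg_last_error() what
happened. The helper pcre_error_name() is hypothetical and exists only
for this illustration.

```php
<?php
// Hypothetical helper (not part of the report): turn a preg_last_error()
// code into the name of the corresponding PREG_* constant.
function pcre_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown error ' . $code;
}

// Pattern and subject copied from the reproduce code above.
$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';
$subject = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';

$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// false here would indicate an error rather than "zero matches".
var_dump($count);
// Name of the last PCRE error, or PREG_NO_ERROR if the run completed.
echo pcre_error_name(preg_last_error()), "\n";
```

If the name printed is PREG_BACKTRACK_LIMIT_ERROR, the empty array would
reflect an aborted match rather than a subject that does not match.
Raising pcre.backtrack_limit with ini_set() and rerunning would be one
way to test that.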








#49568 [Opn-Fbk]: Regex does not match when text added to matching text

2009-09-18 Thread jani
 ID:   49568
 Updated by:   j...@php.net
 Reported By:  anoop dot john at zyxware dot com
-Status:   Open
+Status:   Feedback
 Bug Type: PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:  5.2.10
 New Comment:

How about fixing your pattern to match 1 or more times? Now it only
matches if there's exactly one match.
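
One possible way to rework the pattern is sketched below. It is not the
change suggested above and not the reporter's pattern. It assumes that
each exchange list can be captured as "any run of characters other than
: ; ( )" and that the list of known exchanges can be checked in a second
step. The names $exchange and $tickers are introduced only for this
illustration.

```php
<?php
// Sketch only: a less ambiguous pattern. Each exchange list is captured
// as a run of characters other than : ; ( ), so the engine has far fewer
// ways to split the text than with \s*[a-z]*\s*[a-z]*\s* inside a
// repeated group.
$pattern = '/\(\s*([^:;()]+?)\s*:\s*([A-Z]+)\s*;\s*([^:;()]+?)\s*:\s*([A-Z]+)\s*\)/';

// Assumption: the known exchange names are verified separately.
$exchange = '/\b(?:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\b/i';

// Subject copied from the reproduce code in the report.
$subject = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';

$tickers = array();
if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Keep a match only if both exchange lists name a known exchange.
        if (preg_match($exchange, $m[1]) && preg_match($exchange, $m[3])) {
            $tickers[] = array($m[1], $m[2], $m[3], $m[4]);
        }
    }
}
var_export($tickers);
```

The trade-off is that the exchange lists are no longer validated by the
main pattern itself, so anything between the delimiters is accepted
until the second check rejects it.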


Previous Comments:


[2009-09-18 14:25:52] anoop dot john at zyxware dot com

I tried removing conditions from the regular expression, but as soon as
I removed the first condition the expression started giving the
expected result. So the symptom appears only with this specific
expression and this specific text.

My logic about the issue seems to be OK.

If the pattern \(P\)

  matches (A) and returns (A) in the matches array, and

  does not match (B),

where no part of P can match \( or \), then

\(P\) should definitely match within (B)(A) and return (A) in the
matches array
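
The reasoning above can be checked on a small stand-in. This is a
minimal sketch, assuming a simplified P ("[A-Z]+: [A-Z]+") that cannot
match a parenthesis; it is not the reporter's pattern, and the sample
strings are made up for illustration. It only shows that an unanchored
search retries at later offsets when each attempt runs to completion.

```php
<?php
// Simplified stand-in for \(P\). P cannot match "(" or ")".
$pattern = '/\(([A-Z]+): ([A-Z]+)\)/';

$a = '(NYSE: CRX)';   // plays the role of (A): matches
$b = '(nyse: crx)';   // plays the role of (B): does not match

$countA  = preg_match_all($pattern, $a, $matchesA, PREG_SET_ORDER);
$countB  = preg_match_all($pattern, $b, $matchesB, PREG_SET_ORDER);
// (B) followed by (A): the attempt at the first "(" fails, and the
// engine moves on until it reaches the second "(".
$countBA = preg_match_all($pattern, $b . ' ' . $a, $matchesBA, PREG_SET_ORDER);

var_dump($countA, $countB, $countBA);
var_export($matchesBA);

// The argument relies on every attempt finishing normally.
var_dump(preg_last_error() === PREG_NO_ERROR);
```

If an attempt is aborted instead of failing normally, preg_match_all()
returns false and the later offsets are never tried, so the conclusion
no longer follows from the premises alone.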



[2009-09-18 13:46:51] j...@php.net

Please simplify the regex as much as possible. Once you have the
simplest case that still shows the problem, we might be able to say
whether it's a bug or not.






#49568 [Opn-Fbk]: Regex does not match when text added to matching text

2009-09-16 Thread jani
 ID:   49568
 Updated by:   j...@php.net
 Reported By:  anoop dot john at zyxware dot com
-Status:   Open
+Status:   Feedback
 Bug Type: PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:  5.2.10
 New Comment:

And you're 100% sure your pattern is not buggy?

