#49568 [Opn->Fbk]: Regex does not match when text added to matching text

 ID:               49568
 Updated by:       j...@php.net
 Reported By:      anoop dot john at zyxware dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:      5.2.10
 New Comment:

Please simplify the regex as much as possible. Once you have the simplest case that still shows the problem, we might be able to say whether it's a bug or not.


Previous Comments:
------------------------------------------------------------------------

[2009-09-16 18:16:59] anoop dot john at zyxware dot com

I know one thing for sure. The pattern matches only one opening brace and one closing brace. So it cannot start matching with the first pair of brackets and go on matching the second pair of braces in the example given. When it fails with the first pair of braces, the matching should restart beginning with the opening brace of the second pair of braces.

------------------------------------------------------------------------

[2009-09-16 12:03:28] j...@php.net

And you're 100% sure your pattern is not buggy?

------------------------------------------------------------------------

[2009-09-16 01:39:22] anoop dot john at zyxware dot com

Description:
------------
I am using a complex regex pattern to match stock tickers in a piece of text. The pattern given below

$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';

should match

(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)

and it does match it when the subject string is given alone. However, when you prepend another particular string that does not match this pattern in front of this subject string, the regex ceases to match the original portion of the string. The culprit string is given below.

(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW)

The pattern matches only one opening brace and will not match another opening brace. So it cannot be that the pattern ate through the first pair of brackets and went into the second pair of brackets and fails to match when the culprit string is prepended.

Reproduce code:
---------------
$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';
preg_match_all($pattern, '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);
echo "<br /><br />";
preg_match_all($pattern, '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)', $matches, PREG_SET_ORDER);
var_export($matches);

Expected result:
----------------
array (
  0 =>
  array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)

array (
  0 =>
  array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)

Actual result:
--------------
array (
)

array (
  0 =>
  array (
    0 => '(AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)',
    1 => 'AMEX,NYSE, Swiss Exchange',
    2 => 'CRX',
    3 => 'Nasdaq',
    4 => 'QTWW',
  ),
)

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=49568&edit=1
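
An empty array in the actual result does not by itself show that PCRE searched the whole subject and found nothing. preg_last_error() reports whether the last PCRE call was aborted, for instance because pcre.backtrack_limit was exceeded, and $matches can be empty in that case as well. Nobody in the thread checks this, so whether it applies to this report is an assumption, not a finding. A minimal sketch for telling the two cases apart follows; the helper name pcre_diagnose is hypothetical, while the pattern and subject are copied from the reproduce code.

```php
<?php
// Hypothetical helper (not part of the report): run preg_match_all() and
// return the PCRE error state next to the matches.
function pcre_diagnose($pattern, $subject)
{
    $matches = array();
    $count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    $error = preg_last_error();

    return array(
        'count'               => $count,
        // True when the call failed outright or PCRE recorded any error.
        'aborted'             => ($count === false || $error !== PREG_NO_ERROR),
        // True only for the specific "too much backtracking" error.
        'backtrack_limit_hit' => ($error === PREG_BACKTRACK_LIMIT_ERROR),
        'matches'             => $matches,
    );
}

// Pattern and two-group subject exactly as given in the reproduce code.
$pattern = '/\(((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*;((?i:\s*[a-z]*\s*[a-z]*\s*,)*\s*(?i:AMEX|NASDAQ|NasdaqGM|NasdaqGS|NYSE)\s*(?i:,\s*[a-z]*\s*[a-z]*\s*)*):\s*([A-Z]+)\s*\)/';
$subject = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';

$report = pcre_diagnose($pattern, $subject);
echo 'matches found: ', var_export($report['count'], true), "\n";
echo 'aborted: ', var_export($report['aborted'], true), "\n";
echo 'backtrack limit hit: ', var_export($report['backtrack_limit_hit'], true), "\n";
```

If 'aborted' comes back true for the two-group subject, the empty array reflects an aborted match, not a genuine non-match. In that case the things to try would be raising pcre.backtrack_limit with ini_set() or reducing the overlapping optional quantifiers in the pattern. The outcome may differ between PHP versions and configurations, so no expected output is given here.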
#49568 [Opn->Fbk]: Regex does not match when text added to matching text

 ID:               49568
 Updated by:       j...@php.net
 Reported By:      anoop dot john at zyxware dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:      5.2.10
 New Comment:

How about fixing your pattern to match 1 or more times? Now it only matches if there's exactly one match.


Previous Comments:
------------------------------------------------------------------------

[2009-09-18 14:25:52] anoop dot john at zyxware dot com

I tried taking out conditions from the regular expressions but when I took out the first condition the expression starts giving the expected result. So the symptom appears only for the specific expression and the specific text.

My logic about the issue seems to be OK.

If
  pattern \(P\) matches (A) and returns (A) in the matches array,
  \(P\) does not match (B),
  where no part of P can match \( or \),
then
  \(P\) should definitely match (B)(A) and return (A) in the matches array.

------------------------------------------------------------------------
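
One way to read the "1 or more times" suggestion is that a parenthesised group should be allowed to hold any number of "exchanges: TICKER" segments instead of exactly two. The sketch below takes that reading, but it is a different technique from the reporter's single pattern and is not the maintainer's fix: it pulls out each parenthesised group first, then splits on ';' and checks every segment with a small pattern that has no nested quantifiers. The helper name extract_tickers and the returned array layout are hypothetical. The exchange names come from the pattern in the report. The segment check is looser than the original, since it accepts any run of letters, spaces and commas before the colon.

```php
<?php
// Hypothetical helper: parse "(names: TICKER; names: TICKER; ...)" groups
// in two steps instead of one large pattern.
function extract_tickers($text)
{
    // Exchange names from the report's pattern, upper-cased for comparison.
    $known = array('AMEX', 'NASDAQ', 'NASDAQGM', 'NASDAQGS', 'NYSE');
    $results = array();

    // Step 1: grab the contents of every parenthesised group.
    if (!preg_match_all('/\(([^()]*)\)/', $text, $groups)) {
        return $results;
    }

    foreach ($groups[1] as $inner) {
        $pairs = array();
        // Step 2: every ';'-separated segment must look like "names: TICKER".
        foreach (explode(';', $inner) as $segment) {
            if (!preg_match('/^\s*([A-Za-z ,]+?)\s*:\s*([A-Z]+)\s*$/', $segment, $m)) {
                $pairs = array();   // malformed segment: reject the whole group
                break;
            }
            // Step 3: at least one listed name must be a known exchange.
            $names = array_map('strtoupper', array_map('trim', explode(',', $m[1])));
            if (!array_intersect($names, $known)) {
                $pairs = array();
                break;
            }
            $pairs[] = array('exchanges' => $m[1], 'ticker' => $m[2]);
        }
        if ($pairs) {
            $results[] = $pairs;
        }
    }

    return $results;
}

$subject = '(Euronext, NASDAQ: CRXL; AMEX,NYSE,NASDAQ, Swiss Exchange: CRX;NasdaqGM: QTWW) (AMEX,NYSE, Swiss Exchange: CRX;Nasdaq: QTWW)';
var_export(extract_tickers($subject));
```

With this approach the three-segment group that the original pattern rejects is accepted too, so callers who want only two-segment groups would need to filter on the number of pairs per group.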
#49568 [Opn->Fbk]: Regex does not match when text added to matching text

 ID:               49568
 Updated by:       j...@php.net
 Reported By:      anoop dot john at zyxware dot com
-Status:           Open
+Status:           Feedback
 Bug Type:         PCRE related
 Operating System: Ubuntu Jaunty
 PHP Version:      5.2.10
 New Comment:

And you're 100% sure your pattern is not buggy?