Re: [PHP-DEV] [WARNING] Release process for 4.3.2 starts RSN..
On Sun, Mar 09, 2003 at 05:17:37PM +0100, Derick Rethans wrote:
| Hello,
|
| I guess nobody is interested in fixing this? Then I guess we won't get
| 4.3.2 ever.
|
| To get this thing started, I'm going to roll PHP 4.3.2-pre1
| on Wednesday, 26th Feb, around 3pm EEST. And I'll announce
| it on php-general too, to get some more people testing it
| before we start with any RCs.
|
| Following is a collection of bugs marked as critical and verified
| which should be looked into and dealt with. If not fixed,
| then please, PLEASE add some comment why they won't ever
| be fixed and bogus them out.

Hi Derick:

I hate to be a pain, but would it be possible to get bug #22510 (http://bugs.php.net/bug.php?id=22510) looked at or assigned before 4.3.2 goes out the door? It's a refcount problem that leads to a double-free; verified and everything.

I've been stepping through code on and off for the past week, but seeing that I know little to nothing about Zend's internals, there are probably lots of people who could take care of this much more easily.

P.S. Where should I send the Finlandia? ;-)

Thanks a lot,
- Dave [EMAIL PROTECTED]

--
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP-DEV] pg_lo_open and object creation... intended behavior?
Hi:

This is an excerpt from ext/pgsql/pgsql.c, in pg_lo_open:

---
    if (strchr(mode_string, 'w') == mode_string) {
        pgsql_mode |= INV_WRITE;
        create = 1;
        if (strchr(mode_string, '+') == mode_string+1) {
            pgsql_mode |= INV_READ;
        }
    }

    pgsql_lofp = (pgLofp *) emalloc(sizeof(pgLofp));

    if ((pgsql_lofd = lo_open(pgsql, oid, pgsql_mode)) == -1) {
        if (create) {
            if ((oid = lo_creat(pgsql, INV_READ|INV_WRITE)) == 0) {
                efree(pgsql_lofp);
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to create PostgreSQL large object.");
                RETURN_FALSE;
            } else {
                if ((pgsql_lofd = lo_open(pgsql, oid, pgsql_mode)) == -1) {
                    if (lo_unlink(pgsql, oid) == -1) {
                        efree(pgsql_lofp);
                        ...
    }
    ... registers resource and returns ...
---

If I'm reading this correctly, a call to pg_lo_open with an object identifier specified explicitly in $oid can return a file handle pointing to a different, newly-created large object whenever the initial open fails. If this is the case, it seems unintuitive and very cumbersome to handle from user-space.

Can we change the 'if (create) {' branch to only be triggered when the oid was left unset (ensuring that the open() failure actually gets back to the caller)?

Best Regards,
- Dave [EMAIL PROTECTED]
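For illustration, the rule proposed above can be modeled outside of PHP. This is only a sketch: model_store, model_exists, model_create and model_lo_open are hypothetical stand-ins, not libpq's lo_open()/lo_creat() and not the actual pgsql.c code, and "oid left unset" is assumed to mean an oid of 0.

```c
#include <stdbool.h>

/* Hypothetical in-memory model; none of this is the real libpq API. */
#define MODEL_MAX_OBJECTS 8
#define MODEL_INVALID_OID 0u

typedef struct {
    unsigned int oids[MODEL_MAX_OBJECTS]; /* large objects that already exist */
    int count;                            /* number of valid entries in oids */
    unsigned int next_oid;                /* oid the next create will hand out */
} model_store;

static bool model_exists(const model_store *s, unsigned int oid)
{
    for (int i = 0; i < s->count; i++) {
        if (s->oids[i] == oid) {
            return true;
        }
    }
    return false;
}

static unsigned int model_create(model_store *s)
{
    if (s->count >= MODEL_MAX_OBJECTS) {
        return MODEL_INVALID_OID;
    }
    s->oids[s->count++] = s->next_oid;
    return s->next_oid++;
}

/*
 * Proposed rule: create a new object only when the caller named none.
 * Returns the oid that was opened, or MODEL_INVALID_OID on failure.
 */
static unsigned int model_lo_open(model_store *s, unsigned int requested_oid,
                                  bool write_mode)
{
    if (requested_oid != MODEL_INVALID_OID) {
        /* Explicit oid: a failed open is reported, never replaced. */
        return model_exists(s, requested_oid) ? requested_oid
                                              : MODEL_INVALID_OID;
    }
    /* No oid given: creation is allowed, but only in write mode. */
    return write_mode ? model_create(s) : MODEL_INVALID_OID;
}
```

With this rule a caller that asks for a specific object either gets that object or a failure, so the returned handle can never silently refer to something else.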
[PHP-DEV] Getting an external param into a userspace streams filter...
Hi Wez, everyone:

Is there (or will there ever be) a good way to transmit an extra parameter into a php_user_filter around the time that oncreate() is called? I've run into a couple cases where it'd be incredibly useful (e.g. for filters that don't modify the stream, but do have side-effects).

Is there a known technical reason why this would be difficult or impossible? If not, can I go ahead and submit a patch at some point? :)

Thanks,
- Dave [EMAIL PROTECTED]
Re: [PHP-DEV] Getting an external param into a userspace streams filter...
On Thu, Feb 27, 2003 at 09:40:15AM -0500, David Brown wrote:
| Hi Wez, everyone:
|
| Is there (or will there ever be) a good way to transmit an extra
| parameter into a php_user_filter around the time that oncreate() is
| called? I've run into a couple cases where it'd be incredibly useful
| (e.g. for filters that don't modify the stream, but do have
| side-effects).

bool stream_filter_append ( resource stream, string filtername [, string params])
                                                                         ^^^

Nevermind. Seems I can't read this morning. Sorry about that.

- Dave [EMAIL PROTECTED]
Re: [PHP-DEV] Getting an external param into a userspace streams filter...
On Thu, Feb 27, 2003 at 09:42:24AM -0500, David Brown wrote:
| On Thu, Feb 27, 2003 at 09:40:15AM -0500, David Brown wrote:
| | Hi Wez, everyone:
| |
| | Is there (or will there ever be) a good way to transmit an extra
| | parameter into a php_user_filter around the time that oncreate() is
| | called? I've run into a couple cases where it'd be incredibly useful
|
| Nevermind. Seems I can't read this morning. Sorry about that.

Well, perhaps I jumped to self-chastisement a bit too quickly. There's a single string parameter that can be passed to stream_filter_append / stream_filter_prepend, but how would one retrieve this information once inside of the filter?

Moreover, for passing arbitrary zvals in, would I have to resort to some hackery using array keys from a global, or is there an easier way?

Thanks in advance,
- Dave 'Replies to self way too many times' [EMAIL PROTECTED]
Re: [PHP-DEV] Getting an external param into a userspace streams filter...
Hi Wez:

On Thu, Feb 27, 2003 at 04:26:40PM +, Wez Furlong wrote:
| Hi David,
|
| user filters are in a little bit of flux atm.
|
| However, the idea is that the param argument will be altered to be a
| zval (rather than just a string).
|
| In the oncreate() method, the following member variables are available
| to the filter:
|
| string $this->filtername; // name of the filter
| mixed $this->params; // params passed from prepend/append func

Excellent. :) I did finally find where $this->params gets set (in ext/standard/user_filters.c:242), but I figured a third reply to myself would probably just be embarrassing. ;-)

| Hope this helps; please only use current CVS for PHP 5 for playing
| around with this stuff; if you run into problems let me (and Sara
| [EMAIL PROTECTED]) know and we can sort them out.

Will do. I'm currently wrapping the experimental streams stuff in a class, falling back to a user-space streams/filter implementation when the PHP 5 one isn't available. I'm currently using the oncreate() method (of my wrapper class) to transmit the parameters to the filter; I just wanted to make sure that this wouldn't bite me when I tried to slide the PHP filter implementation underneath it.

Anyway, much thanks.

- Dave [EMAIL PROTECTED]
[PHP-DEV] Of string constants, bytecode, and concatenation
Hi everyone:

This may well be a stupid question, but I've spent enough time staring blankly at zend_compile.c/zend_execute.c that I figured it was time to ask. :)

Say I have a section of code like this:

    <?php
    $s1 = 'foo' . 'bar' . 'baz';
    $s2 = 'foobarbaz';
    ?>

In the PHP bytecode (I hope I'm using the right terminology - I mean the stuff in the opline; the stuff that gets stored in the cache under APC or Zend cache), is there any functional difference between the assignment to $s1 and the assignment to $s2? Or, to put it more precisely, is Zend currently able to figure out that the strings on both sides of the concatenation operator are constants, and don't need to be concatenated at runtime?

If not, can anyone explain the barriers to doing something like this? I'm attempting to learn a bit of the gory details of the interpreter, so the more pointers into the source anyone can provide, the better off I'll be.

Thanks a lot,
- Dave [EMAIL PROTECTED]
Re: [PHP-DEV] Of string constants, bytecode, and concatenation
On Wed, Feb 26, 2003 at 05:36:54PM +0100, Derick Rethans wrote:
| No, the engine doesn't do this at compile time. This first one produces:
|
| number of ops: 5
| line   #  op        fetch   ext  operands
| -----------------------------------------------
|    1   0  CONCAT                 ~1, 'foo', 'bar'
|        1  CONCAT                 ~2, ~1, 'baz'
|        2  FETCH_W   local        $0, 's1'
|        3  ASSIGN                 $3, $0, ~2
|    3   4  RETURN                 1
|
| The second one:
|
| line   #  op        fetch   ext  operands
| -----------------------------------------------
|    1   0  FETCH_W   local        $0, 's2'
|        1  ASSIGN                 $1, $0, 'foobarbaz'
|    2   2  RETURN                 1

Is that output a ZEND_DEBUG thing, or is that an external tool?

| If not, can anyone explain the barriers to doing something like this?
|
| It's the job of an optimizer, not of a compiler. And because PHP doesn't
| have an internal optimizer, this is not optimized out. You can either

Okay. Makes complete sense. I was thinking more along the lines of "wouldn't it be nice if...?". I hadn't quite made it to "where would that belong?". :)

I'll check out the optimizers. I noticed that the new CVS version of APC seems to have a configuration option for optimization as well, though I'm not sure how far along it is.

Thanks again,
- Dave [EMAIL PROTECTED]
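As a rough picture of what such an optimizer pass would do, here is a minimal constant-folding sketch over a toy expression tree. It is an assumption-laden model, not Zend code: the node type and the make_leaf, make_concat, free_node and fold_concat helpers are all hypothetical, and a real pass would work on oplines rather than a tree. Allocation-failure handling is deliberately minimal.

```c
#include <stdlib.h>
#include <string.h>

/* Toy expression tree; hypothetical, not the Zend engine's op array. */
typedef enum { NODE_CONST, NODE_VAR, NODE_CONCAT } node_kind;

typedef struct node {
    node_kind kind;
    char *text;               /* NODE_CONST: value, NODE_VAR: name (owned) */
    struct node *lhs, *rhs;   /* NODE_CONCAT operands (owned) */
} node;

static node *make_leaf(node_kind kind, const char *text)
{
    node *n = calloc(1, sizeof *n);
    if (n != NULL) {
        size_t len = strlen(text) + 1;
        n->kind = kind;
        n->text = malloc(len);
        if (n->text != NULL) {
            memcpy(n->text, text, len);
        }
    }
    return n;
}

static node *make_concat(node *lhs, node *rhs)
{
    node *n = calloc(1, sizeof *n);
    if (n != NULL) {
        n->kind = NODE_CONCAT;
        n->lhs = lhs;
        n->rhs = rhs;
    }
    return n;
}

static void free_node(node *n)
{
    if (n == NULL) {
        return;
    }
    free_node(n->lhs);
    free_node(n->rhs);
    free(n->text);
    free(n);
}

/* Folds constant-only concatenations in place; returns how many were removed. */
static int fold_concat(node *n)
{
    int folded = 0;

    if (n == NULL || n->kind != NODE_CONCAT || !n->lhs || !n->rhs) {
        return 0;
    }
    /* Fold the operands first so chains like 'a' . 'b' . 'c' collapse. */
    folded += fold_concat(n->lhs);
    folded += fold_concat(n->rhs);

    if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST &&
        n->lhs->text != NULL && n->rhs->text != NULL) {
        size_t a = strlen(n->lhs->text);
        size_t b = strlen(n->rhs->text);
        char *joined = malloc(a + b + 1);
        if (joined == NULL) {
            return folded;   /* leave the node unfolded on allocation failure */
        }
        memcpy(joined, n->lhs->text, a);
        memcpy(joined + a, n->rhs->text, b + 1);   /* copies the trailing NUL */
        free_node(n->lhs);
        free_node(n->rhs);
        n->lhs = n->rhs = NULL;
        n->kind = NODE_CONST;
        n->text = joined;
        folded++;
    }
    return folded;
}
```

Applied to the tree for 'foo' . 'bar' . 'baz', the pass removes both concatenations and leaves a single constant, which corresponds to the two CONCAT ops in the first dump disappearing. A concatenation with a variable operand is left alone.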
[PHP-DEV] mb_string overloading and binary data...
Hi:

This is kind of a user-space question, but I'm hoping that it concerns enough of the PHP infrastructure (conceptually) that this is the right place to post it.

I've got an application that processes both textual and binary data. The domain of the application isn't really relevant, but it makes liberal use of string and pcre functions. Ignoring the fact (?) that php_pcre doesn't seem to be mb-aware...

Say I were to turn on the mb_string overload support, effectively replacing strlen, etc. with equivalent multibyte functions. Is there any way then to still get the exact size, in bytes, of a 'string' of binary data?

In general, I guess the question is 'Is there a preferred way of handling binary data in memory, while remaining multibyte-safe?'

Apologies if this is way O/T or has already been beaten to death...

- Dave [EMAIL PROTECTED]
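The distinction the question turns on can be shown with a small C sketch. It assumes UTF-8 as the multibyte encoding, and utf8_char_count is a hypothetical helper, not an mbstring function: a buffer carried with an explicit length always has a well-defined size in bytes, while a character count depends on how the bytes are interpreted.

```c
#include <stddef.h>

/*
 * Counts UTF-8 characters in a length-delimited buffer by counting every
 * byte that is not a continuation byte (10xxxxxx). The byte length is
 * simply len; it does not depend on the encoding at all.
 */
static size_t utf8_char_count(const unsigned char *buf, size_t len)
{
    size_t chars = 0;

    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80) {
            chars++;
        }
    }
    return chars;
}
```

For the six bytes of "h\xC3\xA9llo" this reports five characters. Because the length travels with the buffer rather than being derived from its contents, binary data with embedded NUL bytes keeps an exact byte size regardless of what a character-counting function would say.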
Re: [PHP-DEV] Current HEAD segfaults with Horde/CHORA
On Tue, Feb 11, 2003 at 01:18:20PM +0100, Derick Rethans wrote:
| On Tue, 11 Feb 2003, Sebastian Bergmann wrote:
|
| Derick Rethans wrote:
| Be that as it may, but it still shouldn't segfault, no? ;-)
|
| recursive function calls always segfault, just like:
|
| <?php function a() { a(); }; a(); ?>
|
| so it's 'expected behavior'.

I assume the crash on infinite recursion is a stack-overflow type thing, but is there any reason that doesn't trigger the 'Allowed memory exhausted' and exit cleanly?

Just curious... :)

Thanks,
- Dave [EMAIL PROTECTED]
Re: [PHP-DEV] Current HEAD segfaults with Horde/CHORA
On Tue, Feb 11, 2003 at 03:17:53PM -0500, Derick Rethans wrote:
| David Brown wrote:
|
| I assume the crash on infinite recursion is a stack-overflow type thing,
| but is there any reason that doesn't trigger the 'Allowed memory
| exhausted' and exit cleanly?
|
| Just curious... :)
|
| efficiency :) Adding checks for this will be 1) inaccurate, 2) slow and
| that's enough not to do them :)

Hi Derick:

Sorry about leaving php-dev off of the Cc - your reply didn't make it back to the php-dev list, though you did indeed respond. :)

A couple of followup questions:

1. The crashes I'm able to catch with <?php function a(){a();}a(); ?> stop GDB at zend_execute line 1489:

       zend_ptr_stack_n_push(EG(arg_types_stack), 2, EX(fbc), EX(object).ptr);

   However, zend_ptr_stack_n_push seems to handle growth of the stack, and there aren't pointer dereferences anywhere else. Am I looking at the wrong section of code? The wrong stack, maybe? :)

2. Is the PHP stack size configurable, either at run-time or at compile-time? (That is, assuming it's defined by PHP and not a resource limitation/setting in the OS).

Thanks in advance,
- Dave [EMAIL PROTECTED]
[PHP-DEV] Capturing headers with output buffering?
Hi:

Architecturally speaking, is there any simple way to modify an sapi backend to return HTTP headers through the output buffering mechanism?

As far as I can tell, headers are managed separately by main/output.c, with php_ub_body_write_no_header being substituted in once the HTTP headers are sent.

Pointers to anything would be greatly appreciated.

TIA,
- Dave
Re: [PHP-DEV] Capturing headers with output buffering?
Hi George:

It's something that's probably better solved in user-space, but I figured I'd poke around anyway. :)

I'm attempting to write a little prefork HTTP server entirely in PHP. The script instantiates an 'application class', which is persistent across requests. Output of the application is captured with an ob_* callback function, and then stuffed down a socket.

I'm hoping for free in-memory opcode caching and database connection persistence (by virtue of recycling the same interpreter across multiple requests), and possibly the elimination of a lot of application-specific startup time. Of course, this whole thing could very well just be a bad idea. :)

Anyway, headers aren't currently included in the buffered output, which causes the header() function to print to stdout, effectively doing nothing. I could just wrap header() with a user-space function, but that would prevent a lot of scripts from running as-is.

Bad idea? Maybe. There's also the matter of getting it to parse POST/GET without completely reinventing the wheel...

- Dave

On Sun, Nov 24, 2002 at 05:57:33PM -0500, George Schlossnagle wrote:
| What are you trying to accomplish?
|
| On Sunday, November 24, 2002, at 05:40 PM, David Brown wrote:
|
| Hi:
|
| Architecturally speaking, is there any simple way to modify an sapi
| backend to return HTTP headers through the output buffering mechanism?
|
| As far as I can tell, headers are managed separately by main/output.c,
| with php_ub_body_write_no_header being substituted in once the HTTP
| headers are sent.
|
| Pointers to anything would be greatly appreciated.
|
| TIA,
| - Dave
Re: [PHP-DEV] Proto void and return values...
On Tue, Nov 12, 2002 at 02:16:41PM -0500, David Brown wrote:
| Hi everyone:
|
| For functions prototyped as returning void, return values seem to be applied
| at random. Some functions, such as trigger_error/user_error, srand, ob_start,
| and phpinfo, use RETURN_TRUE. The vast majority of these functions just fall
| through, implicitly returning NULL to userland.

Or perhaps I've just thought about this entirely too long. Is it possible that the prototypes are just wrong in the documentation?

Regards,
- Dave [EMAIL PROTECTED]
Re: [PHP-DEV] Re: Bug #18547 Updated: Remote attacker can cause SIGSEGV (fwd)
On Wed, Jul 24, 2002 at 01:37:12PM -0700, Thomas Cannon wrote:
| -- Forwarded message --
| Date: Wed, 24 Jul 2002 16:12:06 -0400 (EDT)
| From: Dan Kalowsky [EMAIL PROTECTED]
| To: [EMAIL PROTECTED]
| Subject: Re: Bug #18547 Updated: Remote attacker can cause SIGSEGV
|
| Please send it to [EMAIL PROTECTED]
|
| (Okay, that's easy enough -- I posted this in the web form, but it
| wrapped all to hell. Thanks for the email address, Mr. Kalowsky)
|
| Hello. While working on an exploit for the multipart_buffer_headers()
| hole that you just fixed, I found another problem that you might want
| to look into. It looks like a DoS only, but there might be a way to
| execute arbitrary code and I just haven't found it yet. Credit for the
| find goes to myself and members of the [0dd] 0-Day Digest.

FWIW, I was able to reproduce the SEGV, one per connection, on a Linux 2.4.18 server here.

- Dave [EMAIL PROTECTED]
[PHP-DEV] Variable expansion in user-space
Hi guys:

I've looked through the documentation and didn't see anything like this; please do let me know if this has been implemented/discussed previously.

I'm looking for a fast mechanism to do 'user-space' string expansion. That is, given a key/value set $a and a string $b, I'd like for every occurrence of $key in $b to be expanded to $a[$key]. This in and of itself is easy to do with preg_replace_callback() or eval().

Both of the above methods have their disadvantages, though (eval's being the possibility of escape-style attacks, preg_replace_callback's being having to fire up the regex engine).

I'd like to propose a function var_expand($str [, $namespace]). It'd be capable of using PHP's built-in variable expansion code in a controlled manner - saving the need to do ad-hoc string replacement or eval() blocks. I'm willing to bet it'd also be fast.

If I'm barking up the wrong tree or am completely missing something, please let me know. Barring any immediate vetoes, would it be appropriate if I prepared a patch later this week?

Thanks in advance,
- Dave [EMAIL PROTECTED]
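To make the proposal concrete, here is a small stand-alone sketch of the expansion step, written without a regex engine or eval. It is only a model under stated assumptions: kv_pair, kv_lookup and var_expand_sketch are hypothetical names, it does not use PHP's built-in expansion code, names are assumed to match [A-Za-z_][A-Za-z0-9_]*, and unknown names are left untouched.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value table standing in for the $a array. */
typedef struct {
    const char *key;
    const char *value;
} kv_pair;

static const char *kv_lookup(const kv_pair *vars, size_t nvars,
                             const char *name, size_t name_len)
{
    for (size_t i = 0; i < nvars; i++) {
        if (strlen(vars[i].key) == name_len &&
            memcmp(vars[i].key, name, name_len) == 0) {
            return vars[i].value;
        }
    }
    return NULL;
}

/*
 * Expands every $name in tmpl whose name is a key in vars. Unknown names
 * and a lone '$' are copied through unchanged. Returns 0 on success, or
 * -1 if the result (including the trailing NUL) does not fit in out.
 */
static int var_expand_sketch(const char *tmpl, const kv_pair *vars,
                             size_t nvars, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    while (*tmpl != '\0') {
        const char *chunk = tmpl;   /* default: copy one literal byte */
        size_t chunk_len = 1;

        if (*tmpl == '$' &&
            (isalpha((unsigned char)tmpl[1]) || tmpl[1] == '_')) {
            size_t name_len = 1;
            const char *value;

            while (isalnum((unsigned char)tmpl[1 + name_len]) ||
                   tmpl[1 + name_len] == '_') {
                name_len++;
            }
            value = kv_lookup(vars, nvars, tmpl + 1, name_len);
            if (value != NULL) {
                chunk = value;
                chunk_len = strlen(value);
            } else {
                chunk_len = 1 + name_len;   /* keep "$name" verbatim */
            }
            tmpl += 1 + name_len;
        } else {
            tmpl++;
        }
        if (used + chunk_len >= out_size) {
            return -1;
        }
        memcpy(out + used, chunk, chunk_len);
        used += chunk_len;
    }
    out[used] = '\0';
    return 0;
}
```

Since the scan is a single pass and only ever looks names up in the supplied table, there is nothing in the input string that can trigger code execution, which is the main concern raised about eval().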
[PHP-DEV] [PATCH] preg_match(_all) support for capturing offsets
Hi Andrei, all:

(This is a re-send of a previous message that received no replies; my apologies if you've already seen/considered it)

In a previous patch (http://news.php.net/article.php?group=php.dev&article=84281), support was added to preg_split for capturing offsets along with matches.

The attached patch adds similar support to preg_match and preg_match_all via a new PREG_MATCH_OFFSET_CAPTURE flag. The code handles capturing offsets for both subpattern matches and whole pattern matches, using the previously-added add_offset_pair helper function. The flag is a new fourth (and optional) parameter for preg_match, and is or'd into the existing 'order' parameter for preg_match_all, above PREG_SET_ORDER and PREG_PATTERN_ORDER.

The patch below is diffed against the CVS head - humbly submitted for application, rejection, suggestions, or extensive flaming. :)

Thanks in advance,
- Dave [EMAIL PROTECTED]

--- ext/pcre/php_pcre.c.orig    Tue Jun 4 13:02:50 2002
+++ ext/pcre/php_pcre.c    Tue Jun 4 13:12:10 2002
@@ -35,7 +35,9 @@
 #define PREG_PATTERN_ORDER          0
 #define PREG_SET_ORDER              1
 
-#define    PREG_SPLIT_NO_EMPTY         (1<<0)
+#define PREG_MATCH_OFFSET_CAPTURE   (1<<2)
+
+#define PREG_SPLIT_NO_EMPTY         (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE    (1<<1)
 #define PREG_SPLIT_OFFSET_CAPTURE   (1<<2)
 
@@ -99,6 +101,7 @@
     REGISTER_LONG_CONSTANT("PREG_PATTERN_ORDER", PREG_PATTERN_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_MATCH_OFFSET_CAPTURE", PREG_MATCH_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
@@ -310,6 +313,24 @@
 }
 /* }}} */
 
+/* {{{ add_offset_pair
+ */
+static inline void add_offset_pair(zval *result, char *str, int len, int offset)
+{
+    zval *match_pair;
+
+    ALLOC_ZVAL(match_pair);
+    array_init(match_pair);
+    INIT_PZVAL(match_pair);
+
+    /* Add (match, offset) to the return value */
+    add_next_index_stringl(match_pair, str, len, 1);
+    add_next_index_long(match_pair, offset);
+
+    zend_hash_next_index_insert(Z_ARRVAL_P(result), &match_pair, sizeof(zval *), NULL);
+}
+/* }}} */
+
 /* {{{ php_pcre_match
  */
 static void php_pcre_match(INTERNAL_FUNCTION_PARAMETERS, int global)
@@ -335,6 +356,7 @@
     int matched;                /* Has anything matched */
     int i;
     int subpats_order_val = 0;  /* Integer value of subpats_order */
+    int offset_capture = 0;     /* If offsets should be captured */
     int g_notempty = 0;         /* If the match should not be empty */
     const char **stringlist;    /* Used to hold list of subpatterns */
     char *match;                /* The current match */
@@ -363,11 +385,17 @@
 
             /* Make sure subpats_order is a number */
             convert_to_long_ex(subpats_order);
-            subpats_order_val = Z_LVAL_PP(subpats_order);
-            if (subpats_order_val < PREG_PATTERN_ORDER ||
-                subpats_order_val > PREG_SET_ORDER) {
-                zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
-            }
+            offset_capture = (Z_LVAL_PP(subpats_order) & PREG_MATCH_OFFSET_CAPTURE);
+
+            if (global) {
+                subpats_order_val = (Z_LVAL_PP(subpats_order) & 1UL);
+
+                if ((subpats_order_val < PREG_PATTERN_ORDER) ||
+                    (subpats_order_val > PREG_SET_ORDER)) {
+                    zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
+                }
+            }
+
             break;
 
         default:
@@ -442,8 +470,13 @@
                 if (subpats_order_val == PREG_PATTERN_ORDER) {
                     /* For each subpattern, insert it into the appropriate array. */
                     for (i = 0; i < count; i++) {
-                        add_next_index_stringl(match_sets[i], (char *)stringlist[i],
-
[PHP-DEV] [PATCH] preg_match(_all) support for capturing offsets
Hi Andrei, all:

In a previous patch (http://news.php.net/article.php?group=php.dev&article=84281), support was added to preg_split for capturing offsets along with matches.

The attached patch adds similar support to preg_match and preg_match_all via a new PREG_MATCH_OFFSET_CAPTURE flag. The code handles capturing offsets for both subpattern matches and whole pattern matches, using the previously-added add_offset_pair helper function. The flag is a new fourth (and optional) parameter for preg_match, and is or'd into the existing 'order' parameter for preg_match_all, above PREG_SET_ORDER and PREG_PATTERN_ORDER.

The patch below is diffed against the CVS head - humbly submitted for application, rejection, suggestions, or extensive flaming. :)

Thanks in advance,
- Dave [EMAIL PROTECTED]

--- ext/pcre/php_pcre.c.orig    Tue Jun 4 13:02:50 2002
+++ ext/pcre/php_pcre.c    Tue Jun 4 13:12:10 2002
@@ -35,7 +35,9 @@
 #define PREG_PATTERN_ORDER          0
 #define PREG_SET_ORDER              1
 
-#define    PREG_SPLIT_NO_EMPTY         (1<<0)
+#define PREG_MATCH_OFFSET_CAPTURE   (1<<2)
+
+#define PREG_SPLIT_NO_EMPTY         (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE    (1<<1)
 #define PREG_SPLIT_OFFSET_CAPTURE   (1<<2)
 
@@ -99,6 +101,7 @@
     REGISTER_LONG_CONSTANT("PREG_PATTERN_ORDER", PREG_PATTERN_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_MATCH_OFFSET_CAPTURE", PREG_MATCH_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
@@ -310,6 +313,24 @@
 }
 /* }}} */
 
+/* {{{ add_offset_pair
+ */
+static inline void add_offset_pair(zval *result, char *str, int len, int offset)
+{
+    zval *match_pair;
+
+    ALLOC_ZVAL(match_pair);
+    array_init(match_pair);
+    INIT_PZVAL(match_pair);
+
+    /* Add (match, offset) to the return value */
+    add_next_index_stringl(match_pair, str, len, 1);
+    add_next_index_long(match_pair, offset);
+
+    zend_hash_next_index_insert(Z_ARRVAL_P(result), &match_pair, sizeof(zval *), NULL);
+}
+/* }}} */
+
 /* {{{ php_pcre_match
  */
 static void php_pcre_match(INTERNAL_FUNCTION_PARAMETERS, int global)
@@ -335,6 +356,7 @@
     int matched;                /* Has anything matched */
     int i;
     int subpats_order_val = 0;  /* Integer value of subpats_order */
+    int offset_capture = 0;     /* If offsets should be captured */
     int g_notempty = 0;         /* If the match should not be empty */
     const char **stringlist;    /* Used to hold list of subpatterns */
     char *match;                /* The current match */
@@ -363,11 +385,17 @@
 
             /* Make sure subpats_order is a number */
             convert_to_long_ex(subpats_order);
-            subpats_order_val = Z_LVAL_PP(subpats_order);
-            if (subpats_order_val < PREG_PATTERN_ORDER ||
-                subpats_order_val > PREG_SET_ORDER) {
-                zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
-            }
+            offset_capture = (Z_LVAL_PP(subpats_order) & PREG_MATCH_OFFSET_CAPTURE);
+
+            if (global) {
+                subpats_order_val = (Z_LVAL_PP(subpats_order) & 1UL);
+
+                if ((subpats_order_val < PREG_PATTERN_ORDER) ||
+                    (subpats_order_val > PREG_SET_ORDER)) {
+                    zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
+                }
+            }
+
             break;
 
         default:
@@ -442,8 +470,13 @@
                 if (subpats_order_val == PREG_PATTERN_ORDER) {
                     /* For each subpattern, insert it into the appropriate array. */
                     for (i = 0; i < count; i++) {
-                        add_next_index_stringl(match_sets[i], (char *)stringlist[i],
-                                               offsets[(i<<1)+1] - offsets[i<<1], 1);
+                        if (offset_capture) {
+
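The way the patch packs two settings into one integer parameter can be sketched in isolation. The constants below mirror the patch only by assumption: the archived copy lost its shift operators, so the value (1 << 2) for the offset flag is inferred, and the SKETCH_* names, match_flags and decode_match_flags are hypothetical.

```c
/* Assumed values: order lives in bit 0, the offset flag in bit 2. */
#define SKETCH_PATTERN_ORDER   0
#define SKETCH_SET_ORDER       1
#define SKETCH_OFFSET_CAPTURE  (1 << 2)

typedef struct {
    int order;           /* SKETCH_PATTERN_ORDER or SKETCH_SET_ORDER */
    int offset_capture;  /* non-zero when (match, offset) pairs are wanted */
} match_flags;

/* Splits the combined parameter into its two independent settings. */
static match_flags decode_match_flags(long raw)
{
    match_flags f;

    f.offset_capture = (raw & SKETCH_OFFSET_CAPTURE) != 0;
    f.order = (int)(raw & 1L);   /* mask off everything but the order bit */
    return f;
}
```

Keeping the order in the low bit means existing callers that pass only an order value decode exactly as before, while new callers can or the capture flag on top.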
[PHP-DEV] Re: [PATCH] Allow preg_split to capture offsets
On Thu, May 23, 2002 at 12:28:02PM -0500, Andrei Zmievski wrote:
| David,
|
| | Enclosed is a patch to allow PCRE's preg_split to return an array of
| | (match, offset) pairs, if PREG_SPLIT_OFFSET_CAPTURE is or'd into the
| | flags parameter. Submitted for inclusion, rejection, extensive flaming,
| | or suggestions. :)
|
| I've applied the patch with some modifications. Notably, when
| PREG_SPLIT_DELIM_CAPTURE was used along with this new flag, the
| delimiters were not being captured with offsets. I also abstracted the
| match pair addition into a separate (inlined) function.

Both of those were on my to-do list, but I figured I'd go ahead and post my 10-minute hack to gauge interest before moving forward.

Anyway, much thanks. :)

- Dave [EMAIL PROTECTED]
[PHP-DEV] [PATCH] Allow preg_split to capture offsets
Hi:

Enclosed is a patch to allow PCRE's preg_split to return an array of (match, offset) pairs, if PREG_SPLIT_OFFSET_CAPTURE is or'd into the flags parameter. Submitted for inclusion, rejection, extensive flaming, or suggestions. :)

This is a re-send of a previous patch; the last one didn't seem to make it to the list.

A bit of background: I'm currently working on a cross-referencing system that uses character offsets internally, matching entries in a word index to positions in a file. The system captures its word list via preg_split, excluding certain tags, character combinations, and whitespace from indexing.

Not finding an obvious way to capture the match offsets directly, I tried:

+ Rescanning the input string with strstr(), starting from position(last_match) + 1, looking for the current match. While reasonably fast at O(n), it has a major problem when the matched string was also a part of the delimiter.

+ A somewhat involved sequence of two preg_split() calls and an array_diff(). One split is PREG_SPLIT_DELIM_CAPTURE, and the array_diff finds which strings are delimiters. The resulting array is then scanned, keeping a running total of string lengths. This works, but has an obviously large memory (and to a lesser extent run-time) cost.

Alternatives (especially other plain PHP solutions) are welcome. Otherwise - is there more than a snowball's chance of something like this being included in a future release?

Thanks in advance,
- Dave [EMAIL PROTECTED]

--- php-4.2.1-dist/ext/pcre/php_pcre.c    Thu Feb 28 03:26:35 2002
+++ php-4.2.1/ext/pcre/php_pcre.c    Fri May 17 11:28:02 2002
@@ -37,6 +37,7 @@
 
 #define    PREG_SPLIT_NO_EMPTY         (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE    (1<<1)
+#define PREG_SPLIT_OFFSET_CAPTURE   (1<<2)
 
 #define PREG_REPLACE_EVAL           (1<<0)
 
@@ -100,6 +101,7 @@
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_GREP_INVERT", PREG_GREP_INVERT, CONST_CS | CONST_PERSISTENT);
     return SUCCESS;
 }
@@ -1080,8 +1082,10 @@
     int limit_val = -1;         /* Integer value of limit */
     int no_empty = 0;           /* If NO_EMPTY flag is set */
     int delim_capture = 0;      /* If delimiters should be captured */
+    int offset_capture = 0;     /* If offsets should be captured */
     int count = 0;              /* Count of matched subpatterns */
     int start_offset;           /* Where the new search starts */
+    int next_offset;            /* End of the last delimiter match + 1 */
     int g_notempty = 0;         /* If the match should not be empty */
     char *match,                /* The current match */
          *last_match;           /* Location of last match */
@@ -1102,6 +1106,7 @@
         convert_to_long_ex(flags);
         no_empty = Z_LVAL_PP(flags) & PREG_SPLIT_NO_EMPTY;
         delim_capture = Z_LVAL_PP(flags) & PREG_SPLIT_DELIM_CAPTURE;
+        offset_capture = Z_LVAL_PP(flags) & PREG_SPLIT_OFFSET_CAPTURE;
     }
 }
 
@@ -1123,6 +1128,7 @@
 
     /* Start at the beginning of the string */
     start_offset = 0;
+    next_offset = 0;
     last_match = Z_STRVAL_PP(subject);
     match = NULL;
 
@@ -1143,9 +1149,27 @@
         match = Z_STRVAL_PP(subject) + offsets[0];
 
         if (!no_empty || &Z_STRVAL_PP(subject)[offsets[0]] != last_match) {
-            /* Add the piece to the return value */
-            add_next_index_stringl(return_value, last_match,
-                                   &Z_STRVAL_PP(subject)[offsets[0]]-last_match, 1);
+
+            if (offset_capture) {
+                zval *match_pair;
+                ALLOC_ZVAL(match_pair);
+                array_init(match_pair);
+                INIT_PZVAL(match_pair);
+
+                /* Add (match, offset) to the return value */
+
[PHP-DEV] [PATCH] Allow preg_split to capture offsets
Hi:

Enclosed is a patch to allow PCRE's preg_split to return an array of (match, offset) pairs, if PREG_SPLIT_OFFSET_CAPTURE is or'd into the flags parameter. Submitted for inclusion, rejection, extensive flaming, or suggestions. :)

A bit of background: I'm currently working on a cross-referencing system that uses character offsets internally, matching entries in a word index to positions in a file. The system captures its word list via preg_split, excluding certain tags, character combinations, and whitespace from indexing.

Not finding an obvious way to capture the match offsets directly, I tried:

+ Rescanning the input string with strstr(), starting from position(last_match) + 1, looking for the current match. While reasonably fast at O(n), it has a major problem when the matched string was also a part of the delimiter.

+ A somewhat involved sequence of two preg_split() calls and an array_diff(). One split is PREG_SPLIT_DELIM_CAPTURE, and the array_diff finds which strings are delimiters. The resulting array is then scanned, keeping a running total of string lengths. This works, but has an obviously large memory (and to a lesser extent run-time) cost.

Alternatives (especially other plain PHP solutions) are welcome. Otherwise - is there more than a snowball's chance of something like this being included in a future release?

Thanks in advance,
- Dave [EMAIL PROTECTED]

--- php-4.2.1-dist/ext/pcre/php_pcre.c    Thu Feb 28 03:26:35 2002
+++ php-4.2.1/ext/pcre/php_pcre.c    Fri May 17 11:28:02 2002
@@ -37,6 +37,7 @@
 
 #define    PREG_SPLIT_NO_EMPTY         (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE    (1<<1)
+#define PREG_SPLIT_OFFSET_CAPTURE   (1<<2)
 
 #define PREG_REPLACE_EVAL           (1<<0)
 
@@ -100,6 +101,7 @@
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_GREP_INVERT", PREG_GREP_INVERT, CONST_CS | CONST_PERSISTENT);
     return SUCCESS;
 }
@@ -1080,8 +1082,10 @@
     int limit_val = -1;         /* Integer value of limit */
     int no_empty = 0;           /* If NO_EMPTY flag is set */
     int delim_capture = 0;      /* If delimiters should be captured */
+    int offset_capture = 0;     /* If offsets should be captured */
     int count = 0;              /* Count of matched subpatterns */
     int start_offset;           /* Where the new search starts */
+    int next_offset;            /* End of the last delimiter match + 1 */
     int g_notempty = 0;         /* If the match should not be empty */
     char *match,                /* The current match */
          *last_match;           /* Location of last match */
@@ -1102,6 +1106,7 @@
         convert_to_long_ex(flags);
         no_empty = Z_LVAL_PP(flags) & PREG_SPLIT_NO_EMPTY;
         delim_capture = Z_LVAL_PP(flags) & PREG_SPLIT_DELIM_CAPTURE;
+        offset_capture = Z_LVAL_PP(flags) & PREG_SPLIT_OFFSET_CAPTURE;
     }
 }
 
@@ -1123,6 +1128,7 @@
 
     /* Start at the beginning of the string */
     start_offset = 0;
+    next_offset = 0;
     last_match = Z_STRVAL_PP(subject);
     match = NULL;
 
@@ -1143,9 +1149,27 @@
         match = Z_STRVAL_PP(subject) + offsets[0];
 
         if (!no_empty || &Z_STRVAL_PP(subject)[offsets[0]] != last_match) {
-            /* Add the piece to the return value */
-            add_next_index_stringl(return_value, last_match,
-                                   &Z_STRVAL_PP(subject)[offsets[0]]-last_match, 1);
+
+            if (offset_capture) {
+                zval *match_pair;
+                ALLOC_ZVAL(match_pair);
+                array_init(match_pair);
+                INIT_PZVAL(match_pair);
+
+                /* Add (match, offset) to the return value */
+                add_next_index_stringl(match_pair, last_match,
+
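The idea behind the patch, recording where each piece starts while splitting rather than searching for it afterwards, can be shown with a stand-alone sketch. It swaps the regex for a fixed set of delimiter characters, so it is not the PCRE code path; piece and split_with_offsets are hypothetical names, and empty pieces are always skipped (roughly what the NO_EMPTY flag does).

```c
#include <stddef.h>
#include <string.h>

/* One split result: where the piece starts in the subject and how long it is. */
typedef struct {
    size_t offset;
    size_t length;
} piece;

/*
 * Splits subject on any character in delims, skipping empty pieces, and
 * records (offset, length) for each piece in out. Returns the number of
 * pieces stored, which is at most max_pieces.
 */
static size_t split_with_offsets(const char *subject, const char *delims,
                                 piece *out, size_t max_pieces)
{
    size_t count = 0;
    size_t pos = strspn(subject, delims);   /* skip leading delimiters */

    while (subject[pos] != '\0' && count < max_pieces) {
        size_t len = strcspn(subject + pos, delims);

        out[count].offset = pos;
        out[count].length = len;
        count++;

        pos += len;
        pos += strspn(subject + pos, delims);   /* skip the delimiter run */
    }
    return count;
}
```

Because the offset is taken from the scanner's own position, a piece that also happens to occur inside a delimiter run cannot be mislocated, which is the failure mode described for the strstr() rescanning approach.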