Re: [PHP-DEV] [WARNING] Release process for 4.3.2 starts RSN..

2003-03-09 Thread David Brown
On Sun, Mar 09, 2003 at 05:17:37PM +0100, Derick Rethans wrote:
| Hello,
| 
| I guess nobody is interested in fixing this? Then I guess we won't get 
| 4.3.2 ever.
| 
|  To get this thing started, I'm going to roll PHP 4.3.2-pre1
|  on Wednesday, 26th Feb, around 3pm EEST. And I'll announce
|  it on php-general too, to get some more people testing it
|  before we start with any RCs.
|  
|  Following is collection of bugs marked as critical and verified
|  which should be looked into and dealt with. If not fixed,
|  then please, PLEASE add some comment why they won't ever
|  be fixed and bogus them out.

Hi Derick:

I hate to be a pain, but would it be possible to get bug #22510
(http://bugs.php.net/bug.php?id=22510) looked at or assigned before
4.3.2 goes out the door? It's a refcount problem that leads to a
double-free; verified and everything.

I've been stepping through code on and off for the past week, but seeing
that I know little to nothing about Zend's internals, there's probably
lots of people who could take care of this much more easily.

P.S. Where should I send the Finlandia? ;-)

Thanks a lot,

- Dave
  [EMAIL PROTECTED]


--
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] pg_lo_open and object creation... intended behavior?

2003-03-05 Thread David Brown
Hi:

This is an excerpt from ext/pgsql/pgsql.c, in pg_lo_open:

---
if (strchr(mode_string, 'w') == mode_string) {
    pgsql_mode |= INV_WRITE;
    create = 1;
    if (strchr(mode_string, '+') == mode_string+1) {
        pgsql_mode |= INV_READ;
    }
}

pgsql_lofp = (pgLofp *) emalloc(sizeof(pgLofp));

if ((pgsql_lofd = lo_open(pgsql, oid, pgsql_mode)) == -1) {
    if (create) {
        if ((oid = lo_creat(pgsql, INV_READ|INV_WRITE)) == 0) {
            efree(pgsql_lofp);
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                "Unable to create PostgreSQL large object.");
            RETURN_FALSE;
        } else {
            if ((pgsql_lofd = lo_open(pgsql, oid, pgsql_mode)) == -1) {
                if (lo_unlink(pgsql, oid) == -1) {
                    efree(pgsql_lofp);
                    ...
}

  ... registers resource and returns ...
---

If I'm reading this correctly, it looks like a call to pg_lo_open (with
an object identifier specified explicitly in $oid) can theoretically
return a file handle pointing to a different, newly-created large
object in the case that the initial open failed.

If this is the case, it seems unintuitive and very cumbersome to handle
from user-space. Can we change the 'if (create) {' branch to only be
triggered when the oid was left unset (ensuring that the open() failure
actually gets back to the caller)?
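
The proposed behavior can be modelled outside of PHP. What follows is a
minimal sketch, not the extension's code: fake_lo_open and fake_lo_creat are
hypothetical stand-ins for libpq's lo_open and lo_creat, and
open_large_object and its oid_given flag are names invented for illustration.
It only shows an open failure on an explicitly named oid being reported
instead of a new object being substituted.

```c
#define INVALID_OID 0u

/* Hypothetical stand-in for lo_open(): in this model only oids 42 and 1000
 * can be opened. Returns a descriptor, or -1 on failure. */
static int fake_lo_open(unsigned int oid)
{
    return (oid == 42u || oid == 1000u) ? 3 : -1;
}

/* Hypothetical stand-in for lo_creat(): always "creates" oid 1000. */
static unsigned int fake_lo_creat(void)
{
    return 1000u;
}

/* Returns a descriptor, or -1 on failure. *oid changes only when the caller
 * did not name an object and a new one had to be created. */
static int open_large_object(unsigned int *oid, int oid_given, int create)
{
    if (oid_given) {
        /* Explicit oid: pass the failure back, never swap in another object. */
        return fake_lo_open(*oid);
    }
    if (!create) {
        return -1;
    }
    *oid = fake_lo_creat();
    if (*oid == INVALID_OID) {
        return -1;
    }
    return fake_lo_open(*oid);
}
```

With an explicit oid that cannot be opened, the caller gets -1 and the oid it
passed in is left untouched.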


Best Regards,

- Dave
  [EMAIL PROTECTED]




[PHP-DEV] Getting an external param into a userspace streams filter...

2003-02-27 Thread David Brown
Hi Wez, everyone:

Is there (or will there ever be) a good way to transmit an extra
parameter into a php_user_filter around the time that oncreate() is
called? I've run into a couple cases where it'd be incredibly useful
(e.g. for filters that don't modify the stream, but do have
side-effects).

Is there a known technical reason why this would be difficult or
impossible? If not, can I go ahead and submit a patch at some point? :)

Thanks,

- Dave
  [EMAIL PROTECTED]





Re: [PHP-DEV] Getting an external param into a userspace streams filter...

2003-02-27 Thread David Brown
On Thu, Feb 27, 2003 at 09:40:15AM -0500, David Brown wrote:
| Hi Wez, everyone:
| 
| Is there (or will there ever be) a good way to transmit an extra
| parameter into a php_user_filter around the time that oncreate() is
| called? I've run into a couple cases where it'd be incredibly useful
| (e.g. for filters that don't modify the stream, but do have
| side-effects).

bool stream_filter_append
  ( resource stream, string filtername [, string params])
^^^

Nevermind. Seems I can't read this morning. Sorry about that.

- Dave
  [EMAIL PROTECTED]




Re: [PHP-DEV] Getting an external param into a userspace streams filter...

2003-02-27 Thread David Brown
On Thu, Feb 27, 2003 at 09:42:24AM -0500, David Brown wrote:
| On Thu, Feb 27, 2003 at 09:40:15AM -0500, David Brown wrote:
| | Hi Wez, everyone:
| | 
| | Is there (or will there ever be) a good way to transmit an extra
| | parameter into a php_user_filter around the time that oncreate() is
| | called? I've run into a couple cases where it'd be incredibly useful
| 
| Nevermind. Seems I can't read this morning. Sorry about that.

Well, perhaps I jumped to self-chastisement a bit too quickly. There's a
single string parameter that can be passed to stream_filter_append /
stream_filter_prepend, but how would one retrieve this information once
inside of the filter? Moreover, for passing arbitrary zvals in, would I
have to resort to some hackery using array keys from a global, or is there
an easier way?

Thanks in advance,

- Dave 'Replies to self way too many times'
  [EMAIL PROTECTED]





Re: [PHP-DEV] Getting an external param into a userspace streams filter...

2003-02-27 Thread David Brown
Hi Wez:

On Thu, Feb 27, 2003 at 04:26:40PM +, Wez Furlong wrote:
| Hi David,
| 
| user filters are in a little bit of flux atm.
| 
| However, the idea is that the param argument will be altered to be a
| zval (rather than just a string).
| 
| In the oncreate() method, the following member variables are available
| to the filter:
| 
| string $this->filtername; // name of the filter
| mixed $this->params;  // params passed from prepend/append func

Excellent. :) I did finally find where $this->params gets set (in
ext/standard/user_filters.c:242), but I figured a third reply to myself
would probably just be embarrassing. ;-)


| Hope this helps; please only use current CVS for PHP 5 for playing
| around with this stuff; if you run into problems let me (and Sara
| [EMAIL PROTECTED]) know and we can sort them out.

Will do. I'm currently wrapping the experimental streams stuff in a
class, falling back to a user-space streams/filter implementation when
the PHP 5 one isn't available.

I'm currently using the oncreate() method (of my wrapper class) to
transmit the parameters to the filter; I just wanted to make sure that
this wouldn't bite me when I tried to slide the PHP filter
implementation underneath it.

Anyway, much thanks.

- Dave
  [EMAIL PROTECTED]






[PHP-DEV] Of string constants, bytecode, and concatenation

2003-02-26 Thread David Brown
Hi everyone:

This may well be a stupid question, but I've spent enough time staring
blankly at zend_compile.c/zend_execute.c that I figured it was time to
ask. :)

Say I have a section of code like this:

<?php
  $s1 = 'foo' . 'bar' . 'baz';
  $s2 = 'foobarbaz';
?>

In the PHP bytecode (I hope I'm using the right terminology - I mean the
stuff in the opline; the stuff that gets stored in the cache under APC
or Zend cache), is there any functional difference between the
assignment to $s1 and the assignment to $s2? Or, to put it more
precisely, is Zend currently able to figure out that the strings on both
sides of the concatenation operator are constants, and don't need to be
concatenated at runtime?

If not, can anyone explain the barriers to doing something like this?
I'm attempting to learn a bit of the gory details of the interpreter, so
the more pointers into the source anyone can provide, the better off
I'll be.


Thanks a lot,

- Dave
  [EMAIL PROTECTED]





Re: [PHP-DEV] Of string constants, bytecode, and concatenation

2003-02-26 Thread David Brown
On Wed, Feb 26, 2003 at 05:36:54PM +0100, Derick Rethans wrote:
| No, the engine doesn't do this at compile time. This first one produces:
| 
| number of ops:  5
| line     #  op        fetch  ext  operands
| --------------------------------------------
|    1     0  CONCAT                ~1, 'foo', 'bar'
|          1  CONCAT                ~2, ~1, 'baz'
|          2  FETCH_W   local       $0, 's1'
|          3  ASSIGN                $3, $0, ~2
|    3     4  RETURN                1
| 
| The second one:
| 
| line     #  op        fetch  ext  operands
| --------------------------------------------
|    1     0  FETCH_W   local       $0, 's2'
|          1  ASSIGN                $1, $0, 'foobarbaz'
|    2     2  RETURN                1

Is that output a ZEND_DEBUG thing, or is that an external tool?


|  If not, can anyone explain the barriers to doing something like this?
| 
| It's the job of an optimizer, not of a compiler. And because PHP doesn't 
| have an internal optimizer, this is not optimized out. You can either 

Okay. Makes complete sense. I was thinking more along the lines of
"wouldn't it be nice if...?". I hadn't quite made it to "where would
that belong?". :)

I'll check out the optimizers. I noticed that the new CVS version of APC
seems to have a configuration option for optimization as well, though
I'm not sure how far along it is.
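
For what it's worth, the folding being discussed can be sketched on a toy
instruction list. This is an illustration only, not Zend's or APC's code: the
operand and concat_op structures and the fold_concats function are invented
here, and a real optimizer would also have to check that a temporary is not
read anywhere else before dropping the instruction that produced it.

```c
#include <string.h>

#define MAX_STR 64

typedef struct {
    int  is_const;         /* 1: `value` holds a string constant */
    char value[MAX_STR];   /* the constant, when is_const is 1 */
    int  tmp;              /* temporary number, when is_const is 0 */
} operand;

typedef struct {
    int     result;        /* temporary that receives op1 . op2 */
    operand op1, op2;
    int     folded;        /* 1 once `value` holds the precomputed result */
    char    value[MAX_STR];
} concat_op;

/* Turn a reference to temporary `tmp` into the constant `value`. */
static void replace_tmp(operand *o, int tmp, const char *value)
{
    if (!o->is_const && o->tmp == tmp) {
        o->is_const = 1;
        strcpy(o->value, value);
    }
}

/* Precompute every CONCAT whose operands are both constants and propagate
 * the result to later instructions. Returns the number of CONCATs folded. */
static int fold_concats(concat_op *ops, int n)
{
    int i, j, folded = 0;

    for (i = 0; i < n; i++) {
        concat_op *op = &ops[i];

        if (!op->op1.is_const || !op->op2.is_const) {
            continue;
        }
        if (strlen(op->op1.value) + strlen(op->op2.value) >= MAX_STR) {
            continue;   /* too long for this toy buffer; leave it for run time */
        }
        strcpy(op->value, op->op1.value);
        strcat(op->value, op->op2.value);
        op->folded = 1;
        folded++;

        /* Later instructions that read this temporary now see a constant. */
        for (j = i + 1; j < n; j++) {
            replace_tmp(&ops[j].op1, op->result, op->value);
            replace_tmp(&ops[j].op2, op->result, op->value);
        }
    }
    return folded;
}
```

Fed the two CONCATs from the first dump, both fold, and the second one ends
up holding 'foobarbaz', which is what the single ASSIGN in the second dump
uses directly.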


Thanks again,

- Dave
  [EMAIL PROTECTED]




[PHP-DEV] mb_string overloading and binary data...

2003-02-19 Thread David Brown
Hi:

This is kind of a user-space question, but I'm hoping that it concerns
enough of the PHP infrastructure (conceptually) that this is the right
place to post it.

I've got an application that processes both textual and binary data. The
domain of the application isn't really relevant, but it makes liberal
use of string and pcre functions.

Ignoring the fact (?) that php_pcre doesn't seem to be mb-aware...
Say I were to turn on the mb_string overload support, effectively
replacing strlen, etc. with equivalent multibyte functions. Is there any
way then to still get the exact size, in bytes, of a 'string' of binary
data?

In general, I guess the question is 'Is there a preferred way of
handling binary data in memory, while remaining multibyte-safe?'
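
The distinction at the heart of the question is byte count versus character
count. A minimal sketch in C (not PHP, and not how mbstring is implemented),
assuming well-formed UTF-8 input; utf8_char_count is a name invented for this
example:

```c
#include <stddef.h>

/* Count UTF-8 code points in a buffer of `len` bytes by counting every byte
 * that is not a continuation byte (bit pattern 10xxxxxx). The byte length is
 * passed in explicitly, so embedded NUL bytes do not cut the scan short. */
static size_t utf8_char_count(const unsigned char *s, size_t len)
{
    size_t i, count = 0;

    for (i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the six bytes of "h\xC3\xA9llo" this returns 5, while the byte length
stays 6. Binary data only ever wants the second number, which is why an
overloaded strlen is a problem for it.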

Apologies if this is way O/T or has already been beaten to death...

- Dave
  [EMAIL PROTECTED]






Re: [PHP-DEV] Current HEAD segfaults with Horde/CHORA

2003-02-11 Thread David Brown
On Tue, Feb 11, 2003 at 01:18:20PM +0100, Derick Rethans wrote:
| On Tue, 11 Feb 2003, Sebastian Bergmann wrote:
| 
|  Derick Rethans wrote:
|Be that as it may, but it still shouldn't segfault, no? ;-)
| 
| recursive function calls always segfault, just like:
| 
| <?php function a() { a(); }; a(); ?>
| 
| so it's 'expected behavior'.


I assume the crash on infinite recursion is a stack-overflow type thing,
but is there any reason that doesn't trigger the 'Allowed memory
exhausted' and exit cleanly?

Just curious... :)


Thanks,

- Dave
  [EMAIL PROTECTED]






Re: [PHP-DEV] Current HEAD segfaults with Horde/CHORA

2003-02-11 Thread David Brown
On Tue, Feb 11, 2003 at 03:17:53PM -0500, Derick Rethans wrote:
|  David Brown wrote:
| 
|  I assume the crash on infinite recursion is a stack-overflow type thing,
|  but is there any reason that doesn't trigger the 'Allowed memory
|  exhausted' and exit cleanly?
| 
| Just curious... :)

| efficiency :) Adding checks for this will be 1) inaccurate, 2) slow and
| that's enough not to do them :)


Hi Derick:

Sorry about leaving php-dev off of the Cc - your reply didn't make it
back to the php-dev list, though you did indeed respond. :)

A couple of followup questions:

  1. The crashes I'm able to catch with <?php function a(){a();}a(); ?>
     stop GDB at zend_execute line 1489:
       zend_ptr_stack_n_push(EG(arg_types_stack), 2, EX(fbc), EX(object).ptr);

     However, zend_ptr_stack_n_push seems to handle growth of the stack,
     and there aren't pointer dereferences anywhere else. Am I looking
     at the wrong section of code? The wrong stack, maybe? :)

  2. Is the PHP stack size configurable, either at run-time or at
     compile-time? (That is, assuming it's defined by PHP and not a
     resource limitation/setting in the OS).
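
If the limit being hit turns out to be the operating system's stack limit for
the process (an assumption; nothing in this thread confirms it), it can at
least be inspected on POSIX systems with getrlimit. A minimal sketch;
soft_stack_limit is a name made up for the example:

```c
#include <sys/resource.h>

/* Returns the soft stack limit in bytes, or 0 if it cannot be read.
 * An unlimited stack is reported as RLIM_INFINITY, a very large value. */
static rlim_t soft_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return 0;
    }
    return rl.rlim_cur;
}
```

On most shells the same number (in kilobytes) is what "ulimit -s" reports.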


Thanks in advance,

- Dave
  [EMAIL PROTECTED]

 





[PHP-DEV] Capturing headers with output buffering?

2002-11-24 Thread David Brown
Hi:

Architecturally speaking, is there any simple way to modify an sapi
backend to return HTTP headers through the output buffering mechanism?

As far as I can tell, headers are managed separately by main/output.c,
with php_ub_body_write_no_header being substituted in once the HTTP
headers are sent.
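
The substitution described here, as I read it, is a write function that sends
the headers once and then swaps itself out for a body-only writer. A minimal
sketch of that pattern follows; every name in it (emit, body_write,
write_with_headers, write_body_only, out_buf) is hypothetical and none of it
is taken from main/output.c.

```c
#include <string.h>

#define OUT_MAX 256

static char   out_buf[OUT_MAX];   /* stands in for the client connection */
static size_t out_len;

/* Append a string to the output, silently dropping it if it does not fit. */
static void emit(const char *s)
{
    size_t n = strlen(s);

    if (out_len + n < OUT_MAX) {
        memcpy(out_buf + out_len, s, n + 1);
        out_len += n;
    }
}

static void write_body_only(const char *s);
static void write_with_headers(const char *s);

/* The current writer; it starts as the variant that sends headers first. */
static void (*body_write)(const char *s) = write_with_headers;

static void write_body_only(const char *s)
{
    emit(s);
}

static void write_with_headers(const char *s)
{
    emit("Content-Type: text/html\r\n\r\n");
    body_write = write_body_only;   /* headers go out exactly once */
    emit(s);
}
```

In this model the headers never pass through the same path as the body
chunks, which is why a callback that only sees body writes would never see
them.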

Pointers to anything would be greatly appreciated.

TIA,
- Dave






Re: [PHP-DEV] Capturing headers with output buffering?

2002-11-24 Thread David Brown
Hi George:

It's something that's probably better solved in user-space, but I
figured I'd poke around anyway. :)

I'm attempting to write a little prefork HTTP server entirely in PHP.
The script instantiates an 'application class', which is persistent
across requests. Output of the application is captured with an ob_*
callback function, and then stuffed down a socket. I'm hoping for free
in-memory opcode caching and database connection persistence (by virtue
of recycling the same interpreter across multiple requests), and
possibly the elimination of a lot of application-specific startup time.

Of course, this whole thing could very well just be a bad idea. :)

Anyway, headers aren't currently included in the buffered output, which
causes the header() function to print to stdout, effectively doing
nothing. I could just wrap header() with a user-space function, but that
would prevent a lot of scripts from running as-is.

Bad idea? Maybe. There's also the matter of getting it to parse
POST/GET without completely reinventing the wheel...

- Dave


On Sun, Nov 24, 2002 at 05:57:33PM -0500, George Schlossnagle wrote:
| What are you trying to accomplish?
| 
| 
| On Sunday, November 24, 2002, at 05:40 PM, David Brown wrote:
| 
| Hi:
| 
| Architecturally speaking, is there any simple way to modify an sapi
| backend to return HTTP headers through the output buffering mechanism?
| 
| As far as I can tell, headers are managed separately by main/output.c,
| with php_ub_body_write_no_header being substituted in once the HTTP
| headers are sent.
| 
| Pointers to anything would be greatly appreciated.
| 
| TIA,
| - Dave
| 
| 
| -- 
| PHP Development Mailing List http://www.php.net/
| To unsubscribe, visit: http://www.php.net/unsub.php
| 
| 
| 





Re: [PHP-DEV] Proto void and return values...

2002-11-12 Thread David Brown
On Tue, Nov 12, 2002 at 02:16:41PM -0500, David Brown wrote:
| Hi everyone:
| 
| For functions prototyped as returning void, return values seem to be applied
| at random. Some functions, such as trigger_error/user_error, srand, ob_start,
| and phpinfo, use RETURN_TRUE. The vast majority of these functions just fall
| through, implicitly returning NULL to userland.

Or perhaps I've just thought about this entirely too long. Is it possible
that the prototypes are just wrong in the documentation?

Regards,

- Dave
  [EMAIL PROTECTED]






Re: [PHP-DEV] Re: Bug #18547 Updated: Remote attacker can cause SIGSEGV (fwd)

2002-07-25 Thread David Brown

On Wed, Jul 24, 2002 at 01:37:12PM -0700, Thomas Cannon wrote:
> -- Forwarded message --
> Date: Wed, 24 Jul 2002 16:12:06 -0400 (EDT)
> From: Dan Kalowsky [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Bug #18547 Updated: Remote attacker can cause SIGSEGV
> 
> Please send it to [EMAIL PROTECTED]
> 
> (Okay, that's easy enough -- I posted this in the web form, but it
> wrapped all to hell. Thanks for the email address, Mr. Kalowsky)
> 
> Hello. While working on an exploit for the multipart_buffer_headers() hole
> that you just fixed, I found another problem that you might want to
> look into. It looks like a DoS only, but there might be a way to execute
> arbitrary code and I just haven't found it yet. Credit for the find goes
> to myself and members of the [0dd] 0-Day Digest.

FWIW, I was able to reproduce the SEGV, one per connection, on a Linux
2.4.18 server here.

- Dave
  [EMAIL PROTECTED]






[PHP-DEV] Variable expansion in user-space

2002-07-16 Thread David Brown

Hi guys:

I've looked through the documentation and didn't see anything like this;
please do let me know if this has been implemented/discussed previously.

I'm looking for a fast mechanism to do 'user-space' string expansion.
That is, given a key/value set $a and a string $b, I'd like for every
occurrence of $key in $b to be expanded to $a[$key]. This in and of itself
is easy to do with preg_replace_callback() or eval().

Both of the above methods have their disadvantages, though (eval's being
the possibility of escape-style attacks, preg_replace_callback's being
having to fire up the regex engine). 

I'd like to propose a function var_expand($str [, $namespace]). It'd be
capable of using PHP's built-in variable expansion code in a controlled
manner - saving the need to do ad-hoc string replacement or eval()
blocks. I'm willing to bet it'd also be fast.
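
To make the idea concrete, here is a rough standalone sketch in C of the
expansion step. It is not a patch against PHP and does not use the engine's
own variable-expansion code; the kv_pair type, the lookup helper and the
signature of var_expand are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
} kv_pair;

/* Find the value for a name of `name_len` bytes, or NULL if it is unknown. */
static const char *lookup(const kv_pair *vars, size_t nvars,
                          const char *name, size_t name_len)
{
    size_t i;

    for (i = 0; i < nvars; i++) {
        if (strlen(vars[i].key) == name_len &&
            strncmp(vars[i].key, name, name_len) == 0) {
            return vars[i].value;
        }
    }
    return NULL;
}

/* Copy `in` to `out`, replacing each $name (letters, digits, underscore)
 * found in `vars`. Unknown names are copied through unchanged.
 * Returns 0 on success, -1 if `out` is too small. */
static int var_expand(const char *in, const kv_pair *vars, size_t nvars,
                      char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0) {
        return -1;
    }
    while (*in != '\0') {
        const char *chunk = in;   /* default: copy one input byte */
        size_t chunk_len = 1;

        if (*in == '$') {
            size_t name_len = 0;
            const char *val;

            while (isalnum((unsigned char)in[1 + name_len]) ||
                   in[1 + name_len] == '_') {
                name_len++;
            }
            val = name_len ? lookup(vars, nvars, in + 1, name_len) : NULL;
            if (val != NULL) {
                chunk = val;
                chunk_len = strlen(val);
                in += name_len;   /* the '$' itself is consumed below */
            }
        }
        if (o + chunk_len >= out_size) {
            return -1;
        }
        memcpy(out + o, chunk, chunk_len);
        o += chunk_len;
        in++;
    }
    out[o] = '\0';
    return 0;
}
```

Substituted values are copied verbatim and never rescanned, so a value that
itself contains '$' cannot trigger further expansion. That is the property
that makes this safer than eval().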

If I'm barking up the wrong tree or am completely missing something,
please let me know. Barring any immediate vetoes, would it be appropriate
if I prepared a patch later this week?


Thanks in advance,
- Dave
  [EMAIL PROTECTED]






[PHP-DEV] [PATCH] preg_match(_all) support for capturing offsets

2002-06-06 Thread David Brown

Hi Andrei, all:

(This is a re-send of a previous message that received no replies; my
apologies if you've already seen/considered it)

In a previous patch
(http://news.php.net/article.php?group=php.dev&article=84281), support
was added to preg_split for capturing offsets along with matches. The
attached patch adds similar support to preg_match and preg_match_all via
a new PREG_MATCH_OFFSET_CAPTURE flag.

The code handles capturing offsets for both subpattern matches and whole
pattern matches, using the previously-added add_offset_pair helper function.

The flag is a new fourth (and optional) parameter for preg_match, and
is OR'd into the existing 'order' parameter for preg_match_all, above
PREG_SET_ORDER and PREG_PATTERN_ORDER.
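
Put another way, the one integer parameter carries two things. A small
standalone sketch of the decoding, assuming the offset flag is bit 2 (1<<2),
which is how the accompanying patch appears to define it, and that the order
lives in bit 0; match_flags and decode_match_flags are names invented for the
example:

```c
/* Values as assumed from the accompanying patch. */
#define PREG_PATTERN_ORDER         0
#define PREG_SET_ORDER             1
#define PREG_MATCH_OFFSET_CAPTURE  (1<<2)

typedef struct {
    int order;            /* PREG_PATTERN_ORDER or PREG_SET_ORDER */
    int offset_capture;   /* 1 if offsets were requested, else 0 */
} match_flags;

/* Split the combined parameter into its two parts: the order is masked out
 * of bit 0, the capture flag is tested in bit 2. */
static match_flags decode_match_flags(long combined)
{
    match_flags f;

    f.order = (int)(combined & 1L);
    f.offset_capture = (combined & PREG_MATCH_OFFSET_CAPTURE) != 0;
    return f;
}
```

So a caller passing PREG_SET_ORDER | PREG_MATCH_OFFSET_CAPTURE gets set
ordering with offsets, and existing callers that pass only an order value are
unaffected.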

The patch below is diffed against the CVS head - humbly submitted for
application, rejection, suggestions, or extensive flaming. :)


Thanks in advance,

- Dave
  [EMAIL PROTECTED]


--- ext/pcre/php_pcre.c.orig    Tue Jun  4 13:02:50 2002
+++ ext/pcre/php_pcre.c Tue Jun  4 13:12:10 2002
@@ -35,7 +35,9 @@
 #define PREG_PATTERN_ORDER         0
 #define PREG_SET_ORDER             1
 
-#define PREG_SPLIT_NO_EMPTY        (1<<0)
+#define PREG_MATCH_OFFSET_CAPTURE  (1<<2)
+
+#define PREG_SPLIT_NO_EMPTY        (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE   (1<<1)
 #define PREG_SPLIT_OFFSET_CAPTURE  (1<<2)
 
@@ -99,6 +101,7 @@
 
     REGISTER_LONG_CONSTANT("PREG_PATTERN_ORDER", PREG_PATTERN_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_MATCH_OFFSET_CAPTURE", PREG_MATCH_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
@@ -310,6 +313,24 @@
 }
 /* }}} */
 
+/* {{{ add_offset_pair
+ */
+static inline void add_offset_pair(zval *result, char *str, int len, int offset)
+{
+    zval *match_pair;
+
+    ALLOC_ZVAL(match_pair);
+    array_init(match_pair);
+    INIT_PZVAL(match_pair);
+
+    /* Add (match, offset) to the return value */
+    add_next_index_stringl(match_pair, str, len, 1);
+    add_next_index_long(match_pair, offset);
+
+    zend_hash_next_index_insert(Z_ARRVAL_P(result), &match_pair, sizeof(zval *), NULL);
+}
+/* }}} */
+
 /* {{{ php_pcre_match
  */
 static void php_pcre_match(INTERNAL_FUNCTION_PARAMETERS, int global)
@@ -335,6 +356,7 @@
     int              matched;               /* Has anything matched */
     int              i;
     int              subpats_order_val = 0; /* Integer value of subpats_order */
+    int              offset_capture = 0;    /* If offsets should be captured */
     int              g_notempty = 0;        /* If the match should not be empty */
     const char     **stringlist;            /* Used to hold list of subpatterns */
     char            *match;                 /* The current match */
@@ -363,11 +385,17 @@
 
             /* Make sure subpats_order is a number */
             convert_to_long_ex(subpats_order);
-            subpats_order_val = Z_LVAL_PP(subpats_order);
-            if (subpats_order_val < PREG_PATTERN_ORDER ||
-                subpats_order_val > PREG_SET_ORDER) {
-                zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
-            }
+            offset_capture = (Z_LVAL_PP(subpats_order) & PREG_MATCH_OFFSET_CAPTURE);
+
+            if (global) {
+                subpats_order_val = (Z_LVAL_PP(subpats_order) & 1UL);
+
+                if ((subpats_order_val < PREG_PATTERN_ORDER) ||
+                    (subpats_order_val > PREG_SET_ORDER)) {
+                    zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
+                }
+            }
+
             break;
 
         default:
@@ -442,8 +470,13 @@
                 if (subpats_order_val == PREG_PATTERN_ORDER) {
                     /* For each subpattern, insert it into the appropriate array. */
                     for (i = 0; i < count; i++) {
-                        add_next_index_stringl(match_sets[i], (char *)stringlist[i],
-

[PHP-DEV] [PATCH] preg_match(_all) support for capturing offsets

2002-06-04 Thread David Brown

Hi Andrei, all:

In a previous patch
(http://news.php.net/article.php?group=php.dev&article=84281), support
was added to preg_split for capturing offsets along with matches. The
attached patch adds similar support to preg_match and preg_match_all via
a new PREG_MATCH_OFFSET_CAPTURE flag.

The code handles capturing offsets for both subpattern matches and whole
pattern matches, using the previously-added add_offset_pair helper function.

The flag is a new fourth (and optional) parameter for preg_match, and
is OR'd into the existing 'order' parameter for preg_match_all, above
PREG_SET_ORDER and PREG_PATTERN_ORDER.

The patch below is diffed against the CVS head - humbly submitted for
application, rejection, suggestions, or extensive flaming. :)


Thanks in advance,

- Dave
  [EMAIL PROTECTED]


--- ext/pcre/php_pcre.c.orig    Tue Jun  4 13:02:50 2002
+++ ext/pcre/php_pcre.c Tue Jun  4 13:12:10 2002
@@ -35,7 +35,9 @@
 #define PREG_PATTERN_ORDER         0
 #define PREG_SET_ORDER             1
 
-#define PREG_SPLIT_NO_EMPTY        (1<<0)
+#define PREG_MATCH_OFFSET_CAPTURE  (1<<2)
+
+#define PREG_SPLIT_NO_EMPTY        (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE   (1<<1)
 #define PREG_SPLIT_OFFSET_CAPTURE  (1<<2)
 
@@ -99,6 +101,7 @@
 
     REGISTER_LONG_CONSTANT("PREG_PATTERN_ORDER", PREG_PATTERN_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_MATCH_OFFSET_CAPTURE", PREG_MATCH_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
@@ -310,6 +313,24 @@
 }
 /* }}} */
 
+/* {{{ add_offset_pair
+ */
+static inline void add_offset_pair(zval *result, char *str, int len, int offset)
+{
+    zval *match_pair;
+
+    ALLOC_ZVAL(match_pair);
+    array_init(match_pair);
+    INIT_PZVAL(match_pair);
+
+    /* Add (match, offset) to the return value */
+    add_next_index_stringl(match_pair, str, len, 1);
+    add_next_index_long(match_pair, offset);
+
+    zend_hash_next_index_insert(Z_ARRVAL_P(result), &match_pair, sizeof(zval *), NULL);
+}
+/* }}} */
+
 /* {{{ php_pcre_match
  */
 static void php_pcre_match(INTERNAL_FUNCTION_PARAMETERS, int global)
@@ -335,6 +356,7 @@
     int              matched;               /* Has anything matched */
     int              i;
     int              subpats_order_val = 0; /* Integer value of subpats_order */
+    int              offset_capture = 0;    /* If offsets should be captured */
     int              g_notempty = 0;        /* If the match should not be empty */
     const char     **stringlist;            /* Used to hold list of subpatterns */
     char            *match;                 /* The current match */
@@ -363,11 +385,17 @@
 
             /* Make sure subpats_order is a number */
             convert_to_long_ex(subpats_order);
-            subpats_order_val = Z_LVAL_PP(subpats_order);
-            if (subpats_order_val < PREG_PATTERN_ORDER ||
-                subpats_order_val > PREG_SET_ORDER) {
-                zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
-            }
+            offset_capture = (Z_LVAL_PP(subpats_order) & PREG_MATCH_OFFSET_CAPTURE);
+
+            if (global) {
+                subpats_order_val = (Z_LVAL_PP(subpats_order) & 1UL);
+
+                if ((subpats_order_val < PREG_PATTERN_ORDER) ||
+                    (subpats_order_val > PREG_SET_ORDER)) {
+                    zend_error(E_WARNING, "Wrong value for parameter 4 in call to preg_match_all()");
+                }
+            }
+
             break;
 
         default:
@@ -442,8 +470,13 @@
                 if (subpats_order_val == PREG_PATTERN_ORDER) {
                     /* For each subpattern, insert it into the appropriate array. */
                     for (i = 0; i < count; i++) {
-                        add_next_index_stringl(match_sets[i], (char *)stringlist[i],
-                                               offsets[(i<<1)+1] - offsets[i<<1], 1);
+                        if (offset_capture) {
+

[PHP-DEV] Re: [PATCH] Allow preg_split to capture offsets

2002-05-23 Thread David Brown

On Thu, May 23, 2002 at 12:28:02PM -0500, Andrei Zmievski wrote:
> David,
> 
> > Enclosed is a patch to allow PCRE's preg_split to return an array of
> > (match, offset) pairs, if PREG_SPLIT_OFFSET_CAPTURE is or'd into the
> > flags parameter. Submitted for inclusion, rejection, extensive flaming,
> > or suggestions. :)
> 
> I've applied the patch with some modifications. Notably, when
> PREG_SPLIT_DELIM_CAPTURE was used along with this new flag, the delimiters
> were not being captured with offsets. I also abstracted the match pair
> addition into a separate (inlined) function.

Both of those were on my to do list, but I figured I'd go ahead and post
my 10-minute hack to gauge interest before moving forward.

Anyway, much thanks. :)


- Dave
  [EMAIL PROTECTED]





[PHP-DEV] [PATCH] Allow preg_split to capture offsets

2002-05-22 Thread David Brown

Hi:

Enclosed is a patch to allow PCRE's preg_split to return an array of
(match, offset) pairs, if PREG_SPLIT_OFFSET_CAPTURE is or'd into the
flags parameter. Submitted for inclusion, rejection, extensive flaming,
or suggestions. :)

This is a re-send of a previous patch; the last one didn't seem to make
it to the list.


A bit of background:

I'm currently working on a cross-referencing system that uses character
offsets internally, matching entries in a word index to positions in a
file. The system captures its word list via preg_split, excluding
certain tags, character combinations, and whitespace from indexing.

Not finding an obvious way to capture the match offsets directly, I
tried:

  + Rescanning the input string with strstr(), starting from
position(last_match) + 1, looking for the current match. While
reasonably fast at O(n), it has a major problem when the matched
string was also a part of the delimiter.

  + A somewhat involved sequence of two preg_split() calls and an
array_diff(). One split is PREG_SPLIT_DELIM_CAPTURE, and the
array_diff finds which strings are delimiters. The resulting array
is then scanned, keeping a running total of string lengths. This
works, but has an obviously large memory (and to a lesser extent
run-time) cost.

Alternatives (especially other plain PHP solutions) are welcome.
Otherwise - is there more than a snowball's chance of something like
this being included in a future release?
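
For illustration, the information being asked for is just each piece together
with its starting offset. A self-contained sketch in plain C, splitting on
whitespace only rather than on a regular expression; the piece type and
split_with_offsets are names made up for this example and are not part of the
patch:

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
    size_t offset;   /* byte offset of the piece within the input */
    size_t length;   /* length of the piece in bytes */
} piece;

/* Split `s` on runs of whitespace, recording where each non-empty piece
 * starts. Stores at most `max` entries and returns the number stored. */
static size_t split_with_offsets(const char *s, piece *out, size_t max)
{
    size_t i = 0, n = 0;

    while (s[i] != '\0' && n < max) {
        size_t start;

        while (s[i] != '\0' && isspace((unsigned char)s[i])) {
            i++;   /* skip the delimiter run */
        }
        if (s[i] == '\0') {
            break;
        }
        start = i;
        while (s[i] != '\0' && !isspace((unsigned char)s[i])) {
            i++;
        }
        out[n].offset = start;
        out[n].length = i - start;
        n++;
    }
    return n;
}
```

The offsets fall out of the scan for free, which is the argument for having
preg_split report them rather than recomputing them afterwards in user-space.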


Thanks in advance,

- Dave
  [EMAIL PROTECTED]



--- php-4.2.1-dist/ext/pcre/php_pcre.c  Thu Feb 28 03:26:35 2002
+++ php-4.2.1/ext/pcre/php_pcre.c   Fri May 17 11:28:02 2002
@@ -37,6 +37,7 @@
 
 #define PREG_SPLIT_NO_EMPTY        (1<<0)
 #define PREG_SPLIT_DELIM_CAPTURE   (1<<1)
+#define PREG_SPLIT_OFFSET_CAPTURE  (1<<2)
 
 #define PREG_REPLACE_EVAL          (1<<0)
 
@@ -100,6 +101,7 @@
     REGISTER_LONG_CONSTANT("PREG_SET_ORDER", PREG_SET_ORDER, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_NO_EMPTY", PREG_SPLIT_NO_EMPTY, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_SPLIT_DELIM_CAPTURE", PREG_SPLIT_DELIM_CAPTURE, CONST_CS | CONST_PERSISTENT);
+    REGISTER_LONG_CONSTANT("PREG_SPLIT_OFFSET_CAPTURE", PREG_SPLIT_OFFSET_CAPTURE, CONST_CS | CONST_PERSISTENT);
     REGISTER_LONG_CONSTANT("PREG_GREP_INVERT", PREG_GREP_INVERT, CONST_CS | CONST_PERSISTENT);
     return SUCCESS;
 }
@@ -1080,8 +1082,10 @@
     int              limit_val = -1;        /* Integer value of limit */
     int              no_empty = 0;          /* If NO_EMPTY flag is set */
     int              delim_capture = 0;     /* If delimiters should be captured */
+    int              offset_capture = 0;    /* If offsets should be captured */
     int              count = 0;             /* Count of matched subpatterns */
     int              start_offset;          /* Where the new search starts */
+    int              next_offset;           /* End of the last delimiter match + 1 */
     int              g_notempty = 0;        /* If the match should not be empty */
     char            *match,                 /* The current match */
                     *last_match;            /* Location of last match */
@@ -1102,6 +1106,7 @@
             convert_to_long_ex(flags);
             no_empty = Z_LVAL_PP(flags) & PREG_SPLIT_NO_EMPTY;
             delim_capture = Z_LVAL_PP(flags) & PREG_SPLIT_DELIM_CAPTURE;
+            offset_capture = Z_LVAL_PP(flags) & PREG_SPLIT_OFFSET_CAPTURE;
         }
     }
 
@@ -1123,6 +1128,7 @@
 
     /* Start at the beginning of the string */
     start_offset = 0;
+    next_offset = 0;
     last_match = Z_STRVAL_PP(subject);
     match = NULL;
 
@@ -1143,9 +1149,27 @@
             match = Z_STRVAL_PP(subject) + offsets[0];
 
             if (!no_empty || &Z_STRVAL_PP(subject)[offsets[0]] != last_match) {
-                /* Add the piece to the return value */
-                add_next_index_stringl(return_value, last_match,
-                                       &Z_STRVAL_PP(subject)[offsets[0]]-last_match, 1);
+
+                if (offset_capture) {
+                    zval *match_pair;
+                    ALLOC_ZVAL(match_pair);
+                    array_init(match_pair);
+                    INIT_PZVAL(match_pair);
+
+                    /* Add (match, offset) to the return value */
+
