[PHP-DEV] feature proposal: string types (complete with a patch)

2002-05-01 Thread vdhome

Dear PHP developers,

I propose a feature that I call "string types". I have also already
coded a first version of it that you can try. There's a bug for it
here: http://bugs.php.net/?id=16480 and a homepage with a description
and a patch here: http://nebuchadnezzar.zion.cz/php_strings.php
Please be patient when downloading. The server is behind a 64k line.
:-(

About the feature: It introduces five types of strings: plain string,
SQL string, HTML string, URL (query) string and undefined (unknown
type) string. The difference is in escaping characters that have
special meaning in SQL (quotes, nul), HTML (ampersand, less-than,
greater-than, double-quote) and URL (nearly everything except plain
letters and digits). The conversion is done automatically when
requested. This language extension is fully backwards-compatible;
users who don't know about the new features (or don't want to know)
need not worry: their existing scripts should work the same without
any change. For users who do know about this and want to use it, I
believe this feature should significantly improve code readability,
reduce code size, and lower the probability of bugs.

I think that the best explanation is by example, so see this:

$data = p"a string with 'apostrophes', \"double-quotes\" etc.";
mysql_query(s"INSERT INTO table VALUES ('$data')");

Because we include a plain string in an SQL string, the plain string
is automatically converted to an SQL string, i.e. AddSlashes is
applied to it. Strings from GET/POST/COOKIE have the right type,
which makes it possible to easily write scripts that do not depend on
the setting of magic_quotes_gpc. (An SQL string included in another
SQL string is not converted, of course.)
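
The conversion rule can be approximated in ordinary PHP without the
patch. What follows is only a sketch under assumptions: the class
TypedString and the function sql_interpolate() are hypothetical names
made up for illustration (they are not part of the patch), the code
assumes PHP 7.4 or later, and addslashes() stands in for whatever
escaping the target database really needs.

```php
<?php
// Hypothetical sketch: a string value that remembers its "string type".
final class TypedString
{
    public string $value;
    public string $type; // 'plain' or 'sql' in this sketch

    public function __construct(string $value, string $type)
    {
        $this->value = $value;
        $this->type  = $type;
    }

    // Plain strings are escaped; strings that are already SQL pass through.
    public function toSql(): TypedString
    {
        if ($this->type === 'sql') {
            return $this;
        }
        return new TypedString(addslashes($this->value), 'sql');
    }
}

// Stand-in for s"... $data ...": each inserted value is converted first.
function sql_interpolate(string $template, TypedString ...$args): TypedString
{
    $parts = array_map(fn (TypedString $a) => $a->toSql()->value, $args);
    return new TypedString(vsprintf($template, $parts), 'sql');
}

$data  = new TypedString("it's", 'plain');
$query = sql_interpolate("INSERT INTO t VALUES ('%s')", $data);
echo $query->value, "\n"; // INSERT INTO t VALUES ('it\'s')
```

Passing $query back into sql_interpolate() leaves it unchanged, which
mirrors the "SQL string in SQL string" case above.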

Another one:

$data = p"a string with <less-than, >greater-than, &ampersand";
echo h"";

Here, the $data string is automatically HtmlSpecialChars'ed when
included in an HTML string.

Read more about it on the above mentioned homepage. Try it, test it,
tell me what you think about it! Just remember that this is alpha
code, and it is very little tested. I make no guarantees whatsoever,
except that it has bugs. :-)

Please cc me in any replies. I am not subscribed to the list (so in
fact, I don't know if it will allow me to post this). I realize that
this is not a good practice, but I couldn't handle the loads of mail,
and according to http://www.php.net/mailing-lists.php this list
isn't available in digest form. :-(

Thanks for your attention.

Vaclav Dvorak  ([EMAIL PROTECTED])
http://nebuchadnezzar.zion.cz/

-- 
PHP Development Mailing List 
To unsubscribe, visit: http://www.php.net/unsub.php




[PHP-DEV] Re: feature proposal: string types

2002-05-03 Thread vdhome

> Those who pay attention to my occasional ramblings may remember that I
> once suggested implementing "type hints", which is a more generic
> version of this.  Type hints is like your string types, except that they
> apply to any type.

I haven't been subscribed, so I don't remember. :-) I am certainly
open to a generalization of this, but I can't really imagine what it
means for other types than string - I'll try looking up your messages
in the archive.

For strings, I propose making an extensible (at runtime) registry of
string types (with a few built-in, at least those that I already
made); for each string type, the programmer would supply a function
that would convert it to a string type given as a parameter. The
additional performance hit (compared to my original proposal) shouldn't be too
great, since it would only occur when converting strings, which is a
process that the programmer would have to do anyway, just differently.
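
Such a registry might look roughly like the sketch below. All names
here (StringTypeRegistry, register(), convert()) are hypothetical and
exist only for illustration; the built-in functions addslashes(),
htmlspecialchars() and rawurlencode() are used as plausible stand-ins
for the conversions of the built-in types. PHP 7.4 or later is assumed.

```php
<?php
// Hypothetical runtime registry: one converter callback per source type.
final class StringTypeRegistry
{
    /** @var array<string, callable> */
    private array $converters = [];

    // $converter receives (string $value, string $targetType), returns a string.
    public function register(string $type, callable $converter): void
    {
        $this->converters[$type] = $converter;
    }

    public function convert(string $value, string $from, string $to): string
    {
        if ($from === $to) {
            return $value; // same type: nothing to convert
        }
        if (!isset($this->converters[$from])) {
            throw new InvalidArgumentException("Unknown string type: $from");
        }
        return ($this->converters[$from])($value, $to);
    }
}

$registry = new StringTypeRegistry();
$registry->register('plain', function (string $value, string $to): string {
    switch ($to) {
        case 'sql':  return addslashes($value);
        case 'html': return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
        case 'url':  return rawurlencode($value);
        default:
            throw new InvalidArgumentException("Cannot convert plain to $to");
    }
});

echo $registry->convert('a&b', 'plain', 'html'), "\n"; // a&amp;b
echo $registry->convert('a b', 'plain', 'url'), "\n";  // a%20b
```

The lookup only happens when a conversion is actually requested, which
is the point made above about where the extra cost would fall.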

> I guess the core issue here is whether adding an int to zval or
> zval.value.str is worth the cost.  With all the zval copying going on,
> there will be a cpu overhead as well as memory.

Well, I assumed that the overhead wouldn't be too significant, but I
admit I didn't do any measurements whatsoever. Is there any existing
good benchmark, or should I just loop a million times through a few
random lines of code and measure this?

> Back when Andi moved zval.value.strlen and zval.value.strval into
> the str struct, thus saving 4 bytes in the zval struct, PHP 3 was
> generally speeded up by 25%.

WHAT??? Well, since this would add them back again... :-(

> Not sure if the PHP 4 or PHP 5 overhead will be in the same
> ballpark, but it's something to think about.

Definitely. Although I love my idea, I guess I'd give it up myself if
it proves to be such a slow-down.

Of course, it could be made a compile-time configuration option, but
I don't think that's very useful. The programmer couldn't be sure
beforehand which version of the language his web-hoster will be
using... :-( (That is part of what I wanted to avoid with this
proposal: comfortably eliminating one dependence on configuration,
the magic_quotes_gpc option.)

> IMHO the syntax you suggest is a bit terse, what about this instead:
>
> echo url"http://$host:$port/$path";

Personally, I don't really care, and it's easy to change, so why not?
It would certainly need to allow for longer type names if it were to
be general, which I now want to implement.

> As for the SQL string type, there needs to be at least two, some
> databases quote "'" as "''", others quote it as "\'".

This is set by the configuration option magic_quotes_sybase, so I
don't think it's necessary (the web-hoster should set it depending on
which database they provide; the programmer shouldn't even need to
care about it). But there could be one SQL type which follows the
magic_quotes_sybase setting, and two others, one for each possibility.
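
The three variants could be sketched as plain functions. The function
names are hypothetical, and the boolean parameter $sybaseStyle is an
assumption that stands in for reading the magic_quotes_sybase setting.

```php
<?php
// Hypothetical helpers, one per quoting convention.
function sql_escape_backslash(string $value): string
{
    return addslashes($value);             // ' becomes \'
}

function sql_escape_doubled(string $value): string
{
    return str_replace("'", "''", $value); // ' becomes ''
}

// Third variant: follow a configuration flag instead of fixing the style.
function sql_escape(string $value, bool $sybaseStyle): string
{
    return $sybaseStyle
        ? sql_escape_doubled($value)
        : sql_escape_backslash($value);
}

echo sql_escape("O'Brien", true), "\n";  // O''Brien
echo sql_escape("O'Brien", false), "\n"; // O\'Brien
```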

> But I would like to extend the idea beyond reformatting inserted
> strings.  For example, for easy soap/xmlrpc serialization, being able to
> tag a value as a date or some other soap/xmlrpc-specific type is very
> useful.

I know nothing about soap/xmlrpc, but... How about making a reserved
method name for objects that would - if it exists - be called when
that object should be converted to a string?

class Date {
    function __to_string($type) {
        # return a string with $this formatted as $type string
    }
}
$date = new Date();
echo xml"$date";

H...??
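
Without language support, the same idea can be tried out by calling
the method explicitly. In this sketch the helper convert_for() and the
two output formats are assumptions chosen for illustration; only the
method name __to_string comes from the example above. PHP 7.4 or later
is assumed.

```php
<?php
// Hypothetical: an object that can render itself as a requested string type.
class Date
{
    private int $timestamp;

    public function __construct(int $timestamp)
    {
        $this->timestamp = $timestamp;
    }

    // The reserved-name idea; in this sketch it is called explicitly.
    public function __to_string(string $type): string
    {
        if ($type === 'xml') {
            return gmdate('Y-m-d\TH:i:s\Z', $this->timestamp);
        }
        return gmdate('Y-m-d', $this->timestamp);
    }
}

// Stand-in for what xml"$date" would do with an inserted value.
function convert_for(string $type, $value): string
{
    if (is_object($value) && method_exists($value, '__to_string')) {
        return $value->__to_string($type);
    }
    return (string) $value;
}

echo convert_for('xml', new Date(0)), "\n"; // 1970-01-01T00:00:00Z
```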

BTW, Python now has the possibility of inheriting classes from
built-in types, which would sure come in handy here...

BTW2, as I describe on the webpage that I announced, I wasn't able to
successfully run the self-test suite (make test). Could someone
please tell me what I am doing wrong (see the description at the
url)? Thanks.

The URL is, again: http://nebuchadnezzar.zion.cz/php_strings.php

Vaclav Dvorak  ([EMAIL PROTECTED])
http://nebuchadnezzar.zion.cz/





[PHP-DEV] Re: feature proposal: string types

2002-05-03 Thread vdhome

> > IMHO the syntax you suggest is a bit terse, what about this instead:
> >
> > echo url"http://$host:$port/$path";
>
> Looks perlish to me, I'd rather see a casting thing like this then:
>
> echo (url) "http://$host:$port/$path";

I was originally thinking about casting too, but although I'm sure it
could be done, I don't think it's good to have casting(-like) syntax
have the effect I proposed. You see, what I proposed is having a
different behaviour already on inserting the variables ($host, $port
and $path in the above example would be converted to url string type)
into the string. With casting, the behaviour would be to first make a
normal string, and then cast it as a whole, losing the information
about the inserted variables.
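
The difference shows up with existing functions. In this sketch
rawurlencode() stands in for the conversion to a URL string, and the
host example.com and the value of $path are made up for illustration.

```php
<?php
$path = 'a b/c?d';

// (1) Convert each inserted value, then build the string
//     (the behaviour proposed for url"...").
$perValue = 'http://example.com/' . rawurlencode($path);

// (2) Build a normal string first, then convert it as a whole
//     (what a cast would do).
$whole = rawurlencode('http://example.com/' . $path);

echo $perValue, "\n"; // http://example.com/a%20b%2Fc%3Fd
echo $whole, "\n";    // http%3A%2F%2Fexample.com%2Fa%20b%2Fc%3Fd
```

In case (2) the literal parts of the URL are escaped as well, because
nothing records which parts came from variables.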

> but then you just could make it a function (or a language construct):
>
> echo url("http://$host:$port/$path");

Same argument against a function; could be a language construct, but
I think it would be less intuitive - it would look like you first
make a normal string and then convert it, which is not the case in my
proposal.

> but this can break BC as those functions may be in use in scripts.

Yeah, unless we give them really long and ugly names. Not sure if
that's a good idea, though. ;-)

> > But I would like to extend the idea beyond reformatting inserted
> > strings.  For example, for easy soap/xmlrpc serialization, being able to
> > tag a value as a date or some other soap/xmlrpc-specific type is very
> > useful.
>
> It might, but remember that PHP is not a strong typed language; somehow it
> feels like this is not just PHP then.

PHP would not become a strongly typed language. It would be... perhaps
optionally sub-typed? :-) I wouldn't worry about this.

Vaclav Dvorak  ([EMAIL PROTECTED])
http://nebuchadnezzar.zion.cz/





[PHP-DEV] Re: feature proposal: string types

2002-05-04 Thread vdhome

> > Well, I assumed that the overhead wouldn't be too significant, but I
> > admit I didn't do any measurements whatsoever. Is there any existing
> > good benchmark, or should I just loop a million times through a few
> > random lines of code and measure this?
>
> Yup :)

OK, I did some _very_ simple measurements, and the results aren't too
bad. But is there really no benchmark that you PHP developers use
regularly to test your changes? What I did must surely be very
artificial, not reflecting real-world usage and performance very
much.

I loop 500,000 times through a block of random code that contains:
- assignment of strings and numbers to simple variables
- function call
- array access by numeric index and by string key
- an if condition
- a string concatenation
- some arithmetic operations
- and occasionally an echo statement
- no usage of the new string types
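
A loop with these ingredients could look roughly like the following.
This is an illustrative sketch only, not the script measured here; the
function names helper() and run_benchmark() and every statement in the
loop body are assumptions.

```php
<?php
// Illustrative micro-benchmark; all names and statements are assumptions.
function helper(int $x): int
{
    return $x + 1;
}

function run_benchmark(int $iterations): float
{
    $arr   = [0 => 1, 'key' => 2];
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $s = 'text';                // string assignment
        $n = 42;                    // number assignment
        $n = helper($n);            // function call
        $a = $arr[0] + $arr['key']; // numeric index and string key
        if ($a > 2) {               // an if condition
            $s = $s . 'more';       // string concatenation
        }
        $n = ($n * 3 + $i) % 7;     // some arithmetic
        if ($i % 100000 === 0) {
            echo '.';               // occasional echo
        }
    }
    return microtime(true) - $start; // elapsed wall-clock seconds
}

$seconds = run_benchmark(500000);
printf("\n%.2f seconds\n", $seconds);
```

Running it several times on each build and averaging, as described
below, smooths out some of the noise from a single run.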

The script runs for about 47.35 seconds (an average from 10
consecutive runs) on unmodified PHP 4.2.0 (default ./configure with
no parameters), and about 48.02 seconds on my modified version.
That's only 1.4% slower. If you want my opinion, I will gladly
sacrifice that. :-)

The results differ with the code in the loop - in the worst case that
I saw the difference was about 4%, in the best case my modified
version was actually a tiny bit quicker :-))) - must be some
alignment magic.

Any ideas on an improvement of the benchmark?

The script:



Vaclav Dvorak  ([EMAIL PROTECTED])
http://nebuchadnezzar.zion.cz/
