Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

2022-07-29 Thread Jordan LeDoux
On Fri, Jul 29, 2022 at 7:27 AM juan carlos morales <
dev.juan.mora...@gmail.com> wrote:

> # Why this function ?
>
> At the moment the only way to determine if a JSON-string is valid we have
> to execute the json_decode() function.
>
> The drawback about this, is that json_decode() generates an in memory an
> object/array (depending on parameters) while parsing the string; this leads
> to a memory usage that is not needed (because we use memory for creating
> the object/array) and also can cause an error for reaching the memory-limit
> of the php process.
>
> Sometimes we just need to know is the string is a valid json or not, and
> nothing else.
>

You say that you have a first-pass at the implementation done. I'd be
curious to see it. My initial thought was that in order to validate the
string, you likely need to allocate extra memory as part of the validation
that depends on the string size. You'd definitely save the overhead of a
ZVAL, but for such an object that overhead is likely negligible.

So I guess my question would be: in the actual implementation that lands,
how much memory would this actually save compared to json_decode()? This
seems like it would make the RFC tricky, as the purported benefit of the
RFC depends very tightly on the implementation that lands.

Jordan


Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

2022-07-29 Thread juan carlos morales
El vie, 29 jul 2022 a las 22:18, David Rodrigues ()
escribió:

> I was about to say NO, but after being completely your argument, the idea
> makes sense.
>
> Initially I thought about using json_decode() with error capture, but the
> idea that it would overload memory makes perfect sense, compared to a
> simple structure analysis, if that is indeed the user's intention. The
> performance would also be absurdly better.
>
> What worries me above is the misuse of the function, like checking if
> is_json() === true and using json_decode() right after. However, I believe
> this can be easily optimized by the engine itself.
>
> My vote is YES.
>
>
> Atenciosamente,
> David Rodrigues
>
>
> Em sex., 29 de jul. de 2022 às 16:32, Michał Marcin Brzuchalski <
> michal.brzuchal...@gmail.com> escreveu:
>
> > Hi Juan,
> >
> > pt., 29 lip 2022, 16:26 użytkownik juan carlos morales <
> > dev.juan.mora...@gmail.com> napisał:
> >
> > > I am following the RFC guideline for the first time. (
> > > https://wiki.php.net/rfc/howto)
> > >
> > > As suggested there, I am here to get a feeling from you, regarding the
> > > following RFC for PHP.
> > >
> > > # Change (draft):
> > >
> > > New function in php called like:
> > >
> > > is_json(string $string): bool
> > >
> > > ## Description
> > > ### Parameters
> > > string $string -> string to find out if is a valid JSON or not
> > >
> > > ### Return
> > > Returns a bool. The function is capable to determine if the passed
> string
> > > is a valid JSON (true) or not (false).
> > >
> > > # Why this function ?
> > >
> > > At the moment the only way to determine if a JSON-string is valid we
> have
> > > to execute the json_decode() function.
> > >
> > > The drawback about this, is that json_decode() generates an in memory
> an
> > > object/array (depending on parameters) while parsing the string; this
> > leads
> > > to a memory usage that is not needed (because we use memory for
> creating
> > > the object/array) and also can cause an error for reaching the
> > memory-limit
> > > of the php process.
> > >
> >
> > Personally I'd vote NO.
> >
> > Cheers,
> > Michał Marcin Brzuchalski
> >
> > >
> >
>

Thanks for the feedback

*** To: Michał Marcin Brzuchalski
Thanks for the opinion, but it would be useful for me to know if there is a
reason behind the "no" that could help me get better.

*** To: David Rodrigues
Gracias. Regarding "performance", I did some benchmarks. I tested is_json()
vs json_decode() with a json-string with 1 million key-values; is_json was
100% faster than json_decode(), plus memory usage with is_json() was flat!
, yes , flat. memory_get_usage() before and after is_json() returned the
same value; with json_decode() I had to change the memory limit setting
otherwise memory reaches max allowed before finishing parsing it.

Regarding use cases, I can say that this idea came up , as in my current
company , our product heavily uses json_decode() to know if an string is a
valid json, and compromises the memory usage once in a while, plus,
performance decreases; that is why this RFC-Idea. The good thing is that I
already have something for this.

Also in stack overflow, honestly, take a look, and check how much people is
needing this.

Unit tests could heavily benefit from such a functionality.

A quick look in github for php code with a function called "is_json()"
exposes tons of projects where developers are writting their own
"is_json()" function relaying in json_decode() to make it work.

Etc.

:)

Stay in contact David. and once again, thanks!


Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

2022-07-29 Thread David Rodrigues
I was about to say NO, but after being completely your argument, the idea
makes sense.

Initially I thought about using json_decode() with error capture, but the
idea that it would overload memory makes perfect sense, compared to a
simple structure analysis, if that is indeed the user's intention. The
performance would also be absurdly better.

What worries me above is the misuse of the function, like checking if
is_json() === true and using json_decode() right after. However, I believe
this can be easily optimized by the engine itself.

My vote is YES.


Atenciosamente,
David Rodrigues


Em sex., 29 de jul. de 2022 às 16:32, Michał Marcin Brzuchalski <
michal.brzuchal...@gmail.com> escreveu:

> Hi Juan,
>
> pt., 29 lip 2022, 16:26 użytkownik juan carlos morales <
> dev.juan.mora...@gmail.com> napisał:
>
> > I am following the RFC guideline for the first time. (
> > https://wiki.php.net/rfc/howto)
> >
> > As suggested there, I am here to get a feeling from you, regarding the
> > following RFC for PHP.
> >
> > # Change (draft):
> >
> > New function in php called like:
> >
> > is_json(string $string): bool
> >
> > ## Description
> > ### Parameters
> > string $string -> string to find out if is a valid JSON or not
> >
> > ### Return
> > Returns a bool. The function is capable to determine if the passed string
> > is a valid JSON (true) or not (false).
> >
> > # Why this function ?
> >
> > At the moment the only way to determine if a JSON-string is valid we have
> > to execute the json_decode() function.
> >
> > The drawback about this, is that json_decode() generates an in memory an
> > object/array (depending on parameters) while parsing the string; this
> leads
> > to a memory usage that is not needed (because we use memory for creating
> > the object/array) and also can cause an error for reaching the
> memory-limit
> > of the php process.
> >
>
> Personally I'd vote NO.
>
> Cheers,
> Michał Marcin Brzuchalski
>
> >
>


Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

2022-07-29 Thread Michał Marcin Brzuchalski
Hi Juan,

pt., 29 lip 2022, 16:26 użytkownik juan carlos morales <
dev.juan.mora...@gmail.com> napisał:

> I am following the RFC guideline for the first time. (
> https://wiki.php.net/rfc/howto)
>
> As suggested there, I am here to get a feeling from you, regarding the
> following RFC for PHP.
>
> # Change (draft):
>
> New function in php called like:
>
> is_json(string $string): bool
>
> ## Description
> ### Parameters
> string $string -> string to find out if is a valid JSON or not
>
> ### Return
> Returns a bool. The function is capable to determine if the passed string
> is a valid JSON (true) or not (false).
>
> # Why this function ?
>
> At the moment the only way to determine if a JSON-string is valid we have
> to execute the json_decode() function.
>
> The drawback about this, is that json_decode() generates an in memory an
> object/array (depending on parameters) while parsing the string; this leads
> to a memory usage that is not needed (because we use memory for creating
> the object/array) and also can cause an error for reaching the memory-limit
> of the php process.
>

Personally I'd vote NO.

Cheers,
Michał Marcin Brzuchalski

>


[PHP-DEV] What type of Exception to use for unserialize() failure?

2022-07-29 Thread Tim Düsterhus

Hi!

I'm currently in the process of cleaning up the error handling of the 
new ext/random and now came across the implementation of __unserialize().


Looking through the source code for existing examples I found that the 
following is used:


- ext/gmp: Exception
- ext/hash: Exception
- ext/date: Error
- ext/spl: UnexpectedValueException
- ext/random: Currently Exception.

… all of those with different error messages. unserialize() itself will 
emit a regular 'Notice' (yes, you read that right) when encountering 
garbage data.


In the end I've opted to follow ext/date's lead and created a PR 
changing ext/random to use an Error, as unserialization failures likely 
mean that you are unserializing untrusted data which you should not do 
in the first place. Also any errors during unserializing are not really 
recoverable: You can't go and fix up the input string.


The PR can be found here:

https://github.com/php/php-src/pull/9185

During review cmb noted that using an 'Error' here might not be the best 
choice, as while it is likely to be a programmer error if unserializing 
fails, we do not want to educate users to catch(Error).


As the existing behavior already is inconsistent and the Notice really 
should be upgraded to something … more alarming in the future, I'm 
putting this topic up for discussion on the list.


My suggested path forward would be:

For 8.2:

- Update ext/random to use the ext/date wording (I like that one best):

"Invalid serialization data for  object."

- Revert the change from Exception to Error in my ext/random PR #9185.
- Update ext/date to Exception (the Error for __unserialize() was added 
in 8.2, but there already was an Error for __wakeup() before).


For whatever comes after 8.2:

- Add a new UnserializationFailureException extends Exception (or 
whatever color the bikeshed should have). Any existing catch blocks 
should still match, as Exception is more general. It would break for 
ext/spl, though. Unless UnserializationFailureException extends 
UnexpectedValueException.
- Add a helper function that throws this Exception with a consistent 
wording.

- Upgrade unserialize() from Notice to UnserializationFailureException.

Opinions?

Best regards
Tim Düsterhus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] RFC Idea - is_json - looking for feedback

2022-07-29 Thread juan carlos morales
I am following the RFC guideline for the first time. (
https://wiki.php.net/rfc/howto)

As suggested there, I am here to get a feeling from you, regarding the
following RFC for PHP.

# Change (draft):

New function in php called like:

is_json(string $string): bool

## Description
### Parameters
string $string -> string to find out if is a valid JSON or not

### Return
Returns a bool. The function is capable to determine if the passed string
is a valid JSON (true) or not (false).

# Why this function ?

At the moment the only way to determine if a JSON-string is valid we have
to execute the json_decode() function.

The drawback about this, is that json_decode() generates an in memory an
object/array (depending on parameters) while parsing the string; this leads
to a memory usage that is not needed (because we use memory for creating
the object/array) and also can cause an error for reaching the memory-limit
of the php process.

Sometimes we just need to know is the string is a valid json or not, and
nothing else.

# Do we need something like this? If a check to an string is valid JSON
then for sure I will have to use it in my code either as an object or an
array.

Well that is not true. There are plenty of cases where you just need to
check if a string is a valid json and that is it. Just looking into
stackoverflow will give you an idea about how many people is looking for
something like this in an efficient way.

# Who would develop the RFC ?

I would develop the RFC with the help of the community if needed. I already
have a first version of the is_json() function tested and ready for review.



Thanks in advance, I am looking forward to hear your opinion on this.

Kind Regards,

Juan Carlos Morales.


Re: [PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-29 Thread Hans Henrik Bergan
>1. Are there any reasonable objections to consistently implementing
character range expressions for all character masks?

would be a minor BC break to silently change the meaning of memspn($str,
"a..b"), which currently has the same meaning as "a.b" with wasted cpu
cycles, but with your suggestion it would become the same meaning as "ab"
and the dot would no longer pass the check..
But then again, currently writing ".." is just a waste of cpu, and i don't
think i've actually ever seen anyone do that in the wild
¯\_(ツ)_/¯


On Fri, 29 Jul 2022 at 10:58, Guilliam Xavier 
wrote:

> On Fri, Jul 29, 2022 at 7:15 AM mickmackusa  wrote:
>
> >
> >
> > On Monday, July 25, 2022, Guilliam Xavier 
> > wrote:
> >
> >> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa 
> wrote:
> >>
> >>> I've discovered that several native string functions offer a character
> >>> mask
> >>> as a parameter.
> >>>
> >>> I've laid out my observations at
> >>> https://stackoverflow.com/q/72865138/2943403
> >>>
> >>
> >> Out of curiosity, why do you say that strtr() is "not a good candidate
> >> because character order matters" (although you give a reasonable
> example)?
> >> Maybe you have some counter-example?
> >>
> >> Regards,
> >>
> >> --
> >> Guilliam Xavier
> >>
> >
> > I prefer to keep my scope very tight when posting on Stack Overflow.
> >
> > My focus was purely on enabling character range syntax for native
> > functions with character mask parameters.  My understanding of character
> > masks in PHP requires single-byte characters and no meaning to character
> > order.
> >
> > When strtr() is fed two strings, they cannot be considered "character
> > masks" because the character orders matter.
> >
> > If extending character range syntax to parameters which are not character
> > masks, I might support the feature for strtr(), but ensuring that the two
> > strings are balanced will be made more difficult with ranged syntax.
> > strtr() will silently condone imbalanced strings.
> https://3v4l.org/PY15F
> >
>
> Thanks for the clarifications. You're right that the internal
> `php_charmask` converts a character list (possibly containing one or more
> ranges) into a 256-char *mask*, thus "losing" any original order; so
> strtr() actually couldn't use the same implementation (even without
> ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')`
> (`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid
> '..'-range, '..'-range needs to be incrementing").
>
> I had seen a parallel with the Unix `tr` command, which *does* support
> [incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo
> adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives
> "eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are
> in reverse collating sequence order"), but its implementation doesn't use
> character masks indeed (
> https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo
> abracadabra | tr a-f x` gives "xxrxxrx" not "xbrxcxdxbrx"; and it also
> supports more things like POSIX character classes...
>
> PS: I find the `strtr(string $string, array $replace_pairs)` form generally
> superior to the `strtr(string $string, string $from, string $to)` one
> anyway ;)
>
> Regards,
>
> --
> Guilliam Xavier
>


[PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-29 Thread Guilliam Xavier
On Fri, Jul 29, 2022 at 7:15 AM mickmackusa  wrote:

>
>
> On Monday, July 25, 2022, Guilliam Xavier 
> wrote:
>
>> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa  wrote:
>>
>>> I've discovered that several native string functions offer a character
>>> mask
>>> as a parameter.
>>>
>>> I've laid out my observations at
>>> https://stackoverflow.com/q/72865138/2943403
>>>
>>
>> Out of curiosity, why do you say that strtr() is "not a good candidate
>> because character order matters" (although you give a reasonable example)?
>> Maybe you have some counter-example?
>>
>> Regards,
>>
>> --
>> Guilliam Xavier
>>
>
> I prefer to keep my scope very tight when posting on Stack Overflow.
>
> My focus was purely on enabling character range syntax for native
> functions with character mask parameters.  My understanding of character
> masks in PHP requires single-byte characters and no meaning to character
> order.
>
> When strtr() is fed two strings, they cannot be considered "character
> masks" because the character orders matter.
>
> If extending character range syntax to parameters which are not character
> masks, I might support the feature for strtr(), but ensuring that the two
> strings are balanced will be made more difficult with ranged syntax.
> strtr() will silently condone imbalanced strings.  https://3v4l.org/PY15F
>

Thanks for the clarifications. You're right that the internal
`php_charmask` converts a character list (possibly containing one or more
ranges) into a 256-char *mask*, thus "losing" any original order; so
strtr() actually couldn't use the same implementation (even without
ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')`
(`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid
'..'-range, '..'-range needs to be incrementing").

I had seen a parallel with the Unix `tr` command, which *does* support
[incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo
adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives
"eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are
in reverse collating sequence order"), but its implementation doesn't use
character masks indeed (
https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo
abracadabra | tr a-f x` gives "xxrxxrx" not "xbrxcxdxbrx"; and it also
supports more things like POSIX character classes...

PS: I find the `strtr(string $string, array $replace_pairs)` form generally
superior to the `strtr(string $string, string $from, string $to)` one
anyway ;)

Regards,

-- 
Guilliam Xavier