Re: [PHP-DEV] RFC Idea - is_json - looking for feedback
On Fri, Jul 29, 2022 at 7:27 AM juan carlos morales < dev.juan.mora...@gmail.com> wrote: > # Why this function ? > > At the moment the only way to determine if a JSON-string is valid we have > to execute the json_decode() function. > > The drawback about this, is that json_decode() generates an in memory an > object/array (depending on parameters) while parsing the string; this leads > to a memory usage that is not needed (because we use memory for creating > the object/array) and also can cause an error for reaching the memory-limit > of the php process. > > Sometimes we just need to know is the string is a valid json or not, and > nothing else. > You say that you have a first-pass at the implementation done. I'd be curious to see it. My initial thought was that in order to validate the string, you likely need to allocate extra memory as part of the validation that depends on the string size. You'd definitely save the overhead of a ZVAL, but for such an object that overhead is likely negligible. So I guess my question would be: in the actual implementation that lands, how much memory would this actually save compared to json_decode()? This seems like it would make the RFC tricky, as the purported benefit of the RFC depends very tightly on the implementation that lands. Jordan
Re: [PHP-DEV] RFC Idea - is_json - looking for feedback
El vie, 29 jul 2022 a las 22:18, David Rodrigues () escribió: > I was about to say NO, but after being completely your argument, the idea > makes sense. > > Initially I thought about using json_decode() with error capture, but the > idea that it would overload memory makes perfect sense, compared to a > simple structure analysis, if that is indeed the user's intention. The > performance would also be absurdly better. > > What worries me above is the misuse of the function, like checking if > is_json() === true and using json_decode() right after. However, I believe > this can be easily optimized by the engine itself. > > My vote is YES. > > > Atenciosamente, > David Rodrigues > > > Em sex., 29 de jul. de 2022 às 16:32, Michał Marcin Brzuchalski < > michal.brzuchal...@gmail.com> escreveu: > > > Hi Juan, > > > > pt., 29 lip 2022, 16:26 użytkownik juan carlos morales < > > dev.juan.mora...@gmail.com> napisał: > > > > > I am following the RFC guideline for the first time. ( > > > https://wiki.php.net/rfc/howto) > > > > > > As suggested there, I am here to get a feeling from you, regarding the > > > following RFC for PHP. > > > > > > # Change (draft): > > > > > > New function in php called like: > > > > > > is_json(string $string): bool > > > > > > ## Description > > > ### Parameters > > > string $string -> string to find out if is a valid JSON or not > > > > > > ### Return > > > Returns a bool. The function is capable to determine if the passed > string > > > is a valid JSON (true) or not (false). > > > > > > # Why this function ? > > > > > > At the moment the only way to determine if a JSON-string is valid we > have > > > to execute the json_decode() function. > > > > > > The drawback about this, is that json_decode() generates an in memory > an > > > object/array (depending on parameters) while parsing the string; this > > leads > > > to a memory usage that is not needed (because we use memory for > creating > > > the object/array) and also can cause an error for reaching the > > memory-limit > > > of the php process. > > > > > > > Personally I'd vote NO. > > > > Cheers, > > Michał Marcin Brzuchalski > > > > > > > > Thanks for the feedback *** To: Michał Marcin Brzuchalski Thanks for the opinion, but it would be useful for me to know if there is a reason behind the "no" that could help me get better. *** To: David Rodrigues Gracias. Regarding "performance", I did some benchmarks. I tested is_json() vs json_decode() with a json-string with 1 million key-values; is_json was 100% faster than json_decode(), plus memory usage with is_json() was flat! , yes , flat. memory_get_usage() before and after is_json() returned the same value; with json_decode() I had to change the memory limit setting otherwise memory reaches max allowed before finishing parsing it. Regarding use cases, I can say that this idea came up , as in my current company , our product heavily uses json_decode() to know if an string is a valid json, and compromises the memory usage once in a while, plus, performance decreases; that is why this RFC-Idea. The good thing is that I already have something for this. Also in stack overflow, honestly, take a look, and check how much people is needing this. Unit tests could heavily benefit from such a functionality. A quick look in github for php code with a function called "is_json()" exposes tons of projects where developers are writting their own "is_json()" function relaying in json_decode() to make it work. Etc. :) Stay in contact David. and once again, thanks!
Re: [PHP-DEV] RFC Idea - is_json - looking for feedback
I was about to say NO, but after being completely your argument, the idea makes sense. Initially I thought about using json_decode() with error capture, but the idea that it would overload memory makes perfect sense, compared to a simple structure analysis, if that is indeed the user's intention. The performance would also be absurdly better. What worries me above is the misuse of the function, like checking if is_json() === true and using json_decode() right after. However, I believe this can be easily optimized by the engine itself. My vote is YES. Atenciosamente, David Rodrigues Em sex., 29 de jul. de 2022 às 16:32, Michał Marcin Brzuchalski < michal.brzuchal...@gmail.com> escreveu: > Hi Juan, > > pt., 29 lip 2022, 16:26 użytkownik juan carlos morales < > dev.juan.mora...@gmail.com> napisał: > > > I am following the RFC guideline for the first time. ( > > https://wiki.php.net/rfc/howto) > > > > As suggested there, I am here to get a feeling from you, regarding the > > following RFC for PHP. > > > > # Change (draft): > > > > New function in php called like: > > > > is_json(string $string): bool > > > > ## Description > > ### Parameters > > string $string -> string to find out if is a valid JSON or not > > > > ### Return > > Returns a bool. The function is capable to determine if the passed string > > is a valid JSON (true) or not (false). > > > > # Why this function ? > > > > At the moment the only way to determine if a JSON-string is valid we have > > to execute the json_decode() function. > > > > The drawback about this, is that json_decode() generates an in memory an > > object/array (depending on parameters) while parsing the string; this > leads > > to a memory usage that is not needed (because we use memory for creating > > the object/array) and also can cause an error for reaching the > memory-limit > > of the php process. > > > > Personally I'd vote NO. > > Cheers, > Michał Marcin Brzuchalski > > > >
Re: [PHP-DEV] RFC Idea - is_json - looking for feedback
Hi Juan, pt., 29 lip 2022, 16:26 użytkownik juan carlos morales < dev.juan.mora...@gmail.com> napisał: > I am following the RFC guideline for the first time. ( > https://wiki.php.net/rfc/howto) > > As suggested there, I am here to get a feeling from you, regarding the > following RFC for PHP. > > # Change (draft): > > New function in php called like: > > is_json(string $string): bool > > ## Description > ### Parameters > string $string -> string to find out if is a valid JSON or not > > ### Return > Returns a bool. The function is capable to determine if the passed string > is a valid JSON (true) or not (false). > > # Why this function ? > > At the moment the only way to determine if a JSON-string is valid we have > to execute the json_decode() function. > > The drawback about this, is that json_decode() generates an in memory an > object/array (depending on parameters) while parsing the string; this leads > to a memory usage that is not needed (because we use memory for creating > the object/array) and also can cause an error for reaching the memory-limit > of the php process. > Personally I'd vote NO. Cheers, Michał Marcin Brzuchalski >
[PHP-DEV] What type of Exception to use for unserialize() failure?
Hi! I'm currently in the process of cleaning up the error handling of the new ext/random and now came across the implementation of __unserialize(). Looking through the source code for existing examples I found that the following is used: - ext/gmp: Exception - ext/hash: Exception - ext/date: Error - ext/spl: UnexpectedValueException - ext/random: Currently Exception. … all of those with different error messages. unserialize() itself will emit a regular 'Notice' (yes, you read that right) when encountering garbage data. In the end I've opted to follow ext/date's lead and created a PR changing ext/random to use an Error, as unserialization failures likely mean that you are unserializing untrusted data which you should not do in the first place. Also any errors during unserializing are not really recoverable: You can't go and fix up the input string. The PR can be found here: https://github.com/php/php-src/pull/9185 During review cmb noted that using an 'Error' here might not be the best choice, as while it is likely to be a programmer error if unserializing fails, we do not want to educate users to catch(Error). As the existing behavior already is inconsistent and the Notice really should be upgraded to something … more alarming in the future, I'm putting this topic up for discussion on the list. My suggested path forward would be: For 8.2: - Update ext/random to use the ext/date wording (I like that one best): "Invalid serialization data for object." - Revert the change from Exception to Error in my ext/random PR #9185. - Update ext/date to Exception (the Error for __unserialize() was added in 8.2, but there already was an Error for __wakeup() before). For whatever comes after 8.2: - Add a new UnserializationFailureException extends Exception (or whatever color the bikeshed should have). Any existing catch blocks should still match, as Exception is more general. It would break for ext/spl, though. Unless UnserializationFailureException extends UnexpectedValueException. - Add a helper function that throws this Exception with a consistent wording. - Upgrade unserialize() from Notice to UnserializationFailureException. Opinions? Best regards Tim Düsterhus -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php
[PHP-DEV] RFC Idea - is_json - looking for feedback
I am following the RFC guideline for the first time. ( https://wiki.php.net/rfc/howto) As suggested there, I am here to get a feeling from you, regarding the following RFC for PHP. # Change (draft): New function in php called like: is_json(string $string): bool ## Description ### Parameters string $string -> string to find out if is a valid JSON or not ### Return Returns a bool. The function is capable to determine if the passed string is a valid JSON (true) or not (false). # Why this function ? At the moment the only way to determine if a JSON-string is valid we have to execute the json_decode() function. The drawback about this, is that json_decode() generates an in memory an object/array (depending on parameters) while parsing the string; this leads to a memory usage that is not needed (because we use memory for creating the object/array) and also can cause an error for reaching the memory-limit of the php process. Sometimes we just need to know is the string is a valid json or not, and nothing else. # Do we need something like this? If a check to an string is valid JSON then for sure I will have to use it in my code either as an object or an array. Well that is not true. There are plenty of cases where you just need to check if a string is a valid json and that is it. Just looking into stackoverflow will give you an idea about how many people is looking for something like this in an efficient way. # Who would develop the RFC ? I would develop the RFC with the help of the community if needed. I already have a first version of the is_json() function tested and ready for review. Thanks in advance, I am looking forward to hear your opinion on this. Kind Regards, Juan Carlos Morales.
Re: [PHP-DEV] Re: Character range syntax ".." for character masks
>1. Are there any reasonable objections to consistently implementing character range expressions for all character masks? would be a minor BC break to silently change the meaning of memspn($str, "a..b"), which currently has the same meaning as "a.b" with wasted cpu cycles, but with your suggestion it would become the same meaning as "ab" and the dot would no longer pass the check.. But then again, currently writing ".." is just a waste of cpu, and i don't think i've actually ever seen anyone do that in the wild ¯\_(ツ)_/¯ On Fri, 29 Jul 2022 at 10:58, Guilliam Xavier wrote: > On Fri, Jul 29, 2022 at 7:15 AM mickmackusa wrote: > > > > > > > On Monday, July 25, 2022, Guilliam Xavier > > wrote: > > > >> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa > wrote: > >> > >>> I've discovered that several native string functions offer a character > >>> mask > >>> as a parameter. > >>> > >>> I've laid out my observations at > >>> https://stackoverflow.com/q/72865138/2943403 > >>> > >> > >> Out of curiosity, why do you say that strtr() is "not a good candidate > >> because character order matters" (although you give a reasonable > example)? > >> Maybe you have some counter-example? > >> > >> Regards, > >> > >> -- > >> Guilliam Xavier > >> > > > > I prefer to keep my scope very tight when posting on Stack Overflow. > > > > My focus was purely on enabling character range syntax for native > > functions with character mask parameters. My understanding of character > > masks in PHP requires single-byte characters and no meaning to character > > order. > > > > When strtr() is fed two strings, they cannot be considered "character > > masks" because the character orders matter. > > > > If extending character range syntax to parameters which are not character > > masks, I might support the feature for strtr(), but ensuring that the two > > strings are balanced will be made more difficult with ranged syntax. > > strtr() will silently condone imbalanced strings. > https://3v4l.org/PY15F > > > > Thanks for the clarifications. You're right that the internal > `php_charmask` converts a character list (possibly containing one or more > ranges) into a 256-char *mask*, thus "losing" any original order; so > strtr() actually couldn't use the same implementation (even without > ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')` > (`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid > '..'-range, '..'-range needs to be incrementing"). > > I had seen a parallel with the Unix `tr` command, which *does* support > [incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo > adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives > "eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are > in reverse collating sequence order"), but its implementation doesn't use > character masks indeed ( > https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo > abracadabra | tr a-f x` gives "xxrxxrx" not "xbrxcxdxbrx"; and it also > supports more things like POSIX character classes... > > PS: I find the `strtr(string $string, array $replace_pairs)` form generally > superior to the `strtr(string $string, string $from, string $to)` one > anyway ;) > > Regards, > > -- > Guilliam Xavier >
[PHP-DEV] Re: Character range syntax ".." for character masks
On Fri, Jul 29, 2022 at 7:15 AM mickmackusa wrote: > > > On Monday, July 25, 2022, Guilliam Xavier > wrote: > >> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa wrote: >> >>> I've discovered that several native string functions offer a character >>> mask >>> as a parameter. >>> >>> I've laid out my observations at >>> https://stackoverflow.com/q/72865138/2943403 >>> >> >> Out of curiosity, why do you say that strtr() is "not a good candidate >> because character order matters" (although you give a reasonable example)? >> Maybe you have some counter-example? >> >> Regards, >> >> -- >> Guilliam Xavier >> > > I prefer to keep my scope very tight when posting on Stack Overflow. > > My focus was purely on enabling character range syntax for native > functions with character mask parameters. My understanding of character > masks in PHP requires single-byte characters and no meaning to character > order. > > When strtr() is fed two strings, they cannot be considered "character > masks" because the character orders matter. > > If extending character range syntax to parameters which are not character > masks, I might support the feature for strtr(), but ensuring that the two > strings are balanced will be made more difficult with ranged syntax. > strtr() will silently condone imbalanced strings. https://3v4l.org/PY15F > Thanks for the clarifications. You're right that the internal `php_charmask` converts a character list (possibly containing one or more ranges) into a 256-char *mask*, thus "losing" any original order; so strtr() actually couldn't use the same implementation (even without ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')` (`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid '..'-range, '..'-range needs to be incrementing"). I had seen a parallel with the Unix `tr` command, which *does* support [incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives "eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are in reverse collating sequence order"), but its implementation doesn't use character masks indeed ( https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo abracadabra | tr a-f x` gives "xxrxxrx" not "xbrxcxdxbrx"; and it also supports more things like POSIX character classes... PS: I find the `strtr(string $string, array $replace_pairs)` form generally superior to the `strtr(string $string, string $from, string $to)` one anyway ;) Regards, -- Guilliam Xavier