Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
Michael,

Internally JsonTools just stores the document as JSON fragments, or text. In the case of object, array, or null the text is empty, but in the case of number or bool it converts the text to a number or bool when you ask for AsNumber or AsBoolean. Normally this won't be a problem, because as you read a document you'll read a particular node possibly once as you hand it off to some other system, be it a database insert, a subroutine, or other use.

In practice what's happening is that I am delaying the conversion until it's needed, which makes my parser faster when just parsing a document. If you need to use the node value as a number or a bool, then the cost of the conversion happens at that time, but if you don't need to use a particular field there is no conversion cost. This is one of the reasons my parser is faster. I still validate the number, bool, and other fields, but I don't convert until needed. The downside, as you've seen, is that converting the same value over and over again results in longer execution time. That said, if a user requests every field as a number or bool once while processing an incoming request (say you are writing a REST service), then the performance would equal out and the speed difference would favor the faster parser.

A side benefit of storing JSON fragments is that the node kind can be changed easily, without the need to remove an item from an internal list, destroy the class, create a new one of a different class, apply a copy of the original name, and reinsert.

Of course this lazy evaluation of numbers and booleans could be improved by caching the conversion once calculated (e.g. StrToFloat) and reusing it in any subsequent calls. For now I will keep it simple as is, as I feel that for most cases the user will either not need the conversion, or need it once, and the performance lost in invoking it many times will be made up for by all the times the conversion is never used.
Real world examples of never-used JSON fields:

- Calling most web REST methods which return JSON as a web result, where the caller is only interested in success or failure with a message.
- Acting as a RESTful service where many JSON request bodies use optional values that are meant to be skipped.
- Retrieving settings where an option is stored but never used, such as a dockable or floating palette position that is always left closed.

And so on...

--
___
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
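As a rough sketch, the caching mentioned above could look something like this. Note that FText, FCached, FNumber and FJsonFormat are assumed names for illustration only; the real TJsonNode internals in JsonTools are organized differently:

```pascal
// Hypothetical sketch of caching the lazy number conversion inside a node.
// FText, FCached, FNumber and FJsonFormat are assumed private fields,
// not the actual JsonTools implementation.
function TJsonNode.AsNumber: Double;
begin
  if not FCached then
  begin
    // Convert the stored JSON fragment only on first access.
    // FJsonFormat: a TFormatSettings with '.' as DecimalSeparator,
    // since JSON numbers always use a period.
    FNumber := StrToFloat(FText, FJsonFormat);
    FCached := True;
  end;
  // Reuse the converted value on every subsequent access.
  Result := FNumber;
end;
```

Any setter that replaces the underlying fragment would also need to clear FCached, which is exactly the bookkeeping the message above chose to avoid for simplicity.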
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Luca Olivetti via lazarus wrote:

> On 31/8/19 at 16:22, Michael Van Canneyt via lazarus wrote:
>> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse
>> Also frequently encountered is omitting "" around property names. JSON is a subset of Javascript:
>> D.Parse('{ d: 12345678.3 }');
>
> The parser at mozilla says:
> "Error: JSON.parse: expected property name or '}' at line 1 column 3 of the JSON data"

I know. But if you treat it as Javascript, e.g.

  b = eval('{ d: 12345678.3 }');

it does work. JSON is a subset of Javascript. That is why I said "frequently encountered". Not all parsers handle & allow it. But ExtJS for instance handles & produces it. (I used ExtJS and had to add it for that.)

On large JSON files this shaves quite some bytes off the result, I guess that is why they did it. (Not that it helped, an ExtJS JSON Store is dead slow.)

Michael.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On 31/8/19 at 16:22, Michael Van Canneyt via lazarus wrote:
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse
> Also frequently encountered is omitting "" around property names. JSON is a subset of Javascript:
> D.Parse('{ d: 12345678.3 }');

The parser at mozilla says:
"Error: JSON.parse: expected property name or '}' at line 1 column 3 of the JSON data"

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)
Fax +34 93 5883007
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Anthony Walter wrote:

>> Could you include https://github.com/BeRo1985/pasjson in the comparison?
>
> Sure. I also have a few others that other people have requested. I will also list the license of each in the first table.

[snip]

> For example if you wanted to store object state using RTTI in a JSON file, create a separate TJsonObjectState class to handle this for you. Or if you wanted to create a database table from a JSON file, or create a JSON file from a database table, then again write this into its own class.

Not sure I understand what you mean. It seems to me that in that case you will repeat your scanner/parser code all over the place. In case of an error, you need to fix it in as many places. I can of course be wrong.

The current fpjson scanner/parser can be used anywhere. You don't need to use fpjson data structures to be able to use the scanner or reader: it's perfectly possible to use the scanner/parser to create TJSONNode from JSONTools. But general usability comes indeed at the price of some speed loss.

That said, your classes are markedly slower when it comes to data manipulation. The following is 100 (!) times slower than fpjson code:

{$mode objfpc}
{$h+}
uses
  dateutils, sysutils, jsontools;

var
  I, N: Integer;
  D, E: TJSONNode;
  B: Double;
  NT: TDateTime;
begin
  N := 1000;
  D := TJSONNode.Create;
  D.Parse('{ "d": 12345678.3 }');
  E := D.Child(0);
  NT := Now;
  B := 1;
  for I := 0 to N do
    B := E.AsNumber * 2;
  Writeln('Time ', MillisecondsBetween(Now, NT));
  D.Free;
end.

home:~> ./tb
Time 3888

Same program in fpJSON:

home:~> ./tb2
Time 32

This is because when accessing the value, you must do the conversion to float. Every time. This is true for JSON string values as well: you must re-decode the JSON on every access. And you do it over and over again, each time the data is accessed. No doubt you can easily fix this by storing the value in the proper type, but this will slow down your parser. So: if you use the resulting JSON a lot, code will run faster in fpJSON.
It thus boils down to a choice: do you need fast processing or fast parsing? In the end it will probably not matter: most likely all the nodes will be traversed in a typical use case, and the overall time for your and my approach will be similar. This is the danger of benchmarks. They focus on one aspect. In real life, all aspects are usually present.

Anyway. While coding this small test, I noticed that this also does not work in jsontools:

var
  D: TJSONNode;
begin
  D := TJSONNode.Create;
  D.Parse('true'); // or D.Parse('12345678.3');
end.

An unhandled exception occurred at $004730B0:
EJsonException: Root node must be an array or object

If you look at the browser specs, this is supposed to work as well:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse

Also frequently encountered is omitting "" around property names. JSON is a subset of Javascript:

D.Parse('{ d: 12345678.3 }');

Results in:

An unhandled exception occurred at $00473075:
EJsonException: Error while parsing text

Both are things which are supported in fpJSON. No doubt you can fix this easily.

So you see, with some extra study, the picture of what is "better", jsontools or fpjson, is not so clear as it may seem. In the end it all boils down to some choices.

Michael.

PS. With 2 relatively simple changes, I took 40% off the parsing time of fpJSON. No doubt more gain can be achieved; for example I didn't do the suggestion by Benito yet.
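For reference, a minimal program (my own illustration, not from the thread) showing the fpJSON behaviour described above, where any JSON value is accepted as the document root:

```pascal
{$mode objfpc}{$h+}
program ScalarRoot;

uses
  fpjson, jsonparser; // jsonparser registers the parser used by GetJSON

var
  D: TJSONData;
begin
  // fpJSON's GetJSON accepts a bare JSON value as the document root,
  // matching the JSON.parse behaviour described in the browser specs.
  D := GetJSON('true');
  Writeln(D.AsBoolean);
  D.Free;

  D := GetJSON('12345678.3');
  Writeln(D.AsFloat);
  D.Free;
end.
```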
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
> Could you include https://github.com/BeRo1985/pasjson in the comparison?

Sure. I also have a few others that other people have requested. I will also list the license of each in the first table.

Regarding huge gigabytes of JSON in a file, I know a small portion of programmers might be inclined to use it as an offline database format, much like CSV. Even though by far most JSON is used with XMLHttpRequest, REST APIs, or storing settings and configurations, there are bound to be endless requests for use cases with JSON. For example, to accommodate reading a huge file as individual records, a helper class operating outside the definition of a JSON parser could accomplish this goal. It would be relatively easy to write in a separate file:

type
  TJsonStreamReader = class
  public
    constructor Create(Stream: TStream; OwnsStream: Boolean = False);
    constructor CreateFromFile(const FileName: string);
    destructor Destroy; override;
    function Read(out Parser: TSomeJsonParser): Boolean;
  end;

Then use as ...

var
  R: TJsonStreamReader;
  P: TSomeJsonParser;
begin
  R := TJsonStreamReader.Create(InputStreamOrFileName);
  try
    while R.Read(P) do
      ; // Read JSON record here
  finally
    R.Free;
  end;
end;

And in this way a large file could be read in small blocks and given back to the user as a parser, allowing for processing of individual records. The benefit of breaking this into its own class is that you do not need to mix every possible use case into the parser. You can simply write separate use cases into their own independent units, rather than trying to make a super class which handles every possible concern. For example, if you wanted to store object state using RTTI in a JSON file, create a separate TJsonObjectState class to handle this for you. Or if you wanted to create a database table from a JSON file, or create a JSON file from a database table, then again write this into its own class.
The point is, saying this JSON class does lots of these things is the wrong approach (IMO), as these use case scenarios are likely endless and would add unnecessary cruft to a parser. Even designing a plug-in or other extensibility mechanism seems unnecessary, when simple separate classes to add functionality work as well without all the complexity.

--
___
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
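To make the helper-class idea concrete, here is one possible shape for the Read method sketched above. Everything here is hypothetical: it assumes newline-delimited JSON (one record per line), uses TJsonNode from JsonTools in place of the placeholder TSomeJsonParser, and FReader stands in for an assumed line-reading helper wrapped around the stream:

```pascal
// Hypothetical sketch only: reads one JSON record per call, assuming the
// input is newline-delimited JSON. FReader is an assumed helper exposing
// Eof and ReadLine over the wrapped stream.
function TJsonStreamReader.Read(out Parser: TJsonNode): Boolean;
var
  Line: string;
begin
  Parser := nil;
  Result := not FReader.Eof;
  if not Result then
    Exit;
  Line := FReader.ReadLine;
  // Each line is parsed into its own node, which the caller processes
  // and frees, so only one record is in memory at a time.
  Parser := TJsonNode.Create;
  Parser.Parse(Line);
end;
```

Either way, the record-splitting policy lives in this helper unit rather than in the parser itself, which is the separation-of-concerns argument being made above.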
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Fri, Aug 30, 2019 at 11:02 PM Michael Van Canneyt via lazarus wrote:
> Can you try setting defaultsystemcodepage to UTF8 ?

Feeling a little bit embarrassed now (I'm used to Lazarus which defaults to that). With DefaultSystemCodePage := CP_UTF8 it works:

Handles unicode chars correctly:
>{ "name": "Joe®Schmoe", "occupation": "bank teller \u00Ae " }<
Name: 004A 006F 0065 00AE 0053 0063 0068 006D 006F 0065 [Joe®Schmoe]
Expected: 004A 006F 0065 00AE 0053 0063 0068 006D 006F 0065 [Joe®Schmoe]
Occupation: 0062 0061 006E 006B 0020 0074 0065 006C 006C 0065 0072 0020 00AE 0020 [bank teller ® ]
Expected: 0062 0061 006E 006B 0020 0074 0065 006C 006C 0065 0072 0020 00AE 0020 [bank teller ® ]
TRUE

--
Bart
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
Could you include https://github.com/BeRo1985/pasjson in the comparison?

On Fri, Aug 30, 2019 at 4:22 PM Anthony Walter via lazarus <lazarus@lists.lazarus-ide.org> wrote:
> Alan, oh that's a good idea. I will do that as well as add a few more
> parser libraries as requested by a few people in other non mailing lists
> threads. I will also try to find out what's going on the unicode strings as
> it might be a problem with the compiler.
>
> Michael,
>
> I am on Linux as well, but I will test under Windows and Mac too.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Sven Barth via lazarus wrote:

> On 31.08.2019 at 09:45, Michael Van Canneyt via lazarus wrote:
>> Codepages & strings require careful setup. Contrary to popular belief, it does not 'just work'. All this is documented:
>> https://www.freepascal.org/docs-html/current/ref/refsu9.html#x32-390003.2.4
>> Many people tend to ignore this, because Lazarus does a lot behind the scenes (which is a good thing).
>
> Looking at the text of the "Code page conversions" section: what do these mean: (CODE_CP ¡¿ CP_ACP) ? Or should it have been (CODE_CP <> CP_ACP)?

Latex->html conversion errors, I suppose. Using $\lt$ or so should fix it.

Michael.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On 31.08.2019 at 11:08, Sven Barth wrote:
> On 31.08.2019 at 09:45, Michael Van Canneyt via lazarus wrote:
>> Codepages & strings require careful setup. Contrary to popular belief, it does not 'just work'. All this is documented:
>> https://www.freepascal.org/docs-html/current/ref/refsu9.html#x32-390003.2.4
>> Many people tend to ignore this, because Lazarus does a lot behind the scenes (which is a good thing).
>
> Looking at the text of the "Code page conversions" section: what do these mean: (CODE_CP ¡¿ CP_ACP) ? Or should it have been (CODE_CP <> CP_ACP)?

And there's another one in the section "UTF8String" at the bottom: (ordinal value ¡128). Should this have been (ordinal value < 128)?

Regards,
Sven
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On 31.08.2019 at 09:45, Michael Van Canneyt via lazarus wrote:
> Codepages & strings require careful setup. Contrary to popular belief, it does not 'just work'. All this is documented:
> https://www.freepascal.org/docs-html/current/ref/refsu9.html#x32-390003.2.4
> Many people tend to ignore this, because Lazarus does a lot behind the scenes (which is a good thing).

Looking at the text of the "Code page conversions" section: what do these mean: (CODE_CP ¡¿ CP_ACP) ? Or should it have been (CODE_CP <> CP_ACP)?

Regards,
Sven
Re: [Lazarus] Project Groups are saving LPI files
On 28.08.2019 09:23, Ondrej Pokorny via lazarus wrote:
> On 17.08.2019 16:53, Mattias Gaertner via lazarus wrote:
>> On Thu, 15 Aug 2019 13:43:58 +0200 Ondrej Pokorny via lazarus wrote:
>>> [...] Project groups are saving my LPI files upon loading the project group.
>> Huh? Do you really mean, merely opening a lpg touches some lpi files?
> Yes, this is exactly what happens. The project group rewrites all lpi files within the project group itself.

I reported it: https://bugs.freepascal.org/view.php?id=36030

Ondrej
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Michael Van Canneyt via lazarus wrote:

>> Okay, going back and looking through the messages I see you did post a test with {$codepage UTF8} and uses cwstring. Here are the results with that added: On Linux using {$codepage UTF8} by itself causes both tests to fail. Adding cwstring causes both tests to work. On Windows trying to use cwstring causes the compilation to fail, but with {$codepage UTF8} added the tests work. I will try a few more tests, but there should be an "out of the box" option to get FPJson working without the need to add ifdefs along with extra directives added outside of the FPJson units themselves.
>
> Glad you picked it up. See my other mail for more details. Bottom line: You simply cannot ignore this. Doing so is asking for problems. It may work for you, but fail for someone else, and then you'll be scratching your head as to "why on earth doesn't it work?"

One last thing. Lazarus includes cwstring by default:

interfaces/carbon/interfaces.pas: {$IFNDEF DisableCWString}cwstring,{$ENDIF}
interfaces/cocoa/interfaces.pas: {$IFNDEF DisableCWString}cwstring,{$ENDIF}
interfaces/gtk2/interfaces.pas: {$IFDEF UNIX}{$IFNDEF DisableCWString}uses cwstring;{$ENDIF}{$ENDIF}
interfaces/gtk3/interfaces.pp: {$IFDEF UNIX}{$IFNDEF DisableCWString}cwstring,{$ENDIF}{$ENDIF}
interfaces/qt5/interfaces.pp: {$IFDEF UNIX}{$IFNDEF DisableCWString}cwstring,{$ENDIF}{$ENDIF}
interfaces/qt/interfaces.pp: {$IFDEF UNIX}{$IFNDEF DisableCWString}cwstring,{$ENDIF}{$ENDIF}

If you look in the code, you'll see that it handles codepages explicitly in many places. Just to corroborate that ignoring this is not an option, and that Lazarus goes to great lengths to make it easier on people.

Michael.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Anthony Walter via lazarus wrote:

> Okay, going back and looking through the messages I see you did post a test with {$codepage UTF8} and uses cwstring. Here are the results with that added: On Linux using {$codepage UTF8} by itself causes both tests to fail. Adding cwstring causes both tests to work. On Windows trying to use cwstring causes the compilation to fail, but with {$codepage UTF8} added the tests work. I will try a few more tests, but there should be an "out of the box" option to get FPJson working without the need to add ifdefs along with extra directives added outside of the FPJson units themselves.

Glad you picked it up. See my other mail for more details. Bottom line: You simply cannot ignore this. Doing so is asking for problems. It may work for you, but fail for someone else, and then you'll be scratching your head as to "why on earth doesn't it work?"

Michael.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
Okay, going back and looking through the messages I see you did post a test with: {$codepage UTF8} and uses cwstring. Here are the results with that added:

On Linux using {$codepage UTF8} by itself causes both tests to fail. Adding cwstring causes both tests to work. On Windows trying to use cwstring causes the compilation to fail, but with {$codepage UTF8} added the tests work.

I will try a few more tests, but there should be an "out of the box" option to get FPJson working without the need to add ifdefs along with extra directives added outside of the FPJson units themselves. I will write a few more unicode tests, perhaps with 4 byte character strings, and some other potential unicode problems to be sure both are working before we come to a final resolution.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
On Sat, 31 Aug 2019, Anthony Walter via lazarus wrote:

> Michael, regarding this unicode problem, all the code has already been posted in this thread.
>
> program Test;
> uses
>   FPJson, JsonParser, JsonTools;

There you are. You're missing the cwstring unit and the codepage directive. Change the above code to

{$mode objfpc}
{$h+}
{$codepage utf8}
uses
  {$IFDEF UNIX}cwstring, {$ENDIF}
  FPJson, JsonParser, JsonTools;

and it will work correctly. (The objpas and $h+ are probably in your fpc.cfg or lazarus setup.)

Your program will only work correctly in a utf8-only environment (see also below). But fpJSON relies on the fpc infrastructure to handle all codepages; as a consequence this infrastructure also must be set up properly.

Now you can see why I insisted on using my program, it was known to work correctly: it sets up things properly. If you look at my initial mail, you'll also see that I explicitly mentioned including cwstring. You probably failed to pick up on that important piece of info. So, mystery solved.

That said: unfortunately JSONTools is also not without problems. I copied the program to a Windows VM. Attached screenshot of the output. As you can see, jsontools also does not work correctly. It's no mystery why not. I had to add

DefaultSystemCodePage := CP_UTF8;

as the first line in the program, then it does show TRUE for both tests. Now, if you work in Lazarus, it does this for you, so you don't notice/know it.

Codepages & strings require careful setup. Contrary to popular belief, it does not 'just work'. All this is documented:
https://www.freepascal.org/docs-html/current/ref/refsu9.html#x32-390003.2.4

Many people tend to ignore this, because Lazarus does a lot behind the scenes (which is a good thing). But if people use your JSONTools in a 'mixed' environment, you might get strange results if you ignore the correct and careful setup. You control your environment, and jsontools will function correctly in your environment.
But it's a big world out there, where things might be happening that you didn't foresee but which do influence jsontools. I hope that with my explanations you are now well equipped/informed to strengthen jsontools and help people should problems nevertheless pop up.

Now that we've hopefully established that fpjson does work correctly, I would appreciate it if you could correct the JSON test comparison page you created.

Michael.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
Michael, I hadn't tried your example code yet, as I thought the discussion was on the topic of the unicode failure, and your example was about parsing speed. I'll be happy to take a look at speed improvements, but like you I am interested to find out what's failing with VerifyUnicodeCharsFPJson.
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
If there is any chance the char codes are being altered through whatever browser / mail client you are using, here is a direct link to the program source: https://cache.getlazarus.org/projects/test.lpr
Re: [Lazarus] [fpc-pascal] Tests results of several pascal based JSON parsers
Michael, regarding this unicode problem, all the code has already been posted in this thread.

program Test;

uses
  FPJson, JsonParser, JsonTools;

const
  UnicodeChars = '{ "name": "Joe®Schmoe", "occupation": "bank teller \u00Ae " }';

function VerifyUnicodeCharsFPJson: Boolean;
var
  N: TJSONData;
begin
  N := GetJSON(UnicodeChars);
  Result := (N.Items[0].AsUnicodeString = 'Joe®Schmoe') and
    (N.Items[1].AsUnicodeString = 'bank teller ® ');
  N.Free;
end;

function VerifyUnicodeCharsJsonTools: Boolean;
var
  N: TJsonNode;
begin
  N := TJsonNode.Create;
  N.Parse(UnicodeChars);
  Result := (N.Child(0).AsString = 'Joe®Schmoe') and
    (N.Child(1).AsString = 'bank teller ® ');
  N.Free;
end;

begin
  WriteLn('FPJson Handles unicode chars correctly: ', VerifyUnicodeCharsFPJson);
  WriteLn('JsonTools Handles unicode chars correctly: ', VerifyUnicodeCharsJsonTools);
end.

Output:

FPJson Handles unicode chars correctly: FALSE
JsonTools Handles unicode chars correctly: TRUE

Tested on both Linux and Windows with the same results. Different versions of FPC on different platforms, and other people, have verified the same result. Try the tests yourself. Maybe you can figure out what's going wrong.