Re: [E-devel] About eina_simple_xml_parse
On Wed, Dec 19, 2012 at 2:44 AM, thomasg tho...@gstaedtner.net wrote:
> Hm, I guess I had/have some misconceptions about how a SAX parser is supposed to work. It just seemed like a terrible idea to take the data as it comes while ignoring half of it.

SAX is much like a tokenizer. However, most SAX parsers will hand you new strings (either via strdup() or by modifying the input buffer) containing the actual tag. That is a bit easier to use than what I did in Eina's, but Eina's is faster and lighter on memory. It does mean you must take the size argument into account when you get it.

The benefit of using a SAX parser shows when you have large config files composed of just tags, without attributes, and their contents:

<config>
  <item>
    <key>bla</key>
    <value>xyz</value>
  </item>
</config>

You can create a list/array of My_Item structures with fields key and value. If these are integers or enumerations, it's easy to see how fast it can be: zero string creation. :-)

If you need a more traditional approach, use eina_simple_xml_node_load() (http://docs.enlightenment.org/auto/eina/group__Eina__Simple__XML__Group.html#gadc951418424b679ea32ba63492894fe3) and eina_simple_xml_node_dump(). See the tests at http://svn.enlightenment.org/svn/e/trunk/efl/src/tests/eina/eina_test_simple_xml_parser.c

> Then again, to me XML seems like a terrible idea in general :)

Indeed.
The reason for having eina_simple_xml is to avoid pulling in libxml2 and similar libraries just to do some basic configuration parsing. Ideally, someone would convert Efreet's menu parser to use it.

--
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
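Gustavo's <config>/<item> example, with integer fields and zero string creation, could be sketched roughly as below. To stay compilable without EFL, the handler takes a hypothetical Cfg_Event enum standing in for EINA_SIMPLE_XML_OPEN, EINA_SIMPLE_XML_CLOSE and EINA_SIMPLE_XML_DATA, and the test drives it by hand. It assumes attribute-free tags, whitespace already stripped (the strip argument of eina_simple_xml_parse()), and decimal integer contents instead of the bla/xyz strings above. My_Item follows the mail; Config_Ctx, config_event() and slice_to_int() are illustrative names, not Eina API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for Eina's OPEN / CLOSE / DATA token types. */
typedef enum { CFG_OPEN, CFG_CLOSE, CFG_DATA } Cfg_Event;

/* Which child of <item> we are currently inside. */
typedef enum { FIELD_NONE, FIELD_KEY, FIELD_VALUE } Cfg_Field;

/* One <item>: integer key and value, so no strings are ever created. */
typedef struct { int key; int value; } My_Item;

#define CFG_MAX_ITEMS 16

typedef struct {
   My_Item   items[CFG_MAX_ITEMS];
   int       count;
   Cfg_Field field;
} Config_Ctx;

/* Exact match of a (pointer, length) slice against a literal. */
static int slice_is(const char *s, unsigned len, const char *lit)
{
   return strlen(lit) == len && memcmp(s, lit, len) == 0;
}

/* Decimal digits read straight from the slice; no overflow check (sketch only). */
static int slice_to_int(const char *s, unsigned len, int *out)
{
   int v = 0;
   unsigned i;

   if (len == 0) return 0;
   for (i = 0; i < len; i++)
     {
        if (s[i] < '0' || s[i] > '9') return 0;
        v = v * 10 + (s[i] - '0');
     }
   *out = v;
   return 1;
}

/* Returns 0 to abort parsing, 1 to continue, like an Eina callback. */
static int config_event(Config_Ctx *ctx, Cfg_Event ev, const char *s, unsigned len)
{
   My_Item *it;

   assert(ctx != NULL);
   switch (ev)
     {
      case CFG_OPEN:
        if (slice_is(s, len, "item"))
          {
             if (ctx->count == CFG_MAX_ITEMS) return 0;   /* array full */
             ctx->items[ctx->count].key = 0;
             ctx->items[ctx->count].value = 0;
             ctx->count++;
          }
        else if (slice_is(s, len, "key")) ctx->field = FIELD_KEY;
        else if (slice_is(s, len, "value")) ctx->field = FIELD_VALUE;
        return 1;
      case CFG_CLOSE:
        ctx->field = FIELD_NONE;
        return 1;
      case CFG_DATA:
        if (ctx->field == FIELD_NONE || ctx->count == 0) return 1; /* ignore other text */
        it = &ctx->items[ctx->count - 1];
        return slice_to_int(s, len, ctx->field == FIELD_KEY ? &it->key : &it->value);
     }
   return 1;
}
```

This handler does not check nesting; for untrusted input it would be combined with a tag stack so that misplaced tags are rejected rather than silently accepted.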
[E-devel] About eina_simple_xml_parse
Hi everyone,

I was just looking at Eina Simple XML, which at first sight seemed a nice tiny XML library. However, after looking closer, it seems that it is only useful to create basic XML files, but NOT to read/parse them.

I used the eina_simple_xml_parse function and realized that this basically is it: every single step of parsing has to be done manually, and it makes practically no difference whether eina_simple_xml is used or not.

I then took a look at the example parser in eina_simple_xml_parser_01.c and realized that, for the same reason, it is an extremely poor parser, basically worthless (no offense intended). It is so poor that it is not even a simple XML parser, because all it does is check whether the input looks somewhat similar to XML. I realize that this is not meant to be a full-featured parser or even a basic one, but seeing as it is hardly a parser at all, I can't see the point of having it (as an example).

On the other hand, simple xml does have the concept of nodes using eina inlists and such, but they seem to be usable only for creating XML, not reading it.

So my question is: am I missing something here?

Here's a modified/broken chat.xml file to be parsed by the example code, to show how poorly it does: http://bpaste.net/show/65296/

If there's no better way to do it, I'd suggest making this explicit in the docs/examples and/or removing the example.

Regards
--
thomasg
Re: [E-devel] About eina_simple_xml_parse
Hi Thomas,

The standard way is pretty fast and lean, but it is a SAX-like parser. That means you only get tokens; for the tags you need to call yet another function to split the tag name from its attributes. It is good enough to parse SVG, as done by Esvg, and should also be enough to parse config files and your chat.xml.

There is also a version that creates nodes from XML. It's useful for debugging and for simple cases without performance worries. As you will very likely store your parsed data in a custom structure rather than a generic DOM, I recommend using the SAX version.

I didn't try the example with your XML, but it seems to be okay. The example could use eina_strbuf instead of an array of strings, but that's marginal. It could also use the size and avoid strncmp(), but that is also marginal for an example. What exactly is failing?
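Gustavo's remark about using the size instead of strncmp() can be illustrated with a small sketch. It assumes, as the thread states, that the parser hands the callback a pointer into the original, unmodified buffer plus a length, so the text is not NUL-terminated. The helpers tag_name_len() and tag_is() are hypothetical, not Eina API; for locating attributes inside a tag, Eina provides eina_simple_xml_tag_attributes_find().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Length of the tag name at the start of a tag slice such as
 * `post id="1"`: everything before the first whitespace or '/'.
 * The slice is (content, length) and need not be NUL-terminated. */
static unsigned tag_name_len(const char *content, unsigned length)
{
   unsigned i = 0;

   assert(content != NULL || length == 0);
   while (i < length &&
          !isspace((unsigned char)content[i]) &&
          content[i] != '/')
     i++;
   return i;
}

/* Exact, size-aware comparison of the tag name with a literal.
 * Unlike strncmp(content, "post", 4) == 0, this rejects "poster"
 * and never reads past `length`. */
static int tag_is(const char *content, unsigned length, const char *name)
{
   size_t n = strlen(name);

   return tag_name_len(content, length) == n &&
          memcmp(content, name, n) == 0;
}
```

In an Eina callback, content and length would be the values passed with an EINA_SIMPLE_XML_OPEN token.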
Re: [E-devel] About eina_simple_xml_parse
On Wed, Dec 19, 2012 at 4:38 AM, Gustavo Sverzut Barbieri barbi...@profusion.mobi wrote:
> What exactly is failing?

As you can see, the tags are totally wrong. They are neither correctly matched (a <foo> can be closed with </bar> and not just </foo>), nor do the items correspond to the tags. So if the input is not 100% what the parser expects (say there's an additional level), the parser won't fail but will just receive totally wrong data. If I want to make sure that I get the data from the tag <baz>DATA</baz>, I have to compare the string manually, and it seems that I might as well just parse it myself altogether.
Re: [E-devel] About eina_simple_xml_parse
On Wednesday, December 19, 2012, thomasg wrote:
> As you can see, the tags are totally wrong. They are neither correctly matched (a <foo> can be closed with </bar> and not just </foo>), nor do the items correspond to the tags.

That is always the case with SAX. It allows you to handle errors yourself (abort, auto-fix, etc.), as when parsing the bogus HTML that is common on the Internet.

I don't recall how strict I was with the tree/node version. I guess that, to make it usable by Evas textblock, you can close tags with </>, but I'm not sure what it would do if you specify an incorrect close tag.

Anyway, for a final version I'd recommend avoiding the intermediate node tree and using SAX directly; then you get more efficient data structures. Also consider always using the size: the original buffer is not modified, so the strings will not be NUL-terminated.

Usually a SAX-based parser keeps a stack, and you can validate based on that. But only validate if the data is untrusted. The same goes for attributes: you only pay the price if you expect them for that tag. IOW, it can be very efficient. The added benefit of using it over manual parsing is that it will handle whitespace and also do minimal tag boundary matching: if a '>' is missing, etc., it will emit errors.
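The stack-based validation Gustavo describes, which is also what would catch the <foo>...</bar> case from thomasg's chat.xml, might look roughly like this. It is a minimal sketch under assumptions: tag names arrive as (pointer, length) slices of the unmodified buffer, the depth limit of 32 is arbitrary, and Tag_Stack, tag_push() and tag_pop() are hypothetical names. In an Eina_Simple_XML_Cb one would push only the name part of an EINA_SIMPLE_XML_OPEN token (not EINA_SIMPLE_XML_OPEN_EMPTY), pop on EINA_SIMPLE_XML_CLOSE, and return EINA_FALSE, which makes the parser abort, when either fails.

```c
#include <assert.h>
#include <string.h>

#define TAG_STACK_MAX 32

/* Stack of currently open tag names, stored as slices into the buffer. */
typedef struct {
   const char *name[TAG_STACK_MAX];
   unsigned    len[TAG_STACK_MAX];
   int         depth;
} Tag_Stack;

/* On an open tag: remember its name. Returns 0 if nesting is too deep. */
static int tag_push(Tag_Stack *st, const char *name, unsigned len)
{
   assert(st != NULL);
   if (st->depth == TAG_STACK_MAX) return 0;
   st->name[st->depth] = name;
   st->len[st->depth] = len;
   st->depth++;
   return 1;
}

/* On a close tag: succeeds only if it closes the innermost open tag,
 * so <foo>...</bar> is reported instead of silently accepted. */
static int tag_pop(Tag_Stack *st, const char *name, unsigned len)
{
   int top;

   assert(st != NULL);
   top = st->depth - 1;
   if (top < 0) return 0;                                   /* close without open */
   if (st->len[top] != len || memcmp(st->name[top], name, len) != 0)
     return 0;                                              /* mismatched close */
   st->depth = top;
   return 1;
}
```

Because the stack holds pointers into the original buffer, nothing is copied, but the buffer must outlive the parse.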
Re: [E-devel] About eina_simple_xml_parse
On Wed, Dec 19, 2012 at 5:18 AM, Gustavo Sverzut Barbieri barbi...@profusion.mobi wrote:
> That is always the case with SAX. It allows you to handle errors yourself (abort, auto-fix, etc.), as when parsing the bogus HTML that is common on the Internet.

Hm, I guess I had/have some misconceptions about how a SAX parser is supposed to work. It just seemed like a terrible idea to take the data as it comes while ignoring half of it. Then again, to me XML seems like a terrible idea in general :)

Thanks for clearing it up.