Re: [E-devel] About eina_simple_xml_parse

2012-12-19 Thread Gustavo Sverzut Barbieri
On Wed, Dec 19, 2012 at 2:44 AM, thomasg tho...@gstaedtner.net wrote:

 Hm, I guess I had/have some misconceptions on how a SAX parser was supposed
 to work.
 It just seemed like a terrible idea to just take the data as it comes while
 ignoring half of it.


SAX is much like a tokenizer. However, most SAX parsers will hand you new
strings (either via strdup() or by modifying the input buffer) with the
actual tag. That's a bit easier to use than what I did in Eina's, but Eina's
is faster and lighter on memory. It does mean you must honor the size
argument when you get a token.
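As an illustration of honoring the size argument, here is a minimal sketch in plain C. It does not use Eina; token_equals() is a hypothetical helper name, and the only assumption is the one stated above, namely that a token is a (pointer, length) slice of the original buffer without a terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the slice (tok, tok_len) is exactly `name`.
 * The slice points into the original buffer, so it is NOT NUL-terminated
 * and must never be passed to strcmp() or printed with "%s". */
static bool token_equals(const char *tok, size_t tok_len, const char *name)
{
    assert(tok != NULL && name != NULL);
    size_t name_len = strlen(name);
    /* Checking the length first avoids the prefix trap of a bare strncmp():
     * "key" must not match a token that reads "keys" or "ke". */
    return tok_len == name_len && memcmp(tok, name, name_len) == 0;
}
```

A callback would call something like token_equals(content, length, "key") on each tag token instead of copying the tag into a new string.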

The benefit of using a SAX parser shows when you have those large config
files that are composed of just tags and contents, without arguments:

<config>
<item>
<key>bla</key>
<value>xyz</value>
</item>
</config>

You can create a list/array of My_Item structures with fields key and
value. If these are integers or enumerations, it's pretty simple to see how
fast it can be: zero string creation. :-)
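A rough sketch of that idea in plain C, assuming both fields are non-negative integers. My_Item is the name used above; slice_to_int() and item_set_field() are hypothetical helpers, and the (pointer, length) pairs stand in for whatever the SAX callback hands over.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int key;
    int value;
} My_Item;

/* Parse a non-negative decimal integer straight from a slice that is not
 * NUL-terminated. Fails on empty input, non-digits, or overflow. */
static bool slice_to_int(const char *s, size_t len, int *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0) return false;
    int acc = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return false;
        int digit = s[i] - '0';
        if (acc > (INT_MAX - digit) / 10) return false; /* would overflow */
        acc = acc * 10 + digit;
    }
    *out = acc;
    return true;
}

/* Store the text content of a <key> or <value> element in the matching
 * field. No string is allocated or copied along the way. */
static bool item_set_field(My_Item *item, const char *tag, size_t tag_len,
                           const char *data, size_t data_len)
{
    assert(item != NULL && tag != NULL && data != NULL);
    if (tag_len == 3 && memcmp(tag, "key", 3) == 0)
        return slice_to_int(data, data_len, &item->key);
    if (tag_len == 5 && memcmp(tag, "value", 5) == 0)
        return slice_to_int(data, data_len, &item->value);
    return false; /* unknown tag */
}
```

The parser state only needs to remember which tag was opened last; the data token that follows is converted in place.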

If you need a more traditional approach, use eina_simple_xml_node_load()
http://docs.enlightenment.org/auto/eina/group__Eina__Simple__XML__Group.html#gadc951418424b679ea32ba63492894fe3
and eina_simple_xml_node_dump().

Test at
http://svn.enlightenment.org/svn/e/trunk/efl/src/tests/eina/eina_test_simple_xml_parser.c



 Then again, to me XML seems like a terrible idea in general :)



Indeed. The reason for having eina_simple_xml is to avoid pulling in libxml2
and similar just to do some basic configuration parsing. Ideally someone
would convert Efreet's menu parser to use it.


-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
--
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
___
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net

[E-devel] About eina_simple_xml_parse

2012-12-18 Thread thomasg
Hi everyone,

I was just looking at Eina Simple XML which, at first sight, seemed a nice
tiny XML library.
However, after looking closer, it seems that it is only useful to create
basic XML files, but NOT to read/parse them.

I used the eina_simple_xml_parse function and realized that this basically
is it: every single step of parsing has to be done manually, and it
basically makes no difference whether eina_simple_xml is used or not.
I then took a look at the example parser in eina_simple_xml_parser_01.c and
realized that, for the same reason, this is an extremely poor parser,
basically worthless (no offense intended).
Actually it is so poor, it is not even a simple XML parser, because all it
does is check if the input looks somewhat similar to XML.

I realize that this is not meant to be a full-featured parser or even a
basic parser, but seeing as it is hardly a parser at all, I can't see the
point of having it (as an example).

On the other hand, simple xml does have the concept of nodes using eina
inlists and such, but they seem to be usable only for creating xml, not
reading it.

So my question is: Am I missing something here?

Here's a modified/broken chat.xml file to be parsed by the example code to
show how poorly it does: http://bpaste.net/show/65296/
If there's no better way to do it, I'd suggest to make this explicit in the
docs/examples and/or remove the example.

Regards

--
thomasg


Re: [E-devel] About eina_simple_xml_parse

2012-12-18 Thread Gustavo Sverzut Barbieri
Hi Thomas,

The standard way is pretty fast and lean, but it is a SAX-like parser. That
means you only get tokens; for the tags you need to call yet another
function to split the tag from its arguments.
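To make the "split the tag and arguments" step concrete, here is a small stand-in written in plain C. It is not the Eina function (the thread does not name it); tag_name_length() is a hypothetical helper that assumes the tag token arrives as a (pointer, length) slice such as `item id="3"`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: return the length of the tag name at the start of
 * a tag token. Everything after the name (if anything) is the attribute
 * region, which only needs to be parsed for tags that expect attributes. */
static size_t tag_name_length(const char *tag, size_t tag_len)
{
    assert(tag != NULL);
    size_t i = 0;
    /* The name ends at the first whitespace or at the '/' of a
     * self-closing tag; the token itself is not NUL-terminated. */
    while (i < tag_len && tag[i] != ' ' && tag[i] != '\t' &&
           tag[i] != '\n' && tag[i] != '\r' && tag[i] != '/')
        i++;
    return i;
}
```

With the name length known, the caller can compare the name by size and skip the attribute region entirely when it is not needed.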

It is good enough to parse SVG, as done by Esvg. It should also be enough to
parse config files and your chat.xml.

There is also a version that creates nodes from XML. It's useful for
debugging and for simple cases without performance worries. As you will very
likely store your parsed data in a custom structure rather than a generic
DOM, I recommend using the SAX version.

I didn't try the example with your XML, but it seems to be okay. The example
could use eina_strbuf instead of an array of strings, but that's marginal.
It could also use the size and avoid strncmp(), but that's also marginal for
an example.

What exactly is failing?




-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202


Re: [E-devel] About eina_simple_xml_parse

2012-12-18 Thread thomasg
On Wed, Dec 19, 2012 at 4:38 AM, Gustavo Sverzut Barbieri 
barbi...@profusion.mobi wrote:

 What is exactly failing?


As you can see, the tags are totally wrong.
They are neither correctly aligned (a <foo> can be closed with </bar> and
not just </foo>), nor do the items correspond with the tags.
So if the input is not 100% like the parser expects it, say there's an
additional level, the parser won't fail but just receive totally wrong data.
If I want to make sure that I get the data from the tag <baz>DATA</baz>, I
have to manually compare the string, and it seems that I might as well just
parse it myself altogether.





Re: [E-devel] About eina_simple_xml_parse

2012-12-18 Thread Gustavo Sverzut Barbieri
On Wednesday, December 19, 2012, thomasg wrote:

 As you can see, the tags are totally wrong.
 They are neither correctly aligned (a <foo> can be closed with </bar> and
 not just </foo>), nor do the items correspond with the tags.
 So if the input is not 100% like the parser expects it, say there's an
 additional level, the parser won't fail but just receive totally wrong
 data.
 If I want to make sure that I get the data from the tag <baz>DATA</baz>, I
 have to manually compare the string, and it seems that I might as well just
 parse it myself altogether.


That is always the case with SAX. It lets you handle errors yourself
(abort, auto-fix, etc.), as when parsing the bogus HTML that is common on
the Internet.

I don't recall how strict I was with the tree/node version. I guess that, to
make it usable by Evas textblock, you can close tags with </>, but I'm not
sure what it would do if you specify an incorrect close tag. Anyway, for a
final version I'd recommend avoiding the intermediate node tree and using
SAX directly; then you get more efficient data structures.

Also consider always using the size. The original buffer is not modified,
so strings will not be NUL-terminated.

Usually a SAX parser will keep a stack, and you can validate based on
that. But only validate if the data is untrusted. Same for attributes: you
only pay the price if you expect them for a given tag. IOW, it can be very
efficient.
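The stack-based validation could look roughly like the sketch below. It is plain C, independent of Eina; Tag_Stack, tag_stack_open(), tag_stack_close() and MAX_DEPTH are hypothetical names, and the code assumes the callback receives each tag name as a (pointer, length) slice of the input buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32 /* arbitrary nesting limit for this sketch */

typedef struct {
    const char *name[MAX_DEPTH]; /* slices of the input buffer, not copies */
    size_t len[MAX_DEPTH];
    size_t depth;
} Tag_Stack;

/* Call on every open tag. Fails when the document nests too deeply. */
static bool tag_stack_open(Tag_Stack *st, const char *name, size_t len)
{
    assert(st != NULL && name != NULL);
    if (st->depth == MAX_DEPTH) return false;
    st->name[st->depth] = name;
    st->len[st->depth] = len;
    st->depth++;
    return true;
}

/* Call on every close tag. Fails when nothing is open or when the close
 * tag does not match the innermost open tag (e.g. <foo> closed by </bar>). */
static bool tag_stack_close(Tag_Stack *st, const char *name, size_t len)
{
    assert(st != NULL && name != NULL);
    if (st->depth == 0) return false;
    size_t top = st->depth - 1;
    if (st->len[top] != len || memcmp(st->name[top], name, len) != 0)
        return false;
    st->depth--;
    return true;
}
```

What to do on a failed close (abort, auto-fix, ignore) stays with the caller, and trusted input can skip the stack altogether.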

The added benefit of using it over manual parsing is that it will handle
whitespace and also do minimal tag boundary matching. If a > is missing,
etc., it will emit errors.






 



-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

Re: [E-devel] About eina_simple_xml_parse

2012-12-18 Thread thomasg
On Wed, Dec 19, 2012 at 5:18 AM, Gustavo Sverzut Barbieri 
barbi...@profusion.mobi wrote:


 That is always the case with SAX. It lets you handle errors yourself
 (abort, auto-fix, etc.), as when parsing the bogus HTML that is common on
 the Internet.

 I don't recall how strict I was with the tree/node version. I guess that, to
 make it usable by Evas textblock, you can close tags with </>, but I'm not
 sure what it would do if you specify an incorrect close tag. Anyway, for a
 final version I'd recommend avoiding the intermediate node tree and using
 SAX directly; then you get more efficient data structures.

 Also consider always using the size. The original buffer is not modified,
 so strings will not be NUL-terminated.

 Usually a SAX parser will keep a stack, and you can validate based on
 that. But only validate if the data is untrusted. Same for attributes: you
 only pay the price if you expect them for a given tag. IOW, it can be very
 efficient.

 The added benefit of using it over manual parsing is that it will handle
 whitespace and also do minimal tag boundary matching. If a > is missing,
 etc., it will emit errors.


Hm, I guess I had/have some misconceptions on how a SAX parser was supposed
to work.
It just seemed like a terrible idea to just take the data as it comes while
ignoring half of it.
Then again, to me XML seems like a terrible idea in general :)

Thanks for clearing it up.



 
 
  