On Wed, 9 Mar 2011 16:31:23 -0300 Gustavo Sverzut Barbieri
<barbi...@profusion.mobi> said:

> On Wed, Mar 9, 2011 at 8:19 AM, Carsten Haitzler <ras...@rasterman.com> wrote:
> > On Sun, 27 Feb 2011 20:25:06 -0300 Gustavo Sverzut Barbieri
> > <barbi...@profusion.mobi> said:
> >
> > well i'll respond to the original post here after having read this entire
> > thread.
> >
> > i've actually looked at the parser code gustavo posted - i suspect only a few
> > people did and they just saw "xml" and "eina" and jumped :). let me be
> > blunt - i am no fan of xml. i despise it like being smeared with fish poop
> > first thing on a sunday morning. (some people might be into that... and
> > some people are also into xml... not sure if there's a link...)
> >
> > anyway...  overall there is something nice about the simplicity of this
> > parser. it avoids allocating anything itself - it leaves you to do so in
> > the parse callback. my first reaction was "oh no xml.. not in eina" but
> > having a read and some time to think...i am thinking - we have to parse xml
> > for efreet already. we have it forced on us already. moving efreet to use
> > this eina parser should make life easier and help share a parser for more
> > purposes, so i'd be up for this going into eina conditionally that efreet's
> > parser uses this instead and any missing features it needs are moved into
> > ssxmlp. but.. with 2 extra things (not having looked at efreet's needs)...
> >
> > 1. the ability to quit the current decode and report WHICH byte it quit at
> > to allow multi-pass parsing (not an all-or-none pass or fail). sure u can do
> > this outside via a global var etc. but it'd be nicer if it were a return
> > from eina_simple_xml_parse() that said what byte it got up to (if return ==
> > buflen then it got all the way to the end with no quits).
> 
> this is simple enough to change, and I can do it, but you don't need a
> global variable now, you get a context (void *) to your callback, and
> you get reports about error there, so you could flag it there. Do you
> want it anyways?

oh sure - i guess what i meant was that u cant pass it back on the stack. i
guess you can piggy back on the data ptr etc. etc. - ok scrap this idea.
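
The "flag it in the context" approach discussed above can be sketched as follows. This is only an illustration: demo_scan() is a toy tag scanner, not the parser from this thread, and every name in it (demo_ctx, demo_tag_cb, demo_stop_cb) is hypothetical. The callback records the byte offset it stopped at in the caller-owned struct passed through the void * data pointer, so no global variable is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: return false to make the scanner stop. */
typedef bool (*demo_tag_cb)(void *data, const char *buf,
                            size_t offset, size_t length);

/* Caller-owned context, handed to the callback through the void pointer. */
typedef struct {
    size_t tags_wanted; /* stop once this many tags were accepted */
    size_t tags_seen;   /* tags accepted so far */
    size_t stopped_at;  /* byte offset of the tag that triggered the stop */
    bool   stopped;     /* true if the callback asked to stop */
} demo_ctx;

/* Toy scanner: reports every "<...>" span. Not a real XML parser. */
static bool demo_scan(const char *buf, size_t buflen,
                      demo_tag_cb cb, void *data)
{
    assert(buf != NULL && cb != NULL);
    size_t i = 0;
    while (i < buflen) {
        if (buf[i] != '<') { i++; continue; }
        size_t end = i;
        while (end < buflen && buf[end] != '>') end++;
        if (end == buflen) return false;          /* unterminated tag */
        if (!cb(data, buf, i, end - i + 1)) return false;
        i = end + 1;
    }
    return true;
}

/* Accepts tags_wanted tags, then records where it stopped and quits. */
static bool demo_stop_cb(void *data, const char *buf,
                         size_t offset, size_t length)
{
    demo_ctx *ctx = data;
    (void)buf; (void)length;
    if (ctx->tags_seen == ctx->tags_wanted) {
        ctx->stopped = true;
        ctx->stopped_at = offset;
        return false;
    }
    ctx->tags_seen++;
    return true;
}
```

A second pass could then resume by scanning again from buf + ctx.stopped_at.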

> > 2. i don't like the fact that it doesn't handle encoding. that at least it
> > doesn't auto-parse the <?xml ... encoding="xxxx"?> for you - and by this i mean
> > still hand it to you to parse via the tag handling callback, BUT ALSO have
> > an internal handler that parses this, stores encoding and then provides a
> > "encoding decoder" (if encoding is not ascii) that returns utf8 text always
> > from whatever encoding the document has - and... well.. also handles the
> > &amp; etc. escapes too (optionally). that's really my only gripe that it
> > handle this so a simple parser can just, in its Eina_Simple_XML_Cb callback
> > take content and say "hey - decode this baby given the documents encoding
> > to standard utf8 for me - k.tnx.bi".
> 
> I can store the encoding myself and make it available upon request,

that'd be an improvement for sure. as such xml defines the following for
encoding:

http://www.opentag.com/xfaq_enc.htm

so it's not just a matter of handling <?xml encoding="..."?> but also the first
bytes. based on these you have to decode the rest of the xml doc appropriately
- including tags and all. at least we'd have to support utf-8 and utf-16 - and
iconv wrapping can do pretty much everything for us, but even the tags and
CDATA and so on have to be handled with the encoding given - that's why i am
kind of pushing for this :).
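
As a rough sketch of the "first bytes" check mentioned above: the function below looks only for the UTF-8 and UTF-16 byte order marks. The names (demo_encoding, demo_detect_bom) are made up for this example and are not part of eina. It deliberately ignores UTF-32 (whose little-endian BOM also starts with FF FE) and documents that carry no BOM at all, where the declared encoding or a UTF-8 default would have to be used instead.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEMO_ENC_UNKNOWN,  /* no BOM found: caller falls back to the declaration */
    DEMO_ENC_UTF8,
    DEMO_ENC_UTF16_LE,
    DEMO_ENC_UTF16_BE
} demo_encoding;

/* Inspect the first bytes of a document for a byte order mark.
 * On return *bom_len holds how many bytes to skip (0 if none). */
static demo_encoding demo_detect_bom(const unsigned char *buf, size_t len,
                                     size_t *bom_len)
{
    assert(buf != NULL && bom_len != NULL);
    *bom_len = 0;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return DEMO_ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return DEMO_ENC_UTF16_LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return DEMO_ENC_UTF16_BE;
    }
    return DEMO_ENC_UNKNOWN;
}
```

The detected encoding would then decide how everything after the BOM is decoded, tags included.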

> eina already provides wrappers around iconv() so you could call that.

sure - though you'd also want to be able to handle escapes. that's not too hard
to do and the wrapper call can do both of them in the same pass (depending if
you ask it to do it or not).
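
To give an idea of how small the escape handling can be, here is a minimal single-pass sketch. demo_unescape() is a hypothetical helper, not code from evas or eina. It handles only the five predefined XML entities and copies anything else through unchanged, so numeric references such as &#38; are not covered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Decode the five predefined XML entities in content[0..length).
 * Returns a malloc()ed, NUL-terminated string that the caller frees,
 * or NULL if allocation fails. Unknown entities are copied as-is. */
static char *demo_unescape(const char *content, size_t length)
{
    static const struct { const char *name; size_t len; char ch; } ents[] = {
        { "&amp;",  5, '&'  },
        { "&lt;",   4, '<'  },
        { "&gt;",   4, '>'  },
        { "&quot;", 6, '"'  },
        { "&apos;", 6, '\'' }
    };
    const size_t n = sizeof(ents) / sizeof(ents[0]);

    assert(content != NULL);
    char *out = malloc(length + 1);   /* decoding never grows the text */
    if (!out) return NULL;

    size_t i = 0, o = 0;
    while (i < length) {
        size_t k = n;
        if (content[i] == '&') {
            for (k = 0; k < n; k++) {
                if (length - i >= ents[k].len &&
                    memcmp(content + i, ents[k].name, ents[k].len) == 0)
                    break;
            }
        }
        if (k < n) {                  /* matched a known entity */
            out[o++] = ents[k].ch;
            i += ents[k].len;
        } else {                      /* ordinary byte or unknown entity */
            out[o++] = content[i++];
        }
    }
    out[o] = '\0';
    return out;
}
```

In a combined wrapper this step would run on the text after it has been converted to utf8, and only when the caller asks for escapes to be decoded.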

> I can also provide an entity-to-utf8, just copy it from Evas... but

yup.

> are these required at such level? Efreet's don't do any of those

i'm sure it doesn't - but i'd like to see a slightly more complete xml impl
here - not more than this, just this. i hope i'm not pushing for too much?

> AFAIR. I'll just not make it in the default path or automatic, as I
> believe most people will use without it (did you ever see a file in
> your linux install that is not UTF8 for ages? I mean these
> configuration files). If I need to do it I'd do in:

sure - i was thinking keep the structure as you have it now, BUT be able to do
char *eina_simple_xml_decode(Eina_Simple_XML_Type type, const char *content,
int offset, int length, Eina_Bool decode_escapes);

but one thing is missing here - a "document handle" so i'd suggest adding a

typedef struct _Eina_Simple_XML_Doc Eina_Simple_XML_Doc;

and then add that as a param to Eina_Simple_XML_Cb and
Eina_Simple_XML_Attribute_Cb (after void *data). now the callback gets passed
the doc handle and this can be non-public and store encoding etc. etc. and then
be passed to eina_simple_xml_decode() as an added param. i'd prefer this to the
encoding param below. it allows more document things to be stored and passed in
future without breaking api.
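
A sketch of what such an opaque handle could look like, under the assumption that the public header exposes only the typedef and an accessor while the struct body stays private. All names here use a demo_ prefix to make clear they are illustrative and not an existing eina API. The encoding comparison is case-sensitive to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Public side: callers only ever see the typedef. */
typedef struct demo_xml_doc demo_xml_doc;

/* Callback shape with the doc handle added right after the data pointer. */
typedef bool (*demo_xml_cb)(void *data, demo_xml_doc *doc,
                            const char *content,
                            unsigned offset, unsigned length);

/* Private side: fields can be added later without touching callbacks. */
struct demo_xml_doc {
    char encoding[32];
};

/* Called by the parser when it meets the encoding declaration. */
static void demo_xml_doc_encoding_set(demo_xml_doc *doc, const char *name)
{
    assert(doc != NULL && name != NULL);
    strncpy(doc->encoding, name, sizeof(doc->encoding) - 1);
    doc->encoding[sizeof(doc->encoding) - 1] = '\0';
}

/* Accessor usable from inside any callback. */
static const char *demo_xml_doc_encoding_get(const demo_xml_doc *doc)
{
    assert(doc != NULL);
    return doc->encoding;
}

/* Example callback: notes whether the content would need conversion. */
static bool demo_needs_conversion_cb(void *data, demo_xml_doc *doc,
                                     const char *content,
                                     unsigned offset, unsigned length)
{
    bool *needs = data;
    (void)content; (void)offset; (void)length;
    *needs = strcmp(demo_xml_doc_encoding_get(doc), "UTF-8") != 0;
    return true;
}
```

Because the callback signature only carries the handle, storing more per-document state later would not change any callback prototype.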

> typedef Eina_Bool (*Eina_Simple_XML_Cb)(void *data, const char
> *encoding, Eina_Simple_XML_Type type, const char *content, unsigned
> offset, unsigned length);
> typedef Eina_Bool (*Eina_Simple_XML_Attribute_Cb)(void *data, const
> char *encoding, const char *key, const char *value);
> 
> (note added "encoding" parameter, it's kept from <?xml encoding="name" ?>)
> 
> with: char *eina_simple_xml_content_to_utf8(const char *encoding,
> const char *content, unsigned length);
> 
> that would scan for &entity; and replace with utf-8 symbols, also run
> non-entities through eina's iconv wrapper to get utf8 output. You can
> call it from Eina_Simple_XML_Cb or Eina_Simple_XML_Attribute_Cb
> whenever you like.  Is that fine?
> 
> 
> Attached is a new version, none of your comments are in it, but it was
> the last I changed... it also supports </> to close tags so it's
> usable by Edje's TEXTBLOCK as well. Also check out the node conversion
> there if it's helpful.
> 
> 
> 
> -- 
> Gustavo Sverzut Barbieri
> http://profusion.mobi embedded systems
> --------------------------------------
> MSN: barbi...@gmail.com
> Skype: gsbarbieri
> Mobile: +55 (19) 9225-2202


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com


_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
