Jan Ekström (12021-01-25): > For now I kept them separate since one is just moving Stefano's code, > and another adds new functionality that you requested.
Stefano's code was good enough for ffprobe, because it had to perform a very precise task in a very constrained context, but as is, it is not good enough for lavu, it needs to be adapted. > I really have no bigger bone with either naming, I just utilized: > 1. How the XML spec calls the thing ("AttValue" thus becoming > ATT_VALUE, "CharData" becoming CHAR_DATA). I slightly would prefer to > keep this due to this making it easy to point towards what is actually > meant in the XML spec. CharData is only used a few times in the abstract syntax of the spec, it will have meaning only for people who are very familiar with it. > 2. Single and double quotes are the wording that I am more acquainted > with, but I have no hard opinion so if you want _QUOT/_APOS, sure. I am fine with "double_quotes" and "single_quotes" or "dquot" and "squot" too. > > Outside attributes, " and ' do not need to be quoted. > > I think this is where I'm slightly going out of my comfort zone. This > is already existing code, and I just moved it out of ffprobe (where it > was already successfully merged and existed for a while). Additionally > the expression for char_data does at least reference single quotes. I > do see the paragraph at the end of 2.4 that notes Let us take a few steps back. The spec says: - < needs to be always escaped; - ' needs to be escaped when between '; - " needs to be escaped when between ". Common sens and good practice say: - characters that do not need to be escaped should not be escaped. (With XML, we make an exception for that last rule for >.) Therefore, we need three modes: - one mode to use between '; - one mode to use between "; - one mode to use outside. > > To allow attribute values to contain both single and double quotes, the > > apostrophe or single-quote character (') may be represented as "'", > > and the double-quote character (") as """." > > Of course, which seems specific to attribute values. > But at this point I'm 50/50 if it makes even sense for me to continue > this struggle or if this is an unclear message from you to me that I > should just utilize libxml2, which is already utilized in the DASH > demuxer. Mostly because there I at least can relatively surely know > that (probably) not only escaping, but also removal of invalid UTF-8 > might be handled by it (which we currently do not do either with JSON > or XML output from ffprobe as far as I know). That way also I don't > have to attempt to be a nice person and attempt to re-utilize existing > code from ffprobe, nor do I have to touch ffprobe. libxml2 has a terrible track records in terms of stability, including security. We should strive to use it less, not more. For parsers, we currently have no choice, because parsing XML is tricky. I still intend to finish the limited XML parser that I started implementing a few years ago, though. On the other hand, writing good XML is quite easy, and that is what we should be doing. Your code is almost ready, it just needs a few tweaks. Regards, -- Nicolas George
signature.asc
Description: PGP signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".