2011/11/29 Karl Pflästerer <k...@rl.pflaesterer.de>:
> On 27.11.11 23:47, Karl Pflästerer wrote:
>> On 27.11.11 23:23, Yannick Torrès wrote:
>>> 2011/11/27 Karl Pflästerer <k...@rl.pflaesterer.de>:
>>>> Hi,
>>>
>>> Hi,
>>>
>>>> forgive me if I ask something that has already been discussed, but I've
>>>> seen nothing in the archives.
>>>>
>>>> I'm trying to help translate some of the docs, and on
>>>> https://edit.php.net/ I saw this box:
>>>>
>>>> Check for errors in /language-snippets.ent
>>>>
>>>> The content for that box seems to be computed by the class
>>>> http://svn.php.net/repository/web/doc-editor/trunk/php/ToolsError.php
>>>>
>>>> There is a method attributLinkTag().
>>>>
>>>> To compare the linkend attribute of the <link> tags it uses a regex:
>>>>
>>>> $reg = '/<link\s*?linkend=("|\')(.*?)("|\')\s*?>/s';
>>>>
>>>> As you can see, only whitespace is allowed between <link and the
>>>> linkend attribute.
>>>> But in the German translation, for example (and also in the English
>>>> documentation), some <link> tags have another attribute between the
>>>> element name and "linkend".
>>>
>>> Could you give me an example of this case, please?
>>
>> From en/language-snippets.ent:
>>
>> <!ENTITY seealso.array.sorting 'The <link
>> xmlns="http://docbook.org/ns/docbook" linkend="array.sorting">comparison of
>> array sorting functions</link>'>
>>
>> <!ENTITY seealso.callback 'information about the <link
>> xmlns="http://docbook.org/ns/docbook"
>> linkend="language.types.callback">callback</link> type'>
>>
>> The German translation has more examples (some of them IMHO wrong, since
>> they duplicate the xmlns attribute), but I'm not sure such a simple
>> difference should trigger an error.
>>
>>>
>>>> An easy fix would be
>>>>
>>>> $reg = '/<link[^<>]+linkend=("|\')(.*?)("|\')[^<>]*>/s';
>>>>
>>>> But that would solve only half of the problem; the other problem is that
>>>> the check script needs the same order of entities in both files, and it
>>>> compares only the positions of the links found in both match arrays. So,
>>>> for example, one extra link in the translation will produce false errors
>>>> for all following entries.
>>>
>>> Yes, it does.
>>> The goal here is to check each file and warn when there is even a single
>>> difference, even if it is only an order problem (this can be a
>>> translation error too).
>>
>> OK. (But for a file that contains only entity definitions, order
>> shouldn't matter, should it?)
>>
>>>
>>>> Does it make sense to rewrite that algorithm so that it compares each
>>>> entity in the English original and the translation, so we get better
>>>> errors?
>>>
>>> You mean to avoid the order check?
>>> Perhaps we can do this, yes: check the number of these tags, and check
>>> that all of them are present, even if the order is not respected.
>>
>> I was thinking of checking each entity definition: rather than doing a
>> simple preg_match_all and comparing $match_en[1] to $match_lang[1],
>> compare the linkend attributes of each entity definition in en and $lang.
>>
>> Then the error could be: Difference in linkend attribute in entity xyz.
>
> To be a little more concrete, here is a code example (just a POC):
>
> <?php
>
> function extract_linkend ($s) {
>
>     // /x: whitespace inside the pattern is ignored; matches <link> and <xref>
>     $rx_linkend = '
>         /
>         <(?: link | xref)
>         [^<>]+
>         linkend=(?:"|\') (.*?) (?:"|\')
>         [^<>]*
>         >
>         /xs';
>
>     // Each match runs from one <!ENTITY up to the next one (or end of file)
>     $rx_entities = '/(<!ENTITY\s+(\S+).+?)(?=(?:<!ENTITY|$))/s';
>
>     preg_match_all($rx_entities, $s, $m_entities, PREG_SET_ORDER);
>     $linkend_by_entity = array();
>     foreach ($m_entities as $entity) {
>         preg_match_all($rx_linkend, $entity[1], $m_linkend);
>         if ($m_linkend[1])
>             $linkend_by_entity[$entity[2]] = $m_linkend[1];
>     }
>     return $linkend_by_entity;
> }
>
>
> $link_de = extract_linkend(file_get_contents('language-snippets.ent'));
> $link_en = extract_linkend(file_get_contents('../en/language-snippets.ent'));
>
> $diff = array_udiff_assoc($link_en, $link_de,
>     function ($en, $lang) { return array_diff($en, $lang) ? 1 : 0; } );
>
> foreach ($diff as $entity => $linkends) {
>     echo "Entity: $entity\n";
>     echo 'EN: ' . join('; ', $linkends), "\n";
>     // The entity may be missing (or have no links) in the translation
>     $de = isset($link_de[$entity]) ? $link_de[$entity] : array();
>     echo 'DE: ' . join('; ', $de), "\n\n";
> }
>
>
> If I run that (with the de translation), I get:
>
> Entity: ini.php.constants
> EN: configuration.changes.modes
> DE: ini
>
> Entity: mysqli.available.mysqlnd
> EN: book.mysqlnd
> DE: mysqli.overview.mysqlnd
>
> That could be helpful (IMHO).
>
> KP

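To see the first problem concretely, here is a minimal sketch that runs the pattern quoted from ToolsError.php and the relaxed pattern suggested in the thread against the seealso.callback snippet. The helper name first_linkend() is hypothetical, for illustration only; it is not part of the doc-editor code.

```php
<?php
// Hypothetical helper: return the linkend value captured by $pattern,
// or null when the pattern does not match at all.
function first_linkend($pattern, $text) {
    return preg_match($pattern, $text, $m) ? $m[2] : null;
}

// Pattern quoted from ToolsError.php: only whitespace may appear
// between "<link" and "linkend".
$strict  = '/<link\s*?linkend=("|\')(.*?)("|\')\s*?>/s';

// Relaxed pattern proposed in the thread: other attributes may come first.
$relaxed = '/<link[^<>]+linkend=("|\')(.*?)("|\')[^<>]*>/s';

// Body of the seealso.callback entity from en/language-snippets.ent.
$snippet = 'information about the <link xmlns="http://docbook.org/ns/docbook" '
         . 'linkend="language.types.callback">callback</link> type';

var_dump(first_linkend($strict, $snippet));   // NULL: the xmlns attribute blocks the match
var_dump(first_linkend($relaxed, $snippet));  // string(23) "language.types.callback"
```

The strict pattern silently finds nothing here, so the checker ends up comparing against a shorter match array, which is exactly what shifts every later comparison.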
Thanks Karl, I will add this ASAP.

Great work,
Yannick
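Yannick's suggestion (check the number of tags and that all of them are present, regardless of order) can be combined with Karl's per-entity grouping. Below is a minimal sketch, assuming the linkend lists have already been extracted per entity (for example by Karl's extract_linkend()). The function name compare_linkends() and the message wording are hypothetical and not part of ToolsError.php. Sorting each list makes the check order-independent while still counting duplicates, and comparing the whole lists catches extra links in the translation, which a one-way array_diff() does not.

```php
<?php
// Hypothetical checker: compares linkend lists entity by entity.
// $en and $lang map entity name => list of linkend values.
function compare_linkends(array $en, array $lang) {
    $errors = array();
    foreach ($en as $entity => $links_en) {
        if (!isset($lang[$entity])) {
            $errors[] = "Entity $entity: no links found in translation";
            continue;
        }
        $links_lang = $lang[$entity];
        // Sort local copies: order inside an entity no longer matters,
        // but each link must still appear the same number of times.
        sort($links_en);
        sort($links_lang);
        if ($links_en !== $links_lang) {
            $errors[] = "Entity $entity: difference in linkend attribute (EN: "
                . implode('; ', $links_en) . ' / LANG: '
                . implode('; ', $links_lang) . ')';
        }
    }
    return $errors;
}

// Sample data mirroring one of the differences reported in the thread.
$en = array(
    'ini.php.constants' => array('configuration.changes.modes'),
    'seealso.callback'  => array('language.types.callback'),
);
$de = array(
    'ini.php.constants' => array('ini'),
    'seealso.callback'  => array('language.types.callback'),
);

foreach (compare_linkends($en, $de) as $error) {
    echo $error, "\n";
}
```

With this sample data only ini.php.constants is reported. Swapping two links inside one entity would not be flagged, but adding or dropping a link would.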