On 02/11/2016 09:15 PM, Patrick Lauer wrote:
> Now instead of looking up [metadata.xml] -> (herd name) -> [herds.xml]
> -> email it goes backwards:
> [metadata.xml] -> (maintainer type=project) -> email -> [projects.xml]
> -> Project name
>
> Since this involves XML and python's ElementTree library it's a
> nontrivial change that also removes a few now useless helpers
> (_get_herd_email has no reason to be, but we'd need a _get_herd_name
> helper instead. Err, get_proj ... ah well, whatever name works)
>
> And all that just so (1) gentoolkit output works and (2) euscan updates
> properly. Both of which I don't really care about much, but now that
> I've invested ~4h into debugging and trying to fix it I'm a tiny bit
> IRRITATED.
>
So this turns out to be more fun than expected.

I've spent a little time staring at XML and DTDs, wondering why we do
things the most difficult way possible ...

Previously the herd tag was defined as:
<!ELEMENT herd (#PCDATA)>

So we end up with, for example:
<herd>kde</herd>

The new schema collapses herd (err, project!) into maintainer (err,
sustainer ... staff ... linchpin?).
And maintainer is defined as:
<!ELEMENT maintainer ( email, (description| name)* )>

This means that only email is mandatory. So instead of searching by name
you are now required to search by email.
It also leads to inconsistent (partial) duplication: some metadata.xml
entries carry a name, some a description, and some only an email.
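To make that partial duplication concrete, here is a minimal sketch using Python's xml.etree.ElementTree. The metadata.xml excerpt is hypothetical (email addresses chosen for illustration, type="project" as in the new schema), and describe_maintainers is a made-up helper name, not something from gentoolkit. Any consumer has to cope with all three shapes of entry:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml excerpt in the new style. Per the DTD only
# <email> is mandatory, so each entry may carry a name, a description,
# or neither.
METADATA = """<pkgmetadata>
  <maintainer type="project">
    <email>kde@gentoo.org</email>
    <name>Gentoo KDE Project</name>
  </maintainer>
  <maintainer type="project">
    <email>qt@gentoo.org</email>
    <description>Qt team</description>
  </maintainer>
  <maintainer type="project">
    <email>games@gentoo.org</email>
  </maintainer>
</pkgmetadata>"""


def describe_maintainers(xml_text):
    """Return (email, label) pairs, where label is the best human-readable
    text available: <name>, else <description>, else None."""
    root = ET.fromstring(xml_text)
    pairs = []
    for m in root.findall("maintainer"):
        email = m.findtext("email")
        # findtext() returns None when the child element is missing,
        # so fall back from <name> to <description>.
        label = m.findtext("name") or m.findtext("description")
        pairs.append((email, label))
    return pairs


for email, label in describe_maintainers(METADATA):
    print(email, "->", label)
```

The third entry comes out with no label at all, which is exactly the case a name-based search cannot handle without consulting another file.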

For example, for gentoolkit this means that instead of searching by name
it now has to search by email, and the previous search-by-name
functionality requires herds.xml, err, projects.xml to figure out the
name of a project. Which might not match the one in metadata.xml!
(And you may need to filter out maintainers-that-are-not-projects, and
what about maintainers that are undefined? So much extra code complexity!)
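A rough sketch of that extra join step, again with xml.etree.ElementTree. It assumes a simplified projects.xml layout (<project> elements with <email> and <name> children); both documents and the helper names load_project_names and resolve_project_names are hypothetical. It covers the three complications above: non-project maintainers, undefined projects, and names that disagree between the two files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified projects.xml: one <project> per team,
# identified by <email> and carrying a human-readable <name>.
PROJECTS_XML = """<projects>
  <project><email>kde@gentoo.org</email><name>KDE</name></project>
  <project><email>qt@gentoo.org</email><name>Qt</name></project>
</projects>"""

# Hypothetical metadata.xml for one package.
METADATA_XML = """<pkgmetadata>
  <maintainer type="project">
    <email>kde@gentoo.org</email>
    <name>Gentoo KDE Project</name>
  </maintainer>
  <maintainer type="project">
    <email>qt@gentoo.org</email>
  </maintainer>
  <maintainer type="project">
    <email>nosuchteam@gentoo.org</email>
  </maintainer>
  <maintainer type="person">
    <email>someone@gentoo.org</email>
  </maintainer>
</pkgmetadata>"""


def load_project_names(projects_xml):
    """Map project email -> project name from projects.xml."""
    root = ET.fromstring(projects_xml)
    return {p.findtext("email"): p.findtext("name")
            for p in root.findall("project")}


def resolve_project_names(metadata_xml, projects):
    """Return (names, problems) for the project maintainers of one package."""
    root = ET.fromstring(metadata_xml)
    names, problems = [], []
    for m in root.findall("maintainer"):
        # Step 1: skip maintainers that are not projects. A missing
        # type attribute is treated as "not a project" (assumption).
        if m.get("type") != "project":
            continue
        email = m.findtext("email")
        # Step 2: the project may not exist in projects.xml at all.
        if email not in projects:
            problems.append(f"{email}: not in projects.xml")
            continue
        # Step 3: a <name> in metadata.xml may disagree with projects.xml.
        local_name = m.findtext("name")
        if local_name is not None and local_name != projects[email]:
            problems.append(f"{email}: {local_name!r} != {projects[email]!r}")
        names.append(projects[email])
    return names, problems


projects = load_project_names(PROJECTS_XML)
names, problems = resolve_project_names(METADATA_XML, projects)
print(names)  # ['KDE', 'Qt']
for p in problems:
    print(p)
```

Compare that with the old model, where the herd name was simply the text of the <herd> tag.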

And this is why I avoided the topic and hoped that the 'migration' would
make sense:
(1) Using XML is mildly insane. It's neither machine- nor human-readable.
(2) The DTD is even more insane, and few people have the patience to
figure it out
(3) The recent changes to the DTD change the data model in subtle ways
so that there's even *more* denormalization possible
(4) The tooling is, due to XML, wonderfully horrible and requires things
like XPath to get the required data (because querying by attribute is
harder than querying by tag)
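For illustration, the difference between point (4)'s two query styles in ElementTree's limited XPath subset. The document below is hypothetical and mixes the old and new styles purely so both queries can run against one tree:

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing old and new styles, for comparison only.
DOC = """<pkgmetadata>
  <herd>kde</herd>
  <maintainer type="project">
    <email>kde@gentoo.org</email>
  </maintainer>
  <maintainer type="person">
    <email>someone@gentoo.org</email>
  </maintainer>
</pkgmetadata>"""

root = ET.fromstring(DOC)

# Query by tag: a single path step, and the text is the answer.
by_tag = [e.text for e in root.findall("herd")]

# Query by attribute: needs an [@attr='value'] predicate, and the value
# we actually want (the email) sits one level deeper.
by_attr = [e.text for e in root.findall("maintainer[@type='project']/email")]

print(by_tag)   # ['kde']
print(by_attr)  # ['kde@gentoo.org']
```

And the attribute query still only yields an email; getting back to a name needs projects.xml on top.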

There are fundamental questions that should be answered before making
more modifications. For example, should the data be more normalized
(e.g. the name only in projects.xml / maintainers.xml and only the email
in metadata.xml)? If we allow denormalization, do we have tools to check
and autocorrect it (e.g. when a maintainer changes their name)?

Once we decide to abstract it away so that people use tools and don't
mangle it manually (have you looked at herds.xml?! omg ...), there's
the question ... why XML? It's about the worst format for this job; INI
format is sufficient and easier to parse. Or JSON, or YAML, or whatever
is trendy now. Or do we autogenerate it from templates?
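As a rough idea of how little code an INI variant would need, here is a sketch with Python's standard configparser. The layout (one section per project, keyed by email) is purely hypothetical and is not an existing Gentoo format:

```python
import configparser

# Hypothetical INI layout for project data: one section per project,
# keyed by email. NOT an existing Gentoo format, just a sketch.
PROJECTS_INI = """
[kde@gentoo.org]
name = Gentoo KDE Project
description = KDE desktop and applications

[qt@gentoo.org]
name = Qt
"""

parser = configparser.ConfigParser()
parser.read_string(PROJECTS_INI)

# Email -> name is a plain mapping lookup; SectionProxy.get() returns
# None if a section omits the key.
names = {email: parser[email].get("name") for email in parser.sections()}
print(names)
```

No DTD, no XPath, and the email-to-name lookup is a dictionary access.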

Another funny thing: projects.xml is not in the same repository, so
synchronizing changes is trickier. And metadata.dtd is in yet another
place. Wouldn't it make sense to organize this in a less confusing way?

You see where this is going, and why I didn't object loudly enough to the
changes: I'd rather not care about this whole cluster of topics and do
things that are more rewarding. But that choice was taken away when
things broke (oh, they didn't break, they Function Differently now) and
I had to spend time investigating why things deviate.

Sigh.


Am I grumpy?
