On a Friday in 2020, Daniel P. Berrangé wrote:
On Fri, Jul 17, 2020 at 05:01:47PM +0200, Pino Toscano wrote:
Hi,

I recently took a look at the UI/user visible messages from libvirt,
which are translated using gettext. They are extracted in a single
libvirt.pot catalog, which includes messages from libvirt.so itself
(mostly, if not all, errors), the separate daemons, the helper tools,
and from virsh.

I noticed there is plenty of room for improvement: what strikes me is
the lack of consistency among the messages. Let me state first: I
understand that not all the people are native English speakers
(I am not), so I'm not picking on anyone.

Yes, the lack of consistency is pretty bad and makes more work for
our translators.


Also, I'm sure a portion of our translatable strings are in unreachable
error paths (i.e. we are looking up some data that we just successfully
put there a few lines above) and by Murphy's law, there are code paths
missing an error completely or having an undescriptive message.
Hopefully aborting on OOM will help us erase more messages.

Some examples:

a) different capitalization:
- "cannot open %s"
- "Cannot open %s"


I vote for the capitalized version, see below.

b) different quoting for files/identifiers/etc:
- "Cannot open %s"
- "Cannot open '%s'"


Yes, sometimes the error is worded in a way that prevents this,
e.g.
   current vcpus count must be an integer
for
   <vcpu current='x'>

We could even pass the hardcoded identifiers via %s, e.g.
  _("Invalid value of '%s': %s"), cpuset, tmp
instead of:
  _("Invalid value of 'cpuset': %s"), tmp
to prevent the identifier from being translated.

c) different verbs for failed actions:
- "Cannot frobnicate ..."
- "Could not frobnicate ..."
- "Did not frobnicate ..."
- "Failed to frobnicate ..."

"Failed to" seems most factual here

- "Unable to frobnicate ..."
depending on the message, also "frobbing failed"

Frobbing failed takes one extra character compared to that.


d) sometimes contractions ("couldn't", "don't", etc), sometimes not
("could not", "do not", etc)

e) what QEMU/etc supports:
- "... by this QEMU binary"
- "... for this QEMU binary"
- "... in this QEMU binary"
- "... with this QEMU binary"
- "... by this QEMU"
- "... for this QEMU"
- "... with this QEMU"
- "... with this binary" [in a QEMU file]
- "... [supported] by qemu"

There are possibly subtle nuances there:
  "by this QEMU binary" -> the particular QEMU does not support it at all - it
     was not implemented yet or it was compiled out
  "with this QEMU binary" -> it might but libvirt does not bother to do the
     legacy part
  "by qemu" -> not in QEMU at the moment of writing this error message
  "for this QEMU binary" just sounds wrong to me, maybe a native speaker
  can correct me on that?

  (but I bet most of the uses did not care about those and just copied
  and pasted it from somewhere)

Also, does 'QEMU binary' vs. 'QEMU' bring any extra clarity?

there is also "qemu does not support ...", which I think can stay

Most of these are guarded by QEMU_CAPS so they fall into one of the
first two categories above. I think I found only 'accel2d' that was
never intended to be supported by QEMU.

for now; also both "available [by/for/etc]" and "supported [by/for/etc]"
are used

That should be 'supported for <functionality>', not 'supported for
QEMU'.


I can give it a try in fixing the messages to be more consistent all
around; before I start the mass editing, I need to know which style to
follow:

If you put the style in writing first, other people might help too.


a) it seems like the virError fields @message, @str1, @str2 and @str3
are joined together in reporting/log strings like "error: <text>";
hence, should they be not capitalized? It may look OK in English, but
less nice and hard to fix in translations.
Obviously, sentences as shown in tools (e.g. virsh) definitely need to
be properly capitalized.

I think there is no correct answer here, because even with the error
messages, the <text> is not always used in the "error: <text>" scenario.
eg an application like virt-manager will merely display "<text>" in a
dialog box.

On the one hand I'd suggest lowercase text for error messages, but if
the message is multiple sentences that would involve a capital. We probably
don't have many of the latter though, so standardizing on lowercase is
likely fine.

Starting with a lowercase letter feels more UNIX-like and helps if the
message starts with a lowercase identifier, but if some apps use the
text on their own, starting with uppercase would be more consistent.


b) should identifiers such as filenames, paths, XML tags, JSON fields,
etc be always quoted?

Generally user data that may go missing should be quoted because it makes
it more obvious when there is an accidentally empty string provided. I've
gone back to add quotes every time I've debugged a problem where the empty
string was involved.  To make it easier as a policy, it is fine to expand
that to all filenames/paths, regardless of whether they come from the user
data or not. For XML / JSON field names, if it is just a bare word, then
I'd probably suggest quoting too, as some field names could accidentally
lead to grammatically correct but misleading error messages if unquoted.
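
As a rough illustration of how unquoted placeholders could be spotted, here
is a minimal Python sketch. It assumes the messages are already available as
plain strings and that "%s" is the placeholder of interest; the function name
is hypothetical, and the check is only a heuristic since not every %s carries
user data.

```python
import re

# Heuristic: a "%s" that is not directly wrapped in single or double quotes.
UNQUOTED = re.compile(r"""(?<!['"])%s(?!['"])""")

def has_unquoted_string_arg(msg):
    """Return True if msg contains at least one %s without surrounding quotes."""
    return bool(UNQUOTED.search(msg))

for msg in ("Cannot open %s", "Cannot open '%s'", "Invalid value of '%s': %s"):
    print(has_unquoted_string_arg(msg), msg)
```

The third message is flagged because its second %s is bare, even though the
first one is quoted.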


c) which verb to use when something failed? "could not" is a subjective
thing, not a past action; "failed" seems to imply that something was
attempted; "did not" seems to imply that it was not done, but says nothing
about whether it was attempted; the rest sort of indicate the ability to do
something.


This one seems like a more complicated question than the others and should
not keep us from, e.g., quoting the identifiers first.

Jano

I don't especially care which we use, as long as we're pretty
consistent. Perhaps the thing to do is just see which is the most
popular usage today, so we invalidate the fewest translations
when changing.
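
Such a popularity count could be sketched roughly as follows in Python. The
list of openings is taken from the examples earlier in this thread; the
function name is hypothetical, and the sketch assumes the msgids have already
been extracted from the pot file into a list of strings.

```python
import re
from collections import Counter

# Candidate openings taken from the examples in this thread.
PREFIXES = ["Cannot", "Could not", "Did not", "Failed to", "Unable to"]
PATTERN = re.compile(r"^(%s)\b" % "|".join(PREFIXES), re.IGNORECASE)

def count_failure_prefixes(messages):
    """Count how many messages start with each candidate failure phrase."""
    counts = Counter()
    for msg in messages:
        m = PATTERN.match(msg)
        if m:
            # Normalize case so "cannot" and "Cannot" are counted together.
            counts[m.group(1).capitalize()] += 1
    return counts

sample = ["cannot open %s", "Cannot open '%s'", "Failed to open %s"]
print(count_failure_prefixes(sample).most_common())
```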

d) allow contractions or not? They are generally used in spoken/informal
language, and while libvirt is not that formal it should not be that
colloquial either IMHO; also, they make the text slightly harder to
understand by non-native speakers, and they are lost when translating.
A POV on the matter is:
https://www.businesswritingblog.com/business_writing/2006/04/dont_use_contra.html

Yeah, I think I've seen enough recommendations about not using
contractions, that we should apply that rule.
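
If that rule is adopted, existing offenders could be listed with something
like the following Python sketch. The pattern only covers a few common
English contractions (n't, 're, 've, 'll) and deliberately skips 's, which
would collide with possessives and quoted identifiers; the function name is
hypothetical.

```python
import re

# A few common English contractions; 's is skipped on purpose.
CONTRACTION = re.compile(r"\b\w+(?:n't|'re|'ve|'ll)\b", re.IGNORECASE)

def find_contractions(messages):
    """Return (message, [contractions]) for each message containing one."""
    return [(msg, CONTRACTION.findall(msg))
            for msg in messages if CONTRACTION.search(msg)]

for msg, hits in find_contractions(["couldn't open '%s'", "could not open '%s'"]):
    print(hits, msg)
```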

e) which message to use to indicate that QEMU does not support
something?

I don't have a strong preference. Perhaps again just let a popularity
contest decide it.



I wonder if there's any clever python code we can pull in that reports
on "similar" strings that we could usefully run across the pot file
to identify candidates for sanitizing.
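
One possible starting point, using only the Python standard library, is
difflib. The sketch below assumes the msgids have already been read from the
pot file into a list; the function name and the 0.85 threshold are arbitrary
choices for illustration, not anything libvirt ships.

```python
import difflib
from itertools import combinations

def similar_pairs(messages, threshold=0.85):
    """Return (ratio, a, b) for message pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(set(messages)), 2):
        # Compare case-insensitively so capitalization variants score 1.0.
        ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((round(ratio, 2), a, b))
    return sorted(pairs, reverse=True)

msgs = ["cannot open %s", "Cannot open %s", "Cannot open '%s'",
        "Failed to parse XML"]
for ratio, a, b in similar_pairs(msgs):
    print(ratio, repr(a), repr(b))
```

This compares every pair, so it is quadratic in the number of messages; on a
catalog the size of libvirt.pot some pre-filtering (for example by the first
word) would probably be needed.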

Also if there are many cases where we use roughly the same string
message, then that's a candidate for creating a wrapper function
to standardize on message text.

eg we added a virReportEnumRangeError() so that we got guaranteed
identical error messages for all enum range problems.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
