[mb-style] Re: [Clean up CSG] Typography (was: Capitalization (and placement))

Brian Schweitzer Wed, 30 Jan 2008 10:28:13 -0800

Alexander Dupuy wrote:
>>> > As I say, we already use such illegal symbols (:,?, /, etc).  The
>>> > "Colons and slashes and perhaps other symbols are not recognised in
>>> > some mp3 ogg player filesystems e.g. iriver, OSX." argument, if we
>>> > truly are moving to a music database, and not a tagging database,
>>> > would seem outdated.
>>>
>and Arturus Magi chimed in:
>
>> Especially considering that properly designed taggers can compensate
>> for the filesystem-side issues, and reciprocating that
>> compensation to the ID3/APE/etc. tags will fix the corresponding issues
>> in most players that are marginally Unicode compliant.
>
>I'd agree that the filesystem argument is weak, but I haven't seen any
>specific evidence that Picard (or any other tagger using MB data) has
>such functionality for converting full Unicode to a subset in


Well, I know luks is quick to say "don't base style issues on what
Picard or any other tagger can handle", but if I can be forgiven for
saying it, luks (smile), Picard does handle just the type of
replacement you're talking about.

It currently has two different ways, actually.  For file names,
there's the "replace non-ASCII characters" option.  For file names and
tags, there's taggerscript.  What you're describing is essentially
the reverse of part of the script I use now.  (I hate
seeing all the underscores, and like seeing the slashes and such):

$set(title,$replace(%title%,...,…))
$set(album,$replace(%album%,...,…))
$set(title,$replace(%title%,/,⁄))
$set(album,$replace(%album%,/,⁄))
$set(album,$replace(%album%,:,﹕))
$set(title,$replace(%title%,:,﹕))
$set(album,$replace(%album%, No., N°))
$set(title,$replace(%title%, No., N°))
$set(album,$replace(%album%,",＂))
$set(title,$replace(%title%,",＂))
$set(album,$replace(%album%,?,？))
$set(title,$replace(%title%,?,？))

Essentially, I replace the Windows-invalid characters with Unicode
(near) equivalents.  (If anyone knows a non-full-width ? or " that's a
better equivalent, please let me know! :) ).
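
For anyone wanting the same substitution outside Picard, here is a
minimal Python sketch. It is only an illustration: the helper name
windows_safe is made up for this example and is not part of Picard,
and the table simply mirrors the script above (the " No." rule is
left out, since that one is a matter of taste rather than a Windows
restriction).

```python
# Hypothetical helper mirroring the taggerscript above; not a Picard API.
WINDOWS_SAFE = str.maketrans({
    "/": "\u2044",   # FRACTION SLASH
    ":": "\uFE55",   # SMALL COLON
    '"': "\uFF02",   # FULLWIDTH QUOTATION MARK
    "?": "\uFF1F",   # FULLWIDTH QUESTION MARK
})


def windows_safe(text: str) -> str:
    """Swap characters Windows rejects in file names for Unicode lookalikes."""
    # Three ASCII dots become a single HORIZONTAL ELLIPSIS first,
    # then the single-character substitutions are applied in one pass.
    text = text.replace("...", "\u2026")
    return text.translate(WINDOWS_SAFE)
```

Calling windows_safe on a title leaves every other character alone, so
it can be applied to both the album and the title fields.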

So, if you're the person who is tagging, and you don't want em-dash,
en-dash, foreign quotes, etc, it'd be a simple substitution using
taggerscript:

$set(title,$replace(%title%,—,-))
$set(title,$replace(%title%,–,-))
$set(title,$replace(%title%,‒,-))
$set(title,$replace(%title%,«,"))
$set(title,$replace(%title%,»,"))
etc.

(and actually, if there were demand, it wouldn't be all that hard to
make a "CSG-de-typographicaphier" plugin, to do it without even having
to handle tagger script.)

Problem is, it's very easy to go from » to " with a simple script.
It's very difficult to go from " to knowing if you need », «, ›, ‹, 〝,
〞, 〟, etc with even a complex script.
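
The asymmetry can be made concrete with a small Python sketch. The
table and both function names are assumptions chosen for illustration
(this is not an existing plugin): simplifying is a plain many-to-one
lookup, while inverting that lookup only gives a list of candidates
with nothing to choose between them.

```python
from collections import defaultdict

# Typographic character -> plain ASCII stand-in (small illustrative table).
TO_ASCII = {
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2012": "-",   # figure dash
    "\u00AB": '"',   # left-pointing double angle quotation mark
    "\u00BB": '"',   # right-pointing double angle quotation mark
}


def detypographize(text: str) -> str:
    """Easy direction: every fancy character has exactly one ASCII target."""
    return text.translate(str.maketrans(TO_ASCII))


def candidates() -> dict:
    """Hard direction: each ASCII character maps back to several options."""
    back = defaultdict(list)
    for fancy, plain in TO_ASCII.items():
        back[plain].append(fancy)
    return dict(back)
```

Even with this tiny table, candidates() offers two possibilities for "
and three for -, and picking the right one needs knowledge of the
language and context that the string alone does not carry.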

As for web display support, while I do understand what you're saying,
the same would seem to also hold true for anything, not just
classical.  A cell phone may not have support for Hangul characters -
but is that a reason we ought to not be entering Korean releases?  How
about the soundtrack to this release:
http://musicbrainz.org/release/c66b2ad1-1f82-4382-a1f8-fdc54685f281.html
- should we rename it to I (heart) Huckabees?

I don't intend to sound sarcastic - I'm just being realistic.
MusicBrainz is an international site - there are at least a few dozen
countries, languages, and even scripts represented on this mailing
list alone, I would suspect (even if we all communicate in
Latin script/English on this list).  Issues with various devices or tagging
utilities possibly not properly handling an international standard lie
with the programs and devices, and ought not to influence what we do
or don't do to the data.

As for the data entry side, first the task would be to create the
master lists.  And again, generics would be perfectly acceptable until
the list is completed and the data "upgraded" to the corrected
listing.  That would leave us with new releases entering the system.
I think if the current classical editors are not only creating, but
using these listings, we don't have to worry about them being the
source of improperly formed titles.

It'd be the new editors who would be doing it, just as it is now - but
just based on my own experiences with what kinds of classical data new
editors enter, I rather doubt the issues in a new editor's add edit,
even once we have cleaned up all the outstanding issues with CSG,
would be as minor as "-" instead of "—".  Rather, it'd be "Allegro"
instead of a correct CSG title, just as it is now - and if we then are
pointing them to a standardized list, the amount of "data needing
cleanup" actually, I think, would go down, not up, as we get the new
editors on board with correct CSG faster...

It's much easier if they can copy and paste, and learn correct CSG
style by example, rather than having to do the full creation work
for a correct CSG title right from the start.  So perhaps, then,
long term we even end up with more editors doing classical, as more
works lists are created (so progressively less actual CSG title
creation ever needs to be done) and classical becomes more and more
copy/paste, not "so which piece of data goes where, in what order,
with what capitalization, with what orthography, and with which
typography again?"

Brian

_______________________________________________
Musicbrainz-style mailing list
Musicbrainz-style@lists.musicbrainz.org
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-style
