Alexander Dupuy wrote:
>>> > As I say, we already use such illegal symbols (:,?, /, etc). The
>>> > "Colons and slashes and perhaps other symbols are not recognised in
>>> > some mp3 ogg player filesystems e.g. iriver, OSX." argument, if we
>>> > truly are moving to a music database, and not a tagging database,
>>> > would seem outdated.
>>>
> and Arturus Magi chimed in:
>
>> Especially considering that properly designed taggers can compensate
>> for the filesystem-side issues, and reciprocating that
>> compensation to the ID3/APE/etc. tags will fix the corresponding issues
>> in most players that are marginally unicode compliant.
>
> I'd agree that the filesystem argument is weak, but I haven't seen any
> specific evidence that Picard (or any other tagger using MB data) has
> such functionality for converting full Unicode to a subset in

Well, I know luks is quick to say "don't base style issues on what Picard or any other tagger can handle", but if I can be forgiven for saying it, luks (smile), Picard does handle just the type of replacement you're talking about. It currently has two different ways, actually. For file names, there's the "replace non-ASCII characters" option. For file names and tags, there's taggerscript.

What you're describing is essentially the reverse of part of the script I use now. (I hate seeing all the underscores, and like seeing the slashes and such):

    $set(title,$replace(%title%,...,…))
    $set(album,$replace(%album%,...,…))
    $set(title,$replace(%title%,/,⁄))
    $set(album,$replace(%album%,/,⁄))
    $set(album,$replace(%album%,:,﹕))
    $set(title,$replace(%title%,:,﹕))
    $set(album,$replace(%album%, No., N°))
    $set(title,$replace(%title%, No., N°))
    $set(album,$replace(%album%,",＂))
    $set(title,$replace(%title%,",＂))
    $set(album,$replace(%album%,?,？))
    $set(title,$replace(%title%,?,？))

Essentially, I replace the Windows-invalid characters with Unicode (near) equivalents. (If anyone knows a non-full-width ? or " that's a better equivalent, please let me know! :) )

So, if you're the person who is tagging, and you don't want em-dash, en-dash, foreign quotes, etc., it'd be a simple substitution using taggerscript:

    $set(title,$replace(%title%,—,-))
    $set(title,$replace(%title%,–,-))
    $set(title,$replace(%title%,‒,-))
    $set(title,$replace(%title%,«,"))
    $set(title,$replace(%title%,»,"))

etc. (And actually, if there were demand, it wouldn't be all that hard to make a "CSG-de-typographicaphier" plugin, to do it without even having to handle taggerscript.)

Problem is, it's very easy to go from » to " with a simple script. It's very difficult to go from " to knowing whether you need », «, ›, ‹, 〝, 〞, 〟, etc., even with a complex script. The substitution throws away information, so it can only safely be done on the tagger side, never in the database.

As for web display support, while I do understand what you're saying, the same would seem to also hold true for anything, not just classical.
A cell phone may not have support for Hangul characters - but is that a reason we ought not to be entering Korean releases? How about the soundtrack to this release: http://musicbrainz.org/release/c66b2ad1-1f82-4382-a1f8-fdc54685f281.html - should we rename it to I (heart) Huckabees?

I don't intend to sound sarcastic - I'm just being realistic. MusicBrainz is an international site - there are at least a few dozen countries, languages, and even scripts represented on this mailing list alone, I would suspect (even if we all communicate in Latin/English on this list). Issues with various devices or tagging utilities possibly not properly handling an international standard lie with the programs and devices, and ought not to influence what we do or don't do to the data.

As for the data entry side, the first task would be to create the master lists. And again, generics would be perfectly acceptable until the list is completed and the data "upgraded" to the corrected listing. That would leave us with new releases entering the system. I think if the current classical editors are not only creating, but using these listings, we don't have to worry about them being the source of improperly formed titles. It'd be the new editors who would be doing it, just as it is now - but based on my own experience with the kinds of classical data new editors enter, I rather doubt the issues in a new editor's add edit, even once we have cleaned up all the outstanding issues with CSG, would be as minor as "-" instead of "—". Rather, it'd be "Allegro" instead of a correct CSG title, just as it is now - and if we are then pointing them to a standardized list, the amount of "data needing cleanup" would, I think, go down, not up, as we get the new editors on board with correct CSG faster... It's much easier if they can copy and paste, and learn correct CSG style by example, rather than making them do the full creation work for a correct CSG title right from the start.

So perhaps, then, long term we even end up with more editors doing classical, as more works lists are created (so progressively less actual CSG title creation ever needs to be done) and classical becomes more and more copy/paste, not "so which piece of data goes where, in what order, with what capitalization, with what orthography, and with which typography again?"

Brian
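
P.S. For anyone who would rather see the idea outside taggerscript, here is a rough Python sketch of the two kinds of substitution discussed in this message. It is only an illustration, not how Picard does it internally: the names TO_LOOKALIKE, TO_ASCII, to_lookalike and to_ascii are made up for this example, and the full-width targets for ? and " are an assumption based on the "non-full width" remark.

```python
# Sketch only: mirrors single-character taggerscript $replace() chains.
# All names here are hypothetical and not part of Picard.

# Windows-invalid characters -> Unicode near-equivalents.
TO_LOOKALIKE = str.maketrans({
    "/": "\u2044",  # fraction slash
    ":": "\ufe55",  # small colon
    "?": "\uff1f",  # fullwidth question mark (assumed target)
    '"': "\uff02",  # fullwidth quotation mark (assumed target)
})

# Typographic characters -> plain ASCII (a many-to-one, lossy mapping).
TO_ASCII = str.maketrans({
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
    "\u2012": "-",  # figure dash
    "\u00ab": '"',  # left-pointing guillemet
    "\u00bb": '"',  # right-pointing guillemet
})


def to_lookalike(text: str) -> str:
    """Swap Windows-invalid characters for Unicode look-alikes."""
    return text.translate(TO_LOOKALIKE)


def to_ascii(text: str) -> str:
    """Flatten typographic dashes and guillemets to ASCII."""
    return text.translate(TO_ASCII)


if __name__ == "__main__":
    print(to_lookalike('What? 1/2: "x"'))
    print(to_ascii("\u00abAllegro\u00bb \u2014 Presto"))
    # Two different inputs collapse to the same output, so the
    # ASCII form cannot be turned back into the original reliably.
    print(to_ascii("A\u2014B") == to_ascii("A\u2013B"))
```

The last line is the point: once an em dash and an en dash have both become "-", nothing in the result says which one it was, which is why the typographic form is the one worth storing.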
_______________________________________________
Musicbrainz-style mailing list
Musicbrainz-style@lists.musicbrainz.org
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-style