On Thursday, June 5, 2014 12:12:06 AM UTC+5:30, Roy Smith wrote:
> Chris Angelico wrote:
> > You can't ignore those. You might be able to say "Well, my program
> > will run slower if you throw these at it", but if you're going down
> > that route, you probably want the full FSR and the advantages it
> > confers on ASCII and Latin-1 strings. Binding your program to BMP-only
> > is nearly as dangerous as binding it to ASCII-only; potentially worse,
> > because you can run an awful lot of artificial tests without
> > remembering to stick in some astral characters.
>
> Yup. I wrote a while(*) back about the pain I was having importing some
> data into a MySQL(**) database which (unknown to me when I started) only
> handled BMP. It turns out in the entire dataset of 20-odd million
> records, there were exactly four that had astral characters. All of my
> tests worked. I didn't discover the problem until it blew up many hours
> into the "final" production import run.
>
> (*) Two years?
> (**) This was not the only pain point with MySQL. We eventually
> switched to Postgress.

Thanks Roy for bringing up that example - I was trying to recollect the details. I forgot about the MySQL angle, which adds a different twist to it.

Here's my interpretation of that situation; I'd like to hear yours:

The basic problem was that MySQL handled a strict subset of what the rest of the system (Python 2.7?) could handle. This meant that at a late (and embarrassing) stage, exceptions were being thrown from deep within the system.

OTOH, let's say you could detect the 'error' (more correctly, the 'un-handle-able') at the borders of your system, say when the user enters the data on a web form. Would you have a problem kicking out those characters (in both senses!) with a curt "Can't deal with all this supra-galactic rubble!"?

Of course, switching to Postgres may be a sound choice on other fronts.
But if that were not an option, and you only had these choices:
- significantly complexify your MySQL data structures to handle 4 in 20 million cases
- just detect and throw such cases out at the outset

which would you take?

In any case, this is the choice I hear from the MicroPython folks, who are explicitly seeking a cut-down version of Python.
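
To make the "detect and throw out at the outset" option concrete, here is a minimal sketch of what a border check might look like. It assumes Python 3.3+ (where each element of a str is a full code point); on a narrow Python 2.7 build astral characters show up as surrogate pairs and the test would need to look for those instead. The function names are made up for illustration and are not from any library.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def find_astral(text):
    """Return (index, char) pairs for every code point outside the BMP."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > BMP_MAX]


def check_bmp_only(text):
    """Reject input at the border: raise ValueError on the first astral char."""
    astral = find_astral(text)
    if astral:
        i, ch = astral[0]
        raise ValueError(
            "cannot store U+%04X at position %d: BMP only" % (ord(ch), i)
        )
    return text


def strip_astral(text, replacement="\ufffd"):
    """Alternative policy: replace astral chars instead of rejecting them."""
    return "".join(ch if ord(ch) <= BMP_MAX else replacement for ch in text)
```

Calling check_bmp_only() on each form field (or each record before the import starts) would turn the "blew up many hours in" failure into an up-front one; strip_astral() is the lossy alternative if rejecting the input outright is too curt.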