>I would suggest that the manner appropriate to most any environment is to >just use plain ascii for your filenames :-) The "swung dash" you refer to >is called a tilde, btw, and is mostly used in spanish. > [JS] ... and mathematical notation.
I certainly agree with your suggestion about file names, I don't even like to see spaces (although that's common on Windows platforms). However, I thought the original question had to do with the CONTENTS, not the file names. Getting back to what I think was the original question, I work in a multi-lingual (mostly Western Europe and Asia). As long as I stick to UTF-8 and web browsers, I'm okay. Unfortunately I have to import data from MS Excel worksheets a lot, and then all bets are off. When you get a worksheet that was populated by copy/paste from MS Word, you'll wind up with all kinds of characters that "look right" but are actually not what you think they are. (There are some Cyrillic punctuation marks that look like Latin-1 punctuation marks, but are not the same.) I spent a very long time experimenting with this, and summarized my conclusions in http://lists.mysql.com/mysql/212392. Here's a little bit of code that I use when cleaning up blocks of text. It might not do what you want! It works on a "transliteration" scheme that maps "funky" characters into rough equivalents. We use this throughout our system, so the contents of the database are consistent in certain areas. There are other places where we use UTF-8, because the source is UTF-8. It would have been better to use UTF-8 throughout, rather than this transliteration scheme, but not only did I inherit a lot of existing data but my colleagues in Japan use the "MS Mincho" font, which can't handle these characters. (If they used "Arial Unicode MS" it would solve a lot of problems, but I don't run the zoo.) It's written in Visual Basic, and implemented as an Excel function, but is easily re-used. (I have implemented the same algorithm in PHP, for our web site.) ========== Option Explicit Const VERSION As String = "2009-12-18 - 11:51" Public Function FixCP1252(CellToScan As String) ' This function will transliterate the common high-ANSI (CP1252) ' characters that come from pasting MS Office text into an Excel ' spreadsheet. Dim Temp As String Dim I As Integer Dim CharsToReplace(7, 1) As String CharsToReplace(0, 0) = Chr(&H96) CharsToReplace(0, 1) = "-" CharsToReplace(1, 0) = Chr(&H97) CharsToReplace(1, 1) = "--" CharsToReplace(2, 0) = Chr(&H91) CharsToReplace(2, 1) = "'" CharsToReplace(3, 0) = Chr(&H92) CharsToReplace(3, 1) = "'" CharsToReplace(4, 0) = Chr(&H85) CharsToReplace(4, 1) = "..." CharsToReplace(5, 0) = Chr(&H93) CharsToReplace(5, 1) = """" CharsToReplace(6, 0) = Chr(&H94) CharsToReplace(6, 1) = """" CharsToReplace(7, 0) = Chr(&H95) CharsToReplace(7, 1) = "*" Temp = CellToScan For I = 0 To UBound(CharsToReplace, 1) - 1 Temp = Replace(Temp, CharsToReplace(I, 0), CharsToReplace(I, 1)) Next I FixCP1252 = Temp End Function ========== I also have to take text with all of this weirdness and make web pages out of it. Just in case it comes in handy, here's the code I use for that: ========== Private Function FixTroubleChars(ByVal LineIn As String) As String ' This little function cleans out any really troublesome characters, making substitutions ' as appropriate. We'll have to extend the coding as necessary. Dim Temp As String ' Fix some Unicode characters that are too weird to handle normally. According to the ' Unicode maps, some are for "private" use (meaning that they have no standard glyph assignment). ' Microsoft's "Arial Unicode MS" font can usually give you a suggestion, since the data ' probably came from a Windows source. Temp = Replace(LineIn, ChrW(&HDBC0), "•") Temp = Replace(Temp, ChrW(&HDC83), "") Temp = Replace(Temp, ChrW(&HF0A7), "•") FixTroubleChars = Temp End Function Function MyHTMLEncode(ByVal InString As String) Dim OutString As String, CleanString As String Dim ThisChar As String * 1 Dim I As Integer Dim CodePoint As Long ' First, take care of anything that is truly horrible and cannot be used. CleanString = FixTroubleChars(InString) ' Encode all "special" characters for use in a web page. OutString = "" For I = 1 To Len(CleanString) ThisChar = Mid(CleanString, I, 1) If ThisChar Like "[- a-zA-Z0-9!""#$%'&()*+,./:;=...@]" Then OutString = OutString & ThisChar Else CodePoint = AscW(ThisChar) If CodePoint < 0 Then MsgBox "Untranslatable character in " & vbCrLf & """" & InString & """" & vbCrLf & vbCrLf _ & "Codepoint = " & CodePoint & vbCrLf & vbCrLf _ & "Inspect the character and modify function FixTroubleChars() accordingly", _ vbCritical + vbOKOnly, "Bad character" Application.Cursor = xlDefault End End If OutString = OutString & "&#" & CodePoint & ";" End If Next I MyHTMLEncode = OutString End Function ========== I hope this helps somebody. Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org