Hi all, I have indexed a mix of English and German HTML documents saved as "Web Page, complete" in Firefox. Everything works fine and the pages get indexed by beagled.
However, there seems to be a problem with German umlauts, or with character encodings in general. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="Author" content="Thomas Frenzel">
<meta name="Generator" content="NetObjects Fusion 4.0.1 für Windows">
<meta name="Keywords" content="DH, Downhill, Enduro, Enduro - Zschopau, Mountainbike, Mountainbiketouren, geführte Touren, Stülpner, ">
<title>Löwenkopftrails</title></head>
...
...

Even though the page is clearly marked with "charset=ISO-8859-1", the title "Löwenkopftrails" is displayed in "Beagle-Best" only as "Lwenkopftrails": the German "ö" is missing. Likewise, only a search for "Lwenkopftrails" returns a result; a search for "Löwenkopftrails" returns nothing.

I'm running a Novell/SuSE 10.0 system with English as the primary language (env: LANG=en_US.UTF-8).

Is there a way to prevent, work around, or configure this?

Thanks & kind regards,
Stephan.
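P.S. One possible explanation, which is only a guess on my part and not something I have confirmed in the Beagle source: the file's bytes are being decoded as UTF-8 (matching my locale) instead of the declared ISO-8859-1, and the invalid byte is silently dropped. A minimal Python sketch reproduces the same symptom. The variable names are made up for illustration, and this is not Beagle's actual code path.

```python
# Hypothetical reproduction of the symptom; NOT Beagle's actual code.
title = "Löwenkopftrails"

# Bytes as they would be stored in a page saved with charset=ISO-8859-1.
raw = title.encode("iso-8859-1")

# Assumption: the indexer decodes as UTF-8 and discards undecodable bytes.
# The single byte 0xF6 ("ö" in ISO-8859-1) is not valid UTF-8 here.
wrong = raw.decode("utf-8", errors="ignore")

# Decoding with the charset declared in the <meta> tag keeps the umlaut.
right = raw.decode("iso-8859-1")

print(wrong)
print(right)
```

If this guess is right, the first line printed lacks the "ö" exactly as in the Beagle result, while the second line shows the title intact.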
