Re: Malformed XML with exotic characters
Hi,

I've seen almost all funky charsets, but Gothic is always trouble. I'm also unsure whether it's really a bug in Solr; it could well be Xerces being unable to cope. Besides, most systems indeed don't cope well with Gothic. This mail client does, but my terminal can't find its cursor after (properly) displaying such text.

http://got.wikipedia.org/wiki/%F0%90%8C%B7%F0%90%8C%B0%F0%90%8C%BF%F0%90%8C%B1%F0%90%8C%B9%F0%90%8C%B3%F0%90%8C%B0%F0%90%8C%B1%F0%90%8C%B0%F0%90%8C%BF%F0%90%8D%82%F0%90%8C%B2%F0%90%8D%83/Haubidabaurgs

Thanks for the input.

Cheers,

On Tuesday 01 February 2011 19:59:33 Robert Muir wrote:

Hi, it might only be a problem with your XML tools (e.g. Firefox). The problem here is characters outside of the Basic Multilingual Plane (in this case Gothic). XML tools typically fall apart on these portions of Unicode (in Lucene we recently reverted to a patched/hacked copy of Xerces specifically for this reason).

If you care about characters outside of the Basic Multilingual Plane actually working, unfortunately you have to start being very very very particular about what software you use... you can assume most software/setups WON'T work. For example, if you were to use MySQL's utf8 character set, you would find it doesn't actually support all of UTF-8! In this case you would need to use the recent 'utf8mb4' or something instead, which actually is UTF-8. That's just one example of a well-used piece of software that suffers from issues like this; there are others.

It's for reasons like these that, if support for these languages is important to you, I would stick with the most simple/textual methods for input and output, e.g. using things like CSV and JSON if you can. I would also fully test every component/jar in your application individually and, once you get it working, don't ever upgrade.

In any case, if you are having problems with characters outside of the Basic Multilingual Plane, and you suspect it's actually a bug in Solr, please open a JIRA issue, especially if you can provide some way to reproduce it.
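To make the point about the Basic Multilingual Plane concrete, here is a small Python sketch that is not part of the thread. The helper name `describe` is made up for illustration. It decodes the first percent-encoded character of the Gothic Wikipedia URL quoted above and reports how many UTF-8 bytes and UTF-16 code units it needs. A character set that stores at most three bytes per character, as MySQL's legacy utf8 does, cannot hold it.

```python
from urllib.parse import unquote


def describe(ch: str) -> dict:
    """Summarise how one character is represented in UTF-8 and UTF-16."""
    code_point = ord(ch)
    return {
        "code_point": f"U+{code_point:04X}",
        "outside_bmp": code_point > 0xFFFF,
        "utf8_bytes": len(ch.encode("utf-8")),
        # Each UTF-16 code unit is two bytes; non-BMP characters need two units.
        "utf16_code_units": len(ch.encode("utf-16-be")) // 2,
    }


# First percent-encoded character of the Gothic Wikipedia URL quoted above.
gothic = unquote("%F0%90%8C%B7")

print(describe(gothic))    # four UTF-8 bytes, two UTF-16 code units
print(describe("\u00e9"))  # e with acute: two UTF-8 bytes, one UTF-16 code unit
```

The two UTF-16 code units are the surrogate pair that BMP-only tooling tends to mishandle.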
Malformed XML with exotic characters
There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the home page of Wikipedia as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or that perhaps Firefox itself trips over.

Here's the output of the JSONResponseWriter for a query returning the home page:

{ "responseHeader":{ "status":0, "QTime":1, "params":{ "fl":"url,content", "indent":"true", "wt":"json", "q":"*:*", "rows":"1"}}, "response":{"numFound":6744,"start":0,"docs":[ { "url":"http://www.wikipedia.org/", "content":"Wikipedia English The Free Encyclopedia 3 543 000+ articles 日本語 フリー百科事典 730 000+ 記事 Deutsch Die freie Enzyklopädie 1 181 000+ Artikel Español La enciclopedia libre 710 000+ artículos Français L’encyclopédie libre 1 061 000+ articles Русский Свободная энциклопедия 654 000+ статей Italiano L’enciclopedia libera 768 000+ voci Português A enciclopédia livre 669 000+ artigos Polski Wolna encyklopedia 769 000+ haseł Nederlands De vrije encyclopedie 668 000+ artikelen Search • Suchen • Rechercher • Szukaj • Ricerca • 検索 • Buscar • Busca • Zoeken • Поиск • Sök • 搜尋 • Cerca • Søk • Haku • Пошук • Hledání • Keresés • Căutare • 찾기 • Tìm kiếm • Ara • Cari • Søg • بحث • Serĉu • Претрага • Paieška • Hľadať • Suk • جستجو • חיפוש • Търсене • Poišči • Cari • Bilnga العربية Български Català Česky Dansk Deutsch English Español Esperanto فارسی Français 한국어 Bahasa Indonesia Italiano עברית Lietuvių Magyar Bahasa Melayu Nederlands 日本語 Norsk (bokmål) Polski Português Română Русский Slovenčina Slovenščina Српски / Srpski Suomi Svenska Türkçe Українська Tiếng Việt Volapük Winaray 中文 100 000+ العربية • Български • Català • Česky • Dansk • Deutsch • English • Español • Esperanto • فارسی • Français • 한국어 • Bahasa Indonesia • Italiano • עברית • Lietuvių • Magyar • Bahasa Melayu • Nederlands • 日本語 • Norsk (bokmål) • Polski •
Português • Русский • Română • Slovenčina • Slovenščina • Српски / Srpski • Suomi • Svenska • Türkçe • Українська • Tiếng Việt • Volapük • Winaray • 中文 10 000+ Afrikaans • Aragonés • Armãneashce • Asturianu • Kreyòl Ayisyen • Azərbaycan / آذربايجان ديلی • বাংলা • Беларуская ( Акадэмічная • Тарашкевiца ) • বিষ্ণুপ্রিযা় মণিপুরী • Bosanski • Brezhoneg • Чăваш • Cymraeg • Eesti • Ελληνικά • Euskara • Frysk • Gaeilge • Galego • ગુજરાતી • Հայերեն • हिन्दी • Hrvatski • Ido • Íslenska • Basa Jawa • ಕನ್ನಡ • ქართული • Kurdî / كوردی • Latina • Latviešu • Lëtzebuergesch • Lumbaart • Македонски • മലയാളം • मराठी • नेपाल भाषा • नेपाली • Norsk (nynorsk) • Nnapulitano • Occitan • Piemontèis • Plattdüütsch • Ripoarisch • Runa Simi • شاہ مکھی پنجابی • Shqip • Sicilianu • Simple English • Sinugboanon • Srpskohrvatski / Српскохрватски • Basa Sunda • Kiswahili • Tagalog • தமிழ் • తెలుగు • ไทย • اردو • Walon • Yorùbá • 粵語 • Žemaitėška 1 000+ Bahsa Acèh • Alemannisch • አማርኛ • Arpitan • ܐܬܘܪܝܐ • Avañe’ẽ • Aymar Aru • Bân-lâm-gú • Bahasa Banjar • Basa Banyumasan • Башҡорт • भोजपुरी • Bikol Central • Boarisch • བོད་ཡིག • Chavacano de Zamboanga • Corsu • Deitsch • ދިވެހި • Diné Bizaad • Eald Englisc • Emigliàn–Rumagnòl • Эрзянь • Estremeñu • Fiji Hindi • Føroyskt • Furlan • Gaelg • Gàidhlig • 贛語 • گیلکی • Hak- kâ-fa / 客家話 • Хальмг • ʻŌlelo Hawaiʻi • Hornjoserbsce • Ilokano • Interlingua • Interlingue • Ирон Æвзаг • Kapampangan • Kaszëbsczi • Kernewek • ភាសាខ្មែរ • Kinyarwanda • Коми • Кыргызча • Ladino / לאדינו • Ligure • Limburgs • Lingála • lojban • Malagasy • Malti • 文言 • Māori • مصرى • مازِرونی / Mäzeruni • Монгол • မြန်မာဘာသာ • Nāhuatlahtōlli • Nedersaksisch • Nouormand • Novial • Нохчийн • Олык Марий • O‘zbek • पाऴि • Pangasinán • ਪੰਜਾਬੀ / پنجابی • Papiamentu • پښتو • Picard • Къарачай– Малкъар • Қазақша • Qırımtatarca • Rumantsch • Русиньскый Язык • संस्कृतम् • Sámegiella • Sardu • Саха Тыла • Scots • Seeltersk • සිංහල • Ślůnski • Af Soomaali • کوردی • Tarandíne • Татарча / Tatarça • 
Тоҷикӣ • Lea faka- Tonga • Türkmen • Удмурт • ᨅᨔ ᨕᨙᨁᨗ • Uyghur / ئۇيغۇرچه • Vèneto • Võro • West-Vlams • Wolof • 吴语 • ייִדיש • Zazaki 100+ Akan • Аҧсуа • Авар • Bamanankan • Bislama • Буряад • Chamoru • Chichewa • Cuengh • Dolnoserbski • Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ • • Hausa / هَوُسَا • Igbo • ᐃᓄᒃᑎᑐᑦ / Inuktitut • Iñupiak • Kalaallisut • कश्मीरी / كشميري • Kongo • Кырык Мары • ພາສາລາວ • Лакку • Luganda • Mìng-dĕ̤ng-ngṳ̄ • Mirandés • Мокшень • Молдовеняскэ • Na Vosa Vaka-Viti • Dorerin Naoero • Nēhiyawēwin / ᓀᐦᐃᔭᐍᐏᐣ • Norfuk / Pitkern •
Re: Malformed XML with exotic characters
Hi Markus,

to verify that it's not a Firefox issue, try xmllint on your shell to check the given XML?

Regards
Stefan

On Tue, Feb 1, 2011 at 4:43 PM, Markus Jelsma markus.jel...@openindex.io wrote:

There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the home page of Wikipedia as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or that perhaps Firefox itself trips over.
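For readers without xmllint at hand, the same kind of browser-independent well-formedness check can be sketched with Python's standard XML parser. This is not from the thread: the function name `is_well_formed` and the two sample fragments are hypothetical, and the lone surrogates are written here as numeric character references purely for illustration, since the thread does not show the raw bytes Solr produced.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments: a surrogate pair written as two separate references,
# and the same character written as one supplementary code point.
split_pair = "<str>&#xD800;&#xDF32;</str>"
single_code_point = "<str>&#x10332;</str>"

print(is_well_formed(split_pair))         # False
print(is_well_formed(single_code_point))  # True
```

A check like this takes the browser out of the picture, which is the point of the xmllint suggestion.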
Re: Malformed XML with exotic characters
Markus,

A few things to check. Make sure whatever Solr is hosted on is outputting UTF-8 (URIEncoding="UTF-8" in the Connector section in server.xml on Tomcat, for example), which looks to be the case here. Also make sure that whatever HTTP header there is tells Firefox that it is getting UTF-8 (otherwise it defaults to iso-8859-1/latin-1). Finally, make sure that whatever font you use in Firefox has the 'exotic' characters you are expecting. There might also be some issues on your platform with mixing script direction, but that is probably unlikely.

Cheers
François

On Feb 1, 2011, at 10:43 AM, Markus Jelsma wrote:

There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the home page of Wikipedia as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or that perhaps Firefox itself trips over.
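The HTTP header check mentioned in this reply can be sketched as follows. This is an illustration rather than anything posted in the thread: `charset_of` is a made-up helper, and the iso-8859-1 fallback simply mirrors the default described in the message. It reads the charset parameter out of a Content-Type header value using Python's standard library.

```python
from email.message import Message


def charset_of(content_type: str, fallback: str = "iso-8859-1") -> str:
    """Return the charset declared in a Content-Type value, or the fallback."""
    message = Message()
    message["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns the
    # fallback when no charset parameter is present.
    return message.get_content_charset(fallback)


print(charset_of("text/xml; charset=UTF-8"))  # utf-8
print(charset_of("text/xml"))                 # iso-8859-1
```

Feeding it the Content-Type value that the servlet container actually returns for a Solr query shows whether the client is being told to expect UTF-8.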
Re: Malformed XML with exotic characters
It's throwing out a lot of disturbing messages:

select.xml:17: parser error : Char 0xD800 out of allowed range
ki • Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
ki • Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF32 out of allowed range
• Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57138
• Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xD800 out of allowed range
Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF3F out of allowed range
Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57151
Eʋegbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xD800 out of allowed range
egbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
egbe • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF44 out of allowed range
e • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57156
e • Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xD800 out of allowed range
• Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
• Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF39 out of allowed range
Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57145
Frasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xD800 out of allowed range
rasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
rasch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF43 out of allowed range
ch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57155
ch • Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xD800 out of allowed range
• Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 55296
• Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : Char 0xDF3A out of allowed range
Fulfulde • Gagauz • Gĩkũyũ •
select.xml:17: parser error : PCDATA invalid Char value 57146
Fulfulde • Gagauz • Gĩkũyũ •

On Tuesday 01 February 2011 17:00:19 Stefan Matheis wrote:

Hi Markus, to verify that it's not a Firefox issue, try xmllint on your shell to check the given XML? Regards Stefan
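The values in these messages come in pairs: 0xD800 followed by a value in the 0xDFxx range. That is the shape of UTF-16 surrogate pairs reported one half at a time. The following Python sketch (the function name `combine_surrogates` is made up for illustration) applies the standard UTF-16 formula to the pairs from the output above, to show which code points they would represent if joined.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: 0x{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: 0x{low:04X}")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Pairs taken from the xmllint messages above, in the order they appear.
pairs = [
    (0xD800, 0xDF32), (0xD800, 0xDF3F), (0xD800, 0xDF44),
    (0xD800, 0xDF39), (0xD800, 0xDF43), (0xD800, 0xDF3A),
]

for high, low in pairs:
    code_point = combine_surrogates(high, low)
    print(f"0x{high:04X} 0x{low:04X} -> U+{code_point:05X}")
```

All six results fall in the Gothic block (U+10330 to U+1034F), which fits the empty bullet after "Gĩkũyũ •" in the JSON dump of the same page.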
Re: Malformed XML with exotic characters
Hi,

There are no typical encoding issues on my system. I can index, query and display English, German, Chinese, Vietnamese etc.

Cheers

On Tuesday 01 February 2011 17:23:49 François Schiettecatte wrote:

Markus, A few things to check. Make sure whatever Solr is hosted on is outputting UTF-8 (URIEncoding="UTF-8" in the Connector section in server.xml on Tomcat, for example), which looks to be the case here. Also make sure that whatever HTTP header there is tells Firefox that it is getting UTF-8 (otherwise it defaults to iso-8859-1/latin-1). Finally, make sure that whatever font you use in Firefox has the 'exotic' characters you are expecting. There might also be some issues on your platform with mixing script direction, but that is probably unlikely. Cheers François
Re: Malformed XML with exotic characters
Hi folks,

I've made the same observation when working with Solr's ExtractingRequestHandler on the command line (no browser interaction). When issuing the following curl command

curl 'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf' --data-binary @foo.pdf -H 'Content-type:text/xml; charset=utf-8' > foo.xml

Solr's XML response writer returns malformed XML; e.g., xmllint gives me:

foo.xml:21: parser error : Char 0xD835 out of allowed range
foo.xml:21: parser error : PCDATA invalid Char value 55349

I'm not totally sure whether this is a Tika/PDFBox issue. However, I would expect in every case that the XML output produced by Solr is well-formed, even if the libraries used under the hood return garbage.

-Sascha

p.s. I can provide the PDF file in question, if anybody would like to see it in action.

On 01.02.2011 16:43, Markus Jelsma wrote:

There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the home page of Wikipedia as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or that perhaps Firefox itself trips over.
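The expectation that the output stays well-formed whatever the upstream libraries hand over can be illustrated with a small guard function. This is a sketch only, not how Solr's XML writer works: the names `is_xml_char` and `xml_safe` are made up, and replacing bad input with U+FFFD is just one possible policy. It re-joins surrogate pairs that arrive as two separate code units and replaces anything outside the XML 1.0 Char production.

```python
def is_xml_char(code_point: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def xml_safe(text: str, replacement: str = "\uFFFD") -> str:
    """Join valid surrogate pairs and replace characters XML 1.0 forbids."""
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF and i + 1 < len(text):
            low = ord(text[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                # A high surrogate followed by a low one: emit one code point.
                out.append(chr(0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)))
                i += 2
                continue
        out.append(text[i] if is_xml_char(cp) else replacement)
        i += 1
    return "".join(out)


print(xml_safe("a\ud800\udf32b").encode("utf-8"))  # pair joined, encodes cleanly
print(xml_safe("a\ud800b").encode("utf-8"))        # lone surrogate replaced
```

Whether to replace, drop, or reject such input is a design choice; the sketch only shows that the check itself is cheap.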
Re: Malformed XML with exotic characters
You can exclude the input's involvement by checking if other response writers do work. For me, the JSONResponseWriter works perfectly with the same returned data in some AJAX environment.

On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote:

Hi folks, I've made the same observation when working with Solr's ExtractingRequestHandler on the command line (no browser interaction). When issuing the following curl command

curl 'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf' --data-binary @foo.pdf -H 'Content-type:text/xml; charset=utf-8' > foo.xml

Solr's XML response writer returns malformed XML; e.g., xmllint gives me:

foo.xml:21: parser error : Char 0xD835 out of allowed range
foo.xml:21: parser error : PCDATA invalid Char value 55349

I'm not totally sure whether this is a Tika/PDFBox issue. However, I would expect in every case that the XML output produced by Solr is well-formed, even if the libraries used under the hood return garbage. -Sascha

p.s. I can provide the PDF file in question, if anybody would like to see it in action.
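One possible reason the JSON route is more forgiving, offered here as an observation about the formats and not as a finding from the thread: JSON's string syntax explicitly allows a character outside the Basic Multilingual Plane to be written as an escaped surrogate pair, whereas XML has no equivalent for a split pair. The sketch below uses Python's `json` module with a Gothic code point chosen for illustration; it does not show what Solr's writers do internally.

```python
import json

gothic = "\U00010332"  # one code point outside the Basic Multilingual Plane

# With the default ensure_ascii=True, the character is escaped as a
# UTF-16 surrogate pair, which is valid JSON.
as_json = json.dumps(gothic)
print(as_json)  # "\ud800\udf32"

# Parsing joins the pair back into the single original character.
print(json.loads(as_json) == gothic)  # True
```

So a writer that emits the two halves separately can still produce valid JSON, while the same habit in XML yields the errors xmllint reports.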
Re: Malformed XML with exotic characters
Hi Markus,

in my case the JSON response writer returns valid JSON. The same holds for the PHP response writer.

-Sascha

On 01.02.2011 18:44, Markus Jelsma wrote:

You can rule out the input as the cause by checking whether other response writers work. For me, the JSONResponseWriter works perfectly with the same returned data in an AJAX environment.

On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote:

Hi folks,

I've made the same observation when working with Solr's ExtractingRequestHandler on the command line (no browser interaction). When issuing the following curl command

  curl 'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf' --data-binary @foo.pdf -H 'Content-type:text/xml; charset=utf-8' > foo.xml

Solr's XML response writer returns malformed XML, e.g., xmllint gives me:

  foo.xml:21: parser error : Char 0xD835 out of allowed range
  foo.xml:21: parser error : PCDATA invalid Char value 55349

I'm not totally sure whether this is a Tika/PDFBox issue. However, I would expect in every case that the XML output produced by Solr is well-formed, even if the libraries used under the hood return garbage.

-Sascha

p.s. I can provide the PDF file in question, if anybody would like to see it in action.

On 01.02.2011 16:43, Markus Jelsma wrote:

There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the Wikipedia home page, as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or whether Firefox itself trips over it.
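The xmllint errors Sascha reports (Char 0xD835, char value 55349) are what one would expect if a character outside the Basic Multilingual Plane were written one UTF-16 code unit at a time. That is a guess at the cause, not something confirmed in this thread. A minimal Python sketch of the difference, assuming U+1D400 as a stand-in for whatever character the PDF actually contained; the helper names are made up for illustration:

```python
import xml.etree.ElementTree as ET

def utf16_units(ch):
    """Return the UTF-16 code units of a single character as integers."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

def escape_per_unit(ch):
    """Broken approach: one numeric character reference per UTF-16 code unit."""
    return "".join("&#%d;" % u for u in utf16_units(ch))

def escape_per_code_point(ch):
    """Correct approach: one numeric character reference for the whole code point."""
    return "&#%d;" % ord(ch)

def is_well_formed(xml_text):
    """True if the standard-library XML parser (expat) accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical example character; its high surrogate is 0xD835 (decimal 55349).
ch = "\U0001D400"
units = utf16_units(ch)
bad = "<str>" + escape_per_unit(ch) + "</str>"
good = "<str>" + escape_per_code_point(ch) + "</str>"

print([hex(u) for u in units])
print("per-unit escaping well-formed:", is_well_formed(bad))
print("per-code-point escaping well-formed:", is_well_formed(good))
```

A surrogate value on its own is not a legal XML character, so a parser rejects the first form no matter which library produced it.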
Re: Malformed XML with exotic characters
Hi,

it might only be a problem with your XML tools (e.g. Firefox). The problem here is characters outside of the Basic Multilingual Plane (in this case Gothic). XML tools typically fall apart on these portions of Unicode (in Lucene we recently reverted to a patched/hacked copy of Xerces specifically for this reason).

If you care about characters outside of the Basic Multilingual Plane actually working, unfortunately you have to start being very, very particular about what software you use... you can assume most software/setups WON'T work. For example, if you were to use MySQL's utf8 character set, you would find it doesn't actually support all of UTF-8! In this case you would need to use the recent 'utf8mb4' or something instead, which is actually UTF-8! That's just one example of a well-used piece of software that suffers from issues like this; there are others.

It's for reasons like these that, if support for these languages is important to you, I would stick with the most simple/textual methods for input and output, e.g. using things like CSV and JSON if you can. I would also fully test every component/jar in your application individually, and once you get it working, don't ever upgrade.

In any case, if you are having problems with characters outside of the Basic Multilingual Plane and you suspect it's actually a bug in Solr, please open a JIRA issue, especially if you can provide some way to reproduce it.

On Tue, Feb 1, 2011 at 10:43 AM, Markus Jelsma markus.jel...@openindex.io wrote:

There is an issue with the XML response writer. It cannot cope with some very exotic characters or possibly the right-to-left writing systems. The issue can be reproduced by indexing the content of the Wikipedia home page, as it contains a lot of exotic matter. The problem does not affect the JSON response writer. The problem is, I am unsure whether this is a bug in Solr or whether Firefox itself trips over it.
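Robert's MySQL remark comes down to byte counts: the legacy utf8 character set stores at most three bytes per character, while anything outside the Basic Multilingual Plane needs four bytes in UTF-8. A small Python sketch of that check, using the Gothic letter U+10337 as the test case; the function names are illustrative only and nothing here talks to a real database:

```python
import json

def is_outside_bmp(ch):
    """True if the character lies above U+FFFF (outside the Basic Multilingual Plane)."""
    return ord(ch) > 0xFFFF

def fits_three_byte_utf8(text):
    """True if every character encodes to at most three UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

gothic = "\U00010337"                  # a Gothic letter, four bytes in UTF-8
latin_and_cjk = "Enzyklopädie 日本語"   # everything here is inside the BMP

print(gothic.encode("utf-8").hex())
print("BMP-only text fits:", fits_three_byte_utf8(latin_and_cjk))
print("Gothic text fits:", fits_three_byte_utf8(gothic))

# JSON can carry the same character as an escaped surrogate pair and
# still round-trip it, which is one reason it tends to be more forgiving.
escaped = json.dumps(gothic)
print(escaped, json.loads(escaped) == gothic)
```

Running a check like `fits_three_byte_utf8` over sample documents before they reach a given component is one cheap way to find out which parts of a pipeline need testing with supplementary characters.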