[ http://issues.apache.org/jira/browse/NUTCH-257?page=all ]
Jerome Charron resolved NUTCH-257:
----------------------------------
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Jerome Charron
Fixed in revision #405565 - http://svn.apache.org/viewcvs?view=rev&rev=405565
> Summary#toString always Entity encodes -- problem for
> OpenSearchServlet#description field
> -----------------------------------------------------------------------------------------
>
> Key: NUTCH-257
> URL: http://issues.apache.org/jira/browse/NUTCH-257
> Project: Nutch
> Type: Bug
> Components: searcher
> Versions: 0.8-dev
> Reporter: [EMAIL PROTECTED]
> Assignee: Jerome Charron
> Priority: Minor
> Fix For: 0.8-dev
>
> All search result data we display in search results has to be explicitly
> Entity.encoded when outputting in search.jsp (title, url, etc.), except
> Summaries, which are already Entity.encoded. This is fine when outputting
> HTML, but it gets in the way when outputting anything else, such as XML.
> I'd suggest we not make any presumption about how search results are used.
> The problem becomes especially acute when the text language is other than
> English.
> Here is an example of what a Czech description field in an OpenSearchServlet
> hit record looks like:
> <description><span class="ellipsis"> ...
> </span>V&#283;deck&aacute; knihovna v Olomouci
> Bezru&#269;ova 2, Olomouc 9, 779 11, &#268;esk&aacute; republika
> &nbsp; tel. +420-585223441 &nbsp; fax +420-585225774
> http://www.<span class="highlight">vkol</span>.cz/
> &nbsp;&nbsp; mailto:info@<span
> class="highlight">vkol</span>.cz Otev&#345;eno : &nbsp;
> po-p&aacute; &nbsp; 8 30 -19 00 &nbsp;&nbsp;&nbsp; so
> &nbsp; 9 00 -13 00 &nbsp;&nbsp;&nbsp; ne &nbsp;
> zav&#345;eno V katalogu s &uacute;pln&yacute;m
> &#269;asov&yacute;m<span class="ellipsis"> ... </span>03
> Organizace 20/12 Odkazy 19/04 Hledej 23/03 &nbsp; 23/03 &nbsp;
> Po&#269;et p&#345;&iacute;stup&#367; od 1.9.1998. Statistiky
> . [ ] &nbsp; [ Nahoru ] <span
> class="highlight">VKOL</span></description>
> Here is the same description field with Entity.encoding disabled:
> <description><span class="ellipsis"> ... </span>tisky statistiky
> knihovny WWW serveru středověké rukopisy studovny CD-ROM historických fondů
> hlavní Internet Německé knihovny vázaných novin SVKOL viz <span
> class="highlight">VKOL</span> šatna T telefonní čísla knihovny
> zaměstnanců U V vazba věcný popis vedení knihovny vedoucí oddělení video
> <span class="highlight">VKOL</span> volný výběr výpůjčka výroční
> zpráva výstavy W webmaster WWW odkazy X Y Z - ? zamluvení knihy zahraniční
> periodika zpracování fondu<span class="highlight">VKOL</span> -
> hledej Hledej [ <span class="highlight">VKOL</span> ] [ Novinky ]
> [ Katalog ] [ Služby ] [ Aktivity ] [ Průvodce ] [ Dokumenty ] [ Regionální
> fce ] [ Organizace ] [ Odkazy ] [ Hledej ] [ ] [ ] Obsah full-textové
> vyhledávání, 19/04/2003 rejstřík vybraných<span class="ellipsis"> ...
> </span></description>
> Notice how the Czech characters in the first example are all entity
> encoded, mostly numerically, i.e. as &#NNN;.
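
To make the encoding above concrete, here is a minimal Java sketch of how a character such as U+011B ends up as &#283; in the output. The class and method names are hypothetical and this is not Nutch's actual Entities code; it also emits numeric references only, whereas the example above mixes in named entities such as &aacute;.

```java
// Hypothetical illustration only; not the Nutch implementation.
final class NumericEntityDemo {

    /** Escapes markup characters and writes every non-ASCII code point as &#NNN;. */
    static String encode(String raw) {
        StringBuilder out = new StringBuilder();
        raw.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 127) {
                        // append(int) writes the decimal code point value
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(encode("V\u011bdeck\u00e1 knihovna"));
        // prints: V&#283;deck&#225; knihovna
    }
}
```

An HTML page renders this output correctly, but an XML consumer receives the entity references as literal text unless it decodes them again, which is the problem reported here.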
> I'd suggest that Summary#toString() become Summary#toEntityEncodedString()
> and that toString() return the raw aggregation of Fragments. This would
> likely require adding methods to the HitSummarizer interface, with matching
> additions to NutchBean, so that callers can ask for either raw text or
> entity-encoded text. Or, better, I'd suggest that the Summarizer never
> return Entity.encoded text and that we let that happen in search.jsp (I can
> make a patch to do the latter if it's amenable).
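
The first proposal could look roughly like the following Java sketch. Everything in it is an assumption made for illustration: SummarySketch, its Fragment class, and escape() are hypothetical stand-ins, not the real Summary class or the change committed in revision 405565. Only the two method names, toString() and toEntityEncodedString(), come from the issue text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal; not the actual Nutch Summary class.
final class SummarySketch {

    /** A piece of summary text, optionally marked as a query-term highlight. */
    static final class Fragment {
        final String text;
        final boolean highlight;

        Fragment(String text, boolean highlight) {
            this.text = text;
            this.highlight = highlight;
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();

    SummarySketch add(String text, boolean highlight) {
        fragments.add(new Fragment(text, highlight));
        return this;
    }

    /** Raw aggregation of fragments: no markup, no entity encoding. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.text);
        }
        return sb.toString();
    }

    /** HTML form: text is escaped and highlights are wrapped in a span. */
    String toEntityEncodedString() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            String safe = escape(f.text);
            sb.append(f.highlight
                    ? "<span class=\"highlight\">" + safe + "</span>"
                    : safe);
        }
        return sb.toString();
    }

    /** Minimal escaping of markup characters (assumed helper). */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        SummarySketch s = new SummarySketch().add("info@", false).add("vkol", true).add(".cz", false);
        System.out.println(s);                          // raw text for XML output
        System.out.println(s.toEntityEncodedString());  // HTML for search.jsp
    }
}
```

With this split, OpenSearchServlet would call toString() and leave escaping to its XML writer, while search.jsp would call toEntityEncodedString(). The second proposal drops the encoded variant entirely and moves the escaping into search.jsp.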
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers