Hi Ken,

I have now set the JSP pages to UTF-8 encoding, but the result is still
the same. Do you have any other ideas? There is also another problem:
if I type non-English characters into the search text box, then after
clicking Search the content of the text box is wrong, i.e. the text is
not displayed correctly.
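
A minimal sketch of one possible cause of the text-box problem, assuming
(not verified here) that the browser submits the form as UTF-8 while the
container decodes the query string as ISO-8859-1. The class and method
names are hypothetical and only illustrate the mismatch with the standard
java.net classes:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryDecodingDemo {

    // Percent-encode the text the way a browser would for a form on a UTF-8 page.
    public static String encodeAsBrowser(String input) {
        try {
            return URLEncoder.encode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode the query value with the charset the server side assumes.
    public static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u00e1\u00e9\u00fa\u0151"; // the sample characters from this thread
        String onTheWire = encodeAsBrowser(typed);

        // Decoding with the same charset restores the original text.
        System.out.println(typed.equals(decodeAs(onTheWire, "UTF-8")));
        // Decoding with a different charset yields different (garbled) text.
        System.out.println(typed.equals(decodeAs(onTheWire, "ISO-8859-1")));
    }
}
```

If this is what is happening, the usual remedies are to call
request.setCharacterEncoding("UTF-8") before any parameter is read (this
covers POST bodies) and, if the container is Tomcat (an assumption), to
set URIEncoding="UTF-8" on the connector for GET query strings.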

Many Thanks,
Zsolt

Ken Krugler ([EMAIL PROTECTED]) wrote:
>
> Hi Zsolt,
>
> >Here is the cache view :
> >http://64.34.163.57:8080/nutch-0.9/cached.jsp?idx=0&id=0
>
> When I hit this with curl, I see that it's
> returning Content-Type:
> text/html;charset=iso-8859-2 in the response
> header, and the content has <meta
> http-equiv="Content-Type" content="text/html;
> charset=iso-8859-2">.
>
> But I see that the base href is:
>
> <base href="http://www.daganatok.hu/">
>
> And when I hit that URL, I get back:
>
> < Content-Type: text/html; charset=utf-8
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>
> The data seems to be valid UTF-8, and from my
> experience Nutch works correctly with correctly
> identified UTF-8 web pages.
>
> So I'm guessing the '?' characters come about when
> your webapp container/server tries to convert the
> UTF-8 data to 8859-2.
>
> -- Ken
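
To make the guess above concrete, here is a small sketch of two ways a
conversion to ISO-8859-2 can produce '?'. It is an illustration under
stated assumptions, not a diagnosis of this particular server: the class
and the transcode helper are hypothetical, and the demo assumes the JRE
ships the ISO-8859-2 charset (common JDKs do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Hypothetical helper: decode raw bytes with the charset the server
    // believes they are in, re-encode them in the target charset, and return
    // what a client that trusts the target charset would see.
    public static String transcode(byte[] raw, Charset assumedSource, Charset target) {
        String decoded = new String(raw, assumedSource); // malformed input becomes U+FFFD
        byte[] reencoded = decoded.getBytes(target);     // unmappable characters become '?'
        return new String(reencoded, target);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        Charset utf8 = StandardCharsets.UTF_8;
        String text = "\u00e1\u00e9\u00fa\u0151"; // the sample characters from this thread

        // Correctly labelled UTF-8 data converts cleanly, because all four
        // characters exist in ISO-8859-2.
        System.out.println(text.equals(transcode(text.getBytes(utf8), utf8, latin2)));

        // ISO-8859-2 bytes mislabelled as UTF-8 are malformed, so only
        // replacement characters reach the encoder and come out as '?'.
        System.out.println(transcode(text.getBytes(latin2), utf8, latin2));

        // A character with no ISO-8859-2 equivalent (here the euro sign)
        // also ends up as '?'.
        System.out.println(transcode("\u20ac".getBytes(utf8), utf8, latin2));
    }
}
```

The first case suggests that a clean UTF-8 to ISO-8859-2 conversion would
keep these particular characters, so a mislabelled step somewhere in the
chain may be worth checking as well.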
>
> >Ken Krugler ([EMAIL PROTECTED]) wrote:
> >>
> >>  >Hi All,
> >>  >
> >>  >I would like to share an issue regarding character
> >>  >encoding in Nutch 0.9.x.
> >>  >
> >>  >When I index sites that contain a lot of ISO-8859-2
> >>  >characters (mainly Eastern European sites, mostly
> >>  >Hungarian ones), the characters are not displayed
> >>  >correctly on the search page. Even in the cached
> >>  >view, non-English characters such as áéúő appear
> >>  >as question marks.
> >>  >
> >>  >If any of you have experience with this issue,
> >>  >I would be glad of your help.
> >>
> >>  What's the URL of an example page with this type of problem?
> >>
> >> -- Ken
>
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "Find Code, Find Answers"
>
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
