Not all charsets can be converted to the ascii or latin1 charset.
windows-1251 can't be converted to latin1/ascii, at least not its Cyrillic
part.
Don't worry about windows-1252; its letters are compatible with latin1.
windows-1250 can be converted to latin1/ascii without losing
major information, but the 3.1.x branch does not have sufficiently
powerful charset conversion code.
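
To see which of these charsets fit into latin1, a rough check like the following may help. It is only a sketch in Python, not code from the indexer, and the helper name fits_latin1 is made up for illustration. It decodes the bytes from the source charset and then tries to re-encode them as latin1.

```python
def fits_latin1(raw: bytes, charset: str) -> bool:
    """Return True if every character decoded from `raw` also exists in latin1.

    `fits_latin1` is a hypothetical helper name, not part of any search engine API.
    """
    text = raw.decode(charset)
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Cyrillic letters from windows-1251 have no latin1 equivalent.
print(fits_latin1("Привет".encode("cp1251"), "cp1251"))  # False

# Accented Western letters from windows-1252 sit at the same positions in latin1.
print(fits_latin1("café".encode("cp1252"), "cp1252"))    # True

# Byte 0x92 in windows-1252 is a curly apostrophe (U+2019), which latin1 lacks.
print(fits_latin1(b"\x92", "cp1252"))                    # False

# Central European letters such as "ř" from windows-1250 are not in latin1 either,
# so they need transliteration rather than a plain charset switch.
print(fits_latin1("Dvořák".encode("cp1250"), "cp1250"))  # False
```

Note that the third case is the <92> from the question: the letters of windows-1252 match latin1, but its punctuation in the 0x80-0x9F range does not.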

The answer is NO, you can't do the translation in the indexer. You may try
to do it in PHP.
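
A minimal sketch of what such a front-end translation could look like, written in Python for illustration rather than PHP, so it would have to be ported. The function name to_ascii and the PUNCT table are assumptions of mine, not part of the search engine. The idea is to decode the stored bytes, replace Word-style punctuation by hand, strip accents via Unicode decomposition, and turn anything left over into "?".

```python
import unicodedata

# Hypothetical table: Word-style punctuation mapped to plain ASCII.
PUNCT = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes (0x91, 0x92 in windows-1252)
    "\u201c": '"', "\u201d": '"',   # curly double quotes (0x93, 0x94)
    "\u2013": "-", "\u2014": "-",   # en and em dashes (0x96, 0x97)
    "\u2026": "...",                # ellipsis (0x85)
}


def to_ascii(raw: bytes, charset: str = "cp1252") -> str:
    """Best-effort conversion of `raw` (encoded in `charset`) to US-ASCII text."""
    # Undefined bytes become U+FFFD instead of raising an error.
    text = raw.decode(charset, errors="replace")
    # Replace the punctuation that has an obvious ASCII stand-in.
    text = "".join(PUNCT.get(ch, ch) for ch in text)
    # Split accented letters into base letter + combining mark, then drop the marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Anything still outside ASCII (for example Cyrillic) turns into "?".
    return text.encode("ascii", errors="replace").decode("ascii")


print(to_ascii(b"Gary\x92s"))                           # Gary's
print(to_ascii("Dvořák".encode("cp1250"), "cp1250"))    # Dvorak
print(to_ascii("Привет".encode("cp1251"), "cp1251"))    # ??????
```

The last call shows the limit mentioned above: windows-1251 Cyrillic text has no ASCII form, so it would need a real transliteration table instead of this fallback.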

"Briggs, Gary" wrote:
> 
> I'm outputting XML from my search engine for use in other people's websites,
> and I'm having a small problem.
> 
> Some of the sites I'm indexing are made in word [I've no control over this],
> and outputted as html.
> 
> And they're in strange character sets like windows-125{0,1,2}.
> 
> When I output the XML, it contains things like <92>s, which are the word
> equivalent of a normal '. Is there any way I can do translations on this,
> either in the indexer, or in the php? [I'm using the php front end, and
> crc-multi DB schema].
> 
> Basically, I'd like to see nothing more than US-ASCII or friends; much
> easier to use, and won't break perl scripts on unix boxes.
> 
> Anybody?
> 
> Ta,
> Gary (-;
> 
> PS I never got any response to my RFC on my code for putting stuff INTO the
> database from XML. Does anyone have anything to add to it?
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]