After negligible response to my last questions about this, I'm trying
again...


I've been modifying parts of MnogoSearch and writing lots of my own code
recently.
Since I like XML as a transport medium, I have a way of pushing information
into the search engine as XML, and of returning results as XML.


First, Input:

This works quite well.

All the important bits of the pushing-info-into-the-engine can be found at
http://www-student.cs.york.ac.uk/~gjb105/udm_xml.tar.gz

Unfortunately, it currently only works on the crc-multi db schema. If anyone
responds and asks for support for other types, I'll look at it, but until
then, it does what I want.

Any advice/comment from people on this would be greatly appreciated.




Second, Output:

I can upload a tarball of the relevant files if required [PHP, but if you
only want the XML file without the interesting HTTP header modification,
then you just need the search.html equivalent].

This also works quite well. Internet Explorer loves it.

But Perl under Solaris doesn't. OK, it does like it, but not all of the
time.

Basically, I index a lot of Word documents, and HTML documents created by
Word. In that helpful way that Word does, it uses various ugly character
sets, mostly {windows-,cp}{1250,1251, and derivatives}.

Whenever I try to munge any output that has some of these non-ISO-friendly
characters in it [most noticeably apostrophes, but copyright symbols appear
a fair bit, too], it breaks.
It throws a Perl error of the form "scary character I don't recognise found
at line XX column YY".
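
To make the symptom concrete, here is a minimal sketch of what is probably
happening. It is in Python rather than Perl, purely for illustration, and
the sample fragment, the element names and the assumption that the output
carries no encoding declaration are all hypothetical, not taken from the
real search output:

```python
import xml.etree.ElementTree as ET

# Hypothetical result fragment (element names are made up for this sketch).
# 0x92 is the Windows code-page "smart" apostrophe that Word likes to emit,
# and 0xA9 is the copyright sign.
raw = b"<result><title>Gary\x92s notes \xa9 2001</title></result>"

def try_parse(data):
    """Return the title text, or the parser's error message on failure."""
    try:
        return ET.fromstring(data).find("title").text
    except ET.ParseError as err:
        return "parse error: %s" % err

# With no encoding declaration the parser assumes UTF-8. A lone 0x92 byte
# is not valid UTF-8, so the parser gives up and reports a line and column.
print(try_parse(raw))
```

If that is the cause, the parser is behaving correctly: the bytes simply
are not in the encoding it has been told (or left) to expect.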

I've got no experience of Perl character set munging, so I'm kinda
having problems working stuff out; I really need some help on this, as it's
currently the single biggest problem I'm having.

I don't like database translations, since you lose original data; so this
would probably have to be a post-processing thing, or even done in the PHP
if possible. Again, I've no real experience of this...
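
The kind of post-processing I have in mind could look roughly like the
sketch below: transcode the bytes to UTF-8 just before parsing, and leave
the stored data alone. Again this is Python purely for illustration; the
helper name to_utf8, the sample fragment and the choice of cp1251 as the
source encoding are assumptions, and the real source encoding would have to
match whatever each document was actually stored in:

```python
import xml.etree.ElementTree as ET

def to_utf8(data, source_encoding):
    """Transcode raw bytes from source_encoding to UTF-8.

    Only the copy handed to the XML parser changes; the original bytes
    in the database are never touched.
    """
    return data.decode(source_encoding).encode("utf-8")

# Hypothetical result fragment containing a Word apostrophe (0x92) and a
# copyright sign (0xA9), as they appear in the Windows code pages.
raw = b"<result><title>Gary\x92s notes \xa9 2001</title></result>"

# After transcoding, the parser's default assumption of UTF-8 holds.
title = ET.fromstring(to_utf8(raw, "cp1251")).find("title").text

# ascii() escapes the non-ASCII characters so this prints on any terminal.
print(ascii(title))
```

The same idea should carry over to a conversion step in the PHP that
writes the XML, or in the Perl that reads it, as long as the source
encoding is known at that point.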



Thank you very much,
Gary (-;