Re: translation of searchtmpl/search.wml

2003-01-21 Thread Craig Small
On Tue, Jan 21, 2003 at 08:34:18AM +0900, Tomohiro KUBOTA wrote:
> Not yet.  However, I believe that --with-extra-charsets is a necessary
> condition, though I am not sure that it is a necessary and sufficient
> condition (but I expect so).  Please read:

I have also received your bug reports about the extra charsets; the next
version will take them into account.  I've got some more problematic
bugs with mnogosearch that need fixing first.

> I thought about testing it, but I don't have enough time to study databases,
> because I am entirely new to them.  (Also, I could not test the other
> etc/index..htm problem because search.cgi in klecker:~/public_html
> did not work and I don't know why.  It may be because of apache's
> configuration.)

UID problems? Is PostgreSQL complaining that you're not user www-data?
If something is in ~/public_html then it runs as that user, not www-data.
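
One way to check which user a script under ~/public_html actually runs as is a tiny diagnostic CGI. This is only a sketch: the function name effective_user is made up for illustration, and it assumes a Unix host where Python is available to the web server.

```python
#!/usr/bin/env python3
import os
import pwd


def effective_user() -> str:
    """Return the login name for the effective UID of this process."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this UID; fall back to the number itself.
        return str(uid)


if __name__ == "__main__":
    # Minimal CGI response: header, blank line, then the body.
    print("Content-Type: text/plain\n")
    print(f"running as: {effective_user()}")
```

If the output is not www-data, a database that only grants access to www-data would be expected to refuse the connection.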

  - Craig

-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/<[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]> Debian developer <[EMAIL PROTECTED]>



Re: translation of searchtmpl/search.wml

2003-01-20 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Craig Small)
Subject: Re: translation of searchtmpl/search.wml
Date: Tue, 21 Jan 2003 09:13:57 +1100

> It was before you replied to that mail (or maybe another?).  Essentially
> I was incorrect.  mnogosearch *stores* its indexing in UTF-8 but needs
> all those flags to do the indexing.
> 
> Have you tried a little bit of indexing of Japanese pages to see if it
> does seem to behave itself?

Not yet.  However, I believe that --with-extra-charsets is a necessary
condition, though I am not sure that it is a necessary and sufficient
condition (but I expect so).  Please read:
http://www.mnogosearch.org/board/message.php?id=6350

Note that Japanese (and Chinese) have "The Problem 2" (no spaces between
words) and need the newer version of mnoGoSearch with ChaSen, as you
wrote.  However, I expect Korean will be fully fixed by --with-extra-charsets.
I also expect that Japanese and Chinese words which occasionally appear
independently (i.e., separated by spaces or HTML tags) will be
searchable.
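
To illustrate "The Problem 2", here is a minimal sketch of a word splitter that only breaks at whitespace and HTML tags. It is not mnoGoSearch's tokenizer; naive_tokens and the TAG pattern are hypothetical names used only for this example.

```python
import re
from typing import List

# Anything that looks like an HTML tag is treated as a word boundary.
TAG = re.compile(r"<[^>]+>")


def naive_tokens(html: str) -> List[str]:
    """Split text into words at whitespace and HTML tags only."""
    return TAG.sub(" ", html).split()


if __name__ == "__main__":
    # Space-separated text splits into individual words.
    print(naive_tokens("<b>Debian</b> search"))
    # A Japanese sentence without spaces stays one long "word".
    print(naive_tokens("これは日本語の文章です"))
    # Words isolated by tags can still be found on their own.
    print(naive_tokens("<li>東京</li><li>大阪</li>"))
```

With a splitter like this, a run of Japanese text is indexed as a single token, so a search for a word inside it fails, while words set apart by tags or spaces are indexed individually.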

I thought about testing it, but I don't have enough time to study databases,
because I am entirely new to them.  (Also, I could not test the other
etc/index..htm problem because search.cgi in klecker:~/public_html
did not work and I don't know why.  It may be because of apache's
configuration.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: translation of searchtmpl/search.wml

2003-01-20 Thread Craig Small
On Sun, Jan 19, 2003 at 11:45:17PM +0900, Tomohiro KUBOTA wrote:
> 
> PS. The content negotiation
>   http://search.debian.org/new/index.en.cgi
>   http://search.debian.org/new/index.fr.cgi
> does not seem to work well.  Though I am afraid I may be wrong, how about
> renaming /org/search.debian.org/etc/search..htm to
> /org/search.debian.org/etc/index..htm ?  The source
> code of mnoGoSearch seems to substitute ".cgi" with ".htm" to
> search the configuration file (src/search.c).

That's right; the /new/ stuff was to test other things.  The files
eventually become index..htm.
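
The naming rule described in the quoted text (".cgi" replaced by ".htm") can be sketched as follows. This only illustrates the suffix swap, not the actual code in src/search.c; template_for is a hypothetical helper.

```python
from pathlib import PurePosixPath


def template_for(cgi_name: str) -> str:
    """Return the template file name derived from a CGI script name."""
    path = PurePosixPath(cgi_name)
    if path.suffix != ".cgi":
        raise ValueError(f"not a .cgi script: {cgi_name}")
    # Only the final suffix changes; a language part such as ".en" is kept.
    return path.with_suffix(".htm").name


if __name__ == "__main__":
    print(template_for("index.en.cgi"))
    print(template_for("/new/index.fr.cgi"))
```

Under that rule a script named index.en.cgi would look for index.en.htm, which is why a template whose name starts with "search" would not be found.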

The problem is that, currently, you need to edit them manually before
they are placed in the etc directory.

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/<[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]> Debian developer <[EMAIL PROTECTED]>



Re: translation of searchtmpl/search.wml

2003-01-20 Thread Craig Small
On Mon, Jan 20, 2003 at 06:22:08PM +0900, Tomohiro KUBOTA wrote:
> 
> BTW, do you have any idea why Craig thinks as he does in the following mail?
> Maybe I am missing something.
> http://lists.debian.org/debian-www/2003/debian-www-200301/msg00271.html

It was before you replied to that mail (or maybe another?).  Essentially
I was incorrect.  mnogosearch *stores* its indexing in UTF-8 but needs
all those flags to do the indexing.

Have you tried a little bit of indexing of Japanese pages to see if it
does seem to behave itself?

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.enc.com.au/<[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]> Debian developer <[EMAIL PROTECTED]>



Re: translation of searchtmpl/search.wml

2003-01-20 Thread Denis Barbier
On Mon, Jan 20, 2003 at 06:22:08PM +0900, Tomohiro KUBOTA wrote:
> Hi,
> 
> From: [EMAIL PROTECTED] (Denis Barbier)
> Subject: Re: translation of searchtmpl/search.wml
> Date: Sun, 19 Jan 2003 21:46:34 +0100
> 
> > At first glance it sounds very good, but I am not sure this is the way
> > to go, because some strings are not handled by gettext, e.g. see
> > Catalan strings in webwml/english/template/debian/ctime.wml
> > There are also several Perl variables in date.pot, which will be
> > displayed according to current locale, and not UTF-8.
> > We could certainly play with CUR_LOCALE, but a simpler solution is
> > to post-process HTML files with iconv and change their charset
> > field in <meta> tags, see attached patch.
> 
> I think your idea is better than mine.  I also checked that your patch
> works well, i.e., builds translated pages in UTF-8.

All right, it is committed.

> BTW, do you have any idea why Craig thinks as he does in the following mail?
> Maybe I am missing something.
> http://lists.debian.org/debian-www/2003/debian-www-200301/msg00271.html

No idea, it seems quite clear.  You could compile and install it in a
private area, then index some files to check that it works as expected.
When you are sure it works, explain again why it has to be recompiled,
by providing examples with your own version.

Denis



Re: translation of searchtmpl/search.wml

2003-01-20 Thread Tomohiro KUBOTA
Hi,

From: [EMAIL PROTECTED] (Denis Barbier)
Subject: Re: translation of searchtmpl/search.wml
Date: Sun, 19 Jan 2003 21:46:34 +0100

> At first glance it sounds very good, but I am not sure this is the way
> to go, because some strings are not handled by gettext, e.g. see
> Catalan strings in webwml/english/template/debian/ctime.wml
> There are also several Perl variables in date.pot, which will be
> displayed according to current locale, and not UTF-8.
> We could certainly play with CUR_LOCALE, but a simpler solution is
> to post-process HTML files with iconv and change their charset
> field in <meta> tags, see attached patch.

I think your idea is better than mine.  I also checked that your patch
works well, i.e., builds translated pages in UTF-8.


BTW, do you have any idea why Craig thinks as he does in the following mail?
Maybe I am missing something.
http://lists.debian.org/debian-www/2003/debian-www-200301/msg00271.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/




Re: translation of searchtmpl/search.wml

2003-01-19 Thread Denis Barbier
On Sun, Jan 19, 2003 at 11:45:17PM +0900, Tomohiro KUBOTA wrote:
[..]
> I found that target encoding of gettext is defined in
> webwml/english/template/debian/common_tags.wml .  Thus, the best way
> is to redefine CHARSET_WML and CHARSET variables before it.  (These
> variables are defined in webwml//.wmlrc files.)  The patch
> attached to this mail includes this modification.
> Translated search.wml files also need to be written in
> UTF-8.  The following patch includes a note on this point.

At first glance it sounds very good, but I am not sure this is the way
to go, because some strings are not handled by gettext, e.g. see
Catalan strings in webwml/english/template/debian/ctime.wml
There are also several Perl variables in date.pot, which will be
displayed according to current locale, and not UTF-8.
We could certainly play with CUR_LOCALE, but a simpler solution is
to post-process HTML files with iconv and change their charset
field in <meta> tags, see attached patch.
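
The post-processing idea (convert the generated HTML to UTF-8 and rewrite its declared charset) can be sketched in Python as below. This is not the attached patch, which uses iconv and sed from the Makefile; the names convert_to_utf8 and META_CHARSET, the regular expression, and the iso-8859-1 fallback are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the charset value inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'(<meta\b[^>]*?charset=["\']?)([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def convert_to_utf8(raw: bytes, default: str = "iso-8859-1") -> bytes:
    """Re-encode an HTML page as UTF-8 and update its declared charset."""
    match = META_CHARSET.search(raw)
    # Use the charset the page declares, or the assumed default if none.
    source = match.group(2).decode("ascii") if match else default
    utf8 = raw.decode(source).encode("utf-8")
    # Rewrite only the first charset declaration.
    return META_CHARSET.sub(rb"\g<1>UTF-8", utf8, count=1)


if __name__ == "__main__":
    page = (
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=iso-8859-1">\n<p>caf\xe9</p>'
    ).encode("iso-8859-1")
    print(convert_to_utf8(page).decode("utf-8"))
```

Because the conversion happens after the page is built, strings that never pass through gettext (such as the ctime.wml ones mentioned above) are converted along with everything else.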

> PS. The content negotiation
>   http://search.debian.org/new/index.en.cgi
>   http://search.debian.org/new/index.fr.cgi
> does not seem to work well.  Though I am afraid I may be wrong, how about
> renaming /org/search.debian.org/etc/search..htm to
> /org/search.debian.org/etc/index..htm ?  The source
> code of mnoGoSearch seems to substitute ".cgi" with ".htm" to
> search the configuration file (src/search.c).

No idea about this one.

Denis
Index: english/searchtmpl/Makefile
===
RCS file: /cvs/webwml/webwml/english/searchtmpl/Makefile,v
retrieving revision 1.5
diff -u -r1.5 Makefile
--- english/searchtmpl/Makefile 2 Nov 2002 23:36:01 -   1.5
+++ english/searchtmpl/Makefile 19 Jan 2003 20:45:33 -
@@ -10,8 +10,13 @@
 
 include $(WMLBASE)/Make.lang
 
+all:: search-convert
 
 search.$(LANGUAGE).html: search.wml $(ENGLISHSRCDIR)/searchtmpl/search.data \
   $(ENGLISHSRCDIR)/searchtmpl/search.def $(TEMPLDIR)/common_translation.wml \
   $(TEMPLDIR)/basic.wml $(TEMPLDIR)/languages.wml $(TEMPLDIR)/footer.wml \
   $(GETTEXTDEP)
+
+search-convert: search.$(LANGUAGE).html
+   @c=`grep '^' $? | sed -e 's/^/\1/'`; \
+  iconv -f $$c -t UTF-8 $? | sed -e 's///' > $?.tmp && mv $?.tmp $?
Index: english/searchtmpl/search.data
===
RCS file: /cvs/webwml/webwml/english/searchtmpl/search.data,v
retrieving revision 1.30
diff -u -r1.30 search.data
--- english/searchtmpl/search.data  30 Dec 2002 03:26:24 -  1.30
+++ english/searchtmpl/search.data  19 Jan 2003 20:45:33 -
@@ -62,7 +62,6 @@
 -->
 
 
-$(CHARSET=UTF-8) $(CHARSET_WML=UTF-8)
 #use wml::debian::common_translation HOME="http://www.debian.org";
 $(title=)
 #use wml::debian::languages