Re: [translate-pootle] search by phrase in Pootle

2008-11-27 Thread F Wolff
On Fri, 2008-11-28 at 02:18 +0100, Lars Kruse wrote:

...


> > > Would AND be the desired behaviour for the text search field?
> > >   
> > 
> > I think so. If you enter 'pootle server' in the search window, you
> > should only get results containing 'pootle server'.
> > AND was the default behavior in previous versions of Pootle.
> 
> I just committed revision 9020. It splits the search input into words
> (separated by whitespace) and appends each word to the "AND" query.
> I tested it with a Xapian engine and with Pootle and the toolkit at
> revision 8822 (rather than HEAD, because of some issues that I did not
> investigate at the time), but I assume it should work with HEAD, too.

Thank you for this, Lars. I guess things should work on trunk, but we
need to test and confirm at some stage. Do you consider this good enough
to backport to the 1.2 branch for the release of 1.2.1?

> 
> Just to make sure that I did not neglect anything: is a simple "split"
> call the right approach to separate words in a language-neutral way?
> (see line 1020 in Pootle/projects.py)

It is the best we can do without going into lots of work. The bigger
question is perhaps how Lucene and Xapian split/tokenise words. We
might want to get closer to that, rather than doing the 100% correct
thing.
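To make the split-versus-tokenise question concrete, here is a minimal Python sketch (not Pootle code) comparing a plain whitespace split with a hypothetical tokeniser that keeps only runs of word characters. The regex tokeniser is an assumption for illustration; it is not how Lucene or Xapian actually tokenise.

```python
import re

# Hypothetical alternative tokeniser: runs of Unicode word characters
# (letters, digits, underscore), so attached punctuation is dropped.
WORD_RE = re.compile(r"\w+")


def split_terms(text: str) -> list:
    """Whitespace split, as the thread describes for revision 9020."""
    return text.split()


def regex_terms(text: str) -> list:
    """Word-character tokens; an assumption, not Lucene's or Xapian's tokeniser."""
    return WORD_RE.findall(text)


query = "Pootle, server (admin)"
print(split_terms(query))  # ['Pootle,', 'server', '(admin)']
print(regex_terms(query))  # ['Pootle', 'server', 'admin']
```

With the plain split, "Pootle," and "(admin)" keep their punctuation and become query terms as-is. Neither function can find word boundaries in text written without spaces, such as Chinese: a query like 搜索短语 comes back as one single term either way.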


> Just to clarify the current behaviour of the search field:
> 1) Every word search is "partial" and case-insensitive, thus "poot" will
> find "Pootle".
> 2) Multiple words get split into single words. The single queries are
> partial, too. They are combined by "AND".
> 3) The order of multi-word queries does not matter: a search for "admin
> pootle" will return "Pootle Languages Admin Page" (and others).
> 4) Multiple-word input can be a mixture of source and target strings: a
> search for "remove sprache" will return "Remove Language", which is
> translated as "Sprache loeschen".
> 
> Do you think that this detailed description of the search processing
> would be suitable for the "searching" wiki page[1] of Pootle? If so, I
> could add it there.
> 
> 
> I would appreciate any comments!
> 
> regards,
> Lars
> 
> [1] http://translate.sourceforge.net/wiki/pootle/searching


I think we can definitely add it there. There are some clear
differences between the indexed search and the pogrep search, so it
would be good to document them well.

I think there is still another way to really improve things: we could
obtain possibly relevant results quickly from the indexer, and then use
a real GrepFilter to filter out the less relevant ones. This would get
the behaviour much closer to the non-indexed search, but still with a
good speedup, I think. Does this sound doable, Lars? Should we try to
do that for Pootle 1.2.1?
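A minimal Python sketch of this two-stage idea, under loudly stated assumptions: a toy inverted index with prefix matching stands in for the indexer, and the hypothetical grep_filter function stands in for the toolkit's GrepFilter (it does not use that class or its API). UNITS, tokens, build_index and candidates are likewise made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical units: id -> combined source/target text.
UNITS = {
    1: "Pootle server settings",
    2: "Server for Pootle",
    3: "Pootle Languages Admin Page",
}


def tokens(text):
    """Lower-cased runs of word characters (a stand-in for the indexer's tokeniser)."""
    return re.findall(r"\w+", text.lower())


def build_index(units):
    """Tiny inverted index: token -> set of unit ids containing it."""
    index = defaultdict(set)
    for uid, text in units.items():
        for tok in tokens(text):
            index[tok].add(uid)
    return index


def candidates(index, query):
    """Stage 1 (fast, approximate): ids whose tokens prefix-match every query word, in any order."""
    result = None
    for word in tokens(query):
        matched = set().union(*(ids for tok, ids in index.items() if tok.startswith(word)))
        result = matched if result is None else result & matched
    return result or set()


def grep_filter(units, ids, query):
    """Stage 2 (exact): keep only units containing the query as a case-insensitive phrase."""
    needle = query.lower()
    return sorted(uid for uid in ids if needle in units[uid].lower())


index = build_index(UNITS)
hits = candidates(index, "pootle server")
print(sorted(hits))                               # [1, 2]
print(grep_filter(UNITS, hits, "pootle server"))  # [1]
```

Stage 1 returns both units that contain the two words; only stage 2 insists on the whole string "pootle server", which is closer to a grep-style search while the index still does the expensive narrowing.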

Keep well
Friedel


--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/blurred-vision-beeld


-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle


Re: [translate-pootle] html2po with tidy

2008-11-27 Thread Amos Jeffries
Pål Eivind Jacobsen Nes wrote:
> Hi!
> 
> Without having seen your files, it sounds more like you are seeing the
> UTF-8 characters encoded into entities. I would blame tidy for this, and
> not Pootle.
> 
> Tell tidy to use UTF-8 for input and output. You're trying to squeeze
> two or more bytes (Å) into one (A), and end up using 6 (&#197;).
> 
> Don't use anything but UTF-8 for your localization projects! :D
> 
> US-ASCII contains only 128 characters, with all letters from the English 
> alphabet. Unicode (UTF-8) currently supports more than 100,000 characters.
> 
> Also, UTF-8 is a superset of ASCII: an ASCII string is a valid UTF-8
> string, so you don't need to convert ASCII into UTF-8.
> 
> 
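Pål's byte counts can be checked in a few lines of Python. This is a minimal sketch, not tidy's behaviour: it encodes Å as UTF-8 and then forces it into ASCII with Python's xmlcharrefreplace error handler, which writes a numeric character reference. tidy's own escaping choices may differ in detail, so this only illustrates the size difference and why the output looked garbled.

```python
import html

text = "Å"  # U+00C5, LATIN CAPITAL LETTER A WITH RING ABOVE

utf8_bytes = text.encode("utf-8")
print(utf8_bytes, len(utf8_bytes))    # b'\xc3\x85' 2

# Forcing the character into ASCII: it has no ASCII byte, so it is escaped
# as a numeric character reference instead.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes, len(ascii_bytes))  # b'&#197;' 6

# The reference can be turned back into the real character...
assert html.unescape(ascii_bytes.decode("ascii")) == text

# ...and pure ASCII bytes are already valid UTF-8, so no conversion is needed.
assert b"Pootle".decode("ascii") == b"Pootle".decode("utf-8")
```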

Thank you, yes, that was the problem.
We had to fully uninstall tidy from the machine in question.
While we are able to do this now, it may not always be the case.

IMO it would be a great addition to have an option in html2po to skip
tidy on output even if it's installed.

AYJ

> - Pål
> 
> Amos Jeffries wrote:
>> Greetings, I'm running the Squid Proxy translation project using
>> Pootle and the toolkit to do the hard yards.
>>
>> We've encountered a serious problem with the way Pootle 1.2 html2po 
>> and tidy are interacting.
>>
>> The .html templates and .po files are in UTF-8 format. The files appear
>> to translate correctly, but after the final filter through tidy they
>> come out with a lot of garbage characters, which I guess are non-ASCII
>> characters encoded into HTML entities for the us-ascii code page, and
>> the .html files are labeled with content-type=us-ascii.
>>
>> AYJ
>>
>> -
>> This SF.Net email is sponsored by the Moblin Your Move Developer's 
>> challenge
>> Build the coolest Linux based applications with Moblin SDK & win great 
>> prizes
>> Grand prize is a trip for two to an Open Source event anywhere in the 
>> world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> ___
>> Translate-pootle mailing list
>> Translate-pootle@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/translate-pootle
> 


-- 
Please be using
   Current Stable Squid 2.7.STABLE5 or 3.0.STABLE10
   Current Beta Squid 3.1.0.2



Re: [translate-pootle] search by phrase in Pootle

2008-11-27 Thread Lars Kruse
Hi,

> I installed Pootle 1.2 and the code seems different (the first
> make_query is on line 914).
> I changed the parameter of the first make_query in indexsearch to "True".
> However, it didn't work well: the result is neither OR nor AND. After
> the change, I only got strings which contain 'pootle'.

My fault: I did not have a test setup at hand, so my guess was wrong...


> > Would AND be the desired behaviour for the text search field?
> >   
> 
> I think so. If you enter 'pootle server' in the search window, you
> should only get results containing 'pootle server'.
> AND was the default behavior in previous versions of Pootle.

I just committed revision 9020. It splits the search input into words (separated
by whitespace) and appends each word to the "AND" query.
I tested it with a Xapian engine and with Pootle and the toolkit at revision
8822 (rather than HEAD, because of some issues that I did not investigate at
the time), but I assume it should work with HEAD, too.
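The query construction described here can be sketched in a few lines of Python. This is only an illustration under assumptions: Term, And and build_query are hypothetical names, not Pootle's or the toolkit's classes, and the sketch models the query as a plain data structure instead of calling a search engine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """Hypothetical query node: one word, matched partially and case-insensitively."""
    word: str


@dataclass(frozen=True)
class And:
    """Hypothetical query node: every sub-query must match."""
    parts: Tuple[Term, ...]


def build_query(search_text: str) -> And:
    """Split the input on whitespace and put every word into one AND query."""
    return And(tuple(Term(word) for word in search_text.split()))


print(build_query("pootle server"))
# And(parts=(Term(word='pootle'), Term(word='server')))
```

For comparison, the Xapian Python bindings can combine sub-queries with xapian.Query(xapian.Query.OP_AND, subqueries); whether Pootle's make_query wrapper maps onto exactly that is not something this sketch checks.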

Just to make sure that I did not neglect anything: is a simple "split" call the
right approach to separate words in a language-neutral way?
(see line 1020 in Pootle/projects.py)


Just to clarify the current behaviour of the search field:
1) Every word search is "partial" and case-insensitive, thus "poot" will find
"Pootle".
2) Multiple words get split into single words. The single queries are
partial, too. They are combined by "AND".
3) The order of multi-word queries does not matter: a search for "admin pootle"
will return "Pootle Languages Admin Page" (and others).
4) Multiple-word input can be a mixture of source and target strings: a search
for "remove sprache" will return "Remove Language", which is translated as
"Sprache loeschen".

Do you think that this detailed description of the search processing would be
suitable for the "searching" wiki page[1] of Pootle? If so, I could add it
there.
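The four search rules listed in this message can be approximated by a short Python predicate. This is a sketch, not Pootle's code: unit_matches is a hypothetical name, and plain substring matching is an assumption (the indexer may only match word prefixes, which gives the same results for these examples).

```python
def unit_matches(source: str, target: str, search_text: str) -> bool:
    """Approximate the described search semantics for one unit.

    Every whitespace-separated word must occur (AND) as a case-insensitive
    substring (partial match) of the source or the target, in any order.
    """
    # The newline keeps a word from matching across the source/target
    # boundary; words never contain whitespace because of split().
    haystack = (source + "\n" + target).lower()
    return all(word.lower() in haystack for word in search_text.split())


print(unit_matches("Pootle", "", "poot"))                                     # True
print(unit_matches("Pootle Languages Admin Page", "", "admin pootle"))        # True
print(unit_matches("Remove Language", "Sprache loeschen", "remove sprache"))  # True
print(unit_matches("Pootle", "", "pootle server"))                            # False
```

Note that an empty search string matches every unit here, since all() over no words is True.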


I would appreciate any comments!

regards,
Lars

[1] http://translate.sourceforge.net/wiki/pootle/searching
