Re: [translate-pootle] search by phrase in Pootle
On Fri, 2008-11-28 at 02:18 +0100, Lars Kruse wrote:
...
> > > Would AND be the desired behaviour for the text search field?
> >
> > I think so, if you input 'pootle server' in search window, you should
> > only get the result containing 'pootle server'.
> > AND was default behavior in previous versions of Pootle.
>
> I just committed revision 9020. It splits the search input into words
> (separated by whitespace) and appends each word to the "AND" query.
> I tested it with a Xapian engine and with Pootle and the toolkit at revision
> 8822 due to some issues that I did not investigate at that moment. But I
> assume it should work well with HEAD, too.

Thank you for this, Lars. I guess things should work on trunk, but we need to test and confirm at some stage. Do you consider this good enough to backport to the 1.2 branch for the release of 1.2.1?

> Just to make sure that I did not neglect anything: is a simple "split" call
> the right approach to separate words in a language neutral way?
> (see line 1020 in Pootle/projects.py)

It is the best we can do without going into lots of work. The bigger question is perhaps how Lucene and Xapian split/tokenise words. We might want to get closer to that, rather than doing the 100% correct thing.

> Just to clarify the current behaviour of the search field:
> 1) every word search is "partial" and case-insensitive - thus "poot" will find
>    "Pootle"
> 2) Multiple words get split into single words. The single queries are
>    partial, too. They are combined by "AND".
> 3) The order of multi-word queries does not matter: a search for "admin
>    pootle" will return "Pootle Languages Admin Page" (and others).
> 4) Multiple word input can be a mixture of source and target strings: a search
>    for "remove sprache" will return "Remove Language" which is translated to
>    "Sprache loeschen"
>
> Do you think that this detailed description of the search processing would be
> suitable for the "searching" wiki page[1] of Pootle? Then I could add it
> there ...
>
> I would appreciate any comments!
>
> regards,
> Lars
>
> [1] http://translate.sourceforge.net/wiki/pootle/searching

I think we can definitely add it there. There are some definite differences between the indexed search and the pogrep search, so it would be good to document them well.

I think there is still another way to really improve things: we could obtain possibly relevant results quickly from the indexer and then use a real GrepFilter to filter out the less relevant ones. This would get the behaviour much closer to the non-indexed search, but still with a good speedup, I think.

Does this sound doable, Lars? Should we try to do that for Pootle 1.2.1?

Keep well
Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/blurred-vision-beeld

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle
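A minimal sketch of the two-stage idea from the message above: fetch possibly relevant units quickly from an index, then run a strict check over only those candidates. Everything in it is a stand-in for illustration. `build_index`, `candidates`, `grep_like_filter` and the sample strings are hypothetical names and data, not the toolkit's indexing API and not the real GrepFilter from pogrep, and the strict check is assumed to be a case-insensitive phrase match.

```python
import re


def build_index(units):
    """Toy inverted index: lowercased word -> set of unit positions."""
    index = {}
    for pos, text in enumerate(units):
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(pos)
    return index


def candidates(index, query):
    """Stage 1: loose lookup. A unit is a candidate if any of its
    indexed words starts with any of the query words."""
    found = set()
    for term in query.lower().split():
        for word, positions in index.items():
            if word.startswith(term):
                found |= positions
    return found


def grep_like_filter(text, query):
    """Stage 2: strict check (assumed here: case-insensitive phrase)."""
    return query.lower() in text.lower()


def search(units, index, query):
    """Run the strict check only on the candidates from the index."""
    return [units[pos] for pos in sorted(candidates(index, query))
            if grep_like_filter(units[pos], query)]


# Hypothetical sample strings.
units = ["Pootle server settings", "Server for Pootle", "Translation memory"]
index = build_index(units)
print(search(units, index, "pootle server"))
```

With these sample strings the index returns the first two units as candidates, and the strict pass keeps only the one that contains the phrase 'pootle server'. The toy index tokenises on word characters only, so a query containing punctuation could miss units that a plain grep would find; a real first stage would need to be at least as inclusive as the filter that follows it.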
Re: [translate-pootle] html2po with tidy
Pål Eivind Jacobsen Nes wrote:
> Hi!
>
> Without having seen your files, it sounds more like you are seeing the
> UTF-8 characters encoded into entities. I would blame tidy for this, and
> not Pootle.
>
> Tell tidy to use UTF-8 for input and output. You're trying to squeeze
> two or more bytes (Å) into one (A), and ending up using 6 (Å).
>
> Don't use anything but UTF-8 for your localization projects! :D
>
> US-ASCII contains only 128 characters, with all letters from the English
> alphabet. Unicode (UTF-8) currently supports more than 100,000 characters.
>
> Also, UTF-8 is a superset of ASCII, so an ASCII string is a valid UTF-8
> string; you don't need to convert ASCII into UTF-8.
>

Thank you, yes, that was the problem. We had to fully uninstall tidy from the machine in question.

While we are able to do this now, it may not always be the case. IMO it would be a great addition to have an option in html2po to skip tidy on output even if it's installed.

AYJ

> - Pål
>
> Amos Jeffries wrote:
>> Greetings, I'm running the Squid Proxy translation project using
>> pootle and the toolkit to do the hard yards.
>>
>> We've encountered a serious problem with the way Pootle 1.2 html2po
>> and tidy are interacting.
>>
>> The .html templates and .po are in utf-8 format. The files appear to
>> translate correctly. But after the final filter through tidy they come
>> out with a lot of garbage characters from what I guess is the us-ascii
>> codepages encoded into HTML entities and the .html are labeled with
>> content-type=us-ascii.
>>
>> AYJ

--
Please be using
  Current Stable Squid 2.7.STABLE5 or 3.0.STABLE10
  Current Beta Squid 3.1.0.2
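The byte counts mentioned in the quoted advice can be checked with a short Python sketch. It uses only the standard `str.encode` method; the `xmlcharrefreplace` error handler stands in for an ASCII-only output step that falls back to numeric character references. Whether tidy emits exactly this form depends on its configuration, so treat that part as an assumption. The sample string "Squid" is arbitrary.

```python
text = "Å"

# UTF-8 needs two bytes for this character.
utf8 = text.encode("utf-8")

# An ASCII-only encoder cannot represent it directly; with this
# standard error handler it falls back to a numeric character reference.
entity = text.encode("ascii", errors="xmlcharrefreplace")

print(len(utf8), utf8)
print(len(entity), entity)

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8,
# so no conversion is needed for it.
ascii_only = "Squid"
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")
print(same_bytes)
```

The reference produced here is six bytes long, which matches the "ending up using 6" remark in the quoted message.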
Re: [translate-pootle] search by phrase in Pootle
Hi,

> I installed Pootle 1.2 and the code line seems different. (the first
> make_query is located on the line 914)
> I changed the parameter of first make_query in indexsearch to "True".
> However it didn't work well. The result is neither OR nor AND. After
> changing, I only got strings which contain 'pootle'.

My fault - I did not have a test setup at hand, thus my guessing went wrong ...

> > Would AND be the desired behaviour for the text search field?
> >
> I think so, if you input 'pootle server' in search window, you should
> only get the result containing 'pootle server'.
> AND was default behavior in previous versions of Pootle.

I just committed revision 9020. It splits the search input into words (separated by whitespace) and appends each word to the "AND" query.
I tested it with a Xapian engine and with Pootle and the toolkit at revision 8822 due to some issues that I did not investigate at that moment. But I assume it should work well with HEAD, too.

Just to make sure that I did not neglect anything: is a simple "split" call the right approach to separate words in a language neutral way?
(see line 1020 in Pootle/projects.py)

Just to clarify the current behaviour of the search field:
1) every word search is "partial" and case-insensitive - thus "poot" will find "Pootle"
2) Multiple words get split into single words. The single queries are partial, too. They are combined by "AND".
3) The order of multi-word queries does not matter: a search for "admin pootle" will return "Pootle Languages Admin Page" (and others).
4) Multiple word input can be a mixture of source and target strings: a search for "remove sprache" will return "Remove Language" which is translated to "Sprache loeschen"

Do you think that this detailed description of the search processing would be suitable for the "searching" wiki page[1] of Pootle? Then I could add it there ...

I would appreciate any comments!

regards,
Lars

[1] http://translate.sourceforge.net/wiki/pootle/searching
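The four behaviours listed in the message above can be modelled in a few lines of Python. This is only a sketch of the described semantics, not the code from revision 9020: in Pootle the matching is done by the indexing engine, while here a plain substring test stands in for "partial" matching (the message only shows a prefix example, so substring matching is an assumption). `matches`, `search`, the `units` list and the German target of the first unit are made up for illustration.

```python
def matches(query, source, target):
    """True if every whitespace-separated query word occurs,
    case-insensitively and as a substring, in source or target."""
    haystack = (source + "\n" + target).lower()
    words = query.lower().split()  # same simple split on whitespace
    return all(word in haystack for word in words)


# Hypothetical (source, target) sample units.
units = [
    ("Pootle Languages Admin Page", "Pootle-Sprachverwaltung"),
    ("Remove Language", "Sprache loeschen"),
    ("Login", "Anmelden"),
]


def search(query):
    """Return the source strings of all units matching the query."""
    return [src for src, tgt in units if matches(query, src, tgt)]


print(search("poot"))            # 1) partial, case-insensitive
print(search("admin pootle"))    # 2) + 3) AND of words, order ignored
print(search("remove sprache"))  # 4) words spread over source and target
```

Note that an empty query matches every unit in this model, because `all()` over an empty list of words is true; a real search form would probably want to reject that case.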