Hi,
I just stumbled over the same problem and then found that you are
already some steps ahead.
The question is what to do with a word that is not in the dictionary. There are
2 possible reasons:
1. the word is misspelled
2. the dictionary is incomplete
So maybe Lars' aim to improve the Swedish dictionary can be combined
with spell checking.
First, there has to be a list of the words not recognized by the OOo
spell checker. Each word is on this list for one of the two reasons
above, and the list has to be split into separate lists because the two
cases are handled differently.
Using a 'trusted' dictionary could ease this job: words that it accepts
but the OOo dictionary rejects are probably just missing from the OOo
dictionary. The others would have to be analyzed by hand, I guess.
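The split could look roughly like this. It is only a sketch: the file
names (unknown.txt, trusted.txt and the two output files) are made up,
and I assume both inputs are plain text with one word per line.

```shell
# Split the words rejected by the OOo spell checker into two lists.
#   $1: file with the unrecognized words, one per line (hypothetical name)
#   $2: word list of the trusted dictionary, one per line (hypothetical name)
split_unknown_words() {
    unknown=$1
    trusted=$2
    # Known to the trusted dictionary: probably missing from the OOo dictionary.
    # grep exits with 1 when nothing matches, which is not an error here.
    grep -Fxf "$trusted" "$unknown" > missing_from_dictionary.txt || [ $? -eq 1 ]
    # Unknown to both: possibly misspelled, to be reviewed by hand.
    grep -Fvxf "$trusted" "$unknown" > needs_review.txt || [ $? -eq 1 ]
}
```

Called as `split_unknown_words unknown.txt trusted.txt`, it writes both
result files into the current directory. The match is on whole lines
and case-sensitive, so inflected forms or capitalized words would still
end up in the list for manual review.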
What I don't have a clue about at the moment is how to get the
red-underlined words out of the *.odt document again.
Is the status saved, or do we have to run a script first to add a marker?
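As far as I can tell the red underlining is not stored in the file, so
one workaround would be to pull the plain words out of the document and
feed them to a spell checker outside of OOo. A rough sketch, assuming
the text sits in content.xml inside the *.odt zip archive; the function
name is made up:

```shell
# Read XML on stdin and print every distinct whitespace-separated word
# on its own line.
xml_to_words() {
    sed 's/<[^>]*>/ /g' |          # replace every tag with a space
        tr -s '[:space:]' '\n' |   # one word per line
        sed '/^$/d' |              # drop empty lines
        sort -u                    # unique words only
}
```

For a real document this would be something like
`unzip -p document.odt content.xml | xml_to_words > words.txt`.
Punctuation stays attached to the words and XML entities such as &amp;
are not decoded, so the result still needs some cleaning.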
By the way, if you have not only a source tarball but also a solver
(tarball or built yourself), there is a tool to extract an sdf file
containing one or more languages from all localize.sdf files. In
addition, it can also extract the source languages, which are de and en-US.
It's called localize.
To extract all French strings, simply call
> localize -e -l fr -f all_fr.sdf
(well, you need to run configure first)
Regards,
Gregor
Marcin Miłkowski wrote:
Hi Lars,
Here is another way:
find OOE680_m6 -name '*sdf' | xargs cat |
awk '-F\t' '$10=="sv"{print $11}' | sed 's/~//g;s/\\[nt]/ /g'
Apparently, a tilde precedes the underlined shortcut letter in menus
(E~xport), and the texts contain \n for newlines.
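To make the effect visible, here is the same filter wrapped in a small
function and fed one made-up line. The field positions (language in
field 10, text in field 11) are taken from the command above; the
function name and the sample line are invented for illustration.

```shell
# Read sdf lines on stdin and print the cleaned text of one language.
#   $1: language code to select, e.g. sv
extract_strings() {
    awk -F'\t' -v lang="$1" '$10 == lang { print $11 }' |
        sed 's/~//g; s/\\[nt]/ /g'   # drop tildes, turn literal \n and \t into spaces
}

# Made-up sample line: fields 1-9 are placeholders, field 10 is the
# language, field 11 is the text with a tilde and a literal \n.
printf 'f1\tf2\tf3\tf4\tf5\tf6\tf7\tf8\tf9\tsv\tE~xportera\\nfil\n' |
    extract_strings sv
```

The sample call prints the text without the tilde and with a space in
place of the \n sequence.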
I was removing tildes etc. from the result file, but I kept them in
the source txt file - because you need to find the faulty source
segment, and without knowing the tilde and the rest of "formatting
garbage", you cannot pin down the right sdf. Of course, we could
include the ID and set its style to "no language" in ODF (translating
to ODF is easy in this case, and could be done with awk and zip), but
I started with something much simpler.
In the future, I think that a simple style tagger should be used. Let
me explain: there should be a special "no language" style for help tags
etc. so that they would not be checked. Most translation tools support
such things: for example, the free TortoiseTagger for Word does it, as do
OmegaT, MemoQ and Across (all free and/or open source), not to mention
enlasotools (a dedicated filter set), but probably awk would be enough
even for XML tagging in the help file. So two files would be needed: a
complete text file (probably with some additional info like IDs), and
a tagged ODF file for spell and grammar checking.
I haven't started working on that yet, as the schedule is
unrealistically tight for additional translation QA _before_ release
and _after_ integrating the translation. My idea was born out of the
fact that Polish translations had broken characters in the latest builds
just because of some faulty conversion to UTF-8, and that would be
detected automatically using spell-check. So this should be a step in
testing before the release, and after integrating the localized strings.
See my proposal:
http://wiki.services.openoffice.org/wiki/Automating_Translation_QA
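Even before any spell checking, a cheaper test for the broken characters
mentioned above might be to check that the extracted strings are valid
UTF-8 at all. This is a different check, not the spell-check approach
itself, and it is only a sketch: the function name is made up, and it
only catches byte sequences that are not valid UTF-8. Text that was
converted twice is still valid UTF-8 and would slip through, so the
spell check would still be needed for that.

```shell
# Succeed if the file decodes cleanly as UTF-8, fail otherwise.
#   $1: file to check
is_valid_utf8() {
    # iconv stops with a non-zero exit status on the first invalid sequence.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

With a hypothetical all_pl.sdf it could be used as
`is_valid_utf8 all_pl.sdf || echo "all_pl.sdf: broken UTF-8"`.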
I have no experience with the tools used in translation. Is anything
like Alchemy Catalyst available as free software? Could such
functionality be built into future releases of OpenOffice? I would
think that OpenOffice has many users who are translators, especially
since the software is adopted in poorer countries where all kinds of
languages are spoken.
Catalyst is free but only in a very restricted version (no way to
create new projects). But it's only one of the tools that's available,
as I mentioned above. Anyway, these tests are quite trivial to
implement using sed, awk and other standard Unix tools which can run
happily in Win32 using cygwin.
Regards,
Marcin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]