Hi Alan,
I have been assigned issue 51772, "Quotes in Hebrew word-breaking
don't work during spellcheck". I will work with the Writer team to fix
it in the next release.
Thanks,
Karl.
On 2008-11-07 02:48, Alan Yaniger wrote:
Hi Karl,
Thanks for your help and your patience. Your solution fixes
word-breaking when I move between words with Ctrl-Arrow Left and
Ctrl-Arrow Right: the word L"HZ is treated as one word in this
respect. For spellchecking, however, it is still split into two
words at the quote mark. I was able to overcome this only with the
following patch to the function endOfScript():
--- breakiteratorImpl.cxx 14 Aug 2008 16:27:31 -0000 1.27.8.1
+++ breakiteratorImpl.cxx 7 Nov 2008 10:39:26 -0000
@@ -301,7 +301,8 @@
     sal_Int32 strLen = Text.getLength();
     sal_uInt32 ch=0;
     while(iterateCodePoints(Text, nStartPos, 1, ch) < strLen ) {
-        sal_Int16 currentCharScriptType = getScriptClass(ch);
+        sal_Int16 currentCharScriptType = ch == 0x22 ?
+            ScriptType::WEAK : getScriptClass(ch);
         if(ScriptType != currentCharScriptType && currentCharScriptType != ScriptType::WEAK)
             break;
     }
It seems that the quotation mark has script type LATIN, and for
spellchecking, the script change from COMPLEX to LATIN overrides the
word-breaking rule I've set in data/dict_word_he.txt.
Do you agree with my analysis?
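To make the effect concrete, here is a minimal, self-contained sketch of the script-run scan. It is not the OOo code: the Script enum, scriptOf() and endOfRun() are hypothetical stand-ins for ScriptType, getScriptClass() and endOfScript(), and the classifier is deliberately a toy (Hebrew letters are Complex, spaces are Weak, everything else is Latin). The quoteIsWeak flag mimics the patch above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the ScriptType constants used in the patch.
enum class Script { Latin, Complex, Weak };

// Toy classifier (assumption, not getScriptClass()): Hebrew letters are
// Complex, a space is Weak, and U+0022 is Weak only when quoteIsWeak is set.
Script scriptOf(char32_t ch, bool quoteIsWeak) {
    if (ch >= 0x05D0 && ch <= 0x05EA) return Script::Complex;
    if (ch == U' ') return Script::Weak;
    if (ch == 0x22 && quoteIsWeak) return Script::Weak;
    return Script::Latin;
}

// Returns the index one past the end of the script run that starts at
// `start`. Weak characters never end a run, as in the patched loop.
std::size_t endOfRun(const std::u32string& text, std::size_t start,
                     bool quoteIsWeak) {
    if (start >= text.size()) return text.size();
    const Script runScript = scriptOf(text[start], quoteIsWeak);
    std::size_t pos = start + 1;
    while (pos < text.size()) {
        const Script cur = scriptOf(text[pos], quoteIsWeak);
        if (cur != runScript && cur != Script::Weak) break;
        ++pos;
    }
    return pos;
}
```

For the four code points lamed, quotation mark, he, zayin (the Hebrew word written above as L"HZ), endOfRun(text, 0, false) stops at index 1, right before the quote, while endOfRun(text, 0, true) runs to index 4 and keeps the word in one script run.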
By the way, I was able to get my desired results without having to
change data/dict_word_he.txt, since dict_word_he.txt already alters
the rule for $MidLetter to include a quotation mark, as follows:
35c31
< $MidLetter = [[:name = APOSTROPHE:] [:name = GRAVE ACCENT:] \u0084 [:name = SOFT HYPHEN:] [:name = MIDDLE DOT:] [:name = GREEK TONOS:] [:name= FULL STOP:]
---
> $MidLetter = [[:name = QUOTATION MARK:] [:name = APOSTROPHE:] [:name = GRAVE ACCENT:] \u0084 [:name = SOFT HYPHEN:] [:name = MIDDLE DOT:] [:name = GREEK TONOS:] [:name= FULL STOP:]
37,39c33
In fact, even before the change I tried in dict_word_he.txt, moving
between words containing quotes in Hebrew was working. The problem was
only with spellchecking.
My code is kind of a hack, though, and if I'm correct, I'd appreciate
your input about the best way to code a fix.
Thanks,
Alan
Karl Hong wrote:
Hi Alan,
You also need to remove the Hebrew script from $ALetter; otherwise
another, more general rule may take precedence for Hebrew words.
$Hebrew = [:Script = Hebrew:];
$DoubleQuote = \u0022;
$Hebrew+ $DoubleQuote $Hebrew+;
$ALetter = [\u0002 [:Alphabetic:] [:name= COMMERCIAL AT:] [:name= HEBREW PUNCTUATION GERESH:]
- $Ideographic
- $Katakana
- $Hangul
- $Hebrew
- [:Script = Thai:]
- [:Script = Lao:]
- [:Script = Hiragana:]];
Regards,
Karl.
On 2008-11-06 11:44, Alan Yaniger wrote:
Hi Karl,
I've added my rule to dict_word_he.txt and rebuilt, but there is still
no change. Is the syntax of my rule OK?
Alan
Karl Hong wrote:
Hi Alan,
There is a dict_word_he.txt in the directory, and it is the file used
for Hebrew text. You need to add the rule in that file.
Thanks,
Karl.
On 2008-11-06 11:15, Alan Yaniger wrote:
Hi Karl,
Thanks for your help, but I'm still having difficulty.
I've added the following at the end of data/dict_word.txt:
$HebrewLetter = [\u05d0-\u05ea];
$DoubleQuote = \u0022;
$HebrewLetter+ $DoubleQuote $HebrewLetter+;
but it doesn't have any effect. If I have a word like L"HZ, the
spell checker still marks HZ as a separate word. Is there
something wrong with my syntax?
Alan
Karl Hong wrote:
Hi Alan,
I would suggest you write a rule in data/dict_word.txt, something
like
<hebrew_letter>+<quotation_mark><hebrew_letter>+;
It means a Hebrew word is one or more Hebrew letters, followed
by a quotation mark, followed by one or more Hebrew letters.
For the rule syntax, check the ICU user guide:
http://icu-project.org/userguide/boundaryAnalysis.html
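To illustrate what such a rule is meant to accept, here is a small hand-written matcher. It is only a sketch and does not use ICU: isHebrewLetter() and matchQuotedHebrewWord() are hypothetical names, and the letter range U+05D0..U+05EA is an assumption about what counts as a Hebrew letter.

```cpp
#include <cstddef>
#include <string>

// Assumption: "Hebrew letter" means alef (U+05D0) through tav (U+05EA).
bool isHebrewLetter(char32_t ch) { return ch >= 0x05D0 && ch <= 0x05EA; }

// Returns the length of a match of  letter+ '"' letter+  beginning at
// `start`, or 0 when the text at `start` does not have that shape.
std::size_t matchQuotedHebrewWord(const std::u32string& text,
                                  std::size_t start) {
    std::size_t pos = start;
    // One or more Hebrew letters.
    while (pos < text.size() && isHebrewLetter(text[pos])) ++pos;
    if (pos == start) return 0;
    // Exactly one quotation mark (U+0022).
    if (pos >= text.size() || text[pos] != 0x22) return 0;
    const std::size_t afterQuote = ++pos;
    // One or more Hebrew letters again.
    while (pos < text.size() && isHebrewLetter(text[pos])) ++pos;
    if (pos == afterQuote) return 0;
    return pos - start;
}
```

A word such as lamed, quotation mark, he, zayin matches with length 4. A word with a trailing quote, or with no quote at all, returns 0 and would fall through to the general rules.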
Regards,
Karl.
On 2008-11-05 11:11, Alan Yaniger wrote:
Hi Karl,
I'm trying to address issue 51772. In Hebrew, single and double
quotes are used within a word to mark the sound "j" and acronyms,
respectively. At present, they are treated as word breaks during
spellchecking, because their script type is not COMPLEX but LATIN.
endOfScript() treats this script change as a word break, but in
Hebrew it isn't one. I'd like double quotes within a word not to be
treated as a word break, while still treating them as a word break
when they are at the beginning or end of a word (preceded or
followed by whitespace, or at the beginning or end of a
paragraph).
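The distinction I have in mind can be sketched as a small predicate. This is only an illustration under assumptions, not existing OOo code: isHebrewLetter() and isIntraWordQuote() are hypothetical names, and Hebrew letters are taken to be U+05D0..U+05EA.

```cpp
#include <cstddef>
#include <string>

// Assumption: "Hebrew letter" means alef (U+05D0) through tav (U+05EA).
bool isHebrewLetter(char32_t ch) { return ch >= 0x05D0 && ch <= 0x05EA; }

// True only when text[pos] is a double quote (U+0022) with a Hebrew
// letter on both sides. A quote at the start or end of the text, or
// next to whitespace, is not intra-word and should remain a word break.
bool isIntraWordQuote(const std::u32string& text, std::size_t pos) {
    if (pos >= text.size() || text[pos] != 0x22) return false;
    if (pos == 0 || pos + 1 >= text.size()) return false;
    return isHebrewLetter(text[pos - 1]) && isHebrewLetter(text[pos + 1]);
}
```

With this check, the quote inside an acronym counts as part of the word, while the opening and closing quotes around a quoted Hebrew word do not.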
Alan
Karl Hong wrote:
Hi Alan,
The ScriptType breakiterator is not controlled by language, but by
the Unicode script type definition. Unlike the
character/word/sentence/line breakiterators, which can be
customized by language, there is only one script type breakiterator
for all languages.
What would you like to do with endOfScript for Hebrew exactly?
Regards,
Karl.
On 2008-11-05 06:41, Mathias Bauer wrote:
Hi Alan,
Alan Yaniger wrote:
Hi list-members,
For Hebrew text, I would like to override the
BreakIteratorImpl::endOfScript() function.
I tried:
- writing a Breakiterator_he class (with hxx and cxx files),
- adding it to the SLOFILES section of makefile.mk,
- adding it to the instances array in registerservices.cxx,
- rebuilding OOo.
But it's still not getting called from the Writer code I'm
testing it with.
What should I do differently?
In case you don't get an answer here, I think you should try
repeating your question on the sw dev-list.
Regards,
Mathias
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]