Hi Alan,

I have been assigned issue 51772, "Quotes in Hebrew word breaking don't work during spellcheck". I will work with the Writer team to fix it in the next release.

Thanks,
Karl.

On 2008年11月07日 02:48, Alan Yaniger wrote:
Hi Karl,

Thanks for your help and your patience. Your solution helps with word-breaking when I travel between words with Ctrl-Arrow Left and Ctrl-Arrow Right: the word L"HZ is treated as one word in this respect. In spellchecking, however, it is still split into two words at the quote mark. I was able to overcome this only with the following patch to the function endOfScript():

--- breakiteratorImpl.cxx       14 Aug 2008 16:27:31 -0000      1.27.8.1
+++ breakiteratorImpl.cxx       7 Nov 2008 10:39:26 -0000
@@ -301,7 +301,8 @@
         sal_Int32 strLen = Text.getLength();
         sal_uInt32 ch=0;
         while(iterateCodePoints(Text, nStartPos, 1, ch) < strLen ) {
-            sal_Int16 currentCharScriptType = getScriptClass(ch);
+            sal_Int16 currentCharScriptType = ch == 0x22 ? ScriptType::WEAK : getScriptClass(ch);
             if(ScriptType != currentCharScriptType && currentCharScriptType != ScriptType::WEAK)
                 break;
         }

It seems that the quotation mark has script type LATIN and that, for spellchecking, the script change from COMPLEX to LATIN overrides the word-breaking rule I've set in data/dict_word_he.txt.
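To make this analysis concrete, here is a minimal standalone sketch of the scan. It is not the actual OOo code: `Script`, `toyScriptClass` and `toyEndOfScript` are hypothetical names, and the toy classifier simply assumes (as observed above) that U+0022 is classified as Latin. The `quoteIsWeak` flag mimics the effect of the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified script classes (not the real ScriptType constants).
enum class Script { Latin, Complex, Weak };

// Toy classifier, an assumption for illustration only: the Hebrew block is
// Complex, a space is Weak, and everything else (including U+0022) is Latin.
Script toyScriptClass(char32_t ch) {
    if (ch >= 0x0590 && ch <= 0x05FF) return Script::Complex;
    if (ch == U' ') return Script::Weak;
    return Script::Latin;
}

// Returns the index one past the run of `script` that starts at `start`.
// Weak characters never end the run. When `quoteIsWeak` is true, U+0022 is
// treated as Weak, which is what the patch above does.
std::size_t toyEndOfScript(const std::u32string& text, std::size_t start,
                           Script script, bool quoteIsWeak) {
    std::size_t pos = start;
    while (pos < text.size()) {
        char32_t ch = text[pos];
        Script cur = (quoteIsWeak && ch == 0x22) ? Script::Weak
                                                 : toyScriptClass(ch);
        if (cur != script && cur != Script::Weak) break;  // script change ends the run
        ++pos;
    }
    return pos;
}
```

For a four-code-point word such as lamed, quotation mark, he, zayin, the unpatched scan stops at the quotation mark, while the scan with `quoteIsWeak` set runs to the end of the word. Under these assumptions, that would explain why the spellchecker sees two words.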

Do you agree with my analysis?

By the way, I was able to get my desired results without having to change the rules in data/dict_word_he.txt, since dict_word_he.txt had already altered the rule for $MidLetter to include a quotation mark, as follows:

35c31
< $MidLetter = [[:name = APOSTROPHE:] [:name = GRAVE ACCENT:] \u0084 [:name = SOFT HYPHEN:] [:name = MIDDLE DOT:] [:name = GREEK TONOS:] [:name= FULL STOP:]
---
> $MidLetter = [[:name = QUOTATION MARK:] [:name = APOSTROPHE:] [:name = GRAVE ACCENT:] \u0084 [:name = SOFT HYPHEN:] [:name = MIDDLE DOT:] [:name = GREEK TONOS:] [:name= FULL STOP:]
37,39c33

In fact, even before the change I tried making to dict_word_he.txt, traveling between words containing quotes in Hebrew was working. The problem was only with spellchecking.

My code is kind of a hack, though, and if I'm correct, I'd appreciate your input about the best way to code a fix.

Thanks,
Alan

Karl Hong wrote:
Hi Alan,

You also need to remove the Hebrew script from $ALetter; otherwise another, more general rule may take over for Hebrew words.

$Hebrew = [:Script = Hebrew:];
$DoubleQuote = \u0022;
$Hebrew+ $DoubleQuote $Hebrew+;

$ALetter = [\u0002 [:Alphabetic:] [:name= COMMERCIAL AT:] [:name= HEBREW PUNCTUATION GERESH:]
                          - $Ideographic
                          - $Katakana
                          - $Hangul
                          - $Hebrew
                          - [:Script = Thai:]
                          - [:Script = Lao:]
                          - [:Script = Hiragana:]];

Regards,
Karl.

On 2008年11月06日 11:44, Alan Yaniger wrote:
Hi Karl,

I've added my rule to dict_word_he.txt and rebuilt, but there is still no change. Is the syntax of my rule OK?

Alan

Karl Hong wrote:
Hi Alan,

There is a dict_word_he.txt in the directory, which is used for Hebrew text. You need to add the rule to that file.

Thanks,
Karl.

On 2008年11月06日 11:15, Alan Yaniger wrote:
Hi Karl,

Thanks for your help, but I'm still having difficulty.

I've added the following at the end of data/dict_word.txt:

$HebrewLetter   = [\u05d0-\u05ea];
$DoubleQuote         = \u0022;
$HebrewLetter+  $DoubleQuote  $HebrewLetter+;

but it doesn't have any effect. If I have a word like L"HZ, the spell checker still marks HZ as a separate word. Is there something wrong with my syntax?

Alan

Karl Hong wrote:
Hi Alan,

I would suggest you write a rule in data/dict_word.txt, something like

<hebrew_letter>+<quotation_mark><hebrew_letter>+;

It means that a Hebrew word is one or more Hebrew letters, followed by a quotation mark, followed by one or more Hebrew letters. For the rule syntax, check the ICU user guide:

http://icu-project.org/userguide/boundaryAnalysis.html
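As an illustration only, the pattern that this rule describes can be written as a plain matcher. This is a sketch of the pattern itself, not of how ICU compiles or applies rules; `isHebrewLetter` and `matchQuotedHebrewWord` are hypothetical names, and the letter range U+05D0..U+05EA (alef to tav) is an assumption.

```cpp
#include <cstddef>
#include <string>

// Assumed range for a Hebrew letter: alef (U+05D0) through tav (U+05EA).
bool isHebrewLetter(char32_t ch) { return ch >= 0x05D0 && ch <= 0x05EA; }

// Returns the length of the prefix at `start` that matches
//   HebrewLetter+ QUOTATION-MARK HebrewLetter+
// or 0 if the text at `start` does not match the pattern.
std::size_t matchQuotedHebrewWord(const std::u32string& text, std::size_t start) {
    std::size_t pos = start;
    while (pos < text.size() && isHebrewLetter(text[pos])) ++pos;
    if (pos == start) return 0;                      // need a letter before the quote
    if (pos >= text.size() || text[pos] != U'"') return 0;
    std::size_t afterQuote = ++pos;
    while (pos < text.size() && isHebrewLetter(text[pos])) ++pos;
    if (pos == afterQuote) return 0;                 // need a letter after the quote
    return pos - start;
}
```

A quotation mark at the start or end of a letter run does not match, so only a quote with Hebrew letters on both sides is kept inside the word.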

Regards,
Karl.

On 2008年11月05日 11:11, Alan Yaniger wrote:
Hi Karl,

I'm trying to address issue 51772. In Hebrew, single and double quotes are used within a word to mark the sound "j" and acronyms, respectively. At present they are treated as word breaks during spellchecking, because their script type is LATIN rather than COMPLEX. endOfScript() treats this script change as a word break, but in Hebrew it is not one. I'd like a double quote within a word () not to be treated as a word break, while still treating it as a word break when it is at the beginning or end of a word (preceded or followed by whitespace, or at the beginning or end of a paragraph).
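The behaviour asked for here can be sketched as a small context check. This is only an illustration under stated assumptions, not code from the i18npool module: `inHebrewLetterRange` and `isQuoteInsideHebrewWord` are hypothetical names, the letter range U+05D0..U+05EA is assumed, and only the double quote (U+0022) is handled.

```cpp
#include <cstddef>
#include <string>

// Assumed range for a Hebrew letter: alef (U+05D0) through tav (U+05EA).
bool inHebrewLetterRange(char32_t ch) { return ch >= 0x05D0 && ch <= 0x05EA; }

// True when text[pos] is a double quote with a Hebrew letter directly on
// both sides. A quote at the start or end of the paragraph, or next to
// whitespace or any other non-Hebrew character, returns false and would
// therefore still count as a word break.
bool isQuoteInsideHebrewWord(const std::u32string& text, std::size_t pos) {
    if (pos >= text.size() || text[pos] != U'"') return false;
    if (pos == 0 || pos + 1 >= text.size()) return false;  // paragraph start or end
    return inHebrewLetterRange(text[pos - 1]) && inHebrewLetterRange(text[pos + 1]);
}
```

A script-run scan could consult such a check before deciding that a quotation mark ends the current run, instead of treating every U+0022 the same way.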

Alan

Karl Hong wrote:
Hi Alan,

The ScriptType breakiterator is controlled not by language but by the Unicode script type definition. Unlike the character/word/sentence/line breakiterators, which can be customized by language, there is only one script type breakiterator for all languages.

What would you like to do with endOfScript for Hebrew exactly?

Regards,
Karl.

On 2008年11月05日 06:41, Mathias Bauer wrote:
Hi Alan,

Alan Yaniger wrote:


Hi list-members,

For Hebrew text, I would like to override the BreakIteratorImpl::endOfScript() function.

I tried:

- I wrote a Breakiterator_he class (with hxx and cxx files),
- I added it to the SLOFILES section of makefile.mk,
- I added it to the instances array in registerservices.cxx,
- I rebuilt OOo.

But it's still not getting called from the Writer code I'm testing it with.
What should I do differently?

In case you don't get an answer here, I think you should try to repeat
your question on the sw dev-list.

Regards,
Mathias




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

