--- Tomas Frydrych <[EMAIL PROTECTED]> wrote:
>
> Hi Martin,
>
> > It should not be done in the formatter. We could do a scan at the
> > piecetable level just after the document has loaded. Do these changes
> > without throwing change records so we don't get into issues with
> > undo.
> >
> > The Piecetable already has the document properties and you can keep
> > track of the state of left-to-right-ness by examining the properties
> > of the frags as you step through the document.
>
> The reason why I want to do this from the formatter (within
> fl_BlockLayout) is that determining the context is not that simple,
> because not all characters have strong directional properties;
> FriBiDi uses about a dozen initial 'direction types' that have to be
> resolved by the BIDI algorithm to either RTL or LTR. This requires
> several passes over the data, and it is the job of the layout engine
> to do that. The piecetable stores the logical shape of the document
> (i.e., the sequence of characters in the order in which the user
> inputs them), while the layout engine creates the visual sequence.
> The problem here lies in the fact that in the Word format the
> piecetable is not a purely logical representation of the document; it
> contains some stuff that is the product of some layout engine.
>
> To avoid undo problems, we could let this one routine in
> fl_BlockLayout access the PT data directly -- this is always a 1:1
> transformation, so we could just overwrite the character in memory,
> rather than delete it from the piecetable and then insert the
> replacement.
>
> > Just a caution though: suppose an author in Hebrew wanted to place a
> > ")(" in his document, would your algorithm detect this and not make
> > the change?
>
> That's really not a problem. The algorithm is very simple: it just
> replaces all mirror characters in RTL context with their mirror
> images. It does so based on the knowledge that a given file format,
> in this case Word doc, stores these visually rather than
> semantically.
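As a rough illustration of the replacement pass described above (this is
not AbiWord code), here is a minimal Python sketch. The names
mirror_in_rtl_context, MIRROR and base_rtl are made up for the example,
and the context tracking is a crude stand-in: it simply remembers the
last strongly directional character, whereas a real implementation would
use the directions resolved by FriBiDi's multi-pass BIDI algorithm.

```python
import unicodedata

# Hypothetical, deliberately tiny mirror table for illustration only;
# a real table would be derived from the Unicode mirroring data.
MIRROR = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def mirror_in_rtl_context(text, base_rtl=False):
    """Swap mirror characters that occur in an RTL context.

    Simplifying assumption: the context of a neutral character is the
    direction of the most recent strong character (or the base
    direction if none has been seen yet).
    """
    out = []
    rtl = base_rtl
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):      # strong right-to-left
            rtl = True
        elif bidi == "L":            # strong left-to-right
            rtl = False
        # 1:1 replacement: exactly one character out per character in.
        out.append(MIRROR[ch] if rtl and ch in MIRROR else ch)
    return "".join(out)
```

Because every character maps to exactly one character, the result has
the same length as the input, which is what would make an in-place
overwrite possible. Note that this sketch swaps unconditionally: a ")("
typed after a Hebrew letter comes out as "()", which is the case the
question about ")(" is concerned with.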
I guess I don't understand something, so please clear it up for me.
It looks like you want to put in code that is not part of the importer
but which depends on what type of document has just been imported.
I think this kind of code, being format-specific, should be in the
importer. I guess this really is what you are suggesting and I've
read it wrong. Is this correct?

Andrew Dunbar.

> > Also is it possible to quickly detect if a document has any RTL during
> > import so we don't have to scan ordinary docs?
>
> Not unless the document format explicitly stores that kind of info
> somewhere.
>
> Tomas

=====
http://linguaphile.sourceforge.net
http://www.abisource.com