[aspell] Aspell Integration into Other Programs, Fwd: Aspell and LyX

Kevin Atkinson Wed, 2 Feb 2000 02:10:53 -0800

I just sent this message to LyX-Devel.  Names of other projects I
should contact about my intentions would be most appreciated.

-------- Original Message --------
Subject: Aspell and LyX
Date: Wed, 02 Feb 2000 04:59:44 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: LyX-Developers <[EMAIL PROTECTED]>

Back in February of 1999 I posted a proposal to integrate Aspell into
LyX.  I attached the relevant parts of the conversation as a text file
for quick review by those who were here and to bring those new to this
list up to speed.

However, since then I have realized that all too many programs are now
implementing their own spell checker, which has suggestion intelligence
about the same as ispell's but does not actually use ispell.  This means
that there is no way to use the better suggestion intelligence of Aspell,
since there is no way to change the spell checker used.

Unfortunately my spell checker has two barriers against it being adopted
by mainstream Open Source programs.  1) It is written in C++, and all too
many Open Source projects are still in pure C.  2) It is written in very
modern C++, which means it is not the most portable thing in the world.

So, what I would like to do now is, instead of coming up with an interface
for just LyX, to come up with a pure C interface/library which will use
Aspell if it is available and, if not, use Ispell.

Are you up to working with me on designing such an interface?  I will
handle the Aspell interface and let someone else handle the Ispell
interface.  I will also need lots of help because I have no clue how to
dynamically load code at run time.
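A minimal sketch of the kind of pure C interface described above, assuming a table of function pointers chosen once at start-up, with Aspell preferred and Ispell as the fallback. The names `spell_backend` and `spell_select_backend`, the stub checkers, and the availability flags are all hypothetical; a real library would probe for the back ends at run time (for example with POSIX `dlopen`) rather than take booleans.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical C-compatible interface: C linkage, so plain C programs can
// call it.
extern "C" {

typedef struct spell_backend {
    const char *name;
    // Returns 1 if the word is spelled correctly, 0 otherwise.
    int (*check)(const char *word);
} spell_backend;

// Stubs standing in for the real Aspell and Ispell glue code; both just
// reject the single word "teh".
static int aspell_check(const char *word) { return std::strcmp(word, "teh") != 0; }
static int ispell_check(const char *word) { return std::strcmp(word, "teh") != 0; }

static const spell_backend aspell_backend = { "aspell", aspell_check };
static const spell_backend ispell_backend = { "ispell", ispell_check };

// Prefer Aspell when it is available, otherwise fall back to Ispell.
// Returns NULL when neither spell checker is installed.
const spell_backend *spell_select_backend(int aspell_available,
                                          int ispell_available) {
    if (aspell_available) return &aspell_backend;
    if (ispell_available) return &ispell_backend;
    return NULL;
}

}  // extern "C"
```

Callers only ever see the `spell_backend` table, so swapping the real implementation behind it does not change client code.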

The code you write for this library will need to be under the LGPL, as I
also want commercial programs to be able to use it.  My eventual goal is
to have ALL programs use this library instead of either using ispell
directly through a pipe or writing a spell checker of their own.

Other places where I should post my intentions would be appreciated as I
want this project to get as much exposure as it can.
-- 
Kevin Atkinson
[EMAIL PROTECTED]
http://metalab.unc.edu/kevina/

Date: Wed, 11 Nov 1998 12:05:02 +0100
From: Asger Alstrup Nielsen <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX 1.0 and Aspell

Hi!

I'm forwarding this to the LyX list in the hope that somebody will implement
this small and useful feature request.

> Would it be possible to incorporate an option in the (pre-)Release
> version of LyX that will allow the user to choose the spell checker
> command somewhere in the spell checker options dialog box?  This way
> people can use my spell checker without having to rename ispell or some
> other fancy trick.

The easy solution is to provide a lyxrc command which can specify the spelling
command.  Could somebody do that?  It should take half an hour if you have the
sources compiled and set up for work.
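A rough sketch of such an rc option, assuming a hypothetical `\spell_command <program>` line. This is not LyX's actual lyxrc parser, and the fallback to `ispell` when the line is missing is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Scan rc-file text for a hypothetical "\spell_command <program>" line and
// return the program name; fall back to "ispell" when no such line exists.
std::string spell_command_from_rc(const std::string &rc_text) {
    std::istringstream in(rc_text);
    std::string line;
    std::string command = "ispell";
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key, value;
        if (fields >> key >> value && key == "\\spell_command")
            command = value;  // a later entry overrides an earlier one
    }
    return command;
}
```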

Date: Mon, 28 Dec 1998 15:05:17 +0100 (MET)
From: Jean-Marc Lasgouttes <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Patch for LyX to work better with Aspell

>>>>> "Kevin" == Kevin Atkinson <[EMAIL PROTECTED]> writes:

Kevin> Here is a patch that will allow Aspell to learn from users'
Kevin> mistakes when used with LyX.  All it does is store the
Kevin> replacement pairs when it detects that aspell is being used
Kevin> instead of Ispell.

Kevin> I would appreciate it if you could apply the patch to the 1.0
Kevin> branch because the change is minor.  However, I will understand
Kevin> if you think it is too major a change.

Hello,

I added your patch to the 1.0 branch since it looks simple enough. Thanks.
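One way the patch's "detect Aspell, then store the replacement pair" logic could look, sketched under two assumptions about Aspell's ispell-compatible pipe mode: its startup banner contains the text `but really Aspell`, and it accepts a `$$ra misspelled,correct` command that records a replacement pair. Both should be checked against the Aspell manual for the version in use; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed: Aspell's pipe-mode banner contains "but really Aspell", while a
// genuine ispell banner does not.
bool banner_is_aspell(const std::string &banner) {
    return banner.find("but really Aspell") != std::string::npos;
}

// Build the line to write to the pipe so that Aspell remembers that
// `misspelled` was corrected to `correction`.  Returns an empty string for
// ispell, which has no such command, so the caller simply sends nothing.
std::string replacement_command(const std::string &banner,
                                const std::string &misspelled,
                                const std::string &correction) {
    if (!banner_is_aspell(banner)) return "";
    return "$$ra " + misspelled + "," + correction + "\n";
}
```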

Date: Tue, 16 Feb 1999 04:00:35 +0000
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Aspell and LyX 1.1

Hi.  I was wondering if you are still interested in using Aspell
(http://metalab.unc.edu/kevina/aspell) as the new LyX 1.1 spell checker.
I would be willing to help you out if you would point me in the right
direction.

The reason I ask is that I would like Aspell to be incorporated in at
least one large project before I am comfortable with releasing it as
version 1.0.

The interface is still in a state of flux; however, it should
stabilize soon.

Early feedback on what sort of things you are looking for would be more
than appreciated.

From: Jean-Marc Lasgouttes <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

It could be nice as long as it is optional (IMO). I guess we should
have a generic spellchecker interface that is plugged at compile time
to either ispell, aspell (library version, I guess) or KSpell (for
klyx). However, I think we have to keep the support for plain old
ispell. 

From: Kevin Atkinson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

Here are some of the fancy things you will soon be able to do in Aspell
that you can't do (or that will be very difficult to do) in ispell:

1) Have a different "ignore all" word list for each document so that you
won't keep having to press ignore for special words you are not willing
to insert into your personal dictionaries.

2) Skip over URLs, host names, and email addresses.

3) Intelligently spell check code and mathematics (Aspell will figure out
which words are variable names and skip over them.  See the mailing list
archive for how I plan on doing this.)

4) Learn from users' misspellings

5) Finally a much better suggestion strategy.

Your current code will allow aspell to do #5 correctly. 

I have submitted code for #4 however it has a few problems.

Aspell can't do #2 because you insist on sending things one word at a
time, which breaks up URLs and the like.
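To illustrate point #2: whether a token is a URL or e-mail address can only be decided by looking at the whole whitespace-delimited token, punctuation included, which is exactly what one-word-at-a-time checking throws away. The heuristic below (skip tokens containing `://` or `@`, or starting with `www.`) is a simplified assumption for illustration, not Aspell's actual filter, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// True if a whitespace-delimited token looks like a URL, host name or
// e-mail address (a deliberately crude heuristic).
bool looks_like_address(const std::string &token) {
    return token.find("://") != std::string::npos ||
           token.compare(0, 4, "www.") == 0 ||
           token.find('@') != std::string::npos;
}

// Split text into the words that should actually be spell checked,
// skipping address-like tokens and trimming surrounding punctuation.
std::vector<std::string> words_to_check(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string token;
    while (in >> token) {
        if (looks_like_address(token)) continue;
        std::string::size_type b = 0, e = token.size();
        while (b < e && !std::isalpha(static_cast<unsigned char>(token[b]))) ++b;
        while (e > b && !std::isalpha(static_cast<unsigned char>(token[e - 1]))) --e;
        if (b < e) words.push_back(token.substr(b, e - b));
    }
    return words;
}
```

If the same text arrived as separate words ("http", "metalab", "unc", ...), the `://` and `@` clues would already be gone.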

Being able to do #3 will require a prescan of the document with all the
symbols intact and without any artificial breaking up of the text like
you currently do.

And #1 will require you to store word lists with the document.

So basically in order to support Aspell in the fullest your current
spell checker code will require a major rewrite.

From: Asger K. Alstrup Nielsen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

> Kevin> So basically in order to support Aspell in the fullest your
> Kevin> current spell checker code will require a major rewrite.
> 
> Anyway, the spellchecker code needs a major rewrite. By `support', do
> you mean a pipe-based interface as we have now, or the use of aspell
> as a library?

Kevin, you are more than welcome to rewrite the spell checking
interface in LyX.
The requirements are simple to present:  All of what the current
spell checker can do, and a few other additions:

1) Local words.
2) Easier support for different spell checkers. 
   (on other platforms, such as windows, there is probably a 
    system API for this.)
3) Hide the spell checker communication.

Ideally, I'd like to have an interface where we pass a
string const & of words that we want to spell check,
and get a vector<pair<string, vector<string> > > back,
where each misspelled word in the string has been mapped to a list of 
potential replacement words.  (The current restriction that we only 
spell check one word at a time should be lifted, because this is
unnecessarily restrictive.  For instance, the spell checking interface
should also be flexible enough for grammar checking.)
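A minimal sketch of that batch interface, assuming whitespace-separated words without punctuation, a small in-memory dictionary, and "suggestions" defined as dictionary words within one edit. The result uses value types because standard containers cannot hold references such as `string const &`. `check_text`, `within_one_edit` and `Misspellings` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::vector<std::string> > > Misspellings;

// True if a and b differ by at most one insertion, deletion or substitution.
bool within_one_edit(const std::string &a, const std::string &b) {
    if (a.size() > b.size()) return within_one_edit(b, a);  // make a the shorter
    if (b.size() - a.size() > 1) return false;
    std::string::size_type i = 0;
    while (i < a.size() && a[i] == b[i]) ++i;               // skip common prefix
    if (i == a.size()) return true;                          // at most a trailing extra char
    // Substitution skips one char in both strings; insertion skips one in b.
    std::string::size_type skip_a = (a.size() == b.size()) ? 1 : 0;
    return a.substr(i + skip_a) == b.substr(i + 1);
}

// Spell check a whole string at once: every word not in the dictionary is
// reported together with the dictionary words within one edit of it.
Misspellings check_text(const std::string &text, const std::set<std::string> &dict) {
    Misspellings result;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        if (dict.count(word)) continue;
        std::vector<std::string> suggestions;
        for (std::set<std::string>::const_iterator it = dict.begin(); it != dict.end(); ++it)
            if (within_one_edit(word, *it)) suggestions.push_back(*it);
        result.push_back(std::make_pair(word, suggestions));
    }
    return result;
}
```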

All the behind-the-scenes communication with the spell checker should
be hidden from the user.

If you feel up to it, present a design here, and we can comment on it
before you implement it.

Date: 18 Feb 1999 20:50:10 +0100
From: Lars Gullik Bjønnes <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

  >> Kevin Atkinson writes:
  KA> 1) Have a different "ignore all" word list for each document so

This is planned, and is also fairly easy to do for ispell.

  KA> 2) Skip over url's, host names, and email addresses.

When we have character styles, this will be easier.

  KA> And #1 will require you to store word lists with the document.

Will be there in 1.1.x

  KA> So basically in order to support Aspell in the fullest your
  KA> current spell checker code will require a major rewrite.

Yes, and this is planned.
(help is good)

From: Kevin Atkinson <[EMAIL PROTECTED]>
To: "Lars Gullik Bjønnes" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

"Lars Gullik Bjønnes" wrote:
> 
>   >> Kevin Atkinson writes:
>   KA> 1) Have a different "ignore all" word list for each document so
> 
> This is planned, and is also fairly easy to do for ispell.

OK.  But how do you manage multiple documents?  Do you have a separate
ispell process for each open document, or do you load and unload the word
list before spell checking a specific document?  Unloading and loading
will work fine if you don't plan to do spell checking while you type.
If you plan to do spell checking while you type, you would almost
certainly need a separate process for each open document.  Well, I guess
you could load and unload the word list each time you change documents.
But then how will you handle having multiple documents visible at once?

Aspell avoids this problem by having detachable dictionaries.  Thus you
can have multiple Aspell classes which share the main word list.  Each
of these Aspell classes can also have a separate "ignore all"
dictionary.  In fact with aspell you can have as many dictionaries as
you like.  All of them being completely detachable.
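The sharing described here can be sketched with reference counting, assuming a cache (standing in for a dictionary manager) that hands every checker for a language the same read-only main word list, while each checker keeps its own "ignore all" list. `DictCache`, `Checker` and the hard-coded word list are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

typedef std::set<std::string> WordList;

// Hands out one shared, read-only main word list per language, so that
// several open documents do not each load their own copy.
class DictCache {
public:
    std::shared_ptr<const WordList> main_list(const std::string &lang) {
        std::shared_ptr<const WordList> &slot = lists_[lang];
        if (!slot) slot = std::make_shared<WordList>(load(lang));
        return slot;
    }
private:
    // Stand-in for reading the real dictionary from disk.
    static WordList load(const std::string &lang) {
        return lang == "en" ? WordList{"the", "dog", "fence"} : WordList{};
    }
    std::map<std::string, std::shared_ptr<const WordList> > lists_;
};

// One checker per document: shares the main list, owns its "ignore all" list.
class Checker {
public:
    Checker(DictCache &cache, const std::string &lang)
        : main_(cache.main_list(lang)) {}
    void ignore_all(const std::string &w) { session_.insert(w); }
    bool correct(const std::string &w) const {
        return main_->count(w) || session_.count(w);
    }
    const WordList *main_list() const { return main_.get(); }
private:
    std::shared_ptr<const WordList> main_;
    WordList session_;
};
```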

From: Lars Gullik Bjønnes <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

  >> Kevin Atkinson writes:
  KA> Ok. But how do you manage multiple documents. Do you have a

For Ispell we will most likely have to use multiple processes. Note
that I said "easy for ispell". I did not say elegant :-)

  KA> Aspell avoids this problem by having detachable dictionaries.

This sounds really nice.

  KA> So basically in order to support Aspell in the fullest your
  KA> current spell checker code will require a major rewrite.

Until Aspell is the de facto speller in the Unix world we will need to
have an interface to ispell too, and hopefully we will be able to have
an abstract interface to the different spelling processes.

From: Jean-Marc Lasgouttes <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

Kevin> Ok.  But how do you manage multiple documents.  Do you have a

We had plans for multiple ispell processes, but the reason was rather
multi-language documents support. Somewhat related, I guess.

JMarc

From: Kevin Atkinson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

Jean-Marc Lasgouttes wrote:

> We had plans for multiple ispell processes, but the reason was rather

Having multiple ispell processes with the same language will cause
problems with their personal dictionaries, because when ispell saves its
personal dictionary it simply writes its in-memory copy to disk.  If the
personal dictionary on disk has changed since the process started, it will
overwrite those changes.  This means that if you have two ispell processes
and the personal dictionary was changed in both of them, only one of the
two modified personal dictionaries will be saved to disk, because the two
ispell processes are unaware of each other and will blindly overwrite
the changes the other one made.

How do you plan on dealing with this?
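One mitigation for this lost-update problem, sketched here as an assumption rather than anything ispell itself does: re-read the file at save time and write back the union of the on-disk words and this process's additions. The function name is hypothetical, and a small race between the read and the write remains unless the file is also locked.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <string>

// Save a personal dictionary without discarding words another process
// wrote since we loaded it: re-read the file, merge in our additions, and
// write the union back.  (A race between the read and the write remains;
// closing it would need file locking.)
void save_personal(const std::string &path, const std::set<std::string> &added) {
    std::set<std::string> words;
    {
        std::ifstream in(path.c_str());
        std::string w;
        while (in >> w) words.insert(w);  // a missing file just yields no words
    }
    words.insert(added.begin(), added.end());
    std::ofstream out(path.c_str(), std::ios::trunc);
    for (std::set<std::string>::const_iterator it = words.begin(); it != words.end(); ++it)
        out << *it << '\n';
}
```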

From: Lars Gullik Bjønnes <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>
Cc: Garst R. Reese <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

With some hacking we can make it _almost_ foolproof.  But for ispell
processes spawned from outside LyX there will still be a problem.

From: Jean-Marc Lasgouttes <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell and LyX 1.1

Kevin> Having multiple ispell processes with the same language will

Are we absolutely forced to offer the possibility of spellchecking two
documents at the same time? It does not look like a feature I'd be
killing for.

From: Kevin Atkinson <[EMAIL PROTECTED]>
To: "Lars Gullik Bjønnes" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: Aspell LyX 1.1 and Exceptions

"Lars Gullik Bjønnes" wrote:
>
>   KA> Now do you want the LyX spell checker interface to also throw
>   KA> exceptions (with all of them stemming from a common base class
>   KA> such as lyx_spell_error) or would you rather it catch all thrown
>   KA> exceptions and toggle an error flag or something similar?
> 
>   KA> It doesn't really make a difference. If you wish for it to throw
>   KA> exceptions it will also throw them when an ispell process
>   KA> returns an error code.
> 
> If we allow it to use exceptions we limit the range of usable
> compilers a great deal. Then we can throw out support for gcc 2.7.x at
> once.
> 
OK, then I take it you don't want to use exceptions, as you still wish to
support gcc 2.7.x?  That's fine.

In that case Aspell support will only be compiled in if a compatible
compiler, such as egcs, is used.  Otherwise it will use ispell.
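The second option, catching everything and setting an error flag, could be sketched as follows, assuming a common base class `lyx_spell_error` (the name suggested above) derived from `std::runtime_error`. `SpellFacade` and the throwing stub are hypothetical; only the facade deals with exceptions, so callers just test `error()`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Common base class for every error the spell checker layer can report.
class lyx_spell_error : public std::runtime_error {
public:
    explicit lyx_spell_error(const std::string &what) : std::runtime_error(what) {}
};

// Stand-in for a back end that reports failures by throwing.
static bool backend_check(const std::string &word) {
    if (word.empty()) throw lyx_spell_error("empty word");
    return word != "teh";
}

// Only this class handles exceptions; callers test error() instead of
// writing try/catch themselves.
class SpellFacade {
public:
    SpellFacade() : error_(false) {}
    bool check(const std::string &word) {
        try {
            return backend_check(word);
        } catch (const lyx_spell_error &e) {
            error_ = true;
            message_ = e.what();
            return false;
        }
    }
    bool error() const { return error_; }
    const std::string &message() const { return message_; }
private:
    bool error_;
    std::string message_;
};
```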

Date: Sat, 13 Mar 1999 02:04:54 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: LyX New Spell Checker Interface

Here is a rough outline of what I had in mind for LyX's new spell
checker.  Let me know what you think.  I will write the interface to
Aspell and rely on someone else to write the interface for Ispell.
Sorry I took so long to write something out.

class SpellChecker;

class DictManager {
public:
  void add_sc(SpellChecker *); 
  void remove_sc(SpellChecker *);
  SC_Error save_wls(); //save all word lists;
  // a bunch of other dictionary management methods
};

class SpellChecker {
  typedef Itr    ... // A forward (but preferably bidirectional) iterator
  typedef EndItr ... // An iterator type such that if i is an Itr and e is
                     // an EndItr, i == e if and only if i is at the end of
                     // the iterator range.  This can be the same as Itr, and
                     // in most cases it probably is.

public:
  SpellChecker(DictManager *);

  void set_language(const string &lang);
  string language() const;

  void restart(Itr c, EndItr e); 
  // starts or restarts the process with a new iterator pair.  Will stop 
  // when it encounters a misspelling or reaches the end.
  // If spell checking has already started c must be within one word of
  // where the spell checker stops.  If you need to skip over an area
  // use the scan method.
  void skip_word();
  // skip past the current misspelled word
  void continue();
  // continues the process.  Will stop when it encounters a misspelling

  string  word() const;
  // returns the misspelled word.
  Itr     word_begin() const;
  // returns an iterator pointing to the beginning of the misspelled
  // word
  Itr     word_end()   const;
  // returns an iterator pointing to the end of the misspelled word
  bool    at_end()     const;
  // returns true if the spell checker reached the end of the iterator
  // range.

  void scan_ahead(Itr stop);
  // skips to position stop, gathering any necessary state information.
  void reset();
  // Resets all state information.
  void scan(Itr begin, Itr stop, EndItr end);
  // Scans from begin until end, gathering any necessary state
  // information.  This has the potential of being much more efficient
  // if Itr is bidirectional.

  // Note: In order to properly support some of Aspell's advanced spell
  // checking modes it is important that you use the above three
  // methods to move around the document.

  Itr    cur() const;
  // returns the location where the spell checker will resume checking
  EndItr end() const;
  // returns the end iterator

  WordList suggestions();
  // returns a list of suggestions for the current word

  void add_personal(const string &word);
  // add a word to the personal word list
  void add_session(const string &word);
  // add a word to the session or "ignore all" word list

  void save_all_wls();
  // save all relevant word lists

  void clear_session();
  // clear the session word list

  bool ignore_replacements();
  bool ignore_replacements(bool);

  void store_replacement(const string &cor, bool memory = true);
  // if ignore_replacements is not set, store the replacement pair;
  // the memory parameter should be ignored
};

To give you an idea of how the spell checker works, let's assume we are
spell checking this paragraph:

  This is a stupid exampe as I, Kevin Atkinson, cant think of
  intelligent to say.  Anothr stupid.  The closing sentence.

Let sc be the spell checker object, i be an iterator pointing to the
beginning of the paragraph, and e be the end.

We first need to reset the spell checker:
  sc.reset();
Now we need to start it at the beginning:
  sc.restart(i, e);
Now we need to find where we are, so:
  i = sc.word_begin();
Ah, we are at "exampe".  So let's get some suggestions:
  sugs = sc.suggestions();
OK, we want to replace it with "example", so we inform the spell checker
of our choice:
  sc.store_replacement("example");
And then we make the replacement in the document.  However, making this
replacement invalidated the iterator, so we need to restart the scan:
  sc.restart(i, e);
where i is the location right before "example".
Now the next misspelling is "Atkinson".  However, we want to ignore this
word, so we simply skip over it
  sc.skip_word();
and continue on:
  sc.continue();
Now it stops at "Anothr".  However, the user says this sentence was a
fragment and just wants to skip over it.  So we set i to the end of the
sentence and use
  sc.scan_ahead(i);
Even though you could just restart at i, you shouldn't, as Aspell might
need to gather some state information along the way (this is a poor
example; a better example would be restarting in the middle of a
sentence or such).  So in order to skip over a region, always use
scan_ahead.
Now we continue on:
  sc.continue();
We don't use restart because cur() is already where we want to start
from, thanks to the scan_ahead method.
And we discover that
  sc.at_end() is true, so we stop.
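The walkthrough can be reproduced with a toy checker over `std::string::const_iterator`, assuming "misspelled" simply means "not in a small word list". Because `continue` is a C++ keyword and cannot be used as a member function name, the sketch calls that step `next()`. `ToyChecker` and its word list are hypothetical, and `scan_ahead` here only moves the position without gathering any state.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Toy model of the proposed stop-at-each-misspelling protocol.
class ToyChecker {
public:
    typedef std::string::const_iterator Itr;
    explicit ToyChecker(const std::set<std::string> &dict) : dict_(dict) {}

    void restart(Itr c, Itr e) { cur_ = c; end_ = e; find_next(); }
    void skip_word() { cur_ = word_end_; }
    void next() { find_next(); }                // "continue" in the proposal
    void scan_ahead(Itr stop) { cur_ = stop; }  // a real checker would gather state here
    bool at_end() const { return word_begin_ == end_; }
    Itr word_begin() const { return word_begin_; }
    Itr word_end() const { return word_end_; }
    std::string word() const { return std::string(word_begin_, word_end_); }
    Itr cur() const { return cur_; }

private:
    static bool letter(char ch) { return std::isalpha(static_cast<unsigned char>(ch)) != 0; }

    // Advance from cur_ to the next unknown word, or to the end.  On a hit,
    // cur_ is left at the start of the word so skip_word() can step past it.
    void find_next() {
        while (cur_ != end_) {
            while (cur_ != end_ && !letter(*cur_)) ++cur_;
            Itr b = cur_;
            while (cur_ != end_ && letter(*cur_)) ++cur_;
            if (b != cur_ && !dict_.count(std::string(b, cur_))) {
                word_begin_ = b; word_end_ = cur_; cur_ = b;
                return;
            }
        }
        word_begin_ = word_end_ = end_;
    }

    std::set<std::string> dict_;
    Itr cur_, end_, word_begin_, word_end_;
};
```

As in the walkthrough, editing the text invalidates the iterators, so the caller records the offset of the word, makes the replacement, and restarts from that offset.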

Date: Sun, 14 Mar 1999 14:31:14 +0100
From: Asger Alstrup Nielsen <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

> Here is a rough outline of what I had in mind for LyX new Spell
> Checker.  

Thank you for taking the time to do this!

> Let me know what you think.  I will right the interface to
> aspell and rely on someone else to write the interface for Ispell.

Ok, that sounds fair to me.

I have some comments to the code, and then some comments to the design as such.

[Note: comments on stylistic matters, such as naming, removed]

What is this DictionaryManager used for?

>   typedef Itr    ... // A forward (but preferably bidirectional) iterator
>   typedef EndItr ... // An iterator type such that if i is an Itr and e is an EndItr

I don't think you should distinguish between the two kinds of iterators.  An
iterator can point to any element in the list, and at the end of the list.

The spellchecker should work on strings.  The LyX data structure will be small
strings, so you should simply use LString::const_iterator to make life simplest
for us.

>   void set_language(const string &lang);
>   string language() const;

I think we need methods to define the encoding of the data.
LyX can provide MIME-type encoding flags for the spell checker.  
This might be needed in the future, so add:

        void set_encoding(string const & encoding);
        string encoding() const;

>   void continue();
>   // continues the process.  Will stop when it encounters a misspelling
> 
>   string  word() const;
>   // returns the misspelled word.
>   Itr     word_begin() const;
>   // returns an iterator pointing to the beginning of the misspelled word
>   Itr     word_end()   const;
>   // returns an iterator pointing to the end of the misspelled word

Maybe you could use a pair instead:

/// Returns a range that surrounds the misspelled word
pair<LString::const_iterator, LString::const_iterator>
word_boundary() const;

>   bool at_end() const;
>   // returns true if the spell checker reached the end of the iterator range.

Since later you present the "cur" and "end" methods, there is no need for this
one.  The user can just do "if (spellchecker.current() == spellchecker.end()) {
.. }"

>   void reset();

What is this used for?

>   void scan(Itr begin, Itr stop, EndItr end);

What is the purpose of this method "scan"?

>   bool ignore_replacements();

What is this ignore_replacement state?

The design presents an interface on words.  I think the design is fairly
complete.  However, the stuff about scanning, skipping and all that is a bit
complicated.  Why is this needed?

In general, I prefer to have a minimal interface.  The one you present has many
methods that overlap.  We should try to cut these to a minimum.

Also, the language and encoding of a spellchecker is probably fixed.  Can a
spell checker change dictionary?  I don't think ispell can, so we can't assume
that.

So I think we can get away with just passing these in the constructor of the
spellchecker:

SpellChecker(string const & language, string const & encoding);

We need to standardize the language strings.  Maybe we should use the two
letter ISO codes (us, de, dk, au, en)?
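For reference, ISO 639-1 defines the two-letter language codes (en, de, da), whereas us, au and dk are ISO 3166-1 country codes; locale-style tags such as en_US combine the two. A minimal sketch of splitting such a tag, assuming `_` (POSIX locale style) or `-` (BCP 47 style) as the separator; `split_language_tag` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a tag such as "en_US" into its ISO 639-1 language part and its
// (optional) ISO 3166-1 region part.  Either '_' or '-' is accepted.
std::pair<std::string, std::string> split_language_tag(const std::string &tag) {
    std::string::size_type sep = tag.find_first_of("_-");
    if (sep == std::string::npos) return std::make_pair(tag, std::string());
    return std::make_pair(tag.substr(0, sep), tag.substr(sep + 1));
}
```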

We don't need access methods for asking the spellchecker which language and
encoding it is.

So based on your design, here is my proposal:

class SpellChecker {
public:
/** Create a spellchecker with given language and encoding.
    Also, a bunch of spell checker specific parameters can
    be specified. */
SpellChecker(string const & language, string const & encoding,
                 string const & parameters);

/** What is the status of the spell checker?
    Before you use a spell checker, you have to make sure
    that it is ok.  The user might not have any spell checker
    installed, so we have to return an error string in this
    case.
    If the spell checker is ok, we return an empty string here.
 */
string errorStatus() const;

/** Define the string the spell checker should work on.
    If it already is started, "start" must be within one
    word of where the spell checker stopped last time.  If you need 
    to skip over an area of the string, use the moveTo method.
 */
void set_string(LString::const_iterator start,
                    LString::const_iterator end);

/** Starts or restarts the spell checking.
    The spell checker will stop when it encounters a misspelling 
    or reaches the end of the string.
 */
void spellcheck();

/// Skip the current word
void skipWord();

/** Return the current misspelled word.
    If we reached the end of the string, the returned string is
    empty. 
  */
LString word() const;

/// Returns a range that surrounds the misspelled word
pair<LString::const_iterator, LString::const_iterator>
word_boundary() const;

/** Skips to given position.
    This method is necessary because the spell checker
    might gather state information. 
  */
void moveTo(LString::const_iterator position);

/// Returns the location where the spell checker is in the string
LString::const_iterator current() const;

/// Returns the location where the spell checker will end
LString::const_iterator end() const;

/// Returns a list of suggestions for the current word
vector<LString> suggestions();

/// Add a word to the personal word list
void addPersonal(LString const & word);

/// Add a word to the session or "ignore all" word list
void addSession(LString const & word);

/// Add a global replacement
void addReplacement(LString const & word, LString const & replacement);

/// Save all relevant word lists, replacements, etc.
void save();
};

Date: Sun, 14 Mar 1999 16:31:31 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: Asger Alstrup Nielsen <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

Asger Alstrup Nielsen wrote:

> What is this DictionaryManager used for?

To avoid having to have separate master word lists in memory when using
multiple documents, among other things.

> The spellchecker should work on strings.  The LyX data structure will be small

Fine, then make the typedefs for Itr and EndItr
LString::const_iterator.  Also, my spell checker doesn't like it very
much when you break things up.  How about making an iterator class that
will automatically treat multiple strings as if they were one unit?  It
will not be that difficult and it may actually make things simpler for
you.

I have my reasons for EndItr; however, they are not that important.

> I think we need methods to define the encoding of the data.

Yes I forgot about that. 

> Maybe you could use a pair instead:

Ok if you like it better.

> >   bool at_end() const;
> 
> Since later you present the "cur" and "end" methods, there is no need for this

That is long and ugly in my view.

> >   void reset();
> 
> What is this used for?

Needed when you restart the spell checker at the beginning.  Naturally,
all state information should be reset.

> >   void scan(Itr begin, Itr stop, EndItr end);
> 
> What is the purpose of this method "scan"?

When you want to restart the spell checker in the middle of the
document.

> What is this ignore_replacement state?

It is not really needed.  It is used when you for some reason don't want
to store replacement pairs.

> complete.  However, the stuff about scanning, skipping and all that is a bit

OK.  Suppose you have a spell checker mode where you spell check all
comments of your code:

/* This is a sample block to spell check.
   And here is another sentence.
   etc... */
int main() {
  cout << "Hellow Word\n";
}

Now if the spell checker starts at, say, "another" in this document, how
does it know if it is in a comment?  It doesn't.  It has to scan from
the beginning for the /* string.
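The point can be made concrete: deciding whether a position lies inside a `/* ... */` comment requires scanning from the start of the text and tracking whether a comment is open. A minimal sketch, assuming C-style block comments only (string literals and `//` comments are ignored); `in_c_comment` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Decide whether position `pos` of `text` lies inside a /* ... */ comment.
// Nothing near `pos` tells us, so scan from the start of the text and
// track whether a comment is currently open.
bool in_c_comment(const std::string &text, std::string::size_type pos) {
    bool open = false;
    for (std::string::size_type i = 0; i < pos && i < text.size(); ++i) {
        if (!open && text.compare(i, 2, "/*") == 0) { open = true; ++i; }
        else if (open && text.compare(i, 2, "*/") == 0) { open = false; ++i; }
    }
    return open;
}
```

This is the state that `scan`/`scan_ahead` would gather: once it has been computed up to some position, continuing from there is cheap, whereas restarting blindly in the middle loses it.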

> In general, I prefer to have a minimal interface.  The one you present has many

Which ones, other than at_end()?  Some of those methods are there for
speed and flexibility.

> Also, the language and encoding of a spellchecker is probably fixed.  Can a

That is what the dictionary manager class is.

> So I think we can get away with just passing these in the constructor of the

Fine, if you don't want to one day support multilingual documents.  It
should be fairly easy to pull this off with both ispell and Aspell with
the help of the dictionary management class.

> We need to standardize the language strings.  Maybe we should use the two

Could you give me a reference?

> We don't need access methods for asking the spellchecker which language and

They don't do any harm.  And also, I may want this interface to work with
other projects.

> So based on your design, here is my proposal:

Use typedefs instead of hardcoding LString everywhere!  I will budge on
other things but NOT this!

> class SpellChecker {
> /** Create a spellchecker with given language and encoding.
>     Also, a bunch of spell checker specific parameters can
>     be specified. */
> SpellChecker(string const & language, string const & encoding,
>                  string const & parameters);

See my note above.  Putting all this in the constructor loses
flexibility.

The rest has some problems.  But most of them are covered above.

Date: Sun, 14 Mar 1999 20:12:30 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: Asger Alstrup Nielsen <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

Asger Alstrup Nielsen wrote:

> The spellchecker should work on strings.  The LyX data structure will be small

Only giving the spell checker small strings at a time will work for a
simple-minded spell checker that only wants to look at a document a
word at a time; however, it will not work for spell checkers that want
to be able to see the entire document at once.  Aspell is eventually
going to want to see the entire document at once in order to support
some advanced skipping and suggestion algorithms.  Two such algorithms
are word skipping by context (see
http://franklin.oit.unc.edu/cgi-bin/lyris.pl?visit=aspell&id=79941057)
and suggesting close matches that exist elsewhere in the document
before looking for matches in the dictionary.  Now, in order for these
algorithms to work, Aspell will first need to do a prescan of the
document to build a database of the words in the document.

In order to get this prescan Aspell will need to iterate to the end
of the document before it returns anything.  If you only give it small
segments of the document at once there is no way aspell can do this.
Thus Aspell needs a continuous iterator that will represent the entire
document.
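A rough sketch of the second algorithm, assuming the prescan just counts whitespace-separated words and that a "close match" means Levenshtein distance 1: words seen elsewhere in the document are offered before dictionary words. The function names are hypothetical, and this is not Aspell's actual suggestion code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Classic Levenshtein edit distance, keeping a single row of the table.
std::size_t edit_distance(const std::string &a, const std::string &b) {
    std::vector<std::size_t> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) row[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // D[i-1][j-1]
        row[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t up = row[j];  // D[i-1][j]
            row[j] = std::min({row[j] + 1, row[j - 1] + 1, diag + (a[i - 1] != b[j - 1])});
            diag = up;
        }
    }
    return row[b.size()];
}

// Prescan: collect every word that occurs in the document, with counts.
std::map<std::string, int> prescan(const std::string &doc) {
    std::map<std::string, int> counts;
    std::istringstream in(doc);
    std::string w;
    while (in >> w) ++counts[w];
    return counts;
}

// Suggest words one edit away from `bad`: first those that occur in the
// document, then the remaining dictionary words.
std::vector<std::string> suggest(const std::string &bad,
                                 const std::map<std::string, int> &doc_words,
                                 const std::set<std::string> &dict) {
    std::vector<std::string> from_doc, from_dict;
    for (const auto &entry : doc_words)
        if (edit_distance(bad, entry.first) == 1) from_doc.push_back(entry.first);
    for (const auto &word : dict)
        if (edit_distance(bad, word) == 1 && !doc_words.count(word)) from_dict.push_back(word);
    from_doc.insert(from_doc.end(), from_dict.begin(), from_dict.end());
    return from_doc;
}
```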

Writing such an iterator is not that difficult, provided you have
some sort of container where all the individual little strings are
held:

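#include <string>
#include <vector>

class doc_iterator {
public:
  // StringCollectionItr is a bidirectional iterator over pointers to the
  // individual strings; a vector of string pointers is used as an example.
  typedef std::vector<std::string *>::const_iterator StringCollectionItr;
  typedef std::string::const_iterator                StringItr;
private:
  StringCollectionItr strs_begin_;
  StringCollectionItr strs_cur_;
  StringCollectionItr strs_end_;
  StringItr begin_;
  StringItr cur_;
  StringItr end_;
  // Make *s the current string.
  void load(StringCollectionItr s) {
    strs_cur_ = s;
    begin_ = (*s)->begin();
    end_ = (*s)->end();
  }
public:
  // Starts at the first character.  Assumes the collection is non-empty
  // and that none of the strings is empty.
  doc_iterator(StringCollectionItr bbegin, StringCollectionItr eend)
    : strs_begin_(bbegin), strs_end_(eend) { load(bbegin); cur_ = begin_; }
  void jump_to(StringCollectionItr strs_current, StringItr current) {
    load(strs_current);
    cur_ = current;
  }
  doc_iterator& operator++() {
    ++cur_;
    if (cur_ == end_) {
      StringCollectionItr next = strs_cur_;
      ++next;
      if (next != strs_end_) {  // on the last string, stay at its end
        load(next);
        cur_ = begin_;
      }
    }
    return *this;
  }
  doc_iterator operator++(int) {
    doc_iterator temp = *this;
    ++*this;
    return temp;
  }
  doc_iterator& operator--() {
    if (cur_ == begin_ && strs_cur_ != strs_begin_) {
      StringCollectionItr prev = strs_cur_;
      --prev;
      load(prev);
      cur_ = end_;
    }
    --cur_;
    return *this;
  }
  doc_iterator operator--(int) {
    doc_iterator temp = *this;
    --*this;
    return temp;
  }
  bool operator==(const doc_iterator &o) const
    { return strs_cur_ == o.strs_cur_ && cur_ == o.cur_; }
  bool operator!=(const doc_iterator &o) const { return !(*this == o); }
  const char & operator*() const {return *cur_;}
  StringCollectionItr string_collection_iterator() const {return strs_cur_;}
  StringItr string_iterator() const {return cur_;}
};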

And whenever you need to find the actual location, all you need to do
is call the string_collection_iterator() or string_iterator() methods.

However, this does not take into consideration the fact that you may
need to provide a space between the strings if your strings are like this:
"This is a"
"dog jumping over a fence."

However, that is also quite possible to do.  It just won't be as simple,
and you will have to return "char" instead of "const char &" when the
iterator is dereferenced.

From: Asger K. Alstrup Nielsen <[EMAIL PROTECTED]>
Date: Mon, 15 Mar 1999 10:30:53 +0100 (MET)
To: Kevin Atkinson <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

> Only giving the Spell Checker small strings at a time will work for a

I see and acknowledge the relevance of this.

> Writing such an iterator is not that difficult providing you have

Yes, this is a good solution, and very useful in other situations
as well.  Consider the LyX document one giant string from now on.
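The "space between strings" caveat can be sketched as an iterator over a vector of strings that yields a virtual ' ' at each boundary. Since that space is stored in no string, `operator*` returns `char` by value, as noted in the thread; the iterator also keeps (string index, offset), so mapping a position back to the original string is immediate. `joined_iterator`, `joined_begin` and `joined_end` are hypothetical, and the sketch assumes a non-empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Walks separate strings as if they were one text, with a virtual ' '
// between adjacent strings.  Offset size() of any string except the last
// denotes that virtual space; on the last string it is the end position.
class joined_iterator {
public:
    joined_iterator(const std::vector<std::string> &strs, std::size_t s, std::size_t c)
        : strs_(&strs), s_(s), c_(c) {}
    // Returned by value: the virtual space has no storage to refer to.
    char operator*() const {
        return c_ == (*strs_)[s_].size() ? ' ' : (*strs_)[s_][c_];
    }
    joined_iterator &operator++() {
        if (c_ == (*strs_)[s_].size()) { ++s_; c_ = 0; }  // leave the virtual space
        else ++c_;
        return *this;
    }
    bool operator==(const joined_iterator &o) const { return s_ == o.s_ && c_ == o.c_; }
    bool operator!=(const joined_iterator &o) const { return !(*this == o); }
    // Reverse lookup: which string, and which offset within it.
    std::size_t string_index() const { return s_; }
    std::size_t offset() const { return c_; }
private:
    const std::vector<std::string> *strs_;
    std::size_t s_, c_;
};

joined_iterator joined_begin(const std::vector<std::string> &v) {
    return joined_iterator(v, 0, 0);
}
joined_iterator joined_end(const std::vector<std::string> &v) {
    return joined_iterator(v, v.size() - 1, v.back().size());
}
```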

Date: 15 Mar 1999 18:37:31 +0100
From: Lars Gullik Bj�nnes <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

  KA> "Lars Gullik Bj�nnes" wrote:

  >>  Is it Aspell that needs it, or your interface to aspell?

  KA> What Aspell needs.  Well, it doesn't really need it; it is just
  KA> that it has the potential to do a better job with access to the
  KA> entire document at once.

I am not convinced that it is possible/easy to see a complete lyx
buffer/document as a long string with context; that would be almost
like writing out the LyX file, spellchecking it and reloading it.
Wouldn't it be better to use the context the insets provide more
directly?

Why see more context than a complete paragraph at a time? sometimes I
believe that a single inset provides all the context you need, too.

Date: Mon, 15 Mar 1999 17:45:11 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: "Lars Gullik Bjønnes" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

"Lars Gullik Bjønnes" wrote:

> I am not convinced that it is possible/easy to see a complete lyx

No.  Pay close attention to the iterator model I gave you.  It doesn't
copy a single thing; it just iterates over multiple strings as IF they
were one.  It never ever makes a copy of anything except perhaps a single
character.  You can always find out where you are, as it returns iterators
pointing to the real inset and to the location within that inset.

> Wouldn't it be better to use the context the insets provide more

That would be more complicated.  I am trying to avoid a lot of
LyX-specific code.

> Why see more context than a complete paragraph at a time? sometimes I

Not necessarily.  That is the same philosophy as "two digits for the date
is more than enough", or "640K of memory is more than anyone needs", or
"why would anyone need a color monitor?"

Date: Tue, 16 Mar 1999 00:36:23 +0100 (MET)
From: Asger K. Alstrup Nielsen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

> I am not convinced that it is possible/easy to see a complete lyx

The difference is that the iterator can perform the reverse lookup:
Given a position in the "long string", we can actually find the
exact spot in the original data structure in constant time.
We need this in order to highlight the misspelled words...

This is not easy if we export the lot to a file, and then reload:
We will not be able to map the positions in the file to the
positions in the document representation data structure.

> Wouldn't it be better to use the context the insets provide more

Since I suggest that different fonts should be in different insets, the
context of an inset can be very small.  I personally considered just
presenting each paragraph as one string with an iterator like the
other one, but the added complexity of exposing the entire document
as one string is very small.
Basically, Kevin presented the code that is needed.

Notice that the cost of this kind of iterator is very small:
We just need to hold a BufferIterator inside it.  Thus, the
memory usage is minimal, the performance is optimal, and
the semantics are crystal clear.
Also, all the necessary code can be contained in one header file,
so why not do it?  It enables Aspell to use some advanced spell
checking routines.

> Why see more context than a complete paragraph at a time? sometimes I

As mentioned, I imagine that a paragraph will be made up of a bunch
of small insets.  So we need some means to collapse all these

Also, having this interface will make search-replace trivial:
We can simply use STL find and STL replace.
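Asger's point that search becomes a plain STL call can be illustrated with `std::search` on a doc-style iterator: it finds a word even when the word straddles two strings, and the hit maps back to its real location. The iterator only needs the five standard iterator typedefs to be usable with `<algorithm>`. `doc_forward_iterator`, `find_in_doc` and the `std::vector<std::string>` container are hypothetical names chosen for this sketch, and strings are assumed non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> StringCollection;

// Minimal forward doc-iterator carrying the typedefs STL algorithms need.
// Sketch assumption: no string in the collection is empty.
class doc_forward_iterator {
  StringCollection::const_iterator strs_cur_, strs_end_;
  std::string::const_iterator cur_;
public:
  typedef std::forward_iterator_tag iterator_category;
  typedef char value_type;
  typedef std::ptrdiff_t difference_type;
  typedef const char* pointer;
  typedef const char& reference;

  doc_forward_iterator(StringCollection::const_iterator s,
                       StringCollection::const_iterator e)
    : strs_cur_(s), strs_end_(e), cur_() {
    if (s != e) cur_ = s->begin();
  }
  reference operator*() const { return *cur_; }
  doc_forward_iterator& operator++() {
    ++cur_;
    if (cur_ == strs_cur_->end()) {   // finished this string
      ++strs_cur_;
      if (strs_cur_ != strs_end_) cur_ = strs_cur_->begin();
    }
    return *this;
  }
  doc_forward_iterator operator++(int) {
    doc_forward_iterator temp = *this;
    ++*this;
    return temp;
  }
  bool operator==(const doc_forward_iterator& o) const {
    return strs_cur_ == o.strs_cur_ &&
           (strs_cur_ == strs_end_ || cur_ == o.cur_);
  }
  bool operator!=(const doc_forward_iterator& o) const { return !(*this == o); }
  // Real location of the current character as (string index, offset).
  // Only valid when the iterator is not at the end.
  std::pair<std::size_t, std::size_t> location(const StringCollection& c) const {
    return std::make_pair(static_cast<std::size_t>(strs_cur_ - c.begin()),
                          static_cast<std::size_t>(cur_ - strs_cur_->begin()));
  }
};

// Hypothetical helper: locate `word` even when it is split across strings.
// Returns (string index, offset) of its first character, or (c.size(), 0)
// if the word does not occur.
std::pair<std::size_t, std::size_t> find_in_doc(const StringCollection& c,
                                                const std::string& word) {
  doc_forward_iterator first(c.begin(), c.end()), last(c.end(), c.end());
  doc_forward_iterator hit = std::search(first, last, word.begin(), word.end());
  if (hit == last) return std::make_pair(c.size(), std::size_t(0));
  return hit.location(c);
}
```

`std::replace` works element by element, so it could rewrite single characters through a mutable variant (one whose `operator*` returns `char &`); replacing whole words of a different length would still require editing the underlying strings.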

One comment to Kevin:

Notice that we need to handle the issue of wide strings...  In LyX 1.1,
the LString type is going to be chosen at compile time: either char or wchar_t.
We need to handle this in some way...

Date: Wed, 24 Mar 1999 04:00:17 -0500
From: Kevin Atkinson <[EMAIL PROTECTED]>
To: Asger K. Alstrup Nielsen <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: LyX New Spell Checker Interface

"Asger K. Alstrup Nielsen" wrote:

> Notice that we need to handle the issue of wide strings...  In LyX 1.1,

Except, can you really rely on wchar_t being wide?  From what I've
heard, it's best to use a type whose size you know and typedef it.

Anyway, being able to handle wide strings won't be a serious issue.  All
it would take is an iterator to translate the wide encoding into
something 8-bit.  The next Aspell release will have code very similar to
this.
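Kevin's two suggestions, a character typedef of known size and an iterator that translates wide characters into 8-bit ones on the fly, fit together roughly as sketched here. The `wide_char` name, the 32-bit width and the Latin-1 mapping (code points below 256 kept, everything else replaced by '?') are assumptions for illustration only; the thread leaves the encoding open, and `narrowing_iterator` and `narrow` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A character type whose size is known, as suggested in the thread
// (the name and the 32-bit choice are assumptions for this sketch).
typedef std::uint32_t wide_char;

// Wraps an iterator over wide_char and yields 8-bit chars on the fly.
// Assumed mapping: code points below 256 become the same Latin-1 byte;
// anything else becomes '?'.  Nothing is copied.
template <typename WideItr>
class narrowing_iterator {
  WideItr cur_;
public:
  explicit narrowing_iterator(WideItr it) : cur_(it) {}
  char operator*() const {
    wide_char c = *cur_;
    return c < 256 ? static_cast<char>(static_cast<unsigned char>(c)) : '?';
  }
  narrowing_iterator& operator++() { ++cur_; return *this; }
  bool operator==(const narrowing_iterator& o) const { return cur_ == o.cur_; }
  bool operator!=(const narrowing_iterator& o) const { return cur_ != o.cur_; }
};

// Hypothetical helper: the 8-bit text a spell checker would see.
std::string narrow(const std::vector<wide_char>& text) {
  typedef std::vector<wide_char>::const_iterator Itr;
  std::string out;
  for (narrowing_iterator<Itr> i(text.begin()), e(text.end()); i != e; ++i)
    out += *i;
  return out;
}
```

Because the conversion happens inside `operator*`, the checker still walks the original wide buffer; the lossy '?' fallback is only reasonable if the dictionary uses the same 8-bit character set.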

What encoding are your wide characters going to use?

[The rest of this thread contains a bunch of communication back and
forth between LyX developers about creating the iterator and changes in
the inset design.]
