Feature Requests item #1469300, was opened at 2006-04-12 07:30
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=352439&aid=1469300&group_id=2439

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: customizable regular expression support

Initial Comment:

Hi all there  :-)

I would like to have better support for regular
expressions in the Scintilla library.

I use Scintilla as part of a larger project: the text
editor of my dreams :-)

In the project I need to use regexps outside of the
Scintilla controls as well. Also, all text is
converted to UTF-8 when it is edited and loaded into the
Scintilla control.

So it would be great to have a way to force
Scintilla to use the same regexp library I use (PCRE,
by the way) instead of the built-in regexp support, for
two reasons:
  1. The built-in regexp does not support UTF-8 properly.
  2. I need to use regexps outside of the Scintilla
control as well, and I think having two different regexp
implementations in one program is simply a bad thing.

I don't know if there is any effort in this direction
(I found only a post in a very old discussion
that mentioned PCRE). If not, I'm ready to try to implement
this functionality myself and post a patch or
something.

Short description of my suggestion (written after only a
brief look into Scintilla's sources, so please
don't beat me if I misunderstood something):

(step 1) Rewrite class CellBuffer so that chars and
styles are stored in two separately allocated blocks of
memory (each one using the same scheme: two blocks of
valid bytes divided by a gap). The current interface of
the class would be kept unchanged and only one new
method added. The new method would be a read-only
relative of GetCharRange(): it would return a pointer
into the CellBuffer's char storage (after calling gapTo(0)).
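To make step 1 more concrete, here is a minimal sketch of the split storage. All names in it (GapBuffer, SplitCellBuffer, CharPointer, StyleAt) are made up for illustration and are not taken from Scintilla's sources. It only shows the idea: chars and styles each live in their own gap buffer, and moving the char gap to position 0 leaves the text as one contiguous run that can be handed to a regexp library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical gap buffer: valid bytes, then a gap, then valid bytes.
class GapBuffer {
    std::vector<char> body_;
    std::size_t gapStart_ = 0;  // index of the first gap byte
    std::size_t gapEnd_ = 0;    // index one past the last gap byte

public:
    std::size_t Length() const { return body_.size() - (gapEnd_ - gapStart_); }

    // Move the gap so that it starts at logical position pos.
    void GapTo(std::size_t pos) {
        assert(pos <= Length());
        while (gapStart_ > pos) body_[--gapEnd_] = body_[--gapStart_];
        while (gapStart_ < pos) body_[gapStart_++] = body_[gapEnd_++];
    }

    void Insert(std::size_t pos, const std::string &s) {
        GapTo(pos);
        const std::size_t gap = gapEnd_ - gapStart_;
        if (gap < s.size()) {
            // Widen the gap in place, with some slack for later inserts.
            const std::size_t extra = s.size() - gap + 16;
            body_.insert(body_.begin() + gapEnd_, extra, '\0');
            gapEnd_ += extra;
        }
        for (char c : s) body_[gapStart_++] = c;
    }

    char At(std::size_t pos) const {
        assert(pos < Length());
        return pos < gapStart_ ? body_[pos] : body_[pos + (gapEnd_ - gapStart_)];
    }

    // Read-only view of the whole content as one contiguous run.
    // The pointer is valid only until the next modification.
    const char *Contiguous() {
        GapTo(0);
        return body_.data() + gapEnd_;
    }
};

// Hypothetical replacement for the interleaved storage: chars and styles
// are kept in two independent gap buffers of equal logical length.
class SplitCellBuffer {
    GapBuffer chars_;
    GapBuffer styles_;

public:
    void InsertText(std::size_t pos, const std::string &text, char style) {
        chars_.Insert(pos, text);
        styles_.Insert(pos, std::string(text.size(), style));
    }
    std::size_t Length() const { return chars_.Length(); }
    char StyleAt(std::size_t pos) const { return styles_.At(pos); }
    const char *CharPointer() { return chars_.Contiguous(); }
};
```

Note that CharPointer() may have to move the whole text once, so it is cheap only when the gap is already near the start or when many searches follow one edit.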

This is maybe the most controversial step of the
suggestion. It can make Scintilla somewhat slower
because locality of reference is lost: styles are
stored far away in memory from their chars, so the CPU
cache will be less effective.

On the other hand, it would make it simpler to call any
3rd party regexp functions, which usually work with
an array of chars only (no more need for class
CharIndexer or temporary buffers). As a side effect, it
would also save memory when no styling is used.

Any better idea how to solve the CPU cache problem is
welcome.

(step 2) Create new type and messages for registering
callback functions implementing the regexp support.
Probably it would be something similar to this:

struct SCRegExp {
   ... // pointers to functions
};

struct SCRegExp* SCI_GETREGEXPSTRUCT()
SCI_SETREGEXPSTRUCT(struct SCRegExp * api);
 
Structure SCRegExp would hold pointers to callback
functions provided by caller. The functions would
implement the regexp functionality. 

The prototypes of the functions pointed to by the structure
should be designed so that it would be very straightforward
to call the POSIX regexp or PCRE library from them.
(This requires a more thorough analysis than I have done so far.)
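As a rough illustration only, one possible shape for the structure is a compile/exec/release triple, which maps fairly directly onto the way POSIX regexp and PCRE are used. The member names and signatures below are assumptions, not a worked-out design. To keep the sketch self-contained, the adapter wraps std::regex as a stand-in for the external library; a real adapter would call PCRE or POSIX functions instead.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>

// Hypothetical layout of the callback table (names are illustrative only).
struct SCRegExp {
    // Compile a pattern; returns an opaque handle or nullptr on error.
    void *(*compile)(const char *pattern, int flags);
    // Search text[start, length); returns 1 and fills the match range
    // (offsets into text) on success, 0 if there is no match.
    int (*exec)(void *compiled, const char *text, std::size_t length,
                std::size_t start, std::size_t *matchStart,
                std::size_t *matchEnd);
    // Free a handle returned by compile.
    void (*release)(void *compiled);
};

// Adapter functions: translate the arguments and forward to the library.
// std::regex is only a stand-in here for PCRE or POSIX regexp.
static void *StdCompile(const char *pattern, int /*flags*/) {
    try {
        return new std::regex(pattern);
    } catch (const std::regex_error &) {
        return nullptr;
    }
}

static int StdExec(void *compiled, const char *text, std::size_t length,
                   std::size_t start, std::size_t *matchStart,
                   std::size_t *matchEnd) {
    assert(compiled && text && matchStart && matchEnd);
    if (start > length) return 0;
    const std::regex *re = static_cast<const std::regex *>(compiled);
    std::cmatch m;
    if (!std::regex_search(text + start, text + length, m, *re)) return 0;
    *matchStart = start + static_cast<std::size_t>(m.position(0));
    *matchEnd = *matchStart + static_cast<std::size_t>(m.length(0));
    return 1;
}

static void StdRelease(void *compiled) {
    delete static_cast<std::regex *>(compiled);
}

// The table an application would register with Scintilla.
static const SCRegExp kStdRegexApi = {StdCompile, StdExec, StdRelease};
```

The opaque handle keeps Scintilla ignorant of the library's own types, so the same table shape could serve any engine.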

Class Document would have one new member added: a
pointer to the struct SCRegExp.

Message SCI_GETREGEXPSTRUCT would return a pointer to the
currently registered structure (or NULL if the built-in
functions are in use and you don't want to
expose the built-in functions outside of the Scintilla
library).

SCI_SETREGEXPSTRUCT would set the pointer to the
structure. NULL would reset it to the built-in one -- see step (3).

(step 3) Rewrite RESearch so that it is
compatible with the new generalized regexp support, i.e.
there would be a static const instance of struct
SCRegExp implementing the built-in regexp support.

(step 4) When searching/replacing with the flag
SCFIND_REGEXP set, the appropriate messages would use the
functions in the struct SCRegExp (either the one set by
SCI_SETREGEXPSTRUCT or the built-in struct instance).


So to conclude:

If not set otherwise by SCI_SETREGEXPSTRUCT, Scintilla
would still work the same way it does now (it would use the
built-in regexp algorithms), so there should be no
compatibility issue. The built-in regexp support would
only be adapted to the new generalised interface. (The
current interface is internal to Scintilla, so that is not
an issue.)

Once new regexp functions are registered with SCI_SETREGEXPSTRUCT,
Scintilla would call the registered functions. In the most
common case the functions provided by the user would
probably just translate their arguments and call the
regexp functions of a 3rd party library (e.g. PCRE or the
POSIX regexp library).


Sure, it requires a lot of changes in Scintilla's
internals, but I do believe it would really make the
Scintilla library even better than it already is.

Can you tell me if there is any chance of such changes
being accepted into the Scintilla project someday? (I
don't want to maintain a fork of Scintilla, so I
will code only if you answer "Yes" to this question.)

Also as noted above I'm open to any discussion about
the problem.


Mity
<mity[at]morous[dot]org>

P.S. Anyway, many thanks for your work on the Scintilla
project.



----------------------------------------------------------------------

_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest