Le 07/11/2017 à 23:00, Steffen Märcker a écrit :
I am not familiar with the literature on scanners. May I ask you about some details on the "for all characters" algorithms you are referring to?

The two main ones available, from the typical Aho/Ullman textbook, are:

- NFA to DFA conversion (i.e. build a NFA with your regular expressions, then convert it to a DFA)

- Direct regular expression to DFA construction

Both of them have loop of the type:

for (each input symbol a) {
...

Building a (or connecting to) a BDD library would be fun, indeed. But within that time frame it seems not realistic. Anyway, after finishing my thesis, I'd like to come back to that idea.

It would certainly be interesting. Please contact us again when you'll have time :)

Regards,

Thierry

Ciao, Steffen


Am 7. November 2017 16:33:03 MEZ schrieb Andrew Glynn <aglyn...@gmail.com>:

    A possible way to accomplish it would be to use an object graph with
    an incremental query engine, such as EMF/CDO with Viatra or
    something similar.  You could then put different character sets in
    connected objects and query only as far as you need to.

    Andrew Glynn

    Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
    Windows 10

    *From: *Thierry Goubier <mailto:thierry.goub...@gmail.com>
    *Sent: *Tuesday, November 7, 2017 7:17 AM
    *To: *Any question about pharo is welcome
    <mailto:pharo-users@lists.pharo.org>
    *Subject: *Re: [Pharo-users] Binary Decision Diagram Package in
    Smalltalk

    Hi Andrew, Steffen,

    2017-11-07 13:10 GMT+01:00 Prof. Andrew P. Black <bl...@cs.pdx.edu
    <mailto:bl...@cs.pdx.edu>>:


         > On 28 Oct 2017, at 17:37 , Steffen Märcker <merk...@web.de
        <mailto:merk...@web.de>> wrote:
         >
         > Does that mean the sets/bdd would be constructed mainly at
        comile time? Anyway, Andrew, feel free to contact me, I might
        help you with this.
         >

        Thanks for the offer, Steffen!  The problem is that I need to
        use SmaCC for my current project, and really do not have a month
to take off and re-design the way that it builds its scanner. I’ve talked to Thierry Goubier about, and he doesn’t have time
        either!  It would be a fun project, though, and it ought to be
        fairly separate from other parts of SmaCC.  I’ve spent a fair
        bit of time thinking about how to do it, but don’t think that I
        will be able to actually focus on it.

    Yes, this is the essence of the issue. There are a few alternatives
    about it, but none we have the time to pursue.


        An alternative approach, which Thierry has suggested, is to make
        SmaCC work on the UTF-8 representation of the Unicode.  Then we
        could represent character sets as prefix trees.  But the core
        problem would still exist: you can’t run an algorithm that
        repeatedly executes

                         for all characters in the alphabet do:

        when there are 2^21 characters in the alphabet!

    The main issue is that `for all characters`... All the literature on
    scanner building uses 'for all characters do'.

    Thierry


                 Andrew





Reply via email to