On Thu, Apr 8, 2010 at 3:06 AM, Ville M. Vainio <vivai...@gmail.com> wrote:

> I may sound like a broken record, but this is all taken care of by
> codewise / ctags scheme...

Earlier today I downloaded and installed ctags for Windows.  There is
now a binary installer--I'm not sure whether there was such a thing
the last time I tried to install.  There might have been, but my
memory for such things is (and always has been) zilch.

Anyway, I took a look at exuberant ctags.  It's really very simple.
It consists of driver code and a bunch of parsers, 41 to be exact.
Each "parser" is also simple. For example, I include python.c in the
post script.  As you can see, the amount of work for this parser is
comparable to Leo's present import scanners.  It would certainly be
possible to rewrite this parse in Python.  Similarly, it would be
possible to write all the other 41 parsers in Python.

Yes, I know a python wrapper for ctags exists:
http://code.google.com/p/python-ctags/  But that's not the point. The
point is that ctags is doing something very simple.  But we need
something more than something very simple. Yes, we can, and likely
will, adapt codewisecompleter to Leo's internal completion scheme.  I
may do that soon.  But that will not be the end of the story.

Here we come to the crux of the matter: 80 or 90 percent of
engineering is often devoted to the last 10 or 20 percent of
functionality, or even fit and finish.  That doesn't mean that the
last 80 percent of the effort is wasted!  But it does mean that one
must plan the "tail" of a project carefully.

In this case, I *insist* that heavyweight type inference *must* be
part of the ultimate product.  It's fine to make do with a lesser
solution for now, and the lesser solutions themselves provide
important guide posts and tools.  But my intention is to do something
very nearly as good as the Komodo completer, done entirely in Python,
with no extra installs necessary.  To do this, I'll end up doing
"completion parsers" for several languages, maybe even all 41.

In the first post of this thread I said the following:

QQQ
I've been resisting work on the new pylint lately.  Not sure exactly
why, but perhaps it was my unconscious trying to tell me that no,
pylint really doesn't help solve auto-completion problems.
QQQ

Yesterday's work show that this statement is mistaken.  Indeed, one
can do "better" than Python's inspect module fairly easily, such that
one doesn't have to import modules or instantiate "live" objects.  But
without live objects the *only* way to determine the type of z in
x.y.z is to do full, pylint-like, type inference.

This I intend to do.  You won't be able to talk me out of it :-)
Indeed, the analysis is a bit picky (it's akin to code generation in a
compiler), but it's not mysterious and it's not rocket science.  If
you haven't been following, you can see my thoughts at:
http://groups.google.com/group/leo-and-pylint.  You can see the actual
code at: https://code.launchpad.net/~edreamleo/+junk/new-pylint

Edward

P.S.  Here is ctags58/python.c::

/*
*   $Id: python.c 720 2009-07-07 03:55:23Z dhiebert $
*
*   Copyright (c) 2000-2003, Darren Hiebert
*
*   This source code is released for free distribution under the terms of the
*   GNU General Public License.
*
*   This module contains functions for generating tags for Python language
*   files.
*/
/*
*   INCLUDE FILES
*/
#include "general.h"  /* must always come first */

#include <string.h>

#include "entry.h"
#include "options.h"
#include "read.h"
#include "main.h"
#include "vstring.h"
#include "routines.h"
#include "debug.h"

/*
*   DATA DECLARATIONS
*/
typedef struct NestingLevel NestingLevel;
typedef struct NestingLevels NestingLevels;

struct NestingLevel
{
        int indentation;
        vString *name;
        int type;
};

struct NestingLevels
{
        NestingLevel *levels;
        int n;                                  /* number of levels in use */
        int allocated;
};

typedef enum {
        K_CLASS, K_FUNCTION, K_MEMBER, K_VARIABLE, K_IMPORT
} pythonKind;

/*
*   DATA DEFINITIONS
*/
static kindOption PythonKinds[] = {
        {TRUE, 'c', "class",    "classes"},
        {TRUE, 'f', "function", "functions"},
        {TRUE, 'm', "member",   "class members"},
    {TRUE, 'v', "variable", "variables"},
    {TRUE, 'i', "namespace", "imports"}
};

static char const * const singletriple = "'''";
static char const * const doubletriple = "\"\"\"";

/*
*   FUNCTION DEFINITIONS
*/

static NestingLevels *nestingLevelsNew (void)
{
        NestingLevels *nls = xCalloc (1, NestingLevels);
        return nls;
}

static void nestingLevelsFree (NestingLevels *nls)
{
        int i;
        for (i = 0; i < nls->allocated; i++)
                vStringDelete(nls->levels[i].name);
        if (nls->levels) eFree(nls->levels);
        eFree(nls);
}

static void nestingLevelsPush (NestingLevels *nls,
        const vString *name, int type)
{
        NestingLevel *nl = NULL;

        if (nls->n >= nls->allocated)
        {
                nls->allocated++;
                nls->levels = xRealloc(nls->levels,
                        nls->allocated, NestingLevel);
                nls->levels[nls->n].name = vStringNew();
        }
        nl = &nls->levels[nls->n];
        nls->n++;

        vStringCopy(nl->name, name);
        nl->type = type;
}

#if 0
static NestingLevel *nestingLevelsGetCurrent (NestingLevels *nls)
{
        Assert (nls != NULL);

        if (nls->n < 1)
                return NULL;

        return &nls->levels[nls->n - 1];
}

static void nestingLevelsPop (NestingLevels *nls)
{
        const NestingLevel *nl = nestingLevelsGetCurrent(nls);

        Assert (nl != NULL);
        vStringClear(nl->name);
        nls->n--;
}
#endif

static boolean isIdentifierFirstCharacter (int c)
{
        return (boolean) (isalpha (c) || c == '_');
}

static boolean isIdentifierCharacter (int c)
{
        return (boolean) (isalnum (c) || c == '_');
}

/* Given a string with the contents of a line directly after the "def" keyword,
 * extract all relevant information and create a tag.
 */
static void makeFunctionTag (vString *const function,
        vString *const parent, int is_class_parent, const char *arglist 
__unused__)
{
        tagEntryInfo tag;
        initTagEntry (&tag, vStringValue (function));

        tag.kindName = "function";
        tag.kind = 'f';
        /* tag.extensionFields.arglist = arglist; */

        if (vStringLength (parent) > 0)
        {
                if (is_class_parent)
                {
                        tag.kindName = "member";
                        tag.kind = 'm';
                        tag.extensionFields.scope [0] = "class";
                        tag.extensionFields.scope [1] = vStringValue (parent);
                }
                else
                {
                        tag.extensionFields.scope [0] = "function";
                        tag.extensionFields.scope [1] = vStringValue (parent);
                }
        }

        /* If a function starts with __, we mark it as file scope.
         * FIXME: What is the proper way to signal such attributes?
         * TODO: What does functions/classes starting with _ and __ mean in 
python?
         */
        if (strncmp (vStringValue (function), "__", 2) == 0 &&
                strcmp (vStringValue (function), "__init__") != 0)
        {
                tag.extensionFields.access = "private";
                tag.isFileScope = TRUE;
        }
        else
        {
                tag.extensionFields.access = "public";
        }
        makeTagEntry (&tag);
}

/* Given a string with the contents of the line directly after the "class"
 * keyword, extract all necessary information and create a tag.
 */
static void makeClassTag (vString *const class, vString *const inheritance,
        vString *const parent, int is_class_parent)
{
        tagEntryInfo tag;
        initTagEntry (&tag, vStringValue (class));
        tag.kindName = "class";
        tag.kind = 'c';
        if (vStringLength (parent) > 0)
        {
                if (is_class_parent)
                {
                        tag.extensionFields.scope [0] = "class";
                        tag.extensionFields.scope [1] = vStringValue (parent);
                }
                else
                {
                        tag.extensionFields.scope [0] = "function";
                        tag.extensionFields.scope [1] = vStringValue (parent);
                }
        }
        tag.extensionFields.inheritance = vStringValue (inheritance);
        makeTagEntry (&tag);
}

static void makeVariableTag (vString *const var, vString *const parent)
{
        tagEntryInfo tag;
        initTagEntry (&tag, vStringValue (var));
        tag.kindName = "variable";
        tag.kind = 'v';
        if (vStringLength (parent) > 0)
        {
                tag.extensionFields.scope [0] = "class";
                tag.extensionFields.scope [1] = vStringValue (parent);
        }
        makeTagEntry (&tag);
}

/* Skip a single or double quoted string. */
static const char *skipString (const char *cp)
{
        const char *start = cp;
        int escaped = 0;
        for (cp++; *cp; cp++)
        {
                if (escaped)
                        escaped--;
                else if (*cp == '\\')
                        escaped++;
                else if (*cp == *start)
                        return cp + 1;
        }
        return cp;
}

/* Skip everything up to an identifier start. */
static const char *skipEverything (const char *cp)
{
        for (; *cp; cp++)
        {
                if (*cp == '"' || *cp == '\'')
                {
                        cp = skipString(cp);
                        if (!*cp) break;
                }
                if (isIdentifierFirstCharacter ((int) *cp))
                        return cp;
        }
        return cp;
}

/* Skip an identifier. */
static const char *skipIdentifier (const char *cp)
{
        while (isIdentifierCharacter ((int) *cp))
                cp++;
        return cp;
}

static const char *findDefinitionOrClass (const char *cp)
{
        while (*cp)
        {
                cp = skipEverything (cp);
                if (!strncmp(cp, "def", 3) || !strncmp(cp, "class", 5) ||
                        !strncmp(cp, "cdef", 4) || !strncmp(cp, "cpdef", 5))
                {
                        return cp;
                }
                cp = skipIdentifier (cp);
        }
        return NULL;
}

static const char *skipSpace (const char *cp)
{
        while (isspace ((int) *cp))
                ++cp;
        return cp;
}

/* Starting at ''cp'', parse an identifier into ''identifier''. */
static const char *parseIdentifier (const char *cp, vString *const identifier)
{
        vStringClear (identifier);
        while (isIdentifierCharacter ((int) *cp))
        {
                vStringPut (identifier, (int) *cp);
                ++cp;
        }
        vStringTerminate (identifier);
        return cp;
}

static void parseClass (const char *cp, vString *const class,
        vString *const parent, int is_class_parent)
{
        vString *const inheritance = vStringNew ();
        vStringClear (inheritance);
        cp = parseIdentifier (cp, class);
        cp = skipSpace (cp);
        if (*cp == '(')
        {
                ++cp;
                while (*cp != ')')
                {
                        if (*cp == '\0')
                        {
                                /* Closing parenthesis can be in follow up 
line. */
                                cp = (const char *) fileReadLine ();
                                if (!cp) break;
                                vStringPut (inheritance, ' ');
                                continue;
                        }
                        vStringPut (inheritance, *cp);
                        ++cp;
                }
                vStringTerminate (inheritance);
        }
        makeClassTag (class, inheritance, parent, is_class_parent);
        vStringDelete (inheritance);
}

static void parseImports (const char *cp)
{
        const char *pos;
        vString *name, *name_next;

        cp = skipEverything (cp);

        if ((pos = strstr (cp, "import")) == NULL)
                return;

        cp = pos + 6;

        /* continue only if there is some space between the keyword and the
identifier */
        if (! isspace (*cp))
                return;

        cp++;
        cp = skipSpace (cp);

        name = vStringNew ();
        name_next = vStringNew ();

        cp = skipEverything (cp);
        while (*cp)
        {
                cp = parseIdentifier (cp, name);

                cp = skipEverything (cp);
                /* we parse the next possible import statement as well to be 
able to
ignore 'foo' in
                 * 'import foo as bar' */
                parseIdentifier (cp, name_next);

                /* take the current tag only if the next one is not "as" */
                if (strcmp (vStringValue (name_next), "as") != 0 &&
                        strcmp (vStringValue (name), "as") != 0)
                {
                        makeSimpleTag (name, PythonKinds, K_IMPORT);
                }
        }
        vStringDelete (name);
        vStringDelete (name_next);
}

/* modified from get.c getArglistFromStr().
 * warning: terminates rest of string past arglist!
 * note: does not ignore brackets inside strings! */
static char *parseArglist(const char *buf)
{
        char *start, *end;
        int level;
        if (NULL == buf)
                return NULL;
        if (NULL == (start = strchr(buf, '(')))
                return NULL;
        for (level = 1, end = start + 1; level > 0; ++end)
        {
                if ('\0' == *end)
                        break;
                else if ('(' == *end)
                        ++ level;
                else if (')' == *end)
                        -- level;
        }
        *end = '\0';
        return strdup(start);
}

static void parseFunction (const char *cp, vString *const def,
        vString *const parent, int is_class_parent)
{
        char *arglist;

        cp = parseIdentifier (cp, def);
        arglist = parseArglist (cp);
        makeFunctionTag (def, parent, is_class_parent, arglist);
        eFree (arglist);
}

/* Get the combined name of a nested symbol. Classes are separated with ".",
 * functions with "/". For example this code:
 * class MyClass:
 *     def myFunction:
 *         def SubFunction:
 *             class SubClass:
 *                 def Method:
 *                     pass
 * Would produce this string:
 * MyClass.MyFunction/SubFunction/SubClass.Method
 */
static boolean constructParentString(NestingLevels *nls, int indent,
        vString *result)
{
        int i;
        NestingLevel *prev = NULL;
        int is_class = FALSE;
        vStringClear (result);
        for (i = 0; i < nls->n; i++)
        {
                NestingLevel *nl = nls->levels + i;
                if (indent <= nl->indentation)
                        break;
                if (prev)
                {
                        vStringCatS(result, ".");       /* make Geany symbol 
list grouping work properly */
/*
                        if (prev->type == K_CLASS)
                                vStringCatS(result, ".");
                        else
                                vStringCatS(result, "/");
*/
                }
                vStringCat(result, nl->name);
                is_class = (nl->type == K_CLASS);
                prev = nl;
        }
        return is_class;
}

/* Check whether parent's indentation level is higher than the current level and
 * if so, remove it.
 */
static void checkParent(NestingLevels *nls, int indent, vString *parent)
{
        int i;
        NestingLevel *n;

        for (i = 0; i < nls->n; i++)
        {
                n = nls->levels + i;
                /* is there a better way to compare two vStrings? */
                if (strcmp(vStringValue(parent), vStringValue(n->name)) == 0)
                {
                        if (n && indent <= n->indentation)
                        {
                                /* remove this level by clearing its name */
                                vStringClear(n->name);
                        }
                        break;
                }
        }
}

static void addNestingLevel(NestingLevels *nls, int indentation,
        const vString *name, boolean is_class)
{
        int i;
        NestingLevel *nl = NULL;

        for (i = 0; i < nls->n; i++)
        {
                nl = nls->levels + i;
                if (indentation <= nl->indentation) break;
        }
        if (i == nls->n)
        {
                nestingLevelsPush(nls, name, 0);
                nl = nls->levels + i;
        }
        else
        {       /* reuse existing slot */
                nls->n = i + 1;
                vStringCopy(nl->name, name);
        }
        nl->indentation = indentation;
        nl->type = is_class ? K_CLASS : !K_CLASS;
}

/* Return a pointer to the start of the next triple string, or NULL. Store
 * the kind of triple string in "which" if the return is not NULL.
 */
static char const *find_triple_start(char const *string, char const **which)
{
        char const *cp = string;

        for (; *cp; cp++)
        {
                if (*cp == '"' || *cp == '\'')
                {
                        if (strncmp(cp, doubletriple, 3) == 0)
                        {
                                *which = doubletriple;
                                return cp;
                        }
                        if (strncmp(cp, singletriple, 3) == 0)
                        {
                                *which = singletriple;
                                return cp;
                        }
                        cp = skipString(cp);
                        if (!*cp) break;
                }
        }
        return NULL;
}

/* Find the end of a triple string as pointed to by "which", and update "which"
 * with any other triple strings following in the given string.
 */
static void find_triple_end(char const *string, char const **which)
{
        char const *s = string;
        while (1)
        {
                /* Check if the string ends in the same line. */
                s = strstr (s, *which);
                if (!s) break;
                s += 3;
                *which = NULL;
                /* If yes, check if another one starts in the same line. */
                s = find_triple_start(s, which);
                if (!s) break;
                s += 3;
        }
}

static const char *findVariable(const char *line)
{
        /* Parse global and class variable names (C.x) from assignment 
statements.
         * Object attributes (obj.x) are ignored.
         * Assignment to a tuple 'x, y = 2, 3' not supported.
         * TODO: ignore duplicate tags from reassignment statements. */
        const char *cp, *sp, *eq, *start;

        cp = strstr(line, "=");
        if (!cp)
                return NULL;
        eq = cp + 1;
        while (*eq)
        {
                if (*eq == '=')
                        return NULL;    /* ignore '==' operator and 'x=5,y=6)' 
function lines */
                if (*eq == '(' || *eq == '#')
                        break;  /* allow 'x = func(b=2,y=2,' lines and comments 
at the end of line */
                eq++;
        }

        /* go backwards to the start of the line, checking we have valid chars 
*/
        start = cp - 1;
        while (start >= line && isspace ((int) *start))
                --start;
        while (start >= line && isIdentifierCharacter ((int) *start))
                --start;
        if (!isIdentifierFirstCharacter(*(start + 1)))
                return NULL;
        sp = start;
        while (sp >= line && isspace ((int) *sp))
                --sp;
        if ((sp + 1) != line)   /* the line isn't a simple variable assignment 
*/
                return NULL;
        /* the line is valid, parse the variable name */
        ++start;
        return start;
}

/* Skip type declaration that optionally follows a cdef/cpdef */
static const char *skipTypeDecl (const char *cp, boolean *is_class)
{
        const char *lastStart = cp, *ptr = cp;
        int loopCount = 0;
        ptr = skipSpace(cp);
        if (!strncmp("extern", ptr, 6)) {
                ptr += 6;
                ptr = skipSpace(ptr);
                if (!strncmp("from", ptr, 4)) { return NULL; }
        }
        if (!strncmp("class", ptr, 5)) {
                ptr += 5 ;
                *is_class = TRUE;
                ptr = skipSpace(ptr);
                return ptr;
        }
        /* limit so that we don't pick off "int item=obj()" */
        while (*ptr && loopCount++ < 2) {
                while (*ptr && *ptr != '=' && *ptr != '(' && !isspace(*ptr)) 
ptr++;
                if (!*ptr || *ptr == '=') return NULL;
                if (*ptr == '(') {
                    return lastStart; /* if we stopped on a '(' we are done */
                }
                ptr = skipSpace(ptr);
                lastStart = ptr;
                while (*lastStart == '*') lastStart++;  /* cdef int *identifier 
*/
        }
        return NULL;
}

static void findPythonTags (void)
{
        vString *const continuation = vStringNew ();
        vString *const name = vStringNew ();
        vString *const parent = vStringNew();

        NestingLevels *const nesting_levels = nestingLevelsNew();

        const char *line;
        int line_skip = 0;
        char const *longStringLiteral = NULL;

        while ((line = (const char *) fileReadLine ()) != NULL)
        {
                const char *cp = line, *candidate;
                char const *longstring;
                char const *keyword, *variable;
                int indent;

                cp = skipSpace (cp);

                if (*cp == '\0')  /* skip blank line */
                        continue;

                /* Skip comment if we are not inside a multi-line string. */
                if (*cp == '#' && !longStringLiteral)
                        continue;

                /* Deal with line continuation. */
                if (!line_skip) vStringClear(continuation);
                vStringCatS(continuation, line);
                vStringStripTrailing(continuation);
                if (vStringLast(continuation) == '\\')
                {
                        vStringChop(continuation);
                        vStringCatS(continuation, " ");
                        line_skip = 1;
                        continue;
                }
                cp = line = vStringValue(continuation);
                cp = skipSpace (cp);
                indent = cp - line;
                line_skip = 0;

                checkParent(nesting_levels, indent, parent);

                /* Deal with multiline string ending. */
                if (longStringLiteral)
                {
                        find_triple_end(cp, &longStringLiteral);
                        continue;
                }

                /* Deal with multiline string start. */
                longstring = find_triple_start(cp, &longStringLiteral);
                if (longstring)
                {
                        longstring += 3;
                        find_triple_end(longstring, &longStringLiteral);
                        /* We don't parse for any tags in the rest of the line. 
*/
                        continue;
                }

                /* Deal with def and class keywords. */
                keyword = findDefinitionOrClass (cp);
                if (keyword)
                {
                        boolean found = FALSE;
                        boolean is_class = FALSE;
                        if (!strncmp (keyword, "def ", 4))
                        {
                                cp = skipSpace (keyword + 3);
                                found = TRUE;
                        }
                        else if (!strncmp (keyword, "class ", 6))
                        {
                                cp = skipSpace (keyword + 5);
                                found = TRUE;
                                is_class = TRUE;
                        }
                        else if (!strncmp (keyword, "cdef ", 5))
                    {
                        cp = skipSpace(keyword + 4);
                        candidate = skipTypeDecl (cp, &is_class);
                        if (candidate)
                        {
                                found = TRUE;
                                cp = candidate;
                        }

                    }
                else if (!strncmp (keyword, "cpdef ", 6))
                    {
                        cp = skipSpace(keyword + 5);
                        candidate = skipTypeDecl (cp, &is_class);
                        if (candidate)
                        {
                                found = TRUE;
                                cp = candidate;
                        }
                    }

                        if (found)
                        {
                                boolean is_parent_class;

                                is_parent_class =
                                        constructParentString(nesting_levels, 
indent, parent);

                                if (is_class)
                                        parseClass (cp, name, parent, 
is_parent_class);
                                else
                                        parseFunction(cp, name, parent, 
is_parent_class);

                                addNestingLevel(nesting_levels, indent, name, 
is_class);
                        }
                }
                /* Find global and class variables */
                variable = findVariable(line);
                if (variable)
                {
                        const char *start = variable;
                        boolean parent_is_class;

                        vStringClear (name);
                        while (isIdentifierCharacter ((int) *start))
                        {
                                vStringPut (name, (int) *start);
                                ++start;
                        }
                        vStringTerminate (name);

                        parent_is_class = constructParentString(nesting_levels, 
indent, parent);
                        /* skip variables in methods */
                        if (! parent_is_class && vStringLength(parent) > 0)
                                continue;

                        makeVariableTag (name, parent);
                }
                /* Find and parse imports */
                parseImports(line);
        }
        /* Clean up all memory we allocated. */
        vStringDelete (parent);
        vStringDelete (name);
        vStringDelete (continuation);
        nestingLevelsFree (nesting_levels);
}

extern parserDefinition *PythonParser (void)
{
    static const char *const extensions[] = { "py", "pyx", "pxd",
"pxi" ,"scons", NULL };
        parserDefinition *def = parserNew ("Python");
        def->kinds = PythonKinds;
        def->kindCount = KIND_COUNT (PythonKinds);
        def->extensions = extensions;
        def->parser = findPythonTags;
        return def;
}

/* vi:set tabstop=4 shiftwidth=4: */

EKR
------------------------------------------------------------------------------
Edward K. Ream email: edream...@gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
------------------------------------------------------------------------------

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To post to this group, send email to leo-edi...@googlegroups.com.
To unsubscribe from this group, send email to 
leo-editor+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/leo-editor?hl=en.

Reply via email to