Hello, everyone,

After a few months off, I recently resumed work on Bithenge. As you may
recall, Vojtech was especially interested in code generation and editing
support, but I didn't manage to figure out how to implement them during
GSoC. I've been working on a constraint-based design for Bithenge that
should help make these features possible. A partial proof-of-concept is
attached; it requires a recent version of Python 3, and converts a
trivial Bithenge transform into a Python encoder and decoder.

The new design uses four stages: a transform program, a constraint
program, an instruction program, and the final Python program. Each
stage is shown in the proof-of-concept's output. The transform program
works the same as in the existing design. The constraint program is a
somewhat simpler and more explicit version of the transform program; in
particular, each constraint is applied to explicit variables, rather
than the implicit left and right variables used by transforms. The
instruction program consists of instructions for a simple virtual
machine to encode or decode the data.

There are two key features of the constraint-based design. First, rather
than applying the transforms directly to the data, it creates an
instruction program that specifies exactly how to encode/decode the
data. The instruction program can easily be converted to, say,
HelenOS-compatible C code that can be used in HelenOS. Second, you can
attempt to determine any set of variables given the values of any other
set of variables; you can not only both encode and decode, but also
determine some fields in a struct given other fields. This will make it
possible to edit binary data, among other things.

Converting from the transform program to the constraint program is
trivial; in fact, a real implementation would just create the constraint
program directly in the parser. Using the simpler, more explicit
constraint program helps avoid unnecessary complications in the next
step.

The difficult part is converting from the constraint program to the
instruction program, given a set of variables with known values and a
set of variables that need to be computed. The current algorithm works
backwards. For instance, if it needs to find "left" given "right", it
looks through each constraint to find any way of determining "left". If
it finds a way to determine "left" given "mid", it must now find a way
to determine "mid" given "right".

Converting from the instruction program to the Python program is again
trivial.  It would also be possible to generate code for C or other
languages, interpret the instruction program, or even feed it to a JIT
compiler! The current output is unnecessarily long; constant propagation
and other well-known optimizations could fix that and make it more
readable.

For now, I'm going to keep working on the proof-of-concept and figuring
out other crucial features, like conditionals and repetition. It's going
to take a while to figure out how to solve those things for flexible
encoding and decoding.

Finally, a note on languages supported by HelenOS. The Bithenge design
(old and new) relies on object-orientation and reference counting, and
it was tedious to write in C. When I start working on the full new
version of Bithenge, I'll probably want to use C++ or Python instead.
Bithenge wouldn't be usable within HelenOS without full C++
(#413-related) or Python (#403) support, although it could still
generate C code to be used in HelenOS.

Thanks,
Sean Bartell
#!/usr/bin/env python
# Copyright 2012 Sean Bartell.

# This is a proof-of-concept for a constraint-based version of Bithenge. It
# requires a recent Python version.


from collections import defaultdict, deque, OrderedDict
from functools import reduce
import itertools
import operator
try:
    from itertools import imap    # Python 2
except ImportError:
    imap = map                    # Python 3


def indent(text):
    '''Return the string with a level of indentation added.'''
    return str(text).replace('\n', '\n    ')


##############################
# Transforms and expressions #
##############################

# This is the first phase: a TransformProgram consists of callable transforms,
# which are made from other transforms and expressions. The TransformProgram is
# then trivially converted to a ConstraintProgram. In fact, these classes
# aren't even necessary; in a real implementation, the Bithenge parser would
# construct ConstraintPrograms directly.


class Transform:
    def make_constraint(self, scope, left, right):
        raise NotImplementedError


class Expression:
    def make_constraint(self, scope, result):
        raise NotImplementedError


class SimpleTransform(Transform):
    def __init__(self, name, cls):
        self.name, self.cls = name, cls

    def __str__(self):
        return self.name

    def make_constraint(self, scope, left, right):
        self.cls(scope, left, right)


class CallTransform(Transform):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def make_constraint(self, scope, left, right):
        CallConstraint(scope.cprog, scope, self.name, (left, right))


class ComposeTransform(Transform):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __str__(self):
        return '{} <- {}'.format(self.left, self.right)

    def make_constraint(self, scope, left, right):
        mid = ConstraintVariable(scope, 'mid')
        self.left.make_constraint(scope, left, mid)
        self.right.make_constraint(scope, mid, right)


class ComposeExpression(Expression):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __str__(self):
        return '{} <- {}'.format(self.left, self.right)

    def make_constraint(self, scope, output):
        mid = ConstraintVariable(scope, 'mid')
        self.left.make_constraint(scope, output, mid)
        self.right.make_constraint(scope, mid)


class ConstantExpression(Expression):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)

    def make_constraint(self, scope, output):
        ConstantConstraint(scope, output, self.value)


class StructTransform(Transform):
    def __init__(self, fields):
        self.fields = OrderedDict(fields)

    def __str__(self):
        s = ''.join('\n.{} <- {};'.format(*field) for field in self.fields.items())
        return 'struct {{{}\n}}'.format(indent(s))

    def make_constraint(self, scope, left, right):
        StructConstraint(scope, left, self.fields.keys())
        subblobs = []
        for name, xform in self.fields.items():
            subblobs.append(ConstraintVariable(scope, '{}_{}'.format(right.name, name)))
            xform.make_constraint(scope, left.field(name), subblobs[-1])
        ConcatConstraint(scope, right, subblobs)


# TODO: from "if (.big) { .scale <- 1000 } else { .scale <- 1 }" and the value
# of ".scale", we need to determine ".big". Current plan: get ".scale <- 1000"
# to create an instruction like "valid <- .scale == 1000", and set ".big" based
# on whichever branch is valid.

#class IfExpression(Expression):
#    def __init__(self, cond, true, false):
#        self.cond, self.true, self.false = cond, true, false
#
#    def __str__(self):
#        return 'if {} then {} else {}'.format(self.cond, self.true, self.false)
#
#    def make_constraint(self, scope, output):
#        # each Constraint has verify() method that creates a bool; the bool can
#        # be asserted or used to resolve an if-statement.
#
#        # verify method works only with all vars known, or with any combination?
#
#        cond = ConstraintVariable(scope, 'cond')
#        self.cond.make_constraint(scope, cond)
#        true = ConstraintScope()
#        false = ConstraintScope()
#        IfConstraint(scope


# TODO
#class RepeatTransform(Transform):
#    def __init__(self, times, xform):
#        self.times = times
#        self.xform = xform
#
#    def __str__(self):
#        return 'repeat({}) {{\n{}\n}}'.format(self.times, indent(self.xform))
#
#    def make_constraint(self, scope, left, right):


class TransformProgram:
    def __init__(self):
        self.callables = {}

    def add(self, name, xform):
        self.callables[name] = xform

    def __str__(self):
        return '\n'.join('transform {} = {};\n'.format(name, indent(xform))
                         for name, xform in self.callables.items())


#########
# Types #
#########

# These specify the types of ConstraintVariables. Type() is a completely
# unknown type, while SeqType(ByteType()) is a sequence of bytes. Uses
# Hindley-Milner type inference:
# http://www.codecommit.com/blog/scala/what-is-hindley-milner-and-why-is-it-cool


class Type:
    def __init__(self, params = ()):
        self.actual = None # The real Type, if this one was replaced during unification.
        self.params = tuple(params)
        self.name = None

    def unify(self, other):
        self = self.simplify()
        other = other.simplify()
        if self is other:
            pass
        elif isinstance(other, type(self)):
            self.actual = other
            for a, b in zip(self.params, other.params):
                b.unify(a)
        elif isinstance(self, type(other)):
            other.unify(self)
        else:
            assert False

    def simplify(self):
        if self.actual is not None:
            return self.actual.simplify()
        self.params = tuple(param.simplify() for param in self.params)
        return self

    def __eq__(self, other):
        return type(self) == type(other) and self.params == other.params

    names = imap('t{}'.format, itertools.count())
    def __str__(self):
        if self.name is None:
            self.name = next(Type.names)
        return self.name


class SeqType(Type):
    def __init__(self, param):
        Type.__init__(self, (param,))

    def __str__(self):
        return 'seq({})'.format(self.params[0])


class ByteType(Type):
    def __str__(self):
        return 'byte'


class BoolType(Type):
    def __str__(self):
        return 'bool'


class IntType(Type):
    def __str__(self):
        return 'int'


# Each StructConstraint creates its own StructType, incompatible with all
# others.
class StructType(Type):
    def __init__(self, names):
        self.fields = OrderedDict((name, Type()) for name in names)
        Type.__init__(self, self.fields.values())

    def __str__(self):
        fields = ', '.join('{}:{}'.format(*field) for field in self.fields.items())
        return 'struct({})'.format(fields)

    def __eq__(self, other):
        return self is other

    def simplify(self):
        for name in self.fields:
            self.fields[name] = self.fields[name].simplify()
        return Type.simplify(self)

    def unify(self, other):
        if self is other:
            pass
        elif other.actual is not None:
            self.unify(other.actual)
        elif type(other) == Type:
            other.actual = self
        else:
            assert False


def value_type(value):
    '''Returns the type of a constant value.'''
    if isinstance(value, int):
        return IntType()
    elif isinstance(value, bytes):
        return SeqType(ByteType())
    elif isinstance(value, bool):
        return BoolType()
    elif isinstance(value, tuple):
        if value:
            return SeqType(value_type(value[0]))
        else:
            return SeqType(Type())


###############
# Constraints #
###############

# The second phase. ConstraintPrograms are somewhat simpler and more explicit
# than TransformPrograms; in particular, Constraints are applied to explicit
# variables, rather than the implicit left and right variables used by
# Transforms. Solving ConstraintPrograms into InstructionPrograms is the most
# difficult part of this script, so it's important to avoid unnecessary
# complications.


class Constraint:
    def __init__(self, scope):
        self.scope = scope
        scope.constraints.add(self)


class ConstraintProgram:
    def __init__(self, xprog):
        self.callables = {}
        for name, xform in xprog.callables.items():
            if isinstance(xform, Transform):
                args = ('left', 'right')
            else: # Expression
                args = ('left',)
            self.callables[name] = ConstraintScope(self, name, args)

        for name, xform in xprog.callables.items():
            scope = self.callables[name]
            xform.make_constraint(scope, *scope.args)

        for scope in self.callables.values():
            scope.unify_types()

    def __str__(self):
        return '\n'.join(map(str, self.callables.values()))


class ConstraintScope:
    def __init__(self, cprog, name, args):
        self.cprog = cprog
        self.name = name
        self.variables = set()
        self.constraints = set()
        self.args = tuple(ConstraintVariable(self, arg) for arg in args)

    def __str__(self):
        args = ', '.join(map(str, self.args))
        variables = ', '.join(map('{0}:{0.typ}'.format, self.variables))
        constraints = '\n'.join(map(str, self.constraints))
        body = indent('\nvar {};\n{}'.format(variables, constraints))
        return 'constraint {}({}):{}\n'.format(self.name, args, body)

    def unify_types(self):
        for var in self.variables:
            var.typ = var.typ.simplify()

    def solve(self, iprog, wanted, known, available):
        '''Find ways to determine the "wanted" variables from the "known"
        variables, and possibly the "available" variables. Yields tuples of
        (variables from "available" used by the solution, instructions to
                perform the solution).'''
        regs = {var : Register(var) for var in self.variables}
        known = tuple(known)
        solver = Solver(iprog, self, regs, wanted, known, available)
        for newly_needed, solution in solver.solve():
            newly_needed = tuple(newly_needed)
            outputs = map(regs.get, wanted)
            inputs = map(regs.get, known + newly_needed)
            yield newly_needed, InstructionScope(self, outputs, inputs, solution.inses)

    def solve_best(self, iprog, wanted, known):
        '''Find the best way to determine the "wanted" variables from the
        "known" variables. Returns instructions to perform the solution.'''
        # TODO: make use of other args determined as part of the solution
        _, iscope = min(self.solve(iprog, wanted, known, ()), key=lambda x:len(x[1].inses))
        return iscope


class ConstraintVariable:
    '''A single variable in a constraint scope. There may be subvariables for
    length (sequences) or fields (structs); a call can determine the value of a
    subvariable without determining the value of the whole variable..'''

    def __init__(self, scope, name):
        self.scope = scope
        self.name = name
        self.typ = Type()
        self._len_var = None
        self._fields = {}
        scope.variables.add(self)

    def __str__(self):
        return self.name

    def __repr__(self):
        return self.name

    def field(self, name):
        if name not in self._fields:
            self.typ = self.typ.simplify()
            assert isinstance(self.typ, StructType)
            self._fields[name] = ConstraintVariable(self.scope, self.name + '.' + name)
            self._fields[name].typ.unify(self.typ.fields[name])
        return self._fields[name]

    @property
    def len(self):
        if self._len_var is None:
            self.typ = self.typ.simplify()
            assert isinstance(self.typ, SeqType)
            self._len_var = ConstraintVariable(self.scope, self.name + '.len')
            self._len_var.typ.unify(IntType())
        return self._len_var


class Solver:
    '''Helper for ConstraintScope.solve. Works backwards by trying to determine
    each variable in "wanted". Ignores useless PartialSolutions: those with
    more wanted variables and a greater cost than already known solutions.'''

    def __init__(self, iprog, cscope, regs, wanted, known, available):
        self.iprog = iprog
        self.cscope = cscope
        self.regs = regs
        self.known = frozenset(known)
        self.available = frozenset(available)
        self.wanted = frozenset(wanted)

    def solve(self):
        self.lowest_cost = {self.wanted: 0}
        self.partials = deque((PartialSolution(self, self.wanted),))
        while self.partials:
            partial = self.partials.popleft()
            if partial.wanted <= self.available:
                yield partial.wanted, partial
            for constraint in self.cscope.constraints:
                for partial2 in constraint.solve(partial):
                    self.add_partial(partial2)

    def add_partial(self, partial):
        if (partial.wanted in self.lowest_cost and
                self.lowest_cost[partial.wanted] <= partial.cost):
            return
        useful = True
        for lowest_wanted, lowest_cost in self.lowest_cost.items():
            if lowest_wanted <= partial.wanted and lowest_cost <= partial.cost:
                useful = False
                break
            if lowest_wanted >= partial.wanted and lowest_cost >= partial.cost:
                self.partials = deque(p for p in self.partials if not (p.wanted >= partial.wanted and p.cost >= partial.cost))
                break
        if useful:
            self.partials.append(partial)
        self.lowest_cost[partial.wanted] = partial.cost


# TODO: use "unproven" to make sure all constraints are checked with
# AssertInstructions if necessary.
class PartialSolution:
    def __init__(self, solver, wanted, inses = (), unproven = None):
        self.solver = solver
        self.wanted = frozenset(wanted)
        self.inses = inses
        if unproven is None:
            unproven = solver.cscope.constraints
        self.unproven = frozenset(unproven)

    def __str__(self):
        return 'PartialSolution(wanted={}, cost={})'.format(set(self.wanted), self.cost)

    @property
    def cost(self):
        return len(self.inses) # TODO

    @property
    def regs(self):
        return self.solver.regs

    def useful(self, outputs, inputs):
        # TODO
        return not self.wanted.isdisjoint(outputs)

    def step(self, outputs, inputs, ins, proven = ()):
        '''Returns the PartialSolution with the given instructions added, which
        determine "outputs" given "inputs".'''
        outputs = frozenset(outputs)
        inputs = frozenset(inputs)
        wanted2 = (self.wanted - outputs) | (inputs - self.solver.known)
        unproven = self.unproven - frozenset(proven)
        return PartialSolution(self.solver, wanted2, (ins,) + self.inses, unproven)


class CallConstraint(Constraint):
    def __init__(self, cprog, scope, name, args):
        Constraint.__init__(self, scope)
        self.args = args
        self.callee = cprog.callables[name]
        assert len(args) == len(self.callee.args)
        for arg0, arg1 in zip(args, self.callee.args):
            arg0.typ.unify(arg1.typ)

    def __str__(self):
        return '{}({});'.format(self.callee.name, ', '.join(map(str, self.args)))

    def solve(self, partial):
        to_other = defaultdict(set)
        todo = list(zip(self.args, self.callee.args))
        while todo:
            arg0, arg1 = todo.pop()
            to_other[arg0].add(arg1)
            if type(arg0.typ) == SeqType:
                todo.append((arg0.len, arg1.len))
            elif type(arg0.typ) == StructType:
                for name in arg0.typ.fields:
                    todo.append((arg0.field(name), arg1.field(name)))

        if partial.wanted.isdisjoint(to_other.keys()):
            return
        from_other = {z: x for x, y in to_other.items() for z in y}
        wanted = partial.wanted & frozenset(to_other.keys())
        known = partial.solver.known & frozenset(to_other.keys())
        available = frozenset(to_other.keys()) - wanted - known
        wanted, known, available = tuple(wanted), tuple(known), tuple(available)
        def set_to_other(vars):
            return reduce(operator.or_, map(to_other.get, vars), set())
        for newly_needed, iscope in self.callee.solve(
                partial.solver.iprog, set_to_other(wanted),
                set_to_other(known), set_to_other(available)):
            newly_needed = tuple(map(from_other.get, newly_needed))
            ins = CallInstruction(iscope,
                    [partial.regs[var] for var in wanted],
                    [partial.regs[var] for var in known + newly_needed])
            yield partial.step(wanted, newly_needed, ins) # TODO: proven


class ConstantConstraint(Constraint):
    def __init__(self, scope, var, value):
        Constraint.__init__(self, scope)
        self.var = var
        self.value = value
        var.typ.unify(value_type(value))

    def __str__(self):
        return '{} = {!r};'.format(self.var, self.value)

    def solve(self, partial):
        if partial.useful((self.var,), ()):
            ins = ConstantInstruction(partial.regs[self.var], self.value)
            yield partial.step((self.var,), (), ins, (self,))


class AddConstraint(Constraint):
    def __init__(self, scope, whole, a, b):
        Constraint.__init__(self, scope)
        self.whole, self.a, self.b = whole, a, b
        whole.typ.unify(IntType())
        a.typ.unify(IntType())
        b.typ.unify(IntType())

    def __str__(self):
        return '{} = {} + {};'.format(self.whole, self.a, self.b)

    def solve(self, partial):
        if partial.useful((self.whole,), (self.a, self.b)):
            ins = AddInstruction(partial.regs[self.whole], partial.regs[self.a], partial.regs[self.b])
            yield partial.step((self.whole,), (self.a, self.b), ins, (self,))
        if partial.useful((self.a,), (self.whole, self.b)):
            ins = SubInstruction(partial.regs[self.a], partial.regs[self.whole], partial.regs[self.b])
            yield partial.step((self.a,), (self.whole, self.b), ins, (self,))
        if partial.useful((self.b,), (self.whole, self.a)):
            ins = SubInstruction(partial.regs[self.b], partial.regs[self.whole], partial.regs[self.a])
            yield partial.step((self.b,), (self.whole, self.a), ins, (self,))


class ConcatConstraint(Constraint):
    def __init__(self, scope, full, parts):
        Constraint.__init__(self, scope)
        full.typ.unify(SeqType(Type()))
        for part in parts:
            part.typ.unify(full.typ)
        self.full = full
        self.parts = parts

        self.starts = [ConstraintVariable(scope, str(part) + '_start') for part in parts]
        self.starts.append(self.full.len)
        ConstantConstraint(scope, self.starts[0], 0)
        for start, end, part in zip(self.starts, self.starts[1:], parts):
            AddConstraint(scope, end, start, part.len)

    def __str__(self):
        return '{} = {};'.format(self.full, ' ++ '.join(map(str, self.parts)))

    def solve(self, partial):
        if partial.useful((self.full,), self.parts):
            ins = ConcatInstruction(partial.regs[self.full], [partial.regs[part] for part in self.parts])
            yield partial.step((self.full,), self.parts, ins, (self,))
        for start, end, part in zip(self.starts, self.starts[1:], self.parts):
            if partial.useful((part,), (start, end, self.full)):
                ins = SliceInstruction(partial.regs[part], partial.regs[self.full], partial.regs[start], partial.regs[end])
                yield partial.step((part,), (self.full, start, end), ins)


class StructConstraint(Constraint):
    '''Different from StructTransform: this is only the relationship between a
    StructType variable and its fields. It doesn't connect the fields to
    anything else.'''

    def __init__(self, scope, var, names):
        var.typ.unify(StructType(names))
        self.fields = [var.field(name) for name in names]
        Constraint.__init__(self, scope)
        self.var = var
        self.names = names

    def __str__(self):
        return 'struct({});'.format(self.var)

    def solve(self, partial):
        if partial.useful((self.var,), self.fields):
            fields = {name: partial.regs[self.var.field(name)] for name in self.var.typ.fields}
            ins = StructInstruction(partial.regs[self.var], fields)
            yield partial.step((self.var,), self.fields, ins, (self,))
        elif partial.useful(self.fields, (self.var,)):
            fields = {name: partial.regs[self.var.field(name)] for name in self.var.typ.fields}
            ins = DestructInstruction(fields, partial.regs[self.var])
            yield partial.step(self.fields, (self.var,), ins, (self,))


class Uint16leConstraint(Constraint):
    def __init__(self, scope, left, right):
        Constraint.__init__(self, scope)
        self.left = left
        self.right = right
        left.typ.unify(IntType())
        right.typ.unify(SeqType(ByteType()))
        ConstantConstraint(scope, right.len, 2)

    def __str__(self):
        return 'uint16le({}, {});'.format(self.left, self.right)

    def solve(self, partial):
        if partial.useful((self.left,), (self.right,)):
            src = partial.regs[self.right]
            ins = Uint16leDecodeInstruction(partial.regs[self.left], src)
            yield partial.step((self.left,), (self.right,), ins, (self,))
        if partial.useful((self.right,), (self.left,)):
            src = partial.regs[self.left]
            ins = Uint16leEncodeInstruction(partial.regs[self.right], src)
            yield partial.step((self.right,), (self.left,), ins, (self,))


class NonzeroBooleanConstraint(Constraint):
    def __init__(self, scope, left, right):
        Constraint.__init__(self, scope)
        self.left = left
        self.right = right
        left.typ.unify(BoolType())
        right.typ.unify(IntType())

    def __str__(self):
        return 'nonzero_boolean({}, {});'.format(self.left, self.right)

    def solve(self, partial):
        if partial.useful((self.left,), (self.right,)):
            src = partial.regs[self.right]
            ins = NonzeroBooleanDecodeInstruction(partial.regs[self.left], src)
            yield partial.step((self.left,), (self.right,), ins, (self,))
        if partial.useful((self.right,), (self.left,)):
            src = partial.regs[self.left]
            ins = NonzeroBooleanEncodeInstruction(partial.regs[self.right], src)
            yield partial.step((self.right,), (self.left,), ins, (self,))


################
# Instructions #
################

# The third phase: an InstructionProgram specifies how to deterministically
# calculate some variables given the values of others. It could be interpreted,
# converted to other programming languages, or even fed into LLVM!


class Register:
    def __init__(self, var):
        self.var = var
        self.typ = var.typ
        self.name = str(var) # None

    names = imap('r{}'.format, itertools.count())
    def __str__(self):
        if self.name is None:
            self.name = next(Register.names)
        return self.name

    def __repr__(self):
        return str(self)


class Instruction:
    def __init__(self, dests, srcs):
        self.dests = tuple(dests)
        self.srcs = tuple(srcs)

    def __str__(self):
        name = type(self).__name__
        if name.endswith('Instruction'):
            name = name[:-11]
        dests = ', '.join(str(dest) for dest in self.dests)
        srcs = ', '.join(str(src) for src in self.srcs)
        return '{} <- {} {}'.format(dests, name, srcs)

    def __repr__(self):
        return 'Instruction(' + str(self) + ')'


class StructInstruction(Instruction):
    def __init__(self, out, fields):
        Instruction.__init__(self, (out,), fields.values())
        self.fields = fields

    def __str__(self):
        return '{} <- {}'.format(self.dests[0], self.fields)


class DestructInstruction(Instruction):
    def __init__(self, fields, struct):
        Instruction.__init__(self, fields.values(), (struct,))
        self.fields = fields
        self.struct = struct

    def __str__(self):
        return '{} <- {}'.format(self.fields, self.struct)


class ConstantInstruction(Instruction):
    def __init__(self, out, value):
        Instruction.__init__(self, (out,), ())
        assert out.typ == value_type(value)
        self.value = value

    def __str__(self):
        return '{} <- Constant {!r}'.format(self.dests[0], self.value)


class CallInstruction(Instruction):
    def __init__(self, callee, outputs, inputs):
        Instruction.__init__(self, outputs, inputs)
        self.callee = callee

    def __str__(self):
        dests = ', '.join(str(dest) for dest in self.dests)
        srcs = ', '.join(str(src) for src in self.srcs)
        return '{} <- {} {}'.format(dests, self.callee.name, srcs)


class AddInstruction(Instruction):
    def __init__(self, out, a, b):
        Instruction.__init__(self, (out,), (a, b))

    def __str__(self):
        return '{} <- {} + {}'.format(self.dests[0], self.srcs[0], self.srcs[1])


class SubInstruction(Instruction):
    def __init__(self, out, a, b):
        Instruction.__init__(self, (out,), (a, b))

    def __str__(self):
        return '{} <- {} - {}'.format(self.dests[0], self.srcs[0], self.srcs[1])


class SliceInstruction(Instruction):
    def __init__(self, out, seq, start, end):
        Instruction.__init__(self, (out,), (seq, start, end))
        self.seq = seq
        self.start = start
        self.end = end

    def __str__(self):
        return '{} <- {}[{}:{}]'.format(self.dests[0], self.seq, self.start, self.end)


class NonzeroBooleanDecodeInstruction(Instruction):
    def __init__(self, left, right):
        Instruction.__init__(self, (left,), (right,))


class NonzeroBooleanEncodeInstruction(Instruction):
    def __init__(self, left, right):
        Instruction.__init__(self, (left,), (right,))


class ConcatInstruction(Instruction):
    def __init__(self, dest, srcs):
        Instruction.__init__(self, (dest,), srcs)


class Uint16leDecodeInstruction(Instruction):
    def __init__(self, left, right):
        Instruction.__init__(self, (left,), (right,))


class Uint16leEncodeInstruction(Instruction):
    def __init__(self, left, right):
        Instruction.__init__(self, (left,), (right,))


class InstructionScope:
    def __init__(self, cscope, outputs, inputs, inses):
        self.cscope = cscope
        self.outputs = tuple(outputs)
        self.inputs = tuple(inputs)
        self.inses = tuple(inses)
        self.name = '{}_{}_from_{}'.format(cscope.name, '_'.join(map(str, self.outputs)), '_'.join(map(str, self.inputs)))

    def __str__(self):
        result = '{}({}):'.format(self.name, ', '.join(['&'+str(reg) for reg in self.outputs] + [str(reg) for reg in self.inputs]))
        result += indent(''.join('\n' + str(ins) for ins in self.inses))
        return result + '\n'


class InstructionProgram:
    def __init__(self, cprog, entrypoints):
        self.entrypoints = list(cprog.callables[name].solve_best(self, wanted, known) for name, wanted, known in entrypoints)

        # Many unnecessary InstructionScopes will be generated. Put only the
        # needed ones in self.scopes.
        self.scopes = set()
        todo = set(self.entrypoints)
        while todo:
            scope = todo.pop()
            self.scopes.add(scope)
            for ins in scope.inses:
                if isinstance(ins, CallInstruction):
                    if ins.callee not in self.scopes:
                        todo.add(ins.callee)

    def __str__(self):
        return '\n'.join(str(scope) for scope in self.scopes)


####################
# Output languages #
####################


def as_python(iprog):
    result = 'from struct import pack, unpack\n'
    for scope in iprog.scopes:
        result += '\ndef {}({}):\n'.format(scope.name, ', '.join(str(reg) for reg in scope.inputs))
        for ins in scope.inses:
            if isinstance(ins, NonzeroBooleanDecodeInstruction):
                result += '    {} = {} != 0\n'.format(ins.dests[0], ins.srcs[0])
            elif isinstance(ins, NonzeroBooleanEncodeInstruction):
                result += '    {} = 1 if {} else 0\n'.format(ins.dests[0], ins.srcs[0])
            elif isinstance(ins, Uint16leDecodeInstruction):
                result += '    {}, = unpack("<H", {})\n'.format(ins.dests[0], ins.srcs[0])
            elif isinstance(ins, Uint16leEncodeInstruction):
                result += '    {} = pack("<H", {})\n'.format(ins.dests[0], ins.srcs[0])
            elif isinstance(ins, ConstantInstruction):
                result += '    {} = {!r}\n'.format(ins.dests[0], ins.value)
            elif isinstance(ins, CallInstruction):
                dests = ', '.join(str(dest) for dest in ins.dests)
                result += '    {} = {}({})\n'.format(dests, ins.callee.name, ', '.join(str(arg) for arg in ins.srcs))
            elif isinstance(ins, AddInstruction):
                result += '    {} = {} + {}\n'.format(ins.dests[0], ins.srcs[0], ins.srcs[1])
            elif isinstance(ins, SubInstruction):
                result += '    {} = {} - {}\n'.format(ins.dests[0], ins.srcs[0], ins.srcs[1])
            elif isinstance(ins, SliceInstruction):
                result += '    {} = {}[{}:{}]\n'.format(ins.dests[0], ins.seq, ins.start, ins.end)
            elif isinstance(ins, StructInstruction):
                result += '    {} = {}\n'.format(ins.dests[0], ins.fields)
            elif isinstance(ins, DestructInstruction):
                for name, reg in ins.fields.items():
                    result += '    {} = {}.{}\n'.format(reg, ins.struct, name)
            elif isinstance(ins, ConcatInstruction):
                result += '    {} = {}\n'.format(ins.dests[0], ' + '.join(str(src) for src in ins.srcs))
            else:
                print(type(ins))
                assert False
        result += '    return ' + ', '.join(str(reg) for reg in scope.outputs) + '\n'
    return result


########
# Demo #
########


Uint16leTransform = SimpleTransform('uint16le', Uint16leConstraint)
NonzeroBooleanTransform = SimpleTransform('nonzero_boolean', NonzeroBooleanConstraint)


if __name__ == '__main__':
    xprog = TransformProgram()
    xprog.add('nzb', NonzeroBooleanTransform)
    xprog.add('nzb16', ComposeTransform(CallTransform('nzb'), Uint16leTransform))
    xprog.add('main', StructTransform(
        (('x', CallTransform('nzb16')),
         ('y', Uint16leTransform))))

    print('Transform program\n=================\n')
    print(xprog)

    cprog = ConstraintProgram(xprog)
    print('Constraint program\n==================\n')
    print(cprog)

    cmain = cprog.callables['main']
    entrypoint0 = ('main', (cmain.args[0],), (cmain.args[1],))
    iprog0 = InstructionProgram(cprog, (entrypoint0,))
    print('Instruction program (decode)\n============================\n')
    print(iprog0)

    print('Python program (decode)\n=======================\n')
    print(as_python(iprog0))

    cmain = cprog.callables['main']
    entrypoint1 = ('main', (cmain.args[1],), (cmain.args[0],))
    iprog1 = InstructionProgram(cprog, (entrypoint1,))
    print('Instruction program (encode)\n============================\n')
    print(iprog1)

    print('Python program (encode)\n=======================\n')
    print(as_python(iprog1))

# TODO: Recursion.

# TODO: More advanced inference.
# - Calculate lower/upper bounds for integer variables.
_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel

Reply via email to