Re: [ARTIQ] Proposition for a units system

2014-10-13 Thread Sébastien Bourdeauducq
On 10/14/2014 05:47 AM, Robert Jordens wrote:
 The obvious performance optimization here is to have the constant
  folding transform work across statements.
 The LLVM optimizer passes should be able to do that, right?

As far as I can tell right now, not so much. In particular it does not
work well for rational arithmetic (which is also particularly slow),
even after telling it to inline the gcd function. Maybe there is some
LLVM optimization pass that I overlooked, or some other problem with
LLVM/llvmpy (the execution engine is also unexpectedly slow).

  Since there is only one FUD sampling by the DDS chip every 8 RTIO/DDS
  cycles, this phase difference computation is correct only if now is
  increased by a multiple of 8 in the block of code.
 You mean a multiple of 8 in total after the loop?
 I would think it is correct in any case; it is just that you cannot change
 POW (correctly) if now.microcycle != 0. Having a known phase even for the
 microcycles is valuable, since we can then e.g. turn a DDS pulse on and off
 with microcycle phase accuracy using the RF switches.

Yes.

Sébastien

___
ARTIQ mailing list
https://ssl.serverraum.org/lists/listinfo/artiq


Re: [ARTIQ] Proposition for a units system

2014-08-15 Thread Sébastien Bourdeauducq
On 08/14/2014 09:16 AM, Robert Jordens wrote:
 Also that excerpt misses the point of the underlying question, which is
 when to best convert floating point physical quantities to integer
 device units.

What about having soft floating point (and rational with e.g. 64-bit
numerator and denominator) support on the device? That's not
particularly hard to implement, as we already need a typing system to
distinguish between 32-bit integers (for fast computations), 64-bit
integers (for high range variables like time) and integer arrays (for
histograms).

Then the compiler just tries to fold constants as much as possible to
improve performance. If parts of the computation are not folded - either
because the transform is not smart enough or because of a dependency on
runtime data - things will still work, albeit at a slower speed.
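
As a rough illustration of the folding idea, here is a minimal sketch built on
Python's standard ast module. The names fold and Folder are hypothetical and
this is not the ARTIQ compiler; it only shows constants being folded where
possible while anything that depends on runtime data is left in place.

```python
import ast
import operator
from fractions import Fraction

# Binary operators this sketch knows how to fold. Division goes through
# Fraction so that integer operands stay exact instead of becoming floats.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: lambda a, b: Fraction(a) / Fraction(b),
}


def _is_number(node):
    return isinstance(node, ast.Constant) and isinstance(
        node.value, (int, float, Fraction))


class Folder(ast.NodeTransformer):
    """Replace known names by constants and fold constant sub-expressions."""

    def __init__(self, constants):
        self.constants = constants

    def visit_Name(self, node):
        if node.id in self.constants:
            return ast.copy_location(
                ast.Constant(self.constants[node.id]), node)
        return node  # unknown name: runtime data, leave untouched

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _OPS.get(type(node.op))
        if op and _is_number(node.left) and _is_number(node.right):
            value = op(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node  # not foldable: keep the operation for runtime


def fold(expression, constants):
    """Parse an expression string and return its folded AST."""
    tree = ast.parse(expression, mode="eval")
    return Folder(constants).visit(tree).body


units = {"ns": Fraction(2, 25)}
print(ast.dump(fold("25*ns", units)))   # a single Constant node
print(ast.dump(fold("n*5*ns", units)))  # still a BinOp: n is runtime data
```

Note that "n*5*ns" parses as (n*5)*ns, so this naive transform does not even
combine 5*ns; that is a small example of the transform not being smart enough,
and the expression would simply be evaluated at runtime instead.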

Sébastien

___
ARTIQ mailing list
https://ssl.serverraum.org/lists/listinfo/artiq
Migen/MiSoC: please use de...@lists.m-labs.hk instead.


Re: [ARTIQ] Proposition for a units system

2014-08-15 Thread Robert Jordens
On 08/15/2014 08:32 AM, Sébastien Bourdeauducq wrote:
 On 08/14/2014 09:16 AM, Robert Jordens wrote:
 Also that excerpt misses the point of the underlying question, which is
 when to best convert floating point physical quantities to integer
 device units.
 
 What about having soft floating point (and rational with e.g. 64-bit
 numerator and denominator) support on the device? That's not
 particularly hard to implement, as we already need a typing system to
 distinguish between 32-bit integers (for fast computations), 64-bit
 integers (for high range variables like time) and integer arrays (for
 histograms).
 
 Then the compiler just tries to fold constants as much as possible to
 improve performance. If parts of the computation are not folded - either
 because the transform is not smart enough or because of a dependency on
 runtime data - things will still work, albeit at a slower speed.

That sounds good to me.

I would expect the overwhelming majority of cases to be foldable down to
integer device units. And I assume these could then be represented in
the binary kernel more efficiently than a double or a 2x64-bit rational.

Robert.


Re: [ARTIQ] Proposition for a units system

2014-08-12 Thread Robert Jordens
Hello,

On 08/12/2014 01:20 AM, Sébastien Bourdeauducq wrote:
 I'd like to propose a new system to deal with units, quantization by the
 devices, and rounding errors.
 
 The basic idea is to expose the native integer units of the devices
 directly to the user. For example, the statement:
 delay(1)
 represents a delay of one microcycle, which is e.g. 12.5ns on the
 Papilio Pro without the SERDES, and 1ns on the KC705 with the 1GHz SERDES.
 Obviously, the computation of native unit values can benefit from some
 automation, to make the system easier to use and facilitate the porting
 of experiments across different device setups. Thus, each driver can
 define its own mapping that converts usual units (us, ms, etc.) into
 native units. For example, a delay of 5 microseconds can be written as:
 delay(5*self.core.units.us)
 which evaluates to 5000=5*1000 (1ns periods) on the KC705, and 400=5*80
 (12.5ns periods) on the Papilio Pro.
 
 The values in unit mappings can be integer or rational. On the Papilio
 Pro, core.units.ns is Fraction(2, 25), and a delay that is a multiple of
 12.5ns, e.g.
 delay(25*self.core.units.ns)
 is turned by the compiler (via constant folding) into:
 delay(2)
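
The arithmetic in the quoted proposal can be checked with plain Python
fractions. This is only a sketch: the dictionaries and the native() helper are
hypothetical names, and the values are taken from the figures quoted above
(12.5ns microcycles on the Papilio Pro, 1ns on the KC705).

```python
from fractions import Fraction

# Hypothetical unit mappings: number of microcycles per physical unit.
PAPILIO_PRO_UNITS = {"ns": Fraction(2, 25), "us": Fraction(80)}
KC705_UNITS = {"ns": Fraction(1), "us": Fraction(1000)}


def native(value):
    """Return value as an int, or raise if it is not a whole number of
    native units (mirroring the compile error described in the proposal)."""
    value = Fraction(value)
    if value.denominator != 1:
        raise ValueError(f"{value} is not an integer number of native units")
    return value.numerator


print(native(5 * KC705_UNITS["us"]))         # 5000
print(native(5 * PAPILIO_PRO_UNITS["us"]))   # 400
print(native(25 * PAPILIO_PRO_UNITS["ns"]))  # 2
```

With these mappings, native(24 * PAPILIO_PRO_UNITS["ns"]) raises, since 24ns is
not a multiple of 12.5ns.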

Quoting from the last meeting's minutes, the notion of a multiplication
breaks down at least in the following three cases:

  * does not help for calibrated quantities (non-linear calibration of
    a DAC)

  * does not help for non-constant steps (floating point pdq2 time)

  * does not help for timings with physical latencies (different rising
    and falling edge latencies of an AOM due to RF amps etc.)

simply because the conversion between physical and device units is not
multiplicative (+ quantization). There are offsets and non-linearities.
And you can't hide that in the operator because of associativity. Thus
the general need for coercion functions. Also the same physical unit
will need different conversion routes to the same device unit depending
on the context (e.g. delay vs absolute time vs pulse duration).
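
To make the point concrete, here is a small sketch of what such coercion
functions might look like. All numbers (DAC step, offset, AOM latencies) and
all function names are made up for illustration; only the 12.5ns cycle comes
from the thread.

```python
from fractions import Fraction

# Hypothetical DAC calibration: 1 mV per code, with a 5 mV output offset.
LSB = Fraction(1, 1000)
OFFSET = Fraction(5, 1000)


def volts_to_code(volts):
    """Affine conversion plus quantization; not a plain multiplication."""
    return round((Fraction(volts) - OFFSET) / LSB)


# Because of the offset, converting and then adding differs from
# adding and then converting.
print(volts_to_code(1) + volts_to_code(1))  # 1990
print(volts_to_code(2))                     # 1995

# Hypothetical timing example: 12.5 ns cycles, AOM with different
# rising and falling edge latencies.
CYCLE_NS = Fraction(25, 2)
RISE_LATENCY_NS = 30
FALL_LATENCY_NS = 55


def delay_to_cycles(ns):
    """A plain delay only needs scaling and rounding."""
    return round(Fraction(ns) / CYCLE_NS)


def pulse_to_cycles(ns):
    """A pulse duration must also compensate for the edge latencies: the
    light stays on (fall - rise) longer than the gate, so shorten the gate."""
    gate_ns = Fraction(ns) - (FALL_LATENCY_NS - RISE_LATENCY_NS)
    return round(gate_ns / CYCLE_NS)


print(delay_to_cycles(100))  # 8
print(pulse_to_cycles(100))  # 6
```

The same 100ns thus maps to different device values depending on whether it is
a delay or a pulse duration, which no single unit constant can express.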

 If the value passed to delay() is not an integer, the compiler returns
 an error. In case the user can live with an approximate value instead,
 they can use the round() function:
 delay(round(24*self.core.units.ns))
 which is again turned into delay(2).
 The error (1ns) can be computed explicitly (when done on the core
 device, this would of course require rational arithmetic support):
 (round(24*ns)-24*ns)/ns
 Fraction(1, 1)
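
The rounding error quoted above can be reproduced on the host with Python's
fractions module, assuming the Papilio Pro value ns = Fraction(2, 25). The
variable names are illustrative only.

```python
from fractions import Fraction

ns = Fraction(2, 25)  # microcycles per nanosecond on the Papilio Pro

requested = 24 * ns            # Fraction(48, 25), i.e. 1.92 microcycles
programmed = round(requested)  # nearest integer number of microcycles
error_ns = (programmed - requested) / ns

print(programmed, error_ns)  # 2 1
```

One detail worth keeping in mind: round() on a Fraction rounds exact halves to
the nearest even integer, so round(Fraction(5, 2)) gives 2 rather than 3.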
 
 Since writing the whole access path to each unit (self.core.units...)
 results in a heavy and unwieldy syntax, unit mappings can be associated
 to function parameters. For example, the delay() function associates the
 core device's unit mapping to its parameter, so it is possible to use
 the short unit form, e.g.:
 delay(5*us)
 
 Associations are done using the @short_units decorator and parameter
 annotations. For example:
 @short_units
 def pulse(self, frequency: make_frequency_mapping(1000),
           duration: "core.units"):
 defines a pulse function where the frequency parameter is expressed in
 a unit system where 1000 Hz is the reference (=1), and duration uses
 the mapping of self.core.units.
 (NB: the duration annotation is a string because function parameter
 annotations are evaluated by Python at function definition time, and the
 value of self is not available yet. It is the same situation as with the
 optional core device selection parameter of the kernel decorator.)

-ESYNTAX. Do you mean
@short_units(frequency=make_frequency_mapping(1000), duration=..)
def pulse(self, frequency, duration):
   ...
?
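
For what it is worth, the per-parameter association can be prototyped in plain
Python at call time. The sketch below is an assumption-laden illustration, not
the proposed compiler transform: Quantity, the module-level us and ns symbols,
and the Core and Experiment classes are all hypothetical, and string
annotations are resolved against self when the function is called.

```python
import functools
import inspect
from fractions import Fraction


class Quantity:
    """A number tagged with a unit name, to be resolved later (hypothetical)."""

    def __init__(self, value, unit):
        self.value = Fraction(value)
        self.unit = unit

    def __rmul__(self, other):
        # Supports the short form, e.g. 5*us
        return Quantity(self.value * other, self.unit)


us = Quantity(1, "us")
ns = Quantity(1, "ns")


def short_units(func):
    """Resolve Quantity arguments using the mapping named in the annotation."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        bound = sig.bind(self, *args, **kwargs)
        for name, arg in list(bound.arguments.items()):
            annotation = sig.parameters[name].annotation
            if isinstance(arg, Quantity) and isinstance(annotation, str):
                # Walk e.g. "core.units" starting from self.
                mapping = self
                for attr in annotation.split("."):
                    mapping = getattr(mapping, attr)
                bound.arguments[name] = arg.value * mapping[arg.unit]
        return func(*bound.args, **bound.kwargs)

    return wrapper


class Core:
    # Papilio Pro values from the proposal: microcycles per unit.
    units = {"ns": Fraction(2, 25), "us": Fraction(80)}


class Experiment:
    def __init__(self):
        self.core = Core()

    @short_units
    def delay(self, duration: "core.units"):
        return duration  # a real driver would program the hardware here


exp = Experiment()
print(exp.delay(5 * us))   # 400
print(exp.delay(25 * ns))  # 2
```

A variant taking the mappings as decorator arguments, as in the question
above, would differ only in where the wrapper looks up the mapping name.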

 One problem with this system occurs when a function passes one of its
 parameters to different devices that potentially use different drivers
 and unit mappings, e.g:
 def pulse2(self, f):
   self.dev_a.pulse(f)
   self.dev_b.pulse(f)
 Parameter annotations can help here as well; the user could write:
 @short_units
 def pulse2(self, f: ("dev_a.units", "dev_b.units")):
 and the compiler checks at compile time that self.dev_a.units and
 self.dev_b.units are equal.

The syntax is counterintuitive. A MHz is a MHz is a MHz. An intuitive
thing would be dds_a.ftw or core.cycle units.

The @short_units stuff is not needed as physical units are the right
thing to use here if coercion is implicit.

 Of course, if the user prefers convenience over accuracy, they can also
 integrate rounding into pulse2:
 def pulse2(self, f_in_MHz):
   self.dev_a.pulse(round(f_in_MHz*MHz))
   self.dev_b.pulse(round(f_in_MHz*MHz))

Should that read f_in_MHz*self.dev_a.MHz etc?

 This new system offers the following advantages:
 * does not introduce rounding errors by itself.
 * more transparent - lets the user access the native representation if
 needed.
 * since each driver defines its own scales, unnecessary use of large
 integers (which would arise e.g. from representing everything in
 picoseconds and driving some device that operates on the order of
 milliseconds), rationals or floats is reduced.
 
 Comments?

In virtually all