A problem is that Guile doesn't really provide a good set of fast rank-1 ops.
None of them accept strides other than 1, for example (this is OK for regular
vectors, but it hurts for general arrays), and some are missing start/end
arguments, or you have to write the wrappers yourself, as for the typed
vectors (other than u8). So in some cases you have to do the loop in Scheme.
That's fine when the body of the loop is Scheme ops, but if it's something
like copy or fill it really hurts compared to C.
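
To make the kind of op concrete, here is a minimal sketch in C of a strided
copy (widening u8 to i16, in native byte order) and a strided fill. The names
copy_u8_to_s16_strided and fill_s16_strided are hypothetical and are not part
of libguile; strides are counted in elements, and the caller is assumed to
have checked the bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libguile: copy n elements from a u8
   source to an i16 destination. Each side has its own stride, counted
   in elements; a stride may be negative. */
static void
copy_u8_to_s16_strided (int16_t *dst, ptrdiff_t dst_stride,
                        const uint8_t *src, ptrdiff_t src_stride,
                        size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[(ptrdiff_t) i * dst_stride] = (int16_t) src[(ptrdiff_t) i * src_stride];
}

/* Hypothetical helper, not part of libguile: store value into n
   elements of dst, stepping dst_stride elements each time. */
static void
fill_s16_strided (int16_t *dst, ptrdiff_t dst_stride, int16_t value,
                  size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[(ptrdiff_t) i * dst_stride] = value;
}
```

With both strides equal to 1 these reduce to the plain contiguous loops, which
is the case a C compiler is most likely to vectorise.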


> On 11 Sep 2021, at 19:03, Stefan Israelsson Tampe <stefan.ita...@gmail.com> 
> wrote:
> 
> I did some tests, and wingo's superb compiler is about equally fast for a
> hand-made Scheme loop as for the automatic dispatch of getter and setter.
> It can, e.g., copy from u8 to i16 at about 100 ops / second using native
> byte order. However, compiling it in C led to a nasty 2 Go ops / second. So
> for these kinds of patterns it is still better to work in C, as it probably
> vectorises the operation quite well. Supervectors supports pushing busy
> loops to C very well, and I will probably enable fast C code for some
> simple utility ops.
> 
> On Wed, Sep 8, 2021 at 9:18 AM lloda <ll...@sarc.name> wrote:
> 
> 
>> On 8 Sep 2021, at 04:04, Stefan Israelsson Tampe <stefan.ita...@gmail.com>
>> wrote:
>> 
> 
> ...
> 
>> So using get-setter typically means
>> ((get-setter #f bin1 #f 
>>    (lambda (set) (set v 2 val)))
>> 
>>    #:is-endian   'little   ;; only consider little endian setters like I know
>>    #:is-unsigned #t        ;; only use unsigned
>>    #:is-integer  #t        ;; only use integer representations
>>    #:is-fixed    #t        ;; do not use the scm value vector versions
>> )
>> So this is a version where we only consider handling nonnegative integers
>> of up to 64 bits. The gain is faster compilation, as this idiom will
>> dispatch between 4 different versions of the loop lambda, and the compiler
>> could inline all of them or be able to detect the ones that are used and
>> hot-compile those versions (a feature we do not have yet in Guile). Now,
>> when you select between a ref and a set, you will similarly end up with
>> 4*4 = 16 different loops. The full version is very large: a double loop
>> with all features consists of (2*2 + 3*2*2*2 + 4 + 1)**2 = 33*33 ~ 1000
>> versions of the loop, which is crazy if we should expand the loop for all
>> cases in the compilation. Now Guile would just use a functional approach
>> and not expand the loop everywhere. We will have parameterised versions of
>> libraries so that one can select which versions to compile for. For
>> example, the general functions that perform a transform from one
>> supervector to another are a general idiom that would use the full
>> dispatch, which is not practical,
> 
> I'm curious where you're going with this.
> 
> I implemented something similar (iiuc) in
> https://github.com/lloda/guile-newra/, specifically
> https://github.com/lloda/guile-newra/blob/master/mod/newra/map.scm, where
> the lookup/set methods are inlined in the loop. The compilation times indeed
> grow exponentially, so I'm forced to have a default 'generic' case.
> 
> The idea for fixing this was to have some kind of run-time compilation cache,
> so that only the fixed number of type combinations that actually get used
> would be compiled, instead of the tensor product of all types. But I haven't
> figured out how to do that, or actually tried, yet.
> 
> Regards
>       
>       Daniel
> 
