On Fri, Sep 11, 2015 at 11:56 PM, Jeffrey Sarnoff <jeffrey.sarn...@gmail.com> wrote:
> The way to get a non-global, global-like value is to wrap the whole thing
> as a module:
>
> module aes
> export rijndael
>
> not_a_global = true
> # .. functions use not_a_global ...
> end # module
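For what it's worth, the module-level binding in the sketch above can be declared `const`, which keeps the binding type-stable even though it lives at module scope. A minimal sketch in current Julia syntax (the module and function names here are illustrative, not from aes.jl):

```julia
module AESParams

export key_bits

# `const` fixes the binding's type, so functions that read it can be
# fully type-inferred; a plain (non-const) global defeats inference.
const KEY_BITS = 128

key_bits() = KEY_BITS

end # module

AESParams.key_bits()   # 128
```

The binding is still "global" in scope, but because its type can never change, code that uses it compiles as tightly as if the value were a literal.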
Just to be clear, `not_a_global` is a global.

> Wrapping a value in a type gets you a passable, mutable value via the
> field name:
>
> julia> type PassMe
>            b::UInt8
>        end
>
> julia> useMe = PassMe(0x03); useMe.b
> 0x03
>
> julia> function changeMe!(pass::PassMe, newval::UInt8)
>            pass.b = newval
>            true # pass is not returned
>        end;
>
> julia> changeMe!(useMe, 0xff); useMe.b
> 0xff
>
> On Friday, September 11, 2015 at 11:34:58 PM UTC-4, Stefan Karpinski wrote:
>>
>> Oh yeah, I didn't notice that – having those be abstractly typed is going
>> to be a train wreck for performance. It should be possible to eliminate
>> all allocation from this kind of code.
>>
>> On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <t...@breloff.com> wrote:
>>>
>>> I looked at your code for a grand total of 20 seconds, but the one thing
>>> I would check is whether using concrete types (Int) in your immutable
>>> helps, or maybe making it parametric:
>>>
>>> immutable AES_cipher_params{T<:Unsigned}
>>>     bits::T        # Cipher key length, bits
>>>     nk::T          # Number of 32-bit words, cipher key
>>>     nb::T          # Number of columns in State
>>>     nr::T          # Number of rounds
>>>     block_size::T  # byte length
>>>     block_x::T     # block dimensions X
>>>     block_y::T     # block dimensions Y
>>> end
>>>
>>> It's possible the compiler can't infer enough types because of this?
>>> Just a guess...
>>>
>>> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <corey....@gmail.com>
>>> wrote:
>>>>
>>>> The creator has descended from the shining heavens and responded to my
>>>> post. Cool.
>>>>
>>>> Here are some stats naively gathered from taking time() before and
>>>> after the run and subtracting. Second run is with gc_disable(). A 1.26x
>>>> speedup is noted.
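Since the timing comparison above hinges on gc_disable(), it may help to note that `@allocated` measures GC pressure directly rather than inferring it from wall-clock deltas. A hedged sketch in current Julia syntax (the functions are illustrative stand-ins, not from aes.jl):

```julia
add_byte(x) = x .+ 0x01                       # out-of-place: allocates a result array
add_byte!(y, x) = (y .= x .+ 0x01; nothing)   # in-place: reuses y, fused broadcast

function measure()
    x = zeros(UInt8, 16)
    y = similar(x)
    add_byte(x); add_byte!(y, x)              # warm up so compilation isn't counted
    (@allocated add_byte(x)), (@allocated add_byte!(y, x))
end

alloc_out, alloc_in = measure()               # alloc_out > 0, alloc_in == 0
```

Driving the per-block byte count reported by `@allocated` to zero is usually a better target than disabling the collector, since gc_disable() only defers the cost.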
>>>>
>>>> elapsed (s): 0.0916
>>>> Throughput, KB/s: 1705.97
>>>> Average time (μs) per iteration: 0.0
>>>> Estimated cycles / iteration @ 4.0 GHz: 36636.0
>>>>
>>>> elapsed (s): 0.0727
>>>> Throughput, KB/s: 2149.6
>>>> Average time (μs) per iteration: 7.2688
>>>> Estimated cycles / iteration @ 4.0 GHz: 29075.0
>>>>
>>>> I'd like to know how to minimize the effect of the garbage collector
>>>> and allocations. The algorithm is a tight loop with a handful of tiny
>>>> functions, none of which ought to require much allocation. A few
>>>> variables for placeholder data are inevitable. But I have read the
>>>> warnings discouraging the use of global state. What is the Julia way
>>>> to allocate a few bytes of scratch memory that can be accessed within
>>>> the scope of, say, apply_ECB_mode!() without having to be reallocated
>>>> each time gf_mult() or mix_columns!() are called?
>>>>
>>>> Also, can a function be forced to mutate variables passed to it,
>>>> eliminating a redundant assignment? Julia seems happy to do this with
>>>> Arrays but not with UInt primitives. Pass by reference (pointer)?
>>>> Does LLVM appreciate this?
>>>>
>>>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski
>>>> wrote:
>>>>>
>>>>> There's nothing obviously glaring here. I would definitely recommend
>>>>> using the built-in profiler to see where it's spending its time.
>>>>> There may be subtle type instabilities or some other non-obvious
>>>>> issue. You definitely ought to be able to get within striking
>>>>> distance of similar C code, which should itself be within the
>>>>> expected 4-10x of hand-coded assembly.
>>>>>
>>>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <corey....@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
>>>>>>
>>>>>> In the process of learning Julia (and crypto) I implemented the
>>>>>> Rijndael block cipher and inverse cipher.
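On the question above about mutating a passed-in primitive: the usual answer is `Ref`, a zero-dimensional mutable container that plays the role of the `PassMe` wrapper quoted earlier without defining a new type. A sketch in current Julia syntax:

```julia
# Ref boxes a scalar so a callee can write through it; this is the
# idiomatic substitute for pass-by-reference on bits types.
function change!(r::Ref{UInt8}, newval::UInt8)
    r[] = newval      # store through the reference
    return nothing
end

useme = Ref{UInt8}(0x03)
change!(useme, 0xff)
useme[]               # now 0xff
```

This is pass-by-sharing of a small heap box rather than a raw pointer; in recent Julia versions the compiler can sometimes elide the box entirely when the `Ref` does not escape.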
>>>>>> I tried to write idiomatic yet concise code, but the performance is
>>>>>> not very desirable. On my machine (i5-2500K @ 4.0 GHz) the
>>>>>> throughput is piddling, on the order of 10e6 bytes/sec, and memory
>>>>>> allocation is at 3056 bytes / block, which I have not been able to
>>>>>> cut down any further.
>>>>>>
>>>>>> Obviously I do not intend to compete with hand-tuned assembler
>>>>>> routines that heavily exploit SIMD and pre-computed tables, but my
>>>>>> completely unfounded gut feeling is that given the right input,
>>>>>> Julia should be able to approach within a factor of 4-10 without
>>>>>> such optimizations. Currently this routine is within a factor of
>>>>>> 1000.
>>>>>>
>>>>>> Any Julia experts out there willing to take a peek at the code and
>>>>>> offer some tips for idiomatic (i.e. within the framework of Julia
>>>>>> syntax and style) optimizations?
>>>>>>
>>>>>> In the course of doing this I have run into several gripes with
>>>>>> Julia, particularly some of the built-in functions, which are often
>>>>>> confusing or contradictory by virtue of the type declarations of
>>>>>> certain methods (or lack of needed ones). For instance, Julia does
>>>>>> not support negative indexing of arrays... so then why do so many
>>>>>> functions on arrays take only signed integer types for dimensions?
>>>>>> To the noobie it seems like an obvious choice to type data holding
>>>>>> the calculation of matrix dimensions or indices as unsigned
>>>>>> integers, given that the language does not support negative
>>>>>> indexing. Yet this fails unexpectedly in many built-ins such as
>>>>>> sub().
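On the scratch-memory question in the thread, a common pattern is to allocate the buffer once in the driver and thread it through the helpers, so the per-block functions never allocate. A hedged sketch in current Julia syntax — `apply_ecb!` and `process_block!` are illustrative stand-ins for the functions in aes.jl, and the XOR is placeholder "round work":

```julia
# Per-block worker: does all its writes into preallocated storage.
function process_block!(out::Vector{UInt8}, data::Vector{UInt8},
                        lo::Int, scratch::Vector{UInt8})
    @inbounds for i in 0:15
        scratch[i + 1] = data[lo + i] ⊻ 0xff   # stand-in for the real rounds
    end
    @inbounds for i in 0:15
        out[lo + i] = scratch[i + 1]
    end
    return nothing
end

# Driver: one 16-byte scratch allocation, reused for every block.
function apply_ecb!(out::Vector{UInt8}, data::Vector{UInt8})
    scratch = Vector{UInt8}(undef, 16)
    for blk in 1:div(length(data), 16)
        process_block!(out, data, (blk - 1) * 16 + 1, scratch)
    end
    return out
end

data = zeros(UInt8, 32)
out  = similar(data)
apply_ecb!(out, data)
```

Because the scratch buffer is a local of the driver rather than a global, it stays type-stable and thread-friendly, and the inner helpers allocate nothing per call. Note also that the indices are plain `Int`: Julia's array APIs are defined in terms of signed `Int`, so keeping sizes and indices signed (even though they are never negative) avoids the unsigned-index failures mentioned above.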