On Fri, Sep 11, 2015 at 11:56 PM, Jeffrey Sarnoff
<jeffrey.sarn...@gmail.com> wrote:
> The way to get  a non-global, global-like value is to wrap the whole thing
> as a module:
> module aes
> export rijndael
>
> not_a_global = true
> # .. functions use not_a_global ...
> end # module

Just to be clear, `not_a_global` is a global.
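
A minimal sketch of the distinction, in current Julia syntax. The module and function names are made up for illustration and are not from the code under discussion. A module-level binding is still a global; declaring it `const` is what lets the compiler rely on its type.

```julia
module AESSketch  # hypothetical module name

not_a_global = true   # still a global: a non-const module-level binding
const ROUNDS = 10     # const global: its type cannot change afterwards

# Reads the non-const global; the compiler cannot assume its type.
uses_global() = not_a_global ? 1 : 0

# Reads the const global; the result type can be inferred.
uses_const() = ROUNDS + 1

end # module
```

Wrapping the code in a module changes which namespace the binding lives in, not whether it is a global.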

>
> Wrapping a value in a type gets you a passable, mutable value via the field
> name:
>
>  julia> type PassMe
>           b::UInt8
>        end
> julia> useMe = PassMe(0x03); useMe.b
> 0x03
> julia> function changeMe!(pass::PassMe, newval::UInt8)
>           pass.b = newval
>           true # pass is not returned
>        end;
> julia> changeMe!(useMe, 0xff); useMe.b
> 0xff
>
>
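
For reference, a sketch of the same idea in current Julia syntax, where `type` is spelled `mutable struct`. The `Ref` variant is an addition of mine, not something proposed above; the function names are illustrative.

```julia
# Current Julia spells `type` as `mutable struct`.
mutable struct PassMe
    b::UInt8
end

# Mutates the field in place; the caller sees the change.
function change_me!(pass::PassMe, newval::UInt8)
    pass.b = newval
    return nothing
end

# Base.Ref is a built-in single-value container that can be used the same way.
function change_ref!(r::Ref{UInt8}, newval::UInt8)
    r[] = newval
    return nothing
end

use_me = PassMe(0x03)
change_me!(use_me, 0xff)

r = Ref{UInt8}(0x03)
change_ref!(r, 0xff)
```

Either way the function mutates a container holding the value rather than the primitive itself.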
> On Friday, September 11, 2015 at 11:34:58 PM UTC-4, Stefan Karpinski wrote:
>>
>> Oh yeah, I didn't notice that – having those be abstractly typed is going
>> to be a trainwreck for performance. It should be possible to eliminate all
>> allocation from this kind of code.
>>
>> On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <t...@breloff.com> wrote:
>>>
>>> I looked at your code for a grand total of 20 seconds, but the one thing
>>> I would check is whether using concrete types (Int) in your immutable, or
>>> maybe making it parametric, helps:
>>>
>>> immutable AES_cipher_params{T<:Unsigned}
>>>   bits::T # Cipher key length, bits
>>>   nk::T # Number of 32-bit words, cipher key
>>>   nb::T # Number of columns in State
>>>   nr::T # Number of rounds
>>>   block_size::T # byte length
>>>   block_x::T # block dimensions X
>>>   block_y::T # block dimensions Y
>>> end
>>>
>>> It's possible the compiler can't infer enough types because of this?
>>> Just a guess...
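
One way to check this guess, sketched in current Julia syntax (`struct` replaces `immutable`). The two type names below are hypothetical, cut-down stand-ins and not the actual definitions from aes.jl.

```julia
# Abstractly typed field: the compiler only knows `bits` is some Unsigned.
struct AbstractParams
    bits::Unsigned
end

# Parametric version: each instance carries a concrete field type.
struct ParametricParams{T<:Unsigned}
    bits::T
end

p = ParametricParams(0x00000080)   # an 8-digit hex literal is a UInt32

# fieldtype and isconcretetype report what the compiler can rely on.
abstract_ok   = isconcretetype(fieldtype(AbstractParams, :bits))
parametric_ok = isconcretetype(fieldtype(typeof(p), :bits))
```

Here `abstract_ok` is false and `parametric_ok` is true, which is the difference the suggestion above is aiming at.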
>>>
>>> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <corey....@gmail.com>
>>> wrote:
>>>>
>>>> The creator has descended from the shining heavens and responded to my
>>>> post.  Cool.
>>>>
>>>> Here are some stats naively gathered from taking time() before and after
>>>> the run and subtracting.  Second run is with gc_disable().  A 1.26x speedup
>>>> is noted.
>>>>
>>>> elapsed (s): 0.0916
>>>> Throughput, KB/s: 1705.97
>>>> Average time (μs) per iteration: 0.0
>>>> Estimated cycles / iteration @ 4.0 GHz: 36636.0
>>>>
>>>> elapsed (s): 0.0727
>>>> Throughput, KB/s: 2149.6
>>>> Average time (μs) per iteration: 7.2688
>>>> Estimated cycles / iteration @ 4.0 GHz: 29075.0
>>>>
>>>>
>>>> I'd like to know how to minimize the effect of the garbage collector and
>>>> allocations.  The algorithm is a tight loop with a handful of tiny
>>>> functions, none of which ought to require much allocation.  A few variables
>>>> for placeholder data are inevitable.  But I have read the warnings
>>>> discouraging the use of global state.  What is the Julia way to allocate a
>>>> few bytes of scratch memory that can be accessed within the scope of, say,
>>>> apply_ECB_mode!() without having to be reallocated each time gf_mult() or
>>>> mix_columns! are called?
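
One common pattern for this, as a rough sketch: allocate the scratch buffer once in the outer function and pass it down as an argument. The functions below are hypothetical stand-ins, and the XOR step is a placeholder, not a real AES operation.

```julia
# Hypothetical per-block transform that needs scratch space.
# `scratch` is allocated once by the caller and overwritten on every call.
function mix_block!(block::Vector{UInt8}, scratch::Vector{UInt8})
    copyto!(scratch, block)   # reuse the buffer instead of allocating a copy
    n = length(block)
    for i in 1:n
        # Placeholder arithmetic: XOR each byte with its cyclic neighbour.
        block[i] = scratch[i] ⊻ scratch[mod1(i + 1, n)]
    end
    return block
end

# Hypothetical outer loop, playing the role of the mode-level function.
function process!(blocks::Vector{Vector{UInt8}})
    scratch = Vector{UInt8}(undef, 16)   # one allocation for the whole run
    for b in blocks
        mix_block!(b, scratch)
    end
    return blocks
end

blocks = [fill(UInt8(i), 16) for i in 1:4]
process!(blocks)
```

The buffer stays local to `process!`, so no global state is involved, and the inner function allocates nothing per call.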
>>>>
>>>> Also, can a function be forced to mutate variables passed to it,
>>>> eliminating a redundant assignment?  Julia seems happy to do this with
>>>> Arrays but not unit primitives.  Pass by reference (pointer)?  Does the
>>>> LLVM appreciate this?
>>>>
>>>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski
>>>> wrote:
>>>>>
>>>>> There's nothing obviously glaring here. I would definitely recommend
>>>>> using the built-in profiler to see where it's spending its time. There may
>>>>> be subtle type instabilities or some other non-obvious issue. You
>>>>> definitely ought to be able to get within striking distance of similar C
>>>>> code, which should be within the expected 4-10x of hand-coded assembly.
>>>>>
>>>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <corey....@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
>>>>>>
>>>>>> In the process of learning Julia (and crypto) I implemented the
>>>>>> Rijndael block cipher and inverse cipher.  I tried to write idiomatic yet
>>>>>> concise code, but the performance is not very desirable.  On my machine
>>>>>> (i5-2500k @ 4.0 GHz) the throughput is piddling, on the order of 10e6
>>>>>> bytes/sec, and memory allocation is at 3056 bytes / block, which I have
>>>>>> not been able to cut down any further.
>>>>>>
>>>>>> Obviously I do not intend to compete with hand-tuned assembler
>>>>>> routines that heavily exploit SIMD and pre-computed tables, but my
>>>>>> completely unfounded gut feeling is that given the right input, Julia
>>>>>> should be able to approach within a factor of 4-10 without such
>>>>>> optimizations.
>>>>>> Currently this routine is within a factor of 1000.
>>>>>>
>>>>>> Any Julia experts out there willing to take a peek at the code and
>>>>>> offer some tips for idiomatic (i.e. within the framework of Julia syntax
>>>>>> and style) optimizations?
>>>>>>
>>>>>> In the course of doing this I have run into several gripes with Julia,
>>>>>> particularly some of the built-in functions which are often confusing or
>>>>>> contradictory by virtue of the type declarations of certain methods (or
>>>>>> lack of needed ones).  For instance, Julia does not support negative
>>>>>> indexing of arrays... so then why do so many functions on arrays take
>>>>>> only signed integer types for dimensions?  To the noobie it seems like an
>>>>>> obvious choice to type data holding the calculation of matrix dimensions
>>>>>> or indices as unsigned integers, given that the language does not support
>>>>>> negative indexing.  Yet this fails unexpectedly in many built-ins such as
>>>>>> sub().
>>>>>>
>>>>>>
>>>>>
>>>
>>
>
