Re: [julia-users] Re: PSA: Changes to Base Docstrings

2015-08-08 Thread Joshua Adelman
Public service announcement. It's actually on that Wikipedia page under the
"other" section:
https://en.m.wikipedia.org/wiki/Public_service_announcement

Josh

On Aug 8, 2015, at 8:15 AM, Uwe Fechner  wrote:

Hello,
what does PSA in the title mean?

Wikipedia didn't help: https://en.wikipedia.org/wiki/Psa

Uwe

Am Sonntag, 2. August 2015 21:22:03 UTC+2 schrieb Mike Innes:
>
> Hi All,
>
> As of #11943 , Julia uses
> the shiny new doc system to provide help for functions in the REPL.
> Previously, docstrings were stored in ReStructured text files
>  as part of
> the manual, but now all docstrings are kept in Base
>  and
> the rst docs are automatically generated.
>
> What this means immediately is that any updates to Base docstrings must
> happen in the helpdb.jl file
> , as
> opposed to the RST docs. Note that it's still fine to edit the main body of
> the manual; only function docstrings, which can be identified by the `..
> function::` syntax, are affected by this change.
>
> Right now everything is in one big file, but we'd like to move the
> docstrings to appropriate places in Base and convert them to Markdown.
> We'll need some help with that, as there are a *lot* of docstrings, but
> when we're ready we'll send out a call for help so that people can get
> involved.
>
> There's still a lot of work to do, but I think this will be a big step
> forward for documentation in Julia.
>
> – Mike
>


[julia-users] Memory mapping composite type + fixed length strings

2014-11-21 Thread Joshua Adelman
I'm playing around with Julia for the first time in an attempt to see if I 
can replace a Python + Cython component of a system I'm building. Basically 
I have a file of bytes representing a numpy structured/recarray (in memory 
this is an array of structs). This gets memory mapped into a numpy array as 
(Python code):

f = open(data_file, 'r+')
cmap = mmap.mmap(f.fileno(), nbytes)
data_array = np.ndarray(size, dtype=dtype, buffer=cmap)


where dtype=[('x', np.int32), ('y', np.float64), ('name', 'S17')].

In cython I would create a C packed struct and to deal with the fixed 
length string elements, I would specify them as char[N] arrays:

cdef packed struct atype:
np.int32_t x
np.float64 y
char[17] name

I'm trying to figure out how I would accomplish something similar in Julia. 
Setting aside the issue of the fixed length strings for a moment, I thought 
to initially create a composite type:

immutable AType
x::Int32
y::Float64
name::???
end

and then if I had an file containing 20 records use:

f = open("test1.dat", "r")
data = mmap_array(AType, 20, f)

but I get an error:

ERROR: `mmap_array` has no method matching mmap_array(::Type{AType}, 
::Int64, ::IOStream)

Is there a way to memory map a file into an array of custom 
records/composite types in Julia? And if there is, how should one represent 
the fixed length string fields?

Any suggestions would be much appreciated.

Josh



Re: [julia-users] Memory mapping composite type + fixed length strings

2014-11-21 Thread Joshua Adelman
Hi Tim,

Thanks for pointing out my basic error. I can now get some test files read 
that don't contain any string components.

Josh



On Friday, November 21, 2014 3:11:32 PM UTC-5, Tim Holy wrote:
>
> You'll see why if you type `methods(mmap_array)`: the dims has to be 
> represented as a tuple. 
>
> Currently, the only way I know of to create a fixed-sized buffer as an 
> element 
> of a "struct" in julia is via immutables with one field per object. Here's 
> one 
> example: 
>
> https://github.com/JuliaGPU/CUDArt.jl/blob/1742a19b35a52ecec4ee14cfbec823f8bcb22e0f/gen/gen_libcudart_h.jl#L403-L660
>  
>
> It has not escaped notice that this is less than ideal :-). 
>
> --Tim 
>
> On Friday, November 21, 2014 11:57:10 AM Joshua Adelman wrote: 
> > I'm playing around with Julia for the first time in an attempt to see if 
> I 
> > can replace a Python + Cython component of a system I'm building. 
> Basically 
> > I have a file of bytes representing a numpy structured/recarray (in 
> memory 
> > this is an array of structs). This gets memory mapped into a numpy array 
> as 
> > (Python code): 
> > 
> > f = open(data_file, 'r+') 
> > cmap = mmap.mmap(f.fileno(), nbytes) 
> > data_array = np.ndarray(size, dtype=dtype, buffer=cmap) 
> > 
> > 
> > where dtype=[('x', np.int32), ('y', np.float64), ('name', 'S17')]. 
> > 
> > In cython I would create a C packed struct and to deal with the fixed 
> > length string elements, I would specify them as char[N] arrays: 
> > 
> > cdef packed struct atype: 
> > np.int32_t x 
> > np.float64 y 
> > char[17] name 
> > 
> > I'm trying to figure out how I would accomplish something similar in 
> Julia. 
> > Setting aside the issue of the fixed length strings for a moment, I 
> thought 
> > to initially create a composite type: 
> > 
> > immutable AType 
> > x::Int32 
> > y::Float64 
> > name::??? 
> > end 
> > 
> > and then if I had an file containing 20 records use: 
> > 
> > f = open("test1.dat", "r") 
> > data = mmap_array(AType, 20, f) 
> > 
> > but I get an error: 
> > 
> > ERROR: `mmap_array` has no method matching mmap_array(::Type{AType}, 
> > 
> > ::Int64, ::IOStream) 
> > 
> > Is there a way to memory map a file into an array of custom 
> > records/composite types in Julia? And if there is, how should one 
> represent 
> > the fixed length string fields? 
> > 
> > Any suggestions would be much appreciated. 
> > 
> > Josh 
>
>

Re: [julia-users] Memory mapping composite type + fixed length strings

2014-11-22 Thread Joshua Adelman
For testing/evaluation purposes, it's actually the case that I don't need 
to actually use any of the fixed length string fields. They are in the 
data, but I have numerical encodings for most of the important ones in 
other fields. So in playing around I found that I could just create a 
bitstype of the appropriate size to consume them when using test data. 

However the caveat to this strategy is that numpy's internal memory layout 
for a record array (and when it's saved to a file), is that the records are 
packed structs. So a record with fields float64, int32, float32, int16 has 
an itemsize of 18 bytes. If I build an immutable composite type in Julia:

immutable TestType
a::Float64
b::Int32
c::Float32
d::Int16
end

And then query the size of each field and the the aggregate as a whole:

println(sizeof(Float64))
println(sizeof(Int32))
println(sizeof(Float32))
println(sizeof(Int16))
println(sizeof(TestType))

I get:

8
4
4
2
24

So it looks like Julia is padding the internal layout of `TestType` (I was 
hoping this wasn't the case based on some of the language in:
http://julialang.org/blog/2013/03/efficient-aggregates/). 

If I put dummy fields to mimic the padding when I create some test data, I 
can get everything to work just fine on the Julia side. However, if for 
real data I assume that I get a file and can't pick packed vs padded, is 
there any way on the Julia side to specified that my immutable type's 
memory layout should be packed? Or perhaps some other workaround? I 
couldn't find any promising leads on this reading the documentation and 
searching through the Github issues. In cython this is trivial to do since 
you can define an actual struct as `cdef packed struct`. 

I forgot to mention it when I posted originally, but I'm currently using 
Julia v0.3.2 on OS X.

Any suggestions would be appreciated.

Josh


On Friday, November 21, 2014 3:11:32 PM UTC-5, Tim Holy wrote:
>
> You'll see why if you type `methods(mmap_array)`: the dims has to be 
> represented as a tuple. 
>
> Currently, the only way I know of to create a fixed-sized buffer as an 
> element 
> of a "struct" in julia is via immutables with one field per object. Here's 
> one 
> example: 
>
> https://github.com/JuliaGPU/CUDArt.jl/blob/1742a19b35a52ecec4ee14cfbec823f8bcb22e0f/gen/gen_libcudart_h.jl#L403-L660
>  
>
> It has not escaped notice that this is less than ideal :-). 
>
> --Tim 
>
> On Friday, November 21, 2014 11:57:10 AM Joshua Adelman wrote: 
> > I'm playing around with Julia for the first time in an attempt to see if 
> I 
> > can replace a Python + Cython component of a system I'm building. 
> Basically 
> > I have a file of bytes representing a numpy structured/recarray (in 
> memory 
> > this is an array of structs). This gets memory mapped into a numpy array 
> as 
> > (Python code): 
> > 
> > f = open(data_file, 'r+') 
> > cmap = mmap.mmap(f.fileno(), nbytes) 
> > data_array = np.ndarray(size, dtype=dtype, buffer=cmap) 
> > 
> > 
> > where dtype=[('x', np.int32), ('y', np.float64), ('name', 'S17')]. 
> > 
> > In cython I would create a C packed struct and to deal with the fixed 
> > length string elements, I would specify them as char[N] arrays: 
> > 
> > cdef packed struct atype: 
> > np.int32_t x 
> > np.float64 y 
> > char[17] name 
> > 
> > I'm trying to figure out how I would accomplish something similar in 
> Julia. 
> > Setting aside the issue of the fixed length strings for a moment, I 
> thought 
> > to initially create a composite type: 
> > 
> > immutable AType 
> > x::Int32 
> > y::Float64 
> > name::??? 
> > end 
> > 
> > and then if I had an file containing 20 records use: 
> > 
> > f = open("test1.dat", "r") 
> > data = mmap_array(AType, 20, f) 
> > 
> > but I get an error: 
> > 
> > ERROR: `mmap_array` has no method matching mmap_array(::Type{AType}, 
> > 
> > ::Int64, ::IOStream) 
> > 
> > Is there a way to memory map a file into an array of custom 
> > records/composite types in Julia? And if there is, how should one 
> represent 
> > the fixed length string fields? 
> > 
> > Any suggestions would be much appreciated. 
> > 
> > Josh 
>
>

Re: [julia-users] Memory mapping composite type + fixed length strings

2014-11-22 Thread Joshua Adelman


On Saturday, November 22, 2014 6:31:27 PM UTC-5, Patrick O'Leary wrote:
>
> On Saturday, November 22, 2014 9:57:51 AM UTC-6, Isaiah wrote:
>>
>> Have you looked at StrPack.jl? It may have a packed option. Julia uses 
>> the platform ABI padding rules for easy interop with C.
>>
>
> Yes, you can used the align_packed strategy.
>

Hi Patrick,

I just checked out StrPack and installed it. I think I have it working 
properly in the case of a packed immutable type that just contains 
numerical fields. I'm still not sure how to make things work to unpack 
fixed length strings. If my string in field "e" is a fixed-length of 4, 
then my composite type looks like

@struct immutable TestType2
a::Float64
b::Int32
c::Float32
d::Int16
e::Array{Uint8,1}(4)   # also tried ASCIIString(4)
end

And then I know my data on file has 10 records, I can loop through as:

data = Array(TestType2, (10,))

f = open("test1.dat", "r")
for k = 1:10
data[k] = unpack(f, TestType2, {"a"=>8, "b"=>4,"c"=>4, "d"=>2, "e"=>4}, 
align_packed, :NativeEndian)
println(data[k])
end

But Array{Uint8,1}(4) results in corrupted values after the first record is 
read in and ASCIIString(4) gives an error:

TestType2(-1.2538434657806456,0,0.5783242f0,0,"a")
ERROR: invalid ASCII sequence

Where in the first record that appears to be read correctly (at least it 
prints), the last field value should be "abcd" not "a". Am I missing 
something obvious? Also, is there a strategy for combining all of this with 
a memory mapped file?

I really appreciate all of the suggestions everyone has offered. It's my 
first foray into Julia and I'm guessing I'm starting out with something a 
little atypical. I'm hoping if I can get the data read then everything else 
should be much more straight forward. 



[julia-users] Re: do I misunderstand zip, or is it not referentially transparent?

2014-12-21 Thread Joshua Adelman


On Sunday, December 21, 2014 10:36:46 PM UTC-5, Ronald L. Rivest wrote:
>
> I find the following behavior of zip extremely confusing;
> either I am misunderstanding something, or perhaps there
> is a bug in the implementation of zip(?) [I would expect
> the first and last expressions to give the same result!] :
>
>
> julia> [t for t in zip([2,3,4],[6,7,8])]
> 3-element Array{(Int64,Int64),1}:
>  (2,6)
>  (3,7)
>  (4,8)
>
> julia> x = [2,3,4]
> 3-element Array{Int64,1}:
>  2
>  3
>  4
>
> julia> y = [6,7,8]
> 3-element Array{Int64,1}:
>  6
>  7
>  8
>
> julia> [t for t in zip(x,y)]
> 1-element Array{(Any,Any),1}:
> (4,7)
>
>
> Can anyone explain this behavior??
>
> Thanks!
>
> Cheers,
> Ron Rivest 
>

Your second example seemed to work properly for me using v0.3.3 on OS X:

julia> x = [2,3,4]
3-element Array{Int64,1}:
 2
 3
 4

julia> y = [6,7,8]
3-element Array{Int64,1}:
 6
 7
 8

julia> [t for t in zip(x,y)]
3-element Array{(Any,Any),1}:
 (2,6)
 (3,7)
 (4,8) 

Josh


[julia-users] Converting string to module type/Programmatically using modules

2014-12-29 Thread Joshua Adelman
I'm attempting to do some dynamic module loading and subsequent processing 
and can't quite figure something out. I think I have a reasonable (albeit 
maybe not idiomatic) mechanism for dynamically importing a set of modules. 
See my stackoverflow question and the subsequent self answer:

http://stackoverflow.com/q/27696356/392949

My next question is, after I've dynamically/programmatically imported a 
bunch of modules, is there a way of iterating over that set of modules? 
Specifically, if I store the module names as strings (e.g. ["mod00", 
"mod01", "mod02"]) and each module contains some function `func` that isn't 
exported, how would I call `mod00.func()` if I only know the string 
"mod00"? I think this comes down to converting the string to a Module type, 
but I'm not sure and haven't come up with something workable after a few 
passes through the docs.

Any suggestions would be very much appreciated.

Josh


Re: [julia-users] Converting string to module type/Programmatically using modules

2014-12-29 Thread Joshua Adelman
Thanks Isaiah. That helps a lot and I think I'm starting to understand what 
those calls are doing. I'm coming from a python background, so I don't have 
much experience with metaprogramming. I'm prototyping port of a complex 
system that I had previously written in python/cython to Julia to see how 
Julia performs.

The basic idea is to dynamically import a bunch of modules that each 
contain some metadata and an implementation of a feature to transform some 
data. The metadata will be parsed and inserted into a composite type that 
also contains a ::Function field that will hold the implementing function. 
A collection of these composite types will then be systematically run 
against some large set of data. 

The dynamic import is to provide something like a plugin system and also to 
control the startup cost by only importing what is needed for a particular 
calculation, rather than every available module. 

Thanks again for your help. I'm having a lot (too much?) fun with Julia 
thus far.

Josh

On Monday, December 29, 2014 11:19:51 PM UTC-5, Isaiah wrote:
>
> Regarding your SO question, the suggested way to do this is with a macro, 
> but (as you observed) it is a bit tricky. Whenever I try to do this I have 
> to remind myself how it works by using parse and dump to see how the 
> expression should look:
>
> ```
> julia> parse("import foo")
> :(import foo)
>
> julia> dump(x)
> Expr 
>   head: Symbol import
>   args: Array(Any,(1,))
> 1: Symbol foo
>   typ: Any
> ```
>
> So then I can duplicate that and eval it, for example:
> ```
> julia> eval(Expr(:import, :Color))
> ```
>
> module contains some function `func` that isn't exported, how would I call 
>> `mod00.func()` if I only know the string "mod00"
>
>
> ```
> julia> module mod00
>func() = println("hello")
>end
>
> julia> eval(symbol("mod00")).(symbol("func"))()
> hello
> ```
>
> I'm not sure what the overall goal is here, but just beware that eval 
> should only be used like this for testing purposes or else your code will 
> be slow.
>  
>
> On Mon, Dec 29, 2014 at 10:45 PM, Joshua Adelman  > wrote:
>
>> I'm attempting to do some dynamic module loading and subsequent 
>> processing and can't quite figure something out. I think I have a 
>> reasonable (albeit maybe not idiomatic) mechanism for dynamically importing 
>> a set of modules. See my stackoverflow question and the subsequent self 
>> answer:
>>
>> http://stackoverflow.com/q/27696356/392949
>>
>> My next question is, after I've dynamically/programmatically imported a 
>> bunch of modules, is there a way of iterating over that set of modules? 
>> Specifically, if I store the module names as strings (e.g. ["mod00", 
>> "mod01", "mod02"]) and each module contains some function `func` that isn't 
>> exported, how would I call `mod00.func()` if I only know the string 
>> "mod00"? I think this comes down to converting the string to a Module type, 
>> but I'm not sure and haven't come up with something workable after a few 
>> passes through the docs.
>>
>> Any suggestions would be very much appreciated.
>>
>> Josh
>>
>
>

[julia-users] Micro-benchmarking Vandermonde matrix generation

2015-01-03 Thread Joshua Adelman
I'm a long time Python/Numpy user and am starting to play around with Julia 
a bit. To get a handle on the language and how to write fast code, I've 
been implementing some simple functions and then trying to performance tune 
them. One such experiment involved generating Vandermonde matrices and then 
comparing timings vs the method that numpy supplies (np.vander). The code 
for two simple implementations that I wrote, plus the method supplied by 
MatrixDepot.jl are in this notebook along with timings for each and the 
corresponding timing for the numpy method on the same machine. 

http://nbviewer.ipython.org/gist/synapticarbors/26910166ab775c04c47b

Generally Julia fares pretty well against numpy here, but does not 
necessarily match or beat it over all array sizes. The methods I wrote are 
similar to the numpy implementation and are typically faster than what's in 
MatrixDepot.jl, but I was hoping someone with a bit more experience in 
Julia might have some further tricks that would be educational to see. I've 
already looked at the Performance Tips section of the documentation, and I 
think I'm following best practices. 

Suggestions and feedback are appreciated as always.

Josh

PS - maybe it's my experience with numpy, but I wonder if any thought has 
been given to following more of a numpy-style convention of always 
returning a view of an array when possible rather than a copy? This is 
often a source of confusion with new numpy users, but once you realize that 
generally, if the slice or transformation can be represented by a fixed 
stride through the data, then you get a view, it's pretty intuitive. 


Re: [julia-users] Micro-benchmarking Vandermonde matrix generation

2015-01-03 Thread Joshua Adelman


On Saturday, January 3, 2015 11:56:20 PM UTC-5, Jiahao Chen wrote:
>
>
> On Sat, Jan 3, 2015 at 11:27 PM, Joshua Adelman  > wrote:
>
>> PS - maybe it's my experience with numpy, but I wonder if any thought has 
>> been given to following more of a numpy-style convention of always 
>> returning a view of an array when possible rather than a copy? 
>
>
> Search the Github issues for ArrayViews.
>
> All 3 versions of your Julia code have assignments of the form
>
> v2[:, k] = p
>
> which could be faster if you explicitly devectorize.
>
> Thanks,
>
> Jiahao Chen
> Staff Research Scientist
> MIT Computer Science and Artificial Intelligence Laboratory
>

Hi Jiahao,

Actually when I devectorized vander2 (explicitly looping over the first 
dimension as the inner-most loop) as suggested, while cases where N < 100 
are indeed faster, for N > 100 the devectorized code is slower.

Josh 


Re: [julia-users] Speeding up Floating Point Operations in Julia

2015-01-06 Thread Joshua Adelman

When I just stick this whole thing in a function (as is recommended by the 
performance tips section of the docs), it goes from 0.03 seconds to 1.2e-6 
seconds. Literally:

function testf()

end

I’ve just started playing around with Julia myself, and I’ve definitely 
appreciated that there is, what feels like, a very small set of rules to follow 
to get good performance. 

Josh

On January 6, 2015 at 8:38:44 PM, Rodolfo Santana (santana9...@gmail.com) wrote:

# Electron mass in grams
m_e = 9.1094e-28 
# Speed of light in cm/sec
c = 2.9979e10 

# Electron momentum direction in Cartesian coordinates
v10elec = 1.0/sqrt(2.0)
v20elec = -1.0/sqrt(3.0)
v30elec = 1.0/sqrt(6.0)

# Photon momentum direction in Cartesian coordinates
omega_one_phot = -1.0/sqrt(3.0) 
omega_two_phot = 1.0/sqrt(4.0) 
omega_three_phot = sqrt(5.0/12.0) 

# Dimensionless electron speed and Lorentz factor
beta_e = 0.98 ; 
gamma_e = 1.0/sqrt(1-beta_e*beta_e)

# Photon energy in ergs
E_phot_comv = 1.6022e-9

# Angle between electron and photon momentum 
mu = v10elec*omega_one_phot + v20elec*omega_two_phot + v30elec*omega_three_phot

# Dimensionless energy of photon in electron rest frame
tic()
x = (2.0*gamma_e*E_phot_comv*(1-mu*beta_e))/(m_e*c*c)
toc()

Re: [julia-users] Speeding up Floating Point Operations in Julia

2015-01-06 Thread Joshua Adelman
Here's what I did:

function testf()
   # Electron mass in grams
   m_e = 9.1094e-28
   # Speed of light in cm/sec
   c = 2.9979e10

   # Electron momentum direction in Cartesian coordinates
   v10elec = 1.0/sqrt(2.0)
   v20elec = -1.0/sqrt(3.0)
   v30elec = 1.0/sqrt(6.0)

   # Photon momentum direction in Cartesian coordinates
   omega_one_phot = -1.0/sqrt(3.0)
   omega_two_phot = 1.0/sqrt(4.0)
   omega_three_phot = sqrt(5.0/12.0)

   # Dimensionless electron speed and Lorentz factor
   beta_e = 0.98 ;
   gamma_e = 1.0/sqrt(1-beta_e*beta_e)

   # Photon energy in ergs
   E_phot_comv = 1.6022e-9

   # Angle between electron and photon momentum
   mu = v10elec*omega_one_phot + v20elec*omega_two_phot + v30elec*
omega_three_phot

   # Dimensionless energy of photon in electron rest frame
   x = (2.0*gamma_e*E_phot_comv*(1-mu*beta_e))/(m_e*c*c)
 end


and then to time it, I used:

julia> @time testf()
elapsed time: 0.002718398 seconds (172020 bytes allocated)
0.028022594193490992


julia> @time testf()
elapsed time: 2.838e-6 seconds (96 bytes allocated)
0.028022594193490992

Notice that the first time it ran the time includes the time it takes for 
Julia to JIT the code, but then all subsequent runs are much faster since 
they're using the compiled/specialized code. 

Josh

On Tuesday, January 6, 2015 9:11:55 PM UTC-5, Rodolfo Santana wrote:
>
> Thanks for the replies, I really appreciate it! I don't fully understand 
> the solution tough. I did
>
> function testf()
> My Code
> end
>
> Then, when I do the commands shown below, I still get 1e-3 seconds. 
>
> tic()
> testf()
> toc()
>
>
>
> On Tuesday, January 6, 2015 7:48:02 PM UTC-6, Joshua Adelman wrote:
>>
>>
>> When I just stick this whole thing in a function (as is recommended by 
>> the performance tips section of the docs), it goes from 0.03 seconds to 
>> 1.2e-6 seconds. Literally:
>>
>> function testf()
>> 
>> end
>>
>> I’ve just started playing around with Julia myself, and I’ve definitely 
>> appreciated that there is, what feels like, a very small set of rules to 
>> follow to get good performance. 
>>
>> Josh
>>
>> On January 6, 2015 at 8:38:44 PM, Rodolfo Santana (santa...@gmail.com) 
>> wrote:
>>
>> # Electron mass in grams
>> m_e = 9.1094e-28 
>> # Speed of light in cm/sec
>> c = 2.9979e10 
>>
>> # Electron momentum direction in Cartesian coordinates
>> v10elec = 1.0/sqrt(2.0)
>> v20elec = -1.0/sqrt(3.0)
>> v30elec = 1.0/sqrt(6.0)
>>
>> # Photon momentum direction in Cartesian coordinates
>> omega_one_phot = -1.0/sqrt(3.0) 
>> omega_two_phot = 1.0/sqrt(4.0) 
>> omega_three_phot = sqrt(5.0/12.0) 
>>
>> # Dimensionless electron speed and Lorentz factor
>> beta_e = 0.98 ; 
>> gamma_e = 1.0/sqrt(1-beta_e*beta_e)
>>
>> # Photon energy in ergs
>> E_phot_comv = 1.6022e-9
>>
>> # Angle between electron and photon momentum 
>> mu = v10elec*omega_one_phot + v20elec*omega_two_phot + 
>> v30elec*omega_three_phot
>>
>> # Dimensionless energy of photon in electron rest frame
>> tic()
>> x = (2.0*gamma_e*E_phot_comv*(1-mu*beta_e))/(m_e*c*c)
>> toc()
>>
>>

Re: [julia-users] Speeding up Floating Point Operations in Julia

2015-01-06 Thread Joshua Adelman
My understanding from the style guide and general recommendations is that you 
shouldn’t have a single monolithic function that gets called once. Instead it’s 
idiomatic and performant to compose complex algorithms from many concise 
functions that are called repeatedly. If one of those functions is called 
within a loop with many iterations, than that first call should be negligible 
compared to the total runtime.

As an added benefit, small functions that do one well-defined thing also make 
testing much easier.

Josh

On January 6, 2015 at 9:34:39 PM, K leo (cnbiz...@gmail.com) wrote:

Might be slightly off-topic, but closely related.

Does anyone find the logic to run a code first just to compile it and then do 
the real run afterwards somewhat flawed, or am I missing anything?

Suppose I have a code that takes a day to finish after being compiled.  So the 
first run (since it is being compiled) might take say 5 days.  But after that 5 
days I have got the results and there is no need to run it the second time.  So 
the supposedly fast execution after compile is not going to be necessary 
anyway, and hence provides no benefits.

On Wednesday, January 7, 2015, Christoph Ortner  
wrote:
Maybe run 

test()

then 

tic()
testf()
toc()

so that the code is compiled first? Just a guess

   Christoph




Re: [julia-users] numpy vs julia benchmarking for random matrix-vector multiplication

2015-01-08 Thread Joshua Adelman
numpy.dot is calling BLAS under the hood so it's calling fast code and I 
wouldn't expect Julia to shine against it. Try calling numpy methods that 
aren't thin wrappers around C and you should see a bigger difference. Or 
implement a larger more complex algorithm. Here's a simple micro-benchmark I 
did recently of Julia vs np.vander:

http://nbviewer.ipython.org/gist/synapticarbors/26910166ab775c04c47b

Not large, but maybe a bit more illustrative.

Josh


On Jan 8, 2015, at 1:27 PM, Dakota St. Laurent wrote:

> hi all, I've been trying to test some simple benchmarks for my new job to see 
> what language we should use between Python (Numpy/Scipy) and Julia. I like 
> how simple it seems for Julia to do things in parallel (we plan to be running 
> code on a supercomputer using lots and lots of cores), but I'm not getting 
> the ideal benchmarks. I'm sure I'm doing something wrong here.
> 
> Python code:
> 
> import time, numpy as np
> N = 25000
> A = np.random.rand(N,N)
> x = np.random.rand(N)
> 
> t0 = time.clock()
> A.dot(x)
> print time.clock() - t0
> 
> 
> 
> Julia code:
> 
> function rand_mat_vec_mul(A::Array{Float64, 2}, x::Array{Float64,1})
>   tic()
>   A * x
>   toc()
> end
> 
> # warmup
> rand_mat_vec_mul(rand(1000,1000), rand(1000))
> rand_mat_vec_mul(rand(1000,1000), rand(1000))
> rand_mat_vec_mul(rand(1000,1000), rand(1000))
> 
> # timing
> rand_mat_vec_mul(rand(25000,25000), rand(25000))
> 
> ---
> 
> Python generally takes about 0.630 - 0.635 seconds, Julia generally takes 
> about 0.640 - 0.650 seconds. as I said, I'm sure I'm doing something wrong, 
> I'm just not really sure what. any help is appreciated :)



Re: [julia-users] numpy vs julia benchmarking for random matrix-vector multiplication

2015-01-08 Thread Joshua Adelman
You're reading it correctly. I'm not sure exactly why numpy is performing 
better for the larger matrices, but I suspect that numpy's accumulate function 
that np.vander uses may be taking advantage of simd, sse or mkl optimizations. 
I'd have to do a bit of digging though to confirm that. I also haven't 
experimented with Julia's @inbounds or @simd operators, but that might help. I 
also believe that some of the subarray stuff might be better optimized in Julia 
0.4 vs 0.3.4.

Josh


On Jan 8, 2015, at 3:21 PM, Dakota St. Laurent wrote:

> hey Josh,
> 
> that makes sense to me. your benchmark, though, I may not be understanding; 
> it looks as if Julia is slower for larger matrices. is this true, or am I 
> just going crazy and not able to properly read graphs anymore?
> 
> On Thursday, January 8, 2015 at 12:46:29 PM UTC-6, Joshua Adelman wrote:
> numpy.dot is calling BLAS under the hood so it's calling fast code and I 
> wouldn't expect Julia to shine against it. Try calling numpy methods that 
> aren't thin wrappers around C and you should see a bigger difference. Or 
> implement a larger more complex algorithm. Here's a simple micro-benchmark I 
> did recently of Julia vs np.vander:
> 
> http://nbviewer.ipython.org/gist/synapticarbors/26910166ab775c04c47b
> 
> Not large, but maybe a bit more illustrative.
> 
> Josh
> 
> 
> On Jan 8, 2015, at 1:27 PM, Dakota St. Laurent wrote:
> 
>> hi all, I've been trying to test some simple benchmarks for my new job to 
>> see what language we should use between Python (Numpy/Scipy) and Julia. I 
>> like how simple it seems for Julia to do things in parallel (we plan to be 
>> running code on a supercomputer using lots and lots of cores), but I'm not 
>> getting the ideal benchmarks. I'm sure I'm doing something wrong here.
>> 
>> Python code:
>> 
>> import time, numpy as np
>> N = 25000
>> A = np.random.rand(N,N)
>> x = np.random.rand(N)
>> 
>> t0 = time.clock()
>> A.dot(x)
>> print time.clock() - t0
>> 
>> 
>> 
>> Julia code:
>> 
>> function rand_mat_vec_mul(A::Array{Float64, 2}, x::Array{Float64,1})
>>   tic()
>>   A * x
>>   toc()
>> end
>> 
>> # warmup
>> rand_mat_vec_mul(rand(1000,1000), rand(1000))
>> rand_mat_vec_mul(rand(1000,1000), rand(1000))
>> rand_mat_vec_mul(rand(1000,1000), rand(1000))
>> 
>> # timing
>> rand_mat_vec_mul(rand(25000,25000), rand(25000))
>> 
>> ---
>> 
>> Python generally takes about 0.630 - 0.635 seconds, Julia generally takes 
>> about 0.640 - 0.650 seconds. as I said, I'm sure I'm doing something wrong, 
>> I'm just not really sure what. any help is appreciated :)
> 



Re: [julia-users] numpy vs julia benchmarking for random matrix-vector multiplication

2015-01-08 Thread Joshua Adelman
Hi Jiahao,

Just a small note - based on your comments in the thread you mentioned, I ended 
up changing my test to just multiply ones to avoid over/underflow. Those are 
the results that are now in that notebook, so that shouldn't be an issue in the 
plotted timings. On the python side, I'm using Numpy 1.9 via the Anaconda 
distribution built against MKL.

Josh


On Jan 8, 2015, at 4:15 PM, Jiahao Chen wrote:

> As Stefan wrote, all you are really doing with larger matrix tests is testing 
> the speed of the different BLAS implementations being used by your 
> distributions of Julia and NumPy.
> 
> As I wrote in the other thread
> 
> https://groups.google.com/d/msg/julia-users/Q96aPufg4S8/IBU9hW0xvWYJ
> 
> the Vandermonde matrix generation is significantly faster for me in Julia 
> than in Python (numpy using reference BLAS).
> 
> Furthermore Vandermonde is not a good test with larger matrix sizes since you 
> are basically testing the speed of multiplying things by infinity, which may 
> not be representative of typical computations as it may incur overhead from 
> handling overflows.
> 
> For n=1 I get the following results:
> 
> Macbook Julia (Core(TM) i5-4258U)
> +/-1 matrix: 1.22s
> [1:n] matrix: 1.21s #mostly overflow
> rand matrix: 2.95s #mostly underflow
> 
> Macbook NumPy
> +/-1 matrix: 3.96s
> [1:n] matrix: 5.04s #mostly overflow
> rand matrix: 5.46s #mostly underflow
> 
> Linux Julia (Xeon(R) E7- 8850)
> +/-1 matrix: 2.18s
> [1:n] matrix: 1.89s #mostly overflow
> rand matrix: 4.36s #mostly underflow
> 
> Linux NumPy
> +/-1 matrix: 9.38s
> [1:n] matrix: 10.64s #mostly overflow *
> rand matrix: 32.30s #mostly underflow
> 
> * emits warnings:
> /usr/lib/python2.7/dist-packages/numpy/lib/twodim_base.py:515: 
> RuntimeWarning: overflow encountered in power
>   X[:, i] = x**(N - i - 1)
> /usr/lib/python2.7/dist-packages/numpy/lib/twodim_base.py:515: 
> RuntimeWarning: invalid value encountered in power
>   X[:, i] = x**(N - i - 1)



Re: [julia-users] numpy vs julia benchmarking for random matrix-vector multiplication

2015-01-08 Thread Joshua Adelman
No, I'm just using the 0.3.4 release build from the website. I don't have a 
personal MKL license. The python build is from Continuum via their Anaconda 
distribution with the accelerate add-on (free for academics).

Josh



On Jan 8, 2015, at 4:21 PM, Jiahao Chen wrote:

> Thanks for the update. Are you also building Julia with MKL also then?
> 
> On Thu Jan 08 2015 at 4:19:48 PM Joshua Adelman  
> wrote:
> Hi Jiahao,
> 
> Just a small note - based on your comments in the thread you mentioned, I 
> ended up changing my test to just multiply ones to avoid over/underflow. 
> Those are the results that are now in that notebook, so that shouldn't be an 
> issue in the plotted timings. On the python side, I'm using Numpy 1.9 via the 
> Anaconda distribution built against MKL.
> 
> Josh
> 
> 
> On Jan 8, 2015, at 4:15 PM, Jiahao Chen wrote:
> 
> > As Stefan wrote, all you are really doing with larger matrix tests is 
> > testing the speed of the different BLAS implementations being used by your 
> > distributions of Julia and NumPy.
> >
> > As I wrote in the other thread
> >
> > https://groups.google.com/d/msg/julia-users/Q96aPufg4S8/IBU9hW0xvWYJ
> >
> > the Vandermonde matrix generation is significantly faster for me in Julia 
> > than in Python (numpy using reference BLAS).
> >
> > Furthermore Vandermonde is not a good test with larger matrix sizes since 
> > you are basically testing the speed of multiplying things by infinity, 
> > which may not be representative of typical computations as it may incur 
> > overhead from handling overflows.
> >
> > For n=1 I get the following results:
> >
> > Macbook Julia (Core(TM) i5-4258U)
> > +/-1 matrix: 1.22s
> > [1:n] matrix: 1.21s #mostly overflow
> > rand matrix: 2.95s #mostly underflow
> >
> > Macbook NumPy
> > +/-1 matrix: 3.96s
> > [1:n] matrix: 5.04s #mostly overflow
> > rand matrix: 5.46s #mostly underflow
> >
> > Linux Julia (Xeon(R) E7- 8850)
> > +/-1 matrix: 2.18s
> > [1:n] matrix: 1.89s #mostly overflow
> > rand matrix: 4.36s #mostly underflow
> >
> > Linux NumPy
> > +/-1 matrix: 9.38s
> > [1:n] matrix: 10.64s #mostly overflow *
> > rand matrix: 32.30s #mostly underflow
> >
> > * emits warnings:
> > /usr/lib/python2.7/dist-packages/numpy/lib/twodim_base.py:515: 
> > RuntimeWarning: overflow encountered in power
> >   X[:, i] = x**(N - i - 1)
> > /usr/lib/python2.7/dist-packages/numpy/lib/twodim_base.py:515: 
> > RuntimeWarning: invalid value encountered in power
> >   X[:, i] = x**(N - i - 1)
> 



Re: [julia-users] numpy vs julia benchmarking for random matrix-vector multiplication

2015-01-08 Thread Joshua Adelman
Hi Steven,

I added your version (vander3) to my benchmark and updated the IJulia 
notebook:

http://nbviewer.ipython.org/gist/synapticarbors/26910166ab775c04c47b

As you mentioned it's a lot faster than the other version I wrote and evens 
out the underperformance vs numpy for the larger arrays on my machine. The 
@inbounds macro makes a small difference, but not dramatic. One of the 
things that I wonder is if there would be interest in having a way of 
globally turning off bounds checking either at the function level or 
module/file level, similar to cython's decorators and file-level compiler 
directives. 

Josh

On Thursday, January 8, 2015 at 4:36:52 PM UTC-5, Steven G. Johnson wrote:
>
> For comparison, the NumPy vander function
>
>  
> https://github.com/numpy/numpy/blob/f4be1039d6fe3e4fdc157a22e8c071ac10651997/numpy/lib/twodim_base.py#L490-L577
>
> does all its work in multiply.accumulate.   Here is the outer loop of 
> multiply.accumulate (written in C):
>
>
> https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/ufunc_object.c#L2936-L3264
>
> and the inner loops (I think) are generated from this source file for 
> various numeric types:
>
>   
> https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/loops.c.src
>
> A quick glance at these will tell you the price in code complexity that 
> NumPy is paying for the performance they manage to get.
>


Re: [julia-users] Re: Almost at 500 packages!

2015-01-22 Thread Joshua Adelman
Personally (and I am definitely biased by experience), I really like the model 
that is used pretty widely in the scientific python community. A lot of people 
use Continuum's Anaconda Python distribution, which includes about 200 packages 
(not all of them are actually python though). It's tested and represents most 
of the really important core libraries that one might use. I think that I end 
up having less than 20 other packages that I've installed in my python 
environment manually either via pip (using PyPi), a standard setup.py install, 
or an auxiliary conda (Anaconda's package manager) channel:

http://docs.continuum.io/anaconda/pkg-docs.html

Sure PyPi has many more packages available, but that a package is included in 
Anaconda sends a signal that it is being (relatively) widely used. Enthought 
has a similar distribution as well. A curated distribution may be better suited 
for a more mature ecosystem.

Somewhat unrelated, another really nice feature of Anaconda (actually of it's 
package manager Conda), is the ability to create isolated, named environments 
to test out new packages or new versions of a package. I'm sure that there is a 
way to do this with Julia's Pkg interface (I've seen Playground.jl that does 
something similar too). Since a number of Julia packages leverage PyCall, and 
Continuum is now building official conda packages for R 
(http://continuum.io/blog/preliminary-support-R-conda), maybe it would be 
useful to do something similar for Julia. Perhaps this has been discussed 
previously. 

Josh



On Jan 22, 2015, at 1:49 PM, Ista Zahn wrote:

> As an R user I'm surprised to see CRAN held up as a model to aspire
> to. There is a _lot_ of overlapping functionality among those 6k
> packages, making it hard to figure out which one is "best" for a
> particular purpose. There are also a lot of unfocused packages
> providing miscellaneous collections of functions, which makes it
> difficult to understand exactly what the package offers you as a user.
> As a user things are easier if a) each package has a clearly defined
> scope (i.e., "does one thing well"), and b) there are not too many
> similar packages to choose from for any particular task. None of this
> is to say that julia isn't on the right track in terms of packages,
> just that I question the wisdom of emulating CRAN in this regard.
> 
> Best,
> Ista
> 
> On Wed, Jan 21, 2015 at 7:46 PM, Iain Dunning  wrote:
>> Yes indeed Christoph, a package that doesn't work is a package that might as
>> well not exist. Fortunately, and fairly uniquely I think, we can quantify to
>> some extent how many of our packages are working, and the degree to which
>> they are.
>> 
>> In my mind the goal now is "grow fast and don't break too many things", and
>> I think our pace over the last month or so of around 1 package per day is
>> fantastic, with good stability of packages (i.e. they pass tests). I've also
>> noticed that packages being registered now are often of a higher quality
>> than they used to be, in terms of tests and documentation. I talked about
>> this a bit at JuliaCon, but in some sense NPM and CRAN represent different
>> ends of a spectrum of possibilities, and it seems like the consensus is more
>> towards CRAN. So, we're doing good I think.
>> 
>> 
>> On Wed, Jan 21, 2015 at 7:02 PM, Kevin Squire 
>> wrote:
>>> 
>>> Additional references: PyPI lists 54212 packages, currently (roughly half
>>> as many as node) but, CRAN only has 6214.
>>> 
>>> Cheers,
>>>   Kevin
>>> 
>>> On Wed, Jan 21, 2015 at 3:37 PM, Sean Garborg 
>>> wrote:
 
 You wouldn't like node ;)
 
 On Wednesday, January 21, 2015 at 4:29:53 PM UTC-7, Christoph Ortner
 wrote:
> 
> Great that so many are contributing to Julia, but I would question
> whether such a large number of packages will be healthy in the long run. 
> It
> will make it very difficult for new users to use Julia effectively.
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Iain Dunning
>> PhD Candidate / MIT Operations Research Center
>> http://iaindunning.com  /  http://juliaopt.org



[julia-users] Re: Question about list of list; equivalent to Python's key paramater?

2015-01-26 Thread Joshua Adelman


On Monday, January 26, 2015 at 12:02:10 PM UTC-5, Wai Yip Tung wrote:
>
> I'm trying to construct a list of list and do some operation on it. In 
> Python I would do
>
> In [7]: ll = [[1,2],[1,2,3],[7]]
>
> Say I want to sort them by the length of the list, many function accepts a 
> `key` parameter. In this case I want the `key` to be `len`.
>
> In [8]: max(ll, key=len)
> Out[8]: [1, 2, 3]
>
> In [9]: sorted(ll, key=len)
> Out[9]: [[7], [1, 2], [1, 2, 3]]
>
> In Julia, if I enter the literal  [[1,2],[1,2,3],[7]], then are join to 
> together into a long list. Through many trial and error I found a way by
>
> ll = (Array)[[1,2],[1,2,3],[7]]
>
> What is the proper way to construct a list of list?
>
> Secondly, Julia's sort support a `by` parameter that does what `key` do in 
> Python. But there is no equivalent in `maximum`. What is the recommended 
> way to find the longest list in the data structure?
>
> Wai Yip
>

I think it may be more idiomatic to initialize the array as:

ll = Array{Int,1}[[1,2],[1,2,3],[7]]

 if all elements will be 1d arrays of type Int. You could also use:

ll = Vector{Int}[[1,2],[1,2,3],[7]]

as a short hand. If the arrays are going to be of arbitrary type:

ll = Any[[1,2],[1,2,3],[7]]

Josh


[julia-users] Re: Question about list of list; equivalent to Python's key paramater?

2015-01-26 Thread Joshua Adelman


On Monday, January 26, 2015 at 12:09:40 PM UTC-5, Joshua Adelman wrote:
>
>
>
> On Monday, January 26, 2015 at 12:02:10 PM UTC-5, Wai Yip Tung wrote:
>>
>> I'm trying to construct a list of list and do some operation on it. In 
>> Python I would do
>>
>> In [7]: ll = [[1,2],[1,2,3],[7]]
>>
>> Say I want to sort them by the length of the list, many function accepts 
>> a `key` parameter. In this case I want the `key` to be `len`.
>>
>> In [8]: max(ll, key=len)
>> Out[8]: [1, 2, 3]
>>
>> In [9]: sorted(ll, key=len)
>> Out[9]: [[7], [1, 2], [1, 2, 3]]
>>
>> In Julia, if I enter the literal  [[1,2],[1,2,3],[7]], then are join to 
>> together into a long list. Through many trial and error I found a way by
>>
>> ll = (Array)[[1,2],[1,2,3],[7]]
>>
>> What is the proper way to construct a list of list?
>>
>> Secondly, Julia's sort support a `by` parameter that does what `key` do 
>> in Python. But there is no equivalent in `maximum`. What is the recommended 
>> way to find the longest list in the data structure?
>>
>> Wai Yip
>>
>
> I think it may be more idiomatic to initialize the array as:
>
> ll = Array{Int,1}[[1,2],[1,2,3],[7]]
>
>  if all elements will be 1d arrays of type Int. You could also use:
>
> ll = Vector{Int}[[1,2],[1,2,3],[7]]
>
> as a short hand. If the arrays are going to be of arbitrary type:
>
> ll = Any[[1,2],[1,2,3],[7]]
>
> Josh
>

And actually, just writing similarly to what you landed on, but without the 
parentheses

ll = Array[[1,2],[1,2,3],[7]]

 looks like it would also be reasonable since it returns an array of type 

Array{Array{T,N},1}

which I believe holds arrays of arbitrary (but homogenous) type and 
dimension. I'm just learning Julia myself, so others might have a better 
suggestion. Also I think the reason why your original construction just 
returned a long list was that you were using the syntactic shorthand for 
the hcat command.

Josh