Re: [julia-users] Re: Initialize dict of dicts with => syntax

2014-11-10 Thread Michele Zaffalon
Array(Dict{Int64,Int64},0)
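For reference, a quick sketch of the pattern (the dict contents below are made up for illustration):

```julia
# An empty vector of dicts; the typed-literal form is equivalent
# to Array(Dict{Int64,Int64}, 0).
dicts = Dict{Int64,Int64}[]

# Grow it by pushing freshly constructed dicts, then fill in place.
push!(dicts, Dict{Int64,Int64}())
dicts[1][7] = 14
```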

On Tue, Nov 11, 2014 at 7:49 AM, Todd Leo  wrote:

> How to initialize an array of dicts? Are there any suggested ways to do it?
>
> julia> (Int64=>Int64)[]
> Dict{Int64,Int64} with 0 entries
>
> # And since brackets create Arrays:
> julia> Any[]
> 0-element Array{Any,1}
>
> # So I suppose this would generate an array of dicts, until it fails:
> julia> ((Int64=>Int64)[])[]
> ERROR: `getindex` has no method matching getindex(::Dict{Int64,Int64})
>
>
>
>
> On Sunday, May 4, 2014 5:02:14 AM UTC+8, thom lake wrote:
>>
>> One thing that I like about {} for initializing Array{Any,1}, is the
>> consistency with comprehension syntax. Namely, braces for Any, brackets for
>> specific types
>>
>> julia> typeof({i=>2i for i = 1:10})
>> Dict{Any,Any}
>>
>> julia> typeof([i=>2i for i = 1:10])
>> Dict{Int64,Int64}
>>
>> julia> typeof({2i for i = 1:10})
>> Array{Any,1}
>>
>> julia> typeof([2i for i = 1:10])
>> Array{Int64,1}
>>
>>
>>


[julia-users] Re: parallel for loop in Julia

2014-11-10 Thread DrKey
Thank you for your answer.
Do you have any suggestions on how to deal with that?

On Monday, November 10, 2014 23:25:23 UTC+1, ele...@gmail.com wrote:
>
>
>
> On Tuesday, November 11, 2014 5:10:30 AM UTC+11, DrKey wrote:
>>
>> Here is what i tried:
>> variant1:
>>
>> forcp = zeros(3,1);
>>
>> forcp = @parallel (hcat) for partA = 1:nPart
>>     for partB = (partA+1):nPart
>>     ...
>>     end
>>     forcp = forces[:,partA];
>> end
>>
>> variant2:
>> function calcforces(coords, L, np, i)  # np = number of processes, i = current process
>>     for partA = i+1:np:nPart-1
>>         for partB = (partA+1):nPart
>>         ...
>>         end
>>     end
>>     return forces
>> end
>>
>> np = nprocs();
>> parad = Array(RemoteRef,np);
>>
>> and then calling function calcforces with: 
>> for i=1:np parad[i] = @spawn LJ_Force_MT(coords,L,np,i); end
>> for i=1:np forces = fetch(parad[i]); end
>>
>> both ways are giving me wrong results over more than 1 timestep
>>
>
> You have multiple parallel loops modifying the forces array.  They will be 
> generating races for sure.
>
> Cheers
> Lex 
>
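A minimal sketch of the race-free pattern Lex is pointing at: each iteration returns its own column and the `(hcat)` reduction combines them on the master process, so no worker ever writes into a shared `forces` array. The per-iteration computation here is a stand-in for the real force calculation:

```julia
nPart = 4

# Each loop body *returns* a fresh 3-element column; @parallel's
# (hcat) reduction glues the columns together, so there is no
# shared mutable state between workers.
forcp = @parallel (hcat) for partA = 1:nPart
    # ... real pairwise force computation for partA would go here ...
    fill(float(partA), 3)   # stand-in column
end
# forcp is now a 3 x nPart matrix, one column per particle
```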


[julia-users] Re: Initialize dict of dicts with => syntax

2014-11-10 Thread Todd Leo
How to initialize an array of dicts? Are there any suggested ways to do it?

julia> (Int64=>Int64)[]
Dict{Int64,Int64} with 0 entries

# And since brackets create Arrays:
julia> Any[]
0-element Array{Any,1}

# So I suppose this would generate an array of dicts, until it fails:
julia> ((Int64=>Int64)[])[]
ERROR: `getindex` has no method matching getindex(::Dict{Int64,Int64})




On Sunday, May 4, 2014 5:02:14 AM UTC+8, thom lake wrote:
>
> One thing that I like about {} for initializing Array{Any,1}, is the 
> consistency with comprehension syntax. Namely, braces for Any, brackets for 
> specific types
>
> julia> typeof({i=>2i for i = 1:10})
> Dict{Any,Any}
>
> julia> typeof([i=>2i for i = 1:10])
> Dict{Int64,Int64}
>
> julia> typeof({2i for i = 1:10})
> Array{Any,1}
>
> julia> typeof([2i for i = 1:10])
> Array{Int64,1}
>
>
>

[julia-users] Re: Displaying a polygon mesh

2014-11-10 Thread Alex
Winston has an experimental/undocumented function surf + some stuff around it 
(https://github.com/nolta/Winston.jl/blob/master/src/canvas3d.jl), which might 
be sufficient if you just want to have a look at your meshes.

Best,

Alex.

On Tuesday, 11 November 2014 03:09:29 UTC+1, Simon Kornblith  wrote:
> Is there an easy way to display a polygon mesh in Julia, i.e., vertices and 
> faces loaded from an STL file or created by marching tetrahedra using 
> Meshes.jl? So far, I see:
> PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this.
> GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.)
> ihnorton's VTK bindings, which aren't registered in METADATA.jl. 
> Is there another option I'm missing? If not, can I convince one of these 
> packages to show my mesh with minimal time investment, or should I use a 
> separate volume viewer (or maybe a Python package via PyPlot)?
> 
> Thanks,
> Simon


[julia-users] Elementwise operator

2014-11-10 Thread Michael Louwrens
I was looking at the Devectorize package and was wondering: why not have an 
operator that performs elementwise operations?

While syntax is not something I have thought through, take a basic example:

r = a .* b + c .* d + a


could be expressed as

r = .(a * b + c * d + a)


which would then apply the expression

a * b + c * d + a


to each element in the array.

.= could possibly be used in place of surrounding the expression with 
.(Expr).

I am not too familiar with Devectorize, but (from what I can tell with a 
limited look through the readme) the advantage of this is that it could 
cover user functions as well, since they would also be applied elementwise.

r = .(a * b + c * d + foo(a) * bar(c,d))

or

r .= a * b + c * d + foo(a) * bar(c,d)


Should theoretically be possible then.

The obvious advantage would be that memory only needs to be allocated once 
for the new array, instead of once per broadcasted operator.

Just a thought which may be stepping on Devectorize's toes, but reading 
through some of the vectorised code issues I thought this might be a simple 
solution which could provide a performance benefit.
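To make the intent concrete, here is what the proposed fusion amounts to when written by hand today (assuming equal-length vectors):

```julia
a = [1.0, 2.0]; b = [3.0, 4.0]; c = [5.0, 6.0]; d = [7.0, 8.0]

# Vectorized: every .* and + allocates its own temporary array.
r1 = a .* b + c .* d + a

# Hand-fused: one allocation and one pass over the data -- what
# .(a * b + c * d + a) would expand to under the proposal.
r2 = similar(a)
for i in 1:length(a)
    r2[i] = a[i] * b[i] + c[i] * d[i] + a[i]
end
```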


Re: [julia-users] travis for os x packages

2014-11-10 Thread Pontus Stenetorp
On 11 November 2014 10:49, Tony Kelman  wrote:
> I don't want to steal Pontus Stenetorp's thunder since he did all the work,
> but there's a PR open here
> https://github.com/travis-ci/travis-build/pull/318 that will sooner or later
> add "community maintained" support for Julia directly in Travis as
> `language: julia`. The default .travis.yml for Julia packages can be
> simplified even further once that gets rolled out.

No worries about the thunder, let's hope they merge it soon enough and
I can make a public announcement.  Also, thank you for poking them the
other day.

Pontus


Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Aneesh Sathe
Ah! I had misunderstood that. Thank you! :)

On Tuesday, November 11, 2014 11:19:29 AM UTC+8, Tim Holy wrote:
>
> On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote: 
> > 2) I understand your reasons for making all images in the Gray range, 
> but i 
> > prefer having "real" pixel values. That way its easier to correlate test 
> > data with something like Fiji or Matlab. And I don't understand Julia 
> float 
> > handling fully but there might be a gain in speed if using non-float 
> > values. 
>
> They're not really float values, underneath they are integers. You can 
> just say 
> `reinterpret(Uint16, x)`. 
>
> --Tim 
>


Re: [julia-users] Displaying a polygon mesh

2014-11-10 Thread Erik Schnetter
I'm using Compose (and Color), on which Gadfly is built. I tried
Gadfly itself, but there were some inefficiencies -- I tried to
compose an image consisting of many different edges, and that many
independent graphics objects (I'm using the wrong terminology here) were
not handled well.

I've copy-and-pasted my plot routines at
 to give you an
example.

"circle" draws a filled circle (a vertex), and "line" draws a line (an
edge). I'm choosing colours depending on the z coordinate. The code
isn't self-contained, but should serve as example to see how
easy/complex this approach is.

-erik
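Since the linked code isn't self-contained, here is a hedged, made-up miniature of the approach (the vertex positions and edge list are invented; the Compose/Color calls follow their usual API, not Erik's actual routines):

```julia
using Compose, Color

# hypothetical vertex positions (unit coordinates) and one edge
verts = [(0.2, 0.3), (0.7, 0.8)]
edges = [(1, 2)]

forms = Any[]
for (i, j) in edges            # one line per edge
    push!(forms, compose(context(),
                         line([verts[i], verts[j]]),
                         stroke(color("black"))))
end
for (x, y) in verts            # one filled circle per vertex
    push!(forms, compose(context(),
                         circle(x, y, 0.02),
                         fill(color("blue"))))
end
pic = compose(context(), forms...)
# draw(SVG("mesh.svg", 4inch, 4inch), pic)
```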


On Mon, Nov 10, 2014 at 9:09 PM, Simon Kornblith  wrote:
> Is there an easy way to display a polygon mesh in Julia, i.e., vertices and
> faces loaded from an STL file or created by marching tetrahedra using
> Meshes.jl? So far, I see:
>
> PyPlot/matplotlib, which seems to be surprisingly difficult to convince to
> do this.
> GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very
> hard yet.)
> ihnorton's VTK bindings, which aren't registered in METADATA.jl.
>
> Is there another option I'm missing? If not, can I convince one of these
> packages to show my mesh with minimal time investment, or should I use a
> separate volume viewer (or maybe a Python package via PyPlot)?
>
> Thanks,
> Simon



-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


[julia-users] Re: Displaying a polygon mesh

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 9:09:29 PM UTC-5, Simon Kornblith wrote:
>
> Is there an easy way to display a polygon mesh in Julia, i.e., vertices 
> and faces loaded from an STL file or created by marching tetrahedra using 
> Meshes.jl? So far, I see:
>
>
Mayavi via PyCall? 


Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Tim Holy
On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote:
> 2) I understand your reasons for making all images in the Gray range, but I 
> prefer having "real" pixel values. That way it's easier to correlate test
> data with something like Fiji or Matlab. And I don't understand Julia float
> handling fully but there might be a gain in speed from using non-float
> values.

They're not really float values, underneath they are integers. You can just say 
`reinterpret(Uint16, x)`.

--Tim
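A small illustration of the point: `reinterpret` only changes the element type tag, not the underlying bits, so "un-floating" pixel data is essentially free. (Float32/Uint32 are used here just to make the bit patterns recognizable; `Uint` is the 0.3-era spelling.)

```julia
x = Float32[1.0f0, -2.0f0]
bits = reinterpret(Uint32, x)   # IEEE-754 bit patterns, no copy or convert
# bits[1] == 0x3f800000, bits[2] == 0xc0000000
```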


Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Todd Leo
I did, actually, try expanding the vectorized operations into explicit for 
loops, and computing the vector multiplication / vector norm through BLAS 
interfaces. The explicit loops did allocate less memory, but took much 
more time. Meanwhile, the vectorized version I've gotten used to 
writing runs incredibly fast, as the following tests indicate:

# Explicit for loop, slightly modified from SimilarityMetrics.jl by 
johnmyleswhite 
(https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl)
function cosine(a::SparseMatrixCSC{Float64, Int64},
                b::SparseMatrixCSC{Float64, Int64})
    sA, sB, sI = 0.0, 0.0, 0.0
    for i in 1:length(a)
        sA += a[i]^2
        sI += a[i] * b[i]
    end
    for i in 1:length(b)
        sB += b[i]^2
    end
    return sI / sqrt(sA * sB)
end

# BLAS version
function cosine_blas(i::SparseMatrixCSC{Float64, Int64},
                     j::SparseMatrixCSC{Float64, Int64})
    i = full(i)
    j = full(j)
    numerator = BLAS.dot(i, j)
    denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
    return numerator / denominator
end

# the vectorized version remains the same, as the 1st post shows.

# Test functions
function test_explicit_loop(d)
    for n in 1:1
        v = d[:,1]
        cosine(v,v)
    end
end

function test_blas(d)
    for n in 1:1
        v = d[:,1]
        cosine_blas(v,v)
    end
end

function test_vectorized(d)
    for n in 1:1
        v = d[:,1]
        cosine_vectorized(v,v)
    end
end

test_explicit_loop(mat)
test_blas(mat)
test_vectorized(mat)
gc()
@time test_explicit_loop(mat)
gc()
@time test_blas(mat)
gc()
@time test_vectorized(mat)

# Results
elapsed time: 3.772606858 seconds (6240080 bytes allocated)
elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
elapsed time: 0.011236068 seconds (34560080 bytes allocated)
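For sparse columns there is a middle way worth noting (a sketch, untested on the data above): indexing `a[i]` for every `i in 1:length(a)` is what makes the explicit loop slow, because each scalar index into a SparseMatrixCSC is a search. Looping over the stored entries directly avoids both that cost and the `full()` conversion:

```julia
# Assumes a and b are single-column SparseMatrixCSC "vectors".
function cosine_sparse(a::SparseMatrixCSC{Float64,Int64},
                       b::SparseMatrixCSC{Float64,Int64})
    sA = 0.0; sB = 0.0; sI = 0.0
    for v in a.nzval; sA += v*v; end   # only the stored entries
    for v in b.nzval; sB += v*v; end
    # merge the two sorted row-index lists for the dot product
    ra = a.rowval; rb = b.rowval
    i = 1; j = 1
    while i <= length(ra) && j <= length(rb)
        if ra[i] == rb[j]
            sI += a.nzval[i] * b.nzval[j]
            i += 1; j += 1
        elseif ra[i] < rb[j]
            i += 1
        else
            j += 1
        end
    end
    return sI / sqrt(sA * sB)
end
```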


On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan Bouchet-Valat wrote:
>
>  On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote: 
>
> Hi fellows,  
>
>  
>
>  I'm currently working on sparse matrices and cosine similarity 
> computation, but my routines are running very slowly, at least not meeting my 
> expectations. So I wrote some test functions to dig out the reason for the 
> inefficiency. To my surprise, the execution times of passing two vectors 
> to the test function and of passing the whole sparse matrix differ greatly: 
> the latter is 80x faster. I am wondering why extracting the two vectors from 
> the matrix in each loop is that dramatically faster, and how to avoid the 
> multi-GB memory allocation. Thanks guys. 
>
>  
>
>  -- 
>
>  BEST REGARDS, 
>
>  Todd Leo 
>
>  
>
>  # The sparse matrix 
>
>  mat # 2000x15037 SparseMatrixCSC{Float64, Int64} 
>
>  
>
>  # The two vectors, prepared in advance 
>
>  v = mat'[:,1] 
>
>  w = mat'[:,2] 
>
>  
>
>  # Cosine similarity function 
>
>  function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64}, 
> j::SparseMatrixCSC{Float64, Int64}) 
>
>  return sum(i .* j)/sqrt(sum(i.*i)*sum(j.*j)) 
>
>  end 
>
> I think you'll experience a dramatic speed gain if you write the sums in 
> explicit loops, accessing elements one by one, taking their product and 
> adding it immediately to a counter. In your current version, the 
> element-wise products allocate new vectors before computing the sums, which 
> is very costly.
>
> This will also get rid of the difference you report between passing arrays 
> and vectors.
>
>
> Regards
>
>  function test1(d) 
>
>  res = 0. 
>
>  for i in 1:1 
>
>  res = cosine_vectorized(d[:,1], d[:,2]) 
>
>  end 
>
>  end 
>
>  
>
>  function test2(_v,_w) 
>
>  res = 0. 
>
>  for i in 1:1 
>
>  res = cosine_vectorized(_v, _w) 
>
>  end 
>
>  end 
>
>  
>
>  test1(dtm) 
>
>  test2(v,w) 
>
>  gc() 
>
>  @time test1(dtm) 
>
>  gc() 
>
>  @time test2(v,w) 
>
>  
>
>  #elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc 
> time)
>
>  #elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% 
> gc time)
>
>  
> 

Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Aneesh Sathe
Unless I understood wrong (which is very possible), the 65536 bins were to 
cover all possible values a 16-bit pixel can take. Though, in the actual 
graythresh function I will probably use 256 bins by default.

I did find the docs for adding custom formats 
(https://github.com/timholy/Images.jl/blob/master/doc/extendingIO.md) 

But perhaps using the Bio-Formats .jar file will be better in the long run 
for a few reasons:

1) A lot more formats are covered, so implementing that would support more 
formats faster. 
2) I understand your reasons for making all images in the Gray range, but I 
prefer having "real" pixel values. That way it's easier to correlate test 
data with something like Fiji or Matlab. And I don't understand Julia float 
handling fully but there might be a gain in speed from using non-float 
values. 
3) Bio-Formats already allows the reading of individual images based on 
XYZCT, so that doesn't need to be rebuilt. 

Of course, the above is the ideal thing to do. I'm still trying to figure out 
how to use the .jar file, so I might just end up adding the custom format 
first. 

Let's see...

-Aneesh

On Monday, November 10, 2014 6:55:08 PM UTC+8, Tim Holy wrote:
>
> All good plans. (I'm not sure about using 65536 bins for 16-bit images, 
> though, because that would be more bins than there are pixels in some 
> images. 
> Still, it's not all that much memory, really, so maybe that would be OK.) 
>
> It would be great to add native support. Presumably you've found the docs 
> on 
> adding support for new formats. 
>
> For formats that encode large datasets in a single block (like NRRD), you 
> can 
> work with GB-sized datasets on a laptop because you can use mmap (I do it 
> routinely). But the love of TIFF does demand an alternative solution. 
> Presumably we should add a lower-level routine that returns a structure 
> that 
> facilitates later access, e.g., 
> imds = imdataset("my_image_file") 
> img = imds["z", 14, "t", 7] 
> or somesuch. 
>
> --Tim 
>
> On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: 
> > Tim, 
> > i would like the imhist to be idiot proof. (i've been teaching matlab 
> and 
> > nothing puts new people off more than things not being idiot proof). 
> > things like using 256 bins by default returning a plot  if no 
> outputs 
> > are specified (basically make it like matlab's imthresh() ) 
> > 
> > Btw, on matlab using bioformats is actually the slowest part of my 
> > algorithm, so unless it can be faster in julia native support might be 
> > nicer. Bioformats also fails in that it reads the whole sequence at 
> once... 
> > so running things on laptops with even GB-level datasets is impossible. 
> I 
> > wrote my own version of bfopen to only open the required XYZCT for 
> > specified series, but that only solves the memory usage. 
> > 
> > the source format for my image was .mvd2 (perkin elmer spinning disk). 
> > 
> > i know about JavaCall.jl just havent had the time to play with it... 
> > 
> > i was thinking it might be fun to attempt native support for a few 
> formats. 
> > I can also generate test data in a few vendor formats for a few 
> > microscopes. 
> > perhaps even make it a julia-box based project. ;) 
> > 
> > On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: 
> > > On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: 
> > > > Yes, Images does read it okay but only if i cut out the substack. If 
> i 
> > > > don't, then it interprets the three channels as a time dimension, 
> which 
> > > > isnt a pain at the moment but will be if i start using it for work. 
> > > 
> > > Hmm, that sounds like an annotation problem. 
> > > 
> > > > I realized that both the convert and the g[:] would slow me down but 
> the 
> > > > hist function just wouldn't work without that kind of dance. Also, 
> > > > graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) 
>
> > > 
> > > uses 
> > > 
> > > > reshape to make it all one image which might also add to speed. 
> > > > 
> > > > The pull request is well and good but personally i would rather have 
> a 
> > > > dedicated image histogram function like 
> > > > imhist: http://www.mathworks.com/help/images/ref/imhist.html 
> > > > which would give histograms based on input images. To me that's the 
> only 
> > > > way to make life easier. maybe i'll write one :) 
> > > 
> > > imhist is necessary in matlab largely because hist works columnwise; 
> in a 
> > > sense, Julia's `hist` is like imhist. Is there some specific 
> functionality 
> > > you're interested in? There's no reason Images can't provide a custom 
> > > version 
> > > of `hist`. 
> > > 
> > > > Something about Images: do you think it possible to use the bio 
> formats' 
> > > > .jar file to import images from a microscope format to Images? 
> > > > Opening a microscope format image file in the relevant software and 
> then 
> > > > exporting it as tiff takes too long and i'd rather be able to access 
> th

Re: [julia-users] Questions relating to packages and using/creating them

2014-11-10 Thread Isaiah Norton
1. See LOAD_PATH (http://julia.readthedocs.org/en/latest/manual/modules/).
2. This is not specifically supported, as far as I know. We could be fancy
and add a UUID to the package spec, or something like that, but I don't
think it is a very pressing concern right now. The simple options right now
are to manipulate LOAD_PATH to put the preferred package path(s) first (I
think this should work) or to manually `require` a specific path (which
won't work with `using`).
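A sketch of option 1, assuming a hypothetical ~/code/julia directory holding your package sources (the package name is a placeholder):

```julia
# Prepend a custom package directory so `using`/`require` search it
# first; putting this line in ~/.juliarc.jl applies it to every session.
unshift!(LOAD_PATH, joinpath(homedir(), "code", "julia"))

# `using MyPkg` will now look for MyPkg.jl under ~/code/julia before
# falling back to ~/.julia.
```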

On Mon, Nov 10, 2014 at 9:25 PM, Dom Luna  wrote:

> I have some general questions about using packages.
>
> 1. Is there a way to create a workspace separate of $HOME/.julia? This
> would still have the same functionality when calling "using" in the REPL.
> 2. What's the best practice for packages with the same name? I don't have
> a problem related to this but I'm just curious how this is handled. I think
> via Pkg.add(...) there's only one definition of any package name, but with
> Pkg.clone(...) I could see package name collisions. Having all the packages
> under one directory doesn't seem scalable to me.
>
> thanks
>


[julia-users] Questions relating to packages and using/creating them

2014-11-10 Thread Dom Luna
I have some general questions about using packages.

1. Is there a way to create a workspace separate of $HOME/.julia? This 
would still have the same functionality when calling "using" in the REPL.
2. What's the best practice for packages with the same name? I don't have a 
problem related to this but I'm just curious how this is handled. I think 
via Pkg.add(...) there's only one definition of any package name, but with 
Pkg.clone(...) I could see package name collisions. Having all the packages 
under one directory doesn't seem scalable to me.

thanks


[julia-users] Julia Tech Talk at the University of Pennsylvania

2014-11-10 Thread Ted Fujimoto
Hi all,

Feel free to come by if you're around Philly!

Julia Tech Talk on Thursday, November 13 at 6:00pm at Wu and Chen Auditorium
When: Thursday, November 13 at 6:00pm
Where: Wu and Chen Auditorium, Philadelphia, Pennsylvania 19104

On Thursday, November 13th @ 6pm the Dining Philosophers will be hosting a 
talk on the Julia Programming language in Wu & Chen Auditorium. Julia has 
the elegance and familiarity of Python and Matlab, with speed close to C, 
and is completely open source. This is a great opportunity for anyone 
interested in scientific and parallel computation, machine learning, data 
analysis, and visualization. There will be a giveaway of online JuliaBox 
codes for the Julia language for all attendees!

Speakers: Ted Fujimoto (CIT Masters student) and Randy Zwitch (Senior Data 
Scientist at Comcast)

 

Randy Zwitch is Senior Data Scientist at Comcast, researching how to 
improve the overall customer viewing experience using petabyte-scale tools 
and datasets. Randy also contributes to the R and Julia open-source 
communities, creating and maintaining packages primarily related to the web 
(HTTP requests/APIs, Server Log Parsing, Geo-Location, etc.) and database 
access. 


Abstract: Using publicly available datasets, Randy will provide an intro to 
machine learning using ad-hoc Julia code and via add-on packages.


[julia-users] Displaying a polygon mesh

2014-11-10 Thread Simon Kornblith
Is there an easy way to display a polygon mesh in Julia, i.e., vertices and 
faces loaded from an STL file or created by marching tetrahedra using 
Meshes.jl? So far, I see:

   - PyPlot/matplotlib, which seems to be surprisingly difficult to 
   convince to do this.
   - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried 
   very hard yet.)
   - ihnorton's VTK bindings, which aren't registered in METADATA.jl. 

Is there another option I'm missing? If not, can I convince one of these 
packages to show my mesh with minimal time investment, or should I use a 
separate volume viewer (or maybe a Python package via PyPlot)?

Thanks,
Simon


Re: [julia-users] travis for os x packages

2014-11-10 Thread Tony Kelman
I don't want to steal Pontus Stenetorp's thunder since he did all the work, 
but there's a PR open 
here https://github.com/travis-ci/travis-build/pull/318 that will sooner or 
later add "community maintained" support for Julia directly in Travis as 
`language: julia`. The default .travis.yml for Julia packages can be 
simplified even further once that gets rolled out.

That doesn't fix the capacity issues at Travis where they aren't accepting 
new repos, so for now the `language: objective-c` version, and using the 
install-julia.sh script, is the best way to temporarily test things out on 
Mac workers.
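Roughly, that temporary workaround amounts to a .travis.yml along these lines (a hedged sketch, not the exact default file; the package name and the script line are illustrative):

```yaml
# Hypothetical minimal .travis.yml for an OS X-only Julia package.
# `language: objective-c` forces the build onto Travis's Mac workers.
language: objective-c
before_install:
  # fetch Julia via the install-julia.sh script mentioned above
script:
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPkg")'
```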


On Monday, November 10, 2014 12:32:34 PM UTC-8, Elliot Saba wrote:
>
> Yep.  Essentially, you'll need to enable the "osx" build environment.  It 
> looks like Travis is not accepting more multi-os requests at the moment, so 
> the typical approach (used on, for instance, the main julia repository) 
> won't work.
>
> You may not be able to get it to run on multiple OS'es, but you should be 
> able to get it to run on OSX only by setting the language to 
> "objective-c".  This will get it to run on OSX only, then you can use the 
> default 
> .travis.yml file 
> 
>  
> that is generated by Pkg.
>
> In short, you should be able to take that default file, change the 
> language to "objective-c", remove the "os" block, and call it good.  Save 
> that as ".travis.yml" in your repo, enable Travis in your repository's 
> "services" section, and test away!
> -E
>
> On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne wrote:
>
>> I would like to set up travis for an OS X-only package: does anyone have 
>> suggestions for how I should set up travis (or has anyone already done 
>> this)?
>>
>> simon
>>
>
>

Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson
On Monday, November 10, 2014 8:39:41 PM UTC-5, Steven G. Johnson wrote:

> Google's Snappy library has a 64-bit API, but seems to also be limited to 
> 32-bit sizes internally, as is the LZ4 library.  Kind of surprising that so 
> many people would independently limit themselves to 32-bit buffers nowadays.
>

Snappy's only excuse was backwards compatibility: 
https://code.google.com/p/snappy/issues/detail?id=76 


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 6:09:50 PM UTC-5, Jake Bolewski wrote:
>
> The 64 bit issue is killer and why I didn't go farther with integrating 
> blosc with hdf5.  I guess I should have been more vocal about this.  Take 
> what you may from my nascent package :-) 
>

Google's Snappy library has a 64-bit API, but seems to also be limited to 
32-bit sizes internally, as is the LZ4 library.  Kind of surprising that so 
many people would independently limit themselves to 32-bit buffers nowadays.


[julia-users] Available packages for compression?

2014-11-10 Thread Steven G. Johnson
Pkg.add("Blosc") should now add a working Blosc package. 
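A quick round-trip to sanity-check the install (this assumes Blosc.jl's `compress`/`decompress` API, where `decompress` takes the element type since the compressed buffer is just bytes):

```julia
using Blosc

x  = rand(Float64, 1000)
cx = Blosc.compress(x)              # vector of compressed bytes
y  = Blosc.decompress(Float64, cx)  # reconstruct the original data
```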


[julia-users] Re: JuliaBox

2014-11-10 Thread cdm

the Sagemath Cloud google chrome app also gets users to a rich environment 
for Julia ...

https://chrome.google.com/webstore/detail/the-sagemath-cloud/eocdndagganmilahaiclppjigemcinmb


users can run Julia inside a terminal ... OR ... via IJulia notebooks ... 
OR ... via Sagemath worksheets.


also available for running Julia within a terminal, the VMs served at

   https://koding.com (there is also a google chrome app for this ...)


best,

cdm


On Monday, November 10, 2014 11:04:13 AM UTC-8, Ivar Nesje wrote:
>
> Yesterday someone suggested https://tmpnb.org/
>
>>

[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread Ivar Nesje
That seems like a tricky edge case, indeed. Not sure whether this is a bug, 
or if there are any existing issues on github that cover this.

At 23:26:49 UTC+1 on Monday, November 10, 2014, John Drummond wrote:
>
> Got it - I don't know whether it's a bug or not.
> If I comment out 
> #import Base.isless
> in the LogParse.jl file and initially reload that in the repl and then 
> reload the correct version with
> import Base.isless
> methods(isless) shows the method but sort says it's not defined, even when 
> I specify it directly.
> Apologies for not checking the initial input in a fresh session, I thought 
> that reloading a module would completely reload the functions, but 
> presumably not when appending to those in Base.
>
> Kind regards, John.
>
>
>
>
> On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote:
>>
>> Thank you, that's helpful. 
>> I reentered it all in a fresh session and found it working as well - I'll 
>> try and find the difference which caused it not to work and come back.
>> Kind Regards, John.
>>
>> On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:
>>>
>>> This code works everywhere I'm able to try it. 
>>>
>>> At 03:18:13 UTC+1 on Sunday, November 9, 2014, John Drummond wrote:

 I was originally julia 0.3.1 on windows 7
 this is on Macosx 10 julia 0.3.2
 I loaded the file LogParse.jl below and then in the repl ran

 reload("LogParse.jl")

 methods(isless)


 ary1 = LogParse.DayPriceText[]
 push!(ary1,LogParse.DayPriceText(4,"a1",1))
 push!(ary1,LogParse.DayPriceText(2,"a1",1))
 push!(ary1,LogParse.DayPriceText(6,"a1",1))


 sort(ary1)

 sort(ary1,lt=LogParse.isless)
 I get the same messages - methods(isless) shows that it's loaded
 but the sort can't find it, even when I try to specify the function


 #in file LogParse.jl ###
 module LogParse
 export DayPriceText
 import Base.isless

 type DayPriceText
   a1::Uint32
   b1::ASCIIString
   a2::Uint32
 end

 function isless(a::DayPriceText, b::DayPriceText)
   if (a.a1 < b.a1)
     return true
   else
     return false
   end
 end


 end
 ##

 Many thanks.
 Kind regards, John


 On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:
>
> In this case it would be really great if you had a minimal 
> reproducible example. It looks to me as you are doing everything right, 
> so 
> I would start looking for typos and scoping issues. It's hard to find 
> them 
> without looking at the code.
>
> Ideally the example should be small and possible to paste into a REPL 
> session, but if you can publish your code and don't want to extract only 
> the relevant part, that might be fine too.
>
> Julia version and operating system is also nice to include, so that we 
> have it available in case we have problems reproducing your results.
>
> Regards Ivar
>
> At 20:14:48 UTC+1 on Friday, November 7, 2014, John Drummond wrote:
>>
>> Hi,
>> I suspect I'm doing something stupid but no idea what I'm missing.
>>
>> I create a module .
>> I create a type in it, DayPriceText
>> I import Base.isless
>> I define isless for the type
>>
>> now in the repl I get
>>
>> methods(isless)
>> =>
>> # 25 methods for generic function "isless":
>> ..
>> isless(x::DayPriceText,y::DayPriceText) at 
>> c:\works\juliaplay\LogParse.jl:16
>>
>> but
>>
>> julia> typeof(a1p)
>> Array{DayPriceText,1}
>>
>> julia> sort(a1p, lt=CILogParse.isless)
>> ERROR: `isless` has no method matching isless(::DayPriceText, 
>> ::DayPriceText)
>>  in sort! at sort.jl:246
>>
>> julia> sort(a1p)
>> ERROR: `isless` has no method matching isless(::DayPriceText, 
>> ::DayPriceText)
>>  in sort! at sort.jl:246
>>
>> I'm sure there's some obvious answer, but I've not idea what.
>> Thanks for any help
>> kind regards, John.
>>
>>
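Distilled from the thread, the pattern that works when entered in a fresh session (extend Base.isless so both the default `sort` and an explicit `lt=` find it; `Uint32` is the 0.3-era spelling used above):

```julia
import Base.isless

type DayPrice          # trimmed-down stand-in for DayPriceText
    a1::Uint32
end

isless(a::DayPrice, b::DayPrice) = a.a1 < b.a1

ary = [DayPrice(4), DayPrice(2), DayPrice(6)]
sorted = sort(ary)
# [p.a1 for p in sorted] is [2, 4, 6]
```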

[julia-users] Re: Great new expository article about Julia by the core developers

2014-11-10 Thread cdm
see this ...

https://groups.google.com/d/msg/julia-box/hw81as3GPWA/E1QJm1shnV4J



On Monday, November 10, 2014 7:37:08 AM UTC-8, David Higgins wrote:
>
> So how does one go about getting an invitation to JuliaBox? It's 
> referenced in the article but you need an invitation to login
>
> Dave.
>


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Jake Bolewski
The 64 bit issue is a killer and is why I didn't go farther with integrating 
blosc with hdf5.  I guess I should have been more vocal about this.  Take 
what you may from my nascent package :-)

On Monday, November 10, 2014 6:05:40 PM UTC-5, Steven G. Johnson wrote:
>
>
> That seems to be the most reasonable approach but I couldn't work out how 
>> to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
>> package aware of that location when building libhdf5.  Are there examples 
>> of how to do that?
>>
>
> Note that the dependencies in some sense run in the opposite direction.  
> You don't technically need to make HDF5 aware of Blosc when building 
> libhdf5.  Instead, you need to build a Blosc "filter" for HDF5 (included 
> with c-blosc) and register it with HDF5.
>
> The Blosc.jl package can't build the HDF5 filter, because that would 
> introduce an unnecessary dependency on HDF5 for other things using Blosc.   
> So, at least this component needs to be built in/after the HDF5 package.
>


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


> That seems to be the most reasonable approach but I couldn't work out how 
> to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
> package aware of that location when building libhdf5.  Are there examples 
> of how to do that?
>

Note that the dependencies in some sense run in the opposite direction.  
You don't technically need to make HDF5 aware of Blosc when building 
libhdf5.  Instead, you need to build a Blosc "filter" for HDF5 (included 
with c-blosc) and register it with HDF5.

The Blosc.jl package can't build the HDF5 filter, because that would 
introduce an unnecessary dependency on HDF5 for other things using Blosc.   
So, at least this component needs to be built in/after the HDF5 package.


[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread John Drummond
Got it - I don't know whether it's a bug or not.
If I comment out
#import Base.isless
in the LogParse.jl file, initially reload that in the REPL, and then 
reload the correct version with
import Base.isless
then methods(isless) shows the method, but sort says it's not defined, even 
when I specify it directly.
Apologies for not checking the initial input in a fresh session; I thought 
that reloading a module would completely reload the functions, but 
presumably not when appending to those in Base.

Kind regards, John.
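
For anyone hitting the same error, a distilled sketch of the working pattern (module and type names here are made up, not from John's code):

```julia
# The key line is `import Base.isless` *before* the method definition.
# Without it, `isless` below becomes a new module-local function that
# Base's sort machinery never dispatches to.
module SortDemo
export Thing
import Base.isless

type Thing
    x::Int
end

isless(a::Thing, b::Thing) = a.x < b.x
end

# In a fresh session (reloading does not cleanly replace methods added
# to Base), sorting now works via the default `lt=isless`:
sort([SortDemo.Thing(4), SortDemo.Thing(2), SortDemo.Thing(6)])
```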




On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote:
>
> Thank you, that's helpful. 
> I reentered it all in a fresh session and found it working as well - I'll 
> try and find the difference which caused it not to work and come back.
> Kind Regards, John.
>
> On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:
>>
>> This code works everywhere I'm able to try it. 
>>
>> On Sunday 9 November 2014 at 03:18:13 UTC+1, John Drummond wrote:
>>>
>>> I was originally julia 0.3.1 on windows 7
>>> this is on Macosx 10 julia 0.3.2
>>> I loaded the file LogParse.jl below and then in the repl ran
>>>
>>> reload("LogParse.jl")
>>>
>>> methods(isless)
>>>
>>>
>>> ary1 = LogParse.DayPriceText[]
>>> push!(ary1,LogParse.DayPriceText(4,"a1",1))
>>> push!(ary1,LogParse.DayPriceText(2,"a1",1))
>>> push!(ary1,LogParse.DayPriceText(6,"a1",1))
>>>
>>>
>>> sort(ary1)
>>>
>>> sort(ary1,lt=LogParse.isless)
>>> I get the same messages - methods(isless) shows that it's loaded
>>> but the sort can't find it, even when I try to specify the function
>>>
>>>
>>> #in file LogParse.jl ###
>>> module LogParse
>>> export DayPriceText
>>> import Base.isless
>>>
>>> type DayPriceText
>>>   a1::Uint32
>>>   b1::ASCIIString
>>>   a2::Uint32
>>> end
>>>
>>> function isless(a::DayPriceText, b::DayPriceText)
>>>   if (a.a1 < b.a1)
>>> return true
>>>   else
>>> return false
>>>   end
>>> end
>>>
>>>
>>> end
>>> ##
>>>
>>> Many thanks.
>>> Kind regards, John
>>>
>>>
>>> On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:

 In this case it would be really great if you had a minimal reproducible 
 example. It looks to me as you are doing everything right, so I would 
 start 
 looking for typos and scoping issues. It's hard to find them without 
 looking at the code.

 Ideally the example should be small and possible to paste into a REPL 
 session, but if you can publish your code and don't want to extract only 
 the relevant part, that might be fine too.

 Julia version and operating system is also nice to include, so that we 
 have it available in case we have problems reproducing your results.

 Regards Ivar

 On Friday 7 November 2014 at 20:14:48 UTC+1, John Drummond wrote:
>
> Hi,
> I suspect I'm doing something stupid but no idea what I'm missing.
>
> I create a module .
> I create a type in it, DayPriceText
> I import Base.isless
> I define isless for the type
>
> now in the repl I get
>
> methods(isless)
> =>
> # 25 methods for generic function "isless":
> ..
> isless(x::DayPriceText,y::DayPriceText) at 
> c:\works\juliaplay\LogParse.jl:16
>
> but
>
> julia> typeof(a1p)
> Array{DayPriceText,1}
>
> julia> sort(a1p, lt=CILogParse.isless)
> ERROR: `isless` has no method matching isless(::DayPriceText, 
> ::DayPriceText)
>  in sort! at sort.jl:246
>
> julia> sort(a1p)
> ERROR: `isless` has no method matching isless(::DayPriceText, 
> ::DayPriceText)
>  in sort! at sort.jl:246
>
> I'm sure there's some obvious answer, but I've not idea what.
> Thanks for any help
> kind regards, John.
>
>

Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 5:02:03 PM UTC-5, Steven G. Johnson wrote:

> I've just created a Blosc.jl package and registered it.   Do Pkg.update() 
> and Pkg.add("Blosc") to get it.
>

Oh, darn it, I just realized I am duplicating some work by jakebolewski... 


[julia-users] Re: parallel for loop in Julia

2014-11-10 Thread elextr


On Tuesday, November 11, 2014 5:10:30 AM UTC+11, DrKey wrote:
>
> Here is what i tried:
> variant1:
>
> forcp = zeros(3,1);
>
> forcp = @parallel (hcat) for partA = 1:nPart
> for partB = (partA+1):nPart
> ...
> end
> forcp = forces[:,partA];
> end
>
> variant2:
> function calcforces(coords,L,np,i) # with np... number of processes i... 
> current process
> for partA = i+1:np:nPart-1
> for partB = (partA+1):nPart
> ...
> return forces
> end
>
> np = nprocs();
> parad = Array(RemoteRef,np);
>
> and then calling function calcforces with: 
> for i=1:np parad[i] = @spawn LJ_Force_MT(coords,L,np,i); end
> for i=1:np forces = fetch(parad[i]); end
>
> both ways are giving me wrong results over more than 1 timestep
>

You have multiple parallel loops modifying the forces array.  They will be 
generating races for sure.

Cheers
Lex 
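
One race-free shape for variant 1, sketched under the assumption that each iteration can compute its own force column locally (names borrowed from the snippets above; the pairwise force computation itself is elided):

```julia
nPart = 4  # placeholder size for the sketch
forcp = @parallel (hcat) for partA = 1:nPart
    f = zeros(3)          # local to this iteration: no writes to shared state
    # ... accumulate the pair forces acting on particle partA into f ...
    f                     # the loop's last expression is what hcat reduces
end
# forcp is a 3 x nPart matrix, one column per particle
```

The important change is that nothing inside the loop assigns into an array owned by another process; each iteration returns its own column and `hcat` assembles the result.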


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson
Note that, currently, Blosc is limited to 32-bit buffer sizes:
 https://github.com/Blosc/c-blosc/issues/67
Until this is fixed, I wouldn't recommend using it for JLD files.


Re: [julia-users] Help optimizing sparse matrix code

2014-11-10 Thread Milan Bouchet-Valat
On Monday 10 November 2014 at 13:03 -0800, Joshua Tokle wrote:
> Hello! I'm trying to replace an existing matlab code with julia and
> I'm having trouble matching the performance of the original code. The
> matlab code is here:
> 
> https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m
> 
> The program clusters inventors from a database of patent applications.
> The input data is a sparse boolean matrix (named XX in the script),
> where each row defines an inventor and each column defines a feature.
> For example, the jth column might correspond to a feature "first name
> is John". If there is a 1 in the XX[i, j], this means that inventor
> i's first name is John. Given an inventor i, we find similar inventors
> by identifying rows in the matrix that agree with XX[i, :] on a given
> column and then applying element-wise boolean operations to the rows.
> In the code, for a given value of `index`, C_lastname holds the unique
> column in XX corresponding to a "last name" feature such that
> XX[index, :] equals 1. C_firstname holds the unique column in XX
> corresponding to a "first name" feature such that XX[index, :] equals
> 1. And so on. The following code snippet finds all rows in the matrix
> that agree with XX[index, :] on full name and one of patent assignee
> name, inventor city, or patent class:
> 
> lump_index_2 = step & ((C_assignee | C_city | C_class))
> 
> The `step` variable is an indicator that's used to prevent the same
> inventors from being considered multiple times. My attempt at a
> literal translation of this code to julia is here:
> 
> https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl
> 
> The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean
> operations aren't supported for sparse matrices in julia, so I fake it
> with integer arithmetic.  The line that corresponds to the matlab code
> above is
> 
> lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))
You should be able to get a speedup by replacing this line with an
explicit `for` loop. First, you'll avoid memory allocation (one for each
+ or .* operation). Second, you'll be able to return as soon as the
index is found, instead of computing the value for all elements (IIUC
you're only looking for one index, right?).


My two cents
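
A minimal sketch of that suggestion, assuming the indicators are 0/1 vectors of equal length (vector names taken from the thread; this is not the original code):

```julia
# One fused pass, avoiding the temporaries allocated by each .* and +
function lump_indices(step, C_name, C_assignee, C_city, C_class)
    out = Int[]
    for i = 1:length(step)
        if step[i] != 0 && C_name[i] != 0 &&
           (C_assignee[i] != 0 || C_city[i] != 0 || C_class[i] != 0)
            push!(out, i)
        end
    end
    out
end
```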

> The reason I grouped it this way is that initially `step` will be a
> "sparse" vector of all 1's, and I thought it might help to do the
> truly sparse arithmetic first.
> 
> I've been testing this code on a Windows 2008 Server. The test data
> contains 45,763 inventors and 274,578 possible features (in other
> words, XX is a 45,763 x 274,578 sparse matrix). The matlab program
> consistently takes about 70 seconds to run on this data. The julia
> version shows a lot of variation: it's taken as little as 60 seconds
> and as much as 10 minutes. However, most runs take around 3.5 to 4
> minutes. I pasted one output from the sampling profiler here [1]. If
> I'm reading this correctly, it looks like the program is spending most
> of its time performing element-wise multiplication of the indicator
> vectors I described above.
> 
> I would be grateful for any suggestions that would bring the
> performance of the julia program in line with the matlab version. I've
> heard that the last time the matlab code was run on the full data set
> it took a couple days, so a slow-down of 3-4x is a significant burden.
> I did attempt to write a more idiomatic julia version using Dicts and
> Sets, but it's slower than the version that uses sparse matrix
> operations:
> 
> https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl
> 
> Thank you!
> Josh
> 
> 
> [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5
> 
> 



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


> Wouldn't it be better to have a separate Blosc.jl package that is used by 
>> HDF5.jl?   After all, there are presumably many other applications of this.
>>
>
> That seems to be the most reasonable approach but I couldn't work out how 
> to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
> package aware of that location when building libhdf5.  Are there examples 
> of how to do that?
>

I've just created a Blosc.jl package and registered it.   Do Pkg.update() 
and Pkg.add("Blosc") to get it.

To get the library location in the HDF5 package, just:

1) Add Blosc to the REQUIRE file
2) import Blosc
3) Blosc.libblosc is the path to the shared library.
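
Sketched out, steps 2 and 3 might look like this inside the HDF5 package (assuming step 1, the REQUIRE entry, is already done):

```julia
import Blosc
# Absolute path to the shared library, e.g. for use in ccall or for
# registering the HDF5 Blosc filter at build time
const libblosc = Blosc.libblosc
```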


[julia-users] Help optimizing sparse matrix code

2014-11-10 Thread Joshua Tokle
Hello! I'm trying to replace an existing matlab code with julia and I'm 
having trouble matching the performance of the original code. The matlab 
code is here:
https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m

The program clusters inventors from a database of patent applications. The 
input data is a sparse boolean matrix (named XX in the script), where each 
row defines an inventor and each column defines a feature. For example, the 
jth column might correspond to a feature "first name is John". If there is 
a 1 in the XX[i, j], this means that inventor i's first name is John. Given 
an inventor i, we find similar inventors by identifying rows in the matrix 
that agree with XX[i, :] on a given column and then applying element-wise 
boolean operations to the rows. In the code, for a given value of `index`, 
C_lastname holds the unique column in XX corresponding to a "last name" 
feature such that XX[index, :] equals 1. C_firstname holds the unique 
column in XX corresponding to a "first name" feature such that XX[index, :] 
equals 1. And so on. The following code snippet finds all rows in the 
matrix that agree with XX[index, :] on full name and one of patent assignee 
name, inventor city, or patent class:

lump_index_2 = step & ((C_assignee | C_city | C_class))

The `step` variable is an indicator that's used to prevent the same 
inventors from being considered multiple times. My attempt at a literal 
translation of this code to julia is here:
https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl

The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean operations 
aren't supported for sparse matrices in julia, so I fake it with integer 
arithmetic.  The line that corresponds to the matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

The reason I grouped it this way is that initially `step` will be a 
"sparse" vector of all 1's, and I thought it might help to do the truly 
sparse arithmetic first.

I've been testing this code on a Windows 2008 Server. The test data 
contains 45,763 inventors and 274,578 possible features (in other words, XX 
is a 45,763 x 274,578 sparse matrix). The matlab program consistently takes 
about 70 seconds to run on this data. The julia version shows a lot of 
variation: it's taken as little as 60 seconds and as much as 10 minutes. 
However, most runs take around 3.5 to 4 minutes. I pasted one output from 
the sampling profiler here [1]. If I'm reading this correctly, it looks 
like the program is spending most of its time performing element-wise 
multiplication of the indicator vectors I described above.

I would be grateful for any suggestions that would bring the performance of 
the julia program in line with the matlab version. I've heard that the last 
time the matlab code was run on the full data set it took a couple days, so 
a slow-down of 3-4x is a significant burden. I did attempt to write a more 
idiomatic julia version using Dicts and Sets, but it's slower than the 
version that uses sparse matrix operations:
https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl

Thank you!
Josh


[1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
On Monday 10 November 2014 at 10:07 -0800, David van Leeuwen wrote:
> Hello, 
> 
> On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote:
> On Sunday 9 November 2014 at 23:48 -0800, David van Leeuwen wrote: 
> > Hello, 
> > 
> > On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: 
> > NamedArrays.jl generally goes along this way. However, it 
> > remains limited in two aspects: 
> > 
> > 
> > 1. Some fields in NamedArrays are not declared of specific 
> > types. In particular, the field `dicts` is of the type 
> > `Vector{Dict}`, and the use of this field is on the 
> critical 
> > path when looping over the table, e.g. when counting. This 
> > would potentially lead to substantial impact on 
> performance. > 
> > 
> > I suppose the problem you indicate can be alleviated by making 
> > NamedArray parameterized by the type of the key in the dict as 
> well.   
> Right. Sounds reasonable. 
> 
> 
> 
> I've been pondering over how this could be done. NamedArray has a type
> parameter N, and it should then further have N type parameters
> indicating the dictionary type along each of the N dimensions.  So I
> figure this is going to be a challenging type definition.  
A tuple type could be used to give the type of the dimension names.

But there's another issue: `dicts::Vector{Dict}` cannot be defined more
precisely than that if heterogeneous types are allowed for different
dimensions. Is this a case where staged functions could be used to
generate efficient functions to access dictionaries?


Regards


Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread David van Leeuwen
On Monday, November 10, 2014 10:07:46 PM UTC+1, Milan Bouchet-Valat wrote:
>
> On Monday 10 November 2014 at 10:07 -0800, David van Leeuwen wrote: 
> > Hello, 
> > 
> > On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat 
> wrote: 
> > On Sunday 9 November 2014 at 23:48 -0800, David van Leeuwen wrote: 
> > > Hello, 
> > > 
> > > On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin 
> wrote: 
> > > NamedArrays.jl generally goes along this way. However, 
> it 
> > > remains limited in two aspects: 
> > > 
> > > 
> > > 1. Some fields in NamedArrays are not declared of 
> specific 
> > > types. In particular, the field `dicts` is of the type 
> > > `Vector{Dict}`, and the use of this field is on the 
> critical 
> > > path when looping over the table, e.g. when counting. 
> This 
> > > would potentially lead to substantial impact on 
> performance. > 
> > > 
> > > I suppose the problem you indicate can be alleviated by making 
> > > NamedArray parameterized by the type of the key in the dict as 
> well.   
> > Right. Sounds reasonable. 
> > 
> > 
> > 
> > I've been pondering over how this could be done. NamedArray has a type 
> > parameter N, and it should then further have N type parameters 
> > indicating the dictionary type along each of the N dimensions.  So I 
> > figure this is going to be a challenging type definition.   
> A tuple type could be used to give the type of the dimension names. 
>
> But there's another issue: `dicts::Vector{Dict}` cannot be defined more 
> precisely than that if heterogeneous types are allowed for different 
>

This is exactly what I was referring to.  Not all dimensions will have the 
same type, so the number of types that are parameterizing NamedArrays 
depends on N, yet another parameter of the type.  I am not sure how to 
define a variable number of parameters for a type.  Maybe something 
recursive will do. 
 

> dimensions. Is this a case where staged functions could be used to 
> generate efficient functions to access dictionaries? 
>
>
> Regards 
>


[julia-users] Re: Contributing to a Julia Package

2014-11-10 Thread Ivar Nesje
Another important point (for actively developed packages) is that Pkg.add() 
checks out the commit of the latest released version registered in 
METADATA.jl. Most packages do development on the master branch, so you 
should likely base your changes on master, rather than the latest released 
version.

To do this, you can use `Pkg.checkout()`, but `git checkout master` will 
also work.

Ivar
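
For example (package name purely illustrative):

```julia
Pkg.checkout("HDF5")   # moves the installed copy onto its master branch
```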

On Monday 10 November 2014 at 21:07:49 UTC+1, Tim Wheeler wrote:
>
> Thank you! It seems to have worked.
> Per João's suggestions, I had to:
>
>
>- Create a fork on Github of the target package repository
>- Clone my fork locally
>- Create a branch on my local repository
>- Add, commit, & push my changes to said branch
>- On github I could then submit the pull request from my forked repo 
>to the upstream master
>
>
>
>
>
>
> On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote:
>>
>> Hello Julia Users,
>>
>> I wrote some code that I would like to submit via pull request to a Julia 
>> package. The thing is, I am new to this and do not understand the pull 
>> request process.
>>
>> What I have done:
>>
>>- used Pkg.add to obtain a local version of said package
>>- ran `git branch mybranch` to create a local git branch 
>>- created my code additions and used `git add` to include them. Ran 
>>`git commit -m`
>>
>> I am confused over how to continue. The instructions on git for issuing a 
>> pull request require that I use their UI interface, but my local branch is 
>> not going to show up when I select "new pull request" because it is, well, 
>> local to my machine. Do I need to fork the repository first? When I try 
>> creating a branch through the UI I do not get an option to create one like 
>> they indicate in the tutorial 
>> ,
>>  
>> perhaps because I am not a repo owner.
>>
>> Thank you.
>>
>

Re: [julia-users] Compressing .jld files

2014-11-10 Thread Douglas Bates
On Monday, November 10, 2014 12:55:24 PM UTC-6, Steven G. Johnson wrote:
>
>
>
> On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote:
>>
>> It would be best to incorporate it into the HDF5 package.  A julia 
>> package would be useful if you wanted to do the same sort of compression on 
>> Julia binary blobs, such as serialized julia values in an IOBuffer.
>>
>
> Wouldn't it be better to have a separate Blosc.jl package that is used by 
> HDF5.jl?   After all, there are presumably many other applications of this.
>

That seems to be the most reasonable approach but I couldn't work out how 
to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
package aware of that location when building libhdf5.  Are there examples 
of how to do that?
 

>
> Note that HDF5 has a Blosc filter (
> http://www.hdfgroup.org/services/filters.html#blosc and 
> https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you 
> can use Blosc internally in the HDF5 file while still allowing HDF5 tools 
> to work with the file. 
>


Re: [julia-users] travis for os x packages

2014-11-10 Thread Elliot Saba
Yep.  Essentially, you'll need to enable the "osx" build environment.  It
looks like Travis is not accepting more multi-os requests at the moment, so
the typical approach (used on, for instance, the main julia repository)
won't work.

You may not be able to get it to run on multiple OSes, but you should be
able to get it to run on OSX only by setting the language to
"objective-c".  This will get it to run on OSX only; then you can use
the default .travis.yml file that is generated by Pkg.

In short, you should be able to take that default file, change the language
to "objective-c", remove the "os" block, and call it good.  Save that as
".travis.yml" in your repo, enable Travis in your repository's "services"
section, and test away!
-E
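
Putting that advice together, a minimal OSX-only .travis.yml might look roughly like this (the script line is an assumption; keep whatever Pkg generated for your package):

```yaml
language: objective-c   # forces the OSX build environment
script:
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPackage")'
```

Here "MyPackage" is a placeholder for your package's name.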

On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne  wrote:

> I would like to set up travis for an OS X-only package: does anyone have
> suggestions for how I should set up travis (or has anyone already done
> this)?
>
> simon
>


Re: [julia-users] JuliaBox

2014-11-10 Thread David Higgins


On Monday, 10 November 2014 19:33:09 UTC, Shashi Gowda wrote:
>
>
> On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda  > wrote:
>
>>
>> Just do not publish it online.
>>
>
> Oops I meant to send it to David directly. If anyone else wants a code, 
> please let me know.
>

I did wonder about this bit :P

Thank you very much in any case.

Dave 


[julia-users] Re: Contributing to a Julia Package

2014-11-10 Thread Tim Wheeler
Thank you! It seems to have worked.
Per João's suggestions, I had to:


   - Create a fork on Github of the target package repository
   - Clone my fork locally
   - Create a branch on my local repository
   - Add, commit, & push my changes to said branch
   - On github I could then submit the pull request from my forked repo to 
   the upstream master






On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote:
>
> Hello Julia Users,
>
> I wrote some code that I would like to submit via pull request to a Julia 
> package. The thing is, I am new to this and do not understand the pull 
> request process.
>
> What I have done:
>
>- used Pkg.add to obtain a local version of said package
>- ran `git branch mybranch` to create a local git branch 
>- created my code additions and used `git add` to include them. Ran 
>`git commit -m`
>
> I am confused over how to continue. The instructions on git for issuing a 
> pull request require that I use their UI interface, but my local branch is 
> not going to show up when I select "new pull request" because it is, well, 
> local to my machine. Do I need to fork the repository first? When I try 
> creating a branch through the UI I do not get an option to create one like 
> they indicate in the tutorial, perhaps because I am not a repo owner.
>
> Thank you.
>


[julia-users] Re: JuliaBox

2014-11-10 Thread Pablo Zubieta
Hi Shashi, I would like a code too.

Thanks in advance,
Pablo


Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
Sure :) Happy to let them in.

On Tue, Nov 11, 2014 at 1:02 AM, David Higgins 
wrote:

> Thanks Ivar.
>
> 5 people Shashi, all academics so I'd like to get them interested.
>
> Dave.
>
> On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote:
>>
>> Hello David,
>>
>> Sorry about that. You can use the invite code G01014. How many others do
>> you want to invite? A handful should be fine. Just do not publish it online.
>>
>> Thank you
>>
>> On Tue, Nov 11, 2014 at 12:15 AM, David Higgins 
>> wrote:
>>
>>> Hi,
>>>
>>> Does anyone know if JuliaBox is open to
>>> applications to use it these days? I came across it in the ArXiv paper
>>> about Julia mentioned here.
>>> I'm a current Julia user but I have a number of colleagues who would be
>>> interested in a sandboxed, non-install version to play with before making
>>> the jump to installation. I made the mistake of suggesting JuliaBox before
>>> verifying that it was possible to create an account, it seems it's invite
>>> only for now.
>>>
>>> Thanks,
>>> Dave.
>>>
>>
>>


Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda 
wrote:

>
> Just do not publish it online.
>

Oops I meant to send it to David directly. If anyone else wants a code,
please let me know.


Re: [julia-users] JuliaBox

2014-11-10 Thread David Higgins
Thanks Ivar.

5 people Shashi, all academics so I'd like to get them interested.

Dave.

On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote:
>
> Hello David,
>
> Sorry about that. You can use the invite code G01014. How many others do 
> you want to invite? A handful should be fine. Just do not publish it online.
>
> Thank you
>
> On Tue, Nov 11, 2014 at 12:15 AM, David Higgins  > wrote:
>
>> Hi,
>>
>> Does anyone know if JuliaBox is open to 
>> applications to use it these days? I came across it in the ArXiv paper 
>> about Julia mentioned here. 
>> I'm a current Julia user but I have a number of colleagues who would be 
>> interested in a sandboxed, non-install version to play with before making 
>> the jump to installation. I made the mistake of suggesting JuliaBox before 
>> verifying that it was possible to create an account, it seems it's invite 
>> only for now.
>>
>> Thanks,
>> Dave.
>>
>
>

Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
Hello David,

Sorry about that. You can use the invite code G01014. How many others do
you want to invite? A handful should be fine. Just do not publish it online.

Thank you

On Tue, Nov 11, 2014 at 12:15 AM, David Higgins 
wrote:

> Hi,
>
> Does anyone know if JuliaBox is open to applications
> to use it these days? I came across it in the ArXiv paper about Julia
> mentioned here.
> I'm a current Julia user but I have a number of colleagues who would be
> interested in a sandboxed, non-install version to play with before making
> the jump to installation. I made the mistake of suggesting JuliaBox before
> verifying that it was possible to create an account, it seems it's invite
> only for now.
>
> Thanks,
> Dave.
>


Re: [julia-users] Contributing to a Julia Package

2014-11-10 Thread João Felipe Santos
Hi Tim,

you have to create a fork on Github and then push your new branch to your
personal fork. Then, on Github, switch to that fork and the interface will
show a "Pull request" button if your personal fork is ahead of the upstream
repository.

Best

--
João Felipe Santos

On Mon, Nov 10, 2014 at 2:17 PM, Tim Wheeler 
wrote:

> Hello Julia Users,
>
> I wrote some code that I would like to submit via pull request to a Julia
> package. The thing is, I am new to this and do not understand the pull
> request process.
>
> What I have done:
>
>- used Pkg.add to obtain a local version of said package
>- ran `git branch mybranch` to create a local git branch
>- created my code additions and used `git add` to include them. Ran
>`git commit -m`
>
> I am confused over how to continue. The instructions on git for issuing a
> pull request require that I use their UI interface, but my local branch is
> not going to show up when I select "new pull request" because it is, well,
> local to my machine. Do I need to fork the repository first? When I try
> creating a branch through the UI I do not get an option to create one like
> they indicate in the tutorial, perhaps because I am not a repo owner.
>
> Thank you.
>


[julia-users] Contributing to a Julia Package

2014-11-10 Thread Tim Wheeler
Hello Julia Users,

I wrote some code that I would like to submit via pull request to a Julia 
package. The thing is, I am new to this and do not understand the pull 
request process.

What I have done:

   - used Pkg.add to obtain a local version of said package
   - ran `git branch mybranch` to create a local git branch 
   - created my code additions and used `git add` to include them. Ran `git 
   commit -m`

I am confused over how to continue. The instructions on git for issuing a 
pull request require that I use their UI interface, but my local branch is 
not going to show up when I select "new pull request" because it is, well, 
local to my machine. Do I need to fork the repository first? When I try 
creating a branch through the UI I do not get an option to create one like 
they indicate in the tutorial, perhaps because I am not a repo owner.

Thank you.


[julia-users] Re: JuliaBox

2014-11-10 Thread Ivar Nesje
Yesterday someone suggested https://tmpnb.org/

On Monday 10 November 2014 at 19:45:05 UTC+1, David Higgins wrote:
>
> Hi,
>
> Does anyone know if JuliaBox is open to applications 
> to use it these days? I came across it in the ArXiv paper about Julia 
> mentioned here. 
> I'm a current Julia user but I have a number of colleagues who would be 
> interested in a sandboxed, non-install version to play with before making 
> the jump to installation. I made the mistake of suggesting JuliaBox before 
> verifying that it was possible to create an account, it seems it's invite 
> only for now.
>
> Thanks,
> Dave.
>


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote:
>
> It would be best to incorporate it into the HDF5 package.  A julia package 
> would be useful if you wanted to do the same sort of compression on Julia 
> binary blobs, such as serialized julia values in an IOBuffer.
>

Wouldn't it be better to have a separate Blosc.jl package that is used by 
HDF5.jl?   After all, there are presumably many other applications of this.

Note that HDF5 has a Blosc filter 
(http://www.hdfgroup.org/services/filters.html#blosc and 
https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can 
use Blosc internally in the HDF5 file while still allowing HDF5 tools to 
work with the file. 


[julia-users] JuliaBox

2014-11-10 Thread David Higgins
Hi,

Does anyone know if JuliaBox is open to applications 
to use it these days? I came across it in the ArXiv paper about Julia 
mentioned here. I'm 
a current Julia user but I have a number of colleagues who would be 
interested in a sandboxed, non-install version to play with before making 
the jump to installation. I made the mistake of suggesting JuliaBox before 
verifying that it was possible to create an account, it seems it's invite 
only for now.

Thanks,
Dave.


[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread Steven G. Johnson
On Monday, November 10, 2014 1:15:40 PM UTC-5, David van Leeuwen wrote:
 

> I was just following Stefan's syntax.  The dots on my screen are about as 
> big as the stuck pieces of dust, but I really believe there is a period 
> there. 
>  
>

The syntax in Compat.jl changed shortly after its release.  The new syntax 
is to use:

 @compat ...Julia 0.4 syntax

and have it be automatically translated into older syntax as needed.  If 
there is a case where this does not work, please file an issue. 
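
For example, a sketch of the new usage (assuming the Compat package is installed):

```julia
using Compat
# Written in Julia 0.4 syntax; on 0.3 the macro rewrites it into the
# older Dict construction form.
d = @compat Dict(:foo => 1, :bar => 2)
```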


[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread David van Leeuwen
Hi Nils, 

My current work around is

## temporary compatibility hack
if VERSION < v"0.4.0-dev"
Base.Dict(z::Base.Zip2) = Dict(z.a, z.b)
end

On Monday, November 10, 2014 12:04:14 PM UTC+1, Nils Gudat wrote:
>
> Hi David,
>
> shouldn't it be @Compat Dict(zip(keys, values)) instead of 
> @Compat.Dict(zip(keys, values)), i.e. a space between compat and dict 
> rather than a dot method call?
>
> I was just following Stefan's syntax.  The dots on my screen are about as 
big as the stuck pieces of dust, but I really believe there is a period 
there. 

julia> @Compat.Dict(:foo => 2, :bar => 2)
> Dict{Symbol,Int64} with 2 entries:
>   :bar => 2
>   :foo => 2
>
>  Macro programming is beyond the scope of my brain, anyway...

---david

> Best,
> Nils
>


[julia-users] Re: parallel for loop in Julia

2014-11-10 Thread DrKey
Here is what I tried:
variant1:

forcp = zeros(3,1);

forcp = @parallel (hcat) for partA = 1:nPart
    for partB = (partA+1):nPart
    ...
    end
    forcp = forces[:,partA];
end

variant2:
function calcforces(coords,L,np,i) # np = number of processes, i = current process
    for partA = i+1:np:nPart-1
        for partB = (partA+1):nPart
        ...
    return forces
end

np = nprocs();
parad = Array(RemoteRef,np);

and then calling function calcforces with: 
for i=1:np parad[i] = @spawn LJ_Force_MT(coords,L,np,i); end
for i=1:np forces = fetch(parad[i]); end

Both ways give me wrong results over more than one timestep.



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread David van Leeuwen
Hello, 

On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote:
>
> On Sunday, November 9, 2014 at 11:48 PM -0800, David van Leeuwen wrote: 
> > Hello, 
> > 
> > On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: 
> > NamedArrays.jl generally goes along this way. However, it 
> > remains limited in two aspects: 
> > 
> > 
> > 1. Some fields in NamedArrays are not declared of specific 
> > types. In particular, the field `dicts` is of the type 
> > `Vector{Dict}`, and the use of this field is on the critical 
> > path when looping over the table, e.g. when counting. This 
> > would potentially lead to substantial impact on performance. > 
> > 
> > I suppose the problem you indicate can be alleviated by making 
> > NamedArray parameterized by the type of the key in the dict as well.   
> Right. Sounds reasonable. 
>
>
I've been pondering how this could be done. NamedArray has a type 
parameter N, and it should then further have N type parameters indicating 
the dictionary type along each of the N dimensions.  So I figure this is 
going to be a challenging type definition.  

---david


>

[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread John Drummond
Thank you, that's helpful. 
I reentered it all in a fresh session and found it working as well - I'll 
try and find the difference which caused it not to work and come back.
Kind Regards, John.

On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:
>
> This code works everywhere I'm able to try it. 
>
> At 03:18:13 UTC+1 on Sunday, November 9, 2014, John Drummond wrote:
>>
>> I was originally julia 0.3.1 on windows 7
>> this is on Macosx 10 julia 0.3.2
>> I loaded the file LogParse.jl below and then in the repl ran
>>
>> reload("LogParse.jl")
>>
>> methods(isless)
>>
>>
>> ary1 = LogParse.DayPriceText[]
>> push!(ary1,LogParse.DayPriceText(4,"a1",1))
>> push!(ary1,LogParse.DayPriceText(2,"a1",1))
>> push!(ary1,LogParse.DayPriceText(6,"a1",1))
>>
>>
>> sort(ary1)
>>
>> sort(ary1,lt=LogParse.isless)
>> I get the same messages - methods(isless) shows that it's loaded
>> but the sort can't find it, even when I try to specify the function
>>
>>
>> #in file LogParse.jl ###
>> module LogParse
>> export DayPriceText
>> import Base.isless
>>
>> type DayPriceText
>>   a1::Uint32
>>   b1::ASCIIString
>>   a2::Uint32
>> end
>>
>> function isless(a::DayPriceText, b::DayPriceText)
>>   if (a.a1 < b.a1)
>> return true
>>   else
>> return false
>>   end
>> end
>>
>>
>> end
>> ##
>>
>> Many thanks.
>> Kind regards, John
>>
>>
>> On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:
>>>
>>> In this case it would be really great if you had a minimal reproducible 
>>> example. It looks to me as you are doing everything right, so I would start 
>>> looking for typos and scoping issues. It's hard to find them without 
>>> looking at the code.
>>>
>>> Ideally the example should be small and possible to paste into a REPL 
>>> session, but if you can publish your code and don't want to extract only 
>>> the relevant part, that might be fine too.
>>>
>>> Julia version and operating system is also nice to include, so that we 
>>> have it available in case we have problems reproducing your results.
>>>
>>> Regards Ivar
>>>
>>> At 20:14:48 UTC+1 on Friday, November 7, 2014, John Drummond wrote:

 Hi,
 I suspect I'm doing something stupid but no idea what I'm missing.

 I create a module .
 I create a type in it, DayPriceText
 I import Base.isless
 I define isless for the type

 now in the repl I get

 methods(isless)
 =>
 # 25 methods for generic function "isless":
 ..
 isless(x::DayPriceText,y::DayPriceText) at 
 c:\works\juliaplay\LogParse.jl:16

 but

 julia> typeof(a1p)
 Array{DayPriceText,1}

 julia> sort(a1p, lt=CILogParse.isless)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 julia> sort(a1p)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 I'm sure there's some obvious answer, but I've no idea what.
 Thanks for any help
 kind regards, John.
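
For anyone landing on this thread later, here is a minimal self-contained
version of the pattern under discussion, in current Julia syntax (struct
instead of type, UInt32 instead of Uint32 — a sketch, not John's exact
code). Defining the method as Base.isless means both sort(ary) and plain
isless(a, b) resolve without any import juggling:

```julia
struct DayPriceText
    a1::UInt32
    b1::String
    a2::UInt32
end

# Qualified definition extends Base's generic function directly,
# so sort's default lt=isless finds it.
Base.isless(a::DayPriceText, b::DayPriceText) = a.a1 < b.a1

ary = [DayPriceText(4, "a1", 1), DayPriceText(2, "a1", 1), DayPriceText(6, "a1", 1)]
sorted = sort(ary)  # sorted ascending by the a1 field
```

The original code was already correct; as the thread concludes, the error
came from a stale session rather than the method definition.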



[julia-users] Re: Error in PyPlot; "cm_get_cmap not defined"

2014-11-10 Thread Steven G. Johnson
Should be fixed now, sorry.


Re: [julia-users] Absolute value of big(-0.0)

2014-11-10 Thread Samuel S. Watson
Done: https://github.com/JuliaLang/julia/issues/8968

On Monday, November 10, 2014 12:06:31 PM UTC-5, Stefan Karpinski wrote:
>
> This is indeed a bug – could you open an issue? 
> https://github.com/JuliaLang/julia/issues
>
> On Mon, Nov 10, 2014 at 5:55 PM, Samuel S. Watson wrote:
>
>> I'm getting (notice the negative sign):
>>
>> abs(big(-0.0)) = -0e+00 with 256 bits of precision
>>
>> I think it would be better to have abs(big(-0.0)) return 0e+00 (for 
>> example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an 
>> abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is 
>> ifelse(x<0,-x,0), and -0 is not less than 0. 
>>
>
>

Re: [julia-users] Absolute value of big(-0.0)

2014-11-10 Thread Stefan Karpinski
This is indeed a bug – could you open an issue?
https://github.com/JuliaLang/julia/issues

On Mon, Nov 10, 2014 at 5:55 PM, Samuel S. Watson  wrote:

> I'm getting (notice the negative sign):
>
> abs(big(-0.0)) = -0e+00 with 256 bits of precision
>
> I think it would be better to have abs(big(-0.0)) return 0e+00 (for
> example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an
> abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is
> ifelse(x<0,-x,0), and -0 is not less than 0.
>


[julia-users] Absolute value of big(-0.0)

2014-11-10 Thread Samuel S. Watson
I'm getting (notice the negative sign):

abs(big(-0.0)) = -0e+00 with 256 bits of precision

I think it would be better to have abs(big(-0.0)) return 0e+00 (for 
example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an 
abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is 
ifelse(x<0,-x,0), and -0 is not less than 0. 
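
One possible fix along the lines suggested (a sketch, not necessarily the
patch that closed the issue) is a specialized method that tests the sign
bit instead of using x < 0, since signbit distinguishes -0.0 from 0.0:

```julia
# Hypothetical helper illustrating the fix; a real patch would define
# a Base.abs(x::BigFloat) method instead.
bigabs(x::BigFloat) = signbit(x) ? -x : x
```

With this definition, bigabs(big(-0.0)) negates the value and so returns a
positive zero, while positive inputs pass through unchanged.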


[julia-users] Error in PyPlot; "cm_get_cmap not defined"

2014-11-10 Thread Nils Gudat
I'm using PyPlot to make 3D plots, which I color by getting color maps 
through ColorMap(::String). After running a Pkg.update() today, I am now 
getting an error message when trying to construct a 3D plot, saying 
"cm_get_cmap not defined (...) at Plots.jl:141". 
Indeed, when checking colormaps.jl, I find that ColorMap should lead to a 
call to get_cmap, not cm_get_cmap. Why is my PyPlot trying to get the 
color maps through a different function?


[julia-users] parallel for loop in Julia

2014-11-10 Thread DrKey
I'm a beginner at using Julia, and I have written a simple molecular dynamics 
simulation, which works quite well and fast.

Now I'm trying to parallelize my core loop which calculates the forces 
between each pair of particles.

My loop is:

for partA = 1:nParts-1
    for partB = (partA+1):nParts
        # Calculate particle-particle distance
        dr = coords[:,partA] - coords[:,partB];
        dr2 = dot(dr,dr)

        invDr2 = 1.0/dr2;
        invDr6 = invDr2^3;
        tforce = invDr2^4 * (invDr6 - 0.5);

        forces[:,partA] = forces[:,partA] + dr*tforce;
        forces[:,partB] = forces[:,partB] - dr*tforce;
    end
end

coords is an array holding the 3-dimensional coordinates of each particle.
nParts is the number of particles, and forces has the same size as coords 
and holds the forces on each particle.

I tried @parallel for with different reduction operators (I found + and 
vcat, of course after changing my loop a little bit), which are not 
documented very well; at least, I only found examples for (+) in the help.
What is the best way to parallelize this? 
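
One common approach (a sketch, not a definitive answer, written in current
Julia syntax where the 0.3-era @parallel became @distributed) is to have
each loop iteration build a full-size force array and reduce with (+).
Unlike hcat, this correctly merges the partB contributions that different
iterations write into the same columns:

```julia
using Distributed, LinearAlgebra

function calc_forces(coords)
    nParts = size(coords, 2)
    @distributed (+) for partA = 1:nParts-1
        forces = zeros(size(coords))  # this iteration's contribution only
        for partB = (partA+1):nParts
            dr = coords[:, partA] - coords[:, partB]
            invDr2 = 1.0 / dot(dr, dr)
            tforce = invDr2^4 * (invDr2^3 - 0.5)
            forces[:, partA] += dr * tforce
            forces[:, partB] -= dr * tforce  # action-reaction pair
        end
        forces  # last expression is what (+) reduces over
    end
end
```

Allocating a full array per iteration is wasteful; in practice one would
allocate one buffer per worker, but this shows why a (+) reduction over
whole arrays preserves the pairwise updates while hcat cannot.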



Re: [julia-users] Translating Class-Based OO Apps to Julia

2014-11-10 Thread Greg Trzeciak


On Thursday, January 17, 2013 2:56:52 AM UTC+1, Stefan Karpinski wrote:
>
> ... This definitely should go in an object-oriented programming in Julia 
> document.
>

Does a document like this exist? It would definitely be useful. 


[julia-users] travis for os x packages

2014-11-10 Thread Simon Byrne
I would like to set up travis for an OS X-only package: does anyone have 
suggestions for how I should set up travis (or has anyone already done 
this)?

simon


Re: [julia-users] Re: strange speed reduction when using external function in inner loop

2014-11-10 Thread Rob J Goedman
David,

Not sure this is correct or helps, but on my Yosemite 10.10.1 MacBook Pro I 
get the results below.

Regards,
Rob

*julia> **@time prof(true)*

 Count FileFunction Line

47 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 15

   165 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   502 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

98 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

64 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 29

 5 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

20 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

45 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   883 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   45

   884 REPL.jl eval_user_input54

   502 array.jl+ 719

   165 random.jl   rand! 130

   884 task.jl anonymous  96

elapsed time: 1.51332406 seconds (488212276 bytes allocated, 53.00% gc time)


*julia> **@time prof(true)*

 Count FileFunction Line

   156 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   577 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 21

   116 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 26

53 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

10 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

43 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 3 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   910 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   910 REPL.jl eval_user_input54

   577 array.jl+ 719

   156 random.jl   rand! 130

   910 task.jl anonymous  96

elapsed time: 1.488157718 seconds (488208960 bytes allocated, 50.96% gc 
time)


*julia> **@time prof(true)*

 Count FileFunction Line

   174 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   545 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

   115 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 26

46 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 29

 8 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

18 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

28 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 3 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   894 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   894 REPL.jl eval_user_input54

   545 array.jl+ 719

   174 random.jl   rand! 130

   894 task.jl anonymous  96

elapsed time: 1.448621207 seconds (488206436 bytes allocated, 49.75% gc 
time)


*julia> **@time prof(true)*

 Count FileFunction Line

   165 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   584 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

   117 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

51 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 5 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

16 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

34 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

   922 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   922 REPL.jl eval_user_input54

   584 array.jl+ 719

   165 random.jl   rand! 130

   922 ta

[julia-users] Re: Great new expository article about Julia by the core developers

2014-11-10 Thread David Higgins
So how does one go about getting an invitation to JuliaBox? It's referenced 
in the article, but you need an invitation to log in.

Dave.

On Saturday, 8 November 2014 22:58:31 UTC, Peter Simon wrote:
>
> Just found this great new highly accessible exposition about the Julia 
> language: http://arxiv.org/pdf/1411.1607v1.pdf, by Jeff et al.  It's the 
> perfect intro to share with many of my not-yet-Julian colleagues.
>
> --Peter
>


Re: [julia-users] Re: Input arguments to gemm!

2014-11-10 Thread Andreas Noack
E.g.

julia> A = randn(3,4);B = randn(4,3);C = Array(Float64,3,3);


julia> BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)

3x3 Array{Float64,2}:
 -1.39617  4.02968   -1.2171
 -2.35074  2.60903    0.216789
  1.63807  0.102948  -0.41358



2014-11-10 9:09 GMT-05:00 Steven G. Johnson :

>
>
> On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote:
>
>> I am unable to figure out what I should pass as input parameters to the
>> gemm! function. The function declaration asks for BlasChar,
>> StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array?
>>
>
> Yes.  (Or rather, the StridedFoo types are a superset, including various
> 1d/2d array types.)
>
>


[julia-users] Re: Input arguments to gemm!

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote:

> I am unable to figure out what I should pass as input parameters to the 
> gemm! function. The function declaration asks for BlasChar, 
> StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array?
>

Yes.  (Or rather, the StridedFoo types are a superset, including various 
1d/2d array types.)
 


Re: [julia-users] Silhouette width

2014-11-10 Thread Jacob Quinn
Check out the Clustering.jl package which has an interface for silhouette.
Specifically, see this file:
https://github.com/JuliaStats/Clustering.jl/blob/master/src/silhouette.jl

-Jacob

On Mon, Nov 10, 2014 at 5:53 AM, Francesco Brundu <
francesco.bru...@gmail.com> wrote:

> Hi all,
> I am new to Julia. I searched a bit but I did not find anything related to
> the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) ..
> Do you know if there is something about it?
>
> Thanks,
> Francesco
>


Re: [julia-users] Reinterpreting parts of a byte array

2014-11-10 Thread Sebastian Good
Thanks for the responses. As usual, I discover myself making assumptions
that may not have been stated well.

1. I'll be reading small bits (32 bit ints, mostly) at fairly random
addresses and was worried about the overhead of creating array views for
such small objects. Perhaps they are optimized away. I should check :-)
2. I've been taught by other languages that touching raw pointers is
dangerous without also holding some promise that they won't be relocated,
e.g. by a copying collector, etc. I suppose if it's a memory mapped array,
I can roughly cheat and know that the OS won't move it, so Julia can't
either. But it worried me.

*Sebastian Good*


On Sun, Nov 9, 2014 at 11:36 PM, Jameson Nash  wrote:

> It rather depends upon what you know about the data. If you want a
> file-like abstraction, it may be possible to wrap it in an IOBuffer type
> (if not, it should be parameterized to allow it). If you want an array-like
> abstraction, then I think reinterpreting to different array types may be
> the most direct approach. If the array is coming from C, then you can use
> unsafe_load/unsafe_store directly. As Ivar points out, this is not more nor
> less dangerous than the same operation in C. Although, if you wrap the data
> buffer in a Julia object (or got it from a Julia call), you can gain some
> element of protection against memory corruption bugs by minimizing the
> amount of julia code that is directly interfacing with the raw memory
> pointer.
>
>
> On Sun Nov 09 2014 at 5:42:42 PM Ivar Nesje  wrote:
>
>> Is there any problem with reinterpreting the array and then use a
>> SubArray or ArrayView to do the index transformation?
>>
>> Pointer arithmetic is not more or less dangerous in Julia, than what it
>> is in C. The only thing you need to ensure is that the object you have a
>> pointer to is referenced by something the GC traverses, and that it isn't
>> moved in memory (Eg. vector resize).
>>
>
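
A small sketch of the reinterpret route discussed above, in current Julia
syntax (in recent versions reinterpret returns a zero-copy view, so small
scattered reads do not allocate a new array per access):

```julia
# Eight raw bytes, standing in for a buffer from Mmap.mmap or read!
bytes = UInt8[0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00]

ints = reinterpret(Int32, bytes)  # view over the same memory, length 2
first_val = ltoh(ints[1])         # ltoh: little-endian file order -> host order
```

Because the view aliases the original buffer, the GC keeps the bytes alive
for as long as ints is reachable, which addresses the pointer-lifetime
worry without touching raw pointers at all.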


[julia-users] Input arguments to gemm!

2014-11-10 Thread Kapil Agarwal
Hi

I am unable to figure out what I should pass as input parameters to the 
gemm! function. The function declaration asks for BlasChar, 
StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array?

--
Kapil


[julia-users] Silhouette width

2014-11-10 Thread Francesco Brundu
Hi all,
I am new to Julia. I searched a bit but did not find anything related to 
the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)).
Do you know if there is something about it?

Thanks,
Francesco


Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Milan Bouchet-Valat
On Sunday, November 9, 2014 at 9:17 PM -0800, Todd Leo wrote:
> Hi fellows, 
> 
> 
> 
> I'm currently working on sparse matrices and cosine similarity
> computation, but my routine is running very slowly, or at least slower
> than I expected. So I wrote some test functions to dig out the reason
> for the inefficiency. To my surprise, the execution times of passing two
> vectors to the test function and passing the whole sparse matrix
> differ greatly: the latter is 80x faster. I am wondering why
> extracting the two vectors from the matrix in each loop is so
> dramatically faster, and how to avoid the multi-GB memory allocation.
> Thanks guys.
> 
> 
> --
> BEST REGARDS,
> Todd Leo
> 
> 
> # The sparse matrix
> mat # 2000x15037 SparseMatrixCSC{Float64, Int64}
> 
> 
> # The two vectors, prepared in advance
> v = mat'[:,1]
> w = mat'[:,2]
> 
> 
> # Cosine similarity function
> function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
> j::SparseMatrixCSC{Float64, Int64})
> return sum(i .* j)/sqrt(sum(i.*i)*sum(j.*j))
> end
I think you'll experience a dramatic speed gain if you write the sums in
explicit loops, accessing elements one by one, taking their product and
adding it immediately to a counter. In your current version, the
element-wise products allocate new vectors before computing the sums,
which is very costly.

This will also get rid of the difference you report between passing
arrays and vectors.


Regards

> function test1(d)
> res = 0.
> for i in 1:1
> res = cosine_vectorized(d[:,1], d[:,2])
> end
> end
> 
> 
> function test2(_v,_w)
> res = 0.
> for i in 1:1
> res = cosine_vectorized(_v, _w)
> end
> end
> 
> 
> test1(dtm)
> test2(v,w)
> gc()
> @time test1(dtm)
> gc()
> @time test2(v,w)
> 
> 
> #elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07%
> gc time)
> 
> #elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51%
> gc time)
> 
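
The explicit-loop version Milan describes might look like this for dense
vectors (a sketch only; for the sparse case one would instead walk the
stored entries via rowvals/nonzeros, but the single-pass, allocation-free
accumulation is the same idea):

```julia
function cosine_loop(v::AbstractVector{Float64}, w::AbstractVector{Float64})
    svw = svv = sww = 0.0
    @inbounds for i in eachindex(v, w)
        svw += v[i] * w[i]   # accumulates dot(v, w)
        svv += v[i] * v[i]   # accumulates dot(v, v)
        sww += w[i] * w[i]   # accumulates dot(w, w)
    end
    return svw / sqrt(svv * sww)
end
```

No temporary vectors are allocated, which removes the multi-GB allocation
reported in the timings below.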



[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread Nils Gudat
Hi David,

shouldn't it be @Compat Dict(zip(keys, values)) instead of 
@Compat.Dict(zip(keys, values)), i.e. a space between compat and dict 
rather than a dot method call?

Best,
Nils


Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Tim Holy
All good plans. (I'm not sure about using 65536 bins for 16-bit images, 
though, because that would be more bins than there are pixels in some images. 
Still, it's not all that much memory, really, so maybe that would be OK.)

It would be great to add native support. Presumably you've found the docs on 
adding support for new formats.

For formats that encode large datasets in a single block (like NRRD), you can 
work with GB-sized datasets on a laptop because you can use mmap (I do it 
routinely). But the love of TIFF does demand an alternative solution. 
Presumably we should add a lower-level routine that returns a structure that 
facilitates later access, e.g.,
imds = imdataset("my_image_file")
img = imds["z", 14, "t", 7]
or somesuch.

--Tim

On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote:
> Tim,
> I would like imhist to be idiot-proof (I've been teaching Matlab, and
> nothing puts new people off more than things not being idiot-proof):
> things like using 256 bins by default and returning a plot if no outputs
> are specified (basically make it like Matlab's imthresh()).
> 
> Btw, on matlab using bioformats is actually the slowest part of my
> algorithm, so unless it can be faster in julia native support might be
> nicer. Bioformats also fails in that it reads the whole sequence at once...
> so running things on laptops with even GB-level datasets is impossible. I
> wrote my own version of bfopen to only open the required XYZCT for
> specified series, but that only solves the memory usage.
> 
> the source format for my image was .mvd2 (perkin elmer spinning disk).
> 
> i know about JavaCall.jl just havent had the time to play with it...
> 
> i was thinking it might be fun to attempt native support for a few formats.
> I can also generate test data in a few vendor formats for a few
> microscopes.
> perhaps even make it a julia-box based project. ;)
> 
> On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote:
> > On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote:
> > > Yes, Images does read it okay but only if i cut out the substack. If i
> > > don't, then it interprets the three channels as a time dimension, which
> > > isnt a pain at the moment but will be if i start using it for work.
> > 
> > Hmm, that sounds like an annotation problem.
> > 
> > > I realized that both the convert and the g[:] would slow me down but the
> > > hist function just wouldn't work without that kind of dance. Also,
> > > graythresh (http://www.mathworks.com/help/images/ref/graythresh.html)
> > 
> > uses
> > 
> > > reshape to make it all one image which might also add to speed.
> > > 
> > > The pull request is well and good but personally i would rather have a
> > > dedicated image histogram function like
> > > imhist: http://www.mathworks.com/help/images/ref/imhist.html
> > > which would give histograms based on input images. To me that's the only
> > > way to make life easier. maybe i'll write one :)
> > 
> > imhist is necessary in matlab largely because hist works columnwise; in a
> > sense, Julia's `hist` is like imhist. Is there some specific functionality
> > you're interested in? There's no reason Images can't provide a custom
> > version
> > of `hist`.
> > 
> > > Something about Images: do you think it possible to use the bio formats'
> > > .jar file to import images from a microscope format to Images?
> > > Opening a microscope format image file in the relevant software and then
> > > exporting it as tiff takes too long and i'd rather be able to access the
> > > images directly.
> > 
> > Yes, expansion of Images' I/O capabilities would be great. I've wondered
> > about
> > Bio-Formats myself, but not had a direct need, nor do I know Java (but see
> > JavaCall.jl, if you haven't already).
> > 
> > The other way to go, of course, is Julia native support. Our support for
> > NRRD
> > is a reasonable model of this approach. However, the reason we use
> > ImageMagick
> > is because the reality is that there are a lot of formats out there; Bio-
> > Formats would fill a similar need for vendor-specific file formats. Out of
> > curiousity, what's the original format you're using?
> > 
> > --Tim
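
For reference, the computation this thread is named after — Otsu's method
over a 256-bin histogram — can be sketched in plain Julia as follows (an
illustration under the standard textbook formulation, not the poster's
code or an Images.jl API):

```julia
# counts[k] is the number of pixels with intensity k-1 (256 bins).
# Otsu picks the threshold that maximizes the between-class variance.
function otsu_threshold(counts::Vector{Int})
    total   = sum(counts)
    sum_all = sum((0:length(counts)-1) .* counts)
    w_b   = 0      # background pixel count
    sum_b = 0.0    # background intensity sum
    best_t, best_var = 0, -1.0
    for t in 1:length(counts)-1
        w_b += counts[t]
        w_b == 0 && continue
        w_f = total - w_b
        w_f == 0 && break
        sum_b += (t - 1) * counts[t]
        mu_b = sum_b / w_b
        mu_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mu_b - mu_f)^2
        if between > best_var
            best_var, best_t = between, t - 1
        end
    end
    return best_t
end
```

Only the histogram is scanned, so the cost is independent of image size
once the histogram has been computed.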



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
On Sunday, November 9, 2014 at 11:48 PM -0800, David van Leeuwen wrote:
> Hello, 
> 
> On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote:
> NamedArrays.jl generally goes along this way. However, it
> remains limited in two aspects:
> 
> 
> 1. Some fields in NamedArrays are not declared of specific
> types. In particular, the field `dicts` is of the type
> `Vector{Dict}`, and the use of this field is on the critical
> path when looping over the table, e.g. when counting. This
> would potentially lead to substantial impact on performance.
> 
> 
> In the beginning I experimented with indexing speed, mainly
> to sort out the various forms of getindex(), and although I don't
> remember the exact result, I do remember that I found the drop in
> performance w.r.t. integer indexing surprisingly small. 
> 
> 
> I suppose the problem you indicate can be alleviated by making
> NamedArray parameterized by the type of the key in the dict as well.  
Right. Sounds reasonable.

> 2. Currently, it only accepts a limited set of types for
> indices, e.g. Real and String. But in some cases, people may
> go beyond this. I don't think we have to impose this limit. 
> 
> 
> Ah, I now see what you mean.  I thought I had built in support for
> all types as indices, but there obviously is no catch-all rule in
> getindex.  I suppose NamedArray needs an update there. 
I think the last time I looked into this, it was a problem even for
efficiently indexing AbstractArrays:
https://github.com/JuliaLang/julia/pull/4892#issuecomment-31087910

Slow catch-all methods are good, but if we want specialized versions it
will probably need more work. If you want to accept combinations of
Int/String/Complement{T}/anything, the number of specialized methods to
generate explodes. I think the conclusion was that we needed to wait for
staged functions. Since they are implemented now, it may be a good time
to look into this issue for both AbstractArrays and NamedArrays.


Regards

> On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin
> wrote:
> I have been observing an interesting differences
> between people coming from stats and machine learning.
> 
> 
> Stats people tend to favor the approach that allows
> one to directly use the category names to index the
> table, e.g. A["apple"]. This tendency is clearly
> reflected in the design of R, where one can attach a
> name to everything.
> 
> 
> While in machine learning practice, it is a common
> convention to just encode categories into integers,
> and simply use an ordinary array to represent a
> counting table. Whereas it makes it a little bit
> inconvenient in an interactive environment, this way
> is generally more efficient when you have to deal with
> these categories over a large number of samples.
> 
> 
> These differences aside, I believe, however, that
> there exist a very generic approach to this problem --
> a multi-dimensional associative map, which allows one
> to write A[i1, i2, ...] where the indices can be
> arbitrary hashable & equality-comparable instances,
> including integers, strings, symbols, among many other
> things.
> 
> 
> A multi-dimensional associative map can be considered
> as a multi-dimensional generalization of dictionaries,
> which can be easily implemented via an
> multidimensional array and several dictionaries, each
> for one dimension, to map user-side indexes to integer
> indexes. 
> 
> 
> - Dahua
> 
> 
> 
> 
> 
> 
> 
> On Monday, November 10, 2014 8:12:54 AM UTC+8, David
> van Leeuwen wrote:
> Hi, 
> 
> On Sunday, November 9, 2014 5:10:19 PM UTC+1,
> Milan Bouchet-Valat wrote:
> Actually I didn't do it because
> NamedArrays.jl didn't work well on 0.3
> when I first worked on the package.
> Now I see the tests are still failing.
> Do you know what is needed to make
> them wo
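
Dahua's "multidimensional array plus one dictionary per dimension" design
can be sketched in a few lines (a hypothetical type in current Julia
syntax, not the actual NamedArrays implementation):

```julia
struct NamedTable{T,N}
    data::Array{T,N}
    dicts::NTuple{N,Dict{String,Int}}  # user-side name -> integer index
end

# Translate each name to its integer index, then index the array.
Base.getindex(t::NamedTable{T,N}, names::Vararg{String,N}) where {T,N} =
    t.data[ntuple(i -> t.dicts[i][names[i]], N)...]
```

Parameterizing the dictionary key type as well, as discussed above, would
just add a third type parameter in place of the hard-coded String.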

Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
On Sunday, November 9, 2014 at 23:50, John Myles White wrote:
> FWIW, I think the best way to move forward with NamedArrays is to
> replace NamedArrays with a parametric type Named{T} that wraps around
> other AbstractArray types. That gives you both named Array and named
> DataArray objects for the same cost.
Yeah, looks like a good idea. Duplicating the code for each array type
would be a waste.


Regards


> On Nov 9, 2014, at 5:49 PM, Tim Holy  wrote:
> 
> > Indeed, better to use a Dict if you're naming each row/column. I'd 
> > forgotten 
> > that was part of NamedArrays.
> > 
> > --Tim
> > 
> > On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
> >> On Sunday, November 9, 2014 at 10:54 AM -0600, Tim Holy wrote:
> >>> With regards to arrays with named dimensions, I suspect that with the
> >>> arrival of stagedfunctions, something like NamedAxesArrays
> >>> (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
> >>> stagedfunctions still have some show-stopper bugs, and we need to fix
> >>> those
> >>> first.
> >> 
> >> Interesting package!
> >> 
> >> But when I said "named dimensions", I actually meant that dimensions had
> >> names, but that elements on each dimension (rows, columns...) had names
> >> too. I'm not sure it also makes sense to use staged functions to
> >> specialize code on element names, since they can vary much more than
> >> dimension names. This could generate quite a lot of methods which would
> >> use memory even if only used once.
> >> 
> >> 
> >> Regards
> >> 
> >>> On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
>  On Sunday, November 9, 2014 at 7:52 AM -0800, David van Leeuwen wrote:
> > I would vote for calling such a function `table()`, to get even closer
> > to R's table().
>  
>  Well, that's the debate at
>  https://github.com/JuliaStats/StatsBase.jl/issues/32
>  
>  At first I was in favor of table() too, but now I prefer freqtable(),
>  because "table" could mean any kind of cross-tabulation. I think
>  NamedArray could even be called Table.
>  
> > And I can't wait for such functionality to be included in METADATA...
>  
>  Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
>  when I first worked on the package. Now I see the tests are still
>  failing. Do you know what is needed to make them work?
>  
>  Another point is that I think this deserves going into StatsBase, but
>  before that we need everybody to agree on a design for NamedArrays.
>  
>  Regards
>  
> > On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat
> > 
> > wrote:
> >On Thursday, November 6, 2014 at 11:17 AM -0800, Conrad Stack wrote:
> >> I was also looking for a function like this, but could not
> >> find one in docs.julialang.org.  I was doing this
> >> (v0.4.0-dev), for anyone who is interested:
> >> 
> >> 
> >> example = rand(1:10,100)
> >> uexample = sort(unique(example))
> >> counts = map(x->count(y->x==y,example),uexample)
> >> 
> >> 
> >> It's pretty ugly, so thanks, Johan, for pointing out the
> >> StatsBase->countmap
> > 
> >I've also put together a small package precisely aimed at
> >offering an equivalent of R's table():
> >https://github.com/nalimilan/Tables.jl
> > 
> >But there's a more general issue about how to handle arrays
> >with dimension names in Julia. NamedArrays.jl (which is used
> >in my package) attempts to tackle this issue, but I don't
> >think we've reached a consensus yet about the best solution.
> > 
> > 
> >Regards
> > 
> >> On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids
> >> 
> >> wrote:
> >>I think countmap comes closest to giving you what
> >>you want:
> >> 
> >>using StatsBase
> >>data = sample(["a", "b", "c"], 20)
> >>countmap(data)
> >> 
> >>Dict{ASCIIString,Int64} with 3 entries:
> >>  "c" => 3
> >>  "b" => 10
> >>  "a" => 7
> >> 
> >>On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian
> >> 
> >>Oswald wrote:
> >>Hi
> >> 
> >> 
> >>I'm looking for the best way to count how
> >>many times a certain value x_i appears in
> >>vector x, where x could be integers, floats,
> >>strings. In R I would do table(x). I found
> >>StatsBase.counts(x,k) but I'm a bit confused
> >>by k (where k goes into 1:k, i.e. the vector
> >>is scanned to find how many elements locate
> >>at each point of 1:k). most of the times I
> >>   
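The countmap behaviour discussed in this thread can be sketched in plain Julia. This is a minimal illustration, not the StatsBase implementation, and the function name `countvals` is made up:

```julia
# Count occurrences of each distinct value in a vector, like
# StatsBase.countmap, using a plain Dict.
function countvals(x)
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

countvals(["a", "b", "a", "c", "a"])
# counts: "a" => 3, "b" => 1, "c" => 1
```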

[julia-users] Re: Available packages for compression?

2014-11-10 Thread Robert Feldt
If people want to try Blosc please see this issue for how to build it on 
Julia 0.3.0 (at least on my Mac OS X 10.9):

https://github.com/jakebolewski/Blosc.jl/issues/1

Then one can compare the Zlib and Blosc compressors:

using Zlib
zliblength(str) = length(Zlib.compress(str,9,false,true))
using Blosc
lz4length(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:lz4))
lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:lz4hc))
bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:zlib))

function report(name, func, input)
  tic()
  len = func(input)
  t = toq()
  @printf("%s, time = %.3e seconds, compression ratio = %.3f\n", name, t, 
length(input)/len)
end

for exponent in 1:7
  n = 10^exponent
  input = Uint8[1:n];
  strinput = string(input);
  println("\nInput of length 10^$exponent")
  report("zlib ", (input) -> zliblength(input), input)
  report("lz4hc", (input) -> lz4hclength(input), input)
  report("zlib in blosc", (input) -> bzliblength(input), input)
  report("lz4  ", (input) -> lz4length(input), input)
end

which gives output:

Input of length 10^1
zlib , time = 4.789e-02 seconds, compression ratio = 0.833
lz4hc, time = 3.256e-02 seconds, compression ratio = 0.385
zlib in blosc, time = 3.939e-03 seconds, compression ratio = 0.385
lz4  , time = 3.482e-03 seconds, compression ratio = 0.385

Input of length 10^2
zlib , time = 1.211e-04 seconds, compression ratio = 0.980
lz4hc, time = 1.448e-05 seconds, compression ratio = 0.862
zlib in blosc, time = 3.801e-06 seconds, compression ratio = 0.862
lz4  , time = 3.403e-06 seconds, compression ratio = 0.862

Input of length 10^3
zlib , time = 8.187e-05 seconds, compression ratio = 3.571
lz4hc, time = 1.400e-04 seconds, compression ratio = 3.413
zlib in blosc, time = 5.589e-05 seconds, compression ratio = 3.226
lz4  , time = 1.119e-05 seconds, compression ratio = 3.413

Input of length 10^4
zlib , time = 1.158e-04 seconds, compression ratio = 27.473
lz4hc, time = 4.732e-05 seconds, compression ratio = 30.395
zlib in blosc, time = 1.107e-04 seconds, compression ratio = 25.381
lz4  , time = 6.572e-06 seconds, compression ratio = 30.395

Input of length 10^5
zlib , time = 7.319e-04 seconds, compression ratio = 140.252
lz4hc, time = 2.058e-04 seconds, compression ratio = 146.628
zlib in blosc, time = 6.519e-04 seconds, compression ratio = 134.590
lz4  , time = 2.368e-05 seconds, compression ratio = 146.628

Input of length 10^6
zlib , time = 4.517e-03 seconds, compression ratio = 238.095
lz4hc, time = 2.291e-04 seconds, compression ratio = 237.473
zlib in blosc, time = 4.493e-03 seconds, compression ratio = 236.407
lz4  , time = 6.989e-04 seconds, compression ratio = 198.807

Input of length 10^7
zlib , time = 4.499e-02 seconds, compression ratio = 255.669
lz4hc, time = 3.146e-02 seconds, compression ratio = 246.299
zlib in blosc, time = 1.749e-02 seconds, compression ratio = 247.078
lz4  , time = 5.670e-03 seconds, compression ratio = 200.489

It seems that LZ4HC compression in Blosc is sometimes quite a bit faster 
than zlib, but not always, and its compression ratio is good.
Plain LZ4 is generally the fastest but sometimes compresses a bit less.
For strings shorter than ~350 characters there is not always any 
compression of the input.
Note that the input being compressed here is very regular, so this 
evaluation might be misleading about the compression ratios to expect in 
practice. Treat it as a very rough indication.
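For reference, a simple compress/decompress round trip with Zlib.jl looks like this. This is a sketch against the 0.3-era Zlib.jl API used above; the two-argument `compress` method, `decompress`, and `bytestring` are assumed from that era:

```julia
using Zlib

s = "some highly repetitive string "^100
compressed = Zlib.compress(s, 9)                    # level-9 compression, returns bytes
restored = bytestring(Zlib.decompress(compressed))  # bytes back to a string

@assert restored == s
println(length(s), " -> ", length(compressed), " bytes")
```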

Cheers,

Robert



On Monday, November 10, 2014 at 09:49:54 UTC+1, Robert Feldt wrote:
>
> For a project I need fast string compression accessible from Julia. I have 
> found:
>
> * Gzip.jl, file-based access to gzip compression
>   https://github.com/JuliaLang/GZip.jl
>
> * Zlib.jl, in-memory access to gzip compression
>   https://github.com/dcjones/Zlib.jl
>
> * There has been talk of doing a Julia package for Blosc (blosc.org), 
> and I found this, but I'm not sure it's working:
>   https://github.com/jakebolewski/Blosc.jl
>   https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k
>
> If anyone knows of more/other compression packages useable from Julia, 
> please share in this thread. This way people can get a more up-to-date 
> view. 
> Compression is a basic building block for a lot of different things so 
> good if we have many options in Julia. Would be very nice to have access to 
> liblzma, xz, paq etc, long-term.
>
> If one just needs to estimate the LZ76 complexity there is a pure Julia 
> implementation here:
>
> https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
> but it performs poorly on long strings compared to Zlib, so it is probably 
> not very useful.
>
> Thanks,
>
> Robert Feldt
>


[julia-users] Available packages for compression?

2014-11-10 Thread Robert Feldt
For a project I need fast string compression accessible from Julia. I have 
found:

* Gzip.jl, file-based access to gzip compression
  https://github.com/JuliaLang/GZip.jl

* Zlib.jl, in-memory access to gzip compression
  https://github.com/dcjones/Zlib.jl

* There has been talk of doing a Julia package for Blosc (blosc.org), 
and I found this, but I'm not sure it's working:
  https://github.com/jakebolewski/Blosc.jl
  https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k

If anyone knows of more/other compression packages useable from Julia, 
please share in this thread. This way people can get a more up-to-date 
view. 
Compression is a basic building block for a lot of different things so good 
if we have many options in Julia. Would be very nice to have access to 
liblzma, xz, paq etc, long-term.

If one just needs to estimate the LZ76 complexity there is a pure Julia 
implementation here:
https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
but it performs poorly on long strings compared to Zlib, so it is probably not 
very useful.

Thanks,

Robert Feldt


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Robert Feldt
Has there been any progress on a (stand-alone) Blosc package for Julia? If 
not, I might have time to contribute, since I need a fast compressor for a 
project. If there is any existing code or a starting point, I'd appreciate a 
pointer.

Cheers,

Robert Feldt

On Tuesday, September 2, 2014 at 21:47:33 UTC+2, Douglas Bates wrote:
>
> Would it be reasonable to create a Blosc package or it is best to 
> incorporate it directly into the HDF5 package?  If a separate package is 
> reasonable I could start on it, as I was the one who suggested this in the 
> first place.
>
> On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote:
>>
>> All these testimonials do make it sound promising. Even three-fold 
>> compression 
>> is a pretty big deal. 
>>
>> One disadvantage to compression is that it makes mmap impossible. But, 
>> since 
>> HDF5 supports hyperslabs, that's not as big a deal as it would have been. 
>>
>> --Tim 
>>
>> On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote: 
>> > I've used Blosc in the past with great success.  Oftentimes it is 
>> faster 
>> > than the uncompressed version if IO is the bottleneck.  The compression 
>> > ratios are not great but that is really not the point. 
>> > 
>> > On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote: 
>> > > That looks pretty sweet. It seems to avoid a lot of the pitfalls of 
>> > > naively compressing data files while still getting the benefits. It 
>> would 
>> > > be great to support that in JLD, maybe even turned on by default. 
>> > > 
>> > > 
>> > > On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire > > > 
>> > > > wrote: 
>> > >> Just to hype blosc a little more, see 
>> > >> 
>> > >> http://www.blosc.org/blosc-in-depth.html 
>> > >> 
>> > >> The main feature is that data is chunked so that the compressed 
>> chunk 
>> > >> size fits into L1 cache, and is then decompressed and used there. 
>>  There 
>> > >> are a few more buzzwords (multithreading, simd) in the link above. 
>> Worth 
>> > >> exploring where this might be useful in Julia. 
>> > >> 
>> > >> Cheers, 
>> > >> 
>> > >>   Kevin 
>> > >> 
>> > >> On Tuesday, September 2, 2014, Tim Holy > > 
>> > >> 
>> > >> wrote: 
>> > >>> HDF5/JLD does support compression: 
>> > >>> 
>> > >>> 
>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-w 
>> > >>> riting-data 
>> > >>> 
>> > >>> But it's not turned on by default. Matlab uses compression by 
>> default, 
>> > >>> and 
>> > >>> I've found it's a huge bottleneck in terms of performance 
>> > >>> ( 
>> > >>> 
>> http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files 
>> > >>> -more-quickly). But perhaps there's a good middle ground. It would 
>> take 
>> > >>> someone 
>> > >>> doing a little experimentation to see what the compromises are. 
>> > >>> 
>> > >>> --Tim 
>> > >>> 
>> > >>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote: 
>> > >>> > Now that the JLD format can handle DataFrame objects I would like 
>> to 
>> > >>> 
>> > >>> switch 
>> > >>> 
>> > >>> > from storing data sets in .RData format to .jld format.  Datasets 
>> > >>> 
>> > >>> stored in 
>> > >>> 
>> > >>> > .RData format are compressed after they are written.  The default 
>> > >>> > compression is gzip.  Bzip2 and xz compression are also 
>> available. 
>> > >>> > The 
>> > >>> > compression can make a substantial difference in the file size 
>> because 
>> > >>> 
>> > >>> the 
>> > >>> 
>> > >>> > data values are often highly repetitive. 
>> > >>> > 
>> > >>> > JLD is different in scope in that .jld files can be queried using 
>> > >>> 
>> > >>> external 
>> > >>> 
>> > >>> > programs like h5ls and the files can have new data added or 
>> existing 
>> > >>> 
>> > >>> data 
>> > >>> 
>> > >>> > edited or removed.  The .RData format is an archival format. 
>>  Once the 
>> > >>> 
>> > >>> file 
>> > >>> 
>> > >>> > is written it cannot be modified in place. 
>> > >>> > 
>> > >>> > Given these differences I can appreciate that JLD files are not 
>> > >>> 
>> > >>> compressed. 
>> > >>> 
>> > >>> >  Nevertheless I think it would be useful to adopt a convention in 
>> the 
>> > >>> 
>> > >>> JLD 
>> > >>> 
>> > >>> > module for accessing data from files with a .jld.xz or .jld.7z 
>> > >>> 
>> > >>> extension. 
>> > >>> 
>> > >>> >  It could be as simple as uncompressing the files in a temporary 
>> > >>> 
>> > >>> directory, 
>> > >>> 
>> > >>> > reading then removing, or it could be more sophisticated.  I 
>> notice 
>> > >>> 
>> > >>> that my 
>> > >>> 
>> > >>> > versions of libjulia.so on an Ubuntu 64-bit system are linked 
>> against 
>> > >>> 
>> > >>> both 
>> > >>> 
>> > >>> > libz.so and liblzma.so 
>> > >>> > 
>> > >>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so 
>> > >>> > linux-vdso.so.1 =>  (0x7fff5214f000) 
>> > >>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 
>> (0x7f62932ee000) 
>> > >>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f62930d5000) 
>> > >>> > 

Re: [julia-users] no zero() for DateTime?

2014-11-10 Thread John Myles White
Yes, the use of zero is an anachronism from a design in which zero was used to 
provide a default value for arbitrary types.

 -- John

On Nov 10, 2014, at 8:22 AM, Ivar Nesje  wrote:

> Basically this is an issue with DataFrames using a function in base for a 
> different purpose than its documented intent. zero() has been documented to 
> mean additive identity, and Date and DateTime don't have an additive 
> identity (apart from the period types, but it is unclear which one to return).
> 
> Looking at dataframes, I discovered that they already monkey patch 
> Base.zeros() to make it work for strings.
> 
> I think this is a bigger issue to be discussed in the context of the use case 
> in DataFrames. My two obvious suggestions would be to:
> 1. Change the documentation for zero() to say that it is the additive identity 
> unless it doesn't make sense, in which case any default value is good.
> 2. Create a new function in Base for this specific need of a default value.
> Ivar
> 
> At 03:53:43 UTC+1 on Monday, November 10, 2014, Jacob Quinn wrote:
> Hmmm... I guess we could add 0 and 1 definitions if it'll be generally 
> useful (i.e. Date/DateTime s are ordinals with numeric-like properties, so 
> being able to define zero/one and have them work with generic functions).
> 
> It still just seems a little weird because there's not a real solid 
> reasoning/meaning. I think one reason a lot of other languages define a 
> zero(::DateTime) is because values can be "truthy" or "falsey", so you would 
> compare a date with zero(::DateTime) to check for falseness. In Julia, you 
> have to use explicit Booleans, so that's not as important a reason.
> 
> Happy to hear other opinions/use cases from people though.
> 
> -Jacob
> 
> On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert  wrote:
> To your first question, I'm sure there are good reasons for not having zeros 
> in the Date and Time types, but in other languages (i.e., stata), dates and 
> times are stored as integers or floats with respect to some reference time.  
> So, I *think* the 0-date in stata refers to January 1, 1960.  Obviously this 
> is fairly arbitrary, but there is some precedence for it in other languages.
> 
> On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote:
> What Date would represent zero(::Date)? Or one(::Date), for that matter? 
> Doesn't seem like a particularly useful definition. What's the use case?
> 
> On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert  wrote:
> I'm using Dates.jl on 0.3 and have discovered that there is no zero defined 
> for the Date or DateTime types.  Is this intentional?
> 
> 
> 
> 



Re: [julia-users] no zero() for DateTime?

2014-11-10 Thread Ivar Nesje
Basically this is an issue with DataFrames using a function in base for a 
different purpose than its documented intent. zero() has been documented to 
mean additive identity, and Date and DateTime don't have an additive 
identity (apart from the period types, but it is unclear which one to return).

Looking at DataFrames, I discovered that they already monkey patch 
Base.zeros() to make it work for strings.

I think this is a bigger issue to be discussed in the context of the use 
case in DataFrames. My two obvious suggestions would be to:

   1. Change the documentation for zero() to say that it is the additive 
   identity unless it doesn't make sense, in which case any default value is 
   good.
   2. Create a new function in Base for this specific need of a default 
   value.

Ivar
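Suggestion 2 could be sketched as a small generic function. The name `defaultval` and its methods are hypothetical, purely to illustrate the idea:

```julia
# Hypothetical sketch: a default-value function distinct from zero(), so
# types without an additive identity (like Date) could still supply one.
defaultval(::Type{Int}) = 0            # for numbers, reuse the additive identity
defaultval(::Type{Float64}) = 0.0
defaultval(T::Type) = error("no default value defined for $T")
# A Date method could then pick an arbitrary epoch, e.g. Date(1970, 1, 1),
# without claiming that it is an additive identity.
```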

At 03:53:43 UTC+1 on Monday, November 10, 2014, Jacob Quinn wrote:
>
> Hmmm... I guess we could add 0 and 1 definitions if it'll be generally 
> useful (i.e. Date/DateTime s are ordinals with numeric-like properties, so 
> being able to define zero/one and have them work with generic functions).
>
> It still just seems a little weird because there's not a real solid 
> reasoning/meaning. I think one reason a lot of other languages define a 
> zero(::DateTime) is because values can be "truthy" or "falsey", so you 
> would compare a date with zero(::DateTime) to check for falseness. In 
> Julia, you have to use explicit Booleans, so that's not as important a 
> reason.
>
> Happy to hear other opinions/use cases from people though.
>
> -Jacob
>
> On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert wrote:
>
>> To your first question, I'm sure there are good reasons for not having 
>> zeros in the Date and Time types, but in other languages (i.e., stata), 
>> dates and times are stored as integers or floats with respect to some 
>> reference time.  So, I *think* the 0-date in stata refers to January 1, 
>> 1960.  Obviously this is fairly arbitrary, but there is some precedence for 
>> it in other languages.
>>
>> On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote:
>>>
>>> What Date would represent zero(::Date)? Or one(::Date), for that matter? 
>>> Doesn't seem like a particularly useful definition. What's the use case?
>>>
>>> On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert  
>>> wrote:
>>>
 I'm using Dates.jl on 0.3 and have discovered that there is no zero 
 defined for the Date or DateTime types.  Is this intentional?



>>>
>

[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread David van Leeuwen
Hello, 

I didn't realize NamedArrays was broken on release-0.3, because of my lack 
of travis skills.  I had a different 0.4 incompatibility: 
"(Dict{K,V})(ks::AbstractArray{K},vs::AbstractArray{V}) 
is deprecated, use (Dict{K,V})(zip(ks,vs)) instead".  Foolishly I replaced 
my construct

Dict(keys, values)

by 

@Compat.dict(zip(keys, values))

but that breaks on release-0.3. 

Is there a recommended way to solve this incompatibility?

Cheers, 

---david
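One workaround is to branch on VERSION so each release uses the constructor it supports. This is a sketch, and `makedict` is a made-up helper name, not a Compat API:

```julia
# Build a Dict from parallel key/value arrays on both Julia 0.3 and 0.4.
if VERSION < v"0.4.0-"
    makedict(ks, vs) = Dict(ks, vs)        # 0.3 constructor, deprecated in 0.4
else
    makedict(ks, vs) = Dict(zip(ks, vs))   # 0.4 replacement
end

makedict([:a, :b], [1, 2])   # Dict(:a => 1, :b => 2) on either version
```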



On Saturday, October 11, 2014 8:17:38 PM UTC+2, Stefan Karpinski wrote:
>
> This announcement is primarily for Julia package developers. Since there 
> is already some syntax breakage between Julia v0.3 and v0.4, and there will 
> be more, it's increasingly tricky to make packages to work on both 
> versions. The Compat package  was 
> just created to help: it provides compatibility constructs that will work 
> in both versions without warnings.
>
> For example, in v0.3 you could create a dictionary like this:
>
> julia> [ :foo => 1, :bar => 2 ]
> Dict{Symbol,Int64} with 2 entries:
>   :bar => 2
>   :foo => 1
>
>
> This still works in v0.4 but it produces a warning. The new syntax is this:
>
> julia> Dict(:foo => 1, :bar => 2)
> Dict{Symbol,Int64} with 2 entries:
>   :bar => 2
>   :foo => 1
>
>
> However, this newer syntax won't work in v0.3, so you're a bit stuck if 
> you want to write a dictionary literal in a way that will work in both v0.3 
> and v0.4 without producing a warning. Compat to the rescue!:
>
> julia> using Compat
>
> julia> @Compat.Dict(:foo => 2, :bar => 2)
> Dict{Symbol,Int64} with 2 entries:
>   :bar => 2
>   :foo => 2
>
>
> This works with no warning on both v0.3 and v0.4. We've intentionally not 
> exported the Dict macro so that the usage needs to be prefixed with 
> "Compat.", which will make usages of the compatibility workarounds easier 
> to find and remove later when they're no longer necessary.
>
> Currently, there's only a couple of definitions in the Compat package, but 
> if you have your own hacks that have helped make it easier to write 
> cross-version package code, please contribute them and we can build up a 
> nice little collection.
>


[julia-users] Re: Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Todd Leo
I tested it again with @time test2(dtm'[:,1], dtm'[:,2]) and it took only 
0.013seconds. I also checked @time test2(v,w) and it resulted similar time. 
I changed nothing, it was odd.

On Monday, November 10, 2014 3:28:10 PM UTC+8, Daniel Høegh wrote:
>
> I have made a minimum test case:
> a=rand(1,2)
> function newsum(a)
> for i in 1:100
> sum(a[:,1])+sum(a[:,2])
> end
> end
> function newsum(a1,a2)
> for i in 1:100
> sum(a1)+sum(a2)
> end
> end
> @time newsum(a)
> @time newsum(a[:,1],a[:,2])
> elapsed time: 0.073095574 seconds (17709844 bytes allocated, 23.23% gc 
> time)
> elapsed time: 0.006946504 seconds (244796 bytes allocated)
>
> I suspect that a[:,1] makes a copy of the data in the a matrix. This copy is 
> made in each iteration of the first function, but in the second function it 
> is made only once, when the function is called as: 
> newsum(a[:,1],a[:,2]).
>
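One way to avoid those per-iteration copies is to take a view of each column instead of a slice. In Julia 0.3 this meant `sub` (or the ArrayViews.jl package); this is a sketch using that era's `sub`:

```julia
a = rand(1000, 2)

function newsum_view(a)
    # sub shares the parent array's memory instead of copying the column,
    # so the columns are extracted once and no per-iteration copies are made
    a1 = sub(a, :, 1)
    a2 = sub(a, :, 2)
    s = 0.0
    for i in 1:100
        s += sum(a1) + sum(a2)
    end
    s
end

newsum_view(a)
```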