[julia-users] ANN: Glove.jl
Hey all, I started this a couple of months back, but I ran into a couple of issues and put it on the back burner. I worked on it these last couple of days and I've gotten it to a usable state. https://github.com/domluna/Glove.jl I haven't done anything to make it parallel yet; that would be the next big performance win. Any tips or contributions to make it better would be more than welcome :)
[julia-users] Re: ANN: Glove.jl
Thanks Kevin. This slipped my mind. GloVe ("Glove" is just easier to type) stands for Global Vectors (for word representation). The package implements the algorithm described at http://nlp.stanford.edu/projects/glove/. The idea is to represent each word as a vector of floats that captures word similarities. For example, "king - man + woman = queen" (operating on the word vectors). The other popular implementation is word2vec, https://code.google.com/p/word2vec/.
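For readers skimming the archive, the word-vector arithmetic can be illustrated with a toy sketch. The 3-d vectors below are hand-picked, not real GloVe output, and `cosine`/`nearest` are made-up helper names; real embeddings are learned from co-occurrence statistics.

```julia
using LinearAlgebra

# Toy "embeddings": hand-picked 3-d vectors, chosen so the analogy works.
vectors = Dict(
    "king"  => [0.8, 0.9, 0.1],
    "man"   => [0.7, 0.1, 0.1],
    "woman" => [0.7, 0.1, 0.9],
    "queen" => [0.8, 0.9, 0.9],
)

# Cosine similarity between two vectors.
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

# Find the vocabulary word whose vector is closest to `target`.
function nearest(vectors, target; exclude=String[])
    best, bestsim = "", -Inf
    for (word, v) in vectors
        word in exclude && continue   # skip the query word(s)
        s = cosine(target, v)
        if s > bestsim
            best, bestsim = word, s
        end
    end
    return best
end

# king - man + woman lands closest to queen for these toy vectors.
target = vectors["king"] - vectors["man"] + vectors["woman"]
println(nearest(vectors, target, exclude=["king"]))  # queen
```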
[julia-users] Re: Julia Summer of Code
I'd be interested in bridging Julia and Torch. I believe this has been thought about before: https://github.com/soumith/NeuralNetworks.jl. What are the challenges to starting some work on this? If not, I'd like to work on the JuliaWeb project. I was watching the GoSF meeting last night via live stream, and in Go 1.5 (master) shared libraries are available. This got me thinking it would be cool to interface between Julia and Go. Go is known for its server capabilities, so leveraging them could be very useful. I realize this sounds a bit crazy; it's an idea I've had for a little while now that just perhaps became viable. Go runs on a boatload of architectures, so that shouldn't be a problem either. On Friday, May 15, 2015 at 1:57:24 PM UTC-4, Viral Shah wrote: Folks, The Moore Foundation is generously funding us to allow for 6-8 Julia Summer of Code projects. Details will be published soon, but if you are interested, please mark your calendars and plan your projects. -viral
[julia-users] Avoid memory allocations when reading from matrices
Reposting this from the Gitter chat since this list seems more active. I'm writing a GloVe module to learn Julia. How can I avoid memory allocations? My main function does a lot of random indexing into matrices:

A[i, :] = 0.5 * B[i, :]

In this case i isn't from a linear sequence; I'm not sure that matters. Anyway, I've done some analysis and I know B[i, :] is the issue here, since it creates a copy (https://github.com/JuliaLang/julia/blob/master/base/array.jl#L309 makes the copy). I tried to do it via a loop, but it looks like that doesn't help either. In fact, it seems to allocate slightly more memory, which seems really odd. Here's some of the code; it's a little messy since I'm commenting out the different approaches I'm trying:

type Model{T}
    W_main::Matrix{T}
    W_ctx::Matrix{T}
    b_main::Vector{T}
    b_ctx::Vector{T}
    W_main_grad::Matrix{T}
    W_ctx_grad::Matrix{T}
    b_main_grad::Vector{T}
    b_ctx_grad::Vector{T}
    covec::Vector{Cooccurence}
end

# Each vocab word is associated with a main vector and a context vector.
# The paper initializes these to values in [-0.5, 0.5] / (vecsize + 1) and
# the gradients to 1.0.
#
# The +1 term is for the bias.
function Model(comatrix; vecsize=100)
    vs = size(comatrix, 1)
    Model(
        (rand(vecsize, vs) - 0.5) / (vecsize + 1),
        (rand(vecsize, vs) - 0.5) / (vecsize + 1),
        (rand(vs) - 0.5) / (vecsize + 1),
        (rand(vs) - 0.5) / (vecsize + 1),
        ones(vecsize, vs),
        ones(vecsize, vs),
        ones(vs),
        ones(vs),
        CoVector(comatrix), # not required in 0.4
    )
end

# TODO: figure out memory issue
# the memory comments are from a 500-loop test with vecsize=100
function train!(m::Model, s::Adagrad; xmax=100, alpha=0.75)
    J = 0.0
    shuffle!(m.covec)
    vecsize = size(m.W_main, 1)
    eltype = typeof(m.b_main[1])
    vm = zeros(eltype, vecsize)
    vc = zeros(eltype, vecsize)
    grad_main = zeros(eltype, vecsize)
    grad_ctx = zeros(eltype, vecsize)
    for n = 1:s.niter
        # shuffle indices
        for i = 1:length(m.covec)
            @inbounds l1 = m.covec[i].i # main index
            @inbounds l2 = m.covec[i].j # context index
            @inbounds v = m.covec[i].v

            vm[:] = m.W_main[:, l1]
            vc[:] = m.W_ctx[:, l2]

            diff = dot(vec(vm), vec(vc)) + m.b_main[l1] + m.b_ctx[l2] - log(v)
            fdiff = ifelse(v < xmax, (v / xmax) ^ alpha, 1.0) * diff
            J += 0.5 * fdiff * diff

            fdiff *= s.lrate

            # inc memory by ~200 MB, running time by 2x
            grad_main[:] = fdiff * m.W_ctx[:, l2]
            grad_ctx[:] = fdiff * m.W_main[:, l1]

            # Adaptive learning
            # inc ~600 MB + 0.75s
            #= @inbounds for ii = 1:vecsize =#
            #=     m.W_main[ii, l1] -= grad_main[ii] / sqrt(m.W_main_grad[ii, l1]) =#
            #=     m.W_ctx[ii, l2] -= grad_ctx[ii] / sqrt(m.W_ctx_grad[ii, l2]) =#
            #=     m.b_main[l1] -= fdiff ./ sqrt(m.b_main_grad[l1]) =#
            #=     m.b_ctx[l2] -= fdiff ./ sqrt(m.b_ctx_grad[l2]) =#
            #= end =#
            m.W_main[:, l1] -= grad_main ./ sqrt(m.W_main_grad[:, l1])
            m.W_ctx[:, l2] -= grad_ctx ./ sqrt(m.W_ctx_grad[:, l2])
            m.b_main[l1] -= fdiff ./ sqrt(m.b_main_grad[l1])
            m.b_ctx[l2] -= fdiff ./ sqrt(m.b_ctx_grad[l2])

            # Gradients
            fdiff *= fdiff
            m.W_main_grad[:, l1] += grad_main .^ 2
            m.W_ctx_grad[:, l2] += grad_ctx .^ 2
            m.b_main_grad[l1] += fdiff
            m.b_ctx_grad[l2] += fdiff
        end

        #= if n % 10 == 0 =#
        #=     println("iteration $n, cost $J") =#
        #= end =#
    end
end

Here's the entire repo:
https://github.com/domluna/GloVe.jl. It might be helpful. I tried doing some loops, but that allocates more memory (oddly enough) and gets slower. You'll notice the word vectors are indexed by column; I changed the representation to that to see if it would make a difference during the loop. It didn't seem to. The memory analysis was run on:

Julia Version 0.4.0-dev+4893
Commit eb5da26* (2015-05-19 11:51 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i5-2557M CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Here the model consists of 100x19 matrices and 100-element vectors: 19 words in the vocab, 100-element word vectors.

@time GloVe.train!(model, GloVe.Adagrad(500))
1.990 seconds (6383 k allocations: 1162 MB, 10.82% gc time)

0.3 is a bit slower due to worse gc, but same memory. Any help would be greatly appreciated!
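For readers hitting the same allocation issue, here is a minimal sketch of the difference between a slice copy and an in-place loop. It's written in current Julia syntax (the post above targets 0.3/0.4, where range indexing also copied); `slice_update!` and `loop_update!` are made-up names for illustration.

```julia
# Updating a row via a slice on the right-hand side allocates a
# temporary copy of that row before the assignment happens.
function slice_update!(A, B, i)
    A[i, :] = 0.5 * B[i, :]   # B[i, :] allocates a temporary row copy
    return A
end

# An explicit loop (or a view) writes element by element with no
# temporaries, which is what the allocation profile above is missing.
function loop_update!(A, B, i)
    @inbounds for j in 1:size(A, 2)
        A[i, j] = 0.5 * B[i, j]
    end
    return A
end

A = zeros(3, 4)
B = ones(3, 4)
slice_update!(A, B, 1)   # row 1 becomes 0.5, with a temporary
loop_update!(A, B, 2)    # row 2 becomes 0.5, allocation-free
```

Keeping the hot loop inside a function (rather than at global scope) also matters, since globals defeat type inference and force boxing on every access.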
[julia-users] Questions relating to packages and using/creating them
I have some general questions about using packages. 1. Is there a way to create a workspace separate from $HOME/.julia? It would still have the same functionality when calling using in the REPL. 2. What's the best practice for packages with the same name? I don't have a problem related to this; I'm just curious how it's handled. I think via Pkg.add(...) there's only one definition of any package name, but with Pkg.clone(...) I could see package name collisions. Having all the packages under one directory doesn't seem scalable to me. thanks
[julia-users] Gadfly plotting multiple lines with different colours
So I have a vector x and a matrix M. The vector x consists of the points on the x-axis, and the columns of M are the respective y-coordinates for each line. I'm currently plotting all the lines using layers:

layers = Layer[]
for i = 1:10
    push!(layers, layer(x=x, y=M[:,i], Geom.line)[1])
end
plot(layers)

This gives me essentially what I want, except all the lines are the same colour. What's the best way to have Gadfly uniquely colour each line? Also, if the above layering approach isn't the best way to do what I'm trying to do, please let me know. Thanks.
Re: [julia-users] What's with the Nothing type?
Thanks for all the helpful messages everyone, much appreciated:)
[julia-users] What's with the Nothing type?
I'm just generally curious about the Nothing type: is there anything one should particularly know about it? Is it similar to a None type that one would typically find in pattern matching, e.g. an Option type where a value is either Something or None? I feel like Nothing and the patterns for its use aren't well documented at this point. Dom
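For anyone reading this thread later, a quick sketch of how `nothing` behaves in current Julia, where the singleton's type is spelled `Nothing` and `Union{T, Nothing}` plays the Option role (the exact names shifted between 0.3, 0.4, and 1.0). The `describe` helper is a made-up name for illustration.

```julia
# A function with no explicit return value implicitly returns `nothing`.
function greet(io, name)
    println(io, "hello, $name")
end

result = greet(devnull, "world")
result === nothing        # true: the implicit return value
typeof(result)            # Nothing, the singleton type of `nothing`

# Union{T, Nothing} is the usual Option-like pattern; e.g. findfirst
# returns an index or `nothing` when there is no match.
idx  = findfirst(==(3), [1, 2, 3])   # 3
miss = findfirst(==(9), [1, 2, 3])   # nothing

# Handle the "None" case explicitly with a === comparison.
describe(x::Union{Int, Nothing}) = x === nothing ? "none" : "found at $x"
```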
Re: [julia-users] What's with the Nothing type?
That cleared it up, thanks! Do functions that don't explicitly return anything then implicitly return Nothing? Sorry I didn't catch the FAQ section; might it be better to have that as a short section in the types chapter? Dom On Friday, May 23, 2014 3:19:35 PM UTC-4, Stefan Karpinski wrote: This FAQ entry may answer the question: http://docs.julialang.org/en/latest/manual/faq/#nothingness-and-missing-values If not, maybe we can expand it or clarify whatever's not clear. On Fri, May 23, 2014 at 2:40 PM, Dom Luna dlun...@gmail.com wrote: I just have general curiosity about the Nothing type, is there anything one should particularly know about it? Is it similar to a None type that one would typically find in pattern matching, ex. an Option type where it can be either Something or None, etc. I feel like Nothing and patterns for its use aren't well documented to this point. Dom
Re: [julia-users] Re: Downloaded binary startup way slower than when compiled from github
Just downloaded it again today to try it out, and the binary now has the same startup time as the source build. The version of darwin in the binary is still 12.5.0 vs 13.1.0 from source. I have no idea whether that's an issue, but the startup time is fine now. Thanks Elliot. Dom On Saturday, May 17, 2014 5:09:42 PM UTC-4, Elliot Saba wrote: Yep, we used to do this on purpose, since we didn't have a good way of restricting the optimizations used by the compiler. Now we've got a good baseline set, and the nightlies needed their configurations to be matched. New binaries should be up by tonight. -E On Sat, May 17, 2014 at 11:34 AM, Tobias Knopp tobias...@googlemail.com wrote: It seems that the compiled system image is not included in the prerelease binaries. On Saturday, May 17, 2014 20:23:46 UTC+2, Dom Luna wrote: I find it weird that the downloaded binary has a drastically slower REPL startup than one compiled from the github repo.

$ where julia
/Applications/Julia-0.3.0-prerelease-0b05b21911.app/Contents/Resources/julia/bin/julia
/usr/local/bin/julia

I'm symlinking $HOME/julia/julia to /usr/local/bin/julia. Here are the startup times:

Downloaded: time julia -e 'println("Helo")'
5.07s user 0.10s system 98% cpu 5.250 total

Source: time /usr/local/bin/julia -e 'println("Helo")'
0.28s user 0.08s system 117% cpu 0.308 total

The versions are 1 day apart.
Downloaded (startup banner trimmed to the version lines):
Version 0.3.0-prerelease+3053 (2014-05-14 22:03 UTC)
Commit 0b05b21* (2 days old master)
x86_64-apple-darwin12.5.0

Source:
Version 0.3.0-prerelease+3081 (2014-05-16 15:12 UTC)
Commit eb4bfcc (1 day old master)
x86_64-apple-darwin13.1.0

The main thing I notice is apple-darwin12.5.0 vs apple-darwin13.1.0. I'm not sure what that means. I'm on OS X 10.9.2. Dom
[julia-users] Downloaded binary startup way slower than when compiled from github
I find it weird that the downloaded binary has a drastically slower REPL startup than one compiled from the github repo.

$ where julia
/Applications/Julia-0.3.0-prerelease-0b05b21911.app/Contents/Resources/julia/bin/julia
/usr/local/bin/julia

I'm symlinking $HOME/julia/julia to /usr/local/bin/julia. Here are the startup times:

Downloaded: time julia -e 'println("Helo")'
5.07s user 0.10s system 98% cpu 5.250 total

Source: time /usr/local/bin/julia -e 'println("Helo")'
0.28s user 0.08s system 117% cpu 0.308 total

The versions are 1 day apart.

Downloaded (startup banner trimmed to the version lines):
Version 0.3.0-prerelease+3053 (2014-05-14 22:03 UTC)
Commit 0b05b21* (2 days old master)
x86_64-apple-darwin12.5.0

Source:
Version 0.3.0-prerelease+3081 (2014-05-16 15:12 UTC)
Commit eb4bfcc (1 day old master)
x86_64-apple-darwin13.1.0

The main thing I notice is apple-darwin12.5.0 vs apple-darwin13.1.0. I'm not sure what that means. I'm on OS X 10.9.2. Dom
Re: [julia-users] Interfaces like in Go
Yeah, abstract types seem to be the best place to implement something like this, since (at least to my knowledge) it wouldn't fundamentally break anything. You would still be able to define abstract types as-is, but you would also have the added power to further refine the behaviour of a type. On Thursday, March 27, 2014 4:53:26 AM UTC-4, Tobias Knopp wrote: In my opinion it would be worth adding some syntax for defining an interface for abstract types. It should give us nice error messages and a clean way to document an interface. This is quite similar to C++ concepts, but as it is already possible to restrict the template parameter in Julia, the only missing thing is to define the interface for an abstract type.
[julia-users] Interfaces like in Go
I really like how interfaces work in Go, and I was wondering whether there's a similar way to express this in Julia. For those unfamiliar with Go interfaces: they're essentially types defined by behaviour (methods) rather than by fields. So, for example, the ReadWriter interface requires the Read and Write methods, and any other type that has Read and Write methods can be used as a ReadWriter. Dom
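Julia has no interface construct, but the duck-typed analogue of the ReadWriter idea can be sketched with an abstract type plus generic functions that program against an informal method set. This is a sketch in current Julia syntax; `Buffer`, `read_item`, `write_item`, and `roundtrip` are made-up names, and nothing enforces the "interface" beyond a MethodError at call time.

```julia
# The abstract type names the interface; its "contract" is just the
# set of methods (read_item, write_item) that generic code calls.
abstract type ReadWriter end

# A concrete subtype that satisfies the contract with a FIFO of strings.
struct Buffer <: ReadWriter
    data::Vector{String}
end
Buffer() = Buffer(String[])

read_item(b::Buffer)             = isempty(b.data) ? nothing : popfirst!(b.data)
write_item(b::Buffer, s::String) = (push!(b.data, s); b)

# Generic code is written against ReadWriter, not Buffer; any subtype
# defining the two methods works here, Go-interface style.
function roundtrip(rw::ReadWriter, s::String)
    write_item(rw, s)
    return read_item(rw)
end

roundtrip(Buffer(), "hello")   # "hello"
```

The difference from Go is that conformance is checked only when a method is actually called, which is why threads like this one ask for declared interfaces with better error messages.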