[julia-users] mktemp doesn't accept anonymous function?
Am I doing something wrong or is this a bug? It seems like if the 1st version works, the 2nd and 3rd should, too.

julia> mktemp(println)
/var/folders/jd/1skd5rh11hnc_s19lmx93zywgp/T/tmpf7HaUHIOStream()

julia> mktemp(x->println(x))
ERROR: wrong number of arguments
 in anonymous at none:1
 in mktemp at file.jl:218
 in mktemp at file.jl:216

julia> mktemp() do x
           println(x)
       end
ERROR: wrong number of arguments
 in anonymous at none:2
 in mktemp at file.jl:218
 in mktemp at file.jl:216

julia> versioninfo()
Julia Version 0.4.7
Commit ae26b25* (2016-09-18 16:17 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
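For what it's worth, the "wrong number of arguments" error is consistent with mktemp(f) invoking its callback with *two* arguments (the path and an open IO handle), so a one-argument anonymous function fails while println happens to work because it accepts any number of arguments. A sketch of the two-argument form, assuming the callback convention documented in later Julia versions:

```julia
# mktemp(f) calls f(path, io), so accept both arguments in the do-block.
mktemp() do path, io
    println(path)        # path of the temporary file
    write(io, "hello")   # io is an open handle; the file is removed afterwards
end
```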
[julia-users] Re: Regex(...) vs. r"..."?
Never mind, I figured out it was an escaping issue. This is all fine if you do Regex("\\d\\d").

On Monday, February 22, 2016 at 5:27:17 PM UTC-8, John Brock wrote:
>
> regex_from_macro = r"\d\d"
> regex_from_function = Regex("\d\d")
>
> julia> regex_from_macro("45")
> true
>
> julia> regex_from_function("45")
> false
>
> julia> regex_from_macro("5")
> false
>
> julia> regex_from_function("5")
> false
>
> Why does calling r"\d\d" result in different behavior than Regex("\d\d")?
> I would have expected the behavior to be identical, but it looks like a call
> with Regex(...) always returns false, while r"..." matches as expected.
>
> This is with Julia v0.4.2.
[julia-users] Regex(...) vs. r"..."?
regex_from_macro = r"\d\d"
regex_from_function = Regex("\d\d")

julia> regex_from_macro("45")
true

julia> regex_from_function("45")
false

julia> regex_from_macro("5")
false

julia> regex_from_function("5")
false

Why does calling r"\d\d" result in different behavior than Regex("\d\d")? I would have expected the behavior to be identical, but it looks like a call with Regex(...) always returns false, while r"..." matches as expected.

This is with Julia v0.4.2.
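As the follow-up in this thread notes, the difference is string escaping: in an ordinary string literal the backslash is consumed before Regex ever sees the pattern, so it must be doubled. A quick sketch (using occursin, the Julia 1.x matching function; ismatch on 0.x):

```julia
# In a plain string, "\\d" is needed to produce the two characters \d;
# the r"..." macro takes its contents literally, with no escape pass.
pat_macro    = r"\d\d"
pat_function = Regex("\\d\\d")

pat_macro == pat_function      # true: same pattern, same flags
occursin(pat_function, "45")   # true
occursin(pat_function, "5")    # false
```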
[julia-users] Functions inside pmap call can't see SharedArrays unless preceded by @async
I don't know if this is a feature or a bug. Functions used inside a pmap call can only see SharedArrays when the pmap call is preceded by @async. To replicate, be sure to start julia with more than one process (e.g., `julia -p 2`):

julia> foo = convert(SharedArray, [1,2,3,4]);

julia> @async pmap(i->println(foo), 1:2)
Task (waiting) @0x00010cd8f730

julia>  From worker 2:  [1,2,3,4]
        From worker 3:  [1,2,3,4]

julia> pmap(i->println(foo), 1:2)
2-element Array{Any,1}:
 RemoteException(2,CapturedException(UndefVarError(:foo),Any[(:anonymous,:none,1,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol(""),-1,1),(:run_work_thunk,symbol("multi.jl"),645,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol("task.jl"),63,1)]))
 RemoteException(3,CapturedException(UndefVarError(:foo),Any[(:anonymous,:none,1,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol(""),-1,1),(:run_work_thunk,symbol("multi.jl"),645,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol("task.jl"),63,1)]))

julia> versioninfo()
Julia Version 0.4.2
Commit bb73f34 (2015-12-06 21:47 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.0.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

I can also replicate this on Julia v0.3.9.
[julia-users] Possible bug with SharedArray and pmap?
To replicate, be sure to start julia with more than one process (e.g., `julia -p 2`):

julia> foo = convert(SharedArray, [1,2,3,4]);

julia> @async pmap(i->println(foo), 1:2)
Task (waiting) @0x00010cd8f730

julia>  From worker 2:  [1,2,3,4]
        From worker 3:  [1,2,3,4]

julia> pmap(i->println(foo), 1:2)
2-element Array{Any,1}:
 RemoteException(2,CapturedException(UndefVarError(:foo),Any[(:anonymous,:none,1,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol(""),-1,1),(:run_work_thunk,symbol("multi.jl"),645,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol("task.jl"),63,1)]))
 RemoteException(3,CapturedException(UndefVarError(:foo),Any[(:anonymous,:none,1,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol(""),-1,1),(:run_work_thunk,symbol("multi.jl"),645,symbol(""),-1,1),(:anonymous,symbol("multi.jl"),907,symbol("task.jl"),63,1)]))
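A plausible explanation (my assumption, not confirmed in the thread): the UndefVarError suggests the anonymous function refers to foo as a *global*, and globals defined on the master aren't automatically defined on workers. One workaround sketch is to close over a local binding via `let`, so the value is serialized along with the closure instead of looked up as a global on the worker:

```julia
# Workaround sketch: capture the array in a local binding so the
# closure ships the value to workers with itself.
# (`using Distributed` is the Julia 1.x spelling; pmap is in Base on 0.x.)
using Distributed

foo = [1, 2, 3, 4]
results = let data = foo
    pmap(i -> sum(data) + i, 1:2)   # each call sees data == [1,2,3,4]
end
# results == [11, 12]
```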
Re: [julia-users] Re: Large Data Sets in Julia
Thanks, Tomas, but I wasn't referring to the keyword arguments. One signature starts with a filename argument, and the other doesn't. What's the difference? Is the filename specifying the location at which to create a memory-mapped file?

On Wednesday, November 11, 2015 at 3:33:56 AM UTC-8, Tomas Lycken wrote:
>
> Everything after the semicolon is keyword arguments, and will dispatch to
> the same method as if they are left out. Thus, the documentation for
> SharedArray(T, dims; init=false, pids=[]) is valid for SharedArray(T, dims)
> too, and the values of init and pids will be the ones given in the signature.
>
> // T
>
> On Monday, November 9, 2015 at 9:43:17 PM UTC+1, John Brock wrote:
>>
>> It looks like SharedArray(filename, T, dims) isn't documented,
>> but SharedArray(T, dims; init=false, pids=Int[]) is. What's the difference?
>>
>> On Friday, November 6, 2015 at 2:21:01 AM UTC-8, Tim Holy wrote:
>>>
>>> Not sure if it's as high-level as you're hoping for, but julia has great
>>> support for arrays that are much bigger than memory. See Mmap.mmap and
>>> SharedArray(filename, T, dims).
>>>
>>> --Tim
>>>
>>> On Thursday, November 05, 2015 06:33:52 PM André Lage wrote:
>>> > hi Viral,
>>> >
>>> > Do you have any news on this?
>>> >
>>> > André Lage.
>>> >
>>> > On Wednesday, July 3, 2013 at 5:12:06 AM UTC-3, Viral Shah wrote:
>>> > > Hi all,
>>> > >
>>> > > I am cross-posting my reply to julia-stats and julia-users as there was a
>>> > > separate post on large logistic regressions on julia-users too.
>>> > >
>>> > > Just as these questions came up, Tanmay and I have been chatting about a
>>> > > general framework for working on problems that are too large to fit in
>>> > > memory, or need parallelism for performance. The idea is simple and based
>>> > > on providing a convenient and generic way to break up a problem into
>>> > > subproblems, each of which can then be scheduled to run anywhere. To start
>>> > > with, we will implement a map and mapreduce using this, and we hope that it
>>> > > should be able to handle large files sequentially, distributed data
>>> > > in-memory, and distributed filesystems within the same framework. Of
>>> > > course, this all sounds too good to be true. We are trying out a simple
>>> > > implementation, and if early results are promising, we can have a detailed
>>> > > discussion on API design and implementation.
>>> > >
>>> > > Doug, I would love to see if we can use some of this work to parallelize
>>> > > GLM at a higher level than using remotecall and fetch.
>>> > >
>>> > > -viral
>>> > >
>>> > > On Tuesday, July 2, 2013 11:10:35 PM UTC+5:30, Douglas Bates wrote:
>>> > >> On Tuesday, July 2, 2013 6:26:33 AM UTC-5, Raj DG wrote:
>>> > >>> Hi all,
>>> > >>>
>>> > >>> I am a regular user of R and also use it for handling very large data
>>> > >>> sets (~ 50 GB). We have enough RAM to fit all that data into memory for
>>> > >>> processing, so don't really need to do anything additional to chunk, etc.
>>> > >>>
>>> > >>> I wanted to get an idea of whether anyone has, in practice, performed
>>> > >>> analysis on large data sets using Julia. Use cases range from performing
>>> > >>> Cox Regression on ~ 40 million rows and over 10 independent variables to
>>> > >>> simple statistical analysis using T-Tests, etc. Also, how do the timings
>>> > >>> for operations like logistic regressions compare to Julia? Are there any
>>> > >>> libraries/packages that can perform Cox, Poisson (Negative Binomial), and
>>> > >>> other regression types?
>>> > >>>
>>> > >>> The benchmarks for Julia look promising, but in today'
Re: [julia-users] Re: Large Data Sets in Julia
It looks like SharedArray(filename, T, dims) isn't documented, but SharedArray(T, dims; init=false, pids=Int[]) is. What's the difference?

On Friday, November 6, 2015 at 2:21:01 AM UTC-8, Tim Holy wrote:
>
> Not sure if it's as high-level as you're hoping for, but julia has great
> support for arrays that are much bigger than memory. See Mmap.mmap and
> SharedArray(filename, T, dims).
>
> --Tim
>
> On Thursday, November 05, 2015 06:33:52 PM André Lage wrote:
> > hi Viral,
> >
> > Do you have any news on this?
> >
> > André Lage.
> >
> > On Wednesday, July 3, 2013 at 5:12:06 AM UTC-3, Viral Shah wrote:
> > > Hi all,
> > >
> > > I am cross-posting my reply to julia-stats and julia-users as there was a
> > > separate post on large logistic regressions on julia-users too.
> > >
> > > Just as these questions came up, Tanmay and I have been chatting about a
> > > general framework for working on problems that are too large to fit in
> > > memory, or need parallelism for performance. The idea is simple and based
> > > on providing a convenient and generic way to break up a problem into
> > > subproblems, each of which can then be scheduled to run anywhere. To start
> > > with, we will implement a map and mapreduce using this, and we hope that it
> > > should be able to handle large files sequentially, distributed data
> > > in-memory, and distributed filesystems within the same framework. Of
> > > course, this all sounds too good to be true. We are trying out a simple
> > > implementation, and if early results are promising, we can have a detailed
> > > discussion on API design and implementation.
> > >
> > > Doug, I would love to see if we can use some of this work to parallelize
> > > GLM at a higher level than using remotecall and fetch.
> > >
> > > -viral
> > >
> > > On Tuesday, July 2, 2013 11:10:35 PM UTC+5:30, Douglas Bates wrote:
> > >> On Tuesday, July 2, 2013 6:26:33 AM UTC-5, Raj DG wrote:
> > >>> Hi all,
> > >>>
> > >>> I am a regular user of R and also use it for handling very large data
> > >>> sets (~ 50 GB). We have enough RAM to fit all that data into memory for
> > >>> processing, so don't really need to do anything additional to chunk, etc.
> > >>>
> > >>> I wanted to get an idea of whether anyone has, in practice, performed
> > >>> analysis on large data sets using Julia. Use cases range from performing
> > >>> Cox Regression on ~ 40 million rows and over 10 independent variables to
> > >>> simple statistical analysis using T-Tests, etc. Also, how do the timings
> > >>> for operations like logistic regressions compare to Julia? Are there any
> > >>> libraries/packages that can perform Cox, Poisson (Negative Binomial), and
> > >>> other regression types?
> > >>>
> > >>> The benchmarks for Julia look promising, but in today's age of the "big
> > >>> data", it seems that the capability of handling large data is a
> > >>> pre-requisite to the future success of any new platform or language.
> > >>> Looking forward to your feedback,
> > >>
> > >> I think the potential for working with large data sets in Julia is better
> > >> than that in R. Among other things Julia allows for memory-mapped files
> > >> and for distributed arrays, both of which have great potential.
> > >>
> > >> I have been working with some Biostatisticians on a prototype package for
> > >> working with snp data of the sort generated in genome-wide association
> > >> studies. Current data sizes can be information on tens of thousands of
> > >> individuals (rows) for over a million snp positions (columns). The nature
> > >> of the data is such that each position provides one of four potential
> > >> values, including a missing value. A compact storage format using 2 bits
> > >> per position is widely used for such data. We are able to read and process
> > >> such a large array in a few seconds using memory-mapped files in Julia.
> > >>
> > >> The amazing thing is that the code is pure Julia. When I write in R I am
> > >> always conscious of the bottlenecks and the need to write C or C++ code for
> > >> those places. I haven't encountered cases where I need to write new code
> > >> in a compiled language to speed up a Julia function. I have interfaced to
> > >> existing numerical libraries but not written fresh code.
> > >>
> > >> As John mentioned I have written the GLM package allowing for hooks to
> > >> use distributed arrays. As yet I haven't had a large enough problem to
> > >> warrant fleshing out those hooks but I could be persuaded.
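Since the thread leans on Mmap.mmap for larger-than-memory arrays, here is a minimal sketch of memory-mapping a file-backed array (Julia 1.x module layout; the file name is arbitrary):

```julia
# Create a file-backed 1000x100 Float64 array; elements live on disk
# and are paged in on demand, so the whole array need not fit in RAM.
using Mmap

io = open(tempname(), "w+")
A = Mmap.mmap(io, Matrix{Float64}, (1000, 100))
A[1, 1] = 3.14        # writes go through to the backing file
Mmap.sync!(A)         # flush dirty pages to disk
close(io)
```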
[julia-users] sub vs. ArrayViews in 0.4?
As of 0.4, when should I choose to use sub versus ArrayViews.jl? The ArrayViews.jl README mentions that sub uses a less efficient view representation, but just how much less efficient is it? Is there ever a good reason to use sub instead of ArrayViews, despite the less efficient representation? -John
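For readers landing here later: in current Julia both roles are served by Base's `view`/SubArray, which is what the 0.4-era efficiency work fed into (treat the version details as my assumption). A quick sketch of the no-copy semantics both approaches share:

```julia
# A view shares memory with the parent array: no copy is made, and
# writes through the view are visible in the original.
A = collect(reshape(1.0:16.0, 4, 4))
v = view(A, 1:2, :)   # SubArray; `sub(A, 1:2, :)` on Julia 0.4
v[1, 1] = 0.0
A[1, 1]               # 0.0 -- the parent saw the write
```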
[julia-users] Re: escaping double quotes inside backticks?
A-ha, I see, thanks. Also, for anyone who comes across this post, I forgot the -o flag before 'StrictHostKeyChecking no'.

On Thursday, October 8, 2015 at 4:35:07 PM UTC-7, Steven G. Johnson wrote:
>
> On Thursday, October 8, 2015 at 7:16:34 PM UTC-4, John Brock wrote:
>>
>> I'm trying to construct a command containing double quotes, e.g.:
>>
>> scp "StrictHostKeyChecking no" source user@127.0.0.1:~
>>
>> However, I can't figure out how to escape the double quotes when using
>> the julia backtick syntax. For example, none of the following work:
>>
>> julia> `scp "StrictHostKeyChecking no" source user@127.0.0.1:~`
>> `scp 'StrictHostKeyChecking no' source user@127.0.0.1:~`
>>
>
> First, are you sure you actually want to pass double quotes to `scp`?
> Double quotes are used in the shell to prevent spaces from being parsed as
> separate arguments; they aren't actually passed to `scp`. The above
> example is correct if you want to pass
>
> StrictHostKeyChecking no
>
> as the first argument of scp, and is equivalent to
>
> scp "StrictHostKeyChecking no" source user@127.0.0.1:~
>
> in a shell like bash.
>
> If you actually wanted to pass double quotes to `scp` as part of the
> arguments, you would escape them exactly as you would declare a literal
> string with quotes in Julia:
>
> julia> `scp "\"StrictHostKeyChecking no\"" source user@127.0.0.1:~`
> `scp '"StrictHostKeyChecking no"' source user@127.0.0.1:~`
>
> --SGJ
[julia-users] escaping double quotes inside backticks?
I'm trying to construct a command containing double quotes, e.g.:

scp "StrictHostKeyChecking no" source user@127.0.0.1:~

However, I can't figure out how to escape the double quotes when using the julia backtick syntax. For example, none of the following work:

julia> `scp "StrictHostKeyChecking no" source user@127.0.0.1:~`
`scp 'StrictHostKeyChecking no' source user@127.0.0.1:~`

julia> `scp \"StrictHostKeyChecking no\" source user@127.0.0.1:~`
`scp '"StrictHostKeyChecking' 'no"' source user@127.0.0.1:~`

julia> `scp """StrictHostKeyChecking no""" source user@127.0.0.1:~`
`scp 'StrictHostKeyChecking no' source user@127.0.0.1:~`

Any suggestions?
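Given the follow-up in this thread that the -o flag was missing, the command probably wanted is sketched below. The shell's double quotes only group two words into one argument, and backticks express that grouping directly; inspecting cmd.exec shows the parsed argument list:

```julia
# The quotes group "StrictHostKeyChecking no" into a single argument;
# no literal quote characters are passed to scp.
cmd = `scp -o "StrictHostKeyChecking no" source user@127.0.0.1:~`
cmd.exec   # ["scp", "-o", "StrictHostKeyChecking no", "source", "user@127.0.0.1:~"]
```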
[julia-users] threads and processes, @async vs. @spawn/@parallel?
I've read through the parallel computing documentation and experimented with some toy examples, but I still have some questions about the fundamentals of parallel programming in Julia:

1. It seems that @async performs work in a separate green thread, while @spawn performs work in a separate julia process. Is that right?
2. Will code executed with several calls to @async actually run on separate cores?
3. Getting true parallelism with @spawn or @parallel requires launching julia with the -p flag or using addprocs(...). Does the same hold for @async, i.e., will I get true parallelism with several calls to @async if I only have a single julia process?
4. In what situations should I choose @spawn over @async, and vice versa?
5. How do scope and serialization work with regard to @async? If the code being executed with @async references some Array, will each thread get a copy of that Array, as if I had called @spawn instead? Or will each thread have access to the same Array, obviating the need for SharedArray when using @async?

A lot of this stuff is left ambiguous in the documentation, but I'd be happy to submit a pull request with updates if I can get some clear answers. Thanks!

-John
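Not an authoritative answer, but a sketch of the distinction as I understand it: @async schedules a cooperative task on the *current* process (concurrency, not core-level parallelism, and no copying of data), while @spawn ships the work to a worker process and serializes captured variables, so only @spawn benefits from -p/addprocs:

```julia
# @async: a task on *this* process, sharing its memory.
# @spawn: runs on a worker process; captured variables are serialized.
using Distributed   # Julia 1.x spelling; these live in Base on 0.4

t = @async myid()    # runs here once the scheduler yields
r = @spawn myid()    # runs on a worker when one exists

fetch(t) == myid()   # true: same process
fetch(r)             # a worker's id under `julia -p N`; 1 otherwise
```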
[julia-users] Re: Why does enumerate fail in parallel?
This seems issue-worthy if the most recent nightly has the same problem. It looks like Enumerate supports the length property, so the underlying code for @parallel should be able to check the length of the enumerator and figure out how many jobs to assign to each worker.

And regardless of whether it makes sense for @parallel to support enumerate, that error message is pretty opaque -- it doesn't make it obvious to the programmer that @parallel doesn't support enumerate, which is a pretty natural thing to try.

On Thursday, August 20, 2015 at 3:44:49 AM UTC-7, ele...@gmail.com wrote:
>
> On Thursday, August 20, 2015 at 4:50:49 AM UTC+10, Ismael VC wrote:
>>
>> Well that works but it's indeed odd, can you open a new issue for this?
>>
>
> Not really odd, @parallel needs to divide the set of values between
> multiple processes, so it needs the whole set of values.
>
>> On Wednesday, August 19, 2015 at 13:48:28 (UTC-5), Ismael VC wrote:
>>>
>>> Enumerate is an iterator, you need to collect the items first:
>>>
>>> julia> @parallel for i in collect(enumerate(list))
>>>            println(i)
>>>        end
>>>
>>> julia> From worker 2: (1,"a")
>>>        From worker 2: (2,"b")
>>>        From worker 3: (3,"c")
>>>
>>> On Wednesday, August 19, 2015 at 12:17:35 (UTC-5), Nils Gudat wrote:
>>>>
>>>> I just rewrote one of my parallel loops and was surprised by this:
>>>>
>>>> list = ["a", "b", "c"]
>>>> for i in enumerate(list)
>>>>     println(i)
>>>> end
>>>>
>>>> (1,"a")
>>>> (2,"b")
>>>> (3,"c")
>>>>
>>>> addprocs(2)
>>>> @sync @parallel for i in enumerate(list)
>>>>     println(i)
>>>> end
>>>>
>>>> ERROR: `getindex` has no method matching getindex(::Enumerate{Array{ASCIIString,1}}, ::UnitRange{Int64})
>>>>
>>>> Am I doing something wrong here? Is this expected behaviour?
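To summarize the workaround from the thread in one runnable sketch (note @parallel was renamed @distributed in Julia 1.x; treat the macro name as version-dependent):

```julia
using Distributed   # @parallel on 0.x became @distributed in 1.x

list = ["a", "b", "c"]
pairs = collect(enumerate(list))   # materialize: [(1,"a"), (2,"b"), (3,"c")]
@sync @distributed for p in pairs  # now indexable, so it can be split across workers
    println(p)
end
```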
[julia-users] Syntax for slicing with an array of ranges?
Is there an easy way to slice an array using an array of ranges? I'm looking for something like the following:

foo = [2,6,2,8,4,9]
some_ranges = UnitRange{Int64}[2:3, 5:6]
foo[some_ranges] # gives error; desired output is [6,2,4,9]
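One way to get the desired output with plain indexing is to concatenate the ranges into a single index vector first (just a sketch; there may be neater options in packages):

```julia
foo = [2, 6, 2, 8, 4, 9]
some_ranges = UnitRange{Int64}[2:3, 5:6]

foo[vcat(some_ranges...)]                    # [6, 2, 4, 9]
# equivalent, without splatting a potentially long array of ranges:
reduce(vcat, [foo[r] for r in some_ranges])  # [6, 2, 4, 9]
```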
Re: [julia-users] Mutual exclusion when writing to SharedArrays?
Isn't that what DArrays are for, though? Does Julia provide a mechanism for mutual exclusion/marking critical sections? I'm imagining something like:

shared_result = SharedArray(Int64, (2,), init = S -> S[localindexes(S)] = 0)
@parallel for i=1:3
    lock shared_result
        shared_result[:] += [i, i]
    end
end

On Monday, July 20, 2015 at 6:59:03 PM UTC-7, Tim Holy wrote:
>
> Usually the whole point of a SharedArray is that workers only update the piece
> they "own." You can make it work differently if you implement locking, but lock
> contention can be a bottleneck.
>
> --Tim
>
> On Monday, July 20, 2015 04:29:04 PM John Brock wrote:
> > I'm seeing inconsistent results when multiple workers write values to a
> > SharedArray at the same time, presumably because += isn't atomic. Is this
> > intended behavior, and is there a workaround? Behavior is reproducible in
> > 0.3.8-pre+22 and 0.3.9.
> >
> > Sample code:
> >
> > function doStuff()
> >     result_shared = SharedArray(Int64, (2,), init = S -> S[localindexes(S)] = 0)
> >     @sync for i=1:3
> >         @spawn begin
> >             result_shared[:] += [i, i]
> >         end
> >     end
> >     return sdata(result_shared)
> > end
> >
> > > julia -p 3
> >
> > julia> dump(doStuff())
> > Array(Int64,(2,)) [3,3]
> >
> > julia> dump(doStuff())
> > Array(Int64,(2,)) [6,6]
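On the narrow question of critical sections: for multi-process SharedArray updates there was no built-in lock at the time, but *within* a single process Julia's threads do get one. A hedged sketch with ReentrantLock (thread-level only, not the cross-process lock imagined above):

```julia
# Thread-level mutual exclusion: the lock serializes the read-modify-
# write so the += updates don't race. This does NOT protect separate
# worker *processes* sharing a SharedArray.
result = zeros(Int, 2)
lk = ReentrantLock()
Threads.@threads for i in 1:3
    lock(lk) do
        result .+= [i, i]
    end
end
result   # [6, 6] deterministically
```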
[julia-users] Mutual exclusion when writing to SharedArrays?
I'm seeing inconsistent results when multiple workers write values to a SharedArray at the same time, presumably because += isn't atomic. Is this intended behavior, and is there a workaround? Behavior is reproducible in 0.3.8-pre+22 and 0.3.9.

Sample code:

function doStuff()
    result_shared = SharedArray(Int64, (2,), init = S -> S[localindexes(S)] = 0)
    @sync for i=1:3
        @spawn begin
            result_shared[:] += [i, i]
        end
    end
    return sdata(result_shared)
end

> julia -p 3

julia> dump(doStuff())
Array(Int64,(2,)) [3,3]

julia> dump(doStuff())
Array(Int64,(2,)) [6,6]
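A lock-free alternative sketch (my own suggestion, not from the thread): give each iteration its own column so no two tasks write the same memory, then reduce at the end. Written with Julia 1.x module names and the 1.3+ @spawnat :any spelling, as an assumption about the modern equivalent of the 0.3 code above:

```julia
# Each task writes only its own column of `partial`, avoiding the
# non-atomic `+=` race entirely; the final sum is deterministic.
using Distributed, SharedArrays

function do_stuff(n)
    partial = SharedArray{Int64}((2, n))      # shared memory, zero-filled
    @sync for i in 1:n
        @spawnat :any (partial[:, i] = [i, i])
    end
    return vec(sum(partial, dims=2))
end

do_stuff(3)   # [6, 6] every time
```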