Re: [julia-users] How does distribute send data to workers?

2013-12-17 Thread Amit Murthy
distribute creates a new DArray from a regular array. It does this by
allocating parts of the regular array on each of the workers.
"remotecall_fetch(owner, ()->fetch(rr)[I...])" is the init function passed
to the DArray constructor. On each worker, it pulls in that worker's part of
the regular array.

While the DArray itself is created in parallel, 1000 workers is a lot;
typically one would expect one worker per CPU core. Of course, if you do have
access to 1000 cores it would be worth trying out, though I suspect we may
run into other issues as a result.
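
A minimal sketch of that behaviour, using only calls that appear in this
thread (0.2/0.3-era API), just to see which worker ends up holding which
chunk:

addprocs(2)
a = rand(10)
d = distribute(a)
for p in procs(d)
    # each worker holds only its own chunk of a
    chunk = remotecall_fetch(p, D -> localpart(D), d)
    println("worker ", p, " holds ", length(chunk), " elements")
end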


On Tue, Dec 17, 2013 at 5:13 PM, David C Cohen  wrote:

> Hi,
>
> According to 
> this,
> distribute is defined as:
>
> function distribute(a::AbstractArray)
> owner = myid()
> rr = RemoteRef()
> put(rr, a)
> DArray(size(a)) do I
> remotecall_fetch(owner, ()->fetch(rr)[I...])
> end
> end
>
>
> I'm trying to find out how this sends data to the workers.
>
>- Here rr is the remote reference to the local machine. Then put(rr,
>a) sends array a to the local machine. That doesn't make sense.
>- When, in whatever way that doesn't make sense to me, data of array a
>is sent to workers, does the network io happen in parallel, or in series?
>- If we have 1000 workers, parallel data sending means network
>overload, series data sending means taking a long time. What's a good way
>of working with larger distributed arrays, or distributing an array over
>many many workers?
>
>


Re: [julia-users] How does distribute send data to workers?

2013-12-17 Thread Amit Murthy
DArrays are horizontally scalable in the sense that each worker holds its
part of the DArray, and the workers can exist across different machines.
There is support for HDFS via https://github.com/tanmaykm/HDFS.jl that you
can check out.

Out of the box, Julia has a variety of parallel processing constructs,
documented here -
http://docs.julialang.org/en/latest/manual/parallel-computing/ . Some of
the currently available external packages (full list here -
http://docs.julialang.org/en/latest/packages/packagelist/) are useful for
distributed data.



On Wed, Dec 18, 2013 at 11:31 AM, David C Cohen wrote:

>
>
> On Wednesday, December 18, 2013 4:34:48 AM UTC, Amit Murthy wrote:
>>
>> distribute creates a new DArray from a regular array. It does this by
>> allocating parts of the regular array on each of the workers. "
>> remotecall_fetch(owner, ()->fetch(rr)[I...])" is the regular init
>> function passed to the DArray constructor. On each worker, it pulls in its
>> its part of the regular array.
>>
>> While the DArray itself is created in parallel, 1000 workers is a lot,
>> typically one would expect 1 worker per CPU core. Of course, if you do have
>> access to 1000 cores, it will be a good idea to try it out, though I
>> suspect, we may see other issues too as a result of it.
>>
>
> I assume this means that DArrays are not horizontally scalable. What other
> solutions are out there for julia when working with big data? Is julia
> itself considered a complete system when working with distributed data, or
> do people use julia on top of other frameworks?
>
>
>>
>> On Tue, Dec 17, 2013 at 5:13 PM, David C Cohen wrote:
>>
>>> Hi,
>>>
>>> According to 
>>> this<https://github.com/JuliaLang/julia/blob/b4fa86124dd1cb298373c3bef3f98c060cbb19b8/base/darray.jl#L160-L167>,
>>> distribute is defined as:
>>>
>>> function distribute(a::AbstractArray)
>>> owner = myid()
>>> rr = RemoteRef()
>>> put(rr, a)
>>> DArray(size(a)) do I
>>> remotecall_fetch(owner, ()->fetch(rr)[I...])
>>> end
>>> end
>>>
>>>
>>> I'm trying to find out how this sends data to the workers.
>>>
>>>- Here rr is the remote reference to the local machine. Then put(rr,
>>>a) sends array a to the local machine. That doesn't make sense.
>>>- When, in whatever way that doesn't make sense to me, data of array
>>>a is sent to workers, does the network io happen in parallel, or in 
>>> series?
>>>- If we have 1000 workers, parallel data sending means network
>>>overload, series data sending means taking a long time. What's a good way
>>>of working with larger distributed arrays, or distributing an array over
>>>many many workers?
>>>
>>>
>>


Re: [julia-users] How does distribute send data to workers?

2013-12-17 Thread Amit Murthy
It is separate from DArray and is a map-reduce framework for working with
HDFS. Just mentioned it since you asked for options for working with
distributed data.


On Wed, Dec 18, 2013 at 12:50 PM, David C Cohen wrote:

>
>
> On Wednesday, December 18, 2013 6:16:54 AM UTC, Amit Murthy wrote:
>>
>> DArrays are horizontally scalable in the sense that each worker holds its
>> part of the DArray, and the workers can exist across different machines.
>> There is support for HDFS via  https://github.com/tanmaykm/HDFS.jl that
>> you can check out.
>>
>
> Does HDFS.jl work with DArray, or is it a replacement for DArray?
>
>
>>
>> Out-of-the-box, Julia has a variety of parallel processing constructs as
>> documented here - http://docs.julialang.org/en/latest/manual/parallel-
>> computing/  . Some of the currently available external packages - full
>> list here - http://docs.julialang.org/en/latest/packages/packagelist/ are
>> useful for distributed data.
>>
>>
>>
>> On Wed, Dec 18, 2013 at 11:31 AM, David C Cohen wrote:
>>
>>>
>>>
>>> On Wednesday, December 18, 2013 4:34:48 AM UTC, Amit Murthy wrote:
>>>>
>>>> distribute creates a new DArray from a regular array. It does this by
>>>> allocating parts of the regular array on each of the workers. "
>>>> remotecall_fetch(owner, ()->fetch(rr)[I...])" is the regular init
>>>> function passed to the DArray constructor. On each worker, it pulls in its
>>>> its part of the regular array.
>>>>
>>>> While the DArray itself is created in parallel, 1000 workers is a lot,
>>>> typically one would expect 1 worker per CPU core. Of course, if you do have
>>>> access to 1000 cores, it will be a good idea to try it out, though I
>>>> suspect, we may see other issues too as a result of it.
>>>>
>>>
>>> I assume this means that DArrays are not horizontally scalable. What
>>> other solutions are out there for julia when working with big data? Is
>>> julia itself considered a complete system when working with distributed
>>> data, or do people use julia on top of other frameworks?
>>>
>>>
>>>>
>>>> On Tue, Dec 17, 2013 at 5:13 PM, David C Cohen wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> According to 
>>>>> this<https://github.com/JuliaLang/julia/blob/b4fa86124dd1cb298373c3bef3f98c060cbb19b8/base/darray.jl#L160-L167>,
>>>>> distribute is defined as:
>>>>>
>>>>> function distribute(a::AbstractArray)
>>>>>
>>>>> owner = myid()
>>>>>
>>>>> rr = RemoteRef()
>>>>>
>>>>> put(rr, a)
>>>>>
>>>>> DArray(size(a)) do I
>>>>>
>>>>> remotecall_fetch(owner, ()->fetch(rr)[I...])
>>>>>
>>>>> end
>>>>>
>>>>> end
>>>>>
>>>>>
>>>>> I'm trying to find out how this sends data to the workers.
>>>>>
>>>>>- Here rr is the remote reference to the local machine. Then
>>>>>put(rr, a) sends array a to the local machine. That doesn't make sense.
>>>>>- When, in whatever way that doesn't make sense to me, data of
>>>>>array a is sent to workers, does the network io happen in parallel, or 
>>>>> in
>>>>>series?
>>>>>- If we have 1000 workers, parallel data sending means network
>>>>>overload, series data sending means taking a long time. What's a good 
>>>>> way
>>>>>of working with larger distributed arrays, or distributing an array 
>>>>> over
>>>>>many many workers?
>>>>>
>>>>>
>>>>
>>


Re: [julia-users] Re: Parallel access to shared array?

2013-12-20 Thread Amit Murthy
>
>
>
> Perhaps it is possible to wrap libc's fork() in a ccall.  This might
> happen without Julia even being aware... Getting data back from the child
> processes may be a bit tricky; I suppose you can always use the file system
> (that's what I did at the time) or communicate through a pipe or socket.
>
>
>
pfork in the PTools.jl package does exactly this, i.e., it uses pipes to
return data. If you allocate a shmem segment before calling pfork, data can
be returned via the shmem too.

One of the problems with fork (among others) is that the child starts with
only a single thread (the calling thread), so any library that runs its own
threads in the parent is in an inconsistent state in the children.
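
For reference, a bare-bones sketch of the raw ccall idea from the quoted
message (not the pfork API; error handling omitted, and the caveats above
still apply):

pid = ccall(:fork, Cint, ())
if pid == 0
    # child: do some work, write results to a file, pipe or pre-allocated
    # shmem segment for the parent to read, then exit immediately
    ccall(:_exit, Void, (Cint,), 0)
else
    # parent: wait for the child to finish
    ccall(:waitpid, Cint, (Cint, Ptr{Cint}, Cint), pid, C_NULL, 0)
end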


Re: [julia-users] Running Julia in a Sandboxed Environment

2014-01-16 Thread Amit Murthy
JDock - https://github.com/amitmurthy/JDock - uses Docker for sandboxing
Julia/IJulia processes.

Though Docker itself is not production-ready (as per its original devs), I
have not come across any major problems in my limited testing so far.


On Thu, Jan 16, 2014 at 9:06 PM, Stephen Chisholm wrote:

> Hi,
>
> I'm building a system which will run user defined functions written in
> Julia with the C API.  In order to do this safely in a production
> environment we need a way of sandboxing Julia so that the user cannot
> execute system calls, uninstall/install julia libraries, etc.. Is there
> currently a way to provide these types of safeties in a Julia environment,
> if not has anyone thought about how to go about implementing such a thing?
>
> Cheers, Steve
>
>
>


Re: [julia-users] Ambiguity Warnings

2014-01-20 Thread Amit Murthy
What would be the best way to solve this?

A SharedArray type has a regular Array backing it and we should make it
usable wherever a regular Array can be used.

Would the right thing to do be to:

- get a list of getindex methods that operate on a regular Array
- generate the same definitions for SharedArray with a pass-through to the
backing Array
- thereby ensure that any further getindex definitions for an Array are
automatically generated for SharedArray too

A rough sketch of the pass-through is below.
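
To illustrate (not the actual fix): mirror a specific Array signature and
delegate to the backing array, using the loc_shmarr backing-array field of
the 0.3-era SharedArray:

# illustrative only: a specific signature avoids the catch-all
# getindex(::SharedArray, Any...) ambiguity and forwards to the backing Array
Base.getindex(S::SharedArray, i::Real) = getindex(S.loc_shmarr, i)
Base.getindex(S::SharedArray, i::Real, j::Real) = getindex(S.loc_shmarr, i, j)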





On Tue, Jan 21, 2014 at 1:04 AM, John Myles White
wrote:

> The recent SharedArray change to Base created some new ambiguity warnings
> for DataFrames.
>
> Warning: New definition
> getindex(AbstractArray{T,1},Indexer) at
> /Users/johnmyleswhite/.julia/DataFrames/src/indexing.jl:195
> is ambiguous with:
> getindex(SharedArray{T,N},Any...) at sharedarray.jl:156.
> To fix, define
> getindex(SharedArray{T,1},Indexer)
> before the new definition.
>
>  — John
>
>


Re: [julia-users] Unexpected socket close?

2014-01-21 Thread Amit Murthy
I can't recall anything that has changed in the parallel codebase recently.
You could try with the 0.2 version just to be sure. Maybe the julia
processes on the cluster are dying soon after they launch, hence the
closed connection while reading port information? Could you try launching
julia manually on one of the nodes of the cluster, just to ensure that the
julia setup on those nodes is OK?


On Wed, Jan 22, 2014 at 5:07 AM, David Bindel wrote:

>
>
> I wrote a cluster manager for launching jobs on
> HTCondor a little while back,
> and was having good luck with it, but now I seem to be
> having some trouble.  The basic logic is that Julia starts a TCP server and
> launches jobs on the cluster that then connect to the server and send back
> their information (by piping through telnet).  The problem is that
> somewhere between when the connection is accepted and when Julia tries to
> read the port information, "the connection is closed by the foreign host".
>
> I've rebuilt Julia between when I last tested this and now, so it's
> possible that there was some change in Julia; it's also possible that there
> was a change in the cluster configuration, since that's equally a moving
> target.  But I'm a bit foxed, and any insights would be welcome.
>
> Cheers,
> David
>


Re: [julia-users] [Parallel] Chaining dependencies on modules

2014-01-21 Thread Amit Murthy
I just did a 'Pkg.add("DataStructures")' and tried the above code. Seeing
the same issue.


On Wed, Jan 22, 2014 at 11:38 AM, Jeff Bezanson wrote:

> This ought to work. The warning is interesting, since the
> DataStructures package does (for me at least) define a DataStructures
> module. Is it possible DataStructures is not fully installed, missing
> files or something like that?
>
>
> On Wed, Jan 22, 2014 at 1:01 AM, Madeleine Udell
>  wrote:
> > I'm trying to understand the most Julian way to perform a particular
> > parallel programming task. Suppose I need function foo from module.jl to
> be
> > available everywhere. Let's call the following code map_foo.jl:
> >
> > @everywhere include("module.jl")
> > @everywhere using MyModule
> > pmap(foo,1:100)
> >
> > That works fine, except when module.jl itself has other dependencies on
> > other modules:
> >
> > module MyModule
> >
> > using DataStructures
> > export foo
> >
> > function foo(i)
> > return Queue(i)
> > end
> >
> > end # module
> >
> > In this case, it works to call
> >
> > julia map_foo.jl
> >
> > but when I call
> >
> > julia -p 2 map_foo.jl
> >
> > I get the following error
> >
> > Warning: requiring "DataStructures" did not define a corresponding
> module.
> > Warning: requiring "DataStructures" did not define a corresponding
> module.
> > exception on exception on 2: 3: ERROR: ERROR: Queue not definedQueue not
> > defined
> >  in
> >  in foo at /Users/madeleineudell/Dropbox/pestilli_icme_life
> > (1)/src/julia/questions/module.jl:7
> >  in anonymous at multi.jl:834
> >  in run_work_thunk at multi.jl:575
> >  in anonymous at task.jl:834
> > foo at /Users/madeleineudell/Dropbox/pestilli_icme_life
> > (1)/src/julia/questions/module.jl:7
> >  in anonymous at multi.jl:834
> >  in run_work_thunk at multi.jl:575
> >  in anonymous at task.jl:834
> >
> > Does anyone know how I can successfully chain dependencies like this when
> > using parallelism? Calling @everywhere on the import call in module.jl
> also
> > doesn't fix the problem, strangely enough.
> >
> > Of course, if I could put all my code into shared memory, I'd be much
> > happier. I just saw an update adding shared memory arrays, but I don't
> know
> > if there's a way to get shared memory code!
> >
>


Re: [julia-users] [Parallel] Chaining dependencies on modules

2014-01-21 Thread Amit Murthy
amitm:/tmp$ julia -p 2 -e "@everywhere include(\"module.jl\")"
Warning: requiring "DataStructures" did not define a corresponding module.
Warning: requiring "DataStructures" did not define a corresponding module.

amitm:/tmp$ julia -p 2 -e "require(\"module.jl\")"

Only the include throws the warning; require seems to be fine.
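
So a minimal sketch of the pattern that avoids the warning, assuming
module.jl (defining MyModule and foo) sits in the working directory:

require("module.jl")        # loads the file on the master and all workers
@everywhere using MyModule  # brings foo into scope on every process
pmap(foo, 1:100)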




On Wed, Jan 22, 2014 at 11:41 AM, Amit Murthy  wrote:

> I just did a 'Pkg.add("DataStructures")' and tried the above code. Seeing
> the same issue.
>
>
> On Wed, Jan 22, 2014 at 11:38 AM, Jeff Bezanson 
> wrote:
>
>> This ought to work. The warning is interesting, since the
>> DataStructures package does (for me at least) define a DataStructures
>> module. Is it possible DataStructures is not fully installed, missing
>> files or something like that?
>>
>>
>> On Wed, Jan 22, 2014 at 1:01 AM, Madeleine Udell
>>  wrote:
>> > I'm trying to understand the most Julian way to perform a particular
>> > parallel programming task. Suppose I need function foo from module.jl
>> to be
>> > available everywhere. Let's call the following code map_foo.jl:
>> >
>> > @everywhere include("module.jl")
>> > @everywhere using MyModule
>> > pmap(foo,1:100)
>> >
>> > That works fine, except when module.jl itself has other dependencies on
>> > other modules:
>> >
>> > module MyModule
>> >
>> > using DataStructures
>> > export foo
>> >
>> > function foo(i)
>> > return Queue(i)
>> > end
>> >
>> > end # module
>> >
>> > In this case, it works to call
>> >
>> > julia map_foo.jl
>> >
>> > but when I call
>> >
>> > julia -p 2 map_foo.jl
>> >
>> > I get the following error
>> >
>> > Warning: requiring "DataStructures" did not define a corresponding
>> module.
>> > Warning: requiring "DataStructures" did not define a corresponding
>> module.
>> > exception on exception on 2: 3: ERROR: ERROR: Queue not definedQueue not
>> > defined
>> >  in
>> >  in foo at /Users/madeleineudell/Dropbox/pestilli_icme_life
>> > (1)/src/julia/questions/module.jl:7
>> >  in anonymous at multi.jl:834
>> >  in run_work_thunk at multi.jl:575
>> >  in anonymous at task.jl:834
>> > foo at /Users/madeleineudell/Dropbox/pestilli_icme_life
>> > (1)/src/julia/questions/module.jl:7
>> >  in anonymous at multi.jl:834
>> >  in run_work_thunk at multi.jl:575
>> >  in anonymous at task.jl:834
>> >
>> > Does anyone know how I can successfully chain dependencies like this
>> when
>> > using parallelism? Calling @everywhere on the import call in module.jl
>> also
>> > doesn't fix the problem, strangely enough.
>> >
>> > Of course, if I could put all my code into shared memory, I'd be much
>> > happier. I just saw an update adding shared memory arrays, but I don't
>> know
>> > if there's a way to get shared memory code!
>> >
>>
>
>


Re: [julia-users] [Parallel] Using shared memory + parallel maps elegantly

2014-01-21 Thread Amit Murthy
I have not gone through your post in detail, but would like to point out
that SharedArray can only be used for bitstypes.


On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell  wrote:

> # Say I have a list of tasks, eg tasks i=1:n
> # For each task I want to call a function foo
> # that depends on that task and some fixed data
> # I have many types of fixed data: eg, arrays, dictionaries, integers, etc
>
> # Imagine the data comes from eg loading a file based on user input,
> # so we can't hard code the data into the function foo
> # although it's constant during program execution
>
> # If I were doing this in serial, I'd do the following
>
> type MyData
> myint
> mydict
> myarray
> end
>
> function foo(task,data::MyData)
> data.myint + data.myarray[data.mydict[task]]
> end
>
> n = 10
> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>
> results = zeros(n)
> for i = 1:n
> results[i] = foo(i,data)
> end
>
> # What's the right way to do this in parallel? Here are a number of ideas
> # To use @parallel or pmap, we have to first copy all the code and data
> everywhere
> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>
> @everywhere begin
> type MyData
> myint
> mydict
> myarray
> end
>
> function foo(task,data::MyData)
> data.myint + data.myarray[data.mydict[task]]
> end
>
> n = 10
> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
> end
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
> results[i] = foo(i,data)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data)
> results = pmap(foo,1:n)
>
> # To avoid copying data, I can make myarray a shared array
> # In that case, I don't want to use @everywhere to put data on each
> processor
> # since that would reinstantiate the shared array.
> # My current solution is to rewrite my data structure to *not* include
> myarray,
> # and pass the array to the function foo separately.
> # But the code gets much less pretty as I tear apart my data structure,
> # especially if I have a large number of shared arrays.
> # Is there a way for me to avoid this while using shared memory?
> # really, I'd like to be able to define my own shared memory data types...
>
> @everywhere begin
> type MySmallerData
> myint
> mydict
> end
>
> function foo(task,data::MySmallerData,myarray::SharedArray)
> data.myint + myarray[data.mydict[task]]
> end
>
> n = 10
> const data = MySmallerData(rand(),Dict(1:n,randperm(n)))
> end
>
> myarray = SharedArray(randperm(n))
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
> results[i] = foo(i,data,myarray)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data,myarray)
> results = pmap(foo,1:n)
>
> # Finally, what can I do to avoid copying mydict to each processor?
> # Is there a way to use shared memory for it?
> # Once again, I'd really like to be able to define my own shared memory
> data types...
>


Re: [julia-users] [Parallel] Using shared memory + parallel maps elegantly

2014-01-22 Thread Amit Murthy
1. The SharedArray object can be sent to any of the processes that mapped
the shared memory segment during construction. The backing array is not
copied.
2. User defined composite types are fine as long as isbits(T) is true.
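
To illustrate point 2 with a hypothetical composite type (0.3-era
SharedArray syntax):

immutable Point          # isbits(Point) is true: every field is a bits type
    x::Float64
    y::Float64
end

S = SharedArray(Point, 100)    # fine, since isbits(Point) == true
S[1] = Point(1.0, 2.0)
# a type holding e.g. a String or Dict field is not isbits and cannot be shared this way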



On Thu, Jan 23, 2014 at 1:01 AM, Madeleine Udell
wrote:

> That's not a problem for me; all of my data is numeric. To summarize a
> long post, I'm interested in understanding
>
> 1) good programming paradigms for using shared memory together with
> parallel maps. In particular, can a shared array and other nonshared data
> structure be combined into a single data structure and "passed" in a remote
> call without unnecessarily copying the shared array? and
> 2) possibilities for extending shared memory in julia to other data types,
> and even to user defined types.
>
>
> On Tuesday, January 21, 2014 11:17:10 PM UTC-8, Amit Murthy wrote:
>
>> I have not gone through your post in detail, but would like to point out
>> that SharedArray can only be used for bitstypes.
>>
>>
>> On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell 
>> wrote:
>>
>>> # Say I have a list of tasks, eg tasks i=1:n
>>> # For each task I want to call a function foo
>>> # that depends on that task and some fixed data
>>> # I have many types of fixed data: eg, arrays, dictionaries, integers,
>>> etc
>>>
>>> # Imagine the data comes from eg loading a file based on user input,
>>> # so we can't hard code the data into the function foo
>>> # although it's constant during program execution
>>>
>>> # If I were doing this in serial, I'd do the following
>>>
>>> type MyData
>>> myint
>>> mydict
>>> myarray
>>> end
>>>
>>> function foo(task,data::MyData)
>>> data.myint + data.myarray[data.mydict[task]]
>>> end
>>>
>>> n = 10
>>> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>>>
>>> results = zeros(n)
>>> for i = 1:n
>>> results[i] = foo(i,data)
>>> end
>>>
>>> # What's the right way to do this in parallel? Here are a number of ideas
>>> # To use @parallel or pmap, we have to first copy all the code and data
>>> everywhere
>>> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>>>
>>> @everywhere begin
>>> type MyData
>>>  myint
>>> mydict
>>> myarray
>>> end
>>>
>>> function foo(task,data::MyData)
>>> data.myint + data.myarray[data.mydict[task]]
>>> end
>>>
>>> n = 10
>>> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>>> end
>>>
>>> ## @parallel
>>> results = zeros(n)
>>> @parallel for i = 1:n
>>> results[i] = foo(i,data)
>>> end
>>>
>>> ## pmap
>>> @everywhere foo(task) = foo(task,data)
>>> results = pmap(foo,1:n)
>>>
>>> # To avoid copying data, I can make myarray a shared array
>>> # In that case, I don't want to use @everywhere to put data on each
>>> processor
>>> # since that would reinstantiate the shared array.
>>> # My current solution is to rewrite my data structure to *not* include
>>> myarray,
>>> # and pass the array to the function foo separately.
>>> # But the code gets much less pretty as I tear apart my data structure,
>>> # especially if I have a large number of shared arrays.
>>> # Is there a way for me to avoid this while using shared memory?
>>> # really, I'd like to be able to define my own shared memory data
>>> types...
>>>
>>> @everywhere begin
>>> type MySmallerData
>>> myint
>>> mydict
>>>  end
>>>
>>> function foo(task,data::MySmallerData,myarray::SharedArray)
>>> data.myint + myarray[data.mydict[task]]
>>>  end
>>>
>>> n = 10
>>> const data = MySmallerData(rand(),Dict(1:n,randperm(n)))
>>> end
>>>
>>> myarray = SharedArray(randperm(n))
>>>
>>> ## @parallel
>>> results = zeros(n)
>>> @parallel for i = 1:n
>>> results[i] = foo(i,data,myarray)
>>> end
>>>
>>> ## pmap
>>> @everywhere foo(task) = foo(task,data,myarray)
>>> results = pmap(foo,1:n)
>>>
>>> # Finally, what can I do to avoid copying mydict to each processor?
>>> # Is there a way to use shared memory for it?
>>> # Once again, I'd really like to be able to define my own shared memory
>>> data types...
>>>
>>
>>


Re: [julia-users] [Parallel] Using shared memory + parallel maps elegantly

2014-01-23 Thread Amit Murthy
The SharedArray object has a field loc_shmarr which represents the backing
array, so S.loc_shmarr should work everywhere. But you are right, we need
to ensure that the SharedArray can be used just like a regular array.
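
In the meantime, a hedged workaround is to operate on the backing array
directly:

S = SharedArray(Float64, (4, 4))
A = S.loc_shmarr     # the local Array backed by the shared memory segment
A * rand(4, 2)       # regular Array methods now apply
A[1, :]              # so does slicing
repmat(A, 2, 1)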


On Fri, Jan 24, 2014 at 9:00 AM, Madeleine Udell
wrote:

> even more problematic: I can't multiply by my SharedArray:
>
> no method *(SharedArray{Float64,2}, Array{Float64,2})
>
>
> On Thursday, January 23, 2014 7:22:59 PM UTC-8, Madeleine Udell wrote:
>>
>> Thanks! I'm trying out a SharedArray solution now, but wondered if you
>> can tell me if there's an easy way to reimplement many of the convenience
>> wrappers on arrays for shared arrays. Eg I get the following errors:
>>
>> >> shared_array[1,:]
>> no method getindex(SharedArray{Float64,2}, Float64, Range1{Int64})
>>
>> >> repmat(shared_array,2,1)
>> no method similar(SharedArray{Float64,2}, Type{Float64}, (Int64,Int64))
>>  in repmat at abstractarray.jl:1043
>>
>> I'm surprised these aren't inherited properties from AbstractArray!
>>
>> On Wednesday, January 22, 2014 8:05:45 PM UTC-8, Amit Murthy wrote:
>>>
>>> 1. The SharedArray object can be sent to any of the processes that
>>> mapped the shared memory segment during construction. The backing array is
>>> not copied.
>>> 2. User defined composite types are fine as long as isbits(T) is true.
>>>
>>>
>>>
>>> On Thu, Jan 23, 2014 at 1:01 AM, Madeleine Udell 
>>> wrote:
>>>
>>>> That's not a problem for me; all of my data is numeric. To summarize a
>>>> long post, I'm interested in understanding
>>>>
>>>> 1) good programming paradigms for using shared memory together with
>>>> parallel maps. In particular, can a shared array and other nonshared data
>>>> structure be combined into a single data structure and "passed" in a remote
>>>> call without unnecessarily copying the shared array? and
>>>> 2) possibilities for extending shared memory in julia to other data
>>>> types, and even to user defined types.
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 11:17:10 PM UTC-8, Amit Murthy wrote:
>>>>
>>>>> I have not gone through your post in detail, but would like to point
>>>>> out that SharedArray can only be used for bitstypes.
>>>>>
>>>>>
>>>>> On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell <
>>>>> madelei...@gmail.com> wrote:
>>>>>
>>>>>> # Say I have a list of tasks, eg tasks i=1:n
>>>>>> # For each task I want to call a function foo
>>>>>> # that depends on that task and some fixed data
>>>>>> # I have many types of fixed data: eg, arrays, dictionaries,
>>>>>> integers, etc
>>>>>>
>>>>>> # Imagine the data comes from eg loading a file based on user input,
>>>>>> # so we can't hard code the data into the function foo
>>>>>> # although it's constant during program execution
>>>>>>
>>>>>> # If I were doing this in serial, I'd do the following
>>>>>>
>>>>>> type MyData
>>>>>> myint
>>>>>> mydict
>>>>>> myarray
>>>>>> end
>>>>>>
>>>>>> function foo(task,data::MyData)
>>>>>> data.myint + data.myarray[data.mydict[task]]
>>>>>> end
>>>>>>
>>>>>> n = 10
>>>>>> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>>>>>>
>>>>>> results = zeros(n)
>>>>>> for i = 1:n
>>>>>> results[i] = foo(i,data)
>>>>>> end
>>>>>>
>>>>>> # What's the right way to do this in parallel? Here are a number of
>>>>>> ideas
>>>>>> # To use @parallel or pmap, we have to first copy all the code and
>>>>>> data everywhere
>>>>>> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>>>>>>
>>>>>> @everywhere begin
>>>>>> type MyData
>>>>>>  myint
>>>>>> mydict
>>>>>> myarray
>>>>>> end
>>>>>>
>>>>>> function foo(task,data::MyData)
>>>>>> data.myint + data.myarray[da

Re: [julia-users] ijulia parallel

2014-01-25 Thread Amit Murthy
This was the fix in Julia (2 days ago) that corrected it -
https://github.com/JuliaLang/julia/commit/390f466d23798b47ba6be40ee5777e21642569bb

You may want to pull the latest Julia and try again.


On Sun, Jan 26, 2014 at 9:36 AM, Madeleine Udell
wrote:

> I tried running Pkg.update() and restarting IJulia, with the same error.
> I'm using the Julia installed from github two days ago. Any other ideas
> what might be going on?
>
>
> On Saturday, January 25, 2014 1:59:55 PM UTC-8, Jiahao Chen wrote:
>>
>> > ERROR: StateError("Resource temporarily unavailable")
>>
>> This error was recently fixed. Try running Pkg.update() and restarting
>> IJulia.
>>
>> > Additionally, I'd like to be able to use the ijulia analog of
>> >
>> > julia --machinefile fn
>> >
>> >
>> > Is there something I need to add to my ipython julia profile to make
>> this
>> > possible? Or a command line argument I can pass? Or can I set up an
>> ipython
>> > cluster for use with ijulia?
>>
>> You can do this programmatically using a ClusterManager.
>>
>> http://docs.julialang.org/en/latest/manual/parallel-
>> computing/#clustermanagers
>>
>> Not sure about the IPython cluster option.
>>
>> Thanks,
>>
>> Jiahao Chen, PhD
>> Staff Research Scientist
>> MIT Computer Science and Artificial Intelligence Laboratory
>>
>


Re: [julia-users] Using SharedArray with pmap

2014-01-26 Thread Amit Murthy
@parallel is efficient at executing a large number of small computations,
while pmap is better for a small number of complex computations.

What is happening in the mypmap case is that remotecall_fetch is called once
per task, i.e., one roundtrip to another processor per task. Not very
efficient.

I would also point out that in your particular example of @parallel (+),
you will find that sum(a::AbstractArray) is much faster than @parallel.

However, if you wanted to initialize the shared array in parallel, you would
find that

s = SharedArray(Int, 10^8)
@parallel for i in 1:10^8
    s[i] = rand(1:10)
end

quite efficiently uses all workers to initialize the shared array.
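
For contrast, a hedged example of the kind of workload where pmap is the
better fit - a handful of heavyweight calls rather than 10^8 tiny ones:

# each task is an expensive computation, so the per-call roundtrip cost is negligible
results = pmap(svdvals, {rand(500, 500) for i in 1:8})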






On Sun, Jan 26, 2014 at 12:55 PM, Madeleine Udell  wrote:

> When using SharedArrays with pmap, I'm getting an increase in memory usage
> and time proportional to the number of tasks. This doesn't happen when
> using @parallel. What's the right way to pass shared arrays to workers
> using functional syntax?
>
> (code for file q3.jl pasted below and also attached; the first timing
> result refers to a @parallel implementation, the second to a pmap-style
> implementation)
>
> ᐅ julia -p 10 q3.jl 100
> elapsed time: 1.14932906 seconds (12402424 bytes allocated)
> elapsed time: 0.097900614 seconds (2716048 bytes allocated)
> ᐅ julia -p 10 q3.jl 1000
> elapsed time: 1.140016584 seconds (12390724 bytes allocated)
> elapsed time: 0.302179888 seconds (21641260 bytes allocated)
> ᐅ julia -p 10 q3.jl 1
> elapsed time: 1.173121314 seconds (12402424 bytes allocated)
> elapsed time: 2.429918636 seconds (197840960 bytes allocated)
>
> n = int(ARGS[1])
> arr = randn(n)
> function make_shared(a::AbstractArray,pids=workers())
> sh = SharedArray(typeof(a[1]),size(a),pids=pids)
> sh[:] = a[:]
> return sh
> end
> arr = make_shared(arr)
> tasks = 1:n
>
> @time begin
> @parallel (+) for i in tasks
> arr[i]
> end
> end
>
> @everywhere function f(task,arr)
> arr[task]
> end
> function mypmap(f::Function, tasks, arr)
> # if this resends the shared data every time, it shouldn't)
> np = nprocs()  # determine the number of processes available
> n = length(tasks)
> results = 0
> i = 1
> # function to produce the next work item from the queue.
> # in this case it's just an index.
> nextidx() = (idx=i; i+=1; idx)
> @sync begin
> for p=1:np
> if p != myid() || np == 1
> @async begin
> while true
> idx = nextidx()
> if idx > n
> break
> end
> task = tasks[idx]
> results += remotecall_fetch(p, f, task, arr)
> end
> end
> end
> end
> end
> results
> end
>
> @time mypmap(f,tasks,arr)
>


Re: [julia-users] @parallel + zip

2014-01-26 Thread Amit Murthy
"@parallel for" works only with ranges - only data referenced in the for
body is copied. We should print a better error message though.

I cannot think of a way to have a distributed randperm that does not
involve copying, other than using a SharedArray.

If copying just the relevant parts to each worker is not an issue, a DArray
can also serve your purpose.

n = 1000
x = randperm(n); y = randperm(n)

# Only the specific localparts are copied to each of the workers
# participating in the darray...
d = distribute(map(t->t, zip(x,y)))

@sync begin
    for p in procs(d)
        @async begin
            remotecall_fetch(p,
                D -> begin
                    for t in localpart(D)
                        println(t)
                        # do any work on the localpart of the DArray
                    end
                end,
                d)
        end
    end
end









On Sun, Jan 26, 2014 at 11:54 PM, Madeleine Udell  wrote:

> @parallel breaks when paralleling a loop over a Zip. Is there a workaround
> that allows me not to explicitly form the sequence I'm iterating over? I'd
> like to avoid copying (unnecessarily) the data from the sequences I'm
> zipping up.
>
> n = 1000
> x = randperm(n); y = randperm(n)
> @parallel for t in zip(x,y)
> x,y = t
> println(x,y)
> end
>
> exception on 1: ERROR: no method
> getindex(Zip2{Array{Int64,1},Array{Int64,1}}, Range1{Int64})
>  in anonymous at no file:1467
>  in anonymous at multi.jl:1278
>  in run_work_thunk at multi.jl:575
>  in run_work_thunk at multi.jl:584
>  in anonymous at task.jl:88
>


Re: [julia-users] @parallel + zip

2014-01-26 Thread Amit Murthy
I would just like to add that a regular DArray constructor takes an init
function that initializes the localparts of the DArray - there is no
copying. But in your case, with a randperm(n), I think we will have to
create it in the caller and then distribute the parts.
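
A minimal sketch of the init-function form (each worker builds its own chunk
locally, so nothing is shipped from the caller; for randperm such chunks
would not form one global permutation, hence the caveat above):

d = DArray((10^6,)) do I
    # I is a tuple of index ranges assigned to this worker
    fill(myid(), map(length, I))   # build the localpart in place
end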


On Mon, Jan 27, 2014 at 8:58 AM, Amit Murthy  wrote:

> The "@parallel for" works only with ranges  - only data that is in the for
> body is copied. We should print a better error message though.
>
> I cannot think of a way to have a distributed randperm that does not
> involve copying other than using a SharedArray.
>
> If it is not an issue copying only the specific parts of the distribution,
> a DArray can also serve your purpose.
>
> n=1000
> x = randperm(n); y = randperm(n)
>
> d=distribute(map(t->t, zip(x,y)))
> # Only the specific localparts are copied to each of the workers
> participating in the darray...
>
>
> @sync begin
> for p in procs(d)
> @async begin
> remotecall_fetch(p,
> D -> begin
> for t in localpart(D)
> println(t)
> # do any work on the localpart of the DArray
> end
> end,
> d)
> end
> end
> end
>
>
>
>
>
>
>
>
>
> On Sun, Jan 26, 2014 at 11:54 PM, Madeleine Udell <
> madeleine.ud...@gmail.com> wrote:
>
>> @parallel breaks when paralleling a loop over a Zip. Is there a
>> workaround that allows me not to explicitly form the sequence I'm iterating
>> over? I'd like to avoid copying (unnecessarily) the data from the sequences
>> I'm zipping up.
>>
>> n = 1000
>> x = randperm(n); y = randperm(n)
>> @parallel for t in zip(x,y)
>> x,y = t
>> println(x,y)
>> end
>>
>> exception on 1: ERROR: no method
>> getindex(Zip2{Array{Int64,1},Array{Int64,1}}, Range1{Int64})
>>  in anonymous at no file:1467
>>  in anonymous at multi.jl:1278
>>  in run_work_thunk at multi.jl:575
>>  in run_work_thunk at multi.jl:584
>>  in anonymous at task.jl:88
>>
>
>


Re: [julia-users] @parallel + zip

2014-01-26 Thread Amit Murthy
distribute(map(t->t, zip(x,y))) in my code above is probably better written
as distribute([zip(x,y)...])



On Mon, Jan 27, 2014 at 9:16 AM, Amit Murthy  wrote:

> Would just like to add that a regular DArray constructor takes an init
> function that initializes the localparts of the DArray - there is no
> copying. But in your case with a randperm(n), I think we will have to
> create it in the caller and then distribute the parts.
>
>
> On Mon, Jan 27, 2014 at 8:58 AM, Amit Murthy wrote:
>
>> The "@parallel for" works only with ranges  - only data that is in the
>> for body is copied. We should print a better error message though.
>>
>> I cannot think of a way to have a distributed randperm that does not
>> involve copying other than using a SharedArray.
>>
>> If it is not an issue copying only the specific parts of the
>> distribution, a DArray can also serve your purpose.
>>
>> n=1000
>> x = randperm(n); y = randperm(n)
>>
>> d=distribute(map(t->t, zip(x,y)))
>> # Only the specific localparts are copied to each of the workers
>> participating in the darray...
>>
>>
>> @sync begin
>>  for p in procs(d)
>> @async begin
>> remotecall_fetch(p,
>> D -> begin
>> for t in localpart(D)
>> println(t)
>> # do any work on the localpart of the DArray
>> end
>> end,
>> d)
>> end
>> end
>> end
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Jan 26, 2014 at 11:54 PM, Madeleine Udell <
>> madeleine.ud...@gmail.com> wrote:
>>
>>> @parallel breaks when paralleling a loop over a Zip. Is there a
>>> workaround that allows me not to explicitly form the sequence I'm iterating
>>> over? I'd like to avoid copying (unnecessarily) the data from the sequences
>>> I'm zipping up.
>>>
>>> n = 1000
>>> x = randperm(n); y = randperm(n)
>>> @parallel for t in zip(x,y)
>>> x,y = t
>>> println(x,y)
>>> end
>>>
>>> exception on 1: ERROR: no method
>>> getindex(Zip2{Array{Int64,1},Array{Int64,1}}, Range1{Int64})
>>>  in anonymous at no file:1467
>>>  in anonymous at multi.jl:1278
>>>  in run_work_thunk at multi.jl:575
>>>  in run_work_thunk at multi.jl:584
>>>  in anonymous at task.jl:88
>>>
>>
>>
>


Re: [julia-users] how to clean ram

2014-01-29 Thread Amit Murthy
gc()
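
That is, a small sketch of the pattern from the question (array sized up so
the effect is visible):

A = rand(10000, 10000)   # allocates roughly 800 MB of Float64s
A = 0                    # drop the only reference to the array
gc()                     # ask the garbage collector to reclaim the memory now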


On Wed, Jan 29, 2014 at 4:07 PM, paul analyst  wrote:

> How to clean ram?
> A=rand(10,10);
> doing anything ...
> A=0;
> Is there some command to clean trash in ram?
> (I can't find it in the documentation by "clear ram memory etc." :/)
> Paul
>


Re: [julia-users] Julia Parallel Computing Optimization

2014-02-03 Thread Amit Murthy
I would like to mention that the non-reducer version of @parallel is
asynchronous. Before you can use Ans1 and Ans2, you should wait for
completion.

For example, if you need to time it, you can wrap it in a @sync block like
this:

@time @sync begin
    @parallel for ...
        ...
    end
end
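
A self-contained illustration with a SharedArray standing in for Ans1;
without the @sync, @time returns almost immediately while the workers are
still writing:

Ans1 = SharedArray(Float64, (100, 50))
@time @sync begin
    @parallel for i = 1:100
        for j = 1:50
            Ans1[i, j] = i * j
        end
    end
end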


On Mon, Feb 3, 2014 at 10:25 PM, David Salamon  wrote:

> I have no experience with it, but it looks like you could also just do:
>
> Ans1 = SharedArray(Float64, (limit, int64(limit/2))
> Ans2 = SharedArray(Float64, (limit, int64(limit/2))
>
> @parallel for sample=1:samples, i=1:limit, j=1:int64(limit/2)
>Sx = S[i, sample]
>Sy = S[j, sample]
>Sxy = S[i+j, sample]
>...
>
>   Ans1[i,j] = Aix * Bix / samples / samples
>   Ans2[i,j] = Cix / samples
> end
>
> return (Ans1, Ans2)
>
>
> On Mon, Feb 3, 2014 at 8:48 AM, David Salamon  wrote:
>
>> Also S[:,1] is allocating. it should look something like:
>>
>> for sample=1:samples, i=1:limit, j=1:int64(limit/2)
>>Sx = S[i, sample]
>>Sy = S[j, sample]
>>Sxy = S[i+j, sample]
>>...
>> end
>>
>>
>> On Mon, Feb 3, 2014 at 8:45 AM, David Salamon  wrote:
>>
>>> You're not out of the no-slicing woods yet. Looks like you can get rid
>>> of `mx` and `my`
>>>
>>> for i=1:limit, j=1:int64(limit/2)
>>> end
>>>
>>>
>>>
>>> As far as parallelizing, you could define:
>>> three_tup_add(a, b, c) = (a[1] + b[1] + c[1], a[2] + b[2] + c[2], a[3] +
>>> b[3] + c[3])
>>>
>>> and then do a @parallel (three_tup_add) over your sample index?
>>>
>>> for that matter, why not compute the two parts of the answer directly
>>> rather than going via A, B, and C?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 3, 2014 at 8:11 AM, Alex C  wrote:
>>>
 Thanks. I've re-written the function to minimize the amount of copying
 (i.e. slicing) that is required. But now, I'm befuddled as to how to
 parallelize this function using Julia. Any suggestions?

 Alex

 function expensive_hat(S::Array{Complex{Float64},2},
 mx::Array{Int64,2}, my::Array{Int64,2})

 samples = 64
 A = zeros(size(mx));
 B = zeros(size(mx));
 C = zeros(size(mx));

 for i = 1:samples
 Si = S[:,i];
 Sx = Si[mx];
 Sy = Si[my];
 Sxy = Si[mx+my];
 Sxyc = conj(Sxy);

 A +=  abs2(Sy .* Sx);
 B += abs2(sqrt(Sxyc .* Sxy));
 C += Sxyc .* Sy .* Sx;
 end

 ans = (A .* B ./ samples ./ samples, C./samples)
 return ans

 end

 data = rand(24000,64);
 limit = 2000;

 ix = int64([1:limit/2]);
 iy = ix[1:end/2];
 mg = zeros(Int64,length(iy),length(ix));
 mx = broadcast(+,ix',mg);
 my = broadcast(+,iy,mg);
 S = rfft(data,1)./24000;

 @time (AB, C) = expensive_hat(S,mx,my);


>>>
>>>
>>
>


Re: [julia-users] Parallel sparse matrix vector multiplication

2014-02-05 Thread Amit Murthy
I can try to answer the last part. blas_set_num_threads is set to one only
when Julia is started in parallel mode, i.e., via the "-p" argument, or after
an addprocs call. This is done because the BLAS library by default sets its
thread count to the number of cores. If this were not done, then on, say, a
4-core system started with "-p 4", each Julia process's BLAS would start 4
threads, and you would end up with 16 compute threads competing for 4 cores,
which is inefficient.
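
As a hedged illustration of that default, and of overriding it when only the
master does heavy dense work:

addprocs(4)                  # BLAS drops to 1 thread per process from here on
blas_set_num_threads(4)      # re-enable 4 BLAS threads, on the master only
A = rand(2000, 2000)
@time A * A                  # the multiply now uses the 4 BLAS threads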


On Wed, Feb 5, 2014 at 10:12 PM, Madeleine Udell
wrote:

> I'm developing an iterative optimization algorithm in Julia along the
> lines of other contributions to the Iterative Solvers 
> projector Krylov
> Subspace
> module
>  whose
> only computationally intensive step is computing A*b or A'*b. I would like
> to parallelize the method by using a parallel sparse matrix vector
> multiply. Is there a standard backend matrix-vector multiply that's
> recommended in Julia if I'm targeting a shared memory computer with a large
> number of processors? Similarly, is there a recommended backend for
> targeting a cluster? My matrices can easily reach 10 million rows by 1
> million columns, with sparsity anywhere from .01% to problems that are
> nearly diagonal.
>
> I've seen many posts  talking
> about integrating PETSc as a backend for this purpose, but it looks like
> the projecthas 
> stalled - the last commits I see are a year ago. I'm also interested in
> other backends, eg Spark , 
> SciDB,
> etc.
>
> I'm more interested in solving sparse problems, but as a side note, the
> built-in BLAS acceleration by changing the number of threads 
> `blas_set_num_threads`
> works ok for dense problems using a moderate number of processors. I wonder
> why the number of threads isn't set higher than one by default, for
> example, using as many as nprocs() cores?
>


[julia-users] evaluating an expression using @spawnat

2014-02-05 Thread Amit Murthy
The below code generates an expression consisting of multiple using 
statements

pid = addprocs(1)[1]

e1 = Expr(:toplevel)
for (k,v) in Base.package_list
p = basename(k)
if endswith(p, ".jl")
p = p[1:end-3]
end
push!(e1.args, Expr(:using, symbol(p)))
end


While a 

remotecall(pid, eval, e1) 

works as expected, I cannot do the same with either a

@spawnat pid e1

or

@spawnat pid eval(e1)


Would like to understand why this is so.

Thanks,
  Amit




Re: [julia-users] evaluating an expression using @spawnat

2014-02-06 Thread Amit Murthy
Yeah. My bad.

@spawnat pid eval(ex)

does work.

Thanks.


On Thu, Feb 6, 2014 at 2:32 PM, Tim Holy  wrote:

> On Wednesday, February 05, 2014 11:20:57 PM Amit Murthy wrote:
> > @spawnat pid e1
>
> In this case, e1 is not a function, so it can't be executed.
>
> > @spawnat pid eval(e1)
>
> ex = Expr(:toplevel, Expr(:using, :Gadfly), Expr(:using, :ImmutableArrays))
> fetch(@spawnat pid eval(ex))
>
> Works for me.
>
> --Tim
>
>


Re: [julia-users] two-hop ssh tunneling

2014-02-10 Thread Amit Murthy
Yes, the workers are launched on the machines specified in the machinefile
(or addprocs). The tunnel is only used to connect to the workers.

Writing your own cluster manager to start the worker, which is then
connected to via an ssh tunnel, should work.



On Tue, Feb 11, 2014 at 8:40 AM, James Porter wrote:

> I have the following situation:
>
> I have direct ssh access to a login node. From the login node, I have ssh
> access to the actual compute nodes that I want to spawn workers on. I don't
> have access to these nodes directly from my machine. I usually do a two-hop
> ssh into workers via something like this:
>
> ssh me@$LOGIN_NODE -t "ssh me@$WORKER_NODE"
>
> Does the ssh stuff in Base support spawning workers like this? I looked at
> the `tunnel` option to addprocs but it doesn't look like there's a way to
> specify a machine to use as the first hop, which makes me suspect this is
> not what I'm looking for. If not how would I go about doing so? Write a
> ClusterManager for it?
>
> Cheers,
> James
>


Re: [julia-users] Issue with 0.3

2014-02-11 Thread Amit Murthy
Also, SharedArrays are not yet supported on Windows.


On Wed, Feb 12, 2014 at 1:54 AM, Elliot Saba  wrote:

> I'm pretty sure all the nightlies are broken right now.  I'm working on
> that today, hopefully we'll get new nightlies soon that don't have the
> myriad of problems being reported over that last few days.
> -E
>
>
> On Tue, Feb 11, 2014 at 12:24 PM, Bob Cowdery wrote:
>
>> With the object of getting SharedArrays I downloaded 0.3. However, when I
>> run it (Win7/32) I get:
>>
>> Please submit a bug report with steps to reproduce this fault, and any
>> error messages that follow (in their entirety). Thanks.
>> Exception: EXCEPTION_ILLEGAL_INSTRUCTION at 0x667bdde8 -- ??? at ???:
>> offset 667bdde8
>> julia_next2102 at ???: offset 667bdde8
>> julia_first2101 at ???: offset 667bdc50
>> julia_typeinf2095 at ???: offset 667bb3e3
>> julia_typeinf209511082 at ???: offset 667bd533
>> julia_typeinf_ext2094 at ???: offset 667b9847
>> jl_apply_generic at ???: offset 6b04f766
>> jl_args_morespecific at ???: offset 6b04e40c
>> jl_args_morespecific at ???: offset 6b04ef61
>> jl_apply_generic at ???: offset 6b04f7b0
>> julia_init_load_path1868 at ???: offset 6679f0fe
>> julia__start1833 at ???: offset 6679af22
>> julia__start183310460 at ???: offset 6679b745
>> ??? at ???: offset 401881
>> julia_trampoline at ???: offset 6b084759
>> tgamma at ???: offset 41f888
>> ??? at ???: offset 401402
>> BaseThreadInitThunk at ???: offset 76caed5c
>> RtlInitializeExceptionChain at ???: offset 774c37eb
>> RtlInitializeExceptionChain at ???: offset 774c37be
>>
>> -- bob
>>
>
>


Re: [julia-users] Re: ANN: PTools

2014-02-13 Thread Amit Murthy
The shmem stuff in PTools.jl is quite old. Do check out the current master
which has support for SharedArrays - see
http://docs.julialang.org/en/latest/manual/parallel-computing/#shared-arrays-experimental-unix-only-feature-
which is definitely more usable.


On Thu, Feb 13, 2014 at 5:00 AM, Jim Christoff wrote:

> I'm a little confused as to what functions are getting executed in your
> example. Is it mypid()? Could you point out which function it is?
>
>
> On Saturday, May 4, 2013 1:57:18 AM UTC-4, Amit Murthy wrote:
>>
>> Folks,
>>
>> Please do checkout PTools <https://github.com/amitmurthy/PTools.jl> . I
>> hope to grow it into a collection of utilities for parallel computation.
>>
>> Currently the following are available.
>>
>>-
>>
>>ServerTasks - These are long running tasks that simply processes
>>incoming requests in a loop. Useful in situations where state needs to be
>>maintained across function calls. State can be maintained and retrieved
>>using the task_local_storage methods.
>>-
>>
>>SharedMemory - Useful in the event of parallel processing on a single
>>large multi-core machine. Avoids the overhead associated with
>>sending/recieving large data sets.
>>
>> <https://github.com/amitmurthy/PTools.jl#usage>Usage
>>
>> Typical usage pattern will be
>>
>>-
>>
>>start_stasks - Start Server Tasks, optionally with shared memory
>>mappings.
>>-
>>
>>Execute a series of functions in parallel on these tasks using
>>multiple invocations of pmap_stasks
>>-
>>
>>SomeFunction
>>-
>>
>>SomeOtherFunction
>>-
>>
>>SomeOtherFunction . . .
>>-
>>
>>stop_stasks - Stop all Server Tasks and free shared memory if
>>required.
>>
>> The user specified functions in pmap_stasks can store and retrieve state
>> information using the task_local_storage functions.
>> <https://github.com/amitmurthy/PTools.jl#example>Example
>>
>> The best way to understand what is available is by example:
>>
>>- specify shared memory configuration.
>>
>> using PTools
>>
>> shmcfg = [ShmCfg(:svar1, Int32, 64*1024), ShmCfg(:svar2, Uint8, (100,100))]
>>
>>
>>-
>>
>>the above line requests for a 64K Int32 array bound to svar1, and a
>>100x100 Uint8 array bound to svar2
>>-
>>
>>Start tasks.
>>
>>h = start_stasks(shmcfg)
>>ntasks = count_stasks(h)
>>
>>-
>>
>>The tasks are started and symbols pointing to shared memory segments
>>are added as task local storage. A handle is returned.
>>-
>>
>>The shared memory segments are also mapped in the current tasks local
>>storage.
>>-
>>
>>NOTE: If nprocs() > 1, then only the Worker Julia processes are used
>>to start the Server Tasks, i.e., if nprocs() == 5, then ntasks above would
>>be 4.
>>-
>>
>>Prepare arguments for our pmap call
>>
>>offset_list = [i for i in 1:ntasks]
>>ntasks_list = [ntasks for i in 1:ntasks]
>>
>>-
>>
>>Execute our function in parallel.
>>
>>resp = pmap_stasks(h, (offset, ntasks) -> begin
>># get local refernces to shared memory mapped arrays
>>svar1 = task_local_storage(:svar1)
>>svar2 = task_local_storage(:svar2)
>>
>>mypid = myid()
>>for x in offset:ntasks:64*1024
>>svar1[x] = mypid
>>end
>>
>>true
>>
>>end,
>>
>>offset_list, ntasks_list)
>>
>>-
>>
>>Access shared memory segments and view changes
>>
>> svar1 = task_local_storage(:svar1)
>> println(svar1)
>>
>> svar1 will print the values as updated by the Server Tasks.
>>
>>- Finally stop the tasks
>>
>> stop_stasks(h, shmcfg)
>>
>> This causes all tasks to be stopped and the shared memory unmapped.
>>
>>
>>


[julia-users] question on map

2014-02-13 Thread Amit Murthy
julia> tl = [(1,2), ("a", "b")]
2-element Array{(Any,Any),1}:
 (1,2)
 ("a","b")


julia> map(first, tl)
ERROR: no method convert(Type{Int64}, ASCIIString)
 in setindex! at array.jl:338
 in map_to! at abstractarray.jl:1353
 in map at abstractarray.jl:1362



Is this by design? One would expect the above map to just return an Array 
of Any, as does

julia> [x[1] for x in tl]

2-element Array{Any,1}:
 1   
  "a"



which(map, first, tl) lists:
map(f::Union(Function,DataType),A::AbstractArray{T,N}) at abstractarray.jl:1359


which creates an array similar to the first element.





Re: [julia-users] Re: Parallel sparse matrix vector multiplication

2014-02-14 Thread Amit Murthy
The `using ParallelSparseMatMul` must be after any `addprocs` statements.
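
That is, a minimal sketch of the ordering (the later `using` then also loads
the package on the workers):

addprocs(4)                   # add the workers first
using ParallelSparseMatMul    # now the package is available on every process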


On Fri, Feb 14, 2014 at 2:21 PM, Jon Norberg wrote:

> Amazing, just what I was looking for.
>
> However :-/ I did exactly s your read me, installed, and using exactly
> your example I get:
>
> julia> y = S*x
>
> fatal error on 2: ERROR: ParallelSparseMatMul not defined
>
> Worker 2 terminated.
>
> ProcessExitedException()
>
>
>
> is it enough to just write
>
> using ParallelSparseMatMul
>
>
> to have access to it? Seems julia can't find some function...
>
>
> Any help?
>
>
> Many thanks
>
>
> On Friday, February 14, 2014 2:31:08 AM UTC+1, Madeleine Udell wrote:
>
>> Thanks, Jiahao! It looks like you've already made a great dent in the
>> iterative solvers wishlist. I'm planning on using the
>> ParallelSparseMatMul library along with some iterative solvers
>> (possibly just LSQR) to implement iterative solvers for nonnegative
>> least squares, lasso, elastic net, etc using ADMM. It would be nice to
>> ensure that eg shared arrays stay shared in IterativeSolvers to ensure
>> it works well with parallel matrix multiplication.
>>
>> On Thu, Feb 13, 2014 at 5:05 PM, Jiahao Chen  wrote:
>> > Fantastic work!
>> >
>> > I've been meaning to get back to work on IterativeSolvers...
>> >
>> > Thanks,
>> >
>> > Jiahao Chen, PhD
>> > Staff Research Scientist
>> > MIT Computer Science and Artificial Intelligence Laboratory
>>
>>
>>
>> --
>> Madeleine Udell
>> PhD Candidate in Computational and Mathematical Engineering
>> Stanford University
>> www.stanford.edu/~udell
>>
>


Re: [julia-users] Re: broadcast object to all processes

2014-02-20 Thread Amit Murthy
Since he needs to simultaneously compute b, how about using a DArray just
to store the local copies? Something like:

d = DArray(init_b_func, (nworkers(),), workers()), where init_b_func(indexes)
initializes and returns a single object of TypeB. The length of the DArray is
the same as the number of workers.

This is probably not the most canonical use of a DArray, but hey,
references to all instances of b on all processes are kept in the main
process in a single object.
Retrieving b on each of the workers is via localpart(d) .
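
A minimal sketch of this, with TypeB standing in for the (hypothetical)
expensive-to-build object:

@everywhere function init_b_func(I)
    # length-1 localpart holding this worker's copy of b
    [TypeB()]
end

d = DArray(init_b_func, (nworkers(),), workers())

# on a worker, retrieve the local copy with:
# b = localpart(d)[1]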




On Fri, Feb 21, 2014 at 8:36 AM, Tim Holy  wrote:

> julia> wpid = addprocs(1)[1]
> 2
>
> julia> rr = RemoteRef(wpid)
> RemoteRef(2,1,5)
>
> julia> put!(rr, "Hello")
> RemoteRef(2,1,5)
>
> julia> fetch(rr)
> "Hello"
>
> --Tim
>
> On Thursday, February 20, 2014 05:20:49 PM Peter Simon wrote:
> > I'm finding myself in need of similar functionality lately.  Has any
> > progress been made on this front?  In my case, since the large object b
> > needs to be computed at run-time, I would prefer to simultaneously
> compute
> > it on all workers at the beginning, then have these copies stick around
> for
> > later reuse.  In the Matlab version of the code I'm porting to Julia, I
> do
> > this with persistent variables.
> >
> > Thanks,
> > --Peter
> >
> > On Thursday, December 13, 2012 2:04:12 AM UTC-8, Viral Shah wrote:
> > > Need to wrap up remote_call / remote_call_fetch in a few higher level
> > > functions for such things. I'm going to get cracking on improving our
> > > parallel support soon.
> > >
> > > -viral
> > >
> > > On Thursday, December 13, 2012 12:29:58 PM UTC+5:30, Miles Lubin wrote:
> > >> Seemingly simple question that I haven't managed to figure out after
> > >> reading the documentation and playing around: How can you broadcast an
> > >> object from one process (say the main process) to all running
> processes?
> > >> I
> > >> come from an MPI background where this is a fundamental operation.
> > >>
> > >> To give an example, say I have a function f(a,b), where b is some
> large
> > >> 100MB+ dataset/matrix/object, and I want to compute f(a,b) for a in
> some
> > >> range and b fixed. It doesn't make sense to send a new copy of b with
> > >> each
> > >> call. Instead I'd like to broadcast b to each process and keep a
> > >> persistent
> > >> copy in each process to use during the pmap. What's the best and
> > >> prettiest
> > >> way to do this?
> > >>
> > >> Thanks,
> > >> Miles
>


Re: [julia-users] Re: broadcast object to all processes

2014-02-20 Thread Amit Murthy
Seems OK. As for the global, you could use this tip from the performance
section at -
http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables:

Uses of non-constant globals can be optimized by annotating their types at
the point of use:

global x
y = f(x::Int + 1)



On Fri, Feb 21, 2014 at 11:34 AM, Peter Simon  wrote:

> Thanks for the ideas, Tim and Amit.
> I was playing around with some simple ideas, and the following seems to
> work...
>
> # The file Pmodule.jl contains
>
> module Pmodule
> export make_b, use_b
> function make_b(n::Int)
> global b
> b = rand(n,n);
> end
> function use_b(x::Float64)
> global b
> x + b
> end
> end
>
> Here is an interactive Julia session:
>
> julia> pid = addprocs(2)
> 2-element Array{Any,1}:
>  2
>  3
>
> julia> @everywhere using Pmodule
>
> julia> @everywhere make_b(3)
>
> julia> remotecall_fetch(pid[1], use_b, 10.0)
> 3x3 Array{Float64,2}:
>  10.1983  10.3043  10.5157
>  10.3442  10.9273  10.3606
>  10.0316  10.8404  10.8781
>
> julia> remotecall_fetch(pid[2], use_b, 4.0)
> 3x3 Array{Float64,2}:
>  4.57154  4.06708  4.07531
>  4.47332  4.42889  4.07916
>  4.83437  4.06286  4.19575
>
> julia> use_b(6.0)
> 3x3 Array{Float64,2}:
>  6.59869  6.75167  6.74789
>  6.99774  6.22108  6.54333
>  6.85287  6.7234   6.00127
>
> Is this a reasonable way to do it?  Will it be inefficient because I'm
> using a global variable?
>
> Thanks,
> Peter
>
>
> On Thursday, February 20, 2014 7:20:34 PM UTC-8, Amit Murthy wrote:
>
>> Since he needs to simultaneously compute b, how about using a DArray just
>> to store the local copies. Something like:
>>
>> d = DArray(init_b_func, (nworkers(), ), workers()) where
>> init_b_func(indexes) returns initializes and returns a single object of
>> TypeB . The length of the DArray is the same as the number of workers.
>>
>> This is probably not the most canonical use of a DArray, but hey,
>> references to all instances of b on all processes are kept in the main
>> process in a single object.
>> Retrieving b on each of the workers is via localpart(d) .
>>
>>
>>
>>
>> On Fri, Feb 21, 2014 at 8:36 AM, Tim Holy  wrote:
>>
>>> julia> wpid = addprocs(1)[1]
>>> 2
>>>
>>> julia> rr = RemoteRef(wpid)
>>> RemoteRef(2,1,5)
>>>
>>> julia> put!(rr, "Hello")
>>> RemoteRef(2,1,5)
>>>
>>> julia> fetch(rr)
>>> "Hello"
>>>
>>> --Tim
>>>
>>> On Thursday, February 20, 2014 05:20:49 PM Peter Simon wrote:
>>> > I'm finding myself in need of similar functionality lately.  Has any
>>> > progress been made on this front?  In my case, since the large object b
>>> > needs to be computed at run-time, I would prefer to simultaneously
>>> compute
>>> > it on all workers at the beginning, then have these copies stick
>>> around for
>>> > later reuse.  In the Matlab version of the code I'm porting to Julia,
>>> I do
>>> > this with persistent variables.
>>> >
>>> > Thanks,
>>> > --Peter
>>> >
>>> > On Thursday, December 13, 2012 2:04:12 AM UTC-8, Viral Shah wrote:
>>> > > Need to wrap up remote_call / remote_call_fetch in a few higher level
>>> > > functions for such things. I'm going to get cracking on improving our
>>> > > parallel support soon.
>>> > >
>>> > > -viral
>>> > >
>>> > > On Thursday, December 13, 2012 12:29:58 PM UTC+5:30, Miles Lubin
>>> wrote:
>>> > >> Seemingly simple question that I haven't managed to figure out after
>>> > >> reading the documentation and playing around: How can you broadcast
>>> an
>>> > >> object from one process (say the main process) to all running
>>> processes?
>>> > >> I
>>> > >> come from an MPI background where this is a fundamental operation.
>>> > >>
>>> > >> To give an example, say I have a function f(a,b), where b is some
>>> large
>>> > >> 100MB+ dataset/matrix/object, and I want to compute f(a,b) for a in
>>> some
>>> > >> range and b fixed. It doesn't make sense to send a new copy of b
>>> with
>>> > >> each
>>> > >> call. Instead I'd like to broadcast b to each process and keep a
>>> > >> persistent
>>> > >> copy in each process to use during the pmap. What's the best and
>>> > >> prettiest
>>> > >> way to do this?
>>> > >>
>>> > >> Thanks,
>>> > >> Miles
>>>
>>
>>


Re: [julia-users] Exclude master node in pmap computation

2014-02-23 Thread Amit Murthy
Is it possible to share the relevant portions of the call here?


On Sun, Feb 23, 2014 at 11:44 AM, Micah McClimans
wrote:

> Thank you, it turns out my problem was coming from an @everywhere macro,
> not from pmap.
>
> However, and I hope it is not bad practice continuing in this same thread,
> but now I'm seeing that pmap is not utilizing all of the workers available
> for the process, in fact it is using only one, despite having 8 local and 8
> remote workers available. What sort of problems could be causing this
> behavior?
>
>
> On Saturday, February 22, 2014 6:32:18 PM UTC-5, Stefan Karpinski wrote:
>
>> If there are other processors, pmap doesn't use the head node by default:
>>
>> julia> addprocs(2)
>> 2-element Array{Any,1}:
>>  2
>>  3
>>
>> julia> pmap(x->myid(), 1:10)
>> 10-element Array{Any,1}:
>>  2
>>  3
>>  3
>>  2
>>  2
>>  3
>>  2
>>  3
>>  2
>>  3
>>
>>
>> On Sat, Feb 22, 2014 at 5:50 PM, Micah McClimans wrote:
>>
>>> I am working on distributing a compute intensive task over a cluster in
>>> Julia, using the pmap function. However, for several reasons I would like
>>> to avoid having the master node used in the computation- is there a way to
>>> accomplish this using the built in keyword, or will I need to rewrite pmap?
>>>
>>
>>


Re: [julia-users] Exclude master node in pmap computation

2014-02-23 Thread Amit Murthy
Is it possible that any one of capitals, {marketHistory for i in 1:trials},
depths, ALRs, {vec(variances) for i in 1:trials},
{weights[:,i] for i in 1:trials}, sigComponents is of length 1?

Testing with "-p 3" and a busywait function:

@everywhere function bw(s...)
    t1 = time()
    while (time() - t1) < s[1]
    end
end

pmap(bw, [10,20,30], [4,5,6]) results in 3 workers first taking up 100% of
3 cores, then 2 and finally 1 for the final 10 seconds.

pmap(bw, [10,20,30], [4]) will result in just one worker taking up one core
for 10 seconds.
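One quick check is to print the length of every collection being passed, since pmap stops once the shortest one is exhausted. A sketch, using the names from the call quoted below (to be placed in your script just before the pmap call):

args = (capitals, {marketHistory for i in 1:trials}, depths, ALRs,
        {vec(variances) for i in 1:trials}, {weights[:,i] for i in 1:trials},
        sigComponents)
println(map(length, args))   # every entry should equal `trials`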







On Sun, Feb 23, 2014 at 3:25 PM, Micah McClimans wrote:

> Sure.
>
> Invoked with julia -p 8
>
> fid=open("/home/main/data/juliafiles/Julia/machinefile.txt")
>
> rc=readlines(fid)
>
> m={match(r"(\r|\n)",rcd) for rcd in rc}
>
> machines={rc[ma][1:(m[ma].offset-1)] for ma in 1:length(m)}
> ...
> addprocs(machines;dir="/home/main/data/programfiles/julia/usr/bin")
> ...
> @everywhere progfile="/home/main/data/juliafiles/Julia/WLPPInt.jl"
> @everywhere marketHistoryFile="/home/main/data/juliafiles/Julia/daily.mat"
> @everywhere using MAT
> @everywhere include(progfile)
> @everywhere Daily = matread(marketHistoryFile)
> @everywhere marketHistory = NYSE["NYSE_Smoothed_Closes"]
> ...
> results=pmap(runWLPPIntTest,capitals,{marketHistory for i in 
> 1:trials},depths,ALRs,{vec(variances) for i in 1:trials},{weights[:,i] for i 
> in 1:trials},sigComponents)
>
> I'm not really sure if this is enough to be useful though, or what really 
> would be able to be useful.
>
>
>
> On Sunday, February 23, 2014 3:58:04 AM UTC-5, Amit Murthy wrote:
>
>> Is it possible to share the relevant portions of the call here?
>>
>>
>> On Sun, Feb 23, 2014 at 11:44 AM, Micah McClimans wrote:
>>
>>> Thank you, it turns out my problem was coming from an @everywhere macro,
>>> not from pmap.
>>>
>>> However, and I hope it is not bad practice continuing in this same
>>> thread, but now I'm seeing that pmap is not utilizing all of the workers
>>> available for the process, in fact it is using only one, despite having 8
>>> local and 8 remote workers available. What sort of problems could be
>>> causing this behavior?
>>>
>>>
>>> On Saturday, February 22, 2014 6:32:18 PM UTC-5, Stefan Karpinski wrote:
>>>
>>>> If there are other processors, pmap doesn't use the head node by
>>>> default:
>>>>
>>>> julia> addprocs(2)
>>>> 2-element Array{Any,1}:
>>>>  2
>>>>  3
>>>>
>>>> julia> pmap(x->myid(), 1:10)
>>>> 10-element Array{Any,1}:
>>>>  2
>>>>  3
>>>>  3
>>>>  2
>>>>  2
>>>>  3
>>>>  2
>>>>  3
>>>>  2
>>>>  3
>>>>
>>>>
>>>> On Sat, Feb 22, 2014 at 5:50 PM, Micah McClimans 
>>>> wrote:
>>>>
>>>>> I am working on distributing a compute intensive task over a cluster
>>>>> in Julia, using the pmap function. However, for several reasons I would
>>>>> like to avoid having the master node used in the computation- is there a
>>>>> way to accomplish this using the built in keyword, or will I need to
>>>>> rewrite pmap?
>>>>>
>>>>
>>>>
>>


Re: [julia-users] ProtoBuf library for Julia

2014-02-23 Thread Amit Murthy
You may want to check out Tanmay's https://github.com/tanmaykm/Protobuf.jl


On Sun, Feb 23, 2014 at 8:51 PM, Uwe Fechner wrote:

> Hello,
>
> our control and simulation software is highly modular, and we use google
> protocol buffer
> encoded messages over ZeroMQ sockets to communicate.
>
> What is the best approach to send and receive protobuf encoded messages:
>
> a) using one of the C bindings; there are actually three:
> - http://spbc.sourceforge.net/
> - http://code.google.com/p/protobuf-c/
> - http://koti.kapsi.fi/jpa/nanopb/
>
> b) using the Python library:
> https://developers.google.com/protocol-buffers/docs/pythontutorial
> This one is officially supported by google in contrast to the C
> bindings
>
> c) writing a full protobuf compiler, coder and decoder in Julia directly
>
> Any comments welcome.
>
> Uwe Fechner
>


Re: [julia-users] pmap and top level variables

2014-02-24 Thread Amit Murthy
The below should work:

for pid in workers()
    remotecall(pid, x->(global a; a=x; nothing), a)
end
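If you need this for more than one variable, the same idea can be wrapped in a small helper. A sketch only; `sendto` is a made-up name, not a Base function:

# assign `val` to a global named `nm` in Main on each worker
function sendto(nm::Symbol, val; pids=workers())
    for p in pids
        # remotecall_wait so we know the copy is in place before returning
        remotecall_wait(p, (n, v)->(eval(Main, Expr(:(=), n, v)); nothing), nm, val)
    end
end

sendto(:a, rand(1000, 1000))    # afterwards `a` is defined on every worker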






On Mon, Feb 24, 2014 at 3:23 PM, David van Leeuwen <
david.vanleeu...@gmail.com> wrote:

> Thanks,
>
> is there a way to distribute a top level variable?  E.g., suppose "a" is
> actually a large complex data structure, can I send it to the workers
> somehow, like
>
> @everywhere a
>
> Within the function this a seems to be automatically distributed.
>
> ---david
>
> On Monday, February 24, 2014 12:12:23 AM UTC+1, Jameson wrote:
>
>> Yes. It is expected that you have taken care of distributing globals
>> (values/functions/constants) as needed to the processors.
>> `@everywhere a = 1`  is a convenient way to handle this.
>>
>> -Jameson
>>
>>
>> On Sun, Feb 23, 2014 at 6:05 PM, David van Leeuwen > > wrote:
>>
>>> Hello,
>>>
>>> I noticed that directly from the REPL the scope for pmap() is different
>>> than from within a function.
>>>
>>> The following gives a "a unknown" error in the worker processes
>>>
>>> addprocs(2)
>>> a=1
>>>
>>> pmap(x->a+x, 1:2)
>>>
>>>
>>> However, if this is wrapped in a function, everything works as expected:
>>>
>>> f(a,r) = pmap(x->a+x, r)
>>> f(1, 1:2)
>>>
>>> Is this intended behaviour?
>>> ---david
>>>
>>
>>


[julia-users] Command interpolation question

2014-03-02 Thread Amit Murthy
How can I prevent the cartesian product from being done below?

When exeflags = `1 2 3`

julia> `"julia $exeflags"` 

is 

`'julia 1' 'julia 2' 'julia 3'`

What I want is 

`"julia 1 2 3"`

exeflags has to be a command object and "julia $exeflags" has to exist 
within double quotes. 



Re: [julia-users] Re: Command interpolation question

2014-03-02 Thread Amit Murthy
It has to.

It is actually this line -
https://github.com/JuliaLang/julia/blob/master/base/multi.jl#L1139


On Mon, Mar 3, 2014 at 12:45 PM, Ismael VC  wrote:

> Don't surround the expression in quotes to get what you want:
>
> julia> exeflags = `1 2 3`
> `1 2 3`
>
> julia> `"julia $exeflags"`
> `'julia 1' 'julia 2' 'julia 3'`
>
> julia> `julia $exeflags`
> `julia 1 2 3`
>
>
>
>
>


Re: [julia-users] Re: Command interpolation question

2014-03-02 Thread Amit Murthy
I would prefer not to quote exeflags, since exeflags itself may be built up
at different places using

exeflags = `$exeflags some_other_flag`

For now, I am just working around this, so I am good.

Thanks.



On Mon, Mar 3, 2014 at 12:59 PM, Ismael VC  wrote:

> julia> dump(`"julia $exeflags"`)
> Cmd
>   exec: Array(Union(ASCIIString,UTF8String),(1,))
> Union(ASCIIString,UTF8String)["julia 1 2 3"]
>   ignorestatus: Bool false
>   detach: Bool false
>   env: Nothing nothing
>   dir: UTF8String ""
>
>
>


Re: [julia-users] Re: Command interpolation question

2014-03-03 Thread Amit Murthy
Cool. Thanks.



On Mon, Mar 3, 2014 at 2:35 PM, Ismael VC  wrote:

> julia> qcmd(`julia $exeflags`)
> `'julia 1 2 3'`
>
> El lunes, 3 de marzo de 2014 02:58:34 UTC-6, Ismael VC escribió:
>
>> This is all I could come up with:
>>
>> julia> exeflags = `1 2 3`;
>>
>> julia> qcmd(c::Cmd) = (s::ASCIIString = string(c)[2:end-1]; `$s`) # Quote
>> command.
>> qcmd (generic function with 1 method)
>>
>> julia> `"julia $(qcmd(exeflags))"`
>> `'julia 1 2 3'`
>>
>>
>> :-)
>>
>


Re: [julia-users] Preferred way of slicing an array into DArray

2014-03-04 Thread Amit Murthy
It is using the default distribution scheme that splits along the largest
dimension first.

For now, you could just define your own distribute function (a slight
modification of the distribute function in base/multi.jl)

function distribute(a::AbstractArray, pids, dist)
    owner = myid()
    rr = RemoteRef()
    put!(rr, a)
    DArray(size(a), pids, dist) do I
        remotecall_fetch(owner, ()->fetch(rr)[I...])
    end
end

where pids is a vector of worker ids
and
dist is a vector specifying the number of parts to split in each dimension.
In the example presented by you above it could be [1,3]. Note that in this
case the array will
only be distributed among 3 workers and not 4.

The call becomes
a_d = distribute(a, workers(), [1,3])






On Wed, Mar 5, 2014 at 4:15 AM, Daniel H  wrote:

> Hi all,
>
> I would like to know if there's any ways to distribute an array the way I
> want it.
> For example,
>
> if I have I have 4 workers and an array, a :
> julia> a = rand(4,3)
> and I want to give each worker a row to work on, how can I distribute the
> array?
>
> Julia has distribute function to slice array into DArray, but it doesn't
> slice it the way I want it.
> julia> a_d = distribute(a)
>
> julia> a_d.indexes
> 2x2 Array{(Range1{Int64},Range1{Int64}),2}:
>  (1:2,1:2)  (1:2,3:3)
>  (3:4,1:2)  (3:4,3:3)
> In this case, some workers get square matrix, some get column vector.
> However, I want every worker to get a row.
>
> I saw an article from Admin Magazine (
> http://www.admin-magazine.com/HPC/Articles/Julia-Distributed-Arrays)
> saying that we can do things like:
> a_d = distribute(a,1)
> but it doesn't work on my machine. Is it only supported on older
> versions of Julia? How can I accomplish the same thing on the newer version
> of Julia?
>
> I'm using v0.2.1 on Mac OS X
>
> Thanks
>
> Daniel
>
>


Re: [julia-users] Preferred way of slicing an array into DArray

2014-03-04 Thread Amit Murthy
Also note that prod(dist)  <= length(pids) must be true


On Wed, Mar 5, 2014 at 8:57 AM, Amit Murthy  wrote:

> It is using the default distribution scheme that splits along the largest
> dimension first.
>
> For now, you could just define your own distribute function (a slight
> modification of the distribute function in base/multi.jl)
>
> function distribute(a::AbstractArray, pids, dist)
> owner = myid()
> rr = RemoteRef()
> put!(rr, a)
> DArray(size(a), pids, dist) do I
> remotecall_fetch(owner, ()->fetch(rr)[I...])
> end
> end
>
> where pids is a vector of worker ids
> and
> dist is a vector specifying the number of parts to split in each
> dimension. In the example presented by you above it could be [1,3]. Note
> that in this case the array will
> only be distributed among 3 workers and not 4.
>
> The call becomes
> a_d = distribute(a, workers(), [1,3])
>
>
>
>
>
>
> On Wed, Mar 5, 2014 at 4:15 AM, Daniel H  wrote:
>
>> Hi all,
>>
>> I would like to know if there's any ways to distribute an array the way I
>> want it.
>> For example,
>>
>> if I have I have 4 workers and an array, a :
>> julia> a = rand(4,3)
>> and I want to give each worker a row to work on, how can I distribute the
>> array?
>>
>> Julia has distribute function to slice array into DArray, but it doesn't
>> slice it the way I want it.
>> julia> a_d = distribute(a)
>>
>> julia> a_d.indexes
>> 2x2 Array{(Range1{Int64},Range1{Int64}),2}:
>>  (1:2,1:2)  (1:2,3:3)
>>  (3:4,1:2)  (3:4,3:3)
>> In this case, some workers get square matrix, some get column vector.
>> However, I want every worker to get a row.
>>
>> I saw an article from Admin Magazine (
>> http://www.admin-magazine.com/HPC/Articles/Julia-Distributed-Arrays)
>> saying that we can do things like:
>> a_d = distribute(a,1)
>> but it doesn't work on mine machine. Is it only supported on older
>> versions of Julia? How can I accomplish the same thing on the newer version
>> of Julia?
>>
>> I'm using v0.2.1 on Mac OS X
>>
>> Thanks
>>
>> Daniel
>>
>>
>


Re: [julia-users] Preferred way of slicing an array into DArray

2014-03-05 Thread Amit Murthy
The `dist` argument specifies the number of parts each dimension has to be
split into.

So, try:
a_d = distribute(a, workers(), [length(workers()), 1]);

prod(dist) must be <= length(pids), so distributing one row per worker does
not make sense unless you have few rows and a huge number of columns. For a
large enough array (which is why you want to use a DArray in the first
place), the localpart will necessarily be a block of several rows unless you
have very few rows.


On Thu, Mar 6, 2014 at 12:54 AM, Daniel H  wrote:

> You have been a great help, Amit!
>
> Now I can split the matrix into many columns regardless of the shape of
> the matrix, but I also wonder if we can split it into many rows
> I know that we can take the transpose of the matrix before splitting into
> DArray and then transpose it again, but it may take a lot of time to do the
> transpose.
>
>
> Here's what my overall code looks like right now:
> 
>
> julia> addprocs(4);
>
>
> julia> function Base.distribute(a::AbstractArray, pids, dist)
>
>owner = myid()
>rr = RemoteRef()
>
>put(rr, a)
>
>DArray(size(a), pids, dist) do I
>remotecall_fetch(owner, ()->fetch(rr)[I...])
>end
>end
>
>
> julia> a = rand(4,3);
>
>
> julia> a_d = distribute(a, workers(), [1,length(workers())]);
>
> julia> a_d.indexes
> 1x4 Array{(Range1{Int64},Range1{Int64}),2}:
>  (1:4,1:1)  (1:4,2:2)  (1:4,3:3)  (1:4,4:3)
>
> I noticed that it is always split into columns
> julia> a = rand(11,3);
>
>
> julia> a_d = distribute(a, workers(), [1,length(workers())]); a_d.indexes
> 1x4 Array{(Range1{Int64},Range1{Int64}),2}:
>  (1:11,1:1)  (1:11,2:2)  (1:11,3:3)  (1:11,4:3)
>
>
> julia> a = rand(3,11);
>
>
> julia> a_d = distribute(a, workers(), [1,length(workers())]); a_d.indexes
> 1x4 Array{(Range1{Int64},Range1{Int64}),2}:
>  (1:3,1:3)  (1:3,4:6)  (1:3,7:8)  (1:3,9:11)
>
>
> julia> a = rand(11,11);
>
>
> julia> a_d = distribute(a, workers(), [1,length(workers())]); a_d.indexes
> 1x4 Array{(Range1{Int64},Range1{Int64}),2}:
>  (1:11,1:3)  (1:11,4:6)  (1:11,7:8)  (1:11,9:11)
>
> Print out each worker's data
> julia> a = rand(4,3)
> 4x3 Array{Float64,2}:
>  0.333252  0.664941  0.121419
>  0.8981    0.2404    0.843411
>  0.169211  0.166081  0.862066
>  0.822757  0.184186  0.862764
>
> julia> a_d = distribute(a, workers(), [1,length(workers())]);
>
> julia> fetch(@spawnat 2 localpart(a_d))
> 4x1 Array{Float64,2}:
>  0.333252
>  0.8981
>  0.169211
>  0.822757
>
>
> julia> fetch(@spawnat 3 localpart(a_d))
> 4x1 Array{Float64,2}:
>  0.664941
>  0.2404
>  0.166081
>  0.184186
>
>
> julia> fetch(@spawnat 4 localpart(a_d))
> 4x1 Array{Float64,2}:
>  0.121419
>  0.843411
>  0.862066
>  0.862764
>
>
> julia> fetch(@spawnat 5 localpart(a_d))
> 4x0 Array{Float64,2}
>
>
>
> Tested on Version 0.2.1
>
>
>


Re: [julia-users] Question Around Distributed Work

2014-03-19 Thread Amit Murthy
If by "julia instance" you mean a set of related julia processes, as
started using 'addprocs', then the 'pmap' command pushes tasks onto free
workers only. A single julia cluster is not aware of other clusters on the
system if that is what you mean.

Regarding the second question, say you have a DArray, then you could use

remotecall_fetch(w, D->localpart(D)[m:n], D)

to return only the range m:n from the localpart of the darray D from worker
w






On Thu, Mar 20, 2014 at 4:25 AM, Austen McRae wrote:

> Hey All,
>
> I've read the docs, and browsed some of the sources, but wanted to ask a
> few questions to clarify my understanding regarding distributed work.
>
> My first question regards the SSHClusterManager, or any of the other
> cluster managers: is there a way to cap the number of available processes a
> user can use on a machine across multiple julia instances?  The similar
> solution is how Hadoop will assign slots on machines; tasks (maps or
> reduces) will wait until a slot becomes free if its being used.   The tools
> provided out of the box are already extremely strong, and wanted to check
> to see if there is something that will keep boxes from getting overloaded
> built in (the tricky part I would assume is the  "multiple julia instances"
> - in either case want to know before I tried to implement something!)
>
> The second is related to both the distributed array as well as the fetch
> command.  I'll be honest, I haven't looked at the source for this, but it
> seems like a fetch command pulls the entire result back at the time (I'm
> assuming the distributed array handles things the same, corrections
> welcome).  Is there a way to control the size of the return?
>
> To give the context, assume I'm doing a distributed sort across many
> machines, where the total list size doesn't fit into memory on a single
> machine.   The ideal solution is to have each machine/process sort a
> subset, and then send a partial sorted component back at each machine, and
> when that partial is fully merged, fetch the next partial from that
> machine, until done.
>
> In both cases, I don't doubt that it can't be done, but since I haven't
> dived deep yet wanted to make sure I'm not missing something thats already
> there (and if it is, feel free to berate me with where to look)!
>


Re: [julia-users] Question Around Distributed Work

2014-03-19 Thread Amit Murthy
Sorry, that should be

remotecall_fetch(w, (D, m, n)->localpart(D)[m:n], D, m, n)
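For the distributed-sort scenario, that call can go inside a loop that pulls one chunk at a time from each worker. A rough sketch (the chunk size and the merge step are placeholders; w and D are the worker id and DArray from above):

chunk = 10^5
len = remotecall_fetch(w, D->length(localpart(D)), D)
for m in 1:chunk:len
    n = min(m + chunk - 1, len)
    part = remotecall_fetch(w, (D, m, n)->localpart(D)[m:n], D, m, n)
    # ... merge `part` into the partially sorted result here ...
end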


On Thu, Mar 20, 2014 at 9:26 AM, Amit Murthy  wrote:

> If by "julia instance" you mean a set of related julia processes, as
> started using 'addprocs', them the 'pmap' command pushes tasks onto free
> workers only. A single julia cluster is not aware of other clusters on the
> system if that is what you mean.
>
> Regarding the second question, say you have a DArray, then you could use
>
> remotecall_fetch(w, D->localpart(D)[m:n], D)
>
> to return only the range m:n from the localpart of the darray D from
> worker w
>
>
>
>
>
>
> On Thu, Mar 20, 2014 at 4:25 AM, Austen McRae wrote:
>
>> Hey All,
>>
>> I've read the docs, and browsed some of the sources, but wanted to ask a
>> few questions to clarify my understanding regarding distributed work.
>>
>> My first question regards the SSHClusterManager, or any of the other
>> cluster managers: is there a way to cap the number of available processes a
>> user can use on a machine across multiple julia instances?  The similar
>> solution is how Hadoop will assign slots on machines; tasks (maps or
>> reduces) will wait until a slot becomes free if its being used.   The tools
>> provided out of the box are already extremely strong, and wanted to check
>> to see if there is something that will keep boxes from getting overloaded
>> built in (the tricky part I would assume is the  "multiple julia instances"
>> - in either case want to know before I tried to implement something!)
>>
>> The second is related to both the distributed array as well as the fetch
>> command.  I'll be honest, I haven't looked at the source for this, but it
>> seems like a fetch command pulls the entire result back at the time (I'm
>> assuming the distributed array handles things the same, corrections
>> welcome).  Is there a way to control the size of the return?
>>
>> To give the context, assume I'm doing a distributed sort across many
>> machines, where the total list size doesn't fit into memory on a single
>> machine.   The ideal solution is to have each machine/process sort a
>> subset, and then send a partial sorted component back at each machine, and
>> when that partial is fully merged, fetch the next partial from that
>> machine, until done.
>>
>> In both cases, I don't doubt that it can't be done, but since I haven't
>> dived deep yet wanted to make sure I'm not missing something thats already
>> there (and if it is, feel free to berate me with where to look)!
>>
>
>


Re: [julia-users] isready() blocking when process is busy

2014-03-20 Thread Amit Murthy
Hi,

Yes, since Julia implements co-operative multitasking, a process cannot
respond to any other requests while it is CPU bound.

The wait functions currently do not have a timeout, though there has been
some discussion on GitHub about adding one.

I have added a  few more inter-task communication mechanisms at
https://github.com/amitmurthy/MessageUtils.jl . Do check it out.
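As Valentin notes below, adding an occasional yield() to the CPU-bound loop is the usual workaround, e.g.:

@everywhere function test()
    i = 0
    while true
        i += 1
        # give the scheduler a chance to service isready/fetch requests
        i % 100000 == 0 && yield()
    end
    return i
end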


On Fri, Mar 21, 2014 at 2:51 AM, Valentin Churavy wrote:

> So isready() seems to block if the process the RemoteRef is on is busy.
>
> On the REPL
>
> addprocs(1)
> @everywhere function test()
> i = 0
> while true
> i += 1
> end
> return i
> end
>
>
> rref = @spawn test()
>
>
> isready(rref) # <- blocking
>
> What I would have expected is that it is evaluating to false (maybe after
> a timeout). Adding a yield() to the while loop allows the call to go
> through and properly evaluate to false.
>
> The reason I stumbled across the was my task management code  contains the
> following snippet.
>
>
> ...
> timedwait(() -> anyready(working_on), 120.0, pollint=0.5)
> ...
>
> function anyready(working_on)
> for rref in values(working_on)
> if isready(rref)
> return true
> end
> end
> return isempty(working_on)
> end
>
> But even adding the occasional yield to the computation job creates a
> situation when it sometimes get stuck and sometimes it doesn't.
>
> Best Valentin
>
>
>
>
>
>


Re: [julia-users] Re: Trying to understand how to structure parallel computation

2014-03-23 Thread Amit Murthy
I have not looked at your code, but I can suggest that you model your code
in the following manner (a rough skeleton follows the list):

- process 1 is the "driver" of the parrallel computations - it does not do
any computations itself.
- process 2 runs the "main" loop
- process 3 runs subproblems
- process 1 after starting the loop on process 2, waits on a remoteref (on
pid 1) that is set by process 2 after all computations are complete
- all remote refs that track task completion are only on process 1
- Now, when process 2 does an "isready", since the remoteref being tested
is on process 1, it will not block.
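A rough skeleton of that arrangement (the placeholder functions stand in for Iain's real main loop and subproblems; assumes at least two workers, pids 2 and 3):

# trivial placeholders for the real work
@everywhere expensive_subproblem(i) = (sleep(2); i*i)
@everywhere improve_solution(x) = println("improving solution using ", x)
@everywhere run_subproblem(i, rr) = put!(rr, expensive_subproblem(i))

@everywhere function main_loop(done)
    pending = RemoteRef[]                 # result refs, all stored on pid 1
    for iter = 1:20
        sleep(0.5)                        # placeholder for one main-loop iteration
        if iter % 5 == 0                  # occasionally hand a subproblem to pid 3
            rr = RemoteRef(1)             # the ref's store lives on pid 1
            remotecall(3, run_subproblem, iter, rr)
            push!(pending, rr)
        end
        # the refs live on pid 1, which is idle, so isready returns promptly
        i = 1
        while i <= length(pending)
            isready(pending[i]) ? improve_solution(fetch(splice!(pending, i))) : (i += 1)
        end
    end
    put!(done, :finished)
end

done = RemoteRef(1)                       # completion flag, also on pid 1
remotecall(2, main_loop, done)
wait(done)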







On Mon, Mar 24, 2014 at 4:14 AM, Iain Dunning  wrote:

> (talking to myself) isready seems to block, is that expected?
>
> Here is my code
> https://gist.github.com/IainNZ/9730991
>
> Is there a nonblocking isready or am I barking up the wrong tree?
>
>
>
>
> On Sunday, March 23, 2014 6:10:59 PM UTC-4, Iain Dunning wrote:
>>
>> Digging more into the "standard library" part of the manual, is this a
>> matter of using "isready" at the end of my main loop, which I know will be
>> safe because no one else is going to be looking at that RemoteRef?
>>
>> On Sunday, March 23, 2014 5:58:27 PM UTC-4, Iain Dunning wrote:
>>>
>>> Hi all,
>>>
>>> I've never really used the parallel stuff for a "real" task and I'm
>>> trying to understand the Julian way of structuring my computation.
>>>
>>> Heres the situation:
>>>
>>> - I have a "main" loop that is solving a series of problems
>>> - After solving one of these prolems, I sometimes want to solve an
>>> expensive subproblem that might improve the solution.
>>> - I want to solve this subproblem in a separate process, and I don't
>>> need the answer right away.
>>>
>>> Mentally I'm thinking of processor 1 running the main loop, and
>>> processor 2 working on solving any subproblems I send its way (queueing
>>> them up perhaps).
>>> At the end/start of every iteration of the main loop on processor 1 I'd
>>> "check" processor 2 to see if it has any solutions for me, and collect them
>>> if it has.
>>>
>>> Can someone help me out with how I should be thinking of this?
>>>
>>> Cheers,
>>> Iain
>>>
>>>
>>>
>>>


Re: [julia-users] Re: Terminate a task

2014-03-23 Thread Amit Murthy
I think currently the only way to interrupt a task is when it is blocked on
a condition variable by using the "notify" call. Will be good to have a
"terminate(t::Task)" call.


On Sun, Mar 23, 2014 at 9:08 PM, Bob Cowdery  wrote:

> Could you clarify please. If its on a blocking call how do I throw an
> error and if its not complete why would the gc delete it.
>
>
> On Sunday, March 23, 2014 3:27:18 PM UTC, Bob Cowdery wrote:
>>
>> Is there any way to terminate a task that is stuck on a blocking call. I
>> can see that I can call istaskdone() or start a Timeout to know that the
>> task is potentially blocked but I don't see anything in Task that lets me
>> terminate it.
>>
>> Bob
>>
>


Re: [julia-users] worker/proc nomenclature

2014-03-26 Thread Amit Murthy
In a parallel environment, the thinking is that all computations are
performed in "worker" processes, with pid 1, usually being the driver
process that does not do any actual computations itself.

processors should probably be changed to processes to maintain consistency.


On Wed, Mar 26, 2014 at 5:24 PM, Ben Arthur  wrote:

> if "procs() Returns a list of all process identifiers."
>
> and "workers() Returns a list of all worker process identifiers."
>
> then why do we "rmprocs(pids...) Removes the specified workers." instead
> of rmworkers()? particularly since
>
> julia> rmprocs(1)
> WARNING: rmprocs: process 1 not removed
> :ok
>
> similarly, why do we "addprocs() Add processes on remote machines" and not
> addworkers()?
>
> the above terminology seems a bit odd in a language which puts so much
> emphasis on expressiveness.
>
>
> note that there is what i believe to be a typo in the current
> documentation.  these both should refer to *processes* i believe, not
> processors.
>
> "nprocs() Get the number of available processors."
> "nworkers() Get the number of available worker processors."
>
>


Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
I think the code does not do what you want.

In the non-shared case you are sending a 10^6 integer array over the
network 1000 times and summing it as many times. Most of the time is the
network traffic time. Reduce 'n' to say 10, and you will see what I mean.

In the shared case you are not sending the array over the network but still
summing the entire array 1000 times. Some of the remotecall_fetch calls
seem to be taking 40 milliseconds of extra time, which adds to the total.

shared time of 6 seconds being less than the 15 seconds for non-shared
seems to be just incidental.

I don't yet have an explanation for the extra 40 milliseconds per
remotecall_fetch (for some calls only) in the shared case.






On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg wrote:

> Hi,
> I'm having some trouble figuring out exactly how I'm supposed to use
> SharedArrays - I might just be misunderstanding them or else something
> odd is happening with them.
>
> I'm trying to do some parallel computing which looks a bit like this
> test case:
>
> function createdata(shared)
> const n = 1000
> if shared
> A = SharedArray(Uint, (n, n))
> else
> A = Array(Uint, (n, n))
> end
> for i = 1:n, j = 1:n
> A[i, j] = rand(Uint)
> end
>
> return n, A
> end
>
> function mainfunction(r; shared = false)
> n, A = createdata(shared)
>
> i = 1
> nextidx() = (idx = i; i += 1; idx)
>
> @sync begin
> for p in workers()
> @async begin
> while true
> idx = nextidx()
> if idx > r
> break
> end
> found, s = remotecall_fetch(p, parfunction, n, A)
> end
> end
> end
> end
> end
>
> function parfunction(n::Int, A::Array{Uint, 2})
> # possibly do some other computation here independent of shared
> arrays
> s = sum(A)
> return false, s
> end
>
> function parfunction(n::Int, A::SharedArray{Uint, 2})
> s = sum(A)
> return false, s
> end
>
> If I then start julia with e.g. two worker processes, so julia -p 2, the
> following happens:
>
> julia> require("testpar.jl")
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>
> julia> rmprocs([2, 3])
> :ok
>
> julia> @time mainfunction(1000, shared = false)
> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>
> julia> @time mainfunction(1000, shared = true)
> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>
> So, with a normal array it's slow as expected, and it is faster with the
> shared array, but what seems to happen is that with the normal array cpu
> usage is 100 % on two cores but with the shared array cpu usage spikes
> for a fraction of a second and then for the remaining nearly 6 seconds
> it's at around 10 %. Can anyone reproduce this? Am I just doing
> something wrong with shared arrays.
>
> Slightly related note: is there now a way to create a random shared
> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
> don't mention this.
>


Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
Some more weirdness

Starting with julia -p 8

A=Base.shmem_fill(1, (1000,1000))

Using 2 workers:
for i in 1:100
 t1 = time(); p=2+(i%2); remotecall_fetch(p, x->1, A); t2=time();
println("@ $p ", int((t2-t1) * 1000))
end

prints

...
@ 3 8
@ 2 32
@ 3 8
@ 2 32
@ 3 8
@ 2 32
@ 3 8
@ 2 32


Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8



With 4 workers:

for i in 1:100
 t1 = time(); p=2+(i%4); remotecall_fetch(p, x->1, A); t2=time();
println("@ $p ", int((t2-t1) * 1000))
end

...
@ 2 31
@ 3 4
@ 4 4
@ 5 1
@ 2 31
@ 3 4
@ 4 4
@ 5 1
@ 2 31
@ 3 4
@ 4 4
@ 5 1
@ 2 31


Now pid 2 always takes 31 millisecs, pids 3 & 4 take 4, and pid 5 takes 1 millisecond

With 8 workers:

for i in 1:100
 t1 = time(); p=2+(i%8); remotecall_fetch(p, x->1, A); t2=time();
println("@ $p ", int((t2-t1) * 1000))
end


@ 2 20
@ 3 4
@ 4 1
@ 5 3
@ 6 4
@ 7 1
@ 8 2
@ 9 4
@ 2 20
@ 3 4
@ 4 1
@ 5 3
@ 6 4
@ 7 1
@ 8 2
@ 9 4
@ 2 20
@ 3 4
@ 4 1
@ 5 3
@ 6 4
@ 7 1
@ 8 3
@ 9 4
@ 2 20
@ 3 4
@ 4 1
@ 5 3
@ 6 4


pid 2 is always 20 milliseconds while the rest are pretty consistent too.

Any explanations?







On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy  wrote:

> I think the code does not do what you want.
>
> In the non-shared case you are sending a 10^6 integer array over the
> network 1000 times and summing it as many times. Most of the time is the
> network traffic time. Reduce 'n' to say 10, and you will what I mean
>
> In the shared case you are not sending the array over the network but
> still summing the entire array 1000 times. Some of the remotecall_fetch
> calls seems to be taking 40 milli seconds extra time which adds to the
> total.
>
> shared time of 6 seconds being less than the 15 seconds for non-shared
> seems to be just incidental.
>
> I don't yet have an explanation for the extra 40 millseconds per
> remotecall_fetch (for some calls only) in the shared case.
>
>
>
>
>
>
> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg wrote:
>
>> Hi,
>> I'm having some trouble figuring out exactly how I'm supposed to use
>> SharedArrays - I might just be misunderstanding them or else something
>> odd is happening with them.
>>
>> I'm trying to do some parallel computing which looks a bit like this
>> test case:
>>
>> function createdata(shared)
>> const n = 1000
>> if shared
>> A = SharedArray(Uint, (n, n))
>> else
>> A = Array(Uint, (n, n))
>> end
>> for i = 1:n, j = 1:n
>> A[i, j] = rand(Uint)
>> end
>>
>> return n, A
>> end
>>
>> function mainfunction(r; shared = false)
>> n, A = createdata(shared)
>>
>> i = 1
>> nextidx() = (idx = i; i += 1; idx)
>>
>> @sync begin
>> for p in workers()
>> @async begin
>> while true
>> idx = nextidx()
>> if idx > r
>> break
>> end
>> found, s = remotecall_fetch(p, parfunction, n, A)
>> end
>> end
>> end
>> end
>> end
>>
>> function parfunction(n::Int, A::Array{Uint, 2})
>> # possibly do some other computation here independent of shared
>> arrays
>> s = sum(A)
>> return false, s
>> end
>>
>> function parfunction(n::Int, A::SharedArray{Uint, 2})
>> s = sum(A)
>> return false, s
>> end
>>
>> If I then start julia with e.g. two worker processes, so julia -p 2, the
>> following happens:
>>
>> julia> require("testpar.jl")
>>
>> julia> @time mainfunction(1000, shared = false)
>> elapsed time: 15.717117365 seconds (8448701068 bytes allocated)
>>
>> julia> @time mainfunction(1000, shared = true)
>> elapsed time: 6.068758627 seconds (56713996 bytes allocated)
>>
>> julia> rmprocs([2, 3])
>> :ok
>>
>> julia> @time mainfunction(1000, shared = false)
>> elapsed time: 0.717638344 seconds (40357664 bytes allocated)
>>
>> julia> @time mainfunction(1000, shared = true)
>> elapsed time: 0.702174085 seconds (32680628 bytes allocated)
>>
>> So, with a normal array it's slow as expected, and it is faster with the
>> shared array, but what seems to happen is that with the normal array cpu
>> usage is 100 % on two cores but with the shared array cpu usage spikes
>> for a fraction of a second and then for the remaining nearly 6 seconds
>> it's at around 10 %. Can anyone reproduce this? Am I just doing
>> something wrong with shared arrays.
>>
>> Slightly related note: is there now a way to create a random shared
>> array? https://github.com/JuliaLang/julia/pull/4939 and the latest docs
>> don't mention this.
>>
>
>


Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
There is a pattern here. For a set of pids, the cumulative sum is 40
milliseconds. In a SharedArray, RemoteRefs are maintained on the creating
pid (in this case 1) to the shmem mappings on each of the workers. I think
they are referring back to pid 1 to fetch the local mapping when the shared
array object is passed in the remotecall_fetch call, and hence all the
workers are stuck waiting for pid 1 to become free to service these calls.
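If that is what is happening, one way to sidestep it in user code is to ship the SharedArray object to every worker once, bind it to a global there, and have later remote calls refer to that binding instead of passing the array each time. An untested sketch:

A = Base.shmem_fill(1, (1000,1000))

# pay the mapping-lookup cost once per worker ...
for p in workers()
    remotecall_wait(p, x->(global localA; localA = x; nothing), A)
end

# ... then later calls reference the local binding, not the shipped object
@everywhere sum_local() = sum(localA)
remotecall_fetch(2, sum_local)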


On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy  wrote:

> Some more weirdness
>
> Starting with julia -p 8
>
> A=Base.shmem_fill(1, (1000,1000))
>
> Using 2 workers:
> for i in 1:100
>  t1 = time(); p=2+(i%2); remotecall_fetch(p, x->1, A); t2=time();
> println("@ $p ", int((t2-t1) * 1000))
> end
>
> prints
>
> ...
> @ 3 8
> @ 2 32
> @ 3 8
> @ 2 32
> @ 3 8
> @ 2 32
> @ 3 8
> @ 2 32
>
>
> Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8
>
>
>
> With 4 workers:
>
> for i in 1:100
>  t1 = time(); p=2+(i%4); remotecall_fetch(p, x->1, A); t2=time();
> println("@ $p ", int((t2-t1) * 1000))
> end
>
> ...
> @ 2 31
> @ 3 4
> @ 4 4
> @ 5 1
> @ 2 31
> @ 3 4
> @ 4 4
> @ 5 1
> @ 2 31
> @ 3 4
> @ 4 4
> @ 5 1
> @ 2 31
>
>
> Now pid 2 always takes 31 millisecs, pids 3&4, 4 and pid 5 1 millisecond
>
> With 8 workers:
>
> for i in 1:100
>  t1 = time(); p=2+(i%8); remotecall_fetch(p, x->1, A); t2=time();
> println("@ $p ", int((t2-t1) * 1000))
> end
>
> 
> @ 2 20
> @ 3 4
> @ 4 1
> @ 5 3
> @ 6 4
> @ 7 1
> @ 8 2
> @ 9 4
> @ 2 20
> @ 3 4
> @ 4 1
> @ 5 3
> @ 6 4
> @ 7 1
> @ 8 2
> @ 9 4
> @ 2 20
> @ 3 4
> @ 4 1
> @ 5 3
> @ 6 4
> @ 7 1
> @ 8 3
> @ 9 4
> @ 2 20
> @ 3 4
> @ 4 1
> @ 5 3
> @ 6 4
>
>
> pid 2 is always 20 milliseconds while the rest are pretty consistent too.
>
> Any explanations?
>
>
>
>
>
>
>
> On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy wrote:
>
>> I think the code does not do what you want.
>>
>> In the non-shared case you are sending a 10^6 integer array over the
>> network 1000 times and summing it as many times. Most of the time is the
>> network traffic time. Reduce 'n' to say 10, and you will what I mean
>>
>> In the shared case you are not sending the array over the network but
>> still summing the entire array 1000 times. Some of the remotecall_fetch
>> calls seems to be taking 40 milli seconds extra time which adds to the
>> total.
>>
>> shared time of 6 seconds being less than the 15 seconds for non-shared
>> seems to be just incidental.
>>
>> I don't yet have an explanation for the extra 40 millseconds per
>> remotecall_fetch (for some calls only) in the shared case.
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg wrote:
>>
>>> Hi,
>>> I'm having some trouble figuring out exactly how I'm supposed to use
>>> SharedArrays - I might just be misunderstanding them or else something
>>> odd is happening with them.
>>>
>>> I'm trying to do some parallel computing which looks a bit like this
>>> test case:
>>>
>>> function createdata(shared)
>>> const n = 1000
>>> if shared
>>> A = SharedArray(Uint, (n, n))
>>> else
>>> A = Array(Uint, (n, n))
>>> end
>>> for i = 1:n, j = 1:n
>>> A[i, j] = rand(Uint)
>>> end
>>>
>>> return n, A
>>> end
>>>
>>> function mainfunction(r; shared = false)
>>> n, A = createdata(shared)
>>>
>>> i = 1
>>> nextidx() = (idx = i; i += 1; idx)
>>>
>>> @sync begin
>>> for p in workers()
>>> @async begin
>>> while true
>>> idx = nextidx()
>>> if idx > r
>>> break
>>> end
>>> found, s = remotecall_fetch(p, parfunction, n, A)
>>> end
>>> end
>>> end
>>> end
>>> end
>>>
>>> function parfunction(n::Int, A::Array{Uint, 2})
>>> # possibly do some other computation here independent of shared
>>> arrays
>>> s = sum(A)
>>> return false, s
>>> end
>>>
>>> function parfuncti

Re: [julia-users] worker/proc nomenclature

2014-03-27 Thread Amit Murthy
Ah! OK. Makes sense.


On Thu, Mar 27, 2014 at 5:15 PM, Ben Arthur  wrote:

> i guess i should be more explicit in my suggestion that we rename
> addprocs() to addworkers(), and rmprocs() to rmworkers().  this new
> nomenclature would be more precise, b/c you can't just add/rm any proc with
> these functions, just worker procs.
>
> to be even more consistent, we should probably also consider changing the
> -p flag on the command line to -w.
>


Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
No explanation for the uneven distribution of the 40 milliseconds though.


On Thu, Mar 27, 2014 at 6:11 PM, Amit Murthy  wrote:

> There is a pattern here. For a set of pids, the cumulative sum is 40
> milliseconds. In a SharedArray, RemoteRefs are maintained on the creating
> pid (in this case 1) to the shmem mappings on each of the workers. I think
> they are referring back to pid 1 to fetch the local mapping when the shared
> array object is passed in the remotecall_fetch call, and hence all the
> workers are stuck on pid 1 becoming free to service these calls.
>
>
> On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy wrote:
>
>> Some more weirdness
>>
>> Starting with julia -p 8
>>
>> A=Base.shmem_fill(1, (1000,1000))
>>
>> Using 2 workers:
>> for i in 1:100
>>  t1 = time(); p=2+(i%2); remotecall_fetch(p, x->1, A); t2=time();
>> println("@ $p ", int((t2-t1) * 1000))
>> end
>>
>> prints
>>
>> ...
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>> @ 3 8
>> @ 2 32
>>
>>
>> Notice that pid 2 always takes 32 milliseconds while pid 3 always takes 8
>>
>>
>>
>> With 4 workers:
>>
>> for i in 1:100
>>  t1 = time(); p=2+(i%4); remotecall_fetch(p, x->1, A); t2=time();
>> println("@ $p ", int((t2-t1) * 1000))
>> end
>>
>> ...
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>> @ 3 4
>> @ 4 4
>> @ 5 1
>> @ 2 31
>>
>>
>> Now pid 2 always takes 31 millisecs, pids 3&4, 4 and pid 5 1 millisecond
>>
>> With 8 workers:
>>
>> for i in 1:100
>>  t1 = time(); p=2+(i%8); remotecall_fetch(p, x->1, A); t2=time();
>> println("@ $p ", int((t2-t1) * 1000))
>> end
>>
>> 
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 2
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 2
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>> @ 7 1
>> @ 8 3
>> @ 9 4
>> @ 2 20
>> @ 3 4
>> @ 4 1
>> @ 5 3
>> @ 6 4
>>
>>
>> pid 2 is always 20 milliseconds while the rest are pretty consistent too.
>>
>> Any explanations?
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy wrote:
>>
>>> I think the code does not do what you want.
>>>
>>> In the non-shared case you are sending a 10^6 integer array over the
>>> network 1000 times and summing it as many times. Most of the time is the
>>> network traffic time. Reduce 'n' to say 10, and you will what I mean
>>>
>>> In the shared case you are not sending the array over the network but
>>> still summing the entire array 1000 times. Some of the remotecall_fetch
>>> calls seems to be taking 40 milli seconds extra time which adds to the
>>> total.
>>>
>>> shared time of 6 seconds being less than the 15 seconds for non-shared
>>> seems to be just incidental.
>>>
>>> I don't yet have an explanation for the extra 40 millseconds per
>>> remotecall_fetch (for some calls only) in the shared case.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg 
>>> wrote:
>>>
>>>> Hi,
>>>> I'm having some trouble figuring out exactly how I'm supposed to use
>>>> SharedArrays - I might just be misunderstanding them or else something
>>>> odd is happening with them.
>>>>
>>>> I'm trying to do some parallel computing which looks a bit like this
>>>> test case:
>>>>
>>>> function createdata(shared)
>>>> const n = 1000
>>>> if shared
>>>> A = SharedArray(Uint, (n, n))
>>>> else
>>>> A = Array(Uint, (n, n))
>>>> end
>>>> for i = 1:n, j = 1:n
>>>> A[i, j] = rand(Uint)
>>>> end
>>>>
>>>> return n, A
>>>> end
>>>>
>>>> function mainfunction(r; shared = false)
>>>> n, A = createdata(shared)
>>>>
>>>> i = 1
>>>> nextidx() = (idx = i; i += 1; idx)
>>>>
&

Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
Hi Tim,

The issue of the extra 40 milliseconds is related to how RemoteRefs to the
individual mappings are fetched. I don't quite get how pmap_bw is related
to this.


On Thu, Mar 27, 2014 at 6:27 PM, Tim Holy  wrote:

> This is why, with my original implementation of SharedArrays
> (oh-so-long-ago),
> I created pmap_bw, to do busy-wait on the return value of a SharedArray
> computation. The amusing part is that you can use a SharedArray to do the
> synchronization among processes.
>
> --Tim
>
> On Thursday, March 27, 2014 06:11:12 PM Amit Murthy wrote:
> > There is a pattern here. For a set of pids, the cumulative sum is 40
> > milliseconds. In a SharedArray, RemoteRefs are maintained on the creating
> > pid (in this case 1) to the shmem mappings on each of the workers. I
> think
> > they are referring back to pid 1 to fetch the local mapping when the
> shared
> > array object is passed in the remotecall_fetch call, and hence all the
> > workers are stuck on pid 1 becoming free to service these calls.
> >
> > On Thu, Mar 27, 2014 at 5:58 PM, Amit Murthy 
> wrote:
> > > Some more weirdness
> > >
> > > Starting with julia -p 8
> > >
> > > A=Base.shmem_fill(1, (1000,1000))
> > >
> > > Using 2 workers:
> > > for i in 1:100
> > >
> > >  t1 = time(); p=2+(i%2); remotecall_fetch(p, x->1, A);
> t2=time();
> > >
> > > println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > prints
> > >
> > > ...
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > > @ 3 8
> > > @ 2 32
> > >
> > >
> > > Notice that pid 2 always takes 32 milliseconds while pid 3 always
> takes 8
> > >
> > >
> > >
> > > With 4 workers:
> > >
> > > for i in 1:100
> > >
> > >  t1 = time(); p=2+(i%4); remotecall_fetch(p, x->1, A);
> t2=time();
> > >
> > > println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > ...
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > > @ 3 4
> > > @ 4 4
> > > @ 5 1
> > > @ 2 31
> > >
> > >
> > > Now pid 2 always takes 31 millisecs, pids 3&4, 4 and pid 5 1
> millisecond
> > >
> > > With 8 workers:
> > >
> > > for i in 1:100
> > >
> > >  t1 = time(); p=2+(i%8); remotecall_fetch(p, x->1, A);
> t2=time();
> > >
> > > println("@ $p ", int((t2-t1) * 1000))
> > > end
> > >
> > > 
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 2
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 2
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > > @ 7 1
> > > @ 8 3
> > > @ 9 4
> > > @ 2 20
> > > @ 3 4
> > > @ 4 1
> > > @ 5 3
> > > @ 6 4
> > >
> > >
> > > pid 2 is always 20 milliseconds while the rest are pretty consistent
> too.
> > >
> > > Any explanations?
> > >
> > > On Thu, Mar 27, 2014 at 5:24 PM, Amit Murthy  >wrote:
> > >> I think the code does not do what you want.
> > >>
> > >> In the non-shared case you are sending a 10^6 integer array over the
> > >> network 1000 times and summing it as many times. Most of the time is
> the
> > >> network traffic time. Reduce 'n' to say 10, and you will what I mean
> > >>
> > >> In the shared case you are not sending the array over the network but
> > >> still summing the entire array 1000 times. Some of the
> remotecall_fetch
> > >> calls seems to be taking 40 milli seconds extra time which adds to the
> > >> total.
> > >>
> > >> shared time of 6 seconds being less than the 15 seconds for non-shared
> > >> seems to be just incidental.
> > >>
> > >> I don't yet have an explanation for the extra 40 millseconds per
> > >> remotecall_fetch (for some calls only) in the shared ca

Re: [julia-users] SharedArray oddities

2014-03-27 Thread Amit Murthy
Hi Mikael,

This seems to be a bug in the SharedArray constructor. For SharedArrays of
length less than the number of participating pids, only the first few pids
are used. Since the length of s = SharedArray(Uint, (1)) is 1, it is mapped
only on the first process.

For now a workaround is to just create s = SharedArray(Uint, (10)) or
something and just use the first element.
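In Mikael's example below that would be something like (only s[1] is ever used):

# workaround sketch: over-allocate so every worker participates in the
# mapping (assumes no more than 10 workers here), then treat it as a scalar
s = SharedArray(Uint, (10,))
s[1] = 0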



On Thu, Mar 27, 2014 at 7:13 PM, Mikael Simberg wrote:

>  Yes, you're at least half-right about it not doing quite what I want. Or
> let's say I was expecting the majority of the overhead to come from having
> to send the array over to each process, but what I wasn't expecting was
> that getting a boolean and an integer back would take so much time (and
> thus I was expecting using a SharedArray would have been at least
> comparable to keeping everything local). Indeed, if I just do a remotecall
> (i.e. without the fetch) it is faster with multiple processes which is what
> I was expecting.
>
> What I essentially want to do in the end is that the parfunction() is
> successful with some probability and then I want to return some object from
> the calculations there, but in general I will not want to fetch anything.
> What would be the "correct" way to do that? If I have the following code:
>
> function mainfunction(r)
>
> const n = 1000
> A = SharedArray(Uint, (n, n))
> for i = 1:n, j = 1:n
> A[i, j] = rand(Uint)
> end
> s = SharedArray(Uint, (1))
>
> i = 1
> nextidx() = (idx = i; i += 1; idx)
>
> println(s)
> @sync begin
> for p in workers()
> @async begin
> while true
> idx = nextidx()
> if idx > r
> break
> end
> remotecall(p, parfunction, A, s)
> end
> end
> end
> end
> println(s)
> end
>
> function parfunction(A::SharedArray{Uint, 2}, s::SharedArray{Uint, 1})
> d = sum(A)
> if rand(0:1000) == 0
> println("success")
> s[1] = d
> end
> end
>
> and run
> julia -p 2
> julia> reload("testpar.jl")
> julia> @time mainfunction(5000)
>
> I get ERROR: SharedArray cannot be used on a non-participating process,
> although s should according to my logic be available on all processes (I'm
> assuming it's s that's causing it because it's fine if I remove all traces
> of s).
>
> On Thu, Mar 27, 2014, at 4:54, Amit Murthy wrote:
>
> I think the code does not do what you want.
>
> In the non-shared case you are sending a 10^6 integer array over the
> network 1000 times and summing it as many times. Most of the time is the
> network traffic time. Reduce 'n' to say 10, and you will what I mean
>
> In the shared case you are not sending the array over the network but
> still summing the entire array 1000 times. Some of the remotecall_fetch
> calls seems to be taking 40 milli seconds extra time which adds to the
> total.
>
> shared time of 6 seconds being less than the 15 seconds for non-shared
> seems to be just incidental.
>
> I don't yet have an explanation for the extra 40 millseconds per
> remotecall_fetch (for some calls only) in the shared case.
>
>
>
>
>
>
> On Thu, Mar 27, 2014 at 2:50 PM, Mikael Simberg wrote:
>
> Hi,
>  I'm having some trouble figuring out exactly how I'm supposed to use
>  SharedArrays - I might just be misunderstanding them or else something
>  odd is happening with them.
>
>  I'm trying to do some parallel computing which looks a bit like this
>  test case:
>
>  function createdata(shared)
>  const n = 1000
>  if shared
>  A = SharedArray(Uint, (n, n))
>  else
>  A = Array(Uint, (n, n))
>  end
>  for i = 1:n, j = 1:n
>  A[i, j] = rand(Uint)
>  end
>
>  return n, A
>  end
>
>  function mainfunction(r; shared = false)
>  n, A = createdata(shared)
>
>  i = 1
>  nextidx() = (idx = i; i += 1; idx)
>
>  @sync begin
>  for p in workers()
>  @async begin
>  while true
>  idx = nextidx()
>  if idx > r
>  break
>  end
>  found, s = remotecall_fetch(p, parfunction, n, A)
>  end
>  end
>  end
>  end
>  end
>
>  function parfunction(n::Int, A::Array{Uint, 2})
>  # possibly do some other computation here independent of shared
>

Re: [julia-users] Re: Request to make peer address available

2014-03-27 Thread Amit Murthy
Yes, this would be nice to have - both for TCP as well as UDP.


On Thu, Mar 27, 2014 at 8:02 PM, Bob Cowdery  wrote:

> Perhaps I can ask a more specific question and hopefully get some help.
> The parameter in socket.jl _uv_hook_recv() is addr::Ptr{Void} which as far
> as I can see from libuv source is a pointer to this:
>
> struct sockaddr_in uv_ip4_addr(const char* ip, int port);
>
> Can I dig out the ip and port in julia code or does this have to be done by 
> calling another C function. I see lots of ccall(jl_...) in socket.jl which I 
> assume are helper functions. I wonder if there is a suitable helper function 
> already.
>
> Bob
>
>
>
> On Tuesday, March 25, 2014 9:22:41 PM UTC, Bob Cowdery wrote:
>>
>> I'm sending a UDP broadcast  and getting a response using recv(). I need
>> to now start a conversation with the other end.
>>
>> As far as I can tell the _uv_hook_recv(sock::UdpSocket, nread::Ptr{Void},
>> buf_addr::Ptr{Void}, buf_size::Int32, addr::Ptr{Void}, flags::Int32) which
>> I believe is the read callback contains the host address (param addr). Is
>> there a way I can get hold of this please.
>>
>> Thanks
>> Bob
>>
>


Re: [julia-users] Terminate a task

2014-03-27 Thread Amit Murthy
Issue created : https://github.com/JuliaLang/julia/issues/6283


On Mon, Mar 24, 2014 at 10:14 PM, Jameson Nash  wrote:

> Alternatively, might be fun to make a ccall_in_worker_thread
> intrinsic which handles all of the fiddly gc details and only blocks the
> local task (and/or returns a remoteref)
>
>
> On Monday, March 24, 2014, Stefan Karpinski  wrote:
>
>> Yes, we need this ability. Externally terminating and otherwise
>> interacting with tasks is a good way to deal with things like timeouts and
>> cancelling distributed work.
>>
>>
>> On Mon, Mar 24, 2014 at 1:01 AM, Amit Murthy wrote:
>>
>>> I think currently the only way to interrupt a task is when it is blocked
>>> on a condition variable by using the "notify" call. Will be good to have a
>>> "terminate(t::Task)" call.
>>>
>>>
>>> On Sun, Mar 23, 2014 at 9:08 PM, Bob Cowdery wrote:
>>>
>>>> Could you clarify please. If its on a blocking call how do I throw an
>>>> error and if its not complete why would the gc delete it.
>>>>
>>>>
>>>> On Sunday, March 23, 2014 3:27:18 PM UTC, Bob Cowdery wrote:
>>>>>
>>>>> Is there any way to terminate a task that is stuck on a blocking call.
>>>>> I can see that I can call istaskdone() or start a Timeout to know that the
>>>>> task is potentially blocked but I don't see anything in Task that lets me
>>>>> terminate it.
>>>>>
>>>>> Bob
>>>>>
>>>>
>>>
>>


Re: [julia-users] Re: new REPL

2014-03-30 Thread Amit Murthy
If we are on the latest master, I guess we can safely `rm -Rf
deps/readline-6.2` ?


On Sun, Mar 30, 2014 at 11:20 AM, Shaun Walbridge  wrote:

> Yeah, fair enough -- that'd work fine for my needs.
>
>
> On Sat, Mar 29, 2014 at 9:28 PM, Keno Fischer <
> kfisc...@college.harvard.edu> wrote:
>
>> The problem is the ambiguity between whether you typed the newline or
>> whether it's a record separator. There is hacks to make it work, but I
>> would prefer to have a non-text character to keep it simple. If you don't
>> care when grepping, couldn't you just do
>>
>> ~/.julia_history2 > tr '\0' '\n' | grep ...
>>
>> ?
>>
>>
>> On Sat, Mar 29, 2014 at 9:20 PM, Shaun Walbridge <
>> shaun.walbri...@gmail.com> wrote:
>>
>>> Wonderful work Keno and Mike! This is a great addition.
>>>
>>> Would it be possible to retain the standard Readline history file format
>>> of separating elements by newlines? It looks like history elements in my
>>> new .julia_history2 file are NULL terminated, which makes the file harder
>>> to use with line-oriented shell tools like ag/ack/grep.
>>>
>>>
>>> On Sat, Mar 29, 2014 at 9:09 PM, Keno Fischer <
>>> kfisc...@college.harvard.edu> wrote:
>>>
 ~/.julia_history2 unless Mike changed it. The format is different.


 On Sat, Mar 29, 2014 at 9:06 PM, J Luis  wrote:

> Where does it keeps the commands history? The old  .julia_history is
> not used (and lost) anymore but the commands "memory" is preserved, though
> reset to blank.
>
> Domingo, 30 de Março de 2014 0:57:13 UTC, Jake Bolewski escreveu:
>
>> This is really great work Keno and Mike.
>>
>> I think a great improvemnt would be to make completions were a bit
>> more modular.  That way custom completion callbacks could be added in at
>> runtime in your .juliarc file.  I'm trying to get zsh to do the shell
>> completions but that would only interest people who use zsh :-)
>>
>> Do we have an ETA on when 0.3 will be released?  This seemed like one
>> of the bigger blockers.
>>
>> Best,
>> Jake
>>
>> On Saturday, March 29, 2014 3:59:19 PM UTC-4, Stefan Karpinski wrote:
>>>
>>> Good news, everyone! The pure-Julia read eval print loop (REPL) that
>>> Keno Fischer developed and Mike Nolta integrated into base Julia has 
>>> just
>>> been merged. There are a number of nice things about changing from the 
>>> old
>>> REPL to this new one, in no particular order:
>>>
>>>- The old REPL used the GNU readline library, which we had
>>>hacked far beyond what it was ever meant to do. This made modifying 
>>> it a
>>>bit terrifying and thus issues with it tended to get ignored or 
>>> shelved as
>>>"we'll be able to do that in the new REPL".
>>>- The new REPL, is pretty clean, simple Julia code. Seriously - 
>>> terminal
>>>
>>> support,
>>>line 
>>> editing,
>>>and the REPL 
>>> itselfare 
>>> less than 2000 lines of code -
>>>*total*. This works out to a net code reduction of 33233 lines
>>>of code (GNU readline is 34640 lines of C), while 
>>> *gaining*functionality. That has to be a project record.
>>>- The new code is infinitely easier to modify, fix and improve,
>>>so REPL-replated bugs will probably get fixed lickety split going 
>>> forward.
>>>- The old GNU readline REPL was one of our GPL library
>>>dependencies that make the total Julia "product" GPL. We'd like to 
>>> shed
>>>these or make them optional to allow for a non-GPL, MIT-licensed 
>>> Julia
>>>distribution and this is a major step toward that goal.
>>>- The new REPL code already has fancy features that you wouldn't
>>>even think about doing with readline. Try typing "?" or ";" at the 
>>> prompt
>>>    and see the REPL mode change from "julia>" to "help>" or "shell>". Cool, 
>>> Cool,
>>>huh?
>>>- The new REPL is noticeably snappier than the old one. Combined
>>>with the static compilation of julia introduced in 0.3, going from 
>>> zero to
>>>REPL is pretty quick these days.
>>>- Since full-fledged line editing functionality is now built
>>>into Base Julia, we can use it everywhere without worrying what 
>>> libraries
>>>people have installed. Once we settle on a good API, you can expect 
>>> that
>>>user code that needs to prompt for input will be just as slick as 
>>> the REPL
>>>itself.
>>>
>>> There will, of course, be some hitches and road bumps, but now that
>>> this is merged and everyone using Julia master will be testing it, they
>>> should get sorted out in short order.

Re: [julia-users] synchronous loading and asynchronous execution, elegantly?

2014-04-13 Thread Amit Murthy
One option is :

# Create a RemoteRef on each worker to store the data
rr_files = map(x->RemoteRef(x), workers())

# Create a RemoteRef on each worker to store results
rr_results = map(x->RemoteRef(x), workers())

for (i, p) in enumerate(workers())
  remotecall_wait(p, load, files_on_ith_worker, rr_files[i])  #
files_on_ith_worker is the part of listoffiles assigned to the ith worker
end

for (i, p) in enumerate(workers())
  remotecall(p, stats, rr_files[i], rr_results[i]) # stats should process
data in the first remoteref and store the result in the second one
end

# fetch, wait and continue processing.
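
The results can then be collected on the master along the lines of (assuming
stats put()s its result into the second RemoteRef):

results = map(fetch, rr_results)  # blocks until every worker has stored its result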



On Fri, Apr 11, 2014 at 8:12 PM, David van Leeuwen <
david.vanleeu...@gmail.com> wrote:

> Hello,
>
> I need to compute aggregate statistics---say, the mean over columns, but
> later these will become more computationally intensive---for a large
> collection of matrices.   These matrices vary in the number of rows, but
> have the same number of columns.
>
> I am looking for a general scheme that can utilise cluster parallel
> execution and will deal with various sorts of granularity in the number and size
> of matrices.  The combined size of the matrices may be larger than the
> combined memory of the computing cluster, and I have my matrices in
> separate files.
>
> In my first attempts I load the data in memory in the worker processes,
> compute the statistics, and reduce the results in the main process.  The
> coding is fairly straight forward with pmap().
>
> function stats(x)
> ## compute the statistics
> end
>
> function load(file::String)
> ## load the data
> end
>
> result = pmap(s->stats(load(s)), listoffiles)
> reducethestats(result)
>
> However, for low-complexity stats computations--say, the sum over
> columns---it seems that the system is thrashing terribly because all
> processes start to load the data from (network) disc more/less at the same
> time, and data loading takes longer than the computation.  The thrashing
> effect is quite large: I lose a factor of 100 or so over serial
> loading/execution time.  Of course one would not want to parallellize in
> this particular case, but as I said before, the statistics become more
> computationally intensive later in the algorithm, and then the parallel
> computing is very beneficial.
>
> The same data is used in multiple iterations of the algorithm, so it can
> be beneficial to map the same matrices to the same worker processes and
> have OS file caching reduce load times.  So pmap() is probably not a good
> choice, anyway.
>
> My question is: is there an efficient approach where the data is loaded
> synchronously in the worker processes---so that they don't swamp the
> network---and then later compute the stats asynchronously?
>
> One way could be (a simplified example that needs lots of memory in the
> main process)
>
>
> result = pmap(stats, map(load, listoffiles))
>
>
> but this is not efficient as it needs to serialize the loaded data in the
> main process and transfer it to the workers.  And for larger problems
> nothing is cached locally.   There is some better control with remotecall()
> and fetch(), but I don't see a way to leave the loaded data in a
> remotecall() process and use it in a next remotecall() without fetch()ing
> it to the main process.  Maybe I am looking for something like
>
> worker(i) = workers[1 .+ (i % nworkers())]
>
>
> for i in 1:length(workers)
>   remotecall_wait(worker(i), load, listoffiles[i]) ## but keep the data
> in the worker
> end
> for i in 1:length(workers)
>   remotecall(worker(i), stats, ## the remote result)
> end
> ## fetch, and do the rest of the files in a task-based way like pmap()
>
>
> Any ideas how this can be accomplished?
>
> Thanks,
>
> ---david
>
>
>
>
>
>


Re: [julia-users] Pkg.tag

2014-04-16 Thread Amit Murthy
What does Pkg.publish() show?


On Wed, Apr 16, 2014 at 4:17 PM, Simon Byrne  wrote:

> I'm trying to push a new package to METADATA, but I'm having some trouble.
>
> I've run
>
> julia> Pkg.tag("KDE")
>
> INFO: Tagging KDE v0.0.1
>
> which tags the KDE repo, but doesn't make any changes to METADATA repo.
>
> What am I missing? (I'm running the julia nightly)
>
> simon
>


Re: [julia-users] Pkg.tag

2014-04-16 Thread Amit Murthy
My bad. I manually updated METADATA for HTTPClient package and may have
screwed up. Give me a minute.


On Wed, Apr 16, 2014 at 4:29 PM, Simon Byrne  wrote:

> Hmm, something's not right here:
>
> julia> Pkg.update()
>
> INFO: Updating METADATA...
>
> INFO: INFO: INFO: INFO: Updating GSLDists...Updating BoostDists...Updating
> RmathDist...Updating KDE...
>
>
>
>
> INFO: Computing changes...
>
> INFO: No packages to install, update or remove
>
>
> julia> Pkg.publish()
>
> ERROR: METADATA is behind origin/metadata-v2 – run `Pkg.update()` before
> publishing
>
>  in publish at pkg/entry.jl:308
>
>  in anonymous at pkg/dir.jl:28
>
>  in cd at file.jl:20
>
>  in cd at pkg/dir.jl:28
>
>  in publish at pkg.jl:57
>
> Any ideas?
>
> On Wednesday, 16 April 2014 11:52:58 UTC+1, Amit Murthy wrote:
>
>> What does Pkg.publish() show?
>>
>>
>> On Wed, Apr 16, 2014 at 4:17 PM, Simon Byrne  wrote:
>>
>>> I'm trying to push a new package to METADATA, but I'm having some
>>> trouble.
>>>
>>> I've run
>>>
>>> julia> Pkg.tag("KDE")
>>>
>>> INFO: Tagging KDE v0.0.1
>>>
>>> which tags the KDE repo, but doesn't make any changes to METADATA repo.
>>>
>>> What am I missing? (I'm running the julia nightly)
>>>
>>> simon
>>>
>>
>>


Re: [julia-users] Pkg.tag

2014-04-16 Thread Amit Murthy
Sorry, couldn't find anything wrong with my edits. I don't see the error
you are seeing on my machine though.


On Wed, Apr 16, 2014 at 4:43 PM, Amit Murthy  wrote:

> My bad. I manually updated METADATA for HTTPClient package and may have
> screwed up. Give me a minute.
>
>
> On Wed, Apr 16, 2014 at 4:29 PM, Simon Byrne  wrote:
>
>> Hmm, something's not right here:
>>
>> julia> Pkg.update()
>>
>> INFO: Updating METADATA...
>>
>> INFO: INFO: INFO: INFO: Updating GSLDists...Updating
>> BoostDists...Updating RmathDist...Updating KDE...
>>
>>
>>
>>
>> INFO: Computing changes...
>>
>> INFO: No packages to install, update or remove
>>
>>
>> julia> Pkg.publish()
>>
>> ERROR: METADATA is behind origin/metadata-v2 – run `Pkg.update()` before
>> publishing
>>
>>  in publish at pkg/entry.jl:308
>>
>>  in anonymous at pkg/dir.jl:28
>>
>>  in cd at file.jl:20
>>
>>  in cd at pkg/dir.jl:28
>>
>>  in publish at pkg.jl:57
>>
>> Any ideas?
>>
>> On Wednesday, 16 April 2014 11:52:58 UTC+1, Amit Murthy wrote:
>>
>>> What does Pkg.publish() show?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 4:17 PM, Simon Byrne  wrote:
>>>
>>>> I'm trying to push a new package to METADATA, but I'm having some
>>>> trouble.
>>>>
>>>> I've run
>>>>
>>>> julia> Pkg.tag("KDE")
>>>>
>>>> INFO: Tagging KDE v0.0.1
>>>>
>>>> which tags the KDE repo, but doesn't make any changes to METADATA repo.
>>>>
>>>> What am I missing? (I'm running the julia nightly)
>>>>
>>>> simon
>>>>
>>>
>>>
>


Re: [julia-users] Expected behviour of addprocs(n, cman=SSHManager(; machines=machines)

2014-04-21 Thread Amit Murthy
For the ssh manager, each entry in "machines" starts one worker on that
particular machine. So, just an "addprocs(machines)" should do what you
want.
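
For example:

machines = ["machine1", "machine2"]
addprocs(machines)   # starts one worker on machine1 and one on machine2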


On Mon, Apr 21, 2014 at 11:19 PM, Sam Kaplan  wrote:

> Hello,
>
> I have a quick question about the SSHManager, and how it works with
> addprocs.  If,
>
> machines = [machine1, machine2]
>
> and I do,
>
> cman = Base.SSHManager(;machines=machines)
> addprocs(2, cman=cman)
> for pid in workers()
> fetch(@spawnat pid run(`hostname`))
> end
>
> Then I see that one process is running on 'machine1', and the other on
> 'machine2'.  On the other hand, if I do:
>
> cman = Base.SSHManager(;machines=machines)
> addprocs(1, cman=cman)
> addprocs(1, cman=cman)
> for pid in workers()
> fetch(@spawnat pid run(`hostname`))
> end
>
> Then I see that both processes are running on 'machine1'.
>
> Is this expected behaviour, or a bug (or some other misunderstanding of
> mine)?
>
> Thanks!
>
> Sam
>


Re: [julia-users] do block semantics

2014-04-27 Thread Amit Murthy
Without using a do-block, you would need to pass in a function as the first
argument to 'map'.
'open' has a variant where the first argument is again a function that
accepts an open handle.

The do-block syntax in this case just allows you to define the said
function.
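
For example, these two calls are equivalent:

map(x -> x + 1, [1, 2, 3])     # anonymous function passed explicitly

map([1, 2, 3]) do x            # same function written with a do-block
    x + 1
end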


On Mon, Apr 28, 2014 at 8:55 AM, Peter Simon  wrote:

> In the Julia manual, the second example in
> block-syntax-for-function-arguments
>  contains
> the following do block:
>
> open("outfile", "w") do f
> write(f, data)
> end
>
> and the documentation states that "The function argument to open receives a 
> handle to the opened file."  I conclude from this that the return value 
> (i.e., the file handle) of the open function is passed to this function f -> 
> write(f, data) that is used as the first argument of open.  So far, so good 
> (I think).  But now I go back and take another look at the first do block 
> example:
>
> map([A, B, C]) do x
> if x < 0 && iseven(x)
> return 0
> elseif x == 0
> return 1
> else
> return x
> endend
>
> I try to interpret this example in light of what I learned from the second 
> example.  The map function has a return value, consisting of the array [A, B, 
> C], modified by applying the function in the do block to each element.  If 
> this example behaved like in the second example, then the output of the map 
> function should be passed as an input to the function defined in the do 
> block.  Clearly this doesn't happen, so the lesson I learned from the second 
> example doesn't apply here, apparently.  Why not?  Under what conditions is 
> the output of the outer function passed as an input to the inner function?
>
> I must be looking at this wrong and would appreciate some help in getting my 
> mind right :-).
>
> Thanks,
>
> Peter
>
>


Re: [julia-users] do block semantics

2014-04-27 Thread Amit Murthy
It is just a way to define an anonymous function. It is not a way to define
an "inner" function in that sense.


On Mon, Apr 28, 2014 at 9:24 AM, Peter Simon  wrote:

> My question concerns where this handle comes from.  Isn't the handle
> coming from the output of 'open'?  Since 'open' is the "outer" function of
> the 'do' construct, then why doesn't the outer function in the first
> example also supply its output as input to its inner function?
>
>
> On Sunday, April 27, 2014 8:40:27 PM UTC-7, Amit Murthy wrote:
>
>> Without using a do-block, you would need to pass in a function as the
>> first argument to 'map'.
>> 'open' has a variant where the first argument is again a function that
>> accepts an open handle.
>>
>> The do-block syntax in this case just allows you to define the said
>> function.
>>
>>
>> On Mon, Apr 28, 2014 at 8:55 AM, Peter Simon  wrote:
>>
>>> In the Julia manual, the second example in block-syntax-for-function-
>>> arguments<http://docs.julialang.org/en/latest/manual/functions/#block-syntax-for-function-arguments>
>>>  contains
>>> the following do block:
>>>
>>> open("outfile", "w") do f
>>> write(f, data)
>>> end
>>>
>>> and the documentation states that "The function argument to open receives a 
>>> handle to the opened file."  I conclude from this that the return value 
>>> (i.e., the file handle) of the open function is passed to this function f 
>>> -> write(f, data) that is used as the first argument of open.  So far, so 
>>> good (I think).  But now I go back and take another look at the first do 
>>> block example:
>>>
>>> map([A, B, C]) do x
>>> if x < 0 && iseven(x)
>>> return 0
>>> elseif x == 0
>>> return 1
>>> else
>>> return x
>>> endend
>>>
>>> I try to interpret this example in light of what I learned from the second 
>>> example.  The map function has a return value, consisting of the array [A, 
>>> B, C], modified by applying the function in the do block to each element.  
>>> If this example behaved like in the second example, then the output of the 
>>> map function should be passed as an input to the function defined in the do 
>>> block.  Clearly this doesn't happen, so the lesson I learned from the 
>>> second example doesn't apply here, apparently.  Why not?  Under what 
>>> conditions is the output of the outer function passed as an input to the 
>>> inner function?
>>>
>>> I must be looking at this wrong and would appreciate some help in getting 
>>> my mind right :-).
>>>
>>> Thanks,
>>>
>>> Peter
>>>
>>>
>>


Re: [julia-users] do block semantics

2014-04-27 Thread Amit Murthy
The actual function as defined in base/io.jl

function open(f::Function, args...)
io = open(args...)
try
f(io)
finally
close(io)
end
end

Just multiple dispatch at work. The 'open' variant without a file handle is
called first.



On Mon, Apr 28, 2014 at 9:44 AM, Peter Simon  wrote:

> Right, I don't have a problem with that.  I simply used "inner" as a way
> to refer to the function that is used as the first argument to the other
> ("outer") function.  Sorry if I abused a conventional meaning of these
> terms.
>
> I would like to know how this anonymous function (in the "open" example)
> is passed the file handle.  My confusion stems from the fact that this
> handle, to my knowledge, is not available until the "open" function
> provides it as its return value.
>
>
> On Sunday, April 27, 2014 8:59:50 PM UTC-7, Amit Murthy wrote:
>
>> It is just a way to define an anonymous function. It is not a way to
>> define an "inner" function in that sense.
>>
>>
>> On Mon, Apr 28, 2014 at 9:24 AM, Peter Simon  wrote:
>>
>>> My question concerns where this handle comes from.  Isn't the handle
>>> coming from the output of 'open'?  Since 'open' is the "outer" function of
>>> the 'do' construct, then why doesn't the outer function in the first
>>> example also supply its output as input to its inner function?
>>>
>>>
>>> On Sunday, April 27, 2014 8:40:27 PM UTC-7, Amit Murthy wrote:
>>>
>>>> Without using a do-block, you would need to pass in a function as the
>>>> first argument to 'map'.
>>>> 'open' has a variant where the first argument is again a function that
>>>> accepts an open handle.
>>>>
>>>> The do-block syntax in this case just allows you to define the said
>>>> function.
>>>>
>>>>
>>>> On Mon, Apr 28, 2014 at 8:55 AM, Peter Simon wrote:
>>>>
>>>>> In the Julia manual, the second example in block-syntax-for-function-
>>>>> arguments<http://docs.julialang.org/en/latest/manual/functions/#block-syntax-for-function-arguments>
>>>>>  contains
>>>>> the following do block:
>>>>>
>>>>> open("outfile", "w") do f
>>>>> write(f, data)
>>>>> end
>>>>>
>>>>> and the documentation states that "The function argument to open receives 
>>>>> a handle to the opened file."  I conclude from this that the return value 
>>>>> (i.e., the file handle) of the open function is passed to this function f 
>>>>> -> write(f, data) that is used as the first argument of open.  So far, so 
>>>>> good (I think).  But now I go back and take another look at the first do 
>>>>> block example:
>>>>>
>>>>> map([A, B, C]) do x
>>>>> if x < 0 && iseven(x)
>>>>> return 0
>>>>> elseif x == 0
>>>>> return 1
>>>>> else
>>>>> return x
>>>>> endend
>>>>>
>>>>> I try to interpret this example in light of what I learned from the 
>>>>> second example.  The map function has a return value, consisting of the 
>>>>> array [A, B, C], modified by applying the function in the do block to 
>>>>> each element.  If this example behaved like in the second example, then 
>>>>> the output of the map function should be passed as an input to the 
>>>>> function defined in the do block.  Clearly this doesn't happen, so the 
>>>>> lesson I learned from the second example doesn't apply here, apparently.  
>>>>> Why not?  Under what conditions is the output of the outer function 
>>>>> passed as an input to the inner function?
>>>>>
>>>>> I must be looking at this wrong and would appreciate some help in getting 
>>>>> my mind right :-).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>
>>


Re: [julia-users] Re: Problem loading libcurl in windows environment: error compiling setup_easy_handle: could not load module libcurl

2014-04-30 Thread Amit Murthy
Merged. Thanks.


On Tue, Apr 29, 2014 at 11:56 PM, Tony Kelman  wrote:

> Hi. When you cross-post the same question to both the mailing list and as
> a new Github issue, it would help people keep track of things if you could
> post links from one to the other -
> https://github.com/JuliaLang/julia/issues/6687
>
> As I wrote on the Github issue, this is due to the LibCURL.jl package (
> https://github.com/amitmurthy/LibCURL.jl, not forio/Curl.jl which is
> deprecated in lieu of Requests.jl) not handling its binary dependencies
> properly for Windows users. The Requests.jl package appears to have its
> dependencies in better order (and it even has tests! yay for tests) and
> could potentially be a replacement for LibCURL.jl and HTTPClient.jl, but
> that would require rewriting some of the Plotly code's HTTP posts. I gave
> that a quick try a few weeks ago when I was looking at trying to use
> Plotly.jl myself, but gave up for the time being since simple replacement
> wasn't working.
>
> I think I just worked out how to get LibCURL.jl to automatically download
> the curl library for you on Windows, and opened a pull request here
> https://github.com/amitmurthy/LibCURL.jl/pull/10. If you're in a hurry,
> you should be able to try out my branch by doing
>
> Pkg.rm("LibCURL")
> Pkg.clone("https://github.com/tkelman/LibCURL.jl";)
> Pkg.checkout("LibCURL", "winrpm")
>
> Then restart Julia and see if using Plotly works any better. Remember to
> eventually switch back to the official version of LibCURL, once Windows
> support is working in the original repository.
>
>
> On Tuesday, April 29, 2014 5:59:23 AM UTC-7, sam cooper wrote:
>>
>> When trying to run the Plotly.plot function in julia (running on windows
>> 8 unfortunatley I can't change this having tried extensivley) I am
>> encountering the error message:
>>
>>  error compiling setup_easy_handle: could not load module libcurl
>>
>> This problem seems to be a recurrent problem in windows environments,
>> googling it brings up several threads where this is a problem though none
>> seem to have been properly resolved:
>>
>> https://github.com/forio/Curl.jl/issues/10
>> https://www.mail-archive.com/julia-users@googlegroups.com/msg04638.html
>> https://groups.google.com/forum/#!topic/julia-users/2XpGjLUC0ow
>>
>>
>> I ran Pkg.update() as one thread suggested this had been resolved in a
>> recent update, no luck.
>>
>> Has anyone found a resolution to this? If not is there any possibility of
>> this problem being resolved in the near future or does anyone have any
>> advice on how I might set about resolving this problem myself?
>>
>> Thank you very much in advance for any help.
>> Sam
>>
>


Re: [julia-users] Re: help with parallel

2014-05-05 Thread Amit Murthy
Some discussion regarding this on github:

https://github.com/JuliaLang/julia/issues/3674
https://github.com/JuliaLang/julia/issues/5232



On Tue, May 6, 2014 at 5:55 AM, Ethan Anderes wrote:

> Thanks James:
>
> That's basically the solution that I've got implemented now. Except I use
> require("functions.jl") which seems to make these function available to all
> workers. It feels a bit un-natural since my use of parallelism is a small
> part of my code, so I end up calling require("functions.jl") twice, first
> at the top of the script, then again just after addprocs(...).
>
> I also tried looking at the source code for this stuff, in the hopes that
> I could figure out the finer points of passing/requesting variables from
> workers and local worker namespaces, but I just couldn't penetrate it. Maybe I
> need to wait till someone does a detailed blog post on a particular problem.
>
> Cheers!
>


Re: [julia-users] Re: help with parallel

2014-05-05 Thread Amit Murthy
Also, it will be great if you could file a bug regarding the scope issue
mentioned previously. I think it is a bug too.


On Tue, May 6, 2014 at 8:15 AM, Amit Murthy  wrote:

> Some discussion regarding this on github:
>
> https://github.com/JuliaLang/julia/issues/3674
> https://github.com/JuliaLang/julia/issues/5232
>
>
>
> On Tue, May 6, 2014 at 5:55 AM, Ethan Anderes wrote:
>
>> Thanks James:
>>
>> That's basically the solution that I've got implemented now. Except I use
>> require("functions.jl") which seems to make these function available to all
>> workers. It feels a bit un-natural since my use of parallelism is a small
>> part of my code, so I end up calling require("functions.jl") twice, first
>> at the top of the script, then again just after addprocs(...).
>>
>> I also tried looking at the source code for this stuff, in the hopes that
>> I could figure out the finer points of passing/requesting variable from
>> works and local worker namespaces but I just couldn't penetrate it. Maybe I
>> need to wait till someone does a detailed blog post on a particular problem.
>>
>> Cheers!
>>
>
>


Re: [julia-users] Re: Matlab urlread('URL','method',PARAMS), anyone knows the equivalent in Julia?

2014-05-06 Thread Amit Murthy
If you want to try with HTTPClient, the usage would be

using HTTPClient.HTTPC
r=HTTPC.get("http://xxx.xxx.xxx.xxx:xx/?request=var")

r.body is an IOBuffer containing the response body. If the response is
an ASCII string, you can stringify it with bytestring(r.body)
r.http_code has the response code
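
For example:

if r.http_code == 200
    println(bytestring(r.body))   # response body as a string
end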




On Tue, May 6, 2014 at 3:00 PM, Avik Sengupta wrote:

> I'm presuming you're using this in windows.
>
> The download(url) method places the content of the url into a temporary
> filename. The function should then return the name of the file in which the
> content has been placed. Unfortunately, the windows verson of the function
> seems to have a bug where it does not return the filename. The unix version
> does.
>
> As a workaround, you can use the download(url, filename) version of the
> method. So something like:
>
> filename = tempname()
> download(url, filename)
>
> The contents of the url will then be in the file represented by filename.
>
> Regards
> -
> Avik
>
>
>
> On Tuesday, 6 May 2014 10:20:30 UTC+1, joanenric barcelo wrote:
>>
>> That is right, what I want is to get the value of var as you said, ie.
>>
>> http://xxx.xxx.xxx.xxx:xx/?request=var
>>
>> so, doing
>>
>>
>> julia> dir = "http://xxx.xxx.xxx.xxx:xx/?request=var";;
>>
>> julia> a = download(dir)
>>
>> julia> println(a)
>> nothing
>>
>> julia> typeof(a)
>> Nothing (constructor with 1 method)
>>
>>
>>
>> I am not getting anything. Any ideas so far? Thanks!
>>
>>
>> El sábado, 3 de mayo de 2014 17:17:10 UTC+1, Jameson escribió:
>>>
>>> Or use the builtin `download` command. it isn't very fancy, but should
>>> get the job done.
>>>
>>> I'm not sure what matlab means by PARAMS for an HTTP GET, since the
>>> GET method doesn't take arguments. presumably though, it is rewriting
>>> the url to `http://xx.xx.xx.xx:xx/?request=value' with quoting for
>>> request and value as needed
>>>
>>> On Sat, May 3, 2014 at 6:19 AM, joanenric barcelo 
>>> wrote:
>>> > Thanks Tony for your help.
>>> >
>>> > However, I need to use Win XP for working reasons and I cannot manage
>>> to get
>>> > it work. I have raised another post with this issue
>>> > https://groups.google.com/forum/#!topic/julia-users/wPNc8T8lxX8
>>> >
>>> > thanks again!!
>>> >
>>> > El miércoles, 30 de abril de 2014 17:57:50 UTC+1, Tony Kelman
>>> escribió:
>>> >>
>>> >> I'm not sure if the functionality is in base, but presumably one of
>>> the
>>> >> http client packages (like https://github.com/loladiro/Requests.jl)
>>> could do
>>> >> what you're looking for?
>>> >>
>>> >>
>>> >> On Wednesday, April 30, 2014 8:57:17 AM UTC-7, joanenric barcelo
>>> wrote:
>>> >>>
>>> >>> Hi!
>>> >>>
>>> >>> I'm coming from Matlab and I would like to request some information
>>> >>> through IP connection. Basically, I would like to translate the
>>> following
>>> >>> Matlab command
>>> >>>
>>> >>>  urlread('URL','method',PARAMS)
>>> >>>
>>> >>>
>>> >>> concretely:
>>> >>>
>>> >>> urlread('http://xx.xx.xx.xx:xx','Get',{'request','value'})
>>> >>>
>>> >>>
>>> >>>
>>> >>> Thanks in advance!
>>> >>>
>>> >>> JoanEnric
>>>
>>


Re: [julia-users] Re: Matlab urlread('URL','method',PARAMS), anyone knows the equivalent in Julia?

2014-05-06 Thread Amit Murthy
Unfortunately I don't have access to a Windows machine to try it out. May
be related to this patch
https://github.com/amitmurthy/LibCURL.jl/pull/10 that was recently
applied. Maybe Tony can help out here.


On Tue, May 6, 2014 at 7:13 PM, joanenric barcelo wrote:

> Thanks Amit, I've tried your package as well. What I get in my machine is
> the following:
>
> Microsoft Windows XP [Version 5.1.2600]
> (C) Copyright 1985-2001 Microsoft Corp.
>
>
> C:\Documents and Settings\jbarcelo>julia
> OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using
> Nehalem
> kernels as a fallback, which may give poorer performance.
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "help()" to list help topics
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2809 (2014-04-28 22:41
> UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit d1095bb* (7 days old master)
> |__/   |  i686-w64-mingw32
>
>
> julia> using HTTPClient.HTTPC
>
> julia> r = HTTPC.get("http://xxx.xxx.xxx.xxx:xx/?request=var";)
> ERROR: error compiling get: error compiling setup_easy_handle:
>  in get at C:\Documents and Settings\user\.julia\v0.3\HTTPClient\src\HTTPC
> .j
> l:519
>
>
> Do you have any idea on what can be causing this issue? Thanks Amit
>
>
> El martes, 6 de mayo de 2014 10:55:46 UTC+1, Amit Murthy escribió:
>>
>> If you want to try with HTTPClient, the usage would be
>>
>> using HTTPClient.HTTPC
>> r=HTTPC.get("http://xxx.xxx.xxx.xxx:xx/?request=var";)
>>
>> r.body is an IOBuffer containing has the response body. If the response
>> is an ASCII string, you can stringify it with bytestring(r.body)
>> r.http_code has the response code
>>
>>
>>
>>
>> On Tue, May 6, 2014 at 3:00 PM, Avik Sengupta wrote:
>>
>>> I'm presuming you're using this in windows.
>>>
>>> The download(url) method places the content of the url into a temporary
>>> filename. The function should then return the name of the file in which the
>>> content has been placed. Unfortunately, the windows verson of the function
>>> seems to have a bug where it does not return the filename. The unix version
>>> does.
>>>
>>> As a workaround, you can use the download(url, filename) version of the
>>> method. So something like:
>>>
>>> filename = tempname()
>>> download(url, filename)
>>>
>>> The contents of the url will then be in the file represented by
>>> filename.
>>>
>>> Regards
>>> -
>>> Avik
>>>
>>>
>>>
>>> On Tuesday, 6 May 2014 10:20:30 UTC+1, joanenric barcelo wrote:
>>>>
>>>> That is right, what I want is to get the value of var as you said, ie.
>>>>
>>>> http://xxx.xxx.xxx.xxx:xx/?request=var
>>>>
>>>> so, doing
>>>>
>>>>
>>>> julia> dir = "http://xxx.xxx.xxx.xxx:xx/?request=var";;
>>>>
>>>> julia> a = download(dir)
>>>>
>>>> julia> println(a)
>>>> nothing
>>>>
>>>> julia> typeof(a)
>>>> Nothing (constructor with 1 method)
>>>>
>>>>
>>>>
>>>> I am not getting anything. Any ideas so far? Thanks!
>>>>
>>>>
>>>> El sábado, 3 de mayo de 2014 17:17:10 UTC+1, Jameson escribió:
>>>>>
>>>>> Or use the builtin `download` command. it isn't very fancy, but should
>>>>> get the job done.
>>>>>
>>>>> I'm not sure what matlab means by PARAMS for an HTTP GET, since the
>>>>> GET method doesn't take arguments. presumably though, it is rewriting
>>>>> the url to `http://xx.xx.xx.xx:xx/?request=value' with quoting for
>>>>> request and value as needed
>>>>>
>>>>> On Sat, May 3, 2014 at 6:19 AM, joanenric barcelo 
>>>>> wrote:
>>>>> > Thanks Tony for your help.
>>>>> >
>>>>> > However, I need to use Win XP for working reasons and I cannot
>>>>> manage to get
>>>>> > it work. I have raised another post with this issue
>>>>> > https://groups.google.com/forum/#!topic/julia-users/wPNc8T8lxX8
>>>>> >
>>>>> > thanks again!!
>>>>> >
>>>>> > El miércoles, 30 de abril de 2014 17:57:50 UTC+1, Tony Kelman
>>>>> escribió:
>>>>> >>
>>>>> >> I'm not sure if the functionality is in base, but presumably one of
>>>>> the
>>>>> >> http client packages (like https://github.com/loladiro/Requests.jl)
>>>>> could do
>>>>> >> what you're looking for?
>>>>> >>
>>>>> >>
>>>>> >> On Wednesday, April 30, 2014 8:57:17 AM UTC-7, joanenric barcelo
>>>>> wrote:
>>>>> >>>
>>>>> >>> Hi!
>>>>> >>>
>>>>> >>> I'm coming from Matlab and I would like to request some
>>>>> information
>>>>> >>> through IP connection. Basically, I would like to translate the
>>>>> following
>>>>> >>> Matlab command
>>>>> >>>
>>>>> >>>  urlread('URL','method',PARAMS)
>>>>> >>>
>>>>> >>>
>>>>> >>> concretely:
>>>>> >>>
>>>>> >>> urlread('http://xx.xx.xx.xx:xx','Get',{'request','value'})
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> Thanks in advance!
>>>>> >>>
>>>>> >>> JoanEnric
>>>>>
>>>>
>>


Re: [julia-users] How can a remote proc self-discover DArray localparts?

2014-05-08 Thread Amit Murthy
It can't. All part references are in the DArray object, `dz` in your
example above.
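
If you pass the DArray itself to the worker, code running there can find its
own piece with localpart. For example (mysum is just an illustrative helper):

@everywhere mysum(d) = sum(localpart(d))
remotecall_fetch(2, mysum, dz)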


On Thu, May 8, 2014 at 1:07 AM, Rick Graham  wrote:

> How can a remote proc discover, on its own, that it is "hosting" a portion
> of a DArray?
>
> Here a DArray is distributed across two workers, but the workers'
> `whos` don't see it.
>
> $ ./julia
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "help()" to list help topics
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2921 (2014-05-07 17:56
> UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit ea70e4d* (0 days old master)
> |__/   |  i686-redhat-linux
>
>
> julia> versioninfo()
> Julia Version 0.3.0-prerelease+2921
> Commit ea70e4d* (2014-05-07 17:56 UTC)
> Platform Info:
>   System: Linux (i686-redhat-linux)
>   CPU: Genuine Intel(R) CPU   T2250  @ 1.73GHz
>   WORD_SIZE: 32
>   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>
>
> julia> addprocs(2)
> 2-element Array{Any,1}:
>  2
>  3
>
>
> julia> remotecall_fetch(2, whos)
>  From worker 2: Base  Module
>  From worker 2: Core  Module
>  From worker 2: Main  Module
>
>
> julia> dz=dzeros(512)
> 512-element DArray{Float64,1,Array{Float64,1}}:
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  ⋮
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>
>
> julia> dz.chunks
> 2-element Array{RemoteRef,1}:
>  RemoteRef(2,1,3)
>  RemoteRef(3,1,4)
>
>
> julia> dz.indexes
> 2-element Array{(UnitRange{Int32},),1}:
>  (1:256,)
>  (257:512,)
>
>
> julia> remotecall_fetch(2, whos)
>  From worker 2: Base  Module
>  From worker 2: Core  Module
>  From worker 2: Main  Module
>
>
> julia>
>
>
> If you tell the remote proc the name of the DArray, it can find its part.
>
> julia> remotecall_fetch(2, localpart, dz)
> 256-element Array{Float64,1}:
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  ⋮
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>  0.0
>
>
> julia>
>
>
>
>


Re: [julia-users] pmap/map avoid return by value

2014-05-29 Thread Amit Murthy
pmap always copies since the functions are executed in a different worker
process. Do you want to collect the results of pmap execution in a
dictionary? What is the actual problem you are trying to solve?


On Fri, May 30, 2014 at 12:53 AM, Kuba Roth  wrote:

> Hi There,
> I just want to confirm this is a desired behavior of map/pmap functions.
> What I am noticing is that in case of using pmap functions need to return
> data (copy of input data),  whereas map works fine with referenced values.
> I'd love to have similar behaviour in pmap since it is much more lightweight
> especially when dealing with large datasets,  but the more I think about
> it, the more it makes sense to me that this is not going to happen due to
> the type of parallelism Julia supports at the moment (message passing).
>
> Could please someone more competent verify if the code attached below is
> correct? What I was hoping for is to have a pmap working with the functions
> where data is passed around as a reference (no return statements).
> Unfortunately this is only true when using single threaded map() function.
>
> I was wondering are there any alternatives at the moment? I know there is
> a SharedArray option but so far I didn't have a chance to fully
> comprehend it. As far as I understand, distributed arrays will give me
> a similar bottleneck to pmap where data has to be copied back into the original
> container...?
> Thanks, any feedback is much appreciated.
>
> kuba
>
> ##
> addprocs(2)
>
> @everywhere function testRef(elem)
>id = elem[1]
>d = elem[2]
>d[id]="___$id"
> end
>
>
> @everywhere function testCopy(elem)
>id = elem[1]
>d = elem[2]
>d[id]="___$id"
>return elem
> end
>
> ids = [[i,Dict()] for i=(1:3)]
> pmap(testRef, ids) # pmap - Dictionary is empty
> print("=== pmap ref:", ids)
> map(testRef, ids)  # map  - works as expected
> print("=== map ref:", ids)
> ids_copy = pmap(testCopy, ids) # works - but creates a copy - slow
> print("=== pmap copy:", ids_copy)
>
>
> ##
> Output:
> === pmap ref:1
>  Dict{Any,Any}()
> 2 Dict{Any,Any}()
> 3 Dict{Any,Any}()
> === map ref:1
>  {1=>"___1"}
> 2 {2=>"___2"}
> 3 {3=>"___3"}
> === pmap copy:1
> {1=>"___1"}
> 2 {2=>"___2"}
> 3 {3=>"___3"}
>
>


Re: [julia-users] Parallel, strange behavior of the loop after changing the value to a variable, 6 time slower

2014-06-01 Thread Amit Murthy
@everywhere  const k=2


On Sun, Jun 1, 2014 at 8:26 PM, Andreas Noack Jensen <
andreasnoackjen...@gmail.com> wrote:

> I don't know why you get that error. It is not there on my machine.
>
> However, for some reason defining k as const does not work for parallel
> loops. It only makes a difference for serial loops so my explanation was
> wrong. Is this expected?
>
> julia> const k1=1000;k2 = k1;
>
> julia> @time for i = 1:1000
>int(randbool())
>end
> elapsed time: 0.041954142 seconds (0 bytes allocated)
>
> julia> @time for i = 1:k1
>int(randbool())
>end
> elapsed time: 0.042543885 seconds (0 bytes allocated)
>
> julia> @time for i = 1:k2
>int(randbool())
>end
> elapsed time: 2.923224821 seconds (639983688 bytes allocated)
>
> julia> @time tmp = @parallel (+) for i = 1:1000
>int(randbool())
>end
> elapsed time: 0.05594257 seconds (155072 bytes allocated)
>
> julia> @time tmp = @parallel (+) for i = 1:k1
>int(randbool())
>end
> elapsed time: 3.067074328 seconds (640486512 bytes allocated)
>
> julia> @time tmp = @parallel (+) for i = 1:k2
>int(randbool())
>end
> elapsed time: 3.021796716 seconds (640090424 bytes allocated)
>
>
> 2014-06-01 16:28 GMT+02:00 paul analyst :
>
> After restart julia :
>>
>>
>> D:\install\Julia\Julia 0.3.0-prerelease-win64-ver3\Julia 0.3.0-prerelease
>> ver 3>bin\julia.exe -p 8
>>_
>>_   _ _(_)_ |  A fresh approach to technical computing
>>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>>_ _   _| |_  __ _   |  Type "help()" to list help topics
>>   | | | | | | |/ _` |  |
>>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2599 (2014-04-11 23:52
>> UTC)
>>  _/ |\__'_|_|_|\__'_|  |  Commit bf7096c (50 days old master)
>> |__/   |  x86_64-w64-mingw32
>>
>> julia> const k=2
>>
>> 2
>>
>> julia> tic();
>>
>> julia> nheads = @parallel (+) for i=1:k
>>int(randbool())
>>end
>> exception on exception on 3: 4: ERROR: ERROR: exception on exception on
>> 2: exception on 7: ERROR: 5:
>>  ERROR: ERROR: exception on k not defined6: exception on ERROR:
>>  in anonymous at no file:1379
>> 8k:  not definedERROR:
>>  in anonymous at no file:1379
>> k not definedk not defined
>>  in anonymous at no file:1379
>>
>>  in anonymous at no file:1379
>> k not definedk not definedk not defined
>>  in anonymous at no file:1379
>>
>>  in anonymous at no file:1379
>>
>>  in anonymous at no file:1379exception on
>> 9: ERROR: k not defined
>>  in anonymous at no file:1379
>> ERROR: no method +(UndefVarError, UndefVarError)
>>  in mr_pairwise at reduce.jl:534
>>
>> julia> toc()
>> elapsed time: 2.493525081 seconds
>> 2.493525081
>>
>> julia>
>>
>> W dniu niedziela, 1 czerwca 2014 16:27:14 UTC+2 użytkownik paul analyst
>> napisał:
>>
>>>
>>>
>>>
>>>
>>> *julia> const k=2;ERROR: cannot declare k constant; it already
>>> has a value :/Paul*
>>> W dniu niedziela, 1 czerwca 2014 16:13:27 UTC+2 użytkownik Andreas Noack
>>> Jensen napisał:

 This is the usual problem with global variables in Julia. If you define
 k by

 const k=2

 the timing results should be similar.


 2014-06-01 16:09 GMT+02:00 paul analyst :

>
> D:\install\Julia\Julia 0.3.0-prerelease-win64-ver3\Julia
> 0.3.0-prerelease ver 3>bin\julia.exe -p 8
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "help()" to list help topics
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2599 (2014-04-11
> 23:52 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit bf7096c (50 days old master)
> |__/   |  x86_64-w64-mingw32
>
> julia> procs()
> 9-element Array{Int64,1}:
>  1
>  2
>  3
>  4
>  5
>  6
>  7
>  8
>  9
>
> julia> tic();
>
> julia> nheads = @parallel (+) for i=1:2
>int(randbool())
>end
> 16468
>
> julia> toc()
> elapsed time: 2.77418807 seconds
> 2.77418807
>
> julia>exit()
>
>
> D:\install\Julia\Julia 0.3.0-prerelease-win64-ver3\Julia
> 0.3.0-prerelease ver 3>bin\julia.exe -p 8
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "help()" to list help topics
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2599 (2014-04-11
> 23:52 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit bf7096c (50 days old master)
> |__/   |  x86_64-w64-mingw32
>
> julia> k=2
> 2
>
> julia> tic();
>
>>

Re: [julia-users] pmap/map avoid return by value

2014-06-02 Thread Amit Murthy
Do note that SharedArrays will only work with bitstype arrays, not with,
for example, an array of Strings.

Assuming that you are parsing these files and generating a large number of
small strings, I would suspect that the time being taken is in serializing
and deserializing this large number of small strings.

Just as an example, serializing and deserializing 2 million small strings of
an average length of 6 bytes takes around 4 seconds on my machine while a
single 12MB string takes 0.03 seconds.

I haven't tried, but it may be faster to return a single large delimited
string (one line for every key-value pair with your own
delimiter) and then build your dictionary on the master process. Overall
you will be generating only half the number of small strings.
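
Something along these lines might work (a rough, untested sketch -
parse_file and listoffiles stand in for your own parser and file list):

@everywhere function parse_to_blob(file)
    buf = IOBuffer()
    for (k, v) in parse_file(file)   # parse_file is your own parsing routine
        println(buf, k, '\t', v)
    end
    takebuf_string(buf)              # one big delimited string per file
end

d = Dict()
for blob in pmap(parse_to_blob, listoffiles)
    for line in split(blob, '\n')
        isempty(line) && continue
        k, v = split(line, '\t')
        d[k] = v
    end
end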




On Sat, May 31, 2014 at 12:15 PM, Kuba Roth  wrote:

> Hi Amit,
> Well in my case I'm parsing a bunch of files, store results in
> dictionaries which are merged back into one big array of dictionaries.
> Since each file can be parsed independently pmap seems to be good and clean
> fit. But because size of each Dictionary is quite big merging the data back
> is super slow.
> Perhaps pmap is not  best answer to this problem and I should look further
> into shared arrays (which unfortunately I haven't had time right now)
> kuba
>


Re: [julia-users] Parallel, strange behavior of the loop after changing the value to a variable, 6 time slower

2014-06-02 Thread Amit Murthy
Sorry, I added the @everywhere after seeing the "k not defined" messages in
Paul's post. It is not required. Don't know why he saw that.

As for the timings, it does seem like the usual issue with globals. Wrapping
the example in a function gives the same performance for both value and
variable.
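
For example (untested):

function nheads(n)
    @parallel (+) for i = 1:n
        int(randbool())
    end
end

@time nheads(1000)   # similar timing whether the bound is a literal or a variable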


On Mon, Jun 2, 2014 at 12:48 PM, Andreas Noack Jensen <
andreasnoackjen...@gmail.com> wrote:

> It still does not work
>
> julia> @everywhere const k = 1000;
>
> julia> @time tmp = @parallel (+) for i = 1:k;
>int(randbool())
>end
> elapsed time: 3.067388361 seconds (640090568 bytes allocated)
>
>
> julia> @time tmp = @parallel (+) for i = 1:1000;
>int(randbool())
>end
> elapsed time: 0.055685151 seconds (155072 bytes allocated)
>
>
> 2014-06-02 5:26 GMT+02:00 Amit Murthy :
>
> >
> > @everywhere  const k=2
> >
> >
> > On Sun, Jun 1, 2014 at 8:26 PM, Andreas Noack Jensen <
> andreasnoackjen...@gmail.com> wrote:
> >>
> >> I don't know why you get that error. It is not there on my machine.
> >>
> >> However, for some reason defining k as const does not work for parallel
> loops. It only makes a difference for serial loops so my explanation was
> wrong. Is this expected?
> >>
> >> julia> const k1=1000;k2 = k1;
> >>
> >> julia> @time for i = 1:1000
> >>int(randbool())
> >>end
> >> elapsed time: 0.041954142 seconds (0 bytes allocated)
> >>
> >> julia> @time for i = 1:k1
> >>int(randbool())
> >>end
> >> elapsed time: 0.042543885 seconds (0 bytes allocated)
> >>
> >> julia> @time for i = 1:k2
> >>int(randbool())
> >>end
> >> elapsed time: 2.923224821 seconds (639983688 bytes allocated)
> >>
> >> julia> @time tmp = @parallel (+) for i = 1:1000
> >>int(randbool())
> >>end
> >> elapsed time: 0.05594257 seconds (155072 bytes allocated)
> >>
> >> julia> @time tmp = @parallel (+) for i = 1:k1
> >>int(randbool())
> >>end
> >> elapsed time: 3.067074328 seconds (640486512 bytes allocated)
> >>
> >> julia> @time tmp = @parallel (+) for i = 1:k2
> >>int(randbool())
> >>end
> >> elapsed time: 3.021796716 seconds (640090424 bytes allocated)
> >>
> >>
> >> 2014-06-01 16:28 GMT+02:00 paul analyst :
> >>
> >>> After restart julia :
> >>>
> >>>
> >>> D:\install\Julia\Julia 0.3.0-prerelease-win64-ver3\Julia
> 0.3.0-prerelease ver 3>bin\julia.exe -p 8
> >>>_
> >>>_   _ _(_)_ |  A fresh approach to technical computing
> >>>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
> >>>_ _   _| |_  __ _   |  Type "help()" to list help topics
> >>>   | | | | | | |/ _` |  |
> >>>   | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2599 (2014-04-11
> 23:52 UTC)
> >>>  _/ |\__'_|_|_|\__'_|  |  Commit bf7096c (50 days old master)
> >>> |__/   |  x86_64-w64-mingw32
> >>>
> >>> julia> const k=2
> >>>
> >>> 2
> >>>
> >>> julia> tic();
> >>>
> >>> julia> nheads = @parallel (+) for i=1:k
> >>>int(randbool())
> >>>end
> >>> exception on exception on 3: 4: ERROR: ERROR: exception on exception
> on 2: exception on 7: ERROR: 5:
> >>>  ERROR: ERROR: exception on k not defined6: exception on ERROR:
> >>>  in anonymous at no file:1379
> >>> 8k:  not definedERROR:
> >>>  in anonymous at no file:1379
> >>> k not definedk not defined
> >>>  in anonymous at no file:1379
> >>>
> >>>  in anonymous at no file:1379
> >>> k not definedk not definedk not defined
> >>>  in anonymous at no file:1379
> >>>
> >>>  in anonymous at no file:1379
> >>>
> >>>  in anonymous at no file:1379exception on
> >>> 9: ERROR: k not defined
> >>>  in anonymous at no file:1379
> >>> ERROR: no method +(UndefVarError, UndefVarError)
> >>>  in mr_pairwise at reduce.jl:534
> >>>
> >>> julia> toc()
> >>> elapsed time: 2.493525081 seconds
> >>>

Re: [julia-users] pmap/map avoid return by value

2014-06-02 Thread Amit Murthy
A single Char array, yes. An array of Char arrays, no. Dicts, no.


On Tue, Jun 3, 2014 at 5:44 AM, Kuba Roth  wrote:

> I can probably get away with String type by using Char arrays instead? How
> about Dict hashtables? Is there any way I can store them in shared arrays?
>
> Going back to pmap. I think I need to shuffle around the workflow and
> perform most of the computation at each process, returning just what is
> really needed and not the whole parsed data.
>
>


Re: [julia-users] @parallel for not using all julia processes?

2014-06-04 Thread Amit Murthy
https://github.com/amitmurthy/MessageUtils.jl could be useful for your own
task distribution.


On Wed, Jun 4, 2014 at 9:39 PM, Kevin Squire  wrote:

> It is the case currently that the array itself is divided up evenly. One
> way to deal with this (in the back end) is work stealing, but that hasn't
> been implemented in Julia yet.
>
> Cheers,
>Kevin
>
>
> On Wednesday, June 4, 2014, Jutho  wrote:
>
>> I probably already have an idea what's going on. How are the different
>> tasks distributed over the different Julia processes? Is the for loop
>> immediately cut into pieces where e.g. process 1 will handle the cases
>> iter=1:10, process 2 handles the cases iter=11:20 and so on? For different
>> values of the parameters, the execution time will be widely different (from
>> fraction of a second to several minutes or even more). If some processes
>> handle all the slow cases and other all the fast cases, then this explains
>> the behaviour I am seeing. I guess I need to write my own task
>> distribution, for which I will have to read the Manual section on parallel
>> computing again.
>>
>>
>>


Re: [julia-users] what is wrong in my code: pmap and @parallel

2014-06-10 Thread Amit Murthy
In the pmap case,  [(x, y) for x=1:5, y=1:5]  is an array of tuples, so f2
defined as below would work.

@everywhere function f2 (v)
 x,y=v
 return x + 1 + y
end
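
With f2 defined this way the original call works:

pmap(f2, [(x, y) for x=1:5, y=1:5])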

For the darray, push! is not implemented for a darray - a darray cannot be
grown once instantiated.




On Wed, Jun 11, 2014 at 5:18 AM, Zahirul ALAM 
wrote:

> I need help in understanding what am I doing wrong in each of the
> following cases:
>
> @everywhere function f2 (x, y)
>   return x + 1 + y
> end
> dArray = dzeros(5)
>
> @parallel for i=1:5
>   b= map(f2, [x, x])
>   push!(dArray, b)
> end
>
> dArray returns are still zero and the dimensions have not changed
>
> 
>
> pmap(f2, [(x, y) for x=1:5, y=1:5] )
>
> returns 4-element Array{Any,1}:
>
>  MethodError(f2,((1,1),))
>  MethodError(f2,((2,2),))
>  MethodError(f2,((3,3),))
>  MethodError(f2,((4,4),))
>
>
> Please help me understand what are the things I am doing wrong?
>
>


Re: [julia-users] Unable to reach Azure VM through --machinefile or addprocs, but fine through ssh command

2015-07-09 Thread Amit Murthy
keyword option "tunnel=true" could help
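
For example (substitute the actual user and hostname):

addprocs(["azureuser@my-azure-vm"]; tunnel=true)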

On Thu, Jul 9, 2015 at 7:53 PM, András Kalmár Nagy <
kalmar.nagy.and...@gmail.com> wrote:

> Hi everyone!
>
> I configured an Azure instance for passwordless SSH, and I can connect to
> it without problems from my shell, but Julia fails to connect and I get
> timed out after 60 secs.
>
> I tried putting the host in a machinefile, using it through addprocs (all
> combinations of specifying an explicit username and/or IP instead of the
> hostname).
>
> The julia executable is in the same place on all machines, I can connect
> to two other machines on the LAN, and they work.
>
> What could be the problem?
>


Re: [julia-users] Unable to reach Azure VM through --machinefile or addprocs, but fine through ssh command

2015-07-10 Thread Amit Murthy
Hmm, the tunnel option is expected to work when only the ssh port is open.
Is it possible that the ssh connection setup is throwing up a dialog in the
background?

For example, when using a keyfile to login to an AWS instance, I use

sshflags = `-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o
LogLevel=ERROR -i $(ec2_keyfile)`

which prevents the "add to hosts file" dialog from popping up.
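
These can be passed straight to addprocs, for example (user, host and key
file are placeholders):

addprocs(["ubuntu@ec2-host"]; tunnel=true, sshflags=sshflags)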


On Fri, Jul 10, 2015 at 3:49 AM, Avik Sengupta 
wrote:

> I believe azure VM's by default are firewalled off. Have you opened the
> TCP ports that Julia uses to communicate?
>
>
> On Thursday, 9 July 2015 17:33:45 UTC+1, András Kalmár Nagy wrote:
>>
>> I tried tunnel=true... I can now see julia worker processes being spawned
>> on the remote machine, which is a good thing, but after a while, nothing
>> happens and they die. On the local side, I get this error:
>>
>> ERROR: connect: connection timed out (ETIMEDOUT)
>>  in wait at ./task.jl:284
>>  in wait at ./task.jl:194
>>  in stream_wait at stream.jl:263
>>  in wait_connected at stream.jl:301
>>  in Worker at multi.jl:113
>>  in anonymous at task.jl:905
>>
>> I'm still trying some things. Does julia use ~/.ssh/config?
>>
>> On Thursday, July 9, 2015 at 4:30:03 PM UTC+2, Amit Murthy wrote:
>>>
>>> keyword option "tunnel=true" could help
>>>
>>> On Thu, Jul 9, 2015 at 7:53 PM, András Kalmár Nagy <
>>> kalmar.na...@gmail.com> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> I configured an Azure instance for passwordless SSH, and I can connect
>>>> to it without problems from my shell, but Julia fails to connect and I get
>>>> timed out after 60 secs.
>>>>
>>>> I tried putting the host in a machinefile, using it through addprocs
>>>> (all combinations of specifying an explicit username and/or IP instead of
>>>> the hostname).
>>>>
>>>> The julia executable is in the same place on all machines, I can
>>>> connect to two other machines on the LAN, and they work.
>>>>
>>>> What could be the problem?
>>>>
>>>
>>>


Re: [julia-users] distributed parallel processing on Windows cluster

2015-08-12 Thread Amit Murthy
If you can write a ClusterManager (look at ClusterManagers.jl for some
examples), there is no particular reason why it should not be possible.
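
A very rough, untested sketch of the 0.4-style interface (the type name and
the launch details are hypothetical):

immutable WindowsSSHManager <: Base.ClusterManager
    machines::Vector{ASCIIString}
end

function Base.launch(cm::WindowsSSHManager, params::Dict, launched::Array, c::Condition)
    # for each machine: start `julia --worker` remotely (e.g. via an ssh
    # server such as Bitvise), push a Base.WorkerConfig describing the new
    # worker onto `launched`, and notify(c)
    error("launch logic not implemented in this sketch")
end

function Base.manage(cm::WindowsSSHManager, id::Integer, config::Base.WorkerConfig, op::Symbol)
    # called with :register, :interrupt and :deregister events; often a no-op
end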

On Thu, Aug 13, 2015 at 1:50 AM, Marcio Sales  wrote:

> Is it possible?


Re: [julia-users] Adding remote workers on windows

2015-09-07 Thread Amit Murthy
Is port 9009 open on the remote machine? You could try with "tunnel=true"
if it is not open.

On Mon, Sep 7, 2015 at 4:32 PM, Greg Plowman  wrote:

> Hi,
>
> I'm trying to use addprocs() to add remote workers on another windows
> machine.
>
> I'm using a ssh server for windows (Bitvise) with a modified Cluster
> Manager, and have successfully used this method in another environment.
> So I know that it works, although one difference is Window 7 (works) vs
> Windows 8.1 (does not work), but I don't think this should be a problem.
>
> Now, I don't expect anyone to troubleshoot my particular setup /
> environment / customisation.
> Rather I was hoping for some high level help with further diagnosis.
>
> I can confirm that the windows command to launch the remote worker is
> executed, and the remote machine receives a connection and then successful
> login.
> The remote ssh server shows a successful connection and login, and windows
> Task Manager shows a Julia process has started.
> Then the following error occurs on the local machine, after which the
> remote session is terminated.
>
> Error evaluating c:\Users\Greg\Julia6\src\Launcher.jl:
> connect: connection timed out (ETIMEDOUT)
>  in wait at task.jl:284
>  in wait at task.jl:194
>  in stream_wait at stream.jl:263
>  in wait_connected at stream.jl:301
>  in Worker at multi.jl:113
>  in create_worker at multi.jl:1064
>  in start_cluster_workers at multi.jl:1028
>
> I guess my first question is which side (local or remote) is failing.
> It seems to me that the local Julia process is waiting for some
> confirmation of connection? Does that sound right?
> If so, are there any suggestions on how to further diagnose problem.
>
> When the ssh command to start a remote Julia worker is executed from the
> windows command line, I get the following:
> julia_worker:9009#192.168.1.107
>
> Then after about 60s:
> Master process (id 1) could not connect within 60.0 seconds.
> exiting.
>
> Presumably this is the expected behaviour, since the remote worker process
> is not communicating with master Julia process?
>
> Maybe the remote Julia.exe command is not receiving the --worker argument
> properly?
>
> As I said, my method works in another environment (which incidentally
> seems like magic to me).
> I'm not really sure what is different here.
> So any suggestions would be appreciated.
>
> Thanks, Greg
>


Re: [julia-users] Adding remote workers on windows

2015-09-07 Thread Amit Murthy
I meant the remote machine/network may be firewalled to only accept
incoming ssh, http and other known ports.

On Tue, Sep 8, 2015 at 5:49 AM, greg_plowman via julia-users <
julia-users@googlegroups.com> wrote:

> Is port 9009 open on the remote machine? You could try with "tunnel=true"
>> if it is not open.
>
>
> I think so.
> After running addprocs() and before the wait error, netstat on the remote
> machine outputs the following:
>
> C:\Users\Greg>netstat -an
> Active Connections
>   Proto  Local Address  Foreign AddressState
>   TCP0.0.0.0:22 0.0.0.0:0  LISTENING
>   TCP0.0.0.0:1350.0.0.0:0  LISTENING
>   TCP0.0.0.0:4450.0.0.0:0  LISTENING
>   TCP0.0.0.0:5540.0.0.0:0  LISTENING
>   TCP0.0.0.0:2869   0.0.0.0:0  LISTENING
>   TCP0.0.0.0:3389   0.0.0.0:0  LISTENING
>   TCP0.0.0.0:5357   0.0.0.0:0  LISTENING
>   TCP0.0.0.0:8092   0.0.0.0:0  LISTENING
>   TCP0.0.0.0:9009   0.0.0.0:0  LISTENING
>   TCP0.0.0.0:10243  0.0.0.0:0  LISTENING
>   TCP0.0.0.0:26143  0.0.0.0:0  LISTENING
>   TCP0.0.0.0:47984  0.0.0.0:0  LISTENING
> ...
>
> When the remote session terminates, the 9009 entry is missing from netstat
> output.
>
>
> On Monday, September 7, 2015 at 9:24:38 PM UTC+10, Amit Murthy wrote:
>
>> Is port 9009 open on the remote machine? You could try with "tunnel=true"
>> if it is not open.
>>
>> On Mon, Sep 7, 2015 at 4:32 PM, Greg Plowman  wrote:
>>
>>> Hi,
>>>
>>> I'm trying to use addprocs() to add remote workers on another windows
>>> machine.
>>>
>>> I'm using a ssh server for windows (Bitvise) with a modified Cluster
>>> Manager, and have successfully used this method in another environment.
>>> So I know that it works, although one difference is Window 7 (works) vs
>>> Windows 8.1 (does not work), but I don't think this should be problem.
>>>
>>> Now, I don't expect anyone to troubleshoot my particular setup /
>>> environment / customisation.
>>> Rather I was hoping for some high level help with further diagnosis.
>>>
>>> I can confirm that the windows command to launch the remote worker is
>>> executed, and the remote machine receives a connection and then successful
>>> login.
>>> The remote ssh server shows a successful connection and login, and
>>> windows Task Manager shows a Julia process has started.
>>> Then the following error occurs on the local machine, after which the
>>> remote session is terminated.
>>>
>>> Error evaluating c:\Users\Greg\Julia6\src\Launcher.jl:
>>> connect: connection timed out (ETIMEDOUT)
>>>  in wait at task.jl:284
>>>  in wait at task.jl:194
>>>  in stream_wait at stream.jl:263
>>>  in wait_connected at stream.jl:301
>>>  in Worker at multi.jl:113
>>>  in create_worker at multi.jl:1064
>>>  in start_cluster_workers at multi.jl:1028
>>>
>>> I guess my first question is which side (local or remote) is failing.
>>> It seems to me that the local Julia process is waiting for some
>>> confirmation of connection? Does that sound right?
>>> If so, are there any suggestions on how to further diagnose problem.
>>>
>>> When the ssh command to start a remote Julia worker is executed from the
>>> windows command line, I get the following:
>>> julia_worker:9009#192.168.1.107
>>>
>>> Then after about 60s:
>>> Master process (id 1) could not connect within 60.0 seconds.
>>> exiting.
>>>
>>> Presumably this is the expected behaviour, since the remote worker
>>> process is not communicating with master Julia process?
>>>
>>> Maybe the remote Julia.exe command is not receiving the --worker
>>> argument properly?
>>>
>>> As I said, my method works in another environment (which incidentally
>>> seems like magic to me).
>>> I'm not really sure what is different here.
>>> So any suggestions would be appreciated.
>>>
>>> Thanks, Greg
>>>
>>
>>


[julia-users] Pkg.publish() woes

2015-09-21 Thread Amit Murthy
julia> Pkg.publish()

INFO: Validating METADATA
INFO: Pushing LibCURL permanent tags: v0.1.6
INFO: Submitting METADATA changes
INFO: Forking JuliaLang/METADATA.jl to amitmurthy
INFO: Recompiling stale cache file /home/amitm/.julia/lib/v0.4/JSON.ji for 
module JSON.
Enter host password for user 'amitmurthy':
INFO: Two-factor authentication in use.  Enter auth code.  (You may have to 
re-enter your password.)
Authentication code: xx
Enter host password for user 'amitmurthy':
INFO: Retrieving existing GitHub token. (You may have to re-enter your 
password twice more.)
Enter host password for user 'amitmurthy':
New authentication code: xx
Enter host password for user 'amitmurthy':
INFO: Could not authenticate with existing token. Deleting token and trying 
again.
Enter host password for user 'amitmurthy':
INFO: Two-factor authentication in use.  Enter auth code.  (You may have to 
re-enter your password.)
Authentication code: xx
Enter host password for user 'amitmurthy':
INFO: Retrieving existing GitHub token. (You may have to re-enter your 
password twice more.)
Enter host password for user 'amitmurthy':
New authentication code: xx
Enter host password for user 'amitmurthy':
ERROR: forking JuliaLang/METADATA.jl failed: Bad credentials
 in error at ./error.jl:21
 in fork at pkg/github.jl:144
 in pull_request at pkg/entry.jl:327
 in publish at pkg/entry.jl:394
 in anonymous at pkg/dir.jl:31
 in cd at file.jl:22
 in cd at pkg/dir.jl:31
 in publish at pkg.jl:61



After entering credentials umpteen times, it just barfs out. Any pointers?






Re: [julia-users] Re: Pkg.publish() woes

2015-09-21 Thread Amit Murthy
Thanks!

On Tue, Sep 22, 2015 at 9:56 AM, Seth  wrote:

> I got it to work and wrote my process up here:
> https://github.com/JuliaLang/julia/issues/10766
>
>
> On Monday, September 21, 2015 at 9:05:38 PM UTC-7, Amit Murthy wrote:
>>
>> julia> Pkg.publish()
>>
>> INFO: Validating METADATA
>> INFO: Pushing LibCURL permanent tags: v0.1.6
>> INFO: Submitting METADATA changes
>> INFO: Forking JuliaLang/METADATA.jl to amitmurthy
>> INFO: Recompiling stale cache file /home/amitm/.julia/lib/v0.4/JSON.ji
>> for module JSON.
>> Enter host password for user 'amitmurthy':
>> INFO: Two-factor authentication in use.  Enter auth code.  (You may have
>> to re-enter your password.)
>> Authentication code: xx
>> Enter host password for user 'amitmurthy':
>> INFO: Retrieving existing GitHub token. (You may have to re-enter your
>> password twice more.)
>> Enter host password for user 'amitmurthy':
>> New authentication code: xx
>> Enter host password for user 'amitmurthy':
>> INFO: Could not authenticate with existing token. Deleting token and
>> trying again.
>> Enter host password for user 'amitmurthy':
>> INFO: Two-factor authentication in use.  Enter auth code.  (You may have
>> to re-enter your password.)
>> Authentication code: xx
>> Enter host password for user 'amitmurthy':
>> INFO: Retrieving existing GitHub token. (You may have to re-enter your
>> password twice more.)
>> Enter host password for user 'amitmurthy':
>> New authentication code: xx
>> Enter host password for user 'amitmurthy':
>> ERROR: forking JuliaLang/METADATA.jl failed: Bad credentials
>>  in error at ./error.jl:21
>>  in fork at pkg/github.jl:144
>>  in pull_request at pkg/entry.jl:327
>>  in publish at pkg/entry.jl:394
>>  in anonymous at pkg/dir.jl:31
>>  in cd at file.jl:22
>>  in cd at pkg/dir.jl:31
>>  in publish at pkg.jl:61
>>
>>
>>
>> After entering credentials umpteen number of times, it just barfs out.
>> Any pointers?
>>
>>
>>
>>
>>


Re: [julia-users] memory usage of worker processes

2015-02-22 Thread Amit Murthy
There are a bunch of memory-related issues w.r.t. distributed computing
still pending resolution. I guess an `@everywhere gc()` in between the
`remotecall_fetch` calls did not help?

I suspect that https://github.com/JuliaLang/julia/issues/6597 is a probable
cause of these leaks.
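
A minimal sketch of that suggestion (assuming Julia was started with -p 1 as in
the session below; the array size is chosen to roughly match the ~800MB figure):

A = randn(10000*10000)
for k in 1:5
    A[1] = remotecall_fetch(2, x -> (x[1] = k), A)   # same pattern as below
    @everywhere gc()   # ask the master and the worker to collect between calls
end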

On Mon, Feb 23, 2015 at 12:41 PM, Zhixuan Yang  wrote:

>
> Hello everyone,
>
> If I have a very large array in the main process and I use remotecall() or
> pmap() to copy the array to worker processes and modify the array in
> parallel (all modifications are wrapped in a function). After returning
> from the worker process, will the copied array be released?
>
> See the following REPL session run on my laptop (OS X with 8GB memory)
>
> $ ~/julia/julia -p 1
>
> julia> A = randn(10000*10000);
> # According to the system monitor, the main process of julia used about
> 800MB of memory, and the worker process used about 80MB
>
> julia> A[1] = remotecall_fetch(2, x->(x[1] = 1.0), A);
> # Now the main process used about 1.6GB of memory, the worker process used
> about 800MB
>
> julia> @everywhere gc()
>
> # Now Both the main proess and the worker process used about 800MB of
> memory, the copied array in the worker process wasn't released
>
>
> julia> A[1] = remotecall_fetch(2, x->(x[1] = 2.0), A);
> # If I want to iterate the computing, the situation gets worse. Now the
> worker process used about 1.6GB
>
>
> julia> A[1] = remotecall_fetch(2, x->(x[1] = 3.0), A);
> # worker process used about 2.4GB now
>
>
> julia> A[1] = remotecall_fetch(2, x->(x[1] = 4.0), A);
> # worker process used about 3GB
>
>
> julia> A[1] = remotecall_fetch(2, x->(x[1] = 5.0), A);
> # worker process used about 3.8GB
>
> In my real code, the array is even larger and there is more processes.
> After one or two iterations of pmap(), the computation becomes much slower
> than the first iteration. I think it's because the huge memory consumption
> triggers page swapping constantly.
>
> PS. In fact I prefer using shared memory or multithreading in my project,
> but I don't know how to share an object of a user-defined type other than a
> shared array.
>
> Regards, Yang Zhixuan
>
>


Re: [julia-users] parallel, through a script and the REPL

2015-03-08 Thread Amit Murthy
@sync @parallel for i=1:1000
    assert(get("http://127.0.0.1:5000/").http_code == 200)
end

@parallel without a reducer is asynchronous
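
A sketch of the full script with the timing wrapped around the synchronised
loop (same endpoint as in the original question):

using HTTPClient

tic()
@sync @parallel for i = 1:1000
    assert(get("http://127.0.0.1:5000/").http_code == 200)
end
toc()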



On Mon, Mar 9, 2015 at 4:36 AM, empty account 
wrote:

> Hi,
>
> I am trying to time how long x amount of http requests take.
>
> using HTTPClient
>
> tic()
> @parallel for i=1:1000
> assert(get("http://127.0.0.1:5000/").http_code == 200)
> end
> toc()
>
>
> However, when I run as a script though the command line
>
> $ julia -p 2 conc.jl
>
> Only 9 requests are made,
>
> But all 1000 requests are made when running through the REPL, however the
> timer does not wait for parallelisation to complete.
>
> I wondered if anyone could point me in the right direction?
>
> Many Thanks
>
> Adrian
>
>
>
>
>


Re: [julia-users] @async weirdness

2015-03-10 Thread Amit Murthy
Works fine on Linux.



On Tue, Mar 10, 2015 at 11:28 PM, Ben Arthur  wrote:

> in my continuing quest to understand Julia tasks, i have created the
> following contrived example which does not behave as i would expect. can
> anyone help explain please? thanks in advance.
>
> julia> function printfoobar()
>  println("foo")
>  println("bar")
>  end
>
> printfoobar (generic function with 1 method)
>
> julia> printfoobar()   # great, it works
> foo
> bar
>
> julia> println("honey"); println("wagon")   # no surprise again
> honey
> wagon
>
> julia> t = @async (println("honey"); println("wagon"))  #  works too,
> modulo 'Task' being inbetween
> honey
> Task (queued) @0x7fb59e832500wagon
>
> julia> t = @async printfoobar()   # ditto:  foo and bar both printed,
> albeit with 'Task' inbetween
> foo
> Task (queued) @0x7fb59f2e1720bar
>
> julia> t = @async (println("honey"); printfoobar(); println("wagon"))   #
> WHERE ARE bar AND wagon ???
> honey
> Task (queued) @0x7fb59f2e1840foo
>
> julia> #   #nope, they still don't appear
>
> julia> # 
>
> julia> # 
>
> julia> wait(t)   # nope, still no further printed output
>
> julia> yield()   # still no joy
>
> julia> istaskdone(t)
> true
>
> is it that println("bar") and println("wagon") never get executed?  or
> that the output stream is just not making it to the REPL?  this is in 0.3.6
> by the way.  similar things happen on a 0 day old master.
>


Re: [julia-users] @async weirdness

2015-03-10 Thread Amit Murthy
What about if you don't print t?

t = @async (println("foo");println("bar"); println("baz"));

On Wed, Mar 11, 2015 at 9:31 AM, Sam L  wrote:

> Same thing on arch linux actually:
>
>   | | |_| | | | (_| |  |  Version 0.3.7-pre+15 (2015-03-02 23:43 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit 0f0b136 (8 days old release-0.3)
> |__/   |  x86_64-unknown-linux-gnu
>
> julia> t = @async (println("foo");println("bar"); println("baz"))
> foo
> Task (queued) @0x03c57080bar
>
>
> julia>
> _
>
>
>
> On Tuesday, March 10, 2015 at 8:59:52 PM UTC-7, Sam L wrote:
>>
>> I see the behavior on OS X.  It also occurs with three println's.
>>
>>   | | |_| | | | (_| |  |  Version 0.3.7-pre+1 (2015-02-17 22:12 UTC)
>>  _/ |\__'_|_|_|\__'_|  |  Commit d15f183* (21 days old release-0.3)
>> |__/   |  x86_64-apple-darwin13.4.0
>>
>> julia> t = @async (println("foo");println("bar"); println("baz"))
>> foo
>> Task (queued) @0x7fa0faf0e520bar
>>
>>
>> julia>
>> _
>>
>> The _ indicates the cursor position after running the line of code. I hit
>> return only once after the first line starting with 't = @async...', and I
>> got two blank lines after Task was displayed, before the julia> prompt, and
>> the cursor ended up in the first column on a new line after the julia>
>> prompt.
>>
>>
>> On Tuesday, March 10, 2015 at 8:17:30 PM UTC-7, Amit Murthy wrote:
>>>
>>> Works fine on Linux.
>>>
>>>
>>>
>>> On Tue, Mar 10, 2015 at 11:28 PM, Ben Arthur  wrote:
>>>
>>>> in my continuing quest to understand Julia tasks, i have created the
>>>> following contrived example which does not behave as i would expect. can
>>>> anyone help explain please? thanks in advance.
>>>>
>>>> julia> function printfoobar()
>>>>  println("foo")
>>>>  println("bar")
>>>>  end
>>>>
>>>> printfoobar (generic function with 1 method)
>>>>
>>>> julia> printfoobar()   # great, it works
>>>> foo
>>>> bar
>>>>
>>>> julia> println("honey"); println("wagon")   # no surprise again
>>>> honey
>>>> wagon
>>>>
>>>> julia> t = @async (println("honey"); println("wagon"))  #  works too,
>>>> modulo 'Task' being inbetween
>>>> honey
>>>> Task (queued) @0x7fb59e832500wagon
>>>>
>>>> julia> t = @async printfoobar()   # ditto:  foo and bar both printed,
>>>> albeit with 'Task' inbetween
>>>> foo
>>>> Task (queued) @0x7fb59f2e1720bar
>>>>
>>>> julia> t = @async (println("honey"); printfoobar(); println("wagon"))
>>>> # WHERE ARE bar AND wagon ???
>>>> honey
>>>> Task (queued) @0x7fb59f2e1840foo
>>>>
>>>> julia> #   #nope, they still don't appear
>>>>
>>>> julia> # 
>>>>
>>>> julia> # 
>>>>
>>>> julia> wait(t)   # nope, still no further printed output
>>>>
>>>> julia> yield()   # still no joy
>>>>
>>>> julia> istaskdone(t)
>>>> true
>>>>
>>>> is it that println("foo") and println("wagon") never get executed?  or
>>>> that the output stream is just not making it to the REPL?  this is in 0.3.6
>>>> by the way.  similar things happen on a 0 day old master.
>>>>
>>>
>>>


[julia-users] code generation with @eval

2015-03-26 Thread Amit Murthy
I was trying to add a bunch of common functions to DistributedArrays.jl with
the below code block:

for f in [:sum, :minimum, :maximum, :mean]
@eval begin
import Base: ($f)
export ($f)
function ($f)(D::DArray)
refs = [@spawnat p ($f)(localpart(D)) for p in procs(D)]
($f)([fetch(r) for r in refs])
end
end
end


 
But I get an error 

ERROR: LoadError: syntax: invalid "import" statement: expected identifier
 in include at ./boot.jl:250
 in include_from_node1 at ./loading.jl:129
 in reload_path at ./loading.jl:153
 in _require at ./loading.jl:68
 in require at ./loading.jl:51
 in process_options at ./client.jl:292
 in _start at ./client.jl:402
while loading /home/amitm/.julia/v0.4/DistributedArrays/src/
DistributedArrays.jl, in expression starting on line 497



What am I doing wrong?



Re: [julia-users] code generation with @eval

2015-03-26 Thread Amit Murthy
Thanks guys. I'll try out your suggestions.

@Mauro: Base.$f does not work either.
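
For reference, a sketch along the lines of Mike's suggestion below, building the
import/export expressions by hand instead of interpolating into them (untested
here):

for f in [:sum, :minimum, :maximum, :mean]
    eval(Expr(:import, :Base, f))   # hand-built equivalent of `import Base: sum` etc.
    eval(Expr(:export, f))          # hand-built equivalent of `export sum` etc.
    @eval function ($f)(D::DArray)
        refs = [@spawnat p ($f)(localpart(D)) for p in procs(D)]
        ($f)([fetch(r) for r in refs])
    end
end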


On Thu, Mar 26, 2015 at 7:28 PM, Mike Innes  wrote:

> For some reason the parser doesn't like interpolating into import/export
> statements, but you can do this just fine by constructing the `Expr` object
> by hand. It's not pretty, but it works OK.
>
> You best bet is to construct expressions like `:(export foo, bar)` and
> `dump` them to see how the Expr is constructed. I think for export it's
> something like `Expr(:export, :foo, :bar)`.
>
> Example:
> https://github.com/one-more-minute/Hiccup.jl/blob/073f47314d1f08e0e5c8a958a25051248bb85039/src/Hiccup.jl#L95-L100
>
> On 26 March 2015 at 12:52, Andreas Noack 
> wrote:
>
>> Distributed reduce is already implemented, so maybe these slightly
>> simpler with e.g. sum(A::DArray) = reduce(Base.AddFun(), A)
>>
>> 2015-03-26 8:41 GMT-04:00 Jameson Nash :
>>
>> `eval` (typically) isn't allowed to handle `import` and `export`
>>> statements. those must be written explicitly
>>>
>>> On Thu, Mar 26, 2015 at 8:18 AM Amit Murthy 
>>> wrote:
>>>
>>>> I was trying to add a bunch  of common functions to
>>>> DistributedArrays.jl with the below code block
>>>>
>>>> for f in [:sum, :minimum, :maximum, :mean]
>>>> @eval begin
>>>> import Base: ($f)
>>>> export ($f)
>>>> function ($f)(D::DArray)
>>>> refs = [@spawnat p ($f)(localpart(D)) for p in procs(D)]
>>>> ($f)([fetch(r) for r in refs])
>>>> end
>>>> end
>>>> end
>>>>
>>>>
>>>>
>>>> But I get an error
>>>>
>>>> ERROR: LoadError: syntax: invalid "import" statement: expected
>>>> identifier
>>>>  in include at ./boot.jl:250
>>>>  in include_from_node1 at ./loading.jl:129
>>>>  in reload_path at ./loading.jl:153
>>>>  in _require at ./loading.jl:68
>>>>  in require at ./loading.jl:51
>>>>  in process_options at ./client.jl:292
>>>>  in _start at ./client.jl:402
>>>> while loading /home/amitm/.julia/v0.4/DistributedArrays/src/
>>>> DistributedArrays.jl, in expression starting on line 497
>>>>
>>>>
>>>>
>>>> What am I doing wrong ?
>>>>
>>>>
>>
>


Re: [julia-users] Possible bug in @spawnat or fetch?

2015-04-29 Thread Amit Murthy
Yes, this looks like a bug. In fact the below causes an error:

function test2()
  ref = @spawnat workers()[1] begin
   x=1
   end;
  x=2
end

Can you open an issue on github?


On Thu, Apr 30, 2015 at 7:07 AM, Sam Kaplan  wrote:

> Hello,
>
> I have the following code example:
> addprocs(1)
>
> function test1()
> ref = @spawnat workers()[1] begin
> x = 1
> end
> y = fetch(ref)
> @show y
> end
>
> function test2()
> ref = @spawnat workers()[1] begin
> x = 1
> end
> x = fetch(ref)
> @show x
> end
>
> function main()
> test1()
> test2()
> end
>
> main()
>
> giving the following output:
> y => 1
> ERROR: x not defined
>  in test2 at /tmp/test.jl:12
>  in main at /tmp/test.jl:21
>  in include at /usr/bin/../lib64/julia/sys.so
>  in include_from_node1 at ./loading.jl:128
>  in process_options at /usr/bin/../lib64/julia/sys.so
>  in _start at /usr/bin/../lib64/julia/sys.so
> while loading /tmp/test.jl, in expression starting on line 24
>
>
> Is this a valid error in the code or a bug in Julia?  The error seems to
> be caused when the variable that is local to the `@spawnat` block has its
> name mirrored by the variable being assigned to by the `fetch` call.
>
> For reference, I am running version 0.3.6:
>_
>_   _ _(_)_ |  A fresh approach to technical computing
>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>_ _   _| |_  __ _   |  Type "help()" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.6
>  _/ |\__'_|_|_|\__'_|  |
> |__/   |  x86_64-redhat-linux
>
>
> Thanks!
>
> Sam
>


Re: [julia-users] Possible bug in @spawnat or fetch?

2015-04-29 Thread Amit Murthy
Simpler case.

julia> function test2()
   @async x=1
   x = 2
   end

test2 (generic function with 1 method)


julia> test2()

ERROR: UndefVarError: x not defined

 in test2 at none:2


Issue created : https://github.com/JuliaLang/julia/issues/11062


On Thu, Apr 30, 2015 at 8:55 AM, Amit Murthy  wrote:

> Yes, this looks like a bug. In fact the below causes an error:
>
> function test2()
>   ref = @spawnat workers()[1] begin
>x=1
>end;
>   x=2
> end
>
> Can you open an issue on github?
>
>
> On Thu, Apr 30, 2015 at 7:07 AM, Sam Kaplan 
> wrote:
>
>> Hello,
>>
>> I have the following code example:
>> addprocs(1)
>>
>> function test1()
>> ref = @spawnat workers()[1] begin
>> x = 1
>> end
>> y = fetch(ref)
>> @show y
>> end
>>
>> function test2()
>> ref = @spawnat workers()[1] begin
>> x = 1
>> end
>> x = fetch(ref)
>> @show x
>> end
>>
>> function main()
>> test1()
>> test2()
>> end
>>
>> main()
>>
>> giving the following output:
>> y => 1
>> ERROR: x not defined
>>  in test2 at /tmp/test.jl:12
>>  in main at /tmp/test.jl:21
>>  in include at /usr/bin/../lib64/julia/sys.so
>>  in include_from_node1 at ./loading.jl:128
>>  in process_options at /usr/bin/../lib64/julia/sys.so
>>  in _start at /usr/bin/../lib64/julia/sys.so
>> while loading /tmp/test.jl, in expression starting on line 24
>>
>>
>> Is this a valid error in the code or a bug in Julia?  The error seems to
>> be caused when the variable that is local to the `@spawnat` block has its
>> name mirrored by the variable being assigned to by the `fetch` call.
>>
>> For reference, I am running version 0.3.6:
>>_
>>_   _ _(_)_ |  A fresh approach to technical computing
>>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
>>_ _   _| |_  __ _   |  Type "help()" for help.
>>   | | | | | | |/ _` |  |
>>   | | |_| | | | (_| |  |  Version 0.3.6
>>  _/ |\__'_|_|_|\__'_|  |
>> |__/   |  x86_64-redhat-linux
>>
>>
>> Thanks!
>>
>> Sam
>>
>
>


Re: [julia-users] Re: Limiting time of multicore run and related cleanup

2015-04-29 Thread Amit Murthy
Your solution seems reasonable enough.

Another solution: you could schedule a task in your Julia code which will
interrupt the workers after a timeout:

@schedule begin
    sleep(600)
    if pmap_not_complete
        interrupt(workers())
    end
end

Start this task before executing the pmap

Note that this will work only for additional processes created on the local
machine. For SSH workers, `interrupt` is a message sent to the remote
workers, which will be unable to process it if the main thread is
computation bound.
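
A slightly fuller sketch of the same idea (the timeout value is from your
script; `run_one_optimization` and `jobs` are placeholders for your own code):

done = false
@schedule begin
    sleep(600)                      # timeout in seconds
    done || interrupt(workers())    # skip the interrupt if pmap already finished
end
results = pmap(run_one_optimization, jobs)
done = true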



On Thu, Apr 30, 2015 at 9:08 AM, Pavel  wrote:

> Here is my current bash-script (same timeout-way due to the lack of
> alternative suggestions):
>
> timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
> killall -9 -v julia >>cleanup.log 2>&1
>
> Does that seem reasonable? Perhaps Linux experts may think of some
> scenarios where this would not be sufficient as far as the
> runaway/non-responding process cleanup?
>
>
>
> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>
>> What would be a good way to limit the total runtime of a multicore
>> process managed by pmap?
>>
>> I have pmap processing a collection of optimization runs (with fminbox)
>> and most of the time everything runs smoothly. On occasion however 1-2 out
>> of e.g. 8 CPUs take too long to complete one optimization, and
>> fminbox/conj. grad. does not have a way to limit run time as recently
>> discussed:
>>
>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>
>> To deal with this in a crude way, at the moment I call Julia from a shell
>> (bash) script with timeout:
>>
>> timeout 600 julia -p 8 juliacode.jl
>>
>> When doing this, is there anything to help find and stop zombie-processes
>> (if any) after timeout forces a multicore pmap run to terminate? Anything
>> within Julia related to how the processes are spawned? Any alternatives to
>> shell timeout? I know NLopt has a time limit option but that is not
>> implemented within Julia (but in the underlying C-library).
>>
>>


Re: [julia-users] Re: Limiting time of multicore run and related cleanup

2015-04-29 Thread Amit Murthy
`interrupt(workers())` is the equivalent of sending a SIGINT to the
workers. The tasks which are consuming 100% CPU are interrupted and they
terminate with an InterruptException.

All processes are still in a running state after this.
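
If the mapped function should degrade gracefully rather than error out, a sketch
of catching the interrupt inside it (the work function and the sentinel value
are illustrative):

function run_one(job)
    try
        return expensive_optimization(job)               # placeholder for the real work
    catch err
        isa(err, InterruptException) && return nothing   # hit the timeout
        rethrow(err)
    end
end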

On Thu, Apr 30, 2015 at 10:02 AM, Pavel  wrote:

> The task-option is interesting. Let's say there are 8 CPU cores. Julia's
> ncpus() returns 9 when started with `julia -p 8`, that is to be expected.
> All 8 cores are 100% loaded during the pmap call. Would
> `interrupt(workers())` leave one running?
>
> On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote:
>>
>> Your solution seems reasonable enough.
>>
>> Another solution : You could schedule a task in your julia code which
>> will interrupt the workers after a timeout
>> @schedule begin
>>   sleep(600)
>>   if pmap_not_complete
>>  interrupt(workers())
>>   end
>> end
>>
>> Start this task before executing the pmap
>>
>> Note that this will work only for additional processes created on the
>> local machine. For SSH workers, `interrupt` is a message sent to the remote
>> workers, which will be unable to process it if the main thread is
>> computation bound.
>>
>>
>>
>> On Thu, Apr 30, 2015 at 9:08 AM, Pavel  wrote:
>>
>>> Here is my current bash-script (same timeout-way due to the lack of
>>> alternative suggestions):
>>>
>>> timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
>>> killall -9 -v julia >>cleanup.log 2>&1
>>>
>>> Does that seem reasonable? Perhaps Linux experts may think of some
>>> scenarios where this would not be sufficient as far as the
>>> runaway/non-responding process cleanup?
>>>
>>>
>>>
>>> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>>>
>>>> What would be a good way to limit the total runtime of a multicore
>>>> process managed by pmap?
>>>>
>>>> I have pmap processing a collection of optimization runs (with fminbox)
>>>> and most of the time everything runs smoothly. On occasion however 1-2 out
>>>> of e.g. 8 CPUs take too long to complete one optimization, and
>>>> fminbox/conj. grad. does not have a way to limit run time as recently
>>>> discussed:
>>>>
>>>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>>>
>>>> To deal with this in a crude way, at the moment I call Julia from a
>>>> shell (bash) script with timeout:
>>>>
>>>> timeout 600 julia -p 8 juliacode.jl
>>>>
>>>> When doing this, is there anything to help find and stop
>>>> zombie-processes (if any) after timeout forces a multicore pmap run to
>>>> terminate? Anything within Julia related to how the processes are spawned?
>>>> Any alternatives to shell timeout? I know NLopt has a time limit option but
>>>> that is not implemented within Julia (but in the underlying C-library).
>>>>
>>>>
>>


Re: [julia-users] Re: Limiting time of multicore run and related cleanup

2015-04-29 Thread Amit Murthy
 `interrupt` will work for local workers as well as SSH ones. I had
mentioned otherwise above.

On Thu, Apr 30, 2015 at 12:08 PM, Amit Murthy  wrote:

> `interrupt(workers())` is the equivalent of sending a SIGINT to the
> workers. The tasks which are consuming 100% CPU are interrupted and they
> terminate with an InterruptException.
>
> All processes are still in a running state after this.
>
> On Thu, Apr 30, 2015 at 10:02 AM, Pavel  wrote:
>
>> The task-option is interesting. Let's say there are 8 CPU cores. Julia's
>> ncpus() returns 9 when started with `julia -p 8`, that is to be expected.
>> All 8 cores are 100% loaded during the pmap call. Would
>> `interrupt(workers())` leave one running?
>>
>> On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote:
>>>
>>> Your solution seems reasonable enough.
>>>
>>> Another solution : You could schedule a task in your julia code which
>>> will interrupt the workers after a timeout
>>> @schedule begin
>>>   sleep(600)
>>>   if pmap_not_complete
>>>  interrupt(workers())
>>>   end
>>> end
>>>
>>> Start this task before executing the pmap
>>>
>>> Note that this will work only for additional processes created on the
>>> local machine. For SSH workers, `interrupt` is a message sent to the remote
>>> workers, which will be unable to process it if the main thread is
>>> computation bound.
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 9:08 AM, Pavel  wrote:
>>>
>>>> Here is my current bash-script (same timeout-way due to the lack of
>>>> alternative suggestions):
>>>>
>>>> timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
>>>> killall -9 -v julia >>cleanup.log 2>&1
>>>>
>>>> Does that seem reasonable? Perhaps Linux experts may think of some
>>>> scenarios where this would not be sufficient as far as the
>>>> runaway/non-responding process cleanup?
>>>>
>>>>
>>>>
>>>> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>>>>
>>>>> What would be a good way to limit the total runtime of a multicore
>>>>> process managed by pmap?
>>>>>
>>>>> I have pmap processing a collection of optimization runs (with
>>>>> fminbox) and most of the time everything runs smoothly. On occasion 
>>>>> however
>>>>> 1-2 out of e.g. 8 CPUs take too long to complete one optimization, and
>>>>> fminbox/conj. grad. does not have a way to limit run time as recently
>>>>> discussed:
>>>>>
>>>>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>>>>
>>>>> To deal with this in a crude way, at the moment I call Julia from a
>>>>> shell (bash) script with timeout:
>>>>>
>>>>> timeout 600 julia -p 8 juliacode.jl
>>>>>
>>>>> When doing this, is there anything to help find and stop
>>>>> zombie-processes (if any) after timeout forces a multicore pmap run to
>>>>> terminate? Anything within Julia related to how the processes are spawned?
>>>>> Any alternatives to shell timeout? I know NLopt has a time limit option 
>>>>> but
>>>>> that is not implemented within Julia (but in the underlying C-library).
>>>>>
>>>>>
>>>
>


Re: [julia-users] spawning processes remotely and locally

2015-06-08 Thread Amit Murthy
Julia 0.4 supports "auto*host" in the machine file, which will launch as many
workers on that host as it has cores.

On Tue, Jun 9, 2015 at 12:29 AM, Matt Wolff  wrote:

> I was poking around the julia parallel tutorial and source, and it looks like
> if you spin up a cluster using a --machinefile (where every process is on
> its own machine), the remote workers would not be able to spin up
> additional local processes.
>
> For example, if you spin up a 4 machine cluster, and each worker has 4
> cores, you can't run addprocs() on the workers to utilize the 4 locally
> available worker cores.
>
> Is there something on the julia roadmap to be able to leverage free cores
> on a worker? Or is this something that can be done with the current julia
> implementation?
>
> As an example, here is a quick script I wrote to test this:
>
> #run this function from the master node
> function dostuff()
>   @everywhere include("parallel/testworkers.jl")
>   for worker in workers()
> remotecall(worker, testworkers, 12)
>   end
> end
>
> #each workers executes this
> function testworkers(numberofinterns::Int)
>   addprocs(numberofinterns)
>   nheads = @parallel (+) for i=1:200
> Int(rand(Bool))
>   end
>   println("finished")
> end
>
>
>


Re: [julia-users] spawning processes remotely and locally

2015-06-09 Thread Amit Murthy
The machinefile format is n*hostname for n processes at hostname. Or
auto*hostname for detecting the number of cores at hostname.

addprocs accepts an array of tuples or strings.

addprocs([("host1", :auto), ("host2", 5), "host3"]) will launch as many
workers as cores on host1, 5 workers on host2 and a single worker on host3.
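
An illustrative machine file in that format (hostnames are placeholders):

4*node1.example.com
auto*node2.example.com
node3.example.com

started with something like: julia --machinefile mfile script.jl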



On Tue, Jun 9, 2015 at 3:44 PM, Tim Holy  wrote:

> On Tuesday, June 09, 2015 11:15:52 AM Amit Murthy wrote:
> > Julia 0.4 supports "auto*host" which will launch as many workers on host
> as
> > cores.
>
> Amit, can you be a little more explicit about how one uses this? I tried
> this:
> julia> addprocs("auto*host")
> ERROR: MethodError: `addprocs` has no method matching
> addprocs(::ASCIIString)
>
> julia> addprocs(["auto*host"])
> ssh: Could not resolve hostname auto*host: Name or service not known
>
> (which just hung)
>
> --Tim
>
>


Re: [julia-users] pmap problems

2015-06-16 Thread Amit Murthy
pmap returns a new array and does not mutate its argument; the elements are
copied to the workers, so mutating them there does not change the caller's
objects. As the naming convention suggests, note that it is pmap, not pmap!
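
A sketch of one way to restructure the example below so the updated objects come
back through pmap's return value (names follow the quoted code; the bang is
dropped since nothing is mutated in place any more):

function myfun(t, b, c)
    t.a += b*c
    return t                     # send the modified copy back to the caller
end

mywrapper(s) = pmap(x -> myfun(x, 1., 1.), s)

# at the call site the result has to be kept:
# v = MyMod.mywrapper(v)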

On Tue, Jun 16, 2015 at 8:19 PM, Chris <7hunderstr...@gmail.com> wrote:

> Hello,
>
> I am having some issues using pmap(). I constructed a basic example that
> illustrates the problem I'm seeing:
>
> *in "partest.jl":*
> module MyMod
> type mytype
> a::Float64
> b::Int64
> c::Float64
> end
>
> function myfun!(a,b,c)
> a.a += b*c
> end
>
> function mywrapper!(s)
> pmap( x->myfun!(x,1.,1.), s )
> end
> end
>
> *In the Julia REPL:*
> include("partest.jl")
> addprocs(4)
> @everywhere using MyMod
> v = [MyMod.mytype(randn(i)[1],i,0.) for i = 1:1000];
> MyMod.mywrapper!(v);
>
> I expect the elements of v to be changed according to the operation in
> myfun!, but they are not. What am I doing wrong here?
>
> Thanks,
> Chris
>


Re: [julia-users] questions about coroutines

2015-06-30 Thread Amit Murthy
Agree with your comments. There is enough room to speed up these
constructs. I have some thoughts on having channels in Base. Will post a
note about it on Julia-dev in a couple of days.

On Tue, Jun 30, 2015 at 12:20 PM, g  wrote:

> I'm trying to understand coroutines for single threaded concurrency. I'm
> surprised that produce()/consume() and RemoteRef have no mechanism for type
> information, and as I understand it, are type unstable. One should
> therefore avoid using these in inner loops I guess? Should we expect this
> to remain the case?
>
> In Go it appears to be encouraged to heavily use concurrency even for
> things as simple as generating a series of numbers. In julia you might say
> a = zeros(Int,100)
> for i=1:100 a[i]=i end
>
> In go you could certainly do that, but in my brief introduction it
> appeared to be encouraged to create a function that pushes i onto a
> "channel" and then read numbers out from the channel. In julia with
> produce/consume instead of a channel that would look like
> counter() = for i=1:100 produce(i) end
> a = zeros(Int,100)
> task = Task(counter)
> for i=1:100 a[i]=consume(task) end
> This seems to violate julia performance guidelines as I understand
> them, since the the compiler doesn't know what type consume(task) will
> return.
>
> So I tried to implement something more like a go channel, that has type
> information. I thought it would be faster
> type GoChannel{T}
> v::T
> hasvalue::Bool
> c::Condition
> end
> It turned out to be about 2x slower than the produce()/consume() approach,
> and using RemoteRef (which has a more go channel like API) is about 4x
> slower than produce()/consume(). On my computer anyway, the
> produce()/consume() approach is about 7x slower than using a channel in go
> for the same thing.
>
> Gist with full code:
> https://gist.github.com/g/c93a0f029620deca4f2e
>
> So I guess my questions are:
> Why isn't there at least an option to encode type information in
> consume/produce and RemoteRef?
> Is there anything I could do to speed up my GoChannel implementation? Did
> I make any mistakes in my other benchmarks?
>
>  I know this is one of go's defining features, so it is perhaps
> unsurprising that Julia isn't as fast for this yet. Is there room to speed
> these concurrent communication constructs up?
>
> Thanks.
>


Re: [julia-users] Re: questions about coroutines

2015-07-06 Thread Amit Murthy
Check out https://github.com/JuliaLang/julia/pull/12042

Currently produce/consume work in lock-step. I think they should use a
channel for inter-task communication once the above PR is merged.

On Wed, Jul 1, 2015 at 3:03 AM, g  wrote:

> Looking forward to your note Amit. Maybe I can find a small part to help
> with.
>
> I'm just learning about concurrent programming, so your comments are
> helpful Tom. I noted all the benchmarks I did with julia relative to go,
> but I also did a simple go loop vs go with channels and the slowdown was a
> factor of a few hundred, which probably means this type of programming
> isn't well suited for inner loops unless as you suggest there is some sort
> of automatic translation to unrolled loops. I guess it's probably more
> appropriate for larger organizational type parts.
>
>
>
> On Tuesday, June 30, 2015 at 10:20:36 AM UTC-6, g wrote:
>>
>> I'm trying to understand coroutines for single threaded concurrency. I'm
>> surprised that produce()/consume() and RemoteRef have no mechanism for type
>> information, and as I understand it, are type unstable. One should
>> therefore avoid using these in inner loops I guess? Should we expect this
>> to remain the case?
>>
>> In Go it appears to be encouraged to heavily use concurrency even for
>> things as simple as generating a series of numbers. In julia you might say
>> a = zeros(Int,100)
>> for i=1:100 a[i]=i end
>>
>> In go you could certainly do that, but in my brief introduction it
>> appeared to be encouraged to do create a function that pushes i onto a
>> "channel" and then read numbers out from the channel. In julia with
>> produce/consume instead of a channel that would look like
>> counter() = for i=1:100 produce(i) end
>> a = zeros(Int,100)
>> task = Task(counter)
>> for i=1:100 a[i]=consume(task) end
>> This seems seems to violate julia performance guidelines as I understand
>> them, since the the compiler doesn't know what type consume(task) will
>> return.
>>
>> So I tried to implement something more like a go channel, that has type
>> information. I thought it would be faster
>> type GoChannel{T}
>> v::T
>> hasvalue::Bool
>> c::Condition
>> end
>> It turned out to be about 2x slower than the produce()/consume()
>> approach, and using RemoteRef (which has a more go channel like API) is
>> about 4x slower than produce()/consume(). On my computer anyway, the
>> produce()/consume() approach is about 7x slower than using a channel in go
>> for the same thing.
>>
>> Gist with full code:
>> https://gist.github.com/g/c93a0f029620deca4f2e
>>
>> So I guess my questions are:
>> Why isn't there at least an option to encode type information in
>> consume/produce and RemoteRef?
>> Is there anything I could do to speed up my GoChannel implementation? Did
>> I make any mistakes in my other benchmarks?
>>
>>  I know this is one of go's defining features, so it is perhaps
>> unsurprising that Julia isn't as fast for this yet. Is there room to speed
>> these concurrent communication constructs up?
>>
>> Thanks.
>>
>

