[julia-users] Re: [ANN] JuliaIO and FileIO

2015-04-04 Thread Paulo Castro
That's a very nice idea. Having a common way to load files with different 
backends is very neat and very useful. Even the idea of having a file_str 
macro is a very Julian way to do things, I believe.

Maybe FastaIO could benefit from this model (and also other parsers for 
biological data). We should contact BioJulia and FastaIO guys to see what 
can be done.

On Saturday, April 4, 2015 12:41:14 UTC-3, Simon Danisch wrote:

 Hi there,

 FileIO has the aim to make it very easy to read any arbitrary file.
 I hastily copied together a proof of concept by taking code from Images.jl.

 JuliaIO is the umbrella group, which takes IO packages with no home. If 
 someone wrote an IO package, but doesn't have time to implement the FileIO 
 interface, giving it to JuliaIO might be a good idea in order to keep the 
 package usable.

 Concept of FileIO is described in the readme:

 Meta package for FileIO. Purpose is to open a file and return the 
 respective Julia object, without doing any research on how to open the file.

 f = file"test.jpg" # -> File{:jpg}
 read(f) # -> Image
 read(file"test.obj") # -> Mesh
 read(file"test.csv") # -> DataFrame

 So far only Images are supported and MeshIO is on the horizon.

 It is structured the following way: There are three levels of abstraction. 
 First comes FileIO, defining the file_str macro etc. Then comes a meta 
 package for a certain class of file, e.g. Images or Meshes. This meta 
 package defines the Julia datatype (e.g. Mesh, Image) and organizes the 
 importer libraries. This is also a good place to define tests for different 
 file formats that are independent of the IO library. Then, on the last 
 level, there are the low-level importer libraries, which do the actual IO. 
 They're included via Mike Innes' Requires package 
 (https://github.com/one-more-minute/Requires.jl), so that they don't 
 introduce extra load time if not needed. This way, using FileIO without 
 reading/writing anything should have short load times.
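 The layering above can be sketched in a few lines of current Julia. This is 
 only an illustration of the idea, not FileIO's actual code: the File type 
 shown here, the load_kind function and its return strings are all 
 hypothetical stand-ins, and a real meta package would call an importer 
 library where the sketch returns a label.

```julia
# Hypothetical stand-in for the File type described above; not FileIO's real code.
struct File{Ext}
    path::String
end

# Join the path components and lift the file extension into the type parameter.
function File(parts::AbstractString...)
    path = joinpath(parts...)
    ext = lowercase(lstrip(splitext(path)[2], '.'))
    return File{Symbol(ext)}(path)
end

# file"test.jpg" builds the File object when the macro is expanded.
macro file_str(path)
    return File(path)
end

# Hypothetical loaders: dispatch on the extension picks the right method.
load_kind(::File{:jpg}) = "Image"
load_kind(::File{:obj}) = "Mesh"
load_kind(::File{:csv}) = "DataFrame"
load_kind(::File{Ext}) where {Ext} = error("no loader registered for .$Ext files")

f = file"test.jpg"
println(typeof(f), " -> ", load_kind(f))
println(load_kind(File("documents", "images", "mesh.obj")))
```

 Putting the extension in a type parameter lets each file class add its own 
 method without touching a central table, which is presumably why the 
 proposal uses File{:jpg}.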

 As an implementation example please look at FileIO -> ImageIO -> 
 ImageMagick. This should already work as a proof of concept. Try:

 using FileIO # should be very fast, thanks to Mike Innes' Requires package
 read(file"test.jpg") # takes a little longer as it needs to load the IO library
 read(file"test.jpg") # should be fast
 read(File("documents", "images", "myimage.jpg")) # automatic joinpath via File constructor

 Please open issues if things are not clear or if you find flaws in the 
 concept/implementation.

 If you're interested in working on this infrastructure I'll be pleased to 
 add you to the group JuliaIO.


 Best,

 Simon



[julia-users] Re: Introducing Julia wikibook

2015-02-16 Thread Paulo Castro
It's not bad, but it could be much better. We currently need contributors, 
since some of them left to work on other manuals/books some time ago. If 
anyone here could help, we would be glad.

On Wednesday, February 11, 2015 18:18:39 UTC-2, David P. Sanders wrote:

 Just stumbled on this, which seems not bad (though I haven't looked in 
 detail):

 http://en.wikibooks.org/wiki/Introducing_Julia



[julia-users] Help: K-Means Clustering Algorithm

2014-07-03 Thread Paulo Castro
Hi guys,

I'm trying to implement the K-Means Clustering Algorithm, but I'm having 
some problems. The function I wrote:

function kcluster(data; distance = pearson, k=4)
    # Generate a list of tuples of the min and max values of each column of data
    ranges = [(minimum(data[:,i]), maximum(data[:,i])) for i in 1:size(data,2)]

    # Create k randomly placed centroids
    centroids = [rand()*ranges[j][2] - ranges[j][1] + ranges[j][1] for i in 1:k, j in 1:length(ranges)]

    lastmatches = Any[]
    for t in 1:100
        println("Iteration $t")
        bestmatches = [Int[] for i in 1:k]

        # Get best matches for each cluster
        for j in 1:size(data, 1)
            row = data[j, :]
            bestmatch = 1
            bestd = distance(centroids[bestmatch, :], row)

            for i in 1:k
                d = distance(centroids[i, :], row)
                if d < bestd
                    bestd = d
                    bestmatch = i
                end
            end

            push!(bestmatches[bestmatch], j)
        end

        if lastmatches == bestmatches
            return lastmatches
        end

        lastmatches = bestmatches

        # Move clusters to the average of its matches
        numcols = size(data, 2)
        for i in 1:k
            avgs = zeros(1, numcols)
            if length(bestmatches[i]) > 0
                for row in bestmatches[i]
                    avgs += data[row, :]
                end

                avgs /= length(bestmatches[i])
                centroids[i, :] = avgs
            end
        end
    end

    return lastmatches
end

The data argument is a two-dimensional Array, with each row representing an 
individual and each column its position in space.

The problem is the following: the same algorithm in Python (with the same 
data input) usually stops near iteration #5, while in Julia it always runs 
to iteration #100. The non-empty clusters in Python are also smaller, so 
there are fewer empty clusters. Can somebody find out why it never enters 
the if lastmatches == bestmatches block?

Sorry about my poor English
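
One thing that can be checked independently of the convergence question is the centroid initialization. Because of operator precedence, rand()*ranges[j][2] - ranges[j][1] + ranges[j][1] reduces to rand()*ranges[j][2], so the minimum of each column is ignored. The sketch below is in current Julia syntax and shows the parenthesized form; the helper name random_centroids and the sample data are made up for illustration, and whether this is what keeps the loop running to iteration #100 is not established here.

```julia
# Hypothetical helper: place k centroids uniformly at random inside each column's range.
function random_centroids(data, k)
    lo = vec(minimum(data, dims = 1))   # per-column minimum
    hi = vec(maximum(data, dims = 1))   # per-column maximum
    # The parentheses matter: scale rand() by the width of the range, then shift by lo.
    return [lo[j] + rand() * (hi[j] - lo[j]) for i in 1:k, j in 1:length(lo)]
end

data = [1.0 10.0; 2.0 20.0; 3.0 30.0]   # made-up sample: 3 individuals, 2 columns
centroids = random_centroids(data, 4)   # 4×2 matrix, one centroid per row
println(size(centroids))
```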


Re: [julia-users] Help: K-Means Clustering Algorithm

2014-07-03 Thread Paulo Castro
Hi John,

Thanks for the tip, but actually I'm not using this function for 
production. I was reading Programming Collective Intelligence and 
trying to implement the examples in Julia rather than Python (with some 
complications such as missing packages, like Beautiful Soup, but that's OK...). 
So, this is an exercise to help me understand these algorithms and 
learn more Julia at the same time. 

The next book I'll try this with is, guess what, Machine Learning for Hackers! 
I hope the translation of the algorithms in that book is easier.

Thanks again!

On Thursday, July 3, 2014 19:29:37 UTC-3, John Myles White wrote:

 Hi Paulo,

 Rather than implement k-means from scratch, I'd encourage you to use the 
 implementation in the Clustering.jl package.

  -- John




[julia-users] How can I create a simple Graph using Graphs.jl?

2014-06-01 Thread Paulo Castro
Hi guys,

Sorry for asking this kind of question, but even after reading the 
documentation, I don't know how to create the simplest graph object using 
Graphs.jl. For example, I want to create the following graph:


http://i.stack.imgur.com/820Fl.png

Can someone give me the directions to start?


[julia-users] DataFrames: Problems with Split-Apply-Combine strategy

2014-05-22 Thread Paulo Castro
 

*I asked this question on Stack Overflow, but I think I will get better 
results posting it here. We should use that platform more, so that Julia is 
more exposed to R/Python/Matlab users needing something like it.*

I have some data (from an R course assignment, but that doesn't matter) on 
which I want to use the split-apply-combine strategy, but I'm having some 
problems. The data is in a DataFrame, called outcome, and each row 
represents a hospital. Each column holds a piece of information about that 
hospital, like name, location, rates, etc.

*My objective is to obtain the Hospital with the lowest Mortality by Heart 
Attack Rate of each State.*

I was playing around with some strategies, and ran into a problem using the 
by function:

best_heart_rate(df) = sort(df, cols = :Mortality)[end,:] 
best_hospitals = by(hospitals, :State, best_heart_rate)

The idea was to split the hospitals DataFrame by State, sort each of the 
SubDataFrames by Mortality Rate, get the lowest one, and combine the rows 
in a new DataFrame.

But when I used this strategy, I got:

ERROR: no method nrow(SubDataFrame{Array{Int64,1}})
 in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:311
 in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:296
 in f at none:1
 in based_on at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:144
 in by at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:202

I suppose the nrow function is not implemented for SubDataFrames for a good 
reason, so I gave up on this strategy. Then I used some nastier code:

best_heart_rate(df) = (df[sortperm(df[:,:Mortality] , rev=true), :])[1,:]
best_hospitals = by(hospitals, :State, best_heart_rate)

It seems to work. But now there is an NA problem: how can I remove the rows 
from the SubDataFrames that have NA in the Mortality column? Is there a 
better strategy to accomplish my objective?
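
The split-apply-combine logic, including the removal of rows without a rate, can be sketched in plain Julia without any package. Everything in this sketch is an assumption made for illustration: the sample rows, the field names, and the helper best_by_state are invented, and missing stands in for NA as it does in current Julia. Note also that sorting in ascending order and taking the last row, or sorting with rev=true and taking the first, selects the highest rate; the sketch keeps the minimum instead, which matches the stated objective.

```julia
# Made-up sample rows; `missing` plays the role that NA plays in the post.
hospitals = [
    (name = "A", state = "TX", mortality = 14.2),
    (name = "B", state = "TX", mortality = missing),
    (name = "C", state = "TX", mortality = 12.9),
    (name = "D", state = "NY", mortality = 15.1),
    (name = "E", state = "NY", mortality = 13.4),
]

# Hypothetical helper: for each state, keep the row with the lowest mortality rate.
function best_by_state(rows)
    best = Dict{String,Any}()
    for row in rows
        ismissing(row.mortality) && continue          # drop rows without a rate
        current = get(best, row.state, nothing)
        if current === nothing || row.mortality < current.mortality
            best[row.state] = row                     # new minimum for this state
        end
    end
    return best
end

best = best_by_state(hospitals)
for (state, row) in best
    println(state, ": ", row.name, " (", row.mortality, ")")
end
```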


[julia-users] Re: JuliaCon Question Thread

2014-05-22 Thread Paulo Castro
Putting a link to the slides in each video's description would also 
be useful. Thanks!


[julia-users] Gadfly: plotting histogram of a integer variable x

2014-05-21 Thread Paulo Castro
Hi guys,

I'm having a problem when plotting data like this:
data = int(round(dropna(outcome[:,11])))
p = plot(x=data, Geom.histogram)

Here, data is an Array{Int64,1}. But the plot I get after running this has 
the bars spread apart, ignoring the space between integers. Is this expected? 
How can I make the same plot, but with the bars widened so they touch each other?



https://lh6.googleusercontent.com/-5_gJWTBY4vY/U30w2QEPWyI/AGU/mfgnxTzo8uE/s1600/myplot.png









[julia-users] Problems with NA on Array{Any} using square brackets notation

2014-05-20 Thread Paulo Castro
Hi there,

I'm using DataFrames package and having problems using NA on Array{Any}. 
Here is the thing:

julia> a = [true 2 "hi"]
1x3 Array{Any,2}:
 true  2  "hi"

That's expected: Julia had no way to promote these elements to a single 
concrete type, so I got an Array{Any}. Now, if I try:

julia> a[3] = NA
1x3 Array{Any,2}:
 true  2  NA

That's also expected! NA is of type NAtype, which is a subtype of Any (like 
any other type). But if I try:

julia> a = [true 2 NA]
ERROR: no method convert(Type{Int64}, NAtype)
 in setindex! at multidimensional.jl:63
 in cat at abstractarray.jl:625
 in hcat at abstractarray.jl:632

That's unexpected. What am I missing?

Thanks and sorry about my poor English.
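
The stack trace shows a convert to Int64 inside hcat, which suggests that the untyped literal first picks a common element type and then converts every element to it. A typed array literal states the element type up front and so skips that step. The sketch below uses current Julia, where missing is assumed as the stand-in for NA; it is an illustration of the typed-literal syntax, not a statement about how DataFrames handled NA at the time.

```julia
# Prefixing the literal with Any fixes the element type, so no promotion is attempted.
a = Any[true 2 missing]

println(eltype(a))      # element type of the 1×3 matrix
println(a[1] === true)  # the Bool is stored unchanged rather than converted
```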


[julia-users] Re: JuliaCon: Registration is open!

2014-05-06 Thread Paulo Castro
Will the event organizers post videos on YouTube for Julia users who 
couldn't attend?

On Tuesday, May 6, 2014 16:51:03 UTC-3, Evan Miller wrote:

 On behalf of the JuliaCon committee[1], I'm pleased to announce that 
 registration is now open for JuliaCon, taking place June 26 and 27 at the 
 University of Chicago's Gleacher Center in Chicago, IL. JuliaCon will be a 
 two-day, single-track conference and an excuse for the Julia faithful to 
 get together and geek out. Tickets are $110.

 Conference website: http://www.juliacon.org/

 Registration page: http://juliacon.eventbrite.com/

 We have some great speakers lined up already, and are looking for more. 
 See the Call For Participation on the website for details. Talk proposals 
 are due Monday, May 26.

 The day after the conference, Hack@UChicago will host a free Julia Hack 
 Day at the University of Chicago's Hyde Park campus. You'll need to 
 register for this event separately (link on website).

 Sponsorship opportunities are available, which will help us keep the 
 ticket prices nice and cheap. If your organization might be interested in 
 being a JuliaCon sponsor, please contact me directly.

 Mark your calendar, and get pumped! This will be a great event. More 
 information will be posted on the website as the conference approaches.

 See you in June!

 Evan



 1. Full committee:

 Douglas Bates, Wisconsin-Madison
 Jeff Bezanson, MIT
 Jonah Bloch-Johnson, UChicago
 Jiahao Chen, MIT
 Alan Edelman, MIT
 Garrett Smith, CloudBees
 Jeff Hammond, Argonne
 Tim Holy, WUSTL
 Stefan Karpinski, MIT
 Evan Miller, Wizard
 Hunter Owens, UChicago
 James Porter, UChicago
 Arch Robison, Intel
 Viral B Shah, MIT
  


[julia-users] Re: Mysterious setindex! error when running Runge-Kutta-Fehlberg for two coupled functions

2014-05-04 Thread Paulo Castro
Thanks, it worked!

On Sunday, May 4, 2014 19:06:47 UTC-3, Tony Kelman wrote:

 On the lines where you initialize k and l to [0 0 0 0 0 0], that is an 
 array of integers by default. Then in the for loops where you try to assign 
 a floating-point result into that array, you get an InexactError because a 
 general floating-point number can't be exactly represented as an integer. 
 If you change those to float64([0 0 0 0 0 0]) or Float64[0 0 0 0 0 0] or 
 [0.0 0.0 0.0 0.0 0.0 0.0], and likewise with t0, v0, and w0, it should work.
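
 The behaviour described above can be reproduced in isolation. The sketch 
 below is written for current Julia, where the float64(...) function 
 mentioned above no longer exists, so it uses zeros and a Float64[...] 
 literal instead; the variable names are chosen for illustration only.

```julia
k_int = [0 0 0 0 0 0]          # 1×6 matrix of Int, as in the original code

# Storing a value with a fractional part in an integer array fails.
caught = try
    k_int[1] = 0.25
    nothing
catch err
    err
end
println(typeof(caught))

# Allocating floating-point storage up front avoids the problem.
k = zeros(1, 6)                # 1×6 matrix of Float64
k[1] = 0.25
l = Float64[0 0 0 0 0 0]       # same result via a typed literal
l[1] = 0.25
println(k[1], " ", l[1])
```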


 On Sunday, May 4, 2014 2:40:17 PM UTC-7, Paulo Castro wrote:

 I was trying to port a function from a previous Python code of mine, and 
 ended up with this:

 # Runge-Kutta-Fehlberg
 function RKF45(f, g, h, t0, tf, v0, w0)
     # Initial values
     t = t0
     v5 = v4 = v = v0
     w = w0

     tolerance = 10^(-5.0)

     # Create list for future plots
     t_list = [t]
     v_list = [v]
     w_list = [w]

     # Values for a, b and c (as on Butcher's tableau)
     a = [ 0           0           0           0          0      0;
           1/4         0           0           0          0      0;
           3/32        9/32        0           0          0      0;
           1932/2197  -7200/2197   7296/2197   0          0      0;
           439/216    -8           3680/513   -845/4104   0      0;
          -8/27        2          -3544/2565   1859/4104 -11/40  0]

     b = [16/135  0  6656/12825  28561/56430  -9/50  2/55;
          25/216  0  1408/2565   2197/4104    -1/5   0]

     c = [0 1/4 3/8 12/13 1 1/2]

     # Relative to y
     k = [0 0 0 0 0 0]

     # Relative to z
     l = [0 0 0 0 0 0]

     # Compute the next terms
     for i in t0:h:tf
         # Compute the next values of K and L
         for j in 1:6
             k[j] = f(t + c[j] * h, v + (a[j,:] * k')[1], w + (a[j,:] * l')[1])*h
             l[j] = g(t + c[j] * h, v + (a[j,:] * k')[1], w + (a[j,:] * l')[1])*h
         end

         # Compute the next value of V
         # Here we implemented a tolerance test
         v4 = v + (b[2] * k')[1]
         v5 = v + (b[1] * k')[1]

         error = abs(v5 - v4)
         if error > tolerance
             h = 0.9 * h * ((tolerance/error) ^ (0.25))

             for j in 1:6
                 k[j] = f(t + c[j] * h, v + (a[j] * k')[1], w + (a[j] * l')[1])*h
                 l[j] = g(t + c[j] * h, v + (a[j] * k')[1], w + (a[j] * l')[1])*h
             end

             v5 = v + (b[1,:] * k')[1]
         end

         v = v5

         # Compute T and W with the right values of H and L, obtained
         # after the tolerance test
         t += h
         w += (b[1,:] * l')[1]

         # Append new values to the lists
         push!(t_list,t)
         push!(v_list,v)
         push!(w_list,w)
     end

     return t_list, v_list, w_list
 end

 After running this code, I got a mysterious error message:



 ERROR: InexactError()
  in setindex! at array.jl:346
 while loading /home/paulo/Documents/Working/ex1.jl, in expression starting on line 28

 ex1.jl is the file I used to test my function:

 include("rungekutta.jl")

 function f(t, v, w)
 return (v*(v-a)*(1-v) - w + I)/ε
 end

 function g(t, v, w)
 return (v - p*w - b)
 end

 ε = 0.005
 a = 0.5
 b = 0.15
 p = 1.0
 I = 0.0
 h = 0.005

 # Initial and final values of time
 t0 =  0
 tf =  8

 t, v, w = RKF45(f, g, h, t0, tf, 0, 0) # v0 = w0 = 0

 Can someone help me find what the problem is? I tried a lot of 
 things, but I always end up with this error.

 Thanks!



[julia-users] Array of images

2014-03-21 Thread Paulo Castro
Hi,

I am just starting to use Julia, and I'm having a simple problem. I have some 
images in a directory, and I want to iterate over each one, open it with 
Images' imread(), and store it in an array.
I cannot create an empty array and append images to it. How can I achieve 
this?

Thanks,

Paulo
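
One way to do this is to start from an empty Any[] vector and push! each loaded object onto it. The sketch below is in current Julia and uses only Base functions; the helper name load_all, the extension list and the directory name are assumptions made for illustration, and the loader argument is a stand-in for an image-reading function such as the imread mentioned above.

```julia
# Hypothetical helper: apply `loader` to every matching file in `dir` and collect the results.
function load_all(dir, loader; exts = (".png", ".jpg"))
    images = Any[]                              # empty vector that can hold any object
    for name in sort(readdir(dir))
        ext = lowercase(splitext(name)[2])
        ext in exts || continue                 # skip files that are not images
        push!(images, loader(joinpath(dir, name)))
    end
    return images
end

# Example call; Base's `read` returns the raw bytes and stands in for an image reader.
# imgs = load_all("my_images", read)
```

Using Any[] keeps the sketch independent of the concrete image type; a narrower element type can be used once that type is known.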


[julia-users] How do I optimize a multi-argument function with Optim.jl?

2014-02-28 Thread Paulo Castro
Hi!

I'm doing the Machine Learning course exercises with Julia. I know how to 
use Optim.jl when the cost function has only one argument, for example:


f(t) = someFunctionOfTheta(t)
optimize(f, initial_theta)

But one of the exercises is to run Octave's fminunc this way:

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

This piece of code computes the t for which costFunction(t, X, y) is at its 
minimum. How do I do the same thing in Julia?

Thanks!
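
Octave's @(t)(costFunction(t, X, y)) is an anonymous function that captures X and y, and Julia has a direct counterpart in the t -> ... syntax. The sketch below only builds and evaluates such a closure; the cost function (a mean squared error for a linear model) and the sample data are invented for illustration, and the optimize call is left as a comment because it needs Optim.jl to be loaded.

```julia
# Hypothetical cost function with extra data arguments X and y.
cost(theta, X, y) = sum(abs2, X * theta .- y) / (2 * length(y))

# Made-up data: the points lie exactly on y = 0 + 1*x.
X = [1.0 1.0; 1.0 2.0; 1.0 3.0]
y = [1.0, 2.0, 3.0]

# Closure of a single argument; X and y are captured from the enclosing scope.
f = t -> cost(t, X, y)

println(f([0.0, 0.0]))
println(f([0.0, 1.0]))

# With Optim.jl loaded, the closure would be passed on in the same way as
# the one-argument function above: optimize(f, initial_theta)
```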


[julia-users] Different error messages for sqrt(-1) and sqrt(-1.0)

2014-02-13 Thread Paulo Castro
Have you already noticed that error messages (on julia 
0.3.0-prerelease+1419) for sqrt(-1) and sqrt(-1.0) are different?
Here:

julia> sqrt(-1)
ERROR: DomainError

julia> sqrt(-1.0)
ERROR: DomainError
sqrt will only return a complex result if called with a complex argument.
try sqrt(complex(x))
 in sqrt at math.jl:277

Is it a bug?
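
For reference, the suggestion in the longer message can be tried directly. The sketch below is written for current Julia releases, where both the integer and the floating-point call raise a DomainError; it makes no claim about the wording of the messages in the prerelease mentioned above.

```julia
# A negative real argument raises DomainError for integer and float inputs alike.
for x in (-1, -1.0)
    try
        sqrt(x)
    catch err
        println("sqrt($x) threw ", typeof(err))
    end
end

# Converting to a complex number first gives the complex square root.
z = sqrt(complex(-1.0))
println(z)
```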