dropna() is only defined for DataArrays. The individual columns in a DataFrame are DataArrays, but the DataFrame itself is not. There is an issue for it: <https://github.com/JuliaStats/DataFrames.jl/issues/602>.
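dropna() works on one column (DataArray) at a time. One way to get a plain matrix, then, is to work column by column: keep only the rows that are complete in every column and concatenate the filtered columns. Here is a minimal sketch in Base Julia 1.x, so it uses missing in place of the old NA. The columns x1 and x2 are hypothetical stand-ins for data[:x1] and data[:x2], and this is not the DataFrames.jl API.

```julia
# Hypothetical columns standing in for data[:x1] and data[:x2];
# `missing` (Julia 1.x) plays the role that NA played in DataArrays.
x1 = Union{Missing,Float64}[1, 2, 3, missing, 5, 6]
x2 = Union{Missing,Float64}[7, 8, 9, 10, missing, 12]

# A row is a "complete case" when no column is missing in that row.
keep = .!ismissing.(x1) .& .!ismissing.(x2)

# Filter each column on the complete rows, place them side by side,
# and convert so the element type is plain Float64 (no Missing).
m = convert(Matrix{Float64}, [x1[keep] x2[keep]])
```

Rows 4 and 5 each contain a missing value, so m is the 4x2 matrix [1.0 7.0; 2.0 8.0; 3.0 9.0; 6.0 12.0].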
To get an Array out of a DataFrame, I think you are best off building it yourself:

    complete_cases!(data)
    [data[:x1] data[:x2]]

On Friday, July 4, 2014 4:07:54 AM UTC+3, Donald Lacombe wrote:
>
> Johan,
>
> I think there may be an issue with the DataFrames package, as I get the
> following:
>
> julia> data = readtable("test.csv",header=false)
> 6x2 DataFrame:
>      x1  x2
> [1,]  1   7
> [2,]  2   8
> [3,]  3   9
> [4,]  4  10
> [5,]  5  11
> [6,]  6  12
>
> julia> convert(Array,data)
> MethodError(convert,(Array{T,N},6x2 DataFrame:
>      x1  x2
> [1,]  1   7
> [2,]  2   8
> [3,]  3   9
> [4,]  4  10
> [5,]  5  11
> [6,]  6  12
> ))
>
> julia> dropna(data)
> ErrorException("dropna not defined")
>
> I read the documentation and they both say the same thing, but it doesn't
> seem to work in my case.
>
> Thoughts?
>
> Thanks,
>
> Don
>
> On Thursday, July 3, 2014 7:54:49 PM UTC-4, Johan Sigfrids wrote:
>>
>> You can use dropna() to convert a DataArray to an Array. This will
>> obviously drop any missing values.
>>
>> On Friday, July 4, 2014 2:08:55 AM UTC+3, Donald Lacombe wrote:
>>>
>>> Patrick (and others),
>>>
>>> Another issue that has reared its ugly head is that when I read the
>>> data using the DataFrames package, I get the following:
>>>
>>> data = readtable("ct_coord_2.csv",header=false)
>>> 8x2 DataFrame:
>>>      x1        x2
>>> [1,] -73.3712  41.225
>>> [2,] -72.1065  41.4667
>>> [3,] -73.2453  41.7925
>>> [4,] -71.9876  41.83
>>> [5,] -72.3365  41.855
>>> [6,] -72.7328  41.8064
>>> [7,] -72.5231  41.4354
>>> [8,] -72.8999  41.3488
>>>
>>> julia> xc = data[:,1]
>>> 8-element DataArray{Float64,1}:
>>>  -73.3712
>>>  -72.1065
>>>  -73.2453
>>>  -71.9876
>>>  -72.3365
>>>  -72.7328
>>>  -72.5231
>>>  -72.8999
>>>
>>> julia> yc = data[:,2]
>>> 8-element DataArray{Float64,1}:
>>>  41.225
>>>  41.4667
>>>  41.7925
>>>  41.83
>>>  41.855
>>>  41.8064
>>>  41.4354
>>>  41.3488
>>>
>>> julia> xc=xc'
>>> 1x8 DataArray{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>
>>> julia> yc=yc'
>>> 1x8 DataArray{Float64,2}:
>>>  41.225  41.4667  41.7925  41.83  41.855  41.8064  41.4354  41.3488
>>>
>>> julia> temp = [xc;yc]
>>> 2x8 DataArray{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>   41.225    41.4667   41.7925   41.83        41.8064   41.4354   41.3488
>>>
>>> julia> R = pairwise(Euclidean(),temp)
>>> MethodError(At_mul_B!,(
>>> 8x8 Array{Float64,2}:
>>>  2.7273e-316   2.7273e-316   2.67478e-315  …  2.7273e-316   2.7273e-316
>>>  2.67736e-315  2.67736e-315  2.67736e-315     2.72726e-316  2.72726e-316
>>>  2.67727e-315  2.67727e-315  2.67727e-315     2.67727e-315  2.67727e-315
>>>  2.67727e-315  2.67727e-315  2.67727e-315     2.67727e-315  2.67727e-315
>>>  4.94066e-324  4.94066e-324  4.94066e-324     9.88131e-324  4.94066e-324
>>>  2.76235e-318  2.76235e-318  2.76235e-318  …  2.76235e-318  2.76235e-318
>>>  4.94066e-324  4.94066e-324  4.94066e-324     9.88131e-324  4.94066e-324
>>>  4.94066e-324  4.94066e-324  4.94066e-324     9.88131e-324  4.94066e-324,
>>>
>>> 2x8 DataArray{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>   41.225    41.4667   41.7925   41.83        41.8064   41.4354   41.3488,
>>>
>>> 2x8 DataArray{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>   41.225    41.4667   41.7925   41.83        41.8064   41.4354   41.3488))
>>>
>>> I do not think that the Distance package likes the types that are input
>>> into the function, i.e. the vectors are DataArrays instead of Arrays. It
>>> works just fine when I use Tony's idea:
>>>
>>> julia> data = readcsv("ct_coord_2.csv",Float64)
>>> 8x2 Array{Float64,2}:
>>>  -73.3712  41.225
>>>  -72.1065  41.4667
>>>  -73.2453  41.7925
>>>  -71.9876  41.83
>>>  -72.3365  41.855
>>>  -72.7328  41.8064
>>>  -72.5231  41.4354
>>>  -72.8999  41.3488
>>>
>>> julia> xc = data[:,1]
>>> 8-element Array{Float64,1}:
>>>  -73.3712
>>>  -72.1065
>>>  -73.2453
>>>  -71.9876
>>>  -72.3365
>>>  -72.7328
>>>  -72.5231
>>>  -72.8999
>>>
>>> julia> yc = data[:,2]
>>> 8-element Array{Float64,1}:
>>>  41.225
>>>  41.4667
>>>  41.7925
>>>  41.83
>>>  41.855
>>>  41.8064
>>>  41.4354
>>>  41.3488
>>>
>>> julia> xc=xc'
>>> 1x8 Array{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>
>>> julia> yc=yc'
>>> 1x8 Array{Float64,2}:
>>>  41.225  41.4667  41.7925  41.83  41.855  41.8064  41.4354  41.3488
>>>
>>> julia> temp = [xc;yc]
>>> 2x8 Array{Float64,2}:
>>>  -73.3712  -72.1065  -73.2453  -71.9876  …  -72.7328  -72.5231  -72.8999
>>>   41.225    41.4667   41.7925   41.83        41.8064   41.4354   41.3488
>>>
>>> julia> R = pairwise(Euclidean(),temp)
>>> 8x8 Array{Float64,2}:
>>>  0.0       1.28762   0.581327  1.51014   …  0.863479  0.873799  0.487347
>>>  1.28762   0.0       1.18451   0.382214     0.712542  0.417808  0.802085
>>>  0.581327  1.18451   0.0       1.25833      0.512668  0.805673  0.562309
>>>  1.51014   0.382214  1.25833   0.0          0.745667  0.665227  1.03141
>>>  1.21144   0.451294  0.910982  0.349837     0.399323  0.459258  0.757372
>>>  0.863479  0.712542  0.512668  0.745667  …  0.0       0.426208  0.487124
>>>  0.873799  0.417808  0.805673  0.665227     0.426208  0.0       0.386557
>>>  0.487347  0.802085  0.562309  1.03141      0.487124  0.386557  0.0
>>>
>>> There seems to be some issue with the Distance package not accepting
>>> DataFrames. Of course, readcsv works fine, but this might be an issue for
>>> others as well.
>>>
>>> Thanks,
>>>
>>> Don
>>>
>>> On Thursday, July 3, 2014 6:49:18 PM UTC-4, Patrick O'Leary wrote:
>>>>
>>>> On Thursday, July 3, 2014 5:36:23 PM UTC-5, Donald Lacombe wrote:
>>>>>
>>>>> I'm no GIS expert (I'm an applied econometrician) and the code I've
>>>>> written seems to work. The Distance package also works with my "real"
>>>>> data, which are the centroids of the counties in Connecticut, and I
>>>>> tested it with Euclidean, Cityblock, and SqEuclidean.
>>>>>
>>>>
>>>> Glad you got something working. Whether those distances are accurate
>>>> enough depends on how the points are arranged and what you plan to do
>>>> with it; I can see where it wouldn't make much difference in this case.
>>>> I can't let the statisticians and image processing folks have all the
>>>> technical conversation fun in this mailing list, though!
>>>>
>>>
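For reference, the distances Don got from the readcsv route can be reproduced without any package once the coordinates sit in a plain Float64 matrix. This is a minimal Base-Julia sketch, not the Distance package's implementation. pairwise_euclidean is a hypothetical helper. The input is the 2x8 temp matrix (one point per column, as in the thread), rebuilt from the printed and therefore rounded xc and yc values. Like the thread, it treats longitude/latitude degrees as planar coordinates. Because of the rounding, entries match the printed R only to roughly the fourth decimal place.

```julia
# Coordinates of the eight points as printed in the thread (rounded);
# row 1 = longitude (xc), row 2 = latitude (yc), one point per column,
# matching the 2x8 `temp` matrix built in the thread.
temp = [-73.3712 -72.1065 -73.2453 -71.9876 -72.3365 -72.7328 -72.5231 -72.8999;
         41.225   41.4667  41.7925  41.83    41.855   41.8064  41.4354  41.3488]

# Hypothetical helper: plain Euclidean distance between every pair of
# columns, treating degrees as planar coordinates (as the thread does).
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    n = size(X, 2)
    D = zeros(Float64, n, n)            # diagonal stays 0.0
    for j in 1:n, i in 1:j-1
        d = sqrt(sum(abs2, view(X, :, i) .- view(X, :, j)))
        D[i, j] = d
        D[j, i] = d                      # distance matrix is symmetric
    end
    return D
end

R = pairwise_euclidean(temp)
```

With this input, R[1, 2] comes out at about 1.2876, in line with the 1.28762 in the readcsv result. The helper only needs column indexing and arithmetic on Float64 values.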