I was using the DataFrames convert method that allows replacement of NA with an arbitrary value. I thought I had it working, but maybe I forgot to save and was running an old version.
Anyway, it appears I am using the method from the DataFrames documentation, but it results in a type error:

    using DataFrames
    city = readtable(fname)
    points = convert(Array, city[:,2:end], NaN)  # converts NA values to NaN == not a number

Results in:

    ERROR: MethodError: `convert` has no method matching convert(::Type{Array{T,N}}, ::DataFrames.DataFrame, ::Float64)
    This may have arisen from a call to the constructor Array{T,N}(...),
    since type constructors fall back to convert methods.
    Closest candidates are:
      convert{T,N}(::Type{Array{T,N}}, ::DataArrays.DataArray{T,N}, ::Any)
      convert{T,R,N}(::Type{Array{T,N}}, ::DataArrays.PooledDataArray{T,R,N}, ::Any)
      convert(::Type{Array{T,N}}, ::DataFrames.AbstractDataFrame)
      ...
     in hcluster at /Users/lewislevinmbr/Dropbox/Online Coursework/MIT Intro 6002x/Assignments/Probset_6/df_hcluster.jl:85

The DataFrames documentation shows:

    dv = @data([NA, 3, 2, 5, 4])
    mean(convert(Array, dv, 11))

Seems like I am doing the same thing, just using the float value NaN. The columns of city that are being sliced are indeed Float64.

This certainly works, but will fail if any value is NA (not a problem with the sample dataset, but I would like to generalize...):

    points = Array{Float64}(city[:, 2:end])  # fails if any value is NA

Kept breaking this down and solved it. The convert with replacement of NA values only works on the DataArray type, not the DataFrame type. So, first convert to DataArray, then do the conversion with replacement of NA, thus:

    city = readtable(fname)
    points = convert(Array{Float64,2}, DataArray(city[:, 2:end]), NaN)  # converts NA to NaN

That's what I wanted, and it's what I got. Works like a champ.

I think replacing NA with NaN is pretty useful. NaNs will propagate like NAs in a DataArray, but Array{Float64} is noticeably faster.

You could ask, "why are you doing this? ...like why even use a DataFrame at all with its ability to handle NAs if you are just going to convert back out of it....?" Well, good question.
The simple answer is that a DataFrame makes it easy to read the data, handle row/col names, get simple summary stats, etc. And then, since the core data array has no NAs and is float, I get the improved performance of handling the data subset as Array{Float64,2}. And, yes, I did the experiment with readcsv and this also works, but it provides no handling of NA. So, I think the most general approach is loading the data as a DataFrame, deciding what to do with NA, and then converting. There is enough here to handle lots of different approaches. Just experimenting.
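
For anyone trying the same NA-to-NaN step on a newer Julia, here is a minimal sketch. It assumes Julia 1.x, where `missing` took over the role that NA plays above, and it uses only Base functions. The matrix `raw` is a made-up stand-in for the numeric columns of city, not data from the file in this thread.

```julia
# Hypothetical stand-in for the numeric columns of `city`
# (assumption: Julia 1.x, where `missing` plays the role NA plays above).
raw = Union{Missing, Float64}[1.0 2.0; missing 4.0; 5.0 missing]

# Broadcast coalesce over every element: missing entries become NaN,
# everything else is passed through unchanged.
points = coalesce.(raw, NaN)

# The result has a plain Float64 element type, with no Union left over.
println(typeof(points))

# NaN propagates through arithmetic, so a sum over the matrix is NaN too.
println(sum(points))
```

In a current DataFrames.jl the input matrix would presumably come from something like Matrix(city[:, 2:end]); that part is an assumption and is not exercised in the sketch.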