Thanks for reporting -- it is a bug. Having an Array or DataArray with 
NAtype as its eltype is a little awkward. Here's why it's causing you 
trouble, and a couple of alternatives:

using DataFrames
nrows = 3
a = DataFrame(A = 1:nrows)

# Column :A is all NA for all of these cases
b1 = DataFrame(A = fill(NA, nrows))
b2 = DataFrame(A = DataArray(Int, nrows))
b3 = DataFrame(A = DataArray(None, nrows))

vcat(a, b1) # ERROR: no method matching convert(::Type{Int64}, ::DataArrays.NAtype)
vcat(a, b2) # okay
vcat(a, b3) # okay

It should probably work as is (if not, I guess the promotion rules should 
change, and the result should be of type Any or there should be a more 
informative error).

I opened an issue: https://github.com/JuliaStats/DataArrays.jl/issues/134, 
but given that most interested developers are focused on coming up with a 
replacement for DataArrays and NAtype, it may not get attention at the 
moment, so I'd avoid creating that ambiguous array if possible for now.
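
One way to avoid the ambiguous column, sketched under the assumption that 
you have a reference DataFrame whose columns already have the right 
eltypes. The helper name retype_na_columns! is hypothetical (it is not 
part of DataFrames or DataArrays), and I have only reasoned about this 
against the DataFrames v0.6 API, so treat it as a starting point:

```julia
using DataFrames

# Hypothetical helper: replace every column of `df` whose eltype is NAtype
# with an all-NA DataArray that has the eltype of the same column in `ref`.
function retype_na_columns!(df::DataFrame, ref::DataFrame)
    for name in names(df)
        if eltype(df[name]) == NAtype
            # DataArray(T, n) builds a length-n DataArray{T} filled with NA
            df[name] = DataArray(eltype(ref[name]), nrow(df))
        end
    end
    return df
end

a = DataFrame(A = 1:3)
b = DataFrame(A = fill(NA, 3))   # column :A has eltype NAtype

retype_na_columns!(b, a)         # column :A is now an all-NA DataArray{Int}
combined = vcat(a, b)
```

This is the same idea as b2 above, just applied after the fact to a 
DataFrame that was already built with an NAtype column.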



For your other question, conversion of columns, you'll generally use 
functions from Base Julia or DataArrays.jl to transform data however you 
like.

Categorical variables are (for the moment) represented using 
PooledDataArrays, so:
pdata(abstract_array) or convert(PooledDataArray, abstract_array)

And for strings:
map(string, abstract_array) or convert(some_string_type, abstract_array)
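
Applied to columns of an existing DataFrame, that might look like the 
sketch below. It uses the example from your message, with collect() added 
so the columns start out as plain Arrays (an assumption on my part to keep 
the conversions simple), and it targets the DataFrames v0.6 indexing 
syntax:

```julia
using DataFrames

df = DataFrame(A = collect(1:20), B = collect(1:20))

# Rough analogue of R's as.factor: store column :A as a PooledDataArray
df[:A] = pdata(df[:A])

# Convert column :B to strings
df[:B] = map(string, df[:B])

# To choose the column types up front, pass already-converted columns
df2 = DataFrame(A = pdata(collect(1:20)), B = map(string, collect(1:20)))
```

In other words, the DataFrame constructor keeps whatever column type you 
hand it, so converting before construction and converting afterwards come 
to the same thing.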


On Friday, January 2, 2015 3:05:31 PM UTC-7, Guillaume Guy wrote:
>
> Sean:
>
> I found the problem. Not sure if that is a "bug" per se.
>
> Looking at one element of the Array (which is subsequently vcat-ed):
>
>
> <https://lh5.googleusercontent.com/-qE0qADLTofE/VKcS_9LqP4I/AAAAAAAADsw/WqliDGO7Lnk/s1600/dfs.PNG>
>
> Note the NA in the equipment column. When running my function 
> (intermediary_point) on each row of my input dataframe, equipment (which is 
> a String column) becomes NA of type NAtype. Then, the resulting dataframe 
> (see above) has an equipment column whose type is now NAtype.
>
> Anyway ... You end up with dfs that has some elements looking like that:
>
> 7-element Array{Type{T<:Top},1}:
>  UTF8String
>  NAtype    
>  UTF8String
>  UTF8String
>  Int64     
>  Float64   
>  Float64
>
>
> and some elements with the correct type. The vcat returns a convert error 
> trying to convert the NAtype into String.
>
>
> Is it a bug? Shouldn't the vcat convert the NAtype into String?
>
>
> Another question I have is about how to convert a column type within an 
> existing dataframe. I'm looking for a Julia equivalent of R's as.factor 
> or as.string. Alternatively, when running DataFrame(A=1:20,B=1:20), is 
> there a way to specify what A and B should be?
>
>
> Thx! 
>
>
>
> On Wednesday, December 31, 2014 10:42:30 PM UTC-5, Sean Garborg wrote:
>>
>> If you Pkg.update() and try again, you should be fine. DataFrames was 
>> overdue for a tagged release -- you'll get v0.6.0 which includes some 
>> updates to vcat. As a gut check, this works just fine:
>>
>> using DataFrames
>> dfs = [DataFrame(Float64, 15, 15) for _=1:200_000]
>> vcat(dfs)
>>
>> (If it doesn't for you, definitely file an issue.)
>>
>> Happy New Year,
>> Sean
>>
>> On Thursday, December 25, 2014 5:06:23 PM UTC-7, Guillaume Guy wrote:
>>>
>>> Hi David:
>>>
>>> That is where the stack overflow error is thrown.
>>>
>>> I attached the code + the data in my first post for your reference.
>>>
>>>
>>> On Thursday, December 25, 2014 6:59:57 PM UTC-5, David van Leeuwen wrote:
>>>>
>>>> Hello Guillaume,
>>>>
>>>> On Monday, December 22, 2014 9:09:16 PM UTC+1, Guillaume Guy wrote:
>>>>>
>>>>> Dear Julia users:
>>>>>
>>>>> Coming from an R background, I like to work with lists of dataframes 
>>>>> which I can reduce by doing do.call('rbind',list_of_df) 
>>>>>
>>>>> After ~10 years of using R, I only recently learned of do.call(). 
>>>>
>>>> In Julia, you would say:
>>>>
>>>> vcat(dfs...)
>>>>
>>>> ---david
>>>>  
>>>>
>>>>> In Julia, I attempted to use vcat for this purpose but I ran into 
>>>>> trouble:
>>>>>
>>>>> "
>>>>>
>>>>> stack overflow
>>>>> while loading In[29], in expression starting on line 1
>>>>>
>>>>> "
>>>>>
>>>>>
>>>>> This operation is basically the vcat of a large vector v consisting of 
>>>>> 68K small (11X7) dataframes. The code is attached.
>>>>>
>>>>> Thanks for your help! 
>>>>>
>>>>
