Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

John Myles White Sun, 09 Nov 2014 15:51:12 -0800

FWIW, I think the best way to move forward with NamedArrays is to replace 
NamedArrays with a parametric type Named{T} that wraps around other 
AbstractArray types. That gives you both named Array and named DataArray 
objects for the same cost.
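
A minimal sketch of that idea, written in current Julia syntax rather than
the 0.3/0.4 syntax of the time. The field names `data` and `dimnames` and
the extra type parameters are assumptions made for illustration; they are
not the API of NamedArrays.jl or of any other package.

```julia
# Hypothetical wrapper: `Named`, `data` and `dimnames` are illustrative names only.
struct Named{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A                      # the wrapped array (an Array, a DataArray, ...)
    dimnames::NTuple{N,Symbol}   # one name per dimension
end

# Forward the minimal AbstractArray interface to the wrapped array.
Base.size(x::Named) = size(x.data)
Base.getindex(x::Named{T,N}, I::Vararg{Int,N}) where {T,N} = x.data[I...]

# Any AbstractArray can be wrapped the same way.
m = Named([1 2; 3 4], (:row, :col))
m[2, 1]   # forwards to the wrapped matrix
sum(m)    # generic AbstractArray code works on the wrapper
```

Because the wrapper is parametric in the wrapped type, the naming logic is
written once and reused for every array type.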


 -- John

On Nov 9, 2014, at 5:49 PM, Tim Holy <tim.h...@gmail.com> wrote:

> Indeed, better to use a Dict if you're naming each row/column. I'd forgotten 
> that was part of NamedArrays.
> 
> --Tim
> 
> On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
>> On Sunday, November 09, 2014 at 10:54 -0600, Tim Holy wrote:
>>> With regards to arrays with named dimensions, I suspect that with the
>>> arrival of stagedfunctions, something like NamedAxesArrays
>>> (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
>>> stagedfunctions still have some show-stopper bugs, and we need to fix
>>> those first.
>> 
>> Interesting package!
>> 
>> But when I said "named dimensions", I meant not only that dimensions had
>> names, but that elements along each dimension (rows, columns...) had names
>> too. I'm not sure it also makes sense to use staged functions to
>> specialize code on element names, since they can vary much more than
>> dimension names. This could generate quite a lot of methods which would
>> use memory even if only used once.
>> 
>> 
>> Regards
>> 
>>> On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
>>>> On Sunday, November 09, 2014 at 07:52 -0800, David van Leeuwen wrote:
>>>>> I would vote for calling such a function `table()`, to get even closer
>>>>> to R's table().
>>>> 
>>>> Well, that's the debate at
>>>> https://github.com/JuliaStats/StatsBase.jl/issues/32
>>>> 
>>>> At first I was in favor of table() too, but now I prefer freqtable(),
>>>> because "table" could mean any kind of cross-tabulation. I think
>>>> NamedArray could even be called Table.
>>>> 
>>>>> And I can't wait for such functionality to be included in METADATA...
>>>> 
>>>> Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
>>>> when I first worked on the package. Now I see the tests are still
>>>> failing. Do you know what is needed to make them work?
>>>> 
>>>> Another point is that I think this deserves going into StatsBase, but
>>>> before that we need everybody to agree on a design for NamedArrays.
>>>> 
>>>> Regards
>>>> 
>>>>> On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote:
>>>>>        On Thursday, November 06, 2014 at 11:17 -0800, Conrad Stack wrote:
>>>>>> I was also looking for a function like this, but could not
>>>>>> find one in docs.julialang.org.  I was doing this
>>>>>> (v0.4.0-dev), for anyone who is interested:
>>>>>> 
>>>>>> 
>>>>>> example = rand(1:10,100)
>>>>>> uexample = sort(unique(example))
>>>>>> counts = map(x->count(y->x==y,example),uexample)
>>>>>> 
>>>>>> 
>>>>>> It's pretty ugly, so thanks, Johan, for pointing out
>>>>>> StatsBase.countmap.
>>>>> 
>>>>>        I've also put together a small package precisely aimed at
>>>>>        offering an equivalent of R's table():
>>>>>        https://github.com/nalimilan/Tables.jl
>>>>> 
>>>>>        But there's a more general issue about how to handle arrays
>>>>>        with dimension names in Julia. NamedArrays.jl (which is used
>>>>>        in my package) attempts to tackle this issue, but I don't
>>>>>        think we've reached a consensus yet about the best solution.
>>>>> 
>>>>> 
>>>>>        Regards
>>>>> 
>>>>>> On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
>>>>>>        I think countmap comes closest to giving you what
>>>>>>        you want:
>>>>>> 
>>>>>>        using StatsBase
>>>>>>        data = sample(["a", "b", "c"], 20)
>>>>>>        countmap(data)
>>>>>> 
>>>>>>        Dict{ASCIIString,Int64} with 3 entries:
>>>>>>          "c" => 3
>>>>>>          "b" => 10
>>>>>>          "a" => 7
>>>>>> 
>>>>>>        On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
>>>>>>                Hi
>>>>>> 
>>>>>> 
>>>>>>                I'm looking for the best way to count how
>>>>>>                many times a certain value x_i appears in
>>>>>>                vector x, where x could be integers, floats,
>>>>>>                strings. In R I would do table(x). I found
>>>>>>                StatsBase.counts(x,k) but I'm a bit confused
>>>>>>                by k (where k goes into 1:k, i.e. the vector
>>>>>>                is scanned to find how many elements fall
>>>>>>                on each point of 1:k). Most of the time I
>>>>>>                don't know k, and in fact I would do
>>>>>>                table(x) just to find out what k was. Apart
>>>>>>                from that, I don't think I could use this
>>>>>>                with strings, as I can't construct a range
>>>>>>                object from strings.
>>>>>> 
>>>>>> 
>>>>>>                I'm wondering whether a method
>>>>>>                StatsBase.counts(x::Vector) just returning
>>>>>>                the frequency of each element appearing
>>>>>>                would be useful.
>>>>>> 
>>>>>> 
>>>>>>                The same applies to Base.hist if I
>>>>>>                understand correctly. I just don't want to
>>>>>>                have to specify the edges of bins.
>
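
For reference, the counting asked about at the bottom of the thread needs
no k and no range: a dictionary keyed on the observed values is enough, so
it works for strings as well as numbers. This is a plain-Julia sketch in
current syntax; `freqcount` is a made-up name, not a StatsBase function.

```julia
# Hypothetical helper: count how often each distinct value occurs in x.
function freqcount(x)
    counts = Dict{eltype(x),Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1   # start from 0 on first sight
    end
    return counts
end

c = freqcount(["a", "b", "a", "c", "a"])

# Sort by value to get something closer to the layout of R's table().
tbl = sort(collect(c); by = first)
```

The number of distinct values falls out as `length(c)`, so k never has to
be known in advance.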
