Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

Milan Bouchet-Valat Sun, 09 Nov 2014 08:10:49 -0800

On Sunday, 09 November 2014 at 07:52 -0800, David van Leeuwen wrote:
> I would vote for calling such a function `table()`, to get even closer
> to R's table().


Well, that's the debate at
https://github.com/JuliaStats/StatsBase.jl/issues/32

At first I was in favor of table() too, but now I prefer freqtable(),
because "table" could mean any kind of cross-tabulation. I think
NamedArray could even be called Table.
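To make the cross-tabulation point concrete, here is a minimal sketch of a two-way frequency table (what R's table(x, y) computes) in plain Base Julia, assuming Julia 1.x. The function crosstab_sketch is hypothetical and is not the API of Tables.jl, StatsBase or NamedArrays.jl. It returns the count matrix and the sorted levels of each dimension as separate values.

```julia
# Hypothetical helper (not from StatsBase, Tables.jl or NamedArrays.jl):
# a two-way frequency table, like R's table(x, y), returned as a count
# matrix plus the sorted levels of each dimension. Assumes Julia 1.x.
function crosstab_sketch(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    xlevels = sort(unique(x))   # row labels
    ylevels = sort(unique(y))   # column labels
    # Map each level to its row/column index for constant-time lookup.
    xidx = Dict(v => i for (i, v) in enumerate(xlevels))
    yidx = Dict(v => j for (j, v) in enumerate(ylevels))
    tab = zeros(Int, length(xlevels), length(ylevels))
    for (a, b) in zip(x, y)
        tab[xidx[a], yidx[b]] += 1
    end
    return tab, xlevels, ylevels
end

tab, rows, cols = crosstab_sketch(["a", "b", "a", "a"], [1, 2, 2, 1])
# tab == [2 1; 0 1], rows == ["a", "b"], cols == [1, 2]
```

Because the labels travel alongside the matrix rather than inside it, the caller has to keep them in sync by hand. Attaching names to array dimensions is the problem NamedArrays.jl aims to address.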


> And I can't wait for such functionality to be included in METADATA...
> 
Actually, I haven't submitted it to METADATA because NamedArrays.jl
didn't work well on 0.3 when I first worked on the package. Now I see
the tests are still failing. Do you know what is needed to make them
work?

I also think this functionality deserves to go into StatsBase, but
before that we need everybody to agree on a design for NamedArrays.

Regards



> On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat
> wrote:
> 
>         On Thursday, 06 November 2014 at 11:17 -0800, Conrad Stack
>         wrote: 
>         
>         > I was also looking for a function like this, but could not
>         > find one on docs.julialang.org. I was doing this
>         > (v0.4.0-dev), for anyone who is interested:
>         > 
>         > 
>         > example = rand(1:10,100)
>         > uexample = sort(unique(example))
>         > counts = map(x->count(y->x==y,example),uexample)
>         > 
>         > 
>         > It's pretty ugly, so thanks, Johan, for pointing out
>         > StatsBase.countmap.
>         
>         I've also put together a small package precisely aimed at
>         offering an equivalent of R's table():
>         https://github.com/nalimilan/Tables.jl
>         
>         But there's a more general issue about how to handle arrays
>         with dimension names in Julia. NamedArrays.jl (which is used
>         in my package) attempts to tackle this issue, but I don't
>         think we've reached a consensus yet about the best solution.
>         
>         
>         Regards
>         
>         
>         > 
>         > 
>         > 
>         > On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids
>         > wrote:
>         > 
>         >         I think countmap comes closest to giving you what
>         >         you want:
>         >         
>         >         using StatsBase
>         >         data = sample(["a", "b", "c"], 20)
>         >         countmap(data)
>         >         
>         >         
>         >         
>         >         Dict{ASCIIString,Int64} with 3 entries:
>         >           "c" => 3
>         >           "b" => 10
>         >           "a" => 7
>         >         
>         >         
>         >         On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian
>         >         Oswald wrote: 
>         >         
>         >                 Hi 
>         >                 
>         >                 
>         >                 I'm looking for the best way to count how
>         >                 many times a certain value x_i appears in a
>         >                 vector x, where x could contain integers,
>         >                 floats or strings. In R I would do table(x).
>         >                 I found StatsBase.counts(x,k), but I'm a bit
>         >                 confused by k (the counts are computed over
>         >                 the range 1:k, i.e. the vector is scanned to
>         >                 find how many elements fall at each point of
>         >                 1:k). Most of the time I don't know k, and in
>         >                 fact I would call table(x) precisely to find
>         >                 out what k was. Apart from that, I don't think
>         >                 I could use this with strings, as I can't
>         >                 construct a range object from strings. 
>         >                 
>         >                 
>         >                 I'm wondering whether a method
>         >                 StatsBase.counts(x::Vector) that simply
>         >                 returns the frequency of each distinct
>         >                 element would be useful. 
>         >                 
>         >                 
>         >                 The same applies to Base.hist, if I
>         >                 understand correctly: I just don't want to
>         >                 have to specify the bin edges. 
>         >                 
>         >                 
>         >                 
>         >                 
>         
>         
>
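For comparison with the approaches discussed in this thread, here is a minimal sketch in plain Base Julia (assuming Julia 1.x, no packages). countmap_sketch counts every element in a single pass with a Dict, in the spirit of countmap, so it works for strings and floats and needs no k. Conrad's map/count version, by contrast, rescans the whole vector once per distinct value. counts_1k_sketch only mirrors the behaviour Florian describes for StatsBase.counts(x, k), tallying the integers in 1:k. Both names are hypothetical, and neither is StatsBase's actual implementation.

```julia
# Hypothetical helpers for illustration; assumes Julia 1.x and no packages.

# Single pass over x: map each distinct value to its number of occurrences.
# Works for any hashable element type (Int, Float64, String, ...).
function countmap_sketch(x::AbstractVector{T}) where {T}
    freq = Dict{T,Int}()
    for v in x
        freq[v] = get(freq, v, 0) + 1
    end
    return freq
end

# Mirrors the idea behind counts(x, k): tally how often each integer in 1:k
# occurs. Values outside 1:k are skipped in this sketch.
function counts_1k_sketch(x::AbstractVector{<:Integer}, k::Integer)
    c = zeros(Int, k)
    for v in x
        if 1 <= v <= k
            c[v] += 1
        end
    end
    return c
end

# R-style table(x): distinct values in sorted order with their counts.
freq = countmap_sketch(["b", "a", "c", "b", "b", "a"])
for v in sort(collect(keys(freq)))
    println(v, " => ", freq[v])   # a => 2, then b => 3, then c => 1
end
```

The result of counts_1k_sketch([1, 3, 3, 2, 3], 4) is [1, 1, 3, 0]; the trailing zero illustrates the trade-off Florian raises. A fixed integer range gives a dense vector with an entry for every level, including absent ones, but you have to know k beforehand.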
