Thanks, it works, but:
julia> t=h5read("FMLo2_z_reversem.h5","FMLo2",(:,1));

julia> eltype(t)
Float64

julia> t
5932868x1 Array{Float64,2}:
  0.0181719
  0.303473
 -0.526979
  ⋮
 -0.526979
  0.912295
 -0.0281875

julia> entropy(t)
NaN

julia> entropy(vec(t))
NaN
Why?

julia> s
1000000-element Array{Float64,1}:
 0.737228
 0.162308
 0.688503
 0.108727
 0.40552
 0.654883
 0.618194
 0.137147
 0.934373
 0.077236
 ⋮
 0.413675
 0.463914
 0.504321
 0.976408
 0.998662
 0.343927
 0.477739
 0.660533
 0.918326
 0.579264

julia> entropy(s)
13.280438296634793

julia> entropy(t[:,1])
NaN

???
Paul
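A likely explanation, assuming `entropy` here is the `hist`-based function from David's reply quoted below: `t` is spread over a wider, non-uniform range (it includes negative values), so some of the 10000 bins end up empty. An empty bin has probability 0, and in floating-point arithmetic `0 * log(2, 0)` is `0 * -Inf`, which is `NaN`. A single `NaN` term turns the whole sum into `NaN`. Uniform `rand` data essentially never hits this, because each bin expects about 100 samples. The sketch below skips zero-probability bins (the usual convention 0·log 0 = 0). The name `entropy_bits`, the equal-width binning between the minimum and maximum, and the use of `randn` as a stand-in for the HDF5 data are illustrative assumptions. The binning is written out by hand so that the sketch does not depend on `hist`, which is no longer in Julia Base.

```julia
# Binned entropy in bits, skipping empty bins (convention: 0 * log2(0) = 0).
# `entropy_bits` is a hypothetical name; bins are equal-width over [minimum, maximum].
function entropy_bits(s::AbstractVector{<:Real}, num_bins::Integer = 10_000)
    N = length(s)
    lo, hi = extrema(s)
    width = (hi - lo) / num_bins
    counts = zeros(Int, num_bins)
    for x in s
        # bin index in 1:num_bins; the maximum itself goes into the last bin
        i = width == 0 ? 1 : min(floor(Int, (x - lo) / width) + 1, num_bins)
        counts[i] += 1
    end
    H = 0.0
    for c in counts
        c == 0 && continue          # an empty bin would contribute 0 * log2(0) = NaN
        p = c / N
        H -= p * log2(p)
    end
    return H
end

println(0 * log(2, 0))      # NaN: the term an empty bin adds to the unfiltered sum
t = randn(10^6)             # stand-in for the HDF5 data (assumption); tail bins stay empty
println(entropy_bits(t))    # finite, because empty bins are skipped
```

If `t` itself contains `NaN` entries, filtering them first (for example `filter(!isnan, vec(t))`) would also be needed.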


On 2014-09-05 07:34, David P. Sanders wrote:


On Thursday, September 4, 2014 23:42:20 UTC-5, paul analyst wrote:

    julia> entropy(s)=-sum(x->x*log(2,x), [count(x->x==c,s)/length(s)
    for c in unique(s)]);

    julia> s=rand(10^3);

    julia> @time entropy(s)
    elapsed time: 0.167097546 seconds (20255140 bytes allocated)
    9.965784284662059

    julia> s=rand(10^4);

    julia> @time entropy(s)
    elapsed time: 3.62008077 seconds (1602061320 bytes allocated,
    21.81% gc time)
    13.287712379549843

    julia> s=rand(10^5);

    julia> @time entropy(s)
    elapsed time: 366.181311932 seconds (160021245832 bytes allocated,
    21.89% gc time)
    16.609640474434073


You can see from these last two that the time is multiplied by 100 when the length of the vector is multiplied by 10, i.e. your method has O(n^2) complexity. This comes from the way you count repeats: for every unique value, `count` scans the whole vector again, and random floats are essentially all unique. What you are basically computing is a histogram.
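The scaling can be checked against the quoted timings. This assumes the naive version makes one `count` pass of n comparisons per unique value, and that the u unique values of `rand` output number essentially n:

```latex
T(n) \propto n\,u \approx n^2, \qquad
\frac{T(10^5)}{T(10^4)} = \frac{366.18\ \mathrm{s}}{3.62\ \mathrm{s}} \approx 101 \approx \left(\frac{10^5}{10^4}\right)^2,
\qquad
T(10^6) \approx 366\ \mathrm{s} \times 10^2 \approx 3.7 \times 10^4\ \mathrm{s} \approx 10\ \mathrm{h}.
```

The allocations scale the same way (about 1.6 GB at 10^4 and 160 GB at 10^5). This rough extrapolation ignores the growing garbage-collection cost, so it is only of the same order as the 10^6 run below, which had still not finished after 12 h.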

If your data really are floats, some binning is required in any case. If they are integers, you could count them with a dictionary. I think there's even a counting object already implemented (but I don't remember what it's called).
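A minimal sketch of the dictionary approach for discrete (for example integer) data is shown below. Each distinct value is tallied once in a `Dict`, so a single pass over the data replaces the repeated `count` scans, giving O(n) expected time. The name `entropy_discrete` is hypothetical. Packages such as StatsBase.jl also provide `countmap` for the tallying step.

```julia
# Exact entropy (in bits) of discrete data: tally each distinct value with a Dict.
# `entropy_discrete` is a hypothetical name used for illustration.
function entropy_discrete(s::AbstractVector)
    counts = Dict{eltype(s),Int}()
    for x in s
        counts[x] = get(counts, x, 0) + 1   # increment the tally for value x
    end
    N = length(s)
    H = 0.0
    for c in values(counts)                 # every stored count is >= 1, so no 0*log(0)
        p = c / N
        H -= p * log2(p)
    end
    return H
end

println(entropy_discrete([1, 1, 2, 2]))      # 1.0
println(entropy_discrete(rand(1:8, 10^6)))   # close to log2(8) = 3
```

For continuous floats this gives no information beyond the sample size, because every value is its own "bin". That is why binning is needed in that case.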

How about this:

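function entropy(s)
    N = length(s)
    num_bins = 10000
    h = hist(s, num_bins)            # Julia 0.3: returns (edges, counts); bin count is approximate
    p = h[2] ./ N                    # probabilities
    -sum([x * log(2,x) for x in p])  # an empty bin adds 0*log(2,0) = NaN
end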

julia> @time entropy(rand(10^6))
elapsed time: 0.199634039 seconds (79424624 bytes allocated, 31.51% gc time)
13.28044771568381

julia> @time entropy(rand(10^7))
elapsed time: 1.710673571 seconds (792084208 bytes allocated, 26.20% gc time)
13.286992511965552

julia> @time entropy(rand(10^8))
elapsed time: 18.20088571 seconds (7918627344 bytes allocated, 24.03% gc time)
13.28764216804997

The calculation is now O(n) instead.
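As a worked check on the numbers above: with B equal-width bins and uniform data, each p_i approaches 1/B, so the binned entropy approaches log2 B regardless of n. The exact-count version in the quoted original instead treats every distinct float as its own bin. Assuming all n values are distinct (essentially certain for `rand`), each p = 1/n, so that version simply returns log2 n:

```latex
H = -\sum_{i=1}^{B} p_i \log_2 p_i \;\to\; -\sum_{i=1}^{B} \frac{1}{B}\log_2\frac{1}{B} = \log_2 B = \log_2 10^4 \approx 13.2877,
\qquad
H_{\text{exact}} = \log_2 n:\quad \log_2 10^3 \approx 9.9658,\ \ \log_2 10^4 \approx 13.2877,\ \ \log_2 10^5 \approx 16.6096.
```

These match the outputs shown: 13.2804, 13.2870 and 13.2876 for the binned runs, and 9.9658, 13.2877 and 16.6096 for the exact counts. The binned value therefore depends on the choice of B, and results are only comparable between data sets binned the same way.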

    julia> s=rand(10^6);

    julia> @time entropy(s)
    ................................
    After 12 h it still hasn't finished :/

    Paul

