Thanks, it works, but:
julia> t=h5read("FMLo2_z_reversem.h5","FMLo2",(:,1));
julia> eltype(t)
Float64
julia> t
5932868x1 Array{Float64,2}:
0.0181719
0.303473
-0.526979
⋮
-0.526979
0.912295
-0.0281875
julia> entropy(t)
NaN
julia> entropy(vec(t))
NaN
Why?
julia> s
1000000-element Array{Float64,1}:
0.737228
0.162308
0.688503
0.108727
0.40552
0.654883
0.618194
0.137147
0.934373
0.077236
⋮
0.413675
0.463914
0.504321
0.976408
0.998662
0.343927
0.477739
0.660533
0.918326
0.579264
julia> entropy(s)
13.280438296634793
julia>
julia> entropy(t[:,1])
NaN
???
Paul
On 2014-09-05 07:34, David P. Sanders wrote:
On Thursday, 4 September 2014 23:42:20 UTC-5, paul analyst wrote:
julia> entropy(s)=-sum(x->x*log(2,x), [count(x->x==c,s)/length(s)
for c in unique(s)]);
julia> s=rand(10^3);
julia> @time entropy(s)
elapsed time: 0.167097546 seconds (20255140 bytes allocated)
9.965784284662059
julia> s=rand(10^4);
julia> @time entropy(s)
elapsed time: 3.62008077 seconds (1602061320 bytes allocated, 21.81% gc time)
13.287712379549843
julia> s=rand(10^5);
julia> @time entropy(s)
elapsed time: 366.181311932 seconds (160021245832 bytes allocated, 21.89% gc time)
16.609640474434073
You can see from these last two that the time is multiplied by 100
when the length of the vector is multiplied by 10, i.e. your method
has O(n^2) complexity. This is due to the way that you are counting
repeats. What you are basically doing is a histogram.
If your data are really floats, then in any case some binning is
required. If they are ints, you could use a dictionary. I think
there's even a counting object already implemented (but I don't
remember what it's called).
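To illustrate the dictionary idea, here is a minimal sketch that counts every distinct value in a single pass, so the cost grows roughly linearly with the length of the vector. The names count_values and entropy_exact are made up for this example and are not from any library. It only makes sense when values really do repeat (ints, for example); for random floats nearly every value is unique and the result is just log2(length(s)).

```julia
# Count how often each distinct value occurs, in one pass over the data.
function count_values(s)
    counts = Dict{eltype(s),Int}()
    for x in s
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

# Shannon entropy in bits, computed from the exact value frequencies.
# Assumes s is non-empty.
function entropy_exact(s)
    n = length(s)
    return -sum(c / n * log2(c / n) for c in values(count_values(s)))
end
```

For example, entropy_exact([1, 1, 2, 2]) gives 1.0, since the two values are equally likely.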
How about this:
function entropy(s)
    N = length(s)
    num_bins = 10000
    h = hist(s, num_bins)
    p = h[2] ./ N # probabilities
    -sum([x * log(2,x) for x in p])
end
julia> @time entropy(rand(10^6))
elapsed time: 0.199634039 seconds (79424624 bytes allocated, 31.51% gc time)
13.28044771568381
julia> @time entropy(rand(10^7))
elapsed time: 1.710673571 seconds (792084208 bytes allocated, 26.20% gc time)
13.286992511965552
julia> @time entropy(rand(10^8))
elapsed time: 18.20088571 seconds (7918627344 bytes allocated, 24.03% gc time)
13.28764216804997
The calculation is now O(n) instead.
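One possible pitfall with the histogram version: a bin that contains no points has probability 0, and 0 * log(2, 0) evaluates to NaN in floating-point arithmetic, which then turns the whole sum into NaN. Uniform random data fills every bin, but data that is not spread evenly over its range may well leave some bins empty. Below is a rough sketch that does the binning by hand and skips empty bins (using the convention that 0 * log(0) counts as 0). The name entropy_binned and the num_bins keyword are assumptions for this example, it does not use hist, and it assumes s is non-empty and contains no NaN values.

```julia
# Histogram-based Shannon entropy in bits, skipping empty bins.
function entropy_binned(s; num_bins = 10_000)
    lo, hi = extrema(s)
    lo == hi && return 0.0            # all values equal: a single occupied bin
    counts = zeros(Int, num_bins)
    scale = num_bins / (hi - lo)
    for x in s
        # Map x to a bin index in 1:num_bins; the maximum goes into the last bin.
        i = min(floor(Int, (x - lo) * scale) + 1, num_bins)
        counts[i] += 1
    end
    n = length(s)
    h = 0.0
    for c in counts
        c == 0 && continue            # treat 0 * log(0) as 0 instead of NaN
        p = c / n
        h -= p * log2(p)
    end
    return h
end
```

With this version, entropy_binned([0.0, 1.0]; num_bins = 10) returns 1.0 even though eight of the ten bins are empty.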
julia> s=rand(10^6);
julia> @time entropy(s)
................................
After 12 hours it still had not finished :/
Paul