On Sat, Oct 07, 2017 at 03:08:26AM +0300, Timofey Titovets wrote:
> > I hear the compiler scream, trying to optimize all the ifs. And I think
> > the CPU branch prediction would have a hard time, there's practically no
> > expected pattern and the performance could be worse compared to the
> > formula calculation.
> >
> > What's the reason you opencode it like that? Could you please post how
> > it would be implemented in C? There are potentially optimized functions
> > to do integer log2 (ilog2).
> 
> That is just:
> log2_lshift4(n) { return ilog2(pow(n, 16)); }
> But that will cause an overflow:
> 
> log2(8192) -> 13, i.e. 8192 is 13 bits long
> 13 * 16 = 208 bits -> overflows 64 bits; but 13 * 4 = 52, and 52 < 64
> 
> That also restricts the sample size to 15 bits (15 * 4 = 60 < 64)
> 
> So we can get approximately the same behaviour
> by doing something like ilog2(pow(n, 4));
> 
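For reference, here is what that trick looks like in C (a minimal sketch
in kernel context; the name log2_pow4 is made up here, ilog2() is the
kernel helper):

/*
 * ilog2(n^4) == floor(4 * log2(n)), i.e. log2 with two extra bits of
 * fixed-point precision.  n must fit in 15 bits so that n^4 stays
 * below 2^60 and cannot overflow a u64.
 */
static inline int log2_pow4(u64 n)
{
	n = n * n * n * n;	/* at most 4 * 15 = 60 bits */
	return ilog2(n);	/* floor(log2(n^4)) */
}
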
> https://pastebin.com/VryvqkCy - some numbers
> TL;DR version (count of sampled blocks at each error level):
> pow(n, 4)
>  - count       100        7    28300
>  - error %    0.0%     1.1%     3.0%
> pow(n, 16)
>  - count     18843    11483       14
>  - error %    0.0%     1.0%     2.8%
> 
> (I sampled a Win10.iso image, simply as a big file with varied data.)
> I generate the error value by:
> if (shannon_e_f > 0)
>     error = 100 - (shannon_e_i * 1.0 * 100 / shannon_e_f);
> 
> In fact that causes errors in the entropy level like:
> TRUE -> Integer
> 83 -> 80
> 87 -> 84
> 32 -> 28
> 
> In theory it's possible to just make the entropy level markers more
> aggressive, like: 70 -> 65 and 85 -> 80
> 
> That will cover the precision errors.

With the v2 patch and direct calculation, I'm ok with leaving some
imprecision there. This is fine-tuning that can be done in parallel.
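
To make the imprecision concrete, here is a self-contained userspace
sketch of the measurement quoted above (toy histogram, hypothetical
helper names; not the patch code):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* generic floor(log2(v)) for v > 0 */
static int ilog2_u64(uint64_t v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

int main(void)
{
	uint32_t hist[256] = { 0 };
	const uint32_t total = 8192;	/* sample size, 13 bits */
	double shannon_e_f = 0;
	uint64_t sum = 0;
	int i;

	/* toy data: bytes spread evenly over 200 buckets */
	for (i = 0; i < 200; i++)
		hist[i] = total / 200;
	hist[0] += total % 200;

	for (i = 0; i < 256; i++) {
		if (!hist[i])
			continue;
		double p = (double)hist[i] / total;
		shannon_e_f -= p * log2(p);
		/* integer path: n * floor(4 * log2(n)) via n^4 */
		sum += (uint64_t)hist[i] *
		       ilog2_u64((uint64_t)hist[i] * hist[i] *
				 hist[i] * hist[i]);
	}
	shannon_e_f = shannon_e_f * 100 / 8;	/* % of 8 bits */

	/* H = log2(T) - (1/T) * sum(n_i * log2(n_i)), in 4x fixed point,
	 * rescaled to a percentage of the 8-bit maximum */
	uint64_t shannon_e_i = ((uint64_t)total *
				ilog2_u64((uint64_t)total * total *
					  total * total) - sum) * 100 /
			       ((uint64_t)total * 4 * 8);

	if (shannon_e_f > 0)
		printf("float %.1f%%, int %llu%%, error %.1f%%\n",
		       shannon_e_f, (unsigned long long)shannon_e_i,
		       100 - shannon_e_i * 100.0 / shannon_e_f);
	return 0;
}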

> or make a byte-level BigInt in the kernel (bigint_log2, bigint_Lshift, etc.) %)
> (But that seems excessive to me + I'm unsure about the performance + I'm
> afraid of Big/Little-Endian issues)

I think we need to keep it fast and not anything too complex to
calculate, unless there's a significant win in the heuristic results.

> > Code like that could be microbenchmarked so we can see actual numbers,
> > so far I'm just guessing.
> 
> I mentioned it in the cover letter: https://github.com/Nefelim4ag/companalyze
> (it implements the heuristic with no optimizations, all code paths run,
> to get the raw counters)
> Over in-memory badly compressible data:
>  8188. BSize: 131072, RepPattern: 0, BSet: 256, BCSet: 220, ShanEi%: 99| 99 ~0.0%, RndDist: 0, out: -3
> 
> With -O0 - ~1100 MiB/s
> With -O2 - ~2500 MiB/s
> 
> With this (I found it in the kernel and added the pow(n, 4) step):
> static int ilog2(uint64_t v)
> {
>     int l = 0;
>     v = v*v*v*v;
>     while ((1ULL << l) < v)    /* 1ULL: keep the shift 64-bit wide */
>         l++;

The while() implementation is generic and works on all architectures,
but x86_64 has an instruction for that and will be even faster:

ilog2 ->

http://elixir.free-electrons.com/linux/latest/source/include/linux/log2.h#L26

fls (find last set bit, i.e. the most significant one) ->

http://elixir.free-electrons.com/linux/latest/source/arch/x86/include/asm/bitops.h#L450

which boils down to the bsrl instruction.
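
For comparison, a sketch of the same step on top of the kernel helpers
(fls64() and ilog2() are real kernel APIs; the function name here is
made up). Note that the open-coded loop above actually computes
ceil(log2(v)), while fls64(v) - 1 is floor(log2(v)), which is what the
kernel's ilog2() returns:

/* fls64(v) is the 1-based index of the most significant set bit
 * (0 for v == 0), so fls64(v) - 1 == floor(log2(v)) for v > 0.
 * On x86_64 this becomes a single bsr instead of up to ~60 loop
 * iterations. */
static inline int ilog2_pow4(u64 v)
{
	v = v * v * v * v;	/* caller keeps v <= 15 bits */
	return fls64(v) - 1;
}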

>     return l;
> }
> 
> With -O0 - ~1150 MiB/s
> With -O2 - ~2500 MiB/s
> 
> So, my solution looks worse in code than in performance
> (it may be that most requests for badly compressible data go
> directly to the lookup table)
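
If most of the badly compressible samples really are answered from a
small-n fast path, a precomputed table is one plausible shape for it
(purely illustrative, not taken from the patch):

/* Hypothetical fast path: precomputed floor(4 * log2(n)) for n < 256,
 * with the multiply + ilog2() as the fallback for larger values. */
static u8 log2_pow4_lut[256];

static void log2_pow4_lut_init(void)
{
	int n;

	for (n = 2; n < 256; n++)
		log2_pow4_lut[n] = ilog2((u64)n * n * n * n);
}

static inline int log2_pow4_fast(u64 n)
{
	if (n < 256)
		return log2_pow4_lut[n];	/* common case */
	return ilog2(n * n * n * n);		/* n <= 15 bits */
}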

Ok, thanks for collecting the numbers.