On 11/18/2016 12:03 AM, Richard Henderson wrote:
> On 11/17/2016 11:09 PM, Bastian Koppelmann wrote:
>> On 11/17/2016 08:59 PM, Richard Henderson wrote:
>>> On 11/17/2016 08:53 PM, Richard Henderson wrote:
>>>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>>>>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>>>>> +
>>>>>> +    OP_32_64(clz):
>>>>>> +        if (const_args[2]) {
>>>>>> +            tcg_debug_assert(have_bmi1);
>>>>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>>>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>>>>> +        } else {
>>>>>> +            /* ??? See above.  */
>>>>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>>>>
>>>>> The Intel ISA manual states that it find the bit index of the most
>>>>> significant bit, where the least significant bit is index 0. So for
>>>>> the input 0x2 this should return 1. However this is not the number
>>>>> of leading zeros.
>>>>
>>>> Oh, of course you're right.  I thought I was testing this, but while
>>>> alpha does have this operation, it turns out it isn't used much.
>>>
>>> Alternately, what I tested was on a haswell machine, which takes the
>>> LZCNT path, which *does* produce the intended results.  Just the BSR
>>> path doesn't.
>>
>> Luckily my old laptop is a Core 2 Duo without LZCNT :)
>
> Heh.  Well, I've given it another few tests with LZCNT hacked off, and
> with i686 32-bit.  Here's an incremental update.  Wherein I also note
> that lzcnt isn't in the same cpuid flag as tzcnt.  Double whoops.
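
For reference, the relation between the BSR result and the leading-zero count discussed above can be sketched in portable C. This is only an illustration, not the code from the patch: bsr32() and clz32_via_bsr() are hypothetical helper names, and the zero_val parameter assumes that the third clz operand is the value to return for a zero input (BSR itself leaves its destination undefined in that case).

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of BSR: index of the most significant set bit,
 * where the least significant bit has index 0.
 * The input must be nonzero, mirroring the undefined result of BSR on 0. */
static unsigned bsr32(uint32_t x)
{
    unsigned idx = 0;

    assert(x != 0);
    while (x >>= 1) {
        idx++;
    }
    return idx;
}

/* Leading-zero count derived from the BSR index.
 * For nonzero x: clz = 31 - bsr, which equals bsr ^ 31 because
 * the index always fits in the low five bits.
 * zero_val (an assumption of this sketch) is returned for x == 0. */
static unsigned clz32_via_bsr(uint32_t x, unsigned zero_val)
{
    if (x == 0) {
        return zero_val;
    }
    return bsr32(x) ^ 31;
}
```

With these helpers, bsr32(0x2) is 1 while clz32_via_bsr(0x2, 32) is 30, which is the mismatch pointed out for the BSR path.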

My processor[1] seems to lie about the LZCNT cpuid flag. It says it has
LZCNT but executes it as BSR. According to [2], the ABM flag is used to
indicate LZCNT support.

Cheers,
Bastian

[1] $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
stepping        : 10
microcode       : 0xa0b
cpu MHz         : 1600.000
cache size      : 3072 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority dtherm ida
bugs            :
bogomips        : 4523.35
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

[2] https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
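
To illustrate the point about separate feature flags, here is a minimal sketch of how the two bits could be decoded. It assumes the usual x86 layout: LZCNT (ABM) is reported in CPUID leaf 0x80000001, ECX bit 5, while TZCNT belongs to BMI1 and is reported in CPUID leaf 7 (subleaf 0), EBX bit 3. The macro and function names are hypothetical, not the ones used in QEMU, and the raw register values are passed in so that the snippet does not depend on any particular cpuid helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two feature bits (not taken from QEMU). */
#define SKETCH_EXT1_ECX_ABM   (1u << 5)  /* CPUID 0x80000001, ECX bit 5 */
#define SKETCH_LEAF7_EBX_BMI1 (1u << 3)  /* CPUID 7 subleaf 0, EBX bit 3 */

/* LZCNT is usable only if the ABM bit is set in the extended leaf. */
static bool sketch_has_lzcnt(uint32_t ext1_ecx)
{
    return (ext1_ecx & SKETCH_EXT1_ECX_ABM) != 0;
}

/* TZCNT is part of BMI1, which lives in a different leaf and register. */
static bool sketch_has_tzcnt(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & SKETCH_LEAF7_EBX_BMI1) != 0;
}
```

Testing only the BMI1 bit and then emitting LZCNT would be wrong on a CPU that has one feature but not the other, since a CPU without LZCNT ignores the prefix and runs the encoding as BSR.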