Vaibhav Jain <vaib...@linux.ibm.com> writes: > Thanks for reviewing this this patch Ira. My responses below: > Ira Weiny <ira.we...@intel.com> writes: >> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: >>> Implement support for fetching nvdimm health information via >>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair >>> of 64-bit big-endian integers, bitwise-and of which is then stored in >>> 'struct papr_scm_priv' and subsequently partially exposed to >>> user-space via newly introduced dimm specific attribute >>> 'papr/flags'. Since the hcall is costly, the health information is >>> cached and only re-queried, 60s after the previous successful hcall. ... >>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >>> b/arch/powerpc/platforms/pseries/papr_scm.c >>> index f35592423380..142636e1a59f 100644 >>> --- a/arch/powerpc/platforms/pseries/papr_scm.c >>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >>> @@ -39,6 +78,15 @@ struct papr_scm_priv { >>> struct resource res; >>> struct nd_region *region; >>> struct nd_interleave_set nd_set; >>> + >>> + /* Protect dimm health data from concurrent read/writes */ >>> + struct mutex health_mutex; >>> + >>> + /* Last time the health information of the dimm was updated */ >>> + unsigned long lasthealth_jiffies; >>> + >>> + /* Health information for the dimm */ >>> + u64 health_bitmap; >> >> I wonder if this should be typed big endian as you mention that it is in the >> commit message? > This was discussed in an earlier review of the patch series at > https://lore.kernel.org/linux-nvdimm/878sjetcis....@mpe.ellerman.id.au > > Even though health bitmap is returned in big endian format (For ex > value 0xC00000000000000 indicates bits 0,1 set), its value is never > used. Instead only test for specific bits being set in the register is > done.
This has already caused a lot of confusion, so let me try and clear it up. I will probably fail :) The value is not big endian. It's returned in a GPR (a register), from the hypervisor. The ordering of bytes in a register is not dependent on what endian we're executing in. It's true that the hypervisor will have been running big endian, and when it returns to us we will now be running little endian. But the value is unchanged, it was 0xC00000000000000 in the GPR while the HV was running and it's still 0xC00000000000000 when we return to Linux. You can see this in mambo, see below for an example. _However_, the specification of the bits in the bitmap value uses MSB 0 ordering, as is traditional for IBM documentation. That means the most significant bit, aka. the left most bit, is called "bit 0". See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering That is the opposite numbering from what most people use, and in particular what most code in Linux uses, which is that bit 0 is the least significant bit. Which is where the confusion comes in. It's not that the bytes are returned in a different order, it's that the bits are numbered differently in the IBM documentation. The way to fix this kind of thing is to read the docs, and convert all the bits into correct numbering (LSB=0), and then throw away the docs ;) cheers In mambo we can set a breakpoint just before the kernel enters skiboot, towards the end of __opal_call. The kernel is running LE and skiboot runs BE. systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc0000000000c1744 breakpoint set at [0:0:0]: 0xc0000000000c1744 (0x00000000000C1744) Enc:0x2402004C : hrfid Then run: systemsim-p9 [~/skiboot/skiboot/external/mambo] c [0:0:0]: 0xC0000000000C1744 (0x00000000000C1744) Enc:0x2402004C : hrfid INFO: 121671618: (121671618): ** Execution stopped: user (tcl), ** 121671618: ** finished running 121671618 instructions ** And we stop there, on an hrfid that we haven't executed yet. We can print r0, to see the OPAL token: systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0 0x0000000000000019 ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25). And we can print the MSR: systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr 0x9000000002001033 64-bit mode (SF): 0x1 [64-bit mode] Hypervisor State (HV): 0x1 Vector Available (VEC): 0x1 Machine Check Interrupt Enable (ME): 0x1 Instruction Relocate (IR): 0x1 Data Relocate (DR): 0x1 Recoverable Interrupt (RI): 0x1 Little-Endian Mode (LE): 0x1 [little-endian] ie. we're little endian. We then step one instruction: systemsim-p9 [~/skiboot/skiboot/external/mambo] s [0:0:0]: 0x0000000030002BF0 (0x0000000030002BF0) Enc:0x7D9FFAA6 : mfspr r12,PIR Now we're in skiboot. Print the MSR again: systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr 0x9000000002001002 64-bit mode (SF): 0x1 [64-bit mode] Hypervisor State (HV): 0x1 Vector Available (VEC): 0x1 Machine Check Interrupt Enable (ME): 0x1 Recoverable Interrupt (RI): 0x1 We're big endian. Print r0: systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0 0x0000000000000019 r0 is unchanged!