Re: [PATCH v4 4/4] sparc64: Add support for ADI (Application Data Integrity)

2017-01-17 Thread Khalid Aziz

On 01/16/2017 09:39 PM, David Miller wrote:

From: Khalid Aziz 
Date: Wed, 11 Jan 2017 09:12:54 -0700


diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 8a6982d..68b03bf 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 

 /* Unlike the OBP device tree, the machine description is a full-on
  * DAG.  An arbitrary number of ARCs are possible from one
@@ -1104,5 +1105,8 @@ void __init sun4v_mdesc_init(void)

cur_mdesc = hp;

+#ifdef CONFIG_SPARC64


mdesc.c is only built on sparc64, this ifdef is superfluous.


Good point. I will fix it.




+/* Update the state of MCDPER register in current task's mm context before
+ * dup so the dup'd task will inherit flags in this register correctly.
+ * Current task may have updated flags since it started running.
+ */
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+   if (adi_capable() && src->mm) {
+   register unsigned long tmp_mcdper;
+
+   __asm__ __volatile__(
+   ".word 0x83438000\n\t"/* rd %mcdper, %g1 */
+   "mov %%g1, %0\n\t"
+   : "=r" (tmp_mcdper)
+   :
+   : "g1");
+   src->mm->context.mcdper = tmp_mcdper;


I don't like the idea of duplicating 'mm' state using the task struct
copy.  Why do not the MM handling interfaces handle this properly?

Maybe it means you've abstracted the ADI register handling in the
wrong place.  Maybe it's a thread property which is "pushed" from
the MM context.


I see what you are saying. This code updates mm->context.mcdper for the 
source thread with the current state of MCDPER since MCDPER can be 
changed by a userspace process any time. When userspace changes MCDPER, 
it is not saved into mm->context.mcdper until a context switch happens. 
This means during the timeslice for a thread, its mm->context.mcdper may 
not reflect the current value of MCDPER. Updating it ensures dup_mm() 
will copy the real current value of MCDPER into the newly forked thread. 
arch_dup_mmap() looks like a more appropriate place to do this. Do you 
agree?


Thanks,
Khalid

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v2 1/1] DM: inplace compressed DM target

2017-01-17 Thread Ram Pai
This is a simple DM target supporting inplace compression. Its best
suited for SSD. The underlying disk must support 512B sector size.
The target only supports 4k sector size.

Disk layout:
|super|...meta...|..data...|

Store unit is 4k (a block). Super is 1 block, which stores meta  and
data size and compression algorithm. Meta is a bitmap. For each data
 block, there are 5 bits meta.

Data:

Data of   a block is compressed. Compressed  data  is round up to 512B,
which is the payload. In disk, payload is  stored at  the beginning of
logical sector  of the block. Let's look  at an example.  Say we store
data to block A, which  is in sector  B(A*8), its orginal  size is 4k,
compressed size is  1500.Compressed data (CD)  will  use three
sectors (512B). The three  sectors  are the  payload. Payload  will be
stored at sector B.

---
... | CD1 | CD2 | CD3 |   |   |   |   || ...
---
^B^B+1  ^B+2  ^B+7 ^B+8

For this block, we will not use sector B+3 to B+7 (a hole). We use four
meta  bits  to  present payload  size. The compressed size (1500) isn't
stored in meta directly. Instead, we  store  it  at  the last 32bits of
payload. In this  example, we store it at the  end  of  sector  B+2. If
compressed size + sizeof(32bits)  crosses a   sector, payload size will
increase one sector.  If payload  uses 8 sectors, we store uncompressed
data directly.

If IO size is bigger than one block, we can store the data as an extent.
Data of the  whole extent will compressed and stored in the similar way
like above.  The first  block of the extent is the head, all others are
the tail.  If extent is 1 block,  the  block  is head. We have 1 bit of
meta to present if a  block  is  head  or  tail. If 4 meta bits of head
block can't  store  extent payload size, we will borrow tail block meta
bits to  store  payload  size.   Max  allowd extent size is 128k, so we
don't compress/decompress too big size data.

Meta:
Modifying   data   will modify meta too. Meta will be written(flush) to
disk   depending   on   meta   write   policy. We support writeback and
writethrough mode.  In  writeback mode, meta will be written to disk in
an interval or a  FLUSH  request.  In  writethrough mode, data and meta
data will be written to disk together.

Advantages:

1. Simple. Since  we  store  compressed  data  in-place,  we don't need
   complicated disk data management.
2. Efficient. For  each  4k, we only need 5 bits meta. 1T data will use
less than 200M meta, so we  can  load  all meta into memory. And actual
compression size is in payload. So   if  IO doesn't need RMW and we use
writeback meta flush, we don't  need  extra IO for meta.

Disadvantages:

1. hole. Since we   store  compressed data in-place, there are a lot of
   holes (in above  example,  B+3 - B+7) Hole can impact IO, because we
   can't do IO merge.

2. 1:1 size. Compression  doesn't  change disk  size. If disk is 1T, we
   can only store 1T data even we do compression.

But this is for SSD only. Generally SSD firmware has a FTL layer to map
disk  sectors  to flash nand. High end SSD firmware has filesystem-like
FTL.

1. hole. Disk has a lot of holes, but SSD FTL   can   still  store data
   contiguous in nand. Even if we can't do IO   merge in  OS layer, SSD
   firmware can do it.

2. 1:1 size. On one side, we write compressed data to SSD, which means
   less  data is  written to SSD. This will be very helpful to improve
   SSD garbage collection, and  so write speed and life cycle. So even
   this is a problem, the target  is still helpful. On the other side,
   advanced SSD FTL can easily do thin provision. For example, if nand
   is   1T   and   we   let   SSD   report   it   as   2T,   and   use
   the  SSD  as  compressed target. In such SSD, we don't have the 1:1
   size issue.

So even if   SSD   FTL   cannot   map   non-contiguous disk sectors to
contiguous nand, the compression target can still function well.

Signed-off-by: Shaohua Li 
Signed-off-by: Ram Pai 
---
 .../device-mapper/dm-inplace-compress.txt  |  139 ++
 drivers/md/Kconfig |6 +
 drivers/md/Makefile|2 +
 drivers/md/dm-inplace-compress.c   | 2104 
 drivers/md/dm-inplace-compress.h   |  185 ++
 5 files changed, 2436 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-inplace-compress.txt
 create mode 100644 drivers/md/dm-inplace-compress.c
 create mode 100644 drivers/md/dm-inplace-compress.h

diff --git a/Documentation/device-mapper/dm-inplace-compress.txt 
b/Documentation/device-mapper/dm-inplace-compress.txt
new file mode 100644
index 000..c2eefb9
--- /dev/null
+++ b/Documentation/device-mapper/dm-inplace-compress.txt
@@ -0,0 +1,139 @@
+Device-Mapper's "inplace-compress" target provides 

[RFC PATCH v2] DM: dm-inplace-compress: inplace compressed DM target

2017-01-17 Thread Ram Pai
This  patch provides a generic device-mapper inplace compression
device.  Originally written by Shaohua Li.
https://www.redhat.com/archives/dm-devel/2013-December/msg00143.html

I have optimized and hardened the code.

Testing:
---
This compression block device  is  tested in the following scenarios
a) backing a ext4 filesystem 
b) backing swap
Its tested on a PPC64 system only.
Moretesting  is  needed on different architectures.

TODO:
-
For testing, the code was modified to use GFP_ATOMIC allocations when   device
is   used as swap. A more dynamic mechanism is needed to switch allocation
strategy based on usage. Probably a sysfs interface?

Version v1:
Comments from Alasdair have been incorporated.
https://www.redhat.com/archives/dm-devel/2013-December/msg00144.html

Version v2:
All patches are merged into a single patch.
Major code re-arrangement.
Data and metablocks allocated based on the length of the device
map rather than the size of the backing device.
Size   of   each entry   in  the bitmap array is explicitly set
 to 32bits.
Attempt  to  reuse  the  provided  bio  buffer  space   instead  of
 allocating a new one.

Your comments to improve the code is very much appreciated.

Ram Pai (1):
  From: Shaohua Li 

 .../device-mapper/dm-inplace-compress.txt  |  139 ++
 drivers/md/Kconfig |6 +
 drivers/md/Makefile|2 +
 drivers/md/dm-inplace-compress.c   | 2104 
 drivers/md/dm-inplace-compress.h   |  185 ++
 5 files changed, 2436 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-inplace-compress.txt
 create mode 100644 drivers/md/dm-inplace-compress.c
 create mode 100644 drivers/md/dm-inplace-compress.h

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4] Application Data Integrity feature introduced by SPARC M7

2017-01-17 Thread Khalid Aziz

On 01/16/2017 09:47 PM, David Miller wrote:

From: Dave Hansen 
Date: Wed, 11 Jan 2017 10:13:54 -0800


For memory shared by two different processes, do they have to agree on
what the tags are, or can they differ?


Whoever allocates the memory (does the mmap()+mprotect() or whatever),
decides on the tag.  They set it, and this determines which virtual
address is valid to access that mapping.

It's like kmalloc() returns pointers with some weird bits set in the
upper bits of the address.  Behind the scenes kmalloc() sets the
TAG bits appropriately.

It doesn't, in that sense, matter where in the non-tagged virtual
address space the memory is mapped.  All that matters is that, for
a given page, the TAG bits in the virtual address used for loads
and stores to that mapping are set properly.

I think the fundamental thing being missed is that the TAG bits in the
virtual address are not interpreted by the TLB.  They are chopped off
before virtual address translation occurs.

The TAG bits of the virtual address serve only to indicate what ADI
value the load or store thinks is valid to use for access to that
piece of memory.

Or something like that... :-)


Hi David,

Your explanation is spot on. MMU looks at the tag bits only to determine 
if the process has permission to access the memory address. Tag bits are 
not part of VA->PA translation. The tags are stored in physical memory 
though and MMU compares the tag stored at physical address obtained from 
TLB translation to the tag embedded in VA. What that means is if two 
processes map the same physical page in their address space, they both 
must embed the same tag in the VA they present to MMU irrespective of 
where in each process' address space the page is mapped in. If one 
process changes the tag, stored in physical memory, the other process 
must also embed the new tag in its VA when accessing this shared mapped 
page. This is something to consider because a tag can be set and changed 
entirely from userspace with no kernel involvement as long as the 
process has write access to memory.


--
Khalid

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] sparc64: Add support for ADI (Application Data Integrity)

2017-01-17 Thread Khalid Aziz

On 01/17/2017 12:42 PM, David Miller wrote:

From: Khalid Aziz 
Date: Tue, 17 Jan 2017 12:32:46 -0700


On 01/16/2017 09:39 PM, David Miller wrote:

From: Khalid Aziz 
Date: Wed, 11 Jan 2017 09:12:54 -0700


+   __asm__ __volatile__(
+   ".word 0xa1438000\n\t"/* rd  %mcdper, %l0 */


Just use "rd %%asr14, %0" this way you don't have to play all of these
fixed register games which kill the code generated by gcc.  If you
forcefully clobber a windowed register like %l0 it means the function
being emitted can never be a leaf function, tail calls are no longer
allowed, etc.


Hi David,

"rd %%asr14, %0" should work but does not due to bugs in assembler -
, and
. These bugs
were fixed in binutils 2.27 but older assemblers will cause kernel
build to fail. Using byte coded equivalent is the safest option.


Fair enough.

Then please at least use %g1 or another usable global register to
avoid at least some of the problems I mentioned.



Sure, I will do that. Thanks for the review and feedback.

--
Khalid
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] sparc64: Add support for ADI (Application Data Integrity)

2017-01-17 Thread David Miller
From: Khalid Aziz 
Date: Tue, 17 Jan 2017 12:32:46 -0700

> On 01/16/2017 09:39 PM, David Miller wrote:
>> From: Khalid Aziz 
>> Date: Wed, 11 Jan 2017 09:12:54 -0700
>>
>>> +   __asm__ __volatile__(
>>> +   ".word 0xa1438000\n\t"  /* rd  %mcdper, %l0 */
>>
>> Just use "rd %%asr14, %0" this way you don't have to play all of these
>> fixed register games which kill the code generated by gcc.  If you
>> forcefully clobber a windowed register like %l0 it means the function
>> being emitted can never be a leaf function, tail calls are no longer
>> allowed, etc.
> 
> Hi David,
> 
> "rd %%asr14, %0" should work but does not due to bugs in assembler -
> , and
> . These bugs
> were fixed in binutils 2.27 but older assemblers will cause kernel
> build to fail. Using byte coded equivalent is the safest option.

Fair enough.

Then please at least use %g1 or another usable global register to
avoid at least some of the problems I mentioned.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] sparc64: Add support for ADI (Application Data Integrity)

2017-01-17 Thread Khalid Aziz

On 01/16/2017 09:39 PM, David Miller wrote:

From: Khalid Aziz 
Date: Wed, 11 Jan 2017 09:12:54 -0700


+   __asm__ __volatile__(
+   ".word 0xa1438000\n\t"/* rd  %mcdper, %l0 */


Just use "rd %%asr14, %0" this way you don't have to play all of these
fixed register games which kill the code generated by gcc.  If you
forcefully clobber a windowed register like %l0 it means the function
being emitted can never be a leaf function, tail calls are no longer
allowed, etc.


Hi David,

"rd %%asr14, %0" should work but does not due to bugs in assembler - 
, and 
. These bugs 
were fixed in binutils 2.27 but older assemblers will cause kernel build 
to fail. Using byte coded equivalent is the safest option.





+   ".word 0x9d800011\n\t"/* wr  %g0, %l1, %mcdper 
*/


Likewise use "wr %%g0, %0, %%asr14"


+   ".word 0xaf91\n\t"/* wrpr  %g0, %g1, %pmcdper */


Hmmm, which %asr encodes %pmcdper?


%pmcdper is not an asr, rather a privileged register (pr23).

Thanks,
Khalid




diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 8a6982d..68b03bf 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 

 /* Unlike the OBP device tree, the machine description is a full-on
  * DAG.  An arbitrary number of ARCs are possible from one
@@ -1104,5 +1105,8 @@ void __init sun4v_mdesc_init(void)

cur_mdesc = hp;

+#ifdef CONFIG_SPARC64


mdesc.c is only built on sparc64, this ifdef is superfluous.


+/* Update the state of MCDPER register in current task's mm context before
+ * dup so the dup'd task will inherit flags in this register correctly.
+ * Current task may have updated flags since it started running.
+ */
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+   if (adi_capable() && src->mm) {
+   register unsigned long tmp_mcdper;
+
+   __asm__ __volatile__(
+   ".word 0x83438000\n\t"/* rd %mcdper, %g1 */
+   "mov %%g1, %0\n\t"
+   : "=r" (tmp_mcdper)
+   :
+   : "g1");
+   src->mm->context.mcdper = tmp_mcdper;


I don't like the idea of duplicating 'mm' state using the task struct
copy.  Why do not the MM handling interfaces handle this properly?

Maybe it means you've abstracted the ADI register handling in the
wrong place.  Maybe it's a thread property which is "pushed" from
the MM context.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] arm: sunxi: add support for V3s SoC

2017-01-17 Thread Maxime Ripard
On Tue, Jan 17, 2017 at 02:01:13AM +0800, Icenowy Zheng wrote:
> Allwinner V3s is a low-end single-core Cortex-A7 SoC, with 64MB
> integrated DRAM, and several peripherals.
> 
> Signed-off-by: Icenowy Zheng 
> ---
> Changes in v2:
> - Used linux-sunxi.org wiki hosted address of V3s datasheet.
> 
> Note: the V3s datasheet contains its user manual.

That would be great to use User Manual in the filename rather than
datasheet then. The datasheet is something different.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


Re: [PATCH v2 2/4] clk: sunxi-ng: add support for V3s CCU

2017-01-17 Thread Maxime Ripard
Hi,

On Tue, Jan 17, 2017 at 02:01:14AM +0800, Icenowy Zheng wrote:
> V3s has a similar but cut-down CCU to H3.
> 
> Add support for it.
> 
> Signed-off-by: Icenowy Zheng 
> ---

Output from checkpatch:
total: 5 errors, 2 warnings, 3 checks, 857 lines checked

> I think I should make a comparsion between V3s and H3 CCU here:
> (I won't mention the missing/added clocks here, only list conflicting clocks)

That should be in your commit log.

> - "bus-ehci0" is at different bit (The bit that is "bus-ehci0" on V3s is
>   "bus-ehci2" on H3)
> - The mux of "ce" is different. (According the view at V3s datasheet by the
>   author of sun4i-ss, V3s may have sun4i-ss, not sun8i-ce)
> - The mux of "de" is different. (V3s do not have "pll-de", but it can mux "de"
>   to "pll-video")
> - Clocks about CSI largely differs. (As V3s is designed as a camera SoC, and
>   it have an extra "pll-isp")

OK.

Thanks,
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature