Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
On 24/06/2019 at 05:12, Nicholas Piggin wrote:
> Christophe Leroy's on June 20, 2019 2:25 am:
>> On 19/06/2019 at 05:59, Nicholas Piggin wrote:
>>> Christophe Leroy's on June 11, 2019 4:46 pm:
>>>> On 10/06/2019 at 05:08, Nicholas Piggin wrote:
>>>
>>> I would like to remove the early ioremap or make it into its own
>>> function. Re-implement map_kernel_page with ioremap_page_range,
>>> allow page tables that don't use slab to avoid the early check,
>>> unbolt the hptes mapped in early boot, etc.
>>
>> Getting early ioremap out of the picture is a very good idea, it will
>> help making things more common between all platform types. Today we
>> face the fact that PPC32 allocates early io from the top of memory
>> while PPC64 allocates it from the bottom of memory.
>>
>> Any idea on how to proceed ?
>
> I have to have a bit better look at other arches and our platform
> code. Without having looked closely at all the details, I would hope
> we could use GENERIC_EARLY_IOREMAP without too much pain.

Good idea. I have looked at it and implemented it for PPC32. On its
own it works well; now the challenge is to move all early call sites
of ioremap() to early_ioremap().

One point, however, is that early_ioremap() expects all mappings to be
released by the time we do paging_init(), whereas several of the early
PPC ioremap() users never release the mapped area. I think we have to
dig into this in more detail.

Christophe
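As a point of reference, the usage contract GENERIC_EARLY_IOREMAP expects
looks roughly like the sketch below: a temporary fixmap-backed mapping that
is always torn down again. This is illustrative only; probe_early_uart()
and the physical address are invented, not code from this thread.

	/*
	 * Minimal sketch of the early_ioremap() lifecycle assumed by
	 * GENERIC_EARLY_IOREMAP. Function name and address are made up.
	 */
	static void __init probe_early_uart(void)
	{
		void __iomem *regs;

		/* Temporary mapping through the early fixmap slots. */
		regs = early_ioremap(0xfe000000, PAGE_SIZE);
		if (!regs)
			return;

		/* ... read ID registers, set up an early console, etc ... */

		/*
		 * Every early mapping must be released again; callers that
		 * keep the area mapped forever (as several early PPC
		 * ioremap() users do) don't fit this model.
		 */
		early_iounmap(regs, PAGE_SIZE);
	}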
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
Christophe Leroy's on June 20, 2019 2:25 am:
> On 19/06/2019 at 05:59, Nicholas Piggin wrote:
>> Christophe Leroy's on June 11, 2019 4:46 pm:
>>> On 10/06/2019 at 05:08, Nicholas Piggin wrote:
>>
>> I would like to remove the early ioremap or make it into its own
>> function. Re-implement map_kernel_page with ioremap_page_range,
>> allow page tables that don't use slab to avoid the early check,
>> unbolt the hptes mapped in early boot, etc.
>
> Getting early ioremap out of the picture is a very good idea, it will
> help making things more common between all platform types. Today we
> face the fact that PPC32 allocates early io from the top of memory
> while PPC64 allocates it from the bottom of memory.
>
> Any idea on how to proceed ?

I have to have a bit better look at other arches and our platform
code. Without having looked closely at all the details, I would hope
we could use GENERIC_EARLY_IOREMAP without too much pain.

Thanks,
Nick
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
Christophe Leroy's on June 19, 2019 10:49 pm:
>
> On 19/06/2019 at 05:59, Nicholas Piggin wrote:
>> Christophe Leroy's on June 11, 2019 4:46 pm:
>>> On 10/06/2019 at 05:08, Nicholas Piggin wrote:
>
> [snip]
>
>>>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>>> index c9bcf428dd2b..db993bc1aef3 100644
>>>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>>>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>>> @@ -11,6 +11,7 @@
>>>>  #define pr_fmt(fmt) "radix-mmu: " fmt
>>>>
>>>> +#include
>>>>  #include
>>>>  #include
>>>>  #include
>>>> @@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>>>>  	set_pte_at(mm, addr, ptep, pte);
>>>>  }
>>>> +
>>>> +int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
>>>> +			 pgprot_t prot, int nid)
>>>> +{
>>>> +	if (likely(slab_is_available())) {
>>>> +		int err = ioremap_page_range(ea, ea + size, pa, prot);
>>>> +		if (err)
>>>> +			unmap_kernel_range(ea, size);
>>>> +		return err;
>>>> +	} else {
>>>> +		unsigned long i;
>>>> +
>>>> +		for (i = 0; i < size; i += PAGE_SIZE) {
>>>> +			int err = map_kernel_page(ea + i, pa + i, prot);
>>>> +			if (WARN_ON_ONCE(err)) /* Should clean up */
>>>> +				return err;
>>>> +		}
>>>
>>> Same loop again.
>>>
>>> What about not doing a radix specific function and just putting
>>> something like below in the core ioremap_range() function ?
>>>
>>> 	if (likely(slab_is_available()) && radix_enabled()) {
>>> 		int err = ioremap_page_range(ea, ea + size, pa, prot);
>>>
>>> 		if (err)
>>> 			unmap_kernel_range(ea, size);
>>> 		return err;
>>> 	}
>>>
>>> Because I'm pretty sure we will use ioremap_page_range() more and more.
>>
>> Well I agree the duplication is not so nice, but it's convenient
>> to see what is going on for each MMU type.
>>
>> There is a significant amount of churn that needs to be done in
>> this layer so I prefer to make it a bit simpler despite duplication.
>>
>> I would like to remove the early ioremap or make it into its own
>> function. Re-implement map_kernel_page with ioremap_page_range,
>> allow page tables that don't use slab to avoid the early check,
>> unbolt the hptes mapped in early boot, etc.
>>
>> I just wanted to escape out the 64s and hash/radix implementations
>> completely until that settles.
>
> I can understand the benefit in some situations but here I just can't.
> And code duplication should be avoided as much as possible as it makes
> code maintenance more difficult.
>
> Here you have:
>
> +static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +{
> +	unsigned long i;
> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		int err = map_kernel_page(ea + i, pa + i, prot);
> +		if (err) {
> +			if (slab_is_available())
> +				unmap_kernel_range(ea, size);
> +			else
> +				WARN_ON_ONCE(1); /* Should clean up */
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
>
> You now create a new one in another file, that is almost identical:
>
> +int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +{
> +	unsigned long i;
> +
> +	if (radix_enabled())
> +		return radix__ioremap_range(ea, pa, size, prot, nid);
> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		int err = map_kernel_page(ea + i, pa + i, prot);
> +		if (err) {
> +			if (slab_is_available())
> +				unmap_kernel_range(ea, size);
> +			else
> +				WARN_ON_ONCE(1); /* Should clean up */
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
>
> Then you have to make the original one __weak.
>
> Sorry I'm still having difficulties understanding what the benefit is.
> radix_enabled() is defined for every platform so we could just add the
> following on top of the existing ioremap_range() and voila:
>
> +	if (radix_enabled())
> +		return radix__ioremap_range(ea, pa, size, prot, nid);
>
> And with that you wouldn't have the __weak stuff to handle.

I guess so. I don't really like radix_enabled escaping from 64s too
much though. I'll try to improve the code in follow ups if possible.

>>>> -static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>>>> +int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>>>
>>> Hum. Weak functions remain unused in vmlinux unless
>>> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected.
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
On 19/06/2019 at 05:59, Nicholas Piggin wrote:
> Christophe Leroy's on June 11, 2019 4:46 pm:
>> On 10/06/2019 at 05:08, Nicholas Piggin wrote:
>
> I would like to remove the early ioremap or make it into its own
> function. Re-implement map_kernel_page with ioremap_page_range,
> allow page tables that don't use slab to avoid the early check,
> unbolt the hptes mapped in early boot, etc.

Getting early ioremap out of the picture is a very good idea, it will
help making things more common between all platform types. Today we
face the fact that PPC32 allocates early io from the top of memory
while PPC64 allocates it from the bottom of memory.

Any idea on how to proceed ?

Christophe
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
On 19/06/2019 at 05:59, Nicholas Piggin wrote:
> Christophe Leroy's on June 11, 2019 4:46 pm:
>> On 10/06/2019 at 05:08, Nicholas Piggin wrote:

[snip]

>>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> index c9bcf428dd2b..db993bc1aef3 100644
>>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> @@ -11,6 +11,7 @@
>>>  #define pr_fmt(fmt) "radix-mmu: " fmt
>>>
>>> +#include
>>>  #include
>>>  #include
>>>  #include
>>> @@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>>>  	set_pte_at(mm, addr, ptep, pte);
>>>  }
>>> +
>>> +int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
>>> +			 pgprot_t prot, int nid)
>>> +{
>>> +	if (likely(slab_is_available())) {
>>> +		int err = ioremap_page_range(ea, ea + size, pa, prot);
>>> +		if (err)
>>> +			unmap_kernel_range(ea, size);
>>> +		return err;
>>> +	} else {
>>> +		unsigned long i;
>>> +
>>> +		for (i = 0; i < size; i += PAGE_SIZE) {
>>> +			int err = map_kernel_page(ea + i, pa + i, prot);
>>> +			if (WARN_ON_ONCE(err)) /* Should clean up */
>>> +				return err;
>>> +		}
>>
>> Same loop again.
>>
>> What about not doing a radix specific function and just putting
>> something like below in the core ioremap_range() function ?
>>
>> 	if (likely(slab_is_available()) && radix_enabled()) {
>> 		int err = ioremap_page_range(ea, ea + size, pa, prot);
>>
>> 		if (err)
>> 			unmap_kernel_range(ea, size);
>> 		return err;
>> 	}
>>
>> Because I'm pretty sure we will use ioremap_page_range() more and more.
>
> Well I agree the duplication is not so nice, but it's convenient
> to see what is going on for each MMU type.
>
> There is a significant amount of churn that needs to be done in
> this layer so I prefer to make it a bit simpler despite duplication.
>
> I would like to remove the early ioremap or make it into its own
> function. Re-implement map_kernel_page with ioremap_page_range,
> allow page tables that don't use slab to avoid the early check,
> unbolt the hptes mapped in early boot, etc.
>
> I just wanted to escape out the 64s and hash/radix implementations
> completely until that settles.

I can understand the benefit in some situations but here I just can't.
And code duplication should be avoided as much as possible as it makes
code maintenance more difficult.

Here you have:

+static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
+{
+	unsigned long i;
+
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		int err = map_kernel_page(ea + i, pa + i, prot);
+		if (err) {
+			if (slab_is_available())
+				unmap_kernel_range(ea, size);
+			else
+				WARN_ON_ONCE(1); /* Should clean up */
+			return err;
+		}
+	}
+
+	return 0;
+}

You now create a new one in another file, that is almost identical:

+int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
+{
+	unsigned long i;
+
+	if (radix_enabled())
+		return radix__ioremap_range(ea, pa, size, prot, nid);
+
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		int err = map_kernel_page(ea + i, pa + i, prot);
+		if (err) {
+			if (slab_is_available())
+				unmap_kernel_range(ea, size);
+			else
+				WARN_ON_ONCE(1); /* Should clean up */
+			return err;
+		}
+	}
+
+	return 0;
+}

Then you have to make the original one __weak.

Sorry I'm still having difficulties understanding what the benefit is.

radix_enabled() is defined for every platform so we could just add the
following on top of the existing ioremap_range() and voila:

+	if (radix_enabled())
+		return radix__ioremap_range(ea, pa, size, prot, nid);

And with that you wouldn't have the __weak stuff to handle.
>>> -static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>>> +int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>>
>> Hum. Weak functions remain unused in vmlinux unless
>> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected.
>>
>> Also, they are somehow dangerous because people might change them
>> without seeing that they are overridden for some particular
>> configuration.
>
> Well you shouldn't assume that when you see a weak function, but
> what's the preferred alternative? A config option?

Yes you are right, nobody should assume that, but ...

But I think if the functions were really different, the preferred
alternative w
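For readers unfamiliar with the pattern being debated, the link-time
override works as sketched below. The do_setup() example and the file
split are invented for illustration; the size point above is that the
unused weak copy still ends up in vmlinux unless
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION lets the linker discard it.

	/* common.c: default implementation, marked weak. */
	int __weak do_setup(void)
	{
		return 0;	/* generic fallback */
	}

	/* arch.c: a strong definition silently wins at link time.
	 * Nothing in common.c hints that this override exists, which is
	 * the maintenance hazard raised above. */
	int do_setup(void)
	{
		/* arch-specific work */
		return 1;
	}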
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
Christophe Leroy's on June 11, 2019 4:46 pm:
>
> On 10/06/2019 at 05:08, Nicholas Piggin wrote:
>> Radix can use ioremap_page_range for ioremap, after slab is available.
>> This makes it possible to enable huge ioremap mapping support.
>>
>> Signed-off-by: Nicholas Piggin
>> ---
>>  arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
>>  arch/powerpc/mm/book3s64/pgtable.c         | 21 +++++++++++++++++++++
>>  arch/powerpc/mm/book3s64/radix_pgtable.c   | 21 +++++++++++++++++++++
>>  arch/powerpc/mm/pgtable_64.c               |  2 +-
>>  4 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
>> index 574eca33f893..e04a839cb5b9 100644
>> --- a/arch/powerpc/include/asm/book3s/64/radix.h
>> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
>> @@ -266,6 +266,9 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
>>  extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>>  				  pgprot_t flags, unsigned int psz);
>>
>> +extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,
>> +				unsigned long size, pgprot_t prot, int nid);
>> +
>
> 'extern' is pointless here, and checkpatch will cry.
>
>>  static inline unsigned long radix__get_tree_size(void)
>>  {
>>  	unsigned long rts_field;
>> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
>> index ff98b663c83e..953850a602f7 100644
>> --- a/arch/powerpc/mm/book3s64/pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/pgtable.c
>> @@ -450,3 +450,24 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>>
>>  	return true;
>>  }
>> +
>> +int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>> +{
>> +	unsigned long i;
>> +
>> +	if (radix_enabled())
>> +		return radix__ioremap_range(ea, pa, size, prot, nid);
>
> This function looks pretty similar to the one in the previous patch.
> Since radix_enabled() is available and returns false for all other
> subarches, I think the above could go in the generic ioremap_range();
> you'll only need to move the function declaration to a common file,
> for instance asm/io.h.
>
>> +
>> +	for (i = 0; i < size; i += PAGE_SIZE) {
>> +		int err = map_kernel_page(ea + i, pa + i, prot);
>> +		if (err) {
>> +			if (slab_is_available())
>> +				unmap_kernel_range(ea, size);
>> +			else
>> +				WARN_ON_ONCE(1); /* Should clean up */
>> +			return err;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index c9bcf428dd2b..db993bc1aef3 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -11,6 +11,7 @@
>>  #define pr_fmt(fmt) "radix-mmu: " fmt
>>
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>>  	set_pte_at(mm, addr, ptep, pte);
>>  }
>> +
>> +int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
>> +			 pgprot_t prot, int nid)
>> +{
>> +	if (likely(slab_is_available())) {
>> +		int err = ioremap_page_range(ea, ea + size, pa, prot);
>> +		if (err)
>> +			unmap_kernel_range(ea, size);
>> +		return err;
>> +	} else {
>> +		unsigned long i;
>> +
>> +		for (i = 0; i < size; i += PAGE_SIZE) {
>> +			int err = map_kernel_page(ea + i, pa + i, prot);
>> +			if (WARN_ON_ONCE(err)) /* Should clean up */
>> +				return err;
>> +		}
>
> Same loop again.
>
> What about not doing a radix specific function and just putting
> something like below in the core ioremap_range() function ?
>
> 	if (likely(slab_is_available()) && radix_enabled()) {
> 		int err = ioremap_page_range(ea, ea + size, pa, prot);
>
> 		if (err)
> 			unmap_kernel_range(ea, size);
> 		return err;
> 	}
>
> Because I'm pretty sure we will use ioremap_page_range() more and more.

Well I agree the duplication is not so nice, but it's convenient
to see what is going on for each MMU type.

There is a significant amount of churn that needs to be done in
this layer so I prefer to make it a bit simpler despite duplication.

I would like to remove the early ioremap or make it into its own
function. Re-implement map_kernel_page with ioremap_page_range,
allow page tables that don't use slab to avoid the early check,
unbolt the hptes mapped in early boot, etc.

I just wanted to escape out the 64s and hash/radix implementations
completely until that settles.
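As a rough illustration only of the direction Nick describes, splitting
the boot-time case out would let the late path collapse onto
ioremap_page_range(). The ioremap_range_early() name is invented here
and is not code from this series.

	/* Hypothetical split of early vs. late ioremap, illustration only. */
	static int ioremap_range_early(unsigned long ea, phys_addr_t pa,
				       unsigned long size, pgprot_t prot)
	{
		unsigned long i;

		/* Boot-time path: page tables don't come from slab yet. */
		for (i = 0; i < size; i += PAGE_SIZE) {
			int err = map_kernel_page(ea + i, pa + i, prot);
			if (WARN_ON_ONCE(err)) /* should clean up */
				return err;
		}
		return 0;
	}

	static int ioremap_range(unsigned long ea, phys_addr_t pa,
				 unsigned long size, pgprot_t prot, int nid)
	{
		int err;

		if (unlikely(!slab_is_available()))
			return ioremap_range_early(ea, pa, size, prot);

		err = ioremap_page_range(ea, ea + size, pa, prot);
		if (err)
			unmap_kernel_range(ea, size);
		return err;
	}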
Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
On 10/06/2019 at 05:08, Nicholas Piggin wrote:
> Radix can use ioremap_page_range for ioremap, after slab is available.
> This makes it possible to enable huge ioremap mapping support.
>
> Signed-off-by: Nicholas Piggin
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
>  arch/powerpc/mm/book3s64/pgtable.c         | 21 +++++++++++++++++++++
>  arch/powerpc/mm/book3s64/radix_pgtable.c   | 21 +++++++++++++++++++++
>  arch/powerpc/mm/pgtable_64.c               |  2 +-
>  4 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 574eca33f893..e04a839cb5b9 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -266,6 +266,9 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
>  extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>  				  pgprot_t flags, unsigned int psz);
>
> +extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,
> +				unsigned long size, pgprot_t prot, int nid);
> +

'extern' is pointless here, and checkpatch will cry.

>  static inline unsigned long radix__get_tree_size(void)
>  {
>  	unsigned long rts_field;
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index ff98b663c83e..953850a602f7 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -450,3 +450,24 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>
>  	return true;
>  }
> +
> +int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +{
> +	unsigned long i;
> +
> +	if (radix_enabled())
> +		return radix__ioremap_range(ea, pa, size, prot, nid);

This function looks pretty similar to the one in the previous patch.
Since radix_enabled() is available and returns false for all other
subarches, I think the above could go in the generic ioremap_range();
you'll only need to move the function declaration to a common file,
for instance asm/io.h.

> +
> +	for (i = 0; i < size; i += PAGE_SIZE) {
> +		int err = map_kernel_page(ea + i, pa + i, prot);
> +		if (err) {
> +			if (slab_is_available())
> +				unmap_kernel_range(ea, size);
> +			else
> +				WARN_ON_ONCE(1); /* Should clean up */
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index c9bcf428dd2b..db993bc1aef3 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -11,6 +11,7 @@
>  #define pr_fmt(fmt) "radix-mmu: " fmt
>
> +#include
>  #include
>  #include
>  #include
> @@ -1122,3 +1123,23 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>  	set_pte_at(mm, addr, ptep, pte);
>  }
> +
> +int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
> +			 pgprot_t prot, int nid)
> +{
> +	if (likely(slab_is_available())) {
> +		int err = ioremap_page_range(ea, ea + size, pa, prot);
> +		if (err)
> +			unmap_kernel_range(ea, size);
> +		return err;
> +	} else {
> +		unsigned long i;
> +
> +		for (i = 0; i < size; i += PAGE_SIZE) {
> +			int err = map_kernel_page(ea + i, pa + i, prot);
> +			if (WARN_ON_ONCE(err)) /* Should clean up */
> +				return err;
> +		}

Same loop again.

What about not doing a radix specific function and just putting
something like below in the core ioremap_range() function ?

	if (likely(slab_is_available()) && radix_enabled()) {
		int err = ioremap_page_range(ea, ea + size, pa, prot);

		if (err)
			unmap_kernel_range(ea, size);
		return err;
	}

Because I'm pretty sure we will use ioremap_page_range() more and more.
> +		return 0;
> +	}
> +}
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 6bd3660388aa..63cd81130643 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -108,7 +108,7 @@ unsigned long ioremap_bot;
>  unsigned long ioremap_bot = IOREMAP_BASE;
>  #endif
>
> -static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
> +int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)
>  {

Hum. Weak functions remain unused in vmlinux unless
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected.

Also, they are somehow dangerous because people might change them
without seeing that they are overridden for some particular
configuration.

Christophe
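For context on what the patch quoted above changes: an ordinary driver
mapping like the sketch below goes through this ioremap path, and once
it is backed by ioremap_page_range(), suitably aligned regions can be
mapped with huge pages on radix. The function name, BAR address and
size here are invented for illustration.

	/* Hypothetical driver mapping served by the ioremap path above. */
	static void __iomem *map_accel_bar(void)
	{
		/*
		 * A 16MB, 2MB-aligned region can be mapped with 2MB PMDs
		 * instead of 4096 individual PTEs once huge ioremap
		 * mappings are enabled.
		 */
		return ioremap(0x600000000ULL, SZ_16M);
	}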