Re: [PATCH v3 1/2] schemas: Add a schema for memory map

2023-08-23 Thread Mark Rutland
On Tue, Aug 22, 2023 at 02:34:42PM -0600, Simon Glass wrote:
> The Devicetree specification skips over handling of a logical view of
> the memory map, pointing users to the UEFI specification.
> 
> It is common to split firmware into 'Platform Init', which does the
> initial hardware setup and a "Payload" which selects the OS to be booted.
> Thus a handover interface is required between these two pieces.
> 
> Where UEFI boot-time services are not available, but UEFI firmware is
> present on either side of this interface, information about memory usage
> and attributes must be presented to the "Payload" in some form.

Today Linux does that by passing:

  /chosen/linux,uefi-mmap-start
  /chosen/linux,uefi-mmap-size
  /chosen/linux,uefi-mmap-desc-size
  /chosen/linux,uefi-mmap-desc-ver

... or /chosen/xen,* variants of those.

Can't we document / genericise that?

Pointing to that rather than re-encoding it in DT means that it stays in-sync
with the EFI spec and we won't back ourselves into a corner where we cannot
encode something due to a structural difference. I don't think it's a good idea
to try to re-encode it, or we're just setting ourselves up for further pain.
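
As a rough illustration of why pointing at the EFI memory map is cheap for
a consumer, here is a minimal sketch of walking the blob those properties
describe. The struct mirrors EFI_MEMORY_DESCRIPTOR from the UEFI spec; the
helper name and the "sum pages of one type" task are made up for the
example, and it assumes the blob is in the consumer's native (little-endian)
byte order. The stride comes from the desc-size value rather than sizeof(),
since the producer's descriptors may be larger than the struct known here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of EFI_MEMORY_DESCRIPTOR as given in the UEFI specification. */
struct efi_mem_desc {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_start;
	uint64_t virt_start;
	uint64_t num_pages;	/* in 4KiB pages */
	uint64_t attribute;
};

/*
 * Hypothetical helper: sum the pages of all descriptors of one type.
 * map, map_size and desc_size correspond to the uefi-mmap-start,
 * uefi-mmap-size and uefi-mmap-desc-size values.
 * Returns 0 on success, -1 if the sizes are inconsistent.
 */
static int efi_mmap_count_pages(const void *map, size_t map_size,
				size_t desc_size, uint32_t type,
				uint64_t *pages)
{
	const uint8_t *p = map;
	size_t off;

	if (desc_size < sizeof(struct efi_mem_desc) || map_size % desc_size)
		return -1;

	*pages = 0;
	for (off = 0; off < map_size; off += desc_size) {
		struct efi_mem_desc d;

		/* memcpy avoids any alignment assumption about the blob */
		memcpy(&d, p + off, sizeof(d));
		if (d.type == type)
			*pages += d.num_pages;
	}

	return 0;
}
```

Nothing in the loop depends on how many memory types or attribute bits the
EFI spec defines, which is the point about staying in sync.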

Thanks,
Mark.

> 
> This aims to provide an initial schema for this mapping.
> 
> Note that this is separate from the existing /memory and /reserved-memory
> nodes, since it is mostly concerned with what the memory is used for. It
> may cover only a small fraction of available memory.
> 
> For now, no attempt is made to create an exhaustive binding, so there are
> some example types listed. This can be completed once this has passed
> initial review.
> 
> This binding does not include a binding for the memory 'attribute'
> property, defined by EFI_BOOT_SERVICES.GetMemoryMap(). It may be useful
> to have that as well, but perhaps not as a bit mask.
> 
> Signed-off-by: Simon Glass 
> ---
> 
> Changes in v3:
> - Reword commit message again
> - cc a lot more people, from the FFI patch
> - Split out the attributes into the /memory nodes
> 
> Changes in v2:
> - Reword commit message
> 
>  dtschema/schemas/memory-map.yaml | 61 
>  1 file changed, 61 insertions(+)
>  create mode 100644 dtschema/schemas/memory-map.yaml
> 
> diff --git a/dtschema/schemas/memory-map.yaml 
> b/dtschema/schemas/memory-map.yaml
> new file mode 100644
> index 000..4b06583
> --- /dev/null
> +++ b/dtschema/schemas/memory-map.yaml
> @@ -0,0 +1,61 @@
> +# SPDX-License-Identifier: BSD-2-Clause
> +# Copyright 2023 Google LLC
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/memory-map.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: /memory-map nodes
> +description: |
> +  Common properties always required in /memory-map nodes. These nodes are
> +  intended to resolve the nonchalant clause 3.4.1 ("/memory node and UEFI")
> +  in the Devicetree Specification.
> +
> +maintainers:
> +  - Simon Glass 
> +
> +properties:
> +  $nodename:
> +const: 'memory-map'
> +
> +patternProperties:
> +  "^([a-z][a-z0-9\\-]+@[0-9a-f]+)?$":
> +type: object
> +additionalProperties: false
> +
> +properties:
> +  reg:
> +minItems: 1
> +maxItems: 1024
> +
> +  usage:
> +$ref: /schemas/types.yaml#/definitions/string
> +description: |
> +  Describes the usage of the memory region, e.g.:
> +
> +"acpi-reclaim", "acpi-nvs", "bootcode", "bootdata", "bootdata",
> +"runtime-code", "runtime-data".
> +
> +See enum EFI_MEMORY_TYPE in "Unified Extensible Firmware 
> Interface
> +(UEFI) Specification" for all the types. For now there are not
> +listed here.
> +
> +required:
> +  - reg
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +memory-map {
> +acpi@f {
> +reg = <0xf 0x4000>;
> +usage = "acpi-reclaim";
> +};
> +
> +runtime@1230 {
> +reg = <0x1230 0x28000>;
> +usage = "runtime-code";
> +};
> +};
> +...
> -- 
> 2.42.0.rc1.204.g551eb34607-goog
> 


Re: U-Boot: Arm64: bootm gets stuck if RANDOMIZE_BASE is disabled

2021-07-13 Thread Mark Rutland
On Tue, Jul 13, 2021 at 02:15:08PM +0500, Ahsan Hussain wrote:
> Hello,
> 
> I'm dumbfounded by a seemingly unrelated early kernel hang/failing to boot
> when CONFIG_RANDOMIZE_BASE=n is set in kernel and we use FIT uImage. I've
> verified this behavior on a couple of i.MX8 SoCs (i.MX8M plus and i.MX8QXP)
> and the results remain consistent.
> 
> I'm able to boot kernel when I use booti command. However when I use bootm
> to boot a U-Boot fitImage (with kernel and fdt load addresses/entrypoint in
> .its file same as I used for booti command; also tried disabling relocation
> for fdt by setting fdt_high=~0UL), the boot gets stuck at "Starting kernel
> ...". On disabling RANDOMIZE_BASE kconfig in Linux the same fitImage is able
> to boot.

Can you say which address you're trying to load the kernel to?

> I've tried enabling earlycon and U-Boot debug messages in common/bootm.c and
> arch/arm/lib/bootm.c but found no helpful difference in both boot flows.
> Please let me know if I'm missing something obvious or where do I start
> looking to debug this issue.

IIUC, the booti command respects the text_offset from the kernel header,
whereas bootm will not. If you have a hard-coded offset, it's possible
you're violating the offset the kernel expects, and where the kernel is
not relocatable, it can't fix itself up.

I suspect you have a hard-coded offset of 0x80000, whereas recent
kernels have a text offset of 0x0. Your bootloader *should* read
this dynamically rather than hard-coding it.

For details, see:

https://www.kernel.org/doc/html/v5.4/arm64/booting.html#call-the-kernel-image
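
A minimal sketch of reading the header rather than hard-coding anything.
The field offsets follow the 64-byte arm64 Image header described in the
booting document above (text_offset at byte 8, image_size at 16, flags at
24, magic at 56); the struct and function names are invented for this
example and are not U-Boot's.

```c
#include <stdint.h>

#define ARM64_IMAGE_MAGIC	0x644d5241u	/* "ARM\x64", little-endian */

struct arm64_image_info {
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t flags;
};

/* Read a little-endian value regardless of host endianness. */
static uint64_t get_le(const uint8_t *p, int bytes)
{
	uint64_t v = 0;
	int i;

	for (i = bytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];

	return v;
}

/*
 * Hypothetical helper: pull the fields a loader needs out of the
 * arm64 Image header. Returns 0 on success, -1 on a bad magic.
 */
static int arm64_image_parse(const uint8_t *hdr, struct arm64_image_info *info)
{
	if (get_le(hdr + 56, 4) != ARM64_IMAGE_MAGIC)
		return -1;

	info->text_offset = get_le(hdr + 8, 8);
	info->image_size = get_le(hdr + 16, 8);
	info->flags = get_le(hdr + 24, 8);

	return 0;
}
```

The load address would then be a 2MB-aligned base plus info.text_offset,
whatever value that turns out to be for the kernel in question.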

Thanks,
Mark.


Re: [U-Boot] [PATCH] arm: stm32mp1: add PSCI support

2018-03-28 Thread Mark Rutland
Hi,

On Tue, Mar 20, 2018 at 01:59:03PM +0100, Patrick Delaunay wrote:
> Add minimal PSCI support for Linux.
> 
> Signed-off-by: Patrick Delaunay 

> +int __secure psci_features(unsigned int psci_fid)
> +{
> + switch (psci_fid) {
> + case ARM_PSCI_0_2_FN_PSCI_VERSION:
> + case ARM_PSCI_0_2_FN_CPU_ON:
> + case ARM_PSCI_0_2_FN_CPU_OFF:
> + case ARM_PSCI_0_2_FN_SYSTEM_RESET:
> + return 0x0;
> + }
> + return ARM_PSCI_RET_NI;
> +}
>
> +unsigned int __secure psci_version(void)
> +{
> + return ARM_PSCI_VER_1_0;
> +}

I'm a bit worried, because while PSCI_VERSION reports PSCI 1.0, this
does not appear to be conformant with the PSCI 1.0 spec, as some
mandatory functions are missing:

* AFFINITY_INFO -- Linux relies on this to ensure that CPUs have exited
  the kernel when calling CPU_OFF (e.g. so that we don't free any data /
  code that said CPU may be using on the path to calling CPU_OFF).

* SYSTEM_OFF -- Can you implement this similarly to SYSTEM_RESET?
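
To make that concrete, a sketch of a features query that also advertises
the two functions above. The function IDs are the SMC32 values from the
PSCI spec; the macro and function names are local to this example (not
U-Boot's), and the list is not meant to be the complete mandatory set, so
please check it against the spec.

```c
#include <stdint.h>

/* SMC32 function IDs from the PSCI specification (names local to this sketch) */
#define FN_PSCI_VERSION		0x84000000u
#define FN_CPU_OFF		0x84000002u
#define FN_CPU_ON		0x84000003u
#define FN_AFFINITY_INFO	0x84000004u
#define FN_SYSTEM_OFF		0x84000008u
#define FN_SYSTEM_RESET		0x84000009u
#define FN_PSCI_FEATURES	0x8400000au

#define RET_NOT_SUPPORTED	(-1)

/* Returns 0 (no feature flags) for implemented functions. */
static int32_t sketch_psci_features(uint32_t fid)
{
	switch (fid) {
	case FN_PSCI_VERSION:
	case FN_CPU_OFF:
	case FN_CPU_ON:
	case FN_AFFINITY_INFO:
	case FN_SYSTEM_OFF:
	case FN_SYSTEM_RESET:
	case FN_PSCI_FEATURES:
		return 0x0;
	}

	return RET_NOT_SUPPORTED;
}
```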

> +int __secure psci_cpu_on(u32 __always_unused unused, u32 mpidr, u32 pc)
> +{

What about the context_id? PSCI 0.2+ mandates it, and some projects rely
upon it (though Linux currently does not).

> + u32 cpu = (mpidr & 0x3);
> +
> + /* store target PC */
> + psci_save_target_pc(cpu, pc);
> +
> + /* write entrypoint in backup RAM register */
> + writel((u32)&psci_cpu_entry, TAMP_BACKUP_BRANCH_ADDRESS);
> +
> + /* write magic number in backup register */
> + writel(BOOT_API_A7_CORE1_MAGIC_NUMBER, TAMP_BACKUP_MAGIC_NUMBER);
> + stm32mp_smp_kick_all_cpus();
> +
> + return ARM_PSCI_RET_SUCCESS;

Does some other part of U-Boot do some state tracking? What happens if
this is called for a CPU that's already online?
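
For reference, the kind of state tracking I have in mind looks roughly
like the sketch below. It is a host-side model only: the names, the
two-CPU limit and the table are assumptions for the example, there is no
locking between CPUs (a real implementation needs some), and the actual
kick of the target CPU is left out. The return codes and AFFINITY_INFO
values are the ones from the PSCI spec.

```c
#include <stdint.h>

#define NR_CPUS			2	/* assumption for this sketch */

#define RET_SUCCESS		0
#define RET_INVALID_PARAMS	(-2)
#define RET_ALREADY_ON		(-4)
#define RET_ON_PENDING		(-5)

/* Values match what AFFINITY_INFO returns */
enum cpu_state {
	CPU_STATE_ON = 0,
	CPU_STATE_OFF = 1,
	CPU_STATE_ON_PENDING = 2,
};

struct cpu_ctx {
	enum cpu_state state;
	uint32_t entry_pc;
	uint32_t context_id;	/* handed to the target CPU in r0 at entry */
};

static struct cpu_ctx cpus[NR_CPUS] = {
	{ .state = CPU_STATE_ON },	/* boot CPU */
	{ .state = CPU_STATE_OFF },
};

static int32_t sketch_cpu_on(uint32_t mpidr, uint32_t pc, uint32_t context_id)
{
	uint32_t cpu = mpidr & 0xff;

	if (cpu >= NR_CPUS)
		return RET_INVALID_PARAMS;
	if (cpus[cpu].state == CPU_STATE_ON)
		return RET_ALREADY_ON;
	if (cpus[cpu].state == CPU_STATE_ON_PENDING)
		return RET_ON_PENDING;

	cpus[cpu].entry_pc = pc;
	cpus[cpu].context_id = context_id;
	cpus[cpu].state = CPU_STATE_ON_PENDING;
	/* a real implementation would kick the target CPU here */

	return RET_SUCCESS;
}

/* Called on the target CPU once it is running the entry code. */
static void sketch_cpu_entered(uint32_t cpu)
{
	cpus[cpu].state = CPU_STATE_ON;
}

static int32_t sketch_affinity_info(uint32_t mpidr)
{
	uint32_t cpu = mpidr & 0xff;

	if (cpu >= NR_CPUS)
		return RET_INVALID_PARAMS;

	return cpus[cpu].state;
}
```

With something like this, a second CPU_ON for the same CPU is refused
rather than overwriting the saved PC and kicking the CPU again.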

Thanks,
Mark.
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] [PATCH v2] arm64: booti: allow to place kernel image anywhere in physical memory

2017-03-08 Thread Mark Rutland
On Wed, Mar 08, 2017 at 11:35:12AM +0900, Masahiro Yamada wrote:
> At first, the ARM64 Linux booting requirement recommended that the
> kernel image be placed text_offset bytes from 2MB aligned base near
> the start of usable system RAM because memory below that base address
> was unusable at that time.
> 
> This requirement was relaxed by Linux commit a7f8de168ace ("arm64:
> allow kernel Image to be loaded anywhere in physical memory").
> Since then, the bit 3 of the flags field indicates the tolerance
> of the kernel physical placement.  If this bit is set, the 2MB
> aligned base may be anywhere in physical memory.  For details, see
> Documentation/arm64/booting.txt of Linux.
> 
> The booti command should be also relaxed.  If the bit 3 is set,
> images->ep is respected, and the image is placed at the nearest
> bootable location.  Otherwise, it is relocated to the start of the
> system RAM to keep the original behavior.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> Changes in v2:
>   - Use le64_to_cpu() for correct endian-ness
>   - Check the bit 3
> 
>  cmd/booti.c | 18 ++
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/cmd/booti.c b/cmd/booti.c
> index bff87a8..8f3507d 100644
> --- a/cmd/booti.c
> +++ b/cmd/booti.c
> @@ -11,6 +11,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  DECLARE_GLOBAL_DATA_PTR;
>  
> @@ -20,7 +22,7 @@ struct Image_header {
>  	uint32_t	code1;		/* Executable code */
>  	uint64_t	text_offset;	/* Image load offset, LE */
>  	uint64_t	image_size;	/* Effective Image size, LE */
> -	uint64_t	res1;		/* reserved */
> +	uint64_t	flags;		/* Kernel flags, LE */
>  	uint64_t	res2;		/* reserved */
>  	uint64_t	res3;		/* reserved */
>  	uint64_t	res4;		/* reserved */
> @@ -51,10 +53,18 @@ static int booti_setup(bootm_headers_t *images)
>   }
>  
>   /*
> -  * If we are not at the correct run-time location, set the new
> -  * correct location and then move the image there.
> +  * If bit 3 of the flags field is set, the 2MB aligned base of the
> +  * kernel image can be anywhere in physical memory, so respect
> +  * images->ep.  Otherwise, relocate the image to the base of RAM
> +  * since memory below it is not accessible via the linear mapping.
>*/
> - dst = gd->bd->bi_dram[0].start + le64_to_cpu(ih->text_offset);
> + if (le64_to_cpu(ih->flags) & BIT(3))
> + dst = images->ep - le64_to_cpu(ih->text_offset);

I take it this is a pre-correction for the ALIGN() below?

> + else
> + dst = gd->bd->bi_dram[0].start;
> +
> + dst = ALIGN(dst, SZ_2M);
> + dst += le64_to_cpu(ih->text_offset);

There's one last wrinkle to take care of here, if we want to boot a
kernel older than commit a2c1d73b94ed49f5 (i.e. v3.16). Until then, the
text_offset was of unknown endianness.

As mentioned in the Linux documentation, you can detect this based on the
image_size field, e.g.

	uint64_t text_offset;

	/*
	 * Prior to Linux commit a2c1d73b94ed49f5, the text_offset field
	 * is of unknown endianness. In these cases, the image_size
	 * field is zero, and we can assume a fixed value of 0x80000.
	 */
	if (le64_to_cpu(ih->image_size) == 0)
		text_offset = 0x80000;
	else
		text_offset = le64_to_cpu(ih->text_offset);

... then you can reuse that text_offset value for both cases above.

Otherwise, this looks fine to me.
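
Putting the pieces together, the placement logic would look something like
the sketch below. It is a standalone model, not the booti code: the
function name is made up, the header fields are assumed to have been
converted to CPU endianness already, and the 0x80000 fallback is the value
the Linux booting document gives for images with a zero image_size.

```c
#include <stdint.h>

#define SZ_2M			0x200000ull
#define LEGACY_TEXT_OFFSET	0x80000ull	/* per Linux booting.txt */
#define FLAG_PHYS_ANYWHERE	(1ull << 3)	/* bit 3 of the flags field */

/*
 * Hypothetical helper: where should the Image be placed?
 * ep is where the image currently sits, ram_start the base of DRAM.
 * If the result equals ep, no relocation is needed.
 */
static uint64_t booti_dest(uint64_t text_offset, uint64_t image_size,
			   uint64_t flags, uint64_t ep, uint64_t ram_start)
{
	uint64_t base;

	/* Old kernels: text_offset is of unknown endianness, use the fixed value */
	if (image_size == 0)
		text_offset = LEGACY_TEXT_OFFSET;

	if (flags & FLAG_PHYS_ANYWHERE)
		base = ep - text_offset;
	else
		base = ram_start;

	/* Round up to a 2MB boundary, then apply the offset */
	base = (base + SZ_2M - 1) & ~(SZ_2M - 1);

	return base + text_offset;
}
```

For an image already sitting at a 2MB-aligned base plus text_offset with
bit 3 set, this returns ep unchanged, so the copy can be skipped.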

Thanks,
Mark.


Re: [U-Boot] [PATCH 2/2] arm64: booti: allow to place kernel image anywhere in physical memory

2017-03-07 Thread Mark Rutland
On Tue, Mar 07, 2017 at 07:16:56AM -0500, Tom Rini wrote:
> On Tue, Mar 07, 2017 at 11:43:52AM +0000, Mark Rutland wrote:
> > On Tue, Feb 28, 2017 at 12:15:09PM -0500, Tom Rini wrote:
> > > On Wed, Mar 01, 2017 at 02:03:58AM +0900, Masahiro Yamada wrote:
> > > > 2017-02-27 7:41 GMT+09:00 Tom Rini :
> > > > If we put the image at 2MiB aligned base, the relocation would
> > > > always happen.
> > > 
> > > Correct.  But I honestly don't know if non-randomized text offset is the
> > > common case people will optimize for or randomized for added security 
> > > will be
> > > the more common case.  
> > 
> > FWIW, the randomized text_offset is a bootloader debugging/testing
> > feature, and there's no security aspect to it.
> > 
It was added [1] as an additional hint to bootloader authors that
> > they must respect the text_offset field.
> 
> Right, and we do this today.  But since this doubles as a kind of cheap
> KASLR I would also expect to see it used, even if not intended, in this
> way.

I can certainly imagine people loading the kernel at a random physical
base address (i.e. a random 2M base + text_offset), and doing that's
perfectly fine for kernels happy to be loaded at arbitrary bases. That
may help to frustrate some DMA attacks.

I take it that's what you meant?

Given text_offset itself is fixed at compile time, randomizing it
provides absolutely no security benefit, and we should be careful not to
give the impression that it does.

Thanks,
Mark.


Re: [U-Boot] [PATCH 2/2] arm64: booti: allow to place kernel image anywhere in physical memory

2017-03-07 Thread Mark Rutland
On Tue, Feb 28, 2017 at 12:15:09PM -0500, Tom Rini wrote:
> On Wed, Mar 01, 2017 at 02:03:58AM +0900, Masahiro Yamada wrote:
> > 2017-02-27 7:41 GMT+09:00 Tom Rini :
> > > On Thu, Feb 23, 2017 at 10:31:17AM -0500, Tom Rini wrote:

> > > c) I'm not convinced your math above is correct.  images->ep is where we
> > > were put in memory.  This is what we should make sure is 2MiB aligned,
> > > and then add to it the text_offset.  And some quick testing with
> > > CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET enabled Images :)
> > 
> > My intention is
> > 
> >   images->ep  =  (2MiB aligned base)  + text_offset
> > 
> > If this equation is met, the image is already placed at the bootable 
> > position.
> > We can skip the relocation.
> > 
> > Theoretically, we can not know the value of text_offset in advance
> > (especially for CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET).
> > However, in practice, we know text_offset is 0x80000.

Per the arm64 Image header, no particular value of text_offset should
be assumed. Please do not assume a particular value.

It has always been the intent that bootloaders should read this, though
this evidently wasn't very clear (and the bootwrapper set a bad
example). I tried to clear this up with documentation updates (and the
addition of ARM64_RANDOMIZE_TEXT_OFFSET).

> > If we put the image at 2MiB aligned base, the relocation would
> > always happen.
> 
> Correct.  But I honestly don't know if non-randomized text offset is the
> common case people will optimize for or randomized for added security will be
> the more common case.  

FWIW, the randomized text_offset is a bootloader debugging/testing
feature, and there's no security aspect to it.

It was added [1] as an additional hint to bootloader authors that
they must respect the text_offset field.

Thanks,
Mark.

[1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da57a369d3bc5cd61db90f7e9555840381db9b09


Re: [U-Boot] [RFC PATCH] armv8: cache: Switch MMU table without turning off MMU

2017-02-10 Thread Mark Rutland
Hi,

On Fri, Feb 03, 2017 at 02:22:48PM -0800, York Sun wrote:
> We don't have to completely turn off MMU and cache to switch to
> another MMU table as far as the data is coherent before and after
> the switching. This patch relaxes the procedure.
>
> Signed-off-by: York Sun 
> CC: Alexander Graf 
> ---
> I found this issue while trying to change MMU table for a SPL boot.
> The RAM version U-Boot was copied to memory when d-cache was enabled.
> SPL code never flushed the cache (I will send another patch to fix that).
> With existing code, U-Boot stops running as soon as the MMU is disabled.
> With below proposed change, U-Boot continues to run. I have been in
> contact with ARM support and got very useful information. However, this
> switching TTBR method is "behaviour that should work" but not well
> verified by ARM. During my debugging, I found other minor issue which
> convinced me this code wasn't exercised. I don't intend to use this
> method in long term, but figure it may help others by fixing it.

I believe the approach taken in this patch is unsafe.

Even if the resulting translations are identical across the old and new
tables, it is not valid to transition from one set of tables to another
unless:

(a) the mappings in both cases are non-global, and the ASID is switched,
or:

(b) A Break-Before-Make strategy is followed to ensure that the TLBs
contain no stale entries for the old tables.

Even if the resulting translations are identical, you can be subject to
TLB conflict aborts and/or behaviours resulting from the amalgamation of
TLB entries. In Linux, we had to write special code [1] to switch tables
by using some trampoline code in the other TTBR.

I believe that in this case, you only need to clean the MMU-off code to
the PoC (using DC CVAC), before temporarily turning the MMU off.
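
The clean itself is just a loop over cache lines covering the code in
question. A rough host-side sketch of the address arithmetic follows; the
helper names are made up, and the per-line operation is passed in as a
callback because the real thing would be a "dc cvac" on each address
followed by a single dsb once the loop is done. The line size decode uses
CTR_EL0.DminLine (bits [19:16], log2 of the line size in words).

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest D-cache line size in bytes, decoded from a CTR_EL0 value. */
static uint64_t dcache_line_size(uint64_t ctr_el0)
{
	return 4ull << ((ctr_el0 >> 16) & 0xf);
}

typedef void (*line_op_t)(uint64_t addr);

/*
 * Hypothetical helper: call op once per cache line overlapping
 * [start, end). op may be NULL, in which case lines are only counted.
 * Returns the number of lines visited.
 */
static unsigned long for_each_line(uint64_t start, uint64_t end,
				   uint64_t line, line_op_t op)
{
	uint64_t addr = start & ~(line - 1);	/* round down to a line boundary */
	unsigned long n = 0;

	for (; addr < end; addr += line) {
		if (op)
			op(addr);	/* on hardware: dc cvac, addr */
		n++;
	}

	return n;
}
```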

Thanks,
Mark.

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401434.html

>  arch/arm/cpu/armv8/cache.S | 44 +---
>  1 file changed, 9 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S
> index f1deaa7..63fb112 100644
> --- a/arch/arm/cpu/armv8/cache.S
> +++ b/arch/arm/cpu/armv8/cache.S
> @@ -171,35 +171,10 @@ ENDPROC(__asm_invalidate_l3_icache)
>  /*
>   * void __asm_switch_ttbr(ulong new_ttbr)
>   *
> - * Safely switches to a new page table.
> + * Switches to a new page table. Cache coherency must be maintained
> + * before calling this function.
>   */
>  ENTRY(__asm_switch_ttbr)
> - /* x2 = SCTLR (alive throghout the function) */
> - switch_el x4, 3f, 2f, 1f
> -3:   mrs x2, sctlr_el3
> - b   0f
> -2:   mrs x2, sctlr_el2
> - b   0f
> -1:   mrs x2, sctlr_el1
> -0:
> -
> - /* Unset CR_M | CR_C | CR_I from SCTLR to disable all caches */
> - movnx1, #(CR_M | CR_C | CR_I)
> - and x1, x2, x1
> - switch_el x4, 3f, 2f, 1f
> -3:   msr sctlr_el3, x1
> - b   0f
> -2:   msr sctlr_el2, x1
> - b   0f
> -1:   msr sctlr_el1, x1
> -0:   isb
> -
> - /* This call only clobbers x30 (lr) and x9 (unused) */
> - mov x3, x30
> - bl  __asm_invalidate_tlb_all
> -
> - /* From here on we're running safely with caches disabled */
> -
>   /* Set TTBR to our first argument */
>   switch_el x4, 3f, 2f, 1f
>  3:   msr ttbr0_el3, x0
> @@ -209,14 +184,13 @@ ENTRY(__asm_switch_ttbr)
>  1:   msr ttbr0_el1, x0
>  0:   isb
>  
> - /* Restore original SCTLR and thus enable caches again */
> - switch_el x4, 3f, 2f, 1f
> -3:   msr sctlr_el3, x2
> - b   0f
> -2:   msr sctlr_el2, x2
> - b   0f
> -1:   msr sctlr_el1, x2
> -0:   isb
> + /* invalidate i-cache */
> + ic  ialluis
> + isb sy
> +
> + /* This call only clobbers x30 (lr) and x9 (unused) */
> + mov x3, x30
> + bl  __asm_invalidate_tlb_all
>  
>   ret x3
>  ENDPROC(__asm_switch_ttbr)
> -- 
> 2.7.4
> 


Re: [U-Boot] [RFC PATCH] arm: bootm: Boot kernel with U-Boot's FDT blob

2017-01-13 Thread Mark Rutland
On Thu, Jan 12, 2017 at 01:47:32PM +0000, Ryan Harkin wrote:
> On 12 January 2017 at 12:25, Mark Rutland wrote:
> > On Tue, Jan 10, 2017 at 06:50:19PM +0000, Jon Medhurst (Tixy) wrote:
> >> On Tue, 2017-01-10 at 18:34 +0000, Mark Rutland wrote:
> >> > Looking at the git log for arch/arm64/boot/dts/arm, most updates are
> >> > simply adding new descriptions, so a DTB from a year ago should work
> >> > just fine with mainline (modulo the Juno PCI window issue, which was a
> >> > DTB bug). Upgrading kernel shouldn't require a DTB upgrade to see
> >> > equivalent functionality.

> > The key point is that it is possible to provide a baseline DTB that is
> > good enough for most users, and will work with future kernels.
> >
> > We're unlikely to get to a state where DTBs are perfect and complete
> > from day one. We can have something that remains usable.
> 
> I hope it stays that way. Most of my users are either on 3.18 or 4.4.
> And they are incompatible with each other w.r.t. DTBs to the point
> where one won't even post a banner message with the other's DTB.

Interesting. Just to check, do you mean v3.19? There was no upstream
Juno DT in v3.18.

Unfortunately, I can't spot any DT changes between v3.19 and v4.4 that
would obviously break compatibility such that serial wouldn't work.

If you have those kernels && DTBs to hand, are you able to take a look
when passing "earlycon=pl011,0x7ff80000"?

I know that the ARM Software linux repo shipped a broken DT, along with
some kernel modifications which bodge around that (specifically, they
exposed a broken MMIO timer as functional). IIRC, poking that would
bring down the kernel before the serial was up.

Is your v3.18 DT the old ARM Software repo's Juno DT?

Thanks,
Mark.


Re: [U-Boot] [RFC PATCH] arm: bootm: Boot kernel with U-Boot's FDT blob

2017-01-12 Thread Mark Rutland
On Tue, Jan 10, 2017 at 06:50:19PM +0000, Jon Medhurst (Tixy) wrote:
> On Tue, 2017-01-10 at 18:34 +0000, Mark Rutland wrote:
> > Looking at the git log for arch/arm64/boot/dts/arm, most updates are
> > simply adding new descriptions, so a DTB from a year ago should work
> > just fine with mainline (modulo the Juno PCI window issue, which was a
> > DTB bug). Upgrading kernel shouldn't require a DTB upgrade to see
> > equivalent functionality.
> 
> But if you want the new functionality in the kernel, why should you be
> forced to wait for the bootloader to catch up (or do that work yourself)
> then upgrade to that new bootloader version? And what about the poor
> devs working on that new functionality, they're going to need to use not
> upstream device-trees. Then there's all the firmware and system
> configuration stuff that's in device-tree.

Developers working on low-level stuff will always need to be able to
override/upgrade/etc. I am certainly not arguing to remove those
capabilities.

The key point is that it is possible to provide a baseline DTB that is
good enough for most users, and will work with future kernels.

We're unlikely to get to a state where DTBs are perfect and complete
from day one. We can have something that remains usable.

Thanks,
Mark.


Re: [U-Boot] [RFC PATCH] arm: bootm: Boot kernel with U-Boot's FDT blob

2017-01-10 Thread Mark Rutland
On Tue, Jan 10, 2017 at 05:17:07PM +0000, Ryan Harkin wrote:
> On 10 January 2017 at 16:58, Alexander Graf wrote:
> > On 01/10/2017 05:47 PM, Ryan Harkin wrote:

> >> I have a background task to refactor u-boot support for ARM Ltd
> >> boards. One of many options I was considering was to have a minimal
> >> DTB to configure the platform with only the nodes needed for u-boot.
> >> The ARM Ltd device trees fluctuate so much, I wouldn't be able to
> >> commit to one DTB that will work forever...
> >
> > No, it's only meant as a fallback when no manual device tree is provided.
> 
> Thanks for confirmation.
> 
> > In an ideal world however, device trees are static and complete, so
> > you could just put a final dt into U-Boot and have it propagated all
> > the way through.
> 
> I look forward to living in this ideal world the EDK2 and kernel
> communities promised me several years ago ;-)

To be fair, the *upstream* DTs for ARM Ltd platforms are relatively
stable. I must assume you're talking about random platform trees from
elsewhere, which it's not fair to blame the EDK2 or Linux communities
for. ;)

Looking at the git log for arch/arm64/boot/dts/arm, most updates are
simply adding new descriptions, so a DTB from a year ago should work
just fine with mainline (modulo the Juno PCI window issue, which was a
DTB bug). Upgrading kernel shouldn't require a DTB upgrade to see
equivalent functionality.

It's certainly not great that those aren't in a separate canonical repo,
but in terms of stability we are largely there, random *not upstream*
platform trees notwithstanding. We'll never get complete from day one,
so some updates over time are a fact of life, but we are in the position
to ship something that continues to work...

Thanks,
Mark.


Re: [U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

2016-11-07 Thread Mark Rutland
On Fri, Oct 28, 2016 at 09:35:37PM +0000, york sun wrote:
> I am struggling on the dcache_disable() which implies all dcache is 
> flushed. I don't have a reasonable way to flush all if I want to skip 
> L3. I tried to benchmark flushing by VA to cover my entire 16GB memory. 
> It took 30+ seconds. On the other side, flushing by set/way and flushing 
> L3 together took 7 ms. If I only flush U-Boot stack in this function, it 
> can run really fast, but that defeats the purpose of flush all cache.
> 
> I thought of parsing each set/way to find the address of each cache line 
> (I don't know how to do that yet), but the tag only contains physical 
> address not VA.

With the MMU off, translation is an idmap (i.e. VA == PA), so if you
have physical addresses, you can use those directly.

That said, the presence and implementation of any mechanism to read
addresses from the cache is IMPLEMENTATION DEFINED, so this will not be
portable.

> The ARM document shows example code to clean entire data or unified 
> cache to PoC, very similar to the code we have in U-Boot armv8/cache.S.

Do you mean the "Example code for cache maintenance instructions"?

In recent versions of the ARM ARM there's a large note explaining why
this only works in very restricted scenarios (and cannot be used to
affect system caches such as your L3).

In the latest ARM ARM ("ARM DDI 0487A.k"), see page D3-1710.

> Unless there are other cache maintenance instruction I am not aware of, 
> I don't see how to flush to PoC by set/way.

Architecturally, Set/Way operations are not guaranteed to affect all
caches prior to the PoC, and may require other IMPLEMENTATION DEFINED
maintenance (e.g. MMIO control of system-level caches).

Thanks,
Mark.


Re: [U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

2016-11-07 Thread Mark Rutland
On Fri, Oct 28, 2016 at 12:32:36PM -0600, Stephen Warren wrote:
> Related, consider the following from the Linux kernel's
> Documentation/arm64/booting.txt:
> 
> >- Caches, MMUs
> >  The MMU must be off.
> >  Instruction cache may be on or off.
> >  The address range corresponding to the loaded kernel image must be
> >  cleaned to the PoC.
> 
> (That only applies to the kernel image specifically, but doing the
> same for the entire cache content seems reasonable, perhaps even
> required for other reasons?)

It's certainly preferable.

The wording is somewhat poor too, and needs some fixing up.

If anything has been allocated into the cache which may conflict with
later use with Normal Inner-Shareable Inner-WB Outer-WB mappings, this
needs to be (Cleaned+)Invalidated from the caches.

Thanks,
Mark.


Re: [U-Boot] [PATCH v3 01/10] arm: add atomic functions with return support

2016-10-26 Thread Mark Rutland
Hi,

On Wed, Oct 26, 2016 at 02:10:24PM +0200, Antoine Tenart wrote:
> Implement three atomic functions to allow making an atomic operation
> that returns the value. Adds: atomic_add_return(), atomic_sub_return(),
> atomic_inc_return() and atomic_dec_return().
> 
> Signed-off-by: Antoine Tenart 

In the cover letter, you mentioned that these are needed for SMP
systems (for common suspend entry/exit management).

The below operations are *not* atomic in SMP systems, and can only work
in UP. The same is true of all the existing code in the same file.

> +static inline int atomic_add_return(int i, volatile atomic_t *v)
> +{
> + unsigned long flags = 0;
> + int ret;
> +
> + local_irq_save(flags);
> + ret = (v->counter += i);
> + local_irq_restore(flags);
> +
> + return ret;
> +}

local_irq_{save,restore}() won't serialize two CPUs. Consider two CPUs
executing this in parallel (assuming the compiler chooses a register rX
as a temporary):

	CPU0                    CPU1

	local_irq_save()        local_irq_save()
	rX = v->counter         rX = v->counter
	rX += i                 rX += i
	v->counter = rX
	                        v->counter = rX
	local_irq_restore()     local_irq_restore()

At the end of this, CPU0's increment of v->counter is lost.
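
If it helps, the interleaving can be replayed deterministically on a
single thread. This is only a model of the sequence above (the function
and struct names are invented for the illustration); the IRQ masking is
left out entirely, since it only affects the local CPU and does not change
the outcome.

```c
/* Each simulated CPU has its own copy of the temporary register rX. */
struct sim_cpu {
	int rX;
};

/*
 * Replay the interleaving from the table: both CPUs load the counter
 * before either stores. Returns the final counter value.
 */
static int replay_lost_update(int initial, int i)
{
	int counter = initial;
	struct sim_cpu cpu0, cpu1;

	cpu0.rX = counter;	/* CPU0: rX = v->counter */
	cpu1.rX = counter;	/* CPU1: rX = v->counter */
	cpu0.rX += i;		/* CPU0: rX += i */
	cpu1.rX += i;		/* CPU1: rX += i */
	counter = cpu0.rX;	/* CPU0: v->counter = rX */
	counter = cpu1.rX;	/* CPU1: v->counter = rX, overwriting CPU0's store */

	return counter;
}
```

Starting from 0 with i = 1, two increments leave the counter at 1 rather
than 2.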

If you need atomics on SMP, you'll need to use the
{load,store}-exclusive instructions, which come with a number of
additional requirements (e.g. the memory being operated on must be
mapped as write-back cacheable, the entire retry loop needs to be
written in asm, etc).
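
To show the semantics you'd need, here is a sketch using C11
<stdatomic.h> instead of hand-written asm. That is a substitution on my
part for the sake of a compact example: the compiler generates the
exclusive load/store loop (or equivalent) for you, and whether that is
usable in this environment is a separate question. The memory attribute
requirements above still apply. The sketch_ names are not an existing
U-Boot API.

```c
#include <stdatomic.h>

typedef struct {
	atomic_int counter;
} sketch_atomic_t;

static inline int sketch_atomic_add_return(int i, sketch_atomic_t *v)
{
	/* atomic_fetch_add() returns the old value; add i to get the new one */
	return atomic_fetch_add(&v->counter, i) + i;
}

static inline int sketch_atomic_sub_return(int i, sketch_atomic_t *v)
{
	/* likewise, atomic_fetch_sub() returns the value before subtraction */
	return atomic_fetch_sub(&v->counter, i) - i;
}
```

The read-modify-write is a single atomic operation here, so two CPUs
calling sketch_atomic_add_return() concurrently cannot lose an update.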

Thanks,
Mark.


Re: [U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

2016-10-24 Thread Mark Rutland
On Fri, Oct 21, 2016 at 07:31:52PM +0000, york sun wrote:
> On 10/20/2016 01:34 PM, Stephen Warren wrote:
> > On 10/19/2016 11:06 PM, york sun wrote:
> >> I understand the data in dirty cache is not lost when the dcache is
> >> disabled. It is just not accessible. In my case, after flushing L1/L2 by
> >> way/set, the data is probably in L3 cache. Without flushing L3, I have
> >> stalled data (including the stack) in main memory.
> >
> > I assume "stale" not "stalled".
> >
> > Yes, I agree: If you have only flushed L1/L2 by set/way (or attempted to
> > do everything, but the L3 cache doesn't implement by set/way
> > operations), then L3 can indeed still contain dirty data, and hence main
> > memory can be stale.
> >
> >> My previous test was trying to prove I can skip flushing L3 if I flush
> >> the necessary VA.
> >
> > It depends whether your L3 cache is before/after the Level of
> > Unification and/or the Level of Coherency for your HW, and whether you
> > flush by VA to the PoU or PoC.
> >
> > Looking at your "[PATCH 2/2] armv8: Fix flush_dcache_all function", it
> > uses __asm_flush_dcache_range() which cleans and invalidates to the
> > point of coherency (it invokes the dc civac instruction).
> >
> >  > Now when I recall it carefully, I think I made a
> >> mistake by flushing by VA _after_ flushing the cache by way/set. I might
> >> have a positive result if I flushed the cache by VA first. I will repeat
> >> this test when I get back to prove this theory.
> >
> > I'll assume your L3 cache is before the Point of Coherency. If so, then
> > performing a clean/invalidate by VA to the PoC will affect all of L1,
> > L2, and L3 caches. As such, you need only perform the c/i by VA, and you
> > can entirely skip the c/i by set/way; I believe it would be redundant,
> > assuming that the by VA operations cover all relevant VAs.
> 
> I believe the PoC and PoU is before L3 for me. 

If you are using CCN, then the PoC is beyond the L3.

Were the PoC before the L3, there would be no requirement to perform
maintenance on the L3. The PoC is the point at which *all* accesses
(cacheable or otherwise) see the same data.

Per the ARM ARM (for ARMv8-A), maintenance to the PoC *must* affect
system caches (including CCN).

> I can clean/invalidate by VA, it may not cover all the cache lines. So
> by set/way is still needed.

The problem is figuring out which VA ranges require maintenance.

Do we not have an idea of the set of memory banks present in the SoC?
Like the memblock array in Linux?

> > b) How can we implement the by VA code in a way that doesn't touch DRAM?
> >
> > Implementing by-set/way is fairly constrained in that all the
> > configuration data is in a few registers, and the algorithm just
> > requires a few nested loops.
> >
> > A generic by VA implementation seems like it would require walking
> > U-Boot's struct mm_region *mem_map data structure (or the CPU's
> > translation tables) in RAM. Perhaps that's OK since it's read-only...

So long as you clean the structure by VA to the PoC first, you can
safely access it with non-cacheable accesses.

> I agree in general about your points, but it may not be always practical 
> to flush all by VA. U-Boot may map huge amount of memory. Walking 
> through MMU table and flush all will be too much. 

I would recommend that you benchmark that; from my own experiments, so
long as you only perform maintenance on the portions of the PA space you
care about (and amortize barriers), this can take surprisingly little
time.
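
A rough sketch of that approach, for illustration only: walk just the regions that were ever mapped cacheable, issue the by-VA operation per line, and issue a single barrier for the whole batch. The `struct bank` type and the two `*_stub` functions are hypothetical stand-ins (for `dc civac` and `dsb sy`), not U-Boot code, so that the loop can be exercised on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one cacheable region; not a U-Boot type. */
struct bank {
	uint64_t base;
	uint64_t size;
};

/* Counters so the sketch can be checked on a host. */
static unsigned long nr_line_ops, nr_barriers;

/* Stand-in for "dc civac, <va>" (clean+invalidate by VA to the PoC). */
static void dc_civac_stub(uint64_t va)
{
	(void)va;
	nr_line_ops++;
}

/* Stand-in for "dsb sy". */
static void dsb_sy_stub(void)
{
	nr_barriers++;
}

/*
 * Clean+invalidate every cache line overlapping the given banks, then
 * issue one barrier for the whole batch rather than one per line.
 * line_size must be a power of two (on hardware, derive it from CTR_EL0).
 * Returns the number of lines maintained.
 */
static size_t maintain_banks(const struct bank *banks, size_t nr,
			     uint64_t line_size)
{
	size_t lines = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t va, end;

		if (!banks[i].size)
			continue;

		va = banks[i].base & ~(line_size - 1);
		end = banks[i].base + banks[i].size;
		for (; va < end; va += line_size) {
			dc_civac_stub(va);
			lines++;
		}
	}

	/* Amortized: a single barrier after all by-VA operations. */
	dsb_sy_stub();
	return lines;
}
```

The cost scales with the size of the regions passed in, not with the size of the whole PA space, which is why restricting the list matters.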

> Without flushing all memory, we really cannot say we flushed all
> dcache. On the other side, for U-Boot itself to operate, we don't
> really have to flush all. I guess the key is if we need to flush all.
> For Linux to boot, we don't.

This depends; so long as you've *only* used Normal, Inner-Shareable,
Inner Write-Back, Outer Write-Back, you could omit some maintenance, but
you still need to clean the Linux image to the PoC.

Any memory mapped with other attributes *must* be invalidated (and
perhaps clean+invalidated) from the caches.

> We can flush some memory (including U-Boot stack), turn off the MMU.
> As soon as kernel boots up, it enables dcache and everything is back.

If you have used memory attributes inconsistent with Linux, things will
end badly from here, due to potential loss of coherency resulting from
mismatched memory attributes.

Thanks,
Mark.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [Resend RFC PATCH 1/2] armv8: Fix dcache disable function

2016-10-24 Thread Mark Rutland
Hi,

Sorry for joining this a bit late; apologies if the below re-treads
ground already covered.

On Wed, Oct 19, 2016 at 09:25:02AM -0600, Stephen Warren wrote:
> On 10/14/2016 02:17 PM, York Sun wrote:
> >Current code turns off d-cache first, then flush all levels of cache.
> >This results data loss. As soon as d-cache is off, the dirty cache
> >is discarded according to the test on LS2080A. This issue was not
> >seen as long as external L3 cache was flushed to push the data to
> >main memory. However, external L3 cache is not guaranteed to have
> >the data. To fix this, flush the d-cache by way/set first to make
> >sure cache is clean before turning it off.
> 
> >diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> 
> >@@ -478,9 +478,9 @@ void dcache_disable(void)
> 
> >+flush_dcache_all();
> > set_sctlr(sctlr & ~(CR_C|CR_M));
> >
> >-flush_dcache_all();
> > __asm_invalidate_tlb_all();
> 
> I talked to Mark Rutland at ARM, and I believe the current code is
> correct.

Well, almost, but not quite. It's a long story... ;)

I gave a primer [1,2] on the details at ELC earlier this year, which may
or may not be useful.

The big details are:

* Generally "Flush" is ambiguous/meaningless. Here you seem to want
  clean+invalidate.

* Set/Way operations are for IMPLEMENTATION DEFINED (i.e. SoC-specific)
  cache maintenance sequences, and are not truly portable (e.g. not
  affecting system caches).
  
  I assume that an earlier boot stage initialised the caches prior to
  U-Boot. Given that, you *only* need to perform maintenance for the
  memory you have (at any point) mapped with cacheable attributes, which
  should be a small subset of the PA space. With ARMv8-A, broadcast
  maintenance to the PoC should affect all relevant caches (assuming you
  use the correct shareability attributes).

* You *cannot* write a dcache disable routine in C, as the compiler can
  perform a number of implicit memory accesses (e.g. stack, globals,
  GOT). For that alone, I do not believe the code above is correct.

  Note that we have seen this being an issue in practice, before we got
  rid of Set/Way ops from arm64 Linux (see commit 5e051531447259e5).

* Your dcache disable code *must* be clean to the PoC, prior to
  execution, or instruction fetches could see stale data. You can first
  *clean* this to the PoC, which is sufficient to avoid the problems
  above.

* The SCTLR_ELx.{C,I} bits do not enable/disable caches; they merely
  activate/deactivate cacheable attributes on data/instruction fetches.

  Note that cacheable instruction fetches can allocate into unified/data
  caches.
  
  Also, note that the I bit is independent of the C bit, and the
  attributes it provides differ when the M bit is clear. Generally, I
  would advise that at all times M == C == I, as that leads to the least
  surprise.
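
The M == C == I advice in the last point can be expressed as a small check. This is a minimal sketch: the bit positions are the architectural SCTLR_ELx ones, while the helper name `sctlr_mci_consistent` is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural SCTLR_ELx bit positions. */
#define SCTLR_M	(UINT64_C(1) << 0)	/* stage 1 MMU enable */
#define SCTLR_C	(UINT64_C(1) << 2)	/* cacheability of data accesses */
#define SCTLR_I	(UINT64_C(1) << 12)	/* cacheability of instruction fetches */

/* Hypothetical helper: true if M, C and I are all set or all clear. */
static bool sctlr_mci_consistent(uint64_t sctlr)
{
	uint64_t mci = sctlr & (SCTLR_M | SCTLR_C | SCTLR_I);

	return mci == 0 || mci == (SCTLR_M | SCTLR_C | SCTLR_I);
}
```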

Thanks,
Mark.

[1] http://events.linuxfoundation.org/sites/events/files/slides/slides_17.pdf
[2] https://www.youtube.com/watch?v=F0SlIMHRnLk


Re: [U-Boot] [PATCH] armv8: Enable CPUECTLR.SMPEN for data coherency

2016-06-30 Thread Mark Rutland
On Thu, Jun 30, 2016 at 04:51:48PM +0800, Gong Qianyu wrote:
> From: Mingkai Hu 
> 
> Data coherency is enabled only when the CPUECTLR.SMPEN bit is
> set. The SMPEN bit should be set before enabling the data cache.
> If not enabled, the cache is not coherent with other cores and
> data corruption could occur.
> 
> Signed-off-by: Mingkai Hu 
> Signed-off-by: Gong Qianyu 
> 
> diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> index 670e323..735dd67 100644
> --- a/arch/arm/cpu/armv8/start.S
> +++ b/arch/arm/cpu/armv8/start.S
> @@ -81,6 +81,11 @@ reset:
>   msr cpacr_el1, x0   /* Enable FP/SIMD */
>  0:
>  
> + /* Enalbe SMPEN bit */
> + mrs x0, S3_1_c15_c2_1   /* cpuactlr_el1 */
> + orr x0, x0, #0x40
> + msr S3_1_c15_c2_1, x0

Please note that this register is IMPLEMENTATION DEFINED, and not
architectural, even though it happens to be common among ARM Ltd
implementations.

This is also not something that one can usually set on the Non-secure
side, and I'd expect Secure FW such as the ARM Trusted Firmware to
handle this.

If this is necessary within U-Boot, it should be guarded such that it
only runs on the relevant CPUs.
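
One possible shape for such a guard, as a sketch only: decode MIDR_EL1 and touch the register only on parts known to implement it. The MIDR field layout is architectural; the list of parts here (Cortex-A53/A57/A72) is an illustrative assumption and not exhaustive, and `cpu_has_cpuectlr_smpen` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MIDR_EL1 fields. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)

#define MIDR_IMPLEMENTER_ARM	0x41
#define MIDR_PARTNUM_CORTEX_A53	0xd03
#define MIDR_PARTNUM_CORTEX_A57	0xd07
#define MIDR_PARTNUM_CORTEX_A72	0xd08

/*
 * Hypothetical guard: only report true for parts assumed to have
 * CPUECTLR_EL1.SMPEN. Any other CPU must be left alone, since the
 * encoding is IMPLEMENTATION DEFINED and may mean something else.
 */
static bool cpu_has_cpuectlr_smpen(uint32_t midr)
{
	if (MIDR_IMPLEMENTER(midr) != MIDR_IMPLEMENTER_ARM)
		return false;

	switch (MIDR_PARTNUM(midr)) {
	case MIDR_PARTNUM_CORTEX_A53:
	case MIDR_PARTNUM_CORTEX_A57:
	case MIDR_PARTNUM_CORTEX_A72:
		return true;
	default:
		return false;
	}
}
```

Even with such a check, the write can still trap if U-Boot runs at an exception level that firmware has not granted access to.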

Thanks,
Mark.


Re: [U-Boot] [PATCH 2/2] arm64: add better spin-table support

2016-06-18 Thread Mark Rutland
On Fri, Jun 17, 2016 at 09:51:49PM +0900, Masahiro Yamada wrote:
> There are two enable methods supported by ARM64 Linux; psci and
> spin-table.  The latter is simpler and easier to use for quick SoC
> bring-up.
> 
> So, I used the spin-table for my first ARMv8 SoC porting, but I
> found its support in U-Boot was poor.  It is true there exists a
> code fragment for the spin code in arch/arm/cpu/armv8/start.S,
> but I see some problems:
> 
>   - We must hard-code CPU_RELEASE_ADDR so that it matches the
> "cpu-release-addr" property in the DT that comes from the
> kernel tree.
> 
>   - The Documentation/arm64/booting.txt in Linux requires that
> the release address must be zero-initialized, but it is not
> cared by the common code in U-Boot.  So, we must do it in a
> board specific manner.
> 
>   - There is no systematic way to protect the spin code from the
> kernel.  U-Boot relocates itself during the boot, so it is
> difficult to predict where the spin code will be located
> after the relocation, which makes it even more difficult to
> hard-code /memreserve/ in the DT of the kernel.  One possible
> work-around would be to pre-fetch the spin-code into the
> I-cache of secondary CPUs, but this is an unsafe solution.
> 
> So, here is a patch to solve the problems.  In this approach, the DT
> is run-time modified to reserve the spin code (+ cpu-release-addr).
> Also, the "cpu-release-addr" property is set to an appropriate
> address after the relocation, which means we no longer need the
> hard-coded CPU_RELEASE_ADDR.
> 
> Currently this patch only supports ARMv8, but theoretically nothing
> about the spin-table is arch-specific.  Perhaps, we might want to
> support it on PowerPC in the future.  So, I put the DT fixup code
> into the common/ directory.  Very little code must be written in
> assembler, which went to the arch/arm/cpu/armv8/ directory.

It's worth noting that while both arm64 and PPC have something called
"spin-table", they're not quite the same. The arm64 version just reused
the enable-method and property names for something different (and
arch-specific).

I have no strong feelings about where the code lives, however.

Otherwise, these sound like useful improvements.

Thanks,
Mark.

> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  arch/arm/cpu/armv8/Kconfig | 18 +++
>  arch/arm/cpu/armv8/Makefile|  1 +
>  arch/arm/cpu/armv8/spin_table_v8.S | 22 ++
>  arch/arm/cpu/armv8/start.S | 10 +++---
>  arch/arm/lib/bootm-fdt.c   |  7 +
>  common/Makefile|  1 +
>  common/spin_table.c| 62 
> ++
>  include/spin_table.h   | 14 +
>  8 files changed, 131 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/cpu/armv8/spin_table_v8.S
>  create mode 100644 common/spin_table.c
>  create mode 100644 include/spin_table.h
> 
> diff --git a/arch/arm/cpu/armv8/Kconfig b/arch/arm/cpu/armv8/Kconfig
> index 3d19bbf..019b625 100644
> --- a/arch/arm/cpu/armv8/Kconfig
> +++ b/arch/arm/cpu/armv8/Kconfig
> @@ -3,4 +3,22 @@ if ARM64
>  config ARMV8_MULTIENTRY
>  boolean "Enable multiple CPUs to enter into U-Boot"
>  
> +config SPIN_TABLE
> + bool "Support spin-table enable method"
> + depends on ARMV8_MULTIENTRY && OF_LIBFDT
> + help
> +   Say Y here to support "spin-table" enable method for booting Linux.
> +
> +   To use this feature, you must do:
> + - Specify enable-method = "spin-table" in each CPU node in the
> +   Device Tree you are using to boot the kernel
> + - Let secondary CPUs in U-Boot (in a board specific manner)
> +   before the master CPU jumps to the kernel
> +
> +   U-Boot automatically does:
> + - Set "cpu-release-addr" property of each CPU node
> +   (overwrites it if already exists).
> + - Reserve the code for the spin-table and the release address
> +   via a /memreserve/ region in the Device Tree.
> +
>  endif
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index 1c85aa9..2e3f421 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -15,6 +15,7 @@ obj-y   += cache.o
>  obj-y+= tlb.o
>  obj-y+= transition.o
>  obj-y+= fwcall.o
> +obj-$(CONFIG_SPIN_TABLE) += spin_table_v8.o
>  
>  obj-$(CONFIG_FSL_LAYERSCAPE) += fsl-layerscape/
>  obj-$(CONFIG_ARCH_ZYNQMP) += zynqmp/
> diff --git a/arch/arm/cpu/armv8/spin_table_v8.S 
> b/arch/arm/cpu/armv8/spin_table_v8.S
> new file mode 100644
> index 000..2f1bd61
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/spin_table_v8.S
> @@ -0,0 +1,22 @@
> +/*
> + * Copyright (C) 2016 Masahiro Yamada 
> + *
> + * SPDX-License-Identifier:  GPL-2.0+
> + */
> +
> +#include 
> +
> +ENTRY(spin_table_secondary_jump)
> +.globl spin_table_reserve_begin
> +spin_table_reserve_begin:
> +0:   wfe
> + ldr x0, spin

Re: [U-Boot] [PATCH v3 01/11] ARM: PSCI: change PSCI function IDs base and offsets

2016-05-23 Thread Mark Rutland
On Wed, May 18, 2016 at 05:10:24PM +0800, macro.wav...@gmail.com wrote:
> From: Wang Dongsheng 
> 
> According to PSCI specification v1.0, the PSCI functions should start from
> 0x84000000 for SMC32, this patch changes this base value as well as other
> function offset values.

I agree that these are the correct values for PSCI 0.2, and we must use
those IDs for PSCI 0.2+.

However, this code is also used on platforms using PSCI 0.1, which did
not have well-defined IDs, and relied on them being described in the DT.
I fear that this may have the unintended consequence of breaking those.

Does U-Boot patch the DT with the correct IDs per the PSCI 0.1 binding?
If so, then things are fine.
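
To illustrate the distinction, a minimal sketch: PSCI 0.2+ fixes the SMC32 function IDs, whereas for PSCI 0.1 the IDs are whatever the DT describes (the `cpu_suspend`, `cpu_off`, `cpu_on` and `migrate` properties of the "arm,psci" node). The `struct psci_ids` type and `psci_pick_ids()` are hypothetical names, not existing U-Boot code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed SMC32 function IDs, valid for PSCI 0.2 and later. */
#define PSCI_0_2_FN_BASE	UINT32_C(0x84000000)
#define PSCI_0_2_FN(n)		(PSCI_0_2_FN_BASE + (n))

#define PSCI_0_2_FN_CPU_SUSPEND	PSCI_0_2_FN(1)
#define PSCI_0_2_FN_CPU_OFF	PSCI_0_2_FN(2)
#define PSCI_0_2_FN_CPU_ON	PSCI_0_2_FN(3)
#define PSCI_0_2_FN_MIGRATE	PSCI_0_2_FN(5)

/* Hypothetical container for the IDs a given platform should use. */
struct psci_ids {
	uint32_t cpu_suspend;
	uint32_t cpu_off;
	uint32_t cpu_on;
	uint32_t migrate;
};

/*
 * Pick the function IDs: architected values for PSCI 0.2+, otherwise
 * the values that were (or will be) described in the DT for PSCI 0.1.
 */
static struct psci_ids psci_pick_ids(bool psci_0_2, struct psci_ids dt_ids)
{
	struct psci_ids fixed = {
		.cpu_suspend = PSCI_0_2_FN_CPU_SUSPEND,
		.cpu_off = PSCI_0_2_FN_CPU_OFF,
		.cpu_on = PSCI_0_2_FN_CPU_ON,
		.migrate = PSCI_0_2_FN_MIGRATE,
	};

	return psci_0_2 ? fixed : dt_ids;
}
```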

Thanks,
Mark.

> Signed-off-by: Wang Dongsheng 
> Signed-off-by: Hongbo Zhang 
> ---
>  arch/arm/include/asm/psci.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/psci.h b/arch/arm/include/asm/psci.h
> index 128a606..a4a19e3 100644
> --- a/arch/arm/include/asm/psci.h
> +++ b/arch/arm/include/asm/psci.h
> @@ -19,13 +19,13 @@
>  #define __ARM_PSCI_H__
>  
>  /* PSCI interface */
> -#define ARM_PSCI_FN_BASE 0x95c1ba5e
> +#define ARM_PSCI_FN_BASE 0x84000000
>  #define ARM_PSCI_FN(n)   (ARM_PSCI_FN_BASE + (n))
>  
> -#define ARM_PSCI_FN_CPU_SUSPEND  ARM_PSCI_FN(0)
> -#define ARM_PSCI_FN_CPU_OFF  ARM_PSCI_FN(1)
> -#define ARM_PSCI_FN_CPU_ON   ARM_PSCI_FN(2)
> -#define ARM_PSCI_FN_MIGRATE  ARM_PSCI_FN(3)
> +#define ARM_PSCI_FN_CPU_SUSPEND  ARM_PSCI_FN(1)
> +#define ARM_PSCI_FN_CPU_OFF  ARM_PSCI_FN(2)
> +#define ARM_PSCI_FN_CPU_ON   ARM_PSCI_FN(3)
> +#define ARM_PSCI_FN_MIGRATE  ARM_PSCI_FN(5)
>  
>  #define ARM_PSCI_RET_SUCCESS 0
>  #define ARM_PSCI_RET_NI  (-1)
> -- 
> 2.1.4
> 


Re: [U-Boot] [PATCH 01/10] thunderx: Calculate TCR dynamically

2016-02-25 Thread Mark Rutland
On Wed, Feb 24, 2016 at 06:39:22PM +0100, Alexander Graf wrote:
> On 02/24/2016 02:37 PM, Mark Rutland wrote:
> >On Wed, Feb 24, 2016 at 01:11:35PM +0100, Alexander Graf wrote:
> >>+   /* Calculate the maximum physical (and thus virtual) address */
> >>+   if (max_addr > (1ULL << 44)) {
> >>+   ips = 5;
> >>+   va_bits = 48;
> >>+   } else  if (max_addr > (1ULL << 42)) {
> >>+   ips = 4;
> >>+   va_bits = 44;
> >>+   } else  if (max_addr > (1ULL << 40)) {
> >>+   ips = 3;
> >>+   va_bits = 42;
> >>+   } else  if (max_addr > (1ULL << 36)) {
> >>+   ips = 2;
> >>+   va_bits = 40;
> >>+   } else  if (max_addr > (1ULL << 32)) {
> >>+   ips = 1;
> >>+   va_bits = 36;
> >>+   } else {
> >>+   ips = 0;
> >>+   va_bits = 32;
> >>+   }
> >In Linux we program IPS to the maximum PARange from ID_AA64MMFR0.
> >
> >If you did the same here you wouldn't have to iterate over all the
> >memory map entries to determine the maximum PA you care about (though
> >you may still need to do that for the VA size).
> 
> Since we'd want to find the largest number for VA to trim one level
> of page table if we can, I don't see how it would buy is much to
> take the maximum supported PARange of the core into account.

It would simply be a saving of lines, as you'd program the same IPS
value regardless of max_addr (and you have to expect that PARange is
sufficient regardless).

Otherwise, yes, it doesn't buy you anything.
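
For reference, a sketch of deriving the IPS/PS field straight from ID_AA64MMFR0_EL1.PARange. The field positions and the PARange encodings 0..5 are architectural; the helper names are hypothetical, and capping at encoding 5 (48 bits) is an assumption made for this example.

```c
#include <stdint.h>

/* ID_AA64MMFR0_EL1.PARange lives in bits [3:0]. */
static unsigned int mmfr0_parange(uint64_t mmfr0)
{
	return mmfr0 & 0xf;
}

/*
 * Physical address bits for PARange encodings 0..5.
 * Returns 0 for encodings this sketch does not know about.
 */
static unsigned int parange_to_pa_bits(unsigned int parange)
{
	static const unsigned char bits[] = { 32, 36, 40, 42, 44, 48 };

	return parange < sizeof(bits) ? bits[parange] : 0;
}

/*
 * TCR field derived from PARange: IPS at bits [34:32] for EL1,
 * PS at bits [18:16] for EL2/EL3. Larger encodings are capped at 5
 * (an assumption of this sketch).
 */
static uint64_t tcr_ips_field(int el, unsigned int parange)
{
	uint64_t ips = parange > 5 ? 5 : parange;

	return ips << (el == 1 ? 32 : 16);
}
```

On hardware the input would come from `mrs x0, id_aa64mmfr0_el1`; the VA size would still be computed from the memory map.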

Thanks,
Mark.


Re: [U-Boot] [PATCH 02/10] arm64: Make full va map code more dynamic

2016-02-24 Thread Mark Rutland
Hi,

This is a good cleanup!

On Wed, Feb 24, 2016 at 01:11:36PM +0100, Alexander Graf wrote:
> The idea to generate our pages tables from an array of memory ranges
> is very sound. However, instead of hard coding the code to create up
> to 2 levels of 64k granule page tables, we really should just create
> normal 4k page tables that allow us to set caching attributes on 2M
> or 4k level later on.
> 
> So this patch moves the full_va mapping code to 4k page size and
> makes it fully flexible to dynamically create as many levels as
> necessary for a map (including dynamic 1G/2M pages). It also adds
> support to dynamically split a large map into smaller ones when
> some code wants to set dcache attributes.

This latter part scares me a bit. It's _very_ difficult to get that
right, and bugs in incorrect code are latent with asynchronous triggers.
We had to rework the Linux page table creation to avoid issues with
splitting/creating sections [1,2] early in boot.

For Linux we can't split arbitrary sections dynamically (in the kernel
mapping) because that risks unmapping code/data in active use as part of
the required Break-Before-Make sequence.

I suspect the same applies to U-Boot, and that splitting sections
dynamically is unsafe once the MMU is on and the mappings are active.

In the ARM ARM (ARM DDI 0487A.i) see D4.7.1 General TLB maintenance
requirements, which has a sub-section "Using break-before-make when
updating translation table entries".

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401434.html
[2] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386209.html


> +static void set_pte_table(u64 *pte, u64 *table)
> +{
> + /* Point *pte to the new table */
> + debug("Setting %p to addr=%p\n", pte, table);
> + *pte = PTE_TYPE_TABLE | (ulong)table;
> +}

When the MMU is on, you will need barriers between creating/updating
tables and plumbing them into the next level. Otherwise the stores may
not be observed in-order by the TLB, and stale (garbage) entries might be
allocated into the TLB.

Either that needs to live at the end of the table creation function, or
the beginning of this one.
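
A minimal sketch of the second option, with the barrier at the start of the plumbing function. It uses `dsb ishst` on arm64 and falls back to a compiler/CPU fence elsewhere purely so it builds on a host; the `dsb_ishst()` wrapper is an assumption of this example, not an existing U-Boot helper.

```c
#include <stdint.h>

#define PTE_TYPE_TABLE	UINT64_C(3)

/* "dsb ishst" on arm64; a plain fence elsewhere so this builds on a host. */
static inline void dsb_ishst(void)
{
#if defined(__aarch64__)
	__asm__ volatile("dsb ishst" : : : "memory");
#else
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

static void set_pte_table(uint64_t *pte, uint64_t *table)
{
	/*
	 * Ensure all stores that populated the new table are visible
	 * before the table becomes reachable from the parent entry.
	 */
	dsb_ishst();

	/* Point *pte to the new table */
	*pte = PTE_TYPE_TABLE | (uint64_t)(uintptr_t)table;
}
```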

[...]

> +/* Splits a block PTE into table with subpages spanning the old block */
> +static void split_block(u64 *pte, int level)
> +{
> + u64 old_pte = *pte;
> + u64 *new_table;
> + u64 i = 0;
> + /* level describes the parent level, we need the child ones */
> + int levelshift = level2shift(level + 1);
> +
> + if (pte_type(pte) != PTE_TYPE_BLOCK)
> + panic("PTE %p (%llx) is not a block. Some driver code wants to "
> +   "modify dcache settings for an range not covered in "
> +   "mem_map.", pte, old_pte);
> +
> + new_table = create_table();
> + debug("Splitting pte %p (%llx) into %p\n", pte, old_pte, new_table);
> +
> + for (i = 0; i < MAX_PTE_ENTRIES; i++) {
> + new_table[i] = old_pte | (i << levelshift);
> + debug("Setting new_table[%lld] = %llx\n", i, new_table[i]);
> + }

You need a barrier here to ensure ordering of the above modifications
with the below plumbing into the next level table.

> +
> + /* Set the new table into effect */
> + set_pte_table(pte, new_table);

Splitting blocks in this manner requires Break-Before-Make. You must go
via an invalid entry.
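
The ordering required by Break-Before-Make can be sketched as follows. The barrier and TLB invalidation are hypothetical stubs that only record the step order, so the sequence can be checked on a host; on hardware they would be `dsb ish` and a `tlbi` covering the old block.

```c
#include <stdint.h>

enum bbm_step { BBM_WRITE_INVALID, BBM_DSB, BBM_TLBI, BBM_WRITE_NEW };

/* Step log so the ordering can be inspected when run on a host. */
static enum bbm_step bbm_log[8];
static unsigned int bbm_log_len;

static void bbm_record(enum bbm_step step)
{
	if (bbm_log_len < 8)
		bbm_log[bbm_log_len++] = step;
}

/* Stand-in for "dsb ish". */
static void bbm_dsb_stub(void)
{
	bbm_record(BBM_DSB);
}

/* Stand-in for a "tlbi" by VA covering the old mapping. */
static void bbm_tlbi_stub(uint64_t va)
{
	(void)va;
	bbm_record(BBM_TLBI);
}

/* Replace a live entry by going via an invalid entry. */
static void bbm_replace_entry(uint64_t *pte, uint64_t new_entry, uint64_t va)
{
	*pte = 0;			/* break: install an invalid entry */
	bbm_record(BBM_WRITE_INVALID);
	bbm_dsb_stub();			/* make the invalid entry visible */
	bbm_tlbi_stub(va);		/* drop any cached old translation */
	bbm_dsb_stub();			/* wait for the invalidation */
	*pte = new_entry;		/* make: install the new entry */
	bbm_record(BBM_WRITE_NEW);
	bbm_dsb_stub();			/* make the new entry visible */
}
```

Between the break and the make nothing may access the affected range, which is why this is unsafe for a block covering the code or data performing the update.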

[...]

> +static u64 set_one_region(u64 start, u64 size, u64 attrs, int level)
> +{
> + int levelshift = level2shift(level);
> + u64 levelsize = 1ULL << levelshift;
> + u64 *pte = find_pte(start, level);
> +
> + /* Can we can just modify the current level block PTE? */
> + if (is_aligned(start, size, levelsize)) {
> + *pte &= ~PMD_ATTRINDX_MASK;
> + *pte |= attrs;
> + debug("Set attrs=%llx pte=%p level=%d\n", attrs, pte, level);
> +
> + return levelsize;
> + }

When the MMU is on, I believe you need Break-Before-Make here due to the
change of memory attributes.

> +
> + /* Unaligned or doesn't fit, maybe split block into table */
> + debug("addr=%llx level=%d pte=%p (%llx)\n", start, level, pte, *pte);
> +
> + /* Maybe we need to split the block into a table */
> + if (pte_type(pte) == PTE_TYPE_BLOCK)
> + split_block(pte, level);
> +
> + /* And then double-check it became a table or already is one */
> + if (pte_type(pte) != PTE_TYPE_TABLE)
> + panic("PTE %p (%llx) for addr=%llx should be a table",
> +   pte, *pte, start);
> +
> + /* Roll on to the next page table level */
> + return 0;
> +}
> +
> +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> +  enum dcache_option option)
> +{
> + u64 attrs = PMD_ATTRINDX(option);
> + u64 real_start = start;
> + u64 real_size = size;
> +
> + debug("start=%lx size=%lx\n", (ulong)start, (ulong)size);
> +
> + /*
> +  * Loop through the addr

Re: [U-Boot] [PATCH 01/10] thunderx: Calculate TCR dynamically

2016-02-24 Thread Mark Rutland
On Wed, Feb 24, 2016 at 01:11:35PM +0100, Alexander Graf wrote:
> Based on the memory map we can determine a lot of hard coded fields of
> TCR, like the maximum VA and max PA we want to support. Calculate those
> dynamically to reduce the chance for pit falls.
> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/arm/cpu/armv8/cache_v8.c| 59 
> +++-
>  arch/arm/include/asm/armv8/mmu.h |  6 +---
>  include/configs/thunderx_88xx.h  |  3 --
>  3 files changed, 59 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> index 71f0020..9229532 100644
> --- a/arch/arm/cpu/armv8/cache_v8.c
> +++ b/arch/arm/cpu/armv8/cache_v8.c
> @@ -38,6 +38,58 @@ static struct mm_region mem_map[] = CONFIG_SYS_MEM_MAP;
>  #define PTL1_ENTRIES CONFIG_SYS_PTL1_ENTRIES
>  #define PTL2_ENTRIES CONFIG_SYS_PTL2_ENTRIES
>  
> +static u64 get_tcr(int el, u64 *pips, u64 *pva_bits)
> +{
> + u64 max_addr = 0;
> + u64 ips, va_bits;
> + u64 tcr;
> + int i;
> +
> + /* Find the largest address we need to support */
> + for (i = 0; i < ARRAY_SIZE(mem_map); i++)
> + max_addr = max(max_addr, mem_map[i].base + mem_map[i].size);
> +
> + /* Calculate the maximum physical (and thus virtual) address */
> + if (max_addr > (1ULL << 44)) {
> + ips = 5;
> + va_bits = 48;
> + } else  if (max_addr > (1ULL << 42)) {
> + ips = 4;
> + va_bits = 44;
> + } else  if (max_addr > (1ULL << 40)) {
> + ips = 3;
> + va_bits = 42;
> + } else  if (max_addr > (1ULL << 36)) {
> + ips = 2;
> + va_bits = 40;
> + } else  if (max_addr > (1ULL << 32)) {
> + ips = 1;
> + va_bits = 36;
> + } else {
> + ips = 0;
> + va_bits = 32;
> + }

In Linux we program IPS to the maximum PARange from ID_AA64MMFR0.

If you did the same here you wouldn't have to iterate over all the
memory map entries to determine the maximum PA you care about (though
you may still need to do that for the VA size).

> +
> + if (el == 1) {
> + tcr = TCR_EL1_RSVD | (ips << 32);
> + } else if (el == 2) {
> + tcr = TCR_EL2_RSVD | (ips << 16);
> + } else {
> + tcr = TCR_EL3_RSVD | (ips << 16);
> + }
> +
> + /* PTWs cacheable, inner/outer WBWA and inner shareable */
> + tcr |= TCR_TG0_64K | TCR_SHARED_INNER | TCR_ORGN_WBWA | TCR_IRGN_WBWA;
> + tcr |= TCR_T0SZ(VA_BITS);
> +
> + if (pips)
> + *pips = ips;
> + if (pva_bits)
> + *pva_bits = va_bits;
> +
> + return tcr;
> +}
> +
>  static void setup_pgtables(void)
>  {
>   int l1_e, l2_e;
> @@ -110,6 +162,10 @@ __weak void mmu_setup(void)
>   /* Set up page tables only on BSP */
>   if (coreid == BSP_COREID)
>   setup_pgtables();
> +
> + el = current_el();
> + set_ttbr_tcr_mair(el, gd->arch.tlb_addr, get_tcr(el, NULL, NULL),
> +   MEMORY_ATTRIBUTES);
>  #else
>   /* Setup an identity-mapping for all spaces */
>   for (i = 0; i < (PGTABLE_SIZE >> 3); i++) {
> @@ -128,7 +184,6 @@ __weak void mmu_setup(void)
>   }
>   }
>  
> -#endif
>   /* load TTBR0 */
>   el = current_el();
>   if (el == 1) {
> @@ -144,6 +199,8 @@ __weak void mmu_setup(void)
> TCR_EL3_RSVD | TCR_FLAGS | TCR_EL3_IPS_BITS,
> MEMORY_ATTRIBUTES);
>   }
> +#endif
> +
>   /* enable the mmu */
>   set_sctlr(get_sctlr() | CR_M);
>  }
> diff --git a/arch/arm/include/asm/armv8/mmu.h 
> b/arch/arm/include/asm/armv8/mmu.h
> index 897f010..39ff745 100644
> --- a/arch/arm/include/asm/armv8/mmu.h
> +++ b/arch/arm/include/asm/armv8/mmu.h
> @@ -159,11 +159,6 @@
>  #define TCR_EL1_IPS_BITS (UL(3) << 32)   /* 42 bits physical address */
>  #define TCR_EL2_IPS_BITS (3 << 16)   /* 42 bits physical address */
>  #define TCR_EL3_IPS_BITS (3 << 16)   /* 42 bits physical address */
> -#else
> -#define TCR_EL1_IPS_BITS CONFIG_SYS_TCR_EL1_IPS_BITS
> -#define TCR_EL2_IPS_BITS CONFIG_SYS_TCR_EL2_IPS_BITS
> -#define TCR_EL3_IPS_BITS CONFIG_SYS_TCR_EL3_IPS_BITS
> -#endif
>  
>  /* PTWs cacheable, inner/outer WBWA and inner shareable */
>  #define TCR_FLAGS(TCR_TG0_64K |  \
> @@ -171,6 +166,7 @@
>   TCR_ORGN_WBWA | \
>   TCR_IRGN_WBWA | \
>   TCR_T0SZ(VA_BITS))
> +#endif
>  
>  #define TCR_EL1_RSVD (1 << 31)
>  #define TCR_EL2_RSVD (1 << 31 | 1 << 23)

I suspect you want bit 23 / EPD1 for EL1. Otherwise the core can make
walks starting at whatever junk happens to be in TTBR1.
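
In other words, something along these lines for the EL1 case. This is a sketch only: the bit position is architectural, the macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* TCR_EL1.EPD1: when set, a TLB miss on a TTBR1_EL1 address faults
 * instead of triggering a table walk. */
#define TCR_EL1_EPD1	(UINT64_C(1) << 23)

/* Hypothetical helper for a configuration that only uses TTBR0_EL1. */
static uint64_t tcr_el1_ttbr0_only(uint64_t tcr)
{
	return tcr | TCR_EL1_EPD1;
}
```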

Thanks,
Mark.

> diff --git a/include/configs/thunderx_88xx.h b/include/configs/thunderx_88xx.h
> index cece4dd..b9f93ad 100644
> --- a/include

Re: [U-Boot] [PATCH 03/16] efi_loader: Add PE image loader

2016-02-02 Thread Mark Rutland
On Tue, Feb 02, 2016 at 03:45:01AM +0100, Alexander Graf wrote:
> EFI uses the PE binary format for its application images. Add support to EFI 
> PE
> binaries as well as all necessary bits for the "EFI image loader" interfaces.
> 
> Signed-off-by: Alexander Graf 
> 
> ---
> 
> v1 -> v2:
> 
>   - move memory allocation to separate patch
>   - limit 32/64 to hosts that support it
>   - check 32bit optional nt header magic
>   - switch to GPL2+
> 
> v2 -> v3:
> 
>   - use efi_alloc
>   - add EFIAPI to function prototypes
>   - remove unused macros
>   - reorder header inclusion
>   - split relocation code into function
>   - flush cache after loading
> ---
>  include/efi_loader.h  |  20 +++
>  include/pe.h  | 263 
> ++
>  lib/efi_loader/efi_image_loader.c | 182 ++
>  3 files changed, 465 insertions(+)
>  create mode 100644 include/efi_loader.h
>  create mode 100644 include/pe.h
>  create mode 100644 lib/efi_loader/efi_image_loader.c

[...]

> +static void efi_loader_relocate(const IMAGE_BASE_RELOCATION *rel,
> + unsigned long rel_size, void *efi_reloc)
> +{
> + const IMAGE_BASE_RELOCATION *end;
> + int i;
> +
> + end = (const IMAGE_BASE_RELOCATION *)((const char *)rel + rel_size);
> + while (rel < end - 1 && rel->SizeOfBlock) {
> + const uint16_t *relocs = (const uint16_t *)(rel + 1);
> + i = (rel->SizeOfBlock - sizeof(*rel)) / sizeof(uint16_t);
> + while (i--) {
> + uint16_t offset = (*relocs & 0xfff) +
> +   rel->VirtualAddress;
> + int type = *relocs >> 12;
> + unsigned long delta = (unsigned long)efi_reloc;
> + uint64_t *x64 = efi_reloc + offset;
> + uint32_t *x32 = efi_reloc + offset;
> + uint16_t *x16 = efi_reloc + offset;
> +
> + switch (type) {
> + case IMAGE_REL_BASED_ABSOLUTE:
> + break;
> + case IMAGE_REL_BASED_HIGH:
> + *x16 += ((uint32_t)delta) >> 16;
> + break;
> + case IMAGE_REL_BASED_LOW:
> + *x16 += (uint16_t)delta;
> + break;
> + case IMAGE_REL_BASED_HIGHLOW:
> + *x32 += (uint32_t)delta;
> + break;
> + case IMAGE_REL_BASED_DIR64:
> + *x64 += (uint64_t)delta;
> + break;
> + default:
> + printf("Unknown Relocation off %x type %x\n",
> +offset, type);
> + }
> + relocs++;
> + }
> + rel = (const IMAGE_BASE_RELOCATION *)relocs;
> + }
> +}

[...]

> + /* Load sections into RAM */
> + for (i = num_sections - 1; i >= 0; i--) {
> + IMAGE_SECTION_HEADER *sec = &sections[i];
> + memset(efi_reloc + sec->VirtualAddress, 0,
> +sec->Misc.VirtualSize);
> + memcpy(efi_reloc + sec->VirtualAddress,
> +efi + sec->PointerToRawData,
> +sec->SizeOfRawData);
> + }
> +
> + /* Run through relocations */
> + efi_loader_relocate(rel, rel_size, efi_reloc);
> +
> + /* Flush cache */
> + flush_cache((ulong)efi_reloc, virt_size);

Where's the I-cache maintenance for the image performed? I can't see it
here and I didn't spot it in later patches.

Given that speculative instruction fetches can happen at any time for
anything not marked NX, there may already be stale entries in the
I-caches.

Also, flush_cache seems to perform DC CIVAC in a loop, which is
excessively expensive. To make the instructions visible to instruction
fetches you only need DC CVAU (i.e. clean by VA to the PoU), and you
only need to do that for executable sections.
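
The usual sequence for making newly written instructions visible is sketched here: clean the D-side by VA to the PoU, barrier, invalidate the I-side by VA, barrier, then an ISB. The three stubs are hypothetical stand-ins for `dc cvau`, `ic ivau` and the barriers, counting calls so the loop bounds can be checked on a host; the line sizes would come from CTR_EL0 on hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Counters so the sketch can be checked on a host. */
static unsigned long nr_dc_cvau, nr_ic_ivau, nr_dsb, nr_isb;

static void dc_cvau_stub(uintptr_t va) { (void)va; nr_dc_cvau++; }	/* dc cvau, <va> */
static void ic_ivau_stub(uintptr_t va) { (void)va; nr_ic_ivau++; }	/* ic ivau, <va> */
static void dsb_ish_stub(void) { nr_dsb++; }				/* dsb ish */
static void isb_stub(void) { nr_isb++; }				/* isb */

/*
 * Make [start, start + len) coherent for instruction fetch.
 * dline and iline are the D-cache and I-cache line sizes in bytes
 * and must be powers of two.
 */
static void sync_icache_range(uintptr_t start, size_t len,
			      size_t dline, size_t iline)
{
	uintptr_t va, end = start + len;

	if (!len)
		return;

	/* Clean data/unified caches to the Point of Unification. */
	for (va = start & ~(uintptr_t)(dline - 1); va < end; va += dline)
		dc_cvau_stub(va);
	dsb_ish_stub();

	/* Invalidate any stale instruction cache lines. */
	for (va = start & ~(uintptr_t)(iline - 1); va < end; va += iline)
		ic_ivau_stub(va);
	dsb_ish_stub();

	/* Discard anything already fetched by this CPU. */
	isb_stub();
}
```

Calling this once per executable section, rather than over the whole image, keeps the cost proportional to the code that will actually be fetched.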

Mark.

> +
> + /* Populate the loaded image interface bits */
> + loaded_image_info->image_base = efi;
> + loaded_image_info->image_size = image_size;
> +
> + return entry;
> +}
> -- 
> 2.6.2
> 


Re: [U-Boot] [PATCH 12/16] efi_loader: Add DCACHE_OFF support for arm64

2016-02-02 Thread Mark Rutland
On Tue, Feb 02, 2016 at 03:45:10AM +0100, Alexander Graf wrote:
> On arm64, boards can declare that they want to run with dcache disabled.
> 
> However, uEFI guarantees to payloads that they're running with the dcache
> enabled which on arm64 means that they can do unaligned accesses.
> 
> To not leave those systems out of the door, let's handle the unaligned traps.
> In the typical boot case, the OS will set up page tables and dcache itself
> early on anyway once it's done talking with uEFI.

This is not sufficient to emulate having caches enabled.

There are other things which operate differently with the caches on
(e.g. exclusives and/or atomics, which a compiler might generate
implicitly).

Likewise, cache-maintenance by VA (which you may require from the
I-side) implicitly hazards against cacheable accesses, but not against
non-cacheable accesses.

There are almost certainly other differences.

Due to that, I don't think this is a good approach.

Why can we not map memory using cacheable attributes in all cases?

Mark.

> Signed-off-by: Alexander Graf 
> ---
>  arch/arm/cpu/armv8/exceptions.S |  10 ++
>  arch/arm/lib/Makefile   |   3 +
>  arch/arm/lib/interrupts_64.c|  19 +++
>  arch/arm/lib/unaligned_64.c | 284 
> 
>  cmd/bootefi.c   |   5 +
>  5 files changed, 321 insertions(+)
>  create mode 100644 arch/arm/lib/unaligned_64.c
> 
> diff --git a/arch/arm/cpu/armv8/exceptions.S b/arch/arm/cpu/armv8/exceptions.S
> index 4f4f526..97101c3 100644
> --- a/arch/arm/cpu/armv8/exceptions.S
> +++ b/arch/arm/cpu/armv8/exceptions.S
> @@ -144,3 +144,13 @@ exception_exit:
>   ldp x27, x28, [sp],#16
>   ldp x29, x30, [sp],#16
>   eret
> +
> +.global read_far
> +read_far:
> + switch_el x1, 3f, 2f, 1f
> +3:   mrs x0, far_el3
> + ret
> +2:   mrs x0, far_el2
> + ret
> +1:   mrs x0, far_el1
> + ret
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index f3db7b5..ce5ed99 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -42,6 +42,9 @@ else ifdef CONFIG_ARM64
>  obj-y+= ccn504.o
>  obj-y+= gic_64.o
>  obj-y+= interrupts_64.o
> +ifeq ($(CONFIG_SYS_DCACHE_OFF),y)
> +obj-$(CONFIG_EFI_LOADER) += unaligned_64.o
> +endif
>  else
>  obj-y+= interrupts.o
>  endif
> diff --git a/arch/arm/lib/interrupts_64.c b/arch/arm/lib/interrupts_64.c
> index 7c9cfce..4aa36de 100644
> --- a/arch/arm/lib/interrupts_64.c
> +++ b/arch/arm/lib/interrupts_64.c
> @@ -81,12 +81,31 @@ void do_bad_error(struct pt_regs *pt_regs, unsigned int 
> esr)
>   panic("Resetting CPU ...\n");
>  }
>  
> +#if defined(CONFIG_EFI_LOADER) && defined(CONFIG_SYS_DCACHE_OFF)
> +int do_unaligned_data(struct pt_regs *pt_regs, unsigned int esr);
> +#else
> +static int do_unaligned_data(struct pt_regs *pt_regs, unsigned int esr)
> +{
> + return -1;
> +}
> +#endif
> +
>  /*
>   * do_sync handles the Synchronous Abort exception.
>   */
>  void do_sync(struct pt_regs *pt_regs, unsigned int esr)
>  {
>   efi_restore_gd();
> +
> + /*
> +  * EFI guarantees that unaligned accesses do succeed, so while we
> +  * still need hardware access and thus are unsure whether we can
> +  * enable the dcache to have the CPU deal with them, we fix unaligned
> +  * accesses up ourselves.
> +  */
> + if (!do_unaligned_data(pt_regs, esr))
> + return;
> +
>   printf("\"Synchronous Abort\" handler, esr 0x%08x\n", esr);
>   show_regs(pt_regs);
>   panic("Resetting CPU ...\n");
> diff --git a/arch/arm/lib/unaligned_64.c b/arch/arm/lib/unaligned_64.c
> new file mode 100644
> index 000..b307b7e
> --- /dev/null
> +++ b/arch/arm/lib/unaligned_64.c
> @@ -0,0 +1,284 @@
> +/*
> + * (C) Copyright 2016
> + * Alexander Graf 
> + *
> + * SPDX-License-Identifier:  GPL-2.0+
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#define ESR_EC_MASK  0xFC00
> +#define ESR_EC_SHIFT 26
> +#define ESR_IL_MASK  0x0200
> +#define ESR_IL_SHIFT 25
> +#define ESR_ISS_MASK 0x01FF
> +#define ESR_ISS_SHIFT0
> +
> +#define EC_DATA_SL   0x25
> +
> +#define ISS_ISV_MASK 0x0100
> +#define ISS_ISV_SHIFT24
> +#define ISS_SAS_MASK 0x00C0
> +#define ISS_SAS_SHIFT22
> +#define ISS_SSE_MASK 0x0020
> +#define ISS_SSE_SHIFT21
> +#define ISS_SRT_MASK 0x000F
> +#define ISS_SRT_SHIFT16
> +#define ISS_SF_MASK  0x8000
> +#define ISS_SF_SHIFT 15
> +#define ISS_AR_MASK  0x4000
> +#define ISS_AR_SHIFT 14
> +#define ISS_EA_MASK  0x0200
> +#define ISS_EA_SHIFT 9
> +#define ISS_CM_MASK  0x0100
> +#define ISS_CM_SHIFT 8
> +#define ISS_S1PTW_MASK   0x0080
> +#define ISS_S1PTW_SHIFT  7
> +#define ISS_WNR_MASK 0x0040
> +#define ISS_WNR_SHIFT6
> +#define WNR_READ 0
> +#define WNR_WRITE1
> +#define ISS_DFSC_MASK0x003F
> +#define ISS_DFSC_SHIFT   0
> +
> +#define ISV_VAL

Re: [U-Boot] arm: ls1021a: Ensure Generic Timer disabled before jumping into the OS

2015-10-28 Thread Mark Rutland
On Tue, Oct 27, 2015 at 02:52:43AM +, Huan Wang wrote:
> Hi, Mark and Alexander,
> 
>   Do you have any comment about this patch?
>   Thanks.

The patch looks good to me. FWIW:

Acked-by: Mark Rutland 

Mark.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH v7 5/9] arm: serial: Add ability to use pre-initialized UARTs

2015-10-19 Thread Mark Rutland
On Mon, Oct 19, 2015 at 02:57:22PM +0200, Linus Walleij wrote:
> Jon & Grant especially:
> 
> On Mon, Oct 19, 2015 at 2:44 PM, Simon Glass  wrote:
> > Me
> >> I will go in and answer the comment on the DT mailing list so there is
> >> some push atleast.
> >
> > Perhaps if we could see some movement then it would provide
> > encouragement to continue. So far I cannot recall seeing a single
> > U-Boot device tree change accepted in the 4 years I've been involved.
> > That's not to say it hasn't happened, and I hope it is just a
> > reflection on my memory rather than the difficulty level.
> 
> OK this isn't working.
> 
> I think the problem is that DT bindings have traditionally been merged to
> the kernel by different subsystem maintainers. That means mailing them
> and their mailing lists and this is IMO too complex for U-Boot people
> (or other external people) to have to deal with. As subsystem
> maintainer I'm not very happy about being the one responsible either.
> 
> The MAINTAINERS entry for device tree bindings does not state a
> git tree and I've never seen any of the maintainers send a pull request for
> DT binding files. (Beat me up properly if you have, guys.) I've seen
> Grant send some at times.

From: Rob Herring
Date: Tue, 1 Sep 2015 16:20:13 -0500
Subject: [GIT PULL] DeviceTree for 4.3

https://lkml.org/lkml/2015/9/1/526

> I suggest sending U-Boot DT bindings to not only
> devicet...@vger.kernel.org
> but also, as indicated, to Jon Corbet and linux-doc.

I have no problem with patches going to an extended set of lists.

> If noone cares to comment in two weeks, Jon can merge them,
> breaking the status quo on external DT bindings.
> 
> The DT bindings maintainance has sadly been a very sad story and if
> the Linux kernel should be the canon repository for them, we
> need to find a simple way for external projects to contribute. Just
> mailing them to devicetree@vger obviously stands the risk of just
> ending up in the memory hole.

Rob, we should organise rotating the responsibility of picking things up.

I'd been meaning to organise something official previously, and this is
a good kick to do so.

Thanks,
Mark.


Re: [U-Boot] armv7 DMA and cache mangement functions

2015-08-27 Thread Mark Rutland
On Mon, Aug 24, 2015 at 08:54:17AM +0100, Markus Niebel wrote:
> Hello,
> 
> I'm not an expert in the low-level details of this area, so please excuse any
> wrong assumptions in this post.
> 
> Hardware: i.MX6 Solo (TQMa6 on custom mainboard)
> U-Boot: 2014.10
> gcc: 4.8.3
> 
> We see an error using TFTP on i.MX6 that seems to be triggered if the code /
> data size goes over a limit. The code changes have nothing to do with the
> network stack, network drivers, or memory management. TFTP becomes completely
> unusable: the device frequently sees erroneous packets with a variety of
> weird errors. If the code stays below this size, all works fine.
> 
> Up to now we have checked a lot of things. The following brought us to the
> assumption that this could be cache related:
> 
> dynamically disable data cache before doing TFTP: TFTP works well again
> running with disabled L2 cache (data cache enabled): TFTP works well again
> 
> Looking at the code in drivers/net/fec_mxc.c, function fec_recv, we see a call
> to invalidate_dcache_range before accessing the received ethernet data. When
> looking at the code for invalidate_dcache_range in
> arch/arm/cpu/armv7/cache_v7.c and comparing how things are done in Linux and
> barebox, we noticed that the order of L2 cache / data cache invalidation is
> just swapped there. Applying this to the receive code for fec_mxc, TFTP
> works again.
> 
> Question: is the order of cache invalidation important?

The order is important.

Consider the case where both the external and architected caches contain
stale (but clean) cache lines for the region you care about.

If you invalidate the architected caches before the external L2, the
architected caches may speculatively fetch (stale) data from the L2
before the L2 has been invalidated, and so in the end you may still see stale
data in the architected caches.

If you invalidate the L2 first, the architected caches could
speculatively fetch from the L2 (stale) or memory (new) while this is in
progress, but they will then be invalidated, and from then on can only
fetch the new data.

That assumes that both levels were clean to begin with. If they are not,
then additional maintenance is required. It's also conceivable that
caches could be implemented such that the above is insufficient, YMMV.
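
To make the ordering argument concrete, here is a toy model of a single cache
line held in memory, an external L2 and an architected (inner) cache. It is
only a sketch: the structures and function names are made up for illustration
and have nothing to do with the real cache_v7.c code, and the one "speculative
fetch" is placed by hand at the worst possible moment.

```c
#include <stdbool.h>

/* One cache line as seen by one cache level (illustrative model only). */
struct line {
	bool valid;
	int data;
};

struct sys {
	int mem;        /* what a DMA master last wrote to memory */
	struct line l2; /* external (outer) L2 cache */
	struct line l1; /* architected (inner) cache */
};

/* A demand or speculative fetch: fill the inner cache from L2, else memory. */
static void l1_fill(struct sys *s)
{
	if (s->l1.valid)
		return;
	if (!s->l2.valid) {
		s->l2.valid = true;
		s->l2.data = s->mem;
	}
	s->l1.valid = true;
	s->l1.data = s->l2.data;
}

static int cpu_read(struct sys *s)
{
	l1_fill(s);
	return s->l1.data;
}

/* DMA has written 2 to memory; both caches hold a clean but stale 1. */
static struct sys stale_system(void)
{
	struct sys s = { .mem = 2, .l2 = { true, 1 }, .l1 = { true, 1 } };

	return s;
}

/* Inner caches first, then L2: a speculative fetch in between refills
 * the inner cache from the still-stale L2. */
int read_after_inner_then_outer(void)
{
	struct sys s = stale_system();

	s.l1.valid = false; /* invalidate architected cache */
	l1_fill(&s);        /* speculative fetch hits stale L2 */
	s.l2.valid = false; /* invalidate L2 (too late) */
	return cpu_read(&s);
}

/* L2 first, then inner caches: whatever was fetched in between is
 * thrown away by the final inner invalidate. */
int read_after_outer_then_inner(void)
{
	struct sys s = stale_system();

	s.l2.valid = false; /* invalidate L2 */
	l1_fill(&s);        /* speculative fetch (no-op here, line still valid) */
	s.l1.valid = false; /* invalidate architected cache */
	return cpu_read(&s);
}
```

In this model the first function reads the stale value and the second reads
the new one, matching the two cases described above. As noted, this assumes
both levels were clean to begin with.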

Thanks,
Mark.


Re: [U-Boot] [PATCH] arm: ls1021a: Ensure LS1021 ARM Generic Timer CompareValue Set 64-bit

2015-07-17 Thread Mark Rutland
On Fri, Jul 17, 2015 at 11:01:01AM +0100, Huan Wang wrote:
> Hi, Mark,
> 
> > > On Wed, Jul 15, 2015 at 08:13:05AM +0100, Alison Wang wrote:
> > > > This patch addresses a problem mentioned recently on this mailing
> > > list:
> > > > [1].
> > > >
> > > > In that posting a LS1021 based system was locking up at about 5
> > > > minutes after boot, but the problem was mysteriously related to the
> > > > toolchain used for building u-boot.  Debugging the problem reveals
> > a
> > > > stuck interrupt 29 on the GIC.
> > > >
> > > > It appears Freescale's LS1021 support in u-boot erroneously sets
> > the
> > > > 64-bit ARM generic PL1 physical time CompareValue register to all-
> > > ones
> > > > with a 32-bit value.  This causes the timer compare to fire 344
> > > > seconds after u-boot configures it.  Depending on how fast u-boot
> > > gets
> > > > the kernel booted, this amounts to about 5-minutes of Linux uptime
> > > > before locking up.
> > >
> > > If as in [2] this is an attempt to not generate interrupts that Linux
> > > doesn't expect, it would be far better to simply disable the timer
> > > interrupt before leaving U-Boot, ensuring that unexpected interrupts
> > > are never generated regardless of the width or rate of the counter.
> > >
> > > There are bits in CNTP_CTL to do this.
> > [Alison Wang] Yes, your idea is far better.
> [Alison Wang] If the CompareValue register is not written, is there any
> unexpected interrupt?

If you don't write to CNTP_CVAL, you have the exact same problem. The
only difference is that CNTP_CVAL contains an UNKNOWN value, and so the
interrupt could trigger at any point in time.

> How about removing the following code?
> 
> /* Set PL1 Physical Comp Value */
> val = TIMER_COMP_VAL;
> asm("mcrr p15, 2, %Q0, %R0, c14" : : "r" (val));

To stop the interrupt from firing at all you can clear CNTP_CTL.ENABLE,
which will disable the comparator. You could instead set CNTP_CTL.IMASK,
but I think clearing ENABLE is preferable because you might also save
power.
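
As a rough sketch of the two options (not a tested patch): the bit positions
below are the architectural CNTP_CTL layout, and the coprocessor encoding is
the one the quoted timer.c already uses for CNTP_CTL. The helper names,
including arch_timer_disable_before_boot(), are made up for illustration.

```c
#include <stdint.h>

/* CNTP_CTL bit layout from the ARM Generic Timer architecture. */
#define CNTP_CTL_ENABLE  (1u << 0) /* timer enabled */
#define CNTP_CTL_IMASK   (1u << 1) /* timer interrupt masked */
#define CNTP_CTL_ISTATUS (1u << 2) /* read-only: timer condition met */

/* Preferred: switch the comparator off entirely. */
static inline uint32_t cntp_ctl_disabled(uint32_t ctl)
{
	return ctl & ~CNTP_CTL_ENABLE;
}

/* Alternative: leave the timer running but mask its interrupt. */
static inline uint32_t cntp_ctl_masked(uint32_t ctl)
{
	return ctl | CNTP_CTL_IMASK;
}

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/*
 * Hypothetical helper, to be called at PL1 just before jumping to the OS.
 * Only built for ARMv7 targets; it is not meant to run in user space.
 */
static inline void arch_timer_disable_before_boot(void)
{
	uint32_t ctl;

	__asm__ __volatile__("mrc p15, 0, %0, c14, c2, 1" : "=r" (ctl));
	ctl = cntp_ctl_disabled(ctl);
	__asm__ __volatile__("mcr p15, 0, %0, c14, c2, 1" : : "r" (ctl));
	__asm__ __volatile__("isb" : : : "memory");
}
#endif
```

With ENABLE clear it no longer matters what CNTP_CVAL contains, so the
CompareValue write can go away as well.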

Thanks,
Mark.

> 
> > > Thanks,
> > > Mark.
> > >
> > > [2] http://lists.denx.de/pipermail/u-boot/2015-July/218937.html
> > > [3] http://lists.denx.de/pipermail/u-boot/2015-July/218979.html
> > >
> > > > Apparently the bug is masked by some toolchains.  Perhaps this is
> > > > explained by default compiler options, word sizes, or binutils
> > > versions.
> > > > At any rate this patch makes the manipulation explicitly 64-bit
> > > > which alleviates the issue.
> > > >
> > > > [1]
> > > > https://lists.yoctoproject.org/pipermail/meta-freescale/2015-
> > > June/0144
> > > > 00.html
> > > > Signed-off-by: Chris Kilgour 
> > > > Signed-off-by: Alison Wang 
> > > > ---
> > > >  arch/arm/cpu/armv7/ls102xa/timer.c| 3 ++-
> > > >  arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h | 2 +-
> > > >  2 files changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/cpu/armv7/ls102xa/timer.c
> > > > b/arch/arm/cpu/armv7/ls102xa/timer.c
> > > > index 11b17b2..e6a32ca 100644
> > > > --- a/arch/arm/cpu/armv7/ls102xa/timer.c
> > > > +++ b/arch/arm/cpu/armv7/ls102xa/timer.c
> > > > @@ -56,7 +56,8 @@ static inline unsigned long long
> > > us_to_tick(unsigned
> > > > long long usec)  int timer_init(void)  {
> > > > struct sctr_regs *sctr = (struct sctr_regs *)SCTR_BASE_ADDR;
> > > > -   unsigned long ctrl, val, freq;
> > > > +   unsigned long ctrl, freq;
> > > > +   unsigned long long val;
> > > >
> > > > /* Enable System Counter */
> > > > writel(SYS_COUNTER_CTRL_ENABLE, &sctr->cntcr); diff --git
> > > > a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> > > > b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> > > > index ee547fb..34854da 100644
> > > > --- a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> > > > +++ b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> > > > @@ -31,7 +31,7 @@
> > > > >  #define RCWSR4_SRDS1_PRTCL_SHIFT 24
> > > > >  #define RCWSR4_SRDS1_PRTCL_MASK  0xff00
> > > > >
> > > > > -#define TIMER_COMP_VAL 0x
> > > > > +#define TIMER_COMP_VAL 0xull
> > > > >  #define ARCH_TIMER_CTRL_ENABLE (1 << 0)
> > > > >  #define SYS_COUNTER_CTRL_ENABLE (1 << 24)
> > > >
> > > > --
> > > > 2.1.0.27.g96db324
> > > >
> > > > ___
> > > > U-Boot mailing list
> > > > U-Boot@lists.denx.de
> > > > http://lists.denx.de/mailman/listinfo/u-boot
> > > >
> 


Re: [U-Boot] [PATCH] arm: ls1021a: Ensure LS1021 ARM Generic Timer CompareValue Set 64-bit

2015-07-15 Thread Mark Rutland
Hi,

Isn't this the same patch as a couple of days ago [2], which I replied
to [3]?

On Wed, Jul 15, 2015 at 08:13:05AM +0100, Alison Wang wrote:
> This patch addresses a problem mentioned recently on this mailing list:
> [1].
> 
> In that posting a LS1021 based system was locking up at about 5 minutes
> after boot, but the problem was mysteriously related to the toolchain
> used for building u-boot.  Debugging the problem reveals a stuck
> interrupt 29 on the GIC.
> 
> It appears Freescale's LS1021 support in u-boot erroneously sets the
> 64-bit ARM generic PL1 physical time CompareValue register to all-ones
> with a 32-bit value.  This causes the timer compare to fire 344 seconds
> after u-boot configures it.  Depending on how fast u-boot gets the
> kernel booted, this amounts to about 5-minutes of Linux uptime before
> locking up.

If as in [2] this is an attempt to not generate interrupts that Linux
doesn't expect, it would be far better to simply disable the timer
interrupt before leaving U-Boot, ensuring that unexpected interrupts are
never generated regardless of the width or rate of the counter.

There are bits in CNTP_CTL to do this.

Thanks,
Mark.

[2] http://lists.denx.de/pipermail/u-boot/2015-July/218937.html
[3] http://lists.denx.de/pipermail/u-boot/2015-July/218979.html

> Apparently the bug is masked by some toolchains.  Perhaps this is
> explained by default compiler options, word sizes, or binutils versions.
> At any rate this patch makes the manipulation explicitly 64-bit which
> alleviates the issue.
> 
> [1]
> https://lists.yoctoproject.org/pipermail/meta-freescale/2015-June/014400.html
> Signed-off-by: Chris Kilgour 
> Signed-off-by: Alison Wang 
> ---
>  arch/arm/cpu/armv7/ls102xa/timer.c| 3 ++-
>  arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/ls102xa/timer.c b/arch/arm/cpu/armv7/ls102xa/timer.c
> index 11b17b2..e6a32ca 100644
> --- a/arch/arm/cpu/armv7/ls102xa/timer.c
> +++ b/arch/arm/cpu/armv7/ls102xa/timer.c
> @@ -56,7 +56,8 @@ static inline unsigned long long us_to_tick(unsigned long long usec)
>  int timer_init(void)
>  {
>   struct sctr_regs *sctr = (struct sctr_regs *)SCTR_BASE_ADDR;
> - unsigned long ctrl, val, freq;
> + unsigned long ctrl, freq;
> + unsigned long long val;
>  
>   /* Enable System Counter */
>   writel(SYS_COUNTER_CTRL_ENABLE, &sctr->cntcr);
> diff --git a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> index ee547fb..34854da 100644
> --- a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> +++ b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> @@ -31,7 +31,7 @@
>  #define RCWSR4_SRDS1_PRTCL_SHIFT 24
>  #define RCWSR4_SRDS1_PRTCL_MASK  0xff00
>  
> -#define TIMER_COMP_VAL   0x
> +#define TIMER_COMP_VAL   0xull
>  #define ARCH_TIMER_CTRL_ENABLE   (1 << 0)
>  #define SYS_COUNTER_CTRL_ENABLE  (1 << 24)
>  
> -- 
> 2.1.0.27.g96db324
> 


Re: [U-Boot] [PATCH 1/3] dm: dts: ls2085a: Bring in ls2085a dts files from linux kernel

2015-07-13 Thread Mark Rutland
On Wed, Jul 08, 2015 at 08:31:47AM +0100, Sharma Bhupesh wrote:
> > -Original Message-
> > From: U-Boot [mailto:u-boot-boun...@lists.denx.de] On Behalf Of Wang
> > Haikun
> > On 7/8/2015 3:13 PM, Bin Meng wrote:
> > > Hi,
> > >
> > > On Wed, Jul 8, 2015 at 2:51 PM, Wang Haikun 
> > wrote:
> > >> On 6/26/2015 7:53 PM, Haikun Wang wrote:
> > >>> From: Haikun Wang 
> > >>>
> > >>> Bring in required device tree files for ls2085a from Linux.
> > >>> These are initially unchanged and have a number of pieces not needed
> > by U-Boot.
> > >> Hi Simon,
> > >>
> > >> I got below comment when review this patch internal.
> > >> Please help me confirm.
> > >>
> > >> "For new platforms like ARM64, it was discussed to not duplicate the
> > >> DTS in u-boot and Linux, simply because that will break compatibility
> > >> with other bootloaders like Linaro's BootMonitor and UEFI bootloader,
> > >> which do not place the DTS in the bootloader. Also in near future,
> > >> with DTS being replaced by ACPI gradually for ARM64 platforms, it was
> > >> discussed that in a longer run it would be beneficial to move DTS out
> > >> of both u-boot and Linux and maintain it as a separate tree."
> > >>
> > >
> > > I think UEFI + ACPI is only required for ARMv8 servers, not for all
> > > ARMv8 processors. Is ls2085a a processor targeting the server market?
> > No, at least it's not our major market.
> > I want to know whether we have made a conclusion that u-boot will not add
> > Arm64 dts files?
> 
> Adding Russell and Mark for their thoughts.
> 
> AFAIK there were discussions to generate common DTS files for PPC and ARM
> platforms, where it was discussed that in the longer run it would be
> beneficial to move DTS out of both u-boot and Linux and maintain it as a
> separate tree.

While I would like to see dts moved out of the kernel, I don't see this
happening in the short term.

I'm not sure what the best strategy is w.r.t. U-Boot and dts.

Thanks,
Mark.


Re: [U-Boot] [PATCH] Fix LS102xa timer setup to use 64-bit.

2015-07-13 Thread Mark Rutland
On Sun, Jul 12, 2015 at 05:05:34AM +0100, Christopher Kilgour wrote:
> Fix LS102xa timer configuration to ensure timer compare value is set to
> all-ones as a 64-bit number rather than a 32-bit number.
> 
> When the 32-bit all-ones was used, this could result in a timer compare value
> of 2^32-1, which at 12.5 MHz will fire in ~344 seconds.  If the operating
> system's timer support does not expect or handle the timer compare, it can
> lock up when the timer compare fires and never clears (in Linux this shows up
> as a stuck GIC 29).
> 
> It's also possible to interactively work with u-boot for longer than 344
> seconds, and have the timer compare silently fire in the background without
> impact on u-boot operation.  However, as soon as the operating system enables
> interrupts, it can lock up.  On embedded Linux without early console support,
> this can be a silent lockup without warning or diagnostic output.
> 
> It's likely Freescale wanted to set the timer compare to the largest possible
> value of all-ones at 64-bits.  Rather than 344 seconds, this would fire after
> ~47k years.  Even though Linux systems are known for long uptimes, one assumes
> this is Freescale's intended, safe value.

It would probably be better to set CNTP_CTL.IMASK, or clear
CNTP_CTL.ENABLE. That way no interrupt will be generated regardless of
the timer frequency, even in the case of a long uptime ;)
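
For reference, the figures in the commit message can be checked with a few
lines of C. This is only an arithmetic sketch: the 12.5 MHz rate is taken from
the quoted text, LS102XA_TIMER_HZ and the helper names are made up here, and a
year is taken as 365.25 days.

```c
#include <stdint.h>

/* Counter frequency quoted in the commit message (12.5 MHz). */
#define LS102XA_TIMER_HZ 12500000ull

/* Seconds until a counter starting at zero reaches the compare value. */
static inline double seconds_until_compare(uint64_t cval, uint64_t hz)
{
	return (double)cval / (double)hz;
}

/* Convert seconds to years, using 365.25-day years. */
static inline double seconds_to_years(double seconds)
{
	return seconds / (365.25 * 24.0 * 3600.0);
}
```

A 32-bit all-ones compare value gives roughly 343.6 seconds, and a 64-bit
all-ones value gives a little under 47,000 years, which agrees with the
numbers above. Either way the result depends on the counter frequency, which
is why disabling or masking the timer is the more robust fix.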

Thanks,
Mark.

> 
> Signed-off-by: Christopher Kilgour 
> Cc: York Sun 
> ---
>  arch/arm/cpu/armv7/ls102xa/timer.c| 7 ---
>  arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h | 2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/ls102xa/timer.c b/arch/arm/cpu/armv7/ls102xa/timer.c
> index 11b17b2..746bfc0 100644
> --- a/arch/arm/cpu/armv7/ls102xa/timer.c
> +++ b/arch/arm/cpu/armv7/ls102xa/timer.c
> @@ -56,7 +56,8 @@ static inline unsigned long long us_to_tick(unsigned long long usec)
>  int timer_init(void)
>  {
>   struct sctr_regs *sctr = (struct sctr_regs *)SCTR_BASE_ADDR;
> - unsigned long ctrl, val, freq;
> + unsigned long ctrl, freq;
> + unsigned long long val64;
>  
>   /* Enable System Counter */
>   writel(SYS_COUNTER_CTRL_ENABLE, &sctr->cntcr);
> @@ -69,8 +70,8 @@ int timer_init(void)
>   asm("mcr p15, 0, %0, c14, c2, 1" : : "r" (ctrl));
>  
>   /* Set PL1 Physical Comp Value */
> - val = TIMER_COMP_VAL;
> - asm("mcrr p15, 2, %Q0, %R0, c14" : : "r" (val));
> + val64 = TIMER_COMP_VAL;
> + asm("mcrr p15, 2, %Q0, %R0, c14" : : "r" (val64));
>  
>   gd->arch.tbl = 0;
>   gd->arch.tbu = 0;
> diff --git a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> index ee547fb..1f55655 100644
> --- a/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> +++ b/arch/arm/include/asm/arch-ls102xa/immap_ls102xa.h
> @@ -31,7 +31,7 @@
>  #define RCWSR4_SRDS1_PRTCL_SHIFT 24
>  #define RCWSR4_SRDS1_PRTCL_MASK  0xff00
>  
> -#define TIMER_COMP_VAL   0x
> +#define TIMER_COMP_VAL   ((unsigned long long)(-1))
>  #define ARCH_TIMER_CTRL_ENABLE   (1 << 0)
>  #define SYS_COUNTER_CTRL_ENABLE  (1 << 24)
>  
> -- 
> 2.1.0
> 


Re: [U-Boot] [PATCH v3] armv8: caches: Added routine to set non cacheable region

2015-06-11 Thread Mark Rutland
On Thu, Jun 11, 2015 at 08:17:15AM +0100, Siva Durga Prasad Paladugu wrote:
> 
> Hi Mark,
> 
> > -Original Message-
> > From: Mark Rutland [mailto:mark.rutl...@arm.com]
> > Sent: Thursday, May 28, 2015 3:10 PM
> > To: Siva Durga Prasad Paladugu
> > Cc: u-boot@lists.denx.de; Michal Simek; Siva Durga Prasad Paladugu
> > Subject: Re: [PATCH v3] armv8: caches: Added routine to set non cacheable
> > region
> >
> > Hi,
> >
> > > +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> > > +enum dcache_option option)
> > > +{
> > > +   u64 *page_table = arch_get_page_table();
> > > +   u64 upto, end;
> > > +
> > > +   if (page_table == NULL)
> > > +   return;
> > > +
> > > +   end = ALIGN(start + size, (1 << MMU_SECTION_SHIFT)) >>
> > > + MMU_SECTION_SHIFT;
> > > +   start = start >> MMU_SECTION_SHIFT;
> > > +   for (upto = start; upto < end; upto++) {
> > > +   page_table[upto] &= ~PMD_ATTRINDX_MASK;
> > > +   page_table[upto] |= PMD_ATTRINDX(option);
> > > +   }
> >
> > These writes might not be visible to the page table walkers immediately, and
> > the TLBs might still contain stale values for a while afterwards.
> > That could render the cache maintenance useless (as speculative fetches
> > could still occur due to cacheable attributes still being in place).
> >
> > You need a DSB to ensure writes are visible to the page table walkers (with
> > a compiler barrier to ensure that the writes actually occur before the DSB),
> > and some TLB maintenance (complete with another DSB) to ensure that the TLBs
> > don't contain stale values by the time we get to the cache maintenance
> > afterwards.
> The flush_dcache_range() below contains a DSB. Isn't that sufficient?
> Or do we need a separate DSB in the for loop after we change the cache
> attribute?

The DSB in flush_dcache_range() is not sufficient. You need a DSB
between the page table modifications and the TLB invalidation, and the
TLB invalidation must be completed before the cache maintenance begins.

> Regarding the TLB maintenance if we have _asm_invalidate_tlb_all()
> after the flush dcache range below it should be fine right?

No. The TLB maintenance must be complete _before_ the cache maintenance,
or the cache can be refilled while the maintenance is ongoing (e.g. the
CPU could make speculative prefetches).

You need a strictly-ordered sequence:

1) Modify the page tables

2) DSB

   This ensures the updates are visible to the page table walker(s).

3) TLB invalidation

4) DSB

   This ensures that the TLB invalidation is complete (i.e. from this
   point on the TLBs cannot hold entries for the region with cacheable
   attributes).

5) ISB

   This ensures that the effects of TLB invalidation are visible to
   later instructions. Otherwise instructions later could be using stale
   attributes fetched earlier by the CPU from the TLB, before the TLB
   invalidation completed (and hence could allocate in the caches).

6) Cache maintenance 

7) DSB to complete cache maintenance
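
The sequence above could be sketched as follows. This is not U-Boot code: the
struct mmu_ops hooks are hypothetical, and on real hardware they would wrap
the page table update, DSB, the TLB invalidate (e.g. something like
__asm_invalidate_tlb_all()), ISB and the cache maintenance routine. The trace
hooks only record the order so that it can be checked on a host.

```c
#include <stddef.h>

/* Hypothetical hooks; real versions would issue the actual operations. */
struct mmu_ops {
	void (*write_page_tables)(void *ctx);
	void (*dsb)(void *ctx);
	void (*tlb_invalidate)(void *ctx);
	void (*isb)(void *ctx);
	void (*cache_maintenance)(void *ctx);
};

/* Change the attributes of a region using the strictly-ordered sequence. */
static void change_region_attributes(const struct mmu_ops *ops, void *ctx)
{
	ops->write_page_tables(ctx); /* 1) modify the page tables */
	ops->dsb(ctx);               /* 2) updates visible to the walkers */
	ops->tlb_invalidate(ctx);    /* 3) drop stale translations */
	ops->dsb(ctx);               /* 4) TLB invalidation complete */
	ops->isb(ctx);               /* 5) later instructions see the effects */
	ops->cache_maintenance(ctx); /* 6) clean/invalidate the region */
	ops->dsb(ctx);               /* 7) cache maintenance complete */
}

/* Host-side trace implementation, for checking the order only. */
struct trace {
	char log[16];
	size_t n;
};

static void trace_put(void *ctx, char c)
{
	struct trace *t = ctx;

	if (t->n + 1 < sizeof(t->log))
		t->log[t->n++] = c;
}

static void trace_pt(void *ctx)    { trace_put(ctx, 'P'); }
static void trace_dsb(void *ctx)   { trace_put(ctx, 'D'); }
static void trace_tlbi(void *ctx)  { trace_put(ctx, 'T'); }
static void trace_isb(void *ctx)   { trace_put(ctx, 'I'); }
static void trace_cache(void *ctx) { trace_put(ctx, 'C'); }

static const struct mmu_ops trace_ops = {
	.write_page_tables = trace_pt,
	.dsb               = trace_dsb,
	.tlb_invalidate    = trace_tlbi,
	.isb               = trace_isb,
	.cache_maintenance = trace_cache,
};
```

The point of routing everything through one function is that callers cannot
reorder the steps or drop one of the barriers by accident.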

Thanks,
Mark.


Re: [U-Boot] [PATCH v3] armv8: caches: Added routine to set non cacheable region

2015-05-28 Thread Mark Rutland
Hi,

> +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> +  enum dcache_option option)
> +{
> + u64 *page_table = arch_get_page_table();
> + u64 upto, end;
> +
> + if (page_table == NULL)
> + return;
> +
> + end = ALIGN(start + size, (1 << MMU_SECTION_SHIFT)) >>
> +   MMU_SECTION_SHIFT;
> + start = start >> MMU_SECTION_SHIFT;
> + for (upto = start; upto < end; upto++) {
> + page_table[upto] &= ~PMD_ATTRINDX_MASK;
> + page_table[upto] |= PMD_ATTRINDX(option);
> + }

These writes might not be visible to the page table walkers immediately,
and the TLBs might still contain stale values for a while afterwards.
That could render the cache maintenance useless (as speculative fetches
could still occur due to cacheable attributes still being in place).

You need a DSB to ensure writes are visible to the page table walkers
(with a compiler barrier to ensure that the writes actually occur before
the DSB), and some TLB maintenance (complete with another DSB) to ensure
that the TLBs don't contain stale values by the time we get to the cache
maintenance afterwards.
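
Something along these lines is what I mean for the first half (the descriptor
update plus barrier). It is a sketch only: AttrIndx is descriptor bits [4:2],
but the macro and function names here are invented, the table is assumed to be
a flat array of block descriptors, and the TLB maintenance that must follow is
left out.

```c
#include <stddef.h>
#include <stdint.h>

/* AttrIndx occupies descriptor bits [4:2]; names are illustrative. */
#define ATTRINDX_SHIFT 2
#define ATTRINDX_MASK  (7ull << ATTRINDX_SHIFT)

/* Make the preceding table writes happen, and be visible, before we go on. */
static inline void pt_write_barrier(void)
{
#if defined(__aarch64__)
	/* DSB also acts as a compiler barrier via the "memory" clobber. */
	__asm__ __volatile__("dsb sy" : : : "memory");
#else
	/* Host build: compiler barrier only. */
	__asm__ __volatile__("" : : : "memory");
#endif
}

/* Set AttrIndx on descriptors [first, end) and then issue the barrier. */
static void set_attrindx_range(uint64_t *table, size_t first, size_t end,
			       unsigned int idx)
{
	size_t i;

	for (i = first; i < end; i++)
		table[i] = (table[i] & ~ATTRINDX_MASK) |
			   (((uint64_t)idx << ATTRINDX_SHIFT) & ATTRINDX_MASK);

	pt_write_barrier();
}
```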

Also minor nit, but s/upto/i/?

> +
> + start = start << MMU_SECTION_SHIFT;
> + end = end << MMU_SECTION_SHIFT;
> + flush_dcache_range(start, end);
> +}
>  #else	/* CONFIG_SYS_DCACHE_OFF */
>  
>  void invalidate_dcache_all(void)
> @@ -170,6 +197,11 @@ int dcache_status(void)
>   return 0;
>  }
>  
> +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> +  enum dcache_option option)
> +{
> +}
> +
>  #endif   /* CONFIG_SYS_DCACHE_OFF */
>  
>  #ifndef CONFIG_SYS_ICACHE_OFF
> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> index 760e8ab..868ea54 100644
> --- a/arch/arm/include/asm/system.h
> +++ b/arch/arm/include/asm/system.h
> @@ -15,9 +15,15 @@
>  #define CR_EE	(1 << 25)	/* Exception (Big) Endian */
>  
>  #define PGTABLE_SIZE (0x1)
> +/* 2MB granularity */
> +#define MMU_SECTION_SHIFT	21

Do we only expect 4K pages for now?

Thanks,
Mark.


Re: [U-Boot] [PATCH v2] armv8: caches: Added routine to set non cacheable region

2015-05-13 Thread Mark Rutland
On Tue, May 12, 2015 at 04:46:49AM +0100, Siva Durga Prasad Paladugu wrote:
> Hi Mark,
> 
> 
> > -Original Message-
> > From: Mark Rutland [mailto:mark.rutl...@arm.com]
> > Sent: Wednesday, April 29, 2015 10:00 PM
> > To: Michal Simek
> > Cc: u-boot@lists.denx.de; Albert Aribaud; Marek Vasut; Tom Rini; Siva Durga
> > Prasad Paladugu; Varun Sethi; Thierry Reding; Arnab Basu; York Sun
> > Subject: Re: [U-Boot] [PATCH v2] armv8: caches: Added routine to set non
> > cacheable region
> > 
> > Hi Michal,
> > 
> > On Wed, Apr 29, 2015 at 09:35:35AM +0100, Michal Simek wrote:
> > > Added routine mmu_set_region_dcache_behaviour() to set a particular
> > > region as non cacheable.
> > 
> > What's the intended use of this?
> This is intended to mark a dynamically allocated region as non-cacheable
> at runtime.

Sure, but why does that region need to be non-cacheable?

I assume you want to give the region to some device?

Do you ever hand the memory back (and hence need to make it cacheable
again)?

> There is the same kind of routine for armv7 but not for armv8. Do you think
> the same functionality should be provided for armv8 too?
> 
> As per the comment below, you are correct, this looks to be more
> board-specific.

While the address of the tables might be board-specific I'd imagine
that the manipulation routines can be shared.

Thanks,
Mark.


Re: [U-Boot] [PATCH] sunxi: display: Align end of memory to work around a linux-4.0 bug

2015-05-12 Thread Mark Rutland
On Tue, May 05, 2015 at 10:39:14AM +0100, Mark Rutland wrote:
> On Mon, May 04, 2015 at 10:36:43AM +0100, Ian Campbell wrote:
> > On Mon, 2015-05-04 at 10:51 +0200, Hans de Goede wrote:
> > > Hi,
> > > 
> > > On 02-05-15 15:21, Ian Campbell wrote:
> > > > On Fri, 2015-04-24 at 20:39 +0200, Hans de Goede wrote:
> > > >> Linux-4.0 as shipped has a bug causing it to not boot if the end of 
> > > >> memory
> > > >> is not aligned to a multiple of 2 MiB. For details see the linux-arm
> > > >> mailing list post titled:
> > > >> "Memory size unaligned to section boundary"
> > > >> http://www.spinics.net/lists/arm-kernel/msg413811.html
> > > >>
> > > >> This is something which specifically hits the sunxi display driver 
> > > >> because
> > > >> we carve out the exact needed framebuffer size at the top of mem, this
> > > >> commit works around this issue by aligning the carve out.
> > > >
> > > > I'm afraid I don't like this, we shouldn't be working around Linux bugs
> > > > in the firmware, especially when both are Free software. Lets just fix
> > > > Linux and get the fix into the appropriate stable trees and in the
> > > > meantime tell people to avoid this buggy kernel.
> > > >
> > > > The problem with this sort of thing is that it is very hard to get rid
> > > > of these workarounds, even once the underlying issue is fixed and we no
> > > > longer care about the versions with the bug OS authors (including
> > > > non-Linux OSes) can inadvertently come to rely on the quirky behaviour,
> > > > (i.e. the work around masks other bugs). Hence we end up in a
> > > > quirks-race as everyone works around the other parties last workaround.
> > > >
> > > > If there is to be a workaround instead of a fix then it should be for
> > > > Linux to align memory to 2MB boundaries if that is what it requires.
> > > 
> > > I can understand where you're coming from, the problem is that despite
> > > various mails to the arm kernel mailing list no one from the upstream
> > > kernel seems to be looking into this,
> > 
> > Mark, do you think you could find some cycles (not necessarily your own)
> > to look at this, or perhaps you know the appropriate maintainers to
> > ping?
> 
> I'll have another look and see if I can come up with a kernel patch.
> Perhaps proposing something (even if slightly wrong) will provoke people
> to respond.

For the benefit of anyone not on the Linux ARM kernel list there's now a
patch addressing the issue [1].

Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/342210.html


Re: [U-Boot] [PATCH] sunxi: display: Align end of memory to work around a linux-4.0 bug

2015-05-05 Thread Mark Rutland
On Mon, May 04, 2015 at 10:36:43AM +0100, Ian Campbell wrote:
> On Mon, 2015-05-04 at 10:51 +0200, Hans de Goede wrote:
> > Hi,
> > 
> > On 02-05-15 15:21, Ian Campbell wrote:
> > > On Fri, 2015-04-24 at 20:39 +0200, Hans de Goede wrote:
> > >> Linux-4.0 as shipped has a bug causing it to not boot if the end of 
> > >> memory
> > >> is not aligned to a multiple of 2 MiB. For details see the linux-arm
> > >> mailing list post titled:
> > >> "Memory size unaligned to section boundary"
> > >> http://www.spinics.net/lists/arm-kernel/msg413811.html
> > >>
> > >> This is something which specifically hits the sunxi display driver 
> > >> because
> > >> we carve out the exact needed framebuffer size at the top of mem, this
> > >> commit works around this issue by aligning the carve out.
> > >
> > > I'm afraid I don't like this, we shouldn't be working around Linux bugs
> > > in the firmware, especially when both are Free software. Lets just fix
> > > Linux and get the fix into the appropriate stable trees and in the
> > > meantime tell people to avoid this buggy kernel.
> > >
> > > The problem with this sort of thing is that it is very hard to get rid
> > > of these workarounds, even once the underlying issue is fixed and we no
> > > longer care about the versions with the bug OS authors (including
> > > non-Linux OSes) can inadvertently come to rely on the quirky behaviour,
> > > (i.e. the work around masks other bugs). Hence we end up in a
> > > quirks-race as everyone works around the other parties last workaround.
> > >
> > > If there is to be a workaround instead of a fix then it should be for
> > > Linux to align memory to 2MB boundaries if that is what it requires.
> > 
> > I can understand where you're coming from, the problem is that despite
> > various mails to the arm kernel mailing list no one from the upstream
> > kernel seems to be looking into this,
> 
> Mark, do you think you could find some cycles (not necessarily your own)
> to look at this, or perhaps you know the appropriate maintainers to
> ping?

I'll have another look and see if I can come up with a kernel patch.
Perhaps proposing something (even if slightly wrong) will provoke people
to respond.

> I'd really like to avoid having to hack around kernel bugs in the
> firmware.

Likewise.

Mark.


Re: [U-Boot] [PATCH v2] armv8: caches: Added routine to set non cacheable region

2015-04-29 Thread Mark Rutland
Hi Michal,

On Wed, Apr 29, 2015 at 09:35:35AM +0100, Michal Simek wrote:
> Added routine mmu_set_region_dcache_behaviour() to set a
> particular region as non cacheable.

What's the intended use of this?

> Define dummy routine for mmu_set_region_dcache_behaviour()
> to handle incase of dcache off.
> 
> Signed-off-by: Siva Durga Prasad Paladugu 
> Signed-off-by: Michal Simek 
> ---
> 
> Changes in v2:
> - Fix patch subject (remove addional zzz from v1)
> - Remove armv8: caches: Disable dcache after flush patch from this
>   series based on the talk with Mark Rutland (patch is not needed
>   anymore)
> 
>  arch/arm/cpu/armv8/cache_v8.c | 23 +++
>  arch/arm/include/asm/system.h | 28 ++--
>  2 files changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> index c5ec5297cd39..25a2136a3cdf 100644
> --- a/arch/arm/cpu/armv8/cache_v8.c
> +++ b/arch/arm/cpu/armv8/cache_v8.c
> @@ -139,6 +139,24 @@ int dcache_status(void)
>   return (get_sctlr() & CR_C) != 0;
>  }
>  
> +void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
> +  enum dcache_option option)
> +{
> + /* get the level2_table0 start address */
> + u64 *page_table = (u64 *)(gd->arch.tlb_addr + 0x3000);

This looks very specific to a particular platform.

> + u64 upto, end;
> +
> + end = ALIGN(start + size, (1 << MMU_SECTION_SHIFT)) >>
> +   MMU_SECTION_SHIFT;
> + start = start >> MMU_SECTION_SHIFT;
> + for (upto = start; upto < end; upto++) {
> + page_table[upto] &= ~PMD_ATTRINDX_MASK;
> + page_table[upto] |= PMD_ATTRINDX(option);
> + }
> +
> + flush_dcache_range(page_table[start], page_table[end]);

This looks odd. Aren't these the values in the page tables (complete
with attributes), rather than (virtual) addresses?

What exactly are you trying to flush here? Depending on your TCR
settings you don't necessarily have to flush the tables themselves,
assuming they don't fall inside the region being changed?
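
A minimal sketch of the distinction, not taken from the patch: if the
tables themselves need cleaning, the range handed to the flush routine
would be the addresses of the modified entries rather than their contents.
The 2 MiB section size, the AttrIndx position (descriptor bits [4:2]) and
every sketch_* name are assumptions made for illustration; real code would
pass the returned bounds to flush_dcache_range().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 2 MiB sections, AttrIndx in descriptor bits [4:2]. */
#define SKETCH_SECTION_SHIFT	21
#define SKETCH_ATTRINDX_MASK	(UINT64_C(7) << 2)
#define SKETCH_ATTRINDX(idx)	(((uint64_t)(idx) & 7) << 2)

/* Address range (end exclusive) covering the descriptors that changed. */
struct sketch_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Rewrite the memory attribute index of every section descriptor covering
 * [start, start + size) and return the range occupied by those descriptors.
 */
static struct sketch_range sketch_set_region_attr(uint64_t *table,
						   size_t nr_entries,
						   uint64_t start,
						   uint64_t size,
						   unsigned int attr_idx)
{
	const uint64_t section = UINT64_C(1) << SKETCH_SECTION_SHIFT;
	uint64_t first = start >> SKETCH_SECTION_SHIFT;
	uint64_t last = (start + size + section - 1) >> SKETCH_SECTION_SHIFT;
	struct sketch_range r;
	uint64_t i;

	assert(last <= nr_entries);

	for (i = first; i < last; i++) {
		table[i] &= ~SKETCH_ATTRINDX_MASK;
		table[i] |= SKETCH_ATTRINDX(attr_idx);
	}

	/* The addresses of the entries, not the values stored in them. */
	r.start = (uintptr_t)&table[first];
	r.end = (uintptr_t)&table[last];
	return r;
}
```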

> + __asm_invalidate_tlb_all();

If the region was previously cacheable, you'll need to
(clean+)invalidate here to clear the PA range in the caches.

Thanks,
Mark.


Re: [U-Boot] [PATCH 07/10] sunxi: Fix end of kernel memory alignment for A33

2015-04-28 Thread Mark Rutland
Hi Hans,

> So it seems that I'm not the only one seeing this, and I've been wrongly
> blaming it on the A33, instead it seems to be a kernel bug, triggered
> on my A33 due to the display resolution it has.
> 
> For details see:
> 
> http://www.spinics.net/lists/arm-kernel/msg413811.html

That's good news; far less scary than a HW issue.

Would you mind replying on that thread to give it a bit more visibility?

Thanks,
Mark.


Re: [U-Boot] [PATCH 1/2] armv8: caches: Disable dcache after flush

2015-04-20 Thread Mark Rutland
> > > Thanks for explanation.
> > > So in that case, the flushing of the required stack or any other data
> > > which needs to be flushed should be part of board specific. Am I
> > > correct?
> >
> > It could be done in generic code, assuming we know the bounds of memory
> > which will be used, because maintenance by VA should always work.
> >
> > Do we know which memory U-Boot might use (e.g. does it all fall within some
> > static carveout?), or can it dynamically allocate from anywhere in memory?
> >
> > > If yes, then this disable_dcache() should contain a asm call to a
> > > routine() (which might be board specific) after disabling the cache to
> > > flush the required data and then flush_dcache_all() followed by flush
> > > L3 cache..
> >
> > You could probably get away with:
> >
> > * Load the memory bounds that we need to flush into some registers, or
> >   flush some datastructure containing these to memory.
> > * In assembly:
> >   - disable the MMU.
> >   - flush the PA range(s) we need to use to be able to use C safely.
> >   - flush by Set/Way to empty the CPU-local caches
> > * Implementation-specific L3 flushing for anything else.
> >
> > If we only map a small amount of memory, we could simply flush this by VA
> > (knowing that this will drain the CPU and L3 caches, without any special
> > maintenance).
> I just looked at one of the old patches from York in the link below. Can you
> look at this?
> http://lists.denx.de/pipermail/u-boot/2015-January/200514.html

I'm not sure what you're expecting me to say w.r.t. that. Is there a
particular question you have?

Thanks,
Mark.



Re: [U-Boot] [PATCH 07/10] sunxi: Fix end of kernel memory alignment for A33

2015-04-17 Thread Mark Rutland
On Thu, Apr 16, 2015 at 08:12:31PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 16-04-15 19:35, Mark Rutland wrote:
> > On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 15-04-15 21:57, Ian Campbell wrote:
> >>> On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
> >>>> For unknown reasons the A33 needs the end of the memory we report to the
> >>>> kernel to be aligned to a multiple of 4 MiB.
> >>>
> >>> Do you really mean "the A33 needs" (as in the processor itself) or do
> >>> you actually mean "the A33 kernel port"?
> >>>
> >>> If the latter then can't that be investigated/fixed instead of hacked
> >>> here? That would be far more preferable.
> >>
> >> I mean the former, it seems that the SoC itself cannot handle dram
> >> ranges with different cache policies which are not aligned to 4 MiB,
> >> at least that is my WAG what is going on here.
> >
> > That sounds incredibly suspicious.
> >
> > What do you mean w.r.t. different cache policies -- what does that have
> > to do with the end of DRAM?
> 
> We carve out a framebuffer at the end of DRAM, and then report less
> DRAM than we actually have to the kernel. This framebuffer then gets
> picked up by the kernel through simplefb, which will map it with a different
> cache policy than the normal part of the DRAM has.

I see. Thanks for the clarification.
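
A minimal sketch of the arithmetic under discussion, assuming the
framebuffer is carved out of the very top of DRAM and the end of the
memory reported to the kernel is then rounded down to the requested
alignment. The function name and the example values used with it are
hypothetical, not taken from the sunxi code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MiB(x)	((uint64_t)(x) << 20)

/*
 * End of the memory reported to the kernel: the top of DRAM minus the
 * framebuffer, rounded down to 'align' (which must be a power of two).
 */
static uint64_t sketch_kernel_mem_end(uint64_t dram_base, uint64_t dram_size,
				      uint64_t fb_size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	assert(fb_size <= dram_size);

	return (dram_base + dram_size - fb_size) & ~(align - 1);
}
```

With a 4 KiB alignment the reported end sits directly below the
framebuffer; with a 4 MiB alignment the gap between the two grows by up
to 4 MiB, which is the "bigger carve-out" effect described above.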

> > What problem do you see?
> 
> Depending on the framebuffer-size the kernel either boots or does not boot,
> when it does not boot it does nothing (I've a serial console) earlyprintk
> does not help, I was looking into setting up an early console (should be
> a matter of just putting in the right parameters) when I found out that if
> I modify the framebuffer size that fixes things.

Ok. So we don't know if the kernel is stuck somewhere or everything is
completely hosed, then?

I take it you can't get JTAG working via the SD card slot?

> After experimenting more it seems that keeping the last pixel of the
> framebuffer at the very end of DRAM is not a problem (so this does not seem
> to be a display engine problem), things start to work when I make the carve
> out at the end bigger.
> 
> On the very similar A23 giving the kernel all of the DRAM except for the
> framebuffer (aligned to a multiple of 4k) works just fine.
> 
> Sometimes I can get away with just making the carve-out bigger without
> aligning it to a multiple of 4 MiB, but an alignment to 4 MiB seems to
> always work independent of the framebuffer size.
> 
> > It would be worth reporting this on lakml.
> 
> If you still think that after the above explanation I'll start a new thread
> on lakml with contents more targeted at kernel devs.

I think it would be worthwhile. This could be one instance of an issue
in the memory system that we might hit elsewhere. Even if we don't come
to another solution, it'll at least make it visible to others.

> >> I've been using an a23 dtb + generic multi-platform kernel for my testing
> >> (as said before the a33 really is almost the same design), and that boots
> >> fine without this alignment hack on an actual A23 device, so this is not
> >> a kernel limitation.
> >
> > Not necessarily. Is RAM at the same location on both SoCs? What about
> > other devices and carveouts?
> 
> Everything is the same on both SoCs except that one has 2 Cortex A7
> cores and the new one with the problem has 4 Cortex A7 cores, and a
> new dram controller / mbus subsystem to keep the 4 cores fed.
> 
> > It could be that the stars happen to align and we're finally caught out
> > by some dodgy maths.
> 
> I don't think that that is the case here.

Yeah. The memory subsystem differences sound like the chief suspects.

Do we know if the A7s in the A23 and A33 are different revisions (and
which bits are set in their aux registers)? It could be that some
memory system feature is enabled on one but not the other, or something
like that.

Mark.


Re: [U-Boot] [PATCH 1/2] armv8: caches: Disable dcache after flush

2015-04-17 Thread Mark Rutland
> > > Now in the flush_dcache_all we are invoking the actual asm call to
> > > flush dcache which may wipe out the stored return value in stack with
> > > cache contents (main memory). Hence the return from the flush_dcache_all
> > > will fail.
> > >
> > > To confirm this I modified the dcache_disable in the below way and it
> > > worked fine.
> > > 1. Disable the dcache.
> > > 2. Now I called the __asm_flush_dcache_all(); and then flush_l3_cache();
> > > instead of calling the flush_dcache_all().
> > 
> > That also is unsafe; implicit (e.g. stack) accesses at any point after
> > SCTLR.C is cleared and before flush_l3_cache() has completed may see
> > stale data, or get overwritten by stale data.
> > 
> > Set/Way ops only flush the CPU-local caches, so you only guarantee that
> > these are clean (and potentially dirty cache lines for the stack could
> > be sat in L3 and written back at any time). So your flush_l3_cache()
> > function might not work.
> > 
> > Per ARMv8 the L3 _must_ respect maintenance by VA, so after disabling the
> > MMU you can flush the memory region corresponding to your stack (and any
> > other data you need) by VA to the PoC before executing flush_l3_cache(), in
> > addition to the Set/Way ops used to empty the CPU-local caches.
> Thanks for explanation.
> So in that case, the flushing of the required stack or any other data
> which needs to be flushed should be part of board specific. Am I
> correct?

It could be done in generic code, assuming we know the bounds of memory
which will be used, because maintenance by VA should always work.

Do we know which memory U-Boot might use (e.g. does it all fall within
some static carveout?), or can it dynamically allocate from anywhere in
memory?

> If yes, then this disable_dcache() should contain a asm call to a
> routine() (which might be board specific) after disabling the cache to
> flush the required data and then flush_dcache_all() followed by flush
> L3 cache.. 

You could probably get away with:

* Load the memory bounds that we need to flush into some registers, or
  flush some datastructure containing these to memory.
* In assembly:
  - disable the MMU.
  - flush the PA range(s) we need to use to be able to use C safely.
  - flush by Set/Way to empty the CPU-local caches
* Implementation-specific L3 flushing for anything else.

If we only map a small amount of memory, we could simply flush this by
VA (knowing that this will drain the CPU and L3 caches, without any
special maintenance).
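
The "flush the range(s)" step above boils down to walking the region one
cache line at a time. A minimal host-side sketch of that walk follows,
with the actual maintenance instruction abstracted behind a callback. All
sketch_* names are hypothetical; on AArch64 the callback would issue
something like "dc civac" on each address, and the whole sequence would
still need to live in assembly as described.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked once per cache line; on AArch64 it would issue a DC op. */
typedef void (*sketch_line_op)(uintptr_t line_addr, void *ctx);

/* Host-side stand-in that records which lines would have been maintained. */
struct sketch_line_log {
	uintptr_t first;
	uintptr_t last;
	unsigned long count;
};

static void sketch_log_line(uintptr_t line_addr, void *ctx)
{
	struct sketch_line_log *log = ctx;

	if (log->count == 0)
		log->first = line_addr;
	log->last = line_addr;
	log->count++;
}

/*
 * Apply 'op' to every cache line overlapping [start, end). 'line_size' must
 * be a power of two; real code would derive it from CTR_EL0.
 */
static void sketch_maintain_by_va(uintptr_t start, uintptr_t end,
				  uintptr_t line_size, sketch_line_op op,
				  void *ctx)
{
	uintptr_t addr;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	for (addr = start & ~(line_size - 1); addr < end; addr += line_size)
		op(addr, ctx);
}
```

The start address is rounded down so that a region beginning in the
middle of a line still has that line maintained.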

Mark.


Re: [U-Boot] [PATCH 07/10] sunxi: Fix end of kernel memory alignment for A33

2015-04-16 Thread Mark Rutland
On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
> Hi,
> 
> On 15-04-15 21:57, Ian Campbell wrote:
> > On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
> >> For unknown reasons the A33 needs the end of the memory we report to the
> >> kernel to be aligned to a multiple of 4 MiB.
> >
> > Do you really mean "the A33 needs" (as in the processor itself) or do
> > you actually mean "the A33 kernel port"?
> >
> > If the latter then can't that be investigated/fixed instead of hacked
> > here? That would be far more preferable.
> 
> I mean the former, it seems that the SoC itself cannot handle dram
> ranges with different cache policies which are not aligned to 4 MiB,
> at least that is my WAG what is going on here.

That sounds incredibly suspicious.

What do you mean w.r.t. different cache policies -- what does that have
to do with the end of DRAM? What problem do you see?

It would be worth reporting this on lakml.

> I've been using an a23 dtb + generic multi-platform kernel for my testing
> (as said before the a33 really is almost the same design), and that boots
> fine without this alignment hack on an actual A23 device, so this is not
> a kernel limitation.

Not necessarily. Is RAM at the same location on both SoCs? What about
other devices and carveouts?

It could be that the stars happen to align and we're finally caught out
by some dodgy maths.

Mark.


Re: [U-Boot] [PATCH 1/2] armv8: caches: Disable dcache after flush

2015-04-16 Thread Mark Rutland
On Thu, Apr 16, 2015 at 06:17:59AM +0100, Siva Durga Prasad Paladugu wrote:
> Hi Mark.
> 
> > -Original Message-
> > From: Mark Rutland [mailto:mark.rutl...@arm.com]
> > Sent: Wednesday, April 15, 2015 6:41 PM
> > To: Michal Simek
> > Cc: u-boot@lists.denx.de; Tom Rini; Siva Durga Prasad Paladugu; Varun Sethi;
> > Arnab Basu; York Sun
> > Subject: Re: [U-Boot] [PATCH 1/2] armv8: caches: Disable dcache after flush
> > 
> > On Wed, Apr 15, 2015 at 12:33:00PM +0100, Michal Simek wrote:
> > > From: Siva Durga Prasad Paladugu 
> > >
> > > Always disable dcache after the flush operation.
> > > The following sequence is advisable while disabling d-cache:
> > > 1. disable_dcache() - flushes and disables d-cache
> > > 2. invalidate_dcache_all() - invalidate any entry that came to the cache
> > >    in the short period after the cache was flushed but before the
> > >    cache got disabled
> > 
> > For reasons I have described previously (see [1,2,3]), this is unsafe.
> > The first cache flush may achieve nothing.
> > 
> > If you need data out at the PoC before disabling the cache, then you should
> > first use maintenance by VA to push that data out.
> > 
> > Thanks,
> > Mark.
> > 
> > [1] http://lists.denx.de/pipermail/u-boot/2015-February/204403.html
> > [2] http://lists.denx.de/pipermail/u-boot/2015-February/204407.html
> > [3] http://lists.denx.de/pipermail/u-boot/2015-February/204702.html
> > 
> > >
> > > Signed-off-by: Siva Durga Prasad Paladugu 
> > > Signed-off-by: Michal Simek 
> > > ---
> > >
> > >  arch/arm/cpu/armv8/cache_v8.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> > > index c5ec5297cd39..2a0492fbef52 100644
> > > --- a/arch/arm/cpu/armv8/cache_v8.c
> > > +++ b/arch/arm/cpu/armv8/cache_v8.c
> > > @@ -128,10 +128,10 @@ void dcache_disable(void)
> > >   if (!(sctlr & CR_C))
> > >   return;
> > >
> > > - set_sctlr(sctlr & ~(CR_C|CR_M));
> > > -
> > >   flush_dcache_all();
> > >   __asm_invalidate_tlb_all();
> > > +
> > > + set_sctlr(sctlr & ~(CR_C|CR_M));
> 
> I got your point. But here in this scenario also there is an issue
> with disable first and then flush_dcache_all().  This is because when
> we disable the cache and invoke the c routine flush_dcache_all() then
> the return address of this is stored in a stack(in memory as dcache is
> disabled).

Which is why this sequence cannot be written in C, and needs to be
performed in assembly, without any memory accesses between the write to
the SCTLR and the cache flush.

> Now in the flush_dcache_all we are invoking the actual asm call to
> flush dcache which may wipe out the stored return value in stack with
> cache contents (main memory). Hence the return from the flush_dcache_all
> will fail.
> 
> To confirm this I modified the dcache_disable in the below way and it worked
> fine.
> 1. Disable the dcache.
> 2. Now I called the __asm_flush_dcache_all(); and then flush_l3_cache();
> instead of calling the flush_dcache_all().

That also is unsafe; implicit (e.g. stack) accesses at any point after
SCTLR.C is cleared and before flush_l3_cache() has completed may see
stale data, or get overwritten by stale data.

Set/Way ops only flush the CPU-local caches, so you only guarantee that
these are clean (and potentially dirty cache lines for the stack could
be sat in L3 and written back at any time). So your flush_l3_cache()
function might not work.

Per ARMv8 the L3 _must_ respect maintenance by VA, so after disabling
the MMU you can flush the memory region corresponding to your stack (and
any other data you need) by VA to the PoC before executing
flush_l3_cache(), in addition to the Set/Way ops used to empty the
CPU-local caches.

Thanks,
Mark.


Re: [U-Boot] [PATCH 1/2] armv8: caches: Disable dcache after flush

2015-04-15 Thread Mark Rutland
On Wed, Apr 15, 2015 at 12:33:00PM +0100, Michal Simek wrote:
> From: Siva Durga Prasad Paladugu 
> 
> Always disable dcache after the flush operation.
> The following sequence is advisable while disabling d-cache:
> 1. disable_dcache() - flushes and disables d-cache
> 2. invalidate_dcache_all() - invalidate any entry that came to the cache
>    in the short period after the cache was flushed but before the
>    cache got disabled

For reasons I have described previously (see [1,2,3]), this is unsafe.
The first cache flush may achieve nothing.

If you need data out at the PoC before disabling the cache, then you
should first use maintenance by VA to push that data out.

Thanks,
Mark.

[1] http://lists.denx.de/pipermail/u-boot/2015-February/204403.html
[2] http://lists.denx.de/pipermail/u-boot/2015-February/204407.html
[3] http://lists.denx.de/pipermail/u-boot/2015-February/204702.html

> 
> Signed-off-by: Siva Durga Prasad Paladugu 
> Signed-off-by: Michal Simek 
> ---
> 
>  arch/arm/cpu/armv8/cache_v8.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> index c5ec5297cd39..2a0492fbef52 100644
> --- a/arch/arm/cpu/armv8/cache_v8.c
> +++ b/arch/arm/cpu/armv8/cache_v8.c
> @@ -128,10 +128,10 @@ void dcache_disable(void)
>   if (!(sctlr & CR_C))
>   return;
>  
> - set_sctlr(sctlr & ~(CR_C|CR_M));
> -
>   flush_dcache_all();
>   __asm_invalidate_tlb_all();
> +
> + set_sctlr(sctlr & ~(CR_C|CR_M));
>  }
>  
>  int dcache_status(void)
> -- 
> 2.3.5
> 


Re: [U-Boot] [RFC PATCH] ARM: Merge v7 and v8 outer cache operations

2015-03-30 Thread Mark Rutland
On Fri, Mar 27, 2015 at 02:11:48PM +, Albert ARIBAUD wrote:
> Hello Mark,
> 
> On Thu, 12 Feb 2015 15:56:52 +0000, Mark Rutland 
> wrote:
> > On Sat, Jan 31, 2015 at 03:08:54AM +, feng...@phytium.com.cn wrote:
> > > From: David Feng 
> > > 
> > > Armv7 and Armv8 allow outer cache exist, it is outside of the architecture
> > > defined cache hierarchy and can not be manipulated by architecture defined
> > > instructions. It's processor specific.
> > > This patch merge v7_outer_cache_* and v8 l3_cache_*.
> > 
> > > This commit message is a little misleading, though it probably makes
> > > sense to have something of this sort for ARMv8. Info dump below.

[...]

> So, does the commit message require rewriting?

Yup. My complaint was that outer caches _can_ be manipulated by
architecturally defined instructions, but only for maintenance by VA. So
it's wrong to say that they cannot be manipulated by architecturally
defined instructions.

Mark.


Re: [U-Boot] [PATCH] Vexpress64: Fix the compiling error when CONFIG_ARMV8_MULTIENTRY defined

2015-03-24 Thread Mark Rutland
> >> > +/* SMP Spin Table Definitions */
> >> > +#ifdef CONFIG_BASE_FVP
> >> > #define CPU_RELEASE_ADDR   (CONFIG_SYS_SDRAM_BASE + 0x03f0)
> >> > +#else
> >> > +#define CPU_RELEASE_ADDR   (CONFIG_SYS_SDRAM_BASE + 0x7fff0)
> >> > +#endif
> >>
> >> Where are these address defines coming from?
> >
> > It's just hard coded and should be the same value with that in DTS.
> 
> I look in the DTS from the Linux kernel:
> 
> arch/arm64/boot/dts/arm/foundation-v8.dts:
> 
> cpu@0 {
> device_type = "cpu";
> compatible = "arm,armv8";
> reg = <0x0 0x0>;
> enable-method = "spin-table";
> cpu-release-addr = <0x0 0x8000fff8>;
> next-level-cache = <&L2_0>;
> };
> cpu@1 {
> device_type = "cpu";
> compatible = "arm,armv8";
> reg = <0x0 0x1>;
> enable-method = "spin-table";
> cpu-release-addr = <0x0 0x8000fff8>;
> next-level-cache = <&L2_0>;
> };
> (...)
> 
> It's not the same address from what I can tell,
> 
> CONFIG_SYS_SDRAM_BASE + 0x03f0 = 0x83f0
> 
> but the DTS cpu-release-addr is 0x8000fff8...
> 
> Curiously we also have an ontology problem here: the DTS in
> the Linux kernel does use spin tables, but there is another set of
> DTS files in the ARM Trusted Firmware distribution, for the same
> simulator, stating PSCI as CPU release mechanism. These are
> the only ones that work properly when using ARM TF.

FWIW in the bootwrapper we inject the relevant PSCI properties into the
DTB if the bootwrapper is configured to use PSCI, and we should really
do the same for spin-table.

Given the enable-method is entirely dependent on the FW, it would be
better for FW to fill in an appropriate value (where possible), leaving
those out of the dts.

Mark.


Re: [U-Boot] 64Bit device tree compilation

2015-03-24 Thread Mark Rutland
Hi,

> Maybe a dumb question, why do we need to have a 64-bit U-Boot for
> arm64? I don't see we ever created 64-bit U-Boot for ppc64.

In ARMv8 it's not possible to change the register width at an exception
level (i.e. you can't change 64->32 or vice-versa), and lower exception
levels cannot be wider (so if your code at EL3 is 32-bit, you cannot run
64-bit code at EL3, EL2, EL1, or EL0).

Therefore you need a purely 64-bit path from EL3 to EL2 or EL1N in order
to boot a 64-bit kernel, so the bootloader needs to be 64-bit.
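
The constraint above can be written down as a small check. This is only an
illustrative sketch: it ignores unimplemented exception levels and the
Secure/Non-secure split, and all names are hypothetical.

```c
#include <stdbool.h>

enum { SKETCH_EL0, SKETCH_EL1, SKETCH_EL2, SKETCH_EL3, SKETCH_NR_ELS };

/*
 * Return true if no exception level is 64-bit while a higher level is
 * 32-bit. Checking adjacent levels is enough because the rule is transitive.
 */
static bool sketch_widths_valid(const bool is_64bit[SKETCH_NR_ELS])
{
	int el;

	for (el = SKETCH_EL3; el > SKETCH_EL0; el--) {
		if (!is_64bit[el] && is_64bit[el - 1])
			return false;
	}
	return true;
}
```

A 32-bit bootloader at EL2 with a 64-bit kernel at EL1 fails this check,
whereas 32-bit userspace at EL0 under a 64-bit kernel passes.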

Mark.


Re: [U-Boot] [PATCH v3 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-20 Thread Mark Rutland
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/mp.c b/arch/arm/cpu/armv8/fsl-lsch3/mp.c
> index ce9c0c1..5338fe6 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/mp.c
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/mp.c
> @@ -30,6 +30,13 @@ int fsl_lsch3_wake_seconday_cores(void)
>   u32 cores, cpu_up_mask = 1;
>   int i, timeout = 10;
>   u64 *table = get_spin_tbl_addr();
> +#ifdef COUNTER_FREQUENCY_REAL
> + unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> +
> + __real_cntfrq = cntfrq; /* update for secondary cores */

Do you need the temporary cntfrq variable? Can't you just have:

__real_cntfrq = COUNTER_FREQUENCY_REAL;

> + flush_dcache_range((unsigned long)&__real_cntfrq,
> +(unsigned long)&__real_cntfrq + 8);

This looks fine, as does the rest of the patch.

So either way:

Acked-by: Mark Rutland 

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-20 Thread Mark Rutland
> >> +int timer_init(void)
> >> +{
> >> +  u32 __iomem *cntcr = (u32 *)CONFIG_SYS_FSL_TIMER_ADDR;
> >> +  u32 __iomem *cltbenr = (u32 *)CONFIG_SYS_FSL_PMU_CLTBENR;
> >> +#ifdef COUNTER_FREQUENCY_REAL
> >> +  unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> >> +
> >> +  /* Update with accurate clock frequency */
> >> +  asm volatile("msr cntfrq_el0, %0" : : "r" (cntfrq) : "memory");
> > 
> > The commit message says that this can only be determined at runtime, but
> > this looks like we're writing a compile-time static value.
> > 
> 
> The macro COUNTER_FREQUENCY_REAL is (CONFIG_SYS_CLK_FREQ/4), where
> CONFIG_SYS_CLK_FREQ is a function call get_board_sys_clk().

Ah, that sounds fine to me then.

> >> +  __real_cntfrq = cntfrq; /* update for secondary cores */
> > 
> > Do we need anything in the way of barriers and/or cache flushing to
> > ensure that this is visible to the secondary CPUs? Or is the MMU off at
> > this point?
> 
> It is flushed before booting secondary cores. But I am relying on the trick of
> enabling cache on flash. It may not be as reliable if someone decides to
> disable the cache to begin with. I will move the code to somewhere safe in
> the next version.

Ok.

> >> +  .global __real_cntfrq
> >> +__real_cntfrq:
> >> +  .quad 0x17d7840 /* 25MHz */
> > 
> > I think this would be better as COUNTER_FREQUENCY, so as to avoid
> > duplicating the value.
> 
> Good idea. Will fix in next version.

Great!

Mark.


Re: [U-Boot] [PATCH v2 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-20 Thread Mark Rutland
Hi,

On Thu, Mar 19, 2015 at 08:34:22PM +, York Sun wrote:
> The timer clock is system clock divided by 4, not fixed 12MHz. This is
> common to the SoC, not board specific.
> 
> Signed-off-by: York Sun 
> 
> ---
> 
> Changes in v2:
>   Fix CNTFRQ for secondary cores when COUNTER_FREQUENCY_REAL is defined.
> 
>  README  |8 
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c  |   26 ++
>  arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S |6 ++
>  arch/arm/cpu/armv8/fsl-lsch3/mp.h   |1 +
>  board/freescale/ls2085a/ls2085a.c   |   18 --
>  include/configs/ls2085a_common.h|6 +-
>  6 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/README b/README
> index f473515..776ebf4 100644
> --- a/README
> +++ b/README
> @@ -690,6 +690,14 @@ The following options need to be configured:
>   exists, unlike the similar options in the Linux kernel. Do not
>   set these options unless they apply!
>  
> + COUNTER_FREQUENCY
> + Generic timer clock source frequency.
> +
> + COUNTER_FREQUENCY_REAL
> + Generic timer clock source frequency if the real clock is
> + different from COUNTER_FREQUENCY, and can only be determined
> + at run time.
> +
>   NOTE: The following can be machine specific errata. These
>   do have ability to provide rudimentary version and machine
>   specific checks, but expect no product checks.
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> index 94fd147..f75b21d 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> @@ -395,3 +395,29 @@ int arch_early_init_r(void)
>  
>   return 0;
>  }
> +
> +int timer_init(void)
> +{
> + u32 __iomem *cntcr = (u32 *)CONFIG_SYS_FSL_TIMER_ADDR;
> + u32 __iomem *cltbenr = (u32 *)CONFIG_SYS_FSL_PMU_CLTBENR;
> +#ifdef COUNTER_FREQUENCY_REAL
> + unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> +
> + /* Update with accurate clock frequency */
> + asm volatile("msr cntfrq_el0, %0" : : "r" (cntfrq) : "memory");

The commit message says that this can only be determined at runtime, but
this looks like we're writing a compile-time static value.

> +
> + __real_cntfrq = cntfrq; /* update for secondary cores */

Do we need anything in the way of barriers and/or cache flushing to
ensure that this is visible to the secondary CPUs? Or is the MMU off at
this point?

> +#endif
> +
> + /* Enable timebase for all clusters.
> +  * It is safe to do so even some clusters are not enabled.
> +  */
> + out_le32(cltbenr, 0xf);
> +
> + /* Enable clock for timer
> +  * This is a global setting.
> +  */
> + out_le32(cntcr, 0x1);
> +
> + return 0;
> +}
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S b/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S
> index 886576e..8d330ff 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S
> @@ -224,6 +224,9 @@ ENTRY(secondary_boot_func)
>   /* physical address of this cpus spin table element */
>   add x11, x1, x0
>  
> + ldr x0, =__real_cntfrq
> + ldr x0, [x0]
> + msr cntfrq_el0, x0  /* set with real frequency */
>   str x9, [x11, #16]  /* LPID */
>   mov x4, #1
>   str x4, [x11, #8]   /* STATUS */
> @@ -275,6 +278,9 @@ ENDPROC(secondary_switch_to_el1)
>  
>   /* 64 bit alignment for elements accessed as data */
>   .align 4
> + .global __real_cntfrq
> +__real_cntfrq:
> + .quad 0x17d7840 /* 25MHz */

I think this would be better as COUNTER_FREQUENCY, so as to avoid
duplicating the value.

Thanks,
Mark.


Re: [U-Boot] [PATCH 17/28] armv8/fsl-lsch3: Enable system error aborts

2015-03-20 Thread Mark Rutland
On Thu, Mar 19, 2015 at 07:52:30PM +, Scott Wood wrote:
> On Thu, 2015-03-19 at 18:14 +0000, Mark Rutland wrote:
> > On Thu, Mar 19, 2015 at 04:45:48PM +, York Sun wrote:
> > > From: Scott Wood 
> > > 
> > > This lets us see the problems (close to) when they happen,
> > > rather than Linux hanging when it enables them prior to having a
> > > working console.
> > 
> > FYI, if the Linux driver for your UART supports earlycon, that should
> > work since commit 7a9c43bed891d1f8 ("setup: Move unmask of async
> > interrupts after possible earlycon setup").
> 
> I wrote this patch in the context of board bringup, where I was stuck
> using an older kernel.  In any case, when U-Boot causes a problem we
> want to see it in U-Boot.

Sure.

The Linux patch helps when Linux triggers an SError after taking
ownership of the vectors.

> > I hope that SError is masked again prior to entering Linux, as required
> > by the boot protocol?
> 
> Doesn't look like it based on grepping for daifset.

Ok. That should happen before you call the kernel. Otherwise if the
kernel triggers an SError between setting up the vectors and discovering
the UART, you won't get any output.

Linux requires that all the DAIF exceptions are masked prior to entry.
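
For reference, a minimal sketch of how the DAIF mask bits relate to the
"msr daifclr, #4" in the patch and to the masking expected at handover.
The helpers only model the register value so they can be checked on any
host; the sketch_* names are hypothetical and the inline assembly is built
only for AArch64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Immediate operand bits for "msr daifset/daifclr, #imm". */
#define SKETCH_DAIF_F	0x1u	/* FIQ */
#define SKETCH_DAIF_I	0x2u	/* IRQ */
#define SKETCH_DAIF_A	0x4u	/* SError (asynchronous abort) */
#define SKETCH_DAIF_D	0x8u	/* Debug */
#define SKETCH_DAIF_ALL	0xfu

/* In the DAIF register the same four bits sit at [9:6]. */
#define SKETCH_DAIF_SHIFT	6

/* Model of "msr daifset, #imm": set (mask) the selected bits. */
static uint64_t sketch_daifset(uint64_t daif, unsigned int imm)
{
	return daif | ((uint64_t)(imm & SKETCH_DAIF_ALL) << SKETCH_DAIF_SHIFT);
}

/* Model of "msr daifclr, #imm": clear (unmask) the selected bits. */
static uint64_t sketch_daifclr(uint64_t daif, unsigned int imm)
{
	return daif & ~((uint64_t)(imm & SKETCH_DAIF_ALL) << SKETCH_DAIF_SHIFT);
}

/* True when Debug, SError, IRQ and FIQ are all masked. */
static bool sketch_daif_all_masked(uint64_t daif)
{
	return ((daif >> SKETCH_DAIF_SHIFT) & SKETCH_DAIF_ALL) == SKETCH_DAIF_ALL;
}

#if defined(__aarch64__)
/* What a bootloader could execute just before jumping to the kernel. */
static inline void sketch_mask_all_exceptions(void)
{
	__asm__ volatile("msr daifset, #0xf" : : : "memory");
}
#endif
```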

> Where is the boot protocol documented?  Just for future reference -- I
> agree that leaving this enabled during the handover would be a bad
> thing.

In the Linux kernel tree see Documentation/arm64/booting.txt [1]. This
is periodically updated with clarifications and updates for new
architectural features, though it should always remain compatible.

Mark.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.txt


Re: [U-Boot] [PATCH 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-19 Thread Mark Rutland
On Thu, Mar 19, 2015 at 06:24:10PM +, York Sun wrote:
> On 03/19/2015 11:17 AM, Mark Rutland wrote:
> > On Thu, Mar 19, 2015 at 06:16:25PM +, York Sun wrote:
> >> On 03/19/2015 11:08 AM, Mark Rutland wrote:
> >>>> +
> >>>> +int timer_init(void)
> >>>> +{
> >>>> +u32 __iomem *cntcr = (u32 *)CONFIG_SYS_FSL_TIMER_ADDR;
> >>>> +u32 __iomem *cltbenr = (u32 *)CONFIG_SYS_FSL_PMU_CLTBENR;
> >>>> +#ifdef COUNTER_FREQUENCY_REAL
> >>>> +unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> >>>> +
> >>>> +/* Update with accurate clock frequency */
> >>>> +asm volatile("msr cntfrq_el0, %0" : : "r" (cntfrq) : "memory");
> >>>> +#endif
> >>>
> >>> Is this executed on all CPUs, or do secondary CPUs have CNTFRQ
> >>> programmed with the correct value elsewhere?
> >>>
> >>
> >> Only the primary CPU runs here. The secondary CPU doesn't come here.
> > 
> > Ok. Where does CNTFRQ get programmed for those CPUs?
> > 
> > If it's necessary to write COUNTER_FREQUENCY_REAL to the primary CPU's
> > CNTFRQ, that's also necessary on the secondaries before they enter the
> > OS.
> 
> Hmm, this may be a bug. Didn't hear any complaints from Linux users. We found
> the timer wasn't correct during bring-up. Let me check with internal team.

Cheers!

If the CPUs don't have matching CNTFRQ, things will work most of the
time, but timekeeping will be broken in some cases (e.g. KVM guests,
after a kexec). It's not possible for the OS to fix this up, so the boot
protocol requires that it's programmed on all CPUs prior to entering the
kernel.

Since commit 127161aaf0fcd376 ("arm64: add runtime system sanity
checks"), Linux should complain at boot time if CNTFRQ is mismatched
across CPUs. We've added other sanity checks since that commit.
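
A minimal sketch of the kind of consistency check firmware could run over
the values it has programmed, assuming they have been collected into an
array indexed by CPU. This is an illustration only, not how the Linux
sanity checks are implemented, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first CPU whose CNTFRQ differs from CPU 0, or -1
 * if all recorded values match.
 */
static int sketch_first_cntfrq_mismatch(const uint32_t *cntfrq, size_t nr_cpus)
{
	size_t cpu;

	for (cpu = 1; cpu < nr_cpus; cpu++) {
		if (cntfrq[cpu] != cntfrq[0])
			return (int)cpu;
	}
	return -1;
}
```

With the primary CPU at 25 MHz and the secondaries left at 12 MHz, the
check would report CPU 1.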

Mark.


Re: [U-Boot] [PATCH 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-19 Thread Mark Rutland
On Thu, Mar 19, 2015 at 06:16:25PM +, York Sun wrote:
> On 03/19/2015 11:08 AM, Mark Rutland wrote:
> >> +
> >> +int timer_init(void)
> >> +{
> >> +  u32 __iomem *cntcr = (u32 *)CONFIG_SYS_FSL_TIMER_ADDR;
> >> +  u32 __iomem *cltbenr = (u32 *)CONFIG_SYS_FSL_PMU_CLTBENR;
> >> +#ifdef COUNTER_FREQUENCY_REAL
> >> +  unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> >> +
> >> +  /* Update with accurate clock frequency */
> >> +  asm volatile("msr cntfrq_el0, %0" : : "r" (cntfrq) : "memory");
> >> +#endif
> > 
> > Is this executed on all CPUs, or do secondary CPUs have CNTFRQ
> > programmed with the correct value elsewhere?
> > 
> 
> Only the primary CPU runs here. The secondary CPU doesn't come here.

Ok. Where does CNTFRQ get programmed for those CPUs?

If it's necessary to write COUNTER_FREQUENCY_REAL to the primary CPU's
CNTFRQ, that's also necessary on the secondaries before they enter the
OS.

Mark.


Re: [U-Boot] [PATCH 17/28] armv8/fsl-lsch3: Enable system error aborts

2015-03-19 Thread Mark Rutland
On Thu, Mar 19, 2015 at 04:45:48PM +, York Sun wrote:
> From: Scott Wood 
> 
> This lets us see the problems (close to) when they happen,
> rather than Linux hanging when it enables them prior to having a
> working console.

FYI, if the Linux driver for your UART supports earlycon, that should
work since commit 7a9c43bed891d1f8 ("setup: Move unmask of async
interrupts after possible earlycon setup").

I hope that SError is masked again prior to entering Linux, as required
by the boot protocol?
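
For reference, a minimal sketch of re-masking before hand-off. The bit
positions are those of the AArch64 DAIF register; the function names are
hypothetical and not taken from U-Boot.

```c
#include <stdint.h>

/* Mask bits as they appear in the AArch64 DAIF system register. */
#define DAIF_D (UINT64_C(1) << 9)	/* Debug exceptions */
#define DAIF_A (UINT64_C(1) << 8)	/* SError (asynchronous abort) */
#define DAIF_I (UINT64_C(1) << 7)	/* IRQ */
#define DAIF_F (UINT64_C(1) << 6)	/* FIQ */
#define DAIF_ALL (DAIF_D | DAIF_A | DAIF_I | DAIF_F)

/* Returns 1 if every exception class is masked in the given DAIF value. */
int daif_all_masked(uint64_t daif)
{
	return (daif & DAIF_ALL) == DAIF_ALL;
}

#ifdef __aarch64__
/* Hypothetical hand-off hook: mask D, A, I and F before entering the OS. */
void mask_all_before_os_entry(void)
{
	__asm__ __volatile__("msr daifset, #0xf" : : : "memory");
}
#endif
```

The quoted patch only clears the A bit (daifclr #4), so the hook above would
undo that and mask the remaining classes as well.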

Mark.

> Signed-off-by: Scott Wood 
> ---
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> index 07064a3..22b5fb2 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> @@ -263,6 +263,10 @@ int arch_cpu_init(void)
>   __asm_invalidate_tlb_all();
>   early_mmu_setup();
>   set_sctlr(get_sctlr() | CR_C);
> +
> + /* Enable system error aborts */
> + asm volatile("msr daifclr, #4" : : : "memory");
> +
>   return 0;
>  }
>  
> -- 
> 1.7.9.5
> 


Re: [U-Boot] [PATCH 04/28] armv8/ls2085a: Fix generic timer clock source

2015-03-19 Thread Mark Rutland
On Thu, Mar 19, 2015 at 04:45:35PM +, York Sun wrote:
> The timer clock is system clock divided by 4, not fixed 12MHz. This is
> common to the SoC, not board specific.
> 
> Signed-off-by: York Sun 
> ---
>  README |8 
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c |   24 
>  board/freescale/ls2085a/ls2085a.c  |   18 --
>  include/configs/ls2085a_common.h   |6 +-
>  4 files changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/README b/README
> index f473515..776ebf4 100644
> --- a/README
> +++ b/README
> @@ -690,6 +690,14 @@ The following options need to be configured:
>   exists, unlike the similar options in the Linux kernel. Do not
>   set these options unless they apply!
>  
> + COUNTER_FREQUENCY
> + Generic timer clock source frequency.
> +
> + COUNTER_FREQUENCY_REAL
> + Generic timer clock source frequency if the real clock is
> + different from COUNTER_FREQUENCY, and can only be determined
> + at run time.
> +
>   NOTE: The following can be machine specific errata. These
>   do have ability to provide rudimentary version and machine
>   specific checks, but expect no product checks.
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> index 94fd147..e985181 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> @@ -395,3 +395,27 @@ int arch_early_init_r(void)
>  
>   return 0;
>  }
> +
> +int timer_init(void)
> +{
> + u32 __iomem *cntcr = (u32 *)CONFIG_SYS_FSL_TIMER_ADDR;
> + u32 __iomem *cltbenr = (u32 *)CONFIG_SYS_FSL_PMU_CLTBENR;
> +#ifdef COUNTER_FREQUENCY_REAL
> + unsigned long cntfrq = COUNTER_FREQUENCY_REAL;
> +
> + /* Update with accurate clock frequency */
> + asm volatile("msr cntfrq_el0, %0" : : "r" (cntfrq) : "memory");
> +#endif

Is this executed on all CPUs, or do secondary CPUs have CNTFRQ
programmed with the correct value elsewhere?

Mark.


Re: [U-Boot] [PATCH] ARMv8: Bug fix of dcache_disable()

2015-03-03 Thread Mark Rutland
On Thu, Feb 26, 2015 at 03:06:10PM +, FengHua wrote:
> 
> hi Mark,

Hi,

>    You did a very detailed analysis of the cache behaviour. Yes, this patch
> is not perfect. But it did fix a bug that actually exists. I will try to
> describe it more clearly in the following.

While this may appear to work on your platform, it simply trades one bug
for another by relying on guarantees that the architecture does not
provide.

Fundamentally, you require a sequence like:

* Clean by VA any data/code you will need after the caches are disabled.
  This may leave clean entries in the cache, but the data/code will be
  visible to the CPU when SCTLR_ELx.{C,M} are clear.

* DSB to complete the maintenance.

* In assembly, without relying on data/code not clean to the PoC:
  - Clear SCTLR_ELx.{C,M}
  - ISB
  - Flush the architected caches by Set/Way
  - (Clean+)Invalidate by VA any region of memory you need to write to
    for which a dirty line could exist in the L3 (e.g. your stack).
  - DSB to complete the maintenance.

* Flush the L3.

To the best of my knowledge, anything short of that relies on guarantees
that the architecture does not provide. The above sequence does assume
that no other masters are active which could allocate into the caches
and/or acquire dirty lines.

> > * Set/Way operations aren't guaranteed to flush data to the PoC in the
> >   presence of a system cache like CCN, so we have no guarantee that
> >   we've pushed any data to the PoC. Per ARMv8 only maintenance by VA
> >   guarantees this (but luckily maintenance by VA is mandated to be
> >   respected by such system caches).
> flush_dcache_all should flush both the caches in the architecture-defined
> cache hierarchy and the outer cache (such as the L3 in CCN); a previous
> patch did this.

While the SCTLR_ELx.{C,M} bits are set, Set/Way operations may not even
force data out of the CPU's architected caches, so there is no guarantee
that the data is flushed to the L3.

> > * While the cache is enabled lines could theoretically migrate between
> >   set/way slots mid-sequence (e.g. with speculative accesses and an
> >   exclusive L1/L2 configuration). I don't believe this currently happens
> >   in practice, but the architecture does not prevent this.
> > 
> > So I don't see that moving this maintenance solves any existing problem,
> > and it introduces new ones.
> The bug actually exists when flush_dcache_all comes after set_sctlr.
> I will try to describe it in more detail.
> flush_dcache_all is a C routine, so it will preserve its return address on
> the stack. The stack memory may be in the cache. If we call set_sctlr to
> disable the cache first, then flush_dcache_all will write the return address
> directly into memory instead of the cache. But there is another copy of the
> stack memory in the cache, so the correct return address in memory will be
> overwritten by a wrong value when the cache is flushed, and flush_dcache_all
> then gets the wrong return address.
> The best solution is to write flush_dcache_all entirely in assembly and make
> sure there is no memory access between flush_dcache_all and set_sctlr.

Even when written in assembly, the cache flush will have to occur after
the SCTLR_ELx.{C,M} bits are cleared in order to guarantee that the
cache is in a quiescent state.
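
To make the ordering problem concrete, here is a toy model of a single cache
line backing a stack slot. It is a simplification, not real hardware
behaviour: all names are made up for illustration, and maintenance by VA is
modelled as clean+invalidate of that one line.

```c
#include <stdint.h>

/* Toy model of one cache line backing a stack slot; not real hardware. */
struct line_model {
	uint64_t mem;	/* value held in memory (at the PoC) */
	uint64_t cache;	/* value held in the cache line */
	int valid;	/* line is present in the cache */
	int dirty;	/* cached copy is newer than memory */
	int cache_on;	/* models SCTLR_ELx.C */
};

void model_write(struct line_model *l, uint64_t v)
{
	if (l->cache_on) {
		l->cache = v;
		l->valid = 1;
		l->dirty = 1;
	} else {
		l->mem = v;	/* bypasses the cache; a stale line stays put */
	}
}

/* Clean+invalidate: write back if dirty, then drop the line. */
void model_clean_invalidate(struct line_model *l)
{
	if (l->valid && l->dirty)
		l->mem = l->cache;
	l->valid = 0;
	l->dirty = 0;
}

/* Disable first, then call a C flush routine that pushes a return address. */
uint64_t model_disable_then_flush(void)
{
	struct line_model l = { 0 };

	l.cache_on = 1;
	model_write(&l, 0x1111);	/* old stack contents, dirty in cache */
	l.cache_on = 0;
	model_write(&l, 0x2222);	/* return address goes to memory */
	model_clean_invalidate(&l);	/* dirty line overwrites it */
	return l.mem;
}

/* Clean the line before disabling, so no dirty copy remains. */
uint64_t model_clean_then_disable(void)
{
	struct line_model l = { 0 };

	l.cache_on = 1;
	model_write(&l, 0x1111);
	model_clean_invalidate(&l);	/* maintenance by VA, then DSB */
	l.cache_on = 0;
	model_write(&l, 0x2222);
	model_clean_invalidate(&l);	/* nothing dirty left to write back */
	return l.mem;
}
```

In the first ordering the slot ends up holding the stale 0x1111; in the
second it keeps 0x2222, which is why anything needed after the disable has to
be cleaned beforehand.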

[...]

> > The above isn't theoretical, we were hit by these issues in Linux. See
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5e051531447259e5df95c44bccb69979537c19e4
> flush_cache_all in the Linux kernel did not flush the outer cache. I think
> the patch mostly deals with this, right?

In the arm64 Linux port we don't always have access to the system cache
interface (which could be secure-only), so we only rely on cache
maintenance by VA, which system caches are mandated to respect.

This does mean that there may be entries in the caches, but we are able
to perform maintenance on the portions of the address space which we
care about.

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-19 Thread Mark Rutland
On Thu, Feb 19, 2015 at 10:13:58AM +, Ian Campbell wrote:
> On Thu, 2015-02-19 at 10:25 +0100, Jan Kiszka wrote:
> > On 2015-02-19 10:19, Ian Campbell wrote:
> > > On Thu, 2015-02-19 at 09:28 +0100, Thierry Reding wrote:
> > >> On Tue, Feb 17, 2015 at 11:55:24AM +, Mark Rutland wrote:
> > >>> [...]
> > >>>
> > >>>>>> This is getting invasive:
> > >>>>>>
> > >>>>>> If I add carveouts via adjusting memory banks, I need to account for
> > >>>>>> the case that an existing bank is split into two halves, creating
> > >>>>>> additional banks this way. But then current fdt_fixup_memory_banks
> > >>>>>> will no longer work due to its limitation to the number of physical
> > >>>>>> banks. I could always add one spare bank to that service, ok, but
> > >>>>>> then the next use case for carveouts will hit the wall again. So I
> > >>>>>> better double that limit, or so.
> > >>>>>
> > >>>>> Yeah, not fun.
> > >>>>>
> > >>>>> If the code is position-independent then you might be able to simply
> > >>>>> carve out a sufficient proportion from the start of the first entry or
> > >>>>> the end of the last one, which would avoid splitting. If either of
> > >>>>> said regions are too small for the monitor code then it's questionable
> > >>>>> as to whether the OS can make use of it.
> > >>>>
> > >>>> The code /seems/ to be position-independent, but locations are so far
> > >>>> hard-coded in those places that prepare it and move it around. Maybe we
> > >>>> can decide about the location at runtime, maybe we can simply demand it
> > >>>> to be at the end or the beginning of some bank.
> > >>>
> > >>> If it's possible to do so, it would seem like the nicest option to me.
> > >>
> > >> Using the top of memory for this seems like the most natural choice,
> > > 
> > > I think it needs to still be below 4G, doesn't it? So on large mem/LPAE
> > > systems some care might be needed.
> > 
> > Argh. That would likely mean we had to split a bank (unless >2G comes in
> > multiple banks), something I'd like to avoid having to implement.
> 
> I expect it is usual for the 4G boundary to coincide with a bank
> boundary, even if memory spans the gap -- but I also don't think we can
> rely on that.
> 
> > > It was suggested by Mark earlier in the thread that this stuff is
> > > IMPLEMENTATION DEFINED. Is it possible that we simply don't need to
> > > worry about these cross-world cache issues on Tegra?
> > > 
> > > (I must confess that until now I'd assumed that the cache lines were
> > > tagged with the world which populated them to stop them interfering with
> > > each other in this sort of way...)
> > 
> > I'm pretty sure there is no such thing as a cross-world cache problem.
> > Otherwise the architecture or some implementation would have serious
> > security issues as discussed earlier. To my understanding, Mark's
> > suggestion is now targeting the concern that Linux may accidentally
> > trigger accesses and, thus, stumble or create warnings at least.
> 
> Ah, then I've misunderstood/misremembered.
> 
> I think it is fair to say that if a NS-world OS deliberately touches a
> region marked reserved or not described at all then it deserves whatever
> fault it gets. But I think what Mark is saying is that just mapping the
> region but never explicitly accessing it can still result in errors due
> to e.g. prefetching or speculative behaviour which may use those
> mappings.

In this case the external memory security controller block likely can't
tell the difference (speculative and explicit accesses will look the
same on the bus). So any speculative access can trigger violations and
bring down the non-secure world (or just DoS the secure world if it
signals the secure world but provides a dummy result to non-secure
reads).

> IOW it is architecturally allowable for faults arising from accesses
> never explicitly occurring in the code to result in OS visible faults.
> Rather than, say, requiring such faults to be squashed until the
> real/non-speculative access actually really occurs in the program.

Yes. Anything mapped cacheable (or non-executable) can be speculatively
read, and if those accesses trigger some error on the bus (e.g. due to a
security controller), you'll get some kind of asynchronous abort
reported by the CPU.

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-19 Thread Mark Rutland
On Thu, Feb 19, 2015 at 09:25:56AM +, Jan Kiszka wrote:
> On 2015-02-19 10:19, Ian Campbell wrote:
> > On Thu, 2015-02-19 at 09:28 +0100, Thierry Reding wrote:
> >> On Tue, Feb 17, 2015 at 11:55:24AM +, Mark Rutland wrote:
> >>> [...]
> >>>
> >>>>>> This is getting invasive:
> >>>>>>
> >>>>>> If I add carveouts via adjusting memory banks, I need to account for
> >>>>>> the case that an existing bank is split into two halves, creating
> >>>>>> additional
> >>>>>> banks this way. But then current fdt_fixup_memory_banks will no longer
> >>>>>> work due to its limitation to the number of physical banks. I could
> >>>>>> always add one spare bank to that service, ok, but then the next use
> >>>>>> case for carveouts will hit the wall again. So I better double that
> >>>>>> limit, or so.
> >>>>>
> >>>>> Yeah, not fun.
> >>>>>
> >>>>> If the code is position-independent then you might be able to simply
> >>>>> carve out a sufficient proportion from the start of the first entry or
> >>>>> the end of the last one, which would avoid splitting. If either of said
> >>>>> regions are too small for the monitor code then it's questionable as to
> >>>>> whether the OS can make use of it.
> >>>>
> >>>> The code /seems/ to be position-independent, but locations are so far
> >>>> hard-coded in those places that prepare it and move it around. Maybe we
> >>>> can decide about the location at runtime, maybe we can simply demand it
> >>>> to be at the end or the beginning of some bank.
> >>>
> >>> If it's possible to do so, it would seem like the nicest option to me.
> >>
> >> Using the top of memory for this seems like the most natural choice,
> > 
> > I think it needs to still be below 4G, doesn't it? So on large mem/LPAE
> > systems some care might be needed.
> 
> Argh. That would likely mean we had to split a bank (unless >2G comes in
> multiple banks), something I'd like to avoid having to implement.
> 
> > 
> > It was suggested by Mark earlier in the thread that this stuff is
> > IMPLEMENTATION DEFINED. Is it possible that we simply don't need to
> > worry about these cross-world cache issues on Tegra?
> > 
> > (I must confess that until now I'd assumed that the cache lines were
> > tagged with the world which populated them to stop them interfering with
> > each other in this sort of way...)
> 
> I'm pretty sure there is no such thing as a cross-world cache problem.
> Otherwise the architecture or some implementation would have serious
> security issues as discussed earlier. To my understanding, Mark's
> suggestion is now targeting the concern that Linux may accidentally
> trigger accesses and, thus, stumble or create warnings at least.

Yup.

If the memory is protected by some configurable security controller (as
seems to be the case on Tegra), the non-secure side accessing any memory
protected by it will result in a violation (and presumably bring down
the non-secure world). We need to prevent speculative accesses (the
security controller can't tell the difference), and therefore cannot map
the memory at all (so a /memreserve/ is insufficient).

Depending on implementation details there are other potential problems,
and carving out the memory explicitly solves all that I am aware of
without having to rely on implementation-specific details.
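
A minimal sketch of carving from the top of a bank, so that the region is
simply never described to the OS. The struct, the helper name and the 4 KiB
page size are assumptions for illustration, not existing U-Boot code.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED UINT64_C(4096)	/* assumption: 4 KiB pages */

struct mem_bank {
	uint64_t start;
	uint64_t size;
};

/*
 * Hypothetical helper: shrink *bank so that at least `need` bytes at its
 * top are no longer described to the OS. The carveout base is rounded down
 * to a page boundary so the OS is never left with a partially used page.
 * Returns 0 and stores the base on success, -1 if the bank is too small.
 */
int carve_from_top(struct mem_bank *bank, uint64_t need, uint64_t *base_out)
{
	uint64_t end = bank->start + bank->size;
	uint64_t base;

	if (need == 0 || need > bank->size)
		return -1;

	base = (end - need) & ~(PAGE_SIZE_ASSUMED - 1);
	if (base < bank->start)
		return -1;

	bank->size = base - bank->start;
	*base_out = base;
	return 0;
}
```

Because only the size of one bank changes, no bank is split and the number of
banks stays the same.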

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-17 Thread Mark Rutland
[...]

> >> This is getting invasive:
> >>
> >> If I add carveouts via adjusting memory banks, I need to account for the
> >> case that an existing bank is split into two halves, creating additional
> >> banks this way. But then current fdt_fixup_memory_banks will no longer
> >> work due to its limitation to the number of physical banks. I could
> >> always add one spare bank to that service, ok, but then the next use
> >> case for carveouts will hit the wall again. So I better double that
> >> limit, or so.
> > 
> > Yeah, not fun.
> > 
> > If the code is position-independent then you might be able to simply
> > carve out a sufficient proportion from the start of the first entry or
> > the end of the last one, which would avoid splitting. If either of said
> > regions are too small for the monitor code then it's questionable as to
> > whether the OS can make use of it.
> 
> The code /seems/ to be position-independent, but locations are so far
> hard-coded in those places that prepare it and move it around. Maybe we
> can decide about the location at runtime, maybe we can simply demand it
> to be at the end or the beginning of some bank.

If it's possible to do so, it would seem like the nicest option to me.

> >> Also, are there any architectural or OS-implementation related
> >> restrictions on the alignment of bank start addresses and sizes? Just to
> >> make sure we don't stumble over some side effects of punching holes into
> >> that device tree node.
> > 
> > I would guess that we need to at least pad the carveout to page-aligned
> > to prevent any particular OS from mapping a page for the sake of a few
> > bytes left unused by the monitor.
> > 
> > From a quick look at the Linux arm_add_memory and memblock code it looks
> > like Linux won't map partial pages, but I don't know what Xen and others
> > do, and given we know that we want to keep the relevant pages exclusive
> > to the monitor anyway padding to page boundaries seems like a sensible
> > thing to do.
> > 
> > My one concern would be early mappings; I believe that the initial page
> > tables use (2MiB) section/block mappings to map the kernel and some
> > initial memory (including the DTB) before the memory nodes are parsed,
> > so the carveout would need to be placed away from where the kernel and
> > DTB were loaded in order to prevent those early mappings from covering
> > it. I'm unfortunately not sure on the full details there.
> 
> That makes me wonder again if we are trying to solve real issues: What
> is the OS supposed to do with a memory reserve map, what does it have to
> avoid doing with it?

Per ePAPR, memory reservation block entries may not be explicitly
accessed by the operating system (unless told to elsewhere). The OS may
map any reserved entries with cacheable attributes (potentially leading
to the issues I described earlier).

> Is the semantic really so weak that we cannot use it here?

In general, the semantic is too weak. In fact, it's not even strictly
defined for the ARM architecture w.r.t. memory attributes, so we have
very little guarantee as to what an OS will do beyond that it will
not perform any explicit accesses to the region.

In practice, Linux will currently map the region as cacheable, and it
may or may not map it shareable depending on SMP/UP (which could be a
problem if you want to use a UP Linux to load and kexec an SMP kernel
for some reason).

It may be that on a given CPU/system implementation a memreserve
entry is sufficient; but unfortunately this depends on IMPLEMENTATION
DEFINED details.

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-17 Thread Mark Rutland
On Tue, Feb 17, 2015 at 08:09:57AM +, Jan Kiszka wrote:
> On 2015-02-16 16:38, Jan Kiszka wrote:
> > On 2015-02-16 15:56, Mark Rutland wrote:
> >> On Mon, Feb 16, 2015 at 02:31:21PM +, Jan Kiszka wrote:
> >>> On 2015-02-16 15:25, Mark Rutland wrote:
> >>>> On Mon, Feb 16, 2015 at 01:51:37PM +, Jan Kiszka wrote:
> >>>>> On 2015-02-16 14:42, Mark Rutland wrote:
> >>>>>> On Mon, Feb 16, 2015 at 12:54:43PM +, Jan Kiszka wrote:
> >>>>>>> From: Ian Campbell 
> >>>>>>>
> >>>>>>> In this case the secure code lives in RAM, and hence needs to be
> >>>>>>> reserved, but it has been relocated, so the reservation of
> >>>>>>> __secure_start does not apply.
> >>>>>>>
> >>>>>>> Add support for setting CONFIG_ARMV7_SECURE_RESERVE_SIZE to reserve
> >>>>>>> such a region.
> >>>>>>>
> >>>>>>> This will be used in a subsequent patch for Jetson-TK1
> >>>>>>
> >>>>>> Using a memreserve and allowing the OS to map the memory but not poke
> >>>>>> it can be problematic due to the potential of mismatched attributes
> >>>>>> between the monitor and the OS.
> >>>>>
> >>>>> OK, here my knowledge is not yet sufficient to process this remark. What
> >>>>> kind of problems can arise from what kind of attribute mismatch? And why
> >>>>> should the OS be able to cause problems for the monitor?
> >>>>
> >>>> For example, consider the case of the region being mapped cacheable by
> >>>> the OS but not by the monitor. The monitor communicates between cores
> >>>> expecting to never hit in a cache (because it uses a non-cacheable
> >>>> mapping), but the mapping used by the OS can cause the region to be
> >>>> allocated into caches at any point in time even if it never accesses the
> >>>> region explicitly.
> >>>>
> >>>> The CPU _may_ hit in a cache even if making a non-cacheable access (this
> >>>> is called an "unexpected data cache hit"), so the cache allocations
> >>>> caused by the OS can mask data other CPUs wrote straight to memory.
> >>>>
> >>>> Other than that case, I believe the rules given in the ARM ARM for
> >>>> mismatched memory attributes may apply for similar reasons.  Thus
> >>>> allowing the OS to map this memory can cause a loss of coherency on the
> >>>> monitor side, if the OS and monitor map the region with different
> >>>> attributes.
> >>>>
> >>>> This is all IMPLEMENTATION DEFINED, so it may be that you're fine on the
> >>>> system you're dealing with. I don't immediately know whether that is the
> >>>> case, however. Never telling the OS about the memory in the first place
> >>>> avoids the possibility in all cases.
> >>>
> >>> But from a security point of view, it must not matter if the OS maps the
> >>> memory or not - the monitor must be robust against that, no? If the
> >>> architecture cannot provide such guarantees, it has to be worked around
> >>> in software in the monitor (I hope you can do so...).
> >>
> >> Well, yes and no.
> >>
> >> In this case it sounds like due to the security controller you should
> >> never encounter the mismatched attributes issue in the first place,
> >> though you may encounter issues w.r.t. speculative accesses triggering
> >> violations arbitrarily. Not telling the OS about the secure memory means
> >> that said violations shouldn't occur in normal operation; only when the
> >> non-secure OS is trying to do something bad.
> >>
> >> If the OS has access to the memory, then you're already trusting it to
> >> not write to there or you can't trust that memory at all (and hence
> >> cannot use it). Given that means you must already assume that the OS is
> >> cooperative, it's simpler to not tell it about the memory than to add
> >> cache maintenance around every memory access within the monitor. You can
> >> never make things secure in this case, but you can at least offer the
> >> abstrac

Re: [U-Boot] [PATCH v2 12/12] tegra: Set CNTFRQ for secondary CPUs

2015-02-17 Thread Mark Rutland
On Tue, Feb 17, 2015 at 07:01:57AM +, Jan Kiszka wrote:
> On 2015-02-16 15:02, Jan Kiszka wrote:
> > On 2015-02-16 14:51, Mark Rutland wrote:
> >> On Mon, Feb 16, 2015 at 01:44:36PM +, Jan Kiszka wrote:
> >>> On 2015-02-16 14:37, Mark Rutland wrote:
> >>>> On Mon, Feb 16, 2015 at 12:54:49PM +, Jan Kiszka wrote:
> >>>>> We only set CNTFRQ in arch_timer_init for the boot CPU. But this has to
> >>>>> happen for all cores.
> >>>>>
> >>>>> Fixing this resolves problems of KVM with emulating the generic
> >>>>> timer/counter.
> >>>>>
> >>>>> Signed-off-by: Jan Kiszka 
> >>>>> ---
> >>>>>  arch/arm/cpu/armv7/tegra-common/psci.S | 13 +
> >>>>>  1 file changed, 13 insertions(+)
> >>>>>
> >>>>> diff --git a/arch/arm/cpu/armv7/tegra-common/psci.S b/arch/arm/cpu/armv7/tegra-common/psci.S
> >>>>> index b7501fb..119c246 100644
> >>>>> --- a/arch/arm/cpu/armv7/tegra-common/psci.S
> >>>>> +++ b/arch/arm/cpu/armv7/tegra-common/psci.S
> >>>>> @@ -51,12 +51,25 @@ ENTRY(psci_arch_init)
> >>>>>  
> >>>>> mrc p15, 0, r4, c0, c0, 5   @ MPIDR
> >>>>> and r4, r4, #7  @ number of CPUs in cluster
> >>>>> +
> >>>>> +   adr r5, _sys_clock_freq
> >>>>> +   cmp r4, #0
> >>>>> +
> >>>>> +   mrceq   p15, 0, r7, c14, c0, 0  @ read CNTFRQ from CPU0
> >>>>> +   streq   r7, [r5]
> >>>>> +
> >>>>> +   ldrne   r7, [r5]
> >>>>> +   mcrne   p15, 0, r7, c14, c0, 0  @ write CNTFRQ to CPU1..3
> >>>>
> >>>> Is it not possible to have a hook that uses the same variable as
> >>>> arch_timer_init rather than doing a copy here? It seems a shame to
> >>>> duplicate the effort.
> >>>
> >>> The problem is related to the different address spaces. Here we run in
> >>> the secure monitor, arch_timer_init - to my understanding - in
> >>> non-secure mode. Didn't find a pattern so far how to transfer data (and
> >>> that shouldn't be more complex than the above code).
> >>
> >> Surely arch_timer_init must be run in a secure mode in order to be
> >> allowed to write to CNTFRQ?
> > 
> > Ah, right.
> > 
> >>
> >> If this is simply the easiest way of moving the data around then there's
> >> no real problem with it; it's just a shame that it only happens in the
> >> PSCI case.
> > 
> > OK, I'll check again. Maybe it's easier than I thought.
> 
> It isn't: the variable we would have to write - conditionally, i.e. not
> for the SPL build - is not yet where it will finally be. The monitor is
> copied later on, but the symbol points to the destination already. One
> can account for that, but the result won't get simpler and cleaner IMHO.

Fair enough. As I mentioned earlier there's no real problem with the
above, it just seemed a shame that this was only done in the PSCI case.

I take it for the quoted sequence above that the primary CPU runs
through this path before sending the secondaries through (and that
either there's a DSB somewhere between the write and waking up
secondaries or memory accesses are strongly ordered at this point).
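
The publish/consume pattern I am assuming can be modelled in portable C as
below. This is only a sketch: the function name and the hw_cntfrq parameter
are hypothetical stand-ins for the register accesses in the quoted assembly,
and the release/acquire pair stands in for whatever barrier or wake-up
ordering the real code relies on.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the _sys_clock_freq slot in the quoted patch. */
static _Atomic uint32_t sys_clock_freq;

/*
 * Hypothetical per-CPU hook. `hw_cntfrq` models that CPU's CNTFRQ register.
 * CPU 0 publishes its value; every other CPU copies the published value
 * into its own register.
 */
void cntfrq_sync(unsigned int cpu, uint32_t *hw_cntfrq)
{
	if (cpu == 0)
		atomic_store_explicit(&sys_clock_freq, *hw_cntfrq,
				      memory_order_release);
	else
		*hw_cntfrq = atomic_load_explicit(&sys_clock_freq,
						  memory_order_acquire);
}
```

The model is only correct if CPU 0 runs the hook before any secondary is
woken, which is the assumption stated above.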

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 02:31:21PM +, Jan Kiszka wrote:
> On 2015-02-16 15:25, Mark Rutland wrote:
> > On Mon, Feb 16, 2015 at 01:51:37PM +, Jan Kiszka wrote:
> >> On 2015-02-16 14:42, Mark Rutland wrote:
> >>> On Mon, Feb 16, 2015 at 12:54:43PM +, Jan Kiszka wrote:
> >>>> From: Ian Campbell 
> >>>>
> >>>> In this case the secure code lives in RAM, and hence needs to be
> >>>> reserved, but it has been relocated, so the reservation of
> >>>> __secure_start does not apply.
> >>>>
> >>>> Add support for setting CONFIG_ARMV7_SECURE_RESERVE_SIZE to reserve such
> >>>> a region.
> >>>>
> >>>> This will be used in a subsequent patch for Jetson-TK1
> >>>
> >>> Using a memreserve and allowing the OS to map the memory but not poke it
> >>> can be problematic due to the potential of mismatched attributes between
> >>> the monitor and the OS.
> >>
> >> OK, here my knowledge is not yet sufficient to process this remark. What
> >> kind of problems can arise from what kind of attribute mismatch? And why
> >> should the OS be able to cause problems for the monitor?
> > 
> > For example, consider the case of the region being mapped cacheable by
> > the OS but not by the monitor. The monitor communicates between cores
> > expecting to never hit in a cache (because it uses a non-cacheable
> > mapping), but the mapping used by the OS can cause the region to be
> > allocated into caches at any point in time even if it never accesses the
> > region explicitly.
> > 
> > The CPU _may_ hit in a cache even if making a non-cacheable access (this
> > is called an "unexpected data cache hit"), so the cache allocations
> > caused by the OS can mask data other CPUs wrote straight to memory.
> > 
> > Other than that case, I believe the rules given in the ARM ARM for
> > mismatched memory attributes may apply for similar reasons.  Thus
> > allowing the OS to map this memory can cause a loss of coherency on the
> > monitor side, if the OS and monitor map the region with different
> > attributes.
> > 
> > This is all IMPLEMENTATION DEFINED, so it may be that you're fine on the
> > system you're dealing with. I don't immediately know whether that is the
> > case, however. Never telling the OS about the memory in the first place
> > avoids the possibility in all cases.
> 
> But from a security point of view, it must not matter if the OS maps the
> memory or not - the monitor must be robust against that, no? If the
> architecture cannot provide such guarantees, it has to be worked around
> in software in the monitor (I hope you can do so...).

Well, yes and no.

In this case it sounds like due to the security controller you should
never encounter the mismatched attributes issue in the first place,
though you may encounter issues w.r.t. speculative accesses triggering
violations arbitrarily. Not telling the OS about the secure memory means
that said violations shouldn't occur in normal operation; only when the
non-secure OS is trying to do something bad.

If the OS has access to the memory, then you're already trusting it to
not write to there or you can't trust that memory at all (and hence
cannot use it). Given that means you must already assume that the OS is
cooperative, it's simpler to not tell it about the memory than to add
cache maintenance around every memory access within the monitor. You can
never make things secure in this case, but you can at least offer the
abstraction provided by PSCI.

So as far as I can see in either case it's better to not tell the OS
about the memory you wish to use from the monitor. If you have no HW
protection and can't trust the OS then you've already lost, and if you
do have HW protection you don't want it to trigger
continuously/spuriously as a result of speculation.

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 01:51:37PM +, Jan Kiszka wrote:
> On 2015-02-16 14:42, Mark Rutland wrote:
> > On Mon, Feb 16, 2015 at 12:54:43PM +, Jan Kiszka wrote:
> >> From: Ian Campbell 
> >>
> >> In this case the secure code lives in RAM, and hence needs to be reserved,
> >> but it has been relocated, so the reservation of __secure_start does not
> >> apply.
> >>
> >> Add support for setting CONFIG_ARMV7_SECURE_RESERVE_SIZE to reserve such a
> >> region.
> >>
> >> This will be used in a subsequent patch for Jetson-TK1
> > 
> > Using a memreserve and allowing the OS to map the memory but not poke it
> > can be problematic due to the potential of mismatched attributes between
> > the monitor and the OS.
> 
> OK, here my knowledge is not yet sufficient to process this remark. What
> kind of problems can arise from what kind of attribute mismatch? And why
> should the OS be able to cause problems for the monitor?

For example, consider the case of the region being mapped cacheable by
the OS but not by the monitor. The monitor communicates between cores
expecting to never hit in a cache (because it uses a non-cacheable
mapping), but the mapping used by the OS can cause the region to be
allocated into caches at any point in time even if it never accesses the
region explicitly.

The CPU _may_ hit in a cache even if making a non-cacheable access (this
is called an "unexpected data cache hit"), so the cache allocations
caused by the OS can mask data other CPUs wrote straight to memory.

Other than that case, I believe the rules given in the ARM ARM for
mismatched memory attributes may apply for similar reasons.  Thus
allowing the OS to map this memory can cause a loss of coherency on the
monitor side, if the OS and monitor map the region with different
attributes.
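
To make the failure mode concrete, here is a toy host-side model of it. It
is only a sketch: it does not describe any real cache, and every name in it
(toy_system, speculative_allocate and so on) is made up for illustration.
It models one word of RAM, one cache line that may shadow it, and a
non-cacheable read that is allowed to hit in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one word of memory and one cache line that may shadow it. */
struct toy_system {
	uint32_t memory;     /* value held in RAM */
	uint32_t line_data;  /* value held in the cache line */
	bool line_valid;     /* true once the line has been allocated */
};

/*
 * A cacheable mapping in the OS lets the cache allocate the line at any
 * time (for example due to a speculative read), without an explicit access.
 */
static void speculative_allocate(struct toy_system *s)
{
	assert(s != NULL);
	s->line_data = s->memory;
	s->line_valid = true;
}

/* Non-cacheable write from the monitor: goes straight to RAM. */
static void noncacheable_write(struct toy_system *s, uint32_t value)
{
	assert(s != NULL);
	s->memory = value;
}

/*
 * Non-cacheable read on an implementation that may still hit in the cache
 * (the "unexpected data cache hit" case).
 */
static uint32_t noncacheable_read_may_hit(const struct toy_system *s)
{
	assert(s != NULL);
	return s->line_valid ? s->line_data : s->memory;
}
```

If the line is allocated while RAM holds the old value and another CPU then
writes a new value straight to RAM, the read in this model returns the old
value: the allocation caused by the OS mapping has masked the write.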

This is all IMPLEMENTATION DEFINED, so it may be that you're fine on the
system you're dealing with. I don't immediately know whether that is the
case, however. Never telling the OS about the memory in the first place
avoids the possibility in all cases.

> > If you're able to carve out the "secure" memory from the memory node(s),
> > then you should be safe from that.
> 
> Do you have a pointer to an example how to do it instead?

Unfortunately not; I don't know whether there's an existing primitive
for doing that. In general you might need to split a memory region in
two to carve out the portion in the middle.
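
As a rough sketch of what that split could look like (this is not an
existing U-Boot primitive; struct mem_bank and carve_out() are hypothetical
names, and the code assumes the hole lies entirely inside the bank):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one memory bank: [start, start + size). */
struct mem_bank {
	uint64_t start;
	uint64_t size;
};

/*
 * Remove [hole_start, hole_start + hole_size) from a single bank.
 * Writes the 0, 1 or 2 banks that remain into out[] and returns how many.
 * Assumes the hole is fully contained in the bank.
 */
static int carve_out(struct mem_bank bank, uint64_t hole_start,
		     uint64_t hole_size, struct mem_bank out[2])
{
	uint64_t bank_end = bank.start + bank.size;
	uint64_t hole_end = hole_start + hole_size;
	int n = 0;

	assert(hole_start >= bank.start && hole_end <= bank_end);

	/* Portion below the hole, if any. */
	if (hole_start > bank.start) {
		out[n].start = bank.start;
		out[n].size = hole_start - bank.start;
		n++;
	}

	/* Portion above the hole, if any. */
	if (hole_end < bank_end) {
		out[n].start = hole_end;
		out[n].size = bank_end - hole_end;
		n++;
	}

	return n;
}
```

The resulting banks would then be what is described in the memory node(s),
so the OS never learns about the carved-out range at all.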

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 12/12] tegra: Set CNTFRQ for secondary CPUs

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 01:44:36PM +, Jan Kiszka wrote:
> On 2015-02-16 14:37, Mark Rutland wrote:
> > On Mon, Feb 16, 2015 at 12:54:49PM +, Jan Kiszka wrote:
> >> We only set CNTFRQ in arch_timer_init for the boot CPU. But this has to
> >> happen for all cores.
> >>
> >> Fixing this resolves problems of KVM with emulating the generic
> >> timer/counter.
> >>
> >> Signed-off-by: Jan Kiszka 
> >> ---
> >>  arch/arm/cpu/armv7/tegra-common/psci.S | 13 +
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/arm/cpu/armv7/tegra-common/psci.S b/arch/arm/cpu/armv7/tegra-common/psci.S
> >> index b7501fb..119c246 100644
> >> --- a/arch/arm/cpu/armv7/tegra-common/psci.S
> >> +++ b/arch/arm/cpu/armv7/tegra-common/psci.S
> >> @@ -51,12 +51,25 @@ ENTRY(psci_arch_init)
> >>  
> >>mrc p15, 0, r4, c0, c0, 5   @ MPIDR
> >>and r4, r4, #7  @ number of CPUs in cluster
> >> +
> >> +  adr r5, _sys_clock_freq
> >> +  cmp r4, #0
> >> +
> >> +  mrceq   p15, 0, r7, c14, c0, 0  @ read CNTFRQ from CPU0
> >> +  streq   r7, [r5]
> >> +
> >> +  ldrne   r7, [r5]
> >> +  mcrne   p15, 0, r7, c14, c0, 0  @ write CNTFRQ to CPU1..3
> > 
> > Is it not possible to have a hook that uses the same variable as
> > arch_timer_init rather than doing a copy here? It seems a shame to
> > duplicate the effort.
> 
> The problem is related to the different address spaces. Here we run in
> the secure monitor, arch_timer_init - to my understanding - in
> non-secure mode. Didn't find a pattern so far how to transfer data (and
> that shouldn't be more complex than the above code).

Surely arch_timer_init must be run in a secure mode in order to be
allowed to write to CNTFRQ?

If this is simply the easiest way of moving the data around then there's
no real problem with it; it's just a shame that it only happens in the
PSCI case.
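
For anyone reading along without the assembly in their head, the logic of
the quoted hunk can be restated as a small host-side C model. This is only
an illustration under assumptions: toy_cntfrq[] stands in for each core's
CNTFRQ register, toy_saved_freq for _sys_clock_freq, and none of these
names exist in U-Boot.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Stand-in for the per-core CNTFRQ registers (illustrative only). */
static uint32_t toy_cntfrq[TOY_NR_CPUS];

/* Plays the role of the _sys_clock_freq word in the patch. */
static uint32_t toy_saved_freq;

/* Model of the added code path, run once on each core as it comes up. */
static void toy_psci_arch_init(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);

	if (cpu == 0)
		toy_saved_freq = toy_cntfrq[0];    /* boot CPU: read and save */
	else
		toy_cntfrq[cpu] = toy_saved_freq;  /* secondaries: load and write */
}
```

Note the model only works if CPU0 runs this path before any secondary does,
which matches the boot CPU bringing the others up.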

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 11/12] tegra124: Reserve secure RAM using MC_SECURITY_CFG{0, 1}_0

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 12:54:48PM +, Jan Kiszka wrote:
> From: Ian Campbell 
> 
> These registers can be used to prevent non-secure world from accessing a
> megabyte aligned region of RAM. Use them to protect the u-boot secure monitor
> code.

What happens if the CPU tries to read this memory from the non-secure
world? If the OS has it mapped then the CPU could perform speculative
reads at any point in time.

If that can raise an abort then the OS needs to not map the region.

I take it U-Boot uses a secure mapping for the region (which I believe
should avoid the mismatched attributes issue I mentioned in my other
reply).
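
On the "megabyte aligned" requirement from the commit message: a value is
MB aligned when its low 20 bits are zero, and a size in bytes becomes a
size in MB by shifting right by 20. A minimal sketch of that arithmetic
follows; TOY_SZ_1M, is_mb_aligned() and size_in_mb() are names I made up
for illustration, not anything from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SZ_1M 0x100000u  /* 1 MiB = 2^20 bytes */

/* True when the low 20 bits of v are all zero. */
static bool is_mb_aligned(uint32_t v)
{
	return (v & (TOY_SZ_1M - 1)) == 0;
}

/* Convert an MB-aligned byte count into a count of megabytes. */
static uint32_t size_in_mb(uint32_t bytes)
{
	assert(is_mb_aligned(bytes));
	return bytes >> 20;
}
```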

Thanks,
Mark.

> At first I tried to do this from s_init(), however this inexplicably causes
> u-boot's networking (e.g. DHCP) to fail, while networking under Linux was 
> fine.
> 
> So instead I have added a new weak arch function protect_secure_section()
> called from relocate_secure_section() and reserved the region there. This is
> better overall since it defers the reservation until after the sec vs. non-sec
> decision (which can be influenced by an envvar) has been made when booting the
> os.
> 
> Signed-off-by: Ian Campbell 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/arm/cpu/armv7/virt-v7.c   |  5 +
>  arch/arm/cpu/tegra-common/ap.c | 15 +++
>  arch/arm/include/asm/system.h  |  1 +
>  3 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/virt-v7.c b/arch/arm/cpu/armv7/virt-v7.c
> index b69fd37..eb6195c 100644
> --- a/arch/arm/cpu/armv7/virt-v7.c
> +++ b/arch/arm/cpu/armv7/virt-v7.c
> @@ -46,6 +46,10 @@ static unsigned long get_gicd_base_address(void)
>  #endif
>  }
>  
> +/* Define a specific version of this function to enable any available
> + * hardware protections for the reserved region */
> +void __weak protect_secure_section(void) {}
> +
>  static void relocate_secure_section(void)
>  {
>  #ifdef CONFIG_ARMV7_SECURE_BASE
> @@ -54,6 +58,7 @@ static void relocate_secure_section(void)
>   memcpy((void *)CONFIG_ARMV7_SECURE_BASE, __secure_start, sz);
>   flush_dcache_range(CONFIG_ARMV7_SECURE_BASE,
>  CONFIG_ARMV7_SECURE_BASE + sz + 1);
> + protect_secure_section();
>   invalidate_icache_all();
>  #endif
>  }
> diff --git a/arch/arm/cpu/tegra-common/ap.c b/arch/arm/cpu/tegra-common/ap.c
> index a17dfd1..f1d3070 100644
> --- a/arch/arm/cpu/tegra-common/ap.c
> +++ b/arch/arm/cpu/tegra-common/ap.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -154,6 +155,20 @@ static void init_pmc_scratch(void)
>   writel(odmdata, &pmc->pmc_scratch20);
>  }
>  
> +#ifdef CONFIG_ARMV7_SECURE_RESERVE_SIZE
> +void protect_secure_section(void)
> +{
> + struct mc_ctlr *mc = (struct mc_ctlr *)NV_PA_MC_BASE;
> +
> + /* Must be MB aligned */
> + BUILD_BUG_ON(CONFIG_ARMV7_SECURE_BASE & 0xFFFFF);
> + BUILD_BUG_ON(CONFIG_ARMV7_SECURE_RESERVE_SIZE & 0xFFFFF);
> +
> + writel(CONFIG_ARMV7_SECURE_BASE, &mc->mc_security_cfg0);
> + writel(CONFIG_ARMV7_SECURE_RESERVE_SIZE>>20, &mc->mc_security_cfg1);
> +}
> +#endif
> +
>  void s_init(void)
>  {
>   /* Init PMC scratch memory */
> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> index 89f2294..21be69d 100644
> --- a/arch/arm/include/asm/system.h
> +++ b/arch/arm/include/asm/system.h
> @@ -76,6 +76,7 @@ void armv8_switch_to_el1(void);
>  void gic_init(void);
>  void gic_send_sgi(unsigned long sgino);
>  void wait_for_wakeup(void);
> +void protect_secure_section(void);
>  void smp_kick_all_cpus(void);
>  
>  void flush_l3_cache(void);
> -- 
> 2.1.4
> 
> 


Re: [U-Boot] [PATCH v2 06/12] virt-dt: Allow reservation of the secure region when it is in a RAM carveout.

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 12:54:43PM +, Jan Kiszka wrote:
> From: Ian Campbell 
> 
> In this case the secure code lives in RAM, and hence needs to be reserved, but
> it has been relocated, so the reservation of __secure_start does not apply.
> 
> Add support for setting CONFIG_ARMV7_SECURE_RESERVE_SIZE to reserve such a
> region.
> 
> This will be used in a subsequent patch for Jetson-TK1

Using a memreserve and allowing the OS to map the memory but not poke it
can be problematic due to the potential of mismatched attributes between
the monitor and the OS.

If you're able to carve out the "secure" memory from the memory node(s),
then you should be safe from that.

Thanks,
Mark.

> 
> Signed-off-by: Ian Campbell 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/arm/cpu/armv7/virt-dt.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/virt-dt.c b/arch/arm/cpu/armv7/virt-dt.c
> index ad19e4c..eb95031 100644
> --- a/arch/arm/cpu/armv7/virt-dt.c
> +++ b/arch/arm/cpu/armv7/virt-dt.c
> @@ -96,6 +96,11 @@ int armv7_update_dt(void *fdt)
>   /* secure code lives in RAM, keep it alive */
>   fdt_add_mem_rsv(fdt, (unsigned long)__secure_start,
>   __secure_end - __secure_start);
> +#elif defined(CONFIG_ARMV7_SECURE_RESERVE_SIZE)
> + /* secure code has been relocated into RAM carveout, keep it alive */
> + fdt_add_mem_rsv(fdt,
> + CONFIG_ARMV7_SECURE_BASE,
> + CONFIG_ARMV7_SECURE_RESERVE_SIZE);
>  #endif
>  
>   return fdt_psci(fdt);
> -- 
> 2.1.4
> 
> 


Re: [U-Boot] [PATCH v2 12/12] tegra: Set CNTFRQ for secondary CPUs

2015-02-16 Thread Mark Rutland
On Mon, Feb 16, 2015 at 12:54:49PM +, Jan Kiszka wrote:
> We only set CNTFRQ in arch_timer_init for the boot CPU. But this has to
> happen for all cores.
> 
> Fixing this resolves problems of KVM with emulating the generic
> timer/counter.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/arm/cpu/armv7/tegra-common/psci.S | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/tegra-common/psci.S b/arch/arm/cpu/armv7/tegra-common/psci.S
> index b7501fb..119c246 100644
> --- a/arch/arm/cpu/armv7/tegra-common/psci.S
> +++ b/arch/arm/cpu/armv7/tegra-common/psci.S
> @@ -51,12 +51,25 @@ ENTRY(psci_arch_init)
>  
>   mrc p15, 0, r4, c0, c0, 5   @ MPIDR
>   and r4, r4, #7  @ number of CPUs in cluster
> +
> + adr r5, _sys_clock_freq
> + cmp r4, #0
> +
> + mrceq   p15, 0, r7, c14, c0, 0  @ read CNTFRQ from CPU0
> + streq   r7, [r5]
> +
> + ldrne   r7, [r5]
> + mcrne   p15, 0, r7, c14, c0, 0  @ write CNTFRQ to CPU1..3

Is it not possible to have a hook that uses the same variable as
arch_timer_init rather than doing a copy here? It seems a shame to
duplicate the effort.

Thanks,
Mark.

> +
>   bl  psci_get_cpu_stack_top
>   mov sp, r5
>  
>   bx  r6
>  ENDPROC(psci_arch_init)
>  
> +_sys_clock_freq:
> + .word
> +
>  ENTRY(psci_cpu_off)
>   bl psci_cpu_off_common
>  
> -- 
> 2.1.4
> 
> 


Re: [U-Boot] [RFC PATCH] ARM: Merge v7 and v8 outer cache operations

2015-02-12 Thread Mark Rutland
On Sat, Jan 31, 2015 at 03:08:54AM +, feng...@phytium.com.cn wrote:
> From: David Feng 
> 
> Armv7 and Armv8 allow an outer cache to exist. It is outside of the
> architecture-defined cache hierarchy and cannot be manipulated by
> architecture-defined instructions; it is processor specific.
> This patch merges v7_outer_cache_* and v8 l3_cache_*.

This commit message is a little misleading, though it probably makes
sense to have something of this sort for ARMv8. Info dump below.

Recently the ARMv8 architecture reference manual was clarified to
mention that any such system caches _must_ respect maintenance by VA,
and are affected by the architected instructions for this. The arm64
Linux port relies on this property.

Set/Way maintenance will not affect system caches. So if you want to
flush/empty the entire cache hierarchy, you will need to rely on a
mechanism specific to the outer cache implementation (rather than one
specific to the processor).

Additionally, the interconnect and cache hierarchies in ARMv8
implementations are becoming more complex, and it is more likely that
dirty lines may migrate arbitrarily between CPUs and the system caches.
Due to this you will need to ensure that CPU caches are disabled and
empty before system cache maintenance is performed (I don't know whether
your current sequences for ARMv7 ensure that).

Thanks,
Mark.

> 
> Signed-off-by: David Feng 
> ---
>  arch/arm/cpu/armv7/cache_v7.c|   22 +++---
>  arch/arm/cpu/armv7/cpu.c |2 +-
>  arch/arm/cpu/armv7/exynos/soc.c  |2 +-
>  arch/arm/cpu/armv7/mx6/soc.c |4 ++--
>  arch/arm/cpu/armv7/omap3/board.c |2 +-
>  arch/arm/cpu/armv7/omap4/hwinit.c|4 ++--
>  arch/arm/cpu/armv7/s5pc1xx/cache.c   |4 ++--
>  arch/arm/cpu/armv7/uniphier/cache_uniphier.c |   14 +++---
>  arch/arm/cpu/armv8/cache_v8.c|   21 -
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c   |2 +-
>  arch/arm/include/asm/armv7.h |7 ---
>  arch/arm/include/asm/cache.h |7 +++
>  arch/arm/include/asm/system.h|2 --
>  arch/arm/lib/cache-pl310.c   |   12 ++--
>  14 files changed, 57 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/cache_v7.c b/arch/arm/cpu/armv7/cache_v7.c
> index 0f9d837..7d4d5d3 100644
> --- a/arch/arm/cpu/armv7/cache_v7.c
> +++ b/arch/arm/cpu/armv7/cache_v7.c
> @@ -237,7 +237,7 @@ void invalidate_dcache_all(void)
>  {
> v7_maint_dcache_all(ARMV7_DCACHE_INVAL_ALL);
> 
> -   v7_outer_cache_inval_all();
> +   outer_cache_inval_all();
>  }
> 
>  /*
> @@ -248,7 +248,7 @@ void flush_dcache_all(void)
>  {
> v7_maint_dcache_all(ARMV7_DCACHE_CLEAN_INVAL_ALL);
> 
> -   v7_outer_cache_flush_all();
> +   outer_cache_flush_all();
>  }
> 
>  /*
> @@ -259,7 +259,7 @@ void invalidate_dcache_range(unsigned long start, unsigned long stop)
>  {
> v7_dcache_maint_range(start, stop, ARMV7_DCACHE_INVAL_RANGE);
> 
> -   v7_outer_cache_inval_range(start, stop);
> +   outer_cache_inval_range(start, stop);
>  }
> 
>  /*
> @@ -271,12 +271,12 @@ void flush_dcache_range(unsigned long start, unsigned long stop)
>  {
> v7_dcache_maint_range(start, stop, ARMV7_DCACHE_CLEAN_INVAL_RANGE);
> 
> -   v7_outer_cache_flush_range(start, stop);
> +   outer_cache_flush_range(start, stop);
>  }
> 
>  void arm_init_before_mmu(void)
>  {
> -   v7_outer_cache_enable();
> +   outer_cache_enable();
> invalidate_dcache_all();
> v7_inval_tlb();
>  }
> @@ -355,9 +355,9 @@ void invalidate_icache_all(void)
>  #endif
> 
>  /*  Stub implementations for outer cache operations */
> -__weak void v7_outer_cache_enable(void) {}
> -__weak void v7_outer_cache_disable(void) {}
> -__weak void v7_outer_cache_flush_all(void) {}
> -__weak void v7_outer_cache_inval_all(void) {}
> -__weak void v7_outer_cache_flush_range(u32 start, u32 end) {}
> -__weak void v7_outer_cache_inval_range(u32 start, u32 end) {}
> +__weak void outer_cache_enable(void) {}
> +__weak void outer_cache_disable(void) {}
> +__weak void outer_cache_flush_all(void) {}
> +__weak void outer_cache_inval_all(void) {}
> +__weak void outer_cache_flush_range(unsigned long start, unsigned long end) {}
> +__weak void outer_cache_inval_range(unsigned long start, unsigned long end) {}
> diff --git a/arch/arm/cpu/armv7/cpu.c b/arch/arm/cpu/armv7/cpu.c
> index 01cdb7e..07ad549 100644
> --- a/arch/arm/cpu/armv7/cpu.c
> +++ b/arch/arm/cpu/armv7/cpu.c
> @@ -47,7 +47,7 @@ int cleanup_before_linux(void)
>  * dcache_disable() in turn flushes the d-cache and disables MMU
>  */
> dcache_disable();
> -   v7_outer_cache_disable();
> +   outer_cache_disable();
> 
> /*
>  * After D-cache is flushed and before it is disabled there may
> diff --git a/arch/ar

Re: [U-Boot] [PATCH] ARMv8: Bug fix of dcache_disable()

2015-02-11 Thread Mark Rutland
On Wed, Feb 11, 2015 at 03:26:06AM +, FengHua wrote:
> 
> Hi Mark,
> Thank you for reviewing this patch.
> 
> > -----Original Message-----
> > From: "Mark Rutland" 
> > Sent Time: 2015-02-09 19:05:54 (Monday)
> > To: "feng...@phytium.com.cn" 
> > Cc: "u-boot@lists.denx.de" 
> > Subject: Re: [U-Boot] [PATCH] ARMv8: Bug fix of dcache_disable()
> > 
> > On Mon, Feb 09, 2015 at 08:51:59AM +, feng...@phytium.com.cn wrote:
> > > From: David Feng 
> > > 
> > > > The cache disable operation should be performed after flush_dcache_all().
> > > > If the cache disable operation is performed before flush_dcache_all(),
> > > > flush_dcache_all() stores data directly to memory, and that data may be
> > > > overwritten by the copy of the data held in the cache.
> > 
> > The reasoning above (and hence this patch) is wrong.
> > 
> > While the caches are on, they can allocate lines for any portion of the
> > address space with cacheable attributes, and can acquire dirty cache
> > lines from other CPUs. Additionally, there is no restriction preventing
> > lines from migrating between levels of cache while they are active.
> > 
> > So calling flush_dcache_all (which performs maintenance by Set/Way)
> > while the caches are enabled is wrong. Per the architecture it provides
> > no guarantee whatsoever.
> > 
> > To empty the caches by Set/Way, they must first be disabled. Note that
> > this only guarantees that the caches are empty; not where the data went.
> > Other CPUs might acquire dirty lines, or the data might only reach a
> > system cache rather than memory.
> > 
> > If you need certain portions of data to be flushed out to memory, then
> > those must be flushed by VA. If flush_dcache_all performs any memory
> > accesses before it has completed Set/Way maintenance, it is buggy.
> > 
> > Thanks,
> > Mark.
> You are right. If data accesses occur while flushing the cache with the cache
> enabled, the data may be brought into the cache again. In normal circumstances
> we cannot do this.
> But the problem is that flush_dcache_all is a C routine, so it will preserve
> its return address on the stack. If the cache is disabled first, the return
> address will be stored directly to memory, and if the stack has a copy in the
> cache, that data will be overwritten when the cache is flushed, and
> flush_dcache_all will get the wrong return address.
> 
> There should be no data access between disabling the cache and flushing it.
> U-Boot for aarch64 runs on only one processor, and the data that
> flush_dcache_all manipulates will not be used by the following routines.
> Simply adjusting the sequence can fix this bug, although it is not the best
> solution.

I don't follow:

* The compiler may generate writes between flush_dcache_all and
  set_sctlr (even in the absence of any explicit writes in source code),
  so the cache might allocate dirty lines that could be written back
  asynchronously later (when the cache is disabled), clobbering data we
  are using.

* The cache can allocate clean lines at any point before it is disabled
  (even in the middle of flush_dcache_all), so the cache will almost
  certainly not be empty once disabled. It won't write back clean lines,
  but these could mask data later if not invalidated.

* Set/Way operations aren't guaranteed to flush data to the PoC in the
  presence of a system cache like CCN, so we have no guarantee that
  we've pushed any data to the PoC. Per ARMv8 only maintenance by VA
  guarantees this (but luckily maintenance by VA is mandated to be
  respected by such system caches).
  
* While the cache is enabled lines could theoretically migrate between
  set/way slots mid-sequence (e.g. with speculative accesses and an
  exclusive L1/L2 configuration). I don't believe this currently happens
  in practice, but the architecture does not prevent this.

So I don't see that moving this maintenance solves any existing problem,
and it introduces new ones.

Maintenance by Set/Way was only intended for IMPLEMENTATION DEFINED
initialisation and for emptying a PE's caches prior to cutting power.
It doesn't make sense when caches are enabled, and doesn't provide any
guarantee as to where the data went.

Fundamentally, if flush_dcache_all accesses memory that is not already
clean to the PoC then it is broken. Likewise for any sequence for
disabling the caches.

The above isn't theoretical, we were hit by these issues in Linux. See
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5e051531447259e5df95c44bccb69979537c19e4

Thanks,
Mark.

> 
> Yours,
> David.
> 
> > 
> > > 
> > > Signed-off-by: David Feng 
> > > ---
> > >  arch/arm

Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-02-09 Thread Mark Rutland
[...]

> > > The solution that was discussed internally would involve having the
> > > secure monitor (U-Boot's PSCI implementation in this case) program the
> > > flow controller appropriately, point the CPU reset vectors to a location
> > > containing a WFI instruction and power up the CPUs. That way they should
> > > immediately be powergated when they reach the WFI instruction and the
> > > PSCI implementation would then be able to wake them up without accessing
> > > the PMC registers once the kernel has booted.
> > 
> > That sounds far, far better than I had hoped!
> > 
> > I guess we need to tell the kernel that portions of the PMC are reserved
> > by FW (in the sense that they must not be modified by the kernel rather
> > than that FW is going to poke them), to avoid mishaps.
> 
> I'm not sure we need even that. As I understand it the kernel can still
> touch all the registers and none of it should influence the CPU power-
> gating done by the secure monitor.
> 
> Well, I guess you'd need to make sure that the PMC driver doesn't try to
> powergate or unpowergate the CPU partitions, but since the cpuidle
> driver is the only one doing that it should resolve itself if a generic,
> PSCI-based cpuidle driver takes over instead of a Tegra-specific one.

This was my concern. It would be good to avoid a case where we
accidentally rely on some subtle interaction where both the FW and
kernel poke some registers in a particular way.

I guess we can check for the presence of an enable-method, and if there
is one don't register the Tegra-specific cpuidle driver; in that case we
expect the FW to own that side of things.

> > > Adding Peter. Please correct me if I misunderstood what we discussed.
> > > Can you also provide Ian with pointers to the registers that need to be
> > > programmed to make this work? I suspect that a lot of it can be gleaned
> > > from the cpuidle drivers in arch/arm/mach-tegra in the upstream Linux
> > > kernel.
> > > 
> > > Also adding Paul for visibility.
> > > 
> > > > One thing to bear in mind is that PSCI is only one user of the SMC
> > > > space. Per SMC calling convention, portions of the SMC ID space are
> > > > there to be used for other (vendor-specific) purposes.
> > > > 
> > > > So rather than extending PSCI, a parallel API could be implemented for
> > > > power control of other devices, and the backend could arbitrate the two
> > > > without the non-secure OS requiring implementation-specific mutual
> > > > exclusion.
> > > > 
> > > > I think this has been brought up internally previously; I'll go and poke
> > > > around in the area to see if we managed to figure out anything useful.
> > > > 
> > > > > Unfortunately this doesn't change on 64-bit Tegra at all.
> > > > 
> > > > I suspected as much. :/
> > > > 
> > > > How does this bode for the tegra132 dts [1] on LAKML at the moment? Is
> > > > it just the "nvidia,tegra132-pmc" device that needs to be poked by both
> > > > FW and kernel, or are other devices involved?
> > > 
> > > As I understand it, only the flow controller is involved with CPU power
> > > management once the above steps have been performed by the secure
> > > monitor. And I don't think anyone in the kernel would need access to the
> > > flow controller at that point either, so I think that problem resolved
> > > itself nicely.
> > > 
> > > Also note that the above should work as far back as Tegra30.
> > 
> > It would be amazing if we could gain PSCI for all the platforms that
> > covers!
> 
> It should be relatively easy to support at least Tegra114 with much the
> same code as Tegra124, and some slight changes on Tegra30. But yeah, it
> would be great to see this work.

Nice!

I should look into getting hold of a relevant platform; I only have a
(T20) AC100, and I guess that's a bit different at the system-level.

Thanks,
Mark.


Re: [U-Boot] [PATCH] ARMv8: Bug fix of dcache_disable()

2015-02-09 Thread Mark Rutland
On Mon, Feb 09, 2015 at 08:51:59AM +, feng...@phytium.com.cn wrote:
> From: David Feng 
> 
> The cache disable operation should be performed after flush_dcache_all().
> If the cache disable operation is performed before flush_dcache_all(),
> flush_dcache_all() stores data directly to memory, and that data may be
> overwritten by the copy of the data held in the cache.

The reasoning above (and hence this patch) is wrong.

While the caches are on, they can allocate lines for any portion of the
address space with cacheable attributes, and can acquire dirty cache
lines from other CPUs. Additionally, there is no restriction preventing
lines from migrating between levels of cache while they are active.

So calling flush_dcache_all (which performs maintenance by Set/Way)
while the caches are enabled is wrong. Per the architecture it provides
no guarantee whatsoever.

To empty the caches by Set/Way, they must first be disabled. Note that
this only guarantees that the caches are empty; not where the data went.
Other CPUs might acquire dirty lines, or the data might only reach a
system cache rather than memory.

If you need certain portions of data to be flushed out to memory, then
those must be flushed by VA. If flush_dcache_all performs any memory
accesses before it has completed Set/Way maintenance, it is buggy.
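
To illustrate what "flushed by VA" means structurally, here is a minimal
host-side sketch of walking a range one cache line at a time. It is an
assumption-laden outline rather than U-Boot's implementation:
for_each_cache_line(), line_op_t and record_last_line() are hypothetical
names. On real AArch64 hardware the per-line operation would be something
like DC CIVAC (clean and invalidate by VA to PoC) with a DSB after the
loop, and the line size would come from CTR_EL0; here the operation is a
plain callback so the walk itself can be checked on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-line operation; on hardware this would be the by-VA instruction. */
typedef void (*line_op_t)(uintptr_t line_addr, void *ctx);

/*
 * Apply op to every cache line overlapping [start, end).
 * Returns the number of lines visited. line_size must be a power of two.
 * Assumes end does not sit within one line of the top of the address space.
 */
static size_t for_each_cache_line(uintptr_t start, uintptr_t end,
				  size_t line_size, line_op_t op, void *ctx)
{
	size_t count = 0;
	uintptr_t addr;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	/* Round start down to a line boundary, then step line by line. */
	for (addr = start & ~(uintptr_t)(line_size - 1); addr < end;
	     addr += line_size) {
		op(addr, ctx);
		count++;
	}

	return count;
}

/* Example operation: remember the last line address that was visited. */
static void record_last_line(uintptr_t line_addr, void *ctx)
{
	*(uintptr_t *)ctx = line_addr;
}
```

The point of the shape is that only the lines covering the data you care
about are touched, which is what gives the guarantee Set/Way lacks.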

Thanks,
Mark.

> 
> Signed-off-by: David Feng 
> ---
>  arch/arm/cpu/armv8/cache_v8.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> index 9dbcdf2..dc2fc8c 100644
> --- a/arch/arm/cpu/armv8/cache_v8.c
> +++ b/arch/arm/cpu/armv8/cache_v8.c
> @@ -124,9 +124,10 @@ void dcache_disable(void)
>   if (!(sctlr & CR_C))
>   return;
>  
> + flush_dcache_all();
> +
>   set_sctlr(sctlr & ~(CR_C|CR_M));
>  
> - flush_dcache_all();
>   __asm_invalidate_tlb_all();
>  }
>  
> -- 
> 1.7.9.5
> 
> 


Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-02-05 Thread Mark Rutland
On Thu, Feb 05, 2015 at 11:44:25AM +, Thierry Reding wrote:
> On Fri, Jan 23, 2015 at 12:37:20PM +0000, Mark Rutland wrote:
> > On Fri, Jan 23, 2015 at 10:10:45AM +, Thierry Reding wrote:
> > > On Thu, Jan 22, 2015 at 07:20:15PM +, Mark Rutland wrote:
> > > > On Fri, Jan 16, 2015 at 09:12:59AM +, Thierry Reding wrote:
> > > > > On Thu, Jan 15, 2015 at 07:19:37PM +, Mark Rutland wrote:
> > > > > > On Wed, Jan 14, 2015 at 07:57:25AM +, Thierry Reding wrote:
> > > > > > > On Tue, Jan 13, 2015 at 07:44:50PM +, Ian Campbell wrote:
> > > > > > > > Hi Thierry,
> > > > > > > > 
> > > > > > > > I needed to boot my Jetson in NS mode (in order to boot Xen)
> > > > > > > > and was investigating the possibility of PSCI support when I
> > > > > > > > discovered that you had already started on it[0]. Hurrah!
> > > > > > > > 
> > > > > > > > I cherry-picked the relevant commit onto u-boot-tegra#master
> > > > > > > > and added a few more patches and now it boots correctly for me,
> > > > > > > > both running Xen (some Xen side patches are needed too) and
> > > > > > > > native Linux.
> > > > > > > > 
> > > > > > > > The main things which were needed were to rebase for some
> > > > > > > > recent Kconfig changes relating to virt and nonsec mode and to
> > > > > > > > arrange for the RAM used by the secure code to be reserved in
> > > > > > > > the FDT. I also reserved the RAM using the hardware
> > > > > > > > MC_SECURITY_CFG registers for good measure.
> > > > > > > 
> > > > > > > Great, those were all things that I had wanted to do but never got
> > > > > > > around to.
> > > > > > > 
> > > > > > > > I also pushed my tree to gitorious:
> > > > > > > > https://gitorious.org/ijc/u-boot jetson-psci-v1
> > > > > > > > 
> > > > > > > > I would Ack your patch, but I don't think you've posted it and
> > > > > > > > it has no S-o-b so that would seem a bit premature/rude of me.
> > > > > > > > For the same reason I've not actually included it in the series
> > > > > > > > posted (but it is in the gitorious branch).
> > > > > > > 
> > > > > > > Feel free to take ownership of that patch. I currently don't have
> > > > > > > the time to work on this and it seems you've made good progress
> > > > > > > on it.
> > > > > > > 
> > > > > > > It could probably use some cleanup because there's a bit of debug
> > > > > > > output still in there. Also...
> > > > > > > 
> > > > > > > > FWIW I think you could drop your stub versions of psci_cpu_off
> > > > > > > > and psci_cpu_suspend (assuming you don't want to implement
> > > > > > > > them) since the common code has stubs.
> > > > > > > 
> > > > > > > ... I'd think you'd need to implement these so that you can get
> > > > > > > proper suspend/resume support in the kernel. I've had to disable
> > > > > > > cpuidle (via #undef CONFIG_PM_SLEEP in
> > > > > > > arch/arm/mach-tegra/cpuidle-tegra114.c) in the kernel to make that
> > > > > > > code not powergate CPUs. Ideally I think the kernel would check
> > > > > > > that it's running with PSCI support and disable the cpuidle
> > > > > > > driver. Maybe that could be done by introducing a new cpuidle
> > > > > > > driver
> > > > > > > > >

Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-01-23 Thread Mark Rutland
[...]

> > > PSCI assumes that the FW is in full control of the registers it's
> > > poking. While a lock isn't necessarily bad, I suspect it's going to be
> > > very difficult to have that common across all users without the code
> > > becoming unmaintainable fast. I'd also hope that for arm64 we wouldn't
> > > need it.
> > > 
> > > When/how/why does the kernel poke these registers?
> > 
> > The PMC is what controls power partitions. Some of these partitions are
> > assigned to CPUs, others are assigned to things like SATA, PCIe, display
> > and so on. The problem is that if we manage the CPU power partitions via
> > the firmware, then they will conflict with calls that we need to make
> > from other drivers that need to gate or ungate the partitions for their
> > hardware. As I understand it there's no provision in PSCI to manage non-
> > CPU devices, so this is a problem.
> 
> Ok.
> 
> > So I think either firmware needs to control everything, in which case we
> > are going to need a new interface (or extend PSCI) or it mustn't control
> > anything, in which case we need custom code in the kernel for SMP. Well,
> > the other alternative would be the lock that we can grab in the
> > powergate API and the PSCI calls.
> 
> One reason I'm not so keen on a lock is I could imagine you'd need to
> grab this for CPU_SUSPEND calls (i.e. cpuidle), at which point all CPUs
> are going to contend for the lock all the time.
> 
> One thing to bear in mind is that PSCI is only one user of the SMC
> space. Per SMC calling convention, portions of the SMC ID space are
> there to be used for other (vendor-specific) purposes.
> 
> So rather than extending PSCI, a parallel API could be implemented for
> power control of other devices, and the backend could arbitrate the two
> without the non-secure OS requiring implementation-specific mutual
> exclusion.
> 
> I think this has been brought up internally previously; I'll go and poke
> around in the area to see if we managed to figure out anything useful.

It sounds like what we figured out internally is roughly what I stated
above:

Allocate some SMC calls in the SIP and/or OEM Service Calls range for
vendor-specific device power management, and have the implementation on
the secure side (which would do the actual register poking) arbitrate
with any other secure-side access to those registers (i.e. CPU power
management, which it will already have to arbitrate).
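A minimal sketch of how such vendor calls could be numbered under the SMC Calling Convention, for illustration only. The bit layout and owner numbers follow the calling convention; the helper smc_fast_id() and the function numbers 0x100/0x101 for power-gate control are hypothetical and do not describe an existing Tegra or U-Boot interface.

```c
#include <assert.h>
#include <stdint.h>

/* Owning Entity Numbers from the SMC Calling Convention. */
enum smc_owner {
	SMC_OWNER_SIP      = 2,	/* silicon partner (SoC vendor) services */
	SMC_OWNER_OEM      = 3,	/* OEM services */
	SMC_OWNER_STANDARD = 4,	/* standard services; PSCI lives here */
};

#define SMC_FAST_CALL	(UINT32_C(1) << 31)	/* atomic "fast" call */
#define SMC_64BIT	(UINT32_C(1) << 30)	/* SMC64 calling convention */

/* Build a fast-call function ID from owner, width and function number. */
static inline uint32_t smc_fast_id(unsigned int owner, int is_64bit,
				   uint16_t func)
{
	assert(owner <= 0x3f);	/* the owner field is six bits wide */

	return SMC_FAST_CALL | (is_64bit ? SMC_64BIT : 0) |
	       ((uint32_t)owner << 24) | func;
}

/* HYPOTHETICAL vendor calls; the function numbers are made up. */
#define SIP_POWERGATE_ON	smc_fast_id(SMC_OWNER_SIP, 0, 0x100)
#define SIP_POWERGATE_OFF	smc_fast_id(SMC_OWNER_SIP, 0, 0x101)
```

With this layout the PSCI v0.2 CPU_ON call (standard service, function 3) comes out as 0x84000003, while the hypothetical power-gate calls land in the separate SiP range, so the secure side can dispatch on the owner field and arbitrate both users of the PMC registers in one place.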

Thanks,
Mark.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-01-23 Thread Mark Rutland
On Fri, Jan 23, 2015 at 10:10:45AM +0000, Thierry Reding wrote:
> On Thu, Jan 22, 2015 at 07:20:15PM +0000, Mark Rutland wrote:
> > On Fri, Jan 16, 2015 at 09:12:59AM +0000, Thierry Reding wrote:
> > > On Thu, Jan 15, 2015 at 07:19:37PM +0000, Mark Rutland wrote:
> > > > On Wed, Jan 14, 2015 at 07:57:25AM +0000, Thierry Reding wrote:
> > > > > On Tue, Jan 13, 2015 at 07:44:50PM +0000, Ian Campbell wrote:
> > > > > > Hi Thierry,
> > > > > > 
> > > > > > > I needed to boot my Jetson in NS mode (in order to boot Xen) and was
> > > > > > > investigating the possibility of PSCI support when I discovered that you
> > > > > > > had already started on it[0]. Hurrah!
> > > > > > > 
> > > > > > > I cherry-picked the relevant commit onto u-boot-tegra#master and added a
> > > > > > > few more patches and now it boots correctly for me, both running Xen
> > > > > > > (some Xen side patches are needed too) and native Linux.
> > > > > > > 
> > > > > > > The main things which were needed were to rebase for some recent Kconfig
> > > > > > > changes relating to virt and nonsec mode and to arrange for the RAM used
> > > > > > > by the secure code to be reserved in the FDT. I also reserved the RAM
> > > > > > > using the hardware MC_SECURITY_CFG registers for good measure.
> > > > > 
> > > > > Great, those were all things that I had wanted to do but never got
> > > > > around to.
> > > > > 
> > > > > > I also pushed my tree to gitorious:
> > > > > > https://gitorious.org/ijc/u-boot jetson-psci-v1
> > > > > > 
> > > > > > > I would Ack your patch, but I don't think you've posted it and it has no
> > > > > > > S-o-b so that would seem a bit premature/rude of me. For the same reason
> > > > > > > I've not actually included it in the series posted (but it is in the
> > > > > > > gitorious branch).
> > > > > 
> > > > > Feel free to take ownership of that patch. I currently don't have the
> > > > > time to work on this and it seems you've made good progress on it.
> > > > > 
> > > > > > It could probably use some cleanup because there's a bit of debug output
> > > > > > still in there. Also...
> > > > > > 
> > > > > > > FWIW I think you could drop your stub versions of psci_cpu_off and
> > > > > > > psci_cpu_suspend (assuming you don't want to implement them) since the
> > > > > > > common code has stubs.
> > > > > > 
> > > > > > ... I'd think you'd need to implement these so that you can get proper
> > > > > > suspend/resume support in the kernel. I've had to disable cpuidle (via
> > > > > > #undef CONFIG_PM_SLEEP in arch/arm/mach-tegra/cpuidle-tegra114.c) in the
> > > > > > kernel to make that code not powergate CPUs. Ideally I think the kernel
> > > > > > would check that it's running with PSCI support and disable the cpuidle
> > > > > > driver. Maybe that could be done by introducing a new cpuidle driver
> > > > > > that checks for PSCI availability and uses it when present.
> > > > 
> > > > We have a generic CPUidle driver on arm64 which can use PSCI as a
> > > > backend; we should try to reuse that. The binding should certainly be
> > > > identical.
> > > 
> > > Is there any reason that driver needs to be ARM64-specific? I would've
> > > thought that there could be a generic PSCI driver that works anywhere.
> > 
> > Currently the arm and arm64 arch interfaces are a little different, but
> > with some work the bulk of the code could certainly be made common
> > (in drivers/firmware, perhaps).
> > 
> > > > It looks like the tegra124 dts in mainline doesn't use enable-method in
> > > > the DT, so a better option might be to fail early in cpuidle-tegra114.c 
> > > > if _any_ enable-method is present.
> > > 
> > > Yes, that sounds like a good plan. The absence of an enab

Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-01-22 Thread Mark Rutland
On Fri, Jan 16, 2015 at 09:12:59AM +0000, Thierry Reding wrote:
> On Thu, Jan 15, 2015 at 07:19:37PM +0000, Mark Rutland wrote:
> > On Wed, Jan 14, 2015 at 07:57:25AM +0000, Thierry Reding wrote:
> > > On Tue, Jan 13, 2015 at 07:44:50PM +0000, Ian Campbell wrote:
> > > > Hi Thierry,
> > > > 
> > > > I needed to boot my Jetson in NS mode (in order to boot Xen) and was
> > > > investigating the possibility of PSCI support when I discovered that you
> > > > had already started on it[0]. Hurrah!
> > > > 
> > > > I cherry-picked the relevant commit onto u-boot-tegra#master and added a
> > > > few more patches and now it boots correctly for me, both running Xen
> > > > (some Xen side patches are needed too) and native Linux.
> > > > 
> > > > The main things which were needed were to rebase for some recent Kconfig
> > > > changes relating to virt and nonsec mode and to arrange for the RAM used
> > > > by the secure code to be reserved in the FDT. I also reserved the RAM
> > > > using the hardware MC_SECURITY_CFG registers for good measure.
> > > 
> > > Great, those were all things that I had wanted to do but never got
> > > around to.
> > > 
> > > > I also pushed my tree to gitorious:
> > > > https://gitorious.org/ijc/u-boot jetson-psci-v1
> > > > 
> > > > I would Ack your patch, but I don't think you've posted it and it has no
> > > > S-o-b so that would seem a bit premature/rude of me. For the same reason
> > > > I've not actually included it in the series posted (but it is in the
> > > > gitorious branch).
> > > 
> > > Feel free to take ownership of that patch. I currently don't have the
> > > time to work on this and it seems you've made good progress on it.
> > > 
> > > It could probably use some cleanup because there's a bit of debug output
> > > still in there. Also...
> > > 
> > > > FWIW I think you could drop your stub versions of psci_cpu_off and
> > > > psci_cpu_suspend (assuming you don't want to implement them) since the
> > > > common code has stubs.
> > > 
> > > ... I'd think you'd need to implement these so that you can get proper
> > > suspend/resume support in the kernel. I've had to disable cpuidle (via
> > > #undef CONFIG_PM_SLEEP in arch/arm/mach-tegra/cpuidle-tegra114.c) in the
> > > kernel to make that code not powergate CPUs. Ideally I think the kernel
> > > would check that it's running with PSCI support and disable the cpuidle
> > > driver. Maybe that could be done by introducing a new cpuidle driver
> > > that checks for PSCI availability and uses it when present.
> > 
> > We have a generic CPUidle driver on arm64 which can use PSCI as a
> > backend; we should try to reuse that. The binding should certainly be
> > identical.
> 
> Is there any reason that driver needs to be ARM64-specific? I would've
> thought that there could be a generic PSCI driver that works anywhere.

Currently the arm and arm64 arch interfaces are a little different, but
with some work the bulk of the code could certainly be made common
(in drivers/firmware, perhaps).

> > It looks like the tegra124 dts in mainline doesn't use enable-method in
> > the DT, so a better option might be to fail early in cpuidle-tegra114.c 
> > if _any_ enable-method is present.
> 
> Yes, that sounds like a good plan. The absence of an enable-method would
> signal that a kernel-native method (if any) should be used.
> 
> And this reminds me that we still need to find a way to synchronize
> accesses to the powergate registers between secure firmware and the
> kernel. Tegra has a set of hardware semaphores, but it seems like those
> can only be used to synchronize between AVP and CPU, whereas for PSCI
> we'd need something to synchronize between two CPUs. Do you know of any
> existing mechanism to perform that type of synchronization?
> 
> Perhaps an option would be to add some sort of global lock in the kernel
> which the cpuidle driver can grab before issuing the SMC instruction.

PSCI assumes that the FW is in full control of the registers it's
poking. While a lock isn't necessarily bad, I suspect it's going to be
very difficult to have that common across all users without the code
becoming unmaintainable fast. I'd also hope that for arm64 we wouldn't
need it.

When/how/why does the kernel need to poke these registers?

Mark.


Re: [U-Boot] [PATCH v2 6/9] ARMv8: PCSI: Add generic ARMv8 PSCI code

2015-01-15 Thread Mark Rutland
Hi,

On Mon, Jan 12, 2015 at 08:56:45PM +0000, Arnab Basu wrote:
> Implement core support for PSCI. As this is generic code, it doesn't
> implement anything really useful (all the functions are returning
> Not Implemented).
> 
> This is largely ported from the similar code that exists for ARMv7
> 
> Signed-off-by: Arnab Basu 
> Cc: Bhupesh Sharma 
> Cc: Marc Zyngier 
> ---
>  arch/arm/cpu/armv8/Makefile      |   3 +-
>  arch/arm/cpu/armv8/psci.S        | 162 +++
>  arch/arm/include/asm/armv8/esr.h |  12 +++
>  3 files changed, 176 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/cpu/armv8/psci.S
>  create mode 100644 arch/arm/include/asm/armv8/esr.h
> 
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index 74c32b2..1c696ea 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -16,4 +16,5 @@ obj-y   += tlb.o
>  obj-y+= transition.o
>  obj-y+= cpu-dt.o
>  
> -obj-$(CONFIG_FSL_LSCH3) += fsl-lsch3/
> +obj-$(CONFIG_ARMV8_PSCI) += psci.o
> +obj-$(CONFIG_FSL_LSCH3)  += fsl-lsch3/
> diff --git a/arch/arm/cpu/armv8/psci.S b/arch/arm/cpu/armv8/psci.S
> new file mode 100644
> index 0000000..6028020
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/psci.S
> @@ -0,0 +1,162 @@
> +/*
> + * (C) Copyright 2014
> + * Arnab Basu 
> + * (C) Copyright 2015
> + * Arnab Basu 
> + *
> + * Based on arch/arm/cpu/armv7/psci.S
> + *
> + * SPDX-License-Identifier:  GPL-2.0+
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#define PSCI_FN(__id, __fn) \
> +.quad __id; \
> +.quad __fn
> +
> +.pushsection ._secure.text, "ax"
> +
> +ENTRY(psci_0_2_cpu_suspend_64)
> +ENTRY(psci_0_2_cpu_on_64)
> +ENTRY(psci_0_2_affinity_info_64)
> +ENTRY(psci_0_2_migrate_64)
> +ENTRY(psci_0_2_migrate_info_up_cpu_64)
> + mov x0, #ARM_PSCI_RET_NI /* Return -1 (Not Implemented) */
> + ret
> +ENDPROC(psci_0_2_cpu_suspend_64)
> +ENDPROC(psci_0_2_cpu_on_64)
> +ENDPROC(psci_0_2_affinity_info_64)
> +ENDPROC(psci_0_2_migrate_64)
> +ENDPROC(psci_0_2_migrate_info_up_cpu_64)
> +.weak psci_0_2_cpu_suspend_64
> +.weak psci_0_2_cpu_on_64
> +.weak psci_0_2_affinity_info_64
> +.weak psci_0_2_migrate_64
> +.weak psci_0_2_migrate_info_up_cpu_64

You also need to have MIGRATE_INFO_TYPE, which I didn't spot here or
elsewhere in this patch.

We need that to be able to detect and handle UP Trusted OSs (i.e.
firmware) which require a particular CPU to remain enabled at all
times. While mainline Linux doesn't have that yet, it's coming shortly.

Do you require that a particular CPU remains online? If so you will need
to have MIGRATE_INFO_TYPE return 1, and MIGRATE_INFO_UP_CPU return the
MPIDR.Aff* fields for the CPU which must remain on.

If any arbitrary CPU can be disabled, MIGRATE_INFO_TYPE should return 2
(Trusted OS is either not present or does not require migration).
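A minimal sketch of the two calls discussed above, written in C rather than in the patch's assembly for readability. The function IDs and return values follow the PSCI v0.2 specification; struct tos_config and both helper names are hypothetical and only stand in for however the platform records whether a CPU is pinned.

```c
#include <assert.h>
#include <stdint.h>

/* Function IDs from the PSCI v0.2 specification. */
#define PSCI_0_2_FN_MIGRATE_INFO_TYPE		UINT32_C(0x84000006)
#define PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU	UINT32_C(0xC4000007)

/* MIGRATE_INFO_TYPE return values from the PSCI v0.2 specification. */
enum psci_tos_type {
	PSCI_TOS_UP_MIGRATE	= 0,	/* UP Trusted OS, can be migrated */
	PSCI_TOS_UP_NO_MIGRATE	= 1,	/* UP Trusted OS, pinned to one CPU */
	PSCI_TOS_NOT_REQUIRED	= 2,	/* none present, or no migration needed */
};

/* MPIDR_EL1 affinity fields: Aff3 in [39:32], Aff2..Aff0 in [23:0]. */
#define MPIDR_AFF_MASK	UINT64_C(0xff00ffffff)

/* HYPOTHETICAL platform description, not part of the patch. */
struct tos_config {
	int pinned;		/* non-zero if one CPU must stay online */
	uint64_t resident_mpidr;	/* raw MPIDR_EL1 of that CPU */
};

static inline int32_t psci_migrate_info_type(const struct tos_config *cfg)
{
	return cfg->pinned ? PSCI_TOS_UP_NO_MIGRATE : PSCI_TOS_NOT_REQUIRED;
}

static inline uint64_t psci_migrate_info_up_cpu(const struct tos_config *cfg)
{
	/* Only meaningful when MIGRATE_INFO_TYPE reports a UP Trusted OS. */
	assert(cfg->pinned);

	/* Strip the non-affinity bits (e.g. bit 31) from the raw MPIDR. */
	return cfg->resident_mpidr & MPIDR_AFF_MASK;
}
```

A real implementation would also need entries for these IDs in _psci_0_2_table so that the dispatcher can reach them.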

Thanks,
Mark

> +
> +ENTRY(psci_0_2_psci_version)
> + mov x0, #2  /* Return Major = 0, Minor = 2*/
> + ret
> +ENDPROC(psci_0_2_psci_version)
> +
> +.align 4
> +_psci_0_2_table:
> + PSCI_FN(PSCI_0_2_FN_PSCI_VERSION, psci_0_2_psci_version)
> + PSCI_FN(PSCI_0_2_FN64_CPU_SUSPEND, psci_0_2_cpu_suspend_64)
> + PSCI_FN(PSCI_0_2_FN64_CPU_ON, psci_0_2_cpu_on_64)
> + PSCI_FN(PSCI_0_2_FN64_AFFINITY_INFO, psci_0_2_affinity_info_64)
> + PSCI_FN(PSCI_0_2_FN64_MIGRATE, psci_0_2_migrate_64)
> + PSCI_FN(PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU, psci_0_2_migrate_info_up_cpu_64)
> + PSCI_FN(0, 0)
> +
> +.macro   psci_enter
> + stp x29, x30, [sp, #-16]!
> + stp x27, x28, [sp, #-16]!
> + stp x25, x26, [sp, #-16]!
> + stp x23, x24, [sp, #-16]!
> + stp x21, x22, [sp, #-16]!
> + stp x19, x20, [sp, #-16]!
> + str x18, [sp, #-8]!
> + mrs x16, sp_el0
> + mrs x15, elr_el3
> + stp x15, x16, [sp, #-16]!
> +
> + /* Switching to Secure State to Execute U-Boot */
> + mrs x4, scr_el3
> + bic x4, x4, #1
> + msr scr_el3, x4
> +.endm
> +
> +.macro   psci_return
> + /* Switching to Non-Secure State to Execute OS */
> + mrs x4, scr_el3
> + orr x4, x4, #1
> + msr scr_el3, x4
> +
> + ldp x15, x16, [sp], #16
> + msr elr_el3, x15
> + msr sp_el0, x16
> + ldr x18, [sp], #8
> + ldp x19, x20, [sp], #16
> + ldp x21, x22, [sp], #16
> + ldp x23, x24, [sp], #16
> + ldp x25, x26, [sp], #16
> + ldp x27, x28, [sp], #16
> + ldp x29, x30, [sp], #16
> + eret
> +.endm
> +
> +ENTRY(_smc_psci)
> + psci_enter
> + adr x4, _psci_0_2_table
> +1:   ldp x5, x6, [x4]  /* Load PSCI function ID and target PC */
> + cbz x5, fn_not_found  /* If reach the end, bail out */
> + cmp x0, x5 /* If not matching, try next entry */
> + b.eq fn_call
> + add x4, x4, #16
> +  

Re: [U-Boot] [PATCH v1 0/4] Jetson-TK1 support for PSCI

2015-01-15 Thread Mark Rutland
On Wed, Jan 14, 2015 at 07:57:25AM +0000, Thierry Reding wrote:
> On Tue, Jan 13, 2015 at 07:44:50PM +0000, Ian Campbell wrote:
> > Hi Thierry,
> > 
> > I needed to boot my Jetson in NS mode (in order to boot Xen) and was
> > investigating the possibility of PSCI support when I discovered that you
> > had already started on it[0]. Hurrah!
> > 
> > I cherry-picked the relevant commit onto u-boot-tegra#master and added a
> > few more patches and now it boots correctly for me, both running Xen
> > (some Xen side patches are needed too) and native Linux.
> > 
> > The main things which were needed were to rebase for some recent Kconfig
> > changes relating to virt and nonsec mode and to arrange for the RAM used
> > by the secure code to be reserved in the FDT. I also reserved the RAM
> > using the hardware MC_SECURITY_CFG registers for good measure.
> 
> Great, those were all things that I had wanted to do but never got
> around to.
> 
> > I also pushed my tree to gitorious:
> > https://gitorious.org/ijc/u-boot jetson-psci-v1
> > 
> > I would Ack your patch, but I don't think you've posted it and it has no
> > S-o-b so that would seem a bit premature/rude of me. For the same reason
> > I've not actually included it in the series posted (but it is in the
> > gitorious branch).
> 
> Feel free to take ownership of that patch. I currently don't have the
> time to work on this and it seems you've made good progress on it.
> 
> It could probably use some cleanup because there's a bit of debug output
> still in there. Also...
> 
> > FWIW I think you could drop your stub versions of psci_cpu_off and
> > psci_cpu_suspend (assuming you don't want to implement them) since the
> > common code has stubs.
> 
> ... I'd think you'd need to implement these so that you can get proper
> suspend/resume support in the kernel. I've had to disable cpuidle (via
> #undef CONFIG_PM_SLEEP in arch/arm/mach-tegra/cpuidle-tegra114.c) in the
> kernel to make that code not powergate CPUs. Ideally I think the kernel
> would check that it's running with PSCI support and disable the cpuidle
> driver. Maybe that could be done by introducing a new cpuidle driver
> that checks for PSCI availability and uses it when present.

We have a generic CPUidle driver on arm64 which can use PSCI as a
backend; we should try to reuse that. The binding should certainly be
identical.

It looks like the tegra124 dts in mainline doesn't use enable-method in
the DT, so a better option might be to fail early in cpuidle-tegra114.c 
if _any_ enable-method is present.

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 1/2] Errata/ARM57: Add basic constructs to handle and apply A57 specific erratas

2015-01-15 Thread Mark Rutland
On Thu, Jan 15, 2015 at 06:10:57AM +0000, bhupesh.sha...@freescale.com wrote:
> Hi York,
> 
> > -Original Message-
> > From: Sun York-R58495
> > Sent: Wednesday, January 14, 2015 9:44 PM
> > On 01/14/2015 05:46 AM, Bhupesh Sharma wrote:
> > > This patch adds basic constructs in the ARMv8 u-boot code to handle
> > > and apply Cortex-A57 specific erratas.
> > >
> > > > As an example, the framework showcases how erratas 833069, 826974 and
> > > 828024 can be handled and applied.
> > >
> > > Later on this framework can be extended to include other erratas.
> > >
> > > Signed-off-by: Bhupesh Sharma 
> > > ---
> > > Changes from v1:
> > >   - Addressed York's comment about x29 usage and calling the
> > > > core errata fixup function before the lowlevel_init function
> > > is called.
> > >
> > > >  arch/arm/cpu/armv8/start.S   |   51 ++
> > > >  arch/arm/include/asm/macro.h |   20 +
> > >
> > > diff --git a/arch/arm/cpu/armv8/start.S b/arch/arm/cpu/armv8/start.S
> > > index 4b11aa4..df532f9 100644
> > > --- a/arch/arm/cpu/armv8/start.S
> > > +++ b/arch/arm/cpu/armv8/start.S
> > > @@ -67,6 +67,9 @@ reset:
> > >   msr cpacr_el1, x0   /* Enable FP/SIMD */
> > >  0:
> > >
> > > + /* Apply ARM core specific erratas */
> > > + bl  apply_core_errata
> > > +
> > >   /*
> > >* Cache/BPB/TLB Invalidate
> > > >    * i-cache is invalidated before enabled in icache_enable()
> > > > @@ -97,6 +100,54 @@ master_cpu:
> > > >
> > > > /*---*/
> > >
> > > +WEAK(apply_core_errata)
> > > +
> > > + /* For now, we support Cortex-A57 specific errata only */
> > > +
> > > + /* Check if we are running on a Cortex-A57 core */
> > > + branch_if_a57_core x0, 1f
> > > + b   2f
> > > +1:
> > > + bl  apply_a57_core_errata
> > > +
> > > +2:
> > > + ret
> > > +ENDPROC(apply_core_errata)
> > > +
> > 
> > Bhupesh,
> > 
> > Have you tested the new code? I don't think it handles LR correctly. Your
> > code will be stuck.
> > 
> 
> Yes, I have tested this on both the LS2085A simulator and emulator platforms.
> On emulator I tried u-boot boot-to-prompt and on simulator I tried linux 
> boot-to-prompt.
> Both seem to be working fine.

Has the apply_a57_core_errata function definitely been called in your
tests?

If so the lr should point immediately after it (i.e. at the ret), and so
the ret should branch to itself, repeatedly.

Mark.


Re: [U-Boot] [PATCH 0/8] PSCI v0.2 framework for ARMv8

2014-12-03 Thread Mark Rutland
On Wed, Nov 26, 2014 at 12:52:11PM +0000, Jan Kiszka wrote:
> On 2014-08-27 22:29, Arnab Basu wrote:
> > This series of patches creates a generic PSCI v0.2 framework for ARMv8.
> > 
> > The first 3 patches refactor existing code so that ARMv7 PSCI,
> > ARMv8 spin-table and ARMv8 PSCI can coexist.
> > 
> > The next 5 patches create a generic framework for PSCI v0.2 in ARMv8.
> > 
> > The implementation is modelled on the pre-existing PSCI v0.1 support
> > in ARMv7.
> > 
> > PSCI support patches for the ARMv8 Foundation model will follow shortly.
> > 
> 
> What's the status of this effort? I'll look into v0.2 support for v7
> soon, so I was wondering if there is something recent to possibly build
> upon / derive from.

When I asked on linux-arm-kernel [1], I was told that Arnab Basu has
left Freescale, but intends to continue with this series. I don't know
if we're likely to see anything newer at any point soon.

Thanks,
Mark.

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/307820.html


Re: [U-Boot] [PATCH] ARM: bootm: Allow booting in secure mode on hyp capable systems

2014-10-15 Thread Mark Rutland
[...]

> Other than this, are you really happy about granting the users full
> rights to allow booting the kernel in the secure mode via a simple
> environment variables tweak? Can't it potentially become a security
> breach in some scenarios?

U-Boot must be running in secure mode in order to boot a kernel in
secure mode. If U-Boot has been placed in secure mode with such an
option, there is obviously nothing in the secure world to protect. As
the user is in charge of booting the kernel, there is nothing in the
normal world to protect.

There is no security breach here.

> > > Or are you saying that it is really impossible to distinguish your
> > > use case of having the appended DT without resorting to the use of the
> > > environment config options?
> > 
> > Think of it. How do you find out about what the kernel wants? This is
> > just a blob...
> 
> The FDT blob has a header with an easily recognisable signature. So we
> can see the difference between the FDT and FEX blobs if the blob is
> provided to u-boot. And if no blob is provided at all, then we are sure
> that it can't be booted by the sunxi-3.4 kernel.

FEX vs DT is specific to sunxi, whereas an explicit boot mode option is
more generally useful. It is possible to have a kernel which can boot in
either mode, where the security state the kernel runs in is a user
choice, regardless of the presence or absence of a DTB.

Trying to guess how an OS will react and working around that is only
going to cause problems when that OS changes over time.

> I can see only one theoretically problematic scenario, where u-boot is
> provided with the non-FDT and non-FEX blob, but loads a kernel, which
> has FDT statically compiled in. How does this actually play with PSCI?

It would be completely orthogonal, just as the presence or absence of a
DTB is orthogonal to the presence or absence of PSCI.

> And what about the new device drivers model, which is going to depend
> on FDT information itself? Are we really happy allowing to use different
> FDT blobs for the u-boot and the kernel in the same system?

There are already differences between what U-Boot needs to know and the
kernel needs to know, e.g. secure peripherals if the kernel is booted in
a non-secure mode. So in general you might need separate DTBs; the
physical address spaces are different.

> Or have I missed something?
> 
> Either way, following the least surprise principle, IMHO u-boot should
> log the reason for making a decision about whether it is switching to
> the non-secure mode or not. This is useful for troubleshooting.

Printing a message would make sense regardless of how the mode is
selected.

Thanks,
Mark.


Re: [U-Boot] [PATCH 6/8] ARMv8: PSCI: Fixup the device tree for PSCI v0.2

2014-09-03 Thread Mark Rutland
On Tue, Sep 02, 2014 at 04:21:24PM +0100, Stuart Yoder wrote:
> > > The idea here is that if there is no PSCI specific (most likely secure)
> > > memory allocated in the system, the macro "CONFIG_ARMV8_SECURE_BASE"
> > > will not be defined. In this case the PSCI vector table and its support
> > > code will be in DDR and will be protected from Linux using memreserve.
> > 
> > Sure, this will prevent the OS from explicitly modifying this memory.
> > 
> > However, the OS will still map the memory. This renders the protection
> > incomplete due to the possibility of mismatched attributes and/or
> > unexpected cache hits resulting in nasty coherency problems. We are
> > likely to get away with this most of the time (if the kernel and U-Boot
> > use the same attributes), but it would be very easy to blow things up
> > accidentally.
> > 
> > The only way to prevent that is to completely remove a portion of the
> > memory from the view of the OS, such that it doesn't map the memory at
> > all.
> 
> Can't this be done by simply removing that secure portion of memory
> from the memory advertised in the memory node of the device tree passed
> to the non-secure OS?  ...should prevent the OS from mapping the memory.

Yes, removing such memory entirely from the memory nodes would work.

The only caveat (I believe) is that it would be necessary to remove such
memory in 2MB naturally-aligned chunks due to the way Linux maps memory.

I intend to at some point decouple the Linux linear mapping from the
text mapping, so that Linux can address memory below it. So it's vital
to remove the memory entirely from the view of the kernel rather than
just loading the kernel 2MB higher.
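A minimal sketch of the alignment arithmetic, assuming the firmware knows the start and size of its secure region and wants the smallest naturally-aligned 2MB span to remove from the memory node. struct mem_range and carve_out_2m() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M	UINT64_C(0x200000)

/* Half-open address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Widen [start, start + size) to naturally-aligned 2MB boundaries:
 * round the start down and the end up.
 */
static inline struct mem_range carve_out_2m(uint64_t start, uint64_t size)
{
	struct mem_range r;

	assert(size != 0);

	r.start = start & ~(SZ_2M - 1);
	r.end = (start + size + SZ_2M - 1) & ~(SZ_2M - 1);
	return r;
}
```

For example, a 64KB region at 0x80100000 widens to [0x80000000, 0x80200000), and a region that straddles a 2MB boundary costs two chunks.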

Mark.


Re: [U-Boot] [PATCH 6/8] ARMv8: PSCI: Fixup the device tree for PSCI v0.2

2014-09-02 Thread Mark Rutland
On Mon, Sep 01, 2014 at 07:43:18PM +0100, Mark Rutland wrote:
> Hi,
> 
> > >> diff --git a/arch/arm/cpu/armv8/cpu-dt.c b/arch/arm/cpu/armv8/cpu-dt.c
> > >> index 9792bc0..c2c8fe7 100644
> > >> --- a/arch/arm/cpu/armv8/cpu-dt.c
> > >> +++ b/arch/arm/cpu/armv8/cpu-dt.c
> > >> @@ -9,7 +9,69 @@
> > >>   #include 
> > >>
> > >>   #ifdef CONFIG_MP
> > >> +#ifdef CONFIG_ARMV8_PSCI
> > >>
> > >> +static void psci_reserve_mem(void *fdt)
> > >> +{
> > >> +#ifndef CONFIG_ARMV8_SECURE_BASE
> > >> +/* secure code lives in RAM, keep it alive */
> > >> +fdt_add_mem_rsv(fdt, (unsigned long)__secure_start,
> > >> +__secure_end - __secure_start);
> > >> +#endif
> > >> +}
> > >
> > > With PSCI I'd be worried about telling the OS about this memory at all.
> > >
> > > If the OS maps the memory we could encounter issues with mismatched
> > > aliases and/or unexpected hits in the D-cache, which can result in a
> > > > loss of ordering and/or visibility guarantees, which could break the PSCI
> > > implementation.
> > >
> > > With the KVM or trusted firmware PSCI implementations the (guest) OS
> > > cannot map the implementation's memory, preventing such problems. The
> > > arm64 Linux boot-wrapper is dodgy in this regard currently.
> > >
> > 
> > The idea here is that if there is no PSCI specific (most likely secure) 
> > memory allocated in the system, the macro "CONFIG_ARMV8_SECURE_BASE" 
> > will not be defined. In this case the PSCI vector table and its support 
> > code will be in DDR and will be protected from Linux using memreserve.
> 
> Sure, this will prevent the OS from explicitly modifying this memory.
> 
> However, the OS will still map the memory. This renders the protection
> incomplete due to the possibility of mismatched attributes and/or
> unexpected cache hits resulting in nasty coherency problems. We are
> likely to get away with this most of the time (if the kernel and U-Boot
> use the same attributes), but it would be very easy to blow things up
> accidentally.
> 
> The only way to prevent that is to completely remove a portion of the
> memory from the view of the OS, such that it doesn't map the memory at
> all.

To clarify:

If the PSCI implementation uses some memory not described to the OS
there is no problem. Ideally this would be some secure SRAM somewhere,
which the OS can never map. So if you are using some secure RAM then
there is no issue.

If the memory is described to the non-secure OS, then there can be
coherency issues unless either:

 * The caches are not in use at EL3. This necessitates something like
   bakery locks for synchronization.

 * The memory is mapped at EL3 as secure, and the core makes a
   distinction between secure and non-secure memory (see
   ID_AA64MMFR0_EL1.SNSMem). Otherwise mismatched attributes can cause
   coherency issues (see B2.9 "Mismatched memory attributes" in the ARM
   ARM).
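A minimal sketch of the SNSMem check mentioned in the second point, assuming the register value has already been read at EL3 (for example with "mrs x0, id_aa64mmfr0_el1"). The field position, bits [15:12], follows the ARMv8 ARM; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ID_AA64MMFR0_SNSMEM_SHIFT	12

/* Extract one 4-bit ID field from an ID register value. */
static inline unsigned int id_reg_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60 && shift % 4 == 0);

	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Non-zero if the core distinguishes secure from non-secure memory,
 * i.e. ID_AA64MMFR0_EL1.SNSMem reads as 0b0001.
 */
static inline int cpu_distinguishes_secure_mem(uint64_t id_aa64mmfr0)
{
	return id_reg_field(id_aa64mmfr0, ID_AA64MMFR0_SNSMEM_SHIFT) == 1;
}
```

If this returns zero, the second point does not apply and the memory has to be hidden from the OS or the caches left unused at EL3.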

Thanks,
Mark.


Re: [U-Boot] [PATCH 6/8] ARMv8: PSCI: Fixup the device tree for PSCI v0.2

2014-09-01 Thread Mark Rutland
Hi,

> >> diff --git a/arch/arm/cpu/armv8/cpu-dt.c b/arch/arm/cpu/armv8/cpu-dt.c
> >> index 9792bc0..c2c8fe7 100644
> >> --- a/arch/arm/cpu/armv8/cpu-dt.c
> >> +++ b/arch/arm/cpu/armv8/cpu-dt.c
> >> @@ -9,7 +9,69 @@
> >>   #include 
> >>
> >>   #ifdef CONFIG_MP
> >> +#ifdef CONFIG_ARMV8_PSCI
> >>
> >> +static void psci_reserve_mem(void *fdt)
> >> +{
> >> +#ifndef CONFIG_ARMV8_SECURE_BASE
> >> +  /* secure code lives in RAM, keep it alive */
> >> +  fdt_add_mem_rsv(fdt, (unsigned long)__secure_start,
> >> +  __secure_end - __secure_start);
> >> +#endif
> >> +}
> >
> > With PSCI I'd be worried about telling the OS about this memory at all.
> >
> > If the OS maps the memory we could encounter issues with mismatched
> > aliases and/or unexpected hits in the D-cache, which can result in a
> > > loss of ordering and/or visibility guarantees, which could break the PSCI
> > implementation.
> >
> > With the KVM or trusted firmware PSCI implementations the (guest) OS
> > cannot map the implementation's memory, preventing such problems. The
> > arm64 Linux boot-wrapper is dodgy in this regard currently.
> >
> 
> The idea here is that if there is no PSCI specific (most likely secure) 
> memory allocated in the system, the macro "CONFIG_ARMV8_SECURE_BASE" 
> will not be defined. In this case the PSCI vector table and its support 
> code will be in DDR and will be protected from Linux using memreserve.

Sure, this will prevent the OS from explicitly modifying this memory.

However, the OS will still map the memory. This renders the protection
incomplete due to the possibility of mismatched attributes and/or
unexpected cache hits resulting in nasty coherency problems. We are
likely to get away with this most of the time (if the kernel and U-Boot
use the same attributes), but it would be very easy to blow things up
accidentally.

The only way to prevent that is to completely remove a portion of the
memory from the view of the OS, such that it doesn't map the memory at
all.

> If this macro is defined the assumption is that it points to some 
> non-ddr location, say secure OCRAM. In this case U-Boot will copy the 
> PSCI vector table and its support code to that region and we are hoping 
> that this address space is not visible to the OS in the first place.

This makes sense, but was not the issue I was referring to.

> This is my understanding of the code, maybe Marc would like to comment 
> on if this was the thinking in ARMv7.

If we're doing this on ARMv7 then it is dodgy there too.

Marc, thoughts?

[...]

> >> +  }
> >> +
> >> +  nodeoff = fdt_path_offset(fdt, "/psci");
> >
> > We might need to search by compatible string. All psci nodes so far have
> > been called /psci, but that's not guaranteed. Linux looks for nodes
> > compatible with "arm,psci" and/or "arm,psci-0.2".
> >
> 
> I see that it is called "Main node" in the kernel documentation. Any 
> reason its name has not been fixed to "psci"? Is it too late to do that 
> and save myself some work here? :)

Unfortunately the canonical way to find the PSCI node is by compatible
string, and that's what Linux does. While we might be able to ensure all
in-tree dts follow this convention, it's not something that should be
relied upon.

Sorry :(
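A minimal, self-contained sketch of what matching by compatible string involves: a "compatible" property is a list of NUL-terminated strings, and a node matches if any entry equals the wanted string. The helper names here are hypothetical. In U-Boot itself the lookup would more likely use libfdt's fdt_node_offset_by_compatible() than the hard-coded "/psci" path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the NUL-separated string list `list` contains `want`. */
static inline int stringlist_contains(const char *list, size_t len,
				      const char *want)
{
	size_t wantlen;

	assert(want != NULL);
	wantlen = strlen(want);

	while (len > 0) {
		const char *nul = memchr(list, '\0', len);
		size_t entry;

		if (!nul)
			return 0;	/* malformed: unterminated entry */

		entry = (size_t)(nul - list);
		if (entry == wantlen && memcmp(list, want, wantlen) == 0)
			return 1;

		list += entry + 1;
		len -= entry + 1;
	}
	return 0;
}

/* Match either PSCI compatible string that Linux looks for. */
static inline int is_psci_node(const char *compat, size_t len)
{
	return stringlist_contains(compat, len, "arm,psci-0.2") ||
	       stringlist_contains(compat, len, "arm,psci");
}
```

Matching whole entries rather than prefixes matters here, since "arm,psci" is a prefix of "arm,psci-0.2".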

Cheers,
Mark.


Re: [U-Boot] [PATCH 2/8] ARM: PSCI: Alow arch specific DT patching

2014-08-28 Thread Mark Rutland
On Thu, Aug 28, 2014 at 11:51:08AM +0100, Arnab Basu wrote:
> Hi Mark
> 
> On 08/28/2014 03:40 PM, Mark Rutland wrote:
> > Hi Arnab,
> >
> > On Wed, Aug 27, 2014 at 09:29:55PM +0100, Arnab Basu wrote:
> >> Both ARMv7 and ARMv8 need to patch the device tree but the kind
> >> of patching done is different. This creates a function that can be
> >> defined by each architecture to handle the differences
> >
> > I have no problem with the patch, but what is it that we need to do
> > differently for ARMv7 and ARMv8?
> >
> 
> In ARMv7 there does not seem to be any code around to set 
> "enable-method" to "spin-table". I guess it is assumed that DTs already 
> come with this set.

That'll be a consequence of partial conversion from board files on the
32-bit side. For most platforms the SMP boot mechanism is still
implicit.

> For ARMv8 we would like to assume that "enable-method" is missing from 
> the cpu node and will be set either to "spin-table" or "psci" by the 
> boot loader.

Sounds good to me.

> So the difference is, for ARMv7 the "enable-method" is only modified in 
> case of PSCI, whereas for ARMv8 it is always modified.

Ok. Thanks for the description. :)

Mark.


Re: [U-Boot] [PATCH 6/8] ARMv8: PSCI: Fixup the device tree for PSCI v0.2

2014-08-28 Thread Mark Rutland
Hi,

On Wed, Aug 27, 2014 at 09:29:59PM +0100, Arnab Basu wrote:
> Set the enable-method in the cpu node to psci, create the psci
> device tree node and also add a reserve section for the psci code
> that lives in normal RAM, so that the kernel leaves it alone
> 
> Signed-off-by: Arnab Basu 
> Reviewed-by: Bhupesh Sharma 
> Cc: Marc Zyngier 
> ---
>  arch/arm/cpu/armv8/cpu-dt.c   |   67 
> +
>  arch/arm/include/asm/system.h |4 ++
>  2 files changed, 71 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv8/cpu-dt.c b/arch/arm/cpu/armv8/cpu-dt.c
> index 9792bc0..c2c8fe7 100644
> --- a/arch/arm/cpu/armv8/cpu-dt.c
> +++ b/arch/arm/cpu/armv8/cpu-dt.c
> @@ -9,7 +9,69 @@
>  #include 
>  
>  #ifdef CONFIG_MP
> +#ifdef CONFIG_ARMV8_PSCI
>  
> +static void psci_reserve_mem(void *fdt)
> +{
> +#ifndef CONFIG_ARMV8_SECURE_BASE
> + /* secure code lives in RAM, keep it alive */
> + fdt_add_mem_rsv(fdt, (unsigned long)__secure_start,
> + __secure_end - __secure_start);
> +#endif
> +}

With PSCI I'd be worried about telling the OS about this memory at all.

If the OS maps the memory we could encounter issues with mismatched
aliases and/or unexpected hits in the D-cache, which can result in a
loss of ordering and/or visibility guarantees, which could break the PSCI
implementation.

With the KVM or trusted firmware PSCI implementations the (guest) OS
cannot map the implementation's memory, preventing such problems. The
arm64 Linux boot-wrapper is dodgy in this regard currently.

> +static int cpu_update_dt_psci(void *fdt)
> +{
> + int nodeoff;
> + int tmp;
> +
> + psci_reserve_mem(fdt);
> +
> + nodeoff = fdt_path_offset(fdt, "/cpus");
> + if (nodeoff < 0) {
> + printf("couldn't find /cpus\n");
> + return nodeoff;
> + }
> +
> + /* add 'enable-method = "psci"' to each cpu node */
> + for (tmp = fdt_first_subnode(fdt, nodeoff);
> +  tmp >= 0;
> +  tmp = fdt_next_subnode(fdt, tmp)) {
> + const struct fdt_property *prop;
> + int len;
> +
> + prop = fdt_get_property(fdt, tmp, "device_type", &len);
> + if (!prop)
> + continue;
> + if (len < 4)
> + continue;
> + if (strcmp(prop->data, "cpu"))
> + continue;
> +
> + fdt_setprop_string(fdt, tmp, "enable-method", "psci");

Do we need to check the return code here, as we do when setting up the
psci node?

> + }
> +
> + nodeoff = fdt_path_offset(fdt, "/psci");

We might need to search by compatible string. All psci nodes so far have
been called /psci, but that's not guaranteed. Linux looks for nodes
compatible with "arm,psci" and/or "arm,psci-0.2".

> + if (nodeoff < 0) {
> + nodeoff = fdt_path_offset(fdt, "/");
> + if (nodeoff < 0)
> + return nodeoff;
> +
> + nodeoff = fdt_add_subnode(fdt, nodeoff, "psci");
> + if (nodeoff < 0)
> + return nodeoff;
> + }
> +
> + tmp = fdt_setprop_string(fdt, nodeoff, "compatible", "arm,psci-0.2");
> + if (tmp)
> + return tmp;
> + tmp = fdt_setprop_string(fdt, nodeoff, "method", "smc");
> + if (tmp)
> + return tmp;
> +
> + return 0;
> +}

Otherwise this looks fine.

Mark.


Re: [U-Boot] [PATCH 5/8] ARMv8: PCSI: Add generic ARMv8 PSCI code

2014-08-28 Thread Mark Rutland
Hi Arnab,

On Wed, Aug 27, 2014 at 09:29:58PM +0100, Arnab Basu wrote:
> Implement core support for PSCI. As this is generic code, it doesn't
> implement anything really useful (all the functions are returning
> Not Implemented).

This is really nice to see! Thanks for working on this.

Some functions which return NOT_IMPLEMENTED below are required to be
implemented per the PSCI 0.2 spec.

I hope that the plan is to implement all the required functions before
we turn this on?

Otherwise, comments below.

> This is largely ported from the similar code that exists for ARMv7
> 
> Signed-off-by: Arnab Basu 
> Reviewed-by: Bhupesh Sharma 
> Cc: Marc Zyngier 
> ---
>  arch/arm/cpu/armv8/Makefile |1 +
>  arch/arm/cpu/armv8/psci.S   |  171 
> +++
>  2 files changed, 172 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arm/cpu/armv8/psci.S
> 
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index 4f0ea87..8f6988d 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -15,3 +15,4 @@ obj-y   += cache.o
>  obj-y+= tlb.o
>  obj-y+= transition.o
>  obj-y+= cpu-dt.o
> +obj-$(CONFIG_ARMV8_PSCI) += psci.o
> diff --git a/arch/arm/cpu/armv8/psci.S b/arch/arm/cpu/armv8/psci.S
> new file mode 100644
> index 000..5f4e3b2
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/psci.S
> @@ -0,0 +1,171 @@
> +/*
> + * (C) Copyright 2014
> + * Arnab Basu 
> + *
> + * Based on arch/arm/cpu/armv7/psci.S
> + *
> + * SPDX-License-Identifier:  GPL-2.0+
> + */
> +
> +#include 
> +#include 
> +
> +.pushsection ._secure.text, "ax"
> +
> +ENTRY(psci_0_2_cpu_suspend_64)
> +ENTRY(psci_0_2_cpu_on_64)
> +ENTRY(psci_0_2_affinity_info_64)
> +ENTRY(psci_0_2_migrate_64)
> +ENTRY(psci_0_2_migrate_info_up_cpu_64)
> + mov x0, #ARM_PSCI_RET_NI/* Return -1 (Not Implemented) */
> + ret
> +ENDPROC(psci_0_2_cpu_suspend_64)
> +ENDPROC(psci_0_2_cpu_on_64)
> +ENDPROC(psci_0_2_affinity_info_64)
> +ENDPROC(psci_0_2_migrate_64)
> +ENDPROC(psci_0_2_migrate_info_up_cpu_64)
> +.weak psci_0_2_cpu_suspend_64
> +.weak psci_0_2_cpu_on_64
> +.weak psci_0_2_affinity_info_64
> +.weak psci_0_2_migrate_64
> +.weak psci_0_2_migrate_info_up_cpu_64
> +
> +ENTRY(psci_0_2_psci_version)
> + mov x0, #2  /* Return Major = 0, Minor = 2*/
> + ret
> +ENDPROC(psci_0_2_psci_version)
> +
> +.align 4
> +_psci_0_2_table:
> + .quad   PSCI_0_2_FN_PSCI_VERSION
> + .quad   psci_0_2_psci_version
> + .quad   PSCI_0_2_FN64_CPU_SUSPEND
> + .quad   psci_0_2_cpu_suspend_64
> + .quad   PSCI_0_2_FN64_CPU_ON
> + .quad   psci_0_2_cpu_on_64
> + .quad   PSCI_0_2_FN64_AFFINITY_INFO
> + .quad   psci_0_2_affinity_info_64
> + .quad   PSCI_0_2_FN64_MIGRATE
> + .quad   psci_0_2_migrate_64
> + .quad   PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU
> + .quad   psci_0_2_migrate_info_up_cpu_64
> + .quad   0
> + .quad   0

It would be nice if we could reorganise this something like:

    .quad PSCI_0_2_FN_PSCI_VERSION,           psci_0_2_psci_version
    .quad PSCI_0_2_FN64_CPU_SUSPEND,          psci_0_2_cpu_suspend_64
    .quad PSCI_0_2_FN64_CPU_ON,               psci_0_2_cpu_on_64
    .quad PSCI_0_2_FN64_AFFINITY_INFO,        psci_0_2_affinity_info_64
    .quad PSCI_0_2_FN64_MIGRATE,              psci_0_2_migrate_64
    .quad PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU,  psci_0_2_migrate_info_up_cpu_64
    .quad 0,                                  0

As that would make the relationship between IDs and functions clearer
(at least to me). Maybe a macro could make this less painful.
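
To picture the paired layout, here is a rough C analogue of an ID/handler
table built with a macro and terminated by an empty entry. The function IDs
are the PSCI 0.2 values. PSCI_ENTRY, the handler names and psci_dispatch()
are invented for this sketch and are not part of the patch, which does the
lookup in assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define PSCI_RET_NOT_SUPPORTED       (-1)

/* PSCI 0.2 function IDs (SMC32 for PSCI_VERSION, SMC64 for the rest) */
#define PSCI_0_2_FN_PSCI_VERSION     0x84000000u
#define PSCI_0_2_FN64_CPU_SUSPEND    0xc4000001u
#define PSCI_0_2_FN64_CPU_ON         0xc4000003u
#define PSCI_0_2_FN64_AFFINITY_INFO  0xc4000004u

typedef int64_t (*psci_fn)(void);

struct psci_entry {
    uint32_t id;    /* function ID passed by the caller */
    psci_fn fn;     /* handler; NULL marks the end of the table */
};

/* Keep each ID next to its handler, one pair per line */
#define PSCI_ENTRY(_id, _fn)    { (_id), (_fn) }

static int64_t psci_version(void)
{
    return 2;    /* major 0 in bits [31:16], minor 2 in bits [15:0] */
}

static int64_t psci_not_impl(void)
{
    return PSCI_RET_NOT_SUPPORTED;
}

static const struct psci_entry psci_table[] = {
    PSCI_ENTRY(PSCI_0_2_FN_PSCI_VERSION,    psci_version),
    PSCI_ENTRY(PSCI_0_2_FN64_CPU_SUSPEND,   psci_not_impl),
    PSCI_ENTRY(PSCI_0_2_FN64_CPU_ON,        psci_not_impl),
    PSCI_ENTRY(PSCI_0_2_FN64_AFFINITY_INFO, psci_not_impl),
    PSCI_ENTRY(0,                           NULL),
};

/* Walk the table until the terminator; unknown IDs are not supported */
static int64_t psci_dispatch(uint32_t id)
{
    const struct psci_entry *e;

    for (e = psci_table; e->fn; e++)
        if (e->id == id)
            return e->fn();

    return PSCI_RET_NOT_SUPPORTED;
}
```

An assembler macro taking the same two arguments and emitting two .quad
values would give the same one-pair-per-line shape in psci.S.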

> +.macro   psci_enter
> + stp x29, x30, [sp, #-16]!
> + stp x27, x28, [sp, #-16]!
> + stp x25, x26, [sp, #-16]!
> + stp x23, x24, [sp, #-16]!
> + stp x21, x22, [sp, #-16]!
> + stp x19, x20, [sp, #-16]!
> + stp x17, x18, [sp, #-16]!
> + stp x15, x16, [sp, #-16]!
> + stp x13, x14, [sp, #-16]!
> + stp x11, x12, [sp, #-16]!
> + stp x9, x10, [sp, #-16]!
> + stp x7, x8, [sp, #-16]!
> + stp x5, x6, [sp, #-16]!
> + mrs x5, elr_el3
> + stp x5, x4, [sp, #-16]!
> +
> + /* EL0 and El1 will execute in secure */

I think this would be better as:

/* U-Boot will run on the secure side */

> + mrs x4, scr_el3
> + bic x4, x4, #1
> + msr scr_el3, x4
> +.endm
> +
> +.macro   psci_return
> + /* EL0 and El1 will execute in non-secure */

Similarly:

/* The OS will run on the non-secure side */

> + mrs x4, scr_el3
> + orr x4, x4, #1
> + msr scr_el3, x4
> +
> + ldp x5, x4, [sp], #16
> + msr elr_el3, x5
> + ldp x5, x6, [sp], #16
> + ldp x7, x8, [sp], #16
> + ldp x9, x10, [sp], #16
> + ldp x11, x12, [sp], #16
> + ldp x13, x14, [sp], #16
> + ldp x15, x16, [sp], #16
> + ldp 

Re: [U-Boot] [PATCH 2/8] ARM: PSCI: Alow arch specific DT patching

2014-08-28 Thread Mark Rutland
Hi Arnab,

On Wed, Aug 27, 2014 at 09:29:55PM +0100, Arnab Basu wrote:
> Both ARMv7 and ARMv8 need to patch the device tree but the kind
> of patching done is different. This creates a function that can be
> defined by each architecture to handle the differences

I have no problem with the patch, but what is it that we need to do
differently for ARMv7 and ARMv8?

Mark.

> Signed-off-by: Arnab Basu 
> Reviewed-by: Bhupesh Sharma 
> Cc: Marc Zyngier 
> ---
>  arch/arm/cpu/armv7/virt-dt.c |7 ++-
>  arch/arm/lib/bootm-fdt.c |   11 ---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/virt-dt.c b/arch/arm/cpu/armv7/virt-dt.c
> index 0b0d6a7..3fbec39 100644
> --- a/arch/arm/cpu/armv7/virt-dt.c
> +++ b/arch/arm/cpu/armv7/virt-dt.c
> @@ -88,7 +88,7 @@ static int fdt_psci(void *fdt)
>   return 0;
>  }
>  
> -int armv7_update_dt(void *fdt)
> +static int armv7_update_dt(void *fdt)
>  {
>  #ifndef CONFIG_ARMV7_SECURE_BASE
>   /* secure code lives in RAM, keep it alive */
> @@ -98,3 +98,8 @@ int armv7_update_dt(void *fdt)
>  
>   return fdt_psci(fdt);
>  }
> +
> +int cpu_update_dt(void *fdt)
> +{
> + return armv7_update_dt(fdt);
> +}
> diff --git a/arch/arm/lib/bootm-fdt.c b/arch/arm/lib/bootm-fdt.c
> index d4f1578..daabc03 100644
> --- a/arch/arm/lib/bootm-fdt.c
> +++ b/arch/arm/lib/bootm-fdt.c
> @@ -21,6 +21,11 @@
>  
>  DECLARE_GLOBAL_DATA_PTR;
>  
> +__weak int cpu_update_dt(void *fdt)
> +{
> + return 0;
> +}
> +
>  int arch_fixup_fdt(void *blob)
>  {
>   bd_t *bd = gd->bd;
> @@ -34,11 +39,11 @@ int arch_fixup_fdt(void *blob)
>   }
>  
>   ret = fdt_fixup_memory_banks(blob, start, size, CONFIG_NR_DRAM_BANKS);
> -#if defined(CONFIG_ARMV7_NONSEC) || defined(CONFIG_ARMV7_VIRT)
> +
>   if (ret)
>   return ret;
>  
> - ret = armv7_update_dt(blob);
> -#endif
> + ret = cpu_update_dt(blob);
> +
>   return ret;
>  }
> -- 
> 1.7.7.4
> 
> 


Re: [U-Boot] [Patch v2 3/5] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-08-22 Thread Mark Rutland
Hi York,

> >> -   /*
> >> -* All processors will enter EL2 and optionally EL1.
> >> +slave_cpu:
> >> +   wfe
> >> +#ifdef CONFIG_FSL_SMP_RELEASE_ALL
> >> +   /* All cores are released from the address in the 1st spin table
> >> +* element
> >>  */
> >> -   bl  armv8_switch_to_el2
> >> -#ifdef CONFIG_ARMV8_SWITCH_TO_EL1
> >> -   bl  armv8_switch_to_el1
> >> +   ldr x1, =__spin_table
> >> +   ldr x0, [x1]
> >> +#else
> >> +   ldr x0, [x11]
> >> +#endif
> >> +   cbz x0, slave_cpu
> >
> > Similarly is there any reason to have the option of a single release
> > addr if we can support unique addresses?
> 
> I think it was used by Linux for some ARM parts. I'm personally not a fan of
> using single release.

That makes two of us. The single release address on those ARM dts is a
legacy mistake that we can't fix up without breaking some models. We
don't need to propagate that mistake to new platforms.

> But if it makes everyone happy, I can keep it.

I'd be happier with CONFIG_FSL_SMP_RELEASE_ALL dropped entirely. Ideally
U-Boot would always provide a unique cpu-release-address for each CPU. 
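
As a sketch of what "unique per CPU" amounts to: each core gets its own
slot in the spin table, and the slot's address is what goes into that CPU's
cpu-release-addr property as a 64-bit big-endian value. The element size of
64 bytes is an assumption for the example (the patch uses
SPIN_TABLE_ELEM_SIZE, whose value is not shown in this thread), and the
function names are made up.

```c
#include <stdint.h>

/* Assumed slot size; stands in for the patch's SPIN_TABLE_ELEM_SIZE */
#define SPIN_TABLE_ELEM_SIZE    64u

/* Address of the spin-table slot that belongs to a given core */
static uint64_t cpu_release_addr(uint64_t spin_table_base, unsigned int core)
{
    return spin_table_base + (uint64_t)core * SPIN_TABLE_ELEM_SIZE;
}

/* Encode a value as the 8 big-endian bytes a DT property carries */
static void put_fdt64(uint8_t out[8], uint64_t val)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(val >> (56 - 8 * i));
}
```

With one slot per core there is nothing left for a RELEASE_ALL style option
to select between.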

Thanks,
Mark.


Re: [U-Boot] [Patch v2 3/5] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-08-22 Thread Mark Rutland
Hi Bhupesh,

[...]

> > >> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> > >> b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> > >> new file mode 100644
> > >> index 000..2dbcdcb
> > >> --- /dev/null
> > >> +++ b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> > >> @@ -0,0 +1,56 @@
> > >> +/*
> > >> + * Copyright 2014 Freescale Semiconductor, Inc.
> > >> + *
> > >> + * SPDX-License-Identifier:GPL-2.0+
> > >> + */
> > >> +
> > >> +#include 
> > >> +#include 
> > >> +#include 
> > >> +#include "mp.h"
> > >> +
> > >> +#ifdef CONFIG_MP
> > >> +void ft_fixup_cpu(void *blob)
> > >> +{
> > >> +   int off;
> > >> +   __maybe_unused u64 spin_tbl_addr = (u64)get_spin_tbl_addr();
> > >> +   fdt32_t *reg;
> > >> +   int addr_cells;
> > >> +   u64 val;
> > >> +   size_t *boot_code_size = &(__secondary_boot_code_size);
> > >> +
> > >> +   off = fdt_node_offset_by_prop_value(blob, -1, "device_type",
> > >> + "cpus", 4);
> > >
> > > I didn't think /cpus had device_type = "cpus". I can't see any
> > > instances in any DTs I have to hand. Can we not find /cpus by path?
> >
> > I will let Arnab comment on this. He is coordinating with the Linux device
> > tree.
> 
> Since I contribute to the DTS for FSL ARMv8 SoC, here is my rationale behind
> the same. I have used the standard ARM cpu device-tree binding documentation
> as a reference (see [1]), which defines device_type and mentions that it
> should be set to cpu.
> 
> Please let me know if I am missing something.
> 
> [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cpus.txt

Hi. As Arnab replied, it's only the CPU nodes themselves (e.g.
/cpus/cpu@0) that have device_type = "cpu". The /cpus node does not have
a device_type at all.

The /cpus node can always be found by path however, as the name is
special.

[...]

> > >> --- a/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S
> > >> +++ b/arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S
> > >> @@ -8,7 +8,9 @@
> > >>
> > >>  #include 
> > >>  #include 
> > >> +#include 
> > >>  #include 
> > >> +#include "mp.h"
> > >>
> > >>  ENTRY(lowlevel_init)
> > >> mov x29, lr /* Save LR */
> > >> @@ -37,29 +39,119 @@ ENTRY(lowlevel_init)
> > >>
> > >> branch_if_master x0, x1, 1f
> > >>
> > >> +   ldr x0, =secondary_boot_func
> > >> +   blr x0
> > >> +
> > >> +1:
> > >> +2:
> > >
> > > Isn't the '2' label redundant?
> >
> > We have some internal code dealing with trust zone between the 1 and 2. It
> > is not likely to be used in long term since we are trying to move them
> > into security monitor. I can drop label 2 here.
> 
> U-Boot can still be booted in EL3, as it can well be booted without ATF or
> other EL3-capable software running before it. That's why we have CONFIG_EL3
> and CONFIG_EL2 code legs in the U-Boot ARMv8 code.

Ok. I was only confused by the fact the label didn't seem to be used
anywhere, and it sounds like it can be dropped for now?

[...]

> > >> -   /*
> > >> -* All processors will enter EL2 and optionally EL1.
> > >> +slave_cpu:
> > >> +   wfe
> > >> +#ifdef CONFIG_FSL_SMP_RELEASE_ALL
> > >> +   /* All cores are released from the address in the 1st spin table
> > >> +* element
> > >>  */
> > >> -   bl  armv8_switch_to_el2
> > >> -#ifdef CONFIG_ARMV8_SWITCH_TO_EL1
> > >> -   bl  armv8_switch_to_el1
> > >> +   ldr x1, =__spin_table
> > >> +   ldr x0, [x1]
> > >> +#else
> > >> +   ldr x0, [x11]
> > >> +#endif
> > >> +   cbz x0, slave_cpu
> > >
> > > Similarly is there any reason to have the option of a single release
> > > addr if we can support unique addresses?
> >
> > I think it was used by Linux for some ARM parts. I'm personally not a fan of
> > using single release. But if it makes everyone happy, I can keep it.
> 
> We followed the standard ARMv8 foundation model DTS initially, which along
> with others supported a single release address for all the cores. So, we
> wanted to comply with the same.

As I mentioned elsewhere, the existing DTS aren't good examples. The
fact that this isn't clear is a problem on the Linux side, and I'm sorry that
they have misled you.

Using unique addresses is preferred. Sharing a single address should be
discouraged.

[...]

> > >> diff --git a/arch/arm/include/asm/macro.h
> > >> b/arch/arm/include/asm/macro.h index f77e4b8..0009c28 100644
> > >> --- a/arch/arm/include/asm/macro.h
> > >> +++ b/arch/arm/include/asm/macro.h
> > >> @@ -105,6 +105,98 @@ lr .reqx30
> > >> cbz \xreg1, \master_label
> > >>  .endm
> > >>
> > >> +.macro armv8_switch_to_el2_m, xreg1
> > >> +   mov \xreg1, #0x5b1  /* Non-secure EL0/EL1 | HVC | 64bit EL2 */
> > >> +   msr scr_el3, \xreg1
> > >
> > > When dropping to EL1 from EL2 we disable HVC via HCR_EL2; presumably
> > > due to lack of a handler. Would it make sense to do similarly here and
> > > disable SMC here until we have a user (e.g. PSCI)?
> >
> > I will let 

Re: [U-Boot] [Patch v2 3/5] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-08-22 Thread Mark Rutland
Hi Arnab,

[...]

> >>> Is there any reason to have the SWITCH_TO_EL1 option other than for
> >>> debugging?
> >>
> >> Good question. I will let Arnab comment here.
> >>
> >>>
> >>> EL2 is the preferred EL to boot at for Linux and Xen (it gives far
> >>> more flexibility), and if dropping to EL1 is necessary I think it
> >>> would make more sense as a run-time option than a compile-time option.
> >>>
> 
> I don't think we plan to boot Linux in EL1. This is primarily here to
> maintain uniformity with "arch/arm/lib/bootm.c". If I remove it from
> here and it was ever defined in the config, then the boot core would
> enter Linux in EL1 while the secondaries entered Linux in EL2. I don't
> know if that breaks anything...

Linux will be very unhappy, we require that CPUs enter the kernel in a
consistent mode. So keeping the CPUs in the same mode is the most
important thing for now.

I don't see a reason other than debugging to boot any CPU at EL1N if EL2
is present (so I'm not keen on the CONFIG_ARMV8_SWITCH_TO_EL1 option at
all). The ideal case would be to always drop to the highest privileged
non-secure mode the CPU supports (i.e. EL2 if present, EL1N otherwise).

If there's an OS that requires EL1N rather than EL2 then I think that
should be the special case (so users get the option of EL2 features by
default, and the OS has more flexibility to fix things up at EL2 if
necessary).
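
A minimal sketch of that policy, assuming the caller has already read
ID_AA64PFR0_EL1 (for example with mrs at EL3): the EL2 field in bits [11:8]
is non-zero when EL2 is implemented. The enum and function names are
invented for the example.

```c
#include <stdint.h>

/* Target exception level for handing over to the OS */
enum boot_el { BOOT_EL1N = 1, BOOT_EL2 = 2 };

/* ID_AA64PFR0_EL1.EL2 lives in bits [11:8]; 0 means not implemented */
#define ID_AA64PFR0_EL2_SHIFT   8
#define ID_AA64PFR0_EL_MASK     0xfu

/* Pick the highest privileged non-secure level the CPU supports */
static enum boot_el pick_boot_el(uint64_t id_aa64pfr0)
{
    unsigned int el2 = (unsigned int)(id_aa64pfr0 >> ID_AA64PFR0_EL2_SHIFT) &
                       ID_AA64PFR0_EL_MASK;

    return el2 ? BOOT_EL2 : BOOT_EL1N;
}
```

Every CPU has to end up with the same answer, so the decision is best made
once and applied to the boot CPU and the secondaries alike.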

> The run-time option seems interesting and it would definitely work for
> the primary core which could access the u-boot env variables and such
> but the secondaries are executing assembly and the communication between
> cores is fairly primitive (sgi's and sev's etc) so this might require a
> little bit of work.
> 
> If you have any thoughts on how we can go about it, I would be glad to
> do some research, but that seems to be the topic for a separate patchset
> I guess.

I guess if secondaries were first dropped into a spin-table at EL2 you
could boot them into a shim that did something like:

 - Reset the cpu-release-addr for the CPU
 - Configure EL1
 - Drop to EL1
 - Return to the spin-table

So all that would be necessary is the sev and the usual polling to see
that the CPU has responded to the cpu-release-addr being written.
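
The steps above can be modelled in a few lines of C. This is a toy model
only: the struct and function names are made up, the "CPU" is just a struct,
and real code would involve cache maintenance, sev/wfe and the actual
exception-level switch.

```c
#include <stdint.h>

/* Toy model of one secondary CPU parked in a spin-table */
struct secondary {
    uint64_t release_addr;  /* 0 means "keep spinning" */
    int el;                 /* current exception level */
    int el1_configured;     /* set once EL1 state has been set up */
};

/* Boot CPU side: publish the shim address (real code then issues sev) */
static void release_to(struct secondary *cpu, uint64_t entry)
{
    cpu->release_addr = entry;
}

/* Secondary side: what the shim does once the spin loop jumps to it */
static void el1_shim(struct secondary *cpu)
{
    cpu->release_addr = 0;      /* 1. reset cpu-release-addr */
    cpu->el1_configured = 1;    /* 2. configure EL1 */
    cpu->el = 1;                /* 3. drop to EL1 */
    /* 4. return to the spin-table loop, now polling at EL1 */
}

/* Boot CPU side: poll until the slot reads zero again at the new level */
static int shim_done(const struct secondary *cpu)
{
    return cpu->release_addr == 0 && cpu->el == 1;
}
```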

As you say, that's a topic for a separate patchset.

> 
> >>>> -   /*
> >>>> -* All processors will enter EL2 and optionally EL1.
> >>>> +slave_cpu:
> >>>> +   wfe
> >>>> +#ifdef CONFIG_FSL_SMP_RELEASE_ALL
> >>>> +   /* All cores are released from the address in the 1st spin table
> >>>> +* element
> >>>>  */
> >>>> -   bl  armv8_switch_to_el2
> >>>> -#ifdef CONFIG_ARMV8_SWITCH_TO_EL1
> >>>> -   bl  armv8_switch_to_el1
> >>>> +   ldr x1, =__spin_table
> >>>> +   ldr x0, [x1]
> >>>> +#else
> >>>> +   ldr x0, [x11]
> >>>> +#endif
> >>>> +   cbz x0, slave_cpu
> >>>
> >>> Similarly is there any reason to have the option of a single release
> >>> addr if we can support unique addresses?
> >>
> >> I think it was used by Linux for some ARM parts. I'm personally not a fan of
> >> using single release. But if it makes everyone happy, I can keep it.
> >
> > We followed the standard ARMv8 foundation model DTS initially, which along
> > with others supported a single release address for all the cores. So, we
> > wanted to comply with the same.
> >
> 
> Yes this is left over code which should (and will) be cleaned up.

The foundation model DTS is unfortunately not a good example, and it's
too late to change it due to existing users.

Linux has supported unique cpu release addresses since the start of the
arm64 port, and it's the preferred implementation. The bootwrapper was a
quick hack to get things booting rather than a reference bootloader. It
does plenty of things wrong that I would like to fix.

I realise that this is not made clear at the moment. Putting together a
document with guidance for bootloaders has been on my TODO list, but
unfortunately it is at the far end.

[...]

> >>>> diff --git a/arch/arm/include/asm/macro.h
> >>>> b/arch/arm/include/asm/macro.h index f77e4b8..0009c28 100644
> >>>> --- a/arch/arm/include/asm/macro.h
> >>>> +++ b/arch/arm/include/asm/macro.h
> >>>> @@ -105,6 +105,98 @@ lr .reqx30
> >>>>    cbz \xreg1, \master_label
> >>>>  .endm
> >>>>
> >>>> +.macro armv8_switch_to_el2_m, xreg1
> >>>> +   mov \xreg1, #0x5b1  /* Non-secure EL0/EL1 | HVC | 64bit EL2 */
> >>>> +   msr scr_el3, \xreg1
> >>>
> >>> When dropping to EL1 from EL2 we disable HVC via HCR_EL2; presumably
> >>> due to lack of a handler. Would it make sense to do similarly here and
> >>> disable SMC here until we have a user (e.g. PSCI)?
> >>
> >> I will let Arnab comment here.
> >
> 
> SMC's are disabled (we are setting bit 7, the SMD bit). The comment does
> not capture this. I'll fix it.
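
For reference, 0x5b1 can be rebuilt from named SCR_EL3 bits, which makes
the SMD setting visible. The bit positions are the architectural ones; the
macro and function names are chosen for this sketch rather than taken from
U-Boot headers.

```c
#include <stdint.h>

/* SCR_EL3 bits as defined by the ARMv8-A architecture */
#define SCR_EL3_NS      (1u << 0)   /* EL0/EL1 are non-secure */
#define SCR_EL3_RES1    (3u << 4)   /* bits [5:4] are reserved, written as 1 */
#define SCR_EL3_SMD     (1u << 7)   /* SMC instructions disabled */
#define SCR_EL3_HCE     (1u << 8)   /* HVC instructions enabled */
#define SCR_EL3_RW      (1u << 10)  /* next lower level is AArch64 */

/* Value the macro loads before dropping from EL3 to EL2 */
static uint32_t scr_el3_for_el2(void)
{
    return SCR_EL3_NS | SCR_EL3_RES1 | SCR_EL3_SMD | SCR_EL3_HCE |
           SCR_EL3_RW;
}

/* True if the given SCR_EL3 value leaves SMC disabled */
static int smc_disabled(uint32_t scr)
{
    return (scr & SCR_EL3_SMD) != 0;
}
```

So the literal encodes "Non-secure EL0/EL1 | SMC disabled | HVC | 64bit
EL2", with bit 7 being the part the comment leaves out.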

Ah, sorry. I attempted to review the hex values manually but evidently I
got confused here. I've j

Re: [U-Boot] [Patch v2 3/5] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-08-21 Thread Mark Rutland
Hi York,

I have mostly minor comments this time; this is looking pretty good.

On Tue, Aug 19, 2014 at 09:28:00PM +0100, York Sun wrote:
> Secondary cores need to be released from holdoff by boot release
> registers. With GPP bootrom, they can boot from main memory
> directly. Individual spin table is used for each core. If a single
> release address is needed, defining macro CONFIG_FSL_SMP_RELEASE_ALL
> will use the CPU_RELEASE_ADDR. Spin table and the boot page is reserved
> in device tree so OS won't overwrite.
>
> Signed-off-by: York Sun 
> Signed-off-by: Arnab Basu 
> ---
>  v2: Removed copying boot page. Use u-boot image as is in memory.
>  Added dealing with different size of addr_cell in device tree.
>  Added dealing with big- and little-endian.
>  Added flushing spin table after cpu_release().
>
>  arch/arm/cpu/armv8/fsl-lsch3/Makefile |2 +
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c|   13 ++
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.h|1 +
>  arch/arm/cpu/armv8/fsl-lsch3/fdt.c|   56 +++
>  arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S   |  128 ---
>  arch/arm/cpu/armv8/fsl-lsch3/mp.c |  172 
> +
>  arch/arm/cpu/armv8/fsl-lsch3/mp.h |   36 +
>  arch/arm/cpu/armv8/transition.S   |   63 +---
>  arch/arm/include/asm/arch-fsl-lsch3/config.h  |3 +-
>  arch/arm/include/asm/arch-fsl-lsch3/immap_lsch3.h |   35 +
>  arch/arm/include/asm/macro.h  |   92 +++
>  arch/arm/lib/gic_64.S |   10 +-
>  common/board_f.c  |2 +-
>  13 files changed, 525 insertions(+), 88 deletions(-)
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/fdt.c
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.c
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.h
>
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/Makefile b/arch/arm/cpu/armv8/fsl-lsch3/Makefile
> index 9249537..f920eeb 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/Makefile
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/Makefile
> @@ -7,3 +7,5 @@
>  obj-y += cpu.o
>  obj-y += lowlevel.o
>  obj-y += speed.o
> +obj-$(CONFIG_MP) += mp.o
> +obj-$(CONFIG_OF_LIBFDT) += fdt.o
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> index c129d03..47b947f 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/cpu.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include "cpu.h"
> +#include "mp.h"
>  #include "speed.h"
>  #include 
>
> @@ -434,3 +435,15 @@ int cpu_eth_init(bd_t *bis)
>  #endif
> return error;
>  }
> +
> +
> +int arch_early_init_r(void)
> +{
> +   int rv;
> +   rv = fsl_lsch3_wake_seconday_cores();
> +
> +   if (rv)
> +   printf("Did not wake secondary cores\n");
> +
> +   return 0;
> +}
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/cpu.h b/arch/arm/cpu/armv8/fsl-lsch3/cpu.h
> index 28544d7..2e3312b 100644
> --- a/arch/arm/cpu/armv8/fsl-lsch3/cpu.h
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/cpu.h
> @@ -5,3 +5,4 @@
>   */
>
>  int fsl_qoriq_core_to_cluster(unsigned int core);
> +u32 cpu_mask(void);
> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/fdt.c b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> new file mode 100644
> index 000..2dbcdcb
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright 2014 Freescale Semiconductor, Inc.
> + *
> + * SPDX-License-Identifier:GPL-2.0+
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include "mp.h"
> +
> +#ifdef CONFIG_MP
> +void ft_fixup_cpu(void *blob)
> +{
> +   int off;
> +   __maybe_unused u64 spin_tbl_addr = (u64)get_spin_tbl_addr();
> +   fdt32_t *reg;
> +   int addr_cells;
> +   u64 val;
> +   size_t *boot_code_size = &(__secondary_boot_code_size);
> +
> +   off = fdt_node_offset_by_prop_value(blob, -1, "device_type", "cpus", 4);

I didn't think /cpus had device_type = "cpus". I can't see any
instances in any DTs I have to hand. Can we not find /cpus by path?

> +   of_bus_default_count_cells(blob, off, &addr_cells, NULL);
> +
> +   off = fdt_node_offset_by_prop_value(blob, -1, "device_type", "cpu", 4);
> +   while (off != -FDT_ERR_NOTFOUND) {
> +   reg = (fdt32_t *)fdt_getprop(blob, off, "reg", 0);
> +   if (reg) {
> +   val = spin_tbl_addr;
> +#ifndef CONFIG_FSL_SMP_RELEASE_ALL
> +   val += id_to_core(of_read_number(reg, addr_cells))
> +   * SPIN_TABLE_ELEM_SIZE;
> +#endif
> +   val = cpu_to_fdt64(val);
> +   fdt_setprop_string(blob, off, "enable-method",
> +  "spin-table");
> +   fdt_setprop(blob, off, "cpu-release-addr",
> +   &val, sizeof(val));
> + 

Re: [U-Boot] [PATCH v5 01/16] arm: ls102xa: Add Freescale LS102xA SoC support

2014-08-20 Thread Mark Rutland
On Wed, Aug 20, 2014 at 03:39:37AM +0100, AlisonWang wrote:
> Hi, Mark,
> 
> On Tue, Aug 19, 2014 at 03:54:50AM +0100, Alison Wang wrote:
> 
> > +int timer_init(void) 
> > +{ 
> > +   struct sctr_regs *sctr = (struct sctr_regs *)SCTR_BASE_ADDR; 
> > +   unsigned long ctrl, val, freq; 
> > + 
> > +   /* Enable System Counter */ 
> > +   writel(SYS_COUNTER_CTRL_ENABLE, &sctr->cntcr); 
> > + 
> > +   freq = GENERIC_TIMER_CLK; 
> > +   asm("mcr p15, 0, %0, c14, c0, 0" : : "r" (freq));
> 
> Is CNTFRQ initialised for both CPUs? 
> 
> [Alison Wang] No, only one CPU is booted now.

Ah, ok. I missed that.

> If the CPUs are booted at PL1 rather than PL2, is CNTVOFF initialised to 
> the same value on both CPUs? 
> 
> [Alison Wang] CNTVOFF is not initialized in the current secure mode. When
> we add virtualization support and switch to non-secure mode, we will
> initialize CNTVOFF to zero on both CPUs.

Ok. So long as CNTFRQ is also initialised that sounds fine.

Thanks,
Mark.


Re: [U-Boot] [PATCH v5 01/16] arm: ls102xa: Add Freescale LS102xA SoC support

2014-08-19 Thread Mark Rutland
Hi,

On Tue, Aug 19, 2014 at 03:54:50AM +0100, Alison Wang wrote:
> From: Wang Huan 
>
> The QorIQ LS1 family is built on Layerscape architecture,
> the industry's first software-aware, core-agnostic networking
> architecture to offer unprecedented efficiency and scale.
>
> Freescale LS102xA is a set of SoCs combines two ARM
> Cortex-A7 cores that have been optimized for high
> reliability and pack the highest level of integration
> available for sub-3 W embedded communications processors
> with Layerscape architecture and with a comprehensive
> enablement model focused on ease of programmability.
>
> Signed-off-by: Alison Wang 
> Signed-off-by: Jason Jin 
> Signed-off-by: Jingchang Lu 
> Signed-off-by: Prabhakar Kushwaha 
> ---
> Change log:
>  v5: No change.
>  v4: No change.
>  v3: Fix checkpatch errors.
>  v2: Add serdes support.
>  Update DDR frequency and data rate information.
>  Fix overflow condition error for the timer.

[...]

> +int timer_init(void)
> +{
> +   struct sctr_regs *sctr = (struct sctr_regs *)SCTR_BASE_ADDR;
> +   unsigned long ctrl, val, freq;
> +
> +   /* Enable System Counter */
> +   writel(SYS_COUNTER_CTRL_ENABLE, &sctr->cntcr);
> +
> +   freq = GENERIC_TIMER_CLK;
> +   asm("mcr p15, 0, %0, c14, c0, 0" : : "r" (freq));

Is CNTFRQ initialised for both CPUs?

If the CPUs are booted at PL1 rather than PL2, is CNTVOFF initialised to
the same value on both CPUs?

Thanks,
Mark.


Re: [U-Boot] [PATCH v2 2/4] cmd_bootm.c: Add 'booti' for ARM64 Linux kernel Images

2014-08-15 Thread Mark Rutland
On Thu, Aug 14, 2014 at 08:11:49PM +0100, Tom Rini wrote:
> On Thu, Aug 14, 2014 at 04:16:50PM +0100, Mark Rutland wrote:
> > Hi Tom,
> > 
> > On Thu, Aug 14, 2014 at 11:42:36AM +0100, Tom Rini wrote:
> > > The default format for arm64 Linux kernels is the "Image" format,
> > > described in Documentation/arm64/booting.txt.  This, along with an
> > > optional gzip compression on top is all that is generated by default.
> > > The Image format has a magic number within the header for verification,
> > > a text_offset where the Image must be run from, an image_size that
> > > includes the BSS and reserved fields.
> > > 
> > > This does not support automatic detection of a gzip compressed image.
> > > 
> > > Signed-off-by: Tom Rini 
> > > 
> > > ---
> > > Changes in v1:
> > > - Adopt to Mark Rutland's changes now in mainline kernel wrt text_offset
> > >   / image_size
> > > ---
> > >  README |1 +
> > >  common/cmd_bootm.c |  140 
> > > 
> > >  include/bootm.h|2 +-
> > >  3 files changed, 142 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/README b/README
> > > index 1d71359..b9af7ac 100644
> > > --- a/README
> > > +++ b/README
> > > @@ -959,6 +959,7 @@ The following options need to be configured:
> > >   CONFIG_CMD_BMP  * BMP support
> > >   CONFIG_CMD_BSP  * Board specific commands
> > >   CONFIG_CMD_BOOTD  bootd
> > > + CONFIG_CMD_BOOTI* ARM64 Linux kernel Image support
> > >   CONFIG_CMD_CACHE* icache, dcache
> > >   CONFIG_CMD_CLK  * clock command support
> > >   CONFIG_CMD_CONSOLEconinfo
> > > diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
> > > index 8b897c8d..843ec6e 100644
> > > --- a/common/cmd_bootm.c
> > > +++ b/common/cmd_bootm.c
> > > @@ -627,3 +627,143 @@ U_BOOT_CMD(
> > >   "boot Linux zImage image from memory", bootz_help_text
> > >  );
> > >  #endif   /* CONFIG_CMD_BOOTZ */
> > > +
> > > +#ifdef CONFIG_CMD_BOOTI
> > > +/* See Documentation/arm64/booting.txt in the Linux kernel */
> > > +struct Image_header {
> > > + uint32_t code0;        /* Executable code */
> > > + uint32_t code1;        /* Executable code */
> > > + uint64_t text_offset;  /* Image load offset, LE */
> > > + uint64_t image_size;   /* Effective Image size, LE */
> > > + uint64_t res1;         /* reserved */
> > > + uint64_t res2;         /* reserved */
> > > + uint64_t res3;         /* reserved */
> > > + uint64_t res4;         /* reserved */
> > > + uint32_t magic;        /* Magic number */
> > > + uint32_t res5;
> > > +};
> > > +
> > > +#define LINUX_ARM64_IMAGE_MAGIC  0x644d5241
> > > +
> > > +static int booti_setup(bootm_headers_t *images)
> > > +{
> > > + struct Image_header *ih;
> > > + uint64_t dst;
> > > +
> > > + ih = (struct Image_header *)map_sysmem(images->ep, 0);
> > > +
> > > + if (ih->magic != le32_to_cpu(LINUX_ARM64_IMAGE_MAGIC)) {
> > > + puts("Bad Linux ARM64 Image magic!\n");
> > > + return 1;
> > > + }
> > > + 
> > > + if (ih->image_size == 0) {
> > > + puts("Image lacks image_size field, assuming 16MiB\n");
> > > + ih->image_size = (16 << 20);
> > > + }
> > 
> > This should work for a defconfig, but it might be possible to build a
> > larger kernel. From experiments with an allyesconfig, I can build a
> > ~60MB kernel with ~20MB of uninitialised data after the end of the
> > Image.
> 
> Part of me just wants to error out in this case.  Today people are
> wrapping vmlinux up with a legacy header and making uImages.  My hope is
> that with this and 3.17 we can encourage Image/Image.*/FIT Image usage
> instead.  We could just as easily whack in 128MB, all the same.

Sure, it's unlikely that someone will build that big a (< v3.17) kernel
for reasons other than breaking things. I just thought I should mention
it in case this crops up again.
 
> > Modifying the Image feels a little dodgy, but I can't think of anything
> > this would break.
> 
> Yeah.  In my mind, an Image without this information is the corner case,
> not the normal case.  Doing it this way (a fixup to the data) means we
> don't have to error check this twice or play some other games.

Ok. As I said I can't think of anything this should break. This should
only affect older kernels so shouldn't be a problem going forward.

Prior to v3.17 you'll also find the text_offset field could be in an
arbitrary endianness, though it should always have the value 0x8. So if
you want to boot BE (< v3.17) kernels you'd have to fix that up too. Post
v3.17 it's subject to randomization.

Cheers,
Mark.
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH v2 2/4] cmd_bootm.c: Add 'booti' for ARM64 Linux kernel Images

2014-08-14 Thread Mark Rutland
Hi Tom,

On Thu, Aug 14, 2014 at 11:42:36AM +0100, Tom Rini wrote:
> The default format for arm64 Linux kernels is the "Image" format,
> described in Documentation/arm64/booting.txt.  This, along with an
> optional gzip compression on top is all that is generated by default.
> The Image format has a magic number within the header for verification,
> a text_offset where the Image must be run from, an image_size that
> includes the BSS, and reserved fields.
> 
> This does not support automatic detection of a gzip compressed image.
> 
> Signed-off-by: Tom Rini 
> 
> ---
> Changes in v1:
> - Adapt to Mark Rutland's changes now in mainline kernel wrt text_offset
>   / image_size
> ---
>  README |1 +
>  common/cmd_bootm.c |  140 
> 
>  include/bootm.h|2 +-
>  3 files changed, 142 insertions(+), 1 deletion(-)
> 
> diff --git a/README b/README
> index 1d71359..b9af7ac 100644
> --- a/README
> +++ b/README
> @@ -959,6 +959,7 @@ The following options need to be configured:
>                 CONFIG_CMD_BMP          * BMP support
>                 CONFIG_CMD_BSP          * Board specific commands
>                 CONFIG_CMD_BOOTD          bootd
> +               CONFIG_CMD_BOOTI        * ARM64 Linux kernel Image support
>                 CONFIG_CMD_CACHE        * icache, dcache
>                 CONFIG_CMD_CLK          * clock command support
>                 CONFIG_CMD_CONSOLE        coninfo
> diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
> index 8b897c8d..843ec6e 100644
> --- a/common/cmd_bootm.c
> +++ b/common/cmd_bootm.c
> @@ -627,3 +627,143 @@ U_BOOT_CMD(
>   "boot Linux zImage image from memory", bootz_help_text
>  );
>  #endif   /* CONFIG_CMD_BOOTZ */
> +
> +#ifdef CONFIG_CMD_BOOTI
> +/* See Documentation/arm64/booting.txt in the Linux kernel */
> +struct Image_header {
> +        uint32_t        code0;          /* Executable code */
> +        uint32_t        code1;          /* Executable code */
> +        uint64_t        text_offset;    /* Image load offset, LE */
> +        uint64_t        image_size;     /* Effective Image size, LE */
> +        uint64_t        res1;           /* reserved */
> +        uint64_t        res2;           /* reserved */
> +        uint64_t        res3;           /* reserved */
> +        uint64_t        res4;           /* reserved */
> +        uint32_t        magic;          /* Magic number */
> +        uint32_t        res5;
> +};
> +
> +#define LINUX_ARM64_IMAGE_MAGIC  0x644d5241
> +
> +static int booti_setup(bootm_headers_t *images)
> +{
> +        struct Image_header *ih;
> +        uint64_t dst;
> +
> +        ih = (struct Image_header *)map_sysmem(images->ep, 0);
> +
> +        if (ih->magic != le32_to_cpu(LINUX_ARM64_IMAGE_MAGIC)) {
> +                puts("Bad Linux ARM64 Image magic!\n");
> +                return 1;
> +        }
> +
> +        if (ih->image_size == 0) {
> +                puts("Image lacks image_size field, assuming 16MiB\n");
> +                ih->image_size = (16 << 20);
> +        }

This should work for a defconfig, but it might be possible to build a
larger kernel. From experiments with an allyesconfig, I can build a
~60MB kernel with ~20MB of uninitialised data after the end of the
Image.

Modifying the Image feels a little dodgy, but I can't think of anything
this would break.

> +
> +        /*
> +         * If we are not at the correct run-time location, set the new
> +         * correct location and then move the image there.
> +         */
> +        dst = gd->bd->bi_dram[0].start + le32_to_cpu(ih->text_offset);

This should be le64_to_cpu(ih->text_offset) to be strictly correct.

I wouldn't imagine we'd ever have a text_offset larger than 4GB, but it
would be nice to keep things consistent with the documentation and
kernel code.
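
To illustrate why the width matters, here is a minimal sketch (not the
patch's code) that decodes the same eight header bytes as a 64-bit and as
a 32-bit little-endian value. The helper names read_le64() and read_le32()
are hypothetical stand-ins; in U-Boot itself the conversion would be
le64_to_cpu() on the struct field.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 64-bit little-endian field from raw bytes. */
static uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	/* Byte 0 is the least significant byte of a little-endian field. */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical helper: a 32-bit decode only sees the low four bytes. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With a field whose upper half is non-zero, the 32-bit decode silently
drops the high bits, which is the inconsistency with the documentation
that the comment above is about.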

> +        if (images->ep != dst) {
> +                void *src;
> +
> +                debug("Moving Image from 0x%lx to 0x%llx\n", images->ep, dst);
> +
> +                src = (void *)images->ep;
> +                images->ep = dst;
> +                memmove((void *)dst, src, le32_to_cpu(ih->image_size));

Likewise.

> +        }
> +
> +        return 0;
> +}
> +
> +/*
> + * Image booting support
> + */
> +static int booti_start(cmd_tbl_t *cmdtp, int flag, int argc,
> +                        char * const argv[], bootm_headers_t *images)
> +{
> +        int ret;
> +        struct Image_header *ih;
> +
> +        ret = do_bootm_states(cmdtp, flag, argc, argv, BOOTM_STATE_START,
> +                              images, 1);
> +
> +        /* Setup Linux kernel Image entry point */
> +        if (!argc) {
> +                images->ep = load_addr;
> +                debug("*  kernel: default image load address = 0x%08lx\n",
> +                                load_addr);
> +        } else {
> +                images->ep = simple_strtoul(argv[0], NULL, 16);
> +                debug("*  kernel: cmdline image address = 0x%08lx\n",
> +                        images->ep);
> +        }
> +
> +        ret = booti_setup(images);
> +        if (ret != 0)
> +                return 1;
> +
>

Re: [U-Boot] [PATCH] ARM: HYP/non-sec: Add MIDR check to detect unsupported CPUs

2014-08-06 Thread Mark Rutland
On Wed, Aug 06, 2014 at 08:38:13AM +0100, Ian Campbell wrote:
> On Mon, 2014-08-04 at 16:14 +0100, Marc Zyngier wrote:
> 
> > My personal feeling is that booting in secure mode is always the wrong
> > thing to do.
> 
> FWIW I agree.
> 
> > If you want to go down the road of a single bootloader that is able to
> > run on several SOCs, then do it the proper way: parse the device tree
> > and have separate constraints for your SoC. But please don't blacklist
> > random cores just because it fits your environment.
> 
> I think there is a CPU feature register which indicates whether support
> for HYP mode is present, isn't there?

ID_PFR1[15:12] should tell you if the CPU has the virtualization
extensions.
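
As a rough sketch of that check (the function name is hypothetical, and
the caller is assumed to have already read ID_PFR1, e.g. with
"mrc p15, 0, <Rt>, c0, c1, 1"), the field can be tested like this:

```c
#include <stdint.h>

/*
 * ID_PFR1[15:12] is the Virtualization Extensions field: zero means not
 * implemented, non-zero means implemented.
 * Hypothetical helper; takes the raw register value as an argument.
 */
static int cpu_has_virt_ext(uint32_t id_pfr1)
{
	return ((id_pfr1 >> 12) & 0xf) != 0;
}
```

Taking the register value as a parameter keeps the bit test separate from
the privileged register read.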

> In which case a tolerable fix for now (going all the way DT is a big
> yak to shave...) would be to use that to decide between booting in
> NS.HYP vs NS.SVC (nb: not NS.HYP vs S.SVC).

That sounds ideal.

> I don't recall if the GIC has a feature bit for the security extensions,
> but if not then inferring it from the CPUs support wouldn't be the worst
> thing in the world under the circumstances.

GICD_TYPER[10] (SecurityExtn) should tell you if the GIC has the
security extensions. I don't know whether you'll encounter a platform
where the CPU and GIC are mismatched w.r.t. security extensions.
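
A minimal sketch of the GIC side of that check, assuming the caller has
already read GICD_TYPER (offset 0x004 from the distributor base); the
function name is hypothetical:

```c
#include <stdint.h>

/*
 * GICD_TYPER[10] (SecurityExtn) is set when the GIC implements the
 * Security Extensions.
 * Hypothetical helper; takes the raw register value as an argument.
 */
static int gic_has_security_ext(uint32_t gicd_typer)
{
	return (gicd_typer >> 10) & 1;
}
```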

Mark.


Re: [U-Boot] ARMv8 spin-table patches

2014-07-16 Thread Mark Rutland
On Mon, Jul 14, 2014 at 09:21:26PM +0100, York Sun wrote:
> On 07/14/2014 07:03 AM, Mark Rutland wrote:
> > Unfortunately I don't have an answer to that; the arm64 Linux spin-table
> > documentation and code were all written before I was involved.
> > 
> > The unfortunate truth is that a lot of the ARM DT and boot unification
> > work was done somewhat blindly, with many subtleties being lost. Someone
> > implemented spin-table with a shared address because it happened to be
> > easier, and then it got copied. Now that people are actively using it
> > it's not possible to remove it, and it's difficult to dissuade others
> > from following the crowd.
> > 
> > If U-Boot provides each CPU with its own unique address, then that would
> > be fantastic, and certainly avoids one nasty edge-case.
> 
> In the patch set I sent for review, each CPU has its own spin table. It has an
> option to use a single release address, or individual release address.

Ok, each having their own table is good.

I would strongly recommend against sharing the release address for the
reasons I described in my earlier email.

Cheers,
Mark.


Re: [U-Boot] ARMv8 spin-table patches

2014-07-14 Thread Mark Rutland
On Tue, Jul 08, 2014 at 04:48:00AM +0100, Scott Wood wrote:
> On Fri, 2014-07-04 at 10:29 +0100, Mark Rutland wrote:
> > Hi,
> > 
> > Apologies for the late reply.
> > 
> > On Fri, Jun 27, 2014 at 05:44:05PM +0100, Tom Rini wrote:
> > > On Fri, Jun 27, 2014 at 09:11:39AM -0700, York Sun wrote:
> > > 
> > > > Dear Albert, Wolfgang, Tom,
> > > > 
> > > > I have seen some patches for PSCI. We don't have PSCI enabled on
> > > > Freescale ARMv8 SoCs. Will spin-table patches be acceptable?
> > > 
> > > > Barring some technical reasons why no, you can't do that, yes, let's see
> > > the patches :)
> > 
> > I'd point out that it's decidedly sub-optimal as spin-table provides no
> > provision for CPU hotplug (which for Linux will affect kexec and other
> > features relying on CPU hotplug support).
> 
> I don't think we're ruling PSCI out entirely, but we don't have it yet
> and it's not imminent.  Currently we don't have any firmware that stays
> resident after Linux takes over (the tiny spin table code is not
> comparable to something that needs to be able to provide runtime
> services to a core that has already run OS code).

Sure. I understand that's the case. I just wanted to point out the
limitations of spin-table over alternatives.

I would point out that a trivial PSCI implementation (for version 0.1)
which only provides CPU_ON and CPU_OFF can be relatively small
(especially if CPUs just spin at EL3), and that provides the major
functionality for the use-cases described above.

> > Additionally, spin-table has the unfortunate property of allowing the
> > > firmware to throw an unbounded number of CPUs into the kernel at once
> > (when they share a cpu-release-addr), where they can spend a lot of time
> > spinning pointlessly (executing kernel code from memory and possibly
> > fetching it into I-caches) depending on the number of events a CPU
> > happens to generate at runtime.
> 
> Why do some ARM implementations of the spin table use the same address
> for all CPUs?  It looks like ARM's use of the spin table was patterned
> after what we do on PPC (Documentation/devicetree/bindings/arm/cpus.txt
> even claims to follow ePAPR v1.1), but PPC always had a separate release
> address for each CPU, plus the ability to set a register to give the CPU
> some context after spinning up.

Unfortunately I don't have an answer to that; the arm64 Linux spin-table
documentation and code were all written before I was involved.

The unfortunate truth is that a lot of the ARM DT and boot unification
work was done somewhat blindly, with many subtleties being lost. Someone
implemented spin-table with a shared address because it happened to be
easier, and then it got copied. Now that people are actively using it
it's not possible to remove it, and it's difficult to dissuade others
from following the crowd.

If U-Boot provides each CPU with its own unique address, then that would
be fantastic, and certainly avoids one nasty edge-case.
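
For illustration only, the difference between the two schemes comes down
to how the cpu-release-addr value is computed for each CPU. This is a
sketch under assumptions: the function and parameter names are made up,
and entry_size stands in for whatever per-core spin-table entry size the
firmware chooses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the release address advertised for one core.
 * With a shared release address every core gets the table base; otherwise
 * each core gets its own entry, so releasing one core wakes only that core.
 */
static uint64_t cpu_release_addr(uint64_t spin_tbl_base, unsigned int core,
				 uint64_t entry_size, int share_release_addr)
{
	if (share_release_addr)
		return spin_tbl_base;

	return spin_tbl_base + (uint64_t)core * entry_size;
}
```

With unique addresses, a write to one entry cannot release other cores
into the kernel at the same time.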

Thanks,
Mark.


Re: [U-Boot] [Patch v1 2/4] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-07-14 Thread Mark Rutland
> > >> @@ -119,3 +107,94 @@ ENTRY(lowlevel_init)
> > >> mov lr, x29 /* Restore LR */
> > >> ret
> > >>  ENDPROC(lowlevel_init)
> > >> +
> > >> +   /* Keep literals not used by the secondary boot page outside it */
> > >> +   .ltorg
> > >> +
> > >> +   .align 4
> > >
> > > That looks like a small alignment for a page.
> > >
> > > Should this be larger? Or is the "page" a misnomer here?
> >
> > I think as long as it is aligned to the instruction size and keeps "ldr"
> > happy, it is OK. The code will be copied to the beginning of DDR to run. Any
> > concern here?
> >
> 
> "page" is definitely a misnomer here, the comment (and maybe the label
> below) should probably be altered.
> The align directive is fine I guess.

I think it can be dropped to .align 2 if it's just to keep the
instruction stream aligned as per York's comment, but it's not harmful
to have greater alignment (just slightly confusing when trying to figure
out why the .align is there).

[...]

> > >> +#if defined(CONFIG_GICV3)
> > >> +   gic_wait_for_interrupt_m x0
> > >> +#endif
> > >> +
> > >> +   bl secondary_switch_to_el2
> > >> +#ifdef CONFIG_ARMV8_SWITCH_TO_EL1
> > >> +   secondary_switch_to_el1
> > >> +#endif
> > >> +
> > >> +slave_cpu:
> > >> +   wfe
> > >> +#ifdef CONFIG_FSL_SMP_RELEASE_ALL
> > >> +   ldr x1, =CPU_RELEASE_ADDR
> > >> +   ldr x0, [x1]
> > >> +#else
> > >> +   ldr x0, [x11]
> > >> +   tbnz    x0, #0, slave_cpu
> > >> +#endif
> > >> +   cbz x0, slave_cpu
> > >> +   br  x0  /* branch to the given address */
> > >
> > > Just to check, I take it CPUs won't ever be in a big-endian mode at
> > > this point?
> >
> > Don't know yet. Any concern if big-endian here?
> 
> I think we missed something here. Mark please correct me if I am
> wrong.  If the CPU is big-endian then we will need to convert the
> address contained at "CPU_RELEASE_ADDR", since it will always be
> written as a "single 64-bit little-endian value" (quoting from
> Documentation/arm64/booting.txt).

It sounds like you figured it out.

As far as I am aware, all you need to do is byte-swap the value if CPUs
are big-endian at this point. Linux will configure the CPUs to the
endianness it desires before it makes any explicit memory accesses.
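
A sketch of that fix-up in C rather than assembly (in the spin loop itself
a single "rev" instruction would do the same job). The function names are
hypothetical, and cpu_is_big_endian is assumed to reflect the CPU's data
endianness at the time of the load:

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/*
 * Hypothetical helper: the release address is always stored as a single
 * 64-bit little-endian value, so a big-endian CPU must byte-swap what it
 * loaded before branching to it.
 */
static uint64_t release_addr_from_load(uint64_t loaded, int cpu_is_big_endian)
{
	return cpu_is_big_endian ? swap64(loaded) : loaded;
}
```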

> 
> >
> > >
> > >> +ENDPROC(secondary_boot_func)
> > >> +
> > >> +ENTRY(secondary_switch_to_el2)
> > >> +   switch_el x0, 1f, 0f, 0f
> > >> +0: ret
> > >> +1: armv8_switch_to_el2_m x0
> > >> +ENDPROC(secondary_switch_to_el2)
> > >> +
> > >> +ENTRY(secondary_switch_to_el1)
> > >> +   switch_el x0, 0f, 1f, 0f
> > >> +0: ret
> > >> +1: armv8_switch_to_el1_m x0, x1
> > >> +ENDPROC(secondary_switch_to_el1)
> > >> +
> > >> +   /* Ensure that the literals used by the secondary boot page are
> > >> +* assembled within it
> > >> +*/
> > >> +   .ltorg
> > >> +
> > >> +   .align 4
> > >
> > > Similarly to above, this looks like a small alignment for a page.
> >
> > Please suggest a proper alignment.
> >
> 
> Think this is confusion caused by our use of the term "secondary boot
> page". I don't think it needs to be sized or aligned as a page. We
> should probably change our terminology.

If there's some better terminology we could use, it would certainly make
things clearer. I must admit that the alternatives I came up with
weren't much better. "Secondary boot region", perhaps?

[...]

> > >> +   /* Initialize SCTLR_EL2 */
> > >> +   msr sctlr_el2, xzr
> > >
> > > What about the RES1 bits (e.g. bits 29 & 28)?
> > >
> > > We don't seem to initialise them before the eret.
> >
> > I can't answer this question and below. Adding Arnab as the original
> > author for these changes.
> >
> > York
> >
> 
> You are right, will fix this. According to the ARMv8 ARM, RES1 bits
> "should be one or preserved".  What should the preferred approach be
> here? Write one or read modify and update the required bits?

As this seems to be the first initialisation of sctlr_el2, I believe the
correct thing to do is to write one for those bits, as their value may
be UNKNOWN (if not hardwired).

Per my reading of the ARM ARM's description of SBOP, after
initialization read-modify-write preserving the value of those bits is
the preferred way of modifying the register.
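
The two cases can be sketched as follows. This is an illustration under
assumptions: the RES1 bit positions below are the ones I believe apply to
SCTLR_EL2 in the original ARMv8.0 definition and should be checked against
the ARM ARM revision in use, and the helper names are hypothetical.

```c
#include <stdint.h>

#define BIT32(n)	(UINT32_C(1) << (n))

/* Assumed SCTLR_EL2 RES1 bits (ARMv8.0); verify against the ARM ARM. */
#define SCTLR_EL2_RES1	(BIT32(4)  | BIT32(5)  | BIT32(11) | BIT32(16) | \
			 BIT32(18) | BIT32(22) | BIT32(23) | BIT32(28) | \
			 BIT32(29))

/* First initialisation: the old value may be UNKNOWN, so write RES1 as 1. */
static uint32_t sctlr_el2_initial(void)
{
	return SCTLR_EL2_RES1;
}

/* Later updates: read-modify-write, leaving RES1 bits as they were read. */
static uint32_t sctlr_el2_modify(uint32_t cur, uint32_t set, uint32_t clear)
{
	set &= ~SCTLR_EL2_RES1;
	clear &= ~SCTLR_EL2_RES1;
	return (cur & ~clear) | set;
}
```

In the macro under review this would mean moving the RES1 constant into a
scratch register and writing that to sctlr_el2, instead of writing xzr.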

> 
> > >
> > >> +
> > >> +   /* Return to the EL2_SP2 mode from EL3 */
> > >> +   mov \xreg1, sp
> > >> +   msr sp_el2, \xreg1  /* Migrate SP */
> > >> +   mrs \xreg1, vbar_el3
> > >> +   msr vbar_el2, \xreg1/* Migrate VBAR */
> > >> +   mov x0, #0x3c9
> 
> Just noticed a bug here, x0 should in fact be \xreg1. My bad!!
> 
> > >> +   msr spsr_el3, \xreg1/* EL2_SP2 | D | A | I | F */
> > >> +   msr elr_el3, lr
> > >> +   eret
> > >> +.endm
> > >> +
> > >> +.macro armv8_switch_to_el1_m, xreg1, xreg2
> > >> +   /* Initialize Gener

Re: [U-Boot] [Patch v1 2/4] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-07-14 Thread Mark Rutland
On Tue, Jul 08, 2014 at 06:56:26PM +0100, York Sun wrote:
> On 07/04/2014 05:31 AM, Mark Rutland wrote:
> > Hi York,
> >
> > I spotted a couple of generic issues below. Most of these are issues
> > with the existing code that you happen to be moving around, rather than
> > with the new code this patch introduces.
> >
> > There are a couple of gotchas around secondary startup that are painful
> > with the bootwrapper for arm64 at present, and I think that we can avoid
> > them by construction for U-Boot. More on that below.
> >
> > On Fri, Jun 27, 2014 at 05:54:08PM +0100, York Sun wrote:
> >> Secondary cores need to be released from holdoff by boot release
> >> registers. With GPP bootrom, they can boot from main memory
> >> directly. Individual spin table is used for each core. If a single
> >> release address is needed, defining macro CONFIG_FSL_SMP_RELEASE_ALL
> >> will use the CPU_RELEASE_ADDR. The spin table and the boot page are
> >> reserved in the device tree so the OS won't overwrite them.
> >>
> >> Signed-off-by: York Sun 
> >> Signed-off-by: Arnab Basu 
> >> ---
> >> This set depends on this bundle 
> >> http://patchwork.ozlabs.org/bundle/yorksun/armv8_fsl-lsch3/
> >>
> >>  arch/arm/cpu/armv8/fsl-lsch3/Makefile |2 +
> >>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c|   13 ++
> >>  arch/arm/cpu/armv8/fsl-lsch3/cpu.h|1 +
> >>  arch/arm/cpu/armv8/fsl-lsch3/fdt.c|   56 +++
> >>  arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S   |  119 +++---
> >>  arch/arm/cpu/armv8/fsl-lsch3/mp.c |  171 
> >> +
> >>  arch/arm/cpu/armv8/fsl-lsch3/mp.h |   36 +
> >>  arch/arm/cpu/armv8/transition.S   |   63 +---
> >>  arch/arm/include/asm/arch-fsl-lsch3/config.h  |3 +-
> >>  arch/arm/include/asm/arch-fsl-lsch3/immap_lsch3.h |   35 +
> >>  arch/arm/include/asm/macro.h  |   81 ++
> >>  arch/arm/lib/gic_64.S |   10 +-
> >>  common/board_f.c  |2 +-
> >>  13 files changed, 502 insertions(+), 90 deletions(-)
> >>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> >>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.c
> >>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.h
> >
> > [...]
> >
> >> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/fdt.c b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> >> new file mode 100644
> >> index 000..cd34e16
> >> --- /dev/null
> >> +++ b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> >> @@ -0,0 +1,56 @@
> >> +/*
> >> + * Copyright 2014 Freescale Semiconductor, Inc.
> >> + *
> >> + * SPDX-License-Identifier:GPL-2.0+
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include "mp.h"
> >> +
> >> +#ifdef CONFIG_MP
> >> +void ft_fixup_cpu(void *blob)
> >> +{
> >> +   int off;
> >> +   __maybe_unused u64 spin_tbl_addr = (u64)get_spin_tbl_addr();
> >> +   u64 *reg;
> >> +   u64 val;
> >> +
> >> +   off = fdt_node_offset_by_prop_value(blob, -1, "device_type", "cpu", 4);
> >> +   while (off != -FDT_ERR_NOTFOUND) {
> >> +   reg = (u64 *)fdt_getprop(blob, off, "reg", 0);
> >> +   if (reg) {
> >> +   val = spin_tbl_addr;
> >> +#ifndef CONFIG_FSL_SMP_RELEASE_ALL
> >> +   val += id_to_core(fdt64_to_cpu(*reg)) * SIZE_BOOT_ENTRY;
> >
> > In Linux we read /cpus/#address-cells to determine the size of a
> > CPU's reg property (and have dts where this is 1 cell). Will the above
> > work for that?
>
> I don't think so. Will have to add the same size check.

Cheers.

> >
> >> +#endif
> >> +   val = cpu_to_fdt64(val);
> >> +   fdt_setprop_string(blob, off, "enable-method",
> >> +  "spin-table");
> >> +   fdt_setprop(blob, off, "cpu-release-addr",
> >> +   &val, sizeof(val));
> >> +   } else {
> >> +   puts("cpu NULL\n");
> >

Re: [U-Boot] [Patch v1 2/4] armv8/fsl-lsch3: Release secondary cores from boot hold off with Boot Page

2014-07-04 Thread Mark Rutland
Hi York,

I spotted a couple of generic issues below. Most of these are issues
with the existing code that you happen to be moving around, rather than
with the new code this patch introduces.

There are a couple of gotchas around secondary startup that are painful
with the bootwrapper for arm64 at present, and I think that we can avoid
them by construction for U-Boot. More on that below.

On Fri, Jun 27, 2014 at 05:54:08PM +0100, York Sun wrote:
> Secondary cores need to be released from holdoff by boot release
> registers. With GPP bootrom, they can boot from main memory
> directly. Individual spin table is used for each core. If a single
> release address is needed, defining macro CONFIG_FSL_SMP_RELEASE_ALL
> will use the CPU_RELEASE_ADDR. The spin table and the boot page are
> reserved in the device tree so the OS won't overwrite them.
> 
> Signed-off-by: York Sun 
> Signed-off-by: Arnab Basu 
> ---
> This set depends on this bundle 
> http://patchwork.ozlabs.org/bundle/yorksun/armv8_fsl-lsch3/
> 
>  arch/arm/cpu/armv8/fsl-lsch3/Makefile |2 +
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.c|   13 ++
>  arch/arm/cpu/armv8/fsl-lsch3/cpu.h|1 +
>  arch/arm/cpu/armv8/fsl-lsch3/fdt.c|   56 +++
>  arch/arm/cpu/armv8/fsl-lsch3/lowlevel.S   |  119 +++---
>  arch/arm/cpu/armv8/fsl-lsch3/mp.c |  171 
> +
>  arch/arm/cpu/armv8/fsl-lsch3/mp.h |   36 +
>  arch/arm/cpu/armv8/transition.S   |   63 +---
>  arch/arm/include/asm/arch-fsl-lsch3/config.h  |3 +-
>  arch/arm/include/asm/arch-fsl-lsch3/immap_lsch3.h |   35 +
>  arch/arm/include/asm/macro.h  |   81 ++
>  arch/arm/lib/gic_64.S |   10 +-
>  common/board_f.c  |2 +-
>  13 files changed, 502 insertions(+), 90 deletions(-)
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/fdt.c
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.c
>  create mode 100644 arch/arm/cpu/armv8/fsl-lsch3/mp.h
 
[...]

> diff --git a/arch/arm/cpu/armv8/fsl-lsch3/fdt.c b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> new file mode 100644
> index 000..cd34e16
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/fsl-lsch3/fdt.c
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright 2014 Freescale Semiconductor, Inc.
> + *
> + * SPDX-License-Identifier:GPL-2.0+
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include "mp.h"
> +
> +#ifdef CONFIG_MP
> +void ft_fixup_cpu(void *blob)
> +{
> +   int off;
> +   __maybe_unused u64 spin_tbl_addr = (u64)get_spin_tbl_addr();
> +   u64 *reg;
> +   u64 val;
> +
> +   off = fdt_node_offset_by_prop_value(blob, -1, "device_type", "cpu", 4);
> +   while (off != -FDT_ERR_NOTFOUND) {
> +   reg = (u64 *)fdt_getprop(blob, off, "reg", 0);
> +   if (reg) {
> +   val = spin_tbl_addr;
> +#ifndef CONFIG_FSL_SMP_RELEASE_ALL
> +   val += id_to_core(fdt64_to_cpu(*reg)) * SIZE_BOOT_ENTRY;

In Linux we read /cpus/#address-cells to determine the size of a
CPU's reg property (and have dts where this is 1 cell). Will the above
work for that?
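
To make the concern concrete, here is a minimal sketch of decoding a CPU
"reg" value whose width depends on /cpus/#address-cells. It is not libfdt
code: the function name and signature are hypothetical, and the raw
property bytes and the cell count are assumed to have been fetched by the
caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a CPU "reg" made of 1 or 2 big-endian
 * 32-bit cells. Returns 0 on success, or -1 if the cell count is
 * unsupported or the property is too short.
 */
static int read_cpu_reg(const uint8_t *prop, size_t len,
			unsigned int address_cells, uint64_t *out)
{
	uint64_t v = 0;
	unsigned int i;

	if (address_cells < 1 || address_cells > 2)
		return -1;
	if (len < 4u * address_cells)
		return -1;

	/* Device tree properties are stored big-endian, one cell at a time. */
	for (i = 0; i < 4 * address_cells; i++)
		v = (v << 8) | prop[i];

	*out = v;
	return 0;
}
```

Reading a one-cell reg unconditionally as a u64 would run past the end of
the property, which is why the cell count needs checking first.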

> +#endif
> +   val = cpu_to_fdt64(val);
> +   fdt_setprop_string(blob, off, "enable-method",
> +  "spin-table");
> +   fdt_setprop(blob, off, "cpu-release-addr",
> +   &val, sizeof(val));
> +   } else {
> +   puts("cpu NULL\n");
> +   }
> +   off = fdt_node_offset_by_prop_value(blob, off, "device_type",
> +   "cpu", 4);
> +   }
> +   /*
> +* Boot page and spin table can be reserved here if not done statically
> +* in device tree.
> +*
> +* fdt_add_mem_rsv(blob, bootpg,
> +* *((u64 *)&(__secondary_boot_page_size)));
> +* If defined CONFIG_FSL_SMP_RELEASE_ALL, the release address should
> +* also be reserved.
> +*/

I think that this reservation should _always_ be added by U-Boot unless
specifically overridden.

A problem I had with the arm64 bootwrapper when adding PSCI support and
now (as I am moving stuff about) was that the DTS in the kernel tree had
a memreserve out-of-sync with what the wrapper actually needed. While I
can add a new reservation, I can't remove any in case they are for
something else, so I end up protecting too much, wasting memory.

Given that the reservation is to protect data which U-Boot is in control
of choosing the address for, I think the only sane thing to do is for
U-Boot to always add the reservation.

That way U-Boot can change and existing DTBs will just work. We won't
end up protecting too much or too little.

[...]

> @@ -119,3 +107,94 @@ ENTRY(lowlevel_init)
> mov
