Re: [PATCH 2/2] xfs: remove the extra processing of zero size in xfs_idata_realloc()

2020-11-24 Thread Leizhen (ThunderTown)



On 2020/11/24 19:52, Christoph Hellwig wrote:
> On Tue, Nov 24, 2020 at 06:45:31PM +0800, Zhen Lei wrote:
>> krealloc() frees the buffer when the parameter new_size is 0, and returns
>> ZERO_SIZE_PTR. Because all other places use NULL to check whether
>> if_data is available, convert ZERO_SIZE_PTR to NULL.
> 
> This new code looks much harder to read than the version it replaced.

OK
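A minimal userspace sketch of the behaviour the commit message relies on: in the kernel, krealloc() with a new size of 0 frees the buffer and returns ZERO_SIZE_PTR rather than NULL, so code that only tests for NULL needs the pointer converted. Only the two macros mirror include/linux/slab.h; mock_krealloc() and resize_data() are hypothetical names for illustration, not the xfs code.

```c
#include <stdlib.h>

/* Same definitions as the kernel's include/linux/slab.h. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

/*
 * Hypothetical userspace stand-in for krealloc(): a new size of 0 frees the
 * old buffer and returns ZERO_SIZE_PTR instead of NULL.
 */
static void *mock_krealloc(void *p, size_t new_size)
{
	if (ZERO_OR_NULL_PTR(p))
		p = NULL;	/* never hand ZERO_SIZE_PTR to free()/realloc() */

	if (new_size == 0) {
		free(p);
		return ZERO_SIZE_PTR;
	}
	return realloc(p, new_size);
}

/*
 * Hypothetical helper: resize a data buffer and convert ZERO_SIZE_PTR to
 * NULL, so callers can keep testing "if (!data)". On allocation failure,
 * NULL is returned and the old buffer is not freed.
 */
static void *resize_data(void *data, size_t new_size)
{
	void *p = mock_krealloc(data, new_size);

	return p == ZERO_SIZE_PTR ? NULL : p;
}
```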

> 
> 



Re: [PATCH 1/4] reset: hisilicon: correct vendor prefix

2020-12-03 Thread Leizhen (ThunderTown)



On 2020/12/3 20:54, Philipp Zabel wrote:
> On Thu, 2020-12-03 at 20:02 +0800, Zhen Lei wrote:
>> The vendor prefix of "Hisilicon Limited" is "hisilicon", it is clearly
>> stated in "vendor-prefixes.yaml".
>>
>> Fixes: 1527058736fa ("reset: hisilicon: add reset-hi3660")
>> Signed-off-by: Zhen Lei 
>> Cc: Zhangfei Gao 
>> ---
>>  drivers/reset/hisilicon/reset-hi3660.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/reset/hisilicon/reset-hi3660.c b/drivers/reset/hisilicon/reset-hi3660.c
>> index a7d4445924e558c..8f1953159a65b31 100644
>> --- a/drivers/reset/hisilicon/reset-hi3660.c
>> +++ b/drivers/reset/hisilicon/reset-hi3660.c
>> @@ -83,7 +83,7 @@ static int hi3660_reset_probe(struct platform_device *pdev)
>>  if (!rc)
>>  return -ENOMEM;
>>  
>> -rc->map = syscon_regmap_lookup_by_phandle(np, "hisi,rst-syscon");
>> +rc->map = syscon_regmap_lookup_by_phandle(np, "hisilicon,rst-syscon");
> 
> What about those that don't upgrade kernel and DT in lock-step?
> It would be easy to fall back to the old compatible if the new one
> fails.

All right, I'll combine them. I thought they belonged to different maintainers,
so I had split them apart.
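Philipp's suggestion of falling back to the old property could look roughly like the sketch below. It is a userspace model under stated assumptions: mock_lookup_by_phandle() is a hypothetical stand-in for syscon_regmap_lookup_by_phandle(), which returns an error pointer on failure, and the ERR_PTR()/IS_ERR() helpers are simplified copies of include/linux/err.h.

```c
#include <errno.h>
#include <string.h>

/* Simplified userspace copies of the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical mock device tree: the only syscon property the DT provides. */
static const char *dt_syscon_property = "hisi,rst-syscon";
static int fake_regmap;	/* stands in for a struct regmap */

/* Hypothetical stand-in for syscon_regmap_lookup_by_phandle(np, property). */
static void *mock_lookup_by_phandle(const char *property)
{
	if (strcmp(property, dt_syscon_property) == 0)
		return &fake_regmap;
	return ERR_PTR(-ENODEV);
}

/*
 * Try the corrected "hisilicon," prefix first, then the legacy "hisi," one,
 * so that a new kernel keeps working with an old DT.
 */
static void *lookup_rst_syscon(void)
{
	void *map = mock_lookup_by_phandle("hisilicon,rst-syscon");

	if (IS_ERR(map))
		map = mock_lookup_by_phandle("hisi,rst-syscon");
	return map;
}
```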

> 
> regards
> Philipp
> 
> 



Re: [PATCH 0/1] dt-bindings: eliminate yamllint warnings

2020-12-03 Thread Leizhen (ThunderTown)
Sorry, I forgot to mention: this patch is based on the latest linux-next code.


On 2020/12/4 10:42, Zhen Lei wrote:
> There are too many people, so I only sent this to the maintainers, reviewers and supporters.
> 
> Eliminate below warnings:
> ./Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml:32:13: [warning] wrong indentation: expected 14 but found 12 (indentation)
> ./Documentation/devicetree/bindings/clock/imx8qxp-lpcg.yaml:35:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/display/intel,keembay-msscam.yaml:21:6: [warning] wrong indentation: expected 6 but found 5 (indentation)
> ./Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml:52:9: [warning] wrong indentation: expected 6 but found 8 (indentation)
> ./Documentation/devicetree/bindings/display/bridge/intel,keembay-dsi.yaml:42:8: [warning] wrong indentation: expected 8 but found 7 (indentation)
> ./Documentation/devicetree/bindings/display/bridge/intel,keembay-dsi.yaml:45:8: [warning] wrong indentation: expected 8 but found 7 (indentation)
> ./Documentation/devicetree/bindings/display/panel/novatek,nt36672a.yaml:25:10: [warning] wrong indentation: expected 10 but found 9 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/adv7604.yaml:24:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/mipi-ccs.yaml:4:1: [error] missing document start "---" (document-start)
> ./Documentation/devicetree/bindings/media/i2c/mipi-ccs.yaml:29:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/mipi-ccs.yaml:32:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/ovti,ov772x.yaml:79:17: [warning] wrong indentation: expected 14 but found 16 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/ovti,ov772x.yaml:88:17: [warning] wrong indentation: expected 14 but found 16 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/sony,imx214.yaml:72:17: [warning] wrong indentation: expected 18 but found 16 (indentation)
> ./Documentation/devicetree/bindings/media/i2c/sony,imx214.yaml:75:17: [warning] wrong indentation: expected 18 but found 16 (indentation)
> ./Documentation/devicetree/bindings/mmc/mtk-sd.yaml:20:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/mmc/mtk-sd.yaml:30:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/mmc/mtk-sd.yaml:33:9: [warning] wrong indentation: expected 10 but found 8 (indentation)
> ./Documentation/devicetree/bindings/sound/mt8192-mt6359-rt1015-rt5682.yaml:10:4: [warning] wrong indentation: expected 2 but found 3 (indentation)
> 
> 
> Zhen Lei (1):
>   dt-bindings: eliminate yamllint warnings
> 
>  .../devicetree/bindings/clock/imx8qxp-lpcg.yaml| 20 -
>  .../bindings/display/bridge/analogix,anx7625.yaml  |  4 ++--
>  .../bindings/display/bridge/intel,keembay-dsi.yaml |  4 ++--
>  .../bindings/display/intel,keembay-msscam.yaml |  4 ++--
>  .../bindings/display/panel/novatek,nt36672a.yaml   |  2 +-
>  .../devicetree/bindings/media/i2c/adv7604.yaml |  4 ++--
>  .../devicetree/bindings/media/i2c/mipi-ccs.yaml| 11 -
>  .../devicetree/bindings/media/i2c/ovti,ov772x.yaml | 12 +-
>  .../devicetree/bindings/media/i2c/sony,imx214.yaml | 12 +-
>  Documentation/devicetree/bindings/mmc/mtk-sd.yaml  | 26 +++---
>  .../sound/mt8192-mt6359-rt1015-rt5682.yaml |  4 ++--
>  11 files changed, 52 insertions(+), 51 deletions(-)
> 



Re: [PATCH 5/6] ARM: dts: mmp2-olpc-xo-1-75: explicitly add #address-cells=<0> for slave mode

2020-12-03 Thread Leizhen (ThunderTown)
Hi everybody,
  Could somebody apply this patch? Whenever I run a YAML dtbs_check on arm, the
warnings below are always reported.

arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): /soc/apb@d400/spi@d4037000: incorrect #address-cells for SPI bus
  also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): /soc/apb@d400/spi@d4037000: incorrect #size-cells for SPI bus
  also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2-olpc-xo-1-75.dt.yaml: Warning (spi_bus_reg): Failed prerequisite 'spi_bus_bridge'


On 2020/10/14 0:08, Zhen Lei wrote:
> Delete the old property "#address-cells" and then explicitly add it with
> value zero. The value of "#size-cells" is already zero, so leave it
> unchanged.
> 
> Signed-off-by: Zhen Lei 
> ---
>  arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts b/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
> index f1a41152e9dd70d..be88b6e551d58e9 100644
> --- a/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
> +++ b/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
> @@ -224,7 +224,7 @@
>  
>  &ssp3 {
>   /delete-property/ #address-cells;
> - /delete-property/ #size-cells;
> + #address-cells = <0>;
>   spi-slave;
>   status = "okay";
>   ready-gpio = <&gpio 125 GPIO_ACTIVE_HIGH>;
> 
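For context on the warnings above: as far as I can tell from dtc's checks.c, the spi_bus_bridge check expects #address-cells to be 1 (or 0 when the node has spi-slave) and #size-cells to be 0, and a missing property falls back to the Devicetree Specification defaults of 2 and 1. That is why deleting #address-cells without re-adding it triggers the warning. The sketch below models that rule with a hypothetical struct spi_node; it is a simplification, not dtc's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified view of an SPI controller node. A cell count of
 * -1 means the property is absent (e.g. removed with /delete-property/).
 */
struct spi_node {
	bool spi_slave;
	int address_cells;
	int size_cells;
};

/* Devicetree Specification defaults, used when a property is missing. */
static int addr_cells(const struct spi_node *n)
{
	return n->address_cells < 0 ? 2 : n->address_cells;
}

static int size_cells(const struct spi_node *n)
{
	return n->size_cells < 0 ? 1 : n->size_cells;
}

/* Returns NULL if the node passes, otherwise the first warning text. */
static const char *check_spi_bus_bridge(const struct spi_node *n)
{
	int expected_addr_cells = n->spi_slave ? 0 : 1;

	if (addr_cells(n) != expected_addr_cells)
		return "incorrect #address-cells for SPI bus";
	if (size_cells(n) != 0)
		return "incorrect #size-cells for SPI bus";
	return NULL;
}
```

With both properties deleted, the defaults (2 and 1) fail both checks; with `#address-cells = <0>` added and `#size-cells` kept at 0 from the base node, a spi-slave node passes.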



Re: [PATCH 1/1] dt-bindings: eliminate yamllint warnings

2020-12-05 Thread Leizhen (ThunderTown)



On 2020/12/5 1:41, Mark Brown wrote:
> On Fri, Dec 04, 2020 at 10:42:26AM +0800, Zhen Lei wrote:
>> All warnings are related only to "wrong indentation", except one:
>> Documentation/devicetree/bindings/media/i2c/mipi-ccs.yaml:4:1: \
>> [error] missing document start "---" (document-start)
> 
> It would make life easier (and be more normal practice) to split this up
> by driver/subsystem and send a bunch of separate patches to the relevant
> maintainers, this makes it much easier to review and handle things.

Okay, I'll split this patch up and send the pieces separately. To keep things
simple, I won't mark the new patches as v2.

> 



Re: [PATCH 1/5] media: dt-bindings: add the required property 'additionalProperties'

2020-12-05 Thread Leizhen (ThunderTown)



On 2020/12/4 18:56, Philipp Zabel wrote:
> On Fri, 2020-12-04 at 17:38 +0800, Zhen Lei wrote:
>> When I run dt_binding_check for any YAML file, the warning below is always
>> reported:
>>
>> xxx/media/coda.yaml: 'additionalProperties' is a required property
>> xxx/media/coda.yaml: ignoring, error in schema:
>> warning: no schema found in file: xxx/media/coda.yaml
>>
>> There are three properties defined in allOf, they should be explicitly
>> declared. Otherwise, "additionalProperties: false" will prohibit them.
>>
>> Signed-off-by: Zhen Lei 
> 
> Thank you, there already is a patch to fix this:
> 
> https://lore.kernel.org/linux-media/20201117200752.4004368-1-r...@kernel.org/

OK. I only came across it while doing a JSON conversion; I have not subscribed
to the dt-binding mailing list.

> 
> regards
> Philipp
> 
> 



Re: [PATCH 1/1] device-dax: avoid an unnecessary check in alloc_dev_dax_range()

2020-12-17 Thread Leizhen (ThunderTown)



On 2020/12/18 11:10, Dan Williams wrote:
> On Fri, Nov 20, 2020 at 1:23 AM Zhen Lei  wrote:
>>
>> Swap the calling sequence of krealloc() and __request_region(), call the
>> latter first. In this way, the value of dev_dax->nr_range does not need to
>> be considered when __request_region() failed.
> 
> This looks ok, but I think I want to see another cleanup go in first
> before this to add a helper for trimming the last range off the set of
> ranges:
> 
> static void dev_dax_trim_range(struct dev_dax *dev_dax)
> {
> 	int i = dev_dax->nr_range - 1;
> 	struct range *range = &dev_dax->ranges[i].range;
> 	struct dax_region *dax_region = dev_dax->region;
> 
> 	dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i,
> 		(unsigned long long)range->start,
> 		(unsigned long long)range->end);
> 
> 	__release_region(&dax_region->res, range->start, range_len(range));
> 	if (--dev_dax->nr_range == 0) {
> 		kfree(dev_dax->ranges);
> 		dev_dax->ranges = NULL;
> 	}
> }
> 
> Care to do a lead in patch with that cleanup, then do this one?

I don't mind! Please add the above helper first. After that, I'll update
this patch and send it again.

> 
> I think that might also cleanup a memory leak report from Jane in
> addition to not needing the "goto" as well.
> 
> http://lore.kernel.org/r/c8a8a260-34c6-dbfc-1f19-25c23d01c...@oracle.com
> 
> 
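A userspace model of the proposed helper, assuming a simplified dax_model in place of struct dev_dax and a byte counter in place of __release_region(). It shows the invariant the helper maintains: the ranges array is freed and set to NULL exactly when nr_range drops to 0, so a loop such as free_dev_dax_ranges() leaves nothing behind. All names here are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct range and struct dev_dax. */
struct range {
	unsigned long long start, end;	/* inclusive bounds */
};

struct dax_model {
	struct range *ranges;
	int nr_range;
	unsigned long long released;	/* bytes passed to the mock release */
};

/* Append one range, growing the array like the krealloc() in the driver. */
static int add_range(struct dax_model *d, unsigned long long start,
		     unsigned long long end)
{
	struct range *r = realloc(d->ranges, sizeof(*r) * (d->nr_range + 1));

	if (!r)
		return -1;
	d->ranges = r;
	d->ranges[d->nr_range++] = (struct range){ start, end };
	return 0;
}

/* Mirrors the proposed helper: release the last range, free the array when empty. */
static void trim_range(struct dax_model *d)
{
	int i = d->nr_range - 1;
	struct range *range = &d->ranges[i];

	d->released += range->end - range->start + 1;	/* range_len() */
	if (--d->nr_range == 0) {
		free(d->ranges);
		d->ranges = NULL;
	}
}

/* Mirrors free_dev_dax_ranges(): trim until nothing is left. */
static void free_ranges(struct dax_model *d)
{
	while (d->nr_range)
		trim_range(d);
}
```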



Re: [PATCH 1/1] device-dax: avoid an unnecessary check in alloc_dev_dax_range()

2020-12-17 Thread Leizhen (ThunderTown)



On 2020/11/20 17:22, Zhen Lei wrote:
> Swap the calling sequence of krealloc() and __request_region(), call the
> latter first. In this way, the value of dev_dax->nr_range does not need to
> be considered when __request_region() failed.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/dax/bus.c | 29 -
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 27513d311242..1efae11d947a 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -763,23 +763,15 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
>   return 0;
>   }
>  
> - ranges = krealloc(dev_dax->ranges, sizeof(*ranges)
> - * (dev_dax->nr_range + 1), GFP_KERNEL);
> - if (!ranges)
> - return -ENOMEM;
> -
>   alloc = __request_region(res, start, size, dev_name(dev), 0);
> - if (!alloc) {
> - /*
> -  * If this was an empty set of ranges nothing else
> -  * will release @ranges, so do it now.
> -  */
> - if (!dev_dax->nr_range) {
> - kfree(ranges);
> - ranges = NULL;
> - }
> - dev_dax->ranges = ranges;
> + if (!alloc)
>   return -ENOMEM;
> +
> + ranges = krealloc(dev_dax->ranges, sizeof(*ranges)
> + * (dev_dax->nr_range + 1), GFP_KERNEL);
> + if (!ranges) {
> + rc = -ENOMEM;
> + goto err;

Hi Dan Williams,
Once the new helper dev_dax_trim_range() is in, we can simply call
__release_region() and return the error code here, instead of using goto.

>   }
>  
>   for (i = 0; i < dev_dax->nr_range; i++)
> @@ -808,11 +800,14 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
>   dev_dbg(dev, "delete range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
>   &alloc->start, &alloc->end);
>   dev_dax->nr_range--;
> - __release_region(res, alloc->start, resource_size(alloc));
> - return rc;
> + goto err;
>   }
>  
>   return 0;
> +
> +err:
> + __release_region(res, alloc->start, resource_size(alloc));
> + return rc;
>  }
>  
>  static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size)
> 
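The reordering in this patch (reserve the region first, then grow the array) can be modelled in userspace as below, using the direct __release_region()-and-return error path suggested above rather than a goto. mock_request_region(), mock_release_region(), mock_krealloc() and the failure flags are hypothetical stand-ins used only to simulate each failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical knobs to simulate failures, plus a count of held regions. */
static bool fail_request, fail_alloc;
static int regions_held;

static bool mock_request_region(void)
{
	if (fail_request)
		return false;
	regions_held++;
	return true;
}

static void mock_release_region(void)
{
	regions_held--;
}

static void *mock_krealloc(void *p, size_t size)
{
	return fail_alloc ? NULL : realloc(p, size);
}

struct range {
	unsigned long long start, end;	/* inclusive bounds */
};

/* Reserve first, then grow the array; on failure undo only what was done. */
static int alloc_range(struct range **ranges, int *nr_range,
		       unsigned long long start, unsigned long long size)
{
	struct range *r;

	if (!mock_request_region())
		return -ENOMEM;	/* nothing to undo: *ranges is untouched */

	r = mock_krealloc(*ranges, sizeof(*r) * (*nr_range + 1));
	if (!r) {
		mock_release_region();	/* old *ranges is still valid */
		return -ENOMEM;
	}

	r[*nr_range] = (struct range){ start, start + size - 1 };
	*ranges = r;
	(*nr_range)++;
	return 0;
}
```

Because the region is requested before the array is touched, a request failure needs no cleanup at all, and an allocation failure only has to give the region back.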



Re: [PATCH] device-dax: Fix range release

2020-12-18 Thread Leizhen (ThunderTown)



On 2020/12/19 10:41, Dan Williams wrote:
> There are multiple locations that open-code the release of the last
> range in a device-dax instance. Consolidate this into a new
> dev_dax_trim_range() helper.
> 
> This also addresses a kmemleak report:
> 
> # cat /sys/kernel/debug/kmemleak
> [..]
> unreferenced object 0x976bd46f6240 (size 64):
>comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
>hex dump (first 32 bytes):
>  00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00  .. .7...
>  ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00  8...
>backtrace:
>  [<064003cf>] __kmalloc_track_caller+0x136/0x379
>  [] krealloc+0x67/0x92
>  [] __alloc_dev_dax_range+0x73/0x25c
>  [<27d58626>] devm_create_dev_dax+0x27d/0x416
>  [<434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
>  [<83726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem]
>  [] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
>  [] really_probe+0x230/0x48d
>  [<6cabd38e>] driver_probe_device+0x122/0x13b
>  [<29c7b95a>] device_driver_attach+0x5b/0x60
>  [<53e5659b>] bind_store+0xb7/0xc3
>  [] drv_attr_store+0x27/0x31
>  [<949069c5>] sysfs_kf_write+0x4a/0x57
>  [<4a8b5adf>] kernfs_fop_write+0x150/0x1e5
>  [] __vfs_write+0x1b/0x34
>  [] vfs_write+0xd8/0x1d1
> 
> Reported-by: Jane Chu 
> Cc: Zhen Lei 
> Signed-off-by: Dan Williams 
> ---
>  drivers/dax/bus.c |   44 +---
>  1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 9761cb40d4bb..720cd140209f 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -367,19 +367,28 @@ void kill_dev_dax(struct dev_dax *dev_dax)
>  }
>  EXPORT_SYMBOL_GPL(kill_dev_dax);
>  
> -static void free_dev_dax_ranges(struct dev_dax *dev_dax)
> +static void trim_dev_dax_range(struct dev_dax *dev_dax)
>  {
> + int i = dev_dax->nr_range - 1;
> + struct range *range = &dev_dax->ranges[i].range;
>   struct dax_region *dax_region = dev_dax->region;
> - int i;
>  
>   device_lock_assert(dax_region->dev);
> - for (i = 0; i < dev_dax->nr_range; i++) {
> - struct range *range = &dev_dax->ranges[i].range;
> -
> - __release_region(&dax_region->res, range->start,
> - range_len(range));
> + dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i,
> + (unsigned long long)range->start,
> + (unsigned long long)range->end);
> +
> + __release_region(&dax_region->res, range->start, range_len(range));
> + if (--dev_dax->nr_range == 0) {
> + kfree(dev_dax->ranges);
> + dev_dax->ranges = NULL;
>   }
> - dev_dax->nr_range = 0;
> +}
> +
> +static void free_dev_dax_ranges(struct dev_dax *dev_dax)
> +{
> + while (dev_dax->nr_range)
It's better to use READ_ONCE() to read dev_dax->nr_range here, to prevent
unwanted compiler optimization.

> + trim_dev_dax_range(dev_dax);
>  }
>  
>  static void unregister_dev_dax(void *dev)
> @@ -804,15 +813,10 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
>   return 0;
>  
>   rc = devm_register_dax_mapping(dev_dax, dev_dax->nr_range - 1);
> - if (rc) {
> - dev_dbg(dev, "delete range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
> - &alloc->start, &alloc->end);
> - dev_dax->nr_range--;
> - __release_region(res, alloc->start, resource_size(alloc));
> - return rc;
> - }
> + if (rc)
> + trim_dev_dax_range(dev_dax);
>  
> - return 0;
> + return rc;
>  }
>  
>  static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size)
> @@ -885,12 +889,7 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
>   if (shrink >= range_len(range)) {
>   devm_release_action(dax_region->dev,
>   unregister_dax_mapping, &mapping->dev);
> - __release_region(&dax_region->res, range->start,
> - range_len(range));
> - dev_dax->nr_range--;
> - dev_dbg(dev, "delete range[%d]: %#llx:%#llx\n", i,
> - (unsigned long long) range->start,
> - (unsigned long long) range->end);
> + trim_dev_dax_range(dev_dax);
>   to_shrink -= shrink;
>   if (!to_shrink)
>   break;
> @@ -1267,7 +1266,6 @@ static void dev_dax_release(struct device *dev)
>   put_dax(dax_dev);
>   free_d

Re: [PATCH 1/2] perf/smmuv3: Don't reserve the register space that overlaps with the SMMUv3

2021-01-20 Thread Leizhen (ThunderTown)



On 2021/1/20 11:37, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/1/19 20:32, Robin Murphy wrote:
>> On 2021-01-19 01:59, Zhen Lei wrote:
>>> Some SMMUv3 implementation embed the Perf Monitor Group Registers (PMCG)
>>> inside the first 64kB region of the SMMU. Since SMMU and PMCG are managed
>>> by two separate drivers, and this driver depends on ARM_SMMU_V3, so the
>>> SMMU driver reserves the corresponding resource first, this driver should
>>> not reserve the corresponding resource again. Otherwise, a resource
>>> reservation conflict is reported during boot.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>   drivers/perf/arm_smmuv3_pmu.c | 42 
>>> --
>>>   1 file changed, 40 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>>> index 74474bb322c3f26..dcce085431c6ce8 100644
>>> --- a/drivers/perf/arm_smmuv3_pmu.c
>>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>>> @@ -761,6 +761,44 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
>>>   dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>>>   }
>>>   +static void __iomem *
>>> +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
>>> +  unsigned int index,
>>> +  struct resource **out_res)
>>> +{
>>> +    int ret;
>>> +    void __iomem *base;
>>> +    struct resource *res;
>>> +
>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, index);
>>> +    if (!res) {
>>> +    dev_err(&pdev->dev, "invalid resource\n");
>>> +    return IOMEM_ERR_PTR(-EINVAL);
>>> +    }
>>> +    if (out_res)
>>> +    *out_res = res;
>>> +
>>> +    ret = region_intersects(res->start, resource_size(res),
>>> +    IORESOURCE_MEM, IORES_DESC_NONE);
>>> +    if (ret == REGION_INTERSECTS) {
>>> +    /*
>>> + * The resource has already been reserved by the SMMUv3 driver.
>>> + * Don't reserve it again, just do devm_ioremap().
>>> + */
>>> +    base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
>>> +    } else {
>>> +    /*
>>> + * The resource may have not been reserved by any driver, or
>>> + * has been reserved but not type IORESOURCE_MEM. In the latter
>>> + * case, devm_ioremap_resource() reports a conflict and returns
>>> + * IOMEM_ERR_PTR(-EBUSY).
>>> + */
>>> +    base = devm_ioremap_resource(&pdev->dev, res);
>>> +    }
>>
>> What if the PMCG driver simply happens to probe first?
> 
> There are 4 cases:
> 1) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=y
>It's not allowed, because ARM_SMMU_V3_PMU depends on ARM_SMMU_V3:
>config ARM_SMMU_V3_PMU
>  tristate "ARM SMMUv3 Performance Monitors Extension"
>  depends on ARM64 && ACPI && ARM_SMMU_V3
> 
> 2) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=m
>No problem, SMMUv3 will be initialized first.
> 
> 3) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=y
>vi drivers/Makefile
>60 obj-y   += iommu/
>172 obj-$(CONFIG_PERF_EVENTS)   += perf/
> 
>This link order ensures that the SMMUv3 driver is initialized first,
>since they are currently at the same initialization level.
> 
> 4) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=m
>Sorry, I thought module dependencies were generated from "depends on".
>But I tried it today: module dependencies are generated only when symbol
>dependencies exist. I should use MODULE_SOFTDEP() to mark the dependency
>explicitly. I will send v2 later.
> 

Hi Robin:
  I think I misunderstood your question. It is probe(), not module_init(), that
determines when the register space is reserved. So, to cover the case where the
PMCG probe() runs first, the SMMUv3 driver had better reserve several smaller
resource blocks but ioremap() the entire resource.
  I'll refine these patches so that both initialization orders work well. I'll
try to send v2 this week.
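A simplified model of the idea above: if the SMMUv3 driver reserves only the architected 0xe00-byte blocks (ARM_SMMU_REG_SZ in the arm-smmu-v3 driver) in page 0 and page 1, and maps the whole region with a separate ioremap() (which does not reserve anything), a PMCG embedded in the imp-def part of page 0 can be reserved in either probe order. The flat overlap list, request_region_model(), the base address and the PMCG offset 0x2000 are all hypothetical; the real kernel keeps a hierarchical resource tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model of the iomem resource tree: inclusive [start, end]. */
struct res {
	unsigned long start, end;
};

#define MAX_RES 8
static struct res reserved[MAX_RES];
static size_t nr_reserved;

static bool overlaps(const struct res *a, const struct res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Like a request_mem_region(): fail if the range overlaps anything already held. */
static bool request_region_model(unsigned long start, unsigned long size)
{
	struct res r = { start, start + size - 1 };

	if (nr_reserved == MAX_RES)
		return false;
	for (size_t i = 0; i < nr_reserved; i++)
		if (overlaps(&r, &reserved[i]))
			return false;
	reserved[nr_reserved++] = r;
	return true;
}
```

Reserving the full 128KB SMMU window blocks a PMCG at base + 0x2000; reserving only base + [0, 0xe00) and base + 0x10000 + [0, 0xe00) does not, whichever driver probes first.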

> 
>>
>> Robin.
>>
>>> +
>>> +    return base;
>>> +}
>>> +
>>>   static int smmu_pmu_probe(struct platform_device *pdev)
>>>   {
>>>   struct smmu_pmu *smmu_pmu;
>>> @@ -793,7 +831,7 @@ static int smmu_pmu_probe(struct platfo

Re: [PATCH 1/2] perf/smmuv3: Don't reserve the register space that overlaps with the SMMUv3

2021-01-20 Thread Leizhen (ThunderTown)



On 2021/1/20 21:27, Robin Murphy wrote:
> On 2021-01-20 09:26, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/1/20 11:37, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/1/19 20:32, Robin Murphy wrote:
>>>> On 2021-01-19 01:59, Zhen Lei wrote:
>>>>> Some SMMUv3 implementation embed the Perf Monitor Group Registers (PMCG)
>>>>> inside the first 64kB region of the SMMU. Since SMMU and PMCG are managed
>>>>> by two separate drivers, and this driver depends on ARM_SMMU_V3, so the
>>>>> SMMU driver reserves the corresponding resource first, this driver should
>>>>> not reserve the corresponding resource again. Otherwise, a resource
>>>>> reservation conflict is reported during boot.
>>>>>
>>>>> Signed-off-by: Zhen Lei 
>>>>> ---
>>>>>    drivers/perf/arm_smmuv3_pmu.c | 42 
>>>>> --
>>>>>    1 file changed, 40 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>>>>> index 74474bb322c3f26..dcce085431c6ce8 100644
>>>>> --- a/drivers/perf/arm_smmuv3_pmu.c
>>>>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>>>>> @@ -761,6 +761,44 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
>>>>>    dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>>>>>    }
>>>>>    +static void __iomem *
>>>>> +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
>>>>> +  unsigned int index,
>>>>> +  struct resource **out_res)
>>>>> +{
>>>>> +    int ret;
>>>>> +    void __iomem *base;
>>>>> +    struct resource *res;
>>>>> +
>>>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, index);
>>>>> +    if (!res) {
>>>>> +    dev_err(&pdev->dev, "invalid resource\n");
>>>>> +    return IOMEM_ERR_PTR(-EINVAL);
>>>>> +    }
>>>>> +    if (out_res)
>>>>> +    *out_res = res;
>>>>> +
>>>>> +    ret = region_intersects(res->start, resource_size(res),
>>>>> +    IORESOURCE_MEM, IORES_DESC_NONE);
>>>>> +    if (ret == REGION_INTERSECTS) {
>>>>> +    /*
>>>>> + * The resource has already been reserved by the SMMUv3 driver.
>>>>> + * Don't reserve it again, just do devm_ioremap().
>>>>> + */
>>>>> +    base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
>>>>> +    } else {
>>>>> +    /*
>>>>> + * The resource may have not been reserved by any driver, or
>>>>> + * has been reserved but not type IORESOURCE_MEM. In the latter
>>>>> + * case, devm_ioremap_resource() reports a conflict and returns
>>>>> + * IOMEM_ERR_PTR(-EBUSY).
>>>>> + */
>>>>> +    base = devm_ioremap_resource(&pdev->dev, res);
>>>>> +    }
>>>>
>>>> What if the PMCG driver simply happens to probe first?
>>>
>>> There are 4 cases:
>>> 1) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=y
>>>     It's not allowed, because ARM_SMMU_V3_PMU depends on ARM_SMMU_V3:
>>>     config ARM_SMMU_V3_PMU
>>>   tristate "ARM SMMUv3 Performance Monitors Extension"
>>>   depends on ARM64 && ACPI && ARM_SMMU_V3
>>>
>>> 2) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=m
>>>     No problem, SMMUv3 will be initialized first.
>>>
>>> 3) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=y
>>>     vi drivers/Makefile
>>>     60 obj-y   += iommu/
>>>     172 obj-$(CONFIG_PERF_EVENTS)   += perf/
>>>
>>>     This link order ensures that the SMMUv3 driver is initialized first,
>>>     since they are currently at the same initialization level.
>>>
>>> 4) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=m
>>>     Sorry, I thought module dependencies were generated from "depends on".
>>>     But I tried it today: module dependencies are generated only when symbol
>>>     dependencies exist. I should use MODULE_SOFTDEP() to e

Re: [PATCH 1/2] perf/smmuv3: Don't reserve the register space that overlaps with the SMMUv3

2021-01-20 Thread Leizhen (ThunderTown)



On 2021/1/20 23:54, Robin Murphy wrote:
> On 2021-01-20 14:14, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/1/20 21:27, Robin Murphy wrote:
>>> On 2021-01-20 09:26, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2021/1/20 11:37, Leizhen (ThunderTown) wrote:
>>>>>
>>>>>
>>>>> On 2021/1/19 20:32, Robin Murphy wrote:
>>>>>> On 2021-01-19 01:59, Zhen Lei wrote:
>>>>>>> Some SMMUv3 implementation embed the Perf Monitor Group Registers (PMCG)
>>>>>>> inside the first 64kB region of the SMMU. Since SMMU and PMCG are managed
>>>>>>> by two separate drivers, and this driver depends on ARM_SMMU_V3, so the
>>>>>>> SMMU driver reserves the corresponding resource first, this driver should
>>>>>>> not reserve the corresponding resource again. Otherwise, a resource
>>>>>>> reservation conflict is reported during boot.
>>>>>>>
>>>>>>> Signed-off-by: Zhen Lei 
>>>>>>> ---
>>>>>>>     drivers/perf/arm_smmuv3_pmu.c | 42 
>>>>>>> --
>>>>>>>     1 file changed, 40 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>>>>>>> index 74474bb322c3f26..dcce085431c6ce8 100644
>>>>>>> --- a/drivers/perf/arm_smmuv3_pmu.c
>>>>>>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>>>>>>> @@ -761,6 +761,44 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
>>>>>>>     dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>>>>>>>     }
>>>>>>>     +static void __iomem *
>>>>>>> +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
>>>>>>> +  unsigned int index,
>>>>>>> +  struct resource **out_res)
>>>>>>> +{
>>>>>>> +    int ret;
>>>>>>> +    void __iomem *base;
>>>>>>> +    struct resource *res;
>>>>>>> +
>>>>>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, index);
>>>>>>> +    if (!res) {
>>>>>>> +    dev_err(&pdev->dev, "invalid resource\n");
>>>>>>> +    return IOMEM_ERR_PTR(-EINVAL);
>>>>>>> +    }
>>>>>>> +    if (out_res)
>>>>>>> +    *out_res = res;
>>>>>>> +
>>>>>>> +    ret = region_intersects(res->start, resource_size(res),
>>>>>>> +    IORESOURCE_MEM, IORES_DESC_NONE);
>>>>>>> +    if (ret == REGION_INTERSECTS) {
>>>>>>> +    /*
>>>>>>> + * The resource has already been reserved by the SMMUv3 driver.
>>>>>>> + * Don't reserve it again, just do devm_ioremap().
>>>>>>> + */
>>>>>>> +    base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
>>>>>>> +    } else {
>>>>>>> +    /*
>>>>>>> + * The resource may have not been reserved by any driver, or
>>>>>>> + * has been reserved but not type IORESOURCE_MEM. In the latter
>>>>>>> + * case, devm_ioremap_resource() reports a conflict and returns
>>>>>>> + * IOMEM_ERR_PTR(-EBUSY).
>>>>>>> + */
>>>>>>> +    base = devm_ioremap_resource(&pdev->dev, res);
>>>>>>> +    }
>>>>>>
>>>>>> What if the PMCG driver simply happens to probe first?
>>>>>
>>>>> There are 4 cases:
>>>>> 1) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=y
>>>>>  It's not allowed, because ARM_SMMU_V3_PMU depends on ARM_SMMU_V3:
>>>>>  config ARM_SMMU_V3_PMU
>>>>>    tristate "ARM SMMUv3 Performance Monitors Extension"
>>>>>    depends on ARM64 && ACPI && ARM_SMMU_V3
>>>

Re: [PATCH 2/2] Revert "iommu/arm-smmu-v3: Don't reserve implementation defined register space"

2021-01-20 Thread Leizhen (ThunderTown)



On 2021/1/20 23:02, Robin Murphy wrote:
> On 2021-01-19 01:59, Zhen Lei wrote:
>> This reverts commit 52f3fab0067d6fa9e99c1b7f63265dd48ca76046.
>>
>> This problem has been fixed by another patch. The original method had side
>> effects, it was not mapped to the user-specified resource size. The code
>> will become more complex when ECMDQ is supported later.
> 
> FWIW I don't think that's a significant issue either way - there could be any 
> number of imp-def pages between SMMU page 0 and the ECMDQ control pages, so 
> it will still be logical to map them as another separate thing anyway.

Yes. So now I'm thinking of reserving the SMMUv3 resources while leaving out the
imp-def area, and then using another devm_ioremap() to cover the entire
resource and assigning it to smmu->base.
Otherwise, a base pointer would need to be defined for each separate register
space, or a conversion function would have to be called on every access.

> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 
>> -
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 ---
>>   2 files changed, 4 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 8ca7415d785d9bf..477f473842e5272 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -91,8 +91,9 @@ struct arm_smmu_option_prop {
>>   static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
>>    struct arm_smmu_device *smmu)
>>   {
>> -    if (offset > SZ_64K)
>> -    return smmu->page1 + offset - SZ_64K;
>> +    if ((offset > SZ_64K) &&
>> +    (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY))
>> +    offset -= SZ_64K;
>>     return smmu->base + offset;
>>   }
>> @@ -3486,18 +3487,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
>>   return err;
>>   }
>>   -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start,
>> -  resource_size_t size)
>> -{
>> -    struct resource res = {
>> -    .flags = IORESOURCE_MEM,
>> -    .start = start,
>> -    .end = start + size - 1,
>> -    };
>> -
>> -    return devm_ioremap_resource(dev, &res);
>> -}
>> -
>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>   {
>>   int irq, ret;
>> @@ -3533,23 +3522,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   }
>>   ioaddr = res->start;
>>   -    /*
>> - * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>> - * the PMCG registers which are reserved by the PMU driver.
>> - */
>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>> +    smmu->base = devm_ioremap_resource(dev, res);
>>   if (IS_ERR(smmu->base))
>>   return PTR_ERR(smmu->base);
>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>> -    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>> -   ARM_SMMU_REG_SZ);
>> -    if (IS_ERR(smmu->page1))
>> -    return PTR_ERR(smmu->page1);
>> -    } else {
>> -    smmu->page1 = smmu->base;
>> -    }
>> -
>>   /* Interrupt lines */
>>     irq = platform_get_irq_byname_optional(pdev, "combined");
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 96c2e9565e00282..0c3090c60840c22 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -152,8 +152,6 @@
>>   #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
>>   #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
>>   -#define ARM_SMMU_REG_SZ    0xe00
>> -
>>   /* Common MSI config fields */
>>   #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
>>   #define MSI_CFG2_SH    GENMASK(5, 4)
>> @@ -584,7 +582,6 @@ struct arm_smmu_strtab_cfg {
>>   struct arm_smmu_device {
>>   struct device    *dev;
>>   void __iomem    *base;
>> -    void __iomem    *page1;
>>     #define ARM_SMMU_FEAT_2_LVL_STRTAB    (1 << 0)
>>   #define ARM_SMMU_FEAT_2_LVL_CDTAB    (1 << 1)
>>
> 
> 
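The "call a function to convert each time" alternative is what arm_smmu_page1_fixup() does in the quoted diff. Below is a minimal model of the restored version, returning the offset from smmu->base that an access ends up using: with ARM_SMMU_OPT_PAGE0_REGS_ONLY set, page-1 offsets are folded back onto page 0, so a single mapping suffices. The function name and the boolean parameter are hypothetical.

```c
#include <stdbool.h>

#define SZ_64K 0x10000UL

/* Model of the restored arm_smmu_page1_fixup(): offset relative to smmu->base. */
static unsigned long page1_fixup_offset(unsigned long offset, bool page0_regs_only)
{
	/* Strictly greater than, as in the diff: offset 0x10000 is not folded. */
	if (offset > SZ_64K && page0_regs_only)
		offset -= SZ_64K;
	return offset;
}
```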



Re: [PATCH 2/2] Revert "iommu/arm-smmu-v3: Don't reserve implementation defined register space"

2021-01-21 Thread Leizhen (ThunderTown)



On 2021/1/21 20:50, Robin Murphy wrote:
> On 2021-01-21 02:04, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/1/20 23:02, Robin Murphy wrote:
>>> On 2021-01-19 01:59, Zhen Lei wrote:
>>>> This reverts commit 52f3fab0067d6fa9e99c1b7f63265dd48ca76046.
>>>>
>>>> This problem has been fixed by another patch. The original method had a side
>>>> effect: it did not map the user-specified resource size. The code
>>>> will also become more complex when ECMDQ support is added later.
>>>
>>> FWIW I don't think that's a significant issue either way - there could be 
>>> any number of imp-def pages between SMMU page 0 and the ECMDQ control 
>>> pages, so it will still be logical to map them as another separate thing 
>>> anyway.
>>
>> Yes, so now I'm thinking of preserving the SMMUv3 resources and eliminating
>> the imp-def area, then using another devm_ioremap() to cover the entire
>> resource and assigning it to smmu->base.
>> Otherwise, a base pointer needs to be defined for each separate register
>> space, or a function must be called to convert the address each time.
> 
> But we'll almost certainly want to maintain a pointer to start of the ECMDQ 
> control page block anyway, since that's not fixed relative to smmu->base. 
> Therefore what's the harm in handling that via a dedicated mapping, once 
> we've determined that we *do* intend to use ECMDQs? Otherwise we end up with 
> in the complicated dance of trying to map "everything" up-front in order to 
> be able to read the ID registers to determine what the actual extent of 
> "everything" is supposed to be.

Currently, we only map the first 0xe00 bytes, so the
SMMU_CMDQ_CONTROL_PAGE_XXXn register space at offset 0x4000 would need to be
mapped separately.
The size of this ECMDQ resource is not fixed; it depends on
SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ.
Reserving it while avoiding a resource conflict with the PMCG is a
bit more complicated.
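For readers following the mapping discussion, here is a minimal userspace sketch of the layout the reverted commit mapped: only the architected registers at the start of page 0 and page 1 (0xe00 bytes each, matching ARM_SMMU_REG_SZ in the quoted diff), leaving the IMPLEMENTATION DEFINED remainder of each 64K page unmapped so that drivers such as the PMCG can claim it. The `struct reg_window` type and the `smmu_reg_windows()` helper are hypothetical names for illustration, not driver code. When the resource is 64K or smaller, page 1 simply aliases page 0, mirroring the `smmu->page1 = smmu->base` fallback.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K          0x10000u
#define ARM_SMMU_REG_SZ 0xe00u   /* architected register span, from the quoted diff */

/* Hypothetical description of one MMIO window to be mapped. */
struct reg_window {
	uint64_t start;
	uint64_t size;
};

/*
 * Compute the page 0 and page 1 windows for an SMMU whose resource starts
 * at @base and spans @res_size bytes. Returns the number of distinct
 * windows that would need their own mapping (1 or 2).
 */
static int smmu_reg_windows(uint64_t base, uint64_t res_size,
			    struct reg_window *page0, struct reg_window *page1)
{
	assert(res_size >= ARM_SMMU_REG_SZ);

	page0->start = base;
	page0->size = ARM_SMMU_REG_SZ;

	if (res_size > SZ_64K) {
		/* Page 1 registers live at a fixed 64K offset from page 0. */
		page1->start = base + SZ_64K;
		page1->size = ARM_SMMU_REG_SZ;
		return 2;
	}

	/* Page-0-only implementations: page 1 accesses reuse page 0. */
	*page1 = *page0;
	return 1;
}
```

An ECMDQ control-page region would be a further window whose size is derived from SMMU_IDR6 at runtime, which is why mapping it on demand, as Robin suggests, avoids guessing its extent up front.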

> 
> (also this reminds me that I was going to remove arm_smmu_page1_fixup() 
> entirely - I'd totally forgotten about that...)

Ah, that patch you made is so clever.

> 
> Robin.
> 
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 
>>>> -
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 ---
>>>>    2 files changed, 4 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 8ca7415d785d9bf..477f473842e5272 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -91,8 +91,9 @@ struct arm_smmu_option_prop {
>>>>    static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
>>>>     struct arm_smmu_device *smmu)
>>>>    {
>>>> -    if (offset > SZ_64K)
>>>> -    return smmu->page1 + offset - SZ_64K;
>>>> +    if ((offset > SZ_64K) &&
>>>> +    (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY))
>>>> +    offset -= SZ_64K;
>>>>      return smmu->base + offset;
>>>>    }
>>>> @@ -3486,18 +3487,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops 
>>>> *ops)
>>>>    return err;
>>>>    }
>>>>    -static void __iomem *arm_smmu_ioremap(struct device *dev, 
>>>> resource_size_t start,
>>>> -  resource_size_t size)
>>>> -{
>>>> -    struct resource res = {
>>>> -    .flags = IORESOURCE_MEM,
>>>> -    .start = start,
>>>> -    .end = start + size - 1,
>>>> -    };
>>>> -
>>>> -    return devm_ioremap_resource(dev, &res);
>>>> -}
>>>> -
>>>>    static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>    {
>>>>    int irq, ret;
>>>> @@ -3533,23 +3522,10 @@ static int arm_smmu_device_probe(struct 
>>>> platform_device *pdev)
>>>>    }
>>>>    ioaddr = res->start;
>>>>    -    /*
>>>> - * Don't map the IMPLEMENTATION DEFINED regions, since they may 
>>>> contain
>>>> - * the PMCG registers which are reserved by the PMU driver.
>>>> - */
>>>> -    smmu

Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-22 Thread Leizhen (ThunderTown)



On 2021/1/22 21:00, Robin Murphy wrote:
> On 2021-01-22 12:51, Will Deacon wrote:
>> On Thu, Nov 26, 2020 at 11:42:30AM +0800, Zhen Lei wrote:
>>> When changing from a set of pages/smaller blocks to a larger block for an
>>> address, the software should follow the sequence of BBML processing.
>>>
>>> When changing from a block to a set of pages/smaller blocks for an
>>> address, there's no need to use nT bit. If an address in the large block
>>> is accessed before page table switching, the TLB caches the large block
>>> mapping. After the page table is switched and before TLB invalidation
>>> finished, new access requests are still based on large block mapping.
>>> After the block or page is invalidated, the system reads the small block
>>> or page mapping from the memory; If the address in the large block is not
>>> accessed before page table switching, the TLB has no cache. After the
>>> page table is switched, a new access is initiated to read the small block
>>> or page mapping from the memory.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>>   drivers/iommu/io-pgtable-arm.c  | 46 -
>>>   include/linux/io-pgtable.h  |  1 +
>>>   4 files changed, 40 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index e634bbe60573..14a1a11565fb 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1977,6 +1977,7 @@ static int arm_smmu_domain_finalise(struct 
>>> iommu_domain *domain,
>>>   .coherent_walk    = smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>>   .tlb    = &arm_smmu_flush_ops,
>>>   .iommu_dev    = smmu->dev,
>>> +    .bbml    = smmu->bbml,
>>>   };
>>>     if (smmu_domain->non_strict)
>>> @@ -3291,6 +3292,7 @@ static int arm_smmu_device_hw_probe(struct 
>>> arm_smmu_device *smmu)
>>>     /* IDR3 */
>>>   reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
>>> +    smmu->bbml = FIELD_GET(IDR3_BBML, reg);
>>>   if (FIELD_GET(IDR3_RIL, reg))
>>>   smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
>>>   diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> index d4b7f40ccb02..aa7eb460fa09 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> @@ -51,6 +51,7 @@
>>>   #define IDR1_SIDSIZE    GENMASK(5, 0)
>>>     #define ARM_SMMU_IDR3    0xc
>>> +#define IDR3_BBML    GENMASK(12, 11)
>>>   #define IDR3_RIL    (1 << 10)
>>>     #define ARM_SMMU_IDR5    0x14
>>> @@ -617,6 +618,7 @@ struct arm_smmu_device {
>>>     int    gerr_irq;
>>>   int    combined_irq;
>>> +    int    bbml;
>>>     unsigned long    ias; /* IPA */
>>>   unsigned long    oas; /* PA */
>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>> index a7a9bc08dcd1..341581337ad0 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -72,6 +72,7 @@
>>>     #define ARM_LPAE_PTE_NSTABLE    (((arm_lpae_iopte)1) << 63)
>>>   #define ARM_LPAE_PTE_XN    (((arm_lpae_iopte)3) << 53)
>>> +#define ARM_LPAE_PTE_nT    (((arm_lpae_iopte)1) << 16)
>>>   #define ARM_LPAE_PTE_AF    (((arm_lpae_iopte)1) << 10)
>>>   #define ARM_LPAE_PTE_SH_NS    (((arm_lpae_iopte)0) << 8)
>>>   #define ARM_LPAE_PTE_SH_OS    (((arm_lpae_iopte)2) << 8)
>>> @@ -255,7 +256,7 @@ static size_t __arm_lpae_unmap(struct 
>>> arm_lpae_io_pgtable *data,
>>>     static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>   phys_addr_t paddr, arm_lpae_iopte prot,
>>> -    int lvl, arm_lpae_iopte *ptep)
>>> +    int lvl, arm_lpae_iopte *ptep, arm_lpae_iopte nT)
>>>   {
>>>   arm_lpae_iopte pte = prot;
>>>   @@ -265,37 +266,60 @@ static void __arm_lpae_init_pte(struct 
>>> arm_lpae_io_pgtable *data,
>>>   pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>>>     pte |= paddr_to_iopte(paddr, data);
>>> +    pte |= nT;
>>>     __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>>>   }
>>>   +static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, 
>>> int lvl,
>>> +    arm_lpae_iopte *ptep);
>>>   static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>    unsigned long iova, phys_addr_t paddr,
>>>    arm_lpae_iopte prot, int lvl,
>>>    arm_lpae_iopte *ptep)
>>>   {
>>>   arm_lpae_iopte pte = *ptep;
>>> +    struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>>     if (iopte_leaf(pte, lvl, data->iop.fmt)) {
>>>   /* We require an unmap first */
>>>   
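As a rough illustration of the two register-level pieces this patch touches, the sketch below extracts the 2-bit BBML field from an IDR3 value (bits [12:11], per the IDR3_BBML definition in the quoted diff) and sets or clears the nT bit (bit 16, per ARM_LPAE_PTE_nT) in a descriptor. The helper names and the plain-C stand-ins for GENMASK()/FIELD_GET() are hypothetical. It models only the bit manipulation, not the BBML update sequence or the TLB behaviour described in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-ins for the kernel's GENMASK()/FIELD_GET() helpers. */
#define IDR3_BBML_SHIFT 11
#define IDR3_BBML_MASK  (0x3u << IDR3_BBML_SHIFT)   /* bits [12:11] */
#define IDR3_RIL        (1u << 10)

#define LPAE_PTE_nT     ((uint64_t)1 << 16)

/* Return the BBML level (0, 1 or 2) advertised in an IDR3 value. */
static unsigned int idr3_bbml(uint32_t idr3)
{
	assert((IDR3_BBML_MASK & IDR3_RIL) == 0);   /* fields must not overlap */
	return (idr3 & IDR3_BBML_MASK) >> IDR3_BBML_SHIFT;
}

/* Set or clear the nT bit (bit 16) in a descriptor value. */
static uint64_t pte_with_nt(uint64_t pte, int set_nt)
{
	return set_nt ? (pte | LPAE_PTE_nT) : (pte & ~LPAE_PTE_nT);
}
```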

Re: [PATCH v2] dt-bindings: leds: Document commonly used LED triggers

2021-01-26 Thread Leizhen (ThunderTown)
Hi Manivannan,
  Do you have time to prepare v3? I hope it can be applied in v5.12.


On 2020/12/15 6:36, Rob Herring wrote:
> On Thu, Dec 10, 2020 at 01:54:49PM +0530, Manivannan Sadhasivam wrote:
>> This commit documents the LED triggers used commonly in the SoCs. Not
>> all triggers are documented as some of them are very application specific.
>> Most of the triggers documented here are currently used in devicetrees
>> of many SoCs.
> 
> The idea with recent LED binding changes is to move away from 
> 'linux,default-trigger' to 'function' and 'trigger-sources' and to have 
> some sort of standardized names.
> 
>>
>> While at it, let's also sort the triggers in ascending order.
> 
> I'm not sure we want that. Probably better to keep related functions 
> together.
> 
>>
>> Signed-off-by: Manivannan Sadhasivam 
>> ---
>>
>> Changes in v2:
>>
>> * Added more triggers, fixed the regex
>> * Sorted triggers in ascending order
>>
>>  .../devicetree/bindings/leds/common.yaml  | 78 ++-
>>  1 file changed, 60 insertions(+), 18 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/leds/common.yaml 
>> b/Documentation/devicetree/bindings/leds/common.yaml
>> index f1211e7045f1..3c2e2208c1da 100644
>> --- a/Documentation/devicetree/bindings/leds/common.yaml
>> +++ b/Documentation/devicetree/bindings/leds/common.yaml



Re: [PATCH v3 2/4] arm64: dts: correct vendor prefix hisi to hisilicon

2021-01-26 Thread Leizhen (ThunderTown)



On 2021/1/27 6:23, Arnd Bergmann wrote:
> On Tue, Dec 8, 2020 at 1:46 PM Zhen Lei  wrote:
>>
>> The vendor prefix of "Hisilicon Limited" is "hisilicon", it is clearly
>> stated in "vendor-prefixes.yaml".
>>
>> Fixes: 35ca8168133c ("arm64: dts: Add dts files for Hisilicon Hi3660 SoC")
>> Fixes: dd8c7b78c11b ("arm64: dts: Add devicetree for Hisilicon Hi3670 SoC")
>> Signed-off-by: Zhen Lei 
>> Cc: Chen Feng 
>> Cc: Manivannan Sadhasivam 
> 
> I see this change in the pull request I got, but I'm a bit worried about the
> incompatible binding change. Wouldn't the correct path forward be to
> list both the correct and the incorrect properties, both in the dts file
> and in the driver that interprets the properties?

Hi, Arnd:

This patch is part of a series. The other three patches have been applied by
Philipp Zabel and are currently in linux-next.

https://lkml.org/lkml/2020/12/10/697

> 
> The binding file in this case would need to list the old name as deprecated,
> though I'm not sure how that would work without causing a warning about
> the unknown vendor prefix.
> 
> Arnd
> 
> .
> 
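Arnd's suggestion (and Philipp's earlier one for the driver) amounts to trying the corrected property name first and falling back to the deprecated one, so that old device trees keep working with new kernels. Below is a minimal userspace model of that lookup order. `struct fake_node`, `lookup_prop()` and `lookup_rst_syscon()` are hypothetical stand-ins: in the driver the real call would be syscon_regmap_lookup_by_phandle(), and a device node is reduced here to a list of property names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DT node: a NULL-terminated list of property names. */
struct fake_node {
	const char *const *props;
};

/* Return the matching property name, or NULL if @node lacks @name. */
static const char *lookup_prop(const struct fake_node *node, const char *name)
{
	assert(node != NULL && node->props != NULL);

	for (const char *const *p = node->props; *p; p++)
		if (strcmp(*p, name) == 0)
			return *p;
	return NULL;
}

/* Prefer the corrected vendor prefix, but still accept the deprecated one. */
static const char *lookup_rst_syscon(const struct fake_node *node)
{
	const char *prop = lookup_prop(node, "hisilicon,rst-syscon");

	if (!prop)
		prop = lookup_prop(node, "hisi,rst-syscon");	/* deprecated */
	return prop;
}
```

Keeping the fallback in the driver lets the dts change land independently, which is the lock-step concern raised for the reset driver patch.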



Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-26 Thread Leizhen (ThunderTown)



On 2021/1/26 18:12, Will Deacon wrote:
> On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:
>> Now we probably will need some degree of BBML feature awareness for the
>> sake of SVA if and when we start using it for CPU pagetables, but I still
>> cannot see any need to consider it in io-pgtable.
> 
> Agreed; I don't think this is something that io-pgtable should have to care
> about.

Yes, SVA works in stall mode, so failed device access requests are not
discarded.

Let me look for examples. The BBML usage scenario was described to me by a
former colleague.

> 
> Will
> 
> .
> 



Re: [PATCH 1/1] iommu/arm-smmu-v3: Use DEFINE_RES_MEM() to simplify code

2021-01-26 Thread Leizhen (ThunderTown)
I've sent another set of patches. https://lkml.org/lkml/2021/1/26/1065
If those patches are acceptable, then this one should be ignored.


On 2021/1/22 21:14, Zhen Lei wrote:
> No functional change.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bca458c00e48a8b..f04c55a7503c790 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3479,11 +3479,7 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
>  static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t 
> start,
> resource_size_t size)
>  {
> - struct resource res = {
> - .flags = IORESOURCE_MEM,
> - .start = start,
> - .end = start + size - 1,
> - };
> + struct resource res = DEFINE_RES_MEM(start, size);
>  
>   return devm_ioremap_resource(dev, &res);
>  }
> 
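The simplification relies on DEFINE_RES_MEM(start, size) producing the same inclusive `.end = start + size - 1` as the open-coded initializer. The sketch below checks that equivalence in userspace using a cut-down `struct resource` and a local DEFINE_RES_MEM look-alike. These are assumptions for illustration, not the kernel's definitions: the real macro goes through DEFINE_RES_NAMED() and also sets `.name` and `.desc`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;

#define IORESOURCE_MEM 0x00000200

/* Cut-down stand-in for the kernel's struct resource. */
struct resource {
	resource_size_t start;
	resource_size_t end;      /* inclusive: last byte of the range */
	unsigned long flags;
};

/* Local look-alike of DEFINE_RES_MEM(): .end is the last byte, not one past it. */
#define DEFINE_RES_MEM(_start, _size)            \
	((struct resource) {                     \
		.start = (_start),               \
		.end = (_start) + (_size) - 1,   \
		.flags = IORESOURCE_MEM,         \
	})

/* Same arithmetic as the kernel's resource_size(). */
static resource_size_t resource_size(const struct resource *res)
{
	assert(res->end >= res->start);
	return res->end - res->start + 1;
}
```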



Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-23 Thread Leizhen (ThunderTown)



On 2021/1/22 20:51, Will Deacon wrote:
> On Thu, Nov 26, 2020 at 11:42:30AM +0800, Zhen Lei wrote:
>> When changing from a set of pages/smaller blocks to a larger block for an
>> address, the software should follow the sequence of BBML processing.
>>
>> When changing from a block to a set of pages/smaller blocks for an
>> address, there's no need to use nT bit. If an address in the large block
>> is accessed before page table switching, the TLB caches the large block
>> mapping. After the page table is switched and before TLB invalidation
>> finished, new access requests are still based on large block mapping.
>> After the block or page is invalidated, the system reads the small block
>> or page mapping from the memory; If the address in the large block is not
>> accessed before page table switching, the TLB has no cache. After the
>> page table is switched, a new access is initiated to read the small block
>> or page mapping from the memory.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  drivers/iommu/io-pgtable-arm.c  | 46 -
>>  include/linux/io-pgtable.h  |  1 +
>>  4 files changed, 40 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index e634bbe60573..14a1a11565fb 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1977,6 +1977,7 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain,
>>  .coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>  .tlb= &arm_smmu_flush_ops,
>>  .iommu_dev  = smmu->dev,
>> +.bbml   = smmu->bbml,
>>  };
>>  
>>  if (smmu_domain->non_strict)
>> @@ -3291,6 +3292,7 @@ static int arm_smmu_device_hw_probe(struct 
>> arm_smmu_device *smmu)
>>  
>>  /* IDR3 */
>>  reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
>> +smmu->bbml = FIELD_GET(IDR3_BBML, reg);
>>  if (FIELD_GET(IDR3_RIL, reg))
>>  smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
>>  
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index d4b7f40ccb02..aa7eb460fa09 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -51,6 +51,7 @@
>>  #define IDR1_SIDSIZEGENMASK(5, 0)
>>  
>>  #define ARM_SMMU_IDR3   0xc
>> +#define IDR3_BBML   GENMASK(12, 11)
>>  #define IDR3_RIL(1 << 10)
>>  
>>  #define ARM_SMMU_IDR5   0x14
>> @@ -617,6 +618,7 @@ struct arm_smmu_device {
>>  
>>  int gerr_irq;
>>  int combined_irq;
>> +int bbml;
>>  
>>  unsigned long   ias; /* IPA */
>>  unsigned long   oas; /* PA */
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index a7a9bc08dcd1..341581337ad0 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -72,6 +72,7 @@
>>  
>>  #define ARM_LPAE_PTE_NSTABLE(((arm_lpae_iopte)1) << 63)
>>  #define ARM_LPAE_PTE_XN (((arm_lpae_iopte)3) << 53)
>> +#define ARM_LPAE_PTE_nT (((arm_lpae_iopte)1) << 16)
>>  #define ARM_LPAE_PTE_AF (((arm_lpae_iopte)1) << 10)
>>  #define ARM_LPAE_PTE_SH_NS  (((arm_lpae_iopte)0) << 8)
>>  #define ARM_LPAE_PTE_SH_OS  (((arm_lpae_iopte)2) << 8)
>> @@ -255,7 +256,7 @@ static size_t __arm_lpae_unmap(struct 
>> arm_lpae_io_pgtable *data,
>>  
>>  static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>  phys_addr_t paddr, arm_lpae_iopte prot,
>> -int lvl, arm_lpae_iopte *ptep)
>> +int lvl, arm_lpae_iopte *ptep, arm_lpae_iopte 
>> nT)
>>  {
>>  arm_lpae_iopte pte = prot;
>>  
>> @@ -265,37 +266,60 @@ static void __arm_lpae_init_pte(struct 
>> arm_lpae_io_pgtable *data,
>>  pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>>  
>>  pte |= paddr_to_iopte(paddr, data);
>> +pte |= nT;
>>  
>>  __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>>  }
>>  
>> +static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int 
>> lvl,
>> +arm_lpae_iopte *ptep);
>>  static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>   unsigned long iova, phys_addr_t paddr,
>>   arm_lpae_iopte prot, int lvl,
>>   arm_lpae_iopte *ptep)
>>  {
>>  arm_lpae_iopte pte = *ptep;
>> +struct io_pgtable_cfg *cfg = &data->iop.cfg;

Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-23 Thread Leizhen (ThunderTown)



On 2021/1/22 21:00, Robin Murphy wrote:
> On 2021-01-22 12:51, Will Deacon wrote:
>> On Thu, Nov 26, 2020 at 11:42:30AM +0800, Zhen Lei wrote:
>>> When changing from a set of pages/smaller blocks to a larger block for an
>>> address, the software should follow the sequence of BBML processing.
>>>
>>> When changing from a block to a set of pages/smaller blocks for an
>>> address, there's no need to use nT bit. If an address in the large block
>>> is accessed before page table switching, the TLB caches the large block
>>> mapping. After the page table is switched and before TLB invalidation
>>> finished, new access requests are still based on large block mapping.
>>> After the block or page is invalidated, the system reads the small block
>>> or page mapping from the memory; If the address in the large block is not
>>> accessed before page table switching, the TLB has no cache. After the
>>> page table is switched, a new access is initiated to read the small block
>>> or page mapping from the memory.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>>   drivers/iommu/io-pgtable-arm.c  | 46 -
>>>   include/linux/io-pgtable.h  |  1 +
>>>   4 files changed, 40 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index e634bbe60573..14a1a11565fb 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1977,6 +1977,7 @@ static int arm_smmu_domain_finalise(struct 
>>> iommu_domain *domain,
>>>   .coherent_walk    = smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>>   .tlb    = &arm_smmu_flush_ops,
>>>   .iommu_dev    = smmu->dev,
>>> +    .bbml    = smmu->bbml,
>>>   };
>>>     if (smmu_domain->non_strict)
>>> @@ -3291,6 +3292,7 @@ static int arm_smmu_device_hw_probe(struct 
>>> arm_smmu_device *smmu)
>>>     /* IDR3 */
>>>   reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
>>> +    smmu->bbml = FIELD_GET(IDR3_BBML, reg);
>>>   if (FIELD_GET(IDR3_RIL, reg))
>>>   smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
>>>   diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> index d4b7f40ccb02..aa7eb460fa09 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> @@ -51,6 +51,7 @@
>>>   #define IDR1_SIDSIZE    GENMASK(5, 0)
>>>     #define ARM_SMMU_IDR3    0xc
>>> +#define IDR3_BBML    GENMASK(12, 11)
>>>   #define IDR3_RIL    (1 << 10)
>>>     #define ARM_SMMU_IDR5    0x14
>>> @@ -617,6 +618,7 @@ struct arm_smmu_device {
>>>     int    gerr_irq;
>>>   int    combined_irq;
>>> +    int    bbml;
>>>     unsigned long    ias; /* IPA */
>>>   unsigned long    oas; /* PA */
>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>> index a7a9bc08dcd1..341581337ad0 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -72,6 +72,7 @@
>>>     #define ARM_LPAE_PTE_NSTABLE    (((arm_lpae_iopte)1) << 63)
>>>   #define ARM_LPAE_PTE_XN    (((arm_lpae_iopte)3) << 53)
>>> +#define ARM_LPAE_PTE_nT    (((arm_lpae_iopte)1) << 16)
>>>   #define ARM_LPAE_PTE_AF    (((arm_lpae_iopte)1) << 10)
>>>   #define ARM_LPAE_PTE_SH_NS    (((arm_lpae_iopte)0) << 8)
>>>   #define ARM_LPAE_PTE_SH_OS    (((arm_lpae_iopte)2) << 8)
>>> @@ -255,7 +256,7 @@ static size_t __arm_lpae_unmap(struct 
>>> arm_lpae_io_pgtable *data,
>>>     static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>   phys_addr_t paddr, arm_lpae_iopte prot,
>>> -    int lvl, arm_lpae_iopte *ptep)
>>> +    int lvl, arm_lpae_iopte *ptep, arm_lpae_iopte nT)
>>>   {
>>>   arm_lpae_iopte pte = prot;
>>>   @@ -265,37 +266,60 @@ static void __arm_lpae_init_pte(struct 
>>> arm_lpae_io_pgtable *data,
>>>   pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>>>     pte |= paddr_to_iopte(paddr, data);
>>> +    pte |= nT;
>>>     __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>>>   }
>>>   +static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, 
>>> int lvl,
>>> +    arm_lpae_iopte *ptep);
>>>   static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>>    unsigned long iova, phys_addr_t paddr,
>>>    arm_lpae_iopte prot, int lvl,
>>>    arm_lpae_iopte *ptep)
>>>   {
>>>   arm_lpae_iopte pte = *ptep;
>>> +    struct io_pgtable_cfg *cfg = &data->iop.cfg;
>>>     if (iopte_leaf(pte, lvl, data->iop.fmt)) {
>>>   /* We require an unmap first */
>>>   

Re: [PATCH 1/1] iommu/arm-smmu-v3: Use DEFINE_RES_MEM() to simplify code

2021-01-27 Thread Leizhen (ThunderTown)



On 2021/1/27 17:23, Will Deacon wrote:
> On Wed, Jan 27, 2021 at 10:05:50AM +0800, Leizhen (ThunderTown) wrote:
>> I've sent another set of patches. https://lkml.org/lkml/2021/1/26/1065
>> If those patches are acceptable, then this one should be ignored.
> 
> I've already queued this one, so if you want me to drop it then you need to
> send me a revert.

Thanks. Since it's queued, keep it. I'll update the new patch set.

> 
> Will
> 
> .
> 



Re: [PATCH v5 0/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-01-27 Thread Leizhen (ThunderTown)
Hi Russell and Arnd:
  Do you have time to review it?


On 2021/1/16 11:27, Zhen Lei wrote:
> v4 --> v5:
> 1. Add SoC macro ARCH_KUNPENG50X, and enable the Kunpeng L3 cache controller
>    only on that platform.
> 2. Require the compatible string of the Kunpeng L3 cache controller to include
>    a SoC-specific name. For example:
>    compatible = "hisilicon,kunpeng509-l3cache", "hisilicon,kunpeng-l3cache";
> 
> v3 --> v4:
> Rename the compatible string from "hisilicon,l3cache" to "hisilicon,kunpeng-l3cache".
> Then adjust the file name, configuration option name, and description accordingly.
> 
> v2 --> v3:
> Add Hisilicon L3 cache controller driver and its document. That's: patch 2-3.
> 
> v1 --> v2:
> Discard the middle-tier functions and do a silent narrowing cast in the outercache
> hook functions. For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
>  {
> + unsigned long start = pa_start;
> + unsigned long end = pa_end;
> 
> 
> v1:
> Cast phys_addr_t to unsigned long by adding a middle-tier function.
> For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void __l2c220_inv_range(unsigned long start, unsigned long end)
>  {
>   ...
>  }
> +static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
> +{
> +  __l2c220_inv_range(start, end);
> +}
> 
> 
> Zhen Lei (4):
>   ARM: LPAE: Use phys_addr_t instead of unsigned long in outercache
> hooks
>   ARM: hisi: add support for Kunpeng50x SoC
>   dt-bindings: arm: hisilicon: Add binding for Kunpeng L3 cache
> controller
>   ARM: Add support for Hisilicon Kunpeng L3 cache controller
> 
>  .../arm/hisilicon/kunpeng-l3cache.yaml|  40 +
>  arch/arm/include/asm/outercache.h |   6 +-
>  arch/arm/mach-hisi/Kconfig|   8 +
>  arch/arm/mm/Kconfig   |  10 ++
>  arch/arm/mm/Makefile  |   1 +
>  arch/arm/mm/cache-feroceon-l2.c   |  15 +-
>  arch/arm/mm/cache-kunpeng-l3.c| 153 ++
>  arch/arm/mm/cache-kunpeng-l3.h|  30 
>  arch/arm/mm/cache-l2x0.c  |  50 --
>  arch/arm/mm/cache-tauros2.c   |  15 +-
>  arch/arm/mm/cache-uniphier.c  |   6 +-
>  arch/arm/mm/cache-xsc3l2.c|  12 +-
>  12 files changed, 317 insertions(+), 29 deletions(-)
>  create mode 100644 
> Documentation/devicetree/bindings/arm/hisilicon/kunpeng-l3cache.yaml
>  create mode 100644 arch/arm/mm/cache-kunpeng-l3.c
>  create mode 100644 arch/arm/mm/cache-kunpeng-l3.h
> 
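The v1/v2 notes above hinge on what a silent narrowing cast from phys_addr_t to unsigned long does. On 32-bit ARM with LPAE, phys_addr_t is 64 bits wide while unsigned long is 32 bits, so any physical address at or above 4 GiB loses its upper bits. The small userspace sketch below shows the effect, using uint64_t and uint32_t as stand-ins for those two types; the typedef and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lpae_phys_addr_t;   /* phys_addr_t with CONFIG_ARM_LPAE */
typedef uint32_t arm32_ulong;        /* unsigned long on 32-bit ARM */

/* What a hook taking unsigned long would actually receive. */
static arm32_ulong narrow_to_ulong(lpae_phys_addr_t pa)
{
	return (arm32_ulong)pa;   /* keeps only the low 32 bits */
}

/* Non-zero if narrowing @pa would change its value. */
static int narrowing_loses_bits(lpae_phys_addr_t pa)
{
	return (lpae_phys_addr_t)narrow_to_ulong(pa) != pa;
}
```

Passing phys_addr_t through the outercache hooks means only the controllers that cannot handle wide addresses need to narrow, and they do so explicitly at the point of use.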



Re: [PATCH 1/1] perf diff: fix error return value in __cmd_diff()

2020-11-27 Thread Leizhen (ThunderTown)
Hi everybody,
  Can anyone review it?


On 2020/11/24 18:36, Zhen Lei wrote:
> An appropriate return value should be set on the failed path.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Zhen Lei 
> ---
>  tools/perf/builtin-diff.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 584e2e1a3793..cefc71506409 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -1222,8 +1222,10 @@ static int __cmd_diff(void)
>   if (compute == COMPUTE_STREAM) {
>   d->evlist_streams = evlist__create_streams(
>   d->session->evlist, 5);
> - if (!d->evlist_streams)
> + if (!d->evlist_streams) {
> + ret = -ENOMEM;
>   goto out_delete;
> + }
>   }
>   }
>  
> 
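The bug class fixed here is common with goto-based cleanup: `ret` still holds 0 from an earlier successful step when a later allocation fails, so the function jumps to the cleanup label and reports success. Below is a minimal userspace sketch of the corrected pattern. `create_streams()` and `cmd_diff_like()` are hypothetical stand-ins for evlist__create_streams() and __cmd_diff(), and the `fail` argument simply forces the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator standing in for evlist__create_streams(). */
static void *create_streams(int fail)
{
	return fail ? NULL : malloc(16);
}

static int cmd_diff_like(int fail_alloc)
{
	int ret = 0;          /* left at 0 by earlier, successful steps */
	void *streams;

	streams = create_streams(fail_alloc);
	if (!streams) {
		ret = -ENOMEM;    /* without this, the error path returns 0 */
		goto out_delete;
	}

	/* ... work with streams ... */
	free(streams);

out_delete:
	return ret;
}
```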



Re: [PATCH 1/1] perf diff: fix error return value in __cmd_diff()

2020-11-27 Thread Leizhen (ThunderTown)



On 2020/11/28 1:25, Arnaldo Carvalho de Melo wrote:
> Em Fri, Nov 27, 2020 at 02:22:02PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Nov 27, 2020 at 10:35:37PM +0900, Namhyung Kim escreveu:
>>> On Tue, Nov 24, 2020 at 7:37 PM Zhen Lei  wrote:
> 
>>>> An appropriate return value should be set on the failed path.
> 
>>>> Reported-by: Hulk Robot 
>>>> Signed-off-by: Zhen Lei 
>  
>>> Acked-by: Namhyung Kim 
>  
>> Thanks, applied.
> 
> I also added this:
> 
> Cc: Jin Yao 
> Fixes: 2a09a84c720b436a ("perf diff: Support hot streams comparison")
> 
> Please add the fixes line and CC the author of the patch introducing the
> bug next time,

Okay, I'll do that next time. Thanks for the heads-up.

> 
> Thanks
> 
> - Arnaldo
> 
> .
> 



Re: [PATCH 1/1] of: to support binding numa node to root subnode(non-bus)

2015-08-24 Thread Leizhen (ThunderTown)


On 2015/8/24 21:25, Rob Herring wrote:
> +benh
> 
> On Mon, Aug 24, 2015 at 7:30 AM, Zhen Lei  wrote:
>> If use of_platform_populate to scan dt-nodes and add devices, the
>> subnode of root(such as /smmu), when being scanned and invoke
> 
> You should have a bus as the sub-node of root rather than devices
> directly off of root. You still have a problem though...

But actually, the parent of the bus node is also &platform_bus unless it is
given special initialization.
For example:
of_platform_device_create_pdata() first invokes of_device_alloc(), then
invokes of_device_add().
And in of_device_alloc(), we can see:
dev->dev.parent = parent ? : &platform_bus;

> 
>> of_device_add, the ofdev->dev.parent is always equal &platform_bus. So
>> that, function set_dev_node will not be called. And in device_add,
>> dev_to_node(parent) always return NUMA_NO_NODE.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/base/core.c | 2 +-
>>  drivers/of/device.c | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index dafae6d..5df4f46b 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -1017,7 +1017,7 @@ int device_add(struct device *dev)
>> dev->kobj.parent = kobj;
>>
>> /* use parent numa_node */
>> -   if (parent)
>> +   if (parent && (parent != &platform_bus))
> 
> This is only fixing one specific case, but I think things are broken
> for any case where the NUMA associativity is not set at the top level
> bus node. I think this should be something like:
> 
> if (parent && (dev_to_node(dev) != NO_NUMA_NODE))

That seems to be a mistake; we should use the equality operator:
if (parent && (dev_to_node(dev) == NUMA_NO_NODE))

> 
> Then the OF code can set the node however it wants.

OK. I will send patch v2 based on your advice. Thank you.
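The logic agreed on above can be modelled in userspace. A device inherits its parent's NUMA node only while its own node is still NUMA_NO_NODE, so the OF code can set the node first and device_add() will not overwrite it. `struct fake_device`, `fake_device_add()` and `fake_of_device_add()` are hypothetical simplifications of struct device, device_add() and of_device_add() with the `if` removed. `of_nid` stands in for the value of_node_to_nid() would return, and NUMA_NO_NODE is -1 as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, cut-down struct device. */
struct fake_device {
	struct fake_device *parent;
	int numa_node;
};

/* Model of the proposed device_add() check: inherit only if still unset. */
static void fake_device_add(struct fake_device *dev)
{
	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}

/* Model of of_device_add() without the "if": always apply the DT node first. */
static void fake_of_device_add(struct fake_device *dev, int of_nid)
{
	assert(dev != NULL);
	dev->numa_node = of_nid;
	fake_device_add(dev);
}
```

With this ordering, a root subnode such as /smmu keeps its own node even though its parent is &platform_bus, while a device without a DT node ID still inherits from a bus that has one.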

> 
>> set_dev_node(dev, dev_to_node(parent));
>>
>> /* first, register with generic layer. */
>> diff --git a/drivers/of/device.c b/drivers/of/device.c
>> index 8b91ea2..96ebece 100644
>> --- a/drivers/of/device.c
>> +++ b/drivers/of/device.c
>> @@ -63,7 +63,7 @@ int of_device_add(struct platform_device *ofdev)
>> /* device_add will assume that this device is on the same node as
>>  * the parent. If there is no parent defined, set the node
>>  * explicitly */
>> -   if (!ofdev->dev.parent)
>> +   if (!ofdev->dev.parent || (ofdev->dev.parent == &platform_bus))
> 
> And then remove the if here.
> 

OK. I also think removing this statement would be better. Although
set_dev_node() may be called twice, it takes very little time, and this
almost always happens during the initialization phase.

>> set_dev_node(&ofdev->dev, 
>> of_node_to_nid(ofdev->dev.of_node));
>>
>> return device_add(&ofdev->dev);
>> --
>> 2.5.0
>>
>>
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] arm64: fix flush_cache_range

2016-05-24 Thread Leizhen (ThunderTown)


On 2016/5/24 19:37, Mark Rutland wrote:
> On Tue, May 24, 2016 at 07:16:37PM +0800, Zhen Lei wrote:
>> When we ran mprotect04(a test case in LTP) infinitely, it would always
>> failed after a few seconds. The case can be described briefly that: copy
>> a empty function from code area into a new memory area(created by mmap),
>> then call mprotect to change the protection to PROT_EXEC. The syscall
>> sys_mprotect will finally invoke flush_cache_range, but this function
>> currently only invalid icache, the operation of flush dcache is missed.
> 
> In the LTP code I see powerpc-specific D-cache / I-cache synchronisation
> (i.e. d-cache cleaning followed by I-cache invalidation), so there
> appears to be some expectation of userspace maintenance. However, there
> is no such ARM-specific I-cache maintenance.
But I see that some other platforms do D-cache maintenance here, for example
arch/nios2/mm/cacheflush.c.
Judging by its name, flush_cache_range should do this as well. Otherwise,
mprotect04 would fail on more platforms, which would be easy to discover. Only PPC
has specific cache synchronization in the test; maybe it works around some
hardware limitation. It seems implausible that a programmer would fix a common
bug on only one platform and leave the others unchanged.

> 
> It looks like the test may be missing I-cache maintenance regardless of
> the semantics of mprotect in this case.
> 
> I have not yet delved into flush_cache_range and how it is called.

SYSCALL_DEFINE3(mprotect) ---> mprotect_fixup ---> change_protection --->
change_protection_range ---> flush_cache_range

> 
> Thanks,
> Mark.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/flush.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>> index dbd12ea..eda4124 100644
>> --- a/arch/arm64/mm/flush.c
>> +++ b/arch/arm64/mm/flush.c
>> @@ -31,7 +31,7 @@ void flush_cache_range(struct vm_area_struct *vma, 
>> unsigned long start,
>> unsigned long end)
>>  {
>>  if (vma->vm_flags & VM_EXEC)
>> -__flush_icache_all();
>> +flush_icache_range(start, end);
>>  }
>>
>>  static void sync_icache_aliases(void *kaddr, unsigned long len)
>> --
>> 2.5.0
>>
>>
>>
>>
> 
> .
> 



Re: [PATCH 1/1] arm64: fix flush_cache_range

2016-05-24 Thread Leizhen (ThunderTown)


On 2016/5/25 9:20, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/5/24 21:02, Catalin Marinas wrote:
>> On Tue, May 24, 2016 at 08:19:05PM +0800, Leizhen (ThunderTown) wrote:
>>> On 2016/5/24 19:37, Mark Rutland wrote:
>>>> On Tue, May 24, 2016 at 07:16:37PM +0800, Zhen Lei wrote:
>>>>> When we ran mprotect04(a test case in LTP) infinitely, it would always
>>>>> failed after a few seconds. The case can be described briefly that: copy
>>>>> a empty function from code area into a new memory area(created by mmap),
>>>>> then call mprotect to change the protection to PROT_EXEC. The syscall
>>>>> sys_mprotect will finally invoke flush_cache_range, but this function
>>>>> currently only invalid icache, the operation of flush dcache is missed.
>>>>
>>>> In the LTP code I see powerpc-specific D-cache / I-cache synchronisation
>>>> (i.e. d-cache cleaning followed by I-cache invalidation), so there
>>>> appears to be some expectation of userspace maintenance. Hoever, there
>>>> is no such ARM-specific I-cache maintenance.
>>>
>>> But I see some other platforms have D-cache maintenance, like: 
>>> arch/nios2/mm/cacheflush.c
>>> And according to the name of flush_cache_range, it should do this, I 
>>> judged. Otherwise,
>>> mprotect04 will be failed on more platforms, it's easy to discover. Only 
>>> PPC have specific
>>> cache synchronization, maybe it meets some hardware limitation. It's 
>>> impossible a programmer
>>> fixed a common bug on only one platform but leave others unchanged.
>>
>> flush_cache_range() is primarily used on VIVT caches before changing the
>> mapping and should not really be implemented on arm64. I don't recall
>> why we still have the I-cache invalidation, possibly for the ASID-tagged
>> VIVT I-cache case, though we should have a specific check for this.
>>
>> There are some other cases where flush_cache_range() is called and no
>> D-cache maintenance is necessary on arm64, so I don't want to penalise
>> them by implementing flush_cache_range().
>>
>>>> It looks like the test may be missing I-cache maintenance regardless of
>>>> the semantics of mprotect in this case.
>>>>
>>>> I have not yet devled into flush_cache_range and how it is called.
>>>
>>> SYSCALL_DEFINE3(mprotect ---> mprotect_fixup ---> change_protection ---> 
>>> change_protection_range --> flush_cache_range
>>
>> The change_protection() shouldn't need to flush the caches in
>> flush_cache_range(). The change_pte_range() function eventually ends up
>> calling set_pte_at() which calls __sync_icache_dcache() if the mapping
>> is executable.
> 
> OK, I see.
> But I'm afraid it entered the "if (pte_present(oldpte))" branch in function 
> change_pte_range.
> Because the test case called mmap to create pte first, then called pte_modify.
> I will check it later.

I have checked: it does enter the "if (pte_present(oldpte))" branch.

But I don't know why adding flush_icache_range works, while adding __sync_icache_dcache has no effect.

> 
>>
>> Can you be more specific about the kernel version you are using, its
>> configuration?
>>
> I used the latest mainline kernel version, and built with 
> arch/arm64/configs/defconfig, ran on our D02 board.
> I have attached the testcase, you can simply run: sh test.sh
> 



Re: [PATCH 1/1] arm64: fix flush_cache_range

2016-05-25 Thread Leizhen (ThunderTown)


On 2016/5/25 18:50, Catalin Marinas wrote:
> On Wed, May 25, 2016 at 11:36:38AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/5/25 9:20, Leizhen (ThunderTown) wrote:
>>> On 2016/5/24 21:02, Catalin Marinas wrote:
>>>> On Tue, May 24, 2016 at 08:19:05PM +0800, Leizhen (ThunderTown) wrote:
>>>>> On 2016/5/24 19:37, Mark Rutland wrote:
>>>>>> It looks like the test may be missing I-cache maintenance regardless of
>>>>>> the semantics of mprotect in this case.
>>>>>>
>>>>>> I have not yet devled into flush_cache_range and how it is called.
>>>>>
>>>>> SYSCALL_DEFINE3(mprotect ---> mprotect_fixup ---> change_protection ---> 
>>>>> change_protection_range --> flush_cache_range
>>>>
>>>> The change_protection() shouldn't need to flush the caches in
>>>> flush_cache_range(). The change_pte_range() function eventually ends up
>>>> calling set_pte_at() which calls __sync_icache_dcache() if the mapping
>>>> is executable.
>>>
>>> OK, I see.
>>> But I'm afraid it entered the "if (pte_present(oldpte))" branch in
>>> function change_pte_range. Because the test case called mmap to
>>> create pte first, then called pte_modify. I will check it later.
>>
>> I have checked that it entered "if (pte_present(oldpte))" branch.
> 
> This path eventually calls set_pte_at() via ptep_modify_prot_commit().
OK, I see.

> 
>> But I don't known why I add flush_icache_range is OK, but add
>> __sync_icache_dcache have no effect.
> 
> Do you mean you modified set_pte_at() to use flush_icache_range()
More or less. I added it in change_pte_range, right after the statement below:
ptent = pte_modify(ptent, newprot);

> instead of __sync_icache_dcache() and it works?
Yes.

> 
> What happens is that __sync_icache_dcache() only takes care of the first
> time a page is mapped in user space and flushes the caches, marking it
> as "clean" (PG_dcache_clean) afterwards. Subsequent changes to this
> mapping or writes to it are entirely the responsibility of the user. So
> if the user plans to execute instructions, it better explicitly flush
> the caches (as Mark Rutland already stated in a previous reply).
> 
> I ran our internal LTP version yesterday and it was fine but didn't
> realise that we actually patched mprotect04.c to include:
> 
>   __clear_cache((char *)func, (char *)func + page_sz);
> 
> just after memcpy().
Yes, I also tried this before I sent this patch. Flushing the D-cache in either userspace or the kernel fixes this problem.
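For reference, the userspace fix Catalin describes can be sketched as follows. This is a minimal sketch, assuming Linux with GCC or Clang (for __builtin___clear_cache) and that the copied bytes fit in one page; copy_to_exec_page() is a hypothetical helper, not code from LTP.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes into a fresh anonymous mapping and
 * make it executable, doing the cache maintenance in userspace (the same
 * idea as the __clear_cache() call added to mprotect04.c).
 * Returns NULL on failure.
 */
static void *copy_to_exec_page(const void *code, size_t len)
{
    long page_sz = sysconf(_SC_PAGESIZE);
    char *buf;

    if (page_sz <= 0 || len == 0 || len > (size_t)page_sz)
        return NULL;

    buf = mmap(NULL, (size_t)page_sz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* Clean the D-cache and invalidate the I-cache for the copied range.
     * Required on arm64; on x86 the builtin compiles to nothing. */
    __builtin___clear_cache(buf, buf + len);

    if (mprotect(buf, (size_t)page_sz, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, (size_t)page_sz);
        return NULL;
    }
    return buf;
}
```

The kernel does not do this on the test's behalf for an anonymous page, so the maintenance has to happen between memcpy() and the first call through the new mapping.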

> 
> (we still need to investigate whether the I-cache invalidation is
> actually needed in flush_cache_range() or it's just something we forgot
> to remove)
> 



Re: [PATCH 1/1] arm64: fix flush_cache_range

2016-05-26 Thread Leizhen (ThunderTown)


On 2016/5/25 18:50, Catalin Marinas wrote:
> On Wed, May 25, 2016 at 11:36:38AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/5/25 9:20, Leizhen (ThunderTown) wrote:
>>> On 2016/5/24 21:02, Catalin Marinas wrote:
>>>> On Tue, May 24, 2016 at 08:19:05PM +0800, Leizhen (ThunderTown) wrote:
>>>>> On 2016/5/24 19:37, Mark Rutland wrote:
>>>>>> It looks like the test may be missing I-cache maintenance regardless of
>>>>>> the semantics of mprotect in this case.
>>>>>>
>>>>>> I have not yet devled into flush_cache_range and how it is called.
>>>>>
>>>>> SYSCALL_DEFINE3(mprotect ---> mprotect_fixup ---> change_protection ---> 
>>>>> change_protection_range --> flush_cache_range
>>>>
>>>> The change_protection() shouldn't need to flush the caches in
>>>> flush_cache_range(). The change_pte_range() function eventually ends up
>>>> calling set_pte_at() which calls __sync_icache_dcache() if the mapping
>>>> is executable.
>>>
>>> OK, I see.
>>> But I'm afraid it entered the "if (pte_present(oldpte))" branch in
>>> function change_pte_range. Because the test case called mmap to
>>> create pte first, then called pte_modify. I will check it later.
>>
>> I have checked that it entered "if (pte_present(oldpte))" branch.
> 
> This path eventually calls set_pte_at() via ptep_modify_prot_commit().
> 
>> But I don't known why I add flush_icache_range is OK, but add
>> __sync_icache_dcache have no effect.
> 
> Do you mean you modified set_pte_at() to use flush_icache_range()
> instead of __sync_icache_dcache() and it works?
> 
> What happens is that __sync_icache_dcache() only takes care of the first
> time a page is mapped in user space and flushes the caches, marking it
> as "clean" (PG_dcache_clean) afterwards. Subsequent changes to this

Hi,
According to my tracing, __sync_icache_dcache() returns early at "if (!page_mapping(page))", because the pages created by mmap are anonymous. After I commented out the lines below, it works well.

/* no flushing needed for anonymous pages */
if (!page_mapping(page))
return;
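To make the early return concrete, here is a hypothetical userspace model of the two checks discussed in this thread: the page_mapping() test and the PG_dcache_clean test_and_set_bit(). struct model_page and model_sync_icache_dcache() are illustrative only, and the ASID-tagged VIVT I-cache branch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the checks need. */
struct model_page {
    const void *mapping;   /* NULL models page_mapping(page) == NULL (anonymous) */
    unsigned long flags;
};

/* PG_dcache_clean is PG_arch_1; bit 9 in the kernel build discussed here. */
#define MODEL_PG_DCACHE_CLEAN 9

/* Returns true when the model would sync the I-cache and D-cache. */
static bool model_sync_icache_dcache(struct model_page *page)
{
    unsigned long bit = 1UL << MODEL_PG_DCACHE_CLEAN;

    /* no flushing needed for anonymous pages */
    if (!page->mapping)
        return false;

    /* test_and_set_bit(): only the first executable mapping flushes */
    if (page->flags & bit)
        return false;
    page->flags |= bit;
    return true;
}
```

Under this model, an anonymous page from mmap never receives cache maintenance through set_pte_at(), and a file-backed page is flushed only once; later writes are left to userspace.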


I printed the page information three times, as below:
page->mapping=8017baf36961, page->flags=0x10040048
page->mapping=8017b265bf51, page->flags=0x10040048
page->mapping=8017b94fc5a1, page->flags=0x10040048

PG_slab=7, PG_arch_1=9, PG_swapcache=15
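The three dumps above can be decoded with a small helper like the one below. It assumes the bit numbers printed above for this particular kernel build (they are config-dependent) and that the low bit of page->mapping is the kernel's PAGE_MAPPING_ANON tag (0x1). The printed pointers may be truncated in this archive, but only the low bit matters here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as printed above for this kernel build (config-dependent). */
enum { BIT_PG_SLAB = 7, BIT_PG_ARCH_1 = 9, BIT_PG_SWAPCACHE = 15 };

/* Anonymous pages are tagged by setting the low bit of page->mapping. */
#define MAPPING_ANON_BIT 0x1ULL

static bool flag_set(uint64_t flags, int bit)
{
    return (flags >> bit) & 1;
}

static bool mapping_is_anon(uint64_t mapping)
{
    return (mapping & MAPPING_ANON_BIT) != 0;
}
```

For flags=0x10040048 only bits 3, 6, 18 and 28 are set, so PG_slab, PG_arch_1 (PG_dcache_clean) and PG_swapcache are all clear, and all three mapping values are tagged anonymous. That is consistent with page_mapping() returning NULL for these pages.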

> mapping or writes to it are entirely the responsibility of the user. So
> if the user plans to execute instructions, it better explicitly flush
> the caches (as Mark Rutland already stated in a previous reply).
> 
> I ran our internal LTP version yesterday and it was fine but didn't
> realise that we actually patched mprotect04.c to include:
> 
>   __clear_cache((char *)func, (char *)func + page_sz);
> 
> just after memcpy().
> 
> (we still need to investigate whether the I-cache invalidation is
> actually needed in flush_cache_range() or it's just something we forgot
> to remove)
> 



Re: [PATCH 3/3] arm64/numa: fix type info

2016-05-26 Thread Leizhen (ThunderTown)


On 2016/5/27 1:12, David Daney wrote:
> The current patch to correct this problem is here:
> 
> https://lkml.org/lkml/2016/5/24/679
> 
> Since v7 of the ACPI/NUMA patches are likely going to be added to linux-next 
> as soon as the current merge window ends, further simplifications of the 
> informational prints should probably be rebased on top of it.
> 
> David Daney
> 

>> On Thu, 2016-05-26 at 09:22 -0700, Ganapatrao Kulkarni wrote:
>>> IIRC, it should be
>>> if (!numa_off)
>>> we want to print this message when we failed to find proper numa 
>>> configuration.
>>> when numa_off is set, we will not look for any numa configuration.
>>>

 +   pr_info("%s\n", "No NUMA configuration found");
>>


OK, I think I also missed some cases.

But my problem is still not resolved by https://lkml.org/lkml/2016/5/24/679; see below. I will rebase my patches on it.


[0.00] NUMA: Adding memblock [0x0 - 0x6aff] on node 0
[0.00] NUMA: parsing numa-distance-map-v1
[0.00] NUMA: Warning: invalid memblk node 4 [mem 0x6b00-0x7fbf]
// My NUMA configuration is incorrect, but this is not a "No ... found" case
[0.00] No NUMA configuration found
// The warning above is already very detailed, so this line can be removed
[0.00] NUMA: Faking a node at [mem 0x-0x0017]



Re: [PATCH 2/3] of/numa: fix a memory@ dt node can only contains one memory block

2016-05-26 Thread Leizhen (ThunderTown)


On 2016/5/26 21:13, Rob Herring wrote:
> On Thu, May 26, 2016 at 10:43:58AM +0800, Zhen Lei wrote:
>> For a normal memory@ devicetree node, its reg property can contains more
>> memory blocks.
>>
>> Because we don't known how many memory blocks maybe contained, so we try
>> from index=0, increase 1 until error returned(the end).
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/of/of_numa.c | 30 --
>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>> index 21d831f..2c5f249 100644
>> --- a/drivers/of/of_numa.c
>> +++ b/drivers/of/of_numa.c
>> @@ -63,7 +63,7 @@ static int __init of_numa_parse_memory_nodes(void)
>>  struct device_node *np = NULL;
>>  struct resource rsrc;
>>  u32 nid;
>> -int r = 0;
>> +int i, r = 0;
>>
>>  for (;;) {
>>  np = of_find_node_by_type(np, "memory");
>> @@ -82,17 +82,27 @@ static int __init of_numa_parse_memory_nodes(void)
>>  /* some other error */
>>  break;
>>
>> -r = of_address_to_resource(np, 0, &rsrc);
>> -if (r) {
>> -pr_err("NUMA: bad reg property in memory node\n");
>> -break;
>> +for (i = 0; ; i++) {
>> +r = of_address_to_resource(np, i, &rsrc);
>> +if (r) {
>> +/* reached the end of of_address */
>> +if (i > 0) {
>> +r = 0;
>> +break;
>> +}
>> +
>> +pr_err("NUMA: bad reg property in memory 
>> node\n");
>> +goto finished;
>> +}
>> +
>> +r = numa_add_memblk(nid, rsrc.start,
>> +rsrc.end - rsrc.start + 1);
>> +if (r)
>> +goto finished;
>>  }
>> -
>> -r = numa_add_memblk(nid, rsrc.start,
>> -rsrc.end - rsrc.start + 1);
>> -if (r)
>> -break;
>>  }
>> +
>> +finished:
>>  of_node_put(np);
> 
> This function can be simplified down to:
> 
>   for_each_node_by_type(np, "memory") {
OK, that's good.

>   r = of_property_read_u32(np, "numa-node-id", &nid);
>   if (r == -EINVAL)
>   /*
>* property doesn't exist if -EINVAL, continue
>* looking for more memory nodes with
>* "numa-node-id" property
>*/
>   continue;
Hi, everybody:
If some "memory" nodes contain "numa-node-id" but others are missing it, can we simply ignore the ones without it?
I think we should break out in that case too, and fall back to faking a single node0.

>   else if (r)
>   /* some other error */
>   break;
> 
>   r = of_address_to_resource(np, 0, &rsrc);
>   for (i = 0; !r; i++, r = of_address_to_resource(np, i, 

But a non-zero r only breaks out of this inner loop, whereas the original code breaks out of the outer for (;;) loop.

How about the following?

for_each_node_by_type(np, "memory") {
	... ...

	for (i = 0; !of_address_to_resource(np, i, &rsrc); i++) {
		r = numa_add_memblk(nid, rsrc.start,
				    rsrc.end - rsrc.start + 1);
		if (r)
			goto finished;
	}

	if (!i)
		pr_err("NUMA: bad reg property in memory node\n");
}

finished:


> &rsrc)) {
>   r = numa_add_memblk(nid, rsrc.start,
>   rsrc.end - rsrc.start + 1);
>   }
>   }
>   of_node_put(np);
> 
>   return r;
> 
> 
> Perhaps with a "if (!i && r) pr_err()" for an error message at the end.
> 
> Rob
> 
> .
> 
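The loop shape discussed above (walk every reg entry with !of_address_to_resource(), skip nodes without numa-node-id, and report an error only when a node has no usable entry) can be modelled in userspace like this. All types and functions here are hypothetical stand-ins for the of_* and numa_add_memblk() calls, so this is a sketch of the control flow, not drivers/of/of_numa.c itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a "memory" node and its reg entries. */
struct range { unsigned long long start, end; };  /* inclusive end, like struct resource */

struct mem_node {
    int has_nid;              /* 0 models of_property_read_u32() returning -EINVAL */
    unsigned int nid;
    const struct range *reg;  /* already-parsed reg entries */
    size_t nr_reg;
};

/* Stand-in for of_address_to_resource(): non-zero once i runs past the end. */
static int model_address_to_resource(const struct mem_node *np, size_t i,
                                     struct range *r)
{
    if (i >= np->nr_reg)
        return -EINVAL;
    *r = np->reg[i];
    return 0;
}

static size_t nr_added;  /* counts calls to the numa_add_memblk() stand-in */

static int model_add_memblk(unsigned int nid, unsigned long long start,
                            unsigned long long size)
{
    (void)nid;
    (void)start;
    (void)size;
    nr_added++;
    return 0;
}

static int model_parse_memory_nodes(const struct mem_node *nodes, size_t nr_nodes)
{
    int r = 0;

    for (size_t n = 0; n < nr_nodes; n++) {
        const struct mem_node *np = &nodes[n];
        struct range rsrc;
        size_t i;

        if (!np->has_nid)
            continue;   /* no "numa-node-id": keep looking at other nodes */

        /* Add every reg entry; stop at the first index that fails. */
        for (i = 0; !model_address_to_resource(np, i, &rsrc); i++) {
            r = model_add_memblk(np->nid, rsrc.start, rsrc.end - rsrc.start + 1);
            if (r)
                return r;
        }

        if (!i)
            fprintf(stderr, "NUMA: bad reg property in memory node\n");
    }
    return r;
}
```

A node whose reg holds two ranges now contributes two memblks instead of one, which is the point of the patch.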



Re: [PATCH 2/3] of/numa: fix a memory@ dt node can only contains one memory block

2016-05-27 Thread Leizhen (ThunderTown)


On 2016/5/27 12:20, Rob Herring wrote:
> On Thu, May 26, 2016 at 10:36 PM, Leizhen (ThunderTown)
>  wrote:
>>
>>
>> On 2016/5/26 21:13, Rob Herring wrote:
>>> On Thu, May 26, 2016 at 10:43:58AM +0800, Zhen Lei wrote:
>>>> For a normal memory@ devicetree node, its reg property can contains more
>>>> memory blocks.
>>>>
>>>> Because we don't known how many memory blocks maybe contained, so we try
>>>> from index=0, increase 1 until error returned(the end).
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>  drivers/of/of_numa.c | 30 --
>>>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>>>> index 21d831f..2c5f249 100644
>>>> --- a/drivers/of/of_numa.c
>>>> +++ b/drivers/of/of_numa.c
>>>> @@ -63,7 +63,7 @@ static int __init of_numa_parse_memory_nodes(void)
>>>>  struct device_node *np = NULL;
>>>>  struct resource rsrc;
>>>>  u32 nid;
>>>> -int r = 0;
>>>> +int i, r = 0;
>>>>
>>>>  for (;;) {
>>>>  np = of_find_node_by_type(np, "memory");
>>>> @@ -82,17 +82,27 @@ static int __init of_numa_parse_memory_nodes(void)
>>>>  /* some other error */
>>>>  break;
>>>>
>>>> -r = of_address_to_resource(np, 0, &rsrc);
>>>> -if (r) {
>>>> -pr_err("NUMA: bad reg property in memory node\n");
>>>> -break;
>>>> +for (i = 0; ; i++) {
>>>> +r = of_address_to_resource(np, i, &rsrc);
>>>> +if (r) {
>>>> +/* reached the end of of_address */
>>>> +if (i > 0) {
>>>> +r = 0;
>>>> +break;
>>>> +}
>>>> +
>>>> +pr_err("NUMA: bad reg property in memory 
>>>> node\n");
>>>> +goto finished;
>>>> +}
>>>> +
>>>> +r = numa_add_memblk(nid, rsrc.start,
>>>> +rsrc.end - rsrc.start + 1);
>>>> +if (r)
>>>> +goto finished;
>>>>  }
>>>> -
>>>> -r = numa_add_memblk(nid, rsrc.start,
>>>> -rsrc.end - rsrc.start + 1);
>>>> -if (r)
>>>> -break;
>>>>  }
>>>> +
>>>> +finished:
>>>>  of_node_put(np);
>>>
>>> This function can be simplified down to:
>>>
>>>   for_each_node_by_type(np, "memory") {
>> OK, That's good.
>>
>>>   r = of_property_read_u32(np, "numa-node-id", &nid);
>>>   if (r == -EINVAL)
>>>   /*
>>>* property doesn't exist if -EINVAL, continue
>>>* looking for more memory nodes with
>>>* "numa-node-id" property
>>>*/
>>>   continue;
>> Hi, everybody:
>> If some "memory" node contains "numa-node-id", but some others missed. 
>> Can we simply ignored it?
>> I think we should break out too, and faking to only have node0.
> 
> Continuing to work is probably better than not.
> 
>>
>>>   else if (r)
>>>   /* some other error */
>>>   break;
>>>
>>>   r = of_address_to_resource(np, 0, &rsrc);
>>>   for (i = 0; !r; i++, r = of_address_to_resource(np, i,
>>
>> But r(non-zero) is just break this loop, the original is break the outer for 
>> (;;) loop
> 
> It is not really the kernel's job to validate the DT. If there's
> random things in it then kernel's behavior is undefined.
> 
>>
>> How about as below?
>>
>> for_each_node_by_type(np, "memory") {
>>

Re: [PATCH 1/1] tty/serial: to support 8250 earlycon can be enabled independently

2016-05-16 Thread Leizhen (ThunderTown)


On 2016/5/16 23:40, Peter Hurley wrote:
> On 05/16/2016 04:35 AM, Zhen Lei wrote:
>> Sometimes, we may only use SSH to login, and build 8250 uart driver as a
>> ko(insmod if needed). But the earlycon may still be necessary, because
>> the kernel boot process may take a long time. It's not good to display
>> nothing but ask people to wait patiently.
> 
> I'm confused; you want the possibility of earlycon but _not_ a normal
> serial console?
Our downstream customers want to add some private functions to 8250.ko, so we cannot build the 8250 driver into the Image.

> 
> This configuration is unsafe because nothing prevents the 8250 driver
> and 8250 earlycon from concurrently accessing the hardware.
earlycon is a boot console; it will be disabled in printk_late_init() (assuming keep_bootcon is not set).

> 
> 
>> In addition, the 8250.ko can not be worked if we have not opened any
>> other serial drivers, because SERIAL_CORE would not be selected.
> 
> I don't understand what this means.

Before I enabled CONFIG_SERIAL_AMBA_PL011_CONSOLE (with only 8250 built as a module, this configuration does not work):
CONFIG_SERIAL_CORE=m

After I enabled CONFIG_SERIAL_AMBA_PL011_CONSOLE:
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_AMBA_PL011=y
CONFIG_SERIAL_AMBA_PL011_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y

> 
> Regards,
> Peter Hurley
> 
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/tty/serial/8250/Kconfig  | 9 +++--
>>  drivers/tty/serial/8250/Makefile | 1 -
>>  drivers/tty/serial/Makefile  | 1 +
>>  3 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/serial/8250/Kconfig 
>> b/drivers/tty/serial/8250/Kconfig
>> index 4d7cb9c..2992f0a 100644
>> --- a/drivers/tty/serial/8250/Kconfig
>> +++ b/drivers/tty/serial/8250/Kconfig
>> @@ -3,6 +3,12 @@
>>  # you somehow have an implicit or explicit dependency on SERIAL_8250.
>>  #
>>
>> +config SERIAL_8250_EARLYCON
>> +bool "Early console using 8250"
>> +select SERIAL_CORE
>> +select SERIAL_CORE_CONSOLE
>> +select SERIAL_EARLYCON
>> +
>>  config SERIAL_8250
>>  tristate "8250/16550 and compatible serial support"
>>  select SERIAL_CORE
>> @@ -60,8 +66,7 @@ config SERIAL_8250_PNP
>>  config SERIAL_8250_CONSOLE
>>  bool "Console on 8250/16550 and compatible serial port"
>>  depends on SERIAL_8250=y
>> -select SERIAL_CORE_CONSOLE
>> -select SERIAL_EARLYCON
>> +select SERIAL_8250_EARLYCON
>>  ---help---
>>If you say Y here, it will be possible to use a serial port as the
>>system console (the system console is the device which receives all
>> diff --git a/drivers/tty/serial/8250/Makefile 
>> b/drivers/tty/serial/8250/Makefile
>> index c9a2d6e..1f24c74 100644
>> --- a/drivers/tty/serial/8250/Makefile
>> +++ b/drivers/tty/serial/8250/Makefile
>> @@ -13,7 +13,6 @@ obj-$(CONFIG_SERIAL_8250_HP300)+= 8250_hp300.o
>>  obj-$(CONFIG_SERIAL_8250_CS)+= serial_cs.o
>>  obj-$(CONFIG_SERIAL_8250_ACORN) += 8250_acorn.o
>>  obj-$(CONFIG_SERIAL_8250_BCM2835AUX)+= 8250_bcm2835aux.o
>> -obj-$(CONFIG_SERIAL_8250_CONSOLE)   += 8250_early.o
>>  obj-$(CONFIG_SERIAL_8250_FOURPORT)  += 8250_fourport.o
>>  obj-$(CONFIG_SERIAL_8250_ACCENT)+= 8250_accent.o
>>  obj-$(CONFIG_SERIAL_8250_BOCA)  += 8250_boca.o
>> diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
>> index 8c261ad..cd84181 100644
>> --- a/drivers/tty/serial/Makefile
>> +++ b/drivers/tty/serial/Makefile
>> @@ -19,6 +19,7 @@ obj-$(CONFIG_SERIAL_SUNSAB) += sunsab.o
>>
>>  # Now bring in any enabled 8250/16450/16550 type drivers.
>>  obj-$(CONFIG_SERIAL_8250) += 8250/
>> +obj-$(CONFIG_SERIAL_8250_EARLYCON) += 8250/8250_early.o
>>
>>  obj-$(CONFIG_SERIAL_AMBA_PL010) += amba-pl010.o
>>  obj-$(CONFIG_SERIAL_AMBA_PL011) += amba-pl011.o
>> --
>> 2.5.0
>>
>>
>>
> 
> 
> .
> 



Re: [PATCH 1/1] rtc: fix type information of rtc-proc

2015-11-11 Thread Leizhen (ThunderTown)


On 2015/11/11 18:54, Alexandre Belloni wrote:
> On 11/11/2015 at 09:06:51 +0800, Leizhen (ThunderTown) wrote :
>> Hi, all
>>
>> I'm sorry. Maybe I didn't describe clearly enough before. These words are 
>> finally
>> shown to the end user. The end user maybe not a programmer, abbreviation 
>> word is unsuitable.
>>
> 
> Yes, that is exactly m point. What if an end user currently has a
> program parsing the file and looking for alrm_time or alrm_date? After
> updating his kernel, the program won't work anymore which is something
> we don't want.

OK. I see. Thanks.

> 
>>
>> cat /proc/driver/rtc
>>
>> rtc_time: 00:47:43
>> rtc_date: 2015-11-11
>> alrm_time   : 03:27:58   //alrm_time --> 
>> alarm_time
>> alrm_date   : 2015-10-08 //alrm_date --> 
>> alarm_date
>> alarm_IRQ   : no
>> alrm_pending: no //alrm_pending --> 
>> alarm_pending
>> update IRQ enabled  : no
>>
>>
> 
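To illustrate Alexandre's point, here is a hypothetical userspace parser that looks up a field in /proc/driver/rtc-style text by key. rtc_proc_lookup() is illustrative, not an existing tool, and it assumes the "key<padding>: value" line format shown above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value for `key` into `out` and return 0, or return -1 if the
 * key is absent. The key must match exactly up to the ':' separator.
 */
static int rtc_proc_lookup(const char *text, const char *key,
                           char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *line = text;

    if (out_len == 0)
        return -1;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *end = line + len;

        if (len > key_len && strncmp(line, key, key_len) == 0) {
            const char *p = line + key_len;

            /* Skip the padding between the key and the ':' separator */
            while (p < end && (*p == ' ' || *p == '\t'))
                p++;
            if (p < end && *p == ':') {
                size_t vlen;

                p++;
                while (p < end && (*p == ' ' || *p == '\t'))
                    p++;
                vlen = (size_t)(end - p);
                if (vlen >= out_len)
                    vlen = out_len - 1;
                memcpy(out, p, vlen);
                out[vlen] = '\0';
                return 0;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

A program written this way stops finding "alrm_time" as soon as the kernel renames the field to "alarm_time", which is exactly the userspace breakage the rename would cause.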



Re: [PATCH v2 0/1] of: to support binding numa node to specified device

2015-09-10 Thread Leizhen (ThunderTown)
Sorry, I missed the version number in the title.


On 2015/8/25 12:08, Zhen Lei wrote:
> Changelog:
> v1 -> v2:
> In patch v1, binding numa node to specified device only take effect for 
> dt-nodes
> directly of root. Patch v2 removed this limitation, we can binding numa node 
> to
> any specified device in devicetree.
> 
> Zhen Lei (1):
>   of: to support binding numa node to specified device in devicetree
> 
>  drivers/base/core.c |  2 +-
>  drivers/of/device.c | 11 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 



Re: [PATCH v2 1/1] of: to support binding numa node to specified device in devicetree

2015-09-10 Thread Leizhen (ThunderTown)
Hi all,

Could somebody take a few moments to review it? The patch is small: the functional change is only two lines.

Thanks,
Thunder.

On 2015/8/25 12:08, Zhen Lei wrote:
> For now, in function device_add, the new device will be forced to
> inherit the numa node of its parent. But this will override the device's
> numa node which configured in devicetree.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/base/core.c |  2 +-
>  drivers/of/device.c | 11 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index dafae6d..e06de82 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1017,7 +1017,7 @@ int device_add(struct device *dev)
>   dev->kobj.parent = kobj;
> 
>   /* use parent numa_node */
> - if (parent)
> + if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
>   set_dev_node(dev, dev_to_node(parent));
> 
>   /* first, register with generic layer. */
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 8b91ea2..e5f47ce 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -60,11 +60,12 @@ int of_device_add(struct platform_device *ofdev)
>   ofdev->name = dev_name(&ofdev->dev);
>   ofdev->id = -1;
> 
> - /* device_add will assume that this device is on the same node as
> -  * the parent. If there is no parent defined, set the node
> -  * explicitly */
> - if (!ofdev->dev.parent)
> - set_dev_node(&ofdev->dev, of_node_to_nid(ofdev->dev.of_node));
> + /*
> +  * If this device has not binding numa node in devicetree, that is
> +  * of_node_to_nid returns NUMA_NO_NODE. device_add will assume that this
> +  * device is on the same node as the parent.
> +  */
> + set_dev_node(&ofdev->dev, of_node_to_nid(ofdev->dev.of_node));
> 
>   return device_add(&ofdev->dev);
>  }
> --
> 2.5.0
> 
> 
> 
> .
> 
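The behaviour change in this patch can be modelled in a few lines of userspace C. struct model_device and both functions are hypothetical; MODEL_NUMA_NO_NODE mirrors the kernel's NUMA_NO_NODE (-1). The old rule always copies the parent's node, while the patched rule only does so when no node was set beforehand (for example from devicetree via of_node_to_nid()).

```c
#include <stddef.h>

/* Mirrors the kernel's NUMA_NO_NODE value. */
#define MODEL_NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct device: just a parent and a node id. */
struct model_device {
    struct model_device *parent;
    int numa_node;
};

/* Old rule: device_add() always overwrote the node with the parent's. */
static void model_device_add_old(struct model_device *dev)
{
    if (dev->parent)
        dev->numa_node = dev->parent->numa_node;
}

/* Patched rule: inherit the parent's node only if none was set already. */
static void model_device_add_new(struct model_device *dev)
{
    if (dev->parent && dev->numa_node == MODEL_NUMA_NO_NODE)
        dev->numa_node = dev->parent->numa_node;
}
```

With the old rule a device bound to node 2 in devicetree ends up on its parent's node; with the new rule it keeps node 2, and devices without a binding still inherit from the parent.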



Re: [PATCH 2/2] PCI: generic: add description of property "interrupt-skip-mask"

2016-02-25 Thread Leizhen (ThunderTown)


On 2016/2/25 20:20, Mark Rutland wrote:
> Hi,
> 
> In future, please send the binding document first in a series, per point
> 3 of Documentation/devicetree/bindings/submitting-patches.txt. It makes
> review easier/faster.
Thank you for the reminder.

> 
> On Thu, Feb 25, 2016 at 07:53:28PM +0800, Zhen Lei wrote:
>> Interrupt Pin register is read-only and optional. Some pci devices may use
>> msi/msix but leave the value of Interrupt Pin non-zero.
> 
> Is that permitted by the spec? Surely 'optional' means it must be zero
> if not implemented?

In :
Devices (or device functions) that do not use an interrupt pin must put a 0 in 
this register. This register is read-only.

So, do you think this is a hardware bug? These PCI devices are not made by our company, though.

init_service_irqs() tries MSI-X first, then MSI; the Interrupt Pin is the last resort. But of_irq_parse_pci() runs before this.


In fact, a similar problem also exists, as shown below:
pci :42:00.0: BAR 7: no space for [io  size 0x1000]
pci :42:00.0: BAR 7: failed to assign [io  size 0x1000]

There is no "io space" on arm64; maybe it only exists on x86. And the Memory Space Indicator in the BAR register is also read-only.

> 
>> In this case, the driver will print information as below: pci
>> :40:00.0: of_irq_parse_pci() failed with rc=-22
>>
>> It's easily lead to misinterpret.
> 
> If this is limited to a subset of devices which we know are broken in
> this regard, can we not handle these cases explicitly?
Actually, there is another way to suppress this warning: use "interrupt-map" to map it to a pseudo IRQ. But I think that would also be misleading.

> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  Documentation/devicetree/bindings/pci/host-generic-pci.txt | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/pci/host-generic-pci.txt 
>> b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
>> index 3f1d3fc..0f10978 100644
>> --- a/Documentation/devicetree/bindings/pci/host-generic-pci.txt
>> +++ b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
>> @@ -70,6 +70,8 @@ Practice: Interrupt Mapping' and requires the following 
>> properties:
>>
>>  - interrupt-map-mask : 
>>
>> +- interrupt-skip-mask: Explicitly declare which pci devices only use 
>> msi/msix
>> +but leave the value of Interrupt Pin non-zero.
> 
> Unlike the rest of the interrupt mapping properties, this is not
> described in  `Open Firmware Recommended Practice: Interrupt Mapping'.
> 
> This needs a far more complete description.
> 
> This also doesn't strike me as th right approach. The interrupt-map-mask
> property describe as relationship between the host-controller-provided
> interrupt lines and endpoints, while this seems to be a bug completely
> contained within an endpoint.

In :
// PCI_DEVICE(3)  INT#(1)  CONTROLLER(PHANDLE)  CONTROLLER_DATA(3)
interrupt-map = <  0x0 0x0 0x0  0x1  &gic  0x0 0x4 0x1

PCI_DEVICE contains 3 cells, but only the first one is used in of_irq_parse_pci():
laddr[0] = cpu_to_be32((pdev->bus->number << 16) | (pdev->devfn << 8));
laddr[1] = laddr[2] = cpu_to_be32(0);

As for INT#, I don't think a PCI device would have some pins used and others unused, so I can omit it.

So only the mask for laddr[0] needs to be described.
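For reference, the laddr[0] encoding above and one possible masking rule can be sketched as follows. pci_laddr0() mirrors the expression in of_irq_parse_pci() in host byte order (without cpu_to_be32()), MODEL_PCI_DEVFN() mirrors PCI_DEVFN(), and skip_legacy_irq() is only an assumption about how an "interrupt-skip-mask" entry might be matched; the proposed binding does not define this yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the kernel's PCI_DEVFN() macro. */
#define MODEL_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Host-order laddr[0]: bus in bits 23..16, devfn in bits 15..8. */
static uint32_t pci_laddr0(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 16) | ((uint32_t)devfn << 8);
}

/* Hypothetical rule: skip the legacy IRQ lookup when the masked address
 * of the device equals the masked entry from the property. */
static bool skip_legacy_irq(uint32_t laddr0, uint32_t entry, uint32_t mask)
{
    return (laddr0 & mask) == (entry & mask);
}
```

For example, device 0000:42:00.0 gives laddr[0] = 0x00420000, so a bus-only mask of 0x00ff0000 would match every function on bus 0x42 and nothing on bus 0x40.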
> 
> Thanks,
> Mark.
> 
>>
>>  Example:
>>
>> --
>> 2.5.0
>>
>>
>>
> 
> .
> 



Re: Suspicious error for CMA stress test

2016-03-08 Thread Leizhen (ThunderTown)


On 2016/3/8 9:54, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/3/8 2:42, Laura Abbott wrote:
>> On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2016/3/7 12:34, Joonsoo Kim wrote:
>>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>>>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott :
>>>>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Before the test, I got:
>>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>>>>> CmaFree:  195044 kB
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> After running the test:
>>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>>>>> CmaFree: 6602584 kB
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also the the MemFree is more than mem total:
>>>>>>>>>>>>>
>>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>>>>> MemTotal:   16342016 kB
>>>>>>>>>>>>> MemFree:22367268 kB
>>>>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>>>>> [...]
>>>>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>>>>> Joonsoo?
>>>>>>>>>>> I don't know. I also played with this and looks like there is
>>>>>>>>>>> accounting problem, however, for my case, number of free page is 
>>>>>>>>>>> slightly less
>>>>>>>>>>> than total. I will take a look.
>>>>>>>>>>>
>>>>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it 
>>>>>>>>>>> doesn't
>>>>>>>>>>> look like your case.
>>>>>>>>>> I tested with malloc_size with 2M, and it grows much bigger than 1M, 
>>>>>>>>>> also I
>>>>>>>>>> did some other test:
>>>>>>>>> Thanks! Now, I can re-generate erronous situation you mentioned.
>>>>>>>>>
>>>>>>>>>>   - run with single thread with 10 times, everything is fine.
>>>>>>>>>>
>>>>>>>>>>   - I hack the cam_alloc() and free as below [1] to see if it's lock 
>>>>>>>>>> issue, with
>>>>>>>>>> 
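For anyone reproducing this CMA stress test, the sanity check mentioned in the thread (CmaFree must never exceed CmaTotal, and likewise MemFree must never exceed MemTotal) can also be run from user space between iterations. A minimal sketch, assuming only the "Key: value kB" line format of /proc/meminfo; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up a "Key:   <value> kB" field in a /proc/meminfo-formatted buffer.
 * Returns true and stores the value (in kB) if the key is found.
 */
static bool meminfo_field_kb(const char *text, const char *key, unsigned long *kb)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the key only at the start of a line and only if ':' follows. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return sscanf(line + klen + 1, "%lu", kb) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}

/*
 * Free counters can never exceed their totals. Returns false if either
 * CmaFree > CmaTotal or MemFree > MemTotal; missing fields are skipped.
 */
static bool meminfo_counters_sane(const char *text)
{
	unsigned long cma_total, cma_free, mem_total, mem_free;

	if (meminfo_field_kb(text, "CmaTotal", &cma_total) &&
	    meminfo_field_kb(text, "CmaFree", &cma_free) &&
	    cma_free > cma_total)
		return false;
	if (meminfo_field_kb(text, "MemTotal", &mem_total) &&
	    meminfo_field_kb(text, "MemFree", &mem_free) &&
	    mem_free > mem_total)
		return false;
	return true;
}
```

To use it, read /proc/meminfo into a NUL-terminated buffer after each test round and stop as soon as meminfo_counters_sane() returns false, so the failing round can be isolated.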

Re: [PATCH 1/1] arm64/dma-mapping: remove an unnecessary conversion

2016-03-15 Thread Leizhen (ThunderTown)


On 2016/3/15 23:37, Catalin Marinas wrote:
> On Tue, Mar 15, 2016 at 10:12:11AM +0800, Zhen Lei wrote:
>> 1. In swiotlb_alloc_coherent, the branch of __get_free_pages. Directly
>>return vaddr on success, and pass vaddr to free_pages on failure.
>> 2. So, we can directly transparent pass vaddr from __dma_free to
>>swiotlb_free_coherent, keep consistent with swiotlb_alloc_coherent.
>>
>> This patch have no functional change,
> 
> I don't think so.
> 
>> but can obtain a bit performance improvement.
> 
> Have you actually measured it?
I have not run any performance tests, but the patch removes a line of code, so I
said "a bit".

> 
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index a6e757c..b2f2834 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -187,8 +187,6 @@ static void __dma_free(struct device *dev, size_t size,
>> void *vaddr, dma_addr_t dma_handle,
>> struct dma_attrs *attrs)
>>  {
>> -void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
>> -
>>  size = PAGE_ALIGN(size);
>>
>>  if (!is_device_dma_coherent(dev)) {
>> @@ -196,7 +194,7 @@ static void __dma_free(struct device *dev, size_t size,
>>  return;
>>  vunmap(vaddr);
>>  }
>> -__dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
>> +__dma_free_coherent(dev, size, vaddr, dma_handle, attrs);
>>  }
> 
> What happens when !is_device_dma_coherent(dev)? (hint: read two lines
> above __dma_free_coherent).
> 
The whole __dma_free function is shown below (nothing uses swiotlb_addr except
__dma_free_coherent):

static void __dma_free(struct device *dev, size_t size,
		       void *vaddr, dma_addr_t dma_handle,
		       struct dma_attrs *attrs)
{
	void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));

	size = PAGE_ALIGN(size);

	if (!is_device_dma_coherent(dev)) {
		if (__free_from_pool(vaddr, size))
			return;
		vunmap(vaddr);
	}
	__dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
}




Re: [PATCH 1/1] arm64/dma-mapping: remove an unnecessary conversion

2016-03-19 Thread Leizhen (ThunderTown)


On 2016/3/16 9:56, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/3/15 23:37, Catalin Marinas wrote:
>> On Tue, Mar 15, 2016 at 10:12:11AM +0800, Zhen Lei wrote:
>>> 1. In swiotlb_alloc_coherent, the branch of __get_free_pages. Directly
>>>return vaddr on success, and pass vaddr to free_pages on failure.
>>> 2. So, we can directly transparent pass vaddr from __dma_free to
>>>swiotlb_free_coherent, keep consistent with swiotlb_alloc_coherent.
>>>
>>> This patch have no functional change,
>>
>> I don't think so.
>>
>>> but can obtain a bit performance improvement.
>>
>> Have you actually measured it?
> I have not run any performance testing, but reduced a line of code. So I said 
> "a bit".
> 
>>
>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>>> index a6e757c..b2f2834 100644
>>> --- a/arch/arm64/mm/dma-mapping.c
>>> +++ b/arch/arm64/mm/dma-mapping.c
>>> @@ -187,8 +187,6 @@ static void __dma_free(struct device *dev, size_t size,
>>>void *vaddr, dma_addr_t dma_handle,
>>>struct dma_attrs *attrs)
>>>  {
>>> -   void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
>>> -
>>> size = PAGE_ALIGN(size);
>>>
>>> if (!is_device_dma_coherent(dev)) {
>>> @@ -196,7 +194,7 @@ static void __dma_free(struct device *dev, size_t size,
>>> return;
>>> vunmap(vaddr);
>>> }
>>> -   __dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
>>> +   __dma_free_coherent(dev, size, vaddr, dma_handle, attrs);
>>>  }
>>
>> What happens when !is_device_dma_coherent(dev)? (hint: read two lines
>> above __dma_free_coherent).
Are you afraid that "vaddr" may be modified by these statements?
First, it cannot be __free_from_pool(); otherwise, the vunmap() call after it
would not work. Second, it cannot be vunmap() either, because its parameter is
declared as "const void *".

In the call chain
__dma_free-->__dma_free_coherent-->swiotlb_free_coherent, only
swiotlb_free_coherent finally uses "vaddr".

>>
> The whole function of __dma_free as below: (nobody use swiotlb_addr except 
> __dma_free_coherent)
> static void __dma_free(struct device *dev, size_t size,
> 		       void *vaddr, dma_addr_t dma_handle,
> 		       struct dma_attrs *attrs)
> {
> 	void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
> 
> 	size = PAGE_ALIGN(size);
> 
> 	if (!is_device_dma_coherent(dev)) {
> 		if (__free_from_pool(vaddr, size))
> 			return;
> 		vunmap(vaddr);
> 	}
> 	__dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
> }
> 



Re: [PATCH 1/1] arm64/dma-mapping: remove an unnecessary conversion

2016-03-19 Thread Leizhen (ThunderTown)


On 2016/3/17 19:59, Catalin Marinas wrote:
> On Thu, Mar 17, 2016 at 07:06:27PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/3/16 9:56, Leizhen (ThunderTown) wrote:
>>> On 2016/3/15 23:37, Catalin Marinas wrote:
>>>> On Tue, Mar 15, 2016 at 10:12:11AM +0800, Zhen Lei wrote:
>>>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>>>>> index a6e757c..b2f2834 100644
>>>>> --- a/arch/arm64/mm/dma-mapping.c
>>>>> +++ b/arch/arm64/mm/dma-mapping.c
>>>>> @@ -187,8 +187,6 @@ static void __dma_free(struct device *dev, size_t 
>>>>> size,
>>>>>  void *vaddr, dma_addr_t dma_handle,
>>>>>  struct dma_attrs *attrs)
>>>>>  {
>>>>> - void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
>>>>> -
>>>>>   size = PAGE_ALIGN(size);
>>>>>
>>>>>   if (!is_device_dma_coherent(dev)) {
>>>>> @@ -196,7 +194,7 @@ static void __dma_free(struct device *dev, size_t 
>>>>> size,
>>>>>   return;
>>>>>   vunmap(vaddr);
>>>>>   }
>>>>> - __dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs);
>>>>> + __dma_free_coherent(dev, size, vaddr, dma_handle, attrs);
>>>>>  }
>>>>
>>>> What happens when !is_device_dma_coherent(dev)? (hint: read two lines
>>>> above __dma_free_coherent).
>>
>> Do you afraid "vaddr" maybe modified by these statement?
>> First, it could not be __free_from_pool. Otherwise, the function
>> vunmap(which after it) can not work well. Then, it count not be vunmap
>> too, the parameter is defined as "const void *".
>>
>> In the call chain:
>> __dma_free_coherent-->__dma_free_coherent-->swiotlb_free_coherent,
>> only swiotlb_free_coherent finally use "vaddr".
> 
> Exactly. So you give swiotlb_free_coherent a vaddr which has been
> unmapped. It doesn't even matter whether it's still mapped since this
> address is passed further to free_pages() which performs a
> virt_to_page(). The latter is *only* valid on linear map addresses (and
> you would actually hit the VM_BUG_ON in free_pages; you can try running
> this with CONFIG_DEBUG_VM enabled and non-coherent DMA).
> 
> For non-coherent DMA, the vaddr is not part of the linear mapping as it
> has been remapped by __dma_alloc() via dma_common_contiguous_remap(),
> hence for swiotlb freeing we need the actual linear map address (the
> original "ptr" in __dma_alloc()). We can generate it by a
> phys_to_virt(dma_to_phys(dma_handle)).
> 

OK, I got it.

So I should move the statement into the "if (!is_device_dma_coherent(dev))"
branch instead. I will prepare v2.
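To make Catalin's point concrete: for a non-coherent device, vaddr is a remapped alias outside the linear map, so the swiotlb free path needs the linear-map address recomputed from dma_handle, while for a coherent device vaddr already is that address. That is what moving the phys_to_virt(dma_to_phys(...)) conversion into the non-coherent branch achieves. The user-space toy model below only sketches this address choice; the layout constants and toy_* helpers are hypothetical, not the arm64 implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy address-space model. All constants and helpers are hypothetical and
 * only illustrate the idea; they are not the real arm64 layout or API.
 */
#define TOY_LINEAR_BASE 0x1000000000ULL /* virtual start of the "linear map" */
#define TOY_LINEAR_SIZE 0x0100000000ULL /* size of the "linear map" */
#define TOY_DMA_OFFSET  0x0000100000ULL /* bus address = phys + offset */

static uint64_t toy_dma_to_phys(uint64_t dma_handle) { return dma_handle - TOY_DMA_OFFSET; }
static uint64_t toy_phys_to_virt(uint64_t phys) { return phys + TOY_LINEAR_BASE; }

/* Only linear-map addresses may be handed to a virt_to_page()-style helper. */
static bool toy_is_linear(uint64_t vaddr)
{
	return vaddr >= TOY_LINEAR_BASE && vaddr < TOY_LINEAR_BASE + TOY_LINEAR_SIZE;
}

/*
 * Pick the address to pass on to the swiotlb free path.
 * Coherent: vaddr came straight from the allocator and is already a
 * linear-map address. Non-coherent: vaddr was a remapped alias, so
 * recompute the linear-map address from the DMA handle.
 */
static uint64_t toy_swiotlb_free_addr(bool coherent, uint64_t vaddr, uint64_t dma_handle)
{
	if (!coherent)
		return toy_phys_to_virt(toy_dma_to_phys(dma_handle));
	return vaddr;
}
```

In this model, passing the remapped alias straight through (as v1 did) would hand a non-linear address to the free path, which is exactly what the VM_BUG_ON in free_pages() would catch with CONFIG_DEBUG_VM.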



Re: [PATCH 1/1] rtc: fix type information of rtc-proc

2015-11-10 Thread Leizhen (ThunderTown)
Hi, all

I'm sorry, maybe I didn't describe it clearly enough before. These words are
ultimately shown to the end user, who may not be a programmer, so abbreviated
words are unsuitable.


cat /proc/driver/rtc

rtc_time        : 00:47:43
rtc_date        : 2015-11-11
alrm_time       : 03:27:58      //alrm_time --> alarm_time
alrm_date       : 2015-10-08    //alrm_date --> alarm_date
alarm_IRQ       : no
alrm_pending    : no            //alrm_pending --> alarm_pending
update IRQ enabled      : no


On 2015/10/8 17:47, Zhen Lei wrote:
> Display the whole word of "alarm", make it look more comfortable.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/rtc/rtc-proc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-proc.c b/drivers/rtc/rtc-proc.c
> index ffa69e1..ef83f34 100644
> --- a/drivers/rtc/rtc-proc.c
> +++ b/drivers/rtc/rtc-proc.c
> @@ -58,7 +58,7 @@ static int rtc_proc_show(struct seq_file *seq, void *offset)
> 
>   err = rtc_read_alarm(rtc, &alrm);
>   if (err == 0) {
> - seq_printf(seq, "alrm_time\t: ");
> + seq_printf(seq, "alarm_time\t: ");
>   if ((unsigned int)alrm.time.tm_hour <= 24)
>   seq_printf(seq, "%02d:", alrm.time.tm_hour);
>   else
> @@ -72,7 +72,7 @@ static int rtc_proc_show(struct seq_file *seq, void *offset)
>   else
>   seq_printf(seq, "**\n");
> 
> - seq_printf(seq, "alrm_date\t: ");
> + seq_printf(seq, "alarm_date\t: ");
>   if ((unsigned int)alrm.time.tm_year <= 200)
>   seq_printf(seq, "%04d-", alrm.time.tm_year + 1900);
>   else
> @@ -87,7 +87,7 @@ static int rtc_proc_show(struct seq_file *seq, void *offset)
>   seq_printf(seq, "**\n");
>   seq_printf(seq, "alarm_IRQ\t: %s\n",
>   alrm.enabled ? "yes" : "no");
> - seq_printf(seq, "alrm_pending\t: %s\n",
> + seq_printf(seq, "alarm_pending\t: %s\n",
>   alrm.pending ? "yes" : "no");
>   seq_printf(seq, "update IRQ enabled\t: %s\n",
>   (rtc->uie_rtctimer.enabled) ? "yes" : "no");
> --
> 2.5.0
> 
> 
> 
> .
> 
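The renamed lines keep the existing formatting logic: each alarm field is printed as a two-digit number when valid and as "**" otherwise. Below is a user-space sketch of that logic, with snprintf() standing in for seq_printf() and producing the new "alarm_time" label. The helper names are hypothetical, and the minute and second bounds are assumptions, since the hunks above only show the hour check:

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one field to buf: "%02d" if the value is in range, "**" otherwise.
 * Casting to unsigned makes a negative "unset" value (e.g. -1) compare as
 * huge, so one comparison rejects both negative and too-large values, as
 * the (unsigned int) casts in rtc_proc_show() do.
 */
static void append_field(char *buf, size_t size, int value, unsigned int max,
			 const char *sep)
{
	size_t len = strlen(buf);

	if ((unsigned int)value <= max)
		snprintf(buf + len, size - len, "%02d%s", value, sep);
	else
		snprintf(buf + len, size - len, "**%s", sep);
}

/* Build the "alarm_time" line the way the patched /proc/driver/rtc prints it. */
static void format_alarm_time(char *buf, size_t size, int hour, int min, int sec)
{
	snprintf(buf, size, "alarm_time\t: ");
	append_field(buf, size, hour, 24, ":");
	append_field(buf, size, min, 59, ":");
	append_field(buf, size, sec, 59, "\n");
}
```

For example, an alarm of 03:27:58 yields "alarm_time\t: 03:27:58\n", and an alarm whose fields are all -1 yields "alarm_time\t: **:**:**\n".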

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/2] iommu/iova: enhance the rcache optimization

2019-08-23 Thread Leizhen (ThunderTown)
Hi all,
  Can anyone help review it?


On 2019/8/15 20:11, Zhen Lei wrote:
> v1 --> v2
> 1. I did not change the patches, but added this cover letter.
> 2. Added a batch of reviewers based on
>    9257b4a206fc ("iommu/iova: introduce per-cpu caching to iova allocation")
> 3. I described the problem I met in patch 2, but I hope the brief description
>    below can help people understand it quickly.
>    Suppose there are six rcache sizes, and each size can hold at most 10000 IOVAs.
>
>    |  4K   |  8K  | 16K  |  32K | 64K  | 128K |
>
>    | 10000 | 9000 | 8500 | 8600 | 9200 | 7000 |
>
>    As the above map shows, the whole rcache has buffered too many IOVAs. Now the
>    worst case can arrive: suppose we need 20000 4K IOVAs at one time. That means
>    10000 IOVAs can be allocated from the rcache, but the other 10000 IOVAs must
>    be allocated from the RB tree via alloc_iova(). But the RB tree currently
>    has at least (9000 + 8500 + 8600 + 9200 + 7000) = 42300 nodes, so the average
>    RB tree traversal will be very slow. In my test scenario, 4K IOVAs are used
>    frequently, but the other sizes are not. Similarly, when the 20000 4K IOVAs
>    are freed in succession, the first 10000 IOVAs can be quickly buffered, but
>    the other 10000 IOVAs cannot.
> 
> Zhen Lei (2):
>   iommu/iova: introduce iova_magazine_compact_pfns()
>   iommu/iova: enhance the rcache optimization
> 
>  drivers/iommu/iova.c | 100 +++
>  include/linux/iova.h |   1 +
>  2 files changed, 95 insertions(+), 6 deletions(-)
> 
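The scenario in the cover letter can be replayed with a toy model of a single size class: a bounded cache in front of a slow path. This is a deliberate simplification with hypothetical names (no per-CPU magazines or depot), assuming the 4K class is full at 10000 IOVAs when a burst of 20000 4K requests arrives:

```c
#include <stddef.h>

/*
 * Toy model of one IOVA rcache size class: a bounded cache in front of a
 * slow path that stands in for the RB tree. Hypothetical simplification;
 * the real code uses per-CPU magazines plus a depot.
 */
struct toy_rcache {
	size_t capacity;    /* max IOVAs this size class can buffer */
	size_t cached;      /* IOVAs currently buffered */
	size_t slow_allocs; /* allocations that had to search the RB tree */
	size_t slow_frees;  /* frees that had to go back to the RB tree */
};

/* Serve n allocations, from the cache first, then from the slow path. */
static void toy_alloc(struct toy_rcache *rc, size_t n)
{
	size_t hit = n < rc->cached ? n : rc->cached;

	rc->cached -= hit;
	rc->slow_allocs += n - hit;
}

/* Return n IOVAs, buffering as many as fit and sending the rest to the tree. */
static void toy_free(struct toy_rcache *rc, size_t n)
{
	size_t room = rc->capacity - rc->cached;
	size_t kept = n < room ? n : room;

	rc->cached += kept;
	rc->slow_frees += n - kept;
}

/* IOVAs parked in the other size classes still occupy RB tree nodes. */
static size_t toy_sum(const size_t *counts, size_t n)
{
	size_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += counts[i];
	return sum;
}
```

Allocating 20000 from a full 10000-entry class sends 10000 requests to the tree, freeing 20000 afterwards sends another 10000 back, and each of those slow-path operations walks a tree still holding the 42300 IOVAs parked in the other classes.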



Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-03-06 Thread Leizhen (ThunderTown)



On 2019/3/4 23:52, Robin Murphy wrote:
> On 02/03/2019 06:12, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2019/3/1 19:07, Jean-Philippe Brucker wrote:
>>> Hi Leizhen,
>>>
>>> On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2019/2/26 20:36, Hanjun Guo wrote:
>>>>> Hi Jean,
>>>>>
>>>>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>>>>> Currently, many peripherals are faster than before. For example, the top
>>>>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>>>>> in strict mode, because of frequently map and unmap operations. In order
>>>>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>>>>> default.
>>>>>>
>>>>>> Most users won't be aware of this relaxation and will have their system
>>>>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>>>>> Invalidation in
>>>>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
>>>> Hi Jean,
>>>>
>>>> In fact, we discussed the vulnerability of deferred invalidation before
>>>> upstreaming the non-strict patches. Attacks may be possible because of an
>>>> untrusted device or a mistake in the device driver, and we limited VFIO to
>>>> still use strict mode.
>>>> As mentioned in the PDF, restricting memory freed under deferred
>>>> invalidation so that it can only be reused by the same device can mitigate
>>>> the vulnerability, but that is too hard to implement now.
>>>> A compromise may be to apply non-strict mode only to: (1) dma_free_coherent,
>>>> because that memory is controlled by the DMA common module, so we can free
>>>> it after the global invalidation in the timer handler; (2) some new APIs
>>>> related to iommu_unmap_page/sg that defer invalidation, which candidate
>>>> device drivers can switch to if they want to improve performance. (3) Make
>>>> sure that only trusted devices and trusted drivers can use (1) and (2); for
>>>> example, the driver must be built into the kernel Image.
>>>
>>> Do we have a notion of untrusted kernel drivers? A userspace driver
>> It seems impossible to have such a driver. Modules loaded by the root user
>> with insmod are the root user's own responsibility.
>>
>>> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
>>> address space would have much easier ways to corrupt the system than to
>>> exploit lazy mode...
>> Yes, so we have no need to consider untrusted drivers.
>>
>>>
>>> For (3), I agree that we should at least disallow lazy mode if
>>> pci_dev->untrusted is set. At the moment it means that we require the
>>> strictest IOMMU configuration for external-facing PCI ports, but it can
>>> be extended to blacklist other vulnerable devices or locations.
>> I plan to add an attribute file for each device, especially for hotplug
>> devices, and let the root user decide which mode should be used, strict or
>> non-strict, because they should know whether the hotplug device is trusted
>> or not.
> 
> Aside from the problem that without massive implementation changes strict/non-strict is at
> best a per-domain property, not a per-device one, I can't see this being particularly practical
> - surely the whole point of a malicious endpoint is that it's going to pretend to be some common
> device for which a 'trusted' kernel driver already exists? 
Yes, it should be assumed that all kernel drivers and all hard-wired devices are
trusted. There is no reason to suspect that open-source drivers, or drivers and
devices provided by legitimate suppliers, are malicious.
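To illustrate the trade-off under discussion, here is a toy model (hypothetical names and queue depth, not the kernel's flush-queue code) of strict versus deferred invalidation. In non-strict mode, unmapped IOVAs are batched and released for reuse only after one global invalidation, such as the one issued from the timer handler; the gap between unmap and that flush is the window the deferred-invalidation attacks exploit. The mode is modelled per domain, matching the point that it is at best a per-domain property:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_FQ_SIZE 256 /* hypothetical flush-queue depth */

/* Toy model of strict vs. non-strict (deferred) IOTLB invalidation. */
struct toy_domain {
	bool non_strict;                    /* per-domain, not per-device */
	unsigned long pending[TOY_FQ_SIZE]; /* unmapped IOVAs awaiting invalidation */
	size_t npending;
	unsigned long invalidations;        /* IOTLB invalidations issued */
	unsigned long reusable;             /* IOVAs released for reuse */
};

/* One global invalidation covers every pending entry; only then are they reusable. */
static void toy_flush(struct toy_domain *d)
{
	if (d->npending == 0)
		return;
	d->invalidations++;
	d->reusable += d->npending;
	d->npending = 0;
}

static void toy_unmap(struct toy_domain *d, unsigned long iova)
{
	if (!d->non_strict) {
		/* Strict: invalidate before the IOVA can be reused. */
		d->invalidations++;
		d->reusable++;
		return;
	}
	/* Non-strict: queue it; a stale IOTLB entry may remain until the flush. */
	if (d->npending == TOY_FQ_SIZE)
		toy_flush(d);
	d->pending[d->npending++] = iova;
}
```

With 1000 unmaps, the strict domain issues 1000 invalidations, while the non-strict one issues 3 while unmapping and a 4th when the timer flush drains the remaining 232 entries.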


> If you've chosen to trust *any* external device, I think you may as well have
> just set non-strict globally anyway.
> The effort involved in trying to implement super-fine-grained control seems hard t

[Question] Are the trace APIs declared by "TRACE_EVENT(irq_handler_entry" allowed to be used in Ko?

2018-09-11 Thread Leizhen (ThunderTown)
After commit 7e066fb870fc ("tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()"),
the trace APIs declared by "TRACE_EVENT(irq_handler_entry" cannot be used
directly by a ko (loadable module), because they are not explicitly exported by
EXPORT_TRACEPOINT_SYMBOL_GPL or EXPORT_TRACEPOINT_SYMBOL.

Did we miss it, or is it not recommended for use in a ko?


-

commit 7e066fb870fcd1025ec3ba7bbde5d541094f4ce1
Author: Mathieu Desnoyers 
Date:   Fri Nov 14 17:47:47 2008 -0500

tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()

Impact: API *CHANGE*. Must update all tracepoint users.

Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
structure in a single spot for all the kernel. It helps reducing memory
consumption, especially when declaring a lot of tracepoints, e.g. for
kmalloc tracing.

*API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
to do it. The name previously used was misleading.

Updates scheduler instrumentation to follow this API change.


-- 
Thanks!
Best Regards



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-07-07 Thread Leizhen (ThunderTown)


On 2016/7/7 23:37, Catalin Marinas wrote:
> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:
>> At present, PG_dcache_clean is only cleared when the related huge page
>> is about to be freed. But sometimes, there maybe a process is in charge
>> to copy binary codes into a shared memory, and notifies other processes
>> to execute base on that. For the first time, there is no problem, because
>> the default value of page->flags is PG_dcache_clean cleared. So the cache
>> will be maintained at the time of set_pte_at for other processes. But if
>> the content of the shared memory have been updated again, there is no
>> cache operations, because the PG_dcache_clean is still set.
>>
>> For example:
>> Process A
>>  open a hugetlbfs file
>>  mmap it as a shared memory
>>  copy some binary codes into it
>>  munmap
>>
>> Process B
>>  open the hugetlbfs file
>>  mmap it as a shared memory, executable
>>  invoke the functions in the shared memory
>>  munmap
>>
>> repeat the above steps.
> 
> Does this work as you would expect with small pages (and for example
> shared file mmap)? I don't want to have a different behaviour between
> small and huge pages.

Small pages also have this problem; I will try to fix it too.

> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-07-08 Thread Leizhen (ThunderTown)


On 2016/7/8 21:54, Catalin Marinas wrote:
> On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/7/7 23:37, Catalin Marinas wrote:
>>> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:
>>>> At present, PG_dcache_clean is only cleared when the related huge page
>>>> is about to be freed. But sometimes, there maybe a process is in charge
>>>> to copy binary codes into a shared memory, and notifies other processes
>>>> to execute base on that. For the first time, there is no problem, because
>>>> the default value of page->flags is PG_dcache_clean cleared. So the cache
>>>> will be maintained at the time of set_pte_at for other processes. But if
>>>> the content of the shared memory have been updated again, there is no
>>>> cache operations, because the PG_dcache_clean is still set.
>>>>
>>>> For example:
>>>> Process A
>>>>open a hugetlbfs file
>>>>mmap it as a shared memory
>>>>copy some binary codes into it
>>>>munmap
>>>>
>>>> Process B
>>>>open the hugetlbfs file
>>>>mmap it as a shared memory, executable
>>>>invoke the functions in the shared memory
>>>>munmap
>>>>
>>>> repeat the above steps.
>>>
>>> Does this work as you would expect with small pages (and for example
>>> shared file mmap)? I don't want to have a different behaviour between
>>> small and huge pages.
>>
>> The small pages also have this problem, I will try to fix it too.
> 
> Have you run the above tests on a standard file (with small pages)? It's
> strange that we haven't hit this so far with gcc or something else
> generating code (unless they don't use mmap but just sequential writes).
The test code should be randomly generated, to make sure the content of the
I-cache is always stale. I have attached a simplified test case.

The main part is excerpted below:
	srand(time(NULL));
	ptr = (unsigned int *)share_mem;
	*ptr++ = 0xd2800000;			//mov x0, #0
	for (i = 0, total = 0; i < 100; i++) {
		value = 0xfff & rand();
		total += value;
		*ptr++ = 0xb1000000 | (value << 10);	//adds x0, x0, #value
	}
	*ptr = 0xd65f03c0;			//ret
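The constants in the excerpt are raw A64 encodings (movz x0, #0; adds x0, x0, #imm12; ret). A small helper can cross-check them and compute the value the generated function should return, which should equal total. This is a sketch with hypothetical names that understands only the three instruction forms the test emits:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define A64_MOV_X0_0   0xd2800000u /* movz x0, #0 */
#define A64_ADDS_X0_X0 0xb1000000u /* adds x0, x0, #imm12 (no shift) */
#define A64_RET        0xd65f03c0u /* ret */

/* Encode "adds x0, x0, #imm" for a 12-bit unsigned immediate (bits 21:10). */
static uint32_t encode_adds_x0(uint32_t imm12)
{
	return A64_ADDS_X0_X0 | ((imm12 & 0xfff) << 10);
}

/*
 * Interpret only the three instruction forms the test generates, to compute
 * the expected return value without executing the code.
 * Returns false on an unexpected encoding or if no ret is found.
 */
static bool expected_result(const uint32_t *code, size_t max_words, uint64_t *x0)
{
	for (size_t i = 0; i < max_words; i++) {
		uint32_t insn = code[i];

		if (insn == A64_MOV_X0_0)
			*x0 = 0;
		else if (insn == A64_RET)
			return true;
		/* opcode/shift bits 31:22 and Rn = Rd = x0 in bits 9:0 */
		else if ((insn & 0xffc003ffu) == A64_ADDS_X0_X0)
			*x0 += (insn >> 10) & 0xfff;
		else
			return false;
	}
	return false;
}
```

Running expected_result() over the shared buffer before invoking it makes a mismatch against the actual return value point clearly at stale I-cache contents rather than at a code-generation bug.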

> 
> If both cases need solving, we might better move the fix in the
> __sync_icache_dcache() function. Untested:
Yes.

At first I also wanted to fix it as below, but I'm not sure when PageDirty
will be cleared, and if two or more processes mmap it as executable, the cache
operations will be duplicated. So far I have not found any good place to clear
PG_dcache_clean, so the modification below may be the best choice: concise and
clear.

> 
> 8<
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index dbd12ea8ce68..c753fa804165 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
>   if (!page_mapping(page))
>   return;
>  
> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> + if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
> + PageDirty(page))
>   sync_icache_aliases(page_address(page),
>   PAGE_SIZE << compound_order(page));
>   else if (icache_is_aivivt())
> 8<-
> 
> BTW, can you make your tests (source) available somewhere?
Both cases worked well with this patch.
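The condition in Catalin's diff can be modelled in a few lines. After the first executable mapping, PG_dcache_clean stays set, so a later rewrite of the page is only noticed through PageDirty. This is a toy sketch with hypothetical names; when PageDirty gets cleared (the open question above) is not modelled:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PG_dcache_clean and PageDirty states. */
struct toy_page {
	bool dcache_clean; /* caches known to be coherent for this page */
	bool dirty;        /* contents modified since last writeback */
};

/*
 * Returns true if the I-cache/D-cache sync would be performed when the page
 * is mapped. check_dirty selects the proposed "|| PageDirty(page)" condition.
 */
static bool toy_sync_icache(struct toy_page *p, bool check_dirty)
{
	bool was_clean = p->dcache_clean;

	p->dcache_clean = true; /* like test_and_set_bit(PG_dcache_clean, ...) */
	return !was_clean || (check_dirty && p->dirty);
}
```

Sequence: the first mapping syncs because the flag starts clear; after process A rewrites the code (dirty set), the old condition skips the sync and leaves the I-cache stale, while the proposed condition performs it.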

> 
> Thanks.
> 
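#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>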

#define FILENAME	"/mnt/huge/test_file"
#define TST_MMAP_SIZE	0x200000	/* one 2MB huge page */

typedef unsigned int (*TEST_FUNC_T)(void);

/*
 * mkdir -p /mnt/huge
 * echo 20 > /proc/sys/vm/nr_hugepages
 * mount none /mnt/huge -t hugetlbfs -o pagesize=2048K
 */
int main(void)
{
	int i;
	int fd;
	int ret;
	void *share_mem;
	size_t size;
	struct stat sb;
	TEST_FUNC_T func_ptr;
	unsigned int value, total;
	unsigned int *ptr;

	/* O_CREAT requires a mode argument. */
	fd = open(FILENAME, O_RDWR | O_CREAT, 0644);
	if (fd == -1) {
		printf("Open file %s failed P1: %s\n", FILENAME,
		       strerror(errno));
		return 1;
	}

	lseek(fd, TST_MMAP_SIZE - 1, SEEK_SET);
	write(fd, "", 1);

	share_mem = mmap(NULL, TST_MMAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);

Re: [PATCH v2 5/5] arm64/numa: avoid inconsistent information to be printed

2016-05-31 Thread Leizhen (ThunderTown)


On 2016/5/31 19:27, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/5/31 17:07, Matthias Brugger wrote:
>>
>>
>> On 28/05/16 11:22, Zhen Lei wrote:
>>> numa_init(of_numa_init) may returned error because of numa configuration
>>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>>> configuration error information should be immediately printed by the
>>> testing branch.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>
>> Which kernel version is this patch based on?
> 
> Base on 
> mainline(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git), I 
> git pulled about 3-5 days ago, the last commit-id is dc03c0f.
> 
> And thess patches base on https://lkml.org/lkml/2016/5/24/679 series(acpi 
> numa) as David Daney's requirement.
> 
>>
>> Regards,
>> Matthias
>>
>>>   arch/arm64/mm/numa.c | 6 +++---
>>>   drivers/of/of_numa.c | 7 +++
>>>   2 files changed, 6 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 2601660..1b9622c 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -338,8 +338,10 @@ static int __init numa_init(int (*init_func)(void))
>>>   if (ret < 0)
>>>   return ret;
>>>
>>> -if (nodes_empty(numa_nodes_parsed))
>>> +if (nodes_empty(numa_nodes_parsed)) {
>>> +pr_info("No NUMA configuration found\n");
>>>   return -EINVAL;
>>> +}
>>>
>>>   ret = numa_register_nodes();
>>>   if (ret < 0)
>>> @@ -370,8 +372,6 @@ static int __init dummy_numa_init(void)
>>>
>>>   if (numa_off)
>>>   pr_info("NUMA disabled\n"); /* Forced off on command line. */
>>> -else
>>> -pr_info("No NUMA configuration found\n");
>>>   pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
>>>  0LLU, PFN_PHYS(max_pfn) - 1);
>>>
>>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>>> index fb62307..3157130 100644
>>> --- a/drivers/of/of_numa.c
>>> +++ b/drivers/of/of_numa.c
>>> @@ -63,7 +63,7 @@ static int __init of_numa_parse_memory_nodes(void)
>>>   struct device_node *np = NULL;
>>>   struct resource rsrc;
>>>   u32 nid;
>>> -int i, r = 0;
>>> +int i, r;
>>>
>>>   for_each_node_by_type(np, "memory") {
>>>   r = of_property_read_u32(np, "numa-node-id", &nid);
>>> @@ -81,12 +81,11 @@ static int __init of_numa_parse_memory_nodes(void)
>>>   if (!i || r) {
>>>   of_node_put(np);
>>>   pr_err("NUMA: bad property in memory node\n");
>>> -r = r ? : -EINVAL;
>>> -break;
>>> +return r ? : -EINVAL;
>>>   }
>>>   }
>>>
>>> -return r;
>>> +return 0;
>>>   }
>>>
>>
>> Well this is fixing changes you introduced in this patch-set. Any reason 
>> this is not part of patch 2?
> 
> Because they fixed two different problems.

Hi, Matthias

I thougth it again on my way home yesterday. Yeah, you're right, move this part 
to patch 2, will make these two
patches looks more well. I put it here before, because for "No numa 
configuration" case, it originally returns error
code, so that it can not walk to "if (nodes_empty(numa_nodes_parsed))".

	ret = init_func();
	if (ret < 0)
		return ret;

-	if (nodes_empty(numa_nodes_parsed))
+	if (nodes_empty(numa_nodes_parsed)) {
+		pr_info("No NUMA configuration found\n");
		return -EINVAL;
+	}

Regards,
Zhen Lei

> 
>>
>>>   static int __init of_numa_parse_distance_map_v1(struct device_node *map)
>>> -- 
>>> 2.5.0
>>>
>>>
>>>
>>> ___
>>> linux-arm-kernel mailing list
>>> linux-arm-ker...@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>
>> .
>>



Re: [PATCH v2 2/5] of/numa: fix a memory@ node can only contains one memory block

2016-06-01 Thread Leizhen (ThunderTown)


On 2016/6/2 4:13, Rob Herring wrote:
> On Sat, May 28, 2016 at 4:22 AM, Zhen Lei  wrote:
>> For a normal memory@ devicetree node, its reg property can contains more
>> memory blocks.
>>
>> Because we don't known how many memory blocks maybe contained, so we try
>> from index=0, increase 1 until error returned(the end).
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/of/of_numa.c | 26 +-
>>  1 file changed, 9 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>> index fb71b4e..fa85a51 100644
>> --- a/drivers/of/of_numa.c
>> +++ b/drivers/of/of_numa.c
>> @@ -63,13 +63,9 @@ static int __init of_numa_parse_memory_nodes(void)
>> struct device_node *np = NULL;
>> struct resource rsrc;
>> u32 nid;
>> -   int r = 0;
>> -
>> -   for (;;) {
>> -   np = of_find_node_by_type(np, "memory");
>> -   if (!np)
>> -   break;
>> +   int i, r = 0;
>>
>> +   for_each_node_by_type(np, "memory") {
>> r = of_property_read_u32(np, "numa-node-id", &nid);
>> if (r == -EINVAL)
>> /*
>> @@ -78,21 +74,17 @@ static int __init of_numa_parse_memory_nodes(void)
>>  * "numa-node-id" property
>>  */
>> continue;
>> -   else if (r)
>> -   /* some other error */
>> -   break;
>>
>> -   r = of_address_to_resource(np, 0, &rsrc);
>> -   if (r) {
>> -   pr_err("NUMA: bad reg property in memory node\n");
>> -   break;
>> -   }
>> +   for (i = 0; !r && !of_address_to_resource(np, i, &rsrc); i++)
>> +   r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);
>>
>> -   r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);
>> -   if (r)
>> +   if (!i || r) {
>> +   of_node_put(np);
>> +   pr_err("NUMA: bad property in memory node\n");
>> +   r = r ? : -EINVAL;
>> break;
>> +   }
>> }
>> -   of_node_put(np);
> 
> I believe you still need this and not the one above. You only need it
> within the loop if you return. Otherwise, the last node always need to
> be put.

OK. Thanks.

In addition, following Matthias's suggestion, I will move the "return" into this
patch, so that this of_node_put(np) can be safely removed.
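Rob's point about reference counting can be checked with a toy model of the for_each_node_by_type() pattern. Each step drops the reference on the previous node, so a loop that runs to completion leaves nothing to put, while an early return must put the node it still holds. This is a sketch only; the toy_* types and helpers are hypothetical stand-ins for the OF API:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy device-tree node with a reference count (not struct device_node). */
struct toy_node {
	const char *type;
	int refcount;
	int nranges; /* number of entries in "reg" */
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/*
 * Like of_find_node_by_type(): drop the reference on 'from' and return the
 * next node of the given type with a new reference, or NULL at the end.
 */
static struct toy_node *toy_find_by_type(struct toy_node *nodes, size_t n,
					 struct toy_node *from, const char *type)
{
	size_t i = from ? (size_t)(from - nodes) + 1 : 0;

	toy_put(from);
	for (; i < n; i++)
		if (strcmp(nodes[i].type, type) == 0)
			return toy_get(&nodes[i]);
	return NULL;
}

static int toy_parse_memory_nodes(struct toy_node *nodes, size_t n)
{
	struct toy_node *np = NULL;

	while ((np = toy_find_by_type(nodes, n, np, "memory")) != NULL) {
		if (np->nranges == 0) {
			toy_put(np); /* early return: we still hold np */
			return -EINVAL;
		}
	}
	return 0; /* np is NULL here: nothing left to put */
}
```

In both the success and the error path every node ends with a reference count of zero, which is why the trailing of_node_put(np) becomes unnecessary once the error path returns directly.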


> 
> With that, for the series:
> 
> Acked-by: Rob Herring 
> 
> Rob
> 
> .
> 



Re: [PATCH v2 5/5] arm64/numa: avoid inconsistent information to be printed

2016-05-31 Thread Leizhen (ThunderTown)


On 2016/5/31 17:07, Matthias Brugger wrote:
> 
> 
> On 28/05/16 11:22, Zhen Lei wrote:
>> numa_init(of_numa_init) may returned error because of numa configuration
>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>> configuration error information should be immediately printed by the
>> testing branch.
>>
>> Signed-off-by: Zhen Lei 
>> ---
> 
> Which kernel version is this patch based on?

Based on mainline (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git),
which I pulled about 3-5 days ago; the last commit ID is dc03c0f.

These patches are also based on the https://lkml.org/lkml/2016/5/24/679 series
(ACPI NUMA), as David Daney requested.

> 
> Regards,
> Matthias
> 
>>   arch/arm64/mm/numa.c | 6 +++---
>>   drivers/of/of_numa.c | 7 +++
>>   2 files changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 2601660..1b9622c 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -338,8 +338,10 @@ static int __init numa_init(int (*init_func)(void))
>>   if (ret < 0)
>>   return ret;
>>
>> -if (nodes_empty(numa_nodes_parsed))
>> +if (nodes_empty(numa_nodes_parsed)) {
>> +pr_info("No NUMA configuration found\n");
>>   return -EINVAL;
>> +}
>>
>>   ret = numa_register_nodes();
>>   if (ret < 0)
>> @@ -370,8 +372,6 @@ static int __init dummy_numa_init(void)
>>
>>   if (numa_off)
>>   pr_info("NUMA disabled\n"); /* Forced off on command line. */
>> -else
>> -pr_info("No NUMA configuration found\n");
>>   pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
>>  0LLU, PFN_PHYS(max_pfn) - 1);
>>
>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>> index fb62307..3157130 100644
>> --- a/drivers/of/of_numa.c
>> +++ b/drivers/of/of_numa.c
>> @@ -63,7 +63,7 @@ static int __init of_numa_parse_memory_nodes(void)
>>   struct device_node *np = NULL;
>>   struct resource rsrc;
>>   u32 nid;
>> -int i, r = 0;
>> +int i, r;
>>
>>   for_each_node_by_type(np, "memory") {
>>   r = of_property_read_u32(np, "numa-node-id", &nid);
>> @@ -81,12 +81,11 @@ static int __init of_numa_parse_memory_nodes(void)
>>   if (!i || r) {
>>   of_node_put(np);
>>   pr_err("NUMA: bad property in memory node\n");
>> -r = r ? : -EINVAL;
>> -break;
>> +return r ? : -EINVAL;
>>   }
>>   }
>>
>> -return r;
>> +return 0;
>>   }
>>
> 
> Well this is fixing changes you introduced in this patch-set. Any reason this 
> is not part of patch 2?

Because they fixed two different problems.

> 
>>   static int __init of_numa_parse_distance_map_v1(struct device_node *map)
>> -- 
>> 2.5.0
>>
>>
>>
>>
> 
> .
> 



Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc

2016-10-26 Thread Leizhen (ThunderTown)


On 2016/10/26 17:31, Michal Hocko wrote:
> On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/10/25 21:23, Michal Hocko wrote:
>>> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>>>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes
>>>> actually exist, the percpu variable areas and NUMA control blocks of those
>>>> memoryless nodes need to be allocated from the nearest available node to
>>>> improve performance.
>>>>
>>>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>>>> specified nid first, if that allocation fails they fall straight back to
>>>> NUMA_NO_NODE, which means any node may be used on the second attempt.
>>>>
>>>> To stay compatible with the old behaviour above, I use a macro,
>>>> node_distance_ready, to control it. By default, node_distance_ready is not
>>>> defined on any platform, so the functions mentioned above work as before.
>>>> Otherwise, they try the nearest node first.
>>>
>>> I am sorry but it is absolutely unclear to me _what_ is the motivation
>>> of the patch. Is this a performance optimization, correctness issue or
>>> something else? Could you please restate what is the problem, why do you
>>> think it has to be fixed at memblock layer and describe what the actual
>>> fix is please?
>>
>> This is a performance optimization.
> 
> Do you have any numbers to back the improvements?
I have not collected any performance data, but at least in theory it is
beneficial and harmless, apart from making the code look a bit ugly. All the
related functions are defined as __init, for example:
phys_addr_t __init memblock_alloc_try_nid(
void * __init memblock_virt_alloc_try_nid(

So the extra work is done only once, at boot, while all the related memory
(percpu variables and NODE_DATA) is mostly referenced at run time.

> 
>> The problem is if some memoryless numa nodes are
>> actually exist, for example: there are total 4 nodes, 0,1,2,3, node 1 has no 
>> memory,
>> and the node distances is as below:
>> -board---
>>  |   |
>> |   |
>>  socket0 socket1
>>/ \ / \
>>   /   \   /   \
>>node0 node1 node2 node3
>> distance[1][0] is nearer than distance[1][2] and distance[1][3]. CPUs on 
>> node1 access
>> the memory of node0 is faster than node2 or node3.
>>
>> Linux defines a lot of percpu variables, each cpu has a copy of it and most 
>> of the time
>> only to access their own percpu area. In this example, we hope the percpu 
>> area of CPUs
>> on node1 allocated from node0. But without these patches, it's not sure that.
> 
> I am not familiar with the percpu allocator much so I might be
> completely missig a point but why cannot this be solved in the percpu
> allocator directly e.g. by using cpu_to_mem which should already be
> memoryless aware.
My test results show that it cannot:
[0.00] Initmem setup node 0 [mem 0x-0x0011]
[0.00] Could not find start_pfn for node 1
[0.00] Initmem setup node 1 [mem 0x-0x]
[0.00] Initmem setup node 2 [mem 0x0012-0x0013]
[0.00] Initmem setup node 3 [mem 0x0014-0x0017]


[   14.801895] NODE_DATA(0) = 0x11e500
[   14.805749] NODE_DATA(1) = 0x11ca00  //(1), see below
[   14.809602] NODE_DATA(2) = 0x13e500
[   14.813455] NODE_DATA(3) = 0x17fffe5480
[   14.817316] cpu 0 on node0: 11fff87638
[   14.821083] cpu 1 on node0: 11fff9c638
[   14.824850] cpu 2 on node0: 11fffb1638
[   14.828616] cpu 3 on node0: 11fffc6638
[   14.832383] cpu 4 on node1: 17fff8a638   //(2), see below
[   14.836149] cpu 5 on node1: 17fff9f638
[   14.839912] cpu 6 on node1: 17fffb4638
[   14.843677] cpu 7 on node1: 17fffc9638
[   14.847444] cpu 8 on node2: 13fffa4638
[   14.851210] cpu 9 on node2: 13fffb9638
[   14.854976] cpu10 on node2: 13fffce638
[   14.858742] cpu11 on node2: 13fffe3638
[   14.862510] cpu12 on node3: 17fff36638
[   14.866276] cpu13 on node3: 17fff4b638
[   14.870042] cpu14 on node3: 17fff60638
[   14.873809] cpu15 on node3: 17fff75638

(1) NODE_DATA(1) was allocated by memblock_alloc_try_nid with these patches
applied, and the memory came from node0.
(2) The percpu areas of the CPUs on node1 were allocated the same way as on x86
and PowerPC, and the memory came from node3:
return __alloc_bootmem_node(NODE_DATA(nid), size, align, __pa(MAX_DMA_ADDRESS));

I'm not su

Re: [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-10-26 Thread Leizhen (ThunderTown)


On 2016/10/27 2:36, Will Deacon wrote:
> On Tue, Oct 25, 2016 at 10:59:18AM +0800, Zhen Lei wrote:
>> Some numa nodes may have no memory. For example:
>> 1) a node has no memory bank plugged.
>> 2) a node has no memory bank slots.
>>
>> To ensure percpu variable areas and numa control blocks of the
>> memoryless numa nodes to be allocated from the nearest available node to
>> improve performance, defined node_distance_ready. And make its value to be
>> true immediately after node distances have been initialized.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/Kconfig| 4 
>>  arch/arm64/include/asm/numa.h | 3 +++
>>  arch/arm64/mm/numa.c  | 6 +-
>>  3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 30398db..648dd13 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -609,6 +609,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>  def_bool y
>>  depends on NUMA
>>
>> +config HAVE_MEMORYLESS_NODES
>> +def_bool y
>> +depends on NUMA
> 
> Given that patch 1 and the associated node_distance_ready stuff is all
> an unqualified performance optimisation, is there any merit in just
> enabling HAVE_MEMORYLESS_NODES in Kconfig and then optimising things as
> a separate series when you have numbers to back it up?
HAVE_MEMORYLESS_NODES is also a performance optimisation for the memoryless
scenario. For example: node0 is a memoryless node, and node1 is the nearest
node to node0. When we allocate memory for node0, the memory manager normally
tries node0 first, then node1. But we already know that node0 has no memory,
so we can tell the memory manager to try node1 directly. In other words,
HAVE_MEMORYLESS_NODES is used to skip memoryless nodes instead of trying them.

So I think the title of this patch is misleading; I will rewrite it in V2.

Or do you mean I should separate it into a new patch?


> 
> Will
> 
> .
> 



Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc

2016-10-27 Thread Leizhen (ThunderTown)


On 2016/10/27 15:22, Michal Hocko wrote:
> On Thu 27-10-16 10:41:24, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/10/26 17:31, Michal Hocko wrote:
>>> On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2016/10/25 21:23, Michal Hocko wrote:
>>>>> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>>>>>> If HAVE_MEMORYLESS_NODES is selected, and some memoryless numa nodes are
>>>>>> actually exist. The percpu variable areas and numa control blocks of that
>>>>>> memoryless numa nodes need to be allocated from the nearest available
>>>>>> node to improve performance.
>>>>>>
>>>>>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>>>>>> specified nid at the first time, but if that allocation failed it will
>>>>>> directly drop to use NUMA_NO_NODE. This mean any nodes maybe possible at
>>>>>> the second time.
>>>>>>
>>>>>> To compatible the above old scene, I use a marco node_distance_ready to
>>>>>> control it. By default, the marco node_distance_ready is not defined in
>>>>>> any platforms, the above mentioned functions will work as normal as
>>>>>> before. Otherwise, they will try the nearest node first.
>>>>>
>>>>> I am sorry but it is absolutely unclear to me _what_ is the motivation
>>>>> of the patch. Is this a performance optimization, correctness issue or
>>>>> something else? Could you please restate what is the problem, why do you
>>>>> think it has to be fixed at memblock layer and describe what the actual
>>>>> fix is please?
>>>>
>>>> This is a performance optimization.
>>>
>>> Do you have any numbers to back the improvements?
>>
>> I have not collected any performance data, but at least in theory,
>> it's beneficial and harmless, except make code looks a bit
>> urly.
> 
> The whole memoryless area is cluttered with hacks because everybody just
> adds pieces here and there to make his particular usecase work IMHO.
> Adding more on top for performance reasons which are even not measured
OK, I will ask my colleagues whether there are applications we can use to measure it.

> to prove a clear win is a no go. Please step back try to think how this
> could be done with an existing infrastructure we have (some cleanups
OK, I will try. But for some of the infrastructure I can only do a theoretical
analysis: I don't have the related test environment, so I have no way to
verify it.


> while doing that would be hugely appreciated) and if that is not
> possible then explain why and why it is not feasible to fix that before
I think it will be feasible.

> you start adding a new API.
> 
> Thanks!
> 



Re: [PATCH 1/2] of, numa: Add function to disable of_node_to_nid().

2016-10-27 Thread Leizhen (ThunderTown)


On 2016/10/27 1:00, David Daney wrote:
> On 10/26/2016 06:43 AM, Robert Richter wrote:
>> On 25.10.16 14:31:00, David Daney wrote:
>>> From: David Daney 
>>>
>>> On arm64 NUMA kernels we can pass "numa=off" on the command line to
>>> disable NUMA.  A side effect of this is that kmalloc_node() calls to
>>> non-zero nodes will crash the system with an OOPS:
>>>
>>> [0.00] [] __alloc_pages_nodemask+0xa4/0xe68
>>> [0.00] [] new_slab+0xd0/0x57c
>>> [0.00] [] ___slab_alloc+0x2e4/0x514
>>> [0.00] [] __slab_alloc+0x48/0x58
>>> [0.00] [] __kmalloc_node+0xd0/0x2e0
>>> [0.00] [] __irq_domain_add+0x7c/0x164
>>> [0.00] [] its_probe+0x784/0x81c
>>> [0.00] [] its_init+0x48/0x1b0
>>> .
>>> .
>>> .
>>>
>>> This is caused by code like this in kernel/irq/irqdomain.c
>>>
>>>  domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
>>>GFP_KERNEL, of_node_to_nid(of_node));
>>>
>>> When NUMA is disabled, the concept of a node is really undefined, so
>>> of_node_to_nid() should unconditionally return NUMA_NO_NODE.
>>>
>>> Add __of_force_no_numa() to allow of_node_to_nid() to be forced to
>>> return NUMA_NO_NODE.
>>>
>>> The follow on patch will call this new function from the arm64 numa
>>> code.
>>
>> Didn't that work before?
> 
> I am fairly certain that it used to work.
> 
>> numa=off just maps all mem to node 0.
> 
> Yes, that is the current behavior.
It only deals with the CPU nodes, but I think you have now added "numa-node-id"
to a peripheral device (maybe the ITS).

> 
>> If mem
>> allocation is requested for another node it should just fall back to a
>> node with mem (node 0 then).
> 
> This is the root of the problem.  The ITS code is allocating memory. It calls 
> of_node_to_nid() to determine which node it resides on.  The answer in the 
> failing case is node-1.  Since we have mapped all the memory to node-0 the  
> __kmalloc_node(..., 1) call fails with the OOPS shown.
> 
> It could be that __kmalloc_node() used to allocate memory on a node other 
> than the requested node if the request couldn't be met.  But in v4.8 and 
> later it produces that OOPS.
> 
> If you pass a node containing free memory or NUMA_NO_NODE to 
> __kmalloc_node(), the allocation succeeds.
> 
> When we first did these patches, I advocated removing the numa=off feature, 
> and requiring people to install usable firmware on their systems.  That was 
> rejected on the grounds that not everybody has the ability to change their 
> firmware and we would like to allow NUMA kernels to run on systems with 
> defective firmware by supplying this command line parameter.  Now that I have 
> seen requests from the wild for this, I think it is a good idea to allow 
> numa=off to be used to work around this bad firmware.
> 
> The change in this patch set is fairly small, and seems to get the job done.  
> An alternative would be to change __kmalloc_node() to ignore the node 
> parameter if the request cannot be made, but I assume that there were good 
> reasons to have the current behavior, so that would be a much more 
> complicated change to make.
> 
> 
> 
>> I suspect there is something wrong with
>> the page initialization, see:
>>
>>   http://www.spinics.net/lists/arm-kernel/msg535191.html
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1387793
>>
>> What is the complete oops?
>>
>> So I think k*alloc_node() must be able to handle requests to
>> non-existing nodes. Otherwise your fix is incomplete, assume a failed
>> of_numa_init() causing a dummy init but still some devices reporting a
>> node.
> 
> .
> .
> .
> EFI stub: Booting Linux Kernel...
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> [0.00] Booting Linux on physical CPU 0x0
> [0.00] Linux version 4.8.0-rc8-dd (ddaney@localhost.localdomain) (gcc 
> version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #29 SMP Tue Sep 27 15:50:35 
> PDT 2016
> [0.00] Boot CPU: AArch64 Processor [431f0a10]
> [0.00] NUMA turned off
> [0.00] earlycon: pl11 at MMIO 0x87e02400 (options '')
> [0.00] bootconsole [pl11] enabled
> [0.00] efi: Getting EFI parameters from FDT:
> [0.00] efi: EFI v2.40 by Cavium Thunder cn88xx EFI 
> jenkins_weekly_build_40-0-ga1f880f Sep 13 2016 17:05:35
> [0.00] efi:  ACPI=0xf000  ACPI 2.0=0xf014  SMBIOS 
> 3.0=0x10ffafcf000
> [0.00] cma: Reserved 512 MiB at 0xc000
> [0.00] NUMA disabled
> [0.00] NUMA: Faking a node at [mem 
> 0x-0x010f]
> [0.00] NUMA: Adding memblock [0x140 - 0xfffd] on node 0
> [0.00] NUMA: Adding memblock [0xfffe - 0x] on node 0
> [0.00] NUMA: Adding memblock [0x1 - 0xf] on node 0
> [0.00] NUMA: Adding memblock [0x140 - 0x10ffa38] on node 0
> [0.00] NUMA: Adding mem

Re: [PATCH v8 10/16] mm/memblock: add a new function memblock_alloc_near_nid

2016-10-10 Thread Leizhen (ThunderTown)


On 2016/9/1 14:55, Zhen Lei wrote:
> If HAVE_MEMORYLESS_NODES is selected, and some memoryless numa nodes are
> actually exist. The percpu variable areas and numa control blocks of that
> memoryless numa nodes must be allocated from the nearest available node
> to improve performance.
> 
> Signed-off-by: Zhen Lei 
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c| 28 
>  2 files changed, 29 insertions(+)

Hi Will,
  It seems no one has picked this up. How about I move the function below into
arch/arm64/mm/numa.c again, so that it can be merged with patch 11 into one patch?

> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 2925da2..8e866e0 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -290,6 +290,7 @@ static inline int memblock_get_region_node(const struct 
> memblock_region *r)
> 
>  phys_addr_t memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
>  phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int 
> nid);
> +phys_addr_t memblock_alloc_near_nid(phys_addr_t size, phys_addr_t align, int 
> nid);
> 
>  phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 483197e..6578fff 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1189,6 +1189,34 @@ again:
>   return ret;
>  }
> 
> +phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size, phys_addr_t 
> align, int nid)
> +{
> + int i, best_nid, distance;
> + u64 pa;
> + DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> + bitmap_zero(nodes_map, MAX_NUMNODES);
> +
> +find_nearest_node:
> + best_nid = NUMA_NO_NODE;
> + distance = INT_MAX;
> +
> + for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> + if (node_distance(nid, i) < distance) {
> + best_nid = i;
> + distance = node_distance(nid, i);
> + }
> +
> + pa = memblock_alloc_nid(size, align, best_nid);
> + if (!pa) {
> + BUG_ON(best_nid == NUMA_NO_NODE);
> + bitmap_set(nodes_map, best_nid, 1);
> + goto find_nearest_node;
> + }
> +
> + return pa;
> +}
> +
>  phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t 
> align, phys_addr_t max_addr)
>  {
>   return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE,
> --
> 2.5.0
> 
> 
> 
> .
> 



Re: [PATCH v8 10/16] mm/memblock: add a new function memblock_alloc_near_nid

2016-10-11 Thread Leizhen (ThunderTown)


On 2016/10/11 18:16, Will Deacon wrote:
> On Tue, Oct 11, 2016 at 09:44:20AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/9/1 14:55, Zhen Lei wrote:
>>> If HAVE_MEMORYLESS_NODES is selected, and some memoryless numa nodes are
>>> actually exist. The percpu variable areas and numa control blocks of that
>>> memoryless numa nodes must be allocated from the nearest available node
>>> to improve performance.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>  include/linux/memblock.h |  1 +
>>>  mm/memblock.c| 28 
>>>  2 files changed, 29 insertions(+)
>>
>> Hi Will,
>>   It seems no one take care about this, how about I move below function into 
>> arch/arm64/mm/numa.c
>> again? So that, merge it and patch 11 into one.
> 
> I'd rather you reposted it after the merge window so we can see what to
> do with it then. The previous posting was really hard to figure out and
> mixed lots of different concepts into one series, so it's not completely
> surprising that it didn't all get picked up.
OK, thanks.

> 
> Will
> 
> .
> 



Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc

2016-10-25 Thread Leizhen (ThunderTown)


On 2016/10/25 21:23, Michal Hocko wrote:
> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>> If HAVE_MEMORYLESS_NODES is selected, and some memoryless numa nodes are
>> actually exist. The percpu variable areas and numa control blocks of that
>> memoryless numa nodes need to be allocated from the nearest available
>> node to improve performance.
>>
>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>> specified nid at the first time, but if that allocation failed it will
>> directly drop to use NUMA_NO_NODE. This mean any nodes maybe possible at
>> the second time.
>>
>> To compatible the above old scene, I use a marco node_distance_ready to
>> control it. By default, the marco node_distance_ready is not defined in
>> any platforms, the above mentioned functions will work as normal as
>> before. Otherwise, they will try the nearest node first.
> 
> I am sorry but it is absolutely unclear to me _what_ is the motivation
> of the patch. Is this a performance optimization, correctness issue or
> something else? Could you please restate what is the problem, why do you
> think it has to be fixed at memblock layer and describe what the actual
> fix is please?
This is a performance optimization. The problem arises when some memoryless
NUMA nodes actually exist. For example, there are 4 nodes in total (0, 1, 2, 3),
node 1 has no memory, and the topology is as below:
  ---------board---------
  |                     |
  |                     |
socket0              socket1
  / \                  / \
 /   \                /   \
node0 node1         node2 node3
distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on
node1 access the memory of node0 faster than that of node2 or node3.

Linux defines a lot of percpu variables. Each CPU has its own copy, and most of
the time it accesses only its own percpu area. In this example, we want the
percpu areas of the CPUs on node1 to be allocated from node0, but without these
patches that is not guaranteed.

If each node has its own memory, we can directly use the functions below to
allocate memory from the local node:
1. memblock_alloc_nid
2. memblock_alloc_try_nid
3. memblock_virt_alloc_try_nid_nopanic
4. memblock_virt_alloc_try_nid

So these patches are only needed for the NUMA memoryless scenario.

Another use case is the control block "extern pg_data_t *node_data[]".
Here is an example from the x86 NUMA code in arch/x86/mm/numa.c:
static void __init alloc_node_data(int nid)
{
... ...
/*
 * Allocate node data.  Try node-local memory and then any node.
//==>But the nearest node is the best
 * Never allocate in DMA zone.
 */
nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
  MEMBLOCK_ALLOC_ACCESSIBLE);
if (!nd_pa) {
pr_err("Cannot find %zu bytes in node %d\n",
   nd_size, nid);
return;
}
}
nd = __va(nd_pa);
... ...
node_data[nid] = nd;

> 
>>From a quick glance you are trying to bend over the memblock API for
> something that should be handled on a different layer.
> 
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  mm/memblock.c | 76 
>> ++-
>>  1 file changed, 65 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7608bc3..556bbd2 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, 
>> phys_addr_t align)
>>  return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +#ifndef node_distance_ready
>> +#define node_distance_ready()   0
>> +#endif
>> +
>> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
>> +phys_addr_t align, phys_addr_t start,
>> +phys_addr_t end, int nid, ulong flags,
>> +int alloc_func_type)
>> +{
>> +int nnid, round = 0;
>> +u64 pa;
>> +DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +bitmap_zero(nodes_map, MAX_NUMNODES);
>> +
>> +again:
>> +/*
>> + * There are total 4 cases:
>> + * 
>> + *   1)2) node_distance_ready || !node_distance_ready
>> + *  Round 1, nnid = nid = NUMA_NO_NODE;
>> + * 
>> + *   3) !node_distance_ready
>> + *  Round 1, nnid = nid;
>> + *::Round 2, currently only applicable for alloc_func_type = <0>
>> + *  Round 2, nnid = NUMA_NO_NODE;
>> + *   4) node_distance_ready
>> + *  Round 1, LOCAL_DISTANCE, nnid = nid;
>> + *  Round ?, nnid = nearest nid;
>> + */
>> +if (!nod

Re: [PATCH 1/7] iommu/iova: fix incorrect variable types

2017-03-23 Thread Leizhen (ThunderTown)


On 2017/3/23 19:42, Robin Murphy wrote:
> On 22/03/17 06:27, Zhen Lei wrote:
>> Keep these four variables type consistent with the paramters of function
>> __alloc_and_insert_iova_range and the members of struct iova:
>>
>> 1. static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>  unsigned long size, unsigned long limit_pfn,
>>
>> 2. struct iova {
>>  unsigned long   pfn_hi;
>>  unsigned long   pfn_lo;
>>
>> In fact, limit_pfn is most likely larger than 32 bits on DMA64.
> 
> FWIW if pad_size manages to overflow an int something's probably gone
> horribly wrong, but there's no harm in making it consistent with
> everything else here. However, given that patch #6 makes this irrelevant
> anyway, do we really need to bother?

Because I'm not sure whether patch #6 can be applied or not.

> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/iova.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index b7268a1..8ba8b496 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -104,8 +104,8 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, 
>> struct iova *free)
>>   * Computes the padding size required, to make the start address
>>   * naturally aligned on the power-of-two order of its size
>>   */
>> -static unsigned int
>> -iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
>> +static unsigned long
>> +iova_get_pad_size(unsigned long size, unsigned long limit_pfn)
>>  {
>>  return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
>>  }
>> @@ -117,7 +117,7 @@ static int __alloc_and_insert_iova_range(struct 
>> iova_domain *iovad,
>>  struct rb_node *prev, *curr = NULL;
>>  unsigned long flags;
>>  unsigned long saved_pfn;
>> -unsigned int pad_size = 0;
>> +unsigned long pad_size = 0;
>>  
>>  /* Walk the tree backwards */
>>  spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
>>
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 3/7] iommu/iova: insert start_pfn boundary of dma32

2017-03-23 Thread Leizhen (ThunderTown)


On 2017/3/23 21:01, Robin Murphy wrote:
> On 22/03/17 06:27, Zhen Lei wrote:
>> Reserve the first granule size memory(start at start_pfn) as boundary
>> iova, to make sure that iovad->cached32_node can not be NULL in future.
>> Meanwhile, changed the assignment of iovad->cached32_node from rb_next to
>> rb_prev of &free->node in function __cached_rbnode_delete_update.
> 
> I'm not sure I follow this. It's a top-down allocator, so cached32_node
> points to the last node allocated (or the node above the last one freed)
> on the assumption that there is likely free space directly below there,
> thus it's a pretty good place for the next allocation to start searching
> from. On the other hand, start_pfn is a hard "do not go below this line"
> limit, so it doesn't seem to make any sense to ever point the former at
> the latter.
This patch just prepares for dma64. We really need to add a boundary between
the dma32 and dma64 spaces, for two main reasons:
1. To make dma32 IOVA allocation faster. The last node a dma32 allocation can
see is the boundary, so dma32 allocations only search within the dma32 IOVA
space. Meanwhile, we want dma64 allocations to try the dma64 IOVA space
(iova >= 4G) first. The dma32 IOVA space is at most 4GB, while the dma64 space
is almost always much larger.

2. To prevent an allocated IOVA from crossing between the dma32 and dma64
spaces. Otherwise, this special case would have to be handled whenever an IOVA
is allocated or freed.

Once that boundary is added, it's better to also add a boundary at the dma32
start_pfn, so that both are handled by one model.

With the two boundaries in place, adjusting cached32/64_node relative to the
freed IOVA node simplifies the code.


> 
> I could understand slightly more if we were reserving the PFN *above*
> the cached range, but as-is I don't see what we gain from the change
> here, nor what benefit the cached32_node != NULL assumption gives
> (AFAICS it would be more useful to simply restrict the cases where it
> may be NULL to when the address space is either completely full or
> completely empty, or perhaps both).
> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/iova.c | 63 
>> ++--
>>  1 file changed, 37 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index 1c49969..b5a148e 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -32,6 +32,17 @@ static unsigned long iova_rcache_get(struct iova_domain 
>> *iovad,
>>  static void init_iova_rcaches(struct iova_domain *iovad);
>>  static void free_iova_rcaches(struct iova_domain *iovad);
>>  
>> +static void
>> +insert_iova_boundary(struct iova_domain *iovad)
>> +{
>> +struct iova *iova;
>> +unsigned long start_pfn_32bit = iovad->start_pfn;
>> +
>> +iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
>> +BUG_ON(!iova);
>> +iovad->cached32_node = &iova->node;
>> +}
>> +
>>  void
>>  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>  unsigned long start_pfn, unsigned long pfn_32bit)
>> @@ -45,27 +56,38 @@ init_iova_domain(struct iova_domain *iovad, unsigned 
>> long granule,
>>  
>>  spin_lock_init(&iovad->iova_rbtree_lock);
>>  iovad->rbroot = RB_ROOT;
>> -iovad->cached32_node = NULL;
>>  iovad->granule = granule;
>>  iovad->start_pfn = start_pfn;
>>  iovad->dma_32bit_pfn = pfn_32bit;
>>  init_iova_rcaches(iovad);
>> +
>> +/*
>> + * Insert boundary nodes for dma32. So cached32_node can not be NULL in
>> + * future.
>> + */
>> +insert_iova_boundary(iovad);
>>  }
>>  EXPORT_SYMBOL_GPL(init_iova_domain);
>>  
>>  static struct rb_node *
>>  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
>>  {
>> -if ((*limit_pfn > iovad->dma_32bit_pfn) ||
>> -(iovad->cached32_node == NULL))
>> +struct rb_node *cached_node;
>> +struct rb_node *next_node;
>> +
>> +if (*limit_pfn > iovad->dma_32bit_pfn)
>>  return rb_last(&iovad->rbroot);
>> -else {
>> -struct rb_node *prev_node = rb_prev(iovad->cached32_node);
>> -struct iova *curr_iova =
>> -rb_entry(iovad->cached32_node, struct iova, node);
>> -*limit_pfn = curr_iova->pfn_lo - 1;
>> -return prev_node;
>> +else
>> +cached_node = iovad->cached32_node;
>> +
>> +next_node = rb_next(cached_node);
>> +if (next_node) {
>> +struct iova *next_iova = rb_entry(next_node, struct iova, node);
>> +
>> +*limit_pfn = min(*limit_pfn, next_iova->pfn_lo - 1);
>>  }
>> +
>> +return cached_node;
>>  }
>>  
>>  static void
>> @@ -83,20 +105,13 @@ __cached_rbnode_delete_update(struct iova_domain 
>> *iovad, struct iova *free)
>>  struct iova *cached_iova;
>>  struct rb_node *curr;
>>  
>> -if (!iovad->cached32_node)
>> -return;
>>  curr = iovad->cached32_node;
>>  c

Re: [PATCH v5 03/14] arm64/numa: add nid check for memory block

2016-08-09 Thread Leizhen (ThunderTown)


On 2016/8/10 10:12, Hanjun Guo wrote:
> On 2016/8/8 17:18, Zhen Lei wrote:
>> Use the same tactic to cpu and numa-distance nodes.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index c7fe3ec..2601660 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -141,6 +141,11 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>  {
>>  int ret;
>>
>> +if (nid >= MAX_NUMNODES) {
>> +pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
>> +return -EINVAL;
>> +}
> 
> I think this check should be added to of_numa_parse_memory_nodes(), which 
> before
> the numa_add_memblk() called, it's the same logic in 
> of_numa_parse_cpu_nodes() and
> the node id is checked before calling numa_add_memblk() in ACPI.

Yes, you are right. This check is arch independent.

> 
> Thanks
> Hanjun
> 
> 
> 
> .
> 



Re: [PATCH v7 03/14] arm64/numa: add nid check for memory block

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 20:39, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:42PM +0800, Zhen Lei wrote:
>> Use the same tactic to cpu and numa-distance nodes.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/of/of_numa.c | 5 +
>>  1 file changed, 5 insertions(+)
> 
> The subject has arm64/numa, but this is clearly core OF code and
I originally added the check below in arch/arm64/mm/numa.c, until Hanjun Guo
told me that it should be moved into drivers/of/of_numa.c.

I forgot to update the subject.

> requires an ack from Rob.
> 
> The commit message also doesn't make much sense to me.
> 
>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>> index 7b3fbdc..afaeb9c 100644
>> --- a/drivers/of/of_numa.c
>> +++ b/drivers/of/of_numa.c
>> @@ -75,6 +75,11 @@ static int __init of_numa_parse_memory_nodes(void)
>>   */
>>  continue;
>>
>> +if (nid >= MAX_NUMNODES) {
>> +pr_warn("NUMA: Node id %u exceeds maximum value\n", 
>> nid);
>> +return -EINVAL;
>> +}
> 
> Do you really want to return from the function here? Shouldn't we at least
> of_node_put(np), i.e. by using a break; ?
Thanks for pointing out this mistake. In the next version I will set
"r = -EINVAL" and break out of the loop instead of returning.

> 
> Will
> 
> .
> 



Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 20:47, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
>> numa_init(of_numa_init) may returned error because of numa configuration
>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>> configuration error information should be immediately printed by the
>> testing branch.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 5bb15ea..d97c6e2 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
>>  if (ret < 0)
>>  return ret;
>>
>> -if (nodes_empty(numa_nodes_parsed))
>> +if (nodes_empty(numa_nodes_parsed)) {
>> +pr_info("No NUMA configuration found\n");
>>  return -EINVAL;
> 
> Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
> completely artificial setup, created by adding all memblocks to node 0,
> so this new message will be suppressed even though things really did go
> wrong.
It will be printed by the earlier call, numa_init(of_numa_init).

> 
> In that case, don't we want to print *something* (like we do today in
> dummy_numa_init) but maybe not "No NUMA configuration found"? What
> exactly do you find inaccurate about the current message?
For example:
[0.00] NUMA: No distance-matrix property in distance-map
[0.00] No NUMA configuration found

So if of_numa_init or arm64_acpi_numa_init returns an error because a NUMA
configuration error has been found, it's not good to print "No NUMA
configuration found".

> 
> Will
> 
> .
> 



Re: [PATCH v7 08/14] arm64: numa: Use pr_fmt()

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 20:54, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:47PM +0800, Zhen Lei wrote:
>> From: Kefeng Wang 
>>
>> Use pr_fmt to prefix kernel output, and remove duplicated msg
>> of NUMA turned off.
>>
>> Signed-off-by: Kefeng Wang 
>> ---
>>  arch/arm64/mm/numa.c | 40 
>>  1 file changed, 20 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index d97c6e2..7b73808 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -17,6 +17,8 @@
>>   * along with this program.  If not, see .
>>   */
>>
>> +#define pr_fmt(fmt) "numa: " fmt
> 
> Shouldn't this be uppercase for consistency with the existing code and
> the code in places like drivers/of/of_numa.c?
OK, I will change it to "NUMA: ".
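For reference, here is a userspace sketch of how a pr_fmt() prefix takes effect. It assumes the include/linux/printk.h convention that pr_fmt() only gets a default definition when it is not already defined, which is why the #define has to come before the #include lines. pr_info_buf() is a hypothetical stand-in for pr_info() that writes into a buffer so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Defined before any printk-style helper is expanded, as in the patch. */
#define pr_fmt(fmt) "NUMA: " fmt

/* Hypothetical stand-in for pr_info(): the prefix is glued on at compile
 * time by string-literal concatenation; ##__VA_ARGS__ is the GNU C
 * extension the kernel's own pr_*() macros rely on. */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is added by the macro, the literal "NUMA: " can be dropped from every individual message, which is what the rest of the patch does.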

> 
>>  #include 
>>  #include 
>>  #include 
>> @@ -38,10 +40,9 @@ static __init int numa_parse_early_param(char *opt)
>>  {
>>  if (!opt)
>>  return -EINVAL;
>> -if (!strncmp(opt, "off", 3)) {
>> -pr_info("%s\n", "NUMA turned off");
>> +if (!strncmp(opt, "off", 3))
>>  numa_off = true;
>> -}
>> +
>>  return 0;
>>  }
>>  early_param("numa", numa_parse_early_param);
>> @@ -110,7 +111,7 @@ static void __init setup_node_to_cpumask_map(void)
>>  set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>
>>  /* cpumask_of_node() will now work */
>> -pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids);
>> +pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>  }
>>
>>  /*
>> @@ -145,13 +146,13 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>
>>  ret = memblock_set_node(start, (end - start), &memblock.memory, nid);
>>  if (ret < 0) {
>> -pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +pr_err("memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>>  start, (end - 1), nid);
>>  return ret;
>>  }
>>
>>  node_set(nid, numa_nodes_parsed);
>> -pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +pr_info("Adding memblock [0x%llx - 0x%llx] on node %d\n",
>>  start, (end - 1), nid);
>>  return ret;
>>  }
>> @@ -166,19 +167,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  void *nd;
>>  int tnid;
>>
>> -pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> -nid, start_pfn << PAGE_SHIFT,
>> -(end_pfn << PAGE_SHIFT) - 1);
>> +pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>
>>  nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>  nd = __va(nd_pa);
>>
>>  /* report and initialize */
>> -pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> 
> Why are you adding leading whitespace?
Kefeng Wang said it was only to make the final printed output look clearer.

I will remove the leading whitespace in v8.

> 
>>  nd_pa, nd_pa + nd_size - 1);
>>  tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>>  if (tnid != nid)
>> -pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid);
>> +pr_info("NODE_DATA(%d) on node %d\n", nid, tnid);
> 
> 
> Same here.
> 
>>  node_data[nid] = nd;
>>  memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> @@ -235,8 +235,7 @@ static int __init numa_alloc_distance(void)
>>  numa_distance[i * numa_distance_cnt + j] = i == j ?
>>  LOCAL_DISTANCE : REMOTE_DISTANCE;
>>
>> -pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> -numa_distance_cnt);
>> +pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt);
>>
>>  return 0;
>>  }
>> @@ -257,20 +256,20 @@ static int __init numa_alloc_distance(void)
>>  void __init numa_set_distance(int from, int to, int distance)
>>  {
>>  if (!numa_distance) {
>> -pr_warn_once("NUMA: Warning: distance table not allocated yet\n");
>> +pr_warn_once("Warning: distance table not allocated yet\n");
>>  return;
>>  }
>>
>>  if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>>  from < 0 || to < 0) {
>> -pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>>  from, to, distance);
>>  return;
>>  }
>>
>>  if ((u8)distance != distance ||
>>  (from == to && distance != LOCAL_DISTANCE)) {
>> -pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +pr_warn_once("Wa

Re: [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 21:28, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:48PM +0800, Zhen Lei wrote:
>> To make each percpu area allocated from its local numa node. Without this
>> patch, all percpu areas will be allocated from the node which cpu0 belongs
>> to.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/Kconfig   |  8 
>>  arch/arm64/mm/numa.c | 55 
>> 
>>  2 files changed, 63 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index bc3f00f..2815af6 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -603,6 +603,14 @@ config USE_PERCPU_NUMA_NODE_ID
>>  def_bool y
>>  depends on NUMA
>>
>> +config HAVE_SETUP_PER_CPU_AREA
>> +def_bool y
>> +depends on NUMA
>> +
>> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
>> +def_bool y
>> +depends on NUMA
> 
> Why do we need this? Is it purely about using block mappings for the
> pcpu area?
Without NEED_PER_CPU_EMBED_FIRST_CHUNK, a link error is reported, because
mm/percpu.c only builds pcpu_embed_first_chunk() under this condition:

#if defined(CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK) || \
!defined(CONFIG_HAVE_SETUP_PER_CPU_AREA)
#define BUILD_EMBED_FIRST_CHUNK
#endif

#if defined(BUILD_EMBED_FIRST_CHUNK)
//pcpu_embed_first_chunk definition
#endif

Our setup_per_cpu_areas() calls pcpu_embed_first_chunk(), so that definition
must be compiled in.
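The guard can also be read as a simple predicate. The sketch below only re-expresses the #if quoted above; builds_embed_first_chunk() and link_fails() are hypothetical helper names. It shows why HAVE_SETUP_PER_CPU_AREA=y on its own breaks the link when the arch's setup_per_cpu_areas() calls pcpu_embed_first_chunk().

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime version of the preprocessor guard: the embed allocator is
 * compiled in when the arch asks for it explicitly, or when the arch
 * does not provide its own setup_per_cpu_areas(). */
static bool builds_embed_first_chunk(bool need_per_cpu_embed_first_chunk,
				     bool have_setup_per_cpu_area)
{
	return need_per_cpu_embed_first_chunk || !have_setup_per_cpu_area;
}

/* The link fails when the arch's own setup_per_cpu_areas() references
 * pcpu_embed_first_chunk() but the guard compiled it out. */
static bool link_fails(bool need_embed, bool have_setup, bool arch_calls_embed)
{
	return have_setup && arch_calls_embed &&
	       !builds_embed_first_chunk(need_embed, have_setup);
}
```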


> 
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 7b73808..5e44ad1 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -26,6 +26,7 @@
>>  #include 
>>
>>  #include 
>> +#include 
>>
>>  struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>  EXPORT_SYMBOL(node_data);
>> @@ -131,6 +132,60 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  cpu_to_node_map[cpu] = nid;
>>  }
>>
>> +#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>> +unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
>> +EXPORT_SYMBOL(__per_cpu_offset);
>> +
>> +static int __init early_cpu_to_node(int cpu)
>> +{
>> +return cpu_to_node_map[cpu];
>> +}
>> +
>> +static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
>> +{
>> +if (early_cpu_to_node(from) == early_cpu_to_node(to))
>> +return LOCAL_DISTANCE;
>> +else
>> +return REMOTE_DISTANCE;
>> +}
> 
> Is it too early to use __node_distance here?
Good point, we can use node_distance() directly. Thanks.
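A sketch of the suggested change, assuming node_distance() is already usable at this point (which is exactly what the question above asks) and using a made-up two-node table. The arrays are hypothetical; only the callback shape follows the patch.

```c
#include <assert.h>

#define NR_CPUS  4
#define NR_NODES 2

/* Hypothetical boot-time data: CPU -> node, plus a firmware-style
 * distance table (10 is the local distance, as LOCAL_DISTANCE is). */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };
static const unsigned char numa_distance[NR_NODES][NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

static int node_distance(int from, int to)
{
	return numa_distance[from][to];
}

/* Same shape as the pcpu_cpu_distance() callback in the patch, but it
 * reports the table value instead of a hard-coded LOCAL/REMOTE pair. */
static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
	return node_distance(cpu_to_node_map[from], cpu_to_node_map[to]);
}
```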

> 
>> +static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
>> +   size_t align)
>> +{
>> +int nid = early_cpu_to_node(cpu);
>> +
>> +return  memblock_virt_alloc_try_nid(size, align,
>> +__pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>> +}
>> +
>> +static void __init pcpu_fc_free(void *ptr, size_t size)
>> +{
>> +memblock_free_early(__pa(ptr), size);
>> +}
>> +
>> +void __init setup_per_cpu_areas(void)
>> +{
>> +unsigned long delta;
>> +unsigned int cpu;
>> +int rc;
>> +
>> +/*
>> + * Always reserve area for module percpu variables.  That's
>> + * what the legacy allocator did.
>> + */
>> +rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
>> +PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
>> +pcpu_cpu_distance,
>> +pcpu_fc_alloc, pcpu_fc_free);
>> +if (rc < 0)
>> +panic("Failed to initialize percpu areas.");
>> +
>> +delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
>> +for_each_possible_cpu(cpu)
>> +__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
>> +}
>> +#endif
> 
> It's a pity that this is practically identical to PowerPC. Ideally, there
> would be definitions of this initialisation gunk in the core code that
> could be reused across architectures.
But this code differs from that of other architectures, except PowerPC.

I originally wanted to put it into drivers/of/of_numa.c, but ACPI NUMA support
is now coming, so I don't know where it should go.

> 
> Will
> 
> .
> 



Re: [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 23:29, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:49PM +0800, Zhen Lei wrote:
>> 1. MAX_NUMNODES is based on CONFIG_NODES_SHIFT, the default value of the
>>latter is very small now.
>> 2. Suppose the default value of MAX_NUMNODES is enlarged to 64, so the
>>size of numa_distance is 4K, it's still acceptable if run the Image
>>on other processors.
>> 3. It will make function __node_distance quicker than before.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/include/asm/numa.h |  1 -
>>  arch/arm64/mm/numa.c  | 74 +++
>>  2 files changed, 5 insertions(+), 70 deletions(-)
> 
> I fail to see the advantages of this patch. Do you have some compelling
> performance figures or something?

numa_distance_cnt can only be placed on one node, so CPUs on other nodes take
longer to access it. I have not yet measured how much this improves things.

I will try to get some data next week.
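To make the two layouts concrete, here is a userspace sketch. It assumes u8 distances and a 10/20 LOCAL/REMOTE fallback; the _dyn/_fixed names and alloc_distance_dyn() are hypothetical. With MAX_NUMNODES = 64 the fixed table is 64 * 64 = 4096 bytes, which is the 4K mentioned in the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NUMNODES 64  /* the enlarged value assumed in the commit message */

/* Current scheme: a table sized at runtime, indexed as from * cnt + to. */
static unsigned char *numa_distance_dyn;
static int numa_distance_cnt;

static int alloc_distance_dyn(int cnt)
{
	numa_distance_dyn = malloc((size_t)cnt * cnt);
	if (!numa_distance_dyn)
		return -1;
	numa_distance_cnt = cnt;
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			numa_distance_dyn[i * cnt + j] = i == j ? 10 : 20;
	return 0;
}

/* Needs a pointer load plus a bounds check on every lookup. */
static int node_distance_dyn(int from, int to)
{
	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
		return from == to ? 10 : 20;
	return numa_distance_dyn[from * numa_distance_cnt + to];
}

/* Proposed scheme: a fixed array, indexed directly. */
static unsigned char numa_distance_fixed[MAX_NUMNODES][MAX_NUMNODES];

static int node_distance_fixed(int from, int to)
{
	return numa_distance_fixed[from][to];
}
```

Whether the removed pointer load and bounds check are measurable is exactly what the promised data would need to show.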

> 
> Will
> 
> .
> 



Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 23:35, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
>> Update documentation. This limit is unnecessary.
>>
>> Signed-off-by: Zhen Lei 
>> Acked-by: Rob Herring 
>> ---
>>  Documentation/devicetree/bindings/numa.txt | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
>> index 21b3505..c0ea4a7 100644
>> --- a/Documentation/devicetree/bindings/numa.txt
>> +++ b/Documentation/devicetree/bindings/numa.txt
>> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
>>
>>Note:
>>  1. Each entry represents distance from first node to second node.
>> -The distances are equal in either direction.
> 
> Hmm, so what happens now if firmware provides a description where both
> distances (in either direction) are supplied, but are different?
I don't know of any hardware where the distances in the two directions differ
yet, but:
1. Software has no need to require that the distances in both directions be
   equal.
2. Consider the following software scenario:
   1) cpu0 and cpu1 belong to the same hardware node.
   2) cpu0 is the master control CPU: many tasks and interrupts are delivered
      to cpu0 first, so cpu0 is often busier than cpu1.
   3) We split cpu0 and cpu1 into two logical nodes, with cpu0 on node0 and
      cpu1 on node1. Now we make the distance from cpu0 to cpu1 larger than
      the distance from cpu1 to cpu0.
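A sketch of what such a table looks like, with made-up values: in the distance-map binding it would correspond to two distance-matrix entries such as <0 1 25> and <1 0 15>. The array and distances_symmetric() are hypothetical; the point is only that a lookup is directional (row = from, column = to).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2

/* Hypothetical table for the scenario above: node0 holds the busy
 * master CPU (cpu0), node1 holds cpu1. Row = from, column = to. */
static const unsigned char distance[NR_NODES][NR_NODES] = {
	/* to:  node0 node1 */
	{        10,   25 },   /* from node0 */
	{        15,   10 },   /* from node1 */
};

/* True only if every pair is equal in both directions, i.e. the
 * constraint that this documentation patch drops. */
static bool distances_symmetric(void)
{
	for (int i = 0; i < NR_NODES; i++)
		for (int j = i + 1; j < NR_NODES; j++)
			if (distance[i][j] != distance[j][i])
				return false;
	return true;
}
```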

> 
> Will
> 
> .
> 



Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-08-27 Thread Leizhen (ThunderTown)


On 2016/8/26 23:43, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote:
>> Some numa nodes may have no memory. For example:
>> 1. cpu0 on node0
>> 2. cpu1 on node1
>> 3. device0 access the memory from node0 and node1 take the same time.
>>
>> So, we can not simply classify device0 to node0 or node1, but we can
>> define a node2 which distances to node0 and node1 are the same.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/Kconfig  |  4 
>>  arch/arm64/kernel/smp.c |  1 +
>>  arch/arm64/mm/numa.c| 43 +--
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 2815af6..3a2b6ed 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>  def_bool y
>>  depends on NUMA
>>
>> +config HAVE_MEMORYLESS_NODES
>> +def_bool y
>> +depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index d93d433..4879085 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
>>  }
>>
>>  bootcpu_valid = true;
>> +early_map_cpu_to_node(0, of_node_to_nid(dn));
> 
> This seems unrelated?
I'm about to leave work for the day. Maybe I should move it into patch 12.

> 
>>  /*
>>   * cpu_logical_map has already been
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 6853db7..114180f 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  nid = 0;
>>
>>  cpu_to_node_map[cpu] = nid;
>> +
>> +/*
>> + * We should set the numa node of cpu0 as soon as possible, because it
>> + * has already been set up online before. cpu_to_node(0) will soon be
>> + * called.
>> + */
>> +if (!cpu)
>> +set_cpu_numa_node(cpu, nid);
> 
> Likewise.
> 
>>  }
>>
>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>  return ret;
>>  }
>>
>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>> +{
>> +int i, best_nid, distance;
>> +u64 pa;
>> +DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +bitmap_zero(nodes_map, MAX_NUMNODES);
>> +bitmap_set(nodes_map, nid, 1);
>> +
>> +find_nearest_node:
>> +best_nid = NUMA_NO_NODE;
>> +distance = INT_MAX;
>> +
>> +for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>> +if (numa_distance[nid][i] < distance) {
>> +best_nid = i;
>> +distance = numa_distance[nid][i];
>> +}
>> +
>> +pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>> +if (!pa) {
>> +BUG_ON(best_nid == NUMA_NO_NODE);
>> +bitmap_set(nodes_map, best_nid, 1);
>> +goto find_nearest_node;
>> +}
>> +
>> +return pa;
>> +}
>> +
>>  /**
>>   * Initialize NODE_DATA for a node on the local memory
>>   */
>> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>  nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>
>> -nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +if (!nd_pa)
>> +nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
> 
> Why not add memblock_alloc_near_nid to the core code, and make it do
> what you need there?
I'll think about it next week. But some architectures, such as x86 and IA64,
have their own implementations.
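A userspace model of the nearest-node search in alloc_node_data_from_nearest_node(), assuming a made-up three-node system where node2 is memoryless and equidistant from node0 and node1. try_alloc_on() is a hypothetical stand-in for memblock_alloc_nid(). The sketch also folds in the initial local attempt that setup_node_data() makes, and it returns NUMA_NO_NODE where the patch would instead make a last any-node attempt and then BUG().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NUMNODES 3
#define NUMA_NO_NODE (-1)

/* Hypothetical free memory per node; node2 has none. */
static unsigned long free_bytes[MAX_NUMNODES] = { 4096, 8192, 0 };
static const int numa_distance[MAX_NUMNODES][MAX_NUMNODES] = {
	{ 10, 20, 15 },
	{ 20, 10, 15 },
	{ 15, 15, 10 },
};

/* Stand-in for memblock_alloc_nid(): succeeds only on the given node. */
static bool try_alloc_on(int nid, unsigned long size)
{
	if (nid == NUMA_NO_NODE || free_bytes[nid] < size)
		return false;
	free_bytes[nid] -= size;
	return true;
}

/* Try the local node, then repeatedly the closest untried node.
 * Ties go to the lower node id because the comparison is strict. */
static int alloc_from_nearest_node(int nid, unsigned long size)
{
	bool tried[MAX_NUMNODES] = { false };

	if (try_alloc_on(nid, size))
		return nid;
	tried[nid] = true;

	for (;;) {
		int best = NUMA_NO_NODE, best_dist = INT_MAX;

		for (int i = 0; i < MAX_NUMNODES; i++)
			if (!tried[i] && numa_distance[nid][i] < best_dist) {
				best = i;
				best_dist = numa_distance[nid][i];
			}
		if (best == NUMA_NO_NODE)
			return NUMA_NO_NODE;
		if (try_alloc_on(best, size))
			return best;
		tried[best] = true;
	}
}
```

A generic memblock_alloc_near_nid() along the lines Will suggests would presumably need the same inputs: a distance source and a per-node allocation attempt.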

> 
> Will
> 
> .
> 



Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-08-28 Thread Leizhen (ThunderTown)


On 2016/8/27 19:05, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/8/26 23:43, Will Deacon wrote:
>> On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote:
>>> Some numa nodes may have no memory. For example:
>>> 1. cpu0 on node0
>>> 2. cpu1 on node1
>>> 3. device0 access the memory from node0 and node1 take the same time.
>>>
>>> So, we can not simply classify device0 to node0 or node1, but we can
>>> define a node2 which distances to node0 and node1 are the same.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>  arch/arm64/Kconfig  |  4 
>>>  arch/arm64/kernel/smp.c |  1 +
>>>  arch/arm64/mm/numa.c| 43 +--
>>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 2815af6..3a2b6ed 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>> def_bool y
>>> depends on NUMA
>>>
>>> +config HAVE_MEMORYLESS_NODES
>>> +   def_bool y
>>> +   depends on NUMA
>>> +
>>>  source kernel/Kconfig.preempt
>>>  source kernel/Kconfig.hz
>>>
>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>> index d93d433..4879085 100644
>>> --- a/arch/arm64/kernel/smp.c
>>> +++ b/arch/arm64/kernel/smp.c
>>> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
>>> }
>>>
>>> bootcpu_valid = true;
>>> +   early_map_cpu_to_node(0, of_node_to_nid(dn));
>>
>> This seems unrelated?
> I'm about to leave work for the day. Maybe I should move it into patch 12.
> 
>>
>>> /*
>>>  * cpu_logical_map has already been
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 6853db7..114180f 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>> nid = 0;
>>>
>>> cpu_to_node_map[cpu] = nid;
>>> +
>>> +   /*
>>> +* We should set the numa node of cpu0 as soon as possible, because it
>>> +* has already been set up online before. cpu_to_node(0) will soon be
>>> +* called.
>>> +*/
>>> +   if (!cpu)
>>> +   set_cpu_numa_node(cpu, nid);
>>
>> Likewise.
>>
>>>  }
>>>
>>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>>> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>> return ret;
>>>  }
>>>
>>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>>> +{
>>> +   int i, best_nid, distance;
>>> +   u64 pa;
>>> +   DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>>> +
>>> +   bitmap_zero(nodes_map, MAX_NUMNODES);
>>> +   bitmap_set(nodes_map, nid, 1);
>>> +
>>> +find_nearest_node:
>>> +   best_nid = NUMA_NO_NODE;
>>> +   distance = INT_MAX;
>>> +
>>> +   for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>>> +   if (numa_distance[nid][i] < distance) {
>>> +   best_nid = i;
>>> +   distance = numa_distance[nid][i];
>>> +   }
>>> +
>>> +   pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>>> +   if (!pa) {
>>> +   BUG_ON(best_nid == NUMA_NO_NODE);
>>> +   bitmap_set(nodes_map, best_nid, 1);
>>> +   goto find_nearest_node;
>>> +   }
>>> +
>>> +   return pa;
>>> +}
>>> +
>>>  /**
>>>   * Initialize NODE_DATA for a node on the local memory
>>>   */
>>> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>> pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>> nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>>
>>> -   nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +   nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +   if (!nd_pa)
>>> +   nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
>>
>> Why not add memblock_alloc_near_nid to the core code, and make it do
>> what you need there?
> I'll think about it next week. But some architectures, such as x86 and IA64,
> have their own implementations.

Do you mean we should call alloc_node_data_from_nearest_node() directly, and
only that? OK, that's fine. Thanks.

> 
>>
>> Will
>>
>> .
>>



Re: [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0

2016-08-29 Thread Leizhen (ThunderTown)


On 2016/8/26 23:49, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:51PM +0800, Zhen Lei wrote:
>> 1. Currently only cpu0 set on cpu_possible_mask and percpu areas have not
>>been initialized.
This description refers to the code below:
-   for_each_possible_cpu(cpu)
-   set_cpu_numa_node(cpu, NUMA_NO_NODE);

1. When the above code is executed, only cpu0's bit is set in cpu_possible_mask,
   so only set_cpu_numa_node(0, NUMA_NO_NODE) is executed.
2. set_cpu_numa_node() accesses the percpu variable numa_node, but
   setup_per_cpu_areas() is called later. Were it not for the first problem,
   this would crash the kernel.

I changed the title of this patch in v7; the original was "remove some useless
code". I think I should split this out into a separate patch.



>> 2. No reason to limit cpu0 must belongs to node0.
> 
> Whilst I suspect you're using enumerated lists in order to try to make
> things clearer, I'm having a really hard time understanding the commit
> messages you have in this series. It's actually much better if you
> structure them as concise paragraphs explaining:
> 
>   - What is the problem that you're fixing?
> 
>   - How does that problem manifest?
> 
>   - How does the patch fix it?
> 
> As far as I can see, this patch just removes a bunch of code with no
> explanation as to why it's not required or any problems caused by
> keeping it around.
> 
> Will
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 12 ++--
>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 114180f..07a1978 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -94,7 +94,6 @@ void numa_clear_node(unsigned int cpu)
>>   */
>>  static void __init setup_node_to_cpumask_map(void)
>>  {
>> -unsigned int cpu;
>>  int node;
>>
>>  /* setup nr_node_ids if not done yet */
>> @@ -107,9 +106,6 @@ static void __init setup_node_to_cpumask_map(void)
>>  cpumask_clear(node_to_cpumask_map[node]);
>>  }
>>
>> -for_each_possible_cpu(cpu)
>> -set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> -
>>  /* cpumask_of_node() will now work */
>>  pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>  }
>> @@ -119,13 +115,13 @@ static void __init setup_node_to_cpumask_map(void)
>>   */
>>  void numa_store_cpu_info(unsigned int cpu)
>>  {
>> -map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
>>  }
>>
>>  void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  {
>>  /* fallback to node 0 */
>> -if (nid < 0 || nid >= MAX_NUMNODES)
>> +if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
>>  nid = 0;
Once the code below is removed, this corresponding adjustment is needed;
otherwise, the kernel will crash if "numa=off" is set in bootargs.

>>
>>  cpu_to_node_map[cpu] = nid;
>> @@ -375,10 +371,6 @@ static int __init numa_init(int (*init_func)(void))
>>
>>  setup_node_to_cpumask_map();
>>
>> -/* init boot processor */
>> -cpu_to_node_map[0] = 0;
>> -map_cpu_to_node(0, 0);
This code forces cpu0 to belong to node0, but our current implementation
doesn't have this limitation.

>> -
>>  return 0;
>>  }
>>
>> --
>> 2.5.0
>>
>>
> 
> .
> 
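A small model of the patched early_map_cpu_to_node(), assuming only the fallback logic matters here. The boot CPU keeps whatever node firmware reports, and numa=off (modelled by a plain bool) now sends every CPU to node 0 in this one place, which is why the numa_off check moves here once the cpu0-to-node0 code in numa_init() is gone. The array sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 4
#define NR_CPUS      8

static bool numa_off;                /* models "numa=off" in bootargs */
static int cpu_to_node_map[NR_CPUS];

/* Invalid node ids and numa=off both fall back to node 0; otherwise
 * the boot CPU is treated like any other CPU. */
static void early_map_cpu_to_node(unsigned int cpu, int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
		nid = 0;
	cpu_to_node_map[cpu] = nid;
}
```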



Re: [PATCH v4 12/14] arm64/numa: remove some useless code

2016-06-07 Thread Leizhen (ThunderTown)


On 2016/6/7 16:28, Ganapatrao Kulkarni wrote:
> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei  wrote:
>> 1. Currently only cpu0 set on cpu_possible_mask and percpu areas have not
>>been initialized.
>> 2. No reason to limit cpu0 must belongs to node0.
> 
> even smp init assumes cpu0/boot processor.
Yes, we define the boot CPU as cpu0, but we cannot force cpu0 to belong to
node0. For example, suppose the same Image and dtb run on two boards. On the
first board, the BIOS chooses CPU-A as the boot CPU, but on the other board the
BIOS may choose CPU-B. Although this case is unlikely, we cannot be sure that
it will never occur.

> is this patch tested on any hardware?
Yes, I tested it on our D02 board.

> can you describe your testing hardware?
Although D02 contains only one hardware NUMA node, the NUMA software
implementation is hardware independent, so I defined some logical NUMA nodes,
for example treating each core as a NUMA node and subdividing the memory.

>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 8 
>>  1 file changed, 8 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index d73b0a0..92b1692 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -93,7 +93,6 @@ void numa_clear_node(unsigned int cpu)
>>   */
>>  static void __init setup_node_to_cpumask_map(void)
>>  {
>> -   unsigned int cpu;
>> int node;
>>
>> /* setup nr_node_ids if not done yet */
>> @@ -106,9 +105,6 @@ static void __init setup_node_to_cpumask_map(void)
>> cpumask_clear(node_to_cpumask_map[node]);
>> }
>>
>> -   for_each_possible_cpu(cpu)
>> -   set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> -
> 
> do you see this init of setting node id to NUMA_NO_NODE  for each cpu
> happening any where else?
I used the code below to verify my reasoning; only "cpu=0" is printed.

for_each_possible_cpu(cpu)
pr_info("setup_node_to_cpumask_map: cpu=%d\n", cpu);

Actually, the execution sequence is as follows:
1. setup_arch
   1) bootmem_init();   -->arm64_numa_init
   2) smp_init_cpus();  -->smp_cpu_setup --> set_cpu_possible(cpu, true);

So the deleted code above only sets cpu0 to NUMA_NO_NODE, and the deleted code
below sets cpu0 to nid 0. In fact, the default value of cpu_to_node(0) is also
zero, which is why I said this code has no effect.
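The ordering argument can be modelled in userspace. This is a simplified sketch: cpu_possible_mask is a plain bitmask, for_each_possible_cpu() is a hypothetical re-implementation over it, and smp_init_cpus_model() stands in for smp_init_cpus() marking CPUs possible after the NUMA setup has already run.

```c
#include <assert.h>

#define NR_CPUS      4
#define NUMA_NO_NODE (-1)

/* Only the boot CPU is possible until smp_init_cpus() runs. */
static unsigned long cpu_possible_mask = 1UL << 0;
static int cpu_numa_node[NR_CPUS];  /* zero-initialised, like the default */

#define for_each_possible_cpu(cpu) \
	for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
		if (cpu_possible_mask & (1UL << (cpu)))

/* Stand-in for the loop removed from setup_node_to_cpumask_map();
 * returns how many CPUs it actually touched. */
static int reset_possible_cpus_to_no_node(void)
{
	int cpu, touched = 0;

	for_each_possible_cpu(cpu) {
		cpu_numa_node[cpu] = NUMA_NO_NODE;
		touched++;
	}
	return touched;
}

/* Stand-in for smp_init_cpus() -> set_cpu_possible(cpu, true). */
static void smp_init_cpus_model(void)
{
	cpu_possible_mask = (1UL << NR_CPUS) - 1;
}
```

Run in boot order (NUMA setup first, then SMP enumeration), the loop touches only cpu0, matching the single "cpu=0" line.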

> otherwise, better to have initialised node id/NUMA_NO_NODE to every
> cpu otherwise default  node id will be shown as zero
> which is not correct.
> 
>> /* cpumask_of_node() will now work */
>> pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>  }
>> @@ -379,10 +375,6 @@ static int __init numa_init(int (*init_func)(void))
>>
>> setup_node_to_cpumask_map();
>>
>> -   /* init boot processor */
>> -   cpu_to_node_map[0] = 0;
>> -   map_cpu_to_node(0, 0);
>> -
> 
> otherwise, how you set numa info for cpu0/boot-processor?
I have done it in the previous patch.

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d099306..9e15297 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
}

bootcpu_valid = true;
+   early_map_cpu_to_node(0, of_node_to_nid(dn));

/*
 * cpu_logical_map has already been
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index df5c842..d73b0a0 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
nid = 0;

cpu_to_node_map[cpu] = nid;
+
+   /*
+* We should set the numa node of cpu0 as soon as possible, because it
+* has already been set up online before. cpu_to_node(0) will soon be
+* called.
+*/
+   if (!cpu)
+   set_cpu_numa_node(cpu, nid);
 }

> 
> thanks
> Ganapat
>> return 0;
>>  }
>>
>> --
>> 2.5.0
>>
>>
>>
> 
> thanks
> ganapat
> 
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-06-07 Thread Leizhen (ThunderTown)


On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei  wrote:
>> Some numa nodes may have no memory. For example:
>> 1. cpu0 on node0
>> 2. cpu1 on node1
>> 3. device0 access the memory from node0 and node1 take the same time.
> 
> i am wondering, if access to both nodes is same, then why you need numa.
> the example you are quoting is against the basic principle of "numa"
> what is device0 here? cpu?
device0 can also be a CPU. Here is a simple diagram:

    cpu0       cpu1       cpu2/device0
     |          |              |
     |          |              |
    DDR0       DDR1     No DIMM slots or no DIMM plugged
   (node0)    (node1)        (node2)

>>
>> So, we can not simply classify device0 to node0 or node1, but we can
>> define a node2 which distances to node0 and node1 are the same.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/Kconfig  |  4 
>>  arch/arm64/kernel/smp.c |  1 +
>>  arch/arm64/mm/numa.c| 43 +--
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 05c1bf1..5904a62 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>> def_bool y
>> depends on NUMA
>>
>> +config HAVE_MEMORYLESS_NODES
>> +   def_bool y
>> +   depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index d099306..9e15297 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
>> }
>>
>> bootcpu_valid = true;
>> +   early_map_cpu_to_node(0, of_node_to_nid(dn));
>>
>> /*
>>  * cpu_logical_map has already been
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index df5c842..d73b0a0 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>> nid = 0;
>>
>> cpu_to_node_map[cpu] = nid;
>> +
>> +   /*
>> +* We should set the numa node of cpu0 as soon as possible, because it
>> +* has already been set up online before. cpu_to_node(0) will soon be
>> +* called.
>> +*/
>> +   if (!cpu)
>> +   set_cpu_numa_node(cpu, nid);
>>  }
>>
>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>> @@ -215,6 +223,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>> return ret;
>>  }
>>
>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>> +{
>> +   int i, best_nid, distance;
>> +   u64 pa;
>> +   DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +   bitmap_zero(nodes_map, MAX_NUMNODES);
>> +   bitmap_set(nodes_map, nid, 1);
>> +
>> +find_nearest_node:
>> +   best_nid = NUMA_NO_NODE;
>> +   distance = INT_MAX;
>> +
>> +   for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>> +   if (numa_distance[nid][i] < distance) {
>> +   best_nid = i;
>> +   distance = numa_distance[nid][i];
>> +   }
>> +
>> +   pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>> +   if (!pa) {
>> +   BUG_ON(best_nid == NUMA_NO_NODE);
>> +   bitmap_set(nodes_map, best_nid, 1);
>> +   goto find_nearest_node;
>> +   }
>> +
>> +   return pa;
>> +}
>> +
>>  /**
>>   * Initialize NODE_DATA for a node on the local memory
>>   */
>> @@ -228,7 +265,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>
>> -   nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +   nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +   if (!nd_pa)
>> +   nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
>> nd = __va(nd_pa);
>>
>> /* report and initialize */
>> @@ -238,7 +277,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> if (tnid != nid)
>> pr_info("NODE_DATA(%d) on node %d\n", nid, tnid);
>>
>> -   node_data[nid] = nd;
>> +   NODE_DATA(nid) = nd;
>> memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> NODE_DATA(nid)->node_id = nid;
>> NODE_DATA(nid)->node_start_pfn = start_pfn;
>> --
>> 2.5.0
>>
>>
> Ganapat
>>
> 
> .
> 



Re: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-06-07 Thread Leizhen (ThunderTown)


On 2016/6/7 22:01, Ganapatrao Kulkarni wrote:
> On Tue, Jun 7, 2016 at 6:27 PM, Leizhen (ThunderTown)
>  wrote:
>>
>>
>> On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
>>> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei  wrote:
>>>> Some numa nodes may have no memory. For example:
>>>> 1. cpu0 on node0
>>>> 2. cpu1 on node1
>>>> 3. device0 access the memory from node0 and node1 take the same time.
>>>
>>> i am wondering, if access to both nodes is same, then why you need numa.
>>> the example you are quoting is against the basic principle of "numa"
>>> what is device0 here? cpu?
>> The device0 can also be a cpu. I drew a simple diagram:
>>
>>     cpu0       cpu1       cpu2/device0
>>      |          |              |
>>      |          |              |
>>     DDR0       DDR1     No DIMM slots or no DIMM plugged
>>    (node0)    (node1)        (node2)
>>
> 
> thanks for the clarification. your example is for 3 node system, where
> third node is memory less node.
> do you see any issue in supporting this topology with existing code?
If HAVE_MEMORYLESS_NODES is enabled, the kernel picks the nearest node with
memory for the CPUs on a memoryless node.

For example, in include/linux/topology.h:
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
...
static inline int cpu_to_mem(int cpu)
{
return per_cpu(_numa_mem_, cpu);
}
...
#else
...
static inline int cpu_to_mem(int cpu)
{
return cpu_to_node(cpu);
}
...
#endif
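A rough model of the difference, using the three-node topology from the diagram with made-up distances. It assumes cpu_to_mem() should resolve to the nearest node that has memory. In the kernel that value is set up per CPU (the _numa_mem_ variable above) rather than computed by a loop like this, so treat the function as illustrative only.

```c
#include <assert.h>
#include <limits.h>

#define NR_NODES 3
#define NR_CPUS  3

/* node2 (cpu2/device0) has no memory and is equidistant from node0
 * and node1. All values are illustrative. */
static const int node_has_memory[NR_NODES] = { 1, 1, 0 };
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 15 },
	{ 20, 10, 15 },
	{ 15, 15, 10 },
};
static const int cpu_node[NR_CPUS] = { 0, 1, 2 };

/* Where the CPU sits. */
static int cpu_to_node(int cpu)
{
	return cpu_node[cpu];
}

/* Nearest node that actually has memory; ties go to the lower id. */
static int cpu_to_mem(int cpu)
{
	int nid = cpu_to_node(cpu), best = -1, best_dist = INT_MAX;

	for (int i = 0; i < NR_NODES; i++)
		if (node_has_memory[i] && distance[nid][i] < best_dist) {
			best = i;
			best_dist = distance[nid][i];
		}
	return best;
}
```

Without HAVE_MEMORYLESS_NODES, cpu_to_mem(2) would simply be cpu_to_node(2), i.e. node2, which has no memory to allocate from.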

> I think, this use case should be supported with present code.
> 
>>>>
>>>> So, we can not simply classify device0 to node0 or node1, but we can
>>>> define a node2 which distances to node0 and node1 are the same.
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>  arch/arm64/Kconfig  |  4 
>>>>  arch/arm64/kernel/smp.c |  1 +
>>>>  arch/arm64/mm/numa.c| 43 +--
>>>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 05c1bf1..5904a62 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>>> def_bool y
>>>> depends on NUMA
>>>>
>>>> +config HAVE_MEMORYLESS_NODES
>>>> +   def_bool y
>>>> +   depends on NUMA
>>>> +
>>>>  source kernel/Kconfig.preempt
>>>>  source kernel/Kconfig.hz
>>>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index d099306..9e15297 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
>>>> }
>>>>
>>>> bootcpu_valid = true;
>>>> +   early_map_cpu_to_node(0, of_node_to_nid(dn));
>>>>
>>>> /*
>>>>  * cpu_logical_map has already been
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index df5c842..d73b0a0 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>>>  		nid = 0;
>>>>
>>>>  	cpu_to_node_map[cpu] = nid;
>>>> +
>>>> +	/*
>>>> +	 * We should set the numa node of cpu0 as soon as possible, because it
>>>> +	 * has already been set up online before. cpu_to_node(0) will soon be
>>>> +	 * called.
>>>> +	 */
>>>> +	if (!cpu)
>>>> +		set_cpu_numa_node(cpu, nid);
>>>>  }
>>>>
>>>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>>>> @@ -215,6 +223,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>>>  	return ret;
>>>>  }
>>>>
>>>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>>>> +{
>>>> +	int i, best_nid, distance;
>>>> +	u64 pa;
>>>> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>>>> +
>>>> +	bitmap_zero(nodes_map, MAX_NUMNODES);
>>>

Re: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-06-08 Thread Leizhen (ThunderTown)


On 2016/6/8 12:45, Ganapatrao Kulkarni wrote:
> On Wed, Jun 8, 2016 at 7:46 AM, Leizhen (ThunderTown)
>  wrote:
>>
>>
>> On 2016/6/7 22:01, Ganapatrao Kulkarni wrote:
>>> On Tue, Jun 7, 2016 at 6:27 PM, Leizhen (ThunderTown)
>>>  wrote:
>>>>
>>>>
>>>> On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
>>>>> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei  
>>>>> wrote:
>>>>>> Some numa nodes may have no memory. For example:
>>>>>> 1. cpu0 on node0
>>>>>> 2. cpu1 on node1
>>>>>> 3. device0 access the memory from node0 and node1 take the same time.
>>>>>
>>>>> i am wondering, if access to both nodes is same, then why you need numa.
>>>>> the example you are quoting is against the basic principle of "numa"
>>>>> what is device0 here? cpu?
>>>> The device0 can also be a cpu. I drew a simple diagram:
>>>>
>>>>    cpu0        cpu1        cpu2/device0
>>>>     |           |               |
>>>>     |           |               |
>>>>    DDR0        DDR1      No DIMM slots or no DIMM plugged
>>>>   (node0)     (node1)           (node2)
>>>>
>>>
>>> thanks for the clarification. your example is for 3 node system, where
>>> third node is memory less node.
>>> do you see any issue in supporting this topology with existing code?
>> If opened HAVE_MEMORYLESS_NODES, it will pick the nearest node for the cpus on
>> memoryless node.
> 
> i see couple of arch enabled HAVE_MEMORYLESS_NODES, but i don't see
> any code in arch specific numa code for this
> is that means the core code will take care of this?
I just spent some time reading the HAVE_MEMORYLESS_NODES implementation on PPC and IA64.
For NODE_DATA initialization, IA64 does something similar to mine. PPC has no special handling;
it is similar to yours. I think the PPC developers need to fix it.

Here is the relevant code from IA64:
static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize)
{
	void *ptr = NULL;
	u8 best = 0xff;
	int bestnode = -1, node, anynode = 0;

	for_each_online_node(node) {
		if (node_isset(node, memory_less_mask))
			continue;
		else if (node_distance(nid, node) < best) {
			best = node_distance(nid, node);
			bestnode = node;
		}
		anynode = node;
	}

	if (bestnode == -1)
		bestnode = anynode;

	ptr = __alloc_bootmem_node(pgdat_list[bestnode], pernodesize,
		PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));

	return ptr;
}

/**
 * memory_less_nodes - allocate and initialize CPU only nodes pernode
 *	information.
 */
static void __init memory_less_nodes(void)
{
	unsigned long pernodesize;
	void *pernode;
	int node;

	for_each_node_mask(node, memory_less_mask) {
		pernodesize = compute_pernodesize(node);
		pernode = memory_less_node_alloc(node, pernodesize);
		fill_pernode(node, __pa(pernode), pernodesize);
	}

	return;
}
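The node-selection rule in memory_less_node_alloc() can be exercised outside the kernel. The following is a userspace restatement of just that rule, assuming a hypothetical distance table for the three-node diagram (10 local, 20 remote). memory_less[] plays the role of memory_less_mask, and pick_backing_node() is an illustrative name, not an IA64 function.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NODES 3

/* Hypothetical distance table: node2 is equally far from node0 and node1. */
static const int distance[MODEL_NODES][MODEL_NODES] = {
	{ 10, 20, 20 },
	{ 20, 10, 20 },
	{ 20, 20, 10 },
};

/* node2 has no DIMMs, so it is marked memoryless. */
static const bool memory_less[MODEL_NODES] = { false, false, true };

/*
 * Same rule as the IA64 loop: skip memoryless nodes, keep the strictly
 * closest node (so the lowest node id wins a tie), and fall back to the
 * last node with memory if nothing beat the 0xff starting value.
 */
static int pick_backing_node(int nid)
{
	int best = 0xff;
	int bestnode = -1, anynode = 0;

	assert(nid >= 0 && nid < MODEL_NODES);
	for (int node = 0; node < MODEL_NODES; node++) {
		if (memory_less[node])
			continue;
		if (distance[nid][node] < best) {
			best = distance[nid][node];
			bestnode = node;
		}
		anynode = node;
	}
	return bestnode == -1 ? anynode : bestnode;
}
```

Because the comparison is strict, node2's pgdat would land on node0 here; a different tie-break (for example spreading memoryless nodes across equidistant neighbours) would be a separate design decision.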



> 
>>
>> For example, in include/linux/topology.h
>> #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>> ...
>> static inline int cpu_to_mem(int cpu)
>> {
>> 	return per_cpu(_numa_mem_, cpu);
>> }
>> ...
>> #else
>> ...
>> static inline int cpu_to_mem(int cpu)
>> {
>> 	return cpu_to_node(cpu);
>> }
>> ...
>> #endif
>>
>>> I think, this use case should be supported with present code.
>>>
>>>>>>
>>>>>> So, we can not simply classify device0 to node0 or node1, but we can
>>>>>> define a node2 which distances to node0 and node1 are the same.
>>>>>>
>>>>>> Signed-off-by: Zhen Lei 
>>>>>> ---
>>>>>>  arch/arm64/Kconfig  |  4 
>>>>>>  arch/arm64/kernel/smp.c |  1 +
>>>>>>  arch/arm64/mm/numa.c| 43 +--
>>>>>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>> index 05c1bf1..5904a62 100644
>>>>>> --- a/arch/arm64/Kconfig
>>>>>> +++ b/arch/arm64/Kconfig
>>>>>> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>>>>> def_bool y
>>>>>> depends

Re: [PATCH v4 00/14] fix some type infos and bugs for arm64/of numa

2016-06-08 Thread Leizhen (ThunderTown)


On 2016/6/7 21:58, Will Deacon wrote:
> On Tue, Jun 07, 2016 at 04:08:04PM +0800, Zhen Lei wrote:
>> v3 -> v4:
>> 1. Packed three patches of Kefeng Wang, patch6-8.
>> 2. Add 6 new patches(9-15) to enhance the numa on arm64.
>>
>> v2 -> v3:
>> 1. Adjust patch2 and patch5 according to Matthias Brugger's advice, to make the
>>    patches looks more well. The final code have no change.
>>
>> v1 -> v2:
>> 1. Base on https://lkml.org/lkml/2016/5/24/679
> 
> If you want bug fixes to land in 4.7, you'll need to base them on a
> mainline kernel.

I heard that David Daney's ACPI NUMA patch series has been accepted into the next branch (for Linux 4.8).
If that were not the case, I would suggest that he send his patches 6-7 to mainline first, so that only
a very small conflict would remain.

I also tried the following:
1. On one branch, named branch A, git am David Daney's patches 6-7, then git am all of my patches.
2. On another branch, named branch B, git am only David Daney's patches 6-7.
3. When I git merge branch B into branch A, it still conflicts. So I guess git merge works on the
resulting source code rather than on the patches.

So at present, rebasing my patches will not help unless the maintainers are willing to resolve the conflict.

Fortunately, these patches are not particularly urgent, so I think I can wait until the Linux 4.8 cycle
starts and then send them again. I'm not sure whether they can be merged into Linux 4.8, but I really hope so.

> 
> Will
> 
> .
> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-07-11 Thread Leizhen (ThunderTown)


On 2016/7/9 0:13, Catalin Marinas wrote:
> On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>> On Fri, Jul 08, 2016 at 11:36:57AM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2016/7/7 23:37, Catalin Marinas wrote:
>>>>> On Thu, Jul 07, 2016 at 08:09:04PM +0800, Zhen Lei wrote:
>>>>>> At present, PG_dcache_clean is only cleared when the related huge page
>>>>>> is about to be freed. But sometimes, there maybe a process is in charge
>>>>>> to copy binary codes into a shared memory, and notifies other processes
>>>>>> to execute base on that. For the first time, there is no problem, because
>>>>>> the default value of page->flags is PG_dcache_clean cleared. So the cache
>>>>>> will be maintained at the time of set_pte_at for other processes. But if
>>>>>> the content of the shared memory have been updated again, there is no
>>>>>> cache operations, because the PG_dcache_clean is still set.
>>>>>>
>>>>>> For example:
>>>>>> Process A
>>>>>>  open a hugetlbfs file
>>>>>>  mmap it as a shared memory
>>>>>>  copy some binary codes into it
>>>>>>  munmap
>>>>>>
>>>>>> Process B
>>>>>>  open the hugetlbfs file
>>>>>>  mmap it as a shared memory, executable
>>>>>>  invoke the functions in the shared memory
>>>>>>  munmap
>>>>>>
>>>>>> repeat the above steps.
>>>>>
>>>>> Does this work as you would expect with small pages (and for example
>>>>> shared file mmap)? I don't want to have a different behaviour between
>>>>> small and huge pages.
>>>>
>>>> The small pages also have this problem, I will try to fix it too.
> [...]
>>> If both cases need solving, we might better move the fix in the
>>> __sync_icache_dcache() function. Untested:
>>
>> At first I also want to fix it as below. But I'm not sure which time the PageDirty
>> will be cleared, and if two or more processes mmap it as executable, cache operations
>> will be duplicated. At present, I really have not found any good place to clear
>> PG_dcache_clean. So the below modification may be the best choice, concisely and clearly.
>>
>>> 8<
>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>> index dbd12ea8ce68..c753fa804165 100644
>>> --- a/arch/arm64/mm/flush.c
>>> +++ b/arch/arm64/mm/flush.c
>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
>>>  	if (!page_mapping(page))
>>>  		return;
>>>  
>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>> +	if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>> +	    PageDirty(page))
>>>  		sync_icache_aliases(page_address(page),
>>>  				    PAGE_SIZE << compound_order(page));
>>>  	else if (icache_is_aivivt())
>>> 8<-
>>>
>>> BTW, can you make your tests (source) available somewhere?
>>
>> Both cases worked well with this patch.
> 
> Now I'm even more confused ;). IIUC, after an msync() in user space we
> should flush the pages to disk via write_cache_pages(). This function
> calls clear_page_dirty_for_io() after which PageDirty() is no longer
> true. I can't tell how a subsequent mmap() can see the written pages as
> dirty.
> 

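According to my tracing, in both cases msync() ends up in an empty function, noop_fsync(), so
write_cache_pages() and clear_page_dirty_for_io() are never reached: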

int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
...
	return file->f_op->fsync(file, start, end, datasync);
}

const struct file_operations hugetlbfs_file_operations = {
	.fsync		= noop_fsync,

static const struct file_operations shmem_file_operations = {
	.mmap		= shmem_mmap,
#ifdef CONFIG_TMPFS
	.fsync		= noop_fsync,
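A minimal model of why the second update in the patch description is missed, assuming (as the tracing above suggests) that nothing ever clears PageDirty() on these hugetlbfs/shmem pages because their fsync is noop_fsync. struct model_page and icache_syncs are hypothetical stand-ins for the two struct page flags and for calls to sync_icache_aliases(); the icache_is_aivivt() branch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags involved. */
struct model_page {
	bool dcache_clean;	/* PG_dcache_clean */
	bool dirty;		/* PageDirty() */
};

static int icache_syncs;	/* counts sync_icache_aliases() calls */

/* Original rule: test_and_set_bit() always sets the flag, and the
 * I-cache is synced only if the flag was previously clear. */
static void sync_original(struct model_page *p)
{
	bool was_clean;

	assert(p);
	was_clean = p->dcache_clean;
	p->dcache_clean = true;
	if (!was_clean)
		icache_syncs++;
}

/* Proposed rule: also sync while the page is still dirty. */
static void sync_proposed(struct model_page *p)
{
	bool was_clean;

	assert(p);
	was_clean = p->dcache_clean;
	p->dcache_clean = true;
	if (!was_clean || p->dirty)
		icache_syncs++;
}
```

Calling each function twice on a dirty page mimics "process B maps the file executable, process A rewrites it, process B maps it again": the original rule syncs once, the proposed rule syncs both times.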



Re: [PATCH v2 2/5] of/numa: fix a memory@ node can only contains one memory block

2016-06-05 Thread Leizhen (ThunderTown)


On 2016/6/3 17:45, Will Deacon wrote:
> On Thu, Jun 02, 2016 at 09:36:40AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/6/2 4:13, Rob Herring wrote:
>>> I believe you still need this and not the one above. You only need it
>>> within the loop if you return. Otherwise, the last node always need to
>>> be put.
>>
>> OK. Thanks.
>>
>> Addition with Matthias's suggestion, I will move "return" into this patch,
>> so that this of_node_put(np) can be safely removed.
> 
> Do you want to include Kefeng's [1] patches in your series too? We don't
> need two sets of related NUMA cleanups :)

Yes, it was originally suggested by Joe Perches.

> 
> Will
> 
> [1] 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/432715.html
> 
> .
> 



Re: [PATCH v3 3/5] arm64/numa: add nid check for memory block

2016-06-05 Thread Leizhen (ThunderTown)
On 2016/6/3 17:52, Will Deacon wrote:
> On Thu, Jun 02, 2016 at 10:28:09AM +0800, Zhen Lei wrote:
>> Use the same tactic to cpu and numa-distance nodes.
> 
> Sorry, I don't understand... :/

In of_numa_parse_cpu_nodes():
	for_each_child_of_node(cpus, np) {
		...
		r = of_property_read_u32(np, "numa-node-id", &nid);
		...
		if (nid >= MAX_NUMNODES)					//check nid
			pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);	//print warning info
		...


In numa_set_distance():
	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||	//check nid
	    from < 0 || to < 0) {
		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",	//print warning info
			     from, to, distance);
		return;
	}

Both functions check whether the nid (configured in the DTS, in the subnodes of cpus and in
distance-map) is valid. So the nid of memory@ nodes should be checked as well.


memory@c0 {
	device_type = "memory";
	reg = <0x0 0xc0 0x0 0x8000>;
	/* node 0 */
	numa-node-id = <0>;	//not checked yet: if I configure a wrong nid,
};				//no warning is printed

cpus {
	#address-cells = <2>;
	#size-cells = <0>;

	cpu@0 {
		device_type = "cpu";
		compatible = "arm,armv8";
		reg = <0x0 0x0>;
		enable-method = "psci";
		/* node 0 */
		numa-node-id = <0>;	//checked in of_numa_parse_cpu_nodes
	};

distance-map {
	compatible = "numa-distance-map-v1";
	distance-matrix = <0 0 10>,	//checked in of_numa_parse_distance_map_v1 --> numa_set_distance
			  <0 1 20>,
			  <1 1 10>;
};
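A userspace sketch of the missing check for memory@ nodes, in the spirit of the numa_add_memblk() hunk quoted below. MAX_NUMNODES is given an assumed value of 4 here (in the kernel it is derived from CONFIG_NODES_SHIFT), and check_memblk_nid() is a hypothetical helper. The extra nid < 0 test is a suggestion of this sketch, not part of the patch: nid is a signed int in numa_add_memblk(), so a negative value would slip past nid >= MAX_NUMNODES (and would be printed oddly by %u).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_NUMNODES 4	/* assumed value for illustration */

/* Reject an out-of-range numa-node-id before it reaches memblock,
 * and say so, like the cpu and distance-map parsers already do. */
static int check_memblk_nid(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES) {
		fprintf(stderr, "NUMA: Node id %d exceeds maximum value\n", nid);
		return -EINVAL;
	}
	return 0;
}
```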

> 
> Will
> 
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index c7fe3ec..2601660 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -141,6 +141,11 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>  {
>>  int ret;
>>
>> +if (nid >= MAX_NUMNODES) {
>> +pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
>> +return -EINVAL;
>> +}
>> +
>>  ret = memblock_set_node(start, (end - start), &memblock.memory, nid);
>>  if (ret < 0) {
>>  pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> --
>> 2.5.0
>>
>>
> 
> .
> 



Re: [PATCH v3 5/5] arm64/numa: avoid inconsistent information to be printed

2016-06-05 Thread Leizhen (ThunderTown)


On 2016/6/3 17:55, Will Deacon wrote:
> On Thu, Jun 02, 2016 at 10:28:11AM +0800, Zhen Lei wrote:
>> numa_init(of_numa_init) may returned error because of numa configuration
>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>> configuration error information should be immediately printed by the
>> testing branch.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/mm/numa.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Looks fine to me, but this doesn't apply against -rc1.

Oh, these patches are based on the https://lkml.org/lkml/2016/5/24/679 series.

> 
> Will
> 
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 2601660..1b9622c 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -338,8 +338,10 @@ static int __init numa_init(int (*init_func)(void))
>>  if (ret < 0)
>>  return ret;
>>
>> -if (nodes_empty(numa_nodes_parsed))
>> +if (nodes_empty(numa_nodes_parsed)) {
>> +pr_info("No NUMA configuration found\n");
>>  return -EINVAL;
>> +}
>>
>>  ret = numa_register_nodes();
>>  if (ret < 0)
>> @@ -370,8 +372,6 @@ static int __init dummy_numa_init(void)
>>
>>  if (numa_off)
>>  pr_info("NUMA disabled\n"); /* Forced off on command line. */
>> -else
>> -pr_info("No NUMA configuration found\n");
>>  pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
>> 0LLU, PFN_PHYS(max_pfn) - 1);
>>
>> --
>> 2.5.0
>>
>>
> 
> .
> 



Re: Suspicious error for CMA stress test

2016-03-07 Thread Leizhen (ThunderTown)


On 2016/3/7 12:34, Joonsoo Kim wrote:
> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott :
>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>
>>>>>>>>>> Before the test, I got:
>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>> CmaFree:  195044 kB
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After running the test:
>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>> CmaFree: 6602584 kB
>>>>>>>>>>
>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>
>>>>>>>>>> Also the the MemFree is more than mem total:
>>>>>>>>>>
>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>> MemTotal:   16342016 kB
>>>>>>>>>> MemFree:    22367268 kB
>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>> [...]
>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>> Joonsoo?
>>>>>>>> I don't know. I also played with this and looks like there is
>>>>>>>> accounting problem, however, for my case, number of free page is slightly less
>>>>>>>> than total. I will take a look.
>>>>>>>>
>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
>>>>>>>> look like your case.
>>>>>>> I tested with malloc_size with 2M, and it grows much bigger than 1M, also I
>>>>>>> did some other test:
>>>>>> Thanks! Now, I can re-generate erronous situation you mentioned.
>>>>>>
>>>>>>>  - run with single thread with 10 times, everything is fine.
>>>>>>>
>>>>>>>  - I hack the cam_alloc() and free as below [1] to see if it's lock issue, with
>>>>>>>    the same test with 100 multi-thread, then I got:
>>>>>> [1] would not be sufficient to close this race.
>>>>>>
>>>>>> Try following things [A]. And, for more accurate test, I changed code a bit more
>>>>>> to prevent kernel page allocation from cma area [B]. This will prevent kernel
>>>>>> page allocation from cma area completely so we can focus cma_alloc/release race.
>>>>>>
>>>>>> Although, this is not correct fix, it could help that we can guess
>>>>>> where the problem is.
>>>>> More correct fix is something like below.
>>>>> Please test it.
>>>> Hmm, this is not working:
>>> Sad to hear that.
>>>
>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>
>>
>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
> 
> Hmm... that's same with me.
> 
> Below is similar fix that prevents buddy merging when one of buddy's
> migrate type, but, not both, is MIGRATE_ISOLATE. In fact, I have
> no idea why previous fix (more correct fix) doesn't work for you.
> (It works for me.) But, maybe there is a bug on the fix
> so I make new one which is more general form. Please test it.

Hi,
Hanjun Guo is away in Thailand on business, so I ran this patch for him. The result
shows that the "CmaFree:" count is OK now, but the following messages are sometimes printed:

alloc_contig_range: [28500, 28600) PFNs busy
alloc_contig_range: [28300, 28380) PFNs busy
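The rule described in the quoted fix (do not merge two buddies when exactly one of them is MIGRATE_ISOLATE) boils down to a one-line predicate. This is a userspace sketch with a hypothetical three-value migrate-type enum, not the page_is_buddy() change from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of migrate types, for illustration only. */
enum model_migratetype {
	MODEL_MOVABLE,
	MODEL_CMA,
	MODEL_ISOLATE,
};

static bool is_isolate(enum model_migratetype mt)
{
	return mt == MODEL_ISOLATE;
}

/*
 * Merge only when both buddies are isolated or both are not, so a
 * merged free block never spans an isolated and a non-isolated
 * pageblock, which is presumably where the free-page counters drift.
 */
static bool buddies_may_merge(enum model_migratetype page_mt,
			      enum model_migratetype buddy_mt)
{
	return is_isolate(page_mt) == is_isolate(buddy_mt);
}
```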

> 
> Thanks.
> 
> -->8-
>>From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim 
> Date: Fri, 4 Mar 2016 13:28:17 +0900
> Subject: [PATCH] mm/cma: fix race
> 
> Signed-off-by: Joonsoo Kim 
> ---
>  mm/page_alloc.c | 33 +++--
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c6c38ed..d80d071 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -620,8 +620,8 @@ static inline void rmv_page_order(struct page *page)
>   *
>   * For recording page's order, we use page_private(page).
>   */
> -static inline int page_is_buddy(struct page *page, struct page *buddy,
> -   unsigned int order)
> +static inline int page_is_buddy(struct zone *zone, struct page *page,
> +   struct page *buddy, unsigned int order)
>  {
> if (!pfn_valid_within(page_to_pfn(buddy)))
> return 0;
> @@ -644,6 +644,2

Re: [PATCH 1/1] dma-mapping: to avoid exception when cpu_addr is NULL

2016-03-07 Thread Leizhen (ThunderTown)
Suppose:
CONFIG_SPARSEMEM is enabled, and
CONFIG_DMA_API_DEBUG or CONFIG_CMA is enabled.

Then virt_to_page() or phys_to_page() will be called. Finally, in __pfn_to_page(),
__sec = __pfn_to_section(__pfn) is NULL, so accessing section->section_mem_map
triggers an exception.

-

#define __pfn_to_page(pfn)					\
({	unsigned long __pfn = (pfn);				\
	struct mem_section *__sec = __pfn_to_section(__pfn);	\
	__section_mem_map_addr(__sec) + __pfn;			\
})

static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
	unsigned long map = section->section_mem_map;
	map &= SECTION_MAP_MASK;
	return (struct page *)map;
}


On 2016/3/7 17:21, Zhen Lei wrote:
> Do this to keep consistent with kfree, which tolerate ptr is NULL.
> 
> Signed-off-by: Zhen Lei 
> ---
>  include/linux/dma-mapping.h | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 75857cd..fdd4294 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -402,7 +402,10 @@ static inline void *dma_alloc_coherent(struct device *dev, size_t size,
>  static inline void dma_free_coherent(struct device *dev, size_t size,
>   void *cpu_addr, dma_addr_t dma_handle)
>  {
> - return dma_free_attrs(dev, size, cpu_addr, dma_handle, NULL);
> + if (unlikely(!cpu_addr))
> + return;
> +
> + dma_free_attrs(dev, size, cpu_addr, dma_handle, NULL);
>  }
> 
>  static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
> --
> 2.5.0
> 
> 
> 
> .
> 



Re: [PATCH 1/1] dma-mapping: to avoid exception when cpu_addr is NULL

2016-03-07 Thread Leizhen (ThunderTown)


On 2016/3/7 19:41, One Thousand Gnomes wrote:
> On Mon, 7 Mar 2016 17:21:25 +0800
> Zhen Lei  wrote:
> 
>> Do this to keep consistent with kfree, which tolerate ptr is NULL.
>>
>> Signed-off-by: Zhen Lei 
> 
> This is inlined code so you are adding extra logic to every single
> instance of a call to the function. What is it's total effect on kernel
> size ?

This is a simple if statement; I think it only generates two instructions.
Maybe I should move it into dma_free_attrs() instead, as below:
	if (!ops->free || !cpu_addr)
		return;

That way it only generates one instruction, and dma_free_noncoherent() is covered as well.

Or should I change it to BUG_ON(!cpu_addr)?

Otherwise, I could move it into each ops->free implementation, but that would touch many more architectures.
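A userspace model of the dma_free_attrs() option: the NULL test sits next to the existing !ops->free test, so every wrapper that reaches it (dma_free_coherent(), dma_free_noncoherent()) treats a NULL cpu_addr as a no-op, like kfree(NULL), and never gets as far as virt_to_page(). struct model_dma_ops and the function names are hypothetical stand-ins for struct dma_map_ops and the real helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down version of struct dma_map_ops. */
struct model_dma_ops {
	void (*free)(void *cpu_addr, size_t size);
};

static int free_calls;

/* Stand-in for an arch free path that must never see NULL. */
static void model_arch_free(void *cpu_addr, size_t size)
{
	(void)size;
	assert(cpu_addr);
	free_calls++;
}

static const struct model_dma_ops model_ops = { .free = model_arch_free };

/* One check covers both the missing-op case and the NULL-buffer case. */
static void model_dma_free_attrs(const struct model_dma_ops *ops,
				 void *cpu_addr, size_t size)
{
	if (!ops->free || !cpu_addr)
		return;
	ops->free(cpu_addr, size);
}
```

Compared with BUG_ON(!cpu_addr), this keeps the kfree()-style contract that freeing NULL is harmless, at the cost of silently accepting a caller bug.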

> 
> Alan
> 
> .
> 



Re: [PATCH 1/1] dma-mapping: to avoid exception when cpu_addr is NULL

2016-03-07 Thread Leizhen (ThunderTown)


On 2016/3/8 6:59, Andrew Morton wrote:
> On Mon, 7 Mar 2016 18:43:47 +0800 "Leizhen (ThunderTown)" 
>  wrote:
> 
>> Suppose:
>> CONFIG_SPARSEMEM is opened.
>> CONFIG_DMA_API_DEBUG or CONFIG_CMA is opened.
>>
>> Then virt_to_page or phys_to_page will be called. Finally, in __pfn_to_page, __sec = __pfn_to_section(__pfn) is NULL.
>> So access section->section_mem_map will trigger exception.
>>
>> -
>>
>> #define __pfn_to_page(pfn)   \
>> ({   unsigned long __pfn = (pfn);\
>>  struct mem_section *__sec = __pfn_to_section(__pfn);\
>>  __section_mem_map_addr(__sec) + __pfn;  \
>> })
>>
>> static inline struct page *__section_mem_map_addr(struct mem_section *section)
>> {
>>  unsigned long map = section->section_mem_map;
>>  map &= SECTION_MAP_MASK;
>>  return (struct page *)map;
>> }
> 
> I'm having a bit of trouble understanding this.
> 
> Perhaps you could explain the bug more carefully (inclusion of an oops
> output would help) then we'll be in a better position to understand the
> proposed fix(es).
> 

Unable to handle kernel paging request at virtual address ffc020d3b2b8
pgd = ffc083a61000
[ffc020d3b2b8] *pgd=, *pud=
CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G   O
Hardware name:
task: ffc00d7d26c0 ti: ffc0837fc000 task.ti: ffc0837fc000
PC is at __dma_free_coherent.isra.10+0x74/0xc8
LR is at __dma_free+0x9c/0xb0
pc : [] lr : [] pstate: 8145
sp : ffc0837ff700
x29: ffc0837ff700 x28: 
x27:  x26: 
x25: ffc000d1b1d0 x24: 
x23: 00a0 x22: ffbfff5f
x21: 0010 x20: ffc2e21f7010
x19:  x18: 
x17: 007f9360a2b0 x16: ffc000541040
x15:  x14: 
x13:  x12: 0001
x11: 0068 x10: 0040
x9 : ffc000214e00 x8 : ffc2e54586b0
x7 :  x6 : 0004
x5 : ffc000214d64 x4 : 
x3 : 03ff x2 : 0003
x1 : 000f x0 : ffc000d3b2c0

Process malloc_dma_1 (pid: 1489, stack limit = 0xffc0837fc020)
Stack: (0xffc0837ff700 to 0xffc08380)
f700: ffc0837ff730 ffc000214e00 0010 
f720: ffc2e21f7010 ffc0837ff7d0 ffc0837ff770 ffbffc1d6134
f740: ffc2e21f7010 01a0 0064 ffc0837ff7d0
f760: ffc000c9fa20 ffc0837ffaf0 ffc0837ffe10 ffc000239b0c
f780: ffc00d54a280 ffc000d1ef58 ffc000957163 ffc2e21f7000
f7a0: ffbffc1d6030   
f7c0:   ffc01300 ffc013c0
f7e0: ffc013f0 ffc01420 ffc01460 ffc014a0
f800: ffc014e0 ffc01520 ffc01560 ffc015a0
f820: ffc015e0 ffc01620 ffc01660 ffc016a0
f840: ffc016e0 ffc01720 ffc01760 ffc017a0
f860: ffc017e0 ffc01820 ffc01860 ffc018a0
f880: ffc018e0 ffc01920 ffc01960 ffc019a0
f8a0: ffc019e0 ffc01a20 ffc01a60 ffc01aa0
f8c0: ffc01ae0 ffc01b20 ffc01b60 ffc01ba0
f8e0: ffc01be0 ffc01c20 ffc01c60 ffc01ca0
f900: ffc01ce0 ffc01d20 ffc01d60 ffc01da0
f920: ffc01de0 ffc01e20 ffc01e40 ffc01e60
f940: ffc01e90 ffc01ea0 ffc01eb0 ffc01ec0
f960: ffc01ee0 ffc01f20  
f980:    
f9a0:    
f9c0:    
f9e0:    
fa00:    
fa20:    
fa40:    
fa60:    
fa80:    
faa0:    
fac0:    
fae0:   13a0 1460
fb00: 1490 14c0 1500 1540
fb20: 1580 0

Re: Suspicious error for CMA stress test

2016-03-07 Thread Leizhen (ThunderTown)


On 2016/3/8 2:42, Laura Abbott wrote:
> On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/3/7 12:34, Joonsoo Kim wrote:
>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott :
>>>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>>>
>>>>>>>>>>>> Before the test, I got:
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>>>> CmaFree:  195044 kB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After running the test:
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>> CmaTotal: 204800 kB
>>>>>>>>>>>> CmaFree: 6602584 kB
>>>>>>>>>>>>
>>>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>>>
>>>>>>>>>>>> Also the the MemFree is more than mem total:
>>>>>>>>>>>>
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>>>> MemTotal:   16342016 kB
>>>>>>>>>>>> MemFree:22367268 kB
>>>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>>>> [...]
>>>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>>>> Joonsoo?
>>>>>>>>>> I don't know. I also played with this and looks like there is
>>>>>>>>>> accounting problem, however, for my case, number of free page is 
>>>>>>>>>> slightly less
>>>>>>>>>> than total. I will take a look.
>>>>>>>>>>
>>>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it 
>>>>>>>>>> doesn't
>>>>>>>>>> look like your case.
>>>>>>>>> I tested with malloc_size with 2M, and it grows much bigger than 1M, 
>>>>>>>>> also I
>>>>>>>>> did some other test:
>>>>>>>> Thanks! Now, I can re-generate erronous situation you mentioned.
>>>>>>>>
>>>>>>>>>   - run with single thread with 10 times, everything is fine.
>>>>>>>>>
>>>>>>>>>   - I hack the cam_alloc() and free as below [1] to see if it's lock 
>>>>>>>>> issue, with
>>>>>>>>> the same test with 100 multi-thread, then I got:
>>>>>>>> [1] would not be sufficient to close this race.
>>>>>>>>
>>>>>>>> Try following things [A]. And, for more accurate test, I changed code 
>>>>>>>> a bit more
>>
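The symptom at the top of this thread (CmaFree above CmaTotal, MemFree above MemTotal) can be checked mechanically after each stress run. A minimal userspace sketch, assuming the "Key: value kB" line format shown in the /proc/meminfo output quoted above; meminfo_kb() and meminfo_sane() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return the number after "Key:" on a line starting with key, or -1. */
static long meminfo_kb(const char *text, const char *key)
{
	size_t len = strlen(key);
	const char *p = text;
	long val;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept only a match at the start of a line, followed by ':'. */
		if ((p == text || p[-1] == '\n') && p[len] == ':') {
			if (sscanf(p + len + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		p += len;
	}
	return -1;
}

/* False if a free counter exceeds its total; missing keys are skipped. */
static bool meminfo_sane(const char *text)
{
	long cma_total = meminfo_kb(text, "CmaTotal");
	long cma_free = meminfo_kb(text, "CmaFree");
	long mem_total = meminfo_kb(text, "MemTotal");
	long mem_free = meminfo_kb(text, "MemFree");

	if (cma_total >= 0 && cma_free > cma_total)
		return false;
	if (mem_total >= 0 && mem_free > mem_total)
		return false;
	return true;
}
```

In a test harness one would read /proc/meminfo into a buffer and call meminfo_sane() on it after each round of the stress test.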

Re: [PATCH 2/2] arm64: to allow EFI_RTC can be selected on ARM64

2015-09-28 Thread Leizhen (ThunderTown)


On 2015/9/28 15:35, Arnd Bergmann wrote:
> On Monday 28 September 2015 13:34:38 Zhen Lei wrote:
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 07d1811..25cec57 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -85,7 +85,7 @@ config ARM64
>> select PERF_USE_VMALLOC
>> select POWER_RESET
>> select POWER_SUPPLY
>> -   select RTC_LIB
>> +   select RTC_LIB if !EFI
>> select SPARSE_IRQ
>> select SYSCTL_EXCEPTION_TRACE
>> select HAVE_CONTEXT_TRACKING
> 
> Sorry, we can't do that: enabling EFI has to be done in a way that it only
> adds features but not disables them.

I ran "make ARCH=arm64 menuconfig" and found that RTC_CLASS is selected by default. Actually,
RTC_LIB only controls whether some configs are displayed when running "make menuconfig". All the
relevant information is listed below:

-make ARCH=arm64 menuconfig-
  [*] Real Time Clock  --->

-drivers/rtc/Kconfig---
menuconfig RTC_CLASS
	bool "Real Time Clock"
	default n
	depends on !S390 && !UML
	select RTC_LIB

---
find . -name "*Kconfig*" | xargs grep RTC_LIB
./drivers/rtc/Kconfig:config RTC_LIB
./drivers/rtc/Kconfig:  select RTC_LIB
./drivers/char/Kconfig:if RTC_LIB=n
./drivers/char/Kconfig:endif # RTC_LIB
./arch/x86/Kconfig: select RTC_LIB
./arch/arm/Kconfig: select RTC_LIB
./arch/arm64/Kconfig:   select RTC_LIB if !EFI
./arch/sh/Kconfig:  select RTC_LIB
./arch/mips/Kconfig:select RTC_LIB if !MACH_LOONGSON64

--drivers/char/Kconfig--
if RTC_LIB=n

config RTC
	tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"

...

endif # RTC_LIB


> 
> Your patch breaks RTC on all non-EFI platforms as soon as CONFIG_EFI
> is selected by the user.

No, non-EFI platforms can still use RTC as before. As I mentioned above, RTC_LIB only
controls whether some configs are displayed when running "make menuconfig".
On ARM64, this patch only allows EFI_RTC to be shown when RTC_LIB is not selected.

--drivers/char/Kconfig--
if RTC_LIB=n

config RTC
	tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"

...

config EFI_RTC
	bool "EFI Real Time Clock Services"
	depends on IA64 || ARM64

...

endif # RTC_LIB
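A boolean model of the Kconfig fragments above (not how Kconfig is actually evaluated), covering only arm64 and ignoring the IA64 || ARM64 dependency of EFI_RTC. It shows the one case where the patch changes anything: with EFI=y and RTC_CLASS=n, RTC_LIB is no longer selected and EFI_RTC becomes visible. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The two user-visible choices that matter here on arm64. */
struct model_config {
	bool efi;
	bool rtc_class;
};

/* With the patch: ARM64 does "select RTC_LIB if !EFI", and
 * RTC_CLASS does "select RTC_LIB" in drivers/rtc/Kconfig. */
static bool rtc_lib(const struct model_config *c)
{
	return !c->efi || c->rtc_class;
}

/* EFI_RTC lives inside "if RTC_LIB=n" in drivers/char/Kconfig. */
static bool efi_rtc_visible(const struct model_config *c)
{
	return !rtc_lib(c);
}
```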

> 
>   Arnd
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] arm64: to allow EFI_RTC can be selected on ARM64

2015-09-28 Thread Leizhen (ThunderTown)


On 2015/9/28 15:40, Ard Biesheuvel wrote:
> On 28 September 2015 at 06:34, Zhen Lei  wrote:
>> Now, ARM64 is also support EFI startup. We hope use EFI runtime services
>> to get/set current time and date.
>>
>> RTC_LIB only controls some configs in drivers/char/Kconfig(included
>> EFI_RTC), and will be automatically selected when RTC_CLASS opened. So
>> this patch have no functional change but give an opportunity to select
>> EFI_RTC when RTC_CLASS closed.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/arm64/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 07d1811..25cec57 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -85,7 +85,7 @@ config ARM64
>> select PERF_USE_VMALLOC
>> select POWER_RESET
>> select POWER_SUPPLY
>> -   select RTC_LIB
>> +   select RTC_LIB if !EFI
>> select SPARSE_IRQ
>> select SYSCTL_EXCEPTION_TRACE
>> select HAVE_CONTEXT_TRACKING
> 
> You can currently enable EFI_RTC just fine on arm64 when EFI is enabled.
> Why exactly do you need this patch on top?

Because when we run "make ARCH=arm64 menuconfig", RTC_LIB is always selected and we have no way
to deselect it, but EFI_RTC can only be displayed when RTC_LIB=n.

drivers/rtc/Kconfig---
config RTC_LIB
	bool

menuconfig RTC_CLASS
	bool "Real Time Clock"
	default n
	depends on !S390 && !UML
	select RTC_LIB

--drivers/char/Kconfig--
if RTC_LIB=n

...

config EFI_RTC
	bool "EFI Real Time Clock Services"
	depends on IA64 || ARM64

...

endif # RTC_LIB

> 
> .
> 



Re: [PATCH 2/2] arm64: to allow EFI_RTC can be selected on ARM64

2015-09-28 Thread Leizhen (ThunderTown)


On 2015/9/28 16:42, Arnd Bergmann wrote:
> On Monday 28 September 2015 16:29:57 Leizhen wrote:
>>
>> On 2015/9/28 15:35, Arnd Bergmann wrote:
>>> On Monday 28 September 2015 13:34:38 Zhen Lei wrote:
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 07d1811..25cec57 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -85,7 +85,7 @@ config ARM64
>>>>  	select PERF_USE_VMALLOC
>>>>  	select POWER_RESET
>>>>  	select POWER_SUPPLY
>>>> -	select RTC_LIB
>>>> +	select RTC_LIB if !EFI
>>>>  	select SPARSE_IRQ
>>>>  	select SYSCTL_EXCEPTION_TRACE
>>>>  	select HAVE_CONTEXT_TRACKING
>>>
>>> Sorry, we can't do that: enabling EFI has to be done in a way that it only
>>> adds features but not disables them.
>>
>> I ran "make ARCH=arm64 menuconfig" and found that RTC_CLASS is selected by
>> default. Actually, RTC_LIB only controls whether some configs are displayed
>> when running "make menuconfig". I list all the information below:
>>
>> --- make ARCH=arm64 menuconfig ---
>>   [*] Real Time Clock  --->
>>
>> --- drivers/rtc/Kconfig ---
>> menuconfig RTC_CLASS
>> 	bool "Real Time Clock"
>> 	default n
>> 	depends on !S390 && !UML
>> 	select RTC_LIB
> 
> Ok, I see. So your patch here has no effect at all and can be dropped, or
> we can remove the 'select RTC_LIB' without the EFI dependency.

Oh, I described the reason in the reply to Ard Biesheuvel.

https://lkml.org/lkml/2015/9/28/124

> 
>> ---
>> find . -name "*Kconfig*" | xargs grep RTC_LIB
>> ./drivers/rtc/Kconfig:config RTC_LIB
>> ./drivers/rtc/Kconfig:   select RTC_LIB
>> ./drivers/char/Kconfig:if RTC_LIB=n
>> ./drivers/char/Kconfig:endif # RTC_LIB
>> ./arch/x86/Kconfig:  select RTC_LIB
>> ./arch/arm/Kconfig:  select RTC_LIB
>> ./arch/arm64/Kconfig:select RTC_LIB if !EFI
>> ./arch/sh/Kconfig:   select RTC_LIB
>> ./arch/mips/Kconfig: select RTC_LIB if !MACH_LOONGSON64
>>
>> --- drivers/char/Kconfig ---
>> if RTC_LIB=n
>>
>> config RTC
>> 	tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"
>>
>> ...
>>
>> endif # RTC_LIB
>>
>>
>>>
>>> Your patch breaks RTC on all non-EFI platforms as soon as CONFIG_EFI
>>> is selected by the user.
>>
>> No, non-EFI platforms can still use RTC as before. As I mentioned above,
>> RTC_LIB only controls whether some configs are displayed when running
>> "make menuconfig". On ARM64, this patch only allows EFI_RTC to be shown
>> when RTC_LIB is not selected.
>>
> 
> but that is the wrong driver that uses the legacy API, we cannot have that
> on ARM because it conflicts with the normal RTC_CLASS drivers.

Yes, RTC_CLASS automatically selects RTC_LIB, so EFI_RTC is not displayed,
because RTC_LIB=y in that case.

We can select EFI_RTC only when RTC_CLASS is not selected (and thus RTC_LIB=n).

> 
>> --- drivers/char/Kconfig ---
>> if RTC_LIB=n
>>
>> config RTC
>> 	tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"
>>
>> ...
>>
>> config EFI_RTC
>> 	bool "EFI Real Time Clock Services"
>> 	depends on IA64 || ARM64
>>
>> ...
>>
>> endif # RTC_LIB
> 
> The driver you want is RTC_DRV_EFI, not EFI_RTC.

OK, I will try it tomorrow.

> 
>   Arnd
> 
> .
> 



Re: [PATCH 0/3] clean up some functions in mm/swap_slots.c

2020-07-08 Thread Leizhen (ThunderTown)
Hi, all:
  Are these patches acceptable?
  All three patches received "Acked-by: Tim Chen " two months ago.

On 2020/4/30 14:11, Zhen Lei wrote:
> When I studied the code of mm/swap_slots.c, I found some places that can be
> improved.
> 
> Zhen Lei (3):
>   mm/swap: simplify alloc_swap_slot_cache()
>   mm/swap: simplify enable_swap_slots_cache()
>   mm/swap: remove redundant check for swap_slot_cache_initialized
> 
>  mm/swap_slots.c | 45 +
>  1 file changed, 21 insertions(+), 24 deletions(-)
> 



Re: [PATCH v3 0/7] bugfix and optimize for drivers/nvdimm

2020-08-26 Thread Leizhen (ThunderTown)
Hi all:
  Any comments? I want to merge patches 1 and 2 into one, and then send
the other patches separately.

On 2020/8/20 10:16, Zhen Lei wrote:
> v2 --> v3:
> 1. Fix spelling error of patch 1 subject: memmory --> memory
> 2. Add "Reviewed-by: Oliver O'Halloran " into patch 1
> 3. Rewrite patch descriptions of Patch 1, 3, 4
> 4. Add 3 new trivial patches 5-7, which I just found yesterday.
> 5. Unify all "subsystem" names to "libnvdimm:"
> 
> v1 --> v2:
> 1. Add Fixes for Patch 1-2
> 2. Slightly change the subject and description of Patch 1
> 3. Add a new trivial Patch 4, which I just found yesterday.
> 
> v1:
> I found a memory leak while studying the drivers/nvdimm code today. I also
> added a sanity check for priv->bus_desc.provider_name, because strdup()
> may fail. Patch 3 is a trivial source code optimization.
> 
> 
> Zhen Lei (7):
>   libnvdimm: fix memory leaks in of_pmem.c
>   libnvdimm: add sanity check for provider_name in
> of_pmem_region_probe()
>   libnvdimm: simplify walk_to_nvdimm_bus()
>   libnvdimm: reduce an unnecessary if branch in nd_region_create()
>   libnvdimm: reduce an unnecessary if branch in nd_region_activate()
>   libnvdimm: make sure EXPORT_SYMBOL_GPL(nvdimm_flush) close to its
> function
>   libnvdimm: slightly simplify available_slots_show()
> 
>  drivers/nvdimm/bus.c |  7 +++
>  drivers/nvdimm/dimm_devs.c   |  5 ++---
>  drivers/nvdimm/of_pmem.c |  7 +++
>  drivers/nvdimm/region_devs.c | 13 -
>  4 files changed, 16 insertions(+), 16 deletions(-)
> 



Re: [PATCH 3/4] libnvdimm: eliminate two unnecessary zero initializations in badrange.c

2020-08-26 Thread Leizhen (ThunderTown)
I will drop this patch, because badrange_add() is unlikely to be called.
There's no need to care about trivial performance improvements.

On 2020/8/20 22:30, Zhen Lei wrote:
> Currently, the "struct badrange_entry" has three members: start, length,
> list. In append_badrange_entry(), "start" and "length" will be assigned
> later, and "list" does not need to be initialized before calling
> list_add_tail(). That means, the kzalloc() in badrange_add() or
> alloc_and_append_badrange_entry() can be replaced with kmalloc(), because
> the zero initialization is not required.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/nvdimm/badrange.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c
> index 7f78b659057902d..13145001c52ff39 100644
> --- a/drivers/nvdimm/badrange.c
> +++ b/drivers/nvdimm/badrange.c
> @@ -37,7 +37,7 @@ static int alloc_and_append_badrange_entry(struct badrange *badrange,
>  {
>   struct badrange_entry *bre;
>  
> - bre = kzalloc(sizeof(*bre), flags);
> + bre = kmalloc(sizeof(*bre), flags);
>   if (!bre)
>   return -ENOMEM;
>  
> @@ -49,7 +49,7 @@ int badrange_add(struct badrange *badrange, u64 addr, u64 length)
>  {
>   struct badrange_entry *bre, *bre_new;
>  
> - bre_new = kzalloc(sizeof(*bre_new), GFP_KERNEL);
> + bre_new = kmalloc(sizeof(*bre_new), GFP_KERNEL);
>  
>   spin_lock(&badrange->lock);
>  
> 
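The reasoning in this commit message can be illustrated with a userspace sketch. It assumes a minimal stand-in for the kernel's struct list_head and list_add_tail() (reimplemented here, not the kernel headers), and a hypothetical append_entry() in place of alloc_and_append_badrange_entry(). Because every member is written before anything reads it, malloc() (like kmalloc()) is enough, and the zeroing that calloc() or kzalloc() would do is never observed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Same shape as the kernel's list_add_tail(): both pointers of @entry are
 * written and their old values are never read, so @entry may hold garbage.
 */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	head->prev = entry;
}

/* The three members named in the commit message. */
struct badrange_entry {
	uint64_t start;
	uint64_t length;
	struct list_head list;
};

/* Recover the entry from its embedded list node (like container_of()). */
#define entry_of(ptr) \
	((struct badrange_entry *)((char *)(ptr) - offsetof(struct badrange_entry, list)))

/*
 * Hypothetical analogue of alloc_and_append_badrange_entry(): malloc()
 * rather than calloc(), because start, length and list are all assigned
 * before anything reads them.
 */
static int append_entry(struct list_head *head, uint64_t addr, uint64_t length)
{
	struct badrange_entry *bre = malloc(sizeof(*bre));

	if (!bre)
		return -ENOMEM;
	bre->start = addr;
	bre->length = length;
	list_add_tail(&bre->list, head);
	return 0;
}

/* Free every entry and leave @head empty. */
static void free_entries(struct list_head *head)
{
	struct list_head *pos = head->next;

	while (pos != head) {
		struct list_head *next = pos->next;

		free(entry_of(pos));
		pos = next;
	}
	list_init(head);
}
```

The same argument would not hold if any member were read (or only conditionally assigned) before list_add_tail(); that is the case where the zeroing kzalloc() provides still matters.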



Re: [PATCH v3 5/7] libnvdimm: reduce an unnecessary if branch in nd_region_activate()

2020-08-27 Thread Leizhen (ThunderTown)
I will drop this patch, because I have a concern:
Suppose nd_region->ndr_mappings is 4, and the num_flush values of the DIMMs
behind nd_region->mapping[] are "0, 0, 4, 0". Then flush_data_size is
(1 + 1 + 5 + 1) * sizeof(void *).
But in ndrd_get_flush_wpq() and ndrd_set_flush_wpq(), the index expression is
"ndrd->flush_wpq[dimm * num + (hint & mask)]", so I don't think the memory
allocated for "ndrd" is enough.
Please refer to the call chain: nd_region_activate() --> nvdimm_map_flush() -->
ndrd_set_flush_wpq()

	for (i = 0; i < nd_region->ndr_mappings; i++) {
		struct nd_mapping *nd_mapping = &nd_region->mapping[i];
		struct nvdimm *nvdimm = nd_mapping->nvdimm;

		/* at least one null hint slot per-dimm for the "no-hint" case */
		flush_data_size += sizeof(void *);
		num_flush = min_not_zero(num_flush, nvdimm->num_flush);
		if (!nvdimm->num_flush)
			continue;
		flush_data_size += nvdimm->num_flush * sizeof(void *);
	}

	ndrd = devm_kzalloc(dev, sizeof(*ndrd) + flush_data_size, GFP_KERNEL);
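The arithmetic behind this concern can be checked with a userspace sketch. flush_slots() mirrors the sizing loop quoted above, counting pointer slots instead of bytes, with a local stand-in for min_not_zero(). highest_index() evaluates the quoted index expression "dimm * num + (hint & mask)". How num and mask are derived is an assumption here (num = 1 << ilog2(num_flush), mask = num - 1), as is the assumption that hints 0..num_flush-1 are written for each DIMM that has flush hints. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's min_not_zero(). */
static unsigned int min_not_zero(unsigned int x, unsigned int y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}

/* floor(log2(v)) for v > 0, standing in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Pointer slots reserved by the sizing loop (flush_data_size / sizeof(void *)). */
static size_t flush_slots(const unsigned int *dimm_num_flush, size_t n,
			  unsigned int *num_flush_out)
{
	size_t slots = 0;
	unsigned int num_flush = 0;

	for (size_t i = 0; i < n; i++) {
		slots += 1;	/* null "no-hint" slot per DIMM */
		num_flush = min_not_zero(num_flush, dimm_num_flush[i]);
		if (!dimm_num_flush[i])
			continue;
		slots += dimm_num_flush[i];
	}
	*num_flush_out = num_flush;
	return slots;
}

/*
 * Highest flush_wpq[] index produced by "dimm * num + (hint & mask)".
 * ASSUMPTION: num = 1 << ilog2(num_flush), mask = num - 1, and hints
 * 0..dimm_num_flush[dimm]-1 are written for each DIMM.
 */
static size_t highest_index(const unsigned int *dimm_num_flush, size_t n,
			    unsigned int num_flush)
{
	unsigned int num = num_flush ? 1u << ilog2_u(num_flush) : 1;
	unsigned int mask = num - 1;
	size_t max = 0;

	for (size_t dimm = 0; dimm < n; dimm++) {
		for (unsigned int hint = 0; hint < dimm_num_flush[dimm]; hint++) {
			size_t idx = dimm * num + (hint & mask);

			if (idx > max)
				max = idx;
		}
	}
	return max;
}
```

For the "0, 0, 4, 0" example, flush_slots() returns 8 slots with num_flush = 4, while highest_index() returns 11, past the end of the allocation. With "4, 4, 4, 4" the same helpers give 20 slots and a highest index of 15, so under these assumptions a mismatch can only arise when DIMMs in a region have different num_flush values.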




On 2020/8/20 10:16, Zhen Lei wrote:
> According to the original code logic:
> if (!nvdimm->num_flush) {
>   flush_data_size += sizeof(void *);
>   // nvdimm->num_flush is zero here, so adding 1) would have no effect
> } else {
>   flush_data_size += sizeof(void *);
> 1)flush_data_size += nvdimm->num_flush * sizeof(void *);
> }
> 
> Obviously, the above code snippet can be reduced to one statement:
> flush_data_size += (nvdimm->num_flush + 1) * sizeof(void *);
> 
> No functional change.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/nvdimm/region_devs.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 7cf9c7d857909ce..49be115c9189eff 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -77,11 +77,8 @@ int nd_region_activate(struct nd_region *nd_region)
>   }
>  
>   /* at least one null hint slot per-dimm for the "no-hint" case */
> - flush_data_size += sizeof(void *);
> + flush_data_size += (nvdimm->num_flush + 1) * sizeof(void *);
>   num_flush = min_not_zero(num_flush, nvdimm->num_flush);
> - if (!nvdimm->num_flush)
> - continue;
> - flush_data_size += nvdimm->num_flush * sizeof(void *);
>   }
>   nvdimm_bus_unlock(&nd_region->dev);
>  
> 
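The commit message's claim that the two forms compute the same per-DIMM increment can be checked with a small userspace sketch. The helper names are hypothetical and only model the amount added to flush_data_size for one DIMM. This equivalence is separate from the sizing concern raised at the top of this message, which is about the total being too small for the index expression, not about the simplification itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Original per-DIMM increment: one null "no-hint" slot, plus num_flush
 * slots only when the DIMM has flush hints.
 */
static size_t flush_bytes_original(unsigned int num_flush)
{
	size_t bytes = sizeof(void *);

	if (!num_flush)
		return bytes;
	bytes += num_flush * sizeof(void *);
	return bytes;
}

/* Simplified per-DIMM increment proposed by the patch. */
static size_t flush_bytes_simplified(unsigned int num_flush)
{
	return (num_flush + 1) * sizeof(void *);
}
```

The two agree because num_flush * sizeof(void *) is already 0 when num_flush is 0, so the early "continue" only skipped adding zero.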



Re: [PATCH v4 12/20] dt-bindings: arm: hisilicon: convert hisilicon,hi3798cv200-perictrl bindings to json-schema

2020-09-29 Thread Leizhen (ThunderTown)



On 2020/9/29 11:18, Leizhen (ThunderTown) wrote:
> 
> 
> On 2020/9/29 3:14, Rob Herring wrote:
>> On Mon, Sep 28, 2020 at 11:13:16PM +0800, Zhen Lei wrote:
>>> Convert the Hisilicon Hi3798CV200 Peripheral Controller binding to DT
>>> schema format using json-schema.
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>  .../controller/hisilicon,hi3798cv200-perictrl.txt  | 21 --
>>>  .../controller/hisilicon,hi3798cv200-perictrl.yaml | 45 ++
>>>  2 files changed, 45 insertions(+), 21 deletions(-)
>>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>
>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>> deleted file mode 100644
>>> index 0d5282f4670658d..000
>>> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>> +++ /dev/null
>>> @@ -1,21 +0,0 @@
>>> -Hisilicon Hi3798CV200 Peripheral Controller
>>> -
>>> -The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>> -their status, and configures some functions of peripherals.
>>> -
>>> -Required properties:
>>> -- compatible: Should contain "hisilicon,hi3798cv200-perictrl", "syscon"
>>> -  and "simple-mfd".
>>> -- reg: Register address and size of Peripheral Controller.
>>> -- #address-cells: Should be 1.
>>> -- #size-cells: Should be 1.
>>> -
>>> -Examples:
>>> -
>>> -	perictrl: peripheral-controller@8a2 {
>>> -		compatible = "hisilicon,hi3798cv200-perictrl", "syscon",
>>> -			     "simple-mfd";
>>> -		reg = <0x8a2 0x1000>;
>>> -		#address-cells = <1>;
>>> -		#size-cells = <1>;
>>> -	};
>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>> new file mode 100644
>>> index 000..4e547017e368393
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>> @@ -0,0 +1,45 @@
>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> +
>>> +title: Hisilicon Hi3798CV200 Peripheral Controller
>>> +
>>> +maintainers:
>>> +  - Wei Xu 
>>> +
>>> +description: |
>>> +  The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>> +  their status, and configures some functions of peripherals.
>>> +
>>> +properties:
>>> +  compatible:
>>> +    items:
>>> +      - const: hisilicon,hi3798cv200-perictrl
>>> +      - const: syscon
>>> +      - const: simple-mfd
>>> +
>>> +  reg:
>>> +    description: Register address and size
>>> +    maxItems: 1
>>> +
>>> +  '#address-cells':
>>> +    const: 1
>>> +
>>> +  '#size-cells':
>>> +    const: 1
>>
>> That implies child nodes. You need some sort of schema for them.
> 
> OK, I will drop #address-cells and #size-cells in this binding.

I think I misunderstood. I should describe child nodes here.

The National Day holiday starts the day after tomorrow, eight days off in
total, so time is too short. I'll drop this patch for now and do it for v5.11.

> 
>>
>>> +
>>> +required:
>>> +  - compatible
>>> +  - reg
>>> +
>>> +examples:
>>> +  - |
>>> +    perictrl: peripheral-controller@8a2 {
>>> +        compatible = "hisilicon,hi3798cv200-perictrl", "syscon", "simple-mfd";
>>> +        reg = <0x8a2 0x1000>;
>>> +        #address-cells = <1>;
>>> +        #size-cells = <1>;
>>> +    };
>>> +...
>>> -- 
>>> 1.8.3
>>>
>>>
>>
>> .
>>
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v4 12/20] dt-bindings: arm: hisilicon: convert hisilicon,hi3798cv200-perictrl bindings to json-schema

2020-09-29 Thread Leizhen (ThunderTown)



On 2020/9/29 17:21, Leizhen (ThunderTown) wrote:
> 
> 
> On 2020/9/29 11:18, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/9/29 3:14, Rob Herring wrote:
>>> On Mon, Sep 28, 2020 at 11:13:16PM +0800, Zhen Lei wrote:
>>>> Convert the Hisilicon Hi3798CV200 Peripheral Controller binding to DT
>>>> schema format using json-schema.
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>  .../controller/hisilicon,hi3798cv200-perictrl.txt  | 21 --
>>>>  .../controller/hisilicon,hi3798cv200-perictrl.yaml | 45 ++
>>>>  2 files changed, 45 insertions(+), 21 deletions(-)
>>>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>> deleted file mode 100644
>>>> index 0d5282f4670658d..000
>>>> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>> +++ /dev/null
>>>> @@ -1,21 +0,0 @@
>>>> -Hisilicon Hi3798CV200 Peripheral Controller
>>>> -
>>>> -The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>>> -their status, and configures some functions of peripherals.
>>>> -
>>>> -Required properties:
>>>> -- compatible: Should contain "hisilicon,hi3798cv200-perictrl", "syscon"
>>>> -  and "simple-mfd".
>>>> -- reg: Register address and size of Peripheral Controller.
>>>> -- #address-cells: Should be 1.
>>>> -- #size-cells: Should be 1.
>>>> -
>>>> -Examples:
>>>> -
>>>> -	perictrl: peripheral-controller@8a2 {
>>>> -		compatible = "hisilicon,hi3798cv200-perictrl", "syscon",
>>>> -			     "simple-mfd";
>>>> -		reg = <0x8a2 0x1000>;
>>>> -		#address-cells = <1>;
>>>> -		#size-cells = <1>;
>>>> -	};
>>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>> new file mode 100644
>>>> index 000..4e547017e368393
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>> @@ -0,0 +1,45 @@
>>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>>>> +%YAML 1.2
>>>> +---
>>>> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml#
>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>> +
>>>> +title: Hisilicon Hi3798CV200 Peripheral Controller
>>>> +
>>>> +maintainers:
>>>> +  - Wei Xu 
>>>> +
>>>> +description: |
>>>> +  The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>>> +  their status, and configures some functions of peripherals.
>>>> +
>>>> +properties:
>>>> +  compatible:
>>>> +    items:
>>>> +      - const: hisilicon,hi3798cv200-perictrl
>>>> +      - const: syscon
>>>> +      - const: simple-mfd
>>>> +
>>>> +  reg:
>>>> +    description: Register address and size
>>>> +    maxItems: 1
>>>> +
>>>> +  '#address-cells':
>>>> +    const: 1
>>>> +
>>>> +  '#size-cells':
>>>> +    const: 1
>>>
>>> That implies child nodes. You need some sort of schema for them.
>>
>> OK, I will drop #address-cells and #size-cells in this binding.
> 
> I think I misunderstood. I should describe child nodes here.
> 
> The National Day holiday starts the day after tomorrow, eight days off in
> total, so time is too short. I'll drop this patch for now and do it for v5.11.

I searched the dtsi files: these two properties are required by the
"ranges" property, so I will add them.

> 
>>
>>>
>>>> +
>>>> +required:
>>>> +  - compatible
>>>> +  - reg
>>>> +
>>>> +examples:
>>>> +  - |
>>>> +    perictrl: peripheral-controller@8a2 {
>>>> +        compatible = "hisilicon,hi3798cv200-perictrl", "syscon", "simple-mfd";
>>>> +        reg = <0x8a2 0x1000>;
>>>> +        #address-cells = <1>;
>>>> +        #size-cells = <1>;
>>>> +    };
>>>> +...
>>>> -- 
>>>> 1.8.3
>>>>
>>>>
>>>
>>> .
>>>
>>
>>
>>
>> .
>>
> 
> 
> 
> .
> 



Re: [PATCH v5 15/17] dt-bindings: arm: hisilicon: convert Hi6220 domain controller bindings to json-schema

2020-09-29 Thread Leizhen (ThunderTown)
Hi, Rob:
  I'm glad to see you applied my patches this morning. However, this patch
was not applied and received no comment. Did you miss it?


On 2020/9/29 22:14, Zhen Lei wrote:
> Convert the Hisilicon Hi6220 domain controllers binding to DT schema
> format using json-schema. All of them are grouped into one yaml file, to
> help users understand differences and avoid repeated descriptions.
> 
> Signed-off-by: Zhen Lei 
> ---
>  .../hisilicon/controller/hi6220-domain-ctrl.yaml   | 64 ++
>  .../controller/hisilicon,hi6220-aoctrl.txt | 18 --
>  .../controller/hisilicon,hi6220-mediactrl.txt  | 18 --
>  .../controller/hisilicon,hi6220-pmctrl.txt | 18 --
>  4 files changed, 64 insertions(+), 54 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-mediactrl.txt
>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-pmctrl.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
> new file mode 100644
> index 000..32c562720d877c9
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
> @@ -0,0 +1,64 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hi6220-domain-ctrl.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Hisilicon Hi6220 domain controller
> +
> +maintainers:
> +  - Wei Xu 
> +
> +description: |
> +  Hisilicon designs some special domain controllers for mobile platform,
> +  such as: the power Always On domain controller, the Media domain
> +  controller(e.g. codec, G3D ...) and the Power Management domain
> +  controller.
> +
> +  The compatible names of each domain controller are as follows:
> +  Power Always ON domain controller  --> hisilicon,hi6220-aoctrl
> +  Media domain controller            --> hisilicon,hi6220-mediactrl
> +  Power Management domain controller --> hisilicon,hi6220-pmctrl
> +
> +properties:
> +  compatible:
> +    items:
> +      - enum:
> +          - hisilicon,hi6220-aoctrl
> +          - hisilicon,hi6220-mediactrl
> +          - hisilicon,hi6220-pmctrl
> +      - const: syscon
> +
> +  reg:
> +    maxItems: 1
> +
> +  '#clock-cells':
> +    const: 1
> +
> +required:
> +  - compatible
> +  - reg
> +  - '#clock-cells'
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    ao_ctrl@f780 {
> +        compatible = "hisilicon,hi6220-aoctrl", "syscon";
> +        reg = <0xf780 0x2000>;
> +        #clock-cells = <1>;
> +    };
> +
> +    media_ctrl@f441 {
> +        compatible = "hisilicon,hi6220-mediactrl", "syscon";
> +        reg = <0xf441 0x1000>;
> +        #clock-cells = <1>;
> +    };
> +
> +    pm_ctrl@f7032000 {
> +        compatible = "hisilicon,hi6220-pmctrl", "syscon";
> +        reg = <0xf7032000 0x1000>;
> +        #clock-cells = <1>;
> +    };
> +...
> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
> deleted file mode 100644
> index 5a723c1d45f4a17..000
> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -Hisilicon Hi6220 Power Always ON domain controller
> -
> -Required properties:
> -- compatible : "hisilicon,hi6220-aoctrl"
> -- reg : Register address and size
> -- #clock-cells: should be set to 1, many clock registers are defined
> -  under this controller and this property must be present.
> -
> -Hisilicon designs this system controller to control the power always
> -on domain for mobile platform.
> -
> -Example:
> -	/*for Hi6220*/
> -	ao_ctrl: ao_ctrl@f780 {
> -		compatible = "hisilicon,hi6220-aoctrl", "syscon";
> -		reg = <0x0 0xf780 0x0 0x2000>;
> -		#clock-cells = <1>;
> -	};
> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-mediactrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-mediactrl.txt
> deleted file mode 100644
> index dcfdcbcb6455771..000
> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-mediactrl.txt
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -Hisilicon Hi6220 Media domain controller
> -
> -Required properties:
> -- compatible : "hisilicon,hi6220-mediactrl"
> -- reg : Register address and si

Re: [PATCH v4 12/20] dt-bindings: arm: hisilicon: convert hisilicon,hi3798cv200-perictrl bindings to json-schema

2020-09-29 Thread Leizhen (ThunderTown)



On 2020/9/29 21:52, Rob Herring wrote:
> On Tue, Sep 29, 2020 at 8:25 AM Leizhen (ThunderTown)
>  wrote:
>>
>>
>>
>> On 2020/9/29 17:21, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2020/9/29 11:18, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2020/9/29 3:14, Rob Herring wrote:
>>>>> On Mon, Sep 28, 2020 at 11:13:16PM +0800, Zhen Lei wrote:
>>>>>> Convert the Hisilicon Hi3798CV200 Peripheral Controller binding to DT
>>>>>> schema format using json-schema.
>>>>>>
>>>>>> Signed-off-by: Zhen Lei 
>>>>>> ---
>>>>>>  .../controller/hisilicon,hi3798cv200-perictrl.txt  | 21 --
>>>>>>  .../controller/hisilicon,hi3798cv200-perictrl.yaml | 45 ++
>>>>>>  2 files changed, 45 insertions(+), 21 deletions(-)
>>>>>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>>>>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>>>> deleted file mode 100644
>>>>>> index 0d5282f4670658d..000
>>>>>> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.txt
>>>>>> +++ /dev/null
>>>>>> @@ -1,21 +0,0 @@
>>>>>> -Hisilicon Hi3798CV200 Peripheral Controller
>>>>>> -
>>>>>> -The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>>>>> -their status, and configures some functions of peripherals.
>>>>>> -
>>>>>> -Required properties:
>>>>>> -- compatible: Should contain "hisilicon,hi3798cv200-perictrl", "syscon"
>>>>>> -  and "simple-mfd".
>>>>>> -- reg: Register address and size of Peripheral Controller.
>>>>>> -- #address-cells: Should be 1.
>>>>>> -- #size-cells: Should be 1.
>>>>>> -
>>>>>> -Examples:
>>>>>> -
>>>>>> -	perictrl: peripheral-controller@8a2 {
>>>>>> -		compatible = "hisilicon,hi3798cv200-perictrl", "syscon",
>>>>>> -			     "simple-mfd";
>>>>>> -		reg = <0x8a2 0x1000>;
>>>>>> -		#address-cells = <1>;
>>>>>> -		#size-cells = <1>;
>>>>>> -	};
>>>>>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>>>> new file mode 100644
>>>>>> index 000..4e547017e368393
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml
>>>>>> @@ -0,0 +1,45 @@
>>>>>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>>>>>> +%YAML 1.2
>>>>>> +---
>>>>>> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hisilicon,hi3798cv200-perictrl.yaml#
>>>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>>>> +
>>>>>> +title: Hisilicon Hi3798CV200 Peripheral Controller
>>>>>> +
>>>>>> +maintainers:
>>>>>> +  - Wei Xu 
>>>>>> +
>>>>>> +description: |
>>>>>> +  The Hi3798CV200 Peripheral Controller controls peripherals, queries
>>>>>> +  their status, and configures some functions of peripherals.
>>>>>> +
>>>>>> +properties:
>>>>>> +  compatible:
>>>>>> +    items:
>>>>>> +      - const: hisilicon,hi3798cv200-perictrl
>>>>>> +      - const: syscon
>>>>>> +      - const: simple-mfd
>>>>>> +
>>>>>> +  reg:
>>>>>> +    description: Register address and size
>>>>>> +    maxItems: 1
>>>>>> +
>>>>>> +  '#address-cells':
>>>>>> +    const: 1
>>>>>> +
>>>>>> +  '#size-cells':
>>>>>> +    const: 1
>>>>>
>>>>> That implies child nodes. You need some sort of schema for them.
>>>>
>>>> OK, I will drop #address-cells and #size-cells in this binding.
>>>
>>> I think I misunderstood. I should describe child nodes here.
>>>
>>> The National Day holiday starts the day after tomorrow, eight days off in
>>> total, so time is too short. I'll drop this patch for now and do it for v5.11.
>>
>> I searched the dtsi files: these two properties are required by the
>> "ranges" property, so I will add them.
> 
> 'ranges' also implies there are child nodes as does 'simple-mfd', so
> whatever child nodes you have are missing and need to be documented
> too. Also, 'ranges' implies the child nodes are memory-mapped, but
> 'simple-mfd' implies they are not. 'simple-bus' is what should be used
> for memory-mapped children.

Sorry, because of the time difference, I went straight home after sending
version 5 of these patches after 10 p.m. last night. I see you have applied
the new one. Thanks for the information you showed me here.

> 
> Rob
> 
> .
> 



Re: [PATCH v5 15/17] dt-bindings: arm: hisilicon: convert Hi6220 domain controller bindings to json-schema

2020-09-29 Thread Leizhen (ThunderTown)



On 2020/9/30 9:38, Leizhen (ThunderTown) wrote:
> Hi, Rob:
>   I'm glad to see you applied my patches this morning. However, this patch
> was not applied and received no comment. Did you miss it?

Oh, I see: I missed the "#reset-cells" property. What a shame! I will post a
new version.

> 
> 
> On 2020/9/29 22:14, Zhen Lei wrote:
>> Convert the Hisilicon Hi6220 domain controllers binding to DT schema
>> format using json-schema. All of them are grouped into one yaml file, to
>> help users understand differences and avoid repeated descriptions.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  .../hisilicon/controller/hi6220-domain-ctrl.yaml   | 64 ++
>>  .../controller/hisilicon,hi6220-aoctrl.txt | 18 --
>>  .../controller/hisilicon,hi6220-mediactrl.txt  | 18 --
>>  .../controller/hisilicon,hi6220-pmctrl.txt | 18 --
>>  4 files changed, 64 insertions(+), 54 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-mediactrl.txt
>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-pmctrl.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
>> new file mode 100644
>> index 000..32c562720d877c9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/hisilicon/controller/hi6220-domain-ctrl.yaml
>> @@ -0,0 +1,64 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hi6220-domain-ctrl.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Hisilicon Hi6220 domain controller
>> +
>> +maintainers:
>> +  - Wei Xu 
>> +
>> +description: |
>> +  Hisilicon designs some special domain controllers for mobile platform,
>> +  such as: the power Always On domain controller, the Media domain
>> +  controller(e.g. codec, G3D ...) and the Power Management domain
>> +  controller.
>> +
>> +  The compatible names of each domain controller are as follows:
>> +  Power Always ON domain controller  --> hisilicon,hi6220-aoctrl
>> +  Media domain controller            --> hisilicon,hi6220-mediactrl
>> +  Power Management domain controller --> hisilicon,hi6220-pmctrl
>> +
>> +properties:
>> +  compatible:
>> +    items:
>> +      - enum:
>> +          - hisilicon,hi6220-aoctrl
>> +          - hisilicon,hi6220-mediactrl
>> +          - hisilicon,hi6220-pmctrl
>> +      - const: syscon
>> +
>> +  reg:
>> +    maxItems: 1
>> +
>> +  '#clock-cells':
>> +    const: 1
>> +
>> +required:
>> +  - compatible
>> +  - reg
>> +  - '#clock-cells'
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +    ao_ctrl@f780 {
>> +        compatible = "hisilicon,hi6220-aoctrl", "syscon";
>> +        reg = <0xf780 0x2000>;
>> +        #clock-cells = <1>;
>> +    };
>> +
>> +    media_ctrl@f441 {
>> +        compatible = "hisilicon,hi6220-mediactrl", "syscon";
>> +        reg = <0xf441 0x1000>;
>> +        #clock-cells = <1>;
>> +    };
>> +
>> +    pm_ctrl@f7032000 {
>> +        compatible = "hisilicon,hi6220-pmctrl", "syscon";
>> +        reg = <0xf7032000 0x1000>;
>> +        #clock-cells = <1>;
>> +    };
>> +...
>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt b/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
>> deleted file mode 100644
>> index 5a723c1d45f4a17..000
>> --- a/Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hi6220-aoctrl.txt
>> +++ /dev/null
>> @@ -1,18 +0,0 @@
>> -Hisilicon Hi6220 Power Always ON domain controller
>> -
>> -Required properties:
>> -- compatible : "hisili

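The constraints in the hi6220 domain-controller schema quoted above can be made concrete with a short Python sketch. It is a hand-rolled approximation, not dt-schema/dtschema: the dict-based node representation and the `check_domain_ctrl` helper are hypothetical, and it only models the rules written in that file (required properties, `additionalProperties: false`, the two-entry `compatible` list, `reg` with `maxItems: 1`, and `#clock-cells` with `const: 1`).

```python
# Hand-rolled sketch of the constraints in hi6220-domain-ctrl.yaml.
# NOT dt-schema: nodes are plain dicts standing in for parsed DT nodes, and
# "reg" is a list of (address, size) tuples (a hypothetical representation).

DOMAIN_CTRL_COMPATIBLES = {
    "hisilicon,hi6220-aoctrl",
    "hisilicon,hi6220-mediactrl",
    "hisilicon,hi6220-pmctrl",
}
REQUIRED = ("compatible", "reg", "#clock-cells")
ALLOWED = set(REQUIRED)  # additionalProperties: false


def check_domain_ctrl(node):
    """Return a list of violations; an empty list means the node passes."""
    errors = []
    for prop in REQUIRED:
        if prop not in node:
            errors.append(f"missing required property '{prop}'")
    for prop in node:
        if prop not in ALLOWED:
            errors.append(f"unexpected property '{prop}'")

    # items: [enum of three SoC-specific strings, const: syscon]
    compat = node.get("compatible", [])
    if len(compat) != 2:
        errors.append("compatible must have exactly 2 entries")
    else:
        if compat[0] not in DOMAIN_CTRL_COMPATIBLES:
            errors.append(f"unknown compatible '{compat[0]}'")
        if compat[1] != "syscon":
            errors.append("second compatible must be 'syscon'")

    # reg: maxItems: 1
    if len(node.get("reg", [])) > 1:
        errors.append("reg must have at most 1 entry")

    # '#clock-cells': const: 1
    if "#clock-cells" in node and node["#clock-cells"] != 1:
        errors.append("#clock-cells must be 1")
    return errors


pm_ctrl = {
    "compatible": ["hisilicon,hi6220-pmctrl", "syscon"],
    "reg": [(0xF7032000, 0x1000)],
    "#clock-cells": 1,
}
print(check_domain_ctrl(pm_ctrl))  # []
```

Dropping "syscon" from `compatible`, or adding a property the schema does not list, makes the check report a violation, which mirrors the effect of `items` and `additionalProperties: false` in the YAML.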
Re: [PATCH v6 01/17] dt-bindings: mfd: syscon: add some compatible strings for Hisilicon

2020-09-30 Thread Leizhen (ThunderTown)



On 2020/9/30 15:11, Lee Jones wrote:
> On Wed, 30 Sep 2020, Zhen Lei wrote:
> 
>> Add some compatible strings for Hisilicon controllers:
>> hisilicon,hi6220-sramctrl  --> Hi6220 SRAM controller
>> hisilicon,pcie-sas-subctrl --> HiP05/HiP06 PCIe-SAS subsystem controller
>> hisilicon,peri-subctrl --> HiP05/HiP06 PERI subsystem controller
>> hisilicon,dsa-subctrl  --> HiP05/HiP06 DSA subsystem controller
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  Documentation/devicetree/bindings/mfd/syscon.yaml | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> This was already applied by the time you re-sent it.
> 
> Any reason for sending it again?

Patch 15 was modified. All the documentation patches except Patch 15 had
already been applied, but the config/DTS patches had not (they were applied
after I re-sent the series).

> 



Re: [PATCH v6 01/17] dt-bindings: mfd: syscon: add some compatible strings for Hisilicon

2020-10-10 Thread Leizhen (ThunderTown)



On 2020/10/1 14:59, Lee Jones wrote:
> On Wed, 30 Sep 2020, Leizhen (ThunderTown) wrote:
> 
>>
>>
>> On 2020/9/30 15:11, Lee Jones wrote:
>>> On Wed, 30 Sep 2020, Zhen Lei wrote:
>>>
>>>> Add some compatible strings for Hisilicon controllers:
>>>> hisilicon,hi6220-sramctrl  --> Hi6220 SRAM controller
>>>> hisilicon,pcie-sas-subctrl --> HiP05/HiP06 PCIe-SAS subsystem controller
>>>> hisilicon,peri-subctrl --> HiP05/HiP06 PERI subsystem controller
>>>> hisilicon,dsa-subctrl  --> HiP05/HiP06 DSA subsystem controller
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>  Documentation/devicetree/bindings/mfd/syscon.yaml | 5 -
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> This was already applied by the time you re-sent it.
>>>
>>> Any reason for sending it again?
>>
>> Patch 15 was modified. All the documentation patches except Patch 15 had
>> already been applied, but the config/DTS patches had not (they were applied
>> after I re-sent the series).
> 
> Could you please only send patches which have not been applied.

Sorry, I lacked experience with this. I'll pay attention next time.

> 



Re: [PATCH v6 14/17] dt-bindings: arm: hisilicon: convert hisilicon,hip04-bootwrapper bindings to json-schema

2020-10-10 Thread Leizhen (ThunderTown)



On 2020/10/1 14:41, Krzysztof Kozlowski wrote:
> On Wed, Sep 30, 2020 at 11:17:09AM +0800, Zhen Lei wrote:
>> Convert the Hisilicon Bootwrapper boot method binding to DT schema format
>> using json-schema.
>>
>> The property boot-method contains two groups of physical address range
>> information: bootwrapper and relocation. The "uint32-array" type is not
>> suitable for it, because the field "address" and "size" may occupy one or
>> two cells respectively. Use "minItems: 1" and "maxItems: 2" to allow it
>> can be written in "" or ", "
>> format.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  .../hisilicon/controller/hip04-bootwrapper.yaml| 34 
>> ++
>>  .../controller/hisilicon,hip04-bootwrapper.txt |  9 --
>>  2 files changed, 34 insertions(+), 9 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hip04-bootwrapper.yaml
>>  delete mode 100644 Documentation/devicetree/bindings/arm/hisilicon/controller/hisilicon,hip04-bootwrapper.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/controller/hip04-bootwrapper.yaml b/Documentation/devicetree/bindings/arm/hisilicon/controller/hip04-bootwrapper.yaml
>> new file mode 100644
>> index 000..7378159e61df998
>> --- /dev/null
>> +++ 
>> b/Documentation/devicetree/bindings/arm/hisilicon/controller/hip04-bootwrapper.yaml
>> @@ -0,0 +1,34 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/arm/hisilicon/controller/hip04-bootwrapper.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Bootwrapper boot method
>> +
>> +maintainers:
>> +  - Wei Xu 
>> +
>> +description: Bootwrapper boot method (software protocol on SMP)
>> +
>> +properties:
>> +  compatible:
>> +    items:
>> +      - const: hisilicon,hip04-bootwrapper
>> +
>> +  boot-method:
>> +    description: |
>> +      Address and size of boot method.
>> +      [0]: bootwrapper physical address
>> +      [1]: bootwrapper size
>> +      [2]: relocation physical address
>> +      [3]: relocation size
> 
> Instead: use items, with a description for each item (bootwrapper address,
> relocation address). This way, min/maxItems should also not be needed.

I think it's needed. "reg" also specifies maxItems.

> 
> Best regards,
> Krzysztof
> 
> 
>> +    minItems: 1
>> +    maxItems: 2
>> +
> 
> .
> 
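The cell-width argument in the bootwrapper commit message can be illustrated with a small Python sketch. Assumptions: the widths come from the parent's `#address-cells` and `#size-cells` (as they do for `reg`), the helpers `join_cells` and `decode_pairs` are hypothetical, and the cell values are invented for illustration. It shows the same two entries (bootwrapper and relocation) occupying 4 or 8 raw cells depending on those widths, which is why the patch counts entries with `minItems`/`maxItems` rather than constraining a flat `uint32-array`.

```python
# Sketch: decode a flat list of 32-bit cells into (address, size) pairs.
# Assumption: cell widths come from the parent's #address-cells/#size-cells,
# as for "reg"; the cell values below are made up for illustration.


def join_cells(cells):
    """Combine big-endian 32-bit cells into one integer (most significant first)."""
    value = 0
    for cell in cells:
        value = (value << 32) | (cell & 0xFFFFFFFF)
    return value


def decode_pairs(cells, address_cells, size_cells):
    """Split cells into (address, size) tuples of the given cell widths."""
    stride = address_cells + size_cells
    if stride == 0 or len(cells) % stride != 0:
        raise ValueError("cell count is not a multiple of address+size cells")
    pairs = []
    for i in range(0, len(cells), stride):
        address = join_cells(cells[i:i + address_cells])
        size = join_cells(cells[i + address_cells:i + stride])
        pairs.append((address, size))
    return pairs


# The same two ranges (bootwrapper, relocation) written with 1-cell and
# 2-cell addresses/sizes: 4 cells vs 8 cells, but always 2 entries.
one_cell = [0x10C00000, 0x10000, 0x10C10000, 0x1000]
two_cell = [0x0, 0x10C00000, 0x0, 0x10000, 0x0, 0x10C10000, 0x0, 0x1000]

print(decode_pairs(one_cell, 1, 1) == decode_pairs(two_cell, 2, 2))  # True
```

Counting decoded entries (here always 2) is independent of the cell widths, while the raw cell count is not.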



Re: linux-next: manual merge of the devicetree tree with the mfd tree

2020-10-10 Thread Leizhen (ThunderTown)



On 2020/10/1 20:31, Rob Herring wrote:
> On Thu, Oct 1, 2020 at 1:26 AM Krzysztof Kozlowski  wrote:
>>
>> On Thu, 1 Oct 2020 at 08:22, Stephen Rothwell  wrote:
>>>
>>> Hi all,
>>>
>>> Today's linux-next merge of the devicetree tree got a conflict in:
>>>
>>>   Documentation/devicetree/bindings/mfd/syscon.yaml
>>>
>>> between commit:
>>>
>>>   18394297562a ("dt-bindings: mfd: syscon: Merge Samsung Exynos Sysreg bindings")
>>>   05027df1b94f ("dt-bindings: mfd: syscon: Document Exynos3 and Exynos5433 compatibles")
>>>
>>> from the mfd tree and commit:
>>>
>>>   35b096dd6353 ("dt-bindings: mfd: syscon: add some compatible strings for Hisilicon")
>>>
>>> from the devicetree tree.
>>>
>>> I fixed it up (see below) and can carry the fix as necessary. This
>>> is now fixed as far as linux-next is concerned, but any non-trivial
>>> conflicts should be mentioned to your upstream maintainer when your tree
>>> is submitted for merging.  You may also want to consider cooperating
>>> with the maintainer of the conflicting tree to minimise any particularly
>>> complex conflicts.
>>>
>>> --
>>> Cheers,
>>> Stephen Rothwell
>>>
>>> diff --cc Documentation/devicetree/bindings/mfd/syscon.yaml
>>> index 0f21943dea28,fc2e85004d36..
>>> --- a/Documentation/devicetree/bindings/mfd/syscon.yaml
>>> +++ b/Documentation/devicetree/bindings/mfd/syscon.yaml
>>> @@@ -40,11 -40,10 +40,14 @@@ properties
>>> - allwinner,sun50i-a64-system-controller
>>> - microchip,sparx5-cpu-syscon
>>> - mstar,msc313-pmsleep
>>>  +  - samsung,exynos3-sysreg
>>>  +  - samsung,exynos4-sysreg
>>>  +  - samsung,exynos5-sysreg
>>>  +  - samsung,exynos5433-sysreg
>>> -
>>> +   - hisilicon,hi6220-sramctrl
>>> +   - hisilicon,pcie-sas-subctrl
>>> +   - hisilicon,peri-subctrl
>>> +   - hisilicon,dsa-subctrl
>>
>> Thanks Stephen, looks good.
>>
>> Zhen,
>> However, the Huawei compatibles in the original patch were not added in
>> alphabetical order, which messes up the ordering and increases the
>> possibility of conflicts. It would be better if the entries were kept ordered.
> 
> I've fixed up the order.

Thanks.

> 
> Rob
> 
> .
> 
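Krzysztof's point about ordering can be checked mechanically. Below is a minimal Python sketch that uses plain code-point string comparison as a stand-in for alphabetical order; `first_unsorted` is a hypothetical helper, and the list holds only the entries visible in Stephen's merge hunk, in merged order (the real `syscon.yaml` enum is longer).

```python
# Sketch: find the first compatible string that breaks ascending order.
# Plain code-point comparison approximates the alphabetical order asked for.


def first_unsorted(entries):
    """Return the index of the first entry smaller than its predecessor, or None."""
    for i in range(1, len(entries)):
        if entries[i] < entries[i - 1]:
            return i
    return None


# Only the entries visible in the merge hunk, in merged order.
merged = [
    "allwinner,sun50i-a64-system-controller",
    "microchip,sparx5-cpu-syscon",
    "mstar,msc313-pmsleep",
    "samsung,exynos3-sysreg",
    "samsung,exynos4-sysreg",
    "samsung,exynos5-sysreg",
    "samsung,exynos5433-sysreg",
    "hisilicon,hi6220-sramctrl",
    "hisilicon,pcie-sas-subctrl",
    "hisilicon,peri-subctrl",
    "hisilicon,dsa-subctrl",
]

i = first_unsorted(merged)
print(i, merged[i])  # 7 hisilicon,hi6220-sramctrl
print(first_unsorted(sorted(merged)))  # None
```

In the merged order the first Hisilicon entry is the first one out of place, and the Hisilicon entries are not sorted among themselves either (`dsa-subctrl` follows `peri-subctrl`).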


