from:"Jason Gunthorpe"

Re: [PATCH] mtd: document linux-specific partition parser DT binding

2015-10-15 Thread Jason Gunthorpe

On Thu, Oct 15, 2015 at 10:47:09AM -0700, Brian Norris wrote:
> Hmm, I don't know. I wouldn't expect people should really be using
> cmdlineparts as a production solution, but I'd consider it more of a
> debugging/development option -- if I want to override the DT (which is
> sometimes a bit harder to change) or the on-flash layout (e.g.,
> RedBoot), then I can fiddle with the command line.

We don't have cmdline overrides for very much (if any?) of DT.

The DT is supposed to describe the hardware/boot environment, if
overriding the partition layout is really common, then is the DT
binding really describing the hardware at all?

The partition layout is already very border line to support in DT, it
is really very close to a software configuration which is strongly
discouraged from DT files.

I justify it as the boot loader communicating the partition scheme it
understands to the OS, so the OS can have a hope of setting up the
flash in a way the bootloader understands.

I've never used the cmdline option, so maybe I don't see the use case,
but it seems like a really weird desire to change the partitioning
away from what the bootloader supports.

> Any better nominations for names?

bcrm-part-table-43 ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] mtd: document linux-specific partition parser DT binding

2015-10-15 Thread Jason Gunthorpe

On Thu, Oct 15, 2015 at 09:22:10AM -0700, Brian Norris wrote:

> Are you trying to use this binding, or is this just purely a mechanical
> documentation issue? I ask, because it seems that binding never really
> got reviewed at all, and others have recently tried to extend
> support

I wrote it before ARM started down the DT direction and all these
formalities were developed.

> That's not to say we can't document the old one, but I'm curious if
> there are real users. I'd also like to encourage new users to avoid the
> old one if we can make that feasible.

We continue to use it here, there is just no way to autodetect flash
partition layouts. I'm not fussed about having to change to something
else in future.

The reason the original patch exposed the cmdline parser is because
that is how the internal mechanisms work - but realistically, the
cmdline parser is a hack to get around the lack of DT partition
description. If the DT can describe the partitions it should supersede
and obsolete the old command line hack. IMHO

> Also, if we're really going to support this, we should list exactly what
> strings we support. And that's one of the problems with the existing
> binding; it supports any old string Linux supports, which doesn't match
> how we typically want to add bindings (i.e., via proposal + review).

That is why it is prefixed with linux, the review of the part parser name is
the responsibility of the MTD crew, not the DT folks.

bcm47xxpart seems like a terrible name for a partition scheme. Really,
almost any scheme can be used on any SOC, naming a partition parser
after a SOC family makes very little sense to me..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 6/7] staging: add simple-fpga-bus

2015-07-23 Thread Jason Gunthorpe

On Thu, Jul 23, 2015 at 02:55:52PM -0700, Moritz Fischer wrote:
> Hi Alan,
> 
> I saw that your socfpga driver doesn't support the partial reconfig
> use case (not a big deal).
> What I currently do for Zynq is if I'm doing a non-partial reconfig is
> that I disable input
> level shifters and assert *all* resets while reprogramming in my FPGA
> manager .write_init() and .write_complete().

I do this as well, but it is a bit more complex.. FPGA specific code
has to run around and ensure all DMA is shut off, then we need to make
sure no CPU issued AXI transactions can happen, then we can tear down
the FPGA side.

If the FPGA is torn down while an AXI op is inprogress things go
sideways, we have to work to prevent that :)

This happens almost for free, I use DT and the device model to
disconnect the drivers. The drivers are careful to synchronously fence
off in-progress DMA. Then drop the DT nodes associated with the
FPGA, finally the actual FPGA cells can be reset.

> In a partial reconfiguration situation, would I have separate
> simple-fpga buses for each of the parts that I swap out, each with
> it's own reset and bitfile attached?

I'd think of partial reconfiguration as another nested FPGA. The
resets and so forth could be attached to soft controllers in the
unswappable part of the FPGA.

DT nodes have to surround it in some way...

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 0/7] FPGA Manager Framework and Simple FPGA Bus

2015-07-22 Thread Jason Gunthorpe

On Wed, Jul 22, 2015 at 03:32:32PM -0500, atull wrote:

> I looked some more; I don't see a simple way of deferring probing until 
> after the filesystem is loaded (so that the image file would be 
> available), late_initcall is still not late enough.

That seems weird, are you saying you can't use request firmware at all
from a compiled in driver? I don't know much about that flow, sorry.

Having that work would mean the system can have a reasonable in-tree
use case without requiring the out of tree dt overlay stuff.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 3/7] staging: add bindings document for simple fpga bus

2015-07-17 Thread Jason Gunthorpe

On Fri, Jul 17, 2015 at 09:49:13PM +0200, Steffen Trumtrar wrote:
> What you are describing here is a virtual bus, that is not existing on
> at least the Socfpga, right?

I wouldn't get too hung up on bus or not bus, the HW these days
doesn't even use busses.

For AXI systems, which is all the ARM designs, the the bridge between
the CPU and FPGA is always a physical AXI link hanging off one of the
CPU block's internal AXI switches.

I choose to model these ports in DT explicitly, because they represent
swappable attachment points, and often the switch ports have
programmable registers related to that port.

FPGA logic is always hanging off one of these physical ports.

Usually there are multiple independent links between the CPU and the
FPGA (ie Xilinx Virtex 4 has at least 4 CPU to FPGA bus links, Zynq
has something like 7)

Ie on Zynq, I do things like this:

/ {
/* This is a simplification of the 5 AXI interconnect switches
   hardwired inside the CPU block */
ps7_axi_interconnect_0: amba@0 {
// This is the register block that programs the FPGA
ps7-dev-cfg@f8007000 {
clock-names = "ref_clk", "fclk0", "fclk1", "fclk2", 
"fclk3";
clocks = <&clkc 12>, <&clkc 15>, <&clkc 16>, <&clkc 
17>, <&clkc 18>;
compatible = "xlnx,zynq-devcfg-1.0";
interrupt-parent = <&ps7_scugic_0>;
interrupts = <0 8 4>;
reg = <0xf8007000 0x100>;
};

// This node bundles the entire FPGA side
programmable: fpga@pl {
// This is a physical port on one of the CPU core's AXI swithces
gp0_axi: axi@4000 {
k_gpio@8 {
k_sysmon@81000 {
gpio3: klina_gpio@80010 {
i2c_qsfp1 {
}

// This is another physical port on one of the CPU core's AXI 
swithces
gp1_axi: axi@8000 {
}

// The FPGA bitstream also hooks up to a 3rd AXI port for 
initiating DMA
// hp0_axi ...
}
}

Zynq has 5 internal AXI switches, but the typical Linux DT models them
all as single DT 'bus'

I've brought the FPGA exposed AXI ports out under the programmable
node, purely for convenience of coding the dynamic load/unload of all
the FPGA elements. We do full hot swap of the FPGA during system
operation.

The physical ports could be located someplace within the amba@0, but
since amba@0 is basically already a lie, I don't really mind this
slight divergence as it makes logical sense and life easier.

Usually what my FPGA designs do is take the AXI port(s) and then fan
out to something else, either more AXI ports through yet another AXI
switch, or convert to a low speed multi-drop bus and fan that out to a
number of devices. I don't usually model this, because it provide no
value to Linux to know these details.

Ultimately the gp0_axi/gp1_axi have a number of peripheral childern,
each with their own drivers, interrupts, etc.

In this particular design gp1_axi bridges to a FPGA soft-core which
provides a physical bus to another FPGA, which is represented as more
nested nodes, and another FPGA programmable node.

DMA for the brdige side flows through the hp0_axi port. (it consumes 3
AXI ports IIRC)

I think the simplified DT modeling is a reasonable compromise.

>   b) shouldn't the h2f/lwh2f/f2h bridges be the "simple-bus devices" instead 
> of
>  just reset-controllers ? What about e.g. the bus width of the
>  bridges?

On Zynq there are a variety of resets and clocks going from the CPU
block to the FPGA, they all need to be configured properly to run
correctly, and they need a home. The control registers for these are
located someplace under amba@0, but I'd be happy to see the actual
values to program contained under the programming node. That fits much
better with the overlay scheme.

There is also some AXI port-specific registers that may need tuning as
well, but they can naturally fit under the axi port nodes..

>  It can change depending on the bitstream. When I have an IP core that 
> does
>  DMA I might want my driver to be able to configure the bus width 
> accordingly.
>  There are other settings in the bridges that I can not set when they are 
> just
>  reset controllers.

bridge@0xff20 should be the bus port and it can have configuration...

AXI negotiates things like link width dynamically, and for AXI, DMA
doesn't flow through the same AXI port as the control registers
anyhow.

DT is a very poor fit for a modeling a modern AXI interconnect system,
so there is often some irrelevant lossage..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-inf

Re: [PATCH v9 0/7] FPGA Manager Framework and Simple FPGA Bus

2015-07-17 Thread Jason Gunthorpe

On Fri, Jul 17, 2015 at 10:51:10AM -0500, at...@opensource.altera.com wrote:
> From: Alan Tull 
> 
> This patchset adds two chunks plus documentation:
>  * fpga manager core: exports ABI functions that write an image to a FPGA
>  * DT Overlay support: simple-fpga-bus to handle FPGA from a DT overlay

I didn't read super closely, but overall it makes sense to me..

Providing an in-kernel API will let someone else figure out how to
expose that to user space. The DT based scheme seems pretty nice.

Can you use this without DT overlay? Ie if I provide the FGPA
description as part of my boot time DT will it just work?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 3/5] tee: generic TEE subsystem

2015-07-09 Thread Jason Gunthorpe

On Thu, Jul 09, 2015 at 02:49:08PM +0200, Jens Wiklander wrote:
> > Generally in a scheme like this we'd see open and release get/put the
> > underlying module handle to prevent driver removal while the char dev
> > is open. Otherwise module removal will hang here.
> 
> I'm perhaps misunderstanding you. While the cdev has any open file
> descriptors rmmod will fail with "Resource temporarily unavailable"
> because of fops_get() in chrdev_open().

Hmm, I see, you've done this:

+   cdev_init(&teedev->cdev, &tee_fops);
+   teedev->cdev.owner = teedesc->owner;

And owner here is the driver module? Interesting.. Looks OK..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 3/5] tee: generic TEE subsystem

2015-07-08 Thread Jason Gunthorpe

On Wed, Jul 08, 2015 at 03:33:25PM -0700, Greg Kroah-Hartman wrote:
> > The basic issue is that cdev_del doesn't seem to be synchronizing.
> > 
> > The use after free race is then something like:
> > 
> >struct tpm_chip {
> > struct device dev;
> > struct cdev cdev;
> 
> Oops, right there's your problem.  You can't have two reference counted
> objects trying to manage the memory of a single structure.  No matter
> what you do, it's going to be a pain to deal with this, so don't :)

Sure, generally, yes, but that isn't done for no reason, it is to make
open straightforward:

static int tpm_open(struct inode *inode, struct file *file)
{
struct tpm_chip *chip =
container_of(inode->i_cdev, struct tpm_chip, cdev);

We need to recover the tpm_chip associated with the char device
node, in a way that is holding a kref on it, without racing with
cdev_del/etc

This scheme does mean that if we have a struct file we have a kref on
the cdev, and if we have cdev then we have a kref on the tpm_chip,
which is really easy to use properly.

> > Ie we need cdev to hold a ref on tpm_chip->dev until cdev_put is
> > called.
> 
> No, separate them, make the cdev a pointer and all should be fine.

Okay, cdev_alloc takes care of the cdev lifetime.

Do you have a simple solution to replace container_of as well?

What would you think about something like:

 cdev_alloc(&chip->dev.kref)

?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 3/5] tee: generic TEE subsystem

2015-07-08 Thread Jason Gunthorpe

On Wed, Jul 08, 2015 at 02:11:29PM -0700, Greg Kroah-Hartman wrote:
> > > +   cdev_init(&teedev->cdev, &tee_fops);
> > > +   teedev->cdev.owner = teedesc->owner;
> > 
> > This also needs to set teedev->cdev.kobj.parent.
> > I'm guessing:
> > 
> >  teedev->cdev.kobj.parent = &teedev->dev.kobj;
> > 
> > TPM had the same mistake..
> 
> Really?  As of a few years ago, A cdev's kobject should not be touched
> by anything other than the cdev core.  It's not a "real" kobject in that
> it is never registered in sysfs, and no one sees it.  I keep meaning to

Well, when I looked at it, it looked like it was necessary to maintain
the refcount on the memory that is holding cdev.

The basic issue is that cdev_del doesn't seem to be synchronizing.

The use after free race is then something like:

   struct tpm_chip {
struct device dev;
struct cdev cdev;

   CPU0CPU1
= ==
tpm_chip = kalloc
cdev_add(&tpm_chip->cdev)
device_add(&tpm_chip->dev)
chrdev_open
 filp->f_op->open
cdev_del(&tpm_chip->cdev)
device_unregister
   (&tpm_chip->dev)
 kfree(tpm_chip)
  tpm_chip = container_of
 fput
  cdev_put(.. cdev)

Ie we need cdev to hold a ref on tpm_chip->dev until cdev_put is
called.

> just use something else one of these days for that structure, as lots of
> people get it wrong.  Or has things changed there?

Not recently, but this is the commit:

commit 2f0157f13f42800aa3d9017ebb0fb80a65f7b2de
Author: Dmitry Torokhov 
Date:   Sun Oct 21 17:57:19 2012 -0700

char_dev: pin parent kobject

In certain cases (for example when a cdev structure is embedded into
another object whose lifetime is controlled by a separate kobject) it is
beneficial to tie lifetime of another object to the lifetime of
character device so that related object is not freed until after
char_dev object is freed.

To achieve this let's pin kobject's parent when doing cdev_add() and
unpin when last reference to cdev structure is being released.

Signed-off-by: Dmitry Torokhov 
Acked-by: Al Viro 
Signed-off-by: Linus Torvalds 

It doesn't seem the be the best situation, this is the 3rd time this
week I've noticed cdev with a kalloc'd struct being used improperly.

Perhaps cdev_init should accept the module and kref parent as an
argument?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 3/5] tee: generic TEE subsystem

2015-07-08 Thread Jason Gunthorpe

On Wed, Jul 08, 2015 at 12:16:30PM +0200, Jens Wiklander wrote:

> +static void tee_device_complete_unused(struct kref *kref)
> +{
> + struct tee_device *teedev;
> +
> + teedev = container_of(kref, struct tee_device, users);
> + /* When the mutex is released, no other tee_device_get() will succeed */
> + teedev->desc = NULL;
> + complete(&teedev->c_no_users);
> +}
> +
> +void tee_device_put(struct tee_device *teedev)
> +{
> + mutex_lock(&teedev->mutex);
> + /* Shouldn't put in this state */
> + if (!WARN_ON(!teedev->desc))
> + kref_put(&teedev->users, tee_device_complete_unused);
> + mutex_unlock(&teedev->mutex);
> +}
> +
> +bool tee_device_get(struct tee_device *teedev)
> +{
> + mutex_lock(&teedev->mutex);
> + if (!teedev->desc) {
> + mutex_unlock(&teedev->mutex);
> + return false;
> + }
> + kref_get(&teedev->users);
> + mutex_unlock(&teedev->mutex);
> + return true;
> +}

If you are holding the mutex then you don't really need a kref, just a
simple active count counter.

I've been a bit learly lately about seeing krefs used for something
other than kfree, I've seen a few subtle mistakes in those schemes -
yours looks OK, only because of the lock, and the lock makes the kref
redundant..

> +   cdev_init(&teedev->cdev, &tee_fops);
> +   teedev->cdev.owner = teedesc->owner;

This also needs to set teedev->cdev.kobj.parent.
I'm guessing:

 teedev->cdev.kobj.parent = &teedev->dev.kobj;

TPM had the same mistake..

> +void tee_device_unregister(struct tee_device *teedev)
> +{
> + if (IS_ERR_OR_NULL(teedev))
> + return;

See for some general colour on IS_ERR_OR_NULL

https://www.mail-archive.com/linux-omap@vger.kernel.org/msg78030.html

IMHO, you should never, store an ERR pointer into long term storage,
so I wonder why this is like this...

> + if (teedev->flags & TEE_DEVICE_FLAG_REGISTERED) {
> + cdev_del(&teedev->cdev);
> + device_del(&teedev->dev);
> + }
> +
> + tee_device_put(teedev);
> + wait_for_completion(&teedev->c_no_users);

Generally in a scheme like this we'd see open and release get/put the
underlying module handle to prevent driver removal while the char dev
is open. Otherwise module removal will hang here.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-20 Thread Jason Gunthorpe

On Mon, Apr 20, 2015 at 08:20:44AM +0200, Jens Wiklander wrote:

> I'm not sure I understand what you mean. This function is a building
> block for the TEE driver to supply whatever interface is needed for user
> space. For a Global Platform like TEE it will typically have support for
> TEEC_OpenSession(), TEEC_InvokeCommand(), TEEC_RequestCancellation() and
> TEEC_CloseSession(). But how that's done is depending on how the
> interface towards the TEE (in secure world) looks like. From what I've
> heard so far those interfaces diverges a lot so we've compromised with
> this function.

The goal of the mid layer is to bring all these differences into a
common abstraction, not punt on them to higher layers.

The goal if the driver is to translate and transport the common
abstraction to the hardware.

It is an absolute failure if each TEE driver implements a different
TEEC_OpenSession() ioctl. They must be the same, the common code
must de-marshal the request from user space and then call
ops->open_session()

Driver specific ioctls are a terrible way to start a new mid layer.

> > What is the typical and maximum allocation size here?
> It depends on the design of the Trusted Application in secure world and
> the client in user space.  A few KiB could be the typical allocation
> size, with a maximum at perhaps 512 KiB (for instance when loading a
> very large Trusted Application).

So this TEE stuff also encompasses a 'firmware' loader (to the secure
world, presumably)?

That is probably your base level of 'ops' functionality, plus the
shared memroy stuff.

How does this work if two userspace things run concurrently with
different firmwares? Is there some locking or something? What is the
lifetime of this firmware tied to?

> I agree that we can drop least one of the _version fields probably both,
> but something is needed for user space to be able to know which TEE (in
> secure world) it's communicating with. The uuid will let the client know
> how how to format the commands passed to TEE_IOC_CMD below.

So you load the firmare, learn what command set it supports then use
TEE_IOC_CMD to shuttle firmware-specific data to and from?

> I've touched on this above, this function the essential when
> communicating with the driver and secure world. Different TEEs (running
> in some secure environment) provides different interfaces. By providing
> an opaque channel we don't have to force something on the TEE.  The
> problem is moved to the user space library which is used when talking to
> the TEE. The assumption here is that the interface provided the TEE is
> stable or something that the specific TEE driver can handle with a glue
> layer.

I would use read/write for this, not ioctl. read/write can work with
select/poll so you can send your command then go into a polling loop
waiting for the reply from the firmware.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-20 Thread Jason Gunthorpe

On Mon, Apr 20, 2015 at 03:02:03PM +0200, Jens Wiklander wrote:
> > It appeared to me this driver was copying TPM's old architecture,
> > which is very much known to be broken.
> 
> The struct tee_device holds a shared memory pool from which shared
> memory objects are allocated. These shared memory objects can be mapped
> both by user space and secure world. 

So this is a whole other set of problems besides what was already
brought up.

You need to figure out a lifetime model for this shared memory that
works.

> To come around the problem with what should happen when the driver
> is removed I'm increasing the refcount on the driver for each
> allocated shared memory object and created file pointers. As long as
> any resource is in use by either user space or secure world the
> driver can't be unloaded.

This isn't how the kernel works. The module refcount effects module
unload (it protects the .text) - it does not interact with driver
detatch. Userspace can trigger driver detatch (which results in
tee_unregister being called) at any time via sysfs.

If you properly design for that case then module unload sequencing
works properly for free.

Based on what I gather, I would suggest the following sequence in
tee_unregister
 - unregister all sysfs and char dev registrations.
 - Write lock ops and set to null. This will error future cdev ioctls,
   and guarentees no driver ops callbacks are in progress, or will be
   started in future.
 - Wait until all client accesses to shared memory are
   released.
 - Command the driver to release it's side of the
   shared memory and wait for that to complete
 - Free the shared memory
 - deref the tee_device's struct device (match ref in tee_register)

Then in your struct tee_device's release function free the tee_device
memory.

Replace all the module locking code with an active count in struct
tee_device (see something like kernfs_drain for an example).

> * Change to use the pattern (with a struct device etc) as described
>   above.

Yes, I think Greg confirmed you need to use a struct device, and purge
misc_device from the mid layer.

>   I can't protect the ops with just a mutex since tee_ioctl_cmd() needs to
>   be multithreaded.

Then use a sleeping read/write lock - aka an active count.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-20 Thread Jason Gunthorpe

On Mon, Apr 20, 2015 at 04:54:32PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Apr 19, 2015 at 11:08:00PM -0600, Jason Gunthorpe wrote:
> > I still suspect the expected way to write a new mid layer is to create
> > your own struct device and not rely on misc_device,
> 
> Yes, that is the way.  You can not use misc_device for anything other
> than creating the char node that your driver can use through the fileops
> you pass to it.
> 
> Do not use a misc_device to create sysfs files for, or anything else, it
> will be wrong and racy, as you have pointed out.

Thanks!

Can you quickly comment on when a misc device should be used compared to
alloc_chrdev_region & cdev_add ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-19 Thread Jason Gunthorpe

On Sat, Apr 18, 2015 at 10:57:55PM +0100, Russell King - ARM Linux wrote:

> > But then we trundle down to:
> > 
> > + ctx->teedev->desc->ops->get_version(ctx, &vers.spec_version,
> > +   vers.uuid);
> > 
> > If we kref teedev so it is valid then calling a driver call back after
> > (or during) it's remove function is very likely to blow up.
> 
> Why?
> 
> Normally, the file_operations will be associated with the module which,
> in this case, called misc_register() - the same module which contains
> the remove function.

As I read it, tee is similar to TPM and RTC, this code is the mid
layer code, and resides in it's own module while the function pointers
in 'ops' reside in the actual driver. So there are two modules.

There is some mucking in tee to keep the driver module around, it is
probably OK for the chardev, but again, sysfs, probably not. Absent
that code I think the module ops points to could be unloaded and the
.text backing the ops function can go away.

However, I was worrying about the driver itself - what is a driver to
do if a callback is called after tee_unregister is called? It must do
nothing and return error. Drawing from TPM, most drivers did not have
this check, so they blow up. Ie devm frees all their resources after
tee_unregister returns and they quickly deref null, or sleep on
disabled IRQ or timer, or something else fatal.

It seems like a good idea to have the midlayer guarentee the driver
ops cannot be called after the midlayer unregister function returns so
that drivers can operate in a simple environment. Nulling ops also
avoids caring about the driver's module ref count.

> > Also, in TPM we discovered that adding a sysfs file was very ugly
> > (impossible?) because without the misc_mtx protection that open has,
> > getting a locked tee_device in the sysfs callback is difficult.

> So, attaching attributes to the class device is _really_ a no-go.
> The attributes should be attached to the parent device - the device
> which is actually being driven.

But common midlayer convention is to put them in the same directory as
the 'dev', ie:

/sys/devices/pnp0/00:06/rtc/rtc0/dev
/sys/devices/pnp0/00:06/rtc/rtc0/max_user_freq

Which, I think, is misc->this_device

> attribute is removed from a struct device, its method won't be
> called any further (if it doesn't do this, I suspect we have a fair
> number of buggy drivers...)

Hmm, I went and looked again, and it seems kernfs_drain is part of a
mechanism that causes kernfs_remove to wait until all opens are
closed, so it sure looks like a sysfs callback cannot be called after
it is removed.

Years ago lwn documented that this wasn't true,
https://lwn.net/Articles/54651/, it is nice to see that has changed.

I still suspect the expected way to write a new mid layer is to create
your own struct device and not rely on misc_device, but use-after-free
races from sysfs doesn't seem to be a motivating reason...

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-18 Thread Jason Gunthorpe

On Sat, Apr 18, 2015 at 10:01:47AM +0100, Russell King - ARM Linux wrote:
> On Fri, Apr 17, 2015 at 10:30:54AM -0600, Jason Gunthorpe wrote:
> > On Fri, Apr 17, 2015 at 09:50:56AM +0200, Jens Wiklander wrote:
> > > + teedev = devm_kzalloc(dev, sizeof(*teedev), GFP_KERNEL);
> > [..]
> > > + rc = misc_register(&teedev->miscdev);
> > [..]
> > > +void tee_unregister(struct tee_device *teedev)
> > > +{
> > [..]
> > > + misc_deregister(&teedev->miscdev);
> > > +}
> > [..]
> > >+static int optee_remove(struct platform_device *pdev)
> > >+{
> > >+   tee_unregister(optee->teedev);
> > 
> > Isn't that a potential use after free? AFAIK misc_deregister does not
> > guarentee the miscdev will no longer be accessed after it returns, and
> > the devm will free it after optee_remove returns.
> > 
> > Memory backing a stuct device needs to be freed via the release
> > function.
> 
> Out of interest, which struct device are you talking about here?

Sorry, I was imprecise. In the first paragraph I ment 'miscdev' to
refer to the entire thing, struct tee_device, struct misc_device, the
driver allocations, etc.

So, the first issue is the use-after-free via ioctl() touching struct
tee_device that you described.

But then we trundle down to:

+ ctx->teedev->desc->ops->get_version(ctx, &vers.spec_version,
+   vers.uuid);

If we kref teedev so it is valid then calling a driver call back after
(or during) it's remove function is very likely to blow up.

Also, in TPM we discovered that adding a sysfs file was very ugly
(impossible?) because without the misc_mtx protection that open has,
getting a locked tee_device in the sysfs callback is difficult.

With TPM, we ended up trying lots of options for fixing struct
misc_device in the tpm core, which is handling multiple sub drivers,
and basically gave up. Gave each struct tpm_device an embedded struct
device like Greg suggested here. Then the tpm core is working with the
APIs, not struggling against them.

But this is not a user-space invisible change, so better to do it right
from day 1 ..

We followed rtc as an example of how to create a mid-layer that
exports it's own register function, with char dev and sysfs
components. It seems properly implemented, and has elegant solutions
to these problems (like ops):
 - Don't mess with modules, use 'ops' and set 'ops' to null when the
   driver removes. The driver core will keep the driver module around
   for you bettwen the probe/remove calls. Setting ops = NULL ensures driver
   module code cannot be called after remove.
 - Use locking for 'ops' to serialize driver callbacks with driver removal
 - Embed a struct device/etc in the struct tee_device and use the release
   function to deallocate struct tee_device. All callbacks from the
   driver/char/sysfs core can now use container_of on something that
   is already holds the right kref.
 - Consider an alloc/register pattern as we use now in TPM. This has proven
   smart for TPM as it allows:
   alloc tee_device + init struct device, etc
   driver setup
   core library helper calls for setup/etc
   driver register + char dev publish

It appeared to me this driver was copying TPM's old architecture,
which is very much known to be broken.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [RFC PATCH 1/2] tee: generic TEE subsystem

2015-04-17 Thread Jason Gunthorpe

On Fri, Apr 17, 2015 at 09:50:56AM +0200, Jens Wiklander wrote:
> + teedev = devm_kzalloc(dev, sizeof(*teedev), GFP_KERNEL);
[..]
> + rc = misc_register(&teedev->miscdev);
[..]
> +void tee_unregister(struct tee_device *teedev)
> +{
[..]
> + misc_deregister(&teedev->miscdev);
> +}
[..]
>+static int optee_remove(struct platform_device *pdev)
>+{
>+   tee_unregister(optee->teedev);

Isn't that a potential use after free? AFAIK misc_deregister does not
guarentee the miscdev will no longer be accessed after it returns, and
the devm will free it after optee_remove returns.

Memory backing a stuct device needs to be freed via the release
function.

We have been going through this for a while with TPM - it seems like
using misc devices dynamically is not a good idea. Manage your own
struct device directly..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-02-17 Thread Jason Gunthorpe

On Sun, Feb 15, 2015 at 11:40:06PM +0100, Pavel Machek wrote:

> > So keeping that much memory pinned in the kernel when I can prove it
> > is uncessary for my system (either because there is no suspend/resume
> > possibility, or because I know the CPU can always access the
> > filesytem) is very undesirable.
> 
> Well, your current device aalso has 1GB RAM, no?

No, certainly not. 256MB - the board only has space for two x16 DDR3
chips, and at design time 1GBit was about the biggest that could be
reasonably obtained.

It is also using a Zynq chip for management and the next firmware spin
is likely to throw away 50% of that space to enable Xilinx's narrow ECC
scheme on the ram.

The Linux environment requires only about 40M of ram for runtime, as
it was designed for systems with only 64M of ram, so even the 128M is
overkill.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Aw: [PATCH v5 3/4] tpm/st33zp24/spi: Add st33zp24 spi phy

2015-01-28 Thread Jason Gunthorpe

On Wed, Jan 28, 2015 at 10:49:39PM +0100, peterhu...@gmx.de wrote:

> >> I will go for a define making the code more readable together with a
> >> comment.
> >
> >You can probably just make these static arrays inside your priv
> >structure and drop these independent allocations:
> 
> Hmm, doesn' t spi stuff need dma'able buffers?

Well, the buffer allocation is already devm_kalloc, so it won't change
the properties to combine it with the devm_kalloc that creates the
priv structure.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Aw: [PATCH v5 3/4] tpm/st33zp24/spi: Add st33zp24 spi phy

2015-01-28 Thread Jason Gunthorpe

On Wed, Jan 28, 2015 at 08:13:33PM +0100, RICARD Christophe wrote:
> Hi Peter,
> 
> A TPM command can be up to 2048 byte, A TPM response can be up to 1024 byte.
> Between command and response, there are latency byte (up to 15
> usually on st33zp24 2 are enough).
> 
> Overall when sending a command and expecting an answer we need in
> worst case:
> 2048 (for the TPM command) + 1024 (for the TPM answer).  We need
> some latency byte before the answer is available (max 15).
> We have 2048 + 1024 + 15.
> 
> I will go for a define making the code more readable together with a
> comment.

You can probably just make these static arrays inside your priv
structure and drop these independent allocations:

> >+ phy->spi_xfer.tx_buf = devm_kmalloc(&dev->dev, (TPM_BUFSIZE +
> >+ (TPM_BUFSIZE / 2) + MAX_SPI_LATENCY),
> >+ GFP_KERNEL);

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-21 Thread Jason Gunthorpe

On Wed, Jan 21, 2015 at 06:33:12PM +0200, Pantelis Antoniou wrote:
> Hi Alan,
> 
> > On Jan 21, 2015, at 18:01 , One Thousand Gnomes 
> >  wrote:
> > 
> > On Thu, 15 Jan 2015 22:54:46 +0200
> > Pantelis Antoniou  wrote:
> > 
> >> Hi Alan,
> >> 
> >>> On Jan 15, 2015, at 22:45 , One Thousand Gnomes 
> >>>  wrote:
> >>> 
> >>> On Thu, 15 Jan 2015 11:47:26 -0700
> >>> Jason Gunthorpe  wrote:
> >>>> It is a novel idea, my concern would be that embedding the FPGA in the
> >>>> DT makes it permanent unswappable kernel memory.
> >>>> Not having the kernel hold the FPGA is best for many uses.
> >>> 
> >>> If you have a filesysytem before the FPGA is set up then it belongs in
> >>> the file system. As you presumably loaded the kernel from somewhere there
> >>> ought to be a file system (even an initrd).
> >>> 
> >> 
> >> Request firmware does not imply keeping it around. You can always 
> >> re-request
> >> when reloading (although there’s a nasty big of caching that needs to be
> >> resolved with the firmware loader).
> > 
> > Which comes down to the same thing. Unless you can prove that there is a
> > path to recover the firmware file that does not have any dependancies
> > upon the firmware executing (and those can be subtle and horrid at times)
> > you need to keep it around for suspend/resume at least and potentially
> > any unexpected error/reset.
> > 
> 
> In that case the only safe place to put it is in the kernel image itself, 
> which
> is something the firmware loader already supports.

My point is that the current firmware layer is overly cautious and
FPGAs are very big. My current project on small Xilinx device has a
10MB programming file. The biggest Xilinx device today has a max
bitfile size around 122MB.

So keeping that much memory pinned in the kernel when I can prove it
is uncessary for my system (either because there is no suspend/resume
possibility, or because I know the CPU can always access the
filesytem) is very undesirable.

Other systems might have to take the ram hit.

Since it seems like the kernel has no way to know, we probably have
have a way to tell it not to cache the bitfile.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-15 Thread Jason Gunthorpe

On Thu, Jan 15, 2015 at 08:45:02PM +, One Thousand Gnomes wrote:

> > - Hand over to a DT overlay (how does this work?) Lock transfers
> >   from FD to kernel
> 
> That bit isn't stateful so I would actually have expected something in
> the kernel ABI along the lines of 
> 
>request_fpga(blah)
> 
> which does
> 
>  if in use by user
> EBUSY
>  lock it (all user opens will fail)
> 
> and
> 
> release_fpga(blah)

I am imagining two ways to start a kernel FPGA, the in-kernel method
triggered by a DT node:

  fpga = request_and_lock_fpga(of_get_fpga_controller(dev->of_node))
  fw = request_firmat(of_get_fpga_firmware(dev->of_node));
  fpga_program_fw(fpga, fw);

  for_each_child_of_node(dev->of_node, child)
.. of_platform_bus_probe(child ... ) ..

  .. somehow fpga and its lock transfer to dev->of_node ..

The problem with this is it assumes the FPGA is ready to go
immediately after fpga_program_fw. There are a few platforms that can
manage this, but many others require at least some kind of startup
sequence - eg wait for clocking PLLs to lock, do low level setup, etc.

For very common cases (like Zynq can have a pretty common setup scheme
for the PL side) the DT binding can guide the kernel to run the right
code, and the code can live in the kernel because it is simple and
broadly useful.

For wild cases I'd like to just punt the setup code to user space. It
is safer and simpler to run that complexity in a user process than in
the kernel.

Maybe there is a better way to handle this, but I have been under the
impression that these days it is frowned on for the kernel to wait for
userspace?

So my idea is to use the user space method to also load a 'kernel'
fpga, the process follows the kernel flow:

   fd = open("/dev/fpga0"); // request_and_lock_fpga

   ioctl(fd,START_PROGRAMMING); // fpga_program_fw
   write(fd,fw,fw_size);

   // Arbitary complexity here
   userspace_setup_fpga();

   icotl(fd,BIND_DT_OVERLAY, .. ?? ..) // for_each_child_of_node

This is what the state is about, if the setup fails I expect the
kernel to go and unlock and deprogram the FPGA if fd is closed. Only a
success on BIND_DT_OVERLAY will make the FPGA permanent beyond closing
fd.

Pantelis: I think this explains why a fd and ioctls. configfs, sysfs,
etc lack the FD context to do the cleanup. 

Essentially, I think a running FPGA should always be attached to some
context that is the user - either the user is in-kernel (a DT of_node)
or the user is in userspace (the fd).

The invarient is - if there is no user then the kernel automatically
makes the FPGA deprogrammed.

Having a device that is potentially attached to the CPU bus running
'rouge' is dangerous. We can't realistically deprogram it safely if we
don't know what it is doing. Eg deprogramming in the middle of a DMA
operation between the CPU and FPGA will often crash the entire system.

The only way to provide reasonable safe guards is to have a clear
chain of ownership. When the kernel takes ownership of the FPGA it
also takes ownership of any cleanup required prior to deprogram.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-15 Thread Jason Gunthorpe

On Thu, Jan 15, 2015 at 10:34:39AM -0600, atull wrote:

> This is great!  The way I had it working was using Pantelis' devicetree
> configfs interface.

I figured you were very close to this already in your overlay work..

> The DT fragment described the FPGA logic and included a filename
> for firmware to load. In another branch of this thread, I see discussion
> starting on what the overlay should look like and whether it could somehow
> contain the DT itself.

It is a novel idea, my concern would be that embedding the FPGA in the
DT makes it permanent unswappable kernel memory.

Not having the kernel hold the FPGA is best for many uses.

Having the kernel hold the FPGA as a swppable file handle/mmap of some
sort is next best (obviously the fs must be operating during resume)

Unswappable kernel memory is the worst choice

> Long ago this driver started out with a /dev interface.  It didn't have
> an ioctl yet at that point, but programming the fpga was by opening
> the devnode and writing to it.  Greg KH preferred sysfs or configfs
> over adding another ioctl:

I think to justify the ioctls you need a reason to have the context
of a FD. sysfs is stateless, so if my process dies the kernel doesn't
know.

But now that we are talking about adding locking and ownership
concepts a FD is the natural anchor for that in user space.

Ie, if I open the dev node, program a FPGA and then crash the kernel
doesn't attach drivers, and immediately de-programs the
chip. Userspace has to make it all the way through to the DT bind
before the FPGA lifetime would exceed the FD.

> https://lkml.org/lkml/2013/10/8/677

I think Greg's reply makes sense in the context of the question being
asked. Thinking of the FPGA as lockable ref counted kind of resource
changes the question somewhat.

Identifying the ioctls needed would probably clarify things. My
rough start would be 

- Get status: programed, not programmed, error?
  Bitfile type? (eg Xilinx has nearly every permutation of bit/byte
  ordering)
- Start Program with with some kind of context (ie this a new bit
  file, partial reconfiguration basis X, partial reconfiguration
  overlay on X)
- for (;;) write() to do programming
- Get Error to return detailed failure information (CRC error,
  auth error, etc)
- Hand over to a DT overlay (how does this work?) Lock transfers
  from FD to kernel

-  .. something something VFIO .. ?

Where start program is refused if the FPGA is already locked, and
  locks it 
Where start program -> close() returns the FPGA back to reset and
  unlocks
Where start program -> hand over -> close() keeps the FPGA loaded with
 kernel drivers attached and fpga locked (remove the overlay to
 de-program and unlock)

Not sure exactly how to tie together DT overlays with the FPGA state,
but that seems the natural combination..

Not sure about partial reconfiguration - clearly the kernel needs to
know and check that the bitfiles are of the correct family, I wonder
if the approach should be to program a basis on the FPGA which then
creates a new FPGA device in the system that can accept the partial
reconfiguration - this way the locking makes sense, the basis is
locked by the kernel and devices and the overlay remains
lockable/swappable/whatever by a dedicated /dev/ node ??

Just thinking aloud, I've never had a use case for partial
reconfiguration.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-14 Thread Jason Gunthorpe

On Wed, Jan 14, 2015 at 04:06:17PM +, One Thousand Gnomes wrote:

> and I think you effectively have the user usage covered here for such
> things. It much like GPIO pins - we can describe them but we can also
> declare they are not visible to the user.

A missing element in mainline is a kind of VFIO scheme to let user
space access the FPGA registers designated for user space use.

We have been using these patches since 2.6.16 to allow user space to
access the FPGA register resources via a PCI like /sys/../resource0
mmapable:

https://github.com/jgunthorpe/linux/commit/59d5d13ddeffa8980ccc6248ebb5f1678ccb23f4

https://github.com/jgunthorpe/linux/commit/7c29c4344627be8a3906d64d32db533bc131df86

https://github.com/jgunthorpe/linux/commit/e41bb4a197368a9d505d66b627aee82f2d2b8895

We deliberately split up the FPGA registers and the assign user space
permissions to the resource0 files in a way that makes sense for our
app. Typically there are 10-20 FPGA register regions. User space does
not access register regions that control DMA.

> The swappable case mostly comes out of the /dev node. Once you have the
> dev node it becomes a detail of the OS not the FPGA driver as to who may
> open it, and how it is handed about. It might be an FPU manager daemon it
> might not.

Right, but the thing that scares me about the swappable case is the
kernel side support.. The FPGA has to connect to the CPU in some way,
it must have some address assigned, etc. Swapping needs to take care
of all those details in some way.

A fixed bus interface block and dynamic reconfiguration for the
remainder is probably the way to manage that. But, that implies that
even a family of swappable FPGAs will have a DT overlay associated
with it.

Ideally, I could see wanting to have a file format that combines the
overlay and FPGA bitfile. A loader tool would use the /dev/ interface
to setup both elements. That would be much more robust than the
current scheme I see (eg Xilinx) using where the bitfile and DT bit
live in completely different places and have to be perfectly matched.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-13 Thread Jason Gunthorpe

On Tue, Jan 13, 2015 at 03:37:14PM -0600, atull wrote:

> > I do agree with this, and I think this is where this patch set goes so
> > wrong.
> > 
> > Just exposing all sorts of controls to userspace and having a way for
> > the kernel to load/unload a bitstream compleletely misses the point that
> > the two main ways to use FPGAs *require* some kind of kernel side
> > driver suppport on the FPGA.
> 
> Hi Jason,
> 
> I may be misunderstanding.  I thought you wanted lots of user control
> a couple years ago.
> 
> https://lkml.org/lkml/2013/9/18/490

Yes, that is absolutely true, that is certainly where my projects have
been historically, and in the context of requiring user space to
manage the whole process I think that is completely right.

But - I like Alan's point - this should be a higher level thing, the
kernel should be able to load the FPGA, and handle binding the drivers
so user space doesn't need to be involved, and it makes sense that
should be the place to start the API design.

> Your feedback carried a lot of weight as I've been developing this patchset
> although I didn't do absolutely everything you asked for.  Back then you
> were asking for lots of user control including being able to reset the FPGA
> and being able to know from userspace what configuration steps failed (if 
> any).

Yes, this is all true. Since my FPGAs all seem to require a software
mediated startup sequence the dream of having a DT scheme like I
described is farther away for me, but I think that is not the 90%
case. Almost all vendor IP designs do not require a SW startup and
auto-start after programming. 90% of designs can program and then
immediately connect drivers.

> I'm glad you like DT support for FPGA images + drivers loading since I've
> been pushing that since April and have submitted several earlier versions
> of this patchset which included it.

Yes, I do, thank you for working on this!

> I think your arguements against the current patchset are a bit over
> the top here.

Perhaps, and I apologize. But as you know I've never liked the sysfs
interface, and I think Pavel is right. catting a file name just so you
can call request_firmware on that file in the kernel is a bad design.

The request_firmware interface should be for the DT overlay path, and
other schemes shouldn't use it. The name should come from the DT and
no place else.

> > Make it so linux can boot, automatically fetch the gpio-api1 bit file
> > and load it into the chip and then bind the GPIO driver to the FPGA
> > resource. All without userspace involvment, all potentially done
> > before calling INIT.
> 
> Where would the fpga image be fetch from?  What is the mechanism for
> doing that?  Do we need to reqire everybody to do that or can
> firmware be one of the available mechanisms?

I see there are three options:
 1) The bootloader programs the FPGA and passes a DT to the kernel
that completely describes the hard and soft logic, the kernel
doesn't need FPGA bitfile management code or other special
support.

Obviously things like reprogram on resume have to be done by the
bootloader, there is no handoff here where the kernel takes over
control of the bitfile programming. I think that is fine, #2/#3
are better choices for most people anyhow.
 2) The bootloader starts the kernel and passes a DT that describes
the hard logic and soft logic using a scheme like I outlined. The
kernel is responsible to request_firmware the bitfile and start
the FPGA and connect the drivers.

DT purists will rightly point out that this isn't great, but it
is a simple and pragmatic solution.

This probably falls out automatically if #3 is implemented..
 3) The bootloader starts the kernel with a DT that only describes the
hard logic.

Userspace must load a DT overlay early on. Otherwise proceeds like #2.

In both 2 and 3 the FPGA can be reprogrammed on resume by re-doing
request_firmware.

And for folks that need more control, expose all the knobs explicitly
to user space:
  4) There is a /dev/ node for the fpga that allows
 - ioctl to program via mmap'd file (no request_firmware)

   The ioctl has an option for the kernel to hang on to the mmap
   for the repogram on resume case.
 - ioctl to get status/error reporting/etc
 - ioctl to load/bind a DT overlay to the FPGA instance
   This provides a gap where userspace can boot and configure the
   FPGA internals before binding the kernel drivers.
   (end result is the same as #3 case, but no request_firmware)
 - (future) ioctl to load a swappable FPGA (what Alan is talking
   about)

The key thing is that we have a FPGA object that can be associated
with some DT element, and the kernel can then keep track if the FPGA
is is in use or not. Exactly like you said in your reply.

Otherwise, for the /dev/ case the FPGA becomes unuse once the FD is
closed if it wasn't attached to an overlay. This is at least the start
of

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-13 Thread Jason Gunthorpe

On Tue, Jan 13, 2015 at 04:28:47PM +, One Thousand Gnomes wrote:

> There is a lot of code overlap in things like loading the bitstreams,
> there is also some overlap because you want to be able to assign FPGA
> resources. For example if you have four FPGAs and you want to use one for
> OS stuff (say video) you want the other three to be open/close
> accessible, but if you've not got a video driver loaded and are running
> the same board headless you'd like all four to be handed out as normal
> resources.
>
> So IMHO it's no different to say kmalloc. We don't pre-empt kernel memory
> and give it to users but we don't reserve memory for kernel and not use
> it.

I do agree with this, and I think this is where this patch set goes so
wrong.

Just exposing all sorts of controls to userspace and having a way for
the kernel to load/unload a bitstream compleletely misses the point that
the two main ways to use FPGAs *require* some kind of kernel side
driver suppport on the FPGA.

Even if it is just a maths FPGA kernel side has to be involved in some
way to restrict DMAs, MMIO, by the user process. Obviously a FPGA that
uses kernel drivers needs kernel side help to bind those drivers.

Plonking /sys/class/fpga_manager//reset for either case is more
likely than not to completely crash the system.

I would look at this problem from a completely different angle - the
90% case is a single FPGA that is loaded to provide HW that Linux
drivers bind to, these days the majority use DT.

So..

soc
{
zynq_cfg: ps7-dev-cfg@f8007000 { compatible = "xlnx,zynq-devcfg-1.0"; }

fpga@pl {
   compatible = "..";
   fpga,api = "gpio-api1";
   fpga,controller = &zynq_cfg;

   gp0_axi: axi@4000 {
 gpio3: klina_gpio@80010 {
 #gpio-cells = <2>;
 compatible = "linux,basic-mmio-gpio";
 gpio-controller;
 reg-names = "dat", "set", "dirin";
 reg = <0x80010 4>,
   <0x80014 4>,
   <0x80018 4>;
 };
   }
}  
}

Make it so linux can boot, automatically fetch the gpio-api1 bit file
and load it into the chip and then bind the GPIO driver to the FPGA
resource. All without userspace involvment, all potentially done
before calling INIT.

Make it so via sysfs we can reverse this process and reboot the FPGA.

Make it so userspace can't do something to the FPGA without first
unloading the drivers.

Then worry about how to change the FPGA, or providing more user space
control over the process (DT overlays are a natural answer).

Providing a roll-your-own approach via a bunch of sysfs controls
satisfies the 'make the HW work' need without actually providing the
overall architecture someone needs to *make use* of this mechanism.

And no, this isn't just a 'theory', I have 4 shipping products today
that work pretty much exactly like this, including a Zynq
design. Binding kernel drivers to a FPGA once it is loaded is a big
functionality gap, and an annoying problem. Programming the actual
FPGA? That is the *EASIEST* part of this whole thing!

Honestly, I'd much rather see the above use case solved properly
without sysfs support and then go from there to build a replacable
FPGA scheme on the same concepts.

Providing a 'build your own' solution with sysfs controls just seems
to completely miss the mark to me.

> So IMHO it's no different to say kmalloc. We don't pre-empt kernel memory
> and give it to users but we don't reserve memory for kernel and not use
> it.

I think we see a kernel side with more structure, that knows when the
FPGA is in use or not, there is a pretty clear path to later add a
/dev/ scheme for user space use that can properly interlock.

A /dev/ scheme doesn't make much sense to bind kernel drivers...

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-12 Thread Jason Gunthorpe

On Mon, Jan 12, 2015 at 09:01:34PM +, One Thousand Gnomes wrote:
> There are plenty of people today who treat the FPGA as an entirely
> dynamic resource. It's not like flashing a controller, its near
> immediate.

But this is a completely different use case. Remember, there are
*megabytes* of internal state in a FPGA, and it isn't really feasible
to dump/restore that state.

It is one thing to context switch a maths algorithm that is built to
be stateless, it is quite another to context switch between, say an
ethernet core with an operating Linux Net driver doing DMA and a maths
algorithm.

A DT overlay approach where the overlay has to be unloaded to 'free'
the FPGA makes alot of sense to me for the stateful kernel driver
environment, and open/close/etc makes alot of sense for the stateless
switchable userspace environment - other than sharing configuration
code, is there any overlap between these use cases

> Its completely dynamic and it will get more so as we switch from the
> painful world of VHDL and friends to high level parallel aware language
> compilers for FPGAs and everyone will be knocking up quick FPGA hacks.

Only for some users. In my world FPGAs are filled with bus interface
logic, ethernet controllers, memory controllers, packet processing
engines, etc. This is all incredibly stateful - both in the FPGA
itself, and in the Linux side w/ drviers. It certainly will not ever
work in the model you are talking about.

Even if the digital state could somehow be frozen, dumped and
restored, all the FPGA external interface logic has *ANALOG* state
that cannot ever be dump/restored. It just isn't feasible for that
class of application.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1 08/10] tpm/tpm_i2c_stm_st33: Split tpm_i2c_tpm_st33 in 2 layers (core + phy)

2015-01-12 Thread Jason Gunthorpe

On Sat, Jan 10, 2015 at 12:20:29PM +0100, Christophe Ricard wrote:

For reference:

> -static int tpm_stm_i2c_remove(struct i2c_client *client)
> -{
> -   struct tpm_chip *chip =
> -   (struct tpm_chip *) i2c_get_clientdata(client);
> -
> -   if (chip)
> -tpm_chip_unregister(chip);
> -
> -   return 0;
> -}

Became:

> +static int st33zp24_i2c_remove(struct i2c_client *client)
> +{
> + void *tpm_data = i2c_get_clientdata(client);
> +
> + if (tpm_data)
> + st33zp24_remove(tpm_data);
> +
> + return 0;
> +}

The value of i2c_get_clientdata hasn't/can't be changed, it must be
the tpm_chip, so tpm_data should be tpm_chip *, not void *.

The 'if (tpm_data)' is an anti-pattern, please remove it

Same comment applies to patch 9

> +int st33zp24_remove(void *tpm_data)
> +{
> + struct st33zp24_dev *tpm_dev = (struct st33zp24_dev *)tpm_data;
> + struct tpm_chip *chip = tpm_dev->chip;
> +
> + if (chip)
> + tpm_chip_unregister(chip);
> +
> + return 0;
> +}

Now this looks wrong. st33zp24_dev is the TPM_VPRIV of the chip, but
tpm_data is the tpm_chip already, so that cast is surely wrong.

Again, remove the 'if (chip)' anti-pattern.

> +static struct st33zp24_phy_ops i2c_phy_ops = {
> +   .send = st33zp24_i2c_send,
> +   .recv = st33zp24_i2c_recv,
> +};

Should be 'static const struct', same comment for patch 9

> +   tpm_dev = devm_kzalloc(dev, sizeof(struct st33zp24_dev),
> +  GFP_KERNEL);
> +   if (!tpm_dev)
> +   return -ENOMEM;
> +
> +   chip = tpmm_chip_alloc(dev, &st33zp24_tpm);
> +   if (IS_ERR(chip))
> +   return PTR_ERR(chip);

It is idomatic in the TPM drivers for the tpmm_chip_alloc to be first,
then the priv allocation to be second. Someday we might merge the two
calls.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1 00/10] st33zp24 new architecture proposal and st33zp24 spi driver

2015-01-12 Thread Jason Gunthorpe

On Sat, Jan 10, 2015 at 12:20:21PM +0100, Christophe Ricard wrote:
> Hi,
> 
> The following patchset:
> - clean the current tpm_i2c_stm_st33 driver
> - propose a new architecture allowing to share a core st33zp24 data management
> layer with different phy (i2c & spi). For st33zp24 both phy have a 
> proprietary transport
> protocol. Both are relying on the TCG TIS protocol. At the end, it simplifies 
> the maintenance.
> - Add an spi phy allowing to support st33zp24 using with an SPI bus.

I read through this and not much stood out, just one comment on patch 8

Patch 4 and 6 do not seem to be part of this series and seem to be
bug fixes, the should be seperated so Peter can fast track them
into the kernel.

So excluding patch 8:

Reviewed-By: Jason Gunthorpe 

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/4] fpga manager: add sysfs interface document

2015-01-12 Thread Jason Gunthorpe

On Sun, Jan 11, 2015 at 10:29:00AM -0600, atull wrote:
> the FPGA image.  If someone wants there to be only one FPGA image on
> the FGPA forever, they will probably not be using this framework; their
> FPGA will probably be loaded before Linux boots up.

Nonsense, loading the FPGA through Linux is much better, it avoids
having to deal with this complexity in the boot loader and means that
Linux can be used to locate the huge image in some kind of sensible
filesystem, log failures, do any FPGA startup sequence/etc.

With hotplug and DT I find this this works very well. The FPGA devices
simply are not registered with the kernel until userspace gives the OK
(in future a DT overload can handle this)

If you keep with the notion that the DT overload specifies the
matching FPGA firmware then it makes alot more sense to me to use DT
overlays as the API to change the file name - not sysfs.

To reload a FPGA, unload the DT overlay (this would have to disconnect
all the drivers), de-program the FPGA, the load a new DT overlay,
which reprograms and re-binds the new drivers. All those steps have to
be done anyhow, you can't just swap the HW while drivers are attached.

.. and if there are no drivers then Alan is right, this is the wrong
interface for the 'FPGA as a user space co-processor' model.

I would also re-iterate my earlier comments: requiring the whole FPGA
image to be held in ram makes this useless for any of my applications.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 2/4] PCI: generic: Add support for ARM64 and MSI(x)

2014-10-23 Thread Jason Gunthorpe

On Thu, Oct 23, 2014 at 12:27:31PM +0100, Lorenzo Pieralisi wrote:

> I think that by removing that, we could switch to CONFIG_PCI_DOMAINS_GENERIC
> on ARM32. I will remove the dependency in drivers/pci/host/pci-mvebu.c
> introduced by commit 2613ba48. pci_sys_data.domain is always 0 in that
> driver so its usefulness is doubtful, comments welcome, copied Jason in
> if he has comments.

pcie-mvebu is like all the other new drivers, each top level DT node
that introduces the interface should have a unique domain number. It
would be very strange (and currently unsupported by the driver) to
ever have more than 1 mvebu top level node in any DT.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 11/15] tpm/tpm_i2c_stm_st33: Replace wait_for_serirq_timeout by wait_for_tpm_stat

2014-10-14 Thread Jason Gunthorpe

On Tue, Oct 14, 2014 at 10:44:25PM +0200, Christophe RICARD wrote:

> > > + r =
> > > wait_event_interruptible_timeout(chip->vendor.read_queue,
> > > + check_locality(chip) >= 0,
> > > + timeout);
> > 
> > Can it use wait_for_stat?
> It actually cannot use wait_for_stat because wait_for_stat is waiting
> for a mask condition on the status register. Here we are addressing the
> TPM_ACCESS register.

It would be cleaner if wait_for_stat handled all sleep-for-irq
actions.

> > This seqence is racy, the reason the nuvoton driver has the counter is
> > because wake_up_interruptible only wakes sleeping threads, so the
> > driver has to use the counter to close the race between the enable_irq
> > and the actual sleep:
> > 
> > unsigned int cur_intrs = priv->intrs;
> > enable_irq(chip->vendor.irq);
> > rc = wait_event_interruptible_timeout(*queue,
> >   cur_intrs !=
> > priv->intrs, timeout);
> If my understanding is correct the counter prevent the case where the
> irq is raised before entering into the wait_event_interruptible_timeout.
> wait_for_tpm_stat will check before going into sleep the status
> register in order to make sure the condition is not already satisfied
> and should fulfill this requirement.

Yes, your analysis is correct - removing the extra I2C polling would
be the goal of using the counter rather than an I2C read.

> As i could get different kind of interruption i think i cannot avoid an
> i2c check here. The other solution would be to activate only the right
> interruption in the INT_ENABLE tis register but still with an i2c
> transaction.

But there is a problem with the wrong interruption no matter what, the
wait_event_ loop does not ever re-enable_irq() - any generated IRQ
must already be exactly the IRQ that is being waited for. Basically,
the driver must remain synchronized with the chip and it must know
what IRQs can be generated at any point.

When I read the driver I assumed this was already happening, and the
auditing was going to be done to make sure it is the case, hence my
comments on the idle state.

The advantage is clearly that many I2C transactions can be eliminated
by relying on the IRQ line to signal progress.

The alternative is to have the wait_event loops clear the pending bits
and re-enable the IRQ every time they go around, but this is more
transactions.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 00/15] ST33 I2C TPM driver cleanup

2014-10-14 Thread Jason Gunthorpe

On Mon, Oct 13, 2014 at 10:23:22PM +0200, Christophe Ricard wrote:

>   tpm/tpm_i2c_stm_st33: Update Kconfig in order to be inline to other
> similar product
>   tpm/tpm_i2c_stm_st33: Fix few coding style error reported by
> scripts/checkpatch.pl
>   tpm/tpm_i2c_stm_st33: Move tpm registers to tpm_i2c_stm_st33.c
>   tpm/tpm_i2c_stm_st33: Add new tpm_stm_dev structure and remove
> tpm_i2c_buffer[0], [1] buffer.
>   tpm/tpm_i2c_stm_st33: Remove reference to io_serirq
>   tpm/tpm_i2c_stm_st33: Replace err/rc/ret by r for a function return
> code
>   tpm/tpm_i2c_stm_st33: Replace tpm_st33_* function with tpm_stm_*
>   tpm/tpm_i2c_stm_st33/dts/st33zp24_i2c: Add DTS Documentation
>   tpm/tpm_i2c_stm_st33: Few code cleanup
>   tpm/tpm_i2c_stm_st33: Remove useless i2c read on interrupt registers
>   tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send
>   tpm/tpm_i2c_stm_st33: Change License header to have up to date address
> information
>   tpm/tpm_i2c_stm_st33: Increment driver version to 1.2.1.


You can add my

Reviewed-By: Jason Gunthorpe 

On the above in the series

Thanks,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 11/15] tpm/tpm_i2c_stm_st33: Replace wait_for_serirq_timeout by wait_for_tpm_stat

2014-10-14 Thread Jason Gunthorpe

On Mon, Oct 13, 2014 at 10:23:33PM +0200, Christophe Ricard wrote:
> The tpm layer already provides a function to wait for a TIS event
> wait_for_tpm_stat. Exactly like wait_for_serirq_timeout, it can work in
> polling or interrupt mode.
> Instead of using a completion struct, we rely on the waitqueue read_queue
> and int_queue from chip->vendor field.


>  static int request_locality(struct tpm_chip *chip)
>  {
[..]

>   if (chip->vendor.irq) {
> - r = wait_for_serirq_timeout(chip, (check_locality
> -(chip) >= 0),
> -   chip->vendor.timeout_a);
> +again:
> + timeout = stop - jiffies;
> + if ((long) timeout <= 0)
> + return -1;

I don't see an enable_irq before this sleep:

> + r = wait_event_interruptible_timeout(chip->vendor.read_queue,
> + check_locality(chip) >= 0,
> + timeout);

Can it use wait_for_stat?

>  static int wait_for_stat(struct tpm_chip *chip, u8 mask, unsigned long 
> timeout,
> -  wait_queue_head_t *queue)
> +  wait_queue_head_t *queue, bool check_cancel)
>  {

So, I didn't audit the driver 100%, but this new version has the flow

- enable_irq
- wait for irq
- clear irq

Which is conceptually fine (and efficient), but it requires the rest
of the driver to guarentee that the interrupt is consistent after
every interation with the TPM. Which for this driver I think means one
of two states:
 - No interrupt asserted
 - Interrupt asserted, and the TPM is ready to accept a command
Other states will cause longer command execution times, but not
malfunction.

For instance, it looks to me like request_locality might not maintain
that invariant.

Clearing old pending bits before starting an interaction is certainly
safer, but costs that extra I2C sequence.

> + tpm_dev = (struct tpm_stm_dev *)TPM_VPRIV(chip);
> +
> + if (chip->vendor.irq)
> + enable_irq(chip->vendor.irq);
> +
> + r = wait_for_tpm_stat(chip, mask, timeout, queue, check_cancel);

This seqence is racy, the reason the nuvoton driver has the counter is
because wake_up_interruptible only wakes sleeping threads, so the
driver has to use the counter to close the race between the enable_irq
and the actual sleep:

unsigned int cur_intrs = priv->intrs;
enable_irq(chip->vendor.irq);
rc = wait_event_interruptible_timeout(*queue,
  cur_intrs != priv->intrs,
  timeout);

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 08/15] tpm/tpm_i2c_stm_st33: Add devicetree structure

2014-10-14 Thread Jason Gunthorpe

On Mon, Oct 13, 2014 at 10:23:30PM +0200, Christophe Ricard wrote:
> Add tpm_stm_st33_i2c dts structure keeping backward compatibility
> with static platform_data support as well.
> In the mean time the code is made much simpler by:
> - Moving all gpio_request to devm_gpio_request_one primitive

> - Moving request_irq to devm_request_threaded_irq

This should move to patch 11, and it doesn't look like threaded_irq is
necessary anymore?

> + pr_err("Failed to retrieve lpcpd-gpios from dts.\n");

All these prints should be dev_err, I think, clinet->dev looks like it
must be valid at this point.

This is a general comment actually, all the pr_* prints look like they
have access to a struct device and should be dev_* prints.

> + /* GPIO request and configuration */
> + r = devm_gpio_request_one(&client->dev, gpio,
> + GPIOF_OUT_INIT_HIGH, "TPM IO LPCPD");
> + if (r) {
> + pr_err("Failed to request lpcpd pin\n");

Ditto

> + return -ENODEV;

Return r?

> + pdata = client->dev.platform_data;
> + if (pdata == NULL) {
> + pr_err("No platform data\n");
> + return -EINVAL;
> + }

It would be nice if the platform data was optional since it is now
the case that the only content (io_lpcd) is optional.

> + if (r) {
> + pr_err("%s : reset gpio_request failed\n", __FILE__);
> + return -ENODEV;

Ditto

>   tpm_dev = devm_kzalloc(&client->dev, sizeof(struct tpm_stm_dev),
>  GFP_KERNEL);
>   if (!tpm_dev) {
>   dev_info(&client->dev, "cannot allocate memory for tpm
>  data\n");

The print is redundant, kzalloc failure already prints lots of stuff

> + platform_data = client->dev.platform_data;
> + if (!platform_data && client->dev.of_node) {
> + r = tpm_stm_i2c_of_request_resources(chip);
> + if (r) {
> + pr_err("No platform data\n");

If we go down this branch then tpm_stm_i2c_of_request_resources
already printed, so this print is redundant

> + goto _tpm_clean_answer;
> + }
> + } else if (platform_data) {
> + r = tpm_stm_i2c_request_resources(client, chip);
> + if (r) {
> + pr_err("Cannot get platform resources\n");

Ditto

> + goto _tpm_clean_answer;
> + }
> + } else {
> + pr_err("tpm_stm_st33 platform resources not available\n");
> + goto _tpm_clean_answer;

Again, would be nice if platform data was optional

>   intmask |= TPM_INTF_CMD_READY_INT
> - |  TPM_INTF_FIFO_AVALAIBLE_INT
> - |  TPM_INTF_WAKE_UP_READY_INT
> - |  TPM_INTF_LOCALITY_CHANGE_INT
>   |  TPM_INTF_STS_VALID_INT
>   |  TPM_INTF_DATA_AVAIL_INT;

This hunk should also go to patch 11, I think..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 11/16] tpm/tpm_i2c_stm_st33: Remove useless i2c read on interrupt registers

2014-10-09 Thread Jason Gunthorpe

On Thu, Oct 09, 2014 at 07:35:07PM +0200, Christophe Henri RICARD wrote:
> Hi Jason,
> 
> > If your chip is sane like other TPMs the IRQ pin will *only* be
> > asserted while there is data pending in the out command FIFO,
> > reading the FIFO *should naturally clear the IRQ *and the acking
> > process may be entirely unnecessary and can be removed.

> I believe this assessment is wrong according to existing
> specifications and current implementation and I will explain you
> why:

It does sound like it is not how the ST33 chip operates.

> Our TPM is managing the TPM TIS registers Interrupt Enable and
> Interrupt Status.

The TIS register mapping was intended for environments where register
access is not expensive, and can be done from an ISR. I2C is not such
an environment, which is why other vendors are not using such a strict
mapping of the TIS registers.

That doesn't really change anything, it just means the driver has to
do more I2C transactions to manage this extra chip state.

> According to which one is triggered, the wait queue to wake up is
> different it could be chip->vendor.read_queue or
> chip->vendor.int_queue.

These different queues are not needed, the driver knows what it is
waiting for when it enables the interrupt handler, and can manipulate
the interrupt mask if necessary for special cases. Multiple queues
make more sense when the ISR can cheaply read the status register,
which is not true for I2C.

> I would point out as well the int_queue initialization in the
> i2c_nuvoton_probe which is never used in the tpm_i2c_nuvoton.c nor
> in any tpm core file.

Chip structure members that are not used in the core code are hold
overs from the ancient core design. Drivers ideally should not use
them, favoring their own structures in their driver private
allocation. Someday that will get cleaned up.

Don't get confused that int_queue and read_queue are in the global
chip, they have no special meaning, modern drivers should not use
them at all.

> However, I am not in favor to change to non-threaded irq unless I
> have a clear and convincing argument to do so.

The need for the udelay should be all the convincing required. The
reason for the udelay clearly shows the driver has a synchronization
problem - and sleeping to solve synchronization problem is rarely
correct.

This is especially true since I've already explained how to design so
everything is synchronous and solve the issue.

Again:

In TPM the interrupt is not delivering an asynchronous notification,
everything is synchronous to the driver state, ISRs happen only to
indicate completion of driver initiated actions.

This is why the synchronous flow I suggested is much safer and better,
the driver just can't get the situation where the IRQ cannot safely
run because the main driver thread has changed the TPM state.

To summarize, the flow for an interrupt wait becomes very
simple, ie to wait for command ready:

 - Do write to clear all interrupts
 - do read on command ready
 - if not ready then write to enable only the command ready occured interrupt
 - enable_irq
 - wait for irq w/ timer
 - do read on command ready
 - if not ready and no timeout, do write to clear all interrupts,
   enable_irq, loop.
 - if not ready and timeout, disable irq, return error

All sleepables follow the same synchronous pattern. This makes the
driver compeltely single threaded so no analysis for thread problems
is need. No locking primitives are needed. The inter-locking problem
with request_locality goes away.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 07/16] tpm/tpm_i2c_stm_st33: Add devicetree structure

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 11:14:40PM +0200, Christophe Henri RICARD wrote:

> For your information, the lpcpd pin is named like this as per
> PCClientTPMSpecification is also known as TPMSTB on this product.
> The TPMSTB pin can be used in order to support D1 and D2 power
> management state.  I am actually sorry to refer to Microsoft website
> here ;) :

So, having the option to enter D1/D2 through that pin in the driver
seems fine, and it should just be enabled if the driver is told what
that pin is. No need for a module parameter, and no need for the
driver to fail probe if it isn't told how to access that GPIO.

However.. suspend/resume are called from the broader PM framework and
a platform might arrange things so the TPM is put into a deeper sleep
than D1/D2 (ie entirely powered off because the CPU was put into
suspend-to-ram and the peripheral power converter was switched off)

Thus the driver really must save the TPM state at suspend, and if the
volatile state was wiped after resume then it must be restored from
the saved state.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 07/16] tpm/tpm_i2c_stm_st33: Add devicetree structure

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 10:41:36PM +0200, Christophe Henri RICARD wrote:
> Hi Jason,
> 
> I have currently seen both. I agree on the principles as it is
> simplifying the code a little bit.  I including this clean up in
> this patch Add devicetree structure for a future v2 submission

I was just looking at what the remaining lpcpd GPIO does, and..

1) I think you can probably just make this optional, if it is not
   specified in the DT or through platform data, just don't fiddle with
   it. This makes the driver much friendlier - ie it can be manually
   attached via sysfs for instance.

2) This looks fishy:

static int tpm_st33_i2c_pm_suspend(struct device *dev)
if (power_mgt) {
gpio_set_value(pin_infos->io_lpcpd, 0);
} else {
ret = tpm_pm_suspend(dev);
}

   I can't think of a reason why the driver wouldn't need to call
   tpm_pm_suspend to save the internal state? Ditto for restore?

   I'm also suspicious of the power_mgt module option, that should
   probably go away entirely?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 10/16] tpm/tpm_i2c_stm_st33: Replace wait_for_serirq_timeout by wait_for_tpm_stat

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 10:26:57PM +0200, Christophe Henri RICARD wrote:
> Hi Jason,
> 
> Patch 10/16 includes the following lines:
> + if (r == -ERESTARTSYS && freezing(current)) {
> + clear_thread_flag(TIF_SIGPENDING);
> + goto again;
> + }
> 
> Line 149 in this patch. I am surprise you are saying it does not occur.

Okay, I see it now, it is in request_locality, not get_burstcount

And I see this code fragment is copied from commit 20b87bbfada, so it
seems reasonable - though I wonder if all wait_event's need similar
protection?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 11/16] tpm/tpm_i2c_stm_st33: Remove useless i2c read on interrupt registers

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 07:36:04AM +0200, Christophe RICARD wrote:

> I believe this is completely 2 different things. The delay before the
> release locality is to make sure that the isr will be service before
> release_locality to guarantee
> that any pending interrupt are cleared while the locality is active.
> Here i just want to save 2 i2c transaction as only 1 write is enough
> to get the register change as effective.

I think I follow the interrupt changes a bit better now..

First of all, things are spread out too much, ie patch 10 makes the
actual ISR handler change but patch 13 corrects the locking bug
introduced in patch 10, and patch 7 switches to the threaded irq
required by patch 13 - this should all be in the same patch.

Second, I don't think switching to threaded IRQs and then using I2C
transactions in the handler is a great idea. I think you should follow
the pattern in the nuvoton driver:

The IRQ handler simply records the interrupt occured, triggers a WQ
and disables the IRQ:

static irqreturn_t i2c_nuvoton_int_handler(int dummy, void *dev_id)
{
struct tpm_chip *chip = dev_id;
struct priv_data *priv = chip->vendor.priv;

priv->intrs++;
wake_up(&chip->vendor.read_queue);
disable_irq_nosync(chip->vendor.irq);
return IRQ_HANDLED;
}

The ops explicitly enables the IRQ before sleeping waiting on the
output FIFO:

if (chip->vendor.irq && queue) {
s32 rc;
struct priv_data *priv = chip->vendor.priv;
unsigned int cur_intrs = priv->intrs;

enable_irq(chip->vendor.irq);
rc = wait_event_interruptible_timeout(*queue,
  cur_intrs != priv->intrs,
  timeout);
if (rc > 0)
return 0;

For your chip you might need to ack the IRQ before writing a new
command to the input FIFO.

1) This gets rid of the udelay, the IRQ doesn't do anything so any
   acks can be sequenced properly with the request_locality
2) This gets rid of the locking, the IRQ doesn't attempt to ack, and
   acks can be sequenced before any enable_irq

If your chip is sane like other TPMs the IRQ pin will only be asserted
while there is data pending in the out command FIFO, reading the FIFO
should naturally clear the IRQ and the acking process may be entirely
unnecessary and can be removed. If you have to ack via that weird
read/write sequence then it should always be done before submitting a
new command to the input fifo.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 07/16] tpm/tpm_i2c_stm_st33: Add devicetree structure

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 07:47:58AM +0200, Christophe RICARD wrote:
> >>+   if (interrupts) {
> >>+   r = devm_gpio_request_one(&client->dev, pdata->io_serirq,
> >>+   GPIOF_IN, "TPM IO_SERIRQ");
> >Similarly, I wonder if pdata->io_serirq is just duplication of
> >client->irq and that should be set by the creator instead?

> pdata->io_serirq stores the gpio number which will be converted into
> irq number.  pdata->io_serirq is only use by static platform
> configuration not devicetree configuration

Right, but the driver never uses it as a GPIO, so accepting a GPIO is
actually less flexible - a platform may connect the TPM to a dedicated
IRQ pin, for instance.

The creator should just specify the irq in client->irq, however that is
typically done..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 14/16] tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 09:38:39AM +0200, Christophe Ricard wrote:
>I would just add an additional comment to my previous message.
>After reviewing Documentation/stable_kernel_rules.txt, it looks like it
>is *not* necessary to send this patch marked as stable because of the
>following rule:
>"- It must fix a real bug that bothers people (not a, "This could be a
>problem..." type thing)."
>As argued previously this patch is more an algorithm fix luckily never
>seen with the current ST33 I2C TPM devices.
>I looks like a "This could be a problem..."
>Do you agree with this ?

Yes, please clarify your patch description with something like:

'This never happens in practice because the burst count is XXX bytes
and the largest TPM command is YYY bytes.'

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 10/16] tpm/tpm_i2c_stm_st33: Replace wait_for_serirq_timeout by wait_for_tpm_stat

2014-10-08 Thread Jason Gunthorpe

On Wed, Oct 08, 2014 at 07:38:30AM +0200, Christophe RICARD wrote:
>  Hi Jason,
> 
> The freezer header is here for the freezing(current) function call
> in get_burstcount().

But freezing(current) does not occur in patch 10/16.

Is this include in the right patch in the series?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 07/16] tpm/tpm_i2c_stm_st33: Add devicetree structure

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:00PM +0200, Christophe Ricard wrote:

> +_irq_probe:
> + /* IRQ */
> + r = irq_of_parse_and_map(pp, 0);
> + if (r < 0) {
> + pr_err("Unable to get irq, error: %d\n", r);
> + interrupts = 0;
> + goto _end;
> + }
> + interrupts = 1;
> + client->irq = r;

client->irq is already set correctly by of_i2c_register_devices - so I
don't think this sequence is necessary?

> + if (interrupts) {
> + r = devm_gpio_request_one(&client->dev, pdata->io_serirq,
> + GPIOF_IN, "TPM IO_SERIRQ");

Similarly, I wonder if pdata->io_serirq is just duplication of
client->irq and that should be set by the creator instead?

> +#ifdef CONFIG_OF
> +static const struct of_device_id of_st33zp24_i2c_match[] = {
> + { .compatible = "st,st33zp24_i2c", },
> + {}
> +};

missing:

MODULE_DEVICE_TABLE(of, of_st33zp24_i2c_match);

> + #ifdef CONFIG_OF
> + .of_match_table = of_match_ptr(of_st33zp24_i2c_match),
> + #endif

The #ifdef is unnecessary, the of_match_ptr macro takes care of that
already.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 00/16] ST33 I2C TPM driver cleanup

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:02:53PM +0200, Christophe Ricard wrote:
> Hi Peter,
> 
> The following patchset brings:
> - Some few code clean up from code style up to structure
> - Device tree support keeping static platform data configuration support.
> - Fix irq support.
> - Update the GPLv2 license header

I really don't have much to say about this - it looks like the older
driver was never being used so if you have reviewed and tested it
again in a new system I think that is an improvement..

Can you put this on a github or something? It would be nice to see the
file new .c file rather than looking at patches..

>   tpm/tpm_i2c_stm_st33: Move tpm registers to tpm_i2c_stm_st33.c
>   tpm/tpm_i2c_stm_st33: Add new tpm_stm_dev structure and remove
> tpm_i2c_buffer[0], [1] buffer.

I'm very happy to see this, this is much better than the old way.

>   tpm/tpm_i2c_stm_st33: Replace err/rc/ret by r for a function return
> code

This is very churny, and I didn't think 'r' was a kernel convention..

>   tpm/tpm_i2c_stm_st33: Replace tpm_st33_* function with tpm_stm_*

What is the part branding? We still have 'st33' in various places..

>   tpm/tpm_i2c_stm_st33: Add devicetree structure
>   tpm: dts: st33zp24_i2c: Add DTS Documentation

Peter: FYI, the protocol for device tree stuff is to wait for an ack
from the device tree maintainers on the documentation patch..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 10/16] tpm/tpm_i2c_stm_st33: Replace wait_for_serirq_timeout by wait_for_tpm_stat

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:03PM +0200, Christophe Ricard wrote:

> +++ b/drivers/char/tpm/tpm_i2c_stm_st33.c
> @@ -39,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I'm surprised to see that include here..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 11/16] tpm/tpm_i2c_stm_st33: Remove useless i2c read on interrupt registers

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:04PM +0200, Christophe Ricard wrote:
> Remove useless i2c read on TPM_INT_ENABLE and TPM_INT_STATUS
> 
> Signed-off-by: Christophe Ricard 
>  drivers/char/tpm/tpm_i2c_stm_st33.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_i2c_stm_st33.c 
> b/drivers/char/tpm/tpm_i2c_stm_st33.c
> index e99bb78..660ff8b 100644
> +++ b/drivers/char/tpm/tpm_i2c_stm_st33.c
> @@ -187,7 +187,6 @@ static u8 clear_interruption(struct tpm_stm_dev *tpm_dev)
>  
>   I2C_READ_DATA(tpm_dev, TPM_INT_STATUS, &interrupt, 1);
>   I2C_WRITE_DATA(tpm_dev, TPM_INT_STATUS, &interrupt, 1);
> - I2C_READ_DATA(tpm_dev, TPM_INT_STATUS, &interrupt, 1);

Hum, I don't have a chip datasheet here, but is this really useless?

It looks like a synchronizing read to me - ie completing the read at
the CPU is enough to know that the TPM itself has processed the prior
write and is known to have lowered the level triggered interrupt..

A read like this is the sort of thing that you'd expect to avoid the
udelay in your 'Add delay before release_locality to make sure irq are
cleared'

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 14/16] tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:07PM +0200, Christophe Ricard wrote:
> When sending data in tpm_stm_i2c_send, each loop iteration send buf.
> Send buf + i instead as the goal of this for loop is to send a number
> of byte from buf that fit in burstcnt. Once those byte are sent, we are
> supposed to send the next ones.

So, this driver never really worked? I'm guessing sending a larger
command (take ownership, for example) will exceed the burst count and
just blow up?

This should be marked for stable (please see
Documentation/stable_kernel_rules.txt)

Also, please make it the first patch in your series so it applies
cleanly to older kernels.

Reviewed-By: Jason Gunthorpe 

> Signed-off-by: Christophe Ricard 
>  drivers/char/tpm/tpm_i2c_stm_st33.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/tpm_i2c_stm_st33.c 
> b/drivers/char/tpm/tpm_i2c_stm_st33.c
> index 8d32ade..de9f12e 100644
> +++ b/drivers/char/tpm/tpm_i2c_stm_st33.c
> @@ -480,7 +480,7 @@ static int tpm_stm_i2c_send(struct tpm_chip *chip, 
> unsigned char *buf,
>   if (burstcnt < 0)
>   return burstcnt;
>   size = min_t(int, len - i - 1, burstcnt);
> - r = i2c_write_data(tpm_dev, TPM_DATA_FIFO, buf, size);
> + r = i2c_write_data(tpm_dev, TPM_DATA_FIFO, buf + i, size);
>   if (r < 0)
>   goto out_err;
>  
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 15/16] tpm/tpm_i2c_stm_st33: Add delay before release_locality to make sure irq are cleared

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:08PM +0200, Christophe Ricard wrote:
> In order to manage irq, locality must be active. As Status Ready
> interrupt is activated, when going back into ready state with the
> cancel function, we need to add a little delay to make sure the irq
> is going to be serviced before the release_locality is hit.

Using a delay for this seems pretty sketchy, is this supposed to be
waiting for a level sensitive IRQ to desassert? or a queued IRQ to be
delivered?

What is the harm without the udelay?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 13/16] tpm/tpm_i2c_stm_st33: Add tpm_lock mutex for safe irq management

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:03:06PM +0200, Christophe Ricard wrote:
> Adding tpm_lock mutex in order to guarantee that a i2c_read_data or a
> i2c_write_data will not get interrupted by a threaded interrupt.

Can you elaborate on this?

Any call from the TPM core itself through 'ops' is locked already, and
I don't see this driver's IRQ handler doing I2C ops..

What scenario is this protecting against?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tpmdd-devel] [PATCH 01/16] tpm/tpm_i2c_stm_st33: Update Kconfig in order to be inline to other similar product

2014-10-07 Thread Jason Gunthorpe

On Tue, Oct 07, 2014 at 10:02:54PM +0200, Christophe Ricard wrote:
> STMicroelectronics i2c tpm is the only one to have a different tristate
> label.
> 
> Rename it "TPM Interface Specification 1.2 Interface (I2C - 
> STMicroelectronics)"

Looks fine, but why move the stanza? Are you sorting on the short
description? (vs config tag?)

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 1/4] pci:host: APM X-Gene PCIe host controller driver

2014-09-22 Thread Jason Gunthorpe

On Mon, Sep 22, 2014 at 04:09:03PM -0600, Bjorn Helgaas wrote:
> On Mon, Sep 22, 2014 at 3:33 PM, Tanmay Inamdar  wrote:
> 
> >>> +static void xgene_pcie_fixup_bridge(struct pci_dev *dev)
> >>> +{
> >>> + int i;
> >>> +
> >>> + /* Hide the PCI host BARs from the kernel as their content doesn't
> >>> +  * fit well in the resource management
> >>> +  */
> >>
> >> This needs a better explanation than "doesn't fit well."
> >>
> >> I *think* you're probably talking about something similar to the MVEBU
> >> devices mentioned here:
> >> http://lkml.kernel.org/r/caerspo56jb1bf2jtycgkxzbzqrf1jxfxgmeewpx_e6vsxue...@mail.gmail.com
> >>
> >> where the device can be configured as either an endpoint or a root port,
> >> and the endpoint BARs are still visible when configured as a root port.
> >>
> >
> > It is true that X-Gene PCIe port can be configured as EP, however the
> > the FIXUP is required not because of the BARs are still visible when
> > configured as EP in past. X-Gene PCIe port, when configured as RC,
> > uses BAR0-BAR1 of RC's configuration space as inbound BARs. Entire DDR
> > region is mapped in these BARs so that it is accessible for EP devices
> > for DMA. So if the fixup is not done during enumeration, whole
> > outbound memory resource space gets assigned to RC and nothing is left
> > EP devices.
> 
> I'm not familiar with the concept of an "inbound BAR"; at least I'm
> not aware of anything like this in the PCI specs.  I think it might
> reduce confusion if we avoided calling them "BARs".  They happen to be
> at the same addresses where real BARs would be, but they certainly
> don't behave like real BARs.

So what does the lspci look like?

If x-gene has an otherwise correct bridge config space and only the
0/1 BARs have a non-standard meaning, that I'd agree with Bjorn, hide
all access to these registers from the kernel (and tell your HW crew
to read the specs, because it is very clear what those registers are
supposed to do) .

If x-gene doesn't even have a bridge config space, then there are much
bigger problems, and these corrupt BARs are just the start.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v10 07/10] OF: Introduce helper function for getting PCI domain_nr

2014-09-10 Thread Jason Gunthorpe

On Tue, Sep 09, 2014 at 09:44:56PM -0500, Rob Herring wrote:
> On Tue, Sep 9, 2014 at 10:41 AM, Jason Gunthorpe
>  wrote:
> > On Tue, Sep 09, 2014 at 08:26:19AM -0600, Bjorn Helgaas wrote:
> >
> >> So as long as the DT tells you the ECAM information for each host
> >> bridge, that should be sufficient.  The domain number is then just a
> >> Linux convenience and is not tied to the platform as it is on ia64.
> >
> > I think this is right for DT systems - the domain is purely internal
> > to the kernel and userspace, it is used to locate the proper host
> > bridge driver instance, which contains the proper config accessor (and
> > register bases, etc).
> >
> > AFAIK the main reason to have a DT alias to learn the domain number is
> > to make it stable so things like udev/etc can reliably match on the
> > PCI location.
> 
> For what purpose?

udev places PCI D:B:D.F's all over the place, eg:

$ ls /dev/serial/by-path/
pci-:00:1a.0-usb-0:1.4.2.5:1.0@
pci-:00:1a.0-usb-0:1.4.2.6:1.0@
$ ls /dev/disk/by-path/
pci-:03:00.0-sas-0x5fcf0001-lun-0@
pci-:03:00.0-sas-0x5fcf0001-lun-0-part1@
pci-:03:00.0-sas-0x5fcf0001-lun-0-part2@
pci-:03:00.0-sas-0x5fcf0001-lun-0-part3@

domain number

It is part of the stable naming scheme udev uses - and that scheme is
predicated on the PCI location being stable.

> IMO they should not. We really want to move away from aliases, not
> expand their use. They are used for serial because there was no good
> way to not break things like "console=ttyS0". I2C I think was more
> internal, but may have been for i2c-dev. What are we going to break if
> we don't have consistent domain numbering? 

Well, DT needs some kind of solution to produce stable names for
things. It is no good if the names unpredictably randomize.

Lets use I2C as an example. My embedded systems have multiple I2C
busses, with multiple drivers (so load order is not easily ensured).
I am relying on DT aliases to force the bus numbers to stable values,
but if I don't do that - then how do I find things in user space?

$ ls -l /sys/bus/i2c/devices
lrwxrwxrwx1 root root0 Sep 10 16:12 i2c-2 -> 
../../../devices/pci:00/:00:01.0/i2c_qsfp.4/i2c-2

Oh, I have to search based on the HW path, which includes the domain
number.

So that *must* be really stable, or user space has no hope of mapping
real hardware back to an I2C bus number.

The same is true for pretty much every small IDR scheme in the kernel
- they rely on the HW path for stable names.

As an aside, I think for embedded being able to directly specify
things like the bus number for I2C, ethX for ethernet, etc is very
valuable, I would be sad to see that go away.

> DT, I'd rather see it as part of the PCI root node. But I'm not
> convinced it is needed.

> It doesn't really sound like we have any actual need to solve this for
> DT ATM. It's not clear to me if all buses should be domain 0 or a
> simple incrementing index for each bus in absence of any firmware set
> value.

There are already ARM DTs in the kernel that require multiple domains,
so that ship has sailed.

I don't think it really matters where the number comes from, so long
as it doesn't change after a kernel upgrade, or with a different
module load order, or if the DT is recompiled, or whatever. It needs
to be stable.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v10 07/10] OF: Introduce helper function for getting PCI domain_nr

2014-09-09 Thread Jason Gunthorpe

On Tue, Sep 09, 2014 at 08:26:19AM -0600, Bjorn Helgaas wrote:

> So as long as the DT tells you the ECAM information for each host
> bridge, that should be sufficient.  The domain number is then just a
> Linux convenience and is not tied to the platform as it is on ia64.

I think this is right for DT systems - the domain is purely internal
to the kernel and userspace, it is used to locate the proper host
bridge driver instance, which contains the proper config accessor (and
register bases, etc).

AFAIK the main reason to have a DT alias to learn the domain number is
to make it stable so things like udev/etc can reliably match on the
PCI location.

This is similar to i2c, etc that use the alias scheme, so IMHO
whatever they do to assign ID's to drivers should be copied for domain
numbers.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v10 07/10] OF: Introduce helper function for getting PCI domain_nr

2014-09-08 Thread Jason Gunthorpe

On Mon, Sep 08, 2014 at 04:59:31PM +0100, Liviu Dudau wrote:

> > I don't really understand how domains are used so it's hard to provide
> > a recommendation here. Do domains even belong in the DT?
> 
> ACPI calls them segments and the way Bjorn explained it to me at some moment 
> was
> that it was an attempt to split up a bus in different groups (or 
> alternatively,
> merge a few busses together). To be honest I haven't seen systems where the 
> domain
> is anything other than zero, but JasonG (or maybe Benjamin) were floating an
> idea of using the domain number to identify physical slots.

A PCI domain qualifies the bus:device.function addressing so that we
can have duplicate BDFs in the system.

So, yes, they belong in DT - each 'top level' PCI node naturally
represents a unique set of BDF addressing below it, and could alias
other top level nodes. The first step to resolve a BDF to a DT node is
to find the correct 'top level' by mapping the domain number.

In an ideal world no small scale system should ever have domains. They
were primarily introduced for large scale NUMA system that actually
have more than 256 PCI busses.

However, from a DT prespective, we've been saying that if the SOC
presents a PCI-E root port that shares nothing with other root ports
(ie aperture, driver instance, config space) then those ports must be
represented by unique 'top level' DT nodes, and each DT node must be
assigned to a Linux PCI domain.

IMHO, designing such a SOC ignores the API guidelines provides by the
PCI-E spec itself, and I hope such a thing won't be considered SBSA
compatible..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ARM: zynq: DT: Add Ethernet phys

2014-08-29 Thread Jason Gunthorpe

On Fri, Aug 29, 2014 at 11:23:57AM -0700, Florian Fainelli wrote:
> On 08/29/2014 10:31 AM, Jason Gunthorpe wrote:
> > On Fri, Aug 29, 2014 at 08:35:36AM -0700, Sören Brinkmann wrote:
> > 
> >> The compatible string is listed as optional property for PHYs. So, not
> >> having one is an option, I guess. But, I'd also prefer to at least keep
> >> the -c22 one, since I saw problems when I tried using -c45 (the Zed phy
> >> should support it...).
> > 
> > -c45 and -c22 use a completely different MDIO protocol, Zed doesn't
> > have a 10GE port, so it certainly doesn't use -c45.
> 
> Most recent 1GbE PHYs should also implement clause 45. It is a nice
> improvement if you are using lot of transactions, otherwise clause 45
> over clause 22 is suitable and supported by the PHY library (for EEE in
> particular).

Oh, that is interesting, I haven't actually seen one of those yet..

Hum. So that is messy, even if the Zed phy supports the C45 format,
the macb driver (and by my reading, the Zynq hardware) lacks the
capability to generate C45 frames.

It could be supported, but you'd have to use the GPIO bitbang MDIO
driver to talk to the phy.

So... that makes the compatible string for the phy even more
confusing. 'Describe The HW' says it should have both c22 and c45
listed - however we don't have software support in Linux to negotiate
c22 and c45 support between the phy bus driver and attached phy :(

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ARM: zynq: DT: Add Ethernet phys

2014-08-29 Thread Jason Gunthorpe

On Fri, Aug 29, 2014 at 08:35:36AM -0700, Sören Brinkmann wrote:

> The compatible string is listed as optional property for PHYs. So, not
> having one is an option, I guess. But, I'd also prefer to at least keep
> the -c22 one, since I saw problems when I tried using -c45 (the Zed phy
> should support it...).

-c45 and -c22 use a completely different MDIO protocol, Zed doesn't
have a 10GE port, so it certainly doesn't use -c45.

If there is no compatible string at all the defined default is to use
-c22.

> Also, so far, we haven't had any phy nodes in our Zynq dts files and
> Ethernet worked, so the auto-detection there works pretty well
> apparently. But it may be problematic if more than a single PHY is on
> the MDIO bus, I'd assume.

Phy autodetection has always worked in some cases, but for DT ethernet
bindings it is expected that there is an option to specify an explicit
MDIO bus, and a phy-handle phandle to point to the phy. The phy should
be explicitly called out with the fixed MDIO address specified to
avoid sketchy MDIO address auto probing.

This provides the framework to specify interrupts and register
overrides for the phy in the DT.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ARM: zynq: DT: Add Ethernet phys

2014-08-25 Thread Jason Gunthorpe

On Fri, Aug 22, 2014 at 01:47:09PM -0700, Florian Fainelli wrote:

> >  - the ID based strings seem to be not needed since, IIUC, the core
> >reads the ID from the PHY and uses it, so I just left it out not
> >trying to figure out how to obtain the correct ID
> 
> It is not needed, but it is one way to specify a PHY device if you do
> not know what compatible string to use instead.

No, it is a way to specify a PHY device if the kernel can't auto probe
the Phy ID.

Last I checked, the kernel doesn't support plain text compatible
strings for phys - everything is driven on the phy id, either auto
probed or specified in the DT.

> >  - the marvell compatible strings are used in our vendor tree. They
> >aren't used anywhere but in our vendor tree. I though keeping them is
> >nice since it identifies the PHY fully. And in case that level of
> >detail is needed at some point it is already there.
> 
> And this is the recommended way to do it in case we ever need to key a
> software decision based on the hardware.

All compatible strings need to be documented.

.. and they need to encode more information than you get from the phy
id - die revsision, package option, functional options, voltage
codes. Etc.

.. and they actually need to be *right*

An example: The kernel reports 88E1318S for all four chips in that
family, AFAIK you have to read the package marking to figure out which
you have (it is the same die, with options switched on/off at
packaging time). People have already posted patches trying to
helpfully add a 'marvell,88E1318S' compatible string based on kernel
output. Except it is wrong, it isn't actually the '8S version in the
HW.

Even worse, Marvell has a whole series of socket compatible phys. Just
because the board the DT author looked at has a '318, doesn't mean
that every board ever made will. We've actually already been switching
between the 318 and 318S for production depending on which has part
availability.

Basically: don't try to override self-discoverable hardware in DT
without a really good reason.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ARM: zynq: DT: Add Ethernet phys

2014-08-22 Thread Jason Gunthorpe

On Fri, Aug 22, 2014 at 09:31:20AM -0700, Sören Brinkmann wrote:

> > >  - the marvell compatible strings are used in our vendor tree. They
> > >aren't used anywhere but in our vendor tree. I though keeping them is
> > >nice since it identifies the PHY fully. And in case that level of
> > >detail is needed at some point it is already there.
> > 
> > DT is complex enough already, don't include useless compatible strings
> > in mainline - people will see and tend to blindly copy. Ideally your
> > vendor tree would use the standard property :|
> 
> There are different opinions on this. I also heard to just add such
> strings, so in case this level of differentiation becomes necessary at
> some later point, it is already existing.

In some cases, yes, and they should be docuemented in the binding, of
course.

But there is already a standard compatible string format to uniquely
identify a phy, so in this case another string is not justified, and
just creates confusion about what is correct.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] ARM: zynq: DT: Add Ethernet phys

2014-08-22 Thread Jason Gunthorpe

On Thu, Aug 21, 2014 at 08:49:19AM -0700, Sören Brinkmann wrote:

> So my thinkings:
>  - the compatible string with the -c22 is used and documented in the phy
>bindings, it should be there.

Yes

>  - the ID based strings seem to be not needed since, IIUC, the core
>reads the ID from the PHY and uses it, so I just left it out not
>trying to figure out how to obtain the correct ID

It is certainly optional, I added the property to solve some obscure
problems with probing, but I've noticed people using it more broadly.
I suspect it also speeds up booting a tiny bit.

>  - the marvell compatible strings are used in our vendor tree. They
>aren't used anywhere but in our vendor tree. I though keeping them is
>nice since it identifies the PHY fully. And in case that level of
>detail is needed at some point it is already there.

DT is complex enough already, don't include useless compatible strings
in mainline - people will see and tend to blindly copy. Ideally your
vendor tree would use the standard property :|

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/2] ARM: shmobile: r8a7790: add internal PCI bridge nodes

2014-06-23 Thread Jason Gunthorpe


> >> Hm, are you sure about that? I thought only PCI devices should have 
> >> it...
> 
> >Yes, pretty sure it's needed in both the host controller and the
> >devices.
> 
> I don't understand the case of the PCI devices, honestly.
> Shouldn't the "device_type" prop reflect the device's functionality
> rather than the bus where it's located?

It is confusingly named, but it is required on the host bridge OF node.

The spec says ' .. corresponding to a device that implements a PCI bus',
which includes the host bridge.

The key element is that it must be present on the node that introduces
the PCI 3 dword address encoding scheme, and then on all nodes below
it that use the 3 dword scheme.

Otherwise Linux common OF PCI code does not work properly.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] of: treat PCI config space as IORESOURCE_MEM type

2014-05-30 Thread Jason Gunthorpe

On Fri, May 30, 2014 at 02:41:17AM +0100, Liviu Dudau wrote:

> Agree, I'm only concerned that if this ECAM config space gets added to
> the list of pci_host_bridge windows it will be indistinguishable from
> IORESOURCE_MEM resources and pci_create_root_bus() will add it to the
> bus and allow devices present on that bus to be assigned addresses from
> that range. Which might not be what one wants for certain BARs.

I wouldn't worry about supporting config in ranges. ECAM is the
logical use for config ranges, but it isn't specified and probably
will never be.

Will's driver the is the only driver I've seen to support ECAM and it
didn't use ranges.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 03/18] PCI: designware: Configuration space should be specified in 'reg'

2014-05-29 Thread Jason Gunthorpe

On Thu, May 29, 2014 at 11:03:36AM -0500, Kumar Gala wrote:

> Just because the kernel doesn’t handle this is NO reason to change
> the way the DT works.

The OF specs do not specify how to process a config type ranges entry,
and we all mutually agreed that the only sane interpretation for such
a thing would be to describe an ECAM memory space so generic code
could potentially make use of it.

Since designware is not ECAM it should not use config ranges.

This has come up multiple times now, and the above is the consensus.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 0/3] Add support for PCI in AArch64

2014-05-21 Thread Jason Gunthorpe

On Wed, May 21, 2014 at 12:34:21PM +0100, Liviu Dudau wrote:
> On Wed, May 21, 2014 at 04:45:29PM +0530, Sunil Kovvuri wrote:
> > Hi Liviu,
> > 
> > Sorry for the trouble.
> > I got why 'res->parent' is not set in my case.
> > Basically my SR-IOV device has fixed resources, so resources will not
> > be allocated/assigned and hence parent resource is not set.
> > I will move the resource claiming to host controller driver as a fixup
> > so that parent resource hierarchy is set.
> > 
> > Thanks for the support.
> 
> Glad you worked out the cause for the problem. I will still at to my list of
> ToDo things to investigate resource parenting with my patchset.

We recently fixed some things in this area on mvebu. It is important
to ensure that the aperature in the host driver has a proper resource
associated with it, or the PCI core won't create sub resources.

commit 2613ba480fb7b40c67eea36d03c9946977828623
Author: Jason Gunthorpe 
Date:   Wed Feb 12 15:57:08 2014 -0700

PCI: mvebu: Call request_resource() on the apertures

It is typical for host drivers to request a resource for the aperture; once
this is done the PCI core will properly populate resources for all BARs in
the system.

With this patch cat /proc/iomem will now show:

  e000-efff : PCI MEM 
e000-e00f : PCI Bus :01
  e000-e001 : :01:00.0
    
Tested on Kirkwood.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Bjorn Helgaas 
Acked-by: Jason Cooper 

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 05e352889868..d3d1cfd51e09 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -101,7 +101,9 @@ struct mvebu_pcie {
struct mvebu_pcie_port *ports;
struct msi_chip *msi;
struct resource io;
+   char io_name[30];
struct resource realio;
+   char mem_name[30];
struct resource mem;
struct resource busn;
int nports;
@@ -672,10 +674,30 @@ static int mvebu_pcie_setup(int nr, struct pci_sys_data 
*sys)
 {
struct mvebu_pcie *pcie = sys_to_pcie(sys);
int i;
+   int domain = 0;
 
-   if (resource_size(&pcie->realio) != 0)
+#ifdef CONFIG_PCI_DOMAINS
+   domain = sys->domain;
+#endif
+
+   snprintf(pcie->mem_name, sizeof(pcie->mem_name), "PCI MEM %04x",
+domain);
+   pcie->mem.name = pcie->mem_name;
+
+   snprintf(pcie->io_name, sizeof(pcie->io_name), "PCI I/O %04x", domain);
+   pcie->realio.name = pcie->io_name;
+
+   if (request_resource(&iomem_resource, &pcie->mem))
+   return 0;
+
+   if (resource_size(&pcie->realio) != 0) {
+   if (request_resource(&ioport_resource, &pcie->realio)) {
+   release_resource(&pcie->mem);
+   return 0;
+   }
pci_add_resource_offset(&sys->resources, &pcie->realio,
sys->io_offset);
+   }
pci_add_resource_offset(&sys->resources, &pcie->mem, sys->mem_offset);
pci_add_resource(&sys->resources, &pcie->busn);
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type

2014-05-20 Thread Jason Gunthorpe

On Fri, May 16, 2014 at 10:53:33AM +0100, Will Deacon wrote:

> > Correct. Assume a PCI device uses PIO and DMA. It sends a DMA to main memory
> > and lets the CPU know about the data using a level (IntA as opposed to MSI)
> > interrupt. The CPU performs an outl() operation to an I/O port to let the
> > hardware know it has received the IRQ and the response of the outl() is
> > guaranteed to flush the DMA transaction: by the time the outl() completes
> > we know that the data in memory is valid because it is strongly ordered
> > relative to the DMA.

Keep in mind that the IntA message itself is going to flush the DMA,
no sane host bridge implementation should process the IntA until all
prior DMA writes are completed, just like MSI.

Also, legacy non-MSI interrupts are always sharable, so the ISR must
always start with a read of a device specific status reguster, which
will also flush any DMA writes.

The simplest common scenario to show synchronous outl is this:

void pci_isr()
{
   if (inl(status_reg) & INT_PENDING)
  outl(ACK_INT,status_reg);
}

Where the outl is not expected to complete at the CPU until the device
has lowered the level triggered interrupt line.

If outl is not synchronous then a spurious interrupt will be caused.

When converting a driver to MMIO you'd often have to do this:

void pci_isr()
{
   if (readl(status_reg) & INT_PENDING) {
  writel(ACK_INT,status_reg);
  readl(status_reg); // Synchronizing read, flushes write.
   }
}

Which is one of the software visible impacts of io vs mmio.

> Hmm, when you say `guaranteed to flush the DMA transaction', is that a PCI
> requirement? If so, whether or not that DMA data is then visible to the CPU
> is really specific to the host-controller implementation. It could easily be
> buffered somewhere between the host controller and memory, for example.

PCI has the producer/consumer ordering model as part of the
driving concept in the spec. Basically it wants to see the ordering
model preserved right to the driver code itself.

Realistically, way back, archs that couldn't do the synchronous IO
(like my old MIPS design) had to convert their drivers to MMIO and run
that way. It never worked 100% properly, or made sense to try an use an
async outl, even though some systems provided it :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 4/9] of: mtd: add documentation for the ONFI NAND timing mode property

2014-05-20 Thread Jason Gunthorpe

On Tue, May 20, 2014 at 09:30:33PM +0200, Boris BREZILLON wrote:

> AFAICT nothing, but the same goes for the ECC requirements, and we've
> recently added DT bindings to define these requirements.
> I'm not telling we should drop these ECC requirements bindings (actually
> I'm using them :-)), but what's different with the timings requirements ?

ECC requirements are almost always something that has to be matched to
the bootloader (since the bootloader typicaly reads the NAND to boot),
so it is sensible to put that in the DT

The timings are a property of the chip, and if they can be detected
they should be. IMHO, the main purpose of a DT property would be to
lower the speed if, for some reason, the board cannot support the
device's full speed.

> Indeed, I based it on the ONFI NAND timings mode model, but AFAIK
> (tell me if I'm wrong), it should work because most of the timings
> are min requirements.  This means, even if you provide slower
> signals transitions, the NAND will work as expected.

IIRC for ONFI a device must always work in the mode 0 timings, without
requiring a command?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type

2014-05-16 Thread Jason Gunthorpe

On Fri, May 16, 2014 at 10:57:36AM +0100, Will Deacon wrote:
> > AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
> > it completes at the target, and the CPU must not pipeline outl/inl
> > operations: outl();  outl(); produces 1 IOWr TLP, waits for
> > completion, then produces another.
> 
> So that's the real question: Do we really need to duplicate x86 semantics
> for IO space accesses? If we do, then we need both strongly-ordered memory
> *and* a dsb in our accessors. That's not going to be much fun.

Yep, that is the real question.

It has been over 10 years since I last converted a driver from IO to
MMIO - but IIRC the completion timing became a software visible
difference.

The entire reasons that this funny non-posted outl exists in PCI is to
support software compatability. You could take an ISA device and stick
it on PCI and the driver timing would be completely unaltered.

Linux obviously works on systems where outl is posted, ARM currently,
for instance. I've also run on MIPS systems that completely lack the
ability to synchronize outl (and I recall having to convert all the
drivers to mmio on that system..)

Arnd is right though, I doubt anyone cares, using IO space has been
discouraged for a decade at least at this point.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type

2014-05-15 Thread Jason Gunthorpe

On Thu, May 15, 2014 at 04:34:30PM +0100, Will Deacon wrote:

> > How can a write be non-posted on the PCI bus if it's posted on AXI?
> 
> From the point-of-view of the CPU it would be posted, but the PCI bus would
> see an unposted write (so I imagine there would be write buffering at the
> host controller). However, I worry that I'm missing your point :)

It is worth being a bit careful with language here, from an AXI
perspective there is not really such thing as a posted write. 

All writes are explicitly ack'd upon 'completion', however the memory
type influences when that is allowed to happen.

For PCI IO writes the AXI memory type from the CPU must be 'Device
Non-bufferable' (AWCACHE = 0), which will require the AXI ACK to be
generated only once the PCI target returns an IOWr completion TLP.

For PCI Memory writes the AXI memory type from the CPU could be
'Device Non-bufferable' but it would be best if it is 'Device
Bufferable' (AWCACHE = 1).

The latter allows more performance by permitting any AXI bridge in the
path to ack the write early. This is as close as AXI gets to 'posted
writes'

It is very important that the page tables in the CPU properly select
the right AXI Memory Type for each space.

Somewhere there should be a table describing how the CPU page table
attributes map into AXI *CACHE/Memory Type signaling selectors.

Beyond that, as Will points out, a DSB as part of the outl might be
required to spin the cpu and prevent pipelining.

AFAIK, to duplicate x86 semantics an outl/inl must spin the CPU until
it completes at the target, and the CPU must not pipeline outl/inl
operations: outl();  outl(); produces 1 IOWr TLP, waits for
completion, then produces another.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Describing arbitrary bus mastering relationships in DT

2014-05-09 Thread Jason Gunthorpe

On Fri, May 09, 2014 at 03:16:33PM +0100, Dave Martin wrote:
> On Fri, May 02, 2014 at 12:17:50PM -0600, Jason Gunthorpe wrote:

> > I wonder if this might be a better naming scheme, I actually don't
> > really like 'slave' for this, it really only applies well to AXI style
> > unidirectional busses, and any sort of message-based bus architectures
> > (HT, PCI, QPI, etc) just have the concept of an initiator and target.
> > 
> > Since initiator/target applies equally well to master/slave buses,
> > that seems like better, clearer, naming.
> 
> Sure, I wouldn't have a problem with such a suggestion.  A more neutral
> naming is less likely to cause confusion.
> 
> > Using a nomenclature where
> >   'reg' describes a target reachable from the CPU initiator via the
> > natural DT hierarchy
> 
> I would say, reachable from the parent device node (which implies your
> statement).  This is consistent with the way ePAPR describes device-to-
> device DMA (even if Linux doesn't usually make a lot of use of that).

Trying to simplify, but yes, that is right..

> >   'initiator' describes a non-CPU (eg 'DMA') source of ops, and
> > travels via the path described to memory (which is the
> > target).
> 
> CPUs are initiators only; non-mastering devices are targets only.
> 
> We might want some terminology to distinguish between mastering 
> devices and bridges, both of which act as initiators and targets.

I was hoping to simplify a bit. What the kernel needs, really, is the
node that initates a transaction, the node that is the ultimate
completing target of that transaction and the path through all
intervening (transformative or not) components.

The fact a bridge is bus-level slave on one side and a bus-level
master on another is not relevant to the above - a bridge is not an
initiator and it is not a completing target.

> We could have a concept of a "forwarder" or "gateway".  But a bus
> may still be a target as well as forwarder: if the bus contains some
> control registers for example.  There is nothing to stop "reg" and
> "ranges" being present on the same node.

Then the node is a 'target', and a 'bridge'. The key is to carefully
define how the DT properties are used for each view-point.

> >   'upstream' path direction toward the target, typically memory.
> 
> I'm not keen on that, because we would describe the hop between /
> and /memory as downstream or upstream depending on who initiates the
> transaction.  (I appreciate you weren't including CPUs in your
> discussion, but if the termology works for the whole system it
> would be a bonus).

I'm just using the word 'upstream' as meaning 'moving closer to the
completing target'.

It isn't a great word, 'forward path' would do better to borrow
a networking term.

Then we have the 'return path' which, on message based busses is the
path a completion message from 'completing target' travels to reach
the 'initiator'.

They actually can be asymmetric in some situations, but I'm not sure
that is worth considering in DT, we can just assume completions travel
a reverse path that hits every transformative bridge.

> >   'upstream-bridge' The next hop on a path between an initiator/target
> 
> Maybe.  I'm still not sure quite why this is considered different
> from the downward path through the DT, except that you consider
> the cross-links in the DT to be "upward", but I considered them
> "downward" (which I think are mostly equivalent approaches).
> 
> Can you elaborate?

Let us not worry about upstream/downstream and just talk about the
next bridge on the 'forward path' toward the 'completing target'.

> > But I would encourage you to think about the various limitations this
> > still has
> >  - NUMA systems. How does one describe the path from each
> >CPU to a target regs, and target memory? This is important for
> >automatically setting affinities.
> 
> This is a good point.
> 
> Currently I had only been considering visibility, not affinity.
> We actually have a similar problem with GIC, where there may
> be multiple MSI mailboxes visible to a device, but one that is
> preferred (due to being fewer hops away in the silicon, even though
> the routing may be transparent).

Really, the MSI affinity isn't handled by the architecture like x86?
Funky.
 
> We could describe a whole separate bus for each CPU, with links
> to common interconnect subtrees downstream.  But that might involve
> a lot of duplication.  Your example below doe

Re: [PATCH] of: Add of_device_destroy_children() function

2014-05-08 Thread Jason Gunthorpe

On Thu, May 08, 2014 at 06:37:49PM +0200, Sylwester Nawrocki wrote:
> This patch adds a helper function to unregister devices which
> were created by an of_platform_populate() call. The pattern
> used here can already be found in multiple drivers. This helper
> can now be used instead of repeating similar code in drivers.

I have a driver that does this as well, and what I found is that the
remove must be in reverse order from the create or things explode, and
that assumes the DT is topologically sorted according to dependency
(so no deferred probe).

AFAIK, there is no analog to deferred probe for removal, and
attempting to remove, say, a GPIO driver while an I2C bit bang is using
it just fails.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/17] pci: host: pcie-dra7xx: add support for pcie-dra7xx controller

2014-05-06 Thread Jason Gunthorpe

On Tue, May 06, 2014 at 07:03:51PM +0530, Kishon Vijay Abraham I wrote:
> +Example:
> +pcie@5100 {
> + compatible = "ti,dra7xx-pcie";
> + reg = <0x51002000 0x14c>, <0x5100 0x2000>;
> + reg-names = "ti_conf", "rc_dbics";
> + interrupts = <0 232 0x4>, <0 233 0x4>;
> + #address-cells = <3>;
> + #size-cells = <2>;
> + device_type = "pci";
> + ti,device_type = <3>;
> + ranges = <0x0800 0 0x20001000 0x20001000 0 0x2000  /* 
> Configuration Space */

Configuration space should not show up in the ranges, please don't
copy that mistake from other drivers, put it in reg.

> + interrupt-map-mask = <0 0 0 0>;
> + interrupt-map = <0x0 0 &gic 134>;

The HW cannot decode INTA/B/C/D?

> +#define  PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI  0x0034
> +#define  PCIECTRL_DRA7XX_CONF_IRQENABLE_SET_MSI  0x0038
> +#define  INTABIT(0)
> +#define  INTBBIT(1)
> +#define  INTCBIT(2)
> +#define  INTDBIT(3)
> +#define  MSI BIT(4)
> +#define  LEG_EP_INTERRUPTS (INTA | INTB | INTC | INTD)

Oh, it can, it would be wise to export this from the driver. Look at
the latest patches from Srikanth Thokala for the Xilinx PCI driver to
see how this should look

> +static int dra7xx_pcie_establish_link(struct pcie_port *pp)
> +{
> + u32 reg;
> + int retries = 1000;
> + struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pp);
> +
> + if (dw_pcie_link_up(pp)) {
> + dev_err(pp->dev, "link is already up\n");
> + return 0;
> + }
> +
> + reg = dra7xx_pcie_readl(dra7xx->base, PCIECTRL_DRA7XX_CONF_DEVICE_CMD);
> + reg |= LTSSM_EN;
> + dra7xx_pcie_writel(dra7xx->base, PCIECTRL_DRA7XX_CONF_DEVICE_CMD, reg);
> +
> + while (--retries) {
> + reg = dra7xx_pcie_readl(dra7xx->base,
> + PCIECTRL_DRA7XX_CONF_PHY_CS);
> + if (reg & LINK_UP)
> + break;
> + usleep_range(10, 20);
> + }
> +
> + if (retries <= 0) {
> + dev_err(pp->dev, "link is not up\n");
> + return -ETIMEDOUT;
> + }
> +
> + return 0;
> +}

It would be really nice to see the link bring up process live in the
PCI core, every driver seems to have its own take on this.

The PCI-E spec requires a 100ms delay after link bring up (aka hot
reset) before sending any configuration TLPs.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Describing arbitrary bus mastering relationships in DT

2014-05-02 Thread Jason Gunthorpe

On Fri, May 02, 2014 at 06:31:20PM +0100, Dave Martin wrote:

> Note that there is no cycle through the "reg" property on iommu:
> "reg" indicates a sink for transactions; "slaves" indicates a
> source of transactions, and "ranges" indicates a propagator of
> transactions.

I wonder if this might be a better naming scheme, I actually don't
really like 'slave' for this, it really only applies well to AXI style
unidirectional busses, and any sort of message-based bus architectures
(HT, PCI, QPI, etc) just have the concept of an initiator and target.

Since initiator/target applies equally well to master/slave buses,
that seems like better, clearer, naming.

Using a nomenclature where
  'reg' describes a target reachable from the CPU initiator via the
natural DT hierarchy
  'initiator' describes a non-CPU (eg 'DMA') source of ops, and
travels via the path described to memory (which is the
target).
  'path' describes the route between an intitator and target, where
bridges along the route may alter the operation.
  'upstream' path direction toward the target, typically memory.
  'upstream-bridge' The next hop on a path between an initiator/target

But I would encourage you to think about the various limitations this
still has
 - NUMA systems. How does one describe the path from each
   CPU to a target regs, and target memory? This is important for
   automatically setting affinities.
 - Peer-to-Peer DMA, this is where a non-CPU initiator speaks to a
   non-memory target, possibly through IOMMUs and what not. ie
   a graphics card in a PCI-E slot DMA'ing through a QPI bus to
   a graphics card in a PCI-E slot attached to a different socket.

These are already use-cases happening on x86.. and the same underlying
hardware architectures this tries to describe for DMA to memory is at
work for the above as well.

Basically, these days, interconnect is a graph. Pretending things are
a tree is stressful :)

Here is a basic attempt using the above language, trying to describe
an x86ish system with two sockets, two DMA devices, where one has DMA
target capabable memory (eg a GPU)

// DT tree is the view from the SMP CPU complex down to regs
smp_system {
   socket0 {
   cpu0@0 {}
   cpu1@0 {}
   memory@0: {}
   interconnect0: {targets = <&memory@0,interconnect1>;}
   interconnect0_control: {
 ranges;
 peripheral@0 {
regs = <>;
intiator1 {
ranges = < ... >;
// View from this DMA initiator back to memory
upstream-bridge = <&interconnect0>;
};
/* For some reason this peripheral has two DMA
   initiation ports. */
intiator2 {
ranges = < ... >;
upstream-bridge = <&interconnect0>;
};
 };
};
   }
   socket1 {
   cpu0@1 {}
   cpu1@1 {}
   memory@1: {}
   interconnect1: {targets = 
<&memory@1,&interconnect0,&peripheral@1/target>;}
   interconnect1_control: {
 ranges;
 peripheral@1 {
ranges = < ... >;
regs = <>;
intiator {
ranges = < ... >;
// View from this DMA initiator back to memory
upstream-bridge = <&interconnect1>;
};
target {
reg = <..>
/* This peripheral has integrated memory!
   But notice the CPU path is
 smp_system -> socket1 -> interconnect1_control -> 
target
   While a DMA path is
 intiator1 -> interconnect0 -> interconnect1 -> 
target
 */
};
};
peripheral2@0 {
regs = <>;

// Or we can write the simplest case like this.
dma-ranges = <>;
upstream-bridge = <&interconnect1>;
/* if upstream-bridge is omitted then it defaults to
   &parent, eg interconnect1_control */
   }
}

It is computable that ops from initator2 -> target flow through
interconnect0, interconnect1, and then are delivered to target.

It has a fair symmetry with the interrupt-parent mechanism..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/3] mtd: nand: add randomizer support

2014-05-01 Thread Jason Gunthorpe

On Thu, May 01, 2014 at 10:56:06PM +0200, Boris BREZILLON wrote:

> > However, with a synchronous scrambler the security concern boils down
> > to how robust and unpredictable is the PRBS.
> 
> I'm not sure security is the main concern here.
> AFAICT, NAND scramblers (note that I stopped using the name "randomizer"
> :-)) is mainly used to avoid large island of identical data, because
> some NAND chips are sensible to such patterns (see [1] page 14).

Right, if you send to the flash 'the wrong data' then some combination of:
1) Retention time till ECC failure is reduced
2) The flash block is permanently damaged early
3) A 'nearby', unrelated flash block has ECC failure due to interference

So, if someone deliberately and maliciously defeats the scrambler and
deliberately sends in wrong data what happens?
 1/3) Delibrate, predictable file system corruption
 2) Create device damage and significantly early replacement of the device.
All could lead to a DOS attack of some sort, at a minimum.

FWIW, there was a similar attack against a certain communication
system. The line scrambler was statistically predictable, and if an
attacker sent enough packets that were the predictable anti-scramble
then enough would align with the scamble pattern and the
communication channel would fail and retrain creating a DOS vector.

For this reason these days com systems tend to use a 58 bit
self-synchronous LFSR for scrambling purposes.

> And this is exactly what's done in the sunxi HW  scrambler
> implementation, or at least you can do it based on what you're
> specifying in your DT (see the "nand-randomizer-seeds" in the 3rd
> patch): you can define a seed table and the seed is selected based on
> the page number you're reading or writing.

Well, re-using fixed (and public) seeds:

 state = rnd->seeds[page % rnd->nseeds];

Just changes the probabilities. For instance, some filesystems can be
asked to create extents with a large alignment (like 2M) to speed IOs,
and a small seeds table means the seeds within such a file will be
fully predictable.

If you are already stuck with this, then fine, it can be a driver
specific binding - but if this is a new green-field design, intended
to be broadly used as a core MTD feature:

I'd suggest just seeding with the block number xor some value, and
using a LFSR with a state space larger than the number of blocks in
the device, and don't specify a seeds array in DT.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/3] mtd: nand: add randomizer support

2014-05-01 Thread Jason Gunthorpe

On Thu, May 01, 2014 at 07:31:13PM +0200, Boris BREZILLON wrote:

> I totally agree with you, this is not a randomizer but rather a scrambler.
> The reason I chose the "randomizer" word is that all the documents I
> read are talking about randomizers.
> But, other than I don't have any concern about changing all references
> to "randomizer" into "scrambler" ;-).

If nobody else says anything, Scrambler is at least consistent with
Wikipedia..

and 'descrambler' sounds better than 'unrandomizer' :)

> > BTW, there are security concerns here. The scrambler PRBS must not be
> > predictable by the user, otherwise they can write data that undoes the
> > scramble and defeat it, ie deliberately writing the last 2k of a 4k
> > write block as all 0's after scrambling could cause the first 2k to be
> > lost. That feels like something that could be scary ..
> 
> AFAICT, the scramblers/randomizers used in NAND applications are all
> predictable, which means the scrambler state does not depend on the last
> data being scrambled.

Right, I'm not surprised storage would use a synchronous scrambler,
self-synchronizing scramblers make the most sense for communication..

However, with a synchronous scrambler the security concern boils down
to how robust and unpredictable is the PRBS.

For instance, re-using the same PRBS seed and staring point for every
block is probably a bad idea.

At the very least I would think the block number should be included in
the per-block seed of the PRBS.

And also a per-parition/device global seed..

I suspect even a properly seeded LFSR is sufficent, but by no means
have I studied this :)

> For example, the sunxi HW scrambler is using a Fibonacci LFSR [1].
> Do you have any example of non predictable scrambler that are used to
> scramble NAND data ?

The benifit here is that the scrambled data is completely private to
the driver and flash chip. So any attack on the scrambler would have
to use the driver as some kind of oracle to search for
synchornization. Eg running through the full keyspace of an N-Bit LFSR
by writing an anti-scrambled block, reading it back and checking if it
is corrupted.

This is why per-block randomization of the PRBS is very important - if
the attacker has only filesytems access then they don't know the block
number and now have to attack the filesystem block allocator as well
as the LFSR block-independent seed to generate an anti-scrambler data
pattern.

But that is just an off the cuff feeling. It would be smart to study
it further :)

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/3] mtd: nand: add randomizer support

2014-05-01 Thread Jason Gunthorpe

On Thu, May 01, 2014 at 03:09:49AM +0200, Boris BREZILLON wrote:
> Hello,
> 
> This series is a proposal to add support for randomizers (either software
> or hardware) to NAND flash controller drivers.

FWIW, I think the term for reversibly combining a PRBS with data is
'scrambling', it is often used in communication systems for similar
reasons - probabilisticly increasing transition density.

randomizing is something else entirely :)

BTW, there are security concerns here. The scrambler PRBS must not be
predictable by the user, otherwise they can write data that undoes the
scramble and defeat it, ie deliberately writing the last 2k of a 4k
write block as all 0's after scrambling could cause the first 2k to be
lost. That feels like something that could be scary ..

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 07/15] ARM: dts: kirkwood: consolidate common pinctrl settings

2014-04-30 Thread Jason Gunthorpe

On Wed, Apr 30, 2014 at 09:39:41PM +0200, Sebastian Hesselbarth wrote:
> On 04/30/2014 06:42 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 30, 2014 at 02:56:34PM +0200, Sebastian Hesselbarth wrote:
> >> All SoCs have the same pinctrl setting for NAND, UART0/1, SPI, TWSI0,
> >> and GBE1. Move it to the common pinctrl node that we now have.

> Yes, there are already some boards (e.g. t5325 with spi0) overwriting
> pinctrl settings instead of overwriting the pinctrl-0 property. I
> thought, I keep this behavior and note it above each pinctrl node in
> some of the following patches.

That all makes sense, I think the commit message just seemed to say
something else.

Maybe more like:

NAND and TWSI0 have only one valid pin control choice on Kirkwood,
move those definitions into the common dtsi.

For UART0/1 and SPI, which have two choices, move the definition that
is used in the majority of the board files into the common dtsi.
Board files that are different will override.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 07/15] ARM: dts: kirkwood: consolidate common pinctrl settings

2014-04-30 Thread Jason Gunthorpe

On Wed, Apr 30, 2014 at 02:56:34PM +0200, Sebastian Hesselbarth wrote:
> All SoCs have the same pinctrl setting for NAND, UART0/1, SPI, TWSI0,
> and GBE1. Move it to the common pinctrl node that we now have.

There are two possible choices for UART0, UART1, and SPI on kirkwood..

For instance I use this on my board:

pmx_spi0: pmx-spi0 {
marvell,pins = "mpp7", "mpp10", 
"mpp11", "mpp12";
marvell,function = "spi";
};

vs

> +
> + pmx_spi: pmx-spi {
> + marvell,pins = "mpp0", "mpp1", "mpp2", "mpp3";
> + marvell,function = "spi";
> + };

It looks like all the boards in the kernel use the same choice, so it
makes some sense to consolidate, but I assume a board file can
override the marvell,pins?

Otherwise the rest of your patchset looked sane to me.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/4] arm64: dts: APM X-Gene PCIe device tree nodes

2014-04-16 Thread Jason Gunthorpe

On Thu, Apr 17, 2014 at 01:20:42AM +0100, Liviu Dudau wrote:

> > No spec says you can put config space into the ranges at all, nobody
> > should be doing that today, obviously some cases were missed during
> > review..
> 
> ePAPR documents allows that when ss == 00.

Which do you mean? The 'PCI Bus Binding' spec has fairly specific
language on how ranges should be used and interpreted, and it
precludes doing anything meaningful with config space (like requiring
b,d,f and r to be zeroed when doing compares against ranges, requiring
the ranges to represent the bridge windows, etc).

There is certainly room to invent something (like ECAM mapping) but
nothing is specified in that document.

The ePAPR document I have doesn't talk about PCI..

If you've found a document that defines how it works then that changes
things.. ;)

> > The comment about ECAM was intended as a general guidance on what
> > config space in ranges could/should be used for.
> > 
> > Right now config space shouldn't propagate out side any driver, so you
> > can probably just filter it in your generic code, and make it very hard
> > and obviously wrong for a driver to parse ranges for config space, so
> > we don't get more usages.
> 
> OK, this goes slightly against your email from 26th March:
> 
> "When we talked about this earlier on the DT bindings list the
> consensus seemed to be that configuration MMIO ranges should only be
> used if the underlying memory was exactly ECAM, and was not to be used
> for random configuration related register blocks.
> 
> The rational being that generic code, upon seeing that ranges entry,
> could just go ahead and assume ECAM mapping."
> 
> What I'm saying is that the only code that will see this ranges entry will
> be the parsing code as if we try to create a resource out of the range
> and add it to the host bridge structure (not driver) we will confuse the
> rest of the pci_host_bridge API. So we cannot do any ECAM accesses (yet?).

Sorry if this seems unclear, what you quoted was from a specification
standpoint - someday defining config space ranges to be the ECAM
window makes the most sense. This is from the direction of precluding
drivers from using it for random purposes.

>From a Linux standpoint, there is simply no infrastructure for generic
config access outside the driver, so config space must remain
contained in the driver, and shouldn't leak into the host bridge or
other core structures.

I think the shared code you are working on should simply ignore config
ss ranges entirely, they have no defined meaning..

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/4] arm64: dts: APM X-Gene PCIe device tree nodes

2014-04-16 Thread Jason Gunthorpe

On Wed, Apr 16, 2014 at 06:05:45PM +0100, Liviu Dudau wrote:

> I have found out that we cannot pasd the config ranges from the DT into the
> pci_host_bridge structure as the PCI framework doesn't have a resource type
> for config resources. Leaving the translation between range flags and
> resource type as is (filtered through the IORESOURCE_TYPE_BITS) will lead
> to a resource type of value zero, which is not recognised by any resource
> handling API so bridge configuration and bus scanning will barf.
> 
> I'm looking for suggestions here, as Jason Gunthorpe suggested that we
> should be able to parse config ranges if they conform to the ECAM part
> of the PCI standard.

The thinking here is the ranges should be well defined and general, it
isn't a dumping ground for driver specific stuff.

No spec says you can put config space into the ranges at all, nobody
should be doing that today, obviously some cases were missed during
review..

The comment about ECAM was intended as a general guidance on what
config space in ranges could/should be used for.

Right now config space shouldn't propagate out side any driver, so you
can probably just filter it in your generic code, and make it very hard
and obviously wrong for a driver to parse ranges for config space, so
we don't get more usages.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/6] PCI: tegra: use new OF interrupt mapping when possible

2014-04-11 Thread Jason Gunthorpe

On Fri, Apr 11, 2014 at 11:10:59PM +0530, Srikanth Thokala wrote:

> I see this error too on my setup (Xilinx PCIe Root Complex Driver),
> https://lkml.org/lkml/2014/3/3/183

> After digging into it, I see I need to override the API
> pcibios_get_phb_of_node()

No, as I pointed out before, the DT node comes in through
pci_scan_root_bus:

+static struct pci_bus __init *xilinx_pcie_scan_bus(int nr,
+   struct pci_sys_data *sys)
+{
+   struct xilinx_pcie_port *port = sys_to_pcie(sys);
+   struct pci_bus *bus;
+
+   if (port) {
+   port->root_busno = sys->busnr;
+   bus = pci_scan_root_bus(NULL, sys->busnr, &xilinx_pcie_ops,
   ^^

You can't pass NULL here and have DT work properly. 

See http://www.spinics.net/lists/arm-kernel/msg312392.html

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/5] ARM: zynq: dt: Convert to preprocessor includes

2014-04-08 Thread Jason Gunthorpe

On Tue, Apr 08, 2014 at 09:03:27AM +0200, Michal Simek wrote:

> > If in doubt follow common mainline practice. Although using includes
> > for DT is not necessarily common practice, readability of DTs is
> > really important IMHO.
> 
> Let me give you one example. When you add xilinx intc controller
> to zynq HW design which uses gic with headers you are using
> then you will find out that sensitivity for both controllers
> are just different.
> This is just one case I am aware of. I expect there will be one more.

I'm not sure I see the problem here, just because some bindings can't
use the standard shared constants doesn't mean the GIC bindings and
users should avoid them. The binding documentation is supposed to make
it clear what is correct.

It is just as easy to get confused with numbers, does 4 mean
XILINX_INTC_IRQ_RISING or IRQ_TYPE_LEVEL_HIGH ?

> Using dtc preprocessor and macros improve DTS readability but at the same
> time force other users to copy all necessary files from the kernel
> to that projects which is just hassle.

You can run the DTS through cpp before you export it out of the kernel
environment, then you get a flat file with no includes.

The shared kernel conventions are more important than constraints from
outside projects.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/5] ARM: zynq: dt: Convert to preprocessor includes

2014-04-07 Thread Jason Gunthorpe

On Mon, Apr 07, 2014 at 02:24:07PM +0200, Michal Simek wrote:

> Device-tree BSP and in 2014.01 there will be new BSP which just
> generate them directly from the Vivado tools which just target your
> reference design.  You can connect your custom IP (or Xilinx or 3rd
> party) directly to the GIC which using different IRQ sensitivity
> with whatever register addresses and make no sense to write it by
> hand.

On our Zynq design here we ended up being unwilling to use platform
generation from Vivado. Basically all our IP was custom, so there was
no win at all to invoking the complexity of the automatic tools.

Thus we write the DT by hand, and our DT is complex, integrating
peripherals that span two FPGAs.

I think the in-kernel DT should use the kernel conventions, which
means using #include and the binding constants over magic values.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/4] arm64: dts: APM X-Gene PCIe device tree nodes

2014-03-26 Thread Jason Gunthorpe

On Wed, Mar 26, 2014 at 09:28:42AM -0500, Rob Herring wrote:

> Where is the right place for config space? This binding has it here
> and others have it in ranges. 

I think all the drivers in drivers/pci/host use 'reg', this was
discussed in the dt-bindings list and AFAIK no new drivers have used
ranges since it was brought up.

> Given that config space type is defined for ranges, I would think
> that is the right place. But Liviu's patches do not process config
> space entries in ranges. Perhaps we need a config space resource
> populated in the bridge struct.

When we talked about this earlier on the DT bindings list the
consensus seemed to be that configuration MMIO ranges should only be
used if the underlying memory was exactly ECAM, and was not to be used
for random configuration related register blocks.

The rational being that generic code, upon seeing that ranges entry,
could just go ahead and assume ECAM mapping.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/8] Watchdog support for Armada 375/38x SoC

2014-03-19 Thread Jason Gunthorpe

On Wed, Mar 12, 2014 at 06:54:56PM -0300, Ezequiel Garcia wrote:
> On Mar 12, Jason Gunthorpe wrote:
> > On Wed, Mar 12, 2014 at 06:11:47PM -0300, Ezequiel Garcia wrote:
> > 
> > > I've pushed a branch so people can test this easily, e.g. ensuring
> > > no regressions on Dove, Kirkwood and A370/XP:
> > > 
> > > https://github.com/MISL-EBU-System-SW/mainline-public/tree/wdt_a385_a375_v3
> > 
> > This branch worked for me, the watchdog functioned as expected.
> > 
> 
> Good to hear. Is that a Tested-by on Kirkwood?

Yes, sorry for the delay

> > I got these two oddball messages during boot though:
> > 
> > irq: Cannot allocate irq_descs @ IRQ26, assuming pre-allocated
> > irq: Cannot allocate irq_descs @ IRQ61, assuming pre-allocated
> > 
> > Probably unrelated to the watchdog.. Unfortunately I've run out of
> > time right now to investigate.
> > 
> 
> Probably. That branch is based on some mvebu/for-next tip, which has all
> the watchdog and devicetree dependencies for this patchset to apply.
> 
> I can re-send a series based on v3.15-rc1 and we can check again; maybe
> we're just missing some irq patch.

I quickly re-checked and 3.14.0-rc7 doesn't print these messages..

I can review again when 3.15-rc1 comes out.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ARM: dts: ventana: fix eth1 pci dev node

2014-03-18 Thread Jason Gunthorpe

On Tue, Mar 18, 2014 at 01:02:55PM -0700, Tim Harvey wrote:
> > Is this whole bridge/switch hierarchy binding documented somewhere
> > or is this just something that work for you?
> 
> I'm not sure where its 'best' documented, but it is the way the
> kernel works.

It is documented in the 'PCI Bus Binding to Open Firware'
publication from IEEE.

> >> + pcie@0,0 {
> >> + /* 01:00.0 PCIe switch */
> >> + #address-cells = <3>;
> >> + #size-cells = <2>;
> >> + device_type = "pci";
> >> + reg = <0x0 0 0 0 0>;
> >> +
> >> + pcie@8,0 {
> >
> > What's the naming schema for all these pcie nodes?  Generally, we should
> > have the numbers encoded in the node name coming from the address cells
> > in 'reg' property.

The 'reg' property for PCI encodes the device and function number, and
the suffix in the device path is of the form @DEVICE,FUNCTION (see
2.2.1.3 of the spec)

So device=8, function=0 is @8,0 and reg = 0x4000.

> I was hoping there was a way to reference PCI nodes by BDF values as
> I'm simply trying to define a marvell,sky2 device at 08:00.0.  I found
> that the kernel's OF parsing code for PCI requires you to nest the
> nodes to match the bus hierarchy.  In order to map a dt node to a PCI
> device, the bus the device is on must have a dt node itself, which is
> what creates the need for the nesting.  Note that the bus topology
> here rc -> P2P bridge -> GigE.

Right, otherwise the kernel and firmware would have to agree on bus
numbering. With nesting it only has to agree on the device numbering,
which is a fixed property of PCI.

> >> + /* 02:08.0 PCIe switch port */
> >> + #address-cells = <3>;
> >> + #size-cells = <2>;
> >> + device_type = "pci";
> >> + reg = <0x4000 0 0 0 0>;
> >> + eth1: pcie@0,0 {
> >> + /* 08:00.0 GigE */
> >> + #address-cells = <3>;
> >> + #size-cells = <2>;
> >> + device_type = "pci";
> >> + reg = <0x0 0 0 0 0>;
> >> + compatible = "marvell,sky2";
> >> + };

Don't forget your interrupts and interrupt-map - every DT nodes need
to describe how its interrupts are routed.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 0/8] Watchdog support for Armada 375/38x SoC

2014-03-12 Thread Jason Gunthorpe

On Wed, Mar 12, 2014 at 06:11:47PM -0300, Ezequiel Garcia wrote:

> I've pushed a branch so people can test this easily, e.g. ensuring
> no regressions on Dove, Kirkwood and A370/XP:
> 
> https://github.com/MISL-EBU-System-SW/mainline-public/tree/wdt_a385_a375_v3

This branch worked for me, the watchdog functioned as expected.

# killall -STOP wdt...
..
.
orion_wdt: Oops: Watchdog Timeout
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.0-rc1-00205-g666e53f-dirty #1
[..]


I got these two oddball messages during boot though:

irq: Cannot allocate irq_descs @ IRQ26, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ61, assuming pre-allocated

Probably unrelated to the watchdog.. Unfortunately I've run out of
time right now to investigate.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v2 04/14] mtd: nand: define struct nand_timings

2014-03-12 Thread Jason Gunthorpe

On Wed, Mar 12, 2014 at 05:46:53PM +0100, Boris BREZILLON wrote:

> >>I see at least 3 of those timings that could be useful (for the moment) :
> >>- tR: this one should be used to fill the chip_delay field
> >>- tPROG and tBERS: could be used within nand_wait to choose the timeo
> >>   value appropriately.
> >IIRC these timing values are really only necessary if the controller
> >does not support the READY/BUSY input, in that case drivers typically
> >seem to use 'chip_delay' which is the maximum possible command
> >execution time (a sleep long enough to guarentee that READY/BUSY is
> >de-asserted).

> You're right about tR (or chip_delay): it's only used when there are
> no R/B pin.  I experienced it when I tried the RB_NONE case in the
> sunxi driver: the default chip_delay set by the NAND core code was
> too small to fit the NAND chip requirements.
> 
> Anyway, I really think the chip_delay field should be set according
> to NAND chip characteristics not harcoded in NAND controller driver
> code (as currently done).

Drivers these days are often taking this value from the DT node
property 'chip-delay'. I think this would be nice to have in common
code too...

> tPROG and tBERS, would be used in nand_wait function and do not
> depend on the R/B pin.  These are just used as timeouts.

tPROG/tBERS have that special mode where R/B remains asserted but you
can still issue a status read to the chip to check on the command, so
the timeout required here is just a big number to detect failed NAND
controllers, it isn't really too important to have an exact value..

> >>Or should I create a new struct for these timings ?
> >>In the latter case how should I name it ?
> >struct onfi_command_timings ?
> 
> I'm not a big fan of this name. I think timing structs should not
> contain onfi in their names, because these timings are also
> available on non ONFI chips.

Explicitly defering to the ONFI spec makes it clear what the
definition of the timing parameter actually is.

If JEDEC has a different model then drivers will need to configure
their interfaces a little differently.

So we might end up with a jedec_sdr_timings too :|

> What do you think ?

I'd focus on getting the bus timings working before tackling too much
more ...

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 0/7] Watchdog support for Armada 375/38x SoC

2014-03-11 Thread Jason Gunthorpe

On Thu, Mar 06, 2014 at 09:13:46PM -0300, Ezequiel Garcia wrote:

> > On the driver side, we need to implement per-SoC stop() and enabled()
> > functions. Such somewhat complex infrastructure is needed to ensure the 
> > driver
> > performs proper reset of the watchdog timer, by masking and disabling the
> > RSTOUT before the interrupt is enabled.

> JasonG: How does this look now?

A cursory look seems reasonable..

> Any chance someone gives a Tested-by on Dove/Kirkwood?

If you have a git I can pull I can try it here.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v2 04/14] mtd: nand: define struct nand_timings

2014-03-11 Thread Jason Gunthorpe

On Mon, Mar 10, 2014 at 02:44:04PM +0100, Boris BREZILLON wrote:

> Some timings are missing here (see Table 55 in the ONFI spec):

Right..

The 'mode' covers only the raw electrical parameters needed to
exchange commands, other timings cover the commands
themselves. Notably the timing mode does not alter those parameters.

To me it seems tidy to keep the 'mode' timings contained in their own
struct and find other homes for the other parameters.

> -tR
> -tBERS
> -tCCS
> -tPLEBSY
> -...
> 
> I see at least 3 of those timings that could be useful (for the moment) :
> - tR: this one should be used to fill the chip_delay field
> - tPROG and tBERS: could be used within nand_wait to choose the timeo
>   value appropriately.

IIRC these timing values are really only necessary if the controller
does not support the READY/BUSY input, in that case drivers typically
seem to use 'chip_delay' which is the maximum possible command
execution time (a sleep long enough to guarentee that READY/BUSY is
de-asserted).

> Or should I create a new struct for these timings ?
> In the latter case how should I name it ?

struct onfi_command_timings ?

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-sunxi] Re: [RFC PATCH v2 08/14] mtd: nand: add sunxi NAND flash controller support

2014-03-11 Thread Jason Gunthorpe

On Mon, Mar 10, 2014 at 12:17:31PM +0100, Boris BREZILLON wrote:
> Hello Jason,
> 
> Le 29/01/2014 20:10, Jason Gunthorpe a écrit :
> >On Wed, Jan 29, 2014 at 03:46:20PM -0300, Ezequiel Garcia wrote:
> >
> >>After CE# has been pulled high and then transitioned low again, the host
> >>should issue a Set Features to select the appropriate asynchronous timing 
> >>mode.
> >>"""
> >Oh, I had forgot you should do a set feature too
> >
> >Boris, I think the core core should handle this dance and the driver
> >should just implement a call back to change the timing mode on the
> >interface..
> >
> >If I ever get a moment I can work on support for timing setting in the
> >mvebu driver, I have boards here with ONFI NAND..
> 
> Any progress on this ?
> I'm about to post the 3rd version of this series and if you already have
> something working I could base my work on your proposal.

Sorry, no, I will try to look over your v3 though.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 6/6] PCI: designware: use new OF interrupt mapping when possible

2014-03-05 Thread Jason Gunthorpe

On Wed, Mar 05, 2014 at 02:25:51PM +0100, Lucas Stach wrote:
> - return pp->irq;
> + irq = of_irq_parse_and_map_pci(dev, slot, pin);
> + if (!irq)
> + irq = pp->irq;

In light of the two bugs that Tim found, it might be wise to throw a
'dev_warn(FW_BUG "Missing DT interrupt mapping")' in the fall back
path, so it doesn't continue to silently cover up errors on the OF/DT
side..

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] PCI irq mapping fixes and cleanups

2014-03-03 Thread Jason Gunthorpe

On Mon, Mar 03, 2014 at 03:40:43PM -0800, Tim Harvey wrote:

> of_irq_parse_and_map_pci().  The GIC function that translates the
> interrupt per domain is given irq_data: 0x123 0x04 0x00 

This has been shifted by 1 byte..

> IRQ 123, which should get 32 added to it for irq155).  Instead, the
> implementation of gic_irq_domain_xlate()
> (http://lxr.missinglinkelectronics.com/linux/drivers/irqchip/irq-gic.c#L832)
> adds 32 to the 0x04 returning 20:
> [1.841781] of_irq_parse_raw:  /soc/pcie@0x0100:0001
> [1.841813] of_irq_parse_raw: ipar=/soc/pcie@0x0100, size=1
> [1.841838]  -> addrsize=3
> [1.841870]  -> match=1 (imaplen=28)
  ^

That looks odd, it should be the number of dwords in the
interrupt-map, you have 4 lines of 8 dwords each, so it should be
32. 

You have 7*4, which really suggests to me that your interrupt-map
is corrupted somehow, are you sure you are using this:

>   interrupt-map = <0 0 0 1 &intc GIC_SPI 123 IRQ_TYPE_LEVEL_HIGH>,
>   <0 0 0 2 &intc GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>,
>   <0 0 0 3 &intc GIC_SPI 121 IRQ_TYPE_LEVEL_HIGH>,
>   <0 0 0 4 &intc GIC_SPI 120 IRQ_TYPE_LEVEL_HIGH>;

If an element was missing it would explain why everything is broken,
in fact if GIC_SPI was missing it would match the debug perfectly.

> [1.841903]  -> newintsize=3, newaddrsize=1
> [1.841916]  -> imaplen=23
> [1.841928]  -> new parent: /interrupt-controller@00a01000
> [1.841946]  -> got it !

K..

> [1.841972] irq_create_of_mapping: calling xlate for 123/4/0 3

And it is the wrong data.. 123/4/0 is 


> For the slots above that swizzle to pin2,3,4 of_irq_parse_raw()
> returns -EINVAL because for some reason it can't match an interrupt
> mapping for the pcie host controller:
> [1.842996] of_irq_parse_raw:  /soc/pcie@0x0100/pcie@0,0:0002
> [1.843046] of_irq_parse_raw: ipar=/soc/pcie@0x0100, size=1
> [1.843070]  -> addrsize=3
> [1.843100]  -> match=0 (imaplen=28)
> ^ indicates no match in interrupt map.

Well, since we know the map is corrupted somehow, it isn't surprising
that the 2nd entry won't match anything. It is probably matching
against '0 0 2 &intc'
 
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] pcie: Add Xilinx PCIe Host Bridge IP driver

2014-03-03 Thread Jason Gunthorpe

On Tue, Mar 04, 2014 at 12:21:55AM +0530, Srikanth Thokala wrote:

> >> + pci_express: axi-pcie@5000 {
> >> + #address-cells = <3>;
> >> + #size-cells = <2>;
> >> + compatible = "xlnx,axi-pcie-host-1.00.a";
> >> + reg = < 0x5000 0x1000 >;
> >> + interrupts = < 0 52 4 >;
> >> + ranges = < 0x0200 0 0x6000 0x6000 0 0x1000 >;
> >> + };

> >  - How are INTA/B/C/D handled? Typically I'd like to see an
> >interrupt-mapping block describing them.
> >
> >There are a few options here, if the HW can decode A/B/C/D then
> >it probably needs another irq_domain for INTx, and have the IRQ
> >handler decode like it does for MSI.
> 
> The core provides a single interrupt for both INTx/MSI messages.  The
> core decodes the TLP messages and accordingly assert the INTx/MSI
> bits.  It provides two registers Interrupt FIFO Read 1 and Read 2 to get
> more information of these.
> 
>| CPU |  <--  | PCIe IP |   <--  MSI/INTx Message (INTA, INTB,
> INTC, INTD)
> 
>Reading Interrupt Status Reg -> MSI/INTx bit set/not set
>Reading Root Port FIFO Read Register1 -> Mentions which interrupt
> (INTx) is received
>Reading Root Port FIFO Read Register2 -> MSI Data

Okay, that matches my guess based on the code.

You probably want to do something like this:

 pci_express: axi-pcie@5000 {
 #address-cells = <3>;
 #size-cells = <2>;
 device_type ="pci";
 compatible = "xlnx,axi-pcie-host-1.00.a";
 reg = < 0x5000 0x1000 >;
 interrupts = < 0 52 4 >;
 ranges = < 0x0200 0 0x6000 0x6000 0 0x1000 >;

 pcie_intc: interrupt-controller {
interrupt-controller;
#address-cells = <0>;
#interrupt-cells = <1>;
 }

 interrupt-map-mask = <0 0 0 7>;
 interrupt-map = <0 0 0 1 &pcie_intc 0>, // INT A
 <0 0 0 2 &pcie_intc 2>, // INT B
 <0 0 0 3 &pcie_intc 3>, // INT C
 <0 0 0 4 &pcie_intc 4>; // INT D
 };

(similar to what ralink,rt3883 does for their binding)

Setup a interrupt domain like for MSI, except attach it to the
nested interrupt-controller DT node. Decode the 4 interrutps from
the core in the ISR and route them into the domain.

The generic code will take care of the rest.

This way all the interrupts are decoded and the interrupt-map DT
functionality is supported, which is important in a number of
board-specific corner cases.

> > You may want to work with Liviu to use his patch set which provides
> > more generic support. Also, the driver needs to put every one of these
> > ports in a PCI domain, Liviu is working on generic support for that
> > too.
> 
> I will check and come back to you.

Will's latest patch that was just posted is a very good example of how
the resource stuff should work correctly.

> > Don't forget to think about hot plug
> 
> Did you mean using 'rescan' (from sysfs), correct?

Yes

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] PCI irq mapping fixes and cleanups

2014-03-03 Thread Jason Gunthorpe

On Mon, Mar 03, 2014 at 09:49:52AM -0800, Tim Harvey wrote:

> I'm not clear why irq 20 is getting returned for all the slots with
> (slot%4)=0 and func=0.  If I start debugging of_irq_parse_pci() I see
> that it walks up the tree until it gets to the pcie host controller
> then calls of_irq_parse_raw() which is returning irq20 or -EINVAL.

Can you share your DT as well?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] pcie: Add Xilinx PCIe Host Bridge IP driver

2014-03-03 Thread Jason Gunthorpe

On Mon, Mar 03, 2014 at 07:10:36PM +0530, Srikanth Thokala wrote:

> +Required properties:
> +- #address-cells: Address representation for root ports, set to <3>
> +- #size-cells: Size representation for root ports, set to <2>
> +- compatible: Should contain "xlnx,axi-pcie-host-1.00.a"
> +- reg: Should contain AXI PCIe registers location and length
> +- interrupts: Should contain AXI PCIe interrupt
> +- ranges: ranges for the PCI memory regions
> + Please refer to the standard PCI bus binding document for a more
> + detailed explanation
> +
> +Example:
> +
> +
> + pci_express: axi-pcie@5000 {
> + #address-cells = <3>;
> + #size-cells = <2>;
> + compatible = "xlnx,axi-pcie-host-1.00.a";
> + reg = < 0x5000 0x1000 >;
> + interrupts = < 0 52 4 >;
> + ranges = < 0x0200 0 0x6000 0x6000 0 0x1000 >;
> + };

That is much shorter, more notes:
 - It looks like the driver (HW?) has no support for IO space?
 - How are INTA/B/C/D handled? Typically I'd like to see an
   interrupt-mapping block describing them.

   There are a few options here, if the HW can decode A/B/C/D then
   it probably needs another irq_domain for INTx, and have the IRQ
   handler decode like it does for MSI.
- The stanza needs a
device_type = "pci";

> + for (i = 0; i < port->mem_resno; i++)
> + pci_add_resource_offset(&sys->resources, &port->mem[i],
> + sys->mem_offset);

The sys->mem_offset is probably wrong, please review Arnd's comments
on the recent threads with Will and Liviu regarding how this is
supposed to work with DT.

You may want to work with Liviu to use his patch set which provides
more generic support. Also, the driver needs to put every one of these
ports in a PCI domain, Liviu is working on generic support for that
too.

> +static struct pci_bus __init *xilinx_pcie_scan_bus(int nr,
> + struct pci_sys_data *sys)
> +{
> + struct xilinx_pcie_port *port = sys_to_pcie(sys);
> + struct pci_bus *bus;
> +
> + if (port) {
> + port->root_busno = sys->busnr;
> + bus = pci_scan_root_bus(NULL, sys->busnr, &xilinx_pcie_ops,
> + sys, &sys->resources);

The NULL is probably wrong, please review the recent thread with Lucas
and Jingoo 'PCI irq mapping fixes and cleanups'

> +static int xilinx_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
> +{
> + struct xilinx_pcie_port *port = sys_to_pcie(dev->bus->sysdata);
> +
> + if (!port)
> + return -EINVAL;
> +
> + return port->irq;
> +}

It is preferred to use DT mechanisms via of_irq_parse_and_map_pci
(interrupt-mapping, etc) to define the slot IRQ, drivers should not
provide a map_irq unless absolutely necessary.

> + /* make sure it is root port before touching header */
> + pcie_write(port, PCI_COMMAND_MASTER, PCI_COMMAND);

Bit of a confusing comment

> + /* Check if PCIe link is up? */
> + port->link_up = xilinx_pcie_is_link_up(port);
> + if (!port->link_up)
> + dev_info(port->dev, "PCIe Link is DOWN\n");
> + else
> + dev_info(port->dev, "PCIe Link is UP\n");

Don't forget to think about hot plug

> + /* Register Interrupt Handler */
> + err = request_irq(port->irq, xilinx_pcie_intr_handler,
> + IRQF_SHARED, "xilinx-pcie", port);

Use devm? This is leaking on error cases.

Review other resources too...

> + /* Request and map I/O memory */
> + io = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + port->phys_reg_base = io->start;
> + port->reg_size = resource_size(io);

Here it uses the platform resource..

> + /* Map the IRQ */
> + port->irq = irq_of_parse_and_map(node, 0);

.. And here it doesn't, probably should be consistent and just use the
platform resources.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/7] PCI irq mapping fixes and cleanups

2014-03-01 Thread Jason Gunthorpe

On Fri, Feb 28, 2014 at 04:53:33PM -0800, Tim Harvey wrote:

> In testing this on IMX6 I'm finding that 'of_irq_parse_and_map_pci()'
> always returns -EINVAL because it can't find a dt node for the host
> bridge: 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/of/of_pci_irq.c#n60.

There may be some kind of issue in the pcie-designware.c:

static struct pci_bus *dw_pcie_scan_bus(int nr, struct pci_sys_data *sys)
{
struct pci_bus *bus;
struct pcie_port *pp = sys_to_pcie(sys);

if (pp) {
pp->root_bus_nr = sys->busnr;
bus = pci_scan_root_bus(NULL, sys->busnr, &dw_pcie_ops,
 ^^^
 Shouldn't be null for DT cases.

Perhaps since the driver doesn't pass in a parent pointer, no parent
device is associated with the struct pci_bus that gets created, so
pci_bus_to_OF_node will always fail and the DT PCI mechanisms become
broken.

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/4] pci: OF: Fix the conversion of IO ranges into IO resources.

2014-02-27 Thread Jason Gunthorpe

On Thu, Feb 27, 2014 at 08:48:08PM +0100, Arnd Bergmann wrote:
> > It also looks correct for architectures that use the CPU MMIO address
> > as the IO address directly (where IO_SPACE_LIMIT would be 4G)
> 
> Are you aware of any that still do? I thought we had stopped doing
> that.

I thought ia64 used to, but it has been a long time since I've touched
one...

> > Architectures that use the virtual IO window technique will always
> > require a custom pci_address_to_pio implementation.
> 
> Hmm, at the moment we only call it from of_address_to_resource(),
> which in turn does not get called on PCI devices, and does not
> call pci_address_to_pio for 'simple' platform devices. The only
> case I can think of where it actually matters is when we have
> ISA devices in DT that use an I/O port address in the reg property,
> and that case hopefully won't happen on ARM32 or ARM64.

Sure, I ment, after Liviu's patch it will become required since he is
cleverly using it to figure out what the io mapping the bridge driver
setup before calling the helper.

> > I think the legacy reasons for having all those layers of translation
> > are probably not applicable to ARM64, and it is much simpler without
> > the extra translation step
> > 
> > Arnd, what do you think?
> 
> Either I don't like it or I misunderstand you ;-)
> 
> Most PCI drivers normally don't call ioport_map or pci_iomap, so
> we can't just do it there. If you are thinking of calling ioport_map

Okay, that was one of the 'legacy reasons'. Certainly lots of drivers
do call pci_iomap, but if you think legacy drivers that don't are
important to ARM64 then it makes sense to use the virtual IO window.

> for every PCI device that has an I/O BAR and storing the virtual
> address in the pci_dev resource, I don't see what that gains us

Mainly we get to drop the fancy dynamic allocation stuff for the fixed
virtual window, and it gives the option to have a 1:1 relationship
between CPU addresses and PCI BARs.

> in terms of complexity, and it will also break /dev/port.

Yes, /dev/port needs updating, it would need to iomap (arguably it
probably should be doing that already anyhow), and the hardwired limit
of 65536 needs to be replaced with the arch's IO limit, but those do
not seem to be fundemental problems with the UAPI??

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/4] pci: OF: Fix the conversion of IO ranges into IO resources.

2014-02-27 Thread Jason Gunthorpe

On Thu, Feb 27, 2014 at 07:12:59PM +, Liviu Dudau wrote:
> The outstanding issue is how to fix pci_address_to_pio() as it will not
> for for range->cpu_addr > IO_SPACE_LIMIT (16MB in my case).

The default actually looks fine to me, it is the correct behavior for
systems that actually have a dedicated IO space (like x86) where the
'CPU' value for IO is the exact value used in the IO accessor
instructions. In this case the IO_SPACE_LIMIT test is appropriate.

It also looks correct for architectures that use the CPU MMIO address
as the IO address directly (where IO_SPACE_LIMIT would be 4G)

Architectures that use the virtual IO window technique will always
require a custom pci_address_to_pio implementation.

BTW, something that occured to me after reading the patches:

For ARM64 you might want to think about doing away with the fixed
virtual IO window like we see in ARM32. Just use the CPU MMIO address
directly within the kernel, and implement a ioport_map to setup the MM
on demand.

I think the legacy reasons for having all those layers of translation
are probably not applicable to ARM64, and it is much simpler without
the extra translation step

Arnd, what do you think?

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/4] pci: OF: Fix the conversion of IO ranges into IO resources.

2014-02-27 Thread Jason Gunthorpe

On Thu, Feb 27, 2014 at 01:06:39PM +, Liviu Dudau wrote:
> + if (res->flags & IORESOURCE_IO) {
> + unsigned long port;
> + port = pci_address_to_pio(range->pci_addr);

This looks very suspicious, pci_addr is not unique across all domains,
so there is no way to convert from a pci_addr to the virtual IO
address without knowing the domain number as well.

I would like to see it be:
  port = pci_address_to_pio(range->cpu_addr);

cpu_addr is unique across all domains.

Looking at the microblaze and PPC versions I think the above version
is actually correct (assuming io_base_phys is the CPU address of the
IO window)

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC/PATCH 0/3] Add devicetree scanning for randomness

2014-02-18 Thread Jason Gunthorpe

On Mon, Feb 17, 2014 at 03:54:19PM +, Grant Likely wrote:

> I applied a patch that did exactly that (109b623629), and then reverted
> it (b920ecc82) shortly thereafter because add_device_randomness() is
> a rather slow function and FDTs can get large. I'd like to see someone
> do a reasonable analysis on the cost of using an FDT for randomness
> before I reapply a patch doing something similar. An awful lot of the
> FDT data is not very random, but there are certainly portions of it that
> are appropriate for the random pool.

I read through the original thread from Tim Bird and FWIW I agree with
the assessment that passing the FDT through MD5 first is a good
approach.

Thinking into the future, I'd expect to see similar variable data in
DT on servers as we see in DMI, including:
  - Vendor serial number for the HW, manufacturing date, model number,
and HW UUID
  - Serial numbers and vendor part numbers for DIMMS
  - MAC addresses for all the ethernet
  - OEM specific data

At worst a 'choosen/linux,no-dt-random = 1' value in the DT to disable
it would solve the problem for those in embedded that care about
microseconds during booting.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC/PATCH 0/3] Add devicetree scanning for randomness

2014-02-12 Thread Jason Gunthorpe

On Wed, Feb 12, 2014 at 12:45:54PM -0500, Jason Cooper wrote:

> The bootloader would then load this file into ram, and pass the
> address/size to the kernel either via dt, or commandline.  kaslr (run in
> the decompressor) would consume some of this randomness, and then
> random.c would consume the rest in a non-crediting initialization.

Sure is a neat idea, but I think in general it would probably be smart
to include the entire FDT blob in the early random pool, that way you
get MACs and other machine unique data too.

>From there it is a small step to encourage bootloaders to include
boot-time-variable data in the DT like like 'boot time of day', 'cycle
counter', 'random blob', etc.

Then you just need the bootloader to dump the random-seed file into a
DT property.

Or have the bootloader fetch randomness from any HWRNG it has a driver
for. (eg a TPM)

Jason
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 158 matches

Mail list logo