[PATCH] fsl/sata: create a sysfs entry for rx water mark

2013-03-03 Thread Qiang Liu
Support config RX WATER MARK via sysfs when running at run-time;
A wrokaround for fix the exception happened to some WD HDD, found on
WD3000HLFS-01G6U1, WD3000HLFS-01G6U0, some SSD disks. The read performance
is also regression (about 30%) when use default value.

According to the latest documents, 0x10 is the default value of RX WATER MARK,
but exception/performance issue happened to some disks mentioned above.

The exception log as below when testing read performance with IOZone:
ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x80 action 0x6 frozen
ata1: SError: { LinkSeq }
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/00:00:ff:2c:14/01:00:02:00:00/40 tag 0 ncq 131072 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/00:08:ff:2d:14/01:00:02:00:00/40 tag 1 ncq 131072 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/10:10:af:08:6e/00:00:12:00:00/40 tag 2 ncq 8192 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: Hardreset failed, not off-lined 0
ata1: Signature Update detected @ 504 msecs
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1.00: device reported invalid CHS sector 0
ata1.00: device reported invalid CHS sector 0
ata1.00: device reported invalid CHS sector 0
ata1: EH complete

The exception/performance can be resolved when RX WATER MARK value is 0x16.

Signed-off-by: Qiang Liu 
---
 drivers/ata/sata_fsl.c |   55 
 1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/sata_fsl.c b/drivers/ata/sata_fsl.c
index 124b2c1..515f66f 100644
--- a/drivers/ata/sata_fsl.c
+++ b/drivers/ata/sata_fsl.c
@@ -285,6 +285,7 @@ struct sata_fsl_host_priv {
int irq;
int data_snoop;
struct device_attribute intr_coalescing;
+   struct device_attribute rx_watermark;
 };

 static void fsl_sata_set_irq_coalescing(struct ata_host *host,
@@ -343,6 +344,48 @@ static ssize_t fsl_sata_intr_coalescing_store(struct 
device *dev,
return strlen(buf);
 }

+static ssize_t fsl_sata_rx_watermark_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   unsigned int rx_watermark;
+   unsigned long flags;
+   struct ata_host *host = dev_get_drvdata(dev);
+   struct sata_fsl_host_priv *host_priv = host->private_data;
+   void __iomem *csr_base = host_priv->csr_base;
+
+   spin_lock_irqsave(&host->lock, flags);
+   rx_watermark = ioread32(csr_base + TRANSCFG);
+   rx_watermark &= 0x1f;
+
+   spin_unlock_irqrestore(&host->lock, flags);
+   return sprintf(buf, "%d\n", rx_watermark);
+}
+
+static ssize_t fsl_sata_rx_watermark_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   unsigned int rx_watermark;
+   unsigned long flags;
+   struct ata_host *host = dev_get_drvdata(dev);
+   struct sata_fsl_host_priv *host_priv = host->private_data;
+   void __iomem *csr_base = host_priv->csr_base;
+   u32 temp;
+
+   if (sscanf(buf, "%d", &rx_watermark) != 1) {
+   printk(KERN_ERR "fsl-sata: wrong parameter format.\n");
+   return -EINVAL;
+   }
+
+   spin_lock_irqsave(&host->lock, flags);
+   temp = ioread32(csr_base + TRANSCFG);
+   temp &= 0xffe0;
+   iowrite32(temp | rx_watermark, csr_base + TRANSCFG);
+
+   spin_unlock_irqrestore(&host->lock, flags);
+   return strlen(buf);
+}
+
 static inline unsigned int sata_fsl_tag(unsigned int tag,
void __iomem *hcr_base)
 {
@@ -1500,6 +1543,17 @@ static int sata_fsl_probe(struct platform_device *ofdev)
if (retval)
goto error_exit_with_cleanup;

+   host_priv->rx_watermark.show = fsl_sata_rx_watermark_show;
+   host_priv->rx_watermark.store = fsl_sata_rx_watermark_store;
+   sysfs_attr_init(&host_priv->rx_watermark.attr);
+   host_priv->rx_watermark.attr.name = "rx_watermark";
+   host_priv->rx_watermark.attr.mode = S_IRUGO | S_IWUSR;
+   retval = device_create_file(host->dev, &host_priv->rx_watermark);
+   if (retval) {
+   device_remove_file(&ofdev->dev, &host_priv->intr_coalescing);
+   goto error_exit_with_cleanup;
+   }
+
return 0;

 error_exit_with_cleanup:
@@ -1523,6 +1577,7 @@ static int sata_fsl_remove(struct platform_device *ofdev)
struct sata_fsl_host_priv *host_priv = host->private_data;

device_remove_file(&ofdev->dev, &host_priv->intr_coalescing);
+   device_remove_file(&ofdev->dev, &host_priv->rx_watermark);

ata_host_detach(host);

--
1.7.5.1


__

RE: [PATCH 1/6 v8] iommu/fsl: Store iommu domain information pointer in archdata.

2013-03-03 Thread Sethi Varun-B16395


> -Original Message-
> From: Yoder Stuart-B08248
> Sent: Friday, March 01, 2013 9:52 PM
> To: Alexey Kardashevskiy; Sethi Varun-B16395
> Cc: Kumar Gala; Benjamin Herrenschmidt; io...@lists.linux-foundation.org;
> linuxppc-dev@lists.ozlabs.org list; linux-ker...@vger.kernel.org list;
> Wood Scott-B07421; Joerg Roedel; Paul Mackerras; David Gibson; Alex
> Williamson
> Subject: RE: [PATCH 1/6 v8] iommu/fsl: Store iommu domain information
> pointer in archdata.
> 
> 
> 
> > -Original Message-
> > From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
> > Sent: Friday, March 01, 2013 4:07 AM
> > To: Sethi Varun-B16395
> > Cc: Kumar Gala; Benjamin Herrenschmidt;
> > io...@lists.linux-foundation.org; linuxppc-dev@lists.ozlabs.org list;
> > linux-ker...@vger.kernel.org list; Wood Scott-B07421; Yoder
> > Stuart-B08248; Joerg Roedel; Paul Mackerras; David Gibson; Alex
> > Williamson
> > Subject: Re: [PATCH 1/6 v8] iommu/fsl: Store iommu domain information
> pointer in archdata.
> >
> > btw the device struct already has a pointer to its iommu_group, and
> > the iommu_group struct itself has a pointer void *iommu_data which you
> > could use for anything you want (iommu_group_get_iommudata(),
> > iommu_group_set_iommudata()).
> >
> > By design you are expected to add iommu groups to a domain but not
> > devices so I am not so sure that you really need a pointer to domain
> > in the device struct.
> 
> Well, at the lowest level the IOMMU API does attach devices to domains--
> i.e.
> API attach_dev().  So, it seems to conceptually make sense to have a ptr
> from the device to the associated domain.  When you implement
> attach_dev() you need to be able to see whether the device is already
> attached to
> a domain.   Adding a couple of levels of indirection...from device to
> group to domain...doesn't seems to make things simpler or better IMHO.
> 
> x86 keeps a pointer to the domain in device archdata and since there is a
> direct correlation between a device and domain I'd rather see it where
> this patch has it.
> 

As Stuart stated this allows us to maintain the device <-> domain relationship 
at the lowest level.

-Varun

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] kexec/ppc: Fix kernel program entry point while changing the load addr

2013-03-03 Thread Suzuki K. Poulose

On 03/04/2013 07:11 AM, Simon Horman wrote:

[ Cc: linuxppc-dev@lists.ozlabs.org ]

On Sun, Mar 03, 2013 at 01:06:00PM +0530, Suzuki K. Poulose wrote:

From: Suzuki K. Poulose 

uImage probe fills the entry point (ep) based on the load_addr
from the uImage headers. If we change the load_addr, we should
accordingly update the entry point.

For ELF, calculate the offset of e_entry from the virtual start
address and add it to the physical start address to find the
physical address of kernel entry.

i.e,
   pa (e_entry) = pa(phdr[0].p_vaddr) + (e_entry - phdr[0].p_vaddr)
= kernel_addr + (e_entry - phdr[0].p_vaddr)


Would it be possible for someone to provide a review of this change?

To make it bit more clear :

The entry point of the kernel is usually at 0 offset from the first
PT_LOAD section. The current code makes this assumption and uses the 
pa(phdr[0].p_vaddr) as the kernel entry.


But this *may* not be true always, in such a case the kexec would fail.
While I fixed the uImage case, I thought it would be better to handle 
the same case in ELF.


Btw, this calculation is not specific to ppc32.

Thanks
Suzuki





Signed-off-by: Suzuki K. Poulose 
Cc: Sebastian Andrzej Siewior 
Cc: Matthew McClintock 
---
  kexec/arch/ppc/kexec-elf-ppc.c|   12 
  kexec/arch/ppc/kexec-uImage-ppc.c |6 +-
  2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/kexec/arch/ppc/kexec-elf-ppc.c b/kexec/arch/ppc/kexec-elf-ppc.c
index 8e408cc..5f63a64 100644
--- a/kexec/arch/ppc/kexec-elf-ppc.c
+++ b/kexec/arch/ppc/kexec-elf-ppc.c
@@ -397,10 +397,14 @@ int elf_ppc_load(int argc, char **argv,   const char 
*buf, off_t len,
die("Error device tree not loadded to address it was expecting to be 
loaded too!\n");
}

-   /* set various variables for the purgatory  ehdr.e_entry is a
-* virtual address, we can use kernel_addr which
-* should be the physical start address of the kernel */
-   addr = kernel_addr;
+   /*
+* set various variables for the purgatory.
+* ehdr.e_entry is a virtual address. we know physical start
+* address of the kernel (kernel_addr). Find the offset of
+* e_entry from the virtual start address(e_phdr[0].p_vaddr)
+* and calculate the actual physical address of the 'kernel entry'.
+*/
+   addr = kernel_addr + (ehdr.e_entry - ehdr.e_phdr[0].p_vaddr);
elf_rel_set_symbol(&info->rhdr, "kernel", &addr, sizeof(addr));

addr = dtb_addr;
diff --git a/kexec/arch/ppc/kexec-uImage-ppc.c 
b/kexec/arch/ppc/kexec-uImage-ppc.c
index e0bc7bb..900cd16 100644
--- a/kexec/arch/ppc/kexec-uImage-ppc.c
+++ b/kexec/arch/ppc/kexec-uImage-ppc.c
@@ -159,15 +159,19 @@ static int ppc_load_bare_bits(int argc, char **argv, 
const char *buf,

/*
 * If the provided load_addr cannot be allocated, find a new
-* area.
+* area. Rebase the entry point based on the new load_addr.
 */
if (!valid_memory_range(info, load_addr, load_addr + (len + _1MiB))) {
+   int ep_offset = ep - load_addr;
+
load_addr = locate_hole(info, len + _1MiB, 0, 0, max_addr, 1);
if (load_addr == ULONG_MAX) {
printf("Can't allocate memory for kernel of len %ld\n",
len + _1MiB);
return -1;
}
+
+   ep = load_addr + ep_offset;
}

add_segment(info, buf, len, load_addr, len + _1MiB);


___
kexec mailing list
ke...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V1 07/24] powerpc: Add size argument to pgtable_cache_add

2013-03-03 Thread Paul Mackerras
On Tue, Feb 26, 2013 at 01:34:57PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> We will use this later with THP changes to request for pmd table of double 
> the size.
> THP code does PTE page allocation along with large page request and deposit 
> them
> for later use. This is to ensure that we won't have any failures when we split
> huge pages to regular pages.
> 
> On powerpc we want to use the deposited PTE page for storing hash pte slot and
> secondary bit information for the HPTEs. Hence we save them in the second half
> of the pmd table.

Looks OK, but you should explain why you made the wholesale change of
"shift" to "index".  Is there some important semantic difference, or
do you just prefer the "index" name for some reason?

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V1 08/24] powerpc: Use encode avpn where we need only avpn values

2013-03-03 Thread Paul Mackerras
On Tue, Feb 26, 2013 at 01:34:58PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> In all these cases we are doing something similar to
> 
> HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit
> 
> With MPSS support we would need actual page size to set HPTE_V_LARGE
> bit and that won't be available in most of these cases. Since we are ignoring
> HPTE_V_LARGE bit, use the  avpn value instead. There should not be any change
> in behaviour after this patch.
> 
> Signed-off-by: Aneesh Kumar K.V 

Acked-by: Paul Mackerras 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V1 09/24] powerpc: Decode the pte-lp-encoding bits correctly.

2013-03-03 Thread Paul Mackerras
On Tue, Feb 26, 2013 at 01:34:59PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> We look at both the segment base page size and actual page size and store
> the pte-lp-encodings in an array per base page size.
> 
> We also update all relevant functions to take actual page size argument
> so that we can use the correct PTE LP encoding in HPTE. This should also
> get the basic Multiple Page Size per Segment (MPSS) support. This is needed
> to enable THP on ppc64.

Mostly looks OK, comments below...

> +/*
> + * HPTE LP details
> + */
> +#define LP_SHIFT 12
> +#define LP_BITS  8
> +#define LP_MASK(i)   ((0xFF >> (i)) << LP_SHIFT)

The reader might be wondering at this point what "LP" is; be kind and
make it "large page" in the comment for them.

> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 71d0c90..48f6d99 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1515,7 +1515,7 @@ static void kvmppc_add_seg_page_size(struct 
> kvm_ppc_one_seg_page_size **sps,
>   (*sps)->page_shift = def->shift;
>   (*sps)->slb_enc = def->sllp;
>   (*sps)->enc[0].page_shift = def->shift;
> - (*sps)->enc[0].pte_enc = def->penc;
> + (*sps)->enc[0].pte_enc = def->penc[linux_psize];
>   (*sps)++;
>  }

This will only return the entries where actual page size == base page
size, which basically means that KVM guests won't be able to use
MPSS.  We will need to return multiple entries in that case.

> +static inline int hpte_actual_psize(struct hash_pte *hptep, int psize)
> +{
> + unsigned int mask;
> + int i, penc, shift;
> + /* Look at the 8 bit LP value */
> + unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
> +
> + penc = 0;
> + for (i = 0; i < MMU_PAGE_COUNT; i++) {
> + /* valid entries have a shift value */
> + if (!mmu_psize_defs[i].shift)
> + continue;
> +
> + /* encoding bits per actual page size */
> + shift = mmu_psize_defs[i].shift - 11;
> + if (shift > 9)
> + shift = 9;
> + mask = (1 << shift) - 1;
> + if ((lp & mask) == mmu_psize_defs[psize].penc[i])
> + return i;
> + }
> + return -1;
> +}

This doesn't look right to me.  First, it's not clear what the 11 and
9 refer to, and I think the 9 should be LP_BITS (i.e. 8).  Secondly,
the mask for the comparison needs to depend on the actual page size
not the base page size.

I strongly suggest you pull out this code together with
native_hpte_insert into a little userspace test program that runs
through all the possible page size combinations, creating an HPTE and
then decoding it with hpte_actual_psize() to check that you get back
the correct actual page size.

>  static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
> - int *psize, int *ssize, unsigned long *vpn)
> + int *psize, int *apsize, int *ssize, unsigned long *vpn)
>  {
>   unsigned long avpn, pteg, vpi;
>   unsigned long hpte_r = hpte->r;
>   unsigned long hpte_v = hpte->v;
>   unsigned long vsid, seg_off;
> - int i, size, shift, penc;
> + int i, size, a_size = MMU_PAGE_4K, shift, penc;
>  
>   if (!(hpte_v & HPTE_V_LARGE))
>   size = MMU_PAGE_4K;
> @@ -395,12 +422,13 @@ static void hpte_decode(struct hash_pte *hpte, unsigned 
> long slot,
>   /* valid entries have a shift value */
>   if (!mmu_psize_defs[size].shift)
>   continue;
> -
> - if (penc == mmu_psize_defs[size].penc)
> - break;
> + for (a_size = 0; a_size < MMU_PAGE_COUNT; a_size++)
> + if (penc == mmu_psize_defs[size].penc[a_size])
> + goto out;

Once again I don't think this is correct, since the number of bits in
the page size encoding depends on the page size.  In fact the
calculation of penc in that function looks completely bogus to me (not
that that is code that you have written or modified, but it looks to
me like it needs fixing).

>  static int __init htab_dt_scan_page_sizes(unsigned long node,
> const char *uname, int depth,
> void *data)
> @@ -294,60 +318,57 @@ static int __init htab_dt_scan_page_sizes(unsigned long 
> node,
>   size /= 4;
>   cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
>   while(size > 0) {
> - unsigned int shift = prop[0];
> + unsigned int base_shift = prop[0];
>   unsigned int slbenc = prop[1];
>   unsigned int lpnum = prop[2];
> - unsigned int lpenc = 0;
>   struct mmu_psize_def *def;
> 

Re: [PATCH -V1 06/24] powerpc: Reduce PTE table memory wastage

2013-03-03 Thread Paul Mackerras
On Tue, Feb 26, 2013 at 01:34:56PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" 
> 
> We allocate one page for the last level of linux page table. With THP and
> large page size of 16MB, that would mean we are be wasting large part
> of that page. To map 16MB area, we only need a PTE space of 2K with 64K
> page size. This patch reduce the space wastage by sharing the page
> allocated for the last level of linux page table with multiple pmd
> entries. We call these smaller chunks PTE page fragments and allocated
> page, PTE page. We use the page->_mapcount as bitmap to indicate which
> PTE fragments are free.
> 
> page->_mapcount is divided into two halves. The upper half is used for
> tracking the freed page framents in the RCU grace period.
> 
> In order to support systems which doesn't have 64K HPTE support, we also
> add another 2K to PTE page fragment. The second half of the PTE fragments
> is used for storing slot and secondary bit information of an HPTE. With this
> we now have a 4K PTE fragment.
> 
> Signed-off-by: Aneesh Kumar K.V 

This one has taken me hours to review.  Perhaps it's partly because of
the way that diff has matched things up, but it's difficult to see
what's moved where, what's common code that is now the 4k page case,
etc.  For example, pmd_alloc_one() and pmd_free() are unchanged, but
the diff shows them as removed in one place and added in another.

The other general comment I have is that it's not really clear when a
page will be on the mm->context.pgtable_list and when it won't.  I
would like to see an invariant that says something like "the page is
on the pgtable_list if and only if (page->_mapcount & FRAG_MASK) is
neither 0 nor FRAG_MASK".  But that doesn't seem to be the case
exactly, and I can't see any consistent rule, which makes me think
there are going to be bugs in corner cases.

Consider, for example, the case where a page has two fragments still
in use, and one of them gets queued up by RCU for freeing via a call
to page_table_free_rcu, and then the other one gets freed through
page_table_free().  Neither the call to page_table_free_rcu nor the
call to page_table_free will take the page off the list AFAICS, and
then __page_table_free_rcu() will free the page while it's still on
the pgtable_list.

More specific comments below...

> -static inline void pgtable_free(void *table, unsigned index_size)
> -{
> - if (!index_size)
> - free_page((unsigned long)table);
> - else {
> - BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
> - kmem_cache_free(PGT_CACHE(index_size), table);
> - }
> -}

This is still used in the UP case, both for 4k and 64k, and UP configs
now fail to build.

>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
>   free_page((unsigned long)pte);
> @@ -156,7 +118,12 @@ static inline void __tlb_remove_table(void *_table)
>   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
>   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
>  
> - pgtable_free(table, shift);
> + if (!shift)
> + free_page((unsigned long)table);
> + else {
> + BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
> + kmem_cache_free(PGT_CACHE(shift), table);
> + }

Any particular reason for open-coding pgtable_free() here?

> +/*
> + * we support 15 fragments per PTE page. This is limited by how many
> + * bits we can pack in page->_mapcount. We use the first half for
> + * tracking the usage for rcu page table free.
> + */
> +#define FRAG_MASK_BITS   15
> +#define FRAG_MASK ((1 << FRAG_MASK_BITS) - 1)

Atomic_t variables are 32-bit, so you really should be able to make
FRAG_MASK_BITS be 16 and avoid wasting the last fragment of each page.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/3] Enable multiple MSI feature in pSeries

2013-03-03 Thread Mike Qiu

于 2013/3/1 11:54, Michael Ellerman 写道:

On Fri, Mar 01, 2013 at 11:08:45AM +0800, Mike wrote:

Hi all

Any comments? or any questions about my patchset?

You were going to get some performance numbers that show a definite
benefit for using more than one MSI.

Yes, but my patch just enable the kernel to support this feature, whether
to use it depens on the device driver.

And this feature has been merged to the kernel for X86 for a long time.
See commit: 5ca72c4f7c412c2002363218901eba5516c476b1
51906e779f2b13b38f8153774c4c7163d412ffd9

Actually, I'm trying to do the test. but it is difficult to do that test,
because it mostly depends on how the device driver to use this feature,
while the ipr driver patch was wrote by another person. also no any reply
from her.



cheers



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kexec/ppc: Fix kernel program entry point while changing the load addr

2013-03-03 Thread Simon Horman
[ Cc: linuxppc-dev@lists.ozlabs.org ]

On Sun, Mar 03, 2013 at 01:06:00PM +0530, Suzuki K. Poulose wrote:
> From: Suzuki K. Poulose 
> 
> uImage probe fills the entry point (ep) based on the load_addr
> from the uImage headers. If we change the load_addr, we should
> accordingly update the entry point.
> 
> For ELF, calculate the offset of e_entry from the virtual start
> address and add it to the physical start address to find the
> physical address of kernel entry.
> 
> i.e,
>   pa (e_entry) = pa(phdr[0].p_vaddr) + (e_entry - phdr[0].p_vaddr)
>= kernel_addr + (e_entry - phdr[0].p_vaddr)

Would it be possible for someone to provide a review of this change?

> 
> Signed-off-by: Suzuki K. Poulose 
> Cc: Sebastian Andrzej Siewior 
> Cc: Matthew McClintock 
> ---
>  kexec/arch/ppc/kexec-elf-ppc.c|   12 
>  kexec/arch/ppc/kexec-uImage-ppc.c |6 +-
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kexec/arch/ppc/kexec-elf-ppc.c b/kexec/arch/ppc/kexec-elf-ppc.c
> index 8e408cc..5f63a64 100644
> --- a/kexec/arch/ppc/kexec-elf-ppc.c
> +++ b/kexec/arch/ppc/kexec-elf-ppc.c
> @@ -397,10 +397,14 @@ int elf_ppc_load(int argc, char **argv, const char 
> *buf, off_t len,
>   die("Error device tree not loadded to address it was expecting 
> to be loaded too!\n");
>   }
>  
> - /* set various variables for the purgatory  ehdr.e_entry is a
> -  * virtual address, we can use kernel_addr which
> -  * should be the physical start address of the kernel */
> - addr = kernel_addr;
> + /* 
> +  * set various variables for the purgatory.
> +  * ehdr.e_entry is a virtual address. we know physical start
> +  * address of the kernel (kernel_addr). Find the offset of
> +  * e_entry from the virtual start address(e_phdr[0].p_vaddr)
> +  * and calculate the actual physical address of the 'kernel entry'.
> +  */
> + addr = kernel_addr + (ehdr.e_entry - ehdr.e_phdr[0].p_vaddr);
>   elf_rel_set_symbol(&info->rhdr, "kernel", &addr, sizeof(addr));
>  
>   addr = dtb_addr;
> diff --git a/kexec/arch/ppc/kexec-uImage-ppc.c 
> b/kexec/arch/ppc/kexec-uImage-ppc.c
> index e0bc7bb..900cd16 100644
> --- a/kexec/arch/ppc/kexec-uImage-ppc.c
> +++ b/kexec/arch/ppc/kexec-uImage-ppc.c
> @@ -159,15 +159,19 @@ static int ppc_load_bare_bits(int argc, char **argv, 
> const char *buf,
>  
>   /*
>* If the provided load_addr cannot be allocated, find a new
> -  * area.
> +  * area. Rebase the entry point based on the new load_addr.
>*/
>   if (!valid_memory_range(info, load_addr, load_addr + (len + _1MiB))) {
> + int ep_offset = ep - load_addr;
> +
>   load_addr = locate_hole(info, len + _1MiB, 0, 0, max_addr, 1);
>   if (load_addr == ULONG_MAX) {
>   printf("Can't allocate memory for kernel of len %ld\n",
>   len + _1MiB);
>   return -1;
>   }
> +
> + ep = load_addr + ep_offset;
>   }
>  
>   add_segment(info, buf, len, load_addr, len + _1MiB);
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Benjamin Herrenschmidt
On Sun, 2013-03-03 at 20:16 +0100, Phileas Fogg wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. It looks like a bisection might indeed be the way to go...
> >
> > Out of curiosity, have you tried without some of your additional drivers ?
> > Maybe one of them is the culprit...
> >
> > Cheers,
> > Ben.
> >
> 
> Not yet, will do.
> But I tested the official Debian Wheezy RC netinstall CD with Linux 3.2,
> it has the same problem and hangs at boot on my machine.

Ok, so it's definitely something about your configuration. Maybe
something in the 11,2 support code chokes on single-chip configs, I
don't have one of them to test, both mines are dual chip (ie. quad
core).

But it does look like a regression that should be bisectable, so let me
know went you're done there and what you get.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Phileas Fogg

Benjamin Herrenschmidt wrote:

Thanks. It looks like a bisection might indeed be the way to go...

Out of curiosity, have you tried without some of your additional drivers ?
Maybe one of them is the culprit...

Cheers,
Ben.



Not yet, will do.
But I tested the official Debian Wheezy RC netinstall CD with Linux 3.2,
it has the same problem and hangs at boot on my machine.

regards

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Benjamin Herrenschmidt
On Sun, 2013-03-03 at 05:16 -0700, Phileas Fogg wrote:
> i'm testing currently with slightly modified g5_defconfig. I enabled
> the Ralink driver for my USB WLAN NIC and some other stuff, nothing 
> fancy. Anyways, i'm sending you config for 3.0.67. Linux 3.0.67 hangs on 
> my machine too.
> 
> I cloned your GIT powerpc repository and at the moment bisecting and
> trying to find the commit which causes trouble on my machine.
> 
> I have already found out that
> the commit 27d934b28752b860cba6c0d77ea4598861d80998 is OK
> but
> the commit 0bd41dfc9fbbcf174d5336c1c9fc5ba917519761 is NOT.
> The problem appeared between Linux 2.6.39 and Linux-3.0-rc1.
> 
> Just my typical luck with PowerPC :) First PS3 and now G5.
> 
> I will report back as soon as i find the commit which causes the hangs.

Thanks. It looks like a bisection might indeed be the way to go... 

Out of curiosity, have you tried without some of your additional drivers ?
Maybe one of them is the culprit...

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH V2] lglock: add read-preference local-global rwlock

2013-03-03 Thread Oleg Nesterov
On 03/02, Oleg Nesterov wrote:
>
> On 03/02, Lai Jiangshan wrote:
> >
> > +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> > +{
> > +   switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> > +   case 1:
> > +   __this_cpu_write(*lgrw->reader_refcnt, 0);
> > +   lg_local_unlock(&lgrw->lglock);
> > +   return;
> > +   case FALLBACK_BASE:
> > +   __this_cpu_write(*lgrw->reader_refcnt, 0);
> > +   read_unlock(&lgrw->fallback_rwlock);
> > +   rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
>
> I guess "case 1:" should do rwlock_release() too.
>
> Otherwise, at first glance looks correct...
>
> However, I still think that FALLBACK_BASE only adds the unnecessary
> complications. But even if I am right this is subjective of course, please
> feel free to ignore.

Yes, but...

> And btw, I am not sure about lg->lock_dep_map, perhaps we should use
> fallback_rwlock->dep_map ?
>
> We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> should be paired with rwlock_acquire() in _write_lock(), but it does
> spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> but perhaps fallback_rwlock->dep_map would be more clean.

Please ignore this part.

I missed that lg_rwlock_global_write_lock() relies on lg_global_lock(),
and I just noticed that it does rwlock_acquire(lg->lock_dep_map).

Hmm. But then I do not understand the lglock annotations. Obviously,
rwlock_acquire_read() in lg_local_lock() can't even detect the simplest
deadlock, say, lg_local_lock(LOCK) + lg_local_lock(LOCK). Not to mention
spin_lock(X) + lg_local_lock(Y) vs lg_local_lock(Y) + spin_lock(X).

OK, I understand that it is not easy to make these annotations correct...

Oleg.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Phileas Fogg

Benjamin Herrenschmidt wrote:

On Sat, 2013-03-02 at 15:40 +0100, Phileas Fogg wrote:

recently i got a PowerMac G5 and installed Debian Linux 2.6.32 on it.
Everything works so far and Debian boots properly.

Today i tried to boot Linux 3 on the machine and it doesn't boot.
The Linux 3 kernel was cross-compiled by me.

On Linux 3.8.1 it hangs after this line:
---
windfarm: Drive bay control loop started.

And then i'm getting RCU stall call traces.

On Linux 3.2 it hangs too but not at the same place.
It hangs after some SCSI message.

Have anyone tested Linux 3 kernels on PowerMac G5 recently ?


Hrm, this is odd. I do run pretty much every release on my G5's without
problems... Can you send me your .config and then try with a
g5_defconfig just to see if it makes a difference ?

There *might* have been a problem on those older machines vs. the 64T
address space patches, so maybe try back 3.6 and 3.7 and let me know,
I'm still trying to get the right fix in (I know it breaks PS/3 under
some circumstances).

Cheers,
Ben.





Reverted 64TB commit on Linux 3.8.1 and it didn't help, it still hangs.

Here is the RCU stall call trace:

BUG - soft lockup - CPU#1 stuck for 23s
Call trace:
padzero
load_elf_binary
search_binary_handler
load_script
search_binary_handler
do_execve_common
sys_execve
syscall_exit

Exception at kernel_execve
LR = run_init_process



regards


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Benjamin Herrenschmidt
On Sun, 2013-03-03 at 11:23 +0300, Denis Kirjanov wrote:
> I'm running 3.7.6 without any problems on my quad core machine. You
> have to try to compile with g5_defconfig...

Sure but I still want to know what in his config is breaking it, chances
are that's something that we should fix :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Linux kernel 3.x problems on PowerMac G5

2013-03-03 Thread Denis Kirjanov
I'm running 3.7.6 without any problems on my quad core machine. You
have to try to compile with g5_defconfig...

On 3/3/13, Benjamin Herrenschmidt  wrote:
> On Sat, 2013-03-02 at 17:22 +0100, Andreas Schwab wrote:
>> > Thanks. Then i configured something wrongly probably.
>> > I tried Linux 2.6.39.4 and it boots too.
>> > I looked around on the Internet and it seems i'm not the only one
>> > who has problems with Linux 3.x on my PowerMac G5.
>>
>> I have no problem with 3.8 either, but mine is a PowerMac7,3.
>
> Ok. I have a quad core G5 which is also a 11,2, it's even my primary
> workstation in the office :-) So it tends to work fine but I admit I
> haven't updated the kernel on it in a while.
>
> Phileas, send me the .config you're using, there might be an issue
> there, I'll do some tests next week.
>
> Cheers,
> Ben.
>
>
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>


-- 
Regards,
Denis
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev