date:20140327

[PATCH RESEND] gadgetfs: Initialize CHIP to NULL before UDC probe

2014-03-27 Thread Lubomir Rintel

Otherwise the value from the last probe is retained, even though it may have
been freed since (when the UDC was removed) and is therefore no longer valid.
Reproducible with the dummy UDC:

  modprobe dummy_hcd
  mount -t gadgetfs gadgetfs /dev/gadget
  umount /dev/gadget
  rmmod dummy_hcd
  mount -t gadgetfs gadgetfs /dev/gadget

BUG: unable to handle kernel paging request at a066fd9d
Call Trace:
 [] ? d_alloc_name+0x22/0x50
 [] ? selinux_d_instantiate+0x1c/0x20
 [] gadgetfs_create_file+0x27/0xa0 [gadgetfs]
 [] ? setup_req.isra.4+0x80/0x80 [gadgetfs]
 [] gadgetfs_fill_super+0x13c/0x180 [gadgetfs]
 [] mount_single+0x92/0xc0
 [] gadgetfs_mount+0x18/0x20 [gadgetfs]
 [] mount_fs+0x39/0x1b0
 [] ? __alloc_percpu+0x10/0x20
 [] vfs_kern_mount+0x63/0xf0
 [] do_mount+0x23e/0xac0
 [] ? strndup_user+0x4b/0xf0
 [] SyS_mount+0x83/0xc0
 [] system_call_fastpath+0x16/0x1b

Signed-off-by: Lubomir Rintel 
---
 drivers/usb/gadget/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
index b94c049..ee15628 100644
--- a/drivers/usb/gadget/inode.c
+++ b/drivers/usb/gadget/inode.c
@@ -2046,6 +2046,7 @@ gadgetfs_fill_super (struct super_block *sb, void *opts, int silent)
 		return -ESRCH;
 
 	/* fake probe to determine $CHIP */
+	CHIP = NULL;
 	usb_gadget_probe_driver(&probe_driver);
 	if (!CHIP)
 		return -ENODEV;
-- 
1.8.5.3
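
The failure mode described in the commit message can be reproduced in a small
userspace model. This is only a sketch: chip, udc_present, fake_probe() and
fill_super() are hypothetical stand-ins for the gadgetfs global and the mount
path, not the kernel code itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of the bug; none of these names are kernel APIs. */
static const char *chip;        /* plays the role of gadgetfs' global CHIP */
static int udc_present;         /* 1 while a UDC driver (e.g. dummy_hcd) is loaded */

/* Models the fake probe: it sets chip only if a UDC is found. */
static void fake_probe(void)
{
	if (udc_present)
		chip = "dummy_udc";
	/* With no UDC, chip is left untouched, so an old value survives. */
}

/* Models the mount path; reset_first selects the patched behaviour. */
static int fill_super(int reset_first)
{
	if (reset_first)
		chip = NULL;    /* the one-line fix from the patch */
	fake_probe();
	return chip ? 0 : -ENODEV;
}
```

Without the reset, a second mount after the UDC has gone away still sees the
old pointer and carries on; with the reset it fails cleanly with -ENODEV.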

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC II] Splitting scheduler into two halves

2014-03-27 Thread Ingo Molnar

* Mike Galbraith  wrote:

> On Thu, 2014-03-27 at 02:37 +0800, Yuyang du wrote: 
> > Hi all,
> > 
> > This is continued after the first RFC about splitting the scheduler. Still
> > work-in-progress, and call for feedback.
> > 
> > The question addressed here is how load balance should be changed. And I 
> > think
> > the question then goes to how to *reuse* common code as much as possible and
> > meanwhile be able to serve various objectives.
> > 
> > So these are the basic semantics needed in current load balance:
> 
> I'll probably regret it, but I'm gonna speak my mind.  I think this 
> two halves concept is fundamentally broken.

As PeterZ pointed out in the previous discussion, this approach, 
besides being fundamentally broken, also gives no valid technical 
rationale for the change.

Firstly, I'd like to stress that we are not against abstraction and 
interfaces within the scheduler (at all!) - we already have a 'split' 
and use interfaces between 'scheduler classes':

struct sched_class {
	const struct sched_class *next;

	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
	void (*yield_task) (struct rq *rq);
	bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);

	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int flags);

	/*
	 * It is the responsibility of the pick_next_task() method that will
	 * return the next task to call put_prev_task() on the @prev task or
	 * something equivalent.
	 *
	 * May return RETRY_TASK when it finds a higher prio class has runnable
	 * tasks.
	 */
	struct task_struct * (*pick_next_task) (struct rq *rq,
						struct task_struct *prev);
	void (*put_prev_task) (struct rq *rq, struct task_struct *p);

#ifdef CONFIG_SMP
	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
	void (*migrate_task_rq)(struct task_struct *p, int next_cpu);

	void (*post_schedule) (struct rq *this_rq);
	void (*task_waking) (struct task_struct *task);
	void (*task_woken) (struct rq *this_rq, struct task_struct *task);

	void (*set_cpus_allowed)(struct task_struct *p,
				 const struct cpumask *newmask);

	void (*rq_online)(struct rq *rq);
	void (*rq_offline)(struct rq *rq);
#endif

	void (*set_curr_task) (struct rq *rq);
	void (*task_tick) (struct rq *rq, struct task_struct *p, int queued);
	void (*task_fork) (struct task_struct *p);
	void (*task_dead) (struct task_struct *p);

	void (*switched_from) (struct rq *this_rq, struct task_struct *task);
	void (*switched_to) (struct rq *this_rq, struct task_struct *task);
	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
			     int oldprio);

	unsigned int (*get_rr_interval) (struct rq *rq,
					 struct task_struct *task);

#ifdef CONFIG_FAIR_GROUP_SCHED
	void (*task_move_group) (struct task_struct *p, int on_rq);
#endif
};

So where it makes sense we make use of this programming technique, to 
the extent it is helpful.
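
To make the technique concrete, here is a heavily reduced userspace sketch of
how such a table of function pointers, chained through 'next', lets the core
pick a task without knowing any policy details. All names here (toy_class,
pick_rt, pick_fair, pick_next) are made up for illustration and the real
pick_next_task() path is considerably more involved.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced model of the class chain; not kernel code. */
struct task { const char *name; };

struct toy_class {
	const struct toy_class *next;           /* next lower-priority class */
	struct task *(*pick_next_task)(void);   /* NULL result: nothing runnable */
};

static struct task rt_task   = { "rt" };
static struct task fair_task = { "fair" };
static int rt_runnable;                         /* toggled by the caller */

static struct task *pick_rt(void)   { return rt_runnable ? &rt_task : NULL; }
static struct task *pick_fair(void) { return &fair_task; }

static const struct toy_class fair_class = { NULL, pick_fair };
static const struct toy_class rt_class   = { &fair_class, pick_rt };

/* The core only walks the chain; it knows nothing about each policy. */
static struct task *pick_next(void)
{
	const struct toy_class *c;

	for (c = &rt_class; c; c = c->next) {
		struct task *t = c->pick_next_task();
		if (t)
			return t;
	}
	return NULL;
}
```

With rt_runnable clear, pick_next() falls through to the fair class; once it
is set, the higher-priority class wins without the core changing at all.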

But interfaces and abstraction have a cost, and the justification given 
in this submission looks very weak to me. There's no justification 
given in this specific submission; the closest I could find was in the 
first submission:

> > With the advent of more cores and heterogeneous architectures, the 
> > scheduler is required to be more complex (power efficiency) and 
> > diverse (big.little). For the scheduler to address that challenge 
> > as a whole, it is costly but not necessary. This proposal argues 
> > that the scheduler be split into two parts: top half (task 
> > scheduling) and bottom half (load balance). Let the bottom half 
> > take charge of the incoming requirements.

That is just way too generic with no specific technical benefits 
listed. No cost/benefit demonstrated.

If there's any advantage to a 'split', then it must be expressible via 
one or more of these positive attributes:

 - better numbers (better performance, etc.)
 - reduced code
 - new features

A split alone, without making active and convincing use of it, is 
inadequate.

So without a much better rationale, demonstrated via actual, real 
working code that not only does the split but also makes real use of 
every aspect of the proposed abstraction interfaces, which 
demonstrates that the proposed 'split' is the most sensible way 
forward, this specific submission earns a NAK from me.

Thanks,

Ingo

linux-next: Tree for Mar 27

2014-03-27 Thread Stephen Rothwell

Hi all,

This tree still fails (more than usual) the powerpc allyesconfig build.

Changes since 20140325:

The powerpc tree still had its build failure.

The staging tree gained a conflict against the net-next tree.

Non-merge commits (relative to Linus' tree): 11008
 9412 files changed, 451457 insertions(+), 225148 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 213 trees (counting Linus' and 28 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (f217c44ebd41 Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace)
Merging fixes/master (b0031f227e47 Merge tag 's2mps11-build' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator)
Merging kbuild-current/rc-fixes (38dbfb59d117 Linus 3.14-rc1)
Merging arc-current/for-curr (f8b34c3fd5a3 ARC: [clockevent] simplify timer ISR)
Merging arm-current/fixes (95c52fe06335 ARM: 8007/1: Remove extraneous kcmp syscall ignore)
Merging m68k-current/for-linus (7247f55381d5 m68k: Wire up sched_setattr and sched_getattr)
Merging metag-fixes/fixes (0414855fdc4a Linux 3.14-rc5)
Merging powerpc-merge/merge (ece980ff6de6 powerpc/book3s: Fix CFAR clobbering issue in machine check handler.)
Merging sparc/master (b098d6726bbf Linux 3.14-rc8)
Merging net/master (de1443916791 net: unix: non blocking recvmsg() should not return -EINTR)
Merging ipsec/master (3e3d35402140 ATHEROS-ATL1E: Convert iounmap to pci_iounmap)
Merging sound-current/for-linus (749d32237bf3 ALSA: compress: Pass through return value of open ops callback)
Merging pci-current/for-linus (707d4eefbdb3 Revert "[PATCH] Insert GART region into resource map")
Merging wireless/master (584221918925 Revert "rt2x00: rt2800lib: Update BBP register initialization for RT53xx")
Merging driver-core.current/driver-core-linus (0414855fdc4a Linux 3.14-rc5)
Merging tty.current/tty-linus (0414855fdc4a Linux 3.14-rc5)
Merging usb.current/usb-linus (fa389e220254 Linux 3.14-rc6)
Merging staging.current/staging-linus (dcb99fd9b08c Linux 3.14-rc7)
Merging char-misc.current/char-misc-linus (0414855fdc4a Linux 3.14-rc5)
Merging input-current/for-linus (70b0052425ff Input: da9052_onkey - use correct register bit for key status)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (ee97dc7db4cb crypto: s390 - fix des and des3_ede ctr concurrency issue)
Merging ide/master (5b40dd30bbfa ide: Fix SC1200 dependencies)
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (1f42e5dd5065 of: Add self test for of_match_node())
Merging rr-fixes/fixes (7122c3e9154b scripts/link-vmlinux.sh: only filter kernel symbols for arm)
Merging mfd-fixes/master (73beb63d290f mfd: rtsx_pcr: Disable interrupts before cancelling delayed works)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 'for-joerg/arm-smmu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-intel-fixes/for-linux-next-fixes (24bd9bf54d45 drm/i915: Fix PSR programming)
Merging asm-generic/master (fb9de7ebc3a2 xtensa: Use generic asm/mmu.h for nommu)
Merging arc/for-next (b098d6726bbf Linux 3.14-rc8)
Merging arm/for-next (54740e1ee0fe Merge branch 'devel-stable' into for-next)
Merging arm-kvm

Re: [PATCH -mm 1/4] sl[au]b: do not charge large allocations to memcg

2014-03-27 Thread Vladimir Davydov

Hi Michal,

On 03/27/2014 01:53 AM, Michal Hocko wrote:
> On Wed 26-03-14 19:28:04, Vladimir Davydov wrote:
>> We don't track any random page allocation, so we shouldn't track kmalloc
>> that falls back to the page allocator.
> Why did we do that in the first place? d79923fad95b (sl[au]b: allocate
> objects from memcg cache) didn't tell me much.

I don't know, we'd better ask Glauber about that.

> How is memcg_kmem_skip_account removal related?

The comment this patch removes along with the memcg_kmem_skip_account
check explains that pretty well IMO. In short, we only use
memcg_kmem_skip_account to prevent kmallocs from charging, which is
crucial for avoiding recursion in memcg_kmem_get_cache. Since we don't
charge pages allocated from a root (not per-memcg) cache, at first
glance it would be enough to check for memcg_kmem_skip_account only in
memcg_kmem_get_cache and return the root cache if it's set. However,
since we can also kmalloc without calling memcg_kmem_get_cache
(kmalloc_large), we also need this check in memcg_kmem_newpage_charge.
This patch removes kmalloc_large accounting, so we don't need this check
anymore.

Thanks.

Re: [PATCH -mm 1/4] sl[au]b: do not charge large allocations to memcg

2014-03-27 Thread Vladimir Davydov

Hi Greg,

On 03/27/2014 08:31 AM, Greg Thelen wrote:
> On Wed, Mar 26 2014, Vladimir Davydov  wrote:
>
>> We don't track any random page allocation, so we shouldn't track kmalloc
>> that falls back to the page allocator.
> This seems like a change which will lead to confusing (and arguably
> improper) kernel behavior.  I prefer the behavior prior to this patch.
>
> Before this change both of the following allocations are charged to
> memcg (assuming kmem accounting is enabled):
>  a = kmalloc(KMALLOC_MAX_CACHE_SIZE, GFP_KERNEL)
>  b = kmalloc(KMALLOC_MAX_CACHE_SIZE + 1, GFP_KERNEL)
>
> After this change only 'a' is charged; 'b' goes directly to page
> allocator which no longer does accounting.

Why do we need to charge 'b' in the first place? Can the userspace
trigger such allocations massively? If there can only be one or two such
allocations from a cgroup, is there any point in charging them?
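
For reference, the size boundary Greg describes can be written down as a tiny
model. This is only a sketch, assuming kmem accounting is enabled:
toy_is_charged() and TOY_KMALLOC_MAX_CACHE_SIZE are hypothetical, and the
value 8192 is illustrative since the real KMALLOC_MAX_CACHE_SIZE depends on
the allocator and the page size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; the real KMALLOC_MAX_CACHE_SIZE depends on the
 * allocator and page size. */
#define TOY_KMALLOC_MAX_CACHE_SIZE 8192u

/* Returns true if an allocation of this size would be charged to the memcg
 * in this toy model. after_patch selects the behaviour proposed in the patch. */
static bool toy_is_charged(size_t size, bool after_patch)
{
	if (size <= TOY_KMALLOC_MAX_CACHE_SIZE)
		return true;            /* served from a per-memcg kmalloc cache */
	return !after_patch;            /* kmalloc_large: straight to the page allocator */
}
```

So 'a' is charged either way, while 'b' is charged only before the patch.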

In fact, do we actually need to charge every random kmem allocation? I
guess not. For instance, filesystems often allocate data shared among
all the FS users. It's wrong to charge such allocations to a particular
memcg, IMO. That said, the next step is going to be adding a per-kmem-cache
flag specifying whether allocations from this cache should be charged,
so that accounting will work only for those caches that are explicitly
marked.

There is one more argument for removing kmalloc_large accounting - we
don't have an easy way to track such allocations, which prevents us from
reparenting kmemcg charges on css offline. Of course, we could link
kmalloc_large pages in some sort of per-memcg list which would allow us
to find them on css offline, but I don't think such a complication is
justified.

Thanks.

Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

2014-03-27 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
>> Hi List!
>> Hope some one can help me, we had a big issue in our cloud the other
>> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
>> went read only filesystem from the guest side because the backing
>> files directory (the openstack _base directory) was compromised and
>> the data was lost, when we realized the data was lost, it took us 5
>> mins to restore the backup of the backing files, but by that time all
>> the kvm guests received some kind of IO error from the hypervisor
>> layer, and went read only on root filesystem.
>> 
>> My question would be, is there a way to hold the IO operations against
>> the backing files ( i thought that would be 99% READ operations ) for
>> a little longer ( im asking this because i dont quite understand what
>> is the process and when it raises the error ) in a case the backing
>> files are missing (no IO possible) but is recoverable within minutes ?
>> 
>> Any tip  on how to achieve this if possible, or information about how
>> backing files works on kvm, will be amazing.
>> Waiting for feedback!
>> 
>> kindest regards.
>> Alejandro Comisario
>
>
> I'm guessing this is what happened: guests timed out meanwhile.
> You can increase the timeout within the guest:
> echo 600 > /sys/block/sda/device/timeout
> to timeout after 10 minutes.
>
> If you have installed qemu guest agent on your system, you can do this
> from the host. Unfortunately, by default its memory can be pushed out to swap,
> and then, on a disk error, accessing it might fail :(
> Maybe we should consider mlock on all its memory at least as an option.
>
> You could pause your guests, restart them after the issue is resolved,
> and we could I guess add functionality to pause VM on disk errors
> automatically.
> Stefan?

Would -drive rerror=stop do?

Re: [PATCH -mm 2/4] sl[au]b: charge slabs to memcg explicitly

2014-03-27 Thread Vladimir Davydov

On 03/27/2014 01:58 AM, Michal Hocko wrote:
> On Wed 26-03-14 19:28:05, Vladimir Davydov wrote:
>> We have only a few places where we actually want to charge kmem so
>> instead of intruding into the general page allocation path with
>> __GFP_KMEMCG it's better to explicitly charge kmem there. All kmem
>> charges will be easier to follow that way.
>>
>> This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG
>> from memcg caches' allocflags. Instead it makes slab allocation path
>> call memcg_charge_kmem directly getting memcg to charge from the cache's
>> memcg params.
> Yes, removing __GFP_KMEMCG is definitely a good step. I am currently at
> a conference and do not have much time to review this properly (even
> worse will be on vacation for the next 2 weeks) but where did all the
> static_key optimization go? What am I missing.

I expected this question, because I want somebody to confirm whether we
really need this kind of optimization in the slab allocation path. From
my POV, since we thrash cpu caches there anyway by calling alloc_pages,
wrapping memcg_charge_slab in a static branch wouldn't result in any
noticeable performance boost.

I do admit we benefit from static branching in memcg_kmem_get_cache,
because this one is called on every kmem object allocation, but slab
allocations happen much more rarely.

I don't insist on that though, so if you say "no", I'll just add
__memcg_charge_slab and make memcg_charge_slab call it if the static key
is on, but maybe we can avoid such code bloat?
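
For clarity, the wrapper arrangement being discussed would look roughly like
the sketch below. It is a userspace approximation only: a plain bool stands
in for the static key, the page size is assumed to be 4096, and the toy_*
names are hypothetical rather than the proposed kernel functions.

```c
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096ul            /* assumed page size */

static bool toy_kmem_key;               /* plain flag standing in for the static key */
static unsigned long toy_charged;       /* bytes charged so far */

/* Out-of-line slow path: does the actual charging. */
static int toy_charge_slab_slow(int order)
{
	toy_charged += TOY_PAGE_SIZE << order;
	return 0;
}

/* Inline fast path: skips the call entirely while accounting is off. */
static inline int toy_charge_slab(int order)
{
	if (!toy_kmem_key)
		return 0;
	return toy_charge_slab_slow(order);
}
```

The cost is one extra function per charge/uncharge pair, which is the bloat
in question; the gain is that the disabled case never leaves the caller.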

Thanks.

>> Signed-off-by: Vladimir Davydov 
>> Cc: Johannes Weiner 
>> Cc: Michal Hocko 
>> Cc: Glauber Costa 
>> Cc: Christoph Lameter 
>> Cc: Pekka Enberg 
>> ---
>>  include/linux/memcontrol.h |   24 +---
>>  mm/memcontrol.c|   15 +++
>>  mm/slab.c  |7 ++-
>>  mm/slab_common.c   |6 +-
>>  mm/slub.c  |   24 +---
>>  5 files changed, 52 insertions(+), 24 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index e9dfcdad24c5..b8aaecc25cbf 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -512,6 +512,9 @@ void memcg_update_array_size(int num_groups);
>>  struct kmem_cache *
>>  __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
>>  
>> +int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order);
>> +void memcg_uncharge_slab(struct kmem_cache *s, int order);
>> +
>>  void mem_cgroup_destroy_cache(struct kmem_cache *cachep);
>>  int __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
>>  
>> @@ -589,17 +592,7 @@ memcg_kmem_commit_charge(struct page *page, struct 
>> mem_cgroup *memcg, int order)
>>   * @cachep: the original global kmem cache
>>   * @gfp: allocation flags.
>>   *
>> - * This function assumes that the task allocating, which determines the 
>> memcg
>> - * in the page allocator, belongs to the same cgroup throughout the whole
>> - * process.  Misacounting can happen if the task calls 
>> memcg_kmem_get_cache()
>> - * while belonging to a cgroup, and later on changes. This is considered
>> - * acceptable, and should only happen upon task migration.
>> - *
>> - * Before the cache is created by the memcg core, there is also a possible
>> - * imbalance: the task belongs to a memcg, but the cache being allocated 
>> from
>> - * is the global cache, since the child cache is not yet guaranteed to be
>> - * ready. This case is also fine, since in this case the GFP_KMEMCG will 
>> not be
>> - * passed and the page allocator will not attempt any cgroup accounting.
>> + * All memory allocated from a per-memcg cache is charged to the owner 
>> memcg.
>>   */
>>  static __always_inline struct kmem_cache *
>>  memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
>> @@ -667,6 +660,15 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t 
>> gfp)
>>  {
>>  return cachep;
>>  }
>> +
>> +static inline int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int 
>> order)
>> +{
>> +return 0;
>> +}
>> +
>> +static inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
>> +{
>> +}
>>  #endif /* CONFIG_MEMCG_KMEM */
>>  #endif /* _LINUX_MEMCONTROL_H */
>>  
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 81a162d01d4d..9bbc088e3107 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3506,6 +3506,21 @@ out:
>>  }
>>  EXPORT_SYMBOL(__memcg_kmem_get_cache);
>>  
>> +int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order)
>> +{
>> +if (is_root_cache(s))
>> +return 0;
>> +return memcg_charge_kmem(s->memcg_params->memcg, gfp,
>> + PAGE_SIZE << order);
>> +}
>> +
>> +void memcg_uncharge_slab(struct kmem_cache *s, int order)
>> +{
>> +if (is_root_cache(s))
>> +return;
>> +memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order);
>> +}
>> +
>>  /*
>>   * We need to verify if the al

Re: [PATCH -mm 3/4] fork: charge threadinfo to memcg explicitly

2014-03-27 Thread Vladimir Davydov

On 03/27/2014 02:00 AM, Michal Hocko wrote:
> On Wed 26-03-14 19:28:06, Vladimir Davydov wrote:
>> We have only a few places where we actually want to charge kmem so
>> instead of intruding into the general page allocation path with
>> __GFP_KMEMCG it's better to explicitly charge kmem there. All kmem
>> charges will be easier to follow that way.
>>
>> This is a step toward removing __GFP_KMEMCG. It makes fork charge task
>> threadinfo pages explicitly instead of passing __GFP_KMEMCG to
>> alloc_pages.
> Looks good from a quick glance. I would also remove
> THREADINFO_GFP_ACCOUNTED in this patch.

To do so, I'd have to remove the __GFP_KMEMCG check from
memcg_kmem_newpage_charge, which IMO is better done in the next patch,
which removes __GFP_KMEMCG everywhere.
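
The explicit charging in the quoted patch follows a charge, allocate, commit
sequence. A rough userspace sketch of that pattern, assuming the commit step
rolls the charge back when the allocation failed (all toy_* names and the
byte limit are hypothetical):

```c
#include <stdlib.h>

static long toy_usage, toy_limit = 16384;   /* hypothetical per-group counters, in bytes */

/* Step 1: reserve the charge up front; fail if the group is over its limit. */
static int toy_charge(long bytes)
{
	if (toy_usage + bytes > toy_limit)
		return -1;
	toy_usage += bytes;
	return 0;
}

/* Step 3: keep the charge if the allocation succeeded, undo it otherwise. */
static void toy_commit(void *mem, long bytes)
{
	if (!mem)
		toy_usage -= bytes;
}

static void *toy_alloc_thread_info(long bytes, int simulate_oom)
{
	void *mem;

	if (toy_charge(bytes))
		return NULL;
	mem = simulate_oom ? NULL : malloc((size_t)bytes);  /* Step 2: plain allocation */
	toy_commit(mem, bytes);
	return mem;
}
```

The allocator itself never sees an accounting flag here, which is the point
of moving away from __GFP_KMEMCG.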

Thanks.

>> Signed-off-by: Vladimir Davydov 
>> Cc: Johannes Weiner 
>> Cc: Michal Hocko 
>> Cc: Glauber Costa 
>> ---
>>  kernel/fork.c |   13 ++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f4b09bc15f3a..8209780cf732 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -150,15 +150,22 @@ void __weak arch_release_thread_info(struct 
>> thread_info *ti)
>>  static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
>>int node)
>>  {
>> -struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED,
>> - THREAD_SIZE_ORDER);
>> +struct page *page;
>> +struct mem_cgroup *memcg = NULL;
>>  
>> +if (!memcg_kmem_newpage_charge(THREADINFO_GFP_ACCOUNTED, &memcg,
>> +   THREAD_SIZE_ORDER))
>> +return NULL;
>> +page = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER);
>> +memcg_kmem_commit_charge(page, memcg, THREAD_SIZE_ORDER);
>>  return page ? page_address(page) : NULL;
>>  }
>>  
>>  static inline void free_thread_info(struct thread_info *ti)
>>  {
>> -free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
>> +if (ti)
>> +memcg_kmem_uncharge_pages(virt_to_page(ti), THREAD_SIZE_ORDER);
>> +free_pages((unsigned long)ti, THREAD_SIZE_ORDER);
>>  }
>>  # else
>>  static struct kmem_cache *thread_info_cache;
>> -- 
>> 1.7.10.4
>>


Re: [PATCH] MIPS: OCTEON: fix EARLY_PRINTK_8250 build failure

2014-03-27 Thread Ralf Baechle

On Tue, Mar 25, 2014 at 09:58:45PM +0200, Aaro Koskinen wrote:

> Enabling EARLY_PRINTK_8250 breaks OCTEON builds because of multiple
> prom_putchar() implementations. OCTEON provides its own prom_putchar()
> (also used by the watchdog driver), so we should prevent user from
> selecting EARLY_PRINTK_8250 on OCTEON.

There shouldn't be Octeon-specific code in the 8250 driver, not even
the Kconfig bits.  Also, other systems (at least Sibyte) are affected
by the same issue, so I went for a different patch.

  Ralf

[PATCH] x86: hpet: Don't default CONFIG_HPET_TIMER to be y for X86_64

2014-03-27 Thread Feng Tang

On many new phone/tablet platforms such as Baytrail and Merrifield,
the HPET is either defeatured or too problematic to be used
as a reliable timer. As these platforms are also X86_64, we
should not make HPET_TIMER default to y for all X86_64.

Signed-off-by: Feng Tang 
---
 arch/x86/Kconfig |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..269bd47 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -695,8 +695,7 @@ config X86_CYCLONE_TIMER
 source "arch/x86/Kconfig.cpu"
 
 config HPET_TIMER
-	def_bool X86_64
-	prompt "HPET Timer Support" if X86_32
+	prompt "HPET Timer Support"
 	---help---
 	  Use the IA-PC HPET (High Precision Event Timer) to manage
 	  time in preference to the PIT and RTC, if a HPET is
-- 
1.7.0.4


Re: [PATCH] clk: register fixed-clock only if #clock-cells property is present

2014-03-27 Thread Boris BREZILLON


On 26/03/2014 21:14, Fabio Estevam wrote:

On Wed, Mar 26, 2014 at 4:57 PM, Sylwester Nawrocki
 wrote:


Perhaps a change as below helps ?

 From 85ee85e4a92b42442354f3f2454be50c173e1c59 Mon Sep 17 00:00:00 2001
From: Sylwester Nawrocki 
Date: Wed, 26 Mar 2014 20:54:13 +0100
Subject: [PATCH] clk: reverse default clk provider initialization order in
of_clk_init()


Signed-off-by: Sylwester Nawrocki 
---
  drivers/clk/clk.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index fb3c40b..d30809c 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2608,7 +2608,7 @@ void __init of_clk_init(const struct of_device_id
*matches)

 parent->clk_init_cb = match->data;
 parent->np = np;
-   list_add(&parent->node, &clk_provider_list);
+   list_add_tail(&parent->node, &clk_provider_list);
 }

 while (!list_empty(&clk_provider_list)) {

Thanks, Sylwester!

This makes my imx6q-wandboard to boot again.

Tested-by: Fabio Estevam 


This solution solves the problem in this specific case because the clocks are
declared in the correct order in the imx DTs.
But even with your patch, I think we could see similar issues after reordering
the DT nodes...
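
To illustrate why the fix is order-dependent: list_add() inserts right after
the list head, so providers come back out in reverse order of discovery,
while list_add_tail() preserves discovery order. A minimal userspace sketch,
modelled on the kernel's list.h but with hypothetical toy_* names:

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list.h. */
struct node { struct node *prev, *next; int id; };

static void toy_list_init(struct node *head) { head->prev = head->next = head; }

static void toy_insert(struct node *n, struct node *prev, struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): insert right after the head (stack order). */
static void toy_list_add(struct node *n, struct node *head)
{
	toy_insert(n, head, head->next);
}

/* Like list_add_tail(): insert right before the head (queue order). */
static void toy_list_add_tail(struct node *n, struct node *head)
{
	toy_insert(n, head->prev, head);
}
```

Either way, the resulting init order is just a function of node order in the
DT, so it only helps as long as the DT happens to list parents first.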

The real problem here is that the imx platform does not declare the CCM clocks'
dependencies on the ckil, ckih1 and osc fixed clocks within the DT [1], yet it
retrieves these clocks when initializing the CCM clocks ([2] and [3]).

We should try to add these dependencies in the DT and see if it works.

[1] http://lxr.free-electrons.com/source/arch/arm/boot/dts/imx6sl.dtsi#L379
[2] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk-imx6q.c#L151
[3] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk.c#L30

Best Regards,

Boris

[PATCH 1/3] ARM : kdump : Add LPAE support

2014-03-27 Thread Liu Hua

With CONFIG_ARM_LPAE=y, memory in 32-bit ARM systems can exceed
4G. So if we use kdump on such systems, the capture kernel
must parse a 64-bit ELF header (parse_crash_elf64_headers).

This currently fails because ARM Linux does not supply the
related check function.

This patch adds a check function for ELF64 headers.

Signed-off-by: Liu Hua 
To: Russell King 
Cc: Dan Aloni 
Cc: Catalin Marinas 
Cc: 
Cc: 
Cc: 
---
 arch/arm/include/asm/elf.h |  5 -
 arch/arm/kernel/elf.c  | 33 +
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index f4b46d3..6e02a6d 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -90,14 +90,17 @@ typedef struct user_fp elf_fpregset_t;
 extern char elf_platform[];
 
 struct elf32_hdr;
+struct elf64_hdr;
 
 /*
  * This is used to ensure we don't load something for the wrong architecture.
  */
 extern int elf_check_arch(const struct elf32_hdr *);
+extern int elf_check_arch_64(const struct elf64_hdr *);
 #define elf_check_arch elf_check_arch
 
-#define vmcore_elf64_check_arch(x) (0)
+#define vmcore_elf64_check_arch(x) (elf_check_arch_64(x) || \
+   vmcore_elf_check_arch_cross(x))
 
 extern int arm_elf_read_implies_exec(const struct elf32_hdr *, int);
 #define elf_read_implies_exec(ex,stk) arm_elf_read_implies_exec(&(ex), stk)
diff --git a/arch/arm/kernel/elf.c b/arch/arm/kernel/elf.c
index d0d1e83..452086a 100644
--- a/arch/arm/kernel/elf.c
+++ b/arch/arm/kernel/elf.c
@@ -38,6 +38,39 @@ int elf_check_arch(const struct elf32_hdr *x)
 }
 EXPORT_SYMBOL(elf_check_arch);
 
+int elf_check_arch_64(const struct elf64_hdr *x)
+{
+   unsigned int eflags;
+
+   /* Make sure it's an ARM executable */
+   if (x->e_machine != EM_ARM)
+   return 0;
+
+   /* Make sure the entry address is reasonable */
+   if (x->e_entry & 1) {
+   if (!(elf_hwcap & HWCAP_THUMB))
+   return 0;
+   } else if (x->e_entry & 3)
+   return 0;
+
+   eflags = x->e_flags;
+   if ((eflags & EF_ARM_EABI_MASK) == EF_ARM_EABI_UNKNOWN) {
+   unsigned int flt_fmt;
+
+   /* APCS26 is only allowed if the CPU supports it */
+   if ((eflags & EF_ARM_APCS_26) && !(elf_hwcap & HWCAP_26BIT))
+   return 0;
+
+   flt_fmt = eflags & (EF_ARM_VFP_FLOAT | EF_ARM_SOFT_FLOAT);
+
+   /* VFP requires the supporting code */
+   if (flt_fmt == EF_ARM_VFP_FLOAT && !(elf_hwcap & HWCAP_VFP))
+   return 0;
+   }
+   return 1;
+}
+EXPORT_SYMBOL(elf_check_arch_64);
+
 void elf_set_personality(const struct elf32_hdr *x)
 {
unsigned int eflags = x->e_flags;
-- 
1.9.0
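
The machine and entry-address checks in elf_check_arch_64() above can be
tried out in userspace with a reduced header. This is a sketch only:
toy_elf64_hdr carries just the two fields used here, has_thumb stands in for
the (elf_hwcap & HWCAP_THUMB) test, and the EABI/float-format checks from the
patch are left out.

```c
#include <stdint.h>

#define TOY_EM_ARM 40                    /* ELF e_machine value for ARM */

/* Only the two header fields this sketch looks at. */
struct toy_elf64_hdr {
	uint16_t e_machine;
	uint64_t e_entry;
};

/* Returns 1 if the header looks loadable, 0 otherwise. has_thumb stands in
 * for the (elf_hwcap & HWCAP_THUMB) test in the patch. */
static int toy_check_arch_64(const struct toy_elf64_hdr *x, int has_thumb)
{
	if (x->e_machine != TOY_EM_ARM)
		return 0;
	if (x->e_entry & 1) {            /* odd address: Thumb entry point */
		if (!has_thumb)
			return 0;
	} else if (x->e_entry & 3) {     /* ARM entry must be 4-byte aligned */
		return 0;
	}
	return 1;
}
```

An odd entry address is accepted only with Thumb support, and an even entry
address must be a multiple of four.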


[PATCH 2/3] ARM : kdump : add arch_crash_save_vmcoreinfo

2014-03-27 Thread Liu Hua

For a vmcore generated by an LPAE-enabled kernel, user space
utilities such as crash need additional information to
parse it.

So this patch adds arch_crash_save_vmcoreinfo, as PAE-enabled
i386 Linux does.

Signed-off-by: Liu Hua 
To: Russell King 
Cc: Stephen Warren  
Cc: Will Deacon 
Cc: Vijaya Kumar K 
Cc: 
Cc: 
Cc: 
---
 arch/arm/kernel/machine_kexec.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f0d180d..8cf0996 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -184,3 +184,10 @@ void machine_kexec(struct kimage *image)
 
soft_restart(reboot_entry_phys);
 }
+
+void arch_crash_save_vmcoreinfo(void)
+{
+#ifdef CONFIG_ARM_LPAE
+   VMCOREINFO_CONFIG(ARM_LPAE);
+#endif
+}
-- 
1.9.0
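
As a rough user-space illustration of what the hook contributes: the sketch below assumes that VMCOREINFO_CONFIG(name) appends a "CONFIG_<name>=y" line to the vmcoreinfo note, which is my understanding of the kernel macro. The buffer type and the append_config() and has_config() helpers are hypothetical stand-ins, not kernel or crash code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's vmcoreinfo buffer. */
struct vmcoreinfo_buf {
	char data[256];
	size_t len;
};

/* Append "CONFIG_<name>=y\n"; returns 0 on success, -1 if it does not fit. */
static int append_config(struct vmcoreinfo_buf *buf, const char *name)
{
	size_t room;
	int n;

	assert(buf != NULL && name != NULL);
	room = sizeof(buf->data) - buf->len;
	n = snprintf(buf->data + buf->len, room, "CONFIG_%s=y\n", name);
	if (n < 0 || (size_t)n >= room) {
		buf->data[buf->len] = '\0';	/* undo a partial write */
		return -1;
	}
	buf->len += (size_t)n;
	return 0;
}

/* A dump parser could then look for the option line like this. */
static int has_config(const struct vmcoreinfo_buf *buf, const char *line)
{
	return strstr(buf->data, line) != NULL;
}
```

A tool reading the dump would call has_config() with "CONFIG_ARM_LPAE=y" to decide whether to use the LPAE page-table layout.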


[PATCH 3/3] kexec : ARM : add LPAE support

2014-03-27 Thread Liu Hua

For 32-bit ARM systems with CONFIG_ARM_LPAE=y, a 32-bit ELF header
is not enough when the kexec utility loads the crash kernel and a
physical address exceeds 4G.

This patch checks whether the largest physical address of the system
exceeds 4G. If so, kexec creates a 64-bit ELF header. Otherwise it
creates a 32-bit ELF header.

Signed-off-by: Liu Hua 
To: Simon Horman 
Cc: Vivek Goyal 
Cc: 
Cc: 
Cc: 
---
 kexec/arch/arm/crashdump-arm.c | 23 ---
 kexec/kexec-iomem.c|  8 
 kexec/kexec.h  |  4 ++--
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/kexec/arch/arm/crashdump-arm.c b/kexec/arch/arm/crashdump-arm.c
index 0cd6935..d1133cd 100644
--- a/kexec/arch/arm/crashdump-arm.c
+++ b/kexec/arch/arm/crashdump-arm.c
@@ -20,6 +20,7 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
+#include 
 #include 
 #include 
 #include 
@@ -75,8 +76,8 @@ unsigned long phys_offset;
  * regions is placed in @crash_memory_nr_ranges.
  */
 static int crash_range_callback(void *UNUSED(data), int UNUSED(nr),
-   char *str, unsigned long base,
-   unsigned long length)
+   char *str, unsigned long long base,
+   unsigned long long length)
 {
struct memory_range *range;
 
@@ -276,6 +277,7 @@ int load_crashdump_segments(struct kexec_info *info, char 
*mod_cmdline)
unsigned long bufsz;
void *buf;
int err;
+   int last_ranges;
 
/*
 * First fetch all the memory (RAM) ranges that we are going to pass to
@@ -292,10 +294,25 @@ int load_crashdump_segments(struct kexec_info *info, char 
*mod_cmdline)
phys_offset = usablemem_rgns.ranges->start;
dbgprintf("phys_offset: %#lx\n", phys_offset);
 
-   err = crash_create_elf32_headers(info, &elf_info,
+   last_ranges = usablemem_rgns.size - 1;
+   if (last_ranges < 0)
+   last_ranges = 0;
+
+   if (crash_memory_ranges[last_ranges].end > ULONG_MAX) {
+
+   /* for support arm LPAE and arm64 */
+   elf_info.class = ELFCLASS64;
+
+   err = crash_create_elf64_headers(info, &elf_info,
 usablemem_rgns.ranges,
 usablemem_rgns.size, &buf, &bufsz,
 ELF_CORE_HEADER_ALIGN);
+   } else {
+   err = crash_create_elf32_headers(info, &elf_info,
+usablemem_rgns.ranges,
+usablemem_rgns.size, &buf, &bufsz,
+ELF_CORE_HEADER_ALIGN);
+   }
if (err)
return err;
 
diff --git a/kexec/kexec-iomem.c b/kexec/kexec-iomem.c
index 0396713..485a2e8 100644
--- a/kexec/kexec-iomem.c
+++ b/kexec/kexec-iomem.c
@@ -26,8 +26,8 @@ int kexec_iomem_for_each_line(char *match,
  int (*callback)(void *data,
  int nr,
  char *str,
- unsigned long base,
- unsigned long length),
+ unsigned long long base,
+ unsigned long long length),
  void *data)
 {
const char *iomem = proc_iomem();
@@ -65,8 +65,8 @@ int kexec_iomem_for_each_line(char *match,
 
 static int kexec_iomem_single_callback(void *data, int nr,
   char *UNUSED(str),
-  unsigned long base,
-  unsigned long length)
+  unsigned long long base,
+  unsigned long long length)
 {
struct memory_range *range = data;
 
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 2bd6e96..ecc4681 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -279,8 +279,8 @@ int kexec_iomem_for_each_line(char *match,
  int (*callback)(void *data,
  int nr,
  char *str,
- unsigned long base,
- unsigned long length),
+ unsigned long long base,
+ unsigned long long length),
  void *data);
 int parse_iomem_single(char *str, uint64_t *start, uint64_t *end);
 const char * proc_iomem(void);
-- 
1.9.0
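
The header-class decision described in the commit message can be sketched in isolation as follows. This is a simplified model, not the kexec-tools code: struct mem_range and pick_elf_class() are hypothetical names, UINT32_MAX stands in for ULONG_MAX on a 32-bit ARM host, and the sketch scans every range instead of looking only at the last one as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values match ELFCLASS32 and ELFCLASS64 from <elf.h>. */
enum elf_class { ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/* Simplified memory range with 64-bit addresses (hypothetical type). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Pick the ELF class for the core header: 64-bit if any range ends
 * above the 32-bit limit, 32-bit otherwise.
 */
static enum elf_class pick_elf_class(const struct mem_range *ranges, size_t n)
{
	size_t i;

	assert(ranges != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ranges[i].end > UINT32_MAX)
			return ELF_CLASS_64;
	}
	return ELF_CLASS_32;
}
```

Scanning all ranges avoids relying on the ranges being sorted, at the cost of a loop the patch does not need if the last range is known to be the highest.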


Re: [PATCH v2 0/2] mmc: rtsx: add new cmd type handle and modify error handle

2014-03-27 Thread Ulf Hansson

On 27 March 2014 06:35,   wrote:
> From: Micky Ching 
>
> v2:
> fix checkpatch warning.
> WARNING: Missing a blank line after declarations
>
> v1:
> Add handling for a new command type (R1 without CRC); without this
> patch, mmc card initialization will fail.
>
> Use more careful handling of request timeouts, which
> improves error recovery. Debug info is printed
> using non-DMA mode, which helps print more accurate information
> when a DMA command fails. A Smatch warning was removed.
>
> Micky Ching (2):
>   mmc: rtsx: add R1-no-CRC mmc command type handle
>   mmc: rtsx: modify error handle and remove smatch warnings
>
>  drivers/mmc/host/rtsx_pci_sdmmc.c |  122 
> +
>  1 file changed, 68 insertions(+), 54 deletions(-)

Acked-by: Ulf Hansson 

>
> --
> 1.7.9.5

Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

2014-03-27 Thread Michael S. Tsirkin

On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> Hope some one can help me, we had a big issue in our cloud the other
> >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> >> went read only filesystem from the guest side because the backing
> >> files directory (the openstack _base directory) was compromised and
> >> the data was lost, when we realized the data was lost, it took us 5
> >> mins to restore the backup of the backing files, but by that time all
> >> the kvm guests received some kind of IO error from the hypervisor
> >> layer, and went read only on root filesystem.
> >> 
> >> My question would be, is there a way to hold the IO operations against
> >> the backing files ( i thought that would be 99% READ operations ) for
> >> a little longer ( im asking this because i dont quite understand what
> >> is the process and when it raises the error ) in a case the backing
> >> files are missing (no IO possible) but is recoverable within minutes ?
> >> 
> >> Any tip  on how to achieve this if possible, or information about how
> >> backing files works on kvm, will be amazing.
> >> Waiting for feedback!
> >> 
> >> kindest regards.
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: guests timed out meanwhile.
> > You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to timeout after 10 minutes.
> >
> > If you have installed qemu guest agent on your system, you can do this
> > from the host. Unfortunately by default its memory can be pushed out to
> > swap, and then on a disk error, access there might fail :(
> > Maybe we should consider mlock on all its memory at least as an option.
> >
> > You could pause your guests, restart them after the issue is resolved,
> > and we could I guess add functionality to pause VM on disk errors
> > automatically.
> > Stefan?
> 
> Would -drive rerror=stop do?

I think it will. It's a pity it doesn't appear in --help output -
would make it easier to find.

-- 
MST

[PATCH] ARM: imx6/dt: add ccm dependency upon ckil, ckih1 and osc clocks

2014-03-27 Thread Boris BREZILLON

Signed-off-by: Boris BREZILLON 
---
 arch/arm/boot/dts/imx6qdl.dtsi |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi
index cfc85be..060e94c 100644
--- a/arch/arm/boot/dts/imx6qdl.dtsi
+++ b/arch/arm/boot/dts/imx6qdl.dtsi
@@ -62,16 +62,19 @@
 
ckil {
compatible = "fsl,imx-ckil", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <32768>;
};
 
ckih1 {
compatible = "fsl,imx-ckih1", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <0>;
};
 
osc {
compatible = "fsl,imx-osc", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <2400>;
};
};
@@ -489,6 +492,7 @@
interrupts = <0 87 IRQ_TYPE_LEVEL_HIGH>,
 <0 88 IRQ_TYPE_LEVEL_HIGH>;
#clock-cells = <1>;
+   clocks = <&ckil &ckih1 &osc>;
};
 
anatop: anatop@020c8000 {
-- 
1.7.9.5


[PATCH] timekeeping: check params before use them

2014-03-27 Thread Neil Zhang

Sometimes we don't need all the information from
get_xtime_and_monotonic_and_sleep_offset(),
so check the params before assigning values to them.

Signed-off-by: Neil Zhang 
---
 kernel/time/timekeeping.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0aa4ce8..f0e8f53 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1598,9 +1598,12 @@ void get_xtime_and_monotonic_and_sleep_offset(struct 
timespec *xtim,
 
do {
seq = read_seqcount_begin(&timekeeper_seq);
-   *xtim = tk_xtime(tk);
-   *wtom = tk->wall_to_monotonic;
-   *sleep = tk->total_sleep_time;
+   if (xtim)
+   *xtim = tk_xtime(tk);
+   if (wtom)
+   *wtom = tk->wall_to_monotonic;
+   if (sleep)
+   *sleep = tk->total_sleep_time;
} while (read_seqcount_retry(&timekeeper_seq, seq));
 }
 
-- 
1.7.9.5
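
The optional out-parameter pattern the patch introduces can be sketched outside the kernel like this. The types and the get_offsets() name are hypothetical simplifications, and the seqcount retry loop of the real function is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct ts {			/* simplified timespec */
	long tv_sec;
	long tv_nsec;
};

struct clock_state {		/* hypothetical stand-in for the timekeeper */
	struct ts xtime;
	struct ts wall_to_monotonic;
	struct ts total_sleep_time;
};

/* Copy out only the values the caller asked for; NULL means "not needed". */
static void get_offsets(const struct clock_state *tk,
			struct ts *xtim, struct ts *wtom, struct ts *sleep)
{
	assert(tk != NULL);

	if (xtim)
		*xtim = tk->xtime;
	if (wtom)
		*wtom = tk->wall_to_monotonic;
	if (sleep)
		*sleep = tk->total_sleep_time;
}
```

A caller interested only in the sleep time can then pass NULL for the first two output pointers instead of declaring throwaway variables.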


[PATCH] printk: add sleep time into timestamp

2014-03-27 Thread Neil Zhang

Add sleep time to the timestamp so that it reflects the actual
elapsed time, since sched_clock is stopped during suspend.

This patch depends on the following patch:
timekeeping: check params before use them

Signed-off-by: Neil Zhang 
---
 kernel/printk/printk.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4dae9cb..2dc6145 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -250,6 +250,17 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static u64 print_clock(void)
+{
+   struct timespec ts;
+   u64 ts_nsec = local_clock();
+
+   get_xtime_and_monotonic_and_sleep_offset(NULL, NULL, &ts);
+   ts_nsec += (u64)ts.tv_sec * NSEC_PER_SEC + (u64)ts.tv_nsec;
+
+   return ts_nsec;
+}
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -349,7 +360,7 @@ static void log_store(int facility, int level,
if (ts_nsec > 0)
msg->ts_nsec = ts_nsec;
else
-   msg->ts_nsec = local_clock();
+   msg->ts_nsec = print_clock();
memset(log_dict(msg) + dict_len, 0, pad_len);
msg->len = sizeof(struct printk_log) + text_len + dict_len + pad_len;
 
@@ -1440,7 +1451,7 @@ static bool cont_add(int facility, int level, const char 
*text, size_t len)
cont.facility = facility;
cont.level = level;
cont.owner = current;
-   cont.ts_nsec = local_clock();
+   cont.ts_nsec = print_clock();
cont.flags = 0;
cont.cons = 0;
cont.flushed = false;
-- 
1.7.9.5
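
The arithmetic in print_clock() can be checked with a small stand-alone sketch. The function print_clock_ns() and its parameters are hypothetical; local_ns plays the role of local_clock() and slept the role of the sleep offset returned by the timekeeping core.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct ts {			/* simplified timespec */
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * local_ns: nanoseconds from a clock that stops during suspend.
 * slept:    accumulated time spent suspended.
 * Returns the timestamp including suspend time, in nanoseconds.
 */
static uint64_t print_clock_ns(uint64_t local_ns, struct ts slept)
{
	assert(slept.tv_sec >= 0 && slept.tv_nsec >= 0);

	return local_ns + (uint64_t)slept.tv_sec * NSEC_PER_SEC +
	       (uint64_t)slept.tv_nsec;
}
```

For example, 5 s of running time plus 90.5 s of sleep gives a timestamp of 95.5 s.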


Re: [PATCH 01/15] mtd: st_spi_fsm: Add Macronix MX25L25655E device

2014-03-27 Thread Lee Jones

Hi Geert,

> > From: Angus Clark 
> >
> > Add Macronix MX25L25655E to the list of known devices.
> >
> > Signed-off-by: Angus Clark 
> > Signed-off-by: Lee Jones 
> > ---
> >  drivers/mtd/devices/st_spi_fsm.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/mtd/devices/st_spi_fsm.c 
> > b/drivers/mtd/devices/st_spi_fsm.c
> > index bea1416..2471061 100644
> > --- a/drivers/mtd/devices/st_spi_fsm.c
> > +++ b/drivers/mtd/devices/st_spi_fsm.c
> > @@ -380,6 +380,9 @@ static struct flash_info flash_types[] = {
> > { "mx25l25635e", 0xc22019, 0, 64*1024, 512,
> >   (MX25_FLAG | FLASH_FLAG_32BIT_ADDR | FLASH_FLAG_RESET), 70,
> >   stfsm_mx25_config },
> > +   { "mx25l25655e", 0xc22619, 0, 64*1024, 512,
> > + (MX25_FLAG | FLASH_FLAG_32BIT_ADDR | FLASH_FLAG_RESET), 70,
> > + stfsm_mx25_config},
> >
> >  #define N25Q_FLAG (FLASH_FLAG_READ_WRITE   |   \
> >FLASH_FLAG_READ_FAST |   \
> 
> How much of this table can be shared with the one in m25p80.c?

I have a long term plan to merge the two. Just waiting for the SPI NOR
Framework to land before I do so.

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH v2] initramfs: print error and shell out for unsupported content

2014-03-27 Thread Alexander Holler


On 26.03.2014 23:37, Alexander Holler wrote:

On 26.03.2014 22:55, Alexander Holler wrote:

On 26.03.2014 22:38, Levente Kurusa wrote:



This is walkable but probably not worth the effort. Besides, why would
anyone want to put spaces, colons and arbitrary characters to filenames
in the initramfs?


I've already suggested an example for that. If you have a machine with
bluetooth, look at /var/lib/bluetooth and you will discover directories
with colons. So, guess what happens if you want to have (preset)
link-keys in an initramfs to avoid an otherwise necessary pairing.

And spaces in filenames are used by a lot of people for various reasons.
And you might wonder, but there exists software one might want to use in
an initramfs which needs some file(s) with a hardcoded name which
contains spaces.


Just because this problem has existed at least since the dawn of git
doesn't mean nobody has suffered through it.


E.g. I've known about the bug with colons for several years, but was just
afraid to post a simple patch (for legitimate reasons, as this thread shows).


Alexander Holler


Re: [PATCH resend] clk: axi-clkgen: Add support for v2

2014-03-27 Thread Lars-Peter Clausen


On 03/27/2014 03:38 AM, Mike Turquette wrote:

Quoting Lars-Peter Clausen (2014-02-26 22:20:53)

On 02/27/2014 02:04 AM, Mike Turquette wrote:

Quoting Lars-Peter Clausen (2014-02-17 01:31:53)

This patch adds support for the new v2 version of the axi-clkgen core.
Unfortunately the method of accessing the registers is quite different on v2,
while the content still stays largely the same. So the patch adds a small
abstraction layer which implements the specific read and write functions for v1
and v2 in callback functions.


Hi,

This patch almost doubles the size of clk-axi-clkgen.c. Should it be a
separate clock driver? I guess that depends on the relationship between
"v1" and "v2". Are both of those versions of the clkgen core going into
production?


Hi,

The only thing that is different between the two versions is how the PLL
registers are accessed. The content that is written to those registers is
100% identical. So splitting it up into two drivers makes no sense, since
you'd have to copy&paste all the application logic. Both versions of the
core can be found in the wild.


Hi Lars,

I took this into clk-next some time ago but never replied to this thread
letting you know. Better late than never :-)


Yep, already saw it showing up in clk-next a while ago, thanks.

- Lars


Re: [PATCH resend] serial_core: Fix pm imbalance on unbind

2014-03-27 Thread Geert Uytterhoeven

On Fri, Mar 21, 2014 at 2:06 PM, Peter Hurley  wrote:
>> When a serial port is closed, uart_close() takes care of shutting down the
>> hardware, and powering it down.
>>
>> When a serial port is unbound while in use, uart_close() bypasses all of
>> this, as this is supposed to be done through uart_hangup() (invoked via
>> tty_vhangup() in uart_remove_one_port()).
>>
>> However, uart_hangup() does not set the hardware's power state, leaving it
>> powered up.  This may also lead to unbounded nesting counts in clock and
>> power management, depending on their internal implementation.
>>
>> Make sure to power down the port in uart_hangup(), except when the port is
>> used as a serial console. For serial consoles, this must be postponed
>> until
>> after their deregistration in uart_remove_one_port() (symmetry with
>> registration in uart_configure_port(), invoked from uart_add_one_port()).
>>
>> After this, the module clock used by the sh-sci driver is disabled on
>> unbind while the serial port is in use.
>>
>> Signed-off-by: Geert Uytterhoeven 
>> ---
>>   drivers/tty/serial/serial_core.c |8 ++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/serial/serial_core.c
>> b/drivers/tty/serial/serial_core.c
>> index 2cf5649a6dc0..56dda84f82a5 100644
>> --- a/drivers/tty/serial/serial_core.c
>> +++ b/drivers/tty/serial/serial_core.c
>> @@ -1452,6 +1452,8 @@ static void uart_hangup(struct tty_struct *tty)
>> clear_bit(ASYNCB_NORMAL_ACTIVE, &port->flags);
>> spin_unlock_irqrestore(&port->lock, flags);
>> tty_port_tty_set(port, NULL);
>> +   if (!uart_console(state->uart_port))
>> +   uart_change_pm(state, UART_PM_STATE_OFF);
>
>
> Ok.

Thanks, I'll send an updated patch handling the non-serial console case only.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] ARM: imx6/dt: add ccm dependency upon ckil, ckih1 and osc clocks

2014-03-27 Thread Boris BREZILLON


On 27/03/2014 09:11, Boris BREZILLON wrote:

Signed-off-by: Boris BREZILLON 
---
  arch/arm/boot/dts/imx6qdl.dtsi |4 
  1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi
index cfc85be..060e94c 100644
--- a/arch/arm/boot/dts/imx6qdl.dtsi
+++ b/arch/arm/boot/dts/imx6qdl.dtsi
@@ -62,16 +62,19 @@
  
  		ckil {

Oops.
You'll have to change this line into the line below in order to be able 
to reference this clk.


  ckil: ckil {

compatible = "fsl,imx-ckil", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <32768>;
};
  
  		ckih1 {

ditto

compatible = "fsl,imx-ckih1", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <0>;
};
  
  		osc {

ditto

compatible = "fsl,imx-osc", "fixed-clock";
+   #clock-cells = <0>;
clock-frequency = <2400>;
};
};
@@ -489,6 +492,7 @@
interrupts = <0 87 IRQ_TYPE_LEVEL_HIGH>,
 <0 88 IRQ_TYPE_LEVEL_HIGH>;
#clock-cells = <1>;
+   clocks = <&ckil &ckih1 &osc>;
};
  
  			anatop: anatop@020c8000 {



Re: [PATCH v2] perf/tool: Fix usage of trace events with '-' in trace system name.

2014-03-27 Thread Christian Borntraeger

On 27/03/14 09:27, Alexander Yarygin wrote:
> 
> Trace events potentially can have a '-' in their trace system name,
> e.g. kvm on s390 defines kvm-s390:* tracepoints.
> tools/perf could not parse them, because there was no rule for this:
> $ sudo ./perf top -e "kvm-s390:*"
> invalid or unsupported event: 'kvm-s390:*'
> 
> This patch allows '-' to be a part of the PE_NAME token, so tracepoints
> with '-' can be parsed by the event_legacy_tracepoint rule.
> Without the patch, perf will not accept such tracepoints in the -e
> option.
> 
> Signed-off-by: Alexander Yarygin 
> Signed-off-by: Christian Borntraeger 

When doing a V2, you should remove my Signed-off-by. ;-)



But at least we can now add my
Acked-by: Christian Borntraeger 


Ingo, Peter, Paul,

If you agree with this solution, I would like to have this in the next merge
window - maybe cc stable if we consider perf stable relevant.

Christian


> ---
>  tools/perf/util/parse-events.l |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 3432995..ca20da7 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -199,7 +199,7 @@ r{num_raw_hex}{ return raw(yyscanner); }
>  {num_hex}{ return value(yyscanner, 16); }
> 
>  {modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
> -{name}   { return str(yyscanner, PE_NAME); }
> +{name_minus} { return str(yyscanner, PE_NAME); }
>  "/"  { BEGIN(config); return '/'; }
>  -{ return '-'; }
>  ,{ BEGIN(event); return ','; }
> 


Re: [PATCH v2] random32: avoid attempt to late reseed if in the middle of seeding

2014-03-27 Thread Daniel Borkmann


On 03/27/2014 07:01 AM, Sasha Levin wrote:

Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.

This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.

Instead, just avoid even waiting to do a reseed if a reseed is already
occurring.

Signed-off-by: Sasha Levin 


Looks better now, thanks!

Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
Acked-by: Daniel Borkmann 

Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

2014-03-27 Thread Stefan Hajnoczi

On Thu, Mar 27, 2014 at 10:10:40AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> > "Michael S. Tsirkin"  writes:
> > 
> > > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> > >> Hi List!
> > >> Hope some one can help me, we had a big issue in our cloud the other
> > >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> > >> went read only filesystem from the guest side because the backing
> > >> files directory (the openstack _base directory) was compromised and
> > >> the data was lost, when we realized the data was lost, it took us 5
> > >> mins to restore the backup of the backing files, but by that time all
> > >> the kvm guests received some kind of IO error from the hypervisor
> > >> layer, and went read only on root filesystem.
> > >> 
> > >> My question would be, is there a way to hold the IO operations against
> > >> the backing files ( i thought that would be 99% READ operations ) for
> > >> a little longer ( im asking this because i dont quite understand what
> > >> is the process and when it raises the error ) in a case the backing
> > >> files are missing (no IO possible) but is recoverable within minutes ?
> > >> 
> > >> Any tip  on how to achieve this if possible, or information about how
> > >> backing files works on kvm, will be amazing.
> > >> Waiting for feedback!
> > >> 
> > >> kindest regards.
> > >> Alejandro Comisario
> > >
> > >
> > > I'm guessing this is what happened: guests timed out meanwhile.
> > > You can increase the timeout within the guest:
> > > echo 600 > /sys/block/sda/device/timeout
> > > to timeout after 10 minutes.
> > >
> > > If you have installed qemu guest agent on your system, you can do this
> > > from the host. Unfortunately by default its memory can be pushed out to
> > > swap, and then on a disk error, access there might fail :(
> > > Maybe we should consider mlock on all its memory at least as an option.
> > >
> > > You could pause your guests, restart them after the issue is resolved,
> > > and we could I guess add functionality to pause VM on disk errors
> > > automatically.
> > > Stefan?
> > 
> > Would -drive rerror=stop do?
> 
> I think it will. It's a pity it doesn't appear in --help output -
> would make it easier to find.

It is documented on the man page.  I'll send a patch to document it in
the --help output too.

But there's still a problem because the guest can have a shorter timeout
or the image may be NFS mounted on the host.  In that case the guest may
give up on the request before the host.  Then there is nothing QEMU can
do to avoid an error being returned to the application or the guest file
system going into read-only mode.

So make sure the timeout inside the guest is high.

Stefan

[PATCH] regmap: Add REGMAP_ENDIAN_SWAP support for values.

2014-03-27 Thread Xiubo Li

For the following cases of the SoCs using regmap-mmio:

Index   CPU mode   Device mode   Need byte swap?
-------------------------------------------------
1       LE         LE            No
2       LE         BE            Yes
3       BE         BE            No
4       BE         LE            Yes

And possibly one device will be used in all the endianness modes
above with the same device driver. For cases 1 and 3,
REGMAP_ENDIAN_NATIVE is fine, but for cases 2 and 4,
REGMAP_ENDIAN_SWAP is needed.

For the DT node, a single property like 'endian-swap' is enough
for cases 2 and 4.

Certainly, for case 2, we could just use REGMAP_ENDIAN_BIG
instead of REGMAP_ENDIAN_SWAP, and then we would have to add a DT node
property like 'big-endian'.

For case 4, we could just use REGMAP_ENDIAN_LITTLE instead
of REGMAP_ENDIAN_SWAP, and then we would have to add a DT node property
like 'little-endian'. Another issue is that
REGMAP_ENDIAN_LITTLE isn't supported by the regmap core yet.

Using REGMAP_ENDIAN_BIG and REGMAP_ENDIAN_LITTLE would also make
the driver, and its usage, a bit more complex.

Thus using REGMAP_ENDIAN_SWAP and one DT node property like
'endian-swap' makes the driver easier to develop and to use
for all the above possible cases.
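
The four cases in the table reduce to "swap when CPU and device endianness differ". A minimal user-space sketch of that rule follows; the enum, needs_swap(), the local swab16()/swab32() helpers and format_val32() are hypothetical names used for illustration and only mirror what the regmap_format_*_swap() functions in the patch do with __swab16()/__swab32().

```c
#include <assert.h>
#include <stdint.h>

enum endian { ENDIAN_LITTLE, ENDIAN_BIG };

/* Cases 2 and 4 of the table: CPU and device disagree. */
static int needs_swap(enum endian cpu, enum endian dev)
{
	return cpu != dev;
}

/* Unconditional byte swaps, independent of host endianness. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swab32(uint32_t v)
{
	return ((v & 0x000000ffU) << 24) | ((v & 0x0000ff00U) << 8) |
	       ((v & 0x00ff0000U) >> 8) | ((v & 0xff000000U) >> 24);
}

/* Shift the value into place, then swap only if the table says so. */
static uint32_t format_val32(uint32_t val, unsigned int shift,
			     enum endian cpu, enum endian dev)
{
	uint32_t v;

	assert(shift < 32);
	v = val << shift;
	return needs_swap(cpu, dev) ? swab32(v) : v;
}
```

Because the swap is unconditional once selected, the same code path serves both the LE-CPU/BE-device and BE-CPU/LE-device combinations, which is the simplification the commit message argues for.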

Signed-off-by: Xiubo Li 
---
 drivers/base/regmap/regmap.c | 56 
 include/linux/regmap.h   |  1 +
 2 files changed, 57 insertions(+)

diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index 6a19515..71e0a0d 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -192,6 +192,14 @@ static void regmap_format_16_be(void *buf, unsigned int 
val, unsigned int shift)
b[0] = cpu_to_be16(val << shift);
 }
 
+static void regmap_format_16_swap(void *buf, unsigned int val,
+ unsigned int shift)
+{
+   __u16 *b = buf;
+
+   b[0] = __swab16(val << shift);
+}
+
 static void regmap_format_16_native(void *buf, unsigned int val,
unsigned int shift)
 {
@@ -216,6 +224,14 @@ static void regmap_format_32_be(void *buf, unsigned int 
val, unsigned int shift)
b[0] = cpu_to_be32(val << shift);
 }
 
+static void regmap_format_32_swap(void *buf, unsigned int val,
+ unsigned int shift)
+{
+   __u32 *b = buf;
+
+   b[0] = __swab32(val << shift);
+}
+
 static void regmap_format_32_native(void *buf, unsigned int val,
unsigned int shift)
 {
@@ -240,6 +256,13 @@ static unsigned int regmap_parse_16_be(const void *buf)
return be16_to_cpu(b[0]);
 }
 
+static unsigned int regmap_parse_16_swap(const void *buf)
+{
+   const __u16 *b = buf;
+
+   return __swab16(b[0]);
+}
+
 static void regmap_parse_16_be_inplace(void *buf)
 {
__be16 *b = buf;
@@ -247,6 +270,13 @@ static void regmap_parse_16_be_inplace(void *buf)
b[0] = be16_to_cpu(b[0]);
 }
 
+static void regmap_parse_16_swap_inplace(void *buf)
+{
+   __u16 *b = buf;
+
+   b[0] = __swab16(b[0]);
+}
+
 static unsigned int regmap_parse_16_native(const void *buf)
 {
return *(u16 *)buf;
@@ -269,6 +299,13 @@ static unsigned int regmap_parse_32_be(const void *buf)
return be32_to_cpu(b[0]);
 }
 
+static unsigned int regmap_parse_32_swap(const void *buf)
+{
+   const __u32 *b = buf;
+
+   return __swab32((b[0]));
+}
+
 static void regmap_parse_32_be_inplace(void *buf)
 {
__be32 *b = buf;
@@ -276,6 +313,13 @@ static void regmap_parse_32_be_inplace(void *buf)
b[0] = be32_to_cpu(b[0]);
 }
 
+static void regmap_parse_32_swap_inplace(void *buf)
+{
+   __u32 *b = buf;
+
+   b[0] = __swab32(b[0]);
+}
+
 static unsigned int regmap_parse_32_native(const void *buf)
 {
return *(u32 *)buf;
@@ -585,6 +629,12 @@ struct regmap *regmap_init(struct device *dev,
map->format.parse_val = regmap_parse_16_be;
map->format.parse_inplace = regmap_parse_16_be_inplace;
break;
+   case REGMAP_ENDIAN_SWAP:
+   map->format.format_val = regmap_format_16_swap;
+   map->format.parse_val = regmap_parse_16_swap;
+   map->format.parse_inplace =
+   regmap_parse_16_swap_inplace;
+   break;
case REGMAP_ENDIAN_NATIVE:
map->format.format_val = regmap_format_16_native;
map->format.parse_val = regmap_parse_16_native;
@@ -606,6 +656,12 @@ struct regmap *regmap_init(struct device *dev,
map->format.parse_val = regmap_parse_32_be;
map->format.parse_inplace = regmap_parse_32_be_inplace;
break;
+   case REGMAP_ENDIAN_SWAP:
+   map->format.format_val =

Re: [PATCH] random32: avoid attempt to late reseed if in the middle of seeding

2014-03-27 Thread Daniel Borkmann


On 03/27/2014 03:21 AM, Hannes Frederic Sowa wrote:

On Wed, Mar 26, 2014 at 07:35:01PM -0400, Sasha Levin wrote:

On 03/26/2014 07:18 PM, Daniel Borkmann wrote:

On 03/26/2014 06:12 PM, Sasha Levin wrote:

Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.

This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.

Instead, just avoid even waiting to do a reseed if a reseed is already
occurring.

Signed-off-by: Sasha Levin 


Thanks for catching! (If you want Dave to pick it up, please also
Cc netdev.)

Why not via spin_trylock_irqsave() ? Thus, if we already hold the
lock, we do not bother any longer with doing the same work twice
and just return.
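
The try-lock idea can be sketched in userspace roughly as follows. This
is only an illustration under stated assumptions, not the kernel code:
a C11 atomic_flag stands in for the spinlock that would be taken with
spin_trylock_irqsave(), and demo_reseed(), demo_get_random_bytes() and
the counters are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static bool demo_pool_becomes_ready;	/* simulate pool init mid-reseed */
static int demo_reseeds_done;
static int demo_reseeds_skipped;

static void demo_reseed(void);

/*
 * Stand-in for get_random_bytes(): if the pool becomes initialized
 * while we are fetching bytes, the late reseed hook runs re-entrantly.
 */
static void demo_get_random_bytes(void)
{
	if (demo_pool_becomes_ready) {
		demo_pool_becomes_ready = false;
		demo_reseed();
	}
}

static void demo_reseed(void)
{
	/* Try-lock: if the lock is already held, a reseed is in flight. */
	if (atomic_flag_test_and_set(&demo_lock)) {
		demo_reseeds_skipped++;
		return;
	}

	demo_get_random_bytes();
	demo_reseeds_done++;

	atomic_flag_clear(&demo_lock);
}
```

With demo_pool_becomes_ready set, a single call to demo_reseed()
completes one reseed and skips the nested one instead of spinning on a
lock it already holds.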


I totally agree with Daniel spin_trylock_irqsave seems like the best
solution.

In case we really want to make sure that even early seeding doesn't
race with late seed and the pool is only filled by another CPU, we would
actually need per-cpu bools to get this case correct.


But then again, we would just exit via spin_trylock_irqsave()
now, no? Whenever something enters this section protected under
irq save spinlock we would do a reseed of the entire state (s1-s4)
for each cpu.


Your code looks much better, I should really stop sending patches
too early in the morning...

It's also worth adding lib/random32.c to the MAINTAINERS file, as my
list of recipients is solely based on what get_maintainer.pl tells
me to do (and I'm assuming that I'm not the last person who will be
sending patches for this).


Would be a nice idea, especially because prandom_u32 changes are sensitive to
network security and should get reviewed there, too.


Indeed, sounds good to me.


Greetings,

   Hannes



Re: RFC: APEI hardware reduced profile

2014-03-27 Thread Tomasz Nowicki


On 03/26/14 16:10, Mauro Carvalho Chehab wrote:

Em Wed, 26 Mar 2014 15:55:07 +0100
"Rafael J. Wysocki"  escreveu:


On Wednesday, March 26, 2014 01:08:10 PM Tomasz Nowicki wrote:

Hi,


Hi,

This is a question for Tony, Boris and Mauro (CCed now).


Currently APEI depends on the x86 architecture. This is because of many
x86-specific features like the "IA-32 Architecture Corrected Machine Check"
error source or NMI hardware error notification. However, many other
features like "PCI Express Device AER Structure" or GHES via external
interrupt can still be used perfectly well by other architectures. So my
idea is to move the x86 dependency away from Kconfig to the APEI areas
where it really applies.

I have started refactoring ghes.c driver in that direction. And here
comes my confusion, how should we treat x86 related parts, as fixed
profile? (which means we could use ACPI_REDUCED_HARDWARE or
CONFIG_ACPI_REDUCED_HARDWARE_ONLY define). I would like to ask for your
opinion.


That's a good question, and it probably depends on how you are mapping the
ACPI changes. For example, are you moving acpi out of /arch?

As I answered to a similar question, IMHO it would be better to
have the hardware error reporting mechanisms in /drivers/ras, with
some Kconfig items there that depend on X86 to enable certain
drivers.

Also, I don't like having something like ACPI_REDUCED_foo. IMHO, it
would be better to do the reverse: have Kconfig symbols that enable the
extra X86-specific functionality, and have them mapped into separate
files/drivers with proper Kconfig names, like ACPI_X86 or ACPI_X86_NMI.

Still, it would be better if you could be a little more specific about
what your plans are and which common/not-common features
you're mapping.


Yes and sorry, I should have been more specific here.

After scanning APEI code it seems like NMI notification of GHES implies 
APEI x86 dependency for Kconfig, so I am targeting ghes.c.


I agree that ACPI_REDUCED_foo is not suitable for that purpose. However, 
ACPI_X86_NMI sounds good to me. I also have been thinking of moving NMI 
code (from ghes.c) to separate file but NMI and IRQ context are tightly 
coupled. That convinced me to leave it in ghes.c for now but I need to 
look at it closer.


Thanks,
Tomasz

Re: [alsa-devel] [PATCH] ASoC: Add support for multi register mux

2014-03-27 Thread Lars-Peter Clausen

On 03/26/2014 11:41 PM, Songhee Baek wrote:

-Original Message-
From: Lars-Peter Clausen [mailto:l...@metafoo.de]
Sent: Wednesday, March 26, 2014 12:39 PM
To: Arun Shamanna Lakshmi
Cc: lgirdw...@gmail.com; broo...@kernel.org; swar...@wwwdotorg.org;
Songhee Baek; alsa-de...@alsa-project.org; ti...@suse.de; linux-
ker...@vger.kernel.org
Subject: Re: [alsa-devel] [PATCH] ASoC: Add support for multi register mux

On 03/26/2014 01:02 AM, Arun Shamanna Lakshmi wrote:

If the mux uses 1 bit position per input, and requires to set one
single bit at a time, then an N bit register can support up to N
inputs. In more recent Tegra chips, we have at least greater than
64 inputs which requires at least 2 .reg fields in struct soc_enum.

Signed-off-by: Arun Shamanna Lakshmi 
Signed-off-by: Songhee Baek 

The way you describe this it seems to me that a value array for this kind of
mux would look like.

0x, 0x, 0x0001
0x, 0x, 0x0002
0x, 0x, 0x0003
0x, 0x, 0x0004
0x, 0x, 0x0008
...

That seems to be extremely tedious. If the MUX uses a one hot encoding
how about storing the index of the bit in the values array and use (1 << value)
when writing the value to the register?

If we store the index of the bit, the value will be duplicated across each
register's inputs, since each register only has bits 0 to 31 to shift into;
we would then need to decode the index to work out which register to set.
If we need to interpret the decoded index anyway, isn't it better to have
custom put/get functions in our driver?

I'm not sure I understand. If you use (val / 32) to pick the register and
(val % 32) to pick the bit in the register this should work just fine. Maybe
I'm missing something. Do you have a real world code example of how this
type of enum is used?
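
A minimal sketch of that index scheme, assuming 32-bit registers and a
strictly one-hot mux. The demo_mux_* names are hypothetical and are not
part of the ASoC API.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITS_PER_REG 32u

/*
 * Encode input index 'val' as a one-hot pattern spread over 'num_regs'
 * 32-bit register values. Returns 0 on success, -1 if 'val' does not
 * fit into the available registers.
 */
static int demo_mux_encode(unsigned int val, uint32_t *regs, size_t num_regs)
{
	size_t i;

	if (val / DEMO_BITS_PER_REG >= num_regs)
		return -1;

	for (i = 0; i < num_regs; i++)
		regs[i] = 0;

	/* (val / 32) picks the register, (val % 32) picks the bit. */
	regs[val / DEMO_BITS_PER_REG] = UINT32_C(1) << (val % DEMO_BITS_PER_REG);
	return 0;
}

/* Decode back to the index of the first set bit, or -1 if none is set. */
static int demo_mux_decode(const uint32_t *regs, size_t num_regs)
{
	size_t i;
	unsigned int bit;

	for (i = 0; i < num_regs; i++) {
		for (bit = 0; bit < DEMO_BITS_PER_REG; bit++) {
			if (regs[i] & (UINT32_C(1) << bit))
				return (int)(i * DEMO_BITS_PER_REG + bit);
		}
	}
	return -1;
}
```

For example, input 37 with three registers ends up as bit 5 of the
second register, and decoding that pattern gives 37 again.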

-   int reg;
+   int reg[SOC_ENUM_MAX_REGS];
unsigned char shift_l;
unsigned char shift_r;
unsigned int items;
-   unsigned int mask;
+   unsigned int mask[SOC_ENUM_MAX_REGS];

If you make mask and reg pointers instead of arrays this should be much
more flexible and not be limited to 3 registers.

Using pointers instead of arrays would be more flexible, but I would need
to update the SOC_ENUM SINGLE/DOUBLE macros.
That would change a lot in the current soc-core.c and soc-dapm.c.

In the existing macros you can do something like this:
...
.reg = &(unsigned int){(xreg)},
...
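
To illustrate the compound-literal trick outside of the kernel macros,
here is a hedged sketch. struct demo_enum, DEMO_ENUM_SINGLE and
DEMO_ENUM_MULTI are hypothetical names that only mimic the shape of
struct soc_enum and its initializer macros. At file scope a compound
literal has static storage duration, so its address is usable in a
static initializer.

```c
/* Hypothetical, simplified stand-in for struct soc_enum. */
struct demo_enum {
	const unsigned int *reg;	/* pointer instead of a fixed array */
	unsigned int num_regs;
};

/* Single-register case: point at an anonymous one-element object. */
#define DEMO_ENUM_SINGLE(xreg) \
	{ .reg = &(unsigned int){ (xreg) }, .num_regs = 1 }

/* Multi-register case: point at an anonymous array of any length. */
#define DEMO_ENUM_MULTI(...) \
	{ .reg = (const unsigned int[]){ __VA_ARGS__ }, \
	  .num_regs = sizeof((unsigned int[]){ __VA_ARGS__ }) / \
		      sizeof(unsigned int) }

static const struct demo_enum demo_single = DEMO_ENUM_SINGLE(0x10);
static const struct demo_enum demo_triple = DEMO_ENUM_MULTI(0x10, 0x14, 0x18);
```

Existing single-register users would keep their call sites unchanged,
while a multi-register mux is not limited to a compile-time maximum.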

const char * const *texts;
const unsigned int *values;
+   unsigned int num_regs;
   };


Re: RFC: APEI hardware reduced profile

2014-03-27 Thread Tomasz Nowicki


On 03/26/14 21:36, Borislav Petkov wrote:

On Wed, Mar 26, 2014 at 12:10:47PM -0300, Mauro Carvalho Chehab wrote:

Yet, it would be better if you could be a little more specific about
what are your plans and what are the common/not-common features that
you're mapping.


Well, I don't see anything x86-specific in ghes.c on a quick scan - just
the GHES gunk itself, which is the spec. Which begs the question, how
much of the APEI spec is ARM going to implement and if it is a subset,
this definitely needs to be handled cleanly. Judging by your text, it
seems like you want to ignore the correctable errors part...?

So yes, Tomasz, you want to be much more specific here :-)


Thanks for your comment. Please refer to my email to Mauro. I have put 
more explanation there.


Correctable errors are fine for ARM. But ARM is never going to have NMI.
So, as I replied to Mauro, I am focusing on making NMI support a
separate feature in ghes.c.


Thanks,
Tomasz

Re: [PATCH] random32: avoid attempt to late reseed if in the middle of seeding

2014-03-27 Thread Hannes Frederic Sowa

On Thu, Mar 27, 2014 at 10:04:03AM +0100, Daniel Borkmann wrote:
> On 03/27/2014 03:21 AM, Hannes Frederic Sowa wrote:
> >On Wed, Mar 26, 2014 at 07:35:01PM -0400, Sasha Levin wrote:
> >>On 03/26/2014 07:18 PM, Daniel Borkmann wrote:
> >>>On 03/26/2014 06:12 PM, Sasha Levin wrote:
> Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
> nonblocking pool becomes initialized") has added a late reseed stage
> that happens as soon as the nonblocking pool is marked as initialized.
> 
> This fails in the case that the nonblocking pool gets initialized
> during __prandom_reseed()'s call to get_random_bytes(). In that case
> we'd double back into __prandom_reseed() in an attempt to do a late
> reseed - deadlocking on 'lock' early on in the boot process.
> 
> Instead, just avoid even waiting to do a reseed if a reseed is already
> occurring.
> 
> Signed-off-by: Sasha Levin 
> >>>
> >>>Thanks for catching! (If you want Dave to pick it up, please also
> >>>Cc netdev.)
> >>>
> >>>Why not via spin_trylock_irqsave() ? Thus, if we already hold the
> >>>lock, we do not bother any longer with doing the same work twice
> >>>and just return.
> >
> >I totally agree with Daniel spin_trylock_irqsave seems like the best
> >solution.
> >
> >In case we really want to make sure that even early seeding doesn't
> >race with late seed and the pool is only filled by another CPU, we would
> >actually need per-cpu bools to get this case correct.
> 
> But then again, we would just exit via spin_trylock_irqsave()
> now, no? Whenever something enters this section protected under
> irq save spinlock we would do a reseed of the entire state (s1-s4)
> for each cpu.

If early reseed races with late one, we would actually need to spin on
maybe another cpu, so the early call can leave critical section before
late call enters. If we don't spin we could possibly miss the late call
when nonblocking pool is fully seeded (entropy may be added in batches
and first cpus of the early reseeding might miss better entropy).

If the early call blocks the late call, maybe even on another cpu, the late
call should spin until the early call left the critical section. We can only
deadlock on same cpu.

I consider this just hypothetical.

Bye,

  Hannes

Re: [RFC PATCH] mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus

2014-03-27 Thread Ulf Hansson

On 28 February 2014 12:49, Ulf Hansson  wrote:
> The sdio func device is added to the driver model after the card
> device.
>
> This means the sdio func device will be suspended before the card device
> and thus resumed after it. As a consequence, the mmc core doesn't
> explicitly need to protect itself from receiving sdio requests in
> suspended state. Instead that can be handled from the sdio bus, which
> thus invokes the PM callbacks instead of the old dummy function.
>
> In the case where the sdio func driver doesn't implement the PM callbacks,
> the mmc core will, in the early phase of system suspend, remove the
> card from the driver model and thus power it off.
>
> Cc: Aaron Lu 
> Cc: NeilBrown 
> Cc: Rafael J. Wysocki 
> Signed-off-by: Ulf Hansson 
> ---
>
> Note, this patch has only been compile tested. I would appreciate it if
> someone with SDIO and a sdio func driver could help out to test this.
> Especially the libertas driver would be nice.
>
> ---

Hi Chris,

Would be nice if you could pick this up for 3.15 and possibly send it
to "stable" as well.

It has been tested and reviewed, so I am confident that we are doing
the right thing here.

Kind regards
Uffe

>  drivers/mmc/core/sdio.c |   45 
> ---
>  drivers/mmc/core/sdio_bus.c |   14 +-
>  2 files changed, 5 insertions(+), 54 deletions(-)
>
> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> index 4d721c6..9933e42 100644
> --- a/drivers/mmc/core/sdio.c
> +++ b/drivers/mmc/core/sdio.c
> @@ -943,40 +943,21 @@ static int mmc_sdio_pre_suspend(struct mmc_host *host)
>   */
>  static int mmc_sdio_suspend(struct mmc_host *host)
>  {
> -   int i, err = 0;
> -
> -   for (i = 0; i < host->card->sdio_funcs; i++) {
> -   struct sdio_func *func = host->card->sdio_func[i];
> -   if (func && sdio_func_present(func) && func->dev.driver) {
> -   const struct dev_pm_ops *pmops = func->dev.driver->pm;
> -   err = pmops->suspend(&func->dev);
> -   if (err)
> -   break;
> -   }
> -   }
> -   while (err && --i >= 0) {
> -   struct sdio_func *func = host->card->sdio_func[i];
> -   if (func && sdio_func_present(func) && func->dev.driver) {
> -   const struct dev_pm_ops *pmops = func->dev.driver->pm;
> -   pmops->resume(&func->dev);
> -   }
> -   }
> -
> -   if (!err && mmc_card_keep_power(host) && mmc_card_wake_sdio_irq(host)) {
> +   if (mmc_card_keep_power(host) && mmc_card_wake_sdio_irq(host)) {
> mmc_claim_host(host);
> sdio_disable_wide(host->card);
> mmc_release_host(host);
> }
>
> -   if (!err && !mmc_card_keep_power(host))
> +   if (!mmc_card_keep_power(host))
> mmc_power_off(host);
>
> -   return err;
> +   return 0;
>  }
>
>  static int mmc_sdio_resume(struct mmc_host *host)
>  {
> -   int i, err = 0;
> +   int err = 0;
>
> BUG_ON(!host);
> BUG_ON(!host->card);
> @@ -1019,24 +1000,6 @@ static int mmc_sdio_resume(struct mmc_host *host)
> wake_up_process(host->sdio_irq_thread);
> mmc_release_host(host);
>
> -   /*
> -* If the card looked to be the same as before suspending, then
> -* we proceed to resume all card functions.  If one of them returns
> -* an error then we simply return that error to the core and the
> -* card will be redetected as new.  It is the responsibility of
> -* the function driver to perform further tests with the extra
> -* knowledge it has of the card to confirm the card is indeed the
> -* same as before suspending (same MAC address for network cards,
> -* etc.) and return an error otherwise.
> -*/
> -   for (i = 0; !err && i < host->card->sdio_funcs; i++) {
> -   struct sdio_func *func = host->card->sdio_func[i];
> -   if (func && sdio_func_present(func) && func->dev.driver) {
> -   const struct dev_pm_ops *pmops = func->dev.driver->pm;
> -   err = pmops->resume(&func->dev);
> -   }
> -   }
> -
> host->pm_flags &= ~MMC_PM_KEEP_POWER;
> return err;
>  }
> diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c
> index 92d1ba8..4fa8fef9 100644
> --- a/drivers/mmc/core/sdio_bus.c
> +++ b/drivers/mmc/core/sdio_bus.c
> @@ -197,20 +197,8 @@ static int sdio_bus_remove(struct device *dev)
>
>  #ifdef CONFIG_PM
>
> -#ifdef CONFIG_PM_SLEEP
> -static int pm_no_operation(struct device *dev)
> -{
> -   /*
> -* Prevent the PM core from calling SDIO device drivers' suspend
> -* callback routines, which it is not supposed to do, by using this
> -* empty function as the bus type suspend callaback for SDIO.
> -*/
>

Re: [PATCH 2/2] [RFC] serial_core: Avoid NULL pointer dereference in uart_close()

2014-03-27 Thread Geert Uytterhoeven

Hi Peter,

On Wed, Mar 26, 2014 at 9:10 PM, Peter Hurley  wrote:
> On 03/26/2014 02:58 PM, Geert Uytterhoeven wrote:
>> Thanks for your comments!
>
> Not a problem; just wanted to save you some time and frustration :)

Much appreciated!

>> On Fri, Mar 21, 2014 at 9:29 PM, Peter Hurley 
>> wrote:
>>> On 03/17/2014 09:10 AM, Geert Uytterhoeven wrote:
 From: Geert Uytterhoeven 
 When unbinding a serial driver that's being used as a serial console,
 the kernel may crash with a NULL pointer dereference in a uart_*()
 function
 called from uart_close () (e.g. uart_flush_buffer() or
 uart_chars_in_buffer()).

 To fix this, let uart_close() check for port->count == 0. If this is the
 case, bail out early. Else tty_port_close_start() will make the port
 counts inconsistent, printing out warnings like

   tty_port_close_start: tty->count = 1 port count = 0.

 and

   tty_port_close_start: count = -1
>>>
>>> As you noted in the patch comments below, the tty core always closes
>>> a failed open.
>>>
>>> So the reason for the port->count mismatch is because
>>> tty_port_close_start()
>>> only handles the tty hangup condition -- all other failed opens are
>>> assumed
>>> to carry a port->count.
>>>
>>> A similar mismatch will occur if the mutex_lock in uart_open() is
>>> interrupted.
>>>
>>> This means that the port->count mismatch can occur even if port->count !=
>>> 0;
>>> so the bug here is that uart_open() and uart_close() don't agree on
>>> who does what cleanup under what error conditions.
>>>
>>> So with respect to the port count mismatches, the conditions need careful
>>> auditing and fixing, separate from the tty console teardown problem.
>>
>> Indeed. Currently uart_open() always decrements port->count again
>> in any error condition, which is clearly wrong.
>
> I started looking at this problem only to realize that the
> tty_hung_up_p() condition in uart_open() can't actually happen.

BTW, generic tty_port_open(), as used by new serial port drivers, also checks
for this condition.

> Which has led me to a bunch of cleanups and fixes that I'm still
> working on. It's just slow going because tty code audit takes
> forever with legacy intentions that no longer apply and some of
> the bit-rotting tty drivers that I doubt even run.

Sure, I understand.

> What are the circumstances of device removal in your case?

I'm unbinding the driver using:

echo sh-sci.6 > /sys/bus/platform/drivers/sh-sci/unbind

As long as the serial port is not opened as a console at the time
of unbind, everything is reasonably well. But if it's open as a console,
uart_hangup() is no longer called, and proper cleanup never happens.

I started looking into this when I wanted to verify that the serial hardware's
clock is properly disabled when the hardware is not in use (e.g. on driver
shutdown).

Note that Greg has applied this patch to linux-next, so you may want to
revert it for your investigations (and fix ;-).

Thanks again!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH net] vhost: fix total length when packets are too short

2014-03-27 Thread Michael S. Tsirkin

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order to make recvmsg
truncate the packet, and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.
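
As a simplified userspace model of that control flow (an illustration
only, not the vhost code): the buffer claim returns an out-of-range
sentinel when the packet does not fit, and the caller drops the packet
as soon as it sees the sentinel. DEMO_MAXIOV stands in for UIO_MAXIOV,
and all demo_* names are hypothetical.

```c
#include <stddef.h>

#define DEMO_MAXIOV 1024	/* stand-in for the kernel's UIO_MAXIOV */

enum demo_action { DEMO_DELIVER, DEMO_DROP };

/*
 * Claim rx buffers until 'datalen' bytes are covered or the buffers
 * run out. Returns the number of buffers used, or DEMO_MAXIOV + 1 if
 * the packet does not fit (overrun). 'nbufs' must not exceed
 * DEMO_MAXIOV, so the sentinel can never be a valid count.
 */
static int demo_get_rx_bufs(const size_t *buf_len, int nbufs, size_t datalen)
{
	int headcount = 0;

	while (datalen > 0 && headcount < nbufs) {
		size_t len = buf_len[headcount];

		datalen -= len < datalen ? len : datalen;
		headcount++;
	}

	/* Detect overrun: data is left over but no buffers remain. */
	if (datalen > 0)
		return DEMO_MAXIOV + 1;

	return headcount;
}

/* Caller side: drop immediately on overrun, never touch the iov. */
static enum demo_action demo_handle_rx(const size_t *buf_len, int nbufs,
				       size_t sock_len)
{
	int headcount = demo_get_rx_bufs(buf_len, nbufs, sock_len);

	if (headcount > DEMO_MAXIOV)
		return DEMO_DROP;

	return DEMO_DELIVER;
}
```

With a single 1500-byte buffer, a 1000-byte packet is delivered and a
2000-byte packet is dropped before any copy is attempted.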

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin 
---

Note: this is needed for -stable.

I wonder if this can still make the release.

 drivers/vhost/net.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index a0fa5de..026be58 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -532,6 +532,12 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
*iovcount = seg;
if (unlikely(log))
*log_num = nlogs;
+
+   /* Detect overrun */
+   if (unlikely(datalen > 0)) {
+   r = UIO_MAXIOV + 1;
+   goto err;
+   }
return headcount;
 err:
vhost_discard_vq_desc(vq, headcount);
@@ -587,6 +593,14 @@ static void handle_rx(struct vhost_net *net)
/* On error, stop handling until the next kick. */
if (unlikely(headcount < 0))
break;
+   /* On overrun, truncate and discard */
+   if (unlikely(headcount > UIO_MAXIOV)) {
+   msg.msg_iovlen = 1;
+   err = sock->ops->recvmsg(NULL, sock, &msg,
+1, MSG_DONTWAIT | MSG_TRUNC);
+   pr_debug("Discarded rx packet: len %zd\n", sock_len);
+   continue;
+   }
/* OK, now we need to know about added descriptors. */
if (!headcount) {
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-- 
MST

RE: [PATCH] ASoC: fsl_sai: Add isr to deal with error flag

2014-03-27 Thread David Laight

From: Mark Brown
> On Wed, Mar 26, 2014 at 11:59:53AM +, David Laight wrote:
> > From: Nicolin Chen
> 
> > > + regmap_read(sai->regmap, FSL_SAI_TCSR, &xcsr);
> > > + regmap_write(sai->regmap, FSL_SAI_TCSR, xcsr);
> 
> > Assuming these are 'write to clear' bits, you might want
> > to make the write (above) and all the traces (below)
> > conditional on the value being non-zero.
> 
> The trace is already conditional?  I'd also expect to see the driver
> only acknowledging sources it knows about and only reporting that the
> interrupt was handled if it saw one of them - right now all interrupts
> are unconditionally acknowledged.

The traces are separately conditional on their own bits.
That is a lot of checks that will normally be false.

Also the driver may need to clear all the active interrupt
bits in order to make the IRQ go away.
It should trace any bits it doesn't expect to be set, though.
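
A rough userspace model of the handler shape being suggested, assuming
the status bits are write-1-to-clear. struct demo_sai, DEMO_KNOWN_FLAGS
and demo_isr() are hypothetical; the real driver would go through
regmap_read()/regmap_write() on FSL_SAI_TCSR instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_KNOWN_FLAGS 0x0000000fu	/* hypothetical: bits we expect */

struct demo_sai {
	uint32_t status;	/* models a write-1-to-clear register */
	uint32_t unexpected;	/* bits that would be traced as unexpected */
};

/* Returns true if an interrupt was handled, false if none was pending. */
static bool demo_isr(struct demo_sai *dev)
{
	uint32_t flags = dev->status;		/* read the status register */

	/* Nothing pending: skip the write and all of the trace checks. */
	if (!flags)
		return false;

	/* Clear every active bit so the IRQ line can go away. */
	dev->status &= ~flags;

	/* Record the bits the driver did not expect to see. */
	if (flags & ~DEMO_KNOWN_FLAGS)
		dev->unexpected |= flags & ~DEMO_KNOWN_FLAGS;

	return true;
}
```

The single zero check up front replaces the per-bit checks in the
common case where the handler is entered with nothing to report.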

David




RE: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Paul Durrant

> -Original Message-
> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
> Sent: 26 March 2014 19:57
> To: Paul Durrant
> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; 
> linux-
> kernel; net...@vger.kernel.org
> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> troubles "bisected"
> 
> 
> Wednesday, March 26, 2014, 6:48:15 PM, you wrote:
> 
> >> -Original Message-
> >> From: Paul Durrant
> >> Sent: 26 March 2014 17:47
> >> To: 'Sander Eikelenboom'
> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
> linux-
> >> kernel; net...@vger.kernel.org
> >> Subject: RE: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> >> troubles "bisected"
> >>
> >> Re-send shortened version...
> >>
> >> > -Original Message-
> >> > From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
> >> > Sent: 26 March 2014 16:54
> >> > To: Paul Durrant
> >> > Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian 
> >> > Campbell;
> >> linux-
> >> > kernel; net...@vger.kernel.org
> >> > Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> >> > troubles "bisected"
> >> >
> >> [snip]
> >> > >>
> >> > >> - When processing an SKB we end up in "xenvif_gop_frag_copy"
> while
> >> > prod
> >> > >> == cons ... but we still have bytes and size left ..
> >> > >> - start_new_rx_buffer() has returned true ..
> >> > >> - so we end up in get_next_rx_buffer
> >> > >> - this does a RING_GET_REQUEST and ups cons ..
> >> > >> - and we end up with a bad grant reference.
> >> > >>
> >> > >> Sometimes we are saved by the bell .. since additional slots have
> >> become
> >> > >> free (you see cons become > prod in "get_next_rx_buffer" but
> shortly
> >> > after
> >> > >> that prod is increased ..
> >> > >> just in time to not cause a overrun).
> >> > >>
> >> >
> >> > > Ah, but hang on... There's a BUG_ON meta_slots_used >
> >> > max_slots_needed, so if we are overflowing the worst-case calculation
> >> then
> >> > why is that BUG_ON not firing?
> >> >
> >> > You mean:
> >> > sco = (struct skb_cb_overlay *)skb->cb;
> >> > sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> >> > BUG_ON(sco->meta_slots_used > max_slots_needed);
> >> >
> >> > in "get_next_rx_buffer" ?
> >> >
> >>
> >> That code excerpt is from net_rx_action(),isn't it?
> >>
> >> > I don't know .. at least now it doesn't crash dom0 and therefore not my
> >> > complete machine and since tcp is recovering from a failed packet  :-)
> >> >
> >>
> >> Well, if the code calculating max_slots_needed were underestimating
> then
> >> the BUG_ON() should fire. If it is not firing in your case then this 
> >> suggests
> >> your problem lies elsewhere, or that meta_slots_used is not equal to the
> >> number of ring slots consumed.
> >>
> >> > But probably because "npo->copy_prod++" seems to be used for the
> frags
> >> ..
> >> > and it isn't added to  npo->meta_prod ?
> >> >
> >>
> >> meta_slots_used is calculated as the value of meta_prod at return (from
> >> xenvif_gop_skb()) minus the value on entry , and if you look back up the
> >> code then you can see that meta_prod is incremented every time
> >> RING_GET_REQUEST() is evaluated. So, we must be consuming a slot
> without
> >> evaluating RING_GET_REQUEST() and I think that's exactly what's
> >> happening... Right at the bottom of xenvif_gop_frag_copy() req_cons is
> >> simply incremented in the case of a GSO. So the BUG_ON() is indeed off
> by
> >> one.
> >>
> 
> > Can you re-test with the following patch applied?
> 
> >   Paul
> 
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> netback/netback
> > index 438d0c0..4f24220 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -482,6 +482,8 @@ static void xenvif_rx_action(struct xenvif *vif)
> 
> > while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> > RING_IDX max_slots_needed;
> > +   RING_IDX old_req_cons;
> > +   RING_IDX ring_slots_used;
> > int i;
> 
> > /* We need a cheap worse case estimate for the number of
> > @@ -511,8 +513,12 @@ static void xenvif_rx_action(struct xenvif *vif)
> > vif->rx_last_skb_slots = 0;
> 
> > sco = (struct skb_cb_overlay *)skb->cb;
> > +
> > +   old_req_cons = vif->rx.req_cons;
> > sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > -   BUG_ON(sco->meta_slots_used > max_slots_needed);
> > +   ring_slots_used = vif->rx.req_cons - old_req_cons;
> > +
> > +   BUG_ON(ring_slots_used > max_slots_needed);
> 
> > __skb_queue_tail(&rxq, skb);
> > }
> 
> That blew pretty fast .. on that BUG_ON
> 

Good. That's what should have happened :-)
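
For illustration, the accounting used by the test patch boils down to
taking the difference of the consumer index before and after the skb is
processed. A minimal sketch, assuming a 32-bit unsigned ring index
(demo_ring_idx_t and the demo_* helpers are hypothetical names):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_ring_idx_t;	/* assumed width of the ring index */

/*
 * Slots consumed between two snapshots of the consumer index. The
 * unsigned subtraction stays correct when the index wraps around.
 */
static demo_ring_idx_t demo_slots_used(demo_ring_idx_t old_cons,
				       demo_ring_idx_t new_cons)
{
	return (demo_ring_idx_t)(new_cons - old_cons);
}

/* True when more slots were consumed than the worst-case estimate. */
static bool demo_exceeds_estimate(demo_ring_idx_t old_cons,
				  demo_ring_idx_t new_cons,
				  demo_ring_idx_t max_slots_needed)
{
	return demo_slots_used(old_cons, new_cons) > max_slots_needed;
}
```

Unlike a counter that is only bumped on some code paths, this measures
every slot taken from the ring, however the index was advanced.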

  Paul

> [  290.218182] [ cut here ]
> [  290.225425

RE: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Paul Durrant

> -Original Message-
> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
> Sent: 26 March 2014 20:18
> To: Paul Durrant
> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; 
> linux-
> kernel; net...@vger.kernel.org
> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> troubles "bisected"
> 
> 
> Wednesday, March 26, 2014, 7:15:30 PM, you wrote:
> 
> >> -Original Message-
> >> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
> >> Sent: 26 March 2014 18:08
> >> To: Paul Durrant
> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
> linux-
> >> kernel; net...@vger.kernel.org
> >> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> >> troubles "bisected"
> >>
> >>
> >> Wednesday, March 26, 2014, 6:46:06 PM, you wrote:
> >>
> >> > Re-send shortened version...
> >>
> >> >> -Original Message-
> >> >> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
> >> >> Sent: 26 March 2014 16:54
> >> >> To: Paul Durrant
> >> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian 
> >> >> Campbell;
> >> linux-
> >> >> kernel; net...@vger.kernel.org
> >> >> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
> >> >> troubles "bisected"
> >> >>
> >> > [snip]
> >> >> >>
> >> >> >> - When processing an SKB we end up in "xenvif_gop_frag_copy"
> while
> >> >> prod
> >> >> >> == cons ... but we still have bytes and size left ..
> >> >> >> - start_new_rx_buffer() has returned true ..
> >> >> >> - so we end up in get_next_rx_buffer
> >> >> >> - this does a RING_GET_REQUEST and ups cons ..
> >> >> >> - and we end up with a bad grant reference.
> >> >> >>
> >> >> >> Sometimes we are saved by the bell .. since additional slots have
> >> become
> >> >> >> free (you see cons become > prod in "get_next_rx_buffer" but
> shortly
> >> >> after
> >> >> >> that prod is increased ..
> >> >> >> just in time to not cause a overrun).
> >> >> >>
> >> >>
> >> >> > Ah, but hang on... There's a BUG_ON meta_slots_used >
> >> >> max_slots_needed, so if we are overflowing the worst-case calculation
> >> then
> >> >> why is that BUG_ON not firing?
> >> >>
> >> >> You mean:
> >> >> sco = (struct skb_cb_overlay *)skb->cb;
> >> >> sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> >> >> BUG_ON(sco->meta_slots_used > max_slots_needed);
> >> >>
> >> >> in "get_next_rx_buffer" ?
> >> >>
> >>
> >> > That code excerpt is from net_rx_action(),isn't it?
> >>
> >> Yes
> >>
> >> >> I don't know .. at least now it doesn't crash dom0 and therefore not my
> >> >> complete machine and since tcp is recovering from a failed packet  :-)
> >> >>
> >>
> >> > Well, if the code calculating max_slots_needed were underestimating
> then
> >> the BUG_ON() should fire. If it is not firing in your case then this 
> >> suggests
> >> your problem lies elsewhere, or that meta_slots_used is not equal to the
> >> number of ring slots consumed.
> >>
> >> It's seem to be the last ..
> >>
> >> [ 1157.188908] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 npo-
> >> >meta_prod:40 old_meta_prod:36 vif->rx.sring->req_prod:2105867 vif-
> >> >rx.req_cons:2105868 meta->gso_type:1 meta->gso_size:1448 nr_frags:1
> >> req->gref:657 req->id:7 estimated_slots_needed:4 j(data):1
> >> reserved_slots_left:-1used in funcstart: 0 + 1 .. used_dataloop:1 ..
> >> used_fragloop:3
> >> [ 1157.244975] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 ..  vif-
> >> >rx.sring->req_prod:2105867 vif->rx.req_cons:2105868 sco-
> >> >meta_slots_used:4 max_upped_gso:1 skb_is_gso(skb):1
> >> max_slots_needed:4 j:6 is_gso:1 nr_frags:1 firstpart:1 secondpart:2
> >> reserved_slots_left:-1
> >>
> >> net_rx_action() calculated we would need 4 slots .. and sco-
> >> >meta_slots_used == 4 when we return so it doesn't trigger you BUG_ON
> ..
> >>
> >> The 4 slots we calculated are:
> >>   1 slot for the data part: DIV_ROUND_UP(offset_in_page(skb->data) +
> >> skb_headlen(skb), PAGE_SIZE)
> >>   2 slots for the single frag in this SKB from: DIV_ROUND_UP(size,
> PAGE_SIZE)
> >>   1 slot since GSO
> >>
> >> In the debug code i annotated all cons++, and the code uses 1 slot to
> process
> >> the data from the SKB as expected but uses 3 slots in the frag chopping
> loop.
> >> And when it reaches the state  were cons > prod it is always in
> >> "get_next_rx_buffer".
> >>
> >> >> But probably because "npo->copy_prod++" seems to be used for the
> >> frags ..
> >> >> and it isn't added to  npo->meta_prod ?
> >> >>
> >>
> >> > meta_slots_used is calculated as the value of meta_prod at return
> (from
> >> xenvif_gop_skb()) minus the value on entry ,
> >> > and if you look back up the code then you can see that meta_prod is
> >> incremented every time RING_GET_REQUEST() is evaluated.
> >> > So, we must be consuming a slot without evaluating
> RING_GET_REQUEST()
> >> and I think that's exactly what's happening...
> >> > Right at th

Re: [PATCH net] vhost: fix total length when packets are too short

2014-03-27 Thread Michael S. Tsirkin

On Thu, Mar 27, 2014 at 11:38:41AM +0200, Michael S. Tsirkin wrote:
> When mergeable buffers are disabled, and the
> incoming packet is too large for the rx buffer,
> get_rx_bufs returns success.
> 
> This was intentional in order to make recvmsg
> truncate the packet and then handle_rx would
> detect err != sock_len and drop it.
> 
> Unfortunately we pass the original sock_len to
> recvmsg - which means we use parts of iov not fully
> validated.
> 
> Fix this up by detecting this overrun and doing packet drop
> immediately.
> 
> CVE-2014-0055

Ouch, wrong CVE#. It's CVE-2014-0077 actually.
Will resend V2 with the corrected commit log now.


> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Note: this is needed for -stable.
> 
> I wonder if this can still make the release.
> 
>  drivers/vhost/net.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index a0fa5de..026be58 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -532,6 +532,12 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
>   *iovcount = seg;
>   if (unlikely(log))
>   *log_num = nlogs;
> +
> + /* Detect overrun */
> + if (unlikely(datalen > 0)) {
> + r = UIO_MAXIOV + 1;
> + goto err;
> + }
>   return headcount;
>  err:
>   vhost_discard_vq_desc(vq, headcount);
> @@ -587,6 +593,14 @@ static void handle_rx(struct vhost_net *net)
>   /* On error, stop handling until the next kick. */
>   if (unlikely(headcount < 0))
>   break;
> + /* On overrun, truncate and discard */
> + if (unlikely(headcount > UIO_MAXIOV)) {
> + msg.msg_iovlen = 1;
> + err = sock->ops->recvmsg(NULL, sock, &msg,
> +  1, MSG_DONTWAIT | MSG_TRUNC);
> + pr_debug("Discarded rx packet: len %zd\n", sock_len);
> + continue;
> + }
>   /* OK, now we need to know about added descriptors. */
>   if (!headcount) {
>   if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> -- 
> MST
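
The control flow of the fix can be sketched in userspace C. This is a
minimal model, not the vhost code: MODEL_MAXIOV, model_get_rx_bufs(),
model_should_discard() and the buffer sizes are assumptions made for
illustration. It shows the sentinel idea from the patch: when the packet
does not fit in the buffers that may be used, return a head count larger
than the iov limit so the caller can drop the packet instead of handing
the original length to the receive call.

```c
#include <stddef.h>

#define MODEL_MAXIOV 1024  /* stand-in for UIO_MAXIOV; the value is illustrative */

/*
 * Walk the available buffers until the packet fits or max_heads is reached.
 * Returns the number of buffers used, or MODEL_MAXIOV + 1 on overrun
 * (data is still left over after the last usable buffer).
 */
int model_get_rx_bufs(const size_t *buf_len, int nbufs, size_t datalen,
                      int max_heads)
{
    int headcount = 0;

    while (datalen > 0 && headcount < max_heads && headcount < nbufs) {
        size_t len = buf_len[headcount];

        /* Consume at most one buffer's worth of the packet. */
        datalen -= (len < datalen) ? len : datalen;
        headcount++;
    }

    /* Detect overrun: the packet is larger than the buffers we may use. */
    if (datalen > 0)
        return MODEL_MAXIOV + 1;

    return headcount;
}

/* Caller-side check mirroring the new branch in the receive loop. */
int model_should_discard(int headcount)
{
    return headcount > MODEL_MAXIOV;
}
```

In this model a 4000-byte packet offered a single 1500-byte buffer (the
non-mergeable case, max_heads == 1) is reported as an overrun, while the
same packet spread over three 1500-byte buffers uses 3 heads.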

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Sander Eikelenboom


Thursday, March 27, 2014, 10:47:02 AM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 19:57
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; 
>> linux-
>> kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> troubles "bisected"
>> 
>> 
>> Wednesday, March 26, 2014, 6:48:15 PM, you wrote:
>> 
>> >> -Original Message-
>> >> From: Paul Durrant
>> >> Sent: 26 March 2014 17:47
>> >> To: 'Sander Eikelenboom'
>> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
>> linux-
>> >> kernel; net...@vger.kernel.org
>> >> Subject: RE: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> >> troubles "bisected"
>> >>
>> >> Re-send shortened version...
>> >>
>> >> > -Original Message-
>> >> > From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> >> > Sent: 26 March 2014 16:54
>> >> > To: Paul Durrant
>> >> > Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian 
>> >> > Campbell;
>> >> linux-
>> >> > kernel; net...@vger.kernel.org
>> >> > Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> >> > troubles "bisected"
>> >> >
>> >> [snip]
>> >> > >>
>> >> > >> - When processing an SKB we end up in "xenvif_gop_frag_copy"
>> while
>> >> > prod
>> >> > >> == cons ... but we still have bytes and size left ..
>> >> > >> - start_new_rx_buffer() has returned true ..
>> >> > >> - so we end up in get_next_rx_buffer
>> >> > >> - this does a RING_GET_REQUEST and ups cons ..
>> >> > >> - and we end up with a bad grant reference.
>> >> > >>
>> >> > >> Sometimes we are saved by the bell .. since additional slots have
>> >> become
>> >> > >> free (you see cons become > prod in "get_next_rx_buffer" but
>> shortly
>> >> > after
>> >> > >> that prod is increased ..
>> >> > >> just in time to not cause a overrun).
>> >> > >>
>> >> >
>> >> > > Ah, but hang on... There's a BUG_ON meta_slots_used >
>> >> > max_slots_needed, so if we are overflowing the worst-case calculation
>> >> then
>> >> > why is that BUG_ON not firing?
>> >> >
>> >> > You mean:
>> >> > sco = (struct skb_cb_overlay *)skb->cb;
>> >> > sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
>> >> > BUG_ON(sco->meta_slots_used > max_slots_needed);
>> >> >
>> >> > in "get_next_rx_buffer" ?
>> >> >
>> >>
>> >> That code excerpt is from net_rx_action(),isn't it?
>> >>
>> >> > I don't know .. at least now it doesn't crash dom0 and therefore not my
>> >> > complete machine and since tcp is recovering from a failed packet  :-)
>> >> >
>> >>
>> >> Well, if the code calculating max_slots_needed were underestimating
>> then
>> >> the BUG_ON() should fire. If it is not firing in your case then this 
>> >> suggests
>> >> your problem lies elsewhere, or that meta_slots_used is not equal to the
>> >> number of ring slots consumed.
>> >>
>> >> > But probably because "npo->copy_prod++" seems to be used for the
>> frags
>> >> ..
>> >> > and it isn't added to  npo->meta_prod ?
>> >> >
>> >>
>> >> meta_slots_used is calculated as the value of meta_prod at return (from
>> >> xenvif_gop_skb()) minus the value on entry , and if you look back up the
>> >> code then you can see that meta_prod is incremented every time
>> >> RING_GET_REQUEST() is evaluated. So, we must be consuming a slot
>> without
>> >> evaluating RING_GET_REQUEST() and I think that's exactly what's
>> >> happening... Right at the bottom of xenvif_gop_frag_copy() req_cons is
>> >> simply incremented in the case of a GSO. So the BUG_ON() is indeed off
>> by
>> >> one.
>> >>
>> 
>> > Can you re-test with the following patch applied?
>> 
>> >   Paul
>> 
>> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
>> netback/netback
>> > index 438d0c0..4f24220 100644
>> > --- a/drivers/net/xen-netback/netback.c
>> > +++ b/drivers/net/xen-netback/netback.c
>> > @@ -482,6 +482,8 @@ static void xenvif_rx_action(struct xenvif *vif)
>> 
>> > while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
>> > RING_IDX max_slots_needed;
>> > +   RING_IDX old_req_cons;
>> > +   RING_IDX ring_slots_used;
>> > int i;
>> 
>> > /* We need a cheap worse case estimate for the number of
>> > @@ -511,8 +513,12 @@ static void xenvif_rx_action(struct xenvif *vif)
>> > vif->rx_last_skb_slots = 0;
>> 
>> > sco = (struct skb_cb_overlay *)skb->cb;
>> > +
>> > +   old_req_cons = vif->rx.req_cons;
>> > sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
>> > -   BUG_ON(sco->meta_slots_used > max_slots_needed);
>> > +   ring_slots_used = vif->rx.req_cons - old_req_cons;
>> > +
>> > +   BUG_ON(ring_slots_used > max_slots_needed);
>> 
>> > __skb_queue_tail(&rxq, skb);
>
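
The difference between the two counters can be shown with a toy model.
Everything here is an assumption for illustration (the model_ring struct,
model_process_skb() and the slot counts); it is not the netback code. It
models the hypothesis in the mail above: every request fetched from the
ring advances both the meta producer and the request consumer, while the
GSO case advances the request consumer once more without a matching meta
entry. A check on the meta count can then pass while the number of ring
slots consumed exceeds the estimate.

```c
struct model_ring {
    unsigned int req_cons;   /* ring slots consumed so far */
    unsigned int meta_prod;  /* meta entries produced so far */
};

/* Consume one slot the way a ring request fetch would: both counters move. */
void model_get_request(struct model_ring *r)
{
    r->req_cons++;
    r->meta_prod++;
}

/*
 * Process one packet that needs `data_slots` requests.
 * For GSO, one extra ring slot is consumed with no meta entry.
 * Returns the meta slots used; *ring_used receives the req_cons delta.
 */
unsigned int model_process_skb(struct model_ring *r, unsigned int data_slots,
                               int is_gso, unsigned int *ring_used)
{
    unsigned int old_meta = r->meta_prod;
    unsigned int old_cons = r->req_cons;
    unsigned int i;

    for (i = 0; i < data_slots; i++)
        model_get_request(r);

    if (is_gso)
        r->req_cons++;  /* slot taken without going through the fetch path */

    *ring_used = r->req_cons - old_cons;
    return r->meta_prod - old_meta;
}
```

With an estimate of 4 slots, a GSO packet needing 4 requests yields a
meta count of 4 and a ring count of 5 in this model, so only a comparison
against the ring count would trip.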

Re: [PATCH] clk: register fixed-clock only if #clock-cells property is present

2014-03-27 Thread Sylwester Nawrocki

Hi Boris,

On 27/03/14 08:58, Boris BREZILLON wrote:
> This solution solves the problem for this specific case because clks are
> declared in the correct order in imx DTs.
> But, even with your patch I think we could see similar issues by 
> reordering DT nodes...
> 
> The real problem here is that imx platform does not declare the CCM clocks
> dependencies upon ckil, ckih1 and osc fixed clocks within the DT [1], and
> retrieve these clocks when initializing the CCM clocks ([2] and [3]).
> 
> We should try to add these dependencies in the DT and see if it works.

While presumably all of us agree the dependencies should be correctly
specified in dts I think we should minimize possible regressions by
keeping the clocks registration order as before, i.e. as parsed by the
kernel from DT. Rather than explicitly reversing it, which does not gain
us anything AFAICS. Instead we are seeing regressions where new kernels
stop working with old dtbs.

I'm going to resend the patch replacing list_add() with list_add_tail(),
with this the mvebu platform would work and there should be no regression
on imx and exynos.

Please note that specifying dependencies between CCM on imx and the fixed
clocks might not be enough. If the fixed clocks get matched on "fixed-clock"
compatible some clock specifiers (i.e. those using phandle to the CCM) could
get invalid, since the clocks won't get registered by the ccm driver, but by
the regular fixed clock driver. That means a phandle to different node would
need to be used to reference the fixed clock. I'm not sure if this is the case
for imx, but changes may be needed all over various dts files.
In addition, we should make sure the kernel works with current and modified
dtbs.

> [1] http://lxr.free-electrons.com/source/arch/arm/boot/dts/imx6sl.dtsi#L379
> [2] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk-imx6q.c#L151
> [3] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk.c#L30

-- 
Thanks,
Sylwester
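
The ordering effect of the two list helpers mentioned above can be seen
in a small userspace model. The model_* names and the node ids are
hypothetical; the two insert functions only mirror the semantics of the
kernel's list_add() (insert right after the head) and list_add_tail()
(insert right before the head) on a circular doubly linked list.

```c
struct model_node {
    struct model_node *next, *prev;
    int id;  /* stands for the position of a clock node in the DT */
};

void model_list_init(struct model_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link n between two neighbouring nodes. */
void model_insert(struct model_node *n, struct model_node *prev,
                  struct model_node *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Head insertion: the newest entry is visited first (stack order). */
void model_list_add(struct model_node *n, struct model_node *head)
{
    model_insert(n, head, head->next);
}

/* Tail insertion: entries are visited in the order they were added. */
void model_list_add_tail(struct model_node *n, struct model_node *head)
{
    model_insert(n, head->prev, head);
}

/* Copy the ids in traversal order into out[]; returns how many were copied. */
int model_list_order(const struct model_node *head, int *out, int max)
{
    const struct model_node *p;
    int n = 0;

    for (p = head->next; p != head && n < max; p = p->next)
        out[n++] = p->id;
    return n;
}
```

Adding nodes 1, 2, 3 with model_list_add() and walking the list gives
3, 2, 1; the same sequence with model_list_add_tail() gives 1, 2, 3,
which is the "as parsed" order argued for above.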

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Sander Eikelenboom

Hrmm i don't know if it's your mailer or my mailer .. but i seem to get a lot 
of your mails truncated somehow :S
though the xen-devel list archive seem to have them in complete form .. so it's 
probably my mailer tripping over something

> I'll come up with some patches shortly.

OK will test them ASAP.


Thursday, March 27, 2014, 10:54:09 AM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 20:18
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; 
>> linux-
>> kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> troubles "bisected"
>> 
>> 
>> Wednesday, March 26, 2014, 7:15:30 PM, you wrote:
>> 
>> >> -Original Message-
>> >> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> >> Sent: 26 March 2014 18:08
>> >> To: Paul Durrant
>> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
>> linux-
>> >> kernel; net...@vger.kernel.org
>> >> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> >> troubles "bisected"
>> >>
>> >>
>> >> Wednesday, March 26, 2014, 6:46:06 PM, you wrote:
>> >>
>> >> > Re-send shortened version...
>> >>
>> >> >> -Original Message-
>> >> >> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> >> >> Sent: 26 March 2014 16:54
>> >> >> To: Paul Durrant
>> >> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian 
>> >> >> Campbell;
>> >> linux-
>> >> >> kernel; net...@vger.kernel.org
>> >> >> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> >> >> troubles "bisected"
>> >> >>
>> >> > [snip]
>> >> >> >>
>> >> >> >> - When processing an SKB we end up in "xenvif_gop_frag_copy"
>> while
>> >> >> prod
>> >> >> >> == cons ... but we still have bytes and size left ..
>> >> >> >> - start_new_rx_buffer() has returned true ..
>> >> >> >> - so we end up in get_next_rx_buffer
>> >> >> >> - this does a RING_GET_REQUEST and ups cons ..
>> >> >> >> - and we end up with a bad grant reference.
>> >> >> >>
>> >> >> >> Sometimes we are saved by the bell .. since additional slots have
>> >> become
>> >> >> >> free (you see cons become > prod in "get_next_rx_buffer" but
>> shortly
>> >> >> after
>> >> >> >> that prod is increased ..
>> >> >> >> just in time to not cause a overrun).
>> >> >> >>
>> >> >>
>> >> >> > Ah, but hang on... There's a BUG_ON meta_slots_used >
>> >> >> max_slots_needed, so if we are overflowing the worst-case calculation
>> >> then
>> >> >> why is that BUG_ON not firing?
>> >> >>
>> >> >> You mean:
>> >> >> sco = (struct skb_cb_overlay *)skb->cb;
>> >> >> sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
>> >> >> BUG_ON(sco->meta_slots_used > max_slots_needed);
>> >> >>
>> >> >> in "get_next_rx_buffer" ?
>> >> >>
>> >>
>> >> > That code excerpt is from net_rx_action(),isn't it?
>> >>
>> >> Yes
>> >>
>> >> >> I don't know .. at least now it doesn't crash dom0 and therefore not my
>> >> >> complete machine and since tcp is recovering from a failed packet  :-)
>> >> >>
>> >>
>> >> > Well, if the code calculating max_slots_needed were underestimating
>> then
>> >> the BUG_ON() should fire. If it is not firing in your case then this 
>> >> suggests
>> >> your problem lies elsewhere, or that meta_slots_used is not equal to the
>> >> number of ring slots consumed.
>> >>
>> >> It's seem to be the last ..
>> >>
>> >> [ 1157.188908] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 npo-
>> >> >meta_prod:40 old_meta_prod:36 vif->rx.sring->req_prod:2105867 vif-
>> >> >rx.req_cons:2105868 meta->gso_type:1 meta->gso_size:1448 nr_frags:1
>> >> req->gref:657 req->id:7 estimated_slots_needed:4 j(data):1
>> >> reserved_slots_left:-1used in funcstart: 0 + 1 .. used_dataloop:1 ..
>> >> used_fragloop:3
>> >> [ 1157.244975] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 ..  vif-
>> >> >rx.sring->req_prod:2105867 vif->rx.req_cons:2105868 sco-
>> >> >meta_slots_used:4 max_upped_gso:1 skb_is_gso(skb):1
>> >> max_slots_needed:4 j:6 is_gso:1 nr_frags:1 firstpart:1 secondpart:2
>> >> reserved_slots_left:-1
>> >>
>> >> net_rx_action() calculated we would need 4 slots .. and sco-
>> >> >meta_slots_used == 4 when we return so it doesn't trigger you BUG_ON
>> ..
>> >>
>> >> The 4 slots we calculated are:
>> >>   1 slot for the data part: DIV_ROUND_UP(offset_in_page(skb->data) +
>> >> skb_headlen(skb), PAGE_SIZE)
>> >>   2 slots for the single frag in this SKB from: DIV_ROUND_UP(size,
>> PAGE_SIZE)
>> >>   1 slot since GSO
>> >>
>> >> In the debug code i annotated all cons++, and the code uses 1 slot to
>> process
>> >> the data from the SKB as expected but uses 3 slots in the frag chopping
>> loop.
>> >> And when it reaches the state  were cons > prod it is always in
>> >> "get_next_rx_buffer".
>> >>
>> >> >> But probably because "npo->copy_prod++" seems to be used for the
>> >> frags

Re: [PATCH 0/4] devfreq: exynos: generalize PPMU code

2014-03-27 Thread 함명주



On Sat, Mar 22, 2014 at 2:31 AM, Bartlomiej Zolnierkiewicz 
 wrote:
> Hi,
>
> This patch series generalizes PPMU support for Exynos devfreq
> drivers.
>
> It is based on top of "devfreq: exynos: Fix minor issue and code
> clean to remove legacy method" patch series from Chanwoo Choi
> (https://lkml.org/lkml/2014/3/19/713).
>
> Please note that the patches were only compile tested because
> Exynos devfreq drivers don't work yet in the upstream kernels
> (FWIW these changes were briefly tested with the internal tree
> which has working devfreq drivers).

Hi Bartlomiej,


Thank you.

I'll merge this into devfreq's for-next.


Anyway, days ago, I had a short discussion with Chanwoo about
providing some common interface for PPMU-like devices
(sort of sub device of devfreq device) that provide usage statistics. 

At this stage, I roughly guess that such a device driver (or whatever
entity it is) may be simply a helper for filling out the "get_dev_status" 
callback.
It is because devfreq's PPMU usage is limited to getting get_dev_status().
As you had done something on PPMU, I guess you may have some
idea on this.

Cheers,
MyungJoo.
>
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung R&D Institute Poland
> Samsung Electronics
>
>
> Bartlomiej Zolnierkiewicz (3):
>   devfreq: exynos4: introduce struct busfreq_ppmu_data
>   devfreq: exynos5: introduce struct busfreq_ppmu_data
>   devfreq: exynos: make more PPMU code common
>
> Chanwoo Choi (1):
>   devfreq: exynos4: use common PPMU code
>
>  drivers/devfreq/exynos/Makefile  |   2 +-
>  drivers/devfreq/exynos/exynos4_bus.c | 176 
> +--
>  drivers/devfreq/exynos/exynos5_bus.c |  94 +--
>  drivers/devfreq/exynos/exynos_ppmu.c |  60 
>  drivers/devfreq/exynos/exynos_ppmu.h |   8 ++
>  5 files changed, 135 insertions(+), 205 deletions(-)
>
> --
> 1.8.2.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

[PATCH] cpufreq: don't print value of .driver_data from core

2014-03-27 Thread Viresh Kumar

CPUFreq core doesn't control value of .driver_data and this field is completely
driver specific. This can contain any value and not only indexes. For most of
the drivers, which aren't using this field, its value is zero. So, printing this
from core doesn't make any sense. Don't print it.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/freq_table.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 8e54f97..f002272 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -36,8 +36,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy 
*policy,
&& table[i].driver_data == CPUFREQ_BOOST_FREQ)
continue;
 
-   pr_debug("table entry %u: %u kHz, %u driver_data\n",
-   i, freq, table[i].driver_data);
+   pr_debug("table entry %u: %u kHz\n", i, freq);
if (freq < min_freq)
min_freq = freq;
if (freq > max_freq)
@@ -175,8 +174,8 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
} else
*index = optimal.driver_data;
 
-   pr_debug("target is %u (%u kHz, %u)\n", *index, table[*index].frequency,
-   table[*index].driver_data);
+   pr_debug("target index is %u, freq is:%u kHz\n", *index,
+table[*index].frequency);
 
return 0;
 }
-- 
1.7.12.rc2.18.g61b472e


Re: [PATCH] tick, broadcast: Prevent false alarm when force mask contains offline cpus

2014-03-27 Thread Preeti U Murthy

On 03/27/2014 11:58 AM, Srivatsa S. Bhat wrote:
> 
> Actually, my suggestion was to remove the dying CPU from the force_mask alone,
> in the CPU_DYING notifier. The rest of the cleanup (removing it from the other
> masks, moving the broadcast duty to someone else etc can still be done at
> the CPU_DEAD stage). Also, note that the CPU which is set in force_mask is
> definitely not the one doing the broadcast.
> 
> Basically, my reasoning was this:
> 
> If we look at how the 3 broadcast masks (oneshot, pending and force) are
> set and cleared during idle entry/exit, we see this pattern:
> 
> oneshot_mask: This is set at BROADCAST_ENTER and cleared at EXIT.
> pending_mask: This is set at tick_handle_oneshot_broadcast and cleared at
>   EXIT.
> force_mask:   This is set at EXIT and cleared at the next call to
>   tick_handle_oneshot_broadcast. (Also, if the CPU is set in this
>   mask, the CPU doesn't enter deep idle states in subsequent
>   idle durations, and keeps polling instead, until it gets the
>   broadcast interrupt).
> 
> What we can derive from this is that force_mask is the only mask that can
> remain set across an idle ENTER/EXIT sequence. Both of the other 2 masks
> can never remain set across a full idle ENTER/EXIT sequence. And a CPU going
> offline certainly goes through EXIT if it had gone through ENTER, before
> entering stop_machine().
> 
> That means, force_mask is the only odd one out here, which can remain set
> when entering stop_machine() for CPU offline. So that's the only mask that
> needs to be cleared separately. The other 2 masks take care of themselves
> automatically. So, we can have a CPU_DYING callback which just clears the
> dying CPU from the force_mask (and does nothing more). That should work, no?

Yep I think this will work. Find the modified patch below:

Thanks.

Regards
Preeti U Murthy

tick,broadcast:Clear hotplugged cpu in broadcast masks during CPU_DYING 
notification

From: Preeti U Murthy 

It's possible that the tick_broadcast_force_mask contains cpus which are not
in cpu_online_mask when a broadcast tick occurs. This could happen under the
following circumstance assuming CPU1 is among the CPUs waiting for broadcast.

CPU0                                    CPU1

Run CPU_DOWN_PREPARE notifiers

Start stop_machine                      Gets woken up by IPI to run
                                        stop_machine, sets itself in
                                        tick_broadcast_force_mask if the
                                        time of broadcast interrupt is around
                                        the same time as this IPI.

                                        Start stop_machine
                                          set_cpu_online(cpu1, false)
End stop_machine                        End stop_machine

Broadcast interrupt
  Finds that cpu1 in
  tick_broadcast_force_mask is offline
  and triggers the WARN_ON in
  tick_handle_oneshot_broadcast()

Clears all broadcast masks
in CPU_DEAD stage.

While the hotplugged cpu clears its bit in the tick_broadcast_oneshot_mask
and tick_broadcast_pending_mask during BROADCAST_EXIT, it *sets* its bit
in the tick_broadcast_force_mask if the broadcast interrupt is found to be
around the same time as the present time. Today we clear all the broadcast
masks and shutdown tick devices in the CPU_DEAD stage. But as shown above
the broadcast interrupt could occur before this stage is reached and the
WARN_ON() gets triggered when it is found that the tick_broadcast_force_mask
contains an offline cpu.

This WARN_ON was added to capture scenarios where the broadcast mask, be it
oneshot/pending/force_mask contain offline cpus whose tick devices have been
removed. But here is a case where we trigger the WARN_ON() when the tick
device of the hotplugged cpu is still around but we are delaying the clearing
of the broadcast masks. This has not been a problem for
tick_broadcast_oneshot_mask and tick_broadcast_pending_mask since they get
cleared on exit from broadcast. But since the force_mask gets set at the
same time on certain occasions, it is necessary to move the clearing of
masks to a stage during cpu hotplug before the hotplugged cpu clears itself
in the online_mask.

Hence move the clearing of broadcast masks to the CPU_DYING notification stage
so that they remain consistent with the cpu_online_mask at the time of
broadcast delivery at all times.

Suggested-by: Srivatsa S. Bhat 
Signed-off-by: Preeti U Murthy 
---
 kernel/time/clockevents.c|1 +
 kernel/time/tick-broadcast.c |   20 +++-
 kernel/time/tick-internal.h  |3 +++
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index ad362c2..d33d808 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -566,6 +566,7 @@ int clockevents_notify(unsigned long reason, void *arg)

case CLOCK_
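
The mask lifecycle described in this mail can be modelled with plain
bitmasks. This is only a sketch: the model_masks struct, the function
names and the `broadcast_imminent` flag are assumptions for illustration,
not the tick-broadcast code. It shows why the force mask is the one that
can still hold a CPU when that CPU goes offline, and how a hook run
before the CPU is marked offline keeps the masks consistent with the
online mask.

```c
struct model_masks {
    unsigned long oneshot;  /* set on idle ENTER, cleared on EXIT */
    unsigned long pending;  /* set by the broadcast handler, cleared on EXIT */
    unsigned long force;    /* set on EXIT, cleared by the next handler run */
    unsigned long online;   /* CPUs currently online */
};

void model_idle_enter(struct model_masks *m, int cpu)
{
    m->oneshot |= 1UL << cpu;
}

/* broadcast_imminent: the broadcast event is due around the time of exit. */
void model_idle_exit(struct model_masks *m, int cpu, int broadcast_imminent)
{
    m->oneshot &= ~(1UL << cpu);
    m->pending &= ~(1UL << cpu);
    if (broadcast_imminent)
        m->force |= 1UL << cpu;
}

/* Hook run for a dying CPU before it is marked offline. */
void model_cpu_dying(struct model_masks *m, int cpu)
{
    m->oneshot &= ~(1UL << cpu);
    m->pending &= ~(1UL << cpu);
    m->force &= ~(1UL << cpu);
}

void model_set_offline(struct model_masks *m, int cpu)
{
    m->online &= ~(1UL << cpu);
}

/* Nonzero if a broadcast handler run would find an offline CPU in force. */
int model_handler_sees_offline(const struct model_masks *m)
{
    return (m->force & ~m->online) != 0;
}
```

In the model, a CPU that exits idle with the broadcast imminent and then
goes offline without the hook leaves its bit in the force mask; calling
model_cpu_dying() first leaves nothing behind. Clearing only the force
mask in the hook would give the same result here, since the other two
masks are already clear after EXIT.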

Re: [PATCH] cpufreq: don't print value of .driver_data from core

2014-03-27 Thread Srivatsa S. Bhat

On 03/27/2014 03:37 PM, Viresh Kumar wrote:
> CPUFreq core doesn't control value of .driver_data and this field is 
> completely
> driver specific. This can contain any value and not only indexes. For most of
> the drivers, which aren't using this field, its value is zero. So, printing 
> this
> from core doesn't make any sense. Don't print it.
> 
> Signed-off-by: Viresh Kumar 

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat

> ---
>  drivers/cpufreq/freq_table.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
> index 8e54f97..f002272 100644
> --- a/drivers/cpufreq/freq_table.c
> +++ b/drivers/cpufreq/freq_table.c
> @@ -36,8 +36,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy 
> *policy,
>   && table[i].driver_data == CPUFREQ_BOOST_FREQ)
>   continue;
> 
> - pr_debug("table entry %u: %u kHz, %u driver_data\n",
> - i, freq, table[i].driver_data);
> + pr_debug("table entry %u: %u kHz\n", i, freq);
>   if (freq < min_freq)
>   min_freq = freq;
>   if (freq > max_freq)
> @@ -175,8 +174,8 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
> *policy,
>   } else
>   *index = optimal.driver_data;
> 
> - pr_debug("target is %u (%u kHz, %u)\n", *index, table[*index].frequency,
> - table[*index].driver_data);
> + pr_debug("target index is %u, freq is:%u kHz\n", *index,
> +  table[*index].frequency);
> 
>   return 0;
>  }
> 


[linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497

2014-03-27 Thread Sergey Senozhatsky

Hello,

[1.971778] [ cut here ]
[1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 
__alloc_pages_nodemask+0x1b9/0x693()
[1.972246] Modules linked in: sd_mod ahci
[1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
3.14.0-rc8-next-20140327-dbg-dirty #202
[1.972890] Hardware name: Acer Aspire 5741G/Aspire 5741G
, BIOS V1.20 02/08/2011
[1.973182] Workqueue: events_unbound async_run_entry_fn
[1.973417]   8801534d5af0 813ad822 

[1.973994]  8801534d5b28 8103c03c 810ad4e5 
001000d0
[1.974529]  81656fc8  81656fc0 
8801534d5b38
[1.975125] Call Trace:
[1.975306]  [] dump_stack+0x4e/0x7a
[1.975488]  [] warn_slowpath_common+0x75/0x8e
[1.975670]  [] ? __alloc_pages_nodemask+0x1b9/0x693
[1.975853]  [] warn_slowpath_null+0x15/0x17
[1.976035]  [] __alloc_pages_nodemask+0x1b9/0x693
[1.976220]  [] ? console_unlock+0x2dd/0x2fa
[1.976404]  [] ? scsi_execute_req_flags+0x9c/0xb3
[1.976587]  [] __get_free_pages+0x12/0x3f
[1.976771]  [] __kmalloc+0x37/0x112
[1.976951]  [] scsi_attach_vpd+0x41/0x1a8
[1.977133]  [] scsi_probe_and_add_lun+0x8ec/0xa21
[1.977317]  [] __scsi_add_device+0xce/0x10a
[1.977501]  [] ata_scsi_scan_host+0x60/0x142
[1.977683]  [] async_port_probe+0x45/0x4a
[1.977873]  [] async_run_entry_fn+0x5a/0x110
[1.978061]  [] process_one_work+0x1c9/0x2e9
[1.978243]  [] worker_thread+0x1d3/0x2bd
[1.978424]  [] ? rescuer_thread+0x27d/0x27d
[1.978606]  [] kthread+0xd6/0xde
[1.978786]  [] ? kthread_create_on_node+0x162/0x162
[1.978970]  [] ret_from_fork+0x7c/0xb0
[1.979151]  [] ? kthread_create_on_node+0x162/0x162
[1.979334] ---[ end trace ca7c5bd74b0dca21 ]---


scsi_attach_vpd() (at the pg0 label) first attempts to allocate 255 bytes and
later 134217730 bytes.



---

general question,

scsi_attach_vpd() is sure that scsi_vpd_inquiry() returns `ret < 0' in case of
an error and `ret > 0' otherwise:

"Returns size of the vpd page on success or a negative error number."

while this is not exactly true.

scsi_vpd_inquiry() indeed can return -EINVAL, but usually it returns
scsi_execute_req()->scsi_execute_req_flags()->scsi_execute() status, which
has different error indication and can be, e.g. for failed blk_get_request()
in scsi_execute() or failed sense kzalloc() in scsi_execute_req_flags(),
`DRIVER_ERROR << 24'.


-ss
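
The numbers above can be checked with a short sketch. It assumes the
usual layout of a SCSI result word, with the status byte in bits 0-7 and
the driver byte in bits 24-31; the macro and function names and the
length cap are hypothetical. Under that assumption 134217730 is
0x08000002, i.e. a result word rather than a page length, and a caller
that only tests `ret < 0` would treat it as a size.

```c
/* Field extraction under the assumed result-word layout. */
#define MODEL_DRIVER_BYTE(result) (((result) >> 24) & 0xff)
#define MODEL_STATUS_BYTE(result) ((result) & 0xff)

/*
 * Interpret a return value that may be a negative errno, a positive
 * result word, or a page length. Returns the length if it is plausible
 * (0 < ret <= max_len), otherwise -1.
 */
int model_vpd_len(int ret, int max_len)
{
    if (ret <= 0)
        return -1;   /* negative errno, or nothing to read */
    if (ret > max_len)
        return -1;   /* too large to be a page length: likely a status */
    return ret;
}
```

With a cap of 65535, the value 134217730 is rejected while 255 is
accepted as a length; the cap itself is an assumption and would need to
match whatever the real maximum page size is.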

Re: [RFC 00/47] mtd: nand: Add new driver supporting ST's BCH h/w

2014-03-27 Thread Lee Jones

> > After taking a quick glance at the whole driver I noticed you have something
> > strange going on. AFAIK, the typical NAND driver probe() should be one of
> > these two:
> > 
> > * Call nand_scan() which calls nand_scan_ident() + nand_scan_tail().
> > 
> > * Call nand_scan_ident() to identify the NAND device geometry, do some
> >   driver specific initialization, fill some hooks, and finally call
> >   nand_scan_tail() to complete the initialization.
> > 
> > Your driver calls nand_scan_ident() and then does some bad block scan, and
> > fills some callbacks on its own, but never calls nand_scan_tail().
> > 
> > The call to nand_scan_tail() would remove the need to export those NAND
> > core functions, and remove the need to scan and print the bad blocks.
> > I don't know if you have a real reason for not doing it this way, or
> > maybe it's the way this driver was originally written.
> > 
> > Care to review this and re-spin the driver? You'll have a nicer,
> > more framework-compliant driver.
> 
> A hearty +1 to this. You are avoiding much of the core of the NAND
> framework by avoiding the nand_chip callbacks and nand_scan_tail(), and
> by reimplementing the BBT. I will have to NAK to some of the patches
> that EXPORT the nand_base private core (e.g., nand_get_device()), and I
> will most likely NAK the custom BBT implementation (please improve
> nand_bbt.c as needed).

This is a good catch. I will attempt to reimplement the driver's
initialisation steps to utilise more of the core infrastructure in an
attempt to mitigate the requirement for exportation of private
routines.

The BBT requirements are somewhat more complex. To provide you with
the complete picture, a little knowledge of driver history is
required. When it was initially created the MTD core only supported
OOB BBTs, but the ST BCH Controller doesn't support OOB access, so
Angus wrote his own In-Band (IB) implementation. Unfortunately the IB
support which _is_ now present in the kernel doesn't match the
internal implementation. Normally this wouldn't be an issue in itself,
but ST's boot-stack and tooling (Primary Bootloader, U-Boot, various
Programmers, etc) are aware of the internal IB BBT and utilise it
in varying ways. Shifting over to the Mainline version in
one fell swoop _will_ cause lots of pain and will probably result in
the disownership of the driver we're trying to Mainline today. Naturally
I'm keen to avoid this.

> > Also, if you plan to target v3.16 on this, I'd suggest that you pick
> > some pack of features and submit those first, reducing the amount of code
> > to be reviewed. For instance, you may choose to leave some of the ECC bits
> > aside for now.
> > 
> > It's just a suggestion to get at least some of the code merged quicker,
> > don't take me too seriously on this.
> 
> That's a possible approach if it still leaves your driver functional.
> But I wouldn't trim the driver too much just for sake of reviewing.
> 
> BTW, why do you call this driver stm_nand_bch? BCH is a particular type
> of ECC algorithm, not unique at all to ST's hardware. Can you drop the
> _bch and make it just stm_nand?

From my knowledge (Angus feel free to jump in anywhere you like), ST
have 4 NAND drivers. This is just one of them. The others are EMI,
Flex and Advanced Flex (AFM). This particular controller is described
as the BCH throughout ST's documentation. Much of this documentation is
available freely online [1].

> Also, you might want to change the
> namespacing on some of your functions; for instance, I don't think you
> can own the name bch_write(). Possibly prefix things with stm_* or
> stm_nand_* where reasonable.

Yes, absolutely.

[1] http://www.stlinux.com/howto/NAND

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

[PATCHv2 net] vhost: fix total length when packets are too short

2014-03-27 Thread Michael S. Tsirkin

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order to make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin 
---

Changes from v1:
Fix CVE# in the commit log.
Patch is unchanged.

Note: this is needed for -stable.

I wonder if this can still make the release.

 drivers/vhost/net.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index a0fa5de..026be58 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -532,6 +532,12 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
*iovcount = seg;
if (unlikely(log))
*log_num = nlogs;
+
+   /* Detect overrun */
+   if (unlikely(datalen > 0)) {
+   r = UIO_MAXIOV + 1;
+   goto err;
+   }
return headcount;
 err:
vhost_discard_vq_desc(vq, headcount);
@@ -587,6 +593,14 @@ static void handle_rx(struct vhost_net *net)
/* On error, stop handling until the next kick. */
if (unlikely(headcount < 0))
break;
+   /* On overrun, truncate and discard */
+   if (unlikely(headcount > UIO_MAXIOV)) {
+   msg.msg_iovlen = 1;
+   err = sock->ops->recvmsg(NULL, sock, &msg,
+1, MSG_DONTWAIT | MSG_TRUNC);
+   pr_debug("Discarded rx packet: len %zd\n", sock_len);
+   continue;
+   }
/* OK, now we need to know about added descriptors. */
if (!headcount) {
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-- 
MST
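
The overrun handling in the patch above can be modelled in user space. This is a minimal sketch, not the vhost code: SKETCH_MAXIOV stands in for the kernel's UIO_MAXIOV, the sketch_* names are made up for illustration, and the buffer accounting is simplified to a plain array of buffer lengths.

```c
#include <assert.h>

#define SKETCH_MAXIOV 1024	/* stand-in for the kernel's UIO_MAXIOV */

/*
 * Simplified model of get_rx_bufs(): consume up to 'quota' buffers until
 * 'datalen' bytes are covered. Returns the number of buffers used, or
 * SKETCH_MAXIOV + 1 when the packet still does not fit (overrun).
 */
static int sketch_get_rx_bufs(const int *buf_len, int nbufs, int quota,
			      int datalen)
{
	int headcount = 0;

	assert(quota <= SKETCH_MAXIOV);
	while (datalen > 0 && headcount < quota && headcount < nbufs) {
		datalen -= buf_len[headcount];
		headcount++;
	}
	/* Bytes left over: the caller must drop the packet, not receive it. */
	if (datalen > 0)
		return SKETCH_MAXIOV + 1;
	return headcount;
}

/* Caller-side check mirroring the new test in handle_rx(). */
static int sketch_is_overrun(int headcount)
{
	return headcount > SKETCH_MAXIOV;
}
```

With mergeable buffers disabled the quota is a single buffer, so a 4000-byte packet against one 1500-byte buffer yields the sentinel value instead of a headcount, and the caller can discard the packet before any partially validated iov is used.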

Re: Consistent kernel oops with 3.11.10 & 3.12.9 on Haswell CPUs...

2014-03-27 Thread dafreedm

Hi,

I've attached another oops (initial one from untainted kernel, and
then successive ones) on the same machine.

Please see the HW stress-testing I've already done below (without
seeing such an oops).  Any further suggestions?

Also, how can I tell from the registers you decoded (below) that it's
a bit-flip?  (That way I can look at this stuff more myself,
perhaps)...

Thanks.
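
On the bit-flip question, here is a minimal sketch in C of the kind of check that can be applied to the RAX value quoted below. It assumes the intended pointer was 0xffff880037267140, i.e. the same low bits in the ffff8800... direct-mapping range that x86_64 kernels of this era use for kernel data. That expected value and the helper names are assumptions for illustration; they are not taken from the oops itself.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static int count_set_bits(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}

/*
 * Return the position (0 = least significant) of the single bit in which
 * observed and expected differ, or -1 if they differ in zero or several bits.
 */
static int single_flipped_bit(uint64_t observed, uint64_t expected)
{
	uint64_t diff = observed ^ expected;
	int bit = 0;

	if (count_set_bits(diff) != 1)
		return -1;
	while (!(diff & 1)) {
		diff >>= 1;
		bit++;
	}
	assert(bit < 64);
	return bit;
}
```

With observed = 0xf7ff880037267140 and the assumed expected = 0xffff880037267140, the XOR is 0x0800000000000000, so single_flipped_bit() returns 59. Exactly one bit in the top byte differs, which is the pattern that points at a hardware bit flip rather than a software bug.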



On Sun, Mar 23, 2014, Daniel Freedman wrote:
> >   Hum, so decodecode shows:
> > ...
> >   26: 48 85 c0                test   %rax,%rax
> >   29: 74 10                   je     0x3b
> >   2b:* 0f b7 80 ac 05 00 00   movzwl 0x5ac(%rax),%eax   <-- trapping instruction
> >   32: 66 85 c0                test   %ax,%ax
> > ...
> >
> >   And the register has:
> > RAX: f7ff880037267140 RBX: 1000 RCX: 
> >
> >   So that looks like a bit flip in the upper byte.
> 
> Just for my own knowledge / growth --- how can you tell there's a
> "bit flip" in the upper byte?
> 
> > So I'd check the hardware first...
> 
> 
> Yes, I absolutely did check the HW first --- and repeatedly (over a
> couple of weeks) --- before reaching out to LKML.
> 
> As described in my original email below, here's what I've done so far:
> 
>   I've been very extensively testing all of the likely culprits among
>   hardware components on both of my servers --- running memtest86 upon
>   boot for 3+ days, memtester in userspace for 24 hours, repeated
>   kernel compiles with various '-j' values, and the 'stress' and
>   'stressapptest' load generators (see below for full details) --- and
>   I have never seen even a hiccup in server operation under such
>   "artificial" environments --- however, it consistently occurs with
>   heavy md5sum operation, and randomly at other times.
> 
> More specifically, here are the exact steps I took to try to implicate
> the HW:
> 
>   aptitude install memtest86+  # reboot and run for 3+ days
> 
>   aptitude install memtester
>   memtester 30G
> 
>   aptitude install linux-source
>   cp /usr/src/linux-source-3.2.tar.bz2 /root/
>   tar xvfj linux-source-3.2.tar.bz2
>   cd linux-source-3.2/
>   make defconfig
>   time make 1>LOG 2>ERR
>   make mrproper
>   make defconfig
>   time make -j16 1>LOG 2>ERR
> 
>   aptitude install stress
>   stress --cpu 8 --io 4 --vm 2 --timeout 10s --dry-run
>   stress --cpu 8 --io 4 --vm 2 --hdd 3 --timeout 60s
>   stress --cpu 8 --io 8 --vm 8 --hdd 4 --timeout 5m
> 
>   aptitude install stressapptest
>   stressapptest -m 8 -i 4 -C 4 -W -s 30
>   stressapptest -m 8 -i 4 -C 4 -W -f /root/sat-file-test --filesize 1gb -s 30
>   stressapptest -m 8 -i 4 -C 4 -W -f /root/sat-file-test --filesize 1024 --random-threads 4 -s 30
>   stressapptest -m 8 -i 4 -C 4 -W --cc_test -s 30
>   stressapptest -m 8 -i 4 -C 4 -W --local_numa -s 30
>   stressapptest -m 8 -i 4 -C 4 -W -n 127.0.0.1 --listen -s 30
>   stressapptest -m 12 -i 6 -C 8 -W -f /root/sat-file-test --filesize 1024 --random-threads 4 -n 127.0.0.1 --listen -s 300
> 
> 
> As mentioned earlier --- I just could not make it oops doing the
> above! (or get any errors in the standalone memtest86+ procedure).
> 
> What do you think?  Should I just keep on stress-testing it somewhat
> indefinitely?  Also, please recall that I have two of the identical
> machines, and I suffer the same problems with both of them (and they
> both pass the above artificial stress-testing).
> 
> Thoughts or suggestions, please, for me to explore further...
> 
> Thanks again!
[210799.624492] invalid opcode:  [#1] SMP 
[210799.624516] Modules linked in: dm_crypt dm_mod parport_pc ppdev lp parport 
bnep rfcomm bluetooth rfkill cpufreq_stats cpufreq_userspace 
cpufreq_conservative cpufreq_powersave nfsd auth_rpcgss oid_registry nfs_acl 
nfs lockd fscache sunrpc netconsole configfs loop raid1 md_mod 
snd_hda_codec_realtek snd_hda_codec_hdmi joydev hid_generic hid_kensington 
usbhid hid x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_intel 
crct10dif_pclmul snd_hda_codec crc32_pclmul crc32c_intel snd_hwdep snd_pcm 
ghash_clmulni_intel snd_page_alloc snd_seq iTCO_wdt snd_seq_device aesni_intel 
iTCO_vendor_support aes_x86_64 snd_timer evdev lrw gf128mul i915 glue_helper 
ablk_helper snd cryptd drm_kms_helper soundcore psmouse pcspkr drm lpc_ich 
mei_me mfd_core serio_raw mei i2c_i801 video button processor ext4 crc16 
mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common ahci libahci libata xhci_hcd 
scsi_mod ehci_pci ehci_hcd e1000e igb i2c_algo_bit i2c_core usbcore dca ptp 
usb_common pps_core fan thermal thermal_sys
[210799.624870] CPU: 2 PID: 22239 Comm: Timer Not tainted 3.12-0.bpo.1-amd64 #1 
Debian 3.12.9-1~bpo70+1
[210799.624891] Hardware name: Supermicro X10SLQ/X10SLQ, BIOS 1.00 05/09/2013
[210799.624908] task: 88081a485800 ti: 88081ba24000 task.ti: 
88081ba24000
[210799.624927] RIP: 0010:[]  [] 
futex_requeue+0x721/0x7e0
[210799.624957] RSP: 0018:88081ba25e00  EFLAGS: 00010297
[210799.624974] RAX: 0002 RBX:

[PATCH v2] serial_core: Fix pm imbalance on unbind

2014-03-27 Thread Geert Uytterhoeven

From: Geert Uytterhoeven 

When a serial port is closed, uart_close() takes care of shutting down the
hardware, and powering it down.

When a serial port is unbound while in use, uart_close() bypasses all of
this, as this is supposed to be done through uart_hangup() (invoked via
tty_vhangup() in uart_remove_one_port()).

However, uart_hangup() does not set the hardware's power state, leaving it
powered up.  This may also lead to unbounded nesting counts in clock and
power management, depending on their internal implementation.

Make sure to power down the port in uart_hangup(), except when the port is
used as a serial console.

For serial consoles, this operation must be postponed until after the port
becomes completely unused. This case is not fixed yet, as it depends on a
(future) fix for the tty->count vs. port->count imbalance on failed
uart_open().

After this, the module clock used by the sh-sci driver is disabled on
unbind while the serial port is in use.

Signed-off-by: Geert Uytterhoeven 
---
v2:
  - Drop serial console case, as this needs other fixes first

 drivers/tty/serial/serial_core.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2cf5649a6dc0..0cec51ce8902 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -1452,6 +1452,8 @@ static void uart_hangup(struct tty_struct *tty)
clear_bit(ASYNCB_NORMAL_ACTIVE, &port->flags);
spin_unlock_irqrestore(&port->lock, flags);
tty_port_tty_set(port, NULL);
+   if (!uart_console(state->uart_port))
+   uart_change_pm(state, UART_PM_STATE_OFF);
wake_up_interruptible(&port->open_wait);
wake_up_interruptible(&port->delta_msr_wait);
}
-- 
1.7.9.5


[PATCH] sh: pci: Remove duplicate SH4A_PCIEPHYCTLR

2014-03-27 Thread Geert Uytterhoeven

From: Geert Uytterhoeven 

Signed-off-by: Geert Uytterhoeven 
---
 arch/sh/drivers/pci/pcie-sh7786.h |3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/sh/drivers/pci/pcie-sh7786.h b/arch/sh/drivers/pci/pcie-sh7786.h
index 1ee054e47eae..4a6ff55f759b 100644
--- a/arch/sh/drivers/pci/pcie-sh7786.h
+++ b/arch/sh/drivers/pci/pcie-sh7786.h
@@ -145,9 +145,6 @@
 /* PCIERMSGIER */
 #define SH4A_PCIERMSGIER   (0x004040)  /* R/W - 0x  32 */
 
-/* PCIEPHYCTLR */
-#define SH4A_PCIEPHYCTLR   (0x01)  /* R/W - 0x  32 */
-
 /* PCIEPHYADRR */
 #define SH4A_PCIEPHYADRR   (0x010004)  /* R/W - 0x  32 */
 #define BITS_ACK   (24)    // Rev1.171
-- 
1.7.9.5


[PATCH] clk: reverse default clk provider initialization order in of_clk_init()

2014-03-27 Thread Sylwester Nawrocki

This restores the default clock registration order as parsed from the
devicetree, i.e. as before commit 1771b10d605d26ccee771a7fb4b08718
"clk: respect the clock dependencies in of_clk_init", for the case where
no explicit parent clock dependencies between clock providers are
specified in the device tree.

It prevents regressions (boot failure, division by 0 errors) on
imx and exynos platforms.

Signed-off-by: Sylwester Nawrocki 
---
 drivers/clk/clk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 07cba07..c859adf 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2596,7 +2596,7 @@ void __init of_clk_init(const struct of_device_id *matches)
 
parent->clk_init_cb = match->data;
parent->np = np;
-   list_add(&parent->node, &clk_provider_list);
+   list_add_tail(&parent->node, &clk_provider_list);
}
 
while (!list_empty(&clk_provider_list)) {
-- 
1.7.9.5
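
Why the one-word change alters the initialization order can be seen with a small user-space model of the kernel's list_head. This is a sketch under stated assumptions: the sketch_* names are hypothetical, and the three numbered nodes merely stand in for clock providers found in devicetree order.

```c
#include <assert.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct sketch_node {
	struct sketch_node *prev, *next;
	int id;
};

static void sketch_list_init(struct sketch_node *head)
{
	head->prev = head;
	head->next = head;
}

static void sketch_insert(struct sketch_node *n, struct sketch_node *prev,
			  struct sketch_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): new entry goes right after the head (stack order). */
static void sketch_list_add(struct sketch_node *n, struct sketch_node *head)
{
	sketch_insert(n, head, head->next);
}

/* Like list_add_tail(): new entry goes right before the head (queue order). */
static void sketch_list_add_tail(struct sketch_node *n, struct sketch_node *head)
{
	sketch_insert(n, head->prev, head);
}

/* Register ids 1..3 in that order, then record the order a walk visits them. */
static void sketch_walk_order(int use_tail, int out[3])
{
	struct sketch_node head, nodes[3], *pos;
	int i;

	sketch_list_init(&head);
	for (i = 0; i < 3; i++) {
		nodes[i].id = i + 1;
		if (use_tail)
			sketch_list_add_tail(&nodes[i], &head);
		else
			sketch_list_add(&nodes[i], &head);
	}
	i = 0;
	for (pos = head.next; pos != &head; pos = pos->next) {
		assert(i < 3);
		out[i++] = pos->id;
	}
}
```

Adding at the head makes a front-to-back walk visit the providers in reverse registration order (3, 2, 1), while adding at the tail preserves the order in which they were found (1, 2, 3).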


Re: [PATCH] clk: reverse default clk provider initialization order in of_clk_init()

2014-03-27 Thread Sylwester Nawrocki

On 27/03/14 11:43, Sylwester Nawrocki wrote:
> This restores the default clock registration order as parsed from the
> devicetree, i.e. as before commit 1771b10d605d26ccee771a7fb4b08718
> "clk: respect the clock dependencies in of_clk_init", for the case where
> no explicit parent clock dependencies between clock providers are
> specified in the device tree.
> 
> It prevents regressions (boot failure, division by 0 errors) on
> imx and exynos platforms.
> 
> Signed-off-by: Sylwester Nawrocki 

Oops, I've forgotten to add:

Tested-by: Fabio Estevam 

> ---
>  drivers/clk/clk.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 07cba07..c859adf 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -2596,7 +2596,7 @@ void __init of_clk_init(const struct of_device_id *matches)
>  
>   parent->clk_init_cb = match->data;
>   parent->np = np;
> - list_add(&parent->node, &clk_provider_list);
> + list_add_tail(&parent->node, &clk_provider_list);
>   }
>  
>   while (!list_empty(&clk_provider_list)) {

-- 
Regards,
Sylwester


Re: [PATCH] cpufreq: don't print value of .driver_data from core

2014-03-27 Thread Gautham R Shenoy


On Thu, Mar 27, 2014 at 03:37:22PM +0530, Viresh Kumar wrote:
> CPUFreq core doesn't control value of .driver_data and this field is
> completely driver specific. This can contain any value and not only
> indexes. For most of the drivers, which aren't using this field, its
> value is zero. So, printing this from core doesn't make any sense.
> Don't print it.

So after this patch, driver_data is only going to be used by drivers
which want an "unsigned int" value to be saved along with the
frequency in the frequency_table and for those who want to overload
its interpretation to indicate BOOST.

From the core's standpoint, it is useful only for determining whether
a frequency is a BOOST frequency or not.

So, wouldn't it be logical to allow drivers to maintain their own driver
data since the core is anyway not interested in it, and change this
.driver_data to "flags" or some such which can indicate boost?

--
Thanks and Regards
gautham.

> 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/freq_table.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
> index 8e54f97..f002272 100644
> --- a/drivers/cpufreq/freq_table.c
> +++ b/drivers/cpufreq/freq_table.c
> @@ -36,8 +36,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
>   && table[i].driver_data == CPUFREQ_BOOST_FREQ)
>   continue;
> 
> - pr_debug("table entry %u: %u kHz, %u driver_data\n",
> - i, freq, table[i].driver_data);
> + pr_debug("table entry %u: %u kHz\n", i, freq);
>   if (freq < min_freq)
>   min_freq = freq;
>   if (freq > max_freq)
> @@ -175,8 +174,8 @@ int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
>   } else
>   *index = optimal.driver_data;
> 
> - pr_debug("target is %u (%u kHz, %u)\n", *index, table[*index].frequency,
> - table[*index].driver_data);
> + pr_debug("target index is %u, freq is:%u kHz\n", *index,
> +  table[*index].frequency);
> 
>   return 0;
>  }
> -- 
> 1.7.12.rc2.18.g61b472e
> 


Re: [PATCHv2 1/1] mtd: gpmi: make blockmark swapping optional

2014-03-27 Thread Huang Shijie

On Wed, Mar 26, 2014 at 12:55:02PM +0100, Lothar Waßmann wrote:
> Hi,
> 
> Huang Shijie wrote:
> > On 2014-03-26 16:51, Lothar Waßmann wrote:
> > > I don't see why this should not be supported on i.MX28 (i.MX23 doesn't
> > > do byteswapping anyway, so this wouldn't change anything for i.MX23).
> > > The partitions used by Linux need not necessarily be accessible for the
> > > Boot ROM code (and vice versa).
> > But the first partition used to store the u-boot is accessible for the ROM.
> > 
> Whether this partition is available to Linux depends on the partition table
> passed in the DT.
Yes, I agree.

But it is strange if we do not export this partition to Linux.

> 
> > Please see "Figure 12-13" in the 12.12.1.12:
> >    "In order to preserve the BI (bad block information), flash updater
> > or gang programmer applications need to swap Bad Block Information (BI)
> > data to byte 0 of metadata area for every page before programming NAND
> > Flash. ROM when loading firmware, copies back the value at metadata[0]
> > to BI offset in page data. The following figure shows how the factory
> > bad block marker is preserved."
> > 
> The inspection of the BB markers is only a fallback for the case that
> there is no DBBT. From the same chapter that you quoted above:
> | ROM uses DBBT to skip any bad block that falls within firmware data
> | on NAND Flash device.
> | If the address of DBBT Search Area in FCB is 0, ROM will rely on
> | factory marked bad block markers to find out if a block is good or bad.
> 
> Thus, even the boot ROM of i.MX28 can well live without blockmark
> swapping.

Assume that there is a NAND block "A", and that "A" consists of 256 pages.
U-Boot is burned to "A" and occupies 6 pages:

  -----------------------------------------------------------------------------
 | page 0 |  page 1 | page 2 | page 3 | page 4 | page 5 | ... | ... | page 255 |
  -----------------------------------------------------------------------------

  \---------------------------------------------------------------------------/
                                        V
                                       "A"

The DBBT is used to track whether "A" is bad or not.
Assume we know that "A" is a good block; the ROM then needs to read out U-Boot.
The ROM reads the 6 pages one by one, and each time it reads a page it has to
do the swapping for that page.

In this case, the ROM will do the swapping six times.

Please read the section again, you will see the "every page" in it:

   "In order to preserve the BI (bad block information), flash updater
or gang programmer applications need to swap Bad Block Information (BI) data to
byte 0 of metadata area for every page before programming NAND Flash. ROM when
loading firmware, copies back the value at metadata[0] to BI offset in page
data."


thanks
Huang Shijie


Re: Re: [PATCHv4 0/5] devfreq: exynos: Fix minor issue and code clean to remove legacy method

2014-03-27 Thread 함명주


On Mon, Mar 24, 2014 at 10:36 AM, Chanwoo Choi  wrote:
> Hi Tomasz,
>
> On 03/22/2014 11:52 PM, Tomasz Figa wrote:
>> Hi,
>>
>> [fixing mistyped addresses of me and Bartlomiej]
>>
>> On 20.03.2014 03:59, Chanwoo Choi wrote:
>>> This patchset uses the SIMPLE_DEV_PM_OPS macro instead of the legacy method
>>> and fixes a probe failure if CONFIG_PM_OPP is disabled. It also fixes a
>>> minor issue.
>>>
>>
>> Reviewed-by: Tomasz Figa 
>>
>
> Thanks for your review always.
>
> Best Regards,
> Chanwoo Choi
>

Applied/Merged.





-- 
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

[PATCH net] vhost: validate vhost_get_vq_desc return value

2014-03-27 Thread Michael S. Tsirkin

vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin 
---

This is needed in -stable.

 drivers/vhost/net.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 026be58..e1e22e0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -505,9 +505,13 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
r = -ENOBUFS;
goto err;
}
-   d = vhost_get_vq_desc(vq->dev, vq, vq->iov + seg,
+   r = vhost_get_vq_desc(vq->dev, vq, vq->iov + seg,
  ARRAY_SIZE(vq->iov) - seg, &out,
  &in, log, log_num);
+   if (unlikely(r < 0))
+   goto err;
+
+   d = r;
if (d == vq->num) {
r = 0;
goto err;
-- 
MST
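
The failure mode fixed above, a negative error code stored straight into an unsigned index, can be reproduced in a few lines of user-space C. This is a sketch only: sketch_get_desc() is a hypothetical stand-in for vhost_get_vq_desc(), and SKETCH_EFAULT hard-codes the Linux value of EFAULT (14) so the example does not depend on the host's errno.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFAULT 14	/* value of EFAULT on Linux */

/* Hypothetical stand-in for vhost_get_vq_desc(): index or negative errno. */
static int sketch_get_desc(int fail)
{
	return fail ? -SKETCH_EFAULT : 3;
}

/* Unchecked: the negative error is stored straight into an unsigned index. */
static uint32_t sketch_unchecked(int fail)
{
	uint32_t d = (uint32_t)sketch_get_desc(fail);

	return d;
}

/* Checked: test the signed result first, convert only on success. */
static int sketch_checked(int fail, uint32_t *d)
{
	int r = sketch_get_desc(fail);

	assert(d != NULL);
	if (r < 0)
		return r;
	*d = (uint32_t)r;
	return 0;
}
```

In the unchecked variant -14 becomes 0xfffffff2 once it is held in a 32-bit unsigned variable, which is far beyond any real descriptor count. The checked variant follows the shape of the patch: keep the result signed, bail out on a negative value, and only then assign it to the unsigned index.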

Re: [PATCH] clk: register fixed-clock only if #clock-cells property is present

2014-03-27 Thread Sylwester Nawrocki

Mike, please ignore this patch for now. It turns out a less intrusive
change [1] is enough to fix the regressions on both imx and exynos.

I'm going to address the issue properly for next kernel release and
make exynos use regular fixed-clock driver, rather than registering
the external clock generators within the SoC main clock controller
driver.

-- 
Thanks,
Sylwester

[1] http://permalink.gmane.org/gmane.linux.kernel/1673639

Re: [PATCH] ASoC: fsl_sai: Add isr to deal with error flag

2014-03-27 Thread Mark Brown

On Thu, Mar 27, 2014 at 11:57:27AM +0800, Nicolin Chen wrote:
> On Thu, Mar 27, 2014 at 12:06:53PM +0800, Xiubo Li-B47053 wrote:

> > I have checked in the Vybrid and LS1 SoC datasheets, and they are all the
> > Same as above, and nothing else.

> > Have I missed ?

> What i.MX IC team told me is SAI ignores what we do to FWF and FRF, so you
> don't need to worry about it at all unless Vybrid makes them writable, in
> which case we may also need to clear these bits and confirm with Vybrid IC
> team if they're also W1C.

And even if it paid attention I'd expect that a lot of the time they'd
just be reasserted immediately as the condition still holds.



Re: [PATCH 1/2] regmap: mmio: add regmap_mmio_{regsize, count}_check.

2014-03-27 Thread Mark Brown

On Thu, Mar 27, 2014 at 12:42:42PM +0800, Xiubo Li wrote:
> Signed-off-by: Xiubo Li 

Applied both, thanks.



Re: [PATCH] cpufreq: don't print value of .driver_data from core

2014-03-27 Thread Viresh Kumar

On 27 March 2014 16:18, Gautham R Shenoy  wrote:
> So after this patch, driver_data is only going to be used by drivers
> which want an "unsigned int" value to be saved along with the
> frequency in the frequency_table and for those who want to overload
> its interpretation to indicate BOOST.
>
> From the core's stand point, it is useful only for determining whether
> a frequency is BOOST frequency or not.

Yes.

> So, wouldn't it be logical to allow drivers maintain their own driver
> data since the core is anyway not interested in it, and change this
> .driver_data to "flags" or some such which can indicate boost ?

We can add another field .flags in case Rafael doesn't accept the
other proposal I sent for fixing BOOST issue.

But the point behind keeping .driver_data field here was: many drivers
have some information attached to each frequency and they are closely
bound to each other. And so it made more sense to keep them together.
This is still used by many drivers and I wouldn't like them to maintain
separate arrays for keeping this information. They are so much bound
to the frequencies at the same index, that keeping them separately
wouldn't be a good idea.
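
The argument for keeping per-frequency data next to the frequency can be illustrated with a small table lookup. This is only a sketch: the struct, the end marker and the sketch_* names are hypothetical and are not the cpufreq core's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TABLE_END 0u	/* hypothetical end marker, not the kernel's */

/* One table entry: a frequency plus the driver's private data for it. */
struct sketch_freq_entry {
	unsigned int driver_data;	/* e.g. a register value for this step */
	unsigned int frequency;		/* kHz */
};

/*
 * Look up the private data that belongs to a frequency. Returns 0 and
 * stores the data on success, -1 if the frequency is not in the table.
 */
static int sketch_lookup(const struct sketch_freq_entry *table,
			 unsigned int freq, unsigned int *data)
{
	size_t i;

	assert(table != NULL && data != NULL);
	for (i = 0; table[i].frequency != SKETCH_TABLE_END; i++) {
		if (table[i].frequency == freq) {
			*data = table[i].driver_data;
			return 0;
		}
	}
	return -1;
}
```

Because both values live in the same entry, a driver that finds the frequency has the matching data at hand; with two separate arrays it would have to keep their indexes in sync by hand.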

Re: [PATCH] x86: hpet: Don't default CONFIG_HPET_TIMER to be y for X86_64

2014-03-27 Thread Clemens Ladisch

Feng Tang wrote:
> On many new phone/tablet platforms like Baytrail/Merrifield etc,
> the HPET is either defeatured or has problems that keep it from being
> used as a reliable timer. As these platforms are also X86_64, we
> should not make HPET_TIMER default y for all X86_64.

The help text still says:
| You can safely choose Y here.  [...]
| Choose N to continue using the legacy 8254 timer.

Are these statements still true for those platforms?


Regards,
Clemens

[PATCH] cpufreq: ia64: don't set .driver_data to index

2014-03-27 Thread Viresh Kumar

The .driver_data field only needs to be filled in when a driver wants to
keep data there that it later uses according to the value of the .frequency
field. This driver doesn't use the field at all; it merely sets it equal to
the index value, which isn't required. Remove the assignment.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/ia64-acpi-cpufreq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/cpufreq/ia64-acpi-cpufreq.c b/drivers/cpufreq/ia64-acpi-cpufreq.c
index 53c6ac6..f0d447d 100644
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -275,7 +275,6 @@ acpi_cpufreq_cpu_init (
/* table init */
for (i = 0; i <= data->acpi_data.state_count; i++)
{
-   data->freq_table[i].driver_data = i;
if (i < data->acpi_data.state_count) {
data->freq_table[i].frequency =
  data->acpi_data.states[i].core_frequency * 1000;
-- 
1.7.12.rc2.18.g61b472e


fail to add 64 VLANs or more when SR-IOV is enabled

2014-03-27 Thread Madoka Komatsubara

Hi all,


We're facing an issue where we cannot add 64 or more VLANs per VF.
When using SR-IOV with a VF passed through to a KVM guest, we could add
63 VLANs using vconfig, but adding the 64th VLAN failed.
We'd like to use many VLANs on the guest with SR-IOV.
We're using Intel's 82599EB chip with the ixgbe and ixgbevf drivers.

Has anyone seen the same issue?
Is there any way to solve this?

The following steps reproduce it.
Create a hundred VLANs on the guest.

# for i in `seq 100 199`; do vconfig add eth2 $i; ifconfig eth2.$i 192.168.$i.1/24; done

# vconfig add eth2 100
Added VLAN with VID == 100 to IF -:eth2:-

# vconfig add eth2 101
Added VLAN with VID == 101 to IF -:eth2:-

# vconfig add eth2 102
Added VLAN with VID == 102 to IF -:eth2:-

...

# vconfig add eth2 162
Added VLAN with VID == 162 to IF -:eth2:-

# vconfig add eth2 163
ERROR: trying to add VLAN #163 to IF -:eth2:-  error: Permission denied
SIOCSIFADDR: No such device
eth2.163: unknown interface: No such device

# vconfig add eth2 164
ERROR: trying to add VLAN #164 to IF -:eth2:-  error: Permission denied
SIOCSIFADDR: No such device
eth2.164: unknown interface: No such device

# vconfig add eth2 165
ERROR: trying to add VLAN #165 to IF -:eth2:-  error: Permission denied
SIOCSIFADDR: No such device
eth2.165: unknown interface: No such device


thanks,
Madoka Komatsubara

[PATCH 2/2] mm/percpu.c: don't bother to re-walk the pcpu_slot list if nobody free space since we last drop pcpu_lock.

2014-03-27 Thread Jianyu Zhan

Presently, after the first walk of the pcpu_slot list fails to find a
chunk to allocate from, we drop the pcpu_lock spinlock and go allocate
a new chunk. Then we re-take pcpu_lock and, hoping that somebody has
freed space in the meantime (we still hold pcpu_alloc_mutex during this
period, so only freeing or reclaiming could happen), we do a full
re-walk of the pcpu_slot list.

However, if nobody freed any space, this full re-walk is wasted effort,
and we eventually fall back to the new chunk anyway.

Since we hold pcpu_alloc_mutex, only the freeing or reclaiming paths
can touch pcpu_slot (they only need pcpu_lock). So we can maintain a
pcpu_slot_stat bitmap that records whether, while we did not hold
pcpu_lock, anybody freed space in any slot we are interested in. If so,
we only retry those slots; if not, we allocate directly from the
newly-allocated, fully-free chunk.
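
The bookkeeping described above can be sketched in user space as follows. The names, the fixed slot count and the single-word bitmap are assumptions for illustration; the patch itself sizes the bitmap from pcpu_nr_slots and does the marking under pcpu_lock.

```c
#include <assert.h>

#define SKETCH_NR_SLOTS 16	/* hypothetical; the patch uses pcpu_nr_slots */

static unsigned long sketch_slot_changed;	/* one bit per slot */

/* Free path: note that space became available in this slot. */
static void sketch_mark_slot(int slot)
{
	assert(slot >= 0 && slot < SKETCH_NR_SLOTS);
	sketch_slot_changed |= 1UL << slot;
}

/* Allocation path, before dropping the lock: forget old marks. */
static void sketch_clear_marks(void)
{
	sketch_slot_changed = 0;
}

/*
 * Allocation path, after re-taking the lock: return the first slot at or
 * above base_slot that was marked meanwhile, or -1 if none was, in which
 * case the re-walk can be skipped and the new chunk used directly.
 */
static int sketch_next_marked_slot(int base_slot)
{
	int slot;

	for (slot = base_slot; slot < SKETCH_NR_SLOTS; slot++)
		if (sketch_slot_changed & (1UL << slot))
			return slot;
	return -1;
}
```

Only slots at or above the base slot for the requested size matter, so a mark on a smaller slot does not trigger a retry.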

Signed-off-by: Jianyu Zhan 
---
 mm/percpu.c | 80 -
 1 file changed, 69 insertions(+), 11 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index cfda29c..4e81367 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -178,6 +178,13 @@ static DEFINE_MUTEX(pcpu_alloc_mutex); /* protects whole alloc and reclaim */
 static DEFINE_SPINLOCK(pcpu_lock); /* protects index data structures */
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
+/* A bitmap to record the stat of pcpu_slot, protected by pcpu_lock.
+ * If the corresponding bit == 0, that slot doesn't get changed during
+ * pcpu_lock dropped period; if bit == 1, otherwise.
+ *
+ * We have to defer its initialization until we know the exact value of
+ * pcpu_nr_slots. */
+static unsigned long *pcpu_slot_stat_bitmap;
 
 /* reclaim work to release fully free chunks, scheduled from free path */
 static void pcpu_reclaim(struct work_struct *work);
@@ -313,10 +320,13 @@ static void pcpu_mem_free(void *ptr, size_t size)
vfree(ptr);
 }
 
+#define PCPU_ALLOC 1
+#define PCPU_FREE  0
 /**
  * pcpu_chunk_relocate - put chunk in the appropriate chunk slot
  * @chunk: chunk of interest
  * @oslot: the previous slot it was on
+ * @reason: why we get here, from allocating or freeing path?
  *
  * This function is called after an allocation or free changed @chunk.
  * New slot according to the changed state is determined and @chunk is
@@ -326,15 +336,23 @@ static void pcpu_mem_free(void *ptr, size_t size)
  * CONTEXT:
  * pcpu_lock.
  */
-static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
+static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot,
+   int reason)
 {
int nslot = pcpu_chunk_slot(chunk);
 
-   if (chunk != pcpu_reserved_chunk && oslot != nslot) {
-   if (oslot < nslot)
+   if (chunk != pcpu_reserved_chunk) {
+   if (oslot < nslot) {
list_move(&chunk->list, &pcpu_slot[nslot]);
-   else
+   /* oslot < nslot means we get more space
+* in this chunk, so mark it */
+   __set_bit(nslot, pcpu_slot_stat_bitmap);
+   } else if (oslot > nslot)
list_move_tail(&chunk->list, &pcpu_slot[nslot]);
+   else if (reason == PCPU_FREE)
+   /* oslot == nslot, but we are freeing space
+* in this chunk, worth trying, mark it */
+   __set_bit(nslot, pcpu_slot_stat_bitmap);
}
 }
 
@@ -546,12 +564,12 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align)
chunk->free_size -= chunk->map[i];
chunk->map[i] = -chunk->map[i];
 
-   pcpu_chunk_relocate(chunk, oslot);
+   pcpu_chunk_relocate(chunk, oslot, PCPU_ALLOC);
return off;
}
 
chunk->contig_hint = max_contig;/* fully scanned */
-   pcpu_chunk_relocate(chunk, oslot);
+   pcpu_chunk_relocate(chunk, oslot, PCPU_ALLOC);
 
/* tell the upper layer that this chunk has no matching area */
return -1;
@@ -600,7 +618,7 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme)
}
 
chunk->contig_hint = max(chunk->map[i], chunk->contig_hint);
-   pcpu_chunk_relocate(chunk, oslot);
+   pcpu_chunk_relocate(chunk, oslot, PCPU_FREE);
 }
 
 static struct pcpu_chunk *pcpu_alloc_chunk(void)
@@ -714,6 +732,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved)
int slot, off, new_alloc;
unsigned long flags;
void __percpu *ptr;
+   bool retry = false;
+   int base_slot = pcpu_size_to_slot(size);
 
if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
WARN(true, "illegal size (%zu) or align (%zu) for "
@@ -752,7 +772,12 @
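
For illustration only (not part of the patch): a minimal userspace sketch of
the bookkeeping described in the commit message, assuming a single-word
bitmap. NR_SLOTS, mark_slot_changed() and next_changed_slot() are
hypothetical names; the patch itself sizes its bitmap from pcpu_nr_slots and
sets bits with __set_bit() under pcpu_lock.

```c
#include <assert.h>

#define NR_SLOTS 16	/* hypothetical; the patch derives this from pcpu_nr_slots */

/* Record that 'slot' gained free space while the allocator was not looking. */
void mark_slot_changed(unsigned long *bitmap, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	*bitmap |= 1UL << slot;
}

/*
 * Return the lowest marked slot at or above base_slot, or -1 if no
 * candidate slot changed, in which case the caller would simply use the
 * newly allocated chunk.
 */
int next_changed_slot(unsigned long bitmap, int base_slot)
{
	int slot;

	assert(base_slot >= 0 && base_slot <= NR_SLOTS);
	for (slot = base_slot; slot < NR_SLOTS; slot++)
		if (bitmap & (1UL << slot))
			return slot;
	return -1;
}
```

Slots below base_slot are skipped on purpose: chunks there are too small for
the request no matter how much was freed into them.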

[PATCH v2] ASoC: fsl_sai: Add isr to deal with error flag

2014-03-27 Thread Nicolin Chen

It's quite crucial to clear error flags because SAI might hang if it gets
a FIFO underrun during playback (I haven't confirmed the same issue on Rx
overflow though).

So this patch enables those IRQs and adds an ISR to clear the flags so as
to keep playback entirely safe.

Signed-off-by: Nicolin Chen 
---

Changelog
v2:
 * Mask the active flags only for the following handler.
 * Reset FIFO for FIFO underrun/overflow cases.
 * Enable two error flags only as default.
 * Use dev_warn for two error flags.
 * Only clear those W1C bits.

 sound/soc/fsl/fsl_sai.c | 85 +++--
 sound/soc/fsl/fsl_sai.h | 15 +
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index c4a4231..0bc98bb 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -23,6 +23,71 @@
 
 #include "fsl_sai.h"
 
+#define FSL_SAI_FLAGS (FSL_SAI_CSR_SEIE |\
+  FSL_SAI_CSR_FEIE)
+
+static irqreturn_t fsl_sai_isr(int irq, void *devid)
+{
+   struct fsl_sai *sai = (struct fsl_sai *)devid;
+   struct device *dev = &sai->pdev->dev;
+   u32 xcsr, mask;
+
+   /* Only handle the flags that we enabled */
+   mask = (FSL_SAI_FLAGS >> FSL_SAI_CSR_xIE_SHIFT) << FSL_SAI_CSR_xF_SHIFT;
+
+   /* Tx IRQ */
+   regmap_read(sai->regmap, FSL_SAI_TCSR, &xcsr);
+   xcsr &= mask;
+
+   if (xcsr & FSL_SAI_CSR_WSF)
+   dev_dbg(dev, "isr: Start of Tx word detected\n");
+
+   if (xcsr & FSL_SAI_CSR_SEF)
+   dev_warn(dev, "isr: Tx Frame sync error detected\n");
+
+   if (xcsr & FSL_SAI_CSR_FEF) {
+   dev_warn(dev, "isr: Transmit underrun detected\n");
+   /* FIFO reset for safety */
+   xcsr |= FSL_SAI_CSR_FR;
+   }
+
+   if (xcsr & FSL_SAI_CSR_FWF)
+   dev_dbg(dev, "isr: Enabled transmit FIFO is empty\n");
+
+   if (xcsr & FSL_SAI_CSR_FRF)
+   dev_dbg(dev, "isr: Transmit FIFO watermark has been reached\n");
+
+   regmap_update_bits(sai->regmap, FSL_SAI_TCSR,
+  FSL_SAI_CSR_xF_W_MASK | FSL_SAI_CSR_FR, xcsr);
+
+   /* Rx IRQ */
+   regmap_read(sai->regmap, FSL_SAI_RCSR, &xcsr);
+   xcsr &= mask;
+
+   if (xcsr & FSL_SAI_CSR_WSF)
+   dev_dbg(dev, "isr: Start of Rx word detected\n");
+
+   if (xcsr & FSL_SAI_CSR_SEF)
+   dev_warn(dev, "isr: Rx Frame sync error detected\n");
+
+   if (xcsr & FSL_SAI_CSR_FEF) {
+   dev_warn(dev, "isr: Receive overflow detected\n");
+   /* FIFO reset for safety */
+   xcsr |= FSL_SAI_CSR_FR;
+   }
+
+   if (xcsr & FSL_SAI_CSR_FWF)
+   dev_dbg(dev, "isr: Enabled receive FIFO is full\n");
+
+   if (xcsr & FSL_SAI_CSR_FRF)
+   dev_dbg(dev, "isr: Receive FIFO watermark has been reached\n");
+
+   regmap_update_bits(sai->regmap, FSL_SAI_RCSR,
+  FSL_SAI_CSR_xF_W_MASK | FSL_SAI_CSR_FR, xcsr);
+
+   return IRQ_HANDLED;
+}
+
 static int fsl_sai_set_dai_sysclk_tr(struct snd_soc_dai *cpu_dai,
int clk_id, unsigned int freq, int fsl_dir)
 {
@@ -373,8 +438,8 @@ static int fsl_sai_dai_probe(struct snd_soc_dai *cpu_dai)
 {
struct fsl_sai *sai = dev_get_drvdata(cpu_dai->dev);
 
-   regmap_update_bits(sai->regmap, FSL_SAI_TCSR, 0x, 0x0);
-   regmap_update_bits(sai->regmap, FSL_SAI_RCSR, 0x, 0x0);
+   regmap_update_bits(sai->regmap, FSL_SAI_TCSR, 0x, FSL_SAI_FLAGS);
+   regmap_update_bits(sai->regmap, FSL_SAI_RCSR, 0x, FSL_SAI_FLAGS);
regmap_update_bits(sai->regmap, FSL_SAI_TCR1, FSL_SAI_CR1_RFW_MASK,
   FSL_SAI_MAXBURST_TX * 2);
regmap_update_bits(sai->regmap, FSL_SAI_RCR1, FSL_SAI_CR1_RFW_MASK,
@@ -490,12 +555,14 @@ static int fsl_sai_probe(struct platform_device *pdev)
struct fsl_sai *sai;
struct resource *res;
void __iomem *base;
-   int ret;
+   int irq, ret;
 
sai = devm_kzalloc(&pdev->dev, sizeof(*sai), GFP_KERNEL);
if (!sai)
return -ENOMEM;
 
+   sai->pdev = pdev;
+
sai->big_endian_regs = of_property_read_bool(np, "big-endian-regs");
if (sai->big_endian_regs)
fsl_sai_regmap_config.val_format_endian = REGMAP_ENDIAN_BIG;
@@ -514,6 +581,18 @@ static int fsl_sai_probe(struct platform_device *pdev)
return PTR_ERR(sai->regmap);
}
 
+   irq = platform_get_irq(pdev, 0);
+   if (irq < 0) {
+   dev_err(&pdev->dev, "no irq for node %s\n", np->full_name);
+   return irq;
+   }
+
+   ret = devm_request_irq(&pdev->dev, irq, fsl_sai_isr, 0, np->name, sai);
+   if (ret) {
+   dev_err(&pdev->dev, "failed to claim irq %u\n", irq);
+   return ret;
+   }
+
sai->dma_params_rx.addr = res->start + FSL_SAI_RDR;
sa
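
For illustration only (not part of the patch): a userspace sketch of how the
ISR above derives its status mask from the enabled interrupt bits. The bit
positions below are assumptions made for the example (each enable bit sits 8
positions below its status flag); the real values live in fsl_sai.h and are
not shown in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout: enable bits 8..12, status flags 16..20. */
#define IE_SHIFT	8
#define F_SHIFT		16
#define IE_MASK		(0x1fu << IE_SHIFT)

#define CSR_FEIE	(1u << 10)	/* FIFO error interrupt enable */
#define CSR_SEIE	(1u << 11)	/* sync error interrupt enable */
#define CSR_FEF		(1u << 18)	/* FIFO error flag */
#define CSR_SEF		(1u << 19)	/* sync error flag */
#define CSR_WSF		(1u << 20)	/* word start flag */

/* Move the enabled interrupt bits up to the positions of their flags. */
uint32_t enabled_flag_mask(uint32_t enabled_ie)
{
	assert((enabled_ie & ~IE_MASK) == 0);
	return (enabled_ie >> IE_SHIFT) << F_SHIFT;
}

/* Keep only the status flags whose interrupt was actually enabled. */
uint32_t flags_to_handle(uint32_t csr, uint32_t enabled_ie)
{
	return csr & enabled_flag_mask(enabled_ie);
}
```

With only the two error interrupts enabled, a pending word-start flag is
ignored by the handler, which matches the "Mask the active flags only"
changelog entry.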

[PATCH 1/2] mm/percpu.c: renew the max_contig if we merge the head and previous block.

2014-03-27 Thread Jianyu Zhan

During pcpu_alloc_area(), we might merge the current head with the
previous block. Since max_contig was calculated using the size of the
previous block before we skipped it, and we now update the size of the
previous block, we should renew max_contig.

Signed-off-by: Jianyu Zhan 
---
 mm/percpu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 036cfe0..cfda29c 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -506,9 +506,11 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align)
 * uncommon for percpu allocations.
 */
if (head && (head < sizeof(int) || chunk->map[i - 1] > 0)) {
-   if (chunk->map[i - 1] > 0)
+   if (chunk->map[i - 1] > 0) {
chunk->map[i - 1] += head;
-   else {
+   max_contig =
+   max(chunk->map[i - 1], max_contig);
+   } else {
chunk->map[i - 1] -= head;
chunk->free_size -= head;
}
-- 
1.8.5.3
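
For illustration only (not part of the patch): a simplified userspace model
of the merge described above. It assumes a map in which a positive entry is
a free block and a negative entry is an allocated one; merge_head_into_prev()
is a hypothetical helper, and the free_size accounting of the real code is
left out.

```c
#include <assert.h>

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Give the 'head' bytes in front of free block i to the previous block and
 * return the updated running maximum of contiguous free space.
 */
int merge_head_into_prev(int *map, int i, int head, int max_contig)
{
	assert(i > 0 && head > 0 && map[i] >= head);

	if (map[i - 1] > 0) {
		/* The previous block is free and grows, so the maximum
		 * computed when it was skipped may now be stale. */
		map[i - 1] += head;
		max_contig = max_int(map[i - 1], max_contig);
	} else {
		/* The previous block is allocated (stored negated) and
		 * absorbs the head; the free maximum is unaffected. */
		map[i - 1] -= head;
	}
	map[i] -= head;
	return max_contig;
}
```

Without the refresh in the first branch, a caller that later stores
max_contig as the chunk's hint would under-report the largest free area.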


[media] videobuf-dma-contig: fix vm_iomap_memory() call

2014-03-27 Thread Ma Haijun

Hi all,

This is a trivial fix, but I think the patch itself has a problem too.
The function requires a phys_addr_t, but we feed it a dma_addr_t
(mem->dma_handle). AFAIK, this implicit conversion does not always work.
Can I use virt_to_phys(mem->vaddr) to get the physical address instead?
(mem->vaddr and mem->dma_handle are from dma_alloc_coherent)

Regards

Ma Haijun

Ma Haijun (1):
  [media] videobuf-dma-contig: fix incorrect argument to
vm_iomap_memory() call

 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
1.8.3.2


[PATCH] clk: Fix minor errors in of_clk_init() function comments

2014-03-27 Thread Sylwester Nawrocki

Signed-off-by: Sylwester Nawrocki 
---
 drivers/clk/clk.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index c859adf..29dc1e7 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2575,7 +2575,7 @@ static int parent_ready(struct device_node *np)
  * @matches: array of compatible values and init functions for providers.
  *
  * This function scans the device tree for matching clock providers
- * and calls their initialization functions. It also do it by trying
+ * and calls their initialization functions. It also does it by trying
  * to follow the dependencies.
  */
 void __init of_clk_init(const struct of_device_id *matches)
@@ -2612,7 +2612,7 @@ void __init of_clk_init(const struct of_device_id *matches)
}
 
/*
-* We didn't managed to initialize any of the
+* We didn't manage to initialize any of the
 * remaining providers during the last loop, so now we
 * initialize all the remaining ones unconditionally
 * in case the clock parent was not mandatory
-- 
1.7.9.5


[PATCH] [media] videobuf-dma-contig: fix incorrect argument to vm_iomap_memory() call

2014-03-27 Thread Ma Haijun

The second argument should be a physical address rather than a virtual address.

Signed-off-by: Ma Haijun 
---
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 7e6b209..bf80f0f 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -305,7 +305,7 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
/* Try to remap memory */
size = vma->vm_end - vma->vm_start;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-   retval = vm_iomap_memory(vma, vma->vm_start, size);
+   retval = vm_iomap_memory(vma, mem->dma_handle, size);
if (retval) {
dev_err(q->dev, "mmap: remap failed with error %d. ",
retval);
-- 
1.8.3.2


Re: [PATCH] arm64: Fix duplicated Kconfig entries again

2014-03-27 Thread Sudeep Holla

Hi Hanjun,

On 25/03/14 11:09, Hanjun Guo wrote:
> Hi Sudeep,
> 
> On 2014-3-25 18:00, Sudeep Holla wrote:
>> Hi Hanjun,
>>
>> On 25/03/14 09:00, Hanjun Guo wrote:
>>> After commit 74397174989e5 (arm64: Fix duplicated Kconfig entries),
>>> I still get a duplicate Power management options section in linux-next
>>> git repo, may be due to some merge conflicts, anyway, fix that in this
>>> patch.
>>>
>> I reported this and Mark Brown posted the patch[1].
>> I assumed it is already pulled, but looks like that's not the case.
> [...]
>> [1] http://www.spinics.net/lists/arm-kernel/msg314472.html

[...]

> here is the link:
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/arch/arm64/Kconfig
> 

Just had a look at the linux-next tree and looks like the original commit:
"cpufreq: enable ARM drivers on arm64" is pulled by both Catalin and Rafael
which has resulted in the fixup patch[1] not removing the duplicate entry
cleanly.

If it is not too late, it would be better to ask either Rafael or Catalin to
drop both patches from their tree, instead of creating 4 patches in total to
enable cpufreq :)

Regards,
Sudeep


Re: [PATCH] clk: register fixed-clock only if #clock-cells property is present

2014-03-27 Thread Boris BREZILLON


Hi Sylwester,

On 27/03/2014 11:01, Sylwester Nawrocki wrote:

Hi Boris,

On 27/03/14 08:58, Boris BREZILLON wrote:

This solution solves the problem for this specific case because clks are
declared in the correct order in imx DTs.
But, even with your patch I think we could see similar issues by
reordering DT nodes...

The real problem here is that the imx platform does not declare the CCM
clocks' dependencies upon the ckil, ckih1 and osc fixed clocks within the
DT [1], and retrieves these clocks when initializing the CCM clocks ([2]
and [3]).

We should try to add these dependencies in the DT and see if it works.

While presumably all of us agree the dependencies should be correctly
specified in dts I think we should minimize possible regressions by
keeping the clocks registration order as before, i.e. as parsed by the
kernel from DT. Rather than explicitly reversing it, which does not gain
us anything AFAICS. Instead we are seeing regressions where new kernels
stop working with old dtbs.


I totally agree with you on this point: my patch is not a replacement for
yours.

I just wanted to point out that we need to fix DT definitions to avoid this
kind of issue in the future.



I'm going to resend the patch replacing list_add() with list_add_tail(),
with this the mvebu platform would work and there should be no regression
on imx and exynos.

Please note that specifying dependencies between CCM on imx and the fixed
clocks might not be enough. If the fixed clocks get matched on "fixed-clock"
compatible some clock specifiers (i.e. those using phandle to the CCM) could
get invalid, since the clocks won't get registered by the ccm driver, but by
the regular fixed clock driver. That means a phandle to different node would
need to be used to reference the fixed clock. I'm not sure if this is the case
for imx, but changes may be needed all over various dts files.
In addition, we should make sure the kernel works with current and modified
dtbs.


[1] http://lxr.free-electrons.com/source/arch/arm/boot/dts/imx6sl.dtsi#L379
[2] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk-imx6q.c#L151
[3] http://lxr.free-electrons.com/source/arch/arm/mach-imx/clk.c#L30
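
For illustration only: a minimal userspace sketch of the ordering point made
above, namely that inserting at the head of a list reverses the order in
which providers were parsed, while inserting at the tail preserves it. The
list helpers are small re-implementations written for this example with the
same semantics as the kernel's list_add() and list_add_tail(); struct
provider and walk_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void insert_between(struct list_head *entry,
			   struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after the head: the newest entry is walked first. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	insert_between(entry, head, head->next);
}

/* Insert right before the head: the newest entry is walked last. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	insert_between(entry, head->prev, head);
}

#define MAX_PROVIDERS 8

struct provider {
	int dt_index;		/* position of the node in the device tree */
	struct list_head node;
};

/* Queue n providers in DT order, then record the order a walk visits them. */
void walk_order(int n, int use_tail, int *out)
{
	struct provider providers[MAX_PROVIDERS];
	struct list_head head, *pos;
	int i;

	assert(n >= 0 && n <= MAX_PROVIDERS);
	list_init(&head);
	for (i = 0; i < n; i++) {
		providers[i].dt_index = i;
		if (use_tail)
			list_add_tail(&providers[i].node, &head);
		else
			list_add(&providers[i].node, &head);
	}

	i = 0;
	for (pos = head.next; pos != &head; pos = pos->next) {
		struct provider *cur = (struct provider *)
			((char *)pos - offsetof(struct provider, node));
		out[i++] = cur->dt_index;
	}
}
```

Calling walk_order() with use_tail set visits the providers in parse order,
which is the behaviour old dtbs appear to rely on.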



Re: [PATCH] ARM: imx6/dt: add ccm dependency upon ckil, ckih1 and osc clocks

2014-03-27 Thread Fabio Estevam

On Thu, Mar 27, 2014 at 5:11 AM, Boris BREZILLON
 wrote:
> Signed-off-by: Boris BREZILLON 

Please provide a commit message, thanks.

Re: [PATCH] cpufreq: don't print value of .driver_data from core

2014-03-27 Thread Gautham R Shenoy

On Thu, Mar 27, 2014 at 04:29:37PM +0530, Viresh Kumar wrote:
> On 27 March 2014 16:18, Gautham R Shenoy  wrote:
> > So after this patch, driver_data is only going to be used by drivers
> > which want an "unsigned int" value to be saved along with the
> > frequency in the frequency_table and for those who want to overload
> > its interpretation to indicate BOOST.
> >
> > From the core's stand point, it is useful only for determining whether
> > a frequency is BOOST frequency or not.
> 
> Yes.
> 
> > So, wouldn't it be logical to allow drivers maintain their own driver
> > data since the core is anyway not interested in it, and change this
> > .driver_data to "flags" or some such which can indicate boost ?
> 
> We can add another field .flags in case Rafael doesn't accept the
> other proposal I sent for fixing BOOST issue.

Even with that patch, the .driver_data won't be opaque. And that's not
good. Because, while some driver might not be explicitly setting the
value of .driver_data to 0xABABABAB, it might want to store the value
obtained at runtime into this field. And it could so happen
that at runtime this value is 0xABABABAB.

> 
> But the point behind keeping .driver_data field here was: many drivers
> have some information attached to each frequency and they are closely
> bound to each other. And so it made more sense to keep them together.
> This is still used by many drivers and I wouldn't like them to maintain
> separate arrays for keeping this information. They are so much bound
> to the frequencies at the same index, that keeping them separately
> wouldn't be a good idea.

I understand this part. However there might be more data than an
"unsigned int" that the drivers would like to be bound at the same
index. Voltage information, for instance.

> 
--
Thanks and Regards
gautham.
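
For illustration only: a small userspace sketch of the collision concern
raised above. It assumes that 0xABABABAB is the magic value marking a BOOST
entry, and it contrasts that with a separate flags field. Both structs and
ENTRY_FLAG_BOOST are hypothetical and are not the cpufreq definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BOOST_MAGIC		0xABABABABu	/* assumed sentinel value */
#define ENTRY_FLAG_BOOST	(1u << 0)	/* hypothetical flag bit */

/* Variant 1: BOOST is encoded as a magic value inside driver_data. */
struct entry_with_sentinel {
	unsigned int driver_data;
	unsigned int frequency;
};

/* Variant 2: BOOST has its own bit, so driver_data stays opaque. */
struct entry_with_flags {
	unsigned int flags;
	unsigned int driver_data;
	unsigned int frequency;
};

bool sentinel_says_boost(const struct entry_with_sentinel *entry)
{
	assert(entry != NULL);
	return entry->driver_data == BOOST_MAGIC;
}

bool flags_say_boost(const struct entry_with_flags *entry)
{
	assert(entry != NULL);
	return (entry->flags & ENTRY_FLAG_BOOST) != 0;
}
```

A driver that stores a runtime value which happens to equal the magic is
misread as BOOST in the first variant, while the second variant is
unaffected by whatever the driver stores.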


[RFC PATCH 0/5] KVM: speed up invalid guest state emulation

2014-03-27 Thread Paolo Bonzini

This series identifies some low-hanging fruit for speeding up emulation
of invalid guest state.  I am a bit worried about patch 2, while
everything else should be relatively safe.

On the kvm-unit-tests microbenchmarks I get a 1.8-2.5x speedup (from 
740-1100 cycles/instruction to 280-600).

This is not for 3.15, of course.  Even patch 5 is not too useful without
the others; saving 40 clock cycles is a 10% speedup after the previous
patches, but only 4% before.

Please review carefully!

Paolo

Paolo Bonzini (5):
  KVM: vmx: speed up emulation of invalid guest state
  KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  KVM: x86: move around some checks
  KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
  KVM: x86: speed up emulated moves

 arch/x86/include/asm/kvm_emulate.h |   2 +-
 arch/x86/kvm/emulate.c | 182 -
 arch/x86/kvm/vmx.c |   5 +-
 arch/x86/kvm/x86.c |  28 --
 4 files changed, 120 insertions(+), 97 deletions(-)

-- 
1.8.3.1


[RFC PATCH 5/5] KVM: x86: speed up emulated moves

2014-03-27 Thread Paolo Bonzini

We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (the compiler optimizes it out) gains about 50
cycles on kvm-unit-tests for register-to-register moves, and makes them
about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/include/asm/kvm_emulate.h | 2 +-
 arch/x86/kvm/emulate.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 24ec1216596e..f7b1e45eb753 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct operand {
union {
unsigned long val;
u64 val64;
-   char valptr[sizeof(unsigned long) + 2];
+   char valptr[sizeof(sse128_t)];
sse128_t vec_val;
u64 mm_val;
void *data;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 94974055d906..4a3584d419e5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2955,7 +2955,7 @@ static int em_rdpmc(struct x86_emulate_ctxt *ctxt)
 
 static int em_mov(struct x86_emulate_ctxt *ctxt)
 {
-   memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
+   memcpy(ctxt->dst.valptr, ctxt->src.valptr, sizeof(ctxt->src.valptr));
return X86EMUL_CONTINUE;
 }
 
-- 
1.8.3.1
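
For illustration only (not part of the patch): a userspace sketch of why the
blind copy is safe. It assumes a 16-byte value buffer and a writeback step
that stores only the low bytes; struct operand_model and the three helpers
are hypothetical stand-ins for the emulator's operand handling.

```c
#include <assert.h>
#include <string.h>

#define VAL_BYTES 16	/* widest value carried by an operand (one SSE register) */

struct operand_model {
	unsigned char valptr[VAL_BYTES];
};

/* Run-time length: the copy size is not known to the compiler. */
void mov_exact(struct operand_model *dst, const struct operand_model *src,
	       unsigned int op_bytes)
{
	assert(op_bytes <= VAL_BYTES);
	memcpy(dst->valptr, src->valptr, op_bytes);
}

/* Compile-time length: a fixed 16-byte copy that can be expanded inline. */
void mov_blind(struct operand_model *dst, const struct operand_model *src)
{
	memcpy(dst->valptr, src->valptr, sizeof(src->valptr));
}

/* Stand-in for the writeback step: only the low 'bytes' reach the register. */
void write_low_bytes(unsigned char *reg, const struct operand_model *dst,
		     unsigned int bytes)
{
	assert(bytes <= VAL_BYTES);
	memcpy(reg, dst->valptr, bytes);
}
```

Because the writeback truncates to the operand size, both copy variants
leave the same bytes in the destination register; the extra bytes moved by
mov_blind() never become visible.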


[RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation

2014-03-27 Thread Paolo Bonzini

Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.

However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; b) if the interrupt flag has just been set (because other
instructions than STI can set it without enabling an interrupt shadow).

This cuts another 250-300 clock cycles from the cost of emulating an
instruction (530-870 cycles before the patch on kvm-unit-tests,
290-600 afterwards).

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/x86.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd31aada351b..ce9523345f2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 
 struct kvm_x86_ops *kvm_x86_ops;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 * means that the last instruction is an sti. We should not
 * leave the flag on in this case. The same goes for mov ss
 */
-   if (!(int_shadow & mask))
+   if (unlikely(int_shadow) && !(int_shadow & mask)) {
kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   }
 }
 
 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
@@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
return dr6;
 }
 
-static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
+static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
 {
struct kvm_run *kvm_run = vcpu->run;
 
/*
-* Use the "raw" value to see if TF was passed to the processor.
-* Note that the new value of the flags has not been saved yet.
+* rflags is the old, "raw" value of the flags.  The new value has
+* not been saved yet.
 *
 * This is correct even for TF set by the guest, because "the
 * processor will not generate this exception after the instruction
 * that sets the TF flag".
 */
-   unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
-
if (unlikely(rflags & X86_EFLAGS_TF)) {
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
@@ -5263,13 +5264,15 @@ restart:
r = EMULATE_DONE;
 
if (writeback) {
+   unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
-   kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, ctxt->eip);
if (r == EMULATE_DONE)
-   kvm_vcpu_check_singlestep(vcpu, &r);
-   kvm_set_rflags(vcpu, ctxt->eflags);
+   kvm_vcpu_check_singlestep(vcpu, rflags, &r);
+   __kvm_set_rflags(vcpu, ctxt->eflags);
+   if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
} else
vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
 
@@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_rflags);
 
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
rflags |= X86_EFLAGS_TF;
kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+   __kvm_set_rflags(vcpu, rflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
-- 
1.8.3.1
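
For illustration only (not part of the patch): a userspace sketch of the
edge test used in the writeback path above, which requests an event only
when the interrupt flag goes from clear to set. X86_EFLAGS_IF is bit 9 of
EFLAGS; if_became_set() and check_if_transitions() are hypothetical names
for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200UL	/* interrupt enable flag, bit 9 of EFLAGS */

/* True only when IF was clear before the instruction and is set after it. */
bool if_became_set(unsigned long old_flags, unsigned long new_flags)
{
	return ((new_flags & ~old_flags) & X86_EFLAGS_IF) != 0;
}

/* Self-check over the four possible IF transitions. */
void check_if_transitions(void)
{
	assert(!if_became_set(0, 0));
	assert(if_became_set(0, X86_EFLAGS_IF));
	assert(!if_became_set(X86_EFLAGS_IF, X86_EFLAGS_IF));
	assert(!if_became_set(X86_EFLAGS_IF, 0));
}
```

Masking the new value with the complement of the old one keeps only bits
that were newly set, so an instruction that leaves IF unchanged or clears it
does not trigger a request.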



[RFC PATCH 3/5] KVM: x86: move around some checks

2014-03-27 Thread Paolo Bonzini

The only purpose of this patch is to make the next patch simpler
to review.  No semantic change.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/emulate.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c9f8f61df46c..9e40fbf94dcd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4347,12 +4347,15 @@ done_prefixes:
ctxt->d |= opcode.flags;
}
 
+   /* Unrecognised? */
+   if (ctxt->d == 0)
+   return EMULATION_FAILED;
+
ctxt->execute = opcode.u.execute;
ctxt->check_perm = opcode.check_perm;
ctxt->intercept = opcode.intercept;
 
-   /* Unrecognised? */
-   if (ctxt->d == 0 || (ctxt->d & NotImpl))
+   if (ctxt->d & NotImpl)
return EMULATION_FAILED;
 
if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
@@ -4494,19 +4497,19 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
ctxt->mem_read.pos = 0;
 
-   if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-   (ctxt->d & Undefined)) {
+   /* LOCK prefix is allowed only with some instructions */
+   if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
rc = emulate_ud(ctxt);
goto done;
}
 
-   /* LOCK prefix is allowed only with some instructions */
-   if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
+   if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
rc = emulate_ud(ctxt);
goto done;
}
 
-   if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
+   if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+   (ctxt->d & Undefined)) {
rc = emulate_ud(ctxt);
goto done;
}
-- 
1.8.3.1



[RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"

2014-03-27 Thread Paolo Bonzini

There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
To do this effectively, add two more flag bits that say whether the
.intercept and .check_perm fields are valid.

This shaves about 10 cycles off each emulated instruction, which is
a 2 to 6% improvement.

Signed-off-by: Paolo Bonzini 
---
To review, use -b.

 arch/x86/kvm/emulate.c | 175 ++---
 1 file changed, 94 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9e40fbf94dcd..94974055d906 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -161,6 +161,8 @@
 #define Fastop  ((u64)1 << 44)  /* Use opcode::u.fastop */
 #define NoWrite ((u64)1 << 45)  /* No writeback */
 #define SrcWrite((u64)1 << 46)  /* Write back src operand */
+#define Intercept   ((u64)1 << 47)  /* Has intercept field */
+#define CheckPerm   ((u64)1 << 48)  /* Has check_perm field */
 
 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -3514,9 +3516,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 }
 
 #define D(_y) { .flags = (_y) }
-#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
-#define DIP(_y, _i, _p) { .flags = (_y), .intercept = x86_intercept_##_i, \
- .check_perm = (_p) }
+#define DI(_y, _i) { .flags = (_y)|Intercept, .intercept = x86_intercept_##_i }
+#define DIP(_y, _i, _p) { .flags = (_y)|Intercept|CheckPerm, \
+ .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define ND(NotImpl)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
@@ -3525,10 +3527,10 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 #define II(_f, _e, _i) \
-   { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
+   { .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
 #define IIP(_f, _e, _i, _p) \
-   { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i, \
- .check_perm = (_p) }
+   { .flags = (_f)|Intercept|CheckPerm, .u.execute = (_e), \
+ .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define GP(_f, _g) { .flags = ((_f) | Prefix), .u.gprefix = (_g) }
 
 #define D2bv(_f)  D((_f) | ByteOp), D(_f)
@@ -4352,29 +4354,37 @@ done_prefixes:
return EMULATION_FAILED;
 
ctxt->execute = opcode.u.execute;
-   ctxt->check_perm = opcode.check_perm;
-   ctxt->intercept = opcode.intercept;
 
-   if (ctxt->d & NotImpl)
-   return EMULATION_FAILED;
+   if (unlikely(ctxt->d &
+            (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
+   /*
+* These are copied unconditionally here, and checked unconditionally
+* in x86_emulate_insn.
+*/
+   ctxt->check_perm = opcode.check_perm;
+   ctxt->intercept = opcode.intercept;
 
-   if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
-   return EMULATION_FAILED;
+   if (ctxt->d & NotImpl)
+   return EMULATION_FAILED;
 
-   if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
-   ctxt->op_bytes = 8;
+   if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+   return EMULATION_FAILED;
 
-   if (ctxt->d & Op3264) {
-   if (mode == X86EMUL_MODE_PROT64)
+   if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
ctxt->op_bytes = 8;
-   else
-   ctxt->op_bytes = 4;
-   }
 
-   if (ctxt->d & Sse)
-   ctxt->op_bytes = 16;
-   else if (ctxt->d & Mmx)
-   ctxt->op_bytes = 8;
+   if (ctxt->d & Op3264) {
+   if (mode == X86EMUL_MODE_PROT64)
+   ctxt->op_bytes = 8;
+   else
+   ctxt->op_bytes = 4;
+   }
+
+   if (ctxt->d & Sse)
+   ctxt->op_bytes = 16;
+   else if (ctxt->d & Mmx)
+   ctxt->op_bytes = 8;
+   }
 
/* ModRM and SIB bytes. */
if (ctxt->d & ModRM) {
@@ -4508,75 +4518,78 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}
 
-   if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-   (ctxt->d & Undefined)) {
-   rc = emulate_ud(ctxt);
-   goto done;
-   }
-
-   if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
-

[RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state

2014-03-27 Thread Paolo Bonzini

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.

Emulation performance test results indicate 530-870 cycles
for common instructions, versus 740-1110 before this patch.
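The idea described above, caching the result of an expensive validity check and refreshing it only where the underlying state changes, can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the KVM code: struct toy_vcpu, toy_emulation_required(), toy_set_segment() and toy_emulate_loop() are hypothetical names, and the guest "repairing" its state after three instructions is made up for the demo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the vCPU state. */
struct toy_vcpu {
    bool segment_ok;               /* true if the segment state is valid */
    bool emulation_required;       /* cached result, refreshed by setters */
    unsigned long validity_checks; /* how often the expensive check ran */
};

/* Stand-in for the expensive check that the loop used to repeat. */
static bool toy_emulation_required(struct toy_vcpu *v)
{
    v->validity_checks++;
    return !v->segment_ok;
}

/* Setter: the only place where the cached flag is recomputed. */
static void toy_set_segment(struct toy_vcpu *v, bool ok)
{
    v->segment_ok = ok;
    /* Plain assignment (not |=), so the flag can also go back to false. */
    v->emulation_required = toy_emulation_required(v);
}

/* Emulation loop: tests the cached flag instead of re-validating. */
static int toy_emulate_loop(struct toy_vcpu *v, int count)
{
    int emulated = 0;

    assert(v != NULL);
    while (v->emulation_required && count-- != 0) {
        emulated++;
        if (emulated == 3)
            toy_set_segment(v, true); /* assumed: guest fixes its state */
    }
    return emulated;
}
```

In this sketch the expensive check runs once per state change rather than once per emulated instruction, which is the saving the message above is after.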

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/vmx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1320e0f8e611..73aa522db47b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 
 out:
-   vmx->emulation_required |= emulation_required(vcpu);
+   vmx->emulation_required = emulation_required(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
 
-   while (!guest_state_valid(vcpu) && count-- != 0) {
+   while (vmx->emulation_required && count-- != 0) {
if (intr_window_requested && vmx_interrupt_allowed(vcpu))
return handle_interrupt_window(&vmx->vcpu);
 
@@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
schedule();
}
 
-   vmx->emulation_required = emulation_required(vcpu);
 out:
return ret;
 }
-- 
1.8.3.1



Re: [PATCH net-next] xen-netback: Grant copy the header instead of map and memcpy

2014-03-27 Thread Ian Campbell

On Wed, 2014-03-26 at 21:18 +, Zoltan Kiss wrote:
> An old inefficiency of the TX path that we are grant mapping the first slot,
> and then copy the header part to the linear area. Instead, doing a grant copy
> for that header straight on is more reasonable. Especially because there are
> ongoing efforts to make Xen avoiding TLB flush after unmap when the page were
> not touched in Dom0. In the original way the memcpy ruined that.
> The key changes:
> - the vif has a tx_copy_ops array again
> - xenvif_tx_build_gops sets up the grant copy operations
> - we don't have to figure out whether the header and first frag are on the same
>   grant mapped page or not
> 
> Unrelated, but I've made a small refactoring in xenvif_get_extras as well.

Just a few thoughts, not really based on close review of the code.

Do you have any measurements for this series or workloads where it is
particularly beneficial?

You've added a second hypercall to tx_action; those could probably be
combined into one VM exit by using a multicall. Also, you should omit
either one if its respective nr_?ops is zero (can nr_cops ever be zero?).

Adding another MAX_PENDING_REQS sized array is unfortunate. Could we
perhaps get away with only ever doing the copy for the first N requests
(for small N) in a frame and copying up the request? N should be chosen
to be large enough to cover the more common cases (which I suppose is
Windows, which puts the network and transport headers in separate slots).
This might allow the copy array to be smaller, at the expense of still
doing the map+memcpy for some corner cases.

Or (and I confess this is a skanky hack): we overlay tx_copy_ops and
tx_map_ops in a union, and insert one set of ops from the front and the
other from the end, taking great care around when and where they meet.
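For what it's worth, the overlay idea can be sketched as below. This is a toy userspace illustration under loud assumptions, not netback code: struct toy_copy_op and struct toy_map_op stand in for the real grant-table op structures, TOY_NR_SLOTS stands in for MAX_PENDING_REQS, and all function names are hypothetical. Because the two op types differ in size, the "where they meet" check compares byte offsets rather than array indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_SLOTS 4 /* hypothetical stand-in for MAX_PENDING_REQS */

/* Hypothetical op layouts; the real structures are larger. */
struct toy_copy_op { uint32_t ref; uint32_t len; };
struct toy_map_op  { uint32_t ref; };

struct toy_tx_ops {
    union {
        struct toy_copy_op copy[TOY_NR_SLOTS]; /* filled from the front */
        struct toy_map_op  map[TOY_NR_SLOTS];  /* filled from the end */
    } u;
    size_t nr_copy;
    size_t nr_map;
};

/* Byte offset just past the last of nr_copy copy ops. */
static size_t toy_copy_end(const struct toy_tx_ops *o, size_t nr_copy)
{
    return nr_copy * sizeof(o->u.copy[0]);
}

/* Byte offset of the first of nr_map map ops, counted from the end. */
static size_t toy_map_start(const struct toy_tx_ops *o, size_t nr_map)
{
    return (TOY_NR_SLOTS - nr_map) * sizeof(o->u.map[0]);
}

/* Returns 0 on success, -1 if the new op would run into the map ops. */
static int toy_push_copy(struct toy_tx_ops *o, uint32_t ref, uint32_t len)
{
    assert(o != NULL);
    if (o->nr_copy >= TOY_NR_SLOTS ||
        toy_copy_end(o, o->nr_copy + 1) > toy_map_start(o, o->nr_map))
        return -1;
    o->u.copy[o->nr_copy].ref = ref;
    o->u.copy[o->nr_copy].len = len;
    o->nr_copy++;
    return 0;
}

/* Returns 0 on success, -1 if the new op would run into the copy ops. */
static int toy_push_map(struct toy_tx_ops *o, uint32_t ref)
{
    assert(o != NULL);
    if (o->nr_map >= TOY_NR_SLOTS ||
        toy_map_start(o, o->nr_map + 1) < toy_copy_end(o, o->nr_copy))
        return -1;
    o->nr_map++;
    o->u.map[TOY_NR_SLOTS - o->nr_map].ref = ref;
    return 0;
}
```

The copy ops stay contiguous at the front and the map ops stay contiguous at the end, so each range could still be handed to its own batch call; the price is the bookkeeping in the two push helpers.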

>  static int xenvif_tx_check_gop(struct xenvif *vif,
>  struct sk_buff *skb,
> -struct gnttab_map_grant_ref **gopp)
> +struct gnttab_map_grant_ref **gopp,
> +struct gnttab_copy **gopp_copy)

I think a precursor patch which only does s/gopp/gopp_map/ would be
beneficial.
> @@ -1164,7 +1147,9 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>   return false;
>  }
>  
> -static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
> +static unsigned xenvif_tx_build_gops(struct xenvif *vif,
> +  int budget,
> +  unsigned *copy_ops)

I think you should turn the nr map ops into an explicit pointer return
too, having one thing go via the formal return code and another via a
pointer is a bit odd.

>   struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
>   struct sk_buff *skb;
> @@ -1267,24 +1252,37 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>   }
>   }
>  
> - xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
> -
> - gop++;
> -
>   XENVIF_TX_CB(skb)->pending_idx = pending_idx;
>  
>   __skb_put(skb, data_len);

Can't you allocate the skb with sufficient headroom? (or maybe I've
forgotten again how skb payload management works and __skb_put is
effectively free on an empty skb?)

> + vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
> + vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
> + vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
> +
> + vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
> + vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
> + vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
> +
> + vif->tx_copy_ops[*copy_ops].len = data_len;

Do we want to copy the entire first frag even if it is e.g. a complete
page?

I'm not sure where the tradeoff lies between doing a grant copy of more
than necessary and doing a map+memcpy when the map is already required
for the page frag anyway.

What about the case where the first frag is less than PKT_PROT_LEN? I
think you still do map+memcpy in that case?

> @@ -1375,6 +1374,7 @@ static int xenvif_handle_frag_list(struct xenvif *vif, struct sk_buff *skb)
>  static int xenvif_tx_submit(struct xenvif *vif)
>  {
>   struct gnttab_map_grant_ref *gop = vif->tx_map_ops;

Another candidate for a precursor patch renaming for clarity.

Ian.


RE: Bug 71331 - mlock yields processor to lower priority process

2014-03-27 Thread jimmie.davis

  

> Generally real-time applications should not be doing mlock calls during 
> their real-time execution for that reason. The required memory regions 
> should be locked during startup so that this kind of execution delay can 
> be avoided at runtime.

Total agreement on this.

Regards,
Bud Davis




  


Re: [PATCH] regmap: Add REGMAP_ENDIAN_SWAP support for values.

2014-03-27 Thread Mark Brown

On Thu, Mar 27, 2014 at 04:17:39PM +0800, Xiubo Li wrote:

> For the DT node, just one property like 'endian-swap' will be okey
> for cases 2 and 4.

I'm not convinced that the cost of having to define explicit big and
little endian properties for the hardware is worth having this - it
seems like a confusing thing to have in the interface since it depends
on both the device and the CPU and would presumably break in cases where
there's an option about which endianness to run things in.

> And using the REGMAP_ENDIAN_BIG and REGMAP_ENDIAN_LITTLE will make
> the driver a bit more complex, and also the usage of it.

What's the complexity here?


Re: [PATCH] arm64: Fix memory layout typo

2014-03-27 Thread Catalin Marinas

On Wed, Mar 26, 2014 at 02:23:21AM +, Neil Zhang wrote:
> Signed-off-by: Neil Zhang 
> ---
>  Documentation/arm64/memory.txt |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
> index 5e054bf..c2765e2 100644
> --- a/Documentation/arm64/memory.txt
> +++ b/Documentation/arm64/memory.txt
> @@ -39,7 +39,7 @@ ffbffbc0ffbffbdf   2MB  
> earlyprintk device
>  
>  ffbffbe0 ffbffbe0  64KB  PCI I/O space
>  
> -ffbffbe1 ffbc  ~2MB  [guard]
> +ffbffbe1 ffbffbff  ~2MB  [guard]
>  
>  ffbffc00 ffbf  64MB  modules

I fixed this in linux-next already as part of 22bd1c91fe1 (arm64: Extend
the PCI I/O space to 16MB).

Thanks.

-- 
Catalin

[PATCH] random: give up adding randomness to the entropy pool when block devices are busy

2014-03-27 Thread Kazuya Hisaki

Hi,

This gives up adding randomness to the entropy pool when block devices
are busy, to avoid lock contention in I/O-intensive situations.

Currently, randomness is added to the entropy pool for every 4096 I/O
completions, and a global spinlock is held while this is done.
Contention for the spinlock occurs in I/O-intensive situations where
many cores process I/O completions, even when each core processes
completions for a different block device.

This just replaces spin_lock_irqsave() in mix_pool_bytes()
with spin_trylock_irqsave(), and gives up adding randomness
when the trylock fails.
It also moves trace_mix_pool_bytes() after the lock, so that events
are traced only when randomness is actually added.
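The trylock-and-give-up pattern can be illustrated outside the kernel with a C11 atomic flag. This is a sketch under assumptions, not the random.c code: struct toy_pool, toy_trylock(), toy_unlock() and toy_mix_pool_bytes() are hypothetical names, the XOR mixing is a placeholder for the real mixing function, and the "dropped" counter is added only to make the behaviour observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pool; the real entropy_store is more involved. */
struct toy_pool {
    atomic_flag lock;       /* set while some caller is mixing */
    unsigned char data[16];
    size_t pos;
    atomic_ulong dropped;   /* inputs skipped because the lock was busy */
};

/* Try to take the lock without waiting; true means we own it. */
static bool toy_trylock(struct toy_pool *p)
{
    return !atomic_flag_test_and_set_explicit(&p->lock,
                                              memory_order_acquire);
}

static void toy_unlock(struct toy_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/*
 * Mix nbytes into the pool, or give up at once if the lock is busy.
 * Returns true if the bytes were mixed in, false if they were dropped.
 */
static bool toy_mix_pool_bytes(struct toy_pool *p, const void *in,
                               size_t nbytes)
{
    const unsigned char *bytes = in;

    assert(p != NULL && (in != NULL || nbytes == 0));
    if (!toy_trylock(p)) {
        atomic_fetch_add(&p->dropped, 1);
        return false;
    }
    for (size_t i = 0; i < nbytes; i++) {
        p->data[p->pos] ^= bytes[i]; /* placeholder for real mixing */
        p->pos = (p->pos + 1) % sizeof(p->data);
    }
    toy_unlock(p);
    return true;
}
```

The trade-off is visible in the return value: a busy lock costs the caller nothing but also contributes nothing to the pool.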

Thank you,


Signed-off-by: Kazuya Hisaki 
Cc: "Theodore Ts'o" 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Cc: linux-kernel@vger.kernel.org
---
 drivers/char/random.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75b..d2603f8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -545,8 +545,9 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in,
 {
unsigned long flags;
 
+   if (!spin_trylock_irqsave(&r->lock, flags))
+   return;
trace_mix_pool_bytes(r->name, nbytes, _RET_IP_);
-   spin_lock_irqsave(&r->lock, flags);
_mix_pool_bytes(r, in, nbytes, out);
spin_unlock_irqrestore(&r->lock, flags);
 }


Re: [RFC PATCH 03/12] pci: host: pcie-designware: Use base-mask for configuring the iATU

2014-03-27 Thread Jingoo Han

On Wednesday, March 26, 2014 10:58 PM, Kishon Vijay Abraham I wrote:
> 
> In DRA7, the cpu sees 32bit address, but the pcie controller can see only 28bit
> address. So whenever the cpu issues a read/write request, the 4 most
> significant bits are used by L3 to determine the target controller.
> For example, the cpu reserves 0x2000_ - 0x2FFF_ for PCIe controller but
> the PCIe controller will see only (0x000_ - 0xFFF_FFF). So for programming
> the outbound translation window the *base* should be programmed as 0x000_.
> Whenever we try to write to say 0x2000_, it will be translated to whatever
> we have programmed in the translation window with base as 0x000_.
> 
> Signed-off-by: Kishon Vijay Abraham I 

(+cc Pratyush Anand, Marek Vasut, Richard Zhu)

Acked-by: Jingoo Han 

Mohit Kumar, Pratyush Anand,
If you have other opinions, please let us know. :-)
Thank you.

Best regards,
Jingoo Han
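As a small illustration of the masking that the quoted patch applies before programming the iATU base registers, here is a standalone sketch. It is not the driver code: toy_atu_base() and the helpers are hypothetical, and the address and mask values in the usage note are made-up examples (the addresses in the description above are truncated in this archive and are left as they are). Only the "fall back to an all-ones mask when the property is absent" default is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Default used when no "base-mask" property is present: keep all bits. */
#define TOY_BASE_MASK_NONE (~0x0ULL)

/* Address as the controller would see it, given the CPU-side address. */
static uint64_t toy_atu_base(uint64_t cpu_addr, uint64_t base_mask)
{
    assert(base_mask != 0); /* an all-zero mask would discard everything */
    return cpu_addr & base_mask;
}

/* Split a 64-bit base into the two 32-bit register values. */
static uint32_t toy_lower_32(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

static uint32_t toy_upper_32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

With a hypothetical 28-bit mask of 0x0fffffff, a CPU address of 0x20013000 becomes 0x00013000 for the controller, while TOY_BASE_MASK_NONE leaves the address unchanged.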

> ---
>  .../devicetree/bindings/pci/designware-pcie.txt|1 +
>  drivers/pci/host/pcie-designware.c |   39 ++--
>  drivers/pci/host/pcie-designware.h |1 +
>  3 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt
> b/Documentation/devicetree/bindings/pci/designware-pcie.txt
> index d6fae13..c574dd3 100644
> --- a/Documentation/devicetree/bindings/pci/designware-pcie.txt
> +++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt
> @@ -27,6 +27,7 @@ Optional properties for fsl,imx6q-pcie
>  - power-on-gpio: gpio pin number of power-enable signal
>  - wake-up-gpio: gpio pin number of incoming wakeup signal
>  - disable-gpio: gpio pin number of outgoing rfkill/endpoint disable signal
> +- base-mask: address mask for the PCIe controller target port
> 
>  Example:
> 
> diff --git a/drivers/pci/host/pcie-designware.c 
> b/drivers/pci/host/pcie-designware.c
> index 17ce88f..98b661c 100644
> --- a/drivers/pci/host/pcie-designware.c
> +++ b/drivers/pci/host/pcie-designware.c
> @@ -464,6 +464,9 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
>   return -EINVAL;
>   }
> 
> + if (of_property_read_u64(np, "base-mask", &pp->base_mask))
> + pp->base_mask = ~(0x0ULL);
> +
>   if (IS_ENABLED(CONFIG_PCI_MSI)) {
>   pp->irq_domain = irq_domain_add_linear(pp->dev->of_node,
>   MAX_MSI_IRQS, &msi_domain_ops,
> @@ -503,12 +506,15 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
> 
>  static void dw_pcie_prog_viewport_cfg0(struct pcie_port *pp, u32 busdev)
>  {
> + u64 cfg0_base;
> +
> + cfg0_base = pp->cfg0_base & pp->base_mask;
>   /* Program viewport 0 : OUTBOUND : CFG0 */
>   dw_pcie_writel_rc(pp, PCIE_ATU_REGION_OUTBOUND | PCIE_ATU_REGION_INDEX0,
> PCIE_ATU_VIEWPORT);
> - dw_pcie_writel_rc(pp, pp->cfg0_base, PCIE_ATU_LOWER_BASE);
> - dw_pcie_writel_rc(pp, (pp->cfg0_base >> 32), PCIE_ATU_UPPER_BASE);
> - dw_pcie_writel_rc(pp, pp->cfg0_base + pp->config.cfg0_size - 1,
> + dw_pcie_writel_rc(pp, cfg0_base, PCIE_ATU_LOWER_BASE);
> + dw_pcie_writel_rc(pp, (cfg0_base >> 32), PCIE_ATU_UPPER_BASE);
> + dw_pcie_writel_rc(pp, cfg0_base + pp->config.cfg0_size - 1,
> PCIE_ATU_LIMIT);
>   dw_pcie_writel_rc(pp, busdev, PCIE_ATU_LOWER_TARGET);
>   dw_pcie_writel_rc(pp, 0, PCIE_ATU_UPPER_TARGET);
> @@ -518,14 +524,17 @@ static void dw_pcie_prog_viewport_cfg0(struct pcie_port *pp, u32 busdev)
> 
>  static void dw_pcie_prog_viewport_cfg1(struct pcie_port *pp, u32 busdev)
>  {
> + u64 cfg1_base;
> +
> + cfg1_base = pp->cfg1_base & pp->base_mask;
>   /* Program viewport 1 : OUTBOUND : CFG1 */
>   dw_pcie_writel_rc(pp, PCIE_ATU_REGION_OUTBOUND | PCIE_ATU_REGION_INDEX1,
> PCIE_ATU_VIEWPORT);
>   dw_pcie_writel_rc(pp, PCIE_ATU_TYPE_CFG1, PCIE_ATU_CR1);
>   dw_pcie_writel_rc(pp, PCIE_ATU_ENABLE, PCIE_ATU_CR2);
> - dw_pcie_writel_rc(pp, pp->cfg1_base, PCIE_ATU_LOWER_BASE);
> - dw_pcie_writel_rc(pp, (pp->cfg1_base >> 32), PCIE_ATU_UPPER_BASE);
> - dw_pcie_writel_rc(pp, pp->cfg1_base + pp->config.cfg1_size - 1,
> + dw_pcie_writel_rc(pp, cfg1_base, PCIE_ATU_LOWER_BASE);
> + dw_pcie_writel_rc(pp, (cfg1_base >> 32), PCIE_ATU_UPPER_BASE);
> + dw_pcie_writel_rc(pp, cfg1_base + pp->config.cfg1_size - 1,
> PCIE_ATU_LIMIT);
>   dw_pcie_writel_rc(pp, busdev, PCIE_ATU_LOWER_TARGET);
>   dw_pcie_writel_rc(pp, 0, PCIE_ATU_UPPER_TARGET);
> @@ -533,14 +542,17 @@ static void dw_pcie_prog_viewport_cfg1(struct pcie_port *pp, u32 busdev)
> 
>  static void dw_pcie_prog_viewport_mem_outbound(struct pcie_port *pp)
>  {
> + u64 mem_base;
> +
> + mem_base = pp->mem_base & pp->base_mask;
>   /* Program viewport 0 : OUTBOUND : MEM */
>   dw_pcie_writel_rc(pp, PCIE_ATU_REGION_OUTBOUND | PCIE_ATU_REGION_INDEX0,
>

Re: [PATCH] phy/at8031: enable at8031 to work on interrupt mode

2014-03-27 Thread Sergei Shtylyov


Hello.

On 27-03-2014 10:18, Zhao Qiang wrote:


The at8031 can work on polling mode and interrupt mode.
Add ack_interrupt and config intr funcs to enable
interrupt mode for it.



Signed-off-by: Zhao Qiang 
---
  drivers/net/phy/at803x.c | 30 ++
  1 file changed, 30 insertions(+)



diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index bc71947..d034ef5 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c

[...]

@@ -191,6 +194,31 @@ static int at803x_config_init(struct phy_device *phydev)
return 0;
  }

+static int at803x_ack_interrupt(struct phy_device *phydev)
+{
+   int err;
+
+   err = phy_read(phydev, AT803X_INSR);


   Could make this an initializer...


+
+   return (err < 0) ? err : 0;
+}
+
+static int at803x_config_intr(struct phy_device *phydev)
+{
+   int err;
+   int value;
+
+   value = phy_read(phydev, AT803X_INER);
+
+   if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+   err = phy_write(phydev, AT803X_INER,
+   (value | AT803X_INER_INIT));


   Inner parens not needed.


+   else
+   err = phy_write(phydev, AT803X_INER, value);


   Why are you not clearing the bits here? Why write back what has been read at all?
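To make the point of the question concrete, here is a sketch of a read-modify-write that sets the interrupt bits when interrupts are enabled and clears them otherwise. It is an assumption about what the review is asking for, not the driver's code: toy_iner_value() is a hypothetical helper and TOY_INER_INIT is a made-up bit mask, not the real AT803X_INER_INIT value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up set of interrupt-enable bits, for illustration only. */
#define TOY_INER_INIT 0xec00u

static_assert((TOY_INER_INIT & ~0xffffu) == 0,
              "mask must fit in a 16-bit PHY register");

/* New register value computed from the value just read back. */
static uint16_t toy_iner_value(uint16_t current, bool enable)
{
    if (enable)
        return (uint16_t)(current | TOY_INER_INIT);
    /* Disabling: clear the bits instead of writing the value back. */
    return (uint16_t)(current & (uint16_t)~TOY_INER_INIT);
}
```

Writing the unmodified value back in the disable branch would leave any previously enabled interrupt bits set, which seems to be what the question is getting at.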


WBR, Sergei


RE: [RFC PATCH 03/12] pci: host: pcie-designware: Use base-mask for configuring the iATU

2014-03-27 Thread Mohit KUMAR DCG

Hello Kishon,

> -Original Message-
> From: Jingoo Han [mailto:jg1@samsung.com]
> Sent: Thursday, March 27, 2014 5:15 PM
> 
> On Wednesday, March 26, 2014 10:58 PM, Kishon Vijay Abraham I wrote:
> >
> > In DRA7, the cpu sees 32bit address, but the pcie controller can see
> > only 28bit address. So whenever the cpu issues a read/write request,
> > the 4 most significant bits are used by L3 to determine the target
> > controller.
> > For example, the cpu reserves 0x2000_ - 0x2FFF_ for PCIe
> > controller but the PCIe controller will see only (0x000_ -
> > 0xFFF_FFF). So for programming the outbound translation window the
> > *base* should be programmed as 0x000_.
> > Whenever we try to write to say 0x2000_, it will be translated to
> > whatever we have programmed in the translation window with base as
> > 0x000_.
> >
> > Signed-off-by: Kishon Vijay Abraham I 
> 
> (+cc Pratyush Anand, Marek Vasut, Richard Zhu)
> 
> Acked-by: Jingoo Han 
> 
> Mohit Kumar, Pratyush Anand,
> If you have other opinions, please let us know. :-) Thank you.
> 

Yes, this looks cleaner now.
Acked-by: Mohit Kumar 

Regards,
Mohit

> Best regards,
> Jingoo Han
> 
> > ---
> >  .../devicetree/bindings/pci/designware-pcie.txt|1 +
> >  drivers/pci/host/pcie-designware.c |   39 ++--
> >  drivers/pci/host/pcie-designware.h |1 +
> >  3 files changed, 29 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt
> > b/Documentation/devicetree/bindings/pci/designware-pcie.txt
> > index d6fae13..c574dd3 100644
> > --- a/Documentation/devicetree/bindings/pci/designware-pcie.txt
> > +++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt
> > @@ -27,6 +27,7 @@ Optional properties for fsl,imx6q-pcie
> >  - power-on-gpio: gpio pin number of power-enable signal
> >  - wake-up-gpio: gpio pin number of incoming wakeup signal
> >  - disable-gpio: gpio pin number of outgoing rfkill/endpoint disable signal
> > +- base-mask: address mask for the PCIe controller target port
> >
> >  Example:
> >
> > diff --git a/drivers/pci/host/pcie-designware.c
> > b/drivers/pci/host/pcie-designware.c
> > index 17ce88f..98b661c 100644
> > --- a/drivers/pci/host/pcie-designware.c
> > +++ b/drivers/pci/host/pcie-designware.c
> > @@ -464,6 +464,9 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
> > return -EINVAL;
> > }
> >
> > +   if (of_property_read_u64(np, "base-mask", &pp->base_mask))
> > +   pp->base_mask = ~(0x0ULL);
> > +
> > if (IS_ENABLED(CONFIG_PCI_MSI)) {
> > pp->irq_domain = irq_domain_add_linear(pp->dev->of_node,
> > MAX_MSI_IRQS, &msi_domain_ops,
> > @@ -503,12 +506,15 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
> >
> >  static void dw_pcie_prog_viewport_cfg0(struct pcie_port *pp, u32 busdev)
> >  {
> > +   u64 cfg0_base;
> > +
> > +   cfg0_base = pp->cfg0_base & pp->base_mask;
> > /* Program viewport 0 : OUTBOUND : CFG0 */
> > dw_pcie_writel_rc(pp, PCIE_ATU_REGION_OUTBOUND | PCIE_ATU_REGION_INDEX0,
> >   PCIE_ATU_VIEWPORT);
> > -   dw_pcie_writel_rc(pp, pp->cfg0_base, PCIE_ATU_LOWER_BASE);
> > -   dw_pcie_writel_rc(pp, (pp->cfg0_base >> 32), PCIE_ATU_UPPER_BASE);
> > -   dw_pcie_writel_rc(pp, pp->cfg0_base + pp->config.cfg0_size - 1,
> > +   dw_pcie_writel_rc(pp, cfg0_base, PCIE_ATU_LOWER_BASE);
> > +   dw_pcie_writel_rc(pp, (cfg0_base >> 32), PCIE_ATU_UPPER_BASE);
> > +   dw_pcie_writel_rc(pp, cfg0_base + pp->config.cfg0_size - 1,
> >   PCIE_ATU_LIMIT);
> > dw_pcie_writel_rc(pp, busdev, PCIE_ATU_LOWER_TARGET);
> > dw_pcie_writel_rc(pp, 0, PCIE_ATU_UPPER_TARGET);
> > @@ -518,14 +524,17 @@ static void dw_pcie_prog_viewport_cfg0(struct pcie_port *pp, u32 busdev)
> >
> >  static void dw_pcie_prog_viewport_cfg1(struct pcie_port *pp, u32 busdev)
> >  {
> > +   u64 cfg1_base;
> > +
> > +   cfg1_base = pp->cfg1_base & pp->base_mask;
> > /* Program viewport 1 : OUTBOUND : CFG1 */
> > dw_pcie_writel_rc(pp, PCIE_ATU_REGION_OUTBOUND | PCIE_ATU_REGION_INDEX1,
> >   PCIE_ATU_VIEWPORT);
> > dw_pcie_writel_rc(pp, PCIE_ATU_TYPE_CFG1, PCIE_ATU_CR1);
> > dw_pcie_writel_rc(pp, PCIE_ATU_ENABLE, PCIE_ATU_CR2);
> >

[RESEND PATCH] charger-manager: Fix checking of wrong return type

2014-03-27 Thread Chanwoo Choi

This patch fixes a minor issue where the wrong kind of return value is checked.

of_cm_parse_desc() returns ERR_PTR(error number) when an error occurs
in that function, but charger_manager_probe() only checks whether
desc is NULL. If of_cm_parse_desc() returns ERR_PTR(-ENOMEM), desc
is not NULL but (void *)(-ENOMEM). So although an error occurred while parsing
the DT, charger_manager_probe() cannot detect it from desc.
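Why a NULL test misses an error pointer can be shown with a userspace imitation of the kernel's error-pointer convention. The sketch below is an assumption-laden stand-in: toy_err_ptr(), toy_is_err() and toy_ptr_err() only mimic ERR_PTR(), IS_ERR() and PTR_ERR(), and toy_parse_desc() is a hypothetical parser, not of_cm_parse_desc().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095 /* largest errno encoded in a pointer */

/* Encode a negative errno value in a pointer. */
static void *toy_err_ptr(long error)
{
    assert(error < 0 && error >= -TOY_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True if ptr lies in the top TOY_MAX_ERRNO addresses, i.e. is an error. */
static bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

struct toy_desc { int unused; };

/* Hypothetical parser: reports failure as an error pointer, never NULL. */
static struct toy_desc *toy_parse_desc(bool fail)
{
    static struct toy_desc desc;

    if (fail)
        return toy_err_ptr(-ENOMEM);
    return &desc;
}
```

A failed toy_parse_desc(true) call returns a non-NULL pointer, so an "if (!desc)" test passes it through, while toy_is_err() catches it. Note that toy_is_err(NULL) is false, so a path that can also yield NULL needs a separate NULL check.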

Signed-off-by: Chanwoo Choi 
Signed-off-by: Myungjoo Ham 
---
 drivers/power/charger-manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/charger-manager.c b/drivers/power/charger-manager.c
index 9e4dab4..a10fb57 100644
--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -1677,7 +1677,7 @@ static int charger_manager_probe(struct platform_device *pdev)
}
}
 
-   if (!desc) {
+   if (IS_ERR(desc)) {
dev_err(&pdev->dev, "No platform data (desc) found\n");
return -ENODEV;
}
-- 
1.8.0


Re: [PATCH] workqueue: use manager lock only to protect worker_idr

2014-03-27 Thread Lai Jiangshan

Please ignore this patch and wait for my new patchset.

Thanks,
Lai

On 03/26/2014 10:41 PM, Lai Jiangshan wrote:
> worker_idr is always accessed in manager lock context currently.
> worker_idr is highly related to managers, it will be unlikely
> accessed in pool->lock only in future.
> 
> Signed-off-by: Lai Jiangshan 
> ---
>  kernel/workqueue.c |   34 ++
>  1 files changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 0c74979..f5b68a3 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -123,8 +123,7 @@ enum {
>   *cpu or grabbing pool->lock is enough for read access.  If
>   *POOL_DISASSOCIATED is set, it's identical to L.
>   *
> - * MG: pool->manager_mutex and pool->lock protected.  Writes require both
> - * locks.  Reads can happen under either lock.
> + * M: pool->manager_mutex protected.
>   *
>   * PL: wq_pool_mutex protected.
>   *
> @@ -163,7 +162,7 @@ struct worker_pool {
>   /* see manage_workers() for details on the two manager mutexes */
>   struct mutexmanager_arb;/* manager arbitration */
>   struct mutexmanager_mutex;  /* manager exclusion */
> - struct idr  worker_idr; /* MG: worker IDs and iteration */
> + struct idr  worker_idr; /* M: worker IDs and iteration */
>  
>   struct workqueue_attrs  *attrs; /* I: worker attributes */
>   struct hlist_node   hash_node;  /* PL: unbound_pool_hash node */
> @@ -339,16 +338,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
>  lockdep_is_held(&wq->mutex), \
>  "sched RCU or wq->mutex should be held")
>  
> -#ifdef CONFIG_LOCKDEP
> -#define assert_manager_or_pool_lock(pool)\
> - WARN_ONCE(debug_locks &&\
> -   !lockdep_is_held(&(pool)->manager_mutex) &&   \
> -   !lockdep_is_held(&(pool)->lock),  \
> -   "pool->manager_mutex or ->lock should be held")
> -#else
> -#define assert_manager_or_pool_lock(pool)do { } while (0)
> -#endif
> -
>  #define for_each_cpu_worker_pool(pool, cpu)  \
>   for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];   \
>(pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
> @@ -377,14 +366,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
>   * @wi: integer used for iteration
>   * @pool: worker_pool to iterate workers of
>   *
> - * This must be called with either @pool->manager_mutex or ->lock held.
> + * This must be called with either @pool->manager_mutex.
>   *
>   * The if/else clause exists only for the lockdep assertion and can be
>   * ignored.
>   */
>  #define for_each_pool_worker(worker, wi, pool)   \
>   idr_for_each_entry(&(pool)->worker_idr, (worker), (wi)) \
> - if (({ assert_manager_or_pool_lock((pool)); false; })) { } \
> + if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
>   else
>  
>  /**
> @@ -1717,13 +1706,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>* ID is needed to determine kthread name.  Allocate ID first
>* without installing the pointer.
>*/
> - idr_preload(GFP_KERNEL);
> - spin_lock_irq(&pool->lock);
> -
> - id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
> -
> - spin_unlock_irq(&pool->lock);
> - idr_preload_end();
> + id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
>   if (id < 0)
>   goto fail;
>  
> @@ -1765,18 +1748,13 @@ static struct worker *create_worker(struct worker_pool *pool)
>   worker->flags |= WORKER_UNBOUND;
>  
>   /* successful, commit the pointer to idr */
> - spin_lock_irq(&pool->lock);
>   idr_replace(&pool->worker_idr, worker, worker->id);
> - spin_unlock_irq(&pool->lock);
>  
>   return worker;
>  
>  fail:
> - if (id >= 0) {
> - spin_lock_irq(&pool->lock);
> + if (id >= 0)
>   idr_remove(&pool->worker_idr, id);
> - spin_unlock_irq(&pool->lock);
> - }
>   kfree(worker);
>   return NULL;
>  }


Re: [PATCH] KVM: vmx: fix MPX detection

2014-03-27 Thread Paolo Bonzini


On 27/03/2014 09:31, Jet Chen wrote:
> Hi Paolo,
>
> I helped to test for your patch on our LKP system. It fixes the bug
> reported by Fengguang.
>
> I applied your patch based on commit
> 93c4adc7afedf9b0ec190066d45b6d67db5270da.


Thanks Jet!

Paolo

Re: [PATCH] workqueue: add __WQ_FREEZING and remove POOL_FREEZING

2014-03-27 Thread Lai Jiangshan

On 03/25/2014 05:56 PM, Lai Jiangshan wrote:
> freezing is nothing related to pools, but POOL_FREEZING adds a connection,
> and causes freeze_workqueues_begin() and thaw_workqueues() complicated.
> 
> Since freezing is workqueue instance attribute, so we introduce __WQ_FREEZING
> to wq->flags instead and remove POOL_FREEZING.
> 
> we set __WQ_FREEZING only when freezable(to simplify pwq_adjust_max_active()),
> make freeze_workqueues_begin() and thaw_workqueues() fast skip non-freezable wq.
> 
> Changed from previous patches(requested by tj):
>   1) added the WARN_ON_ONCE() back
>   2) merged the two patches as one

Ping.

Hi, Tejun

You have reviewed this patch over several rounds.
I have applied all the changes you requested in your comments (the last two are listed above).

I'm deeply sorry for responding so late.

Thanks,
Lai


> 
> Signed-off-by: Lai Jiangshan 
> ---
>  include/linux/workqueue.h |1 +
>  kernel/workqueue.c|   43 ---
>  2 files changed, 13 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 704f4f6..a45202b 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -335,6 +335,7 @@ enum {
>*/
>   WQ_POWER_EFFICIENT  = 1 << 7,
>  
> + __WQ_FREEZING   = 1 << 15, /* internel: workqueue is freezing */
>   __WQ_DRAINING   = 1 << 16, /* internal: workqueue is draining */
>   __WQ_ORDERED= 1 << 17, /* internal: workqueue is ordered */
>  
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 193e977..0c74979 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -70,7 +70,6 @@ enum {
>*/
>   POOL_MANAGE_WORKERS = 1 << 0,   /* need to manage workers */
>   POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
> - POOL_FREEZING   = 1 << 3,   /* freeze in progress */
>  
>   /* worker flags */
>   WORKER_STARTED  = 1 << 0,   /* started */
> @@ -3632,9 +3631,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>   if (!pool || init_worker_pool(pool) < 0)
>   goto fail;
>  
> - if (workqueue_freezing)
> - pool->flags |= POOL_FREEZING;
> -
>   lockdep_set_subclass(&pool->lock, 1);   /* see put_pwq() */
>   copy_workqueue_attrs(pool->attrs, attrs);
>  
> @@ -3730,18 +3726,13 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
>  static void pwq_adjust_max_active(struct pool_workqueue *pwq)
>  {
>   struct workqueue_struct *wq = pwq->wq;
> - bool freezable = wq->flags & WQ_FREEZABLE;
>  
> - /* for @wq->saved_max_active */
> + /* for @wq->saved_max_active and @wq->flags */
>   lockdep_assert_held(&wq->mutex);
>  
> - /* fast exit for non-freezable wqs */
> - if (!freezable && pwq->max_active == wq->saved_max_active)
> - return;
> -
>   spin_lock_irq(&pwq->pool->lock);
>  
> - if (!freezable || !(pwq->pool->flags & POOL_FREEZING)) {
> + if (!(wq->flags & __WQ_FREEZING)) {
>   pwq->max_active = wq->saved_max_active;
>  
>   while (!list_empty(&pwq->delayed_works) &&
> @@ -4250,6 +4241,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
>   mutex_lock(&wq_pool_mutex);
>  
>   mutex_lock(&wq->mutex);
> + if ((wq->flags & WQ_FREEZABLE) && workqueue_freezing)
> + wq->flags |= __WQ_FREEZING;
>   for_each_pwq(pwq, wq)
>   pwq_adjust_max_active(pwq);
>   mutex_unlock(&wq->mutex);
> @@ -4856,26 +4849,20 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
>   */
>  void freeze_workqueues_begin(void)
>  {
> - struct worker_pool *pool;
>   struct workqueue_struct *wq;
>   struct pool_workqueue *pwq;
> - int pi;
>  
>   mutex_lock(&wq_pool_mutex);
>  
>   WARN_ON_ONCE(workqueue_freezing);
>   workqueue_freezing = true;
>  
> - /* set FREEZING */
> - for_each_pool(pool, pi) {
> - spin_lock_irq(&pool->lock);
> - WARN_ON_ONCE(pool->flags & POOL_FREEZING);
> - pool->flags |= POOL_FREEZING;
> - spin_unlock_irq(&pool->lock);
> - }
> -
>   list_for_each_entry(wq, &workqueues, list) {
> + if (!(wq->flags & WQ_FREEZABLE))
> + continue;
>   mutex_lock(&wq->mutex);
> + WARN_ON_ONCE(wq->flags & __WQ_FREEZING);
> + wq->flags |= __WQ_FREEZING;
>   for_each_pwq(pwq, wq)
>   pwq_adjust_max_active(pwq);
>   mutex_unlock(&wq->mutex);
> @@ -4943,25 +4930,19 @@ void thaw_workqueues(void)
>  {
>   struct workqueue_struct *wq;
>   struct pool_workqueue *pwq;
> - struct worker_pool *pool;
> - int pi;
>  
>   mutex_lock(&wq_pool_mutex);
>  
>   if (!workqueue_freezing)
>   goto out_unlock;
>  
> - /* clear FREEZING */

[PATCH RFC v3 0/2] clk: Support for DT assigned clock parents and rates

2014-03-27 Thread Sylwester Nawrocki

This patch set adds DT binding documentation for the new 'clock-parents'
and 'clock-rates' DT properties and a helper function to parse them.
The helper is now called from within the driver core, similar to how
the pin configuration is bound to a device.

Patch 1/2 adds a variant of the of_clk_get() function which accepts the name
of a DT property containing a list of phandle + clock specifier pairs, as
opposed to the hard-coded "clocks" property name in of_clk_get().
As Mike suggested, I've renamed this function to of_clk_get_by_property().

Patch 2/2 adds the code that searches for the related DT properties in a
device node and performs re-parenting and/or clock frequency setting
as specified.

I didn't add sorting of clocks by parent relationship when setting the
clock rates; it could be added in the next iteration if it's decided
that it's required.

Changes since v2:
 - code reordered to ensure there are no build errors; the clock
   configuration code moved to a separate file,
 - introduced an 'assigned-clocks' DT node which is supposed to contain
   clocks, clock-parents, clock-rates properties and be a child node of
   a clock provider node, and code parsing it, called from of_clk_init().
   It's for clocks which are not directly connected to consumer devices.
   An alternative would be to list such assigned clocks in the 'clocks'
   property, along with the "proper" parent clocks, but then there would
   be phandles in the clocks property of a node pointing to itself, and it
   would require proper handling in of_clk_init().
   I actually tried it, but it looked a bit ugly, so this time I chose to
   use an extra subnode.

Changes since v1:
 - updated DT binding documentation,
 - dropped the platform bus notifier; the clock setup routine is now
   called directly from the driver core before a driver's probe() call.
   This has the advantage that all bus types are handled and any errors
   are propagated, so that, for instance, a driver's probe() can also be
   deferred when resources specified by the clock-parents/clock-rates
   properties are not yet available. An alternative would be to let
   drivers call of_clk_device_setup() directly,
 - dropped the patch adding a macro definition for maximum DT property
   name length for now.

Open issues:
 - handling of errors from of_clk_get_by_property() could be improved,
   currently ENOENT is returned by this function not only for a null
   entry.

This series has been tested on ARM, on Exynos4412 Trats2 board, with
patch [1] applied. RFC v2 can be found at [2].

[1] https://lkml.org/lkml/2014/3/27/97
[2] https://lkml.org/lkml/2014/3/3/324

Sylwester Nawrocki (2):
  clk: Add function parsing arbitrary clock list DT property
  clk: Add handling of clk parent and rate assigned from DT

 .../devicetree/bindings/clock/clock-bindings.txt   |   26 ++
 drivers/base/dd.c  |7 ++
 drivers/clk/Makefile   |1 +
 drivers/clk/clk-conf.c |   87 
 drivers/clk/clk.c  |   10 ++-
 drivers/clk/clkdev.c   |   25 +-
 include/linux/clk.h|7 ++
 include/linux/clk/clk-conf.h   |   19 +
 8 files changed, 177 insertions(+), 5 deletions(-)
 create mode 100644 drivers/clk/clk-conf.c
 create mode 100644 include/linux/clk/clk-conf.h

--
1.7.9.5
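
For readers following the series, here is a rough userspace sketch of what "re-parenting and/or clock frequency setting as specified" could look like, using the values from the binding example in patch 2/2 (clock-parents = <0>, <&pll 1>; clock-rates = <460800>). It is not the code from drivers/clk/clk-conf.c. The struct mock_clk type and mock_apply_assigned() are hypothetical, and two things are assumptions of this sketch only: that a zero rate entry means "leave unchanged" (the source only shows a null <0> entry for parents), and that parents are applied before rates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a clock; this is not the kernel's struct clk. */
struct mock_clk {
	const char *name;
	const struct mock_clk *parent;
	unsigned long rate;
};

/*
 * Entry i of @parents and @rates applies to entry i of @clocks.
 * A NULL parent models a <0> entry in the DT list and is skipped.
 * Assumption of this sketch: a rate of 0 is skipped as well.
 * Returns 0 on success, -1 if a list is longer than the clock list.
 */
int mock_apply_assigned(struct mock_clk *const *clocks, int nclocks,
			struct mock_clk *const *parents, int nparents,
			const unsigned long *rates, int nrates)
{
	int i;

	if (nclocks < 0 || nparents < 0 || nrates < 0 ||
	    nparents > nclocks || nrates > nclocks)
		return -1;

	/* Assumption: re-parent first, then set rates. */
	for (i = 0; i < nparents; i++) {
		if (!parents[i])
			continue;
		assert(clocks[i] != NULL);
		clocks[i]->parent = parents[i];
	}

	for (i = 0; i < nrates; i++) {
		if (rates[i] == 0)
			continue;
		assert(clocks[i] != NULL);
		clocks[i]->rate = rates[i];
	}

	return 0;
}
```

With clocks = { baud, mux }, parents = { NULL, pll } and rates = { 460800 }, this leaves "baud" without a new parent, makes the pll the parent of "mux", and sets "baud" to 460800 Hz, which matches the description in the binding text.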


[PATCH RFC v3 1/2] clk: Add function parsing arbitrary clock list DT property

2014-03-27 Thread Sylwester Nawrocki

The of_clk_get_by_property() function added by this patch is similar to
of_clk_get(), except that it takes the name of a DT property containing
a list of phandles and clock specifiers. In of_clk_get() this name is
hard-coded to "clocks".

Signed-off-by: Sylwester Nawrocki 
---
Changes since v2:
 - moved the function declaration from drivers/clk/clk.h to
   include/linux/clk.h

Changes since v1:
 - s/of_clk_get_list_entry/of_clk_get_by_property.
---
 drivers/clk/clkdev.c |   25 +
 include/linux/clk.h  |7 +++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index a360b2e..1d41540 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -27,17 +27,28 @@ static LIST_HEAD(clocks);
 static DEFINE_MUTEX(clocks_mutex);
 
 #if defined(CONFIG_OF) && defined(CONFIG_COMMON_CLK)
-struct clk *of_clk_get(struct device_node *np, int index)
+/**
+ * of_clk_get_by_property() - Parse and lookup a clock referenced by a device node
+ * @np: pointer to clock consumer node
+ * @list_name: name of the clock list property
+ * @index: index to the clock list
+ *
+ * This function parses the @list_name property and together with @index
+ * value indicating an entry of the list uses it to look up the struct clk
+ * from the registered list of clock providers.
+ */
+struct clk *of_clk_get_by_property(struct device_node *np,
+  const char *list_name, int index)
 {
struct of_phandle_args clkspec;
struct clk *clk;
int rc;
 
-   if (index < 0)
+   if (index < 0 || !list_name)
return ERR_PTR(-EINVAL);
 
-   rc = of_parse_phandle_with_args(np, "clocks", "#clock-cells", index,
-   &clkspec);
+   rc = of_parse_phandle_with_args(np, list_name, "#clock-cells",
+   index, &clkspec);
if (rc)
return ERR_PTR(rc);
 
@@ -51,6 +62,12 @@ struct clk *of_clk_get(struct device_node *np, int index)
of_node_put(clkspec.np);
return clk;
 }
+EXPORT_SYMBOL(of_clk_get_by_property);
+
+struct clk *of_clk_get(struct device_node *np, int index)
+{
+   return of_clk_get_by_property(np, "clocks", index);
+}
 EXPORT_SYMBOL(of_clk_get);
 
 /**
diff --git a/include/linux/clk.h b/include/linux/clk.h
index 0dd9114..f71235b 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -383,6 +383,8 @@ struct of_phandle_args;
 
 #if defined(CONFIG_OF) && defined(CONFIG_COMMON_CLK)
 struct clk *of_clk_get(struct device_node *np, int index);
+struct clk *of_clk_get_by_property(struct device_node *np,
+  const char *list_name, int index);
 struct clk *of_clk_get_by_name(struct device_node *np, const char *name);
 struct clk *of_clk_get_from_provider(struct of_phandle_args *clkspec);
 #else
@@ -395,6 +397,11 @@ static inline struct clk *of_clk_get_by_name(struct device_node *np,
 {
return ERR_PTR(-ENOENT);
 }
+static inline struct clk *of_clk_get_by_property(struct device_node *np,
+  const char *list_name, int index)
+{
+   return ERR_PTR(-ENOENT);
+}
 #endif
 
 #endif
-- 
1.7.9.5
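
The refactoring in this patch follows a common pattern: the lookup takes the list property name as a parameter, and the old entry point becomes a thin wrapper that passes "clocks". The following is a minimal userspace sketch of that pattern only. Everything prefixed with mock_ or MOCK_ is hypothetical and stands in for the device node, the phandle parsing and the error codes; no kernel API is used or implied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error values for this sketch. */
#define MOCK_EINVAL 22
#define MOCK_ENOENT 2

/* Hypothetical stand-in for a DT list property. */
struct mock_property {
	const char *name;	/* property name, e.g. "clocks" */
	const int *entries;	/* one made-up, non-negative clock id per entry */
	int count;
};

/* Hypothetical stand-in for a device node. */
struct mock_node {
	const struct mock_property *props;
	int nprops;
};

/*
 * Generic lookup: the list property name is a parameter.
 * Returns the clock id on success or a negative error value.
 */
int mock_clk_get_by_property(const struct mock_node *np,
			     const char *list_name, int index)
{
	int i;

	if (!np || !list_name || index < 0)
		return -MOCK_EINVAL;

	for (i = 0; i < np->nprops; i++) {
		const struct mock_property *p = &np->props[i];

		if (strcmp(p->name, list_name) != 0)
			continue;
		assert(p->count >= 0);
		if (index >= p->count)
			return -MOCK_ENOENT;
		return p->entries[index];
	}

	/* No property with that name on this node. */
	return -MOCK_ENOENT;
}

/* The old entry point keeps its signature and hard-codes "clocks". */
int mock_clk_get(const struct mock_node *np, int index)
{
	return mock_clk_get_by_property(np, "clocks", index);
}
```

Existing callers of the wrapper are unaffected, while new callers can pass another list name such as "clock-parents".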


[PATCH RFC v3 2/2] clk: Add handling of clk parent and rate assigned from DT

2014-03-27 Thread Sylwester Nawrocki

This patch adds a helper function to configure clock parents and rates
as specified in the clock-parents and clock-rates DT properties for a
consumer device, and a call to it before a driver is bound to a device.

Signed-off-by: Sylwester Nawrocki 
---
Changes since v2:
 - edited in clock-bindings.txt, added note about 'assigned-clocks'
   subnode which may be used to specify "global" clocks configuration
   at a clock provider node,
 - moved of_clk_device_setup() function declaration from clk-provider.h
   to clk-conf.h so required function stubs are available when
   CONFIG_COMMON_CLK is not enabled,

Changes since v1:
 - the helper function to parse and set assigned clock parents and
   rates made public so it is available to clock providers to call
   directly;
 - dropped the platform bus notification; of_clk_device_setup() is now
   called from the driver core, rather than from the notification
   callback;
 - s/of_clk_get_list_entry/of_clk_get_by_property.
---
 .../devicetree/bindings/clock/clock-bindings.txt   |   26 ++
 drivers/base/dd.c  |7 ++
 drivers/clk/Makefile   |1 +
 drivers/clk/clk-conf.c |   87 
 drivers/clk/clk.c  |   10 ++-
 include/linux/clk/clk-conf.h   |   19 +
 6 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 drivers/clk/clk-conf.c
 create mode 100644 include/linux/clk/clk-conf.h

diff --git a/Documentation/devicetree/bindings/clock/clock-bindings.txt b/Documentation/devicetree/bindings/clock/clock-bindings.txt
index 7c52c29..b452f80 100644
--- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
+++ b/Documentation/devicetree/bindings/clock/clock-bindings.txt
@@ -115,3 +115,29 @@ clock signal, and a UART.
   ("pll" and "pll-switched").
 * The UART has its baud clock connected the external oscillator and its
   register clock connected to the PLL clock (the "pll-switched" signal)
+
+==Assigned clock parents and rates==
+
+Some platforms require static initial configuration of parts of the clocks
+controller. Such a configuration can be specified in a clock consumer node
+through clock-parents and clock-rates DT properties. The former should contain
+a list of parent clocks in form of phandle and clock specifier pairs, the
+latter the list of assigned clock frequency values (one cell each).
+
+uart@a000 {
+compatible = "fsl,imx-uart";
+reg = <0xa000 0x1000>;
+...
+clocks = <&clkcon 0>, <&clkcon 3>;
+clock-names = "baud", "mux";
+
+clock-parents = <0>, <&pll 1>;
+clock-rates = <460800>;
+};
+
+In this example the pll is set as parent of "mux" clock and frequency of "baud"
+clock is specified as 460800 Hz.
+
+For clocks which are not directly connected to any consumer device similarly
+clocks, clock-parents and/or clock-rates properties should be specified in
+assigned-clocks subnode of a clock controller DT node.
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 0605176..4c633e7 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "base.h"
 #include "power/power.h"
@@ -278,6 +279,12 @@ static int really_probe(struct device *dev, struct device_driver *drv)
if (ret)
goto probe_failed;
 
+   if (dev->of_node) {
+   ret = of_clk_device_init(dev->of_node);
+   if (ret)
+   goto probe_failed;
+   }
+
if (driver_sysfs_add(dev)) {
printk(KERN_ERR "%s: driver_sysfs_add(%s) failed\n",
__func__, dev_name(dev));
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index a367a98..c720e4b 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_COMMON_CLK)+= clk-fixed-rate.o
 obj-$(CONFIG_COMMON_CLK)   += clk-gate.o
 obj-$(CONFIG_COMMON_CLK)   += clk-mux.o
 obj-$(CONFIG_COMMON_CLK)   += clk-composite.o
+obj-$(CONFIG_COMMON_CLK)   += clk-conf.o
 
 # hardware specific clock types
 # please keep this section sorted lexicographically by file/directory path name
diff --git a/drivers/clk/clk-conf.c b/drivers/clk/clk-conf.c
new file mode 100644
index 000..a2e992e
--- /dev/null
+++ b/drivers/clk/clk-conf.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2014 Samsung Electronics Co., Ltd.
+ * Sylwester Nawrocki 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * of_clk_device_init() - parse and set clk configuration assigned to a device
+ * @node: device node to apply the configuration for
+ *
+ * This function parses 'clock-parents' and 'clock-rates' properties and se

Re: [PATCH v2 1/3] kmemleak: allow freeing internal objects after kmemleak was disabled

2014-03-27 Thread Catalin Marinas

On Thu, Mar 27, 2014 at 02:29:18AM +, Li Zefan wrote:
> On 2014/3/22 7:37, Catalin Marinas wrote:
> > On 17 Mar 2014, at 04:07, Li Zefan  wrote:
> >> Currently if kmemleak is disabled, the kmemleak objects can never be freed,
> >> no matter if it's disabled by a user or due to fatal errors.
> >>
> >> Those objects can be a big waste of memory.
> >>
> >>  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >> 1200264 1197433  99%0.30K  46164   26369312K kmemleak_object
> >>
> >> With this patch, internal objects will be freed immediately if kmemleak is
> >> disabled explicitly by a user. If it's disabled due to a kmemleak error,
> >> The user will be informed, and then he/she can reclaim memory with:
> >>
> >># echo off > /sys/kernel/debug/kmemleak
> >>
> >> v2: use "off" handler instead of "clear" handler to do this, suggested
> >>by Catalin.
> > 
> > I think there was a slight misunderstanding. My point was about "echo
> > > scan=off" before "echo off", they can just be squashed into the
> > same action of the latter.
> 
> I'm not sure if I understand correctly, so you want the "off" handler to
> stop the scan thread but it will never free kmemleak objects until the 
> user explicitly trigger the "clear" action, right?

Yes. That's just in case someone wants to stop kmemleak but still
investigate some previously reported leaks.

Thanks.

-- 
Catalin
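
The behaviour agreed on in this exchange can be summarised in a small userspace model: "off" stops kmemleak, including the scan thread, but keeps the objects so earlier reports can still be investigated, and a later explicit "clear" frees them. This is only a sketch of that reading of the thread, not the code in mm/kmemleak.c. The struct mock_kmemleak type and mock_kmemleak_write() are hypothetical, and the model covers "clear" only for the disabled case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical state for this sketch; not the kernel's data structures. */
struct mock_kmemleak {
	bool enabled;
	bool scan_thread_running;
	unsigned long tracked_objects;
};

/*
 * Model of a write to the kmemleak control file.
 * Returns 0 for a recognised command, -1 otherwise.
 */
int mock_kmemleak_write(struct mock_kmemleak *k, const char *cmd)
{
	assert(k != NULL && cmd != NULL);

	if (strcmp(cmd, "off") == 0) {
		/* "off" also stops scanning, so no separate "scan=off". */
		k->enabled = false;
		k->scan_thread_running = false;
		/* Objects are kept for later investigation. */
		return 0;
	}

	if (strcmp(cmd, "clear") == 0) {
		/* Only the disabled case is modelled: free everything. */
		if (!k->enabled)
			k->tracked_objects = 0;
		return 0;
	}

	return -1;
}
```

In this model the memory is only reclaimed by the second, explicit step.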

Re: [PATCHv2 1/1] mtd: gpmi: make blockmark swapping optional

2014-03-27 Thread Lothar Waßmann

Hi,

Huang Shijie wrote:
> > > Please see "Figure 12-13" in the 12.12.1.12:
> > >    "In order to preserve the BI (bad block information), flash updater
> > > or gang programmer applications need to swap Bad Block Information (BI)
> > > data to byte 0 of metadata area for every page before programming NAND
> > > Flash. ROM when loading firmware, copies back the value at metadata[0]
> > > to BI offset in page data. The following figure shows how the factory
> > > bad block marker is preserved."
> > > 
> > The inspection of the BB markers is only a fallback for the case that
> > there is no DBBT. From the same chapter that you quoted above:
> > | ROM uses DBBT to skip any bad block that falls within firmware data
> > | on NAND Flash device.
> > | If the address of DBBT Search Area in FCB is 0, ROM will rely on
> > | factory marked bad block markers to find out if a block is good or bad.
> > 
> > Thus, even the boot ROM of i.MX28 can well live without blockmark
> > swapping.
> 
> Assume that there is a NAND block "A", and that "A" consists of 256 pages.
> U-Boot is burned to "A" and can occupy 6 pages:
>
>  | page 0 | page 1 | page 2 | page 3 | page 4 | page 5 | ... | ... | page 255 |
>   \------------------------------- block "A" --------------------------------/
>
> The DBBT is used to track whether "A" is bad or not.
> Assume we know that "A" is a good block; the ROM then needs to read out U-Boot.
> The ROM reads out the 6 pages one by one, and each time it reads a page,
> it should do the swapping for that page.
>
> In this case, the ROM will do the swapping six times.
> 
> Please read the section again, you will see the "every page" in it:
>
>    "In order to preserve the BI (bad block information), flash updater
> or gang programmer applications need to swap Bad Block Information (BI) data
> to byte 0 of metadata area for every page before programming NAND Flash.
> ROM when loading firmware, copies back
> 
>
I can assure you that the more than 100,000 i.MX28-based modules that we
have sold up to now boot from NAND just fine without any block mark swapping
in the U-Boot pages.


Lothar Waßmann
-- 
___

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | i...@karo-electronics.de
___
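
For reference, the per-page swap that the quoted manual text describes can be sketched as a plain byte exchange between the bad block marker position in the page data and byte 0 of the metadata area. This is a simplified userspace illustration, not the gpmi-nand driver code. The function mock_swap_block_mark() and both offsets are hypothetical, and it assumes a byte-aligned marker position, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Exchange the byte at the bad block marker position (@bi_offset) with
 * byte 0 of the metadata area (@meta_offset) in one page buffer.
 * Both offsets are hypothetical values supplied by the caller.
 * Calling it a second time restores the original layout.
 */
void mock_swap_block_mark(uint8_t *page, size_t page_len,
			  size_t meta_offset, size_t bi_offset)
{
	uint8_t tmp;

	assert(page != NULL);
	assert(meta_offset < page_len && bi_offset < page_len);

	tmp = page[bi_offset];
	page[bi_offset] = page[meta_offset];
	page[meta_offset] = tmp;
}
```

In the manual's description a writer applies the swap before programming each page and the reader applies it again when loading, so the factory marker position is never overwritten by payload data.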