date:20120815

Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation

2012-08-15 Thread Jussi Kivilinna

Quoting Johannes Goetzfried:



This patch adds an x86_64/avx assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (two 4-block
chunk AVX operations). The table lookups are done in general-purpose
registers.

For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way
module are called. A good performance increase is provided for blocksizes
greater than or equal to 128B.
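The description above implies a width-based fallback: 128B is exactly eight 16-byte Twofish blocks, so smaller requests cannot use the 8-way AVX path at all. The sketch below shows one plausible greedy dispatch under that assumption. enc_8way, enc_3way and enc_1way are hypothetical stand-ins (they only count calls), not functions from the patch, and the kernel's glue code may structure the fallbacks differently.

```c
#include <assert.h>
#include <stddef.h>

/* Twofish operates on 16-byte blocks, so 8 blocks = 128 bytes. */
#define TF_BLOCK_SIZE 16

/* Hypothetical call counters standing in for the real cipher routines. */
static int calls_8way, calls_3way, calls_1way;

static void enc_8way(void) { calls_8way++; } /* stand-in for the AVX path */
static void enc_3way(void) { calls_3way++; } /* stand-in for the 3-way code */
static void enc_1way(void) { calls_1way++; } /* stand-in for one block */

/*
 * Greedy dispatch: use the widest routine while enough whole blocks
 * remain, then fall back to narrower ones for the tail.
 * Returns the number of leftover bytes (always less than one block).
 */
static size_t tf_dispatch(size_t nbytes)
{
	while (nbytes >= 8 * TF_BLOCK_SIZE) {
		enc_8way();
		nbytes -= 8 * TF_BLOCK_SIZE;
	}
	while (nbytes >= 3 * TF_BLOCK_SIZE) {
		enc_3way();
		nbytes -= 3 * TF_BLOCK_SIZE;
	}
	while (nbytes >= TF_BLOCK_SIZE) {
		enc_1way();
		nbytes -= TF_BLOCK_SIZE;
	}
	assert(nbytes < TF_BLOCK_SIZE);
	return nbytes;
}
```

With this scheme a 64-byte request never reaches the AVX code (one 3-way call plus one single-block call), which matches the note that only requests of 128B and up benefit.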

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

Intel Core i5-2500 CPU (fam:6, model:42, step:7)


I started thinking about the performance on AMD Bulldozer.  
vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
on AMD CPUs are a lot slower (latencies from 8 to 12 cycles) than on
Intel Sandy Bridge (where these instructions have a latency of 1 to 2). See:
http://www.agner.org/optimize/instruction_tables.pdf


It would be really good if the implementation could be tested on an AMD
CPU to determine whether it causes a performance regression. However, I
don't have access to a machine with such a CPU.


-Jussi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] iio: adc: add new lp8788 adc driver

2012-08-15 Thread Lars-Peter Clausen

On 08/10/2012 09:06 AM, Kim, Milo wrote:
> [...]
> + switch (mask) {
> + case IIO_CHAN_INFO_RAW:
> + *val = result;
> + return IIO_VAL_INT;
> + case IIO_CHAN_INFO_SCALE:
> + *val = adc_const[id] * ((result * 1000 + 500) / 1000);

This looks wrong. The IIO_CHAN_INFO_SCALE attribute is the factor by which
IIO_CHAN_INFO_RAW needs to be multiplied to get the value in the proper unit,
which is specified in the IIO ABI spec, e.g. millivolts for voltages.

What you return here seems to be the IIO_CHAN_INFO_PROCESSED attribute, which
is basically raw * scale.
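To make the distinction concrete: SCALE is a fixed per-channel factor, and the processed value is derived as raw * scale. A minimal userspace sketch, assuming a hypothetical 12-bit channel with a 5000 mV full-scale range (these numbers are illustrative, not from the lp8788 driver); struct scale_micro only imitates the val/val2 pair returned with IIO_VAL_INT_PLUS_MICRO.

```c
#include <assert.h>

/* Mimics the IIO_VAL_INT_PLUS_MICRO pair: value = val + val2 / 1e6. */
struct scale_micro {
	int val;  /* integer part */
	int val2; /* fractional part, in micro-units */
};

/*
 * Hypothetical channel: 12-bit ADC spanning 0..5000 mV, so one LSB is
 * 5000 / 4096 = 1.220703125 mV. SCALE is this constant factor and does
 * not depend on the current raw reading.
 */
static struct scale_micro example_scale(void)
{
	struct scale_micro s = { 1, 220703 }; /* 1.220703 mV per LSB */
	return s;
}

/* PROCESSED = RAW * SCALE, kept in integers (units of 1e-6 mV, i.e. nV). */
static long long processed_nv(int raw, struct scale_micro s)
{
	long long micro = (long long)s.val * 1000000LL + s.val2;
	return (long long)raw * micro;
}
```

A full-scale reading of 4096 gives 4999999488 nV, just under 5000 mV because of the truncated fractional part; the driver would report the constant {1, 220703} for SCALE regardless of what the raw value is.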

> + *val2 = 0;
> + return IIO_VAL_INT_PLUS_MICRO;
> + default:
> + break;
> + }
> +
> +err:
> + return -EINVAL;
> +}
> +
> [...]
> +}
> +
> +static struct iio_chan_spec lp8788_adc_channels[] = {

const

> + [LPADC_VBATT_5P5] = LP8788_CHAN(VBATT_5P5, IIO_VOLTAGE),
> + [LPADC_VIN_CHG]   = LP8788_CHAN(VIN_CHG, IIO_VOLTAGE),
> + [LPADC_IBATT] = LP8788_CHAN(IBATT, IIO_CURRENT),
> + [LPADC_IC_TEMP]   = LP8788_CHAN(IC_TEMP, IIO_TEMP),
> + [LPADC_VBATT_6P0] = LP8788_CHAN(VBATT_6P0, IIO_VOLTAGE),
> + [LPADC_VBATT_5P0] = LP8788_CHAN(VBATT_5P0, IIO_VOLTAGE),
> + [LPADC_ADC1]  = LP8788_CHAN(ADC1, IIO_VOLTAGE),
> + [LPADC_ADC2]  = LP8788_CHAN(ADC2, IIO_VOLTAGE),
> + [LPADC_VDD]   = LP8788_CHAN(VDD, IIO_VOLTAGE),
> + [LPADC_VCOIN] = LP8788_CHAN(VCOIN, IIO_VOLTAGE),
> + [LPADC_VDD_LDO]   = LP8788_CHAN(VDD_LDO, IIO_VOLTAGE),
> + [LPADC_ADC3]  = LP8788_CHAN(ADC3, IIO_VOLTAGE),
> + [LPADC_ADC4]  = LP8788_CHAN(ADC4, IIO_VOLTAGE),
> +};
> +



Re: [PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-08-15 Thread Mel Gorman

On Tue, Aug 14, 2012 at 05:00:49PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:35:25PM +0300, Michael S. Tsirkin wrote:
> > > > > +/* __isolate_lru_page() counterpart for a ballooned page */
> > > > > +bool isolate_balloon_page(struct page *page)
> > > > > +{
> > > > > + if (WARN_ON(!movable_balloon_page(page)))
> > > > 
> > > > Looks like this actually can happen if the page is leaked
> > > > between previous movable_balloon_page and here.
> > > > 
> > > > > + return false;
> > > 
> > > Yes, it surely can happen, and it does no harm to catch it here, print a
> > > warning and return.
> > 
> > If it is legal, why warn? For that matter why test here at all?
> >
> 
> As this is a public symbol, and although the usage we introduce is sane, the
> warn was placed as an insurance policy to let us know about any insane attempt
> to use the procedure in the future. That was due to a nice review nitpick,
> actually.
> 
> Even though the code already had a test to properly avoid this race you
> mention, I thought that keeping the warn was a good thing. As I told you,
> although the race is real, I've never been (un)lucky enough to stumble
> across that race window while testing the patch.
> 
> If your concern is about being too much verbose on logging, under certain
> conditions, perhaps we can change that test to a WARN_ON_ONCE() ?
> 
> Mel, what are your thoughts here?
>  

I viewed it as being defensive programming. VM_BUG_ON would be less
useful as it can be compiled out. If the race can be routinely hit then
multiple warnings are instructive in themselves. I have no strong feelings
about this though. I see little harm in making the check, but in light of
this conversation I would add a short comment explaining that the check
should be redundant.
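Assuming the check is kept, switched to WARN_ON_ONCE() as Rafael floated, and given the short comment Mel suggests, the shape would be roughly as below. This is a userspace model only: the WARN_ON_ONCE macro here imitates the kernel macro's behaviour (evaluate every time, print once per call site) using a GNU C statement expression, and struct page / isolate_page are simplified stand-ins, not the patch's isolate_balloon_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace imitation of WARN_ON_ONCE(): returns the condition, warns once. */
#define WARN_ON_ONCE(cond) ({                                   \
	static bool warned_;                                    \
	bool ret_ = !!(cond);                                   \
	if (ret_ && !warned_) {                                 \
		warned_ = true;                                 \
		fprintf(stderr, "WARNING: %s:%d: %s\n",         \
			__FILE__, __LINE__, #cond);             \
	}                                                       \
	ret_;                                                   \
})

/* Hypothetical page model: only the flag the check looks at. */
struct page { bool is_balloon; };

static bool isolate_page(struct page *page)
{
	/*
	 * Callers have already checked this, so the test should be
	 * redundant; it only guards against future misuse of the symbol.
	 */
	if (WARN_ON_ONCE(!page->is_balloon))
		return false;
	/* ... isolation work would go here ... */
	return true;
}
```

Repeated bad calls still fail safely, but only the first one is logged, which addresses the verbosity concern without dropping the defensive check.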

-- 
Mel Gorman
SUSE Labs

Re: [PATCH] mtd: kill MTD_NAND_VERIFY_WRITE

2012-08-15 Thread Huang Shijie


On 08/15/2012 15:06, Shmulik Ladkani wrote:

Hi Huang,

On Tue, 14 Aug 2012 22:38:45 -0400 Huang Shijie wrote:

diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
index 588e989..0ca7257 100644
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@ -22,15 +22,6 @@ menuconfig MTD_NAND

  if MTD_NAND

-config MTD_NAND_VERIFY_WRITE
-   bool "Verify NAND page writes"
-   help
- This adds an extra check when data is written to the flash. The
- NAND flash device internally checks only bits transitioning
- from 1 to 0. There is a rare possibility that even though the
- device thinks the write was successful, a bit could have been
- flipped accidentally due to device wear or something else.
-

There are some defconfig files which set CONFIG_MTD_NAND_VERIFY_WRITE.

I guess you should submit an accompanying patch that removes
CONFIG_MTD_NAND_VERIFY_WRITE from all defconfig files.


thanks a lot.

I will send out a separate patch to fix it.

Huang Shijie

(also, trimmed the CC list for this specific discussion, seems unrelated
to all of the parties)

Regards,
Shmulik

__
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/





RE: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-15 Thread David Laight

> Yes hash_32 seems reasonable for the uid hash.   With those long hash
> chains I wouldn't like to be on a machine with 10,000 processes, each
> with a different uid, and a process calling setuid in the fast
> path.
> 
> The uid hash that we are playing with is one that I sort of wish that
> the hash table could grow in size, so that we could scale up better.

Since uids are likely to be allocated in dense blocks, maybe an
unhashed multi-level lookup scheme might be appropriate.

Index an array with the low 8 (say) bits of the uid.
Each item can be either:  
  1) NULL => free entry.
  2) a pointer to a uid structure (check uid value).
  3) a pointer to an array to index with the next 8 bits.
(2) and (3) can be differentiated by the low address bit.
I think that is updateable with cmpxchg.

Clearly this is a bad algorithm if uids are all multiples of 2^24
but that is true of any hash function.
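One way to read the proposal as code: a radix-style table indexed by successive 8-bit chunks of the uid, where the low pointer bit distinguishes a sub-table from a leaf. This is a single-threaded userspace sketch with hypothetical names (uid_table, uid_entry, uid_insert, uid_lookup). It does not attempt the cmpxchg-based update mentioned above, and it assumes leaf and table pointers are at least 2-byte aligned so the low bit is free for tagging.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 8
#define FANOUT (1u << LEVEL_BITS)
#define TAG_TABLE 1u /* low pointer bit set: slot points at a sub-table */

struct uid_entry { uint32_t uid; };            /* hypothetical payload */
struct uid_table { uintptr_t slot[FANOUT]; };  /* 0 means free */

static int is_table(uintptr_t v) { return (v & TAG_TABLE) != 0; }
static struct uid_table *as_table(uintptr_t v)
{
	return (struct uid_table *)(v & ~(uintptr_t)TAG_TABLE);
}

/* Walk the levels, consuming LEVEL_BITS low-order bits of the uid each time. */
static struct uid_entry *uid_lookup(struct uid_table *t, uint32_t uid)
{
	uint32_t key = uid;

	for (;;) {
		uintptr_t v = t->slot[key & (FANOUT - 1)];
		if (v == 0)
			return NULL;                     /* (1) free entry */
		if (!is_table(v)) {
			struct uid_entry *e = (struct uid_entry *)v;
			return e->uid == uid ? e : NULL; /* (2) check uid value */
		}
		t = as_table(v);                         /* (3) next 8 bits */
		key >>= LEVEL_BITS;
	}
}

/* Single-threaded insert; splits a leaf into a sub-table on collision. */
static int uid_insert(struct uid_table *t, struct uid_entry *e)
{
	unsigned shift = 0;

	for (;;) {
		uintptr_t *slot = &t->slot[(e->uid >> shift) & (FANOUT - 1)];
		if (*slot == 0) {
			*slot = (uintptr_t)e;
			return 0;
		}
		if (!is_table(*slot)) {
			struct uid_entry *old = (struct uid_entry *)*slot;
			struct uid_table *sub;

			if (old->uid == e->uid)
				return -1;      /* already present */
			if (shift + LEVEL_BITS >= 32)
				return -1;      /* defensive: no bits left */
			sub = calloc(1, sizeof(*sub));
			if (!sub)
				return -1;
			sub->slot[(old->uid >> (shift + LEVEL_BITS)) & (FANOUT - 1)] =
				(uintptr_t)old;
			*slot = (uintptr_t)sub | TAG_TABLE;
		}
		t = as_table(*slot);
		shift += LEVEL_BITS;
	}
}
```

Densely allocated uids mostly differ in their low bits, so in this model they stay in the top-level array and a lookup costs a single indexed load plus the uid comparison.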

David




Re: [PATCH] drivers/iio/adc/at91_adc.c: use devm_ functions

2012-08-15 Thread Lars-Peter Clausen

On 08/14/2012 10:32 PM, Jonathan Cameron wrote:
> Lars-Peter,
> 
> Are you happy with this updated version?  Can't immediately find any response
> from you to it.
> 

I think it is ok, you can add my
Reviewed-by: Lars-Peter Clausen .

One minor nitpick though.

> Jonathan
>> From: Julia Lawall 
>>
>> The various devm_ functions allocate memory that is released when a driver
>> detaches.  This patch uses these functions for data that is allocated in
>> the probe function of a platform device and is only freed in the remove
>> function.
>>
>> The call to platform_get_resource(pdev, IORESOURCE_MEM, 0) is moved closer
>> to the call to devm_request_and_ioremap, which is the first use of the
>> result of platform_get_resource.
>>
>> This does not use devm_request_irq to ensure that free_irq is executed
>> before its idev argument is freed.
>>
>> Signed-off-by: Julia Lawall 
>>
>> ---
>>  drivers/iio/adc/at91_adc.c |   41 -
>>  1 file changed, 8 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c
>> index f61780a..3506e3d 100644
>> --- a/drivers/iio/adc/at91_adc.c
>> +++ b/drivers/iio/adc/at91_adc.c
>> @@ -545,13 +545,6 @@ static int __devinit at91_adc_probe(struct platform_device *pdev)
>>  goto error_free_device;
>>  }
>>
>> -res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -if (!res) {
>> -dev_err(&pdev->dev, "No resource defined\n");
>> -ret = -ENXIO;
>> -goto error_ret;
>> -}
>> -
>>  platform_set_drvdata(pdev, idev);
>>
>>  idev->dev.parent = &pdev->dev;
>> @@ -566,18 +559,13 @@ static int __devinit at91_adc_probe(struct platform_device *pdev)
>>  goto error_free_device;
>>  }
>>
>> -if (!request_mem_region(res->start, resource_size(res),
>> -"AT91 adc registers")) {
>> -dev_err(&pdev->dev, "Resources are unavailable.\n");
>> -ret = -EBUSY;
>> -goto error_free_device;
>> -}
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>
>> -st->reg_base = ioremap(res->start, resource_size(res));
>> +st->reg_base = devm_request_and_ioremap(&pdev->dev, res);
>>  if (!st->reg_base) {
>>  dev_err(&pdev->dev, "Failed to map registers.\n");

devm_request_and_ioremap will already print an error message on its own if
something goes wrong. So strictly speaking this one is redundant, but I don't
think it is necessary to do a resend just for this; maybe you can remove the
extra dev_err when you apply the patch.
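The devm_ idea in the quoted patch description (resources released automatically when the driver detaches) can be modelled as a per-device list of release callbacks run in reverse order of acquisition. This is only a userspace illustration with hypothetical names (struct devres, dev_manage, dev_detach, record_release); the kernel's devres code is more involved. In the driver core, managed releases run after the remove function returns, which is why the patch keeps a plain request_irq(): free_irq() must run before idev is freed in remove.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of device-managed resources. */
struct devres {
	struct devres *next;
	void (*release)(void *data);
	void *data;
};

struct device {
	struct devres *res; /* most recently acquired first */
};

/* Register a resource; it is released automatically on detach. */
static int dev_manage(struct device *dev, void *data, void (*release)(void *))
{
	struct devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return -1;
	dr->data = data;
	dr->release = release;
	dr->next = dev->res;
	dev->res = dr;
	return 0;
}

/* On detach, release everything in reverse order of acquisition. */
static void dev_detach(struct device *dev)
{
	while (dev->res) {
		struct devres *dr = dev->res;

		dev->res = dr->next;
		dr->release(dr->data);
		free(dr);
	}
}

/* Example release hook that records the order of releases. */
static int release_log[8];
static int release_count;

static void record_release(void *data)
{
	release_log[release_count++] = *(int *)data;
}
```

Releasing in reverse order means a resource acquired later (for example a mapping that depends on an earlier allocation) is torn down first.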

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman

On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > What I think you should do is use rcu for access.
> > And here sync rcu before freeing.
> > Maybe overkill, but at least a documented synchronization
> > primitive, and it is very lightweight.
> > 
> 
> I liked your suggestion on barriers, as well.
> 

I have not thought about this as deeply as I should, but is simply rechecking
the mapping under the pages_lock to make sure the page is still a balloon
page an option? i.e. use pages_lock to stabilise page->mapping.
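The suggestion above is the usual check, lock, re-check pattern. A userspace sketch under stated assumptions: the spinlock is a toy built on C11 atomic_flag, struct page, try_isolate and leak_page are hypothetical, and it assumes every path that changes page->mapping also takes pages_lock, which is what makes the second check meaningful. The unlocked first read is an atomic load so the model is free of data races.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy test-and-set spinlock on C11 atomic_flag. */
typedef struct { atomic_flag f; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* spin */
}

static void spin_unlock(spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

/* Hypothetical page whose mapping may be cleared concurrently. */
struct page { _Atomic(void *) mapping; };

static spinlock_t pages_lock = { ATOMIC_FLAG_INIT };
static int balloon_mapping_token; /* stands in for the balloon's mapping */
static int isolated;

static bool page_is_balloon(struct page *p)
{
	return atomic_load_explicit(&p->mapping, memory_order_relaxed) ==
	       (void *)&balloon_mapping_token;
}

/* Cheap unlocked filter, then a re-check while mapping cannot change. */
static bool try_isolate(struct page *p)
{
	bool ok;

	if (!page_is_balloon(p))
		return false;
	spin_lock(&pages_lock);
	ok = page_is_balloon(p);
	if (ok)
		isolated++; /* isolation work would happen here */
	spin_unlock(&pages_lock);
	return ok;
}

/* Writer side: clearing the mapping also happens under pages_lock. */
static void leak_page(struct page *p)
{
	spin_lock(&pages_lock);
	atomic_store_explicit(&p->mapping, NULL, memory_order_relaxed);
	spin_unlock(&pages_lock);
}
```

The unlocked test only filters out obvious non-balloon pages; the decision that matters is the one made while holding the lock.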

-- 
Mel Gorman
SUSE Labs

Re: [ 10/65] ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+

2012-08-15 Thread Will Deacon

Hi Ben,

On Wed, Aug 15, 2012 at 05:29:26AM +0100, Ben Hutchings wrote:
> On Mon, 2012-08-13 at 15:13 -0700, Greg Kroah-Hartman wrote:
> > From: Greg KH 
> > 
> > 3.4-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Will Deacon 
> > 
> > commit a76d7bd96d65fa5119adba97e1b58d95f2e78829 upstream.
> > 
> > The open-coded mutex implementation for ARMv6+ cores suffers from a
> > severe lack of barriers, so in the uncontended case we don't actually
> > protect any accesses performed during the critical section.
> > 
> > Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
> > code but optimised to remove a branch instruction, as the mutex fastpath
> > was previously inlined. Now that this is executed out-of-line, we can
> > reuse the atomic access code for the locking (in fact, we use the xchg
> > code as this produces shorter critical sections).
> > 
> > This patch uses the generic xchg based implementation for mutexes on
> > ARMv6+, which introduces barriers to the lock/unlock operations and also
> > has the benefit of removing a fair amount of inline assembly code.
> [...]
> 
> I understand that a further fix is needed on top of this
>  but it's
> not in Linus's tree yet.  Is it better to apply this on its own or to
> wait for the complete fix?

The additional patch should also be CC'd to stable and is sitting in -tip
somewhere I believe, so it shouldn't be long before it does hit mainline.

Without this patch there's a memory-ordering bug (which we seem to have hit
once in > 5 years). With the patch there's a mutex lockup issue on SMP systems
that I can provoke with enough hackbenching, so you may want to hold off for
now.
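For readers unfamiliar with the generic xchg-based scheme the quoted commit switches to, a simplified C11 model follows. It assumes the counter convention of the kernel's generic mutex code (1 unlocked, 0 locked, negative locked with possible waiters); struct xmutex and the slow-path stubs are hypothetical and only count calls. It deliberately does not model waiters or the further fix discussed in this thread. atomic_exchange defaults to sequentially consistent ordering, so it also provides the ordering around the critical section that the commit message says the old open-coded version lacked.

```c
#include <assert.h>
#include <stdatomic.h>

/* 1 = unlocked, 0 = locked, negative = locked with possible waiters. */
struct xmutex { atomic_int count; };

static int slowpath_lock_calls, slowpath_unlock_calls;

/* Hypothetical slow paths; a real mutex would queue, sleep and wake here. */
static void lock_slowpath(struct xmutex *m) { (void)m; slowpath_lock_calls++; }
static void unlock_slowpath(struct xmutex *m) { (void)m; slowpath_unlock_calls++; }

/* Fast path: swap in 0; anything other than 1 means it was already held. */
static void xmutex_lock(struct xmutex *m)
{
	if (atomic_exchange(&m->count, 0) != 1)
		lock_slowpath(m);
}

/* Swap in 1; anything other than 0 means there may be waiters to wake. */
static void xmutex_unlock(struct xmutex *m)
{
	if (atomic_exchange(&m->count, 1) != 0)
		unlock_slowpath(m);
}
```

In the uncontended case each operation is a single exchange, which is the "shorter critical sections" argument in the commit message.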

Will

Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Glauber Costa

On 08/14/2012 07:16 PM, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote:
>> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
>> page allocator will call the corresponding memcg functions to validate
>> the allocation. Tasks in the root memcg can always proceed.
>>
>> To avoid adding markers to the page - and a kmem flag that would
>> necessarily follow, as much as doing page_cgroup lookups for no reason,
> 
> As you already guessed, doing a page_cgroup in the page allocator free
> path would be a no-go.

Specifically yes, but in general, you will be able to observe that I am
taking all the possible measures to make sure existing paths are
disturbed as little as possible.

Thanks for your review here

>>  
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b956cec..da341dc 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>  struct page *page = NULL;
>>  int migratetype = allocflags_to_migratetype(gfp_mask);
>>  unsigned int cpuset_mems_cookie;
>> +void *handle = NULL;
>>  
>>  gfp_mask &= gfp_allowed_mask;
>>  
>> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>  return NULL;
>>  
>>  /*
>> + * Will only have any effect when __GFP_KMEMCG is set.
>> + * This is verified in the (always inline) callee
>> + */
>> +if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> 
> memcg_kmem_new_page takes a void * parameter already but here you are
> passing in a void **. This probably happens to work because you do this
> 
> struct mem_cgroup **handle = (struct mem_cgroup **)_handle;
> 
> but that appears to defeat the purpose of having an opaque type as a
> "handle". You have to treat it differently when passing it into the commit
> function because it expects a void *. The motivation for an opaque type
> is completely unclear to me and how it is managed with a mix of void *
> and void ** is very confusing.

okay.

The opaque handle exists because I am doing speculative charging. I believe it
to be a better and less complicated approach than letting a page appear
and then charging it. Besides being consistent with the rest of memcg,
it won't create unnecessary disturbance in the page allocator
when the allocation is to fail.

Now, tasks can move between memcgs, so we can't rely on grabbing it from
current in commit_page, so we pass it around as a handle. Also, even if
the task could not move, we already got it once from the task, and that
is not for free. Better save it.

Aside from the handle needed, the cost is more or less the same compared
to doing it in one pass. All we do by using speculative charging is to
split the cost in two and do it from two places.
We'd have to charge + update page_cgroup anyway.

As for the type, do you think using struct mem_cgroup would be less
confusing?

> On a similar note I spotted #define memcg_kmem_on 1 . That is also
> different just for the sake of it. The convention is to do something
> like this
> 
> /* This helps us to avoid #ifdef CONFIG_NUMA */
> #ifdef CONFIG_NUMA
> #define NUMA_BUILD 1
> #else
> #define NUMA_BUILD 0
> #endif

For simple defines, yes. But a later patch will turn this into a static
branch test. memcg_kmem_on will always be 0 when compile-disabled, but
when enabled will expand to static_branch(&...).
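The convention Mel quotes lets an ordinary C conditional replace an #ifdef block. A minimal sketch with a hypothetical CONFIG_EXAMPLE_KMEM switch and hypothetical helper names; per the reply above, memcg_kmem_on itself is meant to become a static branch later, which this does not model.

```c
#include <assert.h>

/* Hypothetical config switch; in the kernel this would come from Kconfig. */
#ifdef CONFIG_EXAMPLE_KMEM
#define EXAMPLE_KMEM_BUILD 1
#else
#define EXAMPLE_KMEM_BUILD 0
#endif

static int charged;

static void charge_kmem(void) { charged++; }

/*
 * A plain if on a 0/1 constant instead of #ifdef: the guarded code is
 * always parsed and type-checked, and the optimizer drops the dead branch.
 */
static void maybe_charge(void)
{
	if (EXAMPLE_KMEM_BUILD)
		charge_kmem();
}
```

The benefit over #ifdef is that a compile-disabled build still catches type errors in the guarded code.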

> memcg_kmem_on was difficult to guess based on its name. I thought initially
> that it would only be active if a memcg existed or at least something like
> mem_cgroup_disabled() but it's actually enabled if CONFIG_MEMCG_KMEM is set.

For now. And I thought that adding the static branch in this patch would
only confuse matters. The placeholder is there, but it is later patched
to the final thing.

With that explained, if you want me to change it to something else, I
can do it. Should I ?

> I also find it *very* strange to have a function named as if it is an
> allocation-style function when in fact it's looking up a mem_cgroup
> and charging it (and uncharging it in the error path if necessary). If
> it was called memcg_kmem_newpage_charge I might have found it a little
> better.

I don't feel strongly about names in general. I can change it.
Will update to memcg_kmem_newpage_charge() and memcg_kmem_page_uncharge().

> This whole operation also looks very expensive (cgroup lookups, RCU locks
> taken etc) but I guess you're willing to take that cost in the name of
> isolating containers from each other. However, I strongly suggest that
> this overhead is measured in advance. It should not stop the series being
> merged as such but it should be understood because if the cost is high
> then this feature will be avoided like the plague. I am skeptical that
> distributions would enable this by default, at least not without support
> for cgroup_disable=kmem

Enabling this feature will bring you nothing, therefore, no (or

Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)

2012-08-15 Thread Lukáš Czerner

On Thu, 9 Aug 2012, Theodore Ts'o wrote:

> Date: Thu, 9 Aug 2012 13:06:40 -0400
> From: Theodore Ts'o 
> To: Lukas Czerner 
> Cc: Paolo Bonzini ,
> "Linux Kernel mailing List"
> , linux-e...@vger.kernel.org
> Subject: Re: ext4fs error
> "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>  (with repro)
> 
> On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > Here is how to reproduce it.  It happens during fstrim.  I found other
> > occurrences of the error in the mailing list, but they were not related
> > to trim so they may be something different.
> > 
> > modprobe scsi_debug dev_size_mb=256 lbpws=1
> > dd if=/dev/zero of=/dev/sdb bs=1M  
> > fdisk /dev/sdb
> >  >> create a new partition accepting all defaults
> > fdisk -lu /dev/sdb|tail -1
> >  >> should show: /dev/sdb1 57  524285  262114+  83  Linux
> > 
> > mkfs.ext4 /dev/sdb1
> > mkdir test
> > mount /dev/sdb1 test
> > fstrim ./test
> 
> I can confirm that this accurately reproduces file system corruption
> using a 3.5 kernel.  It looks like some block allocation bitmap blocks
> are getting trimmed when they shouldn't have been.  Lukas, can you take a
> look at this?
> 
>   - Ted

Hi Ted,

sorry for the delay, I've just got back from my vacation. I'll take
a look at it.

Thanks!
-Lukas

Re: [PATCH] drivers/iio/adc/at91_adc.c: use devm_ functions

2012-08-15 Thread Julia Lawall


> devm_request_and_ioremap will already print an error message on its own if
> something goes wrong. So strictly speaking this one is redundant, but I don't
> think it is necessary to do a resend just for this; maybe you can remove the
> extra dev_err when you apply the patch.


Thanks for pointing that out.  I will get rid of the messages in the 
future.  That seems easier than figuring out how to adapt the message to 
the new function.


julia

[PATCH v4 1/1] HID:hid-multitouch: Add ELAN production request when resume.

2012-08-15 Thread Scott Liu

Add ELAN production request on resume.

Some Elan legacy devices require SET_IDLE to be set on resume.
It should be safe to send it to other devices too.
Tested on 3M, Stantum, Cypress, Zytronic, eGalax, and Elan panels. 


Signed-off-by: Scott Liu 
Suggested-by: Benjamin Tissoires 
---
 drivers/hid/hid-multitouch.c |   27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
index 59c8b5c..e824c37 100644
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -767,6 +767,32 @@ static int mt_reset_resume(struct hid_device *hdev)
mt_set_input_mode(hdev);
return 0;
 }
+
+static int mt_resume(struct hid_device *hdev)
+{
+   struct usb_interface *intf;
+   struct usb_host_interface *interface;
+   struct usb_device *dev;
+
+   if (hdev->bus != BUS_USB)
+   return 0;
+
+   intf = to_usb_interface(hdev->dev.parent);
+   interface = intf->cur_altsetting;
+   dev = hid_to_usb_dev(hdev);
+
+   /* Some Elan legacy devices require SET_IDLE to be set on resume.
+* It should be safe to send it to other devices too.
+* Tested on 3M, Stantum, Cypress, Zytronic, eGalax, and Elan panels. */
+
+   usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
+   HID_REQ_SET_IDLE,
+   USB_TYPE_CLASS | USB_RECIP_INTERFACE,
+   0, interface->desc.bInterfaceNumber,
+   NULL, 0, USB_CTRL_SET_TIMEOUT);
+
+   return 0;
+}
 #endif
 
 static void mt_remove(struct hid_device *hdev)
@@ -1092,6 +1118,7 @@ static struct hid_driver mt_driver = {
.event = mt_event,
 #ifdef CONFIG_PM
.reset_resume = mt_reset_resume,
+   .resume = mt_resume,
 #endif
 };
 
-- 
1.7.9.5
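For reference, the usb_control_msg() call in mt_resume() issues a standard HID class request. The sketch below spells out the 8-byte SETUP packet it corresponds to, using values from the USB 2.0 and HID 1.11 specifications; the EX_ prefix marks these as local stand-ins rather than the kernel's own macros, and build_set_idle is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the USB 2.0 and HID 1.11 specifications. */
#define EX_USB_DIR_OUT         0x00
#define EX_USB_TYPE_CLASS      (0x01 << 5)
#define EX_USB_RECIP_INTERFACE 0x01
#define EX_HID_REQ_SET_IDLE    0x0A

/*
 * Build the SETUP packet for HID SET_IDLE.
 * wValue: high byte = idle duration (4 ms units, 0 = indefinite),
 *         low byte  = report ID (0 = all reports).
 * wIndex: interface number. Multi-byte fields are little-endian.
 */
static void build_set_idle(uint8_t pkt[8], uint8_t duration,
			   uint8_t report_id, uint16_t interface)
{
	uint16_t wvalue = (uint16_t)((duration << 8) | report_id);

	pkt[0] = EX_USB_DIR_OUT | EX_USB_TYPE_CLASS | EX_USB_RECIP_INTERFACE;
	pkt[1] = EX_HID_REQ_SET_IDLE;  /* bRequest */
	pkt[2] = wvalue & 0xff;        /* wValue, low byte */
	pkt[3] = wvalue >> 8;          /* wValue, high byte */
	pkt[4] = interface & 0xff;     /* wIndex, low byte */
	pkt[5] = interface >> 8;       /* wIndex, high byte */
	pkt[6] = 0;                    /* wLength = 0: no data stage */
	pkt[7] = 0;
}
```

With the arguments the patch passes (value 0, the interface number as index, no data), the request sets an indefinite idle rate for all reports on that interface.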


Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa

On 08/14/2012 10:58 PM, Greg Thelen wrote:
> On Mon, Aug 13 2012, Glauber Costa wrote:
> 
> + WARN_ON(mem_cgroup_is_root(memcg));
> + size = (1 << order) << PAGE_SHIFT;
> + memcg_uncharge_kmem(memcg, size);
> + mem_cgroup_put(memcg);
>>> Why do we need ref-counting here ? kmem res_counter cannot work as
>>> reference ?
>> This is of course the pair of the mem_cgroup_get() you commented on
>> earlier. If we need one, we need the other. If we don't need one, we
>> don't need the other =)
>>
>> The guarantee we're trying to give here is that the memcg structure will
>> stay around while there are dangling charges to kmem, that we decided
>> not to move (remember: moving it for the stack is simple, for the slab
>> is very complicated and ill-defined, and I believe it is better to treat
>> all kmem equally here)
> 
> By keeping memcg structures hanging around until the last referring kmem
> page is uncharged do such zombie memcg each consume a css_id and thus
> put pressure on the 64k css_id space?  I imagine in pathological cases
> this would prevent creation of new cgroups until these zombies are
> dereferenced.

Yes, but although this patch makes it more likely, it doesn't introduce
that. If the tasks, for instance, grab a reference to the cgroup dentry
in the filesystem (like their CWD, etc), they will also keep the cgroup
around.
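The lifetime rule discussed here (each outstanding kmem charge pins the memcg, so a removed cgroup lingers as a zombie until its last page is uncharged) can be modelled with a plain reference count. A single-threaded sketch: struct memcg and its helpers are hypothetical, and 4 KiB pages (a page shift of 12) are assumed for the size arithmetic, mirroring size = (1 << order) << PAGE_SHIFT from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, heavily simplified memcg. */
struct memcg {
	long kmem_bytes; /* charged kernel memory */
	int refcnt;      /* 1 for the cgroup itself + 1 per live charge */
	bool removed;    /* cgroup directory already deleted */
	bool freed;      /* stands in for freeing the structure */
};

static void memcg_put(struct memcg *m)
{
	if (--m->refcnt == 0)
		m->freed = true;
}

static void charge_kmem(struct memcg *m, int order)
{
	m->kmem_bytes += (1L << order) << EXAMPLE_PAGE_SHIFT;
	m->refcnt++; /* pin the memcg while this charge is live */
}

static void uncharge_kmem(struct memcg *m, int order)
{
	m->kmem_bytes -= (1L << order) << EXAMPLE_PAGE_SHIFT;
	memcg_put(m);
}

/* Removing the cgroup drops only its own reference. */
static void remove_cgroup(struct memcg *m)
{
	m->removed = true;
	memcg_put(m);
}
```

Between remove_cgroup() and the final uncharge the structure is exactly the kind of zombie Greg asks about: still holding charged bytes, but no longer reachable through the cgroup filesystem.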


> Is there any way to see how much kmem such zombie memcg are consuming?
> I think we could find these with
> for_each_mem_cgroup_tree(root_mem_cgroup).

Yes, just need an interface for that. But I think it is something that
can be addressed orthogonally to this work, in a separate patch, not as
some fundamental limitation.

>  Basically, I'm wanting to
> know where kernel memory has been allocated.  For live memcg, an admin
> can cat memory.kmem.usage_in_bytes.  But for zombie memcg, I'm not sure
> how to get this info.  It looks like the root_mem_cgroup
> memory.kmem.usage_in_bytes is not hierarchically charged.
> 

Not sure what you mean by not being hierarchically charged. It should
be, when use_hierarchy = 1. As a matter of fact, I just tested it, and I
do see kmem being charged all the way to the root cgroup when hierarchy
is used. (we just can't limit it there)



Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Michal Hocko

On Thu 09-08-12 17:01:15, Glauber Costa wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   struct page *page = NULL;
>   int migratetype = allocflags_to_migratetype(gfp_mask);
>   unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>  
>   gfp_mask &= gfp_allowed_mask;
>  
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   return NULL;
>  
>   /*
> +  * Will only have any effect when __GFP_KMEMCG is set.
> +  * This is verified in the (always inline) callee
> +  */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> + return NULL;

When the previous patch introduced this function I thought the handle
obfuscation was meant to keep struct mem_cgroup from spreading inside the
page allocator, but memcg_kmem_commit_page uses the type directly. So why
the obfuscation? Even "handle" as a name sounds unnecessarily confusing.
I would go with struct mem_cgroup **memcgp or even return the pointer on
success or NULL otherwise.
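A sketch of the typed alternative suggested here: a struct mem_cgroup ** out-parameter instead of a void * handle, so no casts are needed between the charge and commit steps. All names are hypothetical (loosely following the memcg_kmem_newpage_charge naming agreed on in the thread), the flag value is invented, usage is counted in pages, and unlike the real code this charge never fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real struct mem_cgroup is far larger. */
struct mem_cgroup { long kmem_usage; };
#define EXAMPLE_GFP_KMEMCG 0x1u

static struct mem_cgroup current_memcg; /* the "current task's" memcg */

/*
 * Typed out-parameter: on return *memcgp is NULL (nothing accounted)
 * or the memcg that was speculatively charged.
 */
static bool kmem_newpage_charge(unsigned gfp, struct mem_cgroup **memcgp,
				int order)
{
	*memcgp = NULL;
	if (!(gfp & EXAMPLE_GFP_KMEMCG))
		return true; /* not accounted, allocation proceeds */
	current_memcg.kmem_usage += 1L << order;
	*memcgp = &current_memcg;
	return true;
}

/* Undo the speculative charge if the page allocation itself failed. */
static void kmem_commit(struct mem_cgroup *memcg, void *page, int order)
{
	if (memcg && !page)
		memcg->kmem_usage -= 1L << order;
}
```

The caller keeps a struct mem_cgroup * local and passes its address to the charge step and its value to the commit step, so the void * / void ** mismatch Mel pointed out cannot arise.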

[...]
> +EXPORT_SYMBOL(__free_accounted_pages);

Why exported?

Btw. this is called from call_rcu context but it itself calls call_rcu
down the chain in mem_cgroup_put. Is it safe?

[...]
> +EXPORT_SYMBOL(free_accounted_pages);

here again
-- 
Michal Hocko
SUSE Labs

[patch 0/8] procfs, fdinfo updated

2012-08-15 Thread Cyrill Gorcunov

Hi guys,

here is an updated series. As discussed with Al,
the fdinfo helper is now provided via file_operations. Also
I've dropped the CONFIG_CHECKPOINT_RESTORE wrap from inside
particular subsystems, so this new feature will be
available by default. I've tested the whole series, but
additional review would be appreciated.

Please tell me what you think.

Cyrill

[patch 6/8] fs, eventfd: Add procfs fdinfo helper

2012-08-15 Thread Cyrill Gorcunov

This allows us to print out the raw counter value.
The /proc/pid/fdinfo/fd output is

 | pos: 0
 | flags:   04002
 | eventfd-count:   5a
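A consumer of this output (for example a checkpoint/restore tool) only needs to find the eventfd-count: line and read its value as hex, since the helper prints the counter with %16llx. A minimal userspace parser sketch; fdinfo_get_hex is a hypothetical helper, and the sample text used with it is modelled on the output shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "key:" at the start of a line in fdinfo-style text and parse the
 * value after it as hexadecimal. Returns 0 on success, -1 if absent.
 */
static int fdinfo_get_hex(const char *text, const char *key,
			  unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return sscanf(line + klen + 1, "%llx", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++; /* skip to the start of the next line */
	}
	return -1;
}
```

sscanf skips the padding produced by the %16llx field width, so the parser does not depend on the exact column layout.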

Signed-off-by: Cyrill Gorcunov 
CC: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/eventfd.c |   20 
 1 file changed, 20 insertions(+)

Index: linux-2.6.git/fs/eventfd.c
===
--- linux-2.6.git.orig/fs/eventfd.c
+++ linux-2.6.git/fs/eventfd.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct eventfd_ctx {
struct kref kref;
@@ -284,7 +286,25 @@ static ssize_t eventfd_write(struct file
return res;
 }
 
+#ifdef CONFIG_PROC_FS
+static int eventfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct eventfd_ctx *ctx = f->private_data;
+   int ret;
+
+   spin_lock_irq(&ctx->wqh.lock);
+   ret = seq_printf(m, "eventfd-count: %16llx\n",
+(unsigned long long)ctx->count);
+   spin_unlock_irq(&ctx->wqh.lock);
+
+   return ret;
+}
+#endif
+
 static const struct file_operations eventfd_fops = {
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo= eventfd_show_fdinfo,
+#endif
.release= eventfd_release,
.poll   = eventfd_poll,
.read   = eventfd_read,


[patch 5/8] fs, notify: Add procfs fdinfo helper v3

2012-08-15 Thread Cyrill Gorcunov

This allows us to print out fsnotify details such as
the watchee inode, device, mask and file handle.

For example for inotify objects the output is

 | pos: 0
 | flags:   0200
 | inotify wd:3 ino: 9e7e sdev:   800013 mask:  800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle: 7e9e640d1b6d
 | inotify wd:2 ino: a111 sdev:   800013 mask:  800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle: 11a120542153
 | inotify wd:1 ino:6b149 sdev:   800013 mask:  800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle: 49b1060023552153

For fanotify it is like

 | pos: 0
 | flags:   02
 | fanotify ino:68f71 sdev:   800013 mask:1 ignored_mask: 4000 fhandle-bytes:8 fhandle-type:1 f_handle: 718f0600b9f42053
 | fanotify mnt_id:   13 mask:1 ignored_mask: 4000

To minimize impact on general fsnotify code the new functionality
is gathered in fs/notify/fdinfo.c file.
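The fhandle-bytes:8 fhandle-type:1 fields in the sample output follow from the encoding used when CONFIG_EXPORTFS is set: a FILEID_INO32_GEN handle is two 32-bit words (inode number and generation), and the size is tracked in 32-bit words before being converted to bytes. A userspace sketch of that arithmetic; the EX_-prefixed types and ex_encode_ino32_gen are hypothetical stand-ins, although FILEID_INO32_GEN is indeed 1 in the kernel's exportfs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FILEID_INO32_GEN: 32-bit inode number + 32-bit generation. */
#define EX_FILEID_INO32_GEN 1

struct ex_fid_i32 {
	uint32_t ino;
	uint32_t gen;
};

struct ex_handle {
	int handle_type;
	unsigned handle_bytes;
	unsigned char f_handle[sizeof(struct ex_fid_i32)];
};

/* Encode ino/gen; size is counted in 32-bit words, then turned into bytes. */
static void ex_encode_ino32_gen(struct ex_handle *h, uint32_t ino, uint32_t gen)
{
	struct ex_fid_i32 fid = { ino, gen };
	int words = sizeof(fid) >> 2; /* 2 words */

	memcpy(h->f_handle, &fid, sizeof(fid));
	h->handle_type = EX_FILEID_INO32_GEN;
	h->handle_bytes = words * sizeof(uint32_t); /* 8 bytes */
}
```

Two words of four bytes each give the 8 bytes reported for every mark in the sample.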

v2:
 - append missing colons to terms
v3:
 - continue from previous position in list on ->next

Signed-off-by: Cyrill Gorcunov 
CC: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/notify/Makefile |2 
 fs/notify/fanotify/fanotify_user.c |4 
 fs/notify/fdinfo.c |  167 +
 fs/notify/fdinfo.h |   22 
 fs/notify/inotify/inotify_user.c   |4 
 5 files changed, 198 insertions(+), 1 deletion(-)

Index: linux-2.6.git/fs/notify/Makefile
===
--- linux-2.6.git.orig/fs/notify/Makefile
+++ linux-2.6.git/fs/notify/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_FSNOTIFY) += fsnotify.o notification.o group.o inode_mark.o \
-  mark.o vfsmount_mark.o
+  mark.o vfsmount_mark.o fdinfo.o
 
 obj-y  += dnotify/
 obj-y  += inotify/
Index: linux-2.6.git/fs/notify/fanotify/fanotify_user.c
===
--- linux-2.6.git.orig/fs/notify/fanotify/fanotify_user.c
+++ linux-2.6.git/fs/notify/fanotify/fanotify_user.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "../../mount.h"
+#include "../fdinfo.h"
 
 #define FANOTIFY_DEFAULT_MAX_EVENTS	16384
 #define FANOTIFY_DEFAULT_MAX_MARKS 8192
@@ -446,6 +447,9 @@ static long fanotify_ioctl(struct file *
 }
 
 static const struct file_operations fanotify_fops = {
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo= fanotify_show_fdinfo,
+#endif
.poll   = fanotify_poll,
.read   = fanotify_read,
.write  = fanotify_write,
Index: linux-2.6.git/fs/notify/fdinfo.c
===
--- /dev/null
+++ linux-2.6.git/fs/notify/fdinfo.c
@@ -0,0 +1,167 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "inotify/inotify.h"
+#include "../fs/mount.h"
+
+struct inode_file_handle {
+   struct file_handle  h;
+   struct fid  fid;
+} __packed;
+
+#if defined(CONFIG_PROC_FS)
+
+#if defined(CONFIG_INOTIFY_USER) || defined(CONFIG_FANOTIFY)
+
+#ifdef CONFIG_EXPORTFS
+static int inotify_encode_target(struct inode *inode, struct inode_file_handle *fhandle)
+{
+   int ret, size;
+
+   size = sizeof(fhandle->fid) >> 2;
+   ret = export_encode_inode_fh(inode, &fhandle->fid, &size);
+   BUG_ON(ret != FILEID_INO32_GEN);
+
+   fhandle->h.handle_type = FILEID_INO32_GEN;
+   fhandle->h.handle_bytes = size * sizeof(u32);
+
+   return 0;
+}
+#else
+static int inotify_encode_target(struct inode *inode, struct inode_file_handle *fhandle)
+{
+   fhandle->h.handle_type = FILEID_ROOT;
+   fhandle->h.handle_bytes = 0;
+   return 0;
+}
+#endif /* CONFIG_EXPORTFS */
+
+static int show_fdinfo(struct seq_file *m, struct file *f,
+  int (*show)(struct seq_file *m, struct fsnotify_mark *mark))
+{
+   struct fsnotify_group *group = f->private_data;
+   struct fsnotify_mark *mark;
+   int ret = 0;
+
+   spin_lock(&group->mark_lock);
+   list_for_each_entry(mark, &group->marks_list, g_list) {
+   ret = show(m, mark);
+   if (ret)
+   break;
+   }
+   spin_unlock(&group->mark_lock);
+   return ret;
+}
+
+#ifdef CONFIG_INOTIFY_USER
+
+static int inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
+{
+   struct inotify_inode_mark *inode_mark;
+   struct inode *inode;
+   int ret = 0;
+
+   if (!(mark->flags & (FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_INODE)))
+   return 0;
+
+   inode_mark = container_of(mark, struct

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin

On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > What I think you should do is use rcu for access.
> > > And here sync rcu before freeing.
> > > Maybe an overkill but at least a documented synchronization
> > > primitive, and it is very light weight.
> > > 
> > 
> > I liked your suggestion on barriers, as well.
> > 
> 
> I have not thought about this as deeply as I should but is simply rechecking
> the mapping under the pages_lock to make sure the page is still a balloon
> page an option? i.e. use pages_lock to stabilise page->mapping.

To clarify, are you concerned about the cost of rcu_read_lock
for non-balloon pages?

> -- 
> Mel Gorman
> SUSE Labs

[patch 8/8] fdinfo: Show sigmask for signalfd fd v2

2012-08-15 Thread Cyrill Gorcunov

Signed-off-by: Pavel Emelyanov 
Signed-off-by: Cyrill Gorcunov 
---
 fs/proc/array.c |2 +-
 fs/signalfd.c   |   26 ++
 include/linux/proc_fs.h |3 +++
 3 files changed, 30 insertions(+), 1 deletion(-)

Index: linux-2.6.git/fs/proc/array.c
===
--- linux-2.6.git.orig/fs/proc/array.c
+++ linux-2.6.git/fs/proc/array.c
@@ -220,7 +220,7 @@ static inline void task_state(struct seq
seq_putc(m, '\n');
 }
 
-static void render_sigset_t(struct seq_file *m, const char *header,
+void render_sigset_t(struct seq_file *m, const char *header,
sigset_t *set)
 {
int i;
Index: linux-2.6.git/fs/signalfd.c
===
--- linux-2.6.git.orig/fs/signalfd.c
+++ linux-2.6.git/fs/signalfd.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void signalfd_cleanup(struct sighand_struct *sighand)
 {
@@ -46,6 +47,7 @@ void signalfd_cleanup(struct sighand_str
 }
 
 struct signalfd_ctx {
+   seqcount_t cnt;
sigset_t sigmask;
 };
 
@@ -227,7 +229,28 @@ static ssize_t signalfd_read(struct file
return total ? total: ret;
 }
 
+#ifdef CONFIG_PROC_FS
+static int signalfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct signalfd_ctx *ctx = f->private_data;
+   sigset_t sigmask;
+   unsigned seq;
+
+   do {
+   seq = read_seqcount_begin(&ctx->cnt);
+   sigmask = ctx->sigmask;
+   } while (read_seqcount_retry(&ctx->cnt, seq));
+
+   signotset(&sigmask);
+   render_sigset_t(m, "sigmask:\t", &sigmask);
+   return 0;
+}
+#endif
+
 static const struct file_operations signalfd_fops = {
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo= signalfd_show_fdinfo,
+#endif
.release= signalfd_release,
.poll   = signalfd_poll,
.read   = signalfd_read,
@@ -259,6 +282,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sig
return -ENOMEM;
 
ctx->sigmask = sigmask;
+   seqcount_init(&ctx->cnt);
 
/*
 * When we call this, the initialization must be complete, since
@@ -279,7 +303,9 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sig
return -EINVAL;
}
spin_lock_irq(&current->sighand->siglock);
+   write_seqcount_begin(&ctx->cnt);
ctx->sigmask = sigmask;
+   write_seqcount_end(&ctx->cnt);
spin_unlock_irq(&current->sighand->siglock);
 
wake_up(&current->sighand->signalfd_wqh);
Index: linux-2.6.git/include/linux/proc_fs.h
===
--- linux-2.6.git.orig/include/linux/proc_fs.h
+++ linux-2.6.git/include/linux/proc_fs.h
@@ -290,4 +290,7 @@ static inline struct net *PDE_NET(struct
return pde->parent->data;
 }
 
+#include 
+
+void render_sigset_t(struct seq_file *m, const char *header, sigset_t *set);
 #endif /* _LINUX_PROC_FS_H */


[patch 3/8] procfs: Add ability to plug in auxiliary fdinfo providers

2012-08-15 Thread Cyrill Gorcunov

This patch adds the ability to print auxiliary data associated
with a file through the procfs interface /proc/pid/fdinfo/fd.

In particular, further patches make eventfd, eventpoll, signalfd
and fsnotify print additional information, complete enough
to restore these objects after a checkpoint.

To simplify the code, we add a show_fdinfo callback to
struct file_operations (as proposed by Al and Pavel).

Signed-off-by: Cyrill Gorcunov 
CC: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/proc/fd.c   |   51 ---
 include/linux/fs.h |3 +++
 2 files changed, 39 insertions(+), 15 deletions(-)

Index: linux-2.6.git/fs/proc/fd.c
===
--- linux-2.6.git.orig/fs/proc/fd.c
+++ linux-2.6.git/fs/proc/fd.c
@@ -15,11 +15,11 @@
 #include "fd.h"
 
 struct proc_fdinfo {
-   loff_t  f_pos;
-   int f_flags;
+   struct file *f_file;
+   int f_flags;
 };
 
-static int fdinfo_open_helper(struct inode *inode, int *f_flags, struct path *path)
+static int fdinfo_open_helper(struct inode *inode, int *f_flags, struct file **f_file, struct path *path)
 {
struct files_struct *files = NULL;
struct task_struct *task;
@@ -49,6 +49,10 @@ static int fdinfo_open_helper(struct ino
*path = fd_file->f_path;
path_get(&fd_file->f_path);
}
+   if (f_file) {
+   *f_file = fd_file;
+   get_file(fd_file);
+   }
ret = 0;
}
spin_unlock(&files->file_lock);
@@ -61,28 +65,44 @@ static int fdinfo_open_helper(struct ino
 static int seq_show(struct seq_file *m, void *v)
 {
struct proc_fdinfo *fdinfo = m->private;
-   seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
-  (long long)fdinfo->f_pos,
-  fdinfo->f_flags);
-   return 0;
+   int ret;
+
+   ret = seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
+(long long)fdinfo->f_file->f_pos,
+fdinfo->f_flags);
+
+   if (!ret && fdinfo->f_file->f_op->show_fdinfo)
+   ret = fdinfo->f_file->f_op->show_fdinfo(m, fdinfo->f_file);
+
+   return ret;
 }
 
 static int seq_fdinfo_open(struct inode *inode, struct file *file)
 {
-   struct proc_fdinfo *fdinfo = NULL;
-   int ret = -ENOENT;
+   struct proc_fdinfo *fdinfo;
+   struct seq_file *m;
+   int ret;
 
fdinfo = kzalloc(sizeof(*fdinfo), GFP_KERNEL);
if (!fdinfo)
return -ENOMEM;
 
-   ret = fdinfo_open_helper(inode, &fdinfo->f_flags, NULL);
-   if (!ret) {
-   ret = single_open(file, seq_show, fdinfo);
-   if (!ret)
-   fdinfo = NULL;
+   ret = fdinfo_open_helper(inode, &fdinfo->f_flags, &fdinfo->f_file, NULL);
+   if (ret)
+   goto err_free;
+
+   ret = single_open(file, seq_show, fdinfo);
+   if (ret) {
+   put_filp(fdinfo->f_file);
+   goto err_free;
}
 
+   m = file->private_data;
+   m->private = fdinfo;
+
+   return ret;
+
+err_free:
kfree(fdinfo);
return ret;
 }
@@ -92,6 +112,7 @@ static int seq_fdinfo_release(struct ino
struct seq_file *m = file->private_data;
struct proc_fdinfo *fdinfo = m->private;
 
+   put_filp(fdinfo->f_file);
kfree(fdinfo);
 
return single_release(inode, file);
@@ -173,7 +194,7 @@ static const struct dentry_operations ti
 
 static int proc_fd_link(struct dentry *dentry, struct path *path)
 {
-   return fdinfo_open_helper(dentry->d_inode, NULL, path);
+   return fdinfo_open_helper(dentry->d_inode, NULL, NULL, path);
 }
 
 static struct dentry *
Index: linux-2.6.git/include/linux/fs.h
===
--- linux-2.6.git.orig/include/linux/fs.h
+++ linux-2.6.git/include/linux/fs.h
@@ -1775,6 +1775,8 @@ struct block_device_operations;
 #define HAVE_COMPAT_IOCTL 1
 #define HAVE_UNLOCKED_IOCTL 1
 
+struct seq_file;
+
 struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
@@ -1803,6 +1805,7 @@ struct file_operations {
int (*setlease)(struct file *, long, struct file_lock **);
long (*fallocate)(struct file *file, int mode, loff_t offset,
  loff_t len);
+   int (*show_fdinfo)(struct seq_file *m, struct file *f);
 };
 
 struct inode_operations {


[patch 2/8] procfs: Convert /proc/pid/fdinfo/ handling routines to seq-file

2012-08-15 Thread Cyrill Gorcunov

This patch converts the /proc/pid/fdinfo/ handling routines to seq-file, which
is needed to extend the seq operations and plug in auxiliary fdinfo providers
from subsystems like eventfd/eventpoll/fsnotify.

Note that proc_fd_link no longer calls proc_fd_info, simply because
proc_fd_info has been converted to seq_fdinfo_open (which has the seq-file
open() prototype).

Also, to eliminate code duplication (and address Pavel's concerns), the
fdinfo_open_helper function is introduced and used in both seq_fdinfo_open
and proc_fd_link.

Signed-off-by: Cyrill Gorcunov 
Acked-by: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/proc/fd.c |  123 +++
 1 file changed, 75 insertions(+), 48 deletions(-)

Index: linux-2.6.git/fs/proc/fd.c
===
--- linux-2.6.git.orig/fs/proc/fd.c
+++ linux-2.6.git/fs/proc/fd.c
@@ -6,61 +6,104 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
 #include "internal.h"
 #include "fd.h"
 
-#define PROC_FDINFO_MAX 64
+struct proc_fdinfo {
+   loff_t  f_pos;
+   int f_flags;
+};
 
-static int proc_fd_info(struct inode *inode, struct path *path, char *info)
+static int fdinfo_open_helper(struct inode *inode, int *f_flags, struct path *path)
 {
-   struct task_struct *task = get_proc_task(inode);
struct files_struct *files = NULL;
-   int fd = proc_fd(inode);
-   struct file *file;
+   struct task_struct *task;
+   int ret = -ENOENT;
 
+   task = get_proc_task(inode);
if (task) {
files = get_files_struct(task);
put_task_struct(task);
}
+
if (files) {
-   /*
-* We are not taking a ref to the file structure, so we must
-* hold ->file_lock.
-*/
-   spin_lock(&files->file_lock);
-   file = fcheck_files(files, fd);
-   if (file) {
-   unsigned int f_flags;
-   struct fdtable *fdt;
-
-   fdt = files_fdtable(files);
-   f_flags = file->f_flags & ~O_CLOEXEC;
-   if (close_on_exec(fd, fdt))
-   f_flags |= O_CLOEXEC;
+   int fd = proc_fd(inode);
+   struct file *fd_file;
 
+   spin_lock(&files->file_lock);
+   fd_file = fcheck_files(files, fd);
+   if (fd_file) {
+   if (f_flags) {
+   struct fdtable *fdt = files_fdtable(files);
+
+   *f_flags = fd_file->f_flags & ~O_CLOEXEC;
+   if (close_on_exec(fd, fdt))
+   *f_flags |= O_CLOEXEC;
+   }
if (path) {
-   *path = file->f_path;
-   path_get(&file->f_path);
+   *path = fd_file->f_path;
+   path_get(&fd_file->f_path);
}
-   if (info)
-   snprintf(info, PROC_FDINFO_MAX,
-"pos:\t%lli\n"
-"flags:\t0%o\n",
-(long long) file->f_pos,
-f_flags);
-   spin_unlock(&files->file_lock);
-   put_files_struct(files);
-   return 0;
+   ret = 0;
}
spin_unlock(&files->file_lock);
put_files_struct(files);
}
-   return -ENOENT;
+
+   return ret;
 }
 
+static int seq_show(struct seq_file *m, void *v)
+{
+   struct proc_fdinfo *fdinfo = m->private;
+   seq_printf(m, "pos:\t%lli\nflags:\t0%o\n",
+  (long long)fdinfo->f_pos,
+  fdinfo->f_flags);
+   return 0;
+}
+
+static int seq_fdinfo_open(struct inode *inode, struct file *file)
+{
+   struct proc_fdinfo *fdinfo = NULL;
+   int ret = -ENOENT;
+
+   fdinfo = kzalloc(sizeof(*fdinfo), GFP_KERNEL);
+   if (!fdinfo)
+   return -ENOMEM;
+
+   ret = fdinfo_open_helper(inode, &fdinfo->f_flags, NULL);
+   if (!ret) {
+   ret = single_open(file, seq_show, fdinfo);
+   if (!ret)
+   fdinfo = NULL;
+   }
+
+   kfree(fdinfo);
+   return ret;
+}
+
+static int seq_fdinfo_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *m = file->private_data;
+   struct proc_fdinfo *fdinfo = m->private;
+
+   kfree(fdinfo);
+
+   return single_release(inode, file);
+}
+
+static const struct file_operations proc_fdinfo_file_operations = {
+   .open   = seq_fdinfo_open,
+   .read   = seq_read,
+   .llseek

[patch 7/8] fs, epoll: Add procfs fdinfo helper v2

2012-08-15 Thread Cyrill Gorcunov

This allows us to print the eventpoll target file descriptor,
events and data; /proc/pid/fdinfo/fd then consists of:

 | pos: 0
 | flags:   02
 | tfd:5 events:   1d data: 

This feature is CONFIG_CHECKPOINT_RESTORE only.

Signed-off-by: Cyrill Gorcunov 
CC: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
CC: Matthew Helsley 
---
 fs/eventpoll.c |   28 
 1 file changed, 28 insertions(+)

Index: linux-2.6.git/fs/eventpoll.c
===
--- linux-2.6.git.orig/fs/eventpoll.c
+++ linux-2.6.git/fs/eventpoll.c
@@ -38,6 +38,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * LOCKING:
@@ -783,8 +785,34 @@ static unsigned int ep_eventpoll_poll(st
return pollflags != -1 ? pollflags : 0;
 }
 
+#ifdef CONFIG_PROC_FS
+static int ep_show_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct eventpoll *ep = f->private_data;
+   struct rb_node *rbp;
+   int ret;
+
+   mutex_lock(&ep->mtx);
+   for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+   struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
+
+   ret = seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
+epi->ffd.fd, epi->event.events,
+(long long)epi->event.data);
+   if (ret)
+   break;
+   }
+   mutex_unlock(&ep->mtx);
+
+   return ret;
+}
+#endif
+
 /* File callbacks that implement the eventpoll file behaviour */
 static const struct file_operations eventpoll_fops = {
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo= ep_show_fdinfo,
+#endif
.release= ep_eventpoll_release,
.poll   = ep_eventpoll_poll,
.llseek = noop_llseek,


[patch 1/8] procfs: Move /proc/pid/fd[info] handling code to fd.[ch]

2012-08-15 Thread Cyrill Gorcunov

This patch prepares the ground for further extension of the
/proc/pid/fd[info] handling code by moving that code into
fs/proc/fd.c.

I think such a move makes both fs/proc/base.c and fs/proc/fd.c
easier to read.

Signed-off-by: Cyrill Gorcunov 
Acked-by: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/proc/Makefile   |2 
 fs/proc/base.c |  388 -
 fs/proc/fd.c   |  351 +++
 fs/proc/fd.h   |   14 +
 fs/proc/internal.h |   48 ++
 5 files changed, 416 insertions(+), 387 deletions(-)

Index: linux-2.6.git/fs/proc/Makefile
===
--- linux-2.6.git.orig/fs/proc/Makefile
+++ linux-2.6.git/fs/proc/Makefile
@@ -8,7 +8,7 @@ proc-y  := nommu.o task_nommu.o
 proc-$(CONFIG_MMU) := mmu.o task_mmu.o
 
 proc-y   += inode.o root.o base.o generic.o array.o \
-   proc_tty.o
+   proc_tty.o fd.o
 proc-y += cmdline.o
 proc-y += consoles.o
 proc-y += cpuinfo.o
Index: linux-2.6.git/fs/proc/base.c
===
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -90,6 +90,7 @@
 #endif
 #include 
 #include "internal.h"
+#include "fd.h"
 
 /* NOTE:
  * Implementing inode permission operations in /proc is almost
@@ -136,8 +137,6 @@ struct pid_entry {
NULL, _single_file_operations, \
{ .proc_show = show } )
 
-static int proc_fd_permission(struct inode *inode, int mask);
-
 /*
  * Count the number of hardlinks for the pid_entry table, excluding the .
  * and .. links.
@@ -1492,7 +1491,7 @@ out:
return error;
 }
 
-static const struct inode_operations proc_pid_link_inode_operations = {
+const struct inode_operations proc_pid_link_inode_operations = {
.readlink   = proc_pid_readlink,
.follow_link= proc_pid_follow_link,
.setattr= proc_setattr,
@@ -1501,21 +1500,6 @@ static const struct inode_operations pro
 
 /* building an inode */
 
-static int task_dumpable(struct task_struct *task)
-{
-   int dumpable = 0;
-   struct mm_struct *mm;
-
-   task_lock(task);
-   mm = task->mm;
-   if (mm)
-   dumpable = get_dumpable(mm);
-   task_unlock(task);
-   if(dumpable == 1)
-   return 1;
-   return 0;
-}
-
 struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *task)
 {
struct inode * inode;
@@ -1641,15 +1625,6 @@ int pid_revalidate(struct dentry *dentry
return 0;
 }
 
-static int pid_delete_dentry(const struct dentry * dentry)
-{
-   /* Is the task we represent dead?
-* If so, then don't put the dentry on the lru list,
-* kill it immediately.
-*/
-   return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
-}
-
 const struct dentry_operations pid_dentry_operations =
 {
.d_revalidate   = pid_revalidate,
@@ -1712,289 +1687,6 @@ end_instantiate:
return filldir(dirent, name, len, filp->f_pos, ino, type);
 }
 
-static unsigned name_to_int(struct dentry *dentry)
-{
-   const char *name = dentry->d_name.name;
-   int len = dentry->d_name.len;
-   unsigned n = 0;
-
-   if (len > 1 && *name == '0')
-   goto out;
-   while (len-- > 0) {
-   unsigned c = *name++ - '0';
-   if (c > 9)
-   goto out;
-   if (n >= (~0U-9)/10)
-   goto out;
-   n *= 10;
-   n += c;
-   }
-   return n;
-out:
-   return ~0U;
-}
-
-#define PROC_FDINFO_MAX 64
-
-static int proc_fd_info(struct inode *inode, struct path *path, char *info)
-{
-   struct task_struct *task = get_proc_task(inode);
-   struct files_struct *files = NULL;
-   struct file *file;
-   int fd = proc_fd(inode);
-
-   if (task) {
-   files = get_files_struct(task);
-   put_task_struct(task);
-   }
-   if (files) {
-   /*
-* We are not taking a ref to the file structure, so we must
-* hold ->file_lock.
-*/
-   spin_lock(&files->file_lock);
-   file = fcheck_files(files, fd);
-   if (file) {
-   unsigned int f_flags;
-   struct fdtable *fdt;
-
-   fdt = files_fdtable(files);
-   f_flags = file->f_flags & ~O_CLOEXEC;
-   if (close_on_exec(fd, fdt))
-   f_flags |= O_CLOEXEC;
-
-   if (path) {
-   *path = file->f_path;
-   path_get(&file->f_path);
-   }
-   if (info)
-   snprintf(info,

Mmap on SSD (directly mapping the device vs. mapping a file)

2012-08-15 Thread Daniel Noack

Hi, folks!

As you can see from the subject, I have recently been experimenting with
mmap. I've written a small B+tree library that uses mmap to store the
tree in a file or on a whole device (i.e. it is also possible to map the
raw device, e.g. /dev/sdb). I call msync after every successful change
to the tree. I then used it for a small benchmark of the performance of
different storage devices (ramdisk, HDD, and a very fast flash card
attached directly to the PCIe bus). I noticed that in almost all cases
the file-mapped version was a little slower than directly mapping the
device without a filesystem. But when I tried it on an SSD, the
file-mapped version was an order of magnitude faster. I also tried a
secure erase and ran the benchmarks many times and in many
configurations, but I always got the same results.
Finally, I tried the different queue schedulers, with no change.

In one of the posts from January I read that there is a performance
bug when reading directly from the raw SSD device, but I didn't find
any other comment confirming this. For the benchmarks I used
a current Ubuntu with a 3.2.16 kernel (from kernel.org). Is this
behavior normal, or did I miss something?

[patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

2012-08-15 Thread Cyrill Gorcunov

To report the inodes watched by fsnotify objects without
binding them to a textual path, we need to encode them with
exportfs's help. This patch adds a helper that operates
on plain inodes directly.

Signed-off-by: Cyrill Gorcunov 
Acked-by: Pavel Emelyanov 
CC: Al Viro 
CC: Alexey Dobriyan 
CC: Andrew Morton 
CC: James Bottomley 
---
 fs/exportfs/expfs.c  |   19 +++
 include/linux/exportfs.h |2 ++
 2 files changed, 21 insertions(+)

Index: linux-2.6.git/fs/exportfs/expfs.c
===
--- linux-2.6.git.orig/fs/exportfs/expfs.c
+++ linux-2.6.git/fs/exportfs/expfs.c
@@ -302,6 +302,25 @@ out:
return error;
 }
 
+int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len)
+{
+   int len = *max_len;
+   int type = FILEID_INO32_GEN;
+
+   if (len < 2) {
+   *max_len = 2;
+   return 255;
+   }
+
+   len = 2;
+   fid->i32.ino = inode->i_ino;
+   fid->i32.gen = inode->i_generation;
+   *max_len = len;
+
+   return type;
+}
+EXPORT_SYMBOL_GPL(export_encode_inode_fh);
+
 /**
  * export_encode_fh - default export_operations->encode_fh function
  * @inode:   the object to encode
Index: linux-2.6.git/include/linux/exportfs.h
===
--- linux-2.6.git.orig/include/linux/exportfs.h
+++ linux-2.6.git/include/linux/exportfs.h
@@ -177,6 +177,8 @@ struct export_operations {
int (*commit_metadata)(struct inode *inode);
 };
 
+extern int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len);
+
 extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
int *max_len, int connectable);
 extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,


Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation

2012-08-15 Thread Borislav Petkov

On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
> I started thinking about the performance on AMD Bulldozer.
> vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
> on AMD CPU is alot slower (latencies from 8 to 12 cycles) than on
> Intel sandy-bridge (where instructions have latency of 1 to 2). See:
> http://www.agner.org/optimize/instruction_tables.pdf
>
> It would be really good, if implementation could be tested on AMD CPU
> to determinate, if it causes performance regression. However I don't
> have access to machine with such CPU.

But I do. :)

And if you tell me exactly how to run the tests and on what kernel, I'll
try to do so.

HTH.

-- 
Regards/Gruss,
Boris.

[PATCH] kconfig: remove CONFIG_MTD_NAND_VERIFY_WRITE

2012-08-15 Thread Huang Shijie

Just as Artem suggested:

"Both UBI and JFFS2 are able to read verify what they wrote already.
There are also MTD tests which do this verification. So I think there
is no reason to keep this in the NAND layer, let alone wasting RAM in
the driver to support this feature."

So kill MTD_NAND_VERIFY_WRITE entirely. Please see the patch:
http://lists.infradead.org/pipermail/linux-mtd/2012-August/043189.html
  
This patch removes CONFIG_MTD_NAND_VERIFY_WRITE from the defconfigs.

Signed-off-by: Huang Shijie 
---
 arch/arm/configs/bcmring_defconfig  |1 -
 arch/arm/configs/cam60_defconfig|1 -
 arch/arm/configs/corgi_defconfig|1 -
 arch/arm/configs/ep93xx_defconfig   |1 -
 arch/arm/configs/mini2440_defconfig |1 -
 arch/arm/configs/mv78xx0_defconfig  |1 -
 arch/arm/configs/nhk8815_defconfig  |1 -
 arch/arm/configs/orion5x_defconfig  |1 -
 arch/arm/configs/pxa3xx_defconfig   |1 -
 arch/arm/configs/spitz_defconfig|1 -
 arch/blackfin/configs/BF561-ACVILON_defconfig   |1 -
 arch/mips/configs/rb532_defconfig   |1 -
 arch/powerpc/configs/83xx/mpc8313_rdb_defconfig |1 -
 arch/powerpc/configs/83xx/mpc8315_rdb_defconfig |1 -
 arch/powerpc/configs/mpc83xx_defconfig  |1 -
 15 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/arch/arm/configs/bcmring_defconfig b/arch/arm/configs/bcmring_defconfig
index 9e6a8fe..6c389d9 100644
--- a/arch/arm/configs/bcmring_defconfig
+++ b/arch/arm/configs/bcmring_defconfig
@@ -44,7 +44,6 @@ CONFIG_MTD_CFI_ADV_OPTIONS=y
 CONFIG_MTD_CFI_GEOMETRY=y
 # CONFIG_MTD_CFI_I2 is not set
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_BCM_UMI=y
 CONFIG_MTD_NAND_BCM_UMI_HWCS=y
 # CONFIG_MISC_DEVICES is not set
diff --git a/arch/arm/configs/cam60_defconfig b/arch/arm/configs/cam60_defconfig
index cedc92e..1457971 100644
--- a/arch/arm/configs/cam60_defconfig
+++ b/arch/arm/configs/cam60_defconfig
@@ -49,7 +49,6 @@ CONFIG_MTD_COMPLEX_MAPPINGS=y
 CONFIG_MTD_PLATRAM=m
 CONFIG_MTD_DATAFLASH=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_ATMEL=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
diff --git a/arch/arm/configs/corgi_defconfig b/arch/arm/configs/corgi_defconfig
index e53c475..4b8a25d 100644
--- a/arch/arm/configs/corgi_defconfig
+++ b/arch/arm/configs/corgi_defconfig
@@ -97,7 +97,6 @@ CONFIG_MTD_BLOCK=y
 CONFIG_MTD_ROM=y
 CONFIG_MTD_COMPLEX_MAPPINGS=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_SHARPSL=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_IDE=y
diff --git a/arch/arm/configs/ep93xx_defconfig b/arch/arm/configs/ep93xx_defconfig
index 8e97b2f..806005a 100644
--- a/arch/arm/configs/ep93xx_defconfig
+++ b/arch/arm/configs/ep93xx_defconfig
@@ -61,7 +61,6 @@ CONFIG_MTD_CFI_STAA=y
 CONFIG_MTD_ROM=y
 CONFIG_MTD_PHYSMAP=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_BLK_DEV_NBD=y
 CONFIG_EEPROM_LEGACY=y
 CONFIG_SCSI=y
diff --git a/arch/arm/configs/mini2440_defconfig b/arch/arm/configs/mini2440_defconfig
index 082175c..00630e6 100644
--- a/arch/arm/configs/mini2440_defconfig
+++ b/arch/arm/configs/mini2440_defconfig
@@ -102,7 +102,6 @@ CONFIG_MTD_CFI_STAA=y
 CONFIG_MTD_RAM=y
 CONFIG_MTD_ROM=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_S3C2410=y
 CONFIG_MTD_NAND_PLATFORM=y
 CONFIG_MTD_LPDDR=y
diff --git a/arch/arm/configs/mv78xx0_defconfig b/arch/arm/configs/mv78xx0_defconfig
index 7305ebd..1f08219 100644
--- a/arch/arm/configs/mv78xx0_defconfig
+++ b/arch/arm/configs/mv78xx0_defconfig
@@ -49,7 +49,6 @@ CONFIG_MTD_CFI_INTELEXT=y
 CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_MTD_PHYSMAP=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_ORION=y
 CONFIG_BLK_DEV_LOOP=y
 # CONFIG_SCSI_PROC_FS is not set
diff --git a/arch/arm/configs/nhk8815_defconfig b/arch/arm/configs/nhk8815_defconfig
index bf123c5..240b25e 100644
--- a/arch/arm/configs/nhk8815_defconfig
+++ b/arch/arm/configs/nhk8815_defconfig
@@ -57,7 +57,6 @@ CONFIG_MTD_CHAR=y
 CONFIG_MTD_BLOCK=y
 CONFIG_MTD_NAND=y
 CONFIG_MTD_NAND_ECC_SMC=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_NOMADIK=y
 CONFIG_MTD_ONENAND=y
 CONFIG_MTD_ONENAND_VERIFY_WRITE=y
diff --git a/arch/arm/configs/orion5x_defconfig b/arch/arm/configs/orion5x_defconfig
index a288d70..cd5e6ba 100644
--- a/arch/arm/configs/orion5x_defconfig
+++ b/arch/arm/configs/orion5x_defconfig
@@ -72,7 +72,6 @@ CONFIG_MTD_CFI_INTELEXT=y
 CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_MTD_PHYSMAP=y
 CONFIG_MTD_NAND=y
-CONFIG_MTD_NAND_VERIFY_WRITE=y
 CONFIG_MTD_NAND_PLATFORM=y
 CONFIG_MTD_NAND_ORION=y
 CONFIG_BLK_DEV_LOOP=y
diff --git a/arch/arm/configs/pxa3xx_defconfig b/arch/arm/configs/pxa3xx_defconfig
index 1677a06..60e3138 100644
--- a/arch/arm/configs/pxa3xx_defconfig
+++ b/arch/arm/configs/pxa3xx_defconfig
@@ -36,7 +36,6 @@ CONFIG_MTD_CONCAT=y
 CONFIG_MTD_CHAR=y
 CONFIG_MTD_BLOCK=y

[PATCH] video:uvesafb: reduce the double check

2012-08-15 Thread Wang YanQing


uvesafb_open already checks par->vbe_state_size, so there is
no need to check it again in uvesafb_vbe_state_save. This patch
simply removes the redundant check.

Signed-off-by: Wang YanQing 
---
 drivers/video/uvesafb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 2f8f82d..b064a3e 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -357,9 +357,6 @@ static u8 *uvesafb_vbe_state_save(struct uvesafb_par *par)
u8 *state;
int err;
 
-   if (!par->vbe_state_size)
-   return NULL;
-
state = kmalloc(par->vbe_state_size, GFP_KERNEL);
if (!state)
return ERR_PTR(-ENOMEM);
-- 
1.7.11.1.116.g8228a23

Re: [PATCH 7/8] kbuild: move W=... stuff to Kbuild.arch

2012-08-15 Thread Artem Bityutskiy

On Wed, 2012-06-06 at 17:35 +0200, Sam Ravnborg wrote:
> On Wed, Jun 06, 2012 at 01:18:47PM +0300, Artem Bityutskiy wrote:
> > On Sat, 2012-05-05 at 10:18 +0200, Sam Ravnborg wrote:
> > > Prevent that we evaluate cc-option multiple times for the same
> > > option by moving the definitions to Kbuild.arch.
> > > The file is included once only, thus gcc is not invoked once per 
> > > directory.
> > > 
> > > Another side-effect of this patch is that KCFLAGS are appended last
> > > to the list of options. This allows us to better control the options.
> > > Artem Bityutskiy  noticed this.
> > > 
> > > Signed-off-by: Sam Ravnborg 
> > > Cc: Artem Bityutskiy 
> > 
> > Hi,
> > 
> > what happened to this patch? I was fixing the real issue I am
> > encountering and I thought it'd be taken instead of my original patch.
> We decided to move this to next merge release because it was not added
> to kbuild thus not enough exposure in -next.
> 
> I am planning to resend the serie at around -rc2 time.

Hi Sam, what happened to this patch-set? At least KCFLAGS patches I am
waiting for are still not merged.

-- 
Best Regards,
Artem Bityutskiy



Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa


>> We always account to both user and kernel resource_counters. This
>> effectively means that an independent kernel limit is in place when the
>> limit is set to a lower value than the user memory. An equal or higher
>> value means that the user limit will always hit first, meaning that kmem
>> is effectively unlimited.
> 
> Well, it contributes to the user limit so it is not unlimited. It just
> falls under a different limit and it tends to contribute less.

You are right, but this is just wording. I will update it, but what I
really mean here is that no independent limit is imposed on kmem.

> This can
> be quite confusing.  I am still not sure whether we should mix the two
> things together. If somebody wants to limit the kernel memory he has to
> touch the other limit anyway.  Do you have a strong reason to mix the
> user and kernel counters?

This is funny, because the first opposition I found to this work was
"Why would anyone want to limit it separately?" =p

It seems that a quite common use case is to have a container with a
unified view of "memory" that it can use as it likes, be it for
kernel memory or user memory. I believe those people would be happy to
just silently account kernel memory to user memory, or at most have
a switch to enable it.

What becomes clear from this back and forth is that there are people
interested in both use cases.
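To make the accounting scheme above concrete: kernel memory is charged to both the kmem counter and the combined user counter, so a separate kmem limit only takes effect when it is set lower than the user limit. The following is a minimal toy model under that assumption; `struct toy_counter` and its helpers are hypothetical and are not the kernel's res_counter API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a res_counter (usage <= limit). */
struct toy_counter {
	uint64_t usage;
	uint64_t limit;
};

static bool toy_try_charge(struct toy_counter *c, uint64_t bytes)
{
	/* Compare against the remaining room to avoid overflowing usage + bytes. */
	if (bytes > c->limit - c->usage)
		return false;
	c->usage += bytes;
	return true;
}

static void toy_uncharge(struct toy_counter *c, uint64_t bytes)
{
	c->usage -= bytes;
}

/*
 * Kernel memory is charged to BOTH counters. If either limit would be
 * exceeded, the charge fails and neither counter is left charged.
 */
static bool toy_charge_kmem(struct toy_counter *kmem,
			    struct toy_counter *user, uint64_t bytes)
{
	if (!toy_try_charge(kmem, bytes))
		return false;
	if (!toy_try_charge(user, bytes)) {
		toy_uncharge(kmem, bytes);	/* roll back the kmem charge */
		return false;
	}
	return true;
}
```

With a kmem limit of 4 KiB and a user limit of 16 KiB, a second 4 KiB kmem charge fails at the kmem limit. With the kmem limit set above the user limit, charges fail only when the user limit is reached, which is the "effectively unlimited" case being discussed.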

> My impression was that kernel allocation should simply fail while user
> allocations might reclaim as well. Why should we reclaim just because of
> the kernel allocation (which is unreclaimable from hard limit reclaim
> point of view)?

That is not what the kernel does, in general. We assume that if the caller wants
that memory and we can serve it, we should. Also, not all kernel memory
is unreclaimable. We can shrink the slabs, for instance. Ying Han
claims she has patches for that already...

> I also think that the whole thing would get much simpler if those two
> are split. Anyway if this is really a must then this should be
> documented here.

Well, documentation can't hurt.

> 
> This doesn't check the hierarchy, so kmem_accounted might not be in
> sync with its parents. mem_cgroup_create (below) needs to copy
> kmem_accounted down from the parent and the above needs to check if this
> is a similar dance like mem_cgroup_oom_control_write.
> 

I don't see why we have to.

I believe in a A/B/C hierarchy, C should be perfectly able to set a
different limit than its parents. Note that this is not a boolean.

Also, right now, C can become completely unlimited (by not setting a
limit), and this is, indeed, not the desired behavior.

A later patch will change kmem_accounted to a bitfield, and we'll use
one of the bits to signal that we should account kmem because our parent
is limited.
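A rough sketch of the bitfield idea, where one bit means "this cgroup has its own kmem limit" and another means "an ancestor is limited, so account anyway". The flag names, `struct toy_memcg` and the helpers are hypothetical and not taken from the later patch. For simplicity the PARENT bit is only inherited when a child is created; a limit set on an ancestor afterwards would not propagate in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments for kmem_accounted. */
enum {
	KMEM_ACCOUNTED_ACTIVE = 1 << 0,	/* this cgroup has its own kmem limit */
	KMEM_ACCOUNTED_PARENT = 1 << 1,	/* some ancestor is kmem-limited */
};

struct toy_memcg {
	struct toy_memcg *parent;
	unsigned long kmem_accounted;
};

/* A child of an accounted (directly or transitively) parent inherits the PARENT bit. */
static void toy_memcg_init(struct toy_memcg *memcg, struct toy_memcg *parent)
{
	memcg->parent = parent;
	memcg->kmem_accounted = 0;
	if (parent && parent->kmem_accounted)
		memcg->kmem_accounted |= KMEM_ACCOUNTED_PARENT;
}

static void toy_memcg_set_kmem_limit(struct toy_memcg *memcg)
{
	memcg->kmem_accounted |= KMEM_ACCOUNTED_ACTIVE;
}

/* Account kmem if we are limited ourselves or any ancestor is. */
static bool toy_memcg_should_account(const struct toy_memcg *memcg)
{
	return memcg->kmem_accounted != 0;
}
```

In an A/B/C hierarchy where only A sets a kmem limit, B and C still account kmem through the PARENT bit, while each remains free to set its own, different limit.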


Re: [PATCH v4 1/1] HID:hid-multitouch: Add ELAN production request when resume.

2012-08-15 Thread Jiri Kosina

On Wed, 15 Aug 2012, Scott Liu wrote:

> Add ELAN production request when resume.
> 
> Some Elan legacy devices require SET_IDLE to be set on resume.
> It should be safe to send it to other devices too.
> Tested on 3M, Stantum, Cypress, Zytronic, eGalax, and Elan panels. 
> 
> 
> Signed-off-by: Scott Liu 
> Suggested-off-by: Benjamin Tissoires 
> ---
>  drivers/hid/hid-multitouch.c |   27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
> index 59c8b5c..e824c37 100644
> --- a/drivers/hid/hid-multitouch.c
> +++ b/drivers/hid/hid-multitouch.c
> @@ -767,6 +767,32 @@ static int mt_reset_resume(struct hid_device *hdev)
>   mt_set_input_mode(hdev);
>   return 0;
>  }
> +
> +static int mt_resume(struct hid_device *hdev)
> +{
> + struct usb_interface *intf;
> + struct usb_host_interface *interface;
> + struct usb_device *dev;
> +
> + if (hdev->bus != BUS_USB)
> + return 0;
> +
> + intf = to_usb_interface(hdev->dev.parent);
> + interface = intf->cur_altsetting;
> + dev = hid_to_usb_dev(hdev);
> +
> + /* Some Elan legacy devices require SET_IDLE to be set on resume.
> +  * It should be safe to send it to other devices too.
> +  * Tested on 3M, Stantum, Cypress, Zytronic, eGalax, and Elan panels. */
> +
> + usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
> + HID_REQ_SET_IDLE,
> + USB_TYPE_CLASS | USB_RECIP_INTERFACE,
> + 0, interface->desc.bInterfaceNumber,
> + NULL, 0, USB_CTRL_SET_TIMEOUT);
> +
> + return 0;
> +}
>  #endif
>  
>  static void mt_remove(struct hid_device *hdev)
> @@ -1092,6 +1118,7 @@ static struct hid_driver mt_driver = {
>   .event = mt_event,
>  #ifdef CONFIG_PM
>   .reset_resume = mt_reset_resume,
> + .resume = mt_resume,
>  #endif
>  };

I am now queuing this one in my tree. If it later turns out that some
devices actually don't like this request (which one would hope is very
unlikely to happen), we'll have to make it device specific.
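For reference, the control transfer issued by mt_resume() can be written out as a USB setup packet. This is only an illustration: the constant values come from the USB 2.0 and HID 1.11 specifications (in the kernel they are USB_DIR_OUT, USB_TYPE_CLASS, USB_RECIP_INTERFACE and HID_REQ_SET_IDLE), while `struct setup_packet` and `hid_set_idle()` are hypothetical helpers. Passing 0 as wValue, as the patch does, requests an indefinite idle duration for all report IDs.

```c
#include <stdint.h>

/* Values from the USB 2.0 and HID 1.11 specifications. */
#define DIR_OUT		0x00	/* host-to-device */
#define TYPE_CLASS	0x20	/* class-specific request */
#define RECIP_INTERFACE	0x01	/* addressed to an interface */
#define REQ_SET_IDLE	0x0A	/* HID SET_IDLE */

struct setup_packet {
	uint8_t  bmRequestType;
	uint8_t  bRequest;
	uint16_t wValue;	/* high byte: duration (4 ms units, 0 = indefinite), low byte: report ID */
	uint16_t wIndex;	/* interface number */
	uint16_t wLength;	/* SET_IDLE has no data stage */
};

static struct setup_packet hid_set_idle(uint8_t duration, uint8_t report_id,
					uint16_t interface)
{
	struct setup_packet p = {
		.bmRequestType = DIR_OUT | TYPE_CLASS | RECIP_INTERFACE,
		.bRequest      = REQ_SET_IDLE,
		.wValue        = (uint16_t)((duration << 8) | report_id),
		.wIndex        = interface,
		.wLength       = 0,
	};
	return p;
}
```

Calling `hid_set_idle(0, 0, interface->desc.bInterfaceNumber)` corresponds to the arguments the patch passes to usb_control_msg(), giving bmRequestType 0x21 and bRequest 0x0A.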

Thanks,

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH 0/7] HID: picoLCD updates

2012-08-15 Thread Bruno Prémont

Hi Jiri,

On Wed, 15 August 2012 Jiri Kosina  wrote:
> On Mon, 30 Jul 2012, Bruno Prémont wrote:
> > Hi,
> > 
> > This series updates picoLCD driver:
> > - split the driver functions into separate files which get included
> >   depending on Kconfig selection
> >   (implementation for CIR using RC_CORE will follow later)
> > - drop private framebuffer refcounting in favor of refcounting added
> >   to fb_info some time ago
> > - fix various bugs
> > - disable firmware version checking in probe() as it does not work
> >   anymore since commit 4ea5454203d991ec85264f64f89ca8855fce69b0
> >   [HID: Fix race condition between driver core and ll-driver]
> 
> I have now applied the series to my 'picolcd' branch, except for 6/7, 
> please see the comment I sent to it separately.

Will respin that one soon

> > Note: I still get weird behavior on quick unbind/bind sequences
> > issued via sysfs (CONFIG_SMP=n system) that are triggered by framebuffer
> > support and apparently more specifically fb_defio part of it.
> > 
> > Unfortunately I'm out of ideas as to how to track down the problem which
> > shows either as SLAB corruption (detected with SLUB debugging, e.g.
> 
> Would be nice to have this sorted out before the next merge window indeed, 
> so that it can go in together with the rest of the changes.
> 
> > 
> > [ 6383.521833] =
> > [ 6383.530020] BUG kmalloc-64 (Not tainted): Object already free
> > [ 6383.530020] -
> > [ 6383.530020]
> > [ 6383.530020] INFO: Slab 0xdde0ea20 objects=51 used=40 fp=0xcef516e0 flags=0x4080
> > [ 6383.530020] INFO: Object 0xcef51190 @offset=400 fp=0xcef51f50
> > [ 6383.530020]
> > [ 6383.530020] Bytes b4 cef51180: cc cc cc cc d0 12 f5 ce 5a 5a 5a 5a 5a 5a 5a 5a
> > [ 6383.530020] Object cef51190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > [ 6383.530020] Object cef511a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > [ 6383.530020] Object cef511b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > [ 6383.530020] Object cef511c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkk.
> > [ 6383.530020] Redzone cef511d0: bb bb bb bb
> > [ 6383.530020] Padding cef511d8: 5a 5a 5a 5a 5a 5a 5a 5a
> > [ 6383.530020] Pid: 1922, comm: bash Not tainted 3.5.0-jupiter-3-g8d858b1-dirty #2
> > [ 6383.530020] Call Trace:
> > [ 6383.530020]  [] print_trailer+0x11c/0x130
> > [ 6383.530020]  [] object_err+0x35/0x40
> > [ 6383.530020]  [] free_debug_processing+0x99/0x200
> > [ 6383.530020]  [] __slab_free+0x2e/0x280
> > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020]  [] ? __usbhid_submit_report+0xc0/0x3c0
> > [ 6383.530020]  [] ? kfree+0xfa/0x110
> > [ 6383.530020]  [] ? picolcd_debug_out_report+0x8c4/0x8e0 [hid_picolcd]
> > [ 6383.530020]  [] kfree+0xfa/0x110
> > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020]  [] hid_submit_out+0xa4/0x120
> > [ 6383.530020]  [] __usbhid_submit_report+0x158/0x3c0
> > [ 6383.530020]  [] usbhid_submit_report+0x1b/0x30
> > [ 6383.530020]  [] picolcd_fb_reset+0xb9/0x180 [hid_picolcd]
> > [ 6383.530020]  [] picolcd_init_framebuffer+0x20d/0x2e0 [hid_picolcd]
> > [ 6383.530020]  [] picolcd_probe+0x3cc/0x580 [hid_picolcd]
> > [ 6383.530020]  [] hid_device_probe+0x67/0xf0
> > [ 6383.530020]  [] ? driver_sysfs_add+0x57/0x80
> > [ 6383.530020]  [] driver_probe_device+0xbd/0x1c0
> > [ 6383.530020]  [] ? hid_match_device+0x7b/0x90
> > [ 6383.530020]  [] driver_bind+0x75/0xd0
> > [ 6383.530020]  [] ? driver_unbind+0x90/0x90
> > [ 6383.530020]  [] drv_attr_store+0x27/0x30
> > [ 6383.530020]  [] sysfs_write_file+0xac/0xf0
> > [ 6383.530020]  [] vfs_write+0x9c/0x130
> > [ 6383.530020]  [] ? sys_dup3+0x11f/0x160
> > [ 6383.530020]  [] ? sysfs_poll+0x90/0x90
> > [ 6383.530020]  [] sys_write+0x3d/0x70
> > [ 6383.530020]  [] sysenter_do_call+0x12/0x26
> 
> So I am wondering whether the path this happens on is
> 
> if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl)) {
> usbhid_restart_out_queue(usbhid);
> 
> in __usbhid_submit_report(). It would then indicate perhaps some race with 
> iofl handling.

Huh, I can't find that specific test_bit hunk in __usbhid_submit_report;
is that 3.6 material?
I'm running my tests against 3.5...

The nearest I have is:
	if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl))
		if (!irq_out_pump_restart(hid))
			set_bit(HID_OUT_RUNNING, &usbhid->iofl);


> Could you please stick some printk() just

[PATCH] act_mirred: do not drop packets when fails to mirror it

2012-08-15 Thread Jason Wang

We drop the packet unconditionally when we fail to mirror it. This is not
intended in some cases. Consider a KVM guest: we may mirror the traffic of a
bridge to a tap device used by a VM. When the kernel fails to mirror the packet
in conditions such as when qemu crashes or stops polling the tap, it is hard for
the management software to detect the condition and clean up the mirroring
beforehand. This would cause all packets to the bridge to be dropped and break
the network of other virtual machines.

To solve the issue, this patch no longer drops packets when the kernel fails to
mirror them; only redirected packets are dropped.

Signed-off-by: Jason Wang 
---
 net/sched/act_mirred.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index fe81cc1..3682951 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -198,15 +198,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
err = dev_queue_xmit(skb2);
 
 out:
-   if (err) {
+   if (err)
m->tcf_qstats.overlimits++;
-   /* should we be asking for packet to be dropped?
-* may make sense for redirect case only
-*/
+   if (err && m->tcf_action == TC_ACT_STOLEN)
retval = TC_ACT_SHOT;
-   } else {
+   else
retval = m->tcf_action;
-   }
spin_unlock(&m->tcf_lock);
 
return retval;
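The verdict logic after this patch reduces to a small pure function. The sketch below restates it outside the kernel, assuming (as the patch does) that redirect actions carry TC_ACT_STOLEN. The TC_ACT_* values match the kernel's `pkt_cls.h`; `mirred_verdict()` is a hypothetical name, and the overlimits counter, which is still bumped on any error, is omitted.

```c
/* Action codes as defined in the kernel's pkt_cls.h. */
#define TC_ACT_OK	0
#define TC_ACT_SHOT	2
#define TC_ACT_PIPE	3
#define TC_ACT_STOLEN	4

/*
 * A failed transmit only becomes a drop when the action was a redirect
 * (TC_ACT_STOLEN). A failed mirror lets the original packet continue
 * with whatever action was configured.
 */
static int mirred_verdict(int err, int tcf_action)
{
	if (err && tcf_action == TC_ACT_STOLEN)
		return TC_ACT_SHOT;
	return tcf_action;
}
```

Before the patch, any non-zero `err` returned TC_ACT_SHOT, so a mirror configured with TC_ACT_PIPE also dropped the original packet.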


Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa


>> + * memcg_kmem_new_page: verify if a new kmem allocation is allowed.
>> + * @gfp: the gfp allocation flags.
>> + * @handle: a pointer to the memcg this was charged against.
>> + * @order: allocation order.
>> + *
>> + * returns true if the memcg where the current task belongs can hold this
>> + * allocation.
>> + *
>> + * We return true automatically if this allocation is not to be accounted to
>> + * any memcg.
>> + */
>> +static __always_inline bool
>> +memcg_kmem_new_page(gfp_t gfp, void *handle, int order)
>> +{
>> +if (!memcg_kmem_on)
>> +return true;
>> +if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> 
> OK, I see the point behind __GFP_NOFAIL but it would deserve a comment
> or a mention in the changelog.

documentation can't hurt!

Just added.

> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 54e93de..e9824c1 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
> [...]
>> +EXPORT_SYMBOL(__memcg_kmem_new_page);
> 
> Why is this exported?
> 

It shouldn't be. Removed.

>> +
>> +void __memcg_kmem_commit_page(struct page *page, void *handle, int order)
>> +{
>> +struct page_cgroup *pc;
>> +struct mem_cgroup *memcg = handle;
>> +
>> +if (!memcg)
>> +return;
>> +
>> +WARN_ON(mem_cgroup_is_root(memcg));
>> +/* The page allocation must have failed. Revert */
>> +if (!page) {
>> +size_t size = PAGE_SIZE << order;
>> +
>> +memcg_uncharge_kmem(memcg, size);
>> +mem_cgroup_put(memcg);
>> +return;
>> +}
>> +
>> +pc = lookup_page_cgroup(page);
>> +lock_page_cgroup(pc);
>> +pc->mem_cgroup = memcg;
>> +SetPageCgroupUsed(pc);
> 
> Don't we need a write barrier before assigning memcg? Same as
> __mem_cgroup_commit_charge. This tests the Used bit always from within
> lock_page_cgroup so it should be safe but I am not 100% sure about the
> rest of the code.
> 
Well, I don't see the reason, precisely because we'll always grab it
from within the locked region. That should ensure all the necessary
serialization.
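The charge-then-commit flow quoted above (charge in memcg_kmem_new_page() before the page allocator runs, then revert in __memcg_kmem_commit_page() if the allocation failed) can be sketched with a toy counter. Everything here is hypothetical: `struct toy_kmem_cg` and the `toy_*` helpers only model the accounting, not page_cgroup, reference counting or locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Hypothetical counter standing in for the memcg's kmem res_counter. */
struct toy_kmem_cg {
	size_t usage;
	size_t limit;
};

/* Step 1: reserve the charge before the page allocator runs. */
static bool toy_kmem_new_page(struct toy_kmem_cg *cg, int order)
{
	size_t size = TOY_PAGE_SIZE << order;

	if (size > cg->limit - cg->usage)
		return false;
	cg->usage += size;
	return true;
}

/* Step 2: if the allocation failed (page == NULL), give the charge back. */
static void toy_kmem_commit_page(struct toy_kmem_cg *cg, void *page, int order)
{
	if (!page) {
		cg->usage -= TOY_PAGE_SIZE << order;
		return;
	}
	/* On success the real code records the memcg in the page_cgroup here. */
}
```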

>> +#ifdef CONFIG_MEMCG_KMEM
>> +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, s64 delta)
>> +{
>> +struct res_counter *fail_res;
>> +struct mem_cgroup *_memcg;
>> +int ret;
>> +bool may_oom;
>> +bool nofail = false;
>> +
>> +may_oom = (gfp & __GFP_WAIT) && (gfp & __GFP_FS) &&
>> +!(gfp & __GFP_NORETRY);
> 
> This deserves a comment.
> 
can't hurt!! =)
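One possible wording for the requested comment, shown on a standalone predicate. The flag values below are made up for illustration (the real __GFP_* bits differ) and `toy_may_oom()` is a hypothetical name; only the boolean expression is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real __GFP_* values are different. */
#define TOY_GFP_WAIT	0x1u
#define TOY_GFP_FS	0x2u
#define TOY_GFP_NORETRY	0x4u

/*
 * OOM handling is allowed only if the allocation may sleep (WAIT),
 * may call into the filesystem (FS), and has not asked to fail fast
 * (NORETRY). An allocation without WAIT, such as an atomic one,
 * therefore always gets may_oom == false.
 */
static bool toy_may_oom(unsigned int gfp)
{
	return (gfp & TOY_GFP_WAIT) && (gfp & TOY_GFP_FS) &&
	       !(gfp & TOY_GFP_NORETRY);
}
```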

>> +
>> +ret = 0;
>> +
>> +if (!memcg)
>> +return ret;
>> +
>> +_memcg = memcg;
>> +ret = __mem_cgroup_try_charge(NULL, gfp, delta / PAGE_SIZE,
>> +&_memcg, may_oom);
> 
> This is really dangerous because atomic allocations, which seem to be
> possible, could result in deadlocks because of the reclaim.

Can you elaborate on how this would happen?

> Also, as I
> have mentioned in the other email in this thread. Why should we reclaim
> just because of kernel allocation when we are not reclaiming any of it
> because shrink_slab is ignored in the memcg reclaim.


Don't get too distracted by the fact that shrink_slab is ignored. It is
temporary, and while ignoring it now leads to suboptimal behavior, first,
it will only affect its users, and second, it will not be disastrous.

I see this as more or less on par with the soft limit reclaim
problem we had. It is not ideal, but it already provided functionality.

>> +
>> +if (ret == -EINTR)  {
>> +nofail = true;
>> +/*
>> + * __mem_cgroup_try_charge() chose to bypass to root due to
>> + * OOM kill or fatal signal.  Since our only options are to
>> + * either fail the allocation or charge it to this cgroup, do
>> + * it as a temporary condition. But we can't fail. From a
>> + * kmem/slab perspective, the cache has already been selected,
>> + * by mem_cgroup_get_kmem_cache(), so it is too late to change
>> + * our minds
>> + */
>> +res_counter_charge_nofail(&memcg->res, delta, &fail_res);
>> +if (do_swap_account)
>> +res_counter_charge_nofail(&memcg->memsw, delta,
>> +  &fail_res);
> 
> Hmmm, this is kind of ugly but I guess unavoidable with the current
> implementation. Oh well...
> 

Oh well...


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman

On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > What I think you should do is use rcu for access.
> > > > And here sync rcu before freeing.
> > > > Maybe an overkill but at least a documented synchronization
> > > > primitive, and it is very light weight.
> > > > 
> > > 
> > > I liked your suggestion on barriers, as well.
> > > 
> > 
> > I have not thought about this as deeply as I should, but is simply
> > rechecking the mapping under the pages_lock to make sure the page is
> > still a balloon page an option? i.e. use pages_lock to stabilise
> > page->mapping.
> 
> To clarify, are you concerned about cost of rcu_read_lock
> for non balloon pages?
> 

Not as such, but given the choice between introducing RCU locking and
rechecking page->mapping under a spinlock I would choose the latter as it
is more straight-forward.
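Mel's suggestion (an optimistic unlocked check, then re-checking page->mapping under pages_lock before trusting it) can be sketched as follows. This is a userspace model under stated assumptions: a C11 atomic_flag spinlock stands in for the kernel spinlock, and `struct toy_page`, `balloon_mapping` and the helpers are hypothetical. The property that makes it work is that the driver only changes page->mapping while holding the same lock, so the locked re-check is authoritative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the balloon mapping and a struct page. */
struct toy_mapping { int id; };
struct toy_page { struct toy_mapping *mapping; };

static struct toy_mapping balloon_mapping = { 1 };
static atomic_flag pages_lock = ATOMIC_FLAG_INIT;	/* protects page->mapping */

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&pages_lock, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&pages_lock, memory_order_release);
}

/*
 * The unlocked peek is only a cheap hint and may be stale; the decision
 * is made by the re-check taken under pages_lock.
 */
static bool toy_isolate_balloon_page(struct toy_page *page)
{
	if (page->mapping != &balloon_mapping)
		return false;

	toy_lock();
	bool still_balloon = (page->mapping == &balloon_mapping);
	/* ... isolate the page here while the lock is still held ... */
	toy_unlock();
	return still_balloon;
}

/* The driver side clears page->mapping only under the same lock. */
static void toy_release_balloon_page(struct toy_page *page)
{
	toy_lock();
	page->mapping = NULL;
	toy_unlock();
}
```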

-- 
Mel Gorman
SUSE Labs

Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked

2012-08-15 Thread Gleb Natapov

On Tue, Aug 14, 2012 at 02:35:34PM -0500, Anthony Liguori wrote:
> > Do you consider allowing support for Windows as overengineering?
> 
> I don't think there is a way to hook BSOD on Windows so attempting to
> engineer something that works with Windows seems odd, no?
> 
Yan says in another email that it is possible to register a bugcheck callback.

--
Gleb.

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin

On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > What I think you should do is use rcu for access.
> > > > > And here sync rcu before freeing.
> > > > > Maybe an overkill but at least a documented synchronization
> > > > > primitive, and it is very light weight.
> > > > > 
> > > > 
> > > > I liked your suggestion on barriers, as well.
> > > > 
> > > 
> > > I have not thought about this as deeply as I should, but is simply
> > > rechecking the mapping under the pages_lock to make sure the page is
> > > still a balloon page an option? i.e. use pages_lock to stabilise
> > > page->mapping.
> > 
> > To clarify, are you concerned about cost of rcu_read_lock
> > for non balloon pages?
> > 
> 
> Not as such, but given the choice between introducing RCU locking and
> rechecking page->mapping under a spinlock I would choose the latter as it
> is more straight-forward.

OK, but checking it how? page->mapping == balloon_mapping does not scale to
multiple balloons, so I hoped we could switch to
page->mapping->flags & BALLOON_MAPPING or some such,
but this means we dereference it outside the lock ...

We will also need to add some API to set/clear the mapping so that the driver
does not need to poke into mm internals, but that's easy.

> -- 
> Mel Gorman
> SUSE Labs

RE: [NEW DRIVER V2 5/7] DA9058 GPIO driver

2012-08-15 Thread Opensource [Anthony Olech]

> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: 13 August 2012 14:10
> To: Opensource [Anthony Olech]
> Cc: Grant Likely; Linus Walleij; Mark Brown; LKML; David Dajun Chen; Samuel
> Ortiz; Lee Jones
> Subject: Re: [NEW DRIVER V2 5/7] DA9058 GPIO driver
> Hi Anthony, sorry for delayed reply... 
> On Sun, Aug 5, 2012 at 10:43 PM, Anthony Olech
>  wrote:
> > This is the GPIO component driver of the Dialog DA9058 PMIC.
> > This driver is just one component of the whole DA9058 PMIC driver.
> > It depends on the core DA9058 MFD driver.
> OK
> > +config GPIO_DA9058
> > +   tristate "Dialog DA9058 GPIO"
> > +   depends on MFD_DA9058
> select IRQ_DOMAIN, you're going to want to use it...
> > diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
> > index 0f55662..209224a 100644
> (...)
> > +#include 
> > +#include 
> Really?
> > +#include 
> Really?
> > +#include 
> > +#include 
> > +#include 
> Really?
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> If you're using regmap you better select it in Kconfig too, but it appears you
> don't. You should be using regmap in the main MFD driver in this case (I
> haven't looked at it though.)
> This header set just looks like it was copied from some other file and never
> really proofread, so please go over it in detail.


For some reason, some of those were required against 2.6.2x; I simply
forgot to prune the unwanted ones when rebasing forward to 3.5.
Good that you spotted it! Sorry, I will try to prune the includes in future.


> > +#include  #include
> > + #include 
> > +#include  #include 
> > +#include 
> Samuel will have to comment on this organization of headers, it seems a little
> much. DO you really need all of them?


One of them should have been stripped out by my submit script. As for the
others, bear in mind that the DA9058 PMIC is a multifunction device, so some
header files are common and some are specific to individual component drivers.
The very fact that you picked up on irrelevant include files surely has
implications for the structure of the DA9058 header files; in particular,
structs and defines that apply to only one component driver should be in
separate header files.


> > +struct da9058_gpio {
> > +   struct da9058 *da9058;
> > +   struct platform_device *pdev;
> > +   struct gpio_chip gp;
> > +   struct mutex lock;
> > +   u8 inp_config;
> > +   u8 out_config;
> > +};
> > +
> > +static struct da9058_gpio *gpio_chip_to_da9058_gpio(struct gpio_chip *chip)
> > +{
> > +   return container_of(chip, struct da9058_gpio, gp);
> > +}
> static inline, or a #define, but the compiler will probably inline it
> anyway.


The compiler should inline it, but I will change it anyway.


> > +static int da9058_gpio_get(struct gpio_chip *gc, unsigned offset) {
> > +   struct da9058_gpio *gpio = gpio_chip_to_da9058_gpio(gc);
> > +   struct da9058 *da9058 = gpio->da9058;
> > +   unsigned int gpio_level;
> > +   int ret;
> > +
> > +   if (offset > 1)
> > +   return -EINVAL; 
> So there are two GPIO pins, 0 and 1? That seems odd, but OK.


That is a feature of the hardware. I believe that calling them "0" and
"1" is the correct thing to do. Correct me if they should have been
called "1" and "2", or something else.


> > +   if (offset) { 
> So this is for GPIO 1


Yes, it seemed the obvious thing to do.


> > +   u8 value_bits = value ? 0x80 : 0x00;
> 
> These "value_bits" are just confusing. Just delete this and use the direct
> value below.


Will do. It was done for diagnostics that have since been stripped out.


> > +
> > +   gpio->out_config &= ~0x80;
> A better way of writing &= ~0x80; is &= 0x7F
> > +   gpio->out_config |= value_bits;
> gpio->out_config = value ? 0x80 : 0x00;
> So, less confusing.


see HANDLING NIBBLES below
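Linus's suggestion of named masks instead of magic numbers might look like this. The names are hypothetical: the patch only shows the raw values 0x80 (apparently the GPIO1 output level in out_config) and 0xF0 (apparently the GPIO1 nibble of the control register), and the datasheet may call them something else. BIT() is defined locally as a stand-in for the kernel macro.

```c
#include <stdint.h>

#define BIT(n)	(1u << (n))	/* local stand-in for the kernel's BIT() */

/* Hypothetical names for the raw values used in the patch. */
#define DA9058_GPIO1_LEVEL	BIT(7)	/* 0x80: GPIO1 output level */
#define DA9058_GPIO1_NIBBLE	0xF0u	/* upper nibble: GPIO1 configuration */

/* Clear, then conditionally set, the GPIO1 level bit. */
static uint8_t set_gpio1_level(uint8_t out_config, int value)
{
	out_config &= (uint8_t)~DA9058_GPIO1_LEVEL;
	out_config |= value ? DA9058_GPIO1_LEVEL : 0;
	return out_config;
}

/* Replace the GPIO1 nibble of a control byte, keeping the lower nibble. */
static uint8_t merge_gpio1_nibble(uint8_t ctrl, uint8_t out_config)
{
	return (uint8_t)((ctrl & ~DA9058_GPIO1_NIBBLE) |
			 (out_config & DA9058_GPIO1_NIBBLE));
}
```

With the masks named, `&= ~MASK` reads naturally, which is the point of the review comment about `&= ~0x80` versus `&= 0x7F`.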


> > +   if (!(gpio_cntrl & 0x20))
> > +   goto exit;
> Please insert a comment explaining what this bit is doing and why you're just
> exiting if it's not set. I don't understand one thing.


I have explained why in the driver source in the next submission.

 
> Maybe this would be better if you didn't use so many magic values, what about:
> #include 
> #define FOO_FLAG BIT(3) /* This is a flag for foo */
> > +
> > +   gpio_cntrl &= ~0xF0;
> A better way to write &= ~0xF0 is to write &= 0x0F;
> If you don't #define the constants, this way of negating numbers just gets
> confusing.
> So this is OK:
>   foo &= ~FOO_FLAG;
>   foo |= set ? FOO_FLAG : 0;
> This is just hard to read:
>   foo &= ~0x55;
>   foo |= set ? 0x55 : 0;
> And is better off
>foo &= 0xAA;
>foo |= set ? 0x55 : 0;
> I prefer that you #define the registers but it's your pick.
> > +   gpio_cntrl |= 0xF0 & gpio->out_config;
> > +
> > +   ret =

Re: [PATCH 7/8] kbuild: move W=... stuff to Kbuild.arch

2012-08-15 Thread Sam Ravnborg

On Wed, Aug 15, 2012 at 12:41:23PM +0300, Artem Bityutskiy wrote:
> On Wed, 2012-06-06 at 17:35 +0200, Sam Ravnborg wrote:
> > On Wed, Jun 06, 2012 at 01:18:47PM +0300, Artem Bityutskiy wrote:
> > > On Sat, 2012-05-05 at 10:18 +0200, Sam Ravnborg wrote:
> > > > Prevent that we evaluate cc-option multiple times for the same
> > > > option by moving the definitions to Kbuild.arch.
> > > > The file is included once only, thus gcc is not invoked once per 
> > > > directory.
> > > > 
> > > > Another side-effect of this patch is that KCFLAGS are appended last
> > > > to the list of options. This allows us to better control the options.
> > > > Artem Bityutskiy  noticed this.
> > > > 
> > > > Signed-off-by: Sam Ravnborg 
> > > > Cc: Artem Bityutskiy 
> > > 
> > > Hi,
> > > 
> > > what happened to this patch? I was fixing the real issue I am
> > > encountering and I thought it'd be taken instead of my original patch.
> > We decided to move this to next merge release because it was not added
> > to kbuild thus not enough exposure in -next.
> > 
> > I am planning to resend the series at around -rc2 time.
> 
> Hi Sam, what happened to this patch-set? At least the KCFLAGS patches I am
> waiting for have still not been merged.
I was on vacation and have not yet gotten back to them.
Will do soon - thanks for the reminder!

Sam

[perf] make clean problematic bashism

2012-08-15 Thread Wouter M. Koolen


Dear perf maintainers,

I attempted to compile perf 3.5.1 without worrying about installing 
dependencies first. The resulting error messages were quite helpful, and 
led me to install a bunch of development libraries and flex.


Unfortunately, after installing flex the build still failed, even after 
make clean.


The reason for this was a bunch of generated empty flex files in util/ 
that were not removed by make clean. They are intended to be erased, 
since the Makefile executes


rm -f util/*-{bison,flex}*

however, this command does not remove the files. I guess this is because {,}
brace expansion is special only in bash, but the Makefile is run with some
other shell?
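Brace expansion is a bash feature. GNU make runs recipes with /bin/sh by default, and a POSIX sh such as dash (the /bin/sh on Debian and Ubuntu) passes `{bison,flex}` through literally, so the glob matches nothing and `rm -f` silently does nothing. The portable form lists each pattern separately, i.e. `rm -f util/*-bison* util/*-flex*`. The C sketch below expands a single, non-nested brace group the way bash would, to show which patterns the Makefile line intends; `expand_one_brace()` is a hypothetical helper, not part of perf.

```c
#include <stdio.h>
#include <string.h>

#define PAT_MAX 128

/*
 * Expand one non-nested {a,b,...} group the way bash would.
 * Returns the number of patterns written to out, or 0 on overflow.
 */
static int expand_one_brace(const char *pat, char out[][PAT_MAX], int max)
{
	const char *lbrace = strchr(pat, '{');
	const char *rbrace = lbrace ? strchr(lbrace, '}') : NULL;
	int n = 0;

	if (!lbrace || !rbrace) {
		/* No brace group: the pattern is used verbatim. */
		if (max < 1 || strlen(pat) >= PAT_MAX)
			return 0;
		strcpy(out[0], pat);
		return 1;
	}

	const char *alt = lbrace + 1;
	while (alt <= rbrace) {
		const char *end = alt;

		while (end < rbrace && *end != ',')
			end++;
		if (n >= max)
			return 0;
		/* prefix + this alternative + suffix after the closing brace */
		int len = snprintf(out[n], PAT_MAX, "%.*s%.*s%s",
				   (int)(lbrace - pat), pat,
				   (int)(end - alt), alt,
				   rbrace + 1);
		if (len < 0 || len >= PAT_MAX)
			return 0;
		n++;
		alt = end + 1;
	}
	return n;
}
```

For `util/*-{bison,flex}*` this yields `util/*-bison*` and `util/*-flex*`, the two words a portable clean rule should pass to `rm -f`.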


I got perf to compile now, but thought you would be interested to know 
about this little problem.


With kind regards,

Wouter Koolen


PS: as a side note: GNU make has the .DELETE_ON_ERROR: special target, 
which removes the target file when its generating command fails. This 
would have prevented my problem and sounds like a good idea in general. 
Maybe perf could make use of this feature when on GNU make?



Re: [perf] make clean problematic bashism

2012-08-15 Thread Peter Zijlstra

On Wed, 2012-08-15 at 11:52 +0200, Wouter M. Koolen wrote:
> Dear perf maintainers,
> 
> I attempted to compile perf 3.5.1 without worrying about installing 
> dependencies first. The resulting error messages were quite helpful, and 
> led me to install a bunch of development libraries and flex.
> 
> Unfortunately, after installing flex the build still failed, even after 
> make clean.
> 
> The reason for this was a bunch of generated empty flex files in util/ 
> that were not removed by make clean. They are intended to be erased, 
> since the Makefile executes
> 
> rm -f util/*-{bison,flex}*
> 
> however, this command does not remove the files. I guess because {,} 
> alternatives are only special in bash but the makefile is run with some 
> other shell?

ISTR us getting a number of such patches, did we miss a site, acme?

> I got perf to compile now, but thought you would be interested to know 
> about this little problem.
> 
> With kind regards,
> 
> Wouter Koolen
> 
> 
> PS: as a side note: GNU make has the .DELETE_ON_ERROR: special target, 
> which removes the target file when its generating command fails. This 
> would have prevented my problem and sounds like a good idea in general. 
> Maybe perf could make use of this feature when on GNU make?

I don't think we build with anything but GNU make; mind sending a patch
implementing your suggestion?

Re: [PATCH] perf: Let O= makes handle relative paths

2012-08-15 Thread Borislav Petkov

On Mon, Aug 13, 2012 at 03:02:49PM -0300, Arnaldo Carvalho de Melo wrote:
> [acme@sandy linux]$ rm -rf ../build/perf
> [acme@sandy linux]$ make -j8 -C tools/perf/ LIBUNWIND_DIR=/opt/libunwind 
> O=/home/acme/git/build/perf install
> /bin/sh: line 0: cd: /home/acme/git/build/perf: No such file or directory
> make: Entering directory `/home/git/linux/tools/perf'
> GEN perf-archive
> GEN /home/git/linux/tools/perf/python/perf.so
> make[1]: Entering directory `/home/git/linux/tools/lib/traceevent'
> make[2]: warning: jobserver unavailable: using -j1.  Add `+' to parent make 
> rule.
> * new build flags or cross compiler
> CC /home/git/linux/tools/perf/perf.o
> CC /home/git/linux/tools/perf/builtin-annotate.o
> CC /home/git/linux/tools/perf/builtin-bench.o
> CC /home/git/linux/tools/perf/bench/sched-messaging.o
> CC /home/git/linux/tools/perf/bench/sched-pipe.o
> 
> I.e. it should stop if the O= provided directory is not there.

Why stop? Don't we want to make the directory instead and continue
building in there?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Re: [PATCH 1/2] mtd: lpc32xx_slc: Adjust to pl08x DMA interface changes

2012-08-15 Thread Artem Bityutskiy

On Thu, 2012-07-12 at 14:22 +0200, Roland Stigge wrote:
> This patch adjusts the LPC32xx SLC NAND driver to the new pl08x DMA interface,
> fixing the compile error resulting from changed pl08x structures.
> 
> Signed-off-by: Roland Stigge 

This patch breaks compilation:

ERROR: "pl08x_filter_id" [drivers/mtd/nand/lpc32xx_slc.ko] undefined!

Please send a fix. The defconfig I used is attached.

-- 
Best Regards,
Artem Bityutskiy
CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_EMBEDDED=y
CONFIG_SLAB=y
CONFIG_JUMP_LABEL=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_PARTITION_ADVANCED=y
CONFIG_ARCH_LPC32XX=y
CONFIG_PREEMPT=y
CONFIG_AEABI=y
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CMDLINE="console=ttyS0,115200n81 root=/dev/ram0"
CONFIG_CPU_IDLE=y
CONFIG_FPE_NWFPE=y
CONFIG_VFP=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_AOUT=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
# CONFIG_INET_DIAG is not set
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
# CONFIG_WIRELESS is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
# CONFIG_FW_LOADER is not set
CONFIG_MTD=m
CONFIG_MTD_TESTS=m
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=1
CONFIG_MTD_AFS_PARTS=m
CONFIG_MTD_AR7_PARTS=m
CONFIG_MTD_CHAR=m
CONFIG_MTD_BLOCK=m
CONFIG_FTL=m
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
CONFIG_INFTL=m
CONFIG_RFD_FTL=m
CONFIG_SSFDC=m
CONFIG_SM_FTL=m
CONFIG_MTD_OOPS=m
CONFIG_MTD_SWAP=m
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m
CONFIG_MTD_COMPLEX_MAPPINGS=y
CONFIG_MTD_PHYSMAP=m
CONFIG_MTD_PHYSMAP_COMPAT=y
CONFIG_MTD_PHYSMAP_OF=m
CONFIG_MTD_IMPA7=m
CONFIG_MTD_GPIO_ADDR=m
CONFIG_MTD_PLATRAM=m
CONFIG_MTD_LATCH_ADDR=m
CONFIG_MTD_DATAFLASH=m
CONFIG_MTD_DATAFLASH_WRITE_VERIFY=y
CONFIG_MTD_DATAFLASH_OTP=y
CONFIG_MTD_M25P80=m
# CONFIG_M25PXX_USE_FAST_READ is not set
CONFIG_MTD_SST25L=m
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=m
CONFIG_MTD_MTDRAM=m
CONFIG_MTD_BLOCK2MTD=m
CONFIG_MTD_DOC2000=m
CONFIG_MTD_DOC2001=m
CONFIG_MTD_DOC2001PLUS=m
CONFIG_MTD_DOCG3=m
CONFIG_MTD_DOCPROBE_ADVANCED=y
CONFIG_MTD_NAND_ECC_SMC=y
CONFIG_MTD_NAND=m
CONFIG_MTD_NAND_MUSEUM_IDS=y
CONFIG_MTD_NAND_GPIO=m
CONFIG_MTD_NAND_DISKONCHIP=m
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED=y
CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE=y
CONFIG_MTD_NAND_DOCG4=m
CONFIG_MTD_NAND_SLC_LPC32XX=m
CONFIG_MTD_NAND_MLC_LPC32XX=m
CONFIG_MTD_NAND_NANDSIM=m
CONFIG_MTD_NAND_PLATFORM=m
CONFIG_MTD_ALAUDA=m
CONFIG_MTD_ONENAND=m
CONFIG_MTD_ONENAND_VERIFY_WRITE=y
CONFIG_MTD_ONENAND_GENERIC=m
CONFIG_MTD_ONENAND_OTP=y
CONFIG_MTD_ONENAND_2X_PROGRAM=y
CONFIG_MTD_ONENAND_SIM=m
CONFIG_MTD_LPDDR=m
CONFIG_MTD_UBI=m
CONFIG_MTD_UBI_GLUEBI=m
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_EEPROM_AT25=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CIRRUS is not set
# CONFIG_NET_VENDOR_FARADAY is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
CONFIG_LPC_ENET=y
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_STMICRO is not set
CONFIG_SMSC_PHY=y
# CONFIG_WLAN is not set
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=240
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=320
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_TOUCHSCREEN=y
CONFIG_TOUCHSCREEN_LPC32XX=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_PNX=y
CONFIG_SPI=y
CONFIG_SPI_PL022=y
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_PNX4008_WATCHDOG=y
CONFIG_FB=y
CONFIG_FB_ARMCLCD=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_SEQUENCER_OSS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
# CONFIG_SND_VERBOSE_PROCFS is not set
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_VERBOSE=y
# CONFIG_SND_DRIVERS is not set
#

Re: [perf] make clean problematic bashism

2012-08-15 Thread Wouter M. Koolen


On 08/15/2012 12:26 PM, Peter Zijlstra wrote:

On Wed, 2012-08-15 at 11:52 +0200, Wouter M. Koolen wrote:

Dear perf maintainers,

I attempted to compile perf 3.5.1 without worrying about installing
dependencies first. The resulting error messages were quite helpful, and
led me to install a bunch of development libraries and flex.

Unfortunately, after installing flex the build still failed, even after
make clean.

The reason for this was a bunch of generated empty flex files in util/
that were not removed by make clean. They are intended to be erased,
since the Makefile executes

rm -f util/*-{bison,flex}*

however, this command does not remove the files. I guess this is because {,}
alternatives (brace expansion) are only special in bash, but the Makefile is run
with some other shell?
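To make the suspected mechanism concrete: bash rewrites `util/*-{bison,flex}*` into the two globs `util/*-bison*` and `util/*-flex*` before pathname expansion, whereas a POSIX shell such as dash does no brace expansion, so `rm -f` receives one pattern containing literal braces that matches nothing and silently succeeds. The C sketch below mimics only that bash step for a single, non-nested brace group; `expand_braces` is a hypothetical helper, not code from perf or from any shell.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: expand the first "{a,b,...}" group in `pattern`
 * the way bash brace expansion would (no nesting, no escapes).
 * Writes up to `max` results (64-byte slots) into `out` and returns how
 * many were produced. A pattern without a complete brace group is copied
 * unchanged, which is what a POSIX shell like dash effectively does.
 */
static int expand_braces(const char *pattern, char out[][64], int max)
{
	const char *open = strchr(pattern, '{');
	const char *close = open ? strchr(open, '}') : NULL;
	int n = 0;

	if (!open || !close) {
		snprintf(out[0], 64, "%s", pattern);
		return 1;
	}

	const char *alt = open + 1;
	while (alt <= close && n < max) {
		/* Each alternative ends at the next ',' or at the closing '}'. */
		const char *end = alt;
		while (end < close && *end != ',')
			end++;
		/* prefix + alternative + suffix */
		snprintf(out[n], 64, "%.*s%.*s%s",
			 (int)(open - pattern), pattern,
			 (int)(end - alt), alt,
			 close + 1);
		n++;
		alt = end + 1;
	}
	return n;
}
```

Under this model, spelling the two patterns out separately in the rule (`rm -f util/*-bison* util/*-flex*`) behaves the same under bash and dash.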


ISTR us getting a number of such patches, did we miss a site, acme?


I got perf to compile now, but thought you would be interested to know
about this little problem.

With kind regards,

Wouter Koolen


PS: as a side note: GNU make has the .DELETE_ON_ERROR: special target,
which removes the target file when its generating command fails. This
would have prevented my problem and sounds like a good idea in general.
Maybe perf could make use of this feature when on GNU make?


I don't think we build with anything but gnu make, mind sending a patch
implementing your suggestion?




Hi Peter,

Some more information: my system has /bin/sh set to dash. I remember a 
question about this during Debian installation. I guess Ubuntu does 
something similar, cf. https://lkml.org/lkml/2012/5/4/90


Patch attached :)

With kind regards,

Wouter
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 0eee64c..29b2373 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -1,3 +1,5 @@
+.DELETE_ON_ERROR:
+
 include ../scripts/Makefile.include
 
 # The default target of this Makefile is...

Re: [PATCH 0/4] promote zcache from staging

2012-08-15 Thread Konrad Rzeszutek Wilk

On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
> 
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
> 
> > Or have you disabled cleancache?
> 
> Yes, I _did_ disable cleancache.  I could see where having
> cleancache enabled could explain the difference in results.

Why did you disable cleancache? The goal is to have both: cleancache
to compress fs data and frontswap to compress swap data. By disabling
cleancache you turned off one of those two sources.

Re: [PATCH 0/7] zram/zsmalloc promotion

2012-08-15 Thread Konrad Rzeszutek Wilk

On Tue, Aug 14, 2012 at 12:39:25PM -0500, Seth Jennings wrote:
> On 08/14/2012 12:36 AM, Nitin Gupta wrote:
> > On 08/13/2012 07:35 PM, Greg Kroah-Hartman wrote:
> >> On Wed, Aug 08, 2012 at 03:12:13PM +0900, Minchan Kim wrote:
> >>> This patchset promotes zram/zsmalloc from staging.
> >>> Both are very clean and zram has been used by many embedded products
> >>> for a long time.
> >>>
> >>> [1-3] are patches not yet merged into linux-next but needed
> >>> as a base for [4-5], which promote zsmalloc.
> >>> Greg, if you merged [1-3] already, skip them.
> >>
> >> I've applied 1-3 and now 4, but that's it, I can't apply the rest
> >> without getting acks from the -mm maintainers, sorry.  Please work with
> >> them to get those acks, and then I will be glad to apply the rest (after
> >> you resend them of course...)
> >>
> > 
> > On second thought, I think zsmalloc should stay in drivers/block/zram,
> > since zram is now the only user of zsmalloc: zcache and ramster are
> > moving to another allocator.
> 
> The removal of zsmalloc from zcache has not been agreed upon
> yet.


> 
> Dan _suggested_ removing zsmalloc as the persistent
> allocator for zcache in favor of zbud to solve "flaws" in
> zcache.  However, zbud has large deficiencies.
> 
> A zero-filled 4k page will compress with LZO to 103 bytes.
> zbud can only store two compressed pages in each memory pool
> page, resulting in 95% fragmentation (i.e. 95% of the memory
> pool page goes unused).  While this might not be a typical
> case, it is the worst case and absolutely does happen.
> 
> zbud's design also effectively limits the useful page
> compression to 50%. If pages are compressed beyond that, the
> added space savings is lost in memory pool fragmentation.
> For example, if two pages compress to 30% of their original
> size, those two pages take up 60% of the zbud memory pool
> page, and 40% is lost to fragmentation because zbud can't
> store anything in the remaining space.
> 
> To say it another way, for every two page cache pages that
> cleancache stores in zcache, zbud _must_ allocate a memory
> pool page, regardless of how well those pages compress.
> This reduces the efficiency of the page cache reclaim
> mechanism by half.
> 
> I have posted some work (zsmalloc shrinker interface, user
> registered alloc/free functions for the zsmalloc memory
> pool) that begins to make zsmalloc a suitable replacement
> for zbud, but that work was put on hold until the path out
> of staging was established.
> 
> I'm hoping to continue this work once the code is in
> mainline.  While zbud has deficiencies, it doesn't prevent
> zcache from having value as I have already demonstrated.
> However, replacing zsmalloc with zbud would step backward
> for the reasons mentioned above.
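The fragmentation figures quoted above follow from a simple model: zbud packs at most two compressed pages into one 4096-byte pool page, so for compressed sizes a and b the unused fraction of that pool page is 1 - (a + b)/4096, and n stored pages always need about n/2 pool pages. The C sketch below assumes exactly that two-buddy model and ignores zbud's own per-page metadata; the function names are hypothetical.

```c
#include <stddef.h>

#define POOL_PAGE_SIZE 4096u

/*
 * Fraction of a pool page left unused when two compressed pages of
 * `a` and `b` bytes share it (simplified two-buddy model, no metadata).
 */
static double zbud_waste(size_t a, size_t b)
{
	return 1.0 - (double)(a + b) / POOL_PAGE_SIZE;
}

/* One pool page per two stored pages, however well they compress. */
static size_t zbud_pool_pages(size_t stored_pages)
{
	return (stored_pages + 1) / 2;
}
```

With this model, `zbud_waste(103, 103)` comes out at roughly 0.95, matching the 95% figure for zero-filled pages, and two pages compressed to about 30% each (1229 bytes) leave roughly 40% of the pool page unused.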

What would be nice is having only one engine instead
of two - and I believe that is what you and Dan are aiming at.

Dan is looking at it from the perspective of re-engineering
zcache to use an LRU for keeping track of pages and pushing
those to the compression engine, and of reworking the zbud engine
a bit (I think; let me double-check the git tree he pointed
out).

> 
> I do not support the removal of zsmalloc from zcache.  As
> such, I think the zsmalloc code should remain independent.
> 
> Seth

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-15 Thread Avi Kivity

On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
>> 
>> Userspace may want to modify the ROM (for example, when programming a
>> flash device).  It is also possible to map an hva range rw through one
>> slot and ro through another.
> 
> Right, can do that with multiple userspace maps to the same anonymous 
> memory region (see other email).

Yes it's possible.  It requires that we move all memory allocation to be
fd based, since userspace can't predict what memory will be dual-mapped
(at least if emulated hardware allows this).  Is this a reasonable
requirement?  Do ksm/thp/autonuma work with this?
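As an illustration of what fd-based memory enables here: once guest RAM is backed by a file descriptor, userspace can map the same pages twice, once read-write and once read-only, and hand each view to a different memslot. The sketch below uses only POSIX `mkstemp`, `ftruncate` and `mmap` with `MAP_SHARED` on an unlinked temporary file; the temp-file backing and the `dual_map` helper are assumptions for illustration, and no KVM ioctls are involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `len` bytes of fd-backed memory and map it
 * twice, read-write at *rw and read-only at *ro. Both views share the
 * same pages. Returns 0 on success, -1 on failure.
 */
static int dual_map(size_t len, void **rw, void **ro)
{
	char path[] = "/tmp/dualmapXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);            /* keep only the fd, drop the name */

	if (ftruncate(fd, (off_t)len) < 0)
		goto fail;

	*rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*rw == MAP_FAILED)
		goto fail;
	*ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (*ro == MAP_FAILED) {
		munmap(*rw, len);
		goto fail;
	}
	close(fd);               /* the mappings keep the pages alive */
	return 0;
fail:
	close(fd);
	return -1;
}
```

A store through the read-write view is immediately visible through the read-only view, while a store through the read-only view faults. With anonymous memory there is no second handle to create such an alias after the fact, which is the allocation constraint raised above.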



-- 
error compiling committee.c: too many arguments to function

Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa

On 08/15/2012 01:42 PM, Glauber Costa wrote:
>> Also, as I have mentioned in the other email in this thread, why
>> should we reclaim just because of a kernel allocation when we are not
>> reclaiming any of it, because shrink_slab is ignored in the memcg
>> reclaim?
> 
> Don't get too distracted by the fact that shrink_slab is ignored. It is
> temporary, and while ignoring it now leads to suboptimal behavior, it
> will, first, only affect its users and, second, not be disastrous.
> 
> I see this as more or less on par with the soft limit reclaim
> problem we had. It is not ideal, but it already provided functionality
> 

Okay, I sent the e-mail before finishing it... duh

What I meant by that last sentence is that the situation until the
memcg-aware shrinkers land in the kernel is more or less the
same (obviously not exactly) as with the soft reclaim work. It is an
evolutionary approach that provides some functionality that is not yet
perfect but already solves lots of problems for people willing to live
with its temporary drawbacks.


Re: [PATCH 7/7] perf gtk/browser: Use hist_period_print functions

2012-08-15 Thread Pekka Enberg

On Mon, 6 Aug 2012, Namhyung Kim wrote:
> Now we can support color using pango markup with this change.
> 
> Cc: Pekka Enberg 
> Signed-off-by: Namhyung Kim 

Awesome!

Acked-by: Pekka Enberg 

Re: [net PATCH v3 1/3] net: netprio: fix files lock and remove useless d_path bits

2012-08-15 Thread Neil Horman

On Tue, Aug 14, 2012 at 03:34:24PM -0700, John Fastabend wrote:
> Add lock to prevent a race with a file closing and also remove
> useless and ugly sscanf code. The extra code was never needed
> and the case it supposedly protected against is in fact handled
> correctly by sock_from_file as pointed out by Al Viro.
> 
> CC: Neil Horman 
> Reported-by: Al Viro 
> Signed-off-by: John Fastabend 
> ---
> 
>  net/core/netprio_cgroup.c |   22 ++++------------------
>  1 files changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index ed0c043..f65dba3 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -277,12 +277,6 @@ out_free_devname:
>  void net_prio_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
>  {
>   struct task_struct *p;
> - char *tmp = kzalloc(sizeof(char) * PATH_MAX, GFP_KERNEL);
> -
> - if (!tmp) {
> - pr_warn("Unable to attach cgrp due to alloc failure!\n");
> - return;
> - }
>  
>   cgroup_taskset_for_each(p, cgrp, tset) {
>   unsigned int fd;
> @@ -296,32 +290,24 @@ void net_prio_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
>   continue;
>   }
>  
> - rcu_read_lock();
> + spin_lock(>file_lock);
>   fdt = files_fdtable(files);
>   for (fd = 0; fd < fdt->max_fds; fd++) {
> - char *path;
>   struct file *file;
>   struct socket *sock;
> - unsigned long s;
> - int rv, err = 0;
> + int err;
>  
>   file = fcheck_files(files, fd);
>   if (!file)
>   continue;
>  
> - path = d_path(>f_path, tmp, PAGE_SIZE);
> - rv = sscanf(path, "socket:[%lu]", );
> - if (rv <= 0)
> - continue;
> -
>   sock = sock_from_file(file, );
> - if (!err)
> + if (sock)
>   sock_update_netprioidx(sock->sk, p);
>   }
> - rcu_read_unlock();
> + spin_unlock(>file_lock);
>   task_unlock(p);
>   }
> - kfree(tmp);
>  }
>  
>  static struct cftype ss_files[] = {
> 
> 
Acked-by: Neil Horman 

It looks good to me.  Al, could you please lend your review here too?

Re: [net PATCH v3 2/3] net: netprio: fd passed in SCM_RIGHTS datagram not set correctly

2012-08-15 Thread Neil Horman

On Tue, Aug 14, 2012 at 03:34:30PM -0700, John Fastabend wrote:
> A socket fd passed in a SCM_RIGHTS datagram was not getting
> updated with the new tasks cgrp prioidx. This leaves IO on
> the socket tagged with the old tasks priority.
> 
> To fix this add a check in the scm recvmsg path to update the
> sock cgrp prioidx with the new tasks value.
> 
> Thanks to Al Viro for catching this.
> 
> CC: Neil Horman 
> Reported-by: Al Viro 
> Signed-off-by: John Fastabend 
> ---
> 
>  net/core/scm.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 8f6ccfd..040cebe 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -265,6 +265,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>   for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); ii++, cmfptr++)
>   {
> + struct socket *sock;
>   int new_fd;
>   err = security_file_receive(fp[i]);
>   if (err)
> @@ -281,6 +282,9 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>   }
>   /* Bump the usage count and install the file. */
>   get_file(fp[i]);
> + sock = sock_from_file(fp[i], );
> + if (sock)
> + sock_update_netprioidx(sock->sk, current);
>   fd_install(new_fd, fp[i]);
>   }
>  
> 
> 

Acked-by: Neil Horman 
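For readers unfamiliar with the path being patched: SCM_RIGHTS lets one process pass open file descriptors to another over a Unix domain socket, and the receiver gets a fresh descriptor in its own table (installed by scm_detach_fds() in the kernel). The userspace sketch below passes one fd across a `socketpair()` using only the standard `sendmsg`/`recvmsg` calls and `CMSG_*` macros; `send_fd` and `recv_fd` are hypothetical helper names. After the transfer the descriptor belongs to the receiving task, which is why the patch updates the socket's prioidx for `current` on this path.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send `fd_to_pass` over Unix socket `sock`. */
static int send_fd(int sock, int fd_to_pass)
{
	char byte = 'x';                       /* must send at least one byte */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {                                /* correctly aligned cmsg buffer */
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm;

	memset(&u, 0, sizeof(u));
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd from `sock`; returns it or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;                             /* a new fd in this process */
}
```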


Re: [net PATCH v3 3/3] net: netprio: fix cgrp create and write priomap race

2012-08-15 Thread Neil Horman

On Tue, Aug 14, 2012 at 03:34:35PM -0700, John Fastabend wrote:
> A race exists where creating cgroups and also updating the priomap
> may result in losing a priomap update. This is because priomap
> writers are not protected by rtnl_lock.
> 
> Move priority writer into rtnl_lock()/rtnl_unlock().
> 
> CC: Neil Horman 
> Reported-by: Al Viro 
> Signed-off-by: John Fastabend 
> ---
> 
>  net/core/netprio_cgroup.c |    8 +++-----
>  1 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index f65dba3..c75e3f9 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -101,12 +101,10 @@ static int write_update_netdev_table(struct net_device *dev)
>   u32 max_len;
>   struct netprio_map *map;
>  
> - rtnl_lock();
>   max_len = atomic_read(_prioidx) + 1;
>   map = rtnl_dereference(dev->priomap);
>   if (!map || map->priomap_len < max_len)
>   ret = extend_netdev_table(dev, max_len);
> - rtnl_unlock();
>  
>   return ret;
>  }
> @@ -256,17 +254,17 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
>   if (!dev)
>   goto out_free_devname;
>  
> + rtnl_lock();
>   ret = write_update_netdev_table(dev);
>   if (ret < 0)
>   goto out_put_dev;
>  
> - rcu_read_lock();
> - map = rcu_dereference(dev->priomap);
> + map = rtnl_dereference(dev->priomap);
>   if (map)
>   map->priomap[prioidx] = priority;
> - rcu_read_unlock();
>  
>  out_put_dev:
> + rtnl_unlock();
>   dev_put(dev);
>  
>  out_free_devname:
> 
> 

Acked-by: Neil Horman 
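The race fixed by this patch has a check-then-act shape: if the lock is dropped between "make sure the priomap is large enough" and "store the priority", a concurrent writer can replace the map in between and the store can be lost. The userspace sketch below shows the shape of the fix, one critical section spanning both steps, with a pthread mutex standing in for rtnl_lock(). The `prio_table` type and function names are hypothetical, and the kernel's RCU publication of the map is not modelled.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device's priomap. */
struct prio_table {
	pthread_mutex_t lock;      /* plays the role of rtnl_lock */
	unsigned int *map;
	size_t len;
};

/* Grow the map to at least `min_len` entries. Caller holds t->lock. */
static int extend_locked(struct prio_table *t, size_t min_len)
{
	unsigned int *bigger;

	if (t->len >= min_len)
		return 0;
	bigger = calloc(min_len, sizeof(*bigger));
	if (!bigger)
		return -1;
	if (t->map)
		memcpy(bigger, t->map, t->len * sizeof(*bigger));
	free(t->map);
	t->map = bigger;
	t->len = min_len;
	return 0;
}

/*
 * Extend-then-write under one critical section: the lock is taken
 * before the size check and dropped only after the store, so no other
 * writer can swap the map in between.
 */
static int write_prio(struct prio_table *t, size_t idx, unsigned int prio)
{
	int ret;

	pthread_mutex_lock(&t->lock);
	ret = extend_locked(t, idx + 1);
	if (ret == 0)
		t->map[idx] = prio;
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```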


Re: [PATCH 0/2] Align MIPS swapper_pg_dir for faster code.

2012-08-15 Thread Ralf Baechle

On Tue, Aug 14, 2012 at 11:07:59AM -0700, David Daney wrote:

> From: David Daney 
> 
> The MIPS swapper_pg_dir needs 64K alignment for faster TLB refills in
> kernel mappings.  There are two parts to the patch set:
> 
> 1) Modify generic vmlinux.lds.h to allow architectures to place
>additional sections at the start of .bss.  This allows alignment
>constraints to be met with minimal holes added for padding.
>Putting this in common code should reduce the risk of future
>changes to the linker scripts not being propagated to MIPS (or any
>other architecture that needs something like this).
> 
> 2) Align the MIPS swapper_pg_dir.
> 
> Since the initial use of the code is for MIPS, perhaps both parts
> could be merged by Ralf's tree (after collecting any Acked-bys).
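The padding cost behind point 1) is easy to quantify: an object that must start on a 64 KiB boundary, placed at an arbitrary address, can need up to 64 KiB - 1 bytes of padding in front of it, while placing it where the address is already aligned costs nothing. The C sketch below just computes that padding; the helper names and the example addresses in the usage are illustrative assumptions, not values from the MIPS linker script.

```c
#include <stdint.h>

#define PGD_ALIGN 0x10000u   /* 64 KiB, the alignment swapper_pg_dir needs */

/* Round `addr` up to the next multiple of `align` (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding needed before an object aligned to `align` at `addr`. */
static uintptr_t padding_before(uintptr_t addr, uintptr_t align)
{
	return align_up(addr, align) - addr;
}
```

For example, an aligned object at 0x80510000 needs no padding, but one that would otherwise follow 0x40 bytes of other data, at 0x80510040, needs 0xffc0 bytes of hole in front of it.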

Looks good to me but will wait a bit longer for comments and (N)Acks
before merging.

  Ralf

Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-15 Thread Peter Zijlstra

On Tue, 2012-08-14 at 17:18 -0700, Tejun Heo wrote:
> Let's see if we can agree on the latter point first.  Do you agree
> that it wouldn't be a good idea to implement relatively complex timer
> subsystem inside workqueue? 

RB-trees are fairly trivial to use, but can we please get back to why
people want to do del/mod delayed work from IRQ context?

I can get the queueing part, but why do they need to cancel and or
modify stuff?

Trying to come up with a solution to a problem you don't understand is
kinda difficult.

Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation

2012-08-15 Thread Jussi Kivilinna


Quoting Borislav Petkov :


On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:

I started thinking about the performance on AMD Bulldozer.
vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
on AMD CPUs are a lot slower (latencies from 8 to 12 cycles) than on
Intel Sandy Bridge (where the instructions have latencies of 1 to 2). See:
http://www.agner.org/optimize/instruction_tables.pdf

It would be really good if the implementation could be tested on an AMD CPU
to determine whether it causes a performance regression. However, I don't
have access to a machine with such a CPU.


But I do. :)

And if you tell me exactly how to run the tests and on what kernel, I'll
try to do so.



Twofish-avx (CONFIG_TWOFISH_AVX_X86_64) is available in 3.6-rc1. For
testing, you need CRYPTO_TEST built as a module. You should also turn off
Turbo Core, frequency scaling, etc.


Testing twofish-avx ('async twofish' speed test):
 modprobe twofish-avx-x86_64
 modprobe tcrypt mode=504 sec=1

Testing twofish-x86_64-3way ('sync twofish' speed test):
 modprobe twofish-x86_64-3way
 modprobe tcrypt mode=202 sec=1

Loading tcrypt will block until the tests are complete, after which
modprobe will return with an error. This is expected. The results are in
the kernel log.


-Jussi


HTH.

--
Regards/Gruss,
Boris.







[PATCH] scripts/decodecode: Fixup trapping instruction marker

2012-08-15 Thread Borislav Petkov

From: Borislav Petkov 

When dumping "Code: " sections from an oops, the trapping instruction that
%rip points to can be a string copy

  2b:*  f3 a5   rep movsl %ds:(%rsi),%es:(%rdi)

and then the line contains several ":" characters. The current "cut"
selects only the second field, so the output looks funnily overlaid,
like this:

  2b:*  f3 a5   rep movsl %ds <-- trapping instruction:(%rsi),%es:(%rdi

Fix this by selecting the remaining fields too.

Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: linux-kbu...@vger.kernel.org
Signed-off-by: Borislav Petkov 
---
 scripts/decodecode | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodecode b/scripts/decodecode
index 18ba881c3415..4f8248d5a11f 100755
--- a/scripts/decodecode
+++ b/scripts/decodecode
@@ -89,7 +89,7 @@ echo $code >> $T.s
 disas $T
 cat $T.dis >> $T.aa
 
-faultline=`cat $T.dis | head -1 | cut -d":" -f2`
+faultline=`cat $T.dis | head -1 | cut -d":" -f2-`
 faultline=`echo "$faultline" | sed -e 's/\[/\\\[/g; s/\]/\\\]/g'`
 
 cat $T.oo | sed -e "s/\($faultline\)/\*\1 <-- trapping instruction/g"
-- 
1.7.11.rc1
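The difference between `cut -d":" -f2` and `cut -d":" -f2-` can be modelled in a few lines of C: the first returns only the text between the first and second delimiter, the second returns everything after the first delimiter, including any further ":" characters such as those in `%ds:(%rsi),%es:(%rdi)`. The function below is a hypothetical illustration of that field selection, not part of scripts/decodecode.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of `cut -d<delim>` on one line.
 * to_end == 0 behaves like "-f2" (second field only);
 * to_end != 0 behaves like "-f2-" (second field through end of line).
 * Writes the result into out (size outsz) and returns 0. Returns -1 if
 * the line has no delimiter (real cut would print the whole line).
 */
static int cut_field2(const char *line, char delim, int to_end,
		      char *out, size_t outsz)
{
	const char *start = strchr(line, delim);
	const char *end;

	if (!start)
		return -1;
	start++;                              /* skip the first delimiter */
	end = to_end ? NULL : strchr(start, delim);
	if (!end)
		end = start + strlen(start);  /* take the rest of the line */
	snprintf(out, outsz, "%.*s", (int)(end - start), start);
	return 0;
}
```

Applied to the `rep movsl` line, the "-f2" variant stops at `%ds`, which is the truncated pattern that then gets the marker spliced into the middle of the operands.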


Re: [discussion]sched: a rough proposal to enable power saving in scheduler

2012-08-15 Thread Peter Zijlstra

On Mon, 2012-08-13 at 20:21 +0800, Alex Shi wrote:
> Since there is no power saving consideration in the CFS scheduler, I have a
> very rough idea for enabling a new power saving scheme in CFS.

Adding Thomas, he always delights poking holes in power schemes.

> It is based on the following assumptions:
> 1. If many tasks crowd the system, keeping only a few domain CPUs
> running while the other CPUs idle cannot save power. Letting all CPUs take
> the load, finish the tasks early, and then go idle will save more power
> and give a better user experience.

I'm not sure this is a valid assumption. I've had it explained to me by
various people that race-to-idle isn't always the best thing. It has to
do with the cost of switching power states and the duration of execution
and other such things.

> 2. Schedule domains and schedule groups perfectly match the hardware and
> the power consumption units. So pulling tasks out of a domain potentially
> means this power consumption unit can go idle.

I'm not sure I understand what you're saying, sorry.

> So, as Peter mentioned in commit 8e7fbcbc22c (sched: Remove stale
> power aware scheduling), this proposal will adopt the
> sched_balance_policy concept and use two kinds of policy: performance and power.

Yay, ideally we'd also provide a 3rd option: auto, which simply switches
between the two based on AC/BAT, UPS status and simple things like that.
But this seems like a later concern, you have to have something to pick
between before you can pick :-)

> In scheduling, two places will care about the policy: load_balance() and,
> on task fork/exec, select_task_rq_fair().

ack

> Here is some pseudo code that tries to explain the proposed behaviour in
> load_balance() and select_task_rq_fair():

Oh man.. A few words outlining the general idea would've been nice.

> load_balance() {
>   update_sd_lb_stats(); //get busiest group, idlest group data.
> 
>   if (sd->nr_running > sd's capacity) {
>   //power saving policy is not suitable for
>   //this scenario, it runs like performance policy
>   mv tasks from busiest cpu in busiest group to
>   idlest  cpu in idlest group;

Once upon a time we talked about adding a factor to the capacity for
this. So say you'd allow 2*capacity before overflowing and waking
another power group.

But I think we should not go on nr_running here, PJTs per-entity load
tracking stuff gives us much better measures -- also, repost that series
already Paul! :-)

Also, I'm not sure this is entirely correct, the thing you want to do
for power aware stuff is to minimize the number of active power domains,
this means you don't want idlest, you want least busy non-idle.
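The capacity factor and the "least busy non-idle" rule suggested above can be made concrete: under a power policy a group only counts as overflowing once its load reaches some multiple of its capacity (2x in the example), and work is packed into the least busy group that is already running, waking an idle group only when nothing else fits. The C sketch below encodes just that selection rule; the `sched_group_stat` struct, the factor, and the use of a plain load number (rather than nr_running or per-entity load tracking) are assumptions for illustration, not scheduler code.

```c
#include <stddef.h>

/* Hypothetical per-group summary; not a kernel structure. */
struct sched_group_stat {
	unsigned long load;      /* current load of the group */
	unsigned long capacity;  /* nominal capacity of the group */
};

#define POWER_OVERFLOW_FACTOR 2  /* allow 2*capacity before overflowing */

/*
 * Pick a destination group under a power-saving policy: the least busy
 * group that is not idle and is still below
 * POWER_OVERFLOW_FACTOR * capacity. Only if no such group exists, fall
 * back to the first idle group (i.e. wake another power domain).
 * Returns the index of the chosen group, or -1 if none fits.
 */
static int pick_power_target(const struct sched_group_stat *g, size_t n)
{
	int best = -1, idle = -1;

	for (size_t i = 0; i < n; i++) {
		if (g[i].load == 0) {
			if (idle < 0)
				idle = (int)i;
			continue;
		}
		if (g[i].load >= POWER_OVERFLOW_FACTOR * g[i].capacity)
			continue;        /* full under the power policy */
		if (best < 0 || g[i].load < g[best].load)
			best = (int)i;
	}
	return best >= 0 ? best : idle;
}
```

With groups at loads {0, 3, 8, 5} and capacity 4 each, this picks group 1 rather than the idle group 0; group 0 is chosen only once every busy group has reached 2x its capacity.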

>   } else {// the sd has enough capacity to hold all tasks.
>   if (sg->nr_running > sg's capacity) {
>   //imbalanced between groups
>   if (schedule policy == performance) {
>   //when 2 busiest group at same busy
>   //degree, need to prefer the one has
>   // softest group??
>   move tasks from busiest group to
>   idlest group;

So I'd leave the currently implemented scheme as performance, and I
don't think the above describes the current state.

>   } else if (schedule policy == power)
>   move tasks from busiest group to
>   idlest group until busiest is just full
>   of capacity.
>   //the busiest group can balance
>   //internally after next time LB,

There's another thing we need to do, and that is collect tasks in a
minimal number of power domains. The old code (that got deleted) did
something like that; you can revive some of that code if needed -- I
just killed everything to be able to start with a clean slate.

>   } else {
>   //all groups has enough capacity for its tasks.
>   if (schedule policy == performance)
>   //all tasks may has enough cpu
>   //resources to run,
>   //mv tasks from busiest to idlest group?
>   //no, at this time, it's better to keep
>   //the task on current cpu.
>   //so, it is maybe better to do balance
>   //in each of groups
>   for_each_imbalance_groups()
>   move tasks from busiest cpu to
>   idlest cpu in each of groups;
>   else if (schedule policy == power) {
>   if (no hard pin in idlest group)
>   mv tasks from idlest group to
>

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread James Bottomley

On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote:
> > This can
> > be quite confusing.  I am still not sure whether we should mix the two
> > things together. If somebody wants to limit the kernel memory he has to
> > touch the other limit anyway.  Do you have a strong reason to mix the
> > user and kernel counters?
> 
> This is funny, because the first opposition I found to this work was
> "Why would anyone want to limit it separately?" =p
> 
> It seems that quite a common use case is to have a container with a
> unified view of "memory" that it can use the way it likes, be it with
> kernel memory or user memory. I believe those people would be happy to
> just silently account kernel memory to user memory, or at most have
> a switch to enable it.
> 
> What becomes clear from this back and forth is that there are people
> interested in both use cases.

Haven't we already had this discussion during the Prague get together?
We discussed the use cases and finally agreed to separate accounting for
k and then k+u mem because that satisfies both the Google and Parallels
cases.  No-one was overjoyed by k and k+u but no-one had a better
suggestion ... is there a better way of doing this that everyone can
agree to?
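The "k and k+u" arrangement can be pictured as two counters charged together: every kernel allocation is charged both to a kernel-only counter and to a combined counter, while user allocations hit only the combined one, so one limit caps kernel memory alone and the other caps the total. The C sketch below is a hypothetical model of that accounting (names, types and the all-or-nothing charge are assumptions), not the memcg code from this series.

```c
#include <stdbool.h>

/* Hypothetical counter with a hard limit, in bytes. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical model of a memcg with separate k and k+u accounting. */
struct memcg_model {
	struct counter kmem;   /* kernel memory only ("k") */
	struct counter total;  /* kernel + user memory ("k+u") */
};

static bool fits(const struct counter *c, unsigned long bytes)
{
	/* Written to avoid overflow in usage + bytes. */
	return c->usage <= c->limit && bytes <= c->limit - c->usage;
}

/* User pages are charged to the combined counter only. */
static bool charge_user(struct memcg_model *m, unsigned long bytes)
{
	if (!fits(&m->total, bytes))
		return false;
	m->total.usage += bytes;
	return true;
}

/* Kernel allocations must fit under both limits; charge both or neither. */
static bool charge_kmem(struct memcg_model *m, unsigned long bytes)
{
	if (!fits(&m->kmem, bytes) || !fits(&m->total, bytes))
		return false;
	m->kmem.usage += bytes;
	m->total.usage += bytes;
	return true;
}
```

In this model, setting the kmem limit at or above the total limit gives the unified "memory" view, while a lower kmem limit additionally caps kernel memory on its own.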

We do need to get this nailed down because it's the foundation of the
patch series.

James



Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Mel Gorman

On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > What I think you should do is use rcu for access.
> > > > > > And here sync rcu before freeing.
> > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > primitive, and it is very light weight.
> > > > > > 
> > > > > 
> > > > > I liked your suggestion on barriers, as well.
> > > > > 
> > > > 
> > > > I have not thought about this as deeply as I should, but is simply
> > > > rechecking the mapping under the pages_lock to make sure the page is
> > > > still a balloon page an option? i.e. use pages_lock to stabilise
> > > > page->mapping.
> > > 
> > > To clarify, are you concerned about cost of rcu_read_lock
> > > for non balloon pages?
> > > 
> > 
> > Not as such, but given the choice between introducing RCU locking and
> > rechecking page->mapping under a spinlock I would choose the latter as it
> > is more straight-forward.
> 
> OK but checking it how? page->mapping == balloon_mapping does not scale to
> multiple balloons,

I was thinking of exactly that page->mapping == balloon_mapping check. As I
do not know how many active balloon drivers there might be I cannot guess
in advance how much of a scalability problem it will be.

> so I hoped we can switch to
> page->mapping->flags & BALLOON_MAPPING or some such,
> but this means we dereference it outside the lock ...
> 

That also sounded like future stuff to me that would be justified with
profiling if necessary. Personally I would have started with the spinlock
and a simple check and moved to RCU later when either scalability was a
problem or it was found there was a need to stabilise whether a page was
a balloon page or not outside a spinlock.

This is not a NAK to the idea and I'm not objecting to RCU being used now
if that is what is really desired. I just suspect it's making the series
more complex than it needs to be right now.
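The spinlock alternative, rechecking page->mapping under pages_lock before treating a page as a balloon page, can be modelled in userspace as below. This is a sketch under stated assumptions: a pthread mutex stands in for the kernel spinlock, and `struct page`, `struct balloon`, `isolate_balloon_page()` and `release_balloon_page()` are hypothetical stand-ins, not code from the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct mapping { int unused; };

/* Hypothetical stand-ins for struct page and the balloon device state. */
struct page {
	struct mapping *mapping;
	bool isolated;
};

struct balloon {
	pthread_mutex_t pages_lock;	/* plays the role of pages_lock */
	struct mapping *mapping;	/* the single balloon_mapping */
};

/* Only trust page->mapping after re-reading it with pages_lock held. */
static bool isolate_balloon_page(struct balloon *b, struct page *page)
{
	bool ok = false;

	pthread_mutex_lock(&b->pages_lock);
	if (page->mapping == b->mapping && !page->isolated) {
		page->isolated = true;
		ok = true;
	}
	pthread_mutex_unlock(&b->pages_lock);
	return ok;
}

/* Clearing the mapping under the same lock means a concurrent
 * isolate_balloon_page() sees either the old or the new state. */
static void release_balloon_page(struct balloon *b, struct page *page)
{
	pthread_mutex_lock(&b->pages_lock);
	page->mapping = NULL;
	pthread_mutex_unlock(&b->pages_lock);
}
```

The only rule this adds is that every writer of page->mapping for balloon pages takes the same lock; no RCU grace period is involved.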

-- 
Mel Gorman
SUSE Labs

[PATCH-V2] arm/dts: AM33XX: Set the default status of module to "disabled" state

2012-08-15 Thread Vaibhav Hiremath

Ideally, the common SoC dtsi file should set all modules to the
"disabled" state, and each module should then be enabled in the
respective EVM/board dts file as needed.

This patch sets the default status of all modules to "disabled"
in the am33xx.dtsi file. Currently no modules are supported as part
of the Bone and EVM dts support, so take care to add a
status = "okay" entry when adding support for any module.

Signed-off-by: Vaibhav Hiremath 
Acked-by: Arnd Bergmann 
Cc: Benoit Cousson 
Cc: Grant Likely 
Cc: Tony Lindgren 
---
Changes from V1:
- Fixed an indentation issue caused by extra spaces.

 arch/arm/boot/dts/am335x-bone.dts |6 ++
 arch/arm/boot/dts/am335x-evm.dts  |6 ++
 arch/arm/boot/dts/am33xx.dtsi |9 +
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone.dts
index a9af4db..a7906cb 100644
--- a/arch/arm/boot/dts/am335x-bone.dts
+++ b/arch/arm/boot/dts/am335x-bone.dts
@@ -17,4 +17,10 @@
device_type = "memory";
reg = <0x80000000 0x10000000>; /* 256 MB */
};
+
+   ocp {
+   uart1: serial@44E09000 {
+   status = "okay";
+   };
+   };
 };
diff --git a/arch/arm/boot/dts/am335x-evm.dts b/arch/arm/boot/dts/am335x-evm.dts
index d6a97d9..5dd8a6b 100644
--- a/arch/arm/boot/dts/am335x-evm.dts
+++ b/arch/arm/boot/dts/am335x-evm.dts
@@ -17,4 +17,10 @@
device_type = "memory";
reg = <0x80000000 0x10000000>; /* 256 MB */
};
+
+   ocp {
+   uart1: serial@44E09000 {
+   status = "okay";
+   };
+   };
 };
diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
index 59509c4..5f6c8e3 100644
--- a/arch/arm/boot/dts/am33xx.dtsi
+++ b/arch/arm/boot/dts/am33xx.dtsi
@@ -102,36 +102,42 @@
compatible = "ti,omap3-uart";
ti,hwmods = "uart1";
clock-frequency = <48000000>;
+   status = "disabled";
};

uart2: serial@48022000 {
compatible = "ti,omap3-uart";
ti,hwmods = "uart2";
clock-frequency = <48000000>;
+   status = "disabled";
};

uart3: serial@48024000 {
compatible = "ti,omap3-uart";
ti,hwmods = "uart3";
clock-frequency = <48000000>;
+   status = "disabled";
};

uart4: serial@481A6000 {
compatible = "ti,omap3-uart";
ti,hwmods = "uart4";
clock-frequency = <48000000>;
+   status = "disabled";
};

uart5: serial@481A8000 {
compatible = "ti,omap3-uart";
ti,hwmods = "uart5";
clock-frequency = <48000000>;
+   status = "disabled";
};

uart6: serial@481AA000 {
compatible = "ti,omap3-uart";
ti,hwmods = "uart6";
clock-frequency = <48000000>;
+   status = "disabled";
};

i2c1: i2c@44E0B000 {
@@ -139,6 +145,7 @@
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "i2c1";
+   status = "disabled";
};

i2c2: i2c@4802A000 {
@@ -146,6 +153,7 @@
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "i2c2";
+   status = "disabled";
};

i2c3: i2c@4819C000 {
@@ -153,6 +161,7 @@
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "i2c3";
+   status = "disabled";
};
};
 };
--
1.7.0.4
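The patch relies on how the kernel treats the status property: a node with no status, or with status "okay" (or "ok"), counts as available, while "disabled" does not; this is the rule of_device_is_available() applies. The userspace model below sketches that rule and the board-level override; `struct dt_node`, `node_is_available()` and `board_override_status()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one device tree node. */
struct dt_node {
	const char *name;
	const char *status;	/* NULL when the node has no status property */
};

/* No status, "okay" or "ok" means available; anything else
 * (for example "disabled") means the device is skipped. */
static bool node_is_available(const struct dt_node *node)
{
	if (node->status == NULL)
		return true;
	return strcmp(node->status, "okay") == 0 ||
	       strcmp(node->status, "ok") == 0;
}

/* A board .dts that re-declares a node replaces its status, the way the
 * am335x-bone.dts and am335x-evm.dts hunks override the dtsi default. */
static void board_override_status(struct dt_node *node, const char *status)
{
	assert(status != NULL);
	node->status = status;
}
```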


Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Michael S. Tsirkin

On Wed, Aug 15, 2012 at 12:16:51PM +0100, Mel Gorman wrote:
> On Wed, Aug 15, 2012 at 01:01:08PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 10:48:39AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 15, 2012 at 12:25:28PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 15, 2012 at 10:05:28AM +0100, Mel Gorman wrote:
> > > > > On Tue, Aug 14, 2012 at 05:11:13PM -0300, Rafael Aquini wrote:
> > > > > > On Tue, Aug 14, 2012 at 10:51:39PM +0300, Michael S. Tsirkin wrote:
> > > > > > > What I think you should do is use rcu for access.
> > > > > > > And here sync rcu before freeing.
> > > > > > > Maybe an overkill but at least a documented synchronization
> > > > > > > primitive, and it is very light weight.
> > > > > > > 
> > > > > > 
> > > > > > I liked your suggestion on barriers, as well.
> > > > > > 
> > > > > 
> > > > > I have not thought about this as deeply as I should, but is simply
> > > > > rechecking the mapping under the pages_lock to make sure the page is
> > > > > still a balloon page an option? i.e. use pages_lock to stabilise
> > > > > page->mapping.
> > > > 
> > > > To clarify, are you concerned about cost of rcu_read_lock
> > > > for non balloon pages?
> > > > 
> > > 
> > > Not as such, but given the choice between introducing RCU locking and
> > > rechecking page->mapping under a spinlock I would choose the latter as it
> > > is more straight-forward.
> > 
> > OK but checking it how? page->mapping == balloon_mapping does not scale to
> > multiple balloons,
> 
> I was thinking of exactly that page->mapping == balloon_mapping check. As I
> do not know how many active balloon drivers there might be I cannot guess
> in advance how much of a scalability problem it will be.

Not at all sure multiple drivers are worth supporting, but multiple
*devices* are, I think, worth supporting, if for no other reason than that
they work today. For that we need a device pointer, which Rafael
wants to put into the mapping; this means multiple balloon mappings.
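The scaling concern is that an equality test against one global balloon_mapping cannot recognise pages owned by a second balloon device with its own mapping, while a flag carried in the mapping (the page->mapping->flags & BALLOON_MAPPING idea) recognises all of them. The sketch below is hypothetical: the flag value and struct layouts are illustrative, and it deliberately ignores the open question of dereferencing page->mapping outside the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BALLOON_MAPPING 0x1UL	/* hypothetical flag bit */

struct mapping {
	unsigned long flags;
	void *device;		/* per-device pointer stored in the mapping */
};

struct page {
	struct mapping *mapping;
};

/* Recognises only pages of the one device whose mapping is 'global'. */
static bool is_balloon_page_eq(const struct page *page,
			       const struct mapping *global)
{
	return page->mapping == global;
}

/* Recognises pages from any balloon device, but dereferences
 * page->mapping, which must not be freed underneath us. */
static bool is_balloon_page_flag(const struct page *page)
{
	return page->mapping != NULL &&
	       (page->mapping->flags & BALLOON_MAPPING) != 0;
}
```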


> > so I hoped we can switch to
> > page->mapping->flags & BALLOON_MAPPING or some such,
> > but this means we dereference it outside the lock ...
> > 
> 
> That also sounded like future stuff to me that would be justified with
> profiling if necessary. Personally I would have started with the spinlock
> and a simple check and moved to RCU later when either scalability was a
> problem or it was found there was a need to stabilise whether a page was
> a balloon page or not outside a spinlock.
> 
> This is not a NAK to the idea and I'm not objecting to RCU being used now
> if that is what is really desired. I just suspect it's making the series
> more complex than it needs to be right now.
> 
> -- 
> Mel Gorman
> SUSE Labs

[tip:sched/core] tile: Remove SD_PREFER_LOCAL leftover

2012-08-15 Thread tip-bot for Alex Shi

Commit-ID:  c7660994ed6b44d17dad0aac0d156da1e0a2f003
Gitweb: http://git.kernel.org/tip/c7660994ed6b44d17dad0aac0d156da1e0a2f003
Author: Alex Shi 
AuthorDate: Wed, 15 Aug 2012 08:14:36 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 15 Aug 2012 13:22:55 +0200

tile: Remove SD_PREFER_LOCAL leftover

commit (sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code
clean up) removed SD_PREFER_LOCAL, but left an SD_PREFER_LOCAL usage in
arch/tile code. That breaks the arch/tile build.

Reported-by: Fengguang Wu 
Signed-off-by: Alex Shi 
Acked-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/502af3e6.3050...@intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/tile/include/asm/topology.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index 7a7ce39..d5e86c9 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -69,7 +69,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 0*SD_WAKE_AFFINE  \
-   | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER   \
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
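The build break follows from the style of this initializer: every SD_* flag is listed explicitly and multiplied by 1 or 0 to document whether it is set. Multiplying by 0 removes the flag's value but not the reference to its name, so once SD_PREFER_LOCAL is no longer defined the expression fails to compile. A minimal sketch of the idiom, with hypothetical flag values:

```c
#include <assert.h>

/* Hypothetical values; the real SD_* constants live in the scheduler headers. */
#define SD_BALANCE_FORK   0x0008
#define SD_BALANCE_WAKE   0x0010
#define SD_WAKE_AFFINE    0x0020
#define SD_SERIALIZE      0x0400

/* Every flag is spelled out; "0*" documents that it is deliberately off.
 * Naming an undefined macro here, even as 0*SD_PREFER_LOCAL, would be an
 * "undeclared identifier" compile error. */
#define NODE_SD_FLAGS (1*SD_BALANCE_FORK  \
		     | 0*SD_BALANCE_WAKE  \
		     | 0*SD_WAKE_AFFINE   \
		     | 0*SD_SERIALIZE)

/* Hypothetical consumer, analogous to the flags field in the tile initializer. */
static const int node_sd_flags = NODE_SD_FLAGS;
```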

Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked

2012-08-15 Thread Yan Vugenfirer


On Aug 14, 2012, at 10:35 PM, Anthony Liguori wrote:

> Marcelo Tosatti  writes:
> 
>> On Tue, Aug 14, 2012 at 01:53:01PM -0500, Anthony Liguori wrote:
>>> Marcelo Tosatti  writes:
>>> 
 On Tue, Aug 14, 2012 at 05:55:54PM +0300, Yan Vugenfirer wrote:
> 
> On Aug 14, 2012, at 1:42 PM, Jan Kiszka wrote:
> 
>> On 2012-08-14 10:56, Daniel P. Berrange wrote:
>>> On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
 On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
> We can tell that the guest has panicked when it runs on Xen,
> but KVM has no such feature.
> 
> Another purpose of this feature is that a management app (for example
> libvirt) can dump the guest automatically when it panics. If the management
> app does not do an automatic dump, the guest's user can dump it by hand
> after seeing that the guest has panicked.
> 
> We have three solutions to implement this feature:
> 1. use vmcall
> 2. use I/O port
> 3. use virtio-serial.
> 
> We have decided to avoid touching the hypervisor. The reasons I chose
> the I/O port are:
> 1. it is easier to implement
> 2. it does not depend on any virtual device
> 3. it can work when starting the kernel
 
 How about searching for the "Kernel panic - not syncing" string 
 in the guests serial output? Say libvirtd could take an action upon
 that?
>>> 
>>> No, this is not satisfactory. It depends on the guest OS being
>>> configured to use the serial port for console output which we
>>> cannot mandate, since it may well be required for other purposes.
>> 
> Please don't forget Windows guests: there is no console and no "Kernel
> Panic" string ;)
> 
> What I used for debugging purposes on Windows guests is to register a
> bugcheck callback in the virtio-net driver and write 1 to the
> VIRTIO_PCI_ISR register.
> 
> Yan. 
 
 Whether a "panic-device" should cover other OSes is also
 something to consider. Even for Linux, is "panic" the only case which
 should be reported via the mechanism? What about oopses without panic?
 
 Is the mechanism general enough for supporting new events, etc.
>>> 
>>> Hi,
>>> 
>>> I think this discussion has gone off the deep end.
>>> 
>>> Forget about !x86 platforms.  They have their own way to do this sort of
>>> thing.  
>> 
>> The panic function in kernel/panic.c has the following options, which
>> appear to be arch independent, on panic:
>> 
>> - reboot 
>> - blink
> 
> Not sure of the semantics of blink, but that might be a good place for a
> pvops hook.
> 
>> 
>> None are paravirtual interfaces however.
>> 
>>> Think of this feature like a status LED on a motherboard.  These
>>> are very common and usually controlled by IO ports.
>>> 
>>> We're simply reserving a "status LED" for the guest to indicate that it
>>> has panicked.  Let's not over-engineer this.
>> 
>> My concern is that you end up with state that is dependent on x86.
>> 
>> Subject: [PATCH v8 3/6] add a new runstate: RUN_STATE_GUEST_PANICKED
>> 
>> Having the ability to stop/restart the guest (and even introducing a 
>> new VM runstate) is more than a status LED analogy.
> 
> I must admit, I don't know why a new runstate is necessary/useful.  The
> kernel shouldn't have to care about the difference between a halted guest
> and a panicked guest.  That level of information belongs in userspace IMHO.
> 
>> Can this new infrastructure be used by other architectures?
> 
> I guess I don't understand why the kernel side of this isn't anything
> more than a paravirt op hook that does a single outb() with the
> remaining logic handled 100% in QEMU.
> 
>> Do you consider allowing support for Windows as overengineering?
> 
> I don't think there is a way to hook BSOD on Windows so attempting to
> engineer something that works with Windows seems odd, no?
> 

Actually there is a way
(http://msdn.microsoft.com/en-us/library/windows/hardware/ff553105(v=vs.85).aspx).
That is what I mentioned is already done in the Windows virtio-net driver.
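Anthony's proposal of a guest-side hook that does nothing more than a single outb(), with all policy left to QEMU, can be modelled as below. This is a userspace simulation only: the port number, the magic value, `pv_panic_hook`, `fake_outb()` and the run-state enum are all hypothetical, not the interface from the v8 series.

```c
#include <assert.h>
#include <stdint.h>

#define PANIC_PORT  0x0510	/* hypothetical I/O port number */
#define PANIC_VALUE 0x01	/* hypothetical "guest panicked" value */

enum run_state { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED };

/* Host side: the device model alone decides what a port write means. */
static enum run_state host_state = RUN_STATE_RUNNING;

static void host_port_write(uint16_t port, uint8_t value)
{
	if (port == PANIC_PORT && value == PANIC_VALUE)
		host_state = RUN_STATE_GUEST_PANICKED;
}

/* Stands in for outb(value, port); in a real guest the write traps to the host. */
static void fake_outb(uint8_t value, uint16_t port)
{
	host_port_write(port, value);
}

/* The default hook does nothing, so bare-metal kernels are unaffected. */
static void pv_panic_noop(void) { }
static void (*pv_panic_hook)(void) = pv_panic_noop;

/* Installed only when running as a guest with the hypothetical device. */
static void kvm_guest_panic(void)
{
	fake_outb(PANIC_VALUE, PANIC_PORT);
}

/* The single extra step on the guest's panic path. */
static void guest_panic(void)
{
	pv_panic_hook();
}
```

The new run state lives entirely on the host side, which is the division of labour argued for above.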


Best regards,
Yan.

> Regards,
> 
> Anthony Liguori
> 
>> 
>>> Regards,
>>> 
>>> Anthony Liguori
>>> 
 
> 
>> Well, we have more than a single serial port, even when leaving
>> virtio-serial aside...
>> 
>> Jan
>> 
>> -- 
>> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
>> Corporate Competence Center Embedded Linux

Re: [PATCH 0/6][resend] mempolicy memory corruption fixlet

2012-08-15 Thread Josh Boyer

On Mon, Aug 6, 2012 at 3:32 PM, KOSAKI Motohiro
 wrote:
> On 7/31/2012 8:33 AM, Josh Boyer wrote:
>> On Mon, Jun 11, 2012 at 5:17 AM,   wrote:
>>> From: KOSAKI Motohiro 
>>>
>>> Hi
>>>
>>> These are trivial fixes for mempolicy memory corruption issues. The
>>> patches are independent of each other, and they don't change userland
>>> ABIs.
>>>
>>> Thanks.
>>>
>>> changes from v1: fixed some typos in the changelogs.
>>>
>>> ---
>>> KOSAKI Motohiro (6):
>>>   Revert "mm: mempolicy: Let vma_merge and vma_split handle
>>> vma->vm_policy linkages"
>>>   mempolicy: Kill all mempolicy sharing
>>>   mempolicy: fix a race in shared_policy_replace()
>>>   mempolicy: fix refcount leak in mpol_set_shared_policy()
>>>   mempolicy: fix a memory corruption by refcount imbalance in
>>> alloc_pages_vma()
>>>   MAINTAINERS: Added MEMPOLICY entry
>>>
>>>  MAINTAINERS|7 +++
>>>  mm/mempolicy.c |  151
>>>  mm/shmem.c |9 ++--
>>>  3 files changed, 120 insertions(+), 47 deletions(-)
>>
>> I don't see these patches queued anywhere.  They aren't in linux-next,
>> mmotm, or Linus' tree.  Did these get dropped?  Is the revert still
>> needed?
>
> Sorry, my fault. Yes, it is needed. Some LTP tests have been failing since
> Mel's "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy
> linkages" patch.

The series still isn't queued anywhere.  Are you planning on resending
it again, or should it get picked up in a particular tree?

josh

Re: [PATCH] time: Improve sanity checking of timekeeping inputs

2012-08-15 Thread Josh Boyer

On Wed, Aug 8, 2012 at 3:36 PM, John Stultz  wrote:
> Thomas, Ingo,
> Here's a fix against tip/timers/urgent that addresses
> timekeeping edge cases detected by both a bad BIOS and system
> fuzzing w/ trinity. Thanks to Sasha Levin and CAI Qian for
> finding and reporting these!
>
> Let me know if you have any tweaks you want to see.
>
> thanks
> -john
>
> Unexpected behavior could occur if the time is set to
> a value large enough to overflow a 64bit ktime_t
> (which is something larger than the year 2262).
>
> Also unexpected behavior could occur if large negative
> offsets are injected via adjtimex.
>
> So this patch improves the sanity checking of timekeeping inputs
> by improving the timespec_valid() check, and then makes better
> use of timespec_valid() to make sure we don't set the time to
> an invalid negative value or one that overflows ktime_t.
>
> Note: This does not protect from setting the time close to
> overflowing ktime_t and then letting natural accumulation
> cause the overflow.
>
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Prarit Bhargava 
> Cc: Thomas Gleixner 
> Cc: Zhouping Liu 
> Cc: CAI Qian 
> Cc: Sasha Levin 
> Cc: sta...@vger.kernel.org
> Reported-by: CAI Qian 
> Reported-by: Sasha Levin 
> Signed-off-by: John Stultz 
> ---
>  include/linux/ktime.h |7 ---
>  include/linux/time.h  |   22 --
>  kernel/time/timekeeping.c |   26 --
>  3 files changed, 44 insertions(+), 11 deletions(-)

This patch fixes a boot regression on machines with crappy BIOS.  Is
this going to get committed soon?

https://bugzilla.redhat.com/show_bug.cgi?id=844249

josh
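The year 2262 limit follows from ktime_t being a signed 64-bit nanosecond count: (2^63 - 1) ns is about 9.22 * 10^9 s, roughly 292 years after 1970. A sketch of the kind of check the patch describes (reject negative times, out-of-range nanoseconds, and seconds that could overflow ktime_t) is below; `struct ts`, `TS_SEC_MAX` and `ts_valid()` are illustrative names, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Largest whole number of seconds that fits in a signed 64-bit ns count. */
#define TS_SEC_MAX (INT64_MAX / NSEC_PER_SEC)	/* 9223372036 s */

/* Hypothetical stand-in for struct timespec with 64-bit seconds. */
struct ts {
	int64_t tv_sec;
	long tv_nsec;
};

static bool ts_valid(const struct ts *t)
{
	if (t->tv_sec < 0)			/* dates before 1970 are rejected */
		return false;
	if (t->tv_nsec < 0 || t->tv_nsec >= NSEC_PER_SEC)
		return false;			/* not a normalized timespec */
	if (t->tv_sec >= TS_SEC_MAX)		/* could overflow a 64-bit ns count */
		return false;
	return true;
}
```

Rejecting tv_sec == TS_SEC_MAX outright is conservative: it gives up the last partial second in exchange for a check that never needs to look at tv_nsec.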

Re: [PATCH 0/5] Call netif_carrier_off() after register_netdev()

2012-08-15 Thread Bjørn Mork

Ben Hutchings  writes:

> But if you do it beforehand then it doesn't have the intended effect.
> (Supposed to be fixed by 22604c866889c4b2e12b73cbf1683bda1b72a313, which
> had to be reverted: c276e098d3ee33059b4a1c747354226cec58487c.)
>
> So you have to do it after, but without dropping the RTNL lock in
> between.

So you may want to add something like

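int register_netdev_carrier_off(struct net_device *dev)
{
	int err;

	/* Hold RTNL across both steps so the lock is never dropped
	 * between registering the device and marking the carrier off. */
	rtnl_lock();
	err = register_netdevice(dev);
	if (!err)
		set_bit(__LINK_STATE_NOCARRIER, &dev->state);
	rtnl_unlock();
	return err;
}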


for these drivers?



Bjørn

Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked

2012-08-15 Thread Yan Vugenfirer

On Aug 15, 2012, at 12:56 PM, Gleb Natapov wrote:

> On Tue, Aug 14, 2012 at 02:35:34PM -0500, Anthony Liguori wrote:
>>> Do you consider allowing support for Windows as overengineering?
>> 
>> I don't think there is a way to hook BSOD on Windows so attempting to
>> engineer something that works with Windows seems odd, no?
>> 
> Yan says in another email that it is possible to register a bugcheck callback.
> 

Here you go:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff553105(v=vs.85).aspx
It is already done in virtio-net, for two reasons: 1. we could configure
virtio-net to notify QEMU in a hacky way (write 1 to the VIRTIO_PCI_ISR
register) that there was a bugcheck. This was very useful for debugging complex
WHQL issues that involved host networking. 2. to store additional information
(for example timestamps of the last received packet, the last interrupt, etc.)
in the crash dump.

Yan.

> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] Update LZO compression

2012-08-15 Thread Markus F.X.J. Oberhumer

Hi Johannes,

On 2012-08-14 14:39, Johannes Stezenbach wrote:
> On Tue, Aug 14, 2012 at 01:44:02AM +0200, Markus F.X.J. Oberhumer wrote:
>> On 2012-07-16 20:30, Markus F.X.J. Oberhumer wrote:
>>>
>>> As stated in the README this version is significantly faster (typically more
>>> than 2 times faster!) than the current version, has been thoroughly tested on
>>> x86_64/i386/powerpc platforms and is intended to get included into the
>>> official Linux 3.6 or 3.7 release.
>>>
>>> I encourage all compression users to test and benchmark this new version,
>>> and I also would ask some official LZO maintainer to convert the updated
>>> source files into a GIT commit and possibly push it to Linus or linux-next.
> 
> Sorry for not reporting earlier, but I didn't have time to do real
> benchmarks, just a quick test on ARM926EJ-S using barebox,
> and found in the new version decompression is slower:
> http://lists.infradead.org/pipermail/barebox/2012-July/008268.html

I can only guess, but maybe your ARM cpu does not have an efficient
implementation of {get,put}_unaligned().

Could you please try the following patch and test if you can see
any significant speed difference?

Thanks,
Markus


diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index ddc8db5..efc5714 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,15 @@
  */


+#if defined(__arm__)
+#define COPY4(dst, src)\
+   (dst)[0] = (src)[0]; (dst)[1] = (src)[1]; \
+   (dst)[2] = (src)[2]; (dst)[3] = (src)[3]
+#endif
+#ifndef COPY4
 #define COPY4(dst, src)\
put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
 #if defined(__x86_64__)
 #define COPY8(dst, src)\
put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))


> 
> BTW, do you have userspace code matching the old and new
> lzo versions?  It would be easier to benchmark.
> 
> Unfortunately I cannot claim high confidence in my benchmark results
> due to missing time to do it properly, it would be useful if
> someone else could do some benchmarks on ARM before merging this.
> 
> 
> Johannes 
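The two COPY4 variants in the test patch differ only in how they move four possibly unaligned bytes: one word-sized unaligned load and store through get_unaligned()/put_unaligned(), or four single-byte copies that do not depend on efficient unaligned access. A userspace sketch of both, assuming memcpy() as a portable stand-in for the kernel's architecture-specific unaligned helpers, and wrapping each macro in do { } while (0) so it behaves as a single statement:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time copy; compilers typically turn a fixed 4-byte memcpy()
 * into a single unaligned load/store where the CPU supports one. */
#define COPY4_WORD(dst, src) do {			\
		uint32_t w_;				\
		memcpy(&w_, (src), sizeof(w_));		\
		memcpy((dst), &w_, sizeof(w_));		\
	} while (0)

/* Byte-at-a-time copy, as in the __arm__ branch of the patch. */
#define COPY4_BYTES(dst, src) do {				\
		(dst)[0] = (src)[0]; (dst)[1] = (src)[1];	\
		(dst)[2] = (src)[2]; (dst)[3] = (src)[3];	\
	} while (0)
```

Both must produce identical bytes; only their speed differs per CPU, which is what the requested ARM benchmark would show.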

-- 
Markus Oberhumer, , http://www.oberhumer.com/

Re: O_DIRECT to md raid 6 is slow

2012-08-15 Thread John Robinson


On 15/08/2012 01:49, Andy Lutomirski wrote:

> If I do:
> # dd if=/dev/zero of=/dev/md0p1 bs=8M
> 
> [...]
> 
> It looks like md isn't recognizing that I'm writing whole stripes when
> I'm in O_DIRECT mode.


I see your md device is partitioned. Is the partition itself stripe-aligned?
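For RAID 6, a full stripe holds chunk_size * (disks - 2) bytes of data, since two members per stripe carry parity. A write avoids read-modify-write only if it starts on a stripe boundary and covers whole stripes, so a partition that starts off a boundary turns each 8 MiB write into partial-stripe writes at both ends. A small helper to check this; the chunk size, disk count and partition start are example inputs (assumptions), which in practice would come from /proc/mdstat and the partition table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes of data in one full RAID 6 stripe: two members hold parity. */
static uint64_t raid6_stripe_bytes(uint64_t chunk_bytes, unsigned disks)
{
	assert(disks >= 4);	/* md RAID 6 needs at least four members */
	return chunk_bytes * (disks - 2);
}

/* True if a write of 'len' bytes at byte 'offset' within the md device
 * covers only whole stripes, i.e. needs no read-modify-write. */
static bool full_stripe_write(uint64_t offset, uint64_t len,
			      uint64_t stripe_bytes)
{
	return offset % stripe_bytes == 0 && len % stripe_bytes == 0;
}
```

For example, with 512 KiB chunks on six disks the stripe is 2 MiB, so a partition starting at sector 2048 (1 MiB) is misaligned while one starting at sector 4096 (2 MiB) is not.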

Cheers,

John.


Re: [PATCH] mtd: kill MTD_NAND_VERIFY_WRITE

2012-08-15 Thread Marek Vasut

Dear Huang Shijie,

> On 2012-08-15 15:06, Shmulik Ladkani wrote:
> > Hi Huang,
> > 
> > On Tue, 14 Aug 2012 22:38:45 -0400 Huang Shijie  wrote:
> >> diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
> >> index 588e989..0ca7257 100644
> >> --- a/drivers/mtd/nand/Kconfig
> >> +++ b/drivers/mtd/nand/Kconfig
> >> @@ -22,15 +22,6 @@ menuconfig MTD_NAND
> >> 
> >>   if MTD_NAND
> >> 
> >> -config MTD_NAND_VERIFY_WRITE
> >> -  bool "Verify NAND page writes"
> >> -  help
> >> -This adds an extra check when data is written to the flash. The
> >> -NAND flash device internally checks only bits transitioning
> >> -from 1 to 0. There is a rare possibility that even though the
> >> -device thinks the write was successful, a bit could have been
> >> -flipped accidentally due to device wear or something else.
> >> -
> > 
> > There are some defconfig files which set CONFIG_MTD_NAND_VERIFY_WRITE.
> > 
> > I guess you should submit an accompanying patch that removes
> > CONFIG_MTD_NAND_VERIFY_WRITE from all defconfig files.
> 
> thanks a lot.
> 
> I will send out a separate patch to fix it.

I'd still prefer for this to be fixed rather than removed. It seems to be able
to find some obvious mistakes etc.
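The removed help text explains why a read-back check can catch errors the chip itself misses: programming a NAND page can only move bits from 1 to 0, and the device checks only those transitions. A minimal userspace model of program-then-verify, in the spirit of MTD_NAND_VERIFY_WRITE; `nand_program()` and `nand_write_verify()` are hypothetical, and the "flash" is just a byte array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Programming can only clear bits (1 -> 0); erased cells read as 0xff. */
static void nand_program(unsigned char *page, const unsigned char *buf,
			 size_t len)
{
	for (size_t i = 0; i < len; i++)
		page[i] &= buf[i];
}

/* Program, then read the page back and compare it with the source data. */
static bool nand_write_verify(unsigned char *page, const unsigned char *buf,
			      size_t len)
{
	nand_program(page, buf, len);
	return memcmp(page, buf, len) == 0;
}
```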

[...]

Best regards,
Marek Vasut

Re: [PATCH 0/7] HID: picoLCD updates

2012-08-15 Thread Jiri Kosina

On Wed, 15 Aug 2012, Bruno Prémont wrote:

> > > [ 6383.521833] =
> > > [ 6383.530020] BUG kmalloc-64 (Not tainted): Object already free
> > > [ 6383.530020] -
> > > [ 6383.530020] 
> > > [ 6383.530020] INFO: Slab 0xdde0ea20 objects=51 used=40 fp=0xcef516e0 flags=0x4080
> > > [ 6383.530020] INFO: Object 0xcef51190 @offset=400 fp=0xcef51f50
> > > [ 6383.530020] 
> > > [ 6383.530020] Bytes b4 cef51180: cc cc cc cc d0 12 f5 ce 5a 5a 5a 5a 5a 5a 5a 5a
> > > [ 6383.530020] Object cef51190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > [ 6383.530020] Object cef511a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > [ 6383.530020] Object cef511b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > [ 6383.530020] Object cef511c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkk.
> > > [ 6383.530020] Redzone cef511d0: bb bb bb bb
> > > [ 6383.530020] Padding cef511d8: 5a 5a 5a 5a 5a 5a 5a 5a
> > > [ 6383.530020] Pid: 1922, comm: bash Not tainted 3.5.0-jupiter-3-g8d858b1-dirty #2
> > > [ 6383.530020] Call Trace:
> > > [ 6383.530020]  [] print_trailer+0x11c/0x130
> > > [ 6383.530020]  [] object_err+0x35/0x40
> > > [ 6383.530020]  [] free_debug_processing+0x99/0x200
> > > [ 6383.530020]  [] __slab_free+0x2e/0x280
> > > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020]  [] ? __usbhid_submit_report+0xc0/0x3c0
> > > [ 6383.530020]  [] ? kfree+0xfa/0x110
> > > [ 6383.530020]  [] ? picolcd_debug_out_report+0x8c4/0x8e0 [hid_picolcd]
> > > [ 6383.530020]  [] kfree+0xfa/0x110
> > > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020]  [] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020]  [] hid_submit_out+0xa4/0x120
> > > [ 6383.530020]  [] __usbhid_submit_report+0x158/0x3c0
> > > [ 6383.530020]  [] usbhid_submit_report+0x1b/0x30
> > > [ 6383.530020]  [] picolcd_fb_reset+0xb9/0x180 [hid_picolcd]
> > > [ 6383.530020]  [] picolcd_init_framebuffer+0x20d/0x2e0 [hid_picolcd]
> > > [ 6383.530020]  [] picolcd_probe+0x3cc/0x580 [hid_picolcd]
> > > [ 6383.530020]  [] hid_device_probe+0x67/0xf0
> > > [ 6383.530020]  [] ? driver_sysfs_add+0x57/0x80
> > > [ 6383.530020]  [] driver_probe_device+0xbd/0x1c0
> > > [ 6383.530020]  [] ? hid_match_device+0x7b/0x90
> > > [ 6383.530020]  [] driver_bind+0x75/0xd0
> > > [ 6383.530020]  [] ? driver_unbind+0x90/0x90
> > > [ 6383.530020]  [] drv_attr_store+0x27/0x30
> > > [ 6383.530020]  [] sysfs_write_file+0xac/0xf0
> > > [ 6383.530020]  [] vfs_write+0x9c/0x130
> > > [ 6383.530020]  [] ? sys_dup3+0x11f/0x160
> > > [ 6383.530020]  [] ? sysfs_poll+0x90/0x90
> > > [ 6383.530020]  [] sys_write+0x3d/0x70
> > > [ 6383.530020]  [] sysenter_do_call+0x12/0x26
> > 
> > So I am wondering whether the path this happens on is
> > 
> > if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl)) {
> > usbhid_restart_out_queue(usbhid);
> > 
> > in __usbhid_submit_report(). It would then perhaps indicate some race with
> > iofl handling.
> 
> Huh, that specific test_bit hunk I can't find in __usbhid_submit_report,
> is that 3.6 material?
> I'm running my tests against 3.5...

I see. Alan Stern fixed a huge pile of things in this area in 3.6-rc1.
I expected all of those to be purely theoretical problems that had never
happened in the wild, but it might be that you are actually chasing one
of those.

Could you please retest with latest Linus' tree (or at least eb055fd0560b) 
to see whether this hasn't actually been fixed already by Alan's series?
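The "Object already free" verdict can be read from the dump itself: with SLUB debugging, a freed object is filled with POISON_FREE (0x6b) bytes and terminated by POISON_END (0xa5), the values defined in include/linux/poison.h, which is exactly the pattern in the 64-byte Object lines of the trace, so the kfree() reached from hid_submit_out() was handed an object that had already been freed. An illustrative checker for that pattern (the function name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POISON_FREE 0x6b	/* values as in include/linux/poison.h */
#define POISON_END  0xa5

/* True if 'obj' looks like a SLUB object poisoned on free: every byte
 * is POISON_FREE except the last, which is POISON_END. */
static bool looks_freed(const unsigned char *obj, size_t size)
{
	if (size == 0 || obj[size - 1] != POISON_END)
		return false;
	for (size_t i = 0; i + 1 < size; i++)
		if (obj[i] != POISON_FREE)
			return false;
	return true;
}
```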

Thanks,

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH] udf: fix return value on error path in udf_load_logicalvol

2012-08-15 Thread Jan Kara

On Wed 15-08-12 00:38:08, Nikola Pajkovsky wrote:
> In case we detect a problem and bail out, we fail to set "ret" to a
> nonzero value, and udf_load_logicalvol will mistakenly report success.
> 
> Signed-off-by: Nikola Pajkovsky 
  Thanks. I've added the patch to my tree and will send it to Linus soon.

Honza

> ---
>  fs/udf/super.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/udf/super.c b/fs/udf/super.c
> index dcbf987..c96bd77 100644
> --- a/fs/udf/super.c
> +++ b/fs/udf/super.c
> @@ -1344,6 +1344,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
>   udf_err(sb, "error loading logical volume descriptor: "
>   "Partition table too long (%u > %lu)\n", table_len,
>   sb->s_blocksize - sizeof(*lvd));
> + ret = 1;
>   goto out_bh;
>   }
>  
> @@ -1388,8 +1389,10 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
>   UDF_ID_SPARABLE,
>   strlen(UDF_ID_SPARABLE))) {
>   if (udf_load_sparable_map(sb, map,
> - (struct sparablePartitionMap *)gpm) < 0)
> + (struct sparablePartitionMap *)gpm) < 0) {
> + ret = 1;
>   goto out_bh;
> + }
>   } else if (!strncmp(upm2->partIdent.ident,
>   UDF_ID_METADATA,
>   strlen(UDF_ID_METADATA))) {
> -- 
> 1.7.10.2
> 
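The bug fixed here is a common one in goto-based cleanup: `ret` starts at 0, so every early exit must set it before jumping to the shared cleanup label, or the caller sees success. A minimal sketch of the pattern with hypothetical names (`load_table()`, `TABLE_MAX`), not the UDF code:

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_MAX 64	/* hypothetical size limit */

/* Returns 0 on success and nonzero on error, like udf_load_logicalvol(). */
static int load_table(size_t table_len)
{
	int ret = 0;
	char *buf = malloc(TABLE_MAX);	/* resource released on every path */

	if (!buf)
		return 1;

	if (table_len > TABLE_MAX) {
		ret = 1;	/* without this, the error path returns 0 */
		goto out;
	}

	/* ... parse the table into buf ... */

out:
	free(buf);
	return ret;
}
```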
-- 
Jan Kara 
SUSE Labs, CR

Re: [PATCH v8 1/6] kvm: Allow filtering of acked irqs

2012-08-15 Thread Michael S. Tsirkin

On Fri, Aug 10, 2012 at 04:37:17PM -0600, Alex Williamson wrote:
> Registering a kvm_irq_ack_notifier with kian.irq_source_id < 0
> retains the existing behavior; filling in the actual irq_source_id results
> in the callback only being called when the specified irq_source_id is
> asserting the given gsi.
> 
> The i8254 PIT remains unfiltered because it de-asserts its irq source
> id, so its notifier would never get called otherwise.  KVM device
> assignment gets filtering as it de-asserts the GSI in its notifier.
> 
> Signed-off-by: Alex Williamson 

Looks good to me. For the record, I expect this to help if
- an assigned device interrupt is shared in host
  so we use slow config cycles in the ack notifier
- said device is sharing interrupt with another device in guest
- said another device is actually driving most interrupts
For example, I think this could be tested
by booting guest with pci=nomsi.

A minor suggestion below, but
nothing that needs to block this patch.

> ---
> 
>  arch/x86/kvm/i8254.c |1 +
>  arch/x86/kvm/i8259.c |8 +++-
>  include/linux/kvm_host.h |4 +++-
>  virt/kvm/assigned-dev.c  |1 +
>  virt/kvm/ioapic.c|5 -
>  virt/kvm/irq_comm.c  |6 --
>  6 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index adba28f..2355d19 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
>   hrtimer_init(&pit_state->pit_timer.timer,
>        CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>   pit_state->irq_ack_notifier.gsi = 0;
> + pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */

A bit prettier would be to
#define KVM_NO_IRQ_SOURCE_ID (-1)
and test for it explicitly.
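For illustration, here is roughly what the suggested explicit test could look like. This is a minimal sketch, not the code in the patch: the irq_comm.c hunk is not quoted here, so the helper name `kian_wants_ack()`, the cut-down `struct sketch_ack_notifier`, and the bitmap layout (one bit per source id, as suggested by the `irq_states[irq]` read in the i8259 hunk) are assumptions.

```c
#include <limits.h>
#include <stdbool.h>

#define KVM_NO_IRQ_SOURCE_ID (-1) /* suggested sentinel: do not filter */

/* Cut-down, hypothetical stand-in for struct kvm_irq_ack_notifier. */
struct sketch_ack_notifier {
	int irq_source_id;
	unsigned gsi;
};

/*
 * Hypothetical helper: should this notifier run for an ack, given the
 * bitmap of source ids currently asserting the pin?
 */
static bool kian_wants_ack(const struct sketch_ack_notifier *kian,
			   unsigned long irq_source_ids)
{
	if (kian->irq_source_id == KVM_NO_IRQ_SOURCE_ID)
		return true; /* unfiltered, e.g. the i8254 PIT */
	if (kian->irq_source_id < 0 ||
	    kian->irq_source_id >= (int)(sizeof(unsigned long) * CHAR_BIT))
		return false; /* out of range for the bitmap (sketch's choice) */
	/* Filtered: only notify if our own source is asserting the pin. */
	return (irq_source_ids >> kian->irq_source_id) & 1UL;
}
```

Testing for the named constant keeps the "no filter" case readable at the call sites; in this sketch any other negative id simply never matches, which is a choice made here rather than something the patch specifies.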

>   pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
>   kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
>   pit_state->pit_timer.reinject = true;
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index e498b18..d2175a9 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
>  
>  static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
>  {
> + unsigned long irq_source_ids;
> +
>   s->isr &= ~(1 << irq);
>   if (s != &s->pics_state->pics[0])
>   irq += 8;
> +
> + irq_source_ids = s->pics_state->irq_states[irq];
> +
>   /*
>* We are dropping lock while calling ack notifiers since ack
>* notifier callbacks for assigned devices call into PIC recursively.
> @@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
>* it should be safe since PIC state is already updated at this stage.
>*/
>   pic_unlock(s->pics_state);
> - kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
> + kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
> +  irq_source_ids);
>   pic_lock(s->pics_state);
>  }
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b70b48b..2ad3e4a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
>  
>  struct kvm_irq_ack_notifier {
>   struct hlist_node link;
> + int irq_source_id;
>   unsigned gsi;
>   void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
>  };
> @@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
>  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
>  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
>   int irq_source_id, int level);
> -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> +   unsigned long irq_source_ids);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  struct kvm_irq_ack_notifier *kian);
>  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 23a41a9..a08c9c1 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
>  {
>   dev->guest_irq = irq->guest_irq;
>   dev->ack_notifier.gsi = irq->guest_irq;
> + dev->ack_notifier.irq_source_id = dev->irq_source_id;
>   return 0;
>  }
>  
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index ef61d52..1a9f445 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>  
>   for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>   union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> +

Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-15 Thread Rafael Aquini

On Tue, Aug 14, 2012 at 10:31:09PM +0300, Michael S. Tsirkin wrote:
> > > now CPU1 executes the next instruction:
> > > 
> > > }
> > > 
> > > which would normally return to function's caller,
> > > but it has been overwritten by CPU2 so we get corruption.
> > > 
> > > No?
> > 
> > At the point CPU2 is unloading the module, it will be kept looping at the
> > snippet Rusty pointed out because the isolation / migration steps do not
> > mess with 'vb->num_pages'. The driver will only unload after leaking the
> > total amount of balloon's inflated pages, which means (for this
> > hypothetical case) CPU2 will wait until CPU1 finishes the putaback
> > procedure.
> > 
> 
> Yes but only until unlock finishes. The last return from function
> is not guarded and can be overwritten.

CPU1 will be returning to putback_balloon_page(), whose code is located in core
mm/compaction.c, outside the driver.

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko

On Wed 15-08-12 13:33:55, Glauber Costa wrote:
[...]
> > This can
> > be quite confusing.  I am still not sure whether we should mix the two
> > things together. If somebody wants to limit the kernel memory he has to
> > touch the other limit anyway.  Do you have a strong reason to mix the
> > user and kernel counters?
> 
> This is funny, because the first opposition I found to this work was
> "Why would anyone want to limit it separately?" =p
> 
> It seems that a quite common use case is to have a container with a
> unified view of "memory" that it can use the way he likes, be it with
> kernel memory, or user memory. I believe those people would be happy to
> just silently account kernel memory to user memory, or at the most have
> a switch to enable it.
> 
> What gets clear from this back and forth, is that there are people
> interested in both use cases.

I am still not 100% sure myself. It is just clear that the reclaim would
need some work in order to do accounting like this.

> > My impression was that kernel allocation should simply fail while user
> > allocations might reclaim as well. Why should we reclaim just because of
> > the kernel allocation (which is unreclaimable from hard limit reclaim
> > point of view)?
> 
> That is not what the kernel does, in general. We assume that if he wants
> that memory and we can serve it, we should. Also, not all kernel memory
> is unreclaimable. We can shrink the slabs, for instance. Ying Han
> claims she has patches for that already...

Are those patches somewhere around?

[...]
> > This doesn't check for the hierachy so kmem_accounted might not be in 
> > sync with it's parents. mem_cgroup_create (below) needs to copy
> > kmem_accounted down from the parent and the above needs to check if this
> > is a similar dance like mem_cgroup_oom_control_write.
> > 
> 
> I don't see why we have to.
> 
> I believe in a A/B/C hierarchy, C should be perfectly able to set a
> different limit than its parents. Note that this is not a boolean.

Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
meant that the kmem_accounted should be consistent within the hierarchy.

-- 
Michal Hocko
SUSE Labs

RE: [PATCH V2 02/18] Drivers: hv: Add KVP definitions for IP address injection

2012-08-15 Thread KY Srinivasan



> -----Original Message-----
> From: devel-boun...@linuxdriverproject.org [mailto:devel-
> boun...@linuxdriverproject.org] On Behalf Of KY Srinivasan
> Sent: Monday, August 13, 2012 10:57 PM
> To: Greg KH
> Cc: o...@aepfle.de; linux-kernel@vger.kernel.org; 
> virtualizat...@lists.osdl.org;
> a...@canonical.com; de...@linuxdriverproject.org; b...@decadent.org.uk
> Subject: RE: [PATCH V2 02/18] Drivers: hv: Add KVP definitions for IP address
> injection
> 
> 
> 
> > -----Original Message-----
> > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > Sent: Monday, August 13, 2012 9:38 PM
> > To: KY Srinivasan
> > Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
> > virtualizat...@lists.osdl.org; o...@aepfle.de; a...@canonical.com;
> > b...@decadent.org.uk
> > Subject: Re: [PATCH V2 02/18] Drivers: hv: Add KVP definitions for IP 
> > address
> > injection
> >
> > On Mon, Aug 13, 2012 at 10:06:51AM -0700, K. Y. Srinivasan wrote:
> > > Add the necessary definitions for supporting the IP injection functionality.
> > >
> > > Signed-off-by: K. Y. Srinivasan 
> > > Reviewed-by: Haiyang Zhang 
> > > Reviewed-by: Olaf Hering 
> > > Reviewed-by: Ben Hutchings 
> > > ---
> > >  drivers/hv/hv_util.c |4 +-
> > >  include/linux/hyperv.h   |   76
> > -
> > >  tools/hv/hv_kvp_daemon.c |2 +-
> > >  3 files changed, 77 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
> > > index d3ac6a4..a0667de 100644
> > > --- a/drivers/hv/hv_util.c
> > > +++ b/drivers/hv/hv_util.c
> > > @@ -263,7 +263,7 @@ static int util_probe(struct hv_device *dev,
> > >   (struct hv_util_service *)dev_id->driver_data;
> > >   int ret;
> > >
> > > - srv->recv_buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > > + srv->recv_buffer = kmalloc(PAGE_SIZE * 2, GFP_KERNEL);
> > >   if (!srv->recv_buffer)
> > >   return -ENOMEM;
> > >   if (srv->util_init) {
> > > @@ -274,7 +274,7 @@ static int util_probe(struct hv_device *dev,
> > >   }
> > >   }
> > >
> > > - ret = vmbus_open(dev->channel, 2 * PAGE_SIZE, 2 * PAGE_SIZE, NULL, 0,
> > > + ret = vmbus_open(dev->channel, 4 * PAGE_SIZE, 4 * PAGE_SIZE, NULL, 0,
> > >   srv->util_cb, dev->channel);
> > >   if (ret)
> > >   goto error;
> > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > > index 68ed7f7..11afc4e 100644
> > > --- a/include/linux/hyperv.h
> > > +++ b/include/linux/hyperv.h
> > > @@ -122,12 +122,53 @@
> > >  #define REG_U32 4
> > >  #define REG_U64 8
> > >
> > > +/*
> > > + * As we look at expanding the KVP functionality to include
> > > + * IP injection functionality, we need to maintain binary
> > > + * compatibility with older daemons.
> > > + *
> > > + * The KVP opcodes are defined by the host and it was unfortunate
> > > + * that I chose to treat the registration operation as part of the
> > > + * KVP operations defined by the host.
> > > + * Here is the level of compatibility
> > > + * (between the user level daemon and the kernel KVP driver) that we
> > > + * will implement:
> > > + *
> > > + * An older daemon will always be supported on a newer driver.
> > > + * A given user level daemon will require a minimal version of the
> > > + * kernel driver.
> > > + * If we cannot handle the version differences, we will fail gracefully
> > > + * (this can happen when we have a user level daemon that is more
> > > + * advanced than the KVP driver.
> > > + *
> > > + * We will use values used in this handshake for determining if we have
> > > + * workable user level daemon and the kernel driver. We begin by taking the
> > > + * registration opcode out of the KVP opcode namespace. We will however,
> > > + * maintain compatibility with the existing user-level daemon code.
> > > + */
> > > +
> > > +/*
> > > + * Daemon code not supporting IP injection (legacy daemon).
> > > + */
> > > +
> > > +#define KVP_OP_REGISTER  4
> >
> > Huh?
> >
> > > +/*
> > > + * Daemon code supporting IP injection.
> > > + * The KVP opcode field is used to communicate the
> > > + * registration information; so define a namespace that
> > > + * will be distinct from the host defined KVP opcode.
> > > + */
> > > +
> > > +#define KVP_OP_REGISTER1 100
> > > +
> > >  enum hv_kvp_exchg_op {
> > >   KVP_OP_GET = 0,
> > >   KVP_OP_SET,
> > >   KVP_OP_DELETE,
> > >   KVP_OP_ENUMERATE,
> > > - KVP_OP_REGISTER,
> > > + KVP_OP_GET_IP_INFO,
> > > + KVP_OP_SET_IP_INFO,
> >
> > So you overloaded the command and somehow think that is ok?  How is that
> > supposed to work?  Why not just always keep it there, but fail if it is
> > called as you know you have a mismatch?
> >
> > Otherwise, again, you just broke older tools on a newer kernel.
> >
> > Or am I missing something here?
> 
> Greg,
> 
> The registration operation occurs when the daemon first starts up. I should have
> established a distinct namespace for the daemon versions that would not
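Greg's objection can be seen from the values alone: with KVP_OP_REGISTER removed from the enum, KVP_OP_GET_IP_INFO takes over the old enum slot, which is the same value (4) as the new legacy #define. The sketch below copies those values from the quoted hunk; classify_daemon_op() and its result enum are hypothetical and only show that a kernel receiving opcode 4 cannot tell the two meanings apart without extra context (for example, whether it is the first message from the daemon).

```c
/* Values copied from the quoted include/linux/hyperv.h hunk. */
#define KVP_OP_REGISTER  4   /* legacy daemon registration */
#define KVP_OP_REGISTER1 100 /* registration by IP-injection-aware daemons */

enum hv_kvp_exchg_op {
	KVP_OP_GET = 0,
	KVP_OP_SET,
	KVP_OP_DELETE,
	KVP_OP_ENUMERATE,
	KVP_OP_GET_IP_INFO, /* == 4, the old KVP_OP_REGISTER slot */
	KVP_OP_SET_IP_INFO,
};

/* Hypothetical result of looking at an opcode in isolation. */
enum sketch_op_kind {
	SKETCH_HOST_OP,
	SKETCH_LEGACY_REGISTER_OR_GET_IP_INFO, /* ambiguous without context */
	SKETCH_NEW_REGISTER,
};

static enum sketch_op_kind classify_daemon_op(int op)
{
	if (op == KVP_OP_REGISTER1)
		return SKETCH_NEW_REGISTER;
	if (op == KVP_OP_REGISTER)
		/* Same numeric value as KVP_OP_GET_IP_INFO. */
		return SKETCH_LEGACY_REGISTER_OR_GET_IP_INFO;
	return SKETCH_HOST_OP;
}
```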

Re: [PATCH] act_mirred: do not drop packets when fails to mirror it

2012-08-15 Thread Jamal Hadi Salim


On Wed, 2012-08-15 at 17:37 +0800, Jason Wang wrote:
> We drop packet unconditionally when we fail to mirror it. This is not intended
> in some cases.

Hi Jason,
Did you actually notice the behavior you described or were you going by
the XXX comment I had in the code?

cheers,
jamal


Re: [PATCH 3/4] UBI: use the whole MTD device size to get bad_peb_limit

2012-08-15 Thread Artem Bityutskiy

On Wed, 2012-07-18 at 10:30 +0200, Richard Genoud wrote:
> So the per1024 thing was really to stick to the device layout and to
> be easier for users (IMHO)

Convinced, thanks!
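As a rough illustration of the per-1024 idea, assuming the limit is taken as a fraction of the whole MTD device's PEB count (as the patch title says) and rounded up: the helper bad_peb_limit() and the rounding choice below are illustrative assumptions, not the UBI implementation.

```c
/*
 * Hypothetical helper: PEBs to reserve for bad-block handling when the
 * limit is expressed as "max bad PEBs per 1024 PEBs" of the whole device.
 * Rounds up, so a non-zero ratio never yields 0 on a non-empty device.
 */
static unsigned int bad_peb_limit(unsigned int device_pebs,
				  unsigned int max_beb_per1024)
{
	unsigned long long scaled =
		(unsigned long long)device_pebs * max_beb_per1024;

	return (unsigned int)((scaled + 1023) / 1024);
}
```

For example, 20 per 1024 on a 4096-PEB device reserves 80 PEBs, and on a 1000-PEB device it reserves 20.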

-- 
Best Regards,
Artem Bityutskiy



Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation

2012-08-15 Thread Borislav Petkov

Ok, here we go. Raw data below.

On Wed, Aug 15, 2012 at 02:00:16PM +0300, Jussi Kivilinna wrote:
> >And if you tell me exactly how to run the tests and on what kernel,
> >I'll try to do so.

Ok, the box is a single-socket Bulldozer: "AMD FX(tm)-8100 Eight-Core
Processor stepping 02"; kernel is 3.6.0-rc1+, which is latest Linus +
tip/master merged on top.

> Twofish-avx (CONFIG_TWOFISH_AVX_X86_64) is available in 3.6-rc1. For

I took CONFIG_CRYPTO_TWOFISH_AVX_X86_64 but I'm pretty sure you meant
that.

> testing you need CRYPTO_TEST build as module. You should turn off
> turbo-core, freq-scaling, etc.

$ for i in $(seq 0 7); do echo "performance" > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor ; done
$ for i in $(seq 0 7); do echo 0 > /sys/devices/system/cpu/cpu$i/cpufreq/cpb ; done

> Testing twofish-avx ('async twofish' speed test):
>  modprobe twofish-avx-x86_64
>  modprobe tcrypt mode=504 sec=1

$ modprobe twofish-avx-x86_64
$ modprobe tcrypt mode=504 sec=1

[  224.672094] 
[  224.672094] testing speed of async ecb(twofish) encryption
[  224.681444] test 0 (128 bit key, 16 byte blocks): 4862478 operations in 1 seconds (77799648 bytes)
[  225.689190] test 1 (128 bit key, 64 byte blocks): 2040557 operations in 1 seconds (130595648 bytes)
[  226.695864] test 2 (128 bit key, 256 byte blocks): 564098 operations in 1 seconds (144409088 bytes)
[  227.702365] test 3 (128 bit key, 1024 byte blocks): 156553 operations in 1 seconds (160310272 bytes)
[  228.708960] test 4 (128 bit key, 8192 byte blocks): 20128 operations in 1 seconds (164888576 bytes)
[  229.715485] test 5 (192 bit key, 16 byte blocks): 4853879 operations in 1 seconds (77662064 bytes)
[  230.722165] test 6 (192 bit key, 64 byte blocks): 2040187 operations in 1 seconds (130571968 bytes)
[  231.729110] test 7 (192 bit key, 256 byte blocks): 564125 operations in 1 seconds (144416000 bytes)
[  232.735600] test 8 (192 bit key, 1024 byte blocks): 156231 operations in 1 seconds (159980544 bytes)
[  233.742205] test 9 (192 bit key, 8192 byte blocks): 19913 operations in 1 seconds (163127296 bytes)
[  234.748777] test 10 (256 bit key, 16 byte blocks): 4880977 operations in 1 seconds (78095632 bytes)
[  235.751405] test 11 (256 bit key, 64 byte blocks): 2045621 operations in 1 seconds (130919744 bytes)
[  236.758079] test 12 (256 bit key, 256 byte blocks): 565273 operations in 1 seconds (144709888 bytes)
[  237.764579] test 13 (256 bit key, 1024 byte blocks): 156625 operations in 1 seconds (160384000 bytes)
[  238.771175] test 14 (256 bit key, 8192 byte blocks): 20125 operations in 1 seconds (164864000 bytes)
[  239.26]
[  239.26] testing speed of async ecb(twofish) decryption
[  239.787020] test 0 (128 bit key, 16 byte blocks): 4962193 operations in 1 seconds (79395088 bytes)
[  240.792405] test 1 (128 bit key, 64 byte blocks): 2056765 operations in 1 seconds (131632960 bytes)
[  241.799070] test 2 (128 bit key, 256 byte blocks): 559384 operations in 1 seconds (143202304 bytes)
[  242.805568] test 3 (128 bit key, 1024 byte blocks): 153881 operations in 1 seconds (157574144 bytes)
[  243.812191] test 4 (128 bit key, 8192 byte blocks): 19636 operations in 1 seconds (160858112 bytes)
[  244.818718] test 5 (192 bit key, 16 byte blocks): 4917689 operations in 1 seconds (78683024 bytes)
[  245.825408] test 6 (192 bit key, 64 byte blocks): 2056235 operations in 1 seconds (131599040 bytes)
[  246.832070] test 7 (192 bit key, 256 byte blocks): 560579 operations in 1 seconds (143508224 bytes)
[  247.838598] test 8 (192 bit key, 1024 byte blocks): 153813 operations in 1 seconds (157504512 bytes)
[  248.845201] test 9 (192 bit key, 8192 byte blocks): 19411 operations in 1 seconds (159014912 bytes)
[  249.851755] test 10 (256 bit key, 16 byte blocks): 4932508 operations in 1 seconds (78920128 bytes)
[  250.858372] test 11 (256 bit key, 64 byte blocks): 2057244 operations in 1 seconds (131663616 bytes)
[  251.865039] test 12 (256 bit key, 256 byte blocks): 559493 operations in 1 seconds (143230208 bytes)
[  252.871554] test 13 (256 bit key, 1024 byte blocks): 153980 operations in 1 seconds (157675520 bytes)
[  253.878159] test 14 (256 bit key, 8192 byte blocks): 19665 operations in 1 seconds (161095680 bytes)
[  254.884711]
[  254.884711] testing speed of async cbc(twofish) encryption
[  254.898925] test 0 (128 bit key, 16 byte blocks): 5194404 operations in 1 seconds (83110464 bytes)
[  255.907087] test 1 (128 bit key, 64 byte blocks): 1916243 operations in 1 seconds (122639552 bytes)
[  256.913758] test 2 (128 bit key, 256 byte blocks): 541282 operations in 1 seconds (138568192 bytes)
[  257.916278] test 3 (128 bit key, 1024 byte blocks): 141389 operations in 1 seconds (144782336 bytes)
[  258.918865] test 4 (128 bit key, 8192 byte blocks): 17811 operations in 1 seconds (145907712 bytes)
[  259.925372] test 5 (192 bit key, 16 byte blocks): 5176387 operations in 1 seconds (82822192 bytes)
[  260.932038] test 6
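Each tcrypt line above reports the operation count together with the byte count, and the byte count is simply operations times block size (for test 4 with 128-bit keys, 20128 * 8192 = 164888576). To compare runs such as this Bulldozer data against the Sandy Bridge numbers, a small helper like the following could turn one unwrapped log line into MiB/s. The function name and the choice of MiB (2^20 bytes) are assumptions made for this sketch; the line format is taken from the log above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one unwrapped tcrypt speed line, e.g.
 * "[  228.708960] test 4 (128 bit key, 8192 byte blocks): 20128 operations
 *  in 1 seconds (164888576 bytes)", and return MiB/s, or -1.0 on mismatch.
 */
static double tcrypt_mib_per_s(const char *line)
{
	const char *p = strstr(line, "test ");
	int test, key_bits, block, secs;
	long ops, bytes;

	if (!p)
		return -1.0;
	if (sscanf(p, "test %d (%d bit key, %d byte blocks): %ld operations in %d seconds (%ld bytes)",
		   &test, &key_bits, &block, &ops, &secs, &bytes) != 6 || secs <= 0)
		return -1.0;
	/* Sanity check: the reported bytes should equal ops * block size. */
	if (bytes != ops * block)
		return -1.0;
	return (double)bytes / secs / (1024.0 * 1024.0);
}
```

On the 128-bit ECB encryption line for 8192-byte blocks this gives 157.25 MiB/s.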

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa

On 08/15/2012 04:39 PM, Michal Hocko wrote:
> On Wed 15-08-12 13:33:55, Glauber Costa wrote:
> [...]
>>> This can
>>> be quite confusing.  I am still not sure whether we should mix the two
>>> things together. If somebody wants to limit the kernel memory he has to
>>> touch the other limit anyway.  Do you have a strong reason to mix the
>>> user and kernel counters?
>>
>> This is funny, because the first opposition I found to this work was
>> "Why would anyone want to limit it separately?" =p
>>
>> It seems that a quite common use case is to have a container with a
>> unified view of "memory" that it can use the way he likes, be it with
>> kernel memory, or user memory. I believe those people would be happy to
>> just silently account kernel memory to user memory, or at the most have
>> a switch to enable it.
>>
>> What gets clear from this back and forth, is that there are people
>> interested in both use cases.
> 
> I am still not 100% sure myself. It is just clear that the reclaim would
> need some work in order to do accounting like this.
> 

Note: Besides what I've already said, right *now* in this series we are
accounting just stack. So reclaimable vs not-reclaimable doesn't even
get to play. It is used while the tasks are running, it gets freed after
the tasks exited.

I do agree we need to look to the whole picture, and reclaiming will be
hard to get right.

This is actually why we're addressing them separately: because they are
a hard problem on their own, and the current status of accounting
already solves real-life problems for many, though not for all.

>>> My impression was that kernel allocation should simply fail while user
>>> allocations might reclaim as well. Why should we reclaim just because of
>>> the kernel allocation (which is unreclaimable from hard limit reclaim
>>> point of view)?
>>
>> That is not what the kernel does, in general. We assume that if he wants
>> that memory and we can serve it, we should. Also, not all kernel memory
>> is unreclaimable. We can shrink the slabs, for instance. Ying Han
>> claims she has patches for that already...
> 
> Are those patches somewhere around?
> 

Ying Han ?

> [...]
>>> This doesn't check for the hierachy so kmem_accounted might not be in 
>>> sync with it's parents. mem_cgroup_create (below) needs to copy
>>> kmem_accounted down from the parent and the above needs to check if this
>>> is a similar dance like mem_cgroup_oom_control_write.
>>>
>>
>> I don't see why we have to.
>>
>> I believe in a A/B/C hierarchy, C should be perfectly able to set a
>> different limit than its parents. Note that this is not a boolean.
> 
> Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
> meant that the kmem_accounted should be consistent within the hierarchy.
> 

If a parent of yours is accounted, you get accounted as well. This is
not the state in this patch, but gets added later. Isn't this enough ?
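A simplified model of the rule Glauber describes (a child inherits accounting from an accounted parent) and of the case Michal asks about next (a child enabling accounting under an unaccounted parent). struct sketch_memcg and both helpers are hypothetical and are not the memcg code from the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a memory cgroup. */
struct sketch_memcg {
	struct sketch_memcg *parent;
	bool kmem_accounted;
};

/* A new child starts out accounted if its parent is accounted. */
static void sketch_memcg_init(struct sketch_memcg *memcg,
			      struct sketch_memcg *parent)
{
	memcg->parent = parent;
	memcg->kmem_accounted = parent ? parent->kmem_accounted : false;
}

/* Enabling accounting on a child is allowed even if the parent is not. */
static void sketch_memcg_enable_kmem(struct sketch_memcg *memcg)
{
	memcg->kmem_accounted = true;
}
```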


Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko

On Wed 15-08-12 12:12:23, James Bottomley wrote:
> On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote:
> > > This can
> > > be quite confusing.  I am still not sure whether we should mix the two
> > > things together. If somebody wants to limit the kernel memory he has to
> > > touch the other limit anyway.  Do you have a strong reason to mix the
> > > user and kernel counters?
> > 
> > This is funny, because the first opposition I found to this work was
> > "Why would anyone want to limit it separately?" =p
> > 
> > It seems that a quite common use case is to have a container with a
> > unified view of "memory" that it can use the way he likes, be it with
> > kernel memory, or user memory. I believe those people would be happy to
> > just silently account kernel memory to user memory, or at the most have
> > a switch to enable it.
> > 
> > What gets clear from this back and forth, is that there are people
> > interested in both use cases.
> 
> Haven't we already had this discussion during the Prague get together?
> We discussed the use cases and finally agreed to separate accounting for
> k and then k+u mem because that satisfies both the Google and Parallels
> cases.  No-one was overjoyed by k and k+u but no-one had a better
> suggestion ... is there a better way of doing this that everyone can
> agree to?
> We do need to get this nailed down because it's the foundation of the
> patch series.

There is a slot in the MM/memcg minisummit at KS, so we can discuss
this there.
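One possible reading of the "k and then k+u" scheme James mentions, sketched with hypothetical counters: a kernel allocation is charged against both the kernel-only counter and the combined counter, while a user allocation only hits the combined one. The names are invented for illustration, and the sketch says nothing about reclaim, which the thread treats as a separate open problem.

```c
#include <stdbool.h>

/* Hypothetical page counter with a hard limit (usage <= limit). */
struct sketch_counter {
	unsigned long usage;
	unsigned long limit;
};

static bool sketch_try_charge(struct sketch_counter *c, unsigned long pages)
{
	if (pages > c->limit - c->usage) /* avoids overflow in usage + pages */
		return false;
	c->usage += pages;
	return true;
}

static void sketch_uncharge(struct sketch_counter *c, unsigned long pages)
{
	c->usage -= pages;
}

struct sketch_memcg_kmem {
	struct sketch_counter kmem;  /* k: kernel memory only */
	struct sketch_counter total; /* k+u: kernel plus user memory */
};

/* Kernel allocations are charged to both counters, or to neither. */
static bool charge_kernel(struct sketch_memcg_kmem *m, unsigned long pages)
{
	if (!sketch_try_charge(&m->kmem, pages))
		return false;
	if (!sketch_try_charge(&m->total, pages)) {
		sketch_uncharge(&m->kmem, pages); /* roll back the k charge */
		return false;
	}
	return true;
}

/* User allocations are charged to the combined counter only. */
static bool charge_user(struct sketch_memcg_kmem *m, unsigned long pages)
{
	return sketch_try_charge(&m->total, pages);
}
```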

> 
> James

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 01/31] arm64: Assembly macros and definitions

2012-08-15 Thread Arnd Bergmann

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch introduces several assembly macros and definitions used in
> the .S files across arch/arm64/ like IRQ disabling/enabling, together
> with asm-offsets.c.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Catalin Marinas 

Acked-by: Arnd Bergmann 

Re: [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace

2012-08-15 Thread Michael S. Tsirkin

On Fri, Aug 10, 2012 at 04:37:25PM -0600, Alex Williamson wrote:
> Introduce KVM_IRQ_SOURCE_ID and KVM_CAP_NR_IRQ_SOURCE_ID to allow
> user allocation of IRQ source IDs and querying both the capability
> and the total count of IRQ source IDs.  These will later be used
> by interfaces for setting up level IRQs.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  Documentation/virtual/kvm/api.txt |   20 
>  arch/x86/kvm/Kconfig  |1 +
>  arch/x86/kvm/x86.c|3 +++
>  include/linux/kvm.h   |   11 +++
>  include/linux/kvm_host.h  |1 +
>  virt/kvm/Kconfig  |3 +++
>  virt/kvm/irq_comm.c   |   22 ++
>  virt/kvm/kvm_main.c   |   16 
>  8 files changed, 77 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bf33aaa..062cfd5 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1980,6 +1980,26 @@ return the hash table order in the parameter.  (If the guest is using
>  the virtualized real-mode area (VRMA) facility, the kernel will
>  re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
>  
> +4.77 KVM_IRQ_SOURCE_ID
> +
> +Capability: KVM_CAP_NR_IRQ_SOURCE_ID
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_irq_source_id (in/out)
> +Returns: 0 on success, -errno on error
> +
> +Allows allocating and freeing IRQ source IDs.  Each IRQ source ID
> +represents a complete set of irqchip pin inputs which are logically
> +OR'd with other IRQ source IDs for determining the final assertion
> +level of a pin.  The flag KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN indicates
> +whether the call is for an allocation or deallocation.
> +kvm_irq_source_id.irq_source_id returns the allocated IRQ source ID
> +on success and specifies the freed IRQ source ID on deassign.  The
> +return value of KVM_CAP_NR_IRQ_SOURCE_ID indicates the total number
> +of IRQ source IDs.  These IDs are also shared with KVM internal users
> +(ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
> +may be allocated through this interface.
> +
>  5. The kvm_run structure
>  
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index a28f338..bfd2082 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -37,6 +37,7 @@ config KVM
>   select TASK_DELAY_ACCT
>   select PERF_EVENTS
>   select HAVE_KVM_MSI
> + select HAVE_KVM_IRQ_SOURCE_ID
>   ---help---
> Support hosting fully virtualized guest machines using hardware
> virtualization extensions.  You will need a fairly recent
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 42bce48..75e743e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2209,6 +2209,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_TSC_DEADLINE_TIMER:
>   r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
>   break;
> + case KVM_CAP_NR_IRQ_SOURCE_ID:
> + r = BITS_PER_LONG; /* kvm->arch.irq_sources_bitmap */
> + break;
>   default:
>   r = 0;
>   break;
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..67b6b49 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_NR_IRQ_SOURCE_ID 81
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -691,6 +692,14 @@ struct kvm_irqfd {
>   __u8  pad[20];
>  };
>  
> +#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
> +
> +struct kvm_irq_source_id {
> + __u32 flags;
> + __u32 irq_source_id;
> + __u8 pad[24];
> +};
> +
>  struct kvm_clock_data {
>   __u64 clock;
>   __u32 flags;
> @@ -831,6 +840,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_GET_SMMU_INFO  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
>  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
>  #define KVM_PPC_ALLOCATE_HTAB  _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_IRQ_SOURCE_ID */
> +#define KVM_IRQ_SOURCE_ID _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2ad3e4a..ea6d7a1 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -636,6 +636,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>  struct kvm_irq_ack_notifier *kian);
>  int kvm_request_irq_source_id(struct kvm *kvm);
>  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id);
>  
>  /* For vcpu->arch.iommu_flags */
>  #define KVM_IOMMU_CACHE_COHERENCY0x1
> diff --git
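The documented semantics can be modeled as a bitmap allocator over BITS_PER_LONG ids (matching the KVM_CAP_NR_IRQ_SOURCE_ID value in the x86.c hunk) in which some bits may already be held by in-kernel users. The struct layout and flag mirror the quoted kvm.h hunk; the handler sketch_irq_source_id(), the lowest-free-id policy, and the -EINVAL/-ENOSPC error codes are assumptions for this sketch, not taken from the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0) /* from the quoted hunk */

/* Layout mirrors the structure proposed in the quoted kvm.h hunk. */
struct kvm_irq_source_id {
	uint32_t flags;
	uint32_t irq_source_id;
	uint8_t pad[24];
};
_Static_assert(sizeof(struct kvm_irq_source_id) == 32, "ABI size");

/* Equivalent of BITS_PER_LONG for this sketch. */
#define SKETCH_NR_SOURCES ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical handler: allocate the lowest free id, or free the given id. */
static int sketch_irq_source_id(unsigned long *bitmap,
				struct kvm_irq_source_id *req)
{
	if (req->flags & KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN) {
		if (req->irq_source_id >= (uint32_t)SKETCH_NR_SOURCES ||
		    !(*bitmap & (1UL << req->irq_source_id)))
			return -EINVAL; /* not allocated */
		*bitmap &= ~(1UL << req->irq_source_id);
		return 0;
	}
	for (int i = 0; i < SKETCH_NR_SOURCES; i++) {
		if (!(*bitmap & (1UL << i))) {
			*bitmap |= 1UL << i;
			req->irq_source_id = (uint32_t)i;
			return 0;
		}
	}
	return -ENOSPC; /* all ids taken, possibly by in-kernel users */
}
```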

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko

On Wed 15-08-12 16:53:40, Glauber Costa wrote:
[...]
> >>> This doesn't check for the hierachy so kmem_accounted might not be in 
> >>> sync with it's parents. mem_cgroup_create (below) needs to copy
> >>> kmem_accounted down from the parent and the above needs to check if this
> >>> is a similar dance like mem_cgroup_oom_control_write.
> >>>
> >>
> >> I don't see why we have to.
> >>
> >> I believe in a A/B/C hierarchy, C should be perfectly able to set a
> >> different limit than its parents. Note that this is not a boolean.
> > 
> > Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
> > meant that the kmem_accounted should be consistent within the hierarchy.
> > 
> 
> If a parent of yours is accounted, you get accounted as well. This is
> not the state in this patch, but gets added later. Isn't this enough ?

But if the parent is not accounted, you can set the children to be
accounted, right? Or maybe this is changed later in the series? I didn't
get to the end yet.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 03/31] arm64: Exception handling

2012-08-15 Thread Arnd Bergmann

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +#ifdef CONFIG_AARCH32_EMULATION
> +#define compat_thumb_mode(regs) \
> + (((regs)->pstate & COMPAT_PSR_T_BIT))
> +#else
> +#define compat_thumb_mode(regs) (0)
> +#endif

The symbol we use on other platforms is CONFIG_COMPAT. I don't think you
need to have a separate CONFIG_AARCH32_EMULATION.

> +void __bad_xchg(volatile void *ptr, int size)
> +{
> + printk("xchg: bad data size: pc 0x%p, ptr 0x%p, size %d\n",
> + __builtin_return_address(0), ptr, size);
> + BUG();
> +}
> +EXPORT_SYMBOL(__bad_xchg);
> +

I think we're better off not defining this function. My guess is that
initially the idea on ARM was that it was meant as a BUILD_BUG_ON
replacement, but then someone added this function, and you copied it.

Microblaze has the same declaration, but (correctly) misses the
definition, which produces a much more helpful link failure than
a run-time BUG(). Using BUILD_BUG_ON would be even better.
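To illustrate the BUILD_BUG_ON alternative in a form that builds outside the kernel: the macro below uses the usual negative-array-size trick as a stand-in for the kernel's own BUILD_BUG_ON, and sketch_xchg() is a hypothetical GNU C wrapper around GCC's __atomic_exchange_n builtin, not the arm64 cmpxchg.h code. An unsupported operand size becomes a compile error rather than a run-time BUG().

```c
/*
 * Negative-array-size trick: compiles to nothing when cond is false and
 * breaks the build when cond is a true constant expression.
 */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/*
 * Hypothetical xchg wrapper (GNU C statement expression): only 4- and
 * 8-byte operands are accepted, checked at compile time.
 */
#define sketch_xchg(ptr, val) ({					\
	SKETCH_BUILD_BUG_ON(sizeof(*(ptr)) != 4 && sizeof(*(ptr)) != 8);\
	__atomic_exchange_n((ptr), (val), __ATOMIC_SEQ_CST);		\
})
```

Something like `char c; sketch_xchg(&c, 1);` would then fail to compile, which is the behavior Arnd is after.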

Arnd

Re: [PATCH v2 15/31] arm64: SMP support

2012-08-15 Thread Arnd Bergmann

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch adds SMP initialisation and spinlocks implementation for
> AArch64. The spinlock support uses the new load-acquire/store-release
> instructions to avoid explicit barriers. The architecture also specifies
> that an event is automatically generated when clearing the exclusive
> monitor state to wake up processors in WFE, so there is no need for an
> explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> set the exclusive monitor locally as there is no conditional WFE and a
> branch is more expensive.
> 
> For the SMP booting protocol, see Documentation/arm64/booting.txt.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Catalin Marinas 

Acked-by: Arnd Bergmann 
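The load-acquire/store-release approach described in the patch can be illustrated with C11 atomics in userspace. This is a sketch, not the arm64 implementation: the `sketch_*` names are hypothetical, an acquire compare-and-swap stands in for the exclusive load-acquire sequence, and the relaxed polling loop marks where the real code would wait in WFE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_uint locked; /* 0 = free, 1 = held */
} sketch_spinlock_t;

static void sketch_spin_init(sketch_spinlock_t *l)
{
	atomic_init(&l->locked, 0);
}

/* One acquire CAS attempt: on success, accesses in the critical
 * section cannot be reordered before the lock is taken. */
static bool sketch_spin_trylock(sketch_spinlock_t *l)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void sketch_spin_lock(sketch_spinlock_t *l)
{
	while (!sketch_spin_trylock(l)) {
		/* Poll with relaxed loads until the lock looks free; the
		 * arm64 code would sleep in WFE here instead of spinning. */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			;
	}
}

/* Release store: critical-section accesses cannot move past the
 * unlock, so no separate barrier is needed. */
static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```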

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa

On 08/15/2012 05:02 PM, Michal Hocko wrote:
> On Wed 15-08-12 16:53:40, Glauber Costa wrote:
> [...]
> This doesn't check for the hierarchy, so kmem_accounted might not be in
> sync with its parents. mem_cgroup_create (below) needs to copy
> kmem_accounted down from the parent and the above needs to check if this
> is a similar dance like mem_cgroup_oom_control_write.
>

 I don't see why we have to.

 I believe in a A/B/C hierarchy, C should be perfectly able to set a
 different limit than its parents. Note that this is not a boolean.
>>>
>>> Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
>>> meant that the kmem_accounted should be consistent within the hierarchy.
>>>
>>
>> If a parent of yours is accounted, you get accounted as well. This is
>> not the state in this patch, but gets added later. Isn't this enough ?
> 
> But if the parent is not accounted, you can set the children to be
> accounted, right? Or maybe this is changed later in the series? I didn't
> get to the end yet.
> 

Yes, you can. Do you see any problem with that?



Re: yama_ptrace_access_check(): possible recursive locking detected

2012-08-15 Thread Oleg Nesterov

On 08/14, Kees Cook wrote:
>
> Okay, I've now managed to reproduce this locally. I added a bunch of
> debugging, and I think I understand what's going on. This warning is,
> actually, a false positive.

Sure. I mean that yes, this warning doesn't mean we already hit deadlock.

> get used recursively (the task_struct->alloc_lock), but they are
> separate instantiations ("task" is never "current").

Yes. But suppose that we have 2 tasks T1 and T2,

- T1 does ptrace(PTRACE_ATTACH, T2);

- T2 does ptrace(PTRACE_ATTACH, T1);

at the same time. This can lead to the "real" deadlock, no?
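The T1/T2 scenario is the classic AB-BA inversion: each side holds one task's lock and then wants the other's. A common generic remedy, not something proposed in this thread, is to always take a pair of locks in one global order, for example by address. A minimal pthread sketch; `lock_pair`/`unlock_pair` are hypothetical helpers:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: take both locks lowest address first, so two
 * threads locking the same pair in opposite argument order cannot each
 * end up holding one lock while waiting for the other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	if (b != a)
		pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	if (b != a)
		pthread_mutex_unlock(b);
}
```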

> So Oleg's suggestion of removing the locking around the reading of
> ->comm is wrong since it really does need the lock.

Nothing bad can happen without the lock. Yes, printk() can print
some string "in between" if we race with set_task_comm() but this
is all.

BTW, set_task_comm()->wmb() and memset() should die. They are
not needed afaics, and the comment is misleading.

Oleg.


Re: [PATCH] perf: Let O= makes handle relative paths

2012-08-15 Thread Arnaldo Carvalho de Melo

Em Wed, Aug 15, 2012 at 12:27:22PM +0200, Borislav Petkov escreveu:
> On Mon, Aug 13, 2012 at 03:02:49PM -0300, Arnaldo Carvalho de Melo wrote:
> > [acme@sandy linux]$ rm -rf ../build/perf
> > [acme@sandy linux]$ make -j8 -C tools/perf/ LIBUNWIND_DIR=/opt/libunwind 
> > O=/home/acme/git/build/perf install
> > /bin/sh: line 0: cd: /home/acme/git/build/perf: No such file or directory
> > make: Entering directory `/home/git/linux/tools/perf'
> > GEN perf-archive
> > GEN /home/git/linux/tools/perf/python/perf.so
> > make[1]: Entering directory `/home/git/linux/tools/lib/traceevent'
> > * new build flags or cross compiler
> > CC /home/git/linux/tools/perf/perf.o

> > I.e. it should stop if the O= provided directory is not there.

> Why stop? Don't we want to make the directory instead and continue
> building in there?

That was the case in the past, but IIRC PeterZ advocated not to and I
agreed.

- Arnaldo

Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-15 Thread Avi Kivity

On 08/15/2012 02:26 AM, Alex Williamson wrote:
> 
> Yes, I understand.  It's simple, it's also very specific to this
> problem, and doesn't address generic ack notification.  All of which
> I've noted before and I continue to note that v8 offers simplifications
> while retaining flexibility.  Least amount of code doesn't really buy us
> much if we end up needing to invent new interfaces down the road because
> we've created such a specific solution here.  Thanks,
> 

One side of the coin is trying to create one generic interface instead
of multiple specific interfaces.  The other side is that by providing a
generic interface, you sometimes expose internal implementation details,
or you constrain future development in order to preserve that interface.
 If the generic interface is not actually exploited, you get pain for no
gain.

This tradeoff is different for every feature.  Right now I'm leaning
towards specialized interfaces here, because we expose quite a lot of
low-level detail.  However I'll review v8 soon and see.

-- 
error compiling committee.c: too many arguments to function

Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Michal Hocko

On Wed 15-08-12 13:42:24, Glauber Costa wrote:
[...]
> >> +
> >> +  ret = 0;
> >> +
> >> +  if (!memcg)
> >> +  return ret;
> >> +
> >> +  _memcg = memcg;
> >> +  ret = __mem_cgroup_try_charge(NULL, gfp, delta / PAGE_SIZE,
> >> +  &_memcg, may_oom);
> > 
> > This is really dangerous because atomic allocations, which seem to be
> > possible, could result in deadlocks because of the reclaim.
> 
> Can you elaborate on how this would happen?

Say you have an atomic allocation and we hit the limit, so we end up
either in reclaim, which can sleep, or in OOM, which can sleep as well
(depending on oom_control).

> > Also, as I mentioned in the other email in this thread, why
> > should we reclaim just because of a kernel allocation when we are not
> > reclaiming any of it, since shrink_slab is ignored in the memcg
> > reclaim?
> 
> Don't get too distracted by the fact that shrink_slab is ignored. It is
> temporary, and while ignoring it now leads to suboptimal behavior, it
> will, first, only affect its users and, second, not be disastrous.

It's not just about shrink_slab; it is also about triggering memcg OOM,
which doesn't consider kmem-accounted memory, so the wrong tasks could
be killed. It is true that the impact is contained within the group
(hierarchy), so you are right that it won't be disastrous.
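A minimal sketch of the atomic-allocation concern, with every name and flag value hypothetical (`SKETCH_GFP_WAIT` only mimics the role a "caller may sleep" GFP flag plays): when a charge would exceed the limit, a caller that cannot sleep must fail the charge rather than enter reclaim or OOM, both of which can sleep.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_GFP_WAIT 0x1u /* hypothetical "caller may sleep" flag */

struct sketch_memcg {
	unsigned long usage;
	unsigned long limit;
	unsigned int reclaim_calls; /* counts entries into the sleeping path */
};

/* Stand-in for reclaim/OOM, which may sleep in the kernel. Here it only
 * records that it was entered and frees nothing. */
static void sketch_reclaim(struct sketch_memcg *memcg)
{
	memcg->reclaim_calls++;
}

static bool sketch_try_charge(struct sketch_memcg *memcg, unsigned int gfp,
			      unsigned long nr_pages)
{
	if (memcg->usage + nr_pages <= memcg->limit) {
		memcg->usage += nr_pages;
		return true;
	}
	/* Over the limit: only a caller allowed to sleep may reclaim. */
	if (!(gfp & SKETCH_GFP_WAIT))
		return false;
	sketch_reclaim(memcg);
	if (memcg->usage + nr_pages <= memcg->limit) {
		memcg->usage += nr_pages;
		return true;
	}
	return false;
}
```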
-- 
Michal Hocko
SUSE Labs

Re: [perf] make clean problematic bashism

2012-08-15 Thread Arnaldo Carvalho de Melo

Em Wed, Aug 15, 2012 at 12:26:48PM +0200, Peter Zijlstra escreveu:
> On Wed, 2012-08-15 at 11:52 +0200, Wouter M. Koolen wrote:
> > The reason for this was a bunch of generated empty flex files in util/ 
> > that were not removed by make clean. They are intended to be erased, 
> > since the Makefile executes

> > rm -f util/*-{bison,flex}*

> > however, this command does not remove the files. I guess because {,} 
> > alternatives are only special in bash but the makefile is run with some 
> > other shell?

> ISTR us getting a number of such patches, did we miss a site, acme?

[acme@sandy linux]$ git describe --match 'v[0-9].[0-9]*' 
7f309ed6453926a81e2a97d274f67f1e48f0d74c
v3.5-358-g7f309ed
[acme@sandy linux]$ git show --oneline  7f309ed6453926a81e2a97d274f67f1e48f0d74c
7f309ed perf tools: Remove brace expansion from clean target
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 32912af..35655c3 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -987,7 +987,8 @@ clean:
$(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope*
$(MAKE) -C Documentation/ clean
$(RM) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS
-   $(RM) $(OUTPUT)util/*-{bison,flex}*
+   $(RM) $(OUTPUT)util/*-bison*
+   $(RM) $(OUTPUT)util/*-flex*
$(python-clean)
 
 .PHONY: all install clean strip $(LIBTRACEEVENT)
[acme@sandy linux]$
 
> > I got perf to compile now, but thought you would be interested to know 
> > about this little problem.

> > PS: as a side note: GNU make has the .DELETE_ON_ERROR: special target, 
> > which removes the target file when its generating command fails. This 
> > would have prevented my problem and sounds like a good idea in general. 
> > Maybe perf could make use of this feature when on GNU make?
 
> I don't think we build with anything but gnu make, mind sending a patch
> implementing your suggestion?

Yeah, please submit a patch,

- Arnaldo

Re: [discussion]sched: a rough proposal to enable power saving in scheduler

2012-08-15 Thread Borislav Petkov

On Wed, Aug 15, 2012 at 01:05:38PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-08-13 at 20:21 +0800, Alex Shi wrote:
> > Since there is no power saving consideration in scheduler CFS, I have a
> > very rough idea for enabling a new power saving scheme in CFS.
> 
> Adding Thomas, he always delights poking holes in power schemes.
> 
> > It is based on the following assumption:
> > 1. If many tasks crowd the system, keeping only a few domains' CPUs
> > running and letting the other CPUs idle cannot save power. Letting all
> > CPUs take the load, finish the tasks early, and then go idle will save
> > more power and give a better user experience.
> 
> I'm not sure this is a valid assumption. I've had it explained to me by
> various people that race-to-idle isn't always the best thing. It has to
> do with the cost of switching power states and the duration of execution
> and other such things.

I think what he means here is that we might want to let all cores on
the node (i.e., domain) finish and then power down the whole node which
should bring much more power savings than letting a subset of the cores
idle. Alex?

[ … ]

> So I'd leave the currently implemented scheme as performance, and I
> don't think the above describes the current state.
> 
> > } else if (schedule policy == power)
> > move tasks from busiest group to
> > idlest group until busiest is just full
> > of capacity.
> > //the busiest group can balance
> > //internally after next time LB,
> 
> There's another thing we need to do, and that is collect tasks in a
> minimal amount of power domains.

Yep.

Btw, what heuristic would tell here when a domain overflows and another
needs to get woken? Combined load of the whole domain?

And if I absolutely positively don't want a node to wake up, do I
hotplug its cores off, or are we going to have a way to tell the
scheduler to overcommit the non-idle domains and spread the tasks only
among them?

I'm thinking of short bursts here, where it would probably be beneficial
to let the tasks wait runnable for a while rather than wake up the next
node and waste power...
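One possible reading of the "combined load of the whole domain" heuristic, as a hypothetical first-fit sketch: keep placing tasks in domains that are already awake and wake a new domain only when none of the awake ones has spare capacity. Load units, the per-domain capacity, and the function name are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical packing helper. Every domain is assumed to have the same
 * capacity. domain_load must have room for max_domains entries.
 * Returns the number of domains that had to be woken, or -1 if a task
 * cannot be placed without exceeding max_domains.
 */
static int pack_tasks(const unsigned int *load, size_t ntasks,
		      unsigned int capacity, unsigned int *domain_load,
		      size_t max_domains)
{
	size_t awake = 0;

	for (size_t t = 0; t < ntasks; t++) {
		size_t d;

		if (load[t] > capacity)
			return -1;
		/* First fit among the domains that are already awake. */
		for (d = 0; d < awake; d++)
			if (domain_load[d] + load[t] <= capacity)
				break;
		if (d == awake) {
			/* Every awake domain would overflow: wake another. */
			if (awake == max_domains)
				return -1;
			domain_load[awake++] = 0;
		}
		domain_load[d] += load[t];
	}
	return (int)awake;
}
```

Capping `max_domains` models the "don't wake that node" case: the call then fails instead of spreading, and a real policy would have to decide whether tasks wait runnable instead.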

Thanks.

-- 
Regards/Gruss,
Boris.

Re: [PATCH] perf: Let O= makes handle relative paths

2012-08-15 Thread Borislav Petkov

On Wed, Aug 15, 2012 at 10:06:34AM -0300, Arnaldo Carvalho de Melo wrote:
> That was the case in the past, but IIRC PeterZ advocated not to and I
> agreed.

Maybe you guys need to explain yourselves :) I mean, the dir is not
present so we're not overwriting anything. And since we say "O=..." on
the command line, it is actually expected that we really mean it... why
type it, otherwise?

Hmmm.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

2012-08-15 Thread Arnd Bergmann

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +The AArch64 exception model is made up of a number of exception levels
> +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
> +counterpart.  EL2 is the hypervisor level and exists only in non-secure
> +mode. EL3 is the highest priority level and exists only in secure mode.

I'm always confused by a description like this. It sounds like you cannot
have a hypervisor if you have code running in secure mode in EL3. What
I instead understand is that you enter non-secure mode by going from
EL3 into EL2.

> +2. Setup the device tree
> +-
> +
> +Requirement: MANDATORY
> +
> +The device tree blob (dtb) must be no bigger than 2 megabytes in size
> +and placed at a 2-megabyte boundary within the first 512 megabytes from
> +the start of the kernel image. This is to allow the kernel to map the
> +blob using a single section mapping in the initial page tables.

I've seen people put firmware for some peripherals into the device tree,
so that a device driver can grab a blob from there and load it into the
device, rather than calling request_firmware() which would fail if the
OS running on the system does not contain the blob. If such firmware is
too large, you end up violating the 2 MB limit you impose here.

Should we keep that limit and declare those use cases as invalid, or
should we try to make the boot protocol more flexible?
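The quoted placement rule reduces to three checks. A hypothetical validator, assuming "within the first 512 megabytes from the start of the kernel image" means the blob's start address lies in [kernel_start, kernel_start + 512 MB); `SZ_2M`/`SZ_512M` are defined locally for the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M   (2ULL << 20)
#define SZ_512M (512ULL << 20)

/* Hypothetical check of the dtb placement rule. A blob that is 2 MB
 * aligned and at most 2 MB long fits in one 2 MB section mapping. */
static bool dtb_placement_ok(uint64_t kernel_start, uint64_t dtb_addr,
			     uint64_t dtb_size)
{
	if (dtb_size > SZ_2M)        /* no bigger than 2 MB */
		return false;
	if (dtb_addr & (SZ_2M - 1))  /* on a 2 MB boundary */
		return false;
	/* Within the first 512 MB from the start of the kernel image. */
	return dtb_addr >= kernel_start && dtb_addr - kernel_start < SZ_512M;
}
```

A tree carrying embedded peripheral firmware larger than 2 MB fails the first check, which is exactly the case raised above.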

> diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
> new file mode 100644
> index 000..d766493
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,26 @@
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include 
> +
> +#define COMMAND_LINE_SIZE 1024
> +
> +#endif

Is this necessary? The asm-generic version of this file allows 512 bytes,
which seems plenty.

> +unsigned int processor_id;
> +EXPORT_SYMBOL(processor_id);
> +
> +unsigned int elf_hwcap __read_mostly;
> +EXPORT_SYMBOL(elf_hwcap);

EXPORT_SYMBOL_GPL?

Neither of these looks like they should be used in drivers.

Arnd

Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Mel Gorman

On Wed, Aug 15, 2012 at 01:08:08PM +0400, Glauber Costa wrote:
> On 08/14/2012 07:16 PM, Mel Gorman wrote:
> > On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote:
> >> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> >> page allocator will call the corresponding memcg functions to validate
> >> the allocation. Tasks in the root memcg can always proceed.
> >>
> >> To avoid adding markers to the page - and a kmem flag that would
> >> necessarily follow, as much as doing page_cgroup lookups for no reason,
> > 
> > As you already guessed, doing a page_cgroup in the page allocator free
> > path would be a no-go.
> 
> Specifically yes, but in general, you will be able to observe that I am
> taking all the possible measures to make sure existing paths are
> disturbed as little as possible.
> 
> Thanks for your review here
> 
> >>  
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index b956cec..da341dc 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
> >> order,
> >>struct page *page = NULL;
> >>int migratetype = allocflags_to_migratetype(gfp_mask);
> >>unsigned int cpuset_mems_cookie;
> >> +  void *handle = NULL;
> >>  
> >>gfp_mask &= gfp_allowed_mask;
> >>  
> >> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
> >> order,
> >>return NULL;
> >>  
> >>/*
> >> +   * Will only have any effect when __GFP_KMEMCG is set.
> >> +   * This is verified in the (always inline) callee
> >> +   */
> >> +  if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> > 
> > memcg_kmem_new_page takes a void * parameter already but here you are
> > passing in a void **. This probably happens to work because you do this
> > 
> > struct mem_cgroup **handle = (struct mem_cgroup **)_handle;
> > 
> > but that appears to defeat the purpose of having an opaque type as a
> > "handle". You have to treat it differently when passing it into the commit
> > function because it expects a void *. The motivation for an opaque type
> > is completely unclear to me and how it is managed with a mix of void *
> > and void ** is very confusing.
> 
> okay.
> 
> The opaque type exists because I am doing speculative charging.

I do not get why speculative charging would mandate an opaque type or
"handle". It looks like a fairly standard prepare/commit pattern to me.

> I believe it
> to be a better and less complicated approach than letting a page appear
> and then charging it. Besides being consistent with the rest of memcg,
> it won't create unnecessary disturbance in the page allocator
> when the allocation is to fail.
> 

I still don't get why you did not just return a mem_cgroup instead of a
handle.

> Now, tasks can move between memcgs, so we can't rely on grabbing it from
> current in commit_page, so we pass it around as a handle.

You could just as easily have passed around the mem_cgroup and it would have
been less obscure. Maybe this makes sense from a memcg context and matches
some coding pattern there that I'm not aware of.

> Also, even if
> the task could not move, we already got it once from the task, and that
> is not for free. Better save it.
> 
> Aside from the handle needed, the cost is more or less the same compared
> to doing it in one pass. All we do by using speculative charging is
> split the cost in two and do it from two places.
> We'd have to charge + update page_cgroup anyway.
> 
> As for the type, do you think using struct mem_cgroup would be less
> confusing?
> 

Yes and returning the mem_cgroup or NULL instead of bool.
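One way to keep the speculative prepare/commit split while dropping the opaque handle, sketched with every name hypothetical: the prepare step hands back a typed `struct mem_cgroup *` (NULL when nothing was charged), and the commit step takes the same type, so no `void *`/`void **` conversions are needed. The structs are reduced userspace stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup {           /* reduced stand-in */
	unsigned long usage;
	unsigned long limit;
};

struct page {                 /* reduced stand-in */
	struct mem_cgroup *memcg; /* hypothetical owner field */
};

/* Charge before the page exists. On success *memcgp is the group to
 * commit against, or NULL if no charge was needed. */
static bool kmem_prepare(struct mem_cgroup *memcg, unsigned int order,
			 struct mem_cgroup **memcgp)
{
	unsigned long nr = 1UL << order;

	*memcgp = NULL;
	if (!memcg)               /* e.g. root group: always proceed */
		return true;
	if (memcg->usage + nr > memcg->limit)
		return false;
	memcg->usage += nr;
	*memcgp = memcg;
	return true;
}

/* Bind the charge to the new page, or undo it if allocation failed. */
static void kmem_commit(struct page *page, struct mem_cgroup *memcg,
			unsigned int order)
{
	if (!memcg)
		return;
	if (page)
		page->memcg = memcg;
	else
		memcg->usage -= 1UL << order;
}
```

The group is captured once in prepare, so a task moving between groups before commit does not change which group is charged.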

> > On a similar note I spotted #define memcg_kmem_on 1 . That is also
> > different just for the sake of it. The convention is to do something
> > like this
> > 
> > /* This helps us to avoid #ifdef CONFIG_NUMA */
> > #ifdef CONFIG_NUMA
> > #define NUMA_BUILD 1
> > #else
> > #define NUMA_BUILD 0
> > #endif
> 
> For simple defines, yes. But a later patch will turn this into a static
> branch test. memcg_kmem_on will always be 0 when compile-disabled, but
> when enabled it will expand to static_branch(&...).
> 

I see.

> 
> > memcg_kmem_on was difficult to guess based on its name. I thought initially
> > that it would only be active if a memcg existed or at least something like
> > mem_cgroup_disabled() but it's actually enabled if CONFIG_MEMCG_KMEM is set.
> 
> For now. And I thought that adding the static branch in this patch would
> only confuse matters.

Ah, I see now. I had stopped reading the series once I reached this
patch. I don't think it would have mattered much to collapse the two
patches together but ok.

The static key handling does look a little suspicious. You appear to do
reference counting in memcg_update_kmem_limit for every mem_cgroup_write()
but decrement it on memcg exit. This does not appear as if it would be
symmetric if the memcg files were written to multiple times (maybe that's
not allowed?). Either way,

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko

On Wed 15-08-12 17:04:31, Glauber Costa wrote:
> On 08/15/2012 05:02 PM, Michal Hocko wrote:
> > On Wed 15-08-12 16:53:40, Glauber Costa wrote:
> > [...]
> > This doesn't check for the hierarchy, so kmem_accounted might not be in
> > sync with its parents. mem_cgroup_create (below) needs to copy
> > kmem_accounted down from the parent and the above needs to check if this
> > is a similar dance like mem_cgroup_oom_control_write.
> >
> 
>  I don't see why we have to.
> 
>  I believe in a A/B/C hierarchy, C should be perfectly able to set a
>  different limit than its parents. Note that this is not a boolean.
> >>>
> >>> Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
> >>> meant that the kmem_accounted should be consistent within the hierarchy.
> >>>
> >>
> >> If a parent of yours is accounted, you get accounted as well. This is
> >> not the state in this patch, but gets added later. Isn't this enough ?
> > 
> > But if the parent is not accounted, you can set the children to be
> > accounted, right? Or maybe this is changed later in the series? I didn't
> > get to the end yet.
> > 
> 
> Yes, you can. Do you see any problem with that?

Well, if a child contributes its kmem charges up the hierarchy,
then a parent can have kmem.usage > 0 with accounting disabled.
I am not saying this is a no-go, but it definitely is confusing, and I do
not see any good reason for it. I considered it an oversight rather
than a deliberate design decision.
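The situation described here can be made concrete with a toy model, all names hypothetical and simplified from the thread: a child inherits the accounting flag at creation but may still enable it on its own, and each charge walks up the parent chain regardless of the ancestors' own setting. An unaccounted parent then ends up with non-zero kmem usage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_memcg {
	struct toy_memcg *parent;
	bool kmem_accounted;
	unsigned long kmem_usage;
};

/* Inherit the accounting flag from the parent at creation time. */
static void toy_memcg_init(struct toy_memcg *cg, struct toy_memcg *parent)
{
	cg->parent = parent;
	cg->kmem_accounted = parent ? parent->kmem_accounted : false;
	cg->kmem_usage = 0;
}

/* Charge an accounted group and every ancestor, accounted or not. */
static void toy_kmem_charge(struct toy_memcg *cg, unsigned long nr)
{
	if (!cg->kmem_accounted)
		return;
	for (struct toy_memcg *p = cg; p; p = p->parent)
		p->kmem_usage += nr;
}
```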
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread James Bottomley

On Wed, 2012-08-15 at 14:55 +0200, Michal Hocko wrote:
> On Wed 15-08-12 12:12:23, James Bottomley wrote:
> > On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote:
> > > > This can
> > > > be quite confusing.  I am still not sure whether we should mix the two
> > > > things together. If somebody wants to limit the kernel memory he has to
> > > > touch the other limit anyway.  Do you have a strong reason to mix the
> > > > user and kernel counters?
> > > 
> > > This is funny, because the first opposition I found to this work was
> > > "Why would anyone want to limit it separately?" =p
> > > 
> > > It seems that a quite common use case is to have a container with a
> > > unified view of "memory" that it can use the way it likes, be it with
> > > kernel memory, or user memory. I believe those people would be happy to
> > > just silently account kernel memory to user memory, or at the most have
> > > a switch to enable it.
> > > 
> > > What becomes clear from this back and forth is that there are people
> > > interested in both use cases.
> > 
> > Haven't we already had this discussion during the Prague get together?
> > We discussed the use cases and finally agreed to separate accounting for
> > k and then k+u mem because that satisfies both the Google and Parallels
> > cases.  No-one was overjoyed by k and k+u but no-one had a better
> > suggestion ... is there a better way of doing this that everyone can
> > agree to?
> > We do need to get this nailed down because it's the foundation of the
> > patch series.
> 
> There is a slot in the MM/memcg minisummit at KS, so we can discuss
> this there.

Sure, to get things moving, can you pre-prime us with what you're
thinking in this area so we can be prepared (and if it doesn't work,
tell you beforehand)?
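The "k and k+u" arrangement mentioned above can be read as two counters: kernel charges hit both, user charges hit only the combined one, and each counter has its own limit. A toy sketch under that reading, with hypothetical names; it is not the memcg implementation.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_counters {
	unsigned long k_usage, k_limit;   /* kernel memory only */
	unsigned long ku_usage, ku_limit; /* kernel + user memory */
};

static bool toy_charge_user(struct toy_counters *c, unsigned long nr)
{
	if (c->ku_usage + nr > c->ku_limit)
		return false;
	c->ku_usage += nr;
	return true;
}

static bool toy_charge_kernel(struct toy_counters *c, unsigned long nr)
{
	/* A kernel charge must fit under both limits. */
	if (c->k_usage + nr > c->k_limit || c->ku_usage + nr > c->ku_limit)
		return false;
	c->k_usage += nr;
	c->ku_usage += nr;
	return true;
}
```

Setting `k_limit` at least as high as `ku_limit` gives the unified "memory" view, while a lower `k_limit` caps kernel memory separately, which is how one scheme can serve both use cases.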

Thanks,

James



Re: [PATCH v2 04/31] arm64: MMU definitions

2012-08-15 Thread Arnd Bergmann

On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> +/*
> + * TCR flags.
> + */
> +#define TCR_TxSZ(x)  (((64 - (x)) << 16) | ((64 - (x)) << 0))
> +#define TCR_IRGN_NC     ((0 << 8) | (0 << 24))
> +#define TCR_IRGN_WBWA   ((1 << 8) | (1 << 24))
> +#define TCR_IRGN_WT     ((2 << 8) | (2 << 24))
> +#define TCR_IRGN_WBnWA  ((3 << 8) | (3 << 24))
> +#define TCR_IRGN_MASK   ((3 << 8) | (3 << 24))
> +#define TCR_ORGN_NC     ((0 << 10) | (0 << 26))
> +#define TCR_ORGN_WBWA   ((1 << 10) | (1 << 26))
> +#define TCR_ORGN_WT     ((2 << 10) | (2 << 26))
> +#define TCR_ORGN_WBnWA  ((3 << 10) | (3 << 26))
> +#define TCR_ORGN_MASK   ((3 << 10) | (3 << 26))
> +#define TCR_SHARED      ((3 << 12) | (3 << 28))
> +#define TCR_TG0_64K     (1 << 14)
> +#define TCR_TG1_64K     (1 << 30)
> +#define TCR_IPS_40BIT   (2 << 32)
> +#define TCR_ASID16      (1 << 36)
> +

As a matter of coding style, I would much prefer tables like this to be
written as

#define TCR_IRGN_MASK   0x03000300
#define TCR_IRGN_WBnWA  0x03000300
#define TCR_IRGN_WT     0x02000200
#define TCR_IRGN_WBWA   0x01000100
#define TCR_IRGN_NC     0x00000000

#define TCR_ORGN_MASK   0x0c000c00
#define TCR_ORGN_WBnWA  0x0c000c00
#define TCR_ORGN_WT     0x08000800
#define TCR_ORGN_WBWA   0x04000400
#define TCR_ORGN_NC     0x00000000

The advantage of this is that you can visually compare the bitmasks
to a hex dump, and if you are suffering from endian-confused documentation
authors, there is no ambiguity about which end of the word is bit zero.
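As a quick cross-check, the hex form can be compared against the shift expressions from the patch at compile time. A userspace C11 sketch; the values are the ones discussed in this message, and a mismatch would stop the build:

```c
#include <assert.h>

/* Hex form, as suggested above. */
#define TCR_IRGN_MASK   0x03000300
#define TCR_IRGN_WBnWA  0x03000300
#define TCR_IRGN_WT     0x02000200
#define TCR_IRGN_WBWA   0x01000100
#define TCR_IRGN_NC     0x00000000

#define TCR_ORGN_MASK   0x0c000c00
#define TCR_ORGN_WBnWA  0x0c000c00
#define TCR_ORGN_WT     0x08000800
#define TCR_ORGN_WBWA   0x04000400
#define TCR_ORGN_NC     0x00000000

/* Each value must equal the shift form from the original patch. */
_Static_assert(TCR_IRGN_MASK  == ((3 << 8) | (3 << 24)), "IRGN_MASK");
_Static_assert(TCR_IRGN_WBnWA == ((3 << 8) | (3 << 24)), "IRGN_WBnWA");
_Static_assert(TCR_IRGN_WT    == ((2 << 8) | (2 << 24)), "IRGN_WT");
_Static_assert(TCR_IRGN_WBWA  == ((1 << 8) | (1 << 24)), "IRGN_WBWA");
_Static_assert(TCR_IRGN_NC    == ((0 << 8) | (0 << 24)), "IRGN_NC");
_Static_assert(TCR_ORGN_MASK  == ((3 << 10) | (3 << 26)), "ORGN_MASK");
_Static_assert(TCR_ORGN_WBnWA == ((3 << 10) | (3 << 26)), "ORGN_WBnWA");
_Static_assert(TCR_ORGN_WT    == ((2 << 10) | (2 << 26)), "ORGN_WT");
_Static_assert(TCR_ORGN_WBWA  == ((1 << 10) | (1 << 26)), "ORGN_WBWA");
_Static_assert(TCR_ORGN_NC    == ((0 << 10) | (0 << 26)), "ORGN_NC");
```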

Arnd

Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa

On 08/15/2012 05:26 PM, Michal Hocko wrote:
> On Wed 15-08-12 17:04:31, Glauber Costa wrote:
>> On 08/15/2012 05:02 PM, Michal Hocko wrote:
>>> On Wed 15-08-12 16:53:40, Glauber Costa wrote:
>>> [...]
>>> This doesn't check for the hierarchy, so kmem_accounted might not be in
>>> sync with its parents. mem_cgroup_create (below) needs to copy
>>> kmem_accounted down from the parent and the above needs to check if this
>>> is a similar dance like mem_cgroup_oom_control_write.
>>>
>>
>> I don't see why we have to.
>>
>> I believe in a A/B/C hierarchy, C should be perfectly able to set a
>> different limit than its parents. Note that this is not a boolean.
>
> Ohh, I wasn't clear enough. I am not against setting the _limit_ I just
> meant that the kmem_accounted should be consistent within the hierarchy.
>

 If a parent of yours is accounted, you get accounted as well. This is
 not the state in this patch, but gets added later. Isn't this enough ?
>>>
>>> But if the parent is not accounted, you can set the children to be
>>> accounted, right? Or maybe this is changed later in the series? I didn't
>>> get to the end yet.
>>>
>>
>> Yes, you can. Do you see any problem with that?
> 
> Well, if a child contributes its kmem charges up the hierarchy,
> then a parent can have kmem.usage > 0 with accounting disabled.
> I am not saying this is a no-go, but it definitely is confusing, and I do
> not see any good reason for it. I considered it an oversight rather
> than a deliberate design decision.
> 

No, it is not an oversight.
It is theoretically possible to skip accounting on non-limited parents,
but how expensive is that? This is, indeed, confusing.

Of course I can be biased, but the way I see it, once you have
hierarchy, you account everything your child accounts.

I really don't see what the concern is here.
