date:20130618

[PATCH v3 3/4] mfd: Palmas: Add TPS659038 PMIC support

2013-06-18 Thread Keerthy

From: J Keerthy 

This patch adds TPS659038 PMIC support to the palmas MFD driver.
The TPS659038 has almost the same registers as the earlier
supported variants of the PALMAS family, such as the TWL6035.

The critical differences between the TPS659038 and the TWL6035 are:

1) TPS659038 has no battery charging or backup battery support.
2) TPS659038 does not have the SMPS10 (Boost) step-up converter.
3) TPS659038 does not have battery detection or anything else related to the battery.
4) SD card detection, battery presence detection, vibrator and USB OTG are missing
   when compared to TWL6035.

Signed-off-by: J Keerthy 
---
 Documentation/devicetree/bindings/mfd/palmas.txt |2 ++
 drivers/mfd/palmas.c |5 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/palmas.txt b/Documentation/devicetree/bindings/mfd/palmas.txt
index 7bcd59c..89cb773 100644
--- a/Documentation/devicetree/bindings/mfd/palmas.txt
+++ b/Documentation/devicetree/bindings/mfd/palmas.txt
@@ -5,6 +5,7 @@ twl6035 (palmas)
 twl6037 (palmas)
 tps65913 (palmas)
 tps65914 (palmas)
+tps659038
 
 Required properties:
 - compatible : Should be from the list
@@ -14,6 +15,7 @@ Required properties:
   ti,tps65913
   ti,tps65914
   ti,tps80036
+  ti,tps659038
 and also the generic series names
   ti,palmas
 - interrupt-controller : palmas has its own internal IRQs
diff --git a/drivers/mfd/palmas.c b/drivers/mfd/palmas.c
index 1cacc6a..0439edb 100644
--- a/drivers/mfd/palmas.c
+++ b/drivers/mfd/palmas.c
@@ -232,12 +232,17 @@ static void palmas_dt_to_pdata(struct i2c_client *i2c,
 }
 
 static unsigned int palmas_features = PALMAS_PMIC_FEATURE_SMPS10_BOOST;
+static unsigned int tps659038_features;
 
 static const struct of_device_id of_palmas_match_tbl[] = {
{
.compatible = "ti,palmas",
.data = &palmas_features,
},
+   {
+   .compatible = "ti,tps659038",
+   .data = &tps659038_features,
+   },
{ },
 };
 
-- 
1.7.5.4
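
The match-table change above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver's code: struct match_entry, match_compatible() and lookup_features() are hypothetical stand-ins for struct of_device_id and of_match_device(), and the lookup is simplified to a single compatible string rather than a device node's full compatible list. It only shows how a per-compatible .data pointer can carry a different feature word for each chip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative feature bit; mirrors the idea of SMPS10_BOOST only. */
#define FEATURE_SMPS10_BOOST (1U << 0)

/* One feature word per chip family, as in the patch. */
static const unsigned int palmas_features = FEATURE_SMPS10_BOOST;
static const unsigned int tps659038_features = 0; /* no boost converter */

/* Hypothetical stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	const void *data;
};

static const struct match_entry match_tbl[] = {
	{ .compatible = "ti,palmas",    .data = &palmas_features },
	{ .compatible = "ti,tps659038", .data = &tps659038_features },
	{ NULL, NULL }, /* sentinel terminates the table */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct match_entry *match_compatible(const char *compatible)
{
	const struct match_entry *m;

	for (m = match_tbl; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m;
	return NULL;
}

/* Copy the matched feature word out; -1 models the "no match" error path. */
int lookup_features(const char *compatible, unsigned int *features)
{
	const struct match_entry *m = match_compatible(compatible);

	if (!m)
		return -1;
	*features = *(const unsigned int *)m->data;
	return 0;
}

/* Usage example: the two compatibles resolve to different feature words. */
void example_lookup(void)
{
	unsigned int f = 0;

	assert(lookup_features("ti,palmas", &f) == 0);
	assert(f == FEATURE_SMPS10_BOOST);
	assert(lookup_features("ti,tps659038", &f) == 0);
	assert(f == 0);
}
```

Keeping the feature word behind the table's data pointer means adding another variant only needs a new table entry, with no chip-ID checks in the probe path.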

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 4/4] regulator: Palmas: Add TPS659038 support

2013-06-18 Thread Keerthy

From: J Keerthy 

Add TPS659038 support.

Signed-off-by: J Keerthy 
---
 .../devicetree/bindings/regulator/palmas-pmic.txt  |1 +
 drivers/regulator/palmas-regulator.c   |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/regulator/palmas-pmic.txt b/Documentation/devicetree/bindings/regulator/palmas-pmic.txt
index d5a3086..5115cd7 100644
--- a/Documentation/devicetree/bindings/regulator/palmas-pmic.txt
+++ b/Documentation/devicetree/bindings/regulator/palmas-pmic.txt
@@ -7,6 +7,7 @@ Required properties:
   ti,twl6037-pmic
   ti,tps65913-pmic
   ti,tps65914-pmic
+  ti,tps659038-pmic
 and also the generic series names
   ti,palmas-pmic
 - interrupt-parent : The parent interrupt controller which is palmas.
diff --git a/drivers/regulator/palmas-regulator.c b/drivers/regulator/palmas-regulator.c
index 1ae1e83..d0c8785 100644
--- a/drivers/regulator/palmas-regulator.c
+++ b/drivers/regulator/palmas-regulator.c
@@ -1054,6 +1054,7 @@ static struct of_device_id of_palmas_match_tbl[] = {
{ .compatible = "ti,tps65913-pmic", },
{ .compatible = "ti,tps65914-pmic", },
{ .compatible = "ti,tps80036-pmic", },
+   { .compatible = "ti,tps659038-pmic", },
{ /* end */ }
 };
 
-- 
1.7.5.4


[PATCH v3 1/4] MFD: Palmas: Check if irq is valid

2013-06-18 Thread Keerthy

From: J Keerthy 

Check if irq value obtained is valid. If it is not valid
then skip the irq request step and go ahead with the probe.

Signed-off-by: J Keerthy 
---
 drivers/mfd/palmas.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/mfd/palmas.c b/drivers/mfd/palmas.c
index 62fa728..b24bee3 100644
--- a/drivers/mfd/palmas.c
+++ b/drivers/mfd/palmas.c
@@ -290,6 +290,11 @@ static int palmas_i2c_probe(struct i2c_client *i2c,
}
}
 
+   if (!palmas->irq) {
+   dev_warn(palmas->dev, "IRQ missing: skipping irq request\n");
+   goto no_irq;
+   }
+
/* Change interrupt line output polarity */
if (pdata->irq_flags & IRQ_TYPE_LEVEL_HIGH)
reg = PALMAS_POLARITY_CTRL_INT_POLARITY;
@@ -316,6 +321,7 @@ static int palmas_i2c_probe(struct i2c_client *i2c,
if (ret < 0)
goto err;
 
+no_irq:
slave = PALMAS_BASE_TO_SLAVE(PALMAS_PU_PD_OD_BASE);
addr = PALMAS_BASE_TO_REG(PALMAS_PU_PD_OD_BASE,
PALMAS_PRIMARY_SECONDARY_PAD1);
-- 
1.7.5.4


[PATCH v3 2/4] MFD: Palmas: Add SMPS10_BOOST feature

2013-06-18 Thread Keerthy

From: J Keerthy 

The SMPS10 regulator is not present in all the variants
of the PALMAS PMIC family. Hence add a feature flag to distinguish
between them.

Signed-off-by: J Keerthy 
---
 drivers/mfd/palmas.c |   27 ---
 drivers/regulator/palmas-regulator.c |3 +++
 include/linux/mfd/palmas.h   |   14 ++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/mfd/palmas.c b/drivers/mfd/palmas.c
index b24bee3..1cacc6a 100644
--- a/drivers/mfd/palmas.c
+++ b/drivers/mfd/palmas.c
@@ -231,6 +231,16 @@ static void palmas_dt_to_pdata(struct i2c_client *i2c,
palmas_set_pdata_irq_flag(i2c, pdata);
 }
 
+static unsigned int palmas_features = PALMAS_PMIC_FEATURE_SMPS10_BOOST;
+
+static const struct of_device_id of_palmas_match_tbl[] = {
+   {
+   .compatible = "ti,palmas",
+   .data = &palmas_features,
+   },
+   { },
+};
+
 static int palmas_i2c_probe(struct i2c_client *i2c,
const struct i2c_device_id *id)
 {
@@ -238,8 +248,9 @@ static int palmas_i2c_probe(struct i2c_client *i2c,
struct palmas_platform_data *pdata;
struct device_node *node = i2c->dev.of_node;
int ret = 0, i;
-   unsigned int reg, addr;
+   unsigned int reg, addr, *features;
int slave;
+   const struct of_device_id *match;
 
pdata = dev_get_platdata(&i2c->dev);
 
@@ -261,9 +272,16 @@ static int palmas_i2c_probe(struct i2c_client *i2c,
 
i2c_set_clientdata(i2c, palmas);
palmas->dev = &i2c->dev;
-   palmas->id = id->driver_data;
palmas->irq = i2c->irq;
 
+   match = of_match_device(of_match_ptr(of_palmas_match_tbl), &i2c->dev);
+
+   if (!match)
+   return -ENODATA;
+
+   features = (unsigned int *)match->data;
+   palmas->features = *features;
+
for (i = 0; i < PALMAS_NUM_CLIENTS; i++) {
if (i == 0)
palmas->i2c_clients[i] = i2c;
@@ -433,11 +451,6 @@ static const struct i2c_device_id palmas_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, palmas_i2c_id);
 
-static struct of_device_id of_palmas_match_tbl[] = {
-   { .compatible = "ti,palmas", },
-   { /* end */ }
-};
-
 static struct i2c_driver palmas_i2c_driver = {
.driver = {
   .name = "palmas",
diff --git a/drivers/regulator/palmas-regulator.c b/drivers/regulator/palmas-regulator.c
index 3ae44ac..1ae1e83 100644
--- a/drivers/regulator/palmas-regulator.c
+++ b/drivers/regulator/palmas-regulator.c
@@ -838,6 +838,9 @@ static int palmas_regulators_probe(struct platform_device *pdev)
continue;
ramp_delay_support = true;
break;
+   case PALMAS_REG_SMPS10:
+   if (!PALMAS_PMIC_HAS(palmas, SMPS10_BOOST))
+   continue;
}
 
if ((id == PALMAS_REG_SMPS6) || (id == PALMAS_REG_SMPS8))
diff --git a/include/linux/mfd/palmas.h b/include/linux/mfd/palmas.h
index 8f21daf..98058ca 100644
--- a/include/linux/mfd/palmas.h
+++ b/include/linux/mfd/palmas.h
@@ -32,6 +32,19 @@
((a) == PALMAS_CHIP_ID))
 #define is_palmas_charger(a) ((a) == PALMAS_CHIP_CHARGER_ID)
 
+/**
+ * Palmas PMIC feature types
+ *
+ * PALMAS_PMIC_FEATURE_SMPS10_BOOST - used when the PMIC provides SMPS10_BOOST
+ * regulator.
+ *
+ * PALMAS_PMIC_HAS(b, f) - macro to check if a palmas device is capable of a
+ * specific feature (above) or not. Return non-zero, if yes.
+ */
+#define PALMAS_PMIC_FEATURE_SMPS10_BOOST   BIT(0)
+#define PALMAS_PMIC_HAS(b, f)  \
+   ((b)->features & PALMAS_PMIC_FEATURE_ ## f)
+
 struct palmas_pmic;
 struct palmas_gpadc;
 struct palmas_resource;
@@ -46,6 +59,7 @@ struct palmas {
/* Stored chip id */
int id;
 
+   unsigned int features;
/* IRQ Data */
int irq;
u32 irq_mask;
-- 
1.7.5.4
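
As an illustration of the feature-flag approach in this patch, here is a minimal userspace sketch. It is a model under stated assumptions, not the driver itself: BIT() is redefined locally, struct palmas_model is a hypothetical stand-in for struct palmas, and the regulator ids are made up for the example. Only the two PALMAS_PMIC_* macros follow the patch.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() helper. */
#define BIT(n) (1U << (n))

#define PALMAS_PMIC_FEATURE_SMPS10_BOOST	BIT(0)

/*
 * Token pasting expands PALMAS_PMIC_HAS(p, SMPS10_BOOST) into a test of
 * PALMAS_PMIC_FEATURE_SMPS10_BOOST against p->features.
 */
#define PALMAS_PMIC_HAS(b, f) \
	((b)->features & PALMAS_PMIC_FEATURE_ ## f)

/* Hypothetical, stripped-down stand-in for struct palmas. */
struct palmas_model {
	unsigned int features;
};

/* Illustrative regulator ids; the real driver has many more. */
enum { REG_SMPS9, REG_SMPS10, REG_LDO1, REG_COUNT };

/*
 * Count the regulators that would be registered, skipping SMPS10 when the
 * chip does not advertise the boost converter.
 */
int count_registered(const struct palmas_model *p)
{
	int id, n = 0;

	for (id = 0; id < REG_COUNT; id++) {
		if (id == REG_SMPS10 && !PALMAS_PMIC_HAS(p, SMPS10_BOOST))
			continue;
		n++;
	}
	return n;
}

/* Usage example: one chip with the boost converter, one without. */
void example_feature_check(void)
{
	struct palmas_model with_boost = {
		.features = PALMAS_PMIC_FEATURE_SMPS10_BOOST,
	};
	struct palmas_model without_boost = { .features = 0 };

	assert(count_registered(&with_boost) == 3);
	assert(count_registered(&without_boost) == 2);
}
```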


[PATCH v3 0/4] MFD: Palmas: Add TPS659038 PMIC support on Palmas

2013-06-18 Thread Keerthy

From: J Keerthy 

This patch series adds TPS659038 PMIC support to the palmas MFD and regulator
drivers. The TPS659038 has almost the same registers as the earlier
supported variants of the PALMAS family, such as the TWL6035.

The critical differences between the TPS659038 and the TWL6035 are:

1) TPS659038 has no battery charging or backup battery support.
2) TPS659038 does not have the SMPS10 (Boost) step-up converter.
3) TPS659038 does not have battery detection or anything else related to the battery.
4) SD card detection, battery presence detection, vibrator and USB OTG are missing
   when compared to TWL6035.

The patch series is based on the patch:
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg90598.html

V3:

Implements the interrupt check using the i2c->irq variable instead of the DT
"interrupts" property.

Cleans up the assignment of the features variable in patch 2.

V2:

Implements interrupt checking via DT instead of creating flags
and checking based on chip ID.

J Keerthy (4):
  MFD: Palmas: Check if irq is valid
  MFD: Palmas: Add SMPS10_BOOST feature
  mfd: Palmas: Add TPS659038 PMIC support
  regulator: Palmas: Add TPS659038 support

 Documentation/devicetree/bindings/mfd/palmas.txt   |2 +
 .../devicetree/bindings/regulator/palmas-pmic.txt  |1 +
 drivers/mfd/palmas.c   |   38 
 drivers/regulator/palmas-regulator.c   |4 ++
 include/linux/mfd/palmas.h |   14 +++
 5 files changed, 52 insertions(+), 7 deletions(-)

-- 
1.7.5.4


Re: linux-next: manual merge of the driver-core tree with the driver-core.current tree

2013-06-18 Thread Greg KH

On Wed, Jun 19, 2013 at 03:32:25PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the driver-core tree got a conflict in
> drivers/base/firmware_class.c between commit 875979368eb4 ("firmware
> loader: fix use-after-free by double abort") from the driver-core.current
> tree and commit fe304143b0c3 ("firmware: Avoid deadlock of usermodehelper
> lock at shutdown") from the driver-core tree.
> 
> I fixed it up (more may be required - see below) and can carry the fix as
> necessary (no action is required).

Thanks, the merge looks correct. Ming, any objection?

greg k-h

[net-next rfc 3/3] tuntap: increase the max queues to 16

2013-06-18 Thread Jason Wang

Since we've reduced the size of tun_struct and use a flex array to allocate the
netdev queues, it's safe for us to increase the limit of queues in tuntap.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8c5c124..205f6aa 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -110,10 +110,7 @@ struct tap_filter {
unsigned char   addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
-/* DEFAULT_MAX_NUM_RSS_QUEUES were choosed to let the rx/tx queues allocated for
- * the netdevice to be fit in one page. So we can make sure the success of
- * memory allocation. TODO: increase the limit. */
-#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+#define MAX_TAP_QUEUES 16
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
-- 
1.7.1


[net-next rfc 2/3] tuntap: reduce the size of tun_struct by using flex array

2013-06-18 Thread Jason Wang

This patch switches the flow caches to a flex array, which brings several
advantages:

- It shrinks the tun_struct structure, which allows us to increase the
  upper limit of queues in the future.
- It avoids the higher order memory allocation that would otherwise be needed
  if the flow cache switches to pure hashing, which may demand a larger array
  in the future.

After this patch, the size of tun_struct on x86_64 is reduced from 8512 to
328 bytes.
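
The size reduction is consistent with simple arithmetic, assuming TUN_NUM_FLOW_ENTRIES is 1024 and struct hlist_head is a single 8-byte pointer on x86_64 (neither value is stated in this mail): the embedded bucket array takes 1024 * 8 = 8192 bytes, and replacing it with one 8-byte pointer gives 8512 - 8192 + 8 = 328. The idea behind a flex array can be sketched in userspace as a table of fixed-size chunks. The code below is only a model of that idea; chunked_alloc(), chunked_get() and chunked_free() are hypothetical names and not the kernel's flex_array API, and calloc() stands in for page allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed chunk size, standing in for one page. */
#define CHUNK_BYTES 4096u

/* Hypothetical model: elements live in page-sized chunks, never in one
 * large contiguous block. */
struct chunked_array {
	size_t elem_size;	/* size of one element in bytes */
	size_t total;		/* number of elements */
	size_t per_chunk;	/* elements that fit in one chunk */
	size_t nchunks;		/* number of chunks */
	void **chunks;		/* table of chunk pointers */
};

void chunked_free(struct chunked_array *a)
{
	size_t i;

	if (!a)
		return;
	if (a->chunks)
		for (i = 0; i < a->nchunks; i++)
			free(a->chunks[i]);	/* free(NULL) is a no-op */
	free(a->chunks);
	free(a);
}

/* Allocate and zero all chunks up front, similar in spirit to calling an
 * allocation step followed by a preallocation step. */
struct chunked_array *chunked_alloc(size_t elem_size, size_t total)
{
	struct chunked_array *a;
	size_t i;

	if (elem_size == 0 || elem_size > CHUNK_BYTES || total == 0)
		return NULL;

	a = calloc(1, sizeof(*a));
	if (!a)
		return NULL;

	a->elem_size = elem_size;
	a->total = total;
	a->per_chunk = CHUNK_BYTES / elem_size;
	a->nchunks = (total + a->per_chunk - 1) / a->per_chunk;

	a->chunks = calloc(a->nchunks, sizeof(*a->chunks));
	if (!a->chunks) {
		chunked_free(a);
		return NULL;
	}

	for (i = 0; i < a->nchunks; i++) {
		a->chunks[i] = calloc(1, CHUNK_BYTES);
		if (!a->chunks[i]) {
			chunked_free(a);
			return NULL;
		}
	}
	return a;
}

/* Translate an element index into (chunk, offset); NULL if out of range. */
void *chunked_get(const struct chunked_array *a, size_t idx)
{
	if (!a || idx >= a->total)
		return NULL;
	return (char *)a->chunks[idx / a->per_chunk] +
	       (idx % a->per_chunk) * a->elem_size;
}

/* Usage example: 1024 pointer-sized buckets, as in the flow cache. */
void example_buckets(void)
{
	struct chunked_array *a = chunked_alloc(sizeof(void *), 1024);

	assert(a != NULL);
	assert(chunked_get(a, 1023) != NULL);
	assert(chunked_get(a, 1024) == NULL);
	chunked_free(a);
}
```

The trade-off is an extra division and pointer dereference on each lookup in exchange for never needing more than one chunk of contiguous memory.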

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c  |   54 +--
 net/openvswitch/flow.c |2 +-
 2 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a344270..8c5c124 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -180,7 +181,7 @@ struct tun_struct {
int debug;
 #endif
spinlock_t lock;
-   struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
+   struct flex_array *flows;
struct timer_list flow_gc_timer;
unsigned long ageing_time;
unsigned int numdisabled;
@@ -239,10 +240,11 @@ static void tun_flow_flush(struct tun_struct *tun)
 
spin_lock_bh(&tun->lock);
for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+   struct hlist_head *h = flex_array_get(tun->flows, i);
struct tun_flow_entry *e;
struct hlist_node *n;
 
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link)
+   hlist_for_each_entry_safe(e, n, h, hash_link)
tun_flow_delete(tun, e);
}
spin_unlock_bh(&tun->lock);
@@ -254,10 +256,11 @@ static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
 
spin_lock_bh(&tun->lock);
for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+   struct hlist_head *h = flex_array_get(tun->flows, i);
struct tun_flow_entry *e;
struct hlist_node *n;
 
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+   hlist_for_each_entry_safe(e, n, h, hash_link) {
if (e->queue_index == queue_index)
tun_flow_delete(tun, e);
}
@@ -277,10 +280,11 @@ static void tun_flow_cleanup(unsigned long data)
 
spin_lock_bh(&tun->lock);
for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+   struct hlist_head *h = flex_array_get(tun->flows, i);
struct tun_flow_entry *e;
struct hlist_node *n;
 
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+   hlist_for_each_entry_safe(e, n, h, hash_link) {
unsigned long this_timer;
count++;
this_timer = e->updated + delay;
@@ -307,7 +311,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
if (!rxhash)
return;
else
-   head = &tun->flows[tun_hashfn(rxhash)];
+   head = flex_array_get(tun->flows, tun_hashfn(rxhash));
 
rcu_read_lock();
 
@@ -356,7 +360,8 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 
txq = skb_get_rxhash(skb);
if (txq) {
-   e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
+   e = tun_flow_find(flex_array_get(tun->flows, tun_hashfn(txq)),
+ txq);
if (e)
txq = e->queue_index;
else
@@ -841,23 +846,45 @@ static const struct net_device_ops tap_netdev_ops = {
 #endif
 };
 
-static void tun_flow_init(struct tun_struct *tun)
+static int tun_flow_init(struct tun_struct *tun, bool mq)
 {
-   int i;
+   struct flex_array *buckets;
+   int i, err;
+
+   if (!mq)
+   return 0;
+
+   buckets = flex_array_alloc(sizeof(struct hlist_head),
+   TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+   if (!buckets)
+   return -ENOMEM;
 
+   err = flex_array_prealloc(buckets, 0, TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+   if (err) {
+   flex_array_free(buckets);
+   return -ENOMEM;
+   }
+
+   tun->flows = buckets;
for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
-   INIT_HLIST_HEAD(&tun->flows[i]);
+   INIT_HLIST_HEAD((struct hlist_head *)
+   flex_array_get(buckets, i));
 
tun->ageing_time = TUN_FLOW_EXPIRE;
setup_timer(&tun->flow_gc_timer, tun_flow_cleanup, (unsigned long)tun);
mod_timer(&tun->flow_gc_timer,
  round_jiffies_up(jiffies + tun->ageing_time));
+
+   return 0;
 }
 
 static void tun_flow_uninit(struct tun_struct *tun)
 {
-   del_timer_sync(&tun->flow_gc_timer);
-   tun_flow_flush(tun);
+   if (tun->flags & TUN_TAP_MQ) {
+

[net-next rfc 0/3] increase the limit of tuntap queues

2013-06-18 Thread Jason Wang

Hi all:

This series tries to increase the limit of tuntap queues. Historically there
were two reasons which prevented us from doing this:

- We store the hash buckets in tun_struct, which makes tun_struct very large;
  this high order memory allocation fails easily when memory is
  fragmented.
- The netdev_queue and netdev_rx_queue arrays in the netdevice are allocated
  through kmalloc, which may also cause a high order memory allocation when we
  have several queues. E.g. sizeof(netdev_queue) is 320, which means a high
  order allocation happens when the device has more than 12 queues.

So this series addresses those issues by switching to flex arrays. All
entries are preallocated, and since a flex array always does order-0
allocations, we can safely increase the limit afterwards.
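
The 12-queue threshold mentioned above can be checked with a few lines of arithmetic. The sketch below assumes 4096-byte pages (the mail does not state the page size) and uses hypothetical helper names; it computes the smallest page order whose span covers a contiguous allocation of a given size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; the original mail does not state it. */
#define PAGE_SIZE_ASSUMED 4096UL

/* Smallest order such that (PAGE_SIZE_ASSUMED << order) >= bytes. */
unsigned int alloc_order(size_t bytes)
{
	unsigned int order = 0;
	size_t span = PAGE_SIZE_ASSUMED;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Order needed for a contiguous array of nqueues elements. */
unsigned int queue_array_order(unsigned int nqueues, size_t queue_size)
{
	return alloc_order((size_t)nqueues * queue_size);
}

/* Usage example with sizeof(netdev_queue) == 320, as quoted in the mail. */
void example_queue_orders(void)
{
	assert(queue_array_order(12, 320) == 0);	/* 3840 bytes fit in one page */
	assert(queue_array_order(13, 320) == 1);	/* 4160 bytes need two pages */
}
```

With 320-byte entries, 12 queues need 3840 bytes and still fit in one page, while 13 queues need 4160 bytes and push a contiguous allocation to order 1.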

Only compile tested; comments and reviews are more than welcome.

Jason Wang (3):
  net: avoid high order memory allocation for queues by using flex
array
  tuntap: reduce the size of tun_struct by using flex array
  tuntap: increase the max queues to 16

 drivers/net/tun.c |   59 
 include/linux/netdevice.h |   13 ++
 net/core/dev.c|   57 +++
 net/core/net-sysfs.c  |   15 +++
 net/openvswitch/flow.c|2 +-
 5 files changed, 102 insertions(+), 44 deletions(-)


[net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array

2013-06-18 Thread Jason Wang

Currently, we use kcalloc to allocate the rx/tx queues for a net device, which
can easily lead to a high order memory allocation request when initializing a
multiqueue net device. We can simply avoid this by switching to a flex array,
which always allocates at order zero.

Signed-off-by: Jason Wang 
---
 include/linux/netdevice.h |   13 ++
 net/core/dev.c|   57 
 net/core/net-sysfs.c  |   15 +++
 3 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 09b4188..c0b5d04 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1230,7 +1231,7 @@ struct net_device {
 
 
 #ifdef CONFIG_RPS
-   struct netdev_rx_queue  *_rx;
+   struct flex_array   *_rx;
 
/* Number of RX queues allocated at register_netdev() time */
unsigned intnum_rx_queues;
@@ -1250,7 +1251,7 @@ struct net_device {
 /*
  * Cache lines mostly used on transmit path
  */
-   struct netdev_queue *_tx cacheline_aligned_in_smp;
+   struct flex_array   *_tx cacheline_aligned_in_smp;
 
/* Number of TX queues allocated at alloc_netdev_mq() time  */
unsigned intnum_tx_queues;
@@ -1434,7 +1435,7 @@ static inline
 struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
 unsigned int index)
 {
-   return &dev->_tx[index];
+   return flex_array_get(dev->_tx, index);
 }
 
 static inline void netdev_for_each_tx_queue(struct net_device *dev,
@@ -1445,8 +1446,10 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev,
 {
unsigned int i;
 
-   for (i = 0; i < dev->num_tx_queues; i++)
-   f(dev, &dev->_tx[i], arg);
+   for (i = 0; i < dev->num_tx_queues; i++) {
+   struct netdev_queue *txq = flex_array_get(dev->_tx, i);
+   f(dev, txq, arg);
+   }
 }
 
 extern struct netdev_queue *netdev_pick_tx(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index fa007db..3a4ecb1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "net-sysfs.h"
 
@@ -2902,7 +2903,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
if (rxq_index == skb_get_rx_queue(skb))
goto out;
 
-   rxqueue = dev->_rx + rxq_index;
+   rxqueue = flex_array_get(dev->_rx, rxq_index);
flow_table = rcu_dereference(rxqueue->rps_flow_table);
if (!flow_table)
goto out;
@@ -2950,9 +2951,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
  dev->name, index, dev->real_num_rx_queues);
goto done;
}
-   rxqueue = dev->_rx + index;
+   rxqueue = flex_array_get(dev->_rx, index);
} else
-   rxqueue = dev->_rx;
+   rxqueue = flex_array_get(dev->_rx, 0);
 
map = rcu_dereference(rxqueue->rps_map);
if (map) {
@@ -3038,7 +3039,7 @@ done:
 bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
 u32 flow_id, u16 filter_id)
 {
-   struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index;
+   struct netdev_rx_queue *rxqueue = flex_array_get(dev->_rx, rxq_index);
struct rps_dev_flow_table *flow_table;
struct rps_dev_flow *rflow;
bool expire = true;
@@ -5223,18 +5224,31 @@ EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
unsigned int i, count = dev->num_rx_queues;
-   struct netdev_rx_queue *rx;
+   struct flex_array *rx;
+   int err;
 
BUG_ON(count < 1);
 
-   rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
-   if (!rx)
+   rx = flex_array_alloc(sizeof(struct netdev_rx_queue), count,
+ GFP_KERNEL | __GFP_ZERO);
+   if (!rx) {
+   pr_err("netdev: Unable to allocate flex array for rx queues\n");
return -ENOMEM;
+   }
+
+   err = flex_array_prealloc(rx, 0, count, GFP_KERNEL | __GFP_ZERO);
+   if (err) {
+   pr_err("netdev: Unable to prealloc %u rx queues\n", count);
+   flex_array_free(rx);
+   return err;
+   }
 
dev->_rx = rx;
 
-   for (i = 0; i < count; i++)
-   rx[i].dev = dev;
+   for (i = 0; i < count; i++) {
+   struct netdev_rx_queue *rxq = flex_array_get(rx, i);
+   rxq->dev = dev;
+   }
return 0;
 }
 #endif
@@ -5256,13 +5270,24 @@ static void netdev_init_one_queue(struct net_device *dev,
 static int netif_alloc_netdev_queues(struct

linux-next: manual merge of the driver-core tree with the driver-core.current tree

2013-06-18 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the driver-core tree got a conflict in
drivers/base/firmware_class.c between commit 875979368eb4 ("firmware
loader: fix use-after-free by double abort") from the driver-core.current
tree and commit fe304143b0c3 ("firmware: Avoid deadlock of usermodehelper
lock at shutdown") from the driver-core tree.

I fixed it up (more may be required - see below) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/base/firmware_class.c
index 01e2103,6ede229..000
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@@ -446,22 -452,11 +452,18 @@@ static struct firmware_priv *to_firmwar
return container_of(dev, struct firmware_priv, dev);
  }
  
- static void fw_load_abort(struct firmware_priv *fw_priv)
+ static void fw_load_abort(struct firmware_buf *buf)
  {
-   struct firmware_buf *buf = fw_priv->buf;
- 
 +  /*
 +   * There is a small window in which user can write to 'loading'
 +   * between loading done and disappearance of 'loading'
 +   */
 +  if (test_bit(FW_STATUS_DONE, &buf->status))
 +  return;
 +
+   list_del_init(&buf->pending_list);
set_bit(FW_STATUS_ABORT, &buf->status);
complete_all(&buf->completion);
- 
-   /* avoid user action after loading abort */
-   fw_priv->buf = NULL;
  }
  
  #define is_fw_load_aborted(buf)   \



Re: [PATCH 1/2] perf/Power7: Save dcache_src fields in sample record.

2013-06-18 Thread Sukadev Bhattiprolu

Michael Neuling [mi...@neuling.org] wrote:
| Suka,
| 
| One of these two patches breaks pmac32_defconfig and I suspect all other
| 32 bit configs (against mainline)
| 
| arch/powerpc/perf/core-book3s.c: In function 'record_and_restart':
| arch/powerpc/perf/core-book3s.c:1632:4: error: passing argument 1 of 'ppmu->get_mem_data_src' from incompatible pointer type [-Werror]
| arch/powerpc/perf/core-book3s.c:1632:4: note: expected 'struct perf_sample_data *' but argument is of type 'struct perf_sample_data *'
| 
| benh is busy enough without this junk.  Please check the simple things
| like white space and compile errors!

Sorry about that.

BTW, this was an early patch more to get some feedback on mapping of
memory hierarchy levels to Power and not intended to be merged. I have
been reworking the patch based on other comments.

Sukadev


[PATCH v7] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Xiaoguang Chen

Cpufreq governor's stop and start operations should be kept in sequence.
If not, there will be unexpected behavior, for example:

There are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked to CPU0.
The normal sequence is as below:

1) Current governor is userspace. An application tries to set the
governor to ondemand. It will call __cpufreq_set_policy, in which it
will stop the userspace governor and then start the ondemand governor.

2) Current governor is userspace. Now CPU0 hotplugs in CPU3 (puts CPU3 online).
It will call cpufreq_add_policy_cpu, in which it first stops the userspace
governor and then starts the userspace governor.

Now if the sequences of the above two cases interleave, the result is the
sequence below:

1) Application stops userspace governor
2)  Hotplug stops userspace governor
3) Application starts ondemand governor
4)  Hotplug starts a governor

In step 4, hotplug is supposed to start the userspace governor, but the
governor has by now been changed to ondemand by the application, so hotplug
starts the ondemand governor again.

The solution is: do not allow one policy's governor to be stopped multiple times.
A governor stop should be done only once for one policy; after it is stopped,
no other governor stop should be executed. Also add a mutex to
protect __cpufreq_governor so governor operations are kept in sequence.

Signed-off-by: Xiaoguang Chen 
---
 drivers/cpufreq/cpufreq.c | 24 
 include/linux/cpufreq.h   |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2d53f47..6f5aa6f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -46,6 +46,7 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
 #endif
 static DEFINE_RWLOCK(cpufreq_driver_lock);
+static DEFINE_MUTEX(cpufreq_governor_lock);
 
 /*
  * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
@@ -1562,6 +1563,21 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
pr_debug("__cpufreq_governor for CPU %u, event %u\n",
policy->cpu, event);
+
+   mutex_lock(&cpufreq_governor_lock);
+   if ((!policy->governor_enabled && (event == CPUFREQ_GOV_STOP)) ||
+   (policy->governor_enabled && (event == CPUFREQ_GOV_START))) {
+   mutex_unlock(&cpufreq_governor_lock);
+   return -EBUSY;
+   }
+
+   if (event == CPUFREQ_GOV_STOP)
+   policy->governor_enabled = false;
+   else if (event == CPUFREQ_GOV_START)
+   policy->governor_enabled = true;
+
+   mutex_unlock(&cpufreq_governor_lock);
+
ret = policy->governor->governor(policy, event);
 
if (!ret) {
@@ -1569,6 +1585,14 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
policy->governor->initialized++;
else if (event == CPUFREQ_GOV_POLICY_EXIT)
policy->governor->initialized--;
+   } else {
+   /* Restore original values */
+   mutex_lock(&cpufreq_governor_lock);
+   if (event == CPUFREQ_GOV_STOP)
+   policy->governor_enabled = true;
+   else if (event == CPUFREQ_GOV_START)
+   policy->governor_enabled = false;
+   mutex_unlock(&cpufreq_governor_lock);
}
 
/* we keep one module reference alive for
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 037d36a..1a81b74 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,6 +107,7 @@ struct cpufreq_policy {
unsigned intpolicy; /* see above */
struct cpufreq_governor *governor; /* see below */
void*governor_data;
+   boolgovernor_enabled; /* governor start/stop flag */
 
struct work_struct  update; /* if update_policy() needs to be
 * called, but you're in IRQ context */
-- 
1.8.0
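
The guard added by this patch can be modelled in userspace to see how it handles the interleaving from the changelog. The following is a sketch under stated assumptions, not the kernel code: struct policy_model, governor_event() and governor_ok() are hypothetical names, a pthread mutex stands in for the kernel mutex, and the governor callback is a stub that always succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

enum gov_event { GOV_START, GOV_STOP, GOV_LIMITS };

/* Hypothetical, stripped-down stand-in for struct cpufreq_policy. */
struct policy_model {
	bool governor_enabled;	/* governor start/stop flag */
	int (*governor)(struct policy_model *p, enum gov_event ev);
};

static pthread_mutex_t governor_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reject a STOP on an already stopped governor and a START on an already
 * started one, then flip the flag under the lock before calling the
 * governor. On failure the flag is restored to its previous value.
 */
int governor_event(struct policy_model *p, enum gov_event ev)
{
	int ret;

	pthread_mutex_lock(&governor_lock);
	if ((!p->governor_enabled && ev == GOV_STOP) ||
	    (p->governor_enabled && ev == GOV_START)) {
		pthread_mutex_unlock(&governor_lock);
		return -EBUSY;
	}
	if (ev == GOV_STOP)
		p->governor_enabled = false;
	else if (ev == GOV_START)
		p->governor_enabled = true;
	pthread_mutex_unlock(&governor_lock);

	ret = p->governor(p, ev);
	if (ret) {
		/* Restore the original value if the governor call failed. */
		pthread_mutex_lock(&governor_lock);
		if (ev == GOV_STOP)
			p->governor_enabled = true;
		else if (ev == GOV_START)
			p->governor_enabled = false;
		pthread_mutex_unlock(&governor_lock);
	}
	return ret;
}

/* Stub governor callback that always succeeds. */
int governor_ok(struct policy_model *p, enum gov_event ev)
{
	(void)p;
	(void)ev;
	return 0;
}

/* Usage example: the interleaved sequence from the changelog. */
void example_interleave(void)
{
	struct policy_model p = { .governor_enabled = true,
				  .governor = governor_ok };

	assert(governor_event(&p, GOV_STOP) == 0);	/* application stop */
	assert(governor_event(&p, GOV_STOP) == -EBUSY);	/* hotplug stop */
	assert(governor_event(&p, GOV_START) == 0);	/* application start */
	assert(governor_event(&p, GOV_START) == -EBUSY);/* hotplug start */
}
```

In this model the second stop and the second start are both refused, so the governor callback runs exactly once per transition.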


[PATCH v3] extcon: Add an API to get extcon device from dt node

2013-06-18 Thread Chanwoo Choi

From: Kishon Vijay Abraham I 

Add an API, of_extcon_get_extcon_dev(), to be used by drivers to get the
extcon device in the case of DT boot (it can be used instead of
extcon_get_extcon_dev()).

Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Chanwoo Choi 
Signed-off-by: Myungjoo Ham 
---
Changes since v2:
- Add CONFIG_OF_EXTCON to Kconfig
- Use IS_ENABLED() macro to check both CONFIG_EXTCON and CONFIG_EXTCON_MODULE

Changes since v1:
- If edev->name is NULL, dev_name(dev) is used as edev->name.
- Change filename from of-extcon.* to of_extcon.*
- Fix build error when CONFIG_OF is not set
- Add header file(linux/err.h) to of_extcon.c

 drivers/extcon/Kconfig   |  4 +++
 drivers/extcon/Makefile  |  2 ++
 drivers/extcon/extcon-class.c|  3 +-
 drivers/extcon/of_extcon.c   | 64 
 include/linux/extcon/of_extcon.h | 31 +++
 5 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 drivers/extcon/of_extcon.c
 create mode 100644 include/linux/extcon/of_extcon.h

diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
index 63f454e..f1d54a3 100644
--- a/drivers/extcon/Kconfig
+++ b/drivers/extcon/Kconfig
@@ -14,6 +14,10 @@ if EXTCON
 
 comment "Extcon Device Drivers"
 
+config OF_EXTCON
+   def_tristate y
+   depends on OF
+
 config EXTCON_GPIO
tristate "GPIO extcon support"
depends on GPIOLIB
diff --git a/drivers/extcon/Makefile b/drivers/extcon/Makefile
index 540e2c3..759fdae 100644
--- a/drivers/extcon/Makefile
+++ b/drivers/extcon/Makefile
@@ -2,6 +2,8 @@
 # Makefile for external connector class (extcon) devices
 #
 
+obj-$(CONFIG_OF_EXTCON)+= of_extcon.o
+
 obj-$(CONFIG_EXTCON)   += extcon-class.o
 obj-$(CONFIG_EXTCON_GPIO)  += extcon-gpio.o
 obj-$(CONFIG_EXTCON_ADC_JACK)  += extcon-adc-jack.o
diff --git a/drivers/extcon/extcon-class.c b/drivers/extcon/extcon-class.c
index 23f11ea..08509ea 100644
--- a/drivers/extcon/extcon-class.c
+++ b/drivers/extcon/extcon-class.c
@@ -602,7 +602,8 @@ int extcon_dev_register(struct extcon_dev *edev, struct device *dev)
edev->dev->class = extcon_class;
edev->dev->release = extcon_dev_release;
 
-   dev_set_name(edev->dev, edev->name ? edev->name : dev_name(dev));
+   edev->name = edev->name ? edev->name : dev_name(dev);
+   dev_set_name(edev->dev, edev->name);
 
if (edev->max_supported) {
char buf[10];
diff --git a/drivers/extcon/of_extcon.c b/drivers/extcon/of_extcon.c
new file mode 100644
index 000..72173ec
--- /dev/null
+++ b/drivers/extcon/of_extcon.c
@@ -0,0 +1,64 @@
+/*
+ * OF helpers for External connector (extcon) framework
+ *
+ * Copyright (C) 2013 Texas Instruments, Inc.
+ * Kishon Vijay Abraham I 
+ *
+ * Copyright (C) 2013 Samsung Electronics
+ * Chanwoo Choi 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * of_extcon_get_extcon_dev - Get the name of extcon device from devicetree
+ * @dev - instance to the given device
+ * @index - index into list of extcon_dev
+ *
+ * return the instance of extcon device
+ */
+struct extcon_dev *of_extcon_get_extcon_dev(struct device *dev, int index)
+{
+   struct device_node *node;
+   struct extcon_dev *edev;
+   struct platform_device *extcon_parent_dev;
+
+   if (!dev->of_node) {
+   dev_dbg(dev, "device does not have a device node entry\n");
+   return ERR_PTR(-EINVAL);
+   }
+
+   node = of_parse_phandle(dev->of_node, "extcon", index);
+   if (!node) {
+   dev_dbg(dev, "failed to get phandle in %s node\n",
+   dev->of_node->full_name);
+   return ERR_PTR(-ENODEV);
+   }
+
+   extcon_parent_dev = of_find_device_by_node(node);
+   if (!extcon_parent_dev) {
+   dev_dbg(dev, "unable to find device by node\n");
+   return ERR_PTR(-EPROBE_DEFER);
+   }
+
+   edev = extcon_get_extcon_dev(dev_name(&extcon_parent_dev->dev));
+   if (!edev) {
+   dev_dbg(dev, "unable to get extcon device : %s\n",
+   dev_name(&extcon_parent_dev->dev));
+   return ERR_PTR(-ENODEV);
+   }
+
+   return edev;
+}
+EXPORT_SYMBOL_GPL(of_extcon_get_extcon_dev);
diff --git a/include/linux/extcon/of_extcon.h b/include/linux/extcon/of_extcon.h
new file mode 100644
index 000..0ebfeff
--- /dev/null
+++ b/include/linux/extcon/of_extcon.h
@@ -0,0 +1,31 @@
+/*
+ * OF helpers for External connector (extcon) framework
+ *
+ * Copyright (C) 2013 Texas Instruments, Inc.
+ * Kishon Vijay Abraham I 
+ *
+ * Copyright (C) 2013 Samsung Electronics
+ * Chanwoo Choi 
+ *
+ * This

Re: [PATCH v6] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Xiaoguang Chen

2013/6/19 Viresh Kumar :
> On 19 June 2013 10:45, Xiaoguang Chen  wrote:
>> 2013/6/19 Viresh Kumar :
>>> On 19 June 2013 07:13, Xiaoguang Chen  wrote:
>>>> There are 4 CPUs and policy->cpu=cpu0, cpu1/2/3 are linked to cpu0.
>>>> The normal sequence is as below:
>>>
>>> I thought Rafael asked to write cpu0 as CPU0, ...
>>
>> I changed "cpus" to "CPUs" as Rafael suggested(Please spell cpus as
>> "CPUs".  And please start sequences from capitals.)
>> do I need to change other "cpu" to "CPU"?
>
> I am not really sure what he meant by "sequence", but I guess its regarding
> cpu0, cpu1, cpu2, etc... and not cpu .

Ok, it's not a big deal, I'll change it.

> --
> To unsubscribe from this list: send the line "unsubscribe cpufreq" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-18 Thread Linus Torvalds

On Tue, Jun 18, 2013 at 9:13 AM, Stanislav Meduna  wrote:
>
> No crash in 2 days running with preempt none...

Is this UP?

There's the fast_tlb race that Peter fixed in commit 29eb77825cc7
("arch, mm: Remove tlb_fast_mode()"). I'm not seeing how it would
cause infinite TLB faults, but it definitely causes potentially
incoherent TLB contents. And afaik it only happens with
CONFIG_PREEMPT, and on UP systems. Which sounds like it might match
your setup...

Linus

Re: [BUGFIX v2 0/4] fix bug 56531, 59501 and 59581

2013-06-18 Thread Alexander E. Patrakov

2013/6/19 Rafael J. Wysocki :
> OK, let's try to untangle this a bit.
>
> If you apply patches [1/4] and [4/4] from the $subject series only, what
> does remain unfixed?

[not tested, can do so in 12 hours if needed]

I think there will be problems on undocking and/or on the second
docking, as described in comments #6 - #8 of
https://bugzilla.kernel.org/show_bug.cgi?id=59501

-- 
Alexander E. Patrakov

Re: [PATCH v6] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Viresh Kumar

On 19 June 2013 10:45, Xiaoguang Chen  wrote:
> 2013/6/19 Viresh Kumar :
>> On 19 June 2013 07:13, Xiaoguang Chen  wrote:
>>> There are 4 CPUs and policy->cpu=cpu0, cpu1/2/3 are linked to cpu0.
>>> The normal sequence is as below:
>>
>> I thought Rafael asked to write cpu0 as CPU0, ...
>
> I changed "cpus" to "CPUs" as Rafael suggested(Please spell cpus as
> "CPUs".  And please start sequences from capitals.)
> do I need to change other "cpu" to "CPU"?

I am not really sure what he meant by "sequence", but I guess its regarding
cpu0, cpu1, cpu2, etc... and not cpu .

Re: Question regarding put_prev_task in preempted condition

2013-06-18 Thread Lei Wen

Hi Peter,

On Tue, Jun 18, 2013 at 5:55 PM, Peter Zijlstra  wrote:
> On Sun, Jun 09, 2013 at 11:59:36PM +0800, Lei Wen wrote:
>> Hi Peter,
>>
>> While I am checking the preempt related code, I find an interesting part.
>> That is when preempt_schedule is called, for its preempt_count be added
>> PREEMPT_ACTIVE, so in __schedule() it could not be dequeued from rq
>> by deactivate_task.
>>
>> Thus in put_prev_task, which is called a little later in __schedule(), it
>> would call put_prev_task_fair, which finally calls put_prev_entity.
>> For current task is not dequeued from rq, so in this function, it would
>> enqueue it again to the rq by __enqueue_entity.
>>
>> Is there any reason to do like this, since entity already is over rq,
>> why need to queue it again?
>
> Because we keep the current running task outside of the actual queue
> structure. This is because every time we update the runtime
> (__update_curr) the key on which the tree is sorted (vruntime) is
> changed and we'd need to dequeue + enqueue to keep the tree in sync.

I see... I didn't notice this difference...

>
> By not having the actively running task in the tree we can avoid this;
> at the cost of having to dequeue on switching to the task and enqueue
> when switching from the task.
>
>> And if current rq's vruntime distribution like below, and vruntime with 8
>> is the task that would be get preempted. So in __enqueue_entity,
>> its rb_left/rb_right would be set as NULL and reinserted into this RB tree.
>> Then seems to me now, the entity with vruntime of 3 would be disappeared
>> from the RB tree.
>>         13
>>        /  \
>>       8    19
>>      / \
>>     3   11
>>
>> I am not sure whether I understand the whole process correctly...
>> Would the example as above happen in our real life?
>
> No, the RB tree code will ensure we'll not lose 3. I suppose you're
> confused by rb_link_node() which does indeed clear the left and right
> node of the entity we're about to link.

Yep, since 8 is not on the rq, setting its two children to NULL would not lose any info.
Thanks for detailed explanation! :)

Thanks,
Lei

>
> However, we link the previously unlinked entity as a leaf node. So your
> example is flawed; before insertion the tree would look something like:
>
>
>  13
> /  \
>11  19
>   /
>  3
>
> Then the lookup in __enqueue_entity would find the place to insert 8 and
> would select the right sibling of 3:
>
>  13
> /  \
>11  19
>   /
>  3
>   \
>(8)
>
> We'd then link 8 as a child leaf of 3; which will indeed have NULL
> leafs. rb_insert_color() will then fix up the tree so we conform to the
> RB constraints. Please read lib/rbtree.c:__rb_insert() the code is quite
> readable.
>
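As a rough illustration of the point above, the sketch below links a node with stale child pointers into a plain binary search tree. It is not the kernel's rbtree: there is no colouring or rebalancing, and struct node, link_node(), enqueue(), count() and demo_insert_keeps_three() are hypothetical names made up for this example. It only shows that clearing the new node's children is harmless, because the node is always attached as a leaf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; key stands in for vruntime. */
struct node {
    long key;
    struct node *left;
    struct node *right;
};

/* Mimics what rb_link_node() does to the new node: clear its children,
 * then hang it on the empty slot that the lookup found. */
void link_node(struct node *n, struct node **slot)
{
    n->left = NULL;
    n->right = NULL;
    *slot = n;
}

/* Walk down from the root to the empty slot where the key belongs,
 * then link the node there as a leaf. */
void enqueue(struct node **root, struct node *n)
{
    struct node **slot = root;

    while (*slot)
        slot = (n->key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    link_node(n, slot);
}

/* Number of nodes reachable from n. */
size_t count(const struct node *n)
{
    return n ? 1 + count(n->left) + count(n->right) : 0;
}

/* Build the tree 13 / 11 / 19 / 3 from the mail, then enqueue 8, which
 * was kept outside the tree and still carries stale child pointers.
 * Returns 1 if nothing was lost and 8 ended up as the right child of 3. */
int demo_insert_keeps_three(void)
{
    struct node n13 = { .key = 13 }, n11 = { .key = 11 };
    struct node n19 = { .key = 19 }, n3 = { .key = 3 }, n8 = { .key = 8 };
    struct node *root = NULL;

    enqueue(&root, &n13);
    enqueue(&root, &n11);
    enqueue(&root, &n19);
    enqueue(&root, &n3);

    n8.left = &n3;      /* stale pointers from an earlier life in the tree */
    n8.right = &n11;
    enqueue(&root, &n8);

    return count(root) == 5 && n11.left == &n3 && n3.right == &n8 &&
           n8.left == NULL && n8.right == NULL;
}
```

In the real scheduler, rb_insert_color() would additionally rebalance after the link step; that part is left out here.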

Re: [PATCH v6] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Xiaoguang Chen

2013/6/19 Viresh Kumar :
> On 19 June 2013 07:13, Xiaoguang Chen  wrote:
>> There are 4 CPUs and policy->cpu=cpu0, cpu1/2/3 are linked to cpu0.
>> The normal sequence is as below:
>
> I thought Rafael asked to write cpu0 as CPU0, ...

I changed "cpus" to "CPUs" as Rafael suggested(Please spell cpus as
"CPUs".  And please start sequences from capitals.)
do I need to change other "cpu" to "CPU"?

Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Benjamin Herrenschmidt

On Wed, 2013-06-19 at 13:05 +0930, Rusty Russell wrote:
> symbol_get() won't try to load a module; it'll just fail.  This is what
> you want, since they must have vfio in the kernel to get a valid fd...

Ok, cool. I suppose what we want here Alexey is slightly higher level,
something like:

vfio_validate_iommu_id(file, iommu_id)

Which verifies that the file that was passed in is allowed to use
that iommu_id.

That's a simple and flexible interface (ie, it will work even if we
support multiple iommu IDs in the future for a vfio, for example
for DDW windows etc...), the logic to know about the ID remains
in qemu, this is strictly a validation call.

That way we also don't have to expose the containing vfio struct etc...
just that simple function.

Alex, any objection ?

Do we need to make it a get/put interface instead ?

vfio_validate_and_use_iommu(file, iommu_id);

vfio_release_iommu(file, iommu_id);

To ensure that the resource remains owned by the process until KVM
is closed as well ?

Or do we want to register with VFIO with a callback so that VFIO can
call us if it needs us to give it up ?

Cheers,
Ben.


Re: [PATCH 2/2] clk: exynos4: Add alias for cpufreq related clocks

2013-06-18 Thread Tushar Behera

On 06/17/2013 10:20 AM, Tushar Behera wrote:
> On 06/11/2013 12:23 AM, Tomasz Figa wrote:
>> On Monday 10 of June 2013 09:13:11 Tushar Behera wrote:
>>> On 06/08/2013 05:20 PM, Tomasz Figa wrote:
>>>> On Thursday 06 of June 2013 16:52:28 Tushar Behera wrote:
> 
> [ ... ]
> 
>   MUX_A(mout_core, "mout_core", mout_core_p4210,
>
> - SRC_CPU, 16, 1, "mout_core"),
> + SRC_CPU, 16, 1, "moutcore"),

>>>> IMHO those typo corrections are not part of this patch.
>>>
>>> But the older drivers (before migration to CCF) were using the clock
>>> "moutcore" (not "mout_core").
>>
>> I mean, this should be placed in a separate patch, as this change is not 
>> "adding alias for cpufreq related clocks", but rather fixing a typo.
>>
> 
> Is it ok if I split this patch into 2, one adding clock alias
> 'mout_apll' and another one fixing the alias names 'mout_mpll',
> 'moutcore' and 'armclk'?
> 

I have to fix up another clock for exynos4x12 too. I feel all these
modifications are too small to justify different patches. I would modify
the commit message appropriately.


> [ ... ]
> 
>>>> Basically I don't like the idea of those global aliases, which IMHO
>>>> should be completely dropped. Someone might not like it, but I'd go
>>>> with the conversion of our cpufreq drivers to platform drivers
>>>> instead, which could receive things like clocks and regulators using
>>>> DT-based lookups.
>>> I agree. Migration of exynos-cpufreq driver as a platform driver is the
>>> best solution. But unless someone picks up that work, cpufreq support
>>> for EXYNOS4 based systems is broken because of the incorrect clock
>>> aliases.
>>
>> We have patches for this in our internal tree. I will clean them up a bit 
>> and submit soon.
>>
> 
> If you are going to submit the cpufreq driver patches for v3.11, then we
> can ignore this patchset. Otherwise, I would prefer to get these patches
> merged for v3.11 to get cpufreq working. Once the driver changes are
> incorporated, we can very well modify these later.
> 
> Thanks.
> 


-- 
Tushar Behera

[PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart()

2013-06-18 Thread Younger Liu

jbd2_journal_restart() restarts a handle. Before calling
start_this_handle(), it subtracts 1 from transaction->t_updates.
If start_this_handle() succeeds, it increases transaction->t_updates
by 1 again. But if start_this_handle() fails, transaction->t_updates
is not increased.
So, when the handle's transaction is committed in jbd2_journal_stop(),
the following assertion fails and triggers a bug:
J_ASSERT(atomic_read(&transaction->t_updates) > 0)

Signed-off-by: Younger Liu 
---
 fs/jbd2/transaction.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..9ddb444 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -530,6 +530,8 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, 
gfp_t gfp_mask)
lock_map_release(&handle->h_lockdep_map);
handle->h_buffer_credits = nblocks;
ret = start_this_handle(journal, handle, gfp_mask);
+   if (ret < 0)
+   atomic_inc(&transaction->t_updates);
return ret;
 }
 EXPORT_SYMBOL(jbd2__journal_restart);
-- 
1.7.9.7
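
The counter imbalance described in the commit message can be modelled in user space as below. This is only a sketch under simplifying assumptions: a plain int replaces the atomic_t, and struct model_txn, model_start(), model_restart() and updates_after_failed_restart() are hypothetical stand-ins, not jbd2 functions. It does not say whether re-incrementing on failure is the right fix for the real code, only what it does to the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for transaction_t; only the counter is modelled. */
struct model_txn {
    int t_updates;
};

/* Stand-in for start_this_handle(): counts the handle only on success. */
int model_start(struct model_txn *t, bool fail)
{
    if (fail)
        return -1;
    t->t_updates++;
    return 0;
}

/* Stand-in for the restart path: drop the handle's count first, then
 * try to start again. With apply_fix set, a failed start puts the
 * count back, which is what the patch above does. */
int model_restart(struct model_txn *t, bool fail, bool apply_fix)
{
    int ret;

    t->t_updates--;
    ret = model_start(t, fail);
    if (ret < 0 && apply_fix)
        t->t_updates++;
    return ret;
}

/* One running handle, then a restart whose start step fails.
 * Returns the counter that a later stop would check to be > 0. */
int updates_after_failed_restart(bool apply_fix)
{
    struct model_txn t = { .t_updates = 1 };

    (void)model_restart(&t, true, apply_fix);
    return t.t_updates;
}
```

Without the fix the model ends at 0, so a check equivalent to the J_ASSERT would fail; with the fix it ends at 1.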


[PATCH Resend 1/2] drivers: uio_dmem_genirq: Use of_match_ptr() macro

2013-06-18 Thread Sachin Kamat

This eliminates having an #ifdef returning NULL for the case
when OF is disabled.

Signed-off-by: Sachin Kamat 
Acked-by: Damian Hobson-Garcia 
---
This series is based on linux-next (next-20130618) and is
compile tested.
---

 drivers/uio/uio_dmem_genirq.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/uio/uio_dmem_genirq.c b/drivers/uio/uio_dmem_genirq.c
index 252434c..125d0e5 100644
--- a/drivers/uio/uio_dmem_genirq.c
+++ b/drivers/uio/uio_dmem_genirq.c
@@ -336,8 +336,6 @@ static const struct of_device_id uio_of_genirq_match[] = {
{ /* empty for now */ },
 };
 MODULE_DEVICE_TABLE(of, uio_of_genirq_match);
-#else
-# define uio_of_genirq_match NULL
 #endif
 
 static struct platform_driver uio_dmem_genirq = {
@@ -347,7 +345,7 @@ static struct platform_driver uio_dmem_genirq = {
.name = DRIVER_NAME,
.owner = THIS_MODULE,
.pm = &uio_dmem_genirq_dev_pm_ops,
-   .of_match_table = uio_of_genirq_match,
+   .of_match_table = of_match_ptr(uio_of_genirq_match),
},
 };
 
-- 
1.7.9.5
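
For readers unfamiliar with the macro, here is a small user-space model of why of_match_ptr() removes the need for the #else branch. The kernel's definition in include/linux/of.h has the same shape (the pointer itself when CONFIG_OF is set, NULL otherwise), but everything prefixed demo_ or DEMO_ below is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Uncomment to model a build with OF support enabled. */
/* #define DEMO_CONFIG_OF 1 */

#ifdef DEMO_CONFIG_OF
#define demo_of_match_ptr(_ptr) (_ptr)
#else
/* The argument is discarded, so the table need not exist at all. */
#define demo_of_match_ptr(_ptr) NULL
#endif

struct demo_of_device_id {
    const char *compatible;
};

struct demo_driver {
    const struct demo_of_device_id *of_match_table;
};

#ifdef DEMO_CONFIG_OF
const struct demo_of_device_id demo_match[] = {
    { "vendor,demo" },
    { NULL },
};
#endif

/* No "#else / #define demo_match NULL" is needed: the macro handles it. */
const struct demo_driver demo_driver = {
    .of_match_table = demo_of_match_ptr(demo_match),
};

/* Returns 1 if the driver carries a match table, 0 otherwise. */
int demo_has_match_table(void)
{
    return demo_driver.of_match_table != NULL;
}
```

Because the macro throws its argument away in the disabled case, the table name is never seen by the compiler, which is what lets the patch delete the two-line #else block.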


[PATCH Resend 2/2] drivers: uio_pdrv_genirq: Use of_match_ptr() macro

2013-06-18 Thread Sachin Kamat

This eliminates having an #ifdef returning NULL for the case
when OF is disabled.

Signed-off-by: Sachin Kamat 
---
 drivers/uio/uio_pdrv_genirq.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/uio/uio_pdrv_genirq.c b/drivers/uio/uio_pdrv_genirq.c
index c122bca..d7ba355 100644
--- a/drivers/uio/uio_pdrv_genirq.c
+++ b/drivers/uio/uio_pdrv_genirq.c
@@ -267,8 +267,6 @@ static const struct of_device_id uio_of_genirq_match[] = {
{ /* empty for now */ },
 };
 MODULE_DEVICE_TABLE(of, uio_of_genirq_match);
-#else
-# define uio_of_genirq_match NULL
 #endif
 
 static struct platform_driver uio_pdrv_genirq = {
@@ -278,7 +276,7 @@ static struct platform_driver uio_pdrv_genirq = {
.name = DRIVER_NAME,
.owner = THIS_MODULE,
.pm = &uio_pdrv_genirq_dev_pm_ops,
-   .of_match_table = uio_of_genirq_match,
+   .of_match_table = of_match_ptr(uio_of_genirq_match),
},
 };
 
-- 
1.7.9.5


Re: [PATCH] ACPI / PM: Fix error code path for power resources initialization

2013-06-18 Thread Yasuaki Ishimatsu


2013/06/18 7:37, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Commit 781d737 (ACPI: Drop power resources driver) introduced a
bug in the power resources initialization error code path causing
a NULL pointer to be referenced in acpi_release_power_resource()
if there's an error triggering a jump to the 'err' label in
acpi_add_power_resource().  This happens because the list_node
field of struct acpi_power_resource has not been initialized yet
at this point and doing a list_del() on it is a bad idea.

To prevent this problem from occurring, initialize the list_node
field of struct acpi_power_resource upfront.

Reported-by: Mika Westerberg 
Signed-off-by: Rafael J. Wysocki 
Cc: 3.9+ 


Acked-by: Yasuaki Ishimatsu 

Thanks,
Yasuaki Ishimatsu


---
  drivers/acpi/power.c |1 +
  1 file changed, 1 insertion(+)

Index: linux-pm/drivers/acpi/power.c
===
--- linux-pm.orig/drivers/acpi/power.c
+++ linux-pm/drivers/acpi/power.c
@@ -882,6 +882,7 @@ int acpi_add_power_resource(acpi_handle
ACPI_STA_DEFAULT);
mutex_init(&resource->resource_lock);
INIT_LIST_HEAD(&resource->dependent);
+   INIT_LIST_HEAD(&resource->list_node);
resource->name = device->pnp.bus_id;
strcpy(acpi_device_name(device), ACPI_POWER_DEVICE_NAME);
strcpy(acpi_device_class(device), ACPI_POWER_CLASS);
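
The failure mode in the changelog can be reproduced in a user-space model of a doubly linked list node. This is a sketch only: it assumes the structure starts out zero-filled, it uses a checked delete so the bad case is reported instead of crashing, and struct demo_list_head, demo_init_list_head(), demo_checked_list_del() and the two release_* helpers are hypothetical names. The kernel's list_del() has no such check and also poisons the pointers afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the kernel's struct list_head. */
struct demo_list_head {
    struct demo_list_head *next;
    struct demo_list_head *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty list points at itself. */
void demo_init_list_head(struct demo_list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Unlink an entry. Returns -1 where an unchecked delete would have
 * dereferenced a NULL neighbour pointer, 0 on success. */
int demo_checked_list_del(struct demo_list_head *e)
{
    if (!e->next || !e->prev)
        return -1;
    e->next->prev = e->prev;
    e->prev->next = e->next;
    return 0;
}

/* Error path reached before the node was ever initialised. */
int release_without_init(void)
{
    struct demo_list_head node;

    memset(&node, 0, sizeof(node));     /* assumed zero-filled allocation */
    return demo_checked_list_del(&node);
}

/* Same error path, but the node was initialised up front. */
int release_with_init(void)
{
    struct demo_list_head node;

    memset(&node, 0, sizeof(node));
    demo_init_list_head(&node);
    return demo_checked_list_del(&node);
}
```

Initialising the node up front makes the delete a harmless self-unlink, so the release path is safe whether or not the node ever reached a real list.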

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





Re: [PATCH] ACPI: Remove the use of CONFIG_ACPI_HOTPLUG_MEMORY_MODULE

2013-06-18 Thread Yasuaki Ishimatsu

2013/06/19 6:06, Toshi Kani wrote:
> config ACPI_HOTPLUG_MEMORY has been changed to bool (y/n), and
> its module option is no longer valid.  So, remove the use of
> CONFIG_ACPI_HOTPLUG_MEMORY_MODULE.
> 
> Signed-off-by: Toshi Kani 
> ---
>   include/linux/acpi.h |3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 17b5b59..353ba25 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -352,8 +352,7 @@ extern acpi_status acpi_pci_osc_control_set(acpi_handle 
> handle,
>   
>   /* Enable _OST when all relevant hotplug operations are enabled */
>   #if defined(CONFIG_ACPI_HOTPLUG_CPU) && \
> - (defined(CONFIG_ACPI_HOTPLUG_MEMORY) || \
> -  defined(CONFIG_ACPI_HOTPLUG_MEMORY_MODULE)) && \
> + defined(CONFIG_ACPI_HOTPLUG_MEMORY) &&  \
>   defined(CONFIG_ACPI_CONTAINER)
>   #define ACPI_HOTPLUG_OST
>   #endif
> --
Good catch!!

Acked-by: Yasuaki Ishimatsu 

Thanks,
Yasuaki Ishimatsu

> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



RE: [PATCH 3.9-stable] ARM: tegra30: clocks: Fix pciex clock registration

2013-06-18 Thread Jay Agarwal

+Stephen to suggest

> -Original Message-
> From: Jonghwan Choi [mailto:jhbird.c...@samsung.com]
> Sent: Wednesday, June 19, 2013 5:49 AM
> To: 'Jonghwan Choi'; linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org; Jay Agarwal
> Subject: [PATCH 3.9-stable] ARM: tegra30: clocks: Fix pciex clock registration
> 
> This patch looks like it should be in the 3.9-stable tree, should we apply it?
> 
> --
> 
> From: "Jay Agarwal "
> 
> commit ff49fad1d9bf2c49f52817b04cde8e4412434637 upstream
> 
> Registering pciex as peripheral clock instead of fixed clock as
> tegra_perih_reset_assert(deassert) api of this clock api gives warning and
> ultimately does not succeed to assert(deassert)
> 
> Signed-off-by: Jay Agarwal 
> Acked-by: Stephen Warren 
> Signed-off-by: Mike Turquette 
> Signed-off-by: Jonghwan Choi 
> ---
>  drivers/clk/tegra/clk-tegra30.c |   11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c
> index ba6f51b..1f8595b 100644
> --- a/drivers/clk/tegra/clk-tegra30.c
> +++ b/drivers/clk/tegra/clk-tegra30.c
> @@ -1592,6 +1592,12 @@ static void __init tegra30_periph_clk_init(void)
>   clk_register_clkdev(clk, "afi", "tegra-pcie");
>   clks[afi] = clk;
> 
> + /* pciex */
> + clk = tegra_clk_register_periph_gate("pciex", "pll_e", 0, clk_base,
> 0,
> + 74, _u_regs,
> periph_clk_enb_refcnt);
> + clk_register_clkdev(clk, "pciex", "tegra-pcie");
> + clks[pciex] = clk;
> +
>   /* kfuse */
>   clk = tegra_clk_register_periph_gate("kfuse", "clk_m",
>   TEGRA_PERIPH_ON_APB,
> @@ -1710,11 +1716,6 @@ static void __init tegra30_fixed_clk_init(void)
>   1, 0, _lock);
>   clk_register_clkdev(clk, "cml1", NULL);
>   clks[cml1] = clk;
> -
> - /* pciex */
> - clk = clk_register_fixed_rate(NULL, "pciex", "pll_e", 0, 1);
> - clk_register_clkdev(clk, "pciex", NULL);
> - clks[pciex] = clk;
>  }
> 
>  static void __init tegra30_osc_clk_init(void)
> --
> 1.7.9.5


Re: [PATCH 1/2] perf/Power7: Save dcache_src fields in sample record.

2013-06-18 Thread Michael Neuling

Suka,

One of these two patches breaks pmac32_defconfig and I suspect all other
32 bit configs (against mainline)

arch/powerpc/perf/core-book3s.c: In function 'record_and_restart':
arch/powerpc/perf/core-book3s.c:1632:4: error: passing argument 1 of 
'ppmu->get_mem_data_src' from incompatible pointer type [-Werror]
arch/powerpc/perf/core-book3s.c:1632:4: note: expected 'struct perf_sample_data 
*' but argument is of type 'struct perf_sample_data *'

benh is busy enough without this junk.  Please check the simple things
like white space and compile errors!

Mikey

Sukadev Bhattiprolu  wrote:
> From: Sukadev Bhattiprolu 
> Date: Wed, 8 May 2013 22:59:29 -0700
> Subject: [PATCH 1/2] perf/Power7: Save dcache_src fields in sample record.
> 
> Power7 saves the "perf-event vector" information in the mmcra register.
> Included in this event vector is a "data-cache source" field which
> identifies where in the memory-hierarchy the data for an instruction
> was found.
> 
> Use the 'struct perf_mem_data_source' to export the "data-cache source"
> field to user space.
> 
> The mapping between the Power7 hierarchy levels and the arch-neutral
> levels is, unfortunately, not trivial.
> 
>   Arch-neutral levels Power7 levels
>   -
>   local   LVL_L2  local (same core) L2 (FROM_L2)
>   local   LVL_L3  local (same core) L3 (FROM_L3)
> 
>   1-hop   REM_CCE1different core on same chip (FROM_L2.1, _L3.1)
>   2-hops  REM_CCE2remote (different chip, same node) (FROM_RL2L3)
>   3-hops  REM_CCE3*   distant (different node)  (FROM_DL2L3)
> 
>   1-hop   REM_MEM1unused
>   2-hops  REM_MEM2remote (different chip, same node) (FROM_RMEM)
>   3-hops  REM_MEM3*   distant (different node) (FROM_DMEM)
> 
> * proposed "extended" levels.
> 
> AFAICT, Power7 supports one extra level in the cache-hierarchy, so we propose
> to add a new cache level, REM_CCE3 shown above.
> 
> To maintain consistency in terminology (i.e 2-hops = remote, 3-hops = 
> distant),
> I propose leaving the REM_MEM1 unused and adding another level, REM_MEM3.
> 
> Further, in the above REM_CCE1 case, Power7 can also identify if the data came
> from the L2 or L3 cache of another core on the same chip. To describe this to
> user space, we propose to set ->mem_lvl to:
> 
>   PERF_MEM_LVL_REM_CCE1|PERF_MEM_LVL_L2
> 
>   PERF_MEM_LVL_REM_CCE1|PERF_MEM_LVL_L3
> 
> Either that or we could leave REM_CCE1 unused in Power and add two more 
> levels:
> 
>   PERF_MEM_XLVL_REM_L2_CCE1
>   PERF_MEM_XLVL_REM_L3_CCE1
> 
> The former approach seems less confusing and this patch uses that approach.
> 
> Signed-off-by: Sukadev Bhattiprolu 
> ---
>  arch/powerpc/include/asm/perf_event_server.h |2 +
>  arch/powerpc/perf/core-book3s.c  |4 +
>  arch/powerpc/perf/power7-pmu.c   |   81 
> ++
>  include/uapi/linux/perf_event.h  |   12 +++-
>  4 files changed, 97 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/perf_event_server.h 
> b/arch/powerpc/include/asm/perf_event_server.h
> index f265049..f2d162b 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -37,6 +37,8 @@ struct power_pmu {
>   void(*config_bhrb)(u64 pmu_bhrb_filter);
>   void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
>   int (*limited_pmc_event)(u64 event_id);
> + void(*get_mem_data_src)(struct perf_sample_data *data,
> + struct pt_regs *regs);
>   u32 flags;
>   const struct attribute_group**attr_groups;
>   int n_generic;
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 426180b..7778fa9 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1632,6 +1632,10 @@ static void record_and_restart(struct perf_event 
> *event, unsigned long val,
>   data.br_stack = >bhrb_stack;
>   }
>  
> + if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
> + ppmu->get_mem_data_src)
> + ppmu->get_mem_data_src(&data, regs);
> +
>   if (perf_event_overflow(event, &data, regs))
>   power_pmu_stop(event, 0);
>   }
> diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
> index 3c475d6..af92bfe 100644
> --- a/arch/powerpc/perf/power7-pmu.c
> +++ b/arch/powerpc/perf/power7-pmu.c
> @@ -209,6 +209,85 @@ static int power7_get_alternatives(u64 event, unsigned 
> int flags, u64 alt[])
>   return nalt;
>  }
>  
> +#define  POWER7_MMCRA_PEMPTY (0x1L << 63)
> +#define  POWER7_MMCRA_FIN_STALL  (0x1L << 62)
> +#define

Re: [PATCH 0/8] Volatile Ranges (v8?)

2013-06-18 Thread Minchan Kim

Hello Dhaval,

On Tue, Jun 18, 2013 at 12:59:02PM -0400, Dhaval Giani wrote:
> On 2013-06-18 12:11 AM, Minchan Kim wrote:
> >Hello Dhaval,
> >
> >On Mon, Jun 17, 2013 at 12:24:07PM -0400, Dhaval Giani wrote:
> >>Hi John,
> >>
> >>I have been giving your git tree a whirl, and in order to simulate a
> >>limited memory environment, I was using memory cgroups.
> >>
> >>The program I was using to test is attached here. It is your test
> >>code, with some changes (changing the syscall interface, reducing
> >>the memory pressure to be generated).
> >>
> >>I trapped it in a memory cgroup with 1MB memory.limit_in_bytes and hit this,
> >>
> >>[  406.207612] [ cut here ]
> >>[  406.207621] kernel BUG at mm/vrange.c:523!
> >>[  406.207626] invalid opcode:  [#1] SMP
> >>[  406.207631] Modules linked in:
> >>[  406.207637] CPU: 0 PID: 1579 Comm: volatile-test Not tainted
> >Thanks for the testing!
> >Does below patch fix your problem?
> 
> Yes it does! Thank you very much for the patch.

Thanks for confirming.
While testing it, I found several problems, so I just sent fixes as replies
to [7/8] and [8/8].
Could you test them?


FYI: John, Dhaval

I am working on cleaning up the purging mess, so the purging part may
need quite a few changes.

Thanks!

-- 
Kind regards,
Minchan Kim

Re: [PATCH 8/8] vrange: Send SIGBUS when user try to access purged page

2013-06-18 Thread Minchan Kim

On Tue, Jun 11, 2013 at 09:22:51PM -0700, John Stultz wrote:
> From: Minchan Kim 
> 
> By vrange(2) semantics, a user should see SIGBUS if they try to access
> a purged page without vrange(...VRANGE_NOVOLATILE).
> 
> This patch implements it.
> 
> XXX: I reused the PSE bit for a quick prototype without enough
> consideration, so I need time to find an unused bit, and I am surely
> missing many places that must handle the vrange pte bit. I should
> investigate all pte handling sites, especially the pte_none case.
> 
> Cc: Andrew Morton 
> Cc: Android Kernel Team 
> Cc: Robert Love 
> Cc: Mel Gorman 
> Cc: Hugh Dickins 
> Cc: Dave Hansen 
> Cc: Rik van Riel 
> Cc: Dmitry Adamushko 
> Cc: Dave Chinner 
> Cc: Neil Brown 
> Cc: Andrea Righi 
> Cc: Andrea Arcangeli 
> Cc: Aneesh Kumar K.V 
> Cc: Mike Hommey 
> Cc: Taras Glek 
> Cc: Dhaval Giani 
> Cc: Jan Kara 
> Cc: KOSAKI Motohiro 
> Cc: Michel Lespinasse 
> Cc: Minchan Kim 
> Cc: linux...@kvack.org 
> 
> Signed-off-by: Minchan Kim 
> [jstultz: Extended to work with file pages]
> Signed-off-by: John Stultz 
> ---
>  arch/x86/include/asm/pgtable_types.h |  2 ++
>  include/asm-generic/pgtable.h| 11 +++
>  include/linux/vrange.h   |  2 ++
>  mm/memory.c  | 23 +--
>  mm/vrange.c  | 35 ++-
>  5 files changed, 70 insertions(+), 3 deletions(-)
> 

This patch fixes the problem Dhaval reported.

From e789359cf2ac706e1ebc925f14eb2d7187cd2267 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Tue, 11 Jun 2013 21:22:51 -0700
Subject: [PATCH 2/2] vrange: Send SIGBUS when user try to access purged page

By vrange(2) semantics, a user should see SIGBUS if they try to access
a purged page without vrange(...VRANGE_NOVOLATILE).

This patch implements it.

XXX: I reused the PSE bit for a quick prototype without enough
consideration, so I need time to find an unused bit, and I am surely
missing many places that must handle the vrange pte bit. I should
investigate all pte handling sites, especially the pte_none case.

Cc: Andrew Morton 
Cc: Android Kernel Team 
Cc: Robert Love 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Rik van Riel 
Cc: Dmitry Adamushko 
Cc: Dave Chinner 
Cc: Neil Brown 
Cc: Andrea Righi 
Cc: Andrea Arcangeli 
Cc: Aneesh Kumar K.V 
Cc: Mike Hommey 
Cc: Taras Glek 
Cc: Dhaval Giani 
Cc: Jan Kara 
Cc: KOSAKI Motohiro 
Cc: Michel Lespinasse 
Cc: Minchan Kim 
Cc: linux...@kvack.org 

Signed-off-by: Minchan Kim 
[jstultz: Extended to work with file pages]
Signed-off-by: John Stultz 
---
 arch/x86/include/asm/pgtable_types.h |2 ++
 include/asm-generic/pgtable.h|   11 +++
 include/linux/vrange.h   |2 ++
 mm/memory.c  |   23 +--
 mm/vrange.c  |   31 +++
 5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index e642300..d7ea6a0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -64,6 +64,8 @@
 #define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
+#define _PAGE_VRANGE   _PAGE_BIT_PSE
+
 /*
  * _PAGE_NUMA indicates that this page will trigger a numa hinting
  * minor page fault to gather numa placement statistics (see
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a59ff51..91e8f6f 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -479,6 +479,17 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+static inline pte_t pte_mkvrange(pte_t pte)
+{
+   pte = pte_set_flags(pte, _PAGE_VRANGE);
+   return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+static inline int pte_vrange(pte_t pte)
+{
+   return ((pte_flags(pte) | _PAGE_PRESENT) == _PAGE_VRANGE);
+}
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index cbb609a..75754d1 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -41,6 +41,8 @@ int discard_vpage(struct page *page);
 bool vrange_address(struct mm_struct *mm, unsigned long start,
unsigned long end);
 
+extern bool is_purged_vrange(struct mm_struct *mm, unsigned long address);
+
 #else
 
 static inline void vrange_init(void) {};
diff --git a/mm/memory.c b/mm/memory.c
index 61a262b..cc5c70b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -832,7 +833,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
/* pte contains position in swap or file, so copy. */
if (unlikely(!pte_present(pte))) {
-   if (!pte_file(pte)) {
+   if (!pte_file(pte) && !pte_vrange(pte)) {
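
The idea behind the pte helpers in this patch (set a marker bit, clear the
present bit, and later recognise that combination) can be sketched in
userspace. This is only an illustration under assumptions: the names and bit
positions below are hypothetical, not the kernel's, and the mask-based test
is one plausible reading of the intended check rather than the code in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define DEMO_PRESENT (UINT64_C(1) << 0)	/* entry maps a page */
#define DEMO_PURGED  (UINT64_C(1) << 7)	/* page was purged from a volatile range */

/* Mark an entry as purged: set the marker and drop the present bit. */
static uint64_t demo_mkpurged(uint64_t pte)
{
	return (pte | DEMO_PURGED) & ~DEMO_PRESENT;
}

/* An entry counts as purged only if it is marked AND not present. */
static bool demo_is_purged(uint64_t pte)
{
	return (pte & (DEMO_PRESENT | DEMO_PURGED)) == DEMO_PURGED;
}
```

A fault handler built on this would deliver SIGBUS when demo_is_purged()
is true for the faulting entry, and fall through to the normal paths
otherwise.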

Re: [PATCH 7/8] vrange: Add method to purge volatile ranges

2013-06-18 Thread Minchan Kim

On Tue, Jun 11, 2013 at 09:22:50PM -0700, John Stultz wrote:
> From: Minchan Kim 
> 
> This patch adds discarding function to purge volatile ranges under
> memory pressure. Logic is as following:
> 
> 1. Memory pressure happens
> 2. VM start to reclaim pages
> 3. Check the page is in volatile range.
> 4. If so, zap the page from the process's page table.
>(By semantic vrange(2), we should mark it with another one to
> make page fault when you try to access the address. It will
> be introduced later patch)
> 5. If page is unmapped from all processes, discard it instead of swapping.
> 
> This patch does not address the case where there is no swap, which
> keeps anonymous pages from being aged off the LRUs. Minchan has
> additional patches that add support for purging anonymous pages
> 
> XXX: First pass at file purging. Seems to work, but is likely broken
> and needs close review.
> 
> Cc: Andrew Morton 
> Cc: Android Kernel Team 
> Cc: Robert Love 
> Cc: Mel Gorman 
> Cc: Hugh Dickins 
> Cc: Dave Hansen 
> Cc: Rik van Riel 
> Cc: Dmitry Adamushko 
> Cc: Dave Chinner 
> Cc: Neil Brown 
> Cc: Andrea Righi 
> Cc: Andrea Arcangeli 
> Cc: Aneesh Kumar K.V 
> Cc: Mike Hommey 
> Cc: Taras Glek 
> Cc: Dhaval Giani 
> Cc: Jan Kara 
> Cc: KOSAKI Motohiro 
> Cc: Michel Lespinasse 
> Cc: Minchan Kim 
> Cc: linux...@kvack.org 
> Signed-off-by: Minchan Kim 
> [jstultz: Reworked to add purging of file pages, commit log tweaks]
> Signed-off-by: John Stultz 
> ---
>  include/linux/rmap.h   |  12 +-
>  include/linux/swap.h   |   1 +
>  include/linux/vrange.h |   7 ++
>  mm/ksm.c   |   2 +-
>  mm/rmap.c  |  30 +++--
>  mm/swapfile.c  |  36 ++
>  mm/vmscan.c|  16 ++-
>  mm/vrange.c| 332 
> +
>  8 files changed, 420 insertions(+), 16 deletions(-)

This patch has some bugs; the patch below should fix them, and it passes
my simple test cases.

From 13c458388a4784a785d93f285b0c54156c3b04aa Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Tue, 11 Jun 2013 21:22:50 -0700
Subject: [PATCH 1/2] vrange: Add method to purge volatile ranges

This patch adds a discarding function to purge volatile ranges under
memory pressure. The logic is as follows:

1. Memory pressure happens.
2. The VM starts to reclaim pages.
3. Check whether the page is in a volatile range.
4. If so, zap the page from the process's page table.
   (By vrange(2) semantics, we should mark the entry so that a page
   fault occurs when you try to access the address. That will be
   introduced in a later patch.)
5. If the page is unmapped from all processes, discard it instead of
   swapping.

This patch does not address the case where there is no swap, which
keeps anonymous pages from being aged off the LRUs. Minchan has
additional patches that add support for purging anonymous pages.

XXX: First pass at file purging. Seems to work, but is likely broken
and needs close review.

Cc: Andrew Morton 
Cc: Android Kernel Team 
Cc: Robert Love 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Rik van Riel 
Cc: Dmitry Adamushko 
Cc: Dave Chinner 
Cc: Neil Brown 
Cc: Andrea Righi 
Cc: Andrea Arcangeli 
Cc: Aneesh Kumar K.V 
Cc: Mike Hommey 
Cc: Taras Glek 
Cc: Dhaval Giani 
Cc: Jan Kara 
Cc: KOSAKI Motohiro 
Cc: Michel Lespinasse 
Cc: Minchan Kim 
Cc: linux...@kvack.org 
Signed-off-by: Minchan Kim 
[jstultz: Reworked to add purging of file pages, commit log tweaks]
Signed-off-by: John Stultz 
---
 include/linux/rmap.h   |   12 +-
 include/linux/swap.h   |    1 +
 include/linux/vrange.h |    7 +
 mm/ksm.c               |    2 +-
 mm/rmap.c              |   30 +++--
 mm/swapfile.c          |   36 ++
 mm/vmscan.c            |   16 ++-
 mm/vrange.c            |  332 
 8 files changed, 420 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6dacb93..6432dfb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,6 +83,8 @@ enum ttu_flags {
 };
 
 #ifdef CONFIG_MMU
+unsigned long vma_address(struct page *page, struct vm_area_struct *vma);
+
 static inline void get_anon_vma(struct anon_vma *anon_vma)
 {
atomic_inc(&anon_vma->refcount);
@@ -182,9 +184,11 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-   struct mem_cgroup *memcg, unsigned long *vm_flags);
+   struct mem_cgroup *memcg, unsigned long *vm_flags,
+   int *is_vrange);
 int page_referenced_one(struct page *, struct vm_area_struct *,
-   unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
+   unsigned long address, unsigned int *mapcount, unsigned long *vm_flags,
+   int *is_vrange);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -249,9 +253,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 
 static inline int

Re: [PATCH 2/2] perf: Add support for the mem_xlvl field.

2013-06-18 Thread Michael Neuling

Sukadev Bhattiprolu  wrote:

> From 9f1a8a16e0ef36447e343d1cd4797c2b6a81225f Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Fri, 7 Jun 2013 13:26:31 -0700
> Subject: [PATCH 2/2] perf: Add support for the mem_xlvl field.
> 
> A follow-on patch to adding perf_mem_data_src support for Power7.
> At this point, this is only  touch-tested as am looking for feedback
> on the main kernel patch.
> 
> Signed-off-by: Sukadev Bhattiprolu 
> ---
>  tools/perf/util/sort.c |   31 ++-
>  1 files changed, 30 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 5f52d49..24bbf4d 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -631,9 +631,11 @@ static int hist_entry__tlb_snprintf(struct hist_entry 
> *self, char *bf,
>  static int64_t
>  sort__lvl_cmp(struct hist_entry *left, struct hist_entry *right)
>  {
> + int64_t rc;
>   union perf_mem_data_src data_src_l;
>   union perf_mem_data_src data_src_r;
>  
> + data_src_l.val = data_src_r.val = (int64_t)0;
>   if (left->mem_info)
>   data_src_l = left->mem_info->data_src;
>   else
> @@ -644,7 +646,11 @@ sort__lvl_cmp(struct hist_entry *left, struct hist_entry 
> *right)
>   else
>   data_src_r.mem_lvl = PERF_MEM_LVL_NA;
>  
> - return (int64_t)(data_src_r.mem_lvl - data_src_l.mem_lvl);
> + rc = data_src_r.mem_lvl - data_src_l.mem_lvl;
> + if (!rc)
> + rc = data_src_r.mem_xlvl - data_src_l.mem_xlvl;
> + 

whitespace here

> + return rc;
>  }
>  
>  static const char * const mem_lvl[] = {
> @@ -663,7 +669,14 @@ static const char * const mem_lvl[] = {
>   "I/O",
>   "Uncached",
>  };
> +
> +static const char * const mem_xlvl[] = {
> + "Remote RAM (3 hops)",
> + "Remote Cache (3 hops)",
> +};
> +
>  #define NUM_MEM_LVL (sizeof(mem_lvl)/sizeof(const char *))
> +#define NUM_MEM_XLVL (sizeof(mem_xlvl)/sizeof(const char *))
>  
>  static int hist_entry__lvl_snprintf(struct hist_entry *self, char *bf,
>   size_t size, unsigned int width)
> @@ -695,6 +708,22 @@ static int hist_entry__lvl_snprintf(struct hist_entry 
> *self, char *bf,
>   strncat(out, mem_lvl[i], sz - l);
>   l += strlen(mem_lvl[i]);
>   }
> +
> + m = 0;
> + if (self->mem_info)
> + m = self->mem_info->data_src.mem_xlvl;
> +
> + for (i = 0; m && i < NUM_MEM_XLVL; i++, m >>= 1) {
> + if (!(m & 0x1))
> + continue;
> + if (l) {
> + strcat(out, " or ");
> + l += 4;
> + }
> + strncat(out, mem_xlvl[i], sz - l);
> + l += strlen(mem_xlvl[i]);
> + }
> +
>   if (*out == '\0')
>   strcpy(out, "N/A");
>   if (hit)
> -- 
> 1.7.1
> 
> ___
> Linuxppc-dev mailing list
> linuxppc-...@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
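
The comparison change quoted above (compare on mem_lvl, and only on a tie
fall back to mem_xlvl) is a two-key comparator. A minimal standalone sketch,
assuming a hypothetical struct in place of union perf_mem_data_src; the
values are widened before subtracting so that unsigned fields cannot wrap:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two bit-fields compared in the patch. */
struct demo_mem_src {
	uint32_t lvl;	/* primary key, plays the role of mem_lvl */
	uint32_t xlvl;	/* secondary key, plays the role of mem_xlvl */
};

/*
 * Compare by lvl first and consult xlvl only when lvl ties.
 * Returns >0 if r is larger than l, <0 if smaller, 0 if equal.
 */
static int64_t demo_mem_src_cmp(const struct demo_mem_src *l,
				const struct demo_mem_src *r)
{
	int64_t rc = (int64_t)r->lvl - (int64_t)l->lvl;

	if (!rc)
		rc = (int64_t)r->xlvl - (int64_t)l->xlvl;
	return rc;
}
```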
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression in RCU subsystem in latest mainline kernel

2013-06-18 Thread Paul E. McKenney

On Mon, Jun 17, 2013 at 05:42:13PM +1000, Michael Ellerman wrote:
> On Sat, Jun 15, 2013 at 12:02:21PM +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2013-06-14 at 17:06 -0400, Steven Rostedt wrote:
> > > I was pretty much able to reproduce this on my PA Semi PPC box. Funny
> > > thing is, when I type on the console, it makes progress. Anyway, it
> > > seems that powerpc has an issue with irq_work(). I'll try to get some
> > > time either tonight or next week to figure it out.
> > 
> > Does this help ?
> > 
> > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> > index 5cbcf4d..ea185e0 100644
> > --- a/arch/powerpc/kernel/irq.c
> > +++ b/arch/powerpc/kernel/irq.c
> > @@ -162,7 +162,7 @@ notrace unsigned int __check_irq_replay(void)
> >  * in case we also had a rollover while hard disabled
> >  */
> > local_paca->irq_happened &= ~PACA_IRQ_DEC;
> > -   if (decrementer_check_overflow())
> > +   if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> > return 0x900;
> >  
> > /* Finally check if an external interrupt happened */
> > 
> 
> This seems to help, but doesn't elminate the RCU stall warnings I am
> seeing. I now see them less often, but not never.
> 
> Stack trace is something like:

Hmmm...  How many CPUs are on your system?  And how much work is
perf_event_for_each_child() having to do here?

If the amount of work is large and your kernel is built with
CONFIG_PREEMPT=n, the RCU CPU stall warning would be expected behavior.
If so, we might need a preemption point in perf_event_for_each_child().

Thanx, Paul

>   INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 12, 
> t=21372 jiffies, g=18446744073709551503, c=18446744073709551502, q=1018)
>   Task dump for CPU 32:
>   power8-events   R  running task 4960  2009   1988 0x0004
>   Call Trace:
>   [c00fb0e3f910] [c00fb0e3f9d0] 0xc00fb0e3f9d0 (unreliable)
>   
>   [c00fb0e3edc0] [c00b2894] .__run_hrtimer+0xa4/0x2a0
>   [c00fb0e3ee70] [c00b36d8] .hrtimer_interrupt+0x148/0x320
>   [c00fb0e3ef80] [c001c754] .timer_interrupt+0x134/0x320
>   [c00fb0e3f040] [c000a4f4] restore_check_irq_replay+0x68/0xa8
>   --- Exception: 901 at .arch_local_irq_restore+0x24/0x90
>   LR = .__do_softirq+0x100/0x3a0
>   [c00fb0e3f330] [c00c4784] .vtime_account_irq_enter+0x34/0x70 
> (unreliable)
>   [c00fb0e3f3a0] [c0089680] .__do_softirq+0x100/0x3a0
>   [c00fb0e3f4c0] [c0089b38] .irq_exit+0xc8/0x110
>   [c00fb0e3f540] [c001c788] .timer_interrupt+0x168/0x320
>   [c00fb0e3f600] [c00025cc] decrementer_common+0x14c/0x180
>   --- Exception: 901 at .arch_local_irq_restore+0x74/0x90
>   LR = .arch_local_irq_restore+0x74/0x90
>   [c00fb0e3f8f0] [c00fb0e3f970] 0xc00fb0e3f970 (unreliable)
>   [c00fb0e3f960] [c00e4ae0] .smp_call_function_single+0x1d0/0x1e0
>   [c00fb0e3fa10] [c0147aa4] .task_function_call+0x54/0x70
>   [c00fb0e3fab0] [c0147bc4] .perf_event_enable+0x104/0x1c0
>   [c00fb0e3fb60] [c0146800] .perf_event_for_each_child+0x60/0x110
>   [c00fb0e3fbf0] [c014a528] .perf_ioctl+0x108/0x3f0
>   [c00fb0e3fca0] [c01d7138] .do_vfs_ioctl+0xb8/0x730
>   [c00fb0e3fd80] [c01d780c] .SyS_ioctl+0x5c/0xb0
>   [c00fb0e3fe30] [c0009d54] syscall_exit+0x0/0x98
> 
> 
> cheers
> 


[PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".

2013-06-18 Thread Chen Gang


Since there is a "WARN_ON(worker->task)", we cannot assume that
'worker->task' is NULL before 'current' is assigned to it.

So this assignment needs to be protected by the worker lock too, just as
'worker' is already lock-protected throughout the main loop.


Signed-off-by: Chen Gang 
---
 kernel/kthread.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..8d572b8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -511,8 +511,10 @@ int kthread_worker_fn(void *worker_ptr)
struct kthread_worker *worker = worker_ptr;
struct kthread_work *work;
 
+   spin_lock_irq(&worker->lock);
WARN_ON(worker->task);
worker->task = current;
+   spin_unlock_irq(&worker->lock);
 repeat:
set_current_state(TASK_INTERRUPTIBLE);  /* mb paired w/ kthread_stop */
 
-- 
1.7.7.6
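
The point of taking the lock around the check and the assignment is that
the two form one critical section. A userspace sketch of that pattern,
assuming a hypothetical struct and a minimal C11 spinlock in place of
struct kthread_worker and spin_lock_irq(). Unlike the patch, which warns
and carries on, this sketch refuses a second claim so the effect is visible:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kthread_worker. */
struct demo_worker {
	atomic_flag lock;	/* minimal spinlock */
	const void *task;	/* current owner, NULL while unclaimed */
};

#define DEMO_WORKER_INIT { ATOMIC_FLAG_INIT, NULL }

static void demo_worker_lock(struct demo_worker *w)
{
	while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void demo_worker_unlock(struct demo_worker *w)
{
	atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

/* Check and set the owner in one critical section; false if already owned. */
static bool demo_worker_claim(struct demo_worker *w, const void *self)
{
	bool ok;

	demo_worker_lock(w);
	ok = (w->task == NULL);
	if (ok)
		w->task = self;
	demo_worker_unlock(w);
	return ok;
}
```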

linux-next: manual merge of the block tree with the cgroup tree

2013-06-18 Thread Stephen Rothwell

Hi Jens,

Today's linux-next merge of the block tree got a conflict in
include/linux/cgroup.h between commit f63674fd0d6a ("cgroup: update
sane_behavior documentation") from the cgroup tree and commit
9138125beabb ("blk-throttle: implement proper hierarchy support") from
the block tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc include/linux/cgroup.h
index e92e647,09f1a14..000
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@@ -261,23 -269,15 +261,25 @@@ enum 
 *
 * - Remount is disallowed.
 *
 +   * - "tasks" is removed.  Everything should be at process
 +   *   granularity.  Use "cgroup.procs" instead.
 +   *
 +   * - "release_agent" and "notify_on_release" are removed.
 +   *   Replacement notification mechanism will be implemented.
 +   *
 +   * - rename(2) is disallowed.
 +   *
 +   * - cpuset: tasks will be kept in empty cpusets when hotplug happens
 +   *   and take masks of ancestors with non-empty cpus/mems, instead of
 +   *   being moved to an ancestor.
 +   *
 +   * - cpuset: a task can be moved into an empty cpuset, and again it
 +   *   takes masks of ancestors.
 +   *
 * - memcg: use_hierarchy is on by default and the cgroup file for
 *   the flag is not created.
+*
+* - blkcg: blk-throttle becomes properly hierarchical.
 -   *
 -   * The followings are planned changes.
 -   *
 -   * - release_agent will be disallowed once replacement notification
 -   *   mechanism is implemented.
 */
CGRP_ROOT_SANE_BEHAVIOR = (1 << 0),
  



Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Rusty Russell

Alex Williamson  writes:
> On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
>> On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
>> 
>> > IOMMU groups themselves don't provide security, they're accessed by
>> > interfaces like VFIO, which provide the security.  Given a brief look, I
>> > agree, this looks like a possible backdoor.  The typical VFIO way to
>> > handle this would be to pass a VFIO file descriptor here to prove that
>> > the process has access to the IOMMU group.  This is how /dev/vfio/vfio
>> > gains the ability to setup an IOMMU domain an do mappings with the
>> > SET_CONTAINER ioctl using a group fd.  Thanks,
>> 
>> How do you envision that in the kernel ? IE. I'm in KVM code, gets that
>> vfio fd, what do I do with it ?
>> 
>> Basically, KVM needs to know that the user is allowed to use that iommu
>> group. I don't think we want KVM however to call into VFIO directly
>> right ?
>
> Right, we don't want to create dependencies across modules.  I don't
> have a vision for how this should work.  This is effectively a complete
> side-band to vfio, so we're really just dealing in the iommu group
> space.  Maybe there needs to be some kind of registration of ownership
> for the group using some kind of token.  It would need to include some
> kind of notification when that ownership ends.  That might also be a
> convenient tag to toggle driver probing off for devices in the group.
> Other ideas?  Thanks,

It's actually not that bad.

eg. 

struct vfio_container *vfio_container_from_file(struct file *filp)
{
	if (filp->f_op != &vfio_device_fops)
		return ERR_PTR(-EINVAL);

	/* OK it really is a vfio fd, return the data. */

}
EXPORT_SYMBOL_GPL(vfio_container_from_file);

...

inside KVM_CREATE_SPAPR_TCE_IOMMU:

	struct file *vfio_filp;
	struct vfio_container *(*lookup)(struct file *filp);

	vfio_filp = fget(create_tce_iommu.fd);
	if (!vfio_filp)
		ret = -EBADF;
	lookup = symbol_get(vfio_container_from_file);
	if (!lookup)
		ret = -EINVAL;
	else {
		container = lookup(vfio_filp);
		if (IS_ERR(container))
			ret = PTR_ERR(container);
		else
			...
		symbol_put(vfio_container_from_file);
	}

symbol_get() won't try to load a module; it'll just fail.  This is what
you want, since they must have vfio in the kernel to get a valid fd...

Hope that helps,
Rusty.


Re: [PATCH percpu/for-3.11 1/2] percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()

2013-06-18 Thread Rusty Russell

Tejun Heo  writes:
> Two small changes.
>
> * Unlike most init functions, percpu_ref_init() allocates memory and
>   may fail.  Let's mark it with __must_check in case the caller
>   forgets.

But it's quite OK to ignore OOM errors in builtin init functions.

It would be neatest to have it fail into slow mode, of course, but it's
probably not worth the pain.

Cheers,
Rusty.

[PATCH] drivers: hv: allocate synic structures before hv_synic_init()

2013-06-18 Thread Jason Wang

We currently allocate synic structures in hv_synic_init(), but there's no way for
the driver to know about the allocation failure and it may continue to use the
uninitialized pointers. Solve this by introducing helpers for allocating and
freeing and doing the allocation before the on_each_cpu() call in
vmbus_bus_init().

Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Signed-off-by: Jason Wang 
---
 drivers/hv/hv.c           |   85 -
 drivers/hv/hyperv_vmbus.h |    4 ++
 drivers/hv/vmbus_drv.c    |    8 +++-
 3 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 3f88681..0039373 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -265,6 +265,59 @@ u16 hv_signal_event(void *con_id)
return status;
 }
 
+
+int hv_synic_alloc(void)
+{
+   size_t size = sizeof(struct tasklet_struct);
+   int cpu;
+
+   for_each_online_cpu(cpu) {
+   hv_context.event_dpc[cpu] = kmalloc(size, GFP_ATOMIC);
+   if (hv_context.event_dpc[cpu] == NULL) {
+   pr_err("Unable to allocate event dpc\n");
+   goto err;
+   }
+   tasklet_init(hv_context.event_dpc[cpu], vmbus_on_event, cpu);
+
+   hv_context.synic_message_page[cpu] =
+   (void *)get_zeroed_page(GFP_ATOMIC);
+
+   if (hv_context.synic_message_page[cpu] == NULL) {
+   pr_err("Unable to allocate SYNIC message page\n");
+   goto err;
+   }
+
+   hv_context.synic_event_page[cpu] =
+   (void *)get_zeroed_page(GFP_ATOMIC);
+
+   if (hv_context.synic_event_page[cpu] == NULL) {
+   pr_err("Unable to allocate SYNIC event page\n");
+   goto err;
+   }
+   }
+
+   return 0;
+err:
+   return -ENOMEM;
+}
+
+void hv_synic_free_cpu(int cpu)
+{
+   kfree(hv_context.event_dpc[cpu]);
+   if (hv_context.synic_event_page[cpu])
+   free_page((unsigned long)hv_context.synic_event_page[cpu]);
+   if (hv_context.synic_message_page[cpu])
+   free_page((unsigned long)hv_context.synic_message_page[cpu]);
+}
+
+void hv_synic_free(void)
+{
+   int cpu;
+
+   for_each_online_cpu(cpu)
+   hv_synic_free_cpu(cpu);
+}
+
 /*
  * hv_synic_init - Initialize the Synthethic Interrupt Controller.
  *
@@ -289,30 +342,6 @@ void hv_synic_init(void *arg)
/* Check the version */
rdmsrl(HV_X64_MSR_SVERSION, version);
 
-   hv_context.event_dpc[cpu] = kmalloc(sizeof(struct tasklet_struct),
-   GFP_ATOMIC);
-   if (hv_context.event_dpc[cpu] == NULL) {
-   pr_err("Unable to allocate event dpc\n");
-   goto cleanup;
-   }
-   tasklet_init(hv_context.event_dpc[cpu], vmbus_on_event, cpu);
-
-   hv_context.synic_message_page[cpu] =
-   (void *)get_zeroed_page(GFP_ATOMIC);
-
-   if (hv_context.synic_message_page[cpu] == NULL) {
-   pr_err("Unable to allocate SYNIC message page\n");
-   goto cleanup;
-   }
-
-   hv_context.synic_event_page[cpu] =
-   (void *)get_zeroed_page(GFP_ATOMIC);
-
-   if (hv_context.synic_event_page[cpu] == NULL) {
-   pr_err("Unable to allocate SYNIC event page\n");
-   goto cleanup;
-   }
-
/* Setup the Synic's message page */
rdmsrl(HV_X64_MSR_SIMP, simp.as_uint64);
simp.simp_enabled = 1;
@@ -355,14 +384,6 @@ void hv_synic_init(void *arg)
rdmsrl(HV_X64_MSR_VP_INDEX, vp_index);
hv_context.vp_index[cpu] = (u32)vp_index;
return;
-
-cleanup:
-   if (hv_context.synic_event_page[cpu])
-   free_page((unsigned long)hv_context.synic_event_page[cpu]);
-
-   if (hv_context.synic_message_page[cpu])
-   free_page((unsigned long)hv_context.synic_message_page[cpu]);
-   return;
 }
 
 /*
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 12f2f9e..d84918f 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -527,6 +527,10 @@ extern int hv_post_message(union hv_connection_id connection_id,
 
 extern u16 hv_signal_event(void *con_id);
 
+extern int hv_synic_alloc(void);
+
+extern void hv_synic_free(void);
+
 extern void hv_synic_init(void *irqarg);
 
 extern void hv_synic_cleanup(void *arg);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 4004e54..a2464bf 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -563,6 +563,9 @@ static int vmbus_bus_init(int irq)
 */
hv_register_vmbus_handler(irq, vmbus_isr);
 
+   ret = hv_synic_alloc();
+   if (ret)
+   goto err_alloc;
/*
 * Initialize the per-cpu interrupt state and
 * connect to the host.
@@ -570,13 +573,14 @@
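
The shape of the change (allocate everything up front in a function that
can report failure, and only then run the per-CPU init that cannot) can be
sketched in userspace. All names here are hypothetical stand-ins, calloc()
replaces kmalloc()/get_zeroed_page(), and the fault-injection counter exists
only so the failure path can be exercised. Freeing inside the allocation
function on error is a choice made for this sketch, not something taken from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_CPUS	4	/* hypothetical fixed CPU count */
#define DEMO_PAGE_SIZE	4096	/* hypothetical page size */

struct demo_ctx {
	void *event_dpc[DEMO_NR_CPUS];
	void *message_page[DEMO_NR_CPUS];
	void *event_page[DEMO_NR_CPUS];
};

/* Fault-injection knob: <0 never fails; otherwise fail once it reaches 0. */
static int demo_allocs_left = -1;

static void *demo_zalloc(size_t size)
{
	if (demo_allocs_left == 0)
		return NULL;
	if (demo_allocs_left > 0)
		demo_allocs_left--;
	return calloc(1, size);
}

/* Release every slot; free(NULL) is a no-op, so partial state is fine. */
static void demo_ctx_free(struct demo_ctx *ctx)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		free(ctx->event_dpc[cpu]);
		free(ctx->message_page[cpu]);
		free(ctx->event_page[cpu]);
		ctx->event_dpc[cpu] = NULL;
		ctx->message_page[cpu] = NULL;
		ctx->event_page[cpu] = NULL;
	}
}

/* Allocate all per-CPU buffers; on any failure undo everything. */
static int demo_ctx_alloc(struct demo_ctx *ctx)
{
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		ctx->event_dpc[cpu] = NULL;
		ctx->message_page[cpu] = NULL;
		ctx->event_page[cpu] = NULL;
	}

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		ctx->event_dpc[cpu] = demo_zalloc(64);
		ctx->message_page[cpu] = demo_zalloc(DEMO_PAGE_SIZE);
		ctx->event_page[cpu] = demo_zalloc(DEMO_PAGE_SIZE);
		if (!ctx->event_dpc[cpu] || !ctx->message_page[cpu] ||
		    !ctx->event_page[cpu]) {
			demo_ctx_free(ctx);
			return -ENOMEM;
		}
	}
	return 0;
}
```

A caller would check the return value of demo_ctx_alloc() before starting
any per-CPU setup, so the setup step never sees a NULL buffer.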

Re: [PATCH v5] cpufreq: fix governor start/stop race condition

2013-06-18 Thread Xiaoguang Chen

2013/6/19 Viresh Kumar :
> On 19 June 2013 08:43, Viresh Kumar  wrote:
>> On 19 June 2013 06:50, Xiaoguang Chen  wrote:
>>> 2013/6/19 Rafael J. Wysocki :
>>
>>>>> 2) Current governor is userspace, now cpu0 hotplugs in cpu3, it will
>>>>
>>>> Can you please tell me what the above is supposed to mean?  Is it supposed to
>>>> mean "the online of cpu3 is being run on cpu0" or something different?  If
>>>> something different, then what?

Sorry, I missed this question. Let me explain it in detail.
Suppose we are in this condition: the current cpufreq governor is the
userspace governor, and cpu3 is offline.
Two things then happen, as in the two cases above. First, an application
tries to change the current governor to the ondemand governor. Second,
cpu0 tries to bring cpu3, which was offline before, online. Both cases
will try to stop the current governor and start a governor. If the two
interleave, unexpected behavior will happen.

>>
>> Please read all the questions carefully. You missed this one.
>>
>> Actually you should write: Current governor is userspace, now cpu0 
>> hot-unplugs
>> cpu3, it will **
>
> Ahh I am mistaken, you are actually bringing cpu3 back to the system. Then you
> must have mentioned earlier that cpu3 wasn't online.
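
One way to make interleaved stop/start requests harmless is to track whether
the governor is running and to reject a start on a running governor, or a
stop on a stopped one, under a lock. The sketch below shows only that idea in
userspace. The struct, the enum, the field name governor_enabled and the
C11 spinlock are all assumptions made for illustration and are not taken from
the patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_gov_event { DEMO_GOV_START, DEMO_GOV_STOP };	/* hypothetical */

struct demo_policy {
	atomic_flag lock;	/* minimal spinlock guarding the fields below */
	bool governor_enabled;	/* true between a START and the next STOP */
	int starts;		/* how many STARTs actually ran */
	int stops;		/* how many STOPs actually ran */
};

#define DEMO_POLICY_INIT { ATOMIC_FLAG_INIT, false, 0, 0 }

/* Apply an event, or return -EBUSY if it does not fit the current state. */
static int demo_governor_event(struct demo_policy *p, enum demo_gov_event ev)
{
	int ret = 0;

	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin */

	if ((ev == DEMO_GOV_START && p->governor_enabled) ||
	    (ev == DEMO_GOV_STOP && !p->governor_enabled)) {
		ret = -EBUSY;	/* duplicate request from the other path */
	} else if (ev == DEMO_GOV_START) {
		p->governor_enabled = true;
		p->starts++;
	} else {
		p->governor_enabled = false;
		p->stops++;
	}

	atomic_flag_clear_explicit(&p->lock, memory_order_release);
	return ret;
}
```

With this guard, whichever of the two paths (governor switch or CPU online)
arrives second with the same event sees -EBUSY instead of stopping or
starting the governor twice.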

Re: [PATCH V2] USB: initialize or shutdown PHY when add or remove host controller

2013-06-18 Thread Chao Xie

On Wed, Jun 19, 2013 at 10:48 AM, Greg KH  wrote:
> On Tue, Jun 18, 2013 at 10:31:20PM -0400, Chao Xie wrote:
>> Some controller need software to initialize PHY before add
>> host controller, and shut down PHY after remove host controller.
>> Add the generic code for these controllers so they do not need
>> do it in its own host controller driver.
>
> Why?  What breaks if we add this patch, and what gets fixed?  I'm
> guessing you can then remove code?
>
> What out-of-tree code now works properly?  Or gets broken?
>
> we need more info here please...
>
The patch does not fix any bug.
Some ehci-xxx drivers need to initialize the PHY before calling
usb_add_hcd, and shut down the PHY after calling usb_remove_hcd, and I
did a patch for ehci-mv.c to do that.
Alan and Felipe commented on my patch; they think it is a piece of
generic code that can be moved to hcd, so other ehci-xxx drivers do not
need to do the same thing again.
So I added this patch to put the generic code in hcd to handle PHY
initialization and shutdown.

> thanks,
>
> greg k-h

[PATCH] Acpi: acpica: acmacros: fixed a semicolon formatting issue.

2013-06-18 Thread John B. Wyatt IV

From: "John B. Wyatt IV" 

Formatting patch: fixes all "space required after that ';'" errors in
acmacros.h.

Please note this only fixes 12 out of 64 errors as reported by
./scripts/checkpatch.pl
---
 drivers/acpi/acpica/acmacros.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/acpica/acmacros.h b/drivers/acpi/acpica/acmacros.h
index 53666bd..b916eed 100644
--- a/drivers/acpi/acpica/acmacros.h
+++ b/drivers/acpi/acpica/acmacros.h
@@ -93,15 +93,15 @@
 /* 16-bit source, 16/32/64 destination */
 
 #define ACPI_MOVE_16_TO_16(d, s){((  u8 *)(void *)(d))[0] = ((u8 
*)(void *)(s))[1];\
- ((  u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[0];}
+ ((  u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[0]; }
 
 #define ACPI_MOVE_16_TO_32(d, s){(*(u32 *)(void *)(d))=0;\
  ((u8 *)(void *)(d))[2] = ((u8 *)(void 
*)(s))[1];\
- ((u8 *)(void *)(d))[3] = ((u8 *)(void 
*)(s))[0];}
+ ((u8 *)(void *)(d))[3] = ((u8 *)(void 
*)(s))[0]; }
 
 #define ACPI_MOVE_16_TO_64(d, s){(*(u64 *)(void *)(d))=0;\
   ((u8 *)(void 
*)(d))[6] = ((u8 *)(void *)(s))[1];\
-  ((u8 *)(void 
*)(d))[7] = ((u8 *)(void *)(s))[0];}
+  ((u8 *)(void 
*)(d))[7] = ((u8 *)(void *)(s))[0]; }
 
 /* 32-bit source, 16/32/64 destination */
 
@@ -110,13 +110,13 @@
 #define ACPI_MOVE_32_TO_32(d, s){((  u8 *)(void *)(d))[0] = ((u8 
*)(void *)(s))[3];\
  ((  
u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[2];\
  ((  
u8 *)(void *)(d))[2] = ((u8 *)(void *)(s))[1];\
- ((  
u8 *)(void *)(d))[3] = ((u8 *)(void *)(s))[0];}
+ ((  
u8 *)(void *)(d))[3] = ((u8 *)(void *)(s))[0]; }
 
 #define ACPI_MOVE_32_TO_64(d, s){(*(u64 *)(void *)(d))=0;\

   ((u8 *)(void *)(d))[4] = ((u8 *)(void *)(s))[3];\

   ((u8 *)(void *)(d))[5] = ((u8 *)(void *)(s))[2];\

   ((u8 *)(void *)(d))[6] = ((u8 *)(void *)(s))[1];\
-   
   ((u8 *)(void *)(d))[7] = ((u8 *)(void *)(s))[0];}
+   
   ((u8 *)(void *)(d))[7] = ((u8 *)(void *)(s))[0]; }
 
 /* 64-bit source, 16/32/64 destination */
 
@@ -131,7 +131,7 @@

 ((  u8 *)(void *)(d))[4] = ((u8 *)(void *)(s))[3];\

 ((  u8 *)(void *)(d))[5] = ((u8 *)(void *)(s))[2];\

 ((  u8 *)(void *)(d))[6] = ((u8 *)(void *)(s))[1];\
-   
 ((  u8 *)(void *)(d))[7] = ((u8 *)(void *)(s))[0];}
+   
 ((  u8 *)(void *)(d))[7] = ((u8 *)(void *)(s))[0]; }
 #else
 /*
  * Macros for little-endian machines
@@ -169,10 +169,10 @@
 /* 16-bit source, 16/32/64 destination */
 
 #define ACPI_MOVE_16_TO_16(d, s){((  u8 *)(void *)(d))[0] = ((u8 
*)(void *)(s))[0];\
-   
 ((  u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[1];}
+   
 ((  u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[1]; }
 
-#define ACPI_MOVE_16_TO_32(d, s){(*(u32 *)(void *)(d)) = 0; 
ACPI_MOVE_16_TO_16(d, s);}
-#define ACPI_MOVE_16_TO_64(d, s){(*(u64 *)(void *)(d)) = 0; 
ACPI_MOVE_16_TO_16(d, s);}
+#define ACPI_MOVE_16_TO_32(d, s){(*(u32 *)(void *)(d)) = 0; 
ACPI_MOVE_16_TO_16(d, s); }
+#define ACPI_MOVE_16_TO_64(d, s){(*(u64 *)(void *)(d)) = 0; 
ACPI_MOVE_16_TO_16(d, s); }
 
 /* 32-bit source, 16/32/64 destination */
 
@@ -181,9 +181,9 @@
 #define ACPI_MOVE_32_TO_32(d, s){((  u8 *)(void *)(d))[0] = ((u8 
*)(void *)(s))[0];\

 ((  u8 *)(void *)(d))[1] = ((u8 *)(void *)(s))[1];\

 ((  u8 *)(void *)(d))[2] = ((u8

Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-18 Thread Alexey Kardashevskiy

On 06/16/2013 02:39 PM, Benjamin Herrenschmidt wrote:
>>  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool 
>> writing,
>> -unsigned long *pte_sizep)
>> +unsigned long *pte_sizep, bool do_get_page)
>>  {
>>  pte_t *ptep;
>>  unsigned int shift = 0;
>> @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
>> long hva, bool writing,
>>  if (!pte_present(*ptep))
>>  return __pte(0);
>>  
>> +/*
>> + * Put huge pages handling to the virtual mode.
>> + * The only exception is for TCE list pages which we
>> + * do need to call get_page() for.
>> + */
>> +if ((*pte_sizep > PAGE_SIZE) && do_get_page)
>> +return __pte(0);
>> +
>>  /* wait until _PAGE_BUSY is clear then set it atomically */
>>  __asm__ __volatile__ (
>>  "1: ldarx   %0,0,%3\n"
>> @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned 
>> long hva, bool writing,
>>  : "cc");
>>  
>>  ret = pte;
>> +if (do_get_page && pte_present(pte) && (!writing || pte_write(pte))) {
>> +struct page *pg = NULL;
>> +pg = realmode_pfn_to_page(pte_pfn(pte));
>> +if (realmode_get_page(pg)) {
>> +ret = __pte(0);
>> +} else {
>> +pte = pte_mkyoung(pte);
>> +if (writing)
>> +pte = pte_mkdirty(pte);
>> +}
>> +}
>> +*ptep = pte;/* clears _PAGE_BUSY */
>>  
>>  return ret;
>>  }
> 
> So now you are adding the clearing of _PAGE_BUSY that was missing for
> your first patch, except that this is not enough since that means that
> in the "emulated" case (ie, !do_get_page) you will in essence return
> and then use a PTE that is not locked without any synchronization to
> ensure that the underlying page doesn't go away... then you'll
> dereference that page.
> 
> So either make everything use speculative get_page, or make the emulated
> case use the MMU notifier to drop the operation in case of collision.
> 
> The former looks easier.
> 
> Also, any specific reason why you do:
> 
>   - Lock the PTE
>   - get_page()
>   - Unlock the PTE
> 
> Instead of
> 
>   - Read the PTE
>   - get_page_unless_zero
>   - re-check PTE
> 
> Like get_user_pages_fast() does ?
> 
> The former will be two atomic ops, the latter only one (faster), but
> maybe you have a good reason why that can't work...



If we want to set "dirty" and "young" bits for pte then I do not know how
to avoid _PAGE_BUSY.
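
For reference, the second sequence Ben describes (read the PTE, take a
speculative reference, re-check the PTE) can be sketched in plain C11 as
below. This is a userspace illustration only: struct page_sketch,
get_page_unless_zero_sketch() and speculative_get() are hypothetical
stand-ins, the page is passed in directly instead of being derived from the
PTE, and nothing here is real-mode safe. It also does not update the PTE, so
it does not by itself cover the dirty/young bits mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct page_sketch {
	atomic_int refcount;	/* 0 means the page is on its way to being freed */
};

/* Take a reference only if the count is not already zero. */
static bool get_page_unless_zero_sketch(struct page_sketch *pg)
{
	int old = atomic_load(&pg->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&pg->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void put_page_sketch(struct page_sketch *pg)
{
	atomic_fetch_sub(&pg->refcount, 1);
}

/*
 * Read the PTE, pin the page, then re-check the PTE.
 * Returns true with one reference held if the PTE was unchanged.
 */
static bool speculative_get(_Atomic uint64_t *ptep, struct page_sketch *pg,
			    uint64_t *pte_out)
{
	uint64_t pte = atomic_load(ptep);

	if (pte == 0)
		return false;		/* not present */
	if (!get_page_unless_zero_sketch(pg))
		return false;		/* page is being freed */
	if (atomic_load(ptep) != pte) {
		put_page_sketch(pg);	/* PTE changed under us, back off */
		return false;
	}
	*pte_out = pte;
	return true;
}
```

The only atomic read-modify-write on the success path is the reference
count update, which is the "only one atomic op" property mentioned in the
quoted text.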



-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Viresh Kumar

On 19 June 2013 07:13, Xiaoguang Chen  wrote:
> There are 4 CPUs and policy->cpu=cpu0, cpu1/2/3 are linked to cpu0.
> The normal sequence is as below:

I thought Rafael asked to write cpu0 as CPU0, ...

[PATCH] include/linux/spinlock_api_smp.h: beautify code, correct the related comments.

2013-06-18 Thread Chen Gang


Correct the related comments for '#ifdef ... #endif'.

Signed-off-by: Chen Gang 
---
 include/linux/spinlock_api_smp.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index 51df117..bdb9993 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -144,7 +144,7 @@ static inline void __raw_spin_lock(raw_spinlock_t *lock)
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
 }
 
-#endif /* CONFIG_PREEMPT */
+#endif /* !CONFIG_GENERIC_LOCKBREAK || CONFIG_DEBUG_LOCK_ALLOC */
 
 static inline void __raw_spin_unlock(raw_spinlock_t *lock)
 {
-- 
1.7.7.6

Re: [PATCH v5] cpufreq: fix governor start/stop race condition

2013-06-18 Thread Viresh Kumar

On 19 June 2013 08:43, Viresh Kumar  wrote:
> On 19 June 2013 06:50, Xiaoguang Chen  wrote:
>> 2013/6/19 Rafael J. Wysocki :
>
>>>> 2) Current governor is userspace, now cpu0 hotplugs in cpu3, it will
>>>
>>> Can you please tell me what the above is supposed to mean?  Is it supposed to
>>> mean "the online of cpu3 is being run on cpu0" or something different?  If
>>> something different, then what?
>
> Please read all the questions carefully. You missed this one.
>
> Actually you should write: Current governor is userspace, now cpu0 hot-unplugs
> cpu3, it will **

Ahh, I am mistaken; you are actually bringing cpu3 back into the system. Then
you should have mentioned earlier that cpu3 wasn't online.

Re: [PATCH v5] cpufreq: fix governor start/stop race condition

2013-06-18 Thread Viresh Kumar

On 19 June 2013 06:50, Xiaoguang Chen  wrote:
> 2013/6/19 Rafael J. Wysocki :

>>> 2) Current governor is userspace, now cpu0 hotplugs in cpu3, it will
>>
>> Can you please tell me what the above is supposed to mean?  Is it supposed to
>> mean "the online of cpu3 is being run on cpu0" or something different?  If
>> something different, then what?

Please read all the questions carefully. You missed this one.

Actually you should write: Current governor is userspace, now cpu0 hot-unplugs
cpu3, it will **


Re: [PATCH v5] cpufreq: fix governor start/stop race condition

2013-06-18 Thread Viresh Kumar

On 19 June 2013 06:50, Xiaoguang Chen  wrote:
> 2013/6/19 Rafael J. Wysocki :
>> On Thursday, June 13, 2013 05:01:58 PM Xiaoguang Chen wrote:
>>> cpufreq governor stop and start should be kept in sequence.
>>> If not, there will be unexpected behavior, for example:
>>>
>>> we have 4 cpus and policy->cpu=cpu0, cpu1/2/3 are linked to cpu0.
>>
>> Please spell cpus as "CPUs".  And please start sequences from capitals.
>
> Ok, thanks for the remind
>
>>
>> [Yes, it *really* is a problem.]

Just wanted to know the reasoning behind it so that I can remind
others about it and then argue :)

>>> the normal sequence is as below:
>>>
>>> 1) Current governor is userspace, one application tries to set
>>> governor to ondemand. it will call __cpufreq_set_policy in which it
>>> will stop userspace governor and then start ondemand governor.
>>
>> Do I think correctly that this is for all CPUs?
>
> From current code design, it is for all CPUs.

Why? This can be for a single CPU (which would eventually force all
other CPUs sharing the policy with it).

[PATCH] kernel/timer.c: using spin_lock_irqsave instead of spin_lock + local_irq_save, especially when CONFIG_LOCKDEP not defined

2013-06-18 Thread Chen Gang


When CONFIG_LOCKDEP is not defined, spin_lock_irqsave() is not equivalent
to spin_lock() + local_irq_save().

In __mod_timer(), after calling spin_lock_irqsave() on 'base->lock' in
lock_timer_base(), the code may call spin_lock() on 'new_base->lock'.

As a result, the original lock is taken through do_raw_spin_lock_flags()
with 'base->lock', but the new one is taken through LOCK_CONTENDED() with
'new_base->lock'.

In fact, both of them need to go through do_raw_spin_lock_flags(), so use
spin_lock_irqsave() instead of spin_lock() + local_irq_save().


Signed-off-by: Chen Gang 
---
 kernel/timer.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index aa8b964..2550a62 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -754,9 +754,9 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
if (likely(base->running_timer != timer)) {
/* See the comment in lock_timer_base() */
timer_set_base(timer, NULL);
-   spin_unlock(&base->lock);
+   spin_unlock_irqrestore(&base->lock, flags);
base = new_base;
-   spin_lock(&base->lock);
+   spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
}
}
-- 
1.7.7.6

Re: [Part3 PATCH v2 0/4] Support hot-remove local pagetable pages.

2013-06-18 Thread Yasuaki Ishimatsu


2013/06/19 8:59, Toshi Kani wrote:
> On Tue, 2013-06-18 at 19:05 +0200, Vasilis Liaskovitis wrote:
>> Hi,
>>
>> On Thu, Jun 13, 2013 at 09:03:52PM +0800, Tang Chen wrote:
>>> The following patch-set from Yinghai allocates pagetables to local nodes.
>>> v1: https://lkml.org/lkml/2013/3/7/642
>>> v2: https://lkml.org/lkml/2013/3/10/47
>>> v3: https://lkml.org/lkml/2013/4/4/639
>>> v4: https://lkml.org/lkml/2013/4/11/829
>>>
>>> Since pagetable pages are used by the kernel, they cannot be offlined.
>>> As a result, they cannot be hot-remove.
>>>
>>> This patch fix this problem with the following solution:
>>>
>>>   1.   Introduce a new bootmem type LOCAL_NODE_DATAL, and register local
>>>        pagetable pages as LOCAL_NODE_DATAL by setting page->lru.next to
>>>        LOCAL_NODE_DATAL, just like we register SECTION_INFO pages.
>>>
>>>   2.   Skip LOCAL_NODE_DATAL pages in offline/online procedures. When the
>>>        whole memory block they reside in is offlined, the kernel can
>>>        still access the pagetables.
>>>        (This changes the semantics of offline/online a little bit.)
>>
>> This could be a design problem of part3: if we allow local pagetable memory
>> to not be offlined but allow the offlining to return successfully, then
>> hot-remove is going to succeed. But the direct mapped pagetable pages are still
>> mapped in the kernel. The hot-removed memblocks will suddenly disappear (think
>> physical DIMMs getting disabled in real hardware, or in a VM case the
>> corresponding guest memory getting freed from the emulator e.g. qemu/kvm). The
>> system can crash as a result.
>>
>> I think these local pagetables do need to be unmapped from kernel, offlined and
>> removed somehow - otherwise hot-remove should fail. Could they be migrated
>> alternatively e.g. to node 0 memory?  But Iiuc direct mapped pages cannot be
>> migrated, correct?
>>
>> What is the original reason for local node pagetable allocation with regards
>> to memory hotplug? I assume we want to have hotplugged nodes use only their local
>> memory, so that there are no inter-node memory dependencies for hot-add/remove.
>> Are there other reasons that I am missing?
>
> I second Vasilis.  The part1/2/3 series could be much simpler & less
> riskier if we focus on the SRAT changes first, and make the local node
> pagetable changes as a separate item.  Is there particular reason why
> they have to be done at a same time?

If my understanding is correct:
Main purpose of Yinghai's work is to put pagetable on local node ram.
For this, he needs to know SRAT information before setting pagetable.
So part1 does them same time.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> -Toshi



Re: [PATCH] pci: Enable overrides for missing ACS capabilities

2013-06-18 Thread Bjorn Helgaas

On Tue, Jun 18, 2013 at 5:03 PM, Don Dutile  wrote:
> On 06/18/2013 06:22 PM, Alex Williamson wrote:
>>
>> On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:
>>>
>>> On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
>>>   wrote:

>>>> On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:
>>>>>
>>>>> On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:
>>>>> ...
>>>>> Who do you expect to decide whether to use this option?  I think it
>>>>> requires intimate knowledge of how the device works.
>>>>>
>>>>> I think the benefit of using the option is that it makes assignment of
>>>>> devices to guests more flexible, which will make it attractive to users.
>>>>> But most users have no way of knowing whether it's actually *safe* to
>>>>> use this.  So I worry that you're adding an easy way to pretend isolation
>>>>> exists when there's no good way of being confident that it actually does.

> ...

>>> I wonder if we should taint the kernel if this option is used (but not
>>> for specific devices added to pci_dev_acs_enabled[]).  It would also
>>> be nice if pci_dev_specific_acs_enabled() gave some indication in
>>> dmesg for the specific devices you're hoping to add to
>>> pci_dev_acs_enabled[].  It's not an enumeration-time quirk right now,
>>> so I'm not sure how we'd limit it to one message per device.
>>
>> Right, setup vs use and getting single prints is a lot of extra code.
>> Tainting is troublesome for support, Don had some objections when I
>> suggested the same to him.
>>
> For RH GSS (Global Support Services), a 'taint' in the kernel printk means
> RH doesn't support that system.  The 'non-support' due to 'taint' being
> printed
> out in this case may be incorrect -- RH may support that use, at least until
> a more sufficient patched kernel is provided.
> Thus my dissension that 'taint' be output.  WARN is ok. 'driver beware',
> 'unleashed dog afoot' sure...

So ...  that's really a RH-specific support issue, and easily worked
around by RH adding a patch that turns off tainting.

It still sounds like a good idea to me for upstream, where use of this
option can very possibly lead to corruption or information leakage
between devices the user claimed were isolated, but in fact were not.

Bjorn

[PATCH -next] dlm: remove duplicated include from lowcomms.c

2013-06-18 Thread Wei Yongjun

From: Wei Yongjun 

Remove duplicated include.

Signed-off-by: Wei Yongjun 
---
 fs/dlm/lowcomms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 4f539dd..d90909e 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -52,7 +52,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 


Re: [PATCH V2] USB: initialize or shutdown PHY when add or remove host controller

2013-06-18 Thread Greg KH

On Tue, Jun 18, 2013 at 10:31:20PM -0400, Chao Xie wrote:
> Some controller need software to initialize PHY before add
> host controller, and shut down PHY after remove host controller.
> Add the generic code for these controllers so they do not need
> do it in its own host controller driver.

Why?  What breaks if we add this patch, and what gets fixed?  I'm
guessing you can then remove code?

What out-of-tree code now works properly?  Or gets broken?

we need more info here please...

thanks,

greg k-h

Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-18 Thread Xiao Guangrong

On 06/18/2013 10:26 PM, Paolo Bonzini wrote:
> Il 07/06/2013 10:51, Xiao Guangrong ha scritto:
>> Changelog:
>> V3:
>>   All of these changes are from Gleb's review:
>>   1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
>>   2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
>>  to avoid kvm_memslots->generation overflow.
>>
>> V2:
>>   - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
>>   - use kvm->memslots->generation as kvm global generation-number
>>   - fix comment and codestyle
>>   - init kvm generation close to mmio wrap-around value
>>   - keep kvm_mmu_zap_mmio_sptes
>>
>> The current way is holding hot mmu-lock and walking all shadow pages, this
>> is not scale. This patchset tries to introduce a very simple and scale way
>> to fast invalidate all mmio sptes - it need not walk any shadow pages and
>> hold any locks.
>>
>> The idea is simple:
>> KVM maintains a global mmio valid generation-number which is stored in
>> kvm->memslots.generation and every mmio spte stores the current global
>> generation-number into his available bits when it is created
>>
>> When KVM need zap all mmio sptes, it just simply increase the global
>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>> then it walks the shadow page table and get the mmio spte. If the
>> generation-number on the spte does not equal the global generation-number,
>> it will go to the normal #PF handler to update the mmio spte
>>
>> Since 19 bits are used to store generation-number on mmio spte, we zap all
>> mmio sptes when the number is round
>>
>> Xiao Guangrong (6):
>>   KVM: MMU: retain more available bits on mmio spte
>>   KVM: MMU: store generation-number into mmio spte
>>   KVM: MMU: make return value of mmio page fault handler more readable
>>   KVM: MMU: fast invalidate all mmio sptes
>>   KVM: MMU: add tracepoint for check_mmio_spte
>>   KVM: MMU: init kvm generation close to mmio wrap-around value
>>
>>  arch/x86/include/asm/kvm_host.h |   2 +-
>>  arch/x86/kvm/mmu.c  | 131 
>> 
>>  arch/x86/kvm/mmu.h  |  17 ++
>>  arch/x86/kvm/mmutrace.h |  34 +--
>>  arch/x86/kvm/paging_tmpl.h  |  10 ++-
>>  arch/x86/kvm/vmx.c  |  12 ++--
>>  arch/x86/kvm/x86.c  |  11 +++-
>>  7 files changed, 177 insertions(+), 40 deletions(-)
>>
> 
> Xiao, is it time to add more comments to the code or update
> Documentation/virtual/kvm/mmu.txt?  

Yes, it is.

We have not updated mmu.txt for a long time. I will post a separate patchset
to bring it up to date with the current mmu code.

> Don't worry about the English, it is
> more than understandable and I can help with the editing.

Okay. Thank you in advance, Paolo! :)
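
As a reading aid for the generation-number idea quoted above, here is a
minimal userspace sketch in C. It is not the kvm code: struct vm_sketch,
struct mmio_spte_sketch and the helper names are made up for illustration,
and the only detail taken from the cover letter is that 19 bits of the spte
hold the generation and that a wrap-around forces a real zap.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMIO_GEN_BITS 19u
#define MMIO_GEN_MASK ((1u << MMIO_GEN_BITS) - 1u)

/* Hypothetical stand-in for the per-VM state. */
struct vm_sketch {
	uint32_t generation;	/* global mmio generation number */
	unsigned int full_zaps;	/* real zaps forced by wrap-around */
};

/* Hypothetical stand-in for an mmio spte. */
struct mmio_spte_sketch {
	uint32_t gen;		/* generation captured at creation, 19 bits */
	uint64_t gfn;
};

/* A new mmio spte records the generation that is current right now. */
static struct mmio_spte_sketch make_mmio_spte(const struct vm_sketch *vm,
					      uint64_t gfn)
{
	struct mmio_spte_sketch s = { vm->generation & MMIO_GEN_MASK, gfn };

	return s;
}

/* Checked on an MMIO #PF: a stale spte falls back to the normal handler. */
static bool mmio_spte_valid(const struct vm_sketch *vm,
			    const struct mmio_spte_sketch *s)
{
	return s->gen == (vm->generation & MMIO_GEN_MASK);
}

/* "Zap" every mmio spte by bumping the generation; no page walk needed. */
static void invalidate_all_mmio_sptes(struct vm_sketch *vm)
{
	vm->generation++;
	/* After a wrap, old sptes could look current again: zap for real. */
	if ((vm->generation & MMIO_GEN_MASK) == 0)
		vm->full_zaps++;
}
```

Invalidation is a single increment, and the cost of noticing a stale spte
is moved to the next access that touches it.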



[PATCH -next] mfd: htc-egpio: use devm_ioremap_nocache() instead of ioremap_nocache()

2013-06-18 Thread Wei Yongjun

From: Wei Yongjun 

Replace probe-time ioremap_nocache() call with devm_ioremap_nocache()
to avoid iounmap() missing and get rid of the corresponding iounmap()
call on remove.

Signed-off-by: Wei Yongjun 
---
 drivers/mfd/htc-egpio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mfd/htc-egpio.c b/drivers/mfd/htc-egpio.c
index f2e0ad4..26aca54 100644
--- a/drivers/mfd/htc-egpio.c
+++ b/drivers/mfd/htc-egpio.c
@@ -286,7 +286,8 @@ static int __init egpio_probe(struct platform_device *pdev)
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
goto fail;
-   ei->base_addr = ioremap_nocache(res->start, resource_size(res));
+   ei->base_addr = devm_ioremap_nocache(&pdev->dev, res->start,
+resource_size(res));
if (!ei->base_addr)
goto fail;
pr_debug("EGPIO phys=%08x virt=%p\n", (u32)res->start, ei->base_addr);
@@ -380,7 +381,6 @@ static int __exit egpio_remove(struct platform_device *pdev)
irq_set_chained_handler(ei->chained_irq, NULL);
device_init_wakeup(&pdev->dev, 0);
}
-   iounmap(ei->base_addr);
 
return 0;
 }
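
The reason the explicit iounmap() can go away is that a devm_* resource is
tied to the device and released when the device goes away. A rough userspace
model of that bookkeeping is sketched below; struct device_sketch,
devm_add(), devm_release_all() and count_release() are invented names for
illustration and are not the kernel devres API.

```c
#include <stdlib.h>

typedef void (*release_fn)(void *res);

/* One managed resource and the function that undoes it. */
struct devres_node {
	void *res;
	release_fn release;
	struct devres_node *next;
};

/* Hypothetical stand-in for struct device. */
struct device_sketch {
	struct devres_node *resources;	/* most recently added first */
};

/* Register a resource so it is released together with the device. */
static int devm_add(struct device_sketch *dev, void *res, release_fn release)
{
	struct devres_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->res = res;
	n->release = release;
	n->next = dev->resources;
	dev->resources = n;
	return 0;
}

/* Called on probe failure or on remove; undoes everything in reverse order. */
static int devm_release_all(struct device_sketch *dev)
{
	int released = 0;

	while (dev->resources) {
		struct devres_node *n = dev->resources;

		dev->resources = n->next;
		n->release(n->res);
		free(n);
		released++;
	}
	return released;
}

/* Example release callback: counts how often it ran (stands in for iounmap). */
static void count_release(void *res)
{
	(*(int *)res)++;
}
```

With this shape, neither the probe error path nor the remove function has
to remember the unmap, which is the point of the patch.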


[PATCH] vme: vme_tsi148.c: fix error return code in tsi148_probe()

2013-06-18 Thread Wei Yongjun

From: Wei Yongjun 

Fix to return a negative error code in the tsi148_crcsr_init() error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun 
---
 drivers/vme/bridges/vme_tsi148.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vme/bridges/vme_tsi148.c b/drivers/vme/bridges/vme_tsi148.c
index 9c1aa4d..95c9b54 100644
--- a/drivers/vme/bridges/vme_tsi148.c
+++ b/drivers/vme/bridges/vme_tsi148.c
@@ -2582,7 +2582,8 @@ static int tsi148_probe(struct pci_dev *pdev, const struct pci_device_id *id)
dev_info(&pdev->dev, "VME Write and flush and error check is %s\n",
err_chk ? "enabled" : "disabled");
 
-   if (tsi148_crcsr_init(tsi148_bridge, pdev)) {
+   retval = tsi148_crcsr_init(tsi148_bridge, pdev);
+   if (retval) {
dev_err(&pdev->dev, "CR/CSR configuration failed.\n");
goto err_crcsr;
}
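
The bug pattern fixed here is easy to reproduce outside the driver. In the
sketch below, crcsr_init_sketch(), probe_buggy() and probe_fixed() are
hypothetical names, and -ENOMEM is only an assumed error value; the patch
does not say which code tsi148_crcsr_init() returns.

```c
#include <errno.h>

/* Stand-in for tsi148_crcsr_init(); the -ENOMEM value is an assumption. */
static int crcsr_init_sketch(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Before: the call result is tested but never stored, so retval stays 0. */
static int probe_buggy(int fail)
{
	int retval = 0;

	if (crcsr_init_sketch(fail))
		goto err_crcsr;
	return 0;

err_crcsr:
	return retval;		/* reports success although init failed */
}

/* After: keep the error code so the goto path returns it. */
static int probe_fixed(int fail)
{
	int retval;

	retval = crcsr_init_sketch(fail);
	if (retval)
		goto err_crcsr;
	return 0;

err_crcsr:
	return retval;
}
```

A caller of probe_buggy() would treat the failed probe as a bound device,
which is why the return code matters even though cleanup already runs.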


[PATCH 1/1] arm: add bandgap DT entry for OMAP5

2013-06-18 Thread Eduardo Valentin

Add bandgap device DT entry for OMAP5 dtsi.

Cc: "Benoît Cousson" 
Cc: Tony Lindgren 
Cc: Russell King 
Cc: linux-o...@vger.kernel.org
Cc: devicetree-disc...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin 
Signed-off-by: J Keerthy 
---
 arch/arm/boot/dts/omap5.dtsi | 8 
 1 file changed, 8 insertions(+)
---

Benoit,

Sorry for this very late request, but can you please consider
these patches for 3.11 still?

I completely forgot to send these on my "Enable TI SoC thermal driver" series.

All best,

Eduardo

diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
index 2ad63c4..5ede6e1 100644
--- a/arch/arm/boot/dts/omap5.dtsi
+++ b/arch/arm/boot/dts/omap5.dtsi
@@ -615,5 +615,13 @@
interrupts = <0 80 0x4>;
ti,hwmods = "wd_timer2";
};
+   bandgap {
+   reg = <0x4a0021e0 0xc
+   0x4a00232c 0xc
+   0x4a002380 0x2c
+   0x4a0023C0 0x3c>;
+   interrupts = <0 126 4>; /* talert */
+   compatible = "ti,omap5430-bandgap";
+   };
};
 };
-- 
1.8.2.1.342.gfa7285d


[PATCH V2] USB: initialize or shutdown PHY when add or remove host controller

2013-06-18 Thread Chao Xie

Some controller need software to initialize PHY before add
host controller, and shut down PHY after remove host controller.
Add the generic code for these controllers so they do not need
do it in its own host controller driver.

Signed-off-by: Chao Xie 
---
 drivers/usb/core/hcd.c |   24 +++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index d53547d..301c639 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -40,9 +40,11 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
+#include 
 
 #include "usb.h"
 
@@ -2531,12 +2533,26 @@ int usb_add_hcd(struct usb_hcd *hcd,
 */
set_bit(HCD_FLAG_RH_RUNNING, &hcd->flags);
 
+   /* In case hcd->phy contains the error code. */
+   if (IS_ERR(hcd->phy))
+   hcd->phy = NULL;
+
+   /* Initialize the PHY before other hardware operation. */
+   if (hcd->phy) {
+   retval = usb_phy_init(hcd->phy);
+   if (retval) {
+   dev_err(hcd->self.controller,
+   "can't initialize phy\n");
+   goto err_hcd_driver_setup;
+   }
+   }
+
/* "reset" is misnamed; its role is now one-time init. the controller
 * should already have been reset (and boot firmware kicked off etc).
 */
if (hcd->driver->reset && (retval = hcd->driver->reset(hcd)) < 0) {
dev_err(hcd->self.controller, "can't setup\n");
-   goto err_hcd_driver_setup;
+   goto err_hcd_driver_init_phy;
}
hcd->rh_pollable = 1;
 
@@ -2608,6 +2624,9 @@ err_hcd_driver_start:
if (usb_hcd_is_primary_hcd(hcd) && hcd->irq > 0)
free_irq(irqnum, hcd);
 err_request_irq:
+err_hcd_driver_init_phy:
+   if (hcd->phy)
+   usb_phy_shutdown(hcd->phy);
 err_hcd_driver_setup:
 err_set_rh_speed:
usb_put_dev(hcd->self.root_hub);
@@ -2674,6 +2693,9 @@ void usb_remove_hcd(struct usb_hcd *hcd)
free_irq(hcd->irq, hcd);
}
 
+   if (hcd->phy)
+   usb_phy_shutdown(hcd->phy);
+
usb_put_dev(hcd->self.root_hub);
usb_deregister_bus(&hcd->self);
hcd_buffer_destroy(hcd);
-- 
1.7.4.1
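
The ordering this patch relies on (initialize the PHY first, and shut it
down again on any later failure) can be sketched in isolation as below. The
types and functions are hypothetical stand-ins, not usb_phy_init() or
usb_add_hcd() themselves; only the label ordering mirrors the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PHY and the host controller. */
struct phy_sketch {
	bool initialized;
};

struct hcd_sketch {
	struct phy_sketch *phy;	/* may be NULL when there is no PHY */
	bool reset_fails;	/* simulate a failing driver reset */
};

static int phy_init_sketch(struct phy_sketch *phy)
{
	phy->initialized = true;
	return 0;
}

static void phy_shutdown_sketch(struct phy_sketch *phy)
{
	phy->initialized = false;
}

static int add_hcd_sketch(struct hcd_sketch *hcd)
{
	int retval;

	/* Initialize the PHY before any other hardware operation. */
	if (hcd->phy) {
		retval = phy_init_sketch(hcd->phy);
		if (retval)
			goto err_setup;
	}

	/* Any failure after this point must undo the PHY init. */
	if (hcd->reset_fails) {
		retval = -1;
		goto err_init_phy;
	}
	return 0;

err_init_phy:
	if (hcd->phy)
		phy_shutdown_sketch(hcd->phy);
err_setup:
	return retval;
}
```

A failed PHY init skips the shutdown label, while a failed reset falls
through it, so the PHY is never left powered after an unsuccessful add.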


Tux3 Report: Meet Shardmap, the designated successor of HTree

2013-06-18 Thread Daniel Phillips

Greetings all,

From time to time, one may be fortunate enough to be blessed with a 
discovery in computer science that succeeds at improving all four of 
performance, scalability, reliability and simplicity. Of these normally 
conflicting goals, simplicity is usually the most elusive. It is 
therefore with considerable satisfaction that I present the results of 
our recent development work in directory indexing technology, which 
addresses some long-standing and vexing scalability problems exhibited 
by HTree, my previous contribution to the art of directory indexing. 
This new approach, Shardmap, will not only enhance Tux3 scalability, but 
provides an upgrade path for Ext4 and Lustre as well. Shardmap is also 
likely to be interesting for high performance database design. Best of 
all, Shardmap is considerably simpler than the technology we expect it 
to replace.

The most interesting thing about Shardmap is that it remained 
undiscovered for so long. I expect that you will agree that this is 
particularly impressive, considering how obvious Shardmap is in 
retrospect. I can only speculate that the reason for not seeing this 
obvious solution is that we never asked the right question. The question 
should have been: how do we fix this write multiplication issue? Instead 
we spent ten years asking: what should we do about this cache 
thrashing? It turns out that an answer to the former is also an answer 
to the latter.

Now let us proceed without further ado to a brief tour of Shardmap, 
starting with the technology we expect it to replace.

The Problem with HTree

Occasionally we see LKML reports of performance issues in HTree at high 
scale, usually from people running scalability benchmarks. Lustre users 
have encountered these issues in real life. I always tended to shy away 
from those discussions because, frankly, I did not see any satisfactory 
answer, other than that HTree works perfectly well at the scale it was 
designed for and at which it is normally used. Recently I did learn the 
right answer: HTree is unfixable, and this is true of any media backed 
B-Tree index. Let me reiterate: contrary to popular opinion, a media 
backed B-Tree is an abysmally poor choice of information structure for 
any randomly updated indexing load.

But how can this be, doesn't everybody use B-Trees in just this way? 
Yes, and everybody is making a big mistake. Let me explain. The big 
issue is write multiplication. Any index that groups entries together in 
blocks will tend to have nearly every block dirty under a random update 
load. How do we transfer all those dirty blocks to cache incrementally, 
efficiently and atomically? We don't, it just cannot be done. In 
practice, we end up writing out most index blocks multiple times due to 
just a few small changes. For example, at the end of a mass update 
create we may find that each block has been written hundreds of times. 
Media transfer latency therefore dominates the operation.
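
To make the write multiplication effect concrete, here is a toy model in C.
It is a sketch under simple assumptions: updates land on uniformly random
blocks, every dirty block is written out at each commit, and the generator
is a small fixed-seed LCG so that runs are repeatable. The function names
are invented for illustration and nothing here is Tux3 or HTree code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Small deterministic generator so that runs are repeatable. */
static uint32_t next_rand(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 8;
}

/*
 * Apply 'updates' single-entry updates to random blocks of an index with
 * 'nblocks' blocks, writing out every dirty block after each batch of
 * 'batch' updates. Returns the total number of block writes.
 */
static unsigned long count_block_writes(unsigned int nblocks,
					unsigned long updates,
					unsigned long batch)
{
	unsigned char *dirty;
	unsigned long writes = 0, i;
	uint32_t seed = 1;
	unsigned int b;

	if (nblocks == 0 || batch == 0)
		return 0;
	dirty = calloc(nblocks, 1);
	if (!dirty)
		return 0;

	for (i = 1; i <= updates; i++) {
		dirty[next_rand(&seed) % nblocks] = 1;
		if (i % batch == 0 || i == updates) {
			/* Commit: every dirty block goes to media. */
			for (b = 0; b < nblocks; b++)
				writes += dirty[b];
			memset(dirty, 0, nblocks);
		}
	}
	free(dirty);
	return writes;
}
```

Comparing count_block_writes(100, 1000, 100) with the 100 blocks the index
actually has gives a feel for how many times each block is rewritten when
commits are frequent relative to the update rate.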

This obvious issue somehow escaped our attention over the entire time 
HTree has been in service. We have occasionally misattributed degraded 
HTree performance to inode table thrashing. To be sure, thrashing at 
high scale is a known problem with HTree, but it is not the biggest 
problem. That would be write multiplication. To fix this, we need to 
step back and adopt a completely different approach.

Dawning of the Light

I am kind of whacking myself on the forehead about this. For an entire 
decade I thought that HTree could be fixed by incremental improvements 
and consequently devoted considerable energy to that effort, the high 
water mark of which was my PHTree post earlier this year:

http://phunq.net/pipermail/tux3/2013-January/26.html

The PHTree design is a respectable if uninspired piece of work that 
fixes all the known issues with HTree except for write multiplication, 
which I expected to be pretty easy. Far from it. The issue is 
fundamental to the nature of B-Trees. Though not hitherto recognized in 
the Linux file system community, academics recognized this issue some 
time ago and have been busy hunting for a solution. During one of our 
sushi meetings in the wilds of Mountain View, Kent Overstreet of BCache 
fame pointed me at this work:

http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/

Such attempts generally fail to get anywhere close to the efficiency 
levels we have become accustomed to with Ext4 and its ilk. But it got me 
thinking along productive lines. (Thank you Kent!) One day the answer 
just hit me like a slow rolling thunderbolt: instead of committing the 
actual B-Tree to disk we should leave it dirty in cache and just log the 
updates to it. This is obviously write-efficient and ACID friendly. It 
is also a poor solution because it sacrifices recovery latency. In the 
event of a crash we need to read the entire log to reconstruct the dirty 
B-Tree, which could take several minutes. During this time, even though 
the raw

Re: [PATCH v2 1/2] pci: Fix flaw in pci_acs_enabled()

2013-06-18 Thread Bjorn Helgaas

On Tue, Jun 18, 2013 at 4:47 PM, Alex Williamson
 wrote:
> On Tue, 2013-06-18 at 16:10 -0600, Bjorn Helgaas wrote:
>> On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
>>  wrote:
>> > On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
>> >> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> ...
>> >> >   * pci_acs_enabled - test ACS against required flags for a given device
>> >> >   * @pdev: device to test
>> >> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
>> >> >   */
>> >> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>> >>
>> >> I know you didn't change the *name* of this function, but I think it
>> >> would be easier to follow if you did change the name to something more
>> >> descriptive, e.g., something to do with the actual property you're
>> >> interested in, which has to do with routing peer-to-peer DMA.
>> >>
>> >> That property makes sense even for the excluded devices, while the
>> >> idea of an ACS capability that doesn't even exist is implicitly
>> >> enabled, really doesn't.
>> >
>> > I think we also don't want to put the complexity at the caller for
>> > understanding what capabilities are applicable to a given device.  It's
>> > also convenient to use the set of ACS flags.  Given that, the current
>> > naming came about.  It's a little awkward, but it's easy to use.
>> > Suggestions for a better name?
>>
>> 100% agreed the caller shouldn't have to worry about different device
>> types.  I was thinking something like "pci_enforces_peer_isolation()"
>> or "pci_peer_dma_routed_upstream()".  Or maybe it should be
>> "pci_dev_...()".
>
> Ok, I'll play with those.  I'm worried that there are nuances to each
> flag bit that don't all fit under such a broad description.

It's true that they might be overly broad.  On the other hand, the set
of flags we look for is always the same: PCI_ACS_SV | PCI_ACS_RR |
PCI_ACS_CR | PCI_ACS_UF, so what's the point in making a completely
general-purpose interface?  I'm not sure it's even worth passing the
flags around if the code would be clearer without that.
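
A sketch of what a fixed-mask helper could look like, with the flag set
baked in instead of passed by every caller. The bit values are the ACS
control bits as I recall them from pci_regs.h; struct acs_dev_sketch and
dev_peer_dma_routed_upstream() are hypothetical, and the device-type rules
discussed elsewhere in this thread are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS control bits (values as recalled from pci_regs.h). */
#define PCI_ACS_SV 0x01	/* Source Validation */
#define PCI_ACS_RR 0x04	/* P2P Request Redirect */
#define PCI_ACS_CR 0x08	/* P2P Completion Redirect */
#define PCI_ACS_UF 0x10	/* Upstream Forwarding */

/* The one set of flags every caller asks for today. */
#define ISOLATION_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)

/* Hypothetical stand-in for the bits of struct pci_dev used here. */
struct acs_dev_sketch {
	bool has_acs_cap;	/* device exposes an ACS capability */
	uint16_t acs_ctrl;	/* contents of its ACS control register */
};

/* True only if all isolation-relevant ACS controls are enabled. */
static bool dev_peer_dma_routed_upstream(const struct acs_dev_sketch *dev)
{
	if (!dev->has_acs_cap)
		return false;
	return (dev->acs_ctrl & ISOLATION_FLAGS) == ISOLATION_FLAGS;
}
```

The caller then asks one yes/no question about isolation and no longer
needs to know which ACS bits make up the answer.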

> Do you want
> to gate this series on a rename of an existing function?

You put me in a bit of a tight spot :)  My #1 concern is correctness
and maintainability.  Naming things so they're consistent with other
code and make sense to other readers is a huge part of that.
Unfortunately I don't have time to do any work myself, and my only
tool is to apply patches or not.  But no, I don't want to gate a
simple bug fix on other "cleanup" rework.

> ...
>> >> Maybe something like (pidgin C):
>> >>
>> >> if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
>> >>   return pci_acs_flags_enabled(pdev, acs_flags);
>> >>
>> >> if (!pdev->multifunction)
>> >>   return true;
>> >>
>> >> acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
>> >> return pci_acs_flags_enabled(pdev, acs_flags);
>> >
>> > ...  Note that
>> > the above simplification incorrectly handles multifunction bridges or EC
>> > devices.
>>
>> Hmm...  What *is* the correct behavior for a bridge?  You return
>> "true," i.e., you're saying that a PCIe-to-PCI bridge will always
>> route peer-to-peer transactions from PCI devices upstream to its PCIe
>> link.  But that seems wrong: a PCI DMA transaction can target a peer
>> on the same PCI bus, and it's not even possible for the bridge to
>> validate the transaction or forward it upstream.
>>
>> I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
>> Function" statement in 6.12.1 just means "it's impossible for ACS to
>> isolate the devices below the bridge from each other, so it would be
>> misleading to implement the capability."
>
> Note that we never consider ACS to be enabled for a conventional PCI
> device.  I suppose we could have cases where it's the only device on a
> bus, but for the most part, it's not worth the trouble (it may be the
> only device now, then a hotplug occurs).  So really saying the bridge
> does or doesn't support ACS doesn't matter to the devices behind it.

> What does matter is the fan-out of that isolation group of the
> conventional devices beyond the bridge.  If the spec is indicating that
> a bridge cannot do peer-to-peer with other devices, then all of the
> conventional devices behind it are in an isolation domain so long as the
> path between the bridge and the RC supports ACS.  If the bridge can do
> peer-to-peer and it is a multifunction device, then the isolation domain
> grows to include the other functions and subordinates of the other
> functions.  I took the assumption that a bridge probably needs to
> forward transactions upstream.  Do you have an alternate opinion?

I think you're talking about a multi-function device with several
PCIe-to-PCI bridges (e.g., Option A of Figure 1-4, p. 29, of the PCIe
bridge spec 1.0), and the question is whether the bridge can forward a
transaction between bus X and bus Y without forwarding it upstream.

I don't see any mention of

[PATCH -next v2] Staging: netlogic: fix missing free_netdev() on error in xlr_net_probe()

2013-06-18 Thread Wei Yongjun

Add the missing free_netdev() call before returning from xlr_net_probe()
when devm_ioremap_resource() fails.
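
The error-path pattern used by the fix (record the error code, then jump
to a label that releases what was allocated) can be sketched in userspace
as follows. Everything here is a stand-in: fake_netdev, fake_probe() and
the allocation counter are hypothetical and only mimic the shape of
xlr_net_probe(), with map_ok simulating whether the ioremap succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_netdev {
	void *regs;
};

/* Number of live allocations, so a leak on the error path is visible. */
int live_allocs;

static struct fake_netdev *fake_alloc_netdev(void)
{
	struct fake_netdev *ndev = calloc(1, sizeof(*ndev));

	if (ndev)
		live_allocs++;
	return ndev;
}

void fake_free_netdev(struct fake_netdev *ndev)
{
	if (ndev) {
		live_allocs--;
		free(ndev);
	}
}

/* Hypothetical probe: map_ok == 0 simulates the mapping step failing. */
int fake_probe(int map_ok, struct fake_netdev **out)
{
	struct fake_netdev *ndev;
	int err;

	ndev = fake_alloc_netdev();
	if (!ndev)
		return -ENOMEM;

	if (!map_ok) {
		err = -EIO;
		goto err_free; /* do not return directly: ndev would leak */
	}

	ndev->regs = ndev; /* placeholder for the mapped address */
	*out = ndev;
	return 0;

err_free:
	fake_free_netdev(ndev);
	return err;
}
```

Returning straight from the failing branch, as the old code did, would
leave live_allocs at 1 after a failed probe.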

Signed-off-by: Wei Yongjun 
---
v1 -> v2: remove redundant error message.
---
 drivers/staging/netlogic/xlr_net.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/netlogic/xlr_net.c 
b/drivers/staging/netlogic/xlr_net.c
index b529d79..af9e3f1 100644
--- a/drivers/staging/netlogic/xlr_net.c
+++ b/drivers/staging/netlogic/xlr_net.c
@@ -1023,9 +1023,8 @@ static int xlr_net_probe(struct platform_device *pdev)
ndev->base_addr = (unsigned long) devm_ioremap_resource
(&pdev->dev, res);
if (IS_ERR_VALUE(ndev->base_addr)) {
-   dev_err(&pdev->dev,
-   "devm_ioremap_resource failed\n");
-   return ndev->base_addr;
+   err = ndev->base_addr;
+   goto err_gmac;
}
 
res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv4 1/6] thermal: ti-soc-thermal: use standard GPIO DT bindings

2013-06-18 Thread Eduardo Valentin

This change updates the ti-soc-thermal driver to use the
standard GPIO DT bindings to read the GPIO number associated
with the thermal shutdown IRQ, if the device features one.

Previously, the code used a TI-specific DT binding.
Now that OMAP supports the standard way to model GPIOs,
there is no point in having a TI-specific binding.

Cc: Zhang Rui 
Cc: Grant Likely 
Cc: Rob Herring 
Cc: linux...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-disc...@lists.ozlabs.org
Signed-off-by: Eduardo Valentin 
---
 Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt | 9 +
 drivers/thermal/ti-soc-thermal/ti-bandgap.c  | 8 ++--
 2 files changed, 7 insertions(+), 10 deletions(-)
---
Rui,

This is the rebased version of patch 01. It has been rebased on
your thermal/next. Please send it for 3.11. All remaining patches
on this series have been applied to respective for_3.11 branches
on corresponding trees.

diff --git a/Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt 
b/Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt
index 1953b33..0c9222d 100644
--- a/Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt
@@ -17,8 +17,9 @@ Required properties:
 - interrupts : this entry should indicate which interrupt line
 the talert signal is routed to;
 Specific:
-- ti,tshut-gpio : this entry should be used to inform which GPIO
-line the tshut signal is routed to;
+- gpios : this entry should be used to inform which GPIO
+line the tshut signal is routed to. The informed GPIO will
+be treated as an IRQ;
 - regs : this entry must also be specified and it is specific
 to each bandgap version, because the mapping may change from
 soc to soc, apart of depending on available features.
@@ -37,7 +38,7 @@ bandgap {
0x4a002378 0x18>;
compatible = "ti,omap4460-bandgap";
interrupts = <0 126 4>; /* talert */
-   ti,tshut-gpio = <86>;
+   gpios = < 22 0>; /* tshut */
 };
 
 OMAP4470:
@@ -47,7 +48,7 @@ bandgap {
0x4a002378 0x18>;
compatible = "ti,omap4470-bandgap";
interrupts = <0 126 4>; /* talert */
-   ti,tshut-gpio = <86>;
+   gpios = < 22 0>; /* tshut */
 };
 
 OMAP5430:
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c 
b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
index 7c0b3eb..9dfd471 100644
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ti-bandgap.h"
@@ -1129,7 +1130,6 @@ static struct ti_bandgap *ti_bandgap_build(struct 
platform_device *pdev)
const struct of_device_id *of_id;
struct ti_bandgap *bgp;
struct resource *res;
-   u32 prop;
int i;
 
/* just for the sake */
@@ -1173,11 +1173,7 @@ static struct ti_bandgap *ti_bandgap_build(struct 
platform_device *pdev)
} while (res);
 
if (TI_BANDGAP_HAS(bgp, TSHUT)) {
-   if (of_property_read_u32(node, "ti,tshut-gpio", &prop) < 0) {
-   dev_err(&pdev->dev, "missing tshut gpio in device tree\n");
-   return ERR_PTR(-EINVAL);
-   }
-   bgp->tshut_gpio = prop;
+   bgp->tshut_gpio = of_get_gpio(node, 0);
if (!gpio_is_valid(bgp->tshut_gpio)) {
dev_err(&pdev->dev, "invalid gpio for tshut (%d)\n",
bgp->tshut_gpio);
-- 
1.8.2.1.342.gfa7285d


[PATCH 2/3] drivers: gpio: msm: Fix the error condition for reading ngpio

2013-06-18 Thread Rohit Vaswani

of_property_read_u32() returns 0 on success. The check negated the
return value with '!', so it reported an error when the read succeeded.
Fix the if condition.
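
A minimal userspace sketch of the return convention, for illustration
only: mock_read_u32() is a hypothetical stand-in for
of_property_read_u32() that returns 0 on success and a negative errno
otherwise, and the value 152 is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for of_property_read_u32(): 0 on success, negative on error. */
int mock_read_u32(const char *name, uint32_t *out)
{
	if (strcmp(name, "ngpio") != 0)
		return -EINVAL; /* property not present */
	*out = 152; /* arbitrary example value */
	return 0;
}

/* Hypothetical probe fragment using the corrected check. */
int mock_probe(const char *prop, uint32_t *ngpio)
{
	/* A non-zero return means failure, so there is no '!' here. */
	if (mock_read_u32(prop, ngpio))
		return -EINVAL;
	return 0;
}
```

With the original '!' in place, mock_probe("ngpio", ...) would fail and a
missing property would be treated as success.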

Signed-off-by: Rohit Vaswani 
---
 drivers/gpio/gpio-msm-v2.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpio/gpio-msm-v2.c b/drivers/gpio/gpio-msm-v2.c
index f4491a4..c2fa770 100644
--- a/drivers/gpio/gpio-msm-v2.c
+++ b/drivers/gpio/gpio-msm-v2.c
@@ -378,7 +378,7 @@ static int msm_gpio_probe(struct platform_device *pdev)
int ret, ngpio;
struct resource *res;
 
-   if (!of_property_read_u32(pdev->dev.of_node, "ngpio", &ngpio)) {
+   if (of_property_read_u32(pdev->dev.of_node, "ngpio", &ngpio)) {
dev_err(&pdev->dev, "%s: ngpio property missing\n", __func__);
return -EINVAL;
}
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


[PATCH 3/3] ARM: msm: Consolidate gpiomux for older architectures

2013-06-18 Thread Rohit Vaswani

Msm gpiomux can be used only for 7x30 and 8x50.
Prevent compilation and fix build issues on 7X00, 8X60 and 8960.

Signed-off-by: Rohit Vaswani 
---
 arch/arm/mach-msm/Kconfig  |3 +--
 arch/arm/mach-msm/gpiomux-v1.c |   33 -
 arch/arm/mach-msm/gpiomux.h|   10 --
 3 files changed, 1 insertions(+), 45 deletions(-)
 delete mode 100644 arch/arm/mach-msm/gpiomux-v1.c

diff --git a/arch/arm/mach-msm/Kconfig b/arch/arm/mach-msm/Kconfig
index 614e41e..905efc8 100644
--- a/arch/arm/mach-msm/Kconfig
+++ b/arch/arm/mach-msm/Kconfig
@@ -121,8 +121,7 @@ config MSM_SMD
bool
 
 config MSM_GPIOMUX
-   depends on !(ARCH_MSM8X60 || ARCH_MSM8960)
-   bool "MSM V1 TLMM GPIOMUX architecture"
+   bool
help
  Support for MSM V1 TLMM GPIOMUX architecture.
 
diff --git a/arch/arm/mach-msm/gpiomux-v1.c b/arch/arm/mach-msm/gpiomux-v1.c
deleted file mode 100644
index 27de2ab..000
--- a/arch/arm/mach-msm/gpiomux-v1.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
- * 02110-1301, USA.
- */
-#include 
-#include "gpiomux.h"
-#include "proc_comm.h"
-
-void __msm_gpiomux_write(unsigned gpio, gpiomux_config_t val)
-{
-   unsigned tlmm_config  = (val & ~GPIOMUX_CTL_MASK) |
-   ((gpio & 0x3ff) << 4);
-   unsigned tlmm_disable = 0;
-   int rc;
-
-   rc = msm_proc_comm(PCOM_RPC_GPIO_TLMM_CONFIG_EX,
-  &tlmm_config, &tlmm_disable);
-   if (rc)
-   pr_err("%s: unexpected proc_comm failure %d: %08x %08x\n",
-  __func__, rc, tlmm_config, tlmm_disable);
-}
diff --git a/arch/arm/mach-msm/gpiomux.h b/arch/arm/mach-msm/gpiomux.h
index 8e82f41..4410d77 100644
--- a/arch/arm/mach-msm/gpiomux.h
+++ b/arch/arm/mach-msm/gpiomux.h
@@ -73,16 +73,6 @@ extern struct msm_gpiomux_config 
msm_gpiomux_configs[GPIOMUX_NGPIOS];
 int msm_gpiomux_write(unsigned gpio,
  gpiomux_config_t active,
  gpiomux_config_t suspended);
-
-/* Architecture-internal function for use by the framework only.
- * This function can assume the following:
- * - the gpio value has passed a bounds-check
- * - the gpiomux spinlock has been obtained
- *
- * This function is not for public consumption.  External users
- * should use msm_gpiomux_write.
- */
-void __msm_gpiomux_write(unsigned gpio, gpiomux_config_t val);
 #else
 static inline int msm_gpiomux_write(unsigned gpio,
gpiomux_config_t active,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


[PATCH 02/22] Audit: remove duplicate comments

2013-06-18 Thread Gao feng

The comment above audit_log_start() duplicates the kernel-doc comment
that follows it. Remove it.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index ad3084c..843e7a2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1067,13 +1067,6 @@ static void wait_for_auditd(unsigned long sleep_time)
remove_wait_queue(&audit_backlog_wait, &wait);
 }
 
-/* Obtain an audit buffer.  This routine does locking to obtain the
- * audit buffer, but then no locking is required for calls to
- * audit_log_*format.  If the tsk is a task that is currently in a
- * syscall, then the syscall is marked as auditable and an audit record
- * will be written at syscall exit.  If there is no associated task, tsk
- * should be NULL. */
-
 /**
  * audit_log_start - obtain an audit buffer
  * @ctx: audit_context (may be NULL)
-- 
1.8.1.4


[PATCH 14/22] Audit: only allow init user namespace to change audit_failure

2013-06-18 Thread Gao feng

Setting audit_failure to AUDIT_FAIL_PANIC may
cause a system panic.

We should disallow non-init user namespaces from changing it.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 306231d..79a8b8e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -327,6 +327,9 @@ static int audit_set_failure(int state)
&& state != AUDIT_FAIL_PANIC)
return -EINVAL;
 
+   if (current_user_ns() != &init_user_ns)
+   return -EPERM;
+
return audit_do_config_change("audit_failure", &audit_failure, state);
 }
 
-- 
1.8.1.4


[PATCH 15/22] Audit: only allow init user namespace to change backlog_limit

2013-06-18 Thread Gao feng

Prevent non-init user namespaces from generating lots of skbs.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 79a8b8e..297ac6e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -303,6 +303,9 @@ static int audit_set_rate_limit(int limit)
 
 static int audit_set_backlog_limit(int limit)
 {
+   if (current_user_ns() != &init_user_ns)
+   return -EPERM;
+
return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
 }
 
-- 
1.8.1.4


[PATCH 08/22] Audit: make kauditd_task per user namespace

2013-06-18 Thread Gao feng

This patch makes kauditd_task per user namespace.
Since right now we only allow users in the init user
namespace to send audit netlink messages to the kernel,
a kauditd_task that belongs to any other user namespace
will still not run.

Signed-off-by: Gao feng 
---
 include/linux/audit.h  |  1 +
 include/linux/user_namespace.h | 15 +--
 kernel/audit.c | 58 ++
 kernel/audit.h |  5 ++--
 kernel/auditsc.c   |  6 ++---
 5 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 6720901..179351d 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct audit_sig_info {
uid_t   uid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 53420a4..ae69f20 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,8 +21,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
 #ifdef CONFIG_AUDIT
 struct audit_ctrl {
struct sock *sock;
+   int pid;
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
+   struct task_struct  *kauditd_task;
 };
 #endif
 
@@ -59,8 +61,17 @@ extern void free_user_ns(struct user_namespace *ns);
 
 static inline void put_user_ns(struct user_namespace *ns)
 {
-   if (ns && atomic_dec_and_test(&ns->count))
-   free_user_ns(ns);
+   if (ns) {
+   if (atomic_dec_and_test(&ns->count)) {
+   free_user_ns(ns);
+   } else if (atomic_read(&ns->count) == 1) {
+   /* If the last user of this userns is kauditd,
+* we should wake up the kauditd and let it kill
+* itself, Then this userns will be destroyed.*/
+   if (ns->audit.kauditd_task)
+   wake_up_process(ns->audit.kauditd_task);
+   }
+   }
 }
 
 struct seq_operations;
diff --git a/kernel/audit.c b/kernel/audit.c
index 75325f0..7b696cd5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -94,7 +94,6 @@ static intaudit_failure = AUDIT_FAIL_PRINTK;
  * contains the pid of the auditd process and audit_nlk_portid contains
  * the portid to use to send netlink messages to that process.
  */
-intaudit_pid;
 static int audit_nlk_portid;
 
 /* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -131,7 +130,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static struct task_struct *kauditd_task;
 static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
 
@@ -184,7 +182,7 @@ void audit_panic(const char *message)
break;
case AUDIT_FAIL_PANIC:
/* test audit_pid since printk is always losey, why bother? */
-   if (audit_pid)
+   if (_user_ns.audit.pid)
panic("audit: %s\n", message);
break;
}
@@ -386,9 +384,10 @@ static void kauditd_send_skb(struct sk_buff *skb)
  audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
-   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", 
audit_pid);
+   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
+  init_user_ns.audit.pid);
audit_log_lost("auditd disappeared\n");
-   audit_pid = 0;
+   init_user_ns.audit.pid = 0;
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
} else
@@ -411,19 +410,19 @@ static void kauditd_send_skb(struct sk_buff *skb)
  * in 5 years when I want to play with this again I'll see this
  * note and still have no friggin idea what i'm thinking today.
  */
-static void flush_hold_queue(void)
+static void flush_hold_queue(struct user_namespace *ns)
 {
struct sk_buff *skb;
-   struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+   struct sk_buff_head *hold_queue = &ns->audit.hold_queue;
 
-   if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
+   if (!audit_default || !ns->audit.pid || !ns->audit.sock)
return;
 
skb = skb_dequeue(hold_queue);
if (likely(!skb))
return;
 
-   while (skb && audit_pid) {
+   while (skb && ns->audit.pid) {
kauditd_send_skb(skb);
skb = skb_dequeue(hold_queue);
}
@@ -438,18 +437,26 @@ static void flush_hold_queue(void)
 
 static int kauditd_thread(void *dummy)
 {
+   struct user_namespace *ns = dummy;
+
set_freezable();
while

[PATCH 06/22] Audit: make audit_skb_queue per user namespace

2013-06-18 Thread Gao feng

After this patch, every user namespace has its own
audit_skb_queue. Since we haven't finished the preparations,
only allow users to operate on the skb queue of the init user
namespace.

Signed-off-by: Gao feng 
---
 include/linux/audit.h  |  4 
 include/linux/user_namespace.h |  2 ++
 kernel/audit.c | 34 +-
 kernel/user_namespace.c|  1 +
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 85f9d7f..6720901 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -439,6 +439,7 @@ extern int audit_log_task_context(struct audit_buffer *ab);
 extern void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk);
 
+extern void audit_set_user_ns(struct user_namespace *ns);
 extern void audit_free_user_ns(struct user_namespace *ns);
 
 extern int audit_update_lsm_rules(void);
@@ -495,6 +496,9 @@ static inline void audit_log_task_info(struct audit_buffer 
*ab,
   struct task_struct *tsk)
 { }
 
+static inline void audit_set_user_ns(struct user_namespace *ns)
+{ }
+
 static inline void audit_free_user_ns(struct user_namespace *ns)
 { }
 #define audit_enabled 0
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8797421..e322f20 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define UID_GID_MAP_MAX_EXTENTS 5
 
@@ -20,6 +21,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 #ifdef CONFIG_AUDIT
 struct audit_ctrl {
struct sock *sock;
+   struct sk_buff_head queue;
 };
 #endif
 
diff --git a/kernel/audit.c b/kernel/audit.c
index a411b02..e2f6366 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -131,7 +131,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static struct sk_buff_head audit_skb_queue;
 /* queue of skbs to send to auditd when/if it comes back */
 static struct sk_buff_head audit_skb_hold_queue;
 static struct task_struct *kauditd_task;
@@ -441,11 +440,12 @@ static int kauditd_thread(void *dummy)
set_freezable();
while (!kthread_should_stop()) {
struct sk_buff *skb;
+   struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);
 
flush_hold_queue();
 
-   skb = skb_dequeue(&audit_skb_queue);
+   skb = skb_dequeue(queue);
wake_up(&audit_backlog_wait);
if (skb) {
if (audit_pid && init_user_ns.audit.sock)
@@ -457,7 +457,7 @@ static int kauditd_thread(void *dummy)
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&kauditd_wait, &wait);
 
-   if (!skb_queue_len(&audit_skb_queue)) {
+   if (!skb_queue_len(queue)) {
try_to_freeze();
schedule();
}
@@ -648,11 +648,13 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
struct audit_sig_info   *sig_data;
char*ctx = NULL;
u32 len;
+   struct user_namespace   *ns;
 
err = audit_netlink_ok(skb, msg_type);
if (err)
return err;
 
+   ns = current_user_ns();
/* As soon as there's any sign of userspace auditd,
 * start kauditd to talk to it */
if (!kauditd_task) {
@@ -674,7 +676,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
status_set.rate_limit= audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost  = atomic_read(&audit_lost);
-   status_set.backlog   = skb_queue_len(&audit_skb_queue);
+   status_set.backlog   =
+   skb_queue_len(&ns->audit.queue);
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
 &status_set, sizeof(status_set));
break;
@@ -952,7 +955,7 @@ static int __init audit_init(void)
if (register_pernet_subsys(&audit_net_ops) < 0)
return -1;
 
-   skb_queue_head_init(&audit_skb_queue);
+   audit_set_user_ns(&init_user_ns);
skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
@@ -1102,12 +1105,13 @@ static inline void audit_get_stamp(struct audit_context 
*ctx,
  */
 static void wait_for_auditd(unsigned long sleep_time)
 {
+   const struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&audit_backlog_wait, &wait);
 
if

[PATCH 1/3] ARM: msm: dts: Fix the gpio register address for msm8960

2013-06-18 Thread Rohit Vaswani

Fix the gpio reg address for the device tree entry.

Signed-off-by: Rohit Vaswani 
---
 arch/arm/boot/dts/msm8960-cdp.dts |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/msm8960-cdp.dts 
b/arch/arm/boot/dts/msm8960-cdp.dts
index db2060c..9c1167b0 100644
--- a/arch/arm/boot/dts/msm8960-cdp.dts
+++ b/arch/arm/boot/dts/msm8960-cdp.dts
@@ -26,7 +26,7 @@
cpu-offset = <0x8>;
};
 
-   msmgpio: gpio@fd51 {
+   msmgpio: gpio@80 {
compatible = "qcom,msm-gpio";
gpio-controller;
#gpio-cells = <2>;
@@ -34,7 +34,7 @@
interrupts = <0 32 0x4>;
interrupt-controller;
#interrupt-cells = <2>;
-   reg = <0xfd51 0x4000>;
+   reg = <0x80 0x4000>;
};
 
serial@1644 {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


[PATCH 11/22] Audit: make audit_ever_enabled per user namespace

2013-06-18 Thread Gao feng

We set audit_ever_enabled to true once audit has been enabled,
and if audit_ever_enabled is true, we allocate an audit
context for each task.

We should decide whether to allocate an audit context for a task
based on whether audit has ever been enabled in the user namespace
that the task belongs to.
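
The "ever enabled" flag is sticky: it is set when audit is turned on and
never cleared when audit is turned off again. A small userspace sketch of
keeping that flag per namespace follows; struct ns_audit and both
functions are hypothetical stand-ins, not the structures added by this
series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace audit state. */
struct ns_audit {
	int enabled;       /* current state: 0 off, non-zero on */
	bool ever_enabled; /* sticky: true once enabled at least once */
};

void ns_audit_set_enabled(struct ns_audit *a, int state)
{
	a->enabled = state;
	/* !!state normalises any non-zero state to 1; OR keeps it sticky. */
	a->ever_enabled |= !!state;
}

/* Decide whether a task in this namespace needs an audit context. */
bool ns_needs_audit_context(const struct ns_audit *a)
{
	return a->ever_enabled;
}
```

Enabling audit in one namespace therefore does not cause contexts to be
allocated for tasks in a namespace where audit was never enabled.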

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h | 1 +
 kernel/audit.c | 7 +++
 kernel/auditsc.c   | 5 -
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 9972f0f..a2c0a79 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,6 +27,7 @@ struct audit_ctrl {
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct  *kauditd_task;
+   bool ever_enabled;
 };
 #endif
 
diff --git a/kernel/audit.c b/kernel/audit.c
index 758b1e8..923fe27 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,7 +78,6 @@ static intaudit_initialized;
 #define AUDIT_OFF  0
 #define AUDIT_ON   1
 #define AUDIT_LOCKED   2
-bool   audit_ever_enabled;
 
 /* Default state when kernel boots without any parameters. */
 static int audit_default;
@@ -313,7 +312,7 @@ static int audit_set_enabled(struct user_namespace *ns, int 
state)
rc =  audit_do_config_change("audit_enabled", &ns->audit.enabled,
 state);
if (!rc)
-   audit_ever_enabled |= !!state;
+   ns->audit.ever_enabled |= !!state;
 
return rc;
 }
@@ -965,7 +964,6 @@ static int __init audit_init(void)
 
audit_set_user_ns(&init_user_ns);
audit_initialized = AUDIT_INITIALIZED;
-   audit_ever_enabled |= !!audit_default;
 
audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
 
@@ -987,7 +985,7 @@ static int __init audit_enable(char *str)
 
if (audit_initialized == AUDIT_INITIALIZED) {
init_user_ns.audit.enabled = audit_default;
-   audit_ever_enabled |= !!audit_default;
+   init_user_ns.audit.ever_enabled |= !!audit_default;
} else if (audit_initialized == AUDIT_UNINITIALIZED) {
printk(" (after initialization)");
} else {
@@ -1792,6 +1790,7 @@ void audit_set_user_ns(struct user_namespace *ns)
skb_queue_head_init(&ns->audit.queue);
skb_queue_head_init(&ns->audit.hold_queue);
ns->audit.enabled = audit_default;
+   ns->audit.ever_enabled |= !!audit_default;
 }
 
 void audit_free_user_ns(struct user_namespace *ns)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8ba8684..3fa69cb 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -938,8 +938,11 @@ int audit_alloc(struct task_struct *tsk)
struct audit_context *context;
enum audit_state state;
char *key = NULL;
+   struct user_namespace *ns = current_user_ns();
+   /* Use current_user_ns, since this new task may run
+* in new user namespace */
 
-   if (likely(!audit_ever_enabled))
+   if (likely(!ns->audit.ever_enabled))
return 0; /* Return if not auditing. */
 
state = audit_filter_task(tsk, &key);
-- 
1.8.1.4


[PATCH 05/22] Audit: implement audit self-defined compare function

2013-06-18 Thread Gao feng

After this patch, audit netlink sockets can
communicate with each other when they belong
to the same user namespace.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 11b56b7..a411b02 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -895,6 +895,11 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(&audit_cmd_mutex);
 }
 
+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return (sock_net(sk)->user_ns == net->user_ns);
+}
+
 static int __net_init audit_net_init(struct net *net)
 {
struct user_namespace *ns = net->user_ns;
@@ -907,6 +912,7 @@ static int __net_init audit_net_init(struct net *net)
 */
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};
 
sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
-- 
1.8.1.4


[PATCH 04/22] netlink: Add compare function for netlink_table

2013-06-18 Thread Gao feng

As we know, netlink sockets are a private resource of the
net namespace; they can communicate with each other
only when they are in the same net namespace. This works
well until we try to add namespace support for other
subsystems that use netlink.

Unlike ipv4 and the route table, some of these subsystems,
such as audit and crypto, are not suited to belong to the
net namespace; they are better suited to the user
namespace.

So we must have the ability to let netlink sockets
in the same user namespace communicate with each other.

This patch adds a new function pointer "compare" to
netlink_table. Through this self-defined compare function,
we can decide whether netlink sockets can communicate
with each other.

The behavior is unchanged if we don't provide a compare
function for netlink_table.
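
The idea of an optional per-table compare callback with a default can be
sketched in userspace as below. All demo_* types and functions are
hypothetical and only model the shape of the change: the default callback
matches on the net namespace, and a subsystem such as audit can supply one
that matches on the user namespace instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_user_ns { int id; };
struct demo_net { struct demo_user_ns *user_ns; };
struct demo_sock { struct demo_net *net; unsigned int portid; };

typedef bool (*demo_compare_fn)(const struct demo_net *net,
				const struct demo_sock *sk);

struct demo_table { demo_compare_fn compare; };

/* Default: sockets match only within the same net namespace. */
bool demo_default_compare(const struct demo_net *net,
			  const struct demo_sock *sk)
{
	return sk->net == net;
}

/* Audit-style: sockets match when the user namespace is the same. */
bool demo_user_ns_compare(const struct demo_net *net,
			  const struct demo_sock *sk)
{
	return sk->net->user_ns == net->user_ns;
}

/* A NULL callback keeps the default, so behaviour is unchanged. */
void demo_table_init(struct demo_table *t, demo_compare_fn custom)
{
	t->compare = custom ? custom : demo_default_compare;
}

const struct demo_sock *demo_lookup(const struct demo_table *t,
				    const struct demo_net *net,
				    const struct demo_sock *socks,
				    size_t n, unsigned int portid)
{
	for (size_t i = 0; i < n; i++)
		if (t->compare(net, &socks[i]) && socks[i].portid == portid)
			return &socks[i];
	return NULL;
}
```

A lookup from a second net namespace that shares the user namespace finds
the socket only when the table was initialised with the user-namespace
callback.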

Signed-off-by: Gao feng 
---
 include/linux/netlink.h  |  1 +
 net/netlink/af_netlink.c | 32 
 net/netlink/af_netlink.h |  1 +
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 6358da5..f78b430 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -46,6 +46,7 @@ struct netlink_kernel_cfg {
void(*input)(struct sk_buff *skb);
struct mutex*cb_mutex;
void(*bind)(int group);
+   bool(*compare)(struct net *net, struct sock *sk);
 };
 
 extern struct sock *__netlink_kernel_create(struct net *net, int unit,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 57ee84d..942d429 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -854,16 +854,23 @@ netlink_unlock_table(void)
wake_up(&nl_table_wait);
 }
 
+static bool netlink_compare(struct net *net, struct sock *sk)
+{
+   return net_eq(sock_net(sk), net);
+}
+
 static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
 {
-   struct nl_portid_hash *hash = &nl_table[protocol].hash;
+   struct netlink_table *table = &nl_table[protocol];
+   struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
struct sock *sk;
 
read_lock(&nl_table_lock);
head = nl_portid_hashfn(hash, portid);
sk_for_each(sk, head) {
-   if (net_eq(sock_net(sk), net) && (nlk_sk(sk)->portid == portid)) {
+   if (table->compare(net, sk) &&
+   (nlk_sk(sk)->portid == portid)) {
sock_hold(sk);
goto found;
}
@@ -976,7 +983,8 @@ netlink_update_listeners(struct sock *sk)
 
 static int netlink_insert(struct sock *sk, struct net *net, u32 portid)
 {
-   struct nl_portid_hash *hash = &nl_table[sk->sk_protocol].hash;
+   struct netlink_table *table = &nl_table[sk->sk_protocol];
+   struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
int err = -EADDRINUSE;
struct sock *osk;
@@ -986,7 +994,8 @@ static int netlink_insert(struct sock *sk, struct net *net, 
u32 portid)
head = nl_portid_hashfn(hash, portid);
len = 0;
sk_for_each(osk, head) {
-   if (net_eq(sock_net(osk), net) && (nlk_sk(osk)->portid == portid))
+   if (table->compare(net, osk) &&
+   (nlk_sk(osk)->portid == portid))
break;
len++;
}
@@ -1183,7 +1192,8 @@ static int netlink_autobind(struct socket *sock)
 {
struct sock *sk = sock->sk;
struct net *net = sock_net(sk);
-   struct nl_portid_hash *hash = &nl_table[sk->sk_protocol].hash;
+   struct netlink_table *table = &nl_table[sk->sk_protocol];
+   struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
struct sock *osk;
s32 portid = task_tgid_vnr(current);
@@ -1195,7 +1205,7 @@ retry:
netlink_table_grab();
head = nl_portid_hashfn(hash, portid);
sk_for_each(osk, head) {
-   if (!net_eq(sock_net(osk), net))
+   if (!table->compare(net, osk))
continue;
if (nlk_sk(osk)->portid == portid) {
/* Bind collision, search negative portid values. */
@@ -2285,6 +2295,8 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
if (cfg) {
nl_table[unit].bind = cfg->bind;
nl_table[unit].flags = cfg->flags;
+   if (cfg->compare)
+   nl_table[unit].compare = cfg->compare;
}
nl_table[unit].registered = 1;
} else {
@@ -2707,6 +2719,7 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
struct sock *s;
struct nl_seq_iter *iter;
+   struct net *net;
int i, j;
 
++*pos;
@@ -2714,11 +2727,12 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)

[PATCH 13/22] Audit: only allow init user namespace to change rate limit

2013-06-18 Thread Gao feng

To avoid a DoS attack launched from another user namespace,
audit_rate_limit is not made per user namespace, and only
the init user namespace has the right to change it.
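
As an illustration of the rule described above, here is a minimal
user-space sketch in C. It is not code from this patch: struct
model_userns, its is_init flag and model_set_rate_limit() are
hypothetical stand-ins for the kernel's comparison of the caller's
user namespace against the initial one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace. */
struct model_userns {
    bool is_init;   /* true only for the initial user namespace */
};

/* One global limit shared by all namespaces, as the commit message proposes. */
static int model_rate_limit;

/* Returns 0 on success, -EPERM if the caller is not in the initial namespace. */
static int model_set_rate_limit(const struct model_userns *caller, int limit)
{
    assert(caller != NULL);
    if (!caller->is_init)
        return -EPERM;      /* other namespaces may not touch the limit */
    model_rate_limit = limit;
    return 0;
}
```

Keeping the limit global means a container cannot raise it to flood
the host; the cost is that containers cannot tune it for themselves.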

Signed-off-by: Gao feng 
---
 kernel/audit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 0b9cef2..306231d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -295,6 +295,9 @@ static int audit_do_config_change(char *function_name, int *to_change, int new)
 
 static int audit_set_rate_limit(int limit)
 {
+   if (current_user_ns() != &init_user_ns)
+   return -EPERM;
+
return audit_do_config_change("audit_rate_limit", &audit_rate_limit, limit);
 }
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 17/22] Audit: make audit_backlog_wait per user namespace

2013-06-18 Thread Gao feng

Tasks are added to audit_backlog_wait when the
audit_skb_queue of a user namespace is full, so
audit_backlog_wait should be per user namespace too.

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 11 +--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 28938f3..c186a84 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -29,6 +29,7 @@ struct audit_ctrl {
struct sk_buff_head hold_queue;
struct task_struct  *kauditd_task;
wait_queue_head_t   kauditd_wait;
+   wait_queue_head_t   backlog_wait;
bool                ever_enabled;
 };
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index e3d7da7..3dcaa97 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,8 +119,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static int audit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
-
 /* Serialize requests from userspace. */
 DEFINE_MUTEX(audit_cmd_mutex);
 
@@ -453,7 +451,7 @@ static int kauditd_thread(void *dummy)
flush_hold_queue(ns);
 
skb = skb_dequeue(queue);
-   wake_up(&audit_backlog_wait);
+   wake_up(&ns->audit.backlog_wait);
if (skb) {
if (ns->audit.pid && ns->audit.sock)
kauditd_send_skb(ns, skb);
@@ -1119,14 +1117,14 @@ static void wait_for_auditd(unsigned long sleep_time)
const struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
-   add_wait_queue(&audit_backlog_wait, &wait);
+   add_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
 
if (audit_backlog_limit &&
skb_queue_len(queue) > audit_backlog_limit)
schedule_timeout(sleep_time);
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(&audit_backlog_wait, &wait);
+   remove_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
 }
 
 /**
@@ -1185,7 +1183,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
   audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
-   wake_up(&audit_backlog_wait);
+   wake_up(&init_user_ns.audit.backlog_wait);
return NULL;
}
 
@@ -1799,6 +1797,7 @@ void audit_set_user_ns(struct user_namespace *ns)
ns->audit.enabled = audit_default;
ns->audit.ever_enabled |= !!audit_default;
init_waitqueue_head(&ns->audit.kauditd_wait);
+   init_waitqueue_head(&ns->audit.backlog_wait);
 
ns->audit.initialized = AUDIT_INITIALIZED;
 }
-- 
1.8.1.4


[PATCH 16/22] Audit: make kauditd_wait per user namespace

2013-06-18 Thread Gao feng

kauditd_task is added to the wait queue kauditd_wait when
no audit message is being generated in the user namespace,
so kauditd_wait should be per user namespace too.

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 36 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index c665569..28938f3 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -28,6 +28,7 @@ struct audit_ctrl {
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct  *kauditd_task;
+   wait_queue_head_t   kauditd_wait;
bool                ever_enabled;
 };
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 297ac6e..e3d7da7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,7 +119,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static int audit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
 
 /* Serialize requests from userspace. */
@@ -345,9 +344,9 @@ static int audit_set_failure(int state)
  * This only holds messages is audit_default is set, aka booting with audit=1
  * or building your kernel that way.
  */
-static void audit_hold_skb(struct sk_buff *skb)
+static void audit_hold_skb(struct user_namespace *ns, struct sk_buff *skb)
 {
-   struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+   struct sk_buff_head *hold_queue = &ns->audit.hold_queue;
 
if (audit_default &&
skb_queue_len(hold_queue) < audit_backlog_limit)
@@ -360,7 +359,7 @@ static void audit_hold_skb(struct sk_buff *skb)
  * For one reason or another this nlh isn't getting delivered to the userspace
  * audit daemon, just send it to printk.
  */
-static void audit_printk_skb(struct sk_buff *skb)
+static void audit_printk_skb(struct user_namespace *ns, struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
char *data = nlmsg_data(nlh);
@@ -372,24 +371,24 @@ static void audit_printk_skb(struct sk_buff *skb)
audit_log_lost("printk limit exceeded\n");
}
 
-   audit_hold_skb(skb);
+   audit_hold_skb(ns, skb);
 }
 
-static void kauditd_send_skb(struct sk_buff *skb)
+static void kauditd_send_skb(struct user_namespace *ns, struct sk_buff *skb)
 {
int err;
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
-   err = netlink_unicast(init_user_ns.audit.sock, skb,
- init_user_ns.audit.portid, 0);
+   err = netlink_unicast(ns->audit.sock, skb,
+ ns->audit.portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
-  init_user_ns.audit.pid);
+  ns->audit.pid);
audit_log_lost("auditd disappeared\n");
-   init_user_ns.audit.pid = 0;
+   ns->audit.pid = 0;
/* we might get lucky and get this in the next auditd */
-   audit_hold_skb(skb);
+   audit_hold_skb(ns, skb);
} else
/* drop the extra reference if sent ok */
consume_skb(skb);
@@ -423,7 +422,7 @@ static void flush_hold_queue(struct user_namespace *ns)
return;
 
while (skb && ns->audit.pid) {
-   kauditd_send_skb(skb);
+   kauditd_send_skb(ns, skb);
skb = skb_dequeue(hold_queue);
}
 
@@ -457,13 +456,13 @@ static int kauditd_thread(void *dummy)
wake_up(&audit_backlog_wait);
if (skb) {
if (ns->audit.pid && ns->audit.sock)
-   kauditd_send_skb(skb);
+   kauditd_send_skb(ns, skb);
else
-   audit_printk_skb(skb);
+   audit_printk_skb(ns, skb);
continue;
}
set_current_state(TASK_INTERRUPTIBLE);
-   add_wait_queue(&kauditd_wait, &wait);
+   add_wait_queue(&ns->audit.kauditd_wait, &wait);
 
if (!skb_queue_len(queue)) {
try_to_freeze();
@@ -471,7 +470,7 @@ static int kauditd_thread(void *dummy)
}
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(&kauditd_wait, &wait);
+   remove_wait_queue(&ns->audit.kauditd_wait, &wait);
}
 
put_user_ns(ns);
@@ -1728,9 +1727,9 @@ void audit_log_end(struct audit_buffer *ab)
 
if (init_user_ns.audit.pid && init_user_ns.audit.sock) {
skb_queue_tail(&init_user_ns.audit.queue,

[PATCH 12/22] Audit: make audit_initialized per user namespace

2013-06-18 Thread Gao feng

audit_initialized is used to identify whether the audit
related resources have been initialized. It should
be per user namespace too.

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 21 +++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a2c0a79..c665569 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,6 +21,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 #ifdef CONFIG_AUDIT
 struct audit_ctrl {
struct sock *sock;
+   int initialized;
int enabled;
int pid;
int portid;
diff --git a/kernel/audit.c b/kernel/audit.c
index 923fe27..0b9cef2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -68,12 +68,12 @@
 
 #include "audit.h"
 
-/* No auditing will take place until audit_initialized == AUDIT_INITIALIZED.
+/* No auditing will take place until user namespace's
+ * audit.initialized == AUDIT_INITIALIZED.
  * (Initialization happens after skb_init is called.) */
 #define AUDIT_DISABLED -1
 #define AUDIT_UNINITIALIZED 0
 #define AUDIT_INITIALIZED  1
-static int audit_initialized;
 
 #define AUDIT_OFF  0
 #define AUDIT_ON   1
@@ -953,7 +953,7 @@ static int __init audit_init(void)
 {
int i;
 
-   if (audit_initialized == AUDIT_DISABLED)
+   if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return 0;
 
printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
@@ -963,7 +963,6 @@ static int __init audit_init(void)
return -1;
 
audit_set_user_ns(&init_user_ns);
-   audit_initialized = AUDIT_INITIALIZED;
 
audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
 
@@ -979,14 +978,14 @@ static int __init audit_enable(char *str)
 {
audit_default = !!simple_strtol(str, NULL, 0);
if (!audit_default)
-   audit_initialized = AUDIT_DISABLED;
+   init_user_ns.audit.initialized = AUDIT_DISABLED;
 
printk(KERN_INFO "audit: %s", audit_default ? "enabled" : "disabled");
 
-   if (audit_initialized == AUDIT_INITIALIZED) {
+   if (init_user_ns.audit.initialized == AUDIT_INITIALIZED) {
init_user_ns.audit.enabled = audit_default;
init_user_ns.audit.ever_enabled |= !!audit_default;
-   } else if (audit_initialized == AUDIT_UNINITIALIZED) {
+   } else if (init_user_ns.audit.initialized == AUDIT_UNINITIALIZED) {
printk(" (after initialization)");
} else {
printk(" (until reboot)");
@@ -1147,7 +1146,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
unsigned long timeout_start = jiffies;
struct sk_buff_head *queue = &init_user_ns.audit.queue;
 
-   if (audit_initialized != AUDIT_INITIALIZED)
+   if (init_user_ns.audit.initialized != AUDIT_INITIALIZED)
return NULL;
 
if (unlikely(audit_filter_type(type)))
@@ -1784,18 +1783,20 @@ EXPORT_SYMBOL(audit_log_secctx);
 
 void audit_set_user_ns(struct user_namespace *ns)
 {
-   if (audit_initialized == AUDIT_DISABLED)
+   if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return;
 
skb_queue_head_init(&ns->audit.queue);
skb_queue_head_init(&ns->audit.hold_queue);
ns->audit.enabled = audit_default;
ns->audit.ever_enabled |= !!audit_default;
+
+   ns->audit.initialized = AUDIT_INITIALIZED;
 }
 
 void audit_free_user_ns(struct user_namespace *ns)
 {
-   if (audit_initialized == AUDIT_DISABLED)
+   if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return;
 
if (ns->audit.sock) {
-- 
1.8.1.4


Re: [PATCH] Btrfs: do not ignore errors when truncating the free space cache inode

2013-06-18 Thread Miao Xie

It was fixed by Wei Yongjun

http://marc.info/?l=linux-btrfs&m=136910396606489&w=2

Thanks
Miao

On Tue, 18 Jun 2013 22:57:41 +0100, Djalal Harouni wrote:
> btrfs_check_trunc_cache_free_space() tries to check if there is enough
> space for cache inode truncation but it fails.
> 
> Currently this function always returns success even if there is not
> enough space. Fix this by returning the -ENOSPC error code.
> 
> Signed-off-by: Djalal Harouni 
> ---
> Totally untested code.
> 
>  fs/btrfs/free-space-cache.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 2750b50..9629830 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -201,7 +201,7 @@ int btrfs_check_trunc_cache_free_space(struct btrfs_root *root,
>  struct btrfs_block_rsv *rsv)
>  {
>   u64 needed_bytes;
> - int ret;
> + int ret = 0;
>  
>   /* 1 for slack space, 1 for updating the inode */
>   needed_bytes = btrfs_calc_trunc_metadata_size(root, 1) +
> @@ -210,10 +210,10 @@ int btrfs_check_trunc_cache_free_space(struct btrfs_root *root,
>   spin_lock(&rsv->lock);
>   if (rsv->reserved < needed_bytes)
>   ret = -ENOSPC;
> - else
> - ret = 0;
> +
>   spin_unlock(&rsv->lock);
> - return 0;
> +
> + return ret;
>  }
>  
>  int btrfs_truncate_free_space_cache(struct btrfs_root *root,
> 

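The control-flow bug described in the quoted patch can be modelled
in user space. The sketch below is only an illustration under stated
assumptions: struct model_rsv and both function names are
hypothetical, and the spinlock that the real function takes is left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the reserved-bytes field of a block reservation. */
struct model_rsv {
    uint64_t reserved;
};

/* Buggy shape: the error code is computed and then thrown away. */
static int check_space_buggy(const struct model_rsv *rsv, uint64_t needed)
{
    int ret;

    assert(rsv != NULL);
    if (rsv->reserved < needed)
        ret = -ENOSPC;
    else
        ret = 0;
    (void)ret;          /* result is never used ... */
    return 0;           /* ... so the caller always sees success */
}

/* Fixed shape: the computed result is what gets returned. */
static int check_space_fixed(const struct model_rsv *rsv, uint64_t needed)
{
    int ret = 0;

    assert(rsv != NULL);
    if (rsv->reserved < needed)
        ret = -ENOSPC;
    return ret;
}
```

With 10 bytes reserved and 20 needed, the buggy variant reports
success while the fixed one reports -ENOSPC.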

[PATCH 03/22] Audit: make audit kernel side netlink sock per userns

2013-06-18 Thread Gao feng

This patch makes the audit_sock per user namespace
instead of global.

Since a sock is tied to a net namespace, when creating
a netns we allocate an audit_sock for the userns that
creates this netns, and this netns is kept alive
until the creator userns is destroyed.

If a userns creates many netns, the audit_sock is
allocated only once.
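
The "allocated only once" behaviour can be pictured with a small
user-space sketch. Everything here is hypothetical and only models
the idea: struct model_userns, struct model_sock and
model_netns_init() are not names from the patch, and the real socket
creation is replaced by filling in a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-side audit socket. */
struct model_sock {
    int owner_netns;    /* id of the netns the socket was created in */
};

/* Hypothetical stand-in for a user namespace with its audit state. */
struct model_userns {
    struct model_sock *audit_sock;  /* NULL until the first netns appears */
    struct model_sock storage;      /* backing storage for the sketch */
    int sockets_created;            /* counts how often creation ran */
};

/* Called for every netns the userns creates; only the first call creates a socket. */
static void model_netns_init(struct model_userns *ns, int netns_id)
{
    assert(ns != NULL);
    if (ns->audit_sock)
        return;                     /* already have one, nothing to do */
    ns->storage.owner_netns = netns_id;
    ns->audit_sock = &ns->storage;
    ns->sockets_created++;
}
```

The socket stays bound to the first netns, which is why that netns
has to be kept alive for as long as the userns exists.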

Signed-off-by: Gao feng 
---
 include/linux/audit.h  |   5 +++
 include/linux/user_namespace.h |   9 
 kernel/audit.c | 100 +++--
 kernel/user_namespace.c|   2 +
 4 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b20b038..85f9d7f 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -439,6 +439,8 @@ extern int audit_log_task_context(struct audit_buffer *ab);
 extern void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk);
 
+extern void audit_free_user_ns(struct user_namespace *ns);
+
 extern int audit_update_lsm_rules(void);
 
/* Private API (for audit.c only) */
@@ -492,6 +494,9 @@ static inline int audit_log_task_context(struct audit_buffer *ab)
 static inline void audit_log_task_info(struct audit_buffer *ab,
   struct task_struct *tsk)
 { }
+
+static inline void audit_free_user_ns(struct user_namespace *ns)
+{ }
 #define audit_enabled 0
 #endif /* CONFIG_AUDIT */
 static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b6b215f..8797421 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,12 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
} extent[UID_GID_MAP_MAX_EXTENTS];
 };
 
+#ifdef CONFIG_AUDIT
+struct audit_ctrl {
+   struct sock *sock;
+};
+#endif
+
 struct user_namespace {
struct uid_gid_map  uid_map;
struct uid_gid_map  gid_map;
@@ -26,6 +32,9 @@ struct user_namespace {
kuid_t  owner;
kgid_t  group;
unsigned int        proc_inum;
+#ifdef CONFIG_AUDIT
+   struct audit_ctrl   audit;
+#endif
bool                may_mount_sysfs;
bool                may_mount_proc;
 };
diff --git a/kernel/audit.c b/kernel/audit.c
index 843e7a2..11b56b7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "audit.h"
 
@@ -120,9 +121,6 @@ u32 audit_sig_sid = 0;
 */
 static atomic_t audit_lost = ATOMIC_INIT(0);
 
-/* The netlink socket. */
-static struct sock *audit_sock;
-
 /* Hash for inode-based rules */
 struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
 
@@ -385,7 +383,8 @@ static void kauditd_send_skb(struct sk_buff *skb)
int err;
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
-   err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
+   err = netlink_unicast(init_user_ns.audit.sock, skb,
+ audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", audit_pid);
@@ -417,7 +416,7 @@ static void flush_hold_queue(void)
 {
struct sk_buff *skb;
 
-   if (!audit_default || !audit_pid)
+   if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
return;
 
skb = skb_dequeue(&audit_skb_hold_queue);
@@ -449,7 +448,7 @@ static int kauditd_thread(void *dummy)
skb = skb_dequeue(&audit_skb_queue);
wake_up(&audit_backlog_wait);
if (skb) {
-   if (audit_pid)
+   if (audit_pid && init_user_ns.audit.sock)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
@@ -479,8 +478,11 @@ int audit_send_list(void *_dest)
mutex_lock(&audit_cmd_mutex);
mutex_unlock(&audit_cmd_mutex);
 
+   if (!init_user_ns.audit.sock)
+   return 0;
+
while ((skb = __skb_dequeue(&dest->q)) != NULL)
-   netlink_unicast(audit_sock, skb, pid, 0);
+   netlink_unicast(init_user_ns.audit.sock, skb, pid, 0);
 
kfree(dest);
 
@@ -521,7 +523,8 @@ static int audit_send_reply_thread(void *arg)
 
/* Ignore failure. It'll only happen if the sender goes away,
   because our timeout is set to infinite. */
-   netlink_unicast(audit_sock, reply->skb, reply->pid, 0);
+   netlink_unicast(init_user_ns.audit.sock, reply->skb,
+   reply->pid, 0);
kfree(reply);
return 0;
 }
@@ -538,16 +541,21 @@ static int audit_send_reply_thread(void

[Part1 PATCH 00/22] Add namespace support for audit

2013-06-18 Thread Gao feng

This patchset is the first part of namespace support for audit.
In this patchset, the main resources of the audit system have
been isolated. The audit filters and rules haven't been isolated
yet; that will be implemented in Part 2. The isolation of user
audit messages is finished in this patchset.

I chose to assign audit to the user namespace.
Right now there are six kinds of namespaces: net, mount, ipc,
pid, uts and user. The first five have specific purposes, and
audit doesn't fit into any of them. Since clone flags are in
short supply, we also can't provide a new flag such as
CLONE_NEWAUDIT to enable an audit namespace separately. So the
user namespace may be the best choice.

[PATCH 04/22] adds a compare function pointer to the netlink table,
so the audit subsystem can use its own compare function to make
sure audit netlink sockets can communicate with each other when
they are in the same user namespace. This patch has been merged
into David's net-next tree.
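
To make the compare hook easier to picture, here is a minimal
user-space sketch in C. It only models the idea: all model_* types,
default_compare(), userns_compare(), table_init() and lookup() are
hypothetical names, and the user-namespace rule shown is an assumed
example of what a subsystem-specific compare function might check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct sock. */
struct model_net {
    int id;
    int userns_id;                  /* user namespace that owns this netns */
};

struct model_sock {
    const struct model_net *net;    /* netns the socket lives in */
    unsigned int portid;
};

typedef bool (*compare_fn)(const struct model_net *net,
                           const struct model_sock *sk);

/* Per-protocol table entry holding the match rule. */
struct model_table {
    compare_fn compare;
};

/* Default rule: the socket must live in exactly this netns. */
static bool default_compare(const struct model_net *net,
                            const struct model_sock *sk)
{
    return sk->net == net;
}

/* Assumed subsystem rule: same owning user namespace is enough. */
static bool userns_compare(const struct model_net *net,
                           const struct model_sock *sk)
{
    return sk->net->userns_id == net->userns_id;
}

/* Install the override if one is given, else fall back to the default. */
static void table_init(struct model_table *t, compare_fn override)
{
    assert(t != NULL);
    t->compare = override ? override : default_compare;
}

/* Find a socket by portid among those the table's rule accepts. */
static const struct model_sock *lookup(const struct model_table *t,
                                       const struct model_net *net,
                                       const struct model_sock *socks,
                                       size_t n, unsigned int portid)
{
    for (size_t i = 0; i < n; i++)
        if (t->compare(net, &socks[i]) && socks[i].portid == portid)
            return &socks[i];
    return NULL;
}
```

With the default rule a lookup from a second netns misses the socket;
with the override it is found as long as both netns belong to the
same user namespace.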

There is one point that some people may dislike: in [PATCH 03/22],
the kernel side audit netlink socket is created only when we create
the first netns for the userns, and this userns will hold the netns
until we destroy this userns. It also means that if we only unshare the
user namespace, audit is unavailable since we don't have an audit
netlink socket; we have to unshare both the user and net namespaces.

Changes from RFC:
1, Move the cleanup patches to the head of this patchset.
2, Fix a scheduling while atomic BUG. This bug is caused by
   kthread_stop in audit_free_user_ns.
3, Only allow init user namespace to change backlog_limit.
4, Audit subsystem is available only when kernel side audit
   netlink socket has been created.
5, Only isolate the basic resources of audit, and only make
   user audit message namespace aware.


This patchset is based on Linus' Linux tree.

You can pull this patchset from:
git://github.com/gao-feng/auditns.git

The following changes since commit 8177a9d79c0e942dcac3312f15585d0344d505a5

"lseek(fd, n, SEEK_END) does *not* go to eof - n"

are available in the git repository at:

git://github.com/gao-feng/auditns.git

for you to fetch changes up to 85c36b981ac692ec18e362ba484629a457d50cb2

"Audit: Allow GET,SET,USER MSG operations in uninit user namespace"

Gao feng (22):
  Audit: change type of audit_ever_enabled to bool
  Audit: remove duplicate comments
  Audit: make audit kernel side netlink sock per userns
  netlink: Add compare function for netlink_table
  Audit: implement audit self-defined compare function
  Audit: make audit_skb_queue per user namespace
  Audit: make audit_skb_hold_queue per user namespace
  Audit: make kauditd_task per user namespace
  Audit: make audit_nlk_portid per user namesapce
  Audit: make audit_enabled per user namespace
  Audit: make audit_ever_enabled per user namespace
  Audit: make audit_initialized per user namespace
  Audit: only allow init user namespace to change rate limit
  Audit: only allow init user namespace to change audit_failure
  Audit: only allow init user namespace to change backlog_limit
  Audit: make kauditd_wait per user namespace
  Audit: make audit_backlog_wait per user namespace
  Audit: introduce new audit logging interface for user namespace
  Audit: pass proper user namespace to audit_log_common_recv_msg
  Audit: Log audit config change in uninit user namespace
  Audit: send reply message to the auditd in proper user namespace
  Audit: Allow GET,SET,USER MSG operations in uninit user namespace

 include/linux/audit.h  |  39 +++-
 include/linux/netlink.h|   1 +
 include/linux/user_namespace.h |  33 ++-
 kernel/audit.c | 452 +
 kernel/audit.h |   7 +-
 kernel/auditsc.c   |  11 +-
 kernel/user_namespace.c|   3 +
 net/netlink/af_netlink.c   |  32 ++-
 net/netlink/af_netlink.h   |   1 +
 9 files changed, 387 insertions(+), 192 deletions(-)

-- 
1.8.1.4


[PATCH 01/22] Audit: change type of audit_ever_enabled to bool

2013-06-18 Thread Gao feng

It's better to define audit_ever_enabled as bool.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 2 +-
 kernel/audit.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..ad3084c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,7 +78,7 @@ static int audit_initialized;
 #define AUDIT_ON   1
 #define AUDIT_LOCKED   2
 int    audit_enabled;
-int    audit_ever_enabled;
+bool   audit_ever_enabled;
 
 EXPORT_SYMBOL_GPL(audit_enabled);
 
diff --git a/kernel/audit.h b/kernel/audit.h
index 1c95131..2258827 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -205,7 +205,7 @@ struct audit_context {
 #endif
 };
 
-extern int audit_ever_enabled;
+extern bool audit_ever_enabled;
 
 extern void audit_copy_inode(struct audit_names *name,
 const struct dentry *dentry,
-- 
1.8.1.4


[PATCH 19/22] Audit: pass proper user namespace to audit_log_common_recv_msg

2013-06-18 Thread Gao feng

Audit logs generated in a user namespace should be
received by the auditd running in that user namespace.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 5d3764c..2d81aac 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -624,17 +624,18 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
return err;
 }
 
-static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
+static int audit_log_common_recv_msg(struct user_namespace *ns,
+struct audit_buffer **ab, u16 msg_type)
 {
int rc = 0;
-   uid_t uid = from_kuid(&init_user_ns, current_uid());
+   uid_t uid = from_kuid(ns, current_uid());
 
-   if (!audit_enabled_ns(&init_user_ns)) {
+   if (!audit_enabled_ns(ns)) {
*ab = NULL;
return rc;
}
 
-   *ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
+   *ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, msg_type);
if (unlikely(!*ab))
return rc;
audit_log_format(*ab, "pid=%d uid=%u", task_tgid_vnr(current), uid);
@@ -737,7 +738,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err)
break;
}
-   audit_log_common_recv_msg(&ab, msg_type);
+   audit_log_common_recv_msg(ns, &ab, msg_type);
if (msg_type != AUDIT_USER_TTY)
audit_log_format(ab, " msg='%.1024s'",
 (char *)data);
@@ -752,7 +753,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_n_untrustedstring(ab, data, size);
}
audit_set_pid(ab, NETLINK_CB(skb).portid);
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
}
break;
case AUDIT_ADD_RULE:
@@ -760,10 +761,11 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (nlmsg_len(nlh) < sizeof(struct audit_rule_data))
return -EINVAL;
if (ns->audit.enabled == AUDIT_LOCKED) {
-   audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+   audit_log_common_recv_msg(ns, &ab,
+ AUDIT_CONFIG_CHANGE);
audit_log_format(ab, " audit_enabled=%d res=0",
 ns->audit.enabled);
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
return -EPERM;
}
/* fallthrough */
@@ -773,9 +775,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
break;
case AUDIT_TRIM:
audit_trim_trees();
-   audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+   audit_log_common_recv_msg(ns, &ab, AUDIT_CONFIG_CHANGE);
audit_log_format(ab, " op=trim res=1");
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
break;
case AUDIT_MAKE_EQUIV: {
void *bufp = data;
@@ -803,14 +805,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
/* OK, here comes... */
err = audit_tag_tree(old, new);
 
-   audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+   audit_log_common_recv_msg(ns, &ab, AUDIT_CONFIG_CHANGE);
 
audit_log_format(ab, " op=make_equiv old=");
audit_log_untrustedstring(ab, old);
audit_log_format(ab, " new=");
audit_log_untrustedstring(ab, new);
audit_log_format(ab, " res=%d", !err);
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
kfree(old);
kfree(new);
break;
-- 
1.8.1.4


[PATCH 0/3] ARM: MSM: Consolidate MSM gpiomux for 7X30 and 8X50

2013-06-18 Thread Rohit Vaswani

This series is based on David Brown's for-next branch
git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm.git for-next

The previous gpio patches left some room for consolidation and required
a few fixes. This series provides the necessary patches to do that.

Rohit Vaswani (3):
  ARM: msm: dts: Fix the gpio register address for msm8960
  drivers: gpio: msm: Fix the error condition for reading ngpio
  ARM: msm: Consolidate gpiomux for older architectures

 arch/arm/boot/dts/msm8960-cdp.dts |4 ++--
 arch/arm/mach-msm/Kconfig |3 +--
 arch/arm/mach-msm/gpiomux-v1.c|   33 -
 arch/arm/mach-msm/gpiomux.h   |   10 --
 drivers/gpio/gpio-msm-v2.c|2 +-
 5 files changed, 4 insertions(+), 48 deletions(-)
 delete mode 100644 arch/arm/mach-msm/gpiomux-v1.c

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


[PATCH 07/22] Audit: make audit_skb_hold_queue per user namespace

2013-06-18 Thread Gao feng

After this patch, every user namespace has its own
audit_skb_hold_queue. Since we haven't finished the
preparations, only allow users to operate on the skb
hold queue of the init user namespace.

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 16 +---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e322f20..53420a4 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -22,6 +22,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 struct audit_ctrl {
struct sock *sock;
struct sk_buff_head queue;
+   struct sk_buff_head hold_queue;
 };
 #endif
 
diff --git a/kernel/audit.c b/kernel/audit.c
index e2f6366..75325f0 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -131,8 +131,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static int audit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-/* queue of skbs to send to auditd when/if it comes back */
-static struct sk_buff_head audit_skb_hold_queue;
 static struct task_struct *kauditd_task;
 static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
@@ -351,9 +349,11 @@ static int audit_set_failure(int state)
  */
 static void audit_hold_skb(struct sk_buff *skb)
 {
+   struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+
if (audit_default &&
-   skb_queue_len(&audit_skb_hold_queue) < audit_backlog_limit)
-   skb_queue_tail(&audit_skb_hold_queue, skb);
+   skb_queue_len(hold_queue) < audit_backlog_limit)
+   skb_queue_tail(hold_queue, skb);
else
kfree_skb(skb);
 }
@@ -414,17 +414,18 @@ static void kauditd_send_skb(struct sk_buff *skb)
 static void flush_hold_queue(void)
 {
struct sk_buff *skb;
+   struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
 
if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
return;
 
-   skb = skb_dequeue(&audit_skb_hold_queue);
+   skb = skb_dequeue(hold_queue);
if (likely(!skb))
return;
 
while (skb && audit_pid) {
kauditd_send_skb(skb);
-   skb = skb_dequeue(&audit_skb_hold_queue);
+   skb = skb_dequeue(hold_queue);
}
 
/*
@@ -956,7 +957,6 @@ static int __init audit_init(void)
return -1;
 
audit_set_user_ns(&init_user_ns);
-   skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
@@ -1784,6 +1784,7 @@ void audit_set_user_ns(struct user_namespace *ns)
return;
 
skb_queue_head_init(&ns->audit.queue);
+   skb_queue_head_init(&ns->audit.hold_queue);
 }
 
 void audit_free_user_ns(struct user_namespace *ns)
@@ -1798,6 +1799,7 @@ void audit_free_user_ns(struct user_namespace *ns)
}
 
skb_queue_purge(&ns->audit.queue);
+   skb_queue_purge(&ns->audit.hold_queue);
 }
 
 EXPORT_SYMBOL(audit_log_start);
-- 
1.8.1.4


[PATCH 22/22] Audit: Allow GET,SET,USER MSG operations in uninit user namespace

2013-06-18 Thread Gao feng

After this patch, users can get and set audit information
inside a container, and they can also send user messages
to the audit subsystem.
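
The resulting permission rules can be summarised in a small
user-space sketch. It is only a model of the check after this patch:
enum model_msg, struct model_caller and model_netlink_ok() are
hypothetical names, and the boolean fields stand in for the
namespace comparisons and capability checks made in the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message classes, one or two per branch of the real switch. */
enum model_msg {
    MODEL_GET,
    MODEL_SET,
    MODEL_ADD_RULE,
    MODEL_TRIM,
    MODEL_USER
};

/* Hypothetical summary of what the kernel would learn about the caller. */
struct model_caller {
    bool in_init_userns;
    bool in_init_pidns;
    bool cap_audit_control;
    bool cap_audit_write;   /* evaluated in the caller's own user namespace */
};

/* Returns 0 if the message is allowed, a negative errno otherwise. */
static int model_netlink_ok(const struct model_caller *c, enum model_msg type)
{
    assert(c != NULL);
    switch (type) {
    case MODEL_GET:
    case MODEL_SET:
        return 0;           /* no namespace or capability check here */
    case MODEL_ADD_RULE:
    case MODEL_TRIM:
        /* control operations stay limited to the initial namespaces */
        if (!c->in_init_userns || !c->in_init_pidns ||
            !c->cap_audit_control)
            return -EPERM;
        return 0;
    case MODEL_USER:
        return c->cap_audit_write ? 0 : -EPERM;
    }
    return -EINVAL;         /* unknown message type */
}
```

Note that in this model, as in the diff, GET and SET are accepted
without any capability check.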

Signed-off-by: Gao feng 
---
 kernel/audit.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 0b3fd8b..1b60a5a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -594,11 +594,6 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 {
int err = 0;
 
-   /* Only support the initial namespaces for now. */
-   if ((current_user_ns() != &init_user_ns) ||
-   (task_active_pid_ns(current) != &init_pid_ns))
-   return -EPERM;
-
switch (msg_type) {
case AUDIT_LIST:
case AUDIT_ADD:
@@ -606,6 +601,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
msg_type)
return -EOPNOTSUPP;
case AUDIT_GET:
case AUDIT_SET:
+   break;
case AUDIT_LIST_RULES:
case AUDIT_ADD_RULE:
case AUDIT_DEL_RULE:
@@ -614,13 +610,17 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
-   if (!capable(CAP_AUDIT_CONTROL))
+   /* These operations only support the initial
+* namespaces for now. */
+   if ((current_user_ns() != &init_user_ns) ||
+   (task_active_pid_ns(current) != &init_pid_ns) ||
+   !capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-   if (!capable(CAP_AUDIT_WRITE))
+   if (!ns_capable(current_user_ns(), CAP_AUDIT_WRITE))
err = -EPERM;
break;
default:  /* bad msg */
-- 
1.8.1.4


[PATCH 20/22] Audit: Log audit config change in uninit user namespace

2013-06-18 Thread Gao feng

This patch allows audit config changes to be logged in a
non-init user namespace.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2d81aac..84a882c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -245,13 +245,14 @@ void audit_log_lost(const char *message)
}
 }
 
-static int audit_log_config_change(char *function_name, int new, int old,
+static int audit_log_config_change(struct user_namespace *ns,
+  char *function_name, int new, int old,
   int allow_changes)
 {
struct audit_buffer *ab;
int rc = 0;
 
-   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
+   ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
if (unlikely(!ab))
return rc;
audit_log_format(ab, "%s=%d old=%d", function_name, new, old);
@@ -260,7 +261,7 @@ static int audit_log_config_change(char *function_name, int 
new, int old,
if (rc)
allow_changes = 0; /* Something weird, deny request */
audit_log_format(ab, " res=%d", allow_changes);
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
return rc;
 }
 
@@ -276,7 +277,8 @@ static int audit_do_config_change(char *function_name, int 
*to_change, int new)
allow_changes = 1;
 
if (ns->audit.enabled != AUDIT_OFF) {
-   rc = audit_log_config_change(function_name, new, old, 
allow_changes);
+   rc = audit_log_config_change(ns, function_name, new,
+old, allow_changes);
if (rc)
allow_changes = 0;
}
@@ -711,7 +713,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
int new_pid = status_get->pid;
 
if (ns->audit.enabled != AUDIT_OFF)
-   audit_log_config_change("audit_pid", new_pid,
+   audit_log_config_change(ns, "audit_pid",
+   new_pid,
ns->audit.pid, 1);
ns->audit.pid = new_pid;
ns->audit.portid = NETLINK_CB(skb).portid;
-- 
1.8.1.4


[PATCH 18/22] Audit: introduce new audit logging interface for user namespace

2013-06-18 Thread Gao feng

The new interfaces audit_log_start_ns and audit_log_end_ns
will be used to write audit logs in a user namespace.

Signed-off-by: Gao feng 
---
 include/linux/audit.h | 25 --
 kernel/audit.c| 95 ++-
 2 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index cc30db9..b64f268 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -404,10 +404,18 @@ extern __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
   const char *fmt, ...);
 
-extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t 
gfp_mask, int type);
+extern struct audit_buffer *
+audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+
+extern struct audit_buffer *
+audit_log_start_ns(struct user_namespace *ns, struct audit_context *ctx,
+  gfp_t gfp_mask, int type);
+
 extern __printf(2, 3)
 void audit_log_format(struct audit_buffer *ab, const char *fmt, ...);
 extern void audit_log_end(struct audit_buffer *ab);
+extern void audit_log_end_ns(struct user_namespace *ns,
+struct audit_buffer *ab);
 extern int audit_string_contains_control(const char *string,
  size_t len);
 extern void audit_log_n_hex(struct audit_buffer *ab,
@@ -457,8 +465,16 @@ static inline __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
   const char *fmt, ...)
 { }
-static inline struct audit_buffer *audit_log_start(struct audit_context *ctx,
-  gfp_t gfp_mask, int type)
+static inline
+struct audit_buffer *audit_log_start(struct audit_context *ctx,
+gfp_t gfp_mask, int type)
+{
+   return NULL;
+}
+static inline
+struct audit_buffer *audit_log_start_ns(struct user_namespace *ns,
+   struct audit_context *ctx,
+   gfp_t gfp_mask, int type)
 {
return NULL;
 }
@@ -467,6 +483,9 @@ void audit_log_format(struct audit_buffer *ab, const char 
*fmt, ...)
 { }
 static inline void audit_log_end(struct audit_buffer *ab)
 { }
+static inline void audit_log_end_ns(struct user_namespace *ns,
+   struct audit_buffer *ab)
+{ }
 static inline void audit_log_n_hex(struct audit_buffer *ab,
   const unsigned char *buf, size_t len)
 { }
diff --git a/kernel/audit.c b/kernel/audit.c
index 3dcaa97..5d3764c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1112,47 +1112,35 @@ static inline void audit_get_stamp(struct audit_context 
*ctx,
 /*
  * Wait for auditd to drain the queue a little
  */
-static void wait_for_auditd(unsigned long sleep_time)
+static void wait_for_auditd(struct user_namespace *ns,
+   unsigned long sleep_time)
 {
-   const struct sk_buff_head *queue = &init_user_ns.audit.queue;
+   const struct sk_buff_head *queue = &ns->audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
-   add_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
+   add_wait_queue(&ns->audit.backlog_wait, &wait);
 
if (audit_backlog_limit &&
skb_queue_len(queue) > audit_backlog_limit)
schedule_timeout(sleep_time);
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
+   remove_wait_queue(&ns->audit.backlog_wait, &wait);
 }
 
-/**
- * audit_log_start - obtain an audit buffer
- * @ctx: audit_context (may be NULL)
- * @gfp_mask: type of allocation
- * @type: audit message type
- *
- * Returns audit_buffer pointer on success or NULL on error.
- *
- * Obtain an audit buffer.  This routine does locking to obtain the
- * audit buffer, but then no locking is required for calls to
- * audit_log_*format.  If the task (ctx) is a task that is currently in a
- * syscall, then the syscall is marked as auditable and an audit record
- * will be written at syscall exit.  If there is no associated task, then
- * task context (ctx) should be NULL.
- */
-struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
-int type)
+struct audit_buffer *audit_log_start_ns(struct user_namespace *ns,
+   struct audit_context *ctx,
+   gfp_t gfp_mask,
+   int type)
 {
struct audit_buffer *ab = NULL;
struct timespec t;
unsigned int uninitialized_var(serial);
int reserve;
unsigned long timeout_start = jiffies;
-   struct sk_buff_head *queue = &init_user_ns.audit.queue;
+   struct sk_buff_head *queue =

[PATCH 21/22] Audit: send reply message to the auditd in proper user namespace

2013-06-18 Thread Gao feng

Send the audit reply message to the userspace auditd process
running in the same user namespace as the process that sent
the audit request message to the kernel.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 84a882c..0b3fd8b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -146,6 +146,7 @@ struct audit_buffer {
 struct audit_reply {
int pid;
struct sk_buff *skb;
+   struct user_namespace *ns;
 };
 
 static void audit_set_pid(struct audit_buffer *ab, pid_t pid)
@@ -532,8 +533,9 @@ static int audit_send_reply_thread(void *arg)
 
/* Ignore failure. It'll only happen if the sender goes away,
   because our timeout is set to infinite. */
-   netlink_unicast(init_user_ns.audit.sock, reply->skb,
+   netlink_unicast(reply->ns->audit.sock, reply->skb,
reply->pid, 0);
+   put_user_ns(reply->ns);
kfree(reply);
return 0;
 }
@@ -572,11 +574,13 @@ static int audit_send_reply(int pid, int seq, int type, 
int done, int multi,
 
reply->pid = pid;
reply->skb = skb;
+   reply->ns = get_user_ns(current_user_ns());
 
tsk = kthread_run(audit_send_reply_thread, reply, "audit_send_reply");
if (!IS_ERR(tsk))
return 0;
kfree_skb(skb);
+   put_user_ns(reply->ns);
 out:
kfree(reply);
return ret;
@@ -833,7 +837,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
security_release_secctx(ctx, len);
return -ENOMEM;
}
-   sig_data->uid = from_kuid(&init_user_ns, audit_sig_uid);
+   sig_data->uid = from_kuid(ns, audit_sig_uid);
sig_data->pid = audit_sig_pid;
if (audit_sig_sid) {
memcpy(sig_data->ctx, ctx, len);
-- 
1.8.1.4


[PATCH 10/22] Audit: make audit_enabled per user namespace

2013-06-18 Thread Gao feng

This patch makes audit_enabled per user namespace.
For now, the audit_enabled of the init user namespace is still
used to decide whether audit is enabled, no matter which user
namespace we belong to.

Signed-off-by: Gao feng 
---
 include/linux/audit.h  |  4 +++-
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 32 
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 179351d..cc30db9 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -450,7 +450,8 @@ extern int audit_filter_user(int type);
 extern int audit_filter_type(int type);
 extern int  audit_receive_filter(int type, int pid, int seq,
void *data, size_t datasz);
-extern int audit_enabled;
+#define audit_enabled (init_user_ns.audit.enabled)
+#define audit_enabled_ns(ns) (ns->audit.enabled)
 #else /* CONFIG_AUDIT */
 static inline __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
@@ -503,6 +504,7 @@ static inline void audit_set_user_ns(struct user_namespace 
*ns)
 static inline void audit_free_user_ns(struct user_namespace *ns)
 { }
 #define audit_enabled 0
+#define audit_enabled_ns(ns) 0
 #endif /* CONFIG_AUDIT */
 static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
 {
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 60dd6da..9972f0f 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,6 +21,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 #ifdef CONFIG_AUDIT
 struct audit_ctrl {
struct sock *sock;
+   int enabled;
int pid;
int portid;
struct sk_buff_head queue;
diff --git a/kernel/audit.c b/kernel/audit.c
index ca61cf0..758b1e8 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,11 +78,8 @@ static int   audit_initialized;
 #define AUDIT_OFF  0
 #define AUDIT_ON   1
 #define AUDIT_LOCKED   2
-int audit_enabled;
 bool   audit_ever_enabled;
 
-EXPORT_SYMBOL_GPL(audit_enabled);
-
 /* Default state when kernel boots without any parameters. */
 static int audit_default;
 
@@ -274,14 +271,15 @@ static int audit_log_config_change(char *function_name, 
int new, int old,
 static int audit_do_config_change(char *function_name, int *to_change, int new)
 {
int allow_changes, rc = 0, old = *to_change;
+   struct user_namespace *ns = current_user_ns();
 
/* check if we are locked */
-   if (audit_enabled == AUDIT_LOCKED)
+   if (ns->audit.enabled == AUDIT_LOCKED)
allow_changes = 0;
else
allow_changes = 1;
 
-   if (audit_enabled != AUDIT_OFF) {
+   if (ns->audit.enabled != AUDIT_OFF) {
rc = audit_log_config_change(function_name, new, old, 
allow_changes);
if (rc)
allow_changes = 0;
@@ -306,13 +304,14 @@ static int audit_set_backlog_limit(int limit)
return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
 }
 
-static int audit_set_enabled(int state)
+static int audit_set_enabled(struct user_namespace *ns, int state)
 {
int rc;
if (state < AUDIT_OFF || state > AUDIT_LOCKED)
return -EINVAL;
 
-   rc =  audit_do_config_change("audit_enabled", &audit_enabled, state);
+   rc =  audit_do_config_change("audit_enabled", &ns->audit.enabled,
+state);
if (!rc)
audit_ever_enabled |= !!state;
 
@@ -625,7 +624,7 @@ static int audit_log_common_recv_msg(struct audit_buffer 
**ab, u16 msg_type)
int rc = 0;
uid_t uid = from_kuid(&init_user_ns, current_uid());
 
-   if (!audit_enabled) {
+   if (!audit_enabled_ns(&init_user_ns)) {
*ab = NULL;
return rc;
}
@@ -677,7 +676,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
 
switch (msg_type) {
case AUDIT_GET:
-   status_set.enabled   = audit_enabled;
+   status_set.enabled   = ns->audit.enabled;
status_set.failure   = audit_failure;
status_set.pid   = ns->audit.pid;
status_set.rate_limit= audit_rate_limit;
@@ -693,7 +692,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
return -EINVAL;
status_get   = (struct audit_status *)data;
if (status_get->mask & AUDIT_STATUS_ENABLED) {
-   err = audit_set_enabled(status_get->enabled);
+   err = audit_set_enabled(ns, status_get->enabled);
if (err < 0)
return err;
}
@@ -705,7 +704,7 @@ static int audit_receive_msg(struct sk_buff *skb,

[PATCH 09/22] Audit: make audit_nlk_portid per user namespace

2013-06-18 Thread Gao feng

After this patch, audit_nlk_portid is per user namespace.
As in the previous patch, kauditd_send_skb still uses the
audit_nlk_portid of the init user namespace.

Signed-off-by: Gao feng 
---
 include/linux/user_namespace.h |  1 +
 kernel/audit.c | 11 ++-
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index ae69f20..60dd6da 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -22,6 +22,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 struct audit_ctrl {
struct sock *sock;
int pid;
+   int portid;
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct  *kauditd_task;
diff --git a/kernel/audit.c b/kernel/audit.c
index 7b696cd5..ca61cf0 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -89,13 +89,6 @@ static int   audit_default;
 /* If auditing cannot proceed, audit_failure selects what happens. */
 static int audit_failure = AUDIT_FAIL_PRINTK;
 
-/*
- * If audit records are to be written to the netlink socket, audit_pid
- * contains the pid of the auditd process and audit_nlk_portid contains
- * the portid to use to send netlink messages to that process.
- */
-static int audit_nlk_portid;
-
 /* If audit_rate_limit is non-zero, limit the rate of sending audit records
  * to that number per second.  This prevents DoS attacks, but results in
  * audit records being dropped. */
@@ -381,7 +374,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
err = netlink_unicast(init_user_ns.audit.sock, skb,
- audit_nlk_portid, 0);
+ init_user_ns.audit.portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
@@ -716,7 +709,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
audit_log_config_change("audit_pid", new_pid,
ns->audit.pid, 1);
ns->audit.pid = new_pid;
-   audit_nlk_portid = NETLINK_CB(skb).portid;
+   ns->audit.portid = NETLINK_CB(skb).portid;
}
if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(status_get->rate_limit);
-- 
1.8.1.4


[PATCH v6] Cpufreq: Fix governor start/stop race condition

2013-06-18 Thread Xiaoguang Chen

The stop and start operations of a cpufreq governor must be kept in
sequence. If they are not, unexpected behavior can result. For example:

There are 4 CPUs and policy->cpu=cpu0; cpu1/2/3 are linked to cpu0.
The normal sequences are as follows:

1) The current governor is userspace. An application tries to set the
governor to ondemand. It calls __cpufreq_set_policy, which stops the
userspace governor and then starts the ondemand governor.

2) The current governor is userspace. cpu0 now hotplugs in cpu3. It
calls cpufreq_add_policy_cpu, which first stops the userspace
governor and then starts the userspace governor again.

If the two cases above interleave, the sequence becomes:

1) Application stops userspace governor
2)  Hotplug stops userspace governor
3) Application starts ondemand governor
4)  Hotplug starts a governor

In step 4, hotplug is supposed to start the userspace governor, but the
application has already changed the governor to ondemand, so hotplug
starts the ondemand governor a second time.

The solution is to not allow a policy's governor to be stopped more than
once. A governor stop should happen only once per policy; after the
governor is stopped, no further governor stop should be executed. Also
add a mutex to protect __cpufreq_governor so that governor operations
are kept in sequence.
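
The guard can be illustrated with a small single-threaded user-space model.
This is a sketch under stated assumptions, not the kernel implementation:
struct policy_model, governor_callback() and model_cpufreq_governor() are
hypothetical names, and the mutex from the patch is reduced to a comment
because the model runs on one thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum gov_event { GOV_START, GOV_STOP, GOV_LIMITS };

/* Hypothetical stand-in for the relevant part of struct cpufreq_policy. */
struct policy_model {
	bool governor_enabled;	/* governor start/stop flag */
	int callback_ret;	/* value the fake governor callback returns */
};

/* Fake governor callback; a real governor would act on the event. */
static int governor_callback(const struct policy_model *p, enum gov_event ev)
{
	(void)ev;
	return p->callback_ret;
}

static int model_cpufreq_governor(struct policy_model *p, enum gov_event ev)
{
	int ret;

	/* The patch holds cpufreq_governor_lock around this check and update. */
	if ((!p->governor_enabled && ev == GOV_STOP) ||
	    (p->governor_enabled && ev == GOV_START))
		return -EBUSY;	/* duplicate stop or start is rejected */

	if (ev == GOV_STOP)
		p->governor_enabled = false;
	else if (ev == GOV_START)
		p->governor_enabled = true;

	ret = governor_callback(p, ev);
	if (ret) {
		/* Restore the flag if the governor callback failed. */
		if (ev == GOV_STOP)
			p->governor_enabled = true;
		else if (ev == GOV_START)
			p->governor_enabled = false;
	}
	return ret;
}
```

Replaying the interleaved sequence against this model, the second stop (the
hotplug path) gets -EBUSY instead of stopping an already stopped governor.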

Signed-off-by: Xiaoguang Chen 
---
 drivers/cpufreq/cpufreq.c | 24 
 include/linux/cpufreq.h   |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2d53f47..6f5aa6f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -46,6 +46,7 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, 
cpufreq_cpu_data);
 static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
 #endif
 static DEFINE_RWLOCK(cpufreq_driver_lock);
+static DEFINE_MUTEX(cpufreq_governor_lock);
 
 /*
  * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
@@ -1562,6 +1563,21 @@ static int __cpufreq_governor(struct cpufreq_policy 
*policy,
 
pr_debug("__cpufreq_governor for CPU %u, event %u\n",
policy->cpu, event);
+
+   mutex_lock(&cpufreq_governor_lock);
+   if ((!policy->governor_enabled && (event == CPUFREQ_GOV_STOP)) ||
+   (policy->governor_enabled && (event == CPUFREQ_GOV_START))) {
+   mutex_unlock(&cpufreq_governor_lock);
+   return -EBUSY;
+   }
+
+   if (event == CPUFREQ_GOV_STOP)
+   policy->governor_enabled = false;
+   else if (event == CPUFREQ_GOV_START)
+   policy->governor_enabled = true;
+
+   mutex_unlock(&cpufreq_governor_lock);
+
ret = policy->governor->governor(policy, event);
 
if (!ret) {
@@ -1569,6 +1585,14 @@ static int __cpufreq_governor(struct cpufreq_policy 
*policy,
policy->governor->initialized++;
else if (event == CPUFREQ_GOV_POLICY_EXIT)
policy->governor->initialized--;
+   } else {
+   /* Restore original values */
+   mutex_lock(&cpufreq_governor_lock);
+   if (event == CPUFREQ_GOV_STOP)
+   policy->governor_enabled = true;
+   else if (event == CPUFREQ_GOV_START)
+   policy->governor_enabled = false;
+   mutex_unlock(&cpufreq_governor_lock);
}
 
/* we keep one module reference alive for
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 037d36a..1a81b74 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,6 +107,7 @@ struct cpufreq_policy {
unsigned int policy; /* see above */
struct cpufreq_governor *governor; /* see below */
void *governor_data;
+   bool governor_enabled; /* governor start/stop flag */
 
struct work_struct  update; /* if update_policy() needs to be
 * called, but you're in IRQ context */
-- 
1.8.0


[PATCH] kvm api doc: fix section numbers

2013-06-18 Thread Alexey Kardashevskiy

Signed-off-by: Alexey Kardashevskiy 
---
 Documentation/virtual/kvm/api.txt |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 5f91eda..6365fef 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2261,7 +2261,7 @@ return indicates the attribute is implemented.  It does 
not necessarily
 indicate that the attribute can be read or written in the device's
 current state.  "addr" is ignored.
 
-4.77 KVM_ARM_VCPU_INIT
+4.82 KVM_ARM_VCPU_INIT
 
 Capability: basic
 Architectures: arm
@@ -2285,7 +2285,7 @@ Possible features:
  Depends on KVM_CAP_ARM_PSCI.
 
 
-4.78 KVM_GET_REG_LIST
+4.83 KVM_GET_REG_LIST
 
 Capability: basic
 Architectures: arm
@@ -2305,7 +2305,7 @@ This ioctl returns the guest registers that are supported 
for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
-4.80 KVM_ARM_SET_DEVICE_ADDR
+4.84 KVM_ARM_SET_DEVICE_ADDR
 
 Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
 Architectures: arm
@@ -2342,7 +2342,7 @@ and distributor interface, the ioctl must be called after 
calling
 KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs.  Calling
 this ioctl twice for any of the base addresses will return -EEXIST.
 
-4.82 KVM_PPC_RTAS_DEFINE_TOKEN
+4.85 KVM_PPC_RTAS_DEFINE_TOKEN
 
 Capability: KVM_CAP_PPC_RTAS
 Architectures: ppc
-- 
1.7.10.4


Re: linux-next: Tree for Jun 18 (usb/chipidea)

2013-06-18 Thread Peter Chen

On Wed, Jun 19, 2013 at 4:12 AM, Felipe Balbi  wrote:
> Hi,
>
> On Tue, Jun 18, 2013 at 11:52:36AM -0700, Randy Dunlap wrote:
>> On 06/18/13 00:30, Stephen Rothwell wrote:
>> > Hi all,
>> >
>> > Changes since 20130617:
>> >
>>
>>
>> on i386:
>>
>> # CONFIG_USB_PHY is not set
>>
>> drivers/built-in.o: In function `ci_hdrc_probe':
>> core.c:(.text+0x20446b): undefined reference to `of_usb_get_phy_mode'
>>
>>
>> chipidea needs to depend on or select some kind of USB_PHY support...?
>
> hmm, looks like a missing stub to me. Alex ?
>
> --
> balbi

Seems i386 chooses CONFIG_OF, but not CONFIG_USB_PHY.
I will send a patch to fix it.

--
BR,
Peter Chen

Re: [PATCH v4 6/6] mm/pgtable: Don't accumulate addr during pgd prepopulate pmd

2013-06-18 Thread Zhang Yanfei

On 06/19/2013 07:52 AM, Wanpeng Li wrote:
> Changelog:
>  v2 -> v3:
>    * add Michal's Reviewed-by
> 
> The old code accumulates addr to get the right pmd. However,
> the pmds are now preallocated and passed in as a parameter, so
> there is no longer any need to accumulate the addr variable. This
> patch removes it.
> 
> Reviewed-by: Michal Hocko 
> Signed-off-by: Wanpeng Li 

Reviewed-by: Zhang Yanfei 

> ---
>  arch/x86/mm/pgtable.c |4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 17fda6a..dfa537a 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -240,7 +240,6 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t 
> *pgdp)
>  static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t 
> *pmds[])
>  {
>   pud_t *pud;
> - unsigned long addr;
>   int i;
>  
>   if (PREALLOCATED_PMDS == 0) /* Work around gcc-3.4.x bug */
> @@ -248,8 +247,7 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, 
> pgd_t *pgd, pmd_t *pmds[])
>  
>   pud = pud_offset(pgd, 0);
>  
> - for (addr = i = 0; i < PREALLOCATED_PMDS;
> -  i++, pud++, addr += PUD_SIZE) {
> + for (i = 0; i < PREALLOCATED_PMDS; i++, pud++) {
>   pmd_t *pmd = pmds[i];
>  
>   if (i >= KERNEL_PGD_BOUNDARY)


-- 
Thanks.
Zhang Yanfei

Re: [PATCH] Percpu tag allocator

2013-06-18 Thread Kent Overstreet

On Thu, Jun 13, 2013 at 12:23:39PM -0700, Andrew Morton wrote:
> On Thu, 13 Jun 2013 12:15:07 -0700 Tejun Heo  wrote:
> > Oh, I'm sure the current id[r|a] can be improved upon a lot but I'm
> > very skeptical one can reach the level of scalability necessary for,
> > say, pci-e attached extremely high-iops devices while still keeping
> > the lowest number allocation, which can't be achieved without strong
> > synchronization on each alloc/free.
> > 
> > Maybe we can layer things so that we have percpu layer on top of
> > id[r|a] and, say, mapping id to point is still done by idr, or the
> > percpu tag allocator uses ida for tag chunk allocations, but it's
> > still gonna be something extra on top.
> 
> It's not obvious that explicit per-cpu is needed.  Get an ID from
> ida_get_new_above(), multiply it by 16 and store that in device-local
> storage, along with a 16-bit bitmap.  Blam, 30 lines of code and the
> ida_get_new_above() cost is reduced 16x and it's off the map.

I was just rereading emails and realized I should've replied to this.

So, I started out aiming for something simpler. I don't quite follow
what the approach you're suggesting is, but it ends up you really do
need the percpu freelists (buffering batching/freeing from the main
freelist) because ids/tags may not be freed on the cpu they were
allocated on.

In particular, if this is for a driver and the device doesn't implement
per cpu queues, tags are almost always going to be freed on a different
cpu. If you just give each cpu a fraction of the tag space they always
allocate out of (with ida_get_new_above() or similar) - that only helps
allocation, half the cacheline contention is freeing.

I originally wrote this tag allocator to use in a driver for a device
that didn't support multiple hardware queues at all, but it was fast
enough that any cacheline bouncing really hurt.

So that right there gets you to the basic design where you've got a
global freelist and percpu freelists, and you just use the percpu
freelists to batch up allocation/freeing to the global freelist.
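
A single-threaded user-space sketch of that basic shape is below. It is not
the allocator from the patch: struct tag_pool, the constants and the
function names are hypothetical, the "cpus" are plain array indices, and the
locking and tag stealing described in this thread are left out.

```c
#include <assert.h>

#define NR_TAGS		16	/* size of the tag space */
#define NR_CPUS		2	/* number of modelled cpus */
#define PCPU_MAX	4	/* capacity of each percpu freelist */
#define BATCH		2	/* tags moved per refill or flush */

struct tag_pool {
	int global[NR_TAGS];		/* global freelist, used as a stack */
	int nr_global;
	int pcpu[NR_CPUS][PCPU_MAX];	/* percpu freelists, also stacks */
	int nr_pcpu[NR_CPUS];
};

static void pool_init(struct tag_pool *p)
{
	p->nr_global = 0;
	for (int i = 0; i < NR_TAGS; i++)
		p->global[p->nr_global++] = i;
	for (int c = 0; c < NR_CPUS; c++)
		p->nr_pcpu[c] = 0;
}

/* Returns a tag, or -1 if neither this cpu nor the global list has one. */
static int tag_alloc(struct tag_pool *p, int cpu)
{
	if (p->nr_pcpu[cpu] == 0) {
		/* Refill in a batch; the only place allocation touches
		 * the shared global freelist. */
		while (p->nr_pcpu[cpu] < BATCH && p->nr_global > 0)
			p->pcpu[cpu][p->nr_pcpu[cpu]++] =
				p->global[--p->nr_global];
	}
	if (p->nr_pcpu[cpu] == 0)
		return -1;
	return p->pcpu[cpu][--p->nr_pcpu[cpu]];
}

/* Frees onto the freelist of the cpu doing the free, which may differ
 * from the cpu that allocated the tag. Only allocated tags may be freed. */
static void tag_free(struct tag_pool *p, int cpu, int tag)
{
	if (p->nr_pcpu[cpu] == PCPU_MAX) {
		/* Flush a batch back to the global freelist. */
		for (int i = 0; i < BATCH; i++)
			p->global[p->nr_global++] =
				p->pcpu[cpu][--p->nr_pcpu[cpu]];
	}
	p->pcpu[cpu][p->nr_pcpu[cpu]++] = tag;
}
```

A tag allocated on cpu 0 and freed on cpu 1 lands on cpu 1's freelist, so
the next allocation on cpu 1 is served without touching the global list.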

The tag allocator I was going to submit for the aio code was pretty much
just that, nothing more. It was simple. It worked. I was happy with it.

The one concern with this approach is what happens when all the percpu
freelists except yours are full of free tags. Originally, I had an easy
solution - we calculate the size of the percpu freelists based on
nr_tags and num_possible_cpus, so that there can't be more than half of
the tag space stranded like this.

(Keep in mind the main use case is drivers where the tag/id is used to
talk to the device, so you're limited by whatever the hardware people
thought was enough - 16 bits if you're lucky).

But then Tejun went and pointed out, just as I was about to mail it off
- "Hey, what happens if you've got 4096 cpus and not all that many tags?
You've got a nice divide by zero in there".
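
The arithmetic behind that objection can be sketched as follows. The formula
is an assumption chosen only to satisfy the "no more than half the tag space
stranded" constraint described above; the thread does not show the actual
expression or where the division by zero occurred, and
pcpu_freelist_size() is a hypothetical name.

```c
#include <assert.h>

/*
 * Assumed sizing rule: give each cpu at most nr_tags / (2 * nr_cpus) tags,
 * so that all other cpus together can strand at most half of the tag space.
 * nr_cpus must be nonzero.
 */
static unsigned int pcpu_freelist_size(unsigned int nr_tags,
				       unsigned int nr_cpus)
{
	return nr_tags / (2 * nr_cpus);
}
```

With 65536 tags and 8 cpus this gives 4096 tags per cpu, but with 1024 tags
and 4096 cpus the integer division yields 0, so any later computation that
divides by the percpu freelist size would fault.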

After which I proceeded to swear at him a bit, but - well, it's a real
problem. And that is what led to the tag stealing stuff and all the
cmpxchg() shenanigans. And I'm pretty happy with the solution - there's
an elegance to it and I bet if I cared I could come up with a proof that
it's more or less optimal w.r.t. cacheline bounces for some decent
fraction of workloads we might care about. But yeah, it's not as simple
as I would've liked.

Anyways, now you've got an ida/idr api cleanup/rewrite to go along with
it, and it's all nicely integrated. Integrating the percpu tag allocator
with regular ida really doesn't save us any code - the original global
freelist was a stack and like 10 lines of code total.

But having the apis be consistent and having it all be organized and
pretty is nice. I think it is now, anyways :)

Re: [PATCH ] cgroup: rename cont to cgrp

2013-06-18 Thread Li Zefan

Hi Tejun,

Could you apply this patch?

On 2013/6/14 11:17, Li Zefan wrote:
> Cont is short for container. Control groups were named process containers
> at first, but then people found that container already has a meaning in
> the Linux kernel.
> 
> Clean up the leftover variable name @cont.
> 
> Signed-off-by: Li Zefan 
> ---
> 
> I'll clean up this for memcg later.
> 
> ---
>  include/linux/cgroup.h |  4 ++--
>  kernel/cgroup.c| 22 +++---
>  2 files changed, 13 insertions(+), 13 deletions(-)
> 


Re: [PATCH 2/4] ARM: dts: AM33XX: Add PWMSS device tree nodes

2013-06-18 Thread Benoit Cousson


Hi Sebastian,

On 06/18/2013 08:36 AM, Sebastian Andrzej Siewior wrote:
> On 06/12/2013 06:40 PM, Felipe Balbi wrote:
>> On Wed, Jun 12, 2013 at 06:10:32PM +0200, Sebastian Andrzej Siewior wrote:
>>> On 06/06/2013 03:52 PM, Sebastian Andrzej Siewior wrote:
>>>> From: Philip Avinash 
>>>>
>>>> Add PWMSS device tree nodes in relation with ECAP & EHRPWM DT nodes to
>>>> AM33XX SoC family. Also populates device tree nodes for ECAP & EHRPWM by
>>>> adding necessary properties like pwm-cells, base reg & set disabled as
>>>> status.
>>>
>>> Can someone please grab #2 till #4? Paul took just #1 as far as I can
>>> tell.
>>
>> DTS should be Benoit Cousson
>
> So, Benoit. Would you please be so kind and pick up the dts pieces?

That's done. Patches are available in
git://git.kernel.org/pub/scm/linux/kernel/git/bcousson/linux-omap-dt.git for_3.11/dts


Thanks,
Benoit

Re: [PATCH v4 0/9] memcg: make memcg's life cycle the same as cgroup

2013-06-18 Thread Li Zefan

Hi Andrew, any chance for this patchset to be queued for 3.11?

On 2013/6/14 9:53, Li Zefan wrote:
> Hi Andrew,
> 
> All the patches in this patchset have been acked by Michal and Kamezawa-san,
> and they are ready to be merged into -mm.
> 
> I have another pending patchset that kills css_id, which depends on this one.
> 
> 
> Changes since v3:
> - rebased against mmotm 2013-06-06-16-19
> - changed wmb() to smp_wmb() and moved it to memcg_kmem_mark_dead() and added
>   more comment.
> 
> Changes since v2:
> 
> - rebased against 3.10-rc1
> - collected some acks
> - the two memcg bug fixes has been merged into mainline
> - the cgroup core patch has been merged into mainline
> 
> Changes since v1:
> 
> - wrote better changelog and added acked-by and reviewed-by tags
> - revised some comments as suggested by Michal
> - added a wmb() in kmem_cgroup_css_offline(), pointed out by Michal
> - fixed a bug which causes a css_put() never be called
> 
> 
> Now memcg has its own refcnt, so when a cgroup is destroyed, the memcg can
> still be alive. This patchset converts memcg to always use css_get/put, so
> memcg will have the same life cycle as its corresponding cgroup.
> 
> The historical reason that memcg didn't use css_get in some cases is that
> a cgroup couldn't be removed if there were still css refs. The situation has
> changed: rmdir of a cgroup now succeeds regardless of css refs, but the
> cgroup won't be freed until the css refs drop to 0.
> 
> Since the introduction of kmemcg, the memcg refcnt handling has grown even
> more complicated. This patchset greatly simplifies memcg's life cycle
> management.
> 
> Also, after those changes, we can convert memcg to use cgroup->id, and then
> we can kill css_id.
> 
> Li Zefan (7):
>   memcg: use css_get() in sock_update_memcg()
>   memcg: don't use mem_cgroup_get() when creating a kmemcg cache
>   memcg: use css_get/put when charging/uncharging kmem
>   memcg: use css_get/put for swap memcg
>   memcg: don't need to get a reference to the parent
>   memcg: kill memcg refcnt
>   memcg: don't need to free memcg via RCU or workqueue
> 
> Michal Hocko (2):
>   Revert "memcg: avoid dangling reference count in creation failure."
>   memcg, kmem: fix reference count handling on the error path
> 
>  mm/memcontrol.c | 208 +---
>  1 file changed, 77 insertions(+), 131 deletions(-)
> 

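The life cycle described in the cover letter (removal succeeds right away, but the object is freed only when the last reference is dropped) can be illustrated with a small user-space sketch. This is only a model under stated assumptions: struct demo_css and the demo_*() helpers are hypothetical names invented for this illustration and are not the kernel's css_get()/css_put() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a css-like object; not the kernel's struct. */
struct demo_css {
	int refcnt;	/* outstanding references, including the base one */
	bool online;	/* false once the owning group has been removed */
	bool freed;	/* set when the last reference is dropped */
};

/* Start with one base reference held by the group itself. */
static void demo_init(struct demo_css *css)
{
	css->refcnt = 1;
	css->online = true;
	css->freed = false;
}

/* Take a reference, e.g. on behalf of an outstanding charge. */
static void demo_get(struct demo_css *css)
{
	assert(!css->freed);
	css->refcnt++;
}

/* Drop a reference; the object is freed only when the count reaches 0. */
static void demo_put(struct demo_css *css)
{
	assert(css->refcnt > 0);
	if (--css->refcnt == 0)
		css->freed = true;
}

/* Removal always succeeds: mark offline and drop the base reference. */
static void demo_rmdir(struct demo_css *css)
{
	css->online = false;
	demo_put(css);
}
```

In this model, calling demo_rmdir() while another holder still has a reference leaves the object offline but not freed; the final demo_put() by that holder is what frees it.
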

Re: next-20130607 BUG: Bad page state in process systemd pfn:127643

2013-06-18 Thread Chen Gang

Hello Valdis:

Could you revert the patch (8094dd4 mm/page_alloc.c: add additional
checking and return value for the 'table->data') and try again to
see whether the issue still occurs?

If the issue no longer occurs, it means my patch causes it directly, and I
should continue to analyze it to find the root cause.

Otherwise (the issue still occurs), please continue to revert the other
related patches step by step to catch the direct cause (to speed this up,
you can revert half of the related patches each time and check the result).
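
The halving approach suggested here is a binary search over the patch series. A minimal sketch follows, assuming the issue is absent with no related patches applied, present with all of them applied, and stays present once the offending patch is included. The names demo_find_first_bad() and demo_bad_from_patch_5() are hypothetical and exist only for this illustration; in practice `git bisect` automates the same search.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if the issue reproduces with the
 * first n_applied patches of the series applied. */
typedef bool (*demo_test_fn)(int n_applied);

/*
 * Return the 1-based index of the first patch whose inclusion triggers
 * the issue. Assumes is_bad(0) is false, is_bad(n_patches) is true, and
 * every prefix that contains the bad patch is also bad.
 */
static int demo_find_first_bad(int n_patches, demo_test_fn is_bad)
{
	int good = 0;		/* largest prefix known to be good */
	int bad = n_patches;	/* smallest prefix known to be bad */

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example predicate for illustration only: pretend patch 5 is the culprit. */
static bool demo_bad_from_patch_5(int n_applied)
{
	return n_applied >= 5;
}
```

Each test halves the remaining range, so a series of N related patches needs roughly log2(N) build-and-boot cycles instead of N.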

Once the direct cause is caught, as one of the people involved, I need to
continue working with you to find the root cause.

Thanks.


Also, sorry for the late reply. You already sent the mail to me, but I
don't know why I did not receive it. Yesterday I checked my other mail
account, which also receives the mailing list, and found this mail there.

So if you send or cc a mail to me and get no response, please re-send it.


Thanks.
-- 
Chen Gang

Asianux Corporation