On 18 September 2013 11:05, Zhang Haijun <b42...@freescale.com> wrote:
>
> 于 2013/9/18 16:10, Ulf Hansson 写道:
>
>> On 18 September 2013 05:52, Zhang Haijun <b42...@freescale.com> wrote:
>>>
>>> 于 2013/9/17 21:21, Ulf Hansson 写道:
>>>
>>>> Hi Haijun,
>>>>
>>>> On 17 September 2013 11:03, Haijun Zhang <haijun.zh...@freescale.com>
>>>> wrote:
>>>>>
>>>>> When card is in cpu polling mode to detect card present. Card detecting
>>>>> task will be scheduled about once every second. When card is busy in
>>>>> large
>>>>> file transfer, detecting task will be hang and call trace will be
>>>>> prompt.
>>>>> When handling the request, CMD13 is always followed by every command
>>>>> when
>>>>> it was complete. So assume that card is present to avoid this duplicate
>>>>> detecting. Only polling card when card is free to reduce conflict with
>>>>> data transfer.
>>>>>
>>>>> <7>mmc0: req done (CMD13): 0: 00000e00 00000000 00000000 00000000
>>>>> INFO: task kworker/u:1:12 blocked for more than 120 seconds.
>>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>>>>> message.
>>>>> kworker/u:1     D 00000000     0    12      2 0x00000000
>>>>> Call Trace:
>>>>> [ee06dd50] [44042028] 0x44042028
>>>>> (unreliable)
>>>>> [ee06de10] [c0007a0c] __switch_to+0xa0/0xf0
>>>>> [ee06de30] [c04dd50c] __schedule+0x1f8/0x4a4
>>>>>
>>>>> [ee06dea0] [c04dd898] schedule+0x30/0xbc
>>>>>
>>>>> [ee06deb0] [c03816a4] __mmc_claim_host+0x98/0x19c
>>>>>
>>>>> [ee06df00] [c0385f88] mmc_sd_detect+0x38/0xc0
>>>>>
>>>>> [ee06df10] [c0382b0c] mmc_rescan+0x294/0x2e0
>>>>> [ee06df40] [c00661cc] process_one_work+0x140/0x3e0
>>>>>
>>>>> [ee06df70] [c0066bf8] worker_thread+0x18c/0x36c
>>>>> [ee06dfb0] [c006bf10] kthread+0x7c/0x80
>>>>>
>>>>> [ee06dff0] [c000de58] kernel_thread+0x4c/0x68
>>>>> <7>sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001
>>>>>
>>>>> Signed-off-by: Haijun Zhang <haijun.zh...@freescale.com>
>>>>> ---
>>>>> changes for V2:
>>>>>           - Add card detecting once the last request is done.
>>>>>
>>>>>    drivers/mmc/card/block.c | 28 ++++++++++++++++++++++++++--
>>>>>    drivers/mmc/core/core.c  |  5 +++++
>>>>>    drivers/mmc/core/mmc.c   |  3 ++-
>>>>>    drivers/mmc/core/sd.c    |  3 ++-
>>>>>    drivers/mmc/core/sdio.c  |  3 ++-
>>>>>    5 files changed, 37 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>>>>> index 1a3163f..f280320 100644
>>>>> --- a/drivers/mmc/card/block.c
>>>>> +++ b/drivers/mmc/card/block.c
>>>>> @@ -1960,9 +1960,20 @@ static int mmc_blk_issue_rq(struct mmc_queue
>>>>> *mq,
>>>>> struct request *req)
>>>>>           struct mmc_host *host = card->host;
>>>>>           unsigned long flags;
>>>>>
>>>>> -       if (req && !mq->mqrq_prev->req)
>>>>> +       if (req && !mq->mqrq_prev->req) {
>>>>> +               /*
>>>>> +                * When we are here, card polling task will be blocked.
>>>>> +                * So disable it to void this useless schedule.
>>>>
>>>> "avoid"
>>>
>>> :-)
>>>
>>>>> +                */
>>>>> +               if (host->caps & MMC_CAP_NEEDS_POLL) {
>>>>> +                       spin_lock_irqsave(&host->lock, flags);
>>>>> +                       host->rescan_disable = 1;
>>>>> +                       spin_unlock_irqrestore(&host->lock, flags);
>>>>
>>>> I believe you need to add "cancel_delayed_work_sync(&host->detect)"
>>>> well!?
>>>>
>>>> Do note, that I am not confident sure that cancel_delayed_work_sync is
>>>> safe to run this from this context. What about the scenario were the
>>>
>>> When we are here the current task is still running,
>>> Invoke cancel_delayed_work_sync here will cause the task hang.
>>>
>>>> detect work is just about to remove a card then will be waiting for
>>>> the block queue to close, which then waits for the
>>>> cancel_delayed_work_sync to end. :-)
>>>
>>> Let me confirm whether I understand what you mean.
>>> You worried about the remove task is block by the block queue. This patch
>>> changed the priority of remove task?
>>> There several scenario here:
>>> a . schedule detect work( HZ )
>>> b . block work queue
>>> c . schedule detect work immediately ( 0 )
>>> d. detect remove task running
>>> 1. a->b->c->d
>>> 2. a->d->b->c->b->c->d
>>> 3. a->b->b->c->b->c->b->d
>>>
>>> So if step 'd' is really running is depend on the followed task 'b'.
>>>
>>> Or, I miss something else? Hope to make this clear.:-)
>>>
>>>>
>>>>> +               }
>>>>> +
>>>>>                   /* claim host only for the first request */
>>>>>                   mmc_get_card(card);
>>>>> +       }
>>>>>
>>>>>           ret = mmc_blk_part_switch(card, md);
>>>>>           if (ret) {
>>>>> @@ -1999,7 +2010,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
>>>>> struct request *req)
>>>>>
>>>>>    out:
>>>>>           if ((!req && !(mq->flags & MMC_QUEUE_NEW_REQUEST)) ||
>>>>> -            (req && (req->cmd_flags & MMC_REQ_SPECIAL_MASK)))
>>>>> +            (req && (req->cmd_flags & MMC_REQ_SPECIAL_MASK))) {
>>>>>                   /*
>>>>>                    * Release host when there are no more requests
>>>>>                    * and after special request(discard, flush) is done.
>>>>> @@ -2007,6 +2018,19 @@ out:
>>>>>                    * the 'mmc_blk_issue_rq' with 'mqrq_prev->req'.
>>>>>                    */
>>>>>                   mmc_put_card(card);
>>>>> +
>>>>> +               /*
>>>>> +                * Detecting card status immediately in case card being
>>>>> +                * removed just after the request is complete.
>>>>> +                */
>>>>> +               if (host->caps & MMC_CAP_NEEDS_POLL) {
>>>>> +                       spin_lock_irqsave(&host->lock, flags);
>>>>> +                       host->rescan_disable = 0;
>>>>> +                       spin_unlock_irqrestore(&host->lock, flags);
>>>>> +                       mmc_detect_change(host, 0);
>>>>> +               }
>>>>> +       }
>>>>> +
>>>>>           return ret;
>>>>>    }
>>>>>
>>>>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>>>>> index b9b9fb6..2831c03 100644
>>>>> --- a/drivers/mmc/core/core.c
>>>>> +++ b/drivers/mmc/core/core.c
>>>>> @@ -925,15 +925,20 @@ EXPORT_SYMBOL(__mmc_claim_host);
>>>>>     */
>>>>>    int mmc_try_claim_host(struct mmc_host *host)
>>>>>    {
>>>>> +       struct mmc_card *card;
>>>>>           int claimed_host = 0;
>>>>>           unsigned long flags;
>>>>>
>>>>> +       card = host->card;
>>>>> +       pm_runtime_get_sync(&card->dev);
>>>>
>>>> Nope, this wont work. pm_runtime_get_sync can trigger an
>>>> mmc_claim_host, thus this entire function is not "non-blocking" any
>>>> more.
>>>
>>> I missed this. so this check should be invoked before schedule detect
>>> work.
>>
>> Not sure how you mean? It is the task executing the detect work that
>> need to be able to claim the host in a non-blocking manner, right?
>
> Yes.  so no need pm_runtime_get_sync here?

You do need pm_runtime_get_sync, since that will give you provision
for sending a cmd/request to the card. This function will then not be
non-blocking - so you are in a limbo. :-)

Even if you would manage to split up this function in a better way and
to make it non-blocking, you must not allow mmc_sdio_detect,
mmc_detect and mmc_sd_detect to accept a failure from a non-blocking
function and then just return, which is what you do in this patch. In
principle it means that we could sometimes ignore to properly remove a
card.

So this solution just does not work, as is.

>
>>>> Could and option be to just do a best effort? Skipping the try_claim
>>>> entirely -  maybe?
>>>
>>> You mean just to check the card status without hold the task? or rewrite
>>> another
>>> function like: non_block_claim_host?
>>
>> I mean you can leave the code as is, before this patch, for
>> drivers/mmc/core/* and only rely on the fix you done in the
>> drivers/mmc/card/block.c. It wont be perfect but still better. :-)
>> That is what I thought of, unless we can fix this without breaking
>> something else.
>
> If we still use mmc_claim_host, my chang in block.c still can't work.
> mmc_claim_host will still be pending and blocked by the followed detect task
> work for more than 120s.
> Work queue from block layer is not expected.
> Except you cleaned the detect task before every work sequence from block
> layer.
> This is the key problem.

Yes, that is why I was saying best effort. :-)

> BTW "breaking something", you mean?

See more above.

>
>>
>> Kind regards
>> Ulf Hansson
>>
>>>>> +
>>>>>           spin_lock_irqsave(&host->lock, flags);
>>>>>           if (!host->claimed || host->claimer == current) {
>>>>>                   host->claimed = 1;
>>>>>                   host->claimer = current;
>>>>>                   host->claim_cnt += 1;
>>>>>                   claimed_host = 1;
>>>>> +               set_current_state(TASK_RUNNING);
>>>>>           }
>>>>>           spin_unlock_irqrestore(&host->lock, flags);
>>>>>           if (host->ops->enable && claimed_host && host->claim_cnt ==
>>>>> 1)
>>>>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>>>>> index 6d02012..90e5555 100644
>>>>> --- a/drivers/mmc/core/mmc.c
>>>>> +++ b/drivers/mmc/core/mmc.c
>>>>> @@ -1447,7 +1447,8 @@ static void mmc_detect(struct mmc_host *host)
>>>>>           BUG_ON(!host);
>>>>>           BUG_ON(!host->card);
>>>>>
>>>>> -       mmc_get_card(host->card);
>>>>> +       if (!mmc_try_claim_host(host))
>>>>> +               return;
>>>>>
>>>>>           /*
>>>>>            * Just check if our card has been removed.
>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>> index 5e8823d..7dfb24d 100644
>>>>> --- a/drivers/mmc/core/sd.c
>>>>> +++ b/drivers/mmc/core/sd.c
>>>>> @@ -1045,7 +1045,8 @@ static void mmc_sd_detect(struct mmc_host *host)
>>>>>           BUG_ON(!host);
>>>>>           BUG_ON(!host->card);
>>>>>
>>>>> -       mmc_get_card(host->card);
>>>>> +       if (!mmc_try_claim_host(host))
>>>>> +               return;
>>>>>
>>>>>           /*
>>>>>            * Just check if our card has been removed.
>>>>> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
>>>>> index 80d89cff..e0cabf5 100644
>>>>> --- a/drivers/mmc/core/sdio.c
>>>>> +++ b/drivers/mmc/core/sdio.c
>>>>> @@ -875,7 +875,8 @@ static void mmc_sdio_detect(struct mmc_host *host)
>>>>>                   }
>>>>>           }
>>>>>
>>>>> -       mmc_claim_host(host);
>>>>> +       if (!mmc_try_claim_host(host))
>>>>> +               return;
>>>>>
>>>>>           /*
>>>>>            * Just check if our card has been removed.
>>>>> --
>>>>> 1.8.0
>>>>>

Considering your "bug" you are trying to solve. Could it just not be
that for MMC_CAP_NEEDS_POLL the work-queue task hangtimeout should be
set to "infinite" instead of 120s? Can that be configured?

Maybe there are other solutions as well, I just can't think of any
good one at this moment. :-)

Kind regards
Ulf Hansson

>>>>>
>>>> Kind regards
>>>> Ulf Hansson
>>>>
>>> --
>>> Thanks & Regards
>>> Haijun.
>>>
>>>
>
> --
> Thanks & Regards
> Haijun.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to