Re: your mail
Thanks Ray for pointing this out. Looks like the mail ended up in my spam folder otherwise. Apart from that this patch is a really really big NAK. I can't count how often I had to reject stuff like this! Using the page reference for TTM pages is illegal and can lead to struct page corruption. Can you please describe why you need that? Regards, Christian. Am 07.04.21 um 10:25 schrieb Huang Rui: On Wed, Apr 07, 2021 at 09:27:46AM +0800, songqiang wrote: Please add the description in the commit message and subject. Thanks, Ray Signed-off-by: songqiang --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 14660f723f71..f3698f0ad4d7 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -736,8 +736,16 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, if (++p != pages[i + j]) break; -if (j == HPAGE_PMD_NR) + if (j == HPAGE_PMD_NR) { order = HPAGE_PMD_ORDER; + for (j = 1; j < HPAGE_PMD_NR; ++j) + page_ref_dec(pages[i+j]); + } } #endif @@ -868,10 +876,12 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags, p = alloc_pages(huge_flags, HPAGE_PMD_ORDER); if (!p) break; - - for (j = 0; j < HPAGE_PMD_NR; ++j) + for (j = 0; j < HPAGE_PMD_NR; ++j) { pages[i++] = p++; - + if (j > 0) + page_ref_inc(pages[i-1]); + } npages -= HPAGE_PMD_NR; } } ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=04%7C01%7Cray.huang%40amd.com%7C4ccc617b77d746db5af108d8f98db612%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533734805563118%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=9bSP90LYdJyJYJYmuphVmqk%2B3%2FE4JPrtXkQTbxwAt68%3D&reserved=0
Re: your mail
On Wed, Apr 07, 2021 at 09:27:46AM +0800, songqiang wrote: Please add the description in the commit message and subject. Thanks, Ray > Signed-off-by: songqiang > --- > drivers/gpu/drm/ttm/ttm_page_alloc.c | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c > b/drivers/gpu/drm/ttm/ttm_page_alloc.c > index 14660f723f71..f3698f0ad4d7 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > @@ -736,8 +736,16 @@ static void ttm_put_pages(struct page **pages, unsigned > npages, int flags, > if (++p != pages[i + j]) > break; > > - if (j == HPAGE_PMD_NR) > + if (j == HPAGE_PMD_NR) { > order = HPAGE_PMD_ORDER; > + for (j = 1; j < HPAGE_PMD_NR; ++j) > + page_ref_dec(pages[i+j]); > + } > } > #endif > > @@ -868,10 +876,12 @@ static int ttm_get_pages(struct page **pages, unsigned > npages, int flags, > p = alloc_pages(huge_flags, HPAGE_PMD_ORDER); > if (!p) > break; > - > - for (j = 0; j < HPAGE_PMD_NR; ++j) > + for (j = 0; j < HPAGE_PMD_NR; ++j) { > pages[i++] = p++; > - > + if (j > 0) > + page_ref_inc(pages[i-1]); > + } > npages -= HPAGE_PMD_NR; > } > } > > > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=04%7C01%7Cray.huang%40amd.com%7C4ccc617b77d746db5af108d8f98db612%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637533734805563118%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=9bSP90LYdJyJYJYmuphVmqk%2B3%2FE4JPrtXkQTbxwAt68%3D&reserved=0
Re: your mail
On Thu, Apr 01, 2021 at 02:16:09PM -0700, Bhaumik Bhatt wrote: > Subject: [PATCH v8 0/9] Updates to MHI channel handling > Subject is present in the body ;) > MHI specification shows a state machine with support for STOP channel command > and the validity of certain state transitions. MHI host currently does not > provide any mechanism to stop a channel and restart it without resetting it. > There are also times when the device moves on to a different execution > environment while client drivers on the host are unaware of it and still > attempt to reset the channels facing unnecessary timeouts. > > This series addresses the above areas to provide support for stopping an MHI > channel, resuming it back, improved documentation and improving upon channel > state machine handling in general. > > This set of patches was tested on arm64 and x86_64 architecture. > Series applied to mhi-next! Thanks, Mani > v8: > -Split the state machine improvements patch to three patches as per review > > v7: > -Tested on x86_64 architecture > -Drop the patch "Do not clear channel context more than once" as issue is > fixed > differently using "bus: mhi: core: Fix double dma free()" > -Update the commit text to better reflect changes on state machine > improvements > > v6: > -Dropped the patch which introduced start/stop transfer APIs for lack of users > -Updated error handling and debug prints on channel handling improvements > patch > -Improved commit text to better explain certain patches based on review > comments > -Removed references to new APIs from the documentation improvement patch > > v5: > -Added reviewed-by tags from Hemant I missed earlier > -Added patch to prevent kernel warnings on clearing channel context twice > > v4: > -Updated commit text/descriptions and addressed checkpatch checks > -Added context validity check before starting/stopping channels from new API > -Added patch to clear channel context configuration after reset/unprepare > > v3: > -Updated documentation for channel transfer APIs to highlight differences > -Create separate patch for "allowing channel to be disabled from stopped > state" > > v2: > -Renamed the newly introduced APIs to mhi_start_transfer() / > mhi_stop_transfer() > -Added improved documentation to avoid confusion with the new APIs > -Removed the __ prefix from mhi_unprepare_channel() API for consistency. > > Bhaumik Bhatt (9): > bus: mhi: core: Allow sending the STOP channel command > bus: mhi: core: Clear context for stopped channels from remove() > bus: mhi: core: Improvements to the channel handling state machine > bus: mhi: core: Update debug messages to use client device > bus: mhi: core: Hold device wake for channel update commands > bus: mhi: core: Clear configuration from channel context during reset > bus: mhi: core: Check channel execution environment before issuing > reset > bus: mhi: core: Remove __ prefix for MHI channel unprepare function > bus: mhi: Improve documentation on channel transfer setup APIs > > drivers/bus/mhi/core/init.c | 22 - > drivers/bus/mhi/core/internal.h | 12 +++ > drivers/bus/mhi/core/main.c | 190 > > include/linux/mhi.h | 18 +++- > 4 files changed, 162 insertions(+), 80 deletions(-) > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project >
Re: your mail
On Mon, Mar 22, 2021 at 05:36:44PM -0400, Steven Rostedt wrote: $@#@#$%%% Bah! There was another typo in the email list! Take 3 -- Steve
Re: your mail
On Mon, Mar 22, 2021 at 05:21:56PM -0400, Steven Rostedt wrote: Bah! John 'Warthog' Hawley email had those single quotes in it that I cut and pasted into the Cc list, causing the quilt mail parsing to fail, but as LKML was in the "To" part, it still sent! Take 2 -- Steve
Re: your mail
On Wed, Dec 02, 2020 at 08:51:27PM +0200, Andy Shevchenko wrote: > On Thu, Dec 03, 2020 at 03:27:33AM +0900, Yun Levi wrote: > > On Thu, Dec 3, 2020 at 2:36 AM Andy Shevchenko > > wrote: > > > On Wed, Dec 02, 2020 at 09:26:05AM -0800, Yury Norov wrote: ... > > > Side note: speaking of performance, any plans to fix for_each_*_bit*() for > > > cases when the nbits is known to be <= BITS_PER_LONG? > > > > > > Now it makes an awful code generation (something like few hundred bytes of > > > code). > > > Frankly Speaking, I don't have an idea in now. > > Could you share your idea or wisdom? > > Something like (I may be mistaken by names, etc, I'm not a compiler expert, > and this is in pseudo language, I don't remember all API names by hart, > just to express the idea) as a rough first step > > __builtin_constant(nbits, find_next_set_bit_long, find_next_set_bit) > > find_next_set_bit_long() > { > unsigned long v = BIT_LAST_WORD(i); > return ffs_long(v); > } > > Same for find_first_set_bit() -> map it to ffs_long(). > > And I believe it can be optimized more. Btw it will also require to reconsider test cases where such constant small nbits values are passed (forcing compiler to avoid optimization somehow, one way is to try random nbits for some test cases). -- With Best Regards, Andy Shevchenko
Re: your mail
On Thu, Dec 03, 2020 at 03:27:33AM +0900, Yun Levi wrote: > On Thu, Dec 3, 2020 at 2:36 AM Andy Shevchenko > wrote: > > On Wed, Dec 02, 2020 at 09:26:05AM -0800, Yury Norov wrote: ... > > Side note: speaking of performance, any plans to fix for_each_*_bit*() for > > cases when the nbits is known to be <= BITS_PER_LONG? > > > > Now it makes an awful code generation (something like few hundred bytes of > > code). > Frankly Speaking, I don't have an idea in now. > Could you share your idea or wisdom? Something like (I may be mistaken by names, etc, I'm not a compiler expert, and this is in pseudo language, I don't remember all API names by hart, just to express the idea) as a rough first step __builtin_constant(nbits, find_next_set_bit_long, find_next_set_bit) find_next_set_bit_long() { unsigned long v = BIT_LAST_WORD(i); return ffs_long(v); } Same for find_first_set_bit() -> map it to ffs_long(). And I believe it can be optimized more. -- With Best Regards, Andy Shevchenko
Re: your mail
On Mon 17 Aug 17:12 UTC 2020, Amit Pundir wrote: > On Thu, 13 Aug 2020 at 12:38, Bjorn Andersson > wrote: > > > > On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote: > > > > > Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 > > > (Beryllium) > > > > > > >// This removed_region is needed to boot the device > > > > // TODO: Find out the user of this reserved memory > > > > removed_region: memory@88f0 { > > > > > > This region seems to belong to the Trust Zone. When Linux tries to access > > > it, TZ bites and shuts the device down. > > > > > > > This is in line with what the documentation indicates and then it would > > be better to just bump &tz_mem to a size of 0x490. > > Hi, so just to be sure that I got this right, you want me to extend > &tz_mem to the size of 0x490 from the default size of 0x2D0 by > including this downstream &removed_region (of size 0x1A0) + > previously unreserved downstream memory region (of size 0x20), to > align with the starting address of &qseecom_mem? > Yes Regards, Bjorn > I just gave this &tz_mem change a spin and I do not see any obvious > regression in my limited smoke testing (Boots AOSP to UI with > v5.9-rc1. Touch/BT/WiFi works) so far, with 20+ out-of-tree patches. > > Regards, > Amit Pundir > > > > > Regards, > > Bjorn
Re: your mail
On Thu, 13 Aug 2020 at 12:38, Bjorn Andersson wrote: > > On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote: > > > Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 > > (Beryllium) > > > > >// This removed_region is needed to boot the device > > > // TODO: Find out the user of this reserved memory > > > removed_region: memory@88f0 { > > > > This region seems to belong to the Trust Zone. When Linux tries to access > > it, TZ bites and shuts the device down. > > > > This is in line with what the documentation indicates and then it would > be better to just bump &tz_mem to a size of 0x490. Hi, so just to be sure that I got this right, you want me to extend &tz_mem to the size of 0x490 from the default size of 0x2D0 by including this downstream &removed_region (of size 0x1A0) + previously unreserved downstream memory region (of size 0x20), to align with the starting address of &qseecom_mem? I just gave this &tz_mem change a spin and I do not see any obvious regression in my limited smoke testing (Boots AOSP to UI with v5.9-rc1. Touch/BT/WiFi works) so far, with 20+ out-of-tree patches. Regards, Amit Pundir > > Regards, > Bjorn
Re: your mail
On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote: > Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 > (Beryllium) > > >// This removed_region is needed to boot the device > > // TODO: Find out the user of this reserved memory > > removed_region: memory@88f0 { > > This region seems to belong to the Trust Zone. When Linux tries to access it, > TZ bites and shuts the device down. > This is in line with what the documentation indicates and then it would be better to just bump &tz_mem to a size of 0x490. Regards, Bjorn
Re: your mail
On Tue, Jun 09, 2020 at 07:38:38AM -0400, Gaurav Singh wrote: > Please find the patch below. > > Thanks and regards, > Gaurav. > > >From Gaurav Singh # This line is ignored. > From: Gaurav Singh > Reply-To: > Subject: > In-Reply-To: > > I think something went wrong in your submission, just use 'git send-email' to send the patch out. thanks, greg k-h
Re: your mail
On Wed, May 06, 2020 at 01:52:45PM +0800, Jiaxun Yang wrote: > Subject: [PATCH v6] MIPS: Truncate link address into 32bit for 32bit kernel > In-Reply-To: <20200413062651.3992652-1-jiaxun.y...@flygoat.com> > > LLD failed to link vmlinux with 64bit load address for 32bit ELF > while bfd will strip 64bit address into 32bit silently. > To fix LLD build, we should truncate load address provided by platform > into 32bit for 32bit kernel. > > Signed-off-by: Jiaxun Yang > Link: https://github.com/ClangBuiltLinux/linux/issues/786 > Link: https://sourceware.org/bugzilla/show_bug.cgi?id=25784 > Reviewed-by: Fangrui Song > Reviewed-by: Kees Cook > Tested-by: Nathan Chancellor > Cc: Maciej W. Rozycki > --- > V2: Take MaskRay's shell magic. > > V3: After spent an hour on dealing with special character issue in > Makefile, I gave up to do shell hacks and write a util in C instead. > Thanks Maciej for pointing out Makefile variable problem. > > v4: Finally we managed to find a Makefile method to do it properly > thanks to Kees. As it's too far from the initial version, I removed > Review & Test tag from Nick and Fangrui and Cc instead. > > v5: Care vmlinuz as well. > > v6: Rename to LIKER_LOAD_ADDRESS > --- > arch/mips/Makefile | 13 - > arch/mips/boot/compressed/Makefile | 2 +- > arch/mips/kernel/vmlinux.lds.S | 2 +- > 3 files changed, 14 insertions(+), 3 deletions(-) applied to mips-next. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea.[ RFC1925, 2.3 ]
Re: your mail
On Wed, Jun 26, 2019 at 04:52:36PM +0200, Sebastian Andrzej Siewior wrote: > A small series of tiny cleanups. Applied 1-2 to wq/for-5.3. Thanks. -- tejun
Re: your mail
Peter Zijlstra's on April 11, 2019 8:53 pm: > Was this supposed to be patch 6/5 of your previous series? Dang, I screwed up the headers? Thanks for the ping, I will resend. It is standalone. It seems more suited to the scheduler tree than the timers one, but your call. It is generally of more use when CPU0 is _not_ a housekeeping one, and that's where I've done most testing, but I don't see any hard dependency. Thanks, Nick > > On Thu, Apr 11, 2019 at 04:05:36PM +1000, Nicholas Piggin wrote: >> Date: Tue, 9 Apr 2019 20:23:16 +1000 >> Subject: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC >> CPUs >> >> The nohz idle balancer runs on the lowest idle CPU. This can >> interfere with isolated CPUs, so confine it to HK_FLAG_MISC >> housekeeping CPUs. >> >> HK_FLAG_SCHED is not used for this because it is not set anywhere >> at the moment. This could be folded into HK_FLAG_SCHED once that >> option is fixed. >> >> The problem was observed with increased jitter on an application >> running on CPU0, caused by nohz idle load balancing being run on >> CPU1 (an SMT sibling). >> >> Signed-off-by: Nicholas Piggin >> --- >> kernel/sched/fair.c | 16 ++-- >> 1 file changed, 10 insertions(+), 6 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index fdab7eb6f351..d29ca323214d 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -9522,22 +9522,26 @@ static inline int on_null_domain(struct rq *rq) >> * - When one of the busy CPUs notice that there may be an idle rebalancing >> * needed, they will kick the idle load balancer, which then does idle >> * load balancing for all the idle CPUs. >> + * - HK_FLAG_MISC CPUs are used for this task, because HK_FLAG_SCHED not set >> + * anywhere yet. >> */ >> >> static inline int find_new_ilb(void) >> { >> -int ilb = cpumask_first(nohz.idle_cpus_mask); >> +int ilb; >> >> -if (ilb < nr_cpu_ids && idle_cpu(ilb)) >> -return ilb; >> +for_each_cpu_and(ilb, nohz.idle_cpus_mask, >> + housekeeping_cpumask(HK_FLAG_MISC)) { >> +if (idle_cpu(ilb)) >> +return ilb; >> +} >> >> return nr_cpu_ids; >> } >> >> /* >> - * Kick a CPU to do the nohz balancing, if it is time for it. We pick the >> - * nohz_load_balancer CPU (if there is one) otherwise fallback to any idle >> - * CPU (if there is one). >> + * Kick a CPU to do the nohz balancing, if it is time for it. We pick any >> + * idle CPU in the HK_FLAG_MISC housekeeping set (if there is one). >> */ >> static void kick_ilb(unsigned int flags) >> { >> -- >> 2.20.1 >> >
Re: your mail
Was this supposed to be patch 6/5 of your previous series? On Thu, Apr 11, 2019 at 04:05:36PM +1000, Nicholas Piggin wrote: > Date: Tue, 9 Apr 2019 20:23:16 +1000 > Subject: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC > CPUs > > The nohz idle balancer runs on the lowest idle CPU. This can > interfere with isolated CPUs, so confine it to HK_FLAG_MISC > housekeeping CPUs. > > HK_FLAG_SCHED is not used for this because it is not set anywhere > at the moment. This could be folded into HK_FLAG_SCHED once that > option is fixed. > > The problem was observed with increased jitter on an application > running on CPU0, caused by nohz idle load balancing being run on > CPU1 (an SMT sibling). > > Signed-off-by: Nicholas Piggin > --- > kernel/sched/fair.c | 16 ++-- > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index fdab7eb6f351..d29ca323214d 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -9522,22 +9522,26 @@ static inline int on_null_domain(struct rq *rq) > * - When one of the busy CPUs notice that there may be an idle rebalancing > * needed, they will kick the idle load balancer, which then does idle > * load balancing for all the idle CPUs. > + * - HK_FLAG_MISC CPUs are used for this task, because HK_FLAG_SCHED not set > + * anywhere yet. > */ > > static inline int find_new_ilb(void) > { > - int ilb = cpumask_first(nohz.idle_cpus_mask); > + int ilb; > > - if (ilb < nr_cpu_ids && idle_cpu(ilb)) > - return ilb; > + for_each_cpu_and(ilb, nohz.idle_cpus_mask, > + housekeeping_cpumask(HK_FLAG_MISC)) { > + if (idle_cpu(ilb)) > + return ilb; > + } > > return nr_cpu_ids; > } > > /* > - * Kick a CPU to do the nohz balancing, if it is time for it. We pick the > - * nohz_load_balancer CPU (if there is one) otherwise fallback to any idle > - * CPU (if there is one). > + * Kick a CPU to do the nohz balancing, if it is time for it. We pick any > + * idle CPU in the HK_FLAG_MISC housekeeping set (if there is one). > */ > static void kick_ilb(unsigned int flags) > { > -- > 2.20.1 >
Re: your mail
On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote: > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > > -> Share the NVMe device between host and guest. > > Even in fully virtualized configurations, > > some partitions of nvme device could be used by guests as block > > devices > > while others passed through with nvme-mdev to achieve balance between > > all features of full IO stack emulation and performance. > > > > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver > > can send interrupts to the guest directly without a context > > switch that can be expensive due to meltdown mitigation. > > > > -> Is able to utilize interrupts to get reasonable performance. > > This is only implemented > > as a proof of concept and not included in the patches, > > but interrupt driven mode shows reasonable performance > > > > -> This is a framework that later can be used to support NVMe devices > > with more of the IO virtualization built-in > > (IOMMU with PASID support coupled with device that supports it) > > Would be very interested to see the PASID support. You wouldn't even > need to mediate the IO doorbells or translations if assigning entire > namespaces, and should be much faster than the shadow doorbells. > > I think you should send 6/9 "nvme/pci: init shadow doorbell after each > reset" separately for immediate inclusion. > > I like the idea in principle, but it will take me a little time to get > through reviewing your implementation. I would have guessed we could > have leveraged something from the existing nvme/target for the mediating > controller register access and admin commands. Maybe even start with > implementing an nvme passthrough namespace target type (we currently > have block and file). Hi! Sorry to bother you, but any update? I was somewhat sick for the last week, now finally back in shape to continue working on this and other tasks I have. I am studing now the nvme target code and the io_uring to evaluate the difficultiy of using something similiar to talk to the block device instead of / in addtion to the direct connection I implemented. I would be glad to hear more feedback on this project. I will also soon post the few fixes separately as you suggested. Best regards, Maxim Levitskky
Re: your mail
On Sat, Mar 23, 2019 at 01:17:38PM -0400, William J. Cunningham wrote: > >From bb04b0ca982b7042902fffe1377e0e38e83b402b Mon Sep 17 00:00:00 2001 > From: Will Cunningham > Date: Sat, 23 Mar 2019 12:54:34 -0400 > Subject: [PATCH] Staging: emxx_udc: emxx_udc: Fixed a coding style error > > Removed unnecessary parentheses. > > Signed-off-by: Will Cunningham Please fix up the headers and resend. regards, dan carpenter
Re: your mail
On Thu, Mar 21, 2019 at 07:07:38PM +0200, Maxim Levitsky wrote: > On Thu, 2019-03-21 at 16:13 +, Stefan Hajnoczi wrote: > > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > > > Date: Tue, 19 Mar 2019 14:45:45 +0200 > > > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device > > > > > > Hi everyone! > > > > > > In this patch series, I would like to introduce my take on the problem of > > > doing > > > as fast as possible virtualization of storage with emphasis on low > > > latency. > > > > > > In this patch series I implemented a kernel vfio based, mediated device > > > that > > > allows the user to pass through a partition and/or whole namespace to a > > > guest. > > > > > > The idea behind this driver is based on paper you can find at > > > https://www.usenix.org/conference/atc18/presentation/peng, > > > > > > Although note that I stared the development prior to reading this paper, > > > independently. > > > > > > In addition to that implementation is not based on code used in the paper > > > as > > > I wasn't being able at that time to make the source available to me. > > > > > > ***Key points about the implementation:*** > > > > > > * Polling kernel thread is used. The polling is stopped after a > > > predefined timeout (1/2 sec by default). > > > Support for all interrupt driven mode is planned, and it shows promising > > > results. > > > > > > * Guest sees a standard NVME device - this allows to run guest with > > > unmodified drivers, for example windows guests. > > > > > > * The NVMe device is shared between host and guest. > > > That means that even a single namespace can be split between host > > > and guest based on different partitions. > > > > > > * Simple configuration > > > > > > *** Performance *** > > > > > > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 > > > and both latency and throughput is very similar to SPDK. > > > > > > Soon I will test this on a better server and nvme device and provide > > > more formal performance numbers. > > > > > > Latency numbers: > > > ~80ms - spdk with fio plugin on the host. > > > ~84ms - nvme driver on the host > > > ~87ms - mdev-nvme + nvme driver in the guest > > > > You mentioned the spdk numbers are with vhost-user-nvme. Have you > > measured SPDK's vhost-user-blk? > > I had lot of measuments of vhost-user-blk vs vhost-user-nvme. > vhost-user-nvme was always a bit faster but only a bit. > Thus I don't think it makes sense to benchamrk against vhost-user-blk. It's interesting because mdev-nvme is closest to the hardware while vhost-user-blk is closest to software. Doing things at the NVMe level isn't buying much performance because it's still going through a software path comparable to vhost-user-blk. From what you say it sounds like there isn't much to optimize away :(. Stefan signature.asc Description: PGP signature
Re: your mail
On Thu, 2019-03-21 at 16:13 +, Stefan Hajnoczi wrote: > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > > Date: Tue, 19 Mar 2019 14:45:45 +0200 > > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device > > > > Hi everyone! > > > > In this patch series, I would like to introduce my take on the problem of > > doing > > as fast as possible virtualization of storage with emphasis on low latency. > > > > In this patch series I implemented a kernel vfio based, mediated device > > that > > allows the user to pass through a partition and/or whole namespace to a > > guest. > > > > The idea behind this driver is based on paper you can find at > > https://www.usenix.org/conference/atc18/presentation/peng, > > > > Although note that I stared the development prior to reading this paper, > > independently. > > > > In addition to that implementation is not based on code used in the paper > > as > > I wasn't being able at that time to make the source available to me. > > > > ***Key points about the implementation:*** > > > > * Polling kernel thread is used. The polling is stopped after a > > predefined timeout (1/2 sec by default). > > Support for all interrupt driven mode is planned, and it shows promising > > results. > > > > * Guest sees a standard NVME device - this allows to run guest with > > unmodified drivers, for example windows guests. > > > > * The NVMe device is shared between host and guest. > > That means that even a single namespace can be split between host > > and guest based on different partitions. > > > > * Simple configuration > > > > *** Performance *** > > > > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 > > and both latency and throughput is very similar to SPDK. > > > > Soon I will test this on a better server and nvme device and provide > > more formal performance numbers. > > > > Latency numbers: > > ~80ms - spdk with fio plugin on the host. > > ~84ms - nvme driver on the host > > ~87ms - mdev-nvme + nvme driver in the guest > > You mentioned the spdk numbers are with vhost-user-nvme. Have you > measured SPDK's vhost-user-blk? I had lot of measuments of vhost-user-blk vs vhost-user-nvme. vhost-user-nvme was always a bit faster but only a bit. Thus I don't think it makes sense to benchamrk against vhost-user-blk. Best regards, Maxim Levitsky
Re: your mail
On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > Date: Tue, 19 Mar 2019 14:45:45 +0200 > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device > > Hi everyone! > > In this patch series, I would like to introduce my take on the problem of > doing > as fast as possible virtualization of storage with emphasis on low latency. > > In this patch series I implemented a kernel vfio based, mediated device that > allows the user to pass through a partition and/or whole namespace to a guest. > > The idea behind this driver is based on paper you can find at > https://www.usenix.org/conference/atc18/presentation/peng, > > Although note that I stared the development prior to reading this paper, > independently. > > In addition to that implementation is not based on code used in the paper as > I wasn't being able at that time to make the source available to me. > > ***Key points about the implementation:*** > > * Polling kernel thread is used. The polling is stopped after a > predefined timeout (1/2 sec by default). > Support for all interrupt driven mode is planned, and it shows promising > results. > > * Guest sees a standard NVME device - this allows to run guest with > unmodified drivers, for example windows guests. > > * The NVMe device is shared between host and guest. > That means that even a single namespace can be split between host > and guest based on different partitions. > > * Simple configuration > > *** Performance *** > > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 > and both latency and throughput is very similar to SPDK. > > Soon I will test this on a better server and nvme device and provide > more formal performance numbers. > > Latency numbers: > ~80ms - spdk with fio plugin on the host. > ~84ms - nvme driver on the host > ~87ms - mdev-nvme + nvme driver in the guest You mentioned the spdk numbers are with vhost-user-nvme. Have you measured SPDK's vhost-user-blk? Stefan signature.asc Description: PGP signature
Re: your mail
On Wed, 2019-03-20 at 11:03 -0600, Keith Busch wrote: > On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote: > > Or instead I can use the block backend, > > (but note that currently the block back-end doesn't support polling which is > > critical for the performance). > > Oh, I think you can do polling through there. For reference, fs/io_uring.c > has a pretty good implementation that aligns with how you could use it. That is exactly my thought. The polling recently got lot of improvements in the block layer, which migh make this feasable. I will give it a try. Best regards, Maxim Levitsky
Re: your mail
On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote: > Or instead I can use the block backend, > (but note that currently the block back-end doesn't support polling which is > critical for the performance). Oh, I think you can do polling through there. For reference, fs/io_uring.c has a pretty good implementation that aligns with how you could use it.
Re: your mail
On Tue, 2019-03-19 at 23:49 +, Chaitanya Kulkarni wrote: > Hi Keith, > On 03/19/2019 08:21 AM, Keith Busch wrote: > > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > > >-> Share the NVMe device between host and guest. > > > Even in fully virtualized configurations, > > > some partitions of nvme device could be used by guests as block > > > devices > > > while others passed through with nvme-mdev to achieve balance > > > between > > > all features of full IO stack emulation and performance. > > > > > >-> NVME-MDEV is a bit faster due to the fact that in-kernel driver > > > can send interrupts to the guest directly without a context > > > switch that can be expensive due to meltdown mitigation. > > > > > >-> Is able to utilize interrupts to get reasonable performance. > > > This is only implemented > > > as a proof of concept and not included in the patches, > > > but interrupt driven mode shows reasonable performance > > > > > >-> This is a framework that later can be used to support NVMe devices > > > with more of the IO virtualization built-in > > > (IOMMU with PASID support coupled with device that supports it) > > > > Would be very interested to see the PASID support. You wouldn't even > > need to mediate the IO doorbells or translations if assigning entire > > namespaces, and should be much faster than the shadow doorbells. > > > > I think you should send 6/9 "nvme/pci: init shadow doorbell after each > > reset" separately for immediate inclusion. > > > > I like the idea in principle, but it will take me a little time to get > > through reviewing your implementation. I would have guessed we could > > have leveraged something from the existing nvme/target for the mediating > > controller register access and admin commands. Maybe even start with > > implementing an nvme passthrough namespace target type (we currently > > have block and file). > > I have the code for the NVMeOf target passthru-ctrl, I think we can use > that as it is if you are looking for the passthru for NVMeOF. > > I'll post patch-series based on the latest code base soon. I am very intersing in this code. Could you explain how your NVMeOF target passthrough works? Which components of the NVME stack does it involve? Best regards, Maxim Levitsky > > > > ___ > > Linux-nvme mailing list > > linux-n...@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-nvme > > > > > ___ > Linux-nvme mailing list > linux-n...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme
Re: your mail
On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote: > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > > -> Share the NVMe device between host and guest. > > Even in fully virtualized configurations, > > some partitions of nvme device could be used by guests as block > > devices > > while others passed through with nvme-mdev to achieve balance between > > all features of full IO stack emulation and performance. > > > > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver > > can send interrupts to the guest directly without a context > > switch that can be expensive due to meltdown mitigation. > > > > -> Is able to utilize interrupts to get reasonable performance. > > This is only implemented > > as a proof of concept and not included in the patches, > > but interrupt driven mode shows reasonable performance > > > > -> This is a framework that later can be used to support NVMe devices > > with more of the IO virtualization built-in > > (IOMMU with PASID support coupled with device that supports it) > > Would be very interested to see the PASID support. You wouldn't even > need to mediate the IO doorbells or translations if assigning entire > namespaces, and should be much faster than the shadow doorbells. I fully agree with that. Note that to enable PASID support two things have to happen in this vendor. 1. Mature support for IOMMU with PASID support. On Intel side I know that they only have a spec released and currently the kernel bits to support it are placed. I still don't know when a product actually supporting this spec is going to be released. For other vendors (ARM/AMD/) I haven't done yet a research on state of PASID based IOMMU support on their platforms. 2. NVMe spec has to be extended to support PASID. At minimum, we need an ability to assign an PASID to a sq/cq queue pair and ability to relocate the doorbells, such as each guest would get its own (hardware backed) MMIO page with its own doorbells. Plus of course the hardware vendors have to embrace the spec. I guess these two things will happen in collaborative manner. > > I think you should send 6/9 "nvme/pci: init shadow doorbell after each > reset" separately for immediate inclusion. I'll do this soon. Also '5/9 nvme/pci: add known admin effects to augment admin effects log page' can be considered for immediate inclusion as well, as it works around a flaw in the NVMe controller badly done admin side effects page with no side effects (pun intended) for spec compliant controllers (I think so). This can be fixed with a quirk if you prefer though. > > I like the idea in principle, but it will take me a little time to get > through reviewing your implementation. I would have guessed we could > have leveraged something from the existing nvme/target for the mediating > controller register access and admin commands. Maybe even start with > implementing an nvme passthrough namespace target type (we currently > have block and file). I fully agree with you on that I could have used some of the nvme/target code, and I am planning to do so eventually. For that I would need to make my driver, to be one of the target drivers, and I would need to add another target back end, like you said to allow my target driver to talk directly to the nvme hardware bypassing the block layer. Or instead I can use the block backend, (but note that currently the block back-end doesn't support polling which is critical for the performance). Switch to the target code might though have some (probably minor) performance impact, as it would probably lengthen the critical code path a bit (I might need for instance to translate the PRP lists I am getting from the virtual controller to a scattergather list and back). This is why I did this the way I did, but now knowing that probably I can afford to loose a bit of performance, I can look at doing that. Best regards, Thanks in advance for the review, Maxim Levitsky PS: For reference currently the IO path looks more or less like that: My IO thread notices a doorbell write, reads a command from a submission queue, translates it (without even looking at the data pointer) and sends it to the nvme pci driver together with pointer to data iterator'. The nvme pci driver calls the data iterator N times, which makes the iterator translate and fetch the DMA addresses where the data is already mapped on the its pci nvme device (the mdev driver maps all the guest memory to the nvme pci device). The nvme pci driver uses these addresses it receives, to create a prp list, which it puts into the data pointer. The nvme pci driver also allocates an free command id, from a list, puts it into the command ID and sends the command to the real hardware. Later the IO thread calls to the nvme pci driver to poll the queue. When completions arrive, the nvme pci driver returns them back to the IO thread. > > ___
Re: your mail
On Mon, Mar 18, 2019 at 08:20:01PM -0600, George Hilliard wrote: > Because of this change, the driver now expects a pinctrl device > reference in the mmc controller's device tree node; without it, it will > bail out. This could break existing setups that don't specify it > because it "just worked" up until now. So currently I just let the old > behavior fall away because this is a staging driver. But if this is a > problem, the old behavior could be added back as a fallback. > > This is version 2 of a patchset that I requested feedback for about a > month ago. Please review as if they are a new patchset; all the patches > were rebased several times and a couple new correctness fixes added. > > The TODO list is largely unchanged, aside from the couple of TODO > comments in the code that I have addressed. Ultimately, I think this > driver could potentially be merged with the "real" mtk-mmc driver as the > TODO suggests, but someone who is more familiar with the IP core will > have to do that. Mediatek documentation (that I can find) is very > sparse. > > This is ready to merge if there is no other feedback! > > >From George Hilliard # This line is ignored. > From: George Hilliard > Reply-To: > Subject: [PATCH v2 00/11] mt7621-mmc: Various correctness fixes > In-Reply-To: > > No subject for this email?
Re: your mail
Hi Keith, On 03/19/2019 08:21 AM, Keith Busch wrote: > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: >>-> Share the NVMe device between host and guest. >> Even in fully virtualized configurations, >> some partitions of nvme device could be used by guests as block devices >> while others passed through with nvme-mdev to achieve balance between >> all features of full IO stack emulation and performance. >> >>-> NVME-MDEV is a bit faster due to the fact that in-kernel driver >> can send interrupts to the guest directly without a context >> switch that can be expensive due to meltdown mitigation. >> >>-> Is able to utilize interrupts to get reasonable performance. >> This is only implemented >> as a proof of concept and not included in the patches, >> but interrupt driven mode shows reasonable performance >> >>-> This is a framework that later can be used to support NVMe devices >> with more of the IO virtualization built-in >> (IOMMU with PASID support coupled with device that supports it) > > Would be very interested to see the PASID support. You wouldn't even > need to mediate the IO doorbells or translations if assigning entire > namespaces, and should be much faster than the shadow doorbells. > > I think you should send 6/9 "nvme/pci: init shadow doorbell after each > reset" separately for immediate inclusion. > > I like the idea in principle, but it will take me a little time to get > through reviewing your implementation. I would have guessed we could > have leveraged something from the existing nvme/target for the mediating > controller register access and admin commands. Maybe even start with > implementing an nvme passthrough namespace target type (we currently > have block and file). I have the code for the NVMeOf target passthru-ctrl, I think we can use that as it is if you are looking for the passthru for NVMeOF. I'll post patch-series based on the latest code base soon. > > ___ > Linux-nvme mailing list > linux-n...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme >
Re: your mail
On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote: > -> Share the NVMe device between host and guest. > Even in fully virtualized configurations, > some partitions of nvme device could be used by guests as block devices > while others passed through with nvme-mdev to achieve balance between > all features of full IO stack emulation and performance. > > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver > can send interrupts to the guest directly without a context > switch that can be expensive due to meltdown mitigation. > > -> Is able to utilize interrupts to get reasonable performance. > This is only implemented > as a proof of concept and not included in the patches, > but interrupt driven mode shows reasonable performance > > -> This is a framework that later can be used to support NVMe devices > with more of the IO virtualization built-in > (IOMMU with PASID support coupled with device that supports it) Would be very interested to see the PASID support. You wouldn't even need to mediate the IO doorbells or translations if assigning entire namespaces, and should be much faster than the shadow doorbells. I think you should send 6/9 "nvme/pci: init shadow doorbell after each reset" separately for immediate inclusion. I like the idea in principle, but it will take me a little time to get through reviewing your implementation. I would have guessed we could have leveraged something from the existing nvme/target for the mediating controller register access and admin commands. Maybe even start with implementing an nvme passthrough namespace target type (we currently have block and file).
Re: your mail
On Mon, Feb 25, 2019 at 03:16:29PM -0500, Johannes Weiner wrote: > [resending, rebased on top of latest mmots] > > The memcg LRU stats usage is currently a bit messy. Memcg has private > per-zone counters because reclaim needs zone granularity sometimes, > but we also have plenty of users that need to awkwardly sum them up to > node or memcg granularity. Meanwhile the canonical per-memcg vmstats > do not track the LRU counts (NR_INACTIVE_ANON etc.) as you'd expect. > > This series enables LRU count tracking in the per-memcg vmstats array > such that lruvec_page_state() and memcg_page_state() work on the enum > node_stat_item items for the LRU counters. Then it converts all the > callers that don't specifically need per-zone numbers over to that. The updated version looks very good* to me! Please, feel free to use: Reviewed-by: Roman Gushchin Looking through the patchset, I have a feeling that we're sometimes gathering too much data. Perhaps we don't need the whole set of counters to be per-cpu on both memcg- and memcg-per-node levels. Merging them can save quite a lot of space. Anyway, it's a separate topic. * except "to" and "subject" of the cover letter Thanks!
Re: your mail
On Wed, Jul 25, 2018 at 01:22:07AM +0300, Georgios Tsotsos wrote: > Date: Wed, 25 Jul 2018 01:18:58 +0300 > Subject: [PATCH 0/4] Staging: octeon-usb: Fixes and Coding style applied. > > Hello, Somehow your subject here got messed up and put in the bod of the email. Not a big deal this time, but be more careful next time please. thanks, greg k-h
Re: your mail
Thanks for this. This is a lot of work. On Wed, Jun 13, 2018 at 08:31:28PM +0300, Anton Vasilyev wrote: > diff --git a/drivers/staging/rts5208/rtsx.c b/drivers/staging/rts5208/rtsx.c > index 70e0b8623110..69e6abe14abf 100644 > --- a/drivers/staging/rts5208/rtsx.c > +++ b/drivers/staging/rts5208/rtsx.c > @@ -857,7 +857,7 @@ static int rtsx_probe(struct pci_dev *pci, > dev->chip = kzalloc(sizeof(*dev->chip), GFP_KERNEL); > if (!dev->chip) { > err = -ENOMEM; > - goto errout; > + goto chip_alloc_fail; The most recent successful allocation is scsi_host_alloc(). I was really hoping this would say something like "goto err_free_host;" or something. The naming style here is a "come from" label which doesn't say if it's going to free the scsi host or not... It turns out we don't free the the host, but we should: err_put_host: scsi_host_put(host); The kzalloc() has it's own error message built in, and all the other error paths as well so the dev_err() is not super important to me... Killing the threads seems actually really complicated so maybe we should just have a separate error paths for that. I'm not sure... regards, dan carpenter
Re: your mail
On Thu, Dec 07, 2017 at 01:26:14AM -0800, Alexander Kappner wrote: > Date: Wed, 6 Dec 2017 15:28:37 -0800 > Subject: [PATCH] usb-core: Fix potential null pointer dereference in > xhci-debugfs.c Something went wrong here, resulting in an email with no subject. Can you fix this up and resend? thanks, greg k-h
Re: your mail
On Fri, Aug 18, 2017 at 11:12:14PM +0530, Rajneesh Bhardwaj wrote: > Bcc: > Subject: Re: [PATCH] platform/x86: intel_pmc_core: Add Package C-states > residency info > Reply-To: > In-Reply-To: > > Please ignore my previous email without subject. It was sent by mistake. > On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote: > > +PeterZ (since I mentioned his name) > > > > On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj > > wrote: > > > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote: > > >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj > > >> wrote: > > >> > This patch introduces a new debugfs entry to read current Package > > >> > C-state > > >> > residency values and, one new kernel API to read the Package C-10 > > >> > residency > > >> > counter. > > >> > > > >> > Package C-state residency MSRs provide useful debug information about > > >> > system > > >> > idle states. In idle states system must enter deeper Package C-states. > > > > >> Why this patch is needed? > > > > > > Andy, I'll try to give some background for this. > > > > > > This is needed to enhance the S0ix failure debug capabilities from within > > > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is > > > used > > > to validate S0ix and report the blockers in case of a failure. > > > https://patchwork.kernel.org/patch/9148999/ > > > > (It's not part of upstream) > > Sorry i sent an older link. There are fresh attempts to get this into > mainline kernel and looks like there is a traction for it. > https://patchwork.kernel.org/patch/9831229/ > > Package C-state (PC10) validation is discussed there. > > > > > > So far only intel_pmc_slp_s0_counter_read is called by this framework to > > > check whether the previous attempt to enter S0ix was success or not. > > > > I harder see even a single user of that API in current kernel. It > > should be unexported and removed I think. > > > > > Having > > > another PC10 counter related exported function enhances the S0ix debug > > > since > > > PC10 state is a prerequisite to enter S0ix. > > > > > >> See, we have turbostat and cpupower user space tools which do this > > >> without any additional code to be written in kernel. What prevents > > >> your user space application do the same? > > >> > > >> Moreover, we have events for cstate, I assume perf or something alike > > >> can monitor those counters as well. > > > > > > You're right, perhaps the debugfs is redundant when we have those user > > > space > > > tools but such tools are not available readily for all platforms/distros. > > > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on > > > all > > > the platforms. > > > PMC driver is a debug driver so i thought its better to show Package > > > C-state > > > related info for low power debug here. > > > > > >> > > >> Sorry, NAK. > > > > > > This patch has two parts i.e. exported PC10 API and the debugfs. Based on > > > the above explanation, if the patch is not good as is, please let me know > > > if > > > i should drop the debugfs part and respin a v2 with just the exported API > > > or > > > drop this totally. > > > > > > Thanks for the feedback and thanks for taking time to review! > > > > Reading above makes me think that entire design of this is misguided. > > Since the most of values are counters they better to be accessed in a > > way how perf does. > > > > In case you need *in-kernel* facility, do some APIs (if it's not done > > yet) for events drivers first. > > cstate event driver is already in upstream. > > > > Sorry, NAK for entire patch until it would be blessed by people like Peter > > Z. > > > > -- > > With Best Regards, > > Andy Shevchenko > > -- > Best Regards, > Rajneesh -- Best Regards, Rajneesh
Re: your mail
Hi Catalin, all. Thank you for your time on reviewing the series. I really appreciate it. This is the updated version where I tried to address all comments: https://github.com/norov/linux/commits/ilp32-20170613.4 (3 last patches here is the Andrew Pinski's rework of vdso rebased on ilp32 series) If nothing will come here on review, I'll send v8 at the beginning of the next week. Is this plan OK? And this is the backport on the v4.11 kernel: https://github.com/norov/linux/commits/ilp32-4.11.4 Yury On Sun, Jun 04, 2017 at 02:59:49PM +0300, Yury Norov wrote: > Subject: [PATCH v7 resend 2 00/20] ILP32 for ARM64 > > Hi Catalin, > > Here is a rebase of latest kernel patchset against next-20170602. There's > almost > no changes, but there are some conflicts that are not trivial, and I'd like to > refresh the submission therefore. > > How are your experiments with testing and benchmarking of ILP32 are going? In > my current tests I see 0 failures on LTP. Benchmarking on SPEC CPU2006 and > LMBench shows no difference for LP64 and expected performance boost for ILP32 > (compared to LP64 results). > > Steve Ellcey is handling upstream submission of Glibc patches. The patches are > ready and have been reviewed and reworked per community’s comments. There are > no outstanding userspace ABI issues from Glibc. Glibc submission is now > waiting > on ILP32 kernel submission. > > Catalin, regarding rootfs, is OpenSuSe’s build sufficient for your > experiments? > I’ve also seen Wookey merging patches for ILP32 triplet to binutils and > pushing > them to Debian. > > One last thing I wanted to check with you about is ILP32 PCS - does, in your > view, ARM Ltd. needs to publish any additional docs for ABI to become > official? > > Below is the regular description. > > Thanks. > Yury > > > > This series enables aarch64 with ilp32 mode. > > As supporting work, it introduces ARCH_32BIT_OFF_T configuration > option that is enabled for existing 32-bit architectures but disabled > for new arches (so 64-bit off_t is is used by new userspace). Also it > deprecates getrlimit and setrlimit syscalls prior to prlimit64. > > This version is based on linux-next from 2017-03-01. It works with > glibc-2.25, and tested with LTP, glibc testsuite, trinity, lmbench, > CPUSpec. > > Patches 1, 2, 3 and 8 are general, and may be applied separately. > > This is the rebase of v7 - still no major changes has been made. > > Kernel and GLIBC trees: > https://github.com/norov/linux/tree/ilp32-20170602 > https://github.com/norov/glibc/tree/dev9 > > (GLIBC patches are managed by Steve Ellcey, so my tree is only for > reference.) > > Changes: > v3: https://lkml.org/lkml/2014/9/3/704 > v4: https://lkml.org/lkml/2015/4/13/691 > v5: https://lkml.org/lkml/2015/9/29/911 > v6: https://lkml.org/lkml/2016/5/23/661 > v7: RFC nowrap: https://lkml.org/lkml/2016/6/17/990 > v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245 > v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883 > v7: https://lkml.org/lkml/2017/1/9/213 > v7: Resend: > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/490801.html > v7: Resend 2: > - vdso-ilp32 Makefile synced with lp64 Makefile (patch 19); > - rebased on next-20170602. > > Andrew Pinski (6): > arm64: rename COMPAT to AARCH32_EL0 in Kconfig > arm64: ensure the kernel is compiled for LP64 > arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64 > arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use > it > arm64: ilp32: introduce ilp32-specific handlers for sigframe and > ucontext > arm64:ilp32: add ARM64_ILP32 to Kconfig > > Philipp Tomsich (1): > arm64:ilp32: add vdso-ilp32 and use for signal return > > Yury Norov (13): > compat ABI: use non-compat openat and open_by_handle_at variants > 32-bit ABI: introduce ARCH_32BIT_OFF_T config option > asm-generic: Drop getrlimit and setrlimit syscalls from default list > arm64: ilp32: add documentation on the ILP32 ABI for ARM64 > thread: move thread bits accessors to separated file > arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat) > arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64 > arm64: introduce binfmt_elf32.c > arm64: ilp32: introduce binfmt_ilp32.c > arm64: ilp32: share aarch32 syscall handlers > arm64: signal: share lp64 signal routines to ilp32 > arm64: signal32: move ilp32 and aarch32 common code to separated file > arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32 > > Documentation/arm64/ilp32.txt | 45 +++ > arch/Kconfig | 4 + > arch/arc/Kconfig | 1 + > arch/arc/include/uapi/asm/unistd.h| 1 + > arch/arm/Kconfig | 1 + > arch/arm64/Kconfig| 19 ++- > arch/arm64/Makefile | 8 ++ > arch/arm64/include/as
Re: your mail
On Thu 27-04-17 11:08:38, Joonsoo Kim wrote: > On Wed, Apr 26, 2017 at 11:19:06AM +0200, Michal Hocko wrote: > > > > [...] > > > > > > > > > > You are trying to change a semantic of something that has a well > > > > > > defined > > > > > > meaning. I disagree that we should change it. It might sound like a > > > > > > simpler thing to do because pfn walkers will have to be checked but > > > > > > what > > > > > > you are proposing is conflating two different things together. > > > > > > > > > > I don't think that *I* try to change the semantic of pfn_valid(). > > > > > It would be original semantic of pfn_valid(). > > > > > > > > > > "If pfn_valid() returns true, we can get proper struct page and the > > > > > zone information," > > > > > > > > I do not see any guarantee about the zone information anywhere. In fact > > > > this is not true with the original implementation as I've tried to > > > > explain already. We do have new pages associated with a zone but that > > > > association might change during the online phase. So you cannot really > > > > rely on that information until the page is online. There is no real > > > > change in that regards after my rework. > > > > > > I know that what you did doesn't change thing much. What I try to say > > > is that previous implementation related to pfn_valid() in hotplug is > > > wrong. Please do not assume that hotplug implementation is correct and > > > other pfn_valid() users are incorrect. There is no design document so > > > I'm not sure which one is correct but assumption that pfn_valid() user > > > can access whole the struct page information makes much sense to me. > > > > Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we > > still need pfn_valid to work for those pfns. Really, pfn_valid has a > > It's really contrary example to your insist. They requires not only > struct page but also other information, especially, the zone index. > They checks zone idx to know whether this page is for ZONE_DEVICE or not. Yes and they guarantee this association is true. Without memory onlining though. This memory is never online for anybody who is asking. [...] > I think that I did my best to explain my reasoning. It seems that we > cannot agree with each other so it's better for some others to express > their opinion to this problem. I will stop this discussion from now > on. I _do_ appreciate your feedback and if the general consensus is to modify pfn_valid I can go that direction but my gut feeling tells me that conflating "existing struct page" test and "fully online and initialized" one is a wrong thing to do. -- Michal Hocko SUSE Labs
Re: your mail
On Wed, Apr 26, 2017 at 11:19:06AM +0200, Michal Hocko wrote: > > > [...] > > > > > > > > You are trying to change a semantic of something that has a well > > > > > defined > > > > > meaning. I disagree that we should change it. It might sound like a > > > > > simpler thing to do because pfn walkers will have to be checked but > > > > > what > > > > > you are proposing is conflating two different things together. > > > > > > > > I don't think that *I* try to change the semantic of pfn_valid(). > > > > It would be original semantic of pfn_valid(). > > > > > > > > "If pfn_valid() returns true, we can get proper struct page and the > > > > zone information," > > > > > > I do not see any guarantee about the zone information anywhere. In fact > > > this is not true with the original implementation as I've tried to > > > explain already. We do have new pages associated with a zone but that > > > association might change during the online phase. So you cannot really > > > rely on that information until the page is online. There is no real > > > change in that regards after my rework. > > > > I know that what you did doesn't change thing much. What I try to say > > is that previous implementation related to pfn_valid() in hotplug is > > wrong. Please do not assume that hotplug implementation is correct and > > other pfn_valid() users are incorrect. There is no design document so > > I'm not sure which one is correct but assumption that pfn_valid() user > > can access whole the struct page information makes much sense to me. > > Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we > still need pfn_valid to work for those pfns. Really, pfn_valid has a It's really contrary example to your insist. They requires not only struct page but also other information, especially, the zone index. They checks zone idx to know whether this page is for ZONE_DEVICE or not. So, pfn_valid() for ZONE_DEVICE pages assume that struct page has all the valid information. It's perfectly matched with my suggestion. Online isn't important issue here. What the important point is the condition that pfn_valid() return true. pfn_valid() for ZONE_DEVICE returns true after arch_add_memory() since all the struct page information is fixed there. If zone of hotplugged memory cannot be fixed at this moment, you can defef it until all the information is fixed (onlining). That seems to be better semantic of pfn_valid() to me. > different meaning than you would like it to have. Who knows how many > others like that are lurking there. I feel much more comfortable to go > and hunt already broken code and fix it rathert than break something > unexpectedly. I think that I did my best to explain my reasoning. It seems that we cannot agree with each other so it's better for some others to express their opinion to this problem. I will stop this discussion from now on. Thanks.
Re: your mail
On Tue 25-04-17 11:50:45, Joonsoo Kim wrote: > On Mon, Apr 24, 2017 at 09:53:12AM +0200, Michal Hocko wrote: > > On Mon 24-04-17 10:44:43, Joonsoo Kim wrote: > > > On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote: > > > > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote: > > > > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > > > > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > > > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > > > > > > [...] > > > > > > > > Which pfn walkers you have in mind? > > > > > > > > > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > > > > > > using pfn_valid(). > > > > > > > > > > > > Yeah, I've checked that one and in fact this is a good example of > > > > > > the > > > > > > case where you do not really care about holes. It just checks the > > > > > > page > > > > > > count which is a valid information under any circumstances. > > > > > > > > > > I don't think so. First, it checks the page *map* count. Is it still > > > > > valid > > > > > even if PageReserved() is set? > > > > > > > > I do not know about any user which would manipulate page map count for > > > > referenced pages. The core MM code doesn't. > > > > > > That's weird that we can get *map* count without PageReserved() check, > > > but we cannot get zone information. > > > Zone information is more static information than map count. > > > > As I've already pointed out the rework of the hotplug code is mainly > > about postponing the zone initialization from the physical hot add to > > the logical onlining. The zone is really not clear until that moment. > > > > > It should be defined/documented in this time that what information in > > > the struct page is valid even if PageReserved() is set. And then, we > > > need to fix all the things based on this design decision. > > > > Where would you suggest documenting this? We do have > > Documentation/memory-hotplug.txt but it is not really specific about > > struct page. > > pfn_valid() in include/linux/mmzone.h looks proper place. diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c412e6a3a1e9..443258fcac93 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1288,10 +1288,14 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long); #ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL /* * pfn_valid() is meant to be able to tell if a given PFN has valid memmap - * associated with it or not. In FLATMEM, it is expected that holes always - * have valid memmap as long as there is valid PFNs either side of the hole. - * In SPARSEMEM, it is assumed that a valid section has a memmap for the - * entire section. + * associated with it or not. This means that a struct page exists for this + * pfn. The caller cannot assume the page is fully initialized though. + * pfn_to_online_page() should be used to make sure the struct page is fully + * initialized. + * + * In FLATMEM, it is expected that holes always have valid memmap as long as + * there is valid PFNs either side of the hole. In SPARSEMEM, it is assumed + * that a valid section has a memmap for the entire section. * * However, an ARM, and maybe other embedded architectures in the future * free memmap backing holes to save memory on the assumption the memmap is > > [...] > > > > > > You are trying to change a semantic of something that has a well defined > > > > meaning. I disagree that we should change it. It might sound like a > > > > simpler thing to do because pfn walkers will have to be checked but what > > > > you are proposing is conflating two different things together. > > > > > > I don't think that *I* try to change the semantic of pfn_valid(). > > > It would be original semantic of pfn_valid(). > > > > > > "If pfn_valid() returns true, we can get proper struct page and the > > > zone information," > > > > I do not see any guarantee about the zone information anywhere. In fact > > this is not true with the original implementation as I've tried to > > explain already. We do have new pages associated with a zone but that > > association might change during the online phase. So you cannot really > > rely on that information until the page is online. There is no real > > change in that regards after my rework. > > I know that what you did doesn't change thing much. What I try to say > is that previous implementation related to pfn_valid() in hotplug is > wrong. Please do not assume that hotplug implementation is correct and > other pfn_valid() users are incorrect. There is no design document so > I'm not sure which one is correct but assumption that pfn_valid() user > can access whole the struct page information makes much sense to me. Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we still need pfn_valid to work for those pfns. Really, pfn_valid has a different meaning than you would like it to have. Who knows how many others like that a
Re: your mail
On Mon, Apr 24, 2017 at 09:53:12AM +0200, Michal Hocko wrote: > On Mon 24-04-17 10:44:43, Joonsoo Kim wrote: > > On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote: > > > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote: > > > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > > > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > > > > > [...] > > > > > > > Which pfn walkers you have in mind? > > > > > > > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > > > > > using pfn_valid(). > > > > > > > > > > Yeah, I've checked that one and in fact this is a good example of the > > > > > case where you do not really care about holes. It just checks the page > > > > > count which is a valid information under any circumstances. > > > > > > > > I don't think so. First, it checks the page *map* count. Is it still > > > > valid > > > > even if PageReserved() is set? > > > > > > I do not know about any user which would manipulate page map count for > > > referenced pages. The core MM code doesn't. > > > > That's weird that we can get *map* count without PageReserved() check, > > but we cannot get zone information. > > Zone information is more static information than map count. > > As I've already pointed out the rework of the hotplug code is mainly > about postponing the zone initialization from the physical hot add to > the logical onlining. The zone is really not clear until that moment. > > > It should be defined/documented in this time that what information in > > the struct page is valid even if PageReserved() is set. And then, we > > need to fix all the things based on this design decision. > > Where would you suggest documenting this? We do have > Documentation/memory-hotplug.txt but it is not really specific about > struct page. pfn_valid() in include/linux/mmzone.h looks proper place. > > [...] > > > > You are trying to change a semantic of something that has a well defined > > > meaning. I disagree that we should change it. It might sound like a > > > simpler thing to do because pfn walkers will have to be checked but what > > > you are proposing is conflating two different things together. > > > > I don't think that *I* try to change the semantic of pfn_valid(). > > It would be original semantic of pfn_valid(). > > > > "If pfn_valid() returns true, we can get proper struct page and the > > zone information," > > I do not see any guarantee about the zone information anywhere. In fact > this is not true with the original implementation as I've tried to > explain already. We do have new pages associated with a zone but that > association might change during the online phase. So you cannot really > rely on that information until the page is online. There is no real > change in that regards after my rework. I know that what you did doesn't change thing much. What I try to say is that previous implementation related to pfn_valid() in hotplug is wrong. Please do not assume that hotplug implementation is correct and other pfn_valid() users are incorrect. There is no design document so I'm not sure which one is correct but assumption that pfn_valid() user can access whole the struct page information makes much sense to me. So, I hope that please fix hotplug implementation rather than modifying each pfn_valid() users. > > [...] > > > So please do not conflate those two different concepts together. I > > > believe that the most prominent pfn walkers should be covered now and > > > others can be evaluated later. > > > > Even if original pfn_valid()'s semantic is not the one that I mentioned, > > I think that suggested semantic from me is better. > > Only hotplug code need to be changed and others doesn't need to be changed. > > There is no overhead for others. What's the problem about this approach? > > That this would require to check _every_ single pfn_valid user in the > kernel. That is beyond my time capacity and not really necessary because > the current code already suffers from the same/similar class of > problems. I think that all the pfn_valid() user doesn't consider hole case. Unlike your expectation, if your way is taken, it requires to check _every_ pfn_valid() users. Thanks.
Re: your mail
On Mon 24-04-17 10:44:43, Joonsoo Kim wrote: > On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote: > > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote: > > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > > > > [...] > > > > > > Which pfn walkers you have in mind? > > > > > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > > > > using pfn_valid(). > > > > > > > > Yeah, I've checked that one and in fact this is a good example of the > > > > case where you do not really care about holes. It just checks the page > > > > count which is a valid information under any circumstances. > > > > > > I don't think so. First, it checks the page *map* count. Is it still valid > > > even if PageReserved() is set? > > > > I do not know about any user which would manipulate page map count for > > referenced pages. The core MM code doesn't. > > That's weird that we can get *map* count without PageReserved() check, > but we cannot get zone information. > Zone information is more static information than map count. As I've already pointed out the rework of the hotplug code is mainly about postponing the zone initialization from the physical hot add to the logical onlining. The zone is really not clear until that moment. > It should be defined/documented in this time that what information in > the struct page is valid even if PageReserved() is set. And then, we > need to fix all the things based on this design decision. Where would you suggest documenting this? We do have Documentation/memory-hotplug.txt but it is not really specific about struct page. [...] > > You are trying to change a semantic of something that has a well defined > > meaning. I disagree that we should change it. It might sound like a > > simpler thing to do because pfn walkers will have to be checked but what > > you are proposing is conflating two different things together. > > I don't think that *I* try to change the semantic of pfn_valid(). > It would be original semantic of pfn_valid(). > > "If pfn_valid() returns true, we can get proper struct page and the > zone information," I do not see any guarantee about the zone information anywhere. In fact this is not true with the original implementation as I've tried to explain already. We do have new pages associated with a zone but that association might change during the online phase. So you cannot really rely on that information until the page is online. There is no real change in that regards after my rework. [...] > > So please do not conflate those two different concepts together. I > > believe that the most prominent pfn walkers should be covered now and > > others can be evaluated later. > > Even if original pfn_valid()'s semantic is not the one that I mentioned, > I think that suggested semantic from me is better. > Only hotplug code need to be changed and others doesn't need to be changed. > There is no overhead for others. What's the problem about this approach? That this would require to check _every_ single pfn_valid user in the kernel. That is beyond my time capacity and not really necessary because the current code already suffers from the same/similar class of problems. > And, I'm not sure that you covered the most prominent pfn walkers. > Please see pagetypeinfo_showblockcount_print() in mm/vmstat.c. I probably haven't (and will send a patch to fix this one - thanks for pointing to it) but the point is they those are broken already and they can be fixed in follow up patches. If you change pfn_valid you might break an existing code in an unexpected ways. -- Michal Hocko SUSE Labs
Re: your mail
On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote: > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote: > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > > > [...] > > > > > Which pfn walkers you have in mind? > > > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > > > using pfn_valid(). > > > > > > Yeah, I've checked that one and in fact this is a good example of the > > > case where you do not really care about holes. It just checks the page > > > count which is a valid information under any circumstances. > > > > I don't think so. First, it checks the page *map* count. Is it still valid > > even if PageReserved() is set? > > I do not know about any user which would manipulate page map count for > referenced pages. The core MM code doesn't. That's weird that we can get *map* count without PageReserved() check, but we cannot get zone information. Zone information is more static information than map count. It should be defined/documented in this time that what information in the struct page is valid even if PageReserved() is set. And then, we need to fix all the things based on this design decision. > > > What I'd like to ask in this example is > > that what information is valid if PageReserved() is set. Is there any > > design document on this? I think that we need to define/document it first. > > NO, it is not AFAIK. > > [...] > > > OK, fair enough. I did't consider memblock allocations. I will rethink > > > this patch but there are essentially 3 options > > > - use a different criterion for the offline holes dection. I > > > have just realized we might do it by storing the online > > > information into the mem sections > > > - drop this patch > > > - move the PageReferenced check down the chain into > > > isolate_freepages_block resp. isolate_migratepages_block > > > > > > I would prefer 3 over 2 over 1. I definitely want to make this more > > > robust so 1 is preferable long term but I do not want this to be a > > > roadblock to the rest of the rework. Does that sound acceptable to you? > > > > I like #1 among of above options and I already see your patch for #1. > > It's much better than your first attempt but I'm still not happy due > > to the semantic of pfn_valid(). > > You are trying to change a semantic of something that has a well defined > meaning. I disagree that we should change it. It might sound like a > simpler thing to do because pfn walkers will have to be checked but what > you are proposing is conflating two different things together. I don't think that *I* try to change the semantic of pfn_valid(). It would be original semantic of pfn_valid(). "If pfn_valid() returns true, we can get proper struct page and the zone information," That situation is now being changed by your patch *hotplug rework*. "Even if pfn_valid() returns true, we cannot get the zone information without PageReserved() check, since *zone is determined during onlining* and pfn_valid() return true after adding the memory." > > > > [..] > > > > Let me clarify my desire(?) for this issue. > > > > > > > > 1. If pfn_valid() returns true, struct page has valid information, at > > > > least, in flags (zone id, node id, flags, etc...). So, we can use them > > > > without checking PageResereved(). > > > > > > This is no longer true after my rework. Pages are associated with the > > > zone during _onlining_ rather than when they are physically hotpluged. > > > > If your rework make information valid during _onlining_, my > > suggestion is making pfn_valid() return false until onlining. > > > > Caller of pfn_valid() expects that they can get valid information from > > the struct page. There is no reason to access the struct page if they > > can't get valid information from it. So, passing pfn_valid() should > > guarantee that, at least, some kind of information is valid. > > > > If pfn_valid() doesn't guarantee it, most of the pfn walker should > > check PageResereved() to make sure that validity of information from > > the struct page. > > This is true only for those walkers which really depend on the full > initialization. This is not the case for all of them. I do not see any > reason to introduce another _pfn_valid to just check whether there is a > struct page... It's really confusing concept that only some information is valid for *not* fully initialized struct page. Even, there is no document that what information is valid for this half-initialized struct page. Better design would be that we regard that every information is invalid for half-initialized struct page. In this case, it's natural to make pfn_valid() returns false for this half-initialized struct page. > > So please do not conflate those two different concepts together. I > believe that the most prominent pfn walkers should be covered now and
Re: your mail
On Fri 21-04-17 13:38:28, Joonsoo Kim wrote: > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > > [...] > > > > Which pfn walkers you have in mind? > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > > using pfn_valid(). > > > > Yeah, I've checked that one and in fact this is a good example of the > > case where you do not really care about holes. It just checks the page > > count which is a valid information under any circumstances. > > I don't think so. First, it checks the page *map* count. Is it still valid > even if PageReserved() is set? I do not know about any user which would manipulate page map count for referenced pages. The core MM code doesn't. > What I'd like to ask in this example is > that what information is valid if PageReserved() is set. Is there any > design document on this? I think that we need to define/document it first. NO, it is not AFAIK. [...] > > OK, fair enough. I did't consider memblock allocations. I will rethink > > this patch but there are essentially 3 options > > - use a different criterion for the offline holes dection. I > > have just realized we might do it by storing the online > > information into the mem sections > > - drop this patch > > - move the PageReferenced check down the chain into > > isolate_freepages_block resp. isolate_migratepages_block > > > > I would prefer 3 over 2 over 1. I definitely want to make this more > > robust so 1 is preferable long term but I do not want this to be a > > roadblock to the rest of the rework. Does that sound acceptable to you? > > I like #1 among of above options and I already see your patch for #1. > It's much better than your first attempt but I'm still not happy due > to the semantic of pfn_valid(). You are trying to change a semantic of something that has a well defined meaning. I disagree that we should change it. It might sound like a simpler thing to do because pfn walkers will have to be checked but what you are proposing is conflating two different things together. > > [..] > > > Let me clarify my desire(?) for this issue. > > > > > > 1. If pfn_valid() returns true, struct page has valid information, at > > > least, in flags (zone id, node id, flags, etc...). So, we can use them > > > without checking PageResereved(). > > > > This is no longer true after my rework. Pages are associated with the > > zone during _onlining_ rather than when they are physically hotpluged. > > If your rework make information valid during _onlining_, my > suggestion is making pfn_valid() return false until onlining. > > Caller of pfn_valid() expects that they can get valid information from > the struct page. There is no reason to access the struct page if they > can't get valid information from it. So, passing pfn_valid() should > guarantee that, at least, some kind of information is valid. > > If pfn_valid() doesn't guarantee it, most of the pfn walker should > check PageResereved() to make sure that validity of information from > the struct page. This is true only for those walkers which really depend on the full initialization. This is not the case for all of them. I do not see any reason to introduce another _pfn_valid to just check whether there is a struct page... So please do not conflate those two different concepts together. I believe that the most prominent pfn walkers should be covered now and others can be evaluated later. -- Michal Hocko SUSE Labs
Re: your mail
On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote: > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > [...] > > > Which pfn walkers you have in mind? > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > > using pfn_valid(). > > Yeah, I've checked that one and in fact this is a good example of the > case where you do not really care about holes. It just checks the page > count which is a valid information under any circumstances. I don't think so. First, it checks the page *map* count. Is it still valid even if PageReserved() is set? What I'd like to ask in this example is that what information is valid if PageReserved() is set. Is there any design document on this? I think that we need to define/document it first. And, I hope that all the information in flags field is valid in all cases if pfn_valid() return true. By the design. This makes all the exsiting pfn walkers happy since we don't need an additional check for PageReserved(). > > > > > The other problem I found is that your change will makes some > > > > contiguous zones to be considered as non-contiguous. Memory allocated > > > > by memblock API is also marked as PageResereved. If we consider this as > > > > a hole, we will set such a zone as non-contiguous. > > > > > > Why would that be a problem? We shouldn't touch those pages anyway? > > > > Skipping those pages in compaction are valid so no problem in this > > case. > > > > The problem I mentioned above is that adding PageReserved() check in > > __pageblock_pfn_to_page() invalidates optimization by > > set_zone_contiguous(). In compaction, we need to get a valid struct > > page and it requires a lot of work. There is performance problem > > report due to this so set_zone_contiguous() optimization is added. It > > checks if the zone is contiguous or not in boot time. If zone is > > determined as contiguous, we can easily get a valid struct page in > > runtime without expensive checks. > > OK, I see. I've had some vague understading and the clarification helps. > > > Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It > > woule make that zone->contiguous usually returns false since memory > > used by memblock API is marked as PageReserved() and your patch regard > > it as a hole. It invalidates set_zone_contiguous() optimization and I > > worry about it. > > OK, fair enough. I did't consider memblock allocations. I will rethink > this patch but there are essentially 3 options > - use a different criterion for the offline holes dection. I > have just realized we might do it by storing the online > information into the mem sections > - drop this patch > - move the PageReferenced check down the chain into > isolate_freepages_block resp. isolate_migratepages_block > > I would prefer 3 over 2 over 1. I definitely want to make this more > robust so 1 is preferable long term but I do not want this to be a > roadblock to the rest of the rework. Does that sound acceptable to you? I like #1 among of above options and I already see your patch for #1. It's much better than your first attempt but I'm still not happy due to the semantic of pfn_valid(). > [..] > > Let me clarify my desire(?) for this issue. > > > > 1. If pfn_valid() returns true, struct page has valid information, at > > least, in flags (zone id, node id, flags, etc...). So, we can use them > > without checking PageResereved(). > > This is no longer true after my rework. Pages are associated with the > zone during _onlining_ rather than when they are physically hotpluged. If your rework make information valid during _onlining_, my suggestion is making pfn_valid() return false until onlining. Caller of pfn_valid() expects that they can get valid information from the struct page. There is no reason to access the struct page if they can't get valid information from it. So, passing pfn_valid() should guarantee that, at least, some kind of information is valid. If pfn_valid() doesn't guarantee it, most of the pfn walker should check PageResereved() to make sure that validity of information from the struct page. > Basically only the nid is set properly. Strictly speaking this is the > case also without my rework because the zone might change during online > phase so you cannot assume it is correct even now. It just happens that > it more or less works just fine. > > > 2. pfn_valid() for offlined holes returns false. This can be easily > > (?) implemented by manipulating SECTION_MAP_MASK in hotplug code. I > > guess that there is no reason that pfn_valid() returns true for > > offlined holes. If there is, please let me know. > > There is some code which really expects that pfn_valid returns true iff > there is a struct page and it doesn't care about the online status. > E.g. hotplug code itself so no, we cannot change pfn_valid. What we can > do though is to add pfn_to_onl
Re: your mail
On Thu 20-04-17 13:56:34, Vlastimil Babka wrote: > On 04/20/2017 10:49 AM, Michal Hocko wrote: > > On Thu 20-04-17 09:28:20, Michal Hocko wrote: > >> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > > [...] > >>> Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It > >>> woule make that zone->contiguous usually returns false since memory > >>> used by memblock API is marked as PageReserved() and your patch regard > >>> it as a hole. It invalidates set_zone_contiguous() optimization and I > >>> worry about it. > >> > >> OK, fair enough. I did't consider memblock allocations. I will rethink > >> this patch but there are essentially 3 options > >>- use a different criterion for the offline holes dection. I > >> have just realized we might do it by storing the online > >> information into the mem sections > >>- drop this patch > >>- move the PageReferenced check down the chain into > >> isolate_freepages_block resp. isolate_migratepages_block > >> > >> I would prefer 3 over 2 over 1. I definitely want to make this more > >> robust so 1 is preferable long term but I do not want this to be a > >> roadblock to the rest of the rework. Does that sound acceptable to you? > > > > So I've played with all three options just to see how the outcome would > > look like and it turned out that going with 1 will be easiest in the > > end. What do you think about the following? It should be free of any > > false positives. I have only compile tested it yet. > > That looks fine, can't say immediately if fully correct. I think you'll > need to bump SECTION_NID_SHIFT as well and make sure things still fit? > Otherwise looks like nobody needed a new section bit since 2005, so we > should be fine. You are absolutely right. Thanks for spotting this! I have folded this in diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 611ff869fa4d..c412e6a3a1e9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1166,7 +1166,7 @@ extern unsigned long usemap_size(void); #define SECTION_IS_ONLINE (1UL<<2) #define SECTION_MAP_LAST_BIT (1UL<<3) #define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) -#define SECTION_NID_SHIFT 2 +#define SECTION_NID_SHIFT 3 static inline struct page *__section_mem_map_addr(struct mem_section *section) { -- Michal Hocko SUSE Labs
Re: your mail
On 04/20/2017 10:49 AM, Michal Hocko wrote: > On Thu 20-04-17 09:28:20, Michal Hocko wrote: >> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > [...] >>> Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It >>> woule make that zone->contiguous usually returns false since memory >>> used by memblock API is marked as PageReserved() and your patch regard >>> it as a hole. It invalidates set_zone_contiguous() optimization and I >>> worry about it. >> >> OK, fair enough. I did't consider memblock allocations. I will rethink >> this patch but there are essentially 3 options >> - use a different criterion for the offline holes dection. I >>have just realized we might do it by storing the online >>information into the mem sections >> - drop this patch >> - move the PageReferenced check down the chain into >>isolate_freepages_block resp. isolate_migratepages_block >> >> I would prefer 3 over 2 over 1. I definitely want to make this more >> robust so 1 is preferable long term but I do not want this to be a >> roadblock to the rest of the rework. Does that sound acceptable to you? > > So I've played with all three options just to see how the outcome would > look like and it turned out that going with 1 will be easiest in the > end. What do you think about the following? It should be free of any > false positives. I have only compile tested it yet. That looks fine, can't say immediately if fully correct. I think you'll need to bump SECTION_NID_SHIFT as well and make sure things still fit? Otherwise looks like nobody needed a new section bit since 2005, so we should be fine. > --- > From 747794c13c0e82b55b793a31cdbe1a84ee1c6920 Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Thu, 13 Apr 2017 10:28:45 +0200 > Subject: [PATCH] mm: consider zone which is not fully populated to have holes > > __pageblock_pfn_to_page has two users currently, set_zone_contiguous > which checks whether the given zone contains holes and > pageblock_pfn_to_page which then carefully returns a first valid > page from the given pfn range for the given zone. This doesn't handle > zones which are not fully populated though. Memory pageblocks can be > offlined or might not have been onlined yet. In such a case the zone > should be considered to have holes otherwise pfn walkers can touch > and play with offline pages. > > Current callers of pageblock_pfn_to_page in compaction seem to work > properly right now because they only isolate PageBuddy > (isolate_freepages_block) or PageLRU resp. __PageMovable > (isolate_migratepages_block) which will be always false for these pages. > It would be safer to skip these pages altogether, though. > > In order to do this patch adds a new memory section state > (SECTION_IS_ONLINE) which is set in memory_present (during boot > time) or in online_pages_range during the memory hotplug. Similarly > offline_mem_sections clears the bit and it is called when the memory > range is offlined. > > pfn_to_online_page helper is then added which check the mem section and > only returns a page if it is onlined already. > > Use the new helper in __pageblock_pfn_to_page and skip the whole page > block in such a case. > > Signed-off-by: Michal Hocko > --- > include/linux/memory_hotplug.h | 21 > include/linux/mmzone.h | 20 ++- > mm/memory_hotplug.c| 3 +++ > mm/page_alloc.c| 5 - > mm/sparse.c| 45 > +- > 5 files changed, 91 insertions(+), 3 deletions(-) > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 3c8cf86201c3..fc1c873504eb 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -14,6 +14,19 @@ struct memory_block; > struct resource; > > #ifdef CONFIG_MEMORY_HOTPLUG > +/* > + * Return page for the valid pfn only if the page is online. All pfn > + * walkers which rely on the fully initialized page->flags and others > + * should use this rather than pfn_valid && pfn_to_page > + */ > +#define pfn_to_online_page(pfn) \ > +({ \ > + struct page *___page = NULL;\ > + \ > + if (online_section_nr(pfn_to_section_nr(pfn))) \ > + ___page = pfn_to_page(pfn); \ > + ___page;\ > +}) > > /* > * Types for free bootmem stored in page->lru.next. These have to be in > @@ -203,6 +216,14 @@ extern void set_zone_contiguous(struct zone *zone); > extern void clear_zone_contiguous(struct zone *zone); > > #else /* ! CONFIG_MEMORY_HOTPLUG */ > +#define pfn_to_online_page(pfn) \ > +({ \ > + struct page *___page = NULL;\ > + if (pfn_valid(pfn))
Re: your mail
On Thu 20-04-17 09:28:20, Michal Hocko wrote: > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: [...] > > Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It > > woule make that zone->contiguous usually returns false since memory > > used by memblock API is marked as PageReserved() and your patch regard > > it as a hole. It invalidates set_zone_contiguous() optimization and I > > worry about it. > > OK, fair enough. I did't consider memblock allocations. I will rethink > this patch but there are essentially 3 options > - use a different criterion for the offline holes dection. I > have just realized we might do it by storing the online > information into the mem sections > - drop this patch > - move the PageReferenced check down the chain into > isolate_freepages_block resp. isolate_migratepages_block > > I would prefer 3 over 2 over 1. I definitely want to make this more > robust so 1 is preferable long term but I do not want this to be a > roadblock to the rest of the rework. Does that sound acceptable to you? So I've played with all three options just to see how the outcome would look like and it turned out that going with 1 will be easiest in the end. What do you think about the following? It should be free of any false positives. I have only compile tested it yet. --- >From 747794c13c0e82b55b793a31cdbe1a84ee1c6920 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 13 Apr 2017 10:28:45 +0200 Subject: [PATCH] mm: consider zone which is not fully populated to have holes __pageblock_pfn_to_page has two users currently, set_zone_contiguous which checks whether the given zone contains holes and pageblock_pfn_to_page which then carefully returns a first valid page from the given pfn range for the given zone. This doesn't handle zones which are not fully populated though. Memory pageblocks can be offlined or might not have been onlined yet. In such a case the zone should be considered to have holes otherwise pfn walkers can touch and play with offline pages. Current callers of pageblock_pfn_to_page in compaction seem to work properly right now because they only isolate PageBuddy (isolate_freepages_block) or PageLRU resp. __PageMovable (isolate_migratepages_block) which will be always false for these pages. It would be safer to skip these pages altogether, though. In order to do this patch adds a new memory section state (SECTION_IS_ONLINE) which is set in memory_present (during boot time) or in online_pages_range during the memory hotplug. Similarly offline_mem_sections clears the bit and it is called when the memory range is offlined. pfn_to_online_page helper is then added which check the mem section and only returns a page if it is onlined already. Use the new helper in __pageblock_pfn_to_page and skip the whole page block in such a case. Signed-off-by: Michal Hocko --- include/linux/memory_hotplug.h | 21 include/linux/mmzone.h | 20 ++- mm/memory_hotplug.c| 3 +++ mm/page_alloc.c| 5 - mm/sparse.c| 45 +- 5 files changed, 91 insertions(+), 3 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 3c8cf86201c3..fc1c873504eb 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -14,6 +14,19 @@ struct memory_block; struct resource; #ifdef CONFIG_MEMORY_HOTPLUG +/* + * Return page for the valid pfn only if the page is online. All pfn + * walkers which rely on the fully initialized page->flags and others + * should use this rather than pfn_valid && pfn_to_page + */ +#define pfn_to_online_page(pfn)\ +({ \ + struct page *___page = NULL;\ + \ + if (online_section_nr(pfn_to_section_nr(pfn))) \ + ___page = pfn_to_page(pfn); \ + ___page;\ +}) /* * Types for free bootmem stored in page->lru.next. These have to be in @@ -203,6 +216,14 @@ extern void set_zone_contiguous(struct zone *zone); extern void clear_zone_contiguous(struct zone *zone); #else /* ! CONFIG_MEMORY_HOTPLUG */ +#define pfn_to_online_page(pfn)\ +({ \ + struct page *___page = NULL;\ + if (pfn_valid(pfn)) \ + ___page = pfn_to_page(pfn); \ + ___page;\ + }) + /* * Stub functions for when hotplug is off */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0fc121bbf4ff..cad16ac080f5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1143,7 +1143,8 @@ extern unsigned long usemap_size(void); */ #defineSECTION_MARKE
Re: your mail
On Thu 20-04-17 10:27:55, Joonsoo Kim wrote: > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: [...] > > Which pfn walkers you have in mind? > > For example, kpagecount_read() in fs/proc/page.c. I searched it by > using pfn_valid(). Yeah, I've checked that one and in fact this is a good example of the case where you do not really care about holes. It just checks the page count which is a valid information under any circumstances. > > > The other problem I found is that your change will makes some > > > contiguous zones to be considered as non-contiguous. Memory allocated > > > by memblock API is also marked as PageResereved. If we consider this as > > > a hole, we will set such a zone as non-contiguous. > > > > Why would that be a problem? We shouldn't touch those pages anyway? > > Skipping those pages in compaction are valid so no problem in this > case. > > The problem I mentioned above is that adding PageReserved() check in > __pageblock_pfn_to_page() invalidates optimization by > set_zone_contiguous(). In compaction, we need to get a valid struct > page and it requires a lot of work. There is performance problem > report due to this so set_zone_contiguous() optimization is added. It > checks if the zone is contiguous or not in boot time. If zone is > determined as contiguous, we can easily get a valid struct page in > runtime without expensive checks. OK, I see. I've had some vague understading and the clarification helps. > Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It > woule make that zone->contiguous usually returns false since memory > used by memblock API is marked as PageReserved() and your patch regard > it as a hole. It invalidates set_zone_contiguous() optimization and I > worry about it. OK, fair enough. I did't consider memblock allocations. I will rethink this patch but there are essentially 3 options - use a different criterion for the offline holes dection. I have just realized we might do it by storing the online information into the mem sections - drop this patch - move the PageReferenced check down the chain into isolate_freepages_block resp. isolate_migratepages_block I would prefer 3 over 2 over 1. I definitely want to make this more robust so 1 is preferable long term but I do not want this to be a roadblock to the rest of the rework. Does that sound acceptable to you? [..] > Let me clarify my desire(?) for this issue. > > 1. If pfn_valid() returns true, struct page has valid information, at > least, in flags (zone id, node id, flags, etc...). So, we can use them > without checking PageResereved(). This is no longer true after my rework. Pages are associated with the zone during _onlining_ rather than when they are physically hotpluged. Basically only the nid is set properly. Strictly speaking this is the case also without my rework because the zone might change during online phase so you cannot assume it is correct even now. It just happens that it more or less works just fine. > 2. pfn_valid() for offlined holes returns false. This can be easily > (?) implemented by manipulating SECTION_MAP_MASK in hotplug code. I > guess that there is no reason that pfn_valid() returns true for > offlined holes. If there is, please let me know. There is some code which really expects that pfn_valid returns true iff there is a struct page and it doesn't care about the online status. E.g. hotplug code itself so no, we cannot change pfn_valid. What we can do though is to add pfn_to_online_page which would do the proper check. I have already sent [1]. As noted above we can (ab)use the remaining bit in SECTION_MAP_MASK to detect offline pages more robustly. > 3. We don't need to check PageReserved() in most of pfn walkers in > order to check offline holes. We still have to distinguish those who care about offline pages from those who do not care about it. Thanks! -- Michal Hocko SUSE Labs
Re: your mail
On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote: > On Mon 17-04-17 14:47:20, Joonsoo Kim wrote: > > On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote: > > > Hi, > > > here I 3 more preparatory patches which I meant to send on Thursday but > > > forgot... After more thinking about pfn walkers I have realized that > > > the current code doesn't check offline holes in zones. From a quick > > > review that doesn't seem to be a problem currently. Pfn walkers can race > > > with memory offlining and with the original hotplug impementation those > > > offline pages can change the zone but I wasn't able to find any serious > > > problem other than small confusion. The new hotplug code, will not have > > > any valid zone, though so those code paths should check PageReserved > > > to rule offline holes. I hope I have addressed all of them in these 3 > > > patches. I would appreciate if Vlastimil and Jonsoo double check after > > > me. > > > > Hello, Michal. > > > > s/Jonsoo/Joonsoo. :) > > ups, sorry about that. > > > I'm not sure that it's a good idea to add PageResereved() check in pfn > > walkers. First, this makes struct page validity check as two steps, > > pfn_valid() and then PageResereved(). > > Yes, those are two separate checkes because semantically they are > different. Not all pfn walkers do care about the online status. If offlined page has no valid information, reading information about offlined pages are just wrong. So, all pfn walkers that reads information about the page should do care about it. I guess that many callers for pfn_valid() is in this category. > > > If we should not use struct page > > in this case, it's better to pfn_valid() returns false rather than > > adding a separate check. Anyway, we need to fix more places (all pfn > > walker?) if we want to check validity by two steps. > > Which pfn walkers you have in mind? For example, kpagecount_read() in fs/proc/page.c. I searched it by using pfn_valid(). > > The other problem I found is that your change will makes some > > contiguous zones to be considered as non-contiguous. Memory allocated > > by memblock API is also marked as PageResereved. If we consider this as > > a hole, we will set such a zone as non-contiguous. > > Why would that be a problem? We shouldn't touch those pages anyway? Skipping those pages in compaction are valid so no problem in this case. The problem I mentioned above is that adding PageReserved() check in __pageblock_pfn_to_page() invalidates optimization by set_zone_contiguous(). In compaction, we need to get a valid struct page and it requires a lot of work. There is performance problem report due to this so set_zone_contiguous() optimization is added. It checks if the zone is contiguous or not in boot time. If zone is determined as contiguous, we can easily get a valid struct page in runtime without expensive checks. Your patch try to add PageReserved() to __pageblock_pfn_to_page(). It woule make that zone->contiguous usually returns false since memory used by memblock API is marked as PageReserved() and your patch regard it as a hole. It invalidates set_zone_contiguous() optimization and I worry about it. > > > And, I guess that it's not enough to check PageResereved() in > > pageblock_pfn_to_page() in order to skip these pages in compaction. If > > holes are in the middle of the pageblock, pageblock_pfn_to_page() > > cannot catch it and compaction will use struct page for this hole. > > Yes pageblock_pfn_to_page cannot catch it and it wouldn't with the > current implementation anyway. So the implementation won't be any worse > than with the current code. On the other hand offline holes will always > fill the whole pageblock (assuming those are not spanning multiple > memblocks). > > > Therefore, I think that making pfn_valid() return false for not > > onlined memory is a better solution for this problem. I don't know the > > implementation detail for hotplug and I don't see your recent change > > but we may defer memmap initialization until the zone is determined. > > It will make pfn_valid() return false for un-initialized range. > > I am not really sure. pfn_valid is used in many context and its only > purpose is to tell whether pfn_to_page will return a valid struct page > AFAIU. > > I agree that having more checks is more error prone and we can add a > helper pfn_to_valid_page or something similar but I believe we can do > that on top of the current hotplug rework. This would require a non > trivial amount of changes and I believe that a lacking check for the > offline holes is not critical - we would (ab)use the lowest zone which > is similar to (ab)using ZONE_NORMAL/MOVABLE with the original code. I'm not objecting your hotplug rework. In fact, I don't know the relationship between this work and hotplug rework. I'm agreeing with checking offline holes but I don't like the design and implementation about it. Let me clarify my desire(?) for this issue. 1.
Re: your mail
On Mon 17-04-17 14:47:20, Joonsoo Kim wrote: > On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote: > > Hi, > > here I 3 more preparatory patches which I meant to send on Thursday but > > forgot... After more thinking about pfn walkers I have realized that > > the current code doesn't check offline holes in zones. From a quick > > review that doesn't seem to be a problem currently. Pfn walkers can race > > with memory offlining and with the original hotplug impementation those > > offline pages can change the zone but I wasn't able to find any serious > > problem other than small confusion. The new hotplug code, will not have > > any valid zone, though so those code paths should check PageReserved > > to rule offline holes. I hope I have addressed all of them in these 3 > > patches. I would appreciate if Vlastimil and Jonsoo double check after > > me. > > Hello, Michal. > > s/Jonsoo/Joonsoo. :) ups, sorry about that. > I'm not sure that it's a good idea to add PageResereved() check in pfn > walkers. First, this makes struct page validity check as two steps, > pfn_valid() and then PageResereved(). Yes, those are two separate checkes because semantically they are different. Not all pfn walkers do care about the online status. > If we should not use struct page > in this case, it's better to pfn_valid() returns false rather than > adding a separate check. Anyway, we need to fix more places (all pfn > walker?) if we want to check validity by two steps. Which pfn walkers you have in mind? > The other problem I found is that your change will makes some > contiguous zones to be considered as non-contiguous. Memory allocated > by memblock API is also marked as PageResereved. If we consider this as > a hole, we will set such a zone as non-contiguous. Why would that be a problem? We shouldn't touch those pages anyway? > And, I guess that it's not enough to check PageResereved() in > pageblock_pfn_to_page() in order to skip these pages in compaction. If > holes are in the middle of the pageblock, pageblock_pfn_to_page() > cannot catch it and compaction will use struct page for this hole. Yes pageblock_pfn_to_page cannot catch it and it wouldn't with the current implementation anyway. So the implementation won't be any worse than with the current code. On the other hand offline holes will always fill the whole pageblock (assuming those are not spanning multiple memblocks). > Therefore, I think that making pfn_valid() return false for not > onlined memory is a better solution for this problem. I don't know the > implementation detail for hotplug and I don't see your recent change > but we may defer memmap initialization until the zone is determined. > It will make pfn_valid() return false for un-initialized range. I am not really sure. pfn_valid is used in many context and its only purpose is to tell whether pfn_to_page will return a valid struct page AFAIU. I agree that having more checks is more error prone and we can add a helper pfn_to_valid_page or something similar but I believe we can do that on top of the current hotplug rework. This would require a non trivial amount of changes and I believe that a lacking check for the offline holes is not critical - we would (ab)use the lowest zone which is similar to (ab)using ZONE_NORMAL/MOVABLE with the original code. Thanks! -- Michal Hocko SUSE Labs
Re: your mail
On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote: > Hi, > here I 3 more preparatory patches which I meant to send on Thursday but > forgot... After more thinking about pfn walkers I have realized that > the current code doesn't check offline holes in zones. From a quick > review that doesn't seem to be a problem currently. Pfn walkers can race > with memory offlining and with the original hotplug impementation those > offline pages can change the zone but I wasn't able to find any serious > problem other than small confusion. The new hotplug code, will not have > any valid zone, though so those code paths should check PageReserved > to rule offline holes. I hope I have addressed all of them in these 3 > patches. I would appreciate if Vlastimil and Jonsoo double check after > me. Hello, Michal. s/Jonsoo/Joonsoo. :) I'm not sure that it's a good idea to add PageResereved() check in pfn walkers. First, this makes struct page validity check as two steps, pfn_valid() and then PageResereved(). If we should not use struct page in this case, it's better to pfn_valid() returns false rather than adding a separate check. Anyway, we need to fix more places (all pfn walker?) if we want to check validity by two steps. The other problem I found is that your change will makes some contiguous zones to be considered as non-contiguous. Memory allocated by memblock API is also marked as PageResereved. If we consider this as a hole, we will set such a zone as non-contiguous. And, I guess that it's not enough to check PageResereved() in pageblock_pfn_to_page() in order to skip these pages in compaction. If holes are in the middle of the pageblock, pageblock_pfn_to_page() cannot catch it and compaction will use struct page for this hole. Therefore, I think that making pfn_valid() return false for not onlined memory is a better solution for this problem. I don't know the implementation detail for hotplug and I don't see your recent change but we may defer memmap initialization until the zone is determined. It will make pfn_valid() return false for un-initialized range. Thanks.
Re: your mail
On Wed, Nov 16, 2016 at 09:25:43AM -0500, Steven Rostedt wrote: > On Wed, 16 Nov 2016 11:40:14 +0100 > Peter Zijlstra wrote: > > > > On top of which, the implementation had issues; now I know you're the > > blinder kind of person that disregards everything not in his immediate > > interest, but if you'd looked at the patch you'd have seen he'd added > > code the idle entry path, which will slow down every single to-idle > > transition. > > Isn't to-idle a bit bloated anyway? Or has that been fixed. I know > there was some issues with idle_balance() which can add latency to > wakeups. idle_balance() is also in the to-idle path. > Yes it is too heavy as is, but just stacking more crap in just because its already expensive seems to wrong way around.
Re: your mail
On Wed, 16 Nov 2016 11:40:14 +0100 Peter Zijlstra wrote: > On top of which, the implementation had issues; now I know you're the > blinder kind of person that disregards everything not in his immediate > interest, but if you'd looked at the patch you'd have seen he'd added > code the idle entry path, which will slow down every single to-idle > transition. Isn't to-idle a bit bloated anyway? Or has that been fixed. I know there was some issues with idle_balance() which can add latency to wakeups. idle_balance() is also in the to-idle path. Note, that this is a sched feature which would be a nop (jump_label) when disabled. And I'm sure it could also be optimized to be a static inline as well when it is enabled. I'm not saying we need to go this approach, but I'm just saying that the to-idle issue is a bit of a red herring. -- Steve
Re: your mail
On Tue, Nov 15, 2016 at 02:29:16PM -0600, Christoph Lameter wrote: > > > > There is a deadlock, Peter!!! > > > > Describe please? Also, have you tried disabling RT_RUNTIME_SHARE ? > > > > > The description was given earlier in the the thread and the drawbacks of > using RT_RUNTIME_SHARE as well. I've not seen a deadlock described. It either was an unbounded priority inversion or a starvation issue, both of which are 'design' features of the !rt kernel. Neither things are new, so its not a regression either. And, as stated, I'm not really happy to muck with this known troublesome code and add features for which we must then maintain feature parity when replacing it either. On top of which, the implementation had issues; now I know you're the blinder kind of person that disregards everything not in his immediate interest, but if you'd looked at the patch you'd have seen he'd added code the idle entry path, which will slow down every single to-idle transition.
Re: your mail
Subject line got dropped the first time around. Will send again. Apologies for the chatter, Andrew On Tue, Sep 20, 2016 at 05:21:06PM -0500, Andrew Banman wrote: > From Andrew Banman # This line is ignored. > From: Andrew Banman > Subject: [PATCH 0/9] arch/x86/platform/uv: add UV4 support to BAU > In-Reply-To: > > The following patch set adds support for UV4 architecture to the Broadcast > Assist Unit (BAU). Major hardware changes to the BAU require these fixes to > ensure correct operation and to avoid illegal MMR writes. > > arch/x86/include/asm/uv/uv_bau.h | 45 ++ > arch/x86/platform/uv/tlb_uv.c| 114 > --- > - > > The patch set can be thought of in three logical groups: > > 1) General cleanup. > > [PATCH 1/9] arch/x86/platform/uv: BAU cleanup: update printks > [PATCH 2/9] arch/x86/platform/uv: BAU cleanup: pq_init > [PATCH 3/9] arch/x86/platform/uv: BAU replace uv_physnodeaddr > > These housekeeping patches make the subsequent UV4 patches clearer, > and they should be done in any case. > > > 2) Implement a new scheme to abstract UV version-specific functions. > > [PATCH 4/9] arch/x86/platform/uv: BAU add generic function pointers > [PATCH 5/9] arch/x86/platform/uv: BAU use generic function pointers > > We add a struct of function pointers to define version-specific BAU > operations. The philosophy is to abstract functions that perform the same > operation on all UV versions but have different implementations. This will > simplify their use in the body of the driver code and greatly simplify the > UV4 patches to follow. > > > 3) Add UV4 functionality. > > [PATCH 6/9] arch/x86/platform/uv: BAU UV4 populate uvhub_version > [PATCH 7/9] arch/x86/platform/uv: BAU UV4 disable software timeout > [PATCH 8/9] arch/x86/platform/uv: BAU UV4 fix payload queue setup > [PATCH 9/9] arch/x86/platform/uv: BAU UV4 add version-specific > > These patches feature a minimal set of changes to make the BAU on UV4 > operational. > > > This patch set has been tested for regressions on pre-UV4 architectures and > for correct functionality on UV4. The patches apply cleanly to 4.8-rc7. > Fine-tuned performance tweaking for UV4 will come in a future patch set. > > > Thank you, > > Andrew Banman
Re: your mail
Returning EINVAL here is the wrong thing. Just leave the code as is. regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Aug 03, 2015 at 11:48:59AM +0530, Shraddha Barke wrote: > From b67c6c20455b04b77447ab4561e44f1a75dd978d Mon Sep 17 00:00:00 2001 > From: Shraddha Barke > Date: Mon, 3 Aug 2015 11:34:19 +0530 > Subject: [PATCH] Staging : lustre : Use -EINVAL instead of -ENOSYS You do not need these in the commit message. regards sudip -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, 21 Jan 2015, Jason Gunthorpe wrote: > [unfutzd the cc a bit, sorry] > > On Wed, Jan 21, 2015 at 04:19:17PM -0600, atull wrote: > > > If we consider a Zynq, for instance, there are a number of clock nets > > > that the CPU drives into the FPGA fabric. These nets are controlled by > > > the kernel CLK framework. So, before we program the FPGA bitstream the > > > clocks must be setup properly. > > > > It's pretty normal for drivers to find out what their clocks are from > > the DT and enable them. > > Sure, but the clocks are bitfile specific, and not related to > programming. Some bitfiles may not require CPU clocks at all. The bitfile specific clocks are the clocks that are turned on by the device driver for that chunk of the bitfile. So those clocks can be specified the same way as clocks are specied in the DT. > > > Yes the DT overlay can specify: > > * clock info > > * firmware file name if user is doing it that way > > * fpga manager - specific info > > * compatiblity string specifies what type of fpga it is > > * which fpga this image should go into > > * fpga/processor bridges to enable > > * driver(s) info that is dependent on the above > > All sounds reasonable > > > > Today in our Zynq systems we have the bootloader preconfigure > > > everything for what we are trying to do - but that is specific to the > > > particular FPGA we are expecting to run, and eg, I expect if we ran a > > > kernel using the Zynq clk framework there would be problems with it > > > mangling the configuration. > > > > > > So there would have to be some kind of sequence where the DT is > > > loaded, the zynq specific FPGA programmer does its pre setup, then the > > > request_firmarw/fpga_program_fw loads the bitstream and another pass > > > for a zynq specific post setup and completion handshake? > > > > fpga-mgr.c has the concept that each different FPGA family will > > likely need its own way of doing these 3 steps: > > * write_init (prepare fgpa for receiving configuration information) > > * write (write configuration info to the fpga) > > * write_complete (done writing, put fpga into running mode) > > > > There are callbacks into the manufacturor/fpga family specific lower > > level driver to do these things (as part of the "fpga_manager_ops" > > I think the missing bit here is that there are bitfile specific things > as well. > > The functions above are fine for a generic manufacturer bitfile loader, > ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc. > > But wrappered around that should be another set of functions that are > bitfile specific. > > Like Zynq-PL-boot-protocol-v1 - which deasserts a reset line and waits > for the PL to signal back that it has completed reset. > > Or jgg-boot-protocol-v1 which monitors the configuration GPIOs for a > specific ready pattern.. > > Or ... > > All of those procedures depend on the bitfile to implement something. > > > > The DT needs to specify not only the bitstream programming HW to use > > > but this ancillary programming protocol. There are many ways to do > > > a out of reset and completion handshake on Zynq, for instance. > > > > Currently the lower level driver supports only one preferred method > > of programming. I guess we could add an enumerated DT property to > > select programming protocol. It would have to be manufacturor specific. > > Alternatively it could be encoded into the compatibility string if that > > makes sense. > > From a DT perspective I'd expect it to look something like: > > soc { > > // This is the 'how to program a bitstream' > fpga-bitstream0: zynq_pl_dma > { > compatible = "xilinx,zynq,pl,dma"; > regs = <..> > } > > fpga: .. > { > // This is 'what is in the bitstream' > boot-protocol = "xilinx,zynq,protocol1"; > compatible = "jgg,fpga-foo-bar"; > manager = @fpga-bitstream0 > clocks = .. > clock-frequency = ... > > zynq_axi_gp0 > { > // Settings for a CPU to FPGA AXI bridge >axi setting 1 = ... >[...] > } > } > } That's good. It also needs to specify the driver(s) for the hardware that the bitstream will instantiate. > > I could also see integrating with the regulator framework as well to > power up FPGA specific controllable power supplies. > > > > And then user space would need to have control points between each of > > > these steps. > > > We could have two options, configurable from the ioctl: > > * When the DT is loaded, do everything > > * Even when the DT is loaded, wait for further instructions from ioctl or > > > > Freewheeling flow: > > * Tell ioctl that we are in freewheeling mode > > * Load DT overlay > > > > Tightly controlled flow: > > * Tell ioctl that we are doing things stepwise > > * Load DT overlay > > * Use ioctl to step through getting the fpga loaded and known to be happy > > I think you've certainly got the idea! > > Thinking it through some more, if the kernel
Re: your mail
> The functions above are fine for a generic manufacturer bitfile loader, > ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc. > > But wrappered around that should be another set of functions that are > bitfile specific. And also a transport layer. You can have the same FPGA with the same loader protocol off multiple different bus types (from USB to on CPU die and all the way between). Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
[unfutzd the cc a bit, sorry] On Wed, Jan 21, 2015 at 04:19:17PM -0600, atull wrote: > > If we consider a Zynq, for instance, there are a number of clock nets > > that the CPU drives into the FPGA fabric. These nets are controlled by > > the kernel CLK framework. So, before we program the FPGA bitstream the > > clocks must be setup properly. > > It's pretty normal for drivers to find out what their clocks are from > the DT and enable them. Sure, but the clocks are bitfile specific, and not related to programming. Some bitfiles may not require CPU clocks at all. > Yes the DT overlay can specify: > * clock info > * firmware file name if user is doing it that way > * fpga manager - specific info > * compatiblity string specifies what type of fpga it is > * which fpga this image should go into > * fpga/processor bridges to enable > * driver(s) info that is dependent on the above All sounds reasonable > > Today in our Zynq systems we have the bootloader preconfigure > > everything for what we are trying to do - but that is specific to the > > particular FPGA we are expecting to run, and eg, I expect if we ran a > > kernel using the Zynq clk framework there would be problems with it > > mangling the configuration. > > > > So there would have to be some kind of sequence where the DT is > > loaded, the zynq specific FPGA programmer does its pre setup, then the > > request_firmarw/fpga_program_fw loads the bitstream and another pass > > for a zynq specific post setup and completion handshake? > > fpga-mgr.c has the concept that each different FPGA family will > likely need its own way of doing these 3 steps: > * write_init (prepare fgpa for receiving configuration information) > * write (write configuration info to the fpga) > * write_complete (done writing, put fpga into running mode) > > There are callbacks into the manufacturor/fpga family specific lower > level driver to do these things (as part of the "fpga_manager_ops" I think the missing bit here is that there are bitfile specific things as well. The functions above are fine for a generic manufacturer bitfile loader, ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc. But wrappered around that should be another set of functions that are bitfile specific. Like Zynq-PL-boot-protocol-v1 - which deasserts a reset line and waits for the PL to signal back that it has completed reset. Or jgg-boot-protocol-v1 which monitors the configuration GPIOs for a specific ready pattern.. Or ... All of those procedures depend on the bitfile to implement something. > > The DT needs to specify not only the bitstream programming HW to use > > but this ancillary programming protocol. There are many ways to do > > a out of reset and completion handshake on Zynq, for instance. > > Currently the lower level driver supports only one preferred method > of programming. I guess we could add an enumerated DT property to > select programming protocol. It would have to be manufacturor specific. > Alternatively it could be encoded into the compatibility string if that > makes sense. >From a DT perspective I'd expect it to look something like: soc { // This is the 'how to program a bitstream' fpga-bitstream0: zynq_pl_dma { compatible = "xilinx,zynq,pl,dma"; regs = <..> } fpga: .. { // This is 'what is in the bitstream' boot-protocol = "xilinx,zynq,protocol1"; compatible = "jgg,fpga-foo-bar"; manager = @fpga-bitstream0 clocks = .. clock-frequency = ... zynq_axi_gp0 { // Settings for a CPU to FPGA AXI bridge axi setting 1 = ... [...] } } } I could also see integrating with the regulator framework as well to power up FPGA specific controllable power supplies. > > And then user space would need to have control points between each of > > these steps. > We could have two options, configurable from the ioctl: > * When the DT is loaded, do everything > * Even when the DT is loaded, wait for further instructions from ioctl or > > Freewheeling flow: > * Tell ioctl that we are in freewheeling mode > * Load DT overlay > > Tightly controlled flow: > * Tell ioctl that we are doing things stepwise > * Load DT overlay > * Use ioctl to step through getting the fpga loaded and known to be happy I think you've certainly got the idea! Thinking it through some more, if the kernel DT tells the fpga-mgr that it is boot-protocol = "xilinx,zynq,protocol1","jgg,foo-bar"; Then the kernel should refuse to start it if it doesn't know how to do both 'xilinx,zynq,protocol1' and 'jgg,foo-bar'. Thus the user space ioctl interface becomes more of how to implement a boot protocol helper in userspace? With the proper locking - while the helper is working the FPGA cannot be messed with.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-
Re: your mail
Hello, Christoph. On Wed, Oct 15, 2014 at 03:10:37AM -0500, Christoph Lameter wrote: > Subject: Convert remaining __get_cpu_var uses > > During the 3.18 merge period additional __get_cpu_var uses were > added. The patch converts these to this_cpu_ptr(). > > [This does not address the powerpc issue where the conversion > patches were routed directly to the powerpc maintainers but were > not applied in the merge period. Will have to be handled separately] > > Signed-off-by: Christoph Lameter Can you please repost with proper subject line and the subsys maintainers cc'd? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
No subject. It should be a subject about adding spaces. regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Sep 01, 2014 at 03:34:56PM +0800, sunwxg wrote: > From: Sun Wang > > Subject: [PATCH] staging: unisys: visorutil: procobjecttree: fix coding style > issue > Your email headers are mangled. The subject is vague. regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, Jul 09, 2014 at 10:03:32AM +0900, James Ban wrote: > > > + ret = regmap_read(chip->regmap, DA9211_REG_EVENT_B, ®_val); > > > + if (ret < 0) > > > + goto error_i2c; > > > + if (reg_val & DA9211_E_OV_CURR_A) { > > > + if (reg_val & DA9211_E_OV_CURR_B) { > > > + return IRQ_HANDLED; > > This is buggy - the driver should only return IRQ_HANDLED if it handled the > > interrupt somehow, otherwise it should return IRQ_NONE and let the interrupt > > core handle things. This is especially important since the device appears > > to > > require that interrupts are explicitly acknoweldged so if something is > > flagged > > but not handled the interrupt will just sit constantly asserted. > Basically all interrupts are masked when the chip wakes up. > Only two interrupts are unmasked at the start of driver like below. I know that's the intention but the code should still be written robustly - something might go wrong somewhere which causes another interrupt to be enabled, or we might even gain support for shared threaded interrupts in the interrupt core and someone could then try to use that in a system. signature.asc Description: Digital signature
Re: your mail
On Tue, Feb 25, 2014 at 11:20:11AM +, srikanth TS wrote: > > On Feb 25, 2014 2:28 AM, "Will Deacon" > mailto:will.dea...@arm.com>> wrote: > > > > On Mon, Feb 24, 2014 at 03:12:21PM +, srikanth TS wrote: > > > Hi Will Deacon, > > > > Hello, > > > > > Currently SMMU driver expecting all stream ID used by respective master > > > should be defined in the DT. > > > > > > We want to know how to handle in the case of virtual functions dynamically > > > created and destroyed. > > > > > > Is PCI driver responsible for creating stream ID respective BDand > > > requesting SMMU to add to the mapping table[stream Id to context mapping > > > table]? > > > > > > Or is there any right way of doing it? > > > > Correct, the driver currently doesn't support dynamic mappings (mainly > > because I didn't want to try and invent something that I couldn't test). > > > > There are a couple of ways to solve this: > > > > (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU > > within a fixed range. That would probably need some code in the bus > > layer, so that a bus notifier can kick and call back to the relevant > > SMMU. > > I think first way of solving seems to be better, because we don't know how > many > > VF are used and i feel its not good idea to keep whole list of streamID > [which is > > equal to max num vf] in DT. Again in this method we need to generate the > stream ID > > dynamically whenever VF is added in pci iov driver side. And then pass that > > stream ID to SMMU. > > Is it ok this way? Or you prefer 2nd way which is simpler. I'm happy either way, but I'd need to see some patches before I can merge anything ;) Will -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: your mail
> -Original Message- > From: iommu-boun...@lists.linux-foundation.org [mailto:iommu- > boun...@lists.linux-foundation.org] On Behalf Of Will Deacon > Sent: Monday, February 24, 2014 10:59 PM > To: srikanth TS > Cc: io...@lists.linux-foundation.org; sungjinn.ch...@samsung.com; linux- > ker...@vger.kernel.org; ts.srika...@samsung.com > Subject: Re: your mail > > On Mon, Feb 24, 2014 at 03:12:21PM +, srikanth TS wrote: > > Hi Will Deacon, > > Hello, > > > Currently SMMU driver expecting all stream ID used by respective > > master should be defined in the DT. > > > > We want to know how to handle in the case of virtual functions > > dynamically created and destroyed. > > > > Is PCI driver responsible for creating stream ID respective BDand > > requesting SMMU to add to the mapping table[stream Id to context > > mapping table]? > > > > Or is there any right way of doing it? > > Correct, the driver currently doesn't support dynamic mappings (mainly > because I didn't want to try and invent something that I couldn't test). > > There are a couple of ways to solve this: > > (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU > within a fixed range. That would probably need some code in the bus > layer, so that a bus notifier can kick and call back to the > relevant > SMMU. This could be done in add device notifier. I am working on similar(not PCI) hot plug device infrastructure for arm smmu driver. I will post an RFC patch by next week. -Varun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Feb 24, 2014 at 03:12:21PM +, srikanth TS wrote: > Hi Will Deacon, Hello, > Currently SMMU driver expecting all stream ID used by respective master > should be defined in the DT. > > We want to know how to handle in the case of virtual functions dynamically > created and destroyed. > > Is PCI driver responsible for creating stream ID respective BDand > requesting SMMU to add to the mapping table[stream Id to context mapping > table]? > > Or is there any right way of doing it? Correct, the driver currently doesn't support dynamic mappings (mainly because I didn't want to try and invent something that I couldn't test). There are a couple of ways to solve this: (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU within a fixed range. That would probably need some code in the bus layer, so that a bus notifier can kick and call back to the relevant SMMU. (2) Describe the RID -> SID mapping in the device-tree. We probably want to avoid an enormous table, so this would only work for simple `SID = RID + offset' or 'SID = RID & mask' cases. How do your IDs map to each other? Will -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Thu, Jan 23, 2014 at 02:36:05PM +0530, Prabhakar Lad wrote: > Hi Mark, Please use a subject line for your e-mails, otherwise they look a lot like spam. > So currently I am booting it traditional way (NON DT way) and > regulator_dev_lookup() > fails (return NULL) and for this check it fails. > +if (ret && ret != -ENODEV) { > regulator = ERR_PTR(ret); > goto out; > } > In the NON-DT case the 'ret' is never updated in regulator_dev_lookup(). What is the problem you're trying to report here? You're describing the behaviour of the code but I don't understand the problem. signature.asc Description: Digital signature
Re: your mail
On Thu, Jan 09, 2014 at 06:49:39PM +, Joe Bor?? wrote: > > I didn't do the changes as root, I sent them from my server as it has SMTP > out. > Hmm, this gives me an idea. There's nothing, I believe, that makes the root user have to have the name "root" except for the passwd file. Maybe I'll just rename "root" to "walley" and then use "root" as my normal account. If anyone tries to break into "root" they will just gain access to a normal account and nothing more ;-) /me goes back to hacking -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Hi Greg, I'll re do them tonight. I didn't do the changes as root, I sent them from my server as it has SMTP out. Thanks Regards, Joseph David Borġ http://www.jdborg.com On 9 January 2014 18:39, Greg KH wrote: > On Mon, Dec 30, 2013 at 05:40:44PM +, Joe Borg wrote: >> >From 6d9f6446434c4021cc9452e31c374ac50e08f0f9 Mon Sep 17 00:00:00 2001 >> From: Joe Borg > > This isn't matching your "from:" line on your email, why should I trust > it? > > And doing kernel work as 'root'? That's not a good idea for lots of > reasons... > >> Date: Mon, 30 Dec 2013 15:35:08 + >> Subject: [PATCH 62/62] DAS1800: Fixing error from checkpatch. >> >> Fixed pointer typeo; foo * bar should be foo *bar. >> >> Signed-off by Joe Borg > > What happened to your Subject:? > > And why is the whole git header in the email, please use git send-email > so that I don't have to hand-edit the body of the email to apply it. > > Can you please fix this up and resend? > > thanks, > > greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Dec 30, 2013 at 05:40:44PM +, Joe Borg wrote: > >From 6d9f6446434c4021cc9452e31c374ac50e08f0f9 Mon Sep 17 00:00:00 2001 > From: Joe Borg This isn't matching your "from:" line on your email, why should I trust it? And doing kernel work as 'root'? That's not a good idea for lots of reasons... > Date: Mon, 30 Dec 2013 15:35:08 + > Subject: [PATCH 62/62] DAS1800: Fixing error from checkpatch. > > Fixed pointer typeo; foo * bar should be foo *bar. > > Signed-off by Joe Borg What happened to your Subject:? And why is the whole git header in the email, please use git send-email so that I don't have to hand-edit the body of the email to apply it. Can you please fix this up and resend? thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Nov 26, 2012 at 10:14:44PM +0400, Vasiliy Tolstov wrote: > Hello, Greg. Hello kernel team! I'm system enginer at clodo.ru (russian cloud > hosting provider) we are use xen and sles11-sp2 for our compute xen nodes. > Each virtual machine (domU) have disks that attached by Infiniband SRP. On top > of disk that attached by srp we use multipath (to do failover) > Now we have issues like all commands that uses multipath hang while one > storage > is rebooted. > After some discussion with maintainer of linux-rdma (Bart Van Assche) and > using > it backported ib_srp with HA patches we can't solve deadlock issues. Bart > thinks that SLES team does not backport some core scsi patches to their kernel > (3.0.42) to prevent multipath deadlock (currently is about 2.5 minutes) on > failed target. > Is that possible to determine or getting help to solve this issue? As you are using SLES, please contact the SUSE for support for that kernel, as you are paying for it, and the community can't do anything to support their kernel, sorry. Best of luck, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Please ignore. On Wed, Oct 10, 2012 at 10:06:53AM -0500, Kent Yoder wrote: > The following changes since commit ecefbd94b834fa32559d854646d777c56749ef1c: > > Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm > (2012-10-04 09:30:33 -0700) > > are available in the git repository at: > > > git://github.com/shpedoikal/linux.git tpmdd-fixes-v3.6 > > for you to fetch changes up to 1631cfb7cee28388b04aef6c0a73050f6fd76e4d: > > driver/char/tpm: fix regression causesd by ppi (2012-10-10 09:50:56 -0500) > > > Gang Wei (1): > driver/char/tpm: fix regression causesd by ppi > > drivers/char/tpm/tpm.c | 3 ++- > drivers/char/tpm/tpm.h | 9 +++-- > drivers/char/tpm/tpm_ppi.c | 18 ++ > 3 files changed, 19 insertions(+), 11 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Thu, 4 Oct 2012, Andrea Arcangeli wrote: > So we could drop page_autonuma by creating a CONFIG_SLUB=y dependency > (AUTONUMA wouldn't be available in the kernel config if SLAB=y, and it > also wouldn't be available on 32bit archs but the latter isn't a > problem). Nope it should depend on page struct alignment. Other kernel subsystems may be depeding on page struct alignment in the future (and some other arches may already have that requirement) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Fri, Aug 03, 2012 at 10:43:45AM -0700, Tejun Heo wrote: > delayed_work has been annoyingly missing the mechanism to modify timer > of a pending delayed_work - ie. mod_timer() counterpart. delayed_work > users have been working around this using several methods - using an > explicit timer + work item, messing directly with delayed_work->timer, > and canceling before re-queueing, all of which are error-prone and/or > ugly. > > Gustavo Padovan posted a RFC implementation[1] of mod_delayed_work() a > while back but it wasn't complete. To properly implement > mod_delayed_work[_on](), it should be able to steal pending work items > which may be on timer or worklist or anywhere inbetween. This is > similar to what __cancel_work_timer() does but it turns out that there > are a lot of holes around this area and try_to_grab_pending() needs > considerable amount of work to be used for other purposes too. > > This patchset improves canceling and try_to_grab_pending(), use it to > implement mod_delayed_work[_on](), convert easy ones, and drop > __cancel_delayed_work_sync() which doesn't have relevant users > afterwards. Applied to wq/for-3.7. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
>Nope, wrong clues. >The right clues are in the footer of this message after it travels thru the >list. > >I supplied them to Nicholas already, but apparently others need to be reminded >of >them every now and then :-] That footer is in these list messages for a >reason! > >/Matti Aarnio -- one of <[EMAIL PROTECTED]> > >PS: You want to contact VGER's email and list managers ? >We use the internet email standard address "postmaster" > Jan, Matti, + List, I am very sorry about the noise, that's what I get for using cut and paste while tired and before my third cup of coffee. ;p Apologies. Nick >>On Wed, Oct 17, 2007 at 06:36:19PM +0200, Jan Engelhardt wrote: >> Date: Wed, 17 Oct 2007 18:36:19 +0200 (CEST) >> From: Jan Engelhardt <[EMAIL PROTECTED]> >> To: [EMAIL PROTECTED] >> cc: linux-kernel@vger.kernel.org >> Subject: Re: your mail >> >> On Oct 17 2007 16:30, [EMAIL PROTECTED] wrote: >> >Date: Wed, 17 Oct 2007 16:30:24 + >> >From: <[EMAIL PROTECTED]> >> >To: >> ^^ > >> > >>subscribe linux-alpha >> ^ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Nope, wrong clues. The right clues are in the footer of this message after it travels thru the list. I supplied them to Nicholas already, but apparently others need to be reminded of them every now and then :-] That footer is in these list messages for a reason! /Matti Aarnio -- one of <[EMAIL PROTECTED]> PS: You want to contact VGER's email and list managers ? We use the internet email standard address "postmaster" On Wed, Oct 17, 2007 at 06:36:19PM +0200, Jan Engelhardt wrote: > Date: Wed, 17 Oct 2007 18:36:19 +0200 (CEST) > From: Jan Engelhardt <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > cc: linux-kernel@vger.kernel.org > Subject: Re: your mail > > On Oct 17 2007 16:30, [EMAIL PROTECTED] wrote: > >Date: Wed, 17 Oct 2007 16:30:24 + > >From: <[EMAIL PROTECTED]> > >To: > ^^ > > > >subscribe linux-alpha > ^ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Oct 17 2007 16:30, [EMAIL PROTECTED] wrote: >Date: Wed, 17 Oct 2007 16:30:24 + >From: <[EMAIL PROTECTED]> >To: ^^ > >subscribe linux-alpha ^ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
-- On Mon, 24 Sep 2007, Steven Rostedt wrote: > [EMAIL PROTECTED] > Bcc: > Subject: Re: realtime preemption performance difference > Reply-To: > In-Reply-To: <[EMAIL PROTECTED]> [ I'm actually just learning how to screw-up^Wuse mutt ] bah! -- Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Thu, Aug 16, 2007 at 06:06:00AM +0530, Satyam Sharma wrote: > > that are: > > while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { > mdelay(1); > msecs--; > } > > where mdelay() becomes __const_udelay() which happens to be in another > translation unit (arch/i386/lib/delay.c) and hence saves this callsite > from being a bug :-) The udelay itself certainly should have some form of cpu_relax in it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Olof Johansson wrote: [Wed May 16 2007, 01:11:00PM EDT] > On Wed, May 16, 2007 at 11:43:41AM -0500, Linas Vepstas wrote: > > On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote: > > > Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage > > > > > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > > > error: unknown field `subsys' specified in initializer > > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > > > warning: initialization from incompatible pointer type > > > make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1 > > > make[3]: *** [drivers/pci/hotplug] Error 2 > > > make[2]: *** [drivers/pci] Error 2 > > > make[1]: *** [drivers] Error 2 > > > make: *** [_all] Error 2 > > > > John Rose is working to fix this "real soon now". > > Do you mean the fix Al Viro posted yesterday? > > http://patchwork.ozlabs.org/linuxppc/patch?id=11177 > > > -Olof Missed that patch. thanks, bob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, May 16, 2007 at 11:43:41AM -0500, Linas Vepstas wrote: > On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote: > > Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage > > > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > > error: unknown field `subsys' specified in initializer > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > > warning: initialization from incompatible pointer type > > make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1 > > make[3]: *** [drivers/pci/hotplug] Error 2 > > make[2]: *** [drivers/pci] Error 2 > > make[1]: *** [drivers] Error 2 > > make: *** [_all] Error 2 > > John Rose is working to fix this "real soon now". Do you mean the fix Al Viro posted yesterday? http://patchwork.ozlabs.org/linuxppc/patch?id=11177 -Olof - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote: > Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > error: unknown field `subsys' specified in initializer > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: > warning: initialization from incompatible pointer type > make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1 > make[3]: *** [drivers/pci/hotplug] Error 2 > make[2]: *** [drivers/pci] Error 2 > make[1]: *** [drivers] Error 2 > make: *** [_all] Error 2 John Rose is working to fix this "real soon now". --linas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Sorry, this wasn't supposed to happen. Already done... Unsubscribed due to lack of a digest mail. I wonder why people can't send their unsubscribe message to the same address they sent their subscribe message to. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
From: Jan Engelhardt <[EMAIL PROTECTED]> Date: Thu, 29 Mar 2007 23:42:17 +0200 (MEST) > > unsubscribe linux-kernel .. > > I wonder why people can't send their unsubscribe message to the same > address they sent their subscribe message to. People get frustrated that it doesn't work then start doing stupid things like sending it to the actual list, like this person did. Of course they always fail to consider doing the proper thing which is to ask [EMAIL PROTECTED] or the list owner ([EMAIL PROTECTED] in this case) for help if it is the case that their email has changed and they no longer have a way to send from the subscribed address. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
> > unsubscribe linux-kernel .. I wonder why people can't send their unsubscribe message to the same address they sent their subscribe message to. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Feb 05, 2007 at 04:01:23PM +0200, Pekka Enberg wrote: > Hi Joerg, > > On 2/5/07, Joerg Roedel <[EMAIL PROTECTED]> wrote: > >Hmm, this seems to be the same issue as in [1] and [2]. A page that is > >assumed to belong to the slab but is not longer marked as a slab page. > >Could this be a bug in the memory management? > > The BUG_ON triggers whenever you feed an invalid pointer to kfree() or > kmem_cache_free() so I am guessing the caller is simply broken. Note > that kernels prior to 2.6.18 would quietly corrupt the slab unless > CONFIG_SLAB_DEBUG was enabled which might explain why this hasn't been > noticed before. Ok. I was not aware of that. Thanks for clarification. Joerg -- Joerg Roedel Operating System Research Center AMD Saxony LLC & Co. KG - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Hi Joerg, On 2/5/07, Joerg Roedel <[EMAIL PROTECTED]> wrote: Hmm, this seems to be the same issue as in [1] and [2]. A page that is assumed to belong to the slab but is not longer marked as a slab page. Could this be a bug in the memory management? The BUG_ON triggers whenever you feed an invalid pointer to kfree() or kmem_cache_free() so I am guessing the caller is simply broken. Note that kernels prior to 2.6.18 would quietly corrupt the slab unless CONFIG_SLAB_DEBUG was enabled which might explain why this hasn't been noticed before. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, Feb 05, 2007 at 05:41:29PM +0200, [EMAIL PROTECTED] wrote: > Good morning, > > I am experiencing a bug i think. I am running a 2.6.19.2 kernel on a 3Ghz > Intel with HT activated, 1 gb ram, and noname motherboard. Here is the > output of the hang: Hmm, this seems to be the same issue as in [1] and [2]. A page that is assumed to belong to the slab but is not longer marked as a slab page. Could this be a bug in the memory management? Joerg [1] http://lkml.org/lkml/2007/2/4/77 [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=406477 -- Joerg Roedel Operating System Research Center AMD Saxony LLC & Co. KG - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, 2 Feb 2005, Aleksey Gorelov wrote: > Hi Matt, Alan, > > Could you please tell me (link would do) why it makes default > delay_use=5 > really necessary (from the patch below)? > https://lists.one-eyed-alien.net/pipermail/usb-storage/2004-August/00074 > 7.html > > It makes USB boot really painfull and slow :( > > I understand there should be a good reason for it. I've tried to find > an answer in > archives, without much success though. Lots of devices don't need that delay, but enough of them do that we decided to add it. The value of 5 seconds was more or less arbitrary; it was long enough for every device we could test and it didn't seem _too_ long. Maybe 1 second would be long enough -- we just didn't know so we were conservative. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
It's basically just like the code says. A lot of devices choke if you access them too quickly after enumeration. The 5 second delay seems to be enough for most devices. But we made it adjustable exactly for people like you. Matt On Wed, Feb 02, 2005 at 04:17:13PM -0800, Aleksey Gorelov wrote: > Hi Matt, Alan, > > Could you please tell me (link would do) why it makes default > delay_use=5 > really necessary (from the patch below)? > https://lists.one-eyed-alien.net/pipermail/usb-storage/2004-August/00074 > 7.html > > It makes USB boot really painfull and slow :( > > I understand there should be a good reason for it. I've tried to find > an answer in > archives, without much success though. > > Thanks, > Aleks. -- Matthew Dharm Home: [EMAIL PROTECTED] Maintainer, Linux USB Mass Storage Driver Now payink attention, please. This is mouse. Click-click. Easy to use, da? Now you try... -- Pitr to Miranda User Friendly, 10/11/1998 pgpDzPywIhFwy.pgp Description: PGP signature
Re: your mail
> So it seems that PnP finds the card, but the connections (or even the > forced values) to the sb module fail. Back when this was a single > processor machine, but still running 2.4 kernel, a windoze > installation found the SB at the listed interface parameters. > > > Anyone have a solution? > > Same problem without modules.conf settings, valid version of mod > utilities, a web search did not help,... I had a similar problem with different card (Gravi Usltrasound PnP). The solution turned out to be to avoid dma 1 channel. May be some BIOSes or ISA chipsets got the 8-bit dma channels handling wrong, but I really don't know. Btw: for me 2.2.x autodetected right, 2.4.x need explicit setting. - Jan Hudec `Bulb' <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
I have the sound blaster 16 card on one of my athlon (on PIII i have es1731), that has one isa slot on its MB. It works well, but i do not use isapnp nor any pnp support is enabled inside of the kernel. running 2.4.5/2.4.6-pre2 Luigi On Tue, 12 Jun 2001, Colonel wrote: > From: Colonel <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Subject: ISA Soundblaster problem > Reply-to: [EMAIL PROTECTED] > > > The Maintainers list does not contain anyone for 2.4 Soundblaster > modules, so perhaps some one on the mailing list is aware of a > solution. My ISA Soundblaster 16waveffects worked fine in kernel 2.2 > with XMMS. But I have never been successful in a varity of 2.4 > kernels, the latest being 2.4.5-ac12. This is what I know: > > [DMESG] > isapnp: Scanning for PnP cards... > isapnp: Calling quirk for 01:00 > isapnp: SB audio device quirk - increasing port range > isapnp: Card 'Creative SB16 PnP' > isapnp: 1 Plug & Play card detected total > > }modprobe sb > /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: init_module: No such device > /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: Hint: insmod errors can be caused >by incorrect module parameters, including invalid IO or IRQ parameters > /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: insmod >/lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o failed > /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: insmod sb failed > > > [/etc/modules.conf] > options sb io=0x220 irq=5 dma=1 dma16=5 mpu_io=0x330 > > > [DMESG} > Soundblaster audio driver Copyright (C) by Hannu Savolainen 1993-1996 > sb: No ISAPnP cards found, trying standard ones... > sb: dsp reset failed. > > > So it seems that PnP finds the card, but the connections (or even the > forced values) to the sb module fail. Back when this was a single > processor machine, but still running 2.4 kernel, a windoze > installation found the SB at the listed interface parameters. > > > Anyone have a solution? > > Same problem without modules.conf settings, valid version of mod > utilities, a web search did not help,... > > > > TIA > > > please CC: [EMAIL PROTECTED], not currently on the mailing list. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
Followup to: <[EMAIL PROTECTED]> By author:Andrzej Krzysztofowicz <[EMAIL PROTECTED]> In newsgroup: linux.dev.kernel > > BTW, linux-kernel readers: anybody is a volunteer for making the kernel size > counter 32-bit here? This would enable using the simple bootloader for > greater kernel loading... (current limit is sligtly below 1MB) > Possibly some 16/32-bit real mode code mixing would be necessary. > PLEASE don't go there. bootsect.S is fundamentally broken these days (doesn't work on USB floppies, for example.) It should be killed DEAD, DEAD, DEAD and not dragged along like a dead albatross. -hpa -- <[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private! "Unix gives you enough rope to shoot yourself in the foot." http://www.zytor.com/~hpa/puzzle.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
> > Minor issue with bootsect.s. 1. bootsect.S is the source file > The single instance of the lds assembly instruction includes the comment of > ! ds:si is source > ... > seg fs > lds si,(bx)! ds:si is source > ... > Is this comment not in reverse order (i.e should be lds > dest,src) 2. This is not a comment of i386 mnemonics. This comments the role of specific register in the following instructions. BTW, linux-kernel readers: anybody is a volunteer for making the kernel size counter 32-bit here? This would enable using the simple bootloader for greater kernel loading... (current limit is sligtly below 1MB) Possibly some 16/32-bit real mode code mixing would be necessary. Andrzej - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
1. The corrupted files have the same length but differ (I cannot say on what bit-position) 2. I reproduced the problem while burning CD from SCSI-Disk to SCSI-CD-Burner!!! -> It´s definetly not a (single?) IDE-Problem Burning CD (on slow 4x speed) seems to initialize many small transfers (instead of a smooth stream)(same as copying many small files) on PCI/DMA wich generate the same problems!!! > On Mon, 21 May 2001, Thomas Palm wrote: > > > there ist still file-corruption. I use an ASUS A7V133 (Revision 1.05, > > including Sound + Raid). My tests: > > 1st run of "diff -r srcdir destdir" -> no differs > > 2nd run of "diff -r srcdir destdir" -> 2 files differ > > 3rd run of "diff -r srcdir destdir" -> 1 file differs > > 4th run of "diff -r srcdir destdir" -> 1 file differs > > 5th run of "diff -r srcdir destdir" -> no differs > > Could you check WHERE the file differ and WHERE the data come from ? > > I've got the same mobo AND some nasty DAT tape corruption problems... > (also, VERY rarely, on the CD burner). I've got all on SCSI, but if it's > the DMA troubling us... > > -- Lorenzo Marcantonio > > -- Machen Sie Ihr Hobby zu Geld bei unserem Partner 1&1! http://profiseller.de/info/index.php3?ac=OM.PS.PS003K00596T0409a -- GMX - Die Kommunikationsplattform im Internet. http://www.gmx.net - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Mon, 21 May 2001, Thomas Palm wrote: > there ist still file-corruption. I use an ASUS A7V133 (Revision 1.05, > including Sound + Raid). My tests: > 1st run of "diff -r srcdir destdir" -> no differs > 2nd run of "diff -r srcdir destdir" -> 2 files differ > 3rd run of "diff -r srcdir destdir" -> 1 file differs > 4th run of "diff -r srcdir destdir" -> 1 file differs > 5th run of "diff -r srcdir destdir" -> no differs Could you check WHERE the file differ and WHERE the data come from ? I've got the same mobo AND some nasty DAT tape corruption problems... (also, VERY rarely, on the CD burner). I've got all on SCSI, but if it's the DMA troubling us... -- Lorenzo Marcantonio - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Wed, May 16, 2001 at 08:05:38AM -0700, siva prasad wrote: > Is it true that the ipc calls like > msgget(),shmget()... > are not really system calls? No, they all use a system call, but the system call is the same for all functions. > Cos in the file "asm/unistd.h" where the > system calls are listed as __NR_xxx we dont find > the appropriate listing for the ipc calls. > What I guessed was that all the ipc calls are > clubbed under the 'int ipc()' system call and this > is well listed in the "asm/unistd.h" Right, they all use __NR_ipc. See sys_ipc() in arch/i386/kernel.sys_i386.c, especially the comment right above the function... > Could some one explain why the ipc is implemented > this way rather that implementing them as individual > system calls as in UNIX. Probably because the original designer liked it this way and nobody cared enough to do it otherwise. > Or is it the same way in UNIX I don't know, I don't have Unix source available. Erik -- J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department of Electrical Engineering, Faculty of Information Technology and Systems, Delft University of Technology, PO BOX 5031, 2600 GA Delft, The Netherlands Phone: +31-15-2783635 Fax: +31-15-2781843 Email: [EMAIL PROTECTED] WWW: http://www-ict.its.tudelft.nl/~erik/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Tue, 8 May 2001, Jens Axboe wrote: > On Tue, May 08 2001, Richard B. Johnson wrote: > > > Use a kernel thread? If you don't need to access user space, context > > > switches are very cheap. > > > > > > > So, what am I supposed to do to add a piece of driver code to the > > > > run queue so it gets scheduled occasionally? > > > > > > Several, grep for kernel_thread. > > > > > > -- > > > Jens Axboe > > > > > > > Okay. Thanks. I thought I would have to do that too. No problem. > > A small worker thread and a wait queue to sleeep on and you are all set, > 10 minutes tops :-) > > > It's a "tomorrow" thing. Ten hours it too long to stare at a > > screen. > > Sissy! > Okay. I am now awake. I will now try the kernel thread. Looks simple. Got to remember to kill it before/during module removal. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Tue, 8 May 2001, Alan Cox wrote: > > I have a driver which needs to wait for some hardware. > > Basically, it needs to have some code added to the run-queue > > so it can get some CPU time even though it's not being called. > > Wht does it have to wait ? Why cant it just poll and come back next time ? > Good question. I wanted to be able to call the exact same routine(s) that other routines (exected from read() and write()), execute. These routines are complex and sleep while waiting for events. I didn't want to duplicate that code with different time-out mechanisms. GPIB is nasty because you can't do anything unless the 'controller' tells you to do it. When "addressed to talk", you have to parse all the stuff sent via interrupt (ATN bit set, control byte, which address from the control byte, etc.), then let somebody sleeping in poll() know that they can now "write()". That can all be handled via interrupt. But, now for the receive . The user-mode code needs to be sleeping until some data are available. That data may never be available. Something in the driver needs to wait until the hardware is addressed to receive. Since it is not now receiving, there is no interrupt! It takes time for the controller to tell you to listen and then tell somebody else to talk to you. This means that I need some timeout to recover from the fact that the other guy may never talk. Once the other guy starts sending data, the interrupts can be used to handle the data and, once there are valid data, the device owner can be awakened, presumably sleeping in poll() or select(). It's the intermediate time where there are no interrupts that needs the CPU to determine that we've waited too long for interrupts so the device had better get off the bus to start the error recovery procedure. Bright an early tommorrow, I will check out both ways. A kernel thread might be "neat". However, I may just look to see if I can just poll while using existing code. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
> I have a driver which needs to wait for some hardware. > Basically, it needs to have some code added to the run-queue > so it can get some CPU time even though it's not being called. Wht does it have to wait ? Why cant it just poll and come back next time ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Tue, May 08 2001, Richard B. Johnson wrote: > > Use a kernel thread? If you don't need to access user space, context > > switches are very cheap. > > > > > So, what am I supposed to do to add a piece of driver code to the > > > run queue so it gets scheduled occasionally? > > > > Several, grep for kernel_thread. > > > > -- > > Jens Axboe > > > > Okay. Thanks. I thought I would have to do that too. No problem. A small worker thread and a wait queue to sleeep on and you are all set, 10 minutes tops :-) > It's a "tomorrow" thing. Ten hours it too long to stare at a > screen. Sissy! -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: your mail
On Tue, 8 May 2001, Jens Axboe wrote: > On Tue, May 08 2001, Richard B. Johnson wrote: > > > > To driver wizards: > > > > I have a driver which needs to wait for some hardware. > > Basically, it needs to have some code added to the run-queue > > so it can get some CPU time even though it's not being called. > > > > It needs to get some CPU time which can be "turned on" or > > "turned off" as a result of an interrupt or some external > > input from an ioctl(). > > > > So I thought that the "tasklet" would be ideal. However, the > > scheduler "thinks" that a tasklet is an interrupt, so any > > attempt to sleep in the tasklet results in a kernel panic, > > "ieee scheduling in an interrupt..., BUG sched.c line 688". > > Use a kernel thread? If you don't need to access user space, context > switches are very cheap. > > > So, what am I supposed to do to add a piece of driver code to the > > run queue so it gets scheduled occasionally? > > Several, grep for kernel_thread. > > -- > Jens Axboe > Okay. Thanks. I thought I would have to do that too. No problem. It's a "tomorrow" thing. Ten hours it too long to stare at a screen. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/