Hi Paul,

On Tue Apr 19, 2022 at 01:20:30 +0200, Paul Boddie wrote:
> On Monday, 18 April 2022 23:26:03 CEST you wrote:
> > Hi Paul,
> > 
> > On Tue Apr 12, 2022 at 01:09:40 +0200, Paul Boddie wrote:
> > > 
> > > OK, I did see that function being used, too, but I also found plenty of
> > > other things in my perusal of the different files. Obviously, being able
> > > to extend the UTCB memory is an important consideration.
> > 
> > It is, because, for example, one might not know how many threads a task
> > will have, especially the component that creates the task.
> 
> Right. And so, in the Moe_app_model the size of the UTCB is dependent on the 
> default number of threads. It still seems that I have to provide a UTCB 
> flexpage to l4_factory_create_task, however.

Yes. These are different levels of the API: l4_factory_create_task /
L4::Factory::create_task is a kernel API, while Moe_app_model is a
user-level abstraction that uses those APIs.
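At the kernel-API level it amounts to roughly the following. This is only a
sketch: the capability allocation, the placement of the UTCB area at 0x200000
and its size are made-up values for illustration, and the exact signature of
l4_factory_create_task differs slightly between L4Re versions (in newer trees
the UTCB flexpage may be passed by pointer rather than by value):

  #include <l4/sys/factory.h>
  #include <l4/sys/err.h>
  #include <l4/sys/consts.h>
  #include <l4/re/env.h>
  #include <l4/re/c/util/cap_alloc.h>

  /* Create a new, empty task via the factory from our environment. */
  static l4_cap_idx_t create_empty_task(void)
  {
    l4_cap_idx_t task = l4re_util_cap_alloc();
    if (l4_is_invalid_cap(task))
      return L4_INVALID_CAP;

    /* The flexpage describes where the UTCB area of the new task will sit
     * in its address space; its size bounds the number of threads the task
     * can ever host (one UTCB slot per thread). Here: 16 pages at 0x200000. */
    l4_fpage_t utcb_area = l4_fpage(0x200000, L4_PAGESHIFT + 4, L4_FPAGE_RW);

    if (l4_error(l4_factory_create_task(l4re_env()->factory, task, utcb_area)))
      return L4_INVALID_CAP;

    return task;
  }

Moe_app_model does essentially this, just with the UTCB area sized from the
default thread count.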

> > > Right. I see that the factory function actually sends the flexpage in the
> > > IPC call (using l4_factory_create_add_fpage_u), thus mapping it in the
> > > task. I find it hard to follow where this message is actually handled (I
> > > presume that Moe acts as the factory) or what the factory actually does
> > > with the flexpage, but I presume that it ultimately causes it to be
> > > mapped in the new task.
> >
> > It is handled in Fiasco.
> 
> OK. I think I found this in Task::create (src/kern/task.cpp).

Yes, that's there.

> > > However, I wonder about the "chicken and egg" situation in new tasks. It
> > > seems to me that the way things work is that a new task in L4Re is
> > > typically populated with the l4re binary containing the region
> > > mapper/manager (RM). This seems to be initiated here (in launch_loader):
> >
> > Yes, the l4re binary loads the application and then serves as its pager.
> 
> OK. And it seems that the RM provided by this binary is able to indicate the 
> receive window for flexpages when asking for mappings from dataspaces.

Yes, it is.
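The way an application ends up in that path is by attaching a dataspace to a
region; after that, a fault inside the region makes the RM request a mapping
from the dataspace, with the faulting region as the receive window. Roughly,
using the C interface (a sketch only; the flag names have changed a little
between L4Re versions, and 'ds' is assumed to be an already obtained
dataspace capability):

  #include <l4/re/c/rm.h>
  #include <l4/re/c/dataspace.h>
  #include <l4/sys/consts.h>

  /* Attach a dataspace to a region chosen by the region mapper. Page faults
   * inside [addr, addr + size) are then handled by the RM, which asks the
   * dataspace for mappings into the faulting part of the region. */
  static void *attach_ds(l4re_ds_t ds, unsigned long size)
  {
    void *addr = 0;
    if (l4re_rm_attach(&addr, size,
                       L4RE_RM_SEARCH_ADDR, /* F_SEARCH_ADDR | F_RW in newer trees */
                       ds, 0, L4_PAGESHIFT))
      return 0;
    return addr;
  }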

> [Pagers for other tasks]
> 
> > That's how it works. Moe also has region managers that are used for the
> > l4re binary to be paged. When a page fault is resolved, then there is
> > someone sending memory via a flexpage to the task in question. In our
> > case it's the dataspace manager which sends the memory via a 'map'
> > call. Here it does not matter whether the l4re binary faulted or the
> > application, because in the end the task is the receiver of the
> > flexpage, not the particular application (both are running in the same
> > task).
> 
> So, the one thing I didn't understand until I started digging around in the 
> Fiasco sources and also implementing my own page fault handler was the scope 
> of the receive window for issued flexpages, but it appears that the whole 
> address space is indicated as the receive window in 
> Thread::handle_page_fault_pager (src/kern/thread-ipc.cpp).

Yes, for page faults this is the case, i.e., the kernel allows the pager
to map into the whole address space to resolve the page fault. There is
no restriction imposed by the kernel here, and the page fault handler
has control over the virtual memory space of the task in any case.
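So a pager reply only needs to carry a send item; the faulting side does not
set up any receive buffer. The reply a pager constructs in its UTCB looks
roughly like this (a sketch only; the source address 0x400000 and the
destination 0x1000000 are placeholders):

  #include <l4/sys/ipc.h>
  #include <l4/sys/utcb.h>
  #include <l4/sys/types.h>
  #include <l4/sys/consts.h>

  /* Prepare a pager reply that maps one page from this task (placeholder
   * source 0x400000) to 0x1000000 in the faulting task. The returned tag
   * must be used when replying to the page-fault IPC. */
  static l4_msgtag_t prepare_pf_reply(l4_utcb_t *utcb)
  {
    l4_msg_regs_t *mr = l4_utcb_mr_u(utcb);

    /* Word 0 of the map item: the send base, i.e. where in the receiving
     * task's address space the flexpage should appear. */
    mr->mr[0] = l4_map_control(0x1000000, 0, L4_MAP_ITEM_MAP);

    /* Word 1: the flexpage describing the memory in the pager's own
     * address space that resolves the fault. */
    mr->mr[1] = l4_fpage(0x400000, L4_PAGESHIFT, L4_FPAGE_RX).raw;

    /* 0 untyped words, 1 typed (map) item: if the item count in the tag
     * is 0, the kernel transfers no mapping at all. */
    return l4_msgtag(0, 0, 1, 0);
  }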

> [Mapping memory using l4_task_map]
> 
> > Jdb has facilities to check what the address spaces look like, exactly to
> > debug issues like the one you describe. You can press 's' to see all the
> > tasks in the system, navigate onto them, and then press 'p' to see the
> > page-table view. Here you can navigate the page tables and verify that the
> > pages at some virtual address actually point to the physical location they
> > should point at. For a particular physical address (page frame number)
> > you can also show the mapping hierarchy via 'm'.
> 
> The "coherency" problems actually turned out to be me forgetting the 
> appropriate alignment for the mapped flexpages. But I have discovered a few 
> things about jdb in my attempts to troubleshoot my code, including 's' to 
> look at tasks. I found the page table view bewildering, though, rather 
> hoping for a nice summary of mapped pages instead. However, the object 
> space view was very useful in establishing that capabilities were being 
> mapped.

Yeah, the page-table view really shows lots of tables :)
I agree that a list of mapped regions would also be nice to have.

> I did manage to get l4_task_map to work between two fully initialised tasks 
> created by Ned as follows:
> 
> l4_fpage_t payload_fpage = l4_fpage((l4_addr_t) buf,
>                                     l4util_log2(L4_PAGESIZE * NUM_PAGES),
>                                     L4_FPAGE_RW);
> 
> l4_task_map(recipient, L4RE_THIS_TASK_CAP, payload_fpage, 0x3000000);
> 
> This permitted the recipient task to access the memory in the appropriately 
> aligned buf region, this being mapped into the region starting at 0x3000000 
> in the recipient.
> 
> I also managed to achieve the same thing using IPC between the two tasks 
> instead, having the recipient indicate a receive window in the buffer 
> registers as follows:
> 
> br[0] = l4_map_control(0, 0, L4_MAP_ITEM_MAP);
> br[1] = l4_fpage(0x3000000, l4util_log2(L4_PAGESIZE * NUM_PAGES), 0).raw;
> 
> And then constructing the send flexpage as follows:
> 
> mr[0] = l4_map_control(0, L4_FPAGE_CACHEABLE, 0);
> mr[1] = l4_fpage((l4_addr_t) buf, l4util_log2(L4_PAGESIZE * NUM_PAGES),
>                  L4_FPAGE_RW).raw;
> 
> Alternatively, setting the receive window to the entire address space...
> 
> br[1] = l4_fpage_all().raw;
> 
> ...and indicating a different send base also worked:
> 
> mr[0] = l4_map_control(0x3000000, L4_FPAGE_CACHEABLE, 0);
> 
> This approximates the l4_task_map scenario.
> 
> However, trying this with a completely newly created task, I cannot seem to 
> get l4_task_map to map memory into the task, and my page fault handler does 
> not seem to be able to respond to a page fault message with such a flexpage 
> and have the request satisfied. This means that there is some detail I am 
> overlooking, but I have yet to determine what it is.
> 
> One thing I have tried to do is to get Fiasco to report what it is doing when 
> processing page fault messages, but this is quite challenging. Really, I just 
> want to establish whether the reply to a page fault message gets interpreted 
> correctly and causes mappings to be established in the page tables.
> 
> Doing some "old school" tracing in various routines like transfer_msg_items 
> by detecting the appropriate fault conditions, enabling tracing, and then 
> producing output on the console doesn't seem to yield any indications of the 
> messages being processed, but perhaps I misunderstand the flow of control 
> from Thread::handle_page_fault_pager within the kernel when the IPC is 
> initiated.
> 
> Reviewing the old thread on this broader topic, I found this advice:
> 
> http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/015441.html
> 
> This does yield page fault trace log entries of the following form:
> 
> pf:  00bc pfa=0000000001000ae3 ip=0000000001000ae3 (rp) spc=0xffffffff13dc5ad8 
> err=15
> 
> Here, I presume that the error is R_aborted (src/abi/l4_error.cpp), meaning 
> that the page fault was not handled.
> 
> Looking at advice about IPC tracing...
> 
> http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/015475.html
> 
> ...I can also get log entries that I think might indicate some element of 
> success in terms of the messages being sent, with this being the fault 
> message:
> 
>      00be answ [fffffffffffe0002] L=8000fcf3 err=0 (OK) (1000ae5,1000ae3)
> ipc: 00be wait->[C:INV] DID=be L=0 TO=INF
> 
> And elsewhere:
> 
>      00be answ [00000040] L=0 err=0 (OK) (1000038,400595)
> ipc: 00be send rcap->[C:INV] DID=bc L=0 [0000000000000040]
>                (0000000001000038,0000000000400595) TO=INF
> 
> Here, I am attempting to resolve the page fault caused by execution at 
> 0x1000ae3 by sending a flexpage providing memory in one task at 0x400000 to 
> the recipient at 0x1000000.
> 
> None of this really explains why the page fault handler keeps getting called 
> with the same details, sending the same message, and so on.
> 
> [...]
> 
> > For sure this is an area where the code is pretty involved. Back in the
> > old days we had an IDL compiler that grew and grew and was in the end
> > not easy to maintain. When we switched to the capability system, and thus
> > changed all the APIs, we had the choice of either adapting the IDL
> > compiler or doing something different. Back then it was a major hassle
> > to parse the input because eventually one wants to have the whole
> > language understood by the IDL compiler to be able to use all sorts of
> > types (of course one could make compromises there). With C / C++ that was
> > not so easy, at least back then. Now there's LLVM and that's a major
> > improvement in this area. Still the actual tool needs to be implemented
> > and maintained. Now, as we all see, we have opted for the "do something
> > else" option. With C++ as our main language and the possibilities with
> > it, there was the idea to implement the "IDL thing" purely with C++
> > directly in the code. That's what we have now. For me, all the code
> > around it is the IDL compiler, and abstractly, if it were not in the
> > header files it would sit somewhere else, but it would be there in one
> > form or another.
> 
> The unfortunate thing about the "do something else" solution is that it 
> becomes difficult to determine what the interfaces are for components: the 
> details are all presumably present, but they are encoded in ways that are not 
> particularly readable. That might not seem to matter if the existing classes 
> are usable and readily understandable, but my own experience was that I found 
> myself trying to understand the low-level details before I could hope to 
> understand why the C++ API did things in a particular way, which is really 
> the wrong way round. I must have spent hours staring at Gen_fpage and 
> related types in one way or another.

Yes, this is involved. I'm convinced that the code handling all this
needs to exist somewhere. How it is written can certainly be debated,
but the overall details are what they are.
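For what it's worth, the user-visible part of the "IDL thing" then reduces to
interface declarations like the following. This is a schematic example along
the lines of the clntsrv example shipped with L4Re, not one of the real L4Re
interfaces; the protocol number 0x44 and the operations are made up:

  #include <l4/sys/capability>
  #include <l4/sys/cxx/ipc_iface>
  #include <l4/sys/types.h>

  // The opcodes and the message layout are derived from this declaration at
  // compile time; there is no separate IDL file and no generated stub source.
  struct Calc : L4::Kobject_t<Calc, L4::Kobject, 0x44 /* made-up protocol */>
  {
    L4_INLINE_RPC(int, sub, (l4_uint32_t a, l4_uint32_t b, l4_uint32_t *res));
    L4_INLINE_RPC(int, neg, (l4_uint32_t a, l4_uint32_t *res));
    typedef L4::Typeid::Rpcs<sub_t, neg_t> Rpcs;
  };

A client holding an L4::Cap<Calc> then simply calls c->sub(a, b, &res), and
the templates behind ipc_iface marshal the opcode and the arguments into the
message registers; that is exactly the write_op machinery you have been
staring at.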
 
> > > This might just sound like me complaining, but I also have some concerns
> > > about being able to verify the behaviour of some of the code. For
> > > example, I recently found that my dataspace implementation was getting
> > > requests from a region mapper/manager with an opcode of 0x100000000,
> > > which doesn't make any sense to me at all, given that the dataspace
> > > interface code in L4Re implicitly defines opcodes that are all likely to
> > > be very small integers. At first I obviously blamed my own code, but then
> > > I found that in the IPC call implementation found here...
> > > 
> > > pk/l4re-core/l4sys/include/cxx/ipc_iface
> > > 
> > > ...if I explicitly cleared the first message register before this
> > > statement...
> > >   int send_bytes =
> > >     Args::template write_op<Do_in_data>(mrs->mr, 0, Mr_bytes,
> > >                                         Opt::Opcode, a...);
> > > 
> > > ...then the opcode was produced as expected again.
> > 
> > Which does not fully make sense to me because the message registers seem
> > to be written starting from 0. Anyway, do you have an example maybe?
> 
> I only found this to happen when a program of mine had fetched many pages 
> from a dataspace via the RM. It would be useful to understand the conditions 
> under 
> which this occurs, and I obviously suspect that I must be doing something 
> wrong, but I can't see how my dataspace implementation would corrupt the 
> opcode sent by the RM in its own IPC messages. But I do think it is odd that 
> somehow, the rather opaque code above doesn't manage to fully initialise the 
> message registers.
> 
> Anyway, I will aim to continue my investigations and hopefully make some kind 
> of progress.

Thanks!


On Thu Apr 21, 2022 at 01:14:02 +0200, Paul Boddie wrote:
> Hello,
> 
> Continuing my bad habit of following up to my own messages...
> 
> On Tuesday, 19 April 2022 01:20:30 CEST Paul Boddie wrote:
> > 
> > Doing some "old school" tracing in various routines like transfer_msg_items
> > by detecting the appropriate fault conditions, enabling tracing, and then
> > producing output on the console doesn't seem to yield any indications of
> > the messages being processed, but perhaps I misunderstand the flow of
> > control from Thread::handle_page_fault_pager within the kernel when the IPC
> > is initiated.
> > 
> > Reviewing the old thread on this broader topic, I found this advice:
> > 
> > http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/015441.html
> > 
> > This does yield page fault trace log entries of the following form:
> > 
> > pf:  00bc pfa=0000000001000ae3 ip=0000000001000ae3 (rp)
> > spc=0xffffffff13dc5ad8 err=15
> > 
> > Here, I presume that the error is R_aborted (src/abi/l4_error.cpp), meaning
> > that the page fault was not handled.

'err' is the error code of the page fault as reported by the CPU, as
described in the x86 architecture manual (in Intel's, chapter 4.7). It is
a hex number and says user-mode access, page not present, and instruction
fetch (that's also what the 'rp' says: read-only fault, not present).
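If it helps, the architectural meaning of the individual bits can be decoded
like this (a small sketch, assuming the trace passes the CPU's error code
through unchanged):

  #include <stdio.h>

  /* Decode the x86 page-fault error code bits as defined in the
   * architecture manuals (Intel SDM Vol. 3, page-fault exception). */
  static void decode_pf_err(unsigned long err)
  {
    printf("%s, %s access, %s mode%s\n",
           (err & 1)  ? "protection violation" : "page not present",
           (err & 2)  ? "write"                : "read",
           (err & 4)  ? "user"                 : "supervisor",
           (err & 16) ? ", instruction fetch"  : "");
  }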

> I see that formatter_pf (in src/kern/tb_entry_output.cc) is responsible for 
> this form of output. Similarly, formatter_ipc and formatter_ipc_res seem to 
> be responsible for the IPC-related log entries.
> 
> The log entries appear to be populated when Thread::handle_page_fault (in 
> src/kern/thread-pagefault.cpp) calls page_fault_log (in src/kern/thread-log.cpp), 
> which populates a Tb_entry_pf instance with the given error details. So, it 
> seems like the reported error is the one causing the page fault, not any 
> kind of eventual outcome.
> 
> > Looking at advice about IPC tracing...
> > 
> > http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/015475.html
> > 
> > ...I can also get log entries that I think might indicate some element of
> > success in terms of the messages being sent, with this being the fault
> > message:
> > 
> >      00be answ [fffffffffffe0002] L=8000fcf3 err=0 (OK) (1000ae5,1000ae3)
> > ipc: 00be wait->[C:INV] DID=be L=0 TO=INF
> > 
> > And elsewhere:
> > 
> >      00be answ [00000040] L=0 err=0 (OK) (1000038,400595)
> > ipc: 00be send rcap->[C:INV] DID=bc L=0 [0000000000000040]
> >                (0000000001000038,0000000000400595) TO=INF
> > 
> > Here, I am attempting to resolve the page fault caused by execution at
> > 0x1000ae3 by sending a flexpage providing memory in one task at 0x400000 to
> > the recipient at 0x1000000.
> > 
> > None of this really explains why the page fault handler keeps getting called
> > with the same details, sending the same message, and so on.
> 
> I still can't explain this. Doing some more invasive debugging in 
> Task::sys_map (in src/kern/task.cpp) indicates that an explicit l4_task_map 
> call will cause fpage_map (in src/kern/map_util.cpp) and subsequently mem_map 
> (in src/kern/map_util-mem.cpp) and then map (in src/kern/map_util.cpp) to be 
> called, attempting to map 0x400000 (with size 0x400000) in the original task 
> at 0x1000000 in the recipient. This is reportedly successful.
> 
> With page fault handling, the fpage_map, mem_map and map functions are called 
> similarly, with the same supposedly successful outcome. But the page fault 
> continues to occur with the same details, as if the mapping did not actually 
> happen. I did wonder if it might be due to the original task not really 
> having the pages it is trying to "export" actually mapped in itself (this 
> being a potential pitfall when implementing dataspaces, in my experience), 
> but this is not the case.
> 
> I suppose I must be overlooking something else...

Maybe it would help to share some code here?
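For comparison, the overall shape I would expect the page-fault handling loop
to have is roughly the following. It is only a sketch with placeholder
addresses (the source region at 0x400000 is made up); a real pager of course
derives the flexpage from the fault address and its dataspaces:

  #include <l4/sys/ipc.h>
  #include <l4/sys/utcb.h>
  #include <l4/sys/consts.h>

  /* Minimal pager loop: answer every fault by mapping one page from a
   * placeholder source region in this task to the page-aligned fault
   * address in the faulting task. */
  static void pager_loop(void)
  {
    l4_utcb_t *utcb = l4_utcb();
    l4_msg_regs_t *mr = l4_utcb_mr_u(utcb);
    l4_umword_t label;

    l4_msgtag_t tag = l4_ipc_wait(utcb, &label, L4_IPC_NEVER);

    while (!l4_ipc_error(tag, utcb))
      {
        /* Page-fault protocol: mr[0] holds the fault address plus status
         * bits in its low bits, mr[1] the faulting instruction pointer. */
        l4_addr_t pfa = l4_trunc_page(mr->mr[0]);

        /* Reply with one map item: send base = fault address, flexpage =
         * one page at the placeholder address 0x400000 in this task. */
        mr->mr[0] = l4_map_control(pfa, 0, L4_MAP_ITEM_MAP);
        mr->mr[1] = l4_fpage(0x400000, L4_PAGESHIFT, L4_FPAGE_RWX).raw;

        /* 0 untyped words, 1 typed item, then wait for the next fault. */
        tag = l4_ipc_reply_and_wait(utcb, l4_msgtag(0, 0, 1, 0),
                                    &label, L4_IPC_NEVER);
      }
  }

Comparing your handler against something of this shape might make it easier
to spot where it differs.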


Adam

_______________________________________________
l4-hackers mailing list
l4-hackers@os.inf.tu-dresden.de
https://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
