About effective resolution of cpu execution clocks

2012-10-24 Thread Miguel Telleria de Esteban
Dear all,

I am resending a question that I posted yesterday on the Linux Real-Time
mailing list but which received no answers, maybe due to its "newbie"
status.

http://marc.info/?l=linux-rt-users&m=135099100016609&w=2

Let me rephrase it for a general audience that is not necessarily
real-time aware:

The question concerns the per-process CPU usage statistics maintained by
the kernel.  As far as I can tell, the only places where this usage
counter is stored are the utime and stime fields of task_struct.

http://lxr.linux.no/#linux+v3.6.3/include/linux/sched.h#L1362 
(line 1362)

I have observed that these fields are of type "cputime_t", which seems
to be defined as an unsigned long and therefore holds 32 bits (at
least on a 32-bit architecture such as x86).

http://lxr.linux.no/linux+v3.6.3/include/asm-generic/cputime.h#L7

The utime and stime fields are used as accumulators of CPU-time usage
in the implementation of the POSIX CPU-usage clocks and timers.

http://lxr.linux.no/#linux+v3.6.3/kernel/posix-cpu-timers.c

A typical use case of this functionality is measuring the CPU time
consumed by a thread.  In real-time systems this information can be
used for further actions such as changing the thread's priority,
sending it a signal, and so on.

Here is an example using NPTL from libc:

clockid_t clock;
struct timespec before_ts, after_ts, interval;

pthread_getcpuclockid(pthread_self(), &clock);
clock_gettime(clock, &before_ts);

/* ... do your things here ... */

clock_gettime(clock, &after_ts);

interval = timespec_subtract(after_ts, before_ts);  /* helper, see below */

In this code the time is stored in a struct timespec, which is composed
of two 32-bit longs, giving both nanosecond resolution and a span of
years.
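
Note that timespec_subtract() above is not a libc function, just a
small helper.  A minimal sketch of it (assuming after_ts is never
earlier than before_ts) could be:

struct timespec timespec_subtract(struct timespec a, struct timespec b)
{
    /* Computes a - b with a normalized (non-negative) tv_nsec */
    struct timespec result;

    result.tv_sec  = a.tv_sec  - b.tv_sec;
    result.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (result.tv_nsec < 0) {
        result.tv_sec  -= 1;
        result.tv_nsec += 1000000000L;
    }
    return result;
}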

On the other hand, 32-bit integers such as utime and stime cannot
provide both high resolution and a long time span.  And according to
the proc man page, when these fields are output in /proc/[pid]/stat
they give the value in jiffies (1/CONFIG_HZ seconds, i.e. 4 ms in most
kernel configurations).
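
For illustration, here is a minimal userspace sketch (my own, not from
the kernel sources) that reads those exported values back; fields 14
and 15 of /proc/self/stat are utime and stime, and userspace sees them
scaled to sysconf(_SC_CLK_TCK) ticks.  Error handling is omitted:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("/proc/self/stat", "r");
    char buf[1024];
    unsigned long utime, stime;

    fgets(buf, sizeof(buf), f);
    fclose(f);

    /* Skip past the ')' that closes the comm field, then skip 11
     * fields (state, ppid, ... cmajflt) to reach utime and stime. */
    sscanf(strrchr(buf, ')') + 2,
           "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
           &utime, &stime);

    printf("utime=%lu stime=%lu ticks (%ld ticks per second)\n",
           utime, stime, sysconf(_SC_CLK_TCK));
    return 0;
}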

The way clock_gettime() works, when linked to a per-process CPU clock,
is by keeping a counter of CPU usage, updated by the scheduler on every
preemption, plus using hardware facilities to measure the most recent
running period.

I assume that Linux, especially since the merge of high-resolution
timers in 2.6.21, now benefits from the latest "hardware facilities for
time management", gaining resolutions of micro- and nanoseconds, as
reported by clock_getres().
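
For completeness, the advertised resolution of a thread CPU clock can
be queried like this (a minimal sketch; on older glibc you may need to
link with -lrt and -lpthread):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

int main(void)
{
    clockid_t clock;
    struct timespec res;

    /* CPU-time clock of the calling thread */
    pthread_getcpuclockid(pthread_self(), &clock);
    clock_getres(clock, &res);

    printf("CPU clock resolution: %ld s %ld ns\n",
           (long)res.tv_sec, res.tv_nsec);
    return 0;
}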

With this background in mind I repeat the same questions that I asked
in the linux-rt mailing list:

*  What is the effective resolution of two invocations of
   clock_gettime() on the same running thread for a long period
   involving several CPU preemptions?

*  Are there other fields apart from stime and utime with
   sufficient precision to maintain a CPU usage count?

*  Does the PREEMPT_RT branch improve this resolution somehow?

Thanks in advance for your time.
Cheers,

   Miguel Telleria



Re: Page Cache Address Space Concept

2011-02-15 Thread Miguel Telleria de Esteban
My 2 cents on this topic, since I have recently read some literature
about it.

On Mon, 14 Feb 2011 16:29:42 +0530 piyush moghe wrote:

> While going through Page Cache explanation in "Professional Linux
> Kernel" book I came across one term called "address space" ( not
> related to virtual or physical address space )

First of all, I assume that you are referring to the structure

struct address_space

defined in fs.h.  See the following link for kernel 2.6.37:
http://lxr.linux.no/linux+*/include/linux/fs.h#L632

> 
> I did not get what is the meaning of this address space, following is
> verbatim description:
> 
> "To manage the various target objects that can be processed and
> cached in whole pages, the kernel uses an abstraction of
> the "address space" that associates the pages in memory with a
> specific block device (or any other system unit or part of a
> system unit).
> This type of address space must not be confused with the virtual and
> physical address spaces provided by the
> system or processor. It is a separate abstraction of the Linux kernel
> that unfortunately bears the same name.
> Initially, we are interested in only one aspect. Each address space
> has a "host" from which it obtains its data. In most
> cases, these are inodes that represent just one file.[2] Because all
> existing inodes are linked with their superblock (as
> discussed in Chapter 8), all the kernel need do is scan a list of all
> superblocks and follow their associated inodes to obtain
> a list of cached pages"
> 
> 
> Can anyone please explain what is the use of this and what this is all
> about?

This structure makes up the implementation of the "page cache"
mechanism, which I see roughly as a disk cache (conceptually of the
same kind as smartdrv in Windows 3.1).

The idea is that, in most cases (the exception being direct I/O disk
access), all filesystem reads and writes are performed in memory
first.  Reads are fetched from disk only the first time, and writes
are flushed to disk at certain intervals.

The most typical use of address_space is with regular files.  In these
cases the structure is embedded inside the inode (typically in the
i_data field) and accessed through the i_mapping pointer.  It can also
be reached through the f_mapping pointer of the struct file passed to
read() and write().

Whenever you want to access a certain offset of a file, the procedure
goes as follows (skipping access permissions, file locking, reference
counting, spin locks, page flags and other issues):

1.  Find the address_space of the file through the f_mapping pointer
inside struct file.

2.  Compute the "page index" of the data to read, based on:
-  the memory page size (e.g. 4 KB on i386)
-  the file pointer location (ppos) from previous file operations
-  the offset specified in the syscall

3.  Invoke find_get_page() on the address_space object, passing the
index (a small sketch of steps 1-3 is given after step 6 below).

This function is the actual page-cache lookup operation.  It goes
through the

page_tree member of struct address_space (a radix tree)

and, if the page is available, it returns a pointer to the
corresponding struct page.

If the page is not yet available, or it is not up to date, a block
I/O operation will be scheduled.

Let's assume that the page is available and up to date.  So we have
a

struct page

representing the page that holds the cached data.

There are still some more tweaks to arrive at the actual data,
although they fall outside of address_space.

4.  From the struct page, the "private" pointer leads to a circular,
singly linked list of

   struct buffer_head

5.  Each buffer_head represents an I/O-block-sized (as opposed to
memory-page-sized) chunk of data.  The size of the block chunks is
stored in b_size.

So we need to locate the I/O block(s) (i.e. the buffer_head) inside
the linked list that match the actual data to read.

6.  Once the corresponding buffer_head is selected, the ACTUAL DATA is
available through the b_data pointer.
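
To make steps 1-3 more concrete, here is a minimal, hypothetical
sketch in kernel-2.6-era style (the function name lookup_cached_page()
is mine; locking, readahead and error handling are all skipped):

/* Needs <linux/fs.h> and <linux/pagemap.h> in a real kernel context */
struct page *lookup_cached_page(struct file *filp, loff_t pos)
{
    /* step 1: the address_space of the file */
    struct address_space *mapping = filp->f_mapping;

    /* step 2: byte offset -> page index */
    pgoff_t index = pos >> PAGE_CACHE_SHIFT;

    /* step 3: radix-tree lookup; returns NULL if the page is
     * not (yet) in the page cache */
    return find_get_page(mapping, index);
}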

In all those 6 steps I have given you an overview of the role of

-  struct address_space

-  struct page (yes, the same struct page used for memory pages
everywhere in the kernel)
-  struct buffer_head

in filesystem access.  Of course there is much more to tell
regarding spin locks, reference counts, page flags and the like.  If
you want to dig deep into the issue (it took me 3 weeks to understand
it) you can read chapters 12, 15 and 16 of the Understanding the Linux
Kernel book.

Hoping it Helps,

Miguel


PS:  Corrections welcome :)




Re: buffer page concepts in the page cache

2011-01-25 Thread Miguel Telleria de Esteban
Thanks Mulyadi,

On Wed, 26 Jan 2011 01:19:52 +0700 Mulyadi Santosa wrote:

> Hi Miguel...
> 
> Tough questions, let's see if I can made it :D
> 
> On Tue, Jan 25, 2011 at 19:56, Miguel Telleria de Esteban
>  wrote:
> > MY INTERPRETATION (please correct me if I am wrong)
> >
> > Q1  What is a "buffer page"?
> >
> > A "buffer page" is a "struct page" data describing a page allocated
> > to hold one or more i/o blocks from disk.
> 
> I agree...in other word, they are pages that hold data when the I/O
> are still in flight. But since it's part of page cache, they aren't
> thrown away after the I/O is done...for few moment they are held in
> RAM, in case they're subsequently read...thus, I/O frequency toward
> physical discs are reduced
> 
> I think, we know call it page cache
> 
> > Q2  Is the whole page cache content organized as buffer pages?
> >
> > YES, there is no other way to link memory-mapped disk i/o data to
> > the struct page pointed by address_space radix-tree entries.
> 
> Not so sure, but it's something like that IMHO.
> 
> > ---
> >
> > Q3  block device buffer_pages vs file buffer_pages
> >
> > This I really don't understand.  From what UTLK page 614 says:
> >
> > *  File buffer_pages ONLY refer to non-contiguous (on disk layout)
> > file contents.
> >
> > *  blockdev buffer_pages refer to single-block or continuous (on
> > disk layout) portions of block.
> >
> > My question is:  what happens with non-fragmented medium size files
> > that do not contain "disk holes" or non-adjancent block submissions?
> 
> Here's my understanding:
> 1. when you're dealing with file in raw, e.g using "dd" on /dev/sda1
> or "dd" with direct I/O command, you use block buffer cache

> 2. when you deal with files using read()/write facility of filesystem
> (thus via VFS), you use file page cache...

This makes sense.  Looking through LXR at the do_generic_file_read()
function (actually do_generic_mapping_read()), the address_space used
is the one of the file, not the one of the block device.

Maybe dd also goes through this same path, since you directly specify
the device file to read from.

The other read path (the bread() function) seems to be used when
looking for metadata (inodes, superblocks), which is not requested by
the user-space read() call.


> 
> to experiment with it, simply start "top" and examine which field
> increases when you do "dd", cat, etc

Hmmm, I don't have this clear.  I would like to check which
address_space object I am using (the block device's or the file's), so
I guess I need deeper tools (maybe ftrace?) to see it.

> 
> I hope I help you instead confusing you :D
> 

Thanks, you have helped.  On my side I continue (re)reading :).





buffer page concepts in the page cache

2011-01-25 Thread Miguel Telleria de Esteban

Dear all,

CONTEXT

I have spent the last few weeks learning how the kernel executes disk
I/O reads and writes, from the userland read() down to the hard disk
drive.  I want to see (and understand) the WHOLE PICTURE regarding the
VFS, the block I/O layer and the page cache.

To do this, I am following as base guides Bovet and Cesati's UTLK, 3rd
edition [1] (chapters 12-16 so far) and the new edition of Robert
Love's Linux Kernel Development [2] (chapters 13-16).  It is a lot of
reading that I still need to digest slowly.

For the moment, I have not yet dived into the details of the "page
frame reclaiming", "swap memory" and "filesystem implementation" areas.
My knowledge about memory allocation (the slab allocator) is also
limited.

[1]  Understanding the Linux Kernel, 3rd Edition, O'Reilly
[2]  Linux Kernel Development, 3rd Edition, Addison-Wesley


MY QUESTIONS

1.  What do we understand by "buffer pages"?

2.  Is the whole page cache content (i.e. the radix tree in the
address_space of the different inodes) organized as buffer pages?

3.  What is the functional difference between "block device buffer
pages" (stored in the address_space of the master bdev inode) and
the "file buffer pages" stored in the address_space of a file
inode? [ UTLK, page 614 ]


Maybe I am missing an important point of course...


MY INTERPRETATION (please correct me if I am wrong)

Q1  What is a "buffer page"?

A "buffer page" is a "struct page" data describing a page allocated to
hold one or more i/o blocks from disk.

As such, the "private" field points to a single circular  list of
"buffer_heads" each describing the mapping between the i/o blocks in
memory (b_data field) and the i/o blocks on disk (b_size,
b_blocknr...).

The buffer_head structures themselves are stored outside of the page
as shown in UTLK Fig 15.2.
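
To illustrate this layout (a hypothetical sketch of mine, not taken
from UTLK), walking the buffer_head ring of a buffer page follows the
classic kernel idiom below; locking and page-flag checks are omitted:

/* Visit every buffer_head attached to a buffer page.  Assumes the
 * page actually has buffers (PagePrivate is set). */
void walk_buffer_page(struct page *page)
{
    struct buffer_head *head = page_buffers(page); /* via page->private */
    struct buffer_head *bh = head;

    do {
        /* bh->b_data points to this block's data inside the page,
         * bh->b_size is the block size, bh->b_blocknr the block
         * number on disk */
        bh = bh->b_this_page;  /* next block in the same page */
    } while (bh != head);
}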

---

Q2  Is the whole page cache content organized as buffer pages?

YES, there is no other way to link memory-mapped disk i/o data to the
struct page pointed by address_space radix-tree entries.

---

Q3  block device buffer_pages vs file buffer_pages

This I really don't understand.  From what UTLK page 614 says:

*  File buffer_pages ONLY refer to non-contiguous (in the on-disk
   layout) file contents.

*  Blockdev buffer_pages refer to single-block or contiguous (in the
   on-disk layout) portions of a block device.

My question is: what happens with non-fragmented, medium-sized files
that do not contain "disk holes" or non-adjacent block submissions?



Thanks in advance for your attention,

 Miguel


