Re: variant length array?

2016-04-05 Thread Rajat Sharma
On Tue, Apr 5, 2016 at 1:00 PM, Bjørn Mork  wrote:
>
> "Robert P. J. Day"  writes:
> > On Tue, 5 Apr 2016, Wenda Ni wrote:
> >
> >> Hi all,
> >>
> >> I come across the following code in a kernel module code. It defines
> >> an array whose length is variant at runtime, depending on the actual
> >> inputs. It seems that kernel compiler supports this, which is
> >> obvious an error in the standard ANSI C. Do I have the correct
> >> understanding on it?
> >>
> >> Thank you.
> >>
> >>
> >> u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt, struct sk_buff *skb)
> >> {
> >>  ...
> >>  int hdr_size = sizeof(struct udphdr) +
> >>  (skb->protocol == htons(ETH_P_IP) ?
> >>  sizeof(struct iphdr) : sizeof(struct ipv6hdr));
> >>  u8 tmp[hdr_size + RXE_BTH_BYTES];
> >>  ...
> >> }
> >
> >   pretty sure "sizeof" can be calculated at compile time so i don't
> > see a problem here.
>
> Yes, but skb->protocol is variable and sizeof(struct iphdr) !=
> sizeof(struct ipv6hdr)).  Is the compiler smart enough to just use the
> largest possible value?  The logic here is pretty similar to a union,
> which it of course wouldn't have any problems calculating the size of.
>
>
> Bjørn
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


It is actually a variable-length array on the stack. BTW, there is nothing
fancy about the kernel compiler; it is a GCC extension (and standard C99):
https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
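
For illustration, here is a minimal user-space sketch of the same idea
(the sizes and the checksum are made up for the example); the array length
is only known at run time, just like hdr_size in rxe_icrc_hdr(), and
sizeof on such an array is evaluated at run time as well:

#include <stdio.h>

static unsigned int checksum(const unsigned char *buf, size_t len)
{
        unsigned int sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum += buf[i];
        return sum;
}

int main(int argc, char **argv)
{
        /* runtime-dependent size, like the IPv4/IPv6 header choice */
        int hdr_size = (argc > 1) ? 40 : 20;
        unsigned char tmp[hdr_size + 12];       /* VLA on the stack */
        size_t i;

        for (i = 0; i < sizeof(tmp); i++)
                tmp[i] = (unsigned char)i;
        printf("len=%zu sum=%u\n", sizeof(tmp), checksum(tmp, sizeof(tmp)));
        return 0;
}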

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Question about switch_mm function

2015-03-25 Thread Rajat Sharma
On Mar 25, 2015 12:26 PM,  wrote:
>
> On Wed, 25 Mar 2015 12:13:55 -0700, Rajat Sharma said:
>
> > Okay bit more details, I admit I had to dig through bit more to find
> > this out. After all, we all are newbies :)
>
> And you probably learned 3 times more while digging than if I had spelled
it
> out for you :)
>
>

Completely agree :)
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Question about switch_mm function

2015-03-25 Thread Rajat Sharma
On Wed, Mar 25, 2015 at 10:33 AM, Rajat Sharma  wrote:
>
>
> On Mar 25, 2015 10:31 AM, "Sreejith M M"  wrote:
> >
> > On Wed, Mar 25, 2015 at 10:55 PM,   wrote:
> > > On Wed, 25 Mar 2015 21:35:22 +0530, Sreejith M M said:
> > >
> > >> > This code is handling context switch from a kernel thread back to user 
> > >> > mode
> > >> > thread so TLB entries are invalid translation for user mode thread and 
> > >> > do
> > >> > not correspond to user process pgd. It is Master kernel page table
> > >> > translation as a result of kernel thread execution.
> > >> >
> > >> > -Rajat
> > >> Hi Rajat,
> > >>
> > >> If that is the case, why this code is put under CONFIG_SMP switch?
> > >
> > > Vastly simplified because I'm lazy :)
> > >
> > > If you look at the code, it's poking the status on *other* CPUs.  That's 
> > > why
> > > the cpumask() stuff.
> > >
> > > If you're on a single execution unit, you don't have to tell the other
> > > CPU about the change in state, because there isn't an other CPU.
> >
> > can you come out of this lazy mode explain this a bit more because I
> > am a newbie ?or tell me what else I should know before I have to
> > understand this code
> >
> > --
> > Regards,
> > Sreejith
>
> Valdis is talking about lazy tlb flush, not him being lazy. Otherwise he 
> wouldn't have replied at all :)


Okay, a bit more detail; I admit I had to dig through a bit more to find
this out. After all, we all are newbies :)

On an SMP system, there is an optimization called lazy TLB mode for
kernel threads. Follow these steps:

1. Assume that some of the CPUs are executing a multithreaded user mode
application, so essentially they all share the same mm and page tables.
2. Now let's say one of these CPUs changes/assigns a physical page frame
for a user mode linear address, say as a result of processing a system
call on behalf of the user mode process (putting data in a user mode
buffer, etc.). It needs to invalidate the old TLB entry for this linear
address locally.
3. Since the application is multithreaded, the other CPUs sharing the
same page table will still have stale values for the corresponding
linear address in their TLBs.
4. Normally we would invalidate the TLB entries of all CPUs sharing this
page table.
5. Now suppose one of the participating CPUs is running a kernel thread
and does not want to be bothered about this change, as it has nothing to
do with user mode TLB entries; it puts its executing CPU into a
do-not-disturb mode called lazy TLB mode.
6. TLB invalidation on CPUs executing kernel threads is deferred until
the kernel thread is finished.
7. At this point, when the kernel thread switches back to a user mode
process, the deferred invalidation is done; that is the code you are
referring to.

Just in case you wonder where the invalidation is happening: invalidation
is an arch-specific step. In the simplest case it flushes all TLB entries
and lets the TLB fill back up over time. That's why it is costly, and an
optimization like lazy TLB mode pays off. On x86 it is done by reloading
cr3:

http://stackoverflow.com/questions/1090218/what-does-this-little-bit-of-x86-doing-with-cr3
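
For reference, here is a simplified, commented restatement of the
CONFIG_SMP branch quoted elsewhere in this thread (from
arch/x86/include/asm/mmu_context.h); this is a sketch for explanation,
not the verbatim source:

        else {  /* prev == next: this CPU was in lazy TLB mode */
                /* become willing to receive TLB flush IPIs again */
                this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);

                if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                        /*
                         * leave_mm() removed this CPU from mm_cpumask,
                         * so TLB flush IPIs were not delivered to it.
                         * Rejoin the mask and reload CR3 to drop stale
                         * translations that may point to freed page
                         * tables.
                         */
                        cpumask_set_cpu(cpu, mm_cpumask(next));
                        load_cr3(next->pgd);
                        load_LDT_nolock(&next->context);
                }
        }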

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Question about switch_mm function

2015-03-25 Thread Rajat Sharma
On Mar 25, 2015 10:31 AM, "Sreejith M M"  wrote:
>
> On Wed, Mar 25, 2015 at 10:55 PM,   wrote:
> > On Wed, 25 Mar 2015 21:35:22 +0530, Sreejith M M said:
> >
> >> > This code is handling context switch from a kernel thread back to
user mode
> >> > thread so TLB entries are invalid translation for user mode thread
and do
> >> > not correspond to user process pgd. It is Master kernel page table
> >> > translation as a result of kernel thread execution.
> >> >
> >> > -Rajat
> >> Hi Rajat,
> >>
> >> If that is the case, why this code is put under CONFIG_SMP switch?
> >
> > Vastly simplified because I'm lazy :)
> >
> > If you look at the code, it's poking the status on *other* CPUs.
That's why
> > the cpumask() stuff.
> >
> > If you're on a single execution unit, you don't have to tell the other
> > CPU about the change in state, because there isn't an other CPU.
>
> can you come out of this lazy mode explain this a bit more because I
> am a newbie ?or tell me what else I should know before I have to
> understand this code
>
> --
> Regards,
> Sreejith

Valdis is talking about lazy tlb flush, not him being lazy. Otherwise he
wouldn't have replied at all :)
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Question about switch_mm function

2015-03-25 Thread Rajat Sharma
On Mar 25, 2015 6:33 AM, "Sreejith M M"  wrote:
>
>
>
> On Wed, Jan 28, 2015 at 9:56 PM, Sreejith M M 
wrote:
>>
>> Hi,
>>
>> I was trying to understand the difference in scheduling between
>> processes and threads(belong to same process).
>>
>> I was thinking that, when kernel has to switch to a task which belong
>> to the same process, it does not have to clear / replace page global
>> directories and other memory related information.
>>
>> But in switch_mm function some code is put under CONFIG_SMP function.
>> What is its signigicance? Code is
>> below(
http://lxr.free-electrons.com/source/arch/x86/include/asm/mmu_context.h#L37)
>> .
>> What I infer is that the code is doing flush tlb, reload page table
>> directories etc in multiprocessor mode(obviously)  but I believe this
>> code may never be executed .
>>
>> Can anyone help to understand what this part of the function supposed to
do?
>>
>>  60 #ifdef CONFIG_SMP
>>  61   else {
>>  62 this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
>>  63 BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) !=
next);
>>  64
>>  65 if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
>>  66 /*
>>  67  * On established mms, the mm_cpumask is
>> only changed
>>  68  * from irq context, from
>> ptep_clear_flush() while in
>>  69  * lazy tlb mode, and here. Irqs are blocked
during
>>  70  * schedule, protecting us from
>> simultaneous changes.
>>  71  */
>>  72 cpumask_set_cpu(cpu, mm_cpumask(next));
>>  73 /*
>>  74  * We were in lazy tlb mode and leave_mm
disabled
>>  75  * tlb flush IPI delivery. We must reload CR3
>>  76  * to make sure to use no freed page tables.
>>  77  */
>>  78 load_cr3(next->pgd);
>>  79 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
>> TLB_FLUSH_ALL);
>>  80 load_LDT_nolock(&next->context);
>>  81 }
>>  82 }
>>  83 #endif
>>
>>
>> --
>> Regards,
>> Sreejith
>
>
> Hi ,
>
> can someone please give me any answers for this?
>
> --
> Regards,
> Sreejith
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

This code is handling context switch from a kernel thread back to user mode
thread so TLB entries are invalid translation for user mode thread and do
not correspond to user process pgd. It is Master kernel page table
translation as a result of kernel thread execution.

-Rajat
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Query regarding Kernel Space Memory

2015-02-24 Thread Rajat Sharma
On Tue, Feb 24, 2015 at 11:53 AM, Vishwas Srivastava
 wrote:
>
> Hi Sannu,
> 1G/3G address split is for virtual address. I am not talking 
> about physical address translation stuff here.
> When a kernel code is compiled for a 32 bit architecture, my assumption is, 
> compiler will generate the code for the full 4G address range. So there must 
> be somebody who is relocating/translating the kernel code address between 
> 0-4G to 3G--4G address space.
> Who is that guy? and is my understanding correct??
>
>
> On Tue, Feb 24, 2015 at 10:54 PM, Sannu K  wrote:
>>
>> Address generated by the linker is virtual address, it is not same as 
>> physical address. Kernel / MMU will do the virtual to physical address 
>> mapping.
>>
>> Hope this helps,
>> PrasannaKumar
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

I believe it is done as part of the kernel image linker script; please
have a look at linux/arch/x86/kernel/vmlinux.lds.S. The very first
logical address inside the script is derived from LOAD_OFFSET, which for
x86_32 is __PAGE_OFFSET, i.e. the 3G mark. Hope I am pointing you in the
right direction.
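
To see the 3G split from a running kernel, here is a small hedged sketch
(a throwaway module init function, not from the kernel source): kmalloc()
memory lives in the direct mapping, where the virtual address is simply
the physical address plus PAGE_OFFSET (0xC0000000 on a default x86_32
split):

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/mm.h>

static int __init vaddr_demo_init(void)
{
        void *buf = kmalloc(64, GFP_KERNEL);    /* lies in the direct map */

        if (!buf)
                return -ENOMEM;
        pr_info("PAGE_OFFSET = %lx\n", (unsigned long)PAGE_OFFSET);
        pr_info("buf: va=%p pa=%lx\n", buf, (unsigned long)__pa(buf));
        kfree(buf);
        return 0;
}

static void __exit vaddr_demo_exit(void)
{
}

module_init(vaddr_demo_init);
module_exit(vaddr_demo_exit);
MODULE_LICENSE("GPL");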

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Implementing a NFS proxy

2014-06-05 Thread Rajat Sharma
Have a look at this: https://github.com/nfs-ganesha/nfs-ganesha/wiki/PROXY

I believe NFS-Ganesha is a quite mature and production-ready implementation.


On Thu, Jun 5, 2014 at 11:04 AM, Ramana Reddy  wrote:

> Thanks for your reply. This link does not solve my problem.
> I need a proxy which act like a nfs client as well as nfs server.
> It takes request from the clients and parse it and sends to the real
> server.
> In the second case it acts as a client. I am looking for minimal
> implementation of nfs daemons.
>
> Thanks,
> Ramana.
>
>
> On Thu, Jun 5, 2014 at 7:18 PM, Peter Senna Tschudin <
> peter.se...@gmail.com> wrote:
>
>> I would start here:
>>
>> http://serverfault.com/questions/401312/how-to-create-an-nfs-proxy-by-using-kernel-server-client
>>
>> On Thu, Jun 5, 2014 at 8:53 AM, Ramana Reddy  wrote:
>> > Hi all,
>> >
>> > I want to implement a minimal skeleton of NFS proxy.
>> > Is there any place where I can look at and find some useful
>> > info. What are the steps I should follow to implement this approach.
>> >
>> > Help in this regards is highly appreciated.
>> >
>> > Thanks,
>> > Ramana.
>> >
>> > ___
>> > Kernelnewbies mailing list
>> > Kernelnewbies@kernelnewbies.org
>> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>> >
>>
>>
>>
>> --
>> Peter
>>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Linux reboot command takes too long

2014-05-30 Thread Rajat Sharma
You have kdump enabled: crashkernel=512M@128M


On Fri, May 30, 2014 at 2:38 PM, Vipul Jain  wrote:

> On Fri, May 30, 2014 at 12:32 PM, Vipul Jain  wrote:
>
>> Hi All,
>>
>> I have a quick question on Linux reboot:
>> On my system I have /var/core directory created which has 300G space and
>> if I fill the /var/core with files say upto 290G and reboot the system and
>> after it comes up and delete the files in /var/core and try to reboot the
>> system takes 45 mins before it actually reboots. Wondering if anyone has
>> seen this before and what could be the issue?
>>
>>
>> Regards,
>> Vipul.
>>
>> Anybody knowns what does below means:
> ps elxf | grep shutdown
> 4 0  6327  3842  20   0  12496   788 jbd2_l D?  0:00  \_
> shutdown -r 0 wCONSOLE=/dev/console TERM=linux SHELL=/bin/sh rootmnt=/root
> cpiorootsize= crashkernel=512M@128M image=/xxx/image1/
> INIT_VERSION=sysvinit-2.88 init=/sbin/init COLUMNS=80
> PATH=/xxx/sbin:/xxx/bin:/sbin:/usr/sbin:/bin:/usr/bin runlevel=2 RUNLEVEL=2
> PWD=/root PREVLEVEL=N previous=N LINES=24 HOME=/ SHLVL=2 env=0x3DA97000
> _=/sbin/shutdown
> 0 0  6764  6651  20   0   6304   600 pipe_w S+   pts/0  0:00
>\_ grep shutdownTERM=xterm SHELL=/bin/bash SSH_CLIENT=172.16.92.39 49891
> 22 SSH_TTY=/dev/pts/0 USER=root MAIL=/var/mail/root
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/root
> SHLVL=1 HOME=/root LOGNAME=root SSH_CONNECTION=172.16.92.39 49891
> 172.16.85.88 22 _=/bin/grep
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Linux reboot command takes too long

2014-05-30 Thread Rajat Sharma
It looks like your system has:

   1. Kernel crash dump (kdump) configured.
   2. Something that causes a crash when you issue a reboot; maybe some
   module unload path has a bug.
   3. The crash triggers kdump to take a dump of memory into the /var/core
   directory (I guess you have plenty of RAM to justify 45 minutes). The
   dump image is highly compressible, but compression is the last step.
   4. If there is no space in /var/core, kdump gives up and you see a
   faster reboot.

-Rajat


On Fri, May 30, 2014 at 12:39 PM,  wrote:

> On Fri, 30 May 2014 12:32:32 -0700, Vipul Jain said:
>
> > On my system I have /var/core directory created which has 300G space and
> if
> > I fill the /var/core with files say upto 290G and reboot the system and
> > after it comes up and delete the files in /var/core and try to reboot the
> > system takes 45 mins before it actually reboots. Wondering if anyone has
> > seen this before and what could be the issue?
>
> Is this reproducible?
>
> Do you have any clue at all where in the reboot process it's sitting
> for the 45 minutes?
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Blocked I/O in read() and mmap()

2014-02-26 Thread Rajat Sharma
Check two things:
1. Handle errors from the mmap() call in user space; it seems munmap() is
called unconditionally:

while (1) {
        /* always map the same offset (0); the logger handles this */
        str = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        if (str == MAP_FAILED) {        /* do not fwrite/munmap on failure */
                perror("mmap");
                break;
        }
        fwrite(str, 1, 4096, out);      /* out: the FILE * being written to */
        munmap(str, 4096);
}

2. Debug further why logger_vma_fault() is failing. Is there a window
where the buffer list might be empty while you arrange for a new page?

-Rajat


On Wed, Feb 26, 2014 at 5:31 PM, Le Tan  wrote:

> OK, I will look into this. Is it OK to block vm_fault?
>  I have another question. In my userspace program, I mmap() my device,
> then read something, then munmap() my device() and then mmap() my device
> again. The program do this in a loop. Everytime it mmap() the same address
> and offset.
> My device maintains a list of pages. In vm_operations fault, I just map
> the address to the head of the list. And in vm_operations close, I just
> delete the head from the list and free the page.
> Everything seems to be OK except that after I call the munmap() in the
> program, there is an error message. The error seems to happens between the
> call of vm_operations fault and vm_operations close. I have posted this
> question before ( See 
> this<http://www.spinics.net/lists/newbies/msg51339.html>).
>
> [42522.596689] logger_mmap():start:7f8ff57be000, end:7f8ff57bf000
> //this is the mmap() function of my device module
> [42522.596694]
> logger_vma_fault():vmf->pgoff:0d,start:7f8ff57be000,pgoff:0,offset:0
> //this is the fault function of struct vm_operations_struct
> [42522.596729] BUG: Bad page map in process logger_pro
> pte:800612a30025 pmd:314175067 //this is the error
> [42522.596740] page:ea00184a8c00 count:2 mapcount:-2146959356
> mapping: (null) index:0x880612a36000
> [42522.596747] page flags: 0x2004080(slab|head)
> [42522.596811] addr:7f8ff57be000 vm_flags:04040071 anon_vma:
> (null) mapping:880613b25f08 index:0
> [42522.596824] vma->vm_ops->fault: logger_vma_fault+0x0/0x140 [logger]
> [42522.596834] vma->vm_file->f_op->mmap: logger_mmap+0x0/0xd50 [logger]
> [42522.596842] CPU: 1 PID: 21571 Comm: logger_pro Tainted: G B
> IO 3.11.0+ #1
> [42522.596844] Hardware name: Dell Inc. PowerEdge M610/000HYJ, BIOS
> 2.0.13 04/06/2010
> [42522.596846] 7f8ff57be000 880314199c98 816ad166
> 6959
> [42522.596851] 880314539a98 880314199ce8 8114e270
> ea00184a8c00
> [42522.596854]  880314199cc8 7f8ff57be000
> 880314199e18
> [42522.596858] Call Trace:
> [42522.596867] [] dump_stack+0x46/0x58
> [42522.596872] [] print_bad_pte+0x190/0x250
> [42522.596877] [] unmap_single_vma+0x6cb/0x7a0
> [42522.596880] [] unmap_vmas+0x54/0xa0
> [42522.596885] [] unmap_region+0xa7/0x110
> [42522.596888] [] do_munmap+0x1f7/0x3e0
> [42522.596891] [] vm_munmap+0x4e/0x70
> [42522.596904] [] SyS_munmap+0x2b/0x40
> [42522.596915] [] system_call_fastpath+0x16/0x1b
> [42522.596920] logger_vma_close():start:7f8ff57be000,
> end:7f8ff57bf000, vmas:0 //this is the close function of struct
> vm_operations_struct
>
> So do you have any idea about this error?
> Thanks very much!
>
>
> 2014-02-27 9:04 GMT+08:00 Rajat Sharma :
>
> Why do you need to block in mmap()? mmap is supposed to create a mapping
>> area in virtual address space for the process. Actual transfer happens
>> later through page fault handlers on demand basis. look at vm_operations
>> fault/readpage etc methods, these might be the places you want to wait for
>> the data.
>>
>>
>> On Wed, Feb 26, 2014 at 4:14 PM, Le Tan  wrote:
>>
>>> So what should I do if I want the mmap() not to return right now? Is
>>> it strange to block in mmap() and few people will do this? Thanks for
>>> your help!
>>>
>>> 2014-02-27 4:45 GMT+08:00 Rajat Sharma :
>>> > It seems this task "landscape-sysin" is trying to peek into virtual
>>> memory
>>> > of your processes and the process within mmap call is holding its
>>> > mm->mmap_sem semaphore which grants access to its address space.
>>> > landscape-sysin is trying to grab this semaphore to poke into address
>>> space
>>> > of your mmap process address space. As from your description, it might
>>> be
>>> > invoked everytime you are opening a new shell. Not sure why this
>>> process
>>> > bother's about other process address space. Little googling shows this
>>> as
>>> > relevant to your case:
>>> >
>>> >
>>> http://www.techques.com/question/2-66765/Disable-usage-of-console-kit-daemon-in-Ubuntu
&

Re: Blocked I/O in read() and mmap()

2014-02-26 Thread Rajat Sharma
Why do you need to block in mmap()? mmap() is supposed to create a
mapping area in the virtual address space of the process. The actual
transfer happens later, on demand, through the page fault handlers. Look
at the vm_operations fault method (and readpage etc. for page-cache
backed mappings); those might be the places where you want to wait for
the data.
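
As a rough illustration, here is a minimal hedged sketch (hypothetical
my_dev/my_mmap names, 3.x-era fault signature) of blocking in the fault
handler instead of in mmap():

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/wait.h>

struct my_dev {
        wait_queue_head_t queue;
        int a, b;                       /* the condition being waited on */
        struct page *page;              /* page holding the data */
};

static int my_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct my_dev *dev = vma->vm_private_data;
        struct page *page;

        /* sleep here, not in my_mmap(), until the data is ready */
        if (wait_event_interruptible(dev->queue, dev->a == dev->b))
                return VM_FAULT_SIGBUS; /* simplistic signal handling */

        page = dev->page;
        get_page(page);                 /* the fault handler must take a ref */
        vmf->page = page;
        return 0;
}

static const struct vm_operations_struct my_vm_ops = {
        .fault = my_vma_fault,
};

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
        vma->vm_ops = &my_vm_ops;
        vma->vm_private_data = filp->private_data;
        return 0;                       /* return immediately; no blocking */
}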


On Wed, Feb 26, 2014 at 4:14 PM, Le Tan  wrote:

> So what should I do if I want the mmap() not to return right now? Is
> it strange to block in mmap() and few people will do this? Thanks for
> your help!
>
> 2014-02-27 4:45 GMT+08:00 Rajat Sharma :
> > It seems this task "landscape-sysin" is trying to peek into virtual
> memory
> > of your processes and the process within mmap call is holding its
> > mm->mmap_sem semaphore which grants access to its address space.
> > landscape-sysin is trying to grab this semaphore to poke into address
> space
> > of your mmap process address space. As from your description, it might be
> > invoked everytime you are opening a new shell. Not sure why this process
> > bother's about other process address space. Little googling shows this as
> > relevant to your case:
> >
> >
> http://www.techques.com/question/2-66765/Disable-usage-of-console-kit-daemon-in-Ubuntu
> >
> > Your read process is innocent and not involved in this deadlock.
> >
> > -Rajat
> >
> >
> > On Wed, Feb 26, 2014 at 4:13 AM, Le Tan  wrote:
> >>
> >> Hi, I am writing a driver module. Now I have some questions about
> blocked
> >> I/O.
> >> my_read() is the read function in the file_operations struct in my
> >> module. my_read() is just as simple as this:
> >> ssize_t my_read()
> >> {
> >> if(wait_event_interruptible(dev->queue, a == b))
> >> return -ERESTARTSYS;
> >> return count;
> >> }
> >> Then I write a simple program to open and read the device. Obviously
> >> the program will be blocked. Now I still can open a new shell window
> >> and log in ( I use xshell).
> >>
> >> However, then I implement my_mmap(), the mmap function in the
> >> file_operations struct in my module, like this:
> >> int my_mmap()
> >> {
> >> if(wait_event_interruptible(dev->queue, a == b))
> >> return -ERESTARTSYS;
> >> return 0;
> >> }
> >> Then I write a simple program to open and mmap() the device. Obviously
> >> the program will be blocked again. However, when I open a new shell
> >> window in xshell and try to connect to the linux, it displays like
> >> this:
> >>
> >> Connecting to 192.168.146.118:22...
> >> Connection established.
> >> To escape to local shell, press 'Ctrl+Alt+]'.
> >>
> >> And I can't log in! Then after a while, in the syslog, there is one
> >> message like this:
> >> [38306.614103] INFO: task landscape-sysin:17616 blocked for more than
> >> 120 seconds.
> >> [38306.614114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [38306.614120] landscape-sysin D 8180fb60 0 17616  17609
> >> 0x
> >> [38306.614125]  88031d609c90 0082 88032fffdb08
> >> 
> >> [38306.614130]  8803130bdc40 88031d609fd8 88031d609fd8
> >> 88031d609fd8
> >> [38306.614133]  88062150c530 8803130bdc40 0041
> >> 8803130bdc40
> >> [38306.614137] Call Trace:
> >> [38306.614147]  [] schedule+0x29/0x70
> >> [38306.614151]  [] rwsem_down_read_failed+0x9d/0xf0
> >> [38306.614157]  []
> call_rwsem_down_read_failed+0x14/0x30
> >> [38306.614160]  [] ? down_read+0x24/0x2b
> >> [38306.614166]  [] __access_remote_vm+0x41/0x1f0
> >> [38306.614170]  [] access_process_vm+0x5b/0x80
> >> [38306.614175]  [] proc_pid_cmdline+0x93/0x120
> >> [38306.614178]  [] proc_info_read+0xa5/0xf0
> >> [38306.614182]  [] vfs_read+0xb4/0x180
> >> [38306.614185]  [] SyS_read+0x52/0xa0
> >> [38306.614189]  [] system_call_fastpath+0x16/0x1b
> >>
> >> If I terminate the program by force, then I can log in right now.
> >> So, are there any differences between the read and the mmap function
> >> to the wait_event_interruptible()? Why? If I want to block mmap() just
> >> like blocking read(), what should I do? Or it is impossible?
> >> Thanks!
> >>
> >> ___
> >> Kernelnewbies mailing list
> >> Kernelnewbies@kernelnewbies.org
> >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >
> >
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Blocked I/O in read() and mmap()

2014-02-26 Thread Rajat Sharma
It seems this task "landscape-sysin" is trying to peek into the virtual
memory of your process, and the process inside the mmap call is holding
its mm->mmap_sem semaphore, which guards access to its address space.
landscape-sysin is trying to grab this semaphore to poke into the address
space of your mmap'ing process. From your description, it might be
invoked every time you open a new shell. I am not sure why this process
bothers about another process's address space. A little googling shows
this as relevant to your case:

http://www.techques.com/question/2-66765/Disable-usage-of-console-kit-daemon-in-Ubuntu

Your read process is innocent and not involved in this deadlock.

-Rajat


On Wed, Feb 26, 2014 at 4:13 AM, Le Tan  wrote:

> Hi, I am writing a driver module. Now I have some questions about blocked
> I/O.
> my_read() is the read function in the file_operations struct in my
> module. my_read() is just as simple as this:
> ssize_t my_read()
> {
> if(wait_event_interruptible(dev->queue, a == b))
> return -ERESTARTSYS;
> return count;
> }
> Then I write a simple program to open and read the device. Obviously
> the program will be blocked. Now I still can open a new shell window
> and log in ( I use xshell).
>
> However, then I implement my_mmap(), the mmap function in the
> file_operations struct in my module, like this:
> int my_mmap()
> {
> if(wait_event_interruptible(dev->queue, a == b))
> return -ERESTARTSYS;
> return 0;
> }
> Then I write a simple program to open and mmap() the device. Obviously
> the program will be blocked again. However, when I open a new shell
> window in xshell and try to connect to the linux, it displays like
> this:
>
> Connecting to 192.168.146.118:22...
> Connection established.
> To escape to local shell, press 'Ctrl+Alt+]'.
>
> And I can't log in! Then after a while, in the syslog, there is one
> message like this:
> [38306.614103] INFO: task landscape-sysin:17616 blocked for more than
> 120 seconds.
> [38306.614114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [38306.614120] landscape-sysin D 8180fb60 0 17616  17609
> 0x
> [38306.614125]  88031d609c90 0082 88032fffdb08
> 
> [38306.614130]  8803130bdc40 88031d609fd8 88031d609fd8
> 88031d609fd8
> [38306.614133]  88062150c530 8803130bdc40 0041
> 8803130bdc40
> [38306.614137] Call Trace:
> [38306.614147]  [] schedule+0x29/0x70
> [38306.614151]  [] rwsem_down_read_failed+0x9d/0xf0
> [38306.614157]  [] call_rwsem_down_read_failed+0x14/0x30
> [38306.614160]  [] ? down_read+0x24/0x2b
> [38306.614166]  [] __access_remote_vm+0x41/0x1f0
> [38306.614170]  [] access_process_vm+0x5b/0x80
> [38306.614175]  [] proc_pid_cmdline+0x93/0x120
> [38306.614178]  [] proc_info_read+0xa5/0xf0
> [38306.614182]  [] vfs_read+0xb4/0x180
> [38306.614185]  [] SyS_read+0x52/0xa0
> [38306.614189]  [] system_call_fastpath+0x16/0x1b
>
> If I terminate the program by force, then I can log in right now.
> So, are there any differences between the read and the mmap function
> to the wait_event_interruptible()? Why? If I want to block mmap() just
> like blocking read(), what should I do? Or it is impossible?
> Thanks!
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Interrupt fires when module is unloaded

2014-01-06 Thread Rajat Sharma
It would be nice to post the code when asking for debugging help. It
looks like your interrupts are masked, but when you unload the driver
they get unmasked, and hence you receive one on unload.
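
For reference, a hedged sketch (the usual 0x378/IRQ 7 defaults from the
"short" example; adjust for your machine, and the helper names are made
up) of explicitly unmasking and masking parallel-port interrupt reporting
around request_irq()/free_irq():

#include <linux/interrupt.h>
#include <asm/io.h>

#define SHORT_BASE      0x378   /* parallel port base (data register) */
#define SHORT_IRQ       7       /* typical parallel port IRQ          */

static irqreturn_t short_interrupt(int irq, void *dev_id)
{
        /* count/log the interrupt here */
        return IRQ_HANDLED;
}

static int short_enable_irq(void)
{
        int ret = request_irq(SHORT_IRQ, short_interrupt, 0, "short", NULL);

        if (ret)
                return ret;
        outb(0x10, SHORT_BASE + 2);     /* bit 4 of control reg: IRQ enable */
        return 0;
}

static void short_disable_irq(void)
{
        outb(0x00, SHORT_BASE + 2);     /* mask before freeing the handler */
        free_irq(SHORT_IRQ, NULL);
}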

-Rajat


On Mon, Jan 6, 2014 at 4:09 PM, Eric Fowler  wrote:

> I am trying to figure out interrupts by writing a shadow of Rubini's
> 'short' program. Recall that Rubini tells us to enable parallel port
> interrupts by wiring pins 9&10 together, then writing binary data to the
> parallel port's address.
>
> I am doing that, but:
> - I don't see interrupts when I write to the port
> - I do see one interrupt when I unload the driver (in the fop's .release
> method)
> - This happens whether or not the pins are wired up.
>
> What is going on here?
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Breaking up a bvec in a bio for reading more than 512

2014-01-06 Thread Rajat Sharma
I am not sure I understood your question correctly, but there are no
scatter-gather semantics in terms of disk/file offsets; I don't think any
OS would have implemented such a complex thing for no good reason. It
would take more than just software to achieve it: the disk controller
would have to support such parallelism, which they already do via
cylinders, but one cylinder is laid out sequentially in terms of the
disk's _logical_ offset.

Regards,
Rajat


On Mon, Jan 6, 2014 at 1:09 PM, neha naik  wrote:

> Hi Rajat,
>  I am not opposed to creating multiple bio. I just meant to say that
> if there is another method which involves not breaking the bio (as i
> understand breaking the bio) i would love to know it.
>
> Regards,
> Neha
>
>
>
> On Mon, Jan 6, 2014 at 12:23 PM, Rajat Sharma  wrote:
> > Why do you want to avoid creating multiple bio? If you have data on
> multiple
> > disks, create multiple of them and shoot them simultaneously to get
> > advantage of parallel IO. And if it is single disk, elevators of lower
> disk
> > would do a good job of reading/writing them in serial order of disk
> seek. I
> > don't see much of savings with not creating bio, it is going to be
> allocated
> > from slab anyways. Also risks you involve with leaving bio in a
> corruptible
> > state after customization for one disk are higher.
> >
> >
> > On Mon, Jan 6, 2014 at 10:03 AM, neha naik  wrote:
> >>
> >> Hi All,
> >>   I figured out the method by some trial and error and looking at the
> >> linux source code.
> >>   We can do something like this :
> >>   Say we want to read pages of bvec in 512 chunks. Create bio with
> >> a single page and read 512 chunk of data from wherever you want to (it
> >> can be different disks).
> >>
> >>dst = kmap_atomic(bvec->bv_page, KM_USER0); ---> bvec is of
> >> original bio
> >>src = kmap_atomic(page, KM_USER0); ---> page we read by
> >> creating new bio
> >>memcpy(dst+offset, src, 512);
> >>kunmap_atomic(src, KM_USER0);
> >>kunmap_atomic(dst, KM_USER0);
> >>
> >> My difficulty was not being able to access the high memory page in
> >> kernel. I was earlier trying to increment the offset of the bvec and
> >> pass the page to the layer below assuming that it would read in the
> >> data at correct offset but of course it was resulting in panic. The
> >> above solves that. Of course, if there is some other method which
> >> involves not creating any bio i would love to know.
> >>
> >> Regards,
> >> Neha
> >>
> >>
> >> On Sat, Jan 4, 2014 at 9:32 AM, Pranay Srivastava 
> >> wrote:
> >> >
> >> > On 04-Jan-2014 5:18 AM, "neha naik"  wrote:
> >> >>
> >> >> Hi All,
> >> >>I am getting a request with bvec->bv_len > 512. Now, the
> >> >> information to be read is scattered across the entire disk in 512
> >> >> chunks. So that, information on disk can be : sector 8, sector 100,
> >> >> sector 9.
> >> >>  Now if i get a request to read with the bvec->bv_len > 512 i need to
> >> >> pull in the information from
> >> >> multiple places on disk since the data is not sequentially located.
> >> >>  I tried to look at the linux source code because i think raid must
> be
> >> >> doing it all the time. (eg : on disk 1 we may be storing sector 6 and
> >> >> on disk 2 we may be storing sector 7 and so on).
> >> >
> >> > You are right. Perhaps you need to clone the bio and set them
> properly.
> >> > I
> >> > guess you ought to check dm driver's make_request function. It does
> >> > clone
> >> > bio.
> >> >
> >> > I don't know if you can split that request while handling it. Perhaps
> >> > reinserting that request could work.
> >> >
> >> >>   However, i have not really got any useful information from it. Also
> >> >> scouring through articles on
> >> >> google has not helped much.
> >> >>I am hoping somebody points me in the right direction.
> >> >>
> >> >> Thanks in advance,
> >> >> Neha
> >> >>
> >> >> ___
> >> >> Kernelnewbies mailing list
> >> >> Kernelnewbies@kernelnewbies.org
> >> >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >> >
> >> >   ---P.K.S
> >>
> >> ___
> >> Kernelnewbies mailing list
> >> Kernelnewbies@kernelnewbies.org
> >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >
> >
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Breaking up a bvec in a bio for reading more than 512

2014-01-06 Thread Rajat Sharma
Why do you want to avoid creating multiple bios? If you have data on
multiple disks, create multiple of them and shoot them off simultaneously
to take advantage of parallel IO. And if it is a single disk, the
elevator of the lower disk will do a good job of reading/writing them in
serial order of disk seek. I don't see much saving in not creating bios;
they are going to be allocated from a slab anyway. Also, the risk of
leaving a bio in a corrupt state after customizing it for one disk is
higher.
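
As a rough illustration, a hedged sketch (3.x-era block API field names;
my_end_io and the sector/offset values are placeholders) of building one
bio per on-disk 512-byte chunk and submitting them without waiting in
between:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

static void my_end_io(struct bio *bio, int error)
{
        struct completion *done = bio->bi_private;

        complete(done);                 /* or record per-chunk status */
        bio_put(bio);
}

static int read_chunk(struct block_device *bdev, sector_t sector,
                      struct page *page, unsigned int offset,
                      struct completion *done)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);   /* one segment */

        if (!bio)
                return -ENOMEM;
        bio->bi_bdev    = bdev;
        bio->bi_sector  = sector;       /* in 512-byte units */
        bio->bi_end_io  = my_end_io;
        bio->bi_private = done;
        bio_add_page(bio, page, 512, offset);
        submit_bio(READ, bio);          /* fire now, wait on 'done' later */
        return 0;
}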


On Mon, Jan 6, 2014 at 10:03 AM, neha naik  wrote:

> Hi All,
>   I figured out the method by some trial and error and looking at the
> linux source code.
>   We can do something like this :
>   Say we want to read pages of bvec in 512 chunks. Create bio with
> a single page and read 512 chunk of data from wherever you want to (it
> can be different disks).
>
>dst = kmap_atomic(bvec->bv_page, KM_USER0); ---> bvec is of
> original bio
>src = kmap_atomic(page, KM_USER0); ---> page we read by
> creating new bio
>memcpy(dst+offset, src, 512);
>kunmap_atomic(src, KM_USER0);
>kunmap_atomic(dst, KM_USER0);
>
> My difficulty was not being able to access the high memory page in
> kernel. I was earlier trying to increment the offset of the bvec and
> pass the page to the layer below assuming that it would read in the
> data at correct offset but of course it was resulting in panic. The
> above solves that. Of course, if there is some other method which
> involves not creating any bio i would love to know.
>
> Regards,
> Neha
>
>
> On Sat, Jan 4, 2014 at 9:32 AM, Pranay Srivastava 
> wrote:
> >
> > On 04-Jan-2014 5:18 AM, "neha naik"  wrote:
> >>
> >> Hi All,
> >>I am getting a request with bvec->bv_len > 512. Now, the
> >> information to be read is scattered across the entire disk in 512
> >> chunks. So that, information on disk can be : sector 8, sector 100,
> >> sector 9.
> >>  Now if i get a request to read with the bvec->bv_len > 512 i need to
> >> pull in the information from
> >> multiple places on disk since the data is not sequentially located.
> >>  I tried to look at the linux source code because i think raid must be
> >> doing it all the time. (eg : on disk 1 we may be storing sector 6 and
> >> on disk 2 we may be storing sector 7 and so on).
> >
> > You are right. Perhaps you need to clone the bio and set them properly. I
> > guess you ought to check dm driver's make_request function. It does clone
> > bio.
> >
> > I don't know if you can split that request while handling it. Perhaps
> > reinserting that request could work.
> >
> >>   However, i have not really got any useful information from it. Also
> >> scouring through articles on
> >> google has not helped much.
> >>I am hoping somebody points me in the right direction.
> >>
> >> Thanks in advance,
> >> Neha
> >>
> >> ___
> >> Kernelnewbies mailing list
> >> Kernelnewbies@kernelnewbies.org
> >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >
> >   ---P.K.S
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Breaking up a bvec in a bio for reading more than 512

2014-01-03 Thread Rajat Sharma
Create multiple bios.

-Rajat


On Fri, Jan 3, 2014 at 3:41 PM, neha naik  wrote:

> Hi All,
>I am getting a request with bvec->bv_len > 512. Now, the
> information to be read is scattered across the entire disk in 512
> chunks. So that, information on disk can be : sector 8, sector 100,
> sector 9.
>  Now if i get a request to read with the bvec->bv_len > 512 i need to
> pull in the information from
> multiple places on disk since the data is not sequentially located.
>  I tried to look at the linux source code because i think raid must be
> doing it all the time. (eg : on disk 1 we may be storing sector 6 and
> on disk 2 we may be storing sector 7 and so on).
>   However, i have not really got any useful information from it. Also
> scouring through articles on
> google has not helped much.
>I am hoping somebody points me in the right direction.
>
> Thanks in advance,
> Neha
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Can't cleanly unload driver

2013-12-30 Thread Rajat Sharma
Hi Eric,

I have seen module reference counting go wrong even with nicely written
code, but the culprit in my case was a missing compilation flag, -DMODULE,
which provides the proper definition of THIS_MODULE; otherwise it is NULL,
as for code built into the kernel, which is never unloaded. Unless you
have customized the Makefiles, this definition should be included, but it
is anyway good to double-check and rule out this possibility.
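
For context, a small hedged sketch (hypothetical foobar names) of the
usual pattern that ties open files on the character device to the
module's reference count; if THIS_MODULE is broken, or .owner is left
unset, insmod/rmmod cycles can misbehave in exactly this way:

#include <linux/module.h>
#include <linux/fs.h>

static int foobar_major;

static int foobar_open(struct inode *inode, struct file *filp)
{
        return 0;
}

static const struct file_operations foobar_fops = {
        .owner = THIS_MODULE,   /* pins the module while files are open */
        .open  = foobar_open,
};

static int __init foobar_init(void)
{
        foobar_major = register_chrdev(0, "foobar", &foobar_fops);
        return foobar_major < 0 ? foobar_major : 0;
}

static void __exit foobar_exit(void)
{
        unregister_chrdev(foobar_major, "foobar");  /* must match register */
}

module_init(foobar_init);
module_exit(foobar_exit);
MODULE_LICENSE("GPL");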

-Rajat


On Mon, Dec 30, 2013 at 9:48 AM, Eric Fowler  wrote:

> Still working on this. Here is some dmesg spew:
>
> [  514.245846] foobar: module verification failed: signature and/or
> required key missing - tainting kernel
> [  514.245937] kobject: 'foobar' (f7f060c8): kobject_add_internal: parent:
> 'module', set: 'module'
> [  514.245951] kobject: 'holders' (f5ff3d40): kobject_add_internal:
> parent: 'foobar', set: ''
> [  514.245981] kobject: 'notes' (f2d25f80): kobject_add_internal: parent:
> 'foobar', set: ''
> [  514.245987] kobject: 'foobar' (f7f060c8): kobject_uevent_env
> [  514.245998] kobject: 'foobar' (f7f060c8): fill_kobj_path: path =
> '/module/foobar'
>
> So it looks like kernel validation is failing. I have printk's in my init
> fxn that are never turning up in /var/log/messages, until, weirdly, AFTER I
> remove the device:
>
> 
> Dec 30 09:43:03 localhost kernel: [  514.245846] foobar: module
> verification failed: signature and/or required key missing - tainting kernel
> Dec 30 09:43:16 localhost fprintd[1085]: ** Message: No devices in use,
> exit
>
> 
> Dec 30 09:45:53 localhost kernel: [  514.249323] foobar: got device number
> 248, minor is 0   THIS IS IN init() fxn
> Dec 30 09:45:53 localhost kernel: [  684.102912] unregister_chrdev(248)
> called for foobar<7>[  684.102927] kobject: '(null)' (f7f06220):
> kobject_cleanup, parent   (null)
>
> 
> insmod: ERROR: could not insert module ./foobar.ko: Device or resource busy
>
>
>
>
>
>
>
>
> On Fri, Dec 27, 2013 at 9:13 PM,  wrote:
>
>> On Fri, 27 Dec 2013 19:33:50 -0800, Eric Fowler said:
>>
>> > I suspect I am doing something wrong in the code with
>> > register/unregister_chrdev(), but I have been over that code a million
>> > times. It looks fine.
>> >
>> > Now:
>> > insmod the device, OK
>> > rmmod the device, OK
>> > Check /proc/devices , device # is present
>> > insmod the device again, fails with ERROR: could not insert module
>> > ./foobar.ko: Device or resource busy
>>
>> It does smell like an unregister issue.  You may want to try adding
>> printk() calls to print out the return code from register and unregister.
>> I'm willing to bet that (a) the unegister is failing because somebody
>> still has a reference on the device, and (b) the second register call
>> fails
>> because the device already exists, causing your module_init() to bail out.
>>
>> The fun is that you may not have taken a reference on the device directly
>> yourself - you may have called some other get_foo() that ends up taking an
>> implicit reference under the covers, causing issues when you fail to call
>> put_foo() at the right place...
>>
>
>
>
> --
> cc:NSA
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Generic question

2013-12-05 Thread Rajat Sharma
Usually a user process can hang inside the kernel if it makes a system
call and is blocked inside the kernel waiting on a semaphore (where it
gives up the CPU) or spinning on a spinlock, which can lock down the CPU
core. You can examine where the process is stuck with:

# echo w > /proc/sysrq-trigger

This sysrq will show you hung tasks and their stack traces in
/var/log/messages or the dmesg output.

But a process might just as well hang in user mode, e.g. on a user mode
semaphore. Most of the time there is a logical bug in the program or in a
kernel subsystem that causes the hang, and it needs to be fixed instead
of just restarting the process. Restarting might work for a user mode
hang, but probably not for a hang inside the kernel.

Rajat

On Thu, Dec 5, 2013 at 8:34 PM, Vipul Jain  wrote:

> Hi All,
>
> A quick question:
>
> If any of user space process is hanged or not responding, does it means
> kernel is also hung  in some part? I trying to understand why hang happens
> and what triggers it.happening ? How to go about recover a hang process,
> apart from restarting it.
>
> Regards,
> Vipul.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: watchdog pet in kernel module

2013-12-04 Thread Rajat Sharma
Although /dev/watchdog is meant to be used from user mode, nothing stops
you from writing to it from a kernel thread.
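
A hedged sketch (3.x-era in-kernel file API, error handling trimmed,
names made up) of petting the watchdog from a kernel thread via
filp_open()/vfs_write():

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/fcntl.h>
#include <linux/err.h>
#include <linux/uaccess.h>

static int watchdog_pet_thread(void *data)
{
        struct file *filp = filp_open("/dev/watchdog", O_WRONLY, 0);
        mm_segment_t old_fs;
        loff_t pos = 0;

        if (IS_ERR(filp))
                return PTR_ERR(filp);

        while (!kthread_should_stop()) {
                old_fs = get_fs();
                set_fs(KERNEL_DS);      /* allow a kernel buffer in vfs_write */
                vfs_write(filp, "k", 1, &pos);
                set_fs(old_fs);
                msleep_interruptible(10 * 1000);        /* pet every 10s */
        }
        filp_close(filp, NULL);
        return 0;
}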

Rajat


On Wed, Dec 4, 2013 at 5:50 PM, Peter Teoh  wrote:

>
>
>
> On Thu, Dec 5, 2013 at 9:06 AM, Vipul Jain  wrote:
>
>>
>>
>>
>> On Wed, Dec 4, 2013 at 4:57 PM,  wrote:
>>
>>> On Wed, 04 Dec 2013 16:45:44 -0800, Vipul Jain said:
>>>
>>> > If you don't mind can you please provide me more insight as what can be
>>> > false alarm I can encounter to move pet inside kernel module?
>>>
>>> The issue isn't false alarms - it's failure to alarm when it should.
>>>
>>> The problem is that it's possible for a kernel to get wedged in such a
>>> way that
>>> a kernel thread is still able to feed the watchdog timer on a regular
>>> basis,
>>> but userspace is effectively hung and unable to proceed.  For example,
>>> if an
>>> OOPS happens while a filesystem lock is held, all future userspace
>>> references
>>> to that filesystem (and possibly all filesystems of the same type) will
>>> hang,
>>> eventually strangling the box while the kernel is still perfectly able
>>> to keep
>>> the watchdog working.
>>>
>>> Hi Valdis,
>>
>> I see what you are saying but what if the user process that's feeding the
>> dog gets hung and rest of the system is fine then it will bring the whole
>> system down won't it? I basically want to avoid this?
>>
>>
> Normally the process that feed the dog, is a simple process that JUST
> periodically set the watchdog device descriptor.Yes, one main() with a
> while loop just periodically resetting the descriptor.
>
> And so it is is not able to respond in time, by inference, OTHER PROCESS
> must have hung.   In other system i saw there is a mother process that
> monitor a few (not all) of its key child process  so perhaps one child
> will have one variable to signal to the mother that it is running.   If not
> responding in time, the mother will clean up everything and then purposely
> not setting the watchdog, resulting in reboot.
>
>
>> Regards,
>> Vipul.
>>
>>
>
>
> --
> Regards,
> Peter Teoh
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Ordering / preemption of work in a workqueue preempt?

2013-11-15 Thread Rajat Sharma
Hi Rajat,


On Fri, Nov 15, 2013 at 7:16 AM, Rajat Jain  wrote:

> Hi,
>
> I have a single work queue, on which I have scheduled a worker function
> [using queue_work(wq, fn)] in interrupt context.
>
> I get the interrupt twice before the work queue gets a chance to run, and
> hence the same function will get queued twice (with different private
> context - arguments etc) which is fine and expected.
>
> Questions:
>
> 1) Is it possible that the instance that was queued by 2nd interrupt, can
> get to run BEFORE the instance that was queued by 1st interrupt? In other
> words, is reordering possible?
>

It is unlikely, as the workqueue does internal queueing of work items.


>
> 2) Is it possible that one running instance of the function, can get
> preempted by second instance of the same work queue? I read through
> http://lwn.net/Articles/511421/ and it talks about same work queue cannot
> run on different CPU, but I have doubt about single CPU. If If I am writing
> a worker function, does my code have to be ready that it can be preempted
> by another instance of the same function?
>
>
Do you mean the system has just one CPU? Then there would be just one
worker thread in the workqueue, which will pick up requests one by one.
Do you see multiple threads?
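
For reference, a small hedged sketch (made-up my_ctx/my_worker names) of
the usual pattern: each interrupt allocates its own context embedding a
struct work_struct, so the two queued instances never share state; if you
need a strict ordering guarantee, queue them on an ordered workqueue
(alloc_ordered_workqueue()):

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_ctx {
        struct work_struct work;
        int irq_data;                   /* per-interrupt private context */
};

static void my_worker(struct work_struct *work)
{
        struct my_ctx *ctx = container_of(work, struct my_ctx, work);

        /* process ctx->irq_data ... */
        kfree(ctx);
}

/* called from the interrupt handler */
static void my_irq_seen(struct workqueue_struct *wq, int data)
{
        struct my_ctx *ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC);

        if (!ctx)
                return;
        ctx->irq_data = data;
        INIT_WORK(&ctx->work, my_worker);
        queue_work(wq, &ctx->work);
}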


> Please note that I understand that my worker function can preempted by
> other processes, my doubts are related to the same workqueue.
>
> Thanks,
>
> Rajat
>
>
>
Rajat


> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Block device driver question

2013-11-01 Thread Rajat Sharma
On Fri, Nov 1, 2013 at 12:17 PM, neha naik  wrote:

> Hi Rajat,
>  Thanks for the information. One more question :
>Say my block device driver doesn't support reads and the
> application always does aligned io in 512 chunks (but it is not direct
> io). In that case, will i get a read because the page size is 4096 and
> yet we are writing 512. Because i am not getting any read which is why
> i am confused.I have been doing the io after syncing the page cache so
> it is not like i get a pagecache hit every time.
>

sync does not evict the page cache. And is your block device's sector
size declared as 512?
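
For reference, a short hedged sketch (3.x-era API, made-up names;
my_make_request is assumed to be defined elsewhere) of how a
make_request-based driver declares its queue and a 512-byte logical
block size:

#include <linux/blkdev.h>

static int my_create_queue(struct gendisk *disk)
{
        struct request_queue *q = blk_alloc_queue(GFP_KERNEL);

        if (!q)
                return -ENOMEM;
        blk_queue_make_request(q, my_make_request);     /* bypass elevator */
        blk_queue_logical_block_size(q, 512);           /* 512-byte sectors */
        disk->queue = q;
        return 0;
}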


>   I am doing a normal dd without any special flags, just 'bs=512'.
>
> Regards,
> Neha
>
>
> On Fri, Nov 1, 2013 at 12:16 PM, Rajat Sharma  wrote:
> > Hi Neha,
> >
> >
> > On Fri, Nov 1, 2013 at 10:26 AM, neha naik  wrote:
> >>
> >> Hi,
> >>   I am writing a block device driver and i am using the
> >> 'blq_queue_make_request' call while registering my block device
> >> driver.
> >>   Now as far as i understand this will bypass the linux kernel queue
> >> for each block device driver (bypassing the elevator algorithm etc).
> >> However, i am still not very clear about exactly how i get a request.
> >>
> >>  1.  Consider i am doing a dd on the block device directly :
> >>   Will it bypass the buffer cache(/page cache) or will it use it.
> >> Example if i register my block device with set_blocksize() as 512. And
> >> i do a dd of 512 bytes will i get a read because it passes through the
> >> buffer cache and since the minimum page size is 4096 it has to read
> >> the page first and then pass it to me.
> >> I am still unclear about the 'page' in the bvec. What does that
> >> refer to? Is it a page from the page cache or a user buffer (DMA).
> >>
> >
> > If you are not using oflag=direct with dd, then you are getting 'page' in
> > bvec that belongs to buffer cache (in 2.6 it is implemented as
> page-cache of
> > block_device->bd_inode->i_mapping). You get user buffer only with direct
> IO,
> > but then you need to take care to issue aligned IO requests yourself (if
> > your block device wants only aligned buffers its your implementation
> > though).
> >
> >>
> >> 2. Another thing i am not clear about is a queue. When i register my
> >> driver, the 'make_request' function gets called whenever there is an
> >> io. Now in my device driver, i have some more logic about  writing
> >> this io i.e some time may be spent in the device driver for each io.
> >> In such a case, if i get two ios on the same block one after the other
> >> (say one is writing 'a' and the other is writing 'b') then isn't it
> >> possible that i may end up passing 'b' followed by 'a' to the layer
> >> below me (changing the order because thread 'a' took more time than
> >> thread 'b'). Then in that case should i be using a queue in my layer -
> >> put the ios in the queue whenever i get a call to 'make_request'.
> >> Another thread keeps pulling the ios from the queue and processing
> >> them and passing it to the layer below.
> >>
> >
> > If your application does not quarantee the ordering of writes, then you
> > don't have to worry either. Most likely block layer will do the merges in
> > page-cache if it is not a direct IO. As a driver developer, you don't
> need
> > to worry about out of order writes from application.
> >
> >>
> >>
> >> Regards,
> >> Neha
> >>
> >> ___
> >> Kernelnewbies mailing list
> >> Kernelnewbies@kernelnewbies.org
> >> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> >
> >
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Block device driver question

2013-11-01 Thread Rajat Sharma
Hi Neha,


On Fri, Nov 1, 2013 at 10:26 AM, neha naik  wrote:

> Hi,
>   I am writing a block device driver and i am using the
> 'blq_queue_make_request' call while registering my block device
> driver.
>   Now as far as i understand this will bypass the linux kernel queue
> for each block device driver (bypassing the elevator algorithm etc).
> However, i am still not very clear about exactly how i get a request.
>
>  1.  Consider i am doing a dd on the block device directly :
>   Will it bypass the buffer cache(/page cache) or will it use it.
> Example if i register my block device with set_blocksize() as 512. And
> i do a dd of 512 bytes will i get a read because it passes through the
> buffer cache and since the minimum page size is 4096 it has to read
> the page first and then pass it to me.
> I am still unclear about the 'page' in the bvec. What does that
> refer to? Is it a page from the page cache or a user buffer (DMA).
>
>
If you are not using oflag=direct with dd, then the 'page' you get in the
bvec belongs to the buffer cache (in 2.6 it is implemented as the page
cache of block_device->bd_inode->i_mapping). You get a user buffer only
with direct IO, but then you need to take care to issue aligned IO
requests yourself (whether your block device accepts only aligned buffers
is your implementation choice, though).


> 2. Another thing i am not clear about is a queue. When i register my
> driver, the 'make_request' function gets called whenever there is an
> io. Now in my device driver, i have some more logic about  writing
> this io i.e some time may be spent in the device driver for each io.
> In such a case, if i get two ios on the same block one after the other
> (say one is writing 'a' and the other is writing 'b') then isn't it
> possible that i may end up passing 'b' followed by 'a' to the layer
> below me (changing the order because thread 'a' took more time than
> thread 'b'). Then in that case should i be using a queue in my layer -
> put the ios in the queue whenever i get a call to 'make_request'.
> Another thread keeps pulling the ios from the queue and processing
> them and passing it to the layer below.
>
>
If your application does not guarantee the ordering of writes, then you
don't have to worry either. Most likely the block layer will do the merges
in the page cache if it is not direct IO. As a driver developer, you don't
need to worry about out-of-order writes from the application.


>
> Regards,
> Neha
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: Help - Linux kernel and Device drivers

2013-08-22 Thread Rajat Sharma
I think you can start with the device drivers book and try implementing
its example drivers; at the same time you can read the corresponding
topics from Robert Love. Later, when you want to dive deep into
particular topics, refer to the UTLK book.

Rajat
--
From: Tharanga Abeyseela
Sent: 23-08-2013 06:47
To: kernelnewbies@kernelnewbies.org
Subject: Help - Linux kernel and Device drivers

Hi ,



I have done C and some Linux kernel programming (only Netfilter
sockets,iptables), but now i decided to learn more on Linux kernel level,
device drivers..

so i bought Robert  Loves (Kernel development) book, device driver 3rd ed
(orelly), and Understanding Linux kernel 3rd edition as my base reference,
with Operating system design concepts (dragon book).

Do i need to read at least one book before do some kernel level stuff or
what is the best way to learn  kernel programming and device driver
area...(i consider myself as a newbie to the kernel/device drivers)

Really appreciate your expert advice on this

Best Regards,
T
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: Conceptual questions about device driver

2013-08-02 Thread Rajat Sharma
BTW, I have not seen filesystem implementations doing write ordering for
direct IO unless a request merge is possible (different user credentials).
Since you have mentioned you are from a filesystem background, do you know
of any such implementation?

Rajat
--
From: Rajat Sharma
Sent: 03-08-2013 09:25
To: neha naik; Greg Freemyer
Cc: kernelnewbies
Subject: RE: Conceptual questions about device driver

If the filesystem has to guarantee write ordering, it has to serialize
write requests. So only when 'A' has been written to the block (completion
received from the block driver) can the filesystem dispatch 'B', not
before. Remember that there can be failures too. So if 'A' fails while the
request for 'B' is pending, would a filesystem fail that too? I guess not.
Usually the page cache absorbs such overwrites first and then issues a
single merged IO to the block device, but what about direct IO? If
applications need a consistency guarantee against such failures, they
should serialize writes. So to summarize, the write ordering effort from
each layer is:
Block driver: none
Filesystem: best effort
Application: full

Rajat
--
From: neha naik
Sent: 03-08-2013 02:40
To: Greg Freemyer
Cc: Rajat Sharma; kernelnewbies
Subject: Re: Conceptual questions about device driver

Thanks for the responses.  I have one more  question for Greg. I come from
filesystem background and not device driver so i may be a bit confused
about the write order fidelity. I know that filesystems guarantee that.
Looking from filesystem perspective, no write will be allowed on the same
block until
the first write finishes. So, if 'B' is written after 'A' you can always
guarantee that you will see 'B' at the end of the two writes.
  Now imagine not having a filesystem, and doing a write directly on the
device. Do device drivers honour it. Should they? I imagine device driver
as a kind of
queue. So any writes are always queued up one after the other so that it
gives write order fidelity whether it wants to or not. Am i missing
something here.

Regards,
Neha


On Fri, Aug 2, 2013 at 1:56 PM, Greg Freemyer wrote:

> On Fri, Aug 2, 2013 at 1:32 AM, Rajat Sharma  wrote:
> > On Fri, Aug 2, 2013 at 2:25 AM, neha naik  wrote:
> >> Hi,
> >>  I have some conceptual questions about device driver :
> >>
> >> 1. Write order fidelity should be maintained when submitting requests
> from
> >> device driver to disk below.
> >> However, acknowledging these requests it is okay if we don't
> necessarily
> >> maintain that order, right?
> >>
> >
> > Yes it should not matter as long as application can rely on data being
> > written is in order of submission.
>
> But it can't . unless the write cache is turned off and it is
> known the the cache is truly off.
>
> There is no guarantee of write order in the block stack.  Not between
> the filesystem and the driver.  Not between the driver and the drive.
>
> There are at least 2 elevators shuffling the order of writes to
> optimize performance.
>
> Rajat, did you get confused?  Or were you trying to say something else?
>
> Greg
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: Conceptual questions about device driver

2013-08-02 Thread Rajat Sharma
If the filesystem has to guarantee write ordering, it has to serialize write
requests. So only when 'A' is written to the block (completion received from
the block driver) can the filesystem dispatch 'B', not before that. Remember
that there could be failures too. If 'A' fails while the request for 'B' is
pending, would a filesystem fail that too? I guess not. Usually the page cache
absorbs such overwrites first and then issues a single merged IO to the block
device, but what about direct IO? If an application needs a consistency
guarantee against such failures, it should serialize its writes. So to
summarize, the write ordering effort from each layer:
Block driver: none
Filesystem: best effort
Application: full
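
To illustrate the "Application: full" line, here is a minimal user-space
sketch (my own illustration, not taken from any real application): the only
way the application can know 'A' reached the device before it dispatches 'B'
is to wait for completion in between, e.g. with fsync() (drive write caches
are a separate issue, as Greg points out).

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write 'A' then 'B' to the same block in a guaranteed order. */
int write_in_order(const char *path)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0)
                return -1;

        if (pwrite(fd, "A", 1, 0) != 1 || fsync(fd) != 0) {
                /* 'A' never made it; do not dispatch 'B' at all */
                close(fd);
                return -1;
        }

        /* only now is 'B' issued, so it cannot overtake 'A' */
        if (pwrite(fd, "B", 1, 0) != 1 || fsync(fd) != 0) {
                close(fd);
                return -1;
        }

        close(fd);
        return 0;
}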

Rajat
--
From: neha naik
Sent: 03-08-2013 02:40
To: Greg Freemyer
Cc: Rajat Sharma; kernelnewbies
Subject: Re: Conceptual questions about device driver

Thanks for the responses.  I have one more  question for Greg. I come from
filesystem background and not device driver so i may be a bit confused
about the write order fidelity. I know that filesystems guarantee that.
Looking from filesystem perspective, no write will be allowed on the same
block until
the first write finishes. So, if 'B' is written after 'A' you can always
guarantee that you will see 'B' at the end of the two writes.
  Now imagine not having a filesystem, and doing a write directly on the
device. Do device drivers honour it. Should they? I imagine device driver
as a kind of
queue. So any writes are always queued up one after the other so that it
gives write order fidelity whether it wants to or not. Am i missing
something here.

Regards,
Neha


On Fri, Aug 2, 2013 at 1:56 PM, Greg Freemyer wrote:

> On Fri, Aug 2, 2013 at 1:32 AM, Rajat Sharma  wrote:
> > On Fri, Aug 2, 2013 at 2:25 AM, neha naik  wrote:
> >> Hi,
> >>  I have some conceptual questions about device driver :
> >>
> >> 1. Write order fidelity should be maintained when submitting requests
> from
> >> device driver to disk below.
> >> However, acknowledging these requests it is okay if we don't
> necessarily
> >> maintain that order, right?
> >>
> >
> > Yes it should not matter as long as application can rely on data being
> > written is in order of submission.
>
> But it can't . unless the write cache is turned off and it is
> known the the cache is truly off.
>
> There is no guarantee of write order in the block stack.  Not between
> the filesystem and the driver.  Not between the driver and the drive.
>
> There are at least 2 elevators shuffling the order of writes to
> optimize performance.
>
> Rajat, did you get confused?  Or were you trying to say something else?
>
> Greg
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: sys_read buffer too large

2013-07-20 Thread Rajat Sharma
On Sat, Jul 20, 2013 at 8:49 PM, anish singh wrote:

>
>
>
> On Sat, Jul 20, 2013 at 2:04 PM, Don Raikes  wrote:
>
>> Hello all,
>>
>> I am very new to kernel programming. In fact, I have been working on it
>> for a week now.
>>
>> I am taking a computer science class, and one of our assignments is to
>> hook some of the system calls in a 2.6.28 kernel.
>>
>> I have the basic module created, and I am hooking into the sys_read
>> function.  The assignment is to print to the log what is being read using
>> sys_read.
>>
>> I have some checks in my function to limit my printing to only a certain
>> file, so I don't get the contents of every file being read on the system.
>>
>> Once I know I have the right file, I decided to print the size of the
>> buffer.
>>
>> The function signature is:
>>
>> asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count);
>>
>> So I do:
>>
>> printk(KERN_INFO "Read buffer is [%d] bytes.\n", count);
>>
>> The particular file I am reading has 2 lines; the first is 14 bytes long
>> and the second is 30 bytes long.
>>
>
What utility are you using to read files? It is better to write your own
program using the glibc 'read' routine. You can also use the 'strace' utility
to trace the read system call and its arguments from user space.
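
For example, a minimal test program along these lines (the file name
"testfile" is just a placeholder) passes the count straight through to
sys_read, unlike cat and friends which read in large buffered chunks:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[16];
        ssize_t n;
        int fd = open("testfile", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* your hooked sys_read() should now see count == 16 */
        n = read(fd, buf, sizeof(buf));
        printf("read returned %zd bytes\n", n);

        close(fd);
        return 0;
}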


>>
>> The output from my printk statements are 8192 and 4096 respectively.
>>
>> Why is the count so large?
>>
> I think it is the size of one PAGE. I think reading from sys_read
> is PAGE aligned. Just print the size of the page and I think it will
> be 4K.
>

I doubt that is true; sys_read will merely pass down the arguments it
received from user space. My hunch is that it is the user-mode utility which
is doing a buffered read.


> 
>>
>> Also I do a strlen(buf) and print the value as well, and it is 29 and 16
>> respectively.
>>
>> Next I did  copy_from_user(tbuf,buf,strlen(buf))
>>
>> And tried to print the contents of the buffer but it came out garbage.
>>
>> Does anyone have any ideas how I can get this to work?
>>
>> ** **
>>
>> BTW: the assignment is due on Sunday so any help would be appreciated.
>>
>> --
>> Best Regards, Donald
>>
>> Donald raikes | Accessibility Specialist/ QA Engineer
>> Phone: +15202717608 | Mobile: +15202717608
>> Oracle Quality Assurance
>> | Tucson, Arizona
>>
>> Oracle is committed to developing practices and products that help
>> protect the environment
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: Git gui tools for linux

2013-07-16 Thread Rajat Sharma
One non-free option is git-eye.
--
From: Anand Moon
Sent: 17-07-2013 00:08
To: Kernelnewbies@kernelnewbies.org
Subject: Git gui tools for linux

Hi All,


Could you suggest me good Git GUI tool in Linux to create/merge/update.

I have installed few of them like git-gui and git-cola, but I need it with
some more advance option.

-Anand Moon
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How to benchmark my OS

2013-07-14 Thread Rajat Sharma
I am not sure if this one is directly relevant to your case, but you might
want to take a look at LTP: http://ltp.sourceforge.net/

-Rajat


On Mon, Jul 15, 2013 at 8:26 AM, Steven Zhou  wrote:

> Hi,
>
> We have developed our private OS and we want to benchmark the performance
> of it, including time of context switch, interrupt latency, IO output and
> so on ...
>
> We want to study from Linux firstly, so could you guys give me some guide
> of Linux benchmark testing, including test methodologies, test code and so
> on.
>
> Thanks in advance.
>
> --
> Best Regards.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: kdb understanding

2013-06-09 Thread Rajat Sharma
Hi,

This link describes setting up kgdb for VirtualBox:
http://fotis.loukos.me/blog/?p=25. So you can install this kernel inside a
VirtualBox VM to start experimenting with it.

-Rajat


On Sun, Jun 9, 2013 at 11:04 AM, Varun Sharma  wrote:

>
> Hi,
>
> I am using 2.6.32.60 kernel . I want to use kdb on this kernel.
> Pls suggest me some tutorials or links that help me to understand kdb.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: lookup function return value in inode_operations

2013-05-27 Thread Rajat Sharma
Returning NULL means you are accepting the supplied dentry and have attached
it to your inode (via d_add).
Returning a dentry means it is a dentry which the filesystem figured out was
already associated with its inode. When can this happen? The lookup process
builds up a chain from parent to child dentries. In the case of an NFS
export, lookup is done from a file handle, which can result in dentries that
are disconnected from their parent but are associated with an inode. So, when
you return such a dentry from this operation, the VFS will clear its
disconnected status.
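
A minimal sketch of how this usually looks (the somefs_* helpers are
hypothetical placeholders for the filesystem's own directory search and inode
lookup; d_splice_alias() is the VFS helper that covers both cases described
above):

static struct dentry *somefs_lookup(struct inode *dir, struct dentry *dentry,
                                    unsigned int flags)
{
        struct inode *inode = NULL;
        unsigned long ino;

        /* hypothetical: search the on-disk directory for dentry->d_name */
        ino = somefs_find_entry(dir, &dentry->d_name);
        if (ino)
                inode = somefs_iget(dir->i_sb, ino);    /* hypothetical */

        /*
         * d_splice_alias() either attaches the inode to the supplied
         * dentry (leaving it negative if inode is NULL) and returns
         * NULL, or it returns an already existing (possibly
         * disconnected) dentry for this inode, which we hand back to
         * the VFS so it can reconnect it.
         */
        return d_splice_alias(inode, dentry);
}

For a simple filesystem that is never exported over NFS, calling
d_add(child_dentry, inode) and returning NULL, as in your snippet, works just
fine.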

-Rajat


On Mon, May 27, 2013 at 11:38 PM, Sankar P wrote:

> Hi,
>
> What is the return value of the lookup function under the
> inode_operations struct ? I see that the function gets a dentry as a
> parameter, which can be used to associate an inode. For example:
>
> If we have:
>
> struct dentry * somefs_lookup(struct inode *parent_inode,
>   struct dentry *child_dentry, unsigned int flags)
> {
> d_add(child_dentry, inode);
> }
>
> so, what is the meaning of the return "struct dentry" in this lookup
> function ? In some simple filesystems that I saw, they return NULL
> here and it seem to work just fine. So what is that the lookup
> function should do if we are implementing our own filesystem ?
>
> Thanks.
>
> --
> Sankar P
> http://psankar.blogspot.com
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: hi Why is "can't find a register in class ‘AREG’ while reloading ‘asm’" ?

2013-05-27 Thread Rajat Sharma
I tried compiling this but I get a slightly different error:

  error: can’t find a register in class ‘CREG’ while reloading ‘asm’

This error is because of the explicit clobber list, which is not needed in
this case as all the registers in the clobber list are already tied to
inputs.
Note that it is CREG because it encounters "cx" in the clobber list first.
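
For reference, here is a generic sketch (not the original 0.11 match() code,
and without the fs segment override) of how GCC wants this expressed:
registers that the asm actually modifies are tied to outputs with "+"
constraints instead of being listed both as inputs and as clobbers:

static inline int bytes_equal(const char *a, const char *b, long n)
{
        int same;

        /* like the original, this assumes n > 0 */
        __asm__ __volatile__(
                "cld\n\t"
                "repe; cmpsb\n\t"
                "setz %%al"
                : "=a" (same), "+S" (a), "+D" (b), "+c" (n)
                : "0" (0)
                : "cc", "memory");

        return same;
}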

-Rajat


On Mon, May 27, 2013 at 1:21 PM, lx  wrote:

> hi all:
>The codes is:
> static int match(int len,const char * name,struct dir_entry * de)
> {
> register int same __asm__("ax");
>
> if (!de || !de->inode || len > NAME_LEN)
> return 0;
> if (len < NAME_LEN && de->name[len])
> return 0;
> *__asm__ ("cld\n\t"*
> "fs ; repe ; cmpsb\n\t"
> "setz %%al"
> :"=a" (same)
> :"0" (0),"S" ((long) name),"D" ((long) de->name),"c" (len)
> :"cx","di","si");
> return same;
> }
>
> When I make it, the error messages is:
> namei.c:35: error: can't find a register in class ‘AREG’ while reloading
> ‘asm’
>
> This is why?
> Thank you
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Reference material for x86-64 assembly on Linux

2013-05-20 Thread Rajat Sharma
The following book describes the GNU assembler in a very simple manner:
http://www.wrox.com/WileyCDA/WroxTitle/productCd-0764579010.html


On Mon, May 20, 2013 at 2:43 PM, wannabehacker wb wrote:

>
> "The art of assembly language" by randall hyde is an excellent reference.
>  Intel website has x86 reference manual which is also a very good in-depth
> reference
>
>
> http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
>
> Cheers,
> wbhack3r
>
>
> On Mon, May 20, 2013 at 11:43 AM, amit mehta  wrote:
>
>> Folks, Can you please suggest me some nice books or tutorials
>> that concentrates more on the x86_64 assembly programming
>> using GNU assembler. Unfortunately, I haven't done any assembly
>> programming for the last seven years, but my current job requires
>> me to analyse kernel crashes and a lot of them (probably due to
>> widespread use of x86-64 architecture) originate on x86-64 machines
>> and quite often disassembling is the last resort to inspect the function
>> parameters, stack frames etc.
>>
>> There seem to tons of books, tutorials, assemblers available over the
>> internet, but  I'm looking for something that can give me jumpstart on
>> x86_64 assembly, specially in Linux environment.
>>
>> Recently, While browsing, I've found these two:
>> 1: x86_64 ABI (System V Application Binary Interface,
>> AMD64 Architecture Processor Supplement) -
>> 2: x86-64 Machine-Level Programming - Randal and David
>>
>> Appreciate a lot, If you can recommend me your favourite text book
>> on x86-64 assembly or any such reference material.
>>
>> Thanks,
>> -Amit
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Forum for asking questions related to block device drivers

2013-04-11 Thread Rajat Sharma
So you mean the direct I/O read performance of your passthrough device is
lower than the direct I/O read performance of lvm?

On Thu, Apr 11, 2013 at 8:39 PM, neha naik  wrote:
> Hi,
>  I am calling the merge function of the block device driver below me(since
> mine is only pass through). Does this not work?
> When i tried seeing what read requests were coming then i saw that when i
> issue dd with count=1 it retrieves 4 pages,
> so i tried with 'direct' flag. But even with direct io my read performance
> is way lower than my write performance.
>
> Regards,
> Neha
>
>
> On Wed, Apr 10, 2013 at 11:15 PM, Rajat Sharma  wrote:
>>
>> Hi,
>>
>> On Thu, Apr 11, 2013 at 2:23 AM, neha naik  wrote:
>> > Hi All,
>> >Nobody has replied to my query here. So i am just wondering if there
>> > is a
>> > forum for block device driver where i can post my query.
>> > Please tell me if there is any such forum.
>> >
>> > Thanks,
>> > Neha
>> >
>> > -- Forwarded message --
>> > From: neha naik 
>> > Date: Tue, Apr 9, 2013 at 10:18 AM
>> > Subject: Passthrough device driver performance is low on reads compared
>> > to
>> > writes
>> > To: kernelnewbies@kernelnewbies.org
>> >
>> >
>> > Hi All,
>> >   I have written a passthrough block device driver using 'make_request'
>> > call. This block device driver simply passes any request that comes to
>> > it
>> > down to lvm.
>> >
>> > However, the read performance for my passthrough driver is around 65MB/s
>> > (measured through dd) and write performance is around 140MB/s for dd
>> > block
>> > size 4096.
>> > The write performance matches with lvm's write performance more or less
>> > but,
>> > the read performance on lvm is around 365MB/s.
>> >
>> > I am posting snippets of code which i think are relevant here:
>> >
>> > static int passthrough_make_request(
>> > struct request_queue * queue, struct bio * bio)
>> > {
>> >
>> > passthrough_device_t * passdev = queue->queuedata;
>> > bio->bi_bdev = passdev->bdev_backing;
>> > generic_make_request(bio);
>> > return 0;
>> > }
>> >
>> > For initializing the queue i am using following:
>> >
>> > blk_queue_make_request(passdev->queue, passthrough_make_request);
>> > passdev->queue->queuedata = sbd;
>> > passdev->queue->unplug_fn = NULL;
>> > bdev_backing = passdev->bdev_backing;
>> > blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
>> > if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
>> > blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
>> > }
>> >
>>
>> What is the implementation for sbd_merge_bvec_fn? Please debug through
>> it to check requests are merging or not? May be that is the cause of
>> lower performance?
>>
>> > Now, I browsed through dm code in kernel to see if there is some flag or
>> > something which i am not using which is causing this huge performance
>> > penalty.
>> > But, I have not found anything.
>> >
>> > If you have any ideas about what i am possibly doing wrong then please
>> > tell
>> > me.
>> >
>> > Thanks in advance.
>> >
>> > Regards,
>> > Neha
>> >
>>
>> -Rajat
>>
>> >
>> > ___
>> > Kernelnewbies mailing list
>> > Kernelnewbies@kernelnewbies.org
>> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>> >
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Forum for asking questions related to block device drivers

2013-04-10 Thread Rajat Sharma
Hi,

On Thu, Apr 11, 2013 at 2:23 AM, neha naik  wrote:
> Hi All,
>Nobody has replied to my query here. So i am just wondering if there is a
> forum for block device driver where i can post my query.
> Please tell me if there is any such forum.
>
> Thanks,
> Neha
>
> -- Forwarded message --
> From: neha naik 
> Date: Tue, Apr 9, 2013 at 10:18 AM
> Subject: Passthrough device driver performance is low on reads compared to
> writes
> To: kernelnewbies@kernelnewbies.org
>
>
> Hi All,
>   I have written a passthrough block device driver using 'make_request'
> call. This block device driver simply passes any request that comes to it
> down to lvm.
>
> However, the read performance for my passthrough driver is around 65MB/s
> (measured through dd) and write performance is around 140MB/s for dd block
> size 4096.
> The write performance matches with lvm's write performance more or less but,
> the read performance on lvm is around 365MB/s.
>
> I am posting snippets of code which i think are relevant here:
>
> static int passthrough_make_request(
> struct request_queue * queue, struct bio * bio)
> {
>
> passthrough_device_t * passdev = queue->queuedata;
> bio->bi_bdev = passdev->bdev_backing;
> generic_make_request(bio);
> return 0;
> }
>
> For initializing the queue i am using following:
>
> blk_queue_make_request(passdev->queue, passthrough_make_request);
> passdev->queue->queuedata = sbd;
> passdev->queue->unplug_fn = NULL;
> bdev_backing = passdev->bdev_backing;
> blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
> if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
> blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
> }
>

What is the implementation of sbd_merge_bvec_fn? Please debug through
it to check whether requests are merging or not; maybe that is the cause of
the lower performance.
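
For reference, a pass-through merge_bvec_fn along these lines (just a sketch
against the bvec_merge_data interface your kernel is using, reusing the
passthrough_device_t naming from your snippet) would redirect the probe to
the backing device and let the lower queue decide:

static int passthrough_merge_bvec(struct request_queue *q,
                                  struct bvec_merge_data *bvm,
                                  struct bio_vec *biovec)
{
        passthrough_device_t *passdev = q->queuedata;
        struct request_queue *lower_q = bdev_get_queue(passdev->bdev_backing);

        /* pretend the bio is already remapped to the backing device */
        bvm->bi_bdev = passdev->bdev_backing;

        if (!lower_q->merge_bvec_fn)
                return biovec->bv_len;  /* lower queue accepts any merge */

        return lower_q->merge_bvec_fn(lower_q, bvm, biovec);
}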

> Now, I browsed through dm code in kernel to see if there is some flag or
> something which i am not using which is causing this huge performance
> penalty.
> But, I have not found anything.
>
> If you have any ideas about what i am possibly doing wrong then please tell
> me.
>
> Thanks in advance.
>
> Regards,
> Neha
>

-Rajat

>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Where does kernel store per task file position?

2013-01-29 Thread Rajat Sharma
Correct :)


On Wed, Jan 30, 2013 at 12:01 PM, Pranay Kumar Srivastava <
pranay.shrivast...@hcl.com> wrote:

>
>
> > -Original Message-----
> > From: Rajat Sharma [mailto:fs.ra...@gmail.com]
> > Sent: Wednesday, January 30, 2013 11:16 AM
> > To: Pranay Kumar Srivastava
> > Cc: kernelnewbies@kernelnewbies.org
> > Subject: Re: Where does kernel store per task file position?
> >
> > > I'm still not able to figure out where exactly is the position of file
> stored per
> > task_struct.
> > struct file * itself is per process (task_struct) so file->f_pos is file
> position per
> > process, if thats what you are looking for. I hope you haven't assumed
> that
> > struct file itself is unique for a file, i.e. per inode? Then that
> assumption is
> > wrong.
> > -Rajat
>
> [Pranay Kumar Srivastava] That really was a stupid question, it says right
> there get_empty_filp() in do_sys_open. For forks the inherited file have
> common struct file [Correct?] but for the files opened after fork in
> child/parent will not have shared struct file[Correct?].  So the same
> dentry can be pointed to by multiple struct file[Correct?] that's why
> there's an increment of dentry while doing lookup[Correct?].
>
> Thanks a lot!
> >
> > On Tue, Jan 29, 2013 at 6:38 PM, Pranay Kumar Srivastava
> >  wrote:
> > Hi Everyone,
> >
> > I was trying to find out where does Linux store per process file
> position?
> > Since struct file is allocated once when the file is first opened
> > (get_empty_filp() via do_sys_open) .I looked at these,
> >
> > Copy_process--->copy_files-->dup_fd  it seemed to allocate only (struct
> > file*)
> >
> > struct files_struct , but I couldn't find any field that is actually
> being used to
> > store the file position.
> >
> >
> > I'm still not able to figure out where exactly is the position of file
> stored per
> > task_struct. Secondly even if this was being saved does the kernel
> changes
> > f_pos of struct file whenever a (read/write) is done? I don't that
> happens
> > [Correct?].
> >
> > Regards,
> > Pranay Kumar Srivastava
> >
> >
> > ::DISCLAIMER::
> >
> --
> > --
> >
> > The contents of this e-mail and any attachment(s) are confidential and
> > intended for the named recipient(s) only.
> > E-mail transmission is not guaranteed to be secure or error-free as
> > information could be intercepted, corrupted, lost, destroyed, arrive
> late or
> > incomplete, or may contain viruses in transmission. The e mail and its
> > contents (with or without referred errors) shall therefore not attach any
> > liability on the originator or HCL or its affiliates.
> > Views or opinions, if any, presented in this email are solely those of
> the
> > author and may not necessarily reflect the views or opinions of HCL or
> its
> > affiliates. Any form of reproduction, dissemination, copying, disclosure,
> > modification, distribution and / or publication of this message without
> the
> > prior written consent of authorized representative of HCL is strictly
> > prohibited. If you have received this email in error please delete it
> and notify
> > the sender immediately.
> > Before opening any email and/or attachments, please check them for
> viruses
> > and other defects.
> >
> >
> --
> > --
> >
> >
> > ___
> > Kernelnewbies mailing list
> > Kernelnewbies@kernelnewbies.org
> > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Where does kernel store per task file position?

2013-01-29 Thread Rajat Sharma
> I'm still not able to figure out where exactly is the position of file
stored per task_struct.
struct file * itself is per process (per task_struct that opened the file),
so file->f_pos is the file position per process, if that's what you are
looking for. I hope you haven't assumed that struct file itself is unique for
a file, i.e. per inode? That assumption is wrong.
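
For reference, this is roughly what the read(2) syscall wrapper does around
vfs_read() in a 3.x fs/read_write.c (paraphrased and trimmed): f_pos is read
into a local, advanced by the read, and written back afterwards.

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
        struct fd f = fdget(fd);
        ssize_t ret = -EBADF;

        if (f.file) {
                loff_t pos = file_pos_read(f.file);     /* pos = file->f_pos */
                ret = vfs_read(f.file, buf, count, &pos);
                file_pos_write(f.file, pos);            /* file->f_pos = pos */
                fdput(f);
        }
        return ret;
}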

-Rajat


On Tue, Jan 29, 2013 at 6:38 PM, Pranay Kumar Srivastava <
pranay.shrivast...@hcl.com> wrote:

> Hi Everyone,
>
> I was trying to find out where does Linux store per process file position?
> Since struct file is allocated once when the file is first opened
> (get_empty_filp() via do_sys_open) .I looked at these,
>
> Copy_process--->copy_files-->dup_fd  it seemed to allocate only (struct
> file*)
>
> struct files_struct , but I couldn't find any field that is actually being
> used to store the file position.
>
>
> I'm still not able to figure out where exactly is the position of file
> stored per task_struct. Secondly even if this was being saved does the
> kernel changes f_pos of struct file whenever a (read/write) is done? I
> don't that happens [Correct?].
>
> Regards,
> Pranay Kumar Srivastava
>
>
> ::DISCLAIMER::
>
> 
>
> The contents of this e-mail and any attachment(s) are confidential and
> intended for the named recipient(s) only.
> E-mail transmission is not guaranteed to be secure or error-free as
> information could be intercepted, corrupted,
> lost, destroyed, arrive late or incomplete, or may contain viruses in
> transmission. The e mail and its contents
> (with or without referred errors) shall therefore not attach any liability
> on the originator or HCL or its affiliates.
> Views or opinions, if any, presented in this email are solely those of the
> author and may not necessarily reflect the
> views or opinions of HCL or its affiliates. Any form of reproduction,
> dissemination, copying, disclosure, modification,
> distribution and / or publication of this message without the prior
> written consent of authorized representative of
> HCL is strictly prohibited. If you have received this email in error
> please delete it and notify the sender immediately.
> Before opening any email and/or attachments, please check them for viruses
> and other defects.
>
>
> 
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: internel implemention of file operation

2013-01-11 Thread Rajat Sharma
> Default read/write inerfaces does not move file's data to process address
space ?
Yes it does; in either case the user-space memory has to be in the process
address space. But the difference is in the access pattern. With read/write,
you demand the data explicitly through a system call, hence the application
is more involved here. With mmap access, all the transfer happens in an
application-unaware manner, via the page-fault handlers inside the kernel.
The application just accesses the mapped buffer like a memory array and the
magic happens inside the kernel as you keep accessing bytes.

The best way to find out the difference is to try writing a simple mmap
program.
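Something like this, for instance (a minimal sketch; "testfile" is just a
placeholder and must already exist and be at least a few bytes long):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        struct stat st;
        char *p;
        int fd = open("testfile", O_RDWR);

        if (fd < 0 || fstat(fd, &st) < 0) {
                perror("open/fstat");
                return 1;
        }

        p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        printf("first byte: %c\n", p[0]);  /* page fault pulls the data in */
        p[0] = 'X';                        /* dirties the page in the page cache */

        munmap(p, st.st_size);
        close(fd);
        return 0;
}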

-Rajat


On Fri, Jan 11, 2013 at 3:31 AM, horseriver  wrote:

> On Fri, Jan 11, 2013 at 12:39:26PM +0530, Rajat Sharma wrote:
>
> > Default read/write inerfaces are better suited for sequential read/write
> > within your program. Although you can seek to any location within the
> file,
> > you still have overhead to issue system calls to get data. However mmap
> > allows you to map a section of file into program address space.
>
>   Default read/write inerfaces does not move file's data to process
> address space ?
>
>   when  r/w a file descript which returnd by open() , how do the file data
> move from one place to another place ?
>
>   For each time the write function being  called  , will kernel  call
> filesystem's driver's write  to respond  ??
>   In my opinion,kernel will passed a  buffer's head address  which is
> passed form user-layer into driver,then driver will fill this buffer with
> file's
>   data which is got by filesystem's read operation ?
>
>   Am I right?
>
>   Thanks!
> >
> >
> > -Rajat
> >
> >
> > On Fri, Jan 11, 2013 at 2:44 AM, horseriver 
> wrote:
> >
> > > hi:
> > >
> > >   these two wayes of operating one file :
> > >
> > >   1.use open/write interface call .
> > >
> > >   2.mmap this file into memory , then access this memory area and do
> r/w .
> > >
> > >   what is the essential difference between this teo wayes?
> > >
> > > thanks!
> > >
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: internel implemention of file operation

2013-01-10 Thread Rajat Sharma
The default read/write interfaces are better suited for sequential
read/write within your program. Although you can seek to any location within
the file, you still have the overhead of issuing system calls to get the
data. mmap, however, allows you to map a section of a file into the program
address space. If your access pattern is rather random, modifying a few bytes
here and there at random offsets, you just get a contiguous memory array,
which is much easier than issuing read and write at different file offsets.

The most common and mandatory use case of mmap is mapping the executable
binary program image and libraries into process address spaces. The access
pattern for a program is not sequential; you can have multiple jumps (if,
else, for loops), so it is better served by mmap. It is a read-only and
private memory mapping; any modification you make will create a COW page
which is private to your process, so that is another advantage of mmap which
is completely transparent to user mode. If your filesystem does not support
this basic mmap mode, you cannot execute a binary file stored in that
filesystem, unless you copy it to some other filesystem which does.

Apart from this, read more about mmap in the UTLK book.

-Rajat


On Fri, Jan 11, 2013 at 2:44 AM, horseriver  wrote:

> hi:
>
>   these two wayes of operating one file :
>
>   1.use open/write interface call .
>
>   2.mmap this file into memory , then access this memory area and do r/w .
>
>   what is the essential difference between this teo wayes?
>
> thanks!
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: What is asmlinkage ?

2013-01-10 Thread Rajat Sharma
> it is defined even in  much earlier release:
http://lxr.free-electrons.com/ident?v=2.6.32;i=asmlinkage

There seems to be no definition for arm there either. I literally meant the
definition, as in '#define asmlinkage', not the usage of it. For arm there is
none, so the default defined in include/linux/linkage.h is used, which is
nothing special, just an extern "C" declaration to avoid mangled naming under
C++ linkage, that's it.

-Rajat


On Fri, Jan 11, 2013 at 12:00 PM, Peter Teoh wrote:

>
>
> On Fri, Jan 11, 2013 at 1:35 PM, Rajat Sharma  wrote:
>
>> > asmlinkage is defined for almost all arch:
>> > grep asmlinkage arch/arm/*/* and u got the answer.
>>
>> I didn't see a definition of macro atleast in linux source I was browsing
>> (3.2.0), Could you please point out to any one you have found.
>>
>
> it is defined even in  much earlier release:
>
> http://lxr.free-electrons.com/ident?v=2.6.32;i=asmlinkage
>
> for example, and every arch possible has a use of it.
>
> --
> Regards,
> Peter Teoh
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: What is asmlinkage ?

2013-01-10 Thread Rajat Sharma
> asmlinkage is defined for almost all arch:
> grep asmlinkage arch/arm/*/* and u got the answer.

I didn't see a definition of the macro, at least in the Linux source I was
browsing (3.2.0). Could you please point out any that you have found?

Alternatively, I tried a more precise search:
linux-source-3.2.0$ grep -R "#define asmlinkage" arch/arm/

No output, although a similar search gave the following result for x86:

linux-source-3.2.0$ grep -R "#define asmlinkage" arch/x86/
arch/x86/include/asm/linkage.h:#define asmlinkage CPP_ASMLINKAGE
__attribute__((regparm(0)))
arch/x86/include/asm/linkage.h:#define asmlinkage_protect(n, ret, args...) \

Typically it should be defined only in the arch-specific linkage.h file, and
for arm there was none that I found. If none is found, the default definition
comes from include/linux/linkage.h, which is a nop:

#ifndef asmlinkage
#define asmlinkage CPP_ASMLINKAGE
#endif

CPP_ASMLINKAGE is just extern C stuff:

#ifdef __cplusplus
#define CPP_ASMLINKAGE extern "C"
#else
#define CPP_ASMLINKAGE
#endif

Indeed, the only meaningful definition I found was for x86_32, in
arch/x86/include/asm/linkage.h:

#ifdef CONFIG_X86_32
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))

regparm controls the number of integer arguments passed to a function in
registers, and it is specific to x86 only, according to the gcc manuals.

Hence the above wiki FAQ seems valid for x86_32 arch, although it should
specifically mention this arch dependence to avoid confusion on public
forums.

-Rajat


On Fri, Jan 11, 2013 at 8:43 AM, Peter Teoh  wrote:

> asmlinkage is defined for almost all arch:
>
> grep asmlinkage arch/arm/*/* and u got the answer.
>
> It seemed that:
>
>  http://kernelnewbies.org/FAQ/asmlinkage
>
> gave the impression that asmlinkage is only for system call, or associated
> with it.   Not really, nothing to do with system call actually.
>
> Even though majority (or ALL) of syscall are defined with asmlinkage, but
> there are many that are not, eg, in arch/arm subdirectory:
>
> kernel/irq.c:asmlinkage void __exception_irq_entry
> kernel/process.c:asmlinkage void ret_from_fork(void)
> __asm__("ret_from_fork");
> kernel/smp.c:asmlinkage void __cpuinit secondary_start_kernel(void)
> kernel/smp.c:asmlinkage void __exception_irq_entry do_IPI(int ipinr,
> struct pt_regs *regs)
>
> just a few examples.   Essentially, it is just declared so that the name,
> for example, "do_IPI" can be called from assembly, for example in the
> following pair (arch/arm assumed):
>
> /kernel/smp.c:
> asmlinkage void __exception_irq_entry do_IPI(int ipinr, struct pt_regs
> *regs)
>
> ./include/asm/entry-macro-multi.S:
> bne do_IPI
>
> More info:
>
>
> http://stackoverflow.com/questions/10060168/is-asmlinkage-required-for-a-c-function-to-be-called-from-assembly
>
> From above, asmlinkage is also NOT the only way..
>
>
> On Fri, Jan 4, 2013 at 6:29 PM, anish singh 
> wrote:
>
>> On Fri, Jan 4, 2013 at 3:41 PM, Rajat Sharma  wrote:
>> >> Is this correct for all architectures?
>> >
>> > I guess not, asmlinkage is undefined for arm, so I assume this
>> mechanism is
>> > not there for arm.
>> then how do they do it?
>> >
>> >
>> >
>> > On Fri, Jan 4, 2013 at 2:24 PM, 卜弋天  wrote:
>> >>
>> >>
>> >>
>> >> 在 2013-1-4,15:38,"Rajat Sharma"  写道:
>> >>
>> >> > So with asmlinkage we request compiler to put args on stack. What is
>> >> > advantage of this to start_kernel or in general to other functions ?
>> >>
>> >> See its about implementation ease and little of performance too.
>> Assuming
>> >> the default model of keeping arguments in registers is used. lets say
>> >> arguments are assumed to be in registers R1, R2, R3, R4, R5, R6 and
>> beyond
>> >> that in stack. Since system call number is a transparent argument
>> which is
>> >> chopped off when calling the actual kernel handler and if R1 had the
>> system
>> >> call number, then you have to shift all register values and stack
>> arguments
>> >> too.
>> >>
>> >>Is this correct for all architectures?
>> >>
>> >>As I remembered, ARM uses SWI instruction to implement the system
>> call,
>> >> it will pass system call number by register R7, and use normal register
>> >> R0~R3 to pass parameters.
>> >>
>> >>
>> >>
>> >> Now consider that all arguments are pushed on stack (as enforced by
>> >> asmlinkage), you have all function argument in the beginning of the
>&

RE: Kernel-space flirting with user-space

2013-01-10 Thread Rajat Sharma
If you do not have much dependency on being in kernel mode, you can
implement, at least initially, a user-mode filesystem using FUSE.
--
From: Simon
Sent: 11-01-2013 02:45
To: kernelnewbies@kernelnewbies.org
Subject: Kernel-space flirting with user-space

Hi there,

  I'm rather new to kernel development, but I am somewhat experienced with
Linux, kernel compilation, etc.  I'm just passed writing a helloworld
module, dummy filesystem (ramfs wrapper) and proc entry.  I would like to
know how communication between kernel & user-space is done (the standard)
for a high number of tiny and large messages going in both directions
(large messages would need to be transfered incrementally, in chunks, can
be sequential or random access).  I plan on developping a network
filesystem.  I would appreciate if someone could clarify where I'm wrong
and confirm where I'm right, and ideally suggest some vocabulary, source
files to lookup, online docs or even books.  The minimum you can give will
be most appreciated!  =)

  Firstly, I was wondering if it would be possible to implement the
filesystem entirely as a kernel module.  I would need TCP/UDP sockets.  I
think this is really not the recommended approach, but an advantage I see
is it could be used to mount the root filesystem before calling init (that
would be before user-space exists, right?).  On the other hand, separating
the work to have a program/daemon in user-space do the communication and
processing would allow me to write that part in C++ (which I personally
prefer).

  Secondly, I wonder how I can "bind" a user-space program/daemon with the
kernel-space of the module.

  Using procfs:  It seems technically possible to achieve my goal with it,
but I feel this would not be the standard approach.  I would rather keep it
for live config & live status reporting.
  Using sysfs:  I've read too much misleading information that I'm not sure
anymore what it's for or even if it's still used or deprecated!  Can
someone clarify?  It looks technically identical to procfs, but it's
mission is a different one, correct?
  Using pipes:  This may be a good way, but again, I'm not sure if it's the
standard way.
  Using IPC:  As far as I understand, IPC would be the way if we'd be
talking of two threads within the same process, or at least within
kernel-space.  Though I am likely to be wrong.
  Using a char device:  It seems technically possible as well, but it may
be difficult to deal with if I ever have more than one user-space process.
  Using a block device:  My file system works with files and their
metadata, but not with blocks, so this is not suitable.  Though it might be
a nice experiment to build a network block device containing an ext2
filesystem (which I think would be similar to iSCSI).
  Using net interface: Not applicable.  And an experiment to try to make it
applicable doesn't even seem remotely fun.  ;)

  I've seen the functions copy_to/from_user() or something like that, but I
haven't seen the user-space counter-part of these.  I think this may
actually be the best approach by far, but I really lack info/docs to go
that way.  The truly best way might be to use shared memory to avoid
copying data internally, but I haven't looked into that much yet.

  Finally, is it possible/correct/standard to do something similar to the
following for the local kernel & user-space communication?  (I haven't
reviewed the syntax, so take it as some kind of pseudo-code!)

struct foo {...} outgoingFoo;
// foo contains no pointers, so it is of a fixed size always.
fwrite(&outgoingFoo, 1, sizeof(outgoingFoo), output);
  [...then on the other end...]
if(sizeof(fread( buffer, 1, buffer_size, input);
struct foo * incomingFoo = (struct foo *) buffer;

Regards,
  Simon
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: What is asmlinkage ?

2013-01-04 Thread Rajat Sharma
> Is this correct for all architectures?

I guess not, asmlinkage is undefined for arm, so I assume this mechanism is
not there for arm.



On Fri, Jan 4, 2013 at 2:24 PM, 卜弋天  wrote:

>
>
> 在 2013-1-4,15:38,"Rajat Sharma"  写道:
>
> > So with asmlinkage we request compiler to put args on stack. What is
> advantage of this to start_kernel or in general to other functions ?
>
> See its about implementation ease and little of performance too. Assuming
> the default model of keeping arguments in registers is used. lets say
> arguments are assumed to be in registers R1, R2, R3, R4, R5, R6 and beyond
> that in stack. Since system call number is a transparent argument which is
> chopped off when calling the actual kernel handler and if R1 had the system
> call number, then you have to shift all register values and stack arguments
> too.
>
>Is this correct for all architectures?
>
>As I remembered, ARM uses SWI instruction to implement the system call,
> it will pass system call number by register R7, and use normal register
> R0~R3 to pass parameters.
>
>
>
> Now consider that all arguments are pushed on stack (as enforced by
> asmlinkage), you have all function argument in the beginning of the stack
> and the system call number on top of the stack. you just need to pop out
> stack top to remove system call number from function argument.
>
> You might argue that why not always keep system call number on stack top
> and use registers for function arguments? But thats part of the compiler
> ABI and if you had fewer arguments lets say 2 only and used up R1 and R2
> only, you may not jump to stack top directly for storing system call as its
> turn for R3 as argument.
>
> So, isn't it simpler implementation with everything on stack?
>
> -Rajat
>
>
> On Fri, Jan 4, 2013 at 12:13 PM, Rahul Bedarkar  wrote:
>
>> Thanks. So with asmlinkage we request compiler to put args on stack. What
>> is advantage of this to start_kernel or in general to other functions ?
>>
>> Regards,
>> Rahul
>>
>>
>> On Thu, Jan 3, 2013 at 9:34 PM, Mulyadi Santosa <
>> mulyadi.sant...@gmail.com> wrote:
>>
>>> On Thu, Jan 3, 2013 at 7:40 PM, Rahul Bedarkar 
>>> wrote:
>>> > Hi,
>>> >
>>> > I was searching for asmlinkage and found that it is already explained
>>> at
>>> > http://kernelnewbies.org/FAQ/asmlinkage
>>> >
>>> > But I didn't get this. Can someone tell me about it in brief ?
>>>
>>> the point is, parameters which is usually passed via stack, is passed
>>> using different way.
>>>
>>> A good example is system call they are passed using registers IIRC
>>>
>>>
>>> --
>>> regards,
>>>
>>> Mulyadi Santosa
>>> Freelance Linux trainer and consultant
>>>
>>> blog: the-hydra.blogspot.com
>>> training: mulyaditraining.blogspot.com
>>>
>>
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: What is asmlinkage ?

2013-01-03 Thread Rajat Sharma
> So with asmlinkage we request compiler to put args on stack. What is
advantage of this to start_kernel or in general to other functions ?

See, it's about implementation ease and a little about performance too.
Assume the default model of keeping arguments in registers is used: let's say
arguments are assumed to be in registers R1, R2, R3, R4, R5, R6 and beyond
that on the stack. The system call number is a transparent argument which is
chopped off when calling the actual kernel handler, so if R1 had the system
call number, then you would have to shift all the register values and stack
arguments too.

Now consider that all arguments are pushed on the stack (as enforced by
asmlinkage): you have all the function arguments at the beginning of the
stack and the system call number on top of the stack. You just need to pop
the stack top to remove the system call number from the function arguments.

You might ask: why not always keep the system call number on the stack top
and use registers for the function arguments? But that's part of the compiler
ABI, and if you had fewer arguments, let's say only 2, using up only R1 and
R2, you could not just put the system call number on the stack top, because
by the ABI it would be R3's turn to carry it as the next argument.

So, isn't it a simpler implementation with everything on the stack?

-Rajat


On Fri, Jan 4, 2013 at 12:13 PM, Rahul Bedarkar  wrote:

> Thanks. So with asmlinkage we request compiler to put args on stack. What
> is advantage of this to start_kernel or in general to other functions ?
>
> Regards,
> Rahul
>
>
> On Thu, Jan 3, 2013 at 9:34 PM, Mulyadi Santosa  > wrote:
>
>> On Thu, Jan 3, 2013 at 7:40 PM, Rahul Bedarkar  wrote:
>> > Hi,
>> >
>> > I was searching for asmlinkage and found that it is already explained at
>> > http://kernelnewbies.org/FAQ/asmlinkage
>> >
>> > But I didn't get this. Can someone tell me about it in brief ?
>>
>> the point is, parameters which is usually passed via stack, is passed
>> using different way.
>>
>> A good example is system call they are passed using registers IIRC
>>
>>
>> --
>> regards,
>>
>> Mulyadi Santosa
>> Freelance Linux trainer and consultant
>>
>> blog: the-hydra.blogspot.com
>> training: mulyaditraining.blogspot.com
>>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: internel implemention of file operation

2013-01-03 Thread Rajat Sharma
> in this vm_operations_struct , there are open/close functions ,  are
there necessary  relations between file operations and this struct ?
Well, not really; the open/close methods of vm_ops are not what is of
interest to filesystems. The page fault handler and making a page writable
are where the filesystem comes into the picture. Have a look at
ext4_file_vm_ops; it implements the operations of interest.

static const struct vm_operations_struct ext4_file_vm_ops = {
        .fault          = filemap_fault,
        .page_mkwrite   = ext4_page_mkwrite,
};

Note that only the filesystem knows how to fill up this page.


On Thu, Jan 3, 2013 at 3:39 PM, horseriver  wrote:

> On Thu, Jan 03, 2013 at 01:16:06PM +0530, Rajat Sharma wrote:
>
> > > will it be maped with vm_area struct ?
> > Yes if it is accessed via mmap system call.
>
> you know that , in the struct vm_area_struct,there is a struct
> vm_operations_struct * vm_ops;
>
> in this vm_operations_struct , there are open/close functions ,  are there
> necessary  relations between file
>
> operations and this struct ?
>
> thanks!
>
> >
> > > what is the relation between page-cache and file operation?
> > file operations for data access like read/write will look into page-cache
> > first before going to disk.
> >
> > -Rajat
> >
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: internel implemention of file operation

2013-01-02 Thread Rajat Sharma
Unfortunately these are not topics to digest in a mail; I recommend reading
the UTLK book (http://shop.oreilly.com/product/9780596005658.do).

Still...

> will it be maped with vm_area struct ?
Yes, if it is accessed via the mmap system call.

> what is the relation between page-cache and file operation?
File operations for data access like read/write will look into the page cache
first before going to disk.

-Rajat


On Thu, Jan 3, 2013 at 1:01 PM, horseriver  wrote:

> On Thu, Jan 03, 2013 at 12:48:01PM +0530, Rajat Sharma wrote:
> > Never heard of page-cache?
>
> will it be maped with vm_area struct ?
> what is the relation between page-cache and file operation?
>
> thanks!
>
> >
> >
> > On Thu, Jan 3, 2013 at 12:29 PM, horseriver 
> wrote:
> >
> > > hi:
> > >
> > >   when one file is opened , does its  data be put into memory ? and all
> > > operation on this file
> > >
> > >   will be implemented by operation on its mapping memory area ?
> > >
> > > thanks!
> > >
> > > ___
> > > Kernelnewbies mailing list
> > > Kernelnewbies@kernelnewbies.org
> > > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
> > >
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: internel implemention of file operation

2013-01-02 Thread Rajat Sharma
Never heard of page-cache?


On Thu, Jan 3, 2013 at 12:29 PM, horseriver  wrote:

> hi:
>
>   when one file is opened , does its  data be put into memory ? and all
> operation on this file
>
>   will be implemented by operation on its mapping memory area ?
>
> thanks!
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Max open file limit

2012-12-05 Thread Rajat Sharma
Well, this is not really a limit on the number of files; rather it is a
memory allocation optimization for the embedded fd_array. The limit you are
looking for is:

int sysctl_nr_open __read_mostly = 1024*1024;

$ cat /proc/sys/fs/nr_open
1048576

-Rajat


On Wed, Dec 5, 2012 at 4:38 PM, Vijay Chauhan wrote:

> Hello,
>
> How many files a process can open at a time? Is it configurable?
>
> I found following in the kernel code:
>
> ..
>   .max_fds= NR_OPEN_DEFAULT,
> ..
> ..
> #define NR_OPEN_DEFAULT BITS_PER_LONG
> ..
> ..
> #ifdef __KERNEL__
> #define BITS_PER_LONG 32
> ..
>
> But I can open more than 32 files in my user space program.
>
> Thank you,
> Vijay
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Compiling the source RPM file.

2012-09-28 Thread Rajat Sharma
If you already have a source rpm, rather use rpmbuild to automatically
apply the patches and build the rpm. When you install the source rpm, it will
create a directory hierarchy in your home directory, something like this:
BUILD
SOURCES
SPECS
RPMS

SOURCES will hold the tarball and SPECS holds the rpm specification (spec)
file, which is the main file used for building the rpm; use this file and
feed it to the rpmbuild command.

# rpmbuild -ba 

Also refer to Maximum RPM, the ultimate reference on rpm:
http://www.rpm.org/max-rpm/. I bet you will find everything you need to know
about rpm there.

-Rajat

On Fri, Sep 28, 2012 at 12:55 PM, K Arun Kumar  wrote:

> On Fri, 2012-09-28 at 12:45 +0530, Rahul Bedarkar wrote:
> > Can you write command here, how you have applied patches ?
> >
> I went into the extracted tarball and tried applying patches one after
> another like this -
>   patch -p1 < ../
> > Packages (.deb, .rpm and others) meant to build and installed
> automatically.
> >
> > You can build and install it by following command
> >
> > $sudo rpm -i 
> I am just interested in fully-patched source I want to compile and
> install it myself without any rpm tools - more clearly -
> I want to -
> (1) extract the source tarball available with the rpm package
> (2) patch this source with all the packages available in the rpm package
> (3) configure , build and install this source at *custom* location
>
> ** I do not want to use any rpm tools for this **
> > On Fri, Sep 28, 2012 at 11:51 AM, K Arun Kumar 
> wrote:
> > > I have a rpm for a module  which I extracted like -
> > > rpm2cpio numactl-2.0.7-3.el6.src.rpm | cpio -idmv
> > >
> > > then I extracted the source like -
> > > tar -xzvf numactl-2.0.7-3.tar.gz
> > >
> > > then I tried to apply all the patches to available with
> > > this rpm to the extracted source which results in Hunks and
> > > rejections.
> > >
> > > I want to apply all patches and compile the source and get
> > > an exe. I am not much aware of the rpm build procedure
> > >
> > > I want to do similar thing to kernel source also
> > >
> > > Please help me...
> > >
> > > Regards,
> > > Arun
> > >
> Thanks,
> Arun
> > >
> > > ___
> > > Kernelnewbies mailing list
> > > Kernelnewbies@kernelnewbies.org
> > > http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Problem during /init execve

2012-09-25 Thread Rajat Sharma
> The result is the same with and without CONFIG_EXT2_FS_XATTR enabled.
so maybe it's not getxattr, but the inode itself that is NULL:

if (!inode || !inode->i_op->getxattr)

You can verify that by breaking the if condition into two separate
conditions.
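
For example (a debugging sketch, assuming it is dropped into
get_vfs_caps_from_disk() in place of the combined check):

if (!inode) {
        printk(KERN_ERR "get_vfs_caps_from_disk: NULL inode\n");
        return -ENODATA;
}
if (!inode->i_op->getxattr) {
        printk(KERN_ERR "get_vfs_caps_from_disk: no getxattr for inode %lu\n",
               inode->i_ino);
        return -ENODATA;
}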

> CONFIG_AUDIT is disabled in my config.
> A priori, I don't need it at the moment.

There are other options like CONFIG_AUDIT_TREE; disable that too. Also
make sure that the initrd image has the ext2 driver compiled for this
kernel.

> After searching 'getxattr' reference in my whole linux directory, .getxattr
> field
> should be initialized with "generic_getxattr" in ext2 and ext3 case.
> Is it really what it need, or does it require something specific else?
>

If you are using a vanilla ext2 fs and CONFIG_EXT2_FS_XATTR is set, then
it must be initialized in the static inode operations tables; look at:
ext2_file_inode_operations for files
ext2_dir_inode_operations for directories
ext2_special_inode_operations for special files
ext2_fast_symlink_inode_operations and ext2_symlink_inode_operations
for symlinks
All of them are compiled and initialized with generic_getxattr if you
compile with CONFIG_EXT2_FS_XATTR.

-Rajat

On Tue, Sep 25, 2012 at 1:31 PM, stl  wrote:
> hi,
>
>
>> Does your kernel config set this option:
>> CONFIG_EXT2_FS_XATTR=y
>
> The result is the same with and without CONFIG_EXT2_FS_XATTR enabled.
>
>
>> And what is the state of CONFIG_AUDIT option? Do you need Kernel Audit
>> support in your environment?
>
> CONFIG_AUDIT is disabled in my config.
> A priori, I don't need it at the moment.
>
> After searching 'getxattr' reference in my whole linux directory, .getxattr
> field
> should be initialized with "generic_getxattr" in ext2 and ext3 case.
> Is it really what it need, or does it require something specific else?
>
> Thanks in advance

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Problem during /init execve

2012-09-24 Thread Rajat Sharma
Does your kernel config set this option:
CONFIG_EXT2_FS_XATTR=y

And what is the state of CONFIG_AUDIT option? Do you need Kernel Audit
support in your environment?

-Rajat

On Mon, Sep 24, 2012 at 9:42 PM, stl  wrote:
> Hi
> it is for the VFS, ext2 is selected in my .config.
>
> Concerning the filesystem, if the boot hasn't crashed until the start of
> /init means
> that it is correclty mounted (i suppose)
> In addition, the kernel does not output the well known message:
> "warning: unable to open an initial console", this means that it is able to
> open the /dev/console.
>
> I used initramfs support in order to mount the VFS, by supplying a config
> file to
> CONFIG_INITRAMFS_SOURCE option.
>
> However, the getattr method has been initialized, but not the getxattr.
>
> Have you any idea  what  the problem could be?
> Thanks.
>
>
>
> 2012/9/24 Mulyadi Santosa 
>>
>> Hi...
>>
>> On Mon, Sep 24, 2012 at 9:58 PM, stl  wrote:
>> > Hello all,
>> >
>> > What is the purpose of the inode->i_op->getxattr method?
>>
>> I think it has something related to extended attribute
>>
>> > During the boot of linux-2.6.37 on a new architecture, it crashes in the
>> > get_vfs_caps_from_disk() function because of the following security
>> > check,
>> > (in my case, this field is ot initialized).
>> >
>> > if (!inode || !inode->i_op->getxattr)
>> >   return -ENODATA;
>>
>> May I know, what's the filesystem type that code currently checks? And
>> are you sure that filesystem is completely ok?
>>
>> --
>> regards,
>>
>> Mulyadi Santosa
>> Freelance Linux trainer and consultant
>>
>> blog: the-hydra.blogspot.com
>> training: mulyaditraining.blogspot.com
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Maximum size of data segment used by c program

2012-09-17 Thread Rajat Sharma
On Mon, Sep 17, 2012 at 4:34 PM, devendra.aaru  wrote:
>
> > On Mon, Sep 17, 2012 at 6:17 AM, Rajat Sharma  wrote:
> >> yes there is a limit, look at
> >>
> > getrlimit show the size as 4GiGs of RLIMIT_DATA on a 32-bit. which
> > means that its unlimited?
> >
> I mean 4Gigs of RLIMIT_DATA is both the rlim_cur and rlim_max.
> i have a data section of size 213K, i am thinking that i should not be
> growing the data section size.

Yes, try the C function setrlimit(); that's what I meant by looking at the man page.

#include <sys/time.h>
#include <sys/resource.h>

int getrlimit(int resource, struct rlimit *rlim);
int setrlimit(int resource, const struct rlimit *rlim);
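
For example, a minimal user-space sketch that prints the current limit and
then lowers the soft limit (the 256 MiB value is just an illustration):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	getrlimit(RLIMIT_DATA, &rl);
	printf("data segment soft=%llu hard=%llu\n",
	       (unsigned long long)rl.rlim_cur,
	       (unsigned long long)rl.rlim_max);

	rl.rlim_cur = 256ULL * 1024 * 1024;	/* cap heap growth at 256 MiB */
	if (setrlimit(RLIMIT_DATA, &rl) != 0)
		perror("setrlimit");
	return 0;
}

After lowering the soft limit, brk()/sbrk() (and therefore large heap
growth) should start failing with ENOMEM.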

> I have gone through Google and didn't find any article that talks
> about the maximum data section a process can use.
>
> > Thanks,

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Maximum size of data segment used by c program

2012-09-17 Thread Rajat Sharma
Yes, there is a limit; look at:

# man setrlimit

RLIMIT_DATA
    The maximum size of the process's data segment (initialized data,
    uninitialized data, and heap). This limit affects calls to brk(2)
    and sbrk(2), which fail with the error ENOMEM upon encountering
    the soft limit of this resource.

-Rajat

On Mon, Sep 17, 2012 at 3:11 PM, devendra.aaru wrote:

> Hi all,
>
> Is there any limit on the maximum data segment size used by c programs?
>
> Thanks,
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: kernel stack memory

2012-09-16 Thread Rajat Sharma
Some useful debugging options:

CONFIG_DEBUG_STACK_USAGE
CONFIG_DEBUG_STACKOVERFLOW

> if you take enough large array so as to hit the heap
> area at that present moment (they both grow towards each other)

I doubt that is true for the kernel-mode stack. Kernel stack allocation is
a plain page allocation from the zone buddy allocator (alloc_pages_node), so
whether what lies beyond that allocation is assigned to something else or
not is really not predictable. However, one thing which is predictable (and
other people have pointed out as well) is overrunning the thread_info struct
of the current task. Also, this overrun may not show up with this simple
module, because there is no need to access thread_info here. My suggestions:

   1. Try putting a schedule() call after overrunning the thread_info structure;
   thread_info might be accessed while scheduling the current process back in.
   2. Try to access the current macro after the overrun; it will try to access
   the corrupt thread_info structure to get the task_struct pointer.

But make sure to corrupt the thread_info structure in a predictable manner,
as pointed out in previous mails :).
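
A sketch combining both suggestions (hypothetical test function; whether it
actually crashes depends on the architecture, e.g. ARM of that era derives
current from thread_info while x86 keeps it in a per-cpu variable):

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/thread_info.h>

static void corrupt_stack_demo(void)
{
	struct thread_info *ti = current_thread_info();

	pr_info("thread_info %p, task %p\n", ti, ti->task);

	memset(ti, 0, sizeof(*ti));	/* wipe thread_info at the stack base */

	schedule();			/* suggestion 1: rescheduling touches it */
	pr_info("task now: %p\n", current);	/* suggestion 2: reads the corrupt link */
}
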
-Rajat

On Sun, Sep 16, 2012 at 10:35 AM, Shreyansh Jain 
wrote:
> Hi 卜弋天 and list,
>
> Please find my comments inline
>
> On Thu, Sep 13, 2012 at 7:38 PM, 卜弋天  wrote:
>> i don't know why you want to corrupt kernel stack by using this method,
>> stack usually grow from high address to low address,
>> if you allocate a buff in a function then use memset(), it is writing
data
>> from low address to high address.
>> in your implementation, you allocate an array with 8000*4=32000 bytes (
int
>> arr[8000]; ), then you try to corrupt stack by using memset(), which
operate
>> memory by bytes, rather than by int. so this memset() only corrupt the
first
>> 8192 bytes of the buffer, which is far away from your current task stack.
>>
>>   thread_info locates at the bottom of current task's stack, please
>> reference the source code of current_thread_info() function of your
>> platform. i think it is true for X86 or ARM.
>>
>>   if you really want to corrupt current kernel task's stack, please
try
>> below code, i did't test it but i think it should work, at least you can
>> find something from the log:
>>
>>  char *sp_addr;
>>  struct thread_info *thread = current_thread_info();
>>  sp_addr = (char*)thread;
>>
>>  printk("sp_addr==thread:%p, task:%p\n", thread, thread->task);
>>
>>  memset (sp_addr, 0x0, 1024);
>>
>>  printk("after corrupt, task:%p, it is dying...\n", thread->task);
>
> Actually, after reading through the first authors email, it seems he
> is trying to find the answer to "How much is maximum allocatable space
> on the Kernel Stack" (Shubham, please correct me if  I am wrong).
>
> In that essence what you have mentioned above is more of a direct
> method of corrupting the thread_info structure - a definitive stack
> corruption.
>
>>
>>
>>> Date: Thu, 13 Sep 2012 15:32:05 +0530
>>> Subject: Re: kernel stack memory
>>> From: mujeeb.a...@gmail.com
>>> To: getaru...@gmail.com
>>> CC: shubham20...@gmail.com; kernelnewbies@kernelnewbies.org
>>
>>>
>>> Hi,
>>>
>>> On Thu, Sep 13, 2012 at 1:59 PM, Arun KS  wrote:
>>> > Hello Shubham,
>>> >
>>> > On Thu, Sep 13, 2012 at 12:15 PM, shubham sharma
>>> > 
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> As far as i know, the size of stack allocated in the kernel space is
>>> >> 8Kb for each process. But in case i use more than 8Kb of memory from
>>> >> the stack then what will happen? I think that in that case the system
>>> >> would crash because i am accessing an illegal memory area. I wrote
>>> >> kernel module in which i defined an integer array whose size was
8000.
>>> >> But still it did not crash my system. Why?
>>> >>
>>> >> The module i wrote was as follows:
>>> >>
>>> >> #include 
>>> >> #include 
>>> >>
>>> >> int __init init_my_module(void)
>>> >> {
>>> >> int arr[8000];
>>> >> printk("%s:%d\tmodule initilized\n", __func__, __LINE__);
>>> >> arr[1] = 1;
>>> >> arr[4000] = 1;
>>> >> arr[7999] = 1;
>>> >
>>> > Instead do a memset.
>>> > memset(arr, 0, 8192);
>>> >
>>> > If you do this the current calling process thread_info will be set to
>>> > zero.
>>> > This should cause a crash.
>>>
>>> I tried and this is also not causing any crash.
>>>
>>> Thanks,
>>> Adil
>>> >
>>> > Thanks,
>>> > Arun
>>> >
>>> >
>>> >>
>>> >> printk("%s:%d\tarr[1]:%d, arr[4000]:%d, arr[7999]:%d\n", __func__,
>>> >> __LINE__, arr[1], arr[4000], arr[7999]);
>>> >> return 0;
>>> >> }
>>> >>
>>> >> void __exit cleanup_my_module(void)
>>> >> {
>>> >> printk("exiting\n");
>>> >> return;
>>> >> }
>>> >>
>>> >> module_init(init_my_module);
>>> >> module_exit(cleanup_my_module);
>>> >>
>>> >> MODULE_LICENSE("GPL");
>>> >>
>
> Though I don't have an exact answer, but what I understand is that
> there is no limit imposed by the kernel while writing in the kernel
> layer (as Kshemendra's first post pointed out). Thus, you keep writing
> what you wish till either you corrupt the next instruction set or some
> dat

Re: kernel stack memory

2012-09-13 Thread Rajat Sharma
"The kernel stack is part of task_struct of the running process"

Please double-check that: it is not part of task_struct. Rather, on some
architectures the kernel stack shares its allocation with a thread_info
structure at its base, which keeps a link back to the task_struct of the process.
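
You can see the layout from the way current_thread_info() is typically
derived on 32-bit x86/ARM of that era (a sketch; assumes the stack is a
THREAD_SIZE-aligned allocation):

#include <linux/thread_info.h>

/* thread_info lives at the lowest address of the stack allocation, so
 * masking the stack pointer down to a THREAD_SIZE boundary finds it;
 * its ->task field then points back to the owning task_struct.
 */
static inline struct thread_info *demo_thread_info(unsigned long sp)
{
	return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}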

-Rajat

On Thu, Sep 13, 2012 at 1:59 PM, Arun KS  wrote:
> Hello Shubham,
>
> On Thu, Sep 13, 2012 at 12:15 PM, shubham sharma 
> wrote:
>>
>> Hi,
>>
>> As far as i know, the size of stack allocated in the kernel space is
>> 8Kb for each process. But in case i use more than 8Kb of memory from
>> the stack then what will happen? I think that in that case the system
>> would crash because i am accessing an illegal memory area. I wrote
>> kernel module in which i defined an integer array whose size was 8000.
>> But still it did not crash my system. Why?
>>
>> The module i wrote was as follows:
>>
>> #include 
>> #include 
>>
>> int __init init_my_module(void)
>> {
>> int arr[8000];
>> printk("%s:%d\tmodule initilized\n", __func__, __LINE__);
>> arr[1] = 1;
>> arr[4000] = 1;
>> arr[7999] = 1;
>
> Instead do a memset.
> memset(arr, 0, 8192);
>
> If you do this the current calling process thread_info will be set to zero.
> This should cause a crash.
>
> Thanks,
> Arun
>
>
>>
>> printk("%s:%d\tarr[1]:%d, arr[4000]:%d, arr[7999]:%d\n", __func__,
>> __LINE__, arr[1], arr[4000], arr[7999]);
>> return 0;
>> }
>>
>> void __exit cleanup_my_module(void)
>> {
>> printk("exiting\n");
>> return;
>> }
>>
>> module_init(init_my_module);
>> module_exit(cleanup_my_module);
>>
>> MODULE_LICENSE("GPL");
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Is it possible to use a local drive to cache an NFS share

2012-09-03 Thread Rajat Sharma
Hi,

The NFS client supports FS-Cache, which is enabled by default in recent
kernels (CONFIG_NFS_FSCACHE=y). But as I remember, it is a read cache
(not even write-through), and a compile workload generates lots of new
files, so those won't really benefit from FS-Cache.

But as you point out, your setup is single-user access, so you can
probably relax some of the NFS consistency options: an async mount is
really suitable for your case, and you can also increase the actimeo
value. Compiler writes will hit the page cache, and with asynchronous
writeback that is way faster than SSDs :).

-Rajat

On Mon, Sep 3, 2012 at 6:30 PM, Graeme Russ  wrote:
> Hi All,
>
> I am mounting /home over NFS which is great, but it really kills compile
> times. So I have a local HDD which I have copied all my source code over
> from which I do my coding and compiling.
>
> What I would love to do is use the local drive to cache /home. Specifically
> one user directory under /home - only one machine will ever be modifying
> the contents of this directory.
>
> Or should I just use rsync?
>
> On a side note - I moved my source code from a 1TB HDD on a 3Gb/s SATA port
> to an Intel 510 series 250GB SSD on a 6Gb/s SATA port. But I didn't see an
> appreciable increase in compile speed.
>
> Regards,
>
> Graeme
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Stackable file systems and NFS

2012-08-16 Thread Rajat Sharma
So is it truncating the file? i.e.

# ping > /nfs/somefile

On Thu, Aug 16, 2012 at 2:46 PM, Ranjan Sinha  wrote:
> On Thu, Aug 16, 2012 at 1:33 PM, Rajat Sharma  wrote:
>> What is the pattern other NFS client is writing to the file? Can't it
>> be a legitimate NUL by any chance?
>
> Redirected output of ping.
>
>
>>
>> On Thu, Aug 16, 2012 at 1:22 PM, Ranjan Sinha  wrote:
>>> On Thu, Aug 16, 2012 at 1:00 PM, Rajat Sharma  wrote:
>>>> Correct me if I am reading something wrong, in your program listing,
>>>> while printing the buffer you are passing a total_count variable,
>>>> while vfs_read returned value is collected in count variable.
>>>>
>>>> debug_dump("Read buffer", buf, total_count);
>>>
>>> My apologies. Please read that as count only. A typo in the listing.
>>>
>>>>
>>>> One suggestion, please fill up buf with some fixed known pattern
>>>> before vfs_read.
>>>
>>> I tried that as well. It still comes out as ASCII NUL.
>>>
>>>>
>>>>> We have also noticed that the expected increase (inc) and the size
>>>> returned in (vfs_read()) is different.
>>>>
>>>> There is nothing which is blocking updates to file size between
>>>> vfs_getattr() and vfs_read(), right? no locking?
>>>
>>> No locking. On second thoughts I think this is ok since more data could be
>>> available between the calls to vfs_getattr and vfs_read as the other NFS 
>>> client
>>> is continuously writing to that file.
>>>
>>> --
>>> Ranjan
>>>
>>>
>>>>
>>>> -Rajat
>>>>
>>>> On Thu, Aug 16, 2012 at 12:01 PM, Ranjan Sinha  
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Aug 14, 2012 at 4:19 PM, Rajat Sharma  wrote:
>>>>>> Try mounting with noac nfs mount option to disable attribute caching.
>>>>>>
>>>>>> ac / noac
>>>>>>
>>>>>> "Selects whether the client may cache file attributes. If neither
>>>>>> option is specified (or if ac is specified), the client caches file
>>>>>> attributes."
>>>>>
>>>>> i don't think this is because of attribute caching. The size does change 
>>>>> and
>>>>> that is why we go to the read call (think of this is a simplified case of
>>>>> tail -f). The only problem is that sometimes when we read we get ASCII 
>>>>> NUL bytes
>>>>> at the end. If we read the same block again, we get the correct data.
>>>>>
>>>>> In addition, we cannot force specific mount options in actual deployment
>>>>> scenarios.
>>>>>
>>>>>
>>>>> 
>>>>>
>>>>>>> On Tue, Aug 14, 2012 at 5:10 PM, Ranjan Sinha  
>>>>>>> wrote:
>>>>>>> > For now, /etc/export file has the following setting
>>>>>>> > *(rw,sync,no_root_squash)
>>>>>>>
>>>>>>> hm, AFAIK that means synchronous method is selected. So,
>>>>>>> theoritically, if there is no further data, the other end of NFS
>>>>>>> should just wait.
>>>>>>>
>>>>>>> Are you using blocking or non blocking read, btw? Sorry, i am not
>>>>>>> really that good reading VFS code...
>>>>>>>
>>>>>
>>>>> This is a blocking read call. I think this is not because there is no 
>>>>> data,
>>>>> rather somehow the updated data is not present in the VM buffers but the
>>>>> inode size has changed. As I just said, if we read the file again from the
>>>>> exact same location, we get the actual contents. Though after going 
>>>>> through the
>>>>> code I don't understand how is this possible.
>>>>>
>>>>>>> > On client side we have not specified any options explicitly. This is
>>>>>>> > from /proc/mounts entry
>>>>>>> > >rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys
>>>>>>>
>>>>>>> hm, not sure, maybe in your case, read and write buffer should be
>>>>>>> reduced so any new data should be transmitted ASAP. I was inspired by
>>>>>>> bufferbloat handling, but maybe I am wrong here somewhere
>>>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Ranjan

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Stackable file systems and NFS

2012-08-16 Thread Rajat Sharma
What is the pattern other NFS client is writing to the file? Can't it
be a legitimate NUL by any chance?

On Thu, Aug 16, 2012 at 1:22 PM, Ranjan Sinha  wrote:
> On Thu, Aug 16, 2012 at 1:00 PM, Rajat Sharma  wrote:
>> Correct me if I am reading something wrong, in your program listing,
>> while printing the buffer you are passing a total_count variable,
>> while vfs_read returned value is collected in count variable.
>>
>> debug_dump("Read buffer", buf, total_count);
>
> My apologies. Please read that as count only. A typo in the listing.
>
>>
>> One suggestion, please fill up buf with some fixed known pattern
>> before vfs_read.
>
> I tried that as well. It still comes out as ASCII NUL.
>
>>
>>> We have also noticed that the expected increase (inc) and the size
>> returned in (vfs_read()) is different.
>>
>> There is nothing which is blocking updates to file size between
>> vfs_getattr() and vfs_read(), right? no locking?
>
> No locking. On second thoughts I think this is ok since more data could be
> available between the calls to vfs_getattr and vfs_read as the other NFS 
> client
> is continuously writing to that file.
>
> --
> Ranjan
>
>
>>
>> -Rajat
>>
>> On Thu, Aug 16, 2012 at 12:01 PM, Ranjan Sinha  wrote:
>>> Hi,
>>>
>>> On Tue, Aug 14, 2012 at 4:19 PM, Rajat Sharma  wrote:
>>>> Try mounting with noac nfs mount option to disable attribute caching.
>>>>
>>>> ac / noac
>>>>
>>>> "Selects whether the client may cache file attributes. If neither
>>>> option is specified (or if ac is specified), the client caches file
>>>> attributes."
>>>
>>> i don't think this is because of attribute caching. The size does change and
>>> that is why we go to the read call (think of this is a simplified case of
>>> tail -f). The only problem is that sometimes when we read we get ASCII NUL 
>>> bytes
>>> at the end. If we read the same block again, we get the correct data.
>>>
>>> In addition, we cannot force specific mount options in actual deployment
>>> scenarios.
>>>
>>>
>>> 
>>>
>>>>> On Tue, Aug 14, 2012 at 5:10 PM, Ranjan Sinha  
>>>>> wrote:
>>>>> > For now, /etc/export file has the following setting
>>>>> > *(rw,sync,no_root_squash)
>>>>>
>>>>> hm, AFAIK that means synchronous method is selected. So,
>>>>> theoritically, if there is no further data, the other end of NFS
>>>>> should just wait.
>>>>>
>>>>> Are you using blocking or non blocking read, btw? Sorry, i am not
>>>>> really that good reading VFS code...
>>>>>
>>>
>>> This is a blocking read call. I think this is not because there is no data,
>>> rather somehow the updated data is not present in the VM buffers but the
>>> inode size has changed. As I just said, if we read the file again from the
>>> exact same location, we get the actual contents. Though after going through 
>>> the
>>> code I don't understand how is this possible.
>>>
>>>>> > On client side we have not specified any options explicitly. This is
>>>>> > from /proc/mounts entry
>>>>> > >rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys
>>>>>
>>>>> hm, not sure, maybe in your case, read and write buffer should be
>>>>> reduced so any new data should be transmitted ASAP. I was inspired by
>>>>> bufferbloat handling, but maybe I am wrong here somewhere
>>>>>
>>>
>>> --
>>> Regards,
>>> Ranjan

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Stackable file systems and NFS

2012-08-16 Thread Rajat Sharma
Correct me if I am reading something wrong: in your program listing,
while printing the buffer you pass the total_count variable, whereas
the value returned by vfs_read is collected in the count variable.

debug_dump("Read buffer", buf, total_count);

One suggestion: please fill buf with some fixed, known pattern
before vfs_read.
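
Something like this hypothetical helper is what I mean, using the
set_fs()-style kernel-space read from your listing:

#include <linux/fs.h>
#include <linux/string.h>
#include <linux/uaccess.h>

/* Poison the buffer with a non-NUL byte before vfs_read(), so any bytes
 * the read never touched stand out in the dump afterwards.
 */
static ssize_t poisoned_read(struct file *filp, char *buf, size_t count,
			     loff_t *pos)
{
	mm_segment_t old_fs;
	ssize_t ret;

	memset(buf, 0xAA, count);		/* 0xAA, not ASCII NUL */

	old_fs = get_fs();
	set_fs(KERNEL_DS);
	ret = vfs_read(filp, (char __user *)buf, count, pos);
	set_fs(old_fs);

	return ret;	/* in the dump: 0xAA tail = never written, 0x00 = written as NUL */
}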

> We have also noticed that the expected increase (inc) and the size
returned in (vfs_read()) is different.

There is nothing which is blocking updates to file size between
vfs_getattr() and vfs_read(), right? no locking?

-Rajat

On Thu, Aug 16, 2012 at 12:01 PM, Ranjan Sinha  wrote:
> Hi,
>
> On Tue, Aug 14, 2012 at 4:19 PM, Rajat Sharma  wrote:
>> Try mounting with noac nfs mount option to disable attribute caching.
>>
>> ac / noac
>>
>> "Selects whether the client may cache file attributes. If neither
>> option is specified (or if ac is specified), the client caches file
>> attributes."
>
> i don't think this is because of attribute caching. The size does change and
> that is why we go to the read call (think of this is a simplified case of
> tail -f). The only problem is that sometimes when we read we get ASCII NUL 
> bytes
> at the end. If we read the same block again, we get the correct data.
>
> In addition, we cannot force specific mount options in actual deployment
> scenarios.
>
>
> 
>
>>> On Tue, Aug 14, 2012 at 5:10 PM, Ranjan Sinha  wrote:
>>> > For now, /etc/export file has the following setting
>>> > *(rw,sync,no_root_squash)
>>>
>>> hm, AFAIK that means synchronous method is selected. So,
>>> theoritically, if there is no further data, the other end of NFS
>>> should just wait.
>>>
>>> Are you using blocking or non blocking read, btw? Sorry, i am not
>>> really that good reading VFS code...
>>>
>
> This is a blocking read call. I think this is not because there is no data,
> rather somehow the updated data is not present in the VM buffers but the
> inode size has changed. As I just said, if we read the file again from the
> exact same location, we get the actual contents. Though after going through 
> the
> code I don't understand how is this possible.
>
>>> > On client side we have not specified any options explicitly. This is
>>> > from /proc/mounts entry
>>> > >rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys
>>>
>>> hm, not sure, maybe in your case, read and write buffer should be
>>> reduced so any new data should be transmitted ASAP. I was inspired by
>>> bufferbloat handling, but maybe I am wrong here somewhere
>>>
>
> --
> Regards,
> Ranjan

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Stackable file systems and NFS

2012-08-14 Thread Rajat Sharma
Try mounting with noac nfs mount option to disable attribute caching.

ac / noac

"Selects whether the client may cache file attributes. If neither
option is specified (or if ac is specified), the client caches file
attributes."

-Rajat

On Tue, Aug 14, 2012 at 3:51 PM, Mulyadi Santosa
 wrote:
>
> Hi Ranjan...
>
> On Tue, Aug 14, 2012 at 5:10 PM, Ranjan Sinha  wrote:
> > For now, /etc/export file has the following setting
> > *(rw,sync,no_root_squash)
>
> hm, AFAIK that means synchronous method is selected. So,
> theoritically, if there is no further data, the other end of NFS
> should just wait.
>
> Are you using blocking or non blocking read, btw? Sorry, i am not
> really that good reading VFS code...
>
> > On client side we have not specified any options explicitly. This is
> > from /proc/mounts entry
> > >rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys
>
> hm, not sure, maybe in your case, read and write buffer should be
> reduced so any new data should be transmitted ASAP. I was inspired by
> bufferbloat handling, but maybe I am wrong here somewhere
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: How to know which programming model is used on may Linux machine

2012-07-25 Thread Rajat Sharma

Just compile the code with kbuild and print sizeof?
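
Or, even simpler, from plain user space; under LP64 (what Linux uses on
x86_64) int stays 4 bytes while long and pointers are 8, under LLP64 long
stays 4, and under ILP64 int is 8 as well:

#include <stdio.h>

int main(void)
{
	printf("int=%zu long=%zu long long=%zu void*=%zu\n",
	       sizeof(int), sizeof(long), sizeof(long long), sizeof(void *));
	return 0;	/* LP64 prints: int=4 long=8 long long=8 void*=8 */
}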

--
Rajat
From: Pritam Bankar
Sent: 25-07-2012 16:34
To: kernelnewbies
Subject: How to know which programming model is used on may Linux machine
Hi,

AFAIK there are three programming model that can be chosen on 64 bit
environment. These models are LP64, ILP64, LLP64

Question 1)  But how can I know which model is used on my system ?

Question 2) Does long long data type is limited for LLP64 type model ?

(My system is redhat 6.2 64bit)



Thanks

Pritam Bankar

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


RE: [RFC]confusions about 'struct' define

2012-05-30 Thread Rajat Sharma
This might be a case of cyclic dependency, where the header files defining
these structures also include this .h file.
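
For reference, a forward declaration only tells the compiler that the type
exists; pointers to the incomplete type are fine, and the full definition
can live in another header entirely (names below are made up):

struct foo;			/* forward declaration: incomplete type */

struct bar {
	struct foo *link;	/* OK: only a pointer, size of foo not needed */
};

struct foo {			/* real definition may come later or elsewhere */
	int value;
	struct bar *owner;
};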

-Rajat
From: harryxiyou
Sent: 30-05-2012 23:08
To: Gaurav Jain
Cc: Greg-Kroah-Hartman; Harry Wei; kernelnewbies@kernelnewbies.org
Subject: Re: [RFC]confusions about 'struct' define
On Thu, May 31, 2012 at 1:20 AM, Gaurav Jain  wrote:
Hi Gaurav,

> Those are forward declarations as they are being used in defining struct
> bus_attribute. It's nothing special about GNU-C. That's the case for ANSI-C
> too. Pretty standard.
>

Hmmm.., that is to say, they may be used before definitions in this file or
defined in other files like 'struct iommu_ops;' field (Actually, i can
not find this field's
definition in this file). However, if it has been defined in other
header files, we need
not declare here, right?



-- 
Thanks
Harry Wei

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Question on registering driver fops

2012-02-14 Thread Rajat Sharma
On Wed, Feb 15, 2012 at 10:15 AM, Ezequiel García wrote:

> El día 14 de febrero de 2012 19:42, Greg KH  escribió:
> > On Tue, Feb 14, 2012 at 07:05:48PM -0300, Ezequiel García wrote:
> >> I noticed that after registering a video driver with
> >> "video_register_device" the "open" function gets called.
> >> The registration is like:
> >>
> >> peasycap->video_device.fops = &v4l2_fops;
> >> peasycap->video_device.minor = -1;
> >> peasycap->video_device.release = (void *)(&videodev_release);
> >>
> >> video_set_drvdata(&(peasycap->video_device), (void *)peasycap);
> >>
> >> First question: who calls it and why does it open the device?
> >
> > Userspace probably.
> >
>
> I find hard that userspace is who opens the device in this particular case,
> because I get the "open" call just by plugging in the usb, right after
> usb_probe.
> (unless there is a daemon, or udev opens the device for module
> insertion or something).
>
> I will investigate this further.
>

Although it doesn't click easily, it is very simple. Implement the open fop and
put the following print in it:

printk("%s: My caller is %s\n", __func__, current->comm);

You will get to know the user-space daemon's name. I have seen a daemon on
some Linux distributions, called 'hald' (Hardware Abstraction Layer Daemon),
doing this sort of activity when it detects hardware changes. Even in the
case of a new mount point, it stats every mounted filesystem.
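
A minimal sketch of where that print would go (hypothetical names):

#include <linux/fs.h>
#include <linux/module.h>
#include <linux/sched.h>

static int demo_open(struct inode *inode, struct file *filp)
{
	/* current->comm names the user-space process doing the open() */
	printk(KERN_INFO "%s: my caller is %s (pid %d)\n",
	       __func__, current->comm, current->pid);
	return 0;
}

static const struct file_operations demo_fops = {
	.owner = THIS_MODULE,
	.open  = demo_open,
};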

-Rajat

>
> Thanks,
> Ezequiel.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Init global variables at run-time

2012-01-22 Thread Rajat Sharma
That means the initialization function of this driver is called multiple
times: once on module load, and possibly again from some reset functionality?
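
A tiny sketch of what the comment is guarding against (hypothetical names),
for a driver whose init routine can run again without the image being
reloaded:

static int open_count;		/* zeroed only when the image is first loaded */

static int demo_driver_init(void)
{
	/* re-initialise at run time so a second init/de-init cycle starts
	 * from a known state instead of whatever the previous run left behind
	 */
	open_count = 0;
	return 0;
}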

On Sun, Jan 22, 2012 at 2:14 AM, V l  wrote:
> Can some one throw more light on this statement=>
>
>  /* Init global variables at run-time, not as part of the declaration.
>  * This is required to support init/de-init of the driver.
> Initialization
>
>  * of globals as part of the declaration results in
> non-deterministic
>  * behaviour since the value of the globals may be different on the
>  * first time that the driver is initialized vs subsequent
> initializations.
>
>
> Thanks and Regards
>
> Sraddha
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Calling a module method from inside the kernel - is it possible Inbox

2012-01-21 Thread Rajat Sharma
Well, you can do that by exporting a kernel interface for registering
your callbacks; your module can then register static function pointers
to be called by the kernel.
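
A bare-bones sketch of such an interface (all names hypothetical): the
kernel side exports register/unregister symbols and calls through the
pointer, and the module fills it in at load time.

#include <linux/errno.h>
#include <linux/module.h>

typedef int (*demo_hook_t)(void *data);

static demo_hook_t demo_hook;		/* set by the loadable module */

int register_demo_hook(demo_hook_t fn)
{
	demo_hook = fn;
	return 0;
}
EXPORT_SYMBOL(register_demo_hook);

void unregister_demo_hook(void)
{
	demo_hook = NULL;
}
EXPORT_SYMBOL(unregister_demo_hook);

/* wherever the kernel wants to call into the module */
static int call_demo_hook(void *data)
{
	return demo_hook ? demo_hook(data) : -ENOSYS;
}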

-Rajat

On Sat, Jan 21, 2012 at 5:42 AM, SaNtosh kuLkarni
 wrote:
> Can you be more specific...wot do u mean by inside the kerneldo like
> want to call a function written inside another kernel module
>
>
> On Sat, Jan 21, 2012 at 3:38 PM, Kevin Wilson  wrote:
>>
>> Hi, all,
>>
>> I want to calling a module method  (I am developing the module)
>> from inside the kernel. How can I achieve it ?
>>
>> (BTW, I know that vice versa is possible by EXPORT_SYMBOL.)
>>
>> rgs,
>> Kevin
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
>
>
> --
> Regards,
> Santosh Kulkarni
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Re: Re: Problem with OOM killer killing process even when there is plenty of RAM.

2011-12-07 Thread Rajat Sharma
On Thu, Dec 8, 2011 at 1:41 AM, Greg KH  wrote:
> On Thu, Dec 08, 2011 at 01:26:42AM +0530, mindentropy wrote:
>> On Wednesday 07 Dec 2011 11:35:36 AM Greg KH wrote:
>> > On Thu, Dec 08, 2011 at 12:36:43AM +0530, mindentropy wrote:
>> > > On Thursday 08 Dec 2011 1:38:19 AM Mulyadi Santosa wrote:
>> > > > On Wed, Dec 7, 2011 at 01:47, mindentropy  
>> > > > wrote:
>> > > > > Hi,
>> > > > >
>> > > > >  I am trying to allocate 512MB of RAM in my driver loaded as a
>> > > > >  module,> > >
>> > > > > but the OOM killer starts killing all my processes. This machine
>> > > > > has
>> > > > > around 24GB RAM and is a 8 core Xeon. The RAM is allocated in
>> > > > > page size chunks (i.e. 131072 chunks each of size PAGE_SIZE).
>> > > >
>> > > > mind to tell us, how do you allocate memory? kmalloc?
>> > >
>> > > Via kmalloc.
>> >
>> > Please use the functions written to give you large memory chunks.  As
>> > you are using a video-for-linux device, why not use the apis that this
>> > framework provides you for this type of thing?
>> >
>>
>> Yes V4L would be done next. There are some things being tried out, this being
>> a custom FPGA hence some controls being experimented.
>>
>> As for the allocating in PAGE_SIZE chunks why am I running out of memory?
>
> The kernel doesn't like allocating such large sizes all at once.
>
>> Can't it steal smaller chunks from the larger chunks? Just curious about the
>> reason.
>
> Use vmalloc if you need/want larger chunks, kmalloc is not for that at
> all.
>
> greg k-h
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Probably, if it is some custom hardware and you can afford to allocate the
memory at boot time, reserve it exclusively for your own use.
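
And if boot-time reservation is not an option, the vmalloc route Greg
mentions avoids asking the buddy allocator for one huge physically
contiguous block; a minimal sketch (note that on a 32-bit kernel the
vmalloc area itself is only about 128 MB by default, so 512 MB really
wants a 64-bit kernel or a bigger vmalloc= area):

#include <linux/errno.h>
#include <linux/vmalloc.h>

static void *big_buf;

static int demo_alloc(void)
{
	/* 512 MiB, virtually contiguous, backed by individual pages */
	big_buf = vmalloc(512UL << 20);
	return big_buf ? 0 : -ENOMEM;
}

static void demo_free(void)
{
	vfree(big_buf);
}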

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Problem with OOM killer killing process even when there is plenty of RAM.

2011-12-06 Thread Rajat Sharma
A driver allocating 512M!!?? What kind of driver is it?

-Rajat

On Wed, Dec 7, 2011 at 12:17 AM, mindentropy  wrote:
> Hi,
>
>  I am trying to allocate 512MB of RAM in my driver loaded as a module, but
> the OOM killer starts killing all my processes. This machine has around 24GB
> RAM and is a 8 core Xeon. The RAM is allocated in page size chunks (i.e.
> 131072 chunks each of size PAGE_SIZE). I am using a 32 bit kernel with PAE
> enabled. The allocation works fine on a machine with 8GB of RAM.
>
> Thanks.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Spinlocks and interrupts

2011-11-10 Thread Rajat Sharma
For most block drivers, bio_endio() runs in the context of a tasklet
(softirq), so it is indeed atomic context.
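
For quick (and imperfect, see Jeff's caveats below) context probes while
debugging, the hardirq.h/preempt.h macros can be printed from the suspect
path; in_atomic() in particular is only a best-effort hint on
non-preemptible kernels:

#include <linux/hardirq.h>
#include <linux/irqflags.h>

static void dump_context(const char *where)
{
	printk(KERN_DEBUG "%s: hardirq=%d softirq=%d interrupt=%d atomic=%d irqs_off=%d\n",
	       where, !!in_irq(), !!in_softirq(), !!in_interrupt(),
	       !!in_atomic(), !!irqs_disabled());
}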

-Rajat

On Fri, Nov 11, 2011 at 4:50 AM, Kai Meyer  wrote:
>
>
> On 11/10/2011 04:00 PM, Jeff Haran wrote:
>>> -Original Message-
>>> From: kernelnewbies-boun...@kernelnewbies.org [mailto:kernelnewbies-
>>> boun...@kernelnewbies.org] On Behalf Of Kai Meyer
>>> Sent: Thursday, November 10, 2011 1:55 PM
>>> To: kernelnewbies@kernelnewbies.org
>>> Subject: Re: Spinlocks and interrupts
>>>
>>> Alright, to summarize, for my benefit mostly,
>>>
>>> I'm writing a block device driver, which has 2 entry points into my
>> code
>>> that will reach this critical section. It's either the make request
>>> function for the block device, or the resulting bio->bi_end_io
>> function.
>>> I do some waiting with msleep() (for now) from the make request
>> function
>>> entry point, so I'm confident that entry point is not in an atomic
>>> context. I also only end up requesting the critical section to call
>>> kmalloc from this context, which is why I never ran into the
>> scheduling
>>> while atomic issue before.
>>>
>>> I'm fairly certain the critical section executes in thread context not
>>> interrupt context from either entry point.
>>>
>>> I'm certain that the spinlock_t is only ever used in one function (a I
>>> posted a simplified version of the critical section earlier).
>>>
>>> It seems that the critical section is often called in an atomic
>> context.
>>> The spin_lock function sounds like it will only cause a second call to
>>> spin_lock to spin if it is called on a separate core.
>>>
>>> But, since I'm certain the critical section is never called from
>>> interrupt context, only thread context, the fact that pre-emption is
>>> disabled on the core should provide the protection I need with out
>>> having to disable IRQs. Disabling IRQs would prevent an interrupt from
>>> occurring while the lock is acquired. I would like to avoid disabling
>>> interrupts if I don't need to.
>>>
>>> So it sounds like spin_lock/spin_unlock is the correct choice?
>>>
>>> In addition, I'd like to be more confident in my assumptions above.
>> Can
>>> I test for atomic context? For instance, I know that you can call
>>> irqs_disabled(), is there a similar is_atomic() function I can call? I
>>> would like to put a few calls in different places to learn what sort
>> of
>>> context I'm.
>>>
>>> -Kai Meyer
>>>
>>> On 11/10/2011 12:19 PM, Jeff Haran wrote:
> -Original Message-
> From: kernelnewbies-
>>> bounces+jharan=bytemobile@kernelnewbies.org
> [mailto:kernelnewbies-
> bounces+jharan=bytemobile@kernelnewbies.org] On Behalf Of
>>> Dave
> Hylands
> Sent: Thursday, November 10, 2011 11:07 AM
> To: Kai Meyer
> Cc: kernelnewbies@kernelnewbies.org
> Subject: Re: Spinlocks and interrupts
>
> Hi Kai,
>
> On Thu, Nov 10, 2011 at 10:14 AM, Kai Meyer   wrote:
>> I think I get it. I'm hitting the scheduling while atomic because
 I'm
>> calling my function from a struct bio's endio function, which is
>> probably running with a lock held somewhere else, and then my
>> mutex
>> sleeps, while the spin_lock functions do not sleep.
> Actually, just holding a lock doesn't create an atomic context.
 I believe on kernels with kernel pre-emption enabled the act of
>> taking
 the lock disables pre-emption. If it didn't work this way you could
>> end
 up taking the lock in one process context and while the lock was
>> held
 get pre-empted. Then another process tries to take the lock and you
>> dead
 lock.

 Jeff Haran

>> Kai, you might want to try bottom posting. It is the standard on these
>> lists. It makes it easier for others to follow the thread.
>>
>> I know of no kernel call that you can make to test for current execution
>> context. There are the in_irq(), in_interrupt() and in_softirq() macros
>> in hardirq.h, but when I've looked at the code that implements them I've
>> come to the conclusion that they sometimes will lie. in_softirq()
>> returns non-zero if you are in a software IRQ. Fair enough. But based on
>> my reading in the past it's looked to me like it will also return
>> non-zero if you've disabled bottom halves from process context with say
>> a call to spin_lock_bh().
>>
>> It would be nice if there were some way of asking the kernel what
>> context you are in, for debugging if for no other reason, but if it's
>> there I haven't found it.
>>
>> I'd love to be proven wrong here, BTW. If others know better, please
>> enlighten me.
>>
>> Jeff Haran
>>
>>
>>
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
> I try to remember to bottom post on message lists, but obviously I've
> been negligent :)
>
> Perhaps I'll just add some calls to msleep() at various places to help
> me i

Re: Spinlocks and interrupts

2011-11-09 Thread Rajat Sharma
Hi Dave,

> Also, remember that spin-locks are no-ops on a single processor
> machine, so as coded, you have no protection on a single-processor
> machine if you're calling from thread context.

Not completely true: a spinlock can still provide protection even in
this case, i.e. two thread contexts sharing a resource on UP. Remember
that on a preemptive kernel it disables preemption, so it is guaranteed
that the scheduler does not throw out this thread as long as the
spinlock is held.

I agree that a mutex could just as well be used for this example, but it
again depends on the situation: if you want to be quick about your task
and are 100% sure that the task does not sleep or take too many
processing cycles, disabling preemption (i.e. a spinlock in this case)
can be a better option.

Thanks,
Rajat

On Thu, Nov 10, 2011 at 10:17 AM, rohan puri  wrote:
>
>
> On Thu, Nov 10, 2011 at 3:10 AM, Dave Hylands  wrote:
>>
>> Hi Kai,
>>
>> On Wed, Nov 9, 2011 at 1:07 PM, Kai Meyer  wrote:
>> > When I readup on spinlocks, it seems like I need to choose between
>> > disabling interrupts and not. If a spinlock_t is never used during an
>> > interrupt, am I safe to leave interrupts enabled while I hold the lock?
>> > (Same question for read/write locks if it is different.)
>>
>> So the intention behind using a spinlock is to provide mutual exclusion.
>>
>> A spinlock by itself only really provides mutual exclusion between 2
>> cores, and not within the same core. To provide the mutual exclusion
>> within the same core, you need to disable interrupts.
>>
>> Normally, you would disable interrupts and acquire the spinlock to
>> guarantee that mutual exclusion, and the only reason you would
>> normally use the spinlock without disabling interrupts is when you
>> know that interrupts are already disabled.
>>
>> The danger of acquiring a spinlock with interrupts enabled is that if
>> another interrupt fired (or the same interrupt fired again) and it
>> tried to acquire the same spinlock, then you could have deadlock.
>>
>> If no interrupts touch the spinlock, then you're probably using the
>> wrong mutual exclusion mechanism. spinlocks are really intended to
>> provide mutual exclsion between interrupt context and non-interrupt
>> context.
>>
>> Also remember, that on a non-SMP (aka UP) build, spinlocks become
>> no-ops (except when certain debug checking code is enabled).
>>
>> --
>> Dave Hylands
>> Shuswap, BC, Canada
>> http://www.davehylands.com
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
> Nice explanation Dave.
>
> Regards,
> Rohan Puri
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How can I test if a physical address is already mapped or not.

2011-10-18 Thread Rajat Sharma
On Wed, Oct 19, 2011 at 6:46 AM, Jeff Haran  wrote:
>> -Original Message-
>> From: StephanT [mailto:stman937-linew...@yahoo.com]
>> Sent: Tuesday, October 18, 2011 6:01 PM
>> To: Jeff Haran; kernelnewbies
>> Subject: Re: How can I test if a physical address is already mapped or not.
>>
>>
>>
>> >>  Yes you are right when you are in user space. However in a kernel
>> >>  module this would be physical address.
>> >
>> > Nope. Unless you are using some really strange processor that I am not
>> familiar
>> > with, a memory read is a memory read. *pReg is going to read what's at
>> > virtual address 0xc000 regardless of whether you are executing in user
>> or
>> > kernel context. I know it's going to work this way on a PC with an Intel 
>> > x86
>> > processor.
>> >
>>
>>
>> OK, but how do you explain I can read 0xC000 and I get kicked out if
>> I try 0x0010.
>
> I can only assume that there is nothing mapped to virtual address 0x0010. 
> On a 32 bit PC, the bottom 3 GB of virtual memory is by default mapped to 
> user space, so if anything you are accessing memory in the current process' 
> virtual memory.
>
>>
>> >>  The 0xC000 falls under PCI address range. I guess on a PC Linux
>> >>  just doesn't map this address range or it maps it one2one (phys==virt)
>> >>  I can dereference this address and I get something plausible.
>> >
>> > On most 32 bit Intel processors running "standard" Linux 0xC000 is
>> > the first address of kernel VM. This is a function of kernel 
>> > configuration, but
>> > it's this way by default. It's still a virtual address. So yes, on your
>> > PC virtual address 0xC000 is mapped to physical RAM.
>> >
>>
>>
>> Would this mean by reading the 0xC000 I am reading the Linux code not
>> PCI registers?
>
> First part of the page table, I think. I don't really remember what the 
> kernel maps down there but I don't think its code.
> Read the Gorman book. It's a lot to get through, but you'll likely learn a 
> lot. There might even be something newer out there.
>
> Jeff Haran
>
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

Stephan,

From your post it looks like you are trying to simulate the environment
of an embedded Linux board on your PC? Did you check whether the embedded
kernel has the CONFIG_MMU option disabled? If that is the case, then
page-table translation is bypassed on your board and the CPU assumes
virtual address = physical address. However, the same is not true on a
PC, which uses page tables.

"Understanding Linux Kernel" book is also a good reference for
understanding memory management in Linux.

One piece of advice: when shifting from one piece of hardware to another,
be ready for surprises and open to unlearning things :)

Thanks,
Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How can I test if a physical address is already mapped or not.

2011-10-17 Thread Rajat Sharma
On Tue, Oct 18, 2011 at 1:34 AM, StephanT  wrote:
> Hi all,
> In a kernel module is there a way to test if an address range has
> been already mapped by the kernel or not.
> Ex:  (on a PC with the latest kernel)
> - an access to physical 0xC000 will return the value at this address
> - an access to physical 0x0010 will have the kernel kill the
> module/application
>            using the module. Dmesg: BUG: unable to handle kernel paging
> request at 0010.
>
> How can I test this condition before dereferencing the address.
> Thanks,
> Stephan.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

When you say "an access to a physical address", how are you managing to
access the physical address directly?

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Hooking exec system call

2011-09-23 Thread Rajat Sharma
> Untidy way : -
> Yes, you can do that by registering a new binary format handler. Whenever
> exec is called, a list of registered binary format handlers is scanned, in
> the same way you can hook the load_binary & load_library function pointers
> of the already registered binary format handlers.

The challenge with this untidy way is identifying the correct format. For
example, if you are interested in hooking only the ELF format, there is no
special signature within the registered format handler to identify that.
However, if a format handler recognizes the file header, its load_binary
will return 0; that gives you the hint that you are sitting on top of the
correct file format. A long time back I wrote a similar module in Linux to
do the same, but I can't share the code :)
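
A skeleton of the "register a new handler" flavour Rohan describes (all
names hypothetical; kernels of that era pass a struct pt_regs * to
load_binary). Returning -ENOEXEC declines the exec, so the real handlers
still load the binary and the hook only observes:

#include <linux/binfmts.h>
#include <linux/module.h>
#include <linux/ptrace.h>

static int demo_load_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
	printk(KERN_INFO "exec: %s\n", bprm->filename);
	return -ENOEXEC;	/* decline; search_binary_handler tries the next format */
}

static struct linux_binfmt demo_fmt = {
	.module      = THIS_MODULE,
	.load_binary = demo_load_binary,
};

static int __init demo_init(void)
{
	insert_binfmt(&demo_fmt);	/* head of the list, so it runs before ELF */
	return 0;
}

static void __exit demo_exit(void)
{
	unregister_binfmt(&demo_fmt);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");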

-Rajat

On Thu, Sep 22, 2011 at 3:14 PM, rohan puri  wrote:
>
>
> On Thu, Sep 22, 2011 at 1:53 PM, Abhijit Pawar 
> wrote:
>>
>> hi list,
>> Is there any way to hook the exec system call on Linux box apart from
>> replacing the call in System Call table?
>>
>> Regards,
>> Abhijit Pawar
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
> Tidy way : -
>
> You can do that from LSM (Linux security module).
>
> Untidy way : -
> Yes, you can do that by registering a new binary format handler. Whenever
> exec is called, a list of registered binary format handlers is scanned, in
> the same way you can hook the load_binary & load_library function pointers
> of the already registered binary format handlers.
>
> Regards,
> Rohan Puri
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: kthread Working

2011-09-13 Thread Rajat Sharma
On Mon, Sep 12, 2011 at 3:47 PM, Gaurav Mahajan
 wrote:
> Hi,
>
> I have a douby regarding  kthread working.
>
> How can we check that a kthread is working or exited.
> It  may exit beacuse of call to do_exit in threadfn or callling kthread_stop
> function on task_struct (correct me if i am weong).
>
> I want to check that if kthread has finished its work or not.
> If not then what should I call so that kthread gets killed.
> If it is finshhed then which function I should call so that as to free
> kthread resources.
>
> Regards,
> Gaurav Mahajan.
>
>
>
>
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

The kthread_stop() function should itself block until the thread gets
killed. However, if the thread has already exited, its task_struct won't
be valid. If it is your own thread, it should be fairly easy for you to
keep a single exit point, i.e. notify the thread to stop only via
kthread_stop().

However, if you are up to hacking some existing kernel thread,
kthread_stop() may not work on an already-exited task, as there won't be
any way to identify a killed thread. You should probably avoid such
design choices.
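
A minimal sketch of that single-exit-point pattern (hypothetical names):

#include <linux/delay.h>
#include <linux/err.h>
#include <linux/kthread.h>

static struct task_struct *worker;

static int worker_fn(void *data)
{
	while (!kthread_should_stop()) {
		/* ... one unit of work ... */
		msleep(100);
	}
	return 0;	/* just return; do not call do_exit() */
}

static int demo_start(void)
{
	worker = kthread_run(worker_fn, NULL, "demo_worker");
	return IS_ERR(worker) ? PTR_ERR(worker) : 0;
}

static void demo_stop(void)
{
	kthread_stop(worker);	/* blocks until worker_fn returns */
}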

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: difference between io_schedule() and schedule()

2011-08-10 Thread Rajat Sharma
On Wed, Aug 10, 2011 at 1:07 PM, Mulyadi Santosa
 wrote:
> Hi Rajat...
>
> On Tue, Aug 9, 2011 at 09:30, Rajat Sharma  wrote:
>> of course I looked at the source (obvious first step) before asking
>> question and further following tsk->in_iowait, it seems it is just
>> needed for accounting purpose.
>>
>>                        if (tsk->in_iowait) {
>>                                se->statistics.iowait_sum += delta;
>>                                se->statistics.iowait_count++;
>>                                trace_sched_stat_iowait(tsk, delta);
>>                        }
>>
>> Wanted to be sure of "is that it all about"? or I am missing something here?
>
> I got a feeling that the main show is:
>       blk_flush_plug(current);
> that name somehow indicates it is "forcing" read/write to happen,
> either real soon or at the earliest moment possible...
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>

Ah, I just missed that; that's actually a difference that counts. Thanks
for the pointer :)

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: difference between io_schedule() and schedule()

2011-08-09 Thread Rajat Sharma
On Mon, Aug 8, 2011 at 8:25 PM, Jonathan Neuschäfer
 wrote:
> On Sat, Aug 06, 2011 at 12:31:21PM +0530, Rajat Sharma wrote:
>> Hi All,
>>
>> What is the difference between io_schedule() and schedule(), is
>> io_schedule() more restrictive to shedule only I/O bound processes or
>> it just favours I/O bound processes. Any documentation link would be
>> great help.
>
> Have you even looked at the source code?
>
> From kernel/sched.c +5721:
> /*
>  * This task is about to go to sleep on IO. Increment rq->nr_iowait so
>  * that process accounting knows that this is a task in IO wait state.
>  */
> void __sched io_schedule(void)
> {
>        struct rq *rq = raw_rq();
>
>        delayacct_blkio_start();
>        atomic_inc(&rq->nr_iowait);
>        blk_flush_plug(current);
>        current->in_iowait = 1;
>        schedule();
>        current->in_iowait = 0;
>        atomic_dec(&rq->nr_iowait);
>        delayacct_blkio_end();
> }
> EXPORT_SYMBOL(io_schedule);
>
> HTH,
>        Jonathan Neuschäfer
>

Of course I looked at the source (the obvious first step) before asking
the question, and further following tsk->in_iowait, it seems it is just
needed for accounting purposes.

if (tsk->in_iowait) {
        se->statistics.iowait_sum += delta;
        se->statistics.iowait_count++;
        trace_sched_stat_iowait(tsk, delta);
}

I wanted to be sure: is that all it is about, or am I missing something here?

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


difference between io_schedule() and schedule()

2011-08-06 Thread Rajat Sharma
Hi All,

What is the difference between io_schedule() and schedule()? Is
io_schedule() restricted to scheduling only I/O-bound processes, or does
it just favour I/O-bound processes? Any documentation link would be a
great help.

Thanks,
Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Asynchronous read

2011-07-31 Thread Rajat Sharma
On Mon, Aug 1, 2011 at 8:15 AM, Adam Cozzette  wrote:
> On Sun, Jul 31, 2011 at 03:58:55PM -0700, Da Zheng wrote:
>> Hello,
>>
>> I'm trying to understand the read operation in VFS, and get confused by the
>> asynchronous and synchronous operations.
>>
>> At the beginning, do_sync_read() invokes aio_read, which is
>> generic_file_aio_read for ext4. generic_file_aio_read should be asynchronous
>> read. But what really confuses me is do_generic_file_read, which is called by
>> generic_file_aio_read. It seems to me do_generic_file_read implements
>> synchronous read as this is the only function I can find that copy data to 
>> the
>> user space by invoking the actor callback function. If do_generic_file_read 
>> is
>> synchronous, how can generic_file_aio_read be asynchronous?
>>
>> In do_generic_file_read, if the data to be read isn't in the cache, normally
>> page_cache_sync_readahead should be called. As far as I understand, when
>> page_cache_sync_readahead returns, the pages will be ready in the cache, but 
>> the
>> corresponding data in the disk isn't necessarily copied to the pages yet
>> (because it eventually only invokes submit_bio to submit the IO requests to 
>> the
>> block layer), so PageUptodate of the requested page might still return false,
>> and then do_generic_file_read tries to invoke readpage to read the page again
>> instead of waiting. Since the disk is always very slow, doesn't it just waste
>> CPU time? Or do I miss something?
>
> This is a bit puzzling. I haven't figured it out but here are some things I 
> came
> across as I was trying to solve the problem.
>
> First of all, this article might shine some light on the problem:
>
> http://lwn.net/Articles/170954/
>
> Essentially, a few years ago there was a simplification of the API and 
> aio_read
> and aio_write gained the ability to do vectored operations, making it possible
> to eliminate readv and writev. This even made it possible for drivers and
> filesystems to avoid implementing read() and write(), since the aio versions
> could take care of that.
>
> So my point is that I suspect that aio_read and aio_write are now often used 
> in
> cases where they're not actually expected to be asynchronous, just because it
> simplifies the API to be able to reuse those functions for synchronous
> operations. In fact the LWN article says:
>
>    Note that this change does not imply that asynchronous operations 
> themselves
>    must be supported - it is entirely permissible (if suboptimal) for
>    aio_read() and aio_write() to operate synchronously at all times.
>
> So perhaps generic_file_aio_read is not actually asynchronous? My only other
> guess is that whatever it does happens fast enough to count as asynchronous.
>
> --
> Adam Cozzette
> Harvey Mudd College Class of 2012
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

Linux libaio behaves asynchronously only if the file is opened with O_DIRECT.
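
A user-space sketch of what that looks like with libaio (illustrative only;
build with -laio, and note O_DIRECT needs an aligned buffer):

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>

int demo_aio_read(const char *path)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	void *buf;
	int fd = open(path, O_RDONLY | O_DIRECT);

	if (fd < 0 || io_setup(1, &ctx) < 0)
		return -1;
	if (posix_memalign(&buf, 512, 4096))
		return -1;

	io_prep_pread(&cb, fd, buf, 4096, 0);
	return io_submit(ctx, 1, cbs);	/* returns immediately; reap with io_getevents() */
}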

-Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Regarding remap_pfn_page

2011-07-21 Thread Rajat Sharma
On Thu, Jul 21, 2011 at 11:48 PM, mindentropy  wrote:
>>Sorry to kicks in...
> Was wondering where you were :)
>
>>hm, quite likely you run out of contigous virtual address space that
>>as big as you requested... in above, you requested 16 MiB, right?
>
> Not virtual address space but physical page frames. Yes I requested 16MB.
> I want a huge chunk for dma for a high bandwidth pcie(16GB/s!) device . I am
> thinking in terms of no scatter gather but generally any sane device would
> have one.
>

Do you see any performance implication from scatter-gather? I think
unless you hit limits, performance should not vary much. Just curious
to know what sort of device is targeting 16GB/s?

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Regarding remap_pfn_page

2011-07-21 Thread Rajat Sharma
On Wed, Jul 20, 2011 at 11:59 PM, mindentropy  wrote:
>>>huge range means you already have huge physical memory to be mapped
>>>and user space process has huge virtual memory area to accomodate
>>>that. Note that is mapping is specific to one particular process (or
>>>threads sharing process address space) so it is fairly possible to
>>>establish mmap.
>
>>Assuming you have 1 single page size of physical memory. Now when you mmap a
>>huge size would all the different virtual addresses be mapping onto the same
>>physical page frame?
>
> Apologize for multiple messages.
>
> On my x86_64 machine with 3GB of RAM I cannot mmap a of len=16*1024*1024. So I
> am assuming it did not find contiguous pages or has run out of contiguous
> pages(Again assuming that the amount of pages reserved for this is small
> either the memory hole or the high memory).
>
>

So your question is rather more confined: whether you can allocate 16MB
of physically contiguous memory, right? Mapping is not an issue here;
proper error handling will anyway tell you whether you succeeded or not.
I think with kmalloc you may not get memory larger than 4MB.
You can definitely allocate in smaller chunks and maintain a pool on top
of them (a simple array); with a linear one-to-one scheme between the
virtual address and your pool, you can assign them to the vma.

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Regarding remap_pfn_page

2011-07-20 Thread Rajat Sharma
On Tue, Jul 19, 2011 at 11:26 PM, mindentropy  wrote:
> Hi,
>  When I mmap pages via remap_pfn_page method would the physical frames
> assigned the linear address be contiguous? If yes what would happen if I mmap
> a huge range?
>
> Thanks.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

>  When I mmap pages via remap_pfn_page method would the physical frames
> assigned the linear address be contiguous?

The place where you typically call remap_pfn_range() is the mmap handler
of your device driver, where you want to map device memory into user
space. Note that the memory (physical pages) might already be allocated,
and there is no requirement for it to have a kernel linear address
assigned. For example, on i386 you might allocate pages from high memory
and remap them; these pages may not have a kernel linear address.
remap_pfn_range() just establishes a secondary virtual address mapping
for these pages in the calling process's page tables, irrespective of
whether the pages have a kernel linear address or not.

> If yes what would happen if I mmap a huge range?
A huge range means you already have that much physical memory to map and
the user-space process has a large enough virtual memory area to
accommodate it. Note that the mapping is specific to one particular
process (or to threads sharing that process' address space), so it is
perfectly possible to establish the mmap.
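
For reference, a bare-bones mmap handler built around remap_pfn_range()
(my_mmap and my_buf_phys are placeholder names; this is only a sketch,
not tested code):

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
        unsigned long size = vma->vm_end - vma->vm_start;

        /* my_buf_phys: physical address of the already allocated memory */
        return remap_pfn_range(vma, vma->vm_start,
                               my_buf_phys >> PAGE_SHIFT,
                               size, vma->vm_page_prot);
}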

Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Device mapper header file error

2011-07-14 Thread Rajat Sharma
On Wed, Jul 13, 2011 at 11:49 PM, Adil Mujeeb  wrote:
> Hi List,
>
>
>
> I am trying to compile a kernel module which uses the device mapper header
> files (dm.h and dm-bio-list.h). I checked in the stock kernel source that
> these files are present under drivers/md directory.
>
> When I try to build the module, it gives error for these header files as
> missing. I checked the Makefile and found that it includes the path
> (-I$(TOPDIR)/ drivers/md)
>
>
>
> But when I check the content of this directory on my machine it doesn’t
> include the header files:
>
> [root@localhost redhat]# ls /usr/src/kernels/2.6.18-8.el5-i686/drivers/md
>
> Kconfig  Makefile  raid6test
>
> [root@localhost redhat]#
>
>
>
> While the other header files referred by the kernel module under
> include/linux directory exists.
>
>
>
> My machine detail:
>
> [adil@localhost ~]$ uname -a
>
> Linux localhost.localdomain 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:21 EST 2007
> i686 i686 i386 GNU/Linux
>
> [adil@localhost ~]$
>
>
>
> Thanks in advance.
>
>
>
> Regards,
>
> Adil Mujeeb
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

Well, technically these are internal files of the dm module and are not
supposed to be used by external loadable kernel modules. What you see in
the /usr/src/kernels/2.6.18-8.el5-i686 directory is the kernel-devel
package, against which you can compile modules only if they stick to the
include/linux header files (as they are supposed to).

If you really want to compile against dm's internal files, use the
complete kernel source package and build against it on your test
machine. It will have everything you are looking for, including the .c
files.

Thanks,
Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How to write linux drivers?

2011-06-12 Thread Rajat Sharma
http://lwn.net/Kernel/LDD3/

On Sun, Jun 12, 2011 at 1:11 PM, Lalit Pratap Singh
 wrote:
> Hi all,
>
> I am linux kernel newbie, I am trying to understand linux kernel since very
> long.
> Can anyone tell me how to understand drivers, How can i start working on
> writing drivers?
>
> I just wanted to know that, How can i write the smallest driver and test it?
> Also suggest me where to find the specification of the target hardware, for
> which driver is to be written?
> Ex. floppy drive, bluetooth etc.
>
> I want some basic tutorial links also.
>
>
>
> Thanks
> Lalit Pratap
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: system call

2011-06-11 Thread Rajat Sharma
Search for SYSCALL_DEFINEn(syscall) in the Linux source, e.g.

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int,
mode) in fs/open.c

Here n depends on the number of arguments to the system call: 3 in the
case of open.

You would find open, read, write and close in the fs directory.
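
By the way, the macro just expands to an ordinary asmlinkage function, so
the definition above is roughly (details vary by kernel version)
equivalent to:

asmlinkage long sys_open(const char __user *filename, int flags, int mode);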

-Rajat

On Sat, Jun 11, 2011 at 3:09 PM, Venkateswarlu P
 wrote:
>
>
> Where do i find system call implementation code?
>
> for  read, write, open, close
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: System call question

2011-06-11 Thread Rajat Sharma
sys_call_table is the array of system call handlers (function pointers).
The syntax is how you typically index an array in GAS. %eax, i.e. the
EAX register, holds the system call number, which in turn is the index
into this array. Since the size of a pointer on a 32-bit platform is 4
bytes, you give 4 as the last argument. So you multiply %eax by 4 to
reach the exact function pointer, then dereference that pointer with *
(like the value-at operator in C) and call the system call handler.

Formally it is called indexed memory addressing and syntax is like:

base_address(offset_address, index, size)

The data value retrieved is located at

base_address + offset_address + index * size

In this example base_address is the address of sys_call_table,
offset_address is omitted so it defaults to zero (it tells you where to
start in the array and is typically not used), index is %eax and size
is 4.
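
In C terms the instruction is doing something like this (the handler
prototype is deliberately simplified, real handlers take arguments):

extern long (*sys_call_table[])(void);        /* simplified view of the table */

long (*handler)(void) = sys_call_table[eax];  /* eax = syscall number         */
handler();                                    /* call *sys_call_table(,%eax,4) */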

I recommend "Professional Assembly Language" book by Richard Blum from
Wrox publication.

Thanks,
Rajat
On Sat, Jun 11, 2011 at 1:05 AM, Naman shekhar Mishra
 wrote:
> Platform: x86 32 bit
> system_call() ,afetr doing some checks, looks up in the sys_call_table and
> finds the correct address of the system call and jumps to it. i.e.:
> call *sys_call_table(,%eax,4)
>
> Can you please explain this syntax to me. I think 'call' is the processor
> instruction which works on a function name.
> * is used in gas syntax to denote that the address following it is to be
> used as the jumping address. Tell me if I m wrong so far.
> () is used to index a memory location(GAS syntax again, I think). Please
> explain to me (,%eax,4). I have trouble understanding this syntax.
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Problems with hypercalls

2011-06-08 Thread Rajat Sharma
Are you doing 64-bit division on a 32-bit arch? If that is the case,
do_div() is worth considering.
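
A quick reminder of the calling convention: do_div() modifies its first
argument in place and returns the remainder, e.g.:

u64 total = 10000000000ULL;
u32 rem;

rem = do_div(total, 1000);   /* total now holds the quotient, */
                             /* rem holds the remainder       */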

On Wed, Jun 8, 2011 at 3:25 PM, Mulyadi Santosa
 wrote:
> On Tue, Jun 7, 2011 at 15:39, emilie lefebvre  wrote:
>> "divide error:  [#1] SMP
>> ...
>>  [] panic+0x78/0x137
>>  [] oops_end+0xe4/0x100
>>  [] die+0x5b/0x90
>>  [] do_trap+0xc4/0x160
>>  [] do_divide_error+0x8f/0xb0
>>  [] ? my_function+0xdc/0xe70 "
>>
>> Could you have any suggestions ?
>
> Could you show us your code? perhaps by pasting them somewhere?
>
> >From what I guess, sounds like your code did some math (directly or
> indirectly) that fiddle with floating point numbers?
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: mmap on a block device

2011-06-06 Thread Rajat Sharma
Yes, it's certainly possible; go ahead and try it.

On Tue, Jun 7, 2011 at 9:57 AM, Kaustubh Ashtekar  wrote:
> Hello All,
> Is it possible to mmap on a block device? I am writing my own kernel space
> filesystem on Linux (academic project). I am writing a user space tool to
> format the block device (again my own ramdisk driver). It would be much
> simpler if I could just mmap the block device into my address space.
> -Kaustubh
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How a program could generate the memory addresses for its variables, when it is about to run?

2011-05-26 Thread Rajat Sharma
Hi Sandeep,

Probably you want to look at how your program is loaded into memory. For
example, an ELF binary is understood by the ELF format handler inside the
kernel. Format handlers supply their load_binary methods to load a
program image into memory and initialize its different virtual memory
areas (stack, heap etc.). The exec system call searches for the correct
format handler for you based on the file header.
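
For a feel of how a format handler plugs in, here is a skeleton
(my_load_binary/my_format are made-up names and the hooks differ a bit
across kernel versions; see fs/binfmt_elf.c for the real thing):

static int my_load_binary(struct linux_binprm *bprm, struct pt_regs *regs);

static struct linux_binfmt my_format = {
        .module      = THIS_MODULE,
        .load_binary = my_load_binary,   /* parse the header, set up the  */
};                                       /* VMAs, jump to the entry point */

/* somewhere in the init path */
register_binfmt(&my_format);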

Please go thoroughly through Understanding the Linux Kernel, 3rd
Edition, Chapter 20, "Program Execution".

-Rajat

On Fri, May 27, 2011 at 12:14 PM, sandeep kumar
 wrote:
> hi all,
> I am new to the linux kernel internals. I know there is a memory management
> subsystem which handles all the memory related things.
>
> But Now i want to know a bit deeper how things work.
>
> I want to start with the following question,
> How a program could generate the memory addresses for its variables, when it
> is about to run?
>
> Can please somebody give pointers how to learn this kind of things like,
> "in the early stages (when our program is about to be executed..about to
> become a process) what are the things that will be done by the kernel?"
>
> Please help me in this regard,
>
> Thanking you,
> Sandeep Kumar A.
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How to retrieve page pointer from vm_area_struct?

2011-04-17 Thread Rajat Sharma
Use get_user_pages(); it walks the vma and pins the pages in memory as
well, and is typically used by the filesystem direct I/O implementation.
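
A typical calling sequence looks roughly like this (uaddr and NPAGES are
placeholders, and the exact signature differs a bit between kernel
versions):

struct page *pages[NPAGES];
int i, got;

down_read(&current->mm->mmap_sem);
got = get_user_pages(current, current->mm,
                     uaddr,       /* page-aligned user virtual address */
                     NPAGES,      /* how many pages to pin             */
                     1,           /* write access wanted               */
                     0,           /* don't force                       */
                     pages, NULL);
up_read(&current->mm->mmap_sem);

/* ... do the I/O against the pinned pages ... */

for (i = 0; i < got; i++)
        page_cache_release(pages[i]);   /* drop the references when done */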

Thanks,
Rajat

On Sun, Apr 17, 2011 at 9:34 AM, Park Chan ho  wrote:
> Hi all,
> I want to retrieve "struct page" pointer from vm_area_struct.
> How do I write code below example?
>
> example code
> struct task_struct *p;
> struct vm_area_struct *vma;
> struct page *page;
>
> for_each_process(p) {
>    if (!p->mm) continue;
>    for (vma = p->mm->mmap; vma; vma = vma->vm_next) {
>       for (i = vma->vm_start; i < vma->vm_end; i += PAGE_SIZE) {
>            /* How to get page pointer? */
>            page = ???
>       }
>    }
> }
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: console debugging without a serial line

2011-03-23 Thread Rajat Sharma
If a virtual machine is feasible in your debug environment, there is no
need even for a serial port. For example, the following blog shows how to
set up kgdb with Sun VirtualBox:

http://fotis.loukos.me/blog/?p=25

It has actually been helpful in my projects.

Rajat

On Thu, Mar 24, 2011 at 3:25 AM, julie Sullivan
 wrote:
> Hi list
> On linux-kernel today there were a couple of interesting tips for
> debugging the kernel from a console in the absence of a serial port. I
> thought this might be useful/interesting to someone here (it was to
> me) so here's the link:
>
> https://lkml.org/lkml/2011/3/23/1
>
> Andi Kleen mentions the following:
>
>>There is also USB debugport, but you need a quite expensive cable
>>(~$100) for it. Usually it only works on one of the USB ports.
>
> Googling for 'USB Debugport' found this, which also might be useful:
>
> http://www.coreboot.org/EHCI_Debug_Port#AMIDebug_RX
>
> Does anyone (who might also be interested in reading this article)
> know if the 'Ajays NET20DC USB Debug Device' mentioned in this article
> might be the 'quite expensive cable' mentioned by Andi? (I saw one of
> these listed for $89.)  Apart from generally coming in handy, I have a
> netbook I'm thinking of sacrificing as a kernel testbox in the future
> which I'm sure doesn't have a serial port... (yep, I think that's a
> really great excuse for buying yet another new funky device :-)  )
>
> Cheers
> Julie
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: GSoC project idea: HFS Plus journal

2011-03-20 Thread Rajat Sharma
On Mon, Mar 21, 2011 at 2:15 AM, Greg Freemyer  wrote:
> On Sun, Mar 20, 2011 at 1:07 PM, Naohiro Aota  wrote:
>> Greg KH  writes:
>>
>>> On Thu, Mar 17, 2011 at 06:57:27PM +0900, Naohiro Aota wrote:
 Hi,

 I see some of you are talking about GSoC participation. I'm also
 thinking of it.

 I have MacBookPro booting both MacOSX and Linux. Now Linux can write the
 filesystem safely only when HFS Plus journal is off. My life would be
 more better if Linux have complete HFS Plus filesystem read/write
 support.

 So I'm thinking of implementing HFS Plus Journal support on Linux. I've
 searched and found a technote about HFS Plus format describe its Journal
 [1].

 How do you think about it?
>>>
>>> Looks like a nice self-contained, project proposal, good luck!
>>
>> Thanks. I need someone to mentor me :) How can I find him/her? Maybe I
>> should post developing mailing list. But I don't find HFS+ developing
>> list :(
>>
>> Regards,
>
> You can try linux-fsde...@vger.kernel.org,
>
> That is where generic file system and vfs discussions take place.  I
> assume all  the major file system developers subscribe to that.
>
> Greg (not KH)
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

The code files for HFS mention the following copyright:

Copyright (C) 1995-1997  Paul H. Hargrove
(C) 2003 Ardis Technologies 

probably you should talk to Paul and take a lock for your development :)

Rajat

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: release used memory

2011-03-11 Thread Rajat Sharma
> If I have enough input data, the memory is not enough (the "free" value gets
> smaller and smaller, and the "cached" value gets bigger and bigger). Then
> kswapd appears in the list of running processes. I would expect that Linux
> gets rid of the "cached" stuff so that the program gets more memory. Instead,
> nothing happens and kswapd is activated. What's the sense of keeping some
> filesystem buffers when they are not used anymore?

It is the filesystem's cache (more popularly known as the page cache in
Linux, the buffer cache historically).
Writing to memory (DRAM) is always faster than writing to disk, and
similarly, if you try to read back the same data and it is already in
the cache, it is faster to read it from DRAM. So the Linux page cache is
used for the following purposes:
1. write-back caching
2. read caching
3. prefetching more data when a file is read sequentially, i.e. doing a
single large read from disk instead of multiple small reads.

The Linux page cache is boundless: it can grow as large as the available
memory. However, there are kernel threads to do the cleanup work - the
bdflush threads (not kswapd). There are two things to be dealt with:
1. Writing dirty data to disk to make the pages in memory clean. Note
that the pages are still kept in the filesystem's page cache, but being
clean they can be dropped from the cache under memory pressure.
2. Reclaiming unwanted memory pages. This is done only if there is not
sufficient memory. Although there are multiple ways to free memory,
e.g. swapping out an idle process, page-cache pages are the most obvious
target. First, clean pages are dropped from the page cache; if that is
not sufficient, dirty pages are flushed to disk first and then dropped.

Hope it helps.

Rajat



On Fri, Mar 11, 2011 at 11:24 PM, Andreas Leppert  wrote:
> Hello,
>
> for study reasons, I evaluate a kernel module by running a parallel version of
> bzip. Input data are some senseless data files (500 x 10MB for example). While
> the bzip program runs, I took a look at the output of top and noticed
> something which I do not understand.
>
> In the following, I'm refering to that two lines of top which look something
> like that:
>
> Mem:   8185716k total,  5603224k used,  2582492k free,     9104k buffers
> Swap:  8388604k total,        0k used,  8388604k free,  5374400k cached
>
> While the bzip program runs, it uses memory and thus, the value before "free"
> in the Mem: line is getting smaller. As you can see above, there is also some
> "cached stuff" (5374400k). What is meant by this value? Someone explained me
> that it has to do something with filesystem buffers which were read or
> written. Could you elaborate on this?
>
> If I have enough input data, the memory is not enough (the "free" value gets
> smaller and smaller, and the "cached" value gets bigger and bigger). Then
> kswapd appears in the list of running processes. I would expect that Linux
> gets rid of the "cached" stuff so that the program gets more memory. Instead,
> nothing happens and kswapd is activated. What's the sense of keeping some
> filesystem buffers when they are not used anymore?
>
> The main problem is that I do not understand what by "buffers" and "cached" is
> meant. It would be nice if you can help me on this.
>
> Thanks in advance
> Andreas
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: project ideas

2011-03-10 Thread Rajat Sharma
> You may want to try NFS version 4, nfs.sourceforge.net

Could you be more specific? Version 4 has long been in the mainline kernel.

Rajat

On Thu, Mar 10, 2011 at 2:03 PM, chandrakant kumar
 wrote:
> Hi Mohit
>
> You may want to try NFS version 4, nfs.sourceforge.net
>
> On 3/10/11, mohit verma  wrote:
>> Hi all  ,
>>
>> I am seeking for any project on Networking related stuff ( in kernel land).
>> I have visited the kernelnewbies.org janitors and kernel projects. Among
>> them i found two projects of my interest but both of them are completed. So
>> can anyone suggest me any ideas?
>>
>>
>> I would appreciate any suggestions and ideas.
>>
>> --
>> 
>> *MOHIT VERMA*
>>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: How to read data of specified disk block in ext2 ?

2011-02-19 Thread Rajat Sharma
Do you mean reading a disk block from an ext2-formatted disk partition?
Open the device file, e.g. /dev/sda1, and read the desired block.
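
From user space that is just (sketch, no error checking; BLKSIZE and
blkno are whatever your filesystem uses):

int fd = open("/dev/sda1", O_RDONLY);
char buf[BLKSIZE];

lseek(fd, (off_t)blkno * BLKSIZE, SEEK_SET);   /* seek to the block */
read(fd, buf, BLKSIZE);
close(fd);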

Rajat

On Sat, Feb 19, 2011 at 4:09 PM, kashish bhatia  wrote:
> Hi All,
>
> Does anybody know a function which can be used to read the data of disk
> block in ext2 fs?
> --
> Regards,
> Kashish
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Page table queries

2011-02-19 Thread Rajat Sharma
> so after the setting of user space virtual address the physical memory can
> be directly addreseed by just doing a page table translation in user space
> and kernel doesn't need to be involved.As it has already created the page
> table and returned the corresponding page.Am i right?

Right. Basically this translation is done by the CPU while executing any
user-mode instruction that accesses this mapped memory region (vma).
Note that this is a user vma being accessed in user mode, so the access
is allowed at RPL=3 and goes through smoothly :). However, if the page is
not mapped, the access results in a page fault exception which traps into
the kernel and calls the nopage method of the vma. So the kernel is
involved only to establish the mapping on demand.

You can say it is equivalent to accessing any other user data segment in
application space, e.g. the heap (malloc'ed area). If a page is not yet
mapped into the heap, the access traps into the kernel with a page fault,
and the heap's page fault handler (the nopage method of that vma)
allocates a page frame and assigns it to the vma. So you can see the
equivalence between the two; the only difference is in the kernel
handlers that map the page frames (RAM or device memory) the first time.
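
A stripped-down fault handler for such a vma might look like this (newer
kernels use .fault instead of .nopage, but the idea is the same;
my_vm_fault is a made-up name):

static int my_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct page *page;

        page = alloc_page(GFP_HIGHUSER);   /* may be a highmem page with */
        if (!page)                         /* no kernel mapping at all   */
                return VM_FAULT_OOM;

        vmf->page = page;   /* the caller maps it at the faulting address */
        return 0;
}

static const struct vm_operations_struct my_vm_ops = {
        .fault = my_vm_fault,
};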

Rajat

On Sat, Feb 19, 2011 at 3:02 PM, anish singh
 wrote:
>
>
> On Sat, Feb 19, 2011 at 2:44 PM, Rajat Sharma  wrote:
>>
>> > So, in the start, the page might have kernel mode address, but in the
>> > end it has user mode address. But kernel is still the one that tracks
>> > all the page...be it anonymous or non anonymous ones.
>>
>> Not really. In this particular case of .nopage (page fault handler) of
>> vma, we already have a user space virtual address for the faulty page,
>> we just need it to map it with a physical page frame (RAM or device
>> memory). Just allocate a physical page frame without any address,
>> caller for .nopage will take care of properly setting its user space
>> virtual address (page->va). So, no association with kernel space
>> virtual address is required, e.g. it can be high memory page too
>> without any kernel mapping.
>>
> so after the setting of user space virtual address the physical memory can
> be directly addreseed by just doing a page table translation in user space
> and kernel doesn't need to be involved.As it has already created the page
> table and returned the corresponding page.Am i right?
>>
>> Rajat
>>
>> On Sat, Feb 19, 2011 at 2:10 PM, Mulyadi Santosa
>>  wrote:
>> > Hi :)
>> >
>> > I'll try to help...
>> >
>> > On Sat, Feb 19, 2011 at 13:10, anish singh 
>> > wrote:
>> >> As i understood whenver a user space program is run it is represented
>> >> in
>> >> kernel using VMA which is managed by struct mm_struct
>> >> and whenever the program is trying to read/write to a memory location
>> >> in
>> >> user space it will be directed to physical address using PAGE TABLE
>> >> translation done by struct mm_struct(done in kernel space).Am i right?
>> >
>> > i think not "done" but mm_struct points to PGD that represent's the
>> > whole process address space. Using information provided there, MMU
>> > does translation.
>> >
>> >> Suppose a simple driver wants the user to directly access its device
>> >> memory
>> >> then we use mmap.This mmap associates a set of user space virtual
>> >> address
>> >> with device driver memory and it is done by creating kernel page tables
>> >> for
>> >> the user space virtual addresses.Is the page table translation done
>> >> everytime whenever user space does read/write to the device memory??
>> >
>> > if it's recently translated, quite likely it is already cached in TLB
>> > (translation look aside buffer)
>> >
>> >> In .nopage function call we return the page associated with the
>> >> physical
>> >> address which the user wants to associate with user space virtual
>> >> address.Is
>> >> the page address returned by the nopage function same as seen by the
>> >> user or
>> >> will it be converted to user space virtual address(range between 0-2
>> >> GB)?
>> >
>> > AFAIK, nopage is one of the functions that handle minor page fault, no?
>> >
>> > Anyway, memory allocation, usually start by kernel allocates a page.
>> > Then it is "handed" to user space memory allocator.
>> >
>> > So, in the start, the page might have kernel mode address, but in th

Re: Page Table query

2011-02-19 Thread Rajat Sharma
I keep seeing these errors too, every time I post anything to
kernelnewbies. Initially I kept ignoring them, but it looks like I am not
the only one facing this.

As a resolution to this, we need to remove the id "a...@fibcom.com" from
the kernelnewbies subscriber list.

Rajat

On Sat, Feb 19, 2011 at 12:28 PM, anish singh
 wrote:
>
>
> On Sat, Feb 19, 2011 at 12:23 PM, Anuz Pratap Singh Tomar
>  wrote:
>>
>>
>> On Sat, Feb 19, 2011 at 6:40 AM, anish singh 
>> wrote:
>>>
>>> As i understood whenver a user space program is run it is represented in
>>> kernel using VMA which is managed by struct mm_struct
>>> and whenever the program is trying to read/write to a memory location in
>>> user space it will be directed to physical address using PAGE TABLE
>>> translation done by struct mm_struct(done in kernel space).Am i right?
>>>
>>> Suppose a simple driver wants the user to directly access its device
>>> memory then we use mmap.This mmap associates a set of user space virtual
>>> address with device driver memory and it is done by creating kernel page
>>> tables for the user space virtual addresses.Is the page table translation
>>> done everytime whenever user space does read/write to the device memory??
>>>
>>> In .nopage function call we return the page associated with the physical
>>> address which the user wants to associate with user space virtual address.Is
>>> the page address returned by the nopage function same as seen by the user or
>>> will it be converted to user space virtual address(range between 0-2 GB)?
>>>
>>> Thanks for reading.
>>
>> Can you please not post the same stuff twice?
>> Be patient, people do reply to queries here quite often, why show this
>> desperation?
>
>
> The reason i sent the mail twice was due to below mail which i am
> recieving.However sorry for causing any trouble.
> I am getting this erro whenever i am trying to send mail to kernelnewbie.
>
> The attached message had PERMANENT fatal delivery errors!
> After one or more unsuccessful delivery attempts the attached message has
> been removed from the mail queue on this server.  The number and frequency
> of delivery attempts are determined by local configuration parameters.
> YOUR MESSAGE WAS NOT DELIVERED TO ANY OF IT'S RECIPIENTS!
> --- Session Transcript ---
>  Sat 2011-02-19 12:20:43: Parsing message
> 
>  Sat 2011-02-19 12:20:43: *  From: anish198519851...@gmail.com
>  Sat 2011-02-19 12:20:43: *  To: a...@fibcom.com
>  Sat 2011-02-19 12:20:43: *  Subject: Page Table query
>  Sat 2011-02-19 12:20:43: *  Message-ID:
> 
>  Sat 2011-02-19 12:20:43: *  Route slip host: 202.71.129.104
>  Sat 2011-02-19 12:20:43: *  Route slip port: 25
>  Sat 2011-02-19 12:20:43: Attempting SMTP connection to [202.71.129.104]
>  Sat 2011-02-19 12:20:43: Attempting to send message to smart host
>  Sat 2011-02-19 12:20:43: Attempting SMTP connection to [202.71.129.104:25]
>  Sat 2011-02-19 12:20:43: Waiting for socket connection...
>  Sat 2011-02-19 12:20:43: *  Connection established (192.5.1.2:1859 ->
> 202.71.129.104:25)
>  Sat 2011-02-19 12:20:43: Waiting for protocol to start...
>  Sat 2011-02-19 12:20:43: <-- 220 smtp ready smtp.bizmail.net4india.com
>  Sat 2011-02-19 12:20:43: --> EHLO fibcom.com
>  Sat 2011-02-19 12:20:43: <-- 250-smtp.bizmail.net4india.com Hello
> fibcom.com [203.101.81.201]
>  Sat 2011-02-19 12:20:43: <-- 250-SIZE 57671680
>  Sat 2011-02-19 12:20:43: <-- 250-PIPELINING
>  Sat 2011-02-19 12:20:43: <-- 250 HELP
>  Sat 2011-02-19 12:20:43: --> MAIL From:
> SIZE=6874
>  Sat 2011-02-19 12:20:43: <-- 250 OK
>  Sat 2011-02-19 12:20:43: --> RCPT To:
>  Sat 2011-02-19 12:20:43: <-- 550 relay not permitted
> --- End Transcript ---
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Page table queries

2011-02-19 Thread Rajat Sharma
> So, in the start, the page might have kernel mode address, but in the
> end it has user mode address. But kernel is still the one that tracks
> all the page...be it anonymous or non anonymous ones.

Not really. In this particular case of the vma's .nopage (page fault)
handler, we already have a user-space virtual address for the faulting
access; we just need to map it to a physical page frame (RAM or device
memory). Just allocate a physical page frame without any address: the
caller of .nopage will take care of wiring it up at that user-space
virtual address in the process page tables. So no association with a
kernel-space virtual address is required; it can even be a high-memory
page without any kernel mapping.

Rajat

On Sat, Feb 19, 2011 at 2:10 PM, Mulyadi Santosa
 wrote:
> Hi :)
>
> I'll try to help...
>
> On Sat, Feb 19, 2011 at 13:10, anish singh  
> wrote:
>> As i understood whenver a user space program is run it is represented in
>> kernel using VMA which is managed by struct mm_struct
>> and whenever the program is trying to read/write to a memory location in
>> user space it will be directed to physical address using PAGE TABLE
>> translation done by struct mm_struct(done in kernel space).Am i right?
>
> i think not "done" but mm_struct points to PGD that represent's the
> whole process address space. Using information provided there, MMU
> does translation.
>
>> Suppose a simple driver wants the user to directly access its device memory
>> then we use mmap.This mmap associates a set of user space virtual address
>> with device driver memory and it is done by creating kernel page tables for
>> the user space virtual addresses.Is the page table translation done
>> everytime whenever user space does read/write to the device memory??
>
> if it's recently translated, quite likely it is already cached in TLB
> (translation look aside buffer)
>
>> In .nopage function call we return the page associated with the physical
>> address which the user wants to associate with user space virtual address.Is
>> the page address returned by the nopage function same as seen by the user or
>> will it be converted to user space virtual address(range between 0-2 GB)?
>
> AFAIK, nopage is one of the functions that handle minor page fault, no?
>
> Anyway, memory allocation, usually start by kernel allocates a page.
> Then it is "handed" to user space memory allocator.
>
> So, in the start, the page might have kernel mode address, but in the
> end it has user mode address. But kernel is still the one that tracks
> all the page...be it anonymous or non anonymous ones.
>
> Does it help? :)
>
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Mapping memory between kernel and user space

2011-02-16 Thread Rajat Sharma
> So virtual address allocated to processes in user space is mapped to kernel
> virtual address(3g-4g range)  using page tables?

There are two types of virtual address: user-mode virtual addresses,
which are mapped between 0-3G, and kernel-mode virtual addresses, which
are mapped from 3G-4G.

Page table entries for kernel-mode virtual addresses are taken from a
reference master kernel page table. Basically, any function which
establishes a kernel virtual address -> page frame mapping (e.g.
vmalloc() returns a kernel virtual address) makes an entry in the master
kernel page table. If a process then accesses the same virtual address,
the page fault handler looks the address up in the master page table
before giving up; if an entry is found, the process page table is updated
with it. You can say a user process' mapping of the kernel-mode virtual
address range is established only on access.

This scheme lets each process, kernel thread, interrupt handler -
basically every possible kernel-mode context - share the same kernel
virtual addresses, because all the mappings come from the master page
table entries.
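
vmalloc() is the classic example: its mapping lands in the master kernel
page table first, and shows up in a given process' own tables only when
that context first touches the address (a sketch):

void *buf = vmalloc(1 << 20);  /* kernel virtual address, 3G-4G on x86 */

if (!buf)
        return -ENOMEM;
memset(buf, 0, 1 << 20);       /* first touch from a new context may   */
                               /* fault once and copy the entry from   */
                               /* the master (init_mm) page table      */
vfree(buf);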

Thanks,
Rajat

On Wed, Feb 16, 2011 at 1:32 PM, anish singh
 wrote:
>
> On Wed, Feb 16, 2011 at 11:22 AM, Rajat Sharma  wrote:
>>
>> > But coming to kernel they distinguish logical address from virtual
>> > address. What is the main difference.
>>
>> kernel logical address has linear (one-to-one) mapping of physical
>> address to virtual address range. e.g. kernel logical address (linear
>> address) from 3G to 4G (on x86) can map physical memory of 0-1G, so it
>> is intutive to get physical address from a logical address by
>> subtrating 3G from logical address.
>>
>> while kernel virtual address can be though of as logical address with
>> no restriction of linear mapping. Then how they map to physical pages?
>> Well this is achieved through page tables mapping like user space
>> address, however the virtual address range falls in 3G-4G (on x86)
>> range only. Basically you can say it is the process mapping of kernel
>> virtual address range (3G-4G) in its page tables. CPU works through
>> page-tables hence requires kernel virtual address in code
>> instructions. So, it is possible thats a kernel page has kernel
>> virtual address as well as logical address.
>
> So virtual address allocated to processes in user space is mapped to kernel
> virtual address(3g-4g range)  using page tables?
>>
>> > Also, they emphasize on high memory and low memory. Why can not high
>> > memory can be mapped in to kernel completely.
>> > Why is that kernel has less visibility of complete space available on
>> > RAM.
>>
>> For a limited address space range of linear mapping, physical memory
>> has to be limited (one-to-one mapping). So, if your system (x86) has
>> more than 1G physical RAM, Linux provides some mechanism releasing
>> some small slot between (3G + 896M) to 4G for dynamically mapping High
>> phyical memory page frames (>896M physical address), since you can't
>> always map the complete physical RAM all the time. This dynamic
>> mapping is done through kmap().
>>
>> > what is very minimal implementation of MMU for real time systems.
>> # CONFIG_MMU is not set
>> means linear mapping of all physical address to virtual address. Not
>> sure, but seems it requires processor support to work on linear
>> address bypassing page-tables conversion.
>>
>> Rajat
>>
>> On Tue, Feb 15, 2011 at 10:45 PM, Sri Ram Vemulpali
>>  wrote:
>> > As was suggested I started reading chap 15v from LDD. I ran more into
>> > confusion state.
>> >
>> > I know that virtual address(process space), linear address
>> > (segmentation) and physical address. And how are they resolved from
>> > virtual to physical.
>> > But coming to kernel they distinguish logical address from virtual
>> > address. What is the main difference.
>> > Also, they emphasize on high memory and low memory. Why can not high
>> > memory can be mapped in to kernel completely.
>> > Why is that kernel has less visibility of complete space available on
>> > RAM.
>> > Linux MM is it a very specific implementation of linux, or Is that a
>> > traditional implementation.
>> >
>> > what is very minimal implementation of MMU for real time systems.
>> >
>> > Thanks in advance.
>> >
>> > --Sri.
>> >
>> > On Tue, Feb 15, 2011 at 12:54 AM, Ankita Garg  wrote:
>> >> On Wed, Feb 09, 2011 at 06:45:42PM -0500, Sri Ram Vemulpali wrote:
>> >>> Hi all,
&

Re: Mapping memory between kernel and user space

2011-02-15 Thread Rajat Sharma
> But coming to kernel they distinguish logical address from virtual
> address. What is the main difference.

A kernel logical address has a linear (one-to-one) mapping between the
physical address and the virtual address range. E.g. kernel logical
(linear) addresses from 3G to 4G (on x86) map physical memory from 0-1G,
so it is straightforward to get the physical address from a logical
address by subtracting 3G.
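
That linear relationship is exactly what __pa()/__va() implement (shown
for x86-32 with the default 3G/1G split; kaddr stands for any kernel
logical address):

/* PAGE_OFFSET == 0xC0000000 (3G) here */
unsigned long phys = __pa(kaddr);   /* kaddr - PAGE_OFFSET */
void *back         = __va(phys);    /* phys  + PAGE_OFFSET */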

A kernel virtual address, on the other hand, can be thought of as a
logical address without the restriction of linear mapping. How do they
map to physical pages then? This is achieved through page-table mappings,
just like user-space addresses, except that the virtual address range
falls only within 3G-4G (on x86). Basically you can say it is the
process' mapping of the kernel virtual address range (3G-4G) in its page
tables. The CPU works through page tables, hence code instructions use
kernel virtual addresses. So it is possible that a kernel page has both
a kernel virtual address and a logical address.

> Also, they emphasize on high memory and low memory. Why can not high
> memory can be mapped in to kernel completely.
> Why is that kernel has less visibility of complete space available on RAM.

With a limited address-space range for the linear mapping, the amount of
physical memory it can cover is limited too (it is a one-to-one mapping).
So if your (x86) system has more than ~1G of physical RAM, Linux keeps a
small window between 3G + 896M and 4G for dynamically mapping high-memory
page frames (physical addresses above 896M), since you cannot keep all of
physical RAM mapped at once. This dynamic mapping is done through
kmap().
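
In code that looks like (just a sketch):

struct page *page = alloc_page(GFP_HIGHUSER);  /* may come from ZONE_HIGHMEM */
void *vaddr;

vaddr = kmap(page);            /* borrow a temporary kernel mapping slot */
memset(vaddr, 0, PAGE_SIZE);
kunmap(page);                  /* give the slot back for reuse           */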

> what is very minimal implementation of MMU for real time systems.
# CONFIG_MMU is not set
means a linear mapping of all physical addresses to virtual addresses.
Not sure, but it seems to require processor support for working on
linear addresses, bypassing the page-table translation.

Rajat

On Tue, Feb 15, 2011 at 10:45 PM, Sri Ram Vemulpali
 wrote:
> As was suggested I started reading chap 15v from LDD. I ran more into
> confusion state.
>
> I know that virtual address(process space), linear address
> (segmentation) and physical address. And how are they resolved from
> virtual to physical.
> But coming to kernel they distinguish logical address from virtual
> address. What is the main difference.
> Also, they emphasize on high memory and low memory. Why can not high
> memory can be mapped in to kernel completely.
> Why is that kernel has less visibility of complete space available on RAM.
> Linux MM is it a very specific implementation of linux, or Is that a
> traditional implementation.
>
> what is very minimal implementation of MMU for real time systems.
>
> Thanks in advance.
>
> --Sri.
>
> On Tue, Feb 15, 2011 at 12:54 AM, Ankita Garg  wrote:
>> On Wed, Feb 09, 2011 at 06:45:42PM -0500, Sri Ram Vemulpali wrote:
>>> Hi all,
>>>
>>>   How do I map some space between kernel and user space. Can anyone
>>> point me in to right direction. I was trying to map the packets from
>>> my netfilter function to kernel user space, to avoid over head of
>>> copying. Thanks in advance.
>>>
>>
>> You can take a look at remap_pfn_range() routine when implementing mmap
>> in your driver.
>>
>> --
>> Regards,
>> Ankita Garg (ank...@in.ibm.com)
>> Linux Technology Center
>> IBM India Systems & Technology Labs,
>> Bangalore, India
>>
>
>
>
> --
> Regards,
> Sri.
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: directory entry

2011-02-15 Thread Rajat Sharma
Well, just do a readdir and count the entries yourself until you find a match.

-Rajat

On Tue, Feb 15, 2011 at 4:44 PM, mohit verma  wrote:
> hi all,
> is there any way to find out the offset or the directory entry number of a
> file in the directory ?
>
> let me explain a bit more:
>
> when we open a drectory via open(2) and find out the directory entry by
> readdir(2,3) or getdents(2) it automatically increases to the next
> directory entry and fills in the dirent structure.
> but in kernel space , let we have the parent directory name and child of it
> . so is there any way to figure out at what  offset or what number (exactly)
> that perticular child will occur in the directory contains?
>
> --
> 
> MOHIT VERMA
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Page Cache Address Space Concept

2011-02-14 Thread Rajat Sharma
One vital use of the address_space object is by filesystems to manage
the page cache of a file:
- caching recently accessed data in the page cache
- read-ahead (prefetching) of sequentially read data into the page cache
- supporting memory-mapped I/O through the page cache.

Look at the address_space_operations vector to see how this is achieved
through the readpage, readpages, writepage and writepages methods, which
populate and flush the page cache of an inode.
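
The shape of that vector for a hypothetical filesystem "myfs" (only the
methods mentioned above):

static const struct address_space_operations myfs_aops = {
        .readpage   = myfs_readpage,    /* fill one page-cache page from disk */
        .readpages  = myfs_readpages,   /* read-ahead: populate several pages */
        .writepage  = myfs_writepage,   /* flush one dirty page back to disk  */
        .writepages = myfs_writepages,  /* flush a range of dirty pages       */
};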

Thanks,
Rajat

On Mon, Feb 14, 2011 at 4:29 PM, piyush moghe  wrote:
> While going through Page Cache explanation in "Professional Linux Kernel"
> book I came across one term called "address space" ( not related to virtual
> or physical address space )
> I did not get what is the meaning of this address space, following is
> verbatim description:
> "To manage the various target objects that can be processed and cached in
> whole pages, the kernel uses an abstraction of
> the "address space" that associates the pages in memory with a specific
> block device (or any other system unit or part of a
> system unit).
> This type of address space must not be confused with the virtual and
> physical address spaces provided by the
> system or processor. It is a separate abstraction of the Linux kernel that
> unfortunately bears the same name.
> Initially, we are interested in only one aspect. Each address space has a
> "host" from which it obtains its data. In most
> cases, these are inodes that represent just one file.[2] Because all
> existing inodes are linked with their superblock (as
> discussed in Chapter 8), all the kernel need do is scan a list of all
> superblocks and follow their associated inodes to obtain
> a list of cached pages"
>
> Can anyone please explain what is the use of this and what this is all
> about?
> Regards,
> Piyush
>
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Mapping memory between kernel and user space

2011-02-11 Thread Rajat Sharma
Hi Sri,

LKD chapter 15 is the right resource for you. Specifically, you can look
at the remap_pfn_range() function to map I/O memory to user space.

Thanks,
Rajat

On Fri, Feb 11, 2011 at 6:12 PM, YOUNGWHAN SONG  wrote:
> Hi Sri,
>
>
> On Feb 10, 2011, at 11:56 AM, Sri Ram Vemulpali wrote:
>
>> Hi Santosa,
>>
>>   Can you please be more explicit. I do manage buffers internally in
>> my module.
>> Some cases if it full I will lose data. Can you please provide more
>> detailed explanation
>> on how to approach this. Thanks.
>>
>> --Sri
>>
>> On Wed, Feb 9, 2011 at 10:21 PM, Mulyadi Santosa
>>  wrote:
>>> On Thu, Feb 10, 2011 at 06:45, Sri Ram Vemulpali
>>>  wrote:
 Hi all,

  How do I map some space between kernel and user space. Can anyone
 point me in to right direction. I was trying to map the packets from
 my netfilter function to kernel user space, to avoid over head of
 copying. Thanks in advance.
>
> Isn't it possible if your driver supports mmap? Have you checked it out in 
> Linux Kernel Drive chapter 15?
>
>
>>>
>>> Not trying to discourage you, but I assume your "filtering" function
>>> will be engaged many many times in the case of rapid traffic...thus,
>>> the buffer might grow rapidly too, right? In that case, are you sure
>>> direct mapping could cope with it? Well unless you're ready to loose
>>> some data .
>>>
>>> Anyway, I think you can do that by reserve the buffer in user space
>>> and the get_user_page() them. As the bridge, a unique device with
>>> ioctl() might do the job.
>>>
>>> --
>>> regards,
>>>
>>> Mulyadi Santosa
>>> Freelance Linux trainer and consultant
>>>
>>> blog: the-hydra.blogspot.com
>>> training: mulyaditraining.blogspot.com
>>>
>>
>>
>>
>> --
>> Regards,
>> Sri.
>>
>> ___
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
> Daniel.
>
>
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

