Re: delete from hlist ???

2010-04-07 Thread ZhangMeng
Because the field "pprev" holds the address of the "next" field in the
previous node, *pprev = next; acts just like prev->next = next; it makes
the previous node point forward to the node after n.
Hope this helps.




On Thu, Apr 8, 2010 at 1:58 PM, Onkar Mahajan  wrote:

> I am not able to understand how the marked like skips the node n :
>
> static inline void __hlist_del(struct hlist_node *n)
> {
> struct hlist_node *next = n->next;
> struct hlist_node **pprev = n->pprev;
> *pprev = next; <---
> 
> if (next)
> next->pprev = pprev;
> }
>
> in this case
>
> static inline void __list_del(struct list_head * prev, struct list_head *
> next)
> {
> next->prev = prev;
> prev->next = next;
> }
>
> It is clear ...
>
>
> Please help.
>
> Regards,
> Onkar
>
>


-- 
Yours sincerely
ZhangMeng


Re: delete from hlist ???

2010-04-07 Thread Manish Katiyar
On Thu, Apr 8, 2010 at 11:28 AM, Onkar Mahajan  wrote:
> I am not able to understand how the marked like skips the node n :
>
> static inline void __hlist_del(struct hlist_node *n)
> {
>     struct hlist_node *next = n->next;
>     struct hlist_node **pprev = n->pprev;
>     *pprev = next; <--- 
>     if (next)
>         next->pprev = pprev;
> }
>
> in this case
>
> static inline void __list_del(struct list_head * prev, struct list_head *
> next)
> {
>     next->prev = prev;
>     prev->next = next;
> }
>
> It is clear ...

Why is the first case not clear? It is doing exactly the same
thing, except that in list_head both fields, prev and next,
are plain pointers, while in hlist next is a pointer but pprev is a
pointer to a pointer.

No?

Thanks -
Manish


>
>
> Please help.
>
> Regards,
> Onkar
>
>



-- 
Thanks -
Manish
==
[$\*.^ -- I miss being one of them
==

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecar...@nl.linux.org
Please read the FAQ at http://kernelnewbies.org/FAQ



Re: BH_Boundary flag

2010-04-07 Thread Joel Fernandes
> On Thu, Apr 8, 2010 at 11:34 AM, Joel Fernandes 
> wrote:
>>
>> Could anyone explain the significance of the BH_Boundary flag in a
>> buffer head. and when you should this flag be set or cleared?
>> What other effects does setting or clearing this flag have?
>>
> Set if the block to be submitted after this one will not be adjacent to this
> one.
>

Do you mean the next logical block (in an inode) is not physically
adjacent to the current block being submitted?

Wouldn't that be hard to determine? You would have to map the next
logical block to find its physical number and then check whether it
is adjacent to the current block.

Could you tell me what would happen if we never set this flag?

I appreciate your response, thanks a lot.

It might be a better idea to bottom post, people get pissed otherwise.
You could see a sample here:
http://en.wikipedia.org/wiki/Posting_style#Bottom-posting

Thanks,
-Joel




Re: BH_Boundary flag

2010-04-07 Thread Onkar Mahajan
Set if the block to be submitted after this one will not be adjacent to this
one.
On Thu, Apr 8, 2010 at 11:34 AM, Joel Fernandes wrote:

> Could anyone explain the significance of the BH_Boundary flag in a
> buffer head. and when you should this flag be set or cleared?
> What other effects does setting or clearing this flag have?
>
> Thanks,
> Joel
>
>
>


BH_Boundary flag

2010-04-07 Thread Joel Fernandes
Could anyone explain the significance of the BH_Boundary flag in a
buffer head, and when this flag should be set or cleared?
What other effects does setting or clearing this flag have?

Thanks,
Joel




delete from hlist ???

2010-04-07 Thread Onkar Mahajan
I am not able to understand how the marked line skips the node n:

static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;
	*pprev = next; <---
	if (next)
		next->pprev = pprev;
}

in this case

static inline void __list_del(struct list_head * prev, struct list_head *
next)
{
next->prev = prev;
prev->next = next;
}

It is clear ...


Please help.

Regards,
Onkar


list query

2010-04-07 Thread Onkar Mahajan
What are LIST_POISON1 and LIST_POISON2?

#define LIST_POISON1  ((void *) 0x00100100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x00200200 + POISON_POINTER_DELTA)

And why are they used?

Regards,
Onkar


Re: Good News

2010-04-07 Thread Shyamal Shukla
It seems my account was used by some kind of virus. Please
ignore any such mails, and sorry for the inconvenience.

I myself woke up to find this surprising mail had been sent out to all my contacts.

Thanks,
Shyamal

On Thu, Apr 8, 2010 at 4:51 AM, Shyamal Shukla wrote:

>  Dear Sir
> New month has come. Mooiaa will give you a surprising gift from April.
> All products will be sold at 15% - 30% discount. Good quality, quality
> of service, but more preferential price. The promotion will last only
> 45 days, from 1 April to 15 May, 2010. Browse  www.mooiaa66.info
> today!
> Best wishes!
>



-- 
Linux - because life is too short for reboots...




Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Nobin Mathew
On Wed, Apr 7, 2010 at 10:44 PM, H. Peter Anvin  wrote:
> On 04/07/2010 09:48 AM, Himanshu Aggarwal wrote:
>> I think for some architectures, the position of highmem is constrained
>> by hardware as well.  It is not always a kernel decision and not always
>> configurable as in case of x86.
>
>
> This is correct.
>
>> In case of MIPS32, low memory is between 0 and 512 MB and high memory
>> starts above 512 MB. Also the user space is of size 2 GB.
>>
>> Please see the definition of macros PAGE_OFFSET and HIGHMEM_START at :
>> http://lxr.linux.no/linux+v2.6.33/arch/mips/include/asm/mach-generic/spaces.h
>
> Right so far...
>
>> This is because MIPS32 processors have KSEG0 and KSEG1 segments lying
>> between 0 and 512 MB and KSEG2/3 lies above it.
>>
>> May be someone on the group can confirm this.
>
> Wrong.  I have to say this thread has been just astonishing in the
> amount of misinformation.
>
> On MIPS32, userspace is 0-2 GB, kseg0 is 2.0-2.5 GB and kseg1 is 2.5-3.0
> GB.  kseg2/3 (3.0-4.0 GB), which invokes the TLB, is used for the
> vmalloc/iomap/kmap area.
>
> LOWMEM has to fit inside kseg0, so LOWMEM is limited to 512 MB in thie
> current Linux implementation.

http://www.johnloomis.org/microchip/pic32/memory/memory.html

So what is the memory division here on MIPS, again 1:3?

And kseg2/3 is already a 1 GB address space?


-Nobin




Re: how to hook a syscall in kernel 2.6

2010-04-07 Thread Mulyadi Santosa
Hi...

On Thu, Apr 8, 2010 at 01:01, Elvis Y. Tamayo Moyares
 wrote:
> It's true. I managed to hook into the kernel 2.4 and 2.6 using LKM but how
> can do it in  2.6.30 or higher, not let me change the syscall table
> references ...
> when I add the LKM to stdout I get 'Killed'.
> and when I try to remove the LKM tells me that is in use.
> In some sites say that around 2.6.30 the syscall table is readonly.
> I need to know if there is another way to make the syscall hook arround
> 2.6.30

Please don't top post...

Another approach you can try is hooking the syscall's implementation
function directly. For example, if you want to hook fork, then IIRC
you can hook do_fork(). You may use a kprobe for this (if you
enabled it in your current kernel), or you may explore ftrace's
function hooks.

Have fun 

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com




Unable to change IRQ affinity

2010-04-07 Thread Abraham Arce
Hi,

I've been working with 2 drivers, an i2c-based touchscreen and a spi-based
ethernet. My problem comes when I try to change the affinity for
their irqs: the change is not reflected. For
the ethernet I have...

   # cat /proc/interrupts | grep eth0
   194:   4883  0GPIO  eth0
   # cat /proc/irq/194/smp_affinity
   3
   # echo 1 > /proc/irq/194/smp_affinity
   # cat /proc/irq/194/smp_affinity
   3

Is there any flag needed in the irq handler to allow changing the affinity?
I have tested irq affinity with a keypad driver and it works as
expected...

Thanks for your comments...

Best Regards
Abraham




Re: how to hook a syscall in kernel 2.6

2010-04-07 Thread Elvis Y. Tamayo Moyares
It's true. I managed to hook into kernel 2.4 and 2.6 using an LKM, but
how can I do it in 2.6.30 or higher? It does not let me change the
syscall table references...

When I load the LKM, I get 'Killed' on stdout,
and when I try to remove the LKM it tells me it is in use.
Some sites say that around 2.6.30 the syscall table became read-only.
I need to know if there is another way to hook a syscall around 2.6.30.

Elvis.

"Sangman Kim"  wrote:


Hello Elvis,

There are numerous ways you can do, once you have root privilege.
But if you don't, it is probably impossible without some illegal way.

Actually, system call hooking itself is not very proper thing even for
people with root,
but you can refer to many linux rootkit codes available in security sites.

Most of them use LKM(loadable kernel module)s to load their code,
and manipulate either syscall handler, the system call table, or other
structures available in kernel.
You can even manipulate page tables and make the code section writable with
your module.

Sangman


On Wed, Apr 7, 2010 at 8:43 AM, Elvis Y. Tamayo Moyares <
etmoya...@grm.uci.cu> wrote:


hi list
I need to hook a system call in kernel 2.6,for kernel 2.6.30 or higher it
is very dificulty. I have read in some places and tell me that in these
versions the system call table is read only. Is there any way to hook a
system call in kernel 2.6.30 or higher?
thanks in advance


This message was sent using IMP, the Internet Messaging Program.












This message was sent using IMP, the Internet Messaging Program.





Re: how to hook a syscall in kernel 2.6

2010-04-07 Thread Sangman Kim
Hello Elvis,

There are numerous ways you can do this once you have root privilege;
without root, it is probably impossible without some illegal trick.

Actually, system call hooking itself is not really a proper thing to do,
even for people with root,
but you can refer to the many Linux rootkit codes available on security sites.

Most of them use an LKM (loadable kernel module) to load their code,
and manipulate either the syscall handler, the system call table, or other
structures available in the kernel.
You can even manipulate the page tables and make the code section writable
from your module.

Sangman


On Wed, Apr 7, 2010 at 8:43 AM, Elvis Y. Tamayo Moyares <
etmoya...@grm.uci.cu> wrote:

> hi list
> I need to hook a system call in kernel 2.6,for kernel 2.6.30 or higher it
> is very dificulty. I have read in some places and tell me that in these
> versions the system call table is read only. Is there any way to hook a
> system call in kernel 2.6.30 or higher?
> thanks in advance
>
> 
> This message was sent using IMP, the Internet Messaging Program.
>
>
>
>
>


how to hook a syscall in kernel 2.6

2010-04-07 Thread Elvis Y. Tamayo Moyares

hi list
I need to hook a system call in kernel 2.6. For kernel 2.6.30 or higher it
is very difficult: I have read in some places that in these versions the
system call table is read-only. Is there any way to
hook a system call in kernel 2.6.30 or higher?

thanks in advance


This message was sent using IMP, the Internet Messaging Program.






Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Himanshu Aggarwal
I think for some architectures the position of highmem is constrained by
hardware as well. It is not always a kernel decision, and not always
configurable as it is in the case of x86.

In case of MIPS32, low memory is between 0 and 512 MB and high memory starts
above 512 MB. Also the user space is of size 2 GB.

Please see the definition of macros PAGE_OFFSET and HIGHMEM_START at :
http://lxr.linux.no/linux+v2.6.33/arch/mips/include/asm/mach-generic/spaces.h

This is because MIPS32 processors have KSEG0 and KSEG1 segments lying
between 0 and 512 MB and KSEG2/3 lies above it.

Maybe someone on the group can confirm this.

~Himanshu


On Wed, Apr 7, 2010 at 6:20 PM, Chetan Nanda  wrote:

>
>
> On Wed, Apr 7, 2010 at 5:40 PM, Xianghua Xiao wrote:
>
>> On Wed, Apr 7, 2010 at 12:48 AM, Venkatram Tummala
>>  wrote:
>> > I completely agree with you. I was just trying to clarify Xianghua's
>> > statement "last 128 MB is used for HIGHMEM". I got the feeling that he
>> > thought that last 128MB can be used for vmalloc, IO and for HIGHMEM. So,
>> i
>> > was clarifying that last 128MB is not "used for highmem" but it is used
>> to
>> > support highmem.(among many other things). That was what i intended.
>> >
>> > On Tue, Apr 6, 2010 at 7:09 PM, H. Peter Anvin  wrote:
>> >>
>> >> On 04/06/2010 07:04 PM, Venkatram Tummala wrote:
>> >> > Hey Xiao,
>> >> >
>> >> > last 128MB is not used for highmem. last 128MB is used for data
>> >> > structures(page tables etc.) to support highmem .  Highmem is not
>> >> > something which is "INSIDE" Kernel's Virtual Address space. Highmem
>> >> > refers to a region of "Physical memory" which can be mapped into
>> >> > kernel's virtual address space through page tables.
>> >> >
>> >> > Regards,
>> >> > Venkatram Tummala
>> >> >
>> >>
>> >> Not quite.
>> >>
>> >> The vmalloc region is for *anything which is dynamically mapped*, which
>> >> includes I/O, vmalloc, and HIGHMEM (kmap).
>> >>
>> >>-hpa
>> >>
>> >> --
>> >> H. Peter Anvin, Intel Open Source Technology Center
>> >> I work for Intel.  I don't speak on their behalf.
>> >>
>> >
>> >
>>
>> Thanks Venkatram, do these sound right:
>>
>> 1. All HIGHMEM(physical address beyond 896MB) are kmapped back into
>> the last 128MB kernel "virtual" address space(using page tables stored
>> in the last 128MB physical address). That also implies it's a very
>> limited virtual space for large memory system and need do kunmap when
>> you're done with it(so you can kmap other physical memories in).
>> I'm not familiar with large-memory systems, not sure how kmap cope
>> with that using this limited 128M window assuming kernel is 1:3 split.
>>
>
> Small correction here,  page-table is not store in last 128MB of physical
> memory. Instead the complete page table for 4GB virtual address space is
> stored in kernel data-structures (kernel data section).
>
> ~cnanda
>
> 2. The last 128MB physical address can be used for page tables(kmap),
>> vmalloc, IO,etc
>>
>> Regards,
>> Xianghua
>>
>>
>>
>


Re: __read_mostly

2010-04-07 Thread Onkar Mahajan
Yes, precisely the same as what I see.

Thanks,
Onkar

On Wed, Apr 7, 2010 at 9:16 PM, Mulyadi Santosa
wrote:

> Hi Onkar
>
> On Wed, Apr 7, 2010 at 17:56, Onkar Mahajan  wrote:
> > static struct hlist_head *inode_hashtable __read_mostly;
> >
> >
> > what is the use of  __read_mostly ?
> >
> > I did not find any detailed documentation for this
> > on GCC  website , Is it GCC optimization feature ?
>
> I failed to search the definition in kernel source or gcc docs.
> However, I am pretty confidence that this is somekind of definition to
> mark the related variable/structure/whatever and put them into the
> same memory segment.
>
> By doing so, the programmers are hoping that these data are grouped
> into the same cache line. And since they are mostly read only, cache
> won't be invalidated too much (maybe zero in some cases) if there is
> update into that cache line.
>
> Does this clear your doubt?
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>


Re: __read_mostly

2010-04-07 Thread Mulyadi Santosa
Hi Onkar

On Wed, Apr 7, 2010 at 17:56, Onkar Mahajan  wrote:
> static struct hlist_head *inode_hashtable __read_mostly;
>
>
> what is the use of  __read_mostly ?
>
> I did not find any detailed documentation for this
> on GCC  website , Is it GCC optimization feature ?

I failed to find the definition in the kernel source or the gcc docs.
However, I am pretty confident that it is some kind of annotation that
marks the related variable/structure/whatever and puts it into a
dedicated memory segment.

By doing so, the programmers are hoping that these data are grouped
into the same cache lines. And since they are mostly read-only, the cache
won't be invalidated much (maybe not at all in some cases), because
updates to those cache lines are rare.

Does this clear your doubt?

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com




Re: data sheet for SATA drive

2010-04-07 Thread Greg Freemyer
Onkar,

First thing you need to learn is not to Top Post.  I'll let you
research it, but it is basically taboo on lkml lists, so you will get
flamed if you do it on one of the official lists.

My answers below:

On Wed, Apr 7, 2010 at 1:49 AM, Onkar Mahajan  wrote:
> Greg,
>   My intention is to learn to write a SATA driver with the hardware
> that I have. I have a SATA hard drive from Western Digital (MDL :
> WD800JD-75MSAS)
> and SATA controller (Intel 82801 GB/GR/GH ( ICH7 family ) for which the
> drivers are already
> present. I want to unload the drivers and reverse engineer them and make
> them working.
> Is it a good way to learn SATA/SCSI drivers ?? Please guide me with your
> invaluable experience.
>
> Regards,
> Onkar

A little background.

First a little about SCSI:

SCSI is a layered protocol stack not too different than TCP/IP.  There
is actually SCSI-1, SCSI-2, and SCSI-3.  I believe those correlate
directly with layers 1, 2, 3 of the 7 layer network protocol stack
that I assume you are familiar with from TCP/IP type networking.

From the inception of libata (the new ata driver that is in 2.6.x
kernels) it has been part of the scsi subsystem, and I believe it replaces
SCSI-1 and SCSI-2 but uses SCSI-3 from the scsi stack itself.  That
is why sata drives are named /dev/sda etc.

fyi: usb does this too, but usb actually uses scsi-2 on the wire.  Then
within the external drive enclosure is a SAT (SCSI-to-ATA
translator).

So at the connection medium layer (replacing SCSI-1) we have the sata
cable and associated voltages.  I'll assume you don't care much about
that.

At the messaging layer (replacing scsi-2) we have the sata spec.
uhhh.. I mean we have the portion of the ATA-7 spec. associated with
sata devices.

You will definitely want to get your hands on a RFC version of the
ATA-7 spec.  Or even the ATA-8 spec. which has been seeing a lot of
work in the last couple years.  I don't know if it was ever finalized.
 I've posted a link to the ATA-8 draft spec. a few times but don't
have it handy right now.

fyi: I think most of the changes in ATA-8 relate to SSDs and your
asking about traditional rotating disk, so ATA-7 should be sufficient.

If you want to experiment with sata messaging, I'd suggest you look at
hdparm.  Its on sourceforge and Mark Lord keeps it pretty current.  It
can send a large number of the sata messages.  I don't think it can
send commands that actually update a sector with data.  So it is
mostly about the command and control, not payload.

Ok, now to a little more comparison to networking.

I assume you know in the LAN world we have NICs, switches, and
destination computers (ie. more nics).

In the sata world we have analogous items:

NIC => Sata Controller
Switch => Port Multiplier
destination computer => hard drive

So your request for a hdd spec. is sort of like being in the
networking world and saying "I want to write a TCP/IP driver, can some
one tell me where I can find the Win2008 spec. so I know how to talk
to it?"

Not very logical, right?  At some point you may have to add a "quirk"
in to your nic driver to support a weird behavior in the Win2008
stack, but it is by no means where you would start.

libata works the same.  The core libata driver handles the sata
messaging protocol for talking to disks and is implemented to follow
the ata-7 spec.  So for most hard drives they should just work and no
extra coding is needed at all.

But within the libata framework are sata controller drivers, just as
within the network portion of linux there are NIC drivers.

So what you need primarily are the sata spec (ata-7) and the controller spec.

I hope that at least explains things enough to let you try to read the
kernel code and begin to learn the libata subsystem.

Greg




Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Chetan Nanda
On Wed, Apr 7, 2010 at 5:40 PM, Xianghua Xiao wrote:

> On Wed, Apr 7, 2010 at 12:48 AM, Venkatram Tummala
>  wrote:
> > I completely agree with you. I was just trying to clarify Xianghua's
> > statement "last 128 MB is used for HIGHMEM". I got the feeling that he
> > thought that last 128MB can be used for vmalloc, IO and for HIGHMEM. So,
> i
> > was clarifying that last 128MB is not "used for highmem" but it is used
> to
> > support highmem.(among many other things). That was what i intended.
> >
> > On Tue, Apr 6, 2010 at 7:09 PM, H. Peter Anvin  wrote:
> >>
> >> On 04/06/2010 07:04 PM, Venkatram Tummala wrote:
> >> > Hey Xiao,
> >> >
> >> > last 128MB is not used for highmem. last 128MB is used for data
> >> > structures(page tables etc.) to support highmem .  Highmem is not
> >> > something which is "INSIDE" Kernel's Virtual Address space. Highmem
> >> > refers to a region of "Physical memory" which can be mapped into
> >> > kernel's virtual address space through page tables.
> >> >
> >> > Regards,
> >> > Venkatram Tummala
> >> >
> >>
> >> Not quite.
> >>
> >> The vmalloc region is for *anything which is dynamically mapped*, which
> >> includes I/O, vmalloc, and HIGHMEM (kmap).
> >>
> >>-hpa
> >>
> >> --
> >> H. Peter Anvin, Intel Open Source Technology Center
> >> I work for Intel.  I don't speak on their behalf.
> >>
> >
> >
>
> Thanks Venkatram, do these sound right:
>
> 1. All HIGHMEM(physical address beyond 896MB) are kmapped back into
> the last 128MB kernel "virtual" address space(using page tables stored
> in the last 128MB physical address). That also implies it's a very
> limited virtual space for large memory system and need do kunmap when
> you're done with it(so you can kmap other physical memories in).
> I'm not familiar with large-memory systems, not sure how kmap cope
> with that using this limited 128M window assuming kernel is 1:3 split.
>

Small correction here: the page table is not stored in the last 128MB of
physical memory. Instead, the complete page table for the 4GB virtual
address space is stored in kernel data structures (the kernel data section).

~cnanda

> 2. The last 128MB physical address can be used for page tables(kmap),
> vmalloc, IO,etc
>
> Regards,
> Xianghua
>
>
>


Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Xianghua Xiao
On Wed, Apr 7, 2010 at 12:48 AM, Venkatram Tummala
 wrote:
> I completely agree with you. I was just trying to clarify Xianghua's
> statement "last 128 MB is used for HIGHMEM". I got the feeling that he
> thought that last 128MB can be used for vmalloc, IO and for HIGHMEM. So, i
> was clarifying that last 128MB is not "used for highmem" but it is used to
> support highmem.(among many other things). That was what i intended.
>
> On Tue, Apr 6, 2010 at 7:09 PM, H. Peter Anvin  wrote:
>>
>> On 04/06/2010 07:04 PM, Venkatram Tummala wrote:
>> > Hey Xiao,
>> >
>> > last 128MB is not used for highmem. last 128MB is used for data
>> > structures(page tables etc.) to support highmem .  Highmem is not
>> > something which is "INSIDE" Kernel's Virtual Address space. Highmem
>> > refers to a region of "Physical memory" which can be mapped into
>> > kernel's virtual address space through page tables.
>> >
>> > Regards,
>> > Venkatram Tummala
>> >
>>
>> Not quite.
>>
>> The vmalloc region is for *anything which is dynamically mapped*, which
>> includes I/O, vmalloc, and HIGHMEM (kmap).
>>
>>        -hpa
>>
>> --
>> H. Peter Anvin, Intel Open Source Technology Center
>> I work for Intel.  I don't speak on their behalf.
>>
>
>

Thanks Venkatram, do these sound right:

1. All HIGHMEM (physical addresses beyond 896MB) is kmapped back into
the last 128MB of kernel "virtual" address space (using page tables stored
in the last 128MB of physical address space). That also implies it is a very
limited virtual space on a large-memory system, and you need to kunmap when
you're done with a mapping (so you can kmap other physical memory in).
I'm not familiar with large-memory systems, so I'm not sure how kmap copes
with that limited 128M window, assuming the kernel is a 1:3 split.

2. The last 128MB physical address can be used for page tables(kmap),
vmalloc, IO,etc

Regards,
Xianghua




Re: hlist_node v/s list_head

2010-04-07 Thread Neependra Khare
On Wed, Apr 7, 2010 at 5:09 PM, Onkar Mahajan  wrote:

> hi All,
>
> struct hlist_head {
> struct hlist_node *first;
> };
>
> struct hlist_node {
> struct hlist_node *next, **pprev;<
> };
>
> Why there is a double pointer here  ??
>

Hope this helps:
http://mail.nl.linux.org/kernelnewbies/2008-02/msg1.html
 http://lkml.org/lkml/2000/7/28/10

Neependra

>
>
> struct list_head {
> struct list_head *next, *prev;
> };
>
>
> Regards,
> Onkar
>
>


hlist_node v/s list_head

2010-04-07 Thread Onkar Mahajan
hi All,

struct hlist_head {
	struct hlist_node *first;
};

struct hlist_node {
	struct hlist_node *next, **pprev;    <---
};

Why is there a double pointer here?


struct list_head {
struct list_head *next, *prev;
};


Regards,
Onkar


Re: need to get blocks per group in user space

2010-04-07 Thread arshad hussain
On Wed, Apr 7, 2010 at 1:37 PM, arshad hussain  wrote:
> On Wed, Apr 7, 2010 at 1:10 PM, Manish Katiyar  wrote:
>> On Wed, Apr 7, 2010 at 1:06 PM, nidhi mittal hada
>>  wrote:
>>> hello All
>>>
>>> I want to get some of filesystem specific information
>>> like blocksize , s_blocks_per_group
>>> etc in userspace c program
>>>
>>> How do i get it ?
>>
>> Use libext2fs or see the code of dumpe2fs
My bad, I jumped in to answer without going
through the question thoroughly. Manish
is correct: if your requirement is to get the
FS block size etc. from a userspace C program,
libext2fs is the way. Another way is to
read the superblock from the raw partition or
image (it starts at byte offset 1024) and cast it
to the superblock struct. You will
need to include ext2_fs.h.

Thanks

> debugfs . Assuming you want it for ext2/3/4.
>
> Thanks.
>>
>>> i have obtained block size by stat() function
>>> but not more than that :(
>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Nidhi Mittal Hada
>>> Scientific officer D
>>> Computer Division
>>> Bhabha Atomic Research Center
>>> Mumbai
>>>
>>>
>>>
>>
>>
>>
>> --
>> Thanks -
>> Manish
>> ==
>> [$\*.^ -- I miss being one of them
>> ==
>>
>>
>>
>




__read_mostly

2010-04-07 Thread Onkar Mahajan
static struct hlist_head *inode_hashtable __read_mostly;


what is the use of __read_mostly ?

I did not find any detailed documentation for this
on the GCC website. Is it a GCC optimization feature?



The only thing I learnt is that it is used on SMP systems:
given only reads, each processor can cache a
copy of these kinds of variables; however, if a write occurs,
one processor assumes ownership of that general memory
area, and this is a very expensive operation (cache bouncing...).

Is this the correct use of these variables?

Regards,
onkar


Re: need to get blocks per group in user space

2010-04-07 Thread Onkar Mahajan
You can get this information through the proc interface, or if you just
need to see it in the logs, printk serves the purpose :-)

Regards,
Onkar

On Wed, Apr 7, 2010 at 1:06 PM, nidhi mittal hada
wrote:

> hello All
>
> I want to get some of filesystem specific information
> like blocksize , s_blocks_per_group
> etc in userspace c program
>
> How do i get it ?
> i have obtained block size by stat() function
> but not more than that :(
>
>
>
>
> --
> Thanks & Regards
> Nidhi Mittal Hada
> Scientific officer D
> Computer Division
> Bhabha Atomic Research Center
> Mumbai
>
>
>


Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Venkatram Tummala
Hey Chetan,

Exactly. You are absolutely correct! But I thought that, as the conversion is
platform specific and in most cases is as simple as adding an offset, we can
do away with page tables in the identity-mapped segment, which is a
kind of performance hack. We can look at the arch-specific __pa(..) and
__va(..) implementations in the kernel to get a better idea.

For example, if we look at arch specific IA64 code, __pa and __va are
defined as follows :

FILE : arch/IA64/include/asm/page.h
...

#ifdef __ASSEMBLY__
# define __pa(x)	((x) - PAGE_OFFSET)
# define __va(x)	((x) + PAGE_OFFSET)
#else /* !__ASSEMBLY__ */

So, in this case no page tables were used. This is what I meant.

On Tue, Apr 6, 2010 at 11:48 PM, Chetan Nanda  wrote:

>
>
> On Wed, Apr 7, 2010 at 11:50 AM, Venkatram Tummala  > wrote:
>
>> Can you please explain the last statement. My understanding was - the
>> exact conversion formula of virtual address to physical address in identity
>> mapped segment is platform specific and in most cases, this is as simple as
>> the addition of an offset. So we can do away with page tables to access
>> memory mapped in identity mapped segment.
>>
>
> Hi Venkat,
>
> That is what I was trying to point out: even the addresses generated by
> the kernel are virtual addresses, and they still go through the page table
> for mapping to the actual physical address.
> And this is done by the MMU (via the page table), so the MMU must be
> configured to do the same.
>
> ~cnanda
>
> Am i missing something?
>>
>> Regards,
>> Venkatram Tummala
>>
>>
>> On Tue, Apr 6, 2010 at 11:04 PM, H. Peter Anvin  wrote:
>>
>>> On 04/06/2010 10:57 PM, Venkatram Tummala wrote:
>>> > Just a note Chetan.
>>> >
>>> > We can't exactly say that we require "page table settings" to map that
>>> > 896 MB of physical ram. It is an identity mapped segment (1-1 mapping).
>>> > So, we dont require the "page tables".  Virtual address will be equal
>>> to
>>> > Physical Address + Page Offset. It is just an addition of offset
>>> >
>>>
>>> No, we still need page tables for the identity-mapped segment.
>>>
>>>-hpa
>>>
>>> --
>>> H. Peter Anvin, Intel Open Source Technology Center
>>> I work for Intel.  I don't speak on their behalf.
>>>
>>>
>>
>


Re: need to get blocks per group in user space

2010-04-07 Thread arshad hussain
On Wed, Apr 7, 2010 at 1:10 PM, Manish Katiyar  wrote:
> On Wed, Apr 7, 2010 at 1:06 PM, nidhi mittal hada
>  wrote:
>> hello All
>>
>> I want to get some filesystem-specific information,
>> like the block size, s_blocks_per_group,
>> etc., in a userspace C program
>>
>> How do I get it?
>
> Use libext2fs or see the code of dumpe2fs
Also debugfs, assuming you want it for ext2/3/4.

Thanks.
>
>> I have obtained the block size via the stat() function
>> but not more than that :(
>>
>>
>>
>>
>> --
>> Thanks & Regards
>> Nidhi Mittal Hada
>> Scientific officer D
>> Computer Division
>> Bhabha Atomic Research Center
>> Mumbai
>>
>>
>>
>
>
>
> --
> Thanks -
> Manish
> ==
> [$\*.^ -- I miss being one of them
> ==
>




Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Venkatram Tummala
Yes, to be technically very correct, process switch is the more appropriate
term to use here, because a context switch includes the case where an
interrupt has to be serviced while a process is executing and, after the
interrupt is serviced, the same process may continue execution. A process
switch, on the other hand, involves switching to another user-level process.

Having said that, the terms context switching and process switching are often
used interchangeably. I don't know the actual norm used in Linux kernel
discussions.

On Wed, Apr 7, 2010 at 12:33 AM, Siddu  wrote:

> Hi Venkat
>
> On Wed, Apr 7, 2010 at 3:58 AM, Venkatram Tummala 
> wrote:
>
>> Joel,
>>
>> To make things clear, 896 MB is not a hardware limitation. The 3GB:1GB
>> split can be configured during the kernel build but the split cannot be
>> changed dynamically.
>>
>> you are correct that ZONE_* refers to grouping of physical memory but the
>> very concept of ZONES is logical and not physical.
>>
>> Now, why does ZONE_NORMAL have only 896MB on a 32 bit system?
>>
>> If you recall the concept of virtual memory, you will remember that its
>> aim is to provide an illusion to the user processes that each has all the
>> theoretical maximum memory possible on that specific architecture, which is
>> 4GB in this case, and that it is the only process running on the system. The
>> kernel internally deals with pages, swapping pages in & out to create this
>> illusion. The advantage is that user processes do not have to care about
>> how much physical memory is actually present in the system.
>>
>> So, out of this 4GB, it was conceptually decided that 3GB is the process's
>> virtual address space and 1GB is the kernel virtual address space. The
>> kernel maps these 3GB of user processes' virtual address space to physical
>> memory using page tables. The kernel can just address 1GB of virtual
>> addresses. This 1GB of virtual addresses is directly mapped (1-1 mapping)
>> into the physical memory without using page tables. If the kernel wants to
>> address more virtual addresses, it has to kmap the high memory(ZONE_HIGHMEM)
>> which sets up the page tables etc. So, you can imagine this as : "Whenever a
>> context switch
>
>
> Shouldn't "context switch" be termed "process switch" here?
> Correct me if I am wrong!
>
>
>> occurs, 3GB virtual address space of the previous running process will be
>> replaced by the virtual address space of the newly selected process, and the
>> 1GB always remains with the kernel." Note that all this is virtual (That is,
>> conceptual), this is only an illusion.
>>
>> So, out of this 1GB of kernel virtual address space that is 1-1 mapped
>> into the physical memory(without requiring page tables), 0-16MB is used by
>> device drivers, 896MB - 1024MB is used by the kernel for vmalloc, kmap, etc
>> which leaves (16MB - 896MB) and this range is "called" ZONE_NORMAL.
>>
>> Giving specific emphasis to the word "called" in the previous sentence.
>>
>> In summary, the kernel can only access 896 MB of physical ram because it
>> only has 1GB of virtual address space available out of which the lower 16MB
>> is used for DMA by device drivers and the 896MB-1024MB is used to support
>> kmap, vmalloc etc. And note that this limitation is not because of the
>> hardware but this is because of the conceptualization of the division of
>> virtual address space into user address space & kernel address space.
>>
>> For example, you can make the split 2G-2G instead of 3G-1G. So, the kernel
>> can now use 2GB of virtual address space (directly mapped to 2GB of physical
>> memory). You can also make the split 1GB:3GB instead of 3GB:1GB as already
>> explained.
>>
>> Hope this clears the confusion.
>>
>> Regards,
>> Venkatram Tummala
>>
>>
>>
>> On Tue, Apr 6, 2010 at 1:01 PM, Joel Fernandes wrote:
>>
>>> Hi Peter,
>>>
>>> On Wed, Apr 7, 2010 at 1:14 AM, H. Peter Anvin  wrote:
>>> > On 04/06/2010 12:20 PM, Frank Hu wrote:
>>> >>>
>>> >>> The ELF ABI specifies that user space has 3 GB available to it.  That
>>> >>> leaves 1 GB for the kernel.  The kernel, by default, uses 128 MB for
>>> I/O
>>> >>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>>> >>>
>>> >>> All of these boundaries are configurable; with PAE enabled the user
>>> >>> space boundary has to be on a 1 GB boundary.
>>> >>>
>>> >>
>>> >> the VM split is also configurable when building the kernel (for 32-bit
>>> >> processors).
>>> >
>>> > I did say "all these boundaries are configurable".  Rather explicitly.
>>> >
>>>
>>> I thought the 896 MB was a hardware limitation on 32 bit architectures
>>> and something that cannot be configured? Or am I missing something
>>> here? Also the vm-splits refer to "virtual memory" . While ZONE_* and
>>> the 896MB we were discussing refers to "physical memory". How then is
>>> discussing about vm splits pertinent here?
>>>
>>> Thanks,
>>> -Joel
>>>

Re: need to get blocks per group in user space

2010-04-07 Thread Manish Katiyar
On Wed, Apr 7, 2010 at 1:06 PM, nidhi mittal hada
 wrote:
> hello All
>
> I want to get some filesystem-specific information,
> like the block size, s_blocks_per_group,
> etc., in a userspace C program
>
> How do I get it?

Use libext2fs or see the code of dumpe2fs

> I have obtained the block size via the stat() function
> but not more than that :(
>
>
>
>
> --
> Thanks & Regards
> Nidhi Mittal Hada
> Scientific officer D
> Computer Division
> Bhabha Atomic Research Center
> Mumbai
>
>
>



-- 
Thanks -
Manish
==
[$\*.^ -- I miss being one of them
==




Re: why choose 896MB to the start point of ZONE_HIGHMEM

2010-04-07 Thread Siddu
Hi Venkat

On Wed, Apr 7, 2010 at 3:58 AM, Venkatram Tummala wrote:

> Joel,
>
> To make things clear, 896 MB is not a hardware limitation. The 3GB:1GB
> split can be configured during the kernel build but the split cannot be
> changed dynamically.
>
> you are correct that ZONE_* refers to grouping of physical memory but the
> very concept of ZONES is logical and not physical.
>
> Now, why does ZONE_NORMAL have only 896MB on a 32 bit system?
>
> If you recall the concept of virtual memory, you will remember that its aim
> is to provide an illusion to the user processes that each has all the
> theoretical maximum memory possible on that specific architecture, which is
> 4GB in this case, and that it is the only process running on the system. The
> kernel internally deals with pages, swapping pages in & out to create this
> illusion. The advantage is that user processes do not have to care about
> how much physical memory is actually present in the system.
>
> So, out of this 4GB, it was conceptually decided that 3GB is the process's
> virtual address space and 1GB is the kernel virtual address space. The
> kernel maps these 3GB of user processes' virtual address space to physical
> memory using page tables. The kernel can just address 1GB of virtual
> addresses. This 1GB of virtual addresses is directly mapped (1-1 mapping)
> into the physical memory without using page tables. If the kernel wants to
> address more virtual addresses, it has to kmap the high memory(ZONE_HIGHMEM)
> which sets up the page tables etc. So, you can imagine this as : "Whenever a
> context switch


Shouldn't "context switch" be termed "process switch" here?
Correct me if I am wrong!


> occurs, 3GB virtual address space of the previous running process will be
> replaced by the virtual address space of the newly selected process, and the
> 1GB always remains with the kernel." Note that all this is virtual (That is,
> conceptual), this is only an illusion.
>
> So, out of this 1GB of kernel virtual address space that is 1-1 mapped into
> the physical memory(without requiring page tables), 0-16MB is used by device
> drivers, 896MB - 1024MB is used by the kernel for vmalloc, kmap, etc which
> leaves (16MB - 896MB) and this range is "called" ZONE_NORMAL.
>
> Giving specific emphasis to the word "called" in the previous sentence.
>
> In summary, the kernel can only access 896 MB of physical ram because it
> only has 1GB of virtual address space available out of which the lower 16MB
> is used for DMA by device drivers and the 896MB-1024MB is used to support
> kmap, vmalloc etc. And note that this limitation is not because of the
> hardware but this is because of the conceptualization of the division of
> virtual address space into user address space & kernel address space.
>
> For example, you can make the split 2G-2G instead of 3G-1G. So, the kernel
> can now use 2GB of virtual address space (directly mapped to 2GB of physical
> memory). You can also make the split 1GB:3GB instead of 3GB:1GB as already
> explained.
>
> Hope this clears the confusion.
>
> Regards,
> Venkatram Tummala
>
>
>
> On Tue, Apr 6, 2010 at 1:01 PM, Joel Fernandes wrote:
>
>> Hi Peter,
>>
>> On Wed, Apr 7, 2010 at 1:14 AM, H. Peter Anvin  wrote:
>> > On 04/06/2010 12:20 PM, Frank Hu wrote:
>> >>>
>> >>> The ELF ABI specifies that user space has 3 GB available to it.  That
>> >>> leaves 1 GB for the kernel.  The kernel, by default, uses 128 MB for
>> I/O
>> >>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>> >>>
>> >>> All of these boundaries are configurable; with PAE enabled the user
>> >>> space boundary has to be on a 1 GB boundary.
>> >>>
>> >>
>> >> the VM split is also configurable when building the kernel (for 32-bit
>> >> processors).
>> >
>> > I did say "all these boundaries are configurable".  Rather explicitly.
>> >
>>
>> I thought the 896 MB was a hardware limitation on 32 bit architectures
>> and something that cannot be configured? Or am I missing something
>> here? Also the vm-splits refer to "virtual memory" . While ZONE_* and
>> the 896MB we were discussing refers to "physical memory". How then is
>> discussing about vm splits pertinent here?
>>
>> Thanks,
>> -Joel
>>
>>
>>
>


-- 
Regards,
~Sid~
A little bird which escaped the nest had to fall before it learnt to fly !


need to get blocks per group in user space

2010-04-07 Thread nidhi mittal hada
hello All

I want to get some filesystem-specific information,
like the block size, s_blocks_per_group,
etc., in a userspace C program.

How do I get it?
I have obtained the block size via the stat() function
but not more than that :(




-- 
Thanks & Regards
Nidhi Mittal Hada
Scientific officer D
Computer Division
Bhabha Atomic Research Center
Mumbai