Re[2]: vkernel & GSoC, some questions

2008-03-18 Thread Igor Shmukler
> I have thought of the vkernel primarily as an aid to kernel development 
> (where performance is not a prime concern), not as a virtualisation 
> solution that will compete with Xen and VMWare. It's difficult to 
> compete with thousands of man-hours paid by corporate funding.
> 
> So far nobody has expressed interest in vkernels as a tool for kernel 
> development. And I got the general impression that I've proposed 
> something stupid and useless.

I don't think that what you have proposed is stupid or useless. Sorry if I came
across as rude.

However, if I understand correctly what Matt has done, DragonFly can be used to
develop virtualized FreeBSD, and the 5-second restart would still be there.
[Perhaps some extension might be necessary, but fundamentally it should be
possible. Is it not?]

If that is indeed the case, I would rather more people worked on the same
codebase than everyone maintaining their own version [one with renaming and
one without]. Would it not be better to extend the existing vkernels on
DragonFly to do more and to support other guests, making them into a [more]
powerful kernel development platform?

BSDs have many great "things" to offer, but there are not enough people. I was
under the impression that even laptops are not fully supported yet. That has
been on the TODO list for years.

If the goal is to have the "power to serve" real people, extending the existing
jail into a complete container is probably more useful. Does it matter whether
a developer is using FreeBSD-over-FreeBSD instead of virtualized FreeBSD over
DragonFly?

It could be even easier to extend FreeBSD to support afterburning and to run
L4FreeBSD as an L4 server. That, however, is another dog with different fleas.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re[6]: vkernel & GSoC, some questions

2008-03-18 Thread Igor Shmukler
Matt,

We use VMWare Server at work. It does not have the same nice image management
interface and/or video capture as its commercial counterparts. However, it is
free, and testing on it helps us out big time. We never concluded whether it
made sense to pay for VMWare licenses instead of relying on free, legally
available shell scripts.

I have used UML for development in the past. I even used bochs once to debug a 
boot loader. All nice tools. Beats real hardware for me.

Xen and KVM are significantly slower than commercial products due to hardware
switching. There is a GPLed product that works about as fast as VMWare's binary
translation (BT): VirtualBox by innotek. Sun recently scooped them up.

Don't you use something like VMWare for development and debugging?

In production, we don't use any of these products - too slow and too much RAM 
would be required.

Sincerely,

Igor Shmukler, http://www.elusiva.com

-Original Message-
From: Matthew Dillon <[EMAIL PROTECTED]>
To: Igor Shmukler <[EMAIL PROTECTED]>
Date: Mon, 17 Mar 2008 14:58:25 -0700 (PDT)
Subject: Re: Re[4]: vkernel & GSoC, some questions

> 
> 
> :
> :Matt,
> :
> :You sure won't argue that UML isolation is inherently better than one that 
> can be provided by a hypervisor. If the performance is the same, what are you 
> gaining?
> :
> :Hypervisor while slow, allows treating a complete OS with all applications 
> as a black box. Why would I choose UML over a hypervisor?
> :
> :I am not trying to say there cannot be a place for vkernel. [I don't even 
> yet understand what is does or how.] However, as a hosting company, why would 
> I choose UML over a hypervisor?
> :
> :...
> :
> :igor
> 
> Well, whose hypervisor are you using?
> 
>   -Matt
>   Matthew Dillon 
>   <[EMAIL PROTECTED]>



Re[4]: vkernel & GSoC, some questions

2008-03-16 Thread Igor Shmukler
Matt,

Surely you won't argue that UML isolation is inherently better than the
isolation a hypervisor can provide. If the performance is the same, what are you
gaining?

A hypervisor, while slow, allows treating a complete OS with all its applications
as a black box. Why would I choose UML over a hypervisor?

I am not trying to say there cannot be a place for vkernel. [I don't even yet
understand what it does or how.] However, as a hosting company, why would I
choose UML over a hypervisor?

I can provide a number of reasons to pick a hypervisor:
1. use the same platform to host Unix, Windows and other guests
2. load balance all available hardware [based on some policy]
3. better isolation, which implies that a hypervisor upgrade is less likely to damage guests

I am sure people hosting on hypervisors could write a longer list.

Containers [including jail] provide significantly lower overhead [but are more
difficult to maintain]. At least it can be argued [probably both ways] that
containers are cheaper.

Are there real-world people hosting with UML today who could comment on this,
perhaps supporting Matt's position?

igor

-Original Message-
From: Matthew Dillon <[EMAIL PROTECTED]>
To: Igor Shmukler <[EMAIL PROTECTED]>
Date: Sun, 16 Mar 2008 17:12:00 -0700 (PDT)
Subject: Re: Re[2]: vkernel & GSoC, some questions

> 
> 
> :
> :Given the fact that there are not as many developers as needed, what would 
> be a practical purpose of vkernel?
> :
> :UML is typically used to debug drivers and/or for hosting. Now that Linux 
> is about to have or already has container technology, hosting on UML makes 
> little sense.
> 
> The single largest benefit UML or a hardware emulated environment has
> over a jail is that it is virtually impossible to crash the real kernel
> no matter what you are doing within the virtualized environment.  I
> don't know any ISP that is able to keep a user-accessible (shell prompt)
> machine up consistently outside of a UML environment.  The only reason
> machines don't crash more is that they tend to run a subset of available
> applications in a subset of possible load and resource related
> circumstances.
> 
> Neither jails nor containers nor any other native-kernel technology will
> EVER solve that problem.  For that matter, no native-kernel technology
> will ever come close to providing the same level of compartmentalization
> from a security standpoint, and particularly not if you intend to run
> general-purpose applications in that environment.
> 
> The reason UML is used, particularly for web hosting, is because 
> web developers require numerous non-trivial backend tools to be installed
> each of which has the potential to hog resources, crash the machine,
> create security holes, or otherwise create hell for everyone else.  The
> hell needs to be restricted and narrowed as much as possible so human
> resources can focus on the cause rather than on the collateral damage.
> For any compute-intensive business, collateral damage is the #1 IT issue,
> the cost of power is the #2 issue, and network resources are the #3
> issue.  Things like cpu and machines... those are in the noise.  They're
> basically free.
> 
> With a virtual kernel like UML (or our vkernel), the worst that happens
> is that the vkernel itself crashes and reboots in 5 seconds (+ fsck time
> for that particular user).  No other vkernel is affected, no other 
> customer is affected, no other compartmentalized resource is affected.
> 
> Jails are great, no question about it, and there are numerous applications
> which require the performance benefits that running in a jail versus
> an emulated environment provides, but we will never, EVER see jails
> replace UML.  This is particularly true considering the resources being
> put into improving emulated environments.  The overhead for running an
> emulated environment ten years from now is probably going to be a
> fraction of the overhead it is now, as hardware catches up to desire.
> 
>   -Matt
> 




Re[2]: vkernel & GSoC, some questions

2008-03-16 Thread Igor Shmukler
What's vkernel's or modern UML multithreaded performance compared to native?

I have not been reading hackers in a long time and have no idea what's going 
on... Please excuse my butting in...

Given the fact that there are not as many developers as needed, what would be a 
practical purpose of vkernel?

UML is typically used to debug drivers and/or for hosting. Now that Linux is about
to have or already has container technology, hosting on UML makes little sense.

KVM and other hypervisors are valuable testing tools and can sometimes make 
sense in a hosting environment. If someone were to work on an open source 
hypervisor, perhaps they should consider Innotek's product. KVM and Xen use VT 
extensions to run guests in a protected mode. It's a little slow. Innotek has a 
fast binary translator.

The big question is whether there is a practical reason to run FreeBSD as a
host, or is this more about "freedom of choice"?

A couple of years ago, we implemented fairly complete container functionality
in FreeBSD 5.x. It even supported live migration of virtual environments. I
showed it to A. Perlstein while he was working in New York. We tried to see if
anyone was interested at the time, but found no one.

-Original Message-
From: Robert Watson <[EMAIL PROTECTED]>
To: "Andrey V. Elsukov" <[EMAIL PROTECTED]>
Date: Sun, 16 Mar 2008 12:56:21 + (GMT)
Subject: Re: vkernel & GSoC, some questions

> 
> On Sun, 16 Mar 2008, Andrey V. Elsukov wrote:
> 
> > 16.03.08, 09:30, "David O'Brien" <[EMAIL PROTECTED]>:
> >
> >>> Add virtual kernel (vkernel) support to FreeBSD for the i386 and amd64 
> >>> architectures.
> >>>
> >>> The vkernel support in question is the one found in DragonFlyBSD.
> >>
> >> Not being up on DragonFlyBSD, can you better describe what "vkernel" is?
> >
> > vkernel is similar to User Mode Linux technology. You can boot vkernel as a 
> > user mode process. I think it would be good to have something similar in
> > FreeBSD. There are several links:
> > http://leaf.dragonflybsd.org/mailarchive/users/2007-01/msg00237.html 
> > http://www.dragonflybsd.org/docs/articles/vkernel/vkernel.shtml
> 
> Another avenue to consider is the Linux KVM virtualization technology, which 
> is seeing a high level of interest in the Linux community and sounds 
> increasingly mature and well-exercised.  It would also offer interesting 
> migration benefits for Linux users wanting to try FreeBSD, allowing them to 
> trivially create new FreeBSD installs under their existing Linux install.  We 
> had an SoC project last year but I'm not sure what the outcome was; it would 
> be useful to give Fabio a ping and see how things are going.  Obviously, 
> anyone doing this project would need to manage the license issues involved 
> carefully.
> 
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> 




Re[3]: vn_fullpath() again

2005-09-06 Thread Igor Shmukler
Robert,

Thank you very much for a detailed reply. I was aware of many of the things you 
mentioned, but it never hurts to hear something one more time.

How do you feel about small incremental improvements to name lookup?

What about looking up the device name in the structure itself for VCHR nodes,
then prepending /dev/ and returning that name, as a first step?

If incremental improvements sound like a good idea, maybe we could do a few 
small modifications that would cover some additional cases. Would not it be 
good?

Thank you in advance,

Igor

-Original Message-
From: Robert Watson <[EMAIL PROTECTED]>
To: Igor Shmukler <[EMAIL PROTECTED]>
Date: Tue, 6 Sep 2005 16:21:47 +0100 (BST)
Subject: Re[2]: vn_fullpath() again

> 
> On Tue, 6 Sep 2005, Igor Shmukler wrote:
> 
> >>> You are correct about the Unix file system organization, but does it
> >>> mean reliable vnode to fullname conversion is not possible?
> >>
> >> Yes.  Get over it.
> >
> > Well, I do not think it is a Yes. I very much think it is a No. You 
> > should have continued reading my email 'til the middle or even farther.
> 
> There are various tricks that can be played to increase the chances of 
> finding a name in the name cache, but those tricks run out quickly on 
> systems like NFS servers where files can be accessed without being looked 
> up since the last boot, or with background fsck.  This is a fundamental 
> property of the UNIX file system design, and while it offers some quite 
> powerful capabilities, nothing changes the fact that names are 
> fundamentally second-class citizens in the file system and VFS design.
> 
> The main tricks that can be played are:
> 
> - Don't purge intermediate but unused nodes from the name cache.  A
>   specific design choice in FreeBSD has been to allow cache entries for
>   unused nodes to be removed so that the nodes can be reused.  On systems
>   that rapidly consume vnodes, this allows more vnodes to be recycled,
>   which means more memory is available.  However, it also makes it less
>   likely that a name can be reconstructed from the name cache.
> 
> - Maintain references to cache entries instead of vnodes when accessing
>   leaf files.  This is roughly the approach taken by Linux --
>   typically the hardest name to "identify" is the last segment to reach a
>   file, since files can have hard links (and directories typically don't).
>   That name can rapidly be invalidated due to renaming, unlinking,
>   linking, and so on, and hence can be quite stale, but if you assume the
>   name space is static, this will help out with the "files don't have
>   parents" problem.
> 
> - With a minor redesign of UFS, eliminating hard links, it is possible to
>   add a directory back-pointer to the parent of a file.  In this case,
>   there is an authoritative reference to the parent.  Mind you, this comes
>   with many down-sides: Apple attempted to ship a UNIX system without
>   support for hard links, and had to rapidly hack support for it back into
>   the file system.
> 
> - Maintain a parent back-pointer for files in the vnode, reflecting the
>   last directory used to reach the file, so that you can search that
>   directory to find a possible name.  This requires different reference
>   management behavior, prevents directories from falling out of the cache
>   if a file reached via the directory is in use, and will also require
>   walking directories, which can be very expensive.
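To make the parent back-pointer idea concrete, here is a toy user-space model in C. It is not FreeBSD's namecache: `struct nc_entry` and `nc_fullpath()` are hypothetical names, and real kernel code would also need locking and reference counting. It shows why the first and last tricks matter: a path can be rebuilt only while every intermediate entry, and its parent pointer, is still present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy name-cache entry (hypothetical): a component name plus a pointer
 * to the entry for the directory it was found in. */
struct nc_entry {
    const char *name;        /* last path component; unused for the root */
    struct nc_entry *parent; /* NULL if the parent entry was purged */
    int is_root;             /* nonzero only for the root directory */
};

/* Rebuild a full path by walking parent pointers, filling buf from the
 * end. Returns buf, or NULL if a parent entry is missing or buf is too
 * small. */
static char *nc_fullpath(const struct nc_entry *e, char *buf, size_t buflen)
{
    assert(e != NULL && buf != NULL && buflen >= 2);
    size_t pos = buflen - 1;
    buf[pos] = '\0';

    while (!e->is_root) {
        if (e->parent == NULL)
            return NULL;            /* chain broken: no way back to "/" */
        size_t len = strlen(e->name);
        if (pos < len + 1)
            return NULL;            /* no room for "/" + name */
        pos -= len;
        memcpy(buf + pos, e->name, len);
        buf[--pos] = '/';
        e = e->parent;
    }
    if (pos == buflen - 1)          /* e was the root itself */
        buf[--pos] = '/';
    memmove(buf, buf + pos, buflen - pos);
    return buf;
}
```

With entries for the root, `usr`, `bin` and `ls`, `nc_fullpath(&ls, ...)` yields "/usr/bin/ls"; clearing `usr.parent` (modelling a purged intermediate entry) makes the same call return NULL.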
> 
> At heart, though, fundamental issues remain: files can have no names, or 
> they can be looked up using a name that is removed, yet still have another 
> name.  They can have several names.  They can be accessed without any 
> lookup.  The same name can refer to several files due to mountpoint 
> covering.  Throughout the design, names are assumed to be only fleetingly 
> valid (during the lookup), and of secondary importance after that.
> 
> Most systems I've looked at try to work around a lack of names in two 
> ways:
> 
> (1) They treat the name as something valid only at time of lookup.  For
>  example, the Solaris audit system captures a name used to look up a
>  node, and after that it is the responsibility of the consumer of the
>  audit trail to identify any name operations that might affect the name
>  of an object in use, if names are important.  Typically they have to
>  handle three names during lookup: path to process root, path from
>  process root to cwd, and path from cwd to file.
> 
> (2) Apple has an underlying file system, HFS+, that actually maintains a
>  fairly strong notion of directory hierarchy, via its c

Re[2]: vn_fullpath() again

2005-09-06 Thread Igor Shmukler
Perhaps I do not get it, or maybe you are not getting my point.

There are times when resolving would not be possible, or when the name returned
is not necessarily the one used when the file was first accessed. We have
discussed it here and everyone agreed on that. Hard links, or files unlinked
while the vnode is still open, are corner cases. The unlink case is a bit more
difficult to deal with, but hard links are probably not a big issue. As long as
we can get A name, we may not even need to know THE name.

I am pleasantly surprised to learn that the fabric of the universe is not MS-DOS. :)


> Igor Shmukler <[EMAIL PROTECTED]> writes:
> > Dag-Erling Smørgrav <[EMAIL PROTECTED]> writes:
> > > Igor Shmukler <[EMAIL PROTECTED]> writes:
> > > > You are correct about the Unix file system organization, but does it
> > > > mean reliable vnode to fullname conversion is not possible?
> > > Yes.  Get over it.
> > Well, I do not think it is a Yes. I very much think it is a No. You
> > should have continued reading my email 'til the middle or even
> > farther.
> 
> I did.  You just don't get it.  A file may be associated with zero,
> one or more names and none of these names are more correct or
> authoritative than any of the others.  If a user does 'ln /bin/ls
> /tmp' (assuming /bin and /tmp are on the same filesystem), it may be
> obvious to you that /bin/ls is the "real name" and /tmp/ls is just an
> alias, but it is not obvious to the kernel.  In fact, the kernel is
> unable to see any difference at all between these two names.
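A small user-space check of this point, assuming a POSIX system and an existing writable scratch directory: after link(2), both names report the same `(st_dev, st_ino)` pair, and once both names are unlinked, the still-open file reports `st_nlink == 0`, i.e. it has no name at all. `link_demo()` is a hypothetical helper written only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Give a new file two names with link(2), then remove both names while a
 * descriptor stays open. st_nlink is recorded after each step in
 * counts[0..2]; *same_inode is set if both names lead to the same
 * (st_dev, st_ino). Returns 0 on success, -1 on error. */
static int link_demo(const char *dir, nlink_t counts[3], int *same_inode)
{
    char a[1024], b[1024];
    struct stat sa, sb, sf;

    assert(dir != NULL && counts != NULL && same_inode != NULL);
    snprintf(a, sizeof a, "%s/original", dir);
    snprintf(b, sizeof b, "%s/alias", dir);

    int fd = open(a, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (link(a, b) != 0 || stat(a, &sa) != 0 || stat(b, &sb) != 0) {
        close(fd);
        unlink(a);
        unlink(b);
        return -1;
    }
    /* Same inode: nothing marks either name as the "real" one. */
    *same_inode = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
    counts[0] = sa.st_nlink;            /* two directory entries */

    unlink(a);
    if (fstat(fd, &sf) != 0) {
        close(fd);
        unlink(b);
        return -1;
    }
    counts[1] = sf.st_nlink;            /* only "alias" is left */

    unlink(b);
    if (fstat(fd, &sf) != 0) {
        close(fd);
        return -1;
    }
    counts[2] = sf.st_nlink;            /* still open, but nameless */

    close(fd);
    return 0;
}
```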
> 
> Storing the name that was used to access a file in the vnode does not
> solve anything, because the vnode is shared by all users of that file,
> regardless of which name they used to access it, and there is no
> guarantee that the name that was used to access a file two seconds ago
> still references the same file, or any file at all; the file may have
> been renamed or deleted, or a new filesystem may have been mounted
> that covers the namespace that file was in.
> 
> In summary: THERE IS NO WAY TO UNIQUELY AND RELIABLY MAP A VNODE BACK
> TO A NAME, and I wish people would stop insisting that there must be.
> All the world is not MS-DOS.
> 
> DES
> -- 
> Dag-Erling Smørgrav - [EMAIL PROTECTED]
> 




Re[2]: vn_fullpath() again

2005-09-06 Thread Igor Shmukler
> > You are correct about the Unix file system organization, but does it
> > mean reliable vnode to fullname conversion is not possible?
> 
> Yes.  Get over it.

Well, I do not think it is a Yes. I very much think it is a No. You should have 
continued reading my email 'til the middle or even farther.


Re[2]: vn_fullpath() again

2005-09-05 Thread Igor Shmukler
Robert,

You are correct about the Unix file system organization, but does it mean
reliable vnode to fullname conversion is not possible?
As long as the vnode is referenced, we should be able to perform the lookup for
any file system. Linux does a pretty good job with d_path(), and I understand
Matt changed his NC to provide this.

The FreeBSD name cache requires work. It could, and IMHO should, be improved. If
there is a desire to improve FreeBSD in this area, why doesn't someone look at
the solution I submitted for returning devfs names?

While a perfect solution would require serious changes to the OS, a solution 
that would work for referenced vnodes is easier to implement.

igor.

-Original Message-
From: Robert Watson <[EMAIL PROTECTED]>
To: Sergey Uvarov <[EMAIL PROTECTED]>
Date: Mon, 5 Sep 2005 18:00:56 +0100 (BST)
Subject: Re: vn_fullpath() again

> 
> On Mon, 5 Sep 2005, Sergey Uvarov wrote:
> 
> > Everyone knows that vn_fullpath() is unreliable. However, I really need to
> > get a filename for a given vnode. To simplify the task, I do not care about
> > synthetic file systems or hard links.
> >
> > I have looked through archives in hope to find a better solution. It 
> > seems that linux_getcwd() approach could help. However to make that code 
> > work for me, I need to know a directory vnode where the file resides. 
> > The vnode->v_dd field looks promising, but as I understand it, it does not
> > help if the file name is not in the name cache.
> >
> > So the question: is it ever possible to get directory vnode for a given 
> > file vnode?
> 
> One way to look at the problem is from the perspective of how you might 
> derive that information from an on-disk inode.  If you look at the UFS 
> layout on-disk, you'll see that there is no pointer to a directory back 
> from a leaf inode; in kernel, you can have a reference to a vnode with no 
> back pointer to a directory vnode.  In order to find the parent, you 
> potentially have to iterate through all directories on the hard disk 
> looking for the parent, which is a potentially long-running activity. 
> It's also not at all theoretical: vnodes are often accessed without any 
> path lookup at all.  For example, background fsck may pull inodes off disk 
> without a name lookup, and the NFS server can receive file handle 
> references following a reboot from a live client that maintains cached 
> references -- it will service them without performing a lookup.
> 
> So unfortunately, the answer is complex: (a) you may have to search the 
> disk for a name, and (b) you may not even find one, since there can be 
> files without any name at all (i.e., a temporary file that has been 
> unlinked).
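The brute-force search in (a) can be sketched in user space with POSIX `nftw()`: walk a tree and report every entry whose `(st_dev, st_ino)` matches. This is only an illustration, not how the kernel would do it; `find_names()` is a hypothetical helper, it may report zero, one or several names, and its cost grows with the size of the tree walked.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks get no user pointer, so the search target lives in
 * file-scope variables (acceptable for a single-threaded sketch). */
static dev_t target_dev;
static ino_t target_ino;
static int names_found;

static int match_inode(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_F && sb->st_dev == target_dev &&
        sb->st_ino == target_ino) {
        printf("name: %s\n", path);
        names_found++;
    }
    return 0;                       /* keep walking: there may be more */
}

/* Report every name under root for the file (dev, ino). Returns the
 * number of names found, possibly 0, or -1 if the walk failed. */
static int find_names(const char *root, dev_t dev, ino_t ino)
{
    assert(root != NULL);
    target_dev = dev;
    target_ino = ino;
    names_found = 0;
    /* FTW_PHYS: don't follow symlinks. FTW_MOUNT: stay on one file system,
     * since an inode number only means something within its device. */
    if (nftw(root, match_inode, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return names_found;
}
```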
> 
> On non-UFS style file systems, such as HFS+, it is possible to generate a 
> path from the file system root without extensive disk I/O.  However, all 
> common UNIX-like file systems don't have this property -- Sun's version of 
> UFS, ext2fs/ext3fs, and so on.
> 
> If the child vnode is a directory, you can just follow its '..' link or 
> covered vnode, of course...
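For directories, following '..' is essentially how a user-space getcwd() can work: open the parent, then scan it for the entry whose `(st_dev, st_ino)` matches the child. A minimal sketch, assuming POSIX.1-2008 `openat()`, `fdopendir()` and `fstatat()`; `dir_fullpath()` is a hypothetical name, and it gives up (returns NULL) if an intermediate directory cannot be opened or read.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Rebuild the absolute path of directory `dir` by climbing ".." links.
 * The result is assembled right to left in `out`. Returns out or NULL. */
static char *dir_fullpath(const char *dir, char *out, size_t outlen)
{
    struct stat cur, root;
    size_t pos;
    int fd;

    assert(dir != NULL && out != NULL);
    if (outlen < 2 || stat("/", &root) != 0)
        return NULL;
    if ((fd = open(dir, O_RDONLY | O_DIRECTORY)) < 0)
        return NULL;
    if (fstat(fd, &cur) != 0)
        goto fail;
    pos = outlen - 1;
    out[pos] = '\0';

    while (cur.st_dev != root.st_dev || cur.st_ino != root.st_ino) {
        struct stat par, ent;
        struct dirent *de;
        int found = 0;

        int pfd = openat(fd, "..", O_RDONLY | O_DIRECTORY);
        if (pfd < 0)
            goto fail;
        close(fd);
        fd = pfd;                        /* fd now refers to the parent */
        if (fstat(fd, &par) != 0)
            goto fail;
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino)
            break;                       /* ".." == ".": top of namespace */

        int dfd = dup(fd);               /* fdopendir() takes ownership */
        DIR *d = (dfd >= 0) ? fdopendir(dfd) : NULL;
        if (d == NULL) {
            if (dfd >= 0)
                close(dfd);
            goto fail;
        }
        while (!found && (de = readdir(d)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            /* fstatat() instead of d_ino: at a mount point, d_ino names the
             * covered directory, while fstatat() sees the mounted root. */
            if (fstatat(fd, de->d_name, &ent, AT_SYMLINK_NOFOLLOW) != 0)
                continue;
            if (ent.st_dev != cur.st_dev || ent.st_ino != cur.st_ino)
                continue;
            size_t len = strlen(de->d_name);
            if (pos < len + 1)
                break;                   /* out is too small */
            pos -= len;
            memcpy(out + pos, de->d_name, len);
            out[--pos] = '/';
            found = 1;
        }
        closedir(d);
        if (!found)
            goto fail;
        cur = par;
    }
    close(fd);
    if (pos == outlen - 1)
        out[--pos] = '/';                /* dir was the root itself */
    memmove(out, out + pos, outlen - pos);
    return out;

fail:
    close(fd);
    return NULL;
}
```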
> 
> Robert N M Watson



Re: per file lock list

2005-08-02 Thread Igor Shmukler
Matt,

Thank you very much for your response. This is a general solution, but it is
not sufficient for our needs. I guess I should have been clearer
when explaining what we need.

We want the list of these locks for a group of processes.

We made an implementation based on your suggestion, but there is one problem...

Unfortunately, this method does not return all shared locks for a
range. For example, if several processes have placed a shared lock on the
range [1000 - 2000], F_GETLK returns a flock structure whose l_pid field
contains the pid of the process that took the lock first, while we want
to know all processes that hold this lock. Is there any way to retrieve
such information without using internal kernel structures (inode
information)?

Thank you in advance,

igor

On 7/21/05, Matthew Dillon <[EMAIL PROTECTED]> wrote:
> :Hi,
> :
> :We have a question: how to get all POSIX locks for a given file?
> :..
> :
> :As far as I know, existing API does not allow to retrieve all file
> :locks. Therefore, we need to use kernel internal structures to get all
> :...
> :So the question: is there an elegant way to get the lock list for a given 
> file?
> :
> :Thank you in advance.
> 
> You can use F_GETLK to iterate through all posix locks held on a file.
> From man fcntl:
> 
>  F_GETLK    Get the first lock that blocks the lock description pointed to
> by the third argument, arg, taken as a pointer to a struct
> flock (see above).  The information retrieved overwrites the
> information passed to fcntl() in the flock structure.  If no
> lock is found that would prevent this lock from being created,
> the structure is left unchanged by this function call except
> for the lock type which is set to F_UNLCK.
> 
> So what you do is you specify a lock description that covers the whole
> file and call F_GETLK.  You then use the results to modify the lock
> description to a range that starts just past the returned lock
> for the next call.  You continue iterating until F_GETLK tells you that
> there are no more locks.
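A minimal C sketch of that loop; `list_posix_locks()` is a hypothetical helper. Two assumptions are worth stating: F_GETLK never reports locks held by the calling process itself, and the loop assumes the kernel returns the lowest-offset blocking lock, which POSIX does not promise. Because each new probe starts just past the reported range, several processes sharing a read lock on the same range show up as a single entry, which is the limitation described in the follow-up above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Enumerate POSIX record locks on fd that conflict with a write lock,
 * probing with F_GETLK and restarting just past each lock found.
 * Returns the number of locks reported, or -1 on error. */
static int list_posix_locks(int fd)
{
    off_t start = 0;
    int n = 0;

    for (;;) {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;        /* a write probe conflicts with any lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = start;
        fl.l_len = 0;               /* 0 = through end of file and beyond */
        if (fcntl(fd, F_GETLK, &fl) == -1)
            return -1;
        if (fl.l_type == F_UNLCK)
            break;                  /* nothing further blocks the probe */
        n++;
        printf("%s lock: pid %ld, start %lld, len %lld\n",
               fl.l_type == F_RDLCK ? "read" : "write", (long)fl.l_pid,
               (long long)fl.l_start, (long long)fl.l_len);
        if (fl.l_len == 0)
            break;                  /* that lock extends to end of file */
        start = fl.l_start + fl.l_len;  /* resume just past this lock */
    }
    return n;
}
```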
> 
> -Matt
> Matthew Dillon
> <[EMAIL PROTECTED]>
>


per file lock list

2005-07-22 Thread Igor Shmukler
Hi,

We have a question: how to get all POSIX locks for a given file?

As far as I know, the existing API does not allow retrieving all file
locks. Therefore, we need to use kernel internal structures to get all
applied locks. Unfortunately, the head of the file lock list is
attached to the inode rather than the vnode. As a result, it is much harder
to get the lock list head, since we need to know the exact inode type that
is hidden behind the vnode.

Of course, the problem could be resolved in a hackish way: we could get
the address of the VOP_ADVLOCK() method, compare it with all known FS
methods that handle this VOP operation (ufs_advlock, etc.), and
then apply the proper type cast to vnode->v_data to get a valid
inode. However, this would be a last resort.

So the question: is there an elegant way to get the lock list for a given file?

Thank you in advance.


debugging with Qemu

2005-06-08 Thread Igor Shmukler
Hello,

We have tried to use qemu for debugging kernel-level code the same way we used
bochs in the past.
qemu, with or without kqemu, is quite fast for our needs. gdb connects to the
guest just fine; however, breakpoints break things and qemu stops working.

Our guest OS is FreeBSD 5.3. We would not need to use qemu if not for the
problems 5.3 has with gdb.

Any ideas what we could do besides using painfully slow bochs?

Thank you in advance,

Igor


vn_fullpath() and devices

2005-04-26 Thread Igor Shmukler
hello,

i reported before that vn_fullpath() does not currently deal with VCHR type of 
vnodes.

There is an easy solution for this:

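 if (vnp->v_type == VCHR) {
     /* Character device: devfs nodes are not in the name cache, so
      * build "/dev/<name>" from the device's own name instead. */
     fullpath = vnp->v_rdev->si_name;
     VOP_UNLOCK(vnp, 0, td);
     /* sizeof("/dev/") already counts the terminating NUL. */
     len = sizeof("/dev/") + strlen(fullpath);
     freepath = vdt_malloc(len);    /* presumably a module-local allocator */
     snprintf(freepath, len, "/dev/%s", fullpath);
     fullpath = freepath;
 } else {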

if this works for everyone, i could make and test a patch against whatever
branch is appropriate.
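The string handling in that branch can be exercised in user space. In this sketch, `make_dev_path()` is a hypothetical helper and `malloc()` stands in for the module's allocator; in the kernel, the name would come from the device structure as in the snippet above. Because `sizeof` on the prefix array already counts the terminating NUL, `len` is exactly large enough.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "/dev/<devname>" in a freshly allocated buffer; the caller frees
 * it. Returns NULL if allocation fails. */
static char *make_dev_path(const char *devname)
{
    static const char prefix[] = "/dev/";
    assert(devname != NULL);
    /* sizeof(prefix) == strlen(prefix) + 1, covering the final NUL. */
    size_t len = sizeof(prefix) + strlen(devname);
    char *path = malloc(len);
    if (path == NULL)
        return NULL;
    snprintf(path, len, "%s%s", prefix, devname);
    return path;
}
```

For example, `make_dev_path("ttyv0")` returns a buffer holding "/dev/ttyv0".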

thank you,
igor


Re[2]: name cache cont.

2005-03-28 Thread Igor Shmukler
Bruce,

Thank you for your reply. If you could email me a tarball, I would appreciate it.
I don't have much practical experience with the FreeBSD name cache, although I
have some experience with other systems.
I think the big question here is whether other developers agree that the name
cache could use some work.

Obviously everything here is IMHO. Solaris has a very straightforward, scalable
cache. The FreeBSD cache is not bad, but as far as I understand, the issues I
mentioned are a side effect of design decisions aimed at lowering pressure on
the cache. I would be very interested to know what Jeff and everyone else think
about this. Is there a desire among the people who have hands-on experience
maintaining this part of FreeBSD to change the cache?

I need d_path()-like functionality for a specific kernel module. Right now, to
get by, a secondary cache is implemented by intercepting open(2)/close(2). To
minimize the amount of memory this cache needs (and to keep the code simple), a
dedicated kernel thread garbage-collects duplicate names from this cache (i.e.,
if vn_fullpath() works for a vnode, its name is deleted from the secondary
cache). It's ugly, but it works for now.

I would rather FreeBSD takes care of this for us. I am willing to contribute if 
folks who know VFS well, think it's a worthy cause.

Igor.

> 
> On Mon, Mar 28, 2005 at 05:42:52PM +0400, Igor Shmukler wrote:
> > For my purposes the Linux/DragonFly functionality is needed.
> > 
> > Is there a way to know that once a patch that correctly resolves corner cases
> > for vn_fullpath() (including name cache changes) exists, it will be committed
> > to the 5.x branch?
> 
> Hey, I did some of this work quite some time ago. It is still floating
> around in the archives. I'm more than happy for people with more vfs
> knowledge than I to pick it up, review it, and commit it, but I don't
> have free time to do this right now.
> 
> BMS


name cache cont.

2005-03-28 Thread Igor Shmukler
Robert

> which would generally explain the issues.  However, the >my.file case is a
> bit concerning.  Could you confirm that the file descriptor in that case
> is definitely pointed at a vnode?

Indeed, it does not work. It only happens when my.file is a file newly created
by the shell. If one redirects output to an existing file, vn_fullpath() returns
the name just fine.

For my purposes, the Linux/DragonFly functionality is needed.

Is there a way to know that once a patch that correctly resolves corner cases
for vn_fullpath() (including name cache changes) exists, it will be committed to
the 5.x branch?

It appears that these changes would require serious labor on the name cache.

Who would be the right person to even decide whether these can or cannot be
applied to a stable tree?

Igor.


Re[2]: name cache (was Re[4]: vn_fullpath())

2005-03-27 Thread Igor Shmukler
> On FreeBSD, this occurs because devfs doesn't use the name cache.  Two
> easy solutions are:
>
> - Use the name cache in devfs.  This would have to be done carefully in
>   the context of cloning, etc, but should work out.
>
> - Add a VOP/VFS operation to help figure out a pathname with the help of
>   the file system, and implement it for devfs.  This would avoid having to
>   deal with cache invalidation issues in devfs.
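The second option amounts to a per-file-system hook consulted when the generic lookup comes up empty. The model below is purely hypothetical (none of these names exist in FreeBSD); it only illustrates the dispatch order, cache first and then the file system's own hook, with a devfs-like hook that builds `/dev/<name>`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct vnode_model;

/* Hypothetical per-file-system operations table. */
struct fs_ops {
    const char *fs_name;
    /* Optional hook: write a path for vp into buf; return 0 on success. */
    int (*getpath)(const struct vnode_model *vp, char *buf, size_t len);
};

/* Hypothetical stand-in for a vnode. */
struct vnode_model {
    const struct fs_ops *ops;
    const char *cached_path;   /* NULL when the name cache has no entry */
    const char *dev_name;      /* only meaningful for devfs nodes */
};

static int devfs_getpath(const struct vnode_model *vp, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/dev/%s", vp->dev_name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static const struct fs_ops devfs_ops = { "devfs", devfs_getpath };
static const struct fs_ops ufs_ops = { "ufs", NULL };   /* no hook */

/* Try the name cache first, then the file system's hook, if any. */
static int model_fullpath(const struct vnode_model *vp, char *buf, size_t len)
{
    assert(vp != NULL && buf != NULL);
    if (vp->cached_path != NULL) {
        int n = snprintf(buf, len, "%s", vp->cached_path);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }
    if (vp->ops->getpath != NULL)
        return vp->ops->getpath(vp, buf, len);
    return -1;                 /* no name available */
}
```

A design note: keeping the hook optional means file systems that rely on the cache need no changes, and devfs avoids the cache-invalidation issues mentioned above.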

I would prefer whatever would be a lowest impact uniform (for different FSs) 
solution. I will start looking into this issue.

> I'm not familiar with this issue specifically.  Normally these descriptors
> point to tty's (unnamed due to devfs issues above) and pipes (no name),
> which would generally explain the issues.  However, the >my.file case is a
> bit concerning.  Could you confirm that the file descriptor in that case
> is definitely pointed at a vnode?

I will do this. I would like to point out (I guess I was not clear the first
time) that even if std[in/out/err] is VREG, not VCHR, vn_fullpath() does not work
after a child process has inherited the descriptor.

I understand that this sounds fishy, because the fd simply points to a vnode, but
that is the impression for now. If one closes a "standard" descriptor and then
opens a file, it does work, but this does not seem to survive inheritance.

I will follow up with more information on this. Maybe the issue with files on
0..2 is just a product of my imagination :)
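One way to answer Robert's question from user space is to ask fstat(2) what each of descriptors 0..2 refers to. A tty shows up as a character device and a shell pipe as a FIFO; only a regular file would be expected to have a name in the name cache. `fd_kind()` and `report_std_fds()` are hypothetical helpers for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Classify what a descriptor refers to, using only fstat(2). */
static const char *fd_kind(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return "closed/invalid";
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISFIFO(st.st_mode)) return "pipe/FIFO";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISDIR(st.st_mode))  return "directory";
    return "other";
}

/* Print the kind of object behind stdin, stdout and stderr. */
static void report_std_fds(void)
{
    for (int fd = 0; fd <= 2; fd++)
        fprintf(stderr, "fd %d: %s\n", fd, fd_kind(fd));
}
```

Running `report_std_fds()` in a child started as `prog >my.file` should report fd 1 as a regular file; if it does, the descriptor really does point at a vnode.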

> Linux does something a little different in how they maintain references to

I am aware that the Linux dentry/inode/cache design is different, but I was asking this 
for a simple (selfish) reason. If there is a consensus that d_path()-like functionality 
[treated as a black box, i.e. let's forget how it is implemented] would be very helpful, 
then I think a patch, once written, might be committed before 5.5 is out. In that case, 
I would try to work on this and/or even ask my colleagues to help with coding/testing. 
If this is viewed as an obscure feature that will not be included anytime soon, I would 
remove it from my agenda for now.

Thank you, Robert, and everyone else who spent time reading this thread and thinking 
about this whole issue.

Thank you,

Igor


name cache (was Re[4]: vn_fullpath())

2005-03-27 Thread Igor Shmukler
Hi,

Sorry for reopening an old thread. I am doing this because last time around I was 
unaware of some issues.

There are more corner cases/issues with vn_fullpath() and possibly the name cache. 
Please correct me if I am wrong. I might even look into fixing these personally, but 
first I would like to know that everyone agrees this is needed.

1. vn_fullpath() does not return names for VCHR vnodes. I think it would be handy if 
this were possible.
2. It appears that vn_fullpath() has problems with FDs 0..2. [It even seems to happen 
regardless of whether the file descriptors were inherited or opened via $foo >my.file.]

I am under the impression that Linux d_path() does these things. Is there agreement 
that this is a problem and that it would be beneficial to have vn_fullpath() [and the 
name cache] behave in a "proper" way?

Where does DragonFly stand on this?

Thank you,

Igor

> :I seem to recall that DragonFly keeps the intermediate nodes.
> 
> There's no way to backport that, it would be hundreds of man hours of
> work.  DragonFly uses a totally different namecache topology now, one
> that is mandatory and which guarantees the existence of intermediate
> nodes.
> 
> You'd have to implement something similar to libc's getcwd code.  e.g.
> ".." through and scan each directory to find the matching inode if
> the namecache entry is not present.  It actually wouldn't be too hard
> to do.  It wouldn't be efficient, but vn_fullpath() is rarely used
> so it shouldn't be a problem.



Re[2]: relation between PQ_CACHESIZE and PQ_L2_SIZE

2005-03-26 Thread Igor Shmukler
> http://lists.freebsd.org/mailman/htdig/freebsd-hackers/2003-June/001655.html
>But what puzzled me is : why not page size is  a 
> factor when calculating the number of colors?

Page coloring in FreeBSD was implemented by John Dyson. It is needed to make better use 
of the cache. Depending on the cache's implementation (fully associative vs. 4-way vs. 
2-way, etc.), you might have problems.

A subset of the bits (the low bits) of the page frame's physical address tells us where 
in the processor cache the data can be stored. We want a relatively equal distribution 
of these "colors" so that we use as much of the cache real estate as possible. Hence, 
we are interested in the size of a set, not the size of a page.
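
A minimal sketch of the color computation described above, assuming a physically 
indexed cache and a power-of-two number of colors that is simply passed in (how it is 
derived from the cache geometry is left out). page_color() and color_histogram() are 
illustrative names, not FreeBSD's vm_page code.

```c
#include <stdint.h>

/*
 * Color of the page containing physical address paddr: the low bits of the
 * page frame number. num_colors must be a power of two (an assumption here).
 */
static unsigned page_color(uint64_t paddr, uint64_t page_size, unsigned num_colors)
{
    uint64_t pfn = paddr / page_size;              /* physical page frame number */
    return (unsigned)(pfn & (num_colors - 1));     /* keep only the low bits */
}

/* Count how many of the first nframes physical frames land in each color. */
static void color_histogram(uint64_t nframes, uint64_t page_size,
                            unsigned num_colors, unsigned *counts)
{
    for (unsigned c = 0; c < num_colors; c++)
        counts[c] = 0;
    for (uint64_t pfn = 0; pfn < nframes; pfn++)
        counts[page_color(pfn * page_size, page_size, num_colors)]++;
}
```

Consecutive frames cycle through all colors, so 16 frames with 4 colors put exactly 4 
frames in each color; a page allocator that respects colors tries to preserve that 
spread for each process.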

I am sure there are a whole bunch of articles written about this. I could give you some 
pointers offline.

Igor.


Re[2]: vn_fullpath()

2005-02-21 Thread Igor Shmukler
Robert and David,

Thank you for your help.

> It depends a lot on the requirements.  There are some nasty edge cases
> where the process of determining a name for an object can be quite
> expensive.  Here's one of them:
> 
>   ln /usr/local/etc/apache/httpd.conf /usr/local/etc/apache.old/httpd.conf
>   reboot
>   apachectl start
>   rm /usr/local/etc/apache/httpd.conf
> 
> Now generate the name of the file that Apache has open.  Note that you
> can't just look in the name cache, because the object has a name but the
> name used to open the object has been invalidated.  And UFS even knows it
> has a name, because the link count remains non-zero when the unlink of one
> of the names occurs -- but the only way it can find the other name is to
> search the file system. 
> 
> So the first thing to do is to decide what your requirements are: are you
> willing to fail in the edge cases like the above?  If so, life is a lot
> easier :-). 
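
To make the edge case concrete, a minimal user-space sketch of the only general 
fallback: search the file system for a directory entry with the same (st_dev, st_ino). 
It assumes POSIX opendir()/readdir()/lstat(); find_name_by_inode() is a hypothetical 
helper, not a FreeBSD kernel interface, and its cost grows with the size of the tree.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Recursively search dir for an entry whose (st_dev, st_ino) matches.
 * Copies the first matching path into out and returns 0; returns -1 if none.
 */
static int find_name_by_inode(const char *dir, dev_t dev, ino_t ino,
                              char *out, size_t outlen)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;                      /* unreadable directory: skip it */

    struct dirent *de;
    int found = -1;
    while (found != 0 && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long for this sketch */

        struct stat st;
        if (lstat(path, &st) != 0)
            continue;

        if (st.st_dev == dev && st.st_ino == ino) {
            if (strlen(path) < outlen) {
                strcpy(out, path);
                found = 0;
            }
        } else if (S_ISDIR(st.st_mode) && st.st_dev == dev) {
            /* Stay on one file system: inode numbers are unique per device only. */
            found = find_name_by_inode(path, dev, ino, out, outlen);
        }
    }
    closedir(d);
    return found;
}
```

After the unlink in the quoted scenario, the surviving name can only be found this way, 
since nothing maps an inode back to its directory entries.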

I guess I am willing to fail :). Perhaps in some distant future we will look into the 
nasty corner cases, but for now, as long as I get a name, it will do. We don't even mind 
the hardlinks so much, but we cannot afford to use the existing vn_fullpath() because 
it does not guarantee "anything".

Thank you,

Igor


vn_fullpath()

2005-02-20 Thread Igor Shmukler
Hello,

I was wondering whether anyone has figured out a way to make vn_fullpath() reliable.

Perhaps there is another approach to attacking this problem. Here is what I need
to accomplish:

I need to be able to determine the names of the dynamic linker, the shared libraries,
or the executable for a specific process.

The alternative to vn_fullpath() is intercepting calls; however, I need the
interpreter name in the case of a script.

The problems with the name cache are:
a. the name has to be in the cache
b. hardlinks give a vnode multiple names

This must be a common problem, so I was curious whether there is a solution.

If anyone has any experience making this work, please advise.

Thank you,

Igor.


Re: loading kernel at any physical address

2004-12-15 Thread Igor Shmukler
I think this might be somewhat off topic, but to support superpages you 
probably want the kernel to be aligned on a 4MB boundary.

Also, Mach had macros for alignment. I browsed the code, and it seems there are 
such macros in i386/include/asmacros.h.
Perhaps I am missing something, but I don't see why you would want to align 
with NOPs.
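
A minimal C sketch of computing a 4MB-aligned address arithmetically, assuming 4MB 
superpages (i386 PSE without PAE) and a power-of-two alignment. SUPERPAGE_SIZE, 
round_up_pow2() and is_aligned() are illustrative names; FreeBSD's <sys/param.h> 
provides a similar roundup2() macro.

```c
#include <stdint.h>

/* Assumption: 4MB large pages, as with i386 PSE when PAE is not enabled. */
#define SUPERPAGE_SIZE ((uintptr_t)4 * 1024 * 1024)

/* Round addr up to the next multiple of align; align must be a power of two. */
static inline uintptr_t round_up_pow2(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Nonzero if addr is a multiple of align (a power of two). */
static inline int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For example, the standard 1MB load address rounds up to 0x400000. In assembly an 
alignment directive does the same job, so there is no need to count NOPs.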


-Original Message-
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Date: Wed, 15 Dec 2004 10:43:45 -0800
Subject: loading kernel at any physical address

> Hello all, for a project I am trying to figure out how to boot a FreeBSD 
> kernel loaded at any physical address. Right now the locore.s magic works 
> because the load address (KERNLOAD) and (KERNBASE) are set such that
> 
> #define R(foo) ((foo)-KERNBASE)
> 
> macro is able to get the addresses before paging is enabled.
> 
> If the loadaddress information is not embedded in defines, then is the 
> following solution expected to work:
> 
>   .globl  _loadaddress    /* should be at 16M aligned ??? */
>   .set    _loadaddress, KERNBASE
> 
> and then:
> 
> NON_GPROF_ENTRY(btext)
> 
> nop /* nops for 8 byte alignment */
> nop
> nop
> call 0f
> 0:
> mov 4(%ebp), %eax
> add $-8, %eax   /* This is actual physical load addr */
> add $-0x10, %eax
> subl %eax, _loadaddress /* new kernbase w.r.t load addr */
> /* instead of standard 1MB reloc */
> 
> and then 
> 
> #define R(foo) ((foo)- _loadaddress)
> 
> One issue might be loadaddress over 16M, but for this problem we can assume 
> that the processor has been in protected mode, so it has access to that space.
> 
> Any input on this is highly appreciated.
> 
> br
> vijay



Re[2]: Loadable Scheduler in Freebsd

2004-11-08 Thread Igor Shmukler
> If the schedulers were aware of the "selected" scheduler (or perhaps
> the previous scheduler), they could do the thread removal and insertions
> themselves I suppose.

I doubt you would want to do that.



Re[6]: FreeBSD on Xserve?

2004-09-12 Thread Igor Shmukler
That's true indeed.

Below is a quote from 
http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/AB70A3470F9CC0E287256ECC006D6A54/$file/970-software.pdf

The implementation of memory management in the 64-bit PowerPC processors is 
significantly different from the 32-bit PowerPC implementations. The support for BAT 
(Block Address Translation) is no longer available in the PowerPC 970FX processor and 
in the 64-bit PowerPC architecture. The removal of the BAT mechanism will require all 
application programs to enable the MMU (Memory Management Unit) in order to access 
non-cachable memory.

It's very strange that the original manual states quite the opposite.

Igor.

-Original Message-
From: <[EMAIL PROTECTED]>
To: Igor Shmukler <[EMAIL PROTECTED]>
Date: Mon, 13 Sep 2004 16:07:45 +1000
Subject: Re: Re[4]: FreeBSD on Xserve ?

> 
> >I am not trying to suggest that you and/or him are wrong, 
> >but I cannot find (in manual) anything that would support 
> >your position that 970 has no block address translation. 
> >Regarding 16MB superpages, I believe manual explicitly says 
> >that 970 has no superpages, but I did not go through the doc 
> >again. Therefore, I could be mistaken.
> 
>  I think there's an IBM technote that states there are
> no BATs on the 970. Linux source has comments to that
> effect as well.
> 
>  And before I found that info, I tried in vain to get it
> to work on my G5.
> 
> later,
> 
> Peter.



Re[2]: FreeBSD on Xserve?

2004-09-12 Thread Igor Shmukler
> > If original author wants to mature OS with MAC and SMP support SELinux
> > might be a good candidate.
> > However, Linux does not have jails. Only other OS that has them is
> > Solaris 10 which does not run on PPC.
> 
> There's something named User Mode Linux which seems to be a little like
> jails.  I haven't got the faintest idea how well it works.

I could be wrong, but AFAIK UML is not the same thing as jail, and UML has a serious 
performance penalty. It used to work pretty well for 2.4.x kernels. However, there are 
issues associated with keeping UML up to date. I don't think UML ever made it into 
mainline, whereas jail is part of the kernel.

Personally, I think that if jail were available on Apple hardware, it would be a 
serious argument for using FreeBSD instead of Linux.
IBM boxes support virtualization, but Apple machines don't have that feature. The flip 
side is that most people who buy G5 machines are probably more concerned about FP 
performance.

> > I am not sure what kind of stack protection was referred in the
> > original email. OpenBSD has propolis, but I was under impression there
> > is no such option in FreeBSD. I recall that it was decided that
> > security by obscurity will not make it into the kernel.
> 
> It's "propolice".

Thank you for correcting me. Indeed I did not spell propolice correctly.

> Maybe http://www.trl.ibm.com/projects/security/ssp/buildfreebsd.html
> would be of interest.
> 
> There's more than just obscurity to it, but it is obviously better to
> have correct code to begin with, then things like Propolice isn't
> needed...

That's a choice of terminology. The word "obscurity" has no mathematically precise 
definition.


Re[4]: FreeBSD on Xserve?

2004-09-12 Thread Igor Shmukler
> > Why do you think that 970 does not have BAT registers?
> > There are 16 special purpose registers specifically to implement Block
> > Address Translation.
> >
>
> Because Peter already told us that they have no BAT registers:
>
> http://lists.freebsd.org/pipermail/freebsd-ppc/2004-February/000359.html

I don't know what Peter said, but I do have the documentation in front of me.
Section 7.4 of the Programming Environments Manual gives an overview of BAT, including 
the BAT array organization.

This comes as a shock to me. I am sending a carbon copy to Peter Grehan; perhaps he 
could tell us where he got his information.

I am not trying to suggest that you and/or he are wrong, but I cannot find (in the 
manual) anything that would support your position that the 970 has no block address 
translation. Regarding 16MB superpages, I believe the manual explicitly says that the 
970 has no superpages, but I did not go through the doc again, so I could be mistaken.

Regarding fans, it's not a big deal to support this kind of equipment. IMO, SMP 
support and other low-level work are orders of magnitude more complex.

Igor.


Re[2]: FreeBSD on Xserve?

2004-09-12 Thread Igor Shmukler
Why do you think that the 970 does not have BAT registers?
There are 16 special-purpose registers specifically for implementing Block Address 
Translation.

I don't know what the story is with fan drivers. Personally, I was under the impression 
that the G5 has a liquid cooling system.
Not that this should be a major showstopper for FreeBSD support of G5 boxes.

AFAIK, the PPC port of FreeBSD is incomplete, but it is moving ahead quite fast.

If the original author wants a mature OS with MAC and SMP support, SELinux might be a 
good candidate.
However, Linux does not have jails. The only other OS that has them is Solaris 10, 
which does not run on PPC.

I am not sure what kind of stack protection was referred to in the original email. 
OpenBSD has propolis, but I was under the impression there is no such option in 
FreeBSD. I recall that it was decided that security by obscurity will not make it into 
the kernel.


> I don't think we have G5 support yet.  G5's are significantly different
> from G4s in a few ways that really matter to operating systems.
> Missing BAT registers and other "fun stuff" like fan-drivers have meant
> that even platforms that support 64bit PPC don't necessarily support G5
> [like the L4 microkernels I've been playing with]
>
> Dave
> On Sep 12, 2004, at 2:30 AM, [EMAIL PROTECTED] wrote:
>
> > Hello,
> >
> > I'm planning on buying an Apple Xserve G5 bi-processor. I know mac os
> > X (server)
> > is running on it and that's a modified version of freebsd.
> > So here are my questions :
> >
> > - I've been using freebsd for a while now and if I buy the Xserve I'd
> > very much
> > like to replace mac os X by a freebsd 5.2 / 5.3 if this is possible. My
> > motivations are that I want to make intensive use of Jails and
> > Mandatory Access
> > Control (MAC). I'd also like to recompile the whole thing with stack
> > protection
> > (if possible).
> >
> > Yet I have no idea if Mac os X can run jails, and MAC (anyone an idea
> > here ?),
> > but if not, I'd switch to Freebie.
> >
> > So in general :
> > - has anyone experienced the change
> > - would it be difficult to replace OS X by FreeBSD ?
> > - would it be possible to run these options (Jails,MAC,stack
> > protection) on this
> > hardware ?
> >
> > Thanks for the hints, because I'm a little lost.
> >
> > By,
> > Jade.


Re: Self-tuning parameters

2004-07-15 Thread Igor Shmukler
Avishay,
The first thing to look at is the statistics-gathering code in vm_pageout.c.
It collects basic usage information upon which various decisions are made.
Then the pageout thread basically does garbage collection (GC) based on memory pressure.
It used to be that, depending on the pass number [vm_pageout_scan(int pass)], the system 
would either force or not force GC.
The logic is that if the system recovered enough pages in one pass, the pressure is not 
too high.
IS.
PS I am not 100% sure the function names are accurate, but as far as I remember that's 
the basic idea.
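
A toy user-space simulation of the escalating-pass idea described above, written under 
loud assumptions: struct sim_page, reclaim_pass() and reclaim() are hypothetical and 
are not FreeBSD's vm_pageout code. Pass 0 only takes cheap pages (clean, not recently 
referenced); only if that misses the free target do later passes launder dirty pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page state for this simulation only. */
struct sim_page {
    bool in_use;      /* page holds data */
    bool dirty;       /* must be written back before it can be freed */
    bool referenced;  /* touched recently */
};

/*
 * One scan over the pages, freeing at most `want` of them.
 * Recently referenced pages lose their reference bit and are skipped
 * (a second chance). On pass 0, dirty pages are skipped to avoid I/O;
 * on later passes they are "laundered" (written back) and freed as well.
 */
static size_t reclaim_pass(struct sim_page *pages, size_t n, int pass, size_t want)
{
    size_t freed = 0;

    for (size_t i = 0; i < n && freed < want; i++) {
        struct sim_page *p = &pages[i];
        if (!p->in_use)
            continue;
        if (p->referenced) {
            p->referenced = false;
            continue;
        }
        if (p->dirty) {
            if (pass == 0)
                continue;
            p->dirty = false;   /* pretend the write-back completed */
        }
        p->in_use = false;
        freed++;
    }
    return freed;
}

/*
 * Escalate through passes until `want` pages are free or max_passes is used up.
 * If pass 0 already frees enough, pressure was low and nothing is laundered.
 * Returns the number of passes run; the total freed goes to *freed_out.
 */
static int reclaim(struct sim_page *pages, size_t n, size_t want,
                   int max_passes, size_t *freed_out)
{
    size_t freed = 0;
    int pass;

    for (pass = 0; pass < max_passes && freed < want; pass++)
        freed += reclaim_pass(pages, n, pass, want - freed);
    *freed_out = freed;
    return pass;
}
```

The number of passes needed is itself a crude pressure signal: one pass means low 
pressure, more passes mean the system had to resort to more expensive work.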

-Original Message-
From: Avishay Traeger <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Thu, 15 Jul 2004 12:51:22 -0400 (EDT)
Subject: Self-tuning parameters

> 
> I am currently looking into how various operating systems self-tune their
> memory-related parameters (automatically adjusting parameters such as how
> much memory is allocated for various caches, buffer flushing rates, etc.).
> I have read a few posts indicating that FreeBSD self-tunes many of these
> parameters.  I was wondering if you could provide me with specific
> information (or point me at source code) about where and how FreeBSD does
> this.
> 
> Thanks in advance,
> Avishay Traeger
> 




re: OS X (was *BSD and Mac OS)

2004-07-01 Thread Igor Shmukler
Hello,
Sorry for the intrusion.
This is not really what the original argument was about.
I am curious: do you (or does anyone else) know what exactly was changed in Tiger with 
regard to fine-grained locking?
I did not make it to WWDC, and I could not find any technical info on that.
Sincerely,
IS.


a possible explanation for the mmap benchmarks

2003-12-17 Thread Igor Shmukler
First of all, I do not want to start any kind of war here.
I studied the results of Felix's benchmark some time ago, and now I think I have a 
possible explanation for what happens. I do not mean to invalidate the results; I just 
want to offer a cause, in case someone is unaware of it.

What does mmapbench do? By default, it mmaps every other page of a 200MB file. Since it
does the mmaps sequentially, a linear free-space search with a hint works very well, so
we get almost constant-time free-space allocation. (I have tested Linux kernel 2.4: it
spends more and more time on each subsequent mmap as the number of mapped regions
grows. 2.6 should be OK, since it started using a search hint :-) ) During sequential
mmapping, the splay tree [used in vm_map_lookup_entry()] degenerates into a list.
Later, when mmapbench sequentially touches the mmapped regions, entry lookup in the
degenerated splay tree gives almost the same results as a linear entry search with a
hint. So again, there is no improvement under these test conditions.
An RB-tree (NetBSD), as I understand it, rebalances itself after each insert. A splay
tree only restructures itself during a search.
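
A small sketch of the degeneration described above, assuming a textbook top-down splay 
tree (Sleator and Tarjan) rather than FreeBSD's actual vm_map code; struct node, 
splay(), splay_insert() and tree_height() are illustrative names. Inserting keys in 
increasing order, as sequential mmap() calls would, leaves a single left-leaning chain.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/*
 * Top-down splay: restructures t so that the node with `key`, or the last
 * node on the search path, becomes the root.
 */
static struct node *splay(struct node *t, int key)
{
    struct node header = { 0, NULL, NULL };   /* collects the left/right trees */
    struct node *l = &header, *r = &header, *y;

    if (t == NULL)
        return NULL;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL)
                break;
            if (key < t->left->key) {           /* zig-zig: rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL)
                    break;
            }
            r->left = t;                        /* link t into the right tree */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL)
                break;
            if (key > t->right->key) {          /* zag-zag: rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL)
                    break;
            }
            l->right = t;                       /* link t into the left tree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                         /* reassemble */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

static struct node *splay_insert(struct node *t, int key)
{
    if (t != NULL) {
        t = splay(t, key);
        if (t->key == key)
            return t;                           /* already present */
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    if (t == NULL) {
        n->left = n->right = NULL;
    } else if (key < t->key) {
        n->left = t->left;
        n->right = t;
        t->left = NULL;
    } else {
        n->right = t->right;
        n->left = t;
        t->right = NULL;
    }
    return n;
}

static int tree_height(const struct node *t)
{
    if (t == NULL)
        return 0;
    int hl = tree_height(t->left), hr = tree_height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

After inserting 1..1000 in order the height is 1000. Splaying the deepest key roughly 
halves it, and accessing the keys in ascending order afterwards costs O(1) amortized per 
access (Tarjan's sequential access theorem), which fits the observation that sequential 
lookups behave much like a linear search with a hint.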

The question is whether an altered benchmark, where things are done in random order,
will produce similar results. I do not know the answer yet. However, I believe it very
well might.

I would be glad to know whether this seems reasonable to anyone. I contacted the 
author, but he has not had a chance to reply yet.


Re: Random disk cache expiry

2003-01-30 Thread Igor Shmukler
> You have found an optimal replacement algorithm for the case of
> repeated sequential reads.  In fact, if you know in advance what
> the access pattern is going to be, it is *always* possible to find
> an optimal replacement algorithm.  Specifically, you always
> replace the block in the cache that will not be used for the
> longest time in the future.
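
A minimal sketch of the optimal policy described in the quote (Belady's MIN), assuming 
the whole reference string is known in advance; belady_misses() is an illustrative 
name. It is quadratic and only meant to make the rule concrete.

```c
#include <stdlib.h>

/*
 * Number of misses under Belady's MIN (optimal) replacement for a reference
 * string known in advance. On a miss with a full cache, evict the resident
 * block whose next use is furthest in the future (or never comes).
 */
static size_t belady_misses(const int *refs, size_t n, size_t capacity)
{
    if (capacity == 0)
        return n;                         /* nothing can ever be cached */

    int *cache = malloc(capacity * sizeof *cache);
    if (cache == NULL)
        abort();
    size_t used = 0, misses = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < used; j++)
            if (cache[j] == refs[i])
                break;
        if (j < used)
            continue;                     /* hit */

        misses++;
        if (used < capacity) {            /* free slot: no eviction needed */
            cache[used++] = refs[i];
            continue;
        }

        size_t victim = 0, furthest = 0;
        for (j = 0; j < used; j++) {
            size_t k = i + 1;
            while (k < n && refs[k] != cache[j])
                k++;                      /* k == n means "never used again" */
            if (k > furthest) {
                furthest = k;
                victim = j;
            }
        }
        cache[victim] = refs[i];
    }
    free(cache);
    return misses;
}
```

On the looping string 1,2,3,1,2,3 with room for two blocks it misses 4 times, while LRU 
misses on all 6 references, which is the repeated-sequential-read case under discussion.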

Has everyone read the UBM paper from OSDI? It presents one possible solution for dealing 
with sequentially accessed files. Why is it not enough (at least to begin with)?




random cache expiry

2003-01-30 Thread Igor Shmukler
I partially missed the discussion, but why would anyone want to implement random 
expiration when there are better methods to deal with the issue, including UBM, which 
was already implemented in FreeBSD 2.2.8? 
(http://www.usenix.org/events/osdi2000/full_papers/kim/kim_html/)


