pw(8): $ (dollar sign) in username

2002-12-27 Thread Ryan Thompson

Hi all,

I've recently had the pleasure of configuring a FreeBSD machine as a
Samba Primary Domain Controller. In smb.conf, one can specify an add
user script directive to automate the creation of machine accounts.
Otherwise, you have to manually create accounts for each machine on
the network. See:

  http://us1.samba.org/samba/ftp/docs/htmldocs/Samba-PDC-HOWTO.html

The problem is that Samba requires a '$' at the end of each machine-account
username, which our pw(8) doesn't allow.

Allowing the $ is a one-character change to usr.sbin/pw/pw_user.c.
Aside from the obvious pain of accidentally inserting shell variables
as part of a username if the $ is not escaped, are there any specific
problems with this change?

Others would probably benefit from this. Is the change worth
committing? Or would it be better to push this to pw.conf?

--- usr.sbin/pw/pw_user.c.orig  Sat Nov 16 21:55:28 2002
+++ usr.sbin/pw/pw_user.c   Fri Dec 27 11:17:33 2002
@@ -1195,7 +1195,7 @@
 pw_checkname(u_char *name, int gecos)
 {
int l = 0;
-   char const *notch = gecos ? ":!@" : " ,\t:+&#%$^()!@~*?=|\\/\"";
+   char const *notch = gecos ? ":!@" : " ,\t:+&#%^()!@~*?=|\\/\"";

while (name[l]) {
if (strchr(notch, name[l]) != NULL || name[l] <= ' ' || name[l] == 127 ||
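
For the curious, the effect of the change can be seen with a stand-alone
sketch of the same kind of check (illustrative only -- name_ok() below is
not the real pw_checkname(), and the reject list just mirrors the patched
one): a machine account name such as WORKSTATION$ passes, while names
containing whitespace or other shell metacharacters are still refused.

#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the patched check; not the real pw(8) code. */
static int
name_ok(const char *name)
{
    /* Same reject list as the patched line above, i.e. without '$'. */
    const char *notch = " ,\t:+&#%^()!@~*?=|\\/\"";
    int i;

    for (i = 0; name[i] != '\0'; i++) {
        if (strchr(notch, name[i]) != NULL ||
            (unsigned char)name[i] <= ' ' || name[i] == 127)
            return 0;
    }
    return i > 0;
}

int
main(void)
{
    printf("WORKSTATION$: %s\n", name_ok("WORKSTATION$") ? "ok" : "rejected");
    printf("bad$name here: %s\n", name_ok("bad$name here") ? "ok" : "rejected");
    return 0;
}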

- Ryan

-- 
  Ryan Thompson [EMAIL PROTECTED]

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669 (877-SASKNOW) North America





Re: pw(8): $ (dollar sign) in username

2002-12-27 Thread Ryan Thompson
Craig Rodrigues wrote to Ryan Thompson:

 On Fri, Dec 27, 2002 at 11:35:45AM -0600, Ryan Thompson wrote:

  Problem is, smb requires a '$' at the end of the username, which
  our pw(8) doesn't allow.

 The same patch which you proposed was suggested on the
 freebsd-current list. See the thread "pw_user.c change for samba":

Heh. Apparently I missed that. I don't get out much these days. :-)

Thanks,
- Ryan

-- 
  Ryan Thompson [EMAIL PROTECTED]

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669 (877-SASKNOW) North America





Re: Who broke ls in FreeBSD? and why?

2000-10-31 Thread Ryan Thompson


Woo.. Trimmed the CC list on this one.  Didn't catch the last
one.  Sorry! :-)


Jose M. Alcaide wrote to Sean Lutner and [EMAIL PROTECTED]:

 Sean Lutner wrote:
  
  I may just be being naive here, which is why I took this off the list. I
  don't understand how a directory that is a-x will not let you run ls -l on
  it.

It won't, because in order to generate a listing of filenames in the
directory, you must have read access to that directory.  To run ls -l,
both read AND search are required, as you must be able to a) get a list of
files, and b) map those files to inodes (to return size, link and date
information, etc.).
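
If a demonstration helps, here is a rough sketch (not how ls(1) is actually
implemented) of the two steps: readdir() fails without read permission on
the directory, and the per-entry stat() fails without search permission.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, char **argv)
{
    char path[1024];
    struct dirent *dp;
    struct stat sb;
    DIR *dirp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return 1;
    }

    /* Step 1: getting the list of names needs read permission. */
    if ((dirp = opendir(argv[1])) == NULL) {
        fprintf(stderr, "opendir %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    /* Step 2: mapping each name to its inode needs search permission. */
    while ((dp = readdir(dirp)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", argv[1], dp->d_name);
        if (stat(path, &sb) == -1)
            printf("%s: %s\n", path, strerror(errno));
        else
            printf("%s: %ld bytes\n", path, (long)sb.st_size);
    }
    closedir(dirp);
    return 0;
}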



  As you can see, after changing the permissions, you cannot run ls -l as
  you could before. Perhaps I don't have the broken version, or there is
  something I am missing. At any rate, a better understanding would be nice.
  

What broken version? :-)


 If a directory does not have search permission, the i-node contents of
 each of its entries cannot be examined. Under these circumstances,
 the directory listing "per se" does not fail, but the information
 requested cannot be shown. For example, in Solaris (and in SunOS 4.x):
 
 $ ls -ld Test/
 drw-r-----   2 jose lsi  512 oct 25 11:13 Test/
 $ ls Test  
 1  2  3
 $ ls -i Test
 288799 1 288800 2 288801 3
 $ ls -l Test
 Test/1: Permission denied
 Test/2: Permission denied
 Test/3: Permission denied
 total 0
 $
 
 Anyway, I found something interesting: the bash shell is involved
 in some way:
 
 Using bash:

When you use wildcard expansion, the shell is definitely involved.  
Remember it is the shell, not ls, that expands wildcards.


 
 $ mkdir Test
 $ touch Test/{1,2,3}
 $ chmod a-x Test
 $ ls Test && echo SUCCESS
 SUCCESS          <-- WRONG!!
 $ /bin/ls Test && echo SUCCESS
 1   2   3        <-- This works as expected (?!?!??)

What sort of shell functions/aliases do you have defined in bash?  Looks
like ls is possibly aliased to something.  (Possibly with different
command line arguments to /bin/ls).


 Using [t]csh:
 
 $ csh
 %ls -ld Test
 drw-------  2 jose  lsi  512 25 oct 10:49 Test
 %ls Test
 1   2   3       <-- This works as expected
 %ls -i Test     <-- WRONG!!
 %which ls
 /bin/ls
 %
 

Why not call /bin/ls explicitly in your examples, to remove the inherent
ambiguity?


 Using both bash and csh, 'ls -i' and 'ls -l' give nothing and
 don't return any error when the directory does not have search permission.
 "ls -i" should work, since getdirentries(2) only requires that
 the directory must be opened for reading. The behavior of "ls -l" may
 be a subject for discussion.

"Opened for reading" is different than "has execute (search) permission".  
The two can be independent.  Even still, I don't see where
getdirentries(2) "only requires the directory must be open for reading".  
If that IS the case, then the -doc people have a change to commit :-)

Search permission is required to map pathnames to inodes.  That's a
requirement of the kernel (and, consequently, of kernel calls) for normal
users.


 Cheers,
 -- JMA
 ** Jose M. Alcaide  //  [EMAIL PROTECTED]  //  [EMAIL PROTECTED] **
 ** "Beware of Programmers who carry screwdrivers" --  Leonard Brandwein **
 
 
 

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: Who broke ls in FreeBSD? and why?

2000-10-31 Thread Ryan Thompson


Jose M. Alcaide wrote to Warner Losh:

 Speaking of ls(1)...
 
 $ mkdir Arghh
 $ touch Arghh/{one,two,three}
 $ ls Arghh
 one   three two
 $ chmod a-x Arghh
 $ ls Arghh && echo SUCCESS
 SUCCESS
 $ ls -l Arghh && echo SUCCESS
 SUCCESS
 
 ARGH :-)
 
 This is not the expected behavior. If a directory does not have
 search permission, but it has read permission, a plain "ls" (or "ls -i")
 should list its contents, while "ls -l" should fail. And still worse,
 when ls fails to list the non-searchable directory contents, it
 does _not_ return an error code.

It's late... And I'm tired... But isn't that backwards?

"Search" (i.e., execute) permission on a directory implies that the
directory can be included as part of a directory search.  In other words,
mapping to inodes is provided, but obtaining a list of files in the
directory is NOT.  This is used by system administrators to "hide" a
directory of files, but still grant access to them if they are named
explicitly.

"Read" permission on a directory means that the directory can be listed.  
Thus, if a directory has read permission, you can list the contents of it.

Permutations of directory permissions (ignoring write):

read and execute
(Normal) Can get a listing of files, and inodes for each file
So, files within the dir can be opened.

read, no execute
Can get a listing of filenames only.  Can't actually map
filenames to inodes, so no open, ls -l, ls -li, etc.

execute, no read
Can't get a listing of filenames, but if you know the
filename in question, you can map it to an inode.  So, 
ls Arghh will never work (as it has to read the directory
to get the list of files!), but ls -l Arghh/one *will*
work, as it does not need to READ the directory to discover the
name of the file called "one"; it only needs search permission to
look that name up.

no read, no execute
No service! :-)
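
A self-contained demo of the "execute, no read" case, for anyone who wants
to see it happen (run it as an ordinary user, since root bypasses the
permission checks; the directory and file names are just examples):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(void)
{
    struct stat sb;
    DIR *dirp;
    int fd;

    if (mkdir("Hidden", 0755) == -1 && errno != EEXIST)
        return 1;
    if ((fd = open("Hidden/one", O_WRONLY | O_CREAT, 0644)) != -1)
        close(fd);

    /* execute (search) only: names can be looked up, but not listed. */
    chmod("Hidden", 0111);

    if ((dirp = opendir("Hidden")) == NULL)
        printf("opendir(Hidden): %s\n", strerror(errno));
    else
        closedir(dirp);

    if (stat("Hidden/one", &sb) == 0)
        printf("stat(Hidden/one): works, %ld bytes\n", (long)sb.st_size);
    else
        printf("stat(Hidden/one): %s\n", strerror(errno));

    /* Put things back so the directory can be removed again. */
    chmod("Hidden", 0755);
    unlink("Hidden/one");
    rmdir("Hidden");
    return 0;
}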

FreeBSD's behaviour is as correct as the behaviour of any other UNIX
variant that I am aware of.  The problem is not with ls.

See my response to your next post (I'll have to type it, first ;-)


Hope this helps,

- Ryan

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2







Re: Logging users out

2000-10-30 Thread Ryan Thompson

[EMAIL PROTECTED] wrote to [EMAIL PROTECTED]:

 Hi,
 
 What is the best way to go about logging a user out given their tty? I had
 a couple of ideas:
 
 (a) open their tty and set the baud rate to 0

Probably wouldn't be very effective.
 

 (b) use tcgetpgrp to get the process group id of the foreground process
 group on that tty, then with that info use libkvm to find the session
 leader's pid and send it a SIGHUP

Why not just kill their controlling shell?  That's about all a logout will
do, anyway.  If they have processes detached from the controlling
terminal, the user typing "logout" will not stop them.

Recall that "background" and "foreground" are shell concepts, not
properties of any given process.


 (c) use tcgetpgrp to get the process group id of the foreground process
 group on that tty then using killpg to kill all the members of that
 process group. I would need to do this in a loop to catch background
 process groups that come to the foreground after a process group is
 killed.
 
 Whenever sending a signal I will have to verify the process exited,
 possibly sending TERM and KILL until it does.
 
 Problems:
 
 (a) a doesn't seem to work...I'm guessing it only works on serial lines.
 
 (b) b would be quite unportable I would guess (although that's not a
 tragedy, I would like to avoid it if it isn't too hard). Also if the
 session leader dies is there any guarantee everything else in the session
 goes as well? Or would I have to go through and kill every process group
 in the session?

Never any guarantee.. user programs can set up signal handlers to catch or
ignore any signal but SIGKILL.  Then again, a user typing "logout" will
not clean up absolutely everything, either.


 (c) c just seemed to be a bit of a hack (assuming you haven't just read
 b). Out of all of them it seems the best so far however.
 
 Does anyone have any suggestions or comments? Is there a "proper" way to
 do this?

a) Kill the controlling shell.  This will leave some processes behind that
   are no longer part of the user's session (like programs that have
   detached from the terminal and become daemons), and processes that
   were never part of the user's session (like processes that they started
   on a different terminal)

b) kill -signal `ps -axo user,pid | grep user | awk '{print $2}'`
   Kills every process owned by ``user''.  Sending SIGKILL does so
   in a non-catchable way.

c) /sbin/halt is pretty much guaranteed to do the trick ;-)
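
For (b)/(c), the "send TERM and KILL until it exits" part Andrew describes
might look roughly like the sketch below.  This is illustrative only:
finding the right pid in the first place (via tcgetpgrp(), utmp, or
libkvm) is left out, and the function name is made up.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Sketch: send progressively harsher signals to a process, using
 * kill(pid, 0) to check whether it is still around.  Returns 0 once
 * the process is gone, -1 if even SIGKILL didn't do it.
 */
static int
log_out(pid_t pid)
{
    static const int sigs[] = { SIGHUP, SIGTERM, SIGKILL };
    unsigned int i, tries;

    for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
        if (kill(pid, sigs[i]) == -1 && errno == ESRCH)
            return 0;               /* already gone */
        for (tries = 0; tries < 5; tries++) {
            sleep(1);
            if (kill(pid, 0) == -1 && errno == ESRCH)
                return 0;
        }
    }
    return -1;
}

int
main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s pid\n", argv[0]);
        return 1;
    }
    return log_out((pid_t)atoi(argv[1])) == 0 ? 0 : 1;
}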


 
 Thanks,
 
 Andrew
 
 
 
 
 

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: Filesystem holes

2000-10-29 Thread Ryan Thompson

Matt Dillon wrote to Ryan Thompson:

 : : storage is rather inefficient for our table of about 2,850,000 members
 : : (~2.1 GB total storage).  There are 64M possible hash values in our
 : : current implementation, and our record size is variable, but could be
 : : safely fixed at about 1.5KB... So, total storage if all values were used
 : : would be about 96GB.  (See where I'm going with this?)
 :...
 :
 :Remember, though, I'm not proposing a hash table.  I have been lucky
 :enough to design a "hash" function that will crunch down all possible
 :values into about 64M unique keys--so, no collisions.  I propose to use
 :this function with address calculation.  So, O(1) everything, for single
 :random lookups, which is (virtually) all this data structure must deal
 :with.  Sorting isn't necessary, and that could be accomplished in O(n)
 :anyway.
 
 Are you still talking about creating a 96GB file to manage 2.1GB worth
 of data?  I gotta say, that sounds kinda nuts to me!  Cpu cycles are
 cheap, I/O and memory is not.

Hi, Matt!  Thanks for the replies.  I'll try and keep you interested ;-)

Hmm... Perhaps you're still missing my original point?  I'm talking about
a file with 96GB in addressable bytes (well, probably a bunch of files,
given logical filesize limitations, but let's say for simplicity's sake
that we have a single file).  Its actual "size" (in terms of allocated
blocks) will be only a bit larger than 2.1GB.  (Directly proportional to
the used size of the dataset.  Discrepancies only come into play when
record size != block size, but that can be worked around somewhat)

In other words, ls -ls will report the "size" as some ridiculously large
number, but will show a much smaller block count.  So, assuming four records
are added to the file on block boundaries, the file will actually only use
four blocks... nowhere near 96GB!

In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
space allocated for file.  Thus, my original questions were centered
around filesystem holes.  I.e., non-allocated chunks in the middle of a
file.  When trying to READ from within a hole, the kernel just sends back
a buffer of zeros... which is enough to show that the record is not
initialized.  Actually, something like an "exists" function for a record
wouldn't touch the disk at all!  When writing to a hole, the kernel simply
allocates the necessary block(s).  This is really fast, too, for creation,
as the empty set can be written to disk with touch(1), and uses far less
memory than virtual initialization or memory structures ;-)

As an example, try 

fseek(f, 0x7fffffff-1, SEEK_SET);
fputc('\n', f);

And analyze the reported filesize, as well as the reported block count of
the file.  You should see a 2GB "size", and maybe 8K in allocated blocks
(depends how you ran newfs ;-).  This is behaviour that has been around
since the original UFS.
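
And if you want to check from a program (rather than ls -ls) that a file
really is staying sparse, comparing st_size against st_blocks from stat(2)
does it -- st_blocks is counted in 512-byte units:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Print logical size vs. space actually allocated, for each argument. */
int
main(int argc, char **argv)
{
    struct stat sb;
    int i;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &sb) == -1) {
            perror(argv[i]);
            continue;
        }
        printf("%s: size %lld bytes, allocated %lld bytes\n", argv[i],
            (long long)sb.st_size, (long long)sb.st_blocks * 512);
    }
    return 0;
}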


 Take the BTree example again -- if you think about it, all internal nodes
 in the tree will fit into memory trivially.  It is only the leaf nodes
 that do not.  So in regards to actual disk accesses you still wind up
 only doing *ONE* for the btree, even if it winds up being four or five
 levels deep.

Right.  And, actually, (without looking a bit more closely), I wouldn't be
surprised if you could replace the four-line address calculation I have
with your B+Tree structure and come up with the same result.  Only
difference would be a few hundred lines of code, much more testing, and
quite a few megs of RAM...  ;-)

What you referred to as "nuts", above, is just a logical way to provide a
huge address space for a set of data, without actually allocating blocks
in the filesystem for the entire address space until they are used.


 The result is that you will be able to create an index to your data,
 which is only 2.8 million records, in around 32 bytes per record or
 89 Megabytes.  Not 96GB.  Since you can cache all the internal nodes
 of the btree you are done.  The machine will do a much better job
 caching an 89MB index than a 96GB index.
 
 Do not make the mistake of equating cpu to I/O.  The cpu required to
 iterate through 4 levels in the btree (assuming 64 entries per node,
 it only takes 4 to cover 16 million records) is *ZERO* ... as in,
 probably less than a microsecond, whereas accessing the disk is going
 to be milliseconds.

CPU time for what I'm proposing is even closer to zero than for a tree...
But, you're right, it doesn't make any real difference when compared to
disk I/O...  B-Trees are good for a lot of things.  Address calculation
can be really good, too, given a finite key set, and a way to represent
that finite key set without wasting space.


   A B+Tree will also scale with the size of the dataset being managed,
   so you do not have to preallocate or prereserve file space. 

  So will address calculation + filesy

Re: Filesystem holes

2000-10-29 Thread Ryan Thompson

Ryan Thompson wrote to Matt Dillon:

 Matt Dillon wrote to Ryan Thompson:
 
  : : storage is rather inefficient for our table of about 2,850,000 members
  : : (~2.1 GB total storage).  There are 64M possible hash values in our
  : : current implementation, and our record size is variable, but could be
  : : safely fixed at about 1.5KB... So, total storage if all values were used
  : : would be about 96GB.  (See where I'm going with this?)
  :...
  :
  :Remember, though, I'm not proposing a hash table.  I have been lucky
  :enough to design a "hash" function that will crunch down all possible
  :values into about 64M unique keys--so, no collisions.  I propose to use
  :this function with address calculation.  So, O(1) everything, for single
  :random lookups, which is (virtually) all this data structure must deal
  :with.  Sorting isn't necessary, and that could be accomplished in O(n)
  :anyway.
  
  Are you still talking about creating a 96GB file to manage 2.1GB worth
  of data?  I gotta say, that sounds kinda nuts to me!  Cpu cycles are
  cheap, I/O and memory is not.
 
 Hi, Matt!  Thanks for the replies.  I'll try and keep you interested ;-)
 
 Hmm... Perhaps you're still missing my original point?  I'm talking about
 a file with 96GB in addressable bytes (well, probably a bunch of files,
 given logical filesize limitations, but let's say for simplicity's sake
 that we have a single file).  It's actual "size" (in terms of allocated
 blocks) will be only a bit larger than 2.1GB.  (Directly proportional to
 the used size of the dataset.  Discrepancies only come into play when
 record size != block size, but that can be worked around somewhat)
 
 In other words, ls -ls will report the "size" as some ridiculously large
 number, will show a much smaller block count.  So, assuming four records
 are added to the file on block boundaries, the file will actually only use
 four blocks... nowhere near 96GB!
 
 In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
 space allocated for file.  Thus, my original questions were centered
 around filesystem holes.  I.e., non-allocated chunks in the middle of a
 file.  When trying to READ from within a hole, the kernel just sends back
 a buffer of zeros... which is enough to show that the record is not
 initialized.  

If you prefer to read system documentation instead of me, see lseek(2) :-)

 The lseek() function allows the file offset to be set beyond the end of
 the existing end-of-file of the file. If data is later written at this
 point, subsequent reads of the data in the gap return bytes of zeros
 (until data is actually written into the gap).

I suppose gap == hole.  Silly semantics. :-) 

 Actually, something like an "exists" function for a record
 wouldn't touch the disk at all!  When writing to a hole, the kernel simply
 allocates the necessary block(s).  This is really fast, too, for creation,
 as the empty set can be written to disk with touch(1), and uses far less
 memory than virtual initialization or memory structures ;-)
 
 As an example, try 
 
 fseek(f, 0x7fffffff-1, SEEK_SET);
 fputc('\n', f);
 
 And analyze the reported filesize, as well as the reported block count of
 the file.  You should see a 2GB "size", and maybe 8K in allocated blocks
 (depends how you ran newfs ;-).  This is behaviour that has been around
 since the original UFS.
 
 
  Take the BTree example again -- if you think about it, all internal nodes
  in the tree will fit into memory trivially.  It is only the leaf nodes
  that do not.  So in regards to actual disk accesses you still wind up
  only doing *ONE* for the btree, even if it winds up being four or five
  levels deep.
 
 Right.  And, actually, (without looking a bit more closely), I wouldn't be
 suprised if you could replace the four-line address calculation I have
 with your B+Tree structure and come up with the same result.  Only
 difference would be a few hundred lines of code, much more testing, and
 quite a few megs of RAM...  ;-)
 
 What you referred to as "nuts", above, is just a logical way to provide a
 huge address space for a set of data, without actually allocating blocks
 in the filesystem for the entire address space until they are used.
 
 
  The result is that you will be able to create an index to your data,
  which is only 2.8 million records, in around 32 bytes per record or
  89 Megabytes.  Not 96GB.  Since you can cache all the internal nodes
  of the btree you are done.  The machine will do a much better job
  caching a 89MB index then a 96GB index.
  
  Do not make the mistake of equating cpu to I/O.  The cpu required to
  iterate through 4 levels in the btree (assuming 64 entries per node,
  it only takes 4 to cover 16 million records) is *ZERO* ... as in,
  probably less then a microsecond, whereas accessing the disk is going
  to be mil

Re: Filesystem holes

2000-10-29 Thread Ryan Thompson

Matt Dillon wrote to Ryan Thompson:

 :Hi, Matt!  Thanks for the replies.  I'll try and keep you interested ;-)
 :
 :Hmm... Perhaps you're still missing my original point?  I'm talking about
 :a file with 96GB in addressable bytes (well, probably a bunch of files,
 :given logical filesize limitations, but let's say for simplicity's sake
 :that we have a single file).  It's actual "size" (in terms of allocated
 :blocks) will be only a bit larger than 2.1GB.  (Directly proportional to
 :the used size of the dataset.  Discrepancies only come into play when
 :record size != block size, but that can be worked around somewhat)
 
 Ah, ok... that's better, though I think you will find yourself
 tuning it endlessly if the blocksize does not match-up.  Remember,
 the filesystem allocates 8K per block, so if your record size is
 1500 bytes and you have a random distribution, you are going to
 wind up eating 8-14GB.

True.. As they say, "Disk is cheap", though... And "tuning" isn't so bad
on a simple address calculation.  I agree that if the record size doesn't
closely match the blocksize, there will be a waste of space, but at least
that waste is proportional to the dataset... Better than many data
structures that I could call by name.  It wouldn't be that difficult with
many problem sets to optimize them to the point where their records align
neatly with the blocksize.  Granted, the wasted space would hang you with
some problem sets.  I propose, though, that for some problems, it is easy
enough to tune them to reduce (or, if you're really lucky, eliminate) the
wasted space.  Yes, this is a specialized solution.


 :In other words, ls -ls will report the "size" as some ridiculously large
 :number, will show a much smaller block count.  So, assuming four records
 :are added to the file on block boundaries, the file will actually only use
 :four blocks... nowhere near 96GB!
 :
 :In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
 :space allocated for file.  Thus, my original questions were centered
 :around filesystem holes.  I.e., non-allocated chunks in the middle of a
 :file.  When trying to READ from within a hole, the kernel just sends back
 :a buffer of zeros... which is enough to show that the record is not
 :initialized.  Actually, something like an "exists" function for a record
 :wouldn't touch the disk at all!  When writing to a hole, the kernel simply
 :allocates the necessary block(s).  This is really fast, too, for creation,
 :as the empty set can be written to disk with touch(1), and uses far less
 :memory than virtual initialization or memory structures ;-)
 :
 :As an example, try 
 
 Ahh.. yes, I know.  I'm a filesystem expert :-)  

I know, Matt, and that's why it's great to talk to you about this :-)  I
guess I needed an example to convey my point, though.  I apologize if it
came across as an insult to your intelligence; 'twas not intended that
way in the least.


   However, that said, I
 will tell you quite frankly that virtually *nobody* depends on holes
 for efficient storage.  

Ahh.  Ok.  This is the kind of response I was looking for.  Case studies
:-)

 There are only a few problems where it's
 practical: some forms of executables, and sparse matrixes.  That's
 pretty much it.


 :And analyze the reported filesize, as well as the reported block count of
 :the file.  You should see a 2GB "size", and maybe 8K in allocated blocks
 :(depends how you ran newfs ;-).  This is behaviour that has been around
 :since the original UFS.
 
 It's not a good idea to use a small block size in UFS.  The minimum
 is pretty much 1K frag, 8K block.  for a sparse file this means 8K
 is the minimum that will be allocated (frags are only allocated for
 the end of a file).

Right, which is why this strategy _does_ only work for a specific subset
of problems, just as a directed graph works well for a path structure, but
would really suck for (among other things) maintaining a sorted list of
account numbers.


 If you think the btree idea is too complex to implement quickly, then
 try using a radix tree (essentially what you said in regards to 
 breaking the hash you calculate into a node traversal).  For your
 problem I would recommend 64 elements per node, which corresponds to
 6 bits.  16 million records would fit in 4 levels (6x4 = 24 bits).  If you
 cache the first three levels in memory you eat 64^3 = 262144 x
 sizeof(record).  Assuming a simple file offset for the record, you eat
 exactly 1MB of memory, 64MB for the file index, and your data can
 be placed sequentially in the file. 

Right.  One more way to skin the cat, and a pretty good one at that, if
you have main memory to burn (I don't :-)

Assuming, though, (yes, I'm going to be a bit stubborn, here... ;-) that I
WANT to use this address calculation method in conjunction with the
filesyste

Re: Filesystem holes

2000-10-29 Thread Ryan Thompson

Leif Neland wrote to Ryan Thompson and Matt Dillon:

 
 What will happen if somebody (possibly you, as majordomo says) tries to
 make a backup of that file?

Make sure to use a program that can cope ;-)


 Will the copy also be with holes, or would that file suddenly use all 96GB?
 It will at least do so, if one does cat file > file.bak
 Probably tar will do the same.

Actually, tar will handle holes elegantly (this I have tested), with the
--sparse option.  Older versions would not.  cat and other similar
"filters" are naive, as they simply block I/O.

Backing up with tar and/or a filesystem dump would be just as effective as
with any other storage strategy.

cat file > file.bak on even a 2GB file is probably not something that
would be popular, anyway.
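
For reference, the trick something like tar's --sparse option plays is
roughly this: seek over runs of zeros instead of writing them, so the
holes are recreated in the output.  A naive block-at-a-time sketch (error
handling mostly omitted; a real tool would also ftruncate() the output if
the file ends in a hole):

#include <stdio.h>
#include <string.h>

#define BLK 8192                /* match the filesystem block size */

int
main(int argc, char **argv)
{
    static char buf[BLK], zero[BLK];
    FILE *in, *out;
    size_t n;

    if (argc != 3) {
        fprintf(stderr, "usage: %s src dst\n", argv[0]);
        return 1;
    }
    if ((in = fopen(argv[1], "r")) == NULL ||
        (out = fopen(argv[2], "w")) == NULL) {
        perror("fopen");
        return 1;
    }
    while ((n = fread(buf, 1, BLK, in)) > 0) {
        if (n == BLK && memcmp(buf, zero, BLK) == 0)
            fseek(out, BLK, SEEK_CUR);      /* leave a hole */
        else
            fwrite(buf, 1, n, out);
    }
    fclose(in);
    fclose(out);
    return 0;
}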

 I'd be afraid to create something which could easily blow up by having
 normal operations applied to it.

That's a valid concern.  That's the biggest drawback I see to the overall
strategy... But, specialized problems sometimes encourage specialized
solutions.

 
 Leif
 

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: Filesystem holes

2000-10-29 Thread Ryan Thompson

Matt Dillon wrote to Ryan Thompson and [EMAIL PROTECTED]:

 : Hi all...
 : 
 : One the tasks that I have undertaken lately is to improve the efficiency
 : of a couple of storage facilities we use internally, here.  Basically,
 : they are moderate-size tables currently implemented in SQL, which is OK in
 : terms of performance, but the hash function is breaking down and the
 : storage is rather inefficient for our table of about 2,850,000 members
 : (~2.1 GB total storage).  There are 64M possible hash values in our
 : current implementation, and our record size is variable, but could be
 : safely fixed at about 1.5KB... So, total storage if all values were used
 : would be about 96GB.  (See where I'm going with this?)
 : 
 : One of the options I am considering is actually using address calculation,
 : and taking advantage of filesystem holes, to keep storage down to what is
 : actually being used, while providing instant lookups.
 : 
 : The single file would be about 96G addressable bytes... But the actual
 : block count would be much lower.  I suppose I will have to create a series
 : of these files and divide the problem into  4GB chunks, but one
 : lookup/store will still be done with one calculation and one disk seek
 : (just more filehandles open).
 : 
 : Deletes seem problematic.  My question is, does the operating system
 : provide any way to free blocks used in the middle of a file?
 : 
 : Must I periodically re-create these files (long and slow process, but not
 : so bad if done infrequently) to reclaim space, or is there a way to free
 : arbitrary blocks in a file in userland (WITHOUT trashing the filesystem?
 : :-)
 : 
 : - Ryan
 : 
 : -- 
 :   Ryan Thompson [EMAIL PROTECTED]
 :   Network Administrator, Accounts
 :   Phone: +1 (306) 664-1161
 : 
 :   SaskNow Technologies http://www.sasknow.com
 :   #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2
 
 I would strongly recommend using a B+Tree instead of a hash table.  With
 a hash table slightly different lookups will seek all over the place.
 With a B+Tree lookups will stay more localized.

Right... That's a good point, but (and, no, I really didn't mention this
in my original post), "sequential" access is never important with the
system I have described.  Lookups are always random, and they are almost
always done one at a time (multiple lookups only done for data swaps...
very rare, like 1/1e7 accesses... and those are still O(1) (well,
O(2), but who's counting ;-)))...


 For example, if you insert a (nearly) sorted dictionary of words into an
 SQL table with a B+Tree, the memory working set required to insert
 efficiently stays constant whether you are inserting a thousand, a million,
 or a billion records.  That is, the memory requirement is effectively
 O(LogN) for a disk requirement of O(N).   With a hash table, the memory
 working set required to insert efficiently is approximately O(N) for a disk
 requirement of O(N)... much much worse.

Remember, though, I'm not proposing a hash table.  I have been lucky
enough to design a "hash" function that will crunch down all possible
values into about 64M unique keys--so, no collisions.  I propose to use
this function with address calculation.  So, O(1) everything, for single
random lookups, which is (virtually) all this data structure must deal
with.  Sorting isn't necessary, and that could be accomplished in O(n)
anyway.


 A B+Tree will also scale with the size of the dataset being managed,
 so you do not have to preallocate or prereserve file space.

So will address calculation + filesystem holes, and sufficiently large
filesizes :-)

#include <stdio.h>

int main()
{
    FILE    *f;

    f = fopen("bigfile", "w");
    fseek(f, 0x7fffffff, SEEK_SET);
    putc('\n', f);
    fclose(f);
    return 0;
}

$ cc -o hole hole.c
$ ./hole
$ ls -lsk bigfile
48 -rw-rw-r--  1 ryan  2147483648 Oct 29 14:09 bigfile


 We are using an in-house SQL database for our product (which I can't
 really talk about right now) and, using B+Tree's, I can consistently
 insert 300 records/sec on a cheap desktop PC (which I use for testing)
 with 64MB of ram (remember, insert requires an internal select to check
 for key conflicts), even when the test database grows to several
 gigabytes.

I haven't even begun implementation, and I haven't had a chance to greatly
experiment... So I don't know how this will perform.  It would be directly
dependent on disk seeks, though...

Well... We could try a simple test... Add the following to hole.c:

char    buf[1024];  /* 1K structure */
int     i;

...

for (i = 0; i < 8192; i++)  /* Insert 8192 records */
{
    fseek(f, rand()/1024*1024, SEEK_SET); /* Random access simulation */
    fwrite(buf, sizeof(buf), 1, f);
}

$ time ./hole

real0m25.436s
user0m0.180s
sys 0m4.912s

So, about 320 records/sec on the follo
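
A self-contained version of the test, for anyone who wants to reproduce
it (reconstructed from the two fragments above; the zeroed buffer and the
lack of error checking are shortcuts, and offsets are limited to RAND_MAX):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    char    buf[1024];          /* 1K record */
    FILE    *f;
    int     i;

    memset(buf, 0, sizeof(buf));        /* contents don't matter */

    f = fopen("bigfile", "w");
    fseek(f, 0x7fffffff, SEEK_SET);     /* create the ~2GB sparse file */
    putc('\n', f);

    for (i = 0; i < 8192; i++) {        /* insert 8192 records */
        fseek(f, rand() / 1024 * 1024, SEEK_SET);  /* random 1K-aligned offset */
        fwrite(buf, sizeof(buf), 1, f);
    }
    fclose(f);
    return 0;
}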

Filesystem holes

2000-10-28 Thread Ryan Thompson


Hi all...

One of the tasks that I have undertaken lately is to improve the efficiency
of a couple of storage facilities we use internally, here.  Basically,
they are moderate-size tables currently implemented in SQL, which is OK in
terms of performance, but the hash function is breaking down and the
storage is rather inefficient for our table of about 2,850,000 members
(~2.1 GB total storage).  There are 64M possible hash values in our
current implementation, and our record size is variable, but could be
safely fixed at about 1.5KB... So, total storage if all values were used
would be about 96GB.  (See where I'm going with this?)

One of the options I am considering is actually using address calculation,
and taking advantage of filesystem holes, to keep storage down to what is
actually being used, while providing instant lookups.

The single file would be about 96G addressable bytes... But the actual
block count would be much lower.  I suppose I will have to create a series
of these files and divide the problem into < 4GB chunks, but one
lookup/store will still be done with one calculation and one disk seek
(just more filehandles open).
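
To make the "one calculation and one disk seek" concrete, the per-record
operations reduce to something like the sketch below.  The 26-bit stand-in
hash, the 1536-byte record size, and the single-file assumption are all
placeholders (the real hash function isn't shown here), but the shape of
the lookup/store is the point:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define RECSIZE 1536                    /* ~1.5KB fixed-size record */

/* Stand-in hash, NOT the real one: folds a key into 26 bits (64M slots). */
static u_int32_t
hash64m(const char *key)
{
    u_int32_t h = 5381;

    while (*key)
        h = h * 33 + (u_char)*key++;
    return h & 0x03ffffff;              /* 2^26 - 1 */
}

/* Lookup: one calculation, one seek, one read.  A record that comes
 * back all zeros was never written -- it is a filesystem hole. */
static ssize_t
fetch(int fd, const char *key, void *rec)
{
    if (lseek(fd, (off_t)hash64m(key) * RECSIZE, SEEK_SET) == -1)
        return -1;
    return read(fd, rec, RECSIZE);
}

/* Store: writing into the hole is what allocates the block(s). */
static ssize_t
store(int fd, const char *key, const void *rec)
{
    if (lseek(fd, (off_t)hash64m(key) * RECSIZE, SEEK_SET) == -1)
        return -1;
    return write(fd, rec, RECSIZE);
}

int
main(void)
{
    char rec[RECSIZE];
    int fd;

    if ((fd = open("table", O_RDWR | O_CREAT, 0644)) == -1)
        return 1;
    memset(rec, 'x', sizeof(rec));
    store(fd, "some key", rec);
    fetch(fd, "some key", rec);
    close(fd);
    return 0;
}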

Deletes seem problematic.  My question is, does the operating system
provide any way to free blocks used in the middle of a file?

Must I periodically re-create these files (long and slow process, but not
so bad if done infrequently) to reclaim space, or is there a way to free
arbitrary blocks in a file in userland (WITHOUT trashing the filesystem?
:-)

- Ryan

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: Building customized kernel without root passwd

2000-02-29 Thread Ryan Thompson

Zhihui Zhang wrote to [EMAIL PROTECTED]:
 
 My professor plans to use FreeBSD for teaching purposes. We will allow
 students to build their own kernel but do not want to give them the root
 password. So it's better to find a way to let students build a kernel under
 their own account, save the kernel on a floppy, and then boot from the floppy.
 
 I am familiar with normal kernel build process.  But have not done the
 above before.  I hope someone can give me some suggestions and I will try
 them out.
 
 Thanks a lot.
 
 -Zhihui

It might be possible to do... (It SHOULD be possible, though the Makefile
would have to be modified to point the build away from /usr/src/sys/compile,
and the install target would also have to be changed to point to the
floppy... And watch it die when the write-protect tab is locked.  ;-)

I would STRONGLY recommend against this though, as it's really a false
sense of security... Heck, maybe even less... After booting from the
floppy (presumably in single user mode), the user can make arbitrary root
mounts of the system's hard drive (and any maproot=0 NFS exports allowed
by that machine!).  In fact, enabling floppy boots on public machines
where wide physical access is available is generally a Bad Idea.  Of
course, not giving the students root's password on that machine is also a
moot point, as a 'passwd root' from that boot floppy sort of avoids the
whole issue.  :-)

Most colleges give students responsibility for their own computers for
this sort of work.  Things tend to go awry when budding SysAdmins (with
strict lab deadlines, no less) are given root privileges.

It is possible to modify the 'mount' command to require some extra
authentication (like a password or challenge phrase) to perform root
mounts, but unless you regulate all floppies that enter and exit your lab,
there is nothing to stop users with home systems from rolling their own
mount from an existing FreeBSD system without such restrictions.

Basically, if the user has the permissions to build and boot from their
own kernel and/or suite of utilities (be it from a floppy or the local
drive), assume they have free rein over the entire system, and any
network resources root normally has access to.

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Systems Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: Annoying nfsrcv hangs

2000-02-27 Thread Ryan Thompson

Matthew Dillon wrote to Ryan Thompson:

 :ps al on my system shows multiple nfsrcv hangs on processes such as df, ls
 :and umount.  Without any other characteristic problems, the nfs server
 : [...]
 
 I assume the hangs are on the client?  Not surprising if its a 3.2
 system.  A whole lot of NFS fixes went in between 3.2 and 3.4 so I 
 recommend upgrading the client to the latest post 3.4 -stable.

Yes, the hangs are on the 3.2-R client.  An upgrade is currently in the
works... Actually it's slated for installation next week.  Hopefully that
will help kick NFS into shape for me.


 :That's verbatim... The mount was NOT done on bigfs... It was in fact done
 :on /f/bigfs.  "We have secretly switched this SysAdmin's mountpoint with
 
 I don't know what is going on here, but it kinda sounds like cockpit
 trouble somewhere either in the exports line or the client's mount 
 command.

Something strange, certainly... I don't recall changing the mount status
of any of those, or mounting/unmounting anything at all for that matter.  
Didn't see anything weird in messages.  Could still have been recent pilot
error, I suppose.  However, exports on the server haven't changed in eight
months and that 'bigfs' was mounted at boot from fstab on the client, many
months ago.  Nothing out of the ordinary in exports or fstab... The mounts
look pretty homogeneous.

That's what made me think something was strange;  several other
similarly-mapped mounts between the same two machines never missed a beat.

HOWEVER... (Update, here)... When I finally got in to the office today, I
did a `umount -a -t nfs &` on the client (the & for the purpose of
regaining my shell while umount hung trying to unmount bigfs :-).  Then
killed off mountd, nfsd, portmap, and even inetd in that order, and
restarted them in the reverse of that order.  The hung processes on the
client magically became unstuck and terminated, and I was able to unmount
that muddled bigfs and start over.  Everything seems to be in working
order once again.

Perhaps the 3.2 client isn't solely to blame, here, after all.  I'll try
this all again when both machines are running -STABLE.

 :Has anything been built into -CURRENT to address these hangs?  It has
 :plagued many in the past, and continues to do so.
 :
 :  Yours truly,
 :  Ryan Thompson [EMAIL PROTECTED]
 
 Both -current and -stable have various NFS fixes that earlier releases
 did not.  In general, NFS fixes for the -stable branch have been kept
 up to date with the work in -current, but -current ought to have much
 better NFS performance then stable.

Performance = transfer rates and latency?  Or just generalized stability?  
I haven't had the time to watch -CURRENT all that closely, but I'll be
happy when some of the new functionality becomes -STABLE.

Thanks for the reply,
- Ryan

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Systems Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Annoying nfsrcv hangs

2000-02-26 Thread Ryan Thompson


ps al on my system shows multiple nfsrcv hangs on processes such as df, ls
and umount.  Without any other characteristic problems, the nfs server
machine's exports all seemed to be working correctly.  However, *one* and
only one of the mounts somehow went south.  'mount' on the client machine
shows:

# mount | grep 10.0.0.2
10.0.0.2:/usr on /f/usr
10.0.0.2:/devel on /f/devel
10.0.0.2:/bigfs on bigfs

That's verbatim... The mount was NOT done on bigfs... It was in fact done
on /f/bigfs.  "We have secretly switched this SysAdmin's mountpoint with
Folgers crystals.  Think he'll taste the difference?" It appears to be
the cause of the hangs I mentioned.  One such hang was one that I just
created by issuing umount -f bigfs.

The client nfs mounts are mounted intr, yet I still can't send a TERM or
KILL that these processes will catch.

# grep bigfs /etc/fstab
10.0.0.2:/bigfs /f/bigfs    nfs rw,bg,intr  2   2

The client is a 3.2-RELEASE system.  Server is 3.4-STABLE as of about 12
days ago.

It looks like reboots are in order... But, these are production machines!  
This is certainly annoying...!  I thought the intr option was supposed to
help with hung nfs procs.  Is there anything else I can try in my current
situation?  Any better ways to prevent this sort of thing (besides running
SunOS?)  Or is it PR time?

# uptime
 1:46AM  up 118 days 1:14, 12 users, load averages: 2.36, 2.34, 2.21

As you see, I haven't had any longevity problems up until now..

Has anything been built into -CURRENT to address these hangs?  It has
plagued many in the past, and continues to do so.

  Yours truly,
- Frustrated :-)

-- 
  Ryan Thompson [EMAIL PROTECTED]
  Systems Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2






Re: your mail

2000-01-13 Thread Ryan Thompson

On Thu, 13 Jan 2000, Ramiro Amaya wrote:

 I am new to this mailing list, so I do not have much experience with the
 questions I should ask; if I am in the wrong place let me know, please.
 Well, my question is related to Solaris 2.6, and the story is like this:
  
 I have a Solaris 2.5 server which has all the printers configured, so I can
 print from that machine without any problem; the printers are remote, so I
 use the IP address in /etc/hosts (NIS database). Recently, I updated the
 NIS clients to Solaris 2.6, but I am not sure how to configure the
 printers, is there anybody there who can give me a hand?. Thanks
 

Hi, Ramiro..

You are indeed on the wrong mailing list (and on the wrong server :-) 

[EMAIL PROTECTED] pertains to the FreeBSD UNIX operating system,
available on CD or in freely downloadable form from ftp.cdrom.com.  This
-hackers list is meant for the more technical array of questions and their
responses; "more technical" generally means questions pertaining to the
source code of the operating system itself.

--
  Ryan Thompson [EMAIL PROTECTED]
  50% Owner, Technical and Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2








Default minfree performance restrictions?

1999-12-20 Thread Ryan Thompson

Hello all,

After creating some larger slices than I'm used to, I finally felt the
full force of a default 8% minfree setting.  So, I went to tunefs(8) to
try and put a damper on the multiple gigabytes that aren't being made
available to users.  However, I was a bit disappointed to note that
setting minfree at or below 5% (using integer values!) would result in
SPACE optimization.

So, on my 3.4-STABLE system, I did some hunting around.

In /usr/src/sys/ufs/ffs/fs.h, I see MINFREE defaults to 8%, and default
time optimization, like tunefs says.  Nothing fancy, there.

In ./ffs_alloc.c, however, I found out how the SPACE/TIME optimization is
determined.  In particular, in ffs_realloccg(), I find (from yesterday's
-STABLE), the following snippet:

        /*
         * Allocate a new disk location.
         */
        if (bpref >= fs->fs_size)
                bpref = 0;
        switch ((int)fs->fs_optim) {
        case FS_OPTSPACE:
                /*
                 * Allocate an exact sized fragment. Although this makes
                 * best use of space, we will waste time relocating it if
                 * the file continues to grow. If the fragmentation is
                 * less than half of the minimum free reserve, we choose
                 * to begin optimizing for time.
                 */
                request = nsize;
                if (fs->fs_minfree <= 5 ||      /* !!! */
                    fs->fs_cstotal.cs_nffree >
                    fs->fs_dsize * fs->fs_minfree / (2 * 100))
                        break;
                log(LOG_NOTICE, "%s: optimization changed from SPACE to TIME\n",
                    fs->fs_fsmnt);
                fs->fs_optim = FS_OPTTIME;
                break;

Questions:

 - Can the line I've marked /* !!! */ have the minimum value of 5 safely
reduced? Eliminated? (safely = if/when a filesystem fills up, could writes
potentially corrupt the fs?) On small partitions with many inodes, perhaps
5% is appropriate, but in cases like mine, where I have filesystems in
excess of 20GB with < 0.1% fragmentation, even 5% minfree is frankly too
much to give away (see the quick arithmetic sketch after these questions).

 - Would it make sense to externalize this option into a header file,
kernel config option, or perhaps tunefs itself?  I'm guessing the latter
would require modifications to our UFS implementation to allow for the
extra parameter for each filesystem... And would definitely qualify as an
"invasive" change.  Food for thought, though :-)

Any insights?

I suppose I could just go ahead and try it, but, before I end up doing a
reinstall (cd /usr/src && make blowupworld), I thought it better to ask a
more experienced following of users :-)

--
  Ryan Thompson [EMAIL PROTECTED]
  50% Owner, Technical and Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2


