software distribution with subversion

2013-01-31 Thread Jason Keltz

Hi.

I am faced with a problem where I need to distribute a directory 
containing about 60 GB worth of software on a Linux file server to about 
100 systems.  The software must be localized on those systems and not 
shared out over NFS.  On a regular basis, software may be added or 
removed from the directory, and all the clients should update 
accordingly in the evening.  During the update period, some client 
systems may be off.


I think that Subversion would be a reasonable way to solve this problem 
which isn't quite the type of problem that rsync is intended to handle 
(because of the number of machines).  However, for a variety of reasons, 
I don't want to run subversion on the actual file server.  Instead, 
nightly, I'd like to rsync changes in the contents of the software 
directory on the file server to a software distribution server which 
would run its own svnserve.  The clients would then connect up to the 
server nightly, and update themselves accordingly.  Because of the 
versioning, if a client misses an update, it would be updated the next 
time around, even if it's been off for a while.


The initial update between the file server and the software update server
would require rsyncing the whole 60 GB of software to a "working
directory", after which, to make subversion see this as a "working
directory", I would have to commit the entire directory, then check it
back out.  This process seems like a bit of a waste, but it's a one-time
process, and I don't really see any way around it.  In the future, I
would like to be able to rsync changes between the file server and the
working directory on the software distribution server, using --delete
to ensure that software deleted from the file server is also deleted
from the subversion working directory, and excluding the working copy's
.svn directory from the sync.  After the rsync happens, I need to run a
command that would update the repository with the state of the working
directory, but it's not exactly clear how this would work.  Running an
"svn update" isn't going to delete directories from the repository that
were deleted from the working directory.  I believe you need to use
"svn delete" for this?
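
A minimal sketch of the nightly rsync step described above, assuming Python is available on the software distribution server. The host name "fileserver" and both paths are placeholders for illustration, not details from this thread; -a, --delete and --exclude are standard rsync options. The function only builds the argument list, so nothing is copied or deleted until the list is actually executed.

```python
def build_rsync_cmd(source, dest):
    """Return the rsync argument list for mirroring source into dest.

    A trailing slash on source makes rsync copy the directory's contents
    rather than the directory itself.
    """
    if not source.endswith("/"):
        source += "/"
    return [
        "rsync",
        "-a",              # archive mode: recurse, keep permissions and times
        "--delete",        # remove files from dest that are gone from source
        "--exclude=.svn",  # skip .svn, so --delete leaves it alone in dest
        source,
        dest,
    ]


# Hypothetical host and paths, for illustration only.
cmd = build_rsync_cmd("fileserver:/local/software", "/srv/dist/wc")
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) from a nightly cron job. This covers only the copy; recording the resulting additions and deletions in the repository is a separate step.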


Any ideas that anyone might be able to offer?

I'm not on the list, so please ensure that you CC: me in any response.

Thanks for your help!

Jason.




Re: software distribution with subversion

2013-01-31 Thread Jason Keltz

On 31/01/2013 6:06 PM, Bob Archer wrote:

I am faced with a problem where I need to distribute a directory containing
about 60 GB worth of software on a Linux file server to about
100 systems.  The software must be localized on those systems and not shared
out over NFS.  On a regular basis, software may be added or removed from the
directory, and all the clients should update accordingly in the evening.  During
the update period, some client systems may be off.

I think that Subversion would be a reasonable way to solve this problem which
isn't quite the type of problem that rsync is intended to handle (because of the
number of machines).  However, for a variety of reasons, I don't want to run
subversion on the actual file server.  Instead, nightly, I'd like to rsync
changes in the contents of the software directory on the file server to a
software distribution server which would run its own svnserve.  The clients
would then connect up to the server nightly, and update themselves
accordingly.  Because of the versioning, if a client misses an update, it
would be updated the next time around, even if it's been off for a while.

The initial update between the file server and the software update server would
require rsyncing the whole 60 GB of software to a "working directory", after
which, to make subversion see this as a "working directory", I would have to
commit the entire directory, then check it back out.  This process seems like a
bit of a waste, but it's a one time process, and I don't really see any way
around it.  In the future, I would like to be able to rsync changes between the
file server and the working directory on the software distribution server,
which would include using --delete to ensure that software deleted from the
file server is also deleted from the subversion working directory, and
excluding the .svn directory from the working copy.
However, after the rsync happens, I now need to run a command that would
update the repository with the state of the working directory.  However, it's
not exactly clear how this would work?  Running an "svn update"
isn't going to delete directories from the repository that were deleted from the
working directory.  I believe you need to use "svn delete" for this?

Any ideas that anyone might be able to offer?

I'm not on the list, so please ensure that you CC: me in any response.

Thanks for your help!



What you need to do could work. I assume this "software", in order to run, can
be built or whatever during your nightly update on each client?

You keep saying "rsyncing" ... you wouldn't use that, of course; you would
use the svn client binary.

Actually, maybe I wasn't clear..
The software includes various packages like say, Matlab, or Maple, or 
whatever else, already installed...  imagine a directory on the 
fileserver.. say, /local/software which includes "bin", "lib", etc...
I'm not "installing" the software.   It's already been installed..  I'm
just syncing a directory between machines..
As for rsyncing.. I would rsync the software from the "file server" to 
the "software distribution" server, and then use svn from there to check 
in all the changes.



For your initial load... if the software is on the server where you will house
your repository you can just import the data into the repository from that
file... there is no need to send the data twice. In other words, you can have
both a working copy and a repository on your central server.

Yes.  Initially I would do an import, but the problem is... the next
day, the software gets updated on the "real" file server... say, new
version of Matlab or something...  in the evening, I want the process to
run that would rsync the data (with all the changes) from the file
server to the software distribution server,  do something to commit the
changes, then the 100 clients would eventually each "svn update".
However, to be able to commit the changes, I need to have a working copy
on the software distribution server.



However, after the rsync happens, I now need to run a command that would
update the repository with the state of the working directory.  However, it's
not exactly clear how this would work?  Running an "svn update"

"svn update" brings any changes in the repository to your working copy. "svn
commit" does the opposite... it puts any changes in a working directory into the repository.

See, this is where I'm confused... I created a few directories including
"bin" and "pkg" for a test.  All committed fine... erased them from the
working copy, did a commit then a status and I see:


!   bin
!   pkg

but when I go into a different directory and check out the current state..

A    pkg
A    bin
Checked out revision 2.

they're still there...


Hth...

That said, if this is actual software, wouldn't using one of the many package 
management tools available in Linux be a better fit?


The thing is, I'm moving around already i

Re: software distribution with subversion

2013-01-31 Thread Jason Keltz

On 31/01/2013 6:40 PM, Les Mikesell wrote:

On Thu, Jan 31, 2013 at 4:10 PM, Jason Keltz  wrote:

I am faced with a problem where I need to distribute a directory containing
about 60 GB worth of software on a Linux file server to about 100 systems.
The software must be localized on those systems and not shared out over NFS.
On a regular basis, software may be added or removed from the directory, and
all the clients should update accordingly in the evening.  During the update
period, some client systems may be off.

I think that Subversion would be a reasonable way to solve this problem
which isn't quite the type of problem that rsync is intended to handle
(because of the number of machines).

I'd think it is exactly the problem that rsync is intended to handle.

rsync is great when you want to sync the contents from one machine to
another machine in one direction.. (unison if you need dual direction
sync...)   I thought about using rsync to solve this problem... two
ways I can think of..


1)  All the machines run rsync against the server.. kills the server,
but let's say they do it all at different times.. the server is hefty..
hey, it would work, but for every single rsync, the server needs to look
at its entire file tree to see which files have changed.  100 syncs =
100 times processing the same thing over and over again... If only rsync
would let me save that state to a file so that it doesn't need to reload
it every time it runs, then I know which solution I'd be using...  other
problem is, it would take a long time..

2) log/tree approach --- server updates one client, then the server and
the one client each update another client, then each machine that has
been updated so far updates another...  much faster, but again, you have
to read the server state each and every time... and then I have to deal
with the fact that various random machines are off ...
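
The second approach can be sketched as a schedule of rounds in which every machine that already holds the software updates one machine that does not, so the number of up-to-date machines roughly doubles each round. This is only an illustration: the function and host names are hypothetical, it assumes every machine is switched on, and it says nothing about how each pair would actually transfer the data.

```python
def fanout_schedule(server, clients):
    """Return a list of rounds; each round is a list of (source, target) pairs."""
    holders = [server]       # machines that already have the current software
    pending = list(clients)  # machines still waiting for an update
    rounds = []
    while pending:
        pairs = []
        # Every current holder updates at most one pending machine per round.
        for source in list(holders):
            if not pending:
                break
            pairs.append((source, pending.pop(0)))
        holders.extend(target for _, target in pairs)
        rounds.append(pairs)
    return rounds


clients = ["client%03d" % i for i in range(1, 101)]
rounds = fanout_schedule("server", clients)
print(len(rounds))  # 7 rounds for 100 clients
```

With 100 clients the schedule finishes in 7 rounds instead of 100 sequential syncs against the server. Handling machines that are off would need extra logic, for example moving an unreachable target to a later run.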


It's a really interesting problem..


However, for a variety of reasons, I
don't want to run subversion on the actual file server.  Instead, nightly,
I'd like to rsync changes in the contents of the software directory on the
file server to a software distribution server which would run its own
svnserve.  The clients would then connect up to the server nightly, and
update themselves accordingly.  Because of the versioning, if a client
misses an update, it would be updated the next time around, even if it's been
off for a while.

Subversion would give you the option of intentionally maintaining your
targets at different revision levels, but at a cost of needing a
'working copy' format where you have an unneeded 'pristine' duplicate
copy of everything.

The truth is, I wouldn't intentionally have the machines at different
software levels... (well, that could be useful for testing, but that's
another story)  but a machine could be off during the update and
would be able to "catch up" no matter how long it was off...

However, after the rsync happens, I now need to run a
command that would update the repository with the state of the working
directory.  However, it's not exactly clear how this would work?  Running an
"svn update" isn't going to delete directories from the repository that were
deleted from the working directory.

Sure it will - it will make it match the state of whatever version you
are updating to.


I believe you need to use "svn delete"
for this?

That is for when you are making the changes you intend to commit.



I'll have to try that again .. didn't seem to be working the way I 
expected it to...


Jason.


--
Jason Keltz
Manager of Development
Department of Computer Science and Engineering
York University, Toronto, Canada
Tel: 416-736-2100 x. 33570
Fax: 416-736-5872



Re: software distribution with subversion

2013-01-31 Thread Jason Keltz

On 31/01/2013 9:13 PM, Ryan Schmidt wrote:

On Jan 31, 2013, at 20:05, Jason Keltz wrote:


On 31/01/2013 6:06 PM, Bob Archer wrote:

What you need to do could work. I assume this "software", in order to run, can
be built or whatever during your nightly update on each client?

You keep saying "rsyncing" ... you wouldn't use that, of course; you would
use the svn client binary.

Actually, maybe I wasn't clear..
The software includes various packages like say, Matlab, or Maple, or whatever else, already installed...  
imagine a directory on the fileserver.. say, /local/software which includes "bin", "lib", 
etc... I'm not "installing" the software.   it's already been installed..  I'm just syncing a
directory between machines..
As for rsyncing.. I would rsync the software from the "file server" to the "software 
distribution" server, and then use svn from there to check in all the changes.


For your initial load... if the software is on the server where you will house
your repository you can just import the data into the repository from that 
file... there is no need to send the data twice. In other words, you can have 
both a working copy and a repository on your central server.

Yes.  Initially I would do an import, but the problem is... the next day, the software gets updated 
on the "real" file server... say, new version of Matlab or something...  in the evening, 
I want the process to run that would rsync the data (with all the changes) from the file server to 
the software distribution server,  do something to commit the changes, then the 100 clients would 
eventually each "svn update". However, to be able to commit the changes, I need to 
have a working copy on the software distribution server


However, after the rsync happens, I now need to run a command that would
update the repository with the state of the working directory.  However, it's
not exactly clear how this would work?  Running an "svn update"

"svn update" brings any changes in the repository to your working copy. "svn 
commit" does the opposite... it puts any changes in a working directory into the repository.

See, this is where I'm confused... I created a few directories including "bin" and 
"pkg" for a test.  All committed fine... erased them from the working copy, did a commit 
then a status and I see:

!   bin
!   pkg

but when I go into a different directory and check out the current state..

A    pkg
A    bin
Checked out revision 2.

they're still there...

Correct. Subversion does not track your movements. You must tell Subversion what you are moving and 
deleting by doing the moves and deletes using "svn mv" and "svn rm", not using 
regular OS commands.
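
One way to apply this after files have been added or removed outside Subversion is to read the output of "svn status" and turn each "!" (missing) entry into an "svn rm" and each "?" (unversioned) entry into an "svn add". The sketch below only builds the command list and does not run anything. The function name and the "matlab-new" entry are hypothetical, and it assumes the path starts at the ninth character of each status line, as in recent Subversion releases; older releases may lay the columns out differently.

```python
def plan_svn_commands(status_output):
    """Turn `svn status` text into svn commands that record the changes."""
    commands = []
    for line in status_output.splitlines():
        if len(line) < 9:
            continue  # too short to hold the status columns plus a path
        flag, path = line[0], line[8:]
        if flag == "!":    # versioned item missing from the working copy
            commands.append(["svn", "rm", path])
        elif flag == "?":  # item on disk that Subversion does not track
            commands.append(["svn", "add", path])
    return commands


sample = "!       bin\n!       pkg\n?       matlab-new\n"
for cmd in plan_svn_commands(sample):
    print(" ".join(cmd))
```

After the resulting commands have been run, "svn commit" would record the deletions and additions in the repository. A real script would also need to deal with nested entries and with paths containing "@", which this sketch ignores.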



Hth...

That said, if this is actual software, wouldn't using one of the many package 
management tools available in Linux be a better fit?

The thing is, I'm moving around already installed software, and there's nothing that 
great, as far as I can see, for doing that. The twitter guys are using something they 
wrote called "murder" which uses torrent to do this kind of thing...  excellent 
idea, but it uses Ruby and several other tools ...   and I don't want to get into that at 
the moment...

Subversion is not going to be a satisfactory solution for this use case. Besides all the 
issues you're describing with setting up the server-side infrastructure for this, and as 
was already mentioned, when you check out a working copy of this on your clients, there 
will be a "duplicate" pristine copy of everything. So if you have 60GB of 
software, it'll take up 120GB of space on the client machine.

I'm glad you brought that up :)


Subversion is not a software distribution tool; it is a document and revision 
management system. Use a different tool. As someone else said, rsync seems like 
a good tool for this job; I didn't understand why you think using rsync 
directly between your file server and your clients won't work.



See my email to Les...  If only the rsync server could save a copy of 
the file checksums when it runs, it would probably decrease the sync 
time by half and save a whole lot of disk activity...
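
The "save the state to a file" idea can be illustrated with a manifest that the server builds once per night and every client compares against. This is a sketch under stated assumptions, not an rsync feature: all function names are hypothetical, and it compares size and modification time (the same quick check rsync uses by default) rather than checksums. Symlinks to directories and empty directories are not recorded.

```python
import json
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its [size, mtime_ns]."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            manifest[os.path.relpath(full, root)] = [st.st_size, st.st_mtime_ns]
    return manifest


def diff_manifests(old, new):
    """Return (changed_or_added, removed) path lists going from old to new."""
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def save_manifest(manifest, path):
    """Write the manifest to a JSON file so the tree is walked only once."""
    with open(path, "w") as fh:
        json.dump(manifest, fh)


def load_manifest(path):
    """Read a manifest previously written by save_manifest."""
    with open(path) as fh:
        return json.load(fh)
```

The server would walk its tree once and publish the manifest; each client would then diff it against its own last-applied manifest and fetch or delete only the listed paths, so the server's file tree is not rescanned for every client.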






Re: software distribution with subversion

2013-02-01 Thread Jason Keltz
Thanks to everyone who provided me with very helpful feedback re: my 
problem of "software distribution with subversion".  I am re-evaluating 
the project, and how to complete it best.


Thanks!

Jason.