Re: workflow idiom to compare zip/tgz with folder subtree

Warren Young Fri, 25 Sep 2015 09:39:03 -0700

On Sep 24, 2015, at 6:50 PM, Paul wrote:
> 
> I am shuttling incremental work back and
> forth between two locations using disc.


In that case, you want a distributed version control system (DVCS), not a 
centralized one.  That rules out Subversion.  (And CVS.)  Fossil and Git are 
DVCSes, so they’ll work in a case like this.  Mercurial and Bazaar (a.k.a. bzr) 
are also DVCSes, and both are also in the Cygwin package repo.

I don’t know how to use any of the other three available DVCSes for a task like 
yours, but it’s certainly easy enough with Fossil.

The command flow looks like this, assuming the removable disk is called R:, and 
using “f” as a short alias for “fossil”:

   f new /cygdrive/r/shared-project.fossil
   cd ~/shared-project
   f open /cygdrive/r/shared-project.fossil
   f add *
   f ci -m 'initial checkin'

Now everything in ~/shared-project is copied into the Fossil repo on the R: 
volume.  When you get to the remote site:

   cd ~/shared-project
   f open /cygdrive/r/shared-project.fossil

Now you have a copy of all the files from the R: drive.  If you open a Fossil 
repo within an existing tree that previously wasn’t under Fossil management, it 
will ask whether you want to overwrite the preexisting files or leave them 
alone.  If you leave them alone, a subsequent “fossil diff” will show how your 
preexisting files differ from the ones in the Fossil repo on R:.

After you make changes to files at either site, say “fossil ci” and it will 
open a text editor for you to describe your changes.   (Or use the -m option, 
as above.)

Then back at the other site:

   cd ~/shared-project
   f up

Now all your remote changes are synchronized.

If all that looks complicated, realize that there are only a few day-to-day 
commands: f ci, f up, f diff.

> the majority of the
> differences will not be relevant as the hierarcy exists at both sites.

If you’re saying that there are files that need to be semi-synchronized between 
the sites, so that only *some* changes to individual files need to be copied, 
then Fossil is probably going to fight you.

If you’re saying instead that some files in a given tree are sync’d and some 
aren’t, that’s easy.  That’s actually the normal way to use Fossil, since with 
software development projects, you typically only store original source files, 
and never store anything that can be re-generated from those sources.

(Some projects bend that rule a bit, storing both configure.ac and configure, 
for example.)

> Most of the files are not software, though parallels can be drawn:
> Long SQL scripts, Matlab scripts, images, data files, VBA, Matlab
> files, text files, LaTeX files, image files, and M$ Office files
> (Access, Excel, Word, Powerpoint, PST).

Most of those things are sensible to store in Fossil.

The main thing you want to avoid storing is large binary files whose content 
changes substantially and often.

Uncompressed image files (e.g. TIFF without compression) are fine, because 
probably only *parts* of the image change from one update to the next, so 
Fossil will store only the differences, then compress that difference, so that 
you effectively get TIFF-with-compression, and more efficiently than storing a 
series of separately-compressed TIFFs besides.

Compressed image files (e.g. PNG) can be okay, as long as they change rarely.  
The problem with compressed images is that the compression algorithm can change 
every byte in the file just because a single pixel changed, so the whole image 
has to be stored in the Fossil repo again.
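
If you want to see that effect for yourself, here is a small Python sketch.  It 
is only an illustration under stated assumptions: the “image” is made-up random 
data rather than a real TIFF or PNG, and zlib stands in for whatever compressor 
your image format actually uses.  It changes a single byte, then counts how many 
bytes differ before and after compression.

```python
import random
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    common = min(len(a), len(b))
    diffs = sum(1 for i in range(common) if a[i] != b[i])
    return diffs + abs(len(a) - len(b))


# Hypothetical stand-in for an uncompressed image: 64 KiB of made-up pixel data.
rng = random.Random(0)
pixels = bytearray(rng.choice(b"\x10\x20\x30\x40") for _ in range(65536))

# "Edit" a single pixel near the start of the image.
edited = bytearray(pixels)
edited[100] ^= 0xFF

raw_diff = differing_bytes(bytes(pixels), bytes(edited))
packed_diff = differing_bytes(
    zlib.compress(bytes(pixels)), zlib.compress(bytes(edited))
)

print("bytes differing, uncompressed:", raw_diff)
print("bytes differing, compressed:  ", packed_diff)
```

The uncompressed pair differs in exactly one byte, which is the easy case for 
any delta scheme.  The compressed pair differs in far more places, so there is 
much less for a delta to reuse.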

That said, your existing ZIP archival scheme may be re-copying unchanged images 
already, in which case Fossil will actually be more efficient, since versions 
where an image is unchanged refer back to the previously-stored copy of that 
image.
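
The idea of referring back to a previously-stored copy can be sketched in a few 
lines of Python.  This is a toy content-addressed store, not Fossil’s actual 
repository format; the file names and contents are made up, and the store() and 
checkin() helpers are hypothetical names of my own.

```python
import hashlib

blobs = {}  # content hash -> bytes; each distinct content is kept once


def store(data: bytes) -> str:
    """Save data under its SHA-1 hash (if not already present); return the hash."""
    key = hashlib.sha1(data).hexdigest()
    blobs.setdefault(key, data)
    return key


def checkin(files: dict) -> dict:
    """Record a version as a mapping of file name -> content hash."""
    return {name: store(data) for name, data in files.items()}


# Hypothetical project contents, made up for illustration.
image = b"\x89PNG pretend image data" * 1000
v1 = checkin({"plot.png": image, "report.tex": b"first draft\n"})
v2 = checkin({"plot.png": image, "report.tex": b"second draft\n"})

print("distinct blobs stored:", len(blobs))
print("image stored once:", v1["plot.png"] == v2["plot.png"])
```

Two versions went in, but the unchanged image occupies only one slot in the 
store.  A ZIP-per-snapshot scheme would carry it twice.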

Besides TIFF, another image file format you might consider is PSD, which can be 
either compressed or uncompressed.  (Photoshop > Preferences > File Handling > 
Disable Compression of PSD and PSB Files.)  Plus, because PSD keeps its layers 
separate, only a changed layer needs to be stored again, rather than the whole 
file, even if you *do* use PSD compression.

MS Office docs pose a similar problem to compressed PSD, since the newer 
formats (DOCX, XLSX, PPTX) are just specially-structured ZIP files.  Unchanged 
assets within, say, a PPTX file shouldn’t be re-copied into the Fossil repo on 
checkin, but you will still end up storing more data than you would if you 
could save the PPTX file uncompressed.

By comparison, the LaTeX documents are wonderful for Fossil, since they’re 
uncompressed text, so you’ll get massive compression from them.  Not just the 
normal 2:1 you typically get for text, but potentially many times that because 
of the delta compression.
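
Here is a rough Python sketch of why the delta helps so much with text.  The 
assumptions: the “LaTeX document” is made-up filler, Python’s difflib stands in 
for Fossil’s own delta encoder (which uses a different format), and zlib stands 
in for the compression step.  It compares the cost of storing a whole second 
version against the cost of storing only what changed.

```python
import difflib
import zlib

# Hypothetical LaTeX-ish document: 200 similar lines of made-up content.
v1_lines = [
    f"Paragraph {i}: the measured value was $x_{{{i}}}$.\n" for i in range(200)
]

# Second version: a single line edited.
v2_lines = list(v1_lines)
v2_lines[57] = "Paragraph 57: the corrected value was $y_{57}$.\n"

v2 = "".join(v2_lines).encode()

# Line-based delta from version 1 to version 2, with no context lines.
delta = "".join(difflib.unified_diff(v1_lines, v2_lines, n=0)).encode()

full_copy = len(zlib.compress(v2))      # store version 2 whole, compressed
delta_only = len(zlib.compress(delta))  # store only the change, compressed

print("uncompressed size of version 2:", len(v2))
print("compressed full copy:          ", full_copy)
print("compressed delta:              ", delta_only)
```

Plain compression already shrinks the document, but the compressed delta is 
smaller still, and that saving repeats for every small checkin.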

> This is not a development
> environment, it is an analysis environment (with code hackery to that
> end).  However, the evolution of files and version control
> requirements probably overlap

Yes, version control systems are good for more than just software source code.

> One differences from the days when I
> wrote "real" (compiled) code

SQL, VBA, and MATLAB are real code.  Don’t let anyone tell you different.

> As much as possible,
> everything should be quickly generatable from raw client input data
> files.

That strategy matches exactly with what you want for a VCS: store the source 
data, not the data generated from it, unless it just takes too much time and 
effort to re-generate it.

> using vim window
> splitting, it is very efficient to browse the diff output

While you can still do that with Fossil, it’s probably better to switch to 
either “fossil gdiff” coupled with a graphical diff utility of your choice, or 
to use “fossil ui”, which will let you view diffs of checked-in versions in a 
browser, either inline or side-by-side, your choice.

On Windows, fossil gdiff defaults to WinDiff, which you may already have 
installed, since MS distributes it with some other software:

  https://en.wikipedia.org/wiki/WinDiff

A lot of people prefer Beyond Compare or Meld, both of which can be configured 
to act as the graphical diff handler for Fossil.

It should be possible to use Vim’s vimdiff feature this way, too.

> try a few baby steps at some point.

One of the smarter things you can do with Fossil is to use several 
repositories, one for each focused project, instead of trying to store 
everything in a single “world” repo.

So, put one project under Fossil management today.  Sync it back and forth, 
work out the kinks.  Put another project under Fossil in a separate repository 
next week.  Add more repos as you become comfortable with the process.

Fossil makes managing multiple active repos easy with its “all” command, which 
lets you do common things to all of the repositories.  “fossil all sync” is a 
common incantation, for example, meaning “Exchange changes between each local 
Fossil repository and its master repo.”
