Re: [SLUG] today's scary thought

2010-07-17 Thread dave b
 Hope this helps. (Understanding, that is -- I know it doesn't help solve
 anything.)
"Talk is cheap. Show me the code." -- Linus.
So when is Google Docs getting the "time spent actively viewing" count
for a document?

This would be a neat thing to have. So who is going to add this
feature to bzr or hg ;P ?
However, Jeff's Zeitgeist is pretty neat.
self._cursor.execute("INSERT OR IGNORE INTO uri (value) %s"
...???...  perhaps I didn't want to see the code ;P
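For contrast, the parameter-binding style usually preferred over "%s" string formatting looks like this (a sketch: the one-column schema is invented to match the quoted snippet, and stdlib sqlite3 stands in for whatever DB layer Zeitgeist actually uses):

```python
import sqlite3

# Hypothetical one-column schema matching the quoted snippet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uri (value TEXT UNIQUE)")

def record_uri(value):
    # "?" placeholders let the driver bind the value safely, instead of
    # splicing it into the SQL string with "%s" formatting.
    conn.execute("INSERT OR IGNORE INTO uri (value) VALUES (?)", (value,))

record_uri("file:///tmp/notes.txt")
record_uri("file:///tmp/notes.txt")  # ignored: value is already present
print(conn.execute("SELECT COUNT(*) FROM uri").fetchone()[0])  # -> 1
```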
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] today's scary thought

2010-07-17 Thread dave b
On 18 July 2010 03:29, dave b db.pub.m...@gmail.com wrote:
 ...???...  perhaps I didn't want to see the code ;P


Bah copy pasta fail - the rest of the email is this:

Sure, it is neat to talk about stuff like NoSQL etc. - you still have
the interesting syncing problem. IMHO http://github.com/apenwarr/bup
looks pretty neat. That with GitTorrent could be rather 'awesome' ;P


--
Something's rotten in the state of Denmark. -- Shakespeare


Re: [SLUG] today's scary thought

2010-07-16 Thread Matt Moor

On 16/07/10 12:31 PM, Daniel Pittman wrote:

Also, lots of different apps, so I might well end up with multiple
solutions.  A good distributed POSIX FS with replication, eventual
consistency, some sensible conflict resolution model, and data center
awareness would have been easy enough to use though.

If I could have my pony. ;)
   
So as you and Jamie have both alluded to, this is a pretty hard problem 
to solve. Most of the Enterprise-y solutions I deal with solve it by 
pushing the problem down the stack and using $expensive_storage_array 
with synchronous replication. Possibly also over $expensive_dwdm_fibre. 
The most accessible of these solutions is probably NetApp's MetroCluster 
tech. If you truly need real-time POSIX compliant synchronous access 
between sites, this (and its kin from HDS and EMC) is pretty much your 
only choice.


Most of the other open-source (and indeed, commercial) solutions to 
doing this at the filesystem level have left me wanting. We tested and 
deployed GlusterFS for a large customer project last year, purely for HA 
file serving, and regretted it so much that we ripped it out in the 
middle of a busy production period and replaced it with NFS + Rsync 
(particularly after the customer revised their recovery time objective 
:)). We've had similar amounts of pain with Microsoft's DFS solution (in 
Windows land, ugh).


As Jamie notes, it's at this point that you'd usually go back and 
redefine the problem, particularly after the sales dudes make your eyes 
bleed. :)


There are a couple of different shapes this problem usually takes in the 
market - "I want a DR site" or "I need to share files with a remote 
branch". Considering these, and their various solutions (publish/two+ 
subscribers, move the desktops closer to the data, active/passive 
access, ...) might give you some more ideas about outside-the-box ways 
to solve the problem.


Your mileage will almost certainly vary.

Cheers,

Matt


Re: [SLUG] today's scary thought

2010-07-16 Thread Daniel Pittman
Jake Anderson ya...@vapourforge.com writes:
 On 15/07/10 16:14, Daniel Pittman wrote:

[...]

 We can't be the first people to come across this branch office scenario.

Nope.  Lots of people have, and wished there was a good solution, but it is a
*really* hard problem.  The difficulty curve in fixing it looks like a
backward L, basically: the most trivial bit is trivial, then the problem more
or less instantly gets insanely hard.

 My goal is to have the branch office get a copy of all the files (think MS
 office) without hitting performance at either end.  something like this
 rsync thing, with a distributed lock manager would be the solution to 99% of
 the problem

...only then you pay the cross-WAN latency cost for every file open, at least,
plus have to deal with the problem of disconnected operation, so still need
conflict resolution, plus...

 The only problem I can see is if person A in Newcastle wants person B in
 Sydney to look at their file: they press save and then person B opens it
 before the lazy copy has moved it over. Perhaps maintain a write lock on the
 file until it's synced, with user-definable behaviour in the case of failure?

 At the moment the branch office is going to be working over a VPN back to
 the main office, with all the files etc sitting inside VM's, the images of
 the VM's will get rsynced nightly.  Which all in all is a fairly craptacular
 solution to be honest.

Mmmm.  For what it is worth, the least-worst solutions that I have found are:

1. Fire up a WebDAV server in each office to store their files, and make
   sure that it can be accessed through a fully public DNS name.[1]

   Some document management solutions offer WebDAV as part of their feature
   set, and might be a good addition to this.  IIRC, SharePoint, in an MS
   environment, is one of 'em.

2. Go buy a copy of http://www.netdrive.net/ for every user that you have.
   (...or just use a Mac, since they do WebDAV OK too. :)

3. Use it to mount the WebDAV share for your users, because unlike the native
   Win32 WebDAV support, it doesn't suck.[2]  Specifically, it works even if
   you are using a program that *isn't* Microsoft Office.

That gives reasonable performance, akin to HTTP, for reading the remote file,
plus some local caching, and it works right no matter where on the Internet
your users are because they access a public URL, not a private CIFS share.
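For step 1, a minimal Apache httpd mod_dav vhost might look like the sketch below. Hostname and paths are invented, and SSL plus real authentication are left to taste, per footnote [1]:

```apache
# Hypothetical vhost -- names and paths invented for illustration.
# Needs mod_dav and mod_dav_fs loaded; the lock database is required.
DavLockDB /var/lock/apache2/DavLock

<VirtualHost *:80>
    ServerName files.example.com
    DocumentRoot /srv/webdav

    <Directory /srv/webdav>
        Dav On
        AuthType Basic
        AuthName "Office files"
        AuthUserFile /etc/apache2/dav.passwd
        Require valid-user
    </Directory>
</VirtualHost>
```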


However, not perfect, especially the server options, and not exactly
replicated between sites.  I don't know what the latest NetDrive offers in
terms of offline operation, either.

Daniel

Footnotes: 
[1]  Add SSL, authentication, etc to taste, of course.

[2]  Disclosure: this might not be true in Windows 7, or the latest Vista
 service packs, but because I have never used them I can't actually say.

-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-16 Thread Jamie Wilkinson
On 16 July 2010 17:15, Daniel Pittman dan...@rimspace.net wrote:

 Jake Anderson ya...@vapourforge.com writes:
  On 15/07/10 16:14, Daniel Pittman wrote:

 [...]

  We cant be the first people to come across this branch office scenario.

 Nope.  Lots of people have, and wished there was a good solution, but it is
 a
 *really* hard problem.  The difficulty curve in fixing it looks like a
 backward L, basically: the most trivial bit is trivial, then the problem
 more
 or less instantly gets insanely hard.


I think it's important to mention why it's hard: the speed of light is
constant, so as distance increases, latency increases.  The longer it takes
to round trip, the more time you spend waiting for confirmation that your
atomic operation (e.g. writing new data) has committed at both ends, and the
more chance you have of conflicts arriving when both sides try to write (as
your window of opportunity is now wider). Add to that the difficulties in
sequencing events in a distributed system (i.e. how do both ends know the
other end's clock is accurate enough (but then see also vector clocks)) and
you see that suddenly there's a whole bunch of expensive problems to solve.

So people work around it by sacrificing some of the things they need, like
atomicity, or consistency, or currency; this is what they talk about in
these newfangled NoSQL cloud storage systems.  The less you need to
write, or check (like, say, a foreign key to maintain referential
integrity), the faster you can write and replicate this data.  If you don't
care that the immediate data is inconsistent at one end, but will be soon,
then you don't have to wait for the write to sync.  But sometimes you can't
make those sacrifices, such as in the case of trying to maintain POSIX
filesystem semantics.  The cloudy things like Google's GFS aren't POSIXy at
all for just this reason.  They like appends, but not overwrites, for
example.
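The vector-clock idea mentioned above can be sketched in a few lines of Python (a toy model, not any particular system's implementation): each replica carries a counter per node, and two clocks where neither dominates the other mark exactly the kind of concurrent-write conflict that the latency window lets through.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter bumped."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def compare(a, b):
    """Order two vector clocks: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # a genuine conflict: neither side saw the other

# Two sites write without having seen each other's update:
base = {}
sydney = increment(base, "sydney")
newcastle = increment(base, "newcastle")
print(compare(sydney, newcastle))  # -> concurrent
```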

Hope this helps. (Understanding, that is -- I know it doesn't help solve
anything.)


Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Jake Anderson ya...@vapourforge.com writes:
 On 15/07/10 14:10, Matthew Hannigan wrote:
 On Wed, Jul 14, 2010 at 04:06:17PM +1000, Peter Chubb wrote:

 You could do this with inotify, with `just a few' scripts around it.

 Related: http://code.google.com/p/lsyncd/ drives rsyncing with inotify.

 Actually that looks like a fairly handy tool, I have been trying to work out
 the best way of keeping files in two offices in sync and drbd seemed like
 overkill

Keep in mind that using rsync like that has absolutely *zero* conflict
resolution support, so you are inviting the data-loss fairy to visit when
there are concurrent modifications.

DRBD, meanwhile, is useless without a cluster file-system on top of it, since
you otherwise can't mount the data at both sites at the same time.


Sadly, I can't right now advise a better solution than these, however, since
it is the main problem I face in trying to bridge two data-centers and provide
coherent and sensible file access.

The best I can offer, right now, is xtreemfs[1] which will give you fair
performance but no local caching, so no disconnected operation.

Regards,
Daniel

Footnotes: 
[1]  http://www.xtreemfs.org/

-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Nick Andrew
On Thu, Jul 15, 2010 at 04:14:38PM +1000, Daniel Pittman wrote:
 Sadly, I can't right now advise a better solution than these, however, since
 it is the main problem I face in trying to bridge two data-centers and provide
 coherent and sensible file access.

Try GlusterFS with mirroring and preference for the local filesystem.

Nick.


Re: [SLUG] today's scary thought

2010-07-15 Thread Morgan Storey
I know it is a bit unmaintained, but what about Unison?

--
Regards
Morgan Storey

On Thu, Jul 15, 2010 at 4:14 PM, Daniel Pittman dan...@rimspace.net wrote:

 Jake Anderson ya...@vapourforge.com writes:
  On 15/07/10 14:10, Matthew Hannigan wrote:
  On Wed, Jul 14, 2010 at 04:06:17PM +1000, Peter Chubb wrote:
 
  You could do this with inotify, with `just a few' scripts around it.
 
  Related: http://code.google.com/p/lsyncd/ drives rsyncing with inotify.
 
  Actually that looks like a fairly handy tool, I have been trying to work
 out
  the best way of keeping files in two offices in sync and drbd seemed like
  overkill

 Keep in mind that using rsync like that has absolutely *zero* conflict
 resolution support, so you are inviting the data-loss fairy to visit when
 there are concurrent modifications.

 DRBD, meanwhile, is useless without a cluster file-system on top of it,
 since
 you otherwise can't mount the data at both sites at the same time.


 Sadly, I can't right now advise a better solution than these, however,
 since
 it is the main problem I face in trying to bridge two data-centers and
 provide
 coherent and sensible file access.

 The best I can offer, right now, is xtreemfs[1] which will give you fair
 performance but no local caching, so no disconnected operation.

 Regards,
Daniel

 Footnotes:
 [1]  http://www.xtreemfs.org/




Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Morgan Storey m...@morganstorey.com writes:

I bet the manual part of that synchronization doesn't win any points with
the users. :)

Daniel

 I know it is a bit un-maintained but what about Unison

 --
 Regards
 Morgan Storey


-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Nick Andrew n...@nick-andrew.net writes:
 On Thu, Jul 15, 2010 at 04:14:38PM +1000, Daniel Pittman wrote:

 Sadly, I can't right now advise a better solution than these, however,
 since it is the main problem I face in trying to bridge two data-centers
 and provide coherent and sensible file access.

 Try GlusterFS with mirroring and preference for the local filesystem.

It turns out this has a major problem, in the current iteration, for WAN use:

GlusterFS currently requires synchronous communication with all replicas for a
non-trivial number of operations, so you will pay a cross-WAN latency cost for
a whole bunch of (read-only) operations regardless — *and* have to wait for
the client to write over the WAN to the remote replica anyway.

(Also, with AFR you can cause the server to consume unbounded memory and die
 by writing local data faster than the WAN can flush it, if you don't require
 writes to remote replicas to be synchronous.)


Red Hat is funding development of solid, WAN-capable asynchronous replication
support, expected to land some time later this year, around the same time that
XtreemFS expect to have read/write mirroring of objects implemented.

As it happens, I have just been investing non-trivial time investigating the
options available here. :)

Regards,
   Daniel

-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Morgan Storey
What do you mean? It is a manual initial sync to get the files in sync (just
copy the differing files either way and work out which ones you need to
trash or merge). Then start up your Unison scripts and let the servers build
their indexes, then sync.
It has the disadvantage of not being real-time like a cluster FS, of course,
but it can be slow-WAN friendly, secured over SSH, and multiplatform (I've
used it to sync up a remote Windows box with Linux with another remote
Windows box).

--
Regards
Morgan Storey


On Thu, Jul 15, 2010 at 11:12 PM, Daniel Pittman dan...@rimspace.net wrote:

 Morgan Storey m...@morganstorey.com writes:

 I bet the manual part of that synchronization doesn't win any points with
 the users. :)

Daniel

  I know it is a bit un-maintained but what about Unison
 
  --
  Regards
  Morgan Storey
 



Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Morgan Storey m...@morganstorey.com writes:

 What do you mean? it is a manual initial sync to get the files in sync (just
 copy the different files either way and work out which ones you need to
 trash or merge) Then start up your unison scripts and let the servers build
 their indexes then sync.

Ah.  Yes, that is fair, you can do that.  What conflict resolution strategy
does it use in that mode?  I presume some variant on "preserve the lot", which
is what I would use.

Um, and yeah: you are dead right it can do that effectively.
Daniel

-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Jamie Wilkinson
On 14 July 2010 23:14, Daniel Pittman dan...@rimspace.net wrote:


 Sadly, I can't right now advise a better solution than these, however,
 since
 it is the main problem I face in trying to bridge two data-centers and
 provide
 coherent and sensible file access.


I think you're going to be out of luck without some fat short pipes to
satisfy fast atomic commits to both sides.  The cloud way is to have your
applications understand there's a replication delay and know how to deal
with conflict resolution, dropping the atomicity and integrity constraints to
gain some speed.

I suspect from your mention of file access that you're not dealing with
*an* application, but *all of them* and your storage layer API is just
POSIX, in which case I wish you well in your pipe procurement endeavours.

Random tangential brainstorm: if your application knew that your POSIX
filesystem was being slowly replicated between two DCs, and knew to look in
*both* for the same data, and was robust enough to handle the loss of one
DC, then it ought to be able to pick up where it left off in the other DC
modulo some journalling.  Again I suspect this isn't going to help you in
the slightest, not knowing anything about your app :)


Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Jamie Wilkinson j...@spacepants.org writes:
 On 14 July 2010 23:14, Daniel Pittman dan...@rimspace.net wrote:

 Sadly, I can't right now advise a better solution than these, however,
 since it is the main problem I face in trying to bridge two data-centers
 and provide coherent and sensible file access.

 I think you're going to be out of luck without some fat short pipes to
 satisfy fast atomic commits to both sides.  The cloud way is to have your
 applications understand there's a replication delay and know how to deal
 with conflict resolutions, drop the atomicity and integrity constraints to
 gain some speed.

*nod*  Factors I am well aware of, but thank you for being explicit about
them.

 I suspect from your mention of file access then you're not dealing with
 *an* application, but *all of them* and your storage layer API is just
 POSIX, in which case I wish you well in your pipe procurement endeavours.

Our needs vary wildly; in some cases we do want POSIX style file access,
with a remote mirror for read-mostly speed-of-access purposes, or where we
have a write-mostly application inside the one data center, with fail-over
to the other.

In others we are quite happy with eventual consistency.  I mostly focused
on files here because suggesting that the OP investigate Riak or Cassandra
probably wouldn't fly when they wanted Microsoft Office to access it. ;)


 Random tangential brainstorm: if your application knew that your POSIX
 filesystem was being slowly replicated between two DCs, and knew to look in
 *both* for the same data, and was robust enough to handle the loss of one
 DC, then it ought to be able to pick up where it left off in the other DC
 modulo some journalling.  Again I suspect this isn't going to help you in
 the slightest, not knowing anything about your app :)

Actually, that is pretty much the model I expect we will use for the most
legacy application in our stack, in which we are probably stuck for the next
year or so with nothing but POSIX.

For the more agile applications my hope is that we can avoid that by, indeed,
having the applications aware of the replication issues and all, and using a
simple eventual-consistency or last-update-wins vector-clock approach.

Daniel

Also, lots of different apps, so I might well end up with multiple
solutions.  A good distributed POSIX FS with replication, eventual
consistency, some sensible conflict resolution model, and data center
awareness would have been easy enough to use though.

If I could have my pony. ;)

-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Nick Andrew
On Fri, Jul 16, 2010 at 12:31:42PM +1000, Daniel Pittman wrote:
 Also, lots of different apps, so I might well end up with multiple
 solutions.

This seems likely. Databases have different consistency requirements
to people.

 A good distributed POSIX FS with replication, eventual
 consistency, some sensible conflict resolution model, and data center
 awareness would have been easy enough to use though.

Conflict resolution is the problem. The less of that you want, the more
synchronous your filesystem has to become - or expose more non-POSIX
filesystem behaviour to applications.

Nick.


Re: [SLUG] today's scary thought

2010-07-15 Thread Daniel Pittman
Nick Andrew n...@nick-andrew.net writes:
 On Fri, Jul 16, 2010 at 12:31:42PM +1000, Daniel Pittman wrote:

 Also, lots of different apps, so I might well end up with multiple
 solutions.

 This seems likely. Databases have different consistency requirements to
 people.

One of the attractions of Cassandra is that it allows the client to specify
the consistency level required, from none, through to every node ever, or
quorum, or whatever.
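The usual back-of-envelope rule behind those tunable levels (sketched here with terminology borrowed from the Dynamo-style literature rather than any one product) is that reads of R replicas and writes of W replicas out of N stay strongly consistent exactly when the two quorums must overlap:

```python
def overlapping(n, r, w):
    """True when any read quorum must intersect any write quorum (R + W > N)."""
    return r + w > n

# N=3 replicas: QUORUM reads plus QUORUM writes always overlap...
print(overlapping(3, 2, 2))  # -> True
# ...but ONE/ONE trades that guarantee away for latency.
print(overlapping(3, 1, 1))  # -> False
```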

I need to look further at Riak to work out how well their model expresses the
same, although as they don't do cross-WAN out of the box it has a lesser
problem to contend with.


 A good distributed POSIX FS with replication, eventual consistency, some
 sensible conflict resolution model, and data center awareness would have
 been easy enough to use though.

 Conflict resolution is the problem. The less of that you want, the more
 synchronous your filesystem has to become - or expose more non-POSIX
 filesystem behaviour to applications.

*nod*  Very true.  I think, for most people, the Dropbox model of conflict
resolution would be great to have in a file system:

Find a conflict, generate two documents, one with each version.  Voilà: you
just punted the hard problem up to a human.

Less good for machines, naturally, although a similar process can help.
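That "preserve the lot" resolution amounts to little more than this (a sketch; the naming scheme is made up in the style of Dropbox's, not copied from it):

```python
import os

def conflicted_copy_name(path, host, date):
    """Name for the losing version when both sides edited the same file."""
    root, ext = os.path.splitext(path)
    return "%s (%s's conflicted copy %s)%s" % (root, host, date, ext)

print(conflicted_copy_name("budget.xls", "newcastle-fs", "2010-07-16"))
# -> budget (newcastle-fs's conflicted copy 2010-07-16).xls
```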

Daniel
-- 
✣ Daniel Pittman✉ dan...@rimspace.net☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-15 Thread Morgan Storey
Yeah, Unison's conflict resolution isn't great: it simply doesn't replicate
conflicting files when run as a script in batch mode. You can run it in user
mode and it will warn on conflicts, prompting you on which to keep. I would
prefer it copy the file to another directory that gets replicated, something
like replicaroot/servername/path/to/file/filename, but my use for Unison is
only for fail-over access to these files, so conflicts should be minimal.
I get around all of this and make it a bit more robust by running a cron job
that md5deep's the roots on each of the servers, verifies the md5s match on
all servers, then emails me any failures.
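That cross-check amounts to something like the following (a local sketch: sha256sum stands in for md5deep, two invented /tmp directories stand in for the two servers, and the cron/ssh/email plumbing is omitted):

```shell
#!/bin/sh
# Compare recursive checksums of two replica roots (paths are invented).
set -e
A=/tmp/replica_a; B=/tmp/replica_b
mkdir -p "$A" "$B"
echo "same content" > "$A/doc.txt"
echo "same content" > "$B/doc.txt"

snapshot() {
    # Checksum every file under $1, with paths relative to $1 so the
    # two sides compare like-for-like.
    (cd "$1" && find . -type f | sort | xargs sha256sum)
}

if [ "$(snapshot "$A")" = "$(snapshot "$B")" ]; then
    echo "replicas match"
else
    echo "MISMATCH"   # the real script emails the admin here
fi
```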

--
Regards
Morgan Storey


On Thu, Jul 15, 2010 at 11:37 PM, Daniel Pittman dan...@rimspace.net wrote:

 Morgan Storey m...@morganstorey.com writes:

  What do you mean? it is a manual initial sync to get the files in sync
 (just
  copy the different files either way and work out which ones you need to
  trash or merge) Then startup your unison scripts and let the servers
 build
  there indexes then sync.

 Ah.  Yes, that is fair, you can do that.  What conflict resolution strategy
 does it use in that mode?  I presume some variant on preserve the lot,
 which
 is what I would use.

 Um, and yeah: you are dead right it can do that effectively.
Daniel




Re: [SLUG] today's scary thought

2010-07-15 Thread Jake Anderson

On 15/07/10 16:14, Daniel Pittman wrote:


Keep in mind that using rsync like that has absolutely *zero* conflict
resolution support, so you are inviting the data-loss fairy to visit when
there are concurrent modifications.

DRBD, meanwhile, is useless without a cluster file-system on top of it, since
you otherwise can't mount the data at both sites at the same time.


Sadly, I can't right now advise a better solution than these, however, since
it is the main problem I face in trying to bridge two data-centers and provide
coherent and sensible file access.

The best I can offer, right now, is xtreemfs[1] which will give you fair
performance but no local caching, so no disconnected operation.

Regards,
 Daniel

Footnotes:
[1]  http://www.xtreemfs.org/

   

We can't be the first people to come across this branch office scenario.
My goal is to have the branch office get a copy of all the files (think 
MS Office) without hitting performance at either end.
Something like this rsync thing, with a distributed lock manager, would 
be the solution to 99% of the problem.


The only problem I can see is if person A in Newcastle wants person B in 
Sydney to look at their file: they press save and then person B opens it 
before the lazy copy has moved it over. Perhaps maintain a write lock on 
the file until it's synced, with user-definable behaviour in the case of 
failure?


At the moment the branch office is going to be working over a VPN back 
to the main office, with all the files etc. sitting inside VMs; the 
images of the VMs will get rsynced nightly.

Which all in all is a fairly craptacular solution to be honest.



Re: [SLUG] today's scary thought

2010-07-14 Thread Del

Jeff Waugh wrote:

quote who=Del


Someone asked me today, as they often ask me about things Linux, if I had
a Linux replacement for their favourite journal app that they run on
their (windows) PC.  I asked what that journal app did, and was told:

You can set it to track when you open files of various types [in other
applications] and how long they are open for..  Further quizzing revealed
that you can set it to record when those files were opened, saved, closed,
and when and where any saved and backup copies were stored.

I mentioned the security impacts of such an application, or even the fact
that such an application was possible, and left it at that.


Look around for Zeitgeist. :-)


Good point, but quite different.  It's a D-Bus based data logger which apps can choose to 
publish their information to.  In a way it's not unlike syslog.


So my OpenOffice.org Calc program can choose to tell Zeitgeist "Del opened file X on his 
system".  Zeitgeist doesn't intercept OpenOffice.org Calc's system calls to find out what files 
are being opened (and potentially dump copies of those files to an IRC channel to be picked 
up by a botnet operating out of frangipangiland) without OpenOffice.org knowing about it.


--
Del


Re: [SLUG] today's scary thought

2010-07-14 Thread Daniel Pittman
Jeff Waugh j...@perkypants.org writes:
 quote who=Del

 Someone asked me today, as they often ask me about things Linux, if I had a
 Linux replacement for their favourite journal app that they run on their
 (windows) PC.  I asked what that journal app did, and was told:

 You can set it to track when you open files of various types [in other
 applications] and how long they are open for..  Further quizzing revealed
 that you can set it to record when those files were opened, saved, closed,
 and when and where any saved and backup copies were stored.

Wow.  What a useful tool for tracking what you do!

 I mentioned the security impacts of such an application, or even the fact
 that such an application was possible, and left it at that.

 Look around for Zeitgeist. :-)

...or snapshot 'ls -l /proc/[0-9]*/fd/' on a regular basis, or better still
use one of the task notification hooks that I understand are floating
around[1] to capture task creation and exit automatically.
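Daniel's first suggestion -- periodically snapshotting /proc/*/fd -- is only a few lines of stdlib Python on Linux (a sketch, not a polished tracker):

```python
import os
import tempfile

def open_files(pid="self"):
    """Snapshot the files a process has open by reading /proc/<pid>/fd,
    roughly what 'ls -l /proc/[0-9]*/fd/' shows for one process."""
    fd_dir = "/proc/%s/fd" % pid
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            pass  # the fd vanished between listdir() and readlink()
    return paths

# Open a file and confirm the snapshot of our own process sees it.
with tempfile.NamedTemporaryFile() as f:
    snapshot = open_files()
    tracked = f.name in snapshot
```

Loop that over every numeric directory in /proc on a timer and you have the poor man's journal app, minus the per-application detail.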

Also, what on earth security implications did you see, Del, in being able to
see what files you have opened yourself?  It isn't like your applications
couldn't record this anyhow...

Daniel

Heck, my Emacs does keep a long history of the files I have opened, since I
like to be able to do backward-isearch in an LRU list to get at things I
worked on in the last few days...

Footnotes: 
[1]  ...in that I have seen occasional discussion of 'em on the kernel list,
 so presume they have floated out to have a user-space interface by this
 point, but know nothing beyond that.

-- 
✣ Daniel Pittman    ✉ dan...@rimspace.net    ☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] today's scary thought

2010-07-14 Thread Peter Chubb
 Del == Del  d...@babel.com.au writes:

Del Jeff Waugh wrote:
 quote who=Del
 
 Someone asked me today, as they often ask me about things Linux,
 if I had a Linux replacement for their favourite journal app
 that they run on their (windows) PC.  I asked what that journal
 app did, and was told:
 
 You can set it to track when you open files of various types [in
 other applications] and how long they are open for..  Further
 quizzing revealed that you can set it to record when those files
 were opened, saved, closed, and when and where any saved and
 backup copies were stored.

You could do this with inotify, with `just a few' scripts around it.
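As a sketch of Peter's suggestion, here is inotify driven directly through ctypes on a glibc Linux system (constants and event-struct layout as in &lt;sys/inotify.h&gt;; a real script would more likely use a binding such as pyinotify):

```python
import ctypes
import os
import struct
import tempfile

libc = ctypes.CDLL("libc.so.6", use_errno=True)
IN_CREATE = 0x00000100  # from <sys/inotify.h>

_HDR = struct.Struct("iIII")  # wd, mask, cookie, name length

def next_event(fd):
    """Block until one inotify event arrives; return (mask, filename)."""
    buf = os.read(fd, 4096)
    wd, mask, cookie, length = _HDR.unpack_from(buf)
    name = buf[_HDR.size:_HDR.size + length].rstrip(b"\0")
    return mask, name.decode()

fd = libc.inotify_init()
watched = tempfile.mkdtemp()
libc.inotify_add_watch(fd, watched.encode(), IN_CREATE)

# Any file created under `watched` now queues an event in the kernel.
path = os.path.join(watched, "journal.txt")
open(path, "w").close()
mask, name = next_event(fd)

os.close(fd)
os.unlink(path)
os.rmdir(watched)
```

Add IN_OPEN and IN_CLOSE_WRITE to the mask and timestamp the events, and you have most of the journal app's tracking feature.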

Peter C
--
Dr Peter Chubb  peter DOT chubb AT nicta.com.au
http://www.ertos.nicta.com.au   ERTOS within National ICT Australia
All things shall perish from under the sky/Music alone shall live, never to die


Re: [SLUG] today's scary thought

2010-07-14 Thread Jamie Wilkinson
The equivalent on MacOS is Time Machine, as I understand it (which is not
very much as I don't understand Macs at all), but I'm not aware of any Linux
application that does this either.  I like Peter's idea of using inotify
though, you could whip up a 10 liner with the python language bindings to
record all file accesses in under an hour.

jdub, zeitgeist is a terrible project name for them, too many better things
with that name for it to get a page one ranking :)

On 13 July 2010 22:19, Del d...@babel.com.au wrote:


 Someone asked me today, as they often ask me about things Linux, if I had a
 Linux replacement for their favourite journal app that they run on their
 (windows) PC.  I asked what that journal app did, and was told:

 You can set it to track when you open files of various types [in other
 applications] and how long they are open for..  Further quizzing revealed
 that you can set it to record when those files were opened, saved, closed,
 and when and where any saved and backup copies were stored.

 I mentioned the security impacts of such an application, or even the fact
 that such an application was possible, and left it at that.

 --
 Del




Re: [SLUG] today's scary thought

2010-07-14 Thread Lindsay Holmwood
On 15 July 2010 02:10, Jamie Wilkinson j...@spacepants.org wrote:
 The equivalent on MacOS is Time Machine, as I understand it (which is not
 very much as I don't understand Macs at all), but I'm not aware of any Linux
 application that does this either.  I like Peter's idea of using inotify
 though, you could whip up a 10 liner with the python language bindings to
 record all file accesses in under an hour.


Dirvish[0] is vaguely equivalent to Time Machine.

[0] http://www.dirvish.org/

Lindsay

-- 
w: http://holmwood.id.au/~lindsay/
t: @auxesis


Re: [SLUG] today's scary thought

2010-07-14 Thread Matthew Hannigan
On Wed, Jul 14, 2010 at 04:06:17PM +1000, Peter Chubb wrote:
 
 You could do this with inotify, with `just a few' scripts around it.

Related: http://code.google.com/p/lsyncd/ drives rsyncing with inotify.
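For reference, a minimal lsyncd configuration along those lines might look like this (Lua syntax as in lsyncd 2.x; the source, target, and delay values are placeholders):

```lua
-- watch /srv/files with inotify and mirror changes out via rsync
sync {
    default.rsync,
    source = "/srv/files",
    target = "backup-host:/srv/files",
    delay  = 5,  -- batch events for a few seconds before rsyncing
}
```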




Re: [SLUG] today's scary thought

2010-07-14 Thread Jake Anderson

On 15/07/10 14:10, Matthew Hannigan wrote:

On Wed, Jul 14, 2010 at 04:06:17PM +1000, Peter Chubb wrote:

You could do this with inotify, with `just a few' scripts around it.

Related: http://code.google.com/p/lsyncd/ drives rsyncing with inotify.
Actually that looks like a fairly handy tool. I have been trying to work 
out the best way of keeping files in two offices in sync, and DRBD seemed 
like overkill.



[SLUG] today's scary thought

2010-07-13 Thread Del


Someone asked me today, as they often ask me about things Linux, if I had a Linux replacement 
for their favourite journal app that they run on their (windows) PC.  I asked what that 
journal app did, and was told:


You can set it to track when you open files of various types [in other applications] and how 
long they are open for..  Further quizzing revealed that you can set it to record when those 
files were opened, saved, closed, and when and where any saved and backup copies were stored.


I mentioned the security impacts of such an application, or even the fact that such an 
application was possible, and left it at that.


--
Del


Re: [SLUG] today's scary thought

2010-07-13 Thread Jeff Waugh
quote who=Del

 Someone asked me today, as they often ask me about things Linux, if I had
 a Linux replacement for their favourite journal app that they run on
 their (windows) PC.  I asked what that journal app did, and was told:
 
 You can set it to track when you open files of various types [in other
 applications] and how long they are open for..  Further quizzing revealed
 that you can set it to record when those files were opened, saved, closed,
 and when and where any saved and backup copies were stored.
 
 I mentioned the security impacts of such an application, or even the fact
 that such an application was possible, and left it at that.

Look around for Zeitgeist. :-)

- Jeff

-- 
Ubuntu's Bleeding Edge  http://ubuntuedge.wordpress.com/
 
  Acts of random.