Re: A couple thousand mp3 files (this is not spam I swear )
On Wed, Aug 17, 2016 at 9:09 PM, Adam Jensen wrote:
> On 08/17/2016 04:36 PM, Johan Corveleyn wrote:
> > On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen wrote:
> [snip]
> >> So basically, the checkout method will require twice (2x) the data-set
> >> size of storage space for a working copy but there would be
> >> significantly less network load during many of the branch switches. The
> >> export method pretty much has the opposite storage/network trade-off.
> >
> > I guess you'd need this (very old) feature request to be implemented:
> >
> > https://issues.apache.org/jira/browse/SVN-525 (allow working copies
> > without text-base/)
>
> Nice reference, thanks!
>
> Wow, that feature was requested during 2001.
>
> What I need (and what I think is generally needed) is a high-capacity,
> large-file repository with a focus on data integrity (mandatory audit
> trails), sophisticated access control (smart contracts (maybe blockchain
> based)), probably (almost certainly) an encrypted file-system, and
> distribution/replication (that is maybe torrent based). Files in this
> type of system might need to be deleted but they wouldn't be revised.
> This would not be a revision management system.

You are probably already aware of this, but on the server side, the repository is essentially an opaque database. You will not see anything resembling your files directly on the server filesystem. So unless you build your own custom code that uses the repository API layer, you can only interact with the content of your repository using an SVN client of some kind.

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
RE: A couple thousand mp3 files (this is not spam I swear )
Hi,

While I fully agree with Stefan, I would like to mention that we use svn in a non-standard way, and we think we still get very good performance and use-case coverage. It is not for source code. It is for technical sales project management: sharing all kinds of docs (jpeg, mp3, mp4, pdf, Word, ...).

We successfully keep ~3,000,000 files in our repository, with ~250 committers and ~1500 read-only users, since 2008. We use TortoiseSVN for most use cases. In addition to standard svn functionality, we created a Solr-based application for PowerPoint slide search and full-text search.

The reasons for our decision were:

a) SVN provides versioning (hey, I am able to recover my presentation)
b) it provides path-based authorization with Active Directory integration (as a large company we have to care about IP)
c) simple UI (Tortoise) (even Sales can use it ;-))
d) checkout/check-in for offline usage is simple (good on customer sites where we have no access to the network)
e) very good storage and network efficiency (good for home office/hotel guys)
f) stable and reliable (we can use it 24/7 99% of the time)
g) active community (we can ask someone for help, and get it fast)
h) free and open source :-) (no compliance issues)
i) multisite is supported; several multisite solutions exist. But keep in mind svn is not a distributed scm tool. (We thought we might need it but never actually implemented it; just for speed of recovery we have a 2nd server that mirrors the data.)
j) upgrades were smooth: we started with svn 1.3 and are now on 1.9 (no big investment in upgrades ... my boss is happy)

SVN was the tool that best fitted the needs of our project teams. And even better, we have 2 part-time admins (~10% of their effective time) for the system.

However, SVN has some "weaknesses" you should be aware of. OOTB it is not trivial/fast to query a file by properties as in a typical DMS. Backing up such large repositories through a dump is not feasible. Other means like btrfs/lvm2 should be put in place.
SVN OOTB does not provide persistent encryption. You could encrypt during a commit by 3rd party tools. Transport encryption (https) is supported. And finally I would like to add that svn is not a file system, despite the fact that svn provides webdav capabilities. As Stefan stated, it is a scm tool :-) and it does the job really well. Regards Thomas -Original Message- From: Stefan Sperling [mailto:s...@elego.de] Sent: Donnerstag, 18. August 2016 09:46 To: Adam Jensen <han...@riseup.net> Cc: users@subversion.apache.org Subject: Re: A couple thousand mp3 files (this is not spam I swear ) On Wed, Aug 17, 2016 at 09:09:27PM -0400, Adam Jensen wrote: > What I need (and what I think is generally needed) is a high-capacity, > large-file repository with a focus on data integrity (mandatory audit > trails), sophisticated access control (smart contracts (maybe > blockchain based)), probably (almost certainly) an encrypted > file-system, and distribution/replication (that is maybe torrent > based). Files in this type of system might need to be deleted but they > wouldn't be revised. > This would not be a revision management system. > > I'm not sure how much of Subversion could be used/leveraged to build > such a system. Indeed it won't. I believe you should use something else for this job. Not tracking changes contradicts a core requirement SVN was built for. > At a minimum, it seems like it would involve a project fork and > serious gutting and refactoring of the code-base after rethinking the > basic principles, specifying the new requirements, and devising the > new architecture. (And definitely a name change ). You're free to use our code in whatever way you wish. And we're always open to patches, of course. But keep in mind that the code base is 16 years old and widely deployed. Adding new features was easy in the early stages of development but is getting increasingly hard because of growing complexity and very strict reliability requirements imposed by our user base. 
And we can't sever our roots: """ Apache Subversion is a full-featured version control system originally designed to be a better CVS. Subversion has since expanded beyond its original goal of replacing CVS, but its basic model, design, and interface remain heavily influenced by that goal. Even today, Subversion should still feel very familiar to CVS users. """ http://subversion.apache.org/features.html So if you're really going to write a new piece of software for this you'll be much happier starting a new project from scratch rather than using SVN as a base.
Re: A couple thousand mp3 files (this is not spam I swear )
On Wed, Aug 17, 2016 at 09:09:27PM -0400, Adam Jensen wrote: > What I need (and what I think is generally needed) is a high-capacity, > large-file repository with a focus on data integrity (mandatory audit > trails), sophisticated access control (smart contracts (maybe blockchain > based)), probably (almost certainly) an encrypted file-system, and > distribution/replication (that is maybe torrent based). Files in this > type of system might need to be deleted but they wouldn't be revised. > This would not be a revision management system. > > I'm not sure how much of Subversion could be used/leveraged to build > such a system. Indeed it won't. I believe you should use something else for this job. Not tracking changes contradicts a core requirement SVN was built for. > At a minimum, it seems like it would involve a project > fork and serious gutting and refactoring of the code-base after > rethinking the basic principles, specifying the new requirements, and > devising the new architecture. (And definitely a name change ). You're free to use our code in whatever way you wish. And we're always open to patches, of course. But keep in mind that the code base is 16 years old and widely deployed. Adding new features was easy in the early stages of development but is getting increasingly hard because of growing complexity and very strict reliability requirements imposed by our user base. And we can't sever our roots: """ Apache Subversion is a full-featured version control system originally designed to be a better CVS. Subversion has since expanded beyond its original goal of replacing CVS, but its basic model, design, and interface remain heavily influenced by that goal. Even today, Subversion should still feel very familiar to CVS users. """ http://subversion.apache.org/features.html So if you're really going to write a new piece of software for this you'll be much happier starting a new project from scratch rather than using SVN as a base.
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/17/2016 04:36 PM, Johan Corveleyn wrote:
> On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen wrote:
[snip]
>> So basically, the checkout method will require twice (2x) the data-set
>> size of storage space for a working copy but there would be
>> significantly less network load during many of the branch switches. The
>> export method pretty much has the opposite storage/network trade-off.
>
> I guess you'd need this (very old) feature request to be implemented:
>
> https://issues.apache.org/jira/browse/SVN-525 (allow working copies
> without text-base/)

Nice reference, thanks!

Wow, that feature was requested during 2001.

What I need (and what I think is generally needed) is a high-capacity, large-file repository with a focus on data integrity (mandatory audit trails), sophisticated access control (smart contracts (maybe blockchain based)), probably (almost certainly) an encrypted file-system, and distribution/replication (that is maybe torrent based). Files in this type of system might need to be deleted but they wouldn't be revised. This would not be a revision management system.

I'm not sure how much of Subversion could be used/leveraged to build such a system. At a minimum, it seems like it would involve a project fork and serious gutting and refactoring of the code-base after rethinking the basic principles, specifying the new requirements, and devising the new architecture. (And definitely a name change ).

It's not the Pyramid of Khufu big, or the Panama Canal big, but it would be a big project.

For now, I still think I can use Subversion as the file repository in a limited capability, ad-hoc implementation of a small demonstration-of-concept instrumentation and analysis system.
Re: A couple thousand mp3 files (this is not spam I swear )
On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen wrote:
> On 08/17/2016 12:55 AM, Ryan Schmidt wrote:
> [snip]
>> He means avoid the 2x disk use by using "svn export" instead of "svn
>> checkout".
>>
> [snip]
>>
>> Of course Subversion only transfers changes.
>>
>
> Situation summary for the many-large-files scenario. Something like:
>
>   svn checkout svn://URL/ProjX/branches/profile1 ~/test
>   cd ~/test
>   svn switch ^/branches/profile2
>
> the branch switch will work nicely with a many-large-files data-set,
> only transferring the files necessary to complete the new profile. But
> the storage requirement for the working copy (~/test) is twice (2x) the
> size of the checked-out data-set.
>
> Alternatively, using the export method:
>
>   svn export svn://URL/ProjX/branches/profile1 ~/test
>
> will transfer all of the files of profile1 and the storage requirement
> for the working copy is only (1x) the size of the data-set. But
> switching between branches is not available in the export method.
>
> To switch branches (using the export method):
>
>   svn export svn://URL/ProjX/branches/profile1 ~/test
>   (transfers all files in profile1)
>
>   rm -rf ~/test
>   svn export svn://URL/ProjX/branches/profile2 ~/test
>   (transfers all files in profile2)
>
> So basically, the checkout method will require twice (2x) the data-set
> size of storage space for a working copy but there would be
> significantly less network load during many of the branch switches. The
> export method pretty much has the opposite storage/network trade-off.

I guess you'd need this (very old) feature request to be implemented:

https://issues.apache.org/jira/browse/SVN-525 (allow working copies without text-base/)

Most of the discussion in the issue tracker is rather old (and refers to the old pre-1.7 working copy format). But I suppose in the post-1.7 era this is still a big undertaking should someone decide to try and implement this. Help is always welcome of course.

There is also, slightly related, this one, but I guess that wouldn't help in your case, as your files are not compressible:

https://issues.apache.org/jira/browse/SVN-908 (Store text-base compressed)

--
Johan
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/17/2016 12:55 AM, Ryan Schmidt wrote:
[snip]
> He means avoid the 2x disk use by using "svn export" instead of "svn
> checkout".
>
[snip]
>
> Of course Subversion only transfers changes.
>

Situation summary for the many-large-files scenario. Something like:

  svn checkout svn://URL/ProjX/branches/profile1 ~/test
  cd ~/test
  svn switch ^/branches/profile2

the branch switch will work nicely with a many-large-files data-set, only transferring the files necessary to complete the new profile. But the storage requirement for the working copy (~/test) is twice (2x) the size of the checked-out data-set.

Alternatively, using the export method:

  svn export svn://URL/ProjX/branches/profile1 ~/test

will transfer all of the files of profile1 and the storage requirement for the working copy is only (1x) the size of the data-set. But switching between branches is not available in the export method.

To switch branches (using the export method):

  svn export svn://URL/ProjX/branches/profile1 ~/test
  (transfers all files in profile1)

  rm -rf ~/test
  svn export svn://URL/ProjX/branches/profile2 ~/test
  (transfers all files in profile2)

So basically, the checkout method will require twice (2x) the data-set size of storage space for a working copy but there would be significantly less network load during many of the branch switches. The export method pretty much has the opposite storage/network trade-off.
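The storage/network trade-off summarized above can be put into rough numbers. The following is only a back-of-the-envelope sketch in Python, not anything Subversion provides. The function name and the sizes are made up for illustration, and it assumes that a working copy costs about twice the data-set size (working files plus pristine copies) and that a switch transfers only the files that differ between the two branches.

```python
def profile_switch_cost(profile_bytes, differing_bytes, method):
    """Estimate (disk_bytes, network_bytes) for moving to a new profile.

    profile_bytes   -- total size of the files in the target profile
    differing_bytes -- size of the files not shared with the old profile
    method          -- "checkout" (svn checkout + svn switch) or "export"
    """
    if method == "checkout":
        disk = 2 * profile_bytes      # working files plus pristine copies
        network = differing_bytes     # a switch fetches only what differs
    elif method == "export":
        disk = profile_bytes          # no pristine copies are kept
        network = profile_bytes       # a fresh export re-transfers everything
    else:
        raise ValueError("method must be 'checkout' or 'export'")
    return disk, network


GB = 1024 ** 3
for method in ("checkout", "export"):
    # Hypothetical figures: a 50 GB profile, 5 GB of which is new.
    disk, network = profile_switch_cost(50 * GB, 5 * GB, method)
    print(f"{method:8s} disk={disk / GB:.0f} GB  network={network / GB:.0f} GB")
```

The model ignores working-copy metadata and protocol overhead, so it is only useful for comparing the two methods against each other.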
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/16/2016 09:17 AM, Stefan Hett wrote:
> On 8/13/2016 2:56 AM, Adam Jensen wrote:
>> My primary concerns are related to any potential file corruption, any
>> data duplication, and/or any excessive network or disk I/O (other than
>> the expected load of direct data communication).
>
> Just to have this mentioned: Be aware that the working copy (aka: the
> checked out data of the repository) will have a 2x storage requirement
> on the data since it will keep a copy of the pristine version of the
> file in addition to the "actual" file.
> If this is a concern for your use-case, you could export the files and
> only use a working copy in cases where you need to commit or reorder files.
>
> To clarify: This is purely a client side storage requirement. It does
> not apply to the storage requirements on the server side.
>

Wow, I totally misinterpreted that during the first reading. After some tinkering this morning (and reading Ryan's email) I think I see the light ;)

  svn checkout svn://minerva.bohemia.net/Project_Prometheus/trunk cotest
  du -sh cotest
  104M    cotest

  svn export svn://minerva.bohemia.net/Project_Prometheus/trunk extest
  du -sh extest
  52M     extest
Re: A couple thousand mp3 files (this is not spam I swear )
> On Aug 16, 2016, at 2:13 PM, Adam Jensen wrote:
>
> On 08/16/2016 09:17 AM, Stefan Hett wrote:
>> Just to have this mentioned: Be aware that the working copy (aka: the
>> checked out data of the repository) will have a 2x storage requirement
>> on the data since it will keep a copy of the pristine version of the
>> file in addition to the "actual" file.
>
>> If this is a concern for your use-case, you could export the files and
>> only use a working copy in cases where you need to commit or reorder files.
>
> By "export the files" do you mean something like an NFS share of the
> repository, thus bypassing svnserve and the check-in/check-out process?
> That seems like a clever possibility worth remembering, but for now the
> system I am currently building/imagining is headed in a different direction.

He means avoid the 2x disk use by using "svn export" instead of "svn checkout".

>> To clarify: This is purely a client side storage requirement. It does
>> not apply to the storage requirements on the server side.
>
> To reduce network load, are there any client-side caching options for
> Subversion? Does the svn program account for the files already in the
> working copy (on the local disk) and avoid transferring those files over
> the network during a subsequent check-out [that requires those files]?

Of course Subversion only transfers changes.

> Is it possible to clone or mirror all or part of a Subversion repository?

svnsync

> This probably isn't relevant to Subversion, but in the
> system I am imagining it might be reasonable for clients to check-out
> data-sets via torrent connections with other full/partial repositories.
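For readers who have not used it, the one-word answer "svnsync" expands to roughly two commands: `svnsync initialize` once, then `svnsync synchronize` whenever the mirror should catch up. The Python sketch below only assembles and runs those two invocations. The helper names and the URLs in the usage note are hypothetical, and it assumes the mirror repository already exists (for example via `svnadmin create`) and permits revision property changes through a `pre-revprop-change` hook.

```python
import subprocess


def mirror_commands(source_url, mirror_url, first_run):
    """Return the svnsync invocations needed to bring a mirror up to date."""
    commands = []
    if first_run:
        # One-time step that ties the empty mirror to its source repository.
        commands.append(["svnsync", "initialize", mirror_url, source_url])
    # Replays every revision the mirror has not seen yet.
    commands.append(["svnsync", "synchronize", mirror_url])
    return commands


def run_mirror(source_url, mirror_url, first_run=False):
    """Run the commands in order, stopping at the first failure."""
    for argv in mirror_commands(source_url, mirror_url, first_run):
        subprocess.run(argv, check=True)
```

A call such as `run_mirror("svn://minerva.bohemia.net/Project_Prometheus", "file:///srv/svn/mirror", first_run=True)` would seed a local mirror; the second URL is made up for the example.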
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/16/2016 09:17 AM, Stefan Hett wrote: > Just to have this mentioned: Be aware that the working copy (aka: the > checked out data of the repository) will have a 2x storage requirement > on the data since it will keep a copy of the pristine version of the > file in addition to the "actual" file. The type of system that I am imagining might typically have several terabytes of instrumentation data in a repository[1]. Various client machines might need to check-out a few gigabytes or a few hundred gigabytes at a time to run data analysis (automated compute jobs) or to perform a study (scientist/human-interest). [1]: Version control isn't a requirement in this use-case/hypothetical-system. Sophisticated access control is much more of a concern. Mandatory audit trails and distributed contract based data handling are examples of more relevant architectural characteristics. I am currently looking at the possibility of using Subversion (in a non-traditional, off-label fashion) to bootstrap a [very] simplified demonstration-of-concept type of setup. My current data-set is only about 25GB and growing at a rate of about 1GB/week. A desktop server and laptop client shouldn't have any storage space problems (in this case as a small demonstration system). > If this is a concern for your use-case, you could export the files and > only use a working copy in cases where you need to commit or reorder files. By "export the files" do you mean something like an NFS share of the repository, thus bypassing svnserve and the check-in/check-out process? That seems like a clever possibility worth remembering, but for now the system I am currently building/imagining is headed in a different direction. > To clarify: This is purely a client side storage requirement. It does > not apply to the storage requirements on the server side. To reduce network load, are there any client-side caching options for Subversion? 
Does the svn program account for the files already in the working copy (on the local disk) and avoid transferring those files over the network during a subsequent check-out [that requires those files]? Is it possible to clone or mirror all or part of a Subversion repository? This probably isn't relevant to Subversion, but in the system I am imagining it might be reasonable for clients to check-out data-sets via torrent connections with other full/partial repositories.
Re: A couple thousand mp3 files (this is not spam I swear )
Hi,

On 8/13/2016 2:56 AM, Adam Jensen wrote:
> My primary concerns are related to any potential file corruption, any data duplication, and/or any excessive network or disk I/O (other than the expected load of direct data communication).

Just to have this mentioned: Be aware that the working copy (aka: the checked out data of the repository) will have a 2x storage requirement on the data since it will keep a copy of the pristine version of the file in addition to the "actual" file. If this is a concern for your use-case, you could export the files and only use a working copy in cases where you need to commit or reorder files.

To clarify: This is purely a client side storage requirement. It does not apply to the storage requirements on the server side.

--
Regards,
Stefan Hett
Re: A couple thousand mp3 files (this is not spam I swear )
On 8/14/2016 1:22 PM, Adam Jensen wrote:
> On 08/13/2016 09:33 PM, Branko Čibej wrote:
>> But note that a rename is represented as an add+delete, so the hook would have to be rather smarter than it would seem at first glance to detect and allow renames without content modification.
>
> The literal file names are composed of a date and a sequence number, and like the contents of the files, the names should never change. The core data-set directory structure (trunk, maybe) will most likely be calendar-like (years->months->days->sequence->file). The analysis tools and meta-data will probably be kept in a separate fossil[1] repository.
>
> [1]: http://www.fossil-scm.org/
>
> The near-term goal is to maintain an indelible record of the physical measurements of reality. Any analysis [of which, there will be plenty], annotations, and other meta-data generation must not alter the fundamental instrumentation data.
>
> Given that, by "rename" do you mean a change of the literal file name like what I tried to describe above, or are you referring to something more like the file references, links, or pointers within the repository [internal implementation], similar to David's use of the term "rename" (included below)?

Yes, he and I are referring to the same thing. The file contents are not copied, which is your primary goal.

--
David Chapman dcchap...@acm.org
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/13/2016 09:33 PM, Branko Čibej wrote: > There is (currently) no easy way to specify "write once" access for > files in the repository; whoever can create a file can modify or delete > it, too. You could achieve something like that by creating a custom > pre-commit hook that would examine the pending commit transaction and > reject the commit if it finds modifications or deletions of existing files Thanks. I have svnserve configured and ready for a few tests that should enable basic characterization of the system in this use-case. I suppose I could concurrently develop familiarity with the command and operation of this setup while I explore customization of its functioning. Tonight is for contemplation but I imagine testing and hook scripting will begin within the next few days. Does anyone have any comments on using Tcl as a hook scripting language? > But note that a rename is represented as an add+delete, so the hook > would have to be rather smarter than it would seem at first glance to > detect and allow renames without content modification. The literal file names are composed of a date and a sequence number, and like the contents of the files, the names should never change. The core data-set directory structure (trunk, maybe) will most likely be calendar-like (years->months->days->sequence->file). The analysis tools and meta-data will probably be kept in a separate fossil[1] repository. [1]: http://www.fossil-scm.org/ The near-term goal is to maintain an indelible record of the physical measurements of reality. Any analysis [of which, there will be plenty], annotations, and other meta-data generation must not alter the fundamental instrumentation data. Given that, by "rename" do you mean a change of the literal file name like what I tried to describe above, or are you referring to something more like the file references, links, or pointers within the repository [internal implementation], similar to David's use of the term "rename" (included below)? 
On 08/13/2016 02:21 PM, David Chapman wrote: > On 8/13/2016 11:07 AM, Adam Jensen wrote: >> When a branch is created, are the files under revision control in the >> trunk copied to the branch (is there any duplication of files in the >> repository)? > > No, the files are not copied; a rename is stored. These are "cheap > copies", and this is an advantage over simple backups - if you want to > save history using backups (per another suggestion), you need to retain > one backup per significant event. That can add up.
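The calendar-like layout described in this message (years->months->days->sequence->file, with names built from a date and a sequence number) could be generated along the following lines. This is a Python sketch under assumptions: the thread does not give the exact file-name format, so the `YYYYMMDD-NNNN.mp3` pattern, the four-digit sequence width, and the function name are invented for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def recording_path(day, sequence, root="trunk", suffix=".mp3"):
    """Build a repository path of the form root/YYYY/MM/DD/NNNN/name.

    The name repeats the date and sequence number so that it stays unique
    and meaningful even if the file is later copied into another branch.
    """
    seq = f"{sequence:04d}"
    name = f"{day:%Y%m%d}-{seq}{suffix}"
    return PurePosixPath(root, f"{day:%Y}", f"{day:%m}", f"{day:%d}", seq, name)


# Example: the first recording made on 13 August 2016.
print(recording_path(date(2016, 8, 13), 1))
```

Because every component is derived from the date and sequence number alone, the path for a given recording never needs to change, which fits the stated goal that names, like contents, stay fixed.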
Re: A couple thousand mp3 files (this is not spam I swear )
On 14.08.2016 00:20, Adam Jensen wrote: > What would an "svnserve.conf" file with "write once" access control look like? There is (currently) no easy way to specify "write once" access for files in the repository; whoever can create a file can modify or delete it, too. You could achieve something like that by creating a custom pre-commit hook that would examine the pending commit transaction and reject the commit if it finds modifications or deletions of existing files But note that a rename is represented as an add+delete, so the hook would have to be rather smarter than it would seem at first glance to detect and allow renames without content modification. -- Brane
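A pre-commit hook along the lines Brane describes might look roughly like the Python sketch below. It is an illustration under assumptions, not a tested production hook: it relies on the output of `svnlook changed --copy-info`, treats a file deletion as a rename when the same transaction copies that path elsewhere, lets directory changes and property-only changes through, and does not verify that the copied file's content is unmodified. The function names and messages are made up.

```python
import re
import subprocess
import sys

# `svnlook changed --copy-info` prints the source of a copied path on the
# following line, e.g. "    (from trunk/2016/08/13/0001.mp3:r12)".
COPY_FROM = re.compile(r"^\s+\(from (.+):r\d+\)$")


def find_violations(changed_lines):
    """Return messages for changes that break a write-once policy.

    Content modifications of files are rejected. Deletions of files are
    rejected unless the same transaction copies the deleted path somewhere
    else, which is how a rename (add + delete) shows up.
    """
    modified, deleted, copy_sources = [], [], set()
    for raw in changed_lines:
        line = raw.rstrip("\n")
        match = COPY_FROM.match(line)
        if match:
            copy_sources.add(match.group(1).lstrip("/"))
            continue
        status, path = line[:4], line[4:]
        if not path or path.endswith("/"):
            continue  # directories may be reorganized freely
        if status[0] == "U":
            modified.append(path)
        elif status[0] == "D":
            deleted.append(path)
    problems = [f"modification not allowed: {p}" for p in modified]
    problems += [f"deletion not allowed: {p}" for p in deleted
                 if p not in copy_sources]
    return problems


def main(argv):
    """Entry point: Subversion passes REPOS and TXN-NAME to pre-commit."""
    repos, txn = argv[1], argv[2]
    result = subprocess.run(
        ["svnlook", "changed", "--copy-info", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    problems = find_violations(result.stdout.splitlines())
    for problem in problems:
        print(problem, file=sys.stderr)  # stderr is relayed to the client
    return 1 if problems else 0  # non-zero exit rejects the commit


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Keeping the parsing in `find_violations` separate from the `svnlook` call makes the policy easy to exercise with hand-written input before the hook is installed in a repository.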
Re: A couple thousand mp3 files (this is not spam I swear )
On 13.08.2016 20:21, David Chapman wrote: > On 8/13/2016 11:07 AM, Adam Jensen wrote: >> On 08/12/2016 08:56 PM, Adam Jensen wrote: >>> Here's the situation: I have ~1500 mp3 files (not pirated music), and >>> the collection is growing. The sizes range from ~100kB to ~300MB. The >>> content of these files will never change. The directory structure will >>> change, files will be moved, and new (additional) [mp3] files will >>> be added. >> When a branch is created, are the files under revision control in the >> trunk copied to the branch (is there any duplication of files in the >> repository)? >> >> > > No, the files are not copied; a rename is stored. These are "cheap > copies", and this is an advantage over simple backups - if you want to > save history using backups (per another suggestion), you need to > retain one backup per significant event. That can add up. > > Subversion is most often used to store text files because it stores > intra-file deltas when content is modified. Your use case is unusual, > but as long as you don't make a lot of changes to the binary files, it > will be efficient. Subversion uses binary deltas to store differences between files. It's essentially irrelevant if the files are text or something else, *unless* they're compressed (or encrypted) -- deltas between compressed versions of files with even minor differences are usually quite large, so in such cases Subversion may end up storing the complete file contents. (MP3 files are compressed audio, which is why I bring this up.) In any case, Subversion handles binary files just fine. The tree reorganizations you mention will cause minimal storage overhead. -- Brane
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/13/2016 08:09 AM, Branko Čibej wrote: > On 13.08.2016 02:56, Adam Jensen wrote: >> I sent this text (^above^) to users-subscr...@subversion.apache.org >> earlier today assuming that one need not be subscribed to post. This is >> a post-subscribe re-post. >> >> {https://subversion.apache.org/mailing-lists.html}::("you don't need to >> be subscribed to post") > > > Indeed you don't need to be subscribed to post, but the users-subscribe@ > address doesn't post messages to the users@ list. :) I think the original message was in fact sent to users@ and the mention (^above^) of users-subscribe@ was an independent copy and paste error. The original post just showed up on: http://mail-archives.apache.org/mod_mbox/subversion-users/201608.mbox/browser Specifically: http://mail-archives.apache.org/mod_mbox/subversion-users/201608.mbox/%3C57AE4DA6.9020106%40riseup.net%3E
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/13/2016 05:31 PM, Nico Kadel-Garcia wrote:
> Don't hurt yourself getting too clever. And don't forget that once
> ingested, Subversion is designed to *never let go* of content.
> Deleting anything in the master simply won't ever clear the content from
> the core repository and its history, *ever*.
>
> Why do I bring this up? Because if it's MP3's and you discover a
> copyright violation, you cannot expunge the content without a *very*
> painful dump, filter, and reload operation on a quite large
> repository.

The mp3 files, in this case, are indeed audio recordings but they represent an aspect of an expanding instrumentation system that is part of a specific [controlled ownership] environment. I don't think I have overlooked any relevant copyright issues related to the storage of the data. An indelible record is the goal (for now).

>> Since, in my case, the binary files should/must never change, is there a
>> way to configure a read-only attribute on specific files in the
>> repository such that any subsequent attempt to check-in a change to any
>> of those files will be rejected and an alert raised? The directory
>> structures should remain changeable.
>
> That's what user privileges in "svnserve.conf" that would provide
> "write once" control are designed for. It would *not* prevent
> operations on the local file system by an administrator, using
> "file:///" based access. I assume that if an admin performs that kind
> of access, they mean what they're doing and are aware they're avoiding
> the filters.

What would an "svnserve.conf" file with "write once" access control look like? The book[1] shows something like this:

--
[general]
password-db = userfile
realm = example realm

# anonymous users aren't allowed
anon-access = none

# authenticated users can both read and write
auth-access = write
--

[1]: http://svnbook.red-bean.com/nightly/en/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.auth.general

I intend to host this repository from a FreeBSD-10.3(x86) system where `svnserve --version` is:

--
svnserve, version 1.9.4 (r1740329)
   compiled Jul 28 2016, 02:56:08 on i386-portbld-freebsd10.1

Copyright (C) 2016 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository back-end (FS) modules are available:

* fs_fs : Module for working with a plain file (FSFS) repository.
* fs_x : Module for working with an experimental (FSX) repository.
--

The manual page on that system `man svnserve.conf` describes:

  auth-access = none|read|write
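Going by the options quoted in this message, `svnserve.conf` grants each class of connection (anonymous or authenticated) one of `none`, `read`, or `write`, and nothing in between. The Python sketch below reads the sample from the book with the standard `configparser` module and checks the two settings against those three values, which makes the missing "write once" level visible. The function name `access_levels` is made up, and the sketch assumes both options are present in the `[general]` section.

```python
import configparser

SAMPLE = """\
[general]
password-db = userfile
realm = example realm
# anonymous users aren't allowed
anon-access = none
# authenticated users can both read and write
auth-access = write
"""

# The only values the quoted manual page lists for these options.
ALLOWED = {"none", "read", "write"}


def access_levels(conf_text):
    """Return the anon-access and auth-access settings from svnserve.conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    general = parser["general"]
    levels = {}
    for option in ("anon-access", "auth-access"):
        value = general[option].strip()
        if value not in ALLOWED:
            raise ValueError(f"{option} must be one of {sorted(ALLOWED)}")
        levels[option] = value
    return levels


print(access_levels(SAMPLE))
```

Since `write` already implies permission to modify and delete, a write-once rule has to be enforced somewhere else, for example in a pre-commit hook as suggested elsewhere in the thread.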
Re: A couple thousand mp3 files (this is not spam I swear )
On Sat, Aug 13, 2016 at 3:29 PM, Adam Jensenwrote: > On 08/13/2016 02:21 PM, David Chapman wrote: >> On 8/13/2016 11:07 AM, Adam Jensen wrote: >>> When a branch is created, are the files under revision control in the >>> trunk copied to the branch (is there any duplication of files in the >>> repository)? >> >> No, the files are not copied; a rename is stored. These are "cheap >> copies", and this is an advantage over simple backups - if you want to >> save history using backups (per another suggestion), you need to retain >> one backup per significant event. That can add up. > Thanks! That's a critical issue for my case where there is a large & > growing core data-set and where it might be useful to have hundreds of > branches, each representing a particular configuration of a subset, > slice, or view of the core data-set. Don't hurt yourself getting too clever. And don't forget that once ingested, Subversion is designed to *never let go* of content. Deleting any in the master simply won't ever clear the content from the core repository and its history, *ever*. Why do I bring this up? Because if it's MP3's and you discover a copyright violation, you cannot expunge the content without a *very* painful dump, iflter, and reload operation on a quite large repository. > Since, in my case, the binary files should/must never change, is there a > way to configure a read-only attribute on specific files in the > repository such that any subsequent attempt to check-in a change to any > of those files will be rejected and an alert raised? The directory > structures should remain changeable. That's what user privileges in "svnserve.conf" that would provide "write once" control are designed for. It would *not* prevent operations on the local file system by an administrator, using "file:///" based access. I assume that if an admin performs that kind of access, they mean what they're doing and are aware they're avoiding the filters. 
I also think you're being really optimistic about "my repository will only grow and never need to actually clear content". Accidental submission of copyright-violating content would be my big worry. But I tend to pay more attention to that kind of concern than most.
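For readers who want to see what the dump, filter, and reload operation mentioned in this message looks like, here is a rough sketch in Python. It chains the standard `svnadmin dump`, `svndumpfilter exclude`, and `svnadmin load` commands. The function names and the example paths are hypothetical, chosen only for illustration, and the sketch assumes the excluded paths were never the source of a copy that must survive (svndumpfilter may fail in that case). Treat it as an outline of the procedure, not a tested migration tool.

```python
import subprocess


def expunge_commands(old_repo, new_repo, excluded_paths):
    """Return the four command lines of a dump/filter/reload cycle.

    old_repo and new_repo are local repository paths (hypothetical here);
    excluded_paths are repository paths to drop from history.
    """
    return [
        ["svnadmin", "create", new_repo],
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", *excluded_paths],
        ["svnadmin", "load", new_repo],
    ]


def run_expunge(old_repo, new_repo, excluded_paths):
    """Pipe dump -> filter -> load into a freshly created repository."""
    create, dump, filt, load = expunge_commands(old_repo, new_repo, excluded_paths)
    subprocess.run(create, check=True)

    p_dump = subprocess.Popen(dump, stdout=subprocess.PIPE)
    p_filt = subprocess.Popen(filt, stdin=p_dump.stdout, stdout=subprocess.PIPE)
    p_dump.stdout.close()  # let dump see a broken pipe if the filter exits early
    p_load = subprocess.Popen(load, stdin=p_filt.stdout)
    p_filt.stdout.close()

    # Wait for every stage and fail loudly if any of them reported an error.
    codes = [p.wait() for p in (p_load, p_filt, p_dump)]
    if any(codes):
        raise RuntimeError("dump/filter/load failed with exit codes %r" % codes)
```

Every revision of the old repository is streamed through the filter and written into the new one, so the cost grows with the total history size. That is why the operation is painful on a repository full of large binary files.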
Re: A couple thousand mp3 files (this is not spam I swear )
On 8/13/2016 12:29 PM, Adam Jensen wrote: On 08/13/2016 02:21 PM, David Chapman wrote: On 8/13/2016 11:07 AM, Adam Jensen wrote: When a branch is created, are the files under revision control in the trunk copied to the branch (is there any duplication of files in the repository)? No, the files are not copied; a rename is stored. These are "cheap copies", and this is an advantage over simple backups - if you want to save history using backups (per another suggestion), you need to retain one backup per significant event. That can add up. Thanks! That's a critical issue for my case where there is a large & growing core data-set and where it might be useful to have hundreds of branches, each representing a particular configuration of a subset, slice, or view of the core data-set. Subversion is most often used to store text files because it stores intra-file deltas when content is modified. Your use case is unusual, but as long as you don't make a lot of changes to the binary files, it will be efficient. Thanks [again] for the [vindicating] confirmation. I am inspired to set up a test case and explore this approach further :) Since, in my case, the binary files should/must never change, is there a way to configure a read-only attribute on specific files in the repository such that any subsequent attempt to check-in a change to any of those files will be rejected and an alert raised? The directory structures should remain changeable. I don't know about an attribute, but you could define a hook script that would check the files being committed to ensure that no existing large binary files are being modified. I haven't done any work with hook scripts for several years, so I'll have to let someone else assist if you have more questions. -- David Chapman dcchap...@acm.org Chapman Consulting -- San Jose, CA Software Development Done Right. www.chapman-consulting-sj.com
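A minimal sketch of the kind of hook script suggested above, written in Python as a pre-commit hook. It asks `svnlook changed` which paths the pending transaction touches and rejects the commit if the content of an existing protected file is modified. Adds, deletes, and property-only changes are allowed, so files can still be moved around. The `.mp3` suffix list and the function name are assumptions made for this example; nothing here comes from the original posters, and a real deployment would need testing against the actual repository.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumption for this sketch: "protected" means any path ending in .mp3.
PROTECTED_SUFFIXES = (".mp3",)


def modified_protected_paths(changed_output, suffixes=PROTECTED_SUFFIXES):
    """Return protected paths whose file content is modified.

    changed_output is the text printed by `svnlook changed`: the first
    column is the content status (A, D, U, or _), the second is the
    property status, and the path starts at the fifth character.
    """
    rejected = []
    for line in changed_output.splitlines():
        if len(line) < 5:
            continue
        text_status, path = line[0], line[4:]
        # "U" in the first column means the content of an existing file changed.
        if text_status == "U" and path.lower().endswith(suffixes):
            rejected.append(path)
    return rejected


def main(argv):
    repos, txn = argv[1], argv[2]
    changed = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    rejected = modified_protected_paths(changed)
    if rejected:
        # Text written to stderr is relayed to the committing client.
        sys.stderr.write("Commit rejected; these files must not change:\n")
        for path in rejected:
            sys.stderr.write("  %s\n" % path)
        return 1
    return 0


# Subversion runs pre-commit with two arguments: repository path and
# transaction name. A non-zero exit status aborts the commit.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Installed as `hooks/pre-commit` in the repository and made executable, this would block in-place edits while leaving the directory structure changeable. It only catches modifications reported as "U"; a file that is deleted and re-added with different content in one commit would need an additional check.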
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/13/2016 02:21 PM, David Chapman wrote: > On 8/13/2016 11:07 AM, Adam Jensen wrote: >> When a branch is created, are the files under revision control in the >> trunk copied to the branch (is there any duplication of files in the >> repository)? > > No, the files are not copied; a rename is stored. These are "cheap > copies", and this is an advantage over simple backups - if you want to > save history using backups (per another suggestion), you need to retain > one backup per significant event. That can add up. Thanks! That's a critical issue for my case where there is a large & growing core data-set and where it might be useful to have hundreds of branches, each representing a particular configuration of a subset, slice, or view of the core data-set. > Subversion is most often used to store text files because it stores > intra-file deltas when content is modified. Your use case is unusual, > but as long as you don't make a lot of changes to the binary files, it > will be efficient. Thanks [again] for the [vindicating] confirmation. I am inspired to set up a test case and explore this approach further :) Since, in my case, the binary files should/must never change, is there a way to configure a read-only attribute on specific files in the repository such that any subsequent attempt to check-in a change to any of those files will be rejected and an alert raised? The directory structures should remain changeable.
Re: A couple thousand mp3 files (this is not spam I swear )
On 8/13/2016 11:07 AM, Adam Jensen wrote: On 08/12/2016 08:56 PM, Adam Jensen wrote: Here's the situation: I have ~1500 mp3 files (not pirated music), and the collection is growing. The sizes range from ~100kB to ~300MB. The content of these files will never change. The directory structure will change, files will be moved, and new (additional) [mp3] files will be added. When a branch is created, are the files under revision control in the trunk copied to the branch (is there any duplication of files in the repository)? No, the files are not copied; a rename is stored. These are "cheap copies", and this is an advantage over simple backups - if you want to save history using backups (per another suggestion), you need to retain one backup per significant event. That can add up. Subversion is most often used to store text files because it stores intra-file deltas when content is modified. Your use case is unusual, but as long as you don't make a lot of changes to the binary files, it will be efficient. -- David Chapman dcchap...@acm.org Chapman Consulting -- San Jose, CA Software Development Done Right. www.chapman-consulting-sj.com
Re: A couple thousand mp3 files (this is not spam I swear )
On 08/12/2016 08:56 PM, Adam Jensen wrote: > Here's the situation: I have ~1500 mp3 files (not pirated music), and > the collection is growing. The sizes range from ~100kB to ~300MB. The > content of these files will never change. The directory structure will > change, files will be moved, and new (additional) [mp3] files will be added. When a branch is created, are the files under revision control in the trunk copied to the branch (is there any duplication of files in the repository)?
Re: A couple thousand mp3 files (this is not spam I swear )
On 13.08.2016 02:56, Adam Jensen wrote: > I sent this text (^above^) to users-subscr...@subversion.apache.org > earlier today assuming that one need not be subscribed to post. This is > a post-subscribe re-post. > > {https://subversion.apache.org/mailing-lists.html}::("you don't need to > be subscribed to post") Indeed you don't need to be subscribed to post, but the users-subscribe@ address doesn't post messages to the users@ list. :) -- Brane
Re: A couple thousand mp3 files (this is not spam I swear )
On Fri, Aug 12, 2016 at 8:56 PM, Adam Jensen wrote: > Here's the situation: I have ~1500 mp3 files (not pirated music), and > the collection is growing. The sizes range from ~100kB to ~300MB. The > content of these files will never change. The directory structure will > change, files will be moved, and new (additional) [mp3] files will be > added. > So the only thing you'd be accomplishing with this scheme would be to keep a revision history of the directory structure. I see no benefit to using Subversion for this, as the directory structure is also preserved by a simple backup. HTH...
A couple thousand mp3 files (this is not spam I swear )
Hi, I am considering a use of Subversion as a means to avoid accidentally deleting some important files that I will be working with on a regular basis. I am very interested to get comments on my plan or suggestions for alternative methods. Here's the situation: I have ~1500 mp3 files (not pirated music), and the collection is growing. The sizes range from ~100kB to ~300MB. The content of these files will never change. The directory structure will change, files will be moved, and new (additional) [mp3] files will be added. So basically, I am considering the use of Subversion as a live backup of sorts and a discipline/safety-net to prevent accidental deletion of any part of a continuously growing data-set while I explore different data organization and analysis strategies. My primary concerns are related to any potential file corruption, any data duplication, and/or any excessive network or disk I/O (other than the expected load of direct data communication). Again, any comments on [or analysis of] this plan, or suggestions for alternatives will be much appreciated! I sent this text (^above^) to users-subscr...@subversion.apache.org earlier today assuming that one need not be subscribed to post. This is a post-subscribe re-post. {https://subversion.apache.org/mailing-lists.html}::("you don't need to be subscribed to post")