Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-18 Thread Mark Phippard
On Wed, Aug 17, 2016 at 9:09 PM, Adam Jensen wrote:

> On 08/17/2016 04:36 PM, Johan Corveleyn wrote:
> > On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen  wrote:
> [snip]
> >> So basically, the checkout method will require twice (2x) the data-set
> >> size of storage space for a working copy but there would be
> >> significantly less network load during many of the branch switches. The
> >> export method pretty much has the opposite storage/network trade-off.
> >
> > I guess you'd need this (very old) feature request to be implemented:
> >
> > https://issues.apache.org/jira/browse/SVN-525 (allow working copies
> > without text-base/)
>
> Nice reference, thanks!
>
> Wow, that feature was requested during 2001.
>
> What I need (and what I think is generally needed) is a high-capacity,
> large-file repository with a focus on data integrity (mandatory audit
> trails), sophisticated access control (smart contracts (maybe blockchain
> based)), probably (almost certainly) an encrypted file-system, and
> distribution/replication (that is maybe torrent based). Files in this
> type of system might need to be deleted but they wouldn't be revised.
> This would not be a revision management system.



You are probably already aware of this, but on the server side, the
repository is essentially an opaque database.  You will not see anything
resembling your files directly on the server filesystem.  So unless you
write your own custom code that uses the repository API layer, you can only
interact with the content of your repository using an SVN client of some
kind.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


RE: A couple thousand mp3 files (this is not spam I swear )

2016-08-18 Thread Stümpfig, Thomas
Hi,

While I fully agree with Stefan, I would like to mention that we use svn in a
non-standard way, and we think it still gives us very good performance and
use-case coverage. We do not use it for source code but for technical sales
project management, sharing all kinds of documents (jpeg, mp3, mp4, pdf, Word,
...). We have about 3,000,000 files in our repository, with ~250 committers
and ~1500 read-only users, and have been running it since 2008.

We use TortoiseSVN for most use cases. In addition to the standard svn
functionality, we built a Solr-based application for PowerPoint slide search
and full-text search.

The reasons for our decision were:
a) SVN provides versioning (hey, I am able to recover my presentation)
b) it provides path-based authorization with Active Directory integration (as a
large company we have to care about IP)
c) simple UI (TortoiseSVN) (even Sales can use it ;-))
d) checkout/check-in for offline usage is simple (good at customer sites
where we have no network access)
e) very good storage and network efficiency (good for home-office/hotel users)
f) stable and reliable (we can use it 24/7, 99% of the time)
g) active community (we can ask someone for help and get it fast)
h) free and open source :-) - no compliance issues
i) multisite is supported, and several multisite solutions exist. But keep in
mind that svn is not a distributed SCM tool. (We thought we might need
multisite but never actually implemented it; we only run a second server that
mirrors the data, for speed of recovery.)
j) upgrades were smooth: we started with svn 1.3 and are now on 1.9 (no big
investment in upgrades ... my boss is happy)


SVN was the tool that best fitted the needs of our project teams. Even
better, we have only 2 part-time admins (~10% of their effective time) for the
system.

However,
svn has some "weaknesses" you should be aware of. Out of the box, it is not
trivial or fast to query files by their properties, as you can in a typical
DMS. Backing up such large repositories through a dump is not feasible; other
means like btrfs/lvm2 should be put in place. SVN does not provide persistent
encryption out of the box. You could encrypt during a commit with third-party
tools. Transport encryption (https) is supported. Finally, I would like to add
that svn is not a file system, despite the fact that it provides WebDAV
capabilities. As Stefan stated, it is an SCM tool :-) and it does the job
really well.


Regards
Thomas

-----Original Message-----
From: Stefan Sperling [mailto:s...@elego.de]
Sent: Thursday, 18 August 2016 09:46
To: Adam Jensen <han...@riseup.net>
Cc: users@subversion.apache.org
Subject: Re: A couple thousand mp3 files (this is not spam I swear )

On Wed, Aug 17, 2016 at 09:09:27PM -0400, Adam Jensen wrote:
> What I need (and what I think is generally needed) is a high-capacity,
> large-file repository with a focus on data integrity (mandatory audit
> trails), sophisticated access control (smart contracts (maybe
> blockchain based)), probably (almost certainly) an encrypted
> file-system, and distribution/replication (that is maybe torrent
> based). Files in this type of system might need to be deleted but they 
> wouldn't be revised.
> This would not be a revision management system.
>
> I'm not sure how much of Subversion could be used/leveraged to build
> such a system.

Indeed it won't. I believe you should use something else for this job.
Not tracking changes contradicts a core requirement SVN was built for.

> At a minimum, it seems like it would involve a project fork and
> serious gutting and refactoring of the code-base after rethinking the
> basic principles, specifying the new requirements, and devising the
> new architecture. (And definitely a name change ).

You're free to use our code in whatever way you wish.
And we're always open to patches, of course.

But keep in mind that the code base is 16 years old and widely deployed.
Adding new features was easy in the early stages of development but is getting 
increasingly hard because of growing complexity and very strict reliability 
requirements imposed by our user base.

And we can't sever our roots:
"""
Apache Subversion is a full-featured version control system originally designed 
to be a better CVS. Subversion has since expanded beyond its original goal of 
replacing CVS, but its basic model, design, and interface remain heavily 
influenced by that goal. Even today, Subversion should still feel very familiar 
to CVS users.
"""
http://subversion.apache.org/features.html

So if you're really going to write a new piece of software for this you'll be 
much happier starting a new project from scratch rather than using SVN as a 
base.



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-18 Thread Stefan Sperling
On Wed, Aug 17, 2016 at 09:09:27PM -0400, Adam Jensen wrote:
> What I need (and what I think is generally needed) is a high-capacity,
> large-file repository with a focus on data integrity (mandatory audit
> trails), sophisticated access control (smart contracts (maybe blockchain
> based)), probably (almost certainly) an encrypted file-system, and
> distribution/replication (that is maybe torrent based). Files in this
> type of system might need to be deleted but they wouldn't be revised.
> This would not be a revision management system.
> 
> I'm not sure how much of Subversion could be used/leveraged to build
> such a system.

Indeed it won't. I believe you should use something else for this job.
Not tracking changes contradicts a core requirement SVN was built for.

> At a minimum, it seems like it would involve a project
> fork and serious gutting and refactoring of the code-base after
> rethinking the basic principles, specifying the new requirements, and
> devising the new architecture. (And definitely a name change ).

You're free to use our code in whatever way you wish.
And we're always open to patches, of course.

But keep in mind that the code base is 16 years old and widely deployed.
Adding new features was easy in the early stages of development but is
getting increasingly hard because of growing complexity and very strict
reliability requirements imposed by our user base.

And we can't sever our roots:
"""
Apache Subversion is a full-featured version control system originally designed
to be a better CVS. Subversion has since expanded beyond its original goal of
replacing CVS, but its basic model, design, and interface remain heavily
influenced by that goal. Even today, Subversion should still feel very familiar
to CVS users.
"""
http://subversion.apache.org/features.html

So if you're really going to write a new piece of software for this
you'll be much happier starting a new project from scratch rather
than using SVN as a base.


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-17 Thread Adam Jensen
On 08/17/2016 04:36 PM, Johan Corveleyn wrote:
> On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen  wrote:
[snip]
>> So basically, the checkout method will require twice (2x) the data-set
>> size of storage space for a working copy but there would be
>> significantly less network load during many of the branch switches. The
>> export method pretty much has the opposite storage/network trade-off.
> 
> I guess you'd need this (very old) feature request to be implemented:
> 
> https://issues.apache.org/jira/browse/SVN-525 (allow working copies
> without text-base/)

Nice reference, thanks!

Wow, that feature was requested during 2001.

What I need (and what I think is generally needed) is a high-capacity,
large-file repository with a focus on data integrity (mandatory audit
trails), sophisticated access control (smart contracts (maybe blockchain
based)), probably (almost certainly) an encrypted file-system, and
distribution/replication (that is maybe torrent based). Files in this
type of system might need to be deleted but they wouldn't be revised.
This would not be a revision management system.

I'm not sure how much of Subversion could be used/leveraged to build
such a system. At a minimum, it seems like it would involve a project
fork and serious gutting and refactoring of the code-base after
rethinking the basic principles, specifying the new requirements, and
devising the new architecture. (And definitely a name change ).

It's not the Pyramid of Khufu big, or the Panama Canal big, but it would
be a big project.

For now, I still think I can use Subversion as the file repository in a
limited capability, ad-hoc implementation of a small
demonstration-of-concept instrumentation and analysis system.



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-17 Thread Johan Corveleyn
On Wed, Aug 17, 2016 at 9:13 PM, Adam Jensen wrote:
> On 08/17/2016 12:55 AM, Ryan Schmidt wrote:
> [snip]
>> He means avoid the 2x disk use by using "svn export" instead of "svn 
>> checkout".
>>
> [snip]
>>
>> Of course Subversion only transfers changes.
>>
>
> Situation summary for the many-large-files scenario. Something like:
>
> svn checkout svn://URL/ProjX/branches/profile1 ~/test
> cd ~/test
> svn switch ^/branches/profile2
>
> the branch switch will work nicely with a many-large-files data-set,
> only transferring the files necessary to complete the new profile. But
> the storage requirements for the working copy (~/test) is twice (2x) the
> size of the checked-out data-set.
>
> Alternatively, using the export method:
>
> svn export svn://URL/ProjX/branches/profile1 ~/test
>
> will transfer all of the files of profile1 and the storage requirements
> for the working copy is only (1x) the size of the data-set. But
> switching between branches is not available in the export method.
>
> To switch branches (using the export method):
>
> svn export svn://URL/ProjX/branches/profile1 ~/test
> (transfers all files in profile1)
>
> rm -rf ~/test
> svn export svn://URL/ProjX/branches/profile2 ~/test
> (transfers all files in profile2)
>
> So basically, the checkout method will require twice (2x) the data-set
> size of storage space for a working copy but there would be
> significantly less network load during many of the branch switches. The
> export method pretty much has the opposite storage/network trade-off.

I guess you'd need this (very old) feature request to be implemented:

https://issues.apache.org/jira/browse/SVN-525 (allow working copies
without text-base/)

Most of the discussion in the issue tracker is rather old (and refers
to the old pre-1.7 working copy format). But I suppose in the post-1.7
era this is still a big undertaking should someone decide to try and
implement this. Help is always welcome of course.

There is also, slightly related, this one, but I guess that wouldn't
help in your case, as your files are not compressible:

https://issues.apache.org/jira/browse/SVN-908 (Store text-base compressed)

-- 
Johan


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-17 Thread Adam Jensen
On 08/17/2016 12:55 AM, Ryan Schmidt wrote:
[snip]
> He means avoid the 2x disk use by using "svn export" instead of "svn 
> checkout".
> 
[snip]
> 
> Of course Subversion only transfers changes.
> 

Situation summary for the many-large-files scenario. Something like:

svn checkout svn://URL/ProjX/branches/profile1 ~/test
cd ~/test
svn switch ^/branches/profile2

the branch switch will work nicely with a many-large-files data-set,
only transferring the files necessary to complete the new profile. But
the storage requirements for the working copy (~/test) is twice (2x) the
size of the checked-out data-set.

Alternatively, using the export method:

svn export svn://URL/ProjX/branches/profile1 ~/test

will transfer all of the files of profile1 and the storage requirements
for the working copy is only (1x) the size of the data-set. But
switching between branches is not available in the export method.

To switch branches (using the export method):

svn export svn://URL/ProjX/branches/profile1 ~/test
(transfers all files in profile1)

rm -rf ~/test
svn export svn://URL/ProjX/branches/profile2 ~/test
(transfers all files in profile2)

So basically, the checkout method will require twice (2x) the data-set
size of storage space for a working copy but there would be
significantly less network load during many of the branch switches. The
export method pretty much has the opposite storage/network trade-off.
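
The trade-off can be put in rough numbers. The following Python sketch is
only a back-of-the-envelope model. It assumes that a working copy keeps
exactly one pristine copy per file (the 2x figure mentioned in this thread),
that all profiles are about the same size, that "svn switch" transfers only
the files that differ, and that every export transfers everything. The
function name and the example figures (2 GB changed per switch, 10 switches)
are made up for illustration; 25 GB is the data-set size mentioned elsewhere
in this thread.

```python
def estimate_costs(profile_size_gb, changed_gb_per_switch, switches):
    """Compare disk use and network transfer for checkout+switch vs. export.

    All sizes are in gigabytes. Returns a dict with one entry per method.
    """
    checkout = {
        # working files plus one pristine copy of each file
        "disk_gb": 2 * profile_size_gb,
        # initial checkout, then only the differing files on each switch
        "network_gb": profile_size_gb + switches * changed_gb_per_switch,
    }
    export = {
        # exported files only, no pristine copies
        "disk_gb": profile_size_gb,
        # initial export, then a full re-export for every "switch"
        "network_gb": profile_size_gb * (1 + switches),
    }
    return {"checkout": checkout, "export": export}


if __name__ == "__main__":
    for method, cost in estimate_costs(25, 2, 10).items():
        print(method, cost)
```

With these made-up inputs the checkout method needs twice the disk space but
far less network transfer; the gap narrows as the profiles share fewer files.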



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-17 Thread Adam Jensen
On 08/16/2016 09:17 AM, Stefan Hett wrote:
> On 8/13/2016 2:56 AM, Adam Jensen wrote:
>> My primary concerns are related to any potential file corruption, any
>> data duplication, and/or any excessive network or disk I/O (other than
>> the expected load of direct data communication).
>
> Just to have this mentioned: Be aware that the working copy (aka: the
> checked out data of the repository) will have a 2x storage requirement
> on the data since it will keep a copy of the pristine version of the
> file in addition to the "actual" file.
> If this is a concern for your use-case, you could export the files and
> only use a working copy in cases where you need to commit or reorder files.
> 
> To clarify: This is purely a client side storage requirement. It does
> not apply to the storage requirements on the server side.
> 

Wow, I totally misinterpreted that during the first reading. After some
tinkering this morning (and reading Ryan's email) I think I see the light ;)

svn checkout svn://minerva.bohemia.net/Project_Prometheus/trunk cotest
du -sh cotest
104M    cotest

svn export svn://minerva.bohemia.net/Project_Prometheus/trunk extest
du -sh extest
 52M    extest



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-16 Thread Ryan Schmidt

> On Aug 16, 2016, at 2:13 PM, Adam Jensen  wrote:
> 
> On 08/16/2016 09:17 AM, Stefan Hett wrote:
>> Just to have this mentioned: Be aware that the working copy (aka: the
>> checked out data of the repository) will have a 2x storage requirement
>> on the data since it will keep a copy of the pristine version of the
>> file in addition to the "actual" file.
> 

> 
>> If this is a concern for your use-case, you could export the files and
>> only use a working copy in cases where you need to commit or reorder files.
> 
> By "export the files" do you mean something like an NFS share of the
> repository, thus bypassing svnserve and the check-in/check-out process?
> That seems like a clever possibility worth remembering, but for now the
> system I am currently building/imagining is headed in a different direction.

He means avoid the 2x disk use by using "svn export" instead of "svn checkout".


>> To clarify: This is purely a client side storage requirement. It does
>> not apply to the storage requirements on the server side.
> 
> To reduce network load, are there any client-side caching options for
> Subversion? Does the svn program account for the files already in the
> working copy (on the local disk) and avoid transferring those files over
> the network during a subsequent check-out [that requires those files]?

Of course Subversion only transfers changes.

> Is it possible to clone or mirror all or part of a Subversion repository?

svnsync
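
For reference, a typical svnsync setup might look like the Python sketch
below. It assumes the destination is a new, empty repository whose
pre-revprop-change hook permits revision property changes (svnsync needs
that). The URLs are placeholders and the helper names are made up.

```python
import subprocess


def svnsync_commands(mirror_url, source_url):
    """Return the two svnsync invocations needed to create and fill a mirror."""
    return [
        # one-time setup: record the source URL in the (empty) mirror
        ["svnsync", "initialize", mirror_url, source_url],
        # copy all revisions not yet mirrored; rerun later to catch up
        ["svnsync", "synchronize", mirror_url],
    ]


def mirror(mirror_url, source_url):
    """Run both steps, stopping at the first command that fails."""
    for command in svnsync_commands(mirror_url, source_url):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder URLs; this only prints the commands, it does not run them.
    for command in svnsync_commands("file:///srv/svn/mirror", "svn://URL/ProjX"):
        print(" ".join(command))
```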

>  This probably isn't relevant to Subversion, but in the
> system I am imagining it might be reasonable for clients to check-out
> data-sets via torrent connections with other full/partial repositories.




Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-16 Thread Adam Jensen
On 08/16/2016 09:17 AM, Stefan Hett wrote:
> Just to have this mentioned: Be aware that the working copy (aka: the
> checked out data of the repository) will have a 2x storage requirement
> on the data since it will keep a copy of the pristine version of the
> file in addition to the "actual" file.

The type of system that I am imagining might typically have several
terabytes of instrumentation data in a repository[1]. Various client
machines might need to check-out a few gigabytes or a few hundred
gigabytes at a time to run data analysis (automated compute jobs) or to
perform a study (scientist/human-interest).

[1]: Version control isn't a requirement in this
use-case/hypothetical-system. Sophisticated access control is much more
of a concern. Mandatory audit trails and distributed contract based data
handling are examples of more relevant architectural characteristics.

I am currently looking at the possibility of using Subversion (in a
non-traditional, off-label fashion) to bootstrap a [very] simplified
demonstration-of-concept type of setup.

My current data-set is only about 25GB and growing at a rate of about
1GB/week. A desktop server and laptop client shouldn't have any storage
space problems (in this case as a small demonstration system).

> If this is a concern for your use-case, you could export the files and
> only use a working copy in cases where you need to commit or reorder files.

By "export the files" do you mean something like an NFS share of the
repository, thus bypassing svnserve and the check-in/check-out process?
That seems like a clever possibility worth remembering, but for now the
system I am currently building/imagining is headed in a different direction.

> To clarify: This is purely a client side storage requirement. It does
> not apply to the storage requirements on the server side.

To reduce network load, are there any client-side caching options for
Subversion? Does the svn program account for the files already in the
working copy (on the local disk) and avoid transferring those files over
the network during a subsequent check-out [that requires those files]?

Is it possible to clone or mirror all or part of a Subversion repository?

This probably isn't relevant to Subversion, but in the
system I am imagining it might be reasonable for clients to check-out
data-sets via torrent connections with other full/partial repositories.



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-16 Thread Stefan Hett

Hi,
On 8/13/2016 2:56 AM, Adam Jensen wrote:

> My primary concerns are related to any potential file corruption, any
> data duplication, and/or any excessive network or disk I/O (other than
> the expected load of direct data communication).

Just to have this mentioned: Be aware that the working copy (aka: the 
checked out data of the repository) will have a 2x storage requirement 
on the data since it will keep a copy of the pristine version of the 
file in addition to the "actual" file.
If this is a concern for your use-case, you could export the files and 
only use a working copy in cases where you need to commit or reorder files.


To clarify: This is purely a client side storage requirement. It does 
not apply to the storage requirements on the server side.


--
Regards,
Stefan Hett



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-14 Thread David Chapman

On 8/14/2016 1:22 PM, Adam Jensen wrote:
> On 08/13/2016 09:33 PM, Branko Čibej wrote:
>> But note that a rename is represented as an add+delete, so the hook
>> would have to be rather smarter than it would seem at first glance to
>> detect and allow renames without content modification.
>
> The literal file names are composed of a date and a sequence number, and
> like the contents of the files, the names should never change. The core
> data-set directory structure (trunk, maybe) will most likely be
> calendar-like (years->months->days->sequence->file). The analysis tools
> and meta-data will probably be kept in a separate fossil[1] repository.
>
> [1]: http://www.fossil-scm.org/
>
> The near-term goal is to maintain an indelible record of the physical
> measurements of reality. Any analysis [of which, there will be plenty],
> annotations, and other meta-data generation must not alter the
> fundamental instrumentation data.
>
> Given that, by "rename" do you mean a change of the literal file name
> like what I tried to describe above, or are you referring to something
> more like the file references, links, or pointers within the repository
> [internal implementation], similar to David's use of the term "rename"
> (included below)?



Yes, he and I are referring to the same thing.  The file contents are 
not copied, which is your primary goal.


--
David Chapman  dcchap...@acm.org
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-14 Thread Adam Jensen
On 08/13/2016 09:33 PM, Branko Čibej wrote:
> There is (currently) no easy way to specify "write once" access for
> files in the repository; whoever can create a file can modify or delete
> it, too. You could achieve something like that by creating a custom
> pre-commit hook that would examine the pending commit transaction and
> reject the commit if it finds modifications or deletions of existing files

Thanks. I have svnserve configured and ready for a few tests that should
enable basic characterization of the system in this use-case. I suppose
I could concurrently develop familiarity with the command and operation
of this setup while I explore customization of its functioning.

Tonight is for contemplation but I imagine testing and hook scripting
will begin within the next few days.

Does anyone have any comments on using Tcl as a hook scripting language?

> But note that a rename is represented as an add+delete, so the hook
> would have to be rather smarter than it would seem at first glance to
> detect and allow renames without content modification.

The literal file names are composed of a date and a sequence number, and
like the contents of the files, the names should never change. The core
data-set directory structure (trunk, maybe) will most likely be
calendar-like (years->months->days->sequence->file). The analysis tools
and meta-data will probably be kept in a separate fossil[1] repository.

[1]: http://www.fossil-scm.org/
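
To make the naming scheme concrete, here is a small Python sketch. Only the
years->months->days->sequence->file nesting comes from the description
above. The file-name pattern (YYYYMMDD-NNNN.mp3), the four-digit sequence
width, the use of one directory per sequence number, and the function name
are all assumptions for illustration.

```python
import datetime
import posixpath


def repository_path(recorded_on, sequence, root="trunk"):
    """Build a calendar-like repository path for one recording."""
    # Hypothetical file name: date plus zero-padded sequence number.
    name = "{:%Y%m%d}-{:04d}.mp3".format(recorded_on, sequence)
    return posixpath.join(
        root,
        "{:%Y}".format(recorded_on),   # year directory
        "{:%m}".format(recorded_on),   # month directory
        "{:%d}".format(recorded_on),   # day directory
        "{:04d}".format(sequence),     # sequence directory
        name,
    )


if __name__ == "__main__":
    print(repository_path(datetime.date(2016, 8, 14), 7))
```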

The near-term goal is to maintain an indelible record of the physical
measurements of reality. Any analysis [of which, there will be plenty],
annotations, and other meta-data generation must not alter the
fundamental instrumentation data.

Given that, by "rename" do you mean a change of the literal file name
like what I tried to describe above, or are you referring to something
more like the file references, links, or pointers within the repository
[internal implementation], similar to David's use of the term "rename"
(included below)?

On 08/13/2016 02:21 PM, David Chapman wrote:
> On 8/13/2016 11:07 AM, Adam Jensen wrote:
>> When a branch is created, are the files under revision control in the
>> trunk copied to the branch (is there any duplication of files in the
>> repository)?
>
> No, the files are not copied; a rename is stored.  These are "cheap
> copies", and this is an advantage over simple backups - if you want to
> save history using backups (per another suggestion), you need to retain
> one backup per significant event.  That can add up.



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Branko Čibej
On 14.08.2016 00:20, Adam Jensen wrote:
> What would an "svnserve.conf" file with "write once" access control look like?

There is (currently) no easy way to specify "write once" access for
files in the repository; whoever can create a file can modify or delete
it, too. You could achieve something like that by creating a custom
pre-commit hook that would examine the pending commit transaction and
reject the commit if it finds modifications or deletions of existing files.

But note that a rename is represented as an add+delete, so the hook
would have to be rather smarter than it would seem at first glance to
detect and allow renames without content modification.
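
One possible shape for such a hook is sketched below in Python. It is a
minimal sketch, not a tested policy. It assumes the hook reads the pending
changes from "svnlook changed -t TXN REPOS", whose output puts status flags
in the first columns, starts the path at the fifth column, and marks
directories with a trailing "/". The function name is made up. The sketch
deliberately does not handle the rename case just described: a rename shows
up as an add plus a delete, so it would be rejected. Property-only changes
are let through.

```python
#!/usr/bin/env python3
"""Sketch of a "write once" pre-commit hook (hypothetical policy)."""
import subprocess
import sys


def find_violations(changed_output):
    """Return the paths of files that the commit modifies or deletes.

    `changed_output` is text in the format printed by `svnlook changed`.
    """
    violations = []
    for line in changed_output.splitlines():
        if len(line) < 5:
            continue  # too short to hold status flags and a path
        text_status, path = line[0], line[4:]
        if path.endswith("/"):
            continue  # directory changes stay allowed
        if text_status in ("U", "D"):
            violations.append(path)
    return violations


def main(argv):
    repos, txn = argv[1], argv[2]  # arguments Subversion passes to pre-commit
    changed = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    violations = find_violations(changed)
    if violations:
        # Text written to stderr is relayed to the committing client.
        sys.stderr.write("Write-once policy: existing files may not be "
                         "modified or deleted:\n")
        for path in violations:
            sys.stderr.write("  " + path + "\n")
        return 1  # a non-zero exit status rejects the commit
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Installed as hooks/pre-commit in the repository and made executable, a
script like this runs once per commit, before the transaction becomes a
revision.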

-- Brane


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Branko Čibej
On 13.08.2016 20:21, David Chapman wrote:
> On 8/13/2016 11:07 AM, Adam Jensen wrote:
>> On 08/12/2016 08:56 PM, Adam Jensen wrote:
>>> Here's the situation: I have ~1500 mp3 files (not pirated music), and
>>> the collection is growing. The sizes range from ~100kB to ~300MB. The
>>> content of these files will never change. The directory structure will
>>> change, files will be moved, and new (additional) [mp3] files will
>>> be added.
>> When a branch is created, are the files under revision control in the
>> trunk copied to the branch (is there any duplication of files in the
>> repository)?
>>
>>
>
> No, the files are not copied; a rename is stored.  These are "cheap
> copies", and this is an advantage over simple backups - if you want to
> save history using backups (per another suggestion), you need to
> retain one backup per significant event.  That can add up.
>
> Subversion is most often used to store text files because it stores
> intra-file deltas when content is modified.  Your use case is unusual,
> but as long as you don't make a lot of changes to the binary files, it
> will be efficient.

Subversion uses binary deltas to store differences between files. It's
essentially irrelevant if the files are text or something else, *unless*
they're compressed (or encrypted) -- deltas between compressed versions
of files with even minor differences are usually quite large, so in such
cases Subversion may end up storing the complete file contents. (MP3
files are compressed audio, which is why I bring this up.)

In any case, Subversion handles binary files just fine. The tree
reorganizations you mention will cause minimal storage overhead.

-- Brane


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Adam Jensen
On 08/13/2016 08:09 AM, Branko Čibej wrote:
> On 13.08.2016 02:56, Adam Jensen wrote:
>> I sent this text (^above^) to users-subscr...@subversion.apache.org
>> earlier today assuming that one need not be subscribed to post. This is
>> a post-subscribe re-post.
>>
>> {https://subversion.apache.org/mailing-lists.html}::("you don't need to
>> be subscribed to post")
> 
> 
> Indeed you don't need to be subscribed to post, but the users-subscribe@
> address doesn't post messages to the users@ list. :)

I think the original message was in fact sent to users@ and the mention
(^above^) of users-subscribe@ was an independent copy and paste error.

The original post just showed up on:

http://mail-archives.apache.org/mod_mbox/subversion-users/201608.mbox/browser

Specifically:

http://mail-archives.apache.org/mod_mbox/subversion-users/201608.mbox/%3C57AE4DA6.9020106%40riseup.net%3E


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Adam Jensen
On 08/13/2016 05:31 PM, Nico Kadel-Garcia wrote:
> Don't hurt yourself getting too clever. And don't forget that once
> ingested, Subversion is designed to *never let go* of content.
> Deleting any in the master simply won't ever clear the content from
> the core repository and its history, *ever*.
> 
> Why do I bring this up? Because if it's MP3's and you discover a
> copyright violation, you cannot expunge the content without a *very*
> painful dump, iflter, and reload operation on a quite large
> repository.

The mp3 files, in this case, are indeed audio recordings but they
represent an aspect of an expanding instrumentation system that is part
of a specific [controlled ownership] environment. I don't think I have
overlooked any relevant copyright issues related to the storage of the
data. An indelible record is the goal (for now).

>> Since, in my case, the binary files should/must never change, is there a
>> way to configure a read-only attribute on specific files in the
>> repository such that any subsequent attempt to check-in a change to any
>> of those files will be rejected and an alert raised? The directory
>> structures should remain changeable.
> 
> That's what  user privileges in "svnserve.conf" that would provide
> "write once" control are designed for. It would *not* prevent
> operations on the local file system by an administrator, using
> "file:///" based access. I assume that if an admin performs that kind
> of access, they mean what they're doing and are aware they're avoiding
> the filters.

What would an "svnserve.conf" file with "write once" access control look
like? The book[1] shows something like this:

--
[general]
password-db = userfile
realm = example realm

# anonymous users aren't allowed
anon-access = none

# authenticated users can both read and write
auth-access = write
--

[1]:
http://svnbook.red-bean.com/nightly/en/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.auth.general

I intend to host this repository from a FreeBSD-10.3(x86) system where
`svnserve --version` is:

--
svnserve, version 1.9.4 (r1740329)
   compiled Jul 28 2016, 02:56:08 on i386-portbld-freebsd10.1

Copyright (C) 2016 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository back-end (FS) modules are available:

* fs_fs : Module for working with a plain file (FSFS) repository.
* fs_x : Module for working with an experimental (FSX) repository.
--

The manual page on that system `man svnserve.conf` describes:

auth-access = none|read|write
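
To illustrate how coarse these settings are, the Python sketch below reads
the example configuration above and checks the two access options against
the three values listed in the man page. None of those values expresses
"create but never modify". The sketch assumes Python's configparser is close
enough to Subversion's own parser for a file this simple; the helper name is
made up.

```python
import configparser

ALLOWED = {"none", "read", "write"}  # values listed by `man svnserve.conf`

EXAMPLE = """
[general]
password-db = userfile
realm = example realm

# anonymous users aren't allowed
anon-access = none

# authenticated users can both read and write
auth-access = write
"""


def access_levels(conf_text):
    """Return the anon-access and auth-access settings of an svnserve.conf."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    levels = {}
    for option in ("anon-access", "auth-access"):
        value = parser.get("general", option)
        if value not in ALLOWED:
            raise ValueError(
                "{} must be one of {}".format(option, sorted(ALLOWED)))
        levels[option] = value
    return levels


if __name__ == "__main__":
    print(access_levels(EXAMPLE))
```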



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Nico Kadel-Garcia
On Sat, Aug 13, 2016 at 3:29 PM, Adam Jensen wrote:
> On 08/13/2016 02:21 PM, David Chapman wrote:
>> On 8/13/2016 11:07 AM, Adam Jensen wrote:
>>> When a branch is created, are the files under revision control in the
>>> trunk copied to the branch (is there any duplication of files in the
>>> repository)?
>>
>> No, the files are not copied; a rename is stored.  These are "cheap
>> copies", and this is an advantage over simple backups - if you want to
>> save history using backups (per another suggestion), you need to retain
>> one backup per significant event.  That can add up.

> Thanks! That's a critical issue for my case where there is a large &
> growing core data-set and where it might be useful to have hundreds of
> branches, each representing a particular configuration of a subset,
> slice, or view of the core data-set.

Don't hurt yourself getting too clever. And don't forget that once
content is ingested, Subversion is designed to *never let go* of it.
Deleting anything from the master simply won't ever clear the content
from the core repository and its history, *ever*.

Why do I bring this up? Because if it's MP3s and you discover a
copyright violation, you cannot expunge the content without a *very*
painful dump, filter, and reload operation on a quite large
repository.
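
For reference, the dump, filter, and reload cycle mentioned above could
be sketched roughly as follows. This is only an illustration under
assumptions: the function names and all paths are hypothetical, and the
target repository is assumed to have been created beforehand with
`svnadmin create`. As far as I know, `svndumpfilter` can also refuse to
proceed when a copy (such as a branch) has its source under an excluded
path, which is a large part of the pain with many branches.

```python
import subprocess


def expunge_commands(old_repo, new_repo, doomed_path):
    """Build the argv lists for: svnadmin dump | svndumpfilter exclude | svnadmin load.

    All three arguments are placeholders supplied by the caller.
    """
    return [
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", doomed_path],
        ["svnadmin", "load", new_repo],
    ]


def run_pipeline(commands):
    """Run the commands as a shell-style pipeline and return their exit codes."""
    procs = []
    upstream = None
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=upstream,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]
```

With hypothetical paths, a call would look like
`run_pipeline(expunge_commands("/srv/svn/old", "/srv/svn/new", "trunk/bad.mp3"))`.
Every exit code should be checked before the old repository is retired,
and every existing working copy has to be checked out again from the
new repository.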

> Since, in my case, the binary files should/must never change, is there a
> way to configure a read-only attribute on specific files in the
> repository such that any subsequent attempt to check-in a change to any
> of those files will be rejected and an alert raised? The directory
> structures should remain changeable.

That's what user privileges in "svnserve.conf", which would provide
"write once" control, are designed for. It would *not* prevent
operations on the local file system by an administrator, using
"file:///" based access. I assume that if an admin performs that kind
of access, they mean what they're doing and are aware they're avoiding
the filters.

I also think you're being really optimistic about "my repository will
only grow and never need to actually clear content". Accidental
submission of copyright-violating content would be my big worry. But I
tend to pay more attention to that kind of concern than most.


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread David Chapman

On 8/13/2016 12:29 PM, Adam Jensen wrote:
> On 08/13/2016 02:21 PM, David Chapman wrote:
>> On 8/13/2016 11:07 AM, Adam Jensen wrote:
>>> When a branch is created, are the files under revision control in the
>>> trunk copied to the branch (is there any duplication of files in the
>>> repository)?
>>
>> No, the files are not copied; a rename is stored.  These are "cheap
>> copies", and this is an advantage over simple backups - if you want to
>> save history using backups (per another suggestion), you need to retain
>> one backup per significant event.  That can add up.
>
> Thanks! That's a critical issue for my case where there is a large &
> growing core data-set and where it might be useful to have hundreds of
> branches, each representing a particular configuration of a subset,
> slice, or view of the core data-set.
>
>> Subversion is most often used to store text files because it stores
>> intra-file deltas when content is modified.  Your use case is unusual,
>> but as long as you don't make a lot of changes to the binary files, it
>> will be efficient.
>
> Thanks [again] for the [vindicating] confirmation. I am inspired to set
> up a test case and explore this approach further :)
>
> Since, in my case, the binary files should/must never change, is there a
> way to configure a read-only attribute on specific files in the
> repository such that any subsequent attempt to check-in a change to any
> of those files will be rejected and an alert raised? The directory
> structures should remain changeable.

I don't know about an attribute, but you could define a hook script that 
would check the files being committed to ensure that no existing large 
binary files are being modified.  I haven't done any work with hook 
scripts for several years, so I'll have to let someone else assist if 
you have more questions.
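
A minimal sketch of such a hook, written in Python and assuming the
protected files can be recognised by their ".mp3" extension (the
function names and the extension list are illustrative, not anything
Subversion provides). It relies on `svnlook changed -t TXN REPOS`,
whose output puts a two-character status at the start of each line
("A " added, "D " deleted, "U " content changed, "_U" properties only,
"UU" both) with the path starting at column 4. It only rejects content
changes; it does not try to catch a delete followed by a re-add of the
same path.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: reject content changes to existing .mp3 files."""
import subprocess
import sys

# Assumption: write-once files are identified purely by extension.
PROTECTED_SUFFIXES = (".mp3",)


def modified_protected_paths(changed_output, suffixes=PROTECTED_SUFFIXES):
    """Return protected paths whose content is modified in `svnlook changed` output."""
    rejected = []
    for line in changed_output.splitlines():
        if len(line) < 5:
            continue
        status, path = line[:2], line[4:]
        # First status column "U" means the file content changed;
        # "A" (new file), "D" (delete) and "_U" (properties only) pass.
        if status[0] == "U" and path.lower().endswith(suffixes):
            rejected.append(path)
    return rejected


def main(argv):
    # Subversion runs the hook as: pre-commit REPOS-PATH TXN-NAME
    repos, txn = argv[1], argv[2]
    changed = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    rejected = modified_protected_paths(changed)
    if rejected:
        # Anything written to stderr is passed back to the committing client.
        sys.stderr.write("Commit rejected: these files are write-once:\n")
        for path in rejected:
            sys.stderr.write("  " + path + "\n")
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

To try it, the script would be saved as `hooks/pre-commit` inside the
repository directory and made executable. A non-zero exit status aborts
the commit, so adding new files and moving directories around would
still be allowed.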


--
David Chapman  dcchap...@acm.org
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Adam Jensen
On 08/13/2016 02:21 PM, David Chapman wrote:
> On 8/13/2016 11:07 AM, Adam Jensen wrote:
>> When a branch is created, are the files under revision control in the
>> trunk copied to the branch (is there any duplication of files in the
>> repository)?
> 
> No, the files are not copied; a rename is stored.  These are "cheap
> copies", and this is an advantage over simple backups - if you want to
> save history using backups (per another suggestion), you need to retain
> one backup per significant event.  That can add up.

Thanks! That's a critical issue for my case where there is a large &
growing core data-set and where it might be useful to have hundreds of
branches, each representing a particular configuration of a subset,
slice, or view of the core data-set.

> Subversion is most often used to store text files because it stores
> intra-file deltas when content is modified.  Your use case is unusual,
> but as long as you don't make a lot of changes to the binary files, it
> will be efficient.

Thanks [again] for the [vindicating] confirmation. I am inspired to set
up a test case and explore this approach further :)

Since, in my case, the binary files should/must never change, is there a
way to configure a read-only attribute on specific files in the
repository such that any subsequent attempt to check-in a change to any
of those files will be rejected and an alert raised? The directory
structures should remain changeable.




Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread David Chapman

On 8/13/2016 11:07 AM, Adam Jensen wrote:
> On 08/12/2016 08:56 PM, Adam Jensen wrote:
>> Here's the situation: I have ~1500 mp3 files (not pirated music), and
>> the collection is growing. The sizes range from ~100kB to ~300MB. The
>> content of these files will never change. The directory structure will
>> change, files will be moved, and new (additional) [mp3] files will be added.
>
> When a branch is created, are the files under revision control in the
> trunk copied to the branch (is there any duplication of files in the
> repository)?

No, the files are not copied; a rename is stored.  These are "cheap 
copies", and this is an advantage over simple backups - if you want to 
save history using backups (per another suggestion), you need to retain 
one backup per significant event.  That can add up.


Subversion is most often used to store text files because it stores 
intra-file deltas when content is modified.  Your use case is unusual, 
but as long as you don't make a lot of changes to the binary files, it 
will be efficient.


--
David Chapman  dcchap...@acm.org
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Adam Jensen
On 08/12/2016 08:56 PM, Adam Jensen wrote:
> Here's the situation: I have ~1500 mp3 files (not pirated music), and
> the collection is growing. The sizes range from ~100kB to ~300MB. The
> content of these files will never change. The directory structure will
> change, files will be moved, and new (additional) [mp3] files will be added.

When a branch is created, are the files under revision control in the
trunk copied to the branch (is there any duplication of files in the
repository)?


Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-13 Thread Branko Čibej
On 13.08.2016 02:56, Adam Jensen wrote:
> I sent this text (^above^) to users-subscr...@subversion.apache.org
> earlier today assuming that one need not be subscribed to post. This is
> a post-subscribe re-post.
>
> {https://subversion.apache.org/mailing-lists.html}::("you don't need to
> be subscribed to post")


Indeed you don't need to be subscribed to post, but the users-subscribe@
address doesn't post messages to the users@ list. :)


-- Brane



Re: A couple thousand mp3 files (this is not spam I swear )

2016-08-12 Thread Chris Carman
On Fri, Aug 12, 2016 at 8:56 PM, Adam Jensen  wrote:

> Here's the situation: I have ~1500 mp3 files (not pirated music), and
> the collection is growing. The sizes range from ~100kB to ~300MB. The
> content of these files will never change. The directory structure will
> change, files will be moved, and new (additional) [mp3] files will be
> added.
>

So the only thing you'd be accomplishing with this scheme would be to keep
a revision history of the directory structure.

I see no benefit to using Subversion for this, as the directory structure
is also preserved by a simple backup.

HTH...


A couple thousand mp3 files (this is not spam I swear )

2016-08-12 Thread Adam Jensen
Hi,

I am considering a use of Subversion as a means to avoid accidentally
deleting some important files that I will be working with on a regular
basis. I am very interested to get comments on my plan or suggestions
for alternative methods.

Here's the situation: I have ~1500 mp3 files (not pirated music), and
the collection is growing. The sizes range from ~100kB to ~300MB. The
content of these files will never change. The directory structure will
change, files will be moved, and new (additional) [mp3] files will be added.

So basically, I am considering the use of Subversion as a live backup of
sorts and a discipline/safety-net to prevent accidental deletion of any
part of a continuously growing data-set while I explore different data
organization and analysis strategies.

My primary concerns are related to any potential file corruption, any
data duplication, and/or any excessive network or disk I/O (other than
the expected load of direct data communication).

Again, any comments on [or analysis of] this plan, or suggestions for
alternatives will be much appreciated!

I sent this text (^above^) to users-subscr...@subversion.apache.org
earlier today assuming that one need not be subscribed to post. This is
a post-subscribe re-post.

{https://subversion.apache.org/mailing-lists.html}::("you don't need to
be subscribed to post")