Re: [RFE] Inverted sparseness (amended)

2017-12-05 Thread Philip Oakley

From: "Randall S. Becker" 
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom:

From: "Randall S. Becker" 

[...]


If using the empty tree part doesn't pass muster (i.e. showing nothing
isn't sufficient), then the narrow clone could come into play to limit
what parts of the trees are widely visible, but mainly it's using the
grafts to cover the regulatory gap, and (for the moment) using
fast-export to transfer the singleton commit / tags.
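Here is a simplified model of what a graft buys in this setting. It assumes the vendor holds only some of the commit objects. A graft (today `git replace --graft <commit> <parent>...`, formerly the `.git/info/grafts` file) overrides the parent list that history traversal follows. That lets a walk from the customer's published commit skip the hidden commit instead of stopping at a missing object. The commit names and the `walk` helper are hypothetical, and this is not Git's own traversal code.

```python
from typing import Dict, List, Optional, Tuple

# Parent lists of the commit objects the vendor actually holds (hypothetical names).
# "x" is the customer's published commit; its real parent "h" was never sent.
AVAILABLE: Dict[str, List[str]] = {
    "m": [],
    "a": ["m"],
    "x": ["h"],
}


def walk(tip: str, grafts: Optional[Dict[str, List[str]]] = None) -> Tuple[List[str], List[str]]:
    """Depth-first walk from tip; returns (commits visited, IDs found missing)."""
    grafts = grafts or {}
    seen, visited, missing = set(), [], []
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        if cid not in AVAILABLE:
            missing.append(cid)  # history stops here: the object is not present
            continue
        visited.append(cid)
        # A graft, if present, replaces the parents recorded in the commit object.
        stack.extend(grafts.get(cid, AVAILABLE[cid]))
    return visited, missing


print(walk("x"))                # (['x'], ['h'])
print(walk("x", {"x": ["a"]}))  # (['x', 'a', 'm'], [])
```

The graft changes only the local view of history. The ID of `x` still hashes in its real parent `h`, so the recorded SHA stays exactly what the customer published.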


Oh, just remembered: there is the newish capability to fetch arbitrary blobs, 
so that may help.


I think you hit the nail on the head pretty well. We're currently at 2.3.7, 
with a push to 2.15.1 this week, so I'm looking forward to trying this. My 
two worries are whether the empty tree is acceptable (it should be to the 
client, and might be to the vendor), and doing this reliably 
(semi-automated) so the user base does not have to worry about the gory 
details of doing this. The unit tests for it are undoubtedly going to give 
me headaches.


Thanks for the advice. Islands of shallowness are a really descriptive image 
for what this is. So identifying that there are shoals (to extend the 
metaphor somewhat), will be crucial to this adventure.


These islands of shallowness, however, also raise the concerns described in the 
[Re: How hard would it be to implement sparse fetching/pulling?] thread. The 
matter of the security audit is important here too:
> I'm just thinking that even if we get a *perfectly working* partial 
> clone/fetch/push/etc. that it would not pass a security audit.



Philip says:
I'd totally disagree, in the sense that if we had a submodule anywhere in 
the repo, that would be an independent island of code, and we are quite happy 
with that: we use the web of trust with the auditors for them to go and check, 
separately, the oid of the independent portion, which may be at another site 
or another vendor/client. That's OK, so what's the problem here...


We do the same for pinning the tips and tails of the lines of development 
that make for the shallowness and narrowness that create these shoals and 
oxbows of development. Managing them is normal human activity, with the 
technical support that the Git chain provides, which is so much better than the 
previous 'versioning systems' that we regularly see in engineering, with their 
backdoor tweaks etc.


The key is to ensure proper hand-holding across the air gaps, such that 
the oids exist on both sides of the gaps and are properly built on, so that 
the hash chain is unbroken. It's a negotiation similar to those used for 
establishing web security between IP clients, so it is doable. But you are 
right to have concerns and suspicions, and to ensure that it is all tested 
and verified.
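Here is one way to picture that hand-holding, as a sketch under stated assumptions. The receiving side recomputes the ID of every commit object it is handed. It accepts the chain only if each parent link ends in an ID it already trusts. The helpers `make_commit`, `parents_of` and `verify_chain`, the fixed identity line and the vendor ID are all hypothetical. Git itself hashes received objects and checks connectivity on its own; this code only illustrates the idea.

```python
import hashlib
from typing import Dict, List, Set

IDENT = "Example <dev@example.com> 1512345600 +0000"  # hypothetical author/committer
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbdd4904"


def object_id(obj_type: str, body: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def make_commit(tree: str, parents: List[str], message: str) -> bytes:
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message]
    return ("\n".join(lines) + "\n").encode()


def parents_of(body: bytes) -> List[str]:
    out = []
    for line in body.split(b"\n"):
        if not line:  # headers end at the first blank line
            break
        if line.startswith(b"parent "):
            out.append(line[len(b"parent "):].decode())
    return out


def verify_chain(tip: str, received: Dict[str, bytes], trusted: Set[str]) -> bool:
    # 1. Every object must hash to the ID it was delivered under.
    if any(object_id("commit", body) != cid for cid, body in received.items()):
        return False
    # 2. Every path from the tip must end in an ID this side already trusts.
    stack, seen = [tip], set()
    while stack:
        cid = stack.pop()
        if cid in trusted or cid in seen:
            continue
        seen.add(cid)
        if cid not in received:
            return False  # a gap in the hash chain
        stack.extend(parents_of(received[cid]))
    return True


VENDOR_RELEASE = "9fceb02d0ae598e95dc970b74767f19372d61af8"  # hypothetical, known to both sides
body = make_commit(EMPTY_TREE, [VENDOR_RELEASE], "customer build")
cid = object_id("commit", body)

print(verify_chain(cid, {cid: body}, {VENDOR_RELEASE}))                # True
print(verify_chain(cid, {cid: body}, set()))                           # False: not anchored
print(verify_chain(cid, {cid: body + b"tampered"}, {VENDOR_RELEASE}))  # False: hash mismatch
```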

--
Philip (sorry about the poor quoting of the reply)




Not having the capability would similarly cause a failure of a security 
audit.


Cheers,
Randall

-- Brief whoami: NonStop developer since approximately 
UNIX(421664400)/NonStop(2112884442)

-- In my real life, I talk too much.





RE: [RFE] Inverted sparseness (amended)

2017-12-05 Thread Randall S. Becker
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom: 
>From: "Randall S. Becker" 
>Sent: Friday, December 01, 2017 6:31 PM
>> On December 1, 2017 1:19 PM, Jeff Hostetler wrote:
>>>On 12/1/2017 12:21 PM, Randall S. Becker wrote:
>>>> I recently encountered a really strange use-case relating to sparse 
>>>> clone/fetch that is really backwards from the discussion that has 
>>>> been going on, and well, I'm a bit embarrassed to bring it up, but 
>>>> I have no good solution including building a separate data store 
>>>> that will end up inconsistent with repositories (a bad solution).  
>>>> The use-case is as follows:
>>>>
>>>> Given a backbone of multiple git repositories spread across an 
>>>> organization with a server farm and upstream vendors.
>>>> The vendor delivers code by having the client perform git pull into 
>>>> a specific branch.
>>>> The customer may take the code as is or merge in customizations.
>>>> The vendor wants to know exactly what commit of theirs is installed 
>>>> on each server, in near real time.
>>>> The customer is willing to push the commit-ish to the vendor's 
>>>> upstream repo but does not want, by default, to share the actual 
>>>> commit contents for security reasons.
>>>> Realistically, the vendor needs to know that their own commit id 
>>>> was put somewhere (process exists to track this, so not part of the
>>>> use-case) and whether there is a subsequent commit contributed by 
>>>> the customer, but the content is not relevant initially.
>>>>
>>>> After some time, the vendor may request the commit contents from 
>>>> the customer in order to satisfy support requirements - a.k.a. a 
>>>> defect was found but has to be resolved.
>>>> The customer would then perform a deeper push that looks a lot like 
>>>> a "slightly" symmetrical operation of a deep fetch following a 
>>>> prior sparse fetch to supply the vendor with the specific commit(s).
>>
>>>Perhaps I'm not understanding the subtleties of what you're 
>>>describing, but could you do this with stock git functionality?
>>
>>>Let the vendor publish a "well known branch" for the client.
>>>Let the client pull that and build.
>>>Let the client create a branch set to the same commit that they fetched.
>>>Let the client push that branch as a client-specific branch to the 
>>>vendor to indicate that that is the official release they are based on.
>>
>>>Then the vendor would know the official commit that the client was using.
>> This is the easy part, and it doesn't require anything sparse to exist.
>>
>>>If the client makes local changes, does the vendor really need the 
>>>SHA of those -- without the actual content?
>>>I mean, any SHA would do, right?  Perhaps let the client create a 
>>>second client-specific branch (set to the same commit as the first) 
>>>to indicate they had mods.
>>>Later, when the vendor needs the actual client changes, the client 
>>>does a normal push to this 2nd client-specific branch at the vendor.
>>>This would send everything that the client has done to the code since 
>>>the official release.
>>
>> What I should have added to the use-case was that there is a strong 
>> audit requirement (regulatory, actually) involved that the SHA is 
>> exact, immutable, and cannot be substituted or forged (one of the 
>> reasons git is in such high regard). So, no, I can't arrange a fake 
>> SHA to represent a SHA to be named later. The SHA of the installed 
>> commit is part of the official record of what happened on the specific 
>> server, so I'm stuck with it.
>>
>>>I'm not sure what you mean about "it is inside a tree".
>>
>> m---a---b---c---H1
>>  `---d---H2
>>
>> d would be at a head. b would be inside. Determining content of c is 
>> problematic if b is sparse, so I'm really unsure that any of this is 
>> possible.

>I think I get the gist of your use case. Would I be right that you 
>don't have a true working solution yet? i.e. that it's a problem that is 
>almost sorted but falls down at the last step.

>If one pretended that this was a single development shop, and the 
>various vendors, clients and customers were independent developers, 
>each of whom is overprotective of their code, it may give a better view that 
>maps onto classic feature development diagrams.
>(i.e. draw the answer for local devs, then mark where the splits happen)

>In particular, I think you could use a notional regulator's view that 
>the whole code base is part of a large Git hierarchy of branches and 
>merges, and that some of the feature loops are only available via the 
>particular developer that worked on that feature.

>This would mean that from a regulatory overview there is a merge commit in the 
>'main' (master) hierarchy that has the main and feature commits listed, and the 
>feature commit is probably an --allow-empty commit (that has an empty 
>tree if they are that paranoid) that says 'function X released' (and 
>probably tagged), and that release commit 

Re: [RFE] Inverted sparseness

2017-12-04 Thread Philip Oakley

From: "Randall S. Becker"  :December 03, 2017 11:44 PM
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom:

From: "Randall S. Becker" 
Sent: Friday, December 01, 2017 6:31 PM

On December 1, 2017 1:19 PM, Jeff Hostetler wrote:

On 12/1/2017 12:21 PM, Randall S. Becker wrote:

I recently encountered a really strange use-case relating to sparse
clone/fetch that is really backwards from the discussion that has
been going on, and well, I'm a bit embarrassed to bring it up, but I
have no good solution including building a separate data store that
will end up inconsistent with repositories (a bad solution).  The
use-case is as follows:

Given a backbone of multiple git repositories spread across an
organization with a server farm and upstream vendors.
The vendor delivers code by having the client perform git pull into
a specific branch.
The customer may take the code as is or merge in customizations.
The vendor wants to know exactly what commit of theirs is installed
on each server, in near real time.
The customer is willing to push the commit-ish to the vendor's
upstream repo but does not want, by default, to share the actual
commit contents for security reasons.
Realistically, the vendor needs to know that their own commit id was
put somewhere (process exists to track this, so not part of the
use-case) and whether there is a subsequent commit contributed by
the customer, but the content is not relevant initially.

After some time, the vendor may request the commit contents from the
customer in order to satisfy support requirements - a.k.a. a defect
was found but has to be resolved.
The customer would then perform a deeper push that looks a lot like
a "slightly" symmetrical operation of a deep fetch following a prior
sparse fetch to supply the vendor with the specific commit(s).



Perhaps I'm not understanding the subtleties of what you're
describing, but could you do this with stock git functionality?



Let the vendor publish a "well known branch" for the client.
Let the client pull that and build.
Let the client create a branch set to the same commit that they fetched.
Let the client push that branch as a client-specific branch to the
vendor to indicate that that is the official release they are based on.



Then the vendor would know the official commit that the client was using.

This is the easy part, and it doesn't require anything sparse to exist.


If the client makes local changes, does the vendor really need the SHA
of those -- without the actual content?
I mean, any SHA would do, right?  Perhaps let the client create a second
client-specific branch (set to the same commit as the first) to
indicate they had mods.
Later, when the vendor needs the actual client changes, the client
does a normal push to this 2nd client-specific branch at the vendor.
This would send everything that the client has done to the code since
the official release.


What I should have added to the use-case was that there is a strong
audit requirement (regulatory, actually) involved that the SHA is
exact, immutable, and cannot be substituted or forged (one of the
reasons git is in such high regard). So, no, I can't arrange a fake SHA
to represent a SHA to be named later. The SHA of the installed commit
is part of the official record of what happened on the specific server,
so I'm stuck with it.


I'm not sure what you mean about "it is inside a tree".


m---a---b---c---H1
 `---d---H2

d would be at a head. b would be inside. Determining content of c is
problematic if b is sparse, so I'm really unsure that any of this is
possible.



I think I get the gist of your use case. Would I be right that you don't
have a true working solution yet? i.e. that it's a problem that is almost
sorted but falls down at the last step.



If one pretended that this was a single development shop, and the various
vendors, clients and customers were independent developers, each of whom is
overprotective of their code, it may give a better view that maps onto
classic feature development diagrams.
(i.e. draw the answer for local devs, then mark where the splits happen)



In particular, I think you could use a notional regulator's view that the
whole code base is part of a large Git hierarchy of branches and merges, and
that some of the feature loops are only available via the particular
developer that worked on that feature.



This would mean that from a regulatory overview there is a merge commit in
the 'main' (master) hierarchy that has the main and feature commits listed,
and the feature commit is probably an --allow-empty commit (that has an empty
tree if they are that paranoid) that says 'function X released' (and probably
tagged), and that release commit then has, as its parent, the true release
commit, with the true code tree. The latter commit isn't actually being
shown to you!



At this point the potential for using the graft capability comes in (as a
regulated method!).
Locally the graft 

RE: [RFE] Inverted sparseness

2017-12-03 Thread Randall S. Becker
On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom: 
>From: "Randall S. Becker" 
>Sent: Friday, December 01, 2017 6:31 PM
>> On December 1, 2017 1:19 PM, Jeff Hostetler wrote:
>>>On 12/1/2017 12:21 PM, Randall S. Becker wrote:
>>>> I recently encountered a really strange use-case relating to sparse 
>>>> clone/fetch that is really backwards from the discussion that has 
>>>> been going on, and well, I'm a bit embarrassed to bring it up, but I 
>>>> have no good solution including building a separate data store that 
>>>> will end up inconsistent with repositories (a bad solution).  The 
>>>> use-case is as follows:
>>>>
>>>> Given a backbone of multiple git repositories spread across an 
>>>> organization with a server farm and upstream vendors.
>>>> The vendor delivers code by having the client perform git pull into 
>>>> a specific branch.
>>>> The customer may take the code as is or merge in customizations.
>>>> The vendor wants to know exactly what commit of theirs is installed 
>>>> on each server, in near real time.
>>>> The customer is willing to push the commit-ish to the vendor's 
>>>> upstream repo but does not want, by default, to share the actual 
>>>> commit contents for security reasons.
>>>> Realistically, the vendor needs to know that their own commit id was 
>>>> put somewhere (process exists to track this, so not part of the 
>>>> use-case) and whether there is a subsequent commit contributed by 
>>>> the customer, but the content is not relevant initially.
>>>>
>>>> After some time, the vendor may request the commit contents from the 
>>>> customer in order to satisfy support requirements - a.k.a. a defect 
>>>> was found but has to be resolved.
>>>> The customer would then perform a deeper push that looks a lot like 
>>>> a "slightly" symmetrical operation of a deep fetch following a prior 
>>>> sparse fetch to supply the vendor with the specific commit(s).
>>
>>>Perhaps I'm not understanding the subtleties of what you're 
>>>describing, but could you do this with stock git functionality?
>>
>>>Let the vendor publish a "well known branch" for the client.
>>>Let the client pull that and build.
>>>Let the client create a branch set to the same commit that they fetched.
>>>Let the client push that branch as a client-specific branch to the 
>>>vendor to indicate that that is the official release they are based on.
>>
>>>Then the vendor would know the official commit that the client was using.
>> This is the easy part, and it doesn't require anything sparse to exist.
>>
>>>If the client makes local changes, does the vendor really need the SHA 
>>>of those -- without the actual content?
>>>I mean, any SHA would do, right?  Perhaps let the client create a second 
>>>client-specific branch (set to the same commit as the first) to 
>>>indicate they had mods.
>>>Later, when the vendor needs the actual client changes, the client 
>>>does a normal push to this 2nd client-specific branch at the vendor.
>>>This would send everything that the client has done to the code since 
>>>the official release.
>>
>> What I should have added to the use-case was that there is a strong 
>> audit requirement (regulatory, actually) involved that the SHA is 
>> exact, immutable, and cannot be substituted or forged (one of the 
>> reasons git is in such high regard). So, no, I can't arrange a fake SHA 
>> to represent a SHA to be named later. The SHA of the installed commit 
>> is part of the official record of what happened on the specific server, so 
>> I'm stuck with it.
>>
>>>I'm not sure what you mean about "it is inside a tree".
>>
>> m---a---b---c---H1
>>  `---d---H2
>>
>> d would be at a head. b would be inside. Determining content of c is 
>> problematic if b is sparse, so I'm really unsure that any of this is 
>> possible.

>I think I get the gist of your use case. Would I be right that you don't have 
>a true working solution yet? i.e. that it's a problem that is almost sorted 
>but falls down at the last step.

>If one pretended that this was a single development shop, and the various 
>vendors, clients and customers were independent developers, each of whom is 
>overprotective of their code, it may give a better view that maps onto 
>classic feature development diagrams.
>(i.e. draw the answer for local devs, then mark where the splits happen)

>In particular, I think you could use a notional regulator's view that the 
>whole code base is part of a large Git hierarchy of branches and merges, and 
>that some of the feature loops are only available via the particular 
>developer that worked on that feature.

>This would mean that from a regulatory overview there is a merge commit in the 
>'main' (master) hierarchy that has the main and feature commits listed, and the 
>feature commit is probably an --allow-empty commit (that has an empty tree if 
>they are that paranoid) that says 'function X released' (and probably tagged), 
>and that release 

Re: [RFE] Inverted sparseness

2017-12-03 Thread Philip Oakley

From: "Randall S. Becker" 
Sent: Friday, December 01, 2017 6:31 PM

On December 1, 2017 1:19 PM, Jeff Hostetler wrote:

On 12/1/2017 12:21 PM, Randall S. Becker wrote:
I recently encountered a really strange use-case relating to sparse 
clone/fetch that is really backwards from the discussion that has been 
going on, and well, I'm a bit embarrassed to bring it up, but I have no 
good solution including building a separate data store that will end up 
inconsistent with repositories (a bad solution).  The use-case is as 
follows:


Given a backbone of multiple git repositories spread across an 
organization with a server farm and upstream vendors.
The vendor delivers code by having the client perform git pull into a 
specific branch.
The customer may take the code as is or merge in customizations.
The vendor wants to know exactly what commit of theirs is installed on 
each server, in near real time.
The customer is willing to push the commit-ish to the vendor's upstream 
repo but does not want, by default, to share the actual commit contents 
for security reasons.
Realistically, the vendor needs to know that their own commit id was put 
somewhere (process exists to track this, so not part of the use-case) 
and whether there is a subsequent commit contributed by the customer, 
but the content is not relevant initially.


After some time, the vendor may request the commit contents from the 
customer in order to satisfy support requirements - a.k.a. a defect was 
found but has to be resolved.
The customer would then perform a deeper push that looks a lot like a 
"slightly" symmetrical operation of a deep fetch following a prior 
sparse fetch to supply the vendor with the specific commit(s).


Perhaps I'm not understanding the subtleties of what you're describing, 
but could you do this with stock git functionality?



Let the vendor publish a "well known branch" for the client.
Let the client pull that and build.
Let the client create a branch set to the same commit that they fetched.
Let the client push that branch as a client-specific branch to the vendor 
to indicate that that is the official release they are based on.



Then the vendor would know the official commit that the client was using.

This is the easy part, and it doesn't require anything sparse to exist.

If the client makes local changes, does the vendor really need the SHA of 
those -- without the actual content?
I mean, any SHA would do, right?  Perhaps let the client create a second 
client-specific branch (set to the same commit as the first) to indicate 
they had mods.
Later, when the vendor needs the actual client changes, the client does a 
normal push to this 2nd client-specific branch at the vendor.
This would send everything that the client has done to the code since the 
official release.


What I should have added to the use-case was that there is a strong audit 
requirement (regulatory, actually) involved that the SHA is exact, 
immutable, and cannot be substituted or forged (one of the reasons git is 
in such high regard). So, no, I can't arrange a fake SHA to represent a SHA 
to be named later. The SHA of the installed commit is part of the official 
record of what happened on the specific server, so I'm stuck with it.



I'm not sure what you mean about "it is inside a tree".


m---a---b---c---H1
 `---d---H2

d would be at a head. b would be inside. Determining content of c is 
problematic if b is sparse, so I'm really unsure that any of this is 
possible.


Cheers,
Randall

-- Brief whoami: NonStop developer since approximately 
UNIX(421664400)/NonStop(2112884442)

-- In my real life, I talk too much.


I think I get the gist of your use case. Would I be right that you don't 
have a true working solution yet? i.e. that it's a problem that is almost 
sorted but falls down at the last step.


If one pretended that this was a single development shop, and the various 
vendors, clients and customers were independent developers, each of whom 
is overprotective of their code, it may give a better view that maps onto 
classic feature development diagrams. (i.e. draw the answer for local devs, 
then mark where the splits happen)


In particular, I think you could use a notional regulator's view that the 
whole code base is part of a large Git hierarchy of branches and merges, and 
that some of the feature loops are only available via the particular 
developer that worked on that feature.


This would mean that from a regulatory overview there is a merge commit in 
the 'main' (master) hierarchy that has the main and feature commits listed, 
and the feature commit is probably an --allow-empty commit (that has an 
empty tree if they are that paranoid) that says 'function X released' (and 
probably tagged), and that release commit then has, as its parent, the true 
release commit, with the true code tree. The latter commit isn't actually 
being shown to you!


At this point the potential for using 

RE: [RFE] Inverted sparseness

2017-12-01 Thread Randall S. Becker
On December 1, 2017 1:19 PM, Jeff Hostetler wrote:
>On 12/1/2017 12:21 PM, Randall S. Becker wrote:
>> I recently encountered a really strange use-case relating to sparse 
>> clone/fetch that is really backwards from the discussion that has been going 
>> on, and well, I'm a bit embarrassed to bring it up, but I have no good 
>> solution including building a separate data store that will end up 
>> inconsistent with repositories (a bad solution).  The use-case is as follows:
>> 
>> Given a backbone of multiple git repositories spread across an organization 
>> with a server farm and upstream vendors.
>> The vendor delivers code by having the client perform git pull into a 
>> specific branch.
>> The customer may take the code as is or merge in customizations.
>> The vendor wants to know exactly what commit of theirs is installed on each 
>> server, in near real time.
>> The customer is willing to push the commit-ish to the vendor's upstream repo 
>> but does not want, by default, to share the actual commit contents for 
>> security reasons.
>> Realistically, the vendor needs to know that their own commit id was 
>> put somewhere (process exists to track this, so not part of the use-case) 
>> and whether there is a subsequent commit contributed by the customer, but 
>> the content is not relevant initially.
>> 
>> After some time, the vendor may request the commit contents from the 
>> customer in order to satisfy support requirements - a.k.a. a defect was 
>> found but has to be resolved.
>> The customer would then perform a deeper push that looks a lot like a 
>> "slightly" symmetrical operation of a deep fetch following a prior sparse 
>> fetch to supply the vendor with the specific commit(s).

>Perhaps I'm not understanding the subtleties of what you're describing, but 
>could you do this with stock git functionality?

>Let the vendor publish a "well known branch" for the client.
>Let the client pull that and build.
>Let the client create a branch set to the same commit that they fetched.
>Let the client push that branch as a client-specific branch to the vendor to 
>indicate that that is the official release they are based on.

>Then the vendor would know the official commit that the client was using.
This is the easy part, and it doesn't require anything sparse to exist.

>If the client makes local changes, does the vendor really need the SHA of 
>those -- without the actual content?
>I mean, any SHA would do, right?  Perhaps let the client create a second 
>client-specific branch (set to the same commit as the first) to indicate 
>they had mods.
>Later, when the vendor needs the actual client changes, the client does a 
>normal push to this 2nd client-specific branch at the vendor.
>This would send everything that the client has done to the code since the 
>official release.

What I should have added to the use-case was that there is a strong audit 
requirement (regulatory, actually) involved that the SHA is exact, immutable, 
and cannot be substituted or forged (one of the reasons git is in such high 
regard). So, no, I can't arrange a fake SHA to represent a SHA to be named 
later. The SHA of the installed commit is part of the official record of what 
happened on the specific server, so I'm stuck with it.

>I'm not sure what you mean about "it is inside a tree".

m---a---b---c---H1
  `---d---H2

d would be at a head. b would be inside. Determining content of c is 
problematic if b is sparse, so I'm really unsure that any of this is possible.
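The diagram can be read as a parent map. This simplified sketch assumes `d` branches from `m`, as drawn. The commits a deeper push of `H1` would have to supply to a side that already has `H2` are those reachable from `H1` but not from `H2`, which is what `git rev-list H1 --not H2` lists. The sketch also shows what "inside" means here: `b` is not a tip, but `H1`'s history passes through it. The helpers `ancestors` and `tips` are hypothetical.

```python
from typing import Dict, List, Set

# The thread's diagram as a parent map (d is drawn branching from m).
PARENTS: Dict[str, List[str]] = {
    "m": [], "a": ["m"], "b": ["a"], "c": ["b"], "H1": ["c"],
    "d": ["m"], "H2": ["d"],
}


def ancestors(tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(PARENTS[cid])
    return seen


def tips() -> Set[str]:
    """Commits that no other commit names as a parent (the branch heads)."""
    referenced = {p for ps in PARENTS.values() for p in ps}
    return set(PARENTS) - referenced


print(sorted(tips()))                             # ['H1', 'H2']
print(sorted(ancestors("H1") - ancestors("H2")))  # ['H1', 'a', 'b', 'c']
```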

Cheers,
Randall

-- Brief whoami: NonStop developer since approximately 
UNIX(421664400)/NonStop(2112884442) 
-- In my real life, I talk too much.





Re: [RFE] Inverted sparseness

2017-12-01 Thread Jeff Hostetler



On 12/1/2017 12:21 PM, Randall S. Becker wrote:

I recently encountered a really strange use-case relating to sparse clone/fetch 
that is really backwards from the discussion that has been going on, and well, 
I'm a bit embarrassed to bring it up, but I have no good solution including 
building a separate data store that will end up inconsistent with repositories 
(a bad solution).  The use-case is as follows:

Given a backbone of multiple git repositories spread across an organization 
with a server farm and upstream vendors.
The vendor delivers code by having the client perform git pull into a specific 
branch.
The customer may take the code as is or merge in customizations.
The vendor wants to know exactly what commit of theirs is installed on each 
server, in near real time.
The customer is willing to push the commit-ish to the vendor's upstream repo 
but does not want, by default, to share the actual commit contents for security 
reasons.
Realistically, the vendor needs to know that their own commit id was 
put somewhere (process exists to track this, so not part of the use-case) and 
whether there is a subsequent commit contributed by the customer, but the 
content is not relevant initially.

After some time, the vendor may request the commit contents from the customer 
in order to satisfy support requirements - a.k.a. a defect was found but has to 
be resolved.
The customer would then perform a deeper push that looks a lot like a 
"slightly" symmetrical operation of a deep fetch following a prior sparse fetch 
to supply the vendor with the specific commit(s).

This is not hard to realize if the sparse commit is HEAD on a branch, but if 
it's inside a tree, well, I don't even know where to start. To self-deprecate, 
this is likely a bad idea, but it has come up a few times.

Thoughts? Nasty Remarks?

Randall


Perhaps I'm not understanding the subtleties of what you're describing,
but could you do this with stock git functionality?

Let the vendor publish a "well known branch" for the client.
Let the client pull that and build.
Let the client create a branch set to the same commit that they fetched.
Let the client push that branch as a client-specific branch to
the vendor to indicate that that is the official release they
are based on.

Then the vendor would know the official commit that the client was
using.

If the client makes local changes, does the vendor really need the
SHA of those -- without the actual content?  I mean, any SHA would
do, right?  Perhaps let the client create a second client-specific
branch (set to the same commit as the first) to indicate they had
mods.

Later, when the vendor needs the actual client changes, the client
does a normal push to this 2nd client-specific branch at the vendor.
This would send everything that the client has done to the code
since the official release.
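Jeff's suggestion amounts to a handful of ref updates on the vendor side. This minimal model uses hypothetical ref names (`refs/heads/release`, `refs/heads/<client>/base`, `refs/heads/<client>/mods`) and hypothetical commit IDs. With real Git, the client side would be something like `git push vendor <commit>:refs/heads/acme/base`. Because the vendor already holds that commit, such a push should only update a ref and send no new objects.

```python
from typing import Dict

# Vendor-side refs: ref name -> commit ID (all names and IDs hypothetical).
RELEASE_ID = "9fceb02d0ae598e95dc970b74767f19372d61af8"
vendor_refs: Dict[str, str] = {"refs/heads/release": RELEASE_ID}


def client_reports_base(refs: Dict[str, str], client: str, commit: str) -> None:
    """Client pushes the commit it pulled as a client-specific branch."""
    refs[f"refs/heads/{client}/base"] = commit


def client_flags_mods(refs: Dict[str, str], client: str) -> None:
    """Second branch at the same commit, meaning 'we have local changes'."""
    refs[f"refs/heads/{client}/mods"] = refs[f"refs/heads/{client}/base"]


def status(refs: Dict[str, str], client: str) -> Dict[str, bool]:
    base = refs.get(f"refs/heads/{client}/base")
    mods = refs.get(f"refs/heads/{client}/mods")
    return {
        "on_release": base == refs["refs/heads/release"],
        "has_mods": mods is not None,
        # Only after a normal push of the real changes does mods move past base.
        "mods_delivered": mods is not None and mods != base,
    }


client_reports_base(vendor_refs, "acme", RELEASE_ID)
client_flags_mods(vendor_refs, "acme")
print(status(vendor_refs, "acme"))
# {'on_release': True, 'has_mods': True, 'mods_delivered': False}

# Later, when support needs the changes, a normal push moves acme/mods.
vendor_refs["refs/heads/acme/mods"] = "e83c5163316f89bfbde7d9ab23ca2e25604af290"
print(status(vendor_refs, "acme")["mods_delivered"])  # True
```

The model also makes visible the gap Randall raises in his reply. Until the real push, `acme/mods` records no ID of the customized commit, only the release ID again.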

I'm not sure what you mean about "it is inside a tree".

Hope this helps,
Jeff