Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread James Westby
On Thu Dec 03 16:05:18 -0500 2009 John Arbash Meinel wrote:
> So how did Ubuntu find "C" such that it isn't an ancestor of "G"? Are
> they using different upstreams? Or are these tarball imports such that G
> secretly should be a descendant of C, but nobody recorded that fact?

The latter.

Debian and Ubuntu worked independently at that time, so we can't add a
bzr parent; however, there is a logical relationship there. This discrepancy
is the cause of the issue.

The problem is aggravated by the fact that we don't currently have complete
history for Debian.

Thanks,

James

-- 
ubuntu-distributed-devel mailing list
ubuntu-distributed-devel@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-distributed-devel


Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread Martin Pool
2009/12/4 Francis J. Lacoste :
> On December 3, 2009, Martin Pool wrote:
>> If there are existing bugs relevant to udd, or you know of
>> appropriately concrete and self-contained things that can be filed as
>> bugs, then tagging them and/or mentioning them here would be helpful.
>> It would give us something to be getting on with.  But I agree the
>> larger issues are too broad to make useful bugs now.  (One could have
>> placeholder bugs like "work out what to do about X" but I doubt that
>> helps.)
>>
>
> Actually, given the workflow you guys seem to favour, I think it might make
> sense. Otherwise, how do you track and make sure that somebody drives the
> requirements process on these larger issues?

Momentum on the udd list, plus James's specs?  Or maybe we should do
it, at least as bugs against the udd project.

So I'll refine that position a bit to: those bugs are ok as long as
they're things we've agreed are reasonably in scope and things we're
actively working on.  If they're not being worked on, they just seem
like clutter that causes confusion later on.  (It's not quite the same
thing but Brian's confusion to do with finding old Launchpad
blueprints about communication which are half-implemented or obsolete
or generally detached from reality is the kind of thing I'd like to
avoid.)  Bugs which are not clearly falsifiable and not moving towards
being clear are a drag.

-- 
Martin 



Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread Francis J. Lacoste
On December 3, 2009, Martin Pool wrote:
> If there are existing bugs relevant to udd, or you know of
> appropriately concrete and self-contained things that can be filed as
> bugs, then tagging them and/or mentioning them here would be helpful.
> It would give us something to be getting on with.  But I agree the
> larger issues are too broad to make useful bugs now.  (One could have
> placeholder bugs like "work out what to do about X" but I doubt that
> helps.)
> 

Actually, given the workflow you guys seem to favour, I think it might make
sense. Otherwise, how do you track and make sure that somebody drives the
requirements process on these larger issues?

-- 
Francis J. Lacoste
francis.laco...@canonical.com




Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread Martin Pool
2009/12/4 James Westby :
> On Thu Dec 03 01:05:00 -0500 2009 Andrew Bennetts wrote:
>> Are there any other UDD-related bugs that should be filed (or existing ones
>> tagged)?
>
> Is the Bazaar team looking for all issues to be filed as bugs? I'm happy
> to do that, but a lot of things we have been discussing would currently
> be considered too imprecise for a good bug report.

No, at least not at this stage.

If there are existing bugs relevant to udd, or you know of
appropriately concrete and self-contained things that can be filed as
bugs, then tagging them and/or mentioning them here would be helpful.
It would give us something to be getting on with.  But I agree the
larger issues are too broad to make useful bugs now.  (One could have
placeholder bugs like "work out what to do about X" but I doubt that
helps.)

As John said in another thread, I think people are having trouble
spanning from the big picture into what we can actually do to help.
We'll know we've reached that level when we do have a queue of bugs.
But I don't want to shortcircuit that process, or put all the work of
it onto you.

-- 
Martin 



Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread John Arbash Meinel
James Westby wrote:
> On Wed Dec 02 23:01:01 -0500 2009 Robert Collins wrote:
>> https://bugs.edge.launchpad.net/udd/+bug/491711 is a bug I've filed
>> about merging specific files better; please provide feedback about
>> whether you think it might work for you.
>
> Thanks, sounds about right. We don't need much information to do this
> particular merge, just THIS and OTHER in the script I posted.
>
> I wonder about performance, for this we could register for any basename
> of "changelog", but other things may want to do some fairly intensive
> checking of whether they can handle the file in question based on its
> contents.
>
>> I've marked it high, because things that make bzr-builddeb etc easier
>> and simpler will help in maintenance of that code and its
>> understandability.
>
> Yes.
>
>> What do you need to be able to drop merge-package?
>
> Something that can merge two strands of development contained in one
> branch. merge-package is there because of this case you can get in to
>
>
> debian upstream     .-----B-----------G
>                    /       \           \
> ubuntu upstream   A---------\-----C     \
>                    \         \     \     \
> debian              D---------E-----\-----H
>                      \               \
> ubuntu                `---------------F
>
> (Apologies to anyone using a screenreader)
>
> In words:
>
>   * Debian and Ubuntu ship the same upstream release (A) packaged
> in the same version (D).
>
>   * Debian updates to a new upstream release (B), packaged as E.
>
>   * Ubuntu leap-frogs Debian to an even newer upstream release (C),
> packaged as F.
>
>   * Debian then packages the latest upstream release (G) as H.
>
> Ubuntu now wishes to merge H in to F.
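
The scenario quoted above can be modelled as a small revision graph. This
is only an illustrative sketch: the parent edges are read off the prose
description (A to H are the letters used there), and the names PARENTS,
ancestors and common_ancestors are made up for the example rather than
taken from bzr.

```python
# Parent edges assumed from the prose description; not read from a real branch.
PARENTS = {
    "A": [],          # upstream release shared by Debian and Ubuntu
    "B": ["A"],       # newer upstream release imported by Debian
    "C": ["A"],       # even newer upstream release imported by Ubuntu
    "G": ["B"],       # latest upstream release imported by Debian
    "D": ["A"],       # packaging of A, shared by both distributions
    "E": ["D", "B"],  # Debian packaging of B
    "F": ["D", "C"],  # Ubuntu packaging of C
    "H": ["E", "G"],  # Debian packaging of G
}


def ancestors(rev, parents=PARENTS):
    """Return rev plus every revision reachable through parent edges."""
    seen = set()
    todo = [rev]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen


def common_ancestors(left, right, parents=PARENTS):
    """Return the revisions present in the ancestry of both arguments."""
    return ancestors(left, parents) & ancestors(right, parents)


if __name__ == "__main__":
    print("C" in ancestors("G"))              # False
    print(sorted(common_ancestors("F", "H")))  # ['A', 'D']
```

Under those assumed edges, C never appears in the ancestry of G, so a
merge of H in to F only has A and D available as shared history.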

So how did Ubuntu find "C" such that it isn't an ancestor of "G"? Are
they using different upstreams? Or are these tarball imports such that G
secretly should be a descendant of C, but nobody recorded that fact?

John
=:->




Re: bzr/LP issues from work discussed at UDS

2009-12-03 Thread Michael Hudson
John Arbash Meinel wrote:

>> 3) Importing a lot more branches
>>
>> We want to import a lot more branches this cycle, all of those used
>> for maintaining packages in Debian. I don't have a definite number
>> that we want to import, but
>>
>>   http://upsilon.cc/~zack/stuff/vcs-usage/
>>
>> declares that there are 6881 source packages using a VCS. Therefore,
>> what would happen if tomorrow we increased the number of vcs-imports
>> by 5000? (What is the current number?)
> 
> I think we currently have 8k or so, with some fraction of that failing.

No.  We have 2921 code imports[1], of which 1672 are currently active[2].

[1] https://code.edge.launchpad.net/+code-imports
[2] https://code.edge.launchpad.net/+code-import-list

> At least, I thought I remembered about 1-2k failures, and a 25% failure
> rate. So 2/.25 = 8k.

>> It may be that the answer here is just “deal with the failures,” but
>> maybe there needs to be infrastructure work done before this. jml
>> says that it may just be a case of throwing more machines at it,
>> as the system is already built to be scalable.
> 
> Well, I would assume that growing from 1 puller to 2 pullers would be a
> significant growing pain. But growing from there to N pullers would be
> mostly a matter of throwing hardware at the problem.

It's lucky we already have three machines then!

https://code.edge.launchpad.net/+code-imports/+machines

> And while the system is probably designed to support >1 pullers, until
> you actually have 2 running concurrently, I don't think you can claim
> anything :).

Well yes.

For 5000 new imports we will definitely need some more hardware.
Without code changes I think we actually need a lot more hardware, about
10 machines total (because each machine can only start two jobs per
minute and each import updates 4 times a day).  There are some easy
changes to scheduling that will help here, and using a process pool
rather than a new process will probably help too (the majority of code
imports don't find any revisions to import, so are actually processed
very quickly).
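
As a rough check of the "about 10 machines" figure, the arithmetic can be
written out. This is a back-of-the-envelope sketch only: it assumes the
1672 currently active imports plus 5000 new ones, takes the two jobs per
minute and four updates per day at face value, and ignores how long each
job runs. The function name is invented for the example.

```python
import math

JOBS_PER_MINUTE_PER_MACHINE = 2   # figure quoted in the message above
UPDATES_PER_IMPORT_PER_DAY = 4    # figure quoted in the message above


def machines_needed(import_count):
    """Estimate machines needed if job start rate is the only limit."""
    jobs_per_day = import_count * UPDATES_PER_IMPORT_PER_DAY
    starts_per_machine_per_day = JOBS_PER_MINUTE_PER_MACHINE * 60 * 24
    return math.ceil(jobs_per_day / starts_per_machine_per_day)


if __name__ == "__main__":
    active_imports = 1672   # assumption: only active imports are scheduled
    new_imports = 5000
    print(machines_needed(active_imports + new_imports))  # 10
```

That is 26688 job starts per day against 2880 starts per machine per day,
which rounds up to 10.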

There is also the issue of the load the import branches place on the
branch puller, thanks to this bug:

https://bugs.edge.launchpad.net/launchpad-code/+bug/487357

("the code import system calls requestMirror even if no revisions were
imported").

Cheers,
mwh



Re: bzr/LP issues from work discussed at UDS

2009-12-03 Thread John Arbash Meinel

...

>> http://host/path/to/X;branch=Y
>>
>> As the preferred syntax. It requires quoting on the command line, and
>> using 'branch=' is a bit more verbose if you were typing it manually.
>> But it ends up being the "least evil". So I'm guessing we should JFDI
>> and get something.
> 
> Sounds good to me.
> 
> I'm sure the question has been asked, but does git have a syntax for
> doing this?

I'm pretty sure that 'git clone' copies all branches in the repo.
Looking at the syntax for that command, it seems to have a "--branch X"
flag. Which would hint that you can't supply the branch as part of the URL.
The URL documentation at
http://www.kernel.org/pub/software/scm/git/docs/git-clone.html#_git_urls_a_id_urls_a
also doesn't give any hint as to a URL one could use to specify a single
branch.

I'm certainly not an expert, but the best I can come up with is, no they
do not have a syntax for this.


> 
>> I think we currently have 8k or so, with some fraction of that failing.
>> At least, I thought I remembered about 1-2k failures, and a 25% failure
>> rate. So 2/.25 = 8k.
> 
> Thanks. That means we are looking at about doubling the number of imports
> this cycle.
> 
>> I think you need to have some Launchpad interfaces, so that people can
>> garden their own branches. Gardening needs to be done from time to time.
>> If we aren't going to do it ourselves, then we need to expose a way for
>> others to do it.
> 
> I agree. In this case however, we are taking the locations from Debian
> metadata, so we can at least semi-automatically do it.

Well, provided the people are properly maintaining that information. :)

> 
>>> 5) API for requesting a code import be tried ASAP
>>>
>>> Do Branch.requestMirror() and Branch.last_mirror_attempt refer to
>>> importing to the code if the branch is a vcs-imports one?
>>>
>>> If not, can we get an API similar to the above for vcs-imports?
>>>
>>> We would want to say “try now,” and then spend a while waiting
>>> for an indication it tried to import, so that we could be reasonably
>>> sure the import was up to date.
>> It sounds like you want a synchronous api, but probably something like:
>>
>> startMirror() # return when the attempt has started, or failed
>> waitForFinish() # wait until the current mirroring has finished, include
>> # info about how much has been imported.
> 
> I think synchronous is impossible in the LP API currently, if only for the
> simple reason that the request will time out after a short time, and a
> mirror is likely to take longer than that.
> 
> The two things I highlighted at least allow us to approximate this with
> polling. I am told that polling is probably the best we can do for
> at least the medium term with the LP API.
> 

Well, there are also callbacks, etc. The point is that your process
thinks of it synchronously. Even if it is abstracting away the internals.

>> Though I have to ask, how important is it to be at the current tip? What
>> do you want to do if the tip is 'active' and there is more that can be
>> pulled as soon as you finish the previous pull? Are you going to loop
>> until convergence?
>>
>> If you aren't waiting for convergence, is there harm in having an import
>> be < 24 hours out of date?
> 
> Consider this:
> 
>   * Debian maintainer upgrades to a new upstream version in their VCS.
>   * They test and upload the package.
>   * They then commit/push as appropriate for their VCS.
>   * We see the upload on average 3 hours later.
>   * The probability of the import running in those 3 hours is small.
>   * Therefore we won't be able to see the revision corresponding to
> the upload and so can't add it as a parent.
> 
> Therefore when we see the upload, I would like to trigger the code import
> system to make a best effort to be up to date at that point. It's not
> perfect, but it will cover the common case. (If the maintainer forgets
> to push then the time until the revision can be mirrored may be
> unbounded.)

So we are watching the uploads to the debian repo, and then trying to
replicate that in the Ubuntu data?

As far as "if the maintainer forgets to push", I think that is going to
be a significant fraction of the time. I've had it happen several times
with Jelmer and bzr-svn, and he's pretty savvy. As for "unbounded", I
think on average it is "by the time I poke him", or "the next chance is
the next time he packages an update".


> 
>>> 6) Guessing parent relationships
>>>
>>> We currently infer parent relationships from debian/changelog, as
>>> if you include changelog entries of another upload then we presume
>>> you merged the changes.
>> What about the imports that are from upstream (and presumably don't have
>> a debian/ directory at all)?
> 
> That is out of scope for this phase. We will have to solve this at some
> point, possibly soon for daily builds, as you suggest above.
> 
>>> We will need to start inferring parent relationships in some cases
>>> though, as there are some uses that means the code that was uploaded
>>> is n

Re: bzr/LP issues from work discussed at UDS

2009-12-03 Thread James Westby
Hi John,

Thanks for the comments.

On Thu Dec 03 11:07:24 -0500 2009 John Arbash Meinel wrote:
> So would "bzr merge --by-path" help this? Is it just that you need to
> merge just a subdir like 'debian'?

A two-way merge based on path might be sufficient. I'm not sure that
you would want to continue relying on that as every time would be
the same. It would be possible to put a branch in the middle that
joined the two.

Merging a subdir may be desirable, I don't have a feel yet for whether
that is what people would want to do. My impression is that they would
not in most cases.

> bzr stitch
> 
> Look in the ancestry of both branches, and try to figure out any
> revisions where the two trees were identical. This isn't perfect for the
> 'debian/' case, because the trees will never be identical. Both because
> you have a "debian/", though that can probably be trivially ignored, but
> also I think because you have debian/patches or whatever. So the
> *actual* content is meant to be after patching?
> 
> 
> What about a gui tool that let you create a new ancestry graph by
> selectively marking the revisions you want to sync up? So if you had:
> 
> A   X
> |   |
> B   Y
> |   |
> C   Z
> 
> You could then do:
> 
> A   X
> |\ /|
> B L Y
> |\|/|
> C M Z
>  \|/
>   N
> 
> Just a thought.
> 
> If you have cases where X is identical to A, then we could do this
> somewhat automatically. Or if X is identical to a subset of A (ignoring
> debian/ for example).

That would be an interesting approach. As I mention below, we have some
plans to do this, and it is how we will join them in the end. Giving people
the tools to do it themselves could be a great move.

> We've had a lot of discussion on this point. I'm pretty sure we ended up
> with
> 
> http://host/path/to/X;branch=Y
> 
> As the preferred syntax. It requires quoting on the command line, and
> using 'branch=' is a bit more verbose if you were typing it manually.
> But it ends up being the "least evil". So I'm guessing we should JFDI
> and get something.

Sounds good to me.

I'm sure the question has been asked, but does git have a syntax for
doing this?

> I think we currently have 8k or so, with some fraction of that failing.
> At least, I thought I remembered about 1-2k failures, and a 25% failure
> rate. So 2/.25 = 8k.

Thanks. That means we are looking at about doubling the number of imports
this cycle.

> I think you need to have some Launchpad interfaces, so that people can
> garden their own branches. Gardening needs to be done from time to time.
> If we aren't going to do it ourselves, then we need to expose a way for
> others to do it.

I agree. In this case however, we are taking the locations from Debian
metadata, so we can at least semi-automatically do it.

> > 5) API for requesting a code import be tried ASAP
> > 
> > Do Branch.requestMirror() and Branch.last_mirror_attempt refer to
> > importing to the code if the branch is a vcs-imports one?
> > 
> > If not, can we get an API similar to the above for vcs-imports?
> > 
> > We would want to say “try now,” and then spend a while waiting
> > for an indication it tried to import, so that we could be reasonably
> > sure the import was up to date.
> 
> It sounds like you want a synchronous api, but probably something like:
> 
> startMirror() # return when the attempt has started, or failed
> waitForFinish() # wait until the current mirroring has finished, include
> # info about how much has been imported.

I think synchronous is impossible in the LP API currently, if only for the
simple reason that the request will time out after a short time, and a
mirror is likely to take longer than that.

The two things I highlighted at least allow us to approximate this with
polling. I am told that polling is probably the best we can do for
at least the medium term with the LP API.
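
The polling approach can be sketched generically. To avoid guessing at the
real Launchpad API, the sketch takes two caller-supplied callables:
request_mirror stands in for something like Branch.requestMirror(), and
last_attempt stands in for reading Branch.last_mirror_attempt. The
function name, the timeout and the interval are all assumptions.

```python
import time


def wait_for_mirror_attempt(request_mirror, last_attempt, timeout=1800.0,
                            interval=30.0, clock=time.monotonic,
                            sleep=time.sleep):
    """Ask for a mirror, then poll until a newer attempt is recorded.

    Returns the new attempt value, or None if the timeout expires first.
    """
    before = last_attempt()      # remember what was recorded beforehand
    request_mirror()             # "try now"
    deadline = clock() + timeout
    while clock() < deadline:
        current = last_attempt()
        if current is not None and current != before:
            return current       # an attempt happened after our request
        sleep(interval)
    return None                  # gave up; the import may still be stale
```

A None result only means that no attempt was observed in time, so the
caller still has to decide whether to proceed with possibly stale data.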

> Though I have to ask, how important is it to be at the current tip? What
> do you want to do if the tip is 'active' and there is more that can be
> pulled as soon as you finish the previous pull? Are you going to loop
> until convergence?
> 
> If you aren't waiting for convergence, is there harm in having an import
> be < 24 hours out of date?

Consider this:

  * Debian maintainer upgrades to a new upstream version in their VCS.
  * They test and upload the package.
  * They then commit/push as appropriate for their VCS.
  * We see the upload on average 3 hours later.
  * The probability of the import running in those 3 hours is small.
  * Therefore we won't be able to see the revision corresponding to
the upload and so can't add it as a parent.

Therefore when we see the upload, I would like to trigger the code import
system to make a best effort to be up to date at that point. It's not
perfect, but it will cover the common case. (If the maintainer forgets
to push then the time until the revision can be mirrored may be
unbounded.)

> > 6) Guessing parent relationships
> > 
> > We currently infer parent relationships from debian/changelog, as
> 

Re: bzr/LP issues from work discussed at UDS

2009-12-03 Thread John Arbash Meinel
James Westby wrote:
> Hi,
> 
> I'd like to provide some information about some of the discussions that
> went on at UDS about UDD, and in particular some open questions related
> to bzr and Launchpad.
> 
> I have just written up two specs from the sessions:
> 
>   https://blueprints.launchpad.net/ubuntu/+spec/foundations-lucid-daily-builds
> 
> about all things daily builds, and
> 
>   https://blueprints.launchpad.net/ubuntu/+spec/foundations-lucid-distributed-development/
> 
> about using bzr for Ubuntu development.
> 
> There are a whole bunch of topics tied up in them, so I'd like to pull apart
> some of the threads for discussion. Most of these things are open questions,
> but some are a “please help do this” request. Some of them will be blocking
> things we want to roll out over the next 6 months.
> 
> 1) Merging unrelated branches in a recipe.
> 
> We currently have a rather unfortunate situation, but one we entered in to
> knowingly. We can associate lp: with an lp:ubuntu/ to know
> they contain the same code, and this would make it dead easy to set up
> the first cut of a daily build. However, as it currently stands the two
> branches will, except for a minority of projects, share no revision history,
> meaning that they can't be merged, and so can't be combined in a recipe.
> 
> There are a couple of main drawbacks to this, namely that starting a daily
> build is more work than it could be, and that changes made in the packaging
> of the daily build aren't directly mergeable back to the packaging.
> 

So would "bzr merge --by-path" help this? Is it just that you need to
merge just a subdir like 'debian'?

> We have a plan to rectify this. It however is a multi-year plan, so we may
> want to sidestep the issue somewhat. There are good reasons for it being
> a long term plan, but it's not out of the question that this issue, and
> others below, force us to re-evaluate this. It should not be done lightly
> though.
> 
> One way to alleviate the pressure on this issue would be to make it possible
> to combine lp: and lp:ubuntu/, even though they are not
> mergeable.
> 
> https://code.edge.launchpad.net/~spiv/bzr-builder/merge-subdirs-479705/+merge/14979
> is said to go some way towards doing this, but as I say within, I think
> I am missing something, as it can't be the whole solution on its own.
> 
> What I am looking for here is suggestions on how we can elegantly allow
> people to combine the two trees in a system that isn't too fragile.

Path tokens are a "nice" fix for this, but fairly involved. And they
don't solve everything. (You still end up with 2x the history in the
parallel import case. Switching to a content-hash storage for file texts
would make this a little bit better.)

merge --by-path

would theoretically try to do a 2-way merge of the file contents for
every file. The problem with 2-way merge is that without a BASE, where
things differ, you don't know which one is "newer". (So if one side has
line A where the other has line B, we don't know whether you changed
A => B or the other side changed B => A.)
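
The difference a BASE makes can be shown on a single value. This is a toy
sketch, not how bzr merges texts: it compares whole values instead of
running a line-based merge, and the function and exception names are
invented for the illustration.

```python
class Conflict(Exception):
    """Raised when the merge cannot pick a side."""


def merge_two_way(this, other):
    """Without BASE, any difference is undecidable."""
    if this == other:
        return this
    raise Conflict("THIS and OTHER differ and there is no BASE to compare")


def merge_three_way(base, this, other):
    """With BASE, the side that still equals BASE is the unchanged one."""
    if this == other:
        return this
    if this == base:
        return other    # only OTHER changed
    if other == base:
        return this     # only THIS changed
    raise Conflict("both sides changed relative to BASE")
```

With BASE "A", THIS "A" and OTHER "B", the three-way merge returns "B",
while the two-way merge of "A" and "B" can only report a conflict.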


bzr stitch

Look in the ancestry of both branches, and try to figure out any
revisions where the two trees were identical. This isn't perfect for the
'debian/' case, because the trees will never be identical. Both because
you have a "debian/", though that can probably be trivially ignored, but
also I think because you have debian/patches or whatever. So the
*actual* content is meant to be after patching?


What about a gui tool that let you create a new ancestry graph by
selectively marking the revisions you want to sync up? So if you had:

A   X
|   |
B   Y
|   |
C   Z

You could then do:

A   X
|\ /|
B L Y
|\|/|
C M Z
 \|/
  N

Just a thought.

If you have cases where X is identical to A, then we could do this
somewhat automatically. Or if X is identical to a subset of A (ignoring
debian/ for example).
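
A minimal sketch of the automatic case, assuming each revision's tree is
available as a mapping from path to content hash. The data layout and the
function names are hypothetical and nothing here uses bzr's own API. It
also ignores the debian/patches problem mentioned above, since it compares
unpatched trees.

```python
def without_packaging(tree, prefix="debian/"):
    """Drop every path under the packaging directory."""
    return {path: digest for path, digest in tree.items()
            if not path.startswith(prefix)}


def find_stitch_points(left_history, right_history, prefix="debian/"):
    """Pair up revisions whose trees match once debian/ is ignored.

    Each history is a list of (revision_id, tree) tuples, where tree maps
    a path to a content hash.
    """
    by_content = {}
    for revision_id, tree in left_history:
        key = frozenset(without_packaging(tree, prefix).items())
        by_content.setdefault(key, []).append(revision_id)

    matches = []
    for revision_id, tree in right_history:
        key = frozenset(without_packaging(tree, prefix).items())
        for left_id in by_content.get(key, []):
            matches.append((left_id, revision_id))
    return matches
```

Each returned pair is a candidate for a joining revision like L or M in
the graph above.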

> 
> 2) Importing non-master branches
> 
> I know this is being discussed elsewhere right now, but this is another
> area where this came up as being useful/a blocker. I don't want to split
> the discussion, but just wanted to register another vote for being
> able to do this.
> 
> We may also want to do some interesting things with SVN imports, depending
> on how they are laid out. I haven't looked in to it, but I imagine that
> switching to bzr-svn could change what we can do.
> 

We've had a lot of discussion on this point. I'm pretty sure we ended up
with

http://host/path/to/X;branch=Y

As the preferred syntax. It requires quoting on the command line, and
using 'branch=' is a bit more verbose if you were typing it manually.
But it ends up being the "least evil". So I'm guessing we should JFDI
and get something.
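
To make the syntax concrete, here is a small sketch of splitting such a
URL and quoting it for a shell. It is an illustration only, not bzr's
parser: split_branch_url is a made-up helper, and it treats everything
after the first ";" as ";"-separated key=value parameters.

```python
import shlex


def split_branch_url(url):
    """Split 'base;branch=Y' into (base, branch); branch may be None."""
    base, separator, params = url.partition(";")
    branch = None
    if separator:
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key == "branch":
                branch = value
    return base, branch


if __name__ == "__main__":
    url = "http://host/path/to/X;branch=Y"
    print(split_branch_url(url))   # ('http://host/path/to/X', 'Y')
    # ";" ends a command in most shells, hence the need for quoting.
    print(shlex.quote(url))        # 'http://host/path/to/X;branch=Y'
```

The second line of output shows the single quotes a shell needs to pass
the URL through as one argument.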


> 3) Importing a lot more branches
> 
> We want to import a lot more branches this cycle, all of those used
> for maintaining packages in Debian. I don't have a definite number
> that we want to import, but
> 
>   http://upsilon.cc/~zack/stuff/vcs-usage/
> 
> declares that there are 6881 sourc

Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread James Westby
On Thu Dec 03 01:05:00 -0500 2009 Andrew Bennetts wrote:
> Are there any other UDD-related bugs that should be filed (or existing ones
> tagged)?

Is the Bazaar team looking for all issues to be filed as bugs? I'm happy
to do that, but a lot of things we have been discussing would currently
be considered too imprecise for a good bug report.

Thanks,

James



Re: your thoughts wanted on bzr team UDD focus

2009-12-03 Thread Robert Collins
On Thu, 2009-12-03 at 17:05 +1100, Andrew Bennetts wrote:

> Are there any other UDD-related bugs that should be filed (or existing
> ones
> tagged)? 

I suspect there are many, but we'll need to find them as we join dots
up. I haven't read James' latest email, but I suspect it is relevant.

-Rob

