[PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Lex Spoon
Simply running p4 changes on a large branch can
result in a too many rows scanned error from the
Perforce server. It is better to use a sequence
of smaller calls to p4 changes, using the -m
option to limit the size of each call.

Signed-off-by: Lex Spoon l...@lexspoon.org
Reviewed-by: Junio C Hamano gits...@pobox.com
Reviewed-by: Luke Diamand l...@diamand.org
---
Updated to avoid the crash Luke pointed out.
All t98* tests pass now except for t9814,
which is already failing on master for some reason.

 Documentation/git-p4.txt | 17 ++---
 git-p4.py| 52 ++-
 t/t9818-git-p4-block.sh  | 64 
 3 files changed, 119 insertions(+), 14 deletions(-)
 create mode 100755 t/t9818-git-p4-block.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index a1664b9..82aa5d6 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -225,9 +225,20 @@ Git repository:
they can find the p4 branches in refs/heads.
 
 --max-changes n::
-   Limit the number of imported changes to 'n'.  Useful to
-   limit the amount of history when using the '@all' p4 revision
-   specifier.
+   Import at most 'n' changes, rather than the entire range of
+   changes included in the given revision specifier. A typical
+   usage would be use '@all' as the revision specifier, but then
+   to use '--max-changes 1000' to import only the last 1000
+   revisions rather than the entire revision history.
+
+--changes-block-size n::
+   The internal block size to use when converting a revision
+   specifier such as '@all' into a list of specific change
+   numbers. Instead of using a single call to 'p4 changes' to
+   find the full list of changes for the conversion, there are a
+   sequence of calls to 'p4 changes -m', each of which requests
+   one block of changes of the given size. The default block size
+   is 500, which should usually be suitable.
 
 --keep-path::
The mapping of file names from the p4 depot path to Git, by
diff --git a/git-p4.py b/git-p4.py
index 549022e..e28033f 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
refs/remotes/p4/, silent
 def originP4BranchesExist():
 return gitBranchExists(origin) or gitBranchExists(origin/p4) or 
gitBranchExists(origin/p4/master)
 
-def p4ChangesForPaths(depotPaths, changeRange):
+def p4ChangesForPaths(depotPaths, changeRange, block_size):
 assert depotPaths
-cmd = ['changes']
-for p in depotPaths:
-cmd += [%s...%s % (p, changeRange)]
-output = p4_read_pipe_lines(cmd)
+assert block_size
+
+# Parse the change range into start and end
+if changeRange is None or changeRange == '':
+changeStart = '@1'
+changeEnd = '#head'
+else:
+parts = changeRange.split(',')
+assert len(parts) == 2
+changeStart = parts[0]
+changeEnd = parts[1]
 
+# Accumulate change numbers in a dictionary to avoid duplicates
 changes = {}
-for line in output:
-changeNum = int(line.split( )[1])
-changes[changeNum] = True
+
+for p in depotPaths:
+# Retrieve changes a block at a time, to prevent running
+# into a MaxScanRows error from the server.
+start = changeStart
+end = changeEnd
+get_another_block = True
+while get_another_block:
+new_changes = []
+cmd = ['changes']
+cmd += ['-m', str(block_size)]
+cmd += [%s...%s,%s % (p, start, end)]
+for line in p4_read_pipe_lines(cmd):
+changeNum = int(line.split( )[1])
+new_changes.append(changeNum)
+changes[changeNum] = True
+if len(new_changes) == block_size:
+get_another_block = True
+end = '@' + str(min(new_changes))
+else:
+get_another_block = False
 
 changelist = changes.keys()
 changelist.sort()
@@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option(--import-labels, dest=importLabels, 
action=store_true),
 optparse.make_option(--import-local, 
dest=importIntoRemotes, action=store_false,
  help=Import into refs/heads/ , not 
refs/remotes),
-optparse.make_option(--max-changes, dest=maxChanges),
+optparse.make_option(--max-changes, dest=maxChanges,
+ help=Maximum number of changes to 
import),
+optparse.make_option(--changes-block-size, 
dest=changes_block_size, type=int,
+ help=Internal block size to use when 
iteratively calling p4 changes),
 optparse.make_option(--keep-path, dest=keepRepoPath, 
action='store_true',
   

Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Luke Diamand
Sorry - could you resubmit your patch (PATCHv4 it will be) with this
change squashed in please? It will make life much easier, especially
for Junio!

Thanks!
Luke


On 20 April 2015 at 16:00, Lex Spoon l...@lexspoon.org wrote:
 Simply running p4 changes on a large branch can
 result in a too many rows scanned error from the
 Perforce server. It is better to use a sequence
 of smaller calls to p4 changes, using the -m
 option to limit the size of each call.

 Signed-off-by: Lex Spoon l...@lexspoon.org
 Reviewed-by: Junio C Hamano gits...@pobox.com
 Reviewed-by: Luke Diamand l...@diamand.org
 ---
 Updated to avoid the crash Luke pointed out.
 All t98* tests pass now except for t9814,
 which is already failing on master for some reason.

  Documentation/git-p4.txt | 17 ++---
  git-p4.py| 52 ++-
  t/t9818-git-p4-block.sh  | 64 
 
  3 files changed, 119 insertions(+), 14 deletions(-)
  create mode 100755 t/t9818-git-p4-block.sh

 diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
 index a1664b9..82aa5d6 100644
 --- a/Documentation/git-p4.txt
 +++ b/Documentation/git-p4.txt
 @@ -225,9 +225,20 @@ Git repository:
 they can find the p4 branches in refs/heads.

  --max-changes n::
 -   Limit the number of imported changes to 'n'.  Useful to
 -   limit the amount of history when using the '@all' p4 revision
 -   specifier.
 +   Import at most 'n' changes, rather than the entire range of
 +   changes included in the given revision specifier. A typical
 +   usage would be use '@all' as the revision specifier, but then
 +   to use '--max-changes 1000' to import only the last 1000
 +   revisions rather than the entire revision history.
 +
 +--changes-block-size n::
 +   The internal block size to use when converting a revision
 +   specifier such as '@all' into a list of specific change
 +   numbers. Instead of using a single call to 'p4 changes' to
 +   find the full list of changes for the conversion, there are a
 +   sequence of calls to 'p4 changes -m', each of which requests
 +   one block of changes of the given size. The default block size
 +   is 500, which should usually be suitable.

  --keep-path::
 The mapping of file names from the p4 depot path to Git, by
 diff --git a/git-p4.py b/git-p4.py
 index 549022e..e28033f 100755
 --- a/git-p4.py
 +++ b/git-p4.py
 @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
 refs/remotes/p4/, silent
  def originP4BranchesExist():
  return gitBranchExists(origin) or gitBranchExists(origin/p4) or 
 gitBranchExists(origin/p4/master)

 -def p4ChangesForPaths(depotPaths, changeRange):
 +def p4ChangesForPaths(depotPaths, changeRange, block_size):
  assert depotPaths
 -cmd = ['changes']
 -for p in depotPaths:
 -cmd += [%s...%s % (p, changeRange)]
 -output = p4_read_pipe_lines(cmd)
 +assert block_size
 +
 +# Parse the change range into start and end
 +if changeRange is None or changeRange == '':
 +changeStart = '@1'
 +changeEnd = '#head'
 +else:
 +parts = changeRange.split(',')
 +assert len(parts) == 2
 +changeStart = parts[0]
 +changeEnd = parts[1]

 +# Accumulate change numbers in a dictionary to avoid duplicates
  changes = {}
 -for line in output:
 -changeNum = int(line.split( )[1])
 -changes[changeNum] = True
 +
 +for p in depotPaths:
 +# Retrieve changes a block at a time, to prevent running
 +# into a MaxScanRows error from the server.
 +start = changeStart
 +end = changeEnd
 +get_another_block = True
 +while get_another_block:
 +new_changes = []
 +cmd = ['changes']
 +cmd += ['-m', str(block_size)]
 +cmd += [%s...%s,%s % (p, start, end)]
 +for line in p4_read_pipe_lines(cmd):
 +changeNum = int(line.split( )[1])
 +new_changes.append(changeNum)
 +changes[changeNum] = True
 +if len(new_changes) == block_size:
 +get_another_block = True
 +end = '@' + str(min(new_changes))
 +else:
 +get_another_block = False

  changelist = changes.keys()
  changelist.sort()
 @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
  optparse.make_option(--import-labels, dest=importLabels, 
 action=store_true),
  optparse.make_option(--import-local, 
 dest=importIntoRemotes, action=store_false,
   help=Import into refs/heads/ , not 
 refs/remotes),
 -optparse.make_option(--max-changes, dest=maxChanges),
 +optparse.make_option(--max-changes, dest=maxChanges,
 + help=Maximum number of 

Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Junio C Hamano
Luke Diamand l...@diamand.org writes:

 Sorry - could you resubmit your patch (PATCHv4 it will be) with this
 change squashed in please? It will make life much easier, especially
 for Junio!

Thanks for caring, but this seems to be a full patch to replace v3.

It was sent with your Reviewed-by already in, but I'd tentatively
remove that line while queuing it to 'pu' and ask you to double
check if the patch makes sense (and after your yes, it does, I'd
add the Reviewed-by back).

Thanks.


 Thanks!
 Luke


 On 20 April 2015 at 16:00, Lex Spoon l...@lexspoon.org wrote:
 Simply running p4 changes on a large branch can
 result in a too many rows scanned error from the
 Perforce server. It is better to use a sequence
 of smaller calls to p4 changes, using the -m
 option to limit the size of each call.

 Signed-off-by: Lex Spoon l...@lexspoon.org
 Reviewed-by: Junio C Hamano gits...@pobox.com
 Reviewed-by: Luke Diamand l...@diamand.org
 ---
 Updated to avoid the crash Luke pointed out.
 All t98* tests pass now except for t9814,
 which is already failing on master for some reason.

  Documentation/git-p4.txt | 17 ++---
  git-p4.py| 52 ++-
  t/t9818-git-p4-block.sh  | 64 
 
  3 files changed, 119 insertions(+), 14 deletions(-)
  create mode 100755 t/t9818-git-p4-block.sh

 diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
 index a1664b9..82aa5d6 100644
 --- a/Documentation/git-p4.txt
 +++ b/Documentation/git-p4.txt
 @@ -225,9 +225,20 @@ Git repository:
 they can find the p4 branches in refs/heads.

  --max-changes n::
 -   Limit the number of imported changes to 'n'.  Useful to
 -   limit the amount of history when using the '@all' p4 revision
 -   specifier.
 +   Import at most 'n' changes, rather than the entire range of
 +   changes included in the given revision specifier. A typical
 +   usage would be use '@all' as the revision specifier, but then
 +   to use '--max-changes 1000' to import only the last 1000
 +   revisions rather than the entire revision history.
 +
 +--changes-block-size n::
 +   The internal block size to use when converting a revision
 +   specifier such as '@all' into a list of specific change
 +   numbers. Instead of using a single call to 'p4 changes' to
 +   find the full list of changes for the conversion, there are a
 +   sequence of calls to 'p4 changes -m', each of which requests
 +   one block of changes of the given size. The default block size
 +   is 500, which should usually be suitable.

  --keep-path::
 The mapping of file names from the p4 depot path to Git, by
 diff --git a/git-p4.py b/git-p4.py
 index 549022e..e28033f 100755
 --- a/git-p4.py
 +++ b/git-p4.py
 @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
 refs/remotes/p4/, silent
  def originP4BranchesExist():
  return gitBranchExists(origin) or gitBranchExists(origin/p4) or 
 gitBranchExists(origin/p4/master)

 -def p4ChangesForPaths(depotPaths, changeRange):
 +def p4ChangesForPaths(depotPaths, changeRange, block_size):
  assert depotPaths
 -cmd = ['changes']
 -for p in depotPaths:
 -cmd += [%s...%s % (p, changeRange)]
 -output = p4_read_pipe_lines(cmd)
 +assert block_size
 +
 +# Parse the change range into start and end
 +if changeRange is None or changeRange == '':
 +changeStart = '@1'
 +changeEnd = '#head'
 +else:
 +parts = changeRange.split(',')
 +assert len(parts) == 2
 +changeStart = parts[0]
 +changeEnd = parts[1]

 +# Accumulate change numbers in a dictionary to avoid duplicates
  changes = {}
 -for line in output:
 -changeNum = int(line.split( )[1])
 -changes[changeNum] = True
 +
 +for p in depotPaths:
 +# Retrieve changes a block at a time, to prevent running
 +# into a MaxScanRows error from the server.
 +start = changeStart
 +end = changeEnd
 +get_another_block = True
 +while get_another_block:
 +new_changes = []
 +cmd = ['changes']
 +cmd += ['-m', str(block_size)]
 +cmd += [%s...%s,%s % (p, start, end)]
 +for line in p4_read_pipe_lines(cmd):
 +changeNum = int(line.split( )[1])
 +new_changes.append(changeNum)
 +changes[changeNum] = True
 +if len(new_changes) == block_size:
 +get_another_block = True
 +end = '@' + str(min(new_changes))
 +else:
 +get_another_block = False

  changelist = changes.keys()
  changelist.sort()
 @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
  optparse.make_option(--import-labels, 
 dest=importLabels, action=store_true),
  optparse.make_option(--import-local, 

Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Luke Diamand l...@diamand.org writes:

 Sorry - could you resubmit your patch (PATCHv4 it will be) with this
 change squashed in please? It will make life much easier, especially
 for Junio!

 Thanks for caring, but this seems to be a full patch to replace v3.

 It was sent with your Reviewed-by already in, but I'd tentatively
 remove that line while queuing it to 'pu' and ask you to double
 check if the patch makes sense (and after your yes, it does, I'd
 add the Reviewed-by back).

 Thanks.

Just to make it easier to see, the interdiff between v3 and v4 looks
like this:

 git-p4.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 1fba3aa..e28033f 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2608,7 +2608,7 @@ class P4Sync(Command, P4UserMap):
 
 return 
 
-def importNewBranch(self, branch, maxChange, changes_block_size):
+def importNewBranch(self, branch, maxChange):
 # make fast-import flush all changes to disk and update the refs using 
the checkpoint
 # command so that we can try to find the branch parent in the git 
history
 self.gitStream.write(checkpoint\n\n);
@@ -2616,7 +2616,7 @@ class P4Sync(Command, P4UserMap):
 branchPrefix = self.depotPaths[0] + branch + /
 range = @1,%s % maxChange
 #print prefix + branchPrefix
-changes = p4ChangesForPaths([branchPrefix], range, changes_block_size)
+changes = p4ChangesForPaths([branchPrefix], range, 
self.changes_block_size)
 if len(changes) = 0:
 return False
 firstChange = changes[0]
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Lex Spoon
On Mon, Apr 20, 2015 at 11:15 AM, Luke Diamand l...@diamand.org wrote:
 Sorry - could you resubmit your patch (PATCHv4 it will be) with this
 change squashed in please? It will make life much easier, especially
 for Junio!

The message you just responded is already the squashed version. It's a
single patch that includes all changes so far discussed. The subject
line says PATCH v4, although since it's in the same thread, not all
email clients will show the subject change.

Let me know if I can do more to make the process go smoothly.

Lex Spoon
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] git-p4: Use -m when running p4 changes

2015-04-20 Thread Luke Diamand

On 20/04/15 16:25, Lex Spoon wrote:

On Mon, Apr 20, 2015 at 11:15 AM, Luke Diamand l...@diamand.org wrote:

Sorry - could you resubmit your patch (PATCHv4 it will be) with this
change squashed in please? It will make life much easier, especially
for Junio!


The message you just responded is already the squashed version. It's a
single patch that includes all changes so far discussed. The subject
line says PATCH v4, although since it's in the same thread, not all
email clients will show the subject change.


Not sure how I missed that! It looks good, now, Ack!

Thanks!
Luke

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html