[PATCH V2] update: warn if cwd was deleted

2016-09-02 Thread Stanislau Hlebik
# HG changeset patch
# User Stanislau Hlebik 
# Date 1472848944 25200
#  Fri Sep 02 13:42:24 2016 -0700
# Node ID 374caff3f6ba064070d01329eec1bd33a6a45370
# Parent  318e2b600b80e4ed3c6f37df46ec7544f60d4c0b
update: warn if cwd was deleted

During update directories are deleted as soon as they have no entries.
But if current working directory is deleted then it cause problems
in complex commands like 'hg split'. This commit adds a warning
that will help users figure the problem faster.

diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -1043,7 +1043,19 @@
 repo.ui.note(_("removing %s\n") % f)
 audit(f)
 try:
-unlink(wjoin(f), ignoremissing=True)
+try:
+cwd = os.getcwd()
+except OSError:
+cwd = None
+path = wjoin(f)
+unlink(path, ignoremissing=True)
+try:
+os.getcwd()
+except OSError:
+# Print a warning if cwd was deleted
+if cwd and path.startswith(cwd):
+repo.ui.warn(_("cwd was deleted - consider "
+   "changing cwd to repo root\n"))
 except OSError as inst:
 repo.ui.warn(_("update failed to remove %s: %s!\n") %
  (f, inst.strerror))
diff --git a/tests/test-rebase-scenario-global.t 
b/tests/test-rebase-scenario-global.t
--- a/tests/test-rebase-scenario-global.t
+++ b/tests/test-rebase-scenario-global.t
@@ -758,6 +758,7 @@
   $ hg commit -m 'second source with subdir'
   $ hg rebase -b . -d 1 --traceback
   rebasing 2:779a07b1b7a0 "first source commit"
+  cwd was deleted - consider changing cwd to repo root
   rebasing 3:a7d6f3a00bf3 "second source with subdir" (tip)
   saved backup bundle to 
$TESTTMP/cwd-vanish/.hg/strip-backup/779a07b1b7a0-853e0073-backup.hg (glob)
 
diff --git a/tests/test-update-names.t b/tests/test-update-names.t
--- a/tests/test-update-names.t
+++ b/tests/test-update-names.t
@@ -72,3 +72,14 @@
   $ cd ..
 
 #endif
+
+Test that warning is printed if cwd is deleted during update
+  $ hg init r4 && cd r4
+  $ mkdir dir
+  $ cd dir
+  $ echo a > a
+  $ echo b > b
+  $ hg add a b
+  $ hg ci -m "file and dir"
+  $ hg up -q null
+  cwd was deleted - consider changing cwd to repo root
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH] crecord: properly handle files with No newline at eof (issue5268)

2016-09-02 Thread timeless
# HG changeset patch
# User timeless 
# Date 1472847337 0
#  Fri Sep 02 20:15:37 2016 +
# Node ID 68a20f02785b24f08b13e21ffc9e2a05031b07f8
# Parent  f148bfa40489269be2e48046734f81065129847a
# Available At https://bitbucket.org/timeless/mercurial-crew
#  hg pull https://bitbucket.org/timeless/mercurial-crew -r 
68a20f02785b
crecord: properly handle files with No newline at eof (issue5268)

Yes, this bug was a single character with the wrong case...

diff -r f148bfa40489 -r 68a20f02785b mercurial/crecord.py
--- a/mercurial/crecord.py  Tue Jul 05 09:37:07 2016 +0200
+++ b/mercurial/crecord.py  Fri Sep 02 20:15:37 2016 +
@@ -387,7 +387,7 @@
 
 contextlen = (len(self.before) + len(self.after) +
   removedconvertedtocontext)
-if self.after and self.after[-1] == '\\ no newline at end of file\n':
+if self.after and self.after[-1] == '\\ No newline at end of file\n':
 contextlen -= 1
 fromlen = contextlen + self.removed
 tolen = contextlen + self.added
diff -r f148bfa40489 -r 68a20f02785b tests/test-commit-interactive-curses.t
--- a/tests/test-commit-interactive-curses.tTue Jul 05 09:37:07 2016 +0200
+++ b/tests/test-commit-interactive-curses.tFri Sep 02 20:15:37 2016 +
@@ -9,6 +9,21 @@
   > crecordtest = testModeCommands
   > EOF
 
+Record with noeol at eof (issue5268)
+  $ hg init noeol
+  $ cd noeol
+  $ printf '0' > a
+  $ printf '0\n' > b
+  $ hg ci -Aqm initial
+  $ printf '1\n0' > a
+  $ printf '1\n0\n' > b
+  $ cat  c
+  > EOF
+  $ HGEDITOR="\"sh\" \"`pwd`/editor.sh\"" hg commit  -i -m "add hunks" -d "0 0"
+  $ cd ..
+
+Normal repo
   $ hg init a
   $ cd a
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 4 of 5] manifest: use property instead of field for manifest revlog storage

2016-09-02 Thread FUJIWARA Katsunori
At Wed, 17 Aug 2016 13:35:41 -0700,
Durham Goode wrote:
> 
> 
> On 8/10/16 1:01 AM, FUJIWARA Katsunori wrote:
> > At Mon, 8 Aug 2016 18:17:13 -0700,
> > Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode 
> >> # Date 1470697899 25200
> >> #  Mon Aug 08 16:11:39 2016 -0700
> >> # Node ID 6a4c09571793d56c8dad1a6760e3fc1293b9a0b6
> >> # Parent  f73abdf84e8e2eb9e2029cb28e2246a55b2d2f49
> >> manifest: use property instead of field for manifest revlog storage
> >>
> >> The file caches we're using to avoid reloading the manifest from disk 
> >> everytime
> >> has an annoying bug that causes the in memory structure to not be reloaded 
> >> if
> >> the mtime and the size haven't changed. This causes a breakage in the tests
> >> because the manifestlog is not being reloaded after a commit+strip 
> >> operation in
> >> mq (the mtime is the same because it all happens in the same second, and 
> >> the
> >> resulting size is the same because we add 1 and remove 1). The only reason 
> >> this
> >> doesn't affect the manifest itself is because we touch it so often that we
> >> had already reloaded it after the commit, but before the strip.

> > Would you tell me the case to reproduce the issue around caching
> > manifest, if it still occurs on recent Mercurial ? It helps to find
> > out code paths overlooked by ExactCacheValidationPlan.
> >
> $ hg pull -r 7f14ccb https://bitbucket.org/DurhamG/hg
> $ hg up 7f14ccb
> 
> Apply this change:
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -504,7 +504,7 @@ class localrepository(object):
>   def manifest(self):
>   return manifest.manifest(self.svfs)
> 
> -@property
> +@storecache('00manifest.i')
>   def manifestlog(self):
>   return manifest.manifestlog(self.svfs, self)
> 
> 
> $ ./run-tests.py test-mq-merge.t
> 

Sorry for late followup. I had investigated unexpected filecache
behavior.

Let me re-summarize the issue.


At first, "file stat ambiguity of 00manifest.i around stripping" isn't
the root cause of unexpected failure at making repo.manifestlog
@storecache, even though such ambiguity itself should be fixed.

It is a kind of accidental side effect that fixing such ambiguity
avoids failure in some cases (detail is explained later).


The root cause of failure at making repo.manifestlog @storecache is
that "manifest" object referred by repo.manifest diverges from one
referred by manifestctx cached in manifestlog.

After Durham's works (including ones not yet queued into main repo),
"manifest" object is often referred via _revlog of cached manifestctx
instead of repo.manifest. This prevents file stat for repo.manifest
property from being refreshed.

It is important that @filecache-ed repo.manifest property focuses not
on actual using cached "manifest" object, but accessing via
repo.manifest.

Therefore, un-refreshed file stat causes unexpected reloading
repo.manifest from file, even though referred "manifest" object itself
is still valid at that time.

Then, repo.manifest and manifestctx._revlog don't refer same
"manifest" object (call this "manifest divergence" in this
explanation). This "manifest divergence" causes inconsistency.

Typical steps below can be reproduced in test-revlog-ancestry.py.

  1. at the first transaction:

 assumption: only cached manifestctx is required

 1-1. repo.manifest is bypassed by cached manifestctx, because it
  already refers "manifest" object

 1-2. operations apply changes on manifestctx._revlog

 1-3. changes on manifestctx._revlog are written into 00manifest.i

 1-4. closing transaction doesn't refresh file stat of repo.manifest
  because of bypassing at (1-1)

  2. at the second transaction:

 assumption: new (= not cached) manifestctx is required

 2-1. accessing to repo.manifest reloads 00manifest.i,
  because file stat of it isn't refreshed at (1-3)

  e.g. merging with or committing on old revision causes
  accessing to repo.manifest. BTW, failure in test-mq-merge.t
  also occurs just after committing merged result.

  Here, reloaded "manifest" diverges from _revlog of already
  cached manifestctx.

 2-2. operations apply changes on _revlog of cached manifestctx

  repo.manifest isn't aware of these changes, because of its
  divergence.

 2-3. changes on manifestctx._revlog are written into 00manifest.i

 2-4. closing transaction refreshes file stat of not only
  repo.manifestlog but also repo.manifest, even though the
  latter doesn't contain

  Accessing to it at (2-1) causes this refreshing.

  3. then:

 3-1. accessing to repo.manifest reuses already cached "manifest"
  object, because file stat of it is refreshed at (2-4)

 3-2. but looking node created at (2-2) up in it causes failure,
  because it doesn't 

[PATCH 3 of 4] streamclone: clear caches after writing changes into files for visibility

2016-09-02 Thread FUJIWARA Katsunori
# HG changeset patch
# User FUJIWARA Katsunori 
# Date 1472840249 -32400
#  Sat Sep 03 03:17:29 2016 +0900
# Node ID 33e29b5ebe885cdeeef433861e2d165b882d48d1
# Parent  32f53aaf63f8db0c1352a3b0afb290c425486704
streamclone: clear caches after writing changes into files for visibility

Before this patch, streamclone-ed changes are invisible via @filecache
properties to in-process procedures before closing transaction
(e.g. pretxnclose python hook), if corresponded property is cached
before consumev1(). Strictly speaking, caching should occur inside
(store) lock for transaction.

repo.invalidate() after closing transaction is too late to force
@filecache properties to be reloaded from changed files at next
access.

For visibility of streamclone-ed changes to in-process procedures
before closing transaction, this patch clears caches just after
writing changes into files.

Simply moving repo.invalidate() invocation into transaction scope
causes unexpected result, because it clears in-memory changes of
fncache before writing them out at closing transaction. "unexpected
result" requires rebuilding fncache by "hg debugrebuildfncache".

Therefore, this patch uses combination of invalidatefilecaches() and
invalidatecaches() instead of invalidate(). The latter consists of the
formers and store.invalidatecaches(), which clears in-memory fncache
changes.

BTW, regardless of changes in this patch, clearing cached properties
in consumev1() causes inconsistency, if (1) transaction is started and
(2) any @filecache property is changed before consumev1().

This (potential) inconsistency should be fixed in the future, and
this is reason why this patch adds comment about it as memorandum.

diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
--- a/mercurial/streamclone.py
+++ b/mercurial/streamclone.py
@@ -299,6 +299,20 @@ def consumev1(repo, fp, filecount, bytec
 repo.ui.progress(_('clone'), 0, total=bytecount, unit=_('bytes'))
 start = time.time()
 
+# TODO: get rid of (potential) inconsistency
+#
+# If transaction is started and any @filecache property is
+# changed at this point, it causes inconsistency between
+# in-memory cached property and streamclone-ed file on the
+# disk. Nested transaction prevents transaction scope "clone"
+# below from writing in-memory changes out at the end of it,
+# even though in-memory changes are discarded at the end of it
+# regardless of transaction nesting.
+#
+# But transaction nesting can't be simply prohibited, because
+# nesting occurs also in ordinary case (e.g. enabling
+# clonebundles).
+
 with repo.transaction('clone'):
 with repo.svfs.backgroundclosing(repo.ui, expectedcount=filecount):
 for i in xrange(filecount):
@@ -322,8 +336,12 @@ def consumev1(repo, fp, filecount, bytec
  total=bytecount, unit=_('bytes'))
 ofp.write(chunk)
 
-# Writing straight to files circumvented the inmemory caches
-repo.invalidate(clearfilecache=True)
+# force @filecache properties to be reloaded from
+# streamclone-ed file at next access
+repo.invalidatefilecaches(clearcache=True)
+
+# clear already cached @propertycache properties or so
+repo.invalidatecaches()
 
 elapsed = time.time() - start
 if elapsed <= 0:
diff --git a/tests/test-bundle.t b/tests/test-bundle.t
--- a/tests/test-bundle.t
+++ b/tests/test-bundle.t
@@ -310,9 +310,51 @@ Unpacking packed1 bundles with "hg unbun
 
 packed1 can be consumed from debug command
 
+(this also confirms that streamclone-ed changes are visible via
+@filecache properties to in-process procedures before closing
+transaction)
+
+  $ cat > $TESTTMP/showtip.py < from __future__ import absolute_import
+  > 
+  > def showtip(ui, repo, hooktype, **kwargs):
+  > ui.warn('%s: %s\n' % (hooktype, repo['tip'].hex()[:12]))
+  > 
+  > def reposetup(ui, repo):
+  > # this confirms (and ensures) that (empty) 00changelog.i
+  > # before streamclone is already cached as repo.changelog
+  > ui.setconfig('hooks', 'pretxnopen.showtip', showtip)
+  > 
+  > # this confirms that streamclone-ed changes are visible to
+  > # in-process procedures before closing transaction
+  > ui.setconfig('hooks', 'pretxnclose.showtip', showtip)
+  > 
+  > # this confirms that streamclone-ed changes are still visible
+  > # after closing transaction
+  > ui.setconfig('hooks', 'txnclose.showtip', showtip)
+  > EOF
+  $ cat >> $HGRCPATH < [extensions]
+  > showtip = $TESTTMP/showtip.py
+  > EOF
+
   $ hg -R packed debugapplystreamclonebundle packed.hg
   6 files to transfer, 2.60 KB of data
+  pretxnopen: 
+  pretxnclose: aa35859c02ea
   transferred 2.60 KB in *.* seconds (* */sec) (glob)
+  

[PATCH 1 of 4] streamclone: force @filecache properties to be reloaded from file

2016-09-02 Thread FUJIWARA Katsunori
# HG changeset patch
# User FUJIWARA Katsunori 
# Date 1472840247 -32400
#  Sat Sep 03 03:17:27 2016 +0900
# Node ID b9866cdaf302d52a7ddafa60a8a8e5155a764342
# Parent  ea486ad5201ccc2cfc254277ba19c81c3131781a
streamclone: force @filecache properties to be reloaded from file

Before this patch, consumev1() invokes repo.invalidate() after closing
transaction, to force @filecache properties to be reloaded from files
at next access, because streamclone writes data into files directly.

But this doesn't work as expected in the case below:

  1. at closing transaction, repo._refreshfilecachestats() refreshes
 file stat of each @filecache properties with streamclone-ed files

 This means that in-memory properties are treated as valid.

  2. but streamclone doesn't changes in-memory properties

 This means that in-memory properties are actually invalid.

  3. repo.invalidate() just forces to examine file stat of @filecache
 properties at the first access after it

 Such examination should concludes that reloading from file isn't
 needed, because file stat was already refreshed at (1).

 Therefore, invalid in-memory cached properties (2) are
 unintentionally treated as valid (1).

This patch invokes repo.invalidate() with clearfilecache=True, to
force @filecache properties to be reloaded from file at next access.

BTW, it is accidental that repo.invalidate() without
clearfilecache=True in streamclone case seems to work as expected
before this patch.

If transaction is started via "filtered repo" object,
repo._refreshfilecachestats() tries to refresh file stat of each
@filecache properties on "filtered repo" object, even though all of
them are stored into "unfiltered repo" object.

In this case, repo._refreshfilecachestats() does nothing
unintentionally, but this unexpected behavior causes reloading
@filecache properties after repo.invalidate().

This is reason why this patch should be applied before making
_refreshfilecachestats() correctly refresh file stat of @filecache
properties.

diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
--- a/mercurial/streamclone.py
+++ b/mercurial/streamclone.py
@@ -323,7 +323,7 @@ def consumev1(repo, fp, filecount, bytec
 ofp.write(chunk)
 
 # Writing straight to files circumvented the inmemory caches
-repo.invalidate()
+repo.invalidate(clearfilecache=True)
 
 elapsed = time.time() - start
 if elapsed <= 0:
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 4 of 4] localrepo: make _refreshfilecachestats unfiltered method to refresh correctly

2016-09-02 Thread FUJIWARA Katsunori
# HG changeset patch
# User FUJIWARA Katsunori 
# Date 1472840250 -32400
#  Sat Sep 03 03:17:30 2016 +0900
# Node ID b4264bfd8cbc814cab2c87f4752a8b05f84f6f77
# Parent  33e29b5ebe885cdeeef433861e2d165b882d48d1
localrepo: make _refreshfilecachestats unfiltered method to refresh correctly

Before this patch, if transaction is started via "filtered repo"
object, _refreshfilecachestats() at closing transaction doesn't
refresh file stat of any @filecache properties correctly, because:

  - _refreshfilecachestats() omits refreshing file stat of a
@filecache property, if it doesn't appear in self.__dict__

  - if transaction is started via "filtered repo",
_refreshfilecachestats() is applied on "filtered repo"

because repo.transaction() adds "self._refreshfilecachestats" to
post close procedures. repo.transaction() isn't unfiltered method,
and "self" in it means "filtered repo" in this case.

Transactions started by explicit repo.transaction() easily causes
this situation.

  - _refreshfilecachestats() applied on "filtered repo" omits whole
refreshing

because @filecache properties are stored into "unfiltered repo",
and appear only in self.__dict__ of "unfiltered repo".

This incorrect refreshing causes unnecessary reloading from files.

To refresh file stat of @filecache properties at closing transaction
correctly, this patch makes _refreshfilecachestats() unfiltered
method.

This patch chooses making _refreshfilecachestats() unfiltered method
instead of making transaction() unfiltered method, to reduce
unexpected side effect.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1276,6 +1276,7 @@ class localrepository(object):
 self.invalidate()
 self.invalidatedirstate()
 
+@unfilteredmethod
 def _refreshfilecachestats(self, tr):
 """Reload stats of cached files so that they are flagged as valid"""
 for k, ce in self._filecache.items():
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 4] localrepo: factor out invalidation of @filecache properties to reuse

2016-09-02 Thread FUJIWARA Katsunori
# HG changeset patch
# User FUJIWARA Katsunori 
# Date 1472840248 -32400
#  Sat Sep 03 03:17:28 2016 +0900
# Node ID 32f53aaf63f8db0c1352a3b0afb290c425486704
# Parent  b9866cdaf302d52a7ddafa60a8a8e5155a764342
localrepo: factor out invalidation of @filecache properties to reuse

streamclone wants to clear @filecache properties just after directly
writing changes into files, to avoid inconsistency between in-memory
cached properties and streamclone-ed files on disk.

On the other hand, it doesn't want to clear fncache before closing
transaction, because in-memory changes of fncache should be written
out as a part of a transaction.

Therefore, invalidate() isn't suitable for such purpose.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1245,19 +1245,27 @@ class localrepository(object):
 pass
 delattr(self.unfiltered(), 'dirstate')
 
-def invalidate(self, clearfilecache=False):
+def invalidatefilecaches(self, clearcache):
+'''Invalidate @filecache properties except for dirstate
+
+If clearcache is true, this completely discards already cached
+properties.
+'''
 unfiltered = self.unfiltered() # all file caches are stored unfiltered
 for k in self._filecache.keys():
 # dirstate is invalidated separately in invalidatedirstate()
 if k == 'dirstate':
 continue
 
-if clearfilecache:
+if clearcache:
 del self._filecache[k]
 try:
 delattr(unfiltered, k)
 except AttributeError:
 pass
+
+def invalidate(self, clearfilecache=False):
+self.invalidatefilecaches(clearfilecache)
 self.invalidatecaches()
 self.store.invalidatecaches()
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: RFC: bitmap storage for hidden

2016-09-02 Thread Durham Goode



On 9/2/16 9:14 AM, Gregory Szorc wrote:



On Aug 31, 2016, at 13:53, Durham Goode  wrote:

One of the performance costs that affects every command is the computehidden 
function (which computes what commits are hidden based on a combination of 
obsmarkers, bookmarks, workingcopy, phases, and tags).  At Facebook this 
function alone can add 300ms to every command a user runs (depending on a 
variety of factors).

I've begun work on a storage structure for this information, so we don't need 
to recompute it during every read command.  Before I go through the work to 
polish it enough to send it upstream, I want to run it by people for comments.

The idea is basically:

- Have a file that acts as a bitmap, where a changelog rev number maps to a 
particular bit, and if that bit is set then the node is considered hidden.  
computehidden will return a class that uses this to answer 
filteredrevs.__contains__ calls.

- When something is changed that would affect visibility (bookmark edit, phase 
change, obsmarker creation, dirstate movement, tag edit), that code is 
responsible for calling hidden.updatevisibilty(repo, affectednodes) which 
reports which nodes are touched by the edit (like the old bookmark node and the 
new bookmark node), and updatevisibility() does the minimal calculation of 
visibility changes and propagates those changes to the node ancestors as 
required.

- The file would be stored in .hg/cache/hidden, and would only be 122kb per 
million commits, so we'd just use the normal copy + atomic rename strategy for 
transactionality (i.e. transaction.addfilegenerator)

- In theory the file would always be up-to-date, but we can put some metadata 
at the front of the file to allow verifying the repo hasn't changed 
significantly out from under it (recording tip, working copy parents, obsmarker 
file timestamp, etc). If the repo has changed, or if the bitmap doesn't yet 
exist, it can be trivially calculated using 
hidden.updatevisibility(repo.revs('not public()')) in a manner similar to how 
it works today.


Notable issues:

- A number of places treat filteredrevs as a set, and do things like 'myset - 
repo.filteredrevs'.  With a bitmap this doesn't work, so we need to translate 
it to '(r for r in myset if r not in repo.filteredrevs)'.  Which is probably a 
better O() anyway since it won't be affected by the size of filteredrevs.

- filteredrevs is currently a frozen set.  Editing the bitmap will have to be 
done in a way where the edits only become visible when the 'frozen' set is 
invalidated.  So we maintain the existing behavior.


Follow up ideas:

- Once this ships as a cache, it might be possible to progress it to 
not-a-cache, such that we could define visibility in other terms.  For 
instance, we could allow hiding/reviving commits without obsolescence markers 
in non-evolve repos.  This is a bit more of a long term thought, but might be 
worth brainstorming about.

Why not ship it as part of the store from the beginning (without the extra 
features)? Cache validation is hard. Tying the updates to store 
locks/transactions seems easier and guarantees no updates are needed at read 
time (which has historically caused issues on multi process HTTP servers).
The cache will be tied to locks and transactions already.  So it 
effectively won't be a cache.  The only reason for treating it like a 
cache and having some sort of validation is so that we can detect issues 
where the bitmap becomes out of sync and we can fix them before we make 
this really not-a-cache.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: RFC: bitmap storage for hidden

2016-09-02 Thread Gregory Szorc


> On Aug 31, 2016, at 13:53, Durham Goode  wrote:
> 
> One of the performance costs that affects every command is the computehidden 
> function (which computes what commits are hidden based on a combination of 
> obsmarkers, bookmarks, workingcopy, phases, and tags).  At Facebook this 
> function alone can add 300ms to every command a user runs (depending on a 
> variety of factors).
> 
> I've begun work on a storage structure for this information, so we don't need 
> to recompute it during every read command.  Before I go through the work to 
> polish it enough to send it upstream, I want to run it by people for comments.
> 
> The idea is basically:
> 
> - Have a file that acts as a bitmap, where a changelog rev number maps to a 
> particular bit, and if that bit is set then the node is considered hidden.  
> computehidden will return a class that uses this to answer 
> filteredrevs.__contains__ calls.
> 
> - When something is changed that would affect visibility (bookmark edit, 
> phase change, obsmarker creation, dirstate movement, tag edit), that code is 
> responsible for calling hidden.updatevisibilty(repo, affectednodes) which 
> reports which nodes are touched by the edit (like the old bookmark node and 
> the new bookmark node), and updatevisibility() does the minimal calculation 
> of visibility changes and propagates those changes to the node ancestors as 
> required.
> 
> - The file would be stored in .hg/cache/hidden, and would only be 122kb per 
> million commits, so we'd just use the normal copy + atomic rename strategy 
> for transactionality (i.e. transaction.addfilegenerator)
> 
> - In theory the file would always be up-to-date, but we can put some metadata 
> at the front of the file to allow verifying the repo hasn't changed 
> significantly out from under it (recording tip, working copy parents, 
> obsmarker file timestamp, etc). If the repo has changed, or if the bitmap 
> doesn't yet exist, it can be trivially calculated using 
> hidden.updatevisibility(repo.revs('not public()')) in a manner similar to how 
> it works today.
> 
> 
> Notable issues:
> 
> - A number of places treat filteredrevs as a set, and do things like 'myset - 
> repo.filteredrevs'.  With a bitmap this doesn't work, so we need to translate 
> it to '(r for r in myset if r not in repo.filteredrevs)'.  Which is probably 
> a better O() anyway since it won't be affected by the size of filteredrevs.
> 
> - filteredrevs is currently a frozen set.  Editing the bitmap will have to be 
> done in a way where the edits only become visible when the 'frozen' set is 
> invalidated.  So we maintain the existing behavior.
> 
> 
> Follow up ideas:
> 
> - Once this ships as a cache, it might be possible to progress it to 
> not-a-cache, such that we could define visibility in other terms.  For 
> instance, we could allow hiding/reviving commits without obsolescence markers 
> in non-evolve repos.  This is a bit more of a long term thought, but might be 
> worth brainstorming about.

Why not ship it as part of the store from the beginning (without the extra 
features)? Cache validation is hard. Tying the updates to store 
locks/transactions seems easier and guarantees no updates are needed at read 
time (which has historically caused issues on multi process HTTP servers).

> 
> 
> My code so far can be seen at:
> 
> https://bitbucket.org/DurhamG/hg/commits/branch/bitmap
> 
> Though it's currently missing some key features (cache validity checking, 
> tests don't all pass yet, file is in .hg/store instead of .hg/cache).
> 
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: RFC: bitmap storage for hidden

2016-09-02 Thread Bryan O'Sullivan
I agree that it would make a lot of sense to not use a plain bitmap. The
current state of the art for compressed high-performance bitmaps is here:
http://roaringbitmap.org/

On Fri, Sep 2, 2016 at 8:09 AM, Augie Fackler  wrote:

> I've considered doing a bitmap index for hidden revs, but lacked the round
> tuits. Have you looked at any of the more interesting compressed bitmap
> storage schemes like EWAH? I know git uses that for some stuff now and it's
> been very efficient both speed and space wise.
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 2 RFC V2] largefiles: abort push on client when file fit largefiles (issue3245)

2016-09-02 Thread Yuya Nishihara
On Fri, 2 Sep 2016 09:12:34 +0200, Piotr Listkiewicz wrote:
> > I don't have any better idea, but I think it's too late to exchange minsize
> > and patterns on push. Ideally these limits should be enforced when files
> > are
> > added or committed. In order to do that, these values have to be pulled
> > beforehand and cached somewhere.
> 
> When do You think these values have to be pulled? It can be done on clone,
> but then what about repositories that were cloned before?

On clone and pull, (and maybe on push?)

> Another thing is
> when those limits are enforced on add/commit, what about situation when
> user is working offline and those values are not cached.

Do add/commit with no limits. These local limits are advisory, they are
checked again by the server. My point is that unacceptable changes should
be warned as early as possible. Fixing a bunch of unpushable revisions would
waste time.

> Another
> complication when those values are enforced on add/commit and user have
> multiply push destination - if we decide to take size/patterns from one
> push destination we would prevent doing completely correct add/commit for
> another push destination. What if in the meantime those values are changed
> in push/pull destination repository?

Maybe we could trust the default peer? IIRC, largefiles tends to use
default/default-push if no peer path is available. Another idea is to
calculate the narrowest restrictions so changes can be pushed to all peers.

> What if a pattern contain ',' ?
> 
> Right, which character would be correct - semicolon or they should not be
> sent as plain text?

That could be encoded to a text or packed into a binary-safe frame?
(I don't know the detail of the wire protocol.)

> Falling back to all() would be too expensive. Perhaps we can use
> > prepushoutgoinghooks.
> 
> In pushoperation class self.revs comments states that 'None is all' - in
> this situation it is not correct to check all revs?

All outgoing revisions need to be checked. It would be unacceptable cost
to scan all revisions every time you do "hg push".
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] evolution: make troubles appear in default log (issue4686)

2016-09-02 Thread Kevin Bullock
> On Sep 2, 2016, at 04:38, liscju  wrote:
> 
> # HG changeset patch
> # User liscju 
> # Date 1472809041 -7200
> #  Fri Sep 02 11:37:21 2016 +0200
> # Node ID 463b164126a0db3f7147787ace34f74deeccc03f
> # Parent  f148bfa40489269be2e48046734f81065129847a
> evolution: make troubles appear in default log (issue4686)
> 
> This patch is request for comment, it does not update failing
> test due to new information about troubles, it adds one test
> to show how troubles looks like in default log.

I think having a way to show troubled changesets as such in the log is a good 
idea, but the best we're able to do here is to add a new built-in template (cf. 
`hg log -T phases`). The default log output is very strongly a part of our 
command-line API, and changing it (even additively!) is a no-no.

pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
Kevin R. Bullock

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: RFC: bitmap storage for hidden

2016-09-02 Thread Augie Fackler

> On Aug 31, 2016, at 16:53, Durham Goode  wrote:
> 
> One of the performance costs that affects every command is the computehidden 
> function (which computes what commits are hidden based on a combination of 
> obsmarkers, bookmarks, workingcopy, phases, and tags).  At Facebook this 
> function alone can add 300ms to every command a user runs (depending on a 
> variety of factors).

I've considered doing a bitmap index for hidden revs, but lacked the round 
tuits. Have you looked at any of the more interesting compressed bitmap storage 
schemes like EWAH? I know git uses that for some stuff now and it's been very 
efficient both speed and space wise.

> 
> I've begun work on a storage structure for this information, so we don't need 
> to recompute it during every read command.  Before I go through the work to 
> polish it enough to send it upstream, I want to run it by people for comments.
> 
> The idea is basically:
> 
> - Have a file that acts as a bitmap, where a changelog rev number maps to a 
> particular bit, and if that bit is set then the node is considered hidden.  
> computehidden will return a class that uses this to answer 
> filteredrevs.__contains__ calls.
> 
> - When something is changed that would affect visibility (bookmark edit, 
> phase change, obsmarker creation, dirstate movement, tag edit), that code is 
> responsible for calling hidden.updatevisibilty(repo, affectednodes) which 
> reports which nodes are touched by the edit (like the old bookmark node and 
> the new bookmark node), and updatevisibility() does the minimal calculation 
> of visibility changes and propagates those changes to the node ancestors as 
> required.
> 
> - The file would be stored in .hg/cache/hidden, and would only be 122kb per 
> million commits, so we'd just use the normal copy + atomic rename strategy 
> for transactionality (i.e. transaction.addfilegenerator)
> 
> - In theory the file would always be up-to-date, but we can put some metadata 
> at the front of the file to allow verifying the repo hasn't changed 
> significantly out from under it (recording tip, working copy parents, 
> obsmarker file timestamp, etc). If the repo has changed, or if the bitmap 
> doesn't yet exist, it can be trivially calculated using 
> hidden.updatevisibility(repo.revs('not public()')) in a manner similar to how 
> it works today.
> 
> 
> Notable issues:
> 
> - A number of places treat filteredrevs as a set, and do things like 'myset - 
> repo.filteredrevs'.  With a bitmap this doesn't work, so we need to translate 
> it to '(r for r in myset if r not in repo.filteredrevs)'.  Which is probably 
> a better O() anyway since it won't be affected by the size of filteredrevs.
> 
> - filteredrevs is currently a frozen set.  Editing the bitmap will have to be 
> done in a way where the edits only become visible when the 'frozen' set is 
> invalidated.  So we maintain the existing behavior.
> 
> 
> Follow up ideas:
> 
> - Once this ships as a cache, it might be possible to progress it to 
> not-a-cache, such that we could define visibility in other terms.  For 
> instance, we could allow hiding/reviving commits without obsolescence markers 
> in non-evolve repos.  This is a bit more of a long term thought, but might be 
> worth brainstorming about.
> 
> 
> My code so far can be seen at:
> 
> https://bitbucket.org/DurhamG/hg/commits/branch/bitmap
> 
> Though it's currently missing some key features (cache validity checking, 
> tests don't all pass yet, file is in .hg/store instead of .hg/cache).
> 
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 3] revset: make optimize() reject unknown operators

2016-09-02 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1470549702 -32400
#  Sun Aug 07 15:01:42 2016 +0900
# Node ID 5cb90003deb87ae58f620657ec891df26e04c72a
# Parent  d130a38ef41f3c9e2d2f26df3535d89a93f87301
revset: make optimize() reject unknown operators

This should have caught the bug of 'keyvalue' operator fixed at 5004ef47f437.
The catch-all pattern is useless since optimize() should be aware of all known
operators.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2463,7 +2463,7 @@ def _optimize(x, small):
 else:
 w = 1
 return w + wa, (op, x[1], ta)
-return 1, x
+raise ValueError('invalid operator %r' % op)
 
 def optimize(tree):
 _weight, newtree = _optimize(tree, small=True)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 3 of 3] revset: do not partial-match operator and function names in optimize()

2016-09-02 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1470555368 -32400
#  Sun Aug 07 16:36:08 2016 +0900
# Node ID 1c24a66616699350c48aee283da06906ec9b4efa
# Parent  04d0742aa7ecb239a06ba46e9f8ebe0c52a86e9d
revset: do not partial-match operator and function names in optimize()

It was error-prone, and actually there was a typo, s/ancestorspec/ancestor/.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2371,7 +2371,7 @@ def _optimize(x, small):
 elif op == 'negate':
 s = getstring(x[1], _("can't negate that"))
 return _optimize(('string', '-' + s), small)
-elif op in 'string symbol':
+elif op in ('string', 'symbol'):
 return smallbonus, x # single revisions are small
 elif op == 'and':
 wa, ta = _optimize(x[1], True)
@@ -2434,7 +2434,7 @@ def _optimize(x, small):
 return o[0], (op, o[1])
 elif op == 'group':
 return _optimize(x[1], small)
-elif op in 'dagrange range parent ancestorspec':
+elif op in ('dagrange', 'range', 'parent', 'ancestor'):
 wa, ta = _optimize(x[1], small)
 wb, tb = _optimize(x[2], small)
 return wa + wb, (op, ta, tb)
@@ -2447,18 +2447,18 @@ def _optimize(x, small):
 elif op == 'func':
 f = getsymbol(x[1])
 wa, ta = _optimize(x[2], small)
-if f in ("author branch closed date desc file grep keyword "
- "outgoing user"):
+if f in ('author', 'branch', 'closed', 'date', 'desc', 'file', 'grep',
+ 'keyword', 'outgoing', 'user'):
 w = 10 # slow
-elif f in "modifies adds removes":
+elif f in ('modifies', 'adds', 'removes'):
 w = 30 # slower
 elif f == "contains":
 w = 100 # very slow
 elif f == "ancestor":
 w = 1 * smallbonus
-elif f in "reverse limit first _intlist":
+elif f in ('reverse', 'limit', 'first', '_intlist'):
 w = 0
-elif f in "sort":
+elif f == "sort":
 w = 10 # assume most sorts look at changelog
 else:
 w = 1
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 3] revset: remove false condition to process 'negate' operator

2016-09-02 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1470546807 -32400
#  Sun Aug 07 14:13:27 2016 +0900
# Node ID 04d0742aa7ecb239a06ba46e9f8ebe0c52a86e9d
# Parent  5cb90003deb87ae58f620657ec891df26e04c72a
revset: remove false condition to process 'negate' operator

'negate' is mapped to 'string' at the above clause.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2371,7 +2371,7 @@ def _optimize(x, small):
 elif op == 'negate':
 s = getstring(x[1], _("can't negate that"))
 return _optimize(('string', '-' + s), small)
-elif op in 'string symbol negate':
+elif op in 'string symbol':
 return smallbonus, x # single revisions are small
 elif op == 'and':
 wa, ta = _optimize(x[1], True)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] help: show content for explicitly disabled extension (issue5228)

2016-09-02 Thread Augie Fackler

> On Sep 1, 2016, at 16:08, liscju  wrote:
> 
> # HG changeset patch
> # User liscju 
> # Date 1472760402 -7200
> #  Thu Sep 01 22:06:42 2016 +0200
> # Node ID fbf842ccbfcf26f3f110638ba6b174891bb0d69f
> # Parent  f148bfa40489269be2e48046734f81065129847a
> help: show content for explicitly disabled extension (issue5228)
> 

queued, thanks

> diff --git a/mercurial/extensions.py b/mercurial/extensions.py
> --- a/mercurial/extensions.py
> +++ b/mercurial/extensions.py
> @@ -22,6 +22,7 @@ from . import (
> )
> 
> _extensions = {}
> +_disabledextensions = {}
> _aftercallbacks = {}
> _order = []
> _builtin = set(['hbisect', 'bookmarks', 'parentrevspec', 'progress', 
> 'interhg',
> @@ -148,6 +149,7 @@ def loadall(ui):
> for (name, path) in result:
> if path:
> if path[0] == '!':
> +_disabledextensions[name] = path[1:]
> continue
> try:
> load(ui, name, path)
> @@ -370,6 +372,7 @@ def _disabledpaths(strip_init=False):
> if name in exts or name in _order or name == '__init__':
> continue
> exts[name] = path
> +exts.update(_disabledextensions)
> return exts
> 
> def _moduledoc(file):
> diff --git a/tests/test-help.t b/tests/test-help.t
> --- a/tests/test-help.t
> +++ b/tests/test-help.t
> @@ -1623,6 +1623,17 @@ such str.lower().
>> ambiguous = !
>> EOF
> 
> +Show help content of disabled extensions
> +
> +  $ cat >> $HGRCPATH < +  > [extensions]
> +  > ambiguous = !./ambiguous.py
> +  > EOF
> +  $ hg help -e ambiguous
> +  ambiguous extension - (no help text available)
> +  
> +  (use "hg help extensions" for information on enabling extensions)
> +
> Test dynamic list of merge tools only shows up once
>   $ hg help merge-tools
>   Merge Tools
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] py3: remove use of *L syntax

2016-09-02 Thread Yuya Nishihara
On Thu, 01 Sep 2016 04:44:57 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pul...@gmail.com>
> # Date 1472677186 -19800
> #  Thu Sep 01 02:29:46 2016 +0530
> # Node ID 5ffa4cfc09d7c563b09bc3d4fbd50bccff16aa6a
> # Parent  8a84347b9907ada91f9f3a21aca1fb62cac0fed5
> py3: remove use of *L syntax

LGTM, queued, thanks.

> The int in Python 3 behaves as long so no need of L's in py3.
> Moreover we dont need long here.

0xfff5 is a long on 32bit system, but we don't need L suffix anyway.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH V2] import: report directory-relative paths in error messages (issue5224)

2016-09-02 Thread liscju
# HG changeset patch
# User liscju 
# Date 1472208500 -7200
#  Fri Aug 26 12:48:20 2016 +0200
# Node ID 48eb647e37e99a8c996420e3b99e8b538cf08caa
# Parent  b1809f5d7630a3fff0fa715bbd30dba0f07672a8
import: report directory-relative paths in error messages (issue5224)

Import uses paths relative to the root of the repository, so when
user imports patch with paths relative to the current working directory
import aborts with 'unable to find file for patching'.

This patch improves this situation by warning the user that paths are
relative to the root of repository when patching fails.

diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -669,6 +669,9 @@ class patchfile(object):
 self.mode = (False, False)
 if self.missing:
 self.ui.warn(_("unable to find '%s' for patching\n") % self.fname)
+self.ui.warn(_("Note that paths are relative to the root "
+   "of repository, use --prefix option to "
+   "change it\n"))
 
 self.hash = {}
 self.dirty = 0
diff --git a/tests/test-import-bypass.t b/tests/test-import-bypass.t
--- a/tests/test-import-bypass.t
+++ b/tests/test-import-bypass.t
@@ -41,6 +41,7 @@ Test failure without --exact
   $ hg import --bypass ../test.diff
   applying ../test.diff
   unable to find 'a' for patching
+  Note that paths are relative to the root of repository, use --prefix option 
to change it
   abort: patch failed to apply
   [255]
   $ hg st
diff --git a/tests/test-import.t b/tests/test-import.t
--- a/tests/test-import.t
+++ b/tests/test-import.t
@@ -1623,6 +1623,7 @@ Importing with unknown file:
   $ hg export --rev 'desc("extended jungle")' | hg import --partial -
   applying patch from stdin
   unable to find 'jungle' for patching
+  Note that paths are relative to the root of repository, use --prefix option 
to change it
   1 out of 1 hunks FAILED -- saving rejects to file jungle.rej
   patch applied partially
   (fix the .rej files and run `hg commit --amend`)
@@ -1764,3 +1765,31 @@ Importing some extra header
   $ hg log --debug -r . | grep extra
   extra:   branch=default
   extra:   foo=bar
+
+Warn the user that paths are relative to the root of
+repository when file not found for patching
+
+  $ mkdir filedir
+  $ echo "file1" >> filedir/file1
+  $ hg add filedir/file1
+  $ hg commit -m "file1"
+  $ cd filedir
+  $ hg import -p 2 - < # HG changeset patch
+  > # User test
+  > # Date 0 0
+  > file2
+  > 
+  > diff --git a/filedir/file1 b/filedir/file1
+  > --- a/filedir/file1
+  > +++ b/filedir/file1
+  > @@ -1,1 +1,2 @@
+  >  file1
+  > +file2
+  > EOF
+  applying patch from stdin
+  unable to find 'file1' for patching
+  Note that paths are relative to the root of repository, use --prefix option 
to change it
+  1 out of 1 hunks FAILED -- saving rejects to file file1.rej
+  abort: patch failed to apply
+  [255]
diff --git a/tests/test-mq-missingfiles.t b/tests/test-mq-missingfiles.t
--- a/tests/test-mq-missingfiles.t
+++ b/tests/test-mq-missingfiles.t
@@ -42,6 +42,7 @@ Push patch with missing target:
   $ hg qpush
   applying changeb
   unable to find 'b' for patching
+  Note that paths are relative to the root of repository, use --prefix option 
to change it
   2 out of 2 hunks FAILED -- saving rejects to file b.rej
   patch failed, unable to continue (try -v)
   patch failed, rejects left in working directory
@@ -147,6 +148,7 @@ Push git patch with missing target:
   $ hg qpush
   applying changeb
   unable to find 'b' for patching
+  Note that paths are relative to the root of repository, use --prefix option 
to change it
   1 out of 1 hunks FAILED -- saving rejects to file b.rej
   patch failed, unable to continue (try -v)
   patch failed, rejects left in working directory
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel